Ok, I tried searching for the answer to this but couldn't find a
thread about it, so here goes...
I'm using sessions that I'm storing in the database. Is there an
accepted Rails way of deleting old sessions?
Unless I hear a better idea, I was going to write a simple class and
call it from a cron job to delete sessions that have been inactive
for a specified period of time.
That will be processed for every request, which isn't really
necessary. It probably won't add THAT much overhead, but if you have a
high volume site you'd want to offload the session clearing into
something else.
* you could make a rake task that deletes sessions older than a
certain offset, and call that from cron (a sketch follows this list)
* put it into your authentication code, rather than a before_filter;
that way it'll only run when someone logs in
* your idea of a simple class that runs from cron, using
script/runner, seems like a valid approach
* use a scheduling gem/plugin.
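For the rake task option, a minimal sketch (not from the thread): the
namespace, task name, and one-hour cutoff are placeholders, and it assumes
MySQL plus the standard sessions table with an updated_at column:

namespace :sessions do
  desc "Delete sessions that have been idle for over an hour"
  task :expire => :environment do
    # A single DELETE statement; no ActiveRecord objects are instantiated
    ActiveRecord::Base.connection.execute(
      "DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL 1 HOUR")
  end
end

From cron you'd then call something like
0 4 * * * cd /path/to/app && rake sessions:expire RAILS_ENV=production
(the path and schedule are hypothetical).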
No it won't. It has randomization code that causes it to not run
_most of the time_. This is exactly how session gc should be handled.
It will ramp up proportionally with traffic.
Actually, that could be never or always; relying on random numbers to make decisions on whether to do something "most of the time" is a bad idea.
The pointers that were given by Sax were more valid options. I'd personally prefer the crontab option, since you can run it at a regular, low-activity time, and cron is built in and already running on any Unix-based OS, so it requires no extra processes. It could even be a little script that runs outside of Rails, since it's a bit of overkill to start a whole Rails instance just to delete some records in the sessions table.
Thanks for the input guys. Yeah, I'm also not sure why I would want to
run the session removal on a random basis, so it seems like using
script/runner is the way I'm going to go. I can't put it into the
authentication code, fwiw, because the application doesn't have any
authentication layer. I'm going to look into the scheduling gem.
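A minimal sketch of that script/runner approach (not from the thread): the
class name and cutoff are hypothetical, and it assumes the ActiveRecord
session store of that era, which exposes its model as
CGI::Session::ActiveRecordStore::Session:

class SessionCleaner
  # Delete sessions that haven't been touched within the last hour
  def self.remove_stale_sessions
    CGI::Session::ActiveRecordStore::Session.delete_all(
      ["updated_at < ?", 1.hour.ago])
  end
end

The crontab entry would then be along the lines of
0 4 * * * /path/to/app/script/runner -e production 'SessionCleaner.remove_stale_sessions'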
> No it won't. It has randomization code that causes it to not run
> _most of the time_. This is exactly how session gc should be handled.
> It will ramp up proportionally with traffic.

> Actually that could be never or always, relying on random numbers to make
> decisions on whether to do something "most of the time" is a bad idea.
Since quantum physics works entirely by probabilities (that is, random
numbers) and microprocessors are built from semiconductors which
operate because of the laws of quantum physics, it could be said that
any software is entirely dependent on the operation of random numbers.
Therefore, however it is coded, it is 'relying on random numbers to
make decisions on whether to do something'.
Seriously, though, to suggest that something coded to execute 1% of
the time using random numbers may either never run or always run is
incorrect. Assuming it is correctly coded, of course.
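To put numbers on that, here is a throwaway Ruby snippet (not from the
thread) simulating a 1-in-100 gate over 100,000 requests; the cleanup
count lands near the expected 1,000 on any realistic traffic volume:

runs = 0
100_000.times { runs += 1 if rand(100).zero? }
puts runs   # typically within a few percent of 1,000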
Well, since you are going on the philosophical tour here: there's more than one random variable coming into play. Not only the mod 10 result, but also the number of hits on the application, the time at which they hit the application, etc. That's not even playing with probabilities, that's just plain gambling.
All I was trying to point out is that you have no way of knowing if and when the sessions table will be cleaned, just like you have no way of knowing whether you'll win a game of bingo or the lotto, since a lot more variables come into it than just the semi-random computer-generated ones. You could hit it the first time, you could hit it twice in a row, and you could wait days to hit it. The fact that you have a 10% or a 1% chance of hitting the right number is still a probability, not a certainty. When it comes to cleaning a table that just keeps piling up stale records, I like to have some kind of guarantee that it will be cleaned when I want it, not when quantum physics and random people surfing to my application decide it's the right time.
I feel like I’m missing a major point here. Assuming the table is correctly range partitioned and indexed, most databases should be able to handle relatively large table sizes. I agree that is a best practice to archive old, unused data, but that can likely be done on a monthly basis, or less often, depending on traffic. Why would you need to consider a solution that “will ramp up proportionally with traffic”?
The original poster was asking for the correct way to clean the sessions table. Although there is no clear-cut answer to that one, I personally feel random number generation is by no means the correct way to go. Yes, the database should be able to look up sessions very quickly, but as you pointed out, depending on traffic it will eventually drain needless resources as the number of records increases, both in terms of server cycles and storage.
Now, until cookie-based storage became available, we used the database for session storage and used quite a few techniques over the years we've been developing Rails apps. As the number of applications increased, we started handling it differently. In rough lines, we used:
* first couple of applications: a before_filter triggered by authentication (or some other action that clearly had to do with sessions)
* a crontab that invokes script/runner during low-traffic times (the problem here was that for each of the applications a whole Rails instance was started, and that consumed quite a bit of memory as the number of apps increased on the VPS we then had)
* a crontab that invoked the mysql command line and just went through all of the databases, deleting sessions in a single mysql session
The last solution was really quick, used very little resources and worked fine during the time we actually needed it. It was a little bash script, nothing special, along the lines of:
mysql -h localhost -u[someuser-with-necessary-privileges] < sql_commands_file
where sql_commands_file just had a series of commands to clean the sessions:
USE databasename1;
DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL 1 HOUR;
USE databasename2;
DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL 1 HOUR;
USE databasename3;
DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL 1 HOUR;
I think we cleaned it up a bit by just generating the whole sql command sequence in bash with a loop, but you get the picture.
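That generator loop wasn't posted, but it would have looked something like
this (the database names are the same placeholders as above):

for db in databasename1 databasename2 databasename3; do
  echo "USE $db;"
  echo "DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL 1 HOUR;"
done > sql_commands_file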
No it's not. It's not relying on random numbers in the sense you are
implying. The random numbers are just a way to run the cleanup on a
fixed percentage of requests, as in not doing it "most of the time".
Look at the way PHP does session garbage collection, for example. You
set a gc callback that is only invoked _some of the time_, with the
probability controlled by gc_probability/gc_divisor.
When using db driven sessions you don't want to clear out all the old
sessions all of the time. You just want a rolling table setup that
clears itself based on traffic flow.
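In Rails terms, the rolling setup being described could look like this
minimal sketch (not from the thread); the filter name and cutoff are
hypothetical, and it assumes the same ActiveRecord session model as above:

class ApplicationController < ActionController::Base
  before_filter :probabilistic_session_gc

  private

  # Fires on roughly 1 in 100 requests, so cleanup frequency
  # ramps up and down with traffic instead of running on a clock.
  def probabilistic_session_gc
    return unless rand(100).zero?
    CGI::Session::ActiveRecordStore::Session.delete_all(
      ["updated_at < ?", 1.hour.ago])
  end
end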
yeah, my main point was that the method would be run for every
request. Probably not that many milliseconds in the grand scheme of
things, but why add any extra processing to your requests when you can
externalize it?
> yeah, my main point was that the method would be run for every
> request.

Just like your before_filter for user authentication.

> Probably not that many milliseconds in the grand scheme of
> things, but why add any extra processing to your requests when you can
> externalize it?
Putting it in cron doesn't guarantee it will always find something to
delete. It just means you now have to maintain a cron entry external
to your actual app.