Efficient way to prune a table

I have a table that retains strings to display in a select box. The select box is supposed to display only the most recently added 10 Strings (there is a field, tstamp, which holds the timestamp of the most recent change to a row).

So there's never a reason for the table to hold more than 10 strings. The issue -- my question -- is how do I keep only the most recent 10 records in the db, efficiently? This issue must be faced by many, and many times over. Rather than try to reinvent the wheel here, can anyone tell me how this sort of thing is typically handled? Thanks, RVince

We have a similar problem, and the two options we saw were: 1) cron job that ran either rails or sql command to prune the table 2) a function that runs when something else happens (like login/logout)

We ended up choosing 2, just because it was easier to implement (didn't
have to play around with cron jobs etc).

Cheers Simon

Simon, Is there a way to do it all in one sql statement? May I ask how you implement this in Ruby/.Rails, say, doing it when something occurs (like login/logout?) Thanks Rvince

Hi,

SearchResults is the model that we are cleaning up, so we do the following
on login

SearchResults.delete_all(['updated_at < ?', 1.week.ago])

Cheers Simon

Ah....wonderful. Exacly what I was hoping. I was afraid I would have to put out a series of straight sql, but once again, ruby/rails has a better way. Thanks Simon!

Hmmm, I think that just throwing out records older than week wont work for me -- I need to keep, say, 5 or 10 of them. How can I specify this? THanks -Rvince

RVince wrote:

Hmmm, I think that just throwing out records older than week wont work for me -- I need to keep, say, 5 or 10 of them. How can I specify this? THanks -Rvince

Look into using a limit clause on your delete operation.

Best,

Yes, but I still need to pare down he table size periodically to the N most recent records

I'm assuming you have an ActiveRecord model that is managing this table. If so, you could handle the delete in that model, perhaps with a callback (after_save). (Again, without knowing the details of the application, this may or may not be a good approach performance wise depending on how much activity is on the table.)

You could also just schedule a job to periodically clean it up and then define a default scope on the model that limits it to the ten most recent records. Assuming you have timestamp columns (updated_at specifically), this is a trivial sort/limit.

RVince wrote:

Yes, but I still need to pare down he table size periodically to the N most recent records

I know. That's why you want to use delete with limit (and order).

Best,

Yes, I can see there is no easy, rails-way to do this. In effect, I need to

1. read in the X most recent records (Select * from Chanelnotes order by tstamp DESC limit 10) 2. Delete the entire table (delete * in Channelnotes) 3. Do an insert on the ten records I read in.

There's no other way.

Doesn't "ORDER BY tstamp" need to be "ORDER BY tstamp DESC"? The latest 10 need to be captured, not the oldest 10.

Marmen,

Is created_at the standard way of doing that (I alwasy put in certain fields in my tables -- a timestamp field, a crossref field -- which is a string that would be the index of the field in another db whose structure might be trying to follow mine or vice versa).

Additionally, I'm not even sure how to perform this in my rails app. I see an ActiveRecord.find_by_sql and count_by_sql but there is nothing more generic whic hwould allow me to perform an SQL delete statement out of rails -- is there?

Thanks for your help on this guys. I had a hard time getting my head around this and am most grateful to your help! -RVince.

Marmen,

Is created_at the standard way of doing that (I alwasy put in certain fields in my tables -- a timestamp field, a crossref field -- which is a string that would be the index of the field in another db whose structure might be trying to follow mine or vice versa).

if your timestamp is called created_at or updated_at Rails will set it for you

Additionally, I'm not even sure how to perform this in my rails app. I see an ActiveRecord.find_by_sql and count_by_sql but there is nothing more generic whic hwould allow me to perform an SQL delete statement out of rails -- is there?

There is delete_all but that expects merely a set of sql conditions.

ActiveRecord::Base.connection.execute "..." allows you to execute arbitrary sql statements

ActiveRecord::Base.connection.delete is nearly identical to that (it calls execute) but you may prefer the explicitness of calling a method called delete, more importantly using the delete method will flush rails' sql cache.

Fred

The issue -- my question -- is how do I keep only the most recent 10 records in the db, efficiently?

For efficiency, might PStore or TokyoCabinet be more appropriate choices? All those db accesses might add up.

Ron

I’'ve taken this approach:

def self.prune_table if self.count > 1000 youngest_victim = find(:first, :select => :id, :order => ‘created_at desc’, :offset => 1000) delete_all([“id <= ?”,youngest_victim.id])

end end

My requirements are not particularly strict around the 1000. Just more or less around there. You could do the same with created_at I guess.

Any obvious issues with this approach?

Regards

Ivor