Efficient way to prune a table

I have a table that retains strings to display in a select box. The
select box is supposed to display only the most recently added 10
Strings (there is a field, tstamp, which holds the timestamp of the
most recent change to a row).

So there's never a reason for the table to hold more than 10 strings.
The issue -- my question -- is how do I keep only the most recent 10
records in the db, efficiently? This issue must be faced by many, and
many times over. Rather than try to reinvent the wheel here, can
anyone tell me how this sort of thing is typically handled? Thanks,
RVince

We have a similar problem, and the two options we saw were:
1) cron job that ran either rails or sql command to prune the table
2) a function that runs when something else happens (like login/logout)

We ended up choosing 2, just because it was easier to implement (didn't
have to play around with cron jobs etc).

Cheers
Simon

Simon, Is there a way to do it all in one sql statement? May I ask how
you implement this in Ruby/.Rails, say, doing it when something occurs
(like login/logout?) Thanks Rvince

Hi,

SearchResults is the model that we are cleaning up, so we do the following
on login

SearchResults.delete_all(['updated_at < ?', 1.week.ago])

Cheers
Simon

Ah....wonderful. Exacly what I was hoping. I was afraid I would have
to put out a series of straight sql, but once again, ruby/rails has a
better way. Thanks Simon!

Hmmm, I think that just throwing out records older than week wont work
for me -- I need to keep, say, 5 or 10 of them. How can I specify
this? THanks -Rvince

RVince wrote:

Hmmm, I think that just throwing out records older than week wont work
for me -- I need to keep, say, 5 or 10 of them. How can I specify
this? THanks -Rvince

Look into using a limit clause on your delete operation.

Best,

Yes, but I still need to pare down he table size periodically to the N
most recent records

I'm assuming you have an ActiveRecord model that is managing this
table. If so, you could handle the delete in that model, perhaps with
a callback (after_save). (Again, without knowing the details of the
application, this may or may not be a good approach performance wise
depending on how much activity is on the table.)

You could also just schedule a job to periodically clean it up and
then define a default scope on the model that limits it to the ten
most recent records. Assuming you have timestamp columns (updated_at
specifically), this is a trivial sort/limit.

RVince wrote:

Yes, but I still need to pare down he table size periodically to the N
most recent records

I know. That's why you want to use delete with limit (and order).

Best,

Yes, I can see there is no easy, rails-way to do this. In effect, I
need to

1. read in the X most recent records (Select * from Chanelnotes order
by tstamp DESC limit 10)
2. Delete the entire table (delete * in Channelnotes)
3. Do an insert on the ten records I read in.

There's no other way.

Doesn't "ORDER BY tstamp" need to be "ORDER BY tstamp DESC"? The
latest 10 need to be captured, not the oldest 10.

Marmen,

Is created_at the standard way of doing that (I alwasy put in certain
fields in my tables -- a timestamp field, a crossref field -- which is
a string that would be the index of the field in another db whose
structure might be trying to follow mine or vice versa).

Additionally, I'm not even sure how to perform this in my rails app. I
see an ActiveRecord.find_by_sql and count_by_sql but there is nothing
more generic whic hwould allow me to perform an SQL delete statement
out of rails -- is there?

Thanks for your help on this guys. I had a hard time getting my head
around this and am most grateful to your help! -RVince.

Marmen,

Is created_at the standard way of doing that (I alwasy put in certain
fields in my tables -- a timestamp field, a crossref field -- which is
a string that would be the index of the field in another db whose
structure might be trying to follow mine or vice versa).

if your timestamp is called created_at or updated_at Rails will set it
for you

Additionally, I'm not even sure how to perform this in my rails app. I
see an ActiveRecord.find_by_sql and count_by_sql but there is nothing
more generic whic hwould allow me to perform an SQL delete statement
out of rails -- is there?

There is delete_all but that expects merely a set of sql conditions.

ActiveRecord::Base.connection.execute "..." allows you to execute
arbitrary sql statements

ActiveRecord::Base.connection.delete is nearly identical to that (it
calls execute) but you may prefer the explicitness of calling a method
called delete, more importantly using the delete method will flush
rails' sql cache.

Fred

The issue -- my question -- is how do I keep only the most recent 10
records in the db, efficiently?

For efficiency, might PStore or TokyoCabinet be more appropriate
choices? All those db accesses might add up.

Ron

I’'ve taken this approach:

def self.prune_table
if self.count > 1000
youngest_victim = find(:first, :select => :id, :order => ‘created_at desc’, :offset => 1000)
delete_all([“id <= ?”,youngest_victim.id])

end
end

My requirements are not particularly strict around the 1000. Just more or less around there. You could do the same with created_at I guess.

Any obvious issues with this approach?

Regards

Ivor