How to cap a table's size with ActiveRecord

We want to use SQL/ActiveRecord for logging but limit the table size so that older messages disappear off the table. Some process has to run periodically to do that.

Suppose I want to keep my table to not much more than 50,000,000 rows or so. What is the easiest, most efficient way to delete any extra rows there may be? This is an SQL/ActiveRecord problem I have not encountered before.

  In theory I would know how to get all the records as an array by calling MyLog.find(:all), ordering them by date, then iterating from where I want to chop off to the end and deleting each one, but that may not be the most efficient or acceptable way to do it.

Jedrin wrote in post #1058661:

We want to use SQL/ActiveRecord for logging but limit the table size so that older messages disappear off the table. Some process has to run periodically to do that.

Suppose I want to keep my table to not much more than 50,000,000 rows or so. What is the easiest, most efficient way to delete any extra rows there may be? This is an SQL/ActiveRecord problem I have not encountered before.

I don't know the current state of using these with Rails and ActiveRecord, but it sounds to me like what you need is a round-robin database storage engine, which automatically discards the oldest data once a fixed size is reached.

  In theory I would know how to get all the records as an array by calling MyLog.find(:all), ordering them by date, then iterating from where I want to chop off to the end and deleting each one, but that may not be the most efficient or acceptable way to do it.

Using MyLog.find(:all) would be a really bad idea. Selecting everything from a table that may contain more than a few hundred rows is almost never a good idea: it pulls every record into memory at once.

Besides, that's not the right way to count records in a table anyway. That's why SQL has COUNT. Rails supports count through its aggregate methods:

You would also, certainly, not want to sort the results in memory as you suggest. You would instead ask the database engine to do that for you:

Here's an approach that may work. Keep in mind this was put together quickly so you'll need to test it out for yourself.

MAX_TABLE_SIZE = 50000000
row_count = MyLog.count
delete_limit = (row_count > MAX_TABLE_SIZE) ? row_count - MAX_TABLE_SIZE : 0
logs_to_delete = MyLog.order('created_at').limit(delete_limit)
logs_to_delete.each do |log|
  log.delete
end

Some example SQL the above would generate:

SELECT COUNT(*) FROM "my_logs"  -- returns 50000150
SELECT "my_logs".* FROM "my_logs" ORDER BY created_at LIMIT 150

DELETE FROM "my_logs" WHERE "my_logs"."id" = 1
DELETE FROM "my_logs" WHERE "my_logs"."id" = 2
DELETE FROM "my_logs" WHERE "my_logs"."id" = 3
...
DELETE FROM "my_logs" WHERE "my_logs"."id" = 150

Make a background job to run that daily. It's probably still not the most efficient way to do it, but it shouldn't be too bad if run often enough. I'm sure there's a way to do this without issuing a separate DELETE statement for each row, but I'll leave that as an exercise for the reader.

I sort of knew count() is what I might use even though I said find(:all). I haven't used count() in a long time and I forget the syntax and how to set it up (it used to need some special setup in old Rails), but it's still much the same problem.

The round robin sounds good, but I doubt it exists in the DB that we use (MS SQL Server), so we have to do it ourselves. Someone has said you can write a process that runs in SQL Server itself (presumably a scheduled SQL Server Agent job). Not something I am familiar with, but I guess one of the guys I work with can help me out.

If you have a reasonably consistent number of new records each day or week, then you could delete old ones by date rather than by count, keeping six months' worth of records for example. This would be much easier: you could just find the records where created_at is before a given date and delete them. Put an index on created_at, obviously.

Colin

If your strategy is just to keep the newest MAXNUMBER records, you might consider adding an after_create callback to your model.

Jedrin wrote in post #1058661:

We want to use SQL/ActiveRecord for logging but limit the table size so that older messages disappear off the table. Some process has to run periodically to do that.

Here's something I discovered after posting my reply, but it's certainly worth considering.

Consider doing your logging with MongoDB instead of a SQL database. MongoDB has built-in support for high-performance logging scenarios via capped collections: fixed-size collections that automatically overwrite their oldest documents once the size limit is reached.

See the following for more: