Large database table issue

Hi guys

I have a problem with a fairly standard Rails / mySQL query. I must be
doing something very wrong. I was hoping somebody could shed some
light on this.

First model: article, has_many comments
Second model: comment, belongs_to article

The number of articles and comments can be big (one to two millions).
Every once in a while, I "clean" the DB and remove older articles and
comments:

Articles.delete_all(["a_date < ?", xxxx]
Comments.delete_all("comments.article_id not in (select id from
articles)")

The first query is executed with no problems. But the second one hangs
and completely loads my database server.

Notes:
- I am not using :dependent => :delete since I was thinking doing a
destroy instead of delete on Articles would be too slow (my
understanding is when you "destroy", Rails loads the object first)
- I have killed all other queries on the DB, just to make sure there
are no locks somewhere.

Thanks!
Pierre

Delete the comments first?
Comments.delete_all("comments.article_id in (select id from articles
where a_date < ?)", xxxx)

Do you have an index on article_id? Id yo don't this will be super
slow. A subselect that this is probably going to be slow - i'd try a
left join instead, i.e. something like

Comment.joins("left join articles on articles.id =
article_id").where("articles.id is null").delete_all

An index on article id is still advisable

Fred

I have a problem with a fairly standard Rails / mySQL query. I must be
doing something very wrong. I was hoping somebody could shed some
light on this.

First model: article, has_many comments
Second model: comment, belongs_to article

The number of articles and comments can be big (one to two millions).
Every once in a while, I "clean" the DB and remove older articles and
comments:

Articles.delete_all(["a_date < ?", xxxx]
Comments.delete_all("comments.article_id not in (select id from
articles)")

The first query is executed with no problems. But the second one hangs
and completely loads my database server.

Innodb or myisam table types? If it's the latter, the table in question is going to get locked while the delete happens. If you're removing a lot of rows, regardless of how you index it, it's going to be slow. In addition to the other advice, you might clean the table up more often (so you're removing fewer rows).

Another option if you're willing to stick with mysql is to use their ??? (can't remember the name) feature. It lets you create what appears to be a normal table, but it actually splits it up into multiple tables based on one of the columns – in your case… the date. So when you remove the old entries you're not touching the "latest table". At least if I'm remembering things right.

-philip

I think you're talking about partitioning:

  http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

-J

Thanks a lot guys for your help.

I apologize I did not mention it, but I had the index on article_id
unfortunately.

Fred: I tried your LEFT JOIN method and indeed, when I EXPLAIN it the
second select is of type eq_ref (vs unique_subquery with my method) so
it should be faster. Yet, when I try it on a smaller test DB, the
difference seems only marginal I am afraid. It still goes through all
the comments rows it seems.

I just made a simple test directly in mySQL on my instance: "select
count(*) from comments" and even that one does not want to return! I
tried to restart mysql: same. I did not know this was even possible.

I realize it is now more a mySQL issue than a Rails one, but just in
case: have you guys ever been confronted to something like that?

I will try partitioning as well.

Thanks
Pierre

Thanks a lot guys for your help.

I apologize I did not mention it, but I had the index on article_id
unfortunately.

Fred: I tried your LEFT JOIN method and indeed, when I EXPLAIN it the
second select is of type eq_ref (vs unique_subquery with my method) so
it should be faster. Yet, when I try it on a smaller test DB, the
difference seems only marginal I am afraid. It still goes through all
the comments rows it seems.

I just made a simple test directly in mySQL on my instance: "select
count(*) from comments" and even that one does not want to return! I
tried to restart mysql: same. I did not know this was even possible.

Count(*) isn't magically fast in innodb - mysql has to do an index scan (versus being able to just read some table metadata for some db types).

You might try using show innodb status to see what is going on (there's another thing you can do to see more detailed info about locks being held, but I don't remember off the top of my head.

Another option might be to delete the comments before you delete the articles (so inner joining comments & articles with whatever condition on articles you use to determine what to delete).

Fred

i don’t think partitioning is going to help you if count(*) didn’t even work.

some basic questions for you on the comments table. is there a primary key? is there a separately indexed article_id? i’m sure you have them, but better to rule out the easy stuff first.

i was going to suggest using the exists() function rather than an outer join, but again it doesn’t sound like your main issue. (you should
learn to use the exists() function if you don’t know it. i’ve found it to work better.)

for the order of operations you are trying to do, i agree with an earlier suggestion to delete comments first, then articles. also don’t use “not in”. instead delete comments.* from comments where comments.article_id = @article"

(sorry if this shows up like 3 times, it is my first time posting to this group and it wasn’t working.)

Guys,

Thanks a lot for all your help and input. Just in case it can help
someone else one day: it turned out that my table was corrupted. It
was a real surprise to me since I was still able to "use it" (via
joins) and mySQL would not "complain".

I found out doing a "check table": it did not return an error but it
was crashing mysqld every time.

Thanks
Pierre