What is the most efficient way to split a table into 2 groups?

Patrick_Doyle · September 28, 2009, 9:32pm

I have the following:

@lot = Lot.find(params[:id])

part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id])

I guess I should mention that

Lot :belongs_to => :part

I was looking at the log following the execution of these two statements and I saw something like this:

Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13) Part Load (0.3ms) SELECT * FROM "parts" WHERE ("parts"."id" = 2) Part Load (0.9ms) SELECT * FROM "parts" WHERE (id <> 2)

It looked a bit silly to me -- first I grab a record from the "parts" table with an ID of 2, then I grab all the records from the parts table whose ID is not 2.

I played around a little with :include clauses, thinking that I should, at least, be able to fetch the record from the "lots" table and the "parts" table simultaneously (with something like a joins clause and a "lots.part_id = parts.id" WHERE clause), but didn't get anywhere with that.

I will end up leaving this the way it is (most likely), especially since this isn't the right phase of development to be worrying about optimization, but I am curious how one might do this most efficiently.

Is it most efficient to grab 1 record where record.id = blah and then all the (rest of the) records where record.id <> blah?

Is it more efficient to grab all the records at once and write some ruby code to select the one record from the rest? If so, what would that code look like? I don't like this:

everything = Part.all the_one = everything.select {|x| x.id == 2} the_rest = everything.reject {|x| x.id != 2}

That's going to iterate over all of the records. twice! in interpreted code!

Any thoughts, ideas, or snide remarks?

--wpd

Frederick_Cheung · September 28, 2009, 9:42pm

I will end up leaving this the way it is (most likely), especially since this isn't the right phase of development to be worrying about optimization, but I am curious how one might do this most efficiently.

Is it most efficient to grab 1 record where record.id = blah and then all the (rest of the) records where record.id <> blah?

Is it more efficient to grab all the records at once and write some ruby code to select the one record from the rest? If so, what would that code look like? I don't like this:

Try both, benchmark them - don't take some random person on the internet's word for it!

everything = Part.all the_one = everything.select {|x| x.id == 2} the_rest = everything.reject {|x| x.id != 2}

That's going to iterate over all of the records. twice! in interpreted code!

partition will do it in one

Fred

11175 · September 28, 2009, 10:07pm

Patrick Doyle wrote:

I have the following:

@lot = Lot.find(params[:id])

part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id])

I guess I should mention that

Lot :belongs_to => :part

I was looking at the log following the execution of these two statements and I saw something like this:

Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13) Part Load (0.3ms) SELECT * FROM "parts" WHERE ("parts"."id" = 2) Part Load (0.9ms) SELECT * FROM "parts" WHERE (id <> 2)

It looked a bit silly to me -- first I grab a record from the "parts" table with an ID of 2, then I grab all the records from the parts table whose ID is not 2.

I think what's happening here is this: you're calling @lot.part.id, so Rails needs to load @lot.part, which accounts for the extra query. To fix, use the :joins option on Lot.find, or simply call @lot.part_id.

I played around a little with :include clauses, thinking that I should, at least, be able to fetch the record from the "lots" table and the "parts" table simultaneously (with something like a joins clause and a "lots.part_id = parts.id" WHERE clause), but didn't get anywhere with that.

:includes is useless here. :joins will do the trick.

I will end up leaving this the way it is (most likely), especially since this isn't the right phase of development to be worrying about optimization, but I am curious how one might do this most efficiently.

Is it most efficient to grab 1 record where record.id = blah and then all the (rest of the) records where record.id <> blah?

Is it more efficient to grab all the records at once and write some ruby code to select the one record from the rest?

That's what I think I'd do.

If so, what would that code look like? I don't like this:

everything = Part.all the_one = everything.select {|x| x.id == 2} the_rest = everything.reject {|x| x.id != 2}

That's going to iterate over all of the records. twice! in interpreted code!

So just iterate once, and test for the special value as you go. If you can't do that, then make 2 DB queries.

Any thoughts, ideas, or snide remarks?

--wpd

Best,

11175 · September 28, 2009, 10:18pm

Frederick Cheung wrote: [...]

That's going to iterate over all of the records. �twice! �in interpreted code!

partition will do it in one

Hey, that's good to know!

If you need to do the partitioning in the DB, you could use ORDER BY abs(id - :magic_id). That way the first record will be the magic one.

Fred

Best,

Patrick_Doyle · September 29, 2009, 12:27pm

partition will do it in one

Thanks Fred, that's exactly what I was looking for.

Try both, benchmark them - don't take some random person on the internet's word for it!

Ahh, but you are not "some random person on the internet". You are an active member of this community who regularly answers questions for newbies and unfortunates who can be too lazy to look things up themselves. As such, I have come to appreciate and respect your answers, especially the ones that say, "the method for which you are looking is #partition, but it's up to you to determine if it makes things go faster or not"

Thanks again for all the help.

--wpd

Patrick_Doyle · September 29, 2009, 12:30pm

Oh that's clever! I like that!

Thanks.

--wpd

Patrick_Doyle · September 29, 2009, 4:29pm

Patrick Doyle wrote:

I have the following:

@lot = Lot.find(params[:id])

part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id])

I guess I should mention that

Lot :belongs_to => :part

I was looking at the log following the execution of these two statements and I saw something like this:

Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13) Part Load (0.3ms) SELECT * FROM "parts" WHERE ("parts"."id" = 2) Part Load (0.9ms) SELECT * FROM "parts" WHERE (id <> 2)

It looked a bit silly to me -- first I grab a record from the "parts" table with an ID of 2, then I grab all the records from the parts table whose ID is not 2.

I think what's happening here is this: you're calling @lot.part.id, so Rails needs to load @lot.part, which accounts for the extra query. To fix, use the :joins option on Lot.find, or simply call @lot.part_id.

I should let this be, but now I'm curious... what does :joins actually do for me?

(in script/console)

l=Lot.find(13, :joins => :part)

produces (in the logfile)

SELECT "lots".* FROM "lots" INNER JOIN "parts" ON "parts".id = "lots".part_id WHERE ("lots"."id" = 13)

and that looks like what I wanted. But when I follow that up with

l.part

I see in the logfile:

SELECT * from "parts" WHERE ("parts"."id" = 2)

so it seems that the inner join was wasted. Is this a development vs. production thing? (I know, I'm not supposed to worry about optimization now, but I figure if I learn how to write generally more optimal code by default, it will only help).

If I try:

l=Lot.find(13, :include => :part)

I see in the logfile:

SELECT * FROM "lots" WHERE ("lots"."id" = 13) SELECT * FROM "parts" WHERE ("parts".id" = 2)

so neither the :joins nor the :include options seem to help, although the :include option prevents the subsequent database access when I finally access l.part

Again, this is not stopping me from anything, and I really should leave the optimization phase for later, but I am curious about the intended behavior of :include and :joins, and puzzled by the apparent behavior.

--wpd