What is the most efficient way to split a table into 2 groups?

I have the following:

@lot = Lot.find(params[:id])

part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id])

I guess I should mention that

Lot :belongs_to => :part

I was looking at the log following the execution of these two
statements and I saw something like this:

Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13)
Part Load (0.3ms) SELECT * FROM "parts" WHERE ("parts"."id" = 2)
Part Load (0.9ms) SELECT * FROM "parts" WHERE (id <> 2)

It looked a bit silly to me -- first I grab a record from the "parts"
table with an ID of 2, then I grab all the records from the parts
table whose ID is not 2.

I played around a little with :include clauses, thinking that I
should, at least, be able to fetch the record from the "lots" table
and the "parts" table simultaneously (with something like a joins
clause and a "lots.part_id = parts.id" WHERE clause), but didn't get
anywhere with that.

I will end up leaving this the way it is (most likely), especially
since this isn't the right phase of development to be worrying about
optimization, but I am curious how one might do this most efficiently.

Is it most efficient to grab 1 record where record.id = blah and then
all the (rest of the) records where record.id <> blah?

Is it more efficient to grab all the records at once and write some
ruby code to select the one record from the rest? If so, what would
that code look like? I don't like this:

everything = Part.all
the_one = everything.select {|x| x.id == 2}
the_rest = everything.reject {|x| x.id != 2}

That's going to iterate over all of the records. twice! in interpreted code!

Any thoughts, ideas, or snide remarks?

--wpd

I will end up leaving this the way it is (most likely), especially
since this isn't the right phase of development to be worrying about
optimization, but I am curious how one might do this most efficiently.

Is it most efficient to grab 1 record where record.id = blah and then
all the (rest of the) records where record.id <> blah?

Is it more efficient to grab all the records at once and write some
ruby code to select the one record from the rest? If so, what would
that code look like? I don't like this:

Try both, benchmark them - don't take some random person on the
internet's word for it!

everything = Part.all
the_one = everything.select {|x| x.id == 2}
the_rest = everything.reject {|x| x.id != 2}

That's going to iterate over all of the records. twice! in interpreted code!

partition will do it in one

Fred

Patrick Doyle wrote:

I have the following:

@lot = Lot.find(params[:id])

part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id])

I guess I should mention that

Lot :belongs_to => :part

I was looking at the log following the execution of these two
statements and I saw something like this:

Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13)
Part Load (0.3ms) SELECT * FROM "parts" WHERE ("parts"."id" = 2)
Part Load (0.9ms) SELECT * FROM "parts" WHERE (id <> 2)

It looked a bit silly to me -- first I grab a record from the "parts"
table with an ID of 2, then I grab all the records from the parts
table whose ID is not 2.

I think what's happening here is this: you're calling @lot.part.id, so
Rails needs to load @lot.part, which accounts for the extra query. To
fix, use the :joins option on Lot.find, or simply call @lot.part_id.

I played around a little with :include clauses, thinking that I
should, at least, be able to fetch the record from the "lots" table
and the "parts" table simultaneously (with something like a joins
clause and a "lots.part_id = parts.id" WHERE clause), but didn't get
anywhere with that.

:includes is useless here. :joins will do the trick.

I will end up leaving this the way it is (most likely), especially
since this isn't the right phase of development to be worrying about
optimization, but I am curious how one might do this most efficiently.

Is it most efficient to grab 1 record where record.id = blah and then
all the (rest of the) records where record.id <> blah?

Is it more efficient to grab all the records at once and write some
ruby code to select the one record from the rest?

That's what I think I'd do.

If so, what would
that code look like? I don't like this:

everything = Part.all
the_one = everything.select {|x| x.id == 2}
the_rest = everything.reject {|x| x.id != 2}

That's going to iterate over all of the records. twice! in interpreted
code!

So just iterate once, and test for the special value as you go. If you
can't do that, then make 2 DB queries.

Any thoughts, ideas, or snide remarks?

--wpd

Best,

Frederick Cheung wrote:
[...]

That's going to iterate over all of the records. �twice! �in interpreted code!

partition will do it in one

Hey, that's good to know!

If you need to do the partitioning in the DB, you could use ORDER BY
abs(id - :magic_id). That way the first record will be the magic one.

Fred

Best,

partition will do it in one

Thanks Fred, that's exactly what I was looking for.

Try both, benchmark them - don't take some random person on the
internet's word for it!

Ahh, but you are not "some random person on the internet". You are an
active member of this community who regularly answers questions for
newbies and unfortunates who can be too lazy to look things up
themselves. As such, I have come to appreciate and respect your
answers, especially the ones that say, "the method for which you are
looking is #partition, but it's up to you to determine if it makes
things go faster or not" :slight_smile:

Thanks again for all the help.

--wpd

Oh that's clever! I like that!

Thanks.

--wpd

Patrick Doyle wrote:

I have the following:

@lot = Lot.find(params[:id])

part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id])

I guess I should mention that

Lot :belongs_to => :part

I was looking at the log following the execution of these two
statements and I saw something like this:

Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13)
Part Load (0.3ms) SELECT * FROM "parts" WHERE ("parts"."id" = 2)
Part Load (0.9ms) SELECT * FROM "parts" WHERE (id <> 2)

It looked a bit silly to me -- first I grab a record from the "parts"
table with an ID of 2, then I grab all the records from the parts
table whose ID is not 2.

I think what's happening here is this: you're calling @lot.part.id, so
Rails needs to load @lot.part, which accounts for the extra query. To
fix, use the :joins option on Lot.find, or simply call @lot.part_id.

I should let this be, but now I'm curious... what does :joins actually
do for me?

(in script/console)

l=Lot.find(13, :joins => :part)

produces (in the logfile)

SELECT "lots".* FROM "lots" INNER JOIN "parts" ON "parts".id =
"lots".part_id WHERE ("lots"."id" = 13)

and that looks like what I wanted. But when I follow that up with

l.part

I see in the logfile:

SELECT * from "parts" WHERE ("parts"."id" = 2)

so it seems that the inner join was wasted. Is this a development vs.
production thing? (I know, I'm not supposed to worry about
optimization now, but I figure if I learn how to write generally more
optimal code by default, it will only help).

If I try:

l=Lot.find(13, :include => :part)

I see in the logfile:

SELECT * FROM "lots" WHERE ("lots"."id" = 13)
SELECT * FROM "parts" WHERE ("parts".id" = 2)

so neither the :joins nor the :include options seem to help, although
the :include option prevents the subsequent database access when I
finally access l.part

Again, this is not stopping me from anything, and I really should
leave the optimization phase for later, but I am curious about the
intended behavior of :include and :joins, and puzzled by the apparent
behavior.

--wpd