Negative find : trying to find what is NOT there :-)

Hi,

Sorry about the confusing title. I'm feeling a bit frustrated about Ruby now..

Ok, I have a membership database appication. For one view, I would like to show all members who has NOT payed for the current term :slight_smile:

So, with the models member and fee_payment, (one member has many fee_payments) I try to get the exklusion by finding all members, finding all fee_payments for the current term, and exclude the members associated with a fee_payment in the fee_payments list from the list of members => the members who has not payed.

So, this is what I tried:

    @maybe_active_members = @club.members.find(:all,:conditions => ["is_active = ?",true ], :order => "surname" )

    @term = "VT2008" #Hard-coded term

   #Find all mayments from the current term

    @fps = FeePayment.find(:all, :conditions => ["term = ? AND club_id = ?", @term , @club])     #Get an array of the member associated with the fee_payment     @fpm = @fps.each {|f | f.member }

   #Exlude members who has payed from the list     @active_members = @maybe_active_members.reject {|m | @fpm.include?(m) }

Arrgh! What am I doing wrong? Is there a simpler way?

NOT very good with this (or any other) stuff..

/Fredrik

Sorry about the confusing title. I'm feeling a bit frustrated about Ruby now..

Ok, I have a membership database appication. For one view, I would like to show all members who has NOT payed for the current term :slight_smile:

So, with the models member and fee_payment, (one member has many fee_payments) I try to get the exklusion by finding all members, finding all fee_payments for the current term, and exclude the members associated with a fee_payment in the fee_payments list from the list of members => the members who has not payed.

So, this is what I tried:

   @maybe_active_members = @club.members.find(:all,:conditions => ["is_active = ?",true ], :order => "surname" )

   @term = "VT2008" #Hard-coded term

  #Find all mayments from the current term

   @fps = FeePayment.find(:all, :conditions => ["term = ? AND club_id = ?", @term , @club])    #Get an array of the member associated with the fee_payment    @fpm = @fps.each {|f | f.member }

  #Exlude members who has payed from the list    @active_members = @maybe_active_members.reject {|m | @fpm.include?(m) }

Arrgh! What am I doing wrong? Is there a simpler way?

NOT very good with this (or any other) stuff..

You want an "outer join" query. Join (SQL) - Wikipedia

You want to return all the members and their payment info for that term. If no payment, return blank data. Then filter on that.

It's doable with AR by using :include and :conditions appropriately.

-philip

Hi,

Thank you for your quick relply. Ok, I need an OUTER JOIN.. but how do I do one in rails? If I do:

@out = @club.members.find(:all,:conditions => [" term = ?" ,"VT2008" ], :include => :fee_payments)

I get the number of FeePayments correctly.. however, If I do

@out = @club.members.find(:all,:conditions => [" term != ?" ,"VT2008" ], :include => :fee_payments)

I get only 1 element (where there should be > 200 of them. Hmm.. how come?

/Fredrik

Thank you for your quick relply. Ok, I need an OUTER JOIN.. but how do I do one in rails? If I do:

@out = @club.members.find(:all,:conditions => [" term = ?" ,"VT2008" ], :include => :fee_payments)

You want this one and then filter the result only looking at cases where the fee_payment is nil.

What's the SQL in your dev log for this?

Hi!

Thanks for all your help. This became my solution to the problem (could not get outer joins to work):

    @maybe_active_members = @club.members.find(:all,:conditions => ["is_active = ?",true], :order => "surname" )

    @has_payed = @club.members.find(:all,:conditions => ["is_active = ? AND term = ?",true, @term ], :include => "fee_payments", :order => "surname" )

    @active_members = @maybe_active_members.reject {|m | @has_payed.include?(m) }

Seems to work (unless anyone sees a problem with the code..)

The SQL I got was

Member Load Including Associations (0.009988) SELECT members."id" AS t0_r0, members."given_name" AS t0_r1, members."surname" AS t0_r2, members."date_of_birth" AS t0_r3, members."gender" AS t0_r4, members."primary_adress" AS t0_r5, members."primary_zip" AS t0_r6, members."primary_city" AS t0_r7, members."primary_email" AS t0_r8, members."primary_telephone" AS t0_r9, members."primary_mobile" AS t0_r10, members."secondary_adress" AS t0_r11, members."secondary_zip" AS t0_r12, members."secondary_city" AS t0_r13, members."secondary_email" AS t0_r14, members."secondary_telephone" AS t0_r15, members."secondary_mobile" AS t0_r16, members."is_active" AS t0_r17, members."club_id" AS t0_r18, members."created_at" AS t0_r19, members."updated_at" AS t0_r20, fee_payments."id" AS t1_r0, fee_payments."member_id" AS t1_r1, fee_payments."club_id" AS t1_r2, fee_payments."term" AS t1_r3, fee_payments."created_at" AS t1_r4, fee_payments."updated_at" AS t1_r5 FROM members LEFT OUTER JOIN fee_payments ON fee_payments.member_id = members.id WHERE (members.club_id = 1 AND (is_active = 't' AND term = 'VT2008')) ORDER BY surname

/Fredrik

you should try to avoid hitting ruby to strip out the unnecessary records. Your database is streamlined to execute these types of queries, so try to use it before falling back on code.

In one of my apps, I've got users and orders. To display all the users who have not placed an order, I can use something like the following:

select id from users where not exists (select * from orders where user_id = users.id);

or

select id from users where id not in (select user_id from orders);

You'll have to use find_by_sql to use the above queries, but like I said, it's much faster than using code to iterate and reject your returned rows. Especially since if you look at your code, you're iterating over each element of @maybe_active_members (N elements in total), and for each of those elements, you're scanning through another list of N elements, in the array @has_payed. So you're executing a O(n^2) algorithm every time this code is run, which is bad, not to mention that you're running two sql queries before hand.

You may hear others say that premature optimization is wrong, and I agree, but in this case, I think it's important to understand the difference between tuning your query using sql (preferred) or just using brute force through ruby (last resort). So even if you don't make this change now, at least you'll know that in the future when you have hundreds of thousands of users, that this will potentially be a bottleneck.

Adam

Alternatively, using an ANSI style join:

select users.id from users left join orders on orders.user_id = users.id where orders.user_id is null

Additionally, you could abstract your hand crafted query to RoR through a database view or such. That way RoR doesn't need to know about your SQL and your SQL doesn't need to be embedded in RoR.

The best of both world really :slight_smile:

Hi,

I agree. I would like to reduce the DB access to one and perhaps to it in SQL instead. However, right now, I don't know what kind of DB I will be running this on. Also, for my application, the n woud reastivally be about 100. Did some simulations with generated members, and sorting 449 from 450 does not seem to be a problem. Even in sqlite on a tiny Macbook :slight_smile: So, I wil make a note of the possibility of focusing on this point when optimization is needed.. but continuing on to complete the app instead. :slight_smile: Database independence seems to be more important right now.

But, Thanks ! I just love the quality of feedback I get on this list.

/Fredrik

Out of curiosity, does your project contains any database specific DDL or DML? Perhaps some create table, index, partition, constrain, sequence, analytic, statistic, or such?

Or is it fully driven end-to-end by RoR's ORM? E.g. all the database artifacts, as well as the database itself is generated, managed by RoR, from cradle-to-grave?

If your project already have database specific artifacts, it is but a small step to re-factor some of the more advance DML away from RoR and expose them back to RoR in an ORM friendly way such as a view or cursor or whatnot.

After all, what's the point of an ORM but to shield you away from common SQL trivialities?

On the other hand, if an ORM starts mimicking most of SQL peculiarities, perhaps something has gone terribly wrong along the way :slight_smile:

Hi,

Acctualy, I have been able to not use any database specific structures (so far, knock on wood) and only use the fascilities that RoR ORM and migrations provides me with. I have donw simulations using 1.5 as many simutaneous objects as I could possible think would be used at the same time using this app, and there is a slight lag. However, since these simlations then involve a data set that is 8 times the size of the data set that would currently be handled :slight_smile: I will, I think, opt out of any early database specific optimizations for the time beeing.

For instance, adding storage in the parent model would easilly bring it back to O(n) rather than O(n*m), but make the app less beautiful :slight_smile:

/Fredrik

Fredrik Karlsson wrote:

Ok, I have a membership database appication. For one view, I would like to show all members who has NOT payed for the current term :slight_smile:

The discussion in this thread seems pretty useful, but here's how I tackled it using AR as much as possible.

http://blog.hasmanythrough.com/2006/9/9/finding-unassociated-objects