ActiveRecord Select

If I do this:

x = Client.select("DISTINCT(client_category)")

Then I get an array of relations. Instead, what I need is the entire row (model instance attributes) for each of the returned relations. What is the syntax to obtain the complete records?

I cannot find a discussion of the select method in the current AR API (ActiveRecord::Base) and so I am having some difficulty in turning this into something I can use.

Queries don’t really work that way. You can’t really select the full record when using distinct client_category.

Lets say you have the following rows:

client_category, client_name Cat1, Foo Cat1, Bar Cat2, Other

If you are selecting distinct client_category, and want the full row, which of the two “Cat1” rows should it return?

Tim Shaffer wrote in post #975719:

Queries don't really work that way. You can't really select the full record when using distinct client_category.

Lets say you have the following rows:

client_category, client_name Cat1, Foo Cat1, Bar Cat2, Other

If you are selecting distinct client_category, and want the full row, which of the two "Cat1" rows should it return?

What I want to accomplish is to obtain one example of each value and the entire row that contains it. How does one do that? I do not care which row is returned but I only want one row for each distinct value.

Why? If you do not care which row is returned then presumably you do not care what is in the rest of the row so why do you need it?

Colin

Colin Law wrote in post #975727:

Colin Law wrote in post #975727:

If you are selecting distinct client_category, and want the full row, which of the two "Cat1" rows should it return?

What I want to accomplish is to obtain one example of each value and the entire row that contains it. How does one do that? I do not care which row is returned but I only want one row for each distinct value.

Why? If you do not care which row is returned then presumably you do not care what is in the rest of the row so why do you need it?

Colin

Because I want one representative row for each category.

I am interested to know why, when you do not care what the rest of the row contains.

Colin

Colin Law wrote in post #975760:

I am interested to know why, when you do not care what the rest of the row contains.

I have answered that question. I just want to quickly obtain one representative row from each category. I can get the same result by iterating over an array of possible values and doing a find_by_category(x).first on each. I would prefer a less cumbersome approach.

Do you know how to accomplish what I desire? If so then I would appreciate it if you would tell me.

No I do not. I was hoping that if I understood in detail why you want this that I could suggest an alternative approach. I still do not understand why you might want one representative row from each category when you do not care what the rest of the row contains.

Colin

James Byrne wrote in post #975768:

Colin Law wrote in post #975760:

I am interested to know why, when you do not care what the rest of the row contains.

I have answered that question. I just want to quickly obtain one representative row from each category. I can get the same result by iterating over an array of possible values and doing a find_by_category(x).first on each. I would prefer a less cumbersome approach.

Do you know how to accomplish what I desire? If so then I would appreciate it if you would tell me.

This is actually a common issue with Object-Relational-Mapping (ORM), such as ActiveRecord. Each instance of an ActiveRecord subclass represent a specific row in a database table. Objects have identity, and in an ORM system identity is maintained by mapping the database row's primary key to a specific ActiveRecord subclass instance.

What you have presented here is a SQL problem that doesn't not map easily into an ORM system. This is what others have been attempting implying. What you have presented here is a SQL problem I'd suggest using SQL to solve it. You should be able to run any arbitrary SQL you want. All you need is the database connection and execute your SQL.

I'm not going to go out of my way to try to figure out the required SQL to accomplish your goal. I don't know of a way to solve your problem through any ORM. You need to drop down to the SQL level to solve it if it's really something your design requires.

This requirement in itself raises questions about your design in my mind, but it's your project and you're closer to the problem domain.

Robert Walker wrote in post #975774:

This requirement in itself raises questions about your design in my mind, but it's your project and you're closer to the problem domain.

This is not part of any design. I need a small sample of representative test data from a live data set such that each row belongs to a different category and all active categories are represented. Period.

If it is for testing you might be better using Machinist or Factory Girl to generate the data. Then you can generate exactly the data required to test each aspect of your app.

Colin

I need a small sample of representative test data from a live data set such that each row belongs to a different category and all active categories are represented. Period. Now that we know your requirements, we can recommend a way to do it. Selecting distinct is not the solution to the requirements you posed.

Here is an example of code that would fit your needs:

records = Category.all.each do |category| records << category.clients.first end

Tim Shaffer wrote in post #975786:

Here is an example of code that would fit your needs:

records = Category.all.each do |category|   records << category.clients.first end

Yes, that is what I ended up doing, more or less. What I did was this:

cset = Client.select("DISTINCT(client_category)").each do |c| cset << Client.find_by_client_category(c.client_category) end

However, since I was in the console when I ran into this situation I was hoping that there existed a more direct way of accomplishing it ( in other words on a single line ) and that I was simply ignorant of the method. Thus my question.

Thank all of you for your comments and help.

Colin Law wrote in post #975784:

... This is not part of any design. I need a small sample of representative test data from a live data set such that each row belongs to a different category and all active categories are represented. Period.

If it is for testing you might be better using Machinist or Factory Girl to generate the data. Then you can generate exactly the data required to test each aspect of your app.

Even with a direct solution to your problem Colin, suggestion is still quite valid. Testing using a live sampling of data can give a false sense of confidence. The better approach to testing is to use a factory to manufacture data specifically designed to test some aspect of your model.

Testing should be automated, and exist to prevent the possibility of invalid data from ever making it into the database. Not to validate existing data that is already assumed to be valid.

To solve it in SQL you would (probably) need sub-queries or unions to join distinct rows for each type of category... all very messy (as has been hinted).

Comparing your solution to Tim's suggestion, I infer that you don't have "client_category" as an association? (ie: client belongs_to :category). You could use the "group_by" method for collections, and select the first off each group - that saves you doing a query per client, at the cost of doing one large query and a load of Ruby iteration.

  Client.all.group_by(&:client_category).map(&:first)

... it's also easily chainable to write on one line (but you could compress your own solution to one line with .inject instead of .select - so it's not exactly much different my way, but possibly a little more legible?)

At the end of the day, you're in the realms of doing some fudging to get the data out you're after (as you've discovered), as you're not doing something that's a natural fit for the DB or ORM. Sorry if that's not massively helpful... just letting you know we feel your pain.

Michael Pavling wrote in post #975879:

Comparing your solution to Tim's suggestion, I infer that you don't have "client_category" as an association? (ie: client belongs_to :category).

No, we do not have any association by that name. The data originates off site and we do not know what client_categories are even possible or what the codes mean.

You could use the "group_by" method for collections, and select the first off each group - that saves you doing a query per client, at the cost of doing one large query and a load of Ruby iteration.

  Client.all.group_by(&:client_category).map(&:first)

Could you explain the '&:' idiom to me? I cannot seem to find any examples of it by googling.

At the end of the day, you're in the realms of doing some fudging to get the data out you're after (as you've discovered), as you're not doing something that's a natural fit for the DB or ORM. Sorry if that's not massively helpful... just letting you know we feel your pain.

Thanks. Misery shared is misery lessened.

Michael Pavling wrote in post #975879:

> You could use the "group_by" method for collections, and select the > first off each group - that saves you doing a query per client, at the > cost of doing one large query and a load of Ruby iteration.

> Client.all.group_by(&:client_category).map(&:first)

Could you explain the '&:' idiom to me? I cannot seem to find any examples of it by googling.

foo(&:bar) is basically short hand for foo {|f| f.bar}. You'll probably get more google hits by searching for Sym to_proc

Fred

Frederick Cheung wrote in post #975887:

foo(&:bar) is basically short hand for foo {|f| f.bar}. You'll probably get more google hits by searching for Sym to_proc

Fred

Thanks. I indeed did have more success with 'Sym to proc'. Of course the bar was set rather low given I found nothing the other way.

BTW I realized my original desire with a little ( actually a great deal of ) help from the PostgreSQL list. The following works like a champ, but only for PostgreSQL as far as I know.

> cset = Client.select("DISTINCT ON(client_category) *") => [ . . . > > cset.size => 6 > cset[0] => #<Client id: 587, dollar_value_total: #<BigDecimal:2b1ebb23a910,'0.974E3',9(18)>, dollar_value_currency_code: "CAD", expected_at: "2010-07-19 04:00:00", expected_to: "133", client_category: "0492", office_code_for_unlading: " ", service_priority: " ", service_option_identifier: "00257", . . ., lock_version: 0>

The DISTINCT ON extension might be of some interest to those working in PostgreSQl.

James Byrne wrote in post #975885:

Could you explain the '&:' idiom to me? I cannot seem to find any examples of it by googling.

These two posts were helpful for me: http://jlaine.net/2008/5/8/amp-lified http://swaggadocio.com/post/287689063

P.S. The example JLaine uses to illustrate implicit conversion is incorrect, the rest of the article is fine though.