ActiveRecord Select

If I do this:

x = Client.select("DISTINCT(client_category)")

Then I get an array of relations. Instead, what I need is the entire
row (model instance attributes) for each of the returned relations.
What is the syntax to obtain the complete records?

I cannot find a discussion of the select method in the current AR API
(http://api.rubyonrails.org/classes/ActiveRecord/Base.html) and so I am
having some difficulty in turning this into something I can use.

Queries don’t really work that way. You can’t really select the full record when using distinct client_category.

Lets say you have the following rows:

client_category, client_name
Cat1, Foo
Cat1, Bar
Cat2, Other

If you are selecting distinct client_category, and want the full row, which of the two “Cat1” rows should it return?

Tim Shaffer wrote in post #975719:

Queries don't really work that way. You can't really select the full
record when using distinct client_category.

Lets say you have the following rows:

client_category, client_name
Cat1, Foo
Cat1, Bar
Cat2, Other

If you are selecting distinct client_category, and want the full row,
which
of the two "Cat1" rows should it return?

What I want to accomplish is to obtain one example of each value and the
entire row that contains it. How does one do that? I do not care which
row is returned but I only want one row for each distinct value.

Why? If you do not care which row is returned then presumably you do
not care what is in the rest of the row so why do you need it?

Colin

Colin Law wrote in post #975727:

Colin Law wrote in post #975727:

If you are selecting distinct client_category, and want the full row,
which
of the two "Cat1" rows should it return?

What I want to accomplish is to obtain one example of each value and the
entire row that contains it. How does one do that? I do not care which
row is returned but I only want one row for each distinct value.

Why? If you do not care which row is returned then presumably you do
not care what is in the rest of the row so why do you need it?

Colin

Because I want one representative row for each category.

I am interested to know why, when you do not care what the rest of the
row contains.

Colin

Colin Law wrote in post #975760:

I am interested to know why, when you do not care what the rest of the
row contains.

I have answered that question. I just want to quickly obtain one
representative row from each category. I can get the same result by
iterating over an array of possible values and doing a
find_by_category(x).first on each. I would prefer a less cumbersome
approach.

Do you know how to accomplish what I desire? If so then I would
appreciate it if you would tell me.

No I do not. I was hoping that if I understood in detail why you want
this that I could suggest an alternative approach. I still do not
understand why you might want one representative row from each
category when you do not care what the rest of the row contains.

Colin

James Byrne wrote in post #975768:

Colin Law wrote in post #975760:

I am interested to know why, when you do not care what the rest of the
row contains.

I have answered that question. I just want to quickly obtain one
representative row from each category. I can get the same result by
iterating over an array of possible values and doing a
find_by_category(x).first on each. I would prefer a less cumbersome
approach.

Do you know how to accomplish what I desire? If so then I would
appreciate it if you would tell me.

This is actually a common issue with Object-Relational-Mapping (ORM),
such as ActiveRecord. Each instance of an ActiveRecord subclass
represent a specific row in a database table. Objects have identity, and
in an ORM system identity is maintained by mapping the database row's
primary key to a specific ActiveRecord subclass instance.

What you have presented here is a SQL problem that doesn't not map
easily into an ORM system. This is what others have been attempting
implying. What you have presented here is a SQL problem I'd suggest
using SQL to solve it. You should be able to run any arbitrary SQL you
want. All you need is the database connection and execute your SQL.

I'm not going to go out of my way to try to figure out the required SQL
to accomplish your goal. I don't know of a way to solve your problem
through any ORM. You need to drop down to the SQL level to solve it if
it's really something your design requires.

This requirement in itself raises questions about your design in my
mind, but it's your project and you're closer to the problem domain.

Robert Walker wrote in post #975774:

This requirement in itself raises questions about your design in my
mind, but it's your project and you're closer to the problem domain.

This is not part of any design. I need a small sample of representative
test data from a live data set such that each row belongs to a different
category and all active categories are represented. Period.

If it is for testing you might be better using Machinist or Factory
Girl to generate the data. Then you can generate exactly the data
required to test each aspect of your app.

Colin

I need a small sample of representative test data from a live data set such that each row belongs to a different category and all active categories are represented. Period.
Now that we know your requirements, we can recommend a way to do it. Selecting distinct is not the solution to the requirements you posed.

Here is an example of code that would fit your needs:

records = []
Category.all.each do |category|
records << category.clients.first
end

Tim Shaffer wrote in post #975786:

Here is an example of code that would fit your needs:

records = []
Category.all.each do |category|
  records << category.clients.first
end

Yes, that is what I ended up doing, more or less. What I did was this:

cset = []
Client.select("DISTINCT(client_category)").each do |c|
cset << Client.find_by_client_category(c.client_category)
end

However, since I was in the console when I ran into this situation I was
hoping that there existed a more direct way of accomplishing it ( in
other words on a single line ) and that I was simply ignorant of the
method. Thus my question.

Thank all of you for your comments and help.

Colin Law wrote in post #975784:

...
This is not part of any design. I need a small sample of representative
test data from a live data set such that each row belongs to a different
category and all active categories are represented. Period.

If it is for testing you might be better using Machinist or Factory
Girl to generate the data. Then you can generate exactly the data
required to test each aspect of your app.

Even with a direct solution to your problem Colin, suggestion is still
quite valid. Testing using a live sampling of data can give a false
sense of confidence. The better approach to testing is to use a factory
to manufacture data specifically designed to test some aspect of your
model.

Testing should be automated, and exist to prevent the possibility of
invalid data from ever making it into the database. Not to validate
existing data that is already assumed to be valid.

To solve it in SQL you would (probably) need sub-queries or unions to
join distinct rows for each type of category... all very messy (as has
been hinted).

Comparing your solution to Tim's suggestion, I infer that you don't
have "client_category" as an association? (ie: client belongs_to
:category).
You could use the "group_by" method for collections, and select the
first off each group - that saves you doing a query per client, at the
cost of doing one large query and a load of Ruby iteration.

  Client.all.group_by(&:client_category).map(&:first)

... it's also easily chainable to write on one line (but you could
compress your own solution to one line with .inject instead of .select
- so it's not exactly much different my way, but possibly a little
more legible?)

At the end of the day, you're in the realms of doing some fudging to
get the data out you're after (as you've discovered), as you're not
doing something that's a natural fit for the DB or ORM. Sorry if
that's not massively helpful... just letting you know we feel your
pain.

Michael Pavling wrote in post #975879:

Comparing your solution to Tim's suggestion, I infer that you don't
have "client_category" as an association? (ie: client belongs_to
:category).

No, we do not have any association by that name. The data originates off
site and we do not know what client_categories are even possible or what
the codes mean.

You could use the "group_by" method for collections, and select the
first off each group - that saves you doing a query per client, at the
cost of doing one large query and a load of Ruby iteration.

  Client.all.group_by(&:client_category).map(&:first)

Could you explain the '&:' idiom to me? I cannot seem to find any
examples of it by googling.

At the end of the day, you're in the realms of doing some fudging to
get the data out you're after (as you've discovered), as you're not
doing something that's a natural fit for the DB or ORM. Sorry if
that's not massively helpful... just letting you know we feel your
pain.

Thanks. Misery shared is misery lessened.

Michael Pavling wrote in post #975879:

> You could use the "group_by" method for collections, and select the
> first off each group - that saves you doing a query per client, at the
> cost of doing one large query and a load of Ruby iteration.

> Client.all.group_by(&:client_category).map(&:first)

Could you explain the '&:' idiom to me? I cannot seem to find any
examples of it by googling.

foo(&:bar) is basically short hand for foo {|f| f.bar}. You'll
probably get more google hits by searching for Sym to_proc

Fred

Frederick Cheung wrote in post #975887:

foo(&:bar) is basically short hand for foo {|f| f.bar}. You'll
probably get more google hits by searching for Sym to_proc

Fred

Thanks. I indeed did have more success with 'Sym to proc'. Of course
the bar was set rather low given I found nothing the other way.

BTW I realized my original desire with a little ( actually a great deal
of ) help from the PostgreSQL list. The following works like a champ,
but only for PostgreSQL as far as I know.

> cset = Client.select("DISTINCT ON(client_category) *")
=> [ . . .
>
> cset.size
=> 6
> cset[0]
=> #<Client id: 587, dollar_value_total:
#<BigDecimal:2b1ebb23a910,'0.974E3',9(18)>, dollar_value_currency_code:
"CAD", expected_at: "2010-07-19 04:00:00", expected_to: "133",
client_category: "0492", office_code_for_unlading: " ",
service_priority: " ", service_option_identifier: "00257", . . .,
lock_version: 0>

The DISTINCT ON extension might be of some interest to those working in
PostgreSQl.

James Byrne wrote in post #975885:

Could you explain the '&:' idiom to me? I cannot seem to find any
examples of it by googling.

These two posts were helpful for me:
http://jlaine.net/2008/5/8/amp-lified
http://swaggadocio.com/post/287689063

P.S. The example JLaine uses to illustrate implicit conversion is
incorrect, the rest of the article is fine though.