Objects with versions: how to avoid 1 + n queries?

Firstly, I've asked a similar question before[*]. I didn't get any
answers back then, maybe now I have better luck

Let's say I have models like this

  class Article < ActiveRecord::Base
    has_many :versions
  end

  class Version < ActiveRecord::Base
    belongs_to :article
    default_scope order('updated_at')
    scope :published, where(:state => 'published')
    validates :state, :inclusion => { :in => ['draft', 'published'] }
    validates :title, :content, :presence => true
  end

It's easy to find the latest version for an article, just

  article.versions.last

The same for published versions isn't much more complicated

  articles.versions.published.last

Of course, I'm not only dealing with single articles

  Article.all

Then, in a articles/_article.html.erb

  <%= article.version.last.title %>

Oops! That triggers another database access for each article.

Now, I know how to deal with this in SQL, with either a correlated sub-
select or an even more complicated left outer join. I've forced the
first technique into ActiveRecord wrapping, which isn't fun. Far from
being any help. ARec gets in the way. I've dabbled with ARel which was
even worse, but may be due to my inexperience with it.

Currently, the nicest solution I can think of is to accept the SQL and
go native: i.e., define a view in the database and a read-only model on
top of it.

I appreciate any suggestions how to do this *elegantly* in ARec.

Michael

[*] 2010-06-20, ActiveRecord and ARel: correlated subqueries?

It's hard to answer your post, as your only question is in the
subject. Everything else you typed was statement, so I can't be sure
what bit you want addressed. But I'll assume (and please correct me if
I'm wrong):

Does "include" help your problem?

@articles = Article.all(:include => :versions)

(this would go in whatever controller action has a view that renders
the articles/_article.html.erb partial with a collection @articles)

Michael Schuerig wrote:

Firstly, I've asked a similar question before[*]. I didn't get any
answers back then, maybe now I have better luck

Let's say I have models like this

  class Article < ActiveRecord::Base
    has_many :versions
  end

  class Version < ActiveRecord::Base
    belongs_to :article
    default_scope order('updated_at')
    scope :published, where(:state => 'published')
    validates :state, :inclusion => { :in => ['draft', 'published'] }
    validates :title, :content, :presence => true
  end

Side question: why aren't you using one of the existing version plugins?
It's possible that this problem has already been solved.

It's easy to find the latest version for an article, just

  article.versions.last

The same for published versions isn't much more complicated

  articles.versions.published.last

Your data model is smelly. You should probably have the version
metadata in the Article record, not in a separate table. With your
model, all your Article data fields are really in the Version object.
That's cumbersome, as you're finding out.

Of course, I'm not only dealing with single articles

  Article.all

Then, in a articles/_article.html.erb

  <%= article.version.last.title %>

Oops! That triggers another database access for each article.

Not if you had done Article.all :joins => :versions in the first place.

Now, I know how to deal with this in SQL, with either a correlated sub-
select or an even more complicated left outer join.

No way! A simple join should do it. Why do you think you need anything
more complex?

I'd consider subqueries a code smell. There are some few cases where
they're useful, but 9 times out of 10 you probably really wanted a join.

I've forced the
first technique into ActiveRecord wrapping, which isn't fun. Far from
being any help. ARec gets in the way.

In this case, AR is getting in the way of something you shouldn't have
to be doing anyway. There's no need to outsmart it here.

I've dabbled with ARel which was
even worse, but may be due to my inexperience with it.

Likely. I've never used Arel, but my impression from the docs is that
it could be quite helpful in this sort of case.

Currently, the nicest solution I can think of is to accept the SQL and
go native: i.e., define a view in the database and a read-only model on
top of it.

You could do that, but it seems like overkill. See above for more
reasonable solutions.

I appreciate any suggestions how to do this *elegantly* in ARec.

Stop overthinking. Stop fighting Rails. Stop trying to use such a poor
data model.

Michael

[*] 2010-06-20, ActiveRecord and ARel: correlated subqueries?
--
Michael Schuerig
mailto:michael@schuerig.de
http://www.schuerig.de/michael/

Best,

Then let me make the question completely explicit: How do I load several
articles with their latest versions from the database using only a
single request while retaining elegant code?

Michael

Michael Schuerig wrote:
> Firstly, I've asked a similar question before[*]. I didn't get any
> answers back then, maybe now I have better luck
>
> Let's say I have models like this
>
> class Article < ActiveRecord::Base
>
> has_many :versions
>
> end
>
> class Version < ActiveRecord::Base
>
> belongs_to :article
> default_scope order('updated_at')
> scope :published, where(:state => 'published')
> validates :state, :inclusion => { :in => ['draft', 'published']
> } validates :title, :content, :presence => true
>
> end

Side question: why aren't you using one of the existing version
plugins? It's possible that this problem has already been solved.

Because none of them fit, last time I looked. I need to interact with
several versions at the same time.

> It's easy to find the latest version for an article, just
>
> article.versions.last
>
> The same for published versions isn't much more complicated
>
> articles.versions.published.last

Your data model is smelly. You should probably have the version
metadata in the Article record, not in a separate table. With your
model, all your Article data fields are really in the Version object.
That's cumbersome, as you're finding out.

The data model expresses what I need. Versions of the same article can
have different titles. It is likely that an older version of an article
is already published whereas the latest version is still in draft state.

> Of course, I'm not only dealing with single articles
>
> Article.all
>
> Then, in a articles/_article.html.erb
>
> <%= article.version.last.title %>
>
> Oops! That triggers another database access for each article.

Not if you had done Article.all :joins => :versions in the first
place.

:joins with a symbol does an inner join. When there are several draft
versions, which one would I get? Answer: any. I want a specific one: the
latest.

> Now, I know how to deal with this in SQL, with either a correlated
> sub- select or an even more complicated left outer join.

No way! A simple join should do it. Why do you think you need
anything more complex?

Before you claim that I'm wrong, please consider the possibility that
you don't understand what I'm trying to do. Thank you.

Stop overthinking. Stop fighting Rails. Stop trying to use such a
poor data model.

Stop being overly confident of your opinion when you don't understand
the situation. If you think my problem statement was not precise enough,
you could have asked for the details you thought were missing. No need
to start out with bold and unwarranted statements.

Michael

Michael Schuerig wrote:

> end

[...]

model, all your Article data fields are really in the Version object.
That's cumbersome, as you're finding out.

The data model expresses what I need.

Maybe not. If you're finding it this difficult to use, I wonder.

Versions of the same article can
have different titles. It is likely that an older version of an article
is already published whereas the latest version is still in draft state.

That doesn't really invalidate my earlier suggestion, I think, or at
least a modification of it.

Not if you had done Article.all :joins => :versions in the first
place.

:joins with a symbol does an inner join.

So use an SQL fragment if you need an outer join (which, on reflection,
I suppose you do).

When there are several draft
versions, which one would I get? Answer: any. I want a specific one: the
latest.

You can specify conditions and sort order on fields from joined tables.
That should do the trick.

> Now, I know how to deal with this in SQL, with either a correlated
> sub- select or an even more complicated left outer join.

No way! A simple join should do it. Why do you think you need
anything more complex?

Before you claim that I'm wrong, please consider the possibility that
you don't understand what I'm trying to do. Thank you.

I did consider that possibility. If you think I'm misunderstanding you,
please tell me what I got wrong.

Stop overthinking. Stop fighting Rails. Stop trying to use such a
poor data model.

Stop being overly confident of your opinion when you don't understand
the situation. If you think my problem statement was not precise enough,
you could have asked for the details you thought were missing.

Your problem statement was plenty precise -- precise enough for me to
determine that you may have a modeling problem.

Now, if you think there's something I don't understand, please tell me
what it is and I'll modify my suggestions accordingly.

No need
to start out with bold and unwarranted statements.

I don't believe anything I said was unwarranted. You came here looking
for advice, right? I gave you the best advice I know how to. You may
not like it, but please don't just make nebulous "you don't understand"
statements without saying what you think I missed. The best advice is
not always the advice you wanted to hear.

Michael

--
Michael Schuerig
mailto:michael@schuerig.de
http://www.schuerig.de/michael/

Best,

I had a couple more ideas just after I posted.

Marnen Laibow-Koser wrote:

Michael Schuerig wrote:

[...]

:joins with a symbol does an inner join.

So use an SQL fragment if you need an outer join (which, on reflection,
I suppose you do).

On further reflection, I don't know why you would need an outer join
unless not every Article has a Version associated. Is that the case?

When there are several draft
versions, which one would I get? Answer: any. I want a specific one: the
latest.

You can specify conditions and sort order on fields from joined tables.
That should do the trick.

Another (possibly zany) idea: group the versions by article_id.

Are you ultimately trying to retrieve the latest version only for each
article? That's what you imply, but a couple of things you wrote made
me thing that you might want to retrieve multiple versions. Which is
it?
[...]

Best,

I had a couple more ideas just after I posted.

Marnen Laibow-Koser wrote:
> Michael Schuerig wrote:
[...]

>> :joins with a symbol does an inner join.
>
> So use an SQL fragment if you need an outer join (which, on
> reflection, I suppose you do).

On further reflection, I don't know why you would need an outer join
unless not every Article has a Version associated. Is that the case?

I need an *inner* join with the latest of several versions, possibly
additionally meeting the condition of being in state "published".

>> When there are several draft
>> versions, which one would I get? Answer: any. I want a specific
>> one: the latest.
>
> You can specify conditions and sort order on fields from joined
> tables. That should do the trick.

Another (possibly zany) idea: group the versions by article_id.

Are you ultimately trying to retrieve the latest version only for
each article? That's what you imply, but a couple of things you
wrote made me thing that you might want to retrieve multiple
versions. Which is it?
[...]

If you are unsure of this, that is the question you ought to have asked
before even starting to make suggestions. The answer: Yes, I'm trying to
display lists of either only the latest version or the only latest
published version of each article, if any such version exists.

To find the latest of several things, you need to compare them (or have
the comparison pre-packaged in an index). In the context of a select
statement for articles, finding the latest version can take the form of
a correlated sub-select:

  SELECT articles.*, versions.* FROM articles
    JOIN versions ON articles.id = versions.article_id
   WHERE versions.updated_at =
         (SELECT MAX(v.updated_at) FROM versions AS v
           WHERE v.article_id = articles.id)

With an index on versions(article_id, updated_at) this isn't even too
bad on the database. There are other ways with uncorrelated sub-selects
and outer joins.

Let me emphasize that my problem is *not* getting the right data out of
the database. I can get it in a single query. And I can make
ActiveRecord do this. In order to get there, I had to write more literal
SQL than I would have liked, for instance the entire condition
containing the sub-select. AFAICT, ARel doesn't make things any easier,
it only mirrors the structure of literal, but plain SQL in convoluted
Ruby.

As I wrote already in closing of my original question: I appreciate any
suggestions how to do this elegantly in ARec.

Michael

Michael Schuerig wrote:

On further reflection, I don't know why you would need an outer join
unless not every Article has a Version associated. Is that the case?

I need an *inner* join with the latest of several versions, possibly
additionally meeting the condition of being in state "published".

Then I'm not sure why :joins won't do the trick.

each article? That's what you imply, but a couple of things you
wrote made me thing that you might want to retrieve multiple
versions. Which is it?
[...]

If you are unsure of this, that is the question you ought to have asked
before even starting to make suggestions.

The answer doesn't make a heck of a lot of difference to my suggestions,
except in the last layer of details in the join, which did not
originally concern me. :slight_smile:

The answer: Yes, I'm trying to
display lists of either only the latest version or the only latest
published version of each article, if any such version exists.

To find the latest of several things, you need to compare them (or have
the comparison pre-packaged in an index).

Of course.

In the context of a select
statement for articles, finding the latest version can take the form of
a correlated sub-select:

  SELECT articles.*, versions.* FROM articles
    JOIN versions ON articles.id = versions.article_id
   WHERE versions.updated_at =
         (SELECT MAX(v.updated_at) FROM versions AS v
           WHERE v.article_id = articles.id)

Yes, that solution had occurred to me. But the subquery is unnecessary,
I think:

[outer SELECT and JOIN as above]
ORDER BY article.id, version.created_at desc
GROUP BY article.id
[further JOINs may be necessary in some DBs, but the principle holds]

This will give equivalent results and may be more ActiveRecord-friendly.

OTOH, I *have* once or twice used subqueries with AR. It isn't my
favorite thing to do, but it is possible if absolutely necessary (which
I don't think it is here).

[...]

As I wrote already in closing of my original question: I appreciate any
suggestions how to do this elegantly in ARec.

Michael

--
Michael Schuerig
mailto:michael@schuerig.de
http://www.schuerig.de/michael/

Best,

Michael Schuerig wrote:
>> On further reflection, I don't know why you would need an outer
>> join unless not every Article has a Version associated. Is that
>> the case?
>
> I need an *inner* join with the latest of several versions,
> possibly additionally meeting the condition of being in state
> "published".

Then I'm not sure why :joins won't do the trick.

Because :joins, doing a simple inner join on articles and versions,
creates a relation with a row for each version. I want one row, made up
from an article with its latest (published) version. What condition
would you put on the join, apart from corresponding keys, in order to
achieve that?

> The answer: Yes, I'm trying to
> display lists of either only the latest version or the only latest
> published version of each article, if any such version exists.
>
> To find the latest of several things, you need to compare them (or
> have the comparison pre-packaged in an index).

Of course.

> In the context of a select
> statement for articles, finding the latest version can take the
> form of
>
> a correlated sub-select:
> SELECT articles.*, versions.* FROM articles
>
> JOIN versions ON articles.id = versions.article_id
>
> WHERE versions.updated_at =
>
> (SELECT MAX(v.updated_at) FROM versions AS v
>
> WHERE v.article_id = articles.id)

Yes, that solution had occurred to me. But the subquery is
unnecessary, I think:

[outer SELECT and JOIN as above]
ORDER BY article.id, version.created_at desc
GROUP BY article.id
[further JOINs may be necessary in some DBs, but the principle holds]

How does this ensure that I get the latest version? Before you go and
suggest using MAX(updated_at) on the groups, consider how groups work or
get Bill Karwin's "SQL Antipatterns" and read ch. 15, "Ambiguous
Groups".

Also, for any suggestions containing an ORDER BY versions.updated_at in
the outer SELECT: The sorting is applied at the very end, not somewhere
in between

Michael

Michael Schuerig wrote:

Then I'm not sure why :joins won't do the trick.

Because :joins, doing a simple inner join on articles and versions,
creates a relation with a row for each version. I want one row, made up
from an article with its latest (published) version. What condition
would you put on the join, apart from corresponding keys, in order to
achieve that?

Probably the group strategy mentioned below. That will work with
:joins.

[...]

[outer SELECT and JOIN as above]
ORDER BY article.id, version.created_at desc
GROUP BY article.id
[further JOINs may be necessary in some DBs, but the principle holds]

How does this ensure that I get the latest version?

Sorry, I meant versions.updated_at desc in the order clause, not
created_at. And the way it works is this: the ORDER clause sorts the
versions by article ID, then by timestamp, ensuring that versions for
the same article are together with the newest first. Then the GROUP
clause returns only one record -- the first, ergo the newest -- for each
article ID.

Before you go and
suggest using MAX(updated_at) on the groups,

It's not necessary to do that. That was sort of the whole point.

consider how groups work or
get Bill Karwin's "SQL Antipatterns" and read ch. 15, "Ambiguous
Groups".

Not familiar with the book, though I'll keep my eye out for a copy. But
is there anything ambiguous here, now that I've corrected the field
name?

Also, for any suggestions containing an ORDER BY versions.updated_at in
the outer SELECT: The sorting is applied at the very end, not somewhere
in between

Right. Did I provide any such versions? I think not, because that is
blindingly obvious to me.

Michael

--
Michael Schuerig
mailto:michael@schuerig.de
http://www.schuerig.de/michael/

Best,

Marnen Laibow-Koser wrote:

[...]

[outer SELECT and JOIN as above]
ORDER BY article.id, version.created_at desc
GROUP BY article.id
[further JOINs may be necessary in some DBs, but the principle holds]

How does this ensure that I get the latest version?

Sorry, I meant versions.updated_at desc in the order clause, not
created_at. And the way it works is this: the ORDER clause sorts the
versions by article ID, then by timestamp, ensuring that versions for
the same article are together with the newest first. Then the GROUP
clause returns only one record -- the first, ergo the newest -- for each
article ID.

Well, you learn something new every day! I found the Ambiguous Groups
article on the Web and found that the behavior I thought was standard --
returning the first value in the case of multiple values -- is actually
a MySQL quirk (ironic when you consider that I haven't used MySQL in
years). Back to the drawing board.

Best,

I did tell you that you ought to understand a problem before jumping to
solutions, didn't I? I have to admit I was piqued by your reaction to my
original question: it was bold -- and ignorant. You don't have to answer
every question in this mailing list single-handedly.

Anyway, I'm still looking for a way to express the existing, working SQL
solutions (sub-select for the latest version corresponding to an
article) in elegant ActiveRecord code.

Michael

I think that antagonising those trying to help will not enamour you to
others reading your question. There is nothing wrong with being bold
when answering questions and no-one knows everything. To suggest that
someone is ignorant because they are unaware of that particular mysql
quirk is absolutely ludricrous. By using the definition of ignorant
as 'not knowing absolutely everything' you must include yourself or
you would not have to come here to ask your question in the first
place.

Colin

>> Marnen Laibow-Koser wrote:
>>
>> Well, you learn something new every day! I found the Ambiguous
>> Groups article on the Web and found that the behavior I thought
>> was standard -- returning the first value in the case of multiple
>> values -- is actually a MySQL quirk (ironic when you consider
>> that I haven't used MySQL in years). Back to the drawing board.
>
> I did tell you that you ought to understand a problem before
> jumping to solutions, didn't I? I have to admit I was piqued by
> your reaction to my original question: it was bold -- and
> ignorant. You don't have to answer every question in this mailing
> list single-handedly.

I think that antagonising those trying to help will not enamour you
to others reading your question. There is nothing wrong with being
bold when answering questions and no-one knows everything.

You are right in general, but not in every particular case. I'd like to
claim that this particular case does not fit. Re-read the discussion an
judge for yourself. I may not like to be told that my code is smelly and
my design design is broken by someone who clearly does not understand
the setting -- but then, who does like such a thing?

It's great to have around people who are trying to be helpful. It's
event better, if they go about it without patronizing those who ask.

To
suggest that someone is ignorant because they are unaware of that
particular mysql quirk is absolutely ludricrous.

You're getting it wrong. I wouldn't expect anybody to know about exactly
how MySQL's implementation of GROUP BY behaves. MySQL allows grouping
expressions that are not allowed by the SQL standard and it behaves in
predictable ways when using them. This has nothing to do with my
question, though, as it doesn't involve MySQL in any way. Now, if
someone *does* know about MySQL's specific behaviour and take it for
standard, they might be tempted to base a solution on that.

Still, as I said more than once, I know what I want to do at the
SQL/database level. I don't see how to express it elegantly in
ActiveRecord.

Michael

Michael Schuerig wrote:

Marnen Laibow-Koser wrote:

Well, you learn something new every day! I found the Ambiguous
Groups article on the Web and found that the behavior I thought was
standard -- returning the first value in the case of multiple values
-- is actually a MySQL quirk (ironic when you consider that I
haven't used MySQL in years). Back to the drawing board.

I did tell you that you ought to understand a problem before jumping to
solutions, didn't I?

I believe I do understand the problem; I was incorrect regarding the SQL
standard, however.

I am still considering a better solution for your problem, though I'll
admit that the spirit in which my attempts have so far been received is
not exactly encouraging me to spend much more time on the issue.

I have to admit I was piqued by your reaction to my
original question: it was bold -- and ignorant.

Sometimes the best way to tease out the right solution is to propose a
solution and see where it fails. Consider it to be mailing list TDD. :slight_smile:

And *of course* my initial answer was ignorant. I can't see your code
or DB, so I only know as much about it as you post to the list. That
description will almost inevitably be incomplete.

You don't have to answer
every question in this mailing list single-handedly.

I don't try to. I help where I believe I have useful knowledge. In
this case, I was also helping in order to keep my own DB skills sharp.

Anyway, I'm still looking for a way to express the existing, working SQL
solutions (sub-select for the latest version corresponding to an
article) in elegant ActiveRecord code.

I may have something else for you. I believe I am close, at any rate.
But patronizing those who are trying to help you is not the way to get
them to continue helping.

Michael

--
Michael Schuerig
mailto:michael@schuerig.de
http://www.schuerig.de/michael/

Best,

Michael Schuerig wrote:

I am still considering a better solution for your problem, though
I'll admit that the spirit in which my attempts have so far been
received is not exactly encouraging me to spend much more time on
the issue.

> I have to admit I was piqued by your reaction to my
> original question: it was bold -- and ignorant.

Sometimes the best way to tease out the right solution is to propose
a solution and see where it fails. Consider it to be mailing list
TDD. :slight_smile:

Then don't complain if the reaction indicates a failure.

Thank you for your effort. I think we could have reached that point much
more easily.

> Anyway, I'm still looking for a way to express the existing,
> working SQL solutions (sub-select for the latest version
> corresponding to an article) in elegant ActiveRecord code.

I may have something else for you. I believe I am close, at any
rate. But patronizing those who are trying to help you is not the
way to get them to continue helping.

I wasn't patronizing you, look it up in the nearest dictionary. I
completely agree that I was telling you in no uncertain terms that I
didn't think much of the way you were telling me that my code is smelly
and my design is wrong (compare "patronize"). I hope you agree by now
that your initial assessment was rash and wrong.

Michael

Michael Schuerig wrote:

Michael Schuerig wrote:

TDD. :slight_smile:

Then don't complain if the reaction indicates a failure.

I wasn't the one complaining, was I?

[...]

I may have something else for you. I believe I am close, at any
rate. But patronizing those who are trying to help you is not the
way to get them to continue helping.

I wasn't patronizing you, look it up in the nearest dictionary.

OK, *now* you're patronizing me ("displaying or indicative of an
offensively condescending manner"), even if you weren't before.

I
completely agree that I was telling you in no uncertain terms that I
didn't think much of the way you were telling me that my code is smelly
and my design is wrong (compare "patronize").

If I see suspect designs, I say so. If you don't want advice, don't ask
for help.

And "smelly" means "suspect", not "wrong" -- at least that's the sense
in which I use it. If I believed it were absolutely wrong, I would have
said so.

Best,