Objects with versions: how to avoid 1 + n queries?

Firstly, I've asked a similar question before[*]. I didn't get any answers back then, maybe now I have better luck

Let's say I have models like this

  class Article < ActiveRecord::Base     has_many :versions   end

  class Version < ActiveRecord::Base     belongs_to :article     default_scope order('updated_at')     scope :published, where(:state => 'published')     validates :state, :inclusion => { :in => ['draft', 'published'] }     validates :title, :content, :presence => true   end

It's easy to find the latest version for an article, just

  article.versions.last

The same for published versions isn't much more complicated

  articles.versions.published.last

Of course, I'm not only dealing with single articles

  Article.all

Then, in a articles/_article.html.erb

  <%= article.version.last.title %>

Oops! That triggers another database access for each article.

Now, I know how to deal with this in SQL, with either a correlated sub- select or an even more complicated left outer join. I've forced the first technique into ActiveRecord wrapping, which isn't fun. Far from being any help. ARec gets in the way. I've dabbled with ARel which was even worse, but may be due to my inexperience with it.

Currently, the nicest solution I can think of is to accept the SQL and go native: i.e., define a view in the database and a read-only model on top of it.

I appreciate any suggestions how to do this *elegantly* in ARec.

Michael

[*] 2010-06-20, ActiveRecord and ARel: correlated subqueries?

It's hard to answer your post, as your only question is in the subject. Everything else you typed was statement, so I can't be sure what bit you want addressed. But I'll assume (and please correct me if I'm wrong):

Does "include" help your problem?

@articles = Article.all(:include => :versions)

(this would go in whatever controller action has a view that renders the articles/_article.html.erb partial with a collection @articles)

Michael Schuerig wrote:

Firstly, I've asked a similar question before[*]. I didn't get any answers back then, maybe now I have better luck

Let's say I have models like this

  class Article < ActiveRecord::Base     has_many :versions   end

  class Version < ActiveRecord::Base     belongs_to :article     default_scope order('updated_at')     scope :published, where(:state => 'published')     validates :state, :inclusion => { :in => ['draft', 'published'] }     validates :title, :content, :presence => true   end

Side question: why aren't you using one of the existing version plugins? It's possible that this problem has already been solved.

It's easy to find the latest version for an article, just

  article.versions.last

The same for published versions isn't much more complicated

  articles.versions.published.last

Your data model is smelly. You should probably have the version metadata in the Article record, not in a separate table. With your model, all your Article data fields are really in the Version object. That's cumbersome, as you're finding out.

Of course, I'm not only dealing with single articles

  Article.all

Then, in a articles/_article.html.erb

  <%= article.version.last.title %>

Oops! That triggers another database access for each article.

Not if you had done Article.all :joins => :versions in the first place.

Now, I know how to deal with this in SQL, with either a correlated sub- select or an even more complicated left outer join.

No way! A simple join should do it. Why do you think you need anything more complex?

I'd consider subqueries a code smell. There are some few cases where they're useful, but 9 times out of 10 you probably really wanted a join.

I've forced the first technique into ActiveRecord wrapping, which isn't fun. Far from being any help. ARec gets in the way.

In this case, AR is getting in the way of something you shouldn't have to be doing anyway. There's no need to outsmart it here.

I've dabbled with ARel which was even worse, but may be due to my inexperience with it.

Likely. I've never used Arel, but my impression from the docs is that it could be quite helpful in this sort of case.

Currently, the nicest solution I can think of is to accept the SQL and go native: i.e., define a view in the database and a read-only model on top of it.

You could do that, but it seems like overkill. See above for more reasonable solutions.

I appreciate any suggestions how to do this *elegantly* in ARec.

Stop overthinking. Stop fighting Rails. Stop trying to use such a poor data model.

Michael

[*] 2010-06-20, ActiveRecord and ARel: correlated subqueries? -- Michael Schuerig mailto:michael@schuerig.de Michael Schürig | Sentenced to making sense

Best,

Then let me make the question completely explicit: How do I load several articles with their latest versions from the database using only a single request while retaining elegant code?

Michael

Michael Schuerig wrote: > Firstly, I've asked a similar question before[*]. I didn't get any > answers back then, maybe now I have better luck > > Let's say I have models like this > > class Article < ActiveRecord::Base >
> has_many :versions >
> end >
> class Version < ActiveRecord::Base >
> belongs_to :article > default_scope order('updated_at') > scope :published, where(:state => 'published') > validates :state, :inclusion => { :in => ['draft', 'published'] > } validates :title, :content, :presence => true >
> end

Side question: why aren't you using one of the existing version plugins? It's possible that this problem has already been solved.

Because none of them fit, last time I looked. I need to interact with several versions at the same time.

> It's easy to find the latest version for an article, just > > article.versions.last > > The same for published versions isn't much more complicated > > articles.versions.published.last

Your data model is smelly. You should probably have the version metadata in the Article record, not in a separate table. With your model, all your Article data fields are really in the Version object. That's cumbersome, as you're finding out.

The data model expresses what I need. Versions of the same article can have different titles. It is likely that an older version of an article is already published whereas the latest version is still in draft state.

> Of course, I'm not only dealing with single articles > > Article.all > > Then, in a articles/_article.html.erb > > <%= article.version.last.title %> > > Oops! That triggers another database access for each article.

Not if you had done Article.all :joins => :versions in the first place.

:joins with a symbol does an inner join. When there are several draft versions, which one would I get? Answer: any. I want a specific one: the latest.

> Now, I know how to deal with this in SQL, with either a correlated > sub- select or an even more complicated left outer join.

No way! A simple join should do it. Why do you think you need anything more complex?

Before you claim that I'm wrong, please consider the possibility that you don't understand what I'm trying to do. Thank you.

Stop overthinking. Stop fighting Rails. Stop trying to use such a poor data model.

Stop being overly confident of your opinion when you don't understand the situation. If you think my problem statement was not precise enough, you could have asked for the details you thought were missing. No need to start out with bold and unwarranted statements.

Michael

Michael Schuerig wrote:

> end

[...]

model, all your Article data fields are really in the Version object. That's cumbersome, as you're finding out.

The data model expresses what I need.

Maybe not. If you're finding it this difficult to use, I wonder.

Versions of the same article can have different titles. It is likely that an older version of an article is already published whereas the latest version is still in draft state.

That doesn't really invalidate my earlier suggestion, I think, or at least a modification of it.

Not if you had done Article.all :joins => :versions in the first place.

:joins with a symbol does an inner join.

So use an SQL fragment if you need an outer join (which, on reflection, I suppose you do).

When there are several draft versions, which one would I get? Answer: any. I want a specific one: the latest.

You can specify conditions and sort order on fields from joined tables. That should do the trick.

> Now, I know how to deal with this in SQL, with either a correlated > sub- select or an even more complicated left outer join.

No way! A simple join should do it. Why do you think you need anything more complex?

Before you claim that I'm wrong, please consider the possibility that you don't understand what I'm trying to do. Thank you.

I did consider that possibility. If you think I'm misunderstanding you, please tell me what I got wrong.

Stop overthinking. Stop fighting Rails. Stop trying to use such a poor data model.

Stop being overly confident of your opinion when you don't understand the situation. If you think my problem statement was not precise enough, you could have asked for the details you thought were missing.

Your problem statement was plenty precise -- precise enough for me to determine that you may have a modeling problem.

Now, if you think there's something I don't understand, please tell me what it is and I'll modify my suggestions accordingly.

No need to start out with bold and unwarranted statements.

I don't believe anything I said was unwarranted. You came here looking for advice, right? I gave you the best advice I know how to. You may not like it, but please don't just make nebulous "you don't understand" statements without saying what you think I missed. The best advice is not always the advice you wanted to hear.

Michael

-- Michael Schuerig mailto:michael@schuerig.de Michael Schürig | Sentenced to making sense

Best,

I had a couple more ideas just after I posted.

Marnen Laibow-Koser wrote:

Michael Schuerig wrote:

[...]

:joins with a symbol does an inner join.

So use an SQL fragment if you need an outer join (which, on reflection, I suppose you do).

On further reflection, I don't know why you would need an outer join unless not every Article has a Version associated. Is that the case?

When there are several draft versions, which one would I get? Answer: any. I want a specific one: the latest.

You can specify conditions and sort order on fields from joined tables. That should do the trick.

Another (possibly zany) idea: group the versions by article_id.

Are you ultimately trying to retrieve the latest version only for each article? That's what you imply, but a couple of things you wrote made me thing that you might want to retrieve multiple versions. Which is it? [...]

Best,

I had a couple more ideas just after I posted.

Marnen Laibow-Koser wrote: > Michael Schuerig wrote: [...]

>> :joins with a symbol does an inner join. > > So use an SQL fragment if you need an outer join (which, on > reflection, I suppose you do).

On further reflection, I don't know why you would need an outer join unless not every Article has a Version associated. Is that the case?

I need an *inner* join with the latest of several versions, possibly additionally meeting the condition of being in state "published".

>> When there are several draft >> versions, which one would I get? Answer: any. I want a specific >> one: the latest. > > You can specify conditions and sort order on fields from joined > tables. That should do the trick.

Another (possibly zany) idea: group the versions by article_id.

Are you ultimately trying to retrieve the latest version only for each article? That's what you imply, but a couple of things you wrote made me thing that you might want to retrieve multiple versions. Which is it? [...]

If you are unsure of this, that is the question you ought to have asked before even starting to make suggestions. The answer: Yes, I'm trying to display lists of either only the latest version or the only latest published version of each article, if any such version exists.

To find the latest of several things, you need to compare them (or have the comparison pre-packaged in an index). In the context of a select statement for articles, finding the latest version can take the form of a correlated sub-select:

  SELECT articles.*, versions.* FROM articles     JOIN versions ON articles.id = versions.article_id    WHERE versions.updated_at =          (SELECT MAX(v.updated_at) FROM versions AS v            WHERE v.article_id = articles.id)

With an index on versions(article_id, updated_at) this isn't even too bad on the database. There are other ways with uncorrelated sub-selects and outer joins.

Let me emphasize that my problem is *not* getting the right data out of the database. I can get it in a single query. And I can make ActiveRecord do this. In order to get there, I had to write more literal SQL than I would have liked, for instance the entire condition containing the sub-select. AFAICT, ARel doesn't make things any easier, it only mirrors the structure of literal, but plain SQL in convoluted Ruby.

As I wrote already in closing of my original question: I appreciate any suggestions how to do this elegantly in ARec.

Michael

Michael Schuerig wrote:

On further reflection, I don't know why you would need an outer join unless not every Article has a Version associated. Is that the case?

I need an *inner* join with the latest of several versions, possibly additionally meeting the condition of being in state "published".

Then I'm not sure why :joins won't do the trick.

each article? That's what you imply, but a couple of things you wrote made me thing that you might want to retrieve multiple versions. Which is it? [...]

If you are unsure of this, that is the question you ought to have asked before even starting to make suggestions.

The answer doesn't make a heck of a lot of difference to my suggestions, except in the last layer of details in the join, which did not originally concern me. :slight_smile:

The answer: Yes, I'm trying to display lists of either only the latest version or the only latest published version of each article, if any such version exists.

To find the latest of several things, you need to compare them (or have the comparison pre-packaged in an index).

Of course.

In the context of a select statement for articles, finding the latest version can take the form of a correlated sub-select:

  SELECT articles.*, versions.* FROM articles     JOIN versions ON articles.id = versions.article_id    WHERE versions.updated_at =          (SELECT MAX(v.updated_at) FROM versions AS v            WHERE v.article_id = articles.id)

Yes, that solution had occurred to me. But the subquery is unnecessary, I think:

[outer SELECT and JOIN as above] ORDER BY article.id, version.created_at desc GROUP BY article.id [further JOINs may be necessary in some DBs, but the principle holds]

This will give equivalent results and may be more ActiveRecord-friendly.

OTOH, I *have* once or twice used subqueries with AR. It isn't my favorite thing to do, but it is possible if absolutely necessary (which I don't think it is here).

[...]

As I wrote already in closing of my original question: I appreciate any suggestions how to do this elegantly in ARec.

Michael

-- Michael Schuerig mailto:michael@schuerig.de Michael Schürig | Sentenced to making sense

Best,

Michael Schuerig wrote: >> On further reflection, I don't know why you would need an outer >> join unless not every Article has a Version associated. Is that >> the case? > > I need an *inner* join with the latest of several versions, > possibly additionally meeting the condition of being in state > "published".

Then I'm not sure why :joins won't do the trick.

Because :joins, doing a simple inner join on articles and versions, creates a relation with a row for each version. I want one row, made up from an article with its latest (published) version. What condition would you put on the join, apart from corresponding keys, in order to achieve that?

> The answer: Yes, I'm trying to > display lists of either only the latest version or the only latest > published version of each article, if any such version exists. > > To find the latest of several things, you need to compare them (or > have the comparison pre-packaged in an index).

Of course.

> In the context of a select > statement for articles, finding the latest version can take the > form of > > a correlated sub-select: > SELECT articles.*, versions.* FROM articles >
> JOIN versions ON articles.id = versions.article_id >
> WHERE versions.updated_at = >
> (SELECT MAX(v.updated_at) FROM versions AS v >
> WHERE v.article_id = articles.id)

Yes, that solution had occurred to me. But the subquery is unnecessary, I think:

[outer SELECT and JOIN as above] ORDER BY article.id, version.created_at desc GROUP BY article.id [further JOINs may be necessary in some DBs, but the principle holds]

How does this ensure that I get the latest version? Before you go and suggest using MAX(updated_at) on the groups, consider how groups work or get Bill Karwin's "SQL Antipatterns" and read ch. 15, "Ambiguous Groups".

Also, for any suggestions containing an ORDER BY versions.updated_at in the outer SELECT: The sorting is applied at the very end, not somewhere in between

Michael

Michael Schuerig wrote:

Then I'm not sure why :joins won't do the trick.

Because :joins, doing a simple inner join on articles and versions, creates a relation with a row for each version. I want one row, made up from an article with its latest (published) version. What condition would you put on the join, apart from corresponding keys, in order to achieve that?

Probably the group strategy mentioned below. That will work with :joins.

[...]

[outer SELECT and JOIN as above] ORDER BY article.id, version.created_at desc GROUP BY article.id [further JOINs may be necessary in some DBs, but the principle holds]

How does this ensure that I get the latest version?

Sorry, I meant versions.updated_at desc in the order clause, not created_at. And the way it works is this: the ORDER clause sorts the versions by article ID, then by timestamp, ensuring that versions for the same article are together with the newest first. Then the GROUP clause returns only one record -- the first, ergo the newest -- for each article ID.

Before you go and suggest using MAX(updated_at) on the groups,

It's not necessary to do that. That was sort of the whole point.

consider how groups work or get Bill Karwin's "SQL Antipatterns" and read ch. 15, "Ambiguous Groups".

Not familiar with the book, though I'll keep my eye out for a copy. But is there anything ambiguous here, now that I've corrected the field name?

Also, for any suggestions containing an ORDER BY versions.updated_at in the outer SELECT: The sorting is applied at the very end, not somewhere in between

Right. Did I provide any such versions? I think not, because that is blindingly obvious to me.

Michael

-- Michael Schuerig mailto:michael@schuerig.de Michael Schürig | Sentenced to making sense

Best,

Marnen Laibow-Koser wrote:

[...]

[outer SELECT and JOIN as above] ORDER BY article.id, version.created_at desc GROUP BY article.id [further JOINs may be necessary in some DBs, but the principle holds]

How does this ensure that I get the latest version?

Sorry, I meant versions.updated_at desc in the order clause, not created_at. And the way it works is this: the ORDER clause sorts the versions by article ID, then by timestamp, ensuring that versions for the same article are together with the newest first. Then the GROUP clause returns only one record -- the first, ergo the newest -- for each article ID.

Well, you learn something new every day! I found the Ambiguous Groups article on the Web and found that the behavior I thought was standard -- returning the first value in the case of multiple values -- is actually a MySQL quirk (ironic when you consider that I haven't used MySQL in years). Back to the drawing board.

Best,

I did tell you that you ought to understand a problem before jumping to solutions, didn't I? I have to admit I was piqued by your reaction to my original question: it was bold -- and ignorant. You don't have to answer every question in this mailing list single-handedly.

Anyway, I'm still looking for a way to express the existing, working SQL solutions (sub-select for the latest version corresponding to an article) in elegant ActiveRecord code.

Michael

I think that antagonising those trying to help will not enamour you to others reading your question. There is nothing wrong with being bold when answering questions and no-one knows everything. To suggest that someone is ignorant because they are unaware of that particular mysql quirk is absolutely ludricrous. By using the definition of ignorant as 'not knowing absolutely everything' you must include yourself or you would not have to come here to ask your question in the first place.

Colin

>> Marnen Laibow-Koser wrote: >> >> Well, you learn something new every day! I found the Ambiguous >> Groups article on the Web and found that the behavior I thought >> was standard -- returning the first value in the case of multiple >> values -- is actually a MySQL quirk (ironic when you consider >> that I haven't used MySQL in years). Back to the drawing board. > > I did tell you that you ought to understand a problem before > jumping to solutions, didn't I? I have to admit I was piqued by > your reaction to my original question: it was bold -- and > ignorant. You don't have to answer every question in this mailing > list single-handedly.

I think that antagonising those trying to help will not enamour you to others reading your question. There is nothing wrong with being bold when answering questions and no-one knows everything.

You are right in general, but not in every particular case. I'd like to claim that this particular case does not fit. Re-read the discussion an judge for yourself. I may not like to be told that my code is smelly and my design design is broken by someone who clearly does not understand the setting -- but then, who does like such a thing?

It's great to have around people who are trying to be helpful. It's event better, if they go about it without patronizing those who ask.

To suggest that someone is ignorant because they are unaware of that particular mysql quirk is absolutely ludricrous.

You're getting it wrong. I wouldn't expect anybody to know about exactly how MySQL's implementation of GROUP BY behaves. MySQL allows grouping expressions that are not allowed by the SQL standard and it behaves in predictable ways when using them. This has nothing to do with my question, though, as it doesn't involve MySQL in any way. Now, if someone *does* know about MySQL's specific behaviour and take it for standard, they might be tempted to base a solution on that.

Still, as I said more than once, I know what I want to do at the SQL/database level. I don't see how to express it elegantly in ActiveRecord.

Michael

Michael Schuerig wrote:

Marnen Laibow-Koser wrote:

Well, you learn something new every day! I found the Ambiguous Groups article on the Web and found that the behavior I thought was standard -- returning the first value in the case of multiple values -- is actually a MySQL quirk (ironic when you consider that I haven't used MySQL in years). Back to the drawing board.

I did tell you that you ought to understand a problem before jumping to solutions, didn't I?

I believe I do understand the problem; I was incorrect regarding the SQL standard, however.

I am still considering a better solution for your problem, though I'll admit that the spirit in which my attempts have so far been received is not exactly encouraging me to spend much more time on the issue.

I have to admit I was piqued by your reaction to my original question: it was bold -- and ignorant.

Sometimes the best way to tease out the right solution is to propose a solution and see where it fails. Consider it to be mailing list TDD. :slight_smile:

And *of course* my initial answer was ignorant. I can't see your code or DB, so I only know as much about it as you post to the list. That description will almost inevitably be incomplete.

You don't have to answer every question in this mailing list single-handedly.

I don't try to. I help where I believe I have useful knowledge. In this case, I was also helping in order to keep my own DB skills sharp.

Anyway, I'm still looking for a way to express the existing, working SQL solutions (sub-select for the latest version corresponding to an article) in elegant ActiveRecord code.

I may have something else for you. I believe I am close, at any rate. But patronizing those who are trying to help you is not the way to get them to continue helping.

Michael

-- Michael Schuerig mailto:michael@schuerig.de Michael Schürig | Sentenced to making sense

Best,

Michael Schuerig wrote:

I am still considering a better solution for your problem, though I'll admit that the spirit in which my attempts have so far been received is not exactly encouraging me to spend much more time on the issue.

> I have to admit I was piqued by your reaction to my > original question: it was bold -- and ignorant.

Sometimes the best way to tease out the right solution is to propose a solution and see where it fails. Consider it to be mailing list TDD. :slight_smile:

Then don't complain if the reaction indicates a failure.

Thank you for your effort. I think we could have reached that point much more easily.

> Anyway, I'm still looking for a way to express the existing, > working SQL solutions (sub-select for the latest version > corresponding to an article) in elegant ActiveRecord code.

I may have something else for you. I believe I am close, at any rate. But patronizing those who are trying to help you is not the way to get them to continue helping.

I wasn't patronizing you, look it up in the nearest dictionary. I completely agree that I was telling you in no uncertain terms that I didn't think much of the way you were telling me that my code is smelly and my design is wrong (compare "patronize"). I hope you agree by now that your initial assessment was rash and wrong.

Michael

Michael Schuerig wrote:

Michael Schuerig wrote:

TDD. :slight_smile:

Then don't complain if the reaction indicates a failure.

I wasn't the one complaining, was I?

[...]

I may have something else for you. I believe I am close, at any rate. But patronizing those who are trying to help you is not the way to get them to continue helping.

I wasn't patronizing you, look it up in the nearest dictionary.

OK, *now* you're patronizing me ("displaying or indicative of an offensively condescending manner"), even if you weren't before.

I completely agree that I was telling you in no uncertain terms that I didn't think much of the way you were telling me that my code is smelly and my design is wrong (compare "patronize").

If I see suspect designs, I say so. If you don't want advice, don't ask for help.

And "smelly" means "suspect", not "wrong" -- at least that's the sense in which I use it. If I believed it were absolutely wrong, I would have said so.

Best,