Broken has_many

This has been killing me this weekend. A has_many fetch is accurate only the *first* time.

Here's a simple example:

tp = TestParent.new(:name => "testing")
tp.save!

tc = TestChild.new(:name => "test1", :test_parent_id => tp.id)
tc.save!
tc = TestChild.new(:name => "test2", :test_parent_id => tp.id)
tc.save!
tc = TestChild.new(:name => "test3", :test_parent_id => tp.id)
tc.save!
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, and test3" *CORRECT*

tc = TestChild.new(:name => "test4", :test_parent_id => tp.id)
tc.save!
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, and test3" *WRONG!*

tp = TestParent.find(tp.id)
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, test3, and test4" *CORRECT*

In my case, we were preparing an order of line items (the children of the order) and then fetching the results to get a subtotal so we could add one last line item (think tax or the like). But that last line item is then lost on the following receipt page, in mailers, etc.

This is Rails 4.0.8, Pg 0.17.1, ruby 2.1.1p30 (2014-02-10 revision 44904) [x86_64-linux]. Models and migration are very, very basic in the test above:

---- migration
class HasManyTest < ActiveRecord::Migration

  def self.up
    execute %{
      CREATE TABLE test_parents (
        id serial primary key,
        created_at timestamp,
        updated_at timestamp,
        name varchar(255)
      );

      CREATE TABLE test_children (
        id serial primary key,
        created_at timestamp,
        updated_at timestamp,
        name varchar(255),
        test_parent_id int4 references test_parents(id)
      );
    }
  end

  def self.down
    raise "No you can't!"
  end

end

I hate to reply to myself, but I narrowed it down to Rails caching model queries by DEFAULT. It can be worked around by passing 'true', like this:

tc.name = "something different"
tc.save!

tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, test3, and test4" *WRONG*
tp.test_children(true).map {|x| x.name }.to_sentence # => "test1, test2, test3, and something different" *CORRECT*

Another workaround is to just stop using has_many and such in favor of manual functions, a la:

def test_children
  TestChild.where(["test_parent_id = ?", self.id])
end

Is there a way to turn this sort of caching off globally? (Other caching is fine, I don’t want to turn all caching off.)

BTW- It is a bit mind blowing that this is turned on by default. Possible data corruption shouldn’t ever be preferred by default over (possible) speed gains. I’d still categorize this as a serious bug, at least as a configuration default.

Thanks!

Phil

Try tp.reload after you make a change to the collection the way you have been. If you had been building your children entries as:

tp.test_children.create(name: "test1")

tp would be fully cognizant of all its children.

Thanks for the reply. It still seems to be a systemic integrity problem. But the additional info on another work around is welcome.

Phil

Not that I know of. As of Rails 4 (or is it 3.2?) the generated association accessors are defined in a module (rather than directly on the class), so it's easy enough to override them. You could do:

class TestParent

  has_many :test_children

  def test_children(force_reload = true)
    super(force_reload)
  end

end

To change it for all associations without having to do this on a per-association basis would probably require some monkey patching inside the ActiveRecord code - sounds brittle.

As tamouse says, if you use build/create on the association then you won't see this behaviour, although there can still be differences between the array thus constructed in memory and the array you'd get if you had selected from the database (for example if you have an order clause on the association, a custom select, etc.).

Rails has been like this at least since the 1.0 days - I can’t say I’ve ever run into particular issues from it.

Fred

It’s not a systemic integrity problem, it is the way Rails has always worked. Using tp.test_children.create() is not “another work-around”, it is the recommended way of adding children to a parent model that you have already instantiated and has been available for as long as I can remember (at least since rails 2.0.x).

Scenarios where you would actually need to re-query the database every time you access a relation are rare. If you really need to, you have that option, but in no way should that be a default.

Jim

The cleanest way to deal with this is to use the association. Instead of:

tc = TestChild.new(:name => "test4", :test_parent_id => tp.id)

do:

tp.test_children.new(:name => "test4")

This will correctly reflect the new record when subsequently calling tp.test_children.
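The difference can be modeled with a toy sketch in plain Ruby (hypothetical classes, not the actual ActiveRecord internals): the association both inserts the row and appends to its own cached array, while a standalone TestChild.new goes behind the cache's back.

```ruby
# Toy model of the two ways of adding a child. Class and method names
# are hypothetical; only the caching behavior mirrors has_many.
class ToyParent
  def initialize
    @db_rows = []     # stands in for the test_children table
    @cached = nil     # stands in for the association's loaded target
  end

  # Like TestChild.new(:test_parent_id => tp.id).save!: the row reaches
  # the database, but the parent's cached collection never hears of it.
  def insert_behind_back(name)
    @db_rows << name
  end

  # Like tp.test_children.new(...).save!: the association inserts the
  # row AND appends to its in-memory target if one is loaded.
  def create_child(name)
    @db_rows << name
    @cached << name if @cached
  end

  def children
    @cached ||= @db_rows.dup
  end
end

tp = ToyParent.new
tp.create_child("test1")
tp.children                     # loads and caches ["test1"]
tp.insert_behind_back("test2")
tp.children                     # still ["test1"] -- stale
tp.create_child("test3")
tp.children                     # ["test1", "test3"] -- test2 still unseen
```

A fresh find (or a forced reload) would of course see test2 as well.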

–Matt Jones

>> Is there a way to turn this sort of caching off globally? (Other caching is fine, I don’t want to turn all caching off.)
>>
>> BTW- It is a bit mind blowing that this is turned on by default. Possible data corruption shouldn’t ever be preferred by default over (possible) speed gains. I’d still categorize this as a serious bug, at least as a configuration default.

> Not that I know of. As of rails 4 (or is it 3.2?) the generated association accessors are defined in a module (rather than directly on the class), so it’s easy enough to override them. You could do:
>
> class TestParent
>
>   has_many :test_children
>
>   def test_children(force_reload = true)
>     super(force_reload)
>   end
>
> end
>
> To change it for all associations without having to do this on a per association basis would probably require some monkey patching inside the activerecord code - sounds brittle.
>
> As tamouse says if you use build/create on the association then you won’t see this behaviour, although there can still be differences between the array thus constructed in memory and the array you’d get if you had selected from the database (for example if you have an order clause on the association, custom select etc.)

Ah, yes, I see it now in activerecord-4.0.8/lib/active_record/associations/collection_association.rb
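The mechanism in that file boils down to plain memoization. A toy sketch of the load/force_reload dance (plain Ruby with hypothetical names, not the actual ActiveRecord code):

```ruby
# Minimal memoizing collection: load once, serve the cached target
# until a reload is forced. The lambda stands in for the SQL query.
class FakeAssociation
  def initialize(loader)
    @loader = loader
    @target = nil
    @loaded = false
  end

  def reader(force_reload = false)
    reset if force_reload
    unless @loaded
      @target = @loader.call   # "run the query"
      @loaded = true
    end
    @target
  end

  def reset
    @target = nil
    @loaded = false
  end
end

rows = ["test1", "test2", "test3"]
assoc = FakeAssociation.new(-> { rows.dup })

first = assoc.reader        # queries once and caches
rows << "test4"             # another INSERT happens elsewhere
stale = assoc.reader        # cached: does not see test4
fresh = assoc.reader(true)  # forced reload: sees test4
```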

OK, makes sense. I’ve love to have that be a config option I could throw in the environments config files to turn that off, if needed.

> Rails has been like this at least since the 1.0 days - I can’t say I’ve ever run into particular issues from it.

In my case it was rather nasty. It went something like this:

@order = Order.new …
… record line items …
@order.line_items … calculate the subtotal, compute the tax, and record the tax as the final line item
run the charge on the credit card, render the receipt page, send the mailers, etc.

The result was that the customer got the receipt page/emails and was charged without tax! Fulfillment and Accounting, pulling up the order later, would confusingly see the correct total and tax.

Thanks, I have a much better understanding of what’s going on now.

Phil

(I apologize in advance for top posting, but I'm trying to keep this thread consistent.)

I would say it's a definite problem for us. This is an old legacy project that several people have worked on for 6+ years. When I'm rewriting, say, the receipt page and mailer, how could I possibly know in advance that @order.line_items would give stale results without digging through the details of the controller? If I have to call @order.line_items(true) (and likewise with every other has_many, etc.) everywhere just to be sure, I'd rather simply have the feature turned off globally.

While it may be 'rare' that somebody runs into this, I'd still argue that the performance gain from caching associations like this is also rare/minimal (it would be interesting to test). I'd rather lean toward data accuracy than speed, at least in my case.

I'd simply like a config option to turn this feature on/off. I'd also suggest that it could even be dangerous to have it on by default when things have to be done in particular ways to get around the potential pitfalls, but I'd just be happy to have a config option to turn it off.

Thanks for the reply!

Phil

Hi,

I disagree with you on your statement - “significant performance gains of caching associations like this is also rare/minimal”.

Almost all the apps that I’ve built had pages where I’ve used the same association so many times for rendering the view and disabling caching in those cases would be an absolute performance disaster! Think of all the other problems it can cause… eager loading of associations?

Well, like Jim said, it's very rare that someone has to do what you're doing, and even when they do, Rails has a way of doing it properly :slight_smile:

Thanks!

Steve

I would recommend you start by getting all that domain logic out of the Controllers and into context or composite objects. Check out http://en.wikipedia.org/wiki/Data,_context_and_interaction or James Coplien’s excellent book “Lean Architecture” in which he discusses DCI at length.

This doesn’t actually solve the specific Rails-doesn’t-reload-associations problem you’re having, and in fact I’ve worked on an e-commerce codebase with the exact same problem you describe. We had several @order.reload calls because of the way Rails is implemented.

Moving to a DCI pattern doesn’t eliminate the Rails problem here, but it encapsulates the business logic into a single context object whose responsibility is to both do the operations and then tell the object to reload correctly, producing the expected results. You can test the context object independently of the controller, which is ideal.

I actually think there is a significant performance gain from not reloading associations in general, and in e-commerce-specific apps you simply need to know this about Rails, suck it up, and do @order.reload when you perform certain operations. Once you realize this is just how Rails works, and you move that domain logic into a place more appropriate than the controller, it starts to bother you less that you have to do it in the first place.
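A minimal sketch of that context-object idea, in plain Ruby with a toy order class standing in for ActiveRecord (all names are hypothetical, and the 8% tax rate is made up):

```ruby
# Toy order whose line_items collection caches like has_many does.
class ToyOrder
  def initialize
    @db_rows = []
    @cache = nil
  end

  def add_line_item(row)
    @db_rows << row          # goes straight to "the database"
  end

  def line_items
    @cache ||= @db_rows.dup  # cached after the first read
  end

  def reload
    @cache = nil
    self
  end
end

# The context object owns the whole operation, including the reload,
# so callers never see a stale collection.
class FinalizeOrder
  def initialize(order)
    @order = order
  end

  def call
    subtotal = @order.line_items.sum { |li| li[:amount] }   # caches!
    @order.add_line_item(name: "tax", amount: (subtotal * 0.08).round(2))
    @order.reload   # the context, not the caller, knows this is needed
    @order
  end
end

order = ToyOrder.new
order.add_line_item(name: "widget", amount: 10.0)
FinalizeOrder.new(order).call
order.line_items.size   # 2 -- widget plus tax, thanks to the reload
```

The controller (or mailer) only ever calls FinalizeOrder, and the context object can be tested independently, as Jason notes.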

Just my two ¢

-Jason

Thanks for the reply Steve!

I guess it's hard to say what the advantages are without testing… Assuming you are already indexing all of your id and *_id database columns and SQL query caching is on, I'm not sure what advantage association caching is actually providing. A test might be to manually edit collection_association.rb from the gem to set force_reload = true and do some benchmarking (which I'd like to do when I get some time). Ironically, 37signals has some very good articles on how to do caching properly without worrying about stale data; this is a basic example:

https://signalvnoise.com/posts/3113-how-key-based-cache-expiration-works
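For reference, the key-based expiration that article describes can be sketched in a few lines of plain Ruby (a Hash stands in for the cache store; the names are hypothetical): because the cache key embeds updated_at, a changed record simply misses the old entry instead of serving stale data.

```ruby
# Toy sketch of key-based cache expiration, not Rails's actual cache API.
Record = Struct.new(:id, :name, :updated_at) do
  def cache_key
    "records/#{id}-#{updated_at.to_i}"   # key changes when the record does
  end
end

CACHE = {}

def render(record)
  CACHE[record.cache_key] ||= "<li>#{record.name}</li>"
end

r = Record.new(1, "test1", Time.utc(2014, 1, 1))
render(r)                        # cache miss: renders and stores
r.name = "renamed"
r.updated_at = Time.utc(2014, 1, 2)
html = render(r)                 # new key, so a fresh render -- never stale
```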

I’m not sure what you mean by the ‘rails way’ in this case. Usually the rails way means not having to worry about cache management by default. In my case I have a mailer that simply looks like this:

def customer_receipt(order)

In the view it does @order.line_items, and the results are sometimes wrong. This just feels like an ugly hack to make Rails faster back in the days when Ruby was really, really slow (thankfully Ruby has been MUCH faster since 2.0.0).

Phil

> I guess it’s hard to say what the advantages are without testing… Assuming you are already indexing all of your id and *_id database columns and SQL query caching is on, I’m not sure what advantages association caching is actually providing. A test might be to edit collection_association.rb from the gem manually to set force_reload = true and do some benchmarking (which I’d like to do when I get some time).

If your dataset is small and properly indexed, and you have very few related items, chances are you will see very little change in performance from reloading every time you access the order line items. If you have several hundred line items, things may be a little different. If your database is on a different machine than the web server, things may also be a little different.

> Ironically, 37 Signals has some very good articles on how to do caching properly without worrying about getting stale data, this is a basic example:
>
> https://signalvnoise.com/posts/3113-how-key-based-cache-expiration-works

Ironically, this has nothing to do with what you are having problems with. You will still have the “stale” object. (Unless you are suggesting that Rails should check the updated_at for every ActiveRecord object every single time you access it. In that case, good luck. I doubt you will ever find any ORM so poorly written.)

> I’m not sure what you mean by the ‘rails way’ in this case. Usually the rails way means not having to worry about cache management by default.

I don’t worry about cache management by default. In addition, I don’t have to worry about how to start using the database caching, and when I might need it.

> In my case I have a mailer that simply looks like this:
>
> def customer_receipt(order)
>
> In the view it does @order.line_items and they are sometimes wrong. This just feels like an ugly hack to try to make rails faster during the days when ruby was really, really, slow (thankfully the latest since 2.0.0 is MUCH faster).

If your order line items are wrong, that is because somewhere in the code somebody made some incorrect assumptions and wrote some incorrect code, examples of which have already been pointed out. Thankfully, the goal of the framework is not to make sure that can never happen by default, with the side effect that you may need to make drastic changes to the code if you ever want to start caching. The framework does provide multiple ways to create related records in a way that updates the relation in the parent object.

In your case, you can either find that problem and fix it, or simply force reload the items in the mailer you are working on. To complain about bad code as a “Rails Problem” is not useful. Bad code can be written in any language, using any framework.

To be honest, I have written code that requires a reload when certain functionality is invoked. I have a callback on deleting a related record that modifies a value in the parent. However, that code really should be refactored such that the children are not destroyed directly; instead, a method on the parent should destroy the child.

Jim

>> I guess it’s hard to say what the advantages are without testing… Assuming you are already indexing all of your id and *_id database columns and SQL query caching is on, I’m not sure what advantages association caching is actually providing. A test might be to edit collection_association.rb from the gem manually to set force_reload = true and do some benchmarking (which I’d like to do when I get some time).

> If your dataset is small and properly indexed, and you have very few related items, chances are you will see very little change in performance from reloading every time you access the order line items. If you have several hundred line items, things may be a little different. If your database is on a different machine than the web server, things may also be a little different.

Yes, you are probably right(?).

>> Ironically, 37 Signals has some very good articles on how to do caching properly without worrying about getting stale data, this is a basic example:
>>
>> https://signalvnoise.com/posts/3113-how-key-based-cache-expiration-works

> Ironically, this has nothing to do with what you are having problems with. You will still have the “stale” object. (Unless you are suggesting that Rails should check the updated_at for every ActiveRecord object every single time you access it. In that case, good luck. I doubt you will ever find any ORM so poorly written.)

No no, I mean there's already a layer of SQL caching that takes care of repeated identical SELECTs between updates and inserts. Is there not? (I think there is?) Association caching seems unneeded(?), or at least a lot less intelligent than the existing SQL query caching.

>> I’m not sure what you mean by the ‘rails way’ in this case. Usually the rails way means not having to worry about cache management by default.

> I don’t worry about cache management by default. In addition, I don’t have to worry about how to start using the database caching, and when I might need it.

>> In my case I have a mailer that simply looks like this:
>>
>> def customer_receipt(order)
>>
>> In the view it does @order.line_items and they are sometimes wrong. This just feels like an ugly hack to try to make rails faster during the days when ruby was really, really, slow (thankfully the latest since 2.0.0 is MUCH faster).

> If your order line items are wrong, that is because somewhere in the code, somebody made some incorrect assumptions and wrote some incorrect code, examples of which have already been pointed out. Thankfully, the goal of the framework is not to make sure that can never happen by default, with the side effect that you may need to make drastic changes to the code if you ever want to start caching. The framework does provide multiple ways to create related many records in a way that updates the relation in the parent object.

Before blaming 'incorrect code' or bad programmers, look back again at my example. It's quite trivial and reproducible: save the order's line items (in what I would consider a completely sane and logical way), read the line items back to compute the tax, add the tax as the last line item, and then hand the @order to payment processing, mailers, and the receipt page… where the totals are wrong and the tax is missing. I understand that doing things differently avoids the problem, and I have used those approaches to fix this issue on my end, but I am still quite skeptical as to how a programmer is supposed to know about these pitfalls in advance.

> In your case, you can either find that problem and fix it, or simply force reload the items in the mailer you are working on. To complain about bad code as a “Rails Problem” is not useful. Bad code can be written in any language, using any framework.

Oh, absolutely. I’m not just venting to vent, I have a few (what I think are constructive) questions:

  • How can one turn off 'association caching' (for lack of a better name) without having to edit the .rb file in the gem? (The answer appears to be that you can't?)

  • If not, can we add a simple config switch (e.g. config.active_record… in config/environments/*) to Rails to turn this feature on/off more easily in the future? (Not just for my sake; I mean it as a general improvement.)

  • If this association caching was added as a hack to improve performance back in the day, is it really still needed? Perhaps it could be removed, or turned off by default? Perhaps it was added before SQL caching existed and is no longer needed(?)

> To be honest, I have written code that requires a reload where certain functionality is invoked. I have a callback on deleting a related many that modifies a value in the parent. However, that code really should be refactored such that the children are not destroyed directly, but instead a method on the parent should destroy the child.

It would be interesting if has_many and the like could receive a callback from the child model(s) so the collection would know to update automatically. Again, if this is already handled by the SQL caching layer, perhaps it's moot to even have caching at the association layer. It makes sense that Rails can't know about changes being made to the database outside of its scope (e.g. somebody on a SQL console), but it seems like it could be smarter about the changes it makes itself.

My apologies if I sounded a bit frustrated before. Now that I understand what’s happening and how to avoid the problem, I am back in my zen state. ;’) Thanks again for your previous thoughts and insight.

Phil

It sounds like you are thinking in terms of the database, while ActiveRecord is an ORM. You need to think in terms of objects that happen to be persisted in a database. When you create a line item record in the database directly, your @order object does not know about it, so you have to tell it to reload. The only other ORM I've used is Apple's Core Data, and the same thing applies: if you modify the database without going through the object in question, the object will not be updated until you reload it.

Jim