The method to_a is not an intuitive way to force ActiveRecord to not re-query the db when accessing a collection of child objects

It’s been explained that the way around retrieving objects of a child collection in different places in memory is to use to_a. I find that unintuitive and it’s something that will constantly trip up beginners.

Why not something more explicit instead, perhaps like this?

parent.children(:refetch => false).first

The issue I’m referring to is as follows:

Without to_a

parent.children.first.object_id
=> 70286000097380
parent.children.first.object_id
=> 70286008395440

With to_a

parent.children.to_a.first.object_id
=> 70286008787320
parent.children.to_a.first.object_id
=> 70286008787320
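
The difference above can be modeled with a small plain-Ruby sketch. This is only a toy, not ActiveRecord's implementation: `ToyAssociation`, `Child`, and `query_count` are made-up names. It assumes, as the output above suggests, that `first` on an unloaded collection builds a fresh object on every call, while `to_a` loads the collection once and caches the records.

```ruby
# Toy stand-in for an association; NOT ActiveRecord's implementation.
Child = Struct.new(:id)

class ToyAssociation
  attr_reader :query_count

  def initialize(rows)
    @rows = rows        # pretend these are rows in the database
    @loaded = nil       # cached records once the whole collection is loaded
    @query_count = 0
  end

  # Unloaded: run a fresh "query" and build a new object on every call.
  # Loaded: hand back the cached object.
  def first
    return @loaded.first if @loaded

    @query_count += 1
    Child.new(@rows.first)
  end

  # Load everything once, cache it, and return a plain Array.
  def to_a
    unless @loaded
      @query_count += 1
      @loaded = @rows.map { |id| Child.new(id) }
    end
    @loaded.dup
  end
end

children = ToyAssociation.new([1, 2, 3])

a = children.first
b = children.first
puts a.equal?(b)            # false: two separate objects

c = children.to_a.first
d = children.to_a.first
puts c.equal?(d)            # true: the cached object is reused

puts children.query_count   # 3: two for first, one for the initial to_a
```

In this model the second `to_a` costs nothing, which is the behavior people rely on when they reach for `to_a`.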

I’m having a little trouble understanding what you mean and I’d love it if you could tell me more.

I’ve sometimes wondered about an “airplane mode” API for ActiveRecord, something that would detach existing database connections so that you would get errors if you tried to make another query. Something like:

@connected_post = Post.first
@connected_post.comments
# => [#<Comment>, #<Comment>]

@detached_post = Post.first.detach
@detached_post.comments
# => ActiveRecord::TriedToMakeQueryOnDetachedRecordError

@preloaded_post = Post.includes(:comments).first.detach
@preloaded_post.comments
# => [#<Comment>, #<Comment>]
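
To make the idea concrete, here is a plain-Ruby sketch of how such an opt-in `detach` might behave. Everything in it is hypothetical: `ToyPost`, `preload_comments`, `detach`, and `DetachedRecordError` are illustration-only names, and the lambda stands in for a SQL query. It is not an ActiveRecord API.

```ruby
# Hypothetical sketch; none of these classes exist in ActiveRecord.
class DetachedRecordError < StandardError; end

class ToyPost
  def initialize(comment_source)
    @comment_source = comment_source  # callable standing in for a SQL query
    @comments = nil
    @detached = false
  end

  # Eagerly run the "query" while still connected.
  def preload_comments
    @comments ||= @comment_source.call
    self
  end

  # Forbid any further lazy loading on this instance.
  def detach
    @detached = true
    self
  end

  def comments
    return @comments if @comments
    raise DetachedRecordError, "comments not loaded before detach" if @detached

    @comments = @comment_source.call
  end
end

source = -> { ["first comment", "second comment"] }

connected = ToyPost.new(source)
connected.comments              # lazy load is allowed

preloaded = ToyPost.new(source).preload_comments.detach
preloaded.comments              # served from memory, no "query"

detached = ToyPost.new(source).detach
begin
  detached.comments
rescue DetachedRecordError => e
  puts e.message                # comments not loaded before detach
end
```

For what it's worth, Rails 6.1 and later ship a `strict_loading` mode that raises when an association is lazily loaded, which covers similar ground. The sketch above does not use it.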

Java Hibernate does something like this and, while there are a lot of things I find frustrating about Hibernate, the feeling of security that one isn’t going to have extra database hits can be really nice.

Is this something that speaks to the concern you have, or am I totally off base?

Super interesting concept, @Betsy_Haibel. Do you remember what this concept is called in Hibernate?

I would love such an “offline” ActiveRecord feature as well, to avoid N+1 queries.

Thanks for the reply, @Betsy_Haibel.

Your idea is speaking to the concern I have. The thing I find surprising with ActiveRecord is that

post = Post.find(1000)
post.comments

will return an Array

post = Post.find(1000)
post.comments.to_a

also returns an Array

So it seems that to_a needs to be more explicit, based on what developers are using it for.

Right now, I think developers are using to_a as a way to avoid the objects in the child collection changing when accessing post.comments multiple times.

post = Post.find(1000)
post.comments.first.object_id
post.comments.first.object_id
# The object_ids are different

The objects in the array get a new location in memory since they’re different objects. That’s what I think leads to confusion.

In the following case, post represents a row in the db, and accessing it multiple times does not change its object_id. The comments also represent rows in the db, but accessing them multiple times does produce different object_ids.

post = Post.find(1000)
post.title
post.title
post.comments.first
post.comments.first

ORMs can be confusing about when they decide to re-query, and that problem isn’t unique to ActiveRecord. The ORM makes decisions on when to re-query the db, so I believe it’s important to help the developer understand when the re-queries are going to happen, and how that affects the objects in memory. By naming methods in a more explicit way, such as using detach, the developer will have a better understanding of what ActiveRecord is doing.

Thanks for the discussion!

@mstrom81 maybe I’m missing something - doesn’t post.comments return ActiveRecord_Associations_CollectionProxy ?

@yoelblum I’m not sure it has a formal name within Hibernate. (But I don’t know Hibernate that deeply – luckily, I was able to persuade my current team to use Ruby instead of Java for greenfield projects shortly after I joined!)

But basically, in Hibernate, you need to execute all SQL queries within a “JPA session context.” These session contexts have a database connection that is automagically dependency injected (I have no idea how that part works). If you call a method that requires an association to be loaded within this context, it will work. You can also pre-fetch associations in this context.

Once you’re back out of the session context, your Hibernate model object has access to all of the associations that were loaded and/or pre-fetched within it. But you can’t fetch any other data. If you try, you get a LazyInitializationException. More on this here: LazyInitializationException - What it is and the best way to fix it

I think that going 100% Hibernate-style and forcing all queries to occur within a database connection context isn’t a very Rails-y approach. But I think that making that style opt-in (with a detach method or something similar) might be an interesting way to solve the “fear of accidental queries.”

@yoelblum In Rails 4+, yes. Before Rails 4, it doesn’t seem like it.

Related to that, though, is the idea that ActiveRecord shouldn’t require developers to understand intricacies of ActiveRecord as much as it seems to. Instead, it should behave like developers would expect it to behave.

When I think of what post.comments should return, it should return an Array of comments. Why make developers try to understand what ActiveRecord_Associations_CollectionProxy is?

@mstrom81 My understanding is a little different than yours. You’re saying:

The thing I find surprising with ActiveRecord is that

post = Post.find(1000)
post.comments

will return an Array

post = Post.find(1000)
post.comments.to_a

also returns an Array

My understanding of Rails is that the first version (just post.comments) returns an ActiveRecord::Relation object. ActiveRecord::Relation objects look like arrays when they’re output into IRB, and duck-type as arrays. However, they have abilities arrays don’t. ActiveRecord::Relation objects can have other query methods chained onto them, like so:

post.comments.where('created_at > ?', 3.days.ago)

When that chaining happens, Rails doesn’t run the SQL query until something else forces the Relation object to “resolve” into an array. I find this a really powerful feature!

In this context, to_a is just one of the methods that you can call to force the Relation into an array and force the query to run.
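
The deferred execution described above can be sketched in plain Ruby. This is a simplified model, not `ActiveRecord::Relation` itself: `ToyRelation`, its block-based `where`, and `queries_run` are invented for illustration, and the "query" is just a filter over an in-memory array.

```ruby
# Toy lazy relation; a simplified model, not ActiveRecord::Relation itself.
class ToyRelation
  include Enumerable

  def initialize(rows, filters = [], log = [])
    @rows = rows
    @filters = filters
    @log = log          # shared record of every "query" that actually ran
  end

  # Chaining only records the condition; nothing runs yet.
  def where(&condition)
    ToyRelation.new(@rows, @filters + [condition], @log)
  end

  # Iteration is what forces the "query" to execute.
  def each(&block)
    @log << "ran query with #{@filters.size} condition(s)"
    @rows.select { |row| @filters.all? { |f| f.call(row) } }.each(&block)
  end

  def queries_run
    @log.size
  end
end

comments = ToyRelation.new([1, 5, 10, 20])
recent = comments.where { |n| n > 3 }.where { |n| n < 15 }
puts recent.queries_run   # 0: chaining alone ran nothing

result = recent.to_a      # Enumerable#to_a calls each, which runs the "query"
puts result.inspect       # [5, 10]
puts recent.queries_run   # 1
```

Both conditions end up in a single "query", which is the payoff of keeping the object lazy until something needs the records.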

Here’s where I’m still confused:

You keep talking about memory location and object ids, which seem like pretty esoteric topics for the newbies you’re concerned about! Is there some specific bug you’re seeing because your use case relies on a constant memory location? Or are you trying to use these as a proxy for some other concern that I don’t quite understand yet?

Is it that the “array-like” nature of Relation objects is confusing for newbies who think they’re already arrays? Or is that guess totally off base?

I agree! Maybe other developers feel differently, but from what I can tell, ActiveRecord surprises people. I think a good way to avoid surprise is documenting the weirdness or, preferably, making the interface more flexible with explicit options.

Whoops, posted at the same time. This answers my question – thanks for clarifying your concern!

For me, the ability to chain additional query refinements onto a Relation or Association object is stupidly powerful, and coercing them into arrays prematurely would lose a lot of that power. Back in the Rails 2 days, we didn’t have lazy query chaining yet. I remember it being really hellish in Rails 2 to try to build specific queries that minimized database load & memory usage.

Do you think we could message the distinction between array-like AR objects and Arrays to newbies better? Would that help with your concern while still keeping the power of lazy querying?

Thanks for the discussion. The object_id topic is an example of the concern. It’s not often that bugs come about because of this, but I think it sort of violates the principle of least surprise.

I think it’s two things:

The lazy evaluation concept is powerful, but often times Rails developers aren’t fully aware of that concept.

Chaining query methods together is super useful, but needing to call to_a to force a query to run just doesn’t seem intuitive.

I think better documenting lazy evaluation and making it clearer that to_a forces a query to run would really help. Or perhaps aliasing to_a with something like run_query

Array-like AR objects are bound to cause confusion, so writing up some more documentation on that would be excellent as well.

Do you know what kinds of documentation you might find helpful there? e.g. Do you think it needs to be part of the AR querying Rails guide, or emphasized more in the api.rubyonrails.org documentation…? It’s okay if you’re not sure, I just want to make sure I’m capturing all of your ideas.

Also, since you have a pretty clear idea of the problem you’re seeing, do you have any time to work on that documentation?

Or perhaps aliasing to_a with something like run_query

There’s load, which does approximately that.

Specifically, load is the operation that various other things, including to_a, trigger when they actually need the records to be present. Unlike to_a, load returns self, so you still have a Relation that answers to all the extra methods; to_a follows the Ruby convention and converts the vaguely-array-ish Relation collection object into a totally standard Ruby Array.
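
The contrast between the two return values can be shown with a toy class. This is a sketch under stated assumptions, not the real Relation: `ToyLoadable` is a made-up name, and its `load`, `to_a`, and `loaded?` only mimic the behavior described above.

```ruby
# Toy model of load vs to_a; not the real ActiveRecord::Relation.
class ToyLoadable
  def initialize(rows)
    @rows = rows
    @records = nil
  end

  def loaded?
    !@records.nil?
  end

  # Run the "query" if needed and return self, so chaining still works.
  def load
    @records ||= @rows.dup
    self
  end

  # Load if needed, then return a plain Array copy of the records.
  def to_a
    load
    @records.dup
  end
end

relation = ToyLoadable.new(%w[a b])
puts relation.loaded?        # false

same = relation.load
puts same.equal?(relation)   # true: load hands back the relation itself
puts relation.loaded?        # true

array = relation.to_a
puts array.class             # Array
```

So in this model `load` is the "run the query now" step, and `to_a` is that step plus a conversion to a standard Array.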

(Some of these things may be less true for Rails 3.x, which I see you mentioned above… but I don’t think there’s much we can do to retroactively improve old versions.)

I think the best place for that documentation would be the AR querying Rails guide.

Sure, I could work on that documentation!

Yay! Is it obvious to you how to get started, or do you need a hand with that? Happy to help facilitate if there’s anything you need.

Thanks @matthewd. I think it would help to document the differences between the Rails versions. It wouldn’t make sense to change that code, but some more documentation might help, since I would guess a fair number of developers are still writing code for Rails 3 and 4 apps.

I think some of the confusion (at least for me) has been understanding when Active Record decides to run queries. As @Betsy_Haibel pointed out, detach is a concept that other ORMs have. The explicitness of that seems clearer than what ActiveRecord does, which can tend to leave people guessing about when queries will be run.

I think I know where to get started. Thanks for the discussion.
