Proposal: has_many_ids — association macro for PostgreSQL array columns

Hey all,

I’ve been working on a feature I’d like to propose for Rails core and wanted to get feedback before opening a PR.

The pattern

A pretty common denormalization in PostgreSQL apps is storing associated record IDs directly in an array column on the owner:

# posts table has: tag_ids integer[], default: '{}'
class Post < ActiveRecord::Base
  # today you'd do something like:
  def tags
    Tag.where(id: tag_ids)
  end
end

This works, but you lose preloading (includes), so you end up with N+1s or hand-rolling preload logic. Every app I’ve seen do this ends up with slightly different bespoke solutions.

What I’m proposing

A new has_many_ids macro that gives you a read-only association proxy backed by an array column:

class Post < ActiveRecord::Base
  # Auto-detects the tag_ids integer[] column
  has_many_ids :tags

  # Explicit column name
  has_many_ids :highlighted_tags, class_name: "Tag", column: :highlighted_tag_ids

  # Scopes work
  has_many_ids :active_tags, -> { where(active: true) }, column: :tag_ids
end
post.tags           # => SELECT * FROM tags WHERE id = ANY($1) with [post.tag_ids]
post.tag_ids        # => [1, 2, 3] (reads the column)
post.tag_ids = [4, 5]  # writes the column, resets the association

Post.includes(:tags)  # preloads in 2 queries, no N+1

Why a separate macro instead of a has_many option

Earlier iterations tried adding an array_key: option to has_many, but this association doesn’t fulfill the has_many contract. There are no joins, no build/create/<</delete, no dependent:. Grafting it onto has_many meant scattering if array_key? checks throughout the association internals (not great).

A separate macro lets it have its own builder, reflection, and association class (HasManyIdsAssociation < HasManyAssociation) without touching any existing code paths. It follows the same pattern as has_many :through, which has its own HasManyThroughAssociation.

What it does

  • Loading uses WHERE id = ANY($1) — single bind parameter, statement-cacheable regardless of array length
  • includes / preload work, properly expands each owner’s array into the preloader’s key map
  • If your association is :tags, it looks for a tag_ids array column automatically. Or specify column: explicitly
  • Works with integer[] and bigint[] (type inferred from the column)

What it intentionally does not do

  • No build, create, <<, delete — raises HasManyIdsReadonlyAssociationError with a helpful message pointing you to update the array column directly
  • No joins / eager_load — raises HasManyIdsJoinNotSupportedError (suggests preload / includes)
  • No dependent: — this is a read-only view over a denormalized column, not an ownership relationship
  • No counter_cache, inverse_of, through, as: (polymorphic)

The idea is to keep the scope very narrow for a v1. This is a convenience for reading a denormalized column through the association interface, nothing more.

Implementation

It follows the same pattern as has_many :through — own builder, reflection, and association class. Existing has_many codepaths are untouched.

Open questions

has_many_ids felt right because the column stores IDs and the macro name hints at what’s different. Open to other suggestions.

I have a working branch, if requested to open the PR.

2 Likes

I don’t have strong opinions on the feature but I’d the ids are in the table where the macro is called, then it’s closer to belongs_to than has_many. Maybe belongs_to_many

4 Likes

I’ve come back to this a number of times. Really like the name belongs_to_many. And I’m feeling like this concept is not an extension anymore, but more of a core concept.

In order for ancestry (materialized path) to find child nodes, you need to use child_model.ancestry LIKE "#{model.ancestry}/%". Implementation was very tricky because not only does it need a belongs_to_many, but it also needs an association between the parent and children that is not pk, fk based.

The compound key effort was probably difficult, in part for the 4 ways you need to generate SQL, gut mostly for the preloader/associations associating children and parent.

So for my LIKE implementation, I did get preloads and joins working. Think it works for ancestry being a string, ltree, and int[], though I was mostly distracted by a bad interface more than getting it to work for many different forms of the local “ids”.

How far have you gotten with this?

Hi Aaron.

Did you post this to GitHub? I didn’t see has_many_ids or belongs_to_many at arianf (Arian Faurtosh) · GitHub

I want to play with this for my gem. I’ve implemented things like this before, so no problem, but I wanted to see if you’d already had a stab at it.

may also want to look at:

–Keenan

keenan@thebrocks.net

Hi All!

This has long been supported by my PG gem.

When I implemented it, the worst part was convincing Rails to balance HAS_MANY relationships properly alongside BELONGS_TO.

I’d recommend you check it out, as it has everything you might need for this behavior. Plus it works really well with all the other features of the gem. And if you only want this one, you can disable the others.

Let me know if I can help in any other way!

Thanks Carlos Silva