Proposal: has_many_ids — association macro for PostgreSQL array columns

Hey all,

I’ve been working on a feature I’d like to propose for Rails core and wanted to get feedback before opening a PR.

The pattern

A pretty common denormalization in PostgreSQL apps is storing associated record IDs directly in an array column on the owner:

# posts table has: tag_ids integer[], default: '{}'
class Post < ActiveRecord::Base
  # today you'd do something like:
  def tags
    Tag.where(id: tag_ids)
  end
end

This works, but you lose preloading (includes), so you end up with N+1s or hand-rolling preload logic. Every app I’ve seen do this ends up with slightly different bespoke solutions.

What I’m proposing

A new has_many_ids macro that gives you a read-only association proxy backed by an array column:

class Post < ActiveRecord::Base
  # Auto-detects the tag_ids integer[] column
  has_many_ids :tags

  # Explicit column name
  has_many_ids :highlighted_tags, class_name: "Tag", column: :highlighted_tag_ids

  # Scopes work
  has_many_ids :active_tags, -> { where(active: true) }, column: :tag_ids
end
post.tags           # => SELECT * FROM tags WHERE id = ANY($1) with [post.tag_ids]
post.tag_ids        # => [1, 2, 3] (reads the column)
post.tag_ids = [4, 5]  # writes the column, resets the association

Post.includes(:tags)  # preloads in 2 queries, no N+1

Why a separate macro instead of a has_many option

Earlier iterations tried adding an array_key: option to has_many, but this association doesn’t fulfill the has_many contract. There are no joins, no build/create/<</delete, no dependent:. Grafting it onto has_many meant scattering if array_key? checks throughout the association internals (not great).

A separate macro lets it have its own builder, reflection, and association class (HasManyIdsAssociation < HasManyAssociation) without touching any existing code paths. It follows the same pattern as has_many :through, which has its own HasManyThroughAssociation.

What it does

  • Loading uses WHERE id = ANY($1) — single bind parameter, statement-cacheable regardless of array length
  • includes / preload work, properly expands each owner’s array into the preloader’s key map
  • If your association is :tags, it looks for a tag_ids array column automatically. Or specify column: explicitly
  • Works with integer[] and bigint[] (type inferred from the column)

What it intentionally does not do

  • No build, create, <<, delete — raises HasManyIdsReadonlyAssociationError with a helpful message pointing you to update the array column directly
  • No joins / eager_load — raises HasManyIdsJoinNotSupportedError (suggests preload / includes)
  • No dependent: — this is a read-only view over a denormalized column, not an ownership relationship
  • No counter_cache, inverse_of, through, as: (polymorphic)

The idea is to keep the scope very narrow for a v1. This is a convenience for reading a denormalized column through the association interface, nothing more.

Implementation

It follows the same pattern as has_many :through — own builder, reflection, and association class. Existing has_many codepaths are untouched.

Open questions

has_many_ids felt right because the column stores IDs and the macro name hints at what’s different. Open to other suggestions.

I have a working branch, if requested to open the PR.

2 Likes

I don’t have strong opinions on the feature but I’d the ids are in the table where the macro is called, then it’s closer to belongs_to than has_many. Maybe belongs_to_many

4 Likes

I’ve come back to this a number of times. Really like the name belongs_to_many. And I’m feeling like this concept is not an extension anymore, but more of a core concept.

In order for ancestry (materialized path) to find child nodes, you need to use child_model.ancestry LIKE "#{model.ancestry}/%". Implementation was very tricky because not only does it need a belongs_to_many, but it also needs an association between the parent and children that is not pk, fk based.

The compound key effort was probably difficult, in part for the 4 ways you need to generate SQL, gut mostly for the preloader/associations associating children and parent.

So for my LIKE implementation, I did get preloads and joins working. Think it works for ancestry being a string, ltree, and int[], though I was mostly distracted by a bad interface more than getting it to work for many different forms of the local “ids”.

How far have you gotten with this?