Active Record associations for hierarchical structure with closure/bridge table

Our app contains a hierarchical/tree structure modeled using a “closure table” a.k.a “bridge table”, “graph of distances” (1). This allows descendants with multiple ancestors and vice versa (many to many relations), and selecting descendants/ancestors or modifying parent-child relationships in a single query. An example closure table is shown in (2). The app modelling is shown in (3). Parent-child relationships are modified (insert/delete statements) using Cartesian product joins. They can be modified using custom parents= and children= methods (the AR-built ancestors= and descendants= methods are not overwritten nor used).

How can this be cleanly integrated with Active Record?

In particular:

  • Insert/delete works when using the assignment operator (content.parents = ...), not other operators like content.parents << ... or supposedly content.children.add(...), etc). How can we overload these methods or at least ensure they don’t get used, as the default Active Record implementation is incorrect in this case?
  • Aren’t we forgetting some AR magic because of these method overrides? Can/should we override the code differently?
  • Is there any way to simplify all the defined associations (ancestors, ancestor_mappings, descendants, descendant_mappings, parents, parent_mappings, children, children_mappings)?

(1) sql - What are the options for storing hierarchical data in a relational database? - Stack Overflow

(2) Below an example for a hierarchy A->B->C. Descendants of A are obtained by selecting rows where ancestor is A, ancestors of B are obtained by selecting rows where descendant is B, parents of C are obtained by selecting rows where descendant is C and distance is 1, …

------------------------------------
| ancestor | descendant | distance |
------------------------------------
| A        | A          |        0 |
| A        | B          |        1 |
| A        | C          |        2 |
| B        | B          |        0 |
| B        | C          |        1 |
| C        | C          |        0 |
------------------------------------

(3)

class Content < ApplicationRecord
  has_many :ancestor_mappings,
           ->{ joins(:ancestor).order(distance: :desc).merge(Content.order(:position)) },
           class_name: 'ContentMapping', foreign_key: :descendant_id
  has_many :ancestors, through: :ancestor_mappings

  has_many :parent_mappings,
           ->{ joins(:ancestor).where(distance: 1).merge(Content.order(:position)) },
           class_name: 'ContentMapping', foreign_key: :descendant_id
  has_many :parents, through: :parent_mappings, source: :ancestor

  has_many :descendant_mappings,
           ->{ joins(:descendant).merge(Content.order(:position)) },
           class_name: 'ContentMapping', foreign_key: :ancestor_id
  has_many :descendants, through: :descendant_mappings
  
  has_many :child_mappings,
           ->{ joins(:descendant).where(distance: 1).merge(Content.order(:position)) },
           class_name: 'ContentMapping', foreign_key: :ancestor_id
  has_many :children, through: :child_mappings, source: :descendant

  def parents=(new_parents)
    new_parents = [new_parents].flatten
    (parents - new_parents).each { |p| ContentMapping.delete_mapping(p, self) }
    (new_parents - parents).each { |p| ContentMapping.insert_mapping(p, self) }
    reload
  end

  def children=(new_children)
    new_children = [new_children].flatten
    (children - new_children).each { |c| ContentMapping.delete_mapping(self, c) }
    (new_children - children).each { |c| ContentMapping.insert_mapping(self, c) }
    reload
  end
end

class ContentMapping < ApplicationRecord
  belongs_to :ancestor, class_name: 'Content'
  belongs_to :descendant, class_name: 'Content'
  validates :ancestor, :descendant, presence: true
  validates :distance, numericality: { greater_than_or_equal_to: 0 }, allow_nil: false

  class << self
    def insert_mapping(parent, child)
      run_sql(<<-SQL, parent_id: parent.id, child_id: child.id)
        INSERT INTO content_mappings(ancestor_id, descendant_id, distance, created_at, updated_at)
        SELECT m1.ancestor_id, m2.descendant_id, m1.distance+m2.distance+1,
               CURRENT_TIMESTAMP, CURRENT_TIMESTAMP
        FROM content_mappings m1, content_mappings m2
        WHERE m1.descendant_id=:parent_id AND m2.ancestor_id=:child_id
      SQL
    end

    def delete_mapping(parent, child)
      run_sql(<<-SQL, parent_id: parent.id, child_id: child.id)
        DELETE FROM content_mappings m
        USING content_mappings p, content_mappings c
        WHERE p.ancestor_id = m.ancestor_id AND c.descendant_id = m.descendant_id
        AND p.descendant_id = :parent_id AND c.ancestor_id = :child_id
      SQL
    end

    def run_sql(unsanitised_sql, **args)
      connection.select_value(sanitize_sql([unsanitised_sql, args]))
    end
  end
end

I’ve used the closure_tree gem in the past and it’s pretty easy to work with and supports quite a bit. Might be a good starting point to see what it supports in relation to what you’re looking for.

Thanks! This gem looks like a great learning resource. Unfortunately it can’t be used in its current state as it doesn’t support multiple parents:

Does this gem support multiple parents?

No. This gem’s API is based on the assumption that each node has either 0 or 1 parent.

The underlying closure tree structure will support multiple parents, but there would be many breaking-API changes to support it. I’m open to suggestions and pull requests.

ClosureTree::Model indeed has belongs_to and has_many relationships (withouth a through option), meaning that a single parent foreign key needs to be encoded in the model table.

Other observations:

  • We currently have an absolute ordering across all nodes (using Content.position) but would like (although we’re not sure how feasible this would be) a specific order for the descendants of each ancestor. It looks like the gem does not support that.
  • Based on the code in ClosureTree::HierarchyMaintenance, my understanding is that updates in the hierarchy table are made in an after_save hook after the default Active Record instructions are run, meaning that an extra SQL statement is made every time. For example when adding a child to a node, AR supposedly issues an insert statement to create a row representing the direct parent-child relationship, then the other needed rows (i.e the relationships with grandparents, grandchildren etc) are created in after_save. Conceptually, from a data integrity and from a performance point of view, it makes more sense to me to create all of these at once.

What a great concept, thanks for asking this question.

Not a exact answer to your question, but since a tree can be represented as a graph, you may find this alternative approach useful…

I have been using RedisGraph to represent tree/graph relations between records. Like the closure table approach, a trigger (AR callback) can be used to automatically keep the graph in sync with the SQL database. The Ruby client library is not quite complete and there is no OM so it does take some work to translate the graph query results into Ruby objects. Intro doc here: https://developer.redis.com/howtos/redisgraph/using-ruby

1 Like