[Proposal] Expose update_all Arel statement

Exposing the Arel statement built by ActiveRecord’s update_all would be useful for bulk operations.

Scenario 1: Updating multiple channel objects

In our project we bulk update the Channel table with each channel’s latest unread_count. This can be done by looping the channel and its latest unread_count and updating each channel. In these cases we don’t need any ActiveRecord callbacks; we just need the data updated.

Solution 1: record.update

hash.each{|channel, unread_count| channel.update(unread_count: unread_count) }

Pros: Easy to read and ActiveRecord abstracts the DB’s language

Problems: expensive for many records and 1 round trip to the DB for each record

Solution 2: Record.update_all with grouped counts

hash.each{|unread_count, channel_ids| Channel.where(id: channel_ids).update_all(unread_count: unread_count) }

Pros: ActiveRecord abstracts the DB’s language, similar counts could be groups so channels with the same count can be updated in bulk

Cons: Harder to read and worst case 1 round trip to the DB for each channel_id. Grouped updates much more complicated if multiple columns are being updated with multiple possible values

Solution 3: Manual SQL + Bulk Execute

statements = hash.map do |channel_id, unread_count|
  <<~SQL
    UPDATE "channels" SET "unread_conversation_count" = #{unread_count}, "updated_at" = '#{now}' WHERE "channels"."id" = #{channel_id};
  SQL
end
Channel.connection.execute(statements.join)

Pros: Less round trips to the DB. Multiple columns can be updated for that channel_id

Cons: Harder to read. SQL language might not be portable between DB types(not usable in a gem). DB rules like namespacing, escaping, and SQL injection not automatically handled by Arel.

Proposed Solution: Bulk execute statements created by update_all

statements = hash.map do |unread_count, channel_ids|
  Channel.where(id: channel_ids).update_all_statement(unread_count: unread_count).to_sql
end
Channel.connection.execute(statements.join)

Pros: Easy to read. Arel handles all the details of creating correct SQL. Less round trips to the DB. Multiple columns can be updated for those channel_ids.

Cons: Not available in ActiveRecord, yet…

Scenario 2: counter_culture gem

Counter caches can for various reasons become inaccurate. Querying for the correct values can be done in a batched SQL query but the updates to each record have to be done one by one. Below is a simplified version of the method they use to update the records (see the code at counter_culture/reconciler.rb at master · magnusvk/counter_culture · GitHub). Look at the last line in the method. Each record (possibly millions of them) get updated one at a time using update_all. They do this so Arel takes care of constructing the correct SQL statement.

def update_count_for_batch(column_name, records)
  ActiveRecord::Base.transaction do
    records.each do |record|
      count = record.read_attribute('count') || 0
      next if record.read_attribute(column_name) == count

      updates = ["#{column_name} = #{count}"]
      timestamp_columns.each do |timestamp_column|
        updates << "#{timestamp_column} = '#{current_time.to_formatted_s(:db)}'"
      end

      relation_class.where(relation_class.primary_key => record.send(relation_class.primary_key)).update_all(updates.join(', '))
    end
  end
end

Proposal

If ActiveRecord could expose the update_all Arel statement the above SQL updates could get batched. The change to the ActiveRecord codebase would be minimal. Rename update_all to update_all_statement and create a new update_all method that calls update_all_statement.

Latest Rails Code

def update_all(updates)
  raise ArgumentError, "Empty list of attributes to change" if updates.blank?

  if updates.is_a?(Hash)
    if klass.locking_enabled? &&
        !updates.key?(klass.locking_column) &&
        !updates.key?(klass.locking_column.to_sym)
      attr = table[klass.locking_column]
      updates[attr.name] = _increment_attribute(attr)
    end
    values = _substitute_values(updates)
  else
    values = Arel.sql(klass.sanitize_sql_for_assignment(updates, table.name))
  end

  arel = eager_loading? ? apply_join_dependency.arel : build_arel
  arel.source.left = table

  group_values_arel_columns = arel_columns(group_values.uniq)
  having_clause_ast = having_clause.ast unless having_clause.empty?
  stmt = arel.compile_update(values, table[primary_key], having_clause_ast, group_values_arel_columns)
  klass.connection.update(stmt, "#{klass} Update All").tap { reset }
end

Would become

def update_all_statement(updates)
  raise ArgumentError, "Empty list of attributes to change" if updates.blank?

  if updates.is_a?(Hash)
    if klass.locking_enabled? &&
        !updates.key?(klass.locking_column) &&
        !updates.key?(klass.locking_column.to_sym)
      attr = table[klass.locking_column]
      updates[attr.name] = _increment_attribute(attr)
    end
    values = _substitute_values(updates)
  else
    values = Arel.sql(klass.sanitize_sql_for_assignment(updates, table.name))
  end

  arel = eager_loading? ? apply_join_dependency.arel : build_arel
  arel.source.left = table

  group_values_arel_columns = arel_columns(group_values.uniq)
  having_clause_ast = having_clause.ast unless having_clause.empty?
  arel.compile_update(values, table[primary_key], having_clause_ast, group_values_arel_columns)
end

def update_all(updates)
  stmt = update_all_statement(updates)
  klass.connection.update(stmt, "#{klass} Update All").tap { reset }
end