Exposing the Arel statement built by ActiveRecord’s update_all
would be useful for bulk operations.
Scenario 1: Updating multiple channel objects
In our project we bulk update the Channel table with each channel’s latest unread_count. This can be done by looping the channel and its latest unread_count and updating each channel. In these cases we don’t need any ActiveRecord callbacks; we just need the data updated.
Solution 1: record.update
hash.each{|channel, unread_count| channel.update(unread_count: unread_count) }
Pros: Easy to read and ActiveRecord abstracts the DB’s language
Problems: expensive for many records and 1 round trip to the DB for each record
Solution 2: Record.update_all with grouped counts
hash.each{|unread_count, channel_ids| Channel.where(id: channel_ids).update_all(unread_count: unread_count) }
Pros: ActiveRecord abstracts the DB’s language, similar counts could be groups so channels with the same count can be updated in bulk
Cons: Harder to read and worst case 1 round trip to the DB for each channel_id. Grouped updates much more complicated if multiple columns are being updated with multiple possible values
Solution 3: Manual SQL + Bulk Execute
statements = hash.map do |channel_id, unread_count|
<<~SQL
UPDATE "channels" SET "unread_conversation_count" = #{unread_count}, "updated_at" = '#{now}' WHERE "channels"."id" = #{channel_id};
SQL
end
Channel.connection.execute(statements.join)
Pros: Less round trips to the DB. Multiple columns can be updated for that channel_id
Cons: Harder to read. SQL language might not be portable between DB types(not usable in a gem). DB rules like namespacing, escaping, and SQL injection not automatically handled by Arel.
Proposed Solution: Bulk execute statements created by update_all
statements = hash.map do |unread_count, channel_ids|
Channel.where(id: channel_ids).update_all_statement(unread_count: unread_count).to_sql
end
Channel.connection.execute(statements.join)
Pros: Easy to read. Arel handles all the details of creating correct SQL. Less round trips to the DB. Multiple columns can be updated for those channel_ids.
Cons: Not available in ActiveRecord, yet…
Scenario 2: counter_culture gem
Counter caches can for various reasons become inaccurate. Querying for the correct values can be done in a batched SQL query but the updates to each record have to be done one by one. Below is a simplified version of the method they use to update the records (see the code at https://github.com/magnusvk/counter_culture/blob/master/lib/counter_culture/reconciler.rb#L164). Look at the last line in the method. Each record (possibly millions of them) get updated one at a time using update_all. They do this so Arel takes care of constructing the correct SQL statement.
def update_count_for_batch(column_name, records)
ActiveRecord::Base.transaction do
records.each do |record|
count = record.read_attribute('count') || 0
next if record.read_attribute(column_name) == count
updates = ["#{column_name} = #{count}"]
timestamp_columns.each do |timestamp_column|
updates << "#{timestamp_column} = '#{current_time.to_formatted_s(:db)}'"
end
relation_class.where(relation_class.primary_key => record.send(relation_class.primary_key)).update_all(updates.join(', '))
end
end
end
Proposal
If ActiveRecord could expose the update_all Arel statement the above SQL updates could get batched. The change to the ActiveRecord codebase would be minimal. Rename update_all
to update_all_statement
and create a new update_all
method that calls update_all_statement
.
Latest Rails Code
def update_all(updates)
raise ArgumentError, "Empty list of attributes to change" if updates.blank?
if updates.is_a?(Hash)
if klass.locking_enabled? &&
!updates.key?(klass.locking_column) &&
!updates.key?(klass.locking_column.to_sym)
attr = table[klass.locking_column]
updates[attr.name] = _increment_attribute(attr)
end
values = _substitute_values(updates)
else
values = Arel.sql(klass.sanitize_sql_for_assignment(updates, table.name))
end
arel = eager_loading? ? apply_join_dependency.arel : build_arel
arel.source.left = table
group_values_arel_columns = arel_columns(group_values.uniq)
having_clause_ast = having_clause.ast unless having_clause.empty?
stmt = arel.compile_update(values, table[primary_key], having_clause_ast, group_values_arel_columns)
klass.connection.update(stmt, "#{klass} Update All").tap { reset }
end
Would become
def update_all_statement(updates)
raise ArgumentError, "Empty list of attributes to change" if updates.blank?
if updates.is_a?(Hash)
if klass.locking_enabled? &&
!updates.key?(klass.locking_column) &&
!updates.key?(klass.locking_column.to_sym)
attr = table[klass.locking_column]
updates[attr.name] = _increment_attribute(attr)
end
values = _substitute_values(updates)
else
values = Arel.sql(klass.sanitize_sql_for_assignment(updates, table.name))
end
arel = eager_loading? ? apply_join_dependency.arel : build_arel
arel.source.left = table
group_values_arel_columns = arel_columns(group_values.uniq)
having_clause_ast = having_clause.ast unless having_clause.empty?
arel.compile_update(values, table[primary_key], having_clause_ast, group_values_arel_columns)
end
def update_all(updates)
stmt = update_all_statement(updates)
klass.connection.update(stmt, "#{klass} Update All").tap { reset }
end