I’m using rails 7.1 and postgresql 13 , and try to adopt upsert_all to my needs, and I noticed it uses CURRENT_TIMESTAMP
PG function to set a value for created_at/updated_at timestamps.
I believe it’s inconsistent to regular create
/update
behaviour, because it relies on database time rather than application time. I see at least 2 reasons why we shouldn’t do that:
CURRENT_TIMESTAMP
uses DB transaction start time, which is incorrect if upsert happens in the middle of long-running transaction. I believe timestamp should display exact time of call, e.g. to find it in app logs.- it’s impossible to test timestamps with time travelling approach, timestamps are always computer time, not stubbed time.
I did this quick&dirty monkeypatch in my app, and it seems to work:
module ActiveRecord
class InsertAll
attr_reader :current_timestamp
alias_method :old_initialize, :initialize
def initialize(model, inserts, on_duplicate:, update_only: nil, returning: nil, unique_by: nil, record_timestamps: nil)
@current_timestamp = Time.current
old_initialize(model, inserts, on_duplicate:, update_only:, returning:, unique_by:, record_timestamps:)
end
private
# Overridden connection.high_precision_current_timestamp by @current_timestamp
def timestamps_for_create
model.all_timestamp_attributes_in_model.index_with(current_timestamp)
end
class Builder
# Overridden connection.high_precision_current_timestamp by @current_timestamp
def touch_model_timestamps_unless(&block)
return "" unless update_duplicates? && record_timestamps?
model.timestamp_attributes_for_update_in_model.filter_map do |column_name|
if touch_timestamp_attribute?(column_name)
# only this SQL is overridden ----------------------------------------------------------------------------------------------------vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
"#{column_name}=(CASE WHEN (#{updatable_columns.map(&block).join(" AND ")}) THEN #{model.quoted_table_name}.#{column_name} ELSE #{connection.quote(insert_all.current_timestamp)} END),"
end
end.join
end
end
end
end
Calling
User.upsert_all([{ email: 'foo@bar1.com', country_id: 1 }], unique_by: %i[email])
Generated SQL query originally
INSERT INTO "users" ("email","country_id","created_at","updated_at") VALUES ('foo@bar.com', 1, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP) ON CONFLICT ("email") DO UPDATE SET updated_at=(CASE WHEN ("users"."country_id" IS NOT DISTINCT FROM excluded."country_id") THEN "users".updated_at ELSE CURRENT_TIMESTAMP END),"country_id"=excluded."country_id" RETURNING "id"
and with monkey patch:
INSERT INTO "users" ("email","country_id","created_at","updated_at") VALUES ('foo@bar1.com', 1, '2024-06-22 22:29:40.917937', '2024-06-22 22:29:40.917937') ON CONFLICT ("email") DO UPDATE SET updated_at=(CASE WHEN ("users"."country_id" IS NOT DISTINCT FROM excluded."country_id") THEN "users".updated_at ELSE '2024-06-22 22:29:40.917937' END),"country_id"=excluded."country_id" RETURNING "id"
Which is also works correctly.
Was there a reason why it was chosen to use DB functions instead of app time for timestamp? Is there a reason my monkey patch is incorrect?