Announcing sidekiq-iteration - a gem that makes your long-running sidekiq jobs interruptible and resumable by design

fatkodima · November 2, 2022, 4:33pm

Hello everyone

I am publishing a new gem, sidekiq-iteration (GitHub: fatkodima/sidekiq-iteration), which makes your long-running Sidekiq jobs interruptible and resumable by design. For those familiar with job-iteration from Shopify (GitHub: Shopify/job-iteration), this is an adaptation of that gem for use with raw Sidekiq (no ActiveJob).

Motivation

Imagine the following job:

class NotifyUsersJob
  include Sidekiq::Job

  def perform
    User.find_each do |user|
      user.notify_about_something
    end
  end
end

The job runs fairly quickly when you only have a hundred User records. But as the number of records grows, it takes longer for the job to iterate over all Users. Eventually, there will be millions of records to iterate over and the job will end up taking hours or even days.

With frequent deploys and worker restarts, such a job will either be lost or restarted from the beginning. When it is restarted, some records (especially those at the beginning of the relation) will be processed more than once.

Solution

sidekiq-iteration helps to make this job interruptible and resumable. It will look like this:

class NotifyUsersJob
  include Sidekiq::Job
  include SidekiqIteration::Iteration

  def build_enumerator(cursor:)
    active_record_records_enumerator(User.all, cursor: cursor)
  end

  def each_iteration(user)
    user.notify_about_something
  end
end

each_iteration will be called for each User record in the User.all relation. The relation will be ordered by primary key, exactly as find_each does. Iteration hooks into Sidekiq out of the box to support graceful interruption. No extra configuration is required.
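To make the interrupt-and-resume idea concrete, here is a minimal plain-Ruby sketch of the general technique. It is not the gem's implementation: Record, run_iteration and the should_stop callback are hypothetical names made up for illustration, and an in-memory array stands in for the database table. The point is only that a run remembers the primary key of the last processed record, and a later run picks up after it.

```ruby
# Hypothetical stand-in for a database row; not part of sidekiq-iteration.
Record = Struct.new(:id)

# Processes records whose id is greater than `cursor`, in primary-key order.
# `should_stop` is a hypothetical callback simulating a worker shutdown check.
# Returns the ids processed in this run, the cursor to persist, and whether
# the whole collection was finished.
def run_iteration(records, cursor:, should_stop:)
  processed = []
  remaining = records.select { |r| cursor.nil? || r.id > cursor }.sort_by(&:id)

  remaining.each do |record|
    # Check for interruption before each record, so work stops on a clean boundary.
    if should_stop.call(processed)
      return { processed: processed, cursor: cursor, finished: false }
    end

    processed << record.id # the per-record work would happen here
    cursor = record.id     # remember how far we got
  end

  { processed: processed, cursor: cursor, finished: true }
end

records = (1..10).map { |id| Record.new(id) }

# First run: simulate a shutdown after four records have been processed.
first_run = run_iteration(records, cursor: nil, should_stop: ->(done) { done.size >= 4 })

# Second run: resume from the saved cursor and run to completion.
second_run = run_iteration(records, cursor: first_run[:cursor], should_stop: ->(_done) { false })
```

In this sketch no record is processed twice across the two runs, because the second run only selects ids greater than the saved cursor.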

See the gem documentation for more details and examples of usage.

john-999 · November 5, 2022, 1:14pm

Nice!

IMO, the “makes your jobs interruptible and resumable” part should be integrated into ActiveJob.

collimarco · November 7, 2022, 11:47pm

Does it use PostgreSQL cursors in order to avoid repeating the query on the database? Or does it repeat the (maybe complex) query on every iteration using “ID greater than”?

fatkodima · November 8, 2022, 9:17pm

It uses the same approach as in_batches (basically the “ID greater than” that you mentioned). It has pretty decent performance, which was also recently improved quite a bit for whole-table batching in Rails pull request #45414, “Optimize Active Record batching for whole table iterations” (rails/rails).
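As a rough illustration of the “ID greater than” approach, here is a plain-Ruby sketch. It is not the actual code of in_batches or of the gem: Row, fetch_batch and each_batch are hypothetical names, an in-memory array stands in for the table, and ids are assumed to be positive integers. Each batch repeats the query with a condition on the last id seen, instead of holding a database cursor open.

```ruby
# Hypothetical stand-in for a table row.
Row = Struct.new(:id, :name)

# Simulates: SELECT * FROM rows WHERE id > after_id ORDER BY id LIMIT limit
def fetch_batch(table, after_id:, limit:)
  table.select { |row| row.id > after_id }.sort_by(&:id).first(limit)
end

# Yields successive batches, re-running the "query" with the last seen id.
def each_batch(table, batch_size:)
  last_id = 0 # assumption: ids are positive integers
  loop do
    batch = fetch_batch(table, after_id: last_id, limit: batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last.id # the next query starts after this id
  end
end

# Rows are deliberately stored out of order; the ordering comes from the query.
table = [5, 1, 9, 3, 7].map { |id| Row.new(id, "user#{id}") }

batches = []
each_batch(table, batch_size: 2) { |batch| batches << batch.map(&:id) }
```

Because only the last id needs to be remembered between queries, the same value can serve as the cursor that a resumed job starts from.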
