Strategy for large mailing job with ActiveJob

I have a Rails 6.0 app running on Heroku. I use Heroku Scheduler to schedule a Sidekiq job that sends a daily email to ~20,000 users. The email is custom for each user, though there are common components that could be shared/cached for efficiency. I have two Sidekiq workers, each running with concurrency 3 (-c 3).

I’ve tried several strategies already and none of them feel right, my Google-Fu fails me and so I thought I’d ask here. How would you approach this?

Two strategies that I’ve tried:

  1. Single job for everything

    • scheduler runs a rake task that schedules OneBigJob
    • OneBigJob iterates through the 20,000 users, generates an email for each, and sends it.
  2. Lots of little jobs

    • scheduler runs a rake task that schedules MainJob
    • MainJob iterates through the 20,000 users and enqueues a LittleJob for each
    • LittleJob generates that user’s email and sends it
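For a feel of what strategy 2 looks like, here is a plain-Ruby sketch of the fan-out shape. QUEUE and the job classes are stand-ins for Sidekiq/ActiveJob, and the user ids are made up; in the real app MainJob would call LittleJob.perform_later(user.id) instead of pushing to an array.

```ruby
# Stand-in for the Sidekiq/Redis queue.
QUEUE = []

class LittleJob
  # Takes an id, not a User object: job arguments should be small,
  # serializable values that are looked up again at perform time.
  def self.perform_later(user_id)
    QUEUE << [name, user_id]
  end

  def perform(user_id)
    # find the user, render their email, deliver it
  end
end

class MainJob
  def perform(user_ids)
    user_ids.each { |id| LittleJob.perform_later(id) }
  end
end

MainJob.new.perform([1, 2, 3])
QUEUE.length  # => 3
```

The important property is that each LittleJob is tiny and independent: if one fails, Sidekiq retries just that user, not the whole run.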

Strategy 1 has some efficiencies because some of the content is shared, but it is running up against memory limits in my workers.

I am new-ish to ActiveJob & Sidekiq and wonder if there are any downsides to scheduling 20,000 jobs.

How would you approach this?

Option 2 is the preferred way to do it in my opinion.

As a rule of thumb, you want your background jobs to be idempotent.

In practice this means writing jobs that can fail and retry without causing unwanted behavior (in your case, sending the same email to a user twice).

Your first job’s task is to schedule all 20,000 emails that need to be sent (probably by calling your mailer with deliver_later). After each email is scheduled, record somewhere that that user’s email has already been scheduled, in case your job fails and needs to retry.

Something like:

User.where(todays_email_scheduled: false).each do |user|
  # schedule the email, e.g. DailyMailer.with(user: user).daily_email.deliver_later
  # (the mailer and method names here are placeholders for your own)
  user.update(todays_email_scheduled: true)
end

Do a bit of research into batches as well to help with memory issues.

https://api.rubyonrails.org/classes/ActiveRecord/Batches.html
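find_each from that page is the usual fix: it loads records in fixed-size batches (1,000 by default) instead of all at once, while still yielding one record at a time. Conceptually it behaves like each_slice over the ids; a stand-alone sketch in plain Ruby, with no ActiveRecord:

```ruby
# Conceptual model of ActiveRecord's find_each: fetch fixed-size
# batches, yield one record at a time. Plain ids stand in for User
# records; the real call would be
#   User.where(...).find_each(batch_size: 1000) { |user| ... }
def find_each_sim(ids, batch_size: 1000)
  ids.each_slice(batch_size) do |batch|
    batch.each { |id| yield id }  # real find_each yields model instances
  end
end

seen = []
find_each_sim((1..2500).to_a) { |id| seen << id }
seen.length  # => 2500, but never more than 1000 ids held per batch
```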

Also, some email APIs limit how many requests you can make per second or minute, so make sure you investigate that and maybe schedule the emails with a bit of delay.
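For instance, assuming a hypothetical limit of 100 emails per minute, each email’s delay can be derived from its position in the loop; in Rails that delay would be passed as deliver_later(wait: delay.seconds). A back-of-the-envelope sketch (the rate limit is an assumption, not your provider’s actual number):

```ruby
# Spread 20,000 sends out so no minute exceeds the (assumed) limit.
RATE_PER_MINUTE = 100

# Seconds to wait before sending the email at this position.
def delay_for(index)
  (index / RATE_PER_MINUTE) * 60
end

delay_for(0)       # => 0      (first batch goes immediately)
delay_for(250)     # => 120    (lands in the third minute)
delay_for(19_999)  # => 11940  (the run finishes in ~3.3 hours)
```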

And be sure to research how a bulk send like this affects your domain’s spam score. You might want to add special metadata (headers) to your emails to make sure they’re not flagged as spam.

I hope this helps!

This is exactly the issue that caused me to revisit this.

We’ve been sending emails for years, but we’ve recently moved to Heroku where memory is more constrained (I’m still trying to fit in 512 MB), and this became an issue with recent user growth.

My only concern was whether there was any issue with creating so many jobs in one go.

I’ll bite the bullet and schedule separate jobs for each user. Thanks for the advice.

this became an issue with recent user growth.

That’s a good problem to have then!

If your first job failed, you might be able to fix it using find_in_batches. Loading 20,000 records into memory to iterate through them is very expensive. Batches let you cycle through the collection in smaller groups, which greatly reduces your memory usage.
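find_in_batches differs from find_each in that it yields whole groups rather than single records, so at most one batch’s worth of users is in memory at a time. A plain-Ruby sketch of the shape (ids stand in for User records; no ActiveRecord involved):

```ruby
# Conceptual model of ActiveRecord's find_in_batches: the block
# receives an array per batch, never the whole 20,000-row result set.
# The real call would be
#   User.where(...).find_in_batches(batch_size: 1000) { |users| ... }
def find_in_batches_sim(ids, batch_size: 1000)
  ids.each_slice(batch_size) { |batch| yield batch }
end

batch_sizes = []
find_in_batches_sim((1..2500).to_a) { |batch| batch_sizes << batch.length }
batch_sizes  # => [1000, 1000, 500]
```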