Retry logic for killed Resque Workers

geoffyoungs · October 21, 2023, 7:41pm

Hi,

I’ve opened an issue (“Killed Resque jobs cannot be retried using ActiveJob”, rails/rails issue #49734) and a PR (“Catch and handle Resque::DirtyExit exceptions in ActiveJob”, rails/rails pull request #49735) to fix it.

We run ActiveJob with Resque in production on k8s and find, especially after deploys, that the code that reaps old workers kills them too quickly for the shutdown to be handled gracefully.

The failed jobs show up in the Resque failure queue (visible in the resque-web interface), but because the exception is raised outside the job itself (in a different worker that is still active), it is not handled by the existing ActiveJob exception handler.
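
To illustrate the point above about the exception being raised outside the job, here is a minimal plain-Ruby sketch. It does not use Resque or Rails at all: it only simulates a forked worker being killed with SIGKILL, and `run_job_in_child` is a hypothetical helper name I made up for the example. It assumes a platform where `fork` is available (Linux or macOS). The idea is that the `rescue` inside the job never runs, so only another process can notice the failure.

```ruby
# Plain-Ruby simulation only; this is NOT Resque's implementation.
# run_job_in_child is a hypothetical helper for illustration.
def run_job_in_child
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    begin
      writer.puts "started"
      writer.flush
      sleep 30 # stands in for a long-running job
    rescue StandardError => e
      # An in-job handler like this never runs on SIGKILL,
      # because the process is terminated without unwinding.
      writer.puts "rescued #{e.class}"
      writer.flush
    end
  end

  writer.close
  reader.gets                  # wait until the "job" has started
  Process.kill("KILL", pid)    # simulate the worker being reaped too quickly
  _, status = Process.wait2(pid)
  leftover = reader.read       # anything the child wrote after "started"
  reader.close

  { signaled: status.signaled?, signal: status.termsig, child_output: leftover }
end

result = run_job_in_child
puts "killed by signal: #{result[:signaled]}"
puts "child wrote after start: #{result[:child_output].inspect}"
```

Only the parent process sees that the child died from a signal; the child never gets a chance to run its own error handling, which is why a handler that lives inside the job cannot catch this kind of failure.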

I’ve added a Minitest test that demonstrates the issue (against rails/rails) and shows how the PR fixes it, but I’m not sure what the best approach to testing this in Rails is, as it requires multiple processes and starting a redis-server to demonstrate the issue.

Is this issue so niche as to be outside the scope of the ActiveJob adapter or not?

If not, any thoughts on my proposed fix/testing?

Thanks,

Geoff.
