Problem: Code Reloading and Exclusive Locks

I’ve got an application with a controller action that looks something like this:

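def my_action
  model = Record.find(params[:id])
  input_html = render_to_string(…)

  # Write the rendered HTML to a temporary file for the external tool.
  input = Tempfile.new(['input-file', '.html'])
  input.puts(input_html)
  input.close

  # Shell out and block this request thread until ./bin/process exits.
  render plain: `./bin/process #{input.path}`
end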

So long as ./bin/process returns in a timely manner, we typically haven’t had a problem with this code. However, we have recently run into a particular confluence of situations that makes this code more problematic.

  • If input_html contains links to related assets…
  • … and requests for those assets are dispatched to the Rails server…
  • … and those requests pass through the ActionDispatch::Executor middleware…
  • … and the Rails server is configured with cache_classes = false
  • … then the request to my_action will hang until ./bin/process aborts.

In the specific case we’re encountering, the rendered HTML contains embedded ActiveStorage images, which ./bin/process must load to complete successfully. Requests for those assets need to go through the Rails router, but are being blocked in ActionDispatch::Executor which is waiting for an exclusive lock so that it can unload (and later reload) the application classes. Since ./bin/process is still executing, that lock cannot be granted, and the application is deadlocked.

The Threading and Code Execution guide mentions the ActiveSupport::Dependencies.interlock.permit_concurrent_loads method as a potential solution to similar or related problems, but it provides no relief for this particular one.

Practically speaking, this is only an issue for a) recursive Rails requests b) that utilize the Rails router c) in the development environment. Even so, it would be nice to have a solution for this — does anyone have any insight they can share with me?

Interesting one!

For that to happen, a monitored file should have been edited while bin/process runs, right? If code reloading is not triggered, a recursive call should work like a regular parallel call does.

@fxn Thanks for the response.

In my case, bin/process doesn’t actually touch any files outside of /tmp/ … but your assertion that code reloading is triggered made me re-evaluate some of my assumptions about what was going on.

After digging into what causes a reload to trigger, and logging Rails.application.reloaders just prior to triggering the recursive call, I was able to find an unexpected culprit — a manually created EventedFileUpdateChecker.

This “reloader” was watching a set of directories for changes, and after Puma forked its worker process, its updated? flag was naturally set to true. However, my naïve expectation that the ActionDispatch::Reloader middleware would execute that checker (and clear the flag) proved to be unfounded. [The buggy behavior had gone unnoticed, since the code in the callback was also the code that implicitly ran when the class in question was loaded.]
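To illustrate why a checker like that keeps reporting a pending reload, here is a minimal plain-Ruby sketch. StickyChecker and run_checker_demo are hypothetical names; the class mimics only the updated?/execute contract of a file update checker and is not the real EventedFileUpdateChecker. It assumes the flag starts out set, as I observed after the fork.

```ruby
# Hypothetical stand-in for a file update checker. It mimics only the
# updated?/execute contract and does not watch any real files.
class StickyChecker
  def initialize(&callback)
    @callback = callback
    @updated = true # assumption: the flag is already set, as seen after the fork
  end

  # Reports whether a change is pending. Checking does not clear the flag.
  def updated?
    @updated
  end

  # Clears the flag and runs the callback.
  def execute
    @updated = false
    @callback.call
  end
end

def run_checker_demo
  calls = 0
  checker = StickyChecker.new { calls += 1 }

  # Nothing calls execute here, so every check reports a pending change.
  stale_checks = Array.new(3) { checker.updated? }

  # Only an explicit execute clears the flag.
  checker.execute

  { stale_checks: stale_checks, after_execute: checker.updated?, calls: calls }
end
```

If nothing in the request cycle ever calls execute on the checker, each request sees updated? as true, which matches the behavior described above.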

However, even without this buggy watcher, requests succeed iff there have been no code changes since the last request: if updated? returns true, all threads try to unload, and the recursive request fails. This failure case includes the first request made to the server, which I believe has to do with Puma’s habit of forking workers on startup.

(Since I seem to have omitted some useful context, I’m running Rails 6.0.3.4.)

That makes sense.

For some years, Rails has been thread-safe with default development mode settings thanks to that lock. That allows concurrent requests while there is no code reloading.

However, if a code reload is needed, it has to happen without concurrency, because code reloading itself is not thread-safe.

To solve all that, the lock that you saw is a read-write (RW) lock. Requests take the lock for reading (R), and several of them can hold it at once. The reloader takes it for writing (W), and that one is exclusive: W is granted only once no thread holds R, and while a thread holds or is requesting W, no other thread can acquire R.

The problem you are encountering is that your main request has not finished and still holds R. The recursive calls attempt to acquire R, like any other incoming request, but they cannot because a code change is requesting W. Therefore the main request never finishes, so W cannot be acquired either, and everything halts.
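The sequence can be reproduced outside Rails with a toy lock. This is only a sketch under stated assumptions: TinyRWLock and simulate_recursive_request are hypothetical names, the lock is a simplified writer-preferring RW lock rather than the actual Rails interlock, and timeouts stand in for what would otherwise be an indefinite hang.

```ruby
# Toy writer-preferring read-write lock (not the Rails implementation).
# Acquire methods return false on timeout instead of blocking forever.
class TinyRWLock
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @readers = 0
    @waiting_writers = 0
    @writing = false
  end

  def writer_waiting?
    @mutex.synchronize { @waiting_writers > 0 }
  end

  # R: shared, but refused while a writer holds or is requesting W.
  def acquire_read(timeout)
    deadline = now + timeout
    @mutex.synchronize do
      while @writing || @waiting_writers > 0
        remaining = deadline - now
        return false if remaining <= 0
        @cond.wait(@mutex, remaining)
      end
      @readers += 1
      true
    end
  end

  def release_read
    @mutex.synchronize do
      @readers -= 1
      @cond.broadcast
    end
  end

  # W: exclusive, granted only once no thread holds R or W.
  def acquire_write(timeout)
    deadline = now + timeout
    @mutex.synchronize do
      @waiting_writers += 1
      begin
        while @readers > 0 || @writing
          remaining = deadline - now
          return false if remaining <= 0
          @cond.wait(@mutex, remaining)
        end
        @writing = true
        true
      ensure
        @waiting_writers -= 1
        @cond.broadcast
      end
    end
  end

  def release_write
    @mutex.synchronize do
      @writing = false
      @cond.broadcast
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Models the main request (holds R), an optional pending reload (wants W),
# and the recursive asset request (wants R).
def simulate_recursive_request(reload_pending:)
  lock = TinyRWLock.new
  results = {}
  lock.acquire_read(1) # main request is in flight and holds R

  reloader = nil
  if reload_pending
    reloader = Thread.new do
      results[:reload] = lock.acquire_write(0.5)
      lock.release_write if results[:reload]
    end
    sleep 0.01 until lock.writer_waiting? # wait until W has been requested
  end

  asset = Thread.new do
    results[:asset_request] = lock.acquire_read(0.2)
    lock.release_read if results[:asset_request]
  end

  asset.join
  reloader&.join
  lock.release_read # main request finally finishes
  results
end
```

With reload_pending: false the recursive request gets R immediately, like any parallel request. With reload_pending: true neither the recursive request nor the reload makes progress until a timeout fires, which in the real application shows up as a hang.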

Let me think about it, let me also /cc @matthewd.