*Something* is going on with the Rails loader in 6

A story in which misaligned module/path names caused Rails to lose track of constants (sometimes), and explicit require statements did SOMETHING strange to the Rails loader.

I wanted to move all user-management code into a namespace called Auth, to make it mesh neatly with the rest of a “modules within the monolith” structure. (I don’t really remember the details, this was in summer 2019.)

Anyway, there was a model I forgot to rename in there. Let’s call it UserAction. This model lived in /app/models/auth/user_action.rb but it was just called UserAction, not Auth::UserAction. There was also a lib/auth/sso_client helper file that was required at the top of /app/controllers/auth/user_sessions_controller.rb using require 'auth/sso_client'.

Rails could find UserAction just fine. However, it kept on insisting that Auth::UserSessionsController did not exist. I remember I was co-working with @Noel_Rappin when this first came up, that I had him bullshit-check my work, and that he couldn’t see anything wrong with Auth::UserSessionsController.

The problem was only happening for me, not for my teammates, and went magically away after a few days.

EXCEPT, a month later, it started happening for everyone intermittently, and also happening in prod.

In order to fix it, I remember needing to:

  1. rename UserAction to Auth::UserAction
  2. put lib under Rails loader control to get rid of the require 'auth/sso_client'
  3. remove random other require statements from entirely different parts of the application. (This was particularly annoying because I needed to stop using require for some classes that I was using the def ClassName constructor pattern for, and Zeitwerk does not autoload those constructors if they are kept in the same file as the method they construct. I was able to work around this but my workaround was hella ugly and confusing.)
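For readers who have not met the constructor pattern mentioned in item 3, here is a minimal sketch of why it interacts badly with autoloading. It uses Ruby's built-in `autoload` (which Zeitwerk builds on) rather than Zeitwerk itself, and `Money` is a hypothetical class, not one from the application described above.

```ruby
require "tmpdir"

# Hypothetical example file: a class plus a constructor-style method
# with the same name, in the spirit of Kernel#Integer.
dir = Dir.mktmpdir
path = File.join(dir, "money.rb")
File.write(path, <<~RUBY)
  class Money
    attr_reader :cents

    def initialize(cents)
      @cents = cents
    end
  end

  def Money(cents)
    Money.new(cents)
  end
RUBY

# Register the file with Ruby's built-in autoload.
Object.autoload(:Money, path)

# Money(500) is a method call, not a constant reference,
# so it does not trigger the autoload and the method is still undefined.
first_call =
  begin
    Money(500)
  rescue NoMethodError => e
    e.class
  end

# Referencing the constant loads the file, which also defines the method.
money_class = Money
second_call = Money(500)
```

In this sketch `first_call` ends up holding `NoMethodError`, while `second_call` is a `Money` instance. That suggests one possible workaround: touch the constant before calling the constructor method, or define the method somewhere that is loaded eagerly.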

I remember trying all of those things in isolation to see if they were actually all required, and they appeared to be. It is however possible that something could have cached itself in a way that makes me incorrect about that. Again, sorry I don’t have better details here.

Since then my team has kept on seeing blips that feel like they’re related. The only concrete one that comes to mind: once, a developer accidentally copypastaed a require 'spec_helper' to the top of an application code file and similar symptoms emerged. I don’t remember exactly what they were, but they were in the “losing track of classes” genre. If I hadn’t had my past experience and known to search the diff for weird-looking require statements, I imagine that this would have taken me days to debug; the error messages I was seeing would not have pointed me in this direction without that context.

I have tended to attribute these issues to Zeitwerk, because I’ve never seen anything like them before Rails 6. In truth though I have no idea if this is a Zeitwerk thing, a Bootsnap thing, a non-Zeitwerk autoloader thing, some combination of the above…

It’s an ongoing annoyance for me though, and it’s been a day-ruining blocker for other developers on my team.

That is key. Zeitwerk would never load such a constant. It is not defined in the right file. Any run of the application that tried to access that constant would have raised an exception.
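As a rough illustration of "the right file", the sketch below maps a path (relative to an autoload root such as app/models) to the constant an autoloader would expect it to define. `expected_constant` is a hypothetical helper, not a Zeitwerk API, and it is deliberately simplified: a real inflector also handles acronyms and custom overrides.

```ruby
# Simplified stand-in for an inflector: snake_case path segments
# become CamelCase constant names joined by "::".
def expected_constant(relative_path)
  relative_path
    .delete_suffix(".rb")
    .split("/")
    .map { |segment| segment.split("_").map(&:capitalize).join }
    .join("::")
end

expected_constant("auth/user_action.rb")
# => "Auth::UserAction"
expected_constant("auth/user_sessions_controller.rb")
# => "Auth::UserSessionsController"
```

Under this convention, app/models/auth/user_action.rb is expected to define Auth::UserAction, so a top-level UserAction in that file is never what the autoloader is looking for.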

Could it be the case that you issued a require call for that file? require loads whatever the file contains. It is also idempotent: a second require of the same file does nothing, which does not play well with reloading.
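A minimal plain-Ruby sketch of that idempotence, using a temporary file and a made-up constant name (`GREETING_DEMO`). It shows only `require` versus `load`; it does not involve Rails or Zeitwerk.

```ruby
require "tmpdir"

dir = Dir.mktmpdir
path = File.join(dir, "greeting_demo.rb")

File.write(path, "GREETING_DEMO = 'first version'\n")
first_require = require(path)    # true: the file was executed

# Simulate editing the file during development.
File.write(path, <<~RUBY)
  Object.send(:remove_const, :GREETING_DEMO)
  GREETING_DEMO = 'second version'
RUBY

second_require = require(path)   # false: already loaded, not executed again
after_require = GREETING_DEMO    # still the first version

load(path)                       # load always executes the file
after_load = GREETING_DEMO       # now the second version
```

The second `require` silently does nothing, so code loaded that way stays stale after an edit, while a reloading autoloader expects to be able to drop and reload the definition.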

require cannot be used for files that have to be autoloaded and reloaded. That is, your application files. In an application that is managed by an autoloader, the way to load code is to reference constants. Think of every constant reference as if it were a require.

You have to program as if all application classes and modules were already in memory.

Let me give you some advice in case it helps:

  1. Never load an autoloadable file with require. Use require exclusively for files in lib or for 3rd party libs as in require "nokogiri". This is not new with Zeitwerk, it has always been this way.
  2. Make sure rails zeitwerk:check passes. Some companies even have this integrated with CI. The task may catch something wrong in a file the test suite is not exercising.
  3. To troubleshoot something, throw Rails.autoloaders.log! in config/application.rb and inspect the traces.
  4. If none of that works, drop me a line.

I think it’s possible that UserAction was required explicitly, yes.

However, what gets to me is that the error wasn’t with UserAction. It was consistently with Auth::UserSessionsController.

A good error will provide a reasonably competent developer with a pointer to where to look next. There’s nothing about undefined constant Auth::UserSessionsController that would lead the average developer to conclude that the problem has to do with a mismatch between UserAction's file location and actual namespace.

I believe that Zeitwerk needs to expose significantly more actionable errors here. I know, because you’ve mentioned it elsewhere in this forum, that the Ruby autoload mechanism Zeitwerk leverages doesn’t retain a lot of information about call sites. Because of that, I worry that improving the error messages here may be quite difficult. However, one of Rails’s biggest strengths is that it chooses developer experience over purity of design/implementation when these two things come into conflict. So my perspective is that, however difficult, we gotta find some way to make this work. (And I mean that “we,” I am happy to roll up my sleeves and contribute code here.)

I suspect that my criticisms of Zeitwerk have given you the impression that I’m dismissive of your work on it overall, and I’m sorry if that’s been the case. I appreciate the performance improvements that Zeitwerk makes over the classic autoloader, and even though I have experienced it as less stable I’m glad that so many people have had the opposite experience. But right now, it’s brittle against atypical or unexpected directory structures, and unexpected require statements, in ways that are hard to debug. I’m glad for the reminder that rails zeitwerk:check exists, but I don’t think that it’s discoverable enough to be an adequate solution to the problem. We need an error message here that points people in the right direction.

I get your point about require always having been considered problematic. Subjectively, this feels like it’s flakier in Rails 6, but that is subjective. However, I don’t believe that “it’s always been this way” is a good reason for this to be the case going forward. I have two big reasons for feeling this way:

  • Developers coming to Ruby from other languages are used to explicit requires, and can struggle with the app/lib distinction. While they’re learning the Rails Way here, they tend to misplace require statements. (In fact, the code I described above was originally written by developers with a Java/Javascript background who’d only been doing Rails for a few months.) So: misplaced require statement errors are some of the hardest Rails errors to debug, and they’re most frequently encountered by Rails newbies. That’s a bad combination!
  • I think that in particular, disallowing require while also making the autoloader mandate a strict code structure reduces the “big tent” quality modern Rails strives for. I think it’s reasonable for the autoloader to nudge people very hard in the direction of a particular code structure, or refuse to help people if they want to use a different one. But I think it’s a different matter entirely for the autoloader to not allow them “escape hatches.” To me, this is the difference between an opinionated framework (good) and a dogmatic framework (bad).

It’s late here. Just let me point out that Zeitwerk is not about performance. Zeitwerk provides an autoloader that matches Ruby semantics, which is something the classic autoloader just was not able to do and was a source of many gotchas that have disappeared. What users get with the new autoloader is a way more solid and predictable autoloading experience, compared to the previous one.

In addition to that, Zeitwerk is usable by any Ruby project, not only Rails. This is also new.

I am learning Swift. And to learn Swift, I need to study Swift. If something is different than Ruby, Swift programmers do not have to make it intuitive for Ruby programmers, or Haskell programmers, or Prolog programmers. They have to design the language according to their choices, and the Java programmer has to adapt. Not the other way around.

In Java the User class means something. In Ruby “the User class” is an abuse of language; it means the class object stored in the User constant, whose name is not even guaranteed to be “User”. Why? Matz’s choice. You learn, adapt, and flow with the language design.
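A small plain-Ruby illustration of that point; `Person` is a made-up class used only for the demonstration.

```ruby
class Person; end

# A second constant that stores the very same class object.
User = Person

User.name            # => "Person"
User.equal?(Person)  # => true

# A class object does not need a constant at all.
anonymous = Class.new
anonymous.name       # => nil
```

A constant is just a name that holds an object, which is why an autoloader works in terms of constants and where they are expected to be defined, not in terms of classes.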

In Rails, the choice has been that applications don’t use require to load application code. At most, with classic, you had a last resort to fix loading order issues called require_dependency that is going to be eventually deleted. So when you see a Rails tutorial, the file that defines UsersController will never have require "user". It just refers to User.

One of the main motivations for Zeitwerk is to avoid using require, besides fixing Rails autoloading.

So, I invite you to change your mind, in this case. You’re writing Rails, in Rails there are no require calls for application code. You just refer to constants.

I also wholeheartedly recommend reading the autoloading guide. It will make you a better Rails programmer. Not using require is the first thing in the topic.

I’ll add some polish to that guide for 6.1, but the content is good.

My mental model of autoloading is that when you reference a nested constant, the loader will resolve each nested constant sequentially. To do that, it goes through all the entries in autoload_paths according to a resolution algorithm.

I don’t know how Rails / Zeitwerk works, but a naive autoloader might do something like:

(assuming we’re resolving Auth::Secondary::SomeController)

  1. if a constant named Auth does not exist, we have not resolved Auth before, so go through autoload_paths and look for all files / directories which match auth.rb or /auth/
  2. For all files that match the name auth.rb, load it and raise an error if it doesn’t define Auth.
  3. For all base directories which match /auth, define the constant Auth and add these directories into the lookup cache for Auth

Now that we have narrowed down the file locations for Auth, we can proceed to look up Secondary according to the same rules, but instead of starting in autoload_paths, we start in the lookup cache for Auth.

If the constant Auth is already defined but Auth::Secondary doesn’t exist, we skip looking up Auth and go straight to resolving Auth::Secondary based on the lookup cache for Auth.

This algorithm works on the assumption that:

if a constant is loaded, the lookup cache for that constant will have been built.

Now, what could screw up this lookup?

Having a file named auth.rb which defines both Auth and Auth::Secondary

The algorithm will resolve Auth successfully, see that Auth::Secondary has been defined, and try to look up Auth::Secondary::SomeController in the lookup cache for Auth::Secondary. However, this will fail as the lookup cache for Auth::Secondary is empty because no lookup was performed for Auth::Secondary.
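The failure mode above can be sketched as a toy simulation. This models only the naive algorithm described in this post, with an in-memory hash standing in for the filesystem; it is not how Rails or Zeitwerk actually work, and `NaiveLoader` and its methods are hypothetical names.

```ruby
# Toy model of the naive loader. `files` maps a path (relative to a single
# autoload root) to the list of constants that file would define.
class NaiveLoader
  LookupError = Class.new(StandardError)

  def initialize(files)
    @files = files
    @defined = {}       # constant name => true
    @lookup_cache = {}  # constant name => directory searched for its children
  end

  # Resolve each segment of a nested constant name in order.
  def resolve(full_name)
    resolved = nil
    full_name.split("::").each do |segment|
      name = [resolved, segment].compact.join("::")
      load_constant(name, segment, resolved) unless @defined[name]
      resolved = name
    end
    resolved
  end

  private

  def load_constant(name, segment, parent)
    base = parent ? @lookup_cache[parent] : ""
    raise LookupError, "no lookup cache for #{parent}" if base.nil?

    snake = segment.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
    file = "#{base}#{snake}.rb"
    dir = "#{base}#{snake}/"

    # A matching file is "loaded": every constant it defines becomes known.
    if @files.key?(file)
      @files[file].each { |const| @defined[const] = true }
      raise LookupError, "#{file} did not define #{name}" unless @defined[name]
    end

    # A matching directory defines the namespace and records where to look next.
    if @files.keys.any? { |path| path.start_with?(dir) }
      @defined[name] = true
      @lookup_cache[name] = dir
    end

    raise LookupError, "could not find #{name}" unless @defined[name]
  end
end

controller = "auth/secondary/some_controller.rb"
target = "Auth::Secondary::SomeController"

healthy = NaiveLoader.new({ controller => [target] })
healthy_result = healthy.resolve(target)

# auth.rb defines Auth::Secondary early, so its lookup cache is never built.
broken = NaiveLoader.new({
  "auth.rb" => ["Auth", "Auth::Secondary"],
  controller => [target]
})
broken_error =
  begin
    broken.resolve(target)
  rescue NaiveLoader::LookupError => e
    e.message
  end
```

In the second run the loader skips Auth::Secondary because it is already defined, then fails on the controller with "no lookup cache for Auth::Secondary", even though the controller file is exactly where it should be.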

I’m guessing that require screws up Rails / Zeitwerk autoloading in a similar way, by having constants loaded before the actual algorithm would have gotten around to loading them.

@ferngus your mental model roughly approximates the idea behind classic mode. If you are really interested in fine-tuning your understanding, please watch this talk from RailsConf 2014.

That technique has a series of important limitations which zeitwerk mode addresses. The approach followed by Zeitwerk is totally different. If you are interested in understanding how it works, watch this talk from RailsConf 2019.

classic mode starts its deprecation cycle in Rails 6.1, BTW. So classic mode might be interesting for the sake of curiosity, and to understand where we come from, but looking towards the future, what you really need to be familiar with is zeitwerk mode.

Appreciate the pointers. Definitely interested, will look into the videos.