Tales from an Unconventional Rails App

pvande · May 19, 2020, 5:00pm

The application described in this long-winded post does not live on the Rails “happy path”. The problems it encounters are unlikely to affect most Rails applications. The frustrations described herein, however, are most definitely real. I also try to discuss a bit beyond the issues encountered, into the solutions we’ve attempted or implemented. Some of these — perhaps many — are outright hacks, prone to fragility, or otherwise poor code. While I would be happy to hear about documented approaches or APIs that I’ve missed, I would ask readers to hold their judgements — the application here is (as all software is) a series of considered tradeoffs and balanced concerns.

Before I get into this, I want to acknowledge that Rails is great, and the core team is inspiring — the work you all have done and are doing is what allows so many of us to be successful. So, thank you all for everything you’ve done, and for taking the time to engage the community around our pain.

The Application

I work for an organization whose primary application is a CMS for authoring K-12 textbooks. This application is used exclusively by organization employees and their partners, meaning that many typical web application concerns — high concurrency, malicious input sanitization, etc. — aren’t high priority needs for us.

All of our development happens in Docker (in an Alpine Linux environment, usually running on macOS), running Rails 6.0.3 on Ruby 2.6.4. We’re backed by a PostgreSQL database, with Redis and Sidekiq for background jobs. Our production and staging instances are managed by Heroku.

Our application manages data across multiple curricula in a single database, structuring the data in a freeform tree (backed by PostgreSQL’s ltree plugin, for those curious). Our curricula don’t uniformly follow the same structural convention (e.g. Module → Unit → Lesson), so our trees may be unpredictably deep in places and unexpectedly shallow in others.

Despite sharing the same base models (and database tables), we’ve architected our curriculum code (which we refer to as “projects”) in such a way that they are isolated from one another, and use STI to achieve specialization. This lets us, for example, encode the differences between Lessons in different curricula, and between Lessons and Units in the same project. In practice, our projects are something like little Rails Engines — directories of views, models, assets, and configuration that follow a regular structure which our application understands.

Because the review and editing patterns are so consistent between models, we handle all of our CRUD operations on those records through a single controller, which delegates to a model-specific form object.

At the tail end of the process, we generate the various textbooks we produce, rendering massive HTML pages that traverse most (if not all) of the given tree (along with any associated embedded content and tags). This HTML then gets passed off to various processes that refine it into its terminal forms.

Spring

We ended up removing Spring fairly early in the application’s life. Ultimately, we found that we were restarting the Docker container frequently enough in development that the advantage of having a background process that preloading the application was being lost to the initial boot overhead.

I also have personally lost days of development time to Spring cache issues, where I’m trying to diagnose why the change I’ve made to the code hasn’t produced the logical results I’m trying to effect — and I’m not the only one on my team. This, in turn, leads to a distrust of Spring and a preference towards the more reliable behavior of restarting the Docker container to avoid the issue. Removing Spring made our effective development work faster by avoiding the startup overhead and reducing the number of times we felt the need to restart the container, even as our individual rails command runs lost speed. YMMV.

I feel like Spring exists in a somewhat unfortunate position — it’s a background process that’s a part of your development environment, with the specific aim of being an invisible performance boost that developers don’t have to think about. At the same time, owing to cases where Spring is too good at caching, Spring becomes something that everyone has to constantly think about in case they need to work around it. This has been something I’ve witnessed on every Rails project I’ve been a part of, and why I see Spring consistently being disabled on those projects.

Sprockets

We’ve also stripped Sprockets out of our application. Having worked with Webpack extensively on other projects, I’ve become familiar with the nuance of Webpack configuration (and enamored of hot reloading both Javascript and CSS). While Sprockets offers a simpler experience for CSS composition, Webpack offers a better experience for development (quirks notwithstanding).

There are a few obstacles to setting this up correctly — our application’s directory structure being only one of them — but it’s entirely possible set up Webpacker to resolve CSS and image files nearly identically to how it handles Javascript.

Running Rails without Sprockets, however, has its own challenges — especially if you’re using Engines. At present, there is no good story for Engine-based assets that doesn’t involve either static compilation or Sprockets, and most Engines have an implicit dependency on Sprockets. While Rails will let you choose to opt out of Sprockets at application creation time, there’s no indication that this decision is also opting you out of some subset of useful tools (e.g. PgHero).

Webpacker

Webpack (and Webpacker) are not without their own WTFs, but we’ve been content to set up our own conventions and work within them. (So much so, in fact, that when we needed to do a visual facelift on a secondary application we manage, our front-end engineer specifically requested that we reproduce the Webpacker setup on that application!) The biggest “complaint” we’ve had about Webpacker had to do with how its helpers function. If you set the extract_css option to false, any CSS required by your Javascript will be injected from Javascript; if it’s true, a separate CSS file will be generated that needs to be loaded. If you’re using different values in production and development (e.g. hot reloading CSS in development), this means that you need to call both javascript_pack_tag and stylesheet_pack_tag for the same entry points in order to get somewhat consistent behavior.

To work around this in our application, we wrote our own helper that builds inclusion tags for all JS and CSS files generated for a named entry point. (Passing options to each is a little clunky, but it’s also not something we do often.)

def asset_pack_tags(*names, js_options: {}, css_options: {})
  manifest = current_webpacker_instance.manifest

  js_entries = names.flat_map { |name| manifest.lookup_pack_with_chunks(name, type: :javascript) || [] }.uniq
  css_entries = names.flat_map { |name| manifest.lookup_pack_with_chunks(name, type: :stylesheet) || [] }.uniq

  js_tags = javascript_include_tag(*js_entries, **js_options)
  css_tags = stylesheet_link_tag(*css_entries, **css_options)

  return js_tags + css_tags
end

Zeitwerk

There is a lot of inconsistent documentation around autoloading in Rails. Formally (I believe), the “official” interface for modifying autoload paths is Rails.application.config.paths.add — but there is still plenty of advice promoting modifications to autoload_paths and eager_load_paths. As a bonus, IIRC, config.paths.add is documented as taking an autoload option, but it’s not passed by any of the Rails code, and paths are autoloaded without it.

The recent addition to Zeitwerk of the collapse method was a fantastic boon to us, allowing us to simplify paths to our project models from projects/<project_name>/models/<project_name>/project_model.rb — so glad to see that change!

There are still a couple rough edges, though. Our projects, like most gems, have files that don’t follow the same filename → constant convention. In gems, these are usually the lib/<library>/version.rb files; in our projects, they live at projects/<project_name>/config/project.rb. While Zeitwerk can support these files through the use of a custom inflector, the one-off nature of these files makes that feel fairly overkill. It seems like it would be generally preferable to be able to declare explicit mappings for these exceptions. That would provide useful flexibility, allowing minor infrequent exceptions to be quickly defined with a method call, and allowing alternative conventions to be implemented via a custom inflector.

ActiveStorage

Like many applications we store user-uploaded assets (like images) in S3 in our non-development environments. In development, we’re perfectly content to use our local disk for storage. Configuring ActiveStorage to do this is simple and straightforward, as it should be.

Given the complexity of data we work with, it’s easiest for us to simply load a backup of the production data server into our development databases; this is where we get into problems with ActiveStorage. Because all of the bookkeeping for AS is done in the database, our development environment knows the keys of all of the production assets … but will never find them on disk (unless we also copied everything from S3 to local disk as well).

For our use case, what we actually wanted was a way to specify fallbacks for failed lookups. We would never want development to update the production S3 bucket, for example, but if it could read from it, that would be helpful. On the other hand, we would definitely still want reads and writes to continue to work in development, so we couldn’t simply point to the production database with a read-only account. ActiveStorage provides a MirrorService, but nothing like we were looking for…

So we built one.

# Usage:
#
# local:
#   service: Disk
#   root: <%= Rails.root.join("storage") %>
#
# remote:
#   service: S3
#   region: us-east-2
#   bucket: <%= ENV['S3_BUCKET'] -%>
#   access_key_id: <%= ENV['S3_ACCESS_KEY'] ) %>
#   secret_access_key: <%= ENV['S3_SECRET_KEY'] ) %>
#
# local_with_fallback:
#   service: ReadReplica
#   primary: local
#   replicas: [ 'remote' ]

class ActiveStorage::Service::ReadReplicaService < ActiveStorage::Service
  attr_reader :primary, :services

  delegate :upload, :update_metadata, to: :primary
  delegate :delete, :delete_prefixed, to: :primary
  delegate :path_for, :url_for_direct_upload, to: :primary

  def self.build(primary:, replicas:, configurator:, **options)
    primary = configurator.build(primary)
    replicas = replicas.map { |name| configurator.build(name) }

    return self.new(primary: primary, replicas: replicas)
  end

  def initialize(primary:, replicas:)
    @primary = primary
    @services = [ primary, *replicas ]
  end

  def download(key, &block)
    service_for(key).download(key, &block)
  end

  def download_chunk(key, range)
    service_for(key).download_chunk(key, range)
  end

  def exist?(key)
    @services.any? { |service| service.exist?(key) }
  end

  def url(key, **opts)
    service_for(key).url(key, **opts)
  end

  private

  def service_for(key)
    @services.find(-> { @primary }) { |service| service.exist?(key) }
  end
end

Credit where credit’s due: while this wasn’t baked into Rails, and while the documentation is a bit sparse, adding a custom ActiveStorage Service was generally straightforward.

The other issue we’ve had with ActiveStorage is a relatively minor nuisance, but one that is worth mentioning all the same. Our image attachments are a mix of hi-res raster (usually PNG) and vector (usually SVG) images. Naturally, sending dozens of multi-megabyte images over the wire during the editing process is not particularly useful, which is where ActiveStorage Variants come in. However, we cannot simply call Attachment#variant and get reasonable results — that method raises an exception if you attempt to call it for an Attachment that is not variable? (like an SVG). Consequently, our templates end up littered with fragments like this:

<%= image_tag polymorphic_path(record.image.variable? ? record.image.variant(...) : record.image) -%>

This is doubly problematic, since forgetting to check variable? before calling variant will look correct often enough that changes can get at times pass through QA without having triggered the failure case. In applications where the attachments are truly unpredictable user input, I would expect the issues to be even more common.

What would feel useful here is a method that allows you to express the intent to (e.g.) use a thumbnail-sized variant, and left the determination of how to proceed with the library. Such a method would be prone to a different type of error (namely, handling cases where a constrained image size was specified, but the unconstrained image is served and breaks the layout) but that concern is presentational and non-fatal, which seems preferable for an error case this well-hidden.

ActiveRecord :: Associations without Foreign Keys

This is a minor use case presenting major difficulties. As mentioned, we’re using PostgreSQL’s ltree extension to model our content hierarchy — this employs a “materialized path” to denote a node’s location within a tree (e.g. Root.Photos.Science.Astronomy). The extension allows the database to sensibly index the relationship between nodes in the same tree, so queries for descendants are just as fast as a query for a parent. What this means, however, is that we don’t have a foreign key for our relationships, just an SQL expression.

ActiveRecord’s association macros are great, but they have a baked in assumption (and usually rightly so) that there are two columns that can be compared between tables to perform a join. The workaround isn’t pretty, but it’s functional.

fk = self.name.foreign_key
has_many :children, -> (node) { unscope(where: fk).where(project: node.project).where("subpath(path, 0, -1) = ?", node.path) }

This approach works, but it introduces a couple of additional problems.

ActiveRecord :: Creating off Associations without Foreign Keys

Because our newly-minted association doesn’t have a foreign key, Rails can’t (and shouldn’t be expected to) figure out how to ensure that it’s setting up parentage correctly. Instead, that’s something that quite reasonably falls to our application. If we have a has_many relationship, it’s relatively easy to patch that in:

has_many :children, -> {...}, before_add: assign_parent

private def assign_parent(parent)
  self.path = [ parent.path, self.name ].join('.')
end

If the relationship is a has_one, we have a different problem — has_one doesn’t support before_add, or any comparable callback. Maybe that’s fine (he said, hopefully), has_one just creates specialized instance methods on the record (e.g. build_<association>) method, so those can be overridden.

Remember how I said we’re using form objects to handle all of our content edits? Those form objects also edit and create nested records, and to do that, they rely on Association#build and Reflection#build_association. Even if we did just override build_child, our form objects would have the same problem. Unfortunately, fixing this is ugly.

class TreeNode
  has_many :child, -> {...}
  reflect_on_association(:child).define_singleton_method(:association_class) do
    MyHasOneChildAssociation
  end

  class MyHasOneChildAssociation < ActiveRecord::Associations::HasOneAssociation
    def initialize_attributes(child, except_from_scope_attributes = nil)
      super
      child.set_parent(@owner)
    end
  end
end

At this point, both children.build and build_child work properly, but that’s not quite the end of the story. children.create and create_child (along with the ! counterparts) fail after trying to assign an attribute to tree_node_id! The TL;DR is that those methods end up calling Association#set_owner_attributes, which tries (and fails) to assign the foreign key and type (when appropriate). We’ve worked around this by creating custom subclasses of both HasOneAssociation and HasManyAssociation that do more appropriate things in set_owner_attributes, but it fundamentally feels as though there’s a missing interface here. Potentially:

An option on the macro for customizing the association_class for the association.
An option on the macro for specifying a class or object that overrides the implicit condition for finds and the implicit behaviors for creates.
Meta-association methods for creating custom associations that behave like the built-in associations (e.g. build/create, reflection, preloading, etc.).

ActiveRecord :: PostgreSQL Generated Columns

The previously described associations would be easier to work with in Rails if we just had a foreign key on our table. Our path column is prone to change, however, adding a second column with derived data seems both redundant and error-prone. We could use a view (materialized or otherwise), but then we don’t have the ability to write changes back. Database triggers or functions are technically also an option. The “best” integrated option, however, would be a virtual column — Rails added support for defining them in migrations for MySQL and MariaDB back in Rails 5.1.0.

PostgreSQL also recently built out a similar feature in PostgreSQL 12, which they call “generated columns”. Sadly, I have seen no indication that Rails will be adding that functionality for the PostgreSQL adapter any time soon.

ActiveRecord :: Preloading

Because (as you may have noted above) we specified our children association with a scope that took a node argument, Rails won’t allow us to use the built-in preloading methods. While that’s disappointing, it’s also completely understandable — sorting that out isn’t something Rails can do on its own. It is, however, something that can be sorted out by someone familiar with the data domain.

Our needs are actually larger than that: we don’t know up front how many levels of children we should be preloading, but we do know how to trivially query the database for the entire tree. To that end, we’ve actually built out functionality that will “preload” associations for the entire tree. It looks something like this:

class TreeNode
  def self.preload_descendants
    extending(PreloadDescendants)
  end

  module PreloadDescendants
    def load
      super

      table = model.table_name

      loaded_paths = @records.pluck(:path)
      return if loaded_paths.empty?

      # Ensure that we maintain the project condition.
      descendants = model.where(where_values_hash.slice('project'))

      # Load all descendants of the queried nodes.
      descendants = descendants.where("#{table}.path <@ ARRAY[?]::ltree[]", loaded_paths)
                               .where.not(path: loaded_paths)

      # Load the tree from the bottom up, preserving the other order clauses.
      descendants.reorder!(Arel.sql("NLEVEL(#{table}.path) DESC"), *order_values)

      # Ensure that we do the same eager loading here that we did for our
      # original results.
      descendants.includes!(*includes_values) if includes_values.present?
      descendants.preload!(*preload_values) if preload_values.present?
      descendants.eager_load!(*eager_load_values) if eager_load_values.present?

      # Ensure that we reuse the same extension modules that we had earlier.
      # Obviously, we should omit ourselves, since our work is already done.
      descendants.extending!(*(extending_values - [PreloadDescendants]))

      nodes = descendants + @records
      nodes_by_parent = Hash.new { |h,k| h[k] = [] }

      # Because we've explicitly ordered our result set on `NLEVEL() DESC`, we will visit
      # all of the most deeply nested nodes first.  In particular, we will be guaranteed
      # to visit all children of a node before we visit their parent.  Rails will also
      # help guarantee that by freezing the array we send to `load_records` so that any
      # subsequent writes will cause errors.
      nodes.each do |node|
        nodes_by_parent[node.parent_path] << node

        association = node.association(:children)
        association.target = nodes_by_parent[node.tree_path]
        association.loaded!
      end

      return self
    end
  end
end

Worth noting:

We’re using extending to add a module to the Relation.
- That module overrides load, giving us a convenient “hook” for when the query is actually executed.
We construct a nearly identical query to the one implied by the Relation.
We manually “preload” the association by setting target and calling loaded!.

This, conveniently, works! Mostly. It definitely feels dirty, though.

We came to this solution when we found that we simply didn’t have a better option. Given a bucket of nodes, we know how to associate them, but there didn’t seem to be a better way to “hook” the Relation between when it loaded the data and when that data was consumed. We certainly could pass the relation to a service object that either preloaded each relation as above or (more cleanly) provided an interface for requesting relatives of a given node — and we did try it — but that interface is both less convenient and less conventional.

An interface that would have served us well here would be a callback for when a relation has loaded (ideally once per table queried!), and an “official” mechanism for populating the dataset for an association and/or relation.

ActiveRecord :: Extended Scopes on Preloaded Associations

This is a minor issue that I ran up against while I was working on a preloading refactor.

Consider the following code:

class Role; end

class User
  has_many :roles

  def self.preload_roles
    includes(:roles)
  end
end

class Thing
  belongs_to :user
end

# This loads the Thing, the Thing's User, and that User's Roles.
thing = Thing.includes(:user).merge(User.preload_roles)

# This is does the same thing.
Thing.belongs_to :user_with_roles, -> { preload_roles }
thing = Thing.includes(:user_with_roles).first

So far, so good. We can use merge to concatenate scopes, and the concatenated scopes appear to be associated with the correct objects. Now consider:

User.has_many :fancy_roles, -> { where(type: 'fancy') }

def User.preload_roles
  includes(:roles).extending(PopulateFancyRoles)
end

module PopulateFancyRoles
  def load
    super

    # Assume this is going to do something relevant, like populating
    # `fancy_roles` from the preloaded `roles` association.
    warn self.model
  end
end

# What do you expect these to log?
Thing.includes(:user_with_roles).first
Thing.preload(:user_with_roles).first
Thing.eager_load(:user_with_roles).first

Thing.includes(:user).merge(User.preload_roles).first
Thing.preload(:user).merge(User.preload_roles).first
Thing.eager_load(:user).merge(User.preload_roles).first

yoelblum · May 20, 2020, 10:27am

It might be wise to break these into separate issues!

fxn · June 18, 2020, 10:15pm

@Pieter Have you seen the inflect method? It is the second example here. Would that be what you had in mind?

Regarding paths, yes, config.autoload_paths is the public interface.

pvande · June 19, 2020, 2:40am

Unfortunately, that’s not quite what I had in mind. The problem in our case is that projects/foo_bar/config/project.rb defines the constant FooBar — as near as I can tell, the inflect method doesn’t allow you to specify the constant path that corresponds to a file path, only the constant name that corresponds to a path segment.

(I’ve explored putting together a PR to add a mechanism for defining this kind of ad hoc exception, but wanted to polish it further and discuss the potential further before publishing the code.)

If config.autoload_paths is the public interface, does that also imply that config.eager_load_paths is also required (as is commonly reported)? Is there a reason why autoload_paths is preferred over config.paths.add, which seems to do the same job, without repetition, and without “forking” the autoload paths? (autoload_paths returns and memoizes a dup of the paths named in config.paths.)

fxn · June 19, 2020, 8:46am

Second part needs some research, that interface (that I consider internal) goes back to Rails 3 days IIRC, but I can tell that you normally want to push to config.eager_load_paths because you want the code eager loaded if config.cache_classes. That collection then is added to the autoload paths by the Rails boot process.

Let’s focus for a moment in understanding the use case about config/project.rb.

The config directory does not seem to be something you normally would put in the autoload paths (either directly or indirectly). I am going to assume it is not.

config/project.rb can define FooBar as long as it is not a file managed by Zeitwerk, which under the assumption above, it is not.

Zeitwerk is careful also about which constants it manages, regardless of the file system. That is, if finds a file foo_bar.rb and the constant FooBar was already defined in memory, it ignores that file, it assumes the constant is really owned by someone else. That mimicks what Ruby does with $LOAD_PATH, first wins.

Similarly, if a directory foo_bar would normally be treated as an implicit namespace, but FooBar exists already defined “externally”, it assumes someone else has the ownership, respects that already existing module object, and descends into the subdirectory to see what does the project add to that namespace.

Assuming ownership belongs to someone else is also taken into account when reloading. Zeitwerk won’t trigger reloading for such constants, only for the ones it has control over (and that plays well with subconstants, it is capable of reloading autoloaded subconstants but preserve the parent).

Those are the rules of the game, and having them in mind it could be the case that the only thing you need to do is to ensure config/project.rb is evaluated at boot time before Zeitwerk runs its setup. If config/project.rb defines FooBar as a side-effect, that’s fine and will be respected.

pvande · June 19, 2020, 6:05pm

@fxn Thank you for taking the time to have this discussion.

I feel as though I’ve done a poor job communicating the particulars of this problem space; allow me to try and clarify.

Here’s an example of our project’s file structure:

/our_rails_app
├── Gemfile
├── Gemfile.lock
├── app
│   ├── assets
│   ├── controllers
│   ├── helpers
│   ├── models
│   └── views
├── bin
├── config
├── db
├── lib
├── log
├── projects
│   ├── k12_science
│   │   ├── assets
│   │   ├── config
│   │   ├── forms
│   │   ├── helpers
│   │   ├── lib
│   │   ├── models
│   │   ├── services
│   │   ├── tasks
│   │   ├── test
│   │   └── views
│   └── k12_math
│       ├── assets
│       ├── config
│       │   ├── locales
│       │   ├── postcss.config.js
│       │   ├── project.rb      (defines `K12Math`)
│       │   └── webpack.js
│       ├── forms
│       │   ├── unit_form.rb    (defines `K12Math::UnitForm`)
│       │   └── lesson_form.rb  (defines `K12Math::LessonForm`)
│       ├── helpers
│       ├── lib
│       ├── models
│       │   ├── unit.rb         (defines `K12Math::Unit`)
│       │   └── lesson.rb       (defines `K12Math::Lesson`)
│       ├── services
│       ├── tasks
│       ├── test
│       └── views
├── public
├── tmp
└── vendor

We have Zeitwerk configured with:

    config.autoload_paths << 'projects'
    Rails.autoloaders.each do |loader|
      # Don't autoload from non-code directories
      loader.ignore('projects/*/assets')
      loader.ignore('projects/*/config')
      loader.ignore('projects/*/lib')
      loader.ignore('projects/*/test')
      loader.ignore('projects/*/tasks')
      loader.ignore('projects/*/views')

      # Don't expect project subdirectories (e.g. projects/k12_math/models) to
      # contribute a namespace.
      loader.collapse('projects/*/*')
    end

What’s not initially clear is that we have a “host” application (which is otherwise a fairly typical Rails CMS application), that effectively has multiple tenants (which we call “projects”). Projects are somewhere between a Rails engine and a Ruby namespace module — they provide Rails views, models, and other Ruby objects, they’re integrated into the Webpack asset pipeline, but they have no controllers, no Railties, no Rails config changes, and there is a tight coupling between projects and the host application (in both directions).

When we designed projects this way, we made the following choices:

Projects should be self-contained, making them (relatively) easy to rename, duplicate, and remove.
Project structure should mimic Rails application structure.
Project constants should live inside an appropriate namespace.
The Project namespace should house any project configuration details.
The Project namespace should be declared in a predictable location.

That last point achieved relevancy when we removed our database “projects” table in favor of doing runtime reflection — an approach that only works if the project namespaces are eagerly loaded. To that end, we do eagerly load that file at application startup, and everything works as expected.

The conventional location for such a file would be e.g. projects/k12_math.rb (alongside the projects/k12_math directory), but this violates our desire to have our projects be self-contained. We made the decision to use projects/k12_math/config/project.rb fully aware that it meant that we would lose autoloading (and code reloading!) functionality by violating convention (including for all dependency files).

We’re also aware that we could revive that functionality by implementing a custom Inflector (and ensuring that projects/*/config was an autoloader root directory), but that feels like a relatively significant step for a relatively insignificant problem.

In the realm of “nice to have” feature proposals, we have a situation where we have a small, known set of files that validate the default autoloader convention. Rather than define an Inflector (which has the limitation of only adjusting a single cname at a time for a given path, and which adds cost to every cname lookup), it would seem preferable to us to be able to define an “exception” to the standard mapping:

loader.exception('projects/k12_math/config/project.rb', 'K12Math')

[Spitballing] From this, you could store a map of parent_cpath to a set of [ cname, abspath ] (plus a separate set of exceptional abspaths), skip adding autoloaders for anything in the exceptional_paths set, and add appropriate autoloaders for the exceptions whenever the parent_cpath is detected as defined.

The one edge case that would not be covered by such a solution is one where a file is mapped to a parent_cpath that either will never exist or is not a Module (in both cases, the autoload call would never be properly set). (Errors in these cases could be adequately reported in zeitwerk:check by ensuring that all of the exceptional constants are defined after an eager_load and that all of the exceptional_paths are in $LOADED_FEATURES.)

This mechanism could also be used to solve the lib/xxx/version.rb problem, as a lighter weight replacement for the GemInflector.

The TL;DR is that we’re deliberately violating convention, we have our reasons for doing so, and we would need to eager load the exceptional files even if we weren’t. It’s just inconvenient that we have to give up code reloading for that file and its dependencies.

fxn · June 19, 2020, 11:24pm

Thanks for such a detailed explanation.

Could you share the contents of projects/k12_math/config/project.rb? I believe I need to understand better the role of this file.

config directories are ignored, when is the file above evaluated? By who?

pvande · June 20, 2020, 12:01am

Sure thing:

module K12Math
  extend ProjectNamespace
  self.translations << :es
end

ProjectNamespace sets up some common configuration primitives, gives us some common convenience methods, that sort of thing. We basically treat these namespace modules as the object of record for the project.

As for when they get loaded, we have a Rails initializer that is at the moment simply running

Rails.root.glob('projects/*/config/project.rb').each do |project_config|
  require project_config.to_s
end

fxn · June 20, 2020, 9:12am

Got it!

First of all, let me say that project organization is cool, and I love that collapse allowed you guys to write it that way and do not need to repeat the project namespace all over the place.

If I have understood the situation correctly, the code in the projects is autoloaded/reloaded, except project.rb files, and you’d like to have them autoloaded/reloaded too as everything else. Right?

If that is correct, I see two possible solutions. Both assume the config directories are not ignored.

Solution 1: Custom inflector

I know you dislike this one, and you know most if not all of the things I am going to say, but allow me to build the argument, also for other people that may be reading the thread.

The name inflector was chosen because of the inherited concept from Rails, and because the most common use case for it is little tweaks. But it receives the absolute path as second argument, and could have been called something like “Path2ConstantMapper”.

I see the exception interface so overlapping with the inflector, that I am in principle reluctant to consider that one. Performance-wise, it is the more or less the same, because someone needs to check for every single file if it is an exception.

So the inflector I envision would be (written right here, not tested):

class MyInflector
  def self.camelize(basename, abspath)
    # Short-circuit for performance.
    return basename.camelize if basename != 'project'

    if abspath =~ %r{/projects/([^/]+)/config/project\.rb\z}
      $1.camelize
    else
      'Project'
    end
  end
end

If you want to keep project.rb files, this approach would use Zeitwerk idiomatically for my taste, and it is quite simple, no?

Solution 2: Rename project.rb to k12_math.rb

Change the project conventions to expect an entrypoint that defines the namespace in the config directory matching its name:

projects/k12_math/config/k12_math.rb

For this to work, all projects/*/config directories would be in the autoload paths.

This setup would take advantage of the support in Zeitwerk for nested root directories, in which the deeper one wins. This setup would be like having the top-level k12_math.rb you prefer to avoid to have projects self-contained, but allowing one subdirectory in them to act as Object.

I do not know if the approach as is would apply, perhaps config has more stuff in it, in which case perhaps the entrypoint could live somewehere else… Depends on details, but you see the idea.

It also depends on whether you prefer the uniformity of config/project.rb over the uniformity “files match constants without exceptions (mod collapse, mod nested root directories)”.

Do you like any?

pvande · June 20, 2020, 10:49pm

That’s the goal, yes.

In our particular application, this would work (and to be clear: we are likely to adopt this approach). The WTF that initiated this conversation, however, is that if you need to create one- or two-off static exceptions to the default filename-to-constant mapping, you need to replace a fundamental piece of the library with your own code, which is then responsible for either handling or delegating every file and directory name in your project.

Custom inflectors feel like a fantastic solution for when you need to make a systematic convention change (like handling hyphenated file name mapping), but they feel like the wrong abstraction for defining a couple of static mappings — it just feels like a lot of responsibility to hand over for something that is not particularly uncommon (see: the GemInflector). Furthermore…

More specifically, it maps path fragments to constant names advised by absolute paths. There are [at least] two important restrictions that this places on what it is capable of adjusting.

Given a file path <root>/foo/bar.rb, a custom Inflector cannot produce the constant ::BAR.
Given a file path <root>/foo.rb, a custom Inflector cannot produce the constant ::F::OO.

Again, I think that custom Inflectors are a great tool for when you have individual naming patterns that don’t map to constants according to the default convention, and these restrictions allow your augmented conventions to be applied consistently with no extra effort. But I don’t just get to write:

class MyInflector
  EXCEPTIONAL_FILES = {
    '/foo/bar/baz.rb' => '::Foo::Bar',
    # …
  }

  def self.camelize(basename, abspath)
    EXCEPTIONAL_FILES.fetch(abspath) { basename.camelize }
  end
end

(The Inflector is invoked at the wrong time to make this work: camelize is called during filesystem traversal, while this sort of constant mapping needs to happen when the parent constant is loaded.)

Grounding this in reality again, this is not a shortcoming that (currently) impacts our application.

The exception interface I’ve described has a few advantages:

It happens during constant load, which allows it to do things that Inflectors can’t.
It reduces the common case of single-file exceptions to their simplest form (one method call vs one class definition).
It separates the mechanisms for naming convention changes and files that don’t follow convention.
It makes it harder to screw up performance.
- Internally, the feature would almost certainly use a single test of Hash#key? per file; naïve custom Inflectors (like the one you wrote above, sans short-circuit) may perform much worse. (This may be alleviated with documentation.)
It eases the transition path for new naming conventions.
- As an example, as I was doing the initial refactoring to project/xxx/config/project.rb, having a mechanism that allowed me to explore that potential convention without committing to that convention for everything would have been greatly welcomed. At this point, it’s quite fair to say that we’re committed, and that’s the biggest reason we’re considering actually moving to the Inflector-based solution — we actually have a new convention now.

Having said that, I can also come up with reasons not to adopt such an interface:

Making conventional exceptions easy ends up weakening the convention.
It imposes a performance cost on every user, whether they have exceptions or not.
It’s easy to accidentally declare conflicting sources for the same constant, but hard to detect.
- This problem is not technically new.
It’s easy to accidentally declare mappings to files that will never be accessible.
Knowing when to use a custom Inflector vs. a handful of exceptions may be unclear.
- Documentation may be able to help with this.

Once again @fxn, I’d like to thank you for having this discussion with me. I appreciate the time and thought you’ve poured into it, and whether or not the proposal goes anywhere, I feel as though my concern has been heard.

fxn · June 21, 2020, 11:06am

Awesome analysis!

Yes, the fact that an inflector can only return a constant name is a consequence of the way this gem is designed, it is coherent with the whole concept, which has several motivations. Let me elaborate a bit.

As you know, in Ruby constants belong to modules, and Zeitwerk sets autoloads for them. An autoload is set on a class or module object, and only accepts a constant name (not a constant path). You know all that, just setting up the argument.

So, Zeitwerk works naturally in a way that makes the class or module object always present. The “current namespace”, its class or module object, is clearly identified at all times.

collapse and the inflector are flexible points, but are still aligned with that view. By forcing the project structure, you know a top-level constant is at the top level of a root directory, for example. Nowadays, that is mod collapsing. With collapsing you can no longer look at the project tree and guess the namespaces. Collapsing has a cost in term of predictability, uniformity, etc. you lose the 1-1 relationship. But in this case I thought this deviation could be worthwhile for some use cases (like yours!), and still plays well with the idea of the controlled descend.

Performance is also another aspect. It is convenient to be lazy for large code bases. If foo/bar/baz.rb could define ::Invoice, it is no longer enough to descend one namespace at a time. You’ve opened the door to be able to define anything anywhere, so you need to walk the whole project tree on setup and reload. (EDIT: This is in the context of an inflector returning constant paths instead pf constant names, the exception API would allow some optimizations, I believe.)

And, the most important of all reasons: The constraints are set in a way that makes the problem solvable in a way that matches Ruby semantics. With the current constraints, autoload can be implemented, the conventions are natural, and all known edge cases are addressed.

One tricky edge case is explicit namespaces. Files defining explicit namespaces have to be evaluated, because you need the actual class or module object that it defines in order to be able to set autoloads in them. So you need eager tree walk, and eager loading explicit namespaces. I do not even know if that would be technically possible, because explicit namespaces can refer any autoloadable constant at the top-level (think class body level).

If this was technically possible, which is something I do not know right now, we would have complicated the gem by a lot, just to allow users to define Admin::User in foo/bar/baz.rb. I think that is not worth pursuing.

The inflect method was a way to be able to declaratively add mappings without going through the trouble of defining a whole inflector. I’ll think about the addition of something similar for absolute paths.

This has been a great conversation, thanks a lot for it :).

Topic		Replies	Views
Are we missing primitives? Briefly comparing Rails & Laravel out of box experiences A May Of WTFs feature	5	1542	May 16, 2020
Looking for some input on career advancement w/ Rails rubyonrails-talk	6	122	July 7, 2010
Just an awful lot of things to know about A May Of WTFs	0	471	May 16, 2020
rails 1.2.2 rubyonrails-talk	9	85	March 10, 2007
[PRIZE] Docker for Rails Developers - beta launch rubyonrails-talk announcement	2	435	May 7, 2018