Brainstorming approaches to reconcile Rails 8's default multi-DB setup with DATABASE_URL

Rails 8 beta apps currently don’t deploy well out of the box with PaaS providers which provision a DATABASE_URL, like Fly.io and Heroku. This is because Rails 8 beta apps have up to 4 databases by default in production: primary, cache, queue, and cable.

Playing around with database.yml

Assuming I have a DATABASE_URL that looks like this: postgres://user:password@host:port/database_name, and assuming the other database names are database_name_cache, database_name_queue, etc, I had the following setup for Kamal deploys:

production:
  primary: &primary_production
    <<: *default
    url: <%= ENV['DATABASE_URL'] %>
  cache:
    <<: *primary_production
    url: <%= ENV['DATABASE_URL']&.+('_cache') %>
    migrations_paths: db/cache_migrate
  queue:
    <<: *primary_production
    url: <%= ENV['DATABASE_URL']&.+('_queue') %>
    migrations_paths: db/queue_migrate
  ...

Then I realized that this setup didn’t work with Fly.io because they use query parameters inside the database URLs, like postgres://user:password@host:port/database_name?sslmode=disable. So for a more general setup I am currently using the following:

production:
  primary: &primary_production
    <<: *default
    url: <%= ENV['DATABASE_URL'] %>
  cache:
    <<: *primary_production
    url: <%= URI.parse(ENV['DATABASE_URL']).tap { |u| u.path += '_cache' } if ENV['DATABASE_URL'] %>
    migrations_paths: db/cache_migrate
  queue:
    <<: *primary_production
    url: <%= URI.parse(ENV['DATABASE_URL']).tap { |u| u.path += '_queue' } if ENV['DATABASE_URL'] %>
    migrations_paths: db/queue_migrate
  ...

This deploys great with both Kamal and Fly.io. Heroku is another story though, it doesn’t work out of the box because they don’t seem to allow creating multiple databases within the same database add-on. I am setting aside this concern for now.

In any case, the yaml setup above looks a little too verbose in order to support Rails 8 defaults with PaaS providers. It is unclear to me in these early days if Rails should have slightly more streamlined configuration, or PaaS providers should adapt, or a little bit of both.

Finding ways to simplify the database.yml setup

I am assuming that DATABASE_URL represents the primary database, which seems to be the case according to the source code and the Heroku docs.

  • A solution could be to use DATABASE_URL as the base to define URLs for Solid databases.

    Maybe something like this PR from Sam Ruby.

    Or since traditionally ENV['DATABASE_URL'] has more precedence than database:, maybe introduce a higher precedence field like url_database (for lack of a better name):

    production:
      primary: &primary_production
        <<: *default
        url: <%= ENV['DATABASE_URL'] %>
      cache:
        <<: *primary_production
        url_database: myapp_production_cache
        ...
      queue:
        <<: *primary_production
        url_database: myapp_production_queue
        ...
    

    where url_database would override DATABASE_URL’s database name portion while still using other values like user/password/host/port.

  • According to this code, instead of complicating the database.yml setup, one could set the (PRIMARY_)DATABASE_URL, CACHE_DATABASE_URL, QUEUE_DATABASE_URL, and CABLE_DATABASE_URL variables. But if you want to set those URLs based on the value of (PRIMARY_)DATABASE_URL, you sometimes can’t do that before the first deploy because some PaaS providers like Fly.io don’t give you access to that variable (maybe there is a way to reference another variable when setting it). What would be best IMO is if the PaaS provider could populate all these variables automatically so it is seamless for the user.

I am curious about the Rails team’s thoughts on this topic and whether new PRs are needed to make things smoother.

2 Likes

Thanks for starting this discussion! And my PR can definitely be thrown away if a better approach is defined.

Even with kamal, setting four environment variables when only one would do seems annoying and mildly error prone. As an example, four values would have to be changed if you want your sqlite3 database to be placed on a volume.

What would be best IMO is if the PaaS provider could populate all these variables automatically so it is seamless for the user.

If no better solution is implemented in Rails, I will pursue this for fly.io. I’ll note that this requires parsing config/database.yml which isn’t precisely YAML, but rather a combination of YAML and ERB.

Yet another solution, one that works with fly.io is to create a config/initializers/databases.rb with contents like the following:

if ENV["DATABASE_URL"]
  databases = YAML.load_file("config/database.yml", aliases: true)["production"].
    map {|key, config| [key, config["database"]]}.to_h

  unless databases.keys == ["primary"]
    url = URI.parse(ENV["DATABASE_URL"])

    databases.each do |key, database|
      url.path = "/" + database
      ENV["#{key.upcase}_DATABASE_URL"] = url.to_s
    end
  end
end
1 Like

Playing with it more, I’m increasingly preferring the ergonomics of providing an option to set a single secret to control the placement of all of the databases. DRY and all that. Sure, if you absolutely want to place the four databases in four different places, you should be able to do that, but otherwise, everything should follow from the single DATABASE_URL secret.

The cleanest implementation I’m finding is to place this into the configuration. Yes, it is mildly ugly, but it is explicit, fully backwards compatible, and can still be overridden with environment variables like CACHE_DATABASE_URL. It seems that url, if present, takes precedence.

diff --git a/config/database.yml b/config/database.yml
index d8c045c..9522215 100644
--- a/config/database.yml
+++ b/config/database.yml
@@ -81,18 +81,22 @@ test:
 production:
   primary: &primary_production
     <<: *default
     database: depot_pg_production
     username: depot_pg
     password: <%= ENV["DEPOT_PG_DATABASE_PASSWORD"] %>
+    url: <%= ENV["DATABASE_URL"] %>
   cache:
     <<: *primary_production
     database: depot_pg_production_cache
     migrations_paths: db/cache_migrate
+    url: <%= URI.parse(ENV["DATABASE_URL"]).tap { |url| url.path += "_cache" } if ENV["DATABASE_URL"] %>
   queue:
     <<: *primary_production
     database: depot_pg_production_queue
     migrations_paths: db/queue_migrate
+    url: <%= URI.parse(ENV["DATABASE_URL"]).tap { |url| url.path += "_queue" } if ENV["DATABASE_URL"] %>
   cable:
     <<: *primary_production
     database: depot_pg_production_cable
     migrations_paths: db/cable_migrate
+    url: <%= URI.parse(ENV["DATABASE_URL"]).tap { |url| url.path += "_cable" } if ENV["DATABASE_URL"] %>

I can make fly launch insert these lines in config/database.yml if the definition of production is a map of maps, and contains more than just the definition for the primary database.

Yeah that seems to be in line with what the Rails guide and code say, if you explicitly provide a url:, it seems like it has the highest precedence. If you don’t provide a url but there is a (X_)DATABASE_URL env var present, then the env var subcomponents get merged on top of the components defined in yaml[1][2][3][4][5][6].

Yeah it is mildly ugly but I guess backwards compatibility trumps everything. When you start having multiple databases in your database.yml there is no way around a wordier database.yml with additional config. It gets even wordier by the way if you want to have multiple databases both in development and production :sweat_smile:: Best way to play around with Solid Queue in development? · Issue #332 · rails/solid_queue · GitHub.

I am wondering if we couldn’t have our cake and eat it too by introducing some syntactic sugar or helpers on top of database.yml to shorten default configuration for solid databases. For what they’re worth here are some random ideas:

  • url: <%= Rails.configuration.solid.cable_db_url %>

    as a shorthand for

    url: <%= URI.parse(ENV["DATABASE_URL"]).tap { |url| url.path += "_cache" } if ENV["DATABASE_URL"] %>

  • solid: true to do this

    cache:
      <<: *primary_production
      solid: true
    queue:
      <<: *primary_production
      solid: true
    ...
    

    as a shorthand for this

    cache:
      <<: *primary_production
      url: <%= URI.parse(ENV['DATABASE_URL']).tap { |u| u.path += '_cache' } if ENV['DATABASE_URL'] %>
      migrations_paths: db/cache_migrate
    queue:
      <<: *primary_production
      url: <%= URI.parse(ENV['DATABASE_URL']).tap { |u| u.path += '_queue' } if ENV['DATABASE_URL'] %>
      migrations_paths: db/queue_migrate
    ...
    

    I like this one because it could significantly reduce verbosity for a default Rails 8 setup, and you could still override it if needed.

Looks like an overkill from rails to require 3 databases. Queue and cache may not be in a RDBMS to begin with.

1 Like

Technically those databases are not required; but the key is that they are configured by default whether you use them or not; and unless you are aware of them your first experience will be an obscure error message that some sort of unix socket is not found on localhost for a database that you may not even use or need.

This is no longer an issue for fly.io - the overhead for migration and storage for these additional databases appear to be minimal.

[EDIT: I should have read closer the original post here, where it seems that DATABASE_URL should actually be used for primary based on the code. I’m going to go investigate what was going on with my set up that caused my primary to ignore the DATABASE_URL. Please ignore literally everything below here.]

I have a Documentation PR related to this already open… as I had production downtime due to this change when switching to SolidQueue.

It doesn’t mention anywhere in the rails guide that mulit-databases ignores DATABASE_URL. I think it’s fine for us to move away from that, but at least the guide should mention how to fix it!