How can I prevent this N+1 query with Active Storage and a few associated models?

Hi,

After introducing Active Storage into one of my models I’m experiencing an N+1 query on active_storage_attachments that I can’t quite figure out how to resolve.

Here’s my associations:

class Course < ApplicationRecord
  has_many :sections
  has_many :lessons, through: :sections
end

class Section < ApplicationRecord
  belongs_to :course

  has_many :lessons
  has_many :videos, through: :lessons
end

class Lesson < ApplicationRecord
  belongs_to :section

  has_many_attached :downloads

  has_one :course, through: :section
  has_one :video
end

class Video < ApplicationRecord
  belongs_to :lesson

  has_one :course, through: :lesson
end

My goal is to be able to build a table of contents and loop through all of the sections and display a list of sections, lessons, video durations and display a number of files for each lesson. All of this is working except for the N+1 query.

Here’s my main controller query and view:

@sections = @course.sections.preload(:lessons, :videos).all
<% sections.each do |section| %>
  [REDACTING A BUNCH OF THINGS HERE, IT IS NOT IMPORTANT]

  <% section.lessons.each do |lesson| %>
    [REDACTING A BUNCH OF THINGS HERE, IT IS NOT IMPORTANT]
  
    <% lesson.downloads.includes(:blob).references(:blob).order(:filename).each do |download| %>
      [REDACTING A BUNCH OF THINGS HERE, IT IS NOT IMPORTANT]
    <% end %>
  <% end %>
<% end %>
  • The includes the blob query so I can sort the downloads by filename, I’m not sure if there’s a better way but I get the N+1 query with and without this part

I tried a bunch of combinations of things unsuccessfully. No matter what I try I end up with this query executing 212 times (1 for each lesson):

SQL (1.4ms)  SELECT "active_storage_attachments"."id" AS t0_r0, "active_storage_attachments"."name" AS t0_r1, "active_storage_attachments"."record_type" AS t0_r2, "active_storage_attachments"."record_id" AS t0_r3, "active_storage_attachments"."blob_id" AS t0_r4, "active_storage_attachments"."created_at" AS t0_r5, "active_storage_blobs"."id" AS t1_r0, "active_storage_blobs"."key" AS t1_r1, "active_storage_blobs"."filename" AS t1_r2, "active_storage_blobs"."content_type" AS t1_r3, "active_storage_blobs"."metadata" AS t1_r4, "active_storage_blobs"."service_name" AS t1_r5, "active_storage_blobs"."byte_size" AS t1_r6, "active_storage_blobs"."checksum" AS t1_r7, "active_storage_blobs"."created_at" AS t1_r8 FROM "active_storage_attachments" LEFT OUTER JOIN "active_storage_blobs" ON "active_storage_blobs"."id" = "active_storage_attachments"."blob_id" WHERE "active_storage_attachments"."record_id" = $1 AND "active_storage_attachments"."record_type" = $2 AND "active_storage_attachments"."name" = $3 ORDER BY "filename" ASC  [["record_id", 208], ["record_type", "Lesson"], ["name", "downloads"]]

I think I need to do something with with_attached_downloads but I can’t figure out where that would go. That scope is referenced here: ActiveStorage::Attachment

I can really use your assistance, thanks!

I have simplified the use-case in a runnable script:

And changing the preload to lessons: :downloads_attachments makes active_storage_attachments table to be queries once while originally in the script active_storage_attachments is being queries twice - once for every lesson, as in your original problem

# typed: true
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails", branch: "main"
  gem "rack", "~> 2.0"
  gem "sqlite3"
end

require "active_record/railtie"
require "active_storage/engine"
require "tmpdir"

class TestApp < Rails::Application
  config.root = __dir__
  config.hosts << "example.org"
  config.eager_load = false
  config.session_store(:cookie_store, key: "cookie_store_key")
  secrets.secret_key_base = "secret_key_base"

  config.logger = Logger.new($stdout)
  Rails.logger  = config.logger

  config.active_storage.service = :local
  config.active_storage.service_configurations = {
    local: {
      root: Dir.tmpdir,
      service: "Disk",
    },
  }
end

ENV["DATABASE_URL"] = "sqlite3::memory:"

Rails.application.initialize!

require ActiveStorage::Engine.root.join("db/migrate/20170806125915_create_active_storage_tables.rb").to_s

ActiveRecord::Schema.define do
  CreateActiveStorageTables.new.change

  create_table :sections, force: true do |t|
  end
  create_table :lessons, force: true do |t|
    t.references :section, null: false, foreign_key: false
  end
end

class Section < ActiveRecord::Base
  has_many :lessons
end

class Lesson < ActiveRecord::Base
  belongs_to :section

  has_many_attached :downloads
end

require "minitest/autorun"

class BugTest < Minitest::Test
  def test_upload_and_download
    section = Section.create!
    Lesson.create!(
      section: section,
      downloads: [{
        content_type: "text/plain",
        filename: "dummy.txt",
        io: ::StringIO.new("dummy"),
      }],
    )

    Lesson.create!(
      section: section,
      downloads: [{
        content_type: "text/plain",
        filename: "dummy2.txt",
        io: ::StringIO.new("dummy"),
      }],
    )

    Section.preload(:lessons).each do |section|
      section.lessons.each do |lesson|
        lesson.downloads.to_a
      end
    end
  end
end

Hi,

How would I apply that back to my example?

When I change my original query of:

@sections = @course.sections.preload(:lessons, :videos).all

To:

@sections = @course.sections.preload([lessons: :downloads_attachments], :videos).all

I’m still getting the same N+1 query on active_storage_attachments.

Could you try @sections = @course.sections.preload(lessons: :downloads_attachments, :videos) I removed .all as it shouldn’t play any role

Also could you additionally try

@sections = @course.sections.preload(lessons: {downloads_attachments: :blob}, :videos)

This should allow you to remove downloads.includes(:blob) from the most nested iteration.

Could you try @sections = @course.sections.preload(lessons: :downloads_attachments, :videos) I removed .all as it shouldn’t play any role

As is this produces a syntax error, did you mean to wrap it in brackets like [lessons: :downloads_attachments], :videos? If so then it works in the sense that it’s syntatically valid. It also works without the .all (good catch). N+1 still exists on the original Active Storage query.

@sections = @course.sections.preload(lessons: {downloads_attachments: :blob}, :videos)

This produces a syntax error of:

syntax error, unexpected ')', expecting => ...s_attachments: :blob}, :videos)

If I wrap it in brackets like ([lessons: {downloads_attachments: :blob}], :videos) then it works but it still has the N+1 query. I also tried simplifying the downloads loop to be lesson.downloads.each and now it all works without the N + 1 query.

However, I’m not sure how to adjust that loop to allow me to sort by the blob’s filename.

Getting close tho!

Yea, sorry, I should have tested it first. I expect the correct syntax should be preload({lessons: {downloads_attachments: :blob}}, :videos)

I’ve updated the script to also includes :videos so I can make sure I suggest correct syntax:

# typed: true
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails", branch: "main"
  gem "rack", "~> 2.0"
  gem "sqlite3"
end

require "active_record/railtie"
require "active_storage/engine"
require "tmpdir"

class TestApp < Rails::Application
  config.root = __dir__
  config.hosts << "example.org"
  config.eager_load = false
  config.session_store(:cookie_store, key: "cookie_store_key")
  secrets.secret_key_base = "secret_key_base"

  config.logger = Logger.new($stdout)
  Rails.logger  = config.logger

  config.active_storage.service = :local
  config.active_storage.service_configurations = {
    local: {
      root: Dir.tmpdir,
      service: "Disk",
    },
  }
end

ENV["DATABASE_URL"] = "sqlite3::memory:"

Rails.application.initialize!

require ActiveStorage::Engine.root.join("db/migrate/20170806125915_create_active_storage_tables.rb").to_s

ActiveRecord::Schema.define do
  CreateActiveStorageTables.new.change

  create_table :sections, force: true do |t|
  end
  create_table :lessons, force: true do |t|
    t.references :section, null: false, foreign_key: false
  end

  create_table :videos, force: true do |t|
    t.references :section, null: false, foreign_key: false
  end
end

class Section < ActiveRecord::Base
  has_many :lessons
  has_many :videos
end

class Lesson < ActiveRecord::Base
  belongs_to :section

  has_many_attached :downloads
end

class Video < ActiveRecord::Base
end

require "minitest/autorun"

class BugTest < Minitest::Test
  def test_upload_and_download
    section = Section.create!
    Lesson.create!(
      section: section,
      downloads: [{
        content_type: "text/plain",
        filename: "dummy.txt",
        io: ::StringIO.new("dummy"),
      }],
    )

    Lesson.create!(
      section: section,
      downloads: [{
        content_type: "text/plain",
        filename: "dummy2.txt",
        io: ::StringIO.new("dummy"),
      }],
    )

    Section.preload({lessons: {downloads_attachments: :blob}}, :videos).each do |section|
      section.lessons.each do |lesson|
        lesson.downloads.each { |d| d.blob }
      end
    end
  end
end

Since you are preloading it, you’ll have to sort it in-memory. Possibly lesson.downloads.sort(&:filename).each

Both variants are working without the N + 1 query. Is there a preferred syntax?

Section.preload({lessons: {downloads_attachments: :blob}}, :videos).each
Section.preload([lessons: {downloads_attachments: :blob}], :videos).each

Since you are preloading it, you’ll have to sort it in-memory.

On the bright side there’s only ever going to be handful of downloads so it shouldn’t be a problem, but is there any way to expand the preload so we can sort it at the DB level?

I’ve done “tricks” in the past like this where I added this to Section

has_many :ranked_lessons, -> { rank(:row_order) }, class_name: "Lesson"

This lets me preload ranked_lessons with a custom sort but I’m not quite 100% sure how to apply that back to this example. This depends on GitHub - brendon/ranked-model: An acts_as_sortable/acts_as_list replacement built for Rails 4, 5 and 6 to provide that scope but we should be able to swap that with any query. That just happens to demo a solution I do have working in my app.

lesson.downloads.sort(&:filename).each

This throws wrong number of arguments (given 1, expected 0).

But this works lesson.downloads.sort_by(&:filename).each, it does sort them. Although the complexity of this approach isn’t the best ruby on rails - Sort array returned by ActiveRecord by date (or any other column) - Stack Overflow but to be fair the filename isn’t indexed in my case so it probably won’t matter in the end. Still doing it at the DB “feels” like the optimal approach.

In either case, thanks a lot for helping me get this far!

Yea, sorry, was supposed to suggest sort_by

Technically it should be possible to come up with a custom built downloads_attachments_by_filename association that will include custom ordering similarly to ranked_lessons from your example but it may complicate maintenance of this custom association compared to one defined by Rails

That’s a fair point. I’ll admit I’m usually guilty of premature optimization but this feels like one time where the in-memory sort is “actually fine”. Especially since I’ll eventually be caching these collections since it’s extremely ready heavy with very little changes.

Thanks again.

If anyone happens to want to explore the custom association approach just to see how it could be done I’ll run your suggestions as a test. I’ll also reply back here if I get bored one day just to see how it could be done.

On an unrelated note:

We’ve been using active storage for a few years now, and every time we had a has_many_attached we eventually were forced to replace that with a new model and a has_one_attached because people wanted to sort the attachments in a specific way, or add titles, or other properties.

And it’s always a headache updating production to move the attachments to the new model.

Consider using the gem ar_lazy_preload | RubyGems.org | your community gem host it turned out to be very helpful in my apps

Hi Breno Im just about to start building a Document Share feature, and my thought was to do has_many_attached. From what you are writing there is some sort of lesson learned in your end. Can you elaborate just a little bit on how you do this and what to avoid?

Thanks Jens

Sure. Let’s start with one of the examples we had to change: Product images.

Originally we had this:

class Product
  has_many_attached :images
  
  def cover_image
    images.order(:id).first
  end
end

Products were created by an API integration with our sellers, and when attaching images we took the care to do so in the order our sellers were sending then, or according the a specific field (such as position). This worked very well for the first few sellers which were larger and also took care to properly prioritize their product images.

Then we got a seller who was sending images at random, and our business team started complaining that sometimes the cover image was horrible and they wanted to manually change it. Unfortunately we had no simple way to do this.

The ActiveStorage::Blob class has a metadata field, but you are not supposed to actually add more fields there since any call to analyze will overwrite it. So we were forced to rewrite this part of the code like this:

class Product
  has_many :photos

  def cover_image
    photos.order(:position).first
  end
end

class Photo
  has_one_attached :file
  validates :position, presence: true
end

Of course, that required a migration to turn 8 million has_many_attached blobs into 8 million Photo records, each with a has_one_attached blob.

So with your document share, you should first consider: Am I ever going to need to have extra rules for these attachments? Extra rules being: changing the order they are shown, giving them descriptions, controlling permissions separately, etc? If yes you should probably not use has_many_attached.

Even if not, I’d still consider not using it. Every single time we used has_many_attached we got burned and had to write massive migrations to convert them into has_one_attached.

Thanks Breno! When reading, it’s clear. Ny first, second and maybe hird implementation would have been similar to the first example :slight_smile:

Best Jens

1 Like

Both variants are working without the N + 1 query. Is there a preferred syntax?

Section.preload({lessons: {downloads_attachments: :blob}}, :videos).each
Section.preload([lessons: {downloads_attachments: :blob}], :videos).each

So, preload([lessons: {downloads_attachments: :blob}], :videos) is really preload([{lessons: {downloads_attachments: :blob}}], :videos). Ruby has implied curly braces when using a Hash as the last argument, or when in an Array.

irb(main):001:0> [:a, :b, c: :d]
=> [:a, :b, {:c=>:d}]

So really the question is about this vs this:

Section.preload({lessons: {downloads_attachments: :blob}}, :videos).each
Section.preload([{lessons: {downloads_attachments: :blob}}], :videos).each

Either way is fine. The preloader will go through each argument recursively for arrays and hashes.

I often put the symbol arguments first and hashes last. Like this:

Section.preload(:videos, lessons: { downloads_attachments: :blob }).each

I mostly do it like this because it involves less curly braces. But, it’s all the same.

1 Like