Persistent ActiveRecord N+1 Query Performance Degradation Under Production Load

Hello Rails Community,

I am currently facing a persistent and increasingly disruptive performance issue on my Ruby on Rails website, and after extensive debugging I believe it is rooted in an ActiveRecord N+1 query problem that becomes severe under production load. The application works correctly from a functional standpoint, but when multiple users access data-heavy pages simultaneously, response times increase dramatically and occasionally lead to request timeouts. The specific issue appears on a page that renders a collection of parent records along with several associated child records and nested associations. While everything renders as expected, the server logs show a high number of repeated SQL queries being executed for associated records, and the overall request duration increases proportionally with the number of records returned.

In development, this issue is noticeable but manageable because the dataset is relatively small. However, in production, where the database contains thousands of records and associations, the performance degradation becomes significant. For example, rendering a list of 50 parent objects, each with associated child records and additional nested relations, results in hundreds of individual SQL queries. Even though the database is properly indexed and hosted on a performant managed service, the overhead of repeatedly querying associated tables introduces substantial latency. Profiling tools indicate that the majority of request time is spent waiting on database queries rather than Ruby processing.

I have attempted to address this issue using includes, preload, and eager_load in various combinations, but the results are inconsistent. In some cases, eager loading reduces the number of queries as expected, but in other scenarios—particularly when conditional logic or scopes are applied to associations—the N+1 behavior seems to reappear. Additionally, certain nested associations require filtering based on runtime parameters, which complicates the eager loading strategy. I have reviewed the logs carefully and confirmed that despite using includes, Rails still issues separate queries for certain associations when conditions are evaluated dynamically in the view layer.

Another complexity arises from view partials that iterate over associated records. These partials sometimes call methods on associated models that trigger additional lazy-loaded queries. While these methods appear harmless and encapsulate business logic cleanly, they seem to inadvertently cause further database hits. Attempts to refactor the logic into precomputed fields or memoized attributes have only partially reduced the query count. The interplay between view rendering, association scopes, and dynamic method calls makes it difficult to eliminate the N+1 queries entirely without significantly restructuring the codebase.

In production, this issue becomes amplified when traffic increases. Application server threads become blocked while waiting on database responses, leading to increased queue times and degraded performance across unrelated endpoints. Although caching mechanisms such as fragment caching and low-level caching have been implemented, cache invalidation complexity prevents aggressive caching of certain dynamic components. As a result, many requests still rely heavily on real-time database access, which exposes the full impact of the N+1 query problem under concurrent load.

I am seeking guidance from the Rails community on best practices for completely eliminating persistent N+1 queries in complex nested association scenarios. Specifically, I would appreciate advice on structuring queries to handle conditional eager loading, avoiding lazy-loading within model methods and partials, and ensuring that includes or eager_load behave as intended when scopes are applied dynamically. Any recommendations on profiling tools, architectural refactoring strategies, or patterns for balancing eager loading with performance optimization would be extremely valuable. My goal is to ensure consistent, scalable performance for this endpoint without sacrificing code clarity or maintainability. Sorry for long post!

Is there anyone who can guide me?

Obviously there is no substitute for doing the analysis work yourself, but if you are looking for an easy band-aid GitHub - salsify/goldiloader: Just the right amount of Rails eager loading might something to look into.

I would start with one of two things (or maybe both)

  1. Monitoring, ideally a production monitoring tool to monitor metrics like datadog, new relic, sentry. This can be used to help identify timeouts, hot spots, etc. This is probably best if it’s live-traffic related
  2. Replication of a large set of data locally can sometimes be pretty trivial. If you’re using postgres and your policies allow it, you could take a snapshot and clone it locally and have the same size dataset

There’s so many things this could actually be that it’s hard to tell without analytics + code review.

It’s tricky. I setup a sample repo a couple months ago with some examples in the readme here that kind of walks thru what’s going on with some of the SQL, and “when” (shows some examples of things you need to think about to get the SQL you actually want). There’s a link there to a book called Advanced Active Record that I found useful in better understanding some of the internals.

The Complete Guide to Rails Performance is another good place to start.