Persistent ActiveRecord N+1 Query Performance Degradation Under Production Load

Hello Rails Community,

I am currently facing a persistent and increasingly disruptive performance issue on my Ruby on Rails website, and after extensive debugging I believe it is rooted in an ActiveRecord N+1 query problem that becomes severe under production load. The application works correctly from a functional standpoint, but when multiple users access data-heavy pages simultaneously, response times increase dramatically and occasionally lead to request timeouts. The specific issue appears on a page that renders a collection of parent records along with several associated child records and nested associations. While everything renders as expected, the server logs show a high number of repeated SQL queries being executed for associated records, and the overall request duration increases proportionally with the number of records returned.

In development, this issue is noticeable but manageable because the dataset is relatively small. However, in production, where the database contains thousands of records and associations, the performance degradation becomes significant. For example, rendering a list of 50 parent objects, each with associated child records and additional nested relations, results in hundreds of individual SQL queries. Even though the database is properly indexed and hosted on a performant managed service, the overhead of repeatedly querying associated tables introduces substantial latency. Profiling tools indicate that the majority of request time is spent waiting on database queries rather than Ruby processing.

I have attempted to address this issue using includes, preload, and eager_load in various combinations, but the results are inconsistent. In some cases, eager loading reduces the number of queries as expected, but in other scenarios—particularly when conditional logic or scopes are applied to associations—the N+1 behavior seems to reappear. Additionally, certain nested associations require filtering based on runtime parameters, which complicates the eager loading strategy. I have reviewed the logs carefully and confirmed that despite using includes, Rails still issues separate queries for certain associations when conditions are evaluated dynamically in the view layer.

Another complexity arises from view partials that iterate over associated records. These partials sometimes call methods on associated models that trigger additional lazy-loaded queries. While these methods appear harmless and encapsulate business logic cleanly, they seem to inadvertently cause further database hits. Attempts to refactor the logic into precomputed fields or memoized attributes have only partially reduced the query count. The interplay between view rendering, association scopes, and dynamic method calls makes it difficult to eliminate the N+1 queries entirely without significantly restructuring the codebase.

In production, this issue becomes amplified when traffic increases. Application server threads become blocked while waiting on database responses, leading to increased queue times and degraded performance across unrelated endpoints. Although caching mechanisms such as fragment caching and low-level caching have been implemented, cache invalidation complexity prevents aggressive caching of certain dynamic components. As a result, many requests still rely heavily on real-time database access, which exposes the full impact of the N+1 query problem under concurrent load.

I am seeking guidance from the Rails community on best practices for completely eliminating persistent N+1 queries in complex nested association scenarios. Specifically, I would appreciate advice on structuring queries to handle conditional eager loading, avoiding lazy-loading within model methods and partials, and ensuring that includes or eager_load behave as intended when scopes are applied dynamically. Any recommendations on profiling tools, architectural refactoring strategies, or patterns for balancing eager loading with performance optimization would be extremely valuable. My goal is to ensure consistent, scalable performance for this endpoint without sacrificing code clarity or maintainability. Sorry for long post!