Frequently querying Users' relationships with the current user

Hello,

I have a table with Follower Relationships between Users. Each relationship contains the two users (follower and followed) and a state that indicates whether the relationship has only been requested or has already been accepted.

This information is always included when returning a User’s JSON: there are four related states, covering whether a user is following or followed by the current user and whether either relationship has only been requested.

Currently this is done in the view, which of course results in queries and isn’t efficient when many users are returned, because each returned user causes four additional queries.

Now my question is how I would pre-fetch this information, preferably ending up with an easy way to always do so regardless of how users are being queried. While I know I can eagerly load a user’s relationships and then check for their presence, I want to avoid querying all the relationships of a user when I am only interested in the relationship with the current user.
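
To make that concrete, the eager-loading variant I’d like to avoid looks roughly like this (association, model and column names are simplified placeholders, not my actual code):

# Simplified sketch of the eager-loading approach I want to avoid.
# Assumed: a Follow model with follower_id, followed_id and state columns.
class User < ApplicationRecord
  has_many :active_follows,  class_name: "Follow", foreign_key: :follower_id
  has_many :passive_follows, class_name: "Follow", foreign_key: :followed_id
end

# This avoids the N+1, but loads *every* relationship of every returned user
# just to find the one row (per direction) that involves current_user
# (current_user being available in the controller/view):
users = User.includes(:active_follows, :passive_follows).limit(50)
users.each do |user|
  followed = user.passive_follows.any? { |f| f.follower_id == current_user.id && f.state == "accepted" }
end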

All of this is happening in PostgreSQL, in case that’s relevant. Currently Rails 6.1 is in use, but if there is a better solution in 7+ I’d still like to hear about it.

What numbers are we talking about? How badly does this currently affect the system? What is the desired performance vs. what can currently be observed? What hardware is available?

Hey, thank you for your answer. I’ll try to give some context; there is a summary below.

I don’t completely follow what some of these questions have to do with my original question, because I am not asking how I should dimension my application server(s), nor how much of a priority this should be, but I’ll gladly give some answers and rough (measured) numbers if it helps.

Yes, it is currently a problem and it’s growing of course. I’ll give some context below.

We are talking about hundreds of thousands of individual users, some of whom have thousands of such relationships. Of course not all of them are returned at once; going through them is paginated, for example.

However, the issue is that for some requests, in extreme cases, many (hundreds, approaching a thousand) users are returned as part of other objects, and having four additional queries made for each of them is quite a bit of overhead, even though the individual queries typically take around 1ms (including network latency, etc.). That still means that in extreme cases a request ends up spending a whole second on processing. I want to avoid building a solution that only tackles these extreme cases.

My thought process therefore is to combine as much of this as possible into single queries. As mentioned, this is currently triggered by the view, and I am not sure what the best way (also in terms of idiomatic code) is to at least combine these four queries per user into one.

I have been thinking about having a method that returns a hash and then using that in the view. That would mean only one additional query, but I think it looks rather unidiomatic and still means that this cost is incurred rather frequently.
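
Roughly what I have in mind (model and flag names are just placeholders for illustration; the exact flag semantics are more involved):

# One query for all follow rows that involve current_user and the listed
# users, bucketed into a hash keyed by the other user's id.
def follow_states_for(current_user, user_ids)
  rows = Follow.where(follower_id: current_user.id, followed_id: user_ids)
               .or(Follow.where(follower_id: user_ids, followed_id: current_user.id))

  rows.each_with_object(Hash.new { |h, k| h[k] = {} }) do |follow, states|
    outgoing = follow.follower_id == current_user.id
    other_id = outgoing ? follow.followed_id : follow.follower_id
    accepted = follow.state == "accepted"

    if outgoing
      states[other_id][accepted ? :is_followed : :is_follow_requested] = true
    else
      states[other_id][accepted ? :is_following : :has_requested_follow] = true
    end
  end
end

# In the view: states = follow_states_for(current_user, users.map(&:id))
#              states[user.id] # => { is_followed: true, ... }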

My overall goal is to keep requests completing in well under a second without complicating my code too much, which is why I think moving all of this logic into the controller, and if possible somewhat “generically” into something I can re-use, would be great. I know that might not be all that easy, which is why I would be fine with at least significantly reducing the number of queries.

And for your last question: on the hardware side we are not currently limited. The database server (and its replicas) is over-provisioned, and the application servers are in the process of being scaled out. Since this is an older project there is some work needed to make that easier, but it’s unrelated to this problem.

I hope that is concrete enough. So to sum it up: there are hundreds of thousands of users. Since not all of them show the same activity or have the same amount of popularity, making these additional very small queries is at least okay for the majority of users, but for a growing number of them having that many very small queries becomes an issue. Throwing additional hardware at the problem wouldn’t be an issue, but that doesn’t currently look like the right approach to me, because we are not limited there and because it’s the individual requests that are slow.

Some side notes: Caching isn’t really working so well here, because we are talking about what a single user gets about other users in relation to themselves, so this data is both per-user and needs to be up to date.

I focused on the queries in the view here, since that is the main problem. But for better understanding I should mention that these relationships are also a form of permission system, so to speak (i.e. you can only view something when you are an accepted follower). In some situations this check has to happen upfront to decide what is shown, so it is also used for that. However, this is less of an issue, since it is usually done as part of whatever is queried for anyway. As mentioned, we are talking about sub-millisecond queries here.

I don’t know if this will help or not, but one thing I have tried to follow in my work is to make “the slow thing” update “the fast thing”. By this I mean that if there’s an expensive query which returns a set of records or ids, then run that query once, when the underlying data changes, and store the result somewhere durable and faster to query.

So if the act of accepting a follow changes the result set of accepted followers, cache that set, and then also update the follower’s cache of allowed follows. That action (accepting) only happens once, in the context of one user. Compare that with running the same query to find the accepted followers or allowed follows on every page view or subsequent message send.

I know that callbacks have a shady history in Rails, but this is one case where I would reach for them to build in some pre-caching to speed up your joins.
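
Something along these lines (all names made up, just to show the shape of it):

# When a follow changes state, run the "slow" lookup once and store the
# result somewhere fast to read on every page view.
class Follow < ApplicationRecord
  belongs_to :follower, class_name: "User"
  belongs_to :followed, class_name: "User"

  after_save    :refresh_accepted_follower_ids, if: :saved_change_to_state?
  after_destroy :refresh_accepted_follower_ids

  private

  def refresh_accepted_follower_ids
    ids = Follow.where(followed_id: followed_id, state: "accepted").pluck(:follower_id)
    Rails.cache.write(["accepted-follower-ids", followed_id], ids)
  end
end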

Materialized views with an on-update trigger are another, more DB-centric, solution you might look at.
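
For that route, the migration could look something like this (table and column names invented; whether you refresh the view from a trigger, a callback, or a scheduled job is up to you):

# Keeps a pre-filtered copy of only the accepted relationships.
class CreateAcceptedFollowsView < ActiveRecord::Migration[6.1]
  def up
    execute <<~SQL
      CREATE MATERIALIZED VIEW accepted_follows AS
        SELECT follower_id, followed_id FROM follows WHERE state = 'accepted'
    SQL
    # A unique index allows REFRESH MATERIALIZED VIEW CONCURRENTLY later on.
    execute "CREATE UNIQUE INDEX ON accepted_follows (follower_id, followed_id)"
  end

  def down
    execute "DROP MATERIALIZED VIEW accepted_follows"
  end
end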

Walter

This is the typical Rails approach, and should yield better performance because your application will retrieve an entire page of records with one query. The query output is stored in a controller instance variable (e.g., @followers). Then all your view has to do is iterate over that variable.

Check out the Kaminari pagination gem. The README has a thorough description of how to use it.
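
In concrete terms the pattern is roughly this (assuming a followers association exists):

# In the controller: fetch one page of records with a single query.
@followers = current_user.followers.page(params[:page]).per(25)

# In the view (ERB), iterate over @followers and render the pagination links:
# <%= paginate @followers %>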

@walterdavis Hey, thank you for your response. I totally get that, I do that too.

However, the issue here is rather the opposite: I have many small, simple queries, and I’d like to reduce their number. There isn’t really anything to pre-calculate or materialize, because the result is per user.

@djmolny Thanks for your response, I actually looked at the gem before, when I implemented pagination. I didn’t use it, but drew inspiration from it.

Just to clarify, how would you do it in this specific case?

I have user objects that are supposed to end up like:

{
  id: 1234,
  name: "John Doe",
  is_followed: true,
  is_following: false,
  is_follow_requested: false,
  ...
}

The goal here is to allow the front end to display Users accordingly.

These is_* flags are backed by a table with follower, followed and status columns, and the queries, as mentioned, check against the current user of the session. In the current version the User model contains methods like is_followed(user_id), which are called in the view as user.is_followed(current_user); this of course results in many queries when multiple users are returned.
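
Simplified, the current setup is essentially this (condensed, not the literal code):

# Current shape (simplified): each flag is its own query, so rendering N
# users costs roughly 4 * N extra queries.
class User < ApplicationRecord
  def is_followed(other_user)
    Follow.exists?(follower_id: other_user.id, followed_id: id, state: "accepted")
  end
  # is_following, is_follow_requested, ... look the same, just with the
  # follower/followed columns or the state condition swapped.
end

# In the view, per rendered user:
#   user.is_followed(current_user)   # query 1
#   user.is_following(current_user)  # query 2, and so on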

My intermediate solution would be to at least reduce the number of queries, for example by creating a helper that does one query and returns the states, or something like that, so I can be somewhat independent of the original query - for example when a User is just joined to some other model. However, it might be interesting to know whether there is some way to do that generically on the ORM side.
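
For what it’s worth, the most “ORM-side” version I can picture is selecting the flags as part of the user query itself, something like this (table and column names simplified, and only two of the flags shown):

class User < ApplicationRecord
  # Selects the follow flags relative to `viewer` as virtual attributes in
  # the same query that loads the users, instead of querying per user.
  def self.with_follow_states_for(viewer)
    flags = sanitize_sql_array([<<~SQL, viewer.id, viewer.id])
      EXISTS (
        SELECT 1 FROM follows
        WHERE follows.follower_id = ? AND follows.followed_id = users.id
          AND follows.state = 'accepted'
      ) AS is_followed,
      EXISTS (
        SELECT 1 FROM follows
        WHERE follows.follower_id = users.id AND follows.followed_id = ?
          AND follows.state = 'accepted'
      ) AS is_following
    SQL

    select("users.*", flags)
  end
end

# @users = User.with_follow_states_for(current_user).limit(50)
# @users.first.is_followed  # read straight from the selected column, no extra query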