Is threading the right option? Or changing my Apache setup?

I have a Rails App that is hitting a fairly big database (several million rows of data).

The app runs really well, as the servers have enough RAM to keep plenty of mongrels running and fully primed instances of postgres up (with several of the more commonly used tables in RAM), but I have hit one performance brick wall that I am not sure how to get around.

If a user requests one of our larger queries (which can take up to 2-3 minutes to run), that mongrel is blocked while Rails chugs away with postgres fetching the result set. I have tuned the query (it now takes 2-3 minutes instead of 8-15) and have the correct indices on the tables, etc. I am sure I can do more here, but the speed gains are diminishing.

The problem is not so much the response time for the user who runs the query; they know this query will take time, and that is expected. It is done via an AJAX call and they get some progress information.

The problem is that the Apache server then takes the next incoming requests and sends them off to the mongrels in turn; it wraps around all the mongrels, tries to serve this mongrel that is still doing the long query, and so the second user gets blocked waiting for the first query to finish.

One option would be a multi-threaded Rails app, but I am sure there is a better one.

I tried setting the Apache balancer to max=1 but this didn't seem to solve it.
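For reference, that change amounts to something like this in the proxy config (the balancer name and ports here are illustrative, not my actual setup):

```apache
<Proxy balancer://mongrelcluster>
    # max=1 limits Apache to one connection per mongrel,
    # but on its own it did not stop requests queuing behind a busy member
    BalancerMember http://127.0.0.1:4000 max=1
    BalancerMember http://127.0.0.1:4001 max=1
    BalancerMember http://127.0.0.1:4002 max=1
</Proxy>
```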

How is anyone else handling this? Running it in backgroundrb doesn't seem to be an option because I am producing an interactive list, not a static page / pdf or report.

So, how do I get apache to ignore this blocked mongrel and skip on to the next one?

Regards

Mikel

Mikel Lindsaar wrote:

So, how do I get apache to ignore this blocked mongrel and skip on to the next one?

I had the same problem with nginx and used a patched version, discussed along with other solutions here: Again: Workaround found for request queuing vs. num_processors, accept/close - Mongrel - Ruby-Forum

In your case, using haproxy behind Apache might be the simplest.
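A minimal sketch of what that haproxy layer could look like (listen address, ports and server names are illustrative; the key idea is maxconn 1 per backend):

```haproxy
listen mongrels 127.0.0.1:8000
    balance roundrobin
    # maxconn 1 makes haproxy queue requests itself rather than
    # sending a second request to a mongrel that is still busy
    server mongrel0 127.0.0.1:4000 maxconn 1
    server mongrel1 127.0.0.1:4001 maxconn 1
```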

If you are interested in nginx, the link to the patched nginx doesn't work for me now. I can provide the 0.6.24 sources with the fair balancer module I use in production if needed. The diff between the official version and mine should be small enough for a quick audit (I did just that some weeks ago).

Lionel

How is anyone else handling this? Running it in backgroundrb doesn't seem to be an option because I am producing an interactive list, not a static page / pdf or report.

Why couldn't you pass it off to backgroundrb? It could then stuff the results into a temporary table or memcache, and you could look for a finished result there. That would free up the mongrel to do its thing, as you'd only be querying "are you done yet?" over and over till it was.

Maybe your data won't let you do that due to its size requirements, though.
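The shape of that pattern, sketched in plain Ruby (a Hash plus a background thread stand in for memcache/a temp table and the backgroundrb worker; every name here is made up for illustration):

```ruby
# Stand-in for memcache or a temp table: a thread-safe results store.
RESULTS = {}
RESULTS_LOCK = Mutex.new

# Stand-in for the backgroundrb worker: runs the long query off the
# request cycle and stores the finished rows under a job id.
def start_long_query(job_id)
  Thread.new do
    rows = (1..3).map { |i| "row #{i}" } # pretend this took minutes
    RESULTS_LOCK.synchronize { RESULTS[job_id] = rows }
  end
end

# What each AJAX poll does: a cheap "are you done yet?" lookup, so the
# mongrel answering it is never tied up by the query itself.
def poll(job_id)
  RESULTS_LOCK.synchronize { RESULTS[job_id] }
end

worker = start_long_query("job-42")
worker.join # in real life the client just polls again later
poll("job-42") # returns the finished row list, or nil while running
```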

I had thought about using the temp table approach; it has some benefits, the main one being that it lets the user get on with something else while the list is generating... but I need a solution now. I think I'll hit that in a future version; good idea though.

I ended up putting the following in my balancer group:

<Proxy balancer://.......>
    BalancerMember http://127.0.0.1:4000 max=1 acquire=100
    (repeat for each mongrel)
</Proxy>

And that seems to have handled it; the Apache server now skips over the blocked Mongrel.

I'll have a look at HAProxy or nginx per Lionel's post on the next performance iteration.

Question though Phillip, how would Memcache help in this situation of long running SQL queries?

I can see how it would work with BackgroundRB and a temp table: you have an AJAX auto-requester on the page polling the mongrel with "are we done yet?", and when the task is finished, pop it out. I guess you would have the mongrel pack poll a database table to see if job XYZ is finished yet, retrieve the temp table name to read from once it is, and then send the data back to the client.

That actually sounds like a good solution now that I think of it. But I don't know enough about memcache to know how this would fit in.

Regards

Mikel

Question though Phillip, how would Memcache help in this situation of long running SQL queries?

Never mind, I went and read the memcached website :slight_smile:

thanks for the good pointer, it looks like a good idea!

Regards

Mikel

In fact, you can use BackgrounDRb to store results in Memcache, so the result is available across all the mongrels in the cluster. In a nutshell, you pass the query to a BackgrounDRb worker, and the worker stores the result back in memcache under a session identifier. You poll BackgrounDRb with ask_status, and when the query is finished ask_status will return the final result.

You don't even need to use Memcache directly; bdrb has a configuration option where you can specify whether to use Memcache for worker result storage.
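The ask_status contract described above can be mimicked in plain Ruby to show the flow (this is a toy stand-in, not the real BackgrounDRb API; only ask_status and the per-session result key come from the description, everything else is invented for illustration):

```ruby
# Toy stand-in for the BackgrounDRb middleman, keyed by session id.
class ToyMiddleMan
  def initialize
    @status = {}
    @lock = Mutex.new
  end

  # Worker side: publish progress, then the final result.
  def register_status(session_id, value)
    @lock.synchronize { @status[session_id] = value }
  end

  # Controller side: each AJAX poll calls this; cheap and non-blocking.
  def ask_status(session_id)
    @lock.synchronize { @status[session_id] }
  end
end

middleman = ToyMiddleMan.new

worker = Thread.new do
  middleman.register_status("sess-1", :working)
  rows = ["a", "b"] # pretend: the 2-3 minute query
  middleman.register_status("sess-1", rows)
end
worker.join

middleman.ask_status("sess-1") # the finished rows once the worker is done
```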

One more way (the way I do it) would be to write a custom Mongrel handler for that particular request. That solves it because, outside the Rails dispatcher, Mongrel can handle multiple requests simultaneously.

I have (somewhat) solved the multiple-request problem by having Apache skip over busy mongrels, but this solution sounds interesting.

Any pointers on where to start on that? That sounds like a good gem :smiley:

Mikel