I have a Rails app that is hitting a fairly big database (several
million rows of data).
The app runs really well, as I have enough RAM on the servers to run
enough mongrels and keep fully primed instances of Postgres up (with
several of the more commonly used tables in RAM), but I have hit one
performance brick wall that I am not sure how to get around.
If a user requests one of our larger queries (which can take 2-3
minutes to run), that mongrel is blocked while Rails waits on Postgres
for the resulting data set. I have tuned the query (it now takes 2-3
minutes instead of 8-15) and have the correct indices on the tables,
etc. I am sure I can do more here, but the speed returns are
diminishing.
The problem is not so much the response time for the user running the
query, as they know this query will take time and expect it. It is
done via an AJAX call and they get some progress information.
The problem is that the Apache server takes the next incoming
requests and sends them to the mongrels in turn: it wraps around
all the mongrels and tries to serve again through the mongrel that is
running the long query, so the second user is blocked waiting for
the first query to finish.
One fix would be a multi-threaded Rails app, but I am sure there is
a better option.
I tried setting the Apache balancer to max=1, but this didn't seem to solve it.
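For context, the max=1 attempt looks something like this in a typical mod_proxy_balancer setup (member addresses and ports here are placeholders, not my actual config):

```apache
# Sketch of a mod_proxy_balancer block; addresses/ports are placeholders.
# max=1 is meant to cap Apache at one connection per mongrel member.
<Proxy balancer://mongrel_cluster>
  BalancerMember http://127.0.0.1:8000 max=1
  BalancerMember http://127.0.0.1:8001 max=1
  BalancerMember http://127.0.0.1:8002 max=1
</Proxy>
ProxyPass / balancer://mongrel_cluster/
```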
How is anyone else handling this? Running it in BackgrounDRb doesn't
seem to be an option, because I am producing an interactive list, not a
static page / PDF or report.
So, how do I get Apache to ignore this blocked mongrel and skip on to
the next one?
In your case, using HAProxy behind Apache might be the simplest option.
If you are interested in nginx, the link to the patched nginx doesn't
work for me right now. I can provide the 0.6.24 sources with the fair
balancer module I use in production if needed. The diff between the
official version and mine should be small enough for a quick audit (I
did just that some weeks ago).
Why couldn't you pass it off to BackgrounDRb? It could stuff the results into a temporary table or memcache, and you could look for a finished result there. That would free up the mongrel to do its thing, as you'd only be querying "are you done yet?" over and over until it was.
Maybe your data won't let you do that due to its size requirements, though.
I had thought about using the temp-table approach; it has some
benefits, #1 being that it allows the user to get on with something
else while the list is generating. But I need a solution now, so I
think I'll hit that in a future version. Good idea, though.
I ended up putting the following in my balancer group:
And that seems to have handled it: the Apache server now skips over the
blocked mongrel.
I'll have a look at HAProxy or nginx per Lionel's post in the next
performance iteration.
A question, though, Phillip: how would memcache help in this situation
of long-running SQL queries?
I can see that with BackgrounDRb and a temp table, you'd have an AJAX
auto-requester on the page polling a mongrel, asking "are we done
yet?", and when the task is finished, popping the result out. I guess
you would have the mongrel pack poll a database table to see whether
job XYZ is finished yet, retrieve the temp-table name to read from once
the job is done, and then send the data back to the client.
That actually sounds like a good solution now that I think of it. But
I don't know enough about memcache to know how it would fit in.
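The poll-until-done pattern described above can be sketched in plain Ruby: here a Hash stands in for memcache (or a temp table), and a Thread stands in for the BackgrounDRb worker. The names (`start_long_query`, `poll`, the job key) are illustrative, not a real API.

```ruby
# "Are we done yet?" pattern: a worker writes its result into a shared
# store under a job key; the web tier answers polls instantly and never
# blocks on the long query itself.
require 'thread'

RESULTS = {}              # job_key => result rows (stand-in for memcache)
MUTEX   = Mutex.new

def start_long_query(job_key)
  Thread.new do
    rows = (1..5).map { |i| "row #{i}" }     # pretend this took minutes
    MUTEX.synchronize { RESULTS[job_key] = rows }
  end
end

# What the AJAX poll action would do: check the store and return fast.
def poll(job_key)
  MUTEX.synchronize { RESULTS[job_key] } || :not_ready
end

worker = start_long_query("job-xyz")
worker.join               # in real life the browser just re-polls instead
poll("job-xyz")           # => ["row 1", "row 2", "row 3", "row 4", "row 5"]
```

The key property is that the poll action returns immediately whether or not the job is done, so no mongrel ever sits blocked on the long query.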
In fact, you can use BackgrounDRb to store results in memcache, so that
the result is available across all the mongrels in the cluster.
In a nutshell, you pass the query to a BackgrounDRb worker, and the
worker stores the result back in memcache with a session identifier.
You poll BackgrounDRb with ask_status, and when the query is finished,
ask_status will return the final result.
You don't even need to use memcache directly: bdrb has a configuration
option where you can specify whether you want to use memcache for
worker result storage.
One more way (I do it this way) would be to write a Mongrel handler for that particular request. That solves it, because Mongrel itself can handle multiple requests simultaneously; it is the Rails dispatcher that serializes them.
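A sketch of what such a handler might look like, assuming Mongrel's HttpHandler API; the URI, `run_big_query`, and the registration snippet are placeholders for your own query and setup:

```ruby
# Sketch of a bare Mongrel handler registered alongside Rails.
# Requests to this URI bypass the Rails dispatcher (and its lock),
# so other requests keep being served while the query runs.
require 'mongrel'

class BigQueryHandler < Mongrel::HttpHandler
  def process(request, response)
    rows = run_big_query            # placeholder for the long Postgres query
    response.start(200) do |head, out|
      head["Content-Type"] = "text/html"
      out.write(rows.to_s)
    end
  end
end

# Attached when starting Mongrel (ports/paths are placeholders):
#   server = Mongrel::HttpServer.new("0.0.0.0", "3000")
#   server.register("/big_query", BigQueryHandler.new)
```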