502 Proxy Error - mongrel_rails, apache, cap 2.0

I've been having problems with mongrel for several weeks now, but
usually I'm able to resolve the problem by deleting the PID files,
restarting mongrel, and/or redeploying. But this time, I'm not able
to resolve the situation.

I'm able to pull up pages on the site on one request, but then the
next will produce the 502 Proxy Error, so it seems as though one of my
clusters is "stuck" somehow.

I'm receiving this error in the browser:
Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request POST /login.

Reason: Error reading from remote server

Apache/2.2.3 (Red Hat) Server at 67.192.59.219 Port 80

I've tried killing the processes and restarting mongrel, but I'm
still having the same problem.

Can someone help by pointing me in the right direction of what to look
for in log files, or how to solve this issue?

I'm using the basic recipe laid out in the Pragmatic Programmers Guide
and on all my Rails projects, I run into a problem of one or more of
the clusters not restarting because a PID file already exists...is
this a known bug in the version of Capistrano I'm using?

Any help would be much appreciated.

Thanks,
Andy

I'm able to restart the clusters but something in the Rails code is
making one or both of the mongrel clusters hang.

If there are any Rails developers that have experience with deploying
with Capistrano 2.0, using Apache and mongrel_rails, we have a budget
to pay another developer to help us debug this problem.

It has stopped us in our tracks and I don't see anything in the Rails
log files that shows what's causing it. It's not happening in our
development environment either. My client is unable to test new
functionality and we're trying to hit an aggressive deadline.

If you can help, please post here and I'll contact you directly.

Thanks,
Andy

I had the same problem. I was using a cluster of 3 mongrels starting
at port 8000. Turns out I had another process that always grabbed
that port, so only two of my three mongrels were actually running.
The first time I connected to the site took a long time because the
load balancer waited for the first mongrel to time out, then passed
the request to the second. But on reconnect it always gave me the
proxy error. I changed my starting port number and the problem went
away...
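A minimal Ruby sketch of how such a port clash could be spotted before starting the cluster. It simply tries to bind each port and reports the ones that are already taken. The helper names `port_free?` and `busy_ports` are hypothetical, and the range of 3 ports starting at 8000 is just the layout described above.

```ruby
require "socket"

# Returns true if nothing is currently bound to host:port.
def port_free?(port, host = "127.0.0.1")
  server = TCPServer.new(host, port)
  server.close
  true
rescue Errno::EADDRINUSE, Errno::EACCES
  false
end

# Returns the ports in the cluster range that some process already holds.
def busy_ports(start_port, count, host = "127.0.0.1")
  (start_port...(start_port + count)).reject { |port| port_free?(port, host) }
end

if __FILE__ == $PROGRAM_NAME
  # Assumed layout: 3 mongrels starting at port 8000.
  taken = busy_ports(8000, 3)
  puts taken.empty? ? "all ports free" : "already in use: #{taken.join(', ')}"
end
```

Run it while the mongrels are stopped; any port it lists is held by some other process.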

Hi,

Were you able to access each of your mongrel instances separately?

I can think of the following reasons.

  1. One or more of your mongrel instances might not have started at all, possibly because of a port clash.

  2. All the mongrel instances started, but one or more of them is blocked by your firewall. This can happen if you configure your proxy balancer to use your domain name instead of localhost/127.0.0.1.

  3. It might be a bug in mongrel. A previous version of mongrel had a bug that left mongrel instances stuck indefinitely in the CLOSE_WAIT state. To check, run “netstat -na” and see whether your mongrel ports are in CLOSE_WAIT. What version of mongrel are you using? This happened only after the application had been running for a long time, not immediately.

  4. Your database connection might drop out. Increase the database verification timeout by adding

ActiveRecord::Base.verification_timeout = 14400 to your environment.rb. This can happen if your application has been idle for a long time.

  5. Your log files are huge. If you don’t have proper log file rotation, it might lock your mongrel process. Again, this will happen only if your application has been running for a long time.

  6. There is some performance issue in your application. Try running top and monitor your CPU/memory usage. Debug using strace -p to trace the issue.

Since you were getting the proxy error immediately, my bet is on reasons 1 and 2.
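For the netstat check in reason 3, here is a rough Ruby sketch that counts sockets in the CLOSE_WAIT state (as netstat spells it) per mongrel port. It assumes Linux-style `netstat -na` columns (proto, recv-q, send-q, local address, foreign address, state). The helper name `close_wait_by_port` and the port list are hypothetical.

```ruby
# Counts CLOSE_WAIT sockets per local port in `netstat -na` style output.
# Only TCP lines with at least six columns are considered.
def close_wait_by_port(netstat_output, ports)
  counts = Hash.new(0)
  netstat_output.each_line do |line|
    fields = line.split
    next unless fields.length >= 6 && fields[0].start_with?("tcp")
    next unless fields[5] == "CLOSE_WAIT"
    # The local port is the trailing run of digits in the local address.
    port = fields[3][/(\d+)\z/, 1].to_i
    counts[port] += 1 if ports.include?(port)
  end
  counts
end

# Example call, assuming mongrels on ports 8000-8002:
#   p close_wait_by_port(`netstat -na`, [8000, 8001, 8002])
```

A count that keeps growing on one port would point at that instance.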

Good luck!

Cheers,

Ganesh Gunasegaran.
SageWork(http://www.sagework.com) Simplify IT

Thanks for your help everyone.

I've been able to get the mongrel instances restarted now. I have to
remove the PID files, kill the processes and then deploy.
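A small Ruby sketch of the "remove the PID files" step, assuming each PID file holds a single process ID. It lists only the files whose process no longer exists, so a live mongrel's file is left alone. The helper names and the glob pattern in the usage comment are hypothetical; adjust the pattern to wherever your PID files live.

```ruby
# Returns true if a process with this PID currently exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Returns the PID files matching the glob whose recorded process is gone
# (or whose content is not a valid PID).
def stale_pid_files(glob)
  Dir.glob(glob).select do |path|
    pid = File.read(path).strip.to_i
    pid <= 0 || !process_alive?(pid)
  end
end

# Example call, with an assumed PID file location:
#   stale_pid_files("tmp/pids/mongrel.*.pid").each { |path| File.delete(path) }
```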

Now, I'm debugging why the mongrel instance hangs and it looks like it
happens when the collection I'm looping through has LONG TEXT inside
of it.

When the objects in the collection have short text, it doesn't have a
problem.

Basically, I'm looping through a list of messages that have a subject
and body. When the subject and body are short, it functions fine, but
when they are long, the proxy error is triggered.

Does anyone have any experience with mongrel clusters timing out
because of large text in collections?
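One way to narrow this down might be to time each pass through the loop and record the slow ones. This is only a sketch: `slow_iterations` is a hypothetical helper, and the `messages` collection and `render_message` call in the usage comment stand in for whatever the loop actually does.

```ruby
require "benchmark"

# Runs the block once per item and returns [item, seconds] pairs for the
# iterations that took longer than `threshold` seconds.
def slow_iterations(items, threshold)
  items.each_with_object([]) do |item, slow|
    seconds = Benchmark.realtime { yield item }
    slow << [item, seconds] if seconds > threshold
  end
end

# Example call, with hypothetical names:
#   slow_iterations(messages, 1.0) { |m| render_message(m) }.each do |m, secs|
#     puts "message #{m.id} took #{secs.round(2)}s"
#   end
```

If the long messages show up in that list, the time is being spent inside the loop body rather than in Apache or the proxy.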

Thanks,
Andy

Hi everyone,

I've been experiencing the same problem on my server since today, and I
did not change anything on it. I found this post and tried all
your tips without success.
I even restarted my server (Ubuntu). The problem really seems to come
from mongrel, because when I try to reach my website directly with lynx,
e.g. http://www.mysite.com:3000/, I get the same delay...

Any ideas? I'm really stuck at the moment :(

Many thanks in advance.

Antoine