Mongrel stops responding after period of inactivity


I'm running a 2-instance Mongrel cluster behind Apache 2.2.4 with Rails 1.2.3. If no requests are received by the application for several hours (this usually happens overnight) then Mongrel stops responding and no requests are detected by Rails (at least nothing is in the Rails log). Nothing untoward is in the Mongrel log.

If you try to visit the application, the request times out with a 502 Proxy Error. Apache is still up and running and serving static files without a problem, but the following appears in the Apache error log (from trying to hit the SessionController, identified by /sessions), which may provide a clue:

    [Sun Jul 29 08:37:36 2007] [error] proxy: error reading status line from remote server
    [Sun Jul 29 08:37:36 2007] [error] proxy: Error reading from remote server returned by /sessions

Restarting the Mongrel cluster resolves the problem until the next time it happens. I have done a lot of reading about this issue online and a number of sources -- including the Mongrel FAQ -- point to being able to fix a 'hanging mongrel' situation by setting this value in environment.rb:

    ActiveRecord::Base.verification_timeout = 14400

This makes the ActiveRecord timeout value less than the MySQL default of 28800. I have made this change and it doesn't appear to make a difference.

Can anyone advise me on what to try next to diagnose this issue? I'm quickly running out of ideas and I'd appreciate a bit of help!

Here are the vitals from our Ubuntu 6.06 server:

    MySQL
        Ver 14.12 Distrib 5.0.22, for pc-linux-gnu (x86_64) using readline 5.1
        connect_timeout 5
        interactive_timeout 28800
        max_connect_errors 10
        max_connections 100
        max_user_connections 0
        wait_timeout 28800

    ruby 1.8.6 (2007-03-13 patchlevel 0) [x86_64-linux]
        cgi_multipart_eof_fix (2.2)
        fastthread (1.0)
        mongrel (1.0.1)
        mongrel_cluster (1.0.2)

    Apache/2.2.4 (Unix)

    Mongrel Cluster Config:
        port: "8000"
        environment: production
        address:
        pid_file: log/
        servers: 2
        user: [the user]
        group: [the group]

    Apache Load Balancer setup:
        <Proxy balancer://appname_cluster>
        BalancerMember
        BalancerMember
        </Proxy>

    Apache Loaded Modules:
        core_module (static)
        authn_file_module (static)
        authn_default_module (static)
        authz_host_module (static)
        authz_groupfile_module (static)
        authz_user_module (static)
        authz_default_module (static)
        auth_basic_module (static)
        cache_module (static)
        include_module (static)
        filter_module (static)
        deflate_module (static)
        log_config_module (static)
        env_module (static)
        headers_module (static)
        setenvif_module (static)
        proxy_module (static)
        proxy_connect_module (static)
        proxy_ftp_module (static)
        proxy_http_module (static)
        proxy_ajp_module (static)
        proxy_balancer_module (static)
        ssl_module (static)
        mpm_prefork_module (static)
        http_module (static)
        mime_module (static)
        status_module (static)
        autoindex_module (static)
        asis_module (static)
        cgi_module (static)
        negotiation_module (static)
        dir_module (static)
        actions_module (static)
        userdir_module (static)
        alias_module (static)
        rewrite_module (static)
        so_module (static)
        php5_module (shared)
        info_module (shared)

I had exactly the same problem on a Debian server. Setting the verification timeout didn't help, so I had to set up a cron job to wget localhost on both ports every ten minutes. The solution may be stupid, but since the admin of the server never gave me the root account (and the security settings are FBI-like), I couldn't think of anything else. The strange thing is that we are running 3-4 production and development boxes (mainly Gentoo and Arch) that never hang even if Mongrel stays inactive for days. My guess is that it has something to do with the Debian version of MySQL or some insane security settings.
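For anyone who wants to try the same keep-alive workaround without wget, here is a minimal Ruby sketch of the idea. It is only an illustration, not the exact job described above: the `ping` helper and the script name `keepalive.rb` are made up, it requests `/` on each port it is given, and it assumes the Mongrels listen on 127.0.0.1. It also uses timeout error classes from a more recent Ruby than the 1.8.6 mentioned in this thread.

```ruby
require "net/http"

# Request "/" from one backend directly, bypassing Apache.
# Returns [:ok, "<HTTP status code>"] or [:error, "<exception summary>"].
def ping(host, port, timeout = 5)
  http = Net::HTTP.new(host, port, nil) # nil proxy address: never use an HTTP proxy
  http.open_timeout = timeout
  http.read_timeout = timeout
  [:ok, http.get("/").code]
rescue StandardError => e
  # Connection refused, open/read timeouts, malformed responses, and so on.
  [:error, "#{e.class}: #{e.message}"]
end

# Ports are taken from the command line so nothing is hard-coded here.
ARGV.grep(/\A\d+\z/).map(&:to_i).each do |port|
  status, detail = ping("127.0.0.1", port)
  puts "#{Time.now} port #{port}: #{status} #{detail}"
end
```

With a cluster config of port "8000" and 2 servers, the ports would presumably be 8000 and 8001, so a crontab entry along the lines of `*/10 * * * * ruby /path/to/keepalive.rb 8000 8001` (path hypothetical) would hit both every ten minutes. Because each Mongrel is contacted directly, the output also shows whether a hang is in Mongrel itself or somewhere between Apache and Mongrel.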

Thanks for the reply. I wonder whether upgrading to Ubuntu Edgy or even Feisty might help, although that seems like quite a drastic option.

I'd still like to know why it's actually happening, so if anyone has any other ideas of how to track down the root of the problem I'm all ears!
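One way to start narrowing this down is to pull the timestamps of the proxy errors out of the Apache error log and compare them with the last request recorded in the Rails log. The Ruby sketch below does only the first half. The `proxy_errors` helper and the regular expression are illustrative assumptions based on the two log lines quoted earlier (with an optional `[client ...]` field allowed for), not a general Apache log parser.

```ruby
require "time"

# Matches Apache 2.2-style error log lines such as:
#   [Sun Jul 29 08:37:36 2007] [error] proxy: error reading status line from remote server
# An optional "[client <address>]" field before "proxy:" is tolerated (assumption).
PROXY_ERROR = /\A\[([^\]]+)\] \[error\] (?:\[client [^\]]+\] )?proxy: (.*)\z/

# Returns an array of [Time, message] pairs for the proxy errors found in `lines`.
def proxy_errors(lines)
  lines.map do |line|
    match = PROXY_ERROR.match(line.strip)
    next unless match
    [Time.strptime(match[1], "%a %b %d %H:%M:%S %Y"), match[2]]
  end.compact
end

# When given a log file path, print one line per proxy error.
if ARGV.first && File.file?(ARGV.first)
  proxy_errors(File.readlines(ARGV.first)).each do |time, message|
    puts "#{time.strftime('%Y-%m-%d %H:%M:%S')}  #{message}"
  end
end
```

If the first proxy error always shows up a fixed interval after the last successful request, that points toward an idle timeout somewhere in the chain. If it lines up with a scheduled job instead, that job is the next thing to look at.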


[also posted to the Mongrel Users list]

I thought I'd follow up on my progress with this thread. I haven't seen the problem now for a couple of days so (fingers crossed) my problem appears to be fixed.

Firstly, I DO have logrotate scheduled to run daily using cron but I didn't touch this, so I don't believe this is the problem in my case.

I initially tried setting the following in environment.rb:

        ActiveRecord::Base.verification_timeout = 14400

This did NOT solve my problem, but is probably a change which should be made anyway. I then made two significant changes:

1. I didn't have the MySQL gem installed, so I installed the latest version (version 2.7).

2. I made the following change to my Apache Virtual Host settings to prevent Apache from losing the connection to the proxy (as described in the Apache mod_proxy documentation):

        SetEnv force-proxy-request-1.0 1
        SetEnv proxy-nokeepalive 1

After making these two changes and restarting MySQL, Apache and the Mongrel Cluster, the problem hasn't reoccurred.

Hope this helps people in the future.

Cheers, Olly