Hi,
My Rails app has been growing in LOC and everything was running fine
until one day (one or two weeks ago) when I pushed an update to my
server. Since then, after a random period of time, my ruby processes eat
100% of the CPU and the app becomes unresponsive. The problem is that I
am unable to tell which update started causing the trouble.
Running `netstat -anp` shows connections that are not being properly
closed between my Rails process and the PostgreSQL database, so the
Rails app is certainly hanging there.
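
To put numbers on that observation, here is a minimal sketch that tallies
the TCP states of connections to PostgreSQL from netstat output. The
helper name pg_connection_states is hypothetical, the port 5432 is only
the PostgreSQL default, and the column layout is assumed to be the
Linux one. It only sees TCP connections, so it does not apply once the
app talks to the database over a Unix socket.

```ruby
# Hypothetical helper: count TCP connection states for one port.
# Assumes Linux-style netstat columns:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
def pg_connection_states(netstat_output, port = 5432)
  counts = Hash.new(0)
  netstat_output.each_line do |line|
    fields = line.split
    # Skip headers, UDP and Unix-socket lines.
    next unless fields[0].to_s.start_with?("tcp")
    local, foreign, state = fields[3], fields[4], fields[5]
    # Keep the line if either end of the connection is on the given port.
    next unless [local, foreign].any? { |addr| addr.to_s.end_with?(":#{port}") }
    counts[state] += 1
  end
  counts
end

# Usage on the server (not executed here):
#   pg_connection_states(`netstat -an`).each { |state, n| puts "#{state}: #{n}" }
```

A count of CLOSE_WAIT entries that keeps growing would point at the
local side not closing its sockets.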
I have not yet been able to identify the source of the problem, even
after:
- reinstalling on a fresh operating system (Debian Lenny)
- switching from connecting to PostgreSQL over remote TCP to local
Unix sockets
- updating nginx
- updating Rails and other gems
- updating plugins, and removing some that are not so useful
- moving from Thin instances to Nginx+Passenger
- removing the most recent and most suspicious lines of code that could
be the problem
Everything works fine on my dev machine. On the production server, after
a random amount of time, it suddenly goes crazy. It's terribly painful
to hunt down and I don't see any new potential areas to investigate.
Recently I have also been seeing a new error message from time to time,
which disappears on the next request: