Hi,
My Rails app has been growing in LOC and everything was running fine until, one or two weeks ago, I pushed an update to my server. Since then, after a random period of time, my Ruby processes eat 100% of the CPU and the app becomes unresponsive. The problem is that I am unable to tell which update started causing trouble.
$ netstat -anp shows connections that are not being properly closed between my Rails process and the PostgreSQL database, so the Rails app is certainly hanging there.
I have not yet been able to identify the source of the problem, even after:
- reinstalling on a fresh operating system (Debian lenny)
- switching from connecting to PostgreSQL over remote TCP to local Unix sockets
- updating nginx
- updating Rails and other gems
- updating plugins, and removing some that are not so useful
- moving from Thin instances to Nginx+Passenger
- removing the most recent and most suspicious lines of code that could be the problem
Everything works fine on my dev machine. On the production server, after a random amount of time, it suddenly goes crazy. It's terribly painful to hunt down and I don't see any new potential areas to investigate.
Recently I have also been seeing a new error message from time to time, which disappears on the next request: