Tracking Down a Memory Leak

Hey all,

I have a weird situation: I've got two different Rails applications running right now, and both seem to consume as much memory as exists on the host machine.

The first application is a public-facing e-commerce website. It has ZERO performance issues - things load incredibly fast (both in terms of end-user experience and milliseconds of rendering time on the server side [visible through production.log]), but out of the 8GB of memory allocated to the machine, the application is using all but about 200 megs.

The second application is a much lower-traffic internal maintenance/outage reporting system used by several departments in the company. We had originally given it 2GB of memory and it consumed nearly every bit of that, so we squeezed it down to 512MB just to see what would happen (since that system isn't public-facing). Just as before, it consumed all 512MB.

Now here's the funny thing: doing a process list in top and ordering by memory usage, each ruby process takes AT MOST 1.0% of memory, and there are only 3-4 such processes. That means that ruby - and in this case, my Rails application - can't be the sole culprit, right? Or am I missing something here?
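For what it's worth, here's a rough Ruby sketch I can run on the box to cross-check that (the ps flags are standard Linux; the process-name matching is just my guess at what the relevant processes show up as):

# Sum the resident set size (RSS, reported by ps in kB on Linux) of
# anything that looks like a ruby/apache/passenger process.
totals = Hash.new(0)
`ps -eo rss=,comm=`.each_line do |line|
  rss, comm = line.split(' ', 2)
  comm = comm.to_s.strip
  next unless comm =~ /ruby|apache|httpd|passenger/i
  totals[comm] += rss.to_i
end
totals.each { |name, kb| printf("%-20s %8.1f MB\n", name, kb / 1024.0) }
printf("%-20s %8.1f MB\n", "TOTAL", totals.values.inject(0) { |sum, kb| sum + kb } / 1024.0)

If that TOTAL comes out tiny compared to what top's Mem line says is used, then whatever is eating the memory isn't the ruby processes themselves.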

Both machines are VMware virtual machines running Ubuntu 9.04 (and yes, we plan to upgrade soon) with Passenger 2.2.5, Ruby 1.8.7, and Rails 2.3.4.

We did an experiment with the larger machine to see what, if anything, caused issues. I had our system administrator clone an exact copy of the first machine (the 8GB one), and we let it SIT doing nothing over the long weekend (Friday through Tuesday) - absolutely no HTTP requests were made to it whatsoever. As I expected, memory usage is minimal - about 520MB out of the 8GB assigned to it.

I just made some HTTP requests to the machine on the rails app in question, and didn't see memory usage go up that much, if at all. However, if it's indeed a true LEAK, we won't know for sure for some time. At this point I'm watching it to see what's up.

Another interesting point is that, according to our *nix admin, the kill -9 command should FORCE the operating system to reclaim memory used by runaway processes. We were operating under the theory that maybe a ruby process consumed memory, didn't give it up, AND didn't report its use to the OS (so it wouldn't show up in top, for example). Well, assuming kill -9 will force memory to be reclaimed by the OS, I've blown that theory out of the water by issuing a kill -9 on damn near every process related to ruby or apache on the second virtual machine (after we expanded its memory for testing). After killing every apache and ruby process (and therefore, passenger), memory usage had barely moved - maybe 35mb at most.

Our IT director believes it has to be something to do with Ruby, simply because he doesn't see this behavior on any other machine where Ruby is *not* installed or in use. Given that sole fact, I'm inclined to agree, but of course I'd like more than mere conjecture to go on. In no way am I faulting Ruby or Rails - if there's a leak, it's most likely my application; but killing the application entirely doesn't restore memory, seemingly ruling out that theory.

So you can see why I'm confused! Is there anything else I can do, any other way I can check to see IF indeed it's my application, and if so, what can I do (other than trimming AR statements, I already know that one) to clamp down on memory usage?

Thanks guys.

Phoenix Rising wrote:

What are you using to measure the memory use?

Quoting Phoenix Rising <polarisrising@gmail.com>:

Hey all,

I have a weird situation. I've got two different rails applications running right now that seem to consume as much memory as exists on the host machine.

First, how are you measuring the memory usage figures? The reason I ask is that Linux will use most of the available memory for buffers and caches. When the application goes away, the buffers and cache piece of memory usage does not go away. In fact, it won't go away until reboot or until an application needs the memory. How can you tell the difference? There are many ways; I use top, where the first four to five lines contain the needed info.
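If you'd rather script it than eyeball top, here's a rough sketch that reads /proc/meminfo (assuming the standard Linux field names; values are in kB) and separates memory held by programs from memory the kernel is merely using for buffers and cache:

# Read /proc/meminfo and report how much memory is genuinely unavailable
# versus merely being used by the kernel for buffers/cache.
info = {}
File.foreach("/proc/meminfo") do |line|
  key, value = line.split(":")
  info[key] = value.to_i   # values are in kB
end

buffers_and_cache = info["Buffers"] + info["Cached"]
effectively_free  = info["MemFree"] + buffers_and_cache

printf("total:             %6d MB\n", info["MemTotal"] / 1024)
printf("free:              %6d MB\n", info["MemFree"] / 1024)
printf("buffers + cached:  %6d MB\n", buffers_and_cache / 1024)
printf("effectively free:  %6d MB\n", effectively_free / 1024)

If "effectively free" is still large, your applications aren't starved; the kernel hands that memory back the moment something asks for it.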

HTH,   Jeffrey

Yeah, I'm using top to measure resource usage. I'll reboot a server (a virtual server, of course), and at boot time memory usage is maybe ~200MB. After a few days it's skyrocketed to consume all 2GB in the case of the smaller application server we tested (we haven't messed with the other one since it's a public-facing application). What am I doing wrong here?

Sounds to me like Passenger is spawning a lot more Rails instances than you presume. We’ve had a similar problem in the past, where we had to fix our code because of a Passenger conflict with a thread proxy. Since we are using Mongrel in development, we didn’t notice until production apps started consuming all memory on the server within days. This was the one that gave us trouble, but there are a few others as well: http://www.modrails.com/documentation/Users%20guide.html#_smart_spawning_gotcha_2_the_need_to_revive_threads
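If it does turn out that Passenger is spawning more instances than you expect, passenger-status will show you the live pool, and you can cap it in your Apache configuration with something along these lines (the values are purely illustrative; double-check the directive names against the users guide for your 2.2.5 install):

# Cap the total number of application instances Passenger may spawn:
PassengerMaxPoolSize 4
# Shut down instances that have been idle for five minutes:
PassengerPoolIdleTime 300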

Best regards

Peter De Berdt

High memory usage in Linux is not necessarily a Bad Thing (tm). As stated earlier, Linux systems use memory for caching filesystem and other things. This will result in memory usage growing over a period of time while the machine is up. The system will free this memory as necessary when applications request it. As long as your system is not running into swap, and none of the individual processes are growing out of control, it is likely that the machine is in good shape. Unless there is a performance issue that you are seeing, do not think that high memory usage is hurting you. Look at the "buffers" number on the Mem line in top. That will tell you how much of the memory is in use by the kernel for buffering/caching. All of that should be available for your applications to use as necessary.

Chris

If you yourself are seeing that the Ruby processes aren't using all of the memory, why are you thinking the Ruby application is using all of the memory? As some others have said, ***basically*** the total memory usage reported by top and free is that which the kernel has allocated (and will never give back); from that the kernel gives memory to processes as requested/available.

You say you're running Passenger: have you used passenger-memory-stats to get an accurate look at the memory usage of the Ruby app and Apache? As a better indicator of system memory usage, what does the -/+ buffers/cache line of free say? That shows the total amount of memory the kernel is actually free to allocate to processes.
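For reference, old-style free -m output looks roughly like this (the numbers here are purely illustrative):

$ free -m
             total       used       free     shared    buffers     cached
Mem:          8003       7810        193          0        250       7140
-/+ buffers/cache:        420       7583
Swap:         4204          0       4204

The second number on the -/+ buffers/cache line (7583 here) is free + buffers + cached, i.e. memory the kernel could hand back to applications right away; only the 420 is genuinely tied up by processes.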

Thank you all for the replies!

If you yourself are seeing that the Ruby processes aren't using all of the memory, why are you thinking the Ruby application is using all of the memory? As some others have said, ***basically*** the total memory usage reported by top and free is that which the kernel has allocated (and will never give back); from that the kernel gives memory to processes as requested/available.

Thanks for the explanation as to how/why/what the kernel is doing here. I simply wasn't aware this is how it worked; my understanding was that the kernel would allow a program to use memory when requested, and when the program exits, the kernel would reclaim, then "free", said memory. Obviously I was wrong. My reasoning for thinking it was my Ruby application was that I hadn't seen this behavior before (then again, I'd never looked this closely, either). Not only that, but our systems admin and IT director are basically pointing the finger at me and my choice of Ruby on Rails as "the culprit" :) Obviously they're WRONG! However, I'm trying to lean out resource usage, because if we can free up resources on our host server(s), we can hold more virtual machines on each, thus reducing costs.

You say you're running Passenger: have you used passenger-memory-stats to get an accurate look at the memory usage of the Ruby app and Apache? As a better indicator of system memory usage, what does the -/+ buffers/cache line of free say? That shows the total amount of memory the kernel is actually free to allocate to processes.

On our public-facing web application server, I'm seeing 13 Apache processes, each consuming 0.3MB - 0.6MB of "private" memory. VMSize is about 148.5MB per Apache process. The same utility reports that I have two Rails instances running, each of which is consuming approximately 60MB of memory.

With regards to free, the -/+ column reads:

-/+ buffers/cache: 429896 7183448

I'm assuming this is measured in kilobytes? (I've never heard of this tool, honestly - I'm no systems admin by a long shot!) If this is true, that means 7.1 GIGS of data is being cached!? Why - and what exactly is the system caching? How could I find that out? And as vwchris pointed out, this may not necessarily be a bad thing, but how can I tell for sure?

High memory usage in Linux is not necessarily a Bad Thing (tm). As stated earlier, Linux systems use memory for caching filesystem and other things. This will result in memory usage growing over a period of time while the machine is up. The system will free this memory as necessary when applications request it. As long as your system is not running into swap, and none of the individual processes are growing out of control, it is likely that the machine is in good shape. Unless there is a performance issue that you are seeing, do not think that high memory usage is hurting you. Look at the "buffers" number on the Mem line in top. That will tell you how much of the memory is in use by the kernel for buffering/caching. All of that should be available for your applications to use as necessary.

Right now my buffers line reads 252608k - so about 250mb. Full top header:

top - 14:11:48 up 20 days, 16:50, 1 user, load average: 0.05, 0.04, 0.00
Tasks: 120 total, 1 running, 119 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7613344k total, 7571640k used, 41704k free, 252604k buffers
Swap: 4305380k total, 116k used, 4305264k free, 676664k cached

(I actually re-typed that output from another machine due to the fact I can't VPN from this Windows 7 box and I'm working from home thanks to the H1N1 vaccine...)

So after doing some looking around thanks to your recommendations, it looks to me like the kernel is caching 7.1GB of data (weird...really, really weird) and freeing that as needed. Does this look/sound like a plausible explanation to you? Because I've NEVER heard of that, but maybe I'm just plain wrong.

Thanks for your help!