Tracking Down a Memory Leak

Hey all,

I have a weird situation. I've got two different rails applications
running right now that seem to consume as much memory as exists on the
host machine.

The first application is a public-facing e-commerce website. It has
ZERO performance issues - things load incredibly fast (both in terms
of end-user experience and milliseconds of rendering time on the
server side [visible through production.log]), but out of the 8GB of
memory allocated to the machine, the application is using all but
about 200 megs.

The second application is a much lower traffic internal maintenance/
outage reporting system used by several departments in the company.
We originally gave it 2GB of memory, and it consumed nearly every
bit of that, so we squeezed it down to 512 megs just to see what would
happen (since that system isn't public-facing). Just as before, it
consumed all 512 megs.

Now here's the funny thing: doing a process list in top and ordering
by memory usage, ruby takes AT MOST 1.0% of memory and spawns at most
3-4 processes. That means that ruby - and, by extension, my rails
application - can't be the sole culprit, right? Or am I missing
something here?
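A quick way to sanity-check those per-process figures from a shell (a minimal sketch using standard procps tools; the `ruby` process name is an assumption, adjust it for your setup):

```shell
# Show the ten biggest processes by resident set size (RSS);
# %MEM in top is RSS relative to physical RAM, so "1.0%" of an
# 8GB machine is still roughly 80MB resident per process.
ps aux --sort=-rss | head -n 11

# Total resident memory held by all ruby processes, in KB
# (prints 0 if no ruby processes are running):
ps -C ruby -o rss= | awk '{s += $1} END {print s+0, "KB"}'
```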

Both machines are VMware virtual machines running Ubuntu 9.04 (and
yes, we plan to upgrade soon) with Passenger 2.2.5, Ruby 1.8.7, and
Rails 2.3.4.

We did an experiment with the larger machine to see what caused
issues, if anything. I had our system administrator clone an exact
copy of the first machine (8GB memory machine), and we let it SIT
doing nothing all weekend (Friday - Tuesday) - absolutely no HTTP
requests were made to the machine whatsoever. As I expected, memory
usage is minimal - about 520mb out of the 8GB assigned to it.

I just made some HTTP requests to the machine on the rails app in
question, and didn't see memory usage go up that much, if at all.
However, if it's indeed a true LEAK, we won't know for sure for some
time. At this point I'm watching it to see what's up.

Another interesting point is that, according to our *nix admin, the
kill -9 command should FORCE the operating system to reclaim memory
used by runaway processes. We were operating under the theory that
maybe a ruby process consumed memory, didn't give it up, AND didn't
report its use to the OS (so it wouldn't show up in top, for
example). Well, assuming kill -9 will force memory to be reclaimed by
the OS, I've blown that theory out of the water by issuing a kill -9
on damn near every process related to ruby or apache on the second
virtual machine (after we expanded its memory for testing). After
killing every apache and ruby process (and therefore, passenger),
memory usage had barely moved - maybe 35mb at most.

Our IT director believes it has to be something to do with Ruby,
simply because he doesn't see this behavior on any other machine where
Ruby is *not* installed or in use. Given that sole fact, I'm inclined
to agree, but of course I'd like more than mere conjecture to go on.
In no way am I faulting Ruby or Rails - if there's a leak, it's most
likely my application; but killing the application entirely doesn't
restore memory, seemingly ruling out that theory.

So you can see why I'm confused! Is there anything else I can do, any
other way I can check to see IF indeed it's my application, and if so,
what can I do (other than trimming AR statements, I already know that
one) to clamp down on memory usage?

Thanks guys.

Phoenix Rising wrote:



What are you using to measure the memory use?

Quoting Phoenix Rising <polarisrising@gmail.com>:

Hey all,

I have a weird situation. I've got two different rails applications
running right now that seem to consume as much memory as exists on the
host machine.

First, how are you measuring the memory usage figures? The reason I
ask is that Linux will use most of available memory for buffers and
caches. When the application goes away, the buffers-and-cache piece of
memory usage does not go away. In fact, it won't go away until reboot
or until an application needs the memory. How can you tell the
difference? Many ways; I use top - the first four to five lines
contain the needed info.
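The same split Jeffrey describes can also be read straight from /proc/meminfo (a minimal sketch; the field names are the kernel's standard ones):

```shell
# MemFree is truly idle memory; Buffers and Cached are page
# cache that the kernel surrenders as soon as an application
# asks for memory, so they count as "available" too.
grep -E '^(MemTotal|MemFree|Buffers|Cached):' /proc/meminfo
```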

HTH,
  Jeffrey

Yeah, I'm using top to measure resource usage. I'll reboot a server
(a virtual server, of course), and at boot time memory usage is maybe
~200mb. After a few days it's skyrocketed to consume all 2GB in the
case of the smaller application server we tested (we haven't messed
with the other one since it's a public-facing application). What am I
doing wrong here?

Sounds to me like Passenger is spawning a lot more Rails instances than you presume. We've had a similar problem in the past, where we had to fix our code because of a Passenger conflict with a thread proxy. Since we use Mongrel in development, we didn't notice until production apps started consuming all memory on the server within days. This was the one that gave us trouble, but there are a few others as well: http://www.modrails.com/documentation/Users%20guide.html#_smart_spawning_gotcha_2_the_need_to_revive_threads

Best regards

Peter De Berdt

High memory usage in Linux is not necessarily a Bad Thing (tm). As
stated earlier, Linux systems use memory for caching filesystem and
other things. This will result in memory usage growing over a period
of time while the machine is up. The system will free this memory as
necessary when applications request it. As long as your system is not
running into swap, and none of the individual processes are growing
out of control, it is likely that the machine is in good shape.
Unless there is a performance issue that you are seeing, do not think
that high memory usage is hurting you. Look at the "buffers" number
on the Mem line in top. That will tell you how much of the memory is
in use by the kernel for buffering/caching. All of that should be
available for your applications to use as necessary.
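Chris's rule of thumb can be turned into a single number (a minimal sketch, assuming the standard /proc/meminfo fields):

```shell
# Memory the kernel can actually hand to applications is, to a
# first approximation, free + buffers + cached:
awk '/^MemFree:/ {f=$2} /^Buffers:/ {b=$2} /^Cached:/ {c=$2}
     END {print "roughly available:", f + b + c, "KB"}' /proc/meminfo
```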

Chris

If you yourself are seeing that the Ruby processes aren't using all of
the memory, why are you thinking the Ruby application is using all of
the memory? As some others have said, ***basically*** the total memory
usage reported by top and free is that which the kernel has allocated
(and will never give back); from that the kernel gives memory to
processes as requested/available.

You say you're running Passenger: have you used passenger-memory-stats
to actually get the accurate look at the memory usage of the Ruby app
and Apache? As a bit of a better system memory usage indicator, what
does the -/+ buffers/cache column of free say? That shows the amount
of memory total that the kernel actually is freely able to allocate to
processes.

Thank you all for the replies!

If you yourself are seeing that the Ruby processes aren't using all of
the memory, why are you thinking the Ruby application is using all of
the memory? As some others have said, ***basically*** the total memory
usage reported by top and free is that which the kernel has allocated
(and will never give back); from that the kernel gives memory to
processes as requested/available.

Thanks for the explanation of how/why/what the kernel is doing here.
I simply wasn't aware this is how it worked; my understanding was
that the kernel would allow a program to use memory when requested,
and when the program exits, the kernel would reclaim and then "free"
said memory. Obviously I was wrong. My reason for thinking it was my
ruby application was that I hadn't seen this behavior before (then
again, I'd never looked this closely, either). Not only that, but our
systems admin and IT director are basically pointing the finger at me
and my choice of Ruby on Rails as "the culprit" :) Obviously they're
WRONG! However, I'm trying to lean out resource usage, because if we
can free up resources on our host servers, we can hold more virtual
machines on each, thus reducing costs.

You say you're running Passenger: have you used passenger-memory-stats
to actually get the accurate look at the memory usage of the Ruby app
and Apache? As a bit of a better system memory usage indicator, what
does the -/+ buffers/cache column of free say? That shows the amount
of memory total that the kernel actually is freely able to allocate to
processes.

On our public-facing web application server, I'm seeing 13 Apache
processes, consuming 0.3mb - 0.6mb of memory each in "private"
memory. VMSize is about 148.5MB per apache process. The same utility
reports that I have two rails instances running, each of which is
consuming approximately 60mb of memory.
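Those passenger-memory-stats figures suggest the app's real footprint is tiny: private RSS is what each process actually costs, since VMSize is mostly shared libraries. A back-of-envelope using the numbers quoted above:

```shell
# 13 Apache workers at ~0.5MB private each, plus two Rails
# instances at ~60MB each (figures quoted from above):
awk 'BEGIN { print 13 * 0.5 + 2 * 60, "MB of truly private memory" }'
```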

With regards to free, the -/+ column reads:

-/+ buffers/cache: 429896 7183448

I'm assuming this is measured in kilobytes? (I've never heard of this
tool, honestly - I'm no systems admin by a long shot!) If that's
true, that means 7.1 GIGS are sitting in buffers and cache that the
kernel could hand back!? Why - and what exactly is the system
caching? How could I find that out? And as vwchris pointed out, this
may not necessarily be a bad thing, but how can I tell for sure?
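For reference, the two numbers in that -/+ buffers/cache row can be reproduced from the kernel's own counters (a minimal sketch; assumes the standard /proc/meminfo field names of this era of procps):

```shell
# Reconstruct the "-/+ buffers/cache" row of `free` from
# /proc/meminfo: real application usage vs. truly available.
awk '/^MemTotal:/ {t=$2} /^MemFree:/ {f=$2} /^Buffers:/ {b=$2} /^Cached:/ {c=$2}
     END { printf "-/+ buffers/cache: %d %d\n", t - f - b - c, f + b + c }' \
    /proc/meminfo
```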

High memory usage in Linux is not necessarily a Bad Thing (tm). As
stated earlier, Linux systems use memory for caching filesystem and
other things. This will result in memory usage growing over a period
of time while the machine is up. The system will free this memory as
necessary when applications request it. As long as your system is not
running into swap, and none of the individual processes are growing
out of control, it is likely that the machine is in good shape.
Unless there is a performance issue that you are seeing, do not think
that high memory usage is hurting you. Look at the "buffers" number
on the Mem line in top. That will tell you how much of the memory is
in use by the kernel for buffering/caching. All of that should be
available for your applications to use as necessary.

Right now my buffers line reads 252608k - so about 250mb. Full top
header:

top - 14:11:48 up 20 days, 16:50, 1 user, load average: 0.05, 0.04, 0.00
Tasks: 120 total, 1 running, 119 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7613344k total, 7571640k used, 41704k free, 252604k buffers
Swap: 4305380k total, 116k used, 4305264k free, 676664k cached

(I actually re-typed that output from another machine due to the fact
I can't VPN from this Windows 7 box and I'm working from home thanks
to the H1N1 vaccine...)

So after doing some looking around thanks to your recommendations, it
looks to me like the kernel is keeping most of that 7.1GB available
by caching data (weird...really, really weird) and freeing it as
needed. Does this look/sound like a plausible explanation to you?
Because I've NEVER heard of that, but maybe I'm just plain wrong.
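One way to confirm the cache theory (a hedged sketch: the drop_caches write needs root; it's harmless but briefly costs extra disk reads while the cache refills):

```shell
# The page cache belongs to the kernel, not to any process,
# which is why kill -9 on every ruby/apache process barely
# moved the numbers. Asking the kernel to drop its caches
# shows the memory was reclaimable all along:
grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo    # before
sync                                 # flush dirty pages to disk
# echo 3 > /proc/sys/vm/drop_caches  # run as root to drop caches
grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo    # after: Cached shrinks, MemFree grows
```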

Thanks for your help!