Making Mongrels Faster...

So I have a rather high traffic site that is starting to slow down significantly. Here is the setup.

4 boxes, 75 mongrels spread across them all. One of them is the “master”, which uses Apache's mod_proxy_balancer to balance the traffic across them all.

And one database server (which does not seem to be getting overly taxed).
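(For illustration: a minimal sketch of what the mod_proxy_balancer side of such a setup can look like in httpd.conf. The addresses and ports below are hypothetical.)

    # httpd.conf on the "master" box (Apache 2.2; IPs/ports are made up)
    <Proxy balancer://mongrels>
        BalancerMember http://10.0.0.1:8000
        BalancerMember http://10.0.0.1:8001
        BalancerMember http://10.0.0.2:8000
        # ... one BalancerMember per mongrel, across all four boxes
    </Proxy>
    ProxyPass / balancer://mongrels/
    ProxyPassReverse / balancer://mongrels/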

What I am looking for are any tweaks or tips people have used or know of to make the rendering that the mongrels do faster. I am looking into memcached, but so far in testing on my Linux servers I have found it horribly slow compared to the same code and test on a dev Windows box.

Any help would be greatly appreciated.

Nes++

Nestor Camacho said the following on 02/14/2007 11:44 AM:

> What I am looking for are any tweaks or tips people have used or know of to make the rendering that the mongrels do faster. I am looking into memcached, but so far in testing on my Linux servers I have found it horribly slow compared to the same code and test on a dev Windows box.

Without knowing the performance profiles of the boxes, it's difficult to generalise.

I'd start by looking to see if there was 'starvation' of memory or of network bandwidth... heck, sometimes using a correctly configured switch, or reconfiguring your switch, can make a massive difference!

I'd then look to see how mod_proxy is interacting and distributing requests.

In the past, I've achieved astounding performance just with round-robin DNS and no other 'balancing'. At one site they were cynical and installed a hardware load balancer and performance dropped compared to the RR-DNS.

At another site, they thought that locking critical parts of the application into memory would speed things up. In fact the OS paging algorithm was smarter than they were - the app ran faster when not locked.

Tuning often requires deep knowledge of the architecture. I can tune most versions of *NIX but haven't a clue when it comes to Windows.

But as I say, the generalizations we can make in the absence of details and measurements may not be very helpful or informative.

What is the CPU utilization breakdown from top while under load?

What does vmstat 5 5 report while under load?

There is a Google Group, started by Robby and Planet Argon, here:

   http://groups.google.com/group/rubyonrails-deployment

It focuses entirely on Rails deployment issues.

First, thanks for the quick response, and sorry for not giving more details. Hardware details below.

    Vendor:         Dell PowerEdge
    OS:             CentOS 4.2
    CPU:            2x P4 3 GHz
    Memory:         2 GB
    Hard drive(s):  2x 160 GB

Unfortunately, I do not have access to the network side of things; this is at a colocation facility states away. However, I don’t think the problem is the network: I am able to do a lot of bandwidth-related duties, both small and large transfers, very quickly, and it is very responsive. I have sustained 10 megs with no degradation.

I will try to collect some measurements and update everyone. But from what I am seeing, I am not quite maxing out on memory or CPU. When traffic starts to come in, the mongrels spike the CPU to do the rendering/database calls, etc., then cool down, then spike again. Beyond going through the code and removing what bottlenecks we might have, I wanted to see what I can squeeze out of the mongrels/servers themselves.

Nes++


Nestor Camacho said the following on 02/14/2007 12:56 PM:

> Unfortunately, I do not have access to the network side of things; this is at a colocation facility states away.

LOL! That pretty much guarantees it's going to be a network problem! :-)

> However, I don't think the problem is the network: I am able to do a lot of bandwidth-related duties, both small and large transfers, very quickly, and it is very responsive. I have sustained 10 megs with no degradation.

I read that to mean your connection from the outside world into the machines. The connection between the machines may have other considerations. As I said, setting up switching hubs can influence performance in odd ways.

You might google for the excellent papers written by the guys at Wikipedia (as well, of course, as those at Google and eBay) on how they grew their networks, do load balancing, and the trade-offs between database, rendering, file/image/stylesheet/javascript serving, static pages, etc.

One thing they point out is that when scaling out, FTP is the kiss-of-death. Don't share code that way. Push it with rsync.

I wish I knew as much about Ruby and Rails as I do about hardware. Guess where I mis-spent my youth?

> 4 boxes, 75 mongrels spread across them all. One of them is the "master", which uses Apache's mod_proxy_balancer to balance the traffic across them all. And one database server (which does not seem to be getting overly taxed).

That's what... almost 19 mongrels per box? We run 4... 5 was too many... see here for more info:

http://mongrel.rubyforge.org/docs/how_many_mongrels.html
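(With mongrel_cluster, the per-box count is a single setting. A sketch, with invented paths and ports:)

    # config/mongrel_cluster.yml
    cwd: /var/www/app/current
    environment: production
    address: 127.0.0.1
    port: "8000"
    servers: 5     # number of mongrels on this box: ports 8000-8004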

> What I am looking for are any tweaks or tips people have used or know of to make the rendering that the mongrels do faster. I am looking into memcached, but so far in testing on my Linux servers I have found it horribly slow compared to the same code and test on a dev Windows box.

Also, memcache shouldn't be slow... except on OSX, and then only if you don't apply the patch by Hodel... memcache should be fast... so something is wrong there...
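(One way to pin down where the slowness is: time memcached directly with the memcache-client gem. A rough sketch; the server address is invented, so point it at the real memcached host and compare Linux against the Windows dev box:)

    require 'rubygems'
    require 'memcache'
    require 'benchmark'

    cache = MemCache.new('localhost:11211')   # hypothetical memcached address

    n = 1000
    # round-trip time for n writes, then n reads
    puts Benchmark.realtime { n.times { |i| cache.set("bench:#{i}", 'x' * 100) } }
    puts Benchmark.realtime { n.times { |i| cache.get("bench:#{i}") } }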

The other thing to look at is DB caching... MySQL, for instance, won't cache any query involving "NOW()"... so if you can tweak some of those you might save some DB time as well...
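(A sketch of that idea in ActiveRecord, with invented model and column names. The point is that a literal timestamp gives MySQL's query cache identical SQL text to match on, while NOW() never matches:)

    # Query cache miss every time: NOW() disqualifies the query
    # Post.find(:all, :conditions => "published_at <= NOW()")

    # Cacheable: the timestamp becomes a literal in the SQL text; rounding
    # to the minute keeps that text identical for a minute at a time
    t = Time.at(Time.now.to_i / 60 * 60)
    Post.find(:all, :conditions => ["published_at <= ?", t])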

Good luck!

-philip

There is definitely a point of diminishing returns on performance when you add too many mongrels. I would say scale back to 10 mongrels or fewer per box. How many page views/day are you serving?

You can change the production log level to fatal and gain performance by only logging fatal errors. Without knowing more about your app I can't say much more. With that many mongrels you may want to try HAProxy for the load balancing; it is a lot more intelligent than mod_proxy_balancer.
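(For reference, a minimal sketch of that log-level change, assuming a standard Rails setup:)

    # config/environments/production.rb
    # Only fatal errors get written; every skipped log line is I/O saved per request
    config.log_level = :fatal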

-Ezra

Hi,

How optimized is the app? Can you get Rails to do less work using caches_page? Is the app public so we can have a look to get some ideas?

Cheers, Carl.
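(For reference, caches_page is a one-line change per action. A minimal sketch with invented controller and model names:)

    class HomeController < ApplicationController
      # first hit renders and writes the page under public/,
      # so Apache serves it statically without touching Rails
      caches_page :index

      def index
        @stories = Story.find(:all, :limit => 10)
      end
    end

    # expire it whenever the underlying data changes:
    #   expire_page(:controller => 'home', :action => 'index')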

Carl Woodward said the following on 02/15/2007 01:19 AM:

> How optimized is the app? Can you get Rails to do less work using caches_page? Is the app public so we can have a look to get some ideas?

Caching is not _always_ a good strategy. It assumes that the items being cached have a high rate of reuse. There are many circumstances where this does not apply.

eBay, for example, has factored out the 'static' pages and serves them from a dedicated machine (or cluster). The 'static' things include JavaScript and style sheets.

The reality is that a UNIX box is caching a lot of things - the pages of the files that contain the binary of the Ruby interpreter, the text files that make up the code for RoR and the application, and of course the directory and i-node information for all those files.

So when you cache fragments or pages, the OS sees them as competing with all this.

If the 'static' pages are taking up space in the system cache then they are loading down your ability to cache fragments and dynamic pages. Even if they don't get flushed, they still present a load to the virtual memory page-use check algorithm, and so eat CPU cycles.

Please don't try to tweak the VM caching. I've found that even the "poorer" (by whatever criteria) virtual memory systems are better than application programmers think.

The best short-term solution is to throw more memory at the machine. You may need one of the enhanced kernels. Many are built for a 4 Gig limit, but it's easy to build or procure one for a 64 Gig limit - just make sure you don't pick the ones built for a laptop or desktop :-)

The longer term solution is to study what Google, Yahoo, EBay and others have done and written about, make measurements of your own system and experiment.

Don't expect to get it right the first time!

I agree there is a point where you will get diminishing returns. We originally had 4 mongrels per server; as the load increased, so did the number of mongrels, till we reached the 75 in total. The issue was the spikes: we would drum along without any issues all day, then slam, we would get 1000-2000 visitors in a 20-30 minute span. As it stands, from my perf tests I was getting 4 req/s loading our main page. Not very good…

Last night I spent a few hours trying out Apache/FastCGI, since some people swear by it and some people swear about it ;). I was able to get 17-18 req/s. Much better… Will I move over to FastCGI? Maybe. First I want to try out your idea of setting the mongrels' logging to fatal errors only.

How do I do that? I tried looking at the docs and did not see anything in there.

Thanks for everyone's feedback! It has been a tremendous help.

Nes++

Nestor Camacho said the following on 02/15/2007 02:22 PM:

> I agree there is a point where you will get diminishing returns. We originally had 4 mongrels per server; as the load increased, so did the number of mongrels, till we reached the 75 in total. The issue was the spikes: we would drum along without any issues all day, then slam, we would get 1000-2000 visitors in a 20-30 minute span. As it stands, from my perf tests I was getting 4 req/s loading our main page. Not very good...

If your main page is 'static' or has many static elements, it's a candidate to move out of that cluster.

You might try, just as an example, setting up the inet daemon (inetd) with:

    http-8081 stream tcp nowait nobody /bin/cat cat /cdrom/index.html
    http-8082 stream tcp nowait nobody /bin/cat cat /cdrom/docs/index.html
    http-8083 stream tcp nowait nobody /bin/cat cat /cdrom/text/doc1.html
    http-8084 stream tcp nowait nobody /bin/cat cat /cdrom/text/doc2.html
    http-8085 stream tcp nowait nobody /bin/cat cat /cdrom/text/doc3.html

Fast, low CPU load. Oh, and VERY secure as well :-)

Now, set up a machine with no applications and no Apache that does just that, and add it to your DNS, naming it "styleserver.mydomain.com".

Now your headers read:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      ....
      .....
      <link href="http://stylesheets.mydomain.com:8086/stylesheets/local.css?"
            media="all" rel="Stylesheet" type="text/css" />
      <link href="http://stylesheets.mydomain.com:8087/stylesheets/theme.css?"
            media="print" rel="Stylesheet" type="text/css" />
      ....
    </head>

and so forth. Similarly for javascript.

The main page is not static; it is actually dynamic. In my httpd.conf file I had set it up so that Apache would deliver what few files are static (stylesheets, images, etc.). Besides that, though, a lot of dynamic rendering is done to load the main page. I already have the developers combing through the code to find places where they can make things more streamlined, as well as starting to use memcache (which is reporting slower responses on Linux, but that is a whole other issue).

As to Stephan Wehner’s question:

According to Rails, it is rendering things at 14-27 req/s, depending on what part of the site is being rendered.

Nes++

Nestor Camacho said the following on 02/15/2007 04:30 PM:

> The main page is not static; it is actually dynamic. In my httpd.conf file I had set it up so that Apache would deliver what few files are static (stylesheets, images, etc.).

That's good. It makes it easy to move it away from the overhead of that big blob of code that is Apache and onto a dedicated 'static server' machine.

It's not the size of the files, it's the ridiculous amount of work Apache has to do to serve them up.

Factoring them off takes a load off the rendering servers.

K.I.S.S.

http://meta.wikimedia.org/wiki/Why_Wikipedia_ran_slow_in_late_2003
http://meta.wikimedia.org/wiki/Why_Wikipedia_runs_slow
http://meta.wikimedia.org/wiki/November_2005_image_server

From "Wikimedia servers" on Meta:

<quote>
The Squid systems maintain large caches of pages, so that common or repeated requests don't need to touch the Apache or database servers. They serve most page requests made by visitors who aren't logged in. They are currently running at a hit-rate of approximately 75%, effectively quadrupling the capacity of the Apache servers behind them. This is particularly noticeable when a large surge of traffic arrives directed to a particular page via a web link from another site, as the caching efficiency for that page will be nearly 100%. They also load balance the Apaches. Round-robin DNS is balancing the load among the Squids. See cache strategy for more details.
</quote>

I did a test setup of memcached with my app. The main page also has several sections of dynamic content. Using memcache to store fragments of the page for a short time (you can set their expiration time) increased throughput to approximately three times the uncached numbers. Memcached is very easy to set up and use; you just have to make sure you expire the appropriate caches when updates to data are made, otherwise you have to wait for them to expire to get the update.
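(Roughly what that pattern looks like with the memcache-client gem; the helper and key names here are invented:)

    CACHE = MemCache.new('localhost:11211')   # hypothetical memcached address

    # Fetch a rendered fragment from memcached, or build and store it with a TTL
    def cached_fragment(key, ttl = 120)
      html = CACHE.get(key)
      if html.nil?
        html = yield
        CACHE.set(key, html, ttl)
      end
      html
    end

    # In a view/helper:
    #   cached_fragment('home/top_stories', 120) { render :partial => 'top_stories' }
    # When the data changes, expire explicitly rather than waiting for the TTL:
    #   CACHE.delete('home/top_stories')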

Also, what, 19 Mongrels per server? That is far, far too many. I can max out my CPUs with 5 mongrels per machine. I actually have 4 apps with 4 instances each running on my production web nodes. That is probably too many, but I have plenty of RAM so why not?

Let us know what happens with your site. Also, what is your site?

Jason
