Scaling Best Practices

Can anyone recommend some good reading material on scaling a Rails app? We receive around 5k-7k visitors per day and are running postgres and rails with fastcgi - we have not implemented caching yet and are pondering moving to mongrel. We have thrown more hardware at our application and it seemed to help a bit - but we are looking for the most optimal growth plan and would love any thoughts, advice, or case studies anyone has had - thanks for your time in posting!

Mongrel is recommended, see: http://mongrel.rubyforge.org/

Normally, what you will do is run multiple copies of mongrel behind a reverse proxy. The Apache proxying support (which is what most people use) appears to scale to a point. Beyond that you might have to look at other options. You should also move static resources (such as CSS, JPEG, PNG, and static HTML) so they are served by the web server as opposed to the app server (as you have probably configured for fastcgi).

SQL query tuning is one thing that most people neglect. First, make sure the columns you are using in relationships are indexed. Make sure any columns you are using in finders (i.e. MyModel.find_by_name(name)) are also indexed. Rails 2.0 should do query caching, but look for unnecessary trips to the database in your application. Proper care and feeding of postgres is also important. In one extreme PHP example, a simple change in the logic reduced the database queries by a factor of 5 and the response time on one particular page from almost a minute to about 1 second.
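
As a rough sketch of what that looks like in a migration (the table and column names here are just placeholders for your own schema), something like:

class AddLookupIndexes < ActiveRecord::Migration
  def self.up
    # index the foreign key used by a belongs_to/has_many relationship
    add_index :orders, :user_id
    # index a column used by a dynamic finder such as Product.find_by_name(name)
    add_index :products, :name
  end

  def self.down
    remove_index :products, :name
    remove_index :orders, :user_id
  end
end

Run it with rake db:migrate and then check your slow queries again.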

Other associated services. For example, for one client we're forced to use SMTP mailer connections. The SMTP server has a slow response to each request, so there's a perceptible delay when sending mail. To compensate we wrote a small service to send mail asynchronously. This may be true of other issues, like shared file systems, etc.
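
The general shape of that kind of asynchronous mailer, as a rough sketch (QueuedMail and Notifier are made-up names for illustration, not the actual service), is: write the outgoing message to a table during the request, and let a separate process drain the table so the slow SMTP round trip never happens inside the web request.

# app/models/queued_mail.rb -- assumes a queued_mails table with
# recipient, subject, body and sent_at columns
class QueuedMail < ActiveRecord::Base
end

# in the request cycle: enqueue instead of talking to SMTP
QueuedMail.create!(:recipient => "someone@example.com",
                   :subject   => "Welcome",
                   :body      => "Thanks for signing up")

# in a separate long-running script (script/runner or a small daemon):
loop do
  QueuedMail.find(:all, :conditions => "sent_at IS NULL", :limit => 50).each do |mail|
    # Notifier is an assumed ActionMailer class with a matching method
    Notifier.deliver_queued(mail.recipient, mail.subject, mail.body)
    mail.update_attribute(:sent_at, Time.now)
  end
  sleep 10
end

(ar_mailer, mentioned further down the thread, packages up essentially this idea.)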

Generally, your pipe to the outside world does not require gigabit networking (you're limited by the size of your external pipe). Between your database and your app server, if you have gigabit links, make sure you use CAT 6 cable and your switch supports gigabit networking. The same can be said for NFS connections.

Can anyone recommend some good reading material on scaling a Rails app? We receive around 5k-7k visitors per day and are running postgres and rails with fastcgi - we have not implemented caching yet and are pondering moving to mongrel. We have thrown more hardware at our application and it seemed to help a bit - but we are looking for the most optimal growth plan and would love any thoughts, advice, or case studies anyone has had - thanks for your time in posting!

  May I humbly suggest getting a copy of my book that was just finished yesterday? http://pragprog.com/titles/fr_deploy It covers taking a Rails app from infancy to maturity and covers all the topics of scaling out, like Apache/nginx/mongrel as well as Xen and MySQL master -> slave and master <-> master.

Cheers-
- Ezra Zygmuntowicz
-- Founder & Software Architect
-- ezra@engineyard.com
-- EngineYard.com

Thanks for the quick response - so right now we are running the database, app and web services all on one server - perhaps time for us to break out to three servers? We have indexed all our databases but could return to the code and ensure we are being efficient. From what I have read, more people recommend more and more hardware - but I don't understand the relationship between site activity and processing power. Our current setup is too slow - and it's hurting business. How hard is it to migrate from FastCGI to mongrel?

Hi Ezra - thanks for the link - can you give any general guidance in the meantime - it will take a while to order and read your book :wink:

Can anyone recommend some good reading material on scaling a Rails app? We receive around 5k-7k visitors per day and are running

5k-7k visitors? That could be a little bit of traffic or a lot of traffic... what are your actual page requests per second, on average? If each of those visitors only hits one page a day then your scaling problem is very different than if they hit 100.

postgres and rails with fastcgi - we have not implemented caching yet and are pondering moving to mongrel.

I'd definitely recommend switching away from fastcgi. Mongrel with Nginx. Or perhaps Litespeed.

Caching will almost certainly help as well. But pick your spots so you don't spend time caching things that don't make any sense. Maybe look into memcache. If you can page cache, that will be your biggest gain.
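
A minimal sketch of page caching in a Rails 2.0-era controller (the controller and model names are placeholders):

class ProductsController < ApplicationController
  # whole pages get written into public/ and served by the web server
  # directly, so Rails never sees repeat requests
  caches_page :index, :show

  def create
    @product = Product.create!(params[:product])
    # cached pages must be expired when the underlying data changes
    expire_page :action => 'index'
    expire_page :action => 'show', :id => @product
    redirect_to :action => 'show', :id => @product
  end
end

For pages that are only partly static, fragment caching in the view (<% cache 'some_key' do %> ... <% end %>) with memcached behind it is the usual next step.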

Also, postgres can definitely stand to be tuned to your specific situation. See if you've got some slow queries and ask on the postgres lists for help on tuning.

Normally, what you will do is run multiple copies of mongrel behind a reverse proxy. The Apache proxying support (which is what most people use) appears to scale to a point. Beyond that you might have to look at other options.

Nginx.

You should also move static resources (such as CSS, JPEG, PNG, and static HTML) so they are served by the web server as opposed to the app server (as you have probably configured for fastcgi).

You may want to look into spreading your assets to other hosts and using the 'asset%d' trick to get Rails to spread the load...
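
That trick is a one-liner in config/environments/production.rb; Rails substitutes 0-3 for the %d, so the four hostnames (placeholders below) all need to resolve to wherever your static files live, and browsers will open more parallel connections:

config.action_controller.asset_host = "http://assets%d.example.com"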

SQL query tuning is one thing that most people neglect. First, make sure the columns you are using in relationships are indexed. Make sure

Maybe. Maybe not :slight_smile: If you've got a million users and have a 'gender' column, don't index it, as roughly half are going to be one value and half the other. I think postgres is smart enough to realize that and ignore your index, but mysql isn't. It will use the index, then look up 500,000 rows, and you'll get worse performance.

Similarly, if you have a column that isn't very selective and the table is constantly being updated, the overhead of keeping the index up to date will hurt you.

But if you do a lot of lookups on those users by their login and don't have login indexed, then yeah, you're gonna be hurting :slight_smile:
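
An easy way to check whether the planner is actually using an index is EXPLAIN ANALYZE from script/console (users/login here are just example names; if the plan shows a sequential scan, the index is being ignored):

puts ActiveRecord::Base.connection.select_values(
  "EXPLAIN ANALYZE SELECT * FROM users WHERE login = 'bob'"
)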

Other associated services. For example, for one client we're forced to use SMTP mailer connections. The SMTP server has a slow response to each request, so there's a perceptible delay when sending mail. To compensate we wrote a small service to send mail asynchronously. This may be true of other issues, like shared file systems, etc.

http://seattlerb.rubyforge.org/ar_mailer/

maybe of use there... not sure if the original questioner has email issues or not...

Marc-

  So can you expound on what your current pain points are? What kind of hardware are you currently on? What is the load on the box? Is the database or the fcgi's taking most of the resources? Are you RAM constrained or CPU constrained? What kind of peak traffic do you get?

  If you can provide a breakdown of your current setup and what the bottleneck is, then I can better help you. But it sounds like you would benefit from an additional server to put the database on, and then switching the app servers to nginx + mongrel or thin. This helps to separate the concerns so you can know whether you need to scale the database or the application servers.

  Give a little more info and we can help figure out the best plan of attack for you.

Cheers-
- Ezra Zygmuntowicz
-- Founder & Software Architect
-- ezra@engineyard.com
-- EngineYard.com

Thanks for the quick response - so right now we are running the database, app and web services all on one server - perhaps time for us to break out to three servers?

I'd put the database on its own box and then probably run nginx/mongrel (or litespeed) on the other two and load balance between them.

We have indexed all our databases but could return to the code and ensure we are being efficient. From what I have read, more people recommend more and more hardware - but I don't understand the relationship between site activity and processing power. Our current setup is too slow - and it's hurting business. How hard is it to migrate from FastCGI to mongrel?

Pretty easy. Google around and you'll find some good tutorials on setting up mongrel (and mongrel cluster).
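
As a rough outline (the paths, port and instance count below are just examples - check the mongrel and mongrel_cluster docs for the details):

gem install mongrel mongrel_cluster
mongrel_rails cluster::configure -e production -p 8000 -N 3 -a 127.0.0.1 -c /path/to/your/app
mongrel_rails cluster::start

Then point Apache or nginx at ports 8000-8002 as a proxy balancer and drop the fastcgi handler config.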

Phillip - thank you for your comments - yes, our visitors tend to stick around and browse - here is a sampling of our traffic from Wed. of this week (via Google Analytics):

6,330 Visits, 42,607 Pageviews, 6.73 Pages/Visit, 00:06:53 Avg. Time on Site, 55.02% New Visits

In regards to page requests per second - I'm not sure how to calculate that - I have data for page views per hour, which gives an average of 30-40 page views per minute - again, appreciate your help and advice.

Hi Ezra - we just upgraded to a Rackspace box with a dual-core CPU running 4 gigs of RAM - but when I run top, almost 90% of CPU is going to fcgi and postmaster - Wed. was a bigger day for us - see stats below in the other response - we have everything on one box - database, app, web - (bad idea??) thanks for your insight and guidance (I am sure we will buy your book ;).

Phillip - thank you for your comments - yes, our visitors tend to stick around and browse - here is a sampling of our traffic from Wed. of this week (via Google Analytics):

6,330 Visits, 42,607 Pageviews, 6.73 Pages/Visit, 00:06:53 Avg. Time on Site, 55.02% New Visits

In regards to page requests per second - I'm not sure how to calculate that - I have data for page views per hour, which gives an average of 30-40 page views per minute - again, appreciate your help and advice.

Hrm. That's actually not that much traffic... assuming you can finish your requests in under a second. What do your logs say about how long it's taking to generate Rails pages?
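
For a rough sense of scale: 42,607 pageviews spread over 86,400 seconds is about 0.5 requests/second on average, and even your 30-40 pageviews per minute is well under one request per second, so raw volume isn't the problem if each request completes quickly.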

What's your memory usage? Lots of free RAM? Or maxed out? If you've got free RAM I'd switch to mongrel first. Then, if necessary, move postgresql to another box.

Hi Ezra - we just upgraded to a Rackspace box with a dual-core CPU running 4 gigs of RAM - but when I run top, almost 90% of CPU is going to fcgi and postmaster -

Doesn't that really just mean those are the only processes doing something? I know there are some systems that use 100% of the CPU for a trivial process simply because nothing else wants to run, so that process figures it might as well hog the CPU :slight_smile:

But if they are fighting for CPU time then that's a problem...

If you hit shift-M while in top, what does it say your most RAM-hungry processes are? Are you hitting the 4GB limit?

I agree - we should be able to handle TONS more - our RAM is MAXED at 4 gigs - the real problem, I think, is postgres taking too long to execute queries - how can I tell how long it takes to generate a Rails page? (I think I will post in the postgres groups as well to see if they can help)

Traffic doesn't come evenly distributed all day long. Are there times of the day where the performance is fine? Are there times during the day when performance sucks?

For example, if you don't get much traffic early in the morning, and performance is still a problem, then this isn't a scalability issue. It might be a configuration or software issue.

Also, are you running in production mode or development mode?

Could be an IO subsystem problem. When you look at top and the server is busy what does the %wa say? Also try this command and paste us the output:

iostat -x 5

  What kind of disks are in the server? And with what kind of RAID setup? It sounds to me like you just need to get a separate box for the database. Keeping the database and the fcgi's on separate boxes and tuning the configs properly will allow Linux to aggressively cache the stuff you need. With both on the same box they are fighting for disk I/O cache.

  Also, are you using the default postgresql config? The default config is tuned for 64MB of RAM and needs to be dialed in when you have more RAM.
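
  As a very rough starting point on a 4GB box (these numbers are guesses to sanity-check against the PostgreSQL docs and your own workload, and the unit syntax assumes 8.2 or newer), the first settings to look at in postgresql.conf are:

shared_buffers = 512MB          # default is tiny; changing it needs a restart
effective_cache_size = 2GB      # hint to the planner about available OS cache
work_mem = 8MB                  # per-sort/per-hash memory, multiplied across connections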

Cheers- -Ezra

I agree - we should be able to handle TONS more - our RAM is MAXED at 4 gigs - the real problem, I think, is postgres taking too long to execute queries - how can I tell how long it takes to generate a Rails page? (I think I will post in the postgres groups as well to see if they can help)

log/production.log should tell you...

Your production log should give you more info:

Processing ExternalController#playlist [GET]

Rendering external/playlist

Completed in 0.02592 (38 reqs/sec) | Rendering: 0.00662 (25%) | DB: 0.00625 (24%) | 200 OK

Rails Log Analyzer could help you; I haven't used it myself, but it seems like it could provide you with more information.

http://rails-analyzer.rubyforge.org/

Although I must say, the number of views per day you're getting isn't humongous. I have an old horse (compared to your setup) serving a lot more than that, and it has quite a number of apps running on it too. I do use Apache + Pound (load balancing) + a mongrel cluster. I used Apache + FCGI quite some time ago and learned you should avoid it. I first switched over to Lighttpd, which improved things a lot, and then to the current setup. We also have a server running an Apache load balancer + mongrel cluster and nginx + mongrel cluster; they all work very, very well.

Best regards

Peter De Berdt

Ezra - we have talked with you before - and we know you are a Rails guru - one of the best! - thank you for taking the time to help (I had to step away to grab a pastrami sandwich) -

3 x 146 GB (10,000 RPM) SCSI Drives - RAID 5

Top:
top - 14:11:08 up 3 days, 23:59, 2 users, load average: 3.03, 2.94, 2.83
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
Cpu(s): 30.1% us, 8.9% sy, 0.0% ni, 60.8% id, 0.2% wa, 0.0% hi, 0.0% si
Mem: 4147336k total, 4115540k used, 31796k free, 55668k buffers
Swap: 1052248k total, 256k used, 1051992k free, 2520152k cached

iostat -x 5
Device:  rrqm/s  wrqm/s   r/s    w/s   rsec/s  wsec/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.36   41.93  3.80  18.28   175.58  481.78  87.79  240.89     29.77      0.21   9.62   0.95   2.10

Peter - let's switch boxes! :wink: I am glad to hear you are having success with your setup - we have the good problem of a successful web app - so traffic is continually increasing - I just pray we don't get dugg or slashdotted (yet, at least).