Scaling Best Practices

Can anyone recommend some good reading material on scaling a Rails
app? We receive around 5k-7k visitors per day and are running
Postgres and Rails with FastCGI. We have not implemented caching yet
and are pondering moving to Mongrel. We have thrown more hardware at
our application, and that seemed to help a bit, but we are looking for
the optimal growth plan and would love any thoughts, advice, or case
studies anyone has - thanks for your time in posting!

Mongrel is recommended, see: http://mongrel.rubyforge.org/

Normally, what you will do is run multiple copies of mongrel behind a
reverse proxy. The Apache proxying support (which is what most people
use) appears to scale to a point. Beyond that you might have to look at
other options. You should also move static resources (such as CSS,
JPEG, PNG, and static HTML) so they are served by the web server as
opposed to the app server (as you have probably configured for fastcgi).
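The usual shape of that setup, sketched as an Apache 2.2 mod_proxy_balancer fragment (hostnames, ports, and paths here are illustrative, not a prescription):

```apache
# Three mongrels on localhost, load-balanced by Apache; static files under
# public/ are served directly by Apache and never hit the app servers.
<Proxy balancer://mongrels>
    BalancerMember http://127.0.0.1:8000
    BalancerMember http://127.0.0.1:8001
    BalancerMember http://127.0.0.1:8002
</Proxy>

DocumentRoot /var/www/myapp/current/public

# Exclude static asset paths from proxying so Apache serves them itself.
ProxyPass /images !
ProxyPass /stylesheets !
ProxyPass /javascripts !
ProxyPass / balancer://mongrels/
ProxyPassReverse / balancer://mongrels/
```

The same structure translates directly to an nginx `upstream` block if you go that route instead.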

SQL query tuning is one thing that most people neglect. First, make
sure the columns you are using in relationships are indexed. Make sure
any columns you are using in finders (i.e. MyModel.find_by_name(name))
are also indexed. Rails 2.0 should do query caching, but look for
unnecessary trips to the database in your application. Proper care and
feeding of postgres is also important. In one extreme PHP example, a
simple change in the logic reduced the database queries by a factor of 5
and the response time on one particular page from almost a minute to
about 1 second.
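The "unnecessary trips to the database" point is the classic N+1 problem. A toy sketch in plain Ruby (a stubbed query counter stands in for the database; in Rails the batched version is what eager loading with `:include` gives you):

```ruby
# Toy model: each call to `query` stands in for one SQL round trip.
$query_count = 0
def query(sql)
  $query_count += 1
end

posts = (1..50).to_a  # pretend these are 50 post records

# N+1 pattern: one query for the list, then one per row for its author.
$query_count = 0
query("SELECT * FROM posts")
posts.each { |p| query("SELECT * FROM users WHERE id = #{p}") }
n_plus_one = $query_count  # 51 round trips

# Batched pattern: one query for the list, one IN-clause query for all
# authors -- two round trips no matter how many rows are on the page.
$query_count = 0
query("SELECT * FROM posts")
query("SELECT * FROM users WHERE id IN (#{posts.join(',')})")
batched = $query_count     # 2 round trips
```

On a page rendering 50 rows, that is the difference between 51 queries and 2, which is exactly the kind of factor-of-5-or-more reduction described above.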

Other associated services matter too. For example, for one client we're
forced to use SMTP mailer connections. The SMTP server responds slowly to
each request, so there's a perceptible delay when sending mail. To
compensate we wrote a small service to send mail asynchronously. The
same may be true of other dependencies, like shared file systems, etc.
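The asynchronous-mail idea can be sketched with a plain Ruby worker thread and a queue (everything here is illustrative; plugins like ar_mailer do the production version by spooling mail to a database table instead):

```ruby
require 'thread'

# Fire-and-forget sender: the web request only pays the cost of Queue#push;
# a background thread drains the queue and makes the slow SMTP call.
class AsyncMailer
  def initialize(&deliver)        # deliver: the slow SMTP call, injected
    @queue = Queue.new
    @worker = Thread.new do
      while (mail = @queue.pop)   # nil acts as a shutdown sentinel
        deliver.call(mail)
      end
    end
  end

  def send_later(mail)
    @queue.push(mail)             # returns immediately; no SMTP wait
  end

  def shutdown
    @queue.push(nil)
    @worker.join
  end
end

# Usage: the "SMTP server" here is a stub that just records what it sent.
sent = []
mailer = AsyncMailer.new { |mail| sent << mail }
mailer.send_later("welcome mail to bob")
mailer.send_later("receipt mail to alice")
mailer.shutdown
```

The trade-off is that delivery failures now happen out-of-band, so the worker needs its own error handling and retry policy.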

Generally, your pipe to the outside world does not require gigabit
networking (you're limited by the size of your external pipe). Between
your database and your app server, if you have gigabit links, make sure
you use CAT 6 cable and your switch supports gigabit networking. The
same can be said for NFS connections.

> Can anyone recommend some good reading material on scaling a Rails
> app? We receive around 5k-7k visitors per day and are running
> Postgres and Rails with FastCGI. We have not implemented caching yet
> and are pondering moving to Mongrel. We have thrown more hardware at
> our application, and that seemed to help a bit, but we are looking for
> the optimal growth plan and would love any thoughts, advice, or case
> studies anyone has - thanks for your time in posting!

  May I humbly suggest getting a copy of my book, which was just finished yesterday? http://pragprog.com/titles/fr_deploy It covers taking a Rails app from infancy to maturity, and covers all the topics of scaling out, like Apache/Nginx/Mongrel, as well as Xen and MySQL master -> slave and master <-> master replication.

Cheers-
- Ezra Zygmuntowicz
-- Founder & Software Architect
-- ezra@engineyard.com
-- EngineYard.com

Thanks for the quick response - so right now we are running the
database, app and web services all on one server - perhaps it's time for
us to break out to three servers? We have indexed all our databases,
but could return to the code and ensure we are being efficient. From
what I have read, most people recommend more and more hardware - but I
don't understand the relationship between site activity and processing
power. Our current setup is too slow - and it's hurting business. How
hard is it to migrate from FastCGI to Mongrel?

Hi Ezra - thanks for the link - can you give any general guidance in
the meantime? It will take a while to order and read your book :wink:

> Can anyone recommend some good reading material on scaling a Rails
> app? We receive around 5k-7k visitors per day and are running

5k-7k visitors? That could be a little bit of traffic or a lot of
traffic... what's your actual page requests per second on average? If
each of those visitors only hits one page a day then your scaling problem
is very different than if they hit 100.

> Postgres and Rails with FastCGI. We have not implemented caching yet
> and are pondering moving to Mongrel.

I'd definitely recommend switching away from fastcgi. Mongrel with Nginx.
Or perhaps Litespeed.

Caching will almost certainly help as well. But pick your spots so you
don't spend time caching things that don't make any sense. Maybe look
into memcache. If you can page cache, that will be your biggest gain.
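To make the "pick your spots" idea concrete, even an in-process hash with a time-to-live shows the shape of a read-through cache API. This is only a sketch: memcached (via the era's memcache-client gem) does the same thing shared across processes and machines, and Rails page caching skips the app server entirely.

```ruby
# Tiny read-through cache with expiry; memcached generalizes this
# across all your mongrels instead of one process.
class TinyCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {}                  # key => [value, expires_at]
  end

  # Returns the cached value, or computes and stores it via the block.
  def fetch(key)
    value, expires_at = @store[key]
    return value if expires_at && Time.now < expires_at
    value = yield                # the expensive render / query
    @store[key] = [value, Time.now + @ttl]
    value
  end
end

renders = 0
cache = TinyCache.new(300)       # cache the fragment for 5 minutes
3.times { cache.fetch(:sidebar) { renders += 1; "rendered sidebar html" } }
```

The payoff is entirely in how expensive the block is and how often the key repeats, which is why profiling first ("pick your spots") matters.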

Also, postgres can definitely stand to be tuned to your specific
situation. See if you've got some slow queries and ask on the postgres
lists for help on tuning.

> Normally, what you will do is run multiple copies of mongrel behind a
> reverse proxy. The Apache proxying support (which is what most people
> use) appears to scale to a point. Beyond that you might have to look at
> other options.

Nginx.

> You should also move static resources (such as CSS, JPEG, PNG, and
> static HTML) so they are served by the web server as opposed to the app
> server (as you have probably configured for fastcgi).

You may want to look into spreading your assets to other hosts and using
the 'asset%d' trick to get Rails to spread the load...
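For context, the 'asset%d' trick is the `config.action_controller.asset_host = "http://assets%d.example.com"` setting: Rails substitutes `source.hash % 4` into the `%d`, so browsers see four hostnames (assets0 through assets3) and open more parallel connections. The distribution step itself is plain Ruby (the domain below is hypothetical):

```ruby
# How Rails picks a host when asset_host contains "%d": it substitutes
# source.hash % 4, so a given asset path always maps to the same host
# (keeping it browser-cacheable) while different paths spread the load.
ASSET_HOST = "http://assets%d.example.com"  # hypothetical domain

def asset_host_for(source)
  ASSET_HOST % (source.hash % 4)
end

host = asset_host_for("/stylesheets/main.css")
```

All four hostnames can point at the same static server; the win is purely in working around the browser's per-host connection limit.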

> SQL query tuning is one thing that most people neglect. First, make
> sure the columns you are using in relationships are indexed. Make sure

Maybe. Maybe not :slight_smile: If you've got a million users and have a 'gender'
column, don't index that, as roughly half are going to be one value and half
the other. I think postgres is smart enough to realize that and ignore your
index, but mysql isn't. It will use the index, then look up 500,000
rows, and you'll get worse performance.

Similarly, if you have a table where the column isn't very unique and it's
constantly being updated, the index-regeneration overhead will hurt you.

But if you do a lot of lookups on those users by their login and don't
have login indexed then yeah you're gonna be hurting :slight_smile:
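That rule of thumb can be made concrete: index selectivity is roughly distinct values divided by total rows. The sketch below is only the intuition, not a planner's real cost model (and, as noted above, planners differ in how well they act on it):

```ruby
# Crude selectivity estimate: values closer to 1.0 mean an index on that
# column narrows the search more, so it helps more.
def selectivity(values)
  values.uniq.size.to_f / values.size
end

genders = ["m", "f"] * 500             # 1,000 rows, only 2 distinct values
logins  = (1..1_000).map { |i| "user#{i}" }  # 1,000 rows, all distinct

low  = selectivity(genders)   # 0.002 -- a scan is usually cheaper
high = selectivity(logins)    # 1.0   -- ideal candidate for an index
```

A lookup by login touches one row via the index; a lookup by gender still has to visit about half the table either way.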

> Other associated services matter too. For example, for one client we're
> forced to use SMTP mailer connections. The SMTP server responds slowly to
> each request, so there's a perceptible delay when sending mail. To
> compensate we wrote a small service to send mail asynchronously. The
> same may be true of other dependencies, like shared file systems, etc.

http://seattlerb.rubyforge.org/ar_mailer/

That may be of use there... not sure if the original questioner has email
issues or not...

Marc-

  So can you expound on what your current pain points are? What kind of hardware are you currently on? What is the load on the box? Is the database or the fcgi's taking most of the resources? Are you RAM constrained or CPU constrained? What kind of peak traffic do you get?

  If you can provide a breakdown of your current setup and what the bottleneck is, then I can better help you. But it sounds like you would benefit from an additional server to put the database on, and then switching the app servers to nginx + mongrel or thin. This helps to separate the concerns so you can know whether you need to scale the database or the application servers.

  Give a little more info and we can help figure out the best plan of attack for you.

Cheers-
- Ezra Zygmuntowicz
-- Founder & Software Architect
-- ezra@engineyard.com
-- EngineYard.com

> Thanks for the quick response - so right now we are running the
> database, app and web services all on one server - perhaps it's time for
> us to break out to three servers?

I'd put the database on its own box, then probably run nginx/mongrel (or
Litespeed) on the other two and load balance between them.

> We have indexed all our databases,
> but could return to the code and ensure we are being efficient. From
> what I have read, most people recommend more and more hardware - but I
> don't understand the relationship between site activity and processing
> power. Our current setup is too slow - and it's hurting business. How
> hard is it to migrate from FastCGI to Mongrel?

Pretty easy. Google around and you'll find some good tutorials on setting
up mongrel (and mongrel cluster).
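For reference, the heart of a mongrel_cluster setup from that era is one small YAML file, started and stopped with `mongrel_rails cluster::start` / `cluster::stop` (the deploy path below is illustrative):

```yaml
# config/mongrel_cluster.yml -- generated by `mongrel_rails
# cluster::configure` and then hand-tuned. Ports 8000-8002 are what a
# front-end proxy would balance across.
cwd: /var/www/myapp/current     # hypothetical deploy path
environment: production
address: 127.0.0.1              # only the proxy talks to the mongrels
port: "8000"                    # first of three mongrels
servers: 3
pid_file: tmp/pids/mongrel.pid
```

The FastCGI dispatchers go away entirely; the web server's job shrinks to serving static files and proxying dynamic requests to these ports.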

Phillip - thank you for your comments - yes, our visitors tend to stick
around and browse - here is a sampling of our traffic from Wed. of
this week (via Google Analytics):

6,330 Visits
42,607 Pageviews
6.73 Pages/Visit
00:06:53 Avg. Time on Site
55.02% % New Visits

In regards to page requests per second - I'm not sure how to calculate
that - I have data for page views per hour, which gives an average of
30-40 page views per minute - again, appreciate your help and advice.
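Requests per second falls straight out of those Analytics numbers; a sketch of the arithmetic (averages only, so budget for peaks several times higher):

```ruby
pageviews_per_day = 42_607

avg_per_second = pageviews_per_day / 86_400.0   # seconds in a day
avg_per_minute = pageviews_per_day / 1_440.0    # minutes in a day

# avg_per_second ~= 0.49 req/s and avg_per_minute ~= 29.6, which matches
# the observed 30-40 pageviews/minute. Note this counts pageviews, not
# asset requests, and says nothing about the peak-hour rate.
```

At roughly half a request per second on average, raw request volume is clearly not the bottleneck; per-request time is.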

Hi Ezra - we just upgraded to a Rackspace box with a dual-core CPU and
4 gigs of RAM - but when I run top, almost 90% of CPU is going to
fcgi and postmaster - Wed. was a bigger day for us - see stats below
in the other response - we have everything on one box - database, app,
web - (bad idea??) thanks for your insight and guidance (I am sure we
will buy your book ;).

> Phillip - thank you for your comments - yes, our visitors tend to stick
> around and browse - here is a sampling of our traffic from Wed. of
> this week (via Google Analytics):
>
> 6,330 Visits
> 42,607 Pageviews
> 6.73 Pages/Visit
> 00:06:53 Avg. Time on Site
> 55.02% % New Visits
>
> In regards to page requests per second - I'm not sure how to calculate
> that - I have data for page views per hour, which gives an average of
> 30-40 page views per minute - again, appreciate your help and advice.

Hrm. That's actually not that much traffic... assuming you can finish your
requests in under a second. What do your logs say about how long it's
taking to generate Rails pages?

What's your memory usage? Lots of free RAM? Or maxed out? If you've
got free RAM I'd switch to Mongrel first. Then, if necessary, move
PostgreSQL to another box.

> Hi Ezra - we just upgraded to a Rackspace box with a dual-core CPU and
> 4 gigs of RAM - but when I run top, almost 90% of CPU is going to
> fcgi and postmaster -

Doesn't that really just mean those are the only processes doing
something? I know there are some systems that use 100% of the cpu for a
trivial process simply because nothing else wants to run so that process
figures it might as well hog the cpu :slight_smile:

But if they are fighting for CPU time then that's a problem...

If you hit shift-M while in top, what does it say your most RAM-hungry
processes are? Are you hitting the 4 GB limit?

I agree - we should be able to handle TONS more - our RAM is maxed at
4 gigs - the real problem, I think, is Postgres taking too long to
execute queries - how can I tell how long it takes to generate a Rails
page? (I think I will post in the Postgres groups as well to see if they
can help)

Traffic doesn't come evenly distributed all day long. Are there times
of the day where the performance is fine? Are there times during the
day when performance sucks?

For example, if you don't get much traffic early in the morning, and
performance is still a problem, then this isn't a scalability issue. It
might be a configuration or software issue.

Also, are you running in production mode or development mode?

Could be an IO subsystem problem. When you look at top while the server is busy, what does the %wa (IO wait) figure say? Also try this command and paste us the output:

iostat -x 5

  What kind of disks are in the server? And with what kind of RAID setup? It sounds to me like you just need to get a separate box for the database. Keeping the database and the fcgi's on separate boxes and tuning the configs properly will allow Linux to aggressively cache the stuff you need. With both on the same box they are fighting for disk IO cache.

  Also, are you using the default PostgreSQL config? The default config is tuned for 64 MB of RAM and needs to be dialed in when you have more.
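As a rough illustration of "dialing in" postgresql.conf on a 4 GB box, the parameters below are the usual starting points. The values are ballpark guesses for a shared database/app machine of that era, not a prescription; the Postgres lists can refine them for a specific workload:

```
# postgresql.conf (PostgreSQL 8.x era) -- illustrative values for 4 GB RAM
shared_buffers = 512MB          # default is tiny; commonly 1/8 to 1/4 of RAM
effective_cache_size = 2GB      # tells the planner how much the OS caches
work_mem = 16MB                 # per sort/hash; keep modest with many connections
max_connections = 100           # fcgi/mongrel pools rarely need more
```

Raising shared_buffers beyond the default alone often changes query plans noticeably, which is why the default "tuned for 64 MB" config hurts so much on a 4 GB machine.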

Cheers-
-Ezra

> I agree - we should be able to handle TONS more - our RAM is maxed at
> 4 gigs - the real problem, I think, is Postgres taking too long to
> execute queries - how can I tell how long it takes to generate a Rails
> page? (I think I will post in the Postgres groups as well to see if they
> can help)

log/production.log should tell you...

Your production log should give you more info:

Processing ExternalController#playlist [GET]

Rendering external/playlist

Completed in 0.02592 (38 reqs/sec) | Rendering: 0.00662 (25%) | DB: 0.00625 (24%) | 200 OK
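Those "Completed in" lines can be mined directly. A minimal sketch of pulling the total, render, and DB times out with a regex (a proper log analyzer aggregates this across the whole log; the regex assumes the Rails 1.x/2.x line format shown above):

```ruby
# One completion line from log/production.log, as quoted above.
LINE = "Completed in 0.02592 (38 reqs/sec) | Rendering: 0.00662 (25%) " \
       "| DB: 0.00625 (24%) | 200 OK"

# Extract the three timings (in seconds) from a completion line.
def parse_completed(line)
  m = line.match(/Completed in ([\d.]+) .*Rendering: ([\d.]+) .*DB: ([\d.]+)/)
  return nil unless m
  { total: m[1].to_f, rendering: m[2].to_f, db: m[3].to_f }
end

timings  = parse_completed(LINE)
db_share = timings[:db] / timings[:total]   # fraction of the request in SQL
```

Run over a day's log, sorting by `:total` (or by `db_share`) immediately shows whether the slow pages are render-bound or database-bound.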

Rails Log Analyzer could help you - I haven't used it myself, but it seems like it could provide you with more information.

http://rails-analyzer.rubyforge.org/

Although I must say the number of views per day you're getting isn't humongous. I have an old horse of a server (compared to your setup) serving a lot more than that, and it has quite a number of apps running on it too. I use Apache + Pound (load balancing) + a Mongrel cluster. I used Apache + FCGI quite some time ago and learned you should avoid it. I first switched over to Lighttpd, which improved things a lot, and then to the current setup. We also have servers running an Apache load balancer + Mongrel cluster and nginx + Mongrel cluster; they all work very, very well.

Best regards

Peter De Berdt

Ezra - we have talked with you before - and we know you are a rails
guru - one of the best! - thank you for taking time to help (I had to
step away to grab a pastrami sandwich) -

3 x 146 GB (10,000 RPM) SCSI Drives - RAID 5

top:
top - 14:11:08 up 3 days, 23:59, 2 users, load average: 3.03, 2.94, 2.83
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
Cpu(s): 30.1% us, 8.9% sy, 0.0% ni, 60.8% id, 0.2% wa, 0.0% hi, 0.0% si
Mem: 4147336k total, 4115540k used, 31796k free, 55668k buffers
Swap: 1052248k total, 256k used, 1051992k free, 2520152k cached

iostat -x 5:
Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.36   41.93  3.80  18.28  175.58  481.78  87.79  240.89     29.77      0.21   9.62   0.95   2.10

Peter - let's switch boxes! :wink: I am glad to hear you are having
success with your setup - we have the good problem of a successful web
app - so traffic is continually increasing - I just pray we don't get
Dugg or Slashdotted (yet, at least).