Scaling Best Practices

So it does not appear to be an IO problem. Just a busy box. I think your best course of action here is to get an additional box, move the fcgi's over to that box, and leave the current box just for the database. Also make sure your Postgres config is *not* the default config, as it is tuned for tiny amounts of memory.

  With 2 cpu cores and a load average of 3.x you are not doing too bad. Basically a load average of 4.0 per cpu core is completely baked. So if your load average approaches 8.0 then you know your cpus are completely maxed out.
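As a rough sketch of the arithmetic above: divide the load average by the number of cores and compare it with a saturation ceiling. The 4.0-per-core figure is the rule of thumb quoted in this thread, not a universal constant (a more conservative reading treats roughly 1.0 per core as fully busy), and the helper name and its parameters are made up for illustration.

```ruby
# Hypothetical helper: how close a box is to the "completely baked" point.
# saturation_per_core defaults to the 4.0 figure quoted in this thread.
def load_headroom(load_average, cores, saturation_per_core = 4.0)
  ceiling = cores * saturation_per_core   # e.g. 2 cores * 4.0 = 8.0
  {
    :per_core           => load_average / cores.to_f,
    :ceiling            => ceiling,
    :percent_of_ceiling => (load_average / ceiling * 100).round(1)
  }
end

stats = load_headroom(3.0, 2)
puts "per core: #{stats[:per_core]}, ceiling: #{stats[:ceiling]}, " \
     "#{stats[:percent_of_ceiling]}% of ceiling"
```

Passing a different third argument (for example 1.0) lets you apply a stricter threshold without changing the rest of the calculation.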

  I think you need another box to separate the db from the app server so you can get out of the big ball of mud setup and into a setup where you can tune the db and app servers separately.

Cheers-

- Ezra Zygmuntowicz -- Founder & Software Architect -- ezra@engineyard.com -- EngineYard.com

Postgres - by far - then apache - to answer the other comment - slow all the time - we have an international user group - but traffic is def. slower at 1 am than at noon PST - however even when I am on the site at night around midnight still super slow.

Phil - here is a little sampling from our log (I removed the url to our site for now)

Processing HumansController#my_account (for 24.41.63.235 at 2008-02-21 18:42:37) [GET]
  Session ID: 17fa2835e51d3e67a7ad1d8e031f3a4d
  Parameters: {"action"=>"my_account", "controller"=>"humans"}
Rendering within layouts/setup
Rendering humans/my_account
Completed in 2.92013 (0 reqs/sec) | Rendering: 0.03056 (1%) | DB: 2.42590 (83%) | 200 OK [http://www.website.com/shop/browse/womens?page=11]
Completed in 0.20057 (4 reqs/sec) | Rendering: 0.10504 (52%) | DB: 0.09320 (46%) | 200 OK [http://www.website.com/humans/my_account/]
Rendering within layouts/setup
Rendering home/index
Rendering within layouts/setup
Rendering shop/browse
Completed in 0.86887 (1 reqs/sec) | Rendering: 0.09286 (10%) | DB: 0.67580 (77%) | 200 OK [http://www.website.com/]
Rendering within layouts/setup
Rendering vote/in_the_running_day

Completed in 2.92013 (0 reqs/sec) | Rendering: 0.03056 (1%) | DB: 2.42590 (83%) | 200 OK [http://www.website.com/shop/browse/womens?page=11]

You should figure out what in this request is taking the database 2.4 seconds to complete... that's a long time...

Completed in 0.86887 (1 reqs/sec) | Rendering: 0.09286 (10%) | DB: 0.67580 (77%) | 200 OK [http://www.website.com/]

Don't know what your site does, but maybe you could page cache the homepage for a couple of minutes at a time? We do that and it gains us quite a bit.
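To illustrate the idea of caching a page "for a couple of minutes at a time", here is a minimal plain-Ruby sketch of a time-based cache. It is not how Rails implements page caching (Rails of this era offers `caches_page` for the real thing); the class name, the injectable clock, and the 120-second TTL are assumptions chosen for the example.

```ruby
# Minimal time-based cache: keeps a rendered page for ttl_seconds.
# Illustrative only; TimedPageCache is a made-up name, not a Rails class.
class TimedPageCache
  def initialize(ttl_seconds, clock = lambda { Time.now })
    @ttl = ttl_seconds
    @clock = clock     # injectable so the expiry logic can be tested
    @entries = {}      # path => [rendered_body, stored_at]
  end

  # Returns the cached body for path, or runs the block and stores its
  # result when there is no entry or the entry is at least ttl old.
  def fetch(path)
    body, stored_at = @entries[path]
    now = @clock.call
    if stored_at.nil? || now - stored_at >= @ttl
      body = yield
      @entries[path] = [body, now]
    end
    body
  end
end

renders = 0
cache = TimedPageCache.new(120)   # "a couple of minutes"
2.times { cache.fetch("/") { renders += 1; "<html>home</html>" } }
puts "rendered #{renders} time(s) for 2 requests"
```

The point of the sketch is only that the expensive render (and its database queries) runs once per TTL window rather than once per visitor.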

It really does sound like a query tuning issue. It comes in two forms. The first is the prototypical indexing issue. For example, when navigating across a relationship:

@parent = SomeClass.find(params[:id])
@children = @parent.children

The child table will have some_class_id, which needs to be indexed. When you migrate and add tables, Rails/postgres automatically generates primary keys for id fields. It does not generate indexes on the foreign key fields. So the generated query:

SELECT * FROM <table-name> WHERE some_class_id = 37

will scan the entire table.
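A toy plain-Ruby model of the difference, assuming a child table of 10,000 rows spread over 100 parents (both numbers are invented). The "index" here is just a hash built once and keyed on the foreign key; a real Postgres index is a different structure, but the effect on rows examined is the same in spirit. In a Rails migration the corresponding fix would be a call along the lines of `add_index :children, :some_class_id`, where the table name is hypothetical.

```ruby
# Toy model of the child table: 10,000 rows spread over 100 parents.
rows = (1..10_000).map { |id| { :id => id, :some_class_id => id % 100 } }

# Without an index: every row is examined to find the matches.
def full_scan(rows, parent_id)
  examined = 0
  matches = rows.select do |row|
    examined += 1
    row[:some_class_id] == parent_id
  end
  [matches, examined]
end

# With an index: a lookup structure built once, keyed on the foreign key.
index = rows.group_by { |row| row[:some_class_id] }

def index_lookup(index, parent_id)
  matches = index.fetch(parent_id, [])
  [matches, matches.size]   # only the matching rows are touched
end

scan_matches, scan_examined = full_scan(rows, 37)
idx_matches, idx_examined = index_lookup(index, 37)
puts "scan examined #{scan_examined} rows, index touched #{idx_examined}"
```

Both paths return the same rows; only the amount of work per lookup differs.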

The second problem, which may or may not apply to you, is what is called a "cross-product" query. For example, in a find method you join to another table but don't add a restriction for the join columns. A cross product query between two tables, each with 1000 rows, will generate an intermediate result on the order of 1,000,000 records.
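The 1000 x 1000 figure can be checked with a small plain-Ruby sketch. The `orders` and `customers` tables and their columns are invented for the example; `Array#product` plays the role of a join with no restriction on the join columns.

```ruby
# Two made-up tables of 1000 rows each.
orders    = (1..1000).map { |i| { :id => i, :customer_id => i } }
customers = (1..1000).map { |i| { :id => i } }

# No join condition: every order is paired with every customer.
cross = orders.product(customers)

# With the join condition applied: only matching pairs survive.
joined = cross.select do |order, customer|
  order[:customer_id] == customer[:id]
end

puts "cross product: #{cross.size} pairs, proper join: #{joined.size} pairs"
```

The database has to build (or at least walk) the large intermediate result before it can discard most of it, which is where the time goes.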

How to find out whether you have a query problem:

1) You can isolate your queries, run them in a postgres console and see how long they take, or do an "explain plan."
2) Examine your log for places where the time spent in the database is more than a fraction of a second.

It may take a little while to find since a page may generate a dozen queries, only one of which is bad. It may not even be on every page.
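For the log-examination step, a short Ruby sketch like the following can pull out the slow requests. It assumes the "Completed in ..." line format shown in this thread; the 0.5-second threshold and the method name are arbitrary choices for illustration.

```ruby
# Matches the "Completed in ..." lines of the log format shown in this thread.
# Captures: total seconds, DB seconds, URL.
COMPLETED = /Completed in (\d+\.\d+) .*?DB: (\d+\.\d+) \(\d+%\).*?\[(.*?)\]/

# Returns [url, total_seconds, db_seconds] for requests whose DB time
# exceeds threshold seconds, slowest first.
def slow_db_requests(lines, threshold = 0.5)
  hits = lines.map { |line| COMPLETED.match(line) }.compact
  hits = hits.map { |m| [m[3], m[1].to_f, m[2].to_f] }
  hits.select { |hit| hit[2] > threshold }.sort_by { |hit| -hit[2] }
end

sample = [
  'Completed in 2.92013 (0 reqs/sec) | Rendering: 0.03056 (1%) | DB: 2.42590 (83%) | 200 OK [http://www.website.com/shop/browse/womens?page=11]',
  'Completed in 0.20057 (4 reqs/sec) | Rendering: 0.10504 (52%) | DB: 0.09320 (46%) | 200 OK [http://www.website.com/humans/my_account/]',
  'Completed in 0.86887 (1 reqs/sec) | Rendering: 0.09286 (10%) | DB: 0.67580 (77%) | 200 OK [http://www.website.com/]'
]

slow_db_requests(sample).each do |url, total, db|
  puts format("%.2fs in DB of %.2fs total  %s", db, total, url)
end
```

Against a real log you would pass something like `File.readlines("log/production.log")` instead of `sample`; that path is the usual Rails location but is an assumption here.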

cartesian product - eh? we may have some brutally poor joins in our queries. I really appreciate everyone's insights and comments - super valuable information - at least gives us some places to get started.

Marc wrote:

> Phil - here is a little sampling from our log (I removed the url to our site for now)
>
> Processing HumansController#my_account (for 24.41.63.235 at 2008-02-21 18:42:37) [GET]
>   Session ID: 17fa2835e51d3e67a7ad1d8e031f3a4d
>   Parameters: {"action"=>"my_account", "controller"=>"humans"}
> Rendering within layouts/setup
> Rendering humans/my_account
> Completed in 2.92013 (0 reqs/sec) | Rendering: 0.03056 (1%) | DB: 2.42590 (83%) | 200 OK [http://www.website.com/shop/browse/womens?page=11]

More than 2s in DB?!?

In most web applications that need to scale, a DB query should take less than 0.01s.

So, 2 questions to answer:

1/ How many DB queries are you doing for this page?
2/ Are there some DB queries taking more than a few ms?

To check this, dump your db, reload it in your development environment, browse the pages while watching the development.log file, you'll get a listing of all SQL queries with the time spent in DB.

If you do too many queries, check if you can use :include in your ActiveRecord::Base.find calls to reduce the need to fetch associations with other queries later. Verify that you get a benefit doing so (I actually saw PostgreSQL being faster when fetching associations later on tricky queries but 99% of the time it should be a huge win).
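The effect of `:include` on query count can be sketched in plain Ruby. `FakeDb` and its methods are invented for this illustration and are not ActiveRecord APIs; the query that fetches the parent list itself is left out of the count on both sides.

```ruby
# Stand-in for a database that counts how many queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(children)
    @children = children   # rows: { :id, :parent_id }
    @query_count = 0
  end

  # One query for the children of a single parent.
  def children_of(parent_id)
    @query_count += 1
    @children.select { |c| c[:parent_id] == parent_id }
  end

  # One query for the children of many parents (WHERE parent_id IN (...)).
  def children_of_all(parent_ids)
    @query_count += 1
    wanted = @children.select { |c| parent_ids.include?(c[:parent_id]) }
    wanted.group_by { |c| c[:parent_id] }
  end
end

parent_ids = (1..50).to_a
children = (1..200).map { |i| { :id => i, :parent_id => (i % 50) + 1 } }

# Fetching associations later: one query per parent.
lazy = FakeDb.new(children)
lazy_result = parent_ids.map { |id| [id, lazy.children_of(id)] }

# Fetching associations up front: one batched query.
eager = FakeDb.new(children)
grouped = eager.children_of_all(parent_ids)
eager_result = parent_ids.map { |id| [id, grouped.fetch(id, [])] }

puts "per-parent: #{lazy.query_count} queries, batched: #{eager.query_count} query"
```

In the Rails of this thread's era the call shape would be along the lines of `SomeClass.find(:all, :include => :children)`, reusing the placeholder names from earlier in the thread. As noted above, measure before and after, since the batched form is not guaranteed to win on every query.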

If some queries are slow, depending on what you are used to, go into pgadmin or psql and "EXPLAIN query". Then you'll have to be a DBA :-) Adding indexes on the proper column or column combinations can make the difference between a page taking 3s and a page taking 0.05s to render...

You should know that PostgreSQL becomes faster each time they release a new minor version since around 7.0. Switching from 8.0 to 8.2 helped quite a bit here.

I'm not sure if ruby-postgres is OK with 8.3 (I remember there were problems being addressed recently). 8.2 is fine.

You definitely should use migrations to add/remove indexes. It's easier to keep your dev/test boxes in sync with production for performance testing this way (and you can document why an index is needed).

Lionel

Lionel - thank you so much for the great advice - we have serious query issues - as you can see below - I am reviewing the production log right now and we are consistently looking at 1.5 - 4 seconds in some cases for scripts to run

Friends- thank you again for all your help - we just migrated and we just re-indexed and saw a significant increase in speed - however - we will be taking all the comments here on this post to heart and putting together a scaling plan - thanks again to everyone for the quick responses and great insight!

This plugin may interest you:

http://agilewebdevelopment.com/plugins/sql_logging

IMO, best run in production mode with log level set to DEBUG.

Howdy Ezra -

> Can anyone recommend some good reading material on scaling a Rails
> app? we receive around 5k-7k visitors per day and are running
> postgres and rails with fastcgi - we have not implemented caching yet
> and are pondering moving to mongrel. We have thrown more hardware at
> our application and seemed to help a bit - but we are looking for the
> most optimal growth plan and would love any thoughts or advice or case
> studies anyone has had - thanks for your time in posting!

May I humbly suggest getting a copy of my book that was just finished yesterday? http://pragprog.com/titles/fr_deploy It covers taking a Rails app from infancy to maturity and covers all the topics of scaling out like apache/nginx/mongrel as well as Xen and mysql master -> slave and master <-> master.

Cheers-

- Ezra Zygmuntowicz

If you don't mind, Ezra, having just read through your performance tuning chapter, a few questions came to mind.

First you don't mention HAproxy - I think (?) you once recommended it. Do you now feel that nginx serves well on the front end?

We've deployed haproxy -> many nginx -> mongrel

Is the above configuration problematic?

I also noticed (perhaps related to the above) some conf settings that I don't share.

A. under events, you're using worker_connections 8192. Every other example (including my own) has it set to 1024.

I've looked around but haven't found a decent guide to setting this value.

B. use epoll

How does the event-driven poller (epoll) assist?

thanx for your thoughts Ezra. Congrats on the book - looking forward to seeing how you treated the Capistrano chapter. grin.

I have a few more questions about the book - namely the performance tuning chapter. Where's the best place to ask more?

Jodi