So it does not appear to be an IO problem. Just a busy box. I think your best course of action here is to get an additional box, move the fcgi's over to that box, and leave the current box just for the database. Also make sure your Postgres config (postgresql.conf) is *not* the default config, as the defaults are tuned for tiny amounts of memory.
With 2 CPU cores and a load average of 3.x you are not doing too badly. Basically a load average of 4.0 per CPU core is completely baked. So if your load average approaches 8.0 then you know your CPUs are completely maxed out.
I think you need another box to separate the db from the app server so you can get out of the big ball of mud setup and into a setup where you can tune the db and app servers separately.
Postgres - by far - then apache - to answer the other comment - slow
all the time - we have an international user group - but traffic is
def. slower at 1 am than at noon PST - however even when I am on the
site at night around midnight it is still super slow.
It really does sound like a query tuning issue. It comes in two forms. The
first is the prototypical indexing issue. For example, when navigating
across a relationship from a parent to its children, the child table
will have some_class_id, which needs to be indexed.
When you migrate and add tables Rails/postgres automatically generates
primary keys for id fields. It does not generate indexes on all the
foreign key fields. So the generated query:
SELECT * FROM <table-name> WHERE some_class_id = 37
will scan the entire table.
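To make the cost concrete, here is a rough in-memory analogy in plain Ruby. It is only a sketch: the rows are made up, and a hash is not how PostgreSQL actually stores or walks an index. It just shows the difference between touching every row and jumping straight to the matching ones. In a Rails migration the real fix would typically be a call like add_index :children, :some_class_id (the table name here is hypothetical).

```ruby
# Hypothetical child rows, each pointing at a parent via some_class_id.
rows = (1..10_000).map { |id| { id: id, some_class_id: id % 100 } }

# No index: every row is examined to answer WHERE some_class_id = 37.
examined = 0
scan_result = rows.select do |row|
  examined += 1
  row[:some_class_id] == 37
end

# Stand-in for an index: built once, maps some_class_id => matching rows.
index = rows.group_by { |row| row[:some_class_id] }
index_result = index.fetch(37, [])

puts "rows examined by the scan: #{examined}"
puts "rows returned: #{scan_result.size}"
puts "same rows via the lookup: #{scan_result == index_result}"
```

The scan looks at all 10,000 rows to return 100 of them, while the lookup goes directly to those 100.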
The second problem, which may or may not apply to you, is what is called
a "cross-product" query. For example, in a find method you join to
another table but don't add a restriction for the join columns. A cross
product query between two tables, each with 1000 rows, will generate an
intermediate result on the order of 1,000,000 records.
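The arithmetic above can be checked with a small plain-Ruby sketch. The two "tables" are hypothetical arrays of hashes, and Array#product stands in for a join with no restriction on the join columns; a real database will plan this differently, but the row counts are the point.

```ruby
# Two hypothetical tables of 1000 rows each.
parents  = (1..1000).map { |id| { id: id } }
children = (1..1000).map { |id| { id: id, parent_id: id } }

# Join with no restriction on the join columns: every possible pairing.
cross_product = parents.product(children)

# The same join, restricted on the join column.
joined = cross_product.select do |parent, child|
  child[:parent_id] == parent[:id]
end

puts "unrestricted join: #{cross_product.size} rows"
puts "restricted join:   #{joined.size} rows"
```

The unrestricted join produces 1,000,000 pairings where the restricted one produces 1000.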
How to find out whether you have a query problem:
1) You can isolate your queries, run them in a postgres console and see
how long they take, or run EXPLAIN on them to see the query plan.
2) Examine your log for places where the time spent in the database is
more than a fraction of a second.
It may take a little while to find since a page may generate a dozen
queries, only one of which is bad. It may not even be on every page.
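For step 2, something like the following plain-Ruby sketch could do the scanning for you. It assumes the "Processing ..." and "Completed in ... | DB: ..." line format that appears in the log sample later in this thread; the method name slow_db_requests and the 0.5s default threshold are made up for illustration, so adjust both to taste.

```ruby
# Returns [action, db_seconds] for each request whose DB time exceeds
# `threshold` seconds. Assumes the "Processing Controller#action" and
# "Completed in N (...) | ... | DB: N (...)" log format.
def slow_db_requests(log_text, threshold = 0.5)
  # Collapse whitespace so entries wrapped across lines still match.
  text = log_text.gsub(/\s+/, " ")
  slow = []
  action = nil
  pattern = /Processing (\S+)|Completed in [\d.]+[^\[\]]*?DB: ([\d.]+)/
  text.scan(pattern) do |name, db|
    if name
      action = name                      # remember which action we are in
    elsif db.to_f > threshold
      slow << [action, db.to_f]          # DB time for that action
    end
  end
  slow
end

sample = <<~LOG
  Processing HumansController#my_account (for 24.41.63.235 at 2008-02-21
  18:42:37) [GET]
  Completed in 2.92013 (0 reqs/sec) | Rendering: 0.03056 (1%) | DB:
  2.42590 (83%) | 200 OK [http://www.website.com/shop/browse/womens?
  page=11]
LOG

slow_db_requests(sample).each do |action, db|
  puts "#{action}: #{db}s in DB"
end
```

On a real log you would pass in File.read of your production.log instead of the sample string. This only tells you which actions are slow; you still have to find the offending query within that action.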
cartesian product - eh? we may have some brutally poor joins in our
queries. I really appreciate everyone's insights and comments - super
valuable information - at least gives us some places to get started.
Phil - here is a little sampling from our log (I removed the URL to
our site for now)
Processing HumansController#my_account (for 24.41.63.235 at 2008-02-21
18:42:37) [GET]
Session ID: 17fa2835e51d3e67a7ad1d8e031f3a4d
Parameters: {"action"=>"my_account", "controller"=>"humans"}
Rendering within layouts/setup
Rendering humans/my_account
Completed in 2.92013 (0 reqs/sec) | Rendering: 0.03056 (1%) | DB:
2.42590 (83%) | 200 OK [http://www.website.com/shop/browse/womens?
page=11]
more than 2s in DB ?!?
In most web applications that ever need to scale, a DB query should take less than 0.01s.
So 2 questions to answer :
1/ How many DB queries are you doing for this page?
2/ Are there some DB queries taking more than a few ms?
To check this, dump your db, reload it in your development environment, and browse the pages while watching the development.log file. You'll get a listing of all SQL queries with the time spent in the DB.
If you do too many queries, check if you can use :include in your ActiveRecord::Base.find calls to reduce the need to fetch associations with other queries later. Verify that you get a benefit doing so (I actually saw PostgreSQL being faster when fetching associations later on tricky queries but 99% of the time it should be a huge win).
If some queries are slow, depending on what you are used to, go into pgadmin or psql and "EXPLAIN query". Then you'll have to be a DBA - Adding indexes on the proper column or column combinations can make the difference between a page taking 3s and a page taking 0.05s to render...
You should know that PostgreSQL has become faster with each new release since around 7.0. Switching from 8.0 to 8.2 helped quite a bit here.
I'm not sure if ruby-postgres is OK with 8.3 (I remember there were problems being addressed recently). 8.2 is fine.
You definitely should use migrations to add/remove indexes. It's easier to keep your dev/test boxes in sync with production for performance testing this way (and you can document why an index is needed).
Lionel - thank you so much for the great advice - we have serious
query issues - as you can see below - i am reviewing the production
log right now and we are consistently looking at 1.5 - 4 seconds in
some cases for scripts to run
Friends- thank you again for all your help - we just migrated and we
just re-indexed and saw a significant increase in speed - however - we
will be taking all the comments here on this post to heart and putting
together a scaling plan - thanks again to everyone for the quick
responses and great insight!
> Can anyone recommend some good reading material on scaling a Rails
> app? we receive around 5k-7k visitors per day and are running
> postgres and rails with fastcgi - we have not implemented caching yet
> and are pondering moving to mongrel. We have thrown more hardware at
> our application and seemed to help a bit - but we are looking for the
> most optimal growth plan and would love any thoughts or advice or case
> studies anyone has had - thanks for your time in posting!
May I humbly suggest getting a copy of my book that was just finished
yesterday? http://pragprog.com/titles/fr_deploy It covers taking a
rails app from infancy to maturity and covers all the topics of
scaling out like apache/nginx/mongrel as well as Xen and mysql
master -> slave and master <-> master.
Cheers-
- Ezra Zygmuntowicz
If you don't mind Ezra, having just read through your performance
tuning chapter, a few questions came to mind.
First, you don't mention HAProxy - I think (?) you once recommended
it. Do you now feel that nginx serves well on the front end?
We've deployed haproxy -> many nginx -> mongrel
Is the above configuration problematic?
I also noticed (perhaps related to the above) some conf settings that
I don't share.
A. under events, you're using worker_connections 8192. Every other
example (including my own) has it set to 1024.
I've looked around but haven't found a decent guide to setting this
value.
B. use epoll
How does the event-driven poller (epoll) assist?
thanx for your thoughts Ezra. Congrats on the book - looking forward
to seeing how you treated the Capistrano chapter. grin.
I have a few more questions about the book - namely the performance
tuning chapter. Where's the best place to ask more?