Rails Camp Scaling Session notes

Bill Lipa wrote:

Here are some notes from the scalability session of last week's Rails camp. They were entered by another session participant and are posted at:
http://www.rubyonrailscamp.com/10%3A15%2Bsession%2B-%2Bscaling

The key points from my point of view:
- the Ruby VM is sketchy, rather like the Java VM around 1997
- the single-threaded nature of Rails dispatch handling means we may incur a big memory/hardware hit, for example if pages depend on remote services with varying response times.

Nonetheless Rails is still attractive because of its elegance and expressiveness. But keep your eyes open.

----

What scale of applications run on the Engine Yard site?

    * terminology: a ‘slice’ is a virtual Xen server – around 30 req/second
    * theballot.org – Ran on 2 slices. 1Gb/second of traffic. They ran on a $20/month host before then.
    * kongregate – Flash game distribution site. 3 slices now. Deploying several times a day. They plan for 67 boxes by next summer.
    * They have made scaling easy, to levels equivalent to basecamp.
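The "around 30 req/second per slice" figure above lends itself to a quick back-of-the-envelope estimate. The sketch below is only that: the method name and the idea of dividing peak load by a flat per-slice rate are assumptions for illustration, not Engine Yard's sizing method.

```ruby
# Rough capacity estimate, assuming every slice sustains the same
# request rate (the notes quote roughly 30 req/second per slice).
# slices_needed is a hypothetical helper, not an Engine Yard tool.
REQUESTS_PER_SLICE = 30

def slices_needed(peak_requests_per_second, per_slice = REQUESTS_PER_SLICE)
  # Round up: a fraction of a slice still means one more slice.
  (peak_requests_per_second / per_slice.to_f).ceil
end

puts slices_needed(100)
```

Real throughput depends heavily on the application, so a number like this is a starting point for a conversation rather than a plan.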

Scaling solutions of theirs

   1. Start with 2 Load balancers
   2. Slices don’t even have disks; they mount root from an external FS via GFS. Each slice gets 5 Mongrel instances. This stuff runs nginx.
   3. Each ‘slice’ machine stores a DB instance. There is a Rails plugin for managing writes/reads.
   4. Use AoE (ATA over Ethernet) RAID for disk store.
   5. Likely bottleneck is slices, not file system. Single cluster would be 16-24 machines (which is a big web site)
   6. On sudden spike when hosted with them, in an hour they can add slices.
   7. For us, what we build now… we don’t need to do anything special to be hosted by them. It’ll generally migrate easily.
   8. Capistrano is used by them for deployment. It helps a lot.
   9. The number 1 performance issue that they see is the N+1 problem: poorly structured SQL.
  10. attr_accessible and attr_protected are IMPORTANT
  11. Memory usage is an issue on servers. Each Mongrel process is at least 40 MB. Some extreme cases are above 140 MB. Memory is cheap. Processor usage has not been a factor. All boxes are dual-processor quad-core AMDs and they are sleeping.
  12. Don’t worry about it until it is becoming a problem! Don’t preoptimize.
  13. Penny Arcade is a Rails site and it is huge.
  14. The amount of silicon used for Rails is 30% to 5x more than what is typically used for other platforms – but so what?
  15. Statement: In the end DB limits you, not the application.
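To make the N+1 problem from item 9 concrete, here is a minimal sketch in plain Ruby. FakeDb and its methods are hypothetical stand-ins that only count queries; no real database or ActiveRecord is involved. The first method issues one query for the list plus one per row, the second batches the lookups.

```ruby
# Hypothetical in-memory "database" that counts how many queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize
    @posts = [
      { id: 1, author_id: 10 },
      { id: 2, author_id: 11 },
      { id: 3, author_id: 10 }
    ]
    @authors = { 10 => "Ann", 11 => "Bob" }
    @query_count = 0
  end

  # One query returning every post.
  def all_posts
    @query_count += 1
    @posts
  end

  # One query per author lookup.
  def author(id)
    @query_count += 1
    @authors[id]
  end

  # One query fetching several authors at once.
  def authors(ids)
    @query_count += 1
    @authors.select { |id, _name| ids.include?(id) }
  end
end

# N+1 shape: 1 query for the posts, then 1 more for each post.
def author_names_n_plus_one(db)
  db.all_posts.map { |post| db.author(post[:author_id]) }
end

# Batched shape: 1 query for the posts, 1 for all the authors.
def author_names_batched(db)
  posts = db.all_posts
  authors = db.authors(posts.map { |post| post[:author_id] }.uniq)
  posts.map { |post| authors[post[:author_id]] }
end
```

With three posts the first version costs four queries and the second costs two; the gap grows linearly with the number of rows. In a Rails application the batched form is usually obtained through eager loading of associations rather than hand-written code like this.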

Lack of multithreading is raised as a question
Case study:

   1. Java vs Ruby – Say, 1000 simultaneous requests
          * Mongrel can multithread but can back up on slow request dispatch.
          * When a request has to wait on slow work, BackgrounDRb is used. This releases the lock on the worker. Also look at ‘merb’ (Mongrel plus Erb). Its first use is image upload.
          * In a typical Rails environment, image upload locks the process.
          * Worst case – 100 MB Mongrel processes, 1000 threads simultaneously. That’s 100 GB; at 16 GB per machine that makes for 7 or 8 machines. Not a big deal.
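The worst-case arithmetic above can be checked with a few lines of Ruby. This is a sketch under the notes' own assumptions: one Mongrel process per simultaneous request, decimal units (1 GB = 1000 MB), and no memory reserved for the OS or anything else. The method name is made up for illustration.

```ruby
# Machines required to hold a given number of Mongrel processes in RAM.
# machines_needed is a hypothetical helper; it ignores OS overhead.
def machines_needed(process_mb:, processes:, machine_gb:)
  total_mb = process_mb * processes
  # Round up, since a partly filled machine is still a machine.
  (total_mb / (machine_gb * 1000.0)).ceil
end

puts machines_needed(process_mb: 100, processes: 1000, machine_gb: 16)
```

Dividing exactly gives 6.25, so 7 machines is the minimum under these assumptions; any higher figure amounts to headroom. With the more typical 40 MB process size mentioned earlier, the same 1000 processes fit on 3 machines.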

Array implementation and rails calls

    * Supposedly each rails call creates 60000(!) arrays.
    * There is a patch to make Array implementation quicker – but it is not accepted yet.

The problem with Ruby: it is some guy’s hobby

    * At RubyConf, Matz’s talk was underwhelming. Development is way slow.
    * Rubinius – The interpreter would be written in Ruby and compiled to C. Apparently good performance gains have been seen.

Corporate support, etc.

    * IBM hosting this
    * Sun is doing JRuby
    * See recent post on digg – php eats rails for lunch? Presumably this post: http://ohloh.net/wiki/articles/php_eats_rails

Hiring

    * Hiring is about to go dot.com stupid – anybody who breathes is almost good enough.
    * Hard to find good programmers who know Rails and Ruby
    * Good interview question for them: Have you ever implemented a binary level protocol?

Bill - thank you for those notes; they made very interesting reading.
What Ezra and Tom are doing at EngineYard is truly impressive.

Keep an eye on Charles Nutter's (JRuby) blog, http://headius.blogspot.com/

JRuby is slower than CRuby at present, but it is catching up, and has the potential to run Rails without the N times memory overhead for N Rails processes.
http://headius.blogspot.com/2006/11/advanced-rails-deployment-with-jruby.html

I've read that Litespeed tries to achieve something similar, by loading Ruby and Rails before forking additional server processes.
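The general idea of loading once and then forking can be sketched in plain Ruby. This is not how Litespeed is implemented; PRELOADED_ROUTES and spawn_workers are hypothetical names, and the large array merely stands in for a loaded framework. It needs a platform where Kernel#fork is available (Unix-like systems).

```ruby
# Stand-in for the expensive part: pretend this is Ruby plus Rails,
# loaded exactly once in the parent process.
PRELOADED_ROUTES = Array.new(100_000) { |i| "route_#{i}" }.freeze

# Fork `count` workers after the preload. Each child reports its pid and
# the size of the data it inherited, without loading anything itself.
def spawn_workers(count)
  workers = Array.new(count) do
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.puts "#{Process.pid}:#{PRELOADED_ROUTES.size}"
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close
    [pid, reader]
  end

  workers.map do |pid, reader|
    report = reader.read.strip
    reader.close
    Process.wait(pid)
    report
  end
end

spawn_workers(3).each { |report| puts report }
```

Each child sees the preloaded data immediately because the operating system shares the parent's pages with it until they are written to. How much memory that actually saves depends on how the Ruby implementation's garbage collector touches those pages.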

regards

   Justin Forder

Thanks!

<blush>