Rails suitable for highly scalable apps?

Hello. I'm new to Ruby & Rails, though a veteran at engineering large-
scale distributed systems.

I have a new project which requires a REST API and simple web UI and
after reading (superficially) about RoR on and off over recent years,
I thought it was time I took it for a spin for a new project. It is a
dream 'ground-up' project with no legacy requirements.

However, I've hit a speed-bump and I'm unsure if it is a limitation of
Rails or just my lack of in-depth understanding of the code base.

When using the generator to generate a new model class, Rails chooses
an auto-increment int id as the primary key by default(!). This is
obviously pretty poor form for numerous reasons, such as:
1. Completely at odds with scalability and distributed implementation
of a DB since it introduces an unnecessary need for centralization
2. Depending on the DB engine, you might run out of primary keys as
soon as you hit 2^32 rows
3. A security vulnerability waiting to happen - unless you pay close
attention, it would be easy to expose ids to the public in a multi-
user environment for which guess-ability of some resource ids is bad
practice

By 1, I'm talking about indefinitely scalable distributed
implementations (since the term 'scalability' is used to mean a wide
variety of things, from vertically scaling a web app performance by
adding memory to a server to horizontally scaling with limits where
adding resources, such as servers, eventually becomes a case of
diminishing returns).
An easy way to check if your architecture is fully distributed and
performance of operations is independent of data size etc., is to do a
quick thought experiment where you imagine having a ridiculous amount
of data, users etc. For example, would the performance for a user be
significantly affected if your database was so large it needed to be
spread over a trillion servers? If the answer is yes, then your
architecture is not indefinitely scalable as there is some
centralization introducing a dependency between performance and data
set size, or user-base size or whatever.

So, if you had a trillion DB servers, auto_increment could never work
because to determine which is the next id would require querying them
all to figure out what the largest existing id is (or, alternatively,
keeping the 'next id' stored in a central place - which will be a
performance bottleneck when a trillion servers have to hit it up for
every insert).
(for the purists, notice I said "significantly" above. For example,
consider the design of the DNS system and imagine if records had no
TTL - living on indefinitely. The load on the root servers would be
vanishingly small and it would hardly matter if they were out of
service for short periods).
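
To make the contrast concrete, here is a minimal sketch of coordination-free id generation using Ruby's standard `securerandom` library. The helper name `new_item_id` and the "three nodes" simulation are made up for illustration; this is not a Rails API and says nothing about how a model would be wired up.

```ruby
require 'securerandom'

# Each node mints its own ids locally: no shared counter, no network call.
# SecureRandom.uuid returns a random (version 4) UUID as a 36-character string.
def new_item_id
  SecureRandom.uuid
end

# Simulate three independent nodes, each inserting 1000 rows.
ids = 3.times.flat_map { 1000.times.map { new_item_id } }

puts ids.first
puts "all unique: #{ids.uniq.size == ids.size}"
```

Uniqueness here is probabilistic rather than guaranteed, but no node ever needs to know what ids any other node has handed out.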

Obviously, nobody has a trillion servers, but engineering systems to
be highly scalable isn't hard and is good practice anyway (in case
your client's service becomes the next Facebook, in which case you
won't have to touch anything: just spool up more and more cloud
servers and sit back, rather than watch as their business fails due to
users leaving a sinking ship of slow or failed page-loads).

Now, I've surfed around the web for information about how to use
custom ids or other primary key columns in Rails, but have only found
confusion (ignoring people who ask why and/or say not to do it).
Examples given seem to differ (perhaps due to changes before Rails 3?)
and I can't get any of the ideas to work.

For example, supposing I wish to use UUIDs for primary keys. I've
tried variations on:

class CreateItems < ActiveRecord::Migration
  def self.up
    create_table :items, {:id => false} do |t|
      t.string :id, :null => false, :limit => 36, :primary => true

      t.timestamps
    end

  end

  def self.down
    drop_table :items
  end
end

However, the :primary doesn't seem to work (perhaps is invalid) and
the table generated doesn't have a primary key. I can use add_index
to add a :unique index, but it isn't primary. Obviously, I'll need
some hooks to generate the UUIDs - I've delved into that part.

So, can Rails really handle this in a clean way and have scaffolding
work etc? How? Can someone kindly clue me into what I need in the
migration, model class and anywhere else? I'd prefer to avoid DB-
specific SQL execution (while I'm testing this on MySQL, that
obviously isn't a distributed scalable technology so I'll be using a
distributed store ultimately).
I'd also like some tables to have natural (domain specific) primary
key values, a related though perhaps separate issue (and less
critical).

I've achieved similar on another project using Grails by writing a JPA
implementation. I'm really hoping Rails can do this without having
the source hacked.

Any help or pointers are greatly appreciated.

Cheers,
-David.

So, if you had a trillion DB servers, auto_increment could never work
because to determine which is the next id would require querying them
all to figure out what the largest existing id is (or, alternatively,
keeping the 'next id' stored in a central place - which will be a
performance bottleneck when a trillion servers have to hit it up for
every insert).

I think master-master MySQL setups do things slightly differently: you
can set things up so that with (for example) 3 servers, one of them
has auto-increment keys that look like 3n, the next one 3n+1, and the
third 3n+2, so a given server only needs to track the auto-increment
value it last assigned. No idea how far this scales; even with 64-bit
ids you wouldn't have much room with 10^12 servers.
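
A rough sketch of that interleaving idea in plain Ruby follows. The class name `InterleavedIdAllocator` is hypothetical; in MySQL itself I believe the relevant settings are `auto_increment_increment` and `auto_increment_offset`, whose exact starting values differ slightly from this toy version. The last lines work out how much room a 64-bit id space would leave per server at 10^12 servers.

```ruby
# Hypothetical allocator: each server hands out ids of the form step * n + offset,
# so servers with different offsets never collide and never talk to each other.
class InterleavedIdAllocator
  def initialize(step:, offset:)
    @step = step
    @offset = offset
    @n = 0
  end

  # Only local state (@n) is consulted to produce the next id.
  def next_id
    @n += 1
    @step * @n + @offset
  end
end

servers = 3.times.map { |offset| InterleavedIdAllocator.new(step: 3, offset: offset) }
ids = servers.flat_map { |server| 4.times.map { server.next_id } }
puts ids.sort.inspect

# Ids available to each server if an unsigned 64-bit space is split 10**12 ways.
ids_per_server = 2**64 / 10**12
puts ids_per_server
```

So each server would get on the order of 18 million ids, which is what "not much room" amounts to.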

However, the :primary doesn't seem to work (perhaps is invalid) and
the table generated doesn't have a primary key. I can use add_index
to add a :unique index, but it isn't primary. Obviously, I'll need
some hooks to generate the UUIDs - I've delved into that part.

:primary_key is hardwired to be an integer on mysql, and I believe
on postgres and other dbs too. If you want a primary key of a
different type I think you'll need to add the column as whatever data
type you want and then run a lump of sql to mark it as the primary key.
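
Roughly, the "lump of sql" could look like the sketch below. The helper `add_primary_key_sql` is a made-up name, not part of Rails; the statement it builds is the usual `ALTER TABLE ... ADD PRIMARY KEY` form, which I'd expect MySQL and PostgreSQL to accept but have not checked against other databases. The commented-out migration shows where it might be called, assuming the table from the original post.

```ruby
# Builds the SQL that marks an existing column as the table's primary key.
# Identifiers are restricted to simple names because they are interpolated
# directly into the statement.
def add_primary_key_sql(table, column)
  [table, column].each do |name|
    unless name.to_s =~ /\A[a-z_][a-z0-9_]*\z/i
      raise ArgumentError, "unsafe identifier: #{name}"
    end
  end
  "ALTER TABLE #{table} ADD PRIMARY KEY (#{column})"
end

puts add_primary_key_sql(:items, :id)

# Possible use inside the migration (untested sketch):
#
#   def self.up
#     create_table :items, :id => false do |t|
#       t.string :id, :null => false, :limit => 36
#       t.timestamps
#     end
#     execute add_primary_key_sql(:items, :id)
#   end
```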

So, can Rails really handle this in a clean way and have scaffolding
work etc? How? Can someone kindly clue me into what I need in the
migration, model class and anywhere else? I'd prefer to avoid DB-
specific SQL execution (while I'm testing this on MySQL, that
obviously isn't a distributed scalable technology so I'll be using a
distributed store ultimately).

If that is your end goal you might not want to spend too much time
with activerecord since it only really does SQLish things (ie not
mongodb, couchdb etc.)

Fred

[…]
:primary_key is hardwired to be an integer on mysql, and I believe
on postgres and other dbs too. If you want a primary key of a
different type I think you’ll need to add the column as whatever data
type you want and then run a lump of sql to mark it as the primary key

Nasty. I was hoping that wasn’t the case. I had a quick look at the source for rails and the mysql2 gem and saw code that looked like the ‘primary_key’ type was being hard-coded to ‘INT(11) NOT NULL AUTO_INCREMENT’.

If that is your end goal you might not want to spend too much time
with activerecord since it only really does SQLish things (ie not
mongodb, couchdb etc.)

If I understand correctly, without using ActiveRecord, the scaffolding won’t work either. I’m guessing there are other parts of Rails that depend on AR too.

I guess the answer to my subject question is “no” - Rails isn’t suitable for modern scalable web apps. Sadly.

It baffles me why a relatively young project would cripple itself by making use of legacy architecture almost mandatory.

btw, there is nothing in the SQL standard that makes RDBMSs inherently unscalable; it is just that the only RDBMS implementations currently publicly available are not scalable (in the indefinite sense), though that is about to change. I’ve used NoSQL stores before, but I prefer not to give up SQL unless necessary (and it isn’t). The only issue with SQL is that there is no standardization on how to handle domain-level conflict resolution (which is a given in a distributed system because inter-node communication can never be infinitely fast; even if networking technology advances, Einstein tells us that much).

My next plan is to spend a little effort creating some custom code to attempt to use UUIDs as primary keys and, if that proves to be too much work, I’ll likely use Grails instead.

Appreciate your reply.

Cheers.

...
If I understand correctly, without using ActiveRecord, the scaffolding won't
work either. I'm guessing there are other parts of Rails that depend on AR
too.

I am surprised that whether scaffolding works for you or not is
relevant. It is certainly not appropriate for the sort of app you are
describing.

...

My next plan is to spend a little effort creating some custom code to
attempt to use UUIDs as primary keys and, if that proves to be too much work,
I'll likely use Grails instead.

This might be helpful
http://amthekkel.blogspot.com/2009/02/ruby-on-rails-how-to-use-guid-for-use.html

Google for
rails guid primary key
and
rails legacy database
for more suggestions

Colin

Scaffolding is a tool for demos and super rapid prototyping, it is not
intended to be used for regular code. It is certainly not relevant to
your concerns.

ActiveModel does a good job of smoothing that sort of stuff over (but
I of course agree that if you are scaling to a trillion servers it
won't be using rails scaffolds)

Fred


If I understand correctly, without using ActiveRecord, the scaffolding won’t
work either. I’m guessing there are other parts of Rails that depend on AR
too.
I am surprised that whether scaffolding works for you or not is
relevant. It is certainly not appropriate for the sort of app you are
describing.

It isn’t, but being new to Rails I’m uncertain what other Rails functionality uses or assumes AR. That is, if I don’t use AR, what will the impact be and what will be left that Rails is actually bringing to the table?

Of course, right now, I am in the ‘prototyping’ phase, so scaffolding would have been ‘nice’ is all.

This might be helpful
http://amthekkel.blogspot.com/2009/02/ruby-on-rails-how-to-use-guid-for-use.html

Thanks, I saw that one (and the other copies of it) and have spent a day googling around already.

Cheers.

Just to ask the obvious question, do you know whether this app will need to scale yet? There is a thin line between best practices and premature optimization.

Generally if I’m scaling to a bazillion servers, I’m gonna be using a functional programming language with one or more NoSQL data stores and an asynchronous messaging architecture with eventual consistency. If someone is asking me to build a first cut for a startup, I’m not gonna try to do that using Scala, Lift and Mongo. I’m gonna build it quickly in Rails and then I’ll refactor performance-critical subsystems, initially through caching and eventually through rewriting in more appropriate stacks for scaling if for any reason Rails isn’t taking me where I need to be.

I’m a big fan of Grails, but I generally build most of my web apps in Rails unless I’m working with a Java shop (Groovy is less of a conceptual leap than JRuby) or need really tight integration with Spring, Hibernate or something else quintessentially Java. I find I can usually build something quicker in Rails.

So I’d start by just confirming that you really have a scale problem. If you’re rewriting an existing app that already has substantial load, it makes perfect sense to be focusing on this now. If it’s a start up venture (whether within an existing business or not), I’d focus on failing and iterating quickly. If your problem ends up being that the stack you started with isn’t scaling the way you want that is a really high quality (and unfortunately a really rare) problem to have.

Best Wishes,

Peter

How many servers does Twitter have? I'm just curious how many
applications - in the real world - need scaling to the level discussed
here. I'm new to RoR and have spent my decades in computers building
much smaller scaled implementations. i.e. Corporate apps. So, I don't
really have a handle on this.

However, I'm curious as to how big of an issue this really is. If
Twitter runs on RoR (and I'm told it does, but don't know any
details), that would seem to be a very large implementation. Are they
running into limitations? What systems are bigger than Twitter and how
much? Does anyone have any real data?

Thanks,
Clyde

Twitter had huge scaling problems. While I am a big fan of Rails, and while it is wrong to suggest Rails cannot scale, if you really hit Twitter scale you are not going to want to use a general-purpose web framework with a SQL data store. However, almost none of us hit that scale, which is why I build sites in Rails and am open to re-architecting if lightning happens to strike.

Best wishes,
Peter

No, obviously I don’t know. However, I’ve watched as businesses have gone under precisely because they didn’t architect for scale at the outset. There is no reason not to do it, as it really is little more effort than not doing so.

Consider this scenario which I watched play out: A site had been implemented using a standard LAMP stack (Linux, Apache, MySQL, PHP) with master-slave replication. It had been running along for a reasonable time collecting customers (around 7 months) when the exponential adoption started to hit. It was driven by a few factors, including media buzz and unhappiness with a policy change of a competing site which saw users switching over en masse. Within about a week the number of registrations was just over 1000x what it was the week or so prior. Their reaction was to throw more web instances behind the load-balancer and spool up significantly more MySQL slave instances.

Unfortunately, another week passed and the site was too slow to be usable for customers - page timeouts and just very long load-times ensued while the devs tried to retrofit the code to handle the ‘eventual consistency’ that results when you have many slaves replicating from a master, add a caching layer, implement application-level sharding of key tables etc. Well, there was a user backlash about the poor performance and the media picked that up also. Customer support was so overwhelmed they couldn’t even reply to most help tickets. By the third week a new competitor had launched a new site that was snappy (easy when you have little traffic). However, the same customers were signing up with the new competitor in droves and however they architected their site (it was run on Amazon EC2) it withstood the test and by the end of the month there were almost no active users of the site left. After the brand had been ruined by bad press, there were few new registrations and not enough revenue to cover the costs, the owners just closed up shop and the business was history.

I’ve seen something similar happen on two other occasions also. It is easy to say in hindsight that ‘they should have done this or that’, but they just couldn’t react quickly enough.

So, if you think you can take a system with tens of thousands of active users and loads of existing data and re-architect it for scalability and migrate a large database to a different technology within a few days, good luck to you. I don’t want to even try it.

I agree that, not long ago, the best options were non-SQL stores, but that is changing. While there are not yet any inherently scalable SQL technologies on the market, a few are getting ready to launch (and already have limited availability for pilots, beta tests etc).

I’m going to stick with the minor additional effort of just architecting for scale at the outset and then if a situation like that strikes when I’m on vacation, I’ll stay on vacation ;-)

Cheers.

No, obviously I don't *know*. However, I've watched as businesses have gone under precisely because they didn't architect for scale at the outset. There is no reason not to do it, as it really is little more effort than not doing so.

I've built apps for scale and I can say this is definitively *not* *true*. It's not just a matter of using UUIDs instead of doubles or ints for your primary key and a few other tweaks. Firstly, what you need to change depends on the exact load characteristics. Often a number of levels of caching are a piece of the puzzle, but a message-based asynchronous approach to mutable state in the app is often required. For example, I often use an event-sourcing style approach where any entity state in the db is simply a cache of a query of all of the events that have happened to that entity over time. It's a great approach for scaling writes effectively, but it is definitely extra effort to implement, as none of the mainstream web frameworks think in terms of events and optimizing for immutability. It's somewhat easier where you're scaling for reads than for writes, but there are a lot of usage-specific questions that drive the best scaling strategies, and IMO unless you know out of the box that you *will* have huge load, it's pure waste in the lean software development sense of the term.

Consider this scenario which I watched play out: <snip of scaling nightmare story> I've seen something similar happen on two other occasions also. It is easy to say in hindsight that 'they should have done this or that', but they just couldn't react quickly enough.

And what *percentage* of the projects you've ever seen have had this issue? By definition, the number of sites that will be in the (say) top 10,000 for traffic is very small compared to the number of sites that are built.

Best Wishes,
Peter

No, obviously I don’t know. However, I’ve watched as businesses have gone under precisely because they didn’t architect for scale at the outset. There is no reason not to do it, as it really is little more effort than not doing so.

I’ve built apps for scale and I can say this is definitively not true.

I guess I should have qualified that to say that it is little more effort if you have the tools available. Having built scalable services several times, I have the experience and (custom) tools available. You are right that starting from traditional/legacy systems, such as existing RDBMSs and most frameworks out of the box, would be a significant effort until modern tools come to market and mature.

[…] By definition, the number of sites that will be in the (say) top 10,000 for traffic is very small compared to the number of sites that are built.

True, but irrelevant to the owners of those sites whose businesses fail as a result, regardless of how improbable it seemed beforehand. If it happened to a customer whose site I architected, I’d hardly feel good explaining, after their business was bankrupted, that I didn’t bother building it for scale as it didn’t seem very likely to need it, since only a small percentage do.

Anyway, most of our customers have high scalability as a requirement, so regardless of whether they’re dreaming, that is what they get.

My hope in looking at Rails for this new project (which isn’t critical and hence one where I can take the risk of experimenting with a new technology) was that, being relatively new, it might be less work to incorporate the features required for scalability. Unfortunately, it isn’t looking that way.

Cheers.

If the big focus is scalability I’d look at clojure, scala or erlang.

I think i understand David's point: "scalable" is true or false.
Could be nice if scalability was planned for from the beginning.
I do not know anything about it, but i hope that the issue discussed
here is only about Rails, i hope that Ruby can be used for scalable
applications.

Alexey.

I think i understand David's point: "scalable" is true or false.

I'm not sure that is David's point at all. It sounds like he has experience with building scalable applications and notices that some best practices for scalability, like UUID primary keys, are not the default way in Rails. It seems to me that he is more than sophisticated enough to realize that scalability isn't a binary choice.

Could be nice if scalability was planned for from the beginning.

I think there are some architectural defaults that could be changed which would help, but scalability is so large, complex and specific to a given use case I don't think there's any way to "just build scalability in".

I do not know anything about it, but i hope that the issue discussed
here is only about Rails, i hope that Ruby can be used for scalable
applications.

Rails has some conventions which are not optimal for scaling certain types of applications. All languages can be used for scalable applications - although you need to look at the performance characteristics at runtime for a given application. The main issue for scaling with Ruby is that it's an OO rather than functional language which raises fundamental issues in managing mutable state. You can work around that by writing Ruby in a more functional style and using architectural patterns that don't depend on such shared state.

I think a way more interesting question is whether Rails is productive than whether it is scalable. I would much rather build something quickly in a productive framework and then revisit parts of the app if scale became a concern than spend way more to build a scalable app that nobody ends up using. I use Rails for its productivity and am enjoying it more with each project I deliver.

Best Wishes,
Peter

rails has become a lot better over the years thanks to the core devs,
contributors and the community for pushing it that way. it definitely
depends on how you build your application.

there are a couple of things i can recommend and i am following them in
practice.

- optimize your queries like hell.

- design your database well and so you can avoid joins as much as
possible. for example, you can have posts and comments. in order to
display comment count of blog post in your blog home page, instead of
doing the following query "select count(*) as count from comments
where post_id = x" you can add a column in posts table called as
"comment_count" and show them there.

- use faster solutions wherever you can.

- use solr or sphinx for full text searching.

- make sure your views are lean and do not contain more ruby code than
necessary. remember, the rendering takes time as well.

- cache whatever you can.
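
As a rough illustration of the comment_count suggestion above, here is a plain-Ruby sketch (the `Post` class and `Comment` struct are stand-ins for the two tables, not ActiveRecord models). In Rails itself, the `counter_cache` option on `belongs_to` maintains a column like this for you, by convention named `comments_count`.

```ruby
# Plain-Ruby stand-ins for the posts and comments tables.
Comment = Struct.new(:post_id, :body)

class Post
  attr_reader :id, :comment_count

  def initialize(id)
    @id = id
    @comment_count = 0 # the denormalised "comment_count" column
  end

  # Writing a comment also bumps the stored count, so the home page
  # can read the number straight off the post without a COUNT(*) query.
  def add_comment(comments_table, body)
    comments_table << Comment.new(id, body)
    @comment_count += 1
  end
end

comments = []
post = Post.new(1)
post.add_comment(comments, "first")
post.add_comment(comments, "second")

puts post.comment_count                          # stored counter
puts comments.count { |c| c.post_id == post.id } # what the COUNT(*) query would compute
```

The trade-off is one extra write per comment in exchange for a cheaper read on every page view.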

good luck.

rails has become a lot better over the years thanks to the core devs,
contributors and the community for pushing it that way. it definitely
depends on how you build your application.
there are a couple of things i can recommend and i am following them in
practice.

I think all of those are really good points, but they relate to incremental scaling up, not scaling out. They will allow you to get more performance from a single server. For substantial scale, it’s more a case of architecting so that you can throw multiple servers at the problem.

I’ve never tried to build out a Rails app with ten front end servers speaking to a cluster of back end (SQL) db servers, but I’m guessing there would be problems with database contention and you’d start to have to take a lot more interest in failed saves, especially if you need to scale a write-heavy application. That’s where fundamentally different architectural approaches, designed from the get-go for eventual consistency and asynchronous messaging, become really important (assuming you don’t need immediate consistency for most of your app).

Personally I quite like the event sourcing model (http://martinfowler.com/eaaDev/EventSourcing.html) as it effectively gets rid of mutable state and makes any database values for entities a cache rather than an authoritative source. It’s a different way of writing things, but it makes scaling out trivially simple. If I get some time I may have a play to see the best way of providing an easy implementation of this in the Ruby world. I see something here (https://github.com/cavalle/banksimplistic#readme) but haven’t had a chance to play with it.
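
To give a flavour of the idea, here is a minimal plain-Ruby sketch of event sourcing. It is my own toy example (the `Event` struct, `EventStore` class and `balance` function are hypothetical names), not taken from the linked article or from banksimplistic, and it ignores persistence and concurrency entirely.

```ruby
# Events are immutable facts; they are only ever appended, never updated.
Event = Struct.new(:account_id, :type, :amount)

class EventStore
  def initialize
    @events = []
  end

  # Appends a frozen event and returns the store so calls can be chained.
  def append(event)
    @events << event.freeze
    self
  end

  def for_account(account_id)
    @events.select { |e| e.account_id == account_id }
  end
end

# Current state is derived by replaying events; a balance column in a
# database would only be a cache of this computation.
def balance(events)
  events.reduce(0) do |sum, e|
    case e.type
    when :deposited then sum + e.amount
    when :withdrawn then sum - e.amount
    else sum
    end
  end
end

store = EventStore.new
store.append(Event.new("acct-1", :deposited, 100))
     .append(Event.new("acct-1", :withdrawn, 30))
     .append(Event.new("acct-2", :deposited, 5))

puts balance(store.for_account("acct-1"))
```

Because writers only append and never update in place, there is no shared mutable row to contend over; the cost is that reads have to replay events or consult a cache.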

That said, I just got brought in to do some architecture on a JRuby app that is going to genuinely need substantial write scaling from day one, so I may just get a chance to play with some of this if it makes sense to keep this in Rails vs. just using Rails as a thin layer and handling all the contention logic using a message bus or an eventually consistent, write scalable NoSQL data store like Cassandra with callbacks for contention handling.

Best Wishes,

Peter

For now you are going to have to execute some sort of SQL to set non-standard Rails primary keys. The below article covers all the steps and gems you will need to make setting primary keys other than the auto increment integer work. The SQL call outlined in this article is pretty standard and you should not have issues moving it from database to database.

http://roninonrails.blogspot.com/2008/06/using-non-standard-primary-keys-with.html

On the issue of scaling, Rails has come a long way since its inception and the issues Twitter had. Rails can scale; just don’t be afraid to do some work to make it happen. Here is a 21-part screencast series from Gregg Pollack and New Relic that discusses and shows how to scale Rails.

http://railslab.newrelic.com/scaling-rails

B.