Case studies for high-availability Rails deployments?

Hello everyone,

We are developing a new version of a satellite telemetry distribution
system currently in use in the European space industry, and are trying
to convince the client to permit us to use Rails as the platform for
the higher-level layers of the system.

The system as a whole has strict high-availability requirements (our
current track record is over six nines yearly), and in order to push
for Rails adoption we need to demonstrate that Rails, and the
technology stack it runs on, would be capable of meeting these
requirements.
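
For a sense of scale, the downtime budget implied by a given number of nines can be worked out directly. Below is a minimal sketch in Ruby, assuming a 365-day year; the helper name allowed_downtime_seconds is purely illustrative and not part of any library.

```ruby
# Downtime budget per year for a given number of "nines" of availability.
# Assumes a 365-day year (leap years ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# nines = 3 means 99.9% availability, nines = 6 means 99.9999%, and so on.
def allowed_downtime_seconds(nines)
  unavailability = 10.0**(-nines)
  SECONDS_PER_YEAR * unavailability
end

(3..6).each do |n|
  printf("%d nines: %.1f seconds of downtime per year\n",
         n, allowed_downtime_seconds(n))
end
```

Six nines leaves roughly half a minute of total downtime per year, so a single restart of a web tier with no failover could use up a large share of that budget.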

So, I'm in the process of putting together a comprehensive argument for
Rails, and am wondering whether anyone on the list would know of case
studies or references of Rails being successfully deployed in strict
high-availability situations? (Googling didn't turn up any, and
skimming through the real-world usage references on the Wiki did not
immediately provide useful leads.)

Also, I would appreciate hearing any thoughts and experience on how the
various Rails stacks (FastCGI, SCGI, Mongrel, etc.) measure up both
reliability-wise and in situations where there would be a high number
of concurrent, long-lived server push ("Comet") connections open. Are
there any known memory leaks, for instance, that should be taken into
consideration (I seem to recall that at least the FCGI gem has
previously suffered from leaks)?

I will, naturally, summarize my findings in the form of a blog post or
two, for the benefit of the community at large.

Hi Arto,

Not to burst your bubble, but very seriously consider and test
alternative platforms. When I hear "high-availability telemetry system for
space industry" I do *not* think of Mongrel or Rails. I don't think of
anything web actually.

Hey Zed,

Not to worry -- despite what it may sound like, we are not talking about any
deep voodoo here. This is a high-level, web-based user interface that
controls the lower-level layers of the system (which are based on
entirely different technologies). The legacy version of this web UI is
already deployed successfully as it is.

To be more specific, what I'm looking for are simply good arguments
that using Rails instead of, say, Apache + mod_php5 (which would
already be an "approved" technology stack for this project) would not
be unduly detrimental to the uptime expectations of the system as a
whole, and wouldn't cause any other new headaches.

On one level, this just means that the Rails stack should not crash or
leak memory more than, for instance, Apache with PHP5 does, nor place
undue additional load on the system. Surely we can manage PHP5's level
of reliability? ;-)
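
One hedged way to make the "does not leak more than PHP5" comparison concrete is to sample each stack's resident memory under the same steady load and compare the growth trends. The sketch below is an assumption about how that could be done, not an established benchmark: both helper names are hypothetical, and current_rss_kb assumes a Unix-like system whose ps accepts `-o rss=`.

```ruby
# Least-squares slope of a series of memory samples, in KB per sample
# interval. A persistently positive slope under steady load hints at a leak.
def rss_growth_per_sample(samples)
  n = samples.size
  return 0.0 if n < 2

  mean_x = (n - 1) / 2.0
  mean_y = samples.sum / n.to_f
  num = samples.each_with_index.sum { |y, x| (x - mean_x) * (y - mean_y) }
  den = (0...n).sum { |x| (x - mean_x)**2 }
  num / den
end

# Resident set size of a process in KB (assumption: Unix-like `ps`).
def current_rss_kb(pid)
  `ps -o rss= -p #{Integer(pid)}`.to_i
end

# Example with made-up readings taken at a fixed interval:
#   rss_growth_per_sample([100, 110, 120, 130])
```

Collecting the same series for a Mongrel or FastCGI worker and for an Apache child under identical traffic would give numbers that can be compared directly, rather than anecdotes.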

There are also a number of secondary concerns that I'll need to
address, such as the policy for security updates, the expected
availability and support for Ruby & Rails packages on the platform this
will run on, the general availability of developers and sysadmins
familiar with Rails versus more "mainstream" technologies, and so on.
I have those somewhat mapped out already.

(There are, of course, provisions for instant hardware failover and
such, but these are outside the current circle of concern.)

In the end, we need to provide justifications for using Rails in the
first place (this, in itself, is not very difficult) and at the same
time try and make an argument that the maturity level of a Rails
stack, a technology not previously used in this industry (AFAIK), is
sufficient to warrant operational deployment.

So, it's not rocket science, really ;-)