Case studies for high-availability Rails deployments?

Hello everyone,

We are developing a new version of a satellite telemetry distribution system currently in use in the European space industry, and are trying to convince the client to permit us to use Rails as the platform for the higher-level layers of the system.

The system as a whole has strict high-availability requirements (our current track record is better than six nines, i.e. over 99.9999% availability per year), and to push for Rails adoption we need to demonstrate that Rails, and the technology stack it runs on, is capable of meeting these requirements.
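
To put that figure in perspective, here is a rough back-of-the-envelope sketch in Ruby of the yearly downtime budget implied by a given number of nines. The method name and the 365-day year are my own simplifying assumptions, not part of any formal SLA definition:

```ruby
# Seconds in a (non-leap) year; an assumption made to keep the arithmetic simple.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Maximum allowed downtime per year, in seconds, for an availability
# expressed as a number of nines (6 means 99.9999%).
# `downtime_budget_seconds` is a hypothetical helper name.
def downtime_budget_seconds(nines)
  SECONDS_PER_YEAR / 10.0**nines
end

(3..6).each do |n|
  puts format("%d nines: %.1f seconds of downtime per year", n, downtime_budget_seconds(n))
end
```

At six nines the budget works out to roughly 31.5 seconds per year for the whole system, so any restart or hang in the web layer would have to be covered by failover rather than absorbed.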

So, I'm putting together a comprehensive argument for Rails, and am wondering whether anyone on the list knows of case studies or references for Rails being deployed successfully in strict high-availability situations. (Googling didn't turn up any, and skimming through the real-world usage references on the Wiki did not immediately provide useful leads.)

Also, I would appreciate hearing any thoughts and experience on how the various Rails stacks (FastCGI, SCGI, Mongrel, etc.) measure up, both in terms of reliability and in situations where a high number of concurrent, long-lived server push ("Comet") connections would be open. Are there any known memory leaks, for instance, that should be taken into consideration? (I seem to recall that at least the FCGI gem has previously suffered from leaks.)

I will, naturally, summarize my findings in the form of a blog post or two, for the benefit of the community-at-large.

Hi Arto,

Not to burst your bubble, but very seriously consider and test alternative platforms. When I hear "high-availability telemetry system for the space industry" I do *not* think of Mongrel or Rails. I don't think of anything web, actually.

Hey Zed,

Not to worry -- despite what it may sound like, we are not talking any deep voodoo here. This is a high-level, web-based user interface that controls the lower-level layers of the system (which are based on entirely different technologies). The legacy version of this web UI is already deployed and running successfully.

To be more specific, what I'm looking for are simply good arguments that using Rails instead of, say, Apache + mod_php5 (which would already be an "approved" technology stack for this project) would not be unduly detrimental to the uptime expectations of the system as a whole, and wouldn't cause any other new headaches.

On one level, this just means that the Rails stack should not crash or leak memory more than, for instance, Apache with PHP5 does, nor place undue additional load on the system. Surely we can manage PHP5's level of reliability? :wink:
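
As a rough illustration of how one might compare the two stacks on the leak front, here is a minimal Ruby sketch that samples a server process's resident memory and estimates its growth trend. It assumes a Unix-like host where `ps -o rss= -p PID` is available; the helper names `rss_kb` and `growth_per_sample` are hypothetical, and a least-squares slope is just one simple way to summarize the trend:

```ruby
# Resident set size of a process in kilobytes, read via `ps`.
# Assumes a Unix-like system; returns 0 if the process is not found.
def rss_kb(pid)
  `ps -o rss= -p #{Integer(pid)}`.strip.to_i
end

# Average memory growth per sample (same unit as the samples),
# computed as the least-squares slope over evenly spaced samples.
def growth_per_sample(samples)
  n = samples.size
  return 0.0 if n < 2

  mean_x = (n - 1) / 2.0
  mean_y = samples.sum / n.to_f
  numerator = samples.each_with_index.sum { |y, x| (x - mean_x) * (y - mean_y) }
  denominator = (0...n).sum { |x| (x - mean_x)**2 }
  numerator / denominator
end
```

One could collect `rss_kb(pid)` once a minute for a Mongrel or FastCGI worker and for an Apache child under the same load, then compare the slopes. A slope that stays clearly positive over many hours would be a hint of a leak, not proof of one.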

Also, there are a number of secondary concerns that I'll need to address, such as the policy for security updates, the expected availability of and support for Ruby & Rails packages on the platform this will run on, the general availability of developers and sysadmins familiar with Rails versus more "mainstream" technologies, and so on. Those I have already mapped out to some extent.

(There are, of course, provisions for instant hardware failover and such, but these are outside the current circle of concern.)

In the end, we need to provide justifications for using Rails in the first place (this, in itself, is not very difficult) and at the same time try to make the argument that the maturity level of a Rails stack, a technology not previously used in this industry (AFAIK), is sufficient to warrant operational deployment.

So, it's not rocket science, really :wink: