scalable file uploads with Rails

Hi,

I'm involved in a project where I have to re-architect file uploads in a Rails application to make them scalable. Users will be uploading large XML files (approx. 1MB), with a high probability of overlapping uploads (several at the same time), which we try to minimize. The current system runs a Mongrel cluster (3 Mongrels) behind Apache's mod_proxy_balancer. The file upload is handled by attachment_fu.

What choices do I have?

1. Throw more Mongrel processes into the Mongrel cluster. We already have other applications running Mongrel clusters on the same machine, so this option is limited.

2. Use BackgrounDRb. I looked a bit into BackgrounDRb, but I'm not sure it can help. Even if a middleman passes the upload task to a worker process, would that work? First of all, can you even pass the upload task? How would you do it? Would that completely free up the Mongrel process? Would I have to scale the BackgrounDRb process, or is there scalability built in? I couldn't find an example on the web that does just that.

3. Use Merb. I'm still trying to get my head around it. I found 2 examples that show how to do file uploads with Merb, but they are kinda old, and Merb went through a lot of changes in the last year. Even if I could get one upload example working, how do I deal with scalability? Would I start a bunch of these Merb processes and use a proxy balancer to distribute the file uploads? From what I'm reading, these would take much less memory than having Mongrel processes running Rails, so I guess that would help me. I don't think I've seen any examples on the web that do it.

4. Write my own CGI upload handler in C/C++. This will get nasty because files are transmitted as multipart, where each part has its own headers, etc. If I could get this to work, I would leave the upload handling to Apache (which I guess would scale the uploads well and be fast too) and run some Ruby cron jobs to parse the files on the web server.
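To give an idea of what option 4 involves, here is a minimal Ruby sketch of splitting a multipart/form-data body into its parts. It is only an illustration, not a complete parser: it assumes the whole body is already in memory, that lines end in CRLF, and that the boundary string never appears inside the file content. The name `parse_multipart` is made up for this example.

```ruby
# Minimal sketch: split a multipart/form-data body into parts.
# Assumptions: the body fits in memory, line endings are CRLF, and the
# boundary does not occur inside any part's content.
def parse_multipart(body, boundary)
  parts = []
  # Everything before the first delimiter is preamble, so drop it.
  body.split("--#{boundary}").drop(1).each do |segment|
    break if segment.start_with?("--") # closing delimiter "--boundary--"

    # Each part is: headers, a blank line, then the content.
    raw_headers, content = segment.sub(/\A\r\n/, "").split("\r\n\r\n", 2)
    next if content.nil? # malformed part without a blank line

    headers = {}
    raw_headers.split("\r\n").each do |line|
      name, value = line.split(":", 2)
      headers[name.strip.downcase] = value.to_s.strip
    end

    # Remove the CRLF that precedes the next delimiter.
    parts << { headers: headers, content: content.sub(/\r\n\z/, "") }
  end
  parts
end

body = "--XYZ\r\n" \
       "Content-Disposition: form-data; name=\"file\"; filename=\"data.xml\"\r\n" \
       "Content-Type: text/xml\r\n" \
       "\r\n" \
       "<root/>\r\n" \
       "--XYZ--\r\n"

parts = parse_multipart(body, "XYZ")
puts parts.first[:headers]["content-type"]
puts parts.first[:content]
```

A real handler would also have to stream the body instead of buffering it, which is where most of the nastiness comes from.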

I'd appreciate feedback on any of these choices.

Thanks, Tiberiu

If you are up to it, you can also use JRuby. JRuby uses native threads, so you should get good non-blocking performance without having to configure any "runtimes". I use it and get great performance.

Adam

We started using the nginx upload module about a month ago and it works great. Whatever you do, you don’t want Rails in the file upload loop on a busy site. You can easily starve out other requests and put your servers into a death spiral.

http://www.grid.net.ru/nginx/upload.en.html
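The idea is that nginx writes the uploaded file to disk itself and hands the backend only a few form fields describing the stored file, so the Rails process never reads the upload body. Below is a rough Ruby sketch of the backend side. The field names `file.path` and `file.name` are assumptions (they depend on how the module is set up in your nginx config), and `accept_stored_upload` and the directories are hypothetical names for this example.

```ruby
require "fileutils"

# Hypothetical helper: take a file that the front-end server has already
# written to disk and move it into permanent storage.
# The param keys "file.path" and "file.name" are assumptions; use whatever
# field names your upload configuration actually passes to the backend.
def accept_stored_upload(params, store_dir, dest_dir)
  tmp_path = File.expand_path(params.fetch("file.path"))
  store    = File.expand_path(store_dir)

  # Only accept paths inside the upload store, since params are client-influenced.
  unless tmp_path.start_with?(store + File::SEPARATOR) && File.file?(tmp_path)
    raise ArgumentError, "not a stored upload: #{tmp_path}"
  end

  # Keep only the base name so a crafted file name cannot escape dest_dir.
  original = File.basename(params.fetch("file.name"))

  FileUtils.mkdir_p(dest_dir)
  dest = File.join(dest_dir, original)
  FileUtils.mv(tmp_path, dest) # a plain rename when both are on one filesystem
  dest
end
```

In a Rails action this would be called with the request params, and the slow XML parsing could then be left to a separate job that works through `dest_dir`.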

Chris