Designing for Mass-Uploading

Hi,

I am planning a project that is built very heavily on the premise of users being able to upload lots of data (a bit like Flickr, but not for photos). They may upload quite a lot in one go (up to 100MB at the extreme), and they will upload a few MB every week or even every day.

I have serious worries about doing this in Rails, however, as I understand that a Rails instance blocks during an upload. Running multiple Mongrel instances isn't really a feasible solution, because even with as few as 1000 users, I'd worry that it would only take a few of them uploading over slow connections for all my instances to be blocked.

I'm sure there's a better way to handle this type of app, but I'm keen not to start building if Rails is inherently unsuited to this kind of project. If anyone has any advice on Rails' blocking issue, things to look at, or ways to work around it smartly, I would be deeply grateful.

Many thanks,

- N.

The upload itself DOESN'T block your Mongrel (Mongrel receives the request body before handing the request to Rails); it's the processing afterwards, in single-threaded Rails, that blocks the Mongrel instance. There are a few ways to handle this flow:

  • If you only need to save the file, plugins like attachment_fu simply move the Tempfile to a permanent location, which shouldn't take very long.

  • If you need to do processing afterwards, hand the file over to a BackgrounDRb process and let it do the processing. Use a PeriodicalUpdater on your page to periodically poll a Rails action, which in turn asks the BackgrounDRb worker for its progress. The advantage of this approach is that the user can continue doing other things while the upload is being processed (similar to what happens when you upload a movie to YouTube).
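The hand-off-and-poll pattern from the second bullet can be sketched in plain Ruby. This is a minimal sketch, not BackgrounDRb's API: the `BackgroundWorker` class and its `enqueue`/`progress` methods are hypothetical, a thread plus a `Queue` stand in for the separate worker process, and `progress` plays the role of what the polled Rails action would return to the page.

```ruby
require "thread"

# Hypothetical stand-in for a background worker: the upload action enqueues a
# job and returns at once; a separate thread does the work and records
# progress that a polling action can read back.
class BackgroundWorker
  def initialize
    @queue = Queue.new
    @progress = {}   # job id => percentage complete (0..100)
    @lock = Mutex.new
    @thread = Thread.new { run }
  end

  # Called from the upload action: returns immediately with the job id.
  def enqueue(job_id, chunks)
    set_progress(job_id, 0)
    @queue << [job_id, chunks]
    job_id
  end

  # Called from the action the page polls.
  def progress(job_id)
    @lock.synchronize { @progress[job_id] }
  end

  # Finish the queued jobs, then stop the worker thread.
  def shutdown
    @queue << :stop
    @thread.join
  end

  private

  def set_progress(job_id, pct)
    @lock.synchronize { @progress[job_id] = pct }
  end

  def run
    loop do
      item = @queue.pop
      break if item == :stop
      job_id, chunks = item
      chunks.each_with_index do |chunk, i|
        process_chunk(chunk)
        set_progress(job_id, ((i + 1) * 100) / chunks.size)
      end
    end
  end

  # Placeholder for the real work (hashing, text extraction, ...).
  def process_chunk(chunk)
    chunk.bytesize
  end
end

worker = BackgroundWorker.new
worker.enqueue("upload-1", ["a" * 10, "b" * 10, "c" * 10, "d" * 10])
worker.shutdown
puts worker.progress("upload-1")  # prints 100
```

In a real deployment the worker would run in its own process, so a slow job never ties up a Mongrel; the upload action only enqueues and returns.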

A third approach would be to use Merb for handling the file uploads. Merb is a different framework that you run on a separate Mongrel and port, but it's multithreaded and thus well suited to handling file uploads (among a lot of other things; it's a really nice framework, but you don't get as much out of the box as you do with Rails). It can use ActiveRecord (and IIRC even your Rails models). Afterwards, you redirect the user back to the Rails app.

Best regards

Peter De Berdt

Peter, thanks for your reply, much appreciated.

All I'll want to do with the uploaded file is take a hash of it (to use as an ID for various reasons) and store it in the file system; a note of its location will be entered in a new database record. If the uploading phase doesn't block and leaves that instance open to serving new connections, it sounds like I might be able to get away with doing nothing at all?
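For the hash-as-ID idea, here is a minimal sketch assuming SHA-256 as the hash and a sharded directory layout (both assumptions, not from the thread). `store_by_content_hash` is a hypothetical helper that takes the path of the uploaded Tempfile. It reads the file in blocks rather than all at once, which matters for 100MB uploads, and it avoids storing a second copy of identical content. Writing the database record is left out.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper: store an uploaded file under a path derived from its
# SHA-256 digest. Digest::SHA256.file reads the file in blocks, so large
# uploads are not loaded into memory at once. Returns [digest, stored_path].
def store_by_content_hash(tempfile_path, storage_root)
  digest = Digest::SHA256.file(tempfile_path).hexdigest
  # Shard into subdirectories ("2c/f2/2cf2...") so no single directory
  # ends up holding every upload.
  dir = File.join(storage_root, digest[0, 2], digest[2, 2])
  FileUtils.mkdir_p(dir)
  target = File.join(dir, digest)
  if File.exist?(target)
    # Identical content is already stored: drop the duplicate upload.
    FileUtils.rm_f(tempfile_path)
  else
    # A rename when both paths are on the same filesystem, else copy + delete.
    FileUtils.mv(tempfile_path, target)
  end
  [digest, target]
end

Dir.mktmpdir do |root|
  upload = File.join(root, "upload.tmp")
  File.write(upload, "hello")
  id, path = store_by_content_hash(upload, File.join(root, "store"))
  puts id    # SHA-256 hex digest of "hello"
  puts path
end
```

Because the ID is derived from the content, two users uploading the same file end up pointing at one stored copy.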

However, in the future I may want to index any text in the uploaded file (for search purposes), so I may go with a BackgrounDRb solution from the beginning (it also sounds like it will make it easier to provide a progress indicator to the user, which will be necessary for big files).

I haven't looked at attachment_fu, so I'll go and take a look at that now, and see how that may fit in with my plans.

Thanks for your kind reply; it's given me plenty to think about :)

> All I'll want to do with the uploaded file is take a hash of it (to use as an ID for various reasons) and store it in the file system; a note of its location will be entered in a new database record. If the uploading phase doesn't block and leaves that instance open to serving new connections, it sounds like I might be able to get away with doing nothing at all?

attachment_fu will save the files either to the filesystem or to the database (and generate thumbnails if you need them). You can use attachment_fu's callbacks to calculate the hash of the file, index the text, …

> However, in the future I may want to index any text in the uploaded file (for search purposes), so I may go with a BackgrounDRb solution from the beginning (it also sounds like it will make it easier to provide a progress indicator to the user, which will be necessary for big files).

For upload progress there are two possible solutions:

  • Use a Flash uploader such as SWFUpload (swfupload.org). This is a fantastic solution I've used in several of our apps. What I like about SWFUpload is that you can filter by file type and maximum size on the client side, the upload stream is monitored client-side, and the upload dialog allows selecting multiple files. Sadly, Rails 2's security measures and cookie-based sessions have broken Flash uploaders, and the workarounds proposed so far apparently don't work in all browsers.

  • Mongrel's upload progress plugin: http://mongrel.rubyforge.org/docs/upload_progress.html
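The server-side option boils down to counting request-body bytes as they arrive, keyed by an upload id that the page also knows, so that a separate polling request can report a percentage. A minimal sketch of that idea in plain Ruby follows; `UploadProgress` and `receive_body` are hypothetical names, not the Mongrel plugin's API, and a `StringIO` stands in for the socket.

```ruby
require "stringio"

# Hypothetical progress registry: bytes received per upload id, shared
# between the thread reading the body and the request that polls it.
class UploadProgress
  def initialize
    @uploads = {}
    @lock = Mutex.new
  end

  def start(upload_id, total_bytes)
    @lock.synchronize { @uploads[upload_id] = { received: 0, total: total_bytes } }
  end

  def add(upload_id, bytes)
    @lock.synchronize { @uploads[upload_id][:received] += bytes }
  end

  # Whole-number percentage received, or nil for an unknown upload id.
  def percent(upload_id)
    @lock.synchronize do
      entry = @uploads[upload_id]
      return nil unless entry
      return 100 if entry[:total].zero?
      entry[:received] * 100 / entry[:total]
    end
  end
end

# Copy the body from `input` to `output` chunk by chunk, reporting each
# chunk to the tracker. IO#read(n) returns nil at end of stream.
def receive_body(input, output, tracker, upload_id, total_bytes, chunk_size = 16 * 1024)
  tracker.start(upload_id, total_bytes)
  while (chunk = input.read(chunk_size))
    output.write(chunk)
    tracker.add(upload_id, chunk.bytesize)
  end
  output
end

tracker = UploadProgress.new
receive_body(StringIO.new("x" * 40_000), StringIO.new, tracker, "upload-1", 40_000)
puts tracker.percent("upload-1")  # prints 100
```

In practice the total would come from the request's Content-Length, and the polling request would read `percent` while the body is still being received.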

Best regards

Peter De Berdt

That's fantastic Peter, thanks again for your reply, it's more than helpful.

I hadn't considered a Flash uploader on the client, but I can see the clear advantages despite the purist in me wanting to keep to browser-supplied technologies. I'll take a look at SWFUpload.

I haven't looked too far into the Rails 2.0 changes (although I quickly noticed the scaffold differences), so your comment about the new security measures causing you problems is intriguing. At the risk of asking you to spend more of your time on this thread, would it be possible for you to expand on that, with a link to some of the issues, if you have a moment?

Many thanks again.

http://groups.google.com/group/rubyonrails-talk/search?q=swfupload&start=0&scoring=d&

Check the messages after yours in that list :) The topic came up just recently. For now, I'm not upgrading my existing Rails 1.2.6 apps until a 100% working solution has been found.

Best regards

Peter De Berdt

Much appreciated. Thanks again.