Designing for Mass-Uploading

Hi,

I am planning a project which is very heavily built on the premise of
users being able to upload lots of data (maybe a bit like Flickr but not
for photos). They may choose to upload quite a bit in one go - perhaps
up to 100MB at the extreme, and they will upload a few MB every week or
even day.

I have great worries about doing this in Rails however, as I understand
that a Rails instance blocks during an upload. Having multiple instances
running behind Mongrels isn't really a feasible solution, because even
with as few as 1000 users I'd be worried that all it would take is a
few of them to upload over slow connections and all my instances are
blocking.

I'm sure there's a better way to handle this type of app, but I'm very
eager not to start building my project if Rails is inherently not suited
to my project. If anyone has any advice on Rails' blocking issue, or
things to look at, or things I can do to work around this issue smartly,
I would be deeply grateful.

Many thanks,

- N.

The upload itself DOESN’T block your Mongrel; it’s the processing afterwards that runs in single-threaded Rails and thus blocks the Mongrel instance. There are a few ways to handle this flow:

  • If it’s just saving the file afterwards, plugins like attachment_fu will just move the Tempfile into a permanent file, which shouldn’t take very long

  • If you need to do processing afterwards, hand the file over to a BackgrounDRb worker and let that do the processing. Use a PeriodicalUpdater on your page to periodically poll a Rails action, which in turn asks the BackgrounDRb worker for its progress. The advantage of this approach is that the user can continue doing other things while the upload is being processed (similar to what happens when you upload a movie to YouTube).

A third approach would be to use Merb for handling the file uploads. Merb is a different framework that you’ll run on a separate Mongrel and port, but it’s multithreaded and thus very well suited to file upload handling. (It’s a really nice framework for a lot of other things too, but you don’t get as much out of the box as you do in Rails.) It can use ActiveRecord (and IIRC even Rails’ models). Afterwards you redirect the user back to the Rails app.
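
To make the background-processing option above a little more concrete, here is a rough sketch in plain Ruby. It is not BackgrounDRb's API: the ProcessingJob class, its method names and the chunk size are all made up for illustration, and a plain thread stands in for the separate worker process. The idea it shows is only that the work runs elsewhere while a cheap progress call can be polled.

```ruby
# Hypothetical stand-in for a background worker; NOT BackgrounDRb's API.
class ProcessingJob
  def initialize(path, chunk_size = 64 * 1024)
    @path = path
    @chunk_size = chunk_size
    @total = File.size(path)
    @done = 0
    @lock = Mutex.new
  end

  # Start processing in a separate thread so the caller returns immediately.
  def start
    @thread = Thread.new do
      File.open(@path, "rb") do |io|
        while (chunk = io.read(@chunk_size))
          # Real work (hashing, indexing text, ...) would happen here.
          @lock.synchronize { @done += chunk.bytesize }
        end
      end
    end
    self
  end

  # What a polled action could report back to the page: an integer 0..100.
  def progress
    return 100 if @total.zero?
    @lock.synchronize { (@done * 100) / @total }
  end

  # Block until processing has finished (useful in scripts and tests).
  def wait
    @thread.join
    self
  end
end
```

A caller would do something like job = ProcessingJob.new(path).start, return to the user straight away, and have the polled action return job.progress until it reaches 100.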

Best regards

Peter De Berdt

Peter, thanks for your reply, much appreciated.

All I'll want to do with the uploaded file is take a hash of it (to use
as an ID for various reasons) and store it in the file system; a note
of its location will be entered in a new database record. If the
uploading phase doesn't block and leaves that instance open to serving
new connections it sounds like I might be able to get away with doing
nothing at all?

However, in the future I may like to index any text in the uploaded file
(for search purposes), and so I may go with a BackgrounDRb solution from
the beginning (this sounds like it will be easier to provide a progress
indicator to the user too, which will be necessary for big files).

I haven't looked at attachment_fu, so I'll go and take a look at that
now, and see how that may fit in with my plans.

Thanks for your kind reply, it's given me plenty to think about :)

> All I’ll want to do with the uploaded file is take a hash of it (to use
> as an ID for various reasons) and store it in the file system; a note
> of its location will be entered in a new database record. If the
> uploading phase doesn’t block and leaves that instance open to serving
> new connections it sounds like I might be able to get away with doing
> nothing at all?

Attachment_fu will save the files to either the filesystem or the database (and do some thumbnailing if you need it). You can use the callbacks of attachment_fu to calculate the hash of the file, index the text, …
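
For the hash-as-ID part, a minimal sketch in plain Ruby might look like the following. It does not use attachment_fu's callbacks; the method names, the choice of SHA-256 and the two-level directory layout are assumptions made for the example. The file is read in chunks so that a 100MB upload is never held in memory at once.

```ruby
require "digest"
require "fileutils"

# Hash the file in fixed-size chunks rather than reading it all at once.
# SHA-256 is an assumption here; any digest would work the same way.
def file_digest(path, chunk_size = 1024 * 1024)
  digest = Digest::SHA256.new
  File.open(path, "rb") do |io|
    while (chunk = io.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Move the temp file to a path derived from its hash and return [id, path].
# storage_root and the aa/bb/<hash> layout are illustrative choices only.
def store_upload(temp_path, storage_root)
  id = file_digest(temp_path)
  dir = File.join(storage_root, id[0, 2], id[2, 2])
  FileUtils.mkdir_p(dir)
  final_path = File.join(dir, id)
  FileUtils.mv(temp_path, final_path)
  [id, final_path]
end
```

The returned id and path are what would go into the new database record. Hashing time grows with file size, so for the largest uploads this is the step that might eventually be worth moving into a background worker.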

> However, in the future I may like to index any text in the uploaded file
> (for search purposes), and so I may go with a BackgrounDRb solution from
> the beginning (this sounds like it will be easier to provide a progress
> indicator to the user too, which will be necessary for big files).

For upload progress there are two possible solutions:

  • Use a Flash uploader such as SWFUpload (swfupload.org). This is a fantastic solution I’ve used in several of our apps. The nice things about SWFUpload, to me, are that you can filter on file type and maximum size client-side, the upload stream is monitored client-side, and the upload dialog allows multiple file selection. Sadly, Rails 2’s security measures and cookie-based sessions have broken Flash uploaders, and the solutions that have come up so far apparently don’t work in all browsers.

  • Use Mongrel’s upload progress support, described at http://mongrel.rubyforge.org/docs/upload_progress.html

Best regards

Peter De Berdt

That's fantastic Peter, thanks again for your reply, it's more than
helpful.

I hadn't considered a Flash uploader on the client, but I can see the
clear advantages despite the purist in me wanting to keep to
browser-supplied technologies. I'll take a look at SWFUpload.

I haven't looked too far into the Rails 2.0 changes (although I quickly
realised the scaffold differences) so your comment about the new
security measures that are causing you problems is intriguing. At the
risk of asking you to spend more of your time in this thread would it be
possible for you to expand on that with a link to some of the issues if
you have a moment?

Many thanks again.

http://groups.google.com/group/rubyonrails-talk/search?q=swfupload&start=0&scoring=d&

Check all the messages after yours in the list :) The topic came up just recently. For now I’m not upgrading my existing Rails 1.2.6 apps until a 100% working solution has been found.

Best regards

Peter De Berdt

Much appreciated. Thanks again.