Large file storage in database

I'm in the planning stage of an app that will store user-uploaded images and short video clips of roughly 1 MB each. My partner wants to store all of the files in the database, mainly for scaling purposes.

I recently built a similar app using the filesystem, but that one ran on a single machine. This new app will have 3 application servers and a fileserver for statics. I still think it would be effective to keep the files on the filesystem and write them to the fileserver over an NFS share, but I am open to using the database. Has anyone done anything like this, or does anyone have information on the performance of this type of setup? I would appreciate any comments.

Jason

I think as long as you're aware of how much space the files will use up in your database *and* you cache the results to the filesystem upon first retrieval, keeping them in the database isn't so bad.

I wouldn't use NFS, simply because I've heard of (and seen) too many instances where it hangs. You also don't get any redundancy (although you don't with a single db instance either).

What we do (ours is for internal content admin folks, not end users) is upload the data, then do an internal POST to our 'media master'. When an end user hits a page that needs that content, the request goes to one of 4 media slaves. The slaves are configured so that if they don't have a file, they ask the master for it. So within a short time the content has been replicated out to four servers and is served statically.
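The "ask the master on a miss" behaviour can be sketched in plain Ruby. This is only an illustration, not the setup described above: the `MediaSlave` class, its `read` method, and the fetcher callable are hypothetical names, and a real deployment would more likely do this in the web server or a proxy than in application code.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical media slave: serves a file from local disk if it has it,
# otherwise pulls it from the master once and keeps a local copy.
# fetch_from_master is any callable that takes a file name and returns
# the file's bytes, or nil if the master does not have it.
class MediaSlave
  def initialize(cache_dir, fetch_from_master)
    @cache_dir = cache_dir
    @fetch_from_master = fetch_from_master
    FileUtils.mkdir_p(@cache_dir)
  end

  def read(name)
    # File.basename keeps "../" in a requested name from escaping the cache dir.
    path = File.join(@cache_dir, File.basename(name))
    return File.binread(path) if File.exist?(path)

    data = @fetch_from_master.call(name)
    return nil if data.nil?

    # Write to a temp file and rename, so a concurrent reader never
    # sees a half-written file.
    tmp = "#{path}.#{Process.pid}.tmp"
    File.binwrite(tmp, data)
    File.rename(tmp, path)
    data
  end
end

# Example wiring, with a Hash standing in for the master:
#   master = { 'cat.jpg' => 'JPEGDATA' }
#   slave  = MediaSlave.new(Dir.mktmpdir, ->(name) { master[name] })
#   slave.read('cat.jpg')  # first call fetches from the master, later calls hit disk
```

Once the file is on the slave's disk, the web server can serve it statically and the application never sees the request again.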

-philip

This was beat to death some time ago:

http://www.zend.com/zend/trick/tricks-sept-2001.php?article=tricks-sept-2001&kind=tr&id=2033&open=1&anc=0&view=1

So, I recently built an engine for our server to deal with images,
which caches them into the public directory to be served by Apache.
The article Greg put up doesn't deal too well with the bigger Rails
issues that make serving images from Rails really slow...

- Rails is particularly poor at multiplexing lots of connections.
- Rails, by default, loves to dump your database inserts (particularly
if they fail) to your log.

We used to have images in our database, but that caused some pretty
massive problems with the log. It's also particularly bad when you
load your model just to ask something like "how many users does this
image have?" and, oops, you just pulled a crapload of image data out
of your database along with it. Yeowch!
If you're serving images off of your db, you'll also want to make sure
that you only send the image when the browser's cached copy is out of
date, and answer with a 304 otherwise... here's a handy snippet of
code to do that:

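    # Yields (so the action can send the image) only when the client's cached
    # copy is older than `date`; otherwise answers 304 Not Modified.
    def only_if_modified(date)
        # A missing or malformed header makes the parse fail; treat that as "no cached copy".
        minTime = Time.rfc2822(@request.env["HTTP_IF_MODIFIED_SINCE"]) rescue nil

        if minTime && date <= minTime
            render_text '', '304 Not Modified'
        else
            yield
        end
    end
    helper_method :only_if_modified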
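
For anyone who wants to test the date comparison outside of a controller, here is a minimal framework-free sketch of the same check. The method name `not_modified?` is made up for illustration, and it uses `Time.httpdate` from Ruby's standard `time` library rather than the `Time.rfc2822` call in the snippet above.

```ruby
require 'time'

# Returns true when the client's cached copy is still current, i.e. the
# resource has not changed since the date sent in If-Modified-Since.
def not_modified?(if_modified_since, last_modified)
  return false if if_modified_since.nil? || if_modified_since.empty?

  since = Time.httpdate(if_modified_since)
  # HTTP dates have one-second resolution, so compare whole seconds.
  last_modified.to_i <= since.to_i
rescue ArgumentError
  false # unparseable header: fall back to sending the full response
end
```

Comparing whole seconds avoids a spurious "modified" result when the stored timestamp carries fractional seconds that the HTTP header cannot express.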

Here's the solution I pulled off, and why I did it.
- Apache is awesome at serving up images.
- We've got pictures which we want to serve up in lots of different
sizes, like thumbnails.
- I wanted people's browsers to cache the images without much work.

So I built a caching layer that writes the images into a public
directory and caches each resized version under /public/images/cache,
named by a unique hash of the image and the size. I haven't run any
real benchmarks on this, but the speedup I observed was substantial.
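
One way the hashed cache file names could be derived is sketched below. The helper name `cached_image_path`, the choice of SHA1, and the inclusion of the image's last-modified time in the key are all assumptions for illustration, not necessarily how the engine described above does it.

```ruby
require 'digest/sha1'

# Directory under the web root, so the web server can serve hits directly.
CACHE_ROOT = 'public/images/cache'

# Builds a deterministic cache path from the image and the requested size.
# Including updated_at (an assumption) means an edited image gets a new
# file name, so browsers can cache the old URL for as long as they like.
def cached_image_path(image_id, updated_at, width, height, ext = 'jpg')
  key = [image_id, updated_at.to_i, width, height].join(':')
  File.join(CACHE_ROOT, "#{Digest::SHA1.hexdigest(key)}.#{ext}")
end
```

The application only has to render and write the file when that path doesn't exist yet; after that, requests for the same image and size never reach Rails.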

Cheers,
-Bramski