Something like a file server

Hi, I'm sorry if this is slightly OT, but I'm trying to find a way to do the following. I have a bunch of processes that generate regular update files; each file may be between 100KB and 4MB in size.

On the other side, I have people who want to pick up the most recent version of this file. So, I need to give them a fixed URL to the file. When they make a request, they would like to get only the most recent file. Initially, I thought that something like FTP would work, but I ran into the problem that if a request comes in while the file is being updated, the client may not get a valid file.

The other extreme is to have an upload controller and a download controller, and store the file in a database so that it can be served up through the database. The database serializes reads and writes, so I don't have to worry about a client getting a partially written file. But, since the files can be large, this approach seems a bit of a waste.

Is there a better way? Anything that you would recommend?

Thanks, Mohit.

A common approach used for something like this is to have a "current" symlink, and update it whenever you have a newer file.

Eg:

$ touch some-file-v1
$ ln -s some-file-v1 current-version
$ touch some-file-v2
$ ln -sf some-file-v2 current-version

If you give people the URL to "current-version", and only update the symlink after you've created the new version, then they won't be downloading a file before it's completely written out to disk.
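As a side note, on many systems `ln -sf` removes the old link and then creates the new one, so there is a brief window with no link at all. One way to avoid even that (a standalone Ruby sketch, not from the thread; the function and file names are illustrative) is to create the new link under a temporary name and rename() it into place, which replaces the old link atomically:

```ruby
require 'tmpdir'

# Sketch of an atomic symlink flip: create the new link under a temporary
# name, then rename() it over the old one. rename(2) replaces the target
# atomically, so a request never sees a missing or half-made link.
def publish_current(new_file, link_name)
  tmp_link = "#{link_name}.tmp.#{Process.pid}"
  File.unlink(tmp_link) if File.symlink?(tmp_link)  # clean up a stale temp link
  File.symlink(new_file, tmp_link)
  File.rename(tmp_link, link_name)                  # atomic replace
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "some-file-v1"), "v1")
  File.write(File.join(dir, "some-file-v2"), "v2")
  link = File.join(dir, "current-version")
  publish_current(File.join(dir, "some-file-v1"), link)
  publish_current(File.join(dir, "some-file-v2"), link)
  puts File.read(link)   # => "v2"
end
```

For the sizes mentioned here the plain `ln -sf` window is tiny, so this is belt-and-braces rather than a requirement.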

We use the "store the files in the DB" approach for a few of our projects at $work, and that works pretty well for us. It's not really a waste, especially if you plan on having multiple "physical" webservers.

-Jacob

Hi Jacob,

Thanks for the quick reply.

Jacob Helwig wrote:

A common approach used for something like this is to have a "current" symlink, and update it whenever you have a newer file.

Eg:

$ touch some-file-v1
$ ln -s some-file-v1 current-version
$ touch some-file-v2
$ ln -sf some-file-v2 current-version

If you give people the URL to "current-version", and only update the symlink after you've created the new version, then they won't be downloading a file before it's completely written out to disk.

This seems simple enough! Would this approach also work if someone is already accessing the older file when we try to do the second set of steps:

$ touch some-file-v2
$ ln -sf some-file-v2 current-version

Can I update a symlink while someone is already reading a file?

We use the "store the files in the DB" approach for a few of our projects at $work, and that works pretty well for us. It's not really a waste, especially if you plan on having multiple "physical" webservers.

Actually, I do use this for one of our solutions. The only concern is that you need many more Mongrels if your files are very large, since sending the file from the database ties up the Mongrel for a longer period of time... with small files, it works quite well.

Cheers, Mohit. 10/18/2009 | 4:42 PM.

Mohit wrote:

This seems simple enough! Would this approach also work if someone was already accessing the older file when we try to do the second set of steps:

$ touch some-file-v2
$ ln -sf some-file-v2 current-version

Can I update a symlink while someone is already reading a file?

This shouldn't cause a problem: the file was already opened, so the reader holds a handle to the old file's data on disk (its inode) rather than to the name or the symlink. (Though this is pretty easy to confirm for whatever your particular setup is.)
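For what it's worth, this is easy to confirm in a few lines of Ruby (a standalone sketch; the file names are made up). Once the reader has the file open, repointing the link doesn't disturb the in-flight read:

```ruby
require 'tmpdir'

# An already-open reader is unaffected by repointing the symlink:
# the open file descriptor refers to the old file's inode, not its name.
Dir.mktmpdir do |dir|
  old_file = File.join(dir, "some-file-v1")
  new_file = File.join(dir, "some-file-v2")
  File.write(old_file, "old contents")
  File.write(new_file, "new contents")
  link = File.join(dir, "current-version")
  File.symlink(old_file, link)

  reader = File.open(link)      # a client starts downloading v1
  File.unlink(link)             # repoint the link (what `ln -sf` does)...
  File.symlink(new_file, link)  # ...while the download is in flight

  puts reader.read              # => "old contents"
  reader.close
  puts File.read(link)          # => "new contents"
end
```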

We use the "store the files in the DB" approach for a few of our projects at $work, and that works pretty well for us. It's not really a waste, especially if you plan on having multiple "physical" webservers.

Actually, I do use this for one of our solutions. The only concern is that you need many more Mongrels if your files are very large, since sending the file from the database ties up the Mongrel for a longer period of time... with small files, it works quite well.

It should also be possible to have whatever is delegating to Mongrel directly serve up files that exist on disk already. Then you could save things from the DB to disk, and not tie up a Mongrel worker after the first hit. This makes your "always download the most recent version" trickier, though.
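A rough sketch of that idea in plain Ruby, assuming a hypothetical `load_blob_from_db` helper standing in for the real model lookup: on the first request, write the blob to a versioned file on disk; on later hits the file already exists, so the frontend can serve it directly (e.g. via an X-Sendfile header) without tying up a Mongrel. Versioned file names also keep "always the most recent" manageable, since a new version gets a new name rather than overwriting a file a client may be reading:

```ruby
require 'tmpdir'

# Hypothetical stand-in for reading the blob column out of the database.
def load_blob_from_db(name, version)
  "blob for #{name} v#{version}"
end

# First hit: copy the blob out of the DB into a versioned file on disk
# (write to a temp name, then rename into place so readers never see a
# partial file). Later hits: the file exists, so the frontend can serve
# it straight from disk.
def cached_path(cache_dir, name, version)
  path = File.join(cache_dir, "#{name}.v#{version}")
  unless File.exist?(path)
    tmp = "#{path}.tmp.#{Process.pid}"
    File.open(tmp, "wb") { |f| f.write(load_blob_from_db(name, version)) }
    File.rename(tmp, path)
  end
  path
end

Dir.mktmpdir do |dir|
  p = cached_path(dir, "report", 2)
  puts File.read(p)   # => "blob for report v2"
end
```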

-Jacob

Jacob Helwig wrote:

It should also be possible to have whatever is delegating to Mongrel directly serve up files that exist on disk already. Then you could save things from the DB to disk, and not tie up a Mongrel worker after the first hit. This makes your "always download the most recent version" trickier, though.

Thanks Jacob. I think you've given me a couple of pointers on how to proceed from here. I guess the above idea could also be implemented using Rails caching.

Cheers, Mohit. 10/18/2009 | 5:29 PM.