My application accepts a form to create multiple large (hundreds of MB)
temporary files and then zips them up to send off to a user.
I have the files constructed and the zipping working. The problem is
that if I use send_file to send the zip off to the user, I cannot
delete the file afterwards: it seems send_file forks off another
process, so my delete removes the file before the streaming starts. I
tried putting my File.delete() in a method called by after_filter, but
that caused the same issue.
I don't want to load everything into memory and use send_data, since
these files can be 200+ MB. If send_data accepted a block that would
be cool, but it isn't very flexible either.
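Roughly, what I'm doing looks like this (a simplified sketch; build_zip and the controller name are made up):

  class ExportsController < ApplicationController
    after_filter :cleanup_zip, :only => :download

    def download
      @zip_path = build_zip(params[:ids])   # writes a big temporary zip to disk
      send_file @zip_path, :type => 'application/zip', :disposition => 'attachment'
    end

    private

    # Runs too early: the zip is gone before the streaming finishes.
    def cleanup_zip
      File.delete(@zip_path) if @zip_path && File.exist?(@zip_path)
    end
  end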
Am I missing something?
Thanks,
Chad Burt
My understanding is that send_file sets a header that gets passed back to your front-end web server and instructs it to send the file in question. So no matter where you put your File.delete(), it's going to run before the front-end server gets a chance to send the file.
I would look at a periodic clean-up script to remove zip files older than, say, 30 minutes (or whatever length of time it takes for your users to download them)...
I'm wondering if there are any solutions other than using cron or the
like. That approach involves accounting for download speeds and deciding
whether to store the timestamps in the filename, the filesystem, or a
database table. It's also another point of failure and another
deployment step.
It seems like this should be a common problem. Is this just an
oversight of HTTP?
-Chad
Don't think so... same thing would happen if you manually removed a file
while it was being downloaded...
You don't need to store the timestamps in the filename... just use the
last accessed time... if that's older than, say, an hour, remove it.
You should be able to do the whole thing with Ruby...
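Something along these lines (an untested sketch; the directory is just an example, and it assumes the filesystem actually records access times):

  require 'fileutils'

  # Remove zips that haven't been read in the last hour.
  Dir.glob('/path/to/zips/*.zip').each do |path|
    FileUtils.rm_f(path) if File.atime(path) < Time.now - 3600
  end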
-philip
On a 'nix system, you should be able to delete the file at any time
after the download has started. 'nix filesystem semantics usually
give an opened file handle access to the file contents -- as they were
at the time it was opened -- until the file handle is closed. At that
point, the filesystem will reclaim the bits.
If you can detect when the file is being downloaded (instead of when
it's created), just delete it then.
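For instance, a quick sketch of the semantics (the path is just a placeholder):

  f = File.open('/tmp/big.zip', 'rb')
  File.delete('/tmp/big.zip')   # directory entry is gone...
  data = f.read                 # ...but the open handle still sees the contents
  f.close                       # the bits are reclaimed once the handle closes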
YMMV on other platforms.
J.
This is a sane approach, because it would be crazy to try to hold
these in memory for the duration of the download.
But cut yourself some slack on your cleanup. It looks like you're
trying to over-engineer that part of the problem.
Just schedule a Unix cron job (or Wintendo scheduled task) that cleans
up files older than, say, 2 days in the temporary zip construction
directory. This cuts people some slack when they have trouble
downloading it and need several attempts, possibly spread out over a
day.
In other words, you can make things _better_ for your users by being
_less_ brilliant on the server side. 
An example in cron on a Linux host might be:
0 * * * * find /path/to/zips -type f -mtime +2 | xargs -r rm -f
Ciao,
Sheldon.
OK, cron is seeming more reasonable now. I actually have to run a cron
job anyway to stay in sync with another database. It runs a rake task,
and I will just add a new task to that file. Just wanted to make sure I
wasn't adding another step where I didn't need to.
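Something like this in the rake file should do it (rough sketch; the directory is made up):

  desc 'Remove downloadable zips older than two days'
  task :cleanup_zips do
    max_age = 2 * 24 * 60 * 60   # two days, in seconds
    Dir.glob('/path/to/zips/*.zip').each do |path|
      File.delete(path) if File.mtime(path) < Time.now - max_age
    end
  end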
Thanks,
Chad