Excuse the ignorance here. I have a form that uploads a CSV file to S3, via direct uploads. Since I expect only one file per record, my model has the needed has_one_attached attribute.
So I upload to S3, and now I need to run a background job to process the uploaded file. But I can only do that once the file, which could be huge and take a long time to upload, exists fully and completely at S3.
What is the standard way of knowing when that moment arrives in my Rails setup? I get it that records get updated in my model, and things get attached. But does any of that signal actual completion of the upload at S3? What takes care of that signal? If it is the JS, it means I have to listen to an event in JS, and then hit up the server with a message “hey, the upload completed”. So I am dependent on the JS that is uploading to S3 in this case?
Just checking… I have no idea if that is true or not, and if it is, what people would mostly do about it.
1. JS library sends a request to the Rails direct uploads controller to get the upload URL
2. JS library begins the upload of the file
3. JS library finishes the upload of the file
4. JS library creates a hidden field in the form, containing the identifier (signed blob id) of the uploaded file
5. Form is submitted
6. Controller action saves the record with the form params, which automatically attaches the file
This means that if you need to process something after a file has been uploaded, the best option is to enqueue the job in the controller action that attached the file, or in an after_commit callback on the model when there is a change to the has_one_attached attribute.
OK. Roger that. So when the controller attaches the file, even though that actual, potentially really big, file lives at S3, it is complete there, so we are free to spawn a background job now and process said file.
I guess I was thinking that that can’t be, as an upload could take say, 2 minutes. So for 2 minutes, the JS has to sit there, waiting. Until it gets an A-OK from S3 that all went well and that file uploaded, nothing server-side should happen. So this POST back to the server, about attaching the file, happens only after it is all done, and that indeed happens under the hood, auto-magically for me, thanks to Rails Direct Upload coding.
Correct. Which is why the direct upload example includes the code necessary to create a loading bar so that the user has some feedback on how much of the file has been uploaded.
And yes, as long as you use rails defaults, everything will be handled automatically in the controller. For example:
<%= form_with @photo do |form| %>
  <%= form.file_field :file, direct_upload: true %>
  <%= form.submit %>
<% end %>
# In the new action:
@photo = Photo.new

# In the create action:
@photo = Photo.new(photo_params)

# ...and on validation failure:
render "new", status: :unprocessable_entity
This should be everything for a minimalistic direct upload. The JS file will create a hidden field called file with the identifier for the uploaded file, and the controller will automatically attach the file when it notices that an identifier has been set to the file attribute of Photo which is a has_one_attached.
And you probably want to enqueue your job inside the if @photo.save, just before the redirect_to
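Assembled into one place, a minimal sketch of that controller (the `ProcessCsvJob` name is my own placeholder, not something defined in this thread):

```ruby
class PhotosController < ApplicationController
  def new
    @photo = Photo.new
  end

  def create
    @photo = Photo.new(photo_params)
    if @photo.save
      # By the time save succeeds, the blob already exists on S3,
      # so it is safe to hand the record to a background job.
      ProcessCsvJob.perform_later(@photo)  # placeholder job name
      redirect_to @photo
    else
      render "new", status: :unprocessable_entity
    end
  end

  private

  # The direct-upload JS fills the hidden "file" field with the
  # signed blob id before the form is submitted.
  def photo_params
    params.require(:photo).permit(:file)
  end
end
```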
It is interesting. I spent 10 years just using Sinatra and CarrierWave and a combo of Uppy and Dropzone, I never experienced the Rails way of doing things, hence my questions. I have not found too many clear examples explaining the Direct Upload for Rails. Explanations of the actual flow and delegated responsibilities I mean. I appreciate your huge effort here to inform me! Thanks!
Weird. I was looking for the active storage guides that I used when learning, but I can no longer find the js library examples.
The readme for active storage itself contains more practical examples than the rails guide on it:
And here’s the original code I used on our app to show upload progress for one of multiple files. It works by adding event listeners that wait for direct upload events, then using the information that comes with them to figure out how much of the file was uploaded and adjust the corresponding file progress bar.
Active Storage tries to make the question of “how” files are stored completely transparent to applications (which is why no extra columns in the database are needed when you need an attachment in a model, and the key is completely random), so defining a separate folder per attachment is not available as a feature.
Out of curiosity, what is the use case for choosing different folders? Is it security? Compliance?
Banal use case. One, any one account at S3 only gets 100 buckets. I have potentially hundreds of clients. I cannot just assign each one a bucket out of my supply. I would have to get them to get their own bucket, but then S3 is so “hard” to configure, that would fall to me, and that is a pain point.
So I just envisioned having one bucket and a folder per client. Since all the other upload services work this way, I assumed Active Storage would too. Now I understand. Active Storage does not work this way. Strange but true.
I would love to touch base with the person responsible for the Active Storage S3 service, and try and convince them that it is not really that hard to do. A prefix is not a terrible thing to use with a key. Would not require the whole service to be re-factored.
I honestly am so new to Active Storage that that is the source of my question. I had no idea it was limited in that fashion with S3. I could not see the light. Anyway, I accept that that is the way it is. I wish it was different, but I am in no position to argue!!
Yeah, Active Storage was created for a very specific use case and it’s taking a while to get it adapted to others. I use it mainly for image galleries and have to keep working around the fact that finding the url of an image takes 3 queries (one for the model with the image, one for the attachment and one for the blob)
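One standard mitigation for that query count (not mentioned in this thread) is the `with_attached_*` scope that `has_one_attached` generates; a sketch, assuming a `Photo` model with `has_one_attached :file`:

```ruby
# Preloads the attachment and blob rows alongside the photos, so
# computing each image's URL does not add two queries per record.
@photos = Photo.with_attached_file

# In the view, url_for(photo.file) can then resolve the blob
# without extra per-record lookups.
```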
But in this case, is there anything wrong with putting the images of all clients in the bucket’s root, without a folder/prefix? Active Storage uses a 28-character base-58 string for the key, and all its URLs are signed. It should be secure enough?
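For scale: 58^28 is around 10^49 possible keys, so guessing one is not realistic. A plain-Ruby sketch of generating a key of that shape (the exact alphabet and generator here are my assumptions, shaped to match the 28-character base-58 description above, not Active Storage’s own code):

```ruby
require "securerandom"

# Base-58 alphabet: alphanumerics minus the easily confused 0, O, I, l.
BASE58_CHARS = (("0".."9").to_a + ("A".."Z").to_a + ("a".."z").to_a) - %w[0 O I l]

# Sketch of a 28-character random key, similar in shape to an
# Active Storage blob key.
def random_base58_key(length = 28)
  Array.new(length) { BASE58_CHARS[SecureRandom.random_number(BASE58_CHARS.size)] }.join
end

key = random_base58_key
puts key         # a fresh 28-character key each run
puts key.length  # 28
```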
My problem is not one of security. It is more one of organization. I can point Rails Active Storage at a bucket. From that point on, the end result is, it takes the name of a file like “fizzbuzz.csv”, and regardless of all the blob processing it does, it stores that in the root of the bucket.
Examine that for a second. We can imagine we have a folder “cold-beer” if we simply prefix the name of the file “fizzbuzz.csv” with “cold-beer”. S3 and the AWS gem place the file in my-bucket/cold-beer/fizzbuzz.csv
So we did nothing to Rails Active Storage and compromised no security, but we did use a prefix to tell S3 “hey, throw this in the bucket as /cold-beer/fizzbuzz.csv”. The file itself is the same. Asking for it is the same. None of the fancy-pants work of Active Storage is really challenged here; it just insists on placing the key at the root of the bucket, instead of building one that happens to carry a prefix.
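The prefixing being argued for here is just string manipulation on the key; a hypothetical sketch (the `prefixed_key` helper is mine, not an Active Storage or AWS API):

```ruby
# Hypothetical: prepend a per-client folder to an object key.
# S3 has no real directories; "cold-beer/fizzbuzz.csv" is simply a
# key that the S3 console happens to display as a folder hierarchy.
def prefixed_key(client_folder, key)
  "#{client_folder}/#{key}"
end

puts prefixed_key("cold-beer", "fizzbuzz.csv")  # => cold-beer/fizzbuzz.csv
```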
All I can guess is that the Rails team working on Active Storage never uses S3 much, so they just don’t see the need to organize files this way… luckily, I think it is just fooling with the key, but perhaps I am naively missing something actually tough to do here.