Excuse the ignorance here. I have a form that uploads a CSV file to S3, via direct uploads. Since I expect only one file per record, my model has the needed has_one_attached attribute.
So I upload to S3, and now I need to run a background job to process the uploaded file. But I can only do that once the file, which could be huge and take a long time to upload, exists fully and completely at S3.
What is the standard way of knowing when that moment arrives in my Rails setup? I get it that records get updated in my model, and things get attached. But does any of that signal actual completion of the upload at S3? What takes care of that signal? If it is the JS, it means I have to listen to an event in JS, and then hit up the server with a message “hey, the upload completed”. So I am dependent on the JS that is uploading to S3 in this case?
Just checking… I have no idea if that is true or not, and if it is, what people would mostly do about it.
1. JS library sends a request to the Rails direct uploads controller to get the upload URL
2. JS library begins the upload of the file
3. JS library finishes the upload of the file
4. JS library creates a hidden field in the form, containing the identifier (signed blob id) of the uploaded file
5. Form is submitted
6. Controller action saves the record with the form params, which automatically attaches the file
This means that if you need to process something after a file has been uploaded, the best option is to enqueue the job in the controller action that attached the file, or in an after_commit callback on the model when there is a change to the has_one_attached attribute.
OK. Roger that. So when the controller attaches the file, even though that actual, potentially really big, file lives at S3, it is complete there, so we are free to spawn a background job now and process said file.
I guess I was thinking that that can’t be, as an upload could take say, 2 minutes. So for 2 minutes, the JS has to sit there, waiting. Until it gets an A-OK from S3 that all went well and that file uploaded, nothing server-side should happen. So this POST back to the server, about attaching the file, happens only after it is all done, and that indeed happens under the hood, auto-magically for me, thanks to Rails Direct Upload coding.
Correct. Which is why the direct upload example includes the code necessary to create a loading bar so that the user has some feedback on how much of the file has been uploaded.
And yes, as long as you use rails defaults, everything will be handled automatically in the controller. For example:
<%= form_with @photo do |form| %>
  <%= form.file_field :file, direct_upload: true %>
  <%= form.submit %>
<% end %>
# In the new action:
@photo = Photo.new

# In the create action:
@photo = Photo.new(photo_params)

# ...and on validation failure:
render "new", status: :unprocessable_entity
This should be everything for a minimalistic direct upload. The JS file will create a hidden field called file with the identifier for the uploaded file, and the controller will automatically attach the file when it notices that an identifier has been set to the file attribute of Photo which is a has_one_attached.
And you probably want to enqueue your job inside the if @photo.save, just before the redirect_to
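Assembled into one place, a minimal sketch of that controller (the `ProcessCsvJob` name is my own placeholder, not something defined in this thread):

```ruby
class PhotosController < ApplicationController
  def new
    @photo = Photo.new
  end

  def create
    @photo = Photo.new(photo_params)
    if @photo.save
      # By the time save succeeds, the blob already exists on S3,
      # so it is safe to hand the record to a background job.
      ProcessCsvJob.perform_later(@photo)  # placeholder job name
      redirect_to @photo
    else
      render "new", status: :unprocessable_entity
    end
  end

  private

  # The direct-upload JS fills the hidden "file" field with the
  # signed blob id before the form is submitted.
  def photo_params
    params.require(:photo).permit(:file)
  end
end
```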
It is interesting. I spent 10 years just using Sinatra and CarrierWave and a combo of Uppy and Dropzone, I never experienced the Rails way of doing things, hence my questions. I have not found too many clear examples explaining the Direct Upload for Rails. Explanations of the actual flow and delegated responsibilities I mean. I appreciate your huge effort here to inform me! Thanks!
Weird. I was looking for the active storage guides that I used when learning, but I can no longer find the js library examples.
The readme for active storage itself contains more practical examples than the rails guide on it:
And here’s the original code I used on our app to show upload progress for one of multiple files. It works by adding event listeners that wait for direct upload events, then using the information that comes with them to figure out how much of the file was uploaded and adjust the corresponding file progress bar.
Active Storage tries to make the question of “how” files are stored completely transparent to applications (which is why no extra columns in the database are needed when you need an attachment in a model, and the key is completely random), so defining a separate folder per attachment is not available as a feature.
Out of curiosity, what is the use case for choosing different folders? Is it security? Compliance?
Banal use case. One, any one account at S3 only gets 100 buckets. I have potentially hundreds of clients. I cannot just assign each one a bucket out of my supply. I would have to get them to get their own bucket, but then S3 is so “hard” to configure, that would fall to me, and that is a pain point.
So I just envisioned having one bucket and a folder per client. Since all the other upload services work this way, I assumed Active Storage would too. Now I understand. Active Storage does not work this way. Strange but true.
I would love to touch base with the person responsible for the Active Storage S3 service, and try and convince them that it is not really that hard to do. A prefix is not a terrible thing to use with a key. Would not require the whole service to be re-factored.
I honestly am so new to Active Storage that that is the source of my question. I had no idea it was limited in that fashion with S3. I could not see the light. Anyway, I accept that that is the way it is. I wish it was different, but I am in no position to argue!!
Yeah, Active Storage was created for a very specific use case and it’s taking a while to get it adapted to others. I use it mainly for image galleries and have to keep working around the fact that finding the url of an image takes 3 queries (one for the model with the image, one for the attachment and one for the blob)
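One standard mitigation for that query count (not mentioned in this thread) is the `with_attached_*` scope that `has_one_attached` generates; a sketch, assuming a `Photo` model with `has_one_attached :file`:

```ruby
# Preloads the attachment and blob rows alongside the photos, so
# computing each image's URL does not add two queries per record.
@photos = Photo.with_attached_file

# In the view, url_for(photo.file) can then resolve the blob
# without extra per-record lookups.
```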
But in this case, is there anything wrong with putting the images of all clients in the bucket’s root, without a folder/prefix? Active Storage uses a 28-character base-58 string for the key, and all its URLs are signed. It should be secure enough?
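For scale: 58^28 is around 10^49 possible keys, so guessing one is not realistic. A plain-Ruby sketch of generating a key of that shape (the exact alphabet and generator here are my assumptions, shaped to match the 28-character base-58 description above, not Active Storage’s own code):

```ruby
require "securerandom"

# Base-58 alphabet: alphanumerics minus the easily confused 0, O, I, l.
BASE58_CHARS = (("0".."9").to_a + ("A".."Z").to_a + ("a".."z").to_a) - %w[0 O I l]

# Sketch of a 28-character random key, similar in shape to an
# Active Storage blob key.
def random_base58_key(length = 28)
  Array.new(length) { BASE58_CHARS[SecureRandom.random_number(BASE58_CHARS.size)] }.join
end

key = random_base58_key
puts key         # a fresh 28-character key each run
puts key.length  # 28
```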
My problem is not one of security. It is more one of organization. I can point Rails Active Storage at a bucket. From that point on, the end result is, it takes the name of a file like “fizzbuzz.csv”, and regardless of all the blob processing it does, it stores that in the root of the bucket.
Examine that for a second. We can imagine we have a folder “cold-beer” if we simply prefix the name of the file “fizzbuzz.csv” with “cold-beer”. S3 and the AWS gem place the file in my-bucket/cold-beer/fizzbuzz.csv
So we did nothing to Rails Active Storage and compromised no security, but we did use a prefix to tell S3 “hey, throw this in the bucket as /cold-beer/fizzbuzz.csv”. The file itself is the same. Asking for it is the same. None of the fancy-pants work of Active Storage is really challenged here; it just insists on placing the key at the root of the bucket, instead of building one that happens to carry a prefix.
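The prefixing being argued for here is just string manipulation on the key; a hypothetical sketch (the `prefixed_key` helper is mine, not an Active Storage or AWS API):

```ruby
# Hypothetical: prepend a per-client folder to an object key.
# S3 has no real directories; "cold-beer/fizzbuzz.csv" is simply a
# key that the S3 console happens to display as a folder hierarchy.
def prefixed_key(client_folder, key)
  "#{client_folder}/#{key}"
end

puts prefixed_key("cold-beer", "fizzbuzz.csv")  # => cold-beer/fizzbuzz.csv
```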
All I can guess is that the Rails team working on Active Storage never uses S3 much, so they just don’t see the need to organize files this way… luckily, I think it is just fooling with the key, but perhaps I am naively missing something actually tough to do here.