Hi,
I’m starting to get back into Rails and loving it again.
After getting Active Storage working to upload an image (a user avatar), I wanted to handle a few things to make the experience more user-friendly and to make things as efficient and safe as possible on the server side.
I ran into a couple of things that I couldn’t find in the docs or through various community outlets. There are 3 questions, and I’m going to try to include code from Python / Flask where I can to show how I’ve handled this in the past, in the hope that it makes it easier to see how we could handle the same thing in Rails.
Efficiently setting a custom file name
At the moment Active Storage uses the file name provided by the user for the upload. This is problematic because someone could pick a questionable name. I know you don’t need to display it anywhere, but it’s still there for anyone to inspect in the HTML. Also, from a practical standpoint, we as the site creators should have full control over the file names being persisted.
I read that you can update the filename afterwards in the Active Storage tables, since the file name isn’t used for the file stored on disk or S3, but this requires a second query to update that table right after the user uploads the file.
Is there any way to set this file name before it’s saved to the correct table?
In Flask this would have been no problem, since you can set the name before persisting the file to disk. There’s really no code example needed here, since it’s just a matter of assigning a filename variable to whatever you want, such as a hashid of the user id.
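For reference, here’s a minimal Python sketch of that idea. It assumes a server-side secret and uses an HMAC of the user id as a stand-in for a hashid (it is not the hashids algorithm itself); `avatar_filename` and `SECRET_KEY` are hypothetical names. The client-supplied name is simply never used.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real app would load this from config.
SECRET_KEY = b"change-me"


def avatar_filename(user_id: int, extension: str, secret: bytes = SECRET_KEY) -> str:
    """Build a server-controlled file name from the user id.

    An HMAC of the id gives a name that is stable per user but doesn't
    expose the raw id. This is a stand-in for a hashid, not hashids itself.
    """
    digest = hmac.new(secret, str(user_id).encode("ascii"), hashlib.sha256).hexdigest()
    # Truncate to 16 hex characters (64 bits) to keep the name short.
    return f"{digest[:16]}.{extension.lower().lstrip('.')}"


# Whatever the client called the file (e.g. "my cat!!.JPG") is ignored.
print(avatar_filename(42, "jpg"))
```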
Validating images by comparing bytes in a stream
Relying on the Content-Type to validate an image (or any file) isn’t dependable. The value comes from the client, since it’s a header that can be spoofed. This could lead to a user uploading an executable with a content type of image/jpeg, which will cause all sorts of issues when you try to show an image that’s not an image.
Ideally, validations should stop these files from entering our system in the first place.
Typically you only need to look at the first few hundred bytes of a file to determine what it is. Python’s standard library has an imghdr.what function you can call, which returns the type of image something is (jpeg, png, etc.) or None if it’s not recognized as an image. You can pass it either a file path on disk or a byte stream, which lets you avoid writing the file to disk until it’s been deemed valid.
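Since imghdr was deprecated in Python 3.11 and removed in 3.13 (PEP 594), here’s a rough sketch of the same idea done by hand: check the leading "magic" bytes of the stream, then rewind so the stream can be reused. The signature table only covers a few common formats and is my own assumption, not a reimplementation of imghdr; `sniff_image_type` is a hypothetical name.

```python
import io
from typing import BinaryIO, Optional

# Leading "magic" bytes for a few common image formats (not exhaustive).
_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_type(stream: BinaryIO, peek: int = 32) -> Optional[str]:
    """Return 'jpeg', 'png', 'gif', 'webp', or None based on the first bytes.

    The stream position is restored afterwards, so nothing is consumed.
    """
    start = stream.tell()
    head = stream.read(peek)
    stream.seek(start)
    for magic, kind in _SIGNATURES:
        if head.startswith(magic):
            return kind
    # WebP files are a RIFF container: "RIFF" + 4-byte size + "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    return None


png_upload = io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 24)
exe_upload = io.BytesIO(b"MZ\x90\x00\x03\x00\x00\x00")  # start of a Windows .exe
print(sniff_image_type(png_upload), sniff_image_type(exe_upload))
```

Because only the first few bytes are read, a spoofed Content-Type header has no influence on the result.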
Likewise, you can validate the file’s size by passing the same byte stream to a function that seeks to the end to get its length. Reusing the stream is nice because you’ve now validated two things without writing anything to disk.
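A minimal sketch of the size check, assuming a seekable stream. `MAX_AVATAR_BYTES` (2 MB here) and both function names are hypothetical.

```python
import io
import os
from typing import BinaryIO

MAX_AVATAR_BYTES = 2 * 1024 * 1024  # hypothetical 2 MB limit


def stream_size(stream: BinaryIO) -> int:
    """Return the total size in bytes by seeking to the end, then restore the position."""
    start = stream.tell()
    size = stream.seek(0, os.SEEK_END)  # seek() returns the new absolute position
    stream.seek(start)
    return size


def is_within_limit(stream: BinaryIO, max_bytes: int = MAX_AVATAR_BYTES) -> bool:
    return stream_size(stream) <= max_bytes


upload = io.BytesIO(b"x" * 1024)
print(stream_size(upload), is_within_limit(upload))
```

Since the position is restored, the same stream can go through the type check and the size check in either order.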
All in all, this is really efficient and safe. It means you can validate that you really have an image and that it’s under the size you want to accept, without ever writing the file to disk or executing a single database query.
Transform the original image (not variants)
Ideally we’d be able to resize and convert the original image itself, instead of saving the original as-is and then creating variants from it. That avoids storing a massive 4 MB JPEG a user uploaded when we’re only ever going to display a 300x300 (or smaller) version that’s probably around 5 KB.
Continuing from above: once we have a stream of bytes that’s been determined to be valid, it’s time to accept the file, write it to disk, and run a single DB query to store some metadata about it so we can look it up later.
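A rough sketch of that step, assuming the stream has already passed validation: copy the bytes to disk under the server-chosen name, then record the metadata with one INSERT. The `uploads` table, its columns, and `persist_upload` are hypothetical; sqlite3 and a temporary directory stand in for the real database and storage.

```python
import io
import os
import shutil
import sqlite3
import tempfile
from typing import BinaryIO


def persist_upload(stream: BinaryIO, directory: str, filename: str,
                   content_type: str, conn: sqlite3.Connection) -> str:
    """Write an already-validated stream to disk, then record metadata with a single INSERT."""
    path = os.path.join(directory, filename)
    stream.seek(0)
    with open(path, "wb") as f:
        shutil.copyfileobj(stream, f)  # copies in chunks instead of one big read
        byte_size = f.tell()
    # The only DB query in the whole upload flow (hypothetical table layout).
    conn.execute(
        "INSERT INTO uploads (filename, content_type, byte_size) VALUES (?, ?, ?)",
        (filename, content_type, byte_size),
    )
    conn.commit()
    return path


# Usage with an in-memory database and a throwaway directory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (filename TEXT, content_type TEXT, byte_size INTEGER)")
with tempfile.TemporaryDirectory() as tmp:
    saved = persist_upload(io.BytesIO(b"\x89PNG\r\n\x1a\n"), tmp, "a1b2c3.png", "image/png", conn)
    with open(saved, "rb") as f:
        saved_bytes = f.read()
rows = conn.execute("SELECT filename, content_type, byte_size FROM uploads").fetchall()
```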
In Python you can use various image-processing tools at this point to resize and convert the image however you see fit, for example converting the avatar to a 300x300 JPEG, and then optionally create thumbnail variants too. All of these variants could use your custom file name as a base as well.
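As a sketch of the geometry and naming side of that step: compute a centered square crop so the avatar isn’t stretched, and derive each variant’s name from the custom base name. The function names, the `_300x300` naming pattern, and the sizes are my own assumptions. The actual pixel work would be done by an imaging library; the comment at the end shows roughly how it could look with Pillow, which isn’t imported here.

```python
from typing import Dict, Iterable, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), as Pillow's crop() expects


def center_square_box(width: int, height: int) -> Box:
    """Return the largest centered square inside a width x height image."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


def variant_names(base: str, sizes: Iterable[int], ext: str = "jpg") -> Dict[int, str]:
    """Map each square size to a file name derived from the custom base name."""
    return {size: f"{base}_{size}x{size}.{ext}" for size in sizes}


print(center_square_box(4000, 3000))
print(variant_names("a1b2", [300, 64]))

# With Pillow, each variant could then be produced roughly as:
#   img.crop(center_square_box(*img.size)).resize((size, size)).convert("RGB").save(path, "JPEG")
```

Only the resized JPEGs would be written; the 4 MB original never has to be stored.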
Rails makes it easy to create and transform variants, but I didn’t see anything for transforming the original image, especially not before it’s first persisted in the DB and stored as a blob on disk.
All in all, those are the 3 problems I’m trying to solve with Active Storage. I know there are a few third-party validation libraries for Active Storage, but I didn’t see any that handle the above use cases. One of them did handle checking the size of the blob, though from the code it’s not clear (to me) whether that blob is read from the DB afterwards or whether those are the bytes of the file in memory before a single DB query has been executed.
Thanks for reading!