While I agree that projects that will handle large amounts of user images should go with imgproxy (or similar) from the start (and if not for the fact that we have 5 years’ worth of code built on top of Active Storage variants, we’d also make the switch), I’d still like to offer a few comments on your upsides:
The Go code in imgproxy will execute faster than the Ruby code in Rails. That said, this part of the workflow should be only a small percentage of the total response time once you account for the time taken to download/upload (network and storage latency) and to process the image itself (libvips/CPU).
It’s libvips that has to handle the image bomb, not imgproxy or Rails. Still, you are right that crashing imgproxy is preferable to crashing your Puma servers, and if you have strict security requirements, then Active Storage is a no-go. That said, the libvips maintainers are pretty good about handling CVEs.
I have to partially disagree with this one. Active Storage variants are fully baked into Rails: you just call .variant and the framework handles the rest. Using imgproxy is not going to reduce the amount of code you have; it simply replaces the .variant call with an imgproxy_url call, as the sketch below shows. I say partially because I know some people create jobs to pre-process variants, and in that case, yes, it’s less code. Also, if you are using a CDN other than Cloudflare (say, CloudFront), you probably have some awkward code to generate the routes you need, which would also go away.
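For illustration, the swap looks roughly like this. A sketch assuming the imgproxy.rb gem with its Active Storage integration (Imgproxy.extend_active_storage! in an initializer) and a User model with an attached avatar:

# Active Storage variant: Rails (via libvips) processes the image itself.
url_for(user.avatar.variant(resize_to_limit: [300, 300]))

# imgproxy: Rails only builds a signed URL; the imgproxy server does the work.
user.avatar.imgproxy_url(width: 300, height: 300)

Either way it’s one call per image, which is why the line count barely changes.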
PS: Sorry for the ping, @Envek. I forgot to hit reply to your message.
Thanks for the great post! I’m curious about how you handle virus checks for uploaded files.
Our company uses a ClamAV server, which our Rails app calls before uploading files to backend storage. We also considered direct uploads, but since we wanted to perform the virus check before rather than after the upload, we abandoned that idea. The decision was also influenced by the nature of our application, which doesn’t handle a large volume of uploads.
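For anyone wanting to try the same approach, here’s a minimal sketch of such a pre-upload scan. The gem and wiring are assumptions on my part (the clamav-client gem talking to a running clamd), not necessarily what’s described above:

require "clamav/client"

# Stream the uploaded file to clamd and check the verdict before attaching it.
def virus_free?(io)
  client = ClamAV::Client.new # assumes clamd is reachable on its default socket
  response = client.execute(ClamAV::Commands::InstreamCommand.new(io))
  response.is_a?(ClamAV::SuccessResponse)
ensure
  io.rewind
end

# In the controller, before record.file.attach(params[:file]):
#   head :unprocessable_entity unless virus_free?(params[:file].tempfile)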
RE validation: there have been a few attempts at this. https://github.com/rails/rails/pull/41178 is one example, and it builds on several others. The hard part is how direct uploads work; I think in my PR I found a path that would work with direct uploads, but it never got enough review and ultimately went stale.
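Until something like that lands, the usual workaround is a hand-rolled model validation. A sketch with made-up limits; note that with direct uploads the blob is created before this ever runs, which is exactly the hard part:

class Document < ApplicationRecord
  has_one_attached :file

  validate :acceptable_file

  private

  # Manual checks, since Active Storage ships no built-in validators.
  def acceptable_file
    return unless file.attached?

    errors.add(:file, "is too large (max 10 MB)") if file.blob.byte_size > 10.megabytes
    errors.add(:file, "must be a PDF") unless file.blob.content_type == "application/pdf"
  end
end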
My philosophical take on Active Storage is that it’s a good example of Rails being great if you want to build a Basecamp (or Hey) clone, but not so great for many other use cases: for example, working with files that carry business logic, managing lots of files, or validating uploads. I think it does Rails a disservice to have a first-class feature that works really well for specific use cases but does not generalise well. If it were up to me, it would be a second-party gem. Contrast this with Active Record, which works very well in most web apps and as a result is very popular.
(I’d make the same argument about Action Cable, and a similar one about Action Text and Action Mailbox, though I think the issue with those is less about not working for general use cases and more about only solving niche use cases in the first place.)
That depends on how important fast-loading images are to you, how many of them you have, and whether your users are concentrated somewhere Cloudflare has an R2 datacenter.
Every storage service has some latency to locate and serve the file you want, and just because Cloudflare is a CDN does not mean that every file you upload to R2 is automatically cached in its PoPs.
So if you are, say, in the US and don’t mind a small delay when images display, you should probably use public mode instead of proxy mode. That way you put no load on your servers and pay nothing for outbound data, since R2 only charges for storage and requests.
However, if you are somewhere in South America, like me, where there are no R2 datacenters and displaying images quickly matters, your best bet is proxy mode with a reverse proxy in front of Puma that can cache your files (like nginx), so you avoid both the extra load on the Puma servers and the latency of streaming files from storage. Of course, that means paying for outbound data to your host (AWS, GCP, Azure, etc.).
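For reference, Rails 6.1+ lets you pick between those two delivery modes app-wide (a sketch; per-URL route helpers exist as well):

# config/application.rb (or an environment file) - pick one of the two:

# Redirect mode (the default): Rails answers with a 302 pointing at the
# storage service, so the browser downloads straight from R2/S3.
config.active_storage.resolve_model_to_route = :rails_storage_redirect

# Proxy mode: Puma streams the file itself, so a CDN or an nginx cache in
# front of the app can cache the response.
config.active_storage.resolve_model_to_route = :rails_storage_proxy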
I was thinking that Cloudflare’s Cache Reserve might help solve this by automatically keeping the images in an S3-style store, so that if the cache is deleted in some datacenter, the request goes to the reserve first before hitting the origin server. Do you think that would be useful?
While testing, I noticed that caching with Cloudflare was not working until I set a rule. This is because Cloudflare does not cache responses that arrive with cookies, so I made a small PR that disables the session, in case anyone wants to check it out. It turns out several others had hit this problem, and a solution had been proposed a while back.
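For context, the gist of such a fix is to keep Rails from touching the session (and therefore setting a cookie) in the controllers that serve files, since Cloudflare won’t cache responses carrying Set-Cookie. A minimal sketch of the mechanism, with a hypothetical controller name; newer Rails versions ship a similar concern for the built-in Active Storage controllers:

class FilesController < ApplicationController
  # Skip the session entirely so no Set-Cookie header is added to file responses.
  before_action { request.session_options[:skip] = true }
end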
I made the same journey and ended up going back to CarrierWave because of the difficulty of serving the uploaded files without hitting the Rails app. While I was impressed by AS in general, this ended up being a show-stopper for my particular application. Involving a third-party service to serve key parts of a site is not always a good option, and I wish AS would add support for local storage that is compatible with popular front-end proxies like NGINX. If/when that happens, I would be happy to switch back, because AS offers many other benefits. Perhaps this functionality could be added as a separate gem?
I am in the position where I have to choose between CarrierWave and Active Storage. My use case is pretty simple: I have ~30 products, and for each product I have 3–4 images. I do not want to use Amazon S3 or any other external service for just 120 pictures; I think it’s overkill, and I’d prefer to store them locally on my server. Does Active Storage provide a way to keep images on the server without S3 or similar services?
Yes, it does. You can define a local service in config/storage.yml like so:
local:
  service: Disk
  root: <%= Rails.root.join("storage") %>
And tell your app to use it in your environment file with:
config.active_storage.service = :local
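Attaching then works the same as with any other service. A quick sketch, assuming a Product model:

class Product < ApplicationRecord
  has_many_attached :images
end

# Files land under storage/ on the server's disk; no S3 involved.
product = Product.first
product.images.attach(
  io: File.open("photo.jpg"),
  filename: "photo.jpg",
  content_type: "image/jpeg"
)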
By default, Active Storage uses signed access for every file it stores, but this is unnecessary overhead for images on a public website. You can disable signed access by adding
public: true
to your service definition in config/storage.yml. This reduces the overhead (and the length of the <img> URLs), but you’ll find that you still cannot serve those images without going through Active Storage. They exist on disk only as blobs, and I spent quite some time trying to figure out how to give my front-end server access to them. My conclusion: it’s like forcing a square peg into a round hole. If it’s even doable, it’s more effort than it’s worth. Active Storage was not designed for this usage scenario; it’s designed for storing large numbers of files on distributed CDN services where you need high security and high granularity. I’m sure it’s super duper awesome for that, but like you, my needs are more prosaic, and AS is simply not a good fit for them.
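To make the square peg concrete, here is roughly what the Disk service leaves on disk (a sketch; the key value is made up):

blob = product.images.first.blob

blob.key
# => "zuy9k3vbqdpz..." (an opaque random key: no filename, no extension)

# The file sits under a path derived from that key, e.g. storage/zu/y9/zuy9...,
# which is all nginx would have to go on - no content type, no original name.
ActiveStorage::Blob.service.path_for(blob.key)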
It’s telling how the Active Storage documentation describes the kind of local disk storage defined by the configuration above:
Active Storage facilitates uploading files to a cloud storage service like Amazon S3, Google Cloud Storage, or Microsoft Azure Storage and attaching those files to Active Record objects. It comes with a local disk-based service *for development and testing* and supports mirroring files to subordinate services for backups and migrations.
My emphasis. I actually read this paragraph before starting my project, and I did reflect on the strange wording around local disk storage, but didn’t take it seriously. In hindsight, that was a mistake.
Thanks for taking the time to write this up, @brenogazzola. As a newcomer to AST, I’ve spent the last week getting to grips with it, reading docs and many blog articles. Your post, with its logical sections, lists, and bullets covering the workings and features, was most helpful to me in understanding and visualising AST. Much appreciated.