CDN caching for Active Storage private assets without exceeding memory

After implementing responsive images with Active Storage in our Rails 7 app, we encounter Heroku “memory quota exceeded” errors. What’s the proper way to limit memory usage? Should caching be used, and if so, how, given that proxy URLs can be short-lived?

All this concerns a single image, which changes once per day and is rendered in variants at 4 different resolutions for different screen widths. The assets are stored privately in AWS S3 (there is no public: true). The variants are generated by Vips. The variant URLs are rendered in an img srcset attribute using this helper.
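For readers unfamiliar with the pattern, a hypothetical sketch of such a srcset helper (the names and widths are illustrative, not the original poster’s code):

```ruby
# Illustrative widths for the 4 variants mentioned above.
WIDTHS = [640, 1024, 1600, 2048].freeze

# Pairs each variant URL with its width descriptor for the srcset attribute.
def srcset_for(variant_urls)
  variant_urls.zip(WIDTHS).map { |url, w| "#{url} #{w}w" }.join(", ")
end

# In a Rails view, the proxy URLs for the variants would be generated with
# something like (assumption):
#   urls = WIDTHS.map { |w| url_for(image.variant(resize_to_limit: [w, nil])) }
#   image_tag urls.last, srcset: srcset_for(urls), sizes: "100vw"
```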

If you are still using ImageMagick, the first thing you should do is replace it with libvips. For a complete explanation, check my post here, which explains why Heroku + Active Storage + ImageMagick is so problematic.

Also, I’m not sure if you are using proxy or redirect mode. Redirect should not consume extra memory, but proxy will, so make sure you have a CDN (e.g. Cloudflare) in front of your app that can absorb some of the impact.
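The two modes are selected with a single config option; a sketch (proxy mode streams the file through the app, with the memory cost described above, but makes the response CDN-cacheable; redirect mode, the default, returns a 302 to a short-lived signed service URL):

```ruby
# config/application.rb (sketch)
config.active_storage.resolve_model_to_route = :rails_storage_proxy
# or the default:
# config.active_storage.resolve_model_to_route = :rails_storage_redirect
```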


@brenogazzola thanks for the tips!

  • We are already using Vips, not ImageMagick, and have been since we started using Active Storage in Rails 6 (and BTW, thanks for the PR that made Vips the default!)
  • We use CloudFlare in front of the app. We’ve added this config to make sure caching works (cf-cache-status: HIT).
  • We use Rails.application.config.active_storage.resolve_model_to_route = :rails_storage_proxy because the Rails documentation says that “this can be useful for serving files from a CDN”.

We initially came up with the above settings to deliver public image assets with Active Storage. However now we would like to deliver private images and audios stored in S3 (without public: true in the config).

Based on the issue I described in this thread, repeatedly proxying the asset data through the Rails server doesn’t look like the best idea. So, what is the proper way to deliver private assets while keeping performance in terms of delivery time to the user and not overusing server memory? Is there a way to use redirect mode while still having assets cached on edge servers, yet keep the exposed URLs short-lived?

This is in effect how CloudFront works, to my understanding. Can something similar be done with Active Storage or should we just go back to using CloudFront? How is this solved in other companies?

If you simply want assets to be private and not overload your servers, you can force Rails to use the redirect controller instead of the proxy controller with the rails_storage_redirect_path helper. There will be a performance hit because Rails will return a redirect and the asset will be pulled from storage (S3) instead of being served by the CDN.
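The mode can also be chosen per call site rather than globally; a sketch of such a view snippet (the model and variant options are illustrative):

```erb
<%# Redirect mode for this one image: Rails 302s to a short-lived S3 URL %>
<%= image_tag rails_storage_redirect_path(@post.image.variant(resize_to_limit: [1024, nil])) %>
```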

Otherwise I’m not sure private assets and CDN caching are compatible. If you let an asset be cached by the CDN, there is a URL that Cloudflare will serve without reaching your servers, which means someone could share it by accident. And if you make that URL temporary, then after it expires you’d have to proxy the file again.

However, all that is probably irrelevant. Cloudflare has a lot of PoPs (46 in NA alone), and unless you are paying for Argo, each PoP has its own cache. Files are only cached after the third hit (so you might have to proxy each file up to 138 times for NA alone), and unless they are accessed frequently, they are going to expire.

I don’t know what your use case for private files is, but I think in real-world usage their cache hit rate would end up being pretty low. It is also something you should pay attention to with your public assets: you might find out that you are proxying them a lot more frequently than you thought. Try checking your Cloudflare dashboard for the hit rates.

There’s another option for public files that lets you bypass the proxy controller and still get CDN caching. Let’s say your domain is example.com:

  1. Go to S3 and create a bucket named assets.example.com.
  2. Set that bucket as your Active Storage bucket in storage.yml, and set it to public: true.
  3. Go into Cloudflare and create a CNAME for that bucket: assets.example.com → assets.example.com.s3.us-east-1.amazonaws.com.
  4. Add Cloudflare page rules to force caching on the subdomain.
  5. Finally, instead of passing blob to the image_tag, do this:
image_tag blob.url(virtual_host: true)

The end result will be something like this:

<img src="https://assets.example.com/ozf663sus62msm00fwcycqadnnqp"/>
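Steps 1–2 as a storage.yml sketch (the service name, region, and credential handling are assumptions):

```yaml
amazon:
  service: S3
  bucket: assets.example.com
  region: us-east-1   # assumption
  public: true
```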

Thank you @brenogazzola. Concerning your last suggestion: since public: true is set, the URLs will be long-lived and can therefore be shared around, correct?

What I am looking for is a system like Amazon CloudFront using signed URLs (signed by AWS Presigner): the assets are cached at the edge for fast delivery, however the URLs are short-lived so the asset needs to be re-accessed every time with a newly generated URL, which therefore requires the user to be authenticated in the app.
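The principle behind such signed URLs can be shown in a toy sketch (this is NOT CloudFront’s actual scheme, which uses RSA-signed policies via the aws-sdk gem’s Aws::CloudFront::UrlSigner; the secret and names here are illustrative): the URL carries its own expiry plus a signature, so the edge can verify both before serving the cached object, and the cached copy itself never needs to expire with the URL.

```ruby
require "openssl"

SECRET = "demo-secret" # assumption: key shared between app and edge

# Issue a URL that is valid until the given Unix timestamp.
def signed_url(path, expires_at:)
  token = OpenSSL::HMAC.hexdigest("SHA256", SECRET, "#{path}|#{expires_at}")
  "#{path}?expires=#{expires_at}&token=#{token}"
end

# What the edge would check: the URL is unexpired and the signature matches.
def valid_request?(path, expires, token, now:)
  now <= expires.to_i &&
    OpenSSL::HMAC.hexdigest("SHA256", SECRET, "#{path}|#{expires}") == token
end
```

Note the separation: `expires` gates who may fetch, while how long the edge keeps the object cached is a separate, independent setting.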

Without this possibility, I don’t really understand the interest of non-public files in Active Storage. It sounds like we are forced to choose between security and performance, but nowadays to be able to serve a worldwide audience we really need to be able to deliver through a CDN. For instance in our case we have servers in Europe but customers in Australia, and without a CDN the download times are just too long. On the other hand, we do want to be able to deliver more sensitive (e.g. paid-only) assets. How could we technically overcome this dilemma? It would be wonderful if it could be achieved somehow with Active Storage to benefit from the deep Rails integration, variants, etc. (In the future I would even hope to revive this PR to enable variants for non-image assets.)

Right, so I don’t think you can get that with Cloudflare. However, maybe you can do it with CloudFront? If you check the url method for blobs in the S3 service, you will see it handles public and private assets differently, and it can also take an options hash that it forwards to the aws-sdk-s3 gem. So if the gem itself has CloudFront support (maybe with the virtual_host option?), you might be able to do what you need.

service.rb:

    def url(key, **options)
      instrument :url, key: key do |payload|
        generated_url =
          if public?
            public_url(key, **options)
          else
            private_url(key, **options)
          end

        payload[:url] = generated_url

        generated_url
      end
    end

s3_service.rb:

      def private_url(key, expires_in:, filename:, disposition:, content_type:, **client_opts)
        object_for(key).presigned_url :get, expires_in: expires_in.to_i,
          response_content_disposition: content_disposition_with(type: disposition, filename: filename),
          response_content_type: content_type, **client_opts
      end

      def public_url(key, **client_opts)
        object_for(key).public_url(**client_opts)
      end

As for the PR, it seems Active Storage has no official maintainer right now. I also had a few PRs I decided to close since they got stuck after the first few reviews…

Interesting, I could investigate that. But I think I first need to understand when CloudFront expires cached assets. My basic understanding of CDN caching is that it’s worthwhile only if the assets seldom expire. Whereas for security reasons, URLs should be short-lived. Therefore to meet both needs of performance and security, my understanding is that there would be a need to have a long caching time, but a short URL expiration time.

Based on the private_url shown above, however, I see a single expires_in, which seems to suggest the same value is used for both cache expiration and URL expiration? Or maybe it only expires the URL, and the asset stays for an undetermined while longer in the cache of the CloudFront edge server?

I must say I still understand too little about how this all works. I’ll try to do more code/doc reading and testing. In any case, thank you @brenogazzola for your very helpful suggestions.
