Local ActiveStorage and URL expiration with big (~long) file uploads

Generally, the local Active Storage service seems to be a bit unloved. I consider it an important part, though, because in my belief it should be conceptually relatively easy to self-host everything (that it comes with maintenance and management costs is a different story).

It took me a while to understand why my “big” (> 200MB) and thus long (> 5 minutes because of upload-bandwidth restrictions on the server) uploads on a production server failed. I looked for the culprit in the nginx settings before finding the fix, which was to set:

config.active_storage.service_urls_expire_in = 1.hour

Then I asked myself: why does local disk storage actually need expiring URLs (if not for security by obscurity only)?
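
For anyone landing here with the same symptom, this is roughly where such a setting would live. It is a minimal sketch, assuming a standard Rails application with a config/environments/production.rb file; the one-hour value is simply the one from this post and not a recommendation.

```ruby
# config/environments/production.rb (assumed location)
Rails.application.configure do
  # Signed Active Storage service URLs expire after this duration.
  # The documented default is 5 minutes, which a slow upload can exceed.
  config.active_storage.service_urls_expire_in = 1.hour
end
```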


This seems like a really interesting dark corner you’ve found!

It sounds like you think a simple general fix might be to change the local backend to not have expiring URLs. Is my understanding correct?

The purpose of the disk service is to emulate cloud services locally. It uses signed, short-lived URLs for the same reason as S3/GCS/Azure: to grant temporary read or write access to a stored object. The access grant isn’t permanent, so the signature expires. You can implement different authentication mechanisms with custom controllers.
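
To make the "signed, short-lived URL" idea concrete, here is a conceptual sketch in plain Ruby. It is not Active Storage's actual implementation: the `SECRET` constant, the token format, and the `sign_key` / `verify_token` helpers are all hypothetical names made up for illustration. It only shows why a token minted with a 5-minute lifetime stops working for an upload that takes 6 minutes, while a 1-hour lifetime still works.

```ruby
require "openssl"
require "json"

# Hypothetical secret, for illustration only.
SECRET = "demo-secret-not-for-production"

# Constant-time string comparison so the check does not leak timing information.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Build a token: hex-encoded JSON payload (storage key + absolute expiry),
# then a dot, then an HMAC-SHA256 signature over that payload.
def sign_key(key, expires_in:, now: Time.now)
  payload = JSON.generate({ "key" => key, "exp" => now.to_i + expires_in }).unpack1("H*")
  signature = OpenSSL::HMAC.hexdigest("SHA256", SECRET, payload)
  "#{payload}.#{signature}"
end

# Return the storage key if the signature is valid and the token has not
# expired at time `now`; return nil otherwise.
def verify_token(token, now: Time.now)
  payload, signature = token.split(".", 2)
  return nil if payload.nil? || signature.nil?

  expected = OpenSSL::HMAC.hexdigest("SHA256", SECRET, payload)
  return nil unless secure_compare(expected, signature)

  data = JSON.parse([payload].pack("H*"))
  data["exp"] > now.to_i ? data["key"] : nil
end

# Usage: a 5-minute grant versus a 1-hour grant, checked 6 minutes later.
issued_at = Time.at(1_700_000_000)
short = sign_key("big-upload.bin", expires_in: 300, now: issued_at)
long  = sign_key("big-upload.bin", expires_in: 3600, now: issued_at)

puts verify_token(short, now: issued_at + 360).inspect
puts verify_token(long, now: issued_at + 360).inspect
```

In this sketch the short-lived token is rejected at the 6-minute mark and the long-lived one is still accepted, which mirrors the failure mode described in this thread.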

Configuring a different signature lifetime is legit. That’s why it’s configurable! In your case, it sounds like large uploads are taking longer than the default expiry to hit app servers due to nginx spooling to disk before proxying. Totally sensible to turn the signature expiry knob, then.

As an aside, this isn’t security through obscurity. That refers to obfuscating the implementation of a security mechanism, relying on attackers not knowing how the system works. The system in question isn’t obfuscated; it’s open-source for all to read.


I think it’s also fair for us to commit to the fact that Active Storage was specifically designed and intended to work with cloud storage services. The entire API and modeling of the problem is focused on solving that problem first. We would have made different design decisions if the primary target had been local disk. But we should be explicit about that! It may well be that older file storage solutions like CarrierWave or whatever are better suited for the local-disk approach. We shouldn’t be shy about recommending those options.

(But also, if we can make the experience better with local disk without compromising the primary objective – cloud storage – then that’s fair game too!)


My original comment probably wasn’t constructive enough. Here are some paths forward. We can:

  • Better document how and why to tune signature expiry when using the disk service in production
  • Better document how to use the disk service in production in general
  • Clarify that the disk service is intended for local development first
  • Advise when to consider another storage solution, like David said
  • Tune the default expiry

I do not think defaulting to non-expiring URLs for local disk storage is a good change. I posted it here because it’s a WTF - I do not recall the error message, but it certainly could have been better (although it might be difficult to implement - I do not remember if the response was a timeout or, more likely, a 404).

Regarding the different controller - I figured that out.

There might be technical details I do not understand - but an access grant is a kind-of security thing, isn’t it? And I also do not mean it “bad” when saying “security by obscurity” - in my use case I have both cases: stuff that shouldn’t be accessible to non-authenticated (or non-authorized) users, but also kind of “static” content. Especially for the static content I’d like the URLs not to change regularly, mainly for caching reasons. I know that this can be achieved with some customized controllers and configurations - which is great design.

Yep, I think that this is a “trap” I ran into. Again, it’s solvable, but most people seem to use the local approach only for dev and testing (as intended), so I couldn’t find good solutions/documentation on it. Conceptually I think I know what could be done to improve the situation (custom controllers and just a handful of configuration logic lines - I guess we are probably speaking about < 50 LOC total here), but so far I just lack the resources and the pressure. With some caching/performance costs the longer-living URLs work: people will have a hard time guessing the URLs of the uploaded content, but I can still upload big files.

On a side note (I can open a different topic for that if you like): if RAILS_SERVE_STATIC_FILES is set (e.g. on Heroku or Dokku), I think there is no reason not to set public_file_server.enabled = true and public_file_server.headers = { 'Cache-Control' => "public, max-age=#{6.months.to_i}" }. This will add cache headers to make e.g. Google happy.
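
Spelled out as a config fragment, the side note above might look like the following. This is a sketch, assuming a standard config/environments/production.rb; the six-month max-age is just the value mentioned in this post, and whether it suits a given app is a separate question.

```ruby
# config/environments/production.rb (assumed location)
Rails.application.configure do
  # Serve files from public/ only when the environment variable is present,
  # as platforms like Heroku or Dokku typically set it.
  config.public_file_server.enabled = ENV["RAILS_SERVE_STATIC_FILES"].present?

  # Double quotes are required here so that #{...} is interpolated.
  config.public_file_server.headers = {
    "Cache-Control" => "public, max-age=#{6.months.to_i}"
  }
end
```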

Ah, good point! We can definitely improve that error.


(This question is to everyone, it’s just sparked by this quote.)

Are the appropriate design decisions for a production-focused disk backend & a “local testing” focused disk backend sufficiently different that it’s not appropriate to treat them as the same solution?

I ask because, while I can see “better advice around config tuning” being the solution here, I also wonder whether a separate disk backend might emphasize the idea that local is just for dev mode.

I think a swappable RepresentationController that doesn’t expire URLs at all could serve some. Again, my use case also handles “non-static”/protectable content, so I wouldn’t want to use this for all uploaded content - but I guess other “all public” applications might benefit from that. And I am also not sure if it really is RepresentationController; I am reconstructing the situation from memory.

So, maybe a Guide section about the implementation and the source locations to hook into (like RepresentationController) might be worthwhile - if that API is considered stable.

I like the idea of having the guide walk you through “here’s how to serve from your own controller”. We tell you that this is what you have to do in certain security situations via the comments, but we don’t actually show you HOW to do it! If you’re interested in exploring this guide fix, please do. Feel free to tag me on the PR :pray:


Betsy, you mean that DiskService should have been called LocalService or even TestService to emphasize its intentions? I could see that. But I’d actually like to see if we can’t get DiskService to a place where it actually could do the disk service at an adequate level. I’m mostly just explaining how it came to be: ASt was designed for cloud storage, and we mostly just threw in the DiskService, such that there’d be an option to test with. But for someone doing real work with the DiskService to upgrade it to better fit production use (without taking anything away from its role as a testing service for the predominant cloud storage case) would also be great!


“Interested”? Sure enough! Likely that it can happen soon and with quality in my current lifestyle/conditions? Not so sure.

If anyone takes a stab - feel free to reach out. I also take the responses as permission to create a GitHub issue for this once it moves towards my keyboard, and I would link the issue here. If somebody else is quicker, please add me to the issue’s nosy list, thanks.