[Question] Where in the Active Storage codebase are files closed?

I have been reading PR #45606 that introduced File and Pathname attachables, which is great for testing.

But the PR diff shows that typical usage for a file is to do something like User.create!(avatar: File.open("image.jpg")), and for a pathname the code does attachable.open.

What strikes me is that it doesn’t seem like the file descriptors are closed, unless they get closed somewhere else. I can’t find any close call in the code path that this PR touches. I have checked Attached::Changes::CreateOne, ActiveStorage::Blob, and a few other files.

Can someone point me to the right location in the codebase where the closing is happening? Or is there a potential issue here?

I did a quick dive and only saw .close called when downloading, transforming, or previewing on the new file.

In fact, this actually makes sense. If you were to call .close on the attachable being passed in, then you could cause unexpected behavior when attaching a file object. I.e.

avatar = params[:user_upload] # some IO object

# Update the avatar
current_user.update(avatar: avatar)

# If ActiveStorage called .close on the attachable, then the following
# code would raise an IOError
BespokeMirrorService.backup_to_s3_glacier(avatar: avatar) 

It’s really not a concern though, because once you dereference the attachable variable (which is “immediately” in the case of User.create!(..., avatar: File.open('avatar.jpg')) then it’s eligible for garbage collection, and Ruby’s garbage collector has a special function for handling open files, more on that at the end.

Here is the code snippet that’s actually running:

#lib/active_storage/attached/model.rb
def #{name}=(attachable)
   ...
   ActiveStorage::Attached::Changes::CreateOne.new("#{name}", self, attachable)
end

If you trace everything that runs from here, only the CreateOne object maintains a reference to the attachable, and by design the object is pretty short lived. So the attachable is available for garbage collection basically as soon as you’re done attaching it to the blob/attachment.

As for the Ruby Garbage Collector, I’m very new to C and haven’t explored the Ruby internals all that much. But AFAIK the relevant function is rb_gc_obj_free, which for a dereferenced file calls the following

      case T_FILE:
        if (RFILE(obj)->fptr) {
            make_io_zombie(objspace, obj);
            RB_DEBUG_COUNTER_INC(obj_file_ptr);
            return FALSE;
        }
        break;

At this point, I think we can just trust based off naming and Ruby people knowing what they’re doing that this function probably closes the IO object properly. I tried tracing it a bit, but it appears as though the rb_io_close and rb_io_fptr_finalize_internal which are the two functions called based off using IO#close or waiting for the GC to sweep the file record work sufficiently differently that I don’t feel like getting to the root :stuck_out_tongue:

1 Like

Thanks, I didn’t know that IO objects were automatically closed when garbage collected.

And Rails indeed never closes the file, as shown in this example:

file = File.open('path/to/image.png')
User.create!(avatar: file)
file.closed? # => false

I don’t know if this is a good thing though.

In Ruby people often use File.open('myfile') { |file| ... } so the file gets automatically closed at the end of the block. Closing files as soon as you’re done with them is considered a good practice, and implicitly relying on the garbage collector to close your IO objects doesn’t strike me as a great practice as you can’t predict when or even if the garbage collector will run. I may be wrong, but if your Ruby program is long-running with a somehow badly optimized or misbehaving GC that never runs, you could end up with a resource leak where you run out of file descriptors because you have too many of them open. This seems to me like a cause for concern.

Now if you provide File.open('path/to/image.png') as shown in my example, maybe it is fine for Rails to leave the file open as you could argue that it is the user’s responsibility to close it, and as you mentioned maybe the user wants to use that opened file for some further processing. But if you provide Pathname.new('path/to/image.png'), Rails automatically opens it here and here, and doesn’t return a reference to the opened file, so you never get a chance to close it yourself. This seems like an issue that could lead to a resource leak as I mentioned above, even if unlikely.

Also, the Rails guides on manually attaching File/IO objects instruct the reader to provide an open IO object, but they don’t instruct the reader to close the object afterwards.

Maybe I am missing something, or maybe this is not so much of an issue in practice. :man_shrugging:

It was right in the docs :man_facepalming: thanks for finding

When you call open on an instance of a Pathname, it returns a new File instance for the given pathname and calls .open on that. You’ll notice if you call .object_id on the file instance, it’s returning a different object each time

irb(main):001> p = Pathname.new "Gemfile"
=> #<Pathname:Gemfile>
irb(main):002> p.open
=> #<File:Gemfile>
irb(main):003> p.open.object_id
=> 40420
irb(main):004> p.open.object_id
=> 41800

For more hacking around you can check if the object has been recycled (YMMV):

irb(main):005> ObjectSpace._id2ref(40420) # (RangeError)
irb(main):006> ObjectSpace._id2ref(41800) # (RangeError)
irb(main):007> GC.disable
=> false
irb(main):008> p.open.object_id
=> 70240
irb(main):009> ObjectSpace._id2ref(70240)

And to your point regarding:

if your Ruby program is long-running with a somehow badly optimized or misbehaving GC that never runs, you could end up with a resource leak where you run out of file descriptors because you have too many of them open. This seems to me like a cause for concern.

While you’re correct that it’s best practice to close files, I think that the (entirely subjective) counter would be that it’s a bit antithetical to Rails itself to sacrifice readability and brevity for what’s likely an unlikely and inconsequential hypothetical. I think reading the Rails Doctrine would explain some of the “why” behind this. Sure, it’s possible that a GC could be misbehaving and run out of FDs. But, that would happen 1 out of every what, billion ruby runtimes? Of all the Ruby code powering production systems as we speak, I’d argue you might expect that exact situation to happen less than a dozen times a year. Subjectively speaking, assigning the file to a variable, creating the record, and then making sure to close the file over 3 lines just isn’t as “beautiful” as inlining it (specifically wrt the documentation thing).

Of course, sometimes these decisions come to bit you in the butt! And that’s why observability and profiling are some important to a production rails application. Realistically, if the GC was having problems, ActiveRecord would exhaust all system resources before you’d expect to run out of file descriptors in a typical rails app. But at the end of the day, rails apps aren’t powering rocket ships, so the tradeoffs between potential problems versus writing conciser, prettier code is going to be different.

But, for almost 100% of rails apps

@message.images.attach(io: File.open('/path/to/file'), filename: 'file.pdf')

is preferred because it’s more “readable” than

File.open('/path/to/file') do |file|
  @messages.images.attach(io: file, filename: 'file.pdf')
end

Just my thoughts though!

1 Like

The GC does mean that this isn’t ever really going to be a problem in practice, but I agree it’s needlessly untidy.

We should change the internal open to be closed again afterwards (whereas if the user supplies an open file, it would be overstepping for us to close it). Then we can change the guide to prefer the Pathname spelling, and just include “you can also supply an IO” as more of an aside.

PR welcome :+1:t2:

1 Like

I played around with ObjectSpace._id2ref in my console and it turns out that dereferenced IOs are immediately garbage-collected so I am inclined to think that my initial concern is a non-issue.

For what it’s worth, I still made a patch to close internal opens, and it passes tests: Comparing main...close-internally · jeromedalbert/rails · GitHub.

But attach only accepts a rewindable io, otherwise you get a "io must be rewindable" error. If we want to change the guide to prefer attach(io: Pathname.new(..., we’d need to make the attach code accept a Pathname, but it may look weird for the user to provide a non-IO Pathname for an io keyword parameter.

Do you think it is still worth making a PR, or should we put the issue to rest?

1 Like

Checked out your commit, great work! Just wanted to remind you about this snippet:

avatar = params[:user_upload] # some IO object
# Update the avatar
current_user.update(avatar: avatar)

# If ActiveStorage internally called .close on the attachable, then the following
# code would raise an IOError
BespokeMirrorService.backup_to_s3_glacier(avatar: avatar)

Your change would be breaking for anyone who relies on the current behavior of ActionDispatch::Http::UploadedFile not being closed after being uploaded to a service.

1 Like