Or I guess it could mean that both Ruby and Go are not accessing the macOS filesystem in the most performant way. Apple publishes plenty of guidance on filesystem performance: Performance Tips
I wonder if ramdisk on Mac is slower than ramdisk on a VM on the same system
If this is the case it would indicate there is some sort of filter driver slowing stuff down
@tenderlove what approach should we be taking here? Putting all gems into one file in some sort of cache (a zero-validation Bootsnap sort of option)? Or should we be giving up and kicking up a fuss with our Apple friends? Something else?
I made an APFS ram disk on macOS:
diskutil partitionDisk $(hdiutil attach -nomount ram://2048000) 1 GPTFormat APFS 'ramdisk' '100%'
Copied 10,000 files of 100 random bytes each, and then ran the same Go code as before against the ramdisk. The initial run was much quicker than directly against the filesystem, but still not super fast. After multiple runs it also still ended up at around the same 240ms.
$ ./gofile
difference = 646.605246ms
$ ./gofile
difference = 440.253606ms
$ ./gofile
difference = 340.012913ms
$ ./gofile
difference = 271.434962ms
...
$ ./gofile
difference = 232.221546ms
$ ./gofile
difference = 238.020729ms
I repeated with a HFS+ ramdisk and got similar results.
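For anyone who wants to reproduce this without Go, here is a rough Ruby equivalent of the benchmark (a sketch: the file count is reduced to keep it quick, and it writes to a tmpdir by default — point `DIR` at a ramdisk mount like `/Volumes/ramdisk` to compare):

```ruby
require "benchmark"
require "fileutils"
require "securerandom"
require "tmpdir"

# Assumption: a tmpdir stands in for the ramdisk so the sketch runs anywhere.
# Replace with e.g. "/Volumes/ramdisk" to test against an actual ramdisk.
DIR = Dir.mktmpdir("readbench")
COUNT = 2_000 # the original test used 10,000 files

# Create COUNT files of 100 random bytes each.
COUNT.times do |i|
  File.binwrite(File.join(DIR, "f#{i}"), SecureRandom.bytes(100))
end

# Read them all back twice; the second run should benefit from warm caches.
2.times do |run|
  elapsed = Benchmark.realtime do
    COUNT.times { |i| File.binread(File.join(DIR, "f#{i}")) }
  end
  puts "run #{run}: difference = #{(elapsed * 1000).round(3)}ms"
end

FileUtils.remove_entry(DIR)
```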
It’s not like there are many alternatives to read() syscalls… And the documentation you point to is for Swift/Objective-C APIs; it’s safe to assume they use these syscalls under the hood, just like Go, Ruby, and pretty much every programming language.
I have heard from a small avian friend that, because of the macOS filesystem sandbox security guarantees, read() is around the same speed but open() is dramatically slower. It also seems like those guarantees are non-negotiable from the OS security side. I think that leaves options like:
- Try to cooperate with someone who has the right job to optimize the security stuff without disabling it, with a payoff in months or years.
- Try to add something to Bootsnap or Bundler that reduces calls to open(). Has anyone tried concatenating all their library files into one giant file and comparing perf that way?
I suppose there’s nothing stopping us from doing both.
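To make the concatenation experiment concrete, here is a rough sketch (file names, counts, and the blob format are all made up for illustration): pack many small files into one blob with an in-memory offset index, then compare N open+read calls against a single open plus N seeks.

```ruby
require "benchmark"
require "securerandom"
require "tmpdir"

dir = Dir.mktmpdir("concat-test")
count = 1_000
files = count.times.map { |i| File.join(dir, "lib#{i}.rb") }
files.each { |f| File.binwrite(f, SecureRandom.bytes(100)) }

# Build one concatenated blob plus an offset index: path => [offset, length].
blob_path = File.join(dir, "all.blob")
index = {}
File.open(blob_path, "wb") do |blob|
  files.each do |f|
    data = File.binread(f)
    index[f] = [blob.pos, data.bytesize]
    blob.write(data)
  end
end

# One open() + read() per file.
many_opens = Benchmark.realtime { files.each { |f| File.binread(f) } }

# A single open(), then seek + read per file.
one_open = Benchmark.realtime do
  File.open(blob_path, "rb") do |blob|
    files.each do |f|
      offset, length = index[f]
      blob.seek(offset)
      blob.read(length)
    end
  end
end

puts format("N open() calls:   %.3fms", many_opens * 1000)
puts format("1 open() + seeks: %.3fms", one_open * 1000)
```

If the hypothesis about open() being the slow part holds, the gap between the two numbers should be much larger on macOS than on Linux.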
So that backs up my hunch above.
It also means it doesn’t only interest Rubyist but pretty much all developers. Even if I don’t have high hopes, publicizing this on popular places such as HN might actually lead to something long term. But yes, I agree that this isn’t something you can do now and wait on.
I tried several variations of that a few years back when I was working on bootscale (bootsnap’s predecessor), but there were many problems I didn’t find solutions to. Off the top of my head, changing the code path means you need to rewrite all
__LINE__ and dependent calls such as
However, what we can do much more easily to get most of the gain with little effort is to store the Bootsnap iseqs in a giant indexed file. Because anyway, after your first boot you’re no longer reading the Ruby source files, but the Bootsnap cache.
We just need an efficient way to keep a big mmapped hash open. Something like this: GitHub - luispedro/diskhash: Disk-based (persistent) hashtable
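As a stdlib-only proof of concept of the “giant indexed file” idea (no diskhash or mmap yet — the file layout and sources here are made up for the sketch): serialize each ISeq with `RubyVM::InstructionSequence#to_binary`, append it to one cache file, and keep a `path => [offset, length]` index so later boots do a single open() plus seek+read per file.

```ruby
require "tmpdir"

dir = Dir.mktmpdir("iseq-cache")
cache_path = File.join(dir, "iseqs.bin")
index = {} # source path => [offset, length] into the cache file

# Pretend these are gem files; the sources are made up for the sketch.
sources = {
  "a.rb" => "def a; 1 + 1; end",
  "b.rb" => "def b; 'hello'.upcase; end",
}

# "First boot": compile everything and append the binary ISeqs to one file.
File.open(cache_path, "wb") do |cache|
  sources.each do |path, code|
    bin = RubyVM::InstructionSequence.compile(code, path).to_binary
    index[path] = [cache.pos, bin.bytesize]
    cache.write(bin)
  end
end

# "Later boots": one open() for the whole cache, then seek + read per
# file, instead of a stat + open + read per source file.
File.open(cache_path, "rb") do |cache|
  offset, length = index["a.rb"]
  iseq_bin = (cache.seek(offset); cache.read(length))
  RubyVM::InstructionSequence.load_from_binary(iseq_bin).eval
end

puts a # the loaded ISeq defined `a`, which returns 2
```

The index itself would also need to live on disk (Marshal, or the mmapped hash above); it is kept in memory here only to keep the sketch short.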
@indirect does your small avian friend know whether stat() is also impacted?
This uses LMDB as a backend to store the ISeqs. We’ve been using LMDB as a store for Sprockets for years; it’s a bit finicky to use, but I think we’ve ironed most of the bugs out since then.
On micro-benchmarks, LMDB#get is 4 times faster than File.read on my machine: lmdb.rb · GitHub
Also, on paper, this saves one open() syscall per Ruby file loaded, and if we were to consider gem content as immutable, we could also avoid the stat() used to validate cache freshness.
However, when testing it against our app I can’t seem to see any performance improvements. I’d like to try it against the Discourse benchmark, but I’m having trouble setting it up.
It’s possible that what is gained by avoiding many open() syscalls is lost by going through the LMDB bindings and managing these blobs in Ruby rather than in C, like the regular Bootsnap cache store does. For instance, Bootsnap uses rb_str_new_static to avoid copying the cache blobs. To do the same, I’d need to query LMDB from C.
Sorry to barge right in, but is this related to the secure boot setting you can change in the macOS Startup Security Utility?
The screenshot below mentions something about secure boot, but I don’t know whether it has anything to do with disk access, or whether you can disable it for the current system.
That’s not Secure Boot we’re talking about but System Integrity Protection: How to turn off System Integrity Protection on your Mac | iMore
I turned off System Integrity Protection and re-ran my Go code. Same sort of results as before:
$ ./gofile
difference = 2.72653261s
$ ./gofile
difference = 1.159182558s
$ ./gofile
difference = 605.57577ms
$ ./gofile
difference = 388.847999ms
...
$ ./gofile
difference = 254.285507ms
$ ./gofile
difference = 250.556632ms
Unfortunately, my source does not know whether stat() is also slowed down.
To be clear, the security/sandbox I am referring to here has nothing to do with secure boot or system integrity protection. Modern versions of macOS have per-process access controls for basically all hardware, including the file system. There’s a conceptual introduction here if you’re interested:
The kernel-level security framework that keeps processes from automatically having access to your camera and all your files. That thing. You can’t turn it off.
FWIW, I think we should at least try to put together a Radar for this. I think we’re close to a reproducible example.
I understand the guarantees that might be causing this behavior are non-negotiable, but it’s possible that no one at Apple realized they’re slow in these cases and no one’s ever tried to optimize them.
Yes, I definitely agree with this. I have also heard that there might be available headcount on the team responsible, so if anyone wants to work on it, get in touch.
On behalf of thousands of developers feeling this every day I just want to say this thread makes me very very excited.
Sooooooo did anyone manage to get anywhere with flagging this to Apple?
I am sure they read this. Boot on M1 is way better even with the slow file access; you just need to run native vs Rosetta to get the full bang.
Almost night and day difference for me. Identical RSpec test on M1 is almost 6x faster in the “files took X seconds to load” bit.