Why is Rails boot so slow on macOS?

Based on your conclusion that the difference comes from IO performance, maybe you could try to read from a RAMFS partition? It would probably be tricky to properly integrate in a development environment, but it’s worth a try.

8 Likes

Sam said on Twitter he tested w/ a ramdisk and saw no significant change. That makes sense to me as a 4x slowdown in random read performance on 2 machines w/SSDs is going to have to be software related.

This looks like a non-Rails-specific issue that others have encountered.

Questions still open here for me:

  1. Does an encrypted/decrypted partition make a difference?
  2. Does HFS+ or another filesystem on MacOS perform better or worse?
  3. Does a virtualized or dockerized environment perform better or worse on Sam’s benchmark?
  4. Can we wrap a profile around this benchmark to see which parts exactly are slower?
3 Likes

I think when we ran the test we only tested putting the application directory on the ramdisk, I am guessing to get any material boost we would need to keep all the gems on the ramdisk.

Agree completely that even if we manage to prove this is way faster, integrating it into a dev environment may be too difficult. One possible alternative would be to have a “non validating” mode for bootsnap where it does not stat files of validate and simply loads stuff directly from the cache?

@nateberkopec one wonderful thing about this benchmark is that we can all run it. All you need is redis and postgres installed, it takes care of the rest including configuring a standalone pg process and standalone redis process.

Perhaps you know someone who wants to get in to “performance” who could try answering these questions? Just play with knobs and re-run the bench.

I strongly suspect standalone virtualbox image will boot faster on a Mac than native Mac, which is kind of scary. I think Docker is painfully slow once you involve any cached filesystem that replicates to the host. We tried running everything inside docker, but the problem there is that it runs in a overlayfs volume which by its nature is really slow, if you could mount a native ext3fs volume to it somehow, it may be fast, but replicating to the host will be tricky.

4 Likes

Yes, for most app gems contain way more files than the actual app. Testing with the gems on the RAMdisk shouldn’t be too hard if you use bundle --path to install gems in a subdirectory.

It would also be interesting to test both HFS+ (is that still available on newer OSX?) and APFS ramdisk, if there is a notable difference between the two that could confirm that the performance difference is in the filesystem implementation.

It’s all conjecture, but it really sounds like OSX isn’t keeping files in cache as aggressively than Linux. That’s what the RAMdisk would help prove. If the file system cache works properly, the vast majority of your benchmark file access should be done in RAM if you run it twice.

If the RAMdisk is almost as slow as the real disk, then it’s not a cache problem.

1 Like

I just ran benchmark on a Macbook 12-inch (Early 2015, 1.1 GHz Intel Core M, 8GB, macOS 10.14.6). Ruby v2.6.5 via rbenv. Postgresql v12.2 installed via homebrew.

Running benchmark on 1 core
Topic View: 1.907 (±14.4%) op/s
Topic Create: 1.114 (±3.4%) op/s
Post Spec: 51.092 (±16.5%) seconds
PBKdf2 64000 sha256: 2.693 (±12.6%) op/s
Cook README.md: 36.738 (±16.8%) op/s
Discourse Boot: 10.181 (±28.1%) seconds
Random Read: 1119.328 (±308.1%) op/s

Then on the same machine, but via Vagrant(v2.2.6) & Virtualbox(v6.1.0). VM was running Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-29-generic x86_64). Assigned only 1GB of RAM and a single core. Ruby v2.6.5 again via rbenv. Postgresql v10.6 OS package.

Running benchmark on 1 core
Topic View: 2.789 (±10.1%) op/s
Topic Create: 1.822 (±4.7%) op/s
Post Spec: 39.85 (±4.0%) seconds
PBKdf2 64000 sha256: 5.55 (±23.3%) op/s
Cook README.md: 43.146 (±37.4%) op/s
Discourse Boot: 3.774 (±3.5%) seconds
Random Read: 3056.49 (±413.0%) op/s

Just a quick test, but Discourse Boot and Random Read are nearly 3x faster inside VM.

3 Likes

I ran a ruby-prof around a single run of Random Read test. Comparing macOS vs a VM on macOS:

macOS

Ubuntu VM on macOS

On the VM random_file is 22% of the time, compared to 4% on macOS. Assuming they take the same real world time on both VM and macOS then it suggests Class::IO#read is 5x slower on macOS?

3 Likes

I simplified the test to remove all the PG/Redis stuff and the thread forking. It just created the files and ran the 11 random reads. I benchmarked with benchmark-ips and this is what i got:

RANDOM_FILES = 5_000
RANDOM_FILE_SIZE = 10_000

macOS
random_read      1.712k (±10.3%) i/s -      8.550k in   5.069574s

VM on macOS
random_read      4.593k (± 8.6%) i/s -     22.984k in   5.052378s

And I tested with fewer files:

RANDOM_FILES = 50
RANDOM_FILE_SIZE = 10_000

macOS
random_read      1.866k (± 6.2%) i/s -      9.324k in   5.020224s

VM on macOS
random_read      6.176k (± 8.0%) i/s -     30.798k in   5.024956s

More testing. With tiny files the performance difference increases

macOS:

Creating 500 random files for IO tests of size 50000
random_read    593.334  (±17.4%) i/s -      2.856k in   5.010175s

Creating 500 random files for IO tests of size 5
random_read      1.651k (± 5.8%) i/s -      8.296k in   5.041559s

VM on macOS:

Creating 500 random files for IO tests of size 50000
random_read      2.580k (±25.0%) i/s -     12.250k in   5.308382s

Creating 500 random files for IO tests of size 5
random_read     10.451k (± 7.5%) i/s -     52.377k in   5.044150s

I tried to install valgrind/callgrind to profile the C code. But so far not had any success in getting that to work.

UPDATE 2020-04-01 (I cannot make any more posts as new user, so editing this)

I gave up on valgrind and tried Apple’s Instruments. I ran a Time Profile while running my benchmark-ips test. Seems like the majority of the time is being spent in open_key_argshttps://github.com/ruby/ruby/blob/d214c188e48db5049fba8049a57dd6f73b1a57b0/io.c#L10614

Drilling in more gives:

5 Likes

I profiled randomly reading files with File.read(random_file) on both macOS (10.14.6) and then repeated on a VM on the same computer: Vagrant(2.2.6), Virtualbox(6.1.0), Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-29-generic x86_64). Both are running Ruby 2.6.5.

Here is the key areas of the profile.

macOS:

Ubuntu VM on macOS:

So maybe open_key_args is running much slower on macOS?

3 Likes

On macOS it looks like a lot of time is spent in rb_cloexec_open ruby/io.c at d214c188e48db5049fba8049a57dd6f73b1a57b0 · ruby/ruby · GitHub

If I am understanding this correctly, then in the VM very little time is spent in rb_cloexec_open:

I don’t know C, so I cannot really go much further with this. But if anyone needs anymore info then please let me know.

5 Likes

@g101 your profiles are interesting. They show that most of the time is spent either in the open syscall itself, or waiting to for it (there is some busy looping if it would block).

At this stage it would be interesting to reduce the benchmark to a simple C program doing lots of fopen.

I’m starting to wonder wether open on OSX might be slow because of the extra layers of file system security, with signed binaries and all they added a few versions back (System Integrity Protection - Wikipedia), in the same way that SELinux and AppArmor, or many anti-virus on Windows can have performance overhead on file access as well (but much lower than what is witnessed here).

5 Likes

Newer Macbooks have the T2 chip so I think even encrypted APFS is meant to be fast.

I have no idea if this is relevant, but it discuss slow I/O reads on OS X using C++. And the issue seemed to come down to compiler options c++ - LLVM Clang produces extremely slow input/output on OS X - Stack Overflow

3 Likes

I’m not talking about encryption overhead, here.

I suggest that everytime you open a file (open("/foo/bar")), the kernel has to check if the process is allowed to open that path (not just unix permissions). I’m suspecting this check is slow.

This could be true with the latest introduction of the security thing which we see on Catalina. When you’re trying to open something, macOS just asks for permission. It’s possible that even when running Rails apps, it just tries every file we open with that same check.

Whenever I’m developing rails apps in docker on a Mac I always I have to mount an NFS volume to get anywhere near decent performance. Our app not only needs to load a lot of Ruby files, but also a bunch of static assets during each page load. Hitting our app’s dashboard after a cold boot can take minutes. Interesting to hear from this article that there might be something inherently slow about the Mac’s filesystem.

2 Likes

The Docker file system performance issues may not be related, as a lot of those come from the fact than docker on a mac runs in a Linux VM, and Docker for mac uses a “bridge” to allow this VM (and thus your containers) to access your local filesystem. This has a huge performance impact.

2 Likes

Right, that’s true. I’m wondering however if the MacOS filesystem contributes to the overall slowness in some capacity, considering the article mentions reading many small files is slower than when using other operating systems.

1 Like

I am not a GoLang expert but I created some code to randomly read small files. I created a folder with 10,000 files of size 100 bytes.

import (
	"fmt"
	"io/ioutil"
	"math/rand"
	"time"
)

func main() {
	rand.Seed(time.Now().UnixNano())
	start := time.Now()
	for i := 0; i < 10000; i++ {
		file := fmt.Sprint("random_files/", rand.Intn(10000))
		_, err := ioutil.ReadFile(file)
		if err != nil {
			fmt.Println("Err")
		}
	}
	t := time.Now()
	difference := t.Sub(start)
	fmt.Printf("difference = %v\n", difference)
}

On macOS Mojave 10.14.6 the results were:

user$ ./gofile 
difference = 3.300336361s
user$ ./gofile 
difference = 1.16395616s
user$ ./gofile 
difference = 693.500542ms
...
user$ ./gofile 
difference = 240.043622ms

After multiple runs the time eventually leveled off at around 240ms. I assume once all the files were cached by the OS.

Running the same code on same computer but inside a VirtualBox VM running Ubuntu 18.04 LTS, I got the results:

$ ./gofile 
difference = 137.963123ms
$ ./gofile 
difference = 129.356798ms
$ ./gofile 
difference = 130.005818ms
$ ./gofile 
difference = 133.438552ms
$ ./gofile 
difference = 115.163246ms
$ ./gofile 
...
$ ./gofile 
difference = 97.664743ms

This time leveling off around 100ms.

3 Likes

Wow so MacOS is almost 2.5x slower for that test :frowning:

2 Likes

MacOS is almost 2.5x slower for that test

Yep. So that go test proves that as suspected above the performance issue is somewhere in Apple’s code.

I doubt there’s much that can be done beside making a lot of noise about it, and hope Apple fixes it some day.

We could also try to workaround it by bundling multiple small files into one, but that is a quite tricky, and hard to justify for working around an OSX specific bug.

Or I guess it could mean both Ruby and Go are not accessing the macOS filesystem in the most performant way. Apple has plenty of performance tips on filesystem performance - Performance Tips

1 Like