Nokogiri as a default dependency

This is a great initiative! I just got back into Rails development after being away for a while, and one of the things that stood out to me was how long the nokogiri gem takes to install.

First, a lot of people run into problems getting the gem to compile. From the sparklemotion/nokogiri repository on GitHub: “There are currently 1,237 Stack Overflow questions about Nokogiri installation.”

Second, the sheer size of the gem and the time it takes to compile can raise fear and uncertainty even if it is working. “Why is this one gem taking so much longer to install than the others?”

Yeah… it’d be great in general to reduce the number of native code / non-pure Ruby gems that Rails requires out of the box.

Also the scss binary… I’ve been taking particular note of how setup bogs down at that point.

Thanks for bringing this up!

Oga is a good alternative to Nokogiri which installs pretty fast. I’m not sure how difficult it would be to replace the dependency, though.

I think Oga is off the table: it uses MPL 2.0, which is a tricky license for quite a few to consume.

Nokogiri is coming from ActionText, rails-dom-testing and webdrivers.

Bottom line is that we do need some sort of HTML parser to be able to do a bunch of testing.

The bigger elephant in the room is that Nokogiri by itself is just a bad choice for today’s web.

It does not ship with an HTML5 parser, and the libxml folks move really, really slowly; it is hard to get any traction on issues.

At Discourse we recently moved to https://github.com/rubys/nokogumbo

This means that when you parse stuff such as ℵ the entity ℵ remains, and that the automatic tag expansion defined in the HTML5 spec works as expected:

Eg:

<s>
test

test
</s>

Results in:

test

test
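As a rough way to see the difference for yourself (a sketch, not taken from the post above): the snippet below parses the same fragment with the libxml2-based HTML4 parser and with the HTML5 parser. It assumes Nokogiri 1.12 or later on CRuby, where `Nokogiri::HTML5` is bundled; with older versions you would need nokogumbo for the same module. The `parse_both` helper and the sample input are made up for illustration.

```ruby
# Compare the HTML4 (libxml2) and HTML5 parsers on one fragment.
# Returns nil when nokogiri, or its HTML5 support, is not available.
def parse_both(html)
  require "nokogiri"
  return nil unless defined?(Nokogiri::HTML5)

  {
    html4: Nokogiri::HTML4.fragment(html).to_html,
    html5: Nokogiri::HTML5.fragment(html).to_html,
  }
rescue LoadError
  nil
end

# Hypothetical input: an unclosed <s> that spans two paragraphs.
sample = "<p><s>one<p>two"
result = parse_both(sample)

if result
  puts "HTML4 parser: #{result[:html4]}"
  puts "HTML5 parser: #{result[:html5]}"
else
  puts "nokogiri with HTML5 support is not installed"
end
```

Running both parsers side by side on markup from your own app is probably the quickest way to judge how much the HTML5 rules matter for you.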

I think we absolutely should improve the story here, but surprisingly the issue is far beyond install times. It is more that we want Rails to ship with a proper / maintained HTML5 parser.

Gumbo looks really good, BUT Google have not been committing to it for many years from what I can tell.

Maybe the best outcome is a port of https://github.com/kovidgoyal/html5-parser that implements Nokogiri-like methods. I don’t know.

Nokogiri+nokogumbo is the best that exists today, but I just replaced the 1 dependency complaint with 2.

I entirely agree with @avdi that install times for sassc are pretty brutal; they are even worse than Nokogiri’s.

That said, install times are something we can all sort out by starting to push native gems. gem install libv8 installs ultra fast across multiple platforms. Perhaps we need a task force to assist in publishing pre-compiled gems?

There is a bunch of discussion on GitHub about the slowness of sassc compilation:

https://github.com/sass/sassc-ruby/issues/189

I guess there is some low-hanging fruit here, prior to doing native gems, which can totally erase all the wait times on common platforms.

“That said, install times are something we can all sort out by starting to push native gems.”

For what it’s worth, Nokogiri is working on pushing precompiled binaries: see “Ship precompiled gems for linux”, issue #1983 in sparklemotion/nokogiri on GitHub (for Linux at least, not sure why no OSX).

For a very long time the precompiled binaries support in rubygems/bundler was wonky, but it’s getting better now.
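
For anyone wondering how a precompiled gem gets picked: RubyGems tags each native gem with a platform string such as x86_64-linux and compares it against the local platform. Here is a minimal sketch using only the `Gem::Platform` class that ships with RubyGems; the `describe_platform` helper and the sample platform strings are made up for illustration.

```ruby
require "rubygems"

# Split a platform string into the CPU and OS parts RubyGems works with.
def describe_platform(name)
  platform = Gem::Platform.new(name)
  { cpu: platform.cpu, os: platform.os }
end

# The platform RubyGems detects for the machine running this script.
puts "Local platform: #{Gem::Platform.local}"

# Two platform strings commonly seen on precompiled gems.
p describe_platform("x86_64-linux")
p describe_platform("arm64-darwin")
```

If no published gem matches the local platform, RubyGems falls back to the plain `ruby` platform gem, which is the one that has to compile its native extension on install.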

I have to +1 this - whenever new people join my team and try to set up their environment, Nokogiri almost always bites them. Often it installed fine for one Ruby on their machine, but when they switch to a different one with RVM it fails. I’ve had to Google + StackOverflow numerous times myself to get it to stop being confused. I’ve also run into the issue with the lack of HTML5 support.

Maybe there isn’t an immediate fix, but I think this should be something we as a community try to tackle at least slowly.

And Nokogiri problems crop up for me all too frequently. I’ve never figured out the problem. I think I start removing gems and running bundle until the problem goes away, then put the removed gems back. I just looked at some of my notes and I didn’t write this down. But one note is four years old, so this isn’t a new problem. I assume it’s some kind of dependency problem, but that is something I know next to nothing about.

I figured it’s worth updating this post with the current state of the world, to address some of the complaints that were very valid in 2020 (but are no longer correct in 2023):

  • Nokogiri is available pre-compiled, which has eliminated installation problems if you’re on a modern Ruby and a common architecture and operating system
  • Nokogiri offers an HTML5 parser
  • Rails 7.1 will ship with HTML5 sanitization as well