Optimizing ActiveSupport with native code

I've had this idea kicking around in my head for a while, and had some
time yesterday to start playing around with it: ActiveSupport is
heavily used in both the Rails library code and in the application
layer of a typical Rails stack. Certain parts of AS would be much more
efficiently implemented in C rather than Ruby; that optimization could
potentially have some noticeable, positive effects on a Rails app's
performance. So, why not write a library that swaps out the
appropriate ActiveSupport methods with native C implementations?

So I'm looking for feedback from the group on that idea. Some
questions that come to mind:

* Is this a new idea? I did some googling around and didn't find
anything, but I don't want to reinvent the wheel here.
* Is this a bad idea? Of course AS itself wouldn't want to restrict
itself to a particular Ruby interpreter, but I don't see any harm in
an add-on library that optimizes it for MRI. Am I missing anything?
* How widely applicable is it? The ActiveSupport::Inflector singleton
provides some *very* low-hanging fruit - unscientific benchmarking is
suggesting 10x speed improvements in #underscore, #camelize, etc.
But native implementations are probably only useful for methods that
perform non-trivial work that has no direct relation to the Ruby space
- string manipulation and arithmetic being the obvious candidates. Is
it worth trying to provide a comprehensive suite of ActiveSupport
native implementations?
* How about using existing bindings to C libraries? So far I've
focused on simply reimplementing individual AS methods using pure C,
but, for instance, it might be worth reimplementing ActiveSupport's
XML support using Nokogiri, etc.

If folks think this is a worthwhile idea, I'd love to get as many
people as possible involved. But for now, I'd appreciate any feedback
y'all have.

The tinkering I did yesterday is here. It's basically just a single C
file with several native implementations of AS methods, and a single
Ruby file which benchmarks the methods against their pure-Ruby
equivalents and also checks that the output of the corresponding
methods is the same across a set of inputs:
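
For readers who want the general shape of such a harness, here is a
minimal sketch. It is not the file linked above. Both method names are
made up for illustration: underscore_reference is a simplified,
ASCII-only stand-in for the pure-Ruby inflector, and
underscore_candidate is a single-pass scan standing in for the native
version (it is still Ruby, so don't expect it to win the benchmark).

```ruby
require "benchmark"

# Simplified pure-Ruby reference (assumption: ASCII input, no acronym table).
def underscore_reference(word)
  word.gsub("::", "/")
      .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')
      .tr("-", "_")
      .downcase
end

def upper?(c) = c.between?("A", "Z")
def lower?(c) = c.between?("a", "z")
def digit?(c) = c.between?("0", "9")

# Stand-in for the native implementation: one pass over the characters,
# the way a C loop over the string's bytes might be written.
def underscore_candidate(word)
  out = +""
  chars = word.chars
  i = 0
  while i < chars.length
    c = chars[i]
    if c == ":" && chars[i + 1] == ":"
      out << "/"
      i += 2
      next
    end
    if upper?(c) && i > 0
      prev = chars[i - 1]
      nxt = chars[i + 1]
      # Break before a capital that follows a lowercase letter or digit,
      # or that ends a run of capitals and starts a new word.
      if lower?(prev) || digit?(prev) || (upper?(prev) && nxt && lower?(nxt))
        out << "_"
      end
    end
    out << (c == "-" ? "_" : c.downcase)
    i += 1
  end
  out
end

INPUTS = %w[ActiveRecord HTMLParser fooBar Admin::UserAccount
            already_snake dash-ed X a1B].freeze

# Inputs on which the two implementations disagree (should be empty).
def mismatches(inputs)
  inputs.reject { |s| underscore_reference(s) == underscore_candidate(s) }
end

if __FILE__ == $PROGRAM_NAME
  bad = mismatches(INPUTS)
  abort "mismatch on: #{bad.inspect}" unless bad.empty?

  n = 2_000
  Benchmark.bm(10) do |bm|
    bm.report("reference") { n.times { INPUTS.each { |s| underscore_reference(s) } } }
    bm.report("candidate") { n.times { INPUTS.each { |s| underscore_candidate(s) } } }
  end
end
```

The equivalence check runs before the timing so that a fast but wrong
candidate never gets reported as a win.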


Looking forward to thoughts, comments, criticism, stinging insults,
etc. if you've got 'em.


This looks great, but have you considered the complexities of I18n in a native implementation?

That's a great point - I hadn't thought of that specifically, but even
just looking at the Inflections module, it became clear that the
approach has its limits. For instance, the #pluralize and #singularize
methods (which are called internally by several other inflectors)
basically iterate over a collection of regular expressions, which are
user-definable, looking for one that matches the input string. That's
pretty inefficient, but it's not immediately clear to me how to do it
better while maintaining the flexibility needed for I18n (and
customization generally), or how a native code implementation would
improve the situation. So I guess the idea would be to focus on
methods whose behavior is not locale-dependent first, to get the
biggest bang for the coding buck.
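
To make the rule-scanning pattern concrete, here is a rough sketch of
it. The rule table and the name pluralize_sketch are invented for
illustration and are far smaller than the real inflection rules; the
point is only that each call walks an ordered, user-extensible list of
regexps until one matches.

```ruby
# Hypothetical rule table: [pattern, replacement] pairs, tried in order,
# most specific first. The last rule is a catch-all.
PLURAL_RULES = [
  [/(quiz)$/i, '\1zes'],
  [/([^aeiouy])y$/i, '\1ies'],
  [/(s|x|ch|sh)$/i, '\1es'],
  [/$/, "s"],
].freeze

# Linear scan: the cost per call grows with the number of rules that
# have to be tried before one matches.
def pluralize_sketch(word, rules = PLURAL_RULES)
  rules.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word.match?(pattern)
  end
  word
end

# User-defined rules go in front so they take precedence.
custom_rules = [[/^(ox)$/i, '\1en']] + PLURAL_RULES

puts pluralize_sketch("category")          # default rules
puts pluralize_sketch("ox", custom_rules)  # custom rule wins
```

Because the rules are arbitrary Ruby regexps supplied at runtime, a C
version of this loop would still have to call back into Ruby's regexp
engine for every rule, which is why the gain from going native is
unclear here.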

That said, it does occur to me that Inflector#ordinalize should be
locale-aware - as I recall the implementation in the
ActiveSupport::Inflector module is not, but I'm assuming it's
overridden somewhere by the I18n module. I'd say there are various
potential approaches to this problem - avoiding locale-dependent
methods as mentioned above, but also potentially providing
locale-specific C implementations, which could be selected among in
the Ruby layer, with a fallback to the pure-Ruby implementation. Just
thinking out loud here, but I don't think it has to be a deal-breaker.
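
A sketch of what that selection-with-fallback could look like in the
Ruby layer. Everything here is hypothetical (the OrdinalSketch module,
register, the :fr rule): the lambdas stand in for locale-specific C
functions, and the fallback only knows English rules, standing in for
whatever the pure-Ruby implementation would be.

```ruby
module OrdinalSketch
  # Locale => callable. In the scenario above these would be native
  # functions exposed to Ruby; here they are plain lambdas.
  FAST_PATHS = {}

  def self.register(locale, callable)
    FAST_PATHS[locale.to_sym] = callable
  end

  # Pure-Ruby fallback (English rules only in this sketch).
  def self.ordinalize_ruby(number)
    abs = number.abs
    suffix =
      if (11..13).cover?(abs % 100)
        "th"
      else
        { 1 => "st", 2 => "nd", 3 => "rd" }.fetch(abs % 10, "th")
      end
    "#{number}#{suffix}"
  end

  # Use the locale-specific fast path if one is registered,
  # otherwise fall back to the pure-Ruby version.
  def self.ordinalize(number, locale: :en)
    impl = FAST_PATHS[locale.to_sym]
    impl ? impl.call(number) : ordinalize_ruby(number)
  end
end

# Pretend a French fast path exists (deliberately simplified rule).
OrdinalSketch.register(:fr, ->(n) { n == 1 ? "1er" : "#{n}e" })

puts OrdinalSketch.ordinalize(23)               # fallback path
puts OrdinalSketch.ordinalize(1, locale: :fr)   # registered fast path
```

The lookup is a single hash access, so the dispatch itself adds little
overhead compared with the work being dispatched to.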

Thanks for the feedback!


Steve Ross wrote:

Well, lots. Look at Mat's prototype implementation and the ramifications will become immediately apparent.