Hi gents,
I am playing around with an idea to improve the performance of singularize and pluralize for Rails 4.0. In my proof of concept I see a speedup of about 5x, but it relies on an assumption that I’d like to run by you all. Let me explain.
As you know, an inflection rule has a left-hand side (lhs), which is a string or regexp, and a replacement string as its right-hand side (rhs).
The current implementation collects the rules in an array, and to apply them to a particular word it iterates over the array. The first pattern that matches is the one applied. In particular, the most common rule (e.g. append “s” to form a plural) is the last one, because more specific rules come first. By default we have more than 30 rules for singulars and more than 30 for plurals.
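To make the linear scan concrete, here is a minimal sketch of it. The rule list and the name `pluralize_linear` are hypothetical and trimmed down for illustration; this is not the actual Rails code or its default rule set.

```ruby
# Hypothetical, trimmed-down rule list: more specific rules first,
# the catch-all "append s" rule last.
PLURAL_RULES = [
  [/(ax|test)is$/i, '\1es'],
  [/(quiz)$/i,      '\1zes'],
  [/s$/i,           's'],
  [/$/,             's']
]

# Try each rule in order; the first pattern that matches is applied.
def pluralize_linear(word)
  PLURAL_RULES.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word =~ pattern
  end
  word
end
```

Note that a regular word like "post" has to fail every specific pattern before it reaches the catch-all rule, which is where the cost comes from.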
My idea is to build a single regexp with an alternation, detect which segment matches, and apply its replacement. That is, let the regexp engine itself do the loop. Much faster.
I have this working with a quick hack that has shown there’s a potential speedup here. To know which regexp is the one that matched I use named captures. For example, (?<_25>(ax)is) would be the alternation branch corresponding to the regexp /(ax)is/, if that’s the 26th pattern. It won’t win me an elegance award, but it is a hack that works (I could easily work around name clashes if the user happens to use _25 themselves, that’s not important).
Named captures are the only way I’ve seen to build the alternation and at the same time know which part matches, because existing inflection regexps already have captures of their own.
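A rough sketch of the idea follows. It is not the code from the proof of concept. The rule list, `COMBINED` and `pluralize_combined` are hypothetical names, and it assumes that once the matching branch is known, the original rule is applied with a plain `sub`.

```ruby
# Hypothetical, trimmed-down rule list (same ordering convention as before).
RULES = [
  [/(ax|test)is$/i, '\1es'],
  [/(quiz)$/i,      '\1zes'],
  [/$/,             's']
]

# Wrap rule i in a named group _i and join all of them into one alternation.
# Interpolating a Regexp uses Regexp#to_s, which keeps each rule's own flags.
COMBINED = Regexp.new(
  RULES.each_with_index.map { |(pattern, _), i| "(?<_#{i}>#{pattern})" }.join('|')
)

def pluralize_combined(word)
  match = COMBINED.match(word)
  return word unless match

  # Exactly one of the _i groups took part in the match; find its index.
  name  = match.names.find { |n| match[n] }
  index = name[1..-1].to_i

  # Apply the original rule, since Ruby does not capture unnamed groups
  # in a regexp that also contains named groups.
  pattern, replacement = RULES[index]
  word.sub(pattern, replacement)
end
```

With these sample rules, `pluralize_combined("axis")` gives "axes" and `pluralize_combined("post")` gives "posts". The sketch only shows how the matching branch can be identified; it does not try to show that the alternation picks the same rule as the linear scan for every possible rule set.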
This is the proof-of-concept: https://gist.github.com/1798985.
I believe this is correct as long as the user regexps have no backreferences, because if you have a = /(…)\1/ and b = /(.)\1/ then “xx” matches b, but does not match the union a|b because the backreference \1 in b now refers to the group in a.
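The problem can be reproduced with a small illustrative pair of regexps (my own example patterns, chosen only to show the effect), using `Regexp.union` to build the alternation:

```ruby
# Two hypothetical user rules that both use the backreference \1.
a = /(ab)\1/
b = /(.)\1/

# Alone, b matches any doubled character.
b_alone = b.match?("xx")

# In the union, b's group becomes group 2, but its \1 still points at
# group 1, which belongs to a and is unset whenever a did not match.
union    = Regexp.union(a, b)
in_union = union.match?("xx")

puts "b alone matches 'xx': #{b_alone}"
puts "union matches 'xx':   #{in_union}"
```

Here `b_alone` is true while `in_union` is false, even though "xx" should be matched by one of the two rules.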
OK, so this is the question: do you guys use backreferences in custom inflections? If you don’t, we could consider ruling them out for 4.0 to be able to implement this.