RE lookahead RE problems

I'll start by confessing that this comes originally from something I
worked on in Perl, and I've assumed, rightly or wrongly, that regular
expressions are regular expressions are regular expressions.


The context is that there is a whole pile of patterns that must be
preceded by ... well, not by word characters or by _some_ punctuation.
Call the allowed prefixes "sort-of zero width". That is, white space,
beginning of line and some opening sequences (call them '[', '(' and '{'
for the sake of the example) are allowed.

I'm trying to put the RE into a 'constant' so that I don't have to keep
repeating it - all the DRY stuff about changes and so forth!

I'm trying to use the RE lookbehind and lookahead assertions for this.

This works in Perl:

       $STARTWORD = qr/^|(?<=[\s\(\[\{])/m;

There is also the corresponding end word:

       $ENDWORD = qr/$|(?=[ \t\n\,\.\;\:\!\?\)])/om;

When I translate these into Ruby I get an error.
It doesn't seem to like the lookbehind.
The error message is

    SyntaxError undefined (?...) sequence: /^|(?<=[\s\(])/

Well, possibly. Or it may be that I'm having problems when combining
it with an actual pattern.

What I've done is separate out the pattern to a constant (and tried to
eliminate things that might confuse the parser)

   STARTWORD = %r{^|(?<=[\s\(])}m

And LO! The parser chokes on that.
Does it choke because there isn't actually a pattern being compared?
Well, maybe. If I remove the '%r{' stuff the parser doesn't choke.
But it doesn't choke on

    ENDWORD = %r{$|(?=[\s,.;:!?)])}m

And I seem to be getting confused when combining these with other
regular expressions because of this inconsistency.
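
For what it's worth, here is a minimal sketch of how a Regexp constant
behaves when it is combined with another pattern. It uses only the
ENDWORD lookahead, which parsed without complaint. CAT_WORD and the
sample strings are made up for illustration.

```ruby
# ENDWORD as above, minus the /m flag: in Ruby, ^ and $ already match at
# line boundaries, and /m only makes "." match newlines.
ENDWORD = %r{$|(?=[\s,.;:!?)])}

# A regexp literal assigned to a constant is an ordinary Regexp object.
puts ENDWORD.class

# Interpolating a Regexp inserts its to_s form, which wraps the source in
# a (?-mix:...) group, so the "|" inside ENDWORD stays contained.
puts ENDWORD.to_s

# CAT_WORD is a hypothetical pattern, used only for illustration.
CAT_WORD = /cat#{ENDWORD}/

p("cat, dog" =~ CAT_WORD)   # 0   ("cat" is followed by a comma)
p("category" =~ CAT_WORD)   # nil ("cat" is followed by a letter)
```

Because of that wrapping, a constant that contains a top-level "|" can
be dropped into a larger pattern without adding parentheses by hand.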

Right now I don't know if the problem is having the REs as constants.
Does this make them 'precompiled'?
   ENDWORD.type ==> "Regexp"
so I'm presuming it is. In which case why can't I precompile STARTWORD?

So: Is it that Ruby can't handle the '?<=' lookbehind assertion ... or
what? Am I completely hung up on a wrong track?
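
One way to check the lookbehind question directly, as a rough sketch:
build the pattern at run time with Regexp.new, which raises a
RegexpError that can be rescued. A regexp literal is compiled when the
file is parsed, which is why the failure above shows up as a
SyntaxError. The method name lookbehind_supported? is made up for this
example.

```ruby
# Hypothetical helper: true if this Ruby's regexp engine accepts a
# lookbehind assertion, false if it rejects the pattern.
def lookbehind_supported?
  Regexp.new('(?<=a)b')
  true
rescue RegexpError
  false
end

puts lookbehind_supported?
```

If this prints false, the problem is the engine and not the constant or
the %r{} syntax.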

The ruby regular expression engine doesn't support look-behind.

As far as I know, look-behind assertions are not handled by
Ruby 1.8.* but I think Oniguruma in 1.9 can.
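
Assuming Ruby 1.9 or later, the two constants from the original post
could be written roughly like this. It is only a sketch: whole_word is
a hypothetical helper, the character classes are taken from the Perl
versions, and the /m flag is left off because in Ruby ^ and $ already
match at line boundaries.

```ruby
# Start of line, or just after white space or an opening bracket.
STARTWORD = /^|(?<=[\s(\[{])/
# End of line, or just before white space or closing punctuation.
ENDWORD   = /$|(?=[\s,.;:!?)])/

# Hypothetical helper: match `word` only between STARTWORD and ENDWORD.
# Interpolated Regexps are wrapped in their own (?-mix:...) group.
def whole_word(word)
  /#{STARTWORD}#{Regexp.escape(word)}#{ENDWORD}/
end

text = "(foo) foobar [foo, x] foo"
p text.scan(whole_word("foo")).length   # 3 ("foobar" is skipped)
```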

You should ask your question on the Ruby-Talk mailing list,
which is a more appropriate place for this kind of question.

    -- Jean-François.

John Harrison said the following on 16/01/08 12:43 PM:

The ruby regular expression engine doesn't support look-behind.




Oniguruma is the RegExp engine for Ruby 1.9 and onwards, so you only need the gem for 1.8.x.


Jason Roelofs said the following on 16/01/08 01:21 PM:


This engine is the RegExp engine for Ruby 1.9 and onwards, so you only
need this gem for 1.8.x.

Roll on 1.9 then, because I get pages and pages of error messages when I
try installing this gem, starting with

oregexp.c:2:23: error: oniguruma.h: No such file or directory

Now that can't be because I don't have the Ruby sources installed, can it?

The Oniguruma gem is just a wrapper around the actual library. I haven’t installed this myself, though I assumed it would come with the needed code. You just need to install Oniguruma itself, then get the gem.


The library can be found here:

I am trying to get look behind working as well. However, having got
past the errors, I am now wrestling with syntax:

** Starting Rails with development environment...
`gem_original_require': ./lib/string_extensions.rb:4: undefined sequence: /[aeiou]|(?<![aeiou])y(?![aeiou])/ (SyntaxError)
./lib/string_extensions.rb:8: undefined (?...) sequence: /![aeiou]|(?<=[aeiou])y(?=[aeiou])/
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'

It seems to be complaining about the look-behind and look-ahead
assertions in the following code fragment (which Oniguruma is
supposed to support):

    class String
      def vowels
      def consonants

According to this reference (
doc/RE.txt), the look behind and look ahead syntax that I am using
appears to be correct (ref section 7. Extended groups) but apparently
is not.


Thanks for all the help everyone. The problem was solved with the help
from pullmonkey on Rails Forum! Here is the solution. The requirements
were:


1. Extract vowels and consonants from a string
2. Handle the conditional treatment of 'y' as a vowel under the
following circumstances:
     - y is a vowel if it is surrounded by consonants
     - y is a consonant if it is adjacent to a vowel

Here is the code that works:

  def vowels(name_str)
    reg ='[aeiou]|(?<![aeiou])y(?![aeiou])')

  def consonants(name_str)
    reg ='[bcdfghjklmnpqrstvwx]|(?<=[aeiou])y|y(?=[aeiou])')

(Note, the .scan(/./) can be eliminated to return an array)

The major problem was getting the code to accurately treat "y" as a
consonant. The key to solving this problem was to:

1. define unconditional consonants explicitly (i.e.,
[bcdfghjklmnpqrstvwx]) -- not as [^aeiou], which automatically includes
"y" and thus OVERRIDES any conditional treatment of "y" that follows

2. define the conditional "y" regexp assertions independently, i.e.,
"|(?<=[aeiou])y|y(?=[aeiou])" -- not "|(?<=[aeiou])y(?=[aeiou])",
which only matches "y" preceded AND followed by a vowel, not preceded
OR followed by a vowel
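
For anyone on Ruby 1.9 or later, where lookbehind is built in, the same
idea can be sketched without the gem. This is not the code from Rails
Forum, just my reading of the two rules. The constant names are mine,
downcasing the input is an assumption, and I have added "z" to the
consonant class on the assumption that leaving it out was an oversight.

```ruby
# "y" counts as a vowel only when no vowel sits on either side of it.
VOWEL_RE = /[aeiou]|(?<![aeiou])y(?![aeiou])/

# "y" counts as a consonant when a vowel precedes it OR follows it.
# The class lists consonants explicitly so that "y" is not swallowed;
# "z" is an addition to the class given in the post.
CONSONANT_RE = /[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])/

def vowels(name_str)
  name_str.downcase.scan(VOWEL_RE)
end

def consonants(name_str)
  name_str.downcase.scan(CONSONANT_RE)
end

p vowels("Sylvia")      # ["y", "i", "a"]
p consonants("Sylvia")  # ["s", "l", "v"]
p vowels("Maya")        # ["a", "a"]
p consonants("Maya")    # ["m", "y"]
```

String#scan already returns an array of matches, so no further
splitting is needed here.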