RE lookahead RE problems

I'll start by confessing that this comes originally from something I
worked on in Perl, and I've assumed, rightly or wrongly, that regular
expressions are regular expressions are regular expressions.

See
http://www.ilovejackdaniels.com/cheat-sheets/regular-expressions-cheat-sheet/

The context is that there is a whole pile of patterns that must be
preceded by .... well, not by words or by _some_ punctuation. Call the
permitted predecessors "sort-of zero width". That is, white space,
beginning of line and some opening sequences, say '[' and '(' and '{'
for the sake of the example, are allowed.

I'm trying to put the RE into a 'constant' so that I don't have to keep
repeating it - all the DRY stuff about changes and so forth!

I'm trying to use the RE lookahead and lookbehind assertions.

This works in perl

       $STARTWORD = qr/^|(?<=[\s\(\[\{])/m;

There is also the corresponding end word

       $ENDWORD = qr/$|(?=[ \t\n\,\.\;\:\!\?\)])/om;

When I translate these into Ruby I get an error.
It doesn't seem to like the lookbehind.
The error message is

    SyntaxError undefined (?...) sequence: /^|(?<=[\s\(])/

Well, possibly. Or it may be that I'm having problems when combining
it with an actual pattern.

What I've done is separate out the pattern to a constant (and tried to
eliminate things that might confuse the parser)

   STARTWORD = %r{^|(?<=[\s\(])}m

And LO! The parser chokes on that.
Does it choke because there isn't actually a pattern being compared?
Well, maybe. If I remove the '%r{' stuff the parser doesn't choke.
But it doesn't choke on

    ENDWORD = %r{$|(?=[\s,.;:!?)])}m

And I seem to be getting confused when combining these with other
regular expressions because of this inconsistency.

Right now I don't know if the problem is having the REs as constants.
Does this make them 'precompiled'?
   ENDWORD.type ==> "Regexp"
so I'm presuming it is. In which case why can't I precompile STARTWORD?

So: Is it that Ruby can't handle the '?<=' lookbehind assertion ... or
what? Am I completely hung up on a wrong track?
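
One way to separate the two suspicions (the constant versus the
lookbehind) might be to compile the pattern from a string at run time.
As far as I know, Regexp.new raises RegexpError for a pattern the engine
rejects, whereas a bad regex literal already fails when the file is
parsed. A minimal sketch; the helper name compiles? is made up for
illustration:

```ruby
# Hypothetical helper: true if this Ruby's regex engine accepts the pattern.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

puts compiles?('$|(?=[\s,.;:!?)])')   # lookahead only
puts compiles?('^|(?<=[\s(])')        # lookbehind
```

If the lookbehind line prints false while the lookahead line prints
true, the engine is at fault rather than the use of a constant.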

The ruby regular expression engine doesn't support look-behind.

http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

As far as I know, look-behind assertions are not handled by
Ruby 1.8.* but I think Oniguruma in 1.9 can.

You should ask your question on the Ruby-Talk mailing list,
which is a more appropriate place for this kind of question.

    -- Jean-François.
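
For reference, a sketch of how the two constants might look on Ruby 1.9
or later, assuming its built-in engine accepts lookbehind. Ruby's ^ and
$ always match at line boundaries, and Ruby's /m only makes '.' match
newlines, so the Perl /m flag is dropped here. The word_re helper is
hypothetical, not something from the original code:

```ruby
# Assumes Ruby 1.9+; on 1.8 the lookbehind in STARTWORD is a SyntaxError.
STARTWORD = /^|(?<=[\s(\[{])/
ENDWORD   = /$|(?=[\s,.;:!?)])/

# Hypothetical helper: build a pattern for a literal word between the guards.
# Interpolating a Regexp wraps it in its own (?-mix:...) group, so the
# alternations inside the constants stay contained.
def word_re(word)
  /#{STARTWORD}#{Regexp.escape(word)}#{ENDWORD}/
end

p("see (foo) here" =~ word_re("foo"))   # offset of "foo", or nil
p("seefoo here"    =~ word_re("foo"))
```

Keeping the guards in constants means a change to the allowed
delimiters is made in one place only.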

John Harrison said the following on 16/01/08 12:43 PM:

The ruby regular expression engine doesn't support look-behind.

http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

{{ExpletiveDeleted!}}

Suggestions?

Oniguruma

http://oniguruma.rubyforge.org/

This engine is the RegExp engine for Ruby 1.9 and onwards, so you only need this gem for 1.8.x.

Jason
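
If installing the gem is not an option on 1.8.x, one possible workaround
(not proposed in this thread, so treat it as an assumption) is to
consume the preceding character in a capture group instead of asserting
it with lookbehind, and to read the word from a second group. Lookahead
is still available, as the ENDWORD constant showed. The name find_word
is hypothetical:

```ruby
# Workaround sketch: no lookbehind, so the leading delimiter is matched
# (group 1) and the word itself is read from group 2.
def find_word(text, word)
  re = /(^|[\s(\[{])(#{Regexp.escape(word)})(?=$|[\s,.;:!?)])/
  m = re.match(text)
  m && m.begin(2)   # offset of the word, or nil when there is no match
end

p find_word("see (foo) here", "foo")
p find_word("seefoo here", "foo")
```

The trade-off is that the delimiter is now part of the match, so it has
to be put back by hand in substitutions.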

Jason Roelofs said the following on 16/01/08 01:21 PM:

Oniguruma

http://oniguruma.rubyforge.org/

This engine is the RegExp engine for Ruby 1.9 and onwards, so you only
need this gem for 1.8.x.

Roll on 1.9 then, because I get pages and pages of error messages when I
try installing this gem, starting with

oregexp.c:2:23: error: oniguruma.h: No such file or directory

Now that can't be because I don't have the Ruby sources installed, can it?

The Oniguruma gem is just a wrapper around the actual library. I haven’t installed this myself, though I assumed it would come with the needed code. You just need to install Oniguruma itself, then get the gem.

Jason

The library can be found here: http://www.geocities.jp/kosako3/oniguruma/

I am trying to get look behind working as well. However, having got
past the errors, I am now wrestling with syntax:

** Starting Rails with development environment...
Exiting
/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': ./lib/string_extensions.rb:4: undefined (?...) sequence: /[aeiou]|(?<![aeiou])y(?![aeiou])/ (SyntaxError)
./lib/string_extensions.rb:8: undefined (?...) sequence: /![aeiou]|(?<=[aeiou])y(?=[aeiou])/ from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
It seems to be complaining about the look-behind and look-ahead
assertions in the following code fragment (which Oniguruma is supposed
to support):
class String
  def vowels
    scan(/[aeiou]|(?<![aeiou])y(?![aeiou])/i)
  end
  def consonants
    scan(/![aeiou]|(?<=[aeiou])y(?=[aeiou])/i)
  end
end
According to this reference
(http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt), the look-behind
and look-ahead syntax that I am using appears to be correct (ref
section 7. Extended groups) but apparently is not.

<stumped/>

Thanks for all the help everyone. The problem was solved with the help
from pullmonkey on Rails Forum! Here is the solution:

Objective:

1. Extract vowels and consonants from a string
2. Handle the conditional treatment of 'y' as a vowel under the
following circumstances:
     - y is a vowel if it is surrounded by consonants
     - y is a consonant if it is adjacent to a vowel

Here is the code that works:

  def vowels(name_str)
    reg = Oniguruma::ORegexp.new('[aeiou]|(?<![aeiou])y(?![aeiou])')
    reg.match_all(name_str).to_s.scan(/./)
  end

  def consonants(name_str)
    reg = Oniguruma::ORegexp.new('[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])')
    reg.match_all(name_str).to_s.scan(/./)
  end

(Note, the .scan(/./) can be eliminated to return an array)

The major problem was getting the code to accurately treat "y" as a
consonant. The key to solving this problem was to:

1. define unconditional consonants explicitly (i.e.,
[bcdfghjklmnpqrstvwxz]) -- not as [^aeiou], which automatically includes
"y" and so OVERRIDES any conditional treatment of "y" that follows

2. define the conditional "y" assertions independently, i.e.,
"|(?<=[aeiou])y|y(?=[aeiou])" -- not "|(?<=[aeiou])y(?=[aeiou])",
which only matches "y" preceded AND followed by a vowel, not preceded
OR followed by a vowel
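
On Ruby 1.9 or later the same two rules should also be expressible with
the built-in Regexp and String#scan, without the gem. A sketch under
that assumption; it lowercases the input rather than relying on a
case-insensitive flag, and the constant names are made up:

```ruby
# Assumes Ruby 1.9+ (built-in lookbehind support).
VOWEL_RE     = /[aeiou]|(?<![aeiou])y(?![aeiou])/
CONSONANT_RE = /[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])/

def vowels(name_str)
  name_str.downcase.scan(VOWEL_RE)
end

def consonants(name_str)
  name_str.downcase.scan(CONSONANT_RE)
end

p vowels("yes")       # y is followed by a vowel, so it is not listed
p consonants("yes")
p vowels("rhythm")    # y sits between consonants, so it counts as a vowel
```

Because neither pattern has a capture group, scan returns a flat array
of the matched letters.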

HTH.