I'll start by confessing that this comes originally from something I worked on in Perl, and I've assumed, rightly or wrongly, that regular expressions are regular expressions are regular expressions.
See http://www.ilovejackdaniels.com/cheat-sheets/regular-expressions-cheat-sheet/
The context is that there is a whole pile of patterns that must be preceded by ... well, not by word characters or by _some_ punctuation. Call the permitted prefixes "sort-of zero width". That is, white space, beginning of line, and some opening sequences (call them '[', '(' and '{' for the sake of the example) are allowed.
I'm trying to put the RE into a 'constant' so that I don't have to keep repeating it - all the DRY stuff about changes and so forth!
I'm trying to use the RE lookbehind and lookahead assertions.
This works in Perl:
$STARTWORD = qr/^|(?<=[\s\(\[\{])/m;
There is also the corresponding end-of-word pattern:
$ENDWORD = qr/$|(?=[ \t\n\,\.\;\:\!\?\)])/om;
When I translate these into Ruby I get an error; it doesn't seem to like the lookbehind. The error message is:
SyntaxError undefined (?...) sequence: /^|(?<=[\s\(])/
Well, possibly. Or it may be that I'm having problems when combining it with an actual pattern.
What I've done is separate out the pattern into a constant (and tried to eliminate things that might confuse the parser):
STARTWORD = %r{^|(?<=[\s\(])}m
And LO! The parser chokes on that. Does it choke because there isn't actually a pattern being compared? Well, maybe. If I remove the '%r{' stuff the parser doesn't choke. But it doesn't choke on:
ENDWORD = %r{$|(?=[\s,.;:!?)])}m
And I seem to be getting confused when combining these with other regular expressions because of this inconsistency.
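To make "combining" concrete, here is a minimal sketch of the sort of thing I mean, using only the lookahead constant (the one that parses). END_OF_WORD is just ENDWORD under another name, and FOO_WORD is a made-up pattern purely for illustration. I've left off the /m flag on the assumption that Ruby's ^ and $ already match at line boundaries without it.

```ruby
# END_OF_WORD mirrors ENDWORD above, minus the /m flag (assumption: Ruby's $
# already matches at the end of each line, so /m is not needed for that).
END_OF_WORD = %r{$|(?=[\s,.;:!?)])}

# FOO_WORD is a hypothetical pattern, for illustration only.
# Interpolating a Regexp inserts it as (?-mix:...), so the "|" inside
# END_OF_WORD stays scoped to that group.
FOO_WORD = /foo#{END_OF_WORD}/

p("foo, bar" =~ FOO_WORD)  # "foo" followed by a comma: matches
p("a foo"    =~ FOO_WORD)  # "foo" at end of line: matches
p("foobar"   =~ FOO_WORD)  # "foo" followed by a word character: no match
```

If this is the right way to combine them, then the constant itself is not the problem, and it is only the lookbehind half that needs sorting out.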
Right now I don't know if the problem is having the REs as constants. Does this make them 'precompiled'? ENDWORD.type ==> "Regexp" so I'm presuming it is. In which case why can't I precompile STARTWORD?
So: Is it that Ruby can't handle the '?<=' lookbehind assertion ... or what? Am I completely hung up on a wrong track?
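If it does turn out that the lookbehind is simply not available, one fallback might be to consume the leading character in a capture group rather than assert it. This is only a sketch under that assumption; LEAD and BAR_WORD are hypothetical names, not anything from my real code.

```ruby
# LEAD is a hypothetical stand-in for STARTWORD: it consumes the leading
# character (or matches at start of line) instead of using a lookbehind.
LEAD = /(^|[\s(\[{])/

# BAR_WORD is likewise made up for the example.
BAR_WORD = /#{LEAD}(bar)/

m = BAR_WORD.match("see (bar)")
p m[1]                      # the leading character that was consumed
p m[2]                      # the word itself
p BAR_WORD.match("foobar")  # preceded by word characters: no match
```

The obvious cost is that the match is no longer zero width at the front, so the leading character has to be put back by hand in any substitution.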