This will start to get tricky as it becomes important to know what the
rules really are.
Do you want to avoid car surrounded by an anchor tag, or car
surrounded by any tag?
<p>car lorem ipsum</p> will pose a problem, as car is only half surrounded by a tag.
So, to satisfy your test case, the following works by making sure that
car is not immediately followed by a '<', which handles both cases
mentioned above.
<string>.gsub(/\bcar\b(?=[^<])/, 'bike')
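This checks that car has a word boundary on each side and is not followed by a '<'.
Note that (?=[^<]) also requires some character to follow, so a car at the
very end of the string is left alone.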
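As a quick check of the pattern above, here is a minimal sketch run against two made-up sample strings (the strings are assumptions for illustration, not taken from the original test case):

```ruby
# Pattern from the post: "car" as a whole word, followed by any character except '<'.
pattern = /\bcar\b(?=[^<])/

linked    = '<a href="#">car</a>'     # car is directly followed by '<'
paragraph = '<p>car lorem ipsum</p>'  # car is followed by a space

linked_result    = linked.gsub(pattern, 'bike')
paragraph_result = paragraph.gsub(pattern, 'bike')

puts linked_result     # <a href="#">car</a>
puts paragraph_result  # <p>bike lorem ipsum</p>
```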
If, in Ruby 1.9, I do the following:
<string>.gsub(/(?<!>)\bcar\b(?=[^<])/, 'bike')
I am now checking that car has a word boundary on each side, is not
preceded by a '>' and is not followed by a '<'. However, this pattern will
not replace "<p>car lorem" with "<p>bike lorem".
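The limitation is easy to reproduce. A small sketch, again with made-up input strings:

```ruby
# Stricter pattern: not preceded by '>' and followed by a character other than '<'.
# Look-behind needs Ruby 1.9 or later.
strict = /(?<!>)\bcar\b(?=[^<])/

after_tag  = '<p>car lorem'.gsub(strict, 'bike')   # car is preceded by '>'
plain_text = 'my car is red'.gsub(strict, 'bike')  # no tags nearby

puts after_tag   # <p>car lorem    (left alone, which is not what we want)
puts plain_text  # my bike is red
```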
I'm starting to think that regular expressions are not the best way to
solve this.
You should probably use an HTML parser and then do regular expression
substitutions based on where in the DOM you are.
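A real HTML parser (the Nokogiri gem, for example) is the robust way to do this. As a rough stand-in that needs only core Ruby, the sketch below splits the markup into tags and text runs and substitutes only in text that is not inside an anchor. It is not a parser: replace_outside_anchors is a hypothetical helper name, and the tokenizer assumes well-formed tags with no stray '<' or '>' in text or attribute values.

```ruby
# Hypothetical helper: replace whole-word `word` with `replacement`,
# but only in text that is not inside an <a>...</a> element.
def replace_outside_anchors(html, word, replacement)
  word_pattern = /\b#{Regexp.escape(word)}\b/
  depth = 0 # how many <a> elements we are currently inside

  # Split into tokens: either a complete tag or a run of text.
  html.scan(/<[^>]*>|[^<]+/).map do |token|
    if token.start_with?('<')
      depth += 1 if token.match?(/\A<a[\s>]/i)               # opening anchor
      depth -= 1 if token.match?(/\A<\/a\s*>/i) && depth > 0  # closing anchor
      token
    elsif depth.zero?
      token.gsub(word_pattern, replacement)  # text outside any anchor
    else
      token                                  # text inside an anchor: untouched
    end
  end.join
end

puts replace_outside_anchors('<p>car lorem</p> <a href="#">car</a> car', 'car', 'bike')
# <p>bike lorem</p> <a href="#">car</a> bike
```

Tracking position in the markup this way handles the "<p>car lorem" case that the look-around patterns cannot.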
Thanks again for your help.
Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means
"any character except '<'", but why should I use the '()' and '?='?
(?=<pattern>) is a look-ahead match that doesn't consume the characters matched.
In the string "cartoon",
/car(?=[^<])/ will only match "car", whereas /car[^<]/ will match "cart".
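/car(?=[^<])/ can almost be rewritten as /car(?!<)/, where ?! is a
negative look-ahead. The one difference is at the end of the string:
(?=[^<]) needs a character to follow, while (?!<) does not.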
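To see the difference between consuming and not consuming, and the one edge case where the positive and negative look-ahead forms disagree, here is a short sketch (variable names are only for illustration):

```ruby
# String#[] with a regexp returns the matched text, or nil if there is no match.
lookahead = 'cartoon'[/car(?=[^<])/]  # the 't' is checked but not consumed
consuming = 'cartoon'[/car[^<]/]      # the 't' becomes part of the match

# At the end of the string the two look-ahead forms behave differently.
at_end_positive = 'car'[/car(?=[^<])/]  # needs a following character
at_end_negative = 'car'[/car(?!<)/]     # only needs "no '<' next"

p lookahead        # "car"
p consuming        # "cart"
p at_end_positive  # nil
p at_end_negative  # "car"
```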
(?<=<pattern>) is a look-behind match that doesn't consume the
characters matched, and (?<!<pattern>) is its negative form (both only
in Ruby 1.9).
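A small sketch of both look-behind forms, using a made-up input string; the positive form is written (?<=...) and the negative form (?<!...):

```ruby
text = '<p>car'

matched  = text[/(?<=>)car/]               # '>' is required but not part of the match
positive = text.sub(/(?<=>)car/, 'bike')   # car preceded by '>': replaced
negative = text.sub(/(?<!>)car/, 'bike')   # car must NOT be preceded by '>': no match

p matched   # "car"
p positive  # "<p>bike"
p negative  # "<p>car"
```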