Help with regular expressions and gsub

Andrew_Timberlake · April 18, 2009, 7:38am

Use word break matches in your regular expression as follows:

<string>.gsub(/\bcar\b/, 'bike')
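
A minimal sketch of what the word-break anchors change. The sample sentence is made up for illustration and is not from the original question:

```ruby
# Hypothetical sample text; not taken from the original post.
text = "My car is in the carport next to a cartoon of a car."

# \b matches at a word boundary, so only the standalone word "car" is replaced.
with_boundaries = text.gsub(/\bcar\b/, 'bike')

# Without \b, every occurrence of the letters "car" is replaced.
without_boundaries = text.gsub(/car/, 'bike')

puts with_boundaries    # My bike is in the carport next to a cartoon of a bike.
puts without_boundaries # My bike is in the bikeport next to a biketoon of a bike.
```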

Andrew Timberlake http://ramblingsonrails.com http://www.linkedin.com/in/andrewtimberlake

"I have never let my schooling interfere with my education" - Mark Twain

11175 · April 19, 2009, 8:56pm

Andrew Timberlake wrote:

Andrew_Timberlake · April 20, 2009, 6:08am

This will start to get tricky, as it becomes important to know what the rules really are. Are you wanting to avoid car surrounded by an anchor tag, or car surrounded by any tag? car lorem ipsum will pose a problem, as it's half surrounded by a tag.

So, to satisfy your test case, the following works by making sure that car is not followed by a '<', which will work in both cases mentioned above.

<string>.gsub(/\bcar\b(?=[^<])/, 'bike')

This checks that car is surrounded by a word-break but is not followed by a '<'.
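
As a rough illustration, here is that pattern run against a string modelled on the anchor-tag case discussed in this thread. The exact test strings are assumptions. The second call shows a caveat: (?=[^<]) needs some character to follow, so a "car" at the very end of the string is left alone.

```ruby
# Hypothetical test string modelled on the anchor-tag case in this thread.
html = "A car and <a href='#'>car</a> here"

# Replace "car" only when the next character exists and is not '<'.
result = html.gsub(/\bcar\b(?=[^<])/, 'bike')
puts result   # A bike and <a href='#'>car</a> here

# Caveat: the look-ahead fails at end of string, so a trailing "car" is skipped.
trailing = "I like my car".gsub(/\bcar\b(?=[^<])/, 'bike')
puts trailing # I like my car
```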

If, in Ruby 1.9, I do the following:

<string>.gsub(/(?<!>)\bcar\b(?=[^<])/, 'bike')

I am now checking that car is surrounded by a word-break, is not preceded by a '>' and is not followed by a '<'. However, this pattern will not replace "car lorem" with "bike lorem" when that text comes directly after a '>'.
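
A small sketch of how the combined pattern behaves, using made-up inputs (the <b> wrapper is an assumption chosen to show the "half surrounded" case). Plain text is replaced, but any "car" that directly follows a '>' is skipped, even when more text follows it:

```ruby
# Needs Ruby 1.9 or later for the look-behind.
pattern = /(?<!>)\bcar\b(?=[^<])/

# No tag nearby: replaced.
plain = "car lorem".gsub(pattern, 'bike')

# Directly after a '>': the negative look-behind blocks the match.
after_tag = "<b>car lorem</b>".gsub(pattern, 'bike')

# Fully wrapped in an anchor: blocked as well.
anchor = "<a href='#'>car</a> lorem".gsub(pattern, 'bike')

puts plain     # bike lorem
puts after_tag # <b>car lorem</b>
puts anchor    # <a href='#'>car</a> lorem
```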

11175 · April 20, 2009, 7:15am

Thanks. When I use <>. what I need is to obtain bike and <a href='#'>car</a>.

Andrew_Timberlake · April 20, 2009, 7:35am

I'm starting to think that regular expressions are not the best way to solve this. You should probably use an HTML parser and then do regular expression substitutions based on where in the DOM you are.
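
To give a feel for the "know where you are in the document" idea without pulling in a library, here is a rough sketch that splits the string into tags and text and tracks whether it is inside an anchor. This is a simple tokenizer standing in for a real HTML parser, not a parser itself, and the method name replace_outside_anchors is hypothetical. It assumes tags never contain a '>' inside an attribute value; real-world HTML would want a proper parser.

```ruby
# Replace a whole word everywhere except inside <a>...</a>.
# NOT a real HTML parser: assumes no '>' appears inside attribute values.
def replace_outside_anchors(html, word, replacement)
  depth = 0 # how many <a> elements we are currently inside
  pattern = /\b#{Regexp.escape(word)}\b/

  # The capture group makes split keep the tags as separate tokens.
  html.split(/(<[^>]*>)/).map do |token|
    if token =~ /\A<a\b/i
      depth += 1
      token
    elsif token =~ /\A<\/a\s*>/i
      depth -= 1 if depth > 0
      token
    elsif token.start_with?('<') || depth > 0
      token # other tags, and text inside an anchor, stay untouched
    else
      token.gsub(pattern, replacement)
    end
  end.join
end

puts replace_outside_anchors("car and <a href='#'>car</a>", 'car', 'bike')
# bike and <a href='#'>car</a>
```

Unlike the look-around patterns, this also handles text that is only partly next to a tag, for example "<b>car</b> <a>car wash</a> car" becomes "<b>bike</b> <a>car wash</a> bike".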

11175 · April 20, 2009, 8:38am

Thanks again for your help. Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means "do not match '<'", but why should I use the '()' and '?='?

Andrew_Timberlake · April 20, 2009, 8:58am

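(?=<pattern>) is a look-ahead match that doesn't consume the characters matched. In the string "cartoon", /car(?=[^<])/ will only match "car", whereas /car[^<]/ will match "cart". /car(?=[^<])/ can actually be rewritten as /car(?!<)/, where ?! is a negative look-ahead. The two differ only at the end of the string, where there is no next character: (?!<) still matches there but (?=[^<]) does not.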
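
The "cartoon" example can be checked directly. This short sketch shows what each pattern matches, why consumption matters for gsub, and how the positive and negative look-aheads behave at the end of a string:

```ruby
# The look-ahead checks the next character but leaves it out of the match.
lookahead = "cartoon"[/car(?=[^<])/] # "car"
consuming = "cartoon"[/car[^<]/]     # "cart"

# With gsub, consumed characters are replaced too.
kept_t = "cartoon".gsub(/car(?=[^<])/, 'bike') # "biketoon"
lost_t = "cartoon".gsub(/car[^<]/, 'bike')     # "bikeoon"

# At end of string there is no next character to look at.
positive_at_end = ("car" =~ /car(?=[^<])/) # nil, no match
negative_at_end = ("car" =~ /car(?!<)/)    # 0, matches at index 0

puts lookahead, consuming, kept_t, lost_t
```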

(?<=<pattern>) is a look-behind match that doesn't consume the characters matched, and (?<!<pattern>) is its negative form (Ruby 1.9 and later only).
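
For comparison, a brief sketch of the two look-behind forms, written (?<=...) for positive and (?<!...) for negative. The input string is made up. In both cases the '>' is only inspected, never replaced:

```ruby
# Needs Ruby 1.9 or later.
text = "<b>car</b> car"

# Positive look-behind: only the "car" directly preceded by '>' matches.
positive = text.gsub(/(?<=>)car/, 'bike')

# Negative look-behind: only the "car" NOT preceded by '>' matches.
negative = text.gsub(/(?<!>)car/, 'bike')

puts positive # <b>bike</b> car
puts negative # <b>car</b> bike
```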
