Removing a block of text within a string

Bala_Paranj · December 7, 2006, 6:45am

my_string = "<pre>dflakdjflakj</pre>
my_string.gsub(/[^<pre>][</pre>$], '')

You basically create a regular expression to find the <pre> and </pre> and nuke all the characters when you find the match.

I don't think my regex is correct. Advanced Ruby developers here can help.

rab · December 7, 2006, 3:50pm

Can someone help me with how to do this? What would the "strip_code_snippets" method look like? I think I'd be fine finding the <pre> tag but I don't know how to completely remove the text from <pre> to </pre>. Any help would be greatly appreciated. Thanks in advance.

>> my_string = "This is my little block of code <pre>puts 'I need to learn about Regexp'\nputs 'Will you help?'</pre>"
=> "This is my little block of code <pre>puts 'I need to learn about Regexp'\nputs 'Will you help?'</pre>"
>> regexp = %r{<pre\b[^>]*>.*?</pre>}m
=> /<pre\b[^>]*>.*?<\/pre>/m
>> my_string.gsub(regexp, '')
=> "This is my little block of code "
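For the "strip_code_snippets" method the question asks about, a minimal sketch could wrap the same gsub call. The method body and the sample string here are my own assumptions for illustration, not something from the original application:

```ruby
# Hypothetical helper, named after the method mentioned in the question.
# Removes every <pre>...</pre> block, tags included.
def strip_code_snippets(text)
  text.gsub(%r{<pre\b[^>]*>.*?</pre>}m, '')
end

# Made-up sample with two blocks, one of them carrying an attribute.
sample = "Intro <pre>puts 'a'\nputs 'b'</pre> middle <pre class=\"ruby\">x = 1</pre> end"
puts strip_code_snippets(sample)
# prints: Intro  middle  end
```

Note that the text around each block is left alone, so the spaces on both sides of a removed block remain. If that matters, a follow-up squeeze(' ') is one option.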

The square brackets [...] enclose a character set and [^...] encloses a negated set (not one of those characters). The previous posting isn't even well-formed syntactically.
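A quick way to see the difference between a character set and a literal sequence is to scan a small string with each. This is only an illustrative sketch, and the sample string is made up:

```ruby
text = "<pre>x</pre>"

# [^<pre>] matches ONE character that is not <, p, r, e or >.
negated = text.scan(/[^<pre>]/)
p negated   # => ["x", "/"]

# Without the brackets, <pre> is matched as a literal sequence.
literal = text.scan(/<pre>/)
p literal   # => ["<pre>"]
```

So a bracketed "<pre>" never means "the tag <pre>", only "any one of these five characters".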

Here's a little explanation to get you started:

regexp = %r{<pre\b[^>]*>.*?</pre>}m

<pre = literal character matching
\b = word boundary
[^>] = character set matching anything that's NOT a >
* = zero or more times
> = literal >
.*? = any character (.) repeated zero or more times, but as few as possible to let the regexp match (*?)
      Note: this is getting rather advanced, you can look at the pickaxe pp.68-77
</pre> = literal character matching
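The "as few as possible" part is easiest to see on a string with two blocks. A small sketch (the input string is made up) comparing .*? with plain .*:

```ruby
text = "a <pre>one</pre> b <pre>two</pre> c"

# Lazy: each match stops at the first </pre> it can reach.
lazy = text.gsub(%r{<pre\b[^>]*>.*?</pre>}m, '')

# Greedy: a single match runs from the first <pre> to the LAST </pre>.
greedy = text.gsub(%r{<pre\b[^>]*>.*</pre>}m, '')

p lazy    # => "a  b  c"
p greedy  # => "a  c"
```

With the greedy version, the text between the two blocks (" b ") is swallowed too, which is usually not what you want here.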

The %r{ } is an alternate way to write a literal regular expression which I used in lieu of escaping the / in the /pre. You can see the equivalent form that irb printed as the value.
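To check the two spellings against each other, something like this should do. The variable names are just for illustration:

```ruby
with_percent = %r{</pre>}   # no escaping needed inside %r{ }
with_slashes = /<\/pre>/    # the / must be escaped between slashes

# inspect always shows the slash-delimited form, like irb does.
puts with_percent.inspect   # prints: /<\/pre>/

# Both find the closing tag at the same offset.
text = "<pre>x</pre>"
p text =~ with_percent      # => 6
p text =~ with_slashes      # => 6
```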

The 'm' at the end is a flag to match multi-line input. It turns the '.' from matching "any character except newline" to simply "any character".
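Here is a small sketch of what the flag changes, using a made-up string with a newline inside the block:

```ruby
text = "before <pre>line one\nline two</pre> after"

# Without m, '.' stops at the newline, so the pattern never reaches
# </pre> and nothing is replaced.
without_m = text.gsub(%r{<pre\b[^>]*>.*?</pre>}, '')

# With m, '.' also matches the newline and the whole block is removed.
with_m = text.gsub(%r{<pre\b[^>]*>.*?</pre>}m, '')

p without_m  # => "before <pre>line one\nline two</pre> after"
p with_m     # => "before  after"
```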

-Rob

Rob Biedenharn http://agileconsultingllc.com Rob@AgileConsultingLLC.com

rab · December 7, 2006, 9:52pm

Sorry if this gets dup'd, I haven't seen it hit the list after 8hrs.
