Regex in Ruby - Strip HTML out of comments - help

Steve_Longdo · August 21, 2006, 2:44am

Typo extends the String class to add the following method:

    class String
      # Strips any html markup from a string
      TYPO_TAG_KEY = TYPO_ATTRIBUTE_KEY = /[\w:_-]+/
      TYPO_ATTRIBUTE_VALUE = /(?:[A-Za-z0-9]+|(?:'[^']*?'|"[^"]*?"))/
      TYPO_ATTRIBUTE = /(?:#{TYPO_ATTRIBUTE_KEY}(?:\s*=\s*#{TYPO_ATTRIBUTE_VALUE})?)/
      TYPO_ATTRIBUTES = /(?:#{TYPO_ATTRIBUTE}(?:\s+#{TYPO_ATTRIBUTE})*)/
      TAG = %r{<[!/?\[]?(?:#{TYPO_TAG_KEY}|--)(?:\s+#{TYPO_ATTRIBUTES})?\s*(?:[!/?\]]+|--)?>}

      def strip_html
        self.gsub(TAG, '').gsub(/\s+/, ' ').strip
      end
    end

I haven’t run into any edge cases where it fails yet, but I’m sure a bug report would be welcome if anyone finds one.
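If you want to try it outside of Typo, here is a minimal sketch that reopens String with the same constants and runs a few strings through strip_html. The patterns are my reading of the code above, with the quantifiers, escaped brackets, straight quotes and `--` written out the way I assume the original source had them. The sample inputs are made up for illustration and are not from Typo's test suite.

```ruby
# Sketch only: the patterns below are an assumed reconstruction of the
# Typo code quoted in this post, not a copy checked against the Typo source.
class String
  TYPO_TAG_KEY = TYPO_ATTRIBUTE_KEY = /[\w:_-]+/
  # An attribute value is a bare alphanumeric word or a quoted string.
  TYPO_ATTRIBUTE_VALUE = /(?:[A-Za-z0-9]+|(?:'[^']*?'|"[^"]*?"))/
  # key, optionally followed by =value
  TYPO_ATTRIBUTE = /(?:#{TYPO_ATTRIBUTE_KEY}(?:\s*=\s*#{TYPO_ATTRIBUTE_VALUE})?)/
  # one attribute, then any number of whitespace-separated attributes
  TYPO_ATTRIBUTES = /(?:#{TYPO_ATTRIBUTE}(?:\s+#{TYPO_ATTRIBUTE})*)/
  # opening, closing, self-closing and declaration-style tags, plus simple comments
  TAG = %r{<[!/?\[]?(?:#{TYPO_TAG_KEY}|--)(?:\s+#{TYPO_ATTRIBUTES})?\s*(?:[!/?\]]+|--)?>}

  # Removes anything matching TAG, then collapses runs of whitespace.
  def strip_html
    gsub(TAG, '').gsub(/\s+/, ' ').strip
  end
end

# Hypothetical sample inputs for poking at the behaviour.
samples = [
  '<p>Hello <b>world</b></p>',
  '<a href="http://example.com" class=link>Link</a>  text',
  '1 < 2 and 3 > 2',
  '<img src=foo.png>'
]

samples.each do |html|
  puts "#{html.inspect} => #{html.strip_html.inspect}"
end
```

With the patterns as written here, the first two samples come back as plain text and the comparison string is left alone. The last one looks like a candidate edge case: an unquoted attribute value containing a dot is not covered by TYPO_ATTRIBUTE_VALUE, so the tag appears to be left in place.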
