Regex in Ruby - Strip HTML out of comments - help

Typo extends the String class to add the following method:

class String
  # Strips any HTML markup from a string
  TYPO_TAG_KEY = TYPO_ATTRIBUTE_KEY = /[\w:_-]+/
  TYPO_ATTRIBUTE_VALUE = /(?:[A-Za-z0-9]+|(?:'[^']*?'|"[^"]*?"))/

  TYPO_ATTRIBUTE = /(?:#{TYPO_ATTRIBUTE_KEY}(?:\s*=\s*#{TYPO_ATTRIBUTE_VALUE})?)/
  TYPO_ATTRIBUTES = /(?:#{TYPO_ATTRIBUTE}(?:\s+#{TYPO_ATTRIBUTE})*)/
  TAG = %r{<[!/?\[]?(?:#{TYPO_TAG_KEY}|--)(?:\s+#{TYPO_ATTRIBUTES})?\s*(?:[!/?\]]+|--)?>}

  def strip_html
    self.gsub(TAG, '').gsub(/\s+/, ' ').strip
  end
end

I haven't run into any edge cases where it fails yet, but if anyone finds one, I'm sure a bug report would be welcome :)
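To make the behaviour concrete, here is a minimal sketch that tries the same patterns on a few inputs. It assumes the regexes read as written below, with the `*` quantifiers, the `--` comment markers and the escaped brackets spelled out explicitly. `StripHtmlDemo` is a hypothetical module name used only for this illustration (it is not part of Typo), so that String itself is left untouched.

```ruby
# Hypothetical demo module; the patterns mirror the ones in the post.
module StripHtmlDemo
  TAG_KEY = ATTRIBUTE_KEY = /[\w:_-]+/
  # Unquoted alphanumeric value, or a single- or double-quoted string.
  ATTRIBUTE_VALUE = /(?:[A-Za-z0-9]+|(?:'[^']*?'|"[^"]*?"))/
  ATTRIBUTE  = /(?:#{ATTRIBUTE_KEY}(?:\s*=\s*#{ATTRIBUTE_VALUE})?)/
  ATTRIBUTES = /(?:#{ATTRIBUTE}(?:\s+#{ATTRIBUTE})*)/
  TAG = %r{<[!/?\[]?(?:#{TAG_KEY}|--)(?:\s+#{ATTRIBUTES})?\s*(?:[!/?\]]+|--)?>}

  # Remove anything that looks like a tag, then collapse runs of whitespace.
  def self.strip_html(text)
    text.gsub(TAG, '').gsub(/\s+/, ' ').strip
  end
end

# Ordinary tags with quoted attributes are removed.
puts StripHtmlDemo.strip_html('<p class="intro">Hello, <b>world</b>!</p>')
# prints: Hello, world!

# Self-closing tags and simple comments go too; whitespace is collapsed.
puts StripHtmlDemo.strip_html("Line one<br />\n  line two <!-- note -->")
# prints: Line one line two

# Edge case: an unquoted attribute value containing ':' or '/' does not
# match ATTRIBUTE_VALUE, so the opening tag is left in place.
puts StripHtmlDemo.strip_html('<a href=http://example.com>link</a>')
# prints: <a href=http://example.com>link
```

The last call looks like one of the edge cases worth reporting: an unquoted value is only recognised when it is purely alphanumeric, so an unquoted URL leaves the opening tag behind while the closing tag is still stripped.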