I think it filters some HTML tags, but not the normal, safe ones like <br />, <p>, <h1>, etc.
If there is, I haven’t found it. I have, though, found a useful Regex for doing that. It’s a slightly modified version of the one used in Typo. [No source code will go unscavenged!] I know I might catch some flak for this, but I chose to add the method to the String class itself, as I use it a lot.
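    class String
      # Strip HTML tags (and <!-- comments -->) from the string. Leading and
      # trailing whitespace is always removed; unless leave_whitespace is true,
      # every run of whitespace is also collapsed to a single space.
      def strip_html(leave_whitespace = false)
        name  = /[\w:-]+/
        value = /([A-Za-z0-9]+|('[^']*?'|"[^"]*?"))/
        attr  = /(#{name}(\s*=\s*#{value})?)/
        rx    = /<[!\/?\[]?(#{name}|--)(\s+(#{attr}(\s+#{attr})*))?\s*([!\/?\]]+|--)?>/
        stripped = gsub(rx, "")
        leave_whitespace ? stripped.strip : stripped.gsub(/\s+/, " ").strip
      end
    end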
Be aware, though, that there are still a lot of HTML entities left in the Textilized string (the one for ™, etc.). Depending on your end use, you may need to strip those entities as well. Let me know if you do need that, because I’ve written some really handy code for it, based entirely on the transformations RedCloth performs. You know, only make the server work as hard as it has to.
RSL
In what circumstances would you want an h1 and not an h2? Sounds like you’re definitely going to need to Regex that one.
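One rough way to Regex that kind of thing is a whitelist: strip every tag except the ones you name, so <h2> and <br /> survive while <h1> goes. This is only a sketch: `strip_tags_except` is a hypothetical name, its tag pattern is deliberately simpler than the Typo one above, it leaves HTML comments in place, and it gets confused by attribute values that contain `>`.

```ruby
# Hypothetical helper (not from Typo or RedCloth): remove every tag whose
# name is NOT in the whitelist and keep the allowed ones verbatim.
def strip_tags_except(html, allowed)
  allowed = allowed.map { |name| name.to_s.downcase }
  # Opening, closing and self-closing tags; group 1 captures the tag name.
  # [^>]* stops at the first ">", so a ">" inside an attribute value breaks it.
  html.gsub(%r{</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>}) do |tag|
    allowed.include?(Regexp.last_match(1).downcase) ? tag : ""
  end
end

puts strip_tags_except("<h1>Title</h1><h2>Sub</h2><p>Hi<br />there</p>", %w[h2 br])
# prints: Title<h2>Sub</h2>Hi<br />there
```

Matching case-insensitively on the captured name means <H1> and <h1> are treated alike; for anything security-sensitive a real HTML sanitizer is the safer choice.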
Anyhow, here are the two additional methods [both on the String class, as before] for dealing with RedClothed HTML entities.
    class String
      # Replace accented-letter entities with the bare letter, e.g. &eacute; -> e.
      def strip_accents
        gsub(/&([A-Za-z])(grave|acute|circ|tilde|uml|ring|lig|cedil);/, '\1')
      end

      # Turn the entities RedCloth emits into plain-ASCII equivalents,
      # then drop any entity that is still left over.
      def convert_entities
        dummy = dup
        {
          "#822[01]"      => '"',
          "#821[67]"      => "'",
          "#8230"         => "...",
          "#8211"         => "-",
          "#8212"         => "--",
          "#215"          => "x",
          "gt"            => ">",
          "lt"            => "<",
          "(#8482|trade)" => "(tm)",
          "(#174|reg)"    => "(r)",
          "(#169|copy)"   => "(c)",
          "(#38|amp)"     => "&",
          "nbsp"          => " ",
          "cent"          => " cent",
          "pound"         => " pound"
        }.each do |textiled, normal|
          dummy.gsub!(/&#{textiled};/, normal)
        end
        dummy.gsub(/&[^;]+;/, "")
      end
    end
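If your end use can handle Unicode, a different option is to decode the numeric references into the characters they stand for rather than ASCII look-alikes, so &#8220; becomes a real curly quote instead of a plain ". A minimal sketch, assuming UTF-8 output; `decode_numeric_entities` is a hypothetical name, and it only handles numeric references (&#8220; or &#x201C;), not named ones like &trade; or &nbsp;, so convert_entities is still useful for those.

```ruby
# Hypothetical helper: decode numeric character references
# (decimal &#8220; or hex &#x201C;) into the UTF-8 characters they name.
# Named entities such as &amp; or &nbsp; are left untouched.
def decode_numeric_entities(text)
  text.gsub(/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/) do |ref|
    hex, dec = Regexp.last_match(1), Regexp.last_match(2)
    code = hex ? hex.to_i(16) : dec.to_i
    begin
      code.chr(Encoding::UTF_8)
    rescue RangeError
      ref # not a valid code point (e.g. a surrogate): keep the reference as-is
    end
  end
end

puts decode_numeric_entities("&#8220;Hi&#8221; &#8482;")
```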