convert html to plain text in ruby


I'm looking for a way to convert HTML to plain text.
Now, I know about strip_tags, but, as the name says, it only
strips the tags.

What I need is to get stuff like &amp; and &lt; back to & and < too.
Any help?
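Here is a minimal sketch of both steps in plain Ruby, assuming no Rails (so strip_tags is not available). A naive regexp removes the tags, and then CGI.unescapeHTML from the standard library decodes the entities. The html_to_text name is only for illustration. The regexp is not a real HTML parser, so comments, scripts, or a stray > inside an attribute value will trip it up.

```ruby
require 'cgi'

# Hypothetical helper: strip tags, then decode entities.
# Assumption: input is simple, well-formed HTML; /<[^>]*>/ is a naive
# tag matcher, not a parser.
def html_to_text(html)
  # Remove anything that looks like <...>
  without_tags = html.gsub(/<[^>]*>/, '')
  # Turn &amp;, &lt;, &gt;, &quot; and numeric references back into characters
  CGI.unescapeHTML(without_tags)
end

puts html_to_text("<p>Fish &amp; chips &lt;3</p>")
# prints: Fish & chips <3
```

The order matters. If you decoded first, text like &lt;b&gt; would become <b>, and the regexp would then delete it as if it were a real tag. Also note that CGI.unescapeHTML only knows a handful of named entities, so something like &nbsp; may come through unchanged.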


You could use a regexp together with the hash ERB::Util::HTML_ESCAPE,
inverted so that it maps each entity back to its character, to get the
unescaped versions of the characters.
- Richard
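A rough sketch of that regexp-plus-hash idea follows. As far as I know, ERB::Util::HTML_ESCAPE comes from Rails (ActiveSupport), not plain Ruby, so to keep the snippet standalone it defines its own character => entity hash (the ESCAPES name and its exact contents are assumptions). Inside a Rails app you could try ERB::Util::HTML_ESCAPE.invert instead.

```ruby
# Stand-in for Rails' ERB::Util::HTML_ESCAPE (character => entity).
# Assumption: defined locally so this runs without Rails.
ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&#39;'
}.freeze

# Inverted: entity => character
UNESCAPES = ESCAPES.invert.freeze

# One regexp that matches any of the known entities
ENTITY_PATTERN = Regexp.union(UNESCAPES.keys)

# Hypothetical helper name.
def unescape_html(text)
  # A single gsub pass with a hash replacement, so "&amp;lt;" becomes the
  # literal text "&lt;" instead of being decoded twice into "<".
  text.gsub(ENTITY_PATTERN, UNESCAPES)
end

puts unescape_html("Fish &amp; chips &lt;3")  # prints: Fish & chips <3
puts unescape_html("&amp;lt;")                # prints: &lt;
```

In Rails you would run strip_tags first and then pass the result through something like this. Doing everything in one gsub, rather than chaining one gsub per entity, is what avoids the double-decoding problem.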

You might find some useful example code in the convert_attachment_to plugin:

Depending on configuration, it will take an uploaded HTML file (or PDF, MS doc…) and convert it into a plain text attribute, etc. It's probably overkill for what you are after, but there might be something you can learn from it.