Convert HTML to plain text in Ruby

Hi,

I'm looking for a way to convert HTML to plain text. I know about strip_tags, but, as the name says, that only strips the tags.

What I need is to get stuff like &amp; and &lt; back to & and < too. Any help?

Thanks, Mathijs

You could use a regexp together with the hash ERB::Util::HTML_ESCAPE. That hash maps each character to its escaped form, so invert it and substitute every matched entity with its unescaped character. - Richard
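A minimal sketch of that idea, in case it helps. It assumes only the five basic entities matter. The table is spelled out locally so the snippet runs in plain Ruby; as far as I know, ERB::Util::HTML_ESCAPE is added by Active Support and holds the same kind of mapping, so in a Rails app you could probably use it instead. The names HTML_ESCAPE_TABLE, strip_tags_naive and html_to_text are made up for this example, and the tag regexp is a crude stand-in for strip_tags, not a real HTML parser.

```ruby
# Raw character => entity. Local, hypothetical stand-in for
# ERB::Util::HTML_ESCAPE (assumed to be provided by Active Support in Rails).
HTML_ESCAPE_TABLE = {
  '&' => '&amp;',
  '>' => '&gt;',
  '<' => '&lt;',
  '"' => '&quot;',
  "'" => '&#39;'
}.freeze

# Entity => raw character, plus a regexp that matches any known entity.
HTML_UNESCAPE_TABLE = HTML_ESCAPE_TABLE.invert.freeze
ENTITY_PATTERN = Regexp.union(HTML_UNESCAPE_TABLE.keys)

# Crude tag stripper for illustration only; in Rails, strip_tags does this better.
def strip_tags_naive(html)
  html.gsub(/<[^>]*>/, '')
end

# Strip tags first, then unescape in a single pass, so that a freshly
# unescaped "<" is never mistaken for the start of a tag.
def html_to_text(html)
  strip_tags_naive(html).gsub(ENTITY_PATTERN, HTML_UNESCAPE_TABLE)
end

puts html_to_text('<p>Fish &amp; chips &lt; 5</p>')
```

Because gsub walks the string once, doubly escaped input such as `&amp;lt;` comes out as `&lt;` rather than being unescaped twice. Named entities outside the table (`&nbsp;`, `&eacute;`, ...) are left untouched by this sketch.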

You might want to check out some example code in the convert_attachment_to plugin:

http://github.com/kete/convert_attachment_to/tree/master

Depending on configuration, it will take an uploaded HTML file (or PDF, MS doc…) and convert it into a plain text attribute, etc. It's probably overkill for what you are after, but it might have something you can learn from.

Cheers,

Walter