convert html to plain text in ruby

Mathijs · October 1, 2008, 10:31pm

Hi,

I'm looking for a way to convert html to plain text. Now, I know about strip_tags, but - as the name says - that only strips the tags.

What I need is to get stuff like &amp; and &lt; back to & and < too. Any help?

Thanks, Mathijs

Richard_Luther · October 2, 2008, 6:26pm

You could use some regexp and the hash ERB::Util::HTML_ESCAPE to return the unescaped versions of the characters. - Richard
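A minimal sketch of that suggestion, so the idea can be tried outside Rails. ERB::Util::HTML_ESCAPE is, as far as I know, defined by Rails rather than by plain Ruby, so the table below is a local stand-in with the same shape. The names HTML_ESCAPE_TABLE, unescape_entities and html_to_plain_text are made up for this example, and the tag-stripping regexp is only a crude substitute for strip_tags:

```ruby
# Local copy of an escape table (assumption: same shape as ERB::Util::HTML_ESCAPE).
HTML_ESCAPE_TABLE = {
  '&' => '&amp;',
  '>' => '&gt;',
  '<' => '&lt;',
  '"' => '&quot;',
  "'" => '&#39;'
}.freeze

# Invert it so that entities map back to the characters they stand for.
HTML_UNESCAPE_TABLE = HTML_ESCAPE_TABLE.invert.freeze

# One regexp that matches any of the known entities.
ENTITY_PATTERN = Regexp.union(HTML_UNESCAPE_TABLE.keys)

# Replace every known entity in a single pass, so "&amp;lt;" becomes
# "&lt;" and is not unescaped a second time.
def unescape_entities(text)
  text.gsub(ENTITY_PATTERN) { |entity| HTML_UNESCAPE_TABLE[entity] }
end

# Strip tags first (naive regexp, not a real HTML parser), then unescape,
# so that escaped angle brackets in the text are not mistaken for tags.
def html_to_plain_text(html)
  unescape_entities(html.gsub(/<[^>]*>/, ''))
end

puts html_to_plain_text('<p>Tom &amp; Jerry &lt;3</p>')
```

This only covers the handful of entities in the table. Ruby's standard library also has CGI.unescapeHTML (require 'cgi'), which handles numeric entities as well and may be the simpler choice.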

Walter_McGinnis · October 3, 2008, 1:56am

You might want to check out some example code in the convert_attachment_to plugin:

http://github.com/kete/convert_attachment_to/tree/master

Depending on configuration, it will take an uploaded HTML file (or PDF, MS doc…) and convert it into a plain text attribute, etc. Probably overkill for what you are after, but it might have something you can learn from.

Cheers,

Walter
