sanitizing and stripping some html?

Thomas_Mango · April 22, 2007, 3:32pm

I have an application that manages a list of feeds. In a scheduled BackgrounDRb worker, I parse each of these feeds and post the content to the same site. Some of these feeds contain HTML in the description of each item in the feed. I would like to first sanitize the HTML to remove anything particularly harmful, then I would like to strip certain tags, leaving the content.

I first tested Rick Olson's white_list plugin. It seems to strip disallowed tags together with their content. For example, if I say p is a bad tag, <p>content</p> gets completely stripped. I would actually like to keep the 'content' and remove only the tags themselves. Certain tags are alright, such as b, em, and strong, but most I would like stripped out.

I then tested the approach from the post "Sanitize HTML in Ruby" (on Take the First Step), and it seems to do the trick. I was just wondering whether anyone else has wanted to strip HTML tags while keeping their content, and how they went about doing so. Thanks for your input.
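To make the desired behavior concrete, here is a minimal plain-Ruby sketch of "strip the tag, keep the content". It is not the white_list plugin's code or the linked article's code; the names ALLOWED_TAGS, TAG_PATTERN, and strip_disallowed_tags are made up for illustration. It assumes a simple regex is good enough to find tags, which will not hold for malformed markup or for attribute values that contain ">", and it does not drop the text inside script or style elements, so treat it as an illustration rather than a complete sanitizer.

```ruby
# Hypothetical helper for illustration only; not part of white_list or Rails.

# Tags that are allowed to stay in the output (assumed list, taken from the post).
ALLOWED_TAGS = %w[b em strong].freeze

# Matches an opening or closing tag and captures the tag name.
TAG_PATTERN = /<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/

# Removes every tag that is not in the allowed list but keeps the text
# between the tags. Allowed tags are re-emitted without attributes, so
# things like onclick="..." do not survive.
def strip_disallowed_tags(html, allowed = ALLOWED_TAGS)
  html.gsub(TAG_PATTERN) do |tag|
    name = Regexp.last_match(1).downcase
    next "" unless allowed.include?(name)

    tag.start_with?("</") ? "</#{name}>" : "<#{name}>"
  end
end

puts strip_disallowed_tags("<p>Hello <b>world</b></p>")
```

With this sketch, the p tags are dropped while "Hello" and the b element remain. For real feed content, a parser-based sanitizer is likely the safer choice.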
