Regex in Ruby - Strip HTML out of comments - help

Tom_Mornini1 · August 21, 2006, 2:21am

Using regexes to parse HTML or XML is *notoriously* difficult.

Far better to use an HTML or XML parser to handle this task.
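Here is a rough sketch of the parser route using REXML, the XML library bundled with Ruby. It assumes the comment is well-formed XHTML-style markup, because REXML is a strict XML parser, not a forgiving HTML one. The helper name `strip_tags_with_parser` is hypothetical. For messy real-world HTML, a dedicated HTML parser would be a better fit.

```ruby
require 'rexml/document'

# Hypothetical helper: parse the fragment as XML and keep only its text.
# Assumption: the fragment is well-formed once wrapped in a single root
# element; REXML raises REXML::ParseException on malformed markup.
def strip_tags_with_parser(fragment)
  doc = REXML::Document.new("<root>#{fragment}</root>")
  # Collect every text node in document order and join them together.
  REXML::XPath.match(doc, '//text()').map(&:value).join
end

puts strip_tags_with_parser('<p>Hello <b>world</b>!</p>')
```

The parser, not a pattern, decides where tags begin and end. A `>` inside a quoted attribute value or a stray `<` in the text is therefore handled by the XML rules rather than by guesswork. Malformed input fails loudly instead of being silently mangled.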

carmen · August 21, 2006, 2:44am

I dunno about that. There's a million HTML parsers out there, and some are even written in Java. At its core, HTML has a very small number of reserved characters and there's little syntactic magic at the delimiter level. This is probably one reason it's so successful, compared to the .doc format or CSV...

Re: the thread topic, I've always used this (Perl) to strip HTML:

$string =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
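That one-liner is Perl (`s///` with the `/g` and `/s` flags). Since the thread is about Ruby, here is a hedged sketch of the same idea using `String#gsub`. `gsub` replaces every match, like Perl's `/g`, and Ruby's `/m` flag lets `.` match newlines, like Perl's `/s`. The method name `strip_html` is made up for illustration. The pattern also differs in one spot: the inner `[^>'"]*` becomes a single `[^>'"]`. A `*` nested inside another `*` can split the same run of characters in many ways, which can make the engine backtrack heavily when a `<` never finds its closing `>`.

```ruby
# A tag is "<", then any mix of non-quote/non-">" characters or
# quoted strings (so a ">" inside quotes doesn't end the tag), then ">".
# /m makes "." match newlines, so tags spanning lines are removed too.
TAG = /<(?:[^>'"]|(['"]).*?\1)*>/m

# Hypothetical helper name; returns a new string with tags removed.
def strip_html(string)
  string.gsub(TAG, '')
end

puts strip_html(%q(<p class="x">Hello <a title='a > b'>world</a></p>))
# prints "Hello world"
```

Like any regex approach, this treats every `<...>` run as a tag. Plain text such as `if a < b and c > d` therefore loses everything between the brackets and comes out as `if a  d`, which is the kind of case Tom was warning about.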

Tom_Mornini1 · August 21, 2006, 3:10am

I've not invested the libraries in Ruby enough to make a good suggestion as to which library to use and how to use it. Sorry.

Tom_Mornini1 · August 21, 2006, 4:05am

Nor have I investigated the libraries...
