Regex in Ruby - Strip HTML out of comments - help

Regexes to parse HTML or XML are *notoriously* difficult.

Far better to use an HTML or XML parser to handle this task.

I don't know about that. There are a million HTML parsers out there, and some are even written in Java. At its core, HTML has a very small number of reserved characters, and there's little syntactic magic at the delimiter level. This is probably one reason it's so successful compared to the .doc format, or CSV...

Re: the thread topic, I've always used this to strip HTML:

$string =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
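Since the thread is about Ruby and the one-liner above is Perl-style substitution syntax, here is a rough sketch of how the same pattern might look in Ruby. The names `TAG_PATTERN` and `strip_tags` are made up for illustration, and the `/m` flag is assumed to be the closest Ruby equivalent of Perl's `/s` (it lets `.` match newlines). Like any regex approach, it will not cope with comments, `<script>` bodies, or malformed markup.

```ruby
# Same pattern as the one-liner above: a tag is "<", then any mix of
# unquoted characters or complete quoted strings, then ">".
# Quoted attribute values may contain ">" without ending the tag.
TAG_PATTERN = /<(?:[^>'"]*|(['"]).*?\1)*>/m

# Hypothetical helper: returns a copy of text with every match removed.
def strip_tags(text)
  text.gsub(TAG_PATTERN, '')
end
```

Something like `strip_tags('<p class="a>b">Hello <b>world</b></p>')` should then give back just the text between the tags.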

I've not investigated the libraries in Ruby enough to make a good suggestion as to which library to use and how to use it. Sorry.

Nor have I investigated the libraries... :)