If anyone is thinking about using either of these packages to
screen-scrape, I think you should also consider mechanize as an option.
At a guess, I would use...
wget to pull down the page
tidy to convert it to XHTML
XPath from libxml or similar high-end parser
All three tools are written in C, not our beloved Ruby.
And no Perl, either...
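
For anyone who would rather stay in Ruby for the last step of that wget / tidy / XPath pipeline, here is a rough sketch of the XPath part only. It assumes the page has already been fetched and tidied into well-formed XHTML. It swaps in REXML (which comes with most Ruby installs) for libxml, and the sample markup, the SAMPLE_XHTML constant and the extract_links helper are all made up for illustration. Real tidy output also declares the XHTML namespace on the html element, which the query would have to account for.

```ruby
require 'rexml/document'

# Made-up stand-in for a page that was downloaded and then converted
# to well-formed XHTML (no namespace declared, to keep the query simple).
SAMPLE_XHTML = <<~PAGE
  <html>
    <head><title>Sample page</title></head>
    <body>
      <p>Some links:</p>
      <ul>
        <li><a href="http://example.com/one">First</a></li>
        <li><a href="http://example.com/two">Second</a></li>
      </ul>
    </body>
  </html>
PAGE

# Hypothetical helper: returns [link text, href] pairs for every
# anchor element that carries an href attribute, in document order.
def extract_links(xhtml)
  doc = REXML::Document.new(xhtml)
  REXML::XPath.match(doc, '//a[@href]').map do |a|
    [a.text, a.attributes['href']]
  end
end

extract_links(SAMPLE_XHTML).each { |text, href| puts "#{text}: #{href}" }
```

REXML is pure Ruby and noticeably slower than libxml on large pages, so this is more of a convenience than a replacement for the C tools.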