Wikipedia Parser

I need to parse Wikipedia articles (written in Wikipedia's wiki markup) and redisplay them as HTML. Has anyone come across a Ruby library for this? Any libraries that are good at it?

Thanks

David wrote:

> I need to parse Wikipedia articles (written in Wikipedia's wiki markup) and redisplay them as HTML. Has anyone come across a Ruby library for this?

Check out http://shanesbrain.net/articles/2006/10/02/screen-scraping-wikipedia It makes it dead easy to roll your own.

Chris
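If you do roll your own, the core of the problem the original poster asked about is turning wiki markup into HTML. Here's a minimal sketch using only the Ruby standard library; it handles just a handful of common constructs (headings, bold, italics, internal links), so treat it as a starting point rather than a real wikitext parser:

```ruby
# Minimal wikitext-to-HTML sketch (stdlib only). Real Wikipedia markup
# is far richer (templates, tables, references, nesting) -- this only
# covers a few simple inline constructs as an illustration.
def wikitext_to_html(text)
  html = text.dup
  # == Heading == -> <h2>Heading</h2>
  html.gsub!(/^==\s*(.+?)\s*==\s*$/) { "<h2>#{$1}</h2>" }
  # '''bold''' -> <b>bold</b> (must run before the italics rule,
  # since ''' also contains '')
  html.gsub!(/'''(.+?)'''/, '<b>\1</b>')
  # ''italic'' -> <i>italic</i>
  html.gsub!(/''(.+?)''/, '<i>\1</i>')
  # [[Page|label]] -> link with custom text; must run before the
  # plain [[Page]] rule
  html.gsub!(/\[\[([^\]|]+)\|([^\]]+)\]\]/) do
    %(<a href="/wiki/#{$1.tr(' ', '_')}">#{$2}</a>)
  end
  # [[Page]] -> link whose text is the page title
  html.gsub!(/\[\[([^\]]+)\]\]/) do
    %(<a href="/wiki/#{$1.tr(' ', '_')}">#{$1}</a>)
  end
  html
end
```

For example, `wikitext_to_html("'''Ruby''' is [[Fun|fun]]")` produces `<b>Ruby</b> is <a href="/wiki/Fun">fun</a>`.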

Usually you shouldn't run bots against Wikipedia itself; you should download the
free database dump instead and use that. Read about their policy here:

If you have your own MediaWiki install and want to use a bot, you can
check out the pywikipedia bot on SourceForge.net. It's not in Ruby,
but it works great.
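On your own MediaWiki install, you don't even need to scrape rendered HTML: MediaWiki's `index.php` serves the raw article source when you pass `action=raw`. A small sketch of building such a URL (the host name below is just an example):

```ruby
require 'uri'

# Build a URL that fetches raw wikitext from a MediaWiki install.
# MediaWiki's index.php returns the unrendered article source when
# given action=raw; Special:Export is the alternative if you want
# the page wrapped in XML with revision metadata.
def raw_wikitext_url(host, title)
  "http://#{host}/index.php?" +
    URI.encode_www_form(title: title, action: 'raw')
end
```

You could then fetch the result with open-uri or Net::HTTP and feed the wikitext to whatever parser you end up with.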

Actually, I’m not entirely sure that you shouldn’t use bots at all on Wikipedia. According to the link you provided: “Robots or bots are automatic processes that interact with Wikipedia as though they were human editors.”

That last bit sounds like they’re talking about a very specific kind of bot and not just a scraper.

RSL

“Robots or bots are automatic processes that interact with Wikipedia as though they were human editors.” There’s nothing against screen-scraping there; that policy is about bots which edit content. Otherwise, Google would be breaking WP policy.

This is taking the discussion a little off topic though. -Nathan

I wrote that article a while ago. It'd be interesting to redo it with WWW::Mechanize, or better yet scRUBYt, both of which use Hpricot on the backend anyway.

Shane

http://shanesbrain.net

If you just need to cache some pages for displaying later, screen scraping Wikipedia is a reasonable choice compared to downloading the whole database. If you’re going to be parsing and redisplaying the content in real time, though, that is against Wikipedia’s policy.

See http://en.wikipedia.org/wiki/Wikipedia:Database_download#Why_not_just_retrieve_data_from_wikipedia.org_at_runtime.3F
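The cache-then-display approach can be sketched with a tiny disk cache: each page is fetched at most once and served from disk afterwards, so nothing hits Wikipedia at display time. The class and its interface here are illustrative, not from any particular library; the fetcher is passed in as a block so it works with open-uri, Net::HTTP, or anything else:

```ruby
require 'fileutils'
require 'digest/md5'

# Illustrative disk cache for fetched pages. fetch(key) runs the given
# block only on a cache miss and writes the result to disk; subsequent
# calls for the same key read the stored copy instead.
class PageCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(dir)
  end

  def fetch(key)
    path = File.join(@dir, Digest::MD5.hexdigest(key))
    return File.read(path) if File.exist?(path)
    content = yield(key)
    File.write(path, content)
    content
  end
end
```

Usage would look something like `PageCache.new('cache').fetch('Ruby') { |title| fetch_page_somehow(title) }`, where the fetcher runs only the first time a title is requested.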