RegexpCrawler -- a crawler which uses regular expressions to catch data from websites

RegexpCrawler is a crawler which uses regular expressions to catch data from websites. It is easy to use and requires little code if you are familiar with regular expressions. The project site is: http://github.com/flyerhzm/regexp_crawler/tree

Here is an example: a script to synchronize your github projects except fork projects; please check example/github_projects.rb

require 'rubygems'
require 'regexp_crawler'

crawler = RegexpCrawler::Crawler.new(
  :start_page => "http://github.com/flyerhzm",
  :continue_regexp => %r{<div class="title"><b><a href="(/flyerhzm/.*?)">}m,
  :capture_regexp => %r{<a href="http://github.com/flyerhzm/[^/"]*?(?:/tree)?">(.*?)</a>.*<span id="repository_description".*?>(.*?)</span>.*(<div class="(?:wikistyle|plain)">.*?</div>)</div>}m,
  :named_captures => ['title', 'description', 'body'],
  :save_method => Proc.new do |result, page|
    puts '============================='
    puts page
    puts result[:title]
    puts result[:description]
    puts result[:body][0..100] + "..."
  end,
  :need_parse => Proc.new do |page, response_body|
    page =~ %r{http://github.com/flyerhzm/\w+} && !response_body.index(/Fork of.*?<a href=".*?">/)
  end)
crawler.start
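The crawler's behavior hinges on the two regular expressions: :continue_regexp collects the links to follow next, while :capture_regexp extracts one group per entry in :named_captures from each fetched page. A minimal sketch of that matching logic in plain Ruby, with no network access (the HTML fragment and field names below are invented for illustration, not part of the gem):

```ruby
# Hypothetical HTML fragment standing in for a fetched page.
html = <<-HTML
<div class="title"><b><a href="/flyerhzm/regexp_crawler">regexp_crawler</a></b></div>
<span id="repository_description">A crawler based on regular expressions</span>
HTML

# A continue_regexp-style pattern: scan relative links to crawl next.
continue_regexp = %r{<a href="(/flyerhzm/.*?)">}m
next_pages = html.scan(continue_regexp).flatten

# A capture_regexp-style pattern with two groups, paired with named_captures
# the same way the crawler pairs them: group i maps to named_captures[i].
capture_regexp = %r{<a href="/flyerhzm/.*?">(.*?)</a>.*?<span id="repository_description">(.*?)</span>}m
named_captures = ['title', 'description']

result = {}
if md = html.match(capture_regexp)
  named_captures.each_with_index { |name, i| result[name.to_sym] = md[i + 1] }
end

puts next_pages.inspect
puts result[:title]
puts result[:description]
```

In the real crawler the pages in next_pages would be fetched in turn and matched against the same pair of expressions; the %r{...}m literals use the m flag so .*? can span line breaks in the HTML.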

The results are as follows: