regexp_crawler -- a crawler that uses regular expressions to capture data from websites

RegexpCrawler is a crawler that uses regular expressions to capture data
from websites. It is easy to use and needs very little code if you are
familiar with regular expressions.
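Ruby's built-in regular expressions already do the heavy lifting; the snippet below is a plain-Ruby sketch (no gem required, the sample HTML is made up) of the kind of capture-group extraction the crawler automates:

```ruby
# Plain-Ruby illustration of regexp-based data extraction.
html = '<a href="http://github.com/flyerhzm/regexp_crawler">regexp_crawler</a>' \
       '<span id="repository_description">a crawler based on regular expressions</span>'

# Capture groups pull out the pieces of interest, much like :capture_regexp does.
match = html.match(%r{<a href="[^"]*">(.*?)</a>.*?<span id="repository_description">(.*?)</span>}m)
title, description = match.captures

puts title        # => regexp_crawler
puts description  # => a crawler based on regular expressions
```

RegexpCrawler wraps this pattern-matching in a fetch-and-follow loop, mapping each capture group to a name you choose via :named_captures.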
The project site is: http://github.com/flyerhzm/regexp_crawler/tree

Here is an example: a script to synchronize your github projects,
excluding forked projects. Please check example/github_projects.rb

require 'rubygems'
require 'regexp_crawler'

crawler = RegexpCrawler::Crawler.new(
  :start_page => "http://github.com/flyerhzm",
  # follow links matching this pattern to continue crawling
  :continue_regexp => %r{<div class="title"><b><a href="(/flyerhzm/.*?)">}m,
  # extract the data of interest from each matching page
  :capture_regexp => %r{<a href="http://github.com/flyerhzm/[^/"]*?(?:/tree)?">(.*?)</a>.*<span id="repository_description".*?>(.*?)</span>.*(<div class="(?:wikistyle|plain)">.*?</div>)</div>}m,
  :named_captures => ['title', 'description', 'body'],
  # callback invoked with each capture result
  :save_method => Proc.new do |result, page|
    puts '============================='
    puts page
    puts result[:title]
    puts result[:description]
    puts result[:body][0..100] + "..."
  end,
  # decide whether a fetched page should be parsed for data
  :need_parse => Proc.new do |page, response_body|
    page =~ %r{http://github.com/flyerhzm/\w+} && !response_body.index(/Fork of.*?<a href=".*?">/)
  end)
crawler.start

The results are as follows: