I have to develop an application that fetches all links to images, PDFs, CGI scripts, and other files (identified by file extension) from a website.
Can anybody guide me on where I should begin?
You can find useful information in the Railscasts episodes on Nokogiri, especially the one on Mechanize: http://railscasts.com/episodes?utf8=%E2%9C%93&search=nokogiri
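Nokogiri and Mechanize are Ruby libraries. The core idea they support, which is to parse the HTML, read the link attributes, resolve them to absolute URLs, and keep the ones with the extensions you want, can be sketched with Python's standard library alone. This is a minimal sketch under a few assumptions. The extension set is a hypothetical example. Only the `href`, `src` and `action` attributes are inspected, with `action` included because CGI scripts often appear as form targets. `extract_file_links` and `LinkCollector` are illustrative names, not part of any library.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical set of extensions to collect; adjust to taste.
WANTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf", ".cgi"}


class LinkCollector(HTMLParser):
    """Collects URLs from href/src/action attributes, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for bare attributes such as <input disabled>.
            if name in ("href", "src", "action") and value:
                self.links.append(urljoin(self.base_url, value))


def extract_file_links(html, base_url, extensions=WANTED_EXTENSIONS):
    """Return unique absolute URLs whose path ends in one of `extensions`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    found = []
    for url in parser.links:
        # Check only the path, so a query string like ?v=2 does not hide the extension.
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in extensions and url not in found:
            found.append(url)
    return found
```

For example, `extract_file_links('<a href="docs/report.pdf">Report</a><img src="/img/logo.PNG"><a href="about.html">About</a>', "http://www.example.com/index.html")` returns the absolute URLs of the PDF and the PNG and skips the HTML page. A real HTML parser is used instead of a regular expression because attribute quoting, spacing and case vary widely between sites.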
Alternatively, wget has a mirror mode that clones a whole website:
wget --mirror http://www.example.com
Or you could look at Apache Nutch, a web crawler from the Apache Software Foundation designed for building search engines.
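A mirroring tool or a crawler like Nutch walks a site by following its links page by page. That traversal can be sketched as a breadth-first search restricted to the starting host, which collects file links along the way. This is a minimal sketch, not how wget or Nutch are implemented. The following are all hypothetical choices made for illustration: the `crawl` function, the extension sets, the rule that only extensionless or `.html`/`.htm` URLs count as crawlable pages (so `.php` pages, for instance, are not followed), and the `max_pages` limit. The `fetch` parameter is injectable so the crawler can be tried without network access.

```python
import posixpath
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# Hypothetical heuristics: which URLs are files to collect, which are pages to follow.
FILE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf", ".cgi"}
PAGE_EXTENSIONS = {"", ".html", ".htm"}


class _Links(HTMLParser):
    """Collects absolute, fragment-free URLs from href/src/action attributes."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                # Drop #fragments so page.html and page.html#top count as one page.
                self.urls.append(urldefrag(urljoin(self.base, value))[0])


def default_fetch(url):
    """Download a page and decode it, falling back to UTF-8."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, fetch=default_fetch, max_pages=100):
    """Breadth-first crawl of start_url's host; returns the file links found."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen_pages = {start_url}
    files = []
    pages_fetched = 0
    while queue and pages_fetched < max_pages:
        page = queue.popleft()
        try:
            html = fetch(page)
        except OSError:  # urllib's URLError/HTTPError are OSError subclasses
            continue  # skip pages that fail to download
        pages_fetched += 1
        parser = _Links(page)
        parser.feed(html)
        parser.close()
        for url in parser.urls:
            parts = urlparse(url)
            ext = posixpath.splitext(parts.path)[1].lower()
            if ext in FILE_EXTENSIONS:
                # Files are collected even when hosted elsewhere (e.g. on a CDN).
                if url not in files:
                    files.append(url)
            elif (parts.scheme in ("http", "https") and parts.netloc == host
                  and ext in PAGE_EXTENSIONS and url not in seen_pages):
                seen_pages.add(url)
                queue.append(url)
    return files
```

Calling `crawl("http://www.example.com/")` would return the image, PDF and CGI links reachable from the home page without leaving that host. The `seen_pages` set keeps the crawler from looping between pages that link to each other. The `max_pages` cap bounds the work on large sites. A crawler for real use would also honour `robots.txt`, for example with Python's `urllib.robotparser`.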