How to get all image, PDF and other file links from a website?

11155 · January 4, 2012, 11:26am

I have to develop an application that fetches all links to image, PDF, CGI and other such files from a website.

Can anybody tell me where I should begin?

Felipe_Fontoura · January 4, 2012, 11:44am

Especially Mechanize.
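
To make the idea concrete, here is a minimal sketch of the link-filtering step for a single page. It does not use Mechanize; it uses only Python's standard library, and the names `WANTED_EXTENSIONS`, `FileLinkParser` and `extract_file_links` are made up for this illustration. It assumes you already have the page's HTML as a string and know the URL it came from.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical set of extensions to collect; adjust to your needs.
WANTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf", ".cgi"}


class FileLinkParser(HTMLParser):
    """Collects absolute URLs from href/src attributes whose path
    ends in one of the wanted extensions."""

    def __init__(self, base_url, extensions=WANTED_EXTENSIONS):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Look at the path only, so query strings do not hide the extension.
                path = urlparse(absolute).path
                extension = posixpath.splitext(path)[1].lower()
                if extension in self.extensions and absolute not in self.links:
                    self.links.append(absolute)


def extract_file_links(html, base_url, extensions=WANTED_EXTENSIONS):
    """Return the matching links in the order they appear in the page."""
    parser = FileLinkParser(base_url, extensions)
    parser.feed(html)
    return parser.links
```

You would call it as `extract_file_links(page_html, page_url)`. Matching on the extension is only a heuristic: a link without a file extension can still serve a PDF or an image, so checking the response's Content-Type header is more reliable if that matters to you.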

Peter_Hickman1 · January 4, 2012, 12:10pm

Well, wget has a mirror mode that will clone a website,

or you could look at Nutch (Apache Nutch), which is a web crawler for building search engines.
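
If you would rather write the crawl yourself, the general idea behind such tools can be sketched as a breadth-first walk over pages on the same host. This is only an illustration in standard-library Python, not how wget or Nutch are implemented. The names `LinkParser`, `fetch_html` and `crawl` and the `max_pages` limit are assumptions made for the example, and it ignores robots.txt and rate limiting, which a real crawler should respect.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the raw href/src attribute values found in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def fetch_html(url):
    """Default fetcher: returns the page text, or "" for non-HTML responses."""
    with urlopen(url, timeout=10) as response:
        if "html" not in response.headers.get_content_type():
            return ""
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first walk of same-host pages; returns every URL seen, sorted."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        page_url = queue.popleft()
        fetched += 1
        try:
            html = fetch(page_url)
        except OSError:
            continue  # unreachable page (URLError is an OSError); skip it
        parser = LinkParser()
        parser.feed(html)
        for raw in parser.urls:
            # Make the link absolute and drop any #fragment.
            url = urldefrag(urljoin(page_url, raw)).url
            if url in seen or urlparse(url).scheme not in ("http", "https"):
                continue
            seen.add(url)
            # Record external links, but only follow links on the same host.
            if urlparse(url).netloc == host:
                queue.append(url)
    return sorted(seen)
```

The `fetch` parameter is there so the crawl logic can be tried without touching the network, for example by passing a function that looks pages up in a dictionary. The returned list can then be filtered by file extension to keep only the image, PDF or CGI links.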