How to get all image, PDF and other file links from a website?

I have to develop an application that fetches all links to images, PDFs,
CGI scripts, and files with other given extensions from a website.

Can anybody guide me on where I should begin?

You can find useful information at

Especially Mechanize.
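
To give an idea of the core step, here is a minimal sketch of pulling links out of one page and keeping only the ones with the extensions you care about. It uses only Python's standard library rather than Mechanize itself, and the names `LinkCollector`, `extract_file_links` and the example.com URLs are made up for illustration. It assumes the links appear in plain `href` or `src` attributes, so links generated by JavaScript would be missed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Attributes that commonly carry a URL (illustrative, not exhaustive).
URL_ATTRS = {"href", "src"}


class LinkCollector(HTMLParser):
    """Collects every href/src value on a page as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_file_links(html, base_url, extensions):
    """Return the unique links whose path ends in one of `extensions`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    wanted = {ext.lower() for ext in extensions}
    result = []
    for link in parser.links:
        # Compare on the URL path only, so query strings are ignored.
        ext = posixpath.splitext(urlparse(link).path)[1].lower()
        if ext in wanted and link not in result:
            result.append(link)
    return result


if __name__ == "__main__":
    page = (
        '<a href="docs/report.pdf">Report</a>'
        '<img src="/img/logo.PNG">'
        '<a href="about.html">About</a>'
    )
    for url in extract_file_links(
        page, "http://example.com/site/", [".pdf", ".png", ".cgi"]
    ):
        print(url)
```

Matching on the file extension is only a heuristic: a URL without an extension can still serve a PDF or an image, so checking the Content-Type header of the response would be more reliable if that matters for you.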


Well, wget has a mirror mode that will clone a website:

wget --mirror http://example.com/

Or you could look at Nutch, which is a web crawler for building search
engines.
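
If you would rather write the crawling part yourself, the basic idea is a breadth-first walk over the pages of one host. The following is only a rough sketch of that idea using Python's standard library, not how wget or Nutch work internally. The names `HrefParser`, `fetch_html` and `crawl` are hypothetical, and it assumes pages are UTF-8 HTML reachable with a plain GET request. It ignores robots.txt and rate limiting, which a real crawler should respect.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class HrefParser(HTMLParser):
    """Collects the raw href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Download a page and decode it (assumes UTF-8 text)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Visit pages on the start URL's host breadth-first.

    Returns the URLs that were fetched successfully, in visit order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        visited.append(url)
        parser = HrefParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Make the link absolute and drop any #fragment part.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `fetch` parameter is there so the crawl logic can be tried against an in-memory dictionary of pages before pointing it at a live site. The `max_pages` limit keeps the sketch from running away on a large site.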