Resolving image URLs

Philip_Hallstrom1 · December 3, 2007, 8:21pm

I'm trying to scrape images from a page. I'm using Hpricot to scrape the actual image URLs into an array but I've encountered a problem regarding resolving the full image paths.

Example:

The src of the images can be like any of the following:

http://external.site.com/images/image.jpg (Full URL)
/images/image.jpg (Absolute Path)
../images/image.jpg (Relative Path)
images/image.jpg (Relative Path)

Is there a way to resolve these paths to the proper URLs, so I can copy the images to my server or do whatever else I need with them?

Parse the url into pieces... extract the domain name and the "directory" part of the path.

Then just match them up. If the image src starts with http, just use it as is. If it starts with a slash, prepend the scheme and domain name. Otherwise it's domain + directory_path + image, collapsing any ../ segments along the way.
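A minimal sketch of that matching, assuming you know the URL of the page the images came from. Rather than splitting the pieces by hand, it leans on URI.join from Ruby's standard library, which applies the same rules (including ../ segments). The page URL, the resolve_image_urls helper, and the srcs array are made up for illustration; in practice srcs would be whatever Hpricot collected.

```ruby
require 'uri'

# Hypothetical helper: resolve each scraped src against the URL of the
# page it was found on. URI.join leaves full URLs untouched and merges
# absolute and relative paths into the base URL.
def resolve_image_urls(page_url, srcs)
  srcs.map { |src| URI.join(page_url, src).to_s }
end

# Assumed page URL, for illustration only.
page_url = 'http://www.example.com/gallery/2007/index.html'

srcs = [
  'http://external.site.com/images/image.jpg', # full URL
  '/images/image.jpg',                         # absolute path
  '../images/image.jpg',                       # relative path, one level up
  'images/image.jpg'                           # relative path
]

resolved = resolve_image_urls(page_url, srcs)
resolved.each { |url| puts url }
# http://external.site.com/images/image.jpg
# http://www.example.com/images/image.jpg
# http://www.example.com/gallery/images/image.jpg
# http://www.example.com/gallery/2007/images/image.jpg
```

If the page declares a base href, that value should probably be used as the base instead of the page URL. URI.join also raises on srcs it cannot parse (for example, ones containing spaces), so real-world input may need a rescue or some cleanup first.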

-philip

Greg_Donald1 · December 3, 2007, 6:31pm

/me watches while wget gets reinvented.
