Resolving image URLs

rab · December 3, 2007, 6:37pm

I'm trying to scrape images from a page. I'm using Hpricot to scrape the actual image URLs into an array but I've encountered a problem regarding resolving the full image paths.

Example:

The src of the images can take any of the following forms:

http://external.site.com/images/image.jpg (full URL)
/images/image.jpg (absolute path)
../images/image.jpg (relative path)
images/image.jpg (relative path)
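The forms listed above can be told apart mechanically before deciding how to resolve them. This is a minimal sketch using only Ruby's standard uri library. `src_kind` is a hypothetical helper, and protocol-relative values such as `//host/x` are deliberately not handled.

```ruby
require 'uri'

# Hypothetical helper: label an <img> src as one of the forms listed above.
# Assumption: protocol-relative values ("//host/x") are out of scope.
def src_kind(src)
  uri = URI.parse(src)
  if uri.absolute?            # has a scheme such as "http:"
    :full_url
  elsif src.start_with?('/')  # path from the site root, same host as the page
    :absolute_path
  else                        # depends on the directory of the current page
    :relative_path
  end
end

srcs = [
  'http://external.site.com/images/image.jpg',
  '/images/image.jpg',
  '../images/image.jpg',
  'images/image.jpg'
]
srcs.each { |s| puts "#{s} -> #{src_kind(s)}" }
```

Only the full URL is usable on its own; the other three need the URL of the page they came from.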

Is there a way to resolve these paths to full URLs, so I can copy the images to my server or do whatever else I need with them?

Hope that makes sense.

Cheers,

Jim

You can use URI.join:

require 'uri'
=> true
page_and_images = {
?> 'http://external.site.com/somedir/somepage.html' => ['http://external.site.com/images/image.jpg'
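A minimal sketch of the URI.join approach, assuming the images were scraped from `http://external.site.com/somedir/somepage.html` (an example page URL, not taken from a real site). Each src is resolved against that page URL, which covers all the forms from the original question.

```ruby
require 'uri'

# Assumed base: the URL of the page the <img> tags were scraped from.
page_url = 'http://external.site.com/somedir/somepage.html'

srcs = [
  'http://external.site.com/images/image.jpg',  # full URL: returned unchanged
  '/images/image.jpg',                          # absolute path: keeps scheme and host
  '../images/image.jpg',                        # relative: up one level from /somedir/
  'images/image.jpg'                            # relative: inside /somedir/
]

# URI.join resolves each reference against the base URL.
resolved = srcs.map { |src| URI.join(page_url, src).to_s }
resolved.each { |u| puts u }
```

The first three resolve to `http://external.site.com/images/image.jpg` and the last to `http://external.site.com/somedir/images/image.jpg`. If the page declares a `<base href>`, that value rather than the page URL is the correct base. Also, src values containing unescaped characters such as spaces make URI.join raise `URI::InvalidURIError`, so you may need to escape them or rescue the error.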
