hpricot and regexp parsing help needed please

11175 · September 29, 2008, 8:08pm

Hello.

I'm using hpricot for the first time on a project. I need to get some URLs from a web site, but I only want certain ones.

I can grab all of the URLs from the page without a problem, but how can I enhance this to select links to http://www.goodsite.com and skip links to http://www.wrongsite.com?

I'd like to test for the string "goodsite".

Thanks!

Philip_Hallstrom · September 29, 2008, 10:37pm

...assuming doc is an hpricot object...

doc.search("a[@href*='goodsite']") do |result| .... end
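For what it's worth, the *= in that selector is a plain substring test on the href value, so it also matches when "goodsite" shows up in the path of some other site. Here is a rough sketch of the difference in plain Ruby, outside of hpricot. The method names and sample URLs are made up for illustration, and the host check is just one possible stricter variant, not something hpricot does for you:

```ruby
require 'uri'

# Same idea as the *= selector: keep any href that contains the substring.
def substring_matches(hrefs, needle)
  hrefs.select { |href| href.include?(needle) }
end

# Stricter variant (an assumption about what is wanted): only keep hrefs
# whose host part contains the substring. Unparseable hrefs are skipped.
def host_matches(hrefs, needle)
  hrefs.select do |href|
    begin
      host = URI.parse(href).host
      !host.nil? && host.include?(needle)
    rescue URI::InvalidURIError
      false
    end
  end
end

# Hypothetical sample data standing in for the hrefs found on a page.
sample_hrefs = [
  'http://www.goodsite.com/page1',
  'http://www.wrongsite.com/',
  'http://www.wrongsite.com/goodsite-review'
]

puts substring_matches(sample_hrefs, 'goodsite') # two links, one of them on wrongsite
puts host_matches(sample_hrefs, 'goodsite')      # only the goodsite.com link
```

If the substring test is good enough for your pages, the selector alone should be fine.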

11175 · September 29, 2008, 11:03pm

Yes, that grabs only the links I need. Previously, though, I had used

(doc/:a).each do |link|

which only gave me the HTML string.

Can I do this the same way, but instead of getting back

<a href= "http://…

get only the http:// part, so that I can use these links?

THANKS!

11175 · September 29, 2008, 11:08pm

Actually, I was wrong in my previous post. Sorry!! Both results are the same, i.e., I get back the <a href...

Is there a way for me to have a clean link? I want to insert this into a table and then pull up the pages.

Thanks!

11175 · September 29, 2008, 11:40pm

Figured it out.

doc.search("a[@href*='goodsite']") do |result|
  # Read the href attribute to get the bare URL instead of the whole <a> tag
  link = result.attributes['href']
  puts link
end
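In case it helps anyone who lands here looking for the regexp route from the subject line: below is a rough sketch that pulls the href values straight out of the HTML string with String#scan and then keeps the ones containing "goodsite". This is not hpricot's API, the pattern and method names are my own, and a regexp like this is easily fooled by unusual markup (for example an attribute such as data-href), so treat it as a fallback rather than a replacement for the parser.

```ruby
# Rough pattern (an assumption, not part of hpricot): capture the quoted
# href value inside each <a ...> tag, allowing spaces around the "=".
HREF_PATTERN = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/i

# Return every captured href value as a plain string.
def extract_hrefs(html)
  html.scan(HREF_PATTERN).flatten
end

# Keep only the hrefs that contain the given substring.
def clean_links(html, needle)
  extract_hrefs(html).select { |href| href.include?(needle) }
end

# Hypothetical sample markup for illustration.
sample_html = <<~HTML
  <p><a href="http://www.goodsite.com/a">Good</a>
  <a class="x" href='http://www.wrongsite.com/b'>Wrong</a>
  <a href= "http://www.goodsite.com/c">Also good</a></p>
HTML

puts clean_links(sample_html, 'goodsite')
```

Either way you end up with plain URL strings, which is what you want before inserting them into a table.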
