hpricot search condition

11175 · September 8, 2009, 5:10pm

Is this your html, or are you scraping someone else's html?

If it's yours, organize your html differently... if you know you want to be processing a section at a time, wrap those sections with an identifiable container, then scope your searches by the container.

    (doc/"div").each do |dv|
      this_h3 = (dv/"h3")
      if this_h3.inner_html == "blah2"
        (dv/"li").each do |li|
          puts li.inner_html
        end
      end
    end

emits just c and d.

If it's someone else's html in that format, you'll probably have to go elem by elem through the whole doc with state machine-ish code that tracks what you've seen previously, since there doesn't seem to be any real 'path' to the li's per h3.
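A rough sketch of that state-machine idea, in case it helps. To keep it runnable without the hpricot gem it scans the raw string with a regex instead of walking Hpricot elements, so treat it as an illustration of the bookkeeping only. The sample markup (FLAT_HTML) and the helper name items_under are made up for this example and are not from the original page or from Hpricot.

```ruby
# Hypothetical flat markup: headings and list items are siblings, not nested.
FLAT_HTML = <<~HTML
  <h3>blah1</h3>
  <li>a</li>
  <li>b</li>
  <h3>blah2</h3>
  <li>c</li>
  <li>d</li>
HTML

# Walk the h3/li tags in document order, remembering the most recent heading.
# items_under is a hypothetical helper name, not part of Hpricot.
def items_under(html, wanted_heading)
  current_heading = nil # state: text of the last <h3> seen so far
  items = []
  html.scan(%r{<(h3|li)\b[^>]*>(.*?)</\1>}m) do |tag, text|
    if tag == "h3"
      current_heading = text.strip # a new section starts here
    elsif current_heading == wanted_heading
      items << text.strip          # an li that belongs to the wanted section
    end
  end
  items
end

puts items_under(FLAT_HTML, "blah2")
```

With a parsed doc the same loop would presumably run over the elements in document order rather than over regex matches; the only state you need is "which heading did I see last".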

11175 · September 9, 2009, 5:08pm

Your html is still flat, so you have to work with the patterns that you see. You have: span li li li span li li li etc...

An ugly, brute force, one case solution is to:

- read the page with Hpricot
- remove the header
- convert it to a simple string representation
- stick your opening tag '<see>' at the head
- stick your closing tag and a div end '</div></see>' at the tail
- change all '<span>' to '</div><div><span>'
- doctor up the new head from '<see></div><div>' to just '<see><div>'
- re-create your Hpricot doc from the modified string

which takes about 8 lines of code.
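Something like the following covers the string-rewriting steps. It is only a sketch: FLAT_BODY is a made-up sample standing in for the page after the header has been removed, wrap_sections is a hypothetical helper name, and it assumes every section starts with a bare '<span>' (no attributes) and that the body starts directly with one.

```ruby
# Hypothetical flat body, i.e. the page with its header already removed.
FLAT_BODY = "<span>blah1</span><li>a</li><li>b</li>" \
            "<span>blah2</span><li>c</li><li>d</li>"

# wrap_sections is a hypothetical helper name, not part of Hpricot.
def wrap_sections(flat_html)
  # Opening tag at the head, closing div plus closing tag at the tail.
  wrapped = "<see>" + flat_html.strip + "</div></see>"
  # Every span closes the previous section and opens a new one.
  wrapped = wrapped.gsub("<span>", "</div><div><span>")
  # The very first span has no previous section to close, so fix the head.
  wrapped.sub("<see></div><div>", "<see><div>")
end

puts wrap_sections(FLAT_BODY)
# With the hpricot gem installed, the result could then be parsed again,
# e.g. doc = Hpricot(wrap_sections(FLAT_BODY)), and searched per div.
```

After that each span and its li's sit inside their own div, so the per-container search from earlier in the thread applies.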

YMMV

Colin_Law1 · September 10, 2009, 8:02am

Please don't top post, it annoys readers on this list and makes it less likely that you will get help.

I have not used hpricot but if I were in your situation the first thing I would do is carefully look through the documentation for hpricot. Have you done that?

Colin
