hpricot search condition

Is this your html, or are you scraping someone else's html?

If it's yours, organize your html differently... if you know you want to be processing a section at a time, wrap those sections with an identifiable container, then scope your searches by the container.

<div>
  <h3>blah</h3>
  <li>a</li>
  <li>b</li>
</div>
<div>
  <h3>blah2</h3>
  <li>c</li>
  <li>d</li>
</div>

(doc/"div").each do |dv|
  this_h3 = (dv/"h3")
  if this_h3.inner_html == "blah2"
    (dv/"li").each do |li|
      puts li.inner_html
    end
  end
end

This prints just c and d.

If it's someone else's html in that format, you'll probably have to go element by element through the whole doc, with state machine-ish code to track what you've seen previously, since there doesn't seem to be any real 'path' from each h3 to its li's.
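A rough sketch of that state-tracking idea, in plain Ruby so it runs without any gems. Assumptions: the markup is as simple as the example (no attributes, no nesting), a regex scan stands in for walking the Hpricot elements in document order, and items_under is a made-up helper name, not part of Hpricot.

```ruby
# Flat markup: headings and items are siblings, with no wrapping container.
html = "<h3>blah</h3><li>a</li><li>b</li><h3>blah2</h3><li>c</li><li>d</li>"

# Hypothetical helper: collect the <li> texts that follow the heading `wanted`.
def items_under(html, wanted)
  in_section = false # state: is the most recent h3 the one we want?
  found = []
  # Stand-in for visiting each h3/li element in document order.
  html.scan(%r{<(h3|li)>(.*?)</\1>}) do |tag, text|
    if tag == "h3"
      in_section = (text == wanted) # every heading resets the state
    elsif in_section
      found << text
    end
  end
  found
end

puts items_under(html, "blah2")
```

With a real parser you would keep the same flag and loop, and only swap the scan for whatever element traversal the library gives you.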

Your html is still flat, so you have to work with the patterns that you see. You have: span li li li span li li li etc...

An ugly, brute force, one case solution is to:

1. read the page with Hpricot
2. remove the header
3. convert it to a simple string representation
4. stick your opening tag '<see>' at the head
5. stick your closing tag and a div end '</div></see>' at the tail
6. change all '<span>' to '</div><div><span>'
7. doctor up the new head from '<see></div><div>' to just '<see><div>'
8. re-create your Hpricot doc from the modified string

which takes about 8 lines of code.
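The string-doctoring part of that recipe might look roughly like this. It is only a sketch: it leaves out reading the page and re-parsing with Hpricot, assumes the header has already been removed, that the body starts directly with a bare '<span>', and that the spans carry no attributes. wrap_sections is a made-up helper name.

```ruby
# Hypothetical helper: wrap each <span>-led run of elements in its own <div>.
def wrap_sections(flat)
  # Opening tag at the head; closing tag plus a div end at the tail.
  s = "<see>" + flat + "</div></see>"
  # Every span closes the previous div and opens a new one.
  s = s.gsub("<span>", "</div><div><span>")
  # The first span left a stray </div> right after <see>; remove it.
  s.sub("<see></div><div>", "<see><div>")
end

flat = "<span>x</span><li>a</li><li>b</li><span>y</span><li>c</li>"
puts wrap_sections(flat)
```

The result can then be handed back to the parser, and each section searched by its div as in the container-scoped approach.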

YMMV

Please don't top post, it annoys readers on this list and makes it less likely that you will get help.

I have not used hpricot but if I were in your situation the first thing I would do is carefully look through the documentation for hpricot. Have you done that?

Colin