Hpricot loop question to read table row values

Hi.

I've got a file that contains a table that looks like this:

<table>
  <tr><td>column title a</td><td>column title b</td></tr>
  <tr><td>row 1 a</td><td>row 1 b</td></tr>
  <tr><td>row 2 a</td><td>row 2 b</td></tr>
  <tr><td>row 3 a</td><td>row 3 b</td></tr>
  <tr><td>row 4 a</td><td>row 4 b</td></tr>
</table>

I need to read the rows starting with the second table row, skipping the
row of column titles. Then I need to read each column's value.

How can I loop through each row to get each value that I need?

To read a row of values I used:

(doc/"/html/body/table/tbody/tr[2]/td[1]").inner_html   # => "row 1 a"
(doc/"/html/body/table/tbody/tr[2]/td[2]").inner_html   # => "row 1 b"

I tried variations of each loops that increment the table row index, but
I can't get the syntax right.

Any ideas?

THANKS!
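
(For reference, the kind of loop being asked about can be written with
Hpricot along these lines. This is only a sketch; it assumes the sample
table above has been saved locally as games.html, which is a made-up
filename:)

require 'hpricot'

doc = Hpricot(File.read("games.html"))   # the sample table saved to a local file

rows = []
(doc/"table tr")[1..-1].each do |tr|     # start at the second tr, skipping the titles
   rows << (tr/"td").map { |td| td.inner_html }
end

p rows   # => [["row 1 a", "row 1 b"], ["row 2 a", "row 2 b"], ...]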

I found a superb example of how to solve the above problem:
http://tinyurl.com/4zl6b9

But now I can't figure out how to save my output to my table. I am
following this example exactly so that I can duplicate it for my
purposes. I am able to get the output correctly, but I can't save it.
I get an "undefined method save" error.

See below for Steve's code.

Where can I put the save call so that it saves my values to my table?

I tried replacing puts g.to_csv with g.save, but that didn't work. I
also tried replacing games << game with game.save, but that didn't work
either.

I'm still new at figuring out how to adapt examples like this to my
needs.

Any help is greatly appreciated!!!!!

--- code from Steve ---

def parse_games(doc)
   games = []
   doc.search("//table[@class='tablehead']//tr").each do |tr|
      # 'stathead' rows carry the week and 'colhead' rows carry the date
      @week = tr.search("/td/a").inner_html if(tr[:class] == 'stathead')
      @date = tr.at("td").inner_html if(tr[:class] == 'colhead')

      # a game row lists both team names as links in its first cell
      teams = []
      tr.at("td").search("a").each do |team|
         teams << team.inner_html
      end

      if(teams.size == 2)
         @time = tr.search("td:eq(1)").inner_html
         game = Game.new()
         game.date = @date
         game.week = @week
         game.time = @time
         game.away_team = teams[0]
         game.home_team = teams[1]
         games << game
      end
   end
   games
end

require 'hpricot'
require 'open-uri'

games = parse_games(Hpricot(open("http://sports.espn.go.com/nfl/schedule")))
games.each do |g|
   puts g.to_csv
end
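
(A note on the save question: the "undefined method save" error means
the Game objects coming out of parse_games don't respond to save, so
calling g.save or game.save can't work until Game itself knows how to
persist. If Game were an ActiveRecord model backed by a games table,
which is purely an assumption since the class definition isn't shown in
the thread, the call would go on each object the loop yields, roughly:)

# Sketch only: assumes Game is an ActiveRecord model with date, week, time,
# away_team and home_team columns. `games` is the array built by parse_games above.
games.each do |g|
   g.save   # inserts one database row per parsed game; returns false if it fails
end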

How can I loop through each row to get each value that I need?

I don't know if hpricot's result set supports slice, but you could do something like this:

doc.search('table td').slice(1,99999999).each do |td_ele|
   .....
end

I'm cheating by picking 99999999, but it's easier than figuring out how many results there are. I'm sure there's a method to simply remove that first element in a chain, but I can't think of it.

And you'd want to change that 'search' to match your table specifically.
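
(Hpricot's search results are an Array subclass, so slice does work
there. Here is a sketch of that idea applied to the sample table at the
top of the thread, with the selector changed to tr so that the whole
header row, rather than just the first cell, gets skipped; games.html is
again just a stand-in filename:)

require 'hpricot'

doc = Hpricot(File.read("games.html"))

# Hpricot::Elements is an Array subclass, so slice is available;
# slicing from index 1 onward drops the header row.
doc.search("table tr").slice(1..-1).each do |tr|
   tr.search("td").each do |td_ele|
      puts td_ele.inner_html      # "row 1 a", then "row 1 b", and so on
   end
end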

Who needs loops when you have XPath? You can grab an entire column in
one fell swoop.

  require 'xml'
  doc = XML::Parser.string(html).parse
  column1 = doc.find('/table/tr[position()>1]/td[1]/text()')
  puts column1.to_a

You'll need libxml for that (gem install libxml-ruby). Hpricot is not
XPath compliant enough.

See my XPath article here:
http://markthomas.org/2008/08/22/improve-your-XML-parsing-with-XPath/

-- Mark.
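
(To try that snippet against the sample table from the top of the
thread, the HTML just needs to be in a string first. A self-contained
sketch, with games.html once more standing in for wherever the file
actually lives:)

  require 'xml'   # from the libxml-ruby gem

  html = File.read("games.html")
  doc  = XML::Parser.string(html).parse

  # position()>1 skips the header row; td[1] and td[2] are the two columns
  col_a = doc.find('/table/tr[position()>1]/td[1]/text()').map { |n| n.content }
  col_b = doc.find('/table/tr[position()>1]/td[2]/text()').map { |n| n.content }

  col_a.zip(col_b).each do |a, b|
    puts "#{a} | #{b}"   # "row 1 a | row 1 b", and so on
  end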

Philip Hallstrom wrote:

I don't know if hpricot's result set supports slice, but you could do
something like this:

doc.search('table td').slice(1,99999999).each do |td_ele|
   .....
end

Thanks, but I've actually been able to move past the point of parsing
out my variables. Now I just need to figure out how to save my results.

Any thoughts on that would be very much appreciated.

Mark Thomas wrote:

Who needs loops when you have XPath? You can grab an entire column in
one fell swoop.

  require 'xml'
  doc = XML::Parser.string(html).parse
  column1 = doc.find('/table/tr[position()>1]/td[1]/text()')
  puts column1.to_a

You'll need libxml for that (gem install libxml-ruby). Hpricot is not
XPath compliant enough.

See my XPath article here:
http://markthomas.org/2008/08/22/improve-your-XML-parsing-with-XPath/

-- Mark.

I looked over your XPath article. Thanks.

But given the table structure noted above, will XPath actually work to
pull the values I need from each of the columns and insert them into
database fields? I need to locate three separate columns, grab each of
them row by row, and insert them into a table, so the HTML and the
database table will end up looking the same.

Thanks.

If you need a row at a time, then you just shorten your XPath a bit:

  doc = XML::Parser.string(html).parse
  doc.find('/table/tr[position()>1]').each do |row|
    my_db_insert(row.find('td/text()').to_a)
  end

-- Mark.
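
(my_db_insert isn't defined anywhere in the thread. One possible sketch
of it, assuming a local SQLite database through the sqlite3 gem and a
three-column table named rows; the database file, table, and column
names are all made up for illustration:)

  require 'sqlite3'

  DB = SQLite3::Database.new("scrape.db")   # made-up database file
  DB.execute("CREATE TABLE IF NOT EXISTS rows (col_a TEXT, col_b TEXT, col_c TEXT)")

  # `cells` arrives as an array of libxml text nodes, so pull the strings out first
  def my_db_insert(cells)
    a, b, c = cells.map { |node| node.content }
    DB.execute("INSERT INTO rows (col_a, col_b, col_c) VALUES (?, ?, ?)", [a, b, c])
  end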