Help with regex needed

Hi, here is the array I am scanning: ["\n<td>&nbsp;<a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/ frameset~2489041&FF=rwr+121&1,1,\">The Academic Writer: A Brief Guide</

\n</td>\n<td >\n&nbsp;Ede, Lisa\n</td>\n\n<td >\n&nbsp;Valley

Reserves -- VR 282 -- AVAILABLE\n</td>\n\n<td >\n&nbsp;\n</td>\n\n</

\n<tr>\n<td>&nbsp;<a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/

frameset~1334646&FF=rwr+121&1,1,\">Cultural literacy : what every American needs to know / E.D. Hirsch, Jr. ; with an appendix, What li</

\n</td>\n<td >\n&nbsp;Hirsch, E. D. (Eric Donald), 1928-\n</td>\n

\n<td >\n&nbsp;Valley Reserves -- LC149 .H57 1987 -- AVAILABLE\n</td> \n\n<td >\n&nbsp;\n</td>]

I am trying to pull out the essential value (everything but the newlines and such) between each <td> and </td>.

Here is the regex I am trying: s.first.scan(/\<td \>(.*?)\<\/td\>/mi). But I don't get the value of the first <td>, the one containing the a href.

Any help would be appreciated. Kim

Use the Hpricot library to handle the HTML parsing.

I agree with Mukund. Use Hpricot:

require 'hpricot'

html = Hpricot(s.first)

# Print the inner HTML of every <td> cell
html.search("td").each do |cell|
  puts cell.inner_html
end

-- Mark.
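
For anyone who still wants to see why the original regex misses the first cell: the pattern contains a literal space (`<td \>`), while the first cell in the data opens with `<td>` and no space. The sketch below uses only the Ruby standard library and a shortened sample string modelled on the data above. The `href` value and the closing `</a>` are placeholders I made up, since the original dump is truncated there. Treat it as an illustration of the mismatch rather than a replacement for a real HTML parser.

```ruby
# Shortened sample modelled on the thread's data.
# Assumption: the href and the closing </a> are placeholders, not the real values.
html = "\n<td>&nbsp;<a href=\"/search\">The Academic Writer</a>\n</td>\n" \
       "<td >\n&nbsp;Ede, Lisa\n</td>\n\n" \
       "<td >\n&nbsp;Valley Reserves -- VR 282 -- AVAILABLE\n</td>\n"

# The original pattern demands a space before ">", so "<td>" is skipped.
strict = html.scan(/<td >(.*?)<\/td>/mi).flatten

# Allowing anything (or nothing) up to ">" matches both "<td>" and "<td >".
cells = html.scan(/<td\b[^>]*>(.*?)<\/td>/mi).flatten

# Strip nested tags, &nbsp; entities, and surrounding whitespace/newlines.
clean = cells.map { |c| c.gsub(/<[^>]+>/, "").gsub("&nbsp;", " ").strip }

puts strict.length
puts clean.inspect
```

With this sample, `strict` holds only two cells while `clean` holds all three, with the link text in first place. A regex like this is fragile against nested tables or unusual markup, which is the reason a parser such as Hpricot is the safer choice.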