Parsing XML file with no style info with Hpricot

11155 · March 7, 2010, 12:10pm

Hello,

I've been trying for hours to parse an XML file using Hpricot. Usually it's not a problem. Here's my simple code:

# This works and outputs the proper XML data
@url1 = 'http://www.sportingnews.com/stories/sportingnews/MLB/rss.xml'
@page1 = Hpricot(open(@url1))
<%= @page1 %>

# This does not work, and I'm scratching my head
@url1 = 'http://gd2.mlb.com/components/game/mlb/year_2010/month_03/day_06/gid_2010_03_06_anamlb_oakmlb_1/boxscore.xml'
@page1 = Hpricot(open(@url1))
<%= @page1 %>

The gd2.mlb.com XML file does not have any style information according to Firefox. I can read it using Oxygen. Can somebody provide me with a hint on how to parse the mlb.com XML? Thanks!

-A

11155 · March 7, 2010, 5:24pm

Any idea how to parse this XML?

-A

hassan · March 7, 2010, 5:35pm

And I'm scratching mine trying to guess what you mean by "does not work" ...

11155 · March 7, 2010, 5:39pm

Hpricot is not parsing the MLB XML file. I suspect that is because the file is not in a standard XML format.

If you give my code a quick try, you'll notice that it will read other XML files, but not the MLB XML.

# This works and outputs the proper XML data
@url1 = 'http://www.sportingnews.com/stories/sportingnews/MLB/rss.xml'
@page1 = Hpricot(open(@url1))
<%= @page1 %>

# This does not work, and I'm scratching my head
@url1 = 'http://gd2.mlb.com/components/game/mlb/year_2010/month_03/day_06/gid_2010_03_06_anamlb_oakmlb_1/boxscore.xml'
@page1 = Hpricot(open(@url1))
<%= @page1 %>

hassan · March 7, 2010, 5:43pm

Actually, I already did, and it seems to work just fine. Hence my own head-scratching.

So, again, maybe you can say *exactly* what you expect to happen and how that differs from what you're seeing.

11155 · March 7, 2010, 6:18pm

Hi Hassan,

This picture: http://picasaweb.google.com/lh/photo/Qf4DFta9p5ERoCRb6Lbd2Q?feat=directlink

This is the parsed output of the sportingnews XML feed. It is displayed in my view with <%= @page1 %>.

This picture: http://picasaweb.google.com/lh/photo/xLVr8_U-x12rJnADs_qcEw?feat=directlink

This blank space is what the view displays with <%= @page1 %> when using the MLB XML file.

I'm expecting the XML information seen here on Firefox: http://picasaweb.google.com/lh/photo/X7VFocR3L4S4Pl_2jvDzVQ?feat=directlink

to be displayed when I parse the MLB file. Hpricot is not parsing this file.

-A

Frederick_Cheung · March 7, 2010, 7:55pm

I'm expecting the XML information seen here on Firefox:http://picasaweb.google.com/lh/photo/X7VFocR3L4S4Pl_2jvDzVQ?feat=dire…

to be displayed when I parse the MLB file. Hpricot is not parsing this file.

Have you tried viewing the source of the page generated by your view? I suspect Hpricot is parsing the file, but just blatting it into the view like that is producing invalid HTML which your browser is not rendering.

Fred
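To illustrate Fred's point, here is a minimal sketch using only Ruby's standard library. The `sample_xml` string is a made-up stand-in for the boxscore document (its element and attribute names are assumptions, and nothing is fetched from mlb.com). It shows how unescaped XML dropped into a page differs from an escaped copy that a browser would display as text. One possible reason the RSS feed looked fine, not confirmed in this thread, is that its elements contain text, whereas elements that carry their data only in attributes give the browser nothing visible to render.

```ruby
require 'erb'

# Hypothetical stand-in for the parsed document's string form.
# The element and attribute names here are invented for illustration.
sample_xml = '<boxscore status="F"><team name="Example"/></boxscore>'

# What <%= @page1 %> effectively emits: raw markup. A browser treats
# <boxscore> and <team> as unknown HTML tags and shows no text for them.
raw = sample_xml

# What an escaped version emits: the angle brackets and quotes become
# entities, so the browser shows the XML source as readable text.
escaped = ERB::Util.html_escape(sample_xml)

puts raw
puts escaped
```

In a Rails view of that era, the `h` helper (an alias for `html_escape`) does the same job, so something like `<%= h @page1.to_s %>` would likely show the markup instead of a blank area.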

hassan · March 7, 2010, 8:54pm

I'm expecting the XML information seen here on Firefox:

http://picasaweb.google.com/lh/photo/X7VFocR3L4S4Pl_2jvDzVQ?feat=directlink

to be displayed when I parse the MLB file. Hpricot is not parsing this file.

Sure it is -- use irb to examine what's in @page1.

As Frederick already suggested, you apparently have a view problem, not an Hpricot parsing problem.
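The kind of check Hassan describes can be sketched outside the view as follows. This is not Hpricot: it swaps in REXML (bundled with Ruby) so the example runs without the old gem, and `sample_xml` is an invented stand-in for the boxscore file, with assumed element and attribute names. The idea is the same as poking at `@page1` in irb: confirm that the parser produced a document and that its data can be reached, independent of how the browser renders it.

```ruby
require 'rexml/document'

# Hypothetical stand-in for boxscore.xml; names and values are invented.
sample_xml = '<boxscore status="F"><team name="Example" runs="3"/></boxscore>'

# Parse the string into a document tree.
doc = REXML::Document.new(sample_xml)

# Inspect the tree directly instead of rendering it in a view.
root = doc.root
team = root.elements['team']   # first child element named "team"

puts root.name                  # name of the root element
puts root.attributes['status']  # attribute on the root element
puts team.attributes['runs']    # attribute on the child element
```

With Hpricot itself, the analogous irb session would probably use `Hpricot.XML(open(@url1))`, which parses in XML mode rather than HTML mode, followed by calls such as `at` or `search` on the result.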

11155 · March 8, 2010, 10:13am

Thanks everybody. I saw the info in the page source. I figured it out.

-A
