Problem scraping using Nokogiri - getting wrong characters

Hi all,

I am scraping a table from another site and inserting it into my site. You can see an example on the initial page at: http://mthosts.heroku.com. I'm referring to the green box with the Snowbird weather and snowfall information.

This box is scraped from the Snowbird site at: http://www.snowbird.com/ski_board/snowreport.php

The problem is that the Snowbird site has degree symbols (°), but on my page they show up as: (�)

I think it has something to do with the encoding, but I'm pretty new to HTML etc. and am not sure what I can do to fix this. I've tried substituting the characters and some other things but haven't had any success yet.

any ideas?

thanks,

max

Hi!

I opened the HTML source of the snowreport.php page and noticed that the strange symbols you mentioned are HTML-encoded characters. The symbol is &deg;

I had a similar problem last Monday, but I couldn't completely solve it.

Try the lib: http://htmlentities.rubyforge.org/

Or use a regular expression (sub, gsub) to replace &deg; with the degree symbol.
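A minimal sketch of the substitution approach, assuming the entity in the page source is `&deg;`. The `html` string here is a made-up fragment standing in for the scraped table, not the real Snowbird markup:

```ruby
# Hypothetical fragment standing in for the scraped markup (an assumption).
html = '<td>Temperature: 28&deg;F</td>'

# Replace every occurrence of the named entity with the literal
# degree sign (U+00B0). A plain string pattern is enough here.
fixed = html.gsub('&deg;', "\u00B0")

puts fixed
```

This only helps if the entity survives into the string you are working with. If the parser has already decoded it, the remaining problem is the output encoding rather than the entity.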

Regards,

Everaldo

I tried that but it didn't work for me. What did work was explicitly setting the encoding property in Nokogiri:

    url = 'http://www.snowbird.com/ski_board/snowreport.php'
    page = Nokogiri::HTML(open(url))
    page.encoding = 'utf-8'

worked great after that!
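For anyone wondering why the (�) shows up in the first place, here is a rough, stdlib-only illustration. It assumes the degree sign arrives as a single Latin-1 byte (0xB0); that is a guess about the source page, not something confirmed in this thread:

```ruby
# A degree sign as one Latin-1 byte (0xB0). The ISO-8859-1 source
# encoding is an assumption made for this illustration.
raw = "28\xB0F".b

# Labelled as UTF-8, the lone 0xB0 byte is not a valid sequence,
# which is the kind of data a browser renders as a replacement character.
as_utf8 = raw.dup.force_encoding('UTF-8')
puts as_utf8.valid_encoding?

# Transcoding from the assumed source encoding gives a proper
# UTF-8 degree sign (U+00B0).
converted = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts converted
```

Setting the document encoding in Nokogiri, as above, plausibly has a similar effect on the serialized output, so the bytes sent to the browser match the encoding the page declares.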

thx,

Max