Problem scraping with Nokogiri: getting wrong characters

Hi all,

I am scraping a table from another site and inserting it into my site. You can see an example on the initial page at: I'm referring to the green box with the Snowbird weather and snowfall information.

This box is scraped from the Snowbird site at:

The problem is that the Snowbird site shows degree symbols (°), but on my page they show up as (�).

I think it has something to do with the encoding, but I'm pretty new to HTML etc. and am not sure what I can do to fix this. I've tried substituting the characters and a few other things, but haven't had any success yet.

Any ideas?




I opened the HTML source of the snowreport.php page and noticed that the strange symbols you mentioned are HTML-encoded characters. The symbol is °.

I had a similar problem last Monday, but I couldn't completely solve it.

Try the lib:

or use a regular expression (sub, gsub) to replace the encoded entity with the degree symbol (°).
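
A minimal sketch of the substitution idea, in case it helps. It assumes the degree sign appears in the source as `&deg;` or `&#176;`; the exact entity is not shown in this thread, so both forms are guesses, and the `html` fragment is made up for illustration.

```ruby
# Hypothetical fragment; the real markup on the scraped page may differ.
html = '<td>Temp: 25&deg;F</td><td>Low: 18&#176;F</td>'

# Replace both the named and the numeric form of the entity
# with a literal degree sign.
cleaned = html.gsub(/&deg;|&#176;/, '°')

puts cleaned
```

This only helps if the page really uses entities. If the raw bytes are in a different encoding than your page declares, substitution alone will not fix it.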



I tried that, but it didn't work for me. What did work was explicitly setting the encoding property in Nokogiri:

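    require 'nokogiri'
    require 'open-uri'

    url = ''  # URL missing from the original post
    page = Nokogiri::HTML(URI.open(url))  # URI.open is required on Ruby 3.0+
    page.encoding = 'utf-8'  # explicitly set the document encoding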

Worked great after that!
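
For anyone wondering why the (�) shows up in the first place, here is a rough standard-library sketch. It assumes the remote page is served as ISO-8859-1 (Latin-1), which this thread does not confirm, and the sample string is invented.

```ruby
# Assumption: the scraped bytes are ISO-8859-1, where "°" is the single byte 0xB0.
raw = "25\xB0F".b  # the bytes as they might arrive over the wire

# Labelled as UTF-8, a lone 0xB0 byte is invalid, which is what
# browsers render as the replacement character.
mislabelled = raw.dup.force_encoding('UTF-8')
broken = !mislabelled.valid_encoding?

# Labelling the bytes correctly and transcoding gives a real UTF-8 degree sign.
fixed = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')

puts fixed
```

The point is that the bytes and the declared encoding must agree. Setting the encoding explicitly on the parsed document, as above, is one way to make them agree.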