Net::HTTP: incorrect content retrieved

Hi All,

I am trying to fetch data from several websites for study and testing. I can retrieve content successfully from some sites that use UTF-8 encoding.

However, when I try to access

http://bet.hkjc.com/football/index.aspx?lang=ch&pageno=1

I get the following content from response.body (even when I use force_encoding):

QtwpU(QutVP7vw qQQ03P)J+,KwSRP())///+7/J ee K)IQId&$PRZXYf[XWWbjkETnIe}e E%!nH%*eV#i}m @"KI-N.,]i5*@(uAVbY"DTR,H!5/;bgQE9 i9i X5!Krl3RrQW\T\Ra` R S`HOPWHMzI;L#\!\`AL0oKRKs2Rz>y*J-ILN:0=5/(1'l^bqA}Nb^mju-Pyr0!D\HXrC%BcDIbgRrbrFjXjQ1(}nfcd`dn0$$'vl9w+|9s5OwC(*eg%*$+de+d3AIA|Z~~IRbNNpI%( %C%[b\qFj*c$sJSRA*3!EnPx`:!B- .,WJ*&Ru v<yZ0';>?%8 (34B.D"|#tSRKsJ(4\K@F'M(3!))y e_WPZB^N)BuMp52 ='Pj"Sf<r(3)5)5 QB3Q@p2,`VY$sz@7^M6hlfIjn1"gP$A=qURWDzeU! 1X m8zPT`#F&sf@fiEbWn8$c a$t v=]\7z-@\:+R`YTn!!r23J-T`H?*NGA~,HLO!L)qG-1 Q^ T:=gl0/5^j@,/V\XRZ[4`'& %] pxQR[AL+*Y)[ /?0F$KTU(#-3/5EFr$\}PV_ZY\@h gZ"XP2X@n0zt xR)(@XPT dY$@)</! &5/9#!N}}`LP>*#1_GPPj8`t,C'h+(+)hf2pZwIS* @!Z^`mad6p|P&x) + M/-PBW J: 8y(450Ks!tk$APh %`g1?13Xs-P(R+wO*vAU/=3MIPQ& !95kOPJJ i`z K)$9998&&AP2`V"t0 0R)%IYPS]< `20514121UlDH5<H,.M0eJ@dpIbQIj#E``U&j U:HX+@C2@0pC@F{j]2^%%qd#}q93'fSQ~yq*(1h%e'[z4p(VE& lx5* ,S*5VP*CO'%E@@9, 5931'<5IXGEA `4,rD}xBR

What I expect (I checked in the browser) is the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html><head><meta http-equiv="X-UA-Compatible" content="IE=8" /><meta http-equiv="content-type" content="text/html; charset=UTF-8" /><meta name="keywords" content="足球"> <meta name="description" content="足球詳情"><script type="text/javascript">try {var enableAccessControl =false;if (enableAccessControl) {var tmp = window.location.href.substr(7) ;var SERVER_NAME = tmp.substr(0, tmp.indexOf("/"));var domainName = SERVER_NAME.substr(SERVER_NAME.indexOf(".")+1) ;document.domain = domainName;try {if (!top.betSlipFrame.isLogon())window.location.replace("/general_index.aspx?lang=en");}catch (e) {window.location.replace("/general_index.aspx?lang=en");}}}

Does anyone have any idea?

1. Please post this to the Ruby mailing list; this is not a Rails question.

2. You'll get more useful responses if you include the code and the Ruby version you're using to fetch the document.

FWIW,

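This is not a UTF-8 encoding issue: the server is sending the response with Content-Encoding set to gzip, so the body is compressed binary data, and force_encoding (which only relabels the bytes) cannot turn it into text.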
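To make the diagnosis concrete, here is a small offline sketch. It assumes Ruby 2.4+ (for `Zlib.gzip`); the HTML fragment is only a stand-in taken from the expected page, and `gzip_bytes?` is a hypothetical helper. It compresses some UTF-8 text the way a gzip-encoding server would, then shows that relabelling the bytes with `force_encoding` leaves them compressed.

```ruby
require 'zlib'

# Stand-in for the page text: a UTF-8 fragment like the one the browser shows.
html = '<meta name="keywords" content="足球">'

# Roughly what arrives when the server answers with Content-Encoding: gzip:
# compressed binary bytes, not text. (Zlib.gzip needs Ruby 2.4+.)
wire_body = Zlib.gzip(html)

# Hypothetical helper: every gzip stream starts with the magic bytes 0x1f 0x8b.
def gzip_bytes?(body)
  body.bytes.first(2) == [0x1f, 0x8b]
end

# force_encoding only changes the encoding label; the bytes stay compressed.
relabelled = wire_body.dup.force_encoding('UTF-8')

puts gzip_bytes?(wire_body)  # true
puts gzip_bytes?(html)       # false
puts relabelled == html      # false
```

If the first two bytes of the body you get are 0x1f 0x8b, you are looking at gzip data, and no choice of encoding will make it readable.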

On Ruby 2.0 and later, Net::HTTP decodes gzip automatically; on earlier versions you need to decompress the body yourself:

http://pushandpop.blogspot.com.au/2011/05/handling-gzip-responses-in-ruby-nethttp.html
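A minimal sketch of the manual route, assuming the page is UTF-8 (its meta tag declares charset=UTF-8) and that gzip is the only content encoding you need to handle. `decompress_body` is a hypothetical helper name, not part of Net::HTTP. The second half uses the `decode_content` reader on Ruby 2.0+ request objects to show when the automatic decoding applies: only when Net::HTTP set Accept-Encoding itself, so setting that header by hand (for example, by copying browser headers) leaves the body compressed even on newer Ruby.

```ruby
require 'net/http'
require 'stringio'
require 'uri'
require 'zlib'

# Hypothetical helper: return the body as UTF-8 text, gunzipping it first
# when the server sent "Content-Encoding: gzip" (or the older "x-gzip").
# Assumes the document itself is UTF-8, as its <meta> tag declares.
def decompress_body(body, content_encoding)
  text =
    if %w[gzip x-gzip].include?(content_encoding.to_s.strip.downcase)
      Zlib::GzipReader.new(StringIO.new(body)).read
    else
      body.dup
    end
  text.force_encoding('UTF-8')
end

# Typical use with a live request (not executed here):
#   res  = Net::HTTP.get_response(URI(url))
#   html = decompress_body(res.body, res['Content-Encoding'])

# On Ruby 2.0+, a request that does not set Accept-Encoding (or Range)
# itself gets decode_content = true, and Net::HTTP gunzips for you.
# Setting Accept-Encoding by hand turns the automatic decoding off.
uri    = URI('http://bet.hkjc.com/football/index.aspx?lang=ch&pageno=1')
auto   = Net::HTTP::Get.new(uri)
manual = Net::HTTP::Get.new(uri, 'Accept-Encoding' => 'gzip')
puts auto.decode_content    # true
puts manual.decode_content  # false
```

Building the request objects does not open a connection, so the last lines only inspect how Net::HTTP would treat the response.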

–Matt Jones

Matt, thanks a lot.

I will post the question in the right place, the Ruby list, next time.