Hi folks,
Here's my basic issue; hopefully this is clear. I'm trying to submit some UTF-8 values in my query string, but they are coming out mangled on the other end. It *seems* like the problem is that whatever Rack::Utils.unescape() returns gets converted to UTF-8 somewhere in the chain (I'm using Rails 3.0.7 and Ruby 1.9.2, by the way), and that this mangles characters which take two bytes in UTF-8 (by contrast, "%20", which is a space and a one-byte character, gets converted fine). I feel like I've almost figured this out, but I'm still stumped. Here's my "evidence":
# Example UTF-8 string:
"Adélaïde de Hongrie"
# GET string (obviously URI encoded):
Started GET "/registers/results?filter[title]=Ad%E9la%EFde%20de%20Hongrie&search=&limit=4" for 127.0.0.1 at 2011-05-16 14:17:33 +0700
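For comparison, here is a rough sketch of how the same title percent-encodes when its bytes are UTF-8 versus ISO-8859-1. The `percent_encode` helper is my own illustration (it is not a Rack or Rails method), and I haven't confirmed which encoding the browser actually uses when it builds the request.

```ruby
# Hypothetical helper (not a Rack/Rails API): percent-encode every byte
# that is not an RFC 3986 "unreserved" character.
def percent_encode(str)
  str.bytes.map { |b|
    c = b.chr
    c =~ /\A[A-Za-z0-9\-._~]\z/ ? c : format("%%%02X", b)
  }.join
end

# "Adélaïde de Hongrie", written with \u escapes so the example does not
# depend on how this file itself is saved.
title = "Ad\u00E9la\u00EFde de Hongrie"

utf8_form   = percent_encode(title)                       # bytes as UTF-8
latin1_form = percent_encode(title.encode("ISO-8859-1"))  # bytes as ISO-8859-1

puts utf8_form    # Ad%C3%A9la%C3%AFde%20de%20Hongrie
puts latin1_form  # Ad%E9la%EFde%20de%20Hongrie
```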
# What Rack produces/Rails sees (in Controller):
Parameters: {"filter"=>{"title"=>["Ad\xE9la\xEFde de Hongrie"]}, "search"=>"", "limit"=>"4"}
# Error I'm getting, when I try to "do stuff" with the above string:
ArgumentError (invalid byte sequence in UTF-8):
# What would actually be a valid UTF-8 string, with its bytes hex-escaped in the same format as above:
"Ad\xC3\xA9la\xC3\xAFde de Hongrie"
# Or, in the "\u..." code point format (see anything interesting here? Something obvious is eluding me...):
"Ad\u{E9}la\u{EF}de de Hongrie"
To be clear, this is not a form submission but an Ajax query. I've tried adding the 'utf8' snowman parameter manually too, but that doesn't seem to do anything... of course, maybe I'm doing that wrong.
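Going back to the byte dumps above, this is a minimal irb-style sketch of how I've been checking them. It assumes the bytes Rails hands me are exactly those shown in the Parameters line, uses only core String methods, and the variable names are mine. Treating the bytes as ISO-8859-1 is just one interpretation I'm testing, not something I've confirmed about the request.

```ruby
# Bytes as they appear in the Parameters line, tagged as UTF-8
# (which is how the controller seems to see them).
received = "Ad\xE9la\xEFde de Hongrie".dup.force_encoding("UTF-8")
puts received.valid_encoding?    # false: \xE9 and \xEF are not complete UTF-8 sequences

# The same bytes reinterpreted as ISO-8859-1, then transcoded to UTF-8.
as_latin1 = received.dup.force_encoding("ISO-8859-1")
converted = as_latin1.encode("UTF-8")
puts converted.valid_encoding?   # true

# Compare against the code point form ("\u..." escapes).
puts converted == "Ad\u00E9la\u00EFde de Hongrie"    # true

# Compare byte-for-byte against the "valid" \xC3\xA9 / \xC3\xAF form.
expected = "Ad\xC3\xA9la\xC3\xAFde de Hongrie".dup.force_encoding("UTF-8")
puts converted.bytes.to_a == expected.bytes.to_a     # true
```

If that interpretation holds, it would also explain why the \u{E9} and \u{EF} code points look identical to the raw \xE9 and \xEF bytes: ISO-8859-1 byte values coincide with the first 256 Unicode code points.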
Any thoughts/questions/pointing out of obvious errors or confused ways of thinking? I'd also appreciate any pointers to Rails documentation which describes in more detail how this stuff happens; I've just been digging through the code and it's slow going for me.
Help much appreciated!
Cheers, Dave