JavaScript encodeURIComponent equivalent in Ruby

JavaScript's encodeURIComponent works differently from CGI.escape or ERB::Util.u.

Well, the difference is that the JavaScript output is UTF-16 and the Ruby output is UTF-8 (although the documentation I can find suggests that the JavaScript should also be producing UTF-8).

For example:

encodeURIComponent('中文') = '%D6%D0%CE%C4'

>> CGI.escape("中文")
=> "%E4%B8%AD%E6%96%87"
>> ERB::Util.u("中文")
=> "%E4%B8%AD%E6%96%87"

Is there any way to get the same encoded result with ruby code?

There are various libraries for working with string encodings, including iconv; pack/unpack have some directives that are relevant to Unicode, and Rails itself also has various Unicode utilities.

Fred

Frederick Cheung wrote:
> Well the difference is that the javascript stuff is produced UTF16 and
> the ruby UTF8 (although the documentation I can find suggests that the
> javascript should also be producing utf8).

Thank you for your reply. Maybe that is true. But how can the UTF-16 encodeURIComponent result be shorter?

Because for CJK characters like these, UTF-16 needs two bytes per character while UTF-8 needs three.
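A quick way to check the byte counts, assuming Ruby 1.9 or later (String#encode did not exist in the Ruby 1.8 that was current when this thread was written):

```ruby
# Compare how many bytes the same two CJK characters take in each encoding.
s = "中文"
puts s.encode("UTF-8").bytesize     # 6: three bytes per character
puts s.encode("UTF-16BE").bytesize  # 4: two bytes per character
```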

> There are various libraries for messing around with string encodings,
> including iconv, and pack/unpack have some specifiers that are
> relevant for unicode stuff, and rails itself also has various unicode
> utilities in it.

I tried converting the string to UTF-16 before passing it to CGI.escape(), but I had no luck producing the same result as encodeURIComponent. (I got "%FE%FFN-e%87" from "中文".)
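That result is UTF-16 with a byte-order mark: FE FF is the BOM, and the remaining bytes 4E 2D 65 87 include three unreserved ASCII characters ('N', '-', 'e') that CGI.escape leaves as-is. A minimal sketch that reproduces it, assuming Ruby 2.4+ (for String#match?); percent_encode_bytes is a hypothetical helper that mimics how CGI.escape treats bytes but ignores its space-to-'+' rule:

```ruby
# Unreserved characters that are left unescaped.
UNRESERVED = /[A-Za-z0-9_.\-~]/

# Hypothetical helper: percent-encode each byte of a string, keeping
# unreserved ASCII bytes as literal characters.
def percent_encode_bytes(str)
  str.bytes.map { |b|
    ch = b.chr
    b < 0x80 && ch.match?(UNRESERVED) ? ch : format("%%%02X", b)
  }.join
end

# U+FEFF is the byte-order mark; encoded as UTF-16BE it becomes FE FF.
puts percent_encode_bytes("\uFEFF中文".encode("UTF-16BE"))  # %FE%FFN-e%87
puts percent_encode_bytes("中文")                           # %E4%B8%AD%E6%96%87
```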

I found Perl and Python implementations of encodeURIComponent online; they are here: "perlとpython用encodeURIComponent()" (encodeURIComponent() for Perl and Python) on rubyu's blog.

Unfortunately, I know neither Perl nor Python. Could anyone translate them into Ruby for me?

Those aren't dealing with encodings, which is apparently the issue here. Why does it matter anyway?

Fred

Frederick Cheung wrote:
> Those aren't playing with encodings which is apparently the issue
> here. Why does it matter anyway?

ok.

Here is the source code of the ERB::Util.url_encode(s) method:

  # File erb.rb, line 801
  def url_encode(s)
    s.to_s.gsub(/[^a-zA-Z0-9_\-.]/n){ sprintf("%%%02X", $&.unpack("C")[0]) }
  end

Currently it works like this:

>> ERB::Util.url_encode("中文")
=> "%E4%B8%AD%E6%96%87"

Can you help me change the url_encode code a bit so that it returns the UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)

Well, s.unpack("U*") will turn a string into an array of integers (Unicode code points), which should then be easy to split into bytes. I'd start from scratch rather than modifying url_encode, though.
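The unpack("U*") idea can be sketched like this. utf16_percent_encode is a hypothetical name, and the sketch assumes every character is in the Basic Multilingual Plane (code point at most 0xFFFF), so no surrogate pairs are needed. It percent-encodes every byte instead of leaving unreserved ASCII alone:

```ruby
# Hypothetical helper: split each code point into two bytes (UTF-16BE)
# and percent-encode both of them.
def utf16_percent_encode(s)
  s.unpack("U*").map { |cp|
    # Assumption: BMP only; larger code points would need surrogate pairs.
    raise ArgumentError, "code point outside BMP: #{cp}" if cp > 0xFFFF
    high = cp >> 8   # most significant byte
    low = cp & 0xFF  # least significant byte
    format("%%%02X%%%02X", high, low)
  }.join
end

puts utf16_percent_encode("中文")  # %4E%2D%65%87
```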

Fred

When I run:

>> "中文".unpack("U*")
=> [20013, 25991]

So there must be a way to turn [20013, 25991] into '%D6%D0%CE%C4', right?

Well, 20013 is 0x4E2D, which is the UTF-16 code unit for the first of your characters. Looking back at what you wrote, I have no idea where D6D0 is coming from: that's a completely different character according to the Unicode character palette I have. I'm not sure what your JavaScript has been doing.

Fred

Frederick Cheung wrote:
> I'd no idea where D6D0 is coming from

OK, problem solved. Thank you, Fred. I might never have done it without your help.

It turns out %D6%D0%CE%C4 is not a UTF-16 result at all; it is the GB2312 encoding of the string.

I converted the string from UTF-8 to GB2312 with iconv, and then url_encode produces exactly the string I need.
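For current Ruby versions: Iconv was dropped from the standard library in Ruby 2.0, so a minimal sketch of the same two steps uses String#encode instead. gb2312_url_encode is a hypothetical name; the thread does not show the poster's actual code.

```ruby
require "erb"

# Hypothetical helper: transcode to GB2312, then percent-encode the bytes.
def gb2312_url_encode(s)
  # String#encode stands in for the Iconv call described in the thread.
  gb = s.encode("GB2312")
  # .b exposes the GB2312 bytes as raw binary for byte-wise encoding.
  ERB::Util.url_encode(gb.b)
end

puts gb2312_url_encode("中文")  # %D6%D0%CE%C4
```

On Ruby 1.8, the conversion step would instead be Iconv.conv("GB2312", "UTF-8", s).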

Thank you again.

Nanyang Zhan wrote:

> I convert the string from utf8 to GB2312 with iconv, then the url_encode products the right string I need.

Could you share the code you used to solve the problem? Thanks a lot.