Ruby equivalent of JavaScript's encodeURIComponent

JavaScript's encodeURIComponent works differently from CGI.escape or
ERB::Util.u.

Well, the difference is that the JavaScript output is UTF-16 and
the Ruby output is UTF-8 (although the documentation I can find suggests
that JavaScript should also be producing UTF-8).

For example:

encodeURIComponent('中文') = '%D6%D0%CE%C4'

>> CGI.escape("中文")
=> "%E4%B8%AD%E6%96%87"
>> ERB::Util.u("中文")
=> "%E4%B8%AD%E6%96%87"

Is there any way to get the same encoded result with ruby code?

There are various libraries for messing around with string encodings,
including iconv, and pack/unpack have some specifiers that are
relevant for unicode stuff, and rails itself also has various unicode
utilities in it.

Fred

Frederick Cheung wrote:
> Well the difference is that the javascript stuff is produced UTF16 and
> the ruby UTF8 (although the documentation I can find suggests that the
> javascript should also be producing utf8).

Thank you for your reply. Maybe that is true. But how can the UTF-16
encodeURIComponent result be the shorter one?

Because for these characters UTF-16 is shorter than UTF-8: each one takes
two bytes in UTF-16 but three bytes in UTF-8.
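
To make the size difference concrete, here is a small sketch that counts the bytes of the same string in both encodings. The helper name byte_lengths is made up for illustration, and it assumes Ruby 1.9 or later, where String#encode is available.

```ruby
# Hypothetical helper (not from the thread): report how many bytes a string
# occupies in UTF-8 and in UTF-16BE (big-endian, no byte order mark).
def byte_lengths(text)
  {
    utf8:  text.encode("UTF-8").bytesize,
    utf16: text.encode("UTF-16BE").bytesize
  }
end

lengths = byte_lengths("\u4E2D\u6587") # "中文"
puts lengths[:utf8]   # 6 (three bytes per character)
puts lengths[:utf16]  # 4 (two bytes per character)
```

Each percent-encoded byte becomes three characters in a URL, so 6 bytes give 18 characters and 4 bytes give 12.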

> There are various libraries for messing around with string encodings,
> including iconv, and pack/unpack have some specifiers that are
> relevant for unicode stuff, and rails itself also has various unicode
> utilities in it.

I tried encoding the string as UTF-16 before passing it to
CGI.escape(), but I had no luck producing the same result as
encodeURIComponent. (I got "%FE%FFN-e%87" from "中文".)

I found a Perl and a Python way to do encodeURIComponent on the net;
they are here: http://d.hatena.ne.jp/ruby-U/20081110/1226313786

Unfortunately I know neither Perl nor Python. Can anyone work out
the equivalent Ruby code for me from them?

Those aren't doing anything with encodings, which is apparently the issue
here. Why does it matter anyway?

Fred

Frederick Cheung wrote:
> Those aren't playing with encodings which is apparently the issue
> here. Why does it matter anyway?

ok.

Here is the source code of the ERB::Util.url_encode(s) method:

# File erb.rb, line 801
def url_encode(s)
  s.to_s.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
end

Now it works like this:

>> ERB::Util.url_encode("中文")
=> "%E4%B8%AD%E6%96%87"

Can you help me change the url_encode code a bit so that it returns the
UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)

Well, s.unpack("U*") will turn a string into an array of integers (Unicode
code points), which should then be easy to split into bytes. I'd
start from scratch rather than using url_encode though.
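
A rough sketch of that idea, for illustration only: the helper name utf16_percent_encode is made up, it assumes big-endian byte order with no byte order mark, and it only handles code points up to 0xFFFF (no surrogate pairs). Unlike url_encode it escapes every character, including plain ASCII.

```ruby
# Hypothetical helper: percent-encode each character as its two UTF-16BE bytes.
# Assumes valid UTF-8 input restricted to the Basic Multilingual Plane.
def utf16_percent_encode(str)
  str.unpack("U*").map do |code_point|
    if code_point > 0xFFFF
      raise ArgumentError, "code point outside the BMP: #{code_point}"
    end
    high = code_point >> 8    # most significant byte
    low  = code_point & 0xFF  # least significant byte
    format("%%%02X%%%02X", high, low)
  end.join
end

puts utf16_percent_encode("\u4E2D\u6587") # "中文" gives %4E%2D%65%87
```

These are the same bytes as the "%FE%FFN-e%87" result quoted earlier in the thread, minus the FE FF byte order mark: N is 0x4E, - is 0x2D and e is 0x65.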

Fred

I get:

>> "中文".unpack("U*")
=> [20013, 25991]

So I need a way to turn [20013, 25991] into '%D6%D0%CE%C4', right?

Well, 20013 is 0x4E2D, which is the UTF-16 for the first of your
characters. Looking back at what you wrote, I've no idea where D6D0 is
coming from - that's a completely different character according to the
Unicode character palette I have. Not sure what your JavaScript has
been doing.

Fred

Frederick Cheung wrote:
> I'd no idea where D6D0 is
> coming from

OK, problem solved. Thank you, Fred. I might never have gotten it done
without your help.

It turns out %D6%D0%CE%C4 is not a UTF-16 result at all, but the product
of GB2312 encoding.

I converted the string from UTF-8 to GB2312 with iconv, and then url_encode
produced the string I need.
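
The exact code used was not posted. As a sketch of the same two steps under some assumptions: this uses String#encode (Ruby 1.9 and later) in place of iconv, the helper name gb2312_url_encode is made up, and it keeps the same unescaped characters as the url_encode source quoted above (letters, digits, "_", "-" and ".").

```ruby
# Bytes that are left unescaped, mirroring the character class in url_encode.
SAFE_BYTES = (("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a +
              ["_", "-", "."]).map(&:ord)

# Hypothetical helper: transcode to GB2312, then percent-encode byte by byte.
# String#encode raises Encoding::UndefinedConversionError for characters
# that have no GB2312 mapping.
def gb2312_url_encode(str)
  str.encode("GB2312").bytes.map do |byte|
    SAFE_BYTES.include?(byte) ? byte.chr : format("%%%02X", byte)
  end.join
end

puts gb2312_url_encode("\u4E2D\u6587") # "中文" gives %D6%D0%CE%C4
```

On an older Ruby with iconv available, the conversion step would use iconv instead, as described above; the escaping step stays the same.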

Thank you again.

Nanyang Zhan wrote:
> Frederick Cheung wrote:
> > I'd no idea where D6D0 is
> > coming from
>
> OK, problem solved. Thank you, Fred. I might never have gotten it done
> without your help.
>
> It turns out %D6%D0%CE%C4 is not a UTF-16 result at all, but the product
> of GB2312 encoding.
>
> I converted the string from UTF-8 to GB2312 with iconv, and then url_encode
> produced the string I need.
>
> Thank you again.

Could you share the code you used to solve the problem?
Thanks a lot.