Multi Byte Strings

Hey guys,

We've been talking about the multi-byte patch and I think it's time to
get feedback from you guys on a possible way forward.

We can include ActiveSupport::Multibyte with Rails 1.2, and update all
of the relevant helpers to use the String#chars proxy. This will mean
that none of the Action View helpers will mangle multibyte strings.
Similarly, if any strings are being mangled in ActiveRecord or
anywhere else, we'll accept patches to fix them.

But any iconv conversions in actionpack, or database encoding changes,
will be left to plugin authors. Perhaps by the time Rails 2.0
comes around these plugins will have gained critical mass and
best practices will have emerged, letting us add them to the core.
Similarly, encodings other than UTF-8 can be provided by plugins.

Comments?

If Rails is going this direction, I think this is the right way to
approach it.

Can't David just get the world to conform to using English?

He's king of the internet, not the world.

Sorry, I can never go back to ASCII again; I need my smiley faces, skulls and wheelchair glyphs!

Manfred

Even English is a lot nicer with proper punctuation, math symbols, 44 different star symbols and let's not forget Skull and Crossbones: ☠

Kind regards,
Thijs


> Even English is a lot nicer with proper punctuation, math symbols, 44
> different star symbols and let's not forget Skull and Crossbones: ☠

Yeah, there's no way I'm building apps without the Skull and Crossbones.

OK, no one seemed to object to our plan, so it's now 'the plan'. I'll
merge ActiveSupport::Multibyte this evening some time. After that
we'll take individual patches for all the cases in the other
components which mangle multibyte strings.

To make it easier to merge, each of those patches should:

* Address one helper / bug
* Contain unit tests
* Have a keyword of multibytebug

Then I'll make an effort to review and merge them quickly. Also, get
the word out that this is the last chance to object or provide
feedback on our multi byte plan.

I assume you would not be opposed to a patch that checks for JRuby and
defers unicode processing to our built-in support, right? It wouldn't
cause any additional overhead for MRI, but could make unicode string
processing quite a bit faster under JRuby.

I'm not sure what such a patch would look like, but I wanted to know
that it wouldn't be rejected offhand if we came up with something.

Might this be better done as a plugin or similar?

-- tim lucas

> I'm not sure what such a patch would look like, but I wanted to know
> that it wouldn't be rejected offhand if we came up with something.

Sure, assuming it's elegant and low/no overhead, I can't see anything
wrong with that.

Assuming it's possible to do in a plugin, that is. Ideally, though, I
don't want JRubyists to be penalized because Ruby can't support
unicode natively, which is what the requirement to install an
additional plugin basically amounts to.

If we can provide the Chars implementation through native means, all
it would require is calling JRuby's built-in code rather than
deferring to the pure Ruby version, and everything else should remain
the same.

All hypothetical at the moment, but we'll hopefully get something
concrete out soon.

The chars accessor is added to String in a core_ext, so you can easily replace the chars method on String for JRuby support.

   class String
     # Return the string itself instead of wrapping it in the Chars proxy
     def chars; self; end
   end

The biggest problem with this is that you will not be able to use the normalization routines. Another solution would be to register a new backend handler with ActiveSupport::Multibyte which goes something like this:

   ActiveSupport::Multibyte::Chars.handler = MyHandler

Registering the JRuby string methods as a handler would impose some method-call overhead, though.
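
To make the handler idea above concrete, here is a minimal, self-contained sketch of a proxy with a swappable backend. It is an illustration under stated assumptions, not the ActiveSupport::Multibyte code: CharsProxy and PureRubyHandler are hypothetical names, and the real handler interface may differ.

```ruby
# Hypothetical, simplified stand-in for a Chars-style proxy.
# This is NOT the actual ActiveSupport::Multibyte implementation.
class CharsProxy
  class << self
    # The backend that does the real string work; swappable at runtime.
    attr_accessor :handler
  end

  def initialize(string)
    @string = string
  end

  # Forward any call the handler understands, passing the wrapped
  # string as the first argument. This forwarding step is the extra
  # method-call overhead mentioned above.
  def method_missing(name, *args, &block)
    handler = self.class.handler
    return super unless handler.respond_to?(name)
    result = handler.send(name, @string, *args, &block)
    # Re-wrap string results so calls can be chained on the proxy.
    result.is_a?(String) ? self.class.new(result) : result
  end

  def respond_to_missing?(name, include_private = false)
    self.class.handler.respond_to?(name) || super
  end

  def to_s
    @string
  end
end

# Hypothetical pure-Ruby backend that works on UTF-8 code points.
module PureRubyHandler
  def self.size(str)
    str.unpack("U*").size
  end

  def self.reverse(str)
    str.unpack("U*").reverse.pack("U*")
  end
end

CharsProxy.handler = PureRubyHandler
```

With this shape, an alternative runtime would only need to assign its own object to CharsProxy.handler; callers keep using the proxy unchanged.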

Manfred

I'd be willing to bet a pure-Java implementation of the handler
methods would more than make up for the overhead. Thanks for the tips!

You will be on the safe side of things if you implement it as a handler for Multibyte; you can do it later, once the codebases are merged.

> If we can provide the Chars implementation through native means, all
> it would require is calling JRuby's built-in code rather than
> deferring to the pure Ruby version, and everything else should remain
> the same.
>
> All hypothetical at the moment, but we'll hopefully get something
> concrete out soon.

We need to make this work as an extension that's part of running JRuby.
We can't have JRuby-specific conditionals in Rails proper. That's just
opening the gates to hell. Let's definitely figure out how to use
Ruby's dynamic nature to make this work, even if that means changing
the code to make it easier to override, a la the handler suggestion.

I agree, and I never meant that it should be "if JRuby, do something".
That would accomplish nothing for other implementations as they begin
to build out their own support for Unicode...including MRI. It would
be more along the lines of checking for an existing String#chars
implementation or an existing Handler (something less unpleasant than
a factory pattern, hopefully) and deferring to that implementation
instead.
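
A rough sketch of the "check for an existing implementation and defer to it" idea, with no runtime-specific conditionals. FallbackChars is a hypothetical stand-in for the pure-Ruby proxy, and the guard shown is only one possible way to do the check.

```ruby
# Hypothetical fallback proxy standing in for the pure-Ruby implementation.
class FallbackChars
  def initialize(string)
    @string = string
  end

  # Count UTF-8 code points rather than bytes.
  def length
    @string.unpack("U*").length
  end
end

class String
  # Install the fallback only when the runtime has not already provided
  # its own String#chars; a native implementation is left untouched.
  unless method_defined?(:chars)
    def chars
      FallbackChars.new(self)
    end
  end
end
```

On a runtime that already ships its own String#chars, the guard leaves the built-in in place, so nothing in the calling code has to know which implementation is active.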

I'm just looking for a way for JRuby's Unicode capabilities to be
fully leveraged by Rails without requiring plugins or hacks. I think
the suggestions on this list can be made to work with what we have in
JRuby. In fact, I'll try to come up with a Chars-compatible interface
today, to see how easily it maps to Java's String.

Let me know if you need any help.

Manfred

> Ok, no one seemed to object to our plan, it's now 'the plan'. I'll
> merge ActiveSupport::Multibyte this evening some time. After that
> we'll take individual patches for all the cases in the other
> components which mangle multibyte strings.
>
> To make it easier to merge each of those patches should:
>
> * Address one helper / bug
> * Contain unit tests
> * have a keyword of multibytebug

This has now been applied:

http://dev.rubyonrails.org/changeset/5223

Please get working on test cases and bug fixes for the parts of Rails
which mangle multibyte strings.

Thanks to Julian Tarkhanov, Manfred Stienstra & Jan Behrens for their work.

Michael Koziarski wrote:

> Ok, no one seemed to object to our plan, it's now 'the plan'. I'll
> merge ActiveSupport::Multibyte this evening some time. After that
> we'll take individual patches for all the cases in the other
> components which mangle multibyte strings.
>
> This has now been applied:
>
> http://dev.rubyonrails.org/changeset/5223

I've created a short movie to show some of the features of the chars
accessor. Might be a nice introduction to the world of multibyte
safeness in Ruby (:

http://www.fngtps.com/2006/10/activesupport-multibyte

Thanks,
Manfred

We find this totally awesome, thx Michael

Hi,

I just encountered my first multibyte problem with Rails <= 1.1.
I guess I have been lucky.
I am just wondering if ActiveSupport::Multibyte fixes this specific case.

render_text _("Rename selected %s to `%s' now.") % [params[:item_type], params['name']]

The two params['xx'] values contain Japanese strings and are displayed as

%u65D7%u9F13

Since I never had this problem before, I doubt the problem is actually
the % operator, although it may be. I think it is important to note that
this is happening through Ajax, which might be causing the problem as well.

Would this work under Rails 1.2 with ActiveSupport::Multibyte,
and is it possible to install ActiveSupport::Multibyte on Rails 1.1.6?

thanks
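
For what it's worth, %u65D7%u9F13 has the shape of the non-standard %uXXXX escapes that JavaScript's escape() function produces. Whether that is what happens in this Ajax request is an assumption, not a diagnosis. If it is, a minimal sketch for turning such sequences back into UTF-8 could look like this (decode_percent_u is a hypothetical helper name):

```ruby
# Decode non-standard %uXXXX sequences into UTF-8 text.
# Assumption: the input uses the escape()-style format seen above.
# Characters sent as two %uXXXX surrogate halves are not recombined here.
def decode_percent_u(str)
  str.gsub(/%u([0-9A-Fa-f]{4})/) { [$1.hex].pack("U") }
end
```

Calling decode_percent_u("%u65D7%u9F13") would give back the two original Japanese characters, U+65D7 and U+9F13.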