Disabling XML character escaping for to_xml

Currently, it appears to_xml will automatically escape any entities into their corresponding &XXX representation. There's a piece in the documentation that says "If $KCODE is set to u and encoding set to UTF8, then escaping will NOT be performed."

Unfortunately, this doesn't appear to be the case. Even after following the docs and ensuring that default_charset is indeed UTF-8 (actually the default for Rails nowadays), we still get encoded characters in to_xml output.

Since our client is UTF-8 aware, we need to pass thru the UTF-8 data intact. The only way we've found to do this is thru the following horrible monkey-patch:

module Builder
  class XmlBase
    def _escape(text)
      text
    end
  end
end

What's the proper way to do this?

Thanks, Nate

I had the same issue, but eventually putting

$KCODE='UTF8'

in my config/environment.rb solved the issue.

Greetings,

Wouter

Just deployed to a production server, but it doesn't work there, although the Rails version is the same. Maybe it's the Ruby version (1.8.7 locally and 1.8.6 on the server).

I have the same issue. $KCODE is 'UTF8' by default, but I set it anyway in environment.rb, and that didn't solve my problem. I applied the patch and it worked. It's not the ideal solution, but it gets the job done :) I've also tried the multibyte chars thing and it didn't work either.

May the source be with you

Any word on if this is fixed in Edge/Rails 2.2?

Cheers, Walter

Actually, the monkey patch solution sort of sucks. It turns off ALL escaping, not just turning off utf to entities escaping.

So this is fine:

<dc:description>māori</dc:description>

but this is not:

<dc:description><p>āēīōū</p> <p>&nbsp;</p></dc:description>

The HTML tags SHOULD be escaped, while the Unicode characters shouldn't be. My workaround will simply be to strip out the embedded HTML, but this is a problem that people should be aware of when using the monkey patch.

Cheers, Walter

Many moons ago I overrode the String#to_xs method that Builder adds so that it escapes just the vitals (i.e. &, <, >, ' and ") instead of all the extra stuff it does.

Fred
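
For anyone who wants to try the approach Fred describes, here is a minimal sketch. It is not his actual code. It assumes a Builder version whose _escape goes through String#to_xs, and that this file is loaded after Builder so the override wins. MinimalXmlEscape is a made-up name for illustration.

```ruby
# Hypothetical helper: escapes only the five XML-significant characters
# and leaves every other character (including non-ASCII UTF-8) untouched.
module MinimalXmlEscape
  VITALS = {
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;'
  }.freeze

  def self.escape(text)
    # Single pass, so the "&" of an emitted entity is never escaped twice.
    text.to_s.gsub(/[&<>"']/) { |char| VITALS[char] }
  end
end

# Assumption: Builder's _escape calls String#to_xs, so reopening String after
# Builder has loaded replaces its entity-encoding version. Extra arguments
# (some Builder versions may pass one) are accepted and ignored.
class String
  def to_xs(*_ignored)
    MinimalXmlEscape.escape(self)
  end
end

puts "<p>āēīōū</p> <p>&nbsp;</p>".to_xs
# prints: &lt;p&gt;āēīōū&lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt;
```

Unlike the no-op _escape patch, this keeps embedded markup escaped while passing the UTF-8 characters through as they are. It only makes sense if the document really is served as UTF-8.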

Yeah, I ended up doing that basically, but in some specific helpers. My coworker refined it though using the htmlentities plugin. You can see it here:

http://github.com/kete/kete/tree/master/lib/oai_dc_helpers.rb#L135

Long term we may do this for all the XML values, not just our dc:description element. So it might move up to a monkey patch of Builder or some more general spot.

Cheers,

Walter