Disabling XML character escaping for to_xml

Currently, it appears to_xml will automatically escape any entities
into their corresponding &XXX representation. There's a piece in the
documentation that says "If $KCODE is set to u and encoding set to
UTF8, then escaping will NOT be performed."

Unfortunately, this doesn't appear to be the case. Even after
following the docs and ensuring that default_charset is indeed UTF-8
(actually the default for Rails nowadays), we still get encoded
characters in to_xml output.

Since our client is UTF-8 aware, we need to pass the UTF-8 data
through intact. The only way we've found to do this is through the
following horrible monkey-patch:

module Builder
  class XmlBase
    # Return the text unchanged, bypassing Builder's escaping entirely.
    def _escape(text)
      text
    end
  end
end

What's the proper way to do this?

Thanks,
Nate

I had the same issue, but eventually putting

$KCODE='UTF8'

in my config/environment.rb solved the issue.

Greetings,

Wouter

Just deployed to a production server, but it doesn't work there,
although the rails version is the same. Maybe it's the ruby version
(1.8.7 locally and 1.8.6 on the server)

I have the same issue.
$KCODE='UTF8' is set by default, but I set it anyway in environment.rb.
This didn't solve my problem. I applied the patch and it worked.
It's not the ideal solution, but it gets the job done :)
I've tried the multibyte chars thing and it didn't work either.

May the source be with you

Any word on if this is fixed in Edge/Rails 2.2?

Cheers,
Walter

Actually, the monkey patch solution sort of sucks. It turns off ALL
escaping, not just the UTF-8-to-entities escaping.

So this is fine:

<dc:description>māori</dc:description>

but this is not:

<dc:description><p>āēīōū</p>
<p>&nbsp;</p></dc:description>

The html tags SHOULD be escaped, while the unicode characters
shouldn't be. My workaround will simply be to strip out the embedded
HTML, but this is a problem that people should be aware of when using
the monkey patch.

Cheers,
Walter

Many moons ago I overrode the String#to_xs method that Builder adds so
that it escapes just the vitals (i.e. & < > ' ") instead of doing all
the extra stuff it does.
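
Something along these lines is the idea. This is only a rough sketch,
not the original override: escape_xml_vitals and XML_VITALS are
made-up names, and it is written as a standalone method rather than a
patch to String#to_xs, so you would still have to wire it into Builder
yourself.

```ruby
# Hypothetical helper (not part of Builder): escape only the five
# XML-significant characters and pass everything else, including
# multibyte UTF-8 characters, through untouched.
XML_VITALS = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  "'" => '&apos;',
  '"' => '&quot;'
}.freeze

def escape_xml_vitals(text)
  # gsub with a block looks up each matched character in the table.
  text.to_s.gsub(/[&<>'"]/) { |char| XML_VITALS[char] }
end

puts escape_xml_vitals('<p>āēīōū</p>')
# prints: &lt;p&gt;āēīōū&lt;/p&gt;
```

Note that an already-encoded entity such as &nbsp; comes out as
&amp;nbsp;, since the ampersand itself gets escaped.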

Fred

Yeah, I ended up doing basically that, but in some specific helpers. My coworker refined it, though, using the htmlentities plugin. You can see it here:

http://github.com/kete/kete/tree/master/lib/oai_dc_helpers.rb#L135

Long term we may do this for all the XML values, not just our dc:description element. So it might move up to monkey patching Builder or some more general spot.

Cheers,

Walter