REXML UTF-16 trouble

I've run into a problem parsing XML with REXML. It appears to be related to UTF-16 encoding and a bug in REXML.

I'm still running OS X 10.4.11 with Ruby 1.8.6 (via Locomotive). I even tried upgrading to the latest version of REXML, 3.1.7.3, with no luck. It still fails, only now with the following error:

Iconv::InvalidCharacter: ">"

Is this old news to everyone? If so, is there a known solution for this?

I can't be the only person who needs to parse UTF-16 XML in Rails.

Thanks in advance.

Random thought - the XML file claims to be UTF-16, but is it really?

Good question. How would I know?

If I stash the result straight into a variable and do a "puts" I get the following…

<?xml version="1.0" encoding="utf-16"?> <CallStatus><Code>InvalidPassword</Code> <Success>false</Success> <Message>Invalid password</Message> </CallStatus>

Again, REXML chokes on this, but if I change utf-16 to utf-8, no problem.

I have an interim kludge for now. I'm chopping the XML declaration off the front of the response (the piece I'd been calling the BOM) and then passing the rest to REXML as XML. It's working like a charm now. It's not pretty, but it'll do until I figure out whether this is part of a bigger problem.

# Since this part of the string is always going to be the same, slice the first 39 characters
# off of it.
bom_string_to_remove = login_result.slice(0, 39)
login_results = login_result.gsub(bom_string_to_remove, '')

So this… <?xml version="1.0" encoding="utf-16"?> <CallStatus><Code>InvalidPassword</Code> <Success>false</Success> <Message>Invalid password</Message> </CallStatus>

Becomes this… <CallStatus><Code>InvalidPassword</Code> <Success>false</Success> <Message>Invalid password</Message> </CallStatus>

And now I treat it as XML.
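A slightly less brittle take on the same workaround might match the declaration with a regular expression instead of relying on a fixed 39-character slice. This is only a sketch: `strip_xml_declaration` is a hypothetical helper name, not something REXML provides, and it assumes the declaration sits at the very start of the string.

```ruby
# Hypothetical helper (not part of REXML): drop a leading XML declaration,
# whatever its length, plus any whitespace that follows it.
def strip_xml_declaration(xml)
  xml.sub(/\A\s*<\?xml\b[^>]*\?>\s*/, '')
end

login_result = '<?xml version="1.0" encoding="utf-16"?> <CallStatus>' \
               '<Code>InvalidPassword</Code> <Success>false</Success> ' \
               '<Message>Invalid password</Message> </CallStatus>'

puts strip_xml_declaration(login_result)
```

The stripped string can then be handed to `REXML::Document.new` as before. A string with no declaration passes through unchanged.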

Good question. How would I know?

I'd ogle the bytes via unpack or something like that.
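One way to look at the bytes with `unpack` is sketched below. The helper names `sniff_xml_encoding` and `hex_head` are made up for illustration, the check only covers the common UTF-8 and UTF-16 cases (UTF-32 is ignored), and `byteslice` assumes a newer Ruby than 1.8.6 (on 1.8, `data[0, 4]` already returns bytes).

```ruby
# Hypothetical helper: guess how a document is encoded from its first bytes.
def sniff_xml_encoding(data)
  head = data.byteslice(0, 4).unpack('C*') # leading bytes as integers

  if head[0, 3] == [0xEF, 0xBB, 0xBF]
    'UTF-8 with BOM'
  elsif head[0, 2] == [0xFF, 0xFE]
    'UTF-16LE with BOM'
  elsif head[0, 2] == [0xFE, 0xFF]
    'UTF-16BE with BOM'
  elsif head == [0x3C, 0x00, 0x3F, 0x00] # "<?" as UTF-16LE
    'UTF-16LE without BOM'
  elsif head == [0x00, 0x3C, 0x00, 0x3F] # "<?" as UTF-16BE
    'UTF-16BE without BOM'
  elsif head[0, 2] == [0x3C, 0x3F]       # "<?" as single bytes
    'ASCII-compatible, no BOM'
  else
    'unknown'
  end
end

# Hypothetical helper: hex dump of the first few bytes, for eyeballing.
def hex_head(data, count = 8)
  data.byteslice(0, count).unpack('C*').map { |b| format('%02x', b) }.join(' ')
end

sample = '<?xml version="1.0" encoding="utf-16"?>'
puts hex_head(sample)
puts sniff_xml_encoding(sample)
```

If the real response comes back as "ASCII-compatible, no BOM" while its declaration says utf-16, the label and the bytes disagree, which would be consistent with the parser complaining.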

Fred

Fred,

Thanks, I will give that a try -- eventually, I need this to work without the workaround.

+bm

Was this ever solved? How do I load a UTF-16 XML file into REXML?

Thanks! Naren
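For what it's worth, one possible approach on Ruby 1.9 or later (not the 1.8.6 discussed above, which would need Iconv instead) is to transcode the bytes to UTF-8 and relabel the declaration before parsing. The sketch below is an assumption-laden illustration, not a confirmed fix from this thread: `utf16_xml_to_utf8` is a hypothetical helper, and it only handles input that really starts with a UTF-16 byte order mark.

```ruby
# Hypothetical helper: convert BOM-prefixed UTF-16 XML to UTF-8 text and
# rewrite the encoding named in the declaration to match.
def utf16_xml_to_utf8(raw)
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  bom = bytes.byteslice(0, 2).unpack('C*')

  encoding =
    case bom
    when [0xFF, 0xFE] then 'UTF-16LE'
    when [0xFE, 0xFF] then 'UTF-16BE'
    else raise ArgumentError, 'no UTF-16 byte order mark found'
    end

  # Drop the two BOM bytes, then transcode the rest to UTF-8.
  body = bytes.byteslice(2, bytes.bytesize - 2)
  text = body.force_encoding(encoding).encode('UTF-8')

  # Relabel the declaration so it agrees with the new encoding.
  text.sub(/\A(<\?xml[^>]*encoding=["'])[^"']+/i) { "#{$1}UTF-8" }
end

# Build a small UTF-16LE sample (with BOM) to exercise the helper.
source = '<?xml version="1.0" encoding="utf-16"?><a>hi</a>'
raw = [0xFF, 0xFE].pack('C*') + source.encode('UTF-16LE').force_encoding(Encoding::BINARY)

puts utf16_xml_to_utf8(raw)
```

The returned string is ordinary UTF-8 and could then be passed to `REXML::Document.new`. Input without a BOM raises `ArgumentError` rather than guessing the byte order.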