Nokogiri not returning attribute value verbatim

Can't seem to find an answer to this on Google:

If I have this value as the text within an attribute in my XML source: "a2/PP00nFwWa7I8Jog7bcw==\n"

When I ask Nokogiri to return it, why does it return this: "a2/PP00nFwWa7I8Jog7bcw== " (I confirmed in the debugger that the last character is a space)? So it seems Nokogiri is converting the "\n" to a space.

Is there a way to tell Nokogiri to return the value verbatim? I am dealing with encrypted data, and this modification to the XML source is significant.

I originally thought this might be a Ruby 1.9.2 issue, but I confirmed that the behavior is the same in 1.8.7. The difference is that REXML returned this string as expected, and I am now converting to Nokogiri.

Thanks,

David

David Kahn wrote in post #958607:

Can't seem to find an answer to this on Google:

If I have this value as the text within an attribute in my XML source: "a2/PP00nFwWa7I8Jog7bcw==\n"

When I ask Nokogiri to return it, why does it return this: "a2/PP00nFwWa7I8Jog7bcw== " (I confirmed in the debugger that the last character is a space)? So it seems Nokogiri is converting the "\n" to a space.

Is there a way to tell Nokogiri to return the value verbatim? I am dealing with encrypted data, and this modification to the XML source is significant.

You probably need to use the xml:space attribute in your source document, or at least that's the impression I get from http://msdn.microsoft.com/en-us/library/ms256097.aspx ("White Space", Microsoft Learn).

Best,


Thanks Marnen, that was a really good idea. I just tried it in the console and it does not seem to help with the "\n" (results below). Is it possible there is some other setting that would preserve the "\n"? This is really strange to me, as these characters are within a string literal... but the spaces actually surprise me too.

    # without xml:space="preserve"
    ruby > doc_enc = "<BORROWER _SSN=\"a2/PP00nFwWa7I8Jog7bcw==\n\"></BORROWER>"
    => "<BORROWER _SSN=\"a2/PP00nFwWa7I8Jog7bcw==\n\"></BORROWER>"
    ruby > Nokogiri::XML(doc_enc)
    => #<Nokogiri::XML::Document:0x12ff91c name="document" children=[#<Nokogiri::XML::Element:0x12ff71e name="BORROWER" attributes=[#<Nokogiri::XML::Attr:0x12ff6d8 name="_SSN" value="a2/PP00nFwWa7I8Jog7bcw== ">]>]>
    ruby > nd = Nokogiri::XML(doc_enc)
    => #<Nokogiri::XML::Document:0x12fded2 name="document" children=[#<Nokogiri::XML::Element:0x12fdcac name="BORROWER" attributes=[#<Nokogiri::XML::Attr:0x12fdc3e name="_SSN" value="a2/PP00nFwWa7I8Jog7bcw== ">]>]>
    ruby > nd.xpath("BORROWER").attribute("_SSN").value
    => "a2/PP00nFwWa7I8Jog7bcw== "

    # with xml:space="preserve"
    ruby > doc_enc = "<BORROWER xml:space=\"preserve\" _SSN=\"a2/PP00nFwWa7I8Jog7bcw==\
    => "<BORROWER xml:space=\"preserve\" _SSN=\"a2/PP00nFwWa7I8Jog7bcw==\n\"></BORROWER>"
    ruby > Nokogiri::XML(doc_enc)
    => #<Nokogiri::XML::Document:0x12f7244 name="document" children=[#<Nokogiri::XML::Element:0x12f708c name="BORROWER" attributes=[#<Nokogiri::XML::Attr:0x12f705a name="space" namespace=#<Nokogiri::XML::Namespace:0x12f6f56 prefix="xml" href="http://www.w3.org/XML/1998/namespace"> value="preserve">, #<Nokogiri::XML::Attr:0x12f7050 name="_SSN" value="a2/PP00nFwWa7I8Jog7bcw== ">]>]>
    ruby > nd.xpath("BORROWER").attribute("_SSN").value
    => "a2/PP00nFwWa7I8Jog7bcw== "


What seems even more insane is that if I wrap the encrypted string in other characters (pipes in this case), it still takes away my "\n":

    => "<BORROWER xml:space=\"preserve\" _SSN=\"|a2/PP00nFwWa7I8Jog7bcw==\n|\"></BORROWER>"
    ruby-1.9.2-p0 > Nokogiri::XML(doc_enc)
    => #<Nokogiri::XML::Document:0x12eb1d8 name="document" children=[#<Nokogiri::XML::Element:0x12eafd0 name="BORROWER" attributes=[#<Nokogiri::XML::Attr:0x12eaf94 name="space" namespace=#<Nokogiri::XML::Namespace:0x12eaea4 prefix="xml" href="http://www.w3.org/XML/1998/namespace"> value="preserve">, #<Nokogiri::XML::Attr:0x12eaf8a name="_SSN" value="|a2/PP00nFwWa7I8Jog7bcw== |">]>]>


Sorry for all the additional posts, but it also happens in CDATA! Can the characters "\n" never mean anything but a newline in our world?

    => "<BORROWER _SSN=\"[CDATA[a2/PP00nFwWa7I8Jog7bcw==\n]]\"></BORROWER>"
    ruby-1.9.2-p0 > Nokogiri::XML(doc_enc)
    => #<Nokogiri::XML::Document:0x12e6f16 name="document" children=[#<Nokogiri::XML::Element:0x12e6d04 name="BORROWER" attributes=[#<Nokogiri::XML::Attr:0x12e6cd2 name="_SSN" value="[CDATA[a2/PP00nFwWa7I8Jog7bcw== ]]">]>]>
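The transcripts above are consistent with attribute-value normalization as described in the XML 1.0 specification (section 3.3.3): a conforming parser replaces each literal tab, newline, or carriage return inside an attribute value with a space, while a character written as a numeric reference such as `&#10;` is kept. xml:space only concerns whitespace in element content, so it would not be expected to change this. The following is a simplified pure-Ruby sketch of that rule, not the actual code of Nokogiri or libxml2. The name `normalize_attribute_value` is hypothetical, and the sketch ignores named entities and tokenized attribute types.

```ruby
# Simplified sketch of XML 1.0 attribute-value normalization (section 3.3.3)
# for CDATA-type attributes. Hypothetical helper, not part of Nokogiri.
def normalize_attribute_value(raw)
  # Step 1: every literal tab, newline, or carriage return becomes a space
  # (a CRLF pair counts as one line break, so it yields one space).
  spaced = raw.gsub(/\r\n|[\t\n\r]/, ' ')

  # Step 2: numeric character references are resolved afterwards, so the
  # characters they stand for are not turned into spaces.
  spaced.gsub(/&#(?:x([0-9A-Fa-f]+)|([0-9]+));/) do
    hex = Regexp.last_match(1)
    dec = Regexp.last_match(2)
    (hex ? hex.to_i(16) : dec.to_i).chr(Encoding::UTF_8)
  end
end

p normalize_attribute_value("a2/PP00nFwWa7I8Jog7bcw==\n")    # real newline in the input
p normalize_attribute_value('a2/PP00nFwWa7I8Jog7bcw==\n')    # backslash followed by "n"
p normalize_attribute_value('a2/PP00nFwWa7I8Jog7bcw==&#10;') # character reference
```

Only the first call loses its newline: the real newline becomes a space, the backslash-n pair passes through untouched, and the character reference resolves to a newline that stays in the value.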


Take a deep breath. I believe you are quoting your strings incorrectly. Witness the following script, which behaves as expected on Nokogiri 1.4.3.1 and libxml 2.7.6:

    require 'rubygems'
    require 'nokogiri'

    xml = '<root><foo _SSN="a2/PP00nFwWa7I8Jog7bcw==\n">bar</foo></root>'

    puts Nokogiri::XML.parse(xml).to_xml
    # => <?xml version="1.0"?>
    #    <root>
    #      <foo _SSN="a2/PP00nFwWa7I8Jog7bcw==\n">bar</foo>
    #    </root>

Next time you may want to try the nokogiri-talk mailing list for a quicker response from users of the library.
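To make the quoting point concrete, here is a small sketch in plain Ruby (no Nokogiri needed) comparing the same literal in double and single quotes. The constant names are only for illustration.

```ruby
# Double quotes: Ruby interprets \n as one newline character (byte 10).
DOUBLE_QUOTED = "a2/PP00nFwWa7I8Jog7bcw==\n"

# Single quotes: \n stays as two characters, a backslash (92) and an "n" (110).
SINGLE_QUOTED = 'a2/PP00nFwWa7I8Jog7bcw==\n'

p DOUBLE_QUOTED.bytes.last(1) # => [10]
p SINGLE_QUOTED.bytes.last(2) # => [92, 110]
p DOUBLE_QUOTED.length        # => 25
p SINGLE_QUOTED.length        # => 26
```

So the two scripts in this thread hand the parser different documents: the double-quoted one contains a real line break inside the attribute, while the single-quoted one contains the two visible characters `\` and `n`.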


Thanks Mike, this does work when I try it in the console, and you are right: it has to do with quoting. What seems clear is that the entire XML has to be within single quotes; if it is within double quotes, the \n gets replaced. What I am not clear about is how to tell Ruby, when I load the file (I am getting the XML out of a saved file), to put it in single rather than double quotes. Or is there a way to transform it after loading it? When I read the file in, I get:

    file = "<BORROWER _SSN=\"a2/PP00nFwWa7I8Jog7bcw==\n\"></BORROWER>"
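A note on the file case: single versus double quotes only matter for literals typed into Ruby source or irb. `File.read` returns the stored bytes without interpreting any escapes, so a `\n` in the inspected output normally means the file holds a real newline. The sketch below shows one way to check what is on disk, plus one possible workaround: writing the newline as the character reference `&#10;`, which XML parsers are supposed to keep in attribute values. Both helper names are hypothetical, and the workaround assumes you control the code that builds the attribute value before it is written.

```ruby
require 'tempfile'

# Hypothetical helper, not part of Nokogiri or Ruby: write literal newlines
# as numeric character references before placing a value in an attribute.
def protect_attribute_newlines(value)
  value.gsub("\n", '&#10;')
end

# Hypothetical helper: report how a "\n" is actually stored in a file.
def describe_trailing_newline(path)
  contents = File.read(path) # bytes as stored; no escape processing happens
  if contents.include?('\n') # single quotes: backslash followed by "n"
    :backslash_n
  elsif contents.include?("\n") # double quotes: a real newline character
    :real_newline
  else
    :none
  end
end

Tempfile.create('borrower') do |f|
  # %(...) behaves like a double-quoted string, so \n is a real newline here.
  f.write(%(<BORROWER _SSN="a2/PP00nFwWa7I8Jog7bcw==\n"></BORROWER>))
  f.flush
  p describe_trailing_newline(f.path) # => :real_newline
end

p protect_attribute_newlines("a2/PP00nFwWa7I8Jog7bcw==\n")
# => "a2/PP00nFwWa7I8Jog7bcw==&#10;"
```

Apply the substitution to the attribute value itself rather than to the whole document, since a blanket replacement would also rewrite line breaks inside tags.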