SOLVED: Guess posting this got me more curious and I figured it out:
If I ask for
my_file.xpath("//EMBEDDED_FILE/DOCUMENT").text
Nokogiri automatically takes the content of the CDATA section inside the DOCUMENT node and returns it to me without the CDATA markup. Nice. So it was just a case of making things harder for myself.
Just to add some context, what you experienced applies to pretty much every language that offers access to XML, and the reason is the W3C XML specification itself. A CDATA section is only a way of writing character data without escaping it; the <![CDATA[ and ]]> delimiters are markup, not part of the content. Depending on the parser and its options, the CDATA section may or may not be kept as a separate node in the tree, but the text property returns the text content of an element, and if the element happens to contain a CDATA section, the text inside it is returned along with any other text content of the element.