Read embedded image from xml file

Hi,

I have an XML file which contains an embedded image (it does not come
from a Rails application). I can't get a clear answer on this, so
basically I need to know: is there a way I can read the image from the
XML file and then store it on the file system somewhere? I'm using
Paperclip as my means of managing files.

On another note, can you embed an image in an XML file from Rails?

I thought the easiest solution would be to provide a link to the
physical image and get it that way.

Any advice appreciated,

JB

How is the image encoded in the XML? Can you give an example?

Is it a CDATA section with Base64 or something else? If it's a known
encoding, you just read it as an attribute / tag value and then decode
it using an appropriate decoder, then open the file in binary mode and
save the decoded version there. If it's Paperclip that accepts the
XML, you can write a processor that extracts the image and replaces
the original XML file with the extracted data.

In order to put the image back into XML, again you use the encoder
(Base64, for example) and put the resulting value inside the tag.

Again, if you give a sample XML, I may be able to give more precise
suggestions.

Cheers,
- A

OK thanks, I did read about using Base64 to decode it, but my main
issue, as you say, is understanding what format it was encoded in
in the first place. Please find a sample attached.

thanks

JB

Attachments:
http://www.ruby-forum.com/attachment/4995/test.xml

I'm not sure that that is an image file per se. If you strip the junk
out from the decoded file you get this:

... Packager Shell Object ... Package ... 2010-8-23 _9999_bum
street_Stewart McNicholl_IMAGE_005.jpg ... C:\Users\Stewart
McNicholl\Desktop\2010-8-23 _9999_bum street_Stewart McNicholl_IMAGE
_005.jpg ... C:\Users\STEWAR~1\AppData\Local\Temp\2010-8-23 _9999_bum
street_Stewart McNicholl_IMAGE_005 (2).jpg ...

It looks like some sort of archive format perhaps.

A JPEG encoded in Base64 would probably start "/9j/4AAQ....", or
something like that.

Here's what you basically do:

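  require 'rexml/document'
  require 'base64'

  filename = 'file.dat'
  xml = '... contents of test.xml ...'

  # Parse the XML and pull out the Base64 text, dropping all whitespace
  doc = REXML::Document.new(xml)
  str = doc.root.elements['Sheet1'].elements['PhotoTaken1'].text.gsub(/\s+/, '')

  # Write the decoded bytes in binary mode so nothing gets translated
  File.open(filename, 'wb') { |f| f.write Base64.decode64(str) }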

In the case of a Paperclip::Processor, you will need to read the contents of the uploaded file and write the decoded contents into a temp file. Here's a good link to see how processors are built: http://mdeering.com/posts/018-paperclip-processors-doing-so-much-more-with-your-attachment
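
As a rough sketch of the work such a processor would do: the helper below is plain Ruby, not Paperclip code. The name decode_upload and the .dat suffix are made up for illustration, and the Sheet1 / PhotoTaken1 element names are taken from the snippet above. Inside a real processor, the same steps would presumably go into its make method, which returns the resulting file.

```ruby
require 'rexml/document'
require 'base64'
require 'tempfile'

# Hypothetical helper: read an XML upload from any IO-like object, decode
# the Base64 payload and return a Tempfile holding the raw bytes.
def decode_upload(xml_io, basename = 'decoded')
  doc = REXML::Document.new(xml_io.read)
  sheet = doc.root.elements['Sheet1']
  node = sheet && sheet.elements['PhotoTaken1']
  raise ArgumentError, 'no Sheet1/PhotoTaken1 element found' if node.nil?

  tmp = Tempfile.new([basename, '.dat'])
  tmp.binmode                                   # keep the bytes untouched
  tmp.write(Base64.decode64(node.text.to_s))
  tmp.rewind                                    # caller can read from the start
  tmp
end
```

Returning a Tempfile rather than a path keeps the decoded data alive for as long as the caller holds on to the object.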

Also, I just checked the contents of the encoded data and it doesn't appear to be an image, so... just be warned. Check the decoded file in a hex editor. You'll see what I mean.

To put it back into XML, just read the contents of the saved Paperclip file, encode it (Base64.encode64) and put it back into the XML tag (see creating documents with REXML).
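
A minimal sketch of that reverse direction, assuming the same Sheet1 / PhotoTaken1 layout as the snippet above. The root element name and the image_to_xml helper are placeholders invented for the example, since the real root name is not shown here.

```ruby
require 'rexml/document'
require 'base64'

# Hypothetical helper: wrap raw image bytes in a small XML document as
# Base64 text. 'root' is a placeholder element name.
def image_to_xml(bytes)
  doc = REXML::Document.new
  root = doc.add_element('root')
  photo = root.add_element('Sheet1').add_element('PhotoTaken1')
  photo.text = Base64.encode64(bytes)   # encode64 inserts line breaks
  doc.to_s
end
```

With Paperclip you would pass in the bytes of the stored file, for example File.binread on the attachment's path. Base64.encode64 breaks the output into lines, which is why the decoding side strips whitespace first.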

- A


Thanks Peter. I think this is coming from an Access database export of
some kind, so that would probably explain the unknown format.

Am I right in saying it's better practice to provide a link to the
image file in the XML if possible, rather than embedding the actual
image? It seems a bit of a recipe for disaster parsing binary data in
an XML file and decoding it to an image.

JB

OK thanks guys, I now know the best way to tackle this.

JB

Well, if you put the image in the XML as Base64 then you don't have to
worry about losing files. Just copy the XML file and you have
everything you need. What would be helpful, of course, would be for the
XML to contain the meta information about the image, such as:

<image encoding="base64" mime-type="image/jpeg"
filename="McNicholl_IMAGE_005.jpg">...</image>

Then you would have no problem decoding the file.
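
To illustrate, here is a sketch of decoding such a self-describing element. It assumes the image element is the document root and uses exactly the attribute names from the example above; extract_image is a made-up helper name.

```ruby
require 'rexml/document'
require 'base64'

# Hypothetical helper: decode an <image> element that carries its own
# metadata. Returns [filename, mime_type, bytes].
def extract_image(xml)
  image = REXML::Document.new(xml).root
  encoding = image.attributes['encoding']
  unless encoding == 'base64'
    raise ArgumentError, "unsupported encoding: #{encoding.inspect}"
  end

  bytes = Base64.decode64(image.text.to_s)
  # Keep only the base name so a stray path in the attribute is ignored.
  name = File.basename(image.attributes['filename'].to_s)
  [name, image.attributes['mime-type'], bytes]
end
```

The caller can then write the bytes out in binary mode under the returned file name, with no guessing about the format.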

However, the XML file is going to get big very quickly and could become
unwieldy and difficult to edit.

As a side note, I have checked some files on my disk: GIF images
start R0lGODlh... in Base64, JPEGs start /9j/4AAQSk... and PNGs start
iVBORw0KG...

So if the Base64 data was an actual image format and you did not have
the mime-type available, then inspecting the first few characters of
the data should give you a good chance of getting the format correct.
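
A small sketch of that check. Rather than comparing Base64 prefixes directly, it decodes the first few characters and compares them with the well-known magic bytes of each format, which amounts to the same test. The SIGNATURES table and sniff_mime_type are names made up for this example, and only these three formats are covered.

```ruby
require 'base64'

# Leading "magic" bytes of a few common image formats.
SIGNATURES = {
  'image/jpeg' => "\xFF\xD8\xFF".b,
  'image/png'  => "\x89PNG\r\n\x1A\n".b,
  'image/gif'  => 'GIF8'.b
}.freeze

# Returns a mime-type string, or nil when nothing matches.
def sniff_mime_type(base64_text)
  # 16 Base64 characters decode to 12 bytes, enough for these signatures.
  head = Base64.decode64(base64_text.gsub(/\s+/, '')[0, 16])
  match = SIGNATURES.find { |_mime, magic| head.start_with?(magic) }
  match && match.first
end
```

A nil result is the cue to look at the decoded data in a hex editor, as happened with the sample in this thread.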