Any gems to process PowerPoint files?

Does anyone know of any gems or plugins that can take a PowerPoint and
create images out of every slide and also access the text in each
slide?

Not sure if this will help, but Google Docs just started providing 3rd
party publishing/conversion from MS Office to Google Docs. Perhaps
this can get you part of the way: convert the file, then use the gdata gem.

http://code.google.com/apis/gdata/articles/gdata_on_rails.html

b

Hi,

I highly recommend Prezi (http://prezi.com/)
rather than any MS product. MS products are neither free nor open source,
so even if some company builds something that can use them, it still means
that you or someone else will pay for it in the end. Besides, PowerPoint is
really out of date.
cheers
Zoltán

My client has PowerPoint files and we need to process them into images
and text.

I was hoping there was something out there that could do this
directly.

If anyone else has suggestions, please post.

Thanks

Not on the server, but PowerPoint itself can output a Web site from each file using File / Export. Something for a temp at your client's office to do all day long.

Walter

Unfortunately that's not going to work either.

I really need to process these either in my Rails app or through a 3rd
party service.

Anyone know of any SaaS or web services that would do this?

Thanks

Andy wrote in post #965257:

Unfortunately that's not going to work either.

I really need to process these either in my Rails app or through a 3rd
party service.

Anyone know of any SaaS or web services that would do this?

Thanks

How about the Google Docs API?

Best,

Let's say the uploaded PPT belongs to a Document model, and its page
images belong to a DocumentPage model.

So you need to write a Paperclip processor, which you use in the Document
model. Inside this processor you need to:
1. Create tmp folders where you will perform all operations.
2. Convert PPT to PDF using http://www.artofsolving.com/opensource/pyodconverter
3. Convert PDF to TIFF images using ImageMagick.
4. Process the TIFF images with
Tesseract (http://code.google.com/p/tesseract-ocr/) to extract
keywords.
5. Convert TIFF to PNG.
6. Create DocumentPage models, passing the PNG images and extracted
keywords as parameters.
7. If all DocumentPage models are created, just return from the processor to
let the Document model be created.

Here is the Processor
https://gist.github.com/723079
It's kinda messy and kinda belongs to my application, but you get the idea.
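
For anyone who wants the shape of steps 1 to 5 without reading the gist, here is a rough sketch in plain Ruby. It is not the code from the gist. The module name SlidePipeline, the method names, the page-%03d file naming, and the 300 dpi density are all made up for illustration. It also assumes that PyODConverter's DocumentConverter.py is reachable at the given path with an OpenOffice instance already listening for it, and that ImageMagick's convert and the tesseract binary are on the PATH.

```ruby
require "tmpdir"

# Hypothetical helper module; names and defaults are illustrative only.
module SlidePipeline
  module_function

  # Step 2: PPT -> PDF via PyODConverter's DocumentConverter.py.
  # The script location is an assumption; adjust to your install.
  def ppt_to_pdf_cmd(ppt, pdf, script: "DocumentConverter.py")
    ["python", script, ppt, pdf]
  end

  # Step 3: PDF -> one TIFF per page. The %03d in the output name makes
  # ImageMagick write page-000.tif, page-001.tif, ...
  def pdf_to_tiff_cmd(pdf, dir, density: 300)
    ["convert", "-density", density.to_s, pdf, File.join(dir, "page-%03d.tif")]
  end

  # Step 4: OCR. tesseract appends ".txt" to the output base name itself.
  def ocr_cmd(tiff)
    ["tesseract", tiff, tiff.sub(/\.tif\z/, "")]
  end

  # Step 5: TIFF -> PNG.
  def tiff_to_png_cmd(tiff)
    ["convert", tiff, tiff.sub(/\.tif\z/, ".png")]
  end

  # Run a command without a shell and fail loudly if it does not succeed.
  def run!(cmd)
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end

  # Steps 1-5 for one file. Returns one hash per slide; the tmp folder is
  # removed when the block exits, so the data is read before returning.
  def process(ppt)
    Dir.mktmpdir("slides") do |dir|
      pdf = File.join(dir, "slides.pdf")
      run!(ppt_to_pdf_cmd(ppt, pdf))
      run!(pdf_to_tiff_cmd(pdf, dir))
      Dir[File.join(dir, "page-*.tif")].sort.map do |tiff|
        run!(ocr_cmd(tiff))
        run!(tiff_to_png_cmd(tiff))
        base = tiff.sub(/\.tif\z/, "")
        { png: File.binread("#{base}.png"), text: File.read("#{base}.txt") }
      end
    end
  end
end
```

Inside a Paperclip processor you would call something like SlidePipeline.process(path_to_ppt) and build one DocumentPage per returned hash (steps 6 and 7). Keeping the command builders separate from run! makes it easy to check the command lines without having the external tools installed.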