Cleanly handling sub-generated files with Paperclip

Hi,

Let's say I upload a PDF file. ImageMagick extracts all of its pages and stores them as PNG images on the hard drive. How can I easily handle all these generated files with Paperclip?

Has anyone done that before? Thanks for your advice.

Fernando Perez wrote:

Hi,

Let's say I upload a PDF file. ImageMagick extracts all of its pages and stores them as PNG images on the hard drive. How can I easily handle all these generated files with Paperclip?

Has anyone done that before? Thanks for your advice.

I've done precisely this just recently. It isn't as tricky as it seems, really. All you need is a few extra steps in your PDF processor that take the extracted images and attach each one to a new record. So, if you have the following relationship:

class Document < ActiveRecord::Base
  has_many :images
  # :extract is resolved by Paperclip to a custom Paperclip::Extract processor class
  has_attached_file :file, :styles => { :original => {} }, :processors => [:extract]
end

class Image < ActiveRecord::Base
  belongs_to :document
  has_attached_file :image
end

In your processor, perform the extraction into a temporary folder, and once it's done, do something like the following:

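if @attachment.respond_to?(:instance) && @attachment.instance.respond_to?(:images)
  # Drop images left over from a previous extraction of this document.
  @attachment.instance.images.destroy_all

  # @temporary is the folder the extraction step wrote its page images to.
  # Note: images.create requires the parent Document to already be saved.
  Dir.glob("#{@temporary}/*.{jpg,png}").each do |path|
    File.open(path) { |file| @attachment.instance.images.create(:image => file) }
  end
else
  raise Paperclip::PaperclipError, "Unable to save extracted pages. No valid attachment."
end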

Afterwards, make sure to remove the temporary folder, and you should be good.
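One detail worth handling is page order. Dir.glob makes no promise of numeric order, and ImageMagick's usual per-page naming (for example convert -density 150 input.pdf page-%d.png) produces page-10.png, which sorts before page-2.png as a string, so the Image records could be created out of order. The following is a minimal standalone sketch, assuming each file name contains its page number; extracted_pages and with_extracted_pages are hypothetical helpers, and the lambda stands in for the real ImageMagick or pdf2image call. It also uses Dir.mktmpdir with a block, which deletes the temporary folder automatically. Inside a Paperclip processor, the block body would be the @attachment.instance.images.create(:image => file) call.

```ruby
require "tmpdir"

# Hypothetical helper: page images in +dir+, ordered by the page number in
# each file name, so "page-2.png" comes before "page-10.png".
def extracted_pages(dir)
  Dir.glob(File.join(dir, "*.{jpg,png}")).sort_by do |path|
    number = File.basename(path)[/\d+/]
    # Files without a number go last, ordered by path.
    [number ? number.to_i : Float::INFINITY, path]
  end
end

# Hypothetical wrapper: runs +extract+ against a fresh temporary folder,
# yields each page image as an open File, then deletes the folder.
def with_extracted_pages(extract)
  Dir.mktmpdir("pdf-pages") do |dir|
    extract.call(dir)
    extracted_pages(dir).each do |path|
      File.open(path) { |file| yield file }
    end
  end # Dir.mktmpdir removes the folder here, even if an exception was raised
end

# Stand-in for the real extraction step, which might instead run something like
#   system("convert", "-density", "150", pdf_path, File.join(dir, "page-%d.png"))
fake_extract = lambda do |dir|
  [0, 1, 2, 10].each { |i| File.write(File.join(dir, "page-#{i}.png"), "page #{i}") }
end

# prints page-0.png, page-1.png, page-2.png, page-10.png (one per line)
with_extracted_pages(fake_extract) { |file| puts File.basename(file.path) }
```

Sorting by the extracted number rather than relying on zero-padded file names keeps the helper working whichever extraction tool wrote the pages.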

Parker Selbert wrote:

Interesting approach. Did you run into any particular problems in practice? Too many files for the filesystem? The database blowing up? Something else?

Fernando Perez wrote:

Interesting approach. Did you run into any particular problems in practice? Too many files for the filesystem? The database blowing up? Something else?

It has worked really well in practice. The weak point was always ImageMagick, really. We ended up using pdf2image instead, which produced much better output, much faster. We've processed documents of 120+ pages, and the number of files was never a problem. Given how long it takes to process the images (assuming you are resizing or thumbnailing them), you'll certainly want to do the processing in a background job, though: Delayed Job, Resque, or the like.
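To illustrate why a background processor helps, here is a minimal, dependency-free stand-in built on Ruby's Thread and Queue; it is not the Delayed Job or Resque API. The request side only enqueues a document id and returns immediately, while a worker thread does the slow extraction. With Resque, the equivalent would be a job class that sets @queue and defines self.perform, enqueued with Resque.enqueue. The process_document method below is a hypothetical placeholder for the extraction and thumbnailing work.

```ruby
# Hypothetical placeholder for the slow part: extracting and thumbnailing pages.
def process_document(id)
  "document #{id} processed"
end

jobs = Queue.new # thread-safe FIFO shared by the "web" side and the worker
results = []

worker = Thread.new do
  # Queue#pop blocks until a job arrives; :stop tells the worker to finish.
  while (id = jobs.pop) != :stop
    results << process_document(id)
  end
end

# The "request" side: enqueueing returns immediately, no processing happens here.
[1, 2, 3].each { |id| jobs << id }

jobs << :stop
worker.join # wait for the worker to drain the queue
puts results
```

A real job runner adds what this sketch lacks: persistence across restarts, retries, and separate worker processes.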