Cleanly handling sub-generated files with Paperclip

Hi,

Let's say I upload a PDF file. ImageMagick extracts all pages out of it
and stores the PNG images on the hard drive. How can I easily handle all
these generated files with Paperclip?

Has anyone done that before? Thanks for your advice.

Fernando Perez wrote:

Hi,

Let's say I upload a PDF file. ImageMagick extracts all pages out of it
and stores the PNG images on the hard drive. How can I easily handle all
these generated files with Paperclip?

Has anyone done that before? Thanks for your advice.

I've done precisely this just recently. It isn't as tricky as it seems,
really. All you need are a few steps in your pdf processor that will
take the extracted images and add them to a new record. So, if you have
the following relationship:

class Document < ActiveRecord::Base
  has_many :images
  has_attached_file :file, :styles => { :original => {} }, :processors => [:extract]
end

class Image < ActiveRecord::Base
  belongs_to :document
  has_attached_file :image
end

In your processor, perform your extraction to a temporary folder, and
after it is done do something like the following:

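# Only proceed if the attachment's record exposes an images association.
if @attachment.respond_to?(:instance) and @attachment.instance.respond_to?(:images)
  # Drop pages left over from any earlier upload before attaching new ones.
  @attachment.instance.images.destroy_all

  # Create one Image record per extracted page.
  Dir.glob("#{@temporary}/*.{jpg,png}").each do |path|
    File.open(path) { |file| @attachment.instance.images.create(:image => file) }
  end
else
  raise PaperclipError, "Unable to save extracted pages. No valid attachment."
end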
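
One detail worth checking in that loop: depending on the Ruby version,
Dir.glob returns paths in filesystem order or in plain alphabetical
order, and either way page-10.png can land before page-2.png. If the
Image records should be created in page order, something along these
lines may help. It is only a sketch: the page-N naming scheme and the
helper name pages_in_order are my assumptions, not anything Paperclip
provides.

```ruby
# Hypothetical helper: returns the extracted page images sorted by the
# trailing number in the file name, so "page-2.png" precedes "page-10.png".
# Files without a trailing number sort as 0.
def pages_in_order(dir)
  Dir.glob(File.join(dir, "*.{jpg,png}")).sort_by do |path|
    File.basename(path, ".*")[/\d+\z/].to_i
  end
end
```

In the processor, the Dir.glob(...).each call would then become
pages_in_order(@temporary).each.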

Afterwards make sure to remove the temporary folder and you should be
good.
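
For the cleanup itself, Ruby's standard library can do the removal for
you: Dir.mktmpdir with a block deletes the directory when the block
exits, even if the extraction raises. A rough sketch, where
with_extraction_dir is a made-up name and the block body is only a
stand-in for the real extraction and images.create calls:

```ruby
require "tmpdir"

# Hypothetical wrapper: yields a fresh temporary directory and relies on
# Dir.mktmpdir to remove it (and its contents) once the block returns or raises.
def with_extraction_dir(prefix = "pdf-pages")
  Dir.mktmpdir(prefix) do |dir|
    yield dir
  end
end

with_extraction_dir do |dir|
  # Stand-in for the real work: extract pages into dir, then attach them.
  File.write(File.join(dir, "page-0.png"), "")
end
```

This way there is no separate removal step to forget when an exception
interrupts the processor.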

Parker Selbert wrote:

Interesting approach. Any particular problems you ran into in practice?
Too many files for the fs? Database blowing up? Other?

Fernando Perez wrote:

Interesting approach. Any particular problems you ran into in practice?
Too many files for the fs? Database blowing up? Other?

It has worked really well in practice. The failing point was always
ImageMagick, really. We ended up using pdf2image instead, which yielded
much better output, much faster. We've processed 120+ page documents, so
the number of files wasn't a problem. Given the time it takes to process
the images (assuming you are resizing / thumbnailing), you'll certainly
want to do the processing in a background job though: Delayed Job,
Resque or the like.