I built a system in Rails 2.3.8 that accepted PDF uploads and needed to extract their text content using the venerable (read: ancient) pdftotext command-line utility. I had to jump through the following hoops to make it work, and they might have some bearing on your solution:
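# model
has_attached_file :pdf,
                  :styles => { :text => { :fake => 'variable' } },
                  :processors => [:text]
after_post_process :extract_text

private

# Reads the text file produced by the :text processor and stores an
# ASCII-only copy of it on the record.
def extract_text
  plain_text = ""
  # Block form so the file handle is closed once we are done reading
  File.open(pdf.queued_for_write[:text].path, "r") do |file|
    while (line = file.gets)
      # Drop any characters that cannot be represented in ASCII
      plain_text << Iconv.conv('ASCII//IGNORE', 'UTF8', line)
    end
  end
  self.plain_text = plain_text
end
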
# lib/paperclip_processors/text.rb
module Paperclip
  # Handles extracting plain text from PDF file attachments
  class Text < Processor
    attr_accessor :whiny

    # Creates a Text extract from PDF
    def make
      src = @file
      dst = Tempfile.new([@basename, 'txt'].compact.join("."))
      command = <<-end_command
        "#{ File.expand_path(src.path) }"
        "#{ File.expand_path(dst.path) }"
      end_command

      begin
        success = Paperclip.run("/usr/bin/pdftotext -nopgbrk", command.gsub(/\s+/, " "))
        Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
      rescue PaperclipCommandLineError
        raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
      end

      dst
    end
  end
end

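One caveat about the processor above: it builds the command as a single string and collapses whitespace with gsub, so a path containing a run of spaces would be altered before pdftotext ever sees it. Outside Paperclip, the same call could be sketched with an argument list instead. This is only an illustration, not what my app did: the helper names pdftotext_argv and extract_pdf_text are made up, and it assumes pdftotext is on the PATH.

```ruby
# Hypothetical helper (not part of Paperclip): builds the argument list
# for pdftotext. Each path is its own array element, so no quoting or
# whitespace collapsing is needed.
def pdftotext_argv(src_path, dst_path, binary = "pdftotext")
  [binary, "-nopgbrk", File.expand_path(src_path), File.expand_path(dst_path)]
end

# Hypothetical helper: runs pdftotext without going through a shell.
# Kernel#system returns true on success, false on a non-zero exit,
# and nil if the command could not be executed at all.
def extract_pdf_text(src_path, dst_path)
  system(*pdftotext_argv(src_path, dst_path))
end

# Only inspect the arguments here; nothing is executed.
p pdftotext_argv("/tmp/my  upload.pdf", "/tmp/out.txt")
```

Because system is given several arguments rather than one string, Ruby does not hand the command to a shell, so spaces in file names survive untouched.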
Within the environs of Paperclip, you can write processors that do pretty much anything. They usually produce a new file, saved in a new format, within the attachment's hierarchy. Once that process is done, you can access the resulting file and do other stuff with it. But I'm not sure if that answers your question at all, since you don't seem to be facing the same problem I was.
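As a rough illustration of "doing other stuff" with a processor's output, here is a standalone sketch of the read-and-clean step from my extract_text callback. It deliberately avoids Iconv, which was removed from the standard library in Ruby 2.0, and uses String#encode instead. The method name ascii_text_from is made up for this example, and the sketch assumes the text file is UTF-8.

```ruby
require "tempfile"

# Hypothetical helper: reads a UTF-8 text file line by line and returns
# its content with every non-ASCII (or invalid) character dropped.
def ascii_text_from(path)
  File.foreach(path, encoding: "UTF-8").map do |line|
    line.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")
  end.join
end

# Usage: write a small UTF-8 file, then read it back as ASCII-only text.
Tempfile.create("extracted") do |f|
  f.close
  File.binwrite(f.path, "caf\u00e9\nplain\n")
  puts ascii_text_from(f.path)
end
```

In the Paperclip callback, the path would come from pdf.queued_for_write[:text].path rather than from a temp file created by hand.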
If your form posts a file to Paperclip, you don't get access to the file parts of that form submission directly in your controller, unless I'm missing something fundamental. But a processor can access them directly, at a very low level.
Walter