Cut first x pages from a PDF file

Hi all,

I'm finishing up on a project, which has a model with an attached PDF file. I'm using Paperclip to process the attachment.

What I'd like to do is to enable users to cut (remove) the first x pages from the PDF-file, but I have no idea how to do that. Is there a gem that can do this? And how would this integrate with Paperclip?

Thank you very much.

Kind regards, Jaap Haagmans

Take a look at Paperclip Processors

Paperclip can use custom processors to do non-standard things with attachments. You specify your processor name like this:

has_attached_file :file,
  :styles => {:original => {:processors => [:pdf_processor]}}

and you need to have a class in lib/paperclip_processors/pdf_processor.rb, like:

module Paperclip
  class PdfProcessor < Paperclip::Processor
  end
end

There are some examples in the documentation. The point is that your processor receives the uploaded @file, and at the end its make method has to return the changed file (@new_file here). In between, you can do anything with it.

In your case, for example, you can create a tmp_dir, run something like pdftk on @file.path to write tmp_dir/new_file_name.pdf, and return @new_file = File.open("tmp_dir/new_file_name.pdf") so that Paperclip stores it in the path you specified in the model. Then you just delete tmp_dir with an after_create callback or a cron job.
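To make that a bit more concrete, here is a rough sketch of what such a processor might look like. It is only an illustration, not tested against a real app: the :cut_pages style option is a name I made up, I'm assuming extra keys in the style hash reach the processor as @options, and it uses a Tempfile instead of a tmp_dir. The small stand-in for Paperclip::Processor at the top is there only so the file loads outside a Rails app; drop it in a real project.

```ruby
require "tempfile"

module Paperclip
  # Stand-in so this sketch loads without the paperclip gem.
  # It is skipped when the real Paperclip::Processor is already defined.
  unless const_defined?(:Processor, false)
    class Processor
      def initialize(file, options = {}, attachment = nil)
        @file = file
        @options = options
        @attachment = attachment
      end
    end
  end

  class PdfProcessor < Processor
    # Returns a new file without the first N pages of @file.
    def make
      # :cut_pages is a hypothetical option, e.g. set in the style hash.
      pages = @options[:cut_pages].to_i
      return @file if pages <= 0

      dst = Tempfile.new(["cut", ".pdf"])
      dst.binmode

      # "dont_ask" lets pdftk overwrite the (existing, empty) Tempfile
      # without prompting.
      ok = run_pdftk(File.expand_path(@file.path),
                     "cat", "#{pages + 1}-end",
                     "output", File.expand_path(dst.path),
                     "dont_ask")
      raise "pdftk failed for #{@file.path}" unless ok
      dst
    end

    private

    # Kept separate so the external call is easy to replace or stub.
    def run_pdftk(*args)
      system("pdftk", *args)
    end
  end
end
```

With this variant the model's style hash would carry something like :cut_pages => 2 next to :processors. Since Ruby removes a Tempfile once the object is finalized, a separate cleanup step may not be needed, but I haven't checked how that interacts with Paperclip's own file handling.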

Hi Vladimir,

Thanks. The bit about paperclip processors is very helpful!

I've looked at pdftk though and couldn't find how to cut pages from PDF files. Am I overlooking something? The idea is that we will receive many different kinds of PDF-files including (for example) author information that the author doesn't want to share. If this information is on the first two pages, we'd like the author to be able to say "cut the first two pages out and save it".

Thanks again.

I don't think that you can explicitly do that with pdftk, but you can do 'burst' to break the pdf out into lots of single page documents, and then 'cat' to combine the pages that you want in the final document.

Simon

Well, you can use page ranges with pdftk, like this:

$ pdftk in.pdf cat 3-end output out.pdf

This concatenates the pages from the third through the last. There are good examples of pdftk usage here:

http://www.accesspdf.com/pdftk/
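Since the question was about cutting the first x pages, the range just has to start at page x + 1. A minimal Ruby sketch of building that command follows; the helper name pdftk_cut_command is my own invention, and it only builds the argument list without running pdftk.

```ruby
# Hypothetical helper: build the pdftk argument list that drops the first
# `pages_to_cut` pages of `input` and writes the remainder to `output`.
def pdftk_cut_command(input, output, pages_to_cut)
  count = Integer(pages_to_cut) # raises for non-numeric input
  raise ArgumentError, "pages_to_cut must be >= 0" if count < 0

  # Cutting N pages means keeping pages N+1 through the end.
  ["pdftk", input, "cat", "#{count + 1}-end", "output", output]
end

cmd = pdftk_cut_command("in.pdf", "out.pdf", 2)
puts cmd.join(" ") # prints: pdftk in.pdf cat 3-end output out.pdf

# Where pdftk is installed, the list can be passed straight to system:
#   system(*cmd)
```

Passing the arguments as a list rather than one string means file names with spaces don't need shell quoting. The helper does not check that the PDF actually has more than x pages, so you would still have to handle pdftk failing in that case.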