HTML and CSS to PDF

I would like to take a rails page and convert it to a pdf. I don't want
to have to generate the code myself for making the pdf, so it should
obey css. What is the best tool for doing this? Does the tool use the
standard css, or can I provide it alternative print-css?

Thanks in advance,
Jonathan Steel

Jonathan Steel wrote:

I would like to take a rails page and convert it to a pdf. I don't want
to have to generate the code myself for making the pdf, so it should
obey css. What is the best tool for doing this?

Prince is supposed to be great, but it's expensive. Free alternatives
include wkhtmltopdf, prawn_format, acts_as_flying_saucer...

Does the tool use the
standard css, or can I provide it alternative print-css?

Print CSS is standard.


Best,

Marnen Laibow-Koser wrote:

Jonathan Steel wrote:

I would like to take a rails page and convert it to a pdf. I don't want
to have to generate the code myself for making the pdf, so it should
obey css. What is the best tool for doing this?

Prince is supposed to be great, but it's expensive. Free alternatives
include wkhtmltopdf, prawn_format, acts_as_flying_saucer...

A lot of people seem to be raving about Prince, which I find odd coming
from a community like rails. It is indeed expensive. I tried it on a
complex site, and it came crashing to its knees. I have looked into
prawn and am so far impressed. I'm just scared that it won't be able to
do a complex pdf that looks like what you would see on a web site.

As for the others, I had planned on looking into wkhtmltopdf, and have
not heard of the other two. Thanks for your input.

Does the tool use the
standard css, or can I provide it alternative print-css?

Print CSS is standard.

I know it's standard for when you are printing. But are there tools that
will pick up the print css and use that?
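
For what it's worth, wkhtmltopdf has a --print-media-type switch that makes it apply print CSS instead of screen CSS. Below is a minimal Ruby sketch of shelling out to it. The helper names (wkhtmltopdf_command, html_to_pdf) and the file names are made up for illustration, and it assumes the wkhtmltopdf binary is installed and on the PATH.

```ruby
# Hypothetical helper: builds the argument list for a wkhtmltopdf call.
# --print-media-type asks wkhtmltopdf to use print CSS rather than screen CSS.
def wkhtmltopdf_command(input, output, print_css: true)
  cmd = ["wkhtmltopdf"]
  cmd << "--print-media-type" if print_css
  cmd + [input, output]
end

# Runs the conversion. Returns true on success, false or nil otherwise.
# Assumption: wkhtmltopdf is installed and reachable through PATH.
def html_to_pdf(input, output)
  system(*wkhtmltopdf_command(input, output))
end
```

Calling html_to_pdf("page.html", "page.pdf") on a saved copy of the page would then be enough to try it out.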

Prince is supposed to be great, but it's expensive. Free alternatives
include wkhtmltopdf, prawn_format, acts_as_flying_saucer...

I'm trying out acts_as_flying_saucer, but I can't get it working. It
just seems to ignore my render_pdf action and generates a normal html
page. Do you have a sample project of it working? There is very limited
documentation on this project, but I am interested in it.

Did you see this? Google translates it pretty well.
http://www.jrubyonrails.de/2010/03/pdf-und-html-mit-acts-as-flying-saucer.html

Colin

Hi Jonathan,

I've used prince on a few projects running in production in the past
couple of years and have nothing but good things to say about it. Not
sure what problems you experienced when you tried prince out, but I
have yet to see any problems with it.

If you can't afford the license for prince or want to stick with foss,
then I'd also highly recommend wkhtmltopdf. The main reason, other
than cost, to go with prince over wkhtmltopdf is that prince has
greater print-related css coverage than wkhtmltopdf (or more
importantly, the underlying pieces that make up wkhtmltopdf,
specifically webkit) does at this time. The main reason why I chose
prince over wkhtmltopdf for those specific projects was that the
client required certain must-haves in terms of resulting pdf output
from the underlying html/css (specifically related to css regarding
preventing pdf page-breaks inside of defined html elements) that could
be handled by prince but not by wkhtmltopdf.
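
The page-break control mentioned above is the CSS page-break-inside property. As a rough sketch, a print-only rule can be injected into the HTML before it is handed to a converter. The class name keep-together and the helper with_print_rules are invented for this example, and whether the rule is honoured depends on the converter and its version.

```ruby
# Print-only CSS asking the renderer not to split these elements across pages.
# ".keep-together" is a hypothetical class name chosen for this example.
PRINT_RULES = <<~CSS
  @media print {
    .keep-together { page-break-inside: avoid; }
    h2 { page-break-after: avoid; }
  }
CSS

# Inserts the rules in a <style> element just before </head>.
# Returns the HTML unchanged if there is no </head> to anchor on.
def with_print_rules(html, css = PRINT_RULES)
  html.sub("</head>") { "<style>\n#{css}</style>\n</head>" }
end
```

Running the same input through each candidate tool is a quick way to see which ones respect the rule.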

Check the latest wkhtmltopdf (or webkit) in terms of css print-related
coverage as these differences are likely narrowing. Or better yet,
test wkhtmltopdf out to see if it meets the needs of your project
regardless.

As for the cost of prince, my clients had no problem paying for the
server license, especially given the cost savings they've realized
over time compared to if they would have had to pay me or some other
developer to dev the custom pdf generation code using one of the
low-level pdf-gen'ing libs (like prawn). It's just a lot easier/cheaper
to have ui devs make mods to html/css, especially in a web app context
where the app provides an on-screen preview of what the pdf will
(just about) look like before the pdf is actually gen'd. I doubt I'll
ever go back to using low-level pdf-gen'ing libs again.

Jeff

Thanks for the great info Jeff.

You raise the same points that I have raised in our team. It would be
easier to convert html into a pdf instead of us spending the time to
develop code using Prawn for custom views.

Prince worked on a simple page, but when I tried it on a more
complicated one I got a ton of errors like the following:

prince: /Users/jonathan/tmp/97.html:552: error: Opening and ending tag
mismatch: div line 480 and html

followed by just as many:

prince: /Users/jonathan/tmp/97.html:552: error: Premature end of data in
tag div line 372

and ending with:

prince: /Users/jonathan/tmp/97.html: error: could not load input file
prince: error: no input documents to process
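
With this many messages it can help to group them by line number before hunting through the page. The following is only a sketch: it assumes each message arrives on a single line of stderr in the shape shown above (the wrapping here comes from the mail client), and the names PRINCE_ERROR, PrinceError and parse_prince_errors are made up for the example.

```ruby
# Matches lines such as:
#   prince: /path/file.html:552: error: Premature end of data in tag div line 372
# The file and the line number are both optional, since some errors are global.
PRINCE_ERROR = /\Aprince: (?:(?<file>.+?)(?::(?<line>\d+))?: )?error: (?<message>.*)\z/

PrinceError = Struct.new(:file, :line, :message)

# Turns captured stderr text into a list of PrinceError structs,
# silently skipping any line that does not look like an error line.
def parse_prince_errors(stderr_text)
  stderr_text.each_line.map(&:chomp).filter_map do |l|
    m = PRINCE_ERROR.match(l)
    PrinceError.new(m[:file], m[:line]&.to_i, m[:message]) if m
  end
end
```

Something like parse_prince_errors(text).group_by(&:line) then shows which source lines account for most of the noise.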

I tried out wkhtmltopdf and really like it. My only concern at this
point is that I can't get page breaking to work, and I found some recent
posts that would suggest it can't do page breaking.

I'm looking at acts_as_flying_saucer now, but can't get it to work for
complicated examples. I will probably be going with either prawn,
wkhtmltopdf, or acts_as_flying_saucer.

Did you check first that the html is valid by viewing the source
(view, page source or similar in browser) and copying the complete
text and pasting into w3c html validator (google will find it)?

Colin

Colin Law wrote:

Sounds like the underlying html/css in your "complicated" test might
not be valid, such that prince is saying that it isn't able to
generate the pdf because it can't parse/process that html/css? You
might want to run that html/css thru a validator first, like
http://validator.w3.org/ , to first fix any invalid html/css and then
try it again.
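
As a quick local check before going to the validator, the saved page can be run through an XML parser. This is a minimal sketch using REXML, which ships with Ruby as a bundled gem. The helper name xml_wellformedness_error is made up for the example, and it only tests well-formedness (matching tags and so on), which is a much weaker check than what the W3C validator does.

```ruby
require "rexml/document"

# Returns nil when the markup parses as well-formed XML,
# otherwise the parser's error message (which includes position details).
def xml_wellformedness_error(markup)
  REXML::Document.new(markup)
  nil
rescue REXML::ParseException => e
  e.message
end
```

For a saved page, xml_wellformedness_error(File.read("97.html")) would return either nil or a description of the first problem the parser hit.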

As for flyingsaucer, I looked into using that a while back but just
didn't like all of the dependencies required to get it working at the
time, especially for a ruby/rails project. But, maybe if you already
have a jvm installed, or are already running jruby, or ....

Whatever you end up using to gen your pdfs with, another tool you
might find useful is pdftk -- http://www.accesspdf.com/pdftk/ -- for
any pre-/post-processing of your pdfs, like splitting pdfs into pages,
stitching pdf pages together, adding watermarks, etc.
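
A rough sketch of driving pdftk from Ruby for those three jobs is below. The helper names and file names are hypothetical, and it assumes pdftk is installed and on the PATH. The cat, burst and background operations are the ones described in the pdftk documentation.

```ruby
# Hypothetical helpers that build pdftk argument lists.

# Stitch several PDFs together: pdftk a.pdf b.pdf cat output combined.pdf
def pdftk_join(inputs, output)
  ["pdftk", *inputs, "cat", "output", output]
end

# Split into one file per page: pdftk in.pdf burst output page_%02d.pdf
def pdftk_split(input, pattern = "page_%02d.pdf")
  ["pdftk", input, "burst", "output", pattern]
end

# Put a watermark PDF behind every page:
#   pdftk in.pdf background watermark.pdf output out.pdf
def pdftk_watermark(input, watermark, output)
  ["pdftk", input, "background", watermark, "output", output]
end
```

Each helper returns an array, so system(*pdftk_join(["a.pdf", "b.pdf"], "combined.pdf")) runs it without going through a shell.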

Jeff

Not really a valid test of prince then is it? Kind of like testing a
toaster that wasn't plugged in: "hey, this toaster is junk, ... it
didn't even heat up the bread". --Jeff

Jeff Burlysystems wrote:

Sounds like the underlying html/css in your "complicated" test might
not be valid, such that prince is saying that it isn't able to
generate the pdf because it can't parse/process that html/css? You
might want to run that html/css thru a validator first, like
http://validator.w3.org/ , to first fix any invalid html/css and then
try it again.

I think I will try this just because it's a good exercise anyway.

As for flyingsaucer, I looked into using that a while back but just
didn't like all of the dependencies required to get it working at the
time, especially for a ruby/rails project. But, maybe if you already
have a jvm installed, or are already running jruby, or ....

I played with it some more and it turns out that as soon as I install
the acts_as_flying_saucer plugin, the layout of all my pages gets
totally messed up. Uninstall the plugin, pages go back to normal. So
this one is definitely out of the picture. I might look at the
underlying Java Library and compile a simple binary that we can use to
convert saved pages. More work than doing it in rails, but it would be
just like prince or wkhtmltopdf.

Whatever you end up using to gen your pdfs with, another tool you
might find useful is pdftk -- http://www.accesspdf.com/pdftk/ -- for
any pre-/post-processing of your pdfs, like splitting pdfs into pages,
stitching pdf pages together, adding watermarks, etc.

Thanks. This tool does look interesting.

You can't expect anything like this to work reliably with invalid
html. You could install the html validator plugin for firefox, then
it will check the html on the fly for you.

Colin

Jonathan Steel wrote:
Acts as flying saucer demo:
http://github.com/amardaxini/acts_as_flying_saucer_demo
Just make sure your HTML is valid.

For any questions about acts_as_flying_saucer, you can email me at
amardaxini@gmail.com
Amar Daxini (http://railstech.com)

By default, acts_as_flying_saucer sets the stylesheet media attribute
to print; that's why your layout gets messed up.

I have just updated the plugin and added some features as well.

As some others have pointed out, you simply need to provide Prince with valid XHTML. It even provides you with a nice syntax error to trace down where YOUR code is wrong. You wouldn’t expect Ruby to let you get away with “ruts ‘Hello World’” instead of “puts ‘Hello World’”. We’ve used Prince in a project that created the most complex documents you can ever imagine, using almost everything Prince has to offer, including plenty of SVG images with dynamic data in them. We’re speaking of hundreds of pages too. It works without a hitch and support from the developer is simply amazing. Yes, it’s expensive, but it’s worth every cent if you plan on generating plenty of PDF documents with complex layouts.

Best regards

Peter De Berdt

prince: /Users/jonathan/tmp/97.html: error: could not load input

it. Being so expensive, Prince pretty much had to work without a hitch
if I was going to spend any time trying to make it work.

So what I really meant by this was that with several other options
available, things had to work almost immediately if I was going to spend
any time considering them. Although it is definitely our site that is
causing the problem with Prince, finding the problem in the site could
be just as much work as creating the pdf from scratch using something
like prawn. If I did fix the site, then I still had no guarantee that it
would work with Prince when I was done, so I might as well just make the
PDF from scratch.

Invalid pages in the browser are just as evil, aren’t they ;) I sometimes wish browsers were a lot less forgiving on that part, it would avoid a lot of seemingly unrelated issues, especially when you start using Javascript DOM manipulations.

When we tried Prawn, the size of the PDF it produced was about twice that of a similar Prince document. I’m assuming the output is far from optimized. That said, it’s a very valid option in some simple PDF cases and certainly less pricey :)

Best regards

Peter De Berdt

Invalid pages in the browser are just as evil, aren't they ;) I
sometimes wish browsers were a lot less forgiving on that part, it
would avoid a lot of seemingly unrelated issues, especially when you
start using Javascript DOM manipulations.

Yah I knew somebody would bring that up as soon as I made my last
comment. I think the problem is probably due to our DOM manipulation.
Our site renders properly in every browser we have tried, so it really
took me by surprise when these html parsers started choking on the pages.

It may render ok at the moment, but can you sleep soundly knowing you
have invalid html and that the next release of firefox may interpret
the invalid html in a different way?

Colin