Successor to `wkhtmltopdf` (& `wicked_pdf`)

I found out today that GitHub - wkhtmltopdf/wkhtmltopdf: Convert HTML to PDF using Webkit (QtWebKit) has been archived. More worryingly, no new packages have been built for quite a while (see wkhtmltopdf; for example, the current stable Debian release is unsupported). That means GitHub - zakird/wkhtmltopdf_binary_gem: Ruby gem containing easily installable access to wkhtmltopdf application is not going to be reliable in the future.

We use wkhtmltopdf because we use wicked_pdf. I found the issue Long term plans given the deprecation of wkhtmltopdf? · Issue #1081 · mileszs/wicked_pdf · GitHub, but thought it would be good to get more eyes on this and see whether anyone has suggestions for a successor.

What tools are folks looking at for PDF generation these days?


The best way I have found is to use Chromium’s print-to-PDF. If you have to run it on user-supplied input, sandbox it in a container.
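A minimal Ruby sketch of driving Chromium’s print-to-PDF directly, assuming a `chromium` binary on the PATH (the name varies by distro: `chromium-browser`, `google-chrome`). The helper names are hypothetical; `--headless`, `--disable-gpu` and `--print-to-pdf=` are Chromium’s own command-line switches. Passing an absolute `output_path` avoids surprises about Chromium’s working directory.

```ruby
require "open3"

# Hypothetical helper: argv for Chromium's headless print-to-PDF mode.
# The binary name differs between distros ("chromium", "chromium-browser", "google-chrome").
def chromium_pdf_argv(url, output_path, binary: "chromium")
  [binary, "--headless", "--disable-gpu", "--print-to-pdf=#{output_path}", url]
end

# Runs Chromium once and returns the PDF bytes; raises if Chromium exits non-zero.
def print_to_pdf(url, output_path, binary: "chromium")
  _stdout, stderr, status = Open3.capture3(*chromium_pdf_argv(url, output_path, binary: binary))
  raise "chromium failed (exit #{status.exitstatus}): #{stderr}" unless status.success?

  File.binread(output_path)
end
```

For user-supplied URLs or HTML, this is the process to run inside the sandboxed container rather than on the app host.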

In our shop, I ended up building a small express.js server that runs puppeteer on request. You give it a URL; it opens the page, renders it to PDF, and streams the PDF back as the response. Since we have a separate front-end team, this lets them control the PDF design using familiar tools (HTML/CSS).
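On the Rails side, calling a service like that needs only Ruby’s standard library. This is a minimal sketch, assuming a hypothetical `POST /render` endpoint that takes a JSON body such as `{"url": "..."}` and answers with the PDF bytes; the endpoint path, payload shape, method name and timeout values are all assumptions, not the poster’s actual API.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical client for a puppeteer render service: POSTs the page URL as JSON
# and returns the PDF bytes from the response body.
def fetch_pdf(service_url, page_url, read_timeout: 30)
  uri = URI(service_url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                  open_timeout: 5, read_timeout: read_timeout) do |http|
    request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
    request.body = JSON.generate({ url: page_url })
    response = http.request(request)
    raise "render service returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    response.body # the whole PDF, buffered in memory
  end
end
```

The explicit `read_timeout` matters here: rendering a heavy page can take several seconds, and a stuck browser should surface as an error in the job rather than a hung worker.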


We switched to GitHub - prawnpdf/prawn: Fast, Nimble PDF Writer for Ruby. We had to re-create our templates/views in Prawn’s DSL since Prawn doesn’t convert HTML like wicked_pdf does, but it wasn’t too difficult or time-consuming. We’ve been happy with Prawn since we made the switch.


I use https://ruby.libhunt.com to find alternatives to Ruby gems. There are a number of alternatives for generating PDFs: Wicked Pdf Alternatives - Ruby PDF | LibHunt.

There are also various binaries you can install on deployment servers and invoke with system() calls from the Rails app or from background jobs.
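For binaries that take an input path and an output path (wkhtmltopdf itself is invoked as `wkhtmltopdf [options] <input> <output>`), a sketch like this passes the arguments as an array, so no shell is involved and user input can’t be interpreted as shell syntax. The method name is hypothetical, and the `<input> <output>` argument order is an assumption to check against your binary’s usage.

```ruby
require "tempfile"

# Hypothetical helper for converters invoked as "<binary> [options] <input> <output>".
# binary_argv is an array such as ["wkhtmltopdf", "--quiet"] (an example, not a recommendation).
def convert_html_with_binary(html, binary_argv)
  Tempfile.create(["input", ".html"]) do |input|
    input.write(html)
    input.flush # make sure the converter sees the full document on disk

    Tempfile.create(["output", ".pdf"]) do |output|
      output.close # the converter writes to this path itself
      # Array form: no shell. exception: true raises if the exit status is non-zero.
      system(*binary_argv, input.path, output.path, exception: true)
      File.binread(output.path)
    end
  end # both temp files are deleted when their blocks exit
end
```

In a background job this keeps failures visible: a non-zero exit raises `RuntimeError`, and a missing binary raises `Errno::ENOENT`.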

We do something similar, actually! The only difference is that we have puppeteer return a PNG, because the PDF generation wasn’t as accurate as an image. Then a separate process converts that PNG to PDF using ImageMagick.

As @siasmj says: if at all possible, use Prawn. It’s a totally different world from converting HTML to PDF, but for creating documents it’s much more stable and, at least in our cases, no more work than converting HTML to PDF, which always caused problems.

Check out grover:

It uses puppeteer behind the scenes without the need to run an express.js server (though I’ve done that too, and it also works well).

You can use a DevTools Protocol library (GitHub - rubycdp/ferrum: Headless Chrome Ruby API / GitHub - YusukeIwaki/puppeteer-ruby: A Ruby port of Puppeteer) to drive Chromium and generate images.

Installing Chromium makes the Docker image very large, so I extracted the rendering into a microservice.

Could you clarify how your extracted microservice solved the problem of the Docker image being very large with Chromium? I couldn’t follow.

We did something similar, but not as a separate microservice. Instead, we just have a Node.js-based CLI program that accepts HTML on STDIN and returns the PDF on STDOUT. Using popen, we write our HTML to this script and read back the resulting PDF.
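A minimal Ruby sketch of the popen side, assuming a hypothetical CLI that reads the whole HTML document from STDIN before writing the PDF to STDOUT. The command array (for example `["node", "html_to_pdf.js"]`) and the method name are placeholders, not the poster’s actual script.

```ruby
# Hypothetical wrapper: pipes HTML into a renderer CLI and reads the PDF back.
# Assumes the CLI consumes all of STDIN before it starts writing STDOUT; a CLI that
# streams output while still reading could fill both pipe buffers and deadlock.
def render_via_stdio(html, command)
  pdf = IO.popen(command, "r+b") do |io|
    io.write(html)
    io.close_write # EOF tells the CLI the document is complete
    io.read        # the PDF bytes
  end
  # IO.popen sets $? once the block finishes and the child has exited.
  raise "renderer exited with status #{$?.exitstatus}" unless $?.success?

  pdf
end
```

If the deadlock caveat matters for your script, `Open3.capture2(*command, stdin_data: html, binmode: true)` from the standard library feeds STDIN and drains STDOUT concurrently.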

It works fairly well, since it avoids all the networking hiccups you might get with microservices and is just bundled with the app. The only downsides are:

  • It does balloon the container image a bit since Chrome ends up packaged with the app. But it’s not like Ruby-based containers are small anyway so :man_shrugging:.
  • There is a bit of a delay because it spins up a new Chrome process each time you want a PDF rendered. I’m thinking of addressing this by keeping the script running, similar to how FastCGI operates: still send HTML in on stdin, but separate requests with some sort of delimiter, and probably use a mutex so it renders only one PDF at a time.
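The persistent variant from the second bullet might look like this on the Ruby side. Everything here is a sketch under assumptions: the class name, the 4-byte length-prefix framing standing in for "some sort of indicator", and a Node.js script that reads and writes the same framing in a loop. The `Mutex` is the one-PDF-at-a-time lock the bullet mentions.

```ruby
# Hypothetical long-lived renderer: one child process, one request at a time.
# Framing (an assumption): each message is a 4-byte big-endian length followed by
# that many bytes. A length prefix works for binary PDF output, which can contain
# any byte sequence a text delimiter might use.
class PersistentRenderer
  def initialize(command)
    @io = IO.popen(command, "r+b") # the child stays up between requests
    @mutex = Mutex.new             # serializes access to the shared pipe
  end

  def render(html)
    @mutex.synchronize do
      write_frame(html.b)
      read_frame
    end
  end

  def close
    @mutex.synchronize { @io.close } # EOF on STDIN lets the child exit
  end

  private

  def write_frame(bytes)
    @io.write([bytes.bytesize].pack("N"), bytes)
    @io.flush
  end

  def read_frame
    header = @io.read(4)
    raise EOFError, "renderer closed the pipe" if header.nil? || header.bytesize < 4

    length = header.unpack1("N")
    body = @io.read(length)
    raise EOFError, "truncated response" if body.nil? || body.bytesize < length

    body
  end
end
```

A text delimiter would only be safe if the script base64-encoded the PDF; the length prefix avoids scanning binary data for a marker at all.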

I love this approach and also considered it, but ended up skipping the CLI step. No particularly strong reason, but I figured that even if we make it listen on a port, we can still run it inside an app container, so it doesn’t have to be a separate service. And it 1) is designed with network I/O in mind, which is easy to control directly in JS server code, 2) is ready to be scaled independently of the other worker/web machines, and 3) keeps Chromium running (which, as you noted, avoids cold starts), all out of the box.

Just separate the dependencies from the main application image.

So basically you wrap an image in a PDF?