In our shop, I ended up building a small Express.js server that runs Puppeteer on request. You give it a URL, it opens the page, renders it to a PDF, and streams the result back as the response. Since we have a separate front-end team, this lets them control the PDF design using familiar tools (HTML/CSS).
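For what it's worth, the Rails side of consuming a service like that can be a single controller action that forwards the URL and streams the bytes back. A minimal sketch, where the service host, path, query parameter, and controller/route names are all assumptions for illustration, not details of our actual setup:

```ruby
require "net/http"

# Hypothetical controller that proxies to an internal Puppeteer-based renderer.
class ReportsController < ApplicationController
  PDF_SERVICE = URI("http://pdf-renderer.internal:3000/render") # assumed URL

  def show
    uri = PDF_SERVICE.dup
    uri.query = URI.encode_www_form(url: report_url(params[:id], format: :html))

    response = Net::HTTP.get_response(uri)
    send_data response.body,
              filename: "report-#{params[:id]}.pdf",
              type: "application/pdf",
              disposition: "inline"
  end
end
```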
We switched to Prawn (prawnpdf/prawn on GitHub, "Fast, Nimble PDF Writer for Ruby"). We had to re-create our templates/views in Prawn's DSL, since Prawn doesn't convert HTML the way wicked_pdf does, but it wasn't too difficult or time-consuming. We've been happy with Prawn since we made the switch.
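For anyone who hasn't seen Prawn's DSL, a minimal sketch of what one of those re-created templates can look like (the invoice fields here are made up, and the table call needs the prawn-table gem):

```ruby
require "prawn"
require "prawn/table" # from the prawn-table gem

Prawn::Document.generate("invoice.pdf", page_size: "A4", margin: 36) do
  text "Invoice #1234", size: 18, style: :bold
  move_down 12

  table([["Item", "Qty", "Price"],
         ["Widget", "2", "$10.00"]],
        header: true, width: bounds.width)

  move_down 24
  text "Thank you for your business.", size: 10
end
```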
There are also various binaries that can be installed on deployment servers and invoked via system() calls from the Rails app or background processes.
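For example (wkhtmltopdf is just one such binary, not one the post names specifically), shelling out with Open3 so the exit status and stderr can be checked:

```ruby
require "open3"

# File paths are illustrative.
html_path = "tmp/report.html"
pdf_path  = "tmp/report.pdf"

_out, err, status = Open3.capture3("wkhtmltopdf", html_path, pdf_path)
raise "PDF generation failed: #{err}" unless status.success?
```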
We do something similar, actually! The only difference is we have Puppeteer return a PNG, because its PDF output wasn't as accurate as the image. Then we have a separate process that converts that PNG to a PDF using ImageMagick.
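That second step can be as simple as shelling out to ImageMagick's convert; a sketch with illustrative file names:

```ruby
require "open3"

# Turn the PNG that Puppeteer produced into a single-page PDF.
_out, err, status = Open3.capture3("convert", "page.png", "page.pdf")
raise "ImageMagick conversion failed: #{err}" unless status.success?
```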
As @siasmj says: if at all possible, Prawn. It's a totally different world from converting HTML to PDF, but for creating documents it's much more stable and, at least in our cases, no more work than HTML-to-PDF conversion, which always caused problems.
We did something similar, but not as a separate microservice. Instead we just have a Node.js-based CLI program that accepts HTML on STDIN and returns the PDF on STDOUT. Then, using popen, we write our HTML to this script and read back the resulting PDF.
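The Ruby side of that round trip is only a few lines. A sketch, assuming the CLI lives at bin/render_pdf.js (the script name is made up here):

```ruby
require "open3"

html = File.read("tmp/invoice.html") # or render_to_string inside a controller

# Write HTML to the Node CLI's STDIN and capture the PDF bytes from its STDOUT.
pdf, err, status = Open3.capture3("node", "bin/render_pdf.js",
                                  stdin_data: html, binmode: true)
raise "PDF rendering failed: #{err}" unless status.success?

File.binwrite("tmp/invoice.pdf", pdf)
```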
It works fairly well, as it avoids all the networking hiccups you might get with microservices and is just bundled with the app. The only downsides are:
- It does balloon the container image a bit, since Chrome ends up packaged with the app. But it's not like Ruby-based containers are small anyway.
- There is a bit of a delay, as it spins up the Chrome process each time you want a PDF rendered. I'm thinking of addressing this by keeping the script always running, similar to how FastCGI operates: still send HTML in on stdin, but separate each request with some sort of indicator, and probably have a mutex so it renders only one PDF at a time (a rough sketch of that idea follows below).
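Purely as a sketch of that idea, here's what the Ruby side could look like. The details the post leaves open are filled in as assumptions: the long-lived script is named bin/render_pdf_server.js, each request ends with a sentinel line, and the response is length-prefixed so the binary PDF can be read back safely.

```ruby
require "open3"

class PdfRenderer
  SENTINEL = "\n--END-OF-HTML--\n" # assumed request delimiter

  def initialize
    # Keep one Chrome-backed Node process running for the life of the app.
    @stdin, @stdout, @wait_thr = Open3.popen2("node", "bin/render_pdf_server.js")
    @stdin.binmode
    @stdout.binmode
    @mutex = Mutex.new
  end

  # Serialize requests so only one PDF renders at a time.
  def render(html)
    @mutex.synchronize do
      @stdin.write(html, SENTINEL)
      @stdin.flush
      length = Integer(@stdout.readline) # assumed length-prefixed response
      @stdout.read(length)
    end
  end
end
```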
I love this approach, and also considered it, but ended up skipping the CLI step. No particularly strong reason, but I figured even if we make it listen on a port, we could still run it inside an app container, so it doesn't have to be a separate service. And this way it 1) is designed with network I/O in mind, which is easy to control directly in the JS server code, 2) is ready to be scaled independently of the other worker/web machines, and 3) keeps Chromium running (as you noted, avoiding cold starts), all out of the box.