Friday, 6 October, 2023 UTC


Background

I have been working on a long-running Rails application where one of the primary pieces of functionality is the ability to export dozens of reports in PDF format. When the application was first written, PDF generation was handled by PDFKit, a Ruby gem that uses wkhtmltopdf under the hood to generate PDFs from HTML.

PDFKit and WKHTMLTOPDF

Over the years we encountered many formatting issues because wkhtmltopdf renders the HTML for conversion with an old WebKit engine. This led to a growing pile of workarounds to make the generated PDF match the HTML provided. As we went through a total redesign of the UI and modernized our CSS, wkhtmltopdf required more and more special treatment for the rendering to even roughly match what was intended. It also meant that newer CSS features, such as flexbox, were unsupported.
The wkhtmltopdf upgrade process was also painful: we were using a build with a patched Qt, which we had to track down and replace with every upgrade.

Grover and Puppeteer

With all of the considerations above, and after developing a visualization that made heavy use of flexbox for its layout and realizing that the PDF output looked nothing like the HTML provided, we went searching for a replacement. Another team member (shout out to Sam Ehlers) put together a proof of concept using Chrome Headless and its --print-to-pdf functionality, and it looked like a viable option for generating PDFs of our reports. That proof of concept also made it clear that we would need a way to present the HTML in both landscape and portrait orientations. It also meant updating our CSS that targeted the print media type, since that is the media type Chrome Headless uses when rendering the export. We went looking for options that would make the process easier and finally found Grover.
Grover combines a Ruby gem, which you can call directly or use as middleware in your Rails application, with the puppeteer npm package, which controls Chrome Headless.
Since the application already had the infrastructure in place to handle PDF generation with PDFKit, we opted not to use Grover as middleware but to call it directly, replacing the existing PDFKit calls.
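For readers who have not used Grover outside of the middleware, here is a minimal sketch of what a direct call can look like. The helper names (report_pdf_options, report_pdf) and the 'Letter' paper size are assumptions for illustration, not our actual code; format and landscape are Puppeteer page.pdf options that Grover forwards.

```ruby
# Hypothetical helper: builds the per-report options hash for Grover.
# `format` and `landscape` are Puppeteer page.pdf options that Grover passes through.
def report_pdf_options(orientation)
  {
    format: 'Letter',                      # assumed paper size
    landscape: orientation == :landscape   # portrait unless :landscape is requested
  }
end

# Hypothetical helper: renders an HTML string to PDF bytes by calling Grover
# directly instead of going through the middleware.
def report_pdf(html, orientation: :portrait)
  # In a Rails app Bundler already loads the gem; requiring it here only keeps
  # this sketch loadable on machines without grover installed.
  require 'grover'
  Grover.new(html, **report_pdf_options(orientation)).to_pdf
end
```

A controller action that previously built a PDFKit object could then do something like send_data report_pdf(html, orientation: :landscape), type: 'application/pdf'.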
This worked very well for the most part. The one caveat is that Chromium will try to resolve any relative path into a full path, so we needed logic, for both local development and production, to convert any links into their fully qualified counterparts.
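The link conversion can be sketched roughly as follows. This is a simplified illustration rather than our actual code: absolutize_urls is a hypothetical name, the base URL would come from per-environment configuration, and the regular expression only handles quoted href and src attributes that start with a single slash.

```ruby
require 'uri'

# Hypothetical helper: rewrites root-relative href/src attributes
# (e.g. href="/reports/1") into fully qualified URLs under base_url.
# Absolute and protocol-relative (//host/...) URLs are left untouched.
def absolutize_urls(html, base_url)
  html.gsub(%r{\b(href|src)=(["'])(/(?!/)[^"']*)\2}) do
    attr, quote, path = Regexp.last_match.values_at(1, 2, 3)
    "#{attr}=#{quote}#{URI.join(base_url, path)}#{quote}"
  end
end
```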
This was not an issue with the content of the reports themselves, as they contained no links. However, in the GCP environment with IAP (Identity-Aware Proxy) turned on, the PDFs were missing the styling and some of the content that was rendered to the page through JavaScript. It turns out that when rendering the HTML, Chrome Headless was trying to follow the links to the CSS and JavaScript, and since it was not authenticated through IAP, that content was blocked.
The solution was to write a helper for the report export views that read the compiled CSS and JavaScript, converted them to Base64-encoded strings, and then embedded them on the page as data URIs, like this:
tag(:link, rel: :stylesheet, href: "data:text/css;base64,#{base64_data}")
This allowed all of the data to be available to Chrome Headless without the need to configure any IAP access.
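To make the idea concrete, here is a small plain-Ruby sketch of such a helper. The method names are hypothetical, and it takes the asset contents as a string; in the real application the contents would be read from the compiled asset files, and the tags would be built with Rails tag helpers as in the line above.

```ruby
require 'base64'

# Hypothetical helper: wraps raw asset contents in a Base64 data: URI.
def asset_data_uri(contents, mime_type)
  "data:#{mime_type};base64,#{Base64.strict_encode64(contents)}"
end

# Builds a <link> tag whose href carries the whole stylesheet inline,
# so Chrome Headless never has to request it through IAP.
def inline_stylesheet_tag(css)
  %(<link rel="stylesheet" href="#{asset_data_uri(css, 'text/css')}">)
end

# Same idea for a compiled JavaScript bundle.
def inline_script_tag(js)
  %(<script src="#{asset_data_uri(js, 'text/javascript')}"></script>)
end
```

strict_encode64 is used rather than encode64 because the latter inserts line breaks, which do not belong inside an attribute value.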
The configuration for Grover ended up being very similar to the example configuration provided with the gem, with one exception: we used wait_until: 'networkidle0' instead of wait_until: 'domcontentloaded' to account for some of the JavaScript content taking a little longer to render.
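As a rough illustration, an initializer along these lines would set that option globally. Only wait_until reflects what is described above; the method name and the paper size are assumptions, and the rest of our real configuration is omitted.

```ruby
# config/initializers/grover.rb (sketch)

# Hypothetical helper returning the global Grover defaults.
def report_grover_defaults
  {
    format: 'Letter',            # assumed paper size
    # Wait until there have been no network connections for a short period,
    # giving JavaScript-rendered content time to finish loading.
    wait_until: 'networkidle0'
  }
end

# The defined? guard only keeps this sketch loadable without the gem;
# inside the Rails app Grover is always present.
if defined?(Grover)
  Grover.configure do |config|
    config.options = report_grover_defaults
  end
end
```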

End Results

The switch was a welcome change to our workflow. It let us remove a substantial portion of the convoluted conditional HTML and CSS formatting we had depended on to work around wkhtmltopdf.
We can now build more intricate layouts for future reports, confident that the resulting PDF exports will match the HTML we designed.
The post Replacing PDFKit with Grover for Rails PDF Generation appeared first on Simple Thread.