> *htmldocs is different from other tools like Wkhtmltopdf and Weasyprint in tha...

monax · on Jan 17, 2024

As the maintainer of wkhtmltopdf @ Odoo I can tell you it's not WebKit. Instead it's outdated WebKit from 2014 running on top of Qt 4 '^^

elmo2you · on Jan 17, 2024

@monax Since you're the maintainer, I'll assume you no doubt will know more about it than me. But from what I (vaguely) recall, it's worse than that.

Not at liberty to elaborate on exact details, but not so long ago I had to deal with wkhtmltopdf, when it turned out to be the (still preferred/recommended) PDF rendering solution as part of a major popular web middleware framework, at a large corporate client. I was rather shocked to see a top-tier prestigious international institution working with such outdated tech (albeit certainly in ignorance), but never mind that.

What struck me most was the nature of the bugs I encountered. Probably one of the most baffling: seemingly randomly changing formatting of the output. In the end it turned out to be a Windows-specific problem, where multiple administrators logged onto the Windows Server hosting the web application. Because of different workstation display geometries on their end, they effectively kept changing the display DPI settings of that server (a headless machine, only accessed through RDP). That in turn affected the rendering internals of wkhtmltopdf. Rather hilarious when I finally figured it out. That's when I learned it's best never to use wkhtmltopdf again on any Windows system (if anywhere at all, for that matter).

Wasn't the WebKit core even older than 2014? Maybe something about it being older but then just maintained independently until 2014 .. or something like that? Or maybe my memory is just messed up and failing me.

Either way, what I do remember is my amazement about seeing this (at best) 10-year-old code (arguably of questionable quality to start with and certainly outdated by now) still in use as a go-to solution for rendering PDF from HTML. Ended up replacing it with a puppeteer-based solution. Arguably with its own problems, but less of a black hole than wkhtmltopdf. Especially considering it was (also) rendering user-supplied data. What could ever go wrong, right?
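
To make the user-supplied data worry a bit more concrete, here is a minimal sketch (not what the client actually ran) of escaping untrusted text before it goes into the HTML handed to any HTML-to-PDF renderer. The template and the `render_untrusted` helper are hypothetical names made up for illustration; only `html.escape` from the Python standard library is real. Escaping alone is unlikely to cover every risk, it just closes the most obvious hole.

```python
import html

# Hypothetical template; the placeholders are illustrative only.
TEMPLATE = "<html><body><h1>{title}</h1><p>{body}</p></body></html>"


def render_untrusted(title: str, body: str) -> str:
    """Escape user-supplied text so it is rendered as text, not as markup."""
    return TEMPLATE.format(title=html.escape(title), body=html.escape(body))


# A value a user might submit in the hope that the renderer interprets it.
page = render_untrusted("Invoice", '<script src="file:///etc/passwd"></script>')
print(page)
```

The resulting string contains the submitted markup only in escaped form, so the renderer sees it as plain text.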

monax · on Jan 17, 2024

It seems you have more information about this situation than I do, as I recently took on the role of project maintainer and wasn't involved from the start.

I find the bug you mentioned interesting, and I'll promptly investigate to determine if it still exists. If I remember correctly, there's a --dpi option that may have already resolved this issue.

I share your concern about the code's state; it's problematic as it's currently leaking file descriptors and memory all over the place. We experimented with Puppeteer as an alternative, but its speed and memory usage were too high for our specific use case, so we're currently stuck with the current wkhtmltopdf.

elmo2you · on Jan 18, 2024

Just to clarify, I'm not bashing wkhtmltopdf. I think it is a great tool (or at least it was), for what it was initially designed for. I've used it myself several times, to great effect. Albeit mostly long ago (about a decade or so).

I'm not even concerned all that much about the code's state itself, rather that it still shows up today. In situations it was probably never designed for. Adding to that, it now often appears wrapped inside some interfacing layer/code, binding it to whatever other software it is embedded in. But then only exposing part of wkhtmltopdf. Case in point, the --dpi option you mentioned (which indeed might fix the mentioned bug), was simply inaccessible through how the framework interfaced with wkhtmltopdf (which was a design cluster-F# in its own right).
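
As a rough illustration of the interfacing problem (a sketch only, not how that framework works): a thin Python wrapper that pins `--dpi` explicitly and still passes arbitrary extra options through, instead of exposing a fixed subset. The function names and the default of 96 are my own assumptions; `--dpi` and `--page-size` are existing wkhtmltopdf options, and it assumes a `wkhtmltopdf` binary on the PATH. Whether pinning the DPI actually cures the RDP-related bug is untested here.

```python
import subprocess
from typing import List, Sequence


def build_wkhtmltopdf_command(
    source: str,
    output: str,
    dpi: int = 96,  # assumed default, chosen only for illustration
    extra_options: Sequence[str] = (),
) -> List[str]:
    """Build the argv for wkhtmltopdf with an explicit DPI.

    Extra options are forwarded untouched, so callers are not limited
    to whatever subset the wrapper author happened to think of.
    """
    if dpi <= 0:
        raise ValueError("dpi must be a positive integer")
    return ["wkhtmltopdf", "--dpi", str(dpi), *extra_options, source, output]


def render_pdf(source: str, output: str, **kwargs) -> None:
    """Run wkhtmltopdf; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_wkhtmltopdf_command(source, output, **kwargs), check=True)


# Building the command does not require wkhtmltopdf to be installed.
cmd = build_wkhtmltopdf_command(
    "report.html", "report.pdf", dpi=96, extra_options=["--page-size", "A4"]
)
print(" ".join(cmd))
```

Calling `render_pdf("report.html", "report.pdf", dpi=96)` would then run the same command for real, independent of whatever display settings the host currently has.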

However, the bigger elephant in the room for me is that the HTML/CSS support has changed considerably over the last decade, with wkhtmltopdf pretty much stuck in time. It's just pretty much a random hit or miss when it comes to rendering any modern web content. Only for carefully crafted (legacy) content does it still make a good use case.

I won't blame wkhtmltopdf for that though, but I sure have questions for those who apparently still consider it a valid tool for integration into modern web frameworks. I guess part of that comes down to unintentional ignorance, not knowing what they are actually integrating. Another reason might be that I've not seen much of a better sort-of-drop-in replacement for wkhtmltopdf, so people stick to it for lack of a proper alternative. Maybe some of the people who integrated wkhtmltopdf elsewhere are even very well aware of all the limitations/dangers. Still, little use of that if that knowledge is lost on those who subsequently end up using it, primarily just as consumers.

Puppeteer is a very different beast in its own right, regardless of which engine/browser you end up using. But if more modern HTML/CSS features are required, it can be a valid replacement (despite its numerous downsides). Still, I (also) doubt that it will work as a good replacement for all of wkhtmltopdf's use cases.

If nothing else, I got some free entertainment out of it all .. watching the shocked expression on the faces of top-level suits, when I explained to them what they had been blissfully unaware of. They had good reasons to be worried, because this exposure should have been caught by their rigorous technology-review procedures. However, it had been an ad hoc dependency that was introduced as part of a legit feature request. The dev had probably never even noticed, because it appeared as a native framework functionality (hiding any obvious reference to using wkhtmltopdf under the hood). Anyways, wkhtmltopdf is certainly not to be blamed for that.

Still, it does illustrate the dangers of modern-day software integration. I think especially in (web) frameworks, where the name of the game often appears to have turned towards a popularity contest (with competing frameworks) and "making things as simple as possible". Something about a road paved with good intentions.

tharos47 · on Jan 17, 2024

You can use paged.js (https://pagedjs.org/). It's a polyfill for CSS Paged Media that lets you generate good PDFs with modern HTML/CSS.