The article talks about wkhtmltopdf; in fact, they developed their server in response to its limitations:
The wkhtmltopdf utility has been around a while and works great once you get it working correctly on your platform. However, the newest version as of this writing, 0.12.5, has a bug preventing TOC generation on some platforms. Some Linux platforms require the installation of Microsoft font packs, and compiling from source leads you down a rabbit hole of dependency hell.
I'd say over half of the "PDF-creation" projects posted here have been vulnerable to some/all of those attacks. (I continue to be surprised at how many web-to-pdf services exist. I guess there must be a lot of people paying for them?)
These are great security suggestions, and I should clarify the intended use. We use txPDF as a backend microservice that is not open to direct public use. It is good for automating report generation from other portions of a larger system.
I'm the owner/dev of one of those paid services, and yes, competition is fierce, but people do still pay for the convenience of not having to manage it themselves. One look at the issue count of puppeteer/phantomjs/selenium/slimer... tells its own story.
I venture there is huge money in a smooth path from LaTeX resumes in PDF format to MS Word. I want to offer my clients a basic template, but if I choose the LaTeX route, I will inevitably get requests for the latter, no matter how lame the format.
Have you tried using a print media stylesheet? You could hide the navigation, reduce the whitespace, maybe shrink the font size a little bit, and remove link text decoration.
Great idea. I have used print media stylesheets in the past, but found them prone to regressions, e.g. elements that are introduced later but never hidden for print. A webpage-to-PDF process is also vulnerable to that, though.
I think ideally, because the resume renderer is a react component, I'd rather just boot up chromium with the react component and resume data and do a fully clean render of the page into pdf.
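For what it's worth, the "boot up Chromium and render the page" idea can be sketched roughly like this. It's only a sketch: it assumes a Chromium binary named `chromium` is on the PATH and that the React page is already being served at some local URL (the `localhost:3000` address below is made up for illustration). The `--headless` and `--print-to-pdf` switches are the standard Chromium command-line flags.

```python
import shutil
import subprocess
from pathlib import Path


def build_print_command(browser, page_url, pdf_path):
    """Build the argument list for a headless Chromium print-to-PDF run."""
    return [
        browser,
        "--headless",                  # no visible window
        "--disable-gpu",               # commonly added for headless runs
        f"--print-to-pdf={pdf_path}",  # write the rendered page to this file
        page_url,
    ]


def render_pdf(page_url, pdf_path, browser="chromium"):
    """Render page_url to pdf_path; the binary name is an assumption."""
    executable = shutil.which(browser)
    if executable is None:
        raise FileNotFoundError(f"{browser!r} not found on PATH")
    command = build_print_command(executable, page_url, pdf_path)
    subprocess.run(command, check=True, timeout=60)
    return Path(pdf_path)


if __name__ == "__main__":
    # Hypothetical URL where the resume component is served.
    render_pdf("http://localhost:3000/resume", "resume.pdf")
```

Since the page is rendered from scratch each time, nothing from the interactive site (navigation, banners) leaks in unless the component itself includes it.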
In Chrome Dev Tools, click on the devices button (the icon with the phone and tablet). Using the top-right menu, select "Capture full size screenshot".
Voilà, you now have a full size screenshot that you can convert into PDF.
Incidentally, I am the author of https://www.pagedash.com, a personal web scrapbook that lets you capture the current page as HTML and generate links to share with others.
I tried it, with this page only, but it didn't work for me. I got a 110 KB PNG file, but it's empty: a valid PNG, but completely blank. Maybe it's buggy.
I find wkhtmltopdf very difficult to work with; for instance, the official documentation is just a man page [1].
I discovered the WeasyPrint project [2] a few months ago. I find it easier to use, and very powerful when using Python. You can define a custom loader to inject images or styles generated on the fly, for instance.
There are still some missing features compared to wkhtmltopdf, such as defining a custom footer and header, but it's a very promising project.
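To make the custom loader point concrete, here is a minimal sketch built on WeasyPrint's `url_fetcher` hook as I understand it from the docs: the fetcher gets a URL and returns a dict with the content and a MIME type. The `generated:` scheme and the `make_chart_svg` helper are made up for illustration; everything else falls through to the default fetcher.

```python
CUSTOM_SCHEME = "generated:"  # hypothetical scheme, not part of WeasyPrint


def make_chart_svg(name):
    """Stand-in for an image generated on the fly (hypothetical helper)."""
    svg = (
        '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="40">'
        f'<text x="10" y="25">{name}</text></svg>'
    )
    return svg.encode("utf-8")


def fetch(url):
    """Serve generated: URLs ourselves, delegate the rest to WeasyPrint."""
    if url.startswith(CUSTOM_SCHEME):
        name = url[len(CUSTOM_SCHEME):]
        return {"string": make_chart_svg(name), "mime_type": "image/svg+xml"}
    from weasyprint import default_url_fetcher  # third-party, imported lazily
    return default_url_fetcher(url)


def render(html_text, pdf_path):
    """Render an HTML string to a PDF file using the custom fetcher."""
    from weasyprint import HTML  # third-party, imported lazily
    HTML(string=html_text, url_fetcher=fetch).write_pdf(pdf_path)


if __name__ == "__main__":
    render('<h1>Report</h1><img src="generated:sales">', "report.pdf")
```

The same hook should work for stylesheets: return CSS text with a `text/css` MIME type for whatever URLs you want to intercept.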
Since you mention Python, I have found pdfkit[1] to be a pretty good wrapper for wkhtmltopdf. I have a document generation engine that uses it dozens of times a day. Worst part is that wkhtmltopdf in the Ubuntu repos is still compiled (when last checked) without some patch that allows it to run headlessly. I built from source, which was not too difficult.
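In case it helps anyone, a typical pdfkit call looks something like the sketch below. It assumes pdfkit is installed and a working wkhtmltopdf binary is on the PATH; the option values are just examples, and pdfkit passes the dict keys through as wkhtmltopdf command-line options.

```python
def wkhtmltopdf_options(page_size="A4"):
    """Example options dict; keys map to wkhtmltopdf command-line flags."""
    return {
        "page-size": page_size,
        "encoding": "UTF-8",
        "quiet": "",  # an empty value means a flag without an argument
    }


def html_to_pdf_bytes(html_text, options=None):
    """Convert an HTML string to PDF bytes via pdfkit."""
    import pdfkit  # third-party wrapper, imported lazily
    # Passing False as the output path makes pdfkit return the PDF data
    # instead of writing a file.
    return pdfkit.from_string(html_text, False, options=options or wkhtmltopdf_options())


if __name__ == "__main__":
    pdf = html_to_pdf_bytes("<h1>Hello</h1>")
    with open("hello.pdf", "wb") as handle:
        handle.write(pdf)
```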
One of my applications at work takes a user order sheet created through the main app workflow and transposes it into an HTML document, which is then converted to a PDF by wkhtmltopdf and dispatched via email, etc.
I found this setup to be really stable and easy to maintain. So far it has produced around 70k orders per year and has been running for over 4 years now without any hiccups.
Before that I was using phantomjs, but it wasn't as fast or reliable, for reasons that I can't quite remember now since I haven't touched that part of the app in a long time.
All I remember is that wkhtmltopdf was easier to tweak and compose with.
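A pipeline like the one described (order data to HTML, HTML to PDF via wkhtmltopdf, PDF out by email) could look roughly like this. This is a guess at the shape, not the actual app: the order fields, the sender address, and the function names are all invented for illustration, and it assumes the `wkhtmltopdf` binary is on the PATH.

```python
import html
import subprocess
import tempfile
from email.message import EmailMessage
from pathlib import Path


def render_order_html(order):
    """Turn a (hypothetical) order dict into a small HTML document."""
    rows = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{qty}</td></tr>"
        for name, qty in order["items"]
    )
    return (
        "<html><body>"
        f"<h1>Order {html.escape(order['id'])}</h1>"
        f"<p>Customer: {html.escape(order['customer'])}</p>"
        f"<table>{rows}</table>"
        "</body></html>"
    )


def html_to_pdf(html_text):
    """Run wkhtmltopdf on a temporary HTML file and return the PDF bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "order.html"
        dst = Path(tmp) / "order.pdf"
        src.write_text(html_text, encoding="utf-8")
        subprocess.run(
            ["wkhtmltopdf", "--quiet", str(src), str(dst)],
            check=True,
            timeout=60,
        )
        return dst.read_bytes()


def build_email(pdf_bytes, recipient, order_id):
    """Build an email message with the PDF attached."""
    msg = EmailMessage()
    msg["Subject"] = f"Your order {order_id}"
    msg["From"] = "orders@example.com"  # placeholder sender
    msg["To"] = recipient
    msg.set_content("Your order sheet is attached.")
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=f"order-{order_id}.pdf",
    )
    return msg
```

Sending is then a matter of handing the message to `smtplib.SMTP(...).send_message(msg)` or whatever mail service the app already uses.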
https://prerender.com/ is a great service (fully MIT-licensed at https://github.com/prerender/prerender ) for this type of thing, both for rendering internal pages and for scraping/rendering external sites that rely heavily on client-side code.
Why not use HTML instead of PDF? I'm the author of an extension that faithfully saves a web page into an HTML file [1]. From my point of view, that should be the best solution for archiving web pages in a file. Votes on HN disagree with me though [2]; I wish I could understand why.