Linux running inside a PDF file via a RISC-V emulator (github.com/ading2210)
183 points by shantara 3 months ago | 38 comments



That’s just wrong.

It’s no wonder we have endless security issues when documents that should be just data and metadata (layout) declarations are Turing complete. Sigh.


When the last Tetris PDF was on the HN homepage, I looked into ways of disabling JS in PDFs but was unable to find any way other than disabling JS for the entire browser. I tried messing about with uBlock filters, but to no avail. Does anyone know if this can be achieved?


In Firefox, about:config > pdfjs.enableScripting > false.


  on one hand: applauds ingenuity

  on the other hand: gapes at a major security hole in PDF, wonders why this is the de facto document type for reading in industry


JS in PDF is sandboxed in Chromium, like WASM. It's not a security hole.


So you can run a PDF reader inside it to display a PDF? Honestly I am a little worried someone will come up with some sort of smart DRM based on this.

In seriousness, it’s impressive and highlights how awful the PDF format is. Can we have a document format that is less encumbered, more structured, and more accepted as a “real” document yet?


Damn good stuff, both the Doom and this Linux of yours! Kudos.

I had an LLM running in a PDF sometime around Nov last year, but it wasn't released. Since the Doom-in-PDF uptick, I've released a demo here:

https://x.com/VulcanIgnis/status/1879649889178837025

Actually, you don't need an old version of Emscripten; hit me up for the Makefile and PDF templates, or wait for the official release. My build script makes the PDF work in both Chrome and Firefox; Adobe support is pending.


This is kind of what NSO Group’s Pegasus did to break out of iPhone sandboxing. They exploited a codec to bootstrap a virtual machine with a custom instruction set.


This is not breaking any sandboxing as far as I understand it. PDFs can contain JavaScript, JavaScript can emulate a processor and a processor can run Linux... but that Linux is not getting outside the boundaries of the PDF viewer.
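The emulator in the linked project is far more complete than this, but the core of "JavaScript can emulate a processor" is a fetch-decode-execute loop. A toy sketch in Python, assuming only two RV32I instructions (ADDI and ADD) and a program stored as a list of 32-bit words; `run` and `sign_extend` are hypothetical helpers, not taken from the project:

```python
# Toy fetch-decode-execute loop for two RV32I instructions (ADDI and ADD).
# A real emulator also needs the rest of the ISA, CSRs, traps and devices.

MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def run(program: list[int], steps: int) -> list[int]:
    regs = [0] * 32  # x0..x31, stored as unsigned 32-bit values; x0 stays zero
    pc = 0           # byte address; each instruction is 4 bytes
    for _ in range(steps):
        insn = program[pc // 4]          # fetch
        opcode = insn & 0x7F             # decode the common fields
        rd = (insn >> 7) & 0x1F
        funct3 = (insn >> 12) & 0x7
        rs1 = (insn >> 15) & 0x1F
        rs2 = (insn >> 20) & 0x1F
        funct7 = insn >> 25
        if opcode == 0x13 and funct3 == 0:                    # ADDI rd, rs1, imm
            result = regs[rs1] + sign_extend(insn >> 20, 12)
        elif opcode == 0x33 and funct3 == 0 and funct7 == 0:  # ADD rd, rs1, rs2
            result = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unsupported instruction {insn:#010x}")
        if rd != 0:                      # writes to x0 are discarded
            regs[rd] = result & MASK32   # wrap to 32 bits
        pc += 4                          # execute done; advance
    return regs
```

For example, `run([0x00500093, 0x00700113, 0x002081B3], steps=3)[3]` encodes `addi x1, x0, 5`, `addi x2, x0, 7`, `add x3, x1, x2` and leaves 12 in x3. Everything still happens inside the host's memory and sandbox, which is the point being made above.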


Wow. I guess it was just a matter of time after seeing the Doom and Tetris PDFs.

FYI, the PDF is 6.2 MB.


You people really managed to sprinkle JS everywhere, even in PDFs. Shame on you.


Completely agree lol, why the downvotes?


Probably because JavaScript has been part of pdfs since Acrobat 4, way back in the last millennium.


On Edge (Chromium), I get this error message instead:

  An embedded page at linux.doompdf.dev says
  TypeError: Cannot set properties of null (setting 'value')
    at Object.0.9657115108887302 (<anonymous>:248:42)
    at set_interval_callback (<anonymous>:43:24)
    at <anonymous>:1:6


Kernel boot message:

“This architecture does not have memory protection”

—_—


Recent and related:

Show HN: Doom (1993) in a PDF - https://news.ycombinator.com/item?id=42678754 - Jan 2025 (74 comments)


I've invented a replacement for the PDF format.

It's called Locked Markdown. `*.lmd`.

It provides 98% of the functionality most people use PDFs for, without all the extra bullshit.

The Spec:

- The most popular Markdown spec (GitHub Flavored?)

- At the top of the page, place a hash of the Markdown that comes after a marker line.

LMD Reader:

- Prevents editing the markdown.

- Warns user if the hash sum is not valid.

Example:

```lmd

sha256 77ec0f678315f8a207c3501137e1dfc9642b79a9c93e21807df7b5242846c05c

-------------------------------

# Header 1

Paragraph of text about how bloated the PDF format is.

```

Zip .lmd together with images/videos if required.
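A minimal Python sketch of the writer side and the reader's check. It assumes the hash is SHA-256 over the UTF-8 bytes of everything after the marker line, and it uses a simplified layout without the blank lines from the example above; `lock` and `verify` are hypothetical names, not an existing tool:

```python
import hashlib

MARKER = "-------------------------------"  # assumed separator line, as in the example


def lock(markdown: str) -> str:
    """Produce an .lmd document: hash header, marker line, then the body."""
    digest = hashlib.sha256(markdown.encode("utf-8")).hexdigest()
    return f"sha256 {digest}\n{MARKER}\n{markdown}"


def verify(lmd: str) -> bool:
    """Reader check: True only if the header hash matches the body below the marker."""
    header, sep, body = lmd.partition(f"\n{MARKER}\n")
    if not sep or not header.startswith("sha256 "):
        return False  # no marker or no hash header: treat as invalid
    expected = header.removeprefix("sha256 ").strip()
    return hashlib.sha256(body.encode("utf-8")).hexdigest() == expected


doc = lock("# Header 1\n\nParagraph of text about how bloated the PDF format is.\n")
```

Here `verify(doc)` is `True`, while `verify(doc.replace("bloated", "lean"))` is `False`, which is the warning the reader would show.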


What is stopping me just unzipping it, altering the file, hashing the file below the header and then zipping it back up?
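Nothing, as far as a plain hash goes: SHA-256 takes no secret, so whoever edits the body can recompute the header too. A short Python sketch of that forgery, assuming the same header and marker layout as the Locked Markdown example (`header_for` and `verify` are hypothetical helpers):

```python
import hashlib

MARKER = "-------------------------------"  # assumed separator line


def header_for(body: str) -> str:
    # Anyone can compute this: an unkeyed hash has no secret input.
    return "sha256 " + hashlib.sha256(body.encode("utf-8")).hexdigest()


def verify(lmd: str) -> bool:
    header, sep, body = lmd.partition(f"\n{MARKER}\n")
    return bool(sep) and header == header_for(body)


original_body = "Pay Alice $10\n"
original = header_for(original_body) + f"\n{MARKER}\n" + original_body

# The editor changes the body, then simply recomputes the header.
forged_body = "Pay Mallory $10,000\n"
forged = header_for(forged_body) + f"\n{MARKER}\n" + forged_body

print(verify(original), verify(forged))  # True True
```

So the hash only catches accidental corruption; detecting deliberate edits needs something the editor cannot recompute, such as a keyed MAC or a digital signature.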


Instead of a hash sum, you could use a crypto signature. That makes it even more useful: if someone legitimately alters the file, you can verify who it was. While you're at it, make it a zipped git repository and you get edit history for free.


Not sure, but maybe hashing the media files to be zipped and including that hash list in the hashed .lmd would prevent that? Or at least allow verifying that they weren't altered.
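A Python sketch of that manifest idea, under assumed conventions: an HTML comment line `<!-- manifest -->` followed by one `<sha256>  <filename>` line per media file, with the whole body (manifest included) covered by the header hash. The function names and the `doc.lmd` member name are hypothetical. Like the header hash itself, this detects accidental changes to the media; it only stops deliberate swaps if the header is also signed, because an editor could regenerate the manifest too. It also ignores extra files that are not listed.

```python
import hashlib
import io
import zipfile

MARKER = "-------------------------------"   # assumed separator line
MANIFEST_MARK = "\n<!-- manifest -->\n"      # assumed; invisible when rendered


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_lmd(markdown: str, media: dict[str, bytes]) -> str:
    """Append a media manifest to the body, then hash body + manifest together."""
    manifest = "".join(
        f"{sha256_hex(data)}  {name}\n" for name, data in sorted(media.items())
    )
    body = markdown + MANIFEST_MARK + manifest
    return f"sha256 {sha256_hex(body.encode('utf-8'))}\n{MARKER}\n{body}"


def verify_zip(raw: bytes, lmd_name: str = "doc.lmd") -> bool:
    """Check the header hash, then every media file listed in the manifest."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        try:
            lmd = zf.read(lmd_name).decode("utf-8")
            header, sep, body = lmd.partition(f"\n{MARKER}\n")
            if not sep or header != f"sha256 {sha256_hex(body.encode('utf-8'))}":
                return False
            _, sep, manifest = body.rpartition(MANIFEST_MARK)
            if not sep:
                return False
            for line in manifest.splitlines():
                digest, name = line.split("  ", 1)
                if sha256_hex(zf.read(name)) != digest:
                    return False
        except (KeyError, ValueError):
            return False  # missing member or malformed manifest line
    return True


def make_zip(files: dict[str, bytes]) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


media = {"figure.png": b"fake image bytes"}
lmd = build_lmd("# Header 1\n\n![figure](figure.png)\n", media)
good = make_zip({"doc.lmd": lmd.encode("utf-8"), **media})
bad = make_zip({"doc.lmd": lmd.encode("utf-8"), "figure.png": b"swapped image"})
```

With these inputs, `verify_zip(good)` passes and `verify_zip(bad)` fails because the swapped image no longer matches its manifest line.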


DMCA and Trademarks


Locked Markdown sounds so close to my favourite defense contractor


Does it do vector graphics and custom fonts?


See? This is _exactly_ why Facebook banned Linux.


Facebook should ban PDFs and all discussions about them. PDFs are a cyber security threat.


Now all we need is a PDF reader embedded within a PDF. Bonus points for enabling it to open itself for infinite recursion.


Now install the PDF reader and open the PDF inside that OS.


Is there even a single solid motivating example for why JS in PDF is useful? Can anyone show a real-world application of JS in PDF where it's actually a good fit?

I just don't see why PDF would be your file format of choice if you're writing JS.


We have tax forms available as interactive PDFs, which:

   * auto-computes formulas
   * enables and disables whole sections of the document depending on filled-in values
   * performs complex validation, beyond checking for required fields and regexp patterns
   * offers inline help
   * can be filled, saved and printed completely offline
   * when printed, looks exactly the same as traditional, paper form
   * don't require external software beyond a PDF reader
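In real interactive PDF forms this logic lives in Acrobat JavaScript attached to form fields (calculation and validation scripts). The Python model below is only a language-neutral sketch of the three behaviours listed: an auto-computed total, a section enabled by another answer, and cross-field validation. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaxForm:
    # Hypothetical fields; real forms name and lay these out differently.
    marital_status: str = "single"  # "single" or "married"
    wages: float = 0.0
    interest: float = 0.0
    spouse_wages: float = 0.0

    @property
    def joint_section_enabled(self) -> bool:
        # Enable/disable: the spouse section only applies to married filers.
        return self.marital_status == "married"

    @property
    def total_income(self) -> float:
        # Auto-computed field, recalculated from the inputs on every read.
        total = self.wages + self.interest
        if self.joint_section_enabled:
            total += self.spouse_wages
        return total

    def validate(self) -> list[str]:
        # Checks beyond "required" and regex patterns: value ranges and
        # consistency between fields.
        errors = []
        if self.marital_status not in ("single", "married"):
            errors.append("marital_status must be 'single' or 'married'")
        if min(self.wages, self.interest, self.spouse_wages) < 0:
            errors.append("amounts cannot be negative")
        if not self.joint_section_enabled and self.spouse_wages:
            errors.append("spouse wages entered but joint section is disabled")
        return errors
```

For example, `TaxForm("married", 50000.0, 100.0, 30000.0).total_income` includes the spouse section, while the same numbers with `"single"` produce a validation error instead.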


The NY tax form also generates a barcode from the information you entered, to facilitate faster processing.


Why is this an interactive PDF in the first place?


To be able to print it and fill in by hand or by typing on a PC? To be able to save the filled-in form in a portable file?


It is allowed by the PDF/A-3 standard, which lets you embed data and scripts in a PDF. It is used as the archival standard for PDF files sent in electronic invoicing in a number of countries. Basically, you get a human-readable file containing the data presented, and it's digitally signed, so your data matches the readable presentation.


PDF forms: you can cut down on errors by using JS to enable/disable parts of the form based on the user's answers (e.g., if not married, you are also not filing jointly in a tax form).


Hasn't most of that moved to the web now?


PDFs work offline, and they are designed to represent printable pages, so if the end document has some use on paper, or if you want to let users fill it in as a document completely offline, it makes sense to use PDF.


The form used for the BOIR (Beneficial Ownership Information Report), which was supposed to be mandatory for all businesses in the US, uses some kind of scripting, which I suspect is JS. I had to spin up a Windows machine in VirtualBox to get the right version of Adobe Acrobat to fill it in, as Adobe dropped support for Linux years ago. Why FinCEN didn't just create an SPA or a web form for this is a mystery to me.

The BOIR is an important tool for helping investigate money laundering. Its constitutionality is in doubt, apparently by people who think money laundering is a good thing.


Same reasons JS in HTML documents is useful??

Frontend form validation. Hide/show widgets, etc



