Not too horrifying: when I open the Breakout PDF in Preview, it just displays a ...

wffurr · on Jan 2, 2021

They’re not, though. Just try extracting the document structure to e.g. power an accessibility system like a screen reader, and you rapidly find out that the text is an unstructured bag of characters and positions with no semantic information at all. No paragraphs, no marked headings, not even word boundaries. You have to try to infer all of that from glyph proximity and relative font size.
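To make the "infer from proximity" part concrete, here is a rough sketch of the kind of heuristic an extractor ends up writing. Everything in it is an assumption for illustration: the `Glyph` record, the `infer_words` helper, and the `gap_ratio` / `line_tol` thresholds are hypothetical, not taken from any real PDF library, and real documents (rotated text, kerning, multiple columns) break this easily.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    # Hypothetical record: one character plus where it was painted.
    char: str
    x: float      # left edge
    y: float      # baseline (PDF-style, larger y is higher on the page)
    width: float  # advance width
    size: float   # font size

def infer_words(glyphs, gap_ratio=0.25, line_tol=0.5):
    """Group positioned glyphs into lines of words using geometry only."""
    lines = []  # each entry: [baseline_y, [glyphs on that line]]
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        # Treat glyphs as the same line if baselines are close
        # relative to the font size.
        if lines and abs(lines[-1][0] - g.y) <= line_tol * g.size:
            lines[-1][1].append(g)
        else:
            lines.append([g.y, [g]])

    result = []
    for _, line in lines:
        line.sort(key=lambda g: g.x)
        words = [line[0].char]
        for prev, cur in zip(line, line[1:]):
            # Horizontal gap between the end of one glyph and the next.
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_ratio * cur.size:
                words.append(cur.char)   # big gap: guess a word boundary
            else:
                words[-1] += cur.char    # small gap: same word
        result.append(words)
    return result

# Two lines of text, given only as characters and positions.
page = [
    Glyph("H", 0, 100, 5, 10), Glyph("i", 5, 100, 5, 10),
    Glyph("y", 14, 100, 5, 10), Glyph("o", 19, 100, 5, 10),
    Glyph("o", 0, 80, 5, 10), Glyph("k", 5, 80, 5, 10),
]
words = infer_words(page)
```

Note that the word breaks come entirely from the guessed thresholds. Nothing in the input says where a word, paragraph, or heading starts.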

Wowfunhappy · on Jan 2, 2021

I'm not convinced there's anything you can do about that without losing what makes PDF such a useful format. One of the great things about a PDF is that you can drop a few pieces of paper into a scanner and end up with a PDF in seconds. That wouldn't be possible if you had to care about the underlying markup, as you do when e.g. writing HTML.

Adobe does have tools for creating PDFs that are accessibility-friendly, but it can take hours of work. As much as it sucks for certain audiences, it just doesn't make sense to do that in the general case.

wffurr · on Jan 3, 2021

Whatever source document you just scanned was almost certainly authored in a structured format.

saagarjha · on Jan 2, 2021

To be fair, Preview’s handling of PDFs is somewhat horrifying itself.