One of my good friends did a lot of work on PDFs as part of his graduate research. Older versions of Adobe Writer (maybe even the current one too?) would always append and never overwrite. So if you edited pages, it would add those edits to the end of the file. As long as you did everything in the Writer workflow and didn't Save As a new file, you could see a history of old edits. You can even recover text that was blacked out in some government documents.
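For anyone who wants to try this, here is a minimal Python sketch of the idea. It assumes each appended save ends with its own `%%EOF` marker, so cutting the file off after each marker gives you the document as it stood at that save. The name `split_revisions` is made up for illustration. It's a heuristic: linearized PDFs have two `%%EOF` markers without any edits, and the marker can also show up by chance inside binary stream data.

```python
import re

# Matches an end-of-file marker plus the line ending that follows it, if any.
EOF_MARKER = re.compile(rb"%%EOF[ \t]*(?:\r\n|\r|\n)?")


def split_revisions(data: bytes) -> list[bytes]:
    """Return one candidate file per %%EOF marker, oldest first.

    Each candidate is the prefix of the file ending at that marker, since
    an appended save leaves the earlier bytes untouched.
    """
    return [data[: match.end()] for match in EOF_MARKER.finditer(data)]
```

Read the file with `open(path, "rb").read()`, pass the bytes in, and write each returned element to its own file. If you get more than one back, the earlier ones are worth opening in a viewer.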
I cannot recommend qpdf [1] enough if you want to play around with PDFs.
Aside from being an excellent PDF manipulation library, it also has a mode (QDF mode) where it outputs a version of the PDF that is much easier to manipulate with a text editor, and then lets you build a new PDF from that.
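A rough sketch of that round trip, driven from Python, in case it helps. It assumes the `qpdf` and `fix-qdf` executables are installed and on your PATH; the function names and file paths are placeholders I made up.

```python
import subprocess


def qdf_command(src: str, dst: str) -> list[str]:
    """Build the qpdf call that writes an editor-friendly (QDF) copy of src."""
    # --qdf uncompresses and normalizes the file; disabling object streams
    # keeps every object visible as plain text.
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]


def to_qdf(src: str, dst: str) -> None:
    """Run qpdf to produce the QDF copy (requires qpdf to be installed)."""
    subprocess.run(qdf_command(src, dst), check=True)


def repair_qdf(edited: str, repaired: str) -> None:
    """After hand edits, let fix-qdf recompute offsets and stream lengths."""
    # fix-qdf writes the repaired file to standard output.
    with open(repaired, "wb") as out:
        subprocess.run(["fix-qdf", edited], check=True, stdout=out)
```

So the flow would be `to_qdf("in.pdf", "editable.pdf")`, edit `editable.pdf` in a text editor, then `repair_qdf("editable.pdf", "out.pdf")`.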
Shout out to Jay, who has been steadily working on it for many, many years. He is the most kind, understanding and hard-working free software developer I've had the pleasure to cross paths with. Thanks for all your hard work, Jay!
Not necessarily. There's nothing that says old content is preserved in inaccessible streams... the entire PDF file can be rewritten, discarding all old content.
This is by design and not surprising at all if you read even a tiny bit about PDF. It's the default save method in nearly every PDF-capable application; rewriting the whole PDF is actually the less common method. I'm surprised a researcher of PDF would be surprised by that.
However, if you are using something like a redaction tool, then the software should forbid you from saving in append mode. This was a common error in old PDF apps, and perhaps in newer ones too.
Edit for politeness:
My surprise is aimed at the researcher, not you :)
1. the one or ones being addressed
2. ONE sense 2a (which is “being one in particular”)
So, pro tip: in chat-like discussions with strangers, such as on Hacker News, one should prefer saying “one” when using sense 2, even if it sounds a bit archaic (at least to me. Is it?)
Also, when reading a “you” that could be interpreted both ways, do not assume it is used in sense 1.
I also found a thread talking about searching PDFs for specific queries (https://news.ycombinator.com/item?id=10154527) which appears to have generated some interesting results back when the thread was posted, in 2015.
Not seeing anything recent though. But on the subject of a search engine specifically for finding redacted content, I couldn't help but imagine the discussion...
"Hi, I would like to find a •••••••."
"You specifically want a •••••••?"
"Yes, literally."
[Person 2 walks away scratching their head wondering what person 1 would do with a 'hunter2']
Not really. This type of save, appending changes at the end, used to be fairly common (I assume for performance reasons on big docs, back when computers were much more constrained). Microsoft Word did the same thing back in the day.
It's not only common, it's still the way PDFs are usually saved. Open a PDF in a text editor (the structure of a PDF is mostly readable text, even though the streams inside are usually compressed binary) and you can see each edit appended at the end as new objects followed by its own cross-reference table and "trailer".
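If you'd rather not eyeball it in a text editor, here is a small Python sketch that follows the chain. It assumes the usual layout: the last `startxref` in the file gives the offset of the newest cross-reference section, and each trailer carries a `/Prev` entry pointing at the one before it. The function name `xref_offsets` is made up, and the regex matching is a shortcut, not a real PDF parser.

```python
import re


def xref_offsets(data: bytes) -> list[int]:
    """Return cross-reference section offsets, newest first, via /Prev links."""
    starts = re.findall(rb"startxref\s+(\d+)", data)
    if not starts:
        return []

    offsets: list[int] = []
    seen: set[int] = set()
    offset = int(starts[-1])  # the last startxref points at the newest section

    # Stop on a repeated or out-of-range offset so a damaged file cannot loop.
    while offset not in seen and 0 <= offset < len(data):
        seen.add(offset)
        offsets.append(offset)
        # The trailer for this section sits between it and the next startxref.
        end = data.find(b"startxref", offset)
        section = data[offset : end if end != -1 else len(data)]
        prev = re.search(rb"/Prev\s+(\d+)", section)
        if prev is None:
            break  # oldest revision reached
        offset = int(prev.group(1))
    return offsets
```

More than one offset in the result suggests the file has been saved incrementally at least once, although linearized files also use `/Prev` and would be flagged too.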
This sounds like a feature that could be exploited in creative ways in either a product or some fun side project. I don't know what it is exactly, but there's something about never editing edit logs (possibly with that not being obvious to the user as a factor) plus some graphical UI representation or UX flow (besides undo).