It's expensive. Really expensive. I remember a major bank calling me and my buddy's 2-man consultancy team and telling me they had spent a small fortune on whatever the top-level access to MS developers is, to get some outdated MS COM component to interface with .NET, and MS had failed.
(We charged ~$20K and estimated two weeks. We had it working in two hours.)
I gotta ask, did you spend a week sucking your teeth after that, or did you hand it to them and say "hey, you're paying for expertise and we got it to you faster than we estimated"?
The correct way is to send the customer the almost-final version and wait for the bug report. This way you show how quickly you can tackle the problem but don't make the task look too easy.
This was back in 2004 (?), so too long ago to remember the details. I remember the phone call though, because the chap that called us said he'd been told we never used the word "impossible."
This is such a common hole. One of my early hacks was a forum that let you upload a pfp but didn't check that it was actually an image. Just upload an ASP file coded to provide an Explorer-like interface. Found the administrator password in a text file. It was "internet", just like that. RDP was open. This was a hosting provider for 4000+ companies. Sent them an email. No thank-you for that one.
Uploading ASP as an image and having it execute server side is one thing.
But in this case, it's subtly different.
This issue relies more on a quirk of how PDF and PostScript relate (PDF is built on a subset of postscript).
Imagine you had an image format that was just C: when compiled and run, it produced the width, the height, and then a stream of RGB values forming an image. And you formalised this so that it had to have a specific structure, so that anyone who wanted to could skip writing a C compiler and just pull the key bits out of this file, which looks like ordinary C, and produce the same result.
Now imagine that your website supports uploading such image files, and you need to render them to produce a thumbnail, but instead of using a minimal implementation of the standard which doesn't need to compile the code, you go ahead and just run gcc on it and run the output.
That's more or less what happened here.
It's worth noting here that it's not really common knowledge that PDF is basically just a subset of postscript. So it's actually a bit less surprising that these guys fell for this, as it's as if C had become some weird language nobody talks about, and GCC became known as "that tool to wrangle that image format" rather than a general purpose C compiler.
The attackers in this case relied on some ghostscript exploits, that's true, but if you never ran the resulting C-image-format binaries, you could still get pwned through GCC exploits.
> it's not really common knowledge that PDF is basically just a subset of postscript.
Because that's not actually true? Check out the table in the PDF specification, Appendix A, p985, listing all the PDF operators and their totally different PostScript equivalents, when there are any: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard...
The PDF imaging model is mostly borrowed from PostScript, though PDF's imaging model also supports partial transparency. The actual files themselves are totally different.
In this case, no PDF files were involved at all, but a PostScript file renamed to .pdf, which was used to exploit the PostScript execution engine (PostScript is a programming language, unlike PDF), or maybe the parser, of an old, insecure GhostScript:
> According to S0I1337, it was done by exploiting a vulnerability on 4chan's outdated GhostScript version from 2012 by uploading a malformed PostScript file renamed to PDF to gain arbitrary code execution as 4chan didn't check if files with PDF extensions were actually PDF files -- https://wiki.soyjak.st/Great_Cuckset, see also the image in A_D_E_P_T's comment https://news.ycombinator.com/item?id=43699395
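That missing check is only a few lines: a PDF file begins with a `%PDF-` version header, while a plain PostScript file conventionally begins with `%!` (e.g. `%!PS-Adobe-3.0`). A hedged sketch (simplified — some real-world readers tolerate junk before the PDF header, so this is a sanity check, not a parser):

```python
def looks_like_pdf(data: bytes) -> bool:
    # A PDF file starts with a "%PDF-x.y" version header.
    return data[:5] == b"%PDF-"

def looks_like_postscript(data: bytes) -> bool:
    # Plain PostScript conventionally starts with "%!", e.g. "%!PS-Adobe-3.0".
    return data[:2] == b"%!"
```

Even this trivial sniff would have flagged a raw PostScript file renamed to .pdf before it reached GhostScript.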
Read section 2.4 of the PDF you linked for a bit of additional information on this "basically".
GhostScript is a PostScript interpreter that can handle PDF files by applying the relatively simple transformations described in that section of the spec. Whether or not they embedded the GhostScript exploit within the PDF isn't particularly important to my point.
That seems like saying "Python is basically a subset of C; just run the simple transformations Cython implements". PDF can be transformed into something a PostScript interpreter can understand in the same way Python can be transformed into something GCC can understand. That is not what "subset" means.
Yes. The section itself says PDF differs significantly from PostScript. The required changes detailed there to transform a PDF to PostScript are substantial: add PostScript implementations of the PDF operators; extract and translate the page content, changing the operator names, decompressing and recompressing text, graphics, and image data, and deleting PDF-only content; translate and insert font data; reorder the content into page order. What you end up with is very different - PDF is not basically just a subset of PostScript.
The substantial differences amount to restrictions on PostScript that reduce it to a declarative language rather than a full-fledged programming language.
A PDF is a collection of isolated, restricted postscript programs (content streams) and the data required for rendering stuffed into one file. The overarching format is a subset of COS. But for all intents and purposes you can imagine this as a tarball containing postscript and other data.
The transformations required to go from PDF to postscript amount to:
1. Include some boilerplate
2. Pull out the content streams (postscript bits) ignoring the pdf-specific extensions
3. Search and replace the names of two procedures
4. Pull out the data required for rendering, optionally decompressing it if your postscript output doesn't support the particular compression in use
5. Concatenate all the data in the right order (on the basis of some metadata in the format)
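Steps 2 and 4 can be sketched for a toy case. Assuming a PDF whose streams are either raw or FlateDecode-compressed (a real file needs a proper COS parser handling xref tables, object streams, and many more filters), pulling out the stream bodies looks roughly like this:

```python
import re
import zlib

def extract_streams(pdf_bytes: bytes) -> list[bytes]:
    """Pull every stream body out of a PDF, inflating FlateDecode ones."""
    bodies = []
    # Stream data sits between the "stream" and "endstream" keywords.
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        body = match.group(1)
        try:
            # decompressobj tolerates trailing bytes (the newline before "endstream")
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # not FlateDecode: keep the raw bytes
        bodies.append(body)
    return bodies
```

What falls out of a content stream is PostScript-style operator sequences like `BT /F1 24 Tf (Hello) Tj ET`, which is exactly why the "subset" intuition holds at the content-stream level even though the container format differs.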
Fun fact, to top it off: The COS format which is the structure behind a PDF, itself looks a lot like postscript, that's because apparently it's originally based on postscript [0] (although it has deviated).
These were fun times. I've been working as a pentester for the past ten years, and the job got a lot harder, with everything using frameworks and containerization.
We still get plenty of results, because the tooling also gets better, and finding just one vulnerability is enough to be devastating, which makes it kind of frustrating. There is tons of progress, but much of it is just not paying dividends.
Germany is definitely top-tier on reducing waste. Before I left the UK it had got to the point where actual landfill-bound unrecyclable trash was a tiny portion of the waste output.
The sad thing is, UK and Germany are tiny compared to all the other countries that don't give a shit.
My favorite is all these letter-soup FireWire-to-USB converters which are just glue and random wires inside and are either completely inert or disastrously damaging to your peripherals:
I literally got a Firewire to USB converter yesterday to try and pull video off a DV Camcorder. A video capture card in the same price range had worked great for letting me stream VHS tapes through OBS Studio.
There are various YouTube videos showing a daisy chain of Thunderbolt 3 to Thunderbolt 1/2 adapters connected to various Firewire cables and adapters. I was hoping to avoid all of that but the camera doesn't show at all. Fortunately, nothing seems damaged on either side.
I've only got 6 tapes so I'm sending them off to a service and sending the adapter back to Amazon.
Wow, I wondered what the limit was. I never checked, but I've been using it hesitantly since I burn up OpenAI's limit as soon as it resets. Thanks for the clarity.
I'm all-in on Deep Research. It can conduct research on niche historical topics that have no central articles in minutes, topics which would typically take me days or weeks to dig into.
I like Deep Research, but as a historian I have to tell you: I've used it on historical themes to calibrate my expectations, and it is a nice tool, but... it can easily brush over nuanced discussions and just return folk wisdom from blogs.
What I love most about history is it has lots of irreducible complexity and poring over the literature, both primary and secondary sources, is often the only way to develop an understanding.
I read Being and Time recently and it has a load of concepts that are defined iteratively. There's a lot wrong with how it's written, but it's an unfinished book written 100 years ago, so I can't complain too much.
Because it's quite long, if I asked Perplexity* to remind me what something meant, it would very rarely return something helpful. But, to be fair, I can't really fault it for being a bit useless with a very difficult text, one with several competing styles of reading whose proponents are each convinced they are correct.
But I started to notice a pattern of it pulling answers from weird spots, especially when I asked it to do deep research. Like a paper on some university's server that uses concepts from the book to ground qualitative research: that's fine, and practical explications are often useful ways into a dense concept, but it's a really weird thing to be the first academic source. It'll draw on Reddit a weird amount too, or it'll somehow pull a page of definitions from a handout for some university tutorial. And it won't default to the peer-reviewed free philosophy encyclopedias that are online and well known.
It's just weird. I was using it to try to reinforce my actual reading of the text, but I came away thinking that in certain domains, this end of AI is letting people conflate having access to information with learning about something.
If you're asking an LLM about a particular text, even if it's a well-known text, you might get significantly better results if you provide said text as part of your prompt (context) instead of asking a model to "recall it from memory".
So something like this: "Here's a PDF file containing Being and Time. Please explain the significance of anxiety (Angst) in the uncovering of Being."
When I've wanted it to not do things like this, I've had good luck directing it to... not look at those sources.
For example when I've wanted to understand an unfolding story better than the news, I've told it to ignore the media and go only to original sources (e.g. speech transcripts, material written by the people involved, etc.)
Deep Search is pretty good for current news stories. I've had it analyze some legal developments in a European nation recently and it gave me a great overview.
LLMs seem fantastic at generalizing broad thought and not great at outliers. They sort of smooth over the knowledge curve confidently, which is a bit like how in psychology only CBT is widely accepted, even though other methodologies are much more effective for particular individuals, just not at the population level.
Interesting use case. My problem is that for niche subjects the crawled pages probably haven't captured the information, and the response becomes irrelevant. Perhaps Gemini will produce better results just because it takes many more pages into account.
I'd echo this somewhat. Wading into other people's projects I'm often like "wtf is this mess.. why are your variables named a, b and c?"
BUT.. some of the code that LLMs have spit out at me is absolutely wild in how well it is constructed, and it gives me real imposter syndrome on the occasions when the code is so much tighter and better than I would have personally crafted.
A large part of that is just following style guides. It's crazy how few people actually read things like PEP8 despite writing tons of code in python. It's not even that much to learn and most is quite logical - all while making your code not just appear professional but also be more readable by others.
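As a toy illustration (both functions invented here), the same logic before and after applying PEP 8's naming and whitespace conventions:

```python
# Before: works, but opaque names and cramped spacing
def getData(a,b):
    c=[x for x in a if x>b]
    return c

# After: identical behavior, PEP 8 naming (snake_case, descriptive),
# spaces around operators, and a docstring
def filter_above_threshold(values, threshold):
    """Return the values that exceed the threshold."""
    return [value for value in values if value > threshold]
```

Nothing about the logic changed; the second version just tells the next reader what the function is for without them having to execute it in their head.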
These platforms all feel like they are being massively subsidized right now. I'm hoping that continues and they just burn investor cash in a race to the bottom.