Great read - the lengths the attacker went to, not only to hide it but also to build the trust needed to pull this off, are incredible. So calculated.
One can only imagine the panic and despair they felt as the security community lit up over the issue, despite a near-zero chance of repercussions.
Very clever - those involved in catching this are, quietly, heroes
The use of a trie structure to compare strings without hardcoding raw, searchable string values is very inspired.
But yes, some black hat had a really, really bad day: being so close to owning any ssh-enabled, recently-updated box, and when they could finally see the finish line it just went poof... gone in a day. It was two years of effort from at least one guy, and more likely a team.
> The use of a trie structure to compare strings without hardcoding raw, searchable string values is very inspired.
Aside, this is one of my grumpy-old-dude opinions when it comes to regular non-malicious code: Source "greppability" is a virtue, one that must be balanced against "don't repeat yourself" and "compile time checking", etc.
Some examples off the top of my head:
1. One import/alias per line, no wildcards.
2. If you have to concatenate a literal string across multiple lines, try to ensure the break isn't inside an interesting substring someone would search for.
3. If you write a compile-time constant like foo=4*6*9, the comments around it should also contain the result. Sure, someone might screw up and forget to keep the comment in-sync, but it can also be invaluable when someone else is tearing their hair out trying to find out why the production logs are saying nothing but "Illegal frump: 216".
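That third point can be sketched in a few lines (the names `MAX_FRUMP_SIZE` and `check_frump`, and the "frump" message itself, are invented for illustration):

```python
# Hypothetical constant whose computed value appears in a comment, so that
# someone grepping the source for "216" (from the logs) lands right here.

# MAX_FRUMP_SIZE = 4 * 6 * 9 = 216   <-- greppable result
MAX_FRUMP_SIZE = 4 * 6 * 9

def check_frump(size: int) -> None:
    # The log line "Illegal frump: 216" is what someone will be searching for.
    if size > MAX_FRUMP_SIZE:
        raise ValueError(f"Illegal frump: {MAX_FRUMP_SIZE}")
```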
Don't get me wrong, I love a good IDE, and many will offer "Find Usages" or "Find Literal"... but it doesn't always work all the time, for all languages in the project at once, for everyone on a team who might use other tools, or during a PR review through a web-browser, etc.
Along the lines of source greppability, at some point someone will try to replace successive a.foo_x, a.foo_y, and a.foo_z accesses with a loop over foo_{suffix}, thus breaking any attempt to grep for foo_y. That someone may be me! This should almost always be struck down in code review wherever possible. Being a bit more DRY isn’t worth it!
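A toy illustration of that trade-off (the `Config` class and the `foo_*` names are made up):

```python
# "DRY-ing up" repeated attribute accesses with a dynamic loop saves a couple
# of lines, but silently breaks grepping for "foo_y".

class Config:
    foo_x = 1
    foo_y = 2
    foo_z = 3

# Greppable: a search for "foo_y" finds this line.
total_greppable = Config.foo_x + Config.foo_y + Config.foo_z

# Not greppable: "foo_y" never appears literally anywhere in the source.
total_dry = sum(getattr(Config, f"foo_{s}") for s in "xyz")

assert total_greppable == total_dry
```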
If you have to concatenate a literal string across multiple lines, try to ensure the break isn't inside an interesting substring someone would search for.
This is called semantic line breaks, and it also helps keep (admittedly more naive) diff tools producing sensible output.
On the other hand, the shorter version communicates "five" without a human needing to count them up. Either way, some grep-ability would come from hits on samples within unit tests.
I think it goes beyond just this particular attempt. I bet they were looking for other targets, targets that are widely-used but maintained by resource-deficient projects. They wanted to do maximum damage with the fewest possible supply chain attacks.
The fact that many projects are now searching deep and wide through their commit histories, looking at just who they're taking code from, beginning to develop frameworks for attack mitigation and remediation... an entire type of previously very-promising attack is completely burned. This has been a massive defeat. And it happened all by chance.
>an entire type of previously very-promising attack is completely burned.
I fear it's not just the attack that is burned. If new contributors have to be distrusted and/or go through some sort of vetting that isn't based on the merit of their contributions, that is a terrible blow to the entire open source movement.
The threshold for young coders without a great deal of history and without a network of contacts to become contributors to important open source projects has just gone up massively.
Nah. Hardly going to change anything for the worse, only for the better.
The bar is still trivial. The worst part is that it will be difficult to be anonymous. There are a lot of valid, even life & death reasons for anonymity, and that will now be much harder.
So the loss is all the contributors who don't dare let their government or employer see them helping the wrong projects, or in some cases don't want to disclose that they are any good at coding or particularly interested in security etc.
I bet a bunch of employers don’t want any unauthorized contributions to open source. For governments it seems much more niche: only a few specific projects would raise red flags.
The average business still isn’t going to have any clue who GitHub user coderdude57 is in real life. But coderdude57 may be forced to hop on a video meeting to prove who he is to a project lead before his code is accepted.
> The main vector of attack was patching the code as part of running tests.
No, it was not.
The test files were used as carriers for the bulk of the malicious code, but running the tests did nothing malicious; the supposedly corrupted test file did act as a corrupted file in the test. What extracted and executed the code hidden in these test files was a bit of code injected in the configuration steps, which in turn modified the compilation steps to extract and execute more code hidden in these test files, which in turn extracted and injected the backdoor code. This all happened even if the tests were never run.
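Purely as a schematic sketch of the general pattern described above (a file that reads as corrupt test data, while a separate build-time step knows how to carve a payload out of it), here is a toy version. The marker, offsets, and contents are invented; the actual xz chain was a far more elaborate multi-stage shell/awk pipeline:

```python
# NOT the actual xz mechanism - an invented, minimal illustration only.
# A "corrupted" test file still behaves as corrupt data when fed to a real
# decompressor; only injected build code knows where the payload hides.

MARKER = b"\xde\xad\xbe\xef"  # invented sentinel

def make_test_file(payload: bytes) -> bytes:
    # Junk bytes that fail any real decompression, plus the hidden payload.
    return b"\x00garbage-not-valid-xz\x00" + MARKER + payload

def build_step_extract(blob: bytes) -> bytes:
    # Only the injected configure/Makefile code knows to look past the marker.
    return blob.split(MARKER, 1)[1]

blob = make_test_file(b"echo owned")
assert build_step_extract(blob) == b"echo owned"
```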
> I would expect to see more projects try to protect against this in general by attempting to separate the building and testing procedures.
The suggestions I've seen on this line of thinking were to not only separate the building and testing steps (so that the testing steps cannot affect the output of the build step), but also to remove all test files before doing the build, and adding them back for the testing step. The important part being to remove all binary test files before doing the build, and having two separate steps only as a side effect of not being able to test without putting the test files back.
I don't think this approach is going to be adopted, as it's too much work for little gain; people are focusing more on making sure all files in the tarball used for the build either come from the corresponding tag in the repository, or can be proven to have been deterministically generated from files found in the corresponding tag in the repository. That would have either caught or ignored the key file added to the tarball which started the extraction and injection steps.
How much effort was it really? Yes, they worked on it over two years, but I'd guess it wasn't more than a few hours every other week (apart from engineering the actual exploit). After all, putting in a full-time effort as an unpaid contributor would be suspicious in itself.
Assuming they work 40 hours a week and are doing this in a team (presumably every major player has such a team or is scrambling to get one now), one must expect many potentially infiltrated projects out there.
“Apart from engineering” is doing some seriously heavy lifting here. Writing the actual code is likely just an afterthought by comparison. The engineering - identification of a target weakness, the design of the exploit chain, etc. - is overwhelmingly going to have been the lion’s share of the effort.
Governments are made up of people. There is likely at least 1 real person (if not more) for whom this attack has been the entirety of their professional life for years.
If so, then they were paid for those hours and are perfectly whole right now. Maybe their career takes a dent but I do not weep for whoever this was no matter which government they worked for, unless it was so bad they were actually a slave and forced to do this against their will.
The government who paid them just lost the investment, but who cares about them? It's only a good thing if this doesn't get a reputation for being a good investment.
If it was a criminal, then the same as the government they lost the investment but again that's only a good thing.
> > and when they can finally see a finish line it is just poof... gone in a day. It was two years of effort from at least one guy and more likely a team.
> If so, then they were paid for those hours and are perfectly whole right now.
People are not ideal money-fed automatons. Even if you're fully paid for your software development, it feels bad to see all your work thrown away.
If only I had also said something like "maybe their career takes a dent but I do not weep for anyone who worked on this" to indicate that I understood that obvious point.
> > People are not ideal money-fed automatons. Even if you're fully paid for your software development, it feels bad to see all your work thrown away.
> maybe their career takes a dent but I do not weep for anyone who worked on this
Even if they were fully paid, and even if their career is not affected at all (or even improves, "I wrote the xz backdoor" would be an impressive line in a curriculum if you disregard the moral aspects), it can still feel bad to see your work thrown away so close to the finish line. People are not career-driven automatons.
But I agree with you, I do not feel bad for the author of this backdoor; whoever did that does deserve to see this work unceremoniously dumped into the trash. But I can understand why the ones who worked on that code would feel bad about it.
I don't see why; news coverage has pretty uniformly taken the view that "this guy was an evil genius, and we escaped his incredibly well-executed plot by the skin of our teeth".
The statement was that someone had a really bad day.
The implication that people are driven by money and nothing else, or even that they have no right to feel like they had a bad day if they were paid, is absurd.
Nobody is saying that you should be sympathetic. It’s just an interesting comment: An interesting thing to think about. A worthwhile contribution to the conversation. This was someone’s very bad ending to a very long project.
I agree, I had never heard of using a trie as a means of hiding strings.
I'm not familiar with any programming language that provides tries in a standardized way. They're not hard to code, though, so I wonder if this will become a trend in future malware.
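They really are only a few lines. Here is a toy sketch of the hiding technique being discussed: the matched string never appears as a literal, only as a trie built from arithmetic on byte values. This is a simplified illustration, not the actual C implementation from the backdoor:

```python
# Toy trie that recognizes a "sensitive" string without that string ever
# appearing as a greppable literal in the source.

def add(trie: dict, word: bytes) -> None:
    node = trie
    for b in word:
        node = node.setdefault(b, {})
    node[None] = True  # end-of-word marker

def contains(trie: dict, word: bytes) -> bool:
    node = trie
    for b in word:
        if b not in node:
            return False
        node = node[b]
    return None in node

trie: dict = {}
# Build "ssh" from arithmetic on byte values, so grep finds no literal.
add(trie, bytes([0x72 + 1, 0x72 + 1, 0x67 + 1]))

assert contains(trie, b"ssh")
assert not contains(trie, b"ssl")
```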
That they were harassing and manipulating a lone, unthanked maintainer who had already told them he was dealing with mental issues makes them evil, IMO.
The honorable thing for "Jia" to do after this epic failure is seppuku, or whatever his or her local equivalent is.
Nobody sees themselves as the bad guy, and that’s not the same as “some people are just fundamentally selfish”. There are definitely loads of people that’d feel like the end justifies the means. There are plenty of people for whom a workday involves doing far worse things for the world than cyberbullying one person, and will look you in the eye and justify it. Plenty of that stuff is socially acceptable in many many mainstream circles. “Being mean to a maintainer” is just one that this community is especially sensitive to, because it involves a highly personified victim that they can relate to.
These maintainers add vast amounts of value to the modern world, though most of the people that benefit indirectly from their work can't really conceive of what it is they do.
People like "Jia" are pure parasites. It's one of the best cases of "why we can't have nice things" I've ever seen.
Yeah, they add vast amounts of "value", including (accidentally) reinforcing the status-quo.
It would definitely be interesting to see what would happen if the attack wasn't noticed, but instead people focus their interest on attacking Jia Tan because "wow, that guy is one hell of an asshole, it sure is a great thing that he failed!".
Whether or not this attack was the rare one that failed out of many similar ones is largely irrelevant to people. Quick, discuss this one particular case where we noticed it, news flash and all.
> People like "Jia" are pure parasites
They are "parasites" because they don't do what they are "supposed to"? That's pretty crazy. I guess every person that's doing what matches their interests is somehow a bad person/parasite/etc. Or is that only if what they do is forbidden by some rule book? Do you see what I'm getting at here?
HFT is also harmful, and so are the majority of startups that don't do anything actually useful and just take VC money. Those are just a few examples off the top of my head.
What I'm saying is that there are a lot more technically legal ways to profit that harm society, some of them more nefarious than what Jia Tan did.
Doing things that are bad for the society in a fucked up society seems justifiable. It doesn't necessarily make you a bad person.
People just have a more averse reaction to things that are obviously bad, even if in practice there are way worse things that initially seem innocuous and are actually legal to do. That's just the textbook example of hypocrisy.
The fundamental question remains: considering the in-depth knowledge of low-level programming (a small-scale disassembler), systems programming, and the practical application of data structures (i.e. using a trie for all string operations to avoid string constants), was it a state-sponsored attack, or was it a one-man show done for personal gain? Attempting to backdoor nearly every Linux instance of sshd goes beyond the definition of merely brazen.
> The fundamental question remains: considering the in-depth knowledge of low-level programming (a small-scale disassembler), systems programming, and the practical application of data structures (i.e. using a trie for all string operations to avoid string constants), was it a state-sponsored attack
I think you're missing where the difficulty is. I'd argue the technical knowledge isn't the hard part here. You can find more than enough folks who know how to do everything you listed outside of any state-sponsored context. The hard part is the methodical execution: the opsec, the backup plans, plausible deniability, false identities, knowing how to play the long game, being able to precisely target what you want, understanding human psychological tendencies, identifying unforeseen security vulnerabilities, etc. These are the things that I imagine are likely to distinguish talented individuals from state-sponsored actors, not something like "knows tries and x86 assembly".
And taking all those into account, this absolutely smells state-sponsored. I just see no way that a private entity had enough of an incentive, foresight, experience, and patience to play such a long game with such careful maneuvers.
I do not think I am missing anything: the supply chain attack, which included playing the long game and subverting the trust of the development community, is the real issue, and one the open source community has no defences against. The thwarted attack surpassed the scale of all previous supply chain attacks on the Node.js, Python, and similar ecosystems, and went deep down into the low-level technical layers as well.
The assault was comprehensive, holistic and systematic in its approach – this article does not mention it, but other reports have indicated that the person behind it also managed to compromise the PKI layer at the edge between OpenSSL and sshd which brings an extra level of complexity to the backdoor.
A lot of the filter is just limited time and energy. A college kid has plenty of that, but not the right skills. There are more than a small number of people with the right skills, but they have day jobs. That's what makes me think whoever was working on this was being paid upfront to do so. That said, an obsessive with an easy day job and not a ton of family responsibilities is always a possibility.
> There are more than a small number of people with the right skills, but they have day jobs. That's what makes me think whoever was working on this was being paid upfront to do so.
Doesn't this apply to many OSS contributors? People with skills and free time?
That nobody is talking about the person's IP addresses (the xz project was hosted on a personal server of the maintainer), or any details about their actions indicates to me it was a state actor and the original xz maintainer is cooperating with law enforcement to uncover their actions.
From that article, "To further investigate, we can try to see if he worked on weekends or weekdays: was this a hobbyist or was he paid to do this? The most common working days for Jia were Tue (86), Wed (85), Thu (89), and Fri (79)." That makes it more likely this work was done during working hours; someone doing things outside of work hours would be more likely to produce the same amount (or more) on weekends and holidays.
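The quoted weekday analysis is easy to reproduce on any commit log. A toy version, using invented sample dates rather than the actual commit history:

```python
# Counting commits per weekday to guess whether the work happened on a
# job-like schedule. These dates are made-up sample data, not Jia Tan's.

from collections import Counter
from datetime import date

commits = [date(2023, 1, 3), date(2023, 1, 4), date(2023, 1, 5),
           date(2023, 1, 10), date(2023, 1, 14)]  # the last is a Saturday

by_day = Counter(d.weekday() for d in commits)  # Monday == 0 ... Sunday == 6
weekend = by_day.get(5, 0) + by_day.get(6, 0)
weekday = sum(by_day.values()) - weekend

# A strong weekday skew is (weak) evidence of work done on the clock.
assert weekday > weekend
```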
Whilst the IP addresses and email headers etc. should be examined meticulously, in the distant hope that they lead somewhere, the chances are that they won't. Very basic opsec.
Or they are totally nonplussed, because they are not even a single individual, but a state actor who has been doing this on 10-20 other projects for the past 5-10 years which haven't got the same attention. This is their job; being compromised is always a risk.
Not if this is one of a few dozen or few hundred similar ongoing operations. The risk is always there, they have to expect some amount of failure. Open source software is constantly being probed for vulnerabilities in every way possible, from random commits to high level members of committees for standards. Every operation is one in a grab bag of disposable and deniable efforts.
I also am a little biased and assume being burned in state-sponsored acts is similar to the no-blame culture of breaking infrastructure in tech :) because by all accounts this compromise was extremely well done, until it wasn't.
Also, we can't be sure the compromise wasn't intentionally telegraphed to cause some other action (using a different library) on purpose.
>Not if this is one of a few dozen or few hundred similar ongoing operations. The risk is always there, they have to expect some amount of failure.
That actually makes me think it's not happening at a larger scale, since we'd likely have heard of at least a few similarly elaborate cases being uncovered by now. If not during the attempt itself, then at least at some later point in time.
Either almost all of these operations remain undetected, because they are even more sophisticated and much of the world's software ecosystem has been secretly compromised for years or there aren't actually that many such operations.
Unrelated note: I had always thought "nonplussed" was basically a synonym for something like "bewildering confusion." But the way you used it in this context suggested the exact opposite. It turns out that "nonplussed" has also come to mean "unperturbed": https://en.wiktionary.org/wiki/nonplussed
Quite confusing, because the two different meanings are nearly opposite to one another.
> Quite confusing, because the two different meanings are nearly opposite to one another.
It's pretty easy to see where the innovative sense came from: "plussed" doesn't mean anything, but "non" is clearly negative. So when you encounter the word, you can tell that it describes (a) a reaction in which (b) something doesn't happen. So everyone independently guesses that it means failing to have much of a reaction, and when everyone thinks a word means something, then it does mean that thing.
You see the same thing happen with "inflammable", where everyone is aware that "in" means "not" and "flame" means "fire". (Except that in the original sense, in is an intensifying prefix rather than a negative prefix. This doesn't occur in many other English words, although "inflammation" and "inflamed" aren't rare. Maybe "infatuate".)
Here’s what I don’t get: why the many layers of obfuscation in the build phase? (I understand why in the binary linked into sshd.)
Once the first stage, extracting a shell script from one of the "test" data blobs, had been found, it was clear to everybody that something fishy was going on.
It’s inconceivable that I would have found the first stage and just given up; from there it was "only" a matter of tedious shell reversing…
They could easily have done without the "striping" or the "awk RC4", but that must have complicated their internal testing and development quite a bit.
> It’s inconceivable that I would have found the first stage and just given up
But what you were looking at might not be the first stage.
You might be looking at the modified Makefile. You might be looking at the object files generated during the build. You might be looking at the build logs. You might be investigating a linking failure. The reason for so many layers of obfuscation is that the attacker had no idea at which layer the good guys would start looking; at each point, they tried to hide in the noise of the corresponding build-system step.
In the end, this was caught not at the build steps, but at the runtime injection steps; in a bit of poetic justice, all this obfuscation work caused so much slowdown that the obfuscation itself made it more visible. As tvtropes would say, this was a "Revealing Cover-Up" (https://tvtropes.org/pmwiki/pmwiki.php/Main/RevealingCoverup) (warning: tvtropes can be addictive)
Reduces the attack area through which this could be found, I expect. Without all the obfuscation someone might spot suspicious data in the test data or at some other stage, but this basically forces them to find the single line of suspicious shell script and follow the trail to find the rest of the stuff added to the build process.
> Here’s what I don’t get: why the many layers of obfuscation in the build phase?
For a one-of-its-kind deployment it would probably not matter. However, deploying to multiple targets using the same basic approach would allow all of them to be found once one was discovered. With some mildly confusing but different scripting for each target, systematic detection of the others becomes more difficult.
Those who caught it were indeed very clever. But the attacker did make mistakes, particularly the Valgrind issues and large performance regression. That ultimately is what raised suspicions.
"Large" is doing quite a bit of work here: the engineer who found it had disabled turbo boost and was, by at least a little serendipity, quite performance focused at the time rather than being busy on other elements of their release target.
I wonder then, whether the attacker has already started over with a new open source project, or found something else to do.
Or was xz only one of multiple targets where the overall plan didn't actually fail and an updated backdoor, without the performance issues, will soon be injected again through some other project.
I thought I read several years ago, maybe even 10 or 15, that they could fingerprint people by communication patterns. Like how you can sometimes recognize an author just by the way they write, just by the text itself.
Surely that is something the current AIs could do 100x better?
So I wonder if there isn't a business there in scanning anonymous posts to associate them with others to deanonymize them all?
You could obviously use an AI to rewrite your text, but I bet there is something there that can still be correlated. Simply avoiding your favorite words and phrases doesn't change the essence of whatever you're saying.
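A toy sketch of the kind of correlation being described, using cosine similarity over word frequencies. Real stylometry uses much richer features (function words, punctuation habits, character n-grams), and the sample sentences here are invented:

```python
# Compare texts by the cosine similarity of their word-frequency vectors.
# Crude, but enough to show that statistical signal survives small rewrites.

from collections import Counter
from math import sqrt

def profile(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

same_author = cosine(profile("i reckon the build is broken again"),
                     profile("i reckon the tests are broken again"))
diff_author = cosine(profile("i reckon the build is broken again"),
                     profile("kindly do the needful and merge"))

# Texts sharing an author's verbal tics score higher than unrelated ones.
assert same_author > diff_author
```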
I think the concept of sockpuppet is about to get complicated. Really it probably already is. If I can imagine something, someone else has probably already been doing it and countering it, and countering that for a decade.
That's a big part of how they caught the Unabomber: unusual spellings of words and ways of saying things. His brother apparently recognised some of these tics.
If a government agency tasked with national security only started caring about a critical home project by Nebraskaman long after Nebraskaman collapsed, that country needs a new government agency.