where salt and hash are known. X is derived from the names of programs existing on a Windows machine with a particular format.
Or, find a way to calculate
md5(md5(...10,000 times...(md5(X + salt')...))
given that hash and salt' are known but X is not.
Or alternatively, attempt a known plain text attack against RC4. Given that a certain amount of plain text is known (4 bytes) at the start of the RC4 payload then it's likely that the first few bytes of the keystream are known and an attack could be mounted via weakness in the RC4 key schedule.
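To make the iterated hash above concrete, here is a minimal sketch of what testing one candidate X would look like. The function names, the byte encoding of X, and the assumption that the salt is only mixed in on the first round are all mine for illustration; the real routine may differ in those details.

```python
import hashlib

def iterated_md5(x: bytes, salt: bytes, rounds: int = 10_000) -> bytes:
    """Hash x + salt once, then re-hash the digest until `rounds` hashes are done.

    Assumption: the salt is appended only before the first round.
    """
    digest = hashlib.md5(x + salt).digest()
    for _ in range(rounds - 1):
        digest = hashlib.md5(digest).digest()
    return digest

def candidate_matches(x: bytes, salt: bytes, target_hash: bytes) -> bool:
    """True if this guess for X reproduces the known hash."""
    return iterated_md5(x, salt) == target_hash
```

Each guess costs 10,000 MD5 calls, so the iteration count only slows a search by a constant factor; what matters is how many candidate values of X there are.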
It should be possible to do a brute force search using a couple of days of EC2 (or insert your favorite cloud provider here). And by brute force I mean you can try a text search, or just go for the raw bytes. I'm also not sure whether a collision would work in this case.
To recover X + salt you'd be looking at a preimage attack on MD5. I am only aware of one preimage attack against MD5 and it's only theoretical.
The input to the RC4 key generator is an MD5 hash, which means you'd be looking at a brute force attack against a 128-bit input, i.e. 2^128 possible values. Even assuming you find the answer after 2^127 tries on average, you are looking at an enormous search space.
According to a recent article EC2 has about 500,000 machines. Now assume that I buy them all and I am able on each machine to check 1,000,000,000 values as inputs to RC4 per second then I should have the answer in 800,000 times the age of the universe. But I think my credit card will have been cancelled first.
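For anyone who wants to check that arithmetic, a quick sketch. The 500,000 machines and 10^9 checks per second come from the comment above; the 13.8 billion year age of the universe and the function name are my own inputs.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # assumed value, not from the thread

def brute_force_universe_ages(key_bits: int = 128,
                              machines: int = 500_000,
                              checks_per_second: float = 1e9) -> float:
    """Expected brute force time, expressed in multiples of the age of the universe."""
    expected_trials = 2 ** (key_bits - 1)  # on average, half the key space
    seconds = expected_trials / (machines * checks_per_second)
    return seconds / SECONDS_PER_YEAR / AGE_OF_UNIVERSE_YEARS

print(f"{brute_force_universe_ages():,.0f} times the age of the universe")
```

With these inputs the result lands a little under 800,000, so the figure above holds up as a round number.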
The question is how large that search space is. If you can get a reliable list of directory names and file names then it might be small, but if you are left iterating characters in filenames (and this appears to be Unicode) then I'd imagine you'd run into the same situation.
I'd be much more tempted to look at the fact that the first four bytes of the RC4 key stream appear to be recoverable and look at key recovery from that.
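As a rough sketch of what "recoverable" means here: RC4 encrypts by XORing the plaintext with a keystream, so XORing the ciphertext with known plaintext gives the keystream back, and a candidate key can be checked against that prefix. The function names below are hypothetical, and the key and plaintext are the standard public RC4 test vector ("Key" / "Plaintext"), not anything taken from Gauss.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Generate the first n bytes of the RC4 keystream for `key`."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key scheduling (KSA)
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):  # output generation (PRGA)
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

def recover_keystream_prefix(ciphertext: bytes, known_plaintext: bytes) -> bytes:
    """Keystream bytes = ciphertext XOR known plaintext, for as many bytes as are known."""
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))

def key_matches_prefix(key: bytes, prefix: bytes) -> bool:
    """Cheap filter: does this candidate key produce the recovered keystream prefix?"""
    return rc4_keystream(key, len(prefix)) == prefix

# Public test vector: key "Key", plaintext "Plaintext".
ciphertext = bytes.fromhex("bbf316e8d940af0ad3")
prefix = recover_keystream_prefix(ciphertext, b"Plai")  # only 4 bytes known
```

Four keystream bytes only give a 32-bit filter on candidate keys; whether that is enough to drive an actual key recovery against a 128-bit key is a separate question.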
Even with a reverse-md5^10000 oracle, you'd only get some bits that hash to the same value as the mysterious pair of strings. Unfortunately the decryption key is derived from the pair of strings themselves, not from their hash. Reversing MD5 is not enough to retrieve the decryption key.
The article mentions "~" as a possible starting point, but "{" is also greater than 7A, which would match all the "InstallShield Installation Information" subfolders.
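A small sketch of that comparison, assuming the check is simply "first character's code point is greater than 0x7A" (the exact test the malware performs is my assumption, and the function name and sample names are made up):

```python
THRESHOLD = 0x7A  # code point of 'z'

def first_char_above_threshold(name: str) -> bool:
    """True if the name starts with a character whose code point exceeds 0x7A."""
    return bool(name) and ord(name[0]) > THRESHOLD

# Hypothetical folder names, for illustration only.
for name in ["~tmp", "{00000000-0000-0000-0000-000000000000}", "zebra", "Program"]:
    print(f"{name!r}: first char 0x{ord(name[0]):04X}, passes={first_char_above_threshold(name)}")
```

In ASCII only "{" (0x7B), "|" (0x7C), "}" (0x7D) and "~" (0x7E) sit above "z", so a GUID-style name in curly braces passes the same test as a name starting with a tilde.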
Great point... are those uniquely named based on the application installed? That might be a nice, oblique way of checking if a particular program is installed.
It seems to me that's a good theory of what it might be looking for, as GUIDs should make good triggers. I wonder if this reduces the search space enough to make brute force feasible now.
My thought exactly - how weird it must be to be inside looking out.
Public key crypto was discovered on the inside long before it was rediscovered on the outside, and I figure that the insiders must have been amused by Diffie, Hellman, R, S, and A.
In 1997, it was publicly disclosed that asymmetric key algorithms were developed by James H. Ellis, Clifford Cocks, and Malcolm Williamson at the Government Communications Headquarters (GCHQ) in the UK in 1973.[4] These researchers independently developed Diffie–Hellman key exchange, and a special case of RSA. The GCHQ cryptographers referred to the technique as "non-secret encryption". This work was named an IEEE Milestone in 2010.[5]
So whilst the malware will infect machines more or less indiscriminately, the payload itself can only be successfully decrypted (and therefore activated and executed), on machines that have a specific set of programs installed?
"the attackers are looking for a very specific program with the name written in an extended character set, such as Arabic or Hebrew, or one that starts with a special symbol such as “~”."
I suppose µTorrent is too obvious... Anyway, these kinds of mysteries help re-ignite my interest in Cryptography. I'd love to hear feedback from a fellow HNer about the course from Udacity (perhaps via email since it will probably be considered off-topic here).
I thought about uTorrent too, but Mu has a hex of 0x03BC. Plus it is popular software, and in Windows its folder in Program Files uses 'u' instead of Mu.
So does this mean that if you're a high profile target, you should immediately add a random folder to all of your computers in the program files directory?
No... it means if you are running a specific program which unlocks the code, you are going to have a bad time (I suppose you could rename all of your program directories though... would that defeat this?)
The full implications of this code are that the attacker already has another channel to access your machine.
It's not much consolation that you now know that you're being targeted by the Program Files entries (they're a major pain to rename). It's likely there are one or more plants inside your operation and they have physical access to the machine, which is considered game over.
Release Gauss into the wild, have your agent in the Fordow nuclear plant make sure he has Gauss on his machine, and then just get him to name the jpgs or text files he wants sent back to the CIA as 'special.jpg' - Gauss nabs them, sends them back through the network of Gauss-infected machines, and hey presto - deniable, encrypted, distributed Dead Drops.
Clever. The font makes it possible for the agent to verify he is on a Gauss machine by visiting seemingly innocuous websites which have code to detect whether the font exists, and then inform him by outputting special text only he knows about. He could receive messages that way too. Once he knows it's a Gauss machine, he can drop his specially named files and they are delivered.
Is the idea that Gauss would act like a secret file katamari, rolling around collecting data while it spreads, and being harvested when it "infects" a creator-controlled machine? It would seem like any direct data transmission would be detectable and investigated with extreme prejudice.
1. It's part of a wider ecosystem: a collecting / infecting / attacking "framework". It seems that attacking uranium enrichment was just a "plug-in".
2. They have designed for multiple infection vectors. Now, if it can get in, it can also get out. I would not be surprised if the family of malware here is also able to hook into outlook.exe, and even piggyback on IE connections. There is no particular reason why a payload cannot be steganographically put into every photo uploaded to Iran's Facebook. Which may not be entirely secure, of course :-)
The possibilities when you have the money and time are incredible.
So, no, something as silly as transmitting over UDP from the agent's laptop back to www.cia.gov is unlikely, but these things will just keep pushing data around and around until it gets either home or to a target.
Sadly, much of the code is out in the open, and is surely being pulled apart by other nation-states and the mafia.
Getting a certain filename onto your computer doesn't sound like a hard problem. Just send them a mail with an attachment of "398rgf90rej243rf.htm" that their email client helpfully extracts for them, or have a file with that name in their web cache when they browse the internet.
Why would you need to trick someone into saving a file with a particular name? You already have malware running on their machine!
Seems much more likely that the check is there to confirm that the payload only runs on specific targets. And, perhaps more importantly, to make recovery and dissection of the payload very difficult for someone without access to the target(s).
If you are a virus and you are too obvious, you are quickly found and eliminated by the "immune" system. So it is important to stay low on hosts where there is no benefit in attacking, using them only as vectors of infection, and to go into full-blown activation mode only when some specific trigger is found.
I was thinking that this program is the bomb, but it's waiting for a trigger. Having a file with a certain name appear on the machine would be that trigger.
I would guess it shouldn't be planted; it is expecting it to be there. Chances are that it is an Arabic name for some program from Siemens or something like that. Or the name of a rich bank client used to connect to a Swiss bank, or something of that sort.
The key, of course, is to lie low and undetected until that trigger fires; otherwise, anti-virus companies will blow the whistle.
"2. Append the pair with the second hard-coded 16-byte salt and bytes 0x15, 0x00 " and assuming point 2 of my message above:
This gives a fingerprint of all the programs actually in use. This fingerprint should be specific to roughly 1 in 10^7.
If it is that specific, it limits the scope to preconfigured systems, which are NOT run under user control.
Might it be that those targets are embedded systems like ATMs, mobile base stations and, again, SCADA systems?
Supposedly an administrator could send an update that inserts random files in program files to foil the system identification method, but given that the attacker has such detailed information about the target systems, this seems like a temporary measure at best.
Edit: It looks like the code is only looking for a specific filename. In that case, the only way to thwart this is to rename that file (and fix any issues that this would cause).
a good point raised in the comments is that the "arabic or hebrew" part really meant to say a "non-letter us-ascii value including curly brackets, tilde, and pipe". not sure why anyone would want to jump the gun on narrowing down geography in this way.