In Cryptography, Advances in Program Obfuscation (simonsfoundation.org)
107 points by digital55 on Jan 30, 2014 | 47 comments



Fun fact: The Flame virus had several layers of self-encryption. It was basically an onion, with each "layer" being encrypted executable code. The outer layer was typically the only one active at any given time. It would detect some part of the system configuration (like part of the list of installed programs), derive a decryption key from that, and try to decrypt itself. Therefore, if (and only if) the system had a certain set of installed programs, the Flame virus would decrypt its next layer successfully. That way it avoided exposing its entire codebase to virus researchers, because the researchers' systems probably wouldn't have the correct list of installed programs. (Is there a name for this technique?)
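
Roughly, the trick could look like this (a hypothetical sketch, not Flame's actual code; it assumes Python's hashlib plus the cryptography package's AESGCM purely for illustration):

    import hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def derive_key(installed_programs):
        # Hash the sorted list of installed programs into a 256-bit key.
        fingerprint = "\n".join(sorted(installed_programs)).encode()
        return hashlib.sha256(fingerprint).digest()

    def try_unwrap_next_layer(nonce, encrypted_layer, installed_programs):
        # Authenticated decryption succeeds only on a machine whose
        # configuration matches what the author expected; anywhere else
        # the next layer stays an opaque blob.
        key = derive_key(installed_programs)
        try:
            return AESGCM(key).decrypt(nonce, encrypted_layer, None)
        except Exception:
            return None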

Apparently, it didn't work very well iirc. The researchers figured out how to get into all of the layers somehow, though I forget how. But the idea was clever... I wonder if something similar could work well in practice.

(This isn't the same thing as what was presented in the article, of course, but it's a lot easier to accomplish in practice.)


Anything like that is not really fundamentally different from a password, and for the program to work correctly (decrypt itself) it has to collect the password somehow. The outer layer that does this really has no defense other than obfuscation, which is difficult because in order to do anything useful you almost always have to interact with the system you're running on, and those interactions give away what you're doing.

And if you're decrypting a program and doing something that people notice, it's a matter of time before someone is able to capture the decrypted code and work backwards from that.

The decryption routine is a rather obvious place to look; does the routine check if something is decrypted properly before jumping into that code, or does it decrypt, jump, and crash if the key is wrong? If it checks, then that's the ideal place to stop the program, dump the memory, and see what's going on. If it doesn't check, you're going to have malware that's crashing all the time and fairly easy to detect for that reason. If you're trying to do something under the radar, the last thing you want is unstable malware.

So yeah, traditional obfuscation is just a speed bump, and too much of it might make things detectable. The underlying problem is that the software has to DO something useful (read a file, send network packets, etc), and it's extremely hard to hide that on hardware you don't control.

But like any security, depending on the goal, maybe a speed bump is good enough.


But consider that the decryption key for the outer layer is something that is unique, such as the machine key. Or the serial number of a device attached to the box. Or even the mac address of a network device.

Thus, only in that particular targeted environment will the next layer be revealed. While it is true that, as you say, "anything like that is not really fundamentally different from a password," that password might require physical possession of the target machine.


This is an amazing idea. Someone might have been able to do this for the final code for something like Stuxnet since they knew the target software. Really brilliant.


It seems like with Stuxnet it was a serial number from the attached hardware.


The point is that you'd have to read that data somehow, and likely from a user-level process, so you'd have to make some sort of system call to do that.

Unless the entropy that's the input key is something ephemeral, it's only a matter of time before the key is found. And in the case of malware, you can pretty much assume that the victim/target does have physical possession of the machine.


Yes, it will be very clear from executing the code a step at a time where the key material comes from (MAC address, CPU serial ID, unique property of external hardware), but that does not give you the value of the key derived from that per-installation-specific hardware.

Yes, it is a matter of time, so long as you can run this experiment on every unique hardware environment.


If the malware depends on a sufficient number of system details to decrypt itself, then no amount of code analysis will help decrypt the inner layers. As you say, it's like obtaining a password, and a sufficiently strong password (or sufficiently unique set of system details) cannot even be brute forced before the heat death of the universe.
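
Back of the envelope (a toy calculation; the guess rate is an assumption):

    # Time to exhaust a 128-bit key space at a generous
    # trillion guesses per second.
    keyspace = 2 ** 128
    guesses_per_second = 10 ** 12
    seconds_per_year = 3600 * 24 * 365
    years = keyspace / (guesses_per_second * seconds_per_year)
    print("%.2e years" % years)   # ~1.1e+19 years; the universe is ~1.4e+10 years old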


If by "no amount of code analysis will help" you mean to exclude techniques such as running the damn thing on a known-vulnerable box/VM, then sure. But very few reverse engineers operate under that restriction.

It's like DRM: the user must have the key, so the only security you can have lies in obfuscation.


> ... such as running the damn thing on a known-vulnerable box/VM, ...

But in the case of Flame, wasn't that (one of) the big problem(s)? That the researchers didn't know what, exactly, comprised a known-vulnerable host?


The point of the derived decryption key is that the only system that can produce the key is very likely not in the hands of security researchers.


The name for this is conditional code obfuscation:

http://llvm.org/pubs/2008-02-ImpedingMalwareAnalysis.html


    But if you have a black box obfuscator, creating a
    public key encryption protocol becomes a simple matter
    of choosing your favorite secret-key encryption scheme,
    expressing its workings as a computer program, 
    obfuscating the program, and making the obfuscated
    version widely available. Anyone can then use it to
    encrypt a message to send to you, but no one can tease
    the decryption key out of the obfuscated software.
I cannot imagine downloading and running such an application for which nobody could feasibly determine it was safe, except to run it in some sort of secure sandbox.


Ah,

Further, how could any of us determine that any given program we happened to have downloaded wasn't built with something like this?

If true, this seems to open every nefarious possibility one might imagine. The evil could be in the executable, but it could also be in weird point arithmetic in the source code.

I hope this is just hype or something. (Reading the article, I'm supposing the main problem is that, so far, any program like this will be huge.)


Note that this is if you have a black-box obfuscator, and black-box obfuscators are impossible. (The proof is actually pretty simple; see [0] ("Theorem 4") for a short version.) Although, according to the article, apparently something similar is possible with an indistinguishability obfuscator; I'm not too clear on how that works. In that case, though, where you use an indistinguishability obfuscator, then yes, there is the problem of having a huge program (which is probably also slower).

[0] http://www.scottaaronson.com/blog/?p=392


Then you're not being very imaginative. Instead of assuming this is some giant EXE, assume that the (open source, carefully vetted) host for this code exposes a small VM, with no linkage to system libraries or direct access to memory, whose sole purpose is to unwrap small encrypted blobs.


Hence a secure sandbox, as the parent suggested. Careful when you insult people.


Hm. You're right. That came out way snippier than I meant it to. Thanks.


This combo really is something. I saw your other comment about private-key being the interesting part (and not high-speed symmetric operations). This really is a breakthrough but I am not sure today where it will go.


How can you tell it's not swapping in text like "Fire all the missiles!" (or another message of choice) for whatever you're encrypting?


If you are running an obfuscated blob, it's probably not because you intend to trust it; it's probably because you are offering another person the ability to run some computation in your context, in a way that they can trust. (It wasn't mentioned, but I imagine that trying to falsely run a program like this would also be very difficult; the slightest deviation probably shreds the output. My intuition says this would be a necessary condition for any such obfuscation scheme to stand. Combine this with the obfuscated code signing the result and you're probably looking at something extremely hard to forge.) Your trusted environment and input, their trusted code, and a result that either they trust or perhaps you mutually trust (with some protocol stuff added in).


I haven't read carefully, but this is a description of something akin to white-box encryption, right?

http://eprint.iacr.org/2013/104.pdf


In the follow-up paper, Sahai and Waters turn the weak indistinguishability obfuscation (which, by itself, is not strong enough for the white-box setting) and one-way functions into public-key encryption. This is what white-box (symmetric) encryption does, so presumably it's doable.

It's worth reiterating that this is very much not practical by any standard, and doesn't look anything like the usual software obfuscation used today, not even white-box. This thing requires, at the very least, that multilinear maps and fully homomorphic encryption be anywhere near practical, which is not the case today (and they may end up being like quantum computing, always 10 years away).


This will remain nothing more than a parlor trick. It's pretty much a guarantee, based on how a modern CPU works. You're sure to screw up the branch predictor, caching, and everything else that works to prevent your program from making it feel like 1993 all over again. It would seem like the halting problem would also make an appearance in there somewhere, once you go rearranging how things work. On the upside, I guess compiler writers can drop all their work on optimization.


Yeah, exactly - very similar to the issues with homomorphic encryption in terms of performance hits: https://en.wikipedia.org/wiki/Homomorphic_encryption

Maybe this could happen if we eventually end up with computational power to spare, but I don't think that's in the near future.


> The team’s obfuscator works by transforming a computer program into what Sahai calls a “multilinear jigsaw puzzle.” Each piece of the program gets obfuscated by mixing in random elements that are carefully chosen so that if you run the garbled program in the intended way, the randomness cancels out and the pieces fit together to compute the correct output. But if you try to do anything else with the program, the randomness makes each individual puzzle piece look meaningless.

I know nothing about crypto and AI and even less about neuroscience and this probably sounds stupid - but, is it possible that all the sensory input we receive is just random "sensible nonsense" and our brains partially solve this "multilinear jigsaw puzzle" using a vast neural network?

Considering the fact that brute force AI has been a total failure in recognizing anything, I wonder if our world is similarly obfuscated by sensible nonsense (from the robot's perspective).


It doesn't sound stupid. Given we may exist in a shared CPU space which renders this universe, it would make perfect sense for implementing said universe and our awareness of it. Obfuscating the universe's rules would keep people from jacking with it to the detriment of others. It's bad enough we can screw up other's lives in here as it is. Being able to alter the universe's infrastructure at will wouldn't work out so well.


When I read the article, I thought the opposite: our brain is an obfuscated computing device (due to evolution and its "holographic" nature), and we're trying to deobfuscate what it does on rather simple inputs.

Mostly, we don't succeed by analyzing it fully, but by observation and attempts to build an algorithm on our own. I think one could approach breaking such schemes with a similar strategy.


That seems to propose a lot of crazy shit, without anything to back it up, to explain something that is already pretty well explained: the world is complex.


There is something I don't get there, though. Wouldn't you, with a debugger for instance, be able to see the program as it would have to deobfuscate itself for the processor to understand it?


That presumes that being able to "see the program" reveals anything about the cipher or its key. In the white-box crypto model, it doesn't; the source code is itself "encrypted", so that its basic operations are visible but the precise sequence of operations it will take given a specific input is Hard to determine.


An interesting possibility in the white-box model is the ability to take something invertible (e.g. a block cipher) and instantiate it in a way that cannot be inverted. That would allow building a fixed-key signature mechanism with essentially arbitrary signature size equal to the security level.

Another question is whether usable (i.e. reasonable code and data sizes) and secure (with "hard" in the cryptographic sense, not in the sense of non-obvious) white-box cryptography is actually possible.


But this can't work without a black-box TPM chip that is not under your control anymore... Thus, for this to work, you have to give up general-purpose computing?


No, it does not depend on a TPM.


That's a very conventional method of obfuscation. What the paper proposes is that the program would do a series of random actions which would add up to the original program + a lot of noise.


I don't claim to understand the details at all, but I would at least imagine the obfuscated program to be so complex and with so much indirectness and "virtual-machine"-ness that any kind of single stepping would yield an absolutely incomprehensible state at every point until termination.


The picture of Craig Gentry in the article shows him standing in front of the National Cryptologic Museum at the NSA. :-)


It's an absolutely fascinating proposition. The implication of the technology is that such a program would consume many orders of magnitude more CPU and memory to achieve the same result as a non-obfuscated program, so you'd only use it for core cryptographic processes.


Generally you want cryptographic primitives to run extremely fast though. This just comes across as something useful for the "I need to protect my source from hackers!!!" types who don't care if it makes their program stupidly inefficient. The only difference being that here it can't be reverse engineered (usually a bad thing for people who like FLOSS). Yeah there are imaginary uses for running code in an insecure environment but that comes across as an afterthought and has the same problems.


You want bulk encryption to be fast, but can often handle slow key derivation.
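
For example, a sketch of deliberately slow key derivation using nothing but Python's standard library (the iteration count is just illustrative):

    import hashlib, os

    # Bulk encryption should be fast, but deriving the key from a password
    # can be made deliberately slow to frustrate guessing.
    password = b"correct horse battery staple"
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000)
    # Hundreds of milliseconds of work per guess for an attacker,
    # negligible for a single legitimate login.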


I'd like to know how this affects code generation. Won't it create a huge mess of instructions, and run ridiculously slowly? The article seems to imply yes:

>However, the new obfuscation scheme is far from ready for commercial applications. The technique turns short, simple programs into giant, unwieldy albatrosses


Presuming it's anything like White Box AES, then yes; that's what it will do.


Is this something like being able to come up with a Malbolge program? Being able to read the instructions or trace execution is not relevant, since the execution depends on being able to solve some difficult problem?



That's a different question; there you are acting as the reverse "TPM" chip, sending the processing away and then deciphering the result in a protected space: your computer.

I can't see how you could make a program untraceable on a computer without such a black-box part. Just ask any software cracker.


Homomorphic encryption is like an interpreter that can run encrypted code without decrypting it.*

This is like a compiler that generates obfuscated code (that is as hard to decompile as if it were encrypted).

* Technically, homomorphic encryption refers to performing arithmetic (and other?) functions on encrypted data without decrypting the data. But these operations can then be used to construct an interpreter that treats the data as code.
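
A toy illustration of the footnote, using textbook RSA (unpadded, insecure, tiny primes), which happens to be multiplicatively homomorphic:

    # E(a) * E(b) mod n decrypts to a * b: arithmetic on ciphertexts
    # without ever seeing the plaintexts.
    p, q, e = 61, 53, 17
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

    def enc(m): return pow(m, e, n)
    def dec(c): return pow(c, d, n)

    a, b = 7, 9
    blind_product = (enc(a) * enc(b)) % n
    assert dec(blind_product) == (a * b) % n   # 63, computed "blind"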


Eh... what happens when it starts to do I/O?



