Back in the time of FIDO, I accepted a challenge to crack a program (find the correct password) that more or less consisted of a loop simulating a three-address MOV instruction.
The loop jump address sometimes changed for some effectful operations like printing or for optimizations like executing addition.
It took me about four hours to find the correct password. During that time I wrote 1) an executor that used the i386 debug registers to watch for the current MOV addresses, 2) a tracer that produced a trace and 3) a compactor which identified common instruction sequences and presented each as a macrocommand. It turned out the original source code had used macros in the opposite way. The final challenge was to write a brute-force password finder, which is not that hard at all (for a 32-bit checksum).
All in x86 assembler. I guess it was about 1995-96, somewhere there.
Now I'd use the same technique, but at a higher level. Instead of peephole compacting I'd use graph analysis, but that's about it. You can get pretty much everything from the program trace; I think you can get even more information this way than from a disassembly.
So in my opinion, it is one hell of a cool experiment. But try not to use it as a real obfuscation device.
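To give an idea of what a compactor like that does, here is a minimal Python sketch (the original was x86 assembler, and this is not a reconstruction of it). It assumes the trace is just a list of instruction strings; the name `compact_trace`, the fixed window size and the `"MACRO"` token are all made up for illustration.

```python
from collections import Counter

def compact_trace(trace, n=3, min_count=2):
    """Collapse the most frequent run of n instructions into one token.

    Hypothetical helper: returns (compacted_trace, macro_body), or the
    unchanged trace and None when no run repeats often enough.
    """
    # Count every window of n consecutive instructions in the trace.
    windows = Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    if not windows:
        return list(trace), None
    best, count = windows.most_common(1)[0]
    if count < min_count:
        return list(trace), None
    # Rewrite the trace, replacing each occurrence of the run with a token.
    out, i = [], 0
    while i < len(trace):
        if tuple(trace[i:i + n]) == best:
            out.append("MACRO")
            i += n
        else:
            out.append(trace[i])
            i += 1
    return out, best

trace = ["mov a, b", "mov b, c", "mov c, d",
         "mov a, b", "mov b, c", "mov c, d",
         "mov x, y"]
compacted, macro = compact_trace(trace)
print(compacted)  # ['MACRO', 'MACRO', 'mov x, y']
```

Running it repeatedly with different window sizes, and a fresh token name per pass, would give the layered "macrocommand" view; a real tool would probably also have to abstract over operands rather than compare instruction text literally.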
I also remember running into a "forest of MOVs" obfuscation technique while attempting to crack a protection back in the very late 80s, and I remember it so well because it caused me to change my analysis strategy completely. The fun part was that the "interesting" MOVs were hidden amongst other instructions that seemed to perform useful computation, although the results of that computation were just thrown away and the MOVs were doing all the work. At the time I was fond of printing out code and inspecting/annotating it manually, so I think it took several days and lots of careful documenting of the algorithm before I realised that it was all useless; and upon tracing back the source of the actual value used in the decision and crossing out the irrelevant instructions, imagine my surprise when almost all that was left were MOVs...!
That one taught me it was far better to start by working backward from the result, although self-modifying code tends to be more difficult that way.
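The "trace back the source of the value and cross out the rest" step is essentially a backward slice. A rough Python sketch, assuming each executed instruction has already been reduced to a hypothetical `(destination, sources, text)` tuple; it only follows data dependencies and ignores control flow, flags and memory aliasing:

```python
def backward_slice(trace, target):
    """Keep only the instructions that feed the final value of `target`.

    `trace` is a list of (dst, srcs, text) tuples in execution order;
    this tuple format is an assumption made for the example.
    """
    wanted = {target}           # locations whose origin we still need
    kept = []
    for dst, srcs, text in reversed(trace):
        if dst in wanted:
            wanted.discard(dst)  # this write explains dst ...
            wanted.update(srcs)  # ... so now explain its inputs
            kept.append(text)
    kept.reverse()
    return kept

trace = [
    ("eax",  ["input"],      "mov eax, [input]"),
    ("ecx",  ["ecx", "edx"], "add ecx, edx"),     # decoy, result unused
    ("ebx",  ["eax"],        "mov ebx, eax"),
    ("ecx",  ["ecx"],        "shl ecx, 2"),       # decoy, result unused
    ("flag", ["ebx"],        "mov [flag], ebx"),
]
relevant = backward_slice(trace, "flag")
print(relevant)  # ['mov eax, [input]', 'mov ebx, eax', 'mov [flag], ebx']
```

In this toy trace the arithmetic gets crossed out and only MOVs are left, which is the same effect as described above, just on five instructions instead of several days of printouts.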
And this is exactly why traces are better than assembly. You see instructions that were executed, not the code. You can restore the decision tree (and then the graph, most probably) and figure out what is going on.
> And this is exactly why traces are better than assembly. You see instructions that were executed, not the code. You can restore the decision tree (and then the graph, most probably) and figure out what is going on.
I think a good example is the last level (Hollywood) from www.microcorruption.com, which has self-modifying code and is set up such that the actual execution jumps into the middle of what the disassembler thinks are actual instructions. Reading the code is pretty useless, but with a trace of the instructions executed and the register state at each instruction, it's easy to start at the end and follow backwards to the interesting part of the program.
Getting the trace is the tricky bit; I had to write an MSP430 emulator.
(Actually seeing this example requires completing the rest of the levels, but you should do that anyway, especially if this is the sort of thing you're interested in)
Let me hypothetically apply movfuscator to some not-too-complex program and look at the assembly. I believe I'll get nothing useful from it, which is fair. But if I trace the program, I can get some useful results: spotting an array being filled with data (steadily increasing accessed addresses), looping over indices (accesses within some range), seeing through the generated code (the table lookups here and there are identical), etc.
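As a sketch of the first kind of pattern, the Python below scans a list of written addresses for runs with a constant non-zero stride, which is roughly what an array fill looks like in a trace. The function name, the `min_len` threshold and the sample addresses are made up for the example; nothing here comes from a real movfuscator trace.

```python
def constant_stride_runs(addresses, min_len=4):
    """Find runs of accesses separated by one fixed non-zero stride.

    Returns a list of (start_index, length, stride) tuples.
    Hypothetical helper for illustration only.
    """
    runs = []
    i = 0
    while i < len(addresses) - 1:
        stride = addresses[i + 1] - addresses[i]
        j = i + 1
        # Extend the run while the next step keeps the same stride.
        while j + 1 < len(addresses) and addresses[j + 1] - addresses[j] == stride:
            j += 1
        length = j - i + 1
        if stride != 0 and length >= min_len:
            runs.append((i, length, stride))
        i = j  # the last element of this run may start the next one
    return runs

# Made-up write addresses: five 4-byte stores in a row, then noise.
writes = [0x1000, 0x1004, 0x1008, 0x100C, 0x1010, 0x2000, 0x3000, 0x3004]
runs = constant_stride_runs(writes)
print(runs)  # [(0, 5, 4)]
```

With movfuscated code the raw accesses would be buried in table lookups, so in practice one would likely filter the trace down to the data region of interest before looking for strides.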
Strongest possible +1 for the slides if you are at all interested in low-level alchemy. Also see slide 109 for the beginning of a shadow argument that this might actually have some real-world utility, in that the long list of MOVs is virtually immune to comprehension by existing reverse engineering tools and practices.
That's a TTA (https://en.wikipedia.org/wiki/Transport_triggered_architectu... ), where effectively the ALU and other computation units become memory-mapped devices. It's the logical extension of how a lot of microcontrollers which don't have a multiply instruction in their instruction sets, e.g. 8051, will instead have a multiplier unit that's accessed by reading/writing special memory addresses.
That's somewhat different from the move-based code discussed here where the MOVs are actually performing the computation.
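To make the distinction concrete, here is a toy Python model of the transport-triggered idea: the only instruction is a move, and the multiply happens as a side effect of writing to a particular address. The memory map (`MUL_A`, `MUL_B`, `MUL_RESULT`) and the 16-bit result width are invented for the example and do not describe any real microcontroller.

```python
class ToyTTA:
    """Toy move-only machine with a memory-mapped multiplier (hypothetical)."""

    # Made-up addresses for the multiplier's operand and result ports.
    MUL_A, MUL_B, MUL_RESULT = 0xF0, 0xF1, 0xF2

    def __init__(self):
        self.mem = [0] * 256

    def move(self, dst, src):
        """The only instruction: copy mem[src] into mem[dst]."""
        self.mem[dst] = self.mem[src]
        # Writing operand B is the trigger: the unit computes A * B.
        if dst == self.MUL_B:
            product = self.mem[self.MUL_A] * self.mem[self.MUL_B]
            self.mem[self.MUL_RESULT] = product & 0xFFFF

m = ToyTTA()
m.mem[0], m.mem[1] = 6, 7
m.move(ToyTTA.MUL_A, 0)       # load first operand
m.move(ToyTTA.MUL_B, 1)       # load second operand, triggers the multiply
m.move(2, ToyTTA.MUL_RESULT)  # fetch the product
print(m.mem[2])  # 42
```

Here the move is only transport and the "device" behind the address does the arithmetic, whereas in the MOV-only code discussed in this thread there is no such unit and the computation comes out of the moves themselves.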