Movfuscator: A single-instruction C compiler

thesz · on Aug 7, 2015

Back in the FIDO days, I accepted a challenge to crack a program (find the correct password) that more or less consisted of a loop simulating a three-address MOV instruction.

The loop's jump address sometimes changed, either for effectful operations like printing or for optimizations like executing an addition directly.

It took me about four hours to find the correct password. In the course of those hours I wrote 1) an executor that used the i386 debug registers to find the addresses of the current MOV, 2) a tracer that produced a trace, and 3) a compactor that identified common instruction sequences and presented them as macro commands. It turned out the original source code had used macros the other way around, to generate those sequences. The final challenge was to write a brute-force password finder, which is not hard at all for a 32-bit checksum.
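A minimal sketch of the compactor idea in Python, assuming the trace is already a list of instruction strings. It greedily finds the most frequent fixed-length run of instructions and replaces every non-overlapping occurrence with a macro name, repeating until nothing repeats. The greedy n-gram approach, the `compact` function and the `MACRO0`-style names are all hypothetical stand-ins; the original tool was written in x86 assembler and its algorithm isn't described.

```python
from collections import Counter

def most_common_ngram(trace, n):
    """Return (sequence, count) for the most frequent length-n window."""
    counts = Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    if not counts:
        return None, 0
    return counts.most_common(1)[0]

def compact(trace, n, min_count=2):
    """Hypothetical compactor: repeatedly replace the most frequent
    length-n instruction sequence with a fresh macro name."""
    assert n >= 2, "n must be >= 2 so every replacement shrinks the trace"
    macros = {}
    while True:
        seq, count = most_common_ngram(trace, n)
        if seq is None or count < min_count:
            break
        name = f"MACRO{len(macros)}"
        macros[name] = seq
        out, i = [], 0
        # Left-to-right scan, replacing non-overlapping occurrences
        while i < len(trace):
            if tuple(trace[i:i + n]) == seq:
                out.append(name)
                i += n
            else:
                out.append(trace[i])
                i += 1
        trace = out
    return trace, macros
```

For example, `compact(["A", "B", "C"] * 3 + ["X"], 3)` collapses the three repeats into `["MACRO0", "MACRO0", "MACRO0", "X"]`. A real tool would probably try several lengths and prefer longer sequences.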

All in x86 assembler. I guess it was around 1995-96, give or take.

Now I'd use the same technique, but at a higher level. Instead of peephole compacting I'd use graph analysis, but that's about it. You can get pretty much everything from the program trace; I think this way you can get even more information than from disassembly.
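To make "graph analysis" slightly more concrete, here is a sketch that assumes the trace is simply the list of executed instruction addresses. Every pair of consecutive addresses becomes an edge, and any address with more than one observed successor is a decision point; back edges reveal the loops. The function names are made up for illustration, and a movfuscated binary would probably need a different notion of "address", since its control flow is carried by data rather than jumps.

```python
from collections import defaultdict

def trace_to_graph(addresses):
    """Dynamic control-flow graph: edge (a, b) counts how many times
    instruction b executed immediately after instruction a."""
    edges = defaultdict(int)
    for a, b in zip(addresses, addresses[1:]):
        edges[(a, b)] += 1
    return dict(edges)

def branch_points(edges):
    """Addresses with more than one observed successor, i.e. decisions."""
    succs = defaultdict(set)
    for a, b in edges:
        succs[a].add(b)
    return {a: sorted(s) for a, s in succs.items() if len(s) > 1}

# A loop body 0x10 -> 0x14 -> 0x18 runs twice, then exits from 0x14 to 0x20
trace = [0x10, 0x14, 0x18, 0x10, 0x14, 0x18, 0x10, 0x14, 0x20]
edges = trace_to_graph(trace)
decisions = branch_points(edges)  # {0x14: [0x18, 0x20]}
```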

So in my opinion, it is one hell of a cool experiment. But try not to use it as a real obfuscation device.

userbinator · on Aug 7, 2015

I also remember running into a "forest of MOVs" obfuscation technique while attempting to crack a protection scheme back in the very late 80s, and I remember it so well because it made me change my analysis strategy completely. The fun part was that the "interesting" MOVs were hidden among other instructions that seemed to perform useful computation, although their results were just thrown away and the MOVs were doing all the work. At the time I was fond of printing out code and inspecting/annotating it by hand, so I think it took several days and lots of careful documentation of the algorithm before I realised that it was all useless. Upon tracing back the source of the actual value used in the decision and crossing out the irrelevant instructions, imagine my surprise when almost all that was left were MOVs...!

That one taught me it was far better to start by working backward from the result, although self-modifying code tends to make that approach more difficult.
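Working backward from the result is essentially a dynamic backward slice. A minimal sketch, assuming each trace entry records the location written and the locations read (register names or concrete addresses): walk the trace from the end, keep an entry only if it writes something the result still depends on, and then depend on its inputs instead. The trace format and the `backward_slice` name are assumptions for illustration.

```python
def backward_slice(trace, target):
    """Indices of trace entries that contributed to the final value of
    `target`. Each entry is (dst, srcs): location written, locations read."""
    live = {target}            # locations whose origin we still need
    kept = []
    for i in range(len(trace) - 1, -1, -1):
        dst, srcs = trace[i]
        if dst in live:
            live.discard(dst)  # this write explains dst...
            live.update(srcs)  # ...so now we depend on its inputs
            kept.append(i)
    kept.reverse()
    return kept

trace = [
    ("t1", ["a", "b"]),   # 0: decoy computation
    ("t2", ["t1"]),       # 1: decoy, result thrown away
    ("x", ["key"]),       # 2: mov x, key
    ("y", ["x"]),         # 3: mov y, x
    ("t1", ["t2", "c"]),  # 4: more decoy
    ("flag", ["y"]),      # 5: mov flag, y  (value used in the decision)
]
relevant = backward_slice(trace, "flag")  # [2, 3, 5]: only the MOVs survive
```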

thesz · on Aug 7, 2015

And this is exactly why traces are better than assembly. You see the instructions that were executed, not the code. You can restore the decision tree (and then, most probably, the graph) and figure out what is going on.

_pmf_ · on Aug 7, 2015

> And this is exactly why traces are better than assembly. You see instructions that were executed, not the code. You can restore the decision tree (and then the graph, most probably) and figure out what is going on.

Do you have an illustrative example?

nialo · on Aug 7, 2015

I think a good example is the last level (Hollywood) on www.microcorruption.com, which has self-modifying code and is set up so that execution jumps into the middle of what the disassembler thinks are instructions. Reading the code is pretty useless, but with a trace of the instructions executed and the register state at each instruction, it's easy to start at the end and follow it backwards to the interesting part of the program.

Getting the trace is the tricky bit; I had to write an MSP430 emulator.

(Actually seeing this example requires completing the rest of the levels, but you should do that anyway, especially if this is the sort of thing you're interested in.)

thesz · on Aug 7, 2015

Not right now.

Suppose I apply movfuscator to some not-too-complex program and look at the assembly. I believe I'll get nothing useful from it, which is fair. But if I trace the program, I can get some useful results: an array being filled with data (steadily increasing addresses being accessed), loops over indices (accesses confined to some range), seeing through the generated code (the same table lookups recurring here and there), etc.

cautious_int · on Aug 7, 2015

I suggest taking a look at the slides, which show how much trickery is involved: https://github.com/xoreaxeaxeax/movfuscator/raw/master/slide...
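For anyone who won't click through: the core idea, as described in Stephen Dolan's "mov is Turing-complete" paper that movfuscator builds on, is that comparisons and branches become memory accesses. Below is a Python simulation with memory as a list: to test `x == y`, store 0 at address `x`, store 1 at address `y`, then load from `x`; the load returns 1 only if both stores hit the same cell. The 0/1 result can then index a two-entry table to select a value, and lookup tables stand in for arithmetic. The function names and the 8-bit increment table are illustrative, not movfuscator's actual code.

```python
MEM_SIZE = 256

def mov_equal(x, y):
    """x == y using only two stores and a load; x, y must be valid addresses."""
    mem = [0] * MEM_SIZE   # scratch memory
    mem[x] = 0             # mov [x], 0
    mem[y] = 1             # mov [y], 1   (overwrites the 0 if y == x)
    return mem[x]          # mov r, [x]   -> 1 if x == y else 0

def mov_select(cond, if_false, if_true):
    """Branch-free select: the 0/1 condition indexes a two-cell table."""
    table = [if_false, if_true]  # two adjacent memory cells
    return table[cond]           # mov r, [table + cond]

# Arithmetic via lookup: an 8-bit increment becomes one indexed load.
INC = [(i + 1) % 256 for i in range(256)]

def mov_inc(r):
    return INC[r]                # mov r, [INC + r]
```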

patio11 · on Aug 7, 2015

Strongest possible +1 for the slides if you are at all interested in low-level alchemy. Also see slide 109 for the beginnings of an argument that this might actually have some real-world utility, in that the long list of MOVs is virtually immune to comprehension by existing reverse engineering tools and practices.

ORioN63 · on Aug 7, 2015

On the other hand, you could compile it back to regular assembly and use those tools.

pkaye · on Aug 7, 2015

The Maxim Integrated MAXQ is one commercial processor that uses a MOV-based instruction set. http://www.maximintegrated.com/en/app-notes/index.mvp/id/322...

I've always felt these were something of a trick when it comes to being single-instruction, because you are using some of the addressing bits to encode an opcode.

userbinator · on Aug 7, 2015

That's a TTA (https://en.wikipedia.org/wiki/Transport_triggered_architectu... ), where effectively the ALU and other computation units become memory-mapped devices. It's the logical extension of how a lot of microcontrollers that don't have a multiply instruction in their instruction sets, e.g. the MSP430, will instead have a multiplier unit that's accessed by reading/writing special memory addresses.

That's somewhat different from the move-based code discussed here, where the MOVs themselves perform the computation.
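A toy model of the TTA idea, for contrast: the program only ever moves values between addresses, and the multiplier is a device mapped into the address space. Writing the second operand acts as the trigger (TTA terminology calls this a trigger port), and the product appears at a result address. The addresses, the 16-bit width and the `Bus` class are invented for illustration and don't correspond to MAXQ or any real part.

```python
class Bus:
    """Toy memory bus with a memory-mapped multiplier (hypothetical layout)."""
    MUL_OP1 = 0xF0  # hypothetical: first operand register
    MUL_OP2 = 0xF1  # hypothetical: second operand; writing here triggers
    MUL_RES = 0xF2  # hypothetical: 16-bit product appears here

    def __init__(self, size=256):
        self.ram = [0] * size

    def write(self, addr, value):
        self.ram[addr] = value & 0xFFFF
        if addr == self.MUL_OP2:
            # Moving a value into the trigger port starts the operation
            product = self.ram[self.MUL_OP1] * self.ram[self.MUL_OP2]
            self.ram[self.MUL_RES] = product & 0xFFFF

    def read(self, addr):
        return self.ram[addr]

def mov(bus, dst, src):
    """The only instruction: copy one memory cell to another."""
    bus.write(dst, bus.read(src))

# Usage: compute 6 * 7 with nothing but moves
bus = Bus()
bus.ram[0x10], bus.ram[0x11] = 6, 7
mov(bus, Bus.MUL_OP1, 0x10)
mov(bus, Bus.MUL_OP2, 0x11)  # triggers the multiply
mov(bus, 0x12, Bus.MUL_RES)  # bus.read(0x12) is now 42
```

Here the opcode effectively lives in the destination address, which is the "trick" mentioned above; in the MOV-only code under discussion there is no such device, and the computation comes from the addressing itself.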

foobar2020 · on Aug 7, 2015

The x86 is actually Turing-complete without even executing a single instruction. Page faulting is enough: https://github.com/jbangert/trapcc

ishtu · on Aug 7, 2015

The author is the same person who published the epic x86 vulnerability https://github.com/xoreaxeaxeax/sinkhole

ericfrederich · on Aug 7, 2015

I thought those slides looked familiar. Thought it might have been a template that the conference provided. Should have looked at the author ;-)

kazinator · on Aug 7, 2015

Exact dupe:

https://news.ycombinator.com/item?id=9751312

agumonkey · on Aug 7, 2015

Previously: https://news.ycombinator.com/item?id=9751312 https://news.ycombinator.com/item?id=6309631

ape4 · on Aug 7, 2015

Write your program using the nearly Turing-complete C preprocessor and compile it into MOVs.

jschwartzi · on Aug 7, 2015

Is ARMv7 MOV Turing-complete?