One approach to this is taken by RevGen [1], which uses QEMU to disassemble the target code and lift it to QEMU's intermediate representation (TCG). The TCG ops are then translated to LLVM IR.
It's definitely not perfect, but I suspect it works much better than writing an x86-to-LLVM translator from scratch. QEMU's IR is very likely much closer to correct than anything you'd come up with yourself, since it is exercised constantly by real emulation, and the mapping from QEMU IR to LLVM is very straightforward (you can even run QEMU with LLVM as the backend, albeit with some performance penalty).
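To make "very straightforward" concrete, here is a toy sketch in Python. The op tuples, the opcode names (modeled loosely on TCG names like `add_i32`), and the `translate` function are all hypothetical. This is not RevGen's code, and it ignores memory, flags, and control flow. The point is only that most TCG ops map one-to-one onto LLVM instructions, with TCG temps becoming SSA values:

```python
from typing import Dict, List, Tuple, Union

Operand = Union[str, int, None]
# Hypothetical, simplified TCG-like op: (opcode, dest temp, src1, src2).
Op = Tuple[str, str, Operand, Operand]

# One-to-one mapping from (loosely TCG-named) binary ops to LLVM opcodes.
BINOPS = {
    "add_i32": "add", "sub_i32": "sub", "mul_i32": "mul",
    "and_i32": "and", "or_i32": "or", "xor_i32": "xor",
}

def translate(ops: List[Op], inputs: List[str], result: str) -> str:
    """Emit a textual LLVM IR function that computes temp `result` from `inputs`."""
    env: Dict[str, str] = {t: f"%{t}" for t in inputs}  # TCG temp -> current SSA value
    body: List[str] = []

    def operand(x: Operand) -> str:
        # Constants are emitted inline; temps resolve to their latest SSA value.
        return str(x) if isinstance(x, int) else env[x]

    for opcode, dst, a, b in ops:
        if opcode == "mov_i32":
            env[dst] = operand(a)  # a move is just a rename in SSA form
            continue
        name = f"%v{len(body)}"  # fresh SSA name for each emitted instruction
        body.append(f"  {name} = {BINOPS[opcode]} i32 {operand(a)}, {operand(b)}")
        env[dst] = name  # later reads of `dst` see the new value

    params = ", ".join(f"i32 %{t}" for t in inputs)
    header = [f"define i32 @block({params}) {{", "entry:"]
    return "\n".join(header + body + [f"  ret i32 {env[result]}", "}"])
```

For example, `eax = (eax + ebx) ^ 0xff`, lifted as `[("add_i32", "tmp0", "eax", "ebx"), ("xor_i32", "eax", "tmp0", 255)]` with inputs `["eax", "ebx"]`, yields `%v0 = add i32 %eax, %ebx`, then `%v1 = xor i32 %v0, 255`, then `ret i32 %v1`. The hard part of a real lifter is everything this toy leaves out, which is exactly the part QEMU has already debugged.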
Two things to watch out for in this approach: helper functions (places where QEMU implements some functionality with a C function rather than in TCG), and artifacts in the generated LLVM that stem from QEMU's system emulation rather than from anything the original code did.
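One practical consequence of the helper-function caveat: each helper shows up in the lifted LLVM as an opaque call into external C code, so it can be worth listing them for manual review. A minimal Python sketch, assuming (hypothetically) that the lifter emits textual LLVM IR and keeps QEMU's `helper_` prefix on those callees; `find_helper_calls` is an illustrative name, not part of any tool:

```python
import re
from typing import List

# Matches `call <ret type> @helper_<name>(`, including `tail call`.
# Assumes the lifter preserves QEMU's `helper_` naming for helper callees.
HELPER_CALL = re.compile(r"\bcall\b[^@\n]*@(helper_\w+)\s*\(")

def find_helper_calls(llvm_ir: str) -> List[str]:
    """Return the distinct helper functions called from the IR, in first-seen order."""
    seen: List[str] = []
    for name in HELPER_CALL.findall(llvm_ir):
        if name not in seen:
            seen.append(name)
    return seen
```

Anything this reports is logic the translated block does not itself contain, so an analysis that only looks at the lifted code will treat those calls as black boxes.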
Very cool. I'll look at this if I ever get back to this project (unlikely, given how my priorities have changed since I started it).
At the time, my primary motivation was really to learn more about x86 machine code, which explains the brute-force approach. (Hey, Linux got started because Linus wanted to learn more about x86 memory management, so it's not a completely dumb approach.)
[1] http://infoscience.epfl.ch/record/166081/files/revgen.pdf