Breaking the Solidity compiler with a fuzzer

saagarjha · on June 6, 2020

The Solidity compiler not quite mature yet, I feel. A compiler optimizations course I took as an undergraduate had us look at trying to improve upon state-of-the-art in some emerging field, and a partner and I picked solc. The optimizer was clearly very new and left us a lot of low hanging fruit because it'd readily leave performance on the table–for example, it would fail to hoist/defer stores to storage, which are between two to three orders of magnitude more expensive than standard arithmetic or stack operations. It's really strange that nobody has given this another look, considering that billions of dollars flow through smart contracts…

Solidity itself is kind of strange too: it doesn't seem to have a consistent design and parsing it is in many cases more hard than it needs to be for what appears to be no good reason other than "this is how C/Java looks so we're going to blindly copy it.". I can't say I'm surprised to see a fuzzer quickly find bugs in the compiler.

DennisP · on June 7, 2020

The EVM itself is pretty inefficient. They're planning to replace it entirely with ewasm (a deterministic subset of wasm), once they do the big 2.0 transition in a year or two. That may be why there hasn't been a major effort on compiler optimizations for the EVM, though they do make regular updates to Solidity. (The current version is 0.6.9, fwiw.)

There are also some other smart contract languages in the works, the main one being Vyper, but it's not quite ready for production.

But if you're interested in picking some of that low-hanging fruit, you'd probably be welcome, most of the Ethereum community is pretty open to new contributors.

davidmurdoch · on June 7, 2020

Are you sure the future ETH 1 shard won't continue to execute transactions via the EVM? AFAIK, I don't think it will be replaced, it'll just live within Ethereum 2 as an "execution environment".

DennisP · on June 7, 2020

That is a possibility. They haven't made a final decision, but it would make things easier. But most of the network would still be running ewasm (after Phase 2).

saagarjha · on June 7, 2020

I mean, my complaints weren’t with the efficiency of the EVM, really, it was more that the compiler didn’t take any advantage of the huge gas differential between the stack and storage…I am unsure how EWASM helps at all (or even why it exists, to be honest).

Sadly our project was optimized for a class presentation rather than actually contributing back, but if I ever find myself back in that space I’ll be sure to send in some patches.

DennisP · on June 7, 2020

How long ago were you looking at it? I have no idea whether they've implemented anything like that but they do make regular updates.

Btw another big change the researchers are working on is statelessness, where the on-chain data is just merkle roots, the actual data is held off-chain in a distributed but reliable way, and clients submit merkle proofs with their transactions. That would drastically reduce the amount of data that full nodes have to store, and presumably allow large reductions in storage prices.

(But that doesn't mean solc shouldn't work on compiler improvements like you're talking about, they're not the ones working on statelessness.)

deegles · on June 6, 2020

Could this be used to find bugs in already published contracts? It brings to mind the 2013 PRNG issue [0] that led to a bunch of wallets being drained. Something that is perfectly valid today might have a vulnerability in the future.

[0] https://android-developers.googleblog.com/2013/08/some-secur...

albntomat0 · on June 6, 2020

Theoretically, I think so.

Looking through their code examples though, the compiler failures look to be in obscure/unlikely to be used areas. Additionally, they state that some are dependent on particular compiler flags

AlexCoventry · on June 6, 2020

There's another fuzzer specifically for solidity smart contracts.

https://github.com/crytic/echidna

e79 · on June 6, 2020

It doesn’t surprise me at all that bugs were found in SMT Checker. I recently wrote a blog post on how Solidity’s model checker works, and stumbled across several bugs while attempting to write simple example contracts. I didn’t even need a fuzzer :).

That area of the codebase is far from complete, which is why it is considered experimental and hidden behind a flag that you have to manually enable.

pfdietz · on June 6, 2020

One of the distinguished papers at PLDI this year is on fuzzing for SMT solvers. People have been fuzzing these solves for years, but even now new approaches find new bugs.

https://testsmt.github.io/papers/winterer-zhang-su-pldi20.pd...

analognoise · on June 6, 2020

Rather than fuzzing, why not just implement it in a language with a formal proof like Ada SPARK?

brian_cloutier · on June 7, 2020

Because the outside world sees Ethereum as a business, and is confused why the foundation doesn't do all of these things which would make the "product" better.

While the Ethereum Foundation sees Ethereum as an ecosystem, much closer to an open source project, and tries to support people who want to contribute to it.

The answer to your question is, nobody has shown up yet who's willing to put in the massive amount of effort required to write the compiler in a formal language.

The closest we've gotten to that is an implementation of the evm: https://jellopaper.org/

iso-8859-1 · on June 6, 2020

Simplicity has a bunch of Coq proofs actually. And Cardano probably has some formal stuff too, but I am not into proof-of-stake.

DyslexicAtheist · on June 7, 2020

you'll still only be as correct as your formal proofs (e.g. using formal proofs isn't a replacement for fuzzing)

e79 · on June 7, 2020

On the other hand, fuzzing is only as “correct” as your coverage, properties/assertions, corpus synthesis, etc.