Obfuscated Tiny C Compiler (2002) (bellard.org)
134 points by sndean on June 18, 2018 | 41 comments



Every other month an amazing project from Fabrice Bellard comes back under the spotlight on HN. I guess you could train an ML model on previous posts and make an automatic karma-generating machine :-)


He's such a prolific author that sooner or later you just get accustomed to seeing another project on Hacker News and finding out that he wrote it.


Also see the Tiny C Compiler, which was born out of this project and is now a pretty full-featured C compiler in its own right: https://bellard.org/tcc/. It has a couple of pretty interesting features by virtue of its small size, such as being able to "interpret" C code (compile and run it straight from source) as well as generate executables in memory, so they're portable across platforms without a lot of work.


And it is fast. I clocked it at over 200K lines/sec. Swift, on a typical project, will do around 20. Not K, just lines/sec.

Of course the C code was a lot simpler, but still...


> Swift, on a typical project, will do around 20. Not K, just lines/sec.

To compare a historical compiler with your Swift numbers: in 1989, Turbo C compiled 16K lines per minute (about 267 lines/sec) on a ho-hum machine. Computers today, 30 years later, are more than 60x faster, so one would expect compilation speeds exceeding 16K lines per second of Swift code. Turbo Pascal compiled about twice as fast as Turbo C at the time.


Delphi (Turbo Pascal's successor) today does a million lines in four seconds, or about 250K/sec.

https://youtu.be/Yq2mNUzcpE0?t=146


That's four orders of magnitude greater than 20 loc/sec.


60x faster? Today's computers are closer to 1000x faster.


That was my point.

There are 60 seconds in a minute.


> Swift, on a typical project, will do around 20. Not K, just lines/sec.

Wow, how common is this in practice? That sounds broken.


Wait, what? 50 milliseconds per line of code? How is that even possible?


Makes you wonder, right?

Of course, there are (still) pathological cases where a single line of code will abort because it runs out of time.

   > cat too-complex.swift
   let a:[Int] = [1] + [2] + [3] + [4] + [5] + [6] + [7]
For me, this aborts after about a minute and a half:

   > time swiftc too-complex.swift 
   too-complex.swift:1:49: error: expression was too complex to be solved in reasonable time; consider breaking up the expression into distinct sub-expressions
   let a:[Int] = [1] + [2] + [3] + [4] + [5] + [6] + [7]
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~

   real	1m35.094s
   user	1m16.698s
   sys	0m5.141s

That appears to be some sort of exponential in type inference. Other similar ones have been fixed.

The ternary expressions mentioned elsewhere also appear to be problematic, and in general, there are little time bombs all over, just waiting to go off.

The rate I mentioned earlier was taken from a CocoaHeads talk by someone who built a caching mechanism for Carthage, because their Swift framework(s) of around 20KLOC were taking ages to compile.

So it's a real-world example, not something made up.


> That appears to be some sort of exponential in type inference.

If I recall, it's in the implicit conversions, not strictly the type inference.


It is heavily dependent on the type of code you write. For example, ternary operators can take several hundred milliseconds, or even whole seconds, each to compile, while an equivalent if statement is orders of magnitude faster[1]. Anything with type inference also tends to compile very slowly.

[1] https://bugs.swift.org/browse/SR-1577


Not 50 milliseconds, 5 microseconds. Something like 10,000 cycles, or a few hundred to a thousand cycles per byte of source code (depending on average line length).


"around 20. Not K, just lines/sec"


Sorry for the misunderstanding. Inefficiencies on that level just aren't very surprising to me anymore.


I would like to know how you were able to investigate that.


Hit compile, count the seconds until complete?


Development is still going strong.

This is the active code repository of TCC: http://repo.or.cz/w/tinycc.git


Yup! I maintain a (very rough and in-progress) fork of that code for supporting iOS: https://github.com/saagarjha/tinycc


Cool! With CocoaPods building everything from source, a fast compiler would be a godsend for most iOS shops, even for just debug builds.

It should be pretty easy to add Objective-C and C++ support, right? ( ͡° ͜ʖ ͡°)


Neat project!

Is it just for fun, or could it be 'useful'? iOS categorically prohibits JIT, right?


It’s just for fun right now, but I have an idea for something that uses this that’s currently blocked by a stupid, unrelated issue. And yes, iOS generally prohibits JIT, but there are certain circumstances under which you can perform it (you need a certain entitlement). There’s no way that I’m aware to get an app through App Store review with this functionality, though.


Indeed. If it were possible, Firefox would have done it.


Sounds cool! What parts is it supposed to support?

- Mach-O binaries?

- ARM?

- Objective-C?

- ?


Just ARM for now, and support for certain memory protection restrictions that iOS imposes. Honestly, I really didn't do much other than tweak the Makefile, modify a call to mprotect, and rearrange some deallocation code. TCC doesn't support Mach-O yet, and that's not something that I have enough expertise to support, so all execution is done by JITing an ELF binary in memory, as it's done on macOS.


Quite literally the only "FB" on the internet worth bothering with. The guy is a genius.


After having spent some of my free time lately studying the b interpreter by Arthur Whitney (http://www.kparc.com/b/b.c), this looks quite straightforward to me.

That said, it is indeed tiny. Very nice project, as usual from FB.


You'll need to click through to the parent comments for this to properly make sense, but this is the most relevant bit of a subthread I think you may find interesting: https://news.ycombinator.com/item?id=16368271


Thank you. I have already found a few interesting threads, mostly with contributions from geocar, who seems to be working on kOS. However, I have the impression everything is kept quite secret around this, and I respect that.

My current interest in studying this code is more the journey than the result, but I am quite sure I will have to try to approach someone more familiar with the code at some point.


I too have a journey-focused interest in studying the code, heh. I must admit I'm mildly sad about the secrecy (and the requirement to participate, presumably to deter lurkers with indeterminable motives), but I suspect it's mostly because K itself is used in finance (eg, http://archive.vector.org.uk/art10501320, ^F "brightly", end of that paragraph and all of the next small one), and it's reasonable to consider that if it didn't cost a lot of money, that would categorically impact its relevancy :)

I guess this is also why Kona is generally regarded dismally - I wonder if that view isn't carefully cultured/spread for similar reasons. In any case the implementation is heavily tied to the commercial success (which is interesting in and of itself), which also ties the implementation to the niche, and I guess I lament the impact on the language's general accessibility and reach, because things won't be changing anytime soon. Eh, I guess the selfish(?)/survivalist(?) view is to clear the calendar and say hi while Whitney's still teaching :P (I've been meaning to do exactly this, but my, er, calendar (life, really) doesn't permit me the confidence to say I'd definitely be able to keep up.)

In the video it's mentioned there's not much optimization in kos. That's mildly encouraging, from a general systems perspective and also from an architectural standpoint (eg, functional programming can be interesting/novel/weirdly-aesthetic and "efficient enough" (or even more than) with eg UI latency).


Any tips on how you went ahead trying to dissect and understand b.c?


Understanding c.h, a.[ch] and A.S was not too difficult, I can share my notes with you if you are interested.

I am still working on b.c itself. I have started from the entry point and am trying to make sense of it by following different simple examples, with help from an opcodes table and other references. It is a very interesting exercise, and I think I have already found a bug, but it is not an easy task. I am not sure I will ever totally decipher it.

I would like to make public what I have so far and ask for help, but since this code does not have a public license, I would not feel comfortable doing that. My plan is to get as far as possible (I have only been working on it for a couple of weeks so far), and then ask Whitney for permission to make it public, though I am not sure what to expect.


Thanks for sharing your notes - I was impressed with the rigor and the effort you have put into understanding the codebase.


Yes I would be interested in your notes. My email is in my profile.


Any idea what terms the b compiler is licensed under?


There is no license file on kparc.com, nor even a single comment about licensing or authorship. Nevertheless, it seems to have been developed by Arthur Whitney, so I guess you could ask him. The kx forums may also be a good place to ask.


Is there one big reason why TCC is so fast, other than, of course, doing less work than GCC?


Single pass compiler with back-patching. No intermediate code representation, very simple optimizer (just constant folding and dead code removal). C Preprocessor integrated with parser code. Back-end generates instruction opcodes instead of assembly, eliminating another stage.


These may possibly be inspired by OTCC:

https://news.ycombinator.com/item?id=8558822

https://news.ycombinator.com/item?id=8746054

They're quite a bit larger, but not "obfuscated" (although still terse), so could be more suitable to learn from.



