Obfuscated Tiny C Compiler (2002) (bellard.org)
134 points by sndean on June 18, 2018 | 41 comments



Every other month an amazing project from Fabrice Bellard comes back under the spotlight on HN. I guess you could train an ML model on previous posts and make an automatic karma-generating machine :-)


He's such a prolific author that sooner or later you just get accustomed to seeing another project on Hacker News and finding out that he wrote it.


Also see the Tiny C Compiler, which was born out of this project and is now a pretty full-featured C compiler in its own right: https://bellard.org/tcc/. It has a couple of pretty interesting features by virtue of its small size, such as being able to "interpret" C code (compile and run it straight from source) as well as generate executables in memory, so they're portable across platforms without a lot of work.


And it is fast. I clocked it at over 200K lines/sec. Swift, on a typical project, will do around 20. Not K, just lines/sec.

Of course the C code was a lot simpler, but still...


> Swift, on a typical project, will do around 20. Not K, just lines/sec.

To compare a historical compiler with your Swift numbers: in 1989, Turbo C compiled 16K lines per minute (about 267 lines/sec) on a ho-hum machine. Computers today, 30 years later, are more than 60x faster, so one would expect compilation speeds exceeding 16K lines per second of Swift code. Turbo Pascal compiled about twice as fast as Turbo C at the time.


Delphi (Turbo Pascal's successor) today does a million lines in four seconds, or about 250K/sec.

https://youtu.be/Yq2mNUzcpE0?t=146


That's four orders of magnitude greater than 20 loc/sec.


60x faster? Today's computers are closer to 1000x faster.


That was my point.

There are 60 seconds in a minute.


> Swift, on a typical project, will do around 20. Not K, just lines/sec.

Wow, how common is this in practice? That sounds broken.


Wait, what? 50 milliseconds per line of code? How is that even possible?


Makes you wonder, right?

Of course, there are (still) pathological cases where a single line of code will abort because it runs out of time.

   > cat too-complex.swift
   let a:[Int] = [1] + [2] + [3] + [4] + [5] + [6] + [7]
For me, this aborts after about a minute and a half:

   > time swiftc too-complex.swift 
   too-complex.swift:1:49: error: expression was too complex to be solved in reasonable time; consider breaking up the expression into distinct sub-expressions
   let a:[Int] = [1] + [2] + [3] + [4] + [5] + [6] + [7]
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~

   real	1m35.094s
   user	1m16.698s
   sys	0m5.141s

That appears to be some sort of exponential in type inference. Other similar ones have been fixed.

The ternary expressions mentioned elsewhere also appear to be problematic, and in general, there are little time bombs all over, just waiting to go off.

The rate I mentioned earlier was taken from a CocoaHeads talk by someone who built a caching mechanism for Carthage, because their Swift framework(s) of around 20KLOC were taking ages to compile.

So it's a real-world example, not something made up.


> That appears to be some sort of exponential in type inference.

If I recall, it's in the implicit conversions, not strictly the type inference.


It is heavily dependent on the type of code you write. For example, ternary operators can take several hundred milliseconds, or even whole seconds, each to compile, while an equivalent if statement is orders of magnitude faster[1]. Anything with type inference also tends to compile very slowly.

[1] https://bugs.swift.org/browse/SR-1577


Not 50 milliseconds, 5 microseconds. Something like 10,000 cycles, or a few hundred to a thousand cycles per byte of source code (depending on average line length).


"around 20. Not K, just lines/sec"


Sorry for the misunderstanding. Inefficiencies on that level just aren't very surprising to me anymore.


I would like to know how you were able to investigate that.


Hit compile, count the seconds until complete?


Development is still going strong.

This is the active code repository of TCC: http://repo.or.cz/w/tinycc.git


Yup! I maintain a (very rough and in-progress) fork of that code for supporting iOS: https://github.com/saagarjha/tinycc


Cool! With CocoaPods building everything from source, a fast compiler would be a godsend for most iOS shops, even for just debug builds.

It should be pretty easy to add Objective-C and C++ support, right? ( ͡° ͜ʖ ͡°)


Neat project!

Is it just for fun, or could it be 'useful'? iOS categorically prohibits JIT, right?


It’s just for fun right now, but I have an idea for something that uses this that’s currently blocked by a stupid, unrelated issue. And yes, iOS generally prohibits JIT, but there are certain circumstances under which you can perform it (you need a certain entitlement). There’s no way that I’m aware to get an app through App Store review with this functionality, though.


Indeed. If it were possible, Firefox would have done it.


Sounds cool! What parts is it supposed to support?

- Mach-O binaries?

- ARM?

- Objective-C?

- ?


Just ARM for now, and support for certain memory protection restrictions that iOS imposes. Honestly, I really didn't do much other than tweak the Makefile, modify a call to mprotect, and rearrange some deallocation code. TCC doesn't support Mach-O yet, and that's not something that I have enough expertise to support, so all execution is done by JITing an ELF binary in memory, as it's done on macOS.


Quite literally the only "FB" on the internet worth bothering with. The guy is a genius.


After having spent some of my free time lately studying the b interpreter by Arthur Whitney (http://www.kparc.com/b/b.c), this looks quite straightforward to me.

That said, it is indeed tiny. Very nice project, as usual from FB.


You'll need to click through to the parent comments for this to properly make sense, but this is the most relevant bit of a subthread I think you may find interesting: https://news.ycombinator.com/item?id=16368271


Thank you. I have already found a few interesting threads, mostly with contributions from geocar, who seems to be working on kOS. However, I have the impression everything is kept quite secret around this, and I respect that.

My current interest in studying this code is more the journey than the result, but I am quite sure I will have to try to approach someone more familiar with the code at some point.


I too have a journey-focused interest in studying the code, heh. I must admit I'm mildly sad about the secrecy (and the requirement to participate, presumably to deter lurkers with indeterminable motives), but I suspect it's mostly because K itself is used in finance (eg, http://archive.vector.org.uk/art10501320, ^F "brightly", end of that paragraph and all of the next small one), and it's reasonable to consider that if it didn't cost a lot of money, that would categorically impact its relevancy :)

I guess this is also why Kona is generally regarded dismally - I wonder if that view isn't carefully cultured/spread for similar reasons. In any case the implementation is heavily tied to the commercial success (which is interesting in and of itself), which also ties the implementation to the niche, and I guess I lament the impact on the language's general accessibility and reach, because things won't be changing anytime soon. Eh, I guess the selfish(?)/survivalist(?) view is to clear the calendar and say hi while Whitney's still teaching :P (I've been meaning to do exactly this, but my, er, calendar (life, really) doesn't permit me the confidence to say I'd definitely be able to keep up.)

In the video it's mentioned there's not much optimization in kos. That's mildly encouraging, from a general systems perspective and also from an architectural standpoint (eg, functional programming can be interesting/novel/weirdly-aesthetic and "efficient enough" (or even more than) with eg UI latency).


Any tips on how you went ahead trying to dissect and understand b.c?


Understanding c.h, a.[ch] and A.S was not too difficult, I can share my notes with you if you are interested.

I am still working on b.c itself. I have started from the entry point and am trying to make sense of it by following different simple examples, with help from an opcodes table and other references. It is a very interesting exercise, and I think I have already found a bug, but it is not an easy task. I am not sure I will ever totally decipher it.

I would like to make public what I have so far and ask for help, but since this code does not have a public license, I would not feel comfortable doing that. My plan is to get as far as possible (I have only been working on it for a couple of weeks so far), and then ask Whitney for permission to make it public, though I am not sure what to expect.


Thanks for sharing your notes - I was impressed with the rigor and the effort you have put into understanding the codebase.


Yes I would be interested in your notes. My email is in my profile.


Any idea what terms the b compiler is licensed under?


There is no license file on kparc.com, nor even a single comment about licensing or authorship. Nevertheless, it seems to have been developed by Arthur Whitney, so I guess you could ask him. The kx forums may also be a good place to ask.


Is there one big reason why TCC is so fast, other than, of course, doing less work than GCC?


Single pass compiler with back-patching. No intermediate code representation, very simple optimizer (just constant folding and dead code removal). C Preprocessor integrated with parser code. Back-end generates instruction opcodes instead of assembly, eliminating another stage.


These may possibly be inspired by OTCC:

https://news.ycombinator.com/item?id=8558822

https://news.ycombinator.com/item?id=8746054

They're quite a bit larger, but not "obfuscated" (although still terse), so could be more suitable to learn from.



