I wrote a self-hosting C compiler in 40 days (2015) (sigbus.info)
259 points by rspivak on Sept 5, 2019 | 44 comments



> The C vararg spec is not well-designed. If you pass all function arguments via the stack, va_start may be implemented pretty easily, but on the modern processor and in modern calling convention, arguments are passed via registers to reduce overhead of function calls. So the assumption of the spec does not match the reality.

The original assumption of the pre-ANSI <varargs.h> was that way, since it was just a hack exploiting actual object code behavior in the absence of language spec support. But ANSI C standardized the ... ellipsis, and the old hack became undefined behavior. A function declaration with the ellipsis is not type compatible with one that lacks it. This difference in type means that the compiler can map variadic functions to a different calling convention, such as one that always uses the stack for all the trailing arguments, even if they occur in positions for which the regular convention uses registers.

I wouldn't call variadic functions well designed on the whole, but this aspect isn't badly designed.


Author here. These days I'm writing a new C compiler (https://github.com/rui314/chibicc) from scratch again to write a book about compilers.

Since I'll be using the new one as a reference implementation for a book, and the book is intended to be for beginners, I put as much effort as I can into improving the readability of the code. In particular, not only the head of the repository but every commit in the repo should be readable, so that readers can easily understand how each feature is implemented. I believe I'm doing a good job keeping it clean so far. (Actually in order to keep the commit history clean, I continue rewriting commit history and doing `git push -f`, but that should be fine because the purpose of publishing the repo is not for co-development but for sharing a reference implementation.)

chibicc does not have a C preprocessor, but other than that it can already compile itself, so if you are interested, you can take a look.


How's it coming along? Will there be an English translation available? Actually, I just tried to translate part of the book (the whole book is 'too big to translate') with Google Translate, and the translation looks remarkably good compared to what I'm used to from Google Translate.

I would love to see a similar book/project for a linker and assembler.


I will translate it to English. Be wary of the machine translation -- with neural networks it has learned to write surprisingly natural sentences, but that doesn't mean the translation is correct.


Each time I use Google Translate from Chinese to English, I am also surprised by the good quality. French to English is really bad in comparison.


> (Actually in order to keep the commit history clean, I continue rewriting commit history and doing `git push -f`,

Actually, I wonder if you might consider maintaining a second, clean repository. There would be a raw repository that records the real history, and a clean repository for easier reading. Sometimes the raw history can be interesting too.


I haven't considered that but I think that's hard to do for technical reasons. New commits are made on top of a clean history, so there's no single history of "raw" commits but they are intermingled.


I would recommend you keep doing it the way you are. You are working on an educational exposition and choosing commits as part of that. You should take the position that you are the best judge of proper presentation.

That said you could always turn off gc if some people want it, but I wouldn't put any more than a token effort into keeping that history.


Maybe you can disable git gc and allow for readers to grab a copy of your repository’s full reflog?


I can disable gc, but the reflog is so messy that I think no one can get any info from it. Even I can't, even though I know exactly what I did to the repo.


Fair point; I'm sure you can decide what kinds of things you think would be valuable to share.


> I've almost finished implementing C preprocessor in just one day. It's actually a port from my previous attempt to write a compiler.

The preprocessor is fiendishly tricky to write. I wondered how he did it in such a short time :-) I had to scrap mine and reimplement it 3 times.

I wish I had had https://www.spinellis.gr/blog/20060626/cpp.algo.pdf to work from.


For reference, folks: this is coming from the guy whose Wikipedia page [0] says "He was hired by Facebook to write a fast C/C++ preprocessor in D."

Not that I have had any doubts about how hard it is...

[0] https://en.wikipedia.org/wiki/Walter_Bright


Here's the source code:

https://github.com/facebookarchive/warp

and the one for the C compiler I wrote long ago:

https://github.com/DigitalMars/Compiler/blob/master/dm/src/d...

which is integrated with the C lexer for speed reasons.


He also wrote D (or is credited with creating it).

I’m also interested in how one would write a C preprocessor in a day. It’s got so many tricky edge cases and whatnot.


> or is credited with creating it

Full disclosure: I looked into the future with my Chronoscope, and copied the most popular language!


https://www.spinellis.gr/blog/20060626/cpp.algo.pdf was actually very useful. With that doc, I could implement a preprocessor without thinking too much about how it should work in detail. In that sense, even after I implemented a C preprocessor, I don't feel I fully understand how it works.


Walter, I think your Zortech compiler was one of the first (if not The First) to introduce register calling convention. I don't remember much, but how did you get around varargs implementation?


Simply, varargs functions didn't use the register calling convention. Varargs is rarely used outside of printf, and so isn't worth the effort optimizing. Indeed, the Linux 64 bit ABI for varargs is hopelessly klunky, but it doesn't really matter.


> Varargs is rarely used outside of printf, and so isn't worth the effort optimizing.

open(2) is also called relatively frequently and AFAIK requires varargs if it’s implemented in C.


It's relatively frequent to see it in source code, but those calls are cold code. People aren't calling open(2) millions of times per second in their inner loop.


Thank you.


Fucking love shit like this.

More folks should write compilers. Can't wait to see your next compiler, Rui.


> More folks should write compilers

It's on my TODO list. After, of course, I finish that x86_32 preemptive multi-tasking kernel I started writing 20 years ago and never finished. Plus the 16-bit CP/M clone for the 68000 that's gathering dust. Or maybe after my half finished Z80 emulator or my half finished Z80-systems-in-Unreal-Engine project, or my mostly-but-not-completely-working VT100 emulator. Sigh. How do people ever finish this stuff?


Not participating in internet forums is a good start, huge time sink.


Usually a during work hours activity for me, while stuff is "building", but point taken.


If ever I heard a reason to use C++ templates, this is it.


I'm writing a new one (https://github.com/rui314/chibicc) and also writing a book about how to write a C compiler using my new compiler as a reference implementation.



I recognize Rui's name from llvm-dev, so I think they found their next compiler!


Rui is [one of?] the primary maintainers of lld: the LLVM project's linker.

EDIT, oh, he says as much at the bottom: "Since then, I have moved to the LLVM team in Google, and I'm now working on lld, the LLVM linker."


The author has since written a longer book on how to write compilers (albeit in Japanese): https://www.sigbus.info/compilerbook


Discovered that book too, still looking for a way to translate it to English.


I will translate it to English myself once it's complete.


How can I get a notification when the book comes out?


There are so many small C compilers, I never know which one to choose, although TinyCC seems to be the best. To me it allows 2 things:

* Use C as a fast scripting language. This can be useful for game programming, where you don't want to recompile your engine, but you can't tolerate the sluggishness of interpreted languages.

* Use C as a compile target. I would love to build a "pythonic lean C++", without the hard stuff, without templates or backward compatibility with C. Just Python indenting, strong typing, maps, sets, geometric types, Python-like standard functions... I guess that using C as a compile target allows one to avoid the hassle of building a low-level compiler... Not sure though.


On the second point: have you looked at D?


D is too high level, is not really pythonic, and the syntax seems to be too distant from C.

I just wish C had some nicer things, like a string type, maps, etc. Python indentation would also make it more readable.

D differs a lot from C/C++, even if it does things right.

C is nice because it's simple and readable.

D looks like it's using a lot of new syntax and it looks hard to adapt/learn.


Related HN discussion for when the 8cc project was first announced: https://news.ycombinator.com/item?id=9125912



Once you understand the type declaration syntax via the “declarations mirror use” guideline, there is nothing weird in C syntax. Any scary “pointer to function returning pointer to array” declarations are easily disentangled by applying the operators in the correct order around the declared name. People who know that rule tend to put the asterisk next to the name in pointer declarations.

The semantics of the language are quite simple if we discard the weird implicit integer promotion and conversion rules. Every operation yields a new value of a certain type, and some expressions are considered lvalues by virtue of designating a modifiable location in memory.


Rui was struggling with implementing it, not understanding and using it. He found that with 15 years of C experience, he didn't understand it well enough to just sit down and implement all of it without studying the spec.


I didn’t mean to devalue the efforts of the author, just wanted to point out that it is relatively straightforward to write a minimal C compiler, and the gotchas are in some small details.


But the art of this is in getting all these details accounted for in a straightforward manner.



