> The C vararg spec is not well-designed. If you pass all function arguments via the stack, va_start may be implemented pretty easily, but on the modern processor and in modern calling convention, arguments are passed via registers to reduce overhead of function calls. So the assumption of the spec does not match the reality.
The original assumption of the pre-ANSI <varargs.h> was that way, since it was just a hack exploiting actual object code behavior in the absence of language spec support. But ANSI C standardized the ... ellipsis, and the old hack became undefined behavior. A function declaration with the ellipsis is not type compatible with one that lacks it. This difference in type means that the compiler can map variadic functions to a different calling convention, such as one that always uses the stack for all the trailing arguments, even if they occur in positions for which the regular convention uses registers.
I wouldn't call variadic functions well designed on the whole, but this aspect isn't badly designed.
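To make the point concrete, here is a minimal sketch of the ANSI-style mechanism. The names `sum_ints`, `fixed_fn` and `variadic_fn` are made up for illustration and appear nowhere in the article or in any compiler discussed here.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: sums `count` int arguments that follow `count`. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);              /* begin after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);     /* the caller must really have passed ints */
    va_end(ap);
    return total;
}

/* These two function types are not compatible with each other, which is
   what leaves a compiler free to pick a different calling convention
   for the variadic one. */
typedef int fixed_fn(int, int, int);
typedef int variadic_fn(int, ...);
```

Calling `sum_ints(3, 1, 2, 3)` yields 6. Calling it through a declaration that lacks the ellipsis would be undefined behavior, which is the same property that lets an implementation treat the two kinds of function differently.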
Author here. These days I'm writing a new C compiler (https://github.com/rui314/chibicc) from scratch again to write a book about compilers.
Since I'll be using the new one as a reference implementation for a book, and the book is intended to be for beginners, I put as much effort as I can into improving the readability of the code. In particular, not only the head of the repository but every commit in the repo should be readable, so that readers can easily understand how each feature is implemented. I believe I'm doing a good job keeping it clean so far. (Actually in order to keep the commit history clean, I continue rewriting commit history and doing `git push -f`, but that should be fine because the purpose of publishing the repo is not for co-development but for sharing a reference implementation.)
chibicc does not have a C preprocessor, but apart from that it can already compile itself, so if you are interested, you can take a look.
How's it coming along? Will there be an English translation available? Actually, I just tried to translate part of the book (the whole book is 'too big to translate') with Google Translate, and the translation looks remarkably good compared to what I'm used to from Google Translate.
I would love to see a similar book/project for a linker and assembler.
I will translate it to English. Be wary of the machine translation. With neural networks it has learned to write surprisingly natural sentences, but that doesn't mean the translation is correct.
> (Actually in order to keep the commit history clean, I continue rewriting commit history and doing `git push -f`,
Actually, I wonder if you might consider maintaining a second repository. Then there would be a raw repository that records the real history, and a clean repository for easier reading. Sometimes the raw history can be interesting too.
I haven't considered that but I think that's hard to do for technical reasons. New commits are made on top of a clean history, so there's no single history of "raw" commits but they are intermingled.
I would recommend you keep doing it the way you are. You are working on an educational exposition and choosing commits as part of that. You should take the position that you are the best judge of proper presentation.
That said you could always turn off gc if some people want it, but I wouldn't put any more than a token effort into keeping that history.
I can disable gc, but the reflog is so messy that I think no one can get any info from it. Even I can't, even though I know exactly what I did to the repo.
For reference, folks: this is coming from the guy whose Wikipedia page [0] says "He was hired by Facebook to write a fast C/C++ preprocessor in D."
Not that I have had any doubts about how hard it is...
https://www.spinellis.gr/blog/20060626/cpp.algo.pdf was actually very useful. With that doc, I could implement a preprocessor without thinking too much about how it should work in detail. In that sense, even after implementing a C preprocessor, I don't feel I fully understand how it works.
Walter, I think your Zortech compiler was one of the first (if not The First) to introduce register calling convention. I don't remember much, but how did you get around varargs implementation?
Simply, varargs functions didn't use the register calling convention. Varargs is rarely used outside of printf, and so isn't worth the effort optimizing. Indeed, the Linux 64 bit ABI for varargs is hopelessly klunky, but it doesn't really matter.
It's relatively frequent to see it in source code, but those calls are cold code. People aren't calling open(2) millions of times per second in their inner loop.
It's on my TODO list. After, of course, I finish that x86_32 preemptive multi-tasking kernel I started writing 20 years ago and never finished. Plus the 16-bit CP/M clone for the 68000 that's gathering dust. Or maybe after my half finished Z80 emulator or my half finished Z80-systems-in-Unreal-Engine project, or my mostly-but-not-completely-working VT100 emulator. Sigh. How do people ever finish this stuff?
I'm writing a new one (https://github.com/rui314/chibicc) and also writing a book about how to write a C compiler using my new compiler as a reference implementation.
There are so many small C compilers, I never know which one to choose, although TinyCC seems to be the best. To me it allows 2 things:
* Use C as a fast scripting language. This can be useful for game programming, where you don't want to recompile your engine, but you can't tolerate the sluggishness of interpreted languages.
* Use C as a compile target. I would love to build a "pythonic lean C++", without the hard stuff, without templates or backward compatibility with C. Just Python indenting, strong typing, maps, sets, geometric types, Python-like standard functions... I guess that using C as a compile target allows one to avoid the hassle of building a low-level compiler... Not sure though.
Once you understand the type declaration syntax via the “declarations mirror use” guideline, there is nothing weird in C syntax. Any scary “pointer to function returning pointer to array” declarations are easily disentangled by applying the operators in the correct order around the declared name. People who know that rule tend to put the asterisk next to the name in pointer declarations.
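As a small sketch of the "declarations mirror use" rule, here is the scary case mentioned above: a pointer to a function returning a pointer to an array. The names `table`, `get_table`, `fp` and `third_element` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

static int table[3] = {10, 20, 30};

/* Hypothetical helper: returns a pointer to the whole 3-element array.
   Read as use: (*get_table())[i] is an int. */
int (*get_table(void))[3]
{
    return &table;
}

/* fp is a pointer to a function returning a pointer to an array of 3 ints.
   Read as use: (*(*fp)())[i] is an int. */
int (*(*fp)(void))[3] = get_table;

int third_element(void)
{
    assert(fp != NULL);
    /* Apply the operators in the same order as in the declaration:
       dereference fp, call it, dereference the result, then index. */
    return (*(*fp)())[2];
}
```

The expression in `third_element` has the same shape as the declarator of `fp`, which is the whole trick: write down how the thing is used and you have its declaration.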
The semantics of the language are quite simple if we discard the weird implicit integer promotion and conversion rules. Every operation yields a new value of a certain type, and some expressions are considered lvalues by virtue of designating a modifiable location in memory.
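For anyone who hasn't been bitten by the promotion and conversion rules yet, a minimal sketch of two classic cases follows. The function names are invented for illustration.

```c
#include <assert.h>

/* Integer promotion: both unsigned char operands are promoted to int
   before the addition, so the sum is not reduced modulo 256. */
int add_bytes(unsigned char a, unsigned char b)
{
    assert(sizeof(a + b) == sizeof(int));   /* the expression has type int */
    return a + b;
}

/* Usual arithmetic conversions: the int operand is converted to
   unsigned int, so -1 becomes a huge value and the comparison is false. */
int minus_one_less_than_one_unsigned(void)
{
    int i = -1;
    unsigned int u = 1u;
    return i < u;
}
```

Here `add_bytes(200, 100)` returns 300 rather than 44, and `minus_one_less_than_one_unsigned()` returns 0. A compiler has to get both right, and that is where a lot of the fiddly work hides.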
Rui was struggling with implementing it, not understanding and using it. He found that with 15 years of C experience, he didn't understand it well enough to just sit down and implement all of it without studying the spec.
I didn’t mean to devalue the efforts of the author, just wanted to point out that it is relatively straightforward to write a minimal C compiler, and the gotchas are in some small details.