Yeah, this is also one reason I don't like how these two languages are so often lumped together as C/C++.
These days they have diverged even further, so I find it funny to see C/C++ on job listings as if it were one thing. While you can write code in a very C-like manner in C++, that is not how typical C++ is written these days.
My favorite C "quirk": If you have an array and you want to access an item of it, you can swap the variable and the index number (put the variable name inside brackets and the number outside):
a[5]
is the same as:
5[a]
why?
a[5] is actually sugar for *(a + 5), so by commutative property, you can also do *(5 + a) to access the same memory position :-)
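A quick sketch if you want to see it compile (prints the same element twice):

#include <stdio.h>

int main(void)
{
    int a[] = { 10, 20, 30, 40, 50, 60 };

    /* both are just *(a + 5) and *(5 + a) */
    printf("%d %d\n", a[5], 5[a]);   /* prints "60 60" */
    return 0;
}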
It's one of the B leftovers in C - In B, the only type is "machine word", and words are interpreted as ints or pointers depending on the operators used. Thus, distinguishing between a[i] and i[a] is impossible, so both were valid.
Array-to-pointer decay is another manifestation of this.
It’s in the GCC section so I assume it’s some kind of lambda-function-like compiler extension? That allows jumping between bodies of two different functions…!
It's "statement exprs": "A compound statement enclosed in parentheses may appear as an expression in GNU C. This allows you to use loops, switches, and local variables within an expression. [...] The last thing in the compound statement should be an expression followed by a semicolon; the value of this subexpression serves as the value of the entire construct." [0]
Regarding 12, alignment of bitfields: how I believe it works is that when a bitfield of type long is laid out, the structure so far is considered to be a vector of storage cells whose size and alignment are those of long:
struct foo {
char a;
long b: 16;
char c;
};
So, a has been laid into the structure, and the current offset is 1 byte.
That byte is considered to occupy a portion of an existing long-sized bitfield cell. In other words, a is essentially taken to be an 8-bit field in the first long-sized cell of the structure. That cell looks like it has 56 bits left in it (if we assume a 64-bit long). Since 56 > 16, the new bitfield b is placed into that cell. When that field is placed, the placement offset becomes 3. The type of c being char, that offset is acceptable for c.
I've painstakingly reverse engineered the rules when developing the FFI for TXR Lisp:
1> (sizeof (struct foo (a char) (b (bit 16 long)) (c char)))
8
2> (alignof (struct foo (a char) (b (bit 16 long)) (c char)))
8
3> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) a)
0
4> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) b)
** ffi-offsetof: b is a bitfield in #<ffi-type (struct foo (a char) (b (bit 16 long)) (c char))>
4> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) c)
3
I've summarized my empirically-obtained understanding for the benefit of users and anyone else doing similar work in a different project.
The cell size is not necessarily the same as "long" - it can be whatever the compiler wants, so long as alignment of non-bitfield fields is appropriate. It doesn't even have to be the same for every bitfield.
If the bitfield is declared as long, then based on that specific cell size the decision is made whether to pack the bits into the current cell or a new cell.
If a leading char member is followed by a uint64_t bitfield that is 57 bits wide, a new cell will be allocated for those 57 bits at offset 8. The char is considered to be a field of 8 bits allocated in an existing 64 bit cell, leaving 56. 57 cannot fit, and so the offset is bumped to the next cell alignment.
This is testable.
I'm only writing about GCC, not about ISO C, which specifies very little, allowing implementations latitude in choosing the underlying storage unit size and alignment for bitfields regardless of their declared type.
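For example, the 57-bit case is easy to poke at with sizeof/offsetof; a sketch, assuming GCC on a typical LP64 target (the narrow variant should reproduce the size 8 / offset 3 numbers from the session above):

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

struct narrow {
    char     a;
    uint64_t b : 16;   /* fits in the 56 bits left in the first cell */
    char     c;
};

struct wide {
    char     a;
    uint64_t b : 57;   /* 57 > 56, so a fresh cell starts at offset 8 */
    char     c;
};

int main(void)
{
    printf("narrow: sizeof=%zu offsetof(c)=%zu\n",
           sizeof(struct narrow), offsetof(struct narrow, c));
    printf("wide:   sizeof=%zu offsetof(c)=%zu\n",
           sizeof(struct wide), offsetof(struct wide, c));
    return 0;
}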
I really wish both would be valid in C11. Or rather I wish I had "systems-C" where all the undefined behaviour added for high performance computing was filed off and defined as "whatever the platform does".
> Or rather I wish I had "systems-C" where all the undefined behaviour added for high performance computing was filed off and defined as "whatever the platform does".
Depending on what you mean by undefined behavior, now you've made register allocation an invalid optimization. You really don't want to use that version of C.
> all the undefined behaviour added for high performance computing
UBs were added for cross-platform incompatibilities, where operations were too "core" (and/or untestable) for IBs (implementation-defined behaviours) to be acceptable. The reason was not performance (aside from not imposing a runtime check where one would have been possible) but portability:
> 3.4.3 undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
Those UBs were leveraged later on by optimising compilers, because they provide constraints compensating for C's useless type system.
So you can just use a non-optimising compiler or one which only does simple optimisations (e.g. tcc), and see what the compiler generates from your UBs.
I think volatile variables are a good example of how we really don't want to rely on platform semantics most of the time. We use volatile variables to tell the compiler "hey, the platform actually cares about these writes, so do them the way I wrote them."* But this is relatively rare! The vast majority of the time, we want the compiler to go nuts with constant propagation and reordering and all the other good stuff. A language where volatile was the default and "go nuts" was an explicit keyword would be really annoying to use.
* There are more things in heaven and earth (MMIO!) than are dreamt of in your memory model.
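A minimal sketch of the "the platform actually cares about these reads/writes" case; the register address here is made up:

#include <stdint.h>

/* Hypothetical memory-mapped device register, for illustration only. */
#define UART_STATUS ((volatile uint32_t *)0x40001000u)

void wait_for_tx_ready(void)
{
    /* volatile forces a real load on every iteration; without it the
       compiler could hoist the read out of the loop and spin forever on a
       stale value. */
    while ((*UART_STATUS & 0x1u) == 0)
        ;
}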
Any given implementation is free to define any particular instance of UB. "Whatever the platform does" is still UB for portable code though, since the set of possible platforms and their behavior is unbounded.
Also to be pedantic: "= {};" is not valid C (at least until C23) and fails to compile on MSVC - GCC and Clang accept it as a non-standard language extension though (the proper form would be "= {0};").
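In other words (a small sketch, struct name made up):

struct point { int x, y; };

struct point a = {0};   /* standard C: zero-initializes every member       */
struct point b = {};    /* GCC/Clang extension before C23; MSVC rejects it */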
"Flat Initializer Lists" is given as an example in K&R C I think, at least the first edition, when writing those extra braces to fill out an initializer must have felt very redundant.
These days many compilers will warn if you do this, however, as people rarely do it and it usually indicates a misunderstanding of the type being initialized.
I think it's quite readable though, so it's a shame it causes warnings. What do you think?
I find it slightly worse to read. It's C, so my brain is in "newlines don't matter" reading mode, so I see an array of 6 things and then have to mentally split them back up.
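For reference, a small sketch of the two forms being discussed, with the flat one laid out so the row grouping is still visible:

int nested[2][3] = { {1, 2, 3}, {4, 5, 6} };   /* fully braced form */
int flat[2][3]   = { 1, 2, 3,
                     4, 5, 6 };                /* "flat" form: valid, but
                                                  -Wmissing-braces warns */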
8. Modifiers to array sizes in parameter definitions [https://godbolt.org/z/FnwYUs]
void foo(int arr[static const restrict volatile 10]) {
// static: the array contains at least 10 elements
// const, volatile and restrict apply to the pointer type the array parameter is adjusted to.
}
I imagine most of these depend on the C version, but this one specifically bit me because one tool only supported C99 and the other was C11 or something later.
What? UB is clearly undesirable, but assuming it is impossible and deducing that other outcomes must be meant are clearly wrong assumptions on the compiler writer's part.
More sensible compilers (including older versions of clang) do the right thing (TM) here and yield a compiler error.
There were earlier attempts at do-what-i-mean programming languages. They are rightfully buried in history.
UB is not impossible; I think the author is being a little cheeky there. But the standard does grant compilers extreme liberties as far as how they deal with programs which can execute UB. LLVM's choice of what to do with that liberty, in this case, seems to be to assume the UB is unreachable and continue legally optimizing the program under that assumption. That's not a wrong assumption according to the definition of C.
It's debatable whether it's a good assumption. But not wrong.
> UB is clearly undesirable, but assuming it is impossible and deducing other outcomes must be meant are clearly wrong assumptions by the compiler writer.
Compilers can and absolutely do assume that UB is impossible in this code (no integer overflow) and deduce other outcomes must be meant (the loop operates on contiguous memory):
#include <stdint.h>

void foo(char* arr, int32_t end)
{
    /* Signed overflow of i would be UB, so the compiler assumes the loop
       runs exactly `end` iterations over contiguous memory. */
    for (int32_t i = 0; i != end; ++i)
        arr[i] = 0;
}
Assuming undefined behavior is impossible is a way for the compiler to optimize code. In fact, it is the main reason why UB exists.
It is exemplified by the C++23 std::unreachable() function, whose description is "invokes undefined behavior". It is intended to mark parts of the code that are unreachable, so that they don't appear in optimized builds but may appear in debug builds. It is an explicit use of "the power of UB": an optimizing compiler considers that calling std::unreachable() is impossible, so all code paths that lead to it can be safely pruned. In a debug build, the code may be generated anyway, and the compiler will choose something sensible for what happens when it is called, typically a crash, but it can be anything; it is UB.
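A C sketch of the same idea, using GCC/Clang's __builtin_unreachable() rather than C++23 std::unreachable() (C23 also adds an unreachable() macro in <stddef.h>):

enum color { RED, GREEN, BLUE };

const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    /* Reaching this point is UB, so the optimizer is free to drop the
       fall-through path entirely; a debug build may trap here instead. */
    __builtin_unreachable();
}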
From where I stand, many of the do-what-I-mean programming languages are doing just great in distributed computing, the Web, and mobile OSes, taking over roles that used to be filled by C and C++ during the last century.
It's also not good advice, because if you put your code through that many off-by-default compiler warnings, you'll just find bugs in the warnings. E.g. -Wstrict-aliasing in GCC can be wrong and -Wdeprecated can be literally impossible to fix.
is effectively a standard way to declare that p must be a non-null pointer. I always wondered if any compiler actually makes use of this for optimization purposes.
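Presumably this is the [static 1] form from the article; a sketch:

/* The caller promises p points at at least one int, so passing NULL is
   undefined behavior; some compilers warn if they can see NULL being
   passed here. */
void fill_one(int p[static 1])
{
    p[0] = 42;
}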
After learning about a few of these I started to understand why people coming from C always said that PHP is a well designed language…
But OK, I understand that my mind is just not made for the complexity of C. Most likely I'm not a real programmer.
I instantly get knots in my brain and start to bang my head against the wall when I have to look at C code for too long. Actually, even C documentation is enough to trigger this. (I get mad every time I have to look at a Linux system man page.)
This is highly subjective of course. Other people seem to love C!
I'm more of a grug brain¹, who mostly only understands plain pure functions.
Input in, output out. No magic. Everything else's too taxing.
PHP is a well-designed language because it has value types (immutable data structures). That puts it far above any language without them for correctness. The poorly designed old database libraries aren’t really a language issue.
> Main directly calls this_is_not_directly_called_by_main in this implementation. This happens because: [...] LLVM assumes that bar() will have executed by the time main() runs.
I think this reasoning is slightly incorrect, although I don't blame the author as this is a very common misconception. I believe the correct reasoning might be as follows:
1. The compiler sees the pointer is dereferenced.
2. The compiler infers the pointer was not NULL.
3. The compiler determines a set of candidates for its target (which may be the universal set).
4. If it finds only one candidate, it just substitutes the target.
The critical thing to notice here is that the compiler doesn't need to care about the reachability of that candidate. It's making a conservative over-approximation, after all. You can witness the effect of this by formulating an impossible condition inside bar() that the compiler completely ignores: see [1]. Note the pointer assignment cannot have been implied by "bar() will have executed", as the execution of bar() could never lead to that assignment anyway!
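A sketch of the pattern being discussed, reusing the function names from the quote (the function-pointer name fp is my own):

#include <stdio.h>

static void (*fp)(void);   /* file-scope, so it starts out NULL */

static void this_is_not_directly_called_by_main(void)
{
    puts("surprise");
}

void bar(void) { fp = this_is_not_directly_called_by_main; }

int main(void)
{
    /* Calling through a NULL pointer is UB (steps 1-2), so the only
       candidate target is the one function whose address is ever stored
       into fp (step 3), and the call gets substituted directly (step 4),
       regardless of whether bar() is ever reachable. */
    fp();
    return 0;
}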
You could use it for getting an address that will be linked in later. On GCC I get a warning (which I don't think I can mask) for taking the address of such an object, because the expression has type void. A better way of achieving this is usually to declare something like extern unsigned char foo[] instead, but that has a type other than void*.
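A sketch of that idiom; __heap_start here is a hypothetical symbol defined only in the linker script:

/* Unsized char array: gives a usable address without the void-typed
   expression warning mentioned above. */
extern unsigned char __heap_start[];

void *heap_base(void)
{
    return __heap_start;   /* decays to unsigned char *, converts to void * */
}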
This one is actually pretty simple, it works a lot like static or typedef. It's really just a modifier for what is being declared - in a typedef we're not declaring a name to refer to an instance (variable) of a type, but we're declaring a name to refer to the type itself.
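A small sketch of that reading of typedef (names made up): the declaration is written exactly like a variable declaration of `handler`, but the typedef keyword makes the name stand for the type instead of an object:

typedef int (*handler)(void *ctx, int event);

int on_event(void *ctx, int event) { (void)ctx; return event; }

handler h = on_event;   /* handler is now an ordinary type name */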
Quote: "4. Flexible array members ..... int elems[]; // <-- flexible array member"
TIL that a dynamic array is also called flexible. Is this generation, out of boredom, trying to redefine well-established paradigms? Because, for me, a 90's-formed developer, "flexible" means maybe inheritance, or even better polymorphism. There is nothing flexible about a dynamic array. Its structure is well defined on the stack/heap, and with current compiler optimizations it can even be demoted to a simple static array for faster access within CPU registers.
"Dynamic array" refers to block of memory allocated via malloc() which you just happen to use as array.
"Flexible array member" [0] is when you have a struct and its last member is an array with unspecified size.
An example:
#include <stdio.h>
#include <stdlib.h>

struct Foo {
    int len;
    int* arr; // dynamic "array"
};

struct Bar {
    int len;
    int arr[]; // FAM
};

int main(void)
{
    const int n = 12;

    // have to allocate the array myself; no guarantee it ends up anywhere near the rest of the struct
    struct Foo* a = malloc(sizeof *a);
    a->arr = malloc(n * sizeof *(a->arr));

    // the array is part of the memory allocated for the struct itself
    struct Bar* x = malloc(sizeof *x + n * sizeof *(x->arr));

    return 0;
}
>"Dynamic array" refers to block of memory allocated via malloc() which you just happen to use as array.<
No. A dynamic array is an array which can be expanded or shrunk during its runtime life. The fact that C/C++ uses malloc for that (and btw, it's not the only way to do it) is its problem. In other languages you have dynamic arrays that can be expanded/shrunk without an extra line, which is the main reason Rust is nowadays a replacement for C/C++.
>[0]<
From your own wiki reference: "the flexible array member must be last"
LMAO, really? Well, that indeed is a bigger C quirk. In Pascal, as an example, I can have it anywhere inside the record (the struct equivalent in C), and it can be just as "flexible".
It has to be last because it's not a pointer to the array, it is the array. The array elements are immediately after the struct in memory. You can't resize it without reallocating the whole struct.
While a few of these were interesting, I'd love to see a short technical explanation of each quirk for the feeble high-level programmer (me). The first one, for example: is foo initialised? How so?
The reason is that a struct doesn't create a new scope, unlike in C++. If you define something inside a struct, it will also be available outside of the struct.
I think it's aimed at C programmers. foo is a struct, so it's a type, it's not a variable. The point is just that struct bar is also defined by the definition of struct foo.
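A sketch of what I think is going on (not the article's exact code):

/* Defining struct bar inside struct foo also makes it visible at file
   scope, unlike in C++. */
struct foo {
    struct bar { int x; } inner;
};

struct bar standalone = { 42 };   /* valid C; C++ would want foo::bar */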
The "Compound literals are lvalues" is one that caught me out recently.
I've been doing C for a long time and thought I knew all the "decent" tricks. When I saw it, I went "Oh, that's one of the silly new dynamic features that I ignore."
Nope.
It's been in the language forever. I'm surprised I never tripped over it before given all the embedded work I do.
I consider the array pointer stuff a bit of a foot-gun in C. I've seen too many examples of people mixing up uint8_t[][] and uint8_t**.
The "compound literals are lvalues" pattern I've seen many times for inline initializing a struct that's only going to be around as a parameter to a single function call.
I just fixed some neural-net code to use #7. I hate passing pointers to layers that have a fixed size, and passing an array sometimes causes problems that require too many casts. Typedef'ing an array pointer to the sized array is precisely what I needed.
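Roughly this shape, as a sketch (the layer size and names are made up):

/* A pointer to a sized array keeps the dimension in the type instead of
   decaying to a bare float *. */
typedef float (*layer16)[16];

void relu(layer16 layer)
{
    for (int i = 0; i < 16; ++i)
        if ((*layer)[i] < 0.0f)
            (*layer)[i] = 0.0f;
}

/* float weights[16];  relu(&weights);  -- size checked, no casts needed */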
Special mentions #2 looks like it might be useful, but rightly produces errors in C++ (and warnings in C). OTOH, `__builtin_constant_p()` is true for things other than constant literals.
If `x` is a constant, `(x) * 0l` is a zero constant, so `(void*)((x) * 0l)` is a null pointer. When a null void pointer is one branch of a ternary conditional, the expression takes the (pointer) type of the other branch.
If `x` is not a constant, `(void*)((x) * 0l)` is a void pointer to address 0 (which may not even be a null pointer at runtime, since null may have a runtime address distinct from zero!). The ternary conditional then unifies the types of the branches, resulting in `void*`.
My understanding of how it works is: with a constant value, the compiler replaces (x) with the constant 0 and converts the (void *) into (int *), which makes the size equality return true. But I am not entirely sure :)
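For anyone missing the context, the construct being discussed is presumably a variant of this well-known "is this an integer constant expression" macro (it relies on GNU C's sizeof(void) == 1 for the non-constant branch):

#define ICE_P(x) \
    (sizeof(int) == sizeof(*(1 ? ((void *)((x) * 0l)) : (int *)1)))

/* ICE_P(42)              -> 1 (null pointer constant branch, type int *)
   int n = 42; ICE_P(n)   -> 0 (branch has type void *, sizeof(void) == 1) */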
Most people that still cling to C instead of C++ do it because they are stuck in UNIX-clone kernel stuff or embedded, or are religiously against anything else.
So whatever language Rust "replaces" is kind of a moot point, and then there is the whole ongoing integration with Linux, a UNIX-clone kernel.
Pointing out that a comment doesn't contribute to discussion doesn't contribute to discussion either. I'm definitely not contributing much by saying this.
I also see a fair few elements on that list as being problematic, to say the least. I can't stand Rust, though, so for those times I really need high performance I try to keep my C knowledge sharp-ish.
Fortunately GCC has a whole bucket-list of warnings that can be enabled (I like compiling with -Wall -Wextra -pedantic, myself) which can, combined with proper tooling, catch many issues.
http://www.pvv.org/~oma/DeepC_slides_oct2011.pdf
Source: https://freecomputerbooks.com/Deep-C-and-Cpp.html#downloadLi...
Previous discussion: https://news.ycombinator.com/item?id=3093323
It could be considered a bit dated at this point (it's from before C++11), but I find it still both entertaining and educational.