But less satisfying, it looks like clang optimizes out the entire zero-initialization logic, presumably because it assumes the expression ((size_t *)p)[-1] is undefined behavior.
Makes me wonder what kind of compiler flags muslibc uses to prevent this.
Tried -fno-builtin-malloc on a hunch and it does seem to prevent the zero-initialisation logic from getting optimised away: https://godbolt.org/z/xWbEqcenn
musl probably uses -fno-builtin to disable any special handling of "builtin" library functions like malloc and memcpy?
> But less satisfying, it looks like clang optimizes out the entire zero-initialization logic, presumably because it assumes the expression ((size_t *)p)[-1] is undefined behavior.
Yeah, I also noticed that is undefined behavior!
> Makes me wonder what kind of compiler flags muslibc uses to prevent this.
As far as I can see, the only related flags are `-ffreestanding`, `-fno-builtin` and `-fno-strict-aliasing` but I think none of these flags prevent the undefined behavior if 1) either the malloc() implementation gets inlined into calloc(), or 2) calloc() gets inlined into the caller, so that the compiler realizes that `p` is not actually an array of `size_t` or compatible integers.
So perhaps musl also avoids undefined behavior by forcing `malloc()` and `calloc()` to not be inlined into the caller? Although I'm not entirely sure that actually completely prevents undefined behavior, I think it would just avoid it assuming how compilers usually/currently work?
Edit: On further investigation, I think what I said above is wrong. I think `-ffreestanding` and `-fno-builtin` should be enough to prevent undefined behavior. I wasn't considering the case where `((size_t) *p)[-1]` points to a struct like this:
struct alloc {
size_t something;
unsigned char buffer[]; // flexible array member
}
In this case, then I think the code is not triggering undefined behavior when used with those compiler flags (which prevent the compiler from using its knowledge about malloc's special semantics).
Edit 2: On second thought, even then I think it might trigger undefined behavior because the C standard doesn't guarantee that there's no padding at the end of such a struct?
In the real world, musl would call its own `malloc` and can presumably trust that `((size_t *)p)[-1]` is valid and contains the block metadata (and the compiler can't assume it isn't valid), whereas the godbolt snippet calls libc's `malloc` which is a black box to outside callers. Or maybe LLVM can even deduce that the low bits are always zero on `libc`?
“Pointer arithmetic is well defined only on array type and between array[0] and array[array_size+1]”
And it’s even stricter:
- you can create a pointer to array[array_size+1], but dereferencing it invokes undefined behavior.
- changing a pointer that points into an object so that it points into another object invokes undefined behavior. Modern compilers track “pointer provenance” to sometimes speed up code (and occasionally perplex the programmer)
Funnily enough the SO answer has a off by one error. &array[array_size] is the last valid pointer (i.e. the pointer is valid, but can't be dereferenced).
Those rules apply to access to the underlying array, they’re not a constraint on syntax for array expressions. malloc just returns a pointer, which could be in the middle of an array if malloc were an ordinary user-defined function.
It’s only special knowledge of the behavior of standard malloc that allows the compiler to assume that pointer is to offset 0 of an allocated array.
> malloc just returns a pointer, which could be in the middle of an array if malloc were an ordinary user-defined function.
Yes, but even in that case, it would have to be an array of `size_t` objects for the pointer cast `(size_t *)` in the musl code not to trigger undefined behavior (especially if malloc would get inlined), right?
Which means calloc() would trigger undefined behavior as soon as one of its callers used it for allocating a non-size_t array (especially if calloc() would get inlined with link-time optimization)?
You just need certain parts of the returned allocation to be castable to size_t, and there's no way for the compiler to prove that wrong if it doesn't know the internals of the allocator. The underlying allocation could be as big as you want, and can be composed of whatever type you like.
> You just need certain parts of the returned allocation to be castable to size_t, and there's no way for the compiler to prove that wrong if it doesn't know the internals of the allocator. The underlying allocation could be as big as you want, and can be composed of whatever type you like.
Hmm, I don't think this is true.
Even if you don't assume special semantics for malloc() and friends (by using the `-ffreestanding` and the `-fno-builtin` flags) and even if the compiler knows nothing about the malloc() implementation, I think if you're doing pointer arithmetic (with a pointer type other than `void *` or `[unsigned] char *`) then the compiler can assume that the pointer is pointing to either: 1) an actual array of objects of the underlying type (or a compatible one), or 2) a pointer to a single object, which is treated as an array with one element.
So if you do `((size_t *) p)[-1]` within `calloc()`, then the compiler can assume that `*p`, if used, must either be a `size_t` (or a compatible integer) or `p` must be pointing to one past the end of a single-element array (which, since `*p` is a dereference, would trigger undefined behavior).
Which means that if you call `calloc()` and it returns `p`, then you can only use it to dereference `size_t` objects or compatible ones (otherwise it would trigger undefined behavior). The effects of the undefined behavior would likely be observed if `calloc()` gets inlined into the caller (due to link-time optimization, for instance).
Or am I wrong?
You say no but then you list exceptions. Compatible integer types is easy. And can't you also have any pointers you want into a char array as long as they are aligned correctly? Also it should work to have a struct with a size_t followed by elements of a different type, shouldn't it?
You said "there's no way for the compiler to prove that wrong if it doesn't know the internals of the allocator" and I tried to reason how the compiler can prove that `*p` must be a `size_t` or compatible integer, even if it knows nothing about malloc() or its implementation. Which would trigger undefined behavior in the caller of calloc() if it weren't trying to allocate a size_t (or size_t array, or of compatible integers). Although normally this would only produce a visible effect if calloc() gets inlined into the caller.
> Compatible integer types is easy.
What do you mean?
> And can't you also have any pointers you want into a char array as long as they are aligned correctly?
I think so, but I think the problem is that when you do `(size_t *) p` (or pointer arithmetic with that) then the compiler can assume that p points to something that is not just a char array, which allows it to do more optimizations (due to the undefined behavior rules).
> Also it should work to have a struct with a size_t followed by elements of a different type, shouldn't it?
I think in general it wouldn't, because structure members are not guaranteed to be contiguous (i.e. they may have padding between them), so you may be triggering undefined behavior as well. Heck, they aren't even guaranteed to be laid out in memory in the same order as you define them (even though in practice that's what compilers do).
However, if your struct has just a `size_t` field followed by a flexible array member then I think it would be OK!
Edit: on second thought, even with a flexible array member I'm not sure it doesn't trigger undefined behavior because I think the C standard allows there to be padding at the end of such a struct. Which would be OK if the code was operating on a pointer to such a struct, but not to a pointer to size_t. So I think the code triggers undefined behavior even in this case.
> I think so, but I think the problem is that when you do `(size_t *) p` (or pointer arithmetic with that) then the compiler can assume that p points to something that is not just a char array, which allows it to do more optimizations (due to the undefined behavior rules).
For the sake of clarity, I'm going to call `(size_t *) p` "q" for the rest of this post.
Pointers usually have the type of what they point to. But pointers that are one past the end don't follow that rule. q-1 is a size_t, which means q is one past the end of an array containing one size_t. Therefore the existence of the q pointer does not tell the compiler what type actually exists at memory address p/q. Is it invalid to construct q from p? I wouldn't think so as long as they are in the same allocation, but maybe I'm forgetting a rule.
> I think in general it wouldn't, because structure members are not guaranteed to be contiguous
Surely there's a way to allocate two different types up against each other, though? Even if you don't make an actual struct, you can grab some char and say that X bytes are size_t and Y bytes are something else, can't you?
> Therefore the existence of the q pointer does not tell the compiler what type actually exists at memory address p/q. Is it invalid to construct q from p? I wouldn't think so as long as they are in the same allocation, but maybe I'm forgetting a rule.
But then the question is, how would you construct an allocation where both q[-1] and *p are guaranteed to be valid objects of their respective types, using standard C? (i.e. without relying on implementation-specific behavior).
> Surely there's a way to allocate two different types up against each other, though?
If by "against each other" you mean without padding between them, then I think the C standard allows that but there's no reliable/portable/guaranteed way to do it (that I know of).
By that I mean that specific implementations can guarantee that, either with struct packing or, in some cases, by carefully aligning them within a struct or a union. But this depends on the implementation/architecture, so it's not strictly portable/standard C, I think.
> Even if you don't make an actual struct, you can grab some char and say that X bytes are size_t and Y bytes are something else, can't you?
I'm not sure that is possible according to the standard, unless they already were objects of those types. See this[0]:
> It is always presumed that a char* may refer to an alias of any object. It is therefore quite safe, if perhaps a bit unoptimal (for architecture with wide loads and stores) to cast any pointer of any type to a char* type.
> The converse is not true. Casting a char* to a pointer of any type other than a char* and dereferencing it is usually in violation of the strict aliasing rule.
> As noted by Pinskla it is not deferencing a char* per se that is specifically recognized as a potential alias of any object, but any address referring to a char object. This includes an array of char objects
That said, I think for converting a bunch of chars to another type it's possible to get around that with memcpy() in some cases, but I'm not sure it would be helpful in this scenario.
How about this: We can make a struct and then have a function that bails out if for some reason the fields are not adjacent in the struct. Possibly even at compile time. In practice, this will work on basically any compiler.
Once we know we have this struct, are there any remaining barriers to the cast and -1 access?
>> Casting a char* to a pointer of any type other than a char* and dereferencing it is usually in violation of the strict aliasing rule.
How does that logic work with malloc, which unlike calloc takes no item size and simply gives you a specific number of bytes?
> Once we know we have this struct, are there any remaining barriers to the cast and -1 access?
I think the only problem is that malloc() wouldn't know the type of the non-size_t field, so it couldn't construct such a struct to do a conforming allocation which would allow calloc() to do those operations. But if the compiler doesn't have access to the malloc() implementation (assuming it doesn't know that malloc() is a special function), then maybe that doesn't invoke undefined behavior? I'm not sure, actually!
But I think that this whole saga is actually a consequence of the standard being flawed, in that I think it's not possible to create an implementation of malloc() using strict standard C (but I could be wrong, I'm not exactly an expert on the C standard).
> How does that logic work with malloc, which unlike calloc takes no item size and simply gives you a specific number of bytes?
The return value of malloc() is treated specially by the standard, I believe. Which is why musl has to use the `-ffreestanding` and `-fno-builtin` flags, I think (to avoid optimizations by the compiler due to special knowledge of standard functions), which are non-standard.
Well, musl also uses '-fno-strict-aliasing' (also a non-standard flag) which probably is what actually allows it to implement malloc() without invoking undefined behavior.
I think this is also why GCC provides a "malloc" attribute which you can use in your own functions, so that they are treated specially (like malloc()), but I believe that this is also non-standard.
What I mean is that malloc doesn't know the size of any of the fields.
Let's assume for the sake of simplicity that `sizeof element` is the same as `sizeof size_t` or a multiple of it. We don't need to assume this, it just makes the math nicer for the moment.
Or I do: char* buffer = calloc(count + 1, sizeof element);
How is C supposed to know what goes in that buffer?
Maybe it's an array of elements. Maybe it's an array of size_t[sizeof element / sizeof size_t].
Maybe it's an array of union { element, size_t[sizeof element / sizeof size_t]}.
If I want to store an `element` at a location `sizeof element` bytes into the buffer, how can it say no? If I want to store a `size_t` at a location `sizeof element - sizeof size_t` bytes into the buffer, how can it say no? If I want to do both, how can it say no?
It returns a pointer to a chunk of memory of sufficent fore size_t (unsigned, positive whole number of elements) where each of those elements is size_t (unsigned, positive whole number) in size within the limits of your computationally representable chunk of numbers.
No code "knows" squat about the implementation. You can know. You're the programmer, you can read the code. The computer just does. Even if that causes demons to fly out your nose.
Friends don't let friends write code without checking whether it'll cause demons to fly out their nose first. If they ain't your friend though, there's always checking ERRNO.
I think I was operating under more conservative assumptions about what is actually legal to do in C.
I actually had to go and read the standard because the strict aliasing rules can be very tricky and it's not always clear what is legal to do in every situation. Even then, it's hard to understand the wording of the standard and its implications, especially since there are multiple rules coming into play and there's no clear explanation of how they interact with each other.
In this specific case that we're talking about, I think you can indeed allocate a buffer, store the malloc-returned pointer in a `char *` variable, and then store an element and a size_t wherever you want within the buffer (but if you want to store them or access them using a properly-typed pointer rather than memcpy() then the address needs to respect the alignment of the type, I think).
But I think this is only legal because the addresses within a buffer returned by malloc are not considered to have a declared type and that the type of some address within the buffer becomes established whenever you assign or copy some value to it, so it doesn't actually necessarily become established based on the valid conversions between pointer types like I thought it was.
I'm still not exactly 100% certain about whether the code we're talking about is actually undefined behavior or not, because on one hand, the standard clearly says that using the unary `*` operator on a pointer which points one past the end of an array is undefined behavior (and `p` could be considered such a pointer that would be used with `*`), but on the other hand, it's hard to see how this can be the case when `p` can be a perfectly valid pointer to store some other value within the same allocation.
The rule which says that it's undefined behavior to use `*` on a pointer that points one past the end of an array (which can be a single value) is inside the pointer arithmetic section. So my interpretation is that, if there is a valid and properly-typed value stored at the address of a pointer that also points to one past the end of an array, it's only undefined behavior to use `*` on such a pointer if the address of that pointer was obtained through pointer arithmetic based on a pointer that also pointed to the array.
So basically, I think once you do any pointer arithmetic, the resulting pointer cannot ever point outside the array or value, except one past the value/array. However, in this latter case, you're not allowed to use `*` on that address through such a pointer under any circumstance.
I think this gets trickier if the array is a dynamically-allocated array of chars because this array of chars could contain values of other types, so I'm not 100% sure of the implications there.
Anyway, if my interpretation is correct, I think in this case the rules are subtle: it seems that reading from or writing to `((size_t *) p)[-1]` is legal if there already is a `size_t` stored at that address (I think?), otherwise it wouldn't be. But it doesn't seem to be legal to read from or write to `(((size_t *) p)[-1])[1]` (unless `p` points to some non-first element of a size_t array, of course). However, it would be legal to write to `*((size_t *) p)` as long as the allocation is valid, I think. And it would also be legal to read from there if the address represented by `p` already contains a `size_t` there, I think (??).
Note that "C" (or rather, the compiler) can't necessarily know what is the type of some value at some address, especially if the value was (or could have been) assigned or copied inside some other function. So you could try to exploit that lack of knowledge to do something that wouldn't be legal otherwise.
However, I think that is a very flawed approach and that you should never rely on this kind of reasoning, because if the compiler (or linker) chooses to inline the code of that other function or do some other kind of inter-procedural analysis (e.g. to perform optimizations) then, suddenly, it can prove things that it couldn't prove before and your code breaks [0].
And this can happen when a new compiler or linker version gets released, without any warning. And the effects can be quite subtle, silent and dangerous (with extremely important security implications, even -- a similar issue has already happened with the Linux kernel, for instance, although the circumstances were slightly different).
Although, of course, I think this is more likely to happen if you just enable link-time optimization rather than some new compiler version being released.
[0] Of course, I'm assuming that the C standard doesn't necessarily guarantee that function implementations are opaque to each other, even if they are implemented in different modules. But it's possible there is such a rule in the C standard that I'm unaware of -- I have never read the entire standard.
“If the pointer P points to the ith element of an array, then the expressions P + n, n + P, and P - n are pointers of the same type that point to the i+nth, i+nth, and i-nth element of the same array, respectively. The result of pointer addition may also be a one-past-the-end pointer (that is, pointer P such that the expression P - 1 points to the last element of the array). Any other situations (that is, attempts to generate a pointer that isn't pointing at an element of the same array or one past the end) invoke undefined behavior.”
Makes me wonder what kind of compiler flags muslibc uses to prevent this.