> Once we know we have this struct, are there any remaining barriers to the cast and -1 access?
I think the only problem is that malloc() wouldn't know the type of the non-size_t field, so it couldn't construct such a struct to do a conforming allocation which would allow calloc() to do those operations. But if the compiler doesn't have access to the malloc() implementation (assuming it doesn't know that malloc() is a special function), then maybe that doesn't invoke undefined behavior? I'm not sure, actually!
But I think that this whole saga is actually a consequence of the standard being flawed, in that I think it's not possible to create an implementation of malloc() using strict standard C (but I could be wrong, I'm not exactly an expert on the C standard).
> How does that logic work with malloc, which unlike calloc takes no item size and simply gives you a specific number of bytes?
The return value of malloc() is treated specially by the standard, I believe. Which is why musl has to use the `-ffreestanding` and `-fno-builtin` flags, I think (to avoid optimizations by the compiler due to special knowledge of standard functions), which are non-standard.
Well, musl also uses `-fno-strict-aliasing` (also a non-standard flag) which probably is what actually allows it to implement malloc() without invoking undefined behavior.
I think this is also why GCC provides a "malloc" attribute which you can use in your own functions, so that they are treated specially (like malloc()), but I believe that this is also non-standard.
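For what it's worth, here is a minimal sketch of how that attribute is typically applied. `xzalloc` and `ATTR_MALLOC` are names I made up for illustration; the only real thing here is GCC's `__attribute__((malloc))` extension (Clang accepts it too), which as far as I understand tells the compiler that the returned pointer doesn't alias any other live pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Non-standard GCC/Clang extension; expands to nothing elsewhere. */
#if defined(__GNUC__)
#define ATTR_MALLOC __attribute__((malloc))
#else
#define ATTR_MALLOC
#endif

/* Hypothetical helper: returns n zero-filled bytes, or NULL on failure. */
ATTR_MALLOC void *xzalloc(size_t n)
{
    void *p = malloc(n ? n : 1); /* avoid the implementation-defined malloc(0) */
    if (p != NULL)
        memset(p, 0, n);
    return p;
}
```

Calling it looks like any other allocator: `unsigned char *b = xzalloc(8);` followed by `free(b);`.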
What I mean is that malloc doesn't know the size of any of the fields.
Let's assume for the sake of simplicity that `sizeof element` is the same as `sizeof(size_t)` or a multiple of it. We don't need to assume this, it just makes the math nicer for the moment.
Or I do: `char* buffer = calloc(count + 1, sizeof element);`
How is C supposed to know what goes in that buffer?
Maybe it's an array of elements. Maybe it's an array of `size_t[sizeof element / sizeof(size_t)]`.
Maybe it's an array of `union { element, size_t[sizeof element / sizeof(size_t)] }`.
If I want to store an `element` at a location `sizeof element` bytes into the buffer, how can it say no? If I want to store a `size_t` at a location `sizeof element - sizeof(size_t)` bytes into the buffer, how can it say no? If I want to do both, how can it say no?
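To make that concrete, here is a rough sketch of doing both stores in one calloc'd buffer. The `element` struct and the `store_both` function are hypothetical names of mine, and I'm assuming `sizeof(element)` is a multiple of `sizeof(size_t)` as above, so both addresses end up suitably aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type whose size is a multiple of sizeof(size_t). */
typedef struct { size_t a, b; } element;

/* Stores a size_t just before the first element and an element right after
   it, then reads both back. Returns 0 on bad input or allocation failure. */
size_t store_both(size_t count)
{
    if (count == 0 || count == SIZE_MAX) /* need one element; count + 1 must not wrap */
        return 0;

    char *buffer = calloc(count + 1, sizeof(element));
    if (buffer == NULL)
        return 0;

    /* An element at sizeof(element) bytes into the buffer... */
    element *items = (element *)(buffer + sizeof(element));
    /* ...and a size_t at sizeof(element) - sizeof(size_t) bytes into it. */
    size_t *len = (size_t *)(buffer + sizeof(element) - sizeof(size_t));

    items[0] = (element){ .a = 1, .b = 2 };
    *len = count;

    size_t sum = *len + items[0].a + items[0].b;
    free(buffer);
    return sum;
}
```

With these values, `store_both(3)` returns 6 (3 + 1 + 2). Nothing in the allocation itself records which of the layouts the buffer was "meant" to have.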
It returns a pointer to a chunk of memory sufficient for a size_t (unsigned whole number) count of elements, where each of those elements is a size_t (unsigned whole number) of bytes in size, within the limits of what your machine can represent.
No code "knows" squat about the implementation. You can know. You're the programmer, you can read the code. The computer just does. Even if that causes demons to fly out your nose.
Friends don't let friends write code without checking whether it'll cause demons to fly out their nose first. If they ain't your friend though, there's always checking `errno`.
I think I was operating under more conservative assumptions about what is actually legal to do in C.
I actually had to go and read the standard because the strict aliasing rules can be very tricky and it's not always clear what is legal to do in every situation. Even then, it's hard to understand the wording of the standard and its implications, especially since there are multiple rules coming into play and there's no clear explanation of how they interact with each other.
In this specific case that we're talking about, I think you can indeed allocate a buffer, store the malloc-returned pointer in a `char *` variable, and then store an element and a size_t wherever you want within the buffer (but if you want to store them or access them using a properly-typed pointer rather than memcpy() then the address needs to respect the alignment of the type, I think).
But I think this is only legal because memory returned by malloc is not considered to have a declared type, and the type of an address within the buffer becomes established whenever you assign or copy a value to it. So it doesn't necessarily become established by the valid conversions between pointer types, as I thought it did.
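Here is a small sketch of the two ways of storing into such a buffer, as I understand them. The function names are mine (hypothetical), and the alignment check relies on `uintptr_t` and C11 `_Alignof`, which I'm assuming are available.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a size_t to any byte offset; memcpy has no alignment requirement. */
void put_size_at(char *buf, size_t offset, size_t value)
{
    memcpy(buf + offset, &value, sizeof value);
}

/* Copy a size_t back out of any byte offset. */
size_t get_size_at(const char *buf, size_t offset)
{
    size_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Access through a size_t pointer is only an option when this returns 1. */
int is_aligned_for_size(const void *p)
{
    return (uintptr_t)p % _Alignof(size_t) == 0;
}
```

So with `char *buf = malloc(32);`, something like `put_size_at(buf, 3, 12345)` should be fine at an odd offset, while `*(size_t *)(buf + 3) = 12345` would not be, because the address is misaligned for a `size_t`.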
I'm still not exactly 100% certain about whether the code we're talking about is actually undefined behavior or not, because on one hand, the standard clearly says that using the unary `*` operator on a pointer which points one past the end of an array is undefined behavior (and `p` could be considered such a pointer that would be used with `*`), but on the other hand, it's hard to see how this can be the case when `p` can be a perfectly valid pointer to store some other value within the same allocation.
The rule which says that it's undefined behavior to use `*` on a pointer that points one past the end of an array (which can be a single value) is inside the pointer arithmetic section. So my interpretation is that, if there is a valid and properly-typed value stored at the address of a pointer that also points to one past the end of an array, it's only undefined behavior to use `*` on such a pointer if the address of that pointer was obtained through pointer arithmetic based on a pointer that also pointed to the array.
So basically, I think once you do any pointer arithmetic, the resulting pointer cannot ever point outside the array or value, except one past the value/array. However, in this latter case, you're not allowed to use `*` on that address through such a pointer under any circumstance.
I think this gets trickier if the array is a dynamically-allocated array of chars because this array of chars could contain values of other types, so I'm not 100% sure of the implications there.
Anyway, if my interpretation is correct, I think in this case the rules are subtle: it seems that reading from or writing to `((size_t *) p)[-1]` is legal if there already is a `size_t` stored at that address (I think?), otherwise it wouldn't be. But it doesn't seem to be legal to read from or write to `(&((size_t *) p)[-1])[1]` (unless `p` points to some non-first element of a size_t array, of course). However, it would be legal to write to `*((size_t *) p)` as long as the allocation is valid, I think. And it would also be legal to read from there if the address represented by `p` already contains a `size_t` there, I think (??).
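For reference, this is roughly the pattern under discussion, as a minimal sketch. The names `alloc_with_size`, `stored_size` and `free_with_size` are hypothetical, and whether the `[-1]` read is strictly conforming is exactly the open question here, so treat this as an illustration of the pattern rather than a ruling on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates n payload bytes preceded by a size_t header holding n.
   Note: the payload is only guaranteed to be aligned for size_t. */
void *alloc_with_size(size_t n)
{
    if (n > SIZE_MAX - sizeof(size_t)) /* header + payload must not wrap */
        return NULL;
    size_t *block = malloc(sizeof(size_t) + n);
    if (block == NULL)
        return NULL;
    block[0] = n;     /* this store puts a size_t at the header address */
    return block + 1; /* payload starts right after the header */
}

/* The cast and -1 access: reads the size_t stored just before p. */
size_t stored_size(const void *p)
{
    return ((const size_t *)p)[-1];
}

/* Steps back to the start of the allocation before freeing it. */
void free_with_size(void *p)
{
    if (p != NULL)
        free((size_t *)p - 1);
}
```

Here the header was written through a `size_t` lvalue before it is ever read, which is the case I argued above is (probably) fine.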
Note that "C" (or rather, the compiler) can't necessarily know what is the type of some value at some address, especially if the value was (or could have been) assigned or copied inside some other function. So you could try to exploit that lack of knowledge to do something that wouldn't be legal otherwise.
However, I think that is a very flawed approach and that you should never rely on this kind of reasoning, because if the compiler (or linker) chooses to inline the code of that other function or do some other kind of inter-procedural analysis (e.g. to perform optimizations) then, suddenly, it can prove things that it couldn't prove before and your code breaks [0].
And this can happen when a new compiler or linker version gets released, without any warning. And the effects can be quite subtle, silent and dangerous (with extremely important security implications, even -- a similar issue has already happened with the Linux kernel, for instance, although the circumstances were slightly different).
Although, of course, I think this is more likely to happen if you just enable link-time optimization rather than some new compiler version being released.
[0] Of course, I'm assuming that the C standard doesn't necessarily guarantee that function implementations are opaque to each other, even if they are implemented in different modules. But it's possible there is such a rule in the C standard that I'm unaware of -- I have never read the entire standard.