Hacker News new | past | comments | ask | show | jobs | submit login
Why does calloc exist? (2016) (vorpus.org)
303 points by goranmoomin on Nov 13, 2022 | hide | past | favorite | 163 comments



It may be true today that calloc checks for multiplication overflow and doesn't zero memory if it doesn't need to, that's modern revisionism. Historically, calloc did not do either of those things.

eg, from the 4.4 BSD, circa 1993:

https://minnie.tuhs.org/cgi-bin/utree.pl?file=4.4BSD/usr/src...

    void *
    calloc(num, size)
     size_t num;
     register size_t size;
    {
     register void *p;
    
     size *= num;
     if (p = malloc(size))
      bzero(p, size);
     return(p);
    }


Its not just historically (atleast if you include IoT/Industrial/Medical devices) Microsoft's "Section 52" (IoT security research group) called out this issue in several allocators used in IoT in April 2021 [0][1].

[0] https://msrc-blog.microsoft.com/2021/04/29/badalloc-memory-a...

[1] Full list: https://www.cisa.gov/uscert/ics/advisories/icsa-21-119-04


I don't believe the spec ever calls out what is exactly supposed to happen for multiplication overflow, and this was left up to the implementation to decide what to do. That being said, libmusl implemented a short but sweet check [1]. Fun stuff.

[1]: https://github.com/rofl0r/musl/blob/d05aaedaabd4f5472c233dbb...


Even more satisfying, I tried this out in godbolt and it appears that compilers are able to optimize that check into a single `mul, jo` (jump on overflow) sequence, which is how you would write it if you were writing assembly directly. It avoids a division instruction (which would be far more expensive) even though the code is written as a division: https://godbolt.org/z/r6cxo8vc7


But less satisfying, it looks like clang optimizes out the entire zero-initialization logic, presumably because it assumes the expression ((size_t *)p)[-1] is undefined behavior.

Makes me wonder what kind of compiler flags muslibc uses to prevent this.


Tried -fno-builtin-malloc on a hunch and it does seem to prevent the zero-initialisation logic from getting optimised away: https://godbolt.org/z/xWbEqcenn

musl probably uses -fno-builtin to disable any special handling of "builtin" library functions like malloc and memcpy?


> But less satisfying, it looks like clang optimizes out the entire zero-initialization logic, presumably because it assumes the expression ((size_t *)p)[-1] is undefined behavior.

Yeah, I also noticed that is undefined behavior!

> Makes me wonder what kind of compiler flags muslibc uses to prevent this.

As far as I can see, the only related flags are `-ffreestanding`, `-fno-builtin` and `-fno-strict-aliasing` but I think none of these flags prevent the undefined behavior if 1) either the malloc() implementation gets inlined into calloc(), or 2) calloc() gets inlined into the caller, so that the compiler realizes that `p` is not actually an array of `size_t` or compatible integers.

So perhaps musl also avoids undefined behavior by forcing `malloc()` and `calloc()` to not be inlined into the caller? Although I'm not entirely sure that actually completely prevents undefined behavior, I think it would just avoid it assuming how compilers usually/currently work?

  Edit: On further investigation, I think what I said above is wrong. I think `-ffreestanding` and `-fno-builtin` should be enough to prevent undefined behavior. I wasn't considering the case where `((size_t) *p)[-1]` points to a struct like this:

  struct alloc {
    size_t something;
    unsigned char buffer[]; // flexible array member
  }

  In this case, then I think the code is not triggering undefined behavior when used with those compiler flags (which prevent the compiler from using its knowledge about malloc's special semantics).

  Edit 2: On second thought, even then I think it might trigger undefined behavior because the C standard doesn't guarantee that there's no padding at the end of such a struct?


So, I just checked and it seems that the parent pointed to a musl repository that contains old code.

The current code doesn't contain that undefined behavior-triggering expression: https://git.musl-libc.org/cgit/musl/tree/src/malloc/calloc.c


In the real world, musl would call its own `malloc` and can presumably trust that `((size_t *)p)[-1]` is valid and contains the block metadata (and the compiler can't assume it isn't valid), whereas the godbolt snippet calls libc's `malloc` which is a black box to outside callers. Or maybe LLVM can even deduce that the low bits are always zero on `libc`?


Try with -ffreestanding. Otherwise clang is allowed to make assumptions about the behavior of "malloc".

It then does the typical unsatisfying clang thing of unrolling and vectorizing a _lot_ with unclear benefits: https://godbolt.org/z/eG33YM7qs


That is not undefined right? As long as p - 1 points into a valid object.

Edit: Maybe Clang abuses the bitand with an int 7 to remove the code?


It is undefined behavior. The undefined behavior rules for pointers boil down to (https://stackoverflow.com/questions/56360316/c-standard-rega...)

“Pointer arithmetic is well defined only on array type and between array[0] and array[array_size+1]”

And it’s even stricter:

- you can create a pointer to array[array_size+1], but dereferencing it invokes undefined behavior.

- changing a pointer that points into an object so that it points into another object invokes undefined behavior. Modern compilers track “pointer provenance” to sometimes speed up code (and occasionally perplex the programmer)


Funnily enough the SO answer has a off by one error. &array[array_size] is the last valid pointer (i.e. the pointer is valid, but can't be dereferenced).


Those rules apply to access to the underlying array, they’re not a constraint on syntax for array expressions. malloc just returns a pointer, which could be in the middle of an array if malloc were an ordinary user-defined function.

It’s only special knowledge of the behavior of standard malloc that allows the compiler to assume that pointer is to offset 0 of an allocated array.


> malloc just returns a pointer, which could be in the middle of an array if malloc were an ordinary user-defined function.

Yes, but even in that case, it would have to be an array of `size_t` objects for the pointer cast `(size_t *)` in the musl code not to trigger undefined behavior (especially if malloc would get inlined), right?

Which means calloc() would trigger undefined behavior as soon as one of its callers used it for allocating a non-size_t array (especially if calloc() would get inlined with link-time optimization)?


You just need certain parts of the returned allocation to be castable to size_t, and there's no way for the compiler to prove that wrong if it doesn't know the internals of the allocator. The underlying allocation could be as big as you want, and can be composed of whatever type you like.


> You just need certain parts of the returned allocation to be castable to size_t, and there's no way for the compiler to prove that wrong if it doesn't know the internals of the allocator. The underlying allocation could be as big as you want, and can be composed of whatever type you like.

  Hmm, I don't think this is true.

  Even if you don't assume special semantics for malloc() and friends (by using the `-ffreestanding` and the `-fno-builtin` flags) and even if the compiler knows nothing about the malloc() implementation, I think if you're doing pointer arithmetic (with a pointer type other than `void *` or `[unsigned] char *`) then the compiler can assume that the pointer is pointing to either: 1) an actual array of objects of the underlying type (or a compatible one), or 2) a pointer to a single object, which is treated as an array with one element.

  So if you do `((size_t *) p)[-1]` within `calloc()`, then the compiler can assume that `*p`, if used, must either be a `size_t` (or a compatible integer) or `p` must be pointing to one past the end of a single-element array (which, since `*p` is a dereference, would trigger undefined behavior).

  Which means that if you call `calloc()` and it returns `p`, then you can only use it to dereference `size_t` objects or compatible ones (otherwise it would trigger undefined behavior). The effects of the undefined behavior would likely be observed if `calloc()` gets inlined into the caller (due to link-time optimization, for instance).

  Or am I wrong?


You say no but then you list exceptions. Compatible integer types is easy. And can't you also have any pointers you want into a char array as long as they are aligned correctly? Also it should work to have a struct with a size_t followed by elements of a different type, shouldn't it?


> You say no but then you list exceptions.

You said "there's no way for the compiler to prove that wrong if it doesn't know the internals of the allocator" and I tried to reason how the compiler can prove that `*p` must be a `size_t` or compatible integer, even if it knows nothing about malloc() or its implementation. Which would trigger undefined behavior in the caller of calloc() if it weren't trying to allocate a size_t (or size_t array, or of compatible integers). Although normally this would only produce a visible effect if calloc() gets inlined into the caller.

> Compatible integer types is easy.

What do you mean?

> And can't you also have any pointers you want into a char array as long as they are aligned correctly?

I think so, but I think the problem is that when you do `(size_t *) p` (or pointer arithmetic with that) then the compiler can assume that p points to something that is not just a char array, which allows it to do more optimizations (due to the undefined behavior rules).

> Also it should work to have a struct with a size_t followed by elements of a different type, shouldn't it?

I think in general it wouldn't, because structure members are not guaranteed to be contiguous (i.e. they may have padding between them), so you may be triggering undefined behavior as well. Heck, they aren't even guaranteed to be laid out in memory in the same order as you define them (even though in practice that's what compilers do).

However, if your struct has just a `size_t` field followed by a flexible array member then I think it would be OK!

Edit: on second thought, even with a flexible array member I'm not sure it doesn't trigger undefined behavior because I think the C standard allows there to be padding at the end of such a struct. Which would be OK if the code was operating on a pointer to such a struct, but not to a pointer to size_t. So I think the code triggers undefined behavior even in this case.


> I think so, but I think the problem is that when you do `(size_t *) p` (or pointer arithmetic with that) then the compiler can assume that p points to something that is not just a char array, which allows it to do more optimizations (due to the undefined behavior rules).

For the sake of clarity, I'm going to call `(size_t *) p` "q" for the rest of this post.

Pointers usually have the type of what they point to. But pointers that are one past the end don't follow that rule. q-1 is a size_t, which means q is one past the end of an array containing one size_t. Therefore the existence of the q pointer does not tell the compiler what type actually exists at memory address p/q. Is it invalid to construct q from p? I wouldn't think so as long as they are in the same allocation, but maybe I'm forgetting a rule.

> I think in general it wouldn't, because structure members are not guaranteed to be contiguous

Surely there's a way to allocate two different types up against each other, though? Even if you don't make an actual struct, you can grab some char and say that X bytes are size_t and Y bytes are something else, can't you?


> Therefore the existence of the q pointer does not tell the compiler what type actually exists at memory address p/q. Is it invalid to construct q from p? I wouldn't think so as long as they are in the same allocation, but maybe I'm forgetting a rule.

But then the question is, how would you construct an allocation where both q[-1] and *p are guaranteed to be valid objects of their respective types, using standard C? (i.e. without relying on implementation-specific behavior).

> Surely there's a way to allocate two different types up against each other, though?

If by "against each other" you mean without padding between them, then I think the C standard allows that but there's no reliable/portable/guaranteed way to do it (that I know of).

By that I mean that specific implementations can guarantee that, either with struct packing or, in some cases, by carefully aligning them within a struct or a union. But this depends on the implementation/architecture, so it's not strictly portable/standard C, I think.

> Even if you don't make an actual struct, you can grab some char and say that X bytes are size_t and Y bytes are something else, can't you?

I'm not sure that is possible according to the standard, unless they already were objects of those types. See this[0]:

> It is always presumed that a char* may refer to an alias of any object. It is therefore quite safe, if perhaps a bit unoptimal (for architecture with wide loads and stores) to cast any pointer of any type to a char* type.

> The converse is not true. Casting a char* to a pointer of any type other than a char* and dereferencing it is usually in violation of the strict aliasing rule.

> As noted by Pinskla it is not deferencing a char* per se that is specifically recognized as a potential alias of any object, but any address referring to a char object. This includes an array of char objects

That said, I think for converting a bunch of chars to another type it's possible to get around that with memcpy() in some cases, but I'm not sure it would be helpful in this scenario.

[0] https://cellperformance.beyond3d.com/articles/2006/06/unders...


How about this: We can make a struct and then have a function that bails out if for some reason the fields are not adjacent in the struct. Possibly even at compile time. In practice, this will work on basically any compiler.

Once we know we have this struct, are there any remaining barriers to the cast and -1 access?

>> Casting a char* to a pointer of any type other than a char* and dereferencing it is usually in violation of the strict aliasing rule.

How does that logic work with malloc, which unlike calloc takes no item size and simply gives you a specific number of bytes?


> Once we know we have this struct, are there any remaining barriers to the cast and -1 access?

I think the only problem is that malloc() wouldn't know the type of the non-size_t field, so it couldn't construct such a struct to do a conforming allocation which would allow calloc() to do those operations. But if the compiler doesn't have access to the malloc() implementation (assuming it doesn't know that malloc() is a special function), then maybe that doesn't invoke undefined behavior? I'm not sure, actually!

But I think that this whole saga is actually a consequence of the standard being flawed, in that I think it's not possible to create an implementation of malloc() using strict standard C (but I could be wrong, I'm not exactly an expert on the C standard).

> How does that logic work with malloc, which unlike calloc takes no item size and simply gives you a specific number of bytes?

The return value of malloc() is treated specially by the standard, I believe. Which is why musl has to use the `-ffreestanding` and `-fno-builtin` flags, I think (to avoid optimizations by the compiler due to special knowledge of standard functions), which are non-standard.

Well, musl also uses '-fno-strict-aliasing' (also a non-standard flag) which probably is what actually allows it to implement malloc() without invoking undefined behavior.

I think this is also why GCC provides a "malloc" attribute which you can use in your own functions, so that they are treated specially (like malloc()), but I believe that this is also non-standard.


What I mean is that malloc doesn't know the size of any of the fields.

Let's assume for the sake of simplicity that `sizeof element` is the same as `sizeof size_t` or a multiple of it. We don't need to assume this, it just makes the math nicer for the moment.

I do: char* buffer = malloc((count + 1) * sizeof element);

Or I do: char* buffer = calloc(count + 1, sizeof element);

How is C supposed to know what goes in that buffer?

Maybe it's an array of elements. Maybe it's an array of size_t[sizeof element / sizeof size_t].

Maybe it's an array of union { element, size_t[sizeof element / sizeof size_t]}.

If I want to store an `element` at a location `sizeof element` bytes into the buffer, how can it say no? If I want to store a `size_t` at a location `sizeof element - sizeof size_t` bytes into the buffer, how can it say no? If I want to do both, how can it say no?


Read the source.

It returns a pointer to a chunk of memory of sufficent fore size_t (unsigned, positive whole number of elements) where each of those elements is size_t (unsigned, positive whole number) in size within the limits of your computationally representable chunk of numbers.

No code "knows" squat about the implementation. You can know. You're the programmer, you can read the code. The computer just does. Even if that causes demons to fly out your nose.

Friends don't let friends write code without checking whether it'll cause demons to fly out their nose first. If they ain't your friend though, there's always checking ERRNO.


I think you may be right and I was wrong.

I think I was operating under more conservative assumptions about what is actually legal to do in C.

I actually had to go and read the standard because the strict aliasing rules can be very tricky and it's not always clear what is legal to do in every situation. Even then, it's hard to understand the wording of the standard and its implications, especially since there are multiple rules coming into play and there's no clear explanation of how they interact with each other.

In this specific case that we're talking about, I think you can indeed allocate a buffer, store the malloc-returned pointer in a `char *` variable, and then store an element and a size_t wherever you want within the buffer (but if you want to store them or access them using a properly-typed pointer rather than memcpy() then the address needs to respect the alignment of the type, I think).

But I think this is only legal because the addresses within a buffer returned by malloc are not considered to have a declared type and that the type of some address within the buffer becomes established whenever you assign or copy some value to it, so it doesn't actually necessarily become established based on the valid conversions between pointer types like I thought it was.

I'm still not exactly 100% certain about whether the code we're talking about is actually undefined behavior or not, because on one hand, the standard clearly says that using the unary `*` operator on a pointer which points one past the end of an array is undefined behavior (and `p` could be considered such a pointer that would be used with `*`), but on the other hand, it's hard to see how this can be the case when `p` can be a perfectly valid pointer to store some other value within the same allocation.

The rule which says that it's undefined behavior to use `*` on a pointer that points one past the end of an array (which can be a single value) is inside the pointer arithmetic section. So my interpretation is that, if there is a valid and properly-typed value stored at the address of a pointer that also points to one past the end of an array, it's only undefined behavior to use `*` on such a pointer if the address of that pointer was obtained through pointer arithmetic based on a pointer that also pointed to the array.

So basically, I think once you do any pointer arithmetic, the resulting pointer cannot ever point outside the array or value, except one past the value/array. However, in this latter case, you're not allowed to use `*` on that address through such a pointer under any circumstance.

I think this gets trickier if the array is a dynamically-allocated array of chars because this array of chars could contain values of other types, so I'm not 100% sure of the implications there.

Anyway, if my interpretation is correct, I think in this case the rules are subtle: it seems that reading from or writing to `((size_t *) p)[-1]` is legal if there already is a `size_t` stored at that address (I think?), otherwise it wouldn't be. But it doesn't seem to be legal to read from or write to `(((size_t *) p)[-1])[1]` (unless `p` points to some non-first element of a size_t array, of course). However, it would be legal to write to `*((size_t *) p)` as long as the allocation is valid, I think. And it would also be legal to read from there if the address represented by `p` already contains a `size_t` there, I think (??).

Note that "C" (or rather, the compiler) can't necessarily know what is the type of some value at some address, especially if the value was (or could have been) assigned or copied inside some other function. So you could try to exploit that lack of knowledge to do something that wouldn't be legal otherwise.

However, I think that is a very flawed approach and that you should never rely on this kind of reasoning, because if the compiler (or linker) chooses to inline the code of that other function or do some other kind of inter-procedural analysis (e.g. to perform optimizations) then, suddenly, it can prove things that it couldn't prove before and your code breaks [0].

And this can happen when a new compiler or linker version gets released, without any warning. And the effects can be quite subtle, silent and dangerous (with extremely important security implications, even -- a similar issue has already happened with the Linux kernel, for instance, although the circumstances were slightly different).

Although, of course, I think this is more likely to happen if you just enable link-time optimization rather than some new compiler version being released.

[0] Of course, I'm assuming that the C standard doesn't necessarily guarantee that function implementations are opaque to each other, even if they are implemented in different modules. But it's possible there is such a rule in the C standard that I'm unaware of -- I have never read the entire standard.


> changing a pointer that points into an object so that it points into another object invokes undefined behavior

Do you have a source for this? Couldn't find anything on it with a quick web search.


https://en.cppreference.com/w/cpp/language/operator_arithmet...:

“If the pointer P points to the ith element of an array, then the expressions P + n, n + P, and P - n are pointers of the same type that point to the i+nth, i+nth, and i-nth element of the same array, respectively. The result of pointer addition may also be a one-past-the-end pointer (that is, pointer P such that the expression P - 1 points to the last element of the array). Any other situations (that is, attempts to generate a pointer that isn't pointing at an element of the same array or one past the end) invoke undefined behavior.”


I think you may have misread my comment, that wasn't the part I was referring to.


The current C2x (aka C23) draft finally spells the behaviour out explicitly:

> 7.24.3.2 The calloc function

> […]

> Returns

> The calloc function returns either a pointer to the allocated space or a null pointer if the space cannot be allocated or if the product nmemb * size would wraparound size_t.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf


> if the product nmemb * size would wraparound size_t.

Gah! "Wraparound" is not a verb! You'd think someone writing a computer language specification would try to use (human) language correctly.


Verbing weirds language (https://www.gocomics.com/calvinandhobbes/1993/01/25), but in English, you can verb (https://en.wikipedia.org/wiki/Conversion_(word_formation)#Ve...) a lot of phrases.


FWIW, it's a technical term defined as:

> the process by which a value is reduced modulo 2ᴺ, where N is the width of the resulting type


That's not what the objection is. Your quote is clearly defining a noun.


If it is like overflow, it should indeed be "aroundwrap".


Prescriptivism much? :p


> if (n && m > (size_t)-1/n)

Am I the only one that doesn’t think this is “sweet”?

I mean I know what it does and why and how it works, but IMHO a good language should let you express your intention and algorithm in human terms and logic in which there is no -1 value for an unsigned integer type.


There's SIZE_MAX constant, I don't know why they didn't use it.

I'd rewrite it as `if (n != 0 && m > SIZE_MAX / n)`. Seems legit to me.


Most likely to get around clever UB optimizations when checking for values beyond max sizes.


No. (size_t)-1 and SIZE_MAX are identical, there is no difference in the two ways of writing that.

Furthermore, optimisation can't bite you if you test the values before you perform the potentially UB operation (and unsigned operations can never overflow anyway, they are always well-defined).


That was just an assumption, far from me to know all UB cases, I leave that for tooling.


But there is no such check here, is there? SIZE_MAX / SIZE_MAX is legally 1.


Agreed that it would be better if C had standardized versions of GCC's overflow checking functions: https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Integer-Overflo...

With these functions, it could be written as: if (__builtin_mul_overflow(n, m, &n))


C23 has ckd_mul.



True.

It seems the C committee at some point makes things harder than they should. And true, the C committee is not preventing anyone from using something else

(Another reason why we should retire C for most projects)


> It seems the C committee at some point makes things harder than they should

This seems like entirely the wrong way of looking at it to me. The C committee is not trying to make things harder than they should be, but they’re really just a committee in charge of C. Their goal isn’t to make C into some kind of beautiful safe language, it’s just to provide incremental improvements to C for the benefit of people stuck using it for whatever reason.

Like, the C committee isn’t trying to stop you from retiring C. They’re just trying to help out the people who aren’t retiring C, for whatever reason—which could be toolchain availability (lots of architectures out there, you may only have a C compiler for some), could be so they can work with a legacy codebase, could be that some other tooling or process they have works with C (the various “safe C” / analyzable subsets or formal verification processes).


WG14 could take care about improving the language gotchas for safer software, something that all other ISO language groups, including WG21 do bother to do.

However in 40 years they already made their point not to care beyond make C a portable macro assembler with more UB issues than writing in raw Assembly.


C is not even remotely like a macro assembler. I get why people make the comparison, but the more I work with C and assembler, the more I realize that the comparison is facile. The relationship between the C code you write and the machine code which the compiler generates is complicated. If you think of C as a portable assembler, it’s easy to draw the wrong conclusion about how your code will behave. You look at some code and think, “I know how the assembly should look,” and at that point you’ve lost, because of the vast gulf between C and assembly semantics.

The UB issues are not going away. Some people who use C need to get good performance out of it. If you try fixing the UB issues, you end up having to remove optimization passes from your C compilers. That’s just not viable—hence, UB issues are here to say.

The correct thing for the C committee is to make slow, careful improvements to C for the people who are stuck using it. And the correct choice for most people is to stop using C in favor of safer languages.

If you try to have the committee fix the safety problems with C, you’re going to end up with something which neither fast nor safe. If you are stuck using C and need it to be safe, you can use formal methods. If you don’t have the budget for formal methods, you need safety, and you’re stuck on a system which only has a C toolchain, then the alternative is to use something which compiles to C.


I agree that in terms of modern proposals there's a lot of "doing what we can"

However, leaving the overflow check/behavior on the multiplication as "implementation defined" seems like a bad call to me


The fact that signed overflow is undefined is unfortunately something that is necessary for certain optimizations to be valid. You can, for example, try `-fwrapv` or `-ftrapv` on GCC, but if you run through an extensive benchmark suite you’ll see regressions.

Not everyone cares about the benchmarks, but enough do.


...Weren't we talking unsigned though?


I've heard them called museum conservators. : p


That language is C99, a now 23-year-old standard dialect of C.

   if (n && m > SIZE_MAX / n)


> if (n && m > (size_t)-1/n)

Shouldn't the case of wraparound be caught by testing the resulting size as following?

  s = n*m;
  if (n && s >= n && s >= m) /* no overflow */ ...


That doesn't work for multiplication -- e.g. in 8-bit, 20 * 20 wraps to 144.


Maybe it's late but I don't get it. Can someone spell it out for me?


You want to evaluate p in:

  p = m n
If that is gonna overflow, p > SIZE_MAX.

Then it would be that (by substitution)

  m n > SIZE_MAX
so

  m > SIZE_MAX/n
except if n = 0. In the latter case, m can be as big as it wants--p will always evaluate to 0.

And (size_t)-1 is a silly way to write SIZE_MAX (but it saves you a dependency).


Thanks! Indeed the `(size_t)-1` threw me off, and the `m && n` was an interesting obfuscation for checking whether either are 0.


If n != 0 and -1 cast to unsigned size_t divided by n is less than m. So that m times n would be bigger than size_t’s maximum value.

Edit: inverted meaning since check is for error, not for ok case.


You are not.


From my (admittedly very limited, and very dated) knowledge of x86 asm, I remember that multiplication sets some flags if there was an overflow. Is there no way to multiply in C so that such an overflow was detected? Or do some CPU architectures not support such flags? It seems inefficient (to me) to do this check before each multiplication.


The C abstract machine does not have such flags. C therefore does not have any support for checking them (from code, the compiler can add checks if the target supports them but doesn't have to).


I feel like in the mid 2000s there was an effort to raise awareness of the "multiply sizes and pass to allocator" class of vulnerability. That was when calloc implementations started checking for overflow. OpenBSD was vocal about it. And they introduced reallocarray in 2014.


Yes, it is possible for implementations to be non compliant with the specification.


Was there a specification back when this was written? Did it specify what it does now?


Yes, the function is defined to either return enough memory for the requested number of objects, or null. Returning less than enough memory is clearly out of spec. This was later stated explicitly, but at no time was it ever permitted.


Regurgitating from an older comment of mine: https://news.ycombinator.com/item?id=13110538

----

The first version of UNIX to include `calloc` was Research UNIX V6, and here's how it was implemented: https://github.com/dspinellis/unix-history-repo/blob/Researc...

    calloc(n, s)
    {
    return(alloc(n*s));
    }
There are several interesting things we learn from poking around V6:

- `calloc` originated not on UNIX, but as part of Mike Lesk's "iolib", which was written to make it easier to write C programs portable across PDP 11 UNIX, Honeywell 6000 GCOS, and IBM 370 OS[0]. Presumably the reason calloc is the-way-it-is is hidden in the history of the implementation for GCOS or IBM 370 OS, not UNIX. Unfortunately, I can't seem to track down a copy of Bell Labs "Computing Science Technical Report #31", which seems to be the appropriate reference.

- `calloc` predates `malloc`. As you can see in the above code, there was a `malloc`-like function called just `alloc` (though there were also several other functions named `alloc` that allocated things other than memory). (Ok, fine, since V5 the kernel's internal memory allocator happened to be named `malloc`, but it worked differently[1]).

[0]: https://github.com/dspinellis/unix-history-repo/blob/Researc... (format with `nroff -ms usr/doc/iolib/iolib`)

[1]: https://github.com/dspinellis/unix-history-repo/blob/Researc...


I think the `m` in `malloc` and `mfree` in the kernel means `map` as opposed to `memory` in libc. It allocates both core memory and swap memory and uses a `map` struct for this.


At 4:10 in this video there is some interesting history on alloc and malloc, by the original author them self!

https://youtu.be/Xe5ffO6Ouwg


Nice read. Oh I wish the register keyword was actually honored in today's compilers. I am tired of compilers thinking they are smarter than me and having too trust their fancy register allocator algorithm.


Does anyone know if this is actually true?

By that, I mean the title of this article, 'Why does calloc exist?'. My guess would be that back in the origins of the C library:

1) Few cared about automatic overflow checks, that was a job for the programmer, so it seems unlikely the creators added calloc() specifically so that it could check if m * n overflowed (after all, there are loads of libc calls that will shoot you in the foot if you don't do overflow/buffer size checks)

2) calloc() is ancient, probably pre-dating many virtual memory optimisations, and back then, memory was relatively fast vs instruction execution, so adding calloc() to speed up giant memory allocations would be less impactful. But that's just a hunch, and I'm often surprised by just how sophisticated early UNIX code was. Early computing was slow, every performance gain you could squeeze out of the system mattered and if there were VMs that could defer zeroing memory until it was really needed then people would probably try to do that.

But my guess is, calloc() was simply a helper function, added because it was very common to malloc() some memory and then zero it straight away. Does anyone know the actual history?


As others pointed out elsewhere, as as you suspected, early calloc() implementations didn't zero memory or check for overflow.

"Why is calloc useful?" would be probably a better title.


FYI this should have (2016) in the title. The first paragraph talking about HN appears to be referring to some old moderation drama that was recent when posted.


And by goodness, I almost stopped reading, because what a whiny rant.


It seemed dispassionate, reasoned and positive to me, but then again HN has never been a good place for discussions of ethics


I can't wait to read about the ethics and lack of political discussion around malloc, next.


I was about to comment that “this reminds me of the old internet where people took time to write papers online”.

Apparently I was more correct than I realized.


Wow. 2016 ist "the old internet"? Kids these days! To me "the old internet" ist the one before the web.


And all the publications he links to have stopped publishing.


So this is probably a silly question, but I'm curious. Why is it important to initialize memory to 0's before we use it? I know rust considers uninitialized memory unsafe but I don't get why. All that really matters is that we initialize it to something. Why can't we wait and let that be the first value that's actually meaningful instead of wasting time writing 0's?


> I know rust considers uninitialized memory unsafe but I don't get why. All that really matters is that we initialize it to something.

In the specific case of Rust, no, because not all values are valid. E.g., in the case of Rust, bool must be true (literally 1) or false (0); other values are UB. &str data must be valid UTF-8. Violating these invariants can cause other code that might depend on them to do odd things. E.g., an Option<bool> might rely on one of the other 254 values for the None variant[1], a UTF-8 decoder knowing the data "is" valid UTF-8 need not bound check continuation bytes leading to accesses outside the butter.

> Why can't we wait and let that be the first value that's actually meaningful instead of wasting time writing 0's?

Well, in Rust, see MaybeUninit (https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.h...). The docs page goes into the UB a bit, too (and names the above examples). So, you can … but it's often easier to just write a 0 or a known value & let the compiler make sure it's safe and optimize it out if it can.

[1]: Example: https://play.rust-lang.org/?version=stable&mode=debug&editio... — Option<bool> is 1 byte because the Option can take advantage of bool only using 2 out of 256 values of its byte. Cf. https://frehberg.com/2022/01/rust-memory-layout-optimization... and the first link in that.


Some languages, like Swift, basically do this: they enforce that dynamically allocated objects are definitely initialized before they are used.

Also, many C programmers (in my experience) don't use calloc -- they just use malloc, and then they initialize the memory. The basic danger is that you forget to initialize part of it, and then you have garbage data in your object fields. Zeros are arbitrary, but at least they (usually) cause a seg fault if you try to dereference them as a pointer, for example.

I guess there's a secondary danger that even if you do initialize all the fields, you might end up with padding bits that don't get initialized, and those would contain parts of an earlier object. That shouldn't usually matter -- being padding bits, they're not meant to be accessed -- but zeroing them out could act as a security mitigation.


Given the parameters of malloc() and calloc(), I assume it's to express intent---malloc() is used to allocate a single structure (thus, avoiding the whole multiplication issue) and calloc() for allocating an array (and initializing the entire array to a known value). Why realloc() doesn't take an items and size parameters, given that's it's almost always used to resize an array.


> Also, many C programmers (in my experience) don't use calloc -- they just use malloc, and then they initialize the memory. The basic danger is that you forget to initialize part of it,

I don't know about the latter part of your statement being true or not. The former is due to looping usage so it's better to do bzero as the first operation for use instead of the last for reuse. In other words, don't do the same functionality in two separate areas.


Is calloc avoided for a good reason? Or is it something that people should be doing in most cases?


There is often a small amount of extra runtime cost to zero-initialize the buffer. Other than that, no reason other than inertia.


it has two arguments


it's unsafe because it can lead to leakage of confidential information, and has many times in the past, but that's not the most common problem

the most common problem is that you write some code that dynamically allocates memory, you test it, it works fine, and then you write more code that depends on it, and later on (weeks, months, years) you run into some hard to reproduce bug

after a weekend of debugging it turns out your code only worked fine when you tested it because malloc was allocating freshly mapped pages the kernel had zeroed for you. so you didn't notice that you left some fields uninitialized in your records. ooops. you didn't need that weekend with your kids anyway

for this reason many debug allocators initialize your malloced memory as a form of offensive programming, and valgrind reports it as a bug if your program's behavior depends on uninitialized memory


The other unsafe practice was thinking "works fine" testing is sufficient. Instead the tests should always have been executed with Valgrind or similar before delivery, instead of manually on a weekend.


testing is never sufficient


Arguably, a leakage of information already happens when you deallocate memory without zeroing it first.


Make sure your compiler is actually doing that zeroing, as memory that is destined for deallocation does not have many requirements imposed on it.


In what security model? If you assume there's an adversary that can read all your memory, then you're pretty screwed no matter what you do.

In C's and Rust's model deallocated memory is assumed to be completely inaccessible (and Rust does its best to ensure it is inaccessible), so in a valid (non-UB) program there is nothing that could leak its secrets.


I thought of a scenario like, your process deallocates a page of memory, so it gets marked as unused by the OS, then some other application requests memory and gets handed your old page, with all the data you left in there. Would that not be leaking information?


Operating systems with support for memory protection and virtual memory generally don't behave that way for security reasons. Everything that is allocated to a process will be zeroed explicitly or on first reference.

There are older operating systems that don't do that though. On the Amiga you have to explicitly ask the OS for memory to be cleared, and if you don't you may get uncleared memory left over from some other process.


on linux the os zeroes the page to prevent this


Because it's fairly common to design structures so that zero-initialized memory is a valid object. Eg that's how Cap’n Proto "decodes" messages in 0us even when fields have default values: https://capnproto.org/

Also see "0 is special" by Linus https://yarchive.net/comp/linux/zero.html

(Zeroing out memory that you know you're going to overwrite is indeed wasteful because at least some of those pages will end up getting zeroed twice.)


If the memory had the value of a password and was then unallocated, the value would still be there. Then when allocating the memory again, the value would be there. If you initialize it to something and then print it, there is no security issue. but if you print it before, then you just printed a password.


This is why even with garbage collected languages that you should always manually bzero the password field or other important fields.


Wouldn’t clearing the memory upon free() be a more reliable way to deal with this, though?


You could try having the OS guarantee zero initialized memory for all blocks that your program gets, but that's hard to enforce as a standard I imagine.


I think we do this in practice. The issue is that malloc doesn' t ask the OS for memory, it calls the system allocator which in turn asks the OS for memory. But once you've freed something, that allocator can give you that memory back without talking to the kernel


So if it’s hard to enforce, what’s the reason we go through the effort of zeroing anyway? Isn’t this one of those things that either need to be 100% reliable to be useful?

Or is this more about preventing escalation in case someone finds an exploit in your code?


Zeroing is just a form of initialization. If you initialize the memory right away (and completely!) with actual data, there's no need for additional zeroing.

Zeroing is something one should do if the allocated memory isn't completely initialized right away. In my experience, that's not only about someone finding exploits, but also about stability.

When people write code, most of the time they simply work with the assumption that an `int` contains a 0 by default and that code usually fails in unpredictable ways if it does not.

Accessing uninitalized memory is just always an error and having uninitalized memory laying around in your application has rarely any benefit. Either assign a value or zero it. Not doing either of those suggests the allocation isn't needed right there in the first place.


What if the program never called free? Or if it crashed? A bad actor could intentionally crash a program, and then allocate lots of memory, then search for the password in the memory.


All modern operating systems will clear the physical memory before handing it from one process (crashed or not) to another.


If that's true, then the entire premise of calloc is invalid, isn't it? Newly allocated memory would always be zeroed anyway by the OS. Then what are we even discussing?


malloc is not a system call, it is a C library function. Modern operating systems clear memory when providing it to a process, but malloc implementations reuse the same memory locations previously used an arbitrary number of times within the process and normally do not clear memory when it is reused (within the process) for performance reasons.


There is a process local object pool in some implementations.


> be a more reliable way to deal with this,

Only if you can guarantee what free_and_zero() would succeed every time. But you can't.


No, as not all memory is free()'d, think e.g. of crashes and buffer overflows.


> Why is it important to initialize memory to 0's before we use it?

Because reading from unitialized memory yields an indeterminate value. Operating on such a value can lead to UB. This applies to languages other than Rust too (e.g. C).

Ralf explains it better than me[1].

[1]: https://www.ralfj.de/blog/2019/07/14/uninit.html


One thing I noticed is in C, probably C++, if you create a struct on the stack you can set every field in the struct and... alignment means the empty areas between your fields don't get set and thus leak information that was on the stack.


only if you do read the values between the members.


Or someone else reads them.


Depends on how you set the fields.


Maybe that first value is a bunch of zeroes?

Since you mention Rust, here's a neat piece of trivia: `vec![0; N]` compiles down to a `calloc` call. (Well Rust's own version of `calloc`.)


For some situations it's certainly like you say - not important to initialize memory upon allocation. That's how malloc is ordinarily implemented.

If your allocation is about to take on a matrix that you will load from a default value or calculation for every cell - that would only suffer unnecessary latency if memory were zeroed on allocation.

But if your allocation is about to take on a struct with many fields and elements, it could be easily misused by not initializing every field (including when new fields are added), then a zero or other default initialization might be convenient.


Imagine you have to initialize a big structure, that has a lot of members. If you don't zero-initialize it you have to manually initialize all the fields, one by one. Not only it takes a lot of lines of code that can be avoided, and it's less efficient to initialize all the zero fields than writing 0 in all the structure, but you can forget to initialize one and good luck debugging it. Or the structure can be modified by adding a optional field, that you have to remember to zero initialize everywhere you don't need it (instead of assuming it's zero because it's zero initialized).

Of course accessing not initialized fields can lead to catastrophic bugs, imagine for example a structure containing a pointer not initialized to NULL that points to a random location, or a char* field that doesn't have a null-terminator in it.

It's safer to 0 initialize everything, at least if you forgot to initialize something you have a predictable situation that is still better than the value of the field depending on what was there previously that is not deterministic.


>Why can't we wait and let that be the first value that's actually meaningful instead of wasting time writing 0's

Because we might want to read from it before we write to it. The simplest case is an array: unless we ensure that we write to every index before we read from it, we risk reading out (potentially sensitive) garbage data if we didn't calloc it.


> All that really matters is that we initialize it to something.

People make mistakes and forget to do this and their program sometimes goes wrong in an unpredictable way. Maybe only going wrong when in production. It’s redundancy - most other fields of engineering do it, we seem to think it’s weird in CS.


I think the most annoying part about errors due to missing initialisation is that (especially with variables on the stack, so not those allocated with Malloc) they are coincidentally pre-filled with values from the functions that were called immediately before the current function. So instead of completely random behaviour you see somewhat random behaviour. And an off by 5 error is a lot harder to track than a off by several thousands error.


>Why can't we wait and let that be the first value that's actually meaningful instead of wasting time writing 0's?

Rust lets you do that.

    let m: u32; // note: no mut
    m = 5;
Even this is totally fine:

    let m: u32; // note: no mut
    if true {
       m = 5
    } else {
       m = 2
    }
>I know rust considers uninitialized memory unsafe but I don't get why

It is wildly unsafe, because the compiler has to be able to rely on the fact that memory slots it emits assembly code for actually use only those bits the compiler knows are in use.

For example, on ARM you have no registers < 32 bit. So the compiler has to use those registers for u8 as well, but it has to be sure that nobody actually sets the high bits, otherwise multiplication (etc) would do completely insane things and not at all what the program should have done.

Similarly, memory slots have a size that is a multiple of 8 bits (because the load/store instructions usually can load/store exactly 8, 16, 32 bits etc) and your type doesn't necessarily have exactly that number of possible inhabitants of the type (it would only be the case if number of possible inhabitants = 2^(8 n) where n is a natural number). Clearly, this is not possible to ensure in general.

Of course you have the compiler be totally defensive about it and issue masking instructions all the time--but that would make the resulting program slow.

Also, think of a bool. It has two possible values, right? But in C, any value that is not equal to 0 is defined to be true.

Now when the compiler emits code for a && b it can't do the obvious which would be

   and r0, r1, r2
because that will do bitwise-and and let's say a = 4 and b = 1 then a bitand b = 0 but what you wanted is a && b == 1 since a is true and b is true. Worse if a = 5 and b = 1, it suddenly works.

So that would be a bad idea as well (you'd have to emit extra instructions to check whether a is 0 and so on).

>Why is it important to initialize memory to 0's before we use it?

It's not. It's just C being C.

I'd actually prefer C to have an alloc function with two arguments m and n where it does m * n with overflow checking and then allocates that and then does NOT initialize the memory to 0's.

That said, todays computers are very fast and 0 has no data dependencies. So I wonder how much overhead, if any, in wall time, it adds to always zero out.


> ...Why is it important to initialize memory to 0's before we use it?

I guess, when writing in C, memset/calloc to 0 is a practical way to blanket-initialize all members of an object, say, pointer vars could then be safely checked against zero (NULL) to test whether it's been assigned or alloc'ed, strings emptied, counts zeroed, so would be flags. Convenient!

Basically, it's a shortcut with structured objects to avoid spelling out and initialising individual variables explicitly.


Discussed at the time:

Why does calloc exist? - https://news.ycombinator.com/item?id=13108434 - Dec 2016 (132 comments)


There is a simpler explanation. On 16-bit systems SIZE_MAX is often 65535. You can't allocate a 64K (65536) block with malloc() but you can with calloc().


This might not be the primary initial motivation for calloc(), but I think you're right about why it takes two arguments.



I'm not confident it's valid to allocate objects larger than SIZE_MAX/2 on any platform.


As a data point, C23 added explicit wording about overflow handling in calloc():

> The calloc function returns […] a null pointer […] if the mathematical product of nmemb * size is not representable as a value of type size_t.

So apparently the Committee is OK with allocating more than SIZE_MAX/2 (but not more than SIZE_MAX).


Why not?

The first argument that comes to mind is the existence of ptrdiff_t, but you’re not guaranteed to get a valid ptrdiff_t just by subtracting two pointers in the same array!


Not objects but memory. It was common in 16-bit era. And you would access it through memory segmentation, with two registers: one pointing at the segment beginning, the other being the offset. In code:

  addr = base << 4 + offset


Does such a system exist?


To generalize, that odd thing happens on systems whose native integer range is smaller than the amount of addressable memory, and where some kind of combination of bank/segment value and offset value defines that address (usually in the manner dictated by hardware).

On 16 bit x86, the biggest value processor can count to is 65535, but the number of those values (from 0 to 65535) is 65536, and so is the number of byte offsets in one segment. Memory handling functions had to deal with segment and offset calculations, overflows, underflows, and wraparounds in wider address space, and could allocate the full 64K segment when not limited by maximum value of a single variable (it was not guaranteed, though, as some allocators could reserve some space for their own headers). The value of 65535 was probably inherited as lower limit for SIZE_MAX in standard because 8 bit systems were not considered real targets for “real C” (as opposed to microcontroller-specific subsets of C). Of course, people then wrote C compilers for those systems, and various AVRs and STMs were made to support pointer-heavy and subroutine-heavy code better while still being simple.

One could get rid of those sharp corners by extending the compiler to present any system as having simple flat memory and integers that are big enough, but it would be the direct opposite of why people use C language. The performance would suffer, too.

Random oldnewthing link to make this comment seem well-sourced:

https://devblogs.microsoft.com/oldnewthing/20200728-00/?p=10...


8086 (and any subsequent intel chip, running in real mode)


Or worse - Intel's 80286, in protected mode.

However good their intentions might have been...programming that thing was hell.


Did anyone else notice (and read) the "Welcome Hacker News readers" section? It is weird. Example: <<I hope the HN moderators find a way to step up to the responsibility their position entails>> What does this mean?


Clicking the links makes it clearer. Back in 2016 there seems to have been a politics free week on HN. (Hmm… 2016… I wonder why!). Anyway he sort of implies it is our duty to discuss politics on HN to avoid maybe the next Hitler or other such bad things happening.


Some of the links in the paragraph about python-requests have bitrotted. Fixed links:

https://github.com/psf/requests/issues/3729

https://foss.heptapod.net/pypy/cffi/-/issues/295


Reason #2 at the link is why we absolute must have a reallocate-and-move primitive - because it could be much faster than allocating and copying: It would have a higher chance for copying by virtual memory page remapping. Of course, just plain copying might also use virtual memory page tricks; I wonder which platforms do that (if any).


I remember asking my systems prof this question in class (I can't remember exactly what her answer was), but it was along the lines of is that it's more efficient due to OS+hardware optimizations. basically: use it when you need zeroed out memory!


I encountered problems on some platforms in the 90s where zeroing the bits of an array of pointers or floats/doubles did not give you an array of NULL pointers or 0.0 values. The comp.lang.c FAQ file discussed this in detail.


Yeah, C doesn't require null pointers to be represented as 0, even if its constant is written 0, nor does it require floats/doubles to be represented as per IEEE-754. Or at least it didn't use to.


I don't think that's standards compliant.

At least for floating-point, not entirely sure about pointers.


floats being IEEE compliant is optional by the language standards. Otherwise the bit representation is not specified.


It still requires 0 to be represented as all bits set to zero.


Where does the ISO C standard say so?

edit:

https://cigix.me/c17#6.2.6.1.p1 seems relevant, I did not find anything for floating point types.

edit2:

Notes are probably not normative, but still:

https://cigix.me/c17#footnote301


That footnote confirms the traditional position that a floating point zero need NOT have an all-zero bit representation.


Yes. I was porting code to new machines and programmers had assumed it was compliant but it wasn’t.


This is the analysis of one specific implementation of calloc() in one specific operating system, nothing can be extrapolated from that.


Minor nitpick: on my mobile view there is a linebreak between base and exponent, that's a bit irritating.

2 63


Another context calloc is useful is in NUMA systems because of the first touch policy, right?


So when would you ever use malloc? calloc appears to be superior.


Not having to zero out allocations can mean that malloc runs slightly faster.


Would be nice to see how hugepages and THP affect performance too


For big pieces memory at least, a lot. I was talking to someone the other day about this, they said that in their kernel, switching to huge pages for mapping physical memory made the boot faster by a factor of 10000. I don't have the measurements though.


Discontinuity between hairlines.


C++ has entered the chat.

T *foo = new T(1000);


Nooo! Again?

> if we really cared about security we wouldn't be writing in C.

If we really cared about security we would definitely be writing in C.

If we really cared about security but would prefer not to think about it we wouldn'd be really caring about security, no matter the language.

Caring is done by little details. You don't want to risk with dangling pointers, do you? Then you either rely on someone else's work (garbage collector?) embedded in some "secure" language or you do your homework with careful memory allocation.

What about buffer/stack overflows? Same.

Finally, have you asked yourself with which language your "secure" stack (kernel, userland, compiler/ interpreter etc.) has been written?

I bet a huge part is in C.


Sure, but I expect that many many many more people have read and run and tested and hardened my kernel, userland, compiler/interpreter than have done so for the code I am just now writing. I have a lot more faith in the correctness of all that other code than I do in the code that I'm about to write, at least not until the code I'm about to write has seen similar scrutiny and usage -- which it likely never will.

Put another way, I trust the Rust code I write to be free of memory errors much more than I trust the C code that I write (despite having much more experience writing C than Rust). And yes, that's even accounting for the possibility of bugs in the Rust compiler that cause my program to do unsafe things with memory. Assuming I could actually maintain the focus and discipline to be my absolute most careful self every moment while writing C (which I, in reality, cannot), I still think the Rust compiler is safer than I am.

Frankly, I would consider nearly anyone (and possibly just an unqualified "anyone") who believes otherwise about their own abilities to be naive at best, and actively arrogant at worst.


I play rugby without a mouthguard because it keeps me disciplined and I care about my own safety. /s


If it were remotely feasible to create a competitive operating system kernel in a memory safe language with a tracing garbage collector, a kernel written in something like Java or C# would have taken over the world by now.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: