So you want custom allocator support in your C library

kccqzy · 2023-12-18T05:05:23 1702875923

> When an application frees an object it always has the original, requested allocation size on hand. After all, it’s a necessary condition to use the object correctly.

This is quite untrue in real-world C codebases unless you have a lot of discipline. I know from reading a previous article (https://nullprogram.com/blog/2023/10/08/) by this author that they hate null-terminated strings. But in your typical C code, you often allocate a bunch of memory to be used as a null-terminated string, return it or pass it to another function transferring ownership, and expect a different function to free() it. The length of the string is not necessarily the allocation size! The code could have allocated some predetermined max size for the buffer, build up the string gradually, and then add a null terminator. Thus, the function doing the free() can't pass the right size.

kragen · 2023-12-18T06:15:13 1702880113

if i have a function foo which calls functions bar and baz to build up a dynamically allocated string (using a custom allocator), then calls function quux to transfer ownership of the string, foo needs to store the information about how big the allocation currently is, and how to allocate more of it, in a place where functions bar and baz can access it. if it doesn't do that, bar or baz can easily overflow the buffer because they don't know how big it is. this is a particular case of 'having the requested allocation size on hand is a necessary condition to use the object correctly'

but if foo is passing this data structure to bar and baz, it can pass it to quux too, so the problem as you described it just doesn't exist

(unless your code is only working by the merest of chances and will have some cves later this year the first time some competent programmers look at it)

'bar or baz can easily overflow the buffer because they don't know how big it is' is a particular case of 'having the requested allocation size on hand is a necessary condition to use the object correctly'

kccqzy · 2023-12-18T07:33:06 1702884786

You are describing an ideal scenario where code is written by experts and those experts did not take any shortcuts. Real world C code is unfortunately anything but that. I haven't had the luxury of only working on expertly-written C code which frankly also makes me hate C code.

What if the function bar only appends a number to a string? Then the maximum length it needs could become embedded knowledge by the foo function ("10 characters is enough to store this uint32_t so no need to pass anything related to the allocation" thought the programmer who hated tedium).

What if bar and baz are not allowed to grow the allocation because foo has already enough buffer to store the maximum possible size (say a path on Windows, 260 bytes)?

What if the function quux is in a different third-party library whose API cannot be modified to take the allocation size?

kragen · 2023-12-18T07:55:23 1702886123

well, i do agree that there's a lot of code out there that only works by the merest of chances and will have several cves opened the first time a competent programmer looks at it; the kinds of implicit thoughts and prayers you're describing (sweet jesus, pvg, that isn't a uint32_t you're formatting) are precisely what i was talking about

but i'm not talking about code where the authors didn't take any shortcuts, just boringly competent c code with the regular quantity of bugs in it. i'm sorry you've been subjected to such torments as you report. but the author of this article isn't talking about how to write such astoundingly shitty code; he's talking about how to write good code

if the function quux is in a different third-party library, its api probably cannot be modified to call the custom allocator either (my initial comment draft suggested that you might be talking about such a case, but then i realized that didn't make any sense and so i deleted it)

comex · 2023-12-18T06:37:49 1702881469

There are many reasons it might be inconvenient for foo to pass that data to quux. Perhaps it’s actually quux that called foo rather than the other way around (in other words, foo is passing ownership through its return value rather than through a function argument), in which case C makes it easier to return one value than two. Or perhaps there’s just a lot of existing logic that passes owned strings around without an accompanying size parameter, which is perfectly safe to do if you’re not planning to ever mutate them.

Of course you could easily design your codebase to not have that issue, if you’re writing it from scratch. But that’s not the question.

kragen · 2023-12-18T07:50:34 1702885834

i agree that parts of the code that are not changing the length of the string can use it safely without knowing the buffer size

it's been easy to return a struct by value in c for over 30 years, though in very early versions of c, structs in rvalue context implicitly decayed to pointers, like arrays. it's been efficient for about 20 years

so it's not easier to return one value than two if you've already defined the struct type, which is presumably the case here

there's people who avoid it because they're imitating c code from the 01980s (or, charitably, performance-sensitive c code from the 01990s), or because they're using some horrible abomination like common lisp's cffi, but that's really quite marginal

and even if you didn't originally design your codebase that way, you can certainly fix the design. it'll probably even make it easier to understand and maintain, because you're always passing around a stralloc or whatever, instead of sometimes passing around a stralloc and other times passing around a raw buffer pointer

lmm · 2023-12-18T07:59:10 1702886350

If you're going to go to all the effort of rewriting your codebase anyway, why would you keep it in C?

kragen · 2023-12-18T08:02:54 1702886574

you can change the calling interface of a single function without rewriting your entire codebase

also there are a lot of things for which c is still the best option, though rust is becoming appealing for more of them all the time

varjag · 2023-12-18T10:10:57 1702894257

Your personal coding preferences and discipline will switch the jobs together with you. The new hires on the project you left behind are going to have their own perspective (likely very wrong from your POV) on how things should be done.

If it's an open source project, idiosyncratic code will only increase friction for contribution for little practical gain.

kragen · 2023-12-18T16:11:12 1702915872

broadly i agree except for your last point. but i am unclear on how this relates to the thread, other than explaining why many projects fail to even approach the quality demonstrated in wellons's blog

eqvinox · 2023-12-18T10:08:31 1702894111

Indeed, and more — there are code bases (ours, FRRouting) that do the opposite: expect the allocator to track sizes and invoke malloc_usable_size() to do our own (per-type-of-allocation grouped) accounting (to subtract on realloc/free; for adding it's not needed of course since you have the size already.)

el_pollo_diablo · 2023-12-18T09:21:53 1702891313

Unless all objects allocated by the library have one of the fundamental alignment requirements, I'd add a fourth item to the checklist:

  4. All functions take the alignment requirement as a parameter. The alignment requirement must be a power of 2. The reallocation and deallocation functions must be passed the same alignment requirement value as was passed to the allocation function when the block was first allocated.

A common motivation for rolling out your custom allocator is dealing with extended alignment requirements. This should be reflected in the allocator's API.

eqvinox · 2023-12-18T10:12:11 1702894331

Alternatively, "the allocator should have an equivalent of posix_memalign()".

Also the default alignment should be well documented, since 99% of calls won't care about a specific alignment, and then you start hunting why your math routines are slow when the CPU is doing unaligned SIMD loads…

el_pollo_diablo · 2023-12-18T10:24:58 1702895098

C11 has aligned_alloc, which is more standard than posix_memalign, but there is no standard "aligned_realloc".

There is no default alignment if all the functions of the allocator take it as a parameter, but it is reasonable for an allocator implementation to be optimized for a particular alignment requirement such as _Alignof(max_align_t).

eqvinox · 2023-12-18T10:31:32 1702895492

> C11 has aligned_alloc, which is more standard than posix_memalign,

I don't have data on this but I feel that posix_memalign might be more widely supported than aligned_alloc. Then again C11 is 12 years old now so hopefully this changed.

> There is no default alignment if all the functions of the allocator take it as a parameter,

I would assume you can pass 0 as the requested alignment, which makes the allocator use some default. If the allocator always requires an alignment value, it becomes a serious pain to go through all allocations and figure out a reasonable alignment value. Especially since in most cases you really don't care.

el_pollo_diablo · 2023-12-18T10:57:18 1702897038

> I don't have data on this but I feel that posix_memalign might be more widely supported than aligned_alloc.

Possibly. IIRC Microsoft's CRT does not support aligned_alloc() because their implementation of free() cannot deal with extended alignment requirements, so they have separate functions for aligned (de)allocation. To be fair, they probably do not support posix_memalign() either.

> it becomes a serious pain to go through all allocations and figure out a reasonable alignment value

Sure, you can pass _Alignof(max_align_t) in almost all cases, or have generic (= not reimplemented by each allocator) convenience wrapper functions that do that for you. If you want to pass the least restrictive alignment in all cases, you can enlist the help of a macro, but this is a bit more awkward. For instance:

  #define ALLOC(ptr, size) ((ptr) = alloc(_Alignof(typeof(*(ptr))), (size)))
  
  foo_t *p;
  ALLOC(p, 42 * sizeof *p);

(Note that this uses the typeof() operator, which should become standard with C23.)

tlb · 2023-12-18T12:38:48 1702903128

max_align_t is only the max for scalar types, typically 8 or 16 bytes. But often math-intensive code wants 16 or 32 byte alignment for float arrays, to enable SIMD operations.

el_pollo_diablo · 2023-12-18T13:12:07 1702905127

> max_align_t is only the max for scalar types, typically 8 or 16 bytes. But often math-intensive code wants 16 or 32 byte alignment for float arrays, to enable SIMD operations.

Indeed, but _Alignof(max_align_t) is a good compromise for a default alignment requirement in an allocator interface, which is what eqvinox calls for. Standard C does exactly that: malloc(), realloc(), and calloc() promise that they satisfy _Alignof(max_align_t), but nothing stricter. To reach for extended alignment requirements, your only standard tool is aligned_alloc().

gpvos · 2023-12-18T10:55:34 1702896934

The context parameter already addresses that need.

el_pollo_diablo · 2023-12-18T11:17:22 1702898242

> The context parameter already addresses that need.

How so? The context parameter identifies the allocator instance. What if the library needs to allocate objects with different alignment requirements with the same allocator instance?

Alignment requirement is just like allocation size (*). It varies from call to call, and generic allocators need to deal with heterogeneous alignments and sizes.

(*) Case in point: Rust's Allocator interface (https://doc.rust-lang.org/std/alloc/trait.Allocator.html) bundles them up in a single Layout object.

flohofwoe · 2023-12-18T11:28:17 1702898897

Not really I think, the context pointer (I would call it "user data" though since that's more common in the C world) is normally passed around as an opaque pointer which is used by the allocator functions to find their own state, but not for passing additional parameters to the allocator.

gpvos · 2023-12-18T12:18:52 1702901932

I guess you're right. The alignment parameter may be different per allocation, not per arena/context.

kevincox · 2023-12-18T13:29:42 1702906182

While you are at it passing the size of the allocation during free can also be valuable and is usually available "for free".

el_pollo_diablo · 2023-12-18T13:42:28 1702906948

Yes, that's item 2 of the checklist in the original article.

Cloudef · 2023-12-18T05:31:11 1702877471

The fact that zig's standard library design (and general code guidelines) makes you always carry allocator with you, makes it very easy to manage memory overall. Most of time I set up arena allocator and then I don't even have to care about clean up unless it's something holding a OS resource.

laserbeam · 2023-12-18T05:51:25 1702878685

The article presents essentially the same view, for a language which doesn't have that convention, and doesn't have slices (pointers that also contain size information). Zig allocators check all three requirements illustrated here. As do most modern allocator interfaces.

eqvinox · 2023-12-18T10:20:16 1702894816

Carrying the allocator everywhere does introduce a new class of bugs — if you call into a library at a later point in time to interact with something that was previously allocated, but you pass a different allocator this time… how exactly does that work?

(As opposed to: set up the allocator once, in an initialization call.)

flohofwoe · 2023-12-18T11:30:51 1702899051

Zig stdlib "objects" like dynamic containers indeed usually only take the allocator during initialization and then store it in the instance. E.g.

    var list = ArrayList(u8).init(test_allocator);
    defer list.deinit();
    try list.append('H');
    try list.append('e');
    ...

Cloudef · 2023-12-20T01:49:44 1703036984

There are also often Unmanaged variants where you pass allocator to every method. In these scenarios you have to make sure you use the same allocator though. Unmanaged can be useful for short lived allocations or type that combines many standard containers, saving memory.

ghostpepper · 2023-12-17T23:33:03 1702855983

I wonder how many security-critical memory vulnerabilities would've been avoided over the decades if glibc had been designed with this API instead of the one that became the de-facto standard.

Someone · 2023-12-18T00:06:09 1702857969

Or introduced. I would think that trusting the (possibly compromised) caller to pass the correct size of a to be freed or resized object opens up new possibilities for security vulnerabilities.

o11c · 2023-12-18T01:24:11 1702862651

Note that libc-style `free` often has to do work to walk through all possible arenas/buckets/whatever to see which one contains the given pointer. If given a size (even just a hint) it can do a much shorter walk without reducing security.

tom_ · 2023-12-18T02:02:05 1702864925

"Even just a hint" doesn't deserve to be squirreled away in a mere paranthetical. It's what distinguishes some bullshit (the article's suggestion, whereby the allocator carefully checks that you supplied the right value that it already knows and barfs if not) from the actually potentially useful improvement to the API that you suggest.

The code that frees very often does not know how much was allocated.

lifthrasiir · 2023-12-18T02:09:19 1702865359

An allocator that fully trusts the size provided to `free` is hugely unsafe when the size turns out to be incorrect. As an extreme example, the allocator may know that every 4-byte allocation is put into the arena that is aligned to 0x1000-byte block, so it may jump straight into the arena header which won't be there if the size was incorrect. So a real-life allocator is likely to do some checking even when the size is provided, but that checking can be much more simplified in that case.

saagarjha · 2023-12-18T03:01:21 1702868481

Most allocators can do this lookup quickly, sometimes by placing the size next to the allocation or by encoding it into the returned pointer.

lifthrasiir · 2023-12-18T03:18:53 1702869533

> by encoding it into the returned pointer

More accurately speaking though, this information is not directly available from the pointer (unless we have a universal pointer tagging support today...) and should be mapped via the allocator internal data structure. This is what o11c referred by "arenas/buckets/whatever", and having a size information does simplify this process.

The metadata around the allocation is comparably expensive especially for smaller allocations anyway, so modern allocators would have at least two classes of memory depending on the size. So even when the allocator doesn't make use of any other size information, it is possible that the size information can still be helpful.

saagarjha · 2023-12-18T04:40:16 1702874416

Having the size is definitely useful, because you can skip the metadata. But since needing this information in some form or the other is very important it is generally easy to derive-no walks or anything, because that would be slow.

jaywee · 2023-12-18T05:26:17 1702877177

It can be - the allocator just needs to mmap() different bucket pools to different base addresses.

hollimolli · 2023-12-18T00:29:08 1702859348

What's wrong with the one from glib?

unwind · 2023-12-18T09:00:44 1702890044

Please note that "glibc" is the GNU C library, which is highly distinct from "glib" which is the general-purpose utility library underlying GTK and many other GNOME-adjacent technologies.

hollimolli · 2023-12-28T23:22:53 1703805773

Thanks, man; that was just a typo; still interested in the answer.

eqvinox · 2023-12-18T10:13:55 1702894435

  5. The allocator must be able to mesh into debugging tools like ASAN and valgrind.

Not having these tools available is a serious kneecapping yourself. C is bad enough with all the tooling you can get…

I've only dealt with valgrind so far, integrating with it is pretty easy (a line or two of annotations in each function.) It just needs to be done.

(To be fair this is a requirement on the allocator, not on the API a library exposes to throw in a custom allocator.)

zackoverflow · 2023-12-18T10:56:50 1702897010

> Note that the context pointer came after the “standard” arguments. All things being equal, “extra” arguments should go after standard ones. But don’t sweat it! In the most common calling conventions this allows stub implementations to be merely an unconditional jump.

I didn't know this. Does this optimization have a name? Where can I read more about it?

jonathrg · 2023-12-18T11:27:52 1702898872

This is because of the calling convention / ABI

If you write

    void free_with_extra_args(int *a, int *b) {
        free(a);
    }

then *a is in already in the correct slot (the RDI register) for the first argument when free_with_extra_args is being called. Whatever is put into *b is never touched. If you compile this with gcc -O2 you get

    free_with_extra_args:
        jmp     free

If you make the function call free(b) instead, you'll have to move b into the right place before calling free:

    free_with_extra_args:
            mov     rdi, rsi
            jmp     free

This is on x86-64 as summarized here https://en.wikipedia.org/wiki/X86_calling_conventions#System...

Wikipedia also has a nice summary of calling conventions on other platforms like ARM. All modern calling conventions are similar: pass the first args in registers and then use the stack as needed https://en.wikipedia.org/wiki/Calling_convention

flohofwoe · 2023-12-18T11:35:43 1702899343

That's only one half of it though, the other interesting part is the jmp (instead of call) to hand over control to a subroutine without pushing a new return address to the stack (since there's no code after the function call, and the calling function doesn't require its own stack frame).

jonathrg · 2023-12-18T11:53:46 1702900426

Indeed. Typically you have pairs of call and ret, where call creates a stack frame and ret tears it down

     caller                           caller
       |                                ^
     (call) . . . . . . . . . . . . . (ret)
       |                                |
       V                                |
       outer_function      outer_function
              |            ^
           (call) . . . .(ret)
              |            |
              V            |
              inner_function

A jmp does not modify the stack, so when the inner function calls ret it jumps right back to caller

     caller              caller
        |                  ^
      (call) . . . . . . (ret)
        |                  |
        v                  |
        outer_function     |
              |            |
            (jmp)          |
              |            |
              V            |
              inner_function

This trick stops working as soon as outer_function needs local variables or does anything other than returning the exact return value of inner_function. In that case you need a stack frame

mananaysiempre · 2023-12-18T13:26:01 1702905961

Since the times of K&R, most C calling conventions are defined so that you can call a function with more arguments than it expects. While the ISO C standard does not permit such calls in strictly conforming code, support for them is still ubiquitous.

[One very common use of this is that the C libraries on common desktop systems call main with three (or more!) arguments, and this works whether the programmer has declared an int main(void), an int main(int argc, char *argv), or an int main(int argc, char *argv, char *envp). I once again have to say that the third form, with an envp argument, is not sanctioned by either ISO C or POSIX, and that a C implementation could definitely use some special handling for an external-linkage function called “main” to allow the first two to be used.]

In practical terms, a function call goes along the lines of “pack the first few arguments into caller-saved registers[1] (using more or less complicated rules), then the rest on the stack, then do a call (using either the stack or the link register depending on the architecture), then after returning pop off the stack part of the arguments (the register part being assumed clobbered and not requiring any cleanup).” This kind of convention is called “caller cleanup” (in contrast to “callee cleanup” conventions—like non-vararg __stdcall on Win32—which have the callee pop the arguments off). While you could imagine other ways to permit extra arguments, it’s certainly the most common one.

That, then, implies that a C function can tail call (jump to) any function whose argument tuple is a prefix of its own. Good compilers will recognize this (and unlike you giving a function extra arguments, they are within their standard-given rights to do so!). If you want proper tail calls to functions of other types... that might be possible, but it’s much more gnarly. See for example the description of the musttail attribute in the Clang documentation[2].

[1] For each machine register, a calling convention will define which the callee has to preserve (“callee-saved”) and which the caller will have to if it cares about what’s in them (“caller-saved”).

[2] https://clang.llvm.org/docs/AttributeReference.html#musttail

Retr0id · 2023-12-18T11:06:20 1702897580

I'm not sure if there's a particular name for it, but you could consider it to be a special case of a tail call optimisation https://en.wikipedia.org/wiki/Tail_call

hayley-patton · 2023-12-18T01:21:16 1702862476

> The standard library allocator keeps its state in global variables. This makes for a simple interface, but comes with significant performance and complexity costs

Thread local allocation buffers?

MereInterest · 2023-12-18T03:00:04 1702868404

Unfortunately, because a buffer may be allocated in one thread but freed in another, you can't have purely thread-local state for the allocator.

hayley-patton · 2023-12-18T03:33:13 1702870393

snmalloc batches up cross-thread frees, though generally any thread-local stuff is better than none.

kragen · 2023-12-18T06:16:07 1702880167

how simple can that be and still get reasonable scaling?

masklinn · 2023-12-18T10:23:36 1702895016

Probably depends on your work load.

If you assume there’s almost no crosstalk, you can likely get away with a thread-id and an optimistic mutex per pool: the mutex will only contend if one thread tries to (de)alloc while an other thread has a cross-thread free running.

If you assume there will be more contention than that, then scaling up complexity might be a good idea e.g. lock per size class / arena, or push cross-thread frees to a queue and have the owning thread check the queue once in a while. You can then make that queue lock-free or even wait-free if that is better for your workload / issues.

hayley-patton · 2023-12-18T10:26:54 1702895214

Looks like it's 10.7 kLOC whereas mimalloc is 8.8 kLOC, so it's not much more complex at that...amount of engineering, I guess. Also spotted that the mimalloc README says

> Free-ing from another thread can now be a single CAS without needing sophisticated coordination between threads

which may not be the worst thing in the world; depends on what your app does.

kragen · 2023-12-18T15:51:25 1702914685

musl's malloc is 0.6 kloc and right now it has terrible multicore scaling, and i'm wondering how simply that can be fixed

afaik neither c nor posix provides portable cas, so while that's probably the best performing solution, it's probably not suitable for musl

discussion in https://news.ycombinator.com/item?id=38619599

hayley-patton · 2023-12-18T22:44:41 1702939481

C11 has atomics and CAS, is C11 okay?

kragen · 2023-12-19T18:20:28 1703010028

hm! interesting!

kazinator · 2023-12-18T19:22:09 1702927329

> When an application frees an object it always has the original, requested allocation size on hand.

This is false in general. If it's true in your library that you have this old size on hand, then design the allocator callbacks with old-size arguments in free and realloc. If it's not convenient, then don't design those arguments in.

Most of the time, the allocator implementation will the be the default one via malloc and free, where the size is not required, so you're making the caller calculate or retrieve a value that is thrown away before the allocator is called.

There are situations in which a program would have to calculate the size in order to recover it. A trivial example is when a string is duplicated (as with the POSIX strdup function).

If the free function required the original request size, we would have to free the string using free(str, strlen(str) + 1) rather than just free(str).

Another example is when we have a header structure followed by a flexible array member. We can't just free(obj) but something like free(obj, offsetof(obj_t, array) + sizeof(obj->array[0]) * obj->nelems).

The allocated size could be stored in an extra field, but that's not something required when the allocator is malloc/free.

hgs3 · 2023-12-18T02:26:43 1702866403

Good article. The caller can also pass arguments hinting whether the requested allocation will be long or short lived, zero allocated, etc... You can also go beyond the realloc/free idiom and into specialized interfaces, like a push/pop stack allocator.

matheusmoreira · 2023-12-18T03:45:37 1702871137

This is great!! I learned a lot from this post. I'm working on a freestanding C project and had to write my own allocator. I ended up replicating the libc design with a doubly linked list of sized blocks which get merged with free neighbors when deallocated.

Never occurred to me to pass the object sizes to the deallocation and reallocation functions, even though such sizes are passed around everywhere in other parts of the code. Redesigning the whole thing would be a big effort but it might be worth it.

el_pollo_diablo · 2023-12-18T09:50:03 1702893003

If the library reallocates non-trivial objects, it may need a more sophisticated reallocation function, that does not simply bitwise copy a block to its new location. Some additional work may be needed (e.g. pointer fixup). The reallocation function could take a "move constructor" callback as parameter.

eqvinox · 2023-12-18T10:05:33 1702893933

I don't think I have ever seen something like this in an allocator API, do you have an example?

el_pollo_diablo · 2023-12-18T10:43:01 1702896181

I don't think I have seen it in publicly available allocators either, but I once implemented it in a custom one. I suppose that this design is not very popular because it is quite fragile: you end up running arbitrary code (the move constructor) in the middle of an allocation routine, when some of the heap's invariants have not yet been restored. Consequently, you typically require that the move constructor not recursively call the allocator, which is a reasonable limitation.

Alternatively, the allocator could treat reallocation as a multi-step process ( a) reserve new space, b) move data, c) free old space), with enough state tracking to allow a) and c) to be different calls to the allocator, so that b) could be performed by the caller.

There is no perfect solution to this problem. I suppose that's why C++ does not have a standard reallocation counterpart to new/delete/new[]/delete[].

habibur · 2023-12-18T00:58:10 1702861090

The hint in OP's last line, when you are passing around function pointers to allocators, you just need only one standard function interface.

    allocate(void\* ptr, size_t size);

    ptr == NULL : alloc()
    size == 0 : free()
    both ptr and size sent : realloc()

That way it's simpler on the API stage. I prefer to keep it like that in my code.

o11c · 2023-12-18T01:21:10 1702862470

A good API can use only a single ABI-level function, but it needs more parameters than that. I imagine 4 arguments, at least 2 of which are themselves composite:

    auto utopia_alloc(Allocation, AlignAndSkew, Size, Flags) -> Allocation;

That said, while "one function" makes interposing easy, it does mean significant branching cost compared to single functions. There's probably a way to get the best of both worlds (support several functions in the ABI, but forward them all to `utopia_alloc` if interposed).

more details at https://gist.github.com/o11c/6b08643335388bbab0228db763f9921...

Aside: I really hate how the nullprogram site linked seems allergic to type-safety.

lifthrasiir · 2023-12-18T01:51:50 1702864310

Oh that is indeed a good list, you should submit that gist independently. That said, I believe alignments should be a suggestion and not a mandatory request, mainly because the library can fix a bad alignment by overallocation. Charles Bloom has suggested a similar thing in the past [1].

Out of possible flags the zeroing flag is probably the most used and possibly only flag needed for library uses; others are also important but only a small number of libraries will ever want them. Again, a failure to do so should ideally be somehow detected and fixed, but that detection can't be actually checking for non-zeros, so the provided function should somehow return whether the memory was actually zeroed or not.

(I also share the same concern about the site, by the way.)

[1] https://cbloomrants.blogspot.com/2015/09/library-writing-rea... (specifically the "Simplicity is better" point)

o11c · 2023-12-18T02:48:50 1702867730

Alignment has to be mandatory because otherwise `free` breaks.

lifthrasiir · 2023-12-18T03:08:05 1702868885

I think you are imagining an aligned allocator out of an unaligned allocator, where the misalignment should be stored somewhere around the actual memory block, but if the main allocator itself is aware of aligned allocations it is not necessary.

I can still imagine the case where allocation bins are structured by the allocation size and alignment size and an incorrect alignment will result in an incorrect bin, but such bins are only beneficial when the alignment itself is too large compared to allocation sizes, and I'm not sure who really wants them in the first place.

eqvinox · 2023-12-18T10:29:47 1702895387

> Aside: I really hate how the nullprogram site linked seems allergic to type-safety.

+∞ to this. Considering the ecosystem limitations as they are, it's paramount to use every possible thing available to catch mistakes.

vitaminCPP · 2023-12-21T22:23:02 1703197382

> Aside: I really hate how the nullprogram site linked seems allergic to type-safety.

Could you give us an example, I'm not sure what you refering to.

jstarks · 2023-12-18T04:53:12 1702875192

I can’t agree. Combining alloc and free into one API is a false economy. You’re not really going to save any significant amount of space, and at best you’ll have an extra predictable branch or two (but probably unpredictable, I’d guess).

More importantly, it’s harder to interpret the calling code’s intent in the edge cases. What does alloc(NULL, 0) mean? Did the caller try to free a null pointer, or allocate a zero-byte object? With separate functions, you can support either or both or neither, but with the combined model the only safe thing is to support neither and panic, lest you interpret an alloc as a free or vice versa.

laserbeam · 2023-12-18T05:58:51 1702879131

Depends on your goals. Lua aimed to be an incredibly tiny language with a very low footprint. The wanted you to need to implement very few functions if you needed to integrate it in your codebase. It has very few data types and function calls for that purpose.

The goals of lua may not align with the goals of most projects, but it does make sense to have a single allocator function there, the same way it makes sense to have a single array/set/hash table data type (what lua calls a Table).

laserbeam · 2023-12-18T06:07:01 1702879621

Alloc(null, 0) can be implementation defined. It could be a noop in release builds and a panic in debug builds. Just how certain allocators do extra work in debug builds to detect leaks anyway.

Keep in mind this is an allocator interface. Behavior is supposed to change and the allocator maker has control over that. Likewise free() doesn't always actually release memory (arena allocators for example), and that's the whole point.

eru · 2023-12-18T01:47:01 1702864021

It's a pet peeve of mine that C people seem to think that 0 isn't a number. Or that 0 (or -1) mean infinity. (Like a timeout of 0 meaning an infinite timeout, instead of failing immediately.)

I guess it's mostly a consequence of C having such poor support for data types.

For a low-level allocator interface this might be an acceptable compromise, but C people also tend to use this kind of mangling when it's not warranted.

astrobe_ · 2023-12-18T13:55:41 1702907741

> It's a pet peeve of mine that C people seem to think that 0 isn't a number. Or that 0 (or -1) mean infinity.

It's not just C people :D - Zero may or may not be a natural number depending on who you ask, and negative numbers definitely are not.

And since in low-level programming, which is kind of C's kingdom, every bit might count and you'd rather use that useless sign bit for something else. -1 being particularly liked by compilers because it is "all bits set" in the dominant two's complement sign encoding, which sometimes allows optimization tricks.

> I guess it's mostly a consequence of C having such poor support for data types.

Technically nothing prevents anyone from returning a small two fields structure - one for a tag and one for the value - to have "tagged unions" or "optional types" almost like your favorite language. But in the eyes of a low-level C programmer, that's one field too much (omg that doesn't fit in a register, you will be held responsible for global warming if you publish this etc.).

So, it's not that we think that anything non-positive is not a "true number", it's more like we've been using these conventions sometimes since assembly programming days (OR AX,AX to test if register AX is zero in old x86 assembly, because the OR instruction sets the CPU's Z flag) in order to get maximum performance.

eru · 2023-12-19T02:13:29 1702952009

You are right about cramming things into bits. But it often makes code more complicated.

Eg if you have two operations you do one after another and they have time-outs of t1 and t2, then normally, you know that after t1+t2, either both will have succeeded, or they will have failed.

But that breaks down with the 'C-conventions' when either t1 or t2 equals 0. Thus you need extra logic to deal with these.

As an aside: OCaml uses tagged pointers to cram more information into machine words. That's an implementation technique, and (mostly) hidden from users of the language. (Apart from integers having 63 bits, instead of 64 bits size by default.) So that's an example where the language is 'clean', but the implementation is still efficient and uses all the low level tricks.

And my pet peeve is more that many people always do this trickery, even when not every bit counts.

> It's not just C people :D - Zero may or may not be a natural number depending on who you ask, and negative numbers definitely are not.

Yes, but I never mentioned any restriction to Natural numbers. Btw, in the context of computers 0 is definitely a number that comes natural, even if it's not a Natural number by some people's definition.

eska · 2023-12-19T11:50:57 1702986657

The number 0 is interesting in the sense that it is actually a comparatively late and abstract invention. The natural numbers are also mathematically defined as either including or excluding 0 depending on the author.

That said the issue is really in-band signaling. “I already have this data type which I don’t fully use, let’s use one of the unused values to represent something completely unrelated..”

eru · 2023-12-20T03:17:24 1703042244

Yes, in-band signalling is error-prone.

If you want to do it properly, you better have good language support for it. (Eg dependent types might help?)

saagarjha · 2023-12-18T03:03:09 1702868589

An allocation of size zero cannot be used for anything useful.

dwattttt · 2023-12-18T05:36:03 1702877763

You could definitely use it for something useful; you can't store anything through it, but if an allocator supports allocations of size 0 with the same semantics of non-zero allocations, you get a "unique" ptr from the allocator dynamically.

Anywhere you need a dynamic list identifier this would suit (yes, a counter could do this too, this would be effectively a globally unique token)

optimiz3 · 2023-12-18T06:29:58 1702880998

One example is a non-null sentinel/oracle allocation. Often you see this in lockless data structures where you don't want to do a null check.

lifthrasiir · 2023-12-18T03:51:23 1702871483

The caller may still want to distingish a successful yet useless allocation from an unsuccessful allocation, which resulted in the current situation.

Nevermark · 2023-12-18T04:52:54 1702875174

Zero bytes are always allocatable.

Unless you need zero byte (and other) pointers to be consistently unique.

But that would mean allocating and deallocating some memory for zero byte requests, to provide unique addresses. Or some reserved memory address span, outside of available (real and virtual) RAM, for zero byte pointers.

lifthrasiir · 2023-12-18T09:37:33 1702892253

> Unless you need zero byte (and other) pointers to be consistently unique.

Yes, this was the problem. People had some different ideas for contracts that `malloc` etc. should guarantee---the most lax requirement would be a "usable" pointer on success, which does almost perfectly constrain non-zero cases but leaves lots of freedom for zero cases. At the opposite side there is a "unique non-NULL" pointer requirement, as you've said, which essentially makes `malloc(0)` almost same to `malloc(1)`.

Rust also has this issue because its reference, a pointer after compilation, is expected to be non-null even for zero-sized types. Rust's rule is somewhat in between: any pointer should be non-null, but it doesn't have to be unique for zero-sized types. The canonical non-null pointer for such types (as returned by `NonNull::dangling`) is the smallest non-zero pointer that satisfies the alignment.

saagarjha · 2023-12-18T04:40:54 1702874454

How would you tell?

eru · 2023-12-19T02:14:56 1702952096

It should still work. Just like adding 0 to a number should still work, or multiplying by 1. The computer should just happily do nothing.

eqvinox · 2023-12-18T10:26:09 1702895169

Hard no. Because:

  $ git grep -P '(X[CM]ALLOC|XREALLOC|XSTRDUP)' | wc -l 
  1612
  $ git grep -P XFREE | wc -l                         
  2000
  $ git grep -P XREALLOC | wc -l   
  51

alloc and free are the most common operations, and having them be their own thing makes it much easier to reason about the code. Folding them into a common function makes the code much harder to review and maintain, both for humans as well as for tools (e.g. coccinelle).

lifthrasiir · 2023-12-18T01:17:41 1702862261

Plus a context pointer and old size, as the OP pointed out.

wolfspaw · 2023-12-18T06:49:15 1702882155

Really liked that single-alloc-function strategy.

Minimalist and Elegant.

charcircuit · 2023-12-17T23:47:10 1702856830

No mention of just having the caller do allocation?

Someone · 2023-12-18T00:16:09 1702858569

That isn’t efficient in cases where the caller cannot know the size to allocate up front. For example:

To use sprintf, you’d have to guess at the buffer size to pass in, or suffer a buffer overflow.

snprintf is a bit better in that you can recover from passing a buffer that’s too small, but still either requires you to be generous in sizing the buffer, or accept that you often call snprintf twice, doing double work.

The way around that is to pass a pointer to an allocation routine, but that’s a simple form of the allocator interface that this article favors.

charcircuit · 2023-12-18T00:48:36 1702860516

>That isn’t efficient in cases where the caller cannot know the size to allocate up front

This can typically be avoided though. With your sprintf example you could have one function that does the first half that calculates the size and then a second function that does the second half that actually copies into the destination buffer. Alternatively if you want to use realloc you could have a function like snprintf which allows resuming from where it ran out of space.

lifthrasiir · 2023-12-18T01:16:27 1702862187

This strategy, while is worth mentioning as another separate post, can only be applied to simple interfaces. A whole API can be much larger and can't avoid allocations inside. There are some cases where the whole API was carefully designed to avoid such allocations, but it is difficult and not always suitable for users.

IAmLiterallyAB · 2023-12-18T01:47:45 1702864065

For the first idea: Yes, but both of those would have to interpret the format string and determine the character lengths of the arguments which is duplicated work. I'd imagine its a large portion of the work. I'm not convinced its any more efficient than snprintf.

The second idea is neat though... it could return an extra argument which is a pointer to the first part of the format string that wasn't printed

lifthrasiir · 2023-12-18T02:15:37 1702865737

A more realistic version of the former is that the call is opportunistically given a buffer and returns an error with the required buffer size if the buffer is too small. If the buffer is big enough you only need a single call, otherwise you need two calls with a redundant processing, which is better then the original version---this even works when the caller can't preallocate a buffer. This approach is pretty common in the Windows API.

----

ADDED: I thought charcircuit was talking about always calling two functions in a row, missing the original comment that charcircuit was replying to. My bad, so I'd like to update this comment as follows...

I think the former approach would indeed work if the first call is always given a pointer to the space that can be used to record what has been done so far. Like, `snprintf(ptr, n, &recover, "format", ...)`, and `recover` can be relatively small, like 16 bytes. I would record original `ptr` and `format` arguments to be safe, and largest offsets to `ptr` and `format` that are known to be written and synchronized to each other. Of course this is just a workaround for C's inability to construct and return a sum type.

IAmLiterallyAB · 2023-12-18T02:19:41 1702865981

? This is exactly how snprintf works, which is what we're discussing.

lifthrasiir · 2023-12-18T02:23:09 1702866189

Oh wait, I missed the original comment. I see what did you actually mean there.

charcircuit · 2023-12-18T02:11:04 1702865464

>which is duplicated work

The caller can allocate memory to be shared between the 2 function calls to avoid needing duplicate work

vore · 2023-12-18T04:53:24 1702875204

How do you know how much memory you need to allocate when you need to do the work in the first place to determine it? Maybe if snprintf was resumable by supplying a larger buffer while it was doing its work, but this seems like a lot of complexity.

charcircuit · 2023-12-18T05:18:23 1702876703

In this case the state takes a constant amount of space so it is just a matter of allocating a single struct.

With the approach of having the caller allocate you have carefully design the API to make it possible.

planede · 2023-12-18T09:41:54 1702892514

What if the function does more than a single allocation? Imagine a library like libxml2 that can parse a whole XML document into a DOM tree. In many cases you it would be nice to just pass an arena allocator and throw away the whole DOM in one go. Unfortunately libxml2's allocator customization kinda sucks and it's not easy to do.

charcircuit · 2023-12-18T21:09:12 1702933752

You can still do something like the original article describes.

planede · 2023-12-19T09:33:38 1702978418

Sure, that's what I was getting at. Being able to pass stateful allocators as function arguments is very flexible.

eska · 2023-12-18T00:23:54 1702859034

This is about the case when the caller can’t know how much memory to allocate but is able to provide an estimate for most cases. Either it becomes possible to avoid repeated allocation and make deallocation faster, or another classic dynamic allocator can be asked for additional memory for outlier cases.

dwattttt · 2023-12-18T06:08:57 1702879737

If the lifetime of the allocation works out this can end up with a nice pattern of "try to use a cheap stack buffer, and fall back to a more expensive heap allocation when that fails"

cmiller1 · 2023-12-18T02:32:09 1702866729

If you click the link in the first sentence it brings you to an article about that.

ooterness · 2023-12-17T23:51:32 1702857092

"Just"

Woodi · 2023-12-18T05:23:00 1702876980

Umm, C looks like nice language...

Then maybe that so-so-problematic stringZ are too just bad convention ? Maybe put some intern to try other implementation ? ;)

Woodi · 2023-12-19T05:17:52 1702963072

You seriously think libc shouldn't or can't be redone ?? Without \0's this time.