I think the last sample needs a `fba.reset()` call in between requests.
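Something like this, with the reset deferred per iteration (just a sketch; `Server`, `accept`, and `handle` are stand-ins for whatever the real loop looks like):

```zig
fn worker(server: *Server) !void { // `Server` and `handle` are hypothetical
    var buf: [8 * 1024]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buf);

    while (server.accept()) |req| {
        defer fba.reset(); // reclaim the whole buffer between requests
        try handle(req, fba.allocator());
    }
}
```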
BTW, I've used Zig a lot recently and the opaque allocator system is great. You can create weird wrappers and stuff.
For example, the standard library JSON parser will parse JSON and deserialize it into a type that you request (say, a struct). But it needs to allocate stuff. So it creates an arena for that specific operation and returns a wrapper that has a `deinit` method. Calling it deinits the arena, so you essentially free everything in your graph of structs, arrays, etc. And since it receives an upstream allocator for the arena, you could pass in any allocator: a fixed-buffer allocator if you wish to use stack space, another arena, maybe a jemalloc wrapper, a test allocator that checks for memory leaks... whatever.
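A minimal sketch of that flow (0.11-era `std.json` API; the `User` type is made up):

```zig
const std = @import("std");

const User = struct { name: []const u8, age: u32 };

test "std.json arena wrapper" {
    const parsed = try std.json.parseFromSlice(
        User,
        std.testing.allocator, // upstream allocator for the internal arena
        "{\"name\": \"alice\", \"age\": 30}",
        .{},
    );
    defer parsed.deinit(); // one call frees the whole deserialized graph

    try std.testing.expectEqualStrings("alice", parsed.value.name);
}
```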
Unit tests are trivial because you can probably use a single arena that is only reset once at the end of the test. Unless the test is specifically to stress test memory in some form.
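E.g. something like this, routing every allocation in the test through one arena:

```zig
const std = @import("std");

test "one arena for the whole test" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena.deinit(); // a single teardown frees every allocation below
    const a = arena.allocator();

    var list = std.ArrayList(u32).init(a);
    try list.append(42); // no list.deinit() needed, the arena owns it
    try std.testing.expectEqual(@as(u32, 42), list.items[0]);
}
```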
> I assume the receiver then has to know it has to clone all of those values, yes?
The receiver needs to understand the lifetime any which way. If you parse a large JSON blob and wish to retain arbitrary key/values you have to understand how long they're valid for.
If you're using a garbage collection language you can not worry about it (you just have to worry about other things!). You can think about it less if the key/values are ref-counted. But for most C-like language implementations you probably have to retain either the entire parsed structure or clone the key/values you care about.
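In Zig terms, cloning the values you care about usually looks like this (sketch; `long_lived` is a stand-in allocator and `parsed` is the `std.json` wrapper from above):

```zig
// Clone only the values you want to keep, then drop the whole parsed graph.
const name = try long_lived.dupe(u8, parsed.value.name);
parsed.deinit(); // `name` survives independently of the parser's arena
```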
I assume the following is perfectly doable from a technical perspective, but is there any community support for using multiple allocators in this case: e.g. parsing general state into an arena, and the specific variables you want to use thereafter into a different allocator so they remain long-lived?
You can pass the json deserializer an allocator that is appropriate for the lifetime of the object you want to get out of it, so often no copying is required.
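For example, `std.json.parseFromSliceLeaky` skips the wrapper and allocates the result straight out of whatever allocator you hand it (sketch; `Config`, `bytes`, and `gpa` are stand-ins):

```zig
var arena = std.heap.ArenaAllocator.init(gpa);
// keep `arena` alive for as long as `config` is needed
const config = try std.json.parseFromSliceLeaky(Config, arena.allocator(), bytes, .{});
```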
I've mainly written unusual code that allocates a bunch of FixedBufferAllocators up front and clears each of them according to their own lifecycles. I agree that more typical code would reach for a GPA or something here. If you're using simdjzon, the tape and strings will be allocated contiguously within a pair of buffers (and then if you actually want to copy from the tape to your own struct containing slices or pointers then you'll have to decide where that goes), but the std json stuff will just repeatedly call whatever allocator you give it.
Well, see, the problem is that you’re going to have to keep the allocator alive for the entire lifetime you use any data whatsoever from the JSON. Or clone it manually. Both seem like they largely negate any benefit you get from using it?
It would prevent you from writing the bug at the top of the thread.
I have stopped considering this sort of thing as a potential addition to the language because the BDFL doesn't like it. So realistically we must remember to write reset, or defer deinit, etc. This sort of case hurts a little, but people who are used to RAII will experience more pain in cases where they want to return the value or store it somewhere and some other code gains responsibility for deinitializing it eventually.
On the other hand, it's clearer where things are released. When over-relying on destructors, it often becomes tricky to know when they run and in what order. This kind of trade-off is important to take into consideration depending on the project.
Maybe I'm too used to C++ and Rust but I don't find it tricky at all.
It's very clearly defined when a destructor (or `fn drop`) is called, as well as the order (the reverse of construction order).
What I would like to see would be some way of forcing users to manually call the dtor/drop.
So when I use a type like that I have to manually decide when it gets destroyed, and have actual compile checks that I do destroy the object in every code path.
I will admit that I really miss the "value semantics" thing that RAII gives you when working in Zig. A good example is collections, like hash tables or whatever: ownership is super clear in C++/Rust. When the hash table goes away, all the contained values go away, because the table owns its contents. When you assign a key/value, you don't have to consider what happens to the old key/value (if there was one), RAII just takes care of it. A type that manages a resource has value semantics just like "primitive" types, you don't have to worry.
Not so in Zig: whenever you deal with collections where the keys or values manage a resource (i.e. do not have value semantics), you have to be incredibly careful, because you have to consider the lifetimes of the HashMap and of the keys/values separately. A function like HashMap.put is sort of terrifying for this reason: very easy to create a memory leak.
I get why Zig does it though, and I don't think adding C++/Rust style value semantics into the language is a good idea. But it certainly sometimes makes it more challenging to work in.
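To make the `put` footgun concrete, a sketch (assuming `gpa` is a `std.mem.Allocator` handle):

```zig
var map = std.StringHashMap([]u8).init(gpa);
defer map.deinit(); // frees the table itself, *not* the values

try map.put("k", try gpa.dupe(u8, "old"));
try map.put("k", try gpa.dupe(u8, "new")); // silently leaks "old"

// One fix: fetchPut hands back the displaced entry so you can free it.
if (try map.fetchPut("k", try gpa.dupe(u8, "newer"))) |prev| {
    gpa.free(prev.value);
}
// (a real program would also iterate and free remaining values before deinit)
```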
Among systems languages, I've mostly used C and Zig. I don't think dtor order is tricky so much as I think that defaulting to fine-grained automatic resource management including running a lot of destructors causes programs that use this default to pay large costs at runtime :(
I think the latter problem is impossible, you end up needing RAII or a type-level lifetime system that can't express a lot of correct programs. I would like something that prevents "accidentally didn't deinit" in simple cases, but it probably wouldn't prevent "accidentally didn't call reset on your FixedBufferAllocator" because the tooling doesn't know about your request lifecycle.
You want a code analyzer on top of a language like zig. I think people think it's hard because it would really be hard for C. Probably would be MUCH easier in zig.
Because you can have an arbitrary number of objects that can all be freed in O(1), instead of traversing a tree and calling individual destructors. An arena per object makes no sense.
I'm not 100% sure how Zig allocators work but it looks like the arena memory is getting re-used without zeroing the memory? With slight memory corruption freed memory from a previous request can end up leaking. That's not great.
Even if you don't have process isolation between workers (which is generally what you want) then you can still put memory arenas far apart in virtual memory, make use of inaccessible guard pages, and take other precautions to prevent catastrophic memory corruption.
I guess you could place a zeroing allocator wrapper between the arena and its underlying allocator. It would write zeroes over anything getting freed. Arena deinit frees everything it got from the underlying allocator, so upon completion of each request, used memory would be zeroed before being returned to the main allocator.
And that handler signature would still be the same. Which is the whole point of this article, so: yay.
I once spent an utterly baffling afternoon trying to figure out why my benchmark for a reverse iteration across a rope data structure in Julia was finishing way too fast. I was perf tuning it, and while it would have been lovely if my implementation was actually 50 times faster than reverse iterating a native String type, I didn't buy it.
Finally figured it out: I flipped a sign in the reverse iterator, so it was allocating a bunch of memory and immediately hitting the margin of the Vector, and returning it with most of the bytes undefined. Why didn't I catch it sooner? Well, I kept running the benchmark, which allocated a reverse buffer for the String version, which GC released, then I ran the buggy code... and the GC picked up the recently freed correct data and handed it back to me! Oops.
Of course, if you want to avoid that risk in Zig, you just write a ZeroOnFreeAllocator, which zeros out your memory when you free it. It's a drop in replacement for anything which needs an allocator, job done.
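A sketch of such a wrapper, written against the 0.13-era `std.mem.Allocator` vtable (the exact signatures shift between Zig versions, so treat this as illustrative):

```zig
const std = @import("std");

const ZeroOnFreeAllocator = struct {
    child: std.mem.Allocator,

    const vtable = std.mem.Allocator.VTable{
        .alloc = alloc,
        .resize = resize,
        .free = free,
    };

    pub fn allocator(self: *ZeroOnFreeAllocator) std.mem.Allocator {
        return .{ .ptr = self, .vtable = &vtable };
    }

    fn alloc(ctx: *anyopaque, len: usize, ptr_align: u8, ret_addr: usize) ?[*]u8 {
        const self: *ZeroOnFreeAllocator = @ptrCast(@alignCast(ctx));
        return self.child.rawAlloc(len, ptr_align, ret_addr);
    }

    fn resize(ctx: *anyopaque, buf: []u8, buf_align: u8, new_len: usize, ret_addr: usize) bool {
        const self: *ZeroOnFreeAllocator = @ptrCast(@alignCast(ctx));
        // (a production version would also scrub the tail of a shrinking resize)
        return self.child.rawResize(buf, buf_align, new_len, ret_addr);
    }

    fn free(ctx: *anyopaque, buf: []u8, buf_align: u8, ret_addr: usize) void {
        const self: *ZeroOnFreeAllocator = @ptrCast(@alignCast(ctx));
        @memset(buf, 0); // scrub before handing the memory back
        self.child.rawFree(buf, buf_align, ret_addr);
    }
};
```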
In my Zig servers I'm using a similar arena-based (with resetting) strategy. It's not as bad as you'd imagine:
The current alloc implementation memsets under the hood. There are ongoing discussions about the right way to remove that performance overhead, but safety comes first.
Any sane implementation has an arena per request and per connection anyway, not shared between processes. You don't have bonkers aliasing bugs because the OS would have panicked before handing out that memory.
Zig has a lot of small features designed to make memory corruption an unhappy code path. I've had one corruption bug out of a lot of Zig code the last few years. It was from a misunderstanding of async (a classic stack pointer leak disguised by a confusion over async syntax). It's not an issue since async is gone from the language, and that sort of thing is normally turned into a compiler error anyway as soon as somebody reports it.
memset is the golden example of an easily pipelined, parallelized, predictable CPU operation - any semi-modern CPU couldn't ask for easier work to do. Zeroing 8 KB of memory is very cheap.
If we use a modern Xeon chip as an example, an AVX2 store has a throughput of 2 instructions / cycle. Doing that 256 times for 8 KB totals 128 cycles, plus a few extra cycles to account for the latency of issuing the first instruction and the last store to the L1 cache. With a 2 GHz clock frequency, it still takes less than 70 nanoseconds. For comparison, an integer divide has a worst-case latency of 90ish cycles, or 45ish nanoseconds.
Zeroing memory is very cheap, but not zeroing it is even cheaper.
Zeroing memory on deallocation can be important for sensitive data. Otherwise, it makes more sense to zero on allocation, if you know it's needed because the allocated structure will be used without initialization and the memory isn't guaranteed to be zero (most OSes guarantee newly allocated pages are zero, and have a process to zero pages in the background when possible).
Sure, but in most practical applications where an HTTP server is involved, zeroing the request/response buffer memory is very unlikely to ever be your bottleneck. Even at 10K RPS per core, your per-request CPU time budget is 100 microseconds. Zeroing memory will only account for a fraction of a percentage of that.
If you're exposing an HTTP API to clients, it's likely that any response's contents will contain sensitive client-specific data. If memory corruption bugs are more likely than bottlenecking on zeroing out your request/response buffer, then zeroing the request/response buffer is a good idea, until proven otherwise by benchmarks or profiling.
Zeroing on allocation is much more sensible though, because that way you preload the memory into your caches, as opposed to on deallocation, where you bring memory into cache that you know you no longer care about. Also, if you zero on allocation, the compiler can delete the zeroing if it can prove that you write to the memory before reading from it.
This memory is now the most recently used in the L1 cache, despite being freed by the allocator, meaning it probably isn't going to be used again.
If it was freed after already being removed from the L1 cache, then you also need to evict other L1 cache contents and wait for it to be read into L1 so you can write to it.
128 cycles is a generous estimate, and ignores the costs to the rest of the program.
Nontemporal writes are substantially slower, e.g. with avx512 you can do 1 64 byte nontemporal write every 5 or so clock cycles. That puts you at >= 640 cycles for 8 KiB.
https://uops.info/html-instr/VMOVNTPS_M512_ZMM.html
Well, the point of a non-temporal write kind of is that you don't care how fast it is. (Since if it was being read again anytime soon, you'd want it in the cache.)
The worker is already reading/writing to the buffer memory to service each incoming HTTP request, whether the memory is zeroed or not. The side effects on the CPU cache are insubstantial.
This might be a stupid question, but why isn't zeroing 8KB of memory a single instruction? It must be common enough to be worth teaching all the layers of memory (and indirection) to understand it.
For 8kb? Syscalling into the kernel, updating the process's memory map and then later faulting is probably slower by an order of magnitude or more compared to just setting those bytes to zero.
Memcpy, bzero and friends are insanely fast. Practically free when those bytes are in the cpu’s cache already.
Probably still cause a page fault when the memory is re-accessed though. I suspect even using io_uring will still be a lot slower than bzero if you're just zeroing out 2 pages of memory. Zeroing memory is really fast.
128-bit or 256-bit memsets via SIMD instructions are sufficient to saturate RAM bandwidth, so there wouldn't be much of a gain from having a dedicated instruction.
(By the way, x86 does have a dedicated instruction--rep stosb--but compilers differ as to how often they use it, for the reason cited above.)
Compilers can remove the memset if they can show it is overwritten prior to use (though C and C++ UB could technically let them skip the padding bytes, they don't), or if it isn't used at all (in which case we're back to non-zeroed memory, which in this scenario is exactly what we're trying to avoid).
There are various _s variants of memset, etc that require the compiler to perform the operations even if it “proves” the data cannot be read.
And finally modern hardware has mechanisms to say “this is now zero” and not actually zero the memory and instead just tell the MMU that the region is now zero (which removes the cpu time and cache impact of accessing the memory directly).
On macOS and iOS I believe all memory is now zero’d on free and I think malloc ostensibly therefore guarantees zero’d memory (the problem I think is whether calloc tries to rely on that behavior, because then calloc can produce non-zero memory courtesy of a buffer overrun/UaF after free has ostensibly zero’d memory)
I don't have anything for you, but if you have some normally allocated hierarchical data structures, then in order to free them you'll have to go through their members, chase pointers, etc., to figure out the addresses to free, then call free on them in sequence. That's all going to be a lot more expensive than just memsetting a bunch of data to zero, which you can do at whatever your core's memory bandwidth is.
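For contrast, the arena version of freeing a pointer-heavy structure (sketch; the `Node` type is made up):

```zig
const std = @import("std");

const Node = struct {
    children: []*Node,
    payload: []u8,
};

fn buildAndDrop() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit(); // one call frees every node and payload, no traversal
    const a = arena.allocator();

    // ... build an arbitrarily deep tree of Nodes with `a` ...
    // Without the arena you'd recursively visit children, free payloads,
    // then free each node, chasing cold pointers the whole way.
    _ = a;
}
```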
Yep. And you often don’t even need to zero the data.
Generally, no paper or SO answer will tell you where your program spends its time. Learn to use profiling tools, and experiment with stuff like this. Try out arenas. Benchmark before and after and see what kind of real performance difference it makes in your own program.
Generally, you zero on free in secure environments to avoid leaking secrets from one context to the next, i.e. a request may contain a password, which the next request should not have access to.
Depends where you draw the line. An arena allocator per request needs to be managed at least by an app framework, if not the application. It's all layers of abstraction, and one of those layers needs to 0 memory.
The arena allocator implementation for general use absolutely should not do the zeroing work. This is a specific use case, which can be implemented in an app-specific custom allocator.
That's not what I said. My point was that an arena allocator has to be managed at a relatively high level. Similarly, an allocator responsible for 0 on free would be managed at a similar level. They are orthogonal concepts as you say, but there's no reason 0 on free can't be managed by an allocator.
The inspiration for the post came from my httpz library, where a fallback combination of a FixedBufferAllocator + ArenaAllocator is used. The fixed buffer is a thread-local. But the arena allocators belong to connections, of which there could be thousands.
You might have 1 fixed buffer for N (500+) ArenaAllocators (though only one is in use at a time). This allows you to allocate a relatively large fixed buffer, since you have relatively few threads.
If you just used retain_with_limit, then you'd either have to have a much smaller retained size, or you'd need a lot more memory.
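For reference, the per-request reset in that kind of setup looks roughly like this (sketch; the limit is arbitrary, but `retain_with_limit` is the real `ArenaAllocator.ResetMode` variant):

```zig
// After each request: keep up to 64 KiB of arena capacity for the next
// request, and return anything beyond that to the underlying allocator.
_ = arena.reset(.{ .retain_with_limit = 64 * 1024 });
```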
I have to admit I don’t really understand the problem Zig is trying to solve. If you’re not trying at the language level to address the core problems of C/C++, like Rust is, then it seems like you’re just making a more ergonomic version of those languages and that’s not enough to overcome how deeply entrenched they are.
Disclaimer: I don't really write anything in any of the four languages mentioned, so this is just my impression from the outside looking in, but I'm sure that for many people the ergonomics are one of the core problems of C/C++.
Matklad also had an interesting take in one of their blogs: if you have to write unsafe code, then Zig is actually both a more ergonomic language than Rust, and easier to write "safe" unsafe code in.
Zig's tight C integration makes it easy to just start using Zig in an existing C codebase, which is a great way to overcome the challenges you've mentioned. It doesn't need to replace C all at once, just slowly.
Sure but is that really enough to get buy in on a whole new language? Especially when the new language still leaves open the door for so many of the critical problems with C?
Dangling references are about the only C problem that Zig doesn't fix with language/compiler features; everything else is taken care of, and mostly in a quite elegant and minimalistic way. Also, Zig doesn't need to replace C to be successful, just augment it, and for that it's already a really good choice.
Also, Rust doesn't really have a solution here. There are safe ways to concurrently access the same memory region that Rust disallows without unsafe, and if you are writing a high-performance language runtime you might well need these features.
A couple years ago I evaluated Rust for a low-latency application that does math and has a websocket and http client. I didn't use Rust because basically every library for speaking these protocols thought that it was fine to call the global allocator, or run UTF-8 validation on the returned buffers, or pull in a huge async runtime, etc.
That's great but if you're getting paid to write that code you're not getting paid to enjoy it. You're getting paid to write code that doesn't have bugs and security holes.
Our profession's tendency to take the easy way out on tech choices is eventually going to lead to the kind of regulation and certification other engineers are subject to.
Basically it is Modula-2 with a C facelift, and compile time metaprogramming, for those that want Safe C.
Which means, while the safety is much better than raw C, with proper strings and arrays alongside a stronger type system, it still has UAF as a possible gotcha.
Not me, not yet, and it's been a few years since I've used Blaze.
It ought to be fairly straightforward. Zig is an easy dependency to either vendor or install on a given system/code-base (much more painful currently if you want Blaze to also build Zig itself), and at a bare minimum you could just add BUILD steps for each of the artifacts defined in build.zig.
Things get more interesting if you want to take advantage of Zig's caching, especially once incremental compilation is fully released. It's a fast enough compilation step that perhaps you could totally ignore Zig's caching for now and wait to see how that feature shapes up before making any hard decisions, but my spidey senses say that'll be a nontrivial amount of work for _somebody_ to integrate those two ideas.
I happen to agree. When I see 'fn' I "hear" function, when I see 'func' I hear "funk, but misspelled".
Also, with four space indentation (or for Go, a four space tabset), 'func' aligns right where the code begins, pushing the function name off one space to the right. For 'fn' the function name starts one space before the code, I find this more aesthetic. Then again, the standard tabset is eight spaces, so this matters less in Go.
It would be pretty silly to pick a language on the basis of that kind of superficial window dressing, of course. But I know which one I prefer.
You don't understand personal preferences? Or you don't understand the desire to share them with your peers? Or you can't understand why people don't just bully themselves into silence for the benefit of others?
I literally just signed up to ask if anybody can recommend any good Zig codebases to read other than TigerBeetle. How's your terminal going?
Edit: The rest of the posted site seems like a treasure trove not just this one article. Was wondering how to get into Zig and here we are. Such kismet.
If you're looking for interesting Zig codebases to read, you might be interested in our low-level audio input/output library[1] or our module system[2] codebase - the latter includes an entity component system and uses Zig's comptime to a great degree to enable some interesting flexibility (dependency injection, global view of the world, etc.) while maintaining a great amount of type safety in an otherwise dynamic system.
I agree that the lack of control is frustrating, but consider: how much software is actually going to do anything useful if allocation is failing? Designing your std library around the common case and then gathering input on what memory-fallible APIs should look like is smarter IMO.
Most problems have speed/memory/disk tradeoffs available. Simple coding strategies include "if RAM then do the fast thing, else do the slightly slower thing", "if RAM then allocate that way, else use mmap", "if RAM then continue, else notify operator without throwing away all their work", ....
Rust was still probably right not to expose that at first, since memory is supposed to be fairly transparent, but Zig forces the user to care about memory, and given that constraint it's nearly free to also inform them of problems. The stdlib is already designed (like in Rust) around allocations succeeding, since those errors are just passed to the caller, but Zig can immediately start collecting data about how people use those capabilities. At a language level, including visibility into allocation failures was IMO a good idea.
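Concretely, an allocation failure in Zig is just an error value the caller can branch on (sketch; `huge_n` and `fallbackPath` are made up):

```zig
// error.OutOfMemory is part of the allocator's error set like any other error.
const buf = allocator.alloc(u8, huge_n) catch |err| switch (err) {
    error.OutOfMemory => return fallbackPath(), // hypothetical slower path
};
```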
andrewrk sees this as a failure but doesn't yet have a solution. It seems difficult to provide users with features they want, such as recursion and C interop, while also statically proving the stack is sufficiently small.
In programs that don't have stack overflows, it's nice to be able to handle allocation failures though :)
The Rust standard library aborts on allocation failure using the basic APIs, but Rust itself doesn't allocate. If someone wanted to write a Zig-style library in Rust, it would work just fine.
These kinds of tactics work for simple examples. In real-world HTTP servers you'll retain memory across requests (caches) and you'll need a way to handle blocking IO. That's why we most commonly use GC'd/ownership languages for this, plus things like goroutines/tokio/etc.: web devs don't want to deal with memory themselves.
It scales to complex examples as well. Retained memory would be handled with its own allocator: for a large data structure like an LRU cache, one would initialize it with a pointer to the allocator, and use that internally to manage the memory.
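In Zig that usually looks like the structure storing its allocator at init (sketch; `LruCache` is a stand-in, not a std type):

```zig
const LruCache = struct {
    allocator: std.mem.Allocator, // long-lived allocator, e.g. a GPA
    map: std.StringHashMap([]u8),

    pub fn init(allocator: std.mem.Allocator) LruCache {
        return .{
            .allocator = allocator,
            .map = std.StringHashMap([]u8).init(allocator),
        };
    }

    // put/get/evict would allocate and free through self.allocator,
    // entirely independent of any per-request arena.
};
```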
Blocking (or rather, non-blocking, which is clearly what you meant) IO is a different story. Zig had an async system, but it had problems and got removed a couple point releases ago. There's libxev[0] for evented programs, from Mitchell Hashimoto. It's not mature yet but it offers a good solution to single-threaded concurrency and non-blocking IO.
I don't think Zig is the best choice for multithreaded programs, however, unless they're carefully engineered to share little to no memory (using message passing, for instance). You'd have to take care of locking and atomic ops manually, and unlike memory bugs, Zig doesn't have a lot of built-in support for catching problems with that.
A language with manual memory allocation isn't going to be the language of choice for writing web servers, for pretty obvious reasons. But for an application like squeezing the best performance out of a resource-constrained environment, the tradeoffs start to make sense.
Off the top of my head, I was wondering... for software like web services, isn't it easier and faster to use a bump allocator per request, and release the whole block at the end of it? Assuming the number of concurrent requests/memory usage is known and you don't expect any massive spike.
I am working on an actor language kernel, and was thinking of adopting the same strategy, i.e. using a very naive bump allocator per actor, with the idea that many actors die pretty quickly so you don't have to pay for the cost of GC most of the time. You can run the GC after a certain threshold of memory usage.
The problem _somebody_ between the hardware and your webapp has to deal with is fragmentation, and it's especially annoying with requests which don't evenly consume RAM. Your OS can map pages around that problem, but it's cheaper to have a contiguous right-sized allocation which you never re-initialize.
Assuming the number of concurrent requests is known and they have bounded memory usage (the latter is application-dependent, the former can be emulated by 503-erroring excess requests, or something trickier if clients handle that poorly), yeah, just splay a bunch of bump allocators evenly throughout RAM and don't worry about the details. It's not much faster though. The steady state for reset arenas is that they're all right-sized contiguous bump allocators. Using that strategy, arenas are a negligible contribution to the costs of a 200k QPS/core service.
If you never cache any data, sure, you can use a bump allocator. Otherwise it gets tricky. I haven't worked with actors really, but from the looks of it, it seems like they would create a lot of bottlenecks compared to coroutines, and that would probably throw all your bump-allocator performance benefits out the window. As for the GC thing: you can't 'just' call a GC. Either you use a bump allocator or you use a GC. Your GC can't steal objects from your bump allocator. It can copy them... but then the reference changes, and that's a big problem.
I think this comment assumes that you're using one allocator, but it's probably normal in Zig to use one allocator for your caches, and another allocator for your per-request state, with one instance of the latter sort of allocator for each execution context that handles requests (probably coroutines). So you can just have both, and the stuff that can go in the bump allocator does, and concurrent requests don't step on each others toes.
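A sketch of that two-allocator split (names like `Cache`, `nextRequest`, and `handle` are stand-ins):

```zig
fn serve() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    var cache = Cache.init(gpa.allocator()); // long-lived, shared across requests

    var request_arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer request_arena.deinit();

    while (nextRequest()) |req| {
        // per-request scratch; reset keeps the capacity warm for the next one
        defer _ = request_arena.reset(.retain_capacity);
        try handle(req, &cache, request_arena.allocator());
    }
}
```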
Have you looked at how Erlang does memory management within its processes? You definitely can "get away" with a lot of things when you have actors you can reasonably expect will be small scale, if you are absolutely sure their data dies with them.
The trick to Erlang's memory management is that data is immutable and never shared, so all the complication and contention around GC and atomic locks just disappear.
The key thing (as I understand it) is that each process naturally has a relatively small private set of data, so Erlang can use a stop-the-process semispace copying collection strategy and it's fast enough to work out fine.
Since nothing can be writing to it during that anyway, I'm not sure the language level immutability makes a lot of difference to GC itself.
This example came from a real world http server. Admittedly, Zig's "web dev" community is small, but we're trying :) I'm sure a lot could be improved in httpz, but it's filling a gap.
You can use these patterns for per-request resources that persist across some I/O calls using async if you are on an old version of Zig or using zigcoro while you wait for the feature to return to the language. zigcoro's API is designed to make the eventual transition back to language-level async easy.