More

bluetomcat · 2025-10-10T06:36:09 1760078169

Good C code will try to avoid allocations as much as possible in the first place. You absolutely don’t need to copy strings around when handling a request. You can read data from the socket in a fixed-size buffer, do all the processing in-place, and then process the next chunk in-place too. You get predictable performance and the thing will work like precise clockwork. Reading the entire thing just to copy the body of the request in another location makes no sense. Most of the “nice” javaesque XXXParser, XXXBuilder, XXXManager abstractions seen in “easier” languages make little sense in C. They obfuscate what really needs to happen in memory to solve a problem efficiently.

lelanthran · 2025-10-10T09:18:55 1760087935

> Good C code will try to avoid allocations as much as possible in the first place.

I've upvoted you, but I'm not so sure I agree though.

Sure, each allocation imposes a new obligation to track that allocation, but on the downside, passing around already-allocated blocks imposes a new burden for each call to ensure that the callees have the correct permissions (modify it, reallocate it, free it, etc).

If you're doing any sort of concurrency this can be hard to track - sometimes it's easier to simply allocate a new block and give it to the callee, and then the caller can forget all about it (callee then has the obligation to free it).

obviouslynotme · 2025-10-11T02:28:00 1760149680

The most important pattern to learn in C is to allocate a giant arena upfront and reuse it over and over in a loop. Ideally, there is only one allocation and deallocation in the entire program. As with all things multi-threaded, this becomes trickier. Luckily, web servers are embarrassingly parallel, so you can just have an arena for each worker thread. Unluckily, web servers do a large amount of string processing, so you have to be careful in how you build them to prevent the memory requirements from exploding. As always, tradeoffs can and will be made depending on what you are actually doing.

Short-run programs are even easier. You just never deallocate and then exit(0).

adrianN · 2025-10-11T02:46:06 1760150766

Arenas are a nice tool, but they don't work for all use cases. In the limit you're reimplementing malloc on top of your big chunk of memory.

galangalalgol · 2025-10-11T03:17:00 1760152620

Most games have to do this for performance reasons at some point and there are plenty of variants to choose from. Rust has libraries for some of them, but in c rolling it yourself is the idiom. One I used in c++ and worked well as a retrofit was to overload new to grab the smallest chunk that would fit the allocation from banks of them. Profiling under load let the sizes of the banks be tuned for efficiency. Nothing had to know it wasn't a real heap allocation, but it was way faster and with zero possibility of memory fragmentation.

lifthrasiir · 2025-10-11T04:52:41 1760158361

Most pre-2010 games had to. As a prior gamedev after that period I can confidently say that it is a relic of the past in most cases now. (Not like that I don't care, but I don't have to be that strict about allocations.)

card_zero · 2025-10-11T06:32:03 1760164323

Because why?

user____name · 2025-10-11T10:28:59 1760178539

Virtual memory gets rid of a lot of fragmentation issues.

galangalalgol · 2025-10-11T14:08:32 1760191712

Yeah. Fragmentation was a niche concern of that embedded use case. It had an mmu, just wasn't used by the rtos. I am surprised that allocations aren't a major hitter anymore. I still have to minimize/eliminate them in linux signal processing code to stay realtime.

lifthrasiir · 2025-10-11T07:07:38 1760166458

Probably because hardwares became powerful enough that you can make a performant game without thinking much about allocations.

juped · 2025-10-11T05:41:35 1760161295

The normal practical version of this advice that isn't a "guy who just read about arenas post" is that you generally kick allocations outward; the caller allocates.

lelanthran · 2025-10-11T06:01:04 1760162464

They don't work for all use-cases, but they most certainly work for this use-case (HTTP server).

bheadmaster · 2025-10-11T05:17:34 1760159854

> Ideally, there is only one allocation and deallocation in the entire program.

Doesn't this techically happen with most of the modern allocators? They do a lot of work to avoid having to request new memory from the kernel as much as possible.

Asmod4n · 2025-10-11T06:16:36 1760163396

last time i checked, the glibc allocator doesnt ask the OS that often for new heap memory.

Like, every ~thousand malloc calls invoked (s)brk and that was it.

card_zero · 2025-10-11T06:31:25 1760164285

> there is only one allocation and deallocation in the entire program.

> Short-run programs are even easier. You just never deallocate and then exit(0).

What's special about "short-run"? If you deallocate only once, presumably just before you exit, then why do it at all?

free_bip · 2025-10-11T06:41:25 1760164885

Just because there's only one deallocation doesn't mean it's run only once. It would likely be run once every time the thread it belongs to is deallocated, like when it's finished processing a request.

lelanthran · 2025-10-11T06:00:18 1760162418

I agree, which is why I wrote an arena allocator library I use (somewhere on github, probably public and free).

1718627440 · 2025-10-10T21:38:19 1760132299

To reduce the amount of allocation instead of:

    struct parsed_data * = parse (...);
    struct process_data * = process (..., parsed_data);
    struct foo_data * = do_foo (..., process_data);

you can do

    parse (...) {
        ...
        process (...);
        ...
    }

    process (...) {
        ...
        do_foo (...);
        ...
    }

It sounds like violating separation of concerns at first, but it has the benefit, that you can easily do procession and parsing in parallel, and all the data can become readonly. Also I was impressed when I looked at a call graph of this, since this essentially becomes the documentation of the whole program.

ambicapter · 2025-10-11T01:55:35 1760147735

How testable is this, though?

1718627440 · 2025-10-11T10:04:02 1760177042

It might be a problem when you can't afford side-effects that you later throw away, but I haven't experienced that yet. The functions still have return codes, so you still can test, whether a correct input results in no error check being followed and that incorrect input results in an error check being triggered.

throwawaymaths · 2025-10-11T00:02:26 1760140946

is there any system where doing the basics of http (everything up to framework handoff of structured data) are done outside of a single concurrency unit?

btown · 2025-10-12T03:53:07 1760241187

Not exactly what you’re looking for, but https://github.com/simdjson/simdjson absolutely uses micro-parallel techniques for parsing, and those do need to think about concurrency and how processors handle shared memory in pipelined and branch-predicted operations.

lock1 · 2025-10-10T08:32:09 1760085129

Why does "good" C have to be zero alloc? Why should "nice" javaesque make little sense in C? Why do you implicitly assume performance is "efficient problem solving"?

Not sure why many people seem fixated on the idea that using a programming language must follow a particular approach. You can do minimal alloc Java, you can simulate OOP-like in C, etc.

Unconventional, but why do we need to restrict certain optimizations (space/time perf, "readability", conciseness, etc) to only a particular language?

bluetomcat · 2025-10-10T08:59:06 1760086746

Because in C, every allocation incurs a responsibility to track its lifetime and to know who will eventually free it. Copying and moving buffers is also prone to overflows, off-by-one errors, etc. The generic memory allocator is a smart but unpredictable complex beast that lives in your address space and can mess your CPU cache, can introduce undesired memory fragmentation, etc.

In Java, you don't care because the GC cleans after you and you don't usually care about millisecond-grade performance.

jstimpfle · 2025-10-10T21:01:40 1760130100

No. Look up Arenas. In general group allocations to avoid making a mess.

rictic · 2025-10-11T04:17:28 1760156248

If you send a task off to a work queue in another thread, and then do some local processing on it, you can't usually use a single Arena, unless the work queue itself is short lived.

jenadine · 2025-10-11T06:16:43 1760163403

I don't see how arenas solve the problems.

jstimpfle · 2025-10-11T12:59:13 1760187553

You group things from the same context together, so you can free everything in a single call.

estimator7292 · 2025-10-11T14:38:48 1760193528

No. Arenas are not a general case solution. Look it up

cogman10 · 2025-10-11T02:29:11 1760149751

> Why should "nice" javaesque make little sense in C?

Very importantly, because Java is tracking the memory.

In java, you could create an item, send it into a queue to be processed concurrently, but then also deal with that item where you created it. That creates a huge problem in C because the question becomes "who frees that item"?

In java, you don't care. The freeing is done automatically when nobody references the item.

In C, it's a big headache. The concurrent consumer can't free the memory because the producer might not be done with it. And the producer can't free the memory because the consumer might not have ran yet. In idiomatic java, you just have to make sure your queue is safe to use concurrently. The right thing to do in C would be to restructure things to ensure the item isn't used before it's handed off to the queue or that you send a copy of the item into the queue so the question of "who frees this" is straight forward. You can do both approaches in java, but why would you? If the item is immutable there's no harm in simply sharing the reference with 100 things and moving forward.

In C++ and Rust, you'd likely wrap that item in some sort of atomic reference counted structure.

lelanthran · 2025-10-10T09:21:00 1760088060

> Why does "good" C have to be zero alloc?

GP didn't say "zero-alloc", but "minimal alloc"

> Why should "nice" javaesque make little sense in C?

There's little to no indirection in idiomatic C compared with idiomatic Java.

Of course, in both languages you can write unidiomatically, but that is a great way to ensure that bugs get in and never get out.

codr7 · 2025-10-11T15:10:12 1760195412

In C, direct memory control is the top feature, which means you can assume anyone who uses your code is going to want to control memory through the process. This means not allocating from wherever and returning blobs of memory, which means designing different APIs, which is part of the reason why learning C well takes so long.

I started writing sort of a style guide to C a while ago, which attempts to transfer ideas like this one more by example:

https://github.com/codr7/hacktical-c

nxobject · 2025-10-11T20:05:52 1760213152

Echoing my sibling comment - thanks for sharing this.

jabits · 2025-10-11T18:05:07 1760205907

Thanks for sharing this work.

lock1 · 2025-10-11T08:08:22 1760170102

  > Of course, in both languages you can write unidiomatically, but that is a great way to ensure that bugs get in and never get out.

Why does "unidiomatic" have to imply "buggy" code? You're basically saying an unidiomatic approach is doomed to introduce bugs and will never reduce them.

It sounds weird. If I write Python code with minimal side effects like in Haskell, wouldn't it at least reduce the possibility of side-effect bugs even though it wasn't "Pythonic"?

AFAIK, nothing in the language standard mentions anything about "idiomatic" or "this is the only correct way to use X". The definition of "idiomatic X" is not as clear-cut and well-defined as you might think.

I agree there's a risk with an unidiomatic approach. Irresponsibly applying "cool new things" is a good way to destroy "readability" while gaining almost nothing.

Anyway, my point is that there's no single definition of "good" that covers everything, and "idiomatic" is just whatever convention a particular community is used to.

There's nothing wrong with applying an "unidiomatic" mindset like awareness of stack/heap alloc, CPU cache lines, SIMD, static/dynamic dispatch, etc in languages like Java, Python, or whatever.

There's nothing wrong either with borrowing ideas like (Haskell) functor, hierarchical namespaces, visibility modifiers, borrow checking, dynamic dispatch, etc in C.

Whether it's "good" or not is left as an exercise for the reader.

lelanthran · 2025-10-11T15:05:24 1760195124

> Why does "unidiomatic" have to imply "buggy" code?

Because when you stray from idioms you're going off down unfamiliar paths. All languages have better support for specific idioms. Trying to pound a square peg into a round hole can work, but is unlikely to work well.

> You're basically saying an unidiomatic approach is doomed to introduce bugs and will never reduce them.

Well, yes. Who's going to reduce them? Where are you planning to find people who are used to code written in an unusual manner?

By definition alone, code is written for humans to read. If you're writing it in a way that's difficult for humans to read, then of course the bug level can only go up and not down.

> It sounds weird. If I write Python code with minimal side effects like in Haskell, wouldn't it at least reduce the possibility of side-effect bugs even though it wasn't "Pythonic"?

"Pythonic" does not mean the same thing as "Idiomatic code in Python".

estimator7292 · 2025-10-11T14:53:13 1760194393

Good C has minimal allocations because you, the human, are the memory allocator. It's up to your own meat brain to correctly track memory allocation and deallocation. Over the last century, C programmers have converged on some best practices to manage this more effectively. We statically allocate, kick allocations up the call chain as far as possible. Anything to get that bit of tracked state out of your head.

But we use different approaches for different languages because those languages are designed for that approach. You can do OOP in C and you can do manual memory management in C#. Most people don't because it's unnecessarily difficult to use languages in a way they aren't designed for. Plus when you re-invent a wheel like "classes" you will inevitably introduce a bug you wouldn't have if you'd used a language with proper support for that construct. You can use a hammer to pull out a screw, but you'd do a much better job if you used a screwdriver instead.

Programming languages are not all created equal and are absolutely not interchangeable. A language is much, much more than the text and grammar. The entire reason we have different languages is because we needed different ways to express certain classes of problems and constructs that go way beyond textual representation.

For example, in a strictly typed OOP language like C#, classes are hideously complex under the hood. Miles and miles of code to handle vtables, inheritance, polymorphism, virtual, abstract functions and fields. To implement this in C would require effort far beyond what any single programmer can produce in a reasonable time. Similarly, I'm sure one could force JavaScript to use a very strict typing and generics system like C#, but again the effort would be enormous and guaranteed to have many bugs.

We use different languages in different ways because they're different and work differently. You're asking why everyone twists their screwdrivers into screws instead of using the back end to pound a nail. Different tools, different uses.

riedel · 2025-10-11T07:51:15 1760169075

A long time ago I was involved in building compilers. It was common that we solved this problem with obstacks, which are basically stacked heaps. I wonder one could not build more things like this, where freeing is a bit more best effort but you have some checkpoints. (I guess one would rather need tree like stacks) Just have to disallow pointers going the wrong way. Allocation remains ugly in C and I think explicit data structures are are definitely a better way of handling it.

fulafel · 2025-10-11T05:37:35 1760161055

This shared memory and pointer shuffling is of course fraught with requiring correct logic to avoid memory safety bugs. Good C code doesn't get you pwned, I'd argue.

jenadine · 2025-10-11T05:51:39 1760161899

> Good C code doesn't get you pwned, I'd argue.

This is not a serious argument because you don't really define good C code and how easy or practical it is to do. The sentence works for every language. "Good <whatever language> code doesn't get you pwned"

But the question is whether "Average" or "Normal" C code gets you pwned? And the answer is yes, as told in the article.

fulafel · 2025-10-11T07:01:02 1760166062

The comment I was responding to suggested Good C Code employes optimizations that, I opined, are more error prone wrt memory safety - so I was not attempting to define it, but challenging the offered characterisation.

wfn · 2025-10-13T13:38:40 1760362720

Agree re: no need for heap allocation - for others: I recommend reading thru whole masscan source (https://github.com/robertdavidgraham/masscan), it's a pleasure btw - iirc rather few/sparse malloc()s which are part of regular I/O processing flow (there will be malloc()s which depending on config etc. set up additional data structs but as part of setup).

fsckboy · 2025-10-12T04:11:57 1760242317

>Good C code will try to avoid allocations as much as possible in the first place.

there's a genius to this: if you're going to optimize prematurely, do it right out of the gate!

01HNNWZ0MV43FF · 2025-10-10T07:17:11 1760080631

Can you do parsing of JSON and XML without allocating?

veqq · 2025-10-10T07:32:34 1760081554

Of course. You can do it in a single pass/just parse the token stream. There are various implementations like: https://zserge.com/jsmn/

andrepd · 2025-10-10T21:59:41 1760133581

It requires manual allocation of an array of tokens. So it needs a backing "stack vector" of sorts.

And what about escapes?

int_19h · 2025-10-12T01:08:21 1760231301

For escapes you can mutate the raw buffer with data in place, since a single escape always expands to fewer characters than the escape itself.

bluetomcat · 2025-10-10T07:31:49 1760081509

Yes, you can do it with minimal allocations - provided that the source buffer is read-only or is mutable but is unused later directly by the caller. If the buffer is mutable, any un-escaping can be done in-place because the un-escaped string will always be shorter. All the substrings you want are already in the source buffer. You just need a growable array of pointer/length pairs to know where tokens start.

acidx · 2025-10-11T18:15:23 1760206523

Yes! The JSON library I wrote for the Zephyr RTOS does this. Say, for instance, you have the following struct:

    struct SomeStruct {
        char *some_string;
        int some_number;
    };

You would need to declare a descriptor, linking each field to how it's spelled in the JSON (e.g. the some_string member could be "some-string" in the JSON), the byte offset from the beginning of the struct where the field is (using the offsetof() macro), and the type.

The parser is then able to go through the JSON, and initialize the struct directly, as if you had reflection in the language. It'll validate the types as well. All this without having to allocate a node type, perform copies, or things like that.

This approach has its limitations, but it's pretty efficient -- and safe!

Someone wrote a nice blog post about (and even a video) it a while back: https://blog.golioth.io/how-to-parse-json-data-in-zephyr/

The opposite is true, too -- you can use the same descriptor to serialize a struct back to JSON.

I've been maintaining it outside Zephyr for a while, although with different constraints (I'm not using it for an embedded system where memory is golden): https://github.com/lpereira/lwan/blob/master/src/samples/tec...

zzo38computer · 2025-10-11T04:24:57 1760156697

It depends what you intend to do with the parsed data, and where the input comes from and where the output will be going to. There are situations that allocations can be reduced or avoided, but that is not all of them. (In some cases, you do not need full parsing, e.g. to split an array, you can check if it is a string or not and the nesting level, and then find the commas outside of any arrays other than the first one, to be split.) (If the input is in memory, then you can also consider if you can modify that memory for parsing, which is sometimes suitable but sometimes not.)

However, for many applications, it will be better to use a binary format (or in some cases, a different text format) rather than JSON or XML.

(For the PostScript binary format, there is no escaping, and the structure does not need to be parsed and converted ahead of time; items in an array are consecutive and fixed size, and data it references (strings and other arrays) is given by an offset, so you can avoid most of the parsing. However, note that key/value lists in PostScript binary format is nonstandard (even though PostScript does have that type, it does not have a standard representation in the binary object format), and that PostScript has a better string type than JavaScript but a worse numeric type than JavaScript.)

megous · 2025-10-11T12:08:47 1760184527

Yes, you can first validate the buffer, to know it contains valid JSON, and then you can work with pointers to beginings of individual syntactic parts of JSON, and have functions that decide what type of the current element is, or move to the next element, etc. Even string work (comparisons with other escaped or unescaped strings, etc.) can be done on escaped strings directly without unescaping them to a buffer first.

Ergonomically, it's pretty much the same as parsing the JSON into some AST first, and then working on the AST. And it can be much faster than dumb parsers that use malloc for individual AST elements.

You can even do JSON path queries on top of this, without allocations.

Eg. https://xff.cz/git/megatools/tree/lib/sjson.c

gritzko · 2025-10-10T07:32:27 1760081547

Yep, no problem. In place parsing only requires a stack. Stack length is the maximum JSON nesting allowed. I have a C dialect exactly like that.

Ygg2 · 2025-10-10T07:33:21 1760081601

Theoretically yes. Practically there is character escaping.

That kills any non-allocation dreams. Moment you have "Hi \uxxxx isn't the UTF nice?" you will probably have to allocate. If source is read-only you have to allocate. If source is mutable you have to waste CPU to rewrite the string.

deaddodo · 2025-10-10T21:13:48 1760130828

I'm confused why this would be a problem. UTF-8 and UTF-16 (the only two common unicode subsets) are a maximum of 4 bytes wide (and, most commonly, 2 in English text). The ASCII representation you gave is 6-bytes wide. I don't know of many ASCII unicode representations that have less bytewidth than their native Unicode representation.

Same goes for other characters such as \n, \0, \t, \r, etc. All half in native byte representation.

lelanthran · 2025-10-10T09:27:51 1760088471

> Moment you have "Hi \uxxxx isn't the UTF nice?" you will probably have to allocate.

Depends on what you are doing with it. If you aren't displaying it (and typically you are not in a server application), you don't need to unescape it.

mpyne · 2025-10-10T21:59:48 1760133588

And this is indeed something that the C++ Glaze library supports, to allow for parsing into a string_view pointing into the original input buffer.

_3u10 · 2025-10-11T04:27:54 1760156874

It’s just two pointers the current place to write and the current place to read, escapes are always more characters than they represent so there’s no danger of overwriting the read pointer. If you support compression this can become somewhat of and issue but you simply support a max block size which is usually defined by the compression algorithm anyway.

Ygg2 · 2025-10-11T12:02:44 1760184164

If you have a place to write, then it's not zero allocation. You did an allocation.

And usually if you want maximum performance, buffered read is the way to go, which means you need a write slab allocation.

lelanthran · 2025-10-11T16:04:55 1760198695

> If you have a place to write, then it's not zero allocation. You did an allocation.

Where did that allocation happen? You can write into the buffer you're reading from, because the replacement data is shorter than the original data.

Ygg2 · 2025-10-12T06:23:06 1760250186

You have a read buffer and somewhere where you have to write to.

Even if we pretend that the read buffer is not allocating (plausible), you will have to allocate for the write source for the general case (think GiB or TiB of XML or JSON).

lelanthran · 2025-10-12T06:28:00 1760250480

> You have a read buffer and somewhere where you have to write to.

The "somewhere you have to write to" is the same buffer you are reading from.

Ygg2 · 2025-10-12T07:48:10 1760255290

Not if you are doing buffered reads, where you replace slow file access with fast memory access. This buffer is cleared every X bytes processed.

Writing to it would be pointless because clears obliterate anything written; or inefficient because you are somehow offsetting clears, which would sabotage the buffered reading performance gains.

lelanthran · 2025-10-12T14:27:21 1760279241

Maybe I missed it, but ITT we were talking about C buffers, not buffered reads.

Ygg2 · 2025-10-12T16:50:18 1760287818

I thought we were talking about high performance parsing. Of which buffered reads are one. Other is loading entire document into mutable memory, which also has limitations.

topspin · 2025-10-10T21:53:31 1760133211

> Practically there is character escaping

The voice of experience appears. Upvoted.

It is conceivable to deal with escaping in-place, and thus remain zero-alloc. It's hideous to think about, but I'll bet someone has done it. Dreams are powerful things.

lelanthran · 2025-10-10T09:25:46 1760088346

> Can you do parsing of JSON and XML without allocating?

If the source JSON/XML is in a writeable buffer, with some helper functions you can do it. I've done it for a few small-memory systems.

self_awareness · 2025-10-11T10:21:07 1760178067

That mythical "Good C Code", which is known only to some people who I never met.

pjmlp · 2025-10-11T10:34:36 1760178876

These abstractions were already common in enterprise C code decades before Java came to be, thanks to stuff like Yourdon Structured Method.

Using fixed size buffers doesn't fix out of bounds errors, and stack corruption caused by such bugs.

Naturally we all know good C programmers never make them. /s

bluetomcat · 2025-09-24T11:20:12 1758712812

> These lies don’t just affect them but also the people reading it as they might never see what actually happens

This is what sustains this whole economic bubble built on debt and future promises. At all levels of society, you have these inflated unrealistic expectations and BS circulating in the media. Technically-incompetent but eloquent and charismatic CEOs predict that in 6 months, some major technological shift will happen. Managers preach about adjusting their organisations to these new realities. Workers have no choice but to play the game with all its dirty tricks, if they want to stay employed. Anyone who dares to say that the emperor has no clothes is isolated in a dark corner because they may suddenly deflate the value of the whole economy. This is corporate feudalism disguised as a competitive economy.

bluetomcat · 2025-09-17T06:30:38 1758090638

They were popular because there was no Unix culture in Eastern Europe at the time. Pretty much any computer geek was a DOS user. To me personally, it always seemed kind of lame because many of these people would not bother to properly learn the shell language.

bluetomcat · 2025-09-03T11:53:05 1756900385

Their English is sufficiently good. It's a cultural aspect regarding writing style. When Russians and most Eastern Europeans write about technical subjects, they tend to be concise, dense and straightforward. Americans, on the other hand, are over-expressive and tend to saturate their writing with pointless metaphors and rhetorical devices.

bluetomcat · 2025-08-12T08:28:32 1754987312

Rust encourages a rather different "high-level" programming style that doesn't suit the domains where C excels. Pattern matching, traits, annotations, generics and functional idioms make the language verbose and semantically-complex. When you follow their best practices, the code ends up more complex than it really needs to be.

C is a different kind of animal that encourages terseness and economy of expression. When you know what you are doing with C pointers, the compiler just doesn't get in the way.

eru · 2025-08-12T10:52:36 1754995956

Pattern matching should make the language less verbose, not more. (Similar for many of the other things you mentioned.)

> When you know what you are doing with C pointers, the compiler just doesn't get in the way.

Alas, it doesn't get in the way of you shooting your own foot off, too.

Rust allows unsafe and other shenanigans, if you want that.

bluetomcat · 2025-08-12T12:11:23 1755000683

> Pattern matching should make the language less verbose, not more.

In the most basic cases, yes. It can be used as a more polished switch statement.

It's the whole paradigm of "define an ad-hoc Enum here and there", encoding rigid semantic assumptions about a function's behaviour with ADTs, and pattern matching for control-flow. This feels like a very academic approach and modifying such code to alter its opinionated assumptions isn't funny.

eru · 2025-08-13T01:40:38 1755049238

How is encoding all the assumptions and invariants badly in eg a bunch of booleans and nullable pointers any better?

za_creature · 2025-08-12T16:04:07 1755014647

> When you know what you are doing with C pointers, the compiler just doesn't get in the way.

Tell me you use -fno-strict-aliasing without telling me.

Fwiw, I agree with you and we're in good[citation needed] company: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg...

bluetomcat · 2025-08-09T14:37:25 1754750245

The W210s did indeed rust badly and the interiors weren't on par with previous generations, but in purely mechanical terms, they were still solid cars. The diesels (particularly E250 TD and E290 TD) could cover 700k+ kilometres without any interventions to the engine or the transmission. The W211 is an improvement to the W210 in almost every aspect, and they are still plentiful on the roads in Eastern Europe.

dahauns · 2025-08-09T17:53:17 1754761997

>(particularly E250 TD and E290 TD)

Not a coincidence, though - these two still use those legendary OM602/OM605 diesels of its predecessor series.

fransje26 · 2025-08-09T16:05:44 1754755544

True, from experience, the E290 TD was mechanically solid. The electronics, less so unfortunately. Ours was plagued by intermittent errors and beeping, together with some parasitic battery drain we could not trace down despite our best efforts.

I didn't have the chance to own a W211, but from what I read and heard, it was indeed an improvement. Even in the looks department!

bluetomcat · 2025-08-04T08:07:53 1754294873

What a mess of an article. A pretentious mishmash of scattered references with some vague abstract claims that could be summarised in one paragraph.

flohofwoe · 2025-08-04T08:20:42 1754295642

Sort of fitting though, because C++ coroutines turned out quite the mess (are they actually usable in real world code by now?).

I think in the end it's just another story of a C++ veteran living through the inevitable Modern C++ trauma and divorce ;)

(I wonder what he's up to today, ITHare was quite popular in game dev circles in the 2010s for his multiplayer networking blog posts and books)

pjmlp · 2025-08-04T10:17:38 1754302658

They have been always usable in the real world, as they were initially based on async model of doing C++ programming in WinRT, inspired by .NET async/await.

Hence why anyone that has done low level .NET async/await code with awaitables and magic peoples, will fell right at home in C++ co-routines.

Anyone using WinAppSDK with C++ will eventually make use of them.

TuxSH · 2025-08-04T09:18:10 1754299090

> C++ coroutines turned out quite the mess (are they actually usable in real world code by now?).

They are, they are extensively used by software like ScyllaDB which itself is used by stuff like Discord, BlueSky, Comcast, etc.

C++ coroutines and "stackless coroutines" in general are just compiler-generated FSMs. As for allocation, you can override operator new for the promise types and that operator new gets forwarded the coroutine's function arguments

simonask · 2025-08-04T09:50:05 1754301005

They are compiler-generated FSMs, but I think it's worth noting that the C++ design was landed in a way that precluded many people from ever seriously considering using them, especially due to the implicit allocation. The reason you are using C++ in the first place is because you care about details like allocation, so to me this is a gigantic fumble.

Rust gets it right, but has its own warts, especially if you're coming from async in a GC world. But there's no allocation; Futures are composable value types.

captainmuon · 2025-08-04T12:58:24 1754312304

> The reason you are using C++ in the first place is because you care about details like allocation, so to me this is a gigantic fumble.

I wouldn't say that applies to everybody. I use C++ because it interfaces with the system libraries on every platform, because it has class-based inheritance (like Java and C#, unlike Rust and Zig) and because it compiles to native code without an external runtime. I don't care to much about allocations.

For me the biggest fumble is that C++ provides the async framework, but no actual async stdlib (file io and networking). It took a while for options to be available, and while eg Asio works nicely it is crazily over engineered in places.

pjmlp · 2025-08-05T07:12:05 1754377925

I like what Rust offers over C++ in terms of safety and community culture, but I don't enjoy being a tool builder for ecosystem gaps, I rather spend the time directly using the tools that already exist, plus I have Java and .NET ecosystems for safety, as I am really on the automatic resource management side.

Zig, is really Modula-2 in C's cloathing, I don't like the kind of handmade culture that has around it, and its way of dealing with use after free I can also get in C and C++, for the last thirty years, it is a matter of actually learning the tooling.

Thus C++ it is, for anything that isn't can't be taken over by a compiled managed language.

I would like to use D more, but it seems to have lost its opportunity window, although NASA is now using it, so who knows.

pjmlp · 2025-08-04T11:10:42 1754305842

The C++ model is that in theory there is an allocation, in practice depending on how a specific library was written, the compiler would be able to elide the allocation.

It is the same principle that drives languages like Rust in regards to being safe by default, in theory stuff like bounds checks cause a performance hit, in practice compilers are written to elide as much as possible.

gpderetta · 2025-08-04T17:31:52 1754328712

The required allocation make them awkward to use for short lived automatic objects like generators. But for async operations were you are eventually going to need a long lived context object anyway, it is a non-issue especially given the ability to customize allocators.

I say this as someone that is not a fan of the stackess coroutines in general, and the C++ solution in particular.

uep · 2025-08-04T11:31:01 1754307061

I think you missed an important point in the parent comment. You can override the allocation for C++ coroutines. You do have control over details like allocation.

C++ coroutines are so lightweight and customizable (for good and ill), that in 2018 Gor Nishanov did a presentation where he scheduled binary searches around cache prefetching using coroutines. And yes, he modified the allocation behavior, though he said it only resulted in a modest improvement on performance.

TuxSH · 2025-08-04T12:59:25 1754312365

You can write stuff like this:

  void *operator new(std::size_t sz, Foo &foo, Bar &bar) { return foo.m_Buffer; /* should be std::max_align_t-aligned \*/ }

and force all coroutines of your Coroutine type to take (Foo &, Bar &) as arguments this way (works with as many overloads as you like).

bluetomcat · 2025-07-30T16:13:41 1753892021

Big Tech drove us towards techno-feudalism. It's a wider social phenomenon and their hiring patterns for programmers are only one aspect of the problem. Small businesses are forced to do business on their platforms according to their rules, or else they go bust. Programmers are forced to learn their APIs, so that their "app" can live in their walled gardens. They soaked a huge amount of talent to optimise their ad and recommendation engines. This is a huge opportunity cost to society - that talent could be doing great creative stuff for small and medium-sized businesses instead.

gruez · 2025-07-30T17:25:06 1753896306

>Small businesses are forced to do business on their platforms according to their rules, or else they go bust. Programmers are forced to learn their APIs, so that their "app" can live in their walled gardens.

???

What are you talking about? For a typical fullstack app the proprietary bits probably account for less than 5% of the codebase.

>They soaked a huge amount of talent to optimise their ad and recommendation engines.

That's just PR/advertising/sales. If the companies didn't exist it's not like those job or efforts will disappear, they'll be allocated elsewhere, classified ads in newspapers for instance.

bluetomcat · 2025-07-29T05:26:55 1753766815

With a lot of fancy wording, the article basically proposes that slow-moving, bureaucratic educational institutions should catch up with TikTok’s latest algorithm, helping raise the next generation of influencers.

RajT88 · 2025-07-29T05:54:33 1753768473

To the effect it clearly proposes anything!

Perepiska · 2025-07-29T07:00:04 1753772404

"billions of influencers"

bluetomcat · 2025-07-23T06:25:13 1753251913

It’s good at matching patterns. If you can frame your problem so that it fits an existing pattern, good for you. It can show you good idiomatic code in small snippets. The more unusual and involved your problem is, the less useful it is. It cannot reason about the abstract moving parts in a way the human brain can.

carlmr · 2025-07-23T06:43:27 1753253007

>It cannot reason about the abstract moving parts in a way the human brain can.

Just found 3 race conditions in 100 lines of code. From the UTF-8 emojis in the comments I'm really certain it was AI generated. The "locking" was just abandoning the work if another thread had started something, the "locking" mechanism also had toctou issues, the "locking" also didn't actually lock concurrent access to the resource that actually needed it.

nyarlathotep_ · 2025-07-23T20:03:53 1753301033

> UTF-8 emojis in the comments

This is one of the "here be demons" type signatures of LLM code generation age, along with comments like

// define the payload struct payload {};

bluetomcat · 2025-07-23T07:01:28 1753254088

Yes, that was my point. Regardless of the programming language, LLMs are glorified pattern matchers. A React/Node/MongoDB address book application exposes many such patterns and they are internalised by the LLM. Even complex code like a B-tree in C++ forms a pattern because it has been done many times. Ask it to generate some hybrid form of a B-tree with specific requirements, and it will quickly get lost.

hombre_fatal · 2025-07-23T14:44:57 1753281897

"Glorified pattern matching" does so much work for the claim that it becomes meaningless.

I've copied thousands of lines of complex code into an LLM asking it to find complex problems like race conditions and it has found them (and other unsolicited bugs) that nobody was able to find themselves.

Oh it just pattern matched against the general concept of race conditions to find them in complex code it's never seen before / it's just autocomplete, what's the big deal? At that level, humans are glorified pattern matchers too and the distinction is meaningless.

nyrikki · 2025-07-23T15:10:00 1753283400

LLMs are good at needle in the haystack problems, specifically when they have examples in the corpus.

The counter point is how LLMs can't find a missing line in a poem when they are given the original.

PAC learning is basically existential quantification...has the same limits too.

But being a tool to find a needle is not the same as finding all or even reliability finding a specific needle.

Being being a general programming agent requires much more than just finding a needle.

hombre_fatal · 2025-07-23T15:18:37 1753283917

> The counter point is how LLMs can't find a missing line in a poem when they are given the original.

True, but describing a limitation of the tech can't be used to make the sort of large dismissals we see people make wrt LLMs.

The human brain has all sorts of limitations like horrible memory (super confident about wrong details) and catastrophic susceptibility to logical fallacies.

mckn1ght · 2025-07-23T15:45:43 1753285543

> super confident about wrong details

Have you not had this issue with LLMs? Because I have. Even with the latest models.

I think someone upthread was making an attempt at

> describing a limitation of the tech

but you keep swatting them down. I didn’t see their comments as a wholesale dismissal of AI. They just said they aren’t great at sufficiently complex tasks. That’s my experience as well. You’re just disagreeing on what “sufficiently” and “complex” mean, exactly.

unshavedyak · 2025-07-23T16:48:36 1753289316

> humans are glorified pattern matchers too and the distinction is meaningless.

I'm still convinced that this is true. The more advances we make in "AI" the more i expect we'll discover that we're not as creative and unique as we think we are.

kakapo5672 · 2025-07-23T17:28:13 1753291693

I suspect you're right. The more I work with AI, the more clear is the trajectory.

Humans generally have a very high opinion of themselves and their supposedly unique creative skills. They are not eager to have this illusion punctured.

bigfishrunning · 2025-07-23T17:02:37 1753290157

maybe you aren't...

unshavedyak · 2025-07-23T17:21:39 1753291299

Whether or not we have free will is not a novel concept. I simply side on us being more deterministic than we realize, that our experiences and current hormone state shape our output drastically.

Even our memories are mutable. We will with full confidence recite memories or facts we've learned just moments ago which are entirely fictional. Normal, healthy adults.

Workaccount2 · 2025-07-23T15:08:55 1753283335

Humans can't be glorified pattern matchers because they recognize that they aren't.[1]

[1]https://ai.vixra.org/pdf/2506.0065v1.pdf

The paper is satire, but it's a pretty funny read.

Isamu · 2025-07-23T16:20:53 1753287653

LLMs should definitely be used for brute force searches especially of branching spaces. Use them for what they do best.

“Pattern matching” is thought of as linear but LLMs are doing something more complex, it should be appreciated as such.

0points · 2025-07-23T14:55:45 1753282545

> it has found them (and other unsolicited bugs) that nobody was able to find themselves.

How did you evaluate this? Would be interested in seeing results.

I am specifically interested in the amount of false issues found by the LLM, and examples of those.

hombre_fatal · 2025-07-23T15:11:29 1753283489

Well, how do you verify any bug? You listen to someone's explanation of the bug and double check the code. You look at their solution pitch. Ideally you write a test that verifies the bug and again the solution.

There are false positives, and they mostly come from the LLM missing relevant context like a detail about the priors or database schema. The iterative nature of an LLM convo means you can add context as needed and ratchet into real bugs.

But the false positives involve the exact same cycle you do when you're looking for bugs yourself. You look at the haystack and you have suspicions about where the needles might be, and you verify.

0points · 2025-07-23T15:13:39 1753283619

> Well, how do you verify any bug?

You do or you don't.

Recently we've seen many "security researchers" doing exactly this with LLM:s [1]

1: https://www.theregister.com/2025/05/07/curl_ai_bug_reports/

Not suggesting you are doing any of that, just curious what's going on and how you are finding it useful.

> But the false positives involve the exact same cycle you do when you're looking for bugs yourself.

In my 35 years of programming I never went just "looking for bugs".

I have a bug and I track it down. That's it.

Sounds like your experience is similar to using deterministic static code analyzers but more expensive, time consuming, ambiguous and hallucinating up non-issues.

And that you didn't get a report to save and share.

So is it saving you any time or money yet?

hombre_fatal · 2025-07-23T15:24:23 1753284263

Oh, I go bug hunting all the time in sensitive software. It's the basis of test synthesis as well. Which tests should you write? Maybe you could liken that to considering where the needles will be in the haystack: you have to think ahead.

It's a hard, time consuming, and meandering process to do this kind of work on a system, and it's what you might have to pay expensive consultants to do for you, but it's also how you beat an expensive bug to the punchline.

An LLM helps me run all sorts of considerations on a system that I didn't think of myself, but that process is no different than what it looks like when I verify the system myself. I have all sorts of suspicions that turn into dead ends because I can't know what problems a complex system is already hardened against.

What exactly stops two in-flight transfers from double-spending? What about when X? And when Y? And what if Z? I have these sorts of thoughts all day.

I can sense a little vinegar at the end of your comment. Presumably something here annoys you?

0points · 2025-07-23T15:56:58 1753286218

> I can sense a little vinegar at the end of your comment. Presumably something here annoys you?

Thanks for your responses.

Really sorry about the vinegar, not intentional. I may have such personality disorder idk. Being blunt, not very great communication skills.

hombre_fatal · 2025-07-24T19:37:29 1753385849

It's ok, I do worse things on HN.

My vice is when someone writes a comment where I have a different opinion than them, and their comment makes me think of my own thoughts on the subject.

But since I'm responding to them, I feel like it's some sort of debate/argument even though in reality I'm just adding my two cents.

nzach · 2025-07-23T17:50:03 1753293003

> It can show you good idiomatic code in small snippets.

That's not really true for things that are changing a lot. I got a terrible experience last time I've tried to use Zig, for example. The code it generated was an amalgamation between two or three different versions.

And I've even got this same style of problem in golang where sometimes the LLM generates a for loop in the "old style" (pre go 1.22).

In the end LLMs are a great tool if you know what needs to be done, otherwise it will trip you up.

practice9 · 2025-07-23T07:06:37 1753254397

Humans cannot reason about code at scale. Unless you add scaffolding like diagrams and maps and …

Things that most teams don’t do or half-ass

samrus · 2025-07-23T08:38:22 1753259902

Its not scaffolding if the intelligence itself is adding it. Humans can make their own diagrams ajd maps to help them, LLM agentsbneed humans to scaffold for them, thats the setup for the bitter lesson