Progress on C23 (thephd.dev)
98 points by ingve on Sept 5, 2021 | 56 comments



>I should also note that C23 will also have Binary Integer Literals, so the same number can be written out in a more precise grouping of binary as well:

  const unsigned magical_number = 0b0110'0001'0110'0011'0110'0001'0110'0010;
Finally. It’s mystifying to me why this took so long to add; not everyone is fluent in translating hex digits to binary nybbles on the fly. This will greatly help with clearly defining and manipulating bitfields. The fact that digit separators can be placed arbitrarily will further help delimit fields of variable width, e.g.

  //                   A B C  D
  uint8_t flags = 0b00'1'0'11'01;
A and B are binary fields; C and D are quaternary fields. The first two bits are padding. This is a lot more readable than

  uint8_t flags = 0x2D;


I added 0b binary literals to the Datalight C compiler nearly 40 years ago, and put them in D, too. It sounds like a great idea, but it turns out to just not be useful in practice. One of the problems is the unwieldy length of them (and yes, you can group them in D with _ ).

I predict people will use it for a while, then abandon it.


They are fantastic for writing tests against any function that might need to bit twiddle.


Yes, that is the promise. Try it for a while, you'll see what I mean.


Already have, there are many more like that on a branch I haven't committed yet. I'm very happy with it:

https://github.com/ityonemo/Primes/blob/f157b63d91375c62f110...

I've also verified that binary representations of unusual datatypes are correct and have the expected properties (~>5 yr ago, paid gig, actually my first paid gig):

https://github.com/interplanetary-robot/SigmoidNumbers/blob/...

https://github.com/interplanetary-robot/SigmoidNumbers/blob/...

Extremely happy with it in all cases. You can't convince me otherwise. Zig and Julia would be poorer languages if not for binary literals. My life would have been more annoying without it. It's a tool I have reached for in the past and will continue reaching for when appropriate and tasteful, I have not "abandoned it". At least, I'm one datapoint that refutes your assertion.


> I'm one datapoint that refutes your assertion.

There's always one!

BTW, D does support 0b:

https://dlang.org/spec/lex.html#BinPrefix


It's highly unlikely I will ever use a language whose creator very wrongly assumes I don't know what I'm talking about when I talk about programming ergonomics.


I’ve used them for almost 20 years in Ruby, several years in Swift. I do find them useful in those contexts.

I haven’t used the C2x version, but one thing I expect them to be useful for is self-documentation of enums representing bitmask values.


> useful for is self-documentation of enums representing bitmask values.

I used to do that, and wound up back using hex. Hex is easier to read for me, as I don't have to count the digits.


why not

    struct Foo{
        bool some_flag : 1;
        bool other_flag : 1;
        ....
    }

?


Because a struct of bools != a bitfield. C booleans as defined in stdbool.h are just convenience macros aliasing "true" to 8 bit integer 1 and "false" to 0. AFAIK, there aren't any compilers smart enough to implicitly pack a struct of bools into a bitfield, and then make member access operations implicitly mask the bitfield. Here's a quick example, using the latest GCC: https://godbolt.org/z/c7T54jzGT


I wrote a little solver for some programming puzzle. Thought I was being clever by using bitfields for an array of booleans to reduce memory usage and bandwidth, as they seemed a natural fit for the problem I was solving.

Turned out that it was actually significantly faster to use one byte per boolean and forgo the masking operations. I assume the processor was just good enough at keeping its cache filled in that particular workload, so the additional masking operations just slowed things down. So I understand why you might not want a compiler to automatically do this.


Not sure if they edited the comment after you commented, but that is a bitfield now, at least


You're right, I didn't see the :1's. Maybe it was edited, maybe I was just blind?


what does

  struct X {
    _BitInt(1) a;
    _BitInt(1) b;
  };
do now?


The article didn't mention fixing C's Biggest Mistake https://www.digitalmars.com/articles/C-biggest-mistake.html which has a simple and backwards compatible fix for C buffer overflows, probably the single biggest cause of memory safety bugs.


Hi! Article author here. I have long, long since waxed poetic about how many bugs and problems this can solve (even just a poor man's library version):

https://twitter.com/__phantomderp/status/1381314735174524928

And, very recently, I have begun to scheme and agitate for a feature similar to what your article proposes:

https://twitter.com/__phantomderp/status/1424466518797135876

I am not sure people will go for `..`, and I would also like to find a way to enable composition and multi-dimensionality in an easier fashion (nesting arrays, for example, does not require that all of the memory is laid out perfectly flat. This is the case in both C and C++, and is taken advantage of by e.g. ASan and other memory-shadowing runtimes that add shadowing cushion around each array member).

As a Standards Committee person, I can't really just demand "and this will go into the standard"; our charter requires 2 existing C compilers/stdlibs to implement the thing (unless the thing is so dead-simple we can just shake hands and implement it because it's not contentious) and then, after that, requires consensus (which is its own roadblock to getting things done when someone doesn't like the syntax/way-you-are-doing-it/gets-in-the-way-of-their-implementation).

So, for example, if the C parser in D were to support this syntax, and someone else were to support some kind of syntax for this, and they all got together and wrote a paper for this idea, that would count as two implementations..........

Hint hint. Wink wink. Nudge nudge? 0:D


It is indeed dead simple :-)

The spec, the implementation, and using it. It's by far the simplest scheme I've ever seen proposed for C. You can see this from the article.

I suppose I could add it to D's ImportC:

https://dlang.org/spec/importc.html

even though ImportC's charter is to implement C11 as it stands and not fix anything.

But would the committee accept DasBetterC as an implementation?

https://dlang.org/spec/betterc.html

DasBetterC does implement the slices.


We tend to accept anything that parses the C Standard - even only a part of it - as a viable implementation! For example, static analyzers are viable implementations of the standard, even if they don't produce code.

I think getting this feature into one other implementation and then writing a paper on it would help. Unfortunately, the cutoff for "new papers that must be submitted to be considered for C23" is October, so there's not a LOT of time, so we'd need to find a 2nd implementation quick-fast, and then write the proposal quick-fast too!


Does Apple have representation on the committee? Have they briefed anyone on this?

https://support.apple.com/guide/security/memory-safe-iboot-i...


Apple made a big-huge list of their extensions and things they've done some years back, which included their Blocks extensions: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1370.pdf

Apple (and others) have also very strongly informed us of the zeroing-things-out work they've done and how it saves them quite a bit of problems. There are folks on the Committee who value their already-existing users and implementations more, where indeterminate initialization provides them the performance and control they like. They also don't want to make those people have to change their code to init things. It's not a direction I agree with, but you have to remember that people like me have a much larger burden of proof. Indeterminate initialization is the status quo, and changing the status quo requires a paper, attending meetings, convincing others, and passing a vote. It's very much an uphill battle, and you have to bring a LOT of evidence to the table to fix it, and even after you bring that evidence you're required to prove it deserves to be in the standard. A lot of work!

For initialization, I am hoping to standardize "= {}" as a way to guarantee a proper static initialization. (Paper here, but needs some more work: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Consist...) Committee has received it favorably, and it should make C23.

For other things, you might need to lean on C23's attribute feature (e.g. "[[clang::must_initialize]]" and stuff) to provide some of that functionality. Not quite ideal, but attributes have standard placement and are infinitely extensible by implementations, so it gives vendors room to provide users what they need while the Standards Committee chews through proposal after proposal to get things done.

Hopefully this is a little helpful to you about the process!


What's gross to me is that compilers often know a function has been passed a pointer to a fixed-size object, and they have hacky ways to determine that at run time. Not to mention that all these compilers do know what a fat pointer is.

Says to me the problem is the people on the standards committee.


I can understand C's reluctance to add features and complexity, anything that would make it not C. But the memory safety issue is such an enormous problem, and the fix is so simple, and the fix has been proven in 20 years of constant use in D, that I just can't see not incorporating it.


How about adding a compiler switch like -std=c23-heretical or -c23-make-baby-ritchie-cry?

And then let programmers decide if they want to live in the dark ages.


A switch isn't needed for my proposal. You can mix & match it with legacy code.


Apostrophes for integer-literal separators?!

C'mon, EVERY other language I've ever worked with that had them uses underscores.

Why did you do this =(

Grateful there is now any symbol for this, but this is an incredibly unintuitive one.

Programming languages use underscores, (most) countries use commas (in varying decimal group positions).

What a confusing choice.


C++ uses apostrophes. If you want to use numbers in header files that are common to both C and C++ code, you want the syntax to be the same.


Welp, yeah I'd say that's a pretty good reason.

I then shift blame to whoever decided that was a good idea for C++, tossing it on the towering pile of similar questions directed towards C++'s architecture.


it would conflict with user defined literals which were there before:

  auto operator""_01(unsigned long long l) {
    return 0;
  }

  int x = 100_01; // x == 0
so, as always, backward compat :D

(grepping in my ~ I see at least one occurrence of `operator "" _0b` and if you allow `_0b` it's going to be hard to justify not allowing `_0123456`).


A leading underscore just makes it an identifier. After all, ' has a similar ambiguity - is '0' an integer or a character? We use embedded _ in D with no problems.


Nice. I welcome the inclusion of UTF16 and UTF32 types being standardized, and `stdckdint.h` looks nice.

That said, I left C for D years ago. I get easy, almost transparent use of C libraries, ability to run with or without GC (`-betterC`), metaprogramming, digit group separators (one of the changes slated for C23), UTF16/32 types, and an amazing standard library. (Full disclosure, much of the std library Phobos is GC dependent but this is being worked on)


Do you use "-BetterC" (WorseD) flag though? Am curious. Also yes, D is love.


LOL. In truth, I use so much `std` (which is an amazing piece of work) that I don't use `-betterC`. I do sometimes do my own memory allocs using `std.allocator` though.


My favorite promises of c2x:

  Closures: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2737.pdf
  Type inference for variable definitions and function returns: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2735.pdf
  Type generic programming: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2734.pdf
  Defer mechanism: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2589.pdf
What I miss the most: the preprocessor still has no macro that modifies or creates a new macro.


At what point does this just become C++ with less library support?


When templates are included, which I assume will never happen.


I feel quite the opposite - this would be a mistake like VLA * 100000.


What I keep hoping for is the adoption of C++ const-correctness rules for pointers-to-pointers (and further levels). They really are safe, and allow for more safely handling multidimensional arrays in a correct and idiomatic C way.


I am learning C now for the first time and trying to use as many of the new quality of life features as possible, at least for my personal projects.

It’s good to see that the standard committee is adding these new things that make C easier to use.

It’s been a bit of a struggle to stick just with C, because a lot of the people I see teaching/writing modern C just write C in a .cpp file and cherry-pick the C++ features they want.

I wonder how many standards we would have to go through before the people who are writing C+ (C with some C++ but no classes, RAII, etc.) are converted back to plain old C.


>this also means that you can use it to print the “low” bits of an int when trying to just read the low-end bits as well, such as by using printf("%w8d", 0xFEE);

Note that this happens to work when the argument's type is int (or anything smaller) only because of default argument promotions. For larger types it will cause undefined behaviour. So, for example, printf("%w8d", 0x1LL) is not legal.


Insane that it’s this easy to cause undefined behavior. Forgot what size a “long long” is on your arch when writing a print statement? Undefined behavior.


Multiplying unsigned shorts can famously (?) be undefined behavior on most platforms.

They get promoted to int, and for large enough operands the result overflows.

https://stackoverflow.com/questions/33732489/


This seems like one of the more innocuous ones that’s automatically caught by compiler warnings. Admittedly projects have to turn that on and fix warnings (or build with warnings as errors) and not all do. The standard should really mandate that certain kinds of warnings today should just be unconditional compiler errors.


AIUI the reason for the lack of checked division is that integer division always produces an equal or smaller number, and therefore can't overflow.


Integer division can overflow: INT_MIN / -1.


Oooh, you're right; then like OP I'm really at a loss.


I missed what the free_sized() is good for. From a naive look it seems redundant.

Still no computed goto?

Some progress on standardizing inline assembler syntax would be nice too.

Or maybe it won't matter anymore because everyone will be using either gcc or clang.


Modern allocators divide allocations into "size classes" based on the size of the allocation (normally rounded up to some power of 2 or whatever.) When you call something like free(m), the allocator will often have to figure out what size class 'm' was put into before it can proceed. For example, once you free some memory you might not return it to the OS, but keep it around in a cache structure so that it can be re-used. You can only put 'm' into the right cache if you compute the size class and use that. If the metadata for the object is "out of band" (i.e. not next to it directly), figuring out the size class of 'm' might be a little costly.

If your allocator can't use the sizing information when free'ing, it can just be a no-op. If it can, it can result in some performance gains.

You also get the bonus that the allocator can run stricter consistency checks on the object itself, i.e. free_sized(m, size) can ensure that 'm' actually is of the given 'size', or abort the program. This can help find and catch latent bugs.


sized free is good for safety if your malloc cross-checks the size argument with its metadata, and for performance if your malloc assumes the size argument is correct without having to hit malloc's internal metadata.

There's also a middle ground where you get more ILP by assuming the size argument is correct (for performance), and overlap the work of `free`ing memory with a confirmation that the size argument matches malloc's internal metadata.


I'm still crossing my fingers for a defer mechanism.


I'm a member of the C standard group and I strongly opposed defer. It creates an invisible jump at the point of execution. It's essentially a "comefrom" statement, a construct that was created as a joke in an attempt to find something worse than goto. I'd much rather have people use goto. At least you can see where the goto statement is and find the label it jumps to.


er... okay... can't this be said for... literally any abstraction in C? a function which does not return a value which has no return statement has a ... invisible jump. A for loop, while loop, if statement... all have "invisible" jumps... Would you rather C had a conditional statement which could only take a single instruction or a goto? Why not just have an assembly language? Having a defer statement which queues up blocks of code for execution at the end of a scope seems far far cleaner than the extremely error-prone goto cleanup style (or any non-goto alternative for it). Yes the jumps are implicit but it's not exactly that difficult to figure out what gets jumped to and when... at least unless you have really excessively long functions in which case everything becomes hard to track (including those "invisible jumps" in for loops, while loops and if statements).


That sounds like a serious misunderstanding of the common use case to me. Have people on the C committee never understood the value of RAII in C++? It’s such a huge safety improvement for ensuring that, for example, a mutex is unlocked — or a file closed — on early return.


The problem with RAII is that, unlike defer, it requires you to have an exception mechanism. RAII and exceptions are completely interdependent and you can't use one without the other (you're basically guaranteed leaks if you use exceptions without RAII, and to implement RAII you need exceptions).

It's valuable but let's not confuse it with defer which allows a great deal of safety without requiring exceptions.


What is the struct layout for _BitInt(N)? Can we please please please have a "super packed" to deprecate bitfields?


Still implementation-defined, like any other field member! If you want to rely on bit-blasting structs, you need to shake hands with your implementer / ask for attributes. Not the perfect situation, but there are way too many architectures for us to go yelling at people demanding they lay things out this way or that way.



