It's 32 bits only if you ignore microcontrollers and DSPs. Also, the size of 'lo...

WalterBright · on May 22, 2022

> It's 32 bits only if you ignore microcontrollers and DSPs.

I know. But you're not likely to port any apps between microcontrollers, DSPs, and general purpose machines.

> Also, the size of 'long int' is definitely not constant if your code needs to be portable between Windows and something else.

It's also not portable between Linux and Mac OSX. This makes `long int` a completely useless type. Use `int` and `long long`.

> that did not have same sized _char_

I remember one person who responded to me about a DSP that had sizeof(char)==32, and how it was wonderful that the C Standard accommodated that so code could be portable.

I challenged him to port the C implementation of `diff` to it and see how far he gets with it. I predicted total failure :-)

The C and C++ standards would do the programming world a favor by standardizing on:

1. 2's complement

2. fixed sizes for char, short, int and long

3. dumping any character sets other than Unicode

Machines that don't support that will require a customized compiler anyway, and it's very unlikely any code will be portable to it without significant rewrite.

oriolid · on May 22, 2022

> I remember one person who responded to me about a DSP that had sizeof(char)==32, and how it was wonderful that the C Standard accommodated that so code could be portable.

The support for 32-bit char doesn't automatically make code portable between architectures with different char sizes. But it's what makes it possible to write a C compiler at all for those weird architectures. It certainly doesn't make code that assumes things portable to architectures where those assumptions do not hold.

> The C and C++ standards would do the programming world a favor by standardizing on: 1. 2's complement 2. fixed sizes for char, short, int and long 3. dumping any character sets other than Unicode

This would be a great favour to embedded developers everywhere, because it would finally free us from the C legacy. Rust and Zig look promising, and at some point I even thought D might work but now I understand why it wouldn't. I wonder if you're aware of the standard sized type aliases int8_t, int16_t etc and what's your opinion of them.

WalterBright · on May 23, 2022

> it's what makes it possible to write a C compiler at all for those weird architectures.

Only if one is pedantic. There's no practical problem at all customizing C compiler semantics for weird architectures. After all, everybody did it for DOS C compilers.

> I even thought D might work but now I understand why it wouldn't

People do use it for embedded work. I don't know why you wouldn't think it would work.

> I wonder if you're aware of the standard sized type aliases int8_t, int16_t etc

I am. After all, I wrote stdint.h for Digital Mars C.

> and what's your opinion of them.

There are three of each:

    typedef long int32_t;
    typedef long int_least32_t;
    typedef long int_fast32_t;

1. Too many choices. I have experience with what happens when programmers have too many choices, with the differences between them being slight, esoteric and likely not substantive. They blindly pick one.

2. People have endless trouble with C's implicit integral conversions. This makes it combinatorically much worse.

3. int32_t makes sense for the first hour. Then it become annoying, and looks ugly. `int` is much beter.

4. `int` is 32 bits anyway. No point in bothering with stdint.h.

oriolid · on May 23, 2022

> Only if one is pedantic. There's no practical problem at all customizing C compiler semantics for weird architectures. After all, everybody did it for DOS C compilers.

Resorting to non-standard extensions locks you in with that specific compiler. This is the exact reason why standards exist in the first place.

> People do use it for embedded work. I don't know why you wouldn't think it would work.

The lead developer seems to have strong knee-jerk reactions to things that he does not understand and limited understanding of what microcontrollers are or what embedded software does. Who really cares about porting diff to a system that doesn't have filesystem or text console?

I agree that integral type promotions are a great way to shoot oneself in the foot, but the other explanations are not really convincing. If you did read the comments you are responding to, you should already know that int is not always 32 bits.

WalterBright · on May 23, 2022

You're already locked in with a specialized compiler for those unusual architectures, and despite being Standard conforming, it still isn't remotely portable.

> you should already know that int is not always 32 bits.

I also wrote a 16 bit compiler for DOS, with 16 bit ints. I know all about it :-) I've also developed 8 bit software for embedded systems I designed and built. I've written code for 10 bit systems, and 36 bit systems.

> Who really cares about porting diff to a system that doesn't have filesystem or text console?

I infer you agree the software is not portable, despite being standard conforming. As a practical matter, it simply doesn't matter if the compiler is standard conforming or not when dealing with unusual architectures. It doesn't make your porting problems go away at all.

I went through the Great Migration of moving 16 bit DOS code to 32 bits. Interestingly, everyone who thought they'd followed best portability practices in their 16 bit code found they had to do a lot of rewrites for 32 bit code. The people who were used to moving between the 16 and 32 bit worlds had little trouble.

C++ is theoretically portable to the 16 bit world, but in practice it doesn't work. Supporting exception handling and RTTI consumes all of the address space, leaving no space left for code. Even omitting EH and RTTI leaves one with a crippled compiler if extensions are not added to support the segmented memory model.

How do I know this? I wrote one. I lived it.

oriolid · on May 23, 2022

> I also wrote a 16 bit compiler for DOS, with 16 bit ints. I know all about it :-) I've also developed 8 bit software for embedded systems I designed and built. I've written code for 10 bit systems, and 36 bit systems.

I'm not sure how this means that int is 32-bit.

> I infer you agree the software is not portable, despite being standard conforming. As a practical matter, it simply doesn't matter if the compiler is standard conforming or not when dealing with unusual architectures. It doesn't make your porting problems go away at all.

I guess we could venture into arguing what "the software" and "portable" here means. What I mean that I was working on standard-conforming C codebase that worked correctly on both architectures I mentioned above. This is what I consider portable. Having standard-conforming compiler for both does not make the problems go away, but it makes things much easier than having two non-conforming almost-but-not-exactly-C compilers or totally proprietary languages.

I know that DOS was bad. It's been more than 20 years now. Let's get over with it.

andrewaylett · on May 23, 2022

https://www.analog.com/en/products/landing-pages/001/sharc-p... has a 32-bit word length. It's been a few years since I worked with them, but the compiler had a "native" and a "compatible" mode -- the first where `char` is 32 bits, the second where we multiply each pointer by four before exposing it to C so we can pretend that we're addressing individual 8 bit char elements within the 32 bit hardware word. Much less efficient, much easier to program for. Don't try running code in compatibility mode without enabling compiler optimisations.

astrange · on May 23, 2022

C does require 2’s complement now and removed trigraphs. I’m not sure they have much contact with Unicode since nobody uses the “wide char” stuff.

As for variable sized int types, it’s a nice idea but it’s really unlikely your C program is actually portable across them. You haven’t got good enough test coverage for one thing. If someone wants to do that in a new language users should be required to declare which sizes of char/long they actually expect to work.

WalterBright · on May 23, 2022

In D it is ridiculously simple:

    byte = 8 bits
    short = 16 bits
    int = 32 bits
    long = 64 bits

    add `u` prefix for unsigned version

22 years of experience shows this works very, very well, and is very portable.

The only variable size is size_t and ptrdiff_t, but we're kinda stuck there for obvious reasons.

There have been some issues with printf formats mismatching when moving D code between systems, but we solved that by adding a diagnostic to the compiler to scan the printf format against the argument list and tell you how to fix the format.

astrange · on May 23, 2022

I think that makes perfect sense, but for some reason people want their max file sizes and max array sizes to be different on different CPU architectures without testing it or anything.

oriolid · on May 26, 2022

It goes the other way too. One of my worst memories from Java was when OpenJDK authors decided that they want the maximum array size to be the same (signed int32 range) on different CPU architectures and heap sizes without testing it or anything. It turned out that with enough small objects, that counter overflows. Of course having 2^29 tiny objects around was huge overhead to begin with so I understand why it hadn't been tested and why it doesn't usually happen in real life, but it would have been pretty easy avoid with simple "if it's a size, use size_t" heuristic.

After you accept that size_t can be platform specific and there's nothing scary about it, the format string issue is solved with something like "%z".

carlssonscar · on May 23, 2022

I used to be a good citizen and used size_t for all my buffer sizes, data sizes, memory allocation sizes, etc but I have abandoned that completely.

If my program needs to support >4GB sizes I need 64-bit always, makes no sense to pretend "size_t data_size" could be anything else, and if it doesn't need to support that, using 8 byte size_t on 64-bit machines makes to sense either, just wasting memory/cache line space for no reason.

astrange · on May 23, 2022

They do have some use as a “currency type” - it’s fine if all your code picks i32, but then you might want to use a library that uses i64 and there could be bugs around the conversions.

And C also gives you the choice of unsigned or not. I prefer Google’s approach here (never use unsigned unless you want wrapping overflow) but unfortunately that’s definitely something everyone else disagrees on. And size_t itself is unsigned.