When I dealt with this, there were a couple major gotchas:
* Compilers seem to reliably detect byteswap, but are(were) very hit-or-miss with the shift patterns for reading/writing to memory directly, so you still need(ed) an ifdef. I know compilers have improved but there are so many patterns that I'm still paranoid.
* There are a lot of "optimized" headers that actually pessimize by inserting inline assembly that the compiler can't optimize through (in particular, the compiler can't inline constants and can't choose `movbe` instead of `bswap`), so do not trust any standard API; write your own with memcpy + ifdef'd C-only swapping (a sketch follows after this list).
* For speaking wire protocols, generating (struct-based?) code is far better than writing code that mentions offsets directly, which in turn is far better than the `mempcpy`-like code which the link suggests.
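As a rough illustration of the "memcpy + ifdef'd C-only swapping" approach, here's a minimal sketch (the read32be name is mine, and the __BYTE_ORDER__ macros are the ones GCC and Clang predefine, so adjust the ifdef for other compilers):

#include <stdint.h>
#include <string.h>

static inline uint32_t read32be(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);            /* alignment- and aliasing-safe load */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = ((v & 0x000000FFu) << 24) |     /* plain C swap; compilers usually   */
        ((v & 0x0000FF00u) <<  8) |     /* recognize it as bswap/movbe       */
        ((v & 0x00FF0000u) >>  8) |
        ((v & 0xFF000000u) >> 24);
#endif
    return v;
}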
C programmers say things like "C is close to the machine"(+) and then have to write a huge elaborate macro in order to fool the compiler into generating a single machine instruction which is the one they actually want.
One of these days I'll write my "actually typesafe portable macro assembler" project. It will be... opinionated.
People want to have their cake (not having to think about instruction names, code generation for arithmetic expressions, or register allocation, because those differ per platform) and eat it (generate reasonably predictable machine code for multiple different platforms).
Things like LLVM IR are pretty close, but understandably nobody wants to write IR by hand.
I can't remember the source, but AFAIK packed structs aren't mandated in the specification. It's terrible because a programmer will OFTEN see that result, on any architecture they're familiar with; however it isn't guaranteed to be portable.
Wow, that SO post is full of very bad answers. Regardless ...
Even ignoring compiler extensions, you don't have to memcpy the struct to the network directly (though this necessarily must be doable safely due to the fact that <net/> headers use it). Instead, define an API that takes loose structs, and have generated code do all the swapping and packing from the struct form to the bytes form.
If you're careful, you might make it such that on common platforms this optimizes into just a memcpy. But if the struct can get inlined this doesn't actually matter.
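A minimal sketch of that loose-struct-plus-generated-packer shape (the struct and field names here are invented for illustration; real generated code would cover every field of the real message):

#include <stdint.h>

struct msg_hdr {                 /* "loose" host-order struct, natural alignment */
    uint16_t type;
    uint32_t length;
};

/* what the generated packer might look like: all swapping and packing in one
   place, written byte-by-byte so it is endian- and alignment-agnostic */
static void msg_hdr_pack(const struct msg_hdr *h, unsigned char out[6]) {
    out[0] = (unsigned char)(h->type >> 8);
    out[1] = (unsigned char)(h->type);
    out[2] = (unsigned char)(h->length >> 24);
    out[3] = (unsigned char)(h->length >> 16);
    out[4] = (unsigned char)(h->length >> 8);
    out[5] = (unsigned char)(h->length);
}

On a big-endian target a decent compiler can often collapse this to a couple of stores; on little-endian it becomes a swap plus stores, which is the point of letting the generated code own that detail.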
> Repeat it like a mantra. You mask first, to define away potential concerns about signedness. Then you cast if needed. And finally, you can shift. Now we can create the fool-proof version for machines with at least 32-bit ints:
I don't think this works just because she's masking it. I'm pretty sure it's working because she cast (255 & p[0]) to uint32_t, and so all the other operands get promoted to uint32_t as well. I have this working with just casting to unsigned char first:
Edit: actually, it works for me even without casting to uint32_t, without UBSan causing a runtime error, like in the article, so I don't know what's going on.
> I don't think this works just because she's masking it. I'm pretty sure it's working because she cast (255 & p[0]) to uint32_t
You're right, masking does nothing to solve the problem, what does it is the cast to uint32_t. Masking only helps if you're working with (signed) char, which is a bit silly.
Your example works without the cast because the example in the article doesn't show the problem. That is, the following code doesn't have undefined behavior:
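Something along these lines, presumably (my reconstruction, since the snippet didn't survive; the important part is that the top byte is below 0x80, so even the plain int shift by 24 is representable and defined):

#include <stdio.h>
#include <stdint.h>
char b[4] = {0x01,0x02,0x03,0x04};   /* b[0] < 0x80: no overflow in int << 24 */
#define UC(x) ((unsigned char) (x))
#define READ32BE(p) (uint32_t)UC(p[3]) | UC(p[2]) << 8 | UC(p[1]) << 16 | UC(p[0]) << 24
int main(void) {
    printf("%08x\n", READ32BE(b));   /* prints 01020304, no sanitizer error */
}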
Thanks for the explanation! That makes sense. I knew from reading K&R that, in converting a char to an int without sign extension, casting to unsigned char first is just as good as masking by 0xFF after, and I guess I was just not sufficiently familiar with the type promotion rules of C.
This now causes a runtime error:
#include <stdio.h>
#include <stdint.h>
char b[4] = {0x82,0x03,0x04,0x80};
#define UC(x) ((unsigned char) (x))
#define READ32BE(p) (uint32_t)UC(p[3]) | UC(p[2]) << 8 | UC(p[1]) << 16 | UC(p[0]) << 24
int main(void) {
    printf("%08x\n", READ32BE(b));
}
----
endian.c:8:22: runtime error: left shift of 130 by 24 places cannot be represented in type 'int'
82030480
> Clang and GCC are reaching for any optimization they can get. Undefined Behavior may be hostile and dangerous, but it's legal. So don't let your code become a casualty.
Perhaps we need a -Wundefined-behaviour so that compilers print out messages when they use those types of 'tricks'. If you see them you can then choose to adjust your code so that it follows defined path(s) of the standard(s) in question.
I always thought the problem with this was that the compilers do loads of these optimisations in very mundane ways. Eg if I have a
#define FOO 17
void bar(int x, int y) {
    if (x + y >= FOO) {
        //do stuff
    }
}
void baz(int x) {
    bar(x, FOO);
}
the compiler can inline the call to bar in baz, and then optimise the condition to (x>=0)… because signed integer overflow is undefined, so can’t happen, so the two conditions are equivalent.
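Concretely, after inlining the compiler effectively treats baz as something like this (a sketch of the reasoning, not actual compiler output):

void baz(int x) {
    /* inlined: x + 17 >= 17; signed overflow is undefined, so the
       compiler may assume x + 17 doesn't wrap, which simplifies to: */
    if (x >= 0) {
        //do stuff
    }
}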
The countless messages about optimisations like that would swamp ones about real dangerous optimisations.
I have to admit I don't really understand the logic behind how compiler writers got to the point where they were able to interpret "undefined behavior" as "will not happen".
My understanding is that undefined behavior is in the spec because the various physical machines that C could compile to would each do something different. So to avoid stepping on toes the spec basically punted and said we are not going to define what happens. Originally this was fine: the C compiler would dutifully generate the machine instructions and the machine would execute them, returning the result. The wrench was thrown into the works when we started getting optimizing compilers. I would argue the correct interpretation of "undefined behavior" in the face of an optimizing compiler is "unknown", but then it would sort of behave like the SQL NULL (each unknown is different from every other unknown) and act as an optimization fence. When you have to ask the machine what the result of this operation is, it is hard to optimize around it ahead of time.
So my best guess as to why they decided to read "undefined" as "can not happen" is that this is the interpretation they can best optimize for. And nobody really pushed back because what the hell does "undefined" mean anyway? My read is that if the spec wanted to say "can not happen" the spec would have said "can not happen"
As I understand it, the C spec defines what are valid programs, and for valid programs, either specifies what their observable side effects must be, or leaves them either unspecified or implementation defined. Importantly, programs with undefined behaviour are excluded from the class of valid programs; thus the spec imposes no requirement on the resulting behaviour. To quote,
> Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to […]
And I think this is where “can’t happen” comes in: in the case of undefined behaviour, the compiler is free to emit whatever it pleases, including pretending it cannot happen!
"Can't happen" comes from the fact that the compiler optimizes for defined behaviour, not undefined behaviour. Making defined behaviour fast at the expense of undefined behaviour looks the same as assuming undefined behaviour can't happen in a number of cases. Eg. if you dereference a pointer then check it for null, naturally the compiler will remove the later check. We can describe that as the compiler making the defined case fast, or we could describe it as the compiler assuming the pointer "can't" be null. The latter description is easy to think about, and generalizes broadly enough, to be a common way of describing it.
Yes: that loop is miserable and we should upgrade it to use size_t (and preferably ++i), at which point the undefined behavior required to make that abomination fast wouldn't matter anymore.
I mean, why would you ever prefer to post-increment an integer whose prior value you didn't need? Not enough time studying iterators to make that distinction obvious, perhaps? Regardless, the context here is that we should stop relying so much on the compiler to paper over differences in code that don't buy the user anything except a form of quaint familiarity with old code: the i++ relies on the optimizer knowing it doesn't have to keep the prior value, while ++i doesn't, and yet somehow I guess you are enamored with i++ despite it being exactly the same characters to type, merely in a different order. Stop using int, stop using post-increment, and sure: maybe even stop using < when you semantically want != (though I explicitly didn't go there because that isn't--as far as I know--going to affect the performance with a dumber compiler).
Pattern matching is a big thing in programming. Not in C++, as ++i executes arbitrary code found an unbounded distance from the programmer (so ++i vs i++ might be a very important distinction) but in other languages it helps a lot.
for (int i=0, j=0; i < N && j < M; i+=2, j++)
i++ looks like i+=1 so fits better in loops iterating over multiple things.
Also that int says "these values don't overflow" very concisely and the more modern cleverer C++ using iterator facades probably fails to communicate that idea to anyone.
edit: amused to note on checking for typos that I wrote that with < instead of != which violates your other recommendation, which is one I'm in favour of in theory but apparently don't write out by default.
Neither of those things are undefined behavior, but gcc with -Wextra already warns about both:
warn.c:6:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
6 | for (int i = 0; i < n; i++)
| ^
warn.c:10:36: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
10 | for (size_t i = sizeof(var)-1; i >= 0; --i)
| ^~
(even for the relatively high-profile and easy-sounding "detect if the compiler has deleted a line of code in a way which alters observable behavior of the program"!)
`-fsanitize=undefined` enables some runtime warnings/checks, `-Wundefined-behaviour` would presumably enable some kind of compile time warnings/checks.
You actually don't. Plenty of languages and compilers manage to be less programmer-hostile without a written standard. Unfortunately GCC/Clang have prioritised getting higher numbers on benchmarks.
> Unfortunately GCC/Clang have prioritised getting higher numbers on benchmarks.
Because that's what people actually want. There's no untapped market for a C compiler that doesn't aggressively optimize. The closest you'll get is projects selectively disabling certain optimizations, like how the Linux kernel builds with `-fno-strict-aliasing`.
> There's no untapped market for a C compiler that doesn't aggressively optimize.
There probably isn't now, because people who want sanity have moved on from C. I do think that years ago a different path was possible.
> The closest you'll get is projects selectively disabling certain optimizations, like how the Linux kernel builds with `-fno-strict-aliasing`.
Linux also does `-fno-delete-null-pointer-checks` and probably others. I don't think they have a principled way of deciding, rather whenever an aggressive optimization causes a terrible bug they disable that particular one and leave the rest on.
No, only a small number of people care about compiler benchmarks... which are not representative of general-purpose code anyway, since they seem more interested in testing vectorisation and other dubious features that are only important for a tiny subset of everything a compiler gets used for.
That said, ICC and MSVC can achieve similar or better results without going insane with UB, so it's clearly not as strictly necessary as the lame pedants want you to believe.
These days I think the sane option is to just add a static assert that the machine is little endian and move on with your life. Unless you're writing glibc or something do you really care about supporting ancient IBM mainframes?
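One way to write that assert, assuming the __BYTE_ORDER__ macros that GCC and Clang predefine (other compilers need a different spelling):

#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
_Static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__,
               "this code assumes a little-endian target");
#else
#error "unknown byte order; this code assumes little-endian"
#endif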
Plenty of binary formats out there containing big endian integers. An example is the format I'm dealing with right now: ELF files. Their endianness matches that of the ELF's target architecture.
Apparently they designed it that way in order to ensure all values in the ELF are naturally encoded in memory when it is processed on the architecture it is intended to run on. So if you're writing a program loader or something you can just directly read the integers out of the ELF's data structures and call it a day.
Processing arbitrary ELF inputs requires adapting to their endianness though.
The fun really starts if you have a CPU using big endian and a bus using little endian..
Back in the late 1990s, we moved from Motorola 68k / sbus to Power/PCI. To make the transition easy, we kept using big endian for the CPU. However, all available networking chips only supported PCI / little endian at this point. For DMA descriptor addresses and chip registers, one had to remember to use little endian.
I feel like we're at a point where you should assume little endian serialization and treat anything big endian as a slow path you don't care about. There's no real reason for any blob, stream, or socket to use big endian for anything afaict.
If some legacy system still serializes big endian data then call bswap and call it a day.
The internet is big-endian, and generally data sent over the wire is converted to/from BE. For example the numbers in IP or TCP headers are big-endian, and any RFC that defines a protocol including binary data will generally go with big-endian numbers.
I believe this dates from Bolt Beranek and Newman basing the IMP on a BE architecture. Similarly, computers tend to be LE these days because that's what the "winning" PC architecture (x86) uses.
The low-level parts of the network are big-endian because they date from a time when a lot of networking was done on big-endian machines. Most modern protocols and data encodings above UDP/TCP are explicitly little-endian because x86 and most modern ARM are little-endian. I can't remember the last time I had to write a protocol codec that was big-endian; that was common in the 1990s, but that was a long time ago. Even for protocols that explicitly support both big- and little-endian encodings, I never see an actual big-endian encoding in the wild and some implementations don't bother to support them even though they are part of the standard, with seemingly little consequence.
There are vestiges of big-endian in the lower layers of the network but that is a historical artifact from when many UNIX servers were big-endian. It makes no sense to do new development with big-endian formats, and in practice it has become quite rare as one would reasonably expect.
Is it though? Because my experience is very different than GP’s: git uses network byte order for its binary files, msgpack and cbor use network byte order, websocket uses network byte order, …
> any RFC that defines a protocol including binary data will generally go with big-endian numbers
I'm not sure this is true. And if it is true it really shouldn't be. There are effectively no modern big endian CPUs. If designing a new protocol there is, afaict, zero benefit to serializing anything as big endian.
It's unfortunate that TCP headers and networking are big endian. It's a historical artifact.
Converting data to/from BE is a waste. I've designed and implemented a variety of simple communication protocols. They all define the wire format to be LE. Works great, zero issues, zero regrets.
> There are effectively no modern big endian CPUs.
POWER9, Power10 and s390x/Telum/etc. all say hi. The first two in particular have a little endian mode and most Linuces run them little, but they all can run big, and on z/OS, AIX and IBM i, must do so.
I imagine you'll say effectively no one cares about them, but they do exist, are used in shipping systems you can buy today, and are fully supported.
Yeah those are a teeny tiny fraction of CPUs on the market. Little Endian should be the default and the rare big endian CPU gets to run the slow path.
Almost no code anyone here will write will run on those chips. It’s not something almost any programmer needs to worry about. And those that do can easily add support where it’s necessary.
The point is that big endian is an extreme outlier.
If you're writing an implementation of one of those "early protocols", sure. If not, call a well-known library, let it do whatever bit twiddling it needs to, and get on with what you were actually doing.
All that nonsense about masking with an int must stop. You want a result in uint32_t? Convert to that type before shifting. Done.
Now, the C standard could still get in the way and assign a lower conversion rank to uint32_t than to int, in which case uint32_t operands would get promoted to int (or unsigned int) before shifting. But those left shifts would still be defined, because the result would be representable in the promoted type (promotion only makes sense if it preserves values, which implies that all values of the type before promotion are representable in the type after promotion).
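Spelled out, that looks something like this (a sketch; taking unsigned char * so the masking question doesn't arise at all):

#include <stdint.h>

static inline uint32_t get_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

Every operand is converted to uint32_t before the shift, so there is no signed overflow even when p[0] is 0x80 or above.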
They are quite similar if not equivalent to the code presented in this article. I assume people far smarter than me have run these things under sanitizers and found no issues. I mean, it's Linux.
I always just write the code with shifts in any program that cares about endianness (e.g. when reading/writing files), except for dealing with internet (which is big-endian) (in which case the functions specifically for dealing with endianness with internet, will be used).
However, in addition to big-endian and small-endian, sometimes PDP-endian is used, such as Hamster archive file, which some of my programs use. (The same way of using shifts can be used, like above.)
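For reference, a sketch of the shift version for the PDP / middle-endian layout, assuming the classic PDP-11 word order where 0x0A0B0C0D is stored as the bytes 0B 0A 0D 0C:

#include <stdint.h>

static inline uint32_t read32pdp(const unsigned char *p) {
    /* two 16-bit words, high word first, each word little-endian */
    return ((uint32_t)p[1] << 24) | ((uint32_t)p[0] << 16)
         | ((uint32_t)p[3] <<  8) |  (uint32_t)p[2];
}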
MMIX is big-endian, and has MOR and MXOR instructions, either of which can be used to deal with endianness (including PDP-endian; these instructions have other uses too). (However, my opinion is that small-endian is better than big-endian, but any one will work.)
Can't we just declare __alignof__(uint32_t) on the 'char b[4]' and then freely use bswap without all this madness? Or memcpy the chars into a uint32 array?
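The memcpy variant is roughly this (a sketch; __builtin_bswap32 is a GCC/Clang builtin, and the __BYTE_ORDER__ macros are their predefined ones):

#include <stdint.h>
#include <string.h>

uint32_t load_be32(const char b[4]) {
    uint32_t v;
    memcpy(&v, b, 4);                 /* sidesteps alignment and aliasing */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    return v;
}

Aligning the buffer only addresses the alignment half of the question; memcpy sidesteps the aliasing half as well.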
> Modern compiler policies don't even permit code that looks like that anymore. Your compiler might see that and emit assembly that formats your hard drive with btrfs.
This is total FUD. Some sanitizers might be unhappy with that code, but that's just sanitizers creating problems where there need not be any.
The llvm policy here is that must alias trumps TBAA, so clang will reliably compile the cast of char* to uint32_t* and do what systems programmers expect. If it didn't then too much stuff would break.
It doesn't help anyone that the multiples of 8 look nice in octal, because these decimals are burned into hackers' brains. And are you gonna use base 13 for when multiples of 13 look good?
Save the octal for chmod 0777 public-dir.
Anyone who doesn't know this leading zero stupidity perpetrated by C reads 070 as "seventy".
Even 7*8, 6*8, 5*8, 4*8, ... would be better, and also take 3 characters:
though we are relying on knowing that * has higher precedence than >>.
Supplementary note: when we have the 0xFF in this position, and P has type "unsigned char", we can remove it. If P is unsigned char, the & 0xFF does nothing, except on machines where chars are wider than 8 bits. These fall into two categories: historic museum hardware your code will never run on, and DSP chips. On DSP chips, a byte wider than 8 bits doesn't mean 9 or 10 bits, but 16 or more. Simply doing & 0xFF may not make the code work. You may have to pack the data into 16-bit bytes in the right byte order. (Been there and done that. E.g. I worked on an ARM platform that had a TeakLite III DSP, where that sort of thing was done in communicating data between the host and the DSP.)