The Byte Order Fiasco (2021)

o11c · 2024-06-30T21:00:15 1719781215

When I dealt with this, there were a couple major gotchas:

* Compilers seem to reliably detect byteswap, but are(were) very hit-or-miss with the shift patterns for reading/writing to memory directly, so you still need(ed) an ifdef. I know compilers have improved but there are so many patterns that I'm still paranoid.

* There are a lot of "optimized" headers that actually pessimize by inserting inline assembly that the compiler can't optimize through (in particularly, the compiler can't inline constants and can't choose `movbe` instead of `bswap`), so do not trust any standard API; write your own with memcpy + ifdef'd C-only swapping.

* For speaking wire protocols, generating (struct-based?) code is far better than writing code that mentions offsets directly, which in turn is far better than the `mempcpy`-like code which the link suggests.

pjc50 · 2024-07-01T15:39:22 1719848362

C programmers say things like "C is close to the machine"(+) and then have to write a huge elaborate macro in order to fool the compiler into generating a single machine instruction which is the one they actually want.

One of these days I'll write my "actually typesafe portable macro assembler" project. It will be .. opinionated.

(+) which machine, though

Akronymus · 2024-07-01T17:47:41 1719856061

> (+) which machine, though

A PDP11. Tbf, for most lower level languages it's fairly close to a pdp 11.

CyberDildonics · 2024-07-02T20:22:32 1719951752

You should give an example of that here: godbolt.org

I have never had to do what you're talking about.

JackSlateur · 2024-07-02T12:13:27 1719922407

Cannot you just __asm__() things ?

pjc50 · 2024-07-02T13:49:58 1719928198

Not portably!

People want to have their cake (not having to think about instruction names, code generation for arithmetic expressions, or register allocation, because those differ per platform) and eat it (generate reasonably predictable machine code for multiple different platforms).

Things like LLVM IR are pretty close, but understandably nobody wants to write IR by hand.

o11c · 2024-07-02T14:00:23 1719928823

As I mentioned, asm is often a pessimization. GCC has a lot of useful builtins though.

mjevans · 2024-07-01T05:13:39 1719810819

I can't remember the source, but AFAIK packed structs aren't mandated in the specification. It's terrible because a programmer will OFTEN see that result, on any architecture they're familiar with; however it isn't guaranteed to be portable.

One of the top google hits when I searched for 'c packed structs' https://stackoverflow.com/questions/4306186/structure-paddin...

o11c · 2024-07-01T05:34:52 1719812092

Wow, that SO post is full of very bad answers. Regardless ...

Even ignoring compiler extensions, you don't have to memcpy the struct to the network directly (though this necessarily must be doable safely due to the fact that <net/> headers use it). Instead, define an API that takes loose structs, and have generated code do all the swapping and packing from the struct form to the bytes form.

If you're careful, you might make it such that on common platforms this optimizes into just a memcpy. But if the struct can get inlined this doesn't actually matter.

trealira · 2024-06-30T23:20:09 1719789609

> Mask, and then shift.

> Repeat it like a mantra. You mask first, to define away potential concerns about signedness. Then you cast if needed. And finally, you can shift. Now we can create the fool-proof version for machines with at least 32-bit ints:

  #define READ32BE(p) \
    (uint32_t)(255 & p[0]) << 24 | (255 & p[1]) << 16 | (255 & p[2]) << 8 | (255 & p[3])

I don't think this works just because she's masking it. I'm pretty sure it's working because she cast (255 & p[0]) to uint32_t, and so all the other operands get promoted to uint32_t as well. I have this working with just casting to unsigned char first:

  #include <stdio.h>
  #include <stdint.h>

  char b[4] = {0x02,0x03,0x04,0x80};
  #define UC(x) ((unsigned char) (x))
  #define READ32BE(p) (uint32_t)UC(p[3]) | UC(p[2]) << 8 | UC(p[1]) << 16 | UC(p[0]) << 24
  int main(void) {
      printf("%08x\n", READ32BE(b));
  }

  $ cc -fsanitize=undefined -g -Os -o /tmp/o endian.c && /tmp/o
  02030480

Edit: actually, it works for me even without casting to uint32_t, without UBSan causing a runtime error, like in the article, so I don't know what's going on.

moefh · 2024-07-01T00:08:51 1719792531

> I don't think this works just because she's masking it. I'm pretty sure it's working because she cast (255 & p[0]) to uint32_t

You're right, masking does nothing to solve the problem, what does it is the cast to uint32_t. Masking only helps if you're working with (signed) char, which is a bit silly.

Your example works without the cast because the example in the article doesn't show the problem. That is, the following code doesn't have undefined behavior:

    char b[4] = {0x02,0x03,0x04,0x80};
    #define READ32BE(p) p[3] | p[2]<<8 | p[1]<<16 | p[0]<<24
    int main(int argc, char *argv[]) { printf("%08x\n", READ32BE(b)); }

Note that b[0] (which is being shifted by 24) is 0x02, so everything works. To show the problem you'd need something like this (b[0] must be >= 128):

    char b[4] = {0x82,0x03,0x04,0x80};

trealira · 2024-07-01T20:47:35 1719866855

Thanks for the explanation! That makes sense. I knew from reading K&R that, in converting a char to an int without sign extension, casting to unsigned char first is just as good as masking by 0xFF after, and I guess I was just not sufficiently familiar with the type promotion rules of C.

This now causes a runtime error:

  #include <stdio.h>
  #include <stdint.h>

  char b[4] = {0x82,0x03,0x04,0x80};
  #define UC(x) ((unsigned char) (x))
  #define READ32BE(p) (uint32_t)UC(p[3]) | UC(p[2]) << 8 | UC(p[1]) << 16 | UC(p[0]) << 24
  int main(void) {
      printf("%08x\n", READ32BE(b));
  }
  ----
  endian.c:8:22: runtime error: left shift of 130 by 24 places cannot be represented in type 'int'
  82030480

But this works:

  #include <stdio.h>
  #include <stdint.h>

  char b[4] = {0x82,0x03,0x04,0x80};
  #define CH_to_U32(x) ((uint32_t) (unsigned char) (x))
  #define READ32BE(p) (CH_to_U32(p[3]) \
                       | CH_to_U32(p[2]) << 8 \
                       | CH_to_U32(p[1]) << 16 \
                       | CH_to_U32(p[0]) << 24)
  int main(void) {
      printf("%08x\n", READ32BE(b));
  }
  ----
  82030480

throw0101b · 2024-06-30T20:54:59 1719780899

> Clang and GCC are reaching for any optimization they can get. Undefined Behavior may be hostile and dangerous, but it's legal. So don't let your code become a casualty.

Perhaps we need a -Wundefined-behaviour so that compilers print out messages when they use those type of 'tricks'. If you see them you can then choose to adjust your code in a way that it follows defined path(s) of the standard(s) in question.

karatinversion · 2024-06-30T21:35:11 1719783311

I always thought the problem with this was that the compilers do loads of these optimisations in very mundane ways. Eg if I have a

  #define FOO 17
  void bar(int x, int y) {
    if (x + y >= FOO) {
      //do stuff
    }
  }
  void baz(int x) {
    bar(x, FOO);
  }

the compiler can inline the call to bar in baz, and then optimise the condition to (x>=0)… because signed integer overflow is undefined, so can’t happen, so the two conditions are equivalent.

The countless messages about optimisations like that would swamp ones about real dangerous optimisations.

somat · 2024-07-01T21:40:38 1719870038

I have to admit I don't really understand the logic behind how compiler writers got to the point where they were able to interpret "undefined behavior" as "will not happen"

My understanding is that undefined behavior is in the spec because the various physical machines that C could compile to would each do something different. So to avoid stepping on toes the spec basically punted and said we are not going to define what happens. Originally this was fine. the C compiler would dutifully generate the machine instructions and the machine would execute them returning the result. The wrench was thrown into the works when we started getting optimizing compilers. I would argue the correct interpretation of "undefined behavior" in the face of a optimizing compiler is "unknown" but then it would sort of behave like the sql NULL(each unknown is different from every other unknown) and act as an optimization fence. When you have to ask the machine what the result of this operation is, it is hard to optimize around it ahead of time.

So my best guess as to why they decided to read "undefined" as "can not happen" is that this is the interpretation they can best optimize for. And nobody really pushed back because what the hell does "undefined" mean anyway? My read is that if the spec wanted to say "can not happen" the spec would have said "can not happen"

karatinversion · 2024-07-06T22:56:57 1720306617

As I understand it, the C spec defines what are valid programs, and for valid programs, either specifies what their observable side effects must be, or leaves them either unspecified or implementation defined. Importantly, programs with undefined behaviour are excluded from the class of valid programs; thus the spec imposes no requirement on the resulting behaviour. To quote,

> Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to […]

And I think this is where “can’t happen” comes in: in the case of undefined behaviour, the compiler is free to emit whatever it pleases, including pretending it cannot happen!

lofenfew · 2024-07-06T23:07:50 1720307270

"Can't happen" comes from the fact that the compiler optimizes for defined behaviour, not undefined behaviour. Making defined behaviour fast at the expense of undefined behaviour looks the same as assuming undefined behaviour can't happen in a number of cases. Eg. if you dereference a pointer then check it for null, naturally the compiler will remove the later check. We can describe that as the compiler making the defined case fast, or we could describe it as the compiler assuming the pointer "can't" be null. The latter description is easy to think about, and generalizes broadly enough, to be a common way of describing it.

im3w1l · 2024-07-01T05:03:55 1719810235

How much is actually gained by removing checks like these, and is it worth the cost? Can we find a safer approach to making the code fast?

tedunangst · 2024-06-30T21:31:45 1719783105

Do you want a warning on every for (int i = 0; i < n; i++) loop?

saurik · 2024-07-01T00:17:48 1719793068

Yes: that loop is miserable and we should upgrade it to use size_t (and preferably ++i), at which point the undefined behavior required to make that abomination fast wouldn't matter anymore.

JonChesterfield · 2024-07-01T11:49:28 1719834568

Why would you prefer to preincrement an integer? Too much time around C++ iterators?

saurik · 2024-07-02T13:21:18 1719926478

I mean, why would you ever prefer to post-increment an integer whose prior value you didn't need? Not enough time studying iterators to make that distinction obvious, perhaps? Regardless, the context here is that we should stop relying so much on the compiler to paper over differences in code that don't by the user anything except a form of quaint familiarity with old code: the i++ relies on the optimizer knowing it doesn't have to keep the prior value, while ++i doesn't, and yet somehow I guess you are enamored with i++ despite it being exactly the same characters to type, merely in a different order. Stop using int, stop using post-increment, and sure: maybe even stop using < when you semantically want != (though I explicitly didn't go there because that isn't--as far as I know--going to affect the performance with a dumber compiler).

JonChesterfield · 2024-07-02T14:52:24 1719931944

Pattern matching is a big thing in programming. Not in C++, as ++i executes arbitrary code found an unbounded distance from the programmer (so ++i vs i++ might be a very important distinction) but in other languages it helps a lot.

  for (int i=0, j=0; i < N && j < M; i+=2, j++)

i++ looks like i+=1 so fits better in loops iterating over multiple things.

Also that int says "these values don't overflow" very concisely and the more modern cleverer C++ using iterator facades probably fails to communicate that idea to anyone.

edit: amused to note on checking for typos that I wrote that with < instead of != which violates your other recommendation, which is one I'm in favour of in theory but apparently don't write out by default.

rerdavies · 2024-07-01T20:17:04 1719865024

> Do you want a warning on every for (int i = 0; i < n; i++)

If n is of type size_t, yes please!

Also every for (size_t i = sizeof(var)-1; i >= 0; --i)

(I've been burned by that one more often than I want to publicly admit).

moefh · 2024-07-01T21:07:31 1719868051

Neither of those things are undefined behavior, but gcc with -Wextra already warns about both:

    warn.c:6:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
        6 |   for (int i = 0; i < n; i++)
          |                     ^

    warn.c:10:36: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
       10 |   for (size_t i = sizeof(var)-1; i >= 0; --i)
          |                                    ^~

a_t48 · 2024-07-01T18:11:34 1719857494

Certain compiler settings do produce a warning there, yeah. It’s annoying but an easy fix/habit to change.

pjc50 · 2024-07-01T15:36:25 1719848185

Last time I suggested this it was unpopular. https://news.ycombinator.com/item?id=40787276

(even for the relatively high-profile and easy-sounding "detect if the compiler has deleted a line of code in a way which alters observable behavior of the program"!)

circuit10 · 2024-07-01T00:04:45 1719792285

It’s not always possible to predict in advance when undefined behaviour can happen

saagarjha · 2024-06-30T21:21:18 1719782478

You’ll have far too many messages.

amelius · 2024-06-30T22:52:33 1719787953

Another approach would be to implement undefined behavior in a way that is random.

https://en.wikipedia.org/wiki/Chaos_engineering

rurban · 2024-07-01T08:42:12 1719823332

Exactly. Which should be the only way forward to fix the std header mess

Uvix · 2024-06-30T21:02:17 1719781337

Isn't that what -fsanitize=undefined does?

MaulingMonkey · 2024-06-30T21:18:40 1719782320

`-fsanitize=undefined` enables some runtime warnings/checks, `-Wundefined-behaviour` would presumably enable some kind of compile time warnings/checks.

throwaway2037 · 2024-07-05T06:48:38 1720162118

Do any static analysis tools (commercial or open source) already do this now?

userbinator · 2024-06-30T21:50:58 1719784258

Or better yet, perhaps they should just do the sane thing instead of being a hostile pedantic smartass.

Undefined shouldn't mean "whatever", it should be an opportunity to consider what makes the most sense.

amelius · 2024-06-30T22:50:41 1719787841

But then you have to write a standard.

lmm · 2024-06-30T23:17:05 1719789425

You actually don't. Plenty of languages and compilers manage to be less programmer-hostile without a written standard. Unfortunately GCC/Clang have prioritised getting higher numbers on benchmarks.

favorited · 2024-07-01T21:32:38 1719869558

> Unfortunately GCC/Clang have prioritised getting higher numbers on benchmarks.

Because that's what people actually want. There's no untapped market for a C compiler that doesn't aggressively optimize. The closest you'll get is projects selectively disabling certain optimizations, like how the Linux kernel builds with `-fno-strict-aliasing`.

lmm · 2024-07-01T22:38:32 1719873512

> There's no untapped market for a C compiler that doesn't aggressively optimize.

There probably isn't now, because people who want sanity have moved on from C.I do think that years ago a different path was possible.

> The closest you'll get is projects selectively disabling certain optimizations, like how the Linux kernel builds with `-fno-strict-aliasing`.

Linux also does `-fno-delete-null-pointer-checks` and probably others. I don't think they have a principled way of deciding, rather whenever an aggressive optimization causes a terrible bug they disable that particular one and leave the rest on.

userbinator · 2024-07-02T19:57:10 1719950230

No, only a small amount of people care about compiler benchmarks... which are not representative of general-purpose code anyway since they seem more interested in testing vectorisation and other dubious features that are only important for a tiny subset of everything a compiler gets used for.

That said, ICC and MSVC can achieve similar or better results without going insane with UB, so it's clearly not as strictly necessary as the lame pedants want you to believe.

circuit10 · 2024-07-01T00:05:20 1719792320

That could cause a noticeable performance drop

IshKebab · 2024-06-30T20:55:38 1719780938

These days I think the sane option is to just add a static assert that the machine is little endian and move on with your life. Unless you're writing glibc or something do you really care about supporting ancient IBM mainframes?

Also recommending octal is sadistic!

MenhirMike · 2024-06-30T22:42:16 1719787336

Big Endian is also called Network Order because some networking protocols use it. And of course, UTF-16 BE is a thing.

There is a non-trivial chance that you will have to deal with BE data regardless if your machine is LE or BE.

foobarian · 2024-06-30T23:22:04 1719789724

> some networking protocols

Pretty low-key way to refer to pretty much all layer 1-3 IETF protocols :D

MenhirMike · 2024-07-01T00:41:47 1719794507

It's like Tim Berners-Lee being referred to as "Web Developer" :)

https://imgur.com/kX5oBk6

IshKebab · 2024-07-01T06:23:52 1719815032

Yeah but that's a known order. You don't have to detect it.

matheusmoreira · 2024-07-02T04:38:05 1719895085

Plenty of binary formats out there containing big endian integers. An example is the format I'm dealing with right now: ELF files. Their endianness matches that of the ELF's target architecture.

Apparently they designed it that way in order to ensure all values in the ELF are naturally encoded in memory when it is processed on the architecture it is intended to run on. So if you're writing a program loader or something you can just directly read the integers out of the ELF's data structures and call it a day.

Processing arbitrary ELF inputs requires adapting to their endianness though.

bewo001 · 2024-07-01T14:24:48 1719843888

The fun really starts if you have a CPU using big endian and a bus using little endian..

Back in the late 1990s, we moved from Motorola 68k / sbus to Power/PCI. To make the transition easy, we kept using big endian for the CPU. However, all available networking chips only supported PCI / little endian at this point. For DMA descriptor addresses and chip registers, one had to remember to use little endian.

bvrmn · 2024-06-30T21:03:05 1719781385

> add a static assert that the machine is little endian and move on with your life

It's not clear how it would free you from interpreting BE data from incoming streams/blobs.

bla3 · 2024-06-30T21:29:22 1719782962

https://commandcenter.blogspot.com/2012/04/byte-order-fallac... covers that part.

forrestthewoods · 2024-06-30T21:09:37 1719781777

I feel like we're at a point where you should assume little endian serialization and treat anything big endian as a slow path you don't care about. There's no real reason for any blob, stream, or socket to use big endian for anything afaict.

If some legacy system still serializes big endian data then call bswap and call it a day.

syncsynchalt · 2024-06-30T21:22:22 1719782542

The internet is big-endian, and generally data sent over the wire is converted to/from BE. For example the numbers in IP or TCP headers are big-endian, and any RFC that defines a protocol including binary data will generally go with big-endian numbers.

I believe this dates from Bolt Baranek and Newman basing the IMP on a BE architecture. Similarly computers tend to be LE these days because that's what the "winning" PC architecture (x86) uses.

jandrewrogers · 2024-06-30T23:58:26 1719791906

The low-level parts of the network are big-endian because they date from a time when a lot of networking was done on big-endian machines. Most modern protocols and data encodings above UDP/TCP are explicitly little-endian because x86 and most modern ARM are little-endian. I can't remember the last time I had to write a protocol codec that was big-endian; that was common in the 1990s, but that was a long time ago. Even for protocols that explicitly support both big- and little-endian encodings, I never see an actual big-endian encoding in the wild and some implementations don't bother to support them even though they are part of the standard, with seemingly little consequence.

There are vestiges of big-endian in the lower layers of the network but that is a historical artifact from when many UNIX servers were big-endian. It makes no sense to do new development with big-endian formats, and in practice it has become quite rare as one would reasonably expect.

forrestthewoods · 2024-07-01T01:12:04 1719796324

No idea why you’re getting downvoted. Everything you’ve written is correct.

masklinn · 2024-07-01T04:37:32 1719808652

Is it though? Because my experience is very different than GP’s: git uses network byte order for its binary files, msgpack and cbor use network byte order, websocket uses network byte order, …

IshKebab · 2024-07-01T06:26:55 1719815215

Yeah I'd say it should be true but there are plenty of modern protocols that still inexplicably use big endian.

For your own protocols there's no need to deal with big endian though.

forrestthewoods · 2024-06-30T22:05:31 1719785131

> any RFC that defines a protocol including binary data will generally go with big-endian numbers

I'm not sure this is true. And if it is true it really shouldn't be. There are effectively no modern big endian CPUs. If designing a new protocol there is, afaict, zero benefit to serializing anything as big endian.

It's unfortunate that TCP headers and networking are big endian. It's a historical artifact.

Converting data to/from BE is a waste. I've designed and implemented a variety of simple communication protocols. They all define the wire format to be LE. Works great, zero issues, zero regrets.

classichasclass · 2024-07-01T00:15:24 1719792924

> There are effectively no modern big endian CPUs.

POWER9, Power10 and s390x/Telum/etc. all say hi. The first two in particular have a little endian mode and most Linuces run them little, but they all can run big, and on z/OS, AIX and IBM i, must do so.

I imagine you'll say effectively no one cares about them, but they do exist, are used in shipping systems you can buy today, and are fully supported.

forrestthewoods · 2024-07-01T01:00:59 1719795659

Yeah those are a teeny tiny fraction of CPUs on the market. Little Endian should be the default and the rare big endian CPU gets to run the slow path.

Almost no code anyone here will write will run on those chips. It’s not something almost any programmer needs to worry about. And those that do can easily add support where it’s necessary.

The point is that big endian is an extreme outlier.

userbinator · 2024-06-30T21:45:52 1719783952

Only the early protocols below the application layer are BE. A lot of the later stuff switched to LE.

kortilla · 2024-06-30T22:31:15 1719786675

Yes, those “early protocols” carry everything. Until applications stop opening sockets, this problem doesn’t go away.

lmm · 2024-06-30T23:18:38 1719789518

If you're writing an implementation of one of those "early protocols", sure. If not, call a well-known library, let it do whatever bit twiddling it needs to, and get on with what you were actually doing.

01HNNWZ0MV43FF · 2024-06-30T23:19:02 1719789542

But the payload isn't BE

bvrmn · 2024-06-30T21:17:38 1719782258

AFAIK quite a number of protocols and file formats use BE without any sign to become a legacy even in a distant future.

saagarjha · 2024-06-30T21:22:09 1719782529

You do realize that most of the networking stack is big-endian, right?

zajio1am · 2024-07-01T00:47:34 1719794854

BE MIPS is still alive, many recent Mikrotik hardware is BE MIPS.

Gibbon1 · 2024-07-01T02:50:11 1719802211

That should just be your bad decision shouldn't be other people problems.

circuit10 · 2024-07-01T00:06:18 1719792378

I write programs for my calculator, which is big endian

rerdavies · 2024-07-01T20:27:46 1719865666

Which calculator? And does it have a C compiler?

circuit10 · 2024-07-01T22:06:28 1719871588

It’s the Casio CG50, it uses SuperH which is supported by GCC and there is an unofficial SDK (well there are actually two of them)

The CPU is technically bi-endian but it’s controlled by a pin and it’s hardwired to big endian mode

Most C code just works, sometimes there are endianness bugs when porting things but they’re usually not hard to fix

IshKebab · 2024-07-01T08:35:47 1719822947

Good for you I guess?

circuit10 · 2024-07-01T08:49:25 1719823765

It’s an example of somewhere big endian is used other than an IBM mainframe, even if it’s equally niche

I’m pretty sure there are examples that are way less niche though but I don’t want to look into it at the moment

m463 · 2024-07-01T19:14:57 1719861297

or older macs

(but mostly, network byte order is big endian)

el_pollo_diablo · 2024-07-01T12:27:26 1719836846

Look, it's not that difficult:

  static inline uint32_t load_le32(void const *p)
  {
    unsigned char const *p_ = p;
    _Static_assert(sizeof(uint32_t) == 4, "uint32_t is not 4 bytes long");
    return
      (uint32_t)p_[0] <<  0 |
      (uint32_t)p_[1] <<  8 |
      (uint32_t)p_[2] << 16 |
      (uint32_t)p_[3] << 24;
  }
  
  static inline void store_le32(void *p, uint32_t v)
  {
    unsigned char *p_ = p;
    _Static_assert(sizeof(uint32_t) == 4, "uint32_t is not 4 bytes long");
    p_[0] = (unsigned char)(v >>  0);
    p_[1] = (unsigned char)(v >>  8);
    p_[2] = (unsigned char)(v >> 16);
    p_[3] = (unsigned char)(v >> 24);
  }

All that nonsense about masking with an int must stop. You want a result in uint32_t? Convert to that type before shifting. Done.

Now, the C standard could still get in the way and assign a lower conversion rank to uint32_t than to int, so uint32_t operands would get promoted to int (or unsigned int) before shifting, but those left shifts would still be defined as the result would be representable in the promoted type (as promotion only makes sense if it preserves values, which implies that all values of the type before promotion are representable in the type after promotion).

dang · 2024-07-01T03:18:41 1719803921

Discussed at the time:

The Byte Order Fiasco - https://news.ycombinator.com/item?id=27085952 - May 2021 (364 comments)

matheusmoreira · 2024-07-02T04:17:09 1719893829

The Linux user space API byte swap functions:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

https://github.com/torvalds/linux/blob/master/include/uapi/l...

They are quite similar if not equivalent to the code presented in this article. I assume people far smarter than me have run these things under sanitizers and found no issues. I mean, it's Linux.

  #include <asm/byteorder.h>

zzo38computer · 2024-07-01T00:19:20 1719793160

I always just write the code with shifts in any program that cares about endianness (e.g. when reading/writing files), except for dealing with internet (which is big-endian) (in which case the functions specifically for dealing with endianness with internet, will be used).

However, in addition to big-endian and small-endian, sometimes PDP-endian is used, such as Hamster archive file, which some of my programs use. (The same way of using shifts can be used, like above.)

MMIX is big-endian, and has MOR and MXOR instructions, either of which can be used to deal with endianness (including PDP-endian; these instructions have other uses too). (However, my opinion is that small-endian is better than big-endian, but any one will work.)

jowea · 2024-07-03T03:13:40 1719976420

Can't we just declare __alignof__(uint32_t) on the 'char b[4]' and then freely use bswap without all this madness? Or memcpy the chars into a uint32 array?

https://gcc.gnu.org/onlinedocs/gcc/Alignment.html

jwrallie · 2024-07-01T00:45:00 1719794700

I never thought about doing bit shifts using octal numbers..

I definitely will be doing 010, 020, 030, ... instead of 8, 16, 24, ... from now on!

orlp · 2024-07-02T08:25:03 1719908703

Please don't.

1vuio0pswjnm7 · 2024-07-01T02:50:40 1719802240

Was the author allowed to fix the Tensorflow example?

pizlonator · 2024-06-30T21:50:05 1719784205

> Modern compiler policies don't even permit code that looks like like that anymore. Your compiler might see that and emit assembly that formats your hard drive with btrfs.

This is total FUD. Some sanitizers might be unhappy with that code, but that's just sanitizers creating problems where there need not be any.

The llvm policy here is that must alias trumps TBAA, so clang will reliably compile the cast of char* to uint32_t* and do what systems programmers expect. If it didn't then too much stuff would break.

kazinator · 2024-07-01T08:58:50 1719824330

Please, do not write shift amounts in octal. Or anything else in octal.

Mask after shifting, so you can use the same mask.

It's less verbose and probably easier to optimize, based on the simple pattern that an expression of the form:

  EXPR & MOD

is being converted to a type which is MOD + 1 bits wide.

From this:

  #define WRITE64BE(P, V)                        \
    ((P)[0] = (0xFF00000000000000 & (V)) >> 070, \
     (P)[1] = (0x00FF000000000000 & (V)) >> 060, \
     (P)[2] = (0x0000FF0000000000 & (V)) >> 050, \
     (P)[3] = (0x000000FF00000000 & (V)) >> 040, \
     (P)[4] = (0x00000000FF000000 & (V)) >> 030, \
     (P)[5] = (0x0000000000FF0000 & (V)) >> 020, \
     (P)[6] = (0x000000000000FF00 & (V)) >> 010, \
     (P)[7] = (0x00000000000000FF & (V)) >> 000, (P) + 8)

To this:

  #define WRITE64BE(P, V)         \
    ((P)[0] = ((V) >> 56) & 0xFF, \
     (P)[1] = ((V) >> 48) & 0xFF, \
     (P)[2] = ((V) >> 40) & 0xFF, \
     (P)[3] = ((V) >> 32) & 0xFF, \
     (P)[4] = ((V) >> 24) & 0xFF, \
     (P)[5] = ((V) >> 16) & 0xFF, \
     (P)[6] = ((V) >>  8) & 0xFF, \
     (P)[7] =  (V)        & 0xFF, (P) + 8)

It doesn't help anyone that the multiples of 8 look nice in octal, because these decimals are burned into hackers' brains. And are you gonna use base 13 for when multiples of 13 look good?

Save the octal for chmod 0777 public-dir.

Anyone who doesn't know this leading zero stupidity perpetrated by C reads 070 as "seventy".

Even 78, 68, 58, 48, ... would be better, and also take 3 characters:

  #define WRITE64BE(P, V)          \
    ((P)[0] = ((V) >> 7*8) & 0xFF, \
     (P)[1] = ((V) >> 6*8) & 0xFF, \
     (P)[2] = ((V) >> 5*8) & 0xFF, \
     (P)[3] = ((V) >> 4*8) & 0xFF, \
     (P)[4] = ((V) >> 3*8) & 0xFF, \
     (P)[5] = ((V) >> 2*8) & 0xFF, \
     (P)[6] = ((V) >>   8) & 0xFF, \
     (P)[7] =  (V)         & 0xFF, (P) + 8)

though we are relying on knowing that * has higher precedence than >>.

Supplementary note: when we have the 0xFF in this position, and P has type "unsigned char", we can remove it. If P is unsigned char, the & 0xFF does nothing, except on machines where chars are wider than 8 bits. These fall into two categories: historic museum hardware your code will never run on and DSP chips. On DSP chips, bytes wider than 8 doesn't mean 9 or 10, but 16 or more. Simply doing & 0xFF may not make the code work. You may have to pack the code into 16 bit bytes in the right byte order. (Been there and done that. E.g. I worked on an ARM platform that had a TeakLite III DSP, where that sort of thing was done in communicating data between the host and the DSP).

So we are actually okay with:

  #define WRITE64BE(P, V) \
    ((P)[0] = (V) >> 56), \
     (P)[1] = (V) >> 48), \
     (P)[2] = (V) >> 40), \
     (P)[3] = (V) >> 32), \
     (P)[4] = (V) >> 24), \
     (P)[5] = (V) >> 16), \
     (P)[6] = (V) >>  8), \
     (P)[7] = (V)       , (P) + 8)

akira2501 · 2024-06-30T21:41:24 1719783684

> Now you don't need to use those APIs because you know the secret.

Was that a desired outcome? The endian.3 and byteorder.3 manual pages make it easy.