It neglects to mention that bit fields have always been the buggiest part of C compilers, and there is never a good enough reason to rely on them, if you have a choice at all. Honest shift-and-mask operations on unsigned machine words are always better, if you absolutely must pack bitwise.
Bitfields make for much more readable code when accessing individual fields of hardware registers, although there are some caveats if the registers are poorly-designed. The main one is that bitfield writes are usually read-modify-writes, so if reading the register or writing back its current value causes something to happen, bitfields are a no-go. But when they work, you get code like:
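(A sketch with made-up register names and a made-up address; it assumes the toolchain packs the fields to match the hardware layout.)

    #include <stdint.h>

    /* Hypothetical SPI peripheral config register, declared as bitfields. */
    typedef struct {
        volatile uint32_t enable   : 1;
        volatile uint32_t mode     : 3;
        volatile uint32_t divider  : 8;
        volatile uint32_t reserved : 20;
    } spi_config_t;

    #define SPI_CONFIG (*(volatile spi_config_t *)0x40013000u)

    void spi_start(uint8_t div)
    {
        SPI_CONFIG.divider = div;   /* compiler emits the read-modify-write */
        SPI_CONFIG.enable  = 1;
    }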
Helper functions or macros are just as clean as the bitfield syntax. That said, hardware register access is one of those things that is intrinsically tied to a specific platform (and if you target multiple platforms, it will already be behind an abstraction layer), so you can usually know the quirks of how the toolchain for that platform supports bitfields, and use them accordingly. Still, it's more work for people reading the code, since there are more hidden assumptions behind that deceptively simple "=" than with an explicit mask and shift.
>Helper functions or macros are just as clean as the bitfield syntax.
They can be, if done right, but then I have to remember the names of all the helper functions and macros. :-) An IDE can auto-complete bitfield names.
>Still, it's more work for people reading the code, since there are more hidden assumptions behind that deceptively simple "=" than with an explicit mask and shift.
Depends on the platform. IIRC on ARM a bitfield access is masking and shifting, only done by the compiler instead of me. With optimized code I often have to look at the disassembly anyway if I want to know what's really going on.
> Depends on the platform. IIRC on ARM a bitfield access is masking and shifting, only done by the compiler instead of me. With optimized code I often have to look at the disassembly anyway if I want to know what's really going on.
ARM supports native bitfield loading[1], extraction[2] and clearing[3].
I remember being surprised that setting an individual bit on an AVR hardware register in assembly was so much shorter than doing all that C bit masking stuff.
It feels weird to see arguments like this when you could just use a language (C++ being the elephant in the room here) that lets you define methods, then call those methods instead.
A method that does this will often (depending on the architecture) have much more overhead than a struct lookup. If you're doing hardware stuff, you often care about performance.
True, but this is making the assumption that whatever pointer offsets, shifting, and masking that you have in your method, to extract the 9th bit in the 5th word of a 512 bit struct, will result in the same operations as the simple struct access. It very well could be so (certainly for some architectures, most likely not for others), but I don't think all this complication is justified when a struct is available, especially if you're someone writing code for hardware, where these concepts are more known/less scary.
> Compilers do generate instructions specified in functions correctly, inlined or no.
Well, clearly that's not always the case, as shown in the article, and which is what started this comment chain.
But I'm not sure how you reached that interpretation. I'm saying that the code you put into the function will result in some operations, whatever they may be, inline or not. The struct access will result in some operations. Those two sets of operations may be different. Some architectures have specialized instructions for bitfield access. There's a good chance that the compiler won't convert the shenanigans in your method to those specialized instructions. Maybe! It requires an understanding of your particular situation. But some people's work exists in a context where this is a reasonable choice, for them.
If you make bitfield members, and access those in an inline function, and code generation for bitfields tickles a bug, the access being in an inline function won't help.
If you code a shift-and-mask in an inline function, instead, your odds are better, same as if you made a macro for it.
This just illustrates how even in imperative languages like C/C++ programmers naturally reach for more declarative approaches. Type punning with a union is much the same story. It's just unfortunate that C/C++ has so many foot guns.
Absolutely, it makes far more sense to do some basic sanity checks instead of writing painfully awkward code out of fear of a hypothetical.
The standard may be flexible, but the behavior of a given compiler on a given platform will be consistent and very unlikely to change in the foreseeable future.
Perfectionist nerds love to obsess over a million scenarios which only exist in their head, or in obscure edge cases. This is not productive. Perfectly portable, future-proof, CPU-bug-proof code simply does not exist.
The only thing to be aware of here is that you shouldn't be trying to transfer data from one platform to another by loading it directly into memory, which is an insane thing to do in any case.
Testing formally-correct code can only ever find bugs in the exact compiler release and physical CPU chip you run the test on.
The bugs cited show up in various compiler releases and various chip steppings. Unless your code will only ever run on that exact machine, you have not tested it.
> Unless your code will only ever run on that exact machine, you have not tested it.
I don't understand this sentence. That is the purpose of the runtime check, to make sure the code generated by a buggy compiler, on a specific architecture, doesn't try to run. Bug reports can then be filed, as they should be.
If you're writing a Linux kernel module (known compiler) for a known architecture, which is where this is often used, then you have a known environment, so there's little risk, except the rare compiler bug, which is a risk that extends far beyond packed structs.
This is something that I've used, and I've seen used, often in the hardware world. I'm having difficulty sympathizing with all the fear in this comment section. Don't use it if its use can't be made rational, for you. But please don't tell me that I'm failing to get something that I am familiar with.
I certainly don't understand what you're trying to say, and replying with four words definitely won't help me.
Perhaps there's some confusion about what a runtime check is. A runtime check is executed at runtime, which means that it will be executed after compilation with a specific compiler, on a specific architecture, on the end user's system. The runtime check verifies correct struct operations at runtime, as part of initialization, and aborts the driver load, with a nice system message, if improper packed-struct behavior (among other things) is seen. This covers your concern here:
> Unless your code will only ever run on that exact machine, you have not tested it.
It literally is tested on every machine, when the driver is loaded.
This runtime check handles the cases where a customer might try to run it on some new hardware/toolchain that we don't officially support. The official supported cases only have risk of new compiler bugs, since we only support certain architectures. Hypothetical compiler bugs are a problem for all software, and shouldn't be used to drive software design, beyond making sure there is good testing, which has nothing to do with compilers.
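For illustration, a minimal sketch (hypothetical names; it assumes a little-endian target whose ABI packs bitfields LSB-first) of the kind of init-time check described:

    #include <stdint.h>
    #include <string.h>

    /* Layout probe: if the toolchain packs bitfields the way the driver was
       written to expect, writing a known field value yields a known raw word. */
    struct layout_probe {
        uint32_t a : 4;
        uint32_t b : 8;
        uint32_t c : 20;
    };

    static int bitfield_layout_selftest(void)
    {
        struct layout_probe p;
        uint32_t raw;

        if (sizeof(p) != sizeof(uint32_t))
            return -1;                      /* unexpected size/padding */

        memset(&p, 0, sizeof(p));
        p.b = 0xA5;
        memcpy(&raw, &p, sizeof(raw));

        /* Expected value for LSB-first packing on little-endian; on failure,
           refuse to load the driver and log a diagnostic instead. */
        return (raw == (0xA5u << 4)) ? 0 : -1;
    }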
Maybe I'm missing something.
edit: ncmncm, I can't reply to you, but since I have code that does what you say is impossible, and since my comment already responds to what you wrote, I would suggest reading all of my comments, fully, a second time.
Yes. You will never have a startup test of all the instructions used, in all the circumstances where they are used. Startup tests are useful for checking the presence or absence of certain advertised features, such as SIMD instructions. They are useless for buggy compilers and buggy chip implementations.
And, what would the code do if it detected such a bug? Abort, or run other code that works OK anyway. So, run that code all the time, and you don't need to try to check.
If you are doing full functional tests of your ROM app on each mask stepping before it is soldered into thousands of boards, congratulations, the warning isn't for you.
amd64 bugs tend to be in newer parts of the ISA. ARM provides chip makers very thorough tests, to protect their brand (though not all chip makers fix all the bugs tests find). So, many programmers will not encounter bitfield bugs. But code gets around, and tests are always less thorough than we wish.
Maybe I'm naive, but I don't see why fully functional runtime tests are required. The only purpose of a runtime test is to test things that could vary at runtime, like architectural incompatibilities, with memory-layout things like endianness and bitfields being an example. These are trivial to implement in a few dozen bytes. If you have a ROM soldered into boards, then absolutely none of this would be a concern, since the architecture would be intimately known, so I'm now completely lost on your point. Anyway, cheers!
Because the divider isn't necessarily one bit. Multi-bit register fields are very common. You can certainly hardcode the numbers if you want:
old_divider = (SpiRegs.CONFIG_REG >> 24) & 0xff;
which is easier to read but also easier to mess up when you're writing it. It's best to use the vendor's register definitions where possible. Although then you have the possibility of using the wrong #define constant, because all of these are just integers so the compiler can't tell you if you made a mistake.
Another "fun" issue with using numbers is that sometimes an int is 16 bits, so you have to do (13 << 24ul) instead of (13 << 24).
> bit fields have always been the buggiest part of C compilers
Not in my experience. The buggiest part was the preprocessor. You don't hear much about preprocessor bugs anymore because the C standard doesn't dare change it, and in 40 years people have finally got them working right :-/
Personally, I had to scrap and rewrite the C preprocessor 3 times to get it right.
Bitfields are used a lot when you have constrained resources, in my case in LTE/5G, both on the modem and BTS sides. Every struct field takes as much as it needs and you leave the rest as 'reservedX'. You never know when a new feature will have to be implemented and a few more bits will be needed for some new field.
Without bitfields the code would be absolutely filled with bit-access macros, decreasing readability and screwing with IDEs' indexers and static analyzers big time.
Not to mention the pain it would be to refactor/reorder/change field sizes, which is relatively painless with bitfields.
The drawback though is that you can't reference individual fields by pointer. You basically need to write the equivalents of OOP getters and setters to really keep the code tidy. The compiler can't plan for such things on its own, not across multiple compilation units at any rate.
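For example (hypothetical field names), the accessors end up looking like:

    #include <stdint.h>

    /* You can't form a pointer to a bitfield member (&hdr->seq_no is illegal),
       so small getter/setter helpers stand in for passing the field around. */
    struct msg_hdr {
        uint32_t version : 4;
        uint32_t seq_no  : 12;
        uint32_t length  : 16;
    };

    static inline uint32_t msg_get_seq(const struct msg_hdr *h)       { return h->seq_no; }
    static inline void     msg_set_seq(struct msg_hdr *h, uint32_t v) { h->seq_no = v; }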
This is a valid and a main drawback of bitfields but in this context bit-access macros would be even worse since you would have pointers to a random uint32 within a struct and good luck keeping track where's what
Could you share the details (and compilers) of the issues you had because I'm actually kinda curious.
From my experience most of the bugs were just 'normal' bugs, i.e. human errors when writing, and those were fixable by just figuring out what was implemented incorrectly. About 2% on the BTS side were cache coherency issues, because we had a multicore system without hardware coherency (so, imagine), and similarly 2% were hardware issues on the modem side due to race conditions or whatever - harder to fix, so workarounds. But miscompilation? x86 host tests are used to separate the wheat from the chaff, but the only tests anyone cares about, before commit, are on target.
On the modem side I remember one, maybe two if you push it, issues with miscompilation, but they weren't at all related to bitfields; the compiler was doing some stupid shit with register allocation.
There are a lot of different C compilers (not so many C++), dozens of ISAs (although not as many as before) and ABIs (ditto) and many, many thousands of implementations of ISAs and ABIs, all done with widely varying degrees of attention to detail.
There are myriad places for mistakes to manifest. Caches, interrupt behavior, and sleep modes are favorite places for implementation bugs. But bitfields are a place ordinary programmers might still encounter them.
Anybody used to working with buggy one-off chip designs knows all about this. But most programmers are insulated from most bugs. The warning is for them.
Are there other parts of the C language you are avoiding in such an environment, or is it just bitfields? I suppose in any case you'll be using C89 instead of newer language revisions.
> Honest shift-and-mask operations on unsigned machine words are always better, if you absolutely must pack bitwise
Now you go ahead and teach GCC to use the arm UBFX instruction for those cases. It DOES use it for actual bitfields. shift + mask = 2-3 instructions (immediate load may be needed). UBFX is one.
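I.e. (a sketch with hypothetical field widths; the codegen claim is the parent's), the two source forms in question:

    #include <stdint.h>

    struct reg { uint32_t lo : 12; uint32_t field : 9; uint32_t hi : 11; };

    uint32_t via_bitfield(const struct reg *r)
    {
        return r->field;              /* per the parent, GCC emits a single UBFX here */
    }

    uint32_t via_shift_mask(uint32_t raw)
    {
        return (raw >> 12) & 0x1FFu;  /* the same 9-bit extraction written by hand */
    }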
The more complicated the instruction is, the less likely it was implemented to spec on all the various products and mask steppings you might execute on ... and the less likely its published definition exactly matches C or C++ Standard and platform C ABI specs. And, the less likely that ABI spec nails down all the details.
Compiler implementors don't like to guess, but don't get a choice. If the instruction provided doesn't match the Standard, which do they implement? Both choices are wrong.
Until I read this article, that's what I thought C bitfields _were_. I didn't realize the specification was so uselessly sloppy about alignment and packing that a programmer couldn't rely on bitfield members being laid out lowest bit first, with exact widths, and address them accordingly. It's quite annoying that that's not what they are.
If the specification is lax, it's because the expected behavior is different across platforms. C (and C++) has to support platforms where bytes are more than 8 bits, where floats are non-IEEE, and (until recently) where signed integers are not 2's complement. If you want a specific behavior and don't care about portability all you have to do is read the ABI spec alongside the standard.
You have wholly missed the point of the warning. It does not matter what the spec says when the instructions fail to implement it, sometimes, on machines other than yours.
The ABI for a specific platform will define how bit fields are arranged in memory, or at least you can always rely on GCC to not change behaviour. They work perfectly well for data in memory, at work we use them all the time for storing tiny integers or flags.
They work on the machines where you have tested them. The ABI defines what is supposed to happen, not what will happen. Gcc will get it right on very heavily used archs, and very heavily used archs will generally execute the instructions right.
If your code only ever runs on the physical machines where you test, or only on very mainstream chip designs, then fine.
I've always thought Ada's syntax was the clearest as to what's being defined and accessed[1]. That being said, it's Ada...so good luck finding a use case.
Ada is often used in microcontrollers where this is very common.
I write command line tools in Ada, and some other things, I've found this bit layout very useful when reading/writing binary file formats, or when binding to C and matching C struct layout.
Fair. I’m quite fond of Ada because of my Pascal/Wirth early origins, but I personally haven’t ever needed to use it. Glad to hear it’s found some uses outside of the US military.
Yep. Another thing to watch out for is using bitfields to access device registers.
One time, I was working with an older embedded PPC architecture on a driver to talk to an FPGA that was attached to a 32 bit local bus. The problem is that it didn’t support unaligned accesses. The lower two address bits simply weren’t hooked up, so any access to byte addresses that weren’t a multiple of 4 would just behave as if you masked the two lower address bits off.
There was a structure that used bitfields to access bits in a register on that FPGA. It worked fine til I updated GCC, then it stopped working.
It turns out that the newer version of GCC would do a single byte unaligned read if you were accessing, say, bits 8-15 in a 32 bit bitfield, whereas the older GCC would read the full 32 bit word and shift/mask as needed.
It turns out you can force the older behavior with -fstrict-volatile-bitfields.
Took a minute to figure out, but I learned my lesson. That said, I don’t think you should really be doing IO that way anyway. I typically use IO accessor methods, sometimes with raw addresses, sometimes with struct overlays and taking the address of the member.
A bit field (in C/C++) is a weird object type that can only exist in a structure or union type, which kind of acts like an underlying regular integral type except for those situations where it does not.
For an example of why compilers might have issues compiling bit fields properly (although this requires C++, since C's ternary operator works on rvalues, not lvalues):
struct A { int x: 3; int y: 5; } a;
(choice ? a.x : a.y) = val;
The trick with a compiler is to rewrite complex constructions into simpler equivalent ones, then the code gen is much simpler and more reliable.
For example, in the D compiler the `while` loop doesn't survive the semantic pass, as it gets rewritten into a `for` loop. The `for` loop then gets rewritten into `if` and `goto` statements. The code generator only needs to learn about `if` and `goto`.
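Conceptually, the lowering looks like this (a hand-written C sketch, not actual D compiler output):

    extern int  cond(void);
    extern void body(void);

    void source_form(void)
    {
        while (cond())
            body();
    }

    void lowered_form(void)          /* what the code generator effectively sees */
    {
    Ltop:
        if (!cond())
            goto Ldone;
        body();
        goto Ltop;
    Ldone:
        return;
    }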
> in the D compiler the `while` loop doesn't survive the semantic pass, as it gets rewritten into a `for` loop. The `for` loop then gets rewritten into `if` and `goto` statements
You throw away structured control flow in favour of unstructured control flow?
I would have thought it'd be the other way around and you'd be trying to recover nice clean structured control flow from raw concepts like goto.
Most middle-end compiler IRs are built on conditional/unconditional branches rather than high-level control flow structures. The key concepts at that level can usually be expressed in terms of dominators, postdominators, and/or control-dependence [1]. For example, a (natural) loop exists when a block dominates one of its predecessors. Induction variables in such loop can be very easily discovered, especially if you're already in SSA form (already usually the case at this level of IR).
So, in short, structured control flow--or at least a sufficient subset of such structured control flow--can be easily recovered from low-level information, and you're generally not going to lose much information going down to that level. LLVM even has a way to attach metadata to loops despite not having any dedicated loop construct.
[1] You only need two of these concepts: the third falls out from the definition of the other two.
> You throw away structured control flow in favour of unstructured control flow?
That's right. It sounds counter-intuitive, but it works great. You wind up with a collection of blocks of code connected by edges. Then, you can use graph theory to work magic on them in a general, correct way. Data flow analysis is based on this.
One thing the graph math does is enable the reconstruction of loops out of the blocks and edges - so you can write loops any way you please, and the compiler will figure it all out and apply general algorithms to it (like loop rotation, loop unrolling, etc.).
The compiler I work with most, Graal, does the opposite and keeps structured loops as a first-class concept in the graph almost all the way through compilation, and turns jumps into a structured loop if it can. We find it makes reasoning about the loops easier. It does make irreducible control-flow harder to deal with (and in some cases impossible.)
C doesn't constrain what e1, e2, and e3 are or what they do in `for(e1;e2;e3)`, and goto's can be willy-nilly turning it into a hash anyway. Hence, trying to use the higher level construct is going to result in lots of special case code, and special case code is much more likely to lead to problems.
Structured control flow is a thing to help people understand code.
The compiler's job is to emit instructions that do precisely the things the code says to do, as efficiently as possible. There is no need for chips to understand the purpose of the code; they just need to do what it says. They don't get confused.
> Which requires a high-level understanding of the program and its control flow.
The `for` loop construct does not offer the compiler any more understanding of the code than one made from `goto`s. They contain the same information, and are interchangeable. The latter, however, is more amenable to applying mathematical algorithms to. The former is more amenable to human understanding.
Why is unstructured control-flow better for applying algorithms than structured control flow? That seems counter-intuitive to me.
With structured control-flow I can do things like reason about the level of nesting of the loop that I'm in. I can peel a loop iteration by literally saying 'take this loop here - copy the body of it out'.
The first is lvalues. In compiler jargon, an lvalue is a kind of object that can have a value stored to it. And you can usually represent it as the address of some memory location [1]. Of course, bitfields break this representation: you need to know what the bit offset and bit size of the field you're storing is (as well as the signedness).
The next level of complexity is the conditional operator. This means that, when conditional operators yield lvalues [2], you end up in a situation where the lvalue has a conditional bit offset and bit size within the address. Or maybe one leg of the expression returns a bit-field and the other leg returns a regular int lvalue. Imagine how complex your data structure needs to be to represent an lvalue during this code generation phase.
[1] Not all lvalues need to have memory locations. But if you're writing a C compiler, it's an easy first approximation to give every variable, even those marked register, some memory location and rely on an optimization pass to convert stack memory locations into register locations, rather than keeping track of this information when the frontend does code generation.
[2] As mentioned elsewhere, conditional operators in C do not yield lvalues. But conditional operators in C++ do.
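Roughly, the lvalue representation has to grow from "an address" into something like this (a hypothetical sketch, not any particular compiler's IR):

    /* A plain lvalue is just an address; a bitfield lvalue also carries bit
       offset, width, and signedness; a conditional lvalue carries two
       alternatives plus the runtime condition selecting between them. */
    struct lvalue {
        enum { LV_ADDR, LV_BITFIELD, LV_COND } kind;
        struct value  *addr;         /* base address of the containing object */
        unsigned       bit_offset;   /* LV_BITFIELD only */
        unsigned       bit_width;    /* LV_BITFIELD only */
        int            is_signed;    /* LV_BITFIELD only */
        struct value  *cond;         /* LV_COND only */
        struct lvalue *then_lv;      /* LV_COND only */
        struct lvalue *else_lv;      /* LV_COND only */
    };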
It is just very finicky, with myriad edge cases easy to get wrong, and even easier to neglect to have complete tests for. Each target CPU design and version has quirks. Many involve sign extension.
Appear to be. Are, when all the stars align. Are not in fact, often enough that you are issued a red warning you may ignore if you are insulated from all consequences.
But, you’d have an even harder time making that work with mask-and-shift macros!
It also doesn’t seem like something that would come up very often. I can’t think of the last time I conditionally stored to one of two struct fields, if I ever have.
The much more normal case would be:
val = choice ? a.x : a.y;
That one seems pretty straightforward from a codegen perspective.
The example I gave is an example of something legal with bitfields (in C++) that is legitimately challenging to implement [1] that leads to bugs in compilers. It's not meant to be something that anyone is intended to use--indeed, I'd firmly suggest that the standard ought to prohibit this kind of usage.
The broader point is that bitfields are actually weird little objects that look a lot like regular objects in many, but not all, contexts. And it's very easy from a language design or implementation perspective to forget to account for the possibility that you're dealing with a weird little object. This leads to underspecified language specifications and compilers that crash if you do something weird (but legal) such as virtually inherit from a struct containing a bitfield as its last member.
[1] So challenging, in fact, that Clang gives an error message "cannot compile this conditional operator yet". It does work in g++, icx, and MSVC though.
You can make a lot of the "fun" of bitfields go away with lvalue-to-rvalue conversion, and C tends to do this conversion very rapidly so that it's hard to find good cases for truly bizarre stuff, whereas C++ makes lvalues last a lot longer.
Of course, if you go reach for C's standard "fun with lvalue" operations, you can get some crazy nonsense. What machine code should you generate here [1]:
struct A { int x : 5; volatile _Atomic int y: 3; } a;
a.y++;
I will note that the intersection of volatile and bitfields has been another fruitful area of compiler bugs [2] historically speaking. While C++ does provide better what-the-ever-living-fuck moments for bitfields, C has had its fair share of issues with bitfields.
[1] Whether or not you can make a bitfield _Atomic in C is implementation-defined, so it's possible that someone writes a C implementation where this is legal. I will note that, in a rare display of sanity, all C compilers I can test do in fact sensibly reject _Atomic bitfields, but for the purposes of argument, assume that someone has one where it's permitted, since it is allowable by the standard.
[2] Or programmer bugs blamed on the compiler. This is the intersection of two areas that are notorious for underspecification to begin with, and combined with the general tendency of programmers to expect C compilers to be a thin veneer over assembly, makes it awfully difficult to figure out which behavior is language-intended.
I vaguely recall that gcc supports this even in C mode.
The underlying problem has to do with whether the IR has first-class concept of arbitrary lvalue or whether the frontend has to convert lvalues that get passed around to some pointer-like thing.
It might look irrelevant for a discussion of low-level AOT compilers, but it is also interesting to compare how this is implemented in dynamic/“scripting” runtimes and how the choice of underlying implementation of the concept of “lvalue”/“place” influences the user-visible language. Somewhat notably, the first draft of Common Lisp had something akin to first-class lvalues, and the final standard replaced all that with a significantly simpler mechanism that relies purely on macros.
I mentioned in the note that it's not legal C, since ternaries must yield rvalues in C. It is legal C++, however, since there ternaries may be lvalues.
While the C and C++ language specs don't specify the layout of bitfields, modern platforms tend to have a specified ABI which compilers follow when compiling for that platform.
64-bit Linux distros and the BSDs follow the convention once set by the "C ABI for Itanium".
In that, bitfields are grouped in declaration order into container words of the same width as the bitfield's type (char, int, etc.).
Bitfields don't span multiple container words, and container words don't overlap.
On little-endian platforms, bitfields are packed LSB first, but on big-endian platforms they are packed MSB first within their container word.
Alignment rules apply only to the container words.
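So, under those rules, a declaration like this has a predictable layout (illustrative widths, assuming a 32-bit int on a little-endian target following that ABI):

    struct f {
        unsigned a : 3;   /* bits 0-2 of the first 32-bit container */
        unsigned b : 7;   /* bits 3-9 of the same container */
        unsigned c : 25;  /* doesn't fit in the remaining 22 bits, so it
                             starts a new container: bits 0-24 */
    };
    /* sizeof(struct f) == 8 here; a big-endian target following the same ABI
       packs each container starting from the MSB instead. */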
The point is that if you care about the resulting in-memory layout then you by definition know what platform the code will run on and what the ABI is.
If you want to produce the same sequence of bytes regardless of the underlying platform, then you have to do it by hand with uint8_t[] buffers and explicit shifts and masks. Casting a pointer to struct to char* and writing it somewhere is inherently non-portable, and this has nothing to do with bitfields and nothing to do with things like __attribute__((packed)), although both of these things are useful when you want to do that and understand the (non-)portability implications.
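The hand-rolled version is short enough anyway, e.g. (assuming CHAR_BIT == 8; the wire format is then defined by the code, not by whatever the compiler did to a struct):

    #include <stdint.h>

    static void put_u32_le(uint8_t *buf, uint32_t v)
    {
        buf[0] = (uint8_t)(v & 0xFF);
        buf[1] = (uint8_t)((v >> 8) & 0xFF);
        buf[2] = (uint8_t)((v >> 16) & 0xFF);
        buf[3] = (uint8_t)((v >> 24) & 0xFF);
    }

    static uint32_t get_u32_le(const uint8_t *buf)
    {
        return (uint32_t)buf[0]
             | ((uint32_t)buf[1] << 8)
             | ((uint32_t)buf[2] << 16)
             | ((uint32_t)buf[3] << 24);
    }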
With shifts and masks you know where the bits are.
You know where the bits are within a single word. But if you have a struct with multiple fields, it’s not safe to rely on the exact memory layout even if it doesn’t have any bitfields.
If you need to represent a very specific memory layout, it’s not just bitfields you need to avoid, it’s structs in general.
Conversely, if you don’t need to guarantee a specific layout, bitfields are fine to use, and could be a useful optimisation hint for the compiler.
Here’s an example where I think bitfields are totally appropriate:
Say I have a window manager, and I want to attach a bunch of boolean flags to each window object (isVisible, isMaximized, etc). I don’t need to serialize them to disk. It’s highly preferable that they should be efficiently bit-packed, but not strictly essential.
The conservative way to implement that would be bit-shifts and masking (either manually or via a macro). But implementing it with bitfields would be a lot easier and less error-prone, and would work just as well. What problems do you see with the bitfield approach?
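I.e. something like this sketch (flag and function names made up):

    /* Per-window flags: the exact layout doesn't matter, only that they're
       small; the compiler is free to pack them however it likes. */
    struct window_flags {
        unsigned isVisible   : 1;
        unsigned isMaximized : 1;
        unsigned isMinimized : 1;
        unsigned isFocused   : 1;
    };

    /* Usage is ordinary member access, no shift/mask macros to remember: */
    static int should_draw(const struct window_flags *f)
    {
        return f->isVisible && !f->isMinimized;
    }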
If it works on your particular compiler release, on your particular CPU chip stepping, that tells you nothing about the next compiler over and the next chip over.
amd64 and arm64, compiled with gcc or clang, you are unlikely to run into these problems. But code tends to get around.
It sounds like the point you want to make is that the implementation of bitfields is often buggy, therefore you should always avoid using them - does that sound right?
If so, I think that’s overly paranoid. The examples that are being given here are baroque usage that would immediately stand out in a code review - memory-mapped registers, conditional lvalues, volatile and atomic fields.
The point I wanted to make is that simple straightforward usage of bitfields, like the example I gave, works fine on any platform you’re likely to encounter.
There’s plenty of widely-used code out there that uses bitfields. I just did a code search to check that (the particular example I was thinking of comes from iOS) and found some in Clang - funnily enough, in its representation of lvalues!
The case where a) you don't care about the in-memory representation of your struct and b) you care a lot about being able to pack into the absolute minimum memory space, but not enough to make sure the compiler actually packs the fields (depending on architecture and optimization settings, they might not!) is vanishingly small.
The more frequent perceived use for bit-fields (in the situation where they actually work) is to pack into a serialized data format, such that memory or a data stream can be accessed elsewhere. In that case, "the compiler can do whatever it wants with your data packing" is pretty useless, since your "elsewhere" might have a different compiler that does a totally different thing.
Optimization settings should not affect memory layout as that is specified by ABI (and large part of the “art of structure packing” is about manually reordering struct fields because the compiler cannot do that however obvious the optimization would be).
And as for the second part: anything that writes sizeof(struct foo) bytes of struct foo is inherently non-portable. If you want to portably (de)serialize something, you want to write the thing out explicitly; very often the compiler will optimize it to a more direct implementation. (And well, this is only portable to platforms where CHAR_BIT == 8.)
Anything that affects the actual instructions executed on the actual chip they're executed on may make what works here not work there.
Optimization that does not affect instructions is no optimization at all. Bitfields are an extremely fragile part of implementations. Trust it at your own risk.
What I'm saying is that the case where you want to use less RAM for a bit field but you don't actually care if the compiler allocates less than an addressable line of RAM for that bit field (because it actually just might not) is pretty empty.
Edit: I know it's hard to read a whole sentence at once, but I made that same point directly up there too.
If "foo" is defined as part of an API/ABI that's used in multiple compile units you will always care, since otherwise a random change in "implementation defined" bitfield encodings on some obscure architecture might break your build. Bitfields are a misfeature in most real-world cases.
Physically, yes. The difference is whether you let the compiler generate and hide the shift-and-mask ops, or code them by hand. Normally it is better to leave details to the compiler. This is the exception to that rule.
A result of people avoiding declaring bit fields in serious use cases has been that compiler vendors didn't worry too much about bitfield codegen bugs.
Probably Gcc and Clang are OK on x86, by now. But that does not carry to, e.g., obscure microcontrollers. Heaven help you if your bit field members are supposed to correspond to hardware register sub-fields.
Yeah they compile to the same machine code operations though. If those machine code operations aren't right as a bitfield then they aren't going to be right done manually either.
Bit field support being buggy exactly means that they don’t compile to the same machine code as the bit shifting/masking code you would write by hand (if your hand-written code is correct).
ARM is little-endian, and by tradition bitfield bit indexes are assigned from least significant (bit 0 in ARM terms) to more significant. b occupies bits 4-7 inclusive.
I feel like "struct of arrays" style coding has really taken off in the past decade, and seems to be the best way to maximize memory operations these days.
It's not so much that "structure packing" is dead, as that a wide variety of techniques have been developed above and beyond simple structure packing. There are many ways to skin a cat these days, and packing your structures more intelligently is just one possible data optimization.
That entirely depends on the access patterns. SOA makes sense when you don’t often access the different fields of the same object (array index) at the same time. If you do, on the other hand, then AOS is more efficient.
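For anyone unfamiliar with the terms, a quick sketch of the two layouts (field names made up):

    /* Array of structs (AOS): each element's fields sit together, good when
       you touch most fields of the same element at once. */
    struct particle { float x, y, z, mass; };
    struct particle particles_aos[1024];

    /* Struct of arrays (SOA): each field gets its own contiguous array, good
       when a pass streams over one field (say, just x) across all elements. */
    struct particles_soa {
        float x[1024];
        float y[1024];
        float z[1024];
        float mass[1024];
    };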
Right. There's a little too much cargo-culting in the "struct of arrays" pattern, you really want to understand why it works or doesn't.
If you have some giant bloated struct and you only care about one or two fields at a time, that's one thing. But if you have a well-aligned, correctly packed struct and you're processing all its data, it's total nonsense to break that up.
I certainly think SOA has been cargo-culted to all hell and back.
But empirically speaking, it seems like SOA / AOS is the easiest "beginner topic" to get high-performance programmers thinking about memory-layout issues.
Maybe in the 90s or 00s, it was more popular to think about struct layouts, alignment issues and the like. But today, SOA is popular because RAM has gotten less... random... and more sequential.
I think it's the changing nature of 90s-era computers (RAM behaving more random-accessy) vs the nature of 10s-era computers (RAM behaving more sequential-accessy).
--------
It's not like the 90s techniques don't work anymore. But the 10s technique of "structure of arrays" and then iterating for-loops over your data works better with prefetchers, multi-level cache hierarchies, and other tidbits that have made RAM more sequential than ever before.
Hopefully programmers continue to study the techniques and understand what is going on under-the-hood, instead of cargo-culting the pattern. Alas, we all know that cargo-culting works in the short term and is easier to do than actually learning the underlying machine!
"Struct of arrays" becoming popular may also have something to do with few people understanding structure packing. AOS has much better performance if you pack your structs well than if you pack them naively.
Struct packing has its points for small structs. Indeed, you can reduce cache use and increase cache locality. However, for large, page-aligned structs, the cache lines will be constrained to a particular set. Moreover, pointer-following from struct to struct can incur a TLB miss; the TLB is another small cache. So while you may cleverly encode things to squeeze size, you may then watch things slow to a crawl.
You are packing small structs in order to squeeze lots of them into the caches. However for large structs, you should at least consider refactoring them into small structs which you can then pack to your heart's content.
I once encountered a structure that was packed, even though it shouldn’t have been. It took me over a day to notice where the error came from. I was poking at the internals of a library so I could gather information that it had, but did not provide. There was this context structure I normally only could access through a pointer, but copying the definition of the structure into my own code ought to do the trick…
…except it didn’t.
The way the library was compiled by default made the structure there smaller than my copy. Took me some time to guess why my data was all garbled, but the cause was pretty simple: there was no padding, even if it meant some members ended up unaligned. I had to replace the unaligned members by char arrays to get it to work (I did not dare explore the compilation options of the library).
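Roughly the kind of mismatch involved (a sketch; the real library's struct was of course different):

    #include <stdio.h>
    #include <stdint.h>

    /* How the library was built: no padding, some members unaligned. */
    struct ctx_packed {
        uint8_t  tag;
        uint32_t value;
    } __attribute__((packed));          /* sizeof == 5 */

    /* The naive copy of the definition in my code: natural padding. */
    struct ctx_copy {
        uint8_t  tag;
        uint32_t value;
    };                                  /* typically sizeof == 8 */

    int main(void)
    {
        printf("%zu vs %zu\n", sizeof(struct ctx_packed), sizeof(struct ctx_copy));
        return 0;
    }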
And then I found a totally different solution for my problem. Oh well.
Is what the article says re: the pahole utility still correct (that it's not maintained)? Looks like it might be maintained now w/ kernel git under the dwarves area.
Pahole is a decent utility to look at what the packing of a structure actually ended up after everything has had its last effect in the compile chain.
I found it interesting that Pascal had a "packed" keyword and C didn't (outside of implementation-specific attributes like __attribute__((packed)) in GNU C).
The reason is that Pascal was used on computers with long machine words (e.g. 36 bits) where memory wasn’t byte addressable. It was customary (in assembly code) to “pack” multiple logical fields into a single word, in particular multiple characters of a text string. The “packed” feature in Pascal was added for that purpose.
bad compilers make bad days.. custom hardware used to (?) use memory locations to control/enable features.. anything from electronic access paths to actual servo-motors firing. Probably a better idea to use human-readable constructs and avoid this compact and tiny use pattern, IMO. If you want a tricky test for yourself, perhaps some actual hardware design is a better use of time these days?
Check whether this is relevant to your platform: 64-bit Intel does allow unaligned access, so you don't get an interrupt on misaligned access, and it's acceptable to have one-byte alignment. (I guess that's the reason pahole isn't maintained any more: it stopped mattering for the dominant platform.)
Of course it may be very different if you work on embedded systems...
"a technique for reducing the memory footprint of programs in compiled languages with C-like structures"
I figure that's not the primary reason for structure packing, but rather for fine-grained control over writing to very specific memory layouts (think global descriptor table) as structs.
I know in Blender there is a compile-time check to ensure the structs are properly packed, and it has nothing to do with specific memory layouts.
I think it has to do with reading/writing them to disk, but honestly I never cared enough to ask anyone. It did make things convenient sometimes, when you could ‘steal’ a padding value and magically get backwards compatibility, because older versions just ignored that field (and when reading an older file just set it to a sane default).
Yeah, I use them all the time for the specific memory layout part. At the same time, I've never had to go as far with optimization as trimming a few bytes off my structs for the program's memory footprint.
If that struct is getting written to storage or shoved over the wire though, I'll always optimize/pack them down, both for the size reduction and because it makes it easy to reconstruct on any other CPU.
The Lost Art of C Structure Packing - https://news.ycombinator.com/item?id=15626205 - Nov 2017 (49 comments)
The Lost Art of C Structure Packing (2014) - https://news.ycombinator.com/item?id=12231464 - Aug 2016 (112 comments)
The Lost Art of C Structure Packing - https://news.ycombinator.com/item?id=9517623 - May 2015 (4 comments)
The Lost Art of C Structure Packing - https://news.ycombinator.com/item?id=9069031 - Feb 2015 (113 comments)
The Lost Art of C Structure Packing - https://news.ycombinator.com/item?id=6995568 - Jan 2014 (143 comments)