It neglects to mention that bit fields have always been the buggiest part of C compilers, and there is never a good enough reason to rely on them, if you have a choice at all. Honest shift-and-mask operations on unsigned machine words are always better, if you absolutely must pack bitwise.
Bitfields make for much more readable code when accessing individual fields of hardware registers, although there are some caveats if the registers are poorly-designed. The main one is that bitfield writes are usually read-modify-writes, so if reading the register or writing back its current value causes something to happen, bitfields are a no-go. But when they work, you get code like:
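(A sketch with made-up register names and a made-up address; it assumes the toolchain packs the fields to match the hardware layout.)

    #include <stdint.h>

    /* Hypothetical SPI peripheral config register, declared as bitfields. */
    typedef struct {
        volatile uint32_t enable   : 1;
        volatile uint32_t mode     : 3;
        volatile uint32_t divider  : 8;
        volatile uint32_t reserved : 20;
    } spi_config_t;

    #define SPI_CONFIG (*(volatile spi_config_t *)0x40013000u)

    void spi_start(uint8_t div)
    {
        SPI_CONFIG.divider = div;   /* compiler emits the read-modify-write */
        SPI_CONFIG.enable  = 1;
    }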
Helper functions or macros are just as clean as the bitfield syntax. That said, hardware register access is one of those things that is intrinsically tied to a specific platform (and if you target multiple platforms, it will already be behind an abstraction layer), so you can usually know the quirks of how the toolchain for that platform supports bitfields, and use them accordingly. Still, it's more work for people reading the code, since there are more hidden assumptions behind that deceptively simple "=" than with an explicit mask and shift.
>Helper functions or macros are just as clean as the bitfield syntax.
They can be, if done right, but then I have to remember the names of all the helper functions and macros. :-) An IDE can auto-complete bitfield names.
>Still, it's more work for people reading the code, since there are more hidden assumptions behind that deceptively simple "=" than with an explicit mask and shift.
Depends on the platform. IIRC on ARM a bitfield access is masking and shifting, only done by the compiler instead of me. With optimized code I often have to look at the disassembly anyway if I want to know what's really going on.
> Depends on the platform. IIRC on ARM a bitfield access is masking and shifting, only done by the compiler instead of me. With optimized code I often have to look at the disassembly anyway if I want to know what's really going on.
ARM supports native bitfield loading[1], extraction[2] and clearing[3].
I remember being surprised that setting an individual bit on an AVR hardware register in assembly was so much shorter than doing all that C bit masking stuff.
It feels weird to see arguments like this when you could just use a language (C++ being the elephant in the room here) that lets you define methods, then call those methods instead.
A method that does this will often (depending on the architecture) have much more overhead than a struct lookup. If you're doing hardware stuff, you often care about performance.
True, but this is making the assumption that whatever pointer offsets, shifting, and masking that you have in your method, to extract the 9th bit in the 5th word of a 512 bit struct, will result in the same operations as the simple struct access. It very well could be so (certainly for some architectures, most likely not for others), but I don't think all this complication is justified when a struct is available, especially if you're someone writing code for hardware, where these concepts are more known/less scary.
> Compilers do generate instructions specified in functions correctly, inlined or no.
Well, clearly that's not always the case, as shown in the article, and which is what started this comment chain.
But I'm not sure how you reached that interpretation. I'm saying that the code you put into the function will result in some operations, whatever they may be, inline or not. The struct access will result in some operations. Those two sets of operations may be different. Some architectures have specialized instructions for bitfield access. There's a good chance that the compiler won't convert the shenanigans in your method to those specialized instructions. Maybe! It requires an understanding of your particular situation. But some people's work exists in a context where this is a reasonable choice, for them.
If you make bitfield members, and access those in an inline function, and code generation for bitfields tickles a bug, the access being in an inline function won't help.
If you code a shift-and-mask in an inline function, instead, your odds are better, same as if you made a macro for it.
This just illustrates how even in imperative languages like C/C++ programmers naturally reach for more declarative approaches. Type punning with a union is much the same story. It's just unfortunate that C/C++ has so many foot guns.
Absolutely, it makes far more sense to do some basic sanity checks instead of writing painfully awkward code out of fear of a hypothetical.
The standard may be flexible, but the behavior of a given compiler on a given platform will be consistent and very unlikely to change in the foreseeable future.
Perfectionist nerds love to obsess over a million scenarios which only exist in their head, or in obscure edge cases. This is not productive. Perfectly portable, future-proof, CPU-bug-proof code simply does not exist.
The only thing to be aware of here is that you shouldn't be trying to transfer data from one platform to another by loading it directly into memory, which is an insane thing to do in any case.
Testing formally-correct code can only ever find bugs in the exact compiler release and physical CPU chip you run the test on.
The bugs cited show up in various compiler releases and various chip steppings. Unless your code will only ever run on that exact machine, you have not tested it.
> Unless your code will only ever run on that exact machine, you have not tested it.
I don't understand this sentence. That is the purpose of the runtime check, to make sure the code generated by a buggy compiler, on a specific architecture, doesn't try to run. Bug reports can then be filed, as they should be.
If you're writing a Linux kernel module (known compiler) for a known architecture, which is where this is often used, then you have a known environment, so there's little risk, except the rare compiler bug, which is a risk that extends far beyond packed structs.
This is something that I've used, and I've seen used, often in the hardware world. I'm having difficulty sympathizing with all the fear in this comment section. Don't use it if its use can't be made rational, for you. But please don't tell me that I'm failing to get something that I am familiar with.
I certainly don't understand what you're trying to say, and replying with four words definitely won't help me.
Perhaps there's some confusion about what a runtime check is. A runtime check is executed at runtime, which means that it will be executed after compilation with a specific compiler, on a specific architecture, on the end user's system. The runtime check verifies correct struct operations at runtime, as part of initialization, and aborts the driver load, with a nice system message, if improper packed-struct behavior (among other things) is seen. This covers your concern here:
> Unless your code will only ever run on that exact machine, you have not tested it.
It literally is tested on every machine, when the driver is loaded.
This runtime check handles the cases where a customer might try to run it on some new hardware/toolchain that we don't officially support. The official supported cases only have risk of new compiler bugs, since we only support certain architectures. Hypothetical compiler bugs are a problem for all software, and shouldn't be used to drive software design, beyond making sure there is good testing, which has nothing to do with compilers.
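For illustration, a minimal sketch (hypothetical names; it assumes a little-endian target whose ABI packs bitfields LSB-first) of the kind of init-time check described:

    #include <stdint.h>
    #include <string.h>

    /* Layout probe: if the toolchain packs bitfields the way the driver was
       written to expect, writing a known field value yields a known raw word. */
    struct layout_probe {
        uint32_t a : 4;
        uint32_t b : 8;
        uint32_t c : 20;
    };

    static int bitfield_layout_selftest(void)
    {
        struct layout_probe p;
        uint32_t raw;

        if (sizeof(p) != sizeof(uint32_t))
            return -1;                      /* unexpected size/padding */

        memset(&p, 0, sizeof(p));
        p.b = 0xA5;
        memcpy(&raw, &p, sizeof(raw));

        /* Expected value for LSB-first packing on little-endian; on failure,
           refuse to load the driver and log a diagnostic instead. */
        return (raw == (0xA5u << 4)) ? 0 : -1;
    }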
Maybe I'm missing something.
edit: ncmncm, I can't reply to you, but since I have code that does what you say is impossible, and since my comment already responds to what you wrote, I would suggest reading all of my comments, fully, a second time.
Yes. You will never have a startup test of all the instructions used, in all the circumstances where they are used. Startup tests are useful for checking the presence or absence of certain advertised features, such as SIMD instructions. They are useless for buggy compilers and buggy chip implementations.
And, what would the code do if it detected such a bug? Abort, or run other code that works OK anyway. So, run that code all the time, and you don't need to try to check.
If you are doing full functional tests of your ROM app on each mask stepping before it is soldered into thousands of boards, congratulations, the warning isn't for you.
amd64 bugs tend to be in newer parts of the ISA. ARM provides chip makers very thorough tests, to protect their brand (though not all chip makers fix all the bugs tests find). So, many programmers will not encounter bitfield bugs. But code gets around, and tests are always less thorough than we wish.
Maybe I'm naive, but I don't see why fully functional runtime tests are required. The only purpose of a runtime test is to test things that could vary at runtime, like architectural incompatibilities, with memory-layout things like endianness and bitfields being an example. These are trivial to implement in a few dozen bytes. If you have a ROM soldered into boards, then absolutely none of this would be a concern, since the architecture would be intimately known, so I'm now completely lost on your point. Anyway, cheers!
Because the divider isn't necessarily one bit. Multi-bit register fields are very common. You can certainly hardcode the numbers if you want:
old_divider = (SpiRegs.CONFIG_REG >> 24) & 0xff;
which is easier to read but also easier to mess up when you're writing it. It's best to use the vendor's register definitions where possible. Although then you have the possibility of using the wrong #define constant, because all of these are just integers so the compiler can't tell you if you made a mistake.
Another "fun" issue with using numbers is that sometimes an int is 16 bits, so you have to do (13 << 24ul) instead of (13 << 24).
> bit fields have always been the buggiest part of C compilers
Not in my experience. The buggiest part was the preprocessor. You don't hear much about preprocessor bugs anymore because the C standard doesn't dare change it, and in 40 years people have finally got them working right :-/
Personally, I had to scrap and rewrite the C preprocessor 3 times to get it right.
Bitfields are used a lot when you have constrained resources, in my case in LTE/5G, both on the modem and BTS sides. Every struct field takes as much as it needs and you leave the rest as 'reservedX'. You never know when a new feature will have to be implemented and a few more bits will be needed for some new field.
Without bitfields the code would be absolutely filled with bit-access macros, decreasing readability and screwing with IDEs' indexers and static analyzers big time.
Not to mention the pain it would be to refactor/reorder/change field sizes, which is relatively painless with bitfields.
The drawback though is that you can't reference individual fields by pointer. You basically need to write the equivalents of OOP getters and setters to really keep the code tidy. The compiler can't plan for such things on its own, not across multiple compilation units at any rate.
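For example (hypothetical field names), the accessors end up looking like:

    #include <stdint.h>

    /* You can't form a pointer to a bitfield member (&hdr->seq_no is illegal),
       so small getter/setter helpers stand in for passing the field around. */
    struct msg_hdr {
        uint32_t version : 4;
        uint32_t seq_no  : 12;
        uint32_t length  : 16;
    };

    static inline uint32_t msg_get_seq(const struct msg_hdr *h)       { return h->seq_no; }
    static inline void     msg_set_seq(struct msg_hdr *h, uint32_t v) { h->seq_no = v; }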
This is a valid and a main drawback of bitfields but in this context bit-access macros would be even worse since you would have pointers to a random uint32 within a struct and good luck keeping track where's what
Could you share the details (and compilers) of the issues you had because I'm actually kinda curious.
From my experience most of the bugs were just 'normal' bugs, i.e. human errors when writing, and those were fixable by just figuring out what was implemented incorrectly. About 2% on the BTS side were cache coherency issues, because we had a multicore system without hardware coherency (so, imagine), and similarly 2% were hardware issues on the modem side due to race conditions or whatever - harder to fix, so workarounds. But miscompilation? x86 host tests are used to separate the wheat from the chaff, but the only tests anyone cares about, before commit, are on target.
On the modem side I remember one, maybe two if you push it, issues with miscompilation, but they weren't at all related to bitfields; the compiler was doing some stupid shit with register allocation.
There are a lot of different C compilers (not so many C++), dozens of ISAs (although not as many as before) and ABIs (ditto) and many, many thousands of implementations of ISAs and ABIs, all done with widely varying degrees of attention to detail.
There are myriad places for mistakes to manifest. Caches, interrupt behavior, and sleep modes are favorite places for implementation bugs. But bitfields are a place ordinary programmers might still encounter them.
Anybody used to working with buggy one-off chip designs knows all about this. But most programmers are insulated from most bugs. The warning is for them.
Are there other parts of the C language you are avoiding in such an environment, or is it just bitfields? I suppose in any case you'll be using C89 instead of newer language revisions.
> Honest shift-and-mask operations on unsigned machine words are always better, if you absolutely must pack bitwise
Now you go ahead and teach GCC to use the arm UBFX instruction for those cases. It DOES use it for actual bitfields. shift + mask = 2-3 instructions (immediate load may be needed). UBFX is one.
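I.e. (a sketch with hypothetical field widths; the codegen claim is the parent's), the two source forms in question:

    #include <stdint.h>

    struct reg { uint32_t lo : 12; uint32_t field : 9; uint32_t hi : 11; };

    uint32_t via_bitfield(const struct reg *r)
    {
        return r->field;              /* per the parent, GCC emits a single UBFX here */
    }

    uint32_t via_shift_mask(uint32_t raw)
    {
        return (raw >> 12) & 0x1FFu;  /* the same 9-bit extraction written by hand */
    }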
The more complicated the instruction is, the less likely it was implemented to spec on all the various products and mask steppings you might execute on ... and the less likely its published definition exactly matches C or C++ Standard and platform C ABI specs. And, the less likely that ABI spec nails down all the details.
Compiler implementors don't like to guess, but don't get a choice. If the instruction provided doesn't match the Standard, which do they implement? Both choices are wrong.
Until I read this article, that's what I thought C bitfields _were_. I didn't realize the specification was so uselessly sloppy about alignment and packing that a programmer couldn't rely on bitfield members being laid out lowest bit first, with exact widths, and address them accordingly. It's quite annoying that that's not what they are.
If the specification is lax, it's because the expected behavior is different across platforms. C (and C++) has to support platforms where bytes are more than 8 bits, where floats are non-IEEE, and (until recently) where signed integers are not 2's complement. If you want a specific behavior and don't care about portability all you have to do is read the ABI spec alongside the standard.
You have wholly missed the point of the warning. It does not matter what the spec says when the instructions fail to implement it, sometimes, on machines other than yours.
The ABI for a specific platform will define how bit fields are arranged in memory, or at least you can always rely on GCC to not change behaviour. They work perfectly well for data in memory, at work we use them all the time for storing tiny integers or flags.
They work on the machines where you have tested them. The ABI defines what is supposed to happen, not what will happen. Gcc will get it right on very heavily used archs, and very heavily used archs will generally execute the instructions right.
If your code only ever runs on the physical machines where you test, or only on very mainstream chip designs, then fine.
I've always thought Ada's syntax was the clearest as to what's being defined and accessed[1]. That being said, it's Ada...so good luck finding a use case.
Ada is often used in microcontrollers where this is very common.
I write command line tools in Ada, and some other things, I've found this bit layout very useful when reading/writing binary file formats, or when binding to C and matching C struct layout.
Fair. I’m quite fond of Ada because of my Pascal/Wirth early origins, but I personally haven’t ever needed to use it. Glad to hear it’s found some uses outside of the US military.
Yep. Another thing to watch out for is using bitfields to access device registers.
One time, I was working with an older embedded PPC architecture on a driver to talk to an FPGA that was attached to a 32 bit local bus. The problem is that it didn’t support unaligned accesses. The lower two address bits simply weren’t hooked up, so any access to byte addresses that weren’t a multiple of 4 would just behave as if you masked the two lower address bits off.
There was a structure that used bitfields to access bits in a register on that FPGA. It worked fine til I updated GCC, then it stopped working.
It turns out that the newer version of GCC would do a single byte unaligned read if you were accessing, say, bits 8-15 in a 32 bit bitfield, whereas the older GCC would read the full 32 bit word and shift/mask as needed.
It turns out you can force the older behavior with -fstrict-volatile-bitfields.
Took a minute to figure out, but I learned my lesson. That said, I don’t think you should really be doing IO that way anyway. I typically use IO accessor methods, sometimes with raw addresses, sometimes with struct overlays and taking the address of the member.
A bit field (in C/C++) is a weird object type that can only exist in a structure or union type, which kind of acts like an underlying regular integral type except for those situations where it does not.
For an example of why compilers might have issues compiling bit fields properly (although this requires C++, since C's ternary operator works on rvalues, not lvalues):
struct A { int x: 3; int y: 5; } a;
(choice ? a.x : a.y) = val;
The trick with a compiler is to rewrite complex constructions into simpler equivalent ones, then the code gen is much simpler and more reliable.
For example, in the D compiler the `while` loop doesn't survive the semantic pass, as it gets rewritten into a `for` loop. The `for` loop then gets rewritten into `if` and `goto` statements. The code generator only needs to learn about `if` and `goto`.
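Conceptually, the lowering looks like this (a hand-written C sketch, not actual D compiler output):

    extern int  cond(void);
    extern void body(void);

    void source_form(void)
    {
        while (cond())
            body();
    }

    void lowered_form(void)          /* what the code generator effectively sees */
    {
    Ltop:
        if (!cond())
            goto Ldone;
        body();
        goto Ltop;
    Ldone:
        return;
    }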
> in the D compiler the `while` loop doesn't survive the semantic pass, as it gets rewritten into a `for` loop. The `for` loop then gets rewritten into `if` and `goto` statements
You throw away structured control flow in favour of unstructured control flow?
I would have thought it'd be the other way around and you'd be trying to recover nice clean structured control flow from raw concepts like goto.
Most middle-end compiler IRs are built on conditional/unconditional branches rather than high-level control flow structures. The key concepts at that level can usually be expressed in terms of dominators, postdominators, and/or control-dependence [1]. For example, a (natural) loop exists when a block dominates one of its predecessors. Induction variables in such loop can be very easily discovered, especially if you're already in SSA form (already usually the case at this level of IR).
So, in short, structured control flow--or at least a sufficient subset of such structured control flow--can be easily recovered from low-level information, and you're generally not going to lose much information going down to that level. LLVM even has a way to attach metadata to loops despite not having any dedicated loop construct.
[1] You only need two of these concepts: the third falls out from the definition of the other two.
> You throw away structured control flow in favour of unstructured control flow?
That's right. It sounds counter-intuitive, but it works great. You wind up with a collection of blocks of code connected by edges. Then, you can use graph theory to work magic on them in a general, correct way. Data flow analysis is based on this.
One thing the graph math does is enable the reconstruction of loops out of the blocks and edges - so you can write loops any way you please, and the compiler will figure it all out and apply general algorithms to it (like loop rotation, loop unrolling, etc.).
The compiler I work with most, Graal, does the opposite and keeps structured loops as a first-class concept in the graph almost all the way through compilation, and turns jumps into a structured loop if it can. We find it makes reasoning about the loops easier. It does make irreducible control-flow harder to deal with (and in some cases impossible.)
C doesn't constrain what e1, e2, and e3 are or what they do in `for(e1;e2;e3)`, and goto's can be willy-nilly turning it into a hash anyway. Hence, trying to use the higher level construct is going to result in lots of special case code, and special case code is much more likely to lead to problems.
Structured control flow is a thing to help people understand code.
The compiler's job is to emit instructions that do precisely the things the code says to do, as efficiently as possible. There is no need for chips to understand the purpose of the code; they just need to do what it says. They don't get confused.
> Which requires a high-level understanding of the program and its control flow.
The `for` loop construct does not offer the compiler any more understanding of the code than one made from `goto`s. They contain the same information, and are interchangeable. The latter, however, is more amenable to applying mathematical algorithms to. The former is more amenable to human understanding.
Why is unstructured control-flow better for applying algorithms than structured control flow? That seems counter-intuitive to me.
With structured control-flow I can do things like reason about the level of nesting of the loop that I'm in. I can peel a loop iteration by literally saying 'take this loop here - copy the body of it out'.
The first is lvalues. In compiler jargon, an lvalue is a kind of object that can have a value stored to it. And you can usually represent it as the address of some memory location [1]. Of course, bitfields break this representation: you need to know what the bit offset and bit size of the field you're storing is (as well as the signedness).
The next level of complexity is the conditional operator. This means that, when conditional operators yield lvalues [2], you end up in a situation where the lvalue has a conditional bit offset and bit size within the address. Or maybe one leg of the expression returns a bit-field and the other leg returns a regular int lvalue. Imagine how complex your data structure needs to be to represent an lvalue during this code generation phase.
[1] Not all lvalues need to have memory locations. But if you're writing a C compiler, it's an easy first approximation to give every variable, even those marked register, some memory location and rely on an optimization pass to convert stack memory locations into register locations, rather than keeping track of this information when the frontend does code generation.
[2] As mentioned elsewhere, conditional operators in C do not yield lvalues. But conditional operators in C++ do.
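Roughly, the lvalue representation has to grow from "an address" into something like this (a hypothetical sketch, not any particular compiler's IR):

    /* A plain lvalue is just an address; a bitfield lvalue also carries bit
       offset, width, and signedness; a conditional lvalue carries two
       alternatives plus the runtime condition selecting between them. */
    struct lvalue {
        enum { LV_ADDR, LV_BITFIELD, LV_COND } kind;
        struct value  *addr;         /* base address of the containing object */
        unsigned       bit_offset;   /* LV_BITFIELD only */
        unsigned       bit_width;    /* LV_BITFIELD only */
        int            is_signed;    /* LV_BITFIELD only */
        struct value  *cond;         /* LV_COND only */
        struct lvalue *then_lv;      /* LV_COND only */
        struct lvalue *else_lv;      /* LV_COND only */
    };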
It is just very finicky, with myriad edge cases easy to get wrong, and even easier to neglect to have complete tests for. Each target CPU design and version has quirks. Many involve sign extension.
Appear to be. Are, when all the stars align. Are not in fact, often enough that you are issued a red warning you may ignore if you are insulated from all consequences.
But, you’d have an even harder time making that work with mask-and-shift macros!
It also doesn’t seem like something that would come up very often. I can’t think of the last time I conditionally stored to one of two struct fields, if I ever have.
The much more normal case would be:
val = choice ? a.x : a.y;
That one seems pretty straightforward from a codegen perspective.
The example I gave is an example of something legal with bitfields (in C++) that is legitimately challenging to implement [1] that leads to bugs in compilers. It's not meant to be something that anyone is intended to use--indeed, I'd firmly suggest that the standard ought to prohibit this kind of usage.
The broader point is that bitfields are actually weird little objects that look a lot like regular objects in many, but not all, contexts. And it's very easy from a language design or implementation perspective to forget to account for the possibility that you're dealing with a weird little object. This leads to underspecified language specifications and compilers that crash if you do something weird (but legal) such as virtually inherit from a struct containing a bitfield as its last member.
[1] So challenging, in fact, that Clang gives an error message "cannot compile this conditional operator yet". It does work in g++, icx, and MSVC though.
You can make a lot of the "fun" of bitfields go away with lvalue-to-rvalue conversion, and C tends to do this conversion very rapidly so that it's hard to find good cases for truly bizarre stuff, whereas C++ makes lvalues last a lot longer.
Of course, if you go reach for C's standard "fun with lvalue" operations, you can get some crazy nonsense. What machine code should you generate here [1]:
struct A { int x : 5; volatile _Atomic int y: 3; } a;
a.y++;
I will note that the intersection of volatile and bitfields has been another fruitful area of compiler bugs [2] historically speaking. While C++ does provide better what-the-ever-living-fuck moments for bitfields, C has had its fair share of issues with bitfields.
[1] Whether or not you can make a bitfield _Atomic in C is implementation-defined, so it's possible that someone writes a C implementation where this is legal. I will note that, in a rare display of sanity, all C compilers I can test do in fact sensibly reject _Atomic bitfields, but for the purposes of argument, assume that someone has one where it's permitted, since it is allowable by the standard.
[2] Or programmer bugs blamed on the compiler. This is the intersection of two areas that are notorious for underspecification to begin with, and combined with the general tendency of programmers to expect C compilers to be a thin veneer over assembly, makes it awfully difficult to figure out which behavior is language-intended.
I vaguely recall that gcc supports this even in C mode.
The underlying problem has to do with whether the IR has first-class concept of arbitrary lvalue or whether the frontend has to convert lvalues that get passed around to some pointer-like thing.
It might look irrelevant for a discussion of low-level AOT compilers, but it is also interesting to compare how this is implemented in dynamic/“scripting” runtimes and how the choice of underlying implementation of the concept of “lvalue”/“place” influences the user-visible language. Somewhat notably, the first draft of Common Lisp had something akin to first-class lvalues, and the final standard replaced all that with a significantly simpler mechanism that relies purely on macros.
I mentioned in the note that it's not legal C, since ternaries must yield rvalues in C. It is legal C++, however, since there ternaries may be lvalues.
While the C and C++ language specs don't specify the layout of bitfields, modern platforms tend to have a specified ABI which compilers follow when compiling for that platform.
64-bit Linux distros and the BSDs follow the convention once set by the "C ABI for Itanium".
In that, bitfields are grouped in declaration order into container words of the same width as the bitfield's type (char, int, etc.).
Bitfields don't span multiple container words, and container words don't overlap.
On little-endian platforms, bitfields are packed LSB first, but on big-endian platforms they are packed MSB first within their container word.
Alignment rules apply only to the container words.
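So, under those rules, a declaration like this has a predictable layout (illustrative widths, assuming a 32-bit int on a little-endian target following that ABI):

    struct f {
        unsigned a : 3;   /* bits 0-2 of the first 32-bit container */
        unsigned b : 7;   /* bits 3-9 of the same container */
        unsigned c : 25;  /* doesn't fit in the remaining 22 bits, so it
                             starts a new container: bits 0-24 */
    };
    /* sizeof(struct f) == 8 here; a big-endian target following the same ABI
       packs each container starting from the MSB instead. */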
The point is that if you care about the resulting in-memory layout then you by definition know what platform the code will run on and what the ABI is.
If you want to produce the same sequence of bytes regardless of the underlying platform, then you have to do it by hand with uint8_t[] buffers and explicit shifts and masks. Casting a pointer to struct to char* and writing it somewhere is inherently non-portable, and this has nothing to do with bitfields and nothing to do with things like __attribute__((packed)), although both of these things are useful when you want to do that and understand the (non-)portability implications.
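The hand-rolled version is short enough anyway, e.g. (assuming CHAR_BIT == 8; the wire format is then defined by the code, not by whatever the compiler did to a struct):

    #include <stdint.h>

    static void put_u32_le(uint8_t *buf, uint32_t v)
    {
        buf[0] = (uint8_t)(v & 0xFF);
        buf[1] = (uint8_t)((v >> 8) & 0xFF);
        buf[2] = (uint8_t)((v >> 16) & 0xFF);
        buf[3] = (uint8_t)((v >> 24) & 0xFF);
    }

    static uint32_t get_u32_le(const uint8_t *buf)
    {
        return (uint32_t)buf[0]
             | ((uint32_t)buf[1] << 8)
             | ((uint32_t)buf[2] << 16)
             | ((uint32_t)buf[3] << 24);
    }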
With shifts and masks you know where the bits are.
You know where the bits are within a single word. But if you have a struct with multiple fields, it’s not safe to rely on the exact memory layout even if it doesn’t have any bitfields.
If you need to represent a very specific memory layout, it’s not just bitfields you need to avoid, it’s structs in general.
Conversely, if you don’t need to guarantee a specific layout, bitfields are fine to use, and could be a useful optimisation hint for the compiler.
Here’s an example where I think bitfields are totally appropriate:
Say I have a window manager, and I want to attach a bunch of boolean flags to each window object (isVisible, isMaximized, etc). I don’t need to serialize them to disk. It’s highly preferable that they should be efficiently bit-packed, but not strictly essential.
The conservative way to implement that would be bit-shifts and masking (either manually or via a macro). But implementing it with bitfields would be a lot easier and less error-prone, and would work just as well. What problems do you see with the bitfield approach?
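I.e. something like this sketch (flag and function names made up):

    /* Per-window flags: the exact layout doesn't matter, only that they're
       small; the compiler is free to pack them however it likes. */
    struct window_flags {
        unsigned isVisible   : 1;
        unsigned isMaximized : 1;
        unsigned isMinimized : 1;
        unsigned isFocused   : 1;
    };

    /* Usage is ordinary member access, no shift/mask macros to remember: */
    static int should_draw(const struct window_flags *f)
    {
        return f->isVisible && !f->isMinimized;
    }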
If it works on your particular compiler release, on your particular CPU chip stepping, that tells you nothing about the next compiler over and the next chip over.
amd64 and arm64, compiled with gcc or clang, you are unlikely to run into these problems. But code tends to get around.
It sounds like the point you want to make is that the implementation of bitfields is often buggy, therefore you should always avoid using them - does that sound right?
If so, I think that’s overly paranoid. The examples that are being given here are baroque usage that would immediately stand out in a code review - memory-mapped registers, conditional lvalues, volatile and atomic fields.
The point I wanted to make is that simple straightforward usage of bitfields, like the example I gave, works fine on any platform you’re likely to encounter.
There’s plenty of widely-used code out there that uses bitfields. I just did a code search to check that (the particular example I was thinking of comes from iOS) and found some in Clang - funnily enough, in its representation of lvalues!
The case where a) you don't care about the in-memory representation of your struct and b) you care a lot about being able to pack into the absolute minimum memory space, but not enough to make sure the compiler actually packs the fields (depending on architecture and optimization settings, they might not!) is vanishingly small.
The more frequent perceived use for bit-fields (in the situation where they actually work) is to pack into a serialized data format, such that memory or a data stream can be accessed elsewhere. In that case, "the compiler can do whatever it wants with your data packing" is pretty useless, since your "elsewhere" might have a different compiler that does a totally different thing.
Optimization settings should not affect memory layout as that is specified by ABI (and large part of the “art of structure packing” is about manually reordering struct fields because the compiler cannot do that however obvious the optimization would be).
And as for the second part: anything that writes sizeof(struct foo) bytes of struct foo is inherently non-portable. If you want to portably (de)serialize something, you want to write the thing out explicitly; very often the compiler will optimize it to a more direct implementation. (And well, this is only portable to platforms where CHAR_BIT == 8.)
Anything that affects the actual instructions executed on the actual chip they're executed on may make what works here not work there.
Optimization that does not affect instructions is no optimization at all. Bitfields are an extremely fragile part of implementations. Trust it at your own risk.
What I'm saying is that the case where you want to use less RAM for a bit field but you don't actually care if the compiler allocates less than an addressable line of RAM for that bit field (because it actually just might not) is pretty empty.
Edit: I know it's hard to read a whole sentence at once, but I made that same point directly up there too.
If "foo" is defined as part of an API/ABI that's used in multiple compile units you will always care, since otherwise a random change in "implementation defined" bitfield encodings on some obscure architecture might break your build. Bitfields are a misfeature in most real-world cases.
Physically, yes. The difference is whether you let the compiler generate and hide the shift-and-mask ops, or code them by hand. Normally it is better to leave details to the compiler. This is the exception to that rule.
A result of people avoiding declaring bit fields in serious use cases has been that compiler vendors didn't worry too much about bitfield codegen bugs.
Probably Gcc and Clang are OK on x86, by now. But that does not carry to, e.g., obscure microcontrollers. Heaven help you if your bit field members are supposed to correspond to hardware register sub-fields.
Yeah they compile to the same machine code operations though. If those machine code operations aren't right as a bitfield then they aren't going to be right done manually either.
Bit field support being buggy exactly means that they don’t compile to the same machine code as the bit shifting/masking code you would write by hand (if your hand-written code is correct).
ARM is little-endian, and by tradition bitfield bit indexes are assigned from least significant (bit 0 in ARM terms) to more significant. b occupies bits 4-7 inclusive.
I feel like "struct of arrays" style coding has really taken off in the past decade, and seems to be the best way to maximize memory operations these days.
It's not so much that "structure packing" is dead, as that a wide variety of techniques have been developed above and beyond simple structure packing. There are many ways to skin a cat these days, and packing your structures more intelligently is just one possible data optimization.
That entirely depends on the access patterns. SOA makes sense when you don’t often access the different fields of the same object (array index) at the same time. If you do, on the other hand, then AOS is more efficient.
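For anyone unfamiliar with the terms, a quick sketch of the two layouts (field names made up):

    /* Array of structs (AOS): each element's fields sit together, good when
       you touch most fields of the same element at once. */
    struct particle { float x, y, z, mass; };
    struct particle particles_aos[1024];

    /* Struct of arrays (SOA): each field gets its own contiguous array, good
       when a pass streams over one field (say, just x) across all elements. */
    struct particles_soa {
        float x[1024];
        float y[1024];
        float z[1024];
        float mass[1024];
    };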
Right. There's a little too much cargo-culting in the "struct of arrays" pattern, you really want to understand why it works or doesn't.
If you have some giant bloated struct and you only care about one or two fields at a time, that's one thing. But if you have a well-aligned, correctly packed struct and you're processing all its data, it's total nonsense to break that up.
I certainly think SOA has been cargo-culted to all hell and back.
But empirically speaking, it seems like SOA / AOS is the easiest "beginner topic" to get high-performance programmers thinking about memory-layout issues.
Maybe in the 90s or 00s, it was more popular to think about struct layouts, alignment issues and the like. But today, SOA is popular because RAM has gotten less... random... and more sequential.
I think it's the changing nature of 90s-era computers (RAM behaving more random-accessy) vs the nature of 10s-era computers (RAM behaving more sequential-accessy).
--------
It's not like the 90s techniques don't work anymore. But the 10s technique of "structure of arrays" and then iterating for-loops over your data works better with prefetchers, multi-level cache hierarchies, and other tidbits that have made RAM more sequential than ever before.
Hopefully programmers continue to study the techniques and understand what is going on under-the-hood, instead of cargo-culting the pattern. Alas, we all know that cargo-culting works in the short term and is easier to do than actually learning the underlying machine!
"Struct of arrays" becoming popular may also have something to do with few people understanding structure packing. AOS has much better performance if you pack your structs well than if you pack them naively.
Struct packing has its points for small structs. Indeed, you can reduce cache use and increase cache locality. However, for large, page-aligned structs, the cache lines will be constrained to a particular set. Moreover, pointer-following from struct to struct can incur a TLB miss; the TLB is another small cache. So while you may cleverly encode things to squeeze size, you may then watch things slow to a crawl.
You are packing small structs in order to squeeze lots of them into the caches. However for large structs, you should at least consider refactoring them into small structs which you can then pack to your heart's content.
I once encountered a structure that was packed, even though it shouldn’t have been. It took me over a day to notice where the error came from. I was poking at the internals of a library so I could gather information that it had, but did not provide. There was this context structure I normally only could access through a pointer, but copying the definition of the structure into my own code ought to do the trick…
…except it didn’t.
The way the library was compiled by default made the structure there smaller than my copy. Took me some time to guess why my data was all garbled, but the cause was pretty simple: there was no padding, even if it meant some members ended up unaligned. I had to replace the unaligned members by char arrays to get it to work (I did not dare explore the compilation options of the library).
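Roughly the kind of mismatch involved (a sketch; the real library's struct was of course different):

    #include <stdio.h>
    #include <stdint.h>

    /* How the library was built: no padding, some members unaligned. */
    struct ctx_packed {
        uint8_t  tag;
        uint32_t value;
    } __attribute__((packed));          /* sizeof == 5 */

    /* The naive copy of the definition in my code: natural padding. */
    struct ctx_copy {
        uint8_t  tag;
        uint32_t value;
    };                                  /* typically sizeof == 8 */

    int main(void)
    {
        printf("%zu vs %zu\n", sizeof(struct ctx_packed), sizeof(struct ctx_copy));
        return 0;
    }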
And then I found a totally different solution for my problem. Oh well.
Is what the article says re: the pahole utility still correct (that it's not maintained)? Looks like it might be maintained now w/ kernel git under the dwarves area.
Pahole is a decent utility to look at what the packing of a structure actually ended up after everything has had its last effect in the compile chain.
I found it interesting that Pascal had a "packed" keyword and C didn't (outside of implementation-specific attributes like __attribute__((packed)) in GNU C).
The reason is that Pascal was used on computers with long machine words (e.g. 36 bits) where memory wasn’t byte addressable. It was customary (in assembly code) to “pack” multiple logical fields into a single word, in particular multiple characters of a text string. The “packed” feature in Pascal was added for that purpose.
bad compilers make bad days.. custom hardware used to (?) use memory locations to control/enable features.. anything from electronic access paths to actual servo-motors firing. Probably a better idea to use human-readable constructs and avoid this compact and tiny use pattern, IMO. If you want a tricky test for yourself, perhaps some actual hardware design is a better use of time these days?
Check whether this is relevant to your platform: 64-bit Intel does allow unaligned access, so you don't get an interrupt on misaligned access, and it's acceptable to have one-byte alignment. (I guess that's the reason pahole isn't maintained any more: it stopped mattering for the dominant platform.)
Of course it may be very different if you work on embedded systems...
"a technique for reducing the memory footprint of programs in compiled languages with C-like structures"
I figure that's not the primary reason for structure packing, but rather for fine-grained control over writing to very specific memory layouts (think global descriptor table) as structs.
I know in Blender there is a compile-time check to ensure the structs are properly packed, and it has nothing to do with specific memory layouts.
I think it has to do with reading/writing them to disk, but honestly I never cared enough to ask anyone. It did make things convenient sometimes, when you could ‘steal’ a padding value and magically get backwards compatibility, because older versions just ignored that field (and when reading an older file just set it to a sane default).
Yeah, I use them all the time for the specific memory layout part. At the same time, I've never had to go as far with optimization as trimming a few bytes off my structs for the program's memory footprint.
If that struct is getting written to storage or shoved over the wire though, I'll always optimize/pack them down, both for the size reduction and because it makes it easy to reconstruct on any other CPU.
The Lost Art of C Structure Packing - https://news.ycombinator.com/item?id=15626205 - Nov 2017 (49 comments)
The Lost Art of C Structure Packing (2014) - https://news.ycombinator.com/item?id=12231464 - Aug 2016 (112 comments)
The Lost Art of C Structure Packing - https://news.ycombinator.com/item?id=9517623 - May 2015 (4 comments)
The Lost Art of C Structure Packing - https://news.ycombinator.com/item?id=9069031 - Feb 2015 (113 comments)
The Lost Art of C Structure Packing - https://news.ycombinator.com/item?id=6995568 - Jan 2014 (143 comments)