On that note, something that caught me off guard once is that C11 `_Alignof` and GCC `__alignof__` can differ: for example, on 32-bit x86, `__alignof__(double) == 8` but `_Alignof(double) == 4`; however, `__alignof__(struct { double d; }) == 4`. Apparently `__alignof__` gives the preferred alignment, whereas `_Alignof` gives the minimum alignment required by the ABI.
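A minimal C sketch to observe this yourself, assuming GCC targeting 32-bit x86 (e.g. `gcc -m32`):

```c
#include <stdio.h>

struct S { double d; };

int main(void) {
    /* With gcc -m32 on x86, this typically prints 8 / 4 / 4:
       __alignof__ reports the preferred alignment of double,
       _Alignof reports the ABI-required alignment, and struct
       layout follows the ABI, so the member is only 4-aligned. */
    printf("__alignof__(double)   = %zu\n", (size_t)__alignof__(double));
    printf("_Alignof(double)      = %zu\n", (size_t)_Alignof(double));
    printf("__alignof__(struct S) = %zu\n", (size_t)__alignof__(struct S));
    return 0;
}
```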
The language I'm working on has the following syntax:
bitfield ColorWriteMaskFlags : u32 {
red 1 bool,
green 1 bool,
blue 1 bool,
alpha 1 bool,
};
The number after the field name is the bit size. An (offset, size) pair can be used instead (offsets start at 0 when not explicit). After that comes nothing (the bit field is an unsigned number), the word 'bool' (the bit field is a boolean), or the word 'signed' (the bit field is a signed number in two's complement). The raw value of the bitfield can always be accessed with 'foo.#raw'.
EDIT: There's also no restriction on how multiple fields can overlap, as long as they all fit within the backing type.
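To make the semantics concrete, here is roughly how the ColorWriteMaskFlags example could be modeled in C (my own sketch of equivalent behavior, not the language's actual lowering; helper names like `get_red`/`set_red` are made up):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of the example bitfield: the backing u32 is
   the `.#raw` value, and each sub-field is a (offset, width) view. */
typedef uint32_t ColorWriteMaskFlags;

static inline bool get_red(ColorWriteMaskFlags f)   { return (f >> 0) & 1u; }
static inline bool get_green(ColorWriteMaskFlags f) { return (f >> 1) & 1u; }
static inline bool get_blue(ColorWriteMaskFlags f)  { return (f >> 2) & 1u; }
static inline bool get_alpha(ColorWriteMaskFlags f) { return (f >> 3) & 1u; }

/* Writes only touch the field's own bit(s). */
static inline void set_red(ColorWriteMaskFlags *f, bool v) {
    *f = (*f & ~(UINT32_C(1) << 0)) | ((uint32_t)v << 0);
}

int main(void) {
    ColorWriteMaskFlags f = 0;
    set_red(&f, true);
    return get_green(f); /* 0: setting red leaves green untouched */
}
```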
The one thing that would worry me here is endianness, e.g. is red the LSB or the MSB?
Also, what happens to the padding (bits not covered by sub-fields)? How's that going to look when shoved into a file or over a socket? How does the language handle overflow (more bits in the bit fields than there are in the parent field)?
Red is the least significant bit, because offsets start at bit #0. Endianness is not a concern because you have to specify the underlying integer type, and bit endianness is not a thing, so it's just the native endianness (if you, e.g., need a network format with a specific endianness and a 32-bit bitfield, you can instead use 4 consecutive 8-bit bitfields laid out in a consistent order). Unspecified bits are ignored: they'll be zero if the bitfield is initialized through conventional means, and they can be set directly through the raw value (or through memcpy, pointer casts, or any other direct memory access); but when reading a field only its specified bits are read, so unspecified bits don't change the result. When writing to a field, the source value is truncated to the field size, so you never end up writing to other bits.
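To illustrate the suggested workaround, a quick C sketch of getting a fixed wire format out of a native-endian 32-bit raw value by writing the bytes in an explicit order (`put_u32_le` is a made-up helper name):

```c
#include <stdint.h>
#include <stdio.h>

/* Serialize a 32-bit raw bitfield value as little-endian bytes,
   independent of host endianness: byte 0 always holds bits 0..7. */
static void put_u32_le(uint8_t out[4], uint32_t raw) {
    out[0] = (uint8_t)(raw >> 0);
    out[1] = (uint8_t)(raw >> 8);
    out[2] = (uint8_t)(raw >> 16);
    out[3] = (uint8_t)(raw >> 24);
}

int main(void) {
    uint8_t buf[4];
    put_u32_le(buf, 0x00000001u); /* red == bit 0 -> lands in buf[0] */
    printf("%02x %02x %02x %02x\n", buf[0], buf[1], buf[2], buf[3]);
    return 0;
}
```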
> Red is the least significant bit, because offsets start at bit #0.
So red is the LSB because you decided it was the LSB. That is not a by-definition thing.
> Endianness is not a concern because you have to specify the underlying integer type, and bit endianness is not a thing
That's not actually true. There are formats which process bytes LSB to MSB, and formats which process them MSB to LSB. E.g. in git's offset encoding, the leading byte is a bitmap of continuation bytes: bit 0 indicates whether byte 7 is present.
Both are perfectly justifiable: one is offset-based, while the other is visualisation-based, as bytes are usually represented MSB-first (as binary numbers, essentially).
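For concreteness, the two bit-numbering conventions look like this in C (a sketch; the function names are mine):

```c
#include <stdint.h>

/* LSB-first (offset-based): bit 0 is the least significant bit. */
static int bit_lsb_first(uint8_t byte, int i) { return (byte >> i) & 1; }

/* MSB-first (visualisation-based): bit 0 is the most significant
   bit, i.e. the leftmost digit when the byte is written in binary. */
static int bit_msb_first(uint8_t byte, int i) { return (byte >> (7 - i)) & 1; }

int main(void) {
    /* 0x80 = 1000 0000: bit 7 LSB-first, but bit 0 MSB-first. */
    return (bit_lsb_first(0x80, 7) == 1 && bit_msb_first(0x80, 0) == 1) ? 0 : 1;
}
```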
> but when reading a field only its specified bits are read, so unspecified bits don't change the result. When writing to a field, the source value is truncated to the field size, so you never end up writing to other bits.
I'm quite confused by "bitfield": do you mean the container field (the one that's actually defined by the `bitfield` keyword) or the contained sub-fields?
> I'm quite confused by "bitfield": do you mean the container field (the one that's actually defined by the `bitfield` keyword) or the contained sub-fields?
I meant the contained sub-fields: reading a sub-field only reads that sub-field's bits, and writing one only writes them. The container's full backing value is what you get (and can set) through `.#raw`.