Hacker News
Zero size arrays in C (labbott.name)
104 points by ashitlerferad on May 11, 2016 | 60 comments



A zero sized array declaration was a constraint violation (thus requiring a diagnostic) in ISO C 90 in all contexts: even as a function parameter, where the identifier being declared is actually a pointer and not an array at all!

C99 added the "flexible array member" feature: the last element of a struct can be an array of size zero: "As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member." [C99, 6.7.2.1 ¶16]

In C90 code, the "struct hack" is implemented using an array of size [1] at the end of a struct. Some compilers (GNU C, I think) allowed zero before C99.

In any case, supporting a [0]-sized array member which is not the last member of a struct, without issuing a diagnostic, is non-conforming. The example given in the article with consecutive [0] arrays requires a diagnostic.

By the way, the (apparently only?) advantage of the flexible array member is that we can use sizeof (type) to obtain the size of the structure without counting even the first element of the array. With the C90-style struct hack, by contrast, we must use offsetof(type, last_member) so that the [1] element is excluded from the calculation:

  #include <stddef.h>  /* size_t, offsetof */

  struct foo {
    /* ... */
    int array[ZERO_OR_ONE];
  };

  /* Correct in C99, if ZERO_OR_ONE is 0
     Incorrect in C90 (ZERO_OR_ONE can't be zero).  */
  size_t foo_plus_3_elems = sizeof (struct foo) + 3 * sizeof (int);

  /* Correct in C99 whether ZERO_OR_ONE is 0 or 1.
     Correct in C90 with ZERO_OR_ONE being 1. */
  size_t foo_plus_3_elems_alt = offsetof (struct foo, array) + 3 * sizeof (int);
Here, "incorrect" means we calculate slightly more storage than needed, usually without any downside.

"Correct in C90" means de facto correct, not that it was well-defined behavior. "Everyone" was doing it.

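A small sketch of the two size computations, assuming a C99 compiler and the standard [] spelling rather than [0]. `struct rec`, `rec_size_sizeof`, `rec_size_offsetof` and `rec_new` are made-up names. On typical ABIs the offsetof figure comes out smaller, because sizeof also counts the trailing padding after the last header member.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, offsetof */
#include <stdlib.h>  /* malloc, free */

/* Hypothetical record with a C99 flexible array member. */
struct rec {
    int  len;
    char tag;
    char data[];     /* incomplete array type: the array itself is not counted by sizeof */
};

/* Bytes for the header plus n payload bytes, counted from sizeof
   (which may include padding after tag)... */
size_t rec_size_sizeof(size_t n)
{
    return sizeof(struct rec) + n;
}

/* ...or counted from where data actually starts. */
size_t rec_size_offsetof(size_t n)
{
    return offsetof(struct rec, data) + n;
}

/* Allocate a record with room for n payload bytes, using the larger
   sizeof-based figure, which never undercounts. */
struct rec *rec_new(size_t n)
{
    struct rec *r;

    assert(rec_size_offsetof(n) <= rec_size_sizeof(n));
    r = malloc(rec_size_sizeof(n));
    if (r != NULL) {
        r->len = (int)n;
        r->tag = 0;
    }
    return r;
}
```

Either figure leaves room for the payload; the sizeof one just may waste a few bytes of padding, which matches the "slightly more storage than needed" remark.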

I think you made a small mistake: an "incomplete array type", as allowed by the clause you cited, is one with empty brackets (int foo[]). Per 6.7.5.2 ¶4:

> If the size is not present, the array type is an incomplete type.

This is different from specifying the size as 0, which is still not allowed, per 6.7.5.2 ¶1:

> In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero.

Oh, and for the record, while you may be alluding to this in your last paragraph: another advantage of flexible array members is that indexing them is actually well-defined per the spec (if enough space has been allocated, of course), while AFAIK the spec has never added a special case for length-1 arrays, so accessing the "extra elements" of one is technically undefined behavior. However, common compilers like GCC do treat length 1 specially, as an exception to optimizations that generally assume you won't index out of bounds, so "technically" really does mean "technically": it's not something a random future version of GCC is likely to break.


Even though I got the quote right out of C99, I still mixed it up.

The reason for this is that... I don't use flexible array members myself, or the [0] GCC extension either.

Flexible array members don't seem to bring anything to the table other than formal definedness of behavior. I can't imagine an implementation that supports flexible array members in its C99+ modes but makes [1] fail in those same modes. And in any old compiler without C99 support, the formally undefined struct hack is all you have.

I don't mind using offsetof(type, member) to obtain the size of the header before the array, rather than sizeof(type).

The classic hack is portable to C90 and C++98, making it good for "Clean C".

That said, flexible array member is elegant in the sense that it just uses an incomplete array type similarly to a file scope array declaration. Using [1] is a bit like while (1) instead of for (;;). I don't like the [1] in the struct hack, but I don't use C for its beauty.


This is right: a flexible array member is denoted with []. If you update the OP's example to try and put two of them in, you get this error from GCC:

  x.c:7:10: error: flexible array member not at end of struct


> "Correct in C90" means de facto correct, not that it was well-defined behavior. "Everyone" was doing it.

If "everyone" is defined as "people using GCC."


The use of an array of size [1] at the end of a structure is far broader than users of GCC. It is not a GCC extension; it's a trick based on undefined behavior: accessing [2], [3], [4] ... in that array, knowing that the space was malloced. It depends on the pointer arithmetic working while the optimizer looks the other way.

C FAQ: [http://c-faq.com/struct/structhack.html]

"Despite its popularity, the technique is also somewhat notorious: Dennis Ritchie has called it ``unwarranted chumminess with the C implementation,'' and an official interpretation has deemed that it is not strictly conforming with the C Standard, although it does seem to work under all known implementations. (Compilers which check array bounds carefully might issue warnings.)"


I got the impression it was a GCC extension because it's listed in the "C Extensions" of the GCC documentation ( https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html ).

EDIT: the FAQ actually refers to one-element arrays, whereas GCC makes a big deal about saving memory by accepting zero-length arrays. But you are correct that this kind of cleverness wasn't limited to GCC ( https://blogs.msdn.microsoft.com/oldnewthing/20031212-00/?p=... , https://blogs.msdn.microsoft.com/oldnewthing/20040826-00/?p=... ).


Three different ways to declare an array at the end of a structure:

[1] is the old school approach, which is still honored by compilers despite indexing past the first element technically being undefined behavior.

[0] is a GCC extension.

[] is C99.
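The three spellings side by side, as a sketch that assumes a C99 compiler. The [0] form is wrapped in `#if defined(__GNUC__)` on the assumption that only GNU-compatible compilers (GCC, Clang) accept it; struct and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, offsetof */

/* [1]: the old C90 struct hack; indexing past data[0] is technically UB. */
struct msg_c90 { size_t len; char data[1]; };

#if defined(__GNUC__)
/* [0]: GNU C extension; a strictly conforming compiler must diagnose it. */
struct msg_gnu { size_t len; char data[0]; };
#endif

/* []: C99 flexible array member; allowed only as the last member. */
struct msg_c99 { size_t len; char data[]; };

/* Header size before the payload. offsetof gives the right answer for every
   form; sizeof would also count data[0] in the [1] form. */
size_t header_size(void)
{
    assert(offsetof(struct msg_c90, data) == offsetof(struct msg_c99, data));
#if defined(__GNUC__)
    assert(offsetof(struct msg_gnu, data) == offsetof(struct msg_c99, data));
#endif
    return offsetof(struct msg_c99, data);
}
```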


That seemed pretty straightforward to me. A zero size array is better thought of as a pointer to that part of the struct, and if you put two back to back you get two pointers to the same spot. I'm not sure what that guy was expecting to happen.

The usual reason to use them is when you have a protocol header with a variable-length data portion. Putting the zero-length array in the struct gives you easy access to the data without changing the sizeof() of the header, so you can avoid a bit of pesky pointer math and make the code easier to read.
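A minimal sketch of that header-plus-payload layout, written with the C99 [] form. `struct pkt` and `pkt_new` are hypothetical; this only shows the in-memory layout and does not deal with byte order or packing for an actual wire format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header followed by a variable-length payload. */
struct pkt {
    uint16_t      type;
    uint16_t      len;      /* number of payload bytes after the header */
    unsigned char data[];   /* labels the start of the payload */
};

/* Build a packet in one allocation. sizeof(struct pkt) stays the header
   size no matter how large the payload is. */
struct pkt *pkt_new(uint16_t type, const void *payload, uint16_t len)
{
    struct pkt *p;

    assert(payload != NULL || len == 0);
    p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    p->type = type;
    p->len  = len;
    if (len > 0)
        memcpy(p->data, payload, len);  /* no manual (char *)p + offset math */
    return p;
}
```

Code that only cares about the header can take a `struct pkt *` and never look at `data`.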

Thinking about this some more, you could also use this as a ghetto form of union, but I'm not sure why you would want to.

  struct overlay
  {
    char theBytes[0];
    int  theValue;
  };
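If the goal is only to view the same storage two ways, standard C already has a union for that, and no zero-length array is needed. A minimal sketch; `union int_bytes` and `int_byte` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>  /* size_t */

/* Every union member starts at offset 0, so bytes overlays value. */
union int_bytes {
    int           value;
    unsigned char bytes[sizeof(int)];
};

/* Return byte i of v's object representation; which byte is the
   low-order one depends on the machine's endianness. */
unsigned char int_byte(int v, size_t i)
{
    union int_bytes u;

    assert(i < sizeof u.bytes);
    u.value = v;
    return u.bytes[i];
}
```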


Funny, I used a union once to port code when I found out zero-sized arrays were a gcc thing:

  /*
   * SC C Compiler 8.8.4f1 (part of MPW 3.3) does not support zero sized
   * arrays. I use this 'clever' trick involving the preprocessor that
   * guarantees everything is aligned appropriately. It is possible with
   * this that some space will be wasted between the header and the
   * payload if a single element of the payload is larger than the
   * header, but that is not the case here.
   */
  #if 0
  struct state {
      mbhdr_t       st_mb;      /* Screen 1, Line 1 */
  
      short         st_serln;   /* Screen 1, Line 2 */
      short         st_astln;
      unsigned char st_count;
      unsigned char st_unit;
      char          st_qs[2];
  
      char          st_units;   /* Screen 1, Line 3 */
      avunit_t      st_avms[0];
  };
  #else
  struct state {
      union {
          struct {
              mbhdr_t       suh_mb;       /* Screen 1, Line 1 */
          
              short         suh_serln;    /* Screen 1, Line 2 */
              short         suh_astln;
              unsigned char suh_count;
              unsigned char suh_unit;
              char          suh_qs[2];
          
              char          suh_units;    /* Screen 1, Line 3 */
          }   su_header;
          avunit_t  su_align[1];
      }   st_union[1];
  };
  
  #define st_mb     st_union[0].su_header.suh_mb
  #define st_serln  st_union[0].su_header.suh_serln
  #define st_astln  st_union[0].su_header.suh_astln
  #define st_count  st_union[0].su_header.suh_count
  #define st_unit   st_union[0].su_header.suh_unit
  #define st_qs     st_union[0].su_header.suh_qs
  #define st_units  st_union[0].su_header.suh_units
  #define st_avms   st_union[1].su_align
  #endif


> A zero size array is better thought of as a pointer to that part of the struct, ...

An array of any size is best thought of as an array. Arrays are not pointers. (See section 6 of the comp.lang.c FAQ, http://www.c-faq.com/.)

A zero size array is best thought of as illegal (if you want portability) or as a compiler-specific extension (if you don't mind depending on a particular compiler, perhaps gcc).


Technically correct, but using the zero length array trick creates clearer, more self documenting code. It's a tradeoff between readability and portability.


> Technically correct,

Or, as I prefer to say, "correct".

> but using the zero length array trick creates clearer, more self documenting code. It's a tradeoff between readability and portability.

Clearer compared to which alternative? I find C99-style flexible array members (defined with "[]") quite clear.


The guy was expecting a compiler error for bogus constructs, which C99 provides for flexible array members: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html


A zero sized array can only be used at the end of the struct. It only makes sense there.


You can use zero sized arrays anywhere in a struct. However, I think you can make the argument that they don't really make sense anywhere.


> C makes it very easy to get things subtly wrong. Yet another item to add to your code review checklist. Better yet, don't use zero size arrays unless you really have to.

You don't have to. C has flexible arrays in structs for this, and they actually work.


C99, yes. Earlier versions didn't have that.


And this is why in 2016 I'm still stuck with ANSI C. First person to tell me ANSI C is not C89 anymore wins a prize! I love C, but I can't think of any other language whose updates failed to gain traction like this. Maybe it's similar to the failure of Python 3, but even Python 3 is getting reasonable uptake.


ANSI C is not C89 anymore. Where's my prize? 8-)}

The term "ANSI C" is still very commonly used to refer to the language described by the 1989 ANSI C standard. This usage is strictly incorrect, but too firmly entrenched to ignore.

The 1990 ISO C standard describes the same language, and was officially adopted by ANSI, making the 1989 standard obsolete. The 1999 and 2011 ISO standards, both also officially adopted by ANSI, each officially made all earlier standards obsolete. If you want to refer to the language defined by the 1989 ANSI C standard (and described in K&R2), call it "C89" or "C90".

C11 might get more traction than C99 did, since gcc defaults to C11 plus GNU extensions starting with release 6. (But Microsoft's C compiler is still behind the times.)


All the compilers I ever used in my professional career could do C99 but most companies limit themselves to ANSI C (good luck convincing anyone that the term is incorrect!) for "compatibility" reasons and it's got nothing to do with Microsoft (who uses their compilers for C anyway?).

Your prize for pointing out that the usage of ANSI C is incorrect is: disappointment in humanity.


I used to use VC++ to build C code sometimes. People write software in C, and if you use VC++ you might want to compile it. C and C++ are not the same, and the fixing-up process can be time-consuming and error-prone, even assuming the code didn't go full C99 with anonymous aggregates and designated initializers.

Fortunately, these days VS2013+ do an OK job of C99.


My experience is that C99 has been universal outside the Microsoft world for years now - even for embedded programming. I can't remember the last time there was any trouble using it.


As far as I can tell, Microsoft is a C++ shop, which is why they didn't care about C after C89. For embedded stuff my memory is hazy, but I think much of what became C99 was available a number of years before the standard was released.

In any event, 1999 was 17 years ago.


I'm reasonably sure that in C++ at least accessing an element outside the array is always UB. Is this legal in some sufficiently modern C?


Because it's a widely-used hack that predates optimizers' obsessive exploitation of undefined behavior, it's become part of the language's customary semantics and is supported everywhere I know of.

C99 adds a feature called flexible array members. If the final member of a struct is an array with no length (blank rather than zero), you're allowed to use the array to access memory after the struct with the appropriate subscripts.
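A minimal sketch of that rule in use, assuming C99: a hypothetical `struct ilist` keeps its elements in a flexible array member, grows with realloc, and is only indexed within the elements actually allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable int list kept in a single block. */
struct ilist {
    size_t n;
    int    v[];   /* C99 flexible array member */
};

/* Append x, growing the block with realloc; pass NULL to start a list.
   Returns the (possibly moved) list, or NULL on failure, in which case
   the old list is untouched and still owned by the caller. */
struct ilist *ilist_push(struct ilist *l, int x)
{
    size_t n = (l != NULL) ? l->n : 0;
    struct ilist *m = realloc(l, sizeof(struct ilist) + (n + 1) * sizeof(int));

    if (m == NULL)
        return NULL;
    m->v[n] = x;   /* subscripting past the struct proper is defined for [] */
    m->n = n + 1;
    return m;
}

/* Read element i, staying inside the allocated elements. */
int ilist_get(const struct ilist *l, size_t i)
{
    assert(l != NULL && i < l->n);
    return l->v[i];
}
```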


It's not only an optimizer that could throw you off if you really are relying on overflowing an array's bounds. While C doesn't require bounds checks, it also doesn't prohibit them, so a safety-oriented compiler could insert bounds checks (except in the special case in structs you mention, which by the standard has to be allowed). That exists in the real world too; the Intel C compiler can insert bounds checks, though they aren't enabled by default.


Bounds checking would behave okay, I guess. You usually allocate such a struct this way:

    t_packet *packet = malloc(sizeof(*packet) + payload_size);


The struct would be stored in a larger area of memory that has previously been malloc'd (to allow room for the data that you want to put after the header). So long as accesses to the data are before the end of the malloc, it won't be out of bounds and so won't be undefined behaviour.


Prior to its deletion for reasons unknown, there was a post that (reading between the lines) was calling you out on the term "access", which doesn't appear to encompass writing: http://port70.net/~nsz/c/c11/n1570.html#3.1

I'd class that as a valid quibble, if my interpretation is correct, but a quibble nonetheless. The general intent is clear enough from the standard, I think: http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p18

Further reading on the subject of undefined behaviour (apropos of nothing in particular): http://port70.net/~nsz/c/c11/n1570.html#3.4.3p1, http://robertoconcerto.blogspot.co.uk/2010/10/strict-aliasin..., https://groups.google.com/forum/#!msg/boring-crypto/48qa1kWi..., http://blog.metaobject.com/2014/04/cc-osmartass.html


Your second link seems to agree with my original post, i.e. it is not UB. Read on to paragraph 20: http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p20, which talks about my example of putting the struct in a malloc'd area of memory with space after the struct.


What's the equivalent pattern in C++ to do this then? Specifically a contiguously allocated but variable length struct. I was thinking you can use template:

    template <int N>
    struct ion_system_heap {
        struct ion_heap heap;
        struct ion_page_pool *pools[N];
    };
But that is not quite the same thing, since ion_system_heap<10> and ion_system_heap<20> won't be the same type, which makes it super hard to use. There are a whole bunch of other problems too, like not knowing at compile time how many pages you want to allocate.

Yes you can just use a vector for "pools" but that allocates it on the heap instead of in the struct.


Two objects of the same type must be the same size; if they weren't, how would sizeof() work, or vector<> for that matter?


This problem is also on stackoverflow: http://stackoverflow.com/a/1391001.

It solved it by wrapping with a superclass that you downcast from. You won't be able to work with them by value due to object slicing but you can still have a vector of pointers:

    vector<Panda_without_bamboo*> pandas;
I am not sure how to do the downcasting though. Assuming the classes look like this:

    class Panda_without_bamboo
    {
    public:
       int bambooCount;
    };

    template<std::size_t N>
    class Panda_with_bamboo : public Panda_without_bamboo
    {
       int a;
       int b;
       Bamboo bamboo[N];
    };
You can't really do this, because a template argument has to be a compile-time constant:

    static_cast<Panda_with_bamboo<pandas[0]->bambooCount>*>(pandas[0])


So it does seem like using flexible array members (first answer in that stackoverflow and the topic discussed in this submission) is a cleaner solution if you need contiguous allocation: http://www.codeatcpp.com/2007/10/dynamic-array-template.html


Of course you can create a vector of pointers and do whatever you want.

If you need contiguous memory though this is likely a terrible way to go about it since you are not only having to dereference pointers that could go anywhere, but each one will mean a separate heap allocation which will be slow for small objects and not scale with concurrency.


Note to future me: I checked the C++98 standard. a[i] is defined as *(a+i); the size of the array is not mentioned. So if a points somewhere where a+i is a legal access, this should be fine.

Edit: But casting the allocated memory around to a struct type might violate aliasing rules...


As far as I know, even things like:

  uint16_t array[] = { 1, 2, 3, 4 };
  cout << "Hello World " << 3[array] << endl;

are valid and well defined. Part of what makes memory access validation such a pain in the ass in C++ is that no real attempt is made at specifying rules for how you can come up with memory locations; there are just rules on which locations you can read and write. You can do things like "*((int *)rand()) = 2", and nothing in the standard says this is an invalid thing to do, so long as you are lucky enough for rand() to point somewhere allocated.
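The 3[array] trick follows directly from the definition a[i] == *(a + i): addition commutes, so i[a] names the same element. A small C sketch of the three spellings; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Each function reads element i of a, spelled a different way. */
uint16_t by_subscript(const uint16_t *a, int i) { return a[i]; }
uint16_t by_pointer(const uint16_t *a, int i)   { return *(a + i); }
uint16_t by_swapped(const uint16_t *a, int i)   { return i[a]; }

/* Return 1 if all three spellings agree for every index of an
   n-element array, 0 otherwise. */
int spellings_agree(const uint16_t *a, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (by_subscript(a, i) != by_pointer(a, i) ||
            by_pointer(a, i) != by_swapped(a, i))
            return 0;
    return 1;
}
```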


This brings back so many memories of programming in C / C++ with pointers.

Good times.


What are you up to now?


What is the benefit of using a zero sized array over a pointer?


you can do

  x = malloc(sizeof *x + variable_length);

and x->zero_array then refers to the variable-length part.

The advantage is that you get contiguous memory and do a single malloc.

As opposed to:

  x = malloc(sizeof *x);
  x->ptr = malloc(variable_length);

Here x and x->ptr are not necessarily contiguous.
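Both allocation strategies as a sketch, assuming C99; `struct blob_fam`, `struct blob_ptr` and their helper functions are hypothetical names. The flexible-array version needs one malloc and one free, while the pointer version needs two of each plus a failure path for the second malloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One allocation: the text follows the header in the same block. */
struct blob_fam {
    size_t len;
    char   data[];             /* C99 flexible array member */
};

/* Two allocations: the text lives wherever the second malloc put it. */
struct blob_ptr {
    size_t len;
    char  *data;
};

struct blob_fam *blob_fam_new(const char *s)
{
    size_t len;
    struct blob_fam *b;

    assert(s != NULL);
    len = strlen(s) + 1;                          /* include the '\0' */
    b = malloc(sizeof *b + len);
    if (b == NULL)
        return NULL;
    b->len = len;
    memcpy(b->data, s, len);
    return b;                                     /* one free(b) releases it all */
}

struct blob_ptr *blob_ptr_new(const char *s)
{
    size_t len;
    struct blob_ptr *b;

    assert(s != NULL);
    len = strlen(s) + 1;
    b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(len);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->len = len;
    memcpy(b->data, s, len);
    return b;
}

void blob_ptr_free(struct blob_ptr *b)
{
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}
```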


But can't you write it so that it is contiguous using a pointer?


No, because a pointer points to some location in memory, not whatever follows the memory after the location of the pointer.

See [0] for a picture.

[0] http://c-faq.com/aryptr/aryptr2.html


You can:

  x = malloc(sizeof *x + variable_length);
  x->ptr = (char *)(x + 1);


Except it requires sizeof a pointer more memory and an extra assignment after the malloc, which is why people use the zero-sized array instead (note this comment was to explain for the grandparent why you would normally prefer a zero-sized array instead).
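For comparison, a sketch of the pointer-into-the-same-block approach discussed just above, assuming C99; `struct self_ptr` and `self_ptr_new` are hypothetical. Note that the pointer is set with `(char *)(x + 1)`: `x + sizeof(x)` would scale by the size of the whole struct and overshoot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header whose pointer member is aimed at the bytes that
   follow it in the same allocation. */
struct self_ptr {
    size_t len;
    char  *data;   /* costs sizeof(char *) bytes a flexible array would not */
};

struct self_ptr *self_ptr_new(const char *s)
{
    size_t len;
    struct self_ptr *x;

    assert(s != NULL);
    len = strlen(s) + 1;
    x = malloc(sizeof *x + len);
    if (x == NULL)
        return NULL;
    x->len  = len;
    x->data = (char *)(x + 1);  /* first byte past the header */
    memcpy(x->data, s, len);
    return x;                   /* one free(x) releases header and text */
}
```

One more downside besides the extra pointer: copying the struct by value leaves the copy's `data` pointing into the original block, whereas a flexible array member has no pointer to go stale.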


I guess the simplest example is struct header { size_t size; char data[0]; }, where 'data' simply labels the beginning of what follows the 'header'.


I do that all the time, packets come in, packets go out. They have headers and 'data'. A lot of code that deals with the header information does not need to know anything about the data part.


When using shm_open and mmap, there is hardly a shorter solution.


Is there a difference between

  struct { int something; int someArray[]; }

and

  struct { int something; int someArray[0]; }

?


Yes; the latter one is the "flexible array member" which is supported in ISO C99 and later. The former is a constraint violation requiring a diagnostic.


Did you mean the other way around? I think someArray[] is a flexible array member and someArray[0] is a constraint violation.


You are right. OP has it backwards. gcc treats someArray[0] almost the same as someArray[].

https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gcc/Zero-Length.htm...

Other compilers might have similar treatment for someArray[0].


Just don't try to have both a zero sized array and a flexible array in the same struct. g++ 5 or earlier will compile that, but g++ 6 will not: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550


My C is rusty. I would have thought the former declared a pointer in the second member of the struct, i.e. typically a 4- or 8- byte field.

edit: clarifying, I thought int somearray [] was the same as int* somearray.


I may be wrong, but I think someArray[] only does that for function parameters.


Also for external declarations:

   /* file scope */
   int a[];

   /* later in same translation unit, or in another one: */

   int a[42];
In this case, we get an "incomplete array type" after the first declaration. The storage size isn't known and so sizeof a cannot be computed.

This is also a de facto kind of "flexible array", but not formally named such.

For instance we can do this:

   /* In proprietary kernel source code */
   /* Size deliberately unspecified */
   extern struct driver driver_table[]; 
The proprietary kernel code is compiled into .o files and linked into a .a archive library.

To complete the build, a driver table module must be added and some drivers. The module which defines driver_table can be open-source. It defines the size of driver_table by how many entries there actually are; users can configure as many or few drivers and re-build the kernel, without having to recompile the proprietary part in the .a archive.

Unixes were shipped like this once upon a time. The "device major number" is just the position in the device table.
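A single-file sketch of that driver-table pattern, assuming C99. In real life the two halves would be separate translation units; `struct driver`'s fields, `driver_count`, `driver_for_major` and `null_open` are all hypothetical. Because the "kernel" half cannot apply sizeof to an incomplete array, this sketch assumes the configuration module also exports the entry count.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical driver entry; fields are illustrative. */
struct driver {
    const char *name;
    int (*open)(int minor);
};

/* "Kernel" side: only the incomplete array type is visible here, so
   sizeof driver_table would not compile; the count comes separately. */
extern struct driver driver_table[];
extern const size_t driver_count;

/* The major number is simply the index into the table. */
const struct driver *driver_for_major(size_t major)
{
    const struct driver *d = (major < driver_count) ? &driver_table[major] : NULL;

    assert(d == NULL || d->open != NULL);
    return d;
}

/* "Configuration" side (normally its own file): the initializer
   completes the array type and fixes the table's size. */
static int null_open(int minor)
{
    (void)minor;
    return 0;
}

struct driver driver_table[] = {
    { "null", null_open },
    { "tty",  null_open },
};
const size_t driver_count = sizeof driver_table / sizeof driver_table[0];
```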


Formal arguments of array type are transformed into pointers.


Isn't the pointer a zero size array?


I think you dropped this: \

¯\_(ツ)_/¯


Some software just deletes all backslashes instead of handling them properly, which in this case would mean converting them to an HTML entity (&#92;).



