I am ignorant of this particular aspect of C and most of C in general, but shouldn't the kernel developers try to follow the C standard instead of just a compiler's?
> it makes sense to treat GCC (actually "the C89 standard plus GCC extensions") as the standard.
While that might have been the case, this announcement says that one of the many reasons to stop using VLAs is to allow the kernel to be compiled with clang. The announcement reads like being able to compile the kernel with other compilers is a very desirable property, that has taken many years of hard work to achieve due to the incorrect assumption that "GCC is the C standard".
So while that assumption might have made sense back then, it does not appear to make sense now. If you treat one compiler as "the standard", chances are that that's the only compiler that you will ever be able to use. That's a bad strategic decision for a big project like the Linux kernel.
The C standard covers a lot of details and cases that the kernel doesn't need to worry about. That makes the C standard a lot more restricted and constrained. GNUC has some convenient features and extensions because it makes assumptions not guaranteed by the C standard that happen to work for all the platforms that Linux cares about, and also, not beeing driven by a comittee, it can generally iterate and improve the language in a more agressive and fast way.
Wrt C, unfortunately I believe there was a period where the standard was dictated by people who had no interest of even supporting previous versions of the standard.
You don't have to pass void to sizeof explicitly (which, by the way, is an operator, not a macro). The typical use case is arithmetic on void pointers, for example:
I'm actually kinda sad that VLAs didn't work out in general (they have been downgraded to an optional feature in the latest C standards). For all the downsides, they make working with matrices in C much easier.
Yeah, VLA is one of my favorite features in C. I would love if it was viable to use them for arbitrarily large temporary arrays. They lead to much cleaner code. Instead of
{
float *x = malloc(n * sizeof*x);
...
free(x);
}
you do simply
{
float x[n];
...
}
In image processing, you often need large temporary images, but it is dangerous to distribute code such as the above unless you play with the stack limits from outside your program.
It's non-standard (although we have tried to get it added to the next Cxx standard) but you can use __attribute__((cleanup)) here. In systemd code it is common to write this as:
_cleanup_free_ float *x = malloc (...);
Since cleanups are supported by both GCC and Clang it's not a real problem to use them on Linux, BSD and MacOS. We recently added them to libvirt, have used them in libguestfs for a long time, and as I mention above they are used in systemd for years. They are also applicable to many other cases such as automatically closing files at the end of the current scope.
Its unfortunate that C and C++ kept the mindset that automatically memory managed variables are stored on a SMALL stack, with the consequence of exceeding that small and unknown size being that your program crashes.
For variable sized arrays, there is already a bit of overhead in sizing the allocation. Would it really have been impossible to move large allocations to the heap with automatic free at exit of the scope?
> Its unfortunate that C and C++ kept the mindset that automatically memory managed variables are stored on a SMALL stack
This is not a property of the language, it's a property of the runtime its running on.
In a normal userspace program under Linux (at least), you can stack allocate megabytes and the stack will dynamically resize (note that shrinking the stack might not happen in a timely manner).
But in kernel space, there's a conscious decision to keep the stack small and not dynamic. While it's not impossible to have dynamic stack in kernel space, that has a lot of implications.
>For variable sized arrays, there is already a bit of overhead in sizing the allocation. Would it really have been impossible to move large allocations to the heap with automatic free at exit of the scope?
Not a compiler writer here so I'm not sure. I think it wouldn't be too difficult, but then this goes very much against the exact control you normally expect from C. And violating expectations like this is usually not good.
It would also produce bigger functions in a somewhat opaque manner, because you would need the code to deal with malloc and free and stack allocation at the same time.
You just can't recurse and the same function can't be called from isr and from main context at the same time. Basically, nothing is re-entrable. Otherwise it's quite fine. To be sure, you shouldn't be doing those things anyway if you have 64B of RAM and 512 words or ROM. :D
And even if it has a heap, the heap might not be available. In a kernel, the function might have been called in interrupt or atomic context; in userspace, the function might be part of the implementation of the heap itself, or a low-level function called before the heap is initialized.
C++ has std::vector, which doesn’t have any implicit size limitations. I haven’t kept up with the details, so this may have changed, but during the discussions for what C++11, the committee decided not to include VLAs in the official standard because the library already included std::vector.
The problem with vector is that it always heap-allocates. The ideal approach is the one where small allocations are on the stack, and large ones are on the heap, without the API user having to do anything about it.
C++ has had a proposal for std::dynarray floated for a while, which would basically have semantics allowing stack allocation, but wouldn't mandate it, leaving it as a quality of implementation issue (so it would be legal to implement it on top of vector, but it was assumed that compilers would go for optimizations given the opportunity). It didn't pass, unfortunately.
I guess in theory, you could pull it off with a very clever allocator. Bloomberg’s BDE has an allocator along those lines ( https://bloomberg.github.io/bde/classbdlma_1_1BufferedSequen... ), but I don’t think it made it into the standard (memory_resource header).
My point is only that runtime-sized arrays may have made it into the C standard, but they aren’t in the C++ standard.
The word "large" does not really mean anything. You are always using the stack, and there is not portable way to check whether it is full or not. Now that is unfortunate.
VLAs date back all the way to Algol 60. Most Algol-family languages inherited that trait, and many relied on it heavily. C (actually, B) excluded them because it was deliberately lower-level than was the norm for Algol family before.
The big difference is that in the Algol 60 family bounds checking was enforced.
"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."
-- C.A.Hoare on his Turing's award speech
C and B designers had another point of view regarding security, or lack of thereof.
The point was that the way VLAs were designed in C99, it was a motorway to stack corruption, leading to dropping it out as optional feature in C11 and C++ not even considering them.
I don't see anything in the wording of C99 around VLAs that requires them to always be allocated on the stack. That implementations did so regardless is, arguably, a defect on their part, and a quality-of-implementation issue.
Well, the whole point of VLAs was supposed to placate the math/numeric crowd somewhat, and they generally expect things to be a little bit higher-level than what's otherwise mandated by the C "zero overhead" philosophy. Since VLAs are opt-in, code that cares more about perf could always skip them and not pay the tax.
But yes, it was probably too naive to expect that to work out.
There's no reason why VLAs have to be allocated uniformly on the stack. The C standard doesn't say that "auto" = stack. It just says that it has to be automatically deallocated when it's out of scope.
But even as implemented today, in practice, it all works fine if you're not working with really large arrays. In a lot of scenarios, you do not. This isn't really any different than using recursion when you reasonably assume that recursion depth is not going to be large. Sometimes it's just easier and clearer all around.
That it the main reason why so many CVEs are related to memory corruption in C code as cause, in practice things do not work fine.
Which was the main theme at Linux Kernel Security Summit 2018.
Just for info, if not wanting to bother watching Google's talk, 68% of Linux kernel exploits were caused by out of bounds errors, including those caused by VLAs misuse.
Though much less strict, this brings to mind the JPL Coding Standard for C [1] - it seems like a similar idea as a lot of JPL's rules, in that it is much more difficult to verify and ensure the stability of less static data structures; compare:
3: Use verifiable loop bounds for all loops meant to be terminating
The hard part with the kernel is its fanciful build system, Kbuild. It's great at turning off compilation of subsystems that you don't plan to use. The difficulty is that there are a large number of mutually exclusive number of configurations, so the config that's called `allyesconfig` really isn't "all" because you can only pick one of the mutually exclusive configs. Additionally, some configs don't just add/remove code, but can change which implementation gets compiled in (all via C preprocessor and -D flags).
Those configs mean there are massive combinations of what is actually set, and which code you're actually trying to compile. I imagine that's similarly difficult for static analysis; due to the build system and what code you run it over. I think a simpler approach with grep might actually work better.
I think they mean it’s removed for most of the modules in the kernel and therefore the ones that may be lurking about still aren’t being compiled on most configurations.
According to the article, “there may be a couple more VLAs hiding in hard-to-find randconfigs.”
I think VLA becomes a compiler feature. We can disable VLA using a GCC option flag: -Werror=vla.
So it should be quite easy to detect if VLA is used in the source.
The downside is if n is large you use a lot of stack, but you wouldn't notice if it's mostly called with low values. So a bit of a ticking time bomb.
I guess if you have a really large n, like more than a few pages worth, then certain values of m[i] might wind up in some other page allocation where it shouldn't.... I seem to recall a vulnerability like this a year or two ago. (Normally a stack can grow on a page fault by hitting a guard page, but if the n above is absurdly large...)
As @benchaney said you have to choose an appropriate guard. But the issues with VLAs are many
* the code is bigger, and people say is meaningfully slower
* super easy to accidentally blow the stack - in kernel land that is probably going to end in a panic
* using vLAs used to disable stack canaries for those frames, not sure if it still does
For a lot of cases where you are truly going to need variable length you’re unlikely to be penalised heavily by a heap allocation due to the relative cost of the operations.
I think it's worth noting that in kernel-land most things have fixed maximum sizes anyway. So while a VLA may seem like it's saving time it is probably cheaper to just use a fixed size array in most situations. In the cases where you really need something of dynamic size I suspect kmalloc isn't that expensive vs the pain that an implicit alloca can have at the wrong time. I'd also guess they probably don't play well with some interrupt handling code as well, as most interrupt handlers can't trigger a page fault or the kernel will panic. It's also worth noting that in general kernel stacks are usually much smaller than userland stacks almost to the point of painful.
really ? there's a difference but unless you keep allocating in a hot-loop this should hardly ever have observable costs : https://gcc.godbolt.org/z/SN9Ois
The problem is with how to access other variables on the stack. Usually the compiler uses an offset from the stack pointer, which without VLA's is known at compile time. However, if you have VLA's, you might need a calculation at runtime to figure out where that variable is stored. Or maybe the offset from the frame pointer is known statically, but then you can't do the "omit frame pointer" optimization and you thus waste a register. Or if you have a variable that is between two VLA's, then you need a runtime calculation regardless.
A really nice use for a VLA would be in the middle of a struct. You could make something like a PNG image file chunk as a struct, with the CRC at the end coming after the VLA. You get a struct of the correct size with the right fields, despite one part of the struct being of variable size.
The function fills in the struct. Maybe it then sends the struct via a packet-oriented protocol, perhaps over a datagram socket, so the length is transmitted implicitly in the packet size.
That’s different from a VLA, which doesn’t use dependent typing or any reliance on users keeping variables unmodified to guarantee they clean up the stack properly.
There is no problem cleaning the stack because the compiler has numerous ways to do that. One way is to keep a copy of the original n, as passed to the function. Another way is to save the original stack pointer.
Making this work is not much harder than supporting the alloca function.
Then you have to read part of the file, read some size fields, make a struct on the stack (for an image file??), copy the stuff you've read in already into that struct, and keep reading. And you now need to define how to pass a pointer of that struct type to helper functions for operating on a PNG.
I am not sure restricting the type of the size parameter makes VLAs a good idea if such a constraint is necessary. It seems brittle and too subtle. Will future readers get that this is the reason for the type choice? Aren't you one refactor away from making it effectively unbounded again? (I suppose a typedef and a comment can make the choice seem more explicit.)
Of course whether any of this matters will depend on the specifics of the matter, kernel code being the most conservative by necessity.
I'm not sure how a restriction would work. The kernel stack is 16kb (meaning uint16_t is already unsafe) & you don't actually know where in the stack you are when creating the VLA so even if you did restrict yourself to uint8_t that doesn't protect you in any way.
There's a huge difference in stack vs. heap allocations. Stack space might be effectively infinite in the basic user-space world, but not in any environment using green threads and certainly not in the kernel. In the Linux kernel this is made far worse by the fact that each task's stack sits right above the task structure itself, so a huge allocation might smash the task structure or the next task's stack. The latter is exactly what I had to debug a couple of jobs ago, and it was one of the more difficult debugging journeys of my long career (it was quite sporadic and manifested as hard hangs). VLAs make that kind of scenario much more likely. Being able to check the length before a potentially disastrous allocation is cumbersome, but safer.
There's no way to return an error from a VLA allocation, and also within C you have no standard way of knowing how much is safe to allocate on the stack (and in fact it may not be possible to know).
So C's VLA feature has been implemented on stack space. Why not on the heap? Seems safer. Is this strictly about convenience for implementation? Reclaiming stack memory at the end of the scope is already implemented, so I understand the motivation.
What I don't understand is: after all these decades of stack smashing vulnerabilities, why would such a shortcut have been taken?
Convenience costs performance, safety costs performance ... at some point you just have to suck it up and pay the performance penalty to make things safer and to improve programmer productivity.
You'd have to get longjmp to understand how to unwind these heap-allocated arrays, not only from the current function's stack frame, but for all the ones in the longjmp'd-over frames. In C-land, longjmp simply overwrites the stack pointer (and all other registers), which implicitly "deallocates" all the VLA arrays, and all alloca() allocations, in one fell swoop. There is no notion of a step-by-step unwinding of the stack, so it would be a big task to try to trap all intermediate VLA arrays.
I'd guess because VLA on the heap already existed and it's called malloc. The only difference would be free() being called automatically but that is not something the C standard would do lightly.
Stack allocation is easier to implement and faster than heap allocation. The former can be implemented in a single instruction (incrementing the stack pointer).
Are there any numbers as to what difference this will make to the compiled size and speed of the kernel? I imagine the difference in size will be tiny.
Did they profile before optimising[1] and identify it as a problem?
There is a Linus quote in the article: "It generates much more code, and much _slower_ code (and more fragile code),". "generates much more code" means a size difference.
If he had just said what you quoted he would be wrong in the general case.
He actually said it is all those things in comparison to doing fixed allocation. The reason that is relevant is the kernel stack is so small you can only use a VLA when you know in advance the upper bound on the size is small, but if it's small you may as well use the fixed upper bound, and if you do that it is always faster, smaller and less fragile than using a VLA.
He's wrong in the general case, because user land C programmers will replace a VLA with:
if (!(array = malloc(n * sizeof(array[0]))) fatal("I'm out of memory");
which compared to the VLA generates more code and is slower. In both the VLA and malloc() case if you run out of memory the program will die nice and deterministically (unlike the kernel), but in the malloc() case that will only happen if you remember to do the check whereas in the VLA case the compiler ensures it always happens.
Removing VLAs in structs (define a struct type in a function, use variables from the function in an array field's bounds) was a requirement, but "normal" VLAs are supported by clang just fine.
>>> - VLAs within structures is not supported by the LLVM Clang compiler and thus an issue for those wanting to build the kernel outside of GCC, Clang only supports the C99-style VLAs.
one less vendor lock in (but since it's GCC, it makes me sad).
[1]: Just by way of example, one that I ran into recently: https://elixir.bootlin.com/linux/latest/source/include/uapi/...