So this is probably a silly question, but I'm curious. Why is it important to initialize memory to 0's before we use it? I know Rust considers uninitialized memory unsafe, but I don't get why. All that really matters is that we initialize it to something. Why can't we wait and let that be the first value that's actually meaningful instead of wasting time writing 0's?
> I know Rust considers uninitialized memory unsafe, but I don't get why. All that really matters is that we initialize it to something.
In the specific case of Rust, no, because not all values are valid. E.g., a bool must be true (literally 1) or false (0); other bit patterns are UB. &str data must be valid UTF-8. Violating these invariants can cause other code that depends on them to do odd things: an Option<bool> might rely on one of the other 254 values for the None variant[1], and a UTF-8 decoder that "knows" the data is valid UTF-8 need not bounds-check continuation bytes, leading to accesses outside the buffer.
> Why can't we wait and let that be the first value that's actually meaningful instead of wasting time writing 0's?
Well, in Rust, see MaybeUninit (https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.h...). The docs page goes into the UB a bit, too (and names the above examples). So you can … but it's often easier to just write a 0 or another known value and let the compiler make sure it's safe, optimizing it out where it can.
Some languages, like Swift, basically do this: they enforce that dynamically allocated objects are definitely initialized before they are used.
Also, many C programmers (in my experience) don't use calloc -- they just use malloc, and then they initialize the memory. The basic danger is that you forget to initialize part of it, and then you have garbage data in your object fields. Zeros are arbitrary, but at least they (usually) cause a seg fault if you try to dereference them as a pointer, for example.
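Roughly, the difference looks like this (a sketch; the struct and field names are made up for illustration):

    #include <stdlib.h>

    struct user {
        char *name;
        int   age;
        int   flags;
    };

    int main(void) {
        struct user *u1 = malloc(sizeof *u1);    /* every field is garbage */
        u1->name = NULL;
        u1->age  = 30;
        /* oops: u1->flags was never set and holds leftover memory */

        struct user *u2 = calloc(1, sizeof *u2); /* every byte zeroed */
        /* u2->flags == 0, and u2->name reads as NULL on typical platforms */
        free(u1);
        free(u2);
        return 0;
    }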
I guess there's a secondary danger that even if you do initialize all the fields, you might end up with padding bits that don't get initialized, and those would contain parts of an earlier object. That shouldn't usually matter -- being padding bits, they're not meant to be accessed -- but zeroing them out could act as a security mitigation.
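For example (a sketch; how much padding there is, if any, depends on the ABI):

    #include <string.h>

    struct msg {
        char tag;    /* 1 byte ... */
        long value;  /* ... typically 3 or 7 padding bytes in between */
    };

    int main(void) {
        struct msg m;
        m.tag   = 'x';
        m.value = 42;   /* every field set, but the padding is untouched
                           and still holds older stack contents */

        /* zeroing first scrubs the padding too: */
        memset(&m, 0, sizeof m);
        m.tag   = 'x';
        m.value = 42;
        return 0;
    }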
Given the parameters of malloc() and calloc(), I assume it's to express intent---malloc() is used to allocate a single structure (thus avoiding the whole multiplication issue) and calloc() for allocating an array (and initializing the entire array to a known value). Why doesn't realloc() take separate item-count and size parameters, given that it's almost always used to resize an array?
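The multiplication issue, concretely (a sketch; on a 64-bit platform the product below wraps around):

    #include <stdint.h>
    #include <stdlib.h>

    int main(void) {
        size_t n = SIZE_MAX / 2 + 1;      /* pathological element count */
        int *a = malloc(n * sizeof *a);   /* product silently wraps, so this
                                             allocates far too little */
        int *b = calloc(n, sizeof *b);    /* a conforming calloc detects the
                                             overflow and returns NULL */
        free(a);
        free(b);
        return 0;
    }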
> Also, many C programmers (in my experience) don't use calloc -- they just use malloc, and then they initialize the memory. The basic danger is that you forget to initialize part of it,
I don't know whether the latter part of your statement is true or not. The former comes from looping usage: it's better to bzero as the first operation before each use, rather than as the last operation to prepare for reuse. In other words, don't implement the same functionality in two separate places.
it's unsafe because it can lead to leakage of confidential information, and has many times in the past, but that's not the most common problem
the most common problem is that you write some code that dynamically allocates memory, you test it, it works fine, and then you write more code that depends on it, and later on (weeks, months, years) you run into some hard to reproduce bug
after a weekend of debugging it turns out your code only worked fine when you tested it because malloc was allocating freshly mapped pages the kernel had zeroed for you. so you didn't notice that you left some fields uninitialized in your records. ooops. you didn't need that weekend with your kids anyway
for this reason many debug allocators initialize your malloced memory as a form of offensive programming, and valgrind reports it as a bug if your program's behavior depends on uninitialized memory
The other unsafe practice was thinking that "works fine" testing is sufficient. The tests should always have been executed under Valgrind or similar before delivery, instead of debugging manually on a weekend.
In what security model? If you assume there's an adversary that can read all your memory, then you're pretty screwed no matter what you do.
In C's and Rust's model deallocated memory is assumed to be completely inaccessible (and Rust does its best to ensure it is inaccessible), so in a valid (non-UB) program there is nothing that could leak its secrets.
I thought of a scenario like, your process deallocates a page of memory, so it gets marked as unused by the OS, then some other application requests memory and gets handed your old page, with all the data you left in there. Would that not be leaking information?
Operating systems with support for memory protection and virtual memory generally don't behave that way for security reasons. Everything that is allocated to a process will be zeroed explicitly or on first reference.
There are older operating systems that don't do that though. On the Amiga you have to explicitly ask the OS for memory to be cleared, and if you don't you may get uncleared memory left over from some other process.
Because it's fairly common to design structures so that zero-initialized memory is a valid object. E.g., that's how Cap’n Proto "decodes" messages in 0us even when fields have default values: https://capnproto.org/
If the memory held a password and was then deallocated, the value would still be there. Then when the memory was allocated again, the value would still be there. If you initialize it to something and then print it, there is no security issue. But if you print it before initializing, you just printed a password.
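As a sketch (reading the second buffer is UB, so nothing about the output is guaranteed -- which is exactly the hazard):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *secret = malloc(64);
        strcpy(secret, "hunter2");
        free(secret);              /* free() does not erase the bytes */

        char *buf = malloc(64);    /* may hand back the very same block */
        printf("%s\n", buf);       /* UB: uninitialized read; in practice
                                      it may well print "hunter2" */
        free(buf);
        return 0;
    }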
You could try having the OS guarantee zero initialized memory for all blocks that your program gets, but that's hard to enforce as a standard I imagine.
I think we do this in practice. The issue is that malloc doesn't ask the OS for memory directly on every call; the allocator asks the OS for memory in bulk. And once you've freed something, that allocator can give you the memory back without talking to the kernel.
So if it’s hard to enforce, what’s the reason we go through the effort of zeroing anyway? Isn’t this one of those things that needs to be 100% reliable to be useful?
Or is this more about preventing escalation in case someone finds an exploit in your code?
Zeroing is just a form of initialization. If you initialize the memory right away (and completely!) with actual data, there's no need for additional zeroing.
Zeroing is something one should do if the allocated memory isn't completely initialized right away. In my experience, that's not only about someone finding exploits, but also about stability.
When people write code, most of the time they simply work under the assumption that an `int` contains 0 by default, and such code usually fails in unpredictable ways when it does not.
Accessing uninitialized memory is just always an error, and having uninitialized memory lying around in your application rarely has any benefit. Either assign a value or zero it. Not doing either suggests the allocation isn't needed right there in the first place.
What if the program never called free? Or if it crashed? A bad actor could intentionally crash a program, and then allocate lots of memory, then search for the password in the memory.
If that's true, then the entire premise of calloc is invalid, isn't it? Newly allocated memory would always be zeroed anyway by the OS. Then what are we even discussing?
malloc is not a system call; it is a C library function. Modern operating systems clear memory when providing it to a process, but malloc implementations reuse the same memory locations an arbitrary number of times within the process and, for performance reasons, normally do not clear memory when it is reused (within the process).
> Why is it important to initialize memory to 0's before we use it?
Because reading from uninitialized memory yields an indeterminate value. Operating on such a value can lead to UB. This applies to languages other than Rust too (e.g. C).
One thing I noticed in C, and probably C++: if you create a struct on the stack, you can set every field in the struct and... alignment means the padding between your fields doesn't get set, and thus leaks whatever information was on the stack before.
For some situations it's certainly like you say - not important to initialize memory upon allocation. That's how malloc is ordinarily implemented.
If your allocation is about to take on a matrix where every cell will be loaded from a default value or a calculation - zeroing on allocation would only add unnecessary latency there.
But if your allocation is about to take on a struct with many fields and elements that could easily be misused by not initializing every field (including when new fields are added later), then a zero or other default initialization might be convenient.
Imagine you have to initialize a big structure with a lot of members. If you don't zero-initialize it, you have to initialize all the fields manually, one by one. Not only does that take a lot of lines of code that could be avoided (and setting each zero field individually is less efficient than zeroing the whole structure in one go), but you can also forget to initialize one, and good luck debugging that. Or an optional field gets added to the structure, and you have to remember to zero it everywhere you don't need it (instead of just assuming it's zero because the structure is zero-initialized).
Of course, accessing uninitialized fields can lead to catastrophic bugs: imagine for example a structure containing a pointer that was never set to NULL and points at a random location, or a char* field whose buffer has no null terminator.
It's safer to zero-initialize everything: at least if you forget to initialize something, you get a predictable situation, which is still better than the field's value depending nondeterministically on whatever happened to be in that memory before.
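In C this is usually spelled with a zero initializer; a minimal sketch (the struct here is made up):

    struct config {
        int   verbose;
        char *log_path;
        int   retries;
        /* ...dozens more members, some added later... */
    };

    int main(void) {
        struct config c = {0};  /* every member zeroed, including members
                                   added to the struct in the future */
        c.retries = 3;          /* then set only what differs from zero */
        return 0;
    }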
>Why can't we wait and let that be the first value that's actually meaningful instead of wasting time writing 0's
Because we might want to read from it before we write to it. The simplest case is an array: unless we ensure that we write to every index before we read from it, we risk reading out (potentially sensitive) garbage data if we didn't calloc it.
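A minimal sketch of that array case:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int *a = malloc(100 * sizeof *a);
        for (int i = 0; i < 100; i += 2)
            a[i] = i;            /* only even indexes ever get written */

        long sum = 0;
        for (int i = 0; i < 100; i++)
            sum += a[i];         /* odd indexes read indeterminate garbage;
                                    with calloc they would all be 0 */
        printf("%ld\n", sum);    /* unpredictable output */
        free(a);
        return 0;
    }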
> All that really matters is that we initialize it to something.
People make mistakes and forget to do this, and their program sometimes goes wrong in an unpredictable way - maybe only going wrong in production. It’s redundancy - most other fields of engineering do it; we seem to think it’s weird in CS.
I think the most annoying part about errors due to missing initialisation is that (especially with variables on the stack, so not those allocated with malloc) they are coincidentally pre-filled with values from the functions that were called immediately before the current one. So instead of completely random behaviour you see somewhat random behaviour. And an off-by-5 error is a lot harder to track down than an off-by-several-thousand error.
>Why can't we wait and let that be the first value that's actually meaningful instead of wasting time writing 0's?
Rust lets you do that.
    let m: u32; // note: no mut
    m = 5;
Even this is totally fine:
    let m: u32; // note: no mut
    if true {
        m = 5;
    } else {
        m = 2;
    }
>I know Rust considers uninitialized memory unsafe, but I don't get why
It is wildly unsafe, because the compiler has to be able to rely on the fact that the memory slots it emits assembly code for actually use only those bits the compiler knows are in use.
For example, on ARM you have no registers smaller than 32 bits. So the compiler has to use those registers for a u8 as well, but it has to be sure that nobody actually sets the high bits; otherwise multiplication (etc.) would do completely insane things and not at all what the program should have done.
Similarly, memory slots have a size that is a multiple of 8 bits (because the load/store instructions usually load/store exactly 8, 16, 32 bits and so on), and your type doesn't necessarily have exactly that number of possible inhabitants (it would only be the case if the number of possible inhabitants were 2^(8n) for some natural number n). Clearly, this is not possible to ensure in general.
Of course you could have the compiler be totally defensive about it and issue masking instructions all the time--but that would make the resulting program slow.
Also, think of a bool. It has two possible values, right? But in C, any value that is not equal to 0 is defined to be true.
Now when the compiler emits code for a && b, it can't do the obvious, which would be

    and r0, r1, r2

because that will do a bitwise AND. Say a = 4 and b = 1: then a bitand b = 0, but what you wanted is a && b == 1, since a is true and b is true. Worse, if a = 5 and b = 1, it suddenly works.
So that would be a bad idea as well (you'd have to emit extra instructions to check whether a is 0 and so on).
>Why is it important to initialize memory to 0's before we use it?
It's not. It's just C being C.
I'd actually prefer C to have an alloc function with two arguments m and n that does m * n with overflow checking, then allocates that, and does NOT initialize the memory to 0's.
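Something like this sketch -- alloc_array and its interface are invented here, not a real libc function:

    #include <stdint.h>
    #include <stdlib.h>

    /* overflow-checked like calloc, but without the zeroing */
    void *alloc_array(size_t m, size_t n) {
        if (n != 0 && m > SIZE_MAX / n)
            return NULL;         /* m * n would overflow size_t */
        return malloc(m * n);    /* memory deliberately left uninitialized */
    }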
That said, today's computers are very fast, and 0 has no data dependencies. So I wonder how much overhead, if any, in wall time, always zeroing out actually adds.
> ...Why is it important to initialize memory to 0's before we use it?
I guess, when writing in C, memset/calloc to 0 is a practical way to blanket-initialize all members of an object: pointer vars can then be safely checked against zero (NULL) to test whether they've been assigned or alloc'ed, strings come up empty, counts start at zero, and so do flags. Convenient!
Basically, it's a shortcut with structured objects to avoid spelling out and initialising individual variables explicitly.
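For instance (a made-up struct; note that all-bytes-zero reading back as NULL is a typical-platform assumption, not a guarantee of the C standard):

    #include <stdio.h>
    #include <string.h>

    struct parser {
        char  *buf;    /* NULL until a buffer is attached */
        size_t len;    /* 0 until then */
        int    error;  /* 0 == no error yet */
    };

    int main(void) {
        struct parser p;
        memset(&p, 0, sizeof p);  /* one call initializes every member */
        if (p.buf != NULL)        /* safe test: unset reliably reads as
                                     NULL on typical platforms */
            printf("%zu bytes\n", p.len);
        return 0;
    }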