Many architectures have hardware support for stacks (e.g. dedicated push/pop instructions or stack-pointer-relative addressing), which could be slightly faster than arbitrary loads/stores. That only works in the function owning the stack frame, of course: if you pass a pointer to a stack object somewhere else, it's back to being normal memory.
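A rough sketch of the distinction, in C. The function names here (`sum_local`, `sum_escaping`, `sum_via_pointer`) are made up for illustration, and what the compiler actually emits depends on the target and optimization level (with inlining, the two cases may well end up identical):

```c
#include <stddef.h>

/* Hypothetical helper: it only sees a pointer, so its accesses are
   ordinary loads through that pointer, wherever the memory lives. */
int sum_via_pointer(const int *values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* The array never leaves this function, so the compiler is free to
   address it relative to the stack pointer, or keep it in registers. */
int sum_local(void) {
    int values[3] = {1, 2, 3};
    return values[0] + values[1] + values[2];
}

/* Here the address of the stack object is handed to another function.
   From the callee's point of view it is just normal memory. */
int sum_escaping(void) {
    int values[3] = {1, 2, 3};
    return sum_via_pointer(values, 3);
}
```

Both functions compute the same sum; the only difference is whether the stack object's address escapes the function that owns the frame.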