If the memory controller could be given a mask specifying which bits or bytes are wanted, the memory overhead could be reduced without changing the representation in the programming language.
At that point the compiler can optimize memory access by generating the mask.
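To make the mask idea concrete, here is a rough sketch of what a compiler might compute for one record. It is purely illustrative: the 64-byte line size, the `PARTICLE` layout, and the `byte_mask` and `bytes_transferred` helpers are all assumptions made up for this example, and no real memory controller interface is being modeled.

```python
# Hypothetical sketch: models the bookkeeping only, not any real hardware API.
LINE_SIZE = 64  # assumed cache-line size in bytes

# Made-up record layout: field name -> (byte offset, size in bytes).
PARTICLE = {
    "x": (0, 4),
    "y": (4, 4),
    "z": (8, 4),
    "mass": (12, 4),
    "name": (16, 48),
}


def byte_mask(fields, wanted):
    """Return a bitmask with one bit set per byte the code actually reads."""
    mask = 0
    for name in wanted:
        offset, size = fields[name]
        # Set `size` consecutive bits starting at bit position `offset`.
        mask |= ((1 << size) - 1) << offset
    return mask


def bytes_transferred(mask):
    """Count the bytes a mask-aware controller would have to move."""
    return bin(mask).count("1")


# A loop that only touches x and mass would request 8 of the 64 bytes.
mask = byte_mask(PARTICLE, ["x", "mass"])
needed = bytes_transferred(mask)
```

The source-level record keeps its full layout; only the per-access mask changes, which is the point of leaving the language representation alone.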
If you could change the entire cache hierarchy at the hardware level and get every compiler to support it, you could do a lot of things.