There's zero benefit to just writing one byte. I'm having a hard time believing that a compiler capable of such analysis would use it for this purpose.
Conceivably there's a benefit when you can't load the value you want in one instruction (e.g., many RISC architectures with fixed 32-bit instructions have no 32-bit immediate form). For example, on Alpha the most general way to load an arbitrary unsigned 32-bit value is an LDAH/LDA pair: LDAH writes the high 16 bits (a 16-bit displacement shifted left by 16) and LDA adds a sign-extended 16-bit displacement on top.
And if the compiler can prove both the old and new values share the same high half, then since it knows the original LDAH result will be shared by either of the new values, it can save an instruction on each path by re-emitting only the LDA.
And so, conceivably, if another thread changed the high 16 bits of x->y between the original store and the load before the comparison, we could observe a mixed value.
Of course, you'd have to create a condition where the compiler writes r22 back to memory and then loads it again but assumes the loaded value is still >= 65536 and < 131072.
Is this contrived? Absolutely. Is it _plausible_? Maybe.
―
Disclaimer: I've never used an Alpha in my life. This is all inferred from Raymond Chen's really amazing series about processors that Windows NT used to support. [0]
I think the point is that unless the language guarantees atomicity (and the compiler implements that guarantee correctly), counterintuitive behavior is permissible and fairly common, and compiler behavior varies both across and within CPU architectures.
For reference, volatile is still the wrong choice for thread safety. If you need your cross-thread access semantics to be well defined, you must use atomics (note that those atomics might compile down to plain loads and stores if the platform supports that, but only the compiler is allowed to make that call).
Volatile is, well, volatile. You can basically only rely on the compiler respecting source order among volatile accesses and keeping every read and write (instead of hoisting, coalescing, or reordering them).
I believe MSVC used to put in memory barriers (its /volatile:ms mode gives volatile accesses acquire/release semantics); not sure if it still does that by default. These days you are better off ignoring the entire keyword and using the properly specified atomic stuff.
For the record I think the specific example from the OP is rather silly and of course never an optimization on x86(_64), but I'd never pass up a chance to talk about more fun architectures ;-)
I've seen variable size accesses (sometimes 32-bit, sometimes bytewise) to memory locations (almost certainly field access via a dereferenced struct pointer, i.e. operator ->) when reverse engineering x86-64 code known to have been compiled with clang. However, that was very specifically for bitwise access - I don't know if the original source code used C's bitfield struct syntax or explicit masking. I also don't recall if it was just reads whose access size was inconsistent or writes too. It's almost certainly a code size optimisation when comparing to or storing immediate (compile-time constant) values.
Either way, if targeting a modern toolchain, I would strongly recommend using atomic_store()/atomic_load() from <stdatomic.h> for exchanging data between threads when "stronger" interlocked operations (CAS, exchange, atomic arithmetic, etc.) aren't required.
Be very, very careful with the memory order argument when using the "explicit" versions of these functions.

Apple subtly broke their IOSharedDataQueue (a lock-free, wait-free circular buffer queue between kernel and user space, used for HID events, for example) in macOS 10.13.6 (fixed in 10.14.2, I think) because they used memory_order_relaxed for accessing the head and tail pointers when transitioning from the OSAtomic API to stdatomic, presumably in a misguided attempt at "optimisation". The writer would only wake up the reader if the head/tail state before or after writing the message to the buffer indicated an empty queue. Unfortunately, due to memory access reordering in the CPU, the writer sometimes saw a state of memory that never existed on the reader thread, so the reader would wait indefinitely for the writer to wake it up after draining the queue.
The use of volatile for exchanging data between threads, as recommended elsewhere on this thread, is iffy - make sure you know exactly what volatile means to the compiler you are using and the version of the C/C++ standard you are targeting.