Indeed, if you want an immediate error on every out-of-bounds read, this won't be suitable. I do think one should always have the option to not opt into this. But there still exist use-cases where the benefit of being able to do partially-past-the-end loads would significantly outweigh this downside.
That said, clang's MemorySanitizer, and, similarly, valgrind, could still produce errors by tracking which bytes within registers are undefined; the error might come somewhat later than the load, but such out-of-bounds values still couldn't be used for much.
And, anyway, as this load would be a separate instruction/builtin (if so decided), UB of regular operations is unaffected. If the sanitizer in question doesn't track (partial) register definedness, it could just accept all of these explicitly-potentially-OoB loads; indeed not ideal, but the alternative is not being able to write such performant code at all.
And there are already people doing this, just limited to doing so with data within a custom allocator. It would just be nice to have a mechanism to not be fully ruled out of using standard tooling at least for testing.
What's wrong with assembly? What's wrong with aligning a pointer and turning the sanitizer off if need be? If you're making machine specific assumptions then you should be programming against the machine rather than the language.
Assembly is a good fallback, but it requires that individual programmers special-case individual architectures (thus each likely ending up with an incomplete list, and definitely incomplete as soon as it stops being updated) instead of having a standard thing everyone can rely on and the compiler/sanitizers could actually understand, and which automatically extends to more architectures.
Not using sanitizers is, of course, an option, but, as should be obvious, is very much not optimal. Having a single "questionable" operation in one place does not mean that the programmer intends for the entire function or program to be all "programmed against the machine"; the rest of the code, and, to an extent, even the fancy operation in question, could still very much benefit from regular language tooling.
Such operations needn't even be machine-specific.
"try loading N bytes, accepting garbage out-of-bounds; return None if cannot be done without potentially faulting" says and requires nothing about the architecture, and is implementable everywhere (even if as always returning None).
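To make that contract a bit more concrete, here's a rough model of it in safe Rust. Everything in it is an assumption for illustration: `PAGE_SIZE`, `within_one_page` and `try_load` are made-up names, the address is passed in explicitly, and the out-of-bounds tail is zero-filled rather than actually read, since a real version would need a compiler builtin or assembly.

```rust
/// Assumed page size; real code would query the OS. Hypothetical constant.
const PAGE_SIZE: usize = 4096;

/// True if the `n` bytes starting at `addr` all lie within a single page,
/// i.e. the load cannot fault as long as its first byte is readable.
fn within_one_page(addr: usize, n: usize) -> bool {
    n > 0 && n <= PAGE_SIZE && (addr % PAGE_SIZE) + n <= PAGE_SIZE
}

/// Model of "try loading N bytes, accepting garbage out-of-bounds".
/// `data` is the valid object and `addr` is the address its first byte is
/// assumed to live at. Returns None when the load might fault.
/// Bytes past `data.len()` are zero here only as a stand-in for garbage;
/// callers must not depend on their value.
fn try_load<const N: usize>(data: &[u8], addr: usize) -> Option<[u8; N]> {
    if data.is_empty() {
        return None; // no byte is known to be readable
    }
    if data.len() < N && !within_one_page(addr, N) {
        return None; // the tail could land on an unmapped page
    }
    let mut out = [0u8; N];
    let k = data.len().min(N);
    out[..k].copy_from_slice(&data[..k]);
    Some(out)
}
```

A target with no cheap way to answer the question could implement `try_load` as always returning `None` whenever `data.len() < N`, which is still a valid implementation of the same contract.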
Aligning pointers is another option, but it's fundamentally no different from the memory-protection-boundary-based version in how much it relies on hardware specifics, and compiler/language builtins could still be made that allow for sanitizer-friendly usage. Either may be more or less efficient depending on the use-case; both are useful to have.
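For reference, the aligned variant boils down to roughly the following. This is only a sketch: `aligned_window` and `load_from` are hypothetical names, and the second function works on a slice of `u64` words (with byte k of a word defined as `(w >> 8*k) & 0xff`) so that it stays in safe Rust instead of touching raw memory.

```rust
/// Round `addr` down to a multiple of `align` (a power of two) and return
/// (aligned base, offset of `addr` within that block). A block of `align`
/// bytes at `base` can't straddle a page as long as `align` divides the
/// page size, which is the hardware assumption this approach relies on.
fn aligned_window(addr: usize, align: usize) -> (usize, usize) {
    assert!(align.is_power_of_two());
    let base = addr & !(align - 1);
    (base, addr - base)
}

/// Load the aligned 8-byte word containing byte index `idx`, shifted so that
/// byte `idx` ends up lowest. The upper `idx % 8` bytes of the result are
/// filler (zeros from the shift); bytes between the end of the wanted data
/// and the end of the word are whatever the word happened to contain.
fn load_from(words: &[u64], idx: usize) -> Option<u64> {
    let w = *words.get(idx / 8)?;
    Some(w >> (8 * (idx % 8)))
}
```

The difference from the page-boundary version is just which bytes come back as filler: here some of them can come from before the pointer as well, so the caller has to shift or mask on both sides.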
Of course, the best option would be that malloc, the linker, etc work together to guarantee at least N bytes of addressable memory past all user-accessible pointers, at which point architecture specifics completely stop mattering. This needn't change any behavior around sanitizers or regular loads; all it'd mean is that the "load N bytes with trailing garbage" operation can always succeed. Sanitizers could error on said op reading outside of the guaranteed readability size, and regular loads of course continue erroring on any out-of-bounds read. Compilers could even use this guarantee themselves to emit unmasked loads for loop tails.
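This is more or less what the custom-allocator crowd already does by hand. A minimal sketch of the idea, with `PAD`, `PaddedBuf` and `load_with_garbage` being hypothetical names of mine, and the padding filled with `0xAA` only to stand in for garbage:

```rust
/// Hypothetical guaranteed readable slack past every allocation (the "N").
const PAD: usize = 16;

/// A buffer that always keeps PAD addressable bytes after the user data.
struct PaddedBuf {
    storage: Vec<u8>, // user data followed by PAD bytes of slack
    len: usize,       // length of the user-visible data
}

impl PaddedBuf {
    fn from_slice(data: &[u8]) -> Self {
        let mut storage = Vec::with_capacity(data.len() + PAD);
        storage.extend_from_slice(data);
        storage.resize(data.len() + PAD, 0xAA); // stand-in for garbage
        PaddedBuf { storage, len: data.len() }
    }

    /// Regular view: ordinary accesses stay bounds-checked against `len`.
    fn as_slice(&self) -> &[u8] {
        &self.storage[..self.len]
    }

    /// "Load PAD bytes with trailing garbage": succeeds for every offset up
    /// to and including `len`, because the slack is always there. Starting
    /// further out is still an error.
    fn load_with_garbage(&self, offset: usize) -> Option<[u8; PAD]> {
        if offset > self.len {
            return None;
        }
        let mut out = [0u8; PAD];
        out.copy_from_slice(&self.storage[offset..offset + PAD]);
        Some(out)
    }
}
```

Note that only the dedicated op gets to see the slack; `as_slice()` behaves exactly like a normal buffer, which is the split between "regular loads" and the special load that I'm arguing for.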