I understand what you're saying, but at the same time floating-point numbers can only represent a fixed amount of precision. You can't, for example, represent pi exactly as a floating-point number. Or 1/3. And certain operations on floating-point numbers with many significant digits will always lose some precision.
They are deterministic, and they follow clear rules, but they can't represent every number with full precision. I think that's a pretty good analogy for LLMs - they can't always represent or manipulate ideas with the same precision that a human can.
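For example, in Python (standard library only; the exact digits printed are just whatever the nearest doubles happen to be):

    from fractions import Fraction

    # 1/3 can't be stored exactly; you get the nearest binary64 value.
    print(1 / 3)            # 0.3333333333333333
    print(Fraction(1 / 3))  # 6004799503160661/18014398509481984, the value actually stored

    # Because every operation rounds, algebraically equal expressions can disagree.
    print(0.1 + 0.2 == 0.3)  # False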
It's no better or worse an analogy than any other numerical or computational algorithm.
They're a fixed-precision format. That doesn't mean they're ambiguous. They can be used ambiguously, but it isn't inevitable. Tools like interval arithmetic can mitigate this to a considerable extent.
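A toy sketch of the interval idea, for what it's worth (my own illustration, not a reference implementation; real interval libraries use directed rounding, but widening each endpoint by one ulp with math.nextafter, Python 3.9+, gives the flavor):

    import math

    def interval_add(a, b):
        # Add two intervals [a0, a1] + [b0, b1], then widen each endpoint
        # outward by one ulp so the result is guaranteed to contain the exact
        # sum of the stored endpoints despite rounding. (A crude stand-in for
        # directed rounding.)
        lo = math.nextafter(a[0] + b[0], -math.inf)
        hi = math.nextafter(a[1] + b[1], math.inf)
        return (lo, hi)

    x = (0.1, 0.1)   # "0.1" here is really the nearest double to 0.1
    y = (0.2, 0.2)
    print(interval_add(x, y))  # a narrow interval that brackets the true sum of the stored values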
Representing a number like pi to arbitrary precision isn't the purpose of a fixed-precision format like IEEE754. It can be used to represent, say, 16 digits of pi, which is used to great effect in things like the discrete Fourier transform and many other scientific computations.
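To put rough numbers on that (a small sketch; NumPy assumed just for the FFT):

    import math
    import numpy as np

    # math.pi is the binary64 value nearest to pi: correct to roughly 16
    # significant digits, and that's all you get.
    print(math.pi)  # 3.141592653589793

    # ~16 digits is ample for routine scientific computation: an FFT
    # round-trip recovers the input to about machine precision.
    x = np.random.default_rng(0).standard_normal(1024)
    err = np.max(np.abs(np.fft.ifft(np.fft.fft(x)).real - x))
    print(err)  # on the order of 1e-15 (exact value will vary)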
1. Compiler optimizations can be disabled. If a compiler optimization violates IEEE754 and there is no way to disable it, this is a compiler bug and is understood as such.
2. This is as advertised and follows from IEEE754. Floating point operations aren't associative. You must be aware of the way they work in order to use them productively: this means understanding their limitations. (There's a short sketch of this after the list.)
3. Again, as advertised. The rounding mode is part of the spec and can be controlled. Understand it, use it.
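To be concrete about point 2, and the default rounding behavior from point 3, here's a small Python sketch using nothing beyond ordinary doubles:

    # Non-associativity: same three operands, different grouping, different result.
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0

    # The default rounding (round-to-nearest, ties-to-even) is doing exactly what
    # the spec says: 1.0 is half an ulp at this magnitude, so the tie in b + c
    # goes to the even significand, i.e. straight back to -1e16.
    print(b + c)  # -1e+16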
The purpose of floating-point numbers is to provide a reliable, accurate, and precise implementation of fixed-precision arithmetic that is useful for scientific calculations, has a large dynamic range, and handles exceptional states (1/0, 0/0, overflow/underflow, etc.) in a logical and predictable manner. In this sense, IEEE754 provides a careful and precise specification which has been implemented consistently on virtually every personal computer in use today.
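For instance (sketched with NumPy here, since plain CPython chooses to raise on float division by zero instead of returning the IEEE754 default results):

    import numpy as np

    # IEEE754 defines the outcome of "exceptional" operations rather than
    # leaving them undefined; errstate just silences NumPy's warnings.
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        print(np.float64(1.0) / np.float64(0.0))     # inf
        print(np.float64(0.0) / np.float64(0.0))     # nan
        print(np.float64(1e308) * np.float64(10.0))  # inf (overflow)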
LLMs are machine learning models used to encode and decode text or similar data such that it is possible to efficiently do statistical estimation of long sequences of tokens in response to queries or other input. It is obvious that the behavior of LLMs is neither consistent nor standardized (and it's unclear whether this is even desirable; in the case of floating-point arithmetic, it certainly is). Because of the statistical nature of machine learning in general, it's also unclear to what extent any sort of guarantee could be made about the likelihoods of certain responses. So I am not sure it is possible to standardize and specify them along the lines of IEEE754.
The fact that a forward pass on a neural network is "just deterministic matmul" is not really relevant.
Ordinary floating-point calculations allow for tractable reasoning about their behavior and reliable, hard predictions of it. At the scale used in LLMs, this is not possible; a Pachinko machine may be deterministic in theory, but not in practice. Clearly, in practice, it is very difficult to reliably predict or give hard guarantees about the behavioral properties of LLMs.
And at scale you even get a "sampling" of sorts via scheduling and parallelism (even if the distribution is very narrow, unless you've done something truly unfortunate in your FP code).
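A rough sketch of what I mean, using NumPy (whether and by how much the two sums differ will depend on the data, machine, and library version):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)

    # Same numbers, two different reduction orders.
    s1 = np.sum(x)
    s2 = np.sum(rng.permutation(x))
    print(s1, s2, s1 == s2)
    # The two sums typically differ slightly: addition isn't associative, so the
    # order imposed by scheduling and parallelism leaks into the result, giving a
    # very narrow "distribution" over outcomes.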