I understand what you're saying, but at the same time floating-point numbers can only represent a fixed amount of precision. You can't, for example, represent pi exactly as a floating-point number. Or 1/3. And certain operations on floating-point numbers with many significant digits will always lose some precision.
They are deterministic, and they follow clear rules, but they can't represent every number with full precision. I think that's a pretty good analogy for LLMs - they can't always represent or manipulate ideas with the same precision that a human can.
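For example, in Python (standard library only; the exact digits printed are just whatever the nearest doubles happen to be):

    from fractions import Fraction

    # 1/3 can't be stored exactly; you get the nearest binary64 value.
    print(1 / 3)            # 0.3333333333333333
    print(Fraction(1 / 3))  # 6004799503160661/18014398509481984, the value actually stored

    # Because every operation rounds, algebraically equal expressions can disagree.
    print(0.1 + 0.2 == 0.3)  # False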
It's no better or worse an analogy than any other numerical or computational algorithm.
They're a fixed-precision format. That doesn't mean they're ambiguous. They can be used ambiguously, but it isn't inevitable. Tools like interval arithmetic can mitigate this to a considerable extent.
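A toy sketch of the interval idea, for what it's worth (my own illustration, not a reference implementation; real interval libraries use directed rounding, but widening each endpoint by one ulp with math.nextafter, Python 3.9+, gives the flavor):

    import math

    def interval_add(a, b):
        # Add two intervals [a0, a1] + [b0, b1], then widen each endpoint
        # outward by one ulp so the result is guaranteed to contain the exact
        # sum of the stored endpoints despite rounding. (A crude stand-in for
        # directed rounding.)
        lo = math.nextafter(a[0] + b[0], -math.inf)
        hi = math.nextafter(a[1] + b[1], math.inf)
        return (lo, hi)

    x = (0.1, 0.1)   # "0.1" here is really the nearest double to 0.1
    y = (0.2, 0.2)
    print(interval_add(x, y))  # a narrow interval that brackets the true sum of the stored values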
Representing a number like pi to arbitrary precision isn't the purpose of a fixed-precision format like IEEE754. It can be used to represent, say, 16 digits of pi, which is used to great effect in things like the discrete Fourier transform and many other scientific computations.
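To put rough numbers on that (a small sketch; NumPy assumed just for the FFT):

    import math
    import numpy as np

    # math.pi is the binary64 value nearest to pi: correct to roughly 16
    # significant digits, and that's all you get.
    print(math.pi)  # 3.141592653589793

    # ~16 digits is ample for routine scientific computation: an FFT
    # round-trip recovers the input to about machine precision.
    x = np.random.default_rng(0).standard_normal(1024)
    err = np.max(np.abs(np.fft.ifft(np.fft.fft(x)).real - x))
    print(err)  # on the order of 1e-15 (exact value will vary)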
1. Compiler optimizations can be disabled. If a compiler optimization violates IEEE754 and there is no way to disable it, this is a compiler bug and is understood as such.
2. This is as advertised and follows from IEEE754. Floating point operations aren't associative. You must be aware of the way they work in order to use them productively: this means understanding their limitations. (There's a short sketch of this after the list.)
3. Again, as advertised. The rounding mode is part of the spec and can be controlled. Understand it, use it.
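To be concrete about point 2, and the default rounding behavior from point 3, here's a small Python sketch using nothing beyond ordinary doubles:

    # Non-associativity: same three operands, different grouping, different result.
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0

    # The default rounding (round-to-nearest, ties-to-even) is doing exactly what
    # the spec says: 1.0 is half an ulp at this magnitude, so the tie in b + c
    # goes to the even significand, i.e. straight back to -1e16.
    print(b + c)  # -1e+16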
The purpose of floating-point numbers is to provide a reliable, accurate, and precise implementation of fixed-precision arithmetic that is useful for scientific calculations, has a large dynamic range, and handles exceptional states (1/0, 0/0, overflow/underflow, etc.) in a logical and predictable manner. In this sense, IEEE754 provides a careful and precise specification which has been implemented consistently on virtually every personal computer in use today.
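For instance (sketched with NumPy here, since plain CPython chooses to raise on float division by zero instead of returning the IEEE754 default results):

    import numpy as np

    # IEEE754 defines the outcome of "exceptional" operations rather than
    # leaving them undefined; errstate just silences NumPy's warnings.
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        print(np.float64(1.0) / np.float64(0.0))     # inf
        print(np.float64(0.0) / np.float64(0.0))     # nan
        print(np.float64(1e308) * np.float64(10.0))  # inf (overflow)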
LLMs are machine learning models used to encode and decode text or similar data such that it is possible to efficiently do statistical estimation of long sequences of tokens in response to queries or other input. It is obvious that the behavior of LLMs is neither consistent nor standardized (and it's unclear whether this is even desirable; in the case of floating-point arithmetic, it certainly is). Because of the statistical nature of machine learning in general, it's also unclear to what extent any sort of guarantee could be made about the likelihoods of certain responses. So I am not sure it is possible to standardize and specify them along the lines of IEEE754.
The fact that a forward pass on a neural network is "just deterministic matmul" is not really relevant.
Ordinary floating-point calculations allow for tractable reasoning about their behavior and reliable, hard predictions of it. At the scale used in LLMs, this is not possible; a Pachinko machine may be deterministic in theory, but not in practice. Clearly, in practice, it is very difficult to reliably predict or give hard guarantees about the behavioral properties of LLMs.
And at scale you even get a "sampling" of sorts via scheduling and parallelism (even if the distribution is very narrow, unless you've done something truly unfortunate in your FP code).
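A rough sketch of what I mean, using NumPy (whether and by how much the two sums differ will depend on the data, machine, and library version):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)

    # Same numbers, two different reduction orders.
    s1 = np.sum(x)
    s2 = np.sum(rng.permutation(x))
    print(s1, s2, s1 == s2)
    # The two sums typically differ slightly: addition isn't associative, so the
    # order imposed by scheduling and parallelism leaks into the result, giving a
    # very narrow "distribution" over outcomes.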