
Thanks, that's an interesting perspective. It also highlights one of the weak points in the concept, I think, which is that this is only a tool for updating beliefs to the extent that the underlying probability space ("ontology" in this analogy) can actually "model" the phenomenon correctly!

It doesn't seem to shed much light on when or how you could update the underlying probability space itself (or when to change your ontology in the belief setting).




This kind of thinking will lead you to ideas like algorithmic probability, where distributions are defined using universal Turing machines that could model anything.


Amazing! I had actually heard about Solomonoff induction before, but my brain didn't make the connection. Thanks for the shortcut =)


You can sort of do this by mixing over a suitably large (or infinite) family of models, but from an epistemological POV that's pretty unsatisfying.

From a practical POV it’s pretty useful and common (if you allow it to describe non- and semi-parametric models too).


Couldn't you just add a controller (PID/Kalman filter/etc.) to converge on the stability of some local "most" truth?


Could you elaborate? To be honest I have no idea what that means.


I think what you're getting at is the construction of the sample space - the space of outcomes over which we define the probability measure (e.g. {H,T} for a coin, or {1,2,3,4,5,6} for a die).

Let's consider two possibilities:

1. Our sample space is "incomplete"

2. Our sample space is too "coarse"

Let's discuss 1 first. Imagine I have a special die with a hidden binary state that I control, which forces the die to come up either even or odd. If your sample space only records which side faces up, and I randomize the hidden state appropriately, it looks like a normal die. If your sample space is enlarged to include the hidden state, the entropy of each roll is reduced by one bit. You will not be able to distinguish a truly random die from one with a hidden state if your sample space is incomplete. Is this the point you were making?
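
Just to make that one-bit gap concrete, here's a rough toy simulation of the hidden-state die (my own sketch; the function names and the 100k-roll count are made up for illustration):

    import random
    from collections import Counter
    from math import log2

    def entropy(counts):
        """Empirical Shannon entropy (in bits) of a Counter of outcomes."""
        total = sum(counts.values())
        return -sum(c / total * log2(c / total) for c in counts.values() if c)

    def hidden_state_die(n_rolls, seed=0):
        """Die with a hidden binary state: 'odd' forces a roll from {1,3,5},
        'even' forces one from {2,4,6}. The hidden state is a fair coin flip,
        so the visible face looks like an ordinary fair die."""
        rng = random.Random(seed)
        faces, joint = Counter(), Counter()
        for _ in range(n_rolls):
            state = rng.choice(["odd", "even"])    # the hidden bit
            face = rng.choice([1, 3, 5] if state == "odd" else [2, 4, 6])
            faces[face] += 1
            joint[(state, face)] += 1
        return faces, joint

    faces, joint = hidden_state_die(100_000)
    h_face = entropy(faces)                       # ~2.585 bits = log2(6), same as a fair die
    h_face_given_state = entropy(joint) - 1.0     # H(face|state) = H(state,face) - H(state); H(state) = 1 bit
    print(f"H(face)         = {h_face:.3f} bits")
    print(f"H(face | state) = {h_face_given_state:.3f} bits   (one bit lower)")

With only the visible face in the sample space, the rolls are statistically indistinguishable from a fair die; conditioning on the hidden bit removes exactly one bit of uncertainty per roll.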

On 2: now let's imagine I can only observe whether the die comes up even or odd. This is a coarse-graining of the sample space (we get strictly less information - or rather, only some "macro" information). Of course, a coarse-grained sample space is necessarily an incomplete one! We can compare the outcomes of a normal die with those of a die that rolls even or odd with equal probability but cycles through the microstates deterministically: an equal chance of {odd, even}, but given that outcome, it always advances to the next face in the sequence (1->3->5) or (2->4->6).
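
Again just a toy sketch (function names invented) comparing the two at the coarse even/odd level:

    import random
    from collections import Counter
    from math import log2

    def entropy(counts):
        total = sum(counts.values())
        return -sum(c / total * log2(c / total) for c in counts.values() if c)

    def normal_die_parities(n, seed=0):
        rng = random.Random(seed)
        return ["odd" if rng.randint(1, 6) % 2 else "even" for _ in range(n)]

    def cycling_die_parities(n, seed=0):
        """Fair choice of odd/even each roll, but the microstate is deterministic:
        it cycles 1->3->5->1 within 'odd' and 2->4->6->2 within 'even'."""
        rng = random.Random(seed)
        odd_pos, even_pos = 0, 0
        parities = []
        for _ in range(n):
            if rng.random() < 0.5:
                _face = (1, 3, 5)[odd_pos]      # the microstate, never observed
                odd_pos = (odd_pos + 1) % 3
                parities.append("odd")
            else:
                _face = (2, 4, 6)[even_pos]
                even_pos = (even_pos + 1) % 3
                parities.append("even")
        return parities

    n = 100_000
    for name, seq in [("normal die ", normal_die_parities(n)),
                      ("cycling die", cycling_die_parities(n))]:
        print(f"{name}: parity entropy = {entropy(Counter(seq)):.3f} bits")
    # Both come out at ~1.000 bit: on the coarse {even, odd} sample space the two
    # processes are indistinguishable, even though their microstate dynamics differ.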

Incomplete or coarse sample spaces can indeed prevent us from inferring the underlying dynamics: radically different underlying processes can produce the same apparent entropy on our sample space.


Right, this is exactly what I'm getting at - learning a distribution over a fixed sample space can be done with Bayesian methods, or entropy-based methods like the OP suggested, but I'm wondering if there are methods that can automatically adjust the sample space as well.

For well-defined mathematical problems like dice rolling and fixed classical mechanics scenarios and such, you don't need this I guess, but for any real-world problem I imagine half the problem is figuring out a good sample space to begin with. This kind of thing must have been studied already; I just don't know what to look for!

There are some analogies to algorithms like NEAT, which automatically evolves a neural network architecture while training. But that's obviously a very different context.


We could discuss completeness of the sample space, and we can also discuss completeness of the hypothesis space.

In Solomonoff induction, which purports to be a theory of universal inductive inference, the "complete hypothesis space" consists of all computable programs (note that all current physical theories are computable, so this hypothesis space is very general). Induction is then performed by keeping all programs consistent with the observations, weighted by two terms: the program's prior likelihood, and the probability the program assigns to the observations (a program can be deterministic and assign probability 1).

The "prior likelihood" in Solomonoff induction is determined by the program's complexity: specifically, it is 2^(-Complexity), where the complexity is the length of the shortest representation of that program.

Altogether, the procedure looks like: maintain a belief which is a mixture of all programs consistent with the observations, weighted by their complexity and the likelihood they assign to the data. Of course, this procedure is still limited by the sample/observation space!
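
As a very rough illustration only (real Solomonoff induction mixes over all computable programs and is uncomputable; the tiny hypothesis class and the "description length" numbers below are entirely made up), the complexity-weighted mixture looks something like this:

    # Toy stand-in for the Solomonoff mixture: a tiny hand-picked hypothesis class
    # over binary sequences. Each "program" gets an invented description length K
    # (a crude proxy for its complexity) and a rule for P(next bit = 1 | history).
    HYPOTHESES = {
        # name: (K in bits, predictor)
        "always-0":    (3, lambda hist: 0.001),
        "always-1":    (3, lambda hist: 0.999),
        "alternating": (5, lambda hist: 0.5 if not hist else (0.999 if hist[-1] == 0 else 0.001)),
        "fair-coin":   (8, lambda hist: 0.5),
    }

    def likelihood(predictor, data):
        """Probability this hypothesis assigns to the observed bit string."""
        p = 1.0
        for i, bit in enumerate(data):
            p1 = predictor(data[:i])
            p *= p1 if bit == 1 else 1.0 - p1
        return p

    def mixture_prediction(data):
        """P(next bit = 1) under the mixture, each hypothesis weighted by 2^(-K) * likelihood."""
        weights = {name: 2.0 ** (-k) * likelihood(pred, data)
                   for name, (k, pred) in HYPOTHESES.items()}
        total = sum(weights.values())
        return sum((w / total) * HYPOTHESES[name][1](data)
                   for name, w in weights.items())

    print(mixture_prediction([1, 0, 1, 0, 1, 0]))   # ~0.997: the alternating program dominates
    print(mixture_prediction([0, 0, 0, 0, 0, 0]))   # ~0.001: the simpler always-0 program dominates

Replacing this hand-picked list with "all computable programs consistent with the data" is exactly the step that makes the real thing uncomputable.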

That's our best formal theory of induction in a nutshell.


Someone else pointed me to Solomonoff induction too, which looks really cool as an "idealised" theory of induction, and it definitely answers my question in the abstract. But there are obvious difficulties with it in practice, like the fact that it's probably uncomputable, right?

I mean, I think even the "Complexity" coefficient should be uncomputable in general, since you could probably use a program that computes it to upper-bound "Complexity", and if there were such an upper bound you could use it to solve the halting problem, etc. Haven't worked out the details though!

It would be interesting to know whether there are practical algorithms for this - either direct approximations to SI, or maybe something else entirely that approaches SI in the limit, like a recursive neural-net training scheme? I'll do some digging, thanks!



