Ah, I think I now see what you had in mind when you said "noisy" in terms of K-parsimony: you were thinking of maximally random input strings.
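For what it's worth, a computable proxy makes that concrete. Here's a minimal Python sketch using zlib compression as a rough stand-in for K (it's only an upper bound, of course): a maximally random string admits no description much shorter than itself, while structured input does.

```python
import os, zlib

# Compressed length as a rough, computable upper bound on K. A maximally
# random string has (with high probability) no description much shorter
# than itself; a structured string does.
random_s = os.urandom(10_000)
structured_s = b"ab" * 5_000
print(len(zlib.compress(random_s)))      # ~10_000+ bytes: incompressible
print(len(zlib.compress(structured_s)))  # a few dozen bytes: "ab" * 5000
```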
> is that true intelligence might not have these big gaps between explanation-lengths that parsimonious TMs do.
I don't know the field and literature well enough to say whether this is the case; is there a published result you can point me to?
> And there’s also the question of deduction; having a few redundant “theorems” on hand might make deductive inferences more efficient whereas parsimony would elide them.
Especially given the words "redundant", "deductive", and "efficient", it sounds to me like you have in mind something like CDCL SAT solvers learning redundant conflict clauses that help prune the search space. With respect to this, recall that the AIXI/Solomonoff induction definitions are noncomputable and so have no notion of efficiency at all.
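To make the clause-learning point concrete, here's a toy Python sketch (the solver, the formula, and the variable ordering are all made up for illustration; real CDCL solvers work very differently): a "learned" clause that is logically redundant, because it is the resolvent of two existing clauses, can still prune a naive backtracking search.

```python
# Toy backtracking solver that counts search nodes, to show how a logically
# redundant learned clause prunes the tree. Clauses are lists of ints:
# positive literal = variable true, negative = variable false.

def count_nodes(clauses, n_vars, assign=None, stats=None):
    assign = {} if assign is None else assign
    stats = {"nodes": 0} if stats is None else stats
    stats["nodes"] += 1
    # Fail as soon as some clause is fully assigned with no true literal.
    for clause in clauses:
        if all(abs(l) in assign for l in clause) and \
           not any(assign[abs(l)] == (l > 0) for l in clause):
            return False, stats["nodes"]
    if len(assign) == n_vars:
        return True, stats["nodes"]
    v = next(x for x in range(1, n_vars + 1) if x not in assign)
    for val in (True, False):
        assign[v] = val
        sat, _ = count_nodes(clauses, n_vars, assign, stats)
        if sat:
            return True, stats["nodes"]
        del assign[v]
    return False, stats["nodes"]

# (~x1 | x5) and (~x2 | ~x5) resolve on x5 to the redundant (~x1 | ~x2),
# which fails the x1=T, x2=T branch before x3 and x4 are ever explored.
base = [[-1, 5], [-2, -5]]
learned = base + [[-1, -2]]
for name, cnf in [("without learned clause", base),
                  ("with learned clause", learned)]:
    sat, nodes = count_nodes(cnf, 5)
    print(f"{name}: sat={sat}, nodes explored={nodes}")
```

The learned clause adds nothing logically, but the node counts differ (21 vs. 7 here), which is the sense in which redundancy buys efficiency.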
Indeed, an optimally parsimonious TM may, for some inputs, fail to meet any fixed resource bound on part of the input space. Intuitively, if you only care about a finite part of the input space, you can just tack those answers onto the definition of the TM (say, as a lookup table) to obtain a TM that is efficient on that finite space, at the cost of definitional parsimony. Something in between might exist for particular infinite spaces (dovetailing with a more complex TM that has better runtime characteristics and agrees on that space?), and I wonder whether there is an efficient frontier of parsimony against, say, time complexity.
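A minimal sketch of that "tack the answers on" trade in Python (Fibonacci is my arbitrary choice here; nothing about it is essential to the point):

```python
def fib_parsimonious(n):
    # Short definition (parsimonious), but exponential running time.
    return n if n < 2 else fib_parsimonious(n - 1) + fib_parsimonious(n - 2)

# Hard-code the answers for a finite part of the input space: the
# "description" grows by the size of the table...
_TABLE = {n: fib_parsimonious(n) for n in range(25)}

def fib_padded(n):
    # ...but inputs in the table now cost O(1), and elsewhere it agrees
    # with the parsimonious definition.
    return _TABLE[n] if n in _TABLE else fib_parsimonious(n)
```

The two functions compute the same thing everywhere, but the padded one is longer to write down and fast on the finite prefix, which is exactly the trade against definitional parsimony.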
Right, I’m not the most well-read on this stuff either, so I’m now wondering whether existing architectures operate on this
> efficient frontier of parsimony against say time complexity.
As you mentioned before, regularization approximates parsimony; could it be that what’s gained from this loss of precision w.r.t. parsimony is runtime guarantees (since we’re now mostly talking about constant-depth, circuit-esque DL architectures)? Or is the jump to continuous spaces more relevant? Are these the same?
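To make the regularization-as-approximate-parsimony reading concrete, here's a minimal numpy sketch (the data and the lambda values are made up; the L2 norm is only a crude stand-in for description length):

```python
import numpy as np

# Ridge regression: the penalty weight lam trades data fit against the
# "complexity" of the weights, a rough, computable proxy for parsimony.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]                    # a "simple" ground truth
y = X @ w_true + 0.1 * rng.normal(size=50)

for lam in (0.0, 1.0, 10.0):
    # Closed-form solution: argmin_w ||Xw - y||^2 + lam * ||w||^2
    w = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
    print(f"lam={lam:>4}: ||w|| = {np.linalg.norm(w):.3f}")
```

Note the runtime angle: unlike a Solomonoff-style search over programs, this is one fixed linear-algebra step, so the cost is known in advance, which is the kind of guarantee I had in mind above.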