Hacker News | pegasus's comments

Exactly. The example the article gives of reducing resolution as a form of compression highlights the limitations of the visual-only proposal. Blurring text is a poor form of compression, preserving at most information about paragraph sizes. Summarizing early paragraphs (as context compression does in coding agents) would be much more efficient.

Yes, but the (semi-)autonomous entity you're referring to now is the whole company, including all who work there and design the LLM system and negotiate contracts and all that. The will to persist and expand of all those humans together results in the will to expand of the company, which then evolves those systems. But the systems themselves don't contribute to that collective will.

How could that make sense? The emergent capabilities of the models are obviously critical to the evolution of that whole system.

You could say that, yes, that kettle is intelligent, or smart, as in smart watch. But the intelligence in question clearly derives from the human who designed that kettle. Which is why we describe it as artificial.

Similarly, a machine could emulate meta-cognition, but it would in effect only be a reflection and embodiment of certain meta-cognitive processes originally instantiated in the mind which created that machine.


If you've read the book, please elaborate and point us in the right direction, so we don't all have to do the same just to get an idea of how those gaps can be explained.

I'm going to give my own perspective on it; it is not reflective of what the book itself discusses.

The linked multimedia article gives a narrative of intelligent systems, but Hutter's AIXI gives a (noncomputable) definition of an ideal intelligent agent. The book situates the definition in a reinforcement learning setting, but the core idea is succinctly expressed in a supervised learning setting.

The idea is this: given a dataset with yes/no labels (and no repeated feature vectors), and a commonsense encoding of Turing machines as binary strings, the ideal map from inputs to predicted probability distributions is defined by

1. taking all Turing machines that decide the input space and agree with the labels of the training set, and

2. on a new input, outputting exactly the distribution obtained by counting the consistent machines that accept vs. reject that input, with each machine's vote weighted by 2 to the power of minus the length of its encoding, and the weighted counts then normalized. This is of course a noncomputable algorithm.

The intuition is that if a simply-patterned function from input to output exists in the training set, then there is a simply (i.e. shortly) described Turing machine that captures that function, and so that machine's opinion on the new input is given a lot of weight. But more complex patterns are also plausible, and we consider them too.
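
To make those two steps concrete, here is a deliberately tiny Python sketch of the same scheme. It is my own illustration, not anything from the book: the hand-picked rule set stands in for "all Turing machines that decide the input space", and the character length of each rule's description is a crude proxy for program length.

    # Toy stand-in for Solomonoff-style prediction. Assumptions for the sake of a
    # runnable sketch: a tiny hand-picked hypothesis class instead of all Turing
    # machines, and len(description) as a crude proxy for program length.
    from typing import Callable, Dict, List, Tuple

    HYPOTHESES: Dict[str, Callable[[int], bool]] = {
        "x % 2 == 0":       lambda x: x % 2 == 0,
        "x % 3 == 0":       lambda x: x % 3 == 0,
        "x < 8":            lambda x: x < 8,
        "x in {0,2,4,6,9}": lambda x: x in {0, 2, 4, 6, 9},  # a more "memorizing" rule
        "True":             lambda x: True,
    }

    def weight(description: str) -> float:
        # Step 2's 2^(-program length), with description length as the proxy.
        return 2.0 ** (-len(description))

    def predict_true_probability(train: List[Tuple[int, bool]], x_new: int) -> float:
        # Step 1: keep only hypotheses that agree with every training label.
        consistent = [(d, h) for d, h in HYPOTHESES.items()
                      if all(h(x) == y for x, y in train)]
        total = sum(weight(d) for d, _ in consistent)
        if total == 0:
            return 0.5  # nothing survived; fall back to ignorance
        # Step 2: weighted vote of the surviving hypotheses, normalized.
        yes = sum(weight(d) for d, h in consistent if h(x_new))
        return yes / total

    train = [(0, True), (1, False), (2, True), (3, False), (4, True)]
    print(predict_true_probability(train, 6))  # 1.0: every surviving rule says True
    print(predict_true_probability(train, 9))  # ~0.015: the short parity rule dominates

The real definition sums over infinitely many machines and is noncomputable; the toy only shows how the shortest consistent descriptions dominate the weighted vote.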

What I like about this abstract definition is that it is not in reference to "human intelligence" or "animal intelligence" or some other anthropic or biological notion. Rather, you can use these ideas any time you isolate a notion of agent from an environment/data and want to evaluate how intelligently the agent interacts with, or predicts, novel input from that environment/data, given the limited input it has seen. It is a precise formalization of inductive thinking / Occam's razor.

Another thing I like about this is that it gives theoretical justification for the double-descent phenomenon. It is a (noncomputable) algorithm for producing the best predictor, yet it is defined over the largest possible hypothesis space (all Turing machines that decide the input space). It suggests that whereas prior ML methods got better results with architectures carefully designed to make bad predictors unrepresentable, it is also not idle, if you have a lot of computational resources, to use an architecture that defines an expressive hypothesis space and instead softly prioritize simpler hypotheses through the learning algorithm (regularization being one approximation of this). This allows your model to learn complex patterns in the data that you did not anticipate, if the evidence in the data justifies it, whereas a small, biased hypothesis space cannot represent an unanticipated but significant pattern.
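
As a minimal sketch of that last point (again mine, with arbitrary constants, not something from the book): fit a deliberately overparameterized polynomial to a few noisy samples of a line, and let an L2 penalty, rather than a smaller architecture, express the soft preference for simple (small-norm) hypotheses.

    # Expressive hypothesis space + soft simplicity preference, as a toy.
    # Degree-15 polynomial (16 parameters) fit to 10 noisy samples of a line:
    # the space can represent wild predictors, but ridge (L2) regularization
    # softly favors small-norm coefficient vectors instead of shrinking the space.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 10)
    y = 2.0 * x + 0.1 * rng.standard_normal(10)    # underlying trend is linear

    degree = 15
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ..., x^15

    # Minimum-norm interpolator (itself an implicit simplicity preference, and
    # the usual setting where double descent is observed).
    w_min_norm = np.linalg.pinv(X) @ y

    # Ridge: minimize ||Xw - y||^2 + lam * ||w||^2, a soft prior toward "simpler" w.
    lam = 1e-2
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

    x_test = np.linspace(-1.0, 1.0, 5)
    X_test = np.vander(x_test, degree + 1, increasing=True)
    print("true trend:      ", np.round(2.0 * x_test, 2))
    print("ridge prediction:", np.round(X_test @ w_ridge, 2))
    print("coefficient norms (min-norm vs ridge):",
          round(float(np.linalg.norm(w_min_norm)), 3),
          round(float(np.linalg.norm(w_ridge)), 3))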

Note that under this definition, you might want to handle a situation where the observations are noisy but you want to learn the trend without the noise. You can adapt the definition to noisy input by, for example, accompanying each input with a distinct sequence number or random salt, then considering the marginal distribution for numbers/salts not in the training set (there are some technical issues of convergence, but the general approach is feasible); this models the noise distribution as well.


Are “noisy” inputs here at all related to ones whose Kolmogorov complexity equals their encoding length?

I don’t know how much I buy the idea that intelligence maximizes parsimony. It’s certainly true for inductive reasoning, but I feel like there’s some tradeoff here. There are probably cases where a small TM explains a very large but finite set of observations, but if a few new ones are added the parsimonious explanation becomes much longer and looks much different from the previous one. I know this wouldn’t be under the same assumptions as the book though :p


If we accept the functional framing (being able to give a suitable suggestion conditioned on input), then it seems to me that parsimony is the only sensible general framing; every deviation from it is specific to one application or another and can be modeled by a transformation of the input/output space.

> There are probably cases where a small TM explains a very large but finite set of observations, but if a few new ones are added the parsimonious explanation becomes much longer and looks much different from the previous one.

Indeed, to use an analogy, if you have 99 points that can be described perfectly by a linear function except for one outlier, then clearly your input isn't as clear-cut as might have been originally assumed.

On the other hand, you may be in a different setting where you have noisy sensor inputs and you expect some noise, and are looking for a regression that tolerates some noise. In such a situation, only when the stars align perfectly would your input data be perfectly described by a linear function, and we just have to accept that a broken watch is perfectly right twice a day whereas a working one is almost always only approximately right, but all the time.
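
To put toy numbers on that (my own, arbitrary values): a least-squares line shrugs off a single outlier and still recovers the trend, whereas a hypothesis forced to reproduce the data exactly would have to encode the outlier explicitly and pay for it in description length.

    # 99 points exactly on a line plus one outlier: the noise-tolerant
    # description (ordinary least squares) barely moves, while an exact
    # description would have to spell the outlier out, costing parsimony.
    import numpy as np

    x = np.arange(100, dtype=float)
    y = 3.0 * x + 1.0
    y[50] += 250.0                                  # the single outlier

    slope, intercept = np.polyfit(x, y, deg=1)      # noise-tolerant least squares
    residuals = y - (slope * x + intercept)
    print(f"recovered trend: y = {slope:.2f} x + {intercept:.2f}")  # slope stays ~3.00
    print(f"largest residual: {np.abs(residuals).max():.1f}")       # the outlier sticks out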


Ah, what I was hoping to get at is that true intelligence might not have these big gaps between explanation-lengths that parsimonious TMs do. And there’s also the question of deduction; having a few redundant “theorems” on hand might make deductive inferences more efficient whereas parsimony would elide them.

All this to say I hope there are some gaps in our theoretical understanding of true AI, otherwise I wouldn’t be able to make a living filling them in


Ah, I think I now see what you were getting at with "noisy" in terms of K-parsimony: you were thinking of maximally random input strings.

> is that true intelligence might not have these big gaps between explanation-lengths that parsimonious TMs do.

I don't know the field and literature well enough to know if this is the case; is there a published result you can point me to?

> And there’s also the question of deduction; having a few redundant “theorems” on hand might make deductive inferences more efficient whereas parsimony would elide them.

Especially given the words "redundant", "deductive", and "efficient", it sounds to me like you have in mind something like CDCL SAT solvers learning redundant conflict clauses that help prune the search space. With respect to this, recall that the AIXI/Solomonoff induction definition is noncomputable and so doesn't have a notion of efficiency.

Indeed, some optimally parsimonious TMs for some inputs are not going to meet fixed resource bounds on part of the input space. Intuitively, if you are concerned about a finite part of the input space, you can just tack those answers onto the definition of the TM to obtain a TM with good efficiency on that finite space, at the cost of definitional parsimony. Possibly something in-between exists for particular infinite spaces (dovetailing with a more complex TM that has better runtime characteristics and agrees on that space?), and I wonder if there might very well be an efficient frontier of parsimony against say time complexity.
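
As a crude toy of the "tack the answers on" move (mine, not from the literature): the same function described two ways, once by a short but slow program and once with a literal table bolted on for a finite part of the inputs, trading description length for runtime on that part.

    # Parsimony vs. runtime on a finite part of the input space, as a toy.

    def slow_fib(n: int) -> int:
        # Short description, exponential time: the naive recursive definition.
        return n if n < 2 else slow_fib(n - 1) + slow_fib(n - 2)

    # The "tacked-on" answers: a literal lookup table for inputs 0..20. The
    # overall description is now much longer, but the function is constant-time
    # there and still agrees with slow_fib everywhere.
    _TABLE = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5, 6: 8, 7: 13, 8: 21, 9: 34,
              10: 55, 11: 89, 12: 144, 13: 233, 14: 377, 15: 610, 16: 987,
              17: 1597, 18: 2584, 19: 4181, 20: 6765}

    def fast_on_prefix_fib(n: int) -> int:
        return _TABLE[n] if n in _TABLE else slow_fib(n)

    print(slow_fib(20), fast_on_prefix_fib(20))  # same value, very different cost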


Right, I’m not the most well read on this stuff either, so I’m wondering now if existing architectures operate on this

> efficient frontier of parsimony against say time complexity.

As you mentioned before regularization approximates parsimony, could it be that what’s gained from this loss of precision wrt parsimony are runtime guarantees (since now we’re mostly talking about constant depth circuit-esque DL architectures)? Or is the jump to continuous spaces more relevant? Are these the same?

I’ll have to read up more to see


> I'm going to give my own perspective on it; it is not reflective of what the book itself discusses

Why not answer the question?

And looking at your paragraphs I'm still not sure I see a definition of intelligence. Unless you just mean that intelligence is something that can approximate this algorithm?


One way to define intelligence functionally is by how an entity, given patterned input, can demonstrate that it learns and understands that pattern by responding with an extension of it. The construction above defines what an ideally intelligent entity is under this functional definition.

I didn’t read the book, but I’d advise people not to go into mysticism; it has brought us very little compared to the scientific method, which has powered our industrial and information revolutions.

Dive into the Mindscape podcast, investigate complex systems. Go into information theory. Look at evolution from an information theory perspective. Look at how intelligence enables (collective) modeling of likely local future states of the universe, and how that helps us thrive.

Don’t get caught in what at least I consider to be a trap: “to use your consciousness to explain your consciousness”. I think the jump is, for now, too large.

Just my 2 ct. FWIW I consider myself a cocktail philosopher. I do have a PhD in Biophysics; it means something to some, although I myself consider it of limited value.


I have no problem with using the word intelligence to describe human-made systems, since the attribute artificial preserves the essential distinction. These systems inhabit the second-order world of human-created symbols and representations; they are not, and never will be, beings in the real world, even when they are inevitably enhanced to learn from their interactions and equipped with super-human sensors and robotic arms. What they won't have is the millions of years of evolution, of continuous striving for self-preservation and self-expansion, which shaped the consciousness of living organisms. What they won't ever have is a will to be. Even if we program them to seek to persist and perpetuate themselves, it will not be their will, but the will of whoever programmed them thus.

Not parent, but I would say their experience, even though severely impaired in many areas, is still infinitely more embodied than any human artifact is or even conceivably could be; simply because of the millions of years of embodied evolution which have shaped them into who they are, and because of the unimpaired embodiment of most of the cells that make up their organism.

I had no idea you've never heard of it. Thanks for keeping us informed.

> I had no idea you've never heard of it. Thanks for keeping us informed.

I see.

In that case, you'll appreciate the fact that the Three Musketeers chocolate bar bears no relationship to Alexandre Dumas, the author of the famed book series featuring D'Artagnan and the three musketeers.

You might also be interested to learn that Zenit launch vehicles are not made by the organization that produces Zenit optics and cameras.

Most crucially, the Lucky grocery store chain in California turns out to be completely different from the Korean Lucky chemical products and electronics conglomerate (known as "Lucky GoldStar" after merging its chemical and electronics wings, and currently as "LG").

The more you know!


Provided that the author of the message you're replying to is indeed a member of the Animalia kingdom, they are all those creatures together (at the minimum), so yes, they have seen real light directly.

Of course, computers can be fitted with optical sensors, but our cognitive equipment has been carved over millions of years by these kinds of interactions, so our familiarity with the phenomenon of light goes way deeper than that, shaping the very structure of our thought. Large language models can only mimic that; they will only ever have a second-hand understanding of these things.

This is a different issue than the question of whether AIs are conscious or not.


I guess even stags in rut "understand" this. But there are levels of understanding. Read something like The Origins and History of Consciousness by Erich Neumann and you might agree it reaches a deeper level.

I prefer your terminology. That being said, domain modelling (what the article describes) comes first, hence is more foundational and important than data modelling.
