This is perhaps a bit philosophical, but assuming we had a "meaning maximization" function, what stops us from writing it as a loss function and using our current supervised machine learning frameworks?
Just because we can formulate an optimization objective, there is no guarantee that we will find an algorithm that solves it within a reasonable amount of time. In the case of humans, these objectives or preferred states are possibly very simple ones, like hunger and pain avoidance, reproduction, and curiosity; and it is actually easy to write down an algorithm that optimizes them (if you ignore how reality actually works): simply try out all possible ways of reacting to the environment and choose the best one.
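A minimal sketch of that brute-force idea, assuming a toy environment with a small, enumerable action set (the `env` interface here is hypothetical):

```python
from itertools import product

def brute_force_policy(env, actions, horizon):
    # Enumerate every action sequence of length `horizon` and keep the
    # best one. The search space grows as |actions|**horizon, which is
    # exactly why this only works if you ignore how reality works.
    best_seq, best_total = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        env.reset()
        total = 0.0
        for action in seq:
            _state, reward = env.step(action)  # assumed toy interface
            total += reward
        if total > best_total:
            best_seq, best_total = seq, total
    return best_seq
```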
This works in theory, but in practice you only have a limited number of chances to try something out (because of the arrow of time). That makes learning a necessity: you need to keep a record of all the trials you have performed so that you can reuse that information later when the same situation recurs. How to do this optimally is described by Bayes' theorem.
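Concretely, the "record of trials" plus Bayes' theorem is just a posterior update. A toy beta-Bernoulli sketch (the prior and the counts are illustrative):

```python
def update_belief(alpha, beta, successes, failures):
    # Conjugate Bayes update for binary outcomes: the posterior is
    # Beta(alpha + successes, beta + failures).
    return alpha + successes, beta + failures

# Uniform prior Beta(1, 1); the record says 7 of 10 past trials worked.
alpha, beta = update_belief(1, 1, successes=7, failures=3)
expected_success_rate = alpha / (alpha + beta)  # 8 / 12 ~ 0.67
```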
The key to AI will be a certain set of priors, biases, and fixed-function units that make this computationally tractable. We'll likely need things like invariance of learned information to various changes, so that it can be reused in different settings; segmentation of data coming from the world into episodes (hippocampus); attention; control (basal ganglia); mental rotation (cortex); and path integration (hippocampus, grid cells).
That's true; there are certainly many optimization objectives that are computationally intractable, or perhaps too abstract to be useful for learning.
However, I would argue the prior in Bayesian modeling can be just as nebulous and computationally intractable as an optimization objective. Like supervised learning, Bayesian modeling is just a tool.
I'm skeptical that we will reach AI through a deep understanding or modeling of the brain. Technology and computer science advance more quickly than the biological sciences, at least in recent times. You might argue that a success in robotics like [0] is a motor control system, but they built it by extending mathematical frameworks, not by being biologically inspired, and the big wins there didn't come from fixating on a learning framework or from biological mimicry, just as humans didn't learn to fly by flapping wings like a bird. At some point we hacked an engine (invented for other purposes) onto a wing and came up with powered flight.
As an aside, seeing each input only a limited number of times would likely improve your ability to find models that generalize, since your model must be able to take these one-off learnings and unify them in some way to achieve high training performance. With respect to human learning, a specific individual only has one chance, but nature has had many; we are just a selection of those chances that worked well enough. There are many commonalities to existence that allow this to work well in practice.
Your agent may need a way to ask for a particular kind of training instance from the world in order to maximize meaning. Maybe I've seen mammals, and now I need to see another kind of animal to maximize my understanding of the meaning of "animal". A human being, perhaps instinctively, or perhaps by some other force, has the curiosity to go find and pay attention to fish and birds. A kid can tell you he wants to go to the zoo.
A supervised machine learning framework can't tell the researcher which training instances it needs to see in order to refine its notion of meaning. A supervised learning framework can't imagine where it might find that training instance, or describe what it might look like. A supervised learning framework never asks to go to the zoo.
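For what it's worth, there is a standard extension of the supervised setup that moves in this direction: active learning, where the model queries the instance it is least sure about. A minimal uncertainty-sampling sketch with scikit-learn (the pool and seed data are stand-ins):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def query_most_uncertain(model, unlabeled_pool):
    # Pick the pool example whose predicted class probabilities are
    # closest to uniform: the instance the model would most like to
    # "ask" about, the closest thing it has to wanting the zoo.
    probs = model.predict_proba(unlabeled_pool)
    least_confident = 1.0 - probs.max(axis=1)
    return int(np.argmax(least_confident))

# Hypothetical usage: fit on a small labeled seed set, then query.
# model = LogisticRegression().fit(X_seed, y_seed)
# idx = query_most_uncertain(model, X_pool)  # label X_pool[idx] next
```

Even here, though, the model only picks from a pool the researcher supplies; it still can't describe an instance it has never seen.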
Yes, in theory an offline supervised learner should never beat an online reinforcement learner. Adding a set of actions A that can be used to bias future examples in a predictable manner is certainly an advantage that will yield better convergence properties in almost all scenarios, simply because it lets you gain more information per observation.
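As a sketch of that "more information per observation" point: a two-armed bandit with Thompson sampling, where the agent's own action choices steer which observations it collects next (the true success rates would be hidden in reality; they appear here only to simulate the environment):

```python
import random

def thompson_step(beliefs, true_rates):
    # Sample a plausible success rate for each arm from its Beta
    # posterior, act on the best sample, then fold the observed
    # outcome back into the record (a conjugate Beta update).
    samples = [random.betavariate(a, b) for a, b in beliefs]
    arm = samples.index(max(samples))
    reward = 1 if random.random() < true_rates[arm] else 0
    a, b = beliefs[arm]
    beliefs[arm] = (a + reward, b + 1 - reward)
    return arm, reward

beliefs = [(1, 1), (1, 1)]  # uniform priors over both arms
for _ in range(1000):
    thompson_step(beliefs, true_rates=[0.3, 0.6])
```

A purely offline learner never gets to make that choice; it has to take whatever examples arrive.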