I think one of the major things missing from "LLMs as AGI" is the ability to learn continuously and incrementally, which is needed to learn a new skill (as well as to reason in the general case, where you are learning as you go while facing a novel task).
If we're relying on pre-training for skill/knowledge acquisition, then the training set would need to include robot-POV training data for every task and scenario it was expected to succeed at, which seems essentially impossible even given a potential world simulator in which to train it.
Without continual learning, or being pre-trained for every task, past or future, it'd be perpetual Groundhog Day: you coach your robo-plumber through a task one day, then have to coach it again every time the same task comes up. Of course most consumers aren't expert plumbers, so there's really no alternative but to have the robo-plumber come pre-trained for all eventualities, or be able to learn on the job the same way an inexperienced human plumber would.
There are many other things needed to replicate animal intelligence besides continual learning (e.g. traits like curiosity and boredom, to drive an autonomous system to experiment and learn), but continual learning is a big one. We've been stuck on "whole-dataset SGD-train, then deploy" since the advent of neural networks, despite many smart folk like Hinton trying to find something better.
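A toy sketch of why naive "just keep training" doesn't solve this (all numbers and tasks here are made up for illustration): running plain SGD sequentially on a second task overwrites what was learned on the first, i.e. catastrophic forgetting.

```python
# Toy example: one shared parameter w, model y = w * x, trained with
# plain SGD on task A, then task B. Task B training destroys task A.

def sgd(w, data, lr=0.1, steps=200):
    """Minimize mean squared error of y = w*x over (x, y) pairs."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

task_a = [(1.0, 2.0), (2.0, 4.0)]    # consistent with w = 2
task_b = [(1.0, -2.0), (2.0, -4.0)]  # consistent with w = -2

w = sgd(0.0, task_a)                 # learn task A: w -> ~2.0
loss_a_before = mse(w, task_a)       # near zero
w = sgd(w, task_b)                   # now learn task B: w -> ~-2.0
loss_a_after = mse(w, task_a)        # task A performance is gone
```

With one parameter the conflict is total, but the same effect shows up (more gradually) in large networks, which is why deployed models are frozen rather than left learning in the field.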
As far as things like flexibility go, the bar is pretty high. The robo-plumber needs to be able to lie on its back in a pool or spray of water while contorting itself into the cabinet under your kitchen sink to fix a leak when the water couldn't be shut off 100% ... the real world is infinitely messier and more challenging than any simulation is going to be, and the simulation isn't going to prepare the robot for the wet/greasy/slippery/etc physical environment in which it'd be working.
Never mind figuring out how to build human-level AGI (with learning, etc), having it operate in real time and be battery powered is a massive challenge. A car at least has its battery charged continuously by the engine/alternator. Real-time response by a multi-modal LLM currently requires multiple H100s or similar - probably a few kilowatts. There's no reason to suppose that even in theory it's possible to build a compact battery (or super-capacitor) technology capable of delivering that sort of power output for an 8-hour shift. There's more hope that future cognitive architectures and realizations (dataflow vs synchronous?) might reduce the power needed to what's available from a battery, but that'd be many decades away.
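Rough back-of-envelope arithmetic on that power claim (the draw and energy-density figures below are my assumptions, not measurements):

```python
# Assumed sustained compute draw for real-time multi-modal inference
# on several H100-class accelerators, in kilowatts (assumption).
power_kw = 2.0
shift_hours = 8.0

# Energy needed for one shift.
energy_kwh = power_kw * shift_hours            # 2 kW * 8 h = 16 kWh

# Typical cell-level Li-ion specific energy, Wh/kg (rough figure).
li_ion_wh_per_kg = 250.0

# Mass of cells alone (no pack overhead, cooling, or margin).
battery_kg = energy_kwh * 1000 / li_ion_wh_per_kg   # 64 kg
```

Even with these generous assumptions, that's tens of kilograms of cells just for the compute, before motors, pack structure, and thermal management.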