Not OP, but I agree with the point you're questioning, and my rationale is that I've yet to see a compelling argument that a self-improving system is anything other than the software version of a perpetual-motion machine. Those seemed plausible enough, too, back when thermodynamics was as poorly understood as information dynamics is now.
This one's tough to answer, actually. The truly optimal learner would have to use an incomputable procedure; even time-and-space bounded versions of this procedure have additive constants larger than the Solar System.
However, learning is more or less a matter of compression, and the basics of Kolmogorov complexity leave us with a nasty problem: it's undecidable/incomputable/unprovable whether a given compression algorithm is the best possible compressor for the data you're feeding it. So it's incomputable in general whether you've got the best learning algorithm for your sense-data, i.e. whether it compresses your observations optimally. You won't know you could self-improve with a better compressor until you actually find that better compressor, if you ever do at all.
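To make that concrete, here's a toy Python sketch (the choice of zlib/bz2/lzma is just what's handy in the standard library, nothing special): you can rank the compressors you have against each other on your observations, but nothing in the exercise tells you whether a better one exists.

    import bz2
    import lzma
    import zlib

    # Toy illustration: treat "how well you compress your observations" as a
    # proxy for "how good your model of them is". We can rank the compressors
    # we happen to have on hand, but nothing here proves the winner is optimal.
    observations = b"the quick brown fox jumps over the lazy dog " * 200

    candidates = {
        "zlib": lambda data: zlib.compress(data, 9),
        "bz2": lambda data: bz2.compress(data, 9),
        "lzma": lambda data: lzma.compress(data),
    }

    sizes = {name: len(compress(observations)) for name, compress in candidates.items()}

    for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
        print(f"{name}: {size} bytes")

    # The best compressor *we tried* is not necessarily the best that exists;
    # whether a shorter encoding is possible is exactly the incomputable question.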
An agent bounded in both compute time and sample complexity (the amount of sense-data it can learn from before it has to make a prediction) will probably face something like a sigmoid curve: the initial self-improvements are easier and more useful, while the later ones have diminishing marginal returns in how much they reduce prediction error versus how much CPU time must be invested to both find and run the improved algorithm.
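Purely as an illustration of that diminishing-returns claim (the logistic shape and all the numbers below are assumptions, not derived from anything), here's what the curve looks like if residual error falls along a sigmoid in cumulative compute:

    import math

    # Assumed shape, not a derived result: residual prediction error falls
    # along a logistic curve in the cumulative compute spent on self-improvement.
    def residual_error(compute_units):
        return 0.05 + 0.95 / (1.0 + math.exp(compute_units - 5.0))

    previous = residual_error(0)
    for compute in range(1, 11):
        error = residual_error(compute)
        print(f"compute={compute:2d}  error={error:.3f}  marginal gain={previous - error:.3f}")
        previous = error
    # The marginal gain peaks near the middle and then collapses: later
    # self-improvements cost just as much compute but buy far less accuracy.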
So far as I'm aware, most proponents of recursively self-improving AIs don't necessarily think they can improve without upper limit (as in perpetual motion). They just think they can improve massively and quickly. Nuclear power lasts a hell of a long time and releases a hell of a lot of energy very fast (see: stars), but that's not perpetual motion/infinite energy either. And prior to those theories being developed, it would have seemed inconceivable for so much energy to be packed into such a small space. But it was. Could be for AI too.
Not saying the parallel actually carries any meaning, just pointing out that you can make multiple analogies to physics and they don't really tell you anything one way or the other.
There are limits on resource-management processes that are far too frequently ignored. "The computer could build its own weapons!" -- but that would require secretly taking over mines, building factories, processing ores, running power plants, etc. All of which require human direction. And even if they didn't, we'd need a good reason to network all these systems together, fail to build kill switches, fail to monitor them, fail to notice when our resources were being redirected to other purposes, and not have any backup systems in place whatsoever.
There are just so many obstacles in place that we'd all already have to be brain-dead for computers to have the ability to kill us.
Self-improvement as perpetual motion seems unlikely.
I'm a not-terribly-bright mostly-hairless ape, but I can understand the basics of natural selection. I can imagine setting up a program to breed other hairless apes and ruthlessly select for intelligence. After a few generations, shazam, improvement.
The only reason you wouldn't call that process "SELF-improvement" is that I'm not improving myself, but there's no reason for a digital entity to have analog hangups about identity. If it can produce a "new" entity with the same goals but better able to accomplish them, why wouldn't it?
Assume this process could be simulated, as GAs have been doing for decades, and it could happen fast. Note that I'm not saying GAs will do this, I'm saying they could, which suggests there's no fundamental law that says they can't, in which case any number of other approaches could work as well.
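For what it's worth, a minimal sketch of the kind of GA loop I mean might look like the following; the fitness function, mutation rate, and population size are all toy assumptions chosen for illustration, with counting 1-bits standing in for "intelligence":

    import random

    GENOME_LEN = 64
    POP_SIZE = 50
    MUTATION_RATE = 0.02

    def fitness(genome):
        # Toy stand-in for "intelligence": count the 1-bits. In any real system,
        # defining this function meaningfully is the genuinely hard part.
        return sum(genome)

    def mutate(genome):
        return [bit ^ (random.random() < MUTATION_RATE) for bit in genome]

    def crossover(a, b):
        cut = random.randrange(1, GENOME_LEN)
        return a[:cut] + b[cut:]

    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(POP_SIZE)]

    for generation in range(100):
        # Ruthless selection: keep the top half, breed replacements from it.
        population.sort(key=fitness, reverse=True)
        survivors = population[:POP_SIZE // 2]
        children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                    for _ in range(POP_SIZE - len(survivors))]
        population = survivors + children

    print("best fitness after 100 generations:", fitness(max(population, key=fitness)))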
The problem with this is that you have to determine what the goals are and how to evaluate whether they are met in a meaningful way. A computerized process like this will quickly over-fit to its input and be useless for 'actual' intelligence. The only way past this is to gather good information, which requires a real-world presence. It can't be done in simulation.
It's the same reason you can't test in a simulation. Say you wanted to test a lawnmower in a simulation... how hard are the rocks? How deep are the holes? How strong are the blades? How efficient is the battery? If you already know this stuff, then you don't need to test. If you don't know it, then you can't write a meaningful simulation anyway.
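To put the over-fitting point in concrete terms, here's the standard toy failure mode (made-up data, nothing AI-specific): a model flexible enough to nail every training point can still be useless on points it hasn't seen.

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up "world": a simple underlying signal plus observation noise.
    x_train = np.linspace(0, 3, 10)
    y_train = np.sin(x_train) + rng.normal(0, 0.1, size=x_train.shape)
    x_test = np.linspace(0, 3, 100)
    y_test = np.sin(x_test)            # noise-free ground truth for scoring

    # A degree-9 polynomial can thread all ten training points almost exactly...
    poly = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)
    train_mse = np.mean((poly(x_train) - y_train) ** 2)
    test_mse = np.mean((poly(x_test) - y_test) ** 2)

    print(f"train MSE: {train_mse:.6f}")   # near zero: looks like a perfect learner
    print(f"test  MSE: {test_mse:.6f}")    # noticeably worse: it learned the noise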
That's an interesting argument, but doesn't it assume a small, non-real-world input/goal set?
Dumb example off the top of my head: what if the input was the entire StackOverflow corpus with "accepted" information removed, and the goal was to predict as accurately as possible which answer would be accepted for a given question? Yes, it assumes a whole bunch of NLP and domain knowledge, and a "perfect" AI wouldn't get a perfect score because SO posters don't always accept the best answer, but it's big and it's real and it's measurable.
A narrower example: did the Watson team test against the full corpus of previous Jeopardy questions? Did they tweak things based on the resulting score? Could that testing/tweaking have been automated by some sort of GA?
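Just to sketch how measurable that objective is (the data layout and the baseline model below are hypothetical, made up for illustration): given questions with candidate answers and a hidden "accepted" flag, any candidate model can be scored by how often its top pick matches the accepted answer on a held-out split.

    import random

    random.seed(0)

    # Hypothetical data layout: each question is a list of candidate answers,
    # each answer a (length_in_chars, was_accepted) pair. Answer length is just
    # a stand-in feature for whatever NLP/domain signals a real model would use.
    def make_fake_corpus(n_questions=1000):
        corpus = []
        for _ in range(n_questions):
            answers = [(random.randint(20, 2000), False)
                       for _ in range(random.randint(2, 6))]
            accepted = random.randrange(len(answers))
            answers[accepted] = (answers[accepted][0], True)
            corpus.append(answers)
        return corpus

    def longest_answer_wins(answers):
        # Baseline "model": guess that the longest answer gets accepted.
        return max(range(len(answers)), key=lambda i: answers[i][0])

    def accuracy(model, questions):
        hits = sum(answers[model(answers)][1] for answers in questions)
        return hits / len(questions)

    corpus = make_fake_corpus()
    split = len(corpus) // 2
    train, held_out = corpus[:split], corpus[split:]

    # Tune candidate models (by GA or anything else) against `train`;
    # the held-out score is the honest number to report.
    print(f"held-out accuracy: {accuracy(longest_answer_wins, held_out):.2%}")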
The point there is that you can make a computer that's very good at predicting StackOverflow results or Jeopardy, but it won't be able to tie a shoe. If you want computers to be skilled at living in the real world, they have to be trained with real-world experiences. There is just not enough information in StackOverflow or Jeopardy to provide a meaningful representation of the real world. You'll end up overfitting to the data you have.
The bottom line is that without sensory input, you can't optimize for real world 'general AI'-like results.