I don’t entirely disagree, but that is still an intuition, not a proof that our interfaces should always work that way.
We used to ride animals with legs, which worked a lot like our legs do. Does that mean the wheel is wrong? We don’t have wheels, and they don’t occur in nature.
I don’t think Apple has invented the wheel, and I’m inclined to agree that leveraging our hardware acceleration makes sense. But I haven’t seen anything beyond blind assertion that of course it has to work that way.
I think things are more "differential" than that. Since many of us look at these interfaces more than any other visual stimulus, our perception will be optimized around them. The ideal system, in the short term, will involve familiarity more than anything.
We shouldn't need a manual to interpret a UI.