If being probabilistic prevented learning deterministic functions, transformers couldn’t learn addition either. But they can, so that can't be the reason.
Are you sure? I bet that if you pulled 10 people off the street and asked them to multiply two 5-digit numbers by hand, you wouldn't get a 100% success rate.
Transformers do just fine on many deterministic tasks, and are not necessarily probabilistic. This is not the issue at all. So, it's hard for everyone else because they're not confidently wrong like you are.
Bad take. It's not that it's hard for everyone - there's critical pushback because we don't know for certain whether LLM technology can or cannot do the task in question, which is exactly why there's a paper being discussed.
If we took the stance of "OK, that happened, so it must be the case," we wouldn't be better off in many situations - most likely we'd still be accusing people of being witches.
Science is about coming up with a theory and trying to poke holes in it until you can't. At that point, after careful peer review to ensure you're not just tricking yourself into seeing something that isn't there, a consensus is reached, on which we can continue to build more truth and knowledge.
Not true though. Internally they can "shell out" to sub-tasks that know how to do specific things, and those specific things don't have to be models.
(I'm specifically talking about commercial hosted ones that have the capability I describe - obviously your run-of-the-mill model downloaded off the internet cannot do this.)
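To make the "shelling out" idea concrete, here is a minimal sketch of the pattern: a hosted system can route arithmetic to a deterministic tool instead of having the model predict digits token by token. Everything here (`multiply_tool`, `answer`, the regex routing) is a hypothetical illustration of the pattern, not any vendor's actual implementation.

```python
import re

def multiply_tool(a: int, b: int) -> int:
    """Deterministic arithmetic 'tool' -- exact, unlike token-by-token prediction."""
    return a * b

def answer(prompt: str) -> str:
    """Hypothetical router: if the prompt looks like a multiplication request,
    delegate to the exact tool rather than generating the digits."""
    m = re.fullmatch(r"\s*(\d+)\s*[x*]\s*(\d+)\s*", prompt)
    if m:
        a, b = int(m.group(1)), int(m.group(2))
        return str(multiply_tool(a, b))
    # Anything else falls through to ordinary model generation.
    return "(fall back to the model's own generation)"

print(answer("98765 * 43210"))  # prints 4267635650 -- exact, every time
```

The point is that the tool's correctness is independent of the model's probabilistic sampling: once the request is routed, a 5-digit-by-5-digit multiplication succeeds 100% of the time.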