These models are clearly great with language, be it natural language or code. However, I wonder where the expectation comes from that a static stochastic parrot should be able to compute arbitrary first-order logic (in a series of one-shot next-word predictions). Could any expert elaborate on how a transformer model would solve this?