The thing is that I think it could be an optimal way of saying it. Should we not put it into context of making a particular LLM? Why count three versions of three LLMs? They made it hard to choose the one that makes up for not having GPT 1. GPT 3.5 and Codex are both good candidates. And of course calling GPT 4 the third and fifth could be considered as well.
That doesn’t resolve the problem of whether third or fifth is better than fourth. I have yet to be convinced that their wording here shows that they fail to grasp the pace of the development.