Bugs exist only where there is a specification to violate. Test cases validate whether a specification is implemented, but the specification is the gold master: a bug exists whenever the specification is violated, whether or not that behavior is covered by a test case. (Though some behaviors are specified only in test cases.)
Thus a program that crashes with an access violation could be specified as having been built to demonstrate exactly that effect, in which case the crash is not a bug.
(Some specifications are unwritten, like the expectation that a word processor never dies with an access violation. Even that may not be realistic; we have to hedge it with weasel words like "under ordinary conditions": editing a reasonably sized document on a machine with adequate resources, and so on.)
I would define a bug as defying user expectations in a negative way. Most novel products are figuring out what user expectations are as they go, so you are better off letting your users tell you what a bug is than sticking to some definition that requires a test suite or a predefined specification. It is hard to see ChatGPT making stuff up as desirable, so whether that counts as a "bug" or not is just semantics.
They are trying to crowdsource this with OpenAI evals. https://github.com/openai/evals
I'm sure they have a lot of internal benchmarks too, but of course they won't share them.
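To give a flavor of what an eval is: the samples in that repo are JSONL records with "input" and "ideal" fields, and a grader compares a model's reply against the ideal answer. Below is a minimal sketch of an includes-style eval in Python, assuming the openai client library; the sample content and model name are made up for illustration, and this is not the repo's actual harness.

  # Minimal sketch of an includes-style eval, loosely modeled on the
  # {"input": ..., "ideal": ...} JSONL sample convention in openai/evals.
  # The sample and model name below are illustrative, not from the repo.
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  samples = [
      {
          "input": [{"role": "user",
                     "content": "What year did Apollo 11 land on the Moon?"}],
          "ideal": "1969",
      },
  ]

  def run_eval(samples, model="gpt-4o-mini"):
      passed = 0
      for sample in samples:
          resp = client.chat.completions.create(model=model,
                                                messages=sample["input"])
          answer = resp.choices[0].message.content.strip()
          # Includes-style grading: pass if the ideal answer
          # appears anywhere in the model's reply.
          if sample["ideal"] in answer:
              passed += 1
      return passed / len(samples)

  print(f"accuracy: {run_eval(samples):.2%}")

Crowdsourcing helps here precisely because of the point upthread: for a product like ChatGPT, the "specification" is really the aggregate of user expectations, and evals are one way to pin those expectations down as checkable cases.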
> If not, there can be no "bug" in ChatGPT.
I don't understand the objection. Are you claiming that bugs only exist if you have test cases for them?