
>Does ChatGPT have a large test suite consisting of a large number of input question and expected responses that have to match?

They are trying to crowdsource this with OpenAI evals. https://github.com/openai/evals

I'm sure they have a lot of internal benchmarks too, but of course they won't share them.
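For a rough idea of what "input questions and expected responses that have to match" could look like, here is a minimal sketch of an exact-match check. It is not the openai/evals API; `CASES`, `toy_model`, and `exact_match_accuracy` are hypothetical names made up for illustration, and a real harness would call an actual model and likely use fuzzier matching.

```python
from typing import Callable, List, Tuple

# Hypothetical (prompt, expected answer) pairs; not taken from openai/evals.
CASES: List[Tuple[str, str]] = [
    ("2 + 2 =", "4"),
    ("Capital of France?", "Paris"),
]

def exact_match_accuracy(model: Callable[[str], str],
                         cases: List[Tuple[str, str]]) -> float:
    """Fraction of cases where the stripped model output equals the expected string."""
    hits = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return hits / len(cases)

def toy_model(prompt: str) -> str:
    # Stand-in for a real model call; it only knows one answer.
    return {"2 + 2 =": "4"}.get(prompt, "I don't know")

score = exact_match_accuracy(toy_model, CASES)
print(score)
```

A score below 1.0 here just means some expected responses were not matched; whether that counts as a "bug" is the question being argued in this thread.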

>If not, there can be no "bug" in ChatGPT.

I don't understand the objection. Are you claiming that bugs only exist if you have testcases for them?




Bugs only exist when there is a specification that is violated. Test cases validate whether a specification is implemented. The specification is the gold master; a bug exists when a specification is violated, whether or not that behavior has a test case. It can be, though, that some behaviors are specified only in test cases.

Thus, a program which crashes with an access violation can be specified as being built to demonstrate that effect, in which case it's not a bug.

(Some specifications can be unwritten, like the expectation that a word processing application doesn't die with access violations, ever. Even that may not be realistic; we have to hedge it with weasel words like "under ordinary conditions", meaning something like editing a reasonably sized document on a machine with adequate resources.)
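To make the spec-versus-tests distinction concrete, here is a small sketch under made-up assumptions: `spec` is a hypothetical specification for an absolute-value function, `buggy_abs` is a deliberately wrong implementation, and `TESTED_INPUTS` is a test suite that happens to miss the broken case. The tests all pass, yet the specification is still violated, so the bug exists without any test case for it.

```python
def spec(n: int, result: int) -> bool:
    # Hypothetical specification: the result must be the absolute value of n.
    return result == abs(n)

def buggy_abs(n: int) -> int:
    # Deliberately wrong: negates only values below -1, so -1 is returned unchanged.
    return -n if n < -1 else n

# A test suite that samples the specification but misses the failing input.
TESTED_INPUTS = [5, 0, -7]
tests_pass = all(spec(n, buggy_abs(n)) for n in TESTED_INPUTS)

# Checking the specification over a wider range exposes the untested violation.
violations = [n for n in range(-10, 11) if not spec(n, buggy_abs(n))]

print(tests_pass)
print(violations)
```

If the specification instead said "return -1 unchanged", the same code would have no bug at all, which is the point about the access-violation demo program.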


I would define a bug as defying user expectations in a negative way. Most novel products are figuring out what user expectations are as they go, so you are better off letting your users tell you what a bug is than sticking to some definition that requires a test suite or a predefined specification. It is hard to see ChatGPT making stuff up as desirable, so whether it is a bug or not is just semantics.


Defied user expectations are a bug when the purveyor of the system becomes aware of those expectations and subsequently adopts them as a requirement.

If the expectations are rejected, then the situation is resolved as a non-bug.

An expectations bug can go away if the user's expectations are "managed" in a direction away from it.

Expectations have to be formalized into something that is testable, if we are to actually implement a bug fix and close the bug.



