Hacker News new | past | comments | ask | show | jobs | submit login

Only if there isn’t a systemic fault, eg bad prompting.

Their errors appear to disappear when you correctly set the context from conversational to adversarial testing — and Apple is actually testing the social context and not its ability to reason.

I’m just waiting for Apple to release their GSM-NoOp dataset to validate that; preliminary testing shows it’s the case, but we’d prefer to use the same dataset so it’s an apples-to-apples comparison. (They claim it will be released “soon”.)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: