I feel like the biggest takeaway here is that a classifier trained on samples could predict whether ChatGPT would refuse to respond only 76% of the time. That seems very low to me, given that they tried BERT, regression, and a random forest as classifiers.

That probably means there's a lot we still can't predict about how LLMs work internally, even when we apply classification to them.