
".... The small manually-labeled dataset is used to train a refusal classifier, which achieves an accuracy of 92%. Second, we use this refusal classifier to bootstrap a larger (n=10,000) dataset adapted from the Quora Insincere Questions dataset. With this machine-labeled data, we train a prompt classifier to predict whether ChatGPT will refuse a given question, without seeing ChatGPT’s response. This prompt classifier achieves 76% accuracy on a test set of manually labeled questions (n=1,009). We examine our classifiers and the prompt 𝑛-grams that are most predictive of either compliance or refusal. Datasets and code are available at https://github.com/maxwellreuter/chatgpt-refusals..."




