I've gotten the impression that:

1. The bias is mostly due to the training data coming from larger models that were heavily RLHF'd. Deepseek's training picked up that OpenAI/Qwen models tended to refuse certain queries and imitated those refusals, but the Deepseek models were not RLHF'd for censorship/'alignment' reasons after that.

2. The official Deepseek website (and API?) applies some level of censorship on top of the model's outputs to shut down 'inappropriate' results (roughly the pattern sketched below). This censorship is not embedded in the open model itself, though, and other inference providers host the model without a censoring layer.

Edit: Actually, it's possible that Qwen was actively RLHF'd to avoid topics like Tiananmen and Deepseek learned to imitate that. But the only examples of such refusals I've seen online were clearly due to some censorship layer on Deepseek.com, which isn't evidence that the model itself is censored.
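
To make the distinction in point 2 concrete, here's a minimal sketch of what a serving-side output filter could look like. Everything in it is made up for illustration (the blocklist, the generate() stand-in, the canned refusal); it shows the general pattern of filtering in the serving stack rather than in the weights, not Deepseek's actual implementation:

    # Hypothetical serving-side filter; the model weights are untouched.
    BLOCKED_TOPICS = ["tiananmen"]  # made-up blocklist for illustration

    def generate(prompt: str) -> str:
        # Stand-in for the real model call (open weights, any provider).
        return "...model output..."

    def serve(prompt: str) -> str:
        output = generate(prompt)
        # Post-hoc check on the finished text: swap in a canned refusal
        # if the prompt or output trips the filter.
        text = (prompt + " " + output).lower()
        if any(topic in text for topic in BLOCKED_TOPICS):
            return "Sorry, I can't discuss that topic."
        return output

Since this lives in the serving stack and not in the weights, anyone running the open model directly, or through another provider, never hits it.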

RLHF == Reinforcement Learning from Human Feedback