The mathematics of BNNs is sound. The Shannon entropy of a word is really small (I vaguely remember ~2 bits). Also, all neural networks are ridiculously over-provisioned.
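
At its simplest, binarization replaces each weight tensor with its sign plus one shared scale. A quick PyTorch sketch of the XNOR-Net-style version (binarize is just an illustrative helper name, not from any library):

    import torch

    def binarize(w: torch.Tensor) -> torch.Tensor:
        # XNOR-Net-style binarization: approximate the real-valued
        # weights with sign(w) times a single scale alpha, so each
        # weight costs ~1 bit plus one shared float per tensor.
        alpha = w.abs().mean()
        wb = torch.sign(w)
        # sign(0) is 0; map it to +1 so the result stays binary.
        wb = torch.where(wb == 0, torch.ones_like(wb), wb)
        return alpha * wb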

I worked on this ~7 years ago, trying to efficiently binarize CNNs from existing models. The difficulty was getting training to run without the losses going too high. I think vision models will be much harder to binarize, but you might not need to with CLIP if the vision encoder stays in regular precision (fp16/int8).
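
The usual trick for keeping BNN training stable is a straight-through estimator: binarize on the forward pass, but let gradients flow back to full-precision latent weights. A minimal PyTorch sketch (BinarizeSTE is an illustrative name, assuming the standard gradient-clipping variant):

    import torch

    class BinarizeSTE(torch.autograd.Function):
        # Forward: hard sign. Backward: straight-through estimator,
        # i.e. pretend sign() was the identity, but zero the gradient
        # where |w| > 1 so saturated weights stop drifting.
        @staticmethod
        def forward(ctx, w):
            ctx.save_for_backward(w)
            return torch.sign(w)

        @staticmethod
        def backward(ctx, grad_out):
            (w,) = ctx.saved_tensors
            return grad_out * (w.abs() <= 1).float()

    # Training keeps full-precision "latent" weights; only the
    # forward pass ever sees the binarized copy.
    w = torch.randn(64, 64, requires_grad=True)
    loss = BinarizeSTE.apply(w).sum()
    loss.backward()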

What about text to speech models? Do you think ternary will work?

Just to be clear, it's all theoretically possible. There are already BNN versions of YOLO and other CNNs. No reason why transformers wouldn't work for that or audio; it just might be harder to get them to train well enough.
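
For ternary specifically, it's the same idea with a dead zone around zero. A rough PyTorch sketch of Ternary Weight Networks-style quantization (the 0.7 threshold factor is the heuristic from that paper, a starting point rather than a hard rule):

    import torch

    def ternarize(w: torch.Tensor) -> torch.Tensor:
        # TWN-style ternarization: weights become {-1, 0, +1} * alpha.
        # Weights with |w| below the threshold delta are zeroed out.
        delta = 0.7 * w.abs().mean()
        mask = (w.abs() > delta).float()
        # alpha = mean magnitude of the weights that survive the mask.
        alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
        return alpha * torch.sign(w) * mask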

Speech to text, however, is super interesting. You just gave me an idea! I'm gonna go run some experiments :D

Please report back! :-)
