
The reasoning emerges from the long-distance relations between words that transformers pick up through self-attention, which looks at every token in the sequence in parallel instead of stepping through them one at a time. It's why they perform so much better than earlier RNNs and LSTMs, which used similar tokenization.
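To make the "long-distance relations" point concrete, here is a rough sketch of scaled dot-product self-attention in plain Python. It is a toy under stated assumptions, not how any real model is implemented: the 2-d embeddings are made up, the query/key/value projections are left as the identity, and `self_attention` is a hypothetical helper name. It only shows that the first token can attend to the last one in a single step, no matter how many tokens sit between them.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Toy scaled dot-product self-attention.

    Assumption: queries, keys and values are the input vectors themselves
    (identity projections), which real transformers do not do.
    """
    d = len(vectors[0])
    weights = []
    for q in vectors:
        # Each query is scored against every position directly.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, vectors)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs

# Made-up embeddings: the first and last tokens are related,
# the three in between are unrelated filler.
tokens = [[4.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [4.0, 0.0]]
weights, outputs = self_attention(tokens)

# How much the first token attends to each position.
print([round(w, 3) for w in weights[0]])
```

In this toy, the first token puts as much weight on the last token as on itself, and far more than on the filler in between. An RNN or LSTM would have to carry that information through every intermediate step of its hidden state.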


