As far as I understand it, only kind of? It's open source, but the paper involved a tonne of pre-training, and whilst they've released a small pre-training checkpoint, they haven't released the results of the pre-training they did for the paper. So anyone reproducing this will inevitably be accused of failing to pretrain the model correctly?


I think the pre-trained checkpoint uses the same 20 TPU blocks as the original paper, but it probably isn't the exact same checkpoint, as the paper itself is from 2020/2021.



