As far as I understand it, only kind of? It's open source, but the paper involved a tonne of pre-training, and whilst they've released a small pre-training checkpoint, they haven't released the pre-trained weights used for the paper's results. So anyone reproducing this will inevitably be accused of failing to pre-train the model correctly.
I think the released checkpoint uses the same 20 TPU blocks as the original paper, but it probably isn't the exact same checkpoint, since the paper itself dates from 2020/2021.