
It's IMO a pedantic distinction.

A compiled binary is a bad metaphor because it implies that Mistral-7B is an as-is, WYSIWYG project that's not easily modifiable. In contrast, a bunch of powerful new models have been created by modifying or finetuning Mistral-7B, such as Zephyr-7B: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
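To make that concrete: finetuning starts from the published weights, and no training pipeline is needed. Here's a minimal sketch using transformers + peft; the LoRA hyperparameters and target modules are illustrative assumptions, not Zephyr's actual recipe (Zephyr itself used SFT plus DPO):

    # LoRA finetune of Mistral-7B's published weights (sketch only)
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "mistralai/Mistral-7B-v0.1"
    # ~14 GB in bf16; needs a large GPU
    model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(base)

    # Train small adapter matrices instead of all 7B parameters
    # (hyperparameters below are assumptions, chosen for illustration)
    lora = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32,
                      target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only the adapters are trainable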

The better analogy for Mistral-7B is something like modding Minecraft or Skyrim: although those games are themselves closed source, modding has enabled innovations that help the open-source community directly.

It would be nice to have fully open-source methodologies, but lacking them isn't an inherent disqualifier.


It's a big distinction: if I want to tinker with the model architecture, I essentially can't, because the training pipeline is not public.


If you want to tinker with the architecture, Hugging Face has a FOSS implementation in transformers: https://github.com/huggingface/transformers/blob/main/src/tr...
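Concretely, that implementation lets you instantiate the architecture from a config, without the pretrained weights, and change whatever you want. A sketch, where the shrunken dimensions are arbitrary values picked so an experiment fits on one GPU:

    # Scratch Mistral-style model from the transformers implementation (sketch)
    from transformers import MistralConfig, MistralForCausalLM

    # Arbitrary small dimensions for a single-GPU architecture experiment
    config = MistralConfig(
        hidden_size=1024,
        intermediate_size=4096,
        num_hidden_layers=8,
        num_attention_heads=16,
        num_key_value_heads=4,  # grouped-query attention, as in Mistral-7B
    )
    model = MistralForCausalLM(config)  # randomly initialized
    print(sum(p.numel() for p in model.parameters()))  # sanity-check the size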

If you want to reproduce the training pipeline, you couldn't do that even if you wanted to because you don't have access to thousands of A100s.


I'm well aware of the many open source architectures, and the point stands. Models like GPT-J have open code and open training data, which lets you use them as a baseline for architecture experiments in a way you can't with Mistral's models. Mistral publishes weights and code, but not the training procedure or data. Not open.


We do have access to that kind of compute, via TRC (the TPU Research Cloud). Eleuther does too. I think it’s a bad idea to have a fatalistic attitude towards model reproduction.


Exactly, nice work BTW. And no hate for Mistral, they're doing great work, but let's not confuse weights-available with fully open models.


With all the new national supercomputers, scale isn’t really going to be an issue: they all want large language models running on 10k GH200s or whatever, and the libraries are getting easier to use.
