
It's IMO a pedantic distinction.

A compiled binary is a bad metaphor because it implies that Mistral-7B is an as-is, WYSIWYG project that's not easily modifiable. In contrast, a bunch of powerful new models have been created by modifying or finetuning Mistral-7B, such as Zephyr-7B: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
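To make that concrete: finetuning starts from the published weights, and no training pipeline is needed. Here's a minimal sketch using transformers + peft; the LoRA hyperparameters and target modules are illustrative assumptions, not Zephyr's actual recipe (Zephyr itself used SFT plus DPO):

    # LoRA finetune of Mistral-7B's published weights (sketch only)
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "mistralai/Mistral-7B-v0.1"
    # ~14 GB in bf16; needs a large GPU
    model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(base)

    # Train small adapter matrices instead of all 7B parameters
    # (hyperparameters below are assumptions, chosen for illustration)
    lora = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32,
                      target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only the adapters are trainable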

The better analogy for Mistral-7B is something like modding Minecraft or Skyrim: although those games are themselves closed source, modding has enabled innovations that help the open-source community directly.

It would be nice to have fully open-source methodologies, but lacking them isn't an inherent disqualifier.


It's a big distinction: if I want to tinker with the model architecture, I essentially can't, because the training pipeline is not public.


If you want to tinker with the architecture, Hugging Face has a FOSS implementation in transformers: https://github.com/huggingface/transformers/blob/main/src/tr...
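Concretely, that implementation lets you instantiate the architecture from a config, without the pretrained weights, and change whatever you want. A sketch, where the shrunken dimensions are arbitrary values picked so an experiment fits on one GPU:

    # Scratch Mistral-style model from the transformers implementation (sketch)
    from transformers import MistralConfig, MistralForCausalLM

    # Arbitrary small dimensions for a single-GPU architecture experiment
    config = MistralConfig(
        hidden_size=1024,
        intermediate_size=4096,
        num_hidden_layers=8,
        num_attention_heads=16,
        num_key_value_heads=4,  # grouped-query attention, as in Mistral-7B
    )
    model = MistralForCausalLM(config)  # randomly initialized
    print(sum(p.numel() for p in model.parameters()))  # sanity-check the size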

If you want to reproduce the training pipeline, you couldn't do that even if you wanted to because you don't have access to thousands of A100s.


I'm well aware of the many open source architectures, and the point stands. Models like GPT-J have open code and open training data, which lets you use them as a baseline for architecture experiments in a way you can't with Mistral's models. Mistral publishes weights and code, but not the training procedure or data. Not open.


We do have access to that kind of compute, via TRC (the TPU Research Cloud). Eleuther does too. I think it’s a bad idea to have a fatalistic attitude towards model reproduction.


Exactly, nice work BTW. And no hate for Mistral, they're doing great work, but let's not confuse weights-available with fully open models.


With all the new national supercomputers, scale isn’t really going to be an issue: they all want large language models running on 10k GH200s or whatever, and the libraries are getting easier to use.
