A compiled binary is a bad metaphor because it implies that Mistral-7B is an as-is, WYSIWYG artifact that's not easily modifiable. In contrast, a number of powerful new models have been created by modifying or fine-tuning Mistral-7B, such as Zephyr-7B: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
A better analogy for Mistral-7B is modding Minecraft or Skyrim: although those games are themselves closed source, modding them has enabled innovations that benefit the open-source community directly.
It would be nice to have fully open-source methodologies, but lacking them isn't an inherent disqualifier.
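Concretely, this kind of modification is cheap relative to pretraining. Here's a minimal sketch of attaching LoRA adapters to the released Mistral-7B weights, assuming the Hugging Face transformers and peft libraries; Zephyr-7B itself was built with full SFT plus DPO rather than LoRA, and the hyperparameters below are illustrative placeholders, not anything published by Mistral or HuggingFaceH4.

```python
# A minimal sketch of lightweight fine-tuning on top of the released
# Mistral-7B weights. Hyperparameters are illustrative, not prescriptive.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach low-rank adapters to the attention projections; only these
# small matrices are trained while the released weights stay frozen.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a fraction of a percent of 7B params

# From here, any standard training loop (e.g. a transformers Trainer or
# a TRL SFT script) over your own dataset yields a derivative model.
```

The point being: none of this requires Mistral's training procedure or data, which is exactly why the "compiled binary" framing undersells what the community can do with the weights.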
I'm well aware of the many open-source architectures, and the point stands. Models like GPT-J have open code and open training data, which lets them serve as baselines for architecture experiments in a way Mistral's models can't. Mistral publishes weights and code, but not the training procedure or the data. Not open.
With all the new national supercomputers, scale isn't really going to be an issue; they all want large language models on 10k GH200s or whatever, and the libraries are getting easier to use.