The part of Meta research that worked on LLaMa happened to be based in the Paris office. Then some of the leads left and started Mistral.
Complex/simple is not really the right way to think about training these models; I'd say it's more arcane. Every mistake is expensive because it burns a ton of GPU time and/or human fine-tuning time. Take a look at the logbooks of some of the open-source/research training runs.
So these engineers have real value, since they've already seen these mistakes made (and paid for out of Meta's budget).