Just a reminder that LLaMA is not open: to use it legally you have to agree to Meta's terms, which currently means research use only. The versions circulating on torrents are essentially pirated, and while I don't have an ethical problem with that at all, you can't use them safely in a business.
The open replacements for LLaMA have yet to reach 30B, let alone 65B.
If anyone has a copyright claim to an LLM, the creators of the input data have more of a claim than the company that trained it. There's a good chance the weights are not copyrightable at all. I'd bet there are a lot of people willing to take on that risk.
However, they might still fall under trade secret law.
The "software" part of an LLM is pretty trivial -- the interesting piece is the weights. Since the weights are mechanically generated by a computer, it can be argued that they are not copyrightable, just like a photograph taken by a monkey isn't copyrightable.
The software is the matrix multiplication and gradient descent. We are talking about the numbers in the matrices. They are the output of a training algorithm, so we can only talk about the copyright on the training algorithm, and on its input data.
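To make the software/weights split concrete, here's a toy sketch (nothing like a real LLM, just a two-parameter linear fit, and the data and function name are made up for illustration). The `train` function is the "software"; the two numbers it returns are the "weights", produced mechanically from the input data.

```python
# Toy illustration: the "software" is a few lines of arithmetic and
# gradient descent; the "weights" are just the numbers it outputs.

def train(xs, ys, lr=0.1, steps=2000):
    """Fit y ~ w * x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # weights start as arbitrary numbers
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step downhill.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical "training data": points on the line y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 4.0, 7.0, 10.0]

w, b = train(xs, ys)
print(f"learned weights: w={w:.3f}, b={b:.3f}")
```

With this data the result should land very close to w=3, b=1. Nobody typed those numbers in; they fall out of the algorithm plus the data, which is the whole point of the argument above.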
The model weights could be seen as a derived work, for which they didn't get the permission of the original copyright holders. Alternatively, it can be argued that the LLMs are no different than a fanfic writer trying to imitate the style of their favorite author.
It's not obvious which way it will go, but I can see the point of those arguing that LLM data are ill-gotten gains.
People always bring this up like it’s a big deal, but most users aren’t interested in starting a business. We just wanna play with LLMs.
Frankly, I’m glad we don’t have a bunch of llamas in different skins being hawked like the current crop of “AI” startups that are just thin layers over OpenAI’s API.