Nothing in copyright law talks about 'semantic meaning' or 'character of the source material'. Really, quite the opposite - the 'expression-idea dichotomy' says that copyright protects the expression of an idea, not the idea itself.
https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...
(Leaving aside whether the weights of an LLM actually do encode the content of any random snippet of training text. Some stuff does get memorized, but how much, and how exactly? Memorization isn't the point of an LLM, unlike a JPEG or a database.)
And, again, look at the search snippets case - these were words produced by other people, directly transcribed, so open-and-shut from a certain point of view. But the decision went the other way.