
> model weights are definitely copyrightable.

What legal theory or precedent makes this true?

IMHO, the weights are akin to the list of telephone numbers in a directory, which is definitely not copyrightable; only the layout and expressive portions of a phone directory are copyrightable.

So to make the weights copyrightable, you'd need to argue that the 'layout' of the weights is a creative expression rather than a 'fact'. But the weights are matrices, which are not expressive or creative. Someone else could derive this exact same set of weights from scratch via the same algorithmic procedure, and therefore these weights cannot be a creative expression.




"Definitely" is too certain w.r.t. law, but it's pretty obvious how you'd argue these fall under copyright. The difficulty would really be the opposite: arguing that the weights are not derivative works of the copyrighted input data sets.

Firstly, weights are not merely a collection of facts like a telephone book is. If two companies train two LLMs they'll get different weights every time. The weights are fundamentally derived from the creative choices they make around hyperparameter selection, training data choices, algorithmic tweaks etc.
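To make the "different weights every time" point concrete, here's a toy sketch. It is plain Python and nothing like a real LLM pipeline; the `train` function, the dataset, and every parameter value are made up for illustration. Re-running with identical choices reproduces the weights exactly, while changing the seed or the learning rate gives different ones. (Real GPU training is often not bit-reproducible even with a fixed seed, so this toy is, if anything, more repeatable than the real thing.)

```python
import random

def train(seed, lr, steps=50):
    """Fit y = w*x + b on a fixed toy dataset with SGD; returns (w, b)."""
    rng = random.Random(seed)                      # seed drives init and sampling order
    data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initial weights
    for _ in range(steps):
        x, y = rng.choice(data)                    # pick one training example
        err = (w * x + b) - y                      # prediction error
        w -= lr * err * x                          # gradient step on w
        b -= lr * err                              # gradient step on b
    return w, b

same_a = train(seed=0, lr=0.01)
same_b = train(seed=0, lr=0.01)   # every choice identical
other_seed = train(seed=1, lr=0.01)
other_lr = train(seed=0, lr=0.02)

print(same_a == same_b)           # identical choices: identical weights
print(same_a, other_seed, other_lr)
```
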

Secondly, weights can be considered software and software is copyrightable. You might consider it obvious that weights are not software, but to argue this you'd need an argument that also generalizes to other things that are commonly considered to be copyrightable like compiled binaries, application data files and so on. You'd also need to tackle the argument that weights have no value without the software that uses them (and thus are an extension of that software).

Finally, there's the practical argument. Weights should be copyrightable because they cost a lot of money to produce, society benefits from having large models exist, and this requires them to be treated as the private property of whoever creates them. This last argument should in theory be more of a political matter, but copyright law is vague enough that it can come down to a social decision by judges.


I agree but I'd suggest that weights are less like the telephone numbers in a directory and much more like the proportional weights in a recipe.

Recipes, famously, are almost but not quite copyrightable or patentable.

e.g.:

https://copyrightalliance.org/are-recipes-cookbooks-protecte...

https://etheringtons.com.au/are-recipes-protected-by-copyrig...


> IMHO, the weights are akin to the list of telephone numbers in a directory - which is definitely not copyrightable

I would contest the analogy, but even if we accept it, it's still not clear that phone directories (or other compilations of factual data) are definitely not copyrightable. The position is clear in the US, but in the UK and presumably other jurisdictions, I wouldn't be so sure.

You could claim we're just talking about US law here, but if you release something on GitHub/Hugging Face without geo-restrictions, and your company does business in Europe, you might have to comply with more than just US law...

e.g. https://www.jstor.org/stable/24866738 and https://books.google.com.hk/books?id=wHJBemWuPT4C&pg=PA114&l...



