>The aggregation performed by model training is highly lossy and the model itsel...

fnordpiglet · on May 31, 2023

And if you use it, you’re violating copyright. But you will find no copy of the logo in the model data. The model is way too small to contain its training imagery from an information theoretic point of view.
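
As a back-of-the-envelope check on the size argument, here is a rough sketch in Python. The figures are illustrative assumptions, not numbers from this thread: a checkpoint of about 4 GB, a training set of about 2 billion images, and a typical compressed image of about 50 KB. All names are hypothetical.

```python
# Rough capacity estimate: how many bytes of model weights are available
# per training image? All three constants are assumed round numbers.

MODEL_BYTES = 4e9          # assumed checkpoint size (~4 GB)
NUM_TRAINING_IMAGES = 2e9  # assumed training set size (~2 billion images)
TYPICAL_IMAGE_BYTES = 50_000  # assumed size of one compressed image (~50 KB)


def bytes_per_training_image(model_bytes: float, num_images: float) -> float:
    """Average number of weight bytes per training image."""
    return model_bytes / num_images


budget = bytes_per_training_image(MODEL_BYTES, NUM_TRAINING_IMAGES)
shortfall = TYPICAL_IMAGE_BYTES / budget

print(f"{budget:.1f} bytes of weights per training image")
print(f"a typical image is about {shortfall:,.0f}x larger than that budget")
```

Under these assumptions the average budget is a couple of bytes per image, so the model cannot be storing every image. It is only an average, though: it does not rule out a heavily duplicated image, such as a famous logo, taking up far more than its share.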

falcolas · on May 31, 2023

> But you will find no copy of the logo in the model data.

You won't find a copy of a plaintext in a cyphertext. But you can still extract the plaintext from the cyphertext.

fnordpiglet · on May 31, 2023

That’s an example of a two-way lossless transformation. The data is certainly encoded in the cyphertext and is directly retrievable, and the cyphertext has no other purpose than to contain the original data. With the model, you can’t directly retrieve the original data, and it has more purposes than producing the original. It requires you to specifically manipulate it to produce the copyrighted material, and just because copyrighted material was used to train it doesn’t mean it can even reproduce a facsimile.
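
To make the distinction concrete, here is a toy sketch in Python. It is an illustration under loud assumptions, not a description of how any real model is trained: the "cipher" is a simple repeating-key XOR, and the "aggregate" is a plain per-position average standing in for a lossy statistical summary. All names are hypothetical.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR. Applying it twice with the same key is the identity."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def average(samples: list[list[float]]) -> list[float]:
    """Per-position mean of equally sized samples (a lossy aggregate)."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


# Two-way lossless: the plaintext comes back exactly.
plaintext = b"SUPERMAN LOGO"
key = b"k3y"
ciphertext = xor_cipher(plaintext, key)
recovered = xor_cipher(ciphertext, key)
print(recovered == plaintext)

# Lossy: two different datasets collapse to the same aggregate,
# so the aggregate alone cannot tell you which inputs produced it.
dataset_a = [[0.0, 10.0], [10.0, 0.0]]
dataset_b = [[5.0, 5.0], [5.0, 5.0]]
print(average(dataset_a) == average(dataset_b))
```

The cyphertext maps back to exactly one plaintext, while the average is consistent with many different inputs. A trained model is far more structured than an average, so treat this only as an intuition for "lossy", not as evidence about what a given model can or can't reproduce.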

I think a better counter example is mpeg and other lossy formats. But again, the format does nothing but carry the original even if it’s not a perfect reproduction. You can’t use it in any other way. Its expressed intent is the reproduction of the copyrighted material with no modification or improvement or derivation. These models are not trained with the intent or purpose of only producing the copyrighted materials. It requires your specific action to induce the reproductions if it’s even possible, but it generally serves other purposes in all other uses.

This is more like a xerox than not - you can certainly violate copyright with a xerox. But the purpose of the xerox itself isn’t to violate copyright. It’s for other purposes. The ambiguity obviously comes in that a xerox machine wasn’t built by scanning all documents on earth first. But I think the very act of mixing all the other images and documents together into the model, which, again, is just a statistical aggregate of everything it was trained on mushed together, turns it into at worst a derived work that falls under fair use.

nl · on June 1, 2023

If you search "Superman Logo" you find actual copies of the Superman logo which are served from Google's cache.

If you ask a VFX artist to create the "Superman Logo" with Photoshop they'll do an excellent job.

The first one isn't a copyright violation because it is fair use. The second may be, if it is redistributed, but we don't ban the use of Photoshop by artists because they can choose to reproduce copyrighted things with it.

gedy · on June 1, 2023

I agree, and I honestly think that a big part of the issue with AI image generation is people just really have a hard time conceiving of a technology that can make such accurate images from a relatively small model like this.

"It must have a copy" - but van Gogh didn't make paintings of hot rods or whatever, and you can't copyright style or technique.

vidarh · on June 1, 2023

I see so many laypeople, and sometimes even people with a CS background, describe diffusion models as some sort of magic content-addressable data store you "just" look up a bunch of original images in and somehow copy pieces of. These debates would get a whole lot better if more people had at least a very basic understanding of how the training process works.

mjevans · on May 31, 2023

Superhero costume using a logo. The logo is inspired by the strongest gem's typical cut, with a monogram of the first letter of the name for maximum size and visual clarity.

Literally if you asked someone for a recognizable outline of the strongest gemstone's iconic cut you'd get the outline and the rest is an obvious path. Humans might unconsciously, or even by choice, avoid something too similar to something they already know.

Superman's costume also uses vibrant colors. The red / blue pairing is used extensively across many logos and visual representations for the high contrast of two vibrant colors.

As I try to imagine an older child or young adult somehow raised in an environment like pop culture but through some twist absolutely unexposed to Superman or any related concepts, it isn't that far of a stretch to imagine independent invention of a strikingly similar idea. Maybe not as a first draft, but in exploring a range of possible powers and automatic logos. E.g. as in the range of an LLM-backed character creator for a superhero game, and then annealing the results through simulated effectiveness / fitness of hero powers, logo design, etc.

Everyone wants to think they're a special snowflake and that what they create is somehow unique as well. However we're all drawing on a huge pool of common culture to synthesize expressions which fulfill a set of constraints prescribed by the culture and the culture's influence on the individual and the moment being experienced.

In the case of Superman that's even arguably a description of the archetype. They are literally a super man. Clark Kent however, that's a little more unique and probably a Trade Mark (consumer commercial use protection) as long as such a registration is maintained.

karaterobot · on May 31, 2023

Unfortunately, I don't believe any of that matters with trademarks. If someone came up with the Superman logo on their own, and released a product that used it, they could not say "but it's a really simple logo" and get a free pass. I'm not sure what that means for ChatGPT, but it would certainly factor into your use of images produced by ChatGPT.

mywittyname · on June 1, 2023

I feel like this is a very strong point that just gets hand-waved away. There are numerous cases where AI-generated content is an exact copy of a derived work. This happens with text, music, and art.

If we have AI-powered content generation in a video game, and you put into a prompt, "generate 300 mickey mouses, then play some music that sounds like Taylor Swift's new album", and the results look exactly like Mickey Mouse and the music is Taylor Swift's, it's really difficult to argue that's not copyright infringement.

Yet, people get away with thinking that's not copyright infringement because "the algorithm learned it, like a real human". If the prompt just created a human-designed model, then that is copyright infringement.

The solution might be for big corporations to create an adversarial network that you can train against to purge copyrighted works from your network.

fnordpiglet · on June 1, 2023

It is copyright infringement - but it is you, who prompted the production of the copyright violations, who is at fault. The model isn’t specifically any more a copyright violation than a browser cache or a photocopier. The person who uses the machine to produce violations is at fault, not the thing that, in addition to legitimate transformed works, can be used to produce copyright violations. As a company hosting such a service, my goal would be similar to YouTube's, where I make a best effort to monitor for violations and put up guard rails as best I can. But I shouldn’t be held liable for your intentional use of a product for ill so long as I did make that best effort.