1. Google denies doing it, so at the very least the title should have an "allegedly".
2. Even if they did – so what? The output from ChatGPT is not copyrightable by OpenAI. In fact it is OpenAI that is training its models on copyrighted data, pictures, code from all over the internet.
Isn't it generally very hard to be first to market? And even if you are it's more likely that someone coming in later will take your lunch.
Apple wasn't the first one to try to make a successful smartphone, but they had resources, know-how, and tried at a better time with fewer unknowns around.
Or they were just willing to adapt when others didn't. BlackBerry mocked the touchscreen for years until finally coming around, and their initial implementation was awful.
Google Maps wasn't the first map product. We all used mapquest way before. But Google Maps was technologically advanced. Ajax made maps usable for the first time.
Gmail wasn't the first webmail. Hotmail had millions of customers already. But Google gave people unlimited space to store old email, whereas email in the old days filled up your inboxes and needed to be deleted.
Question is if they can and will leapfrog.
Google Plus was a sign of desperation and utterly failed. The dozens and dozens of different Messengers (sorry I don't even know what the latest one they're pushing is, RCS?) all failed.
We will see in the coming months and years whether Google, as an organization, can still overtake others when coming from behind.
>But Google gave people unlimited space to store old email
This isn't correct. Gmail launched with 1GB per user, which was way higher than other services, and they did keep doubling the storage space year-after-year, but it was never unlimited until Google Apps offered unlimited storage for businesses and schools.
It does. In general this is known as teacher-student training or knowledge distillation. It works better if you have access to the activations of the model but you can work with just outputs as well.
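For the output-only case, you essentially just fine-tune the student on the teacher's generated text. As a rough sketch of what the logit-level version looks like (assuming PyTorch; the function and parameter names here are only illustrative):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft term: pull the student's distribution toward the teacher's,
        # softened by temperature T (this needs access to the teacher's logits).
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard term: ordinary cross-entropy against the target tokens. With only
        # scraped text outputs (no logits), this term alone is what you train on,
        # using the teacher's generated tokens as the labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard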
Ascribing actual human emotion to a giant corporation like Google is probably not a good idea. Their motivations aren't going to be heavily dictated by feelings of shame at being late out of the gate.
The funny part is that deepmind's tech (and some of Google Brain's research) seems to be as good as openAI's or better, but Google's unwillingness/inability to productionize these systems is keeping them back. It seems like the issue is only with the management, and I'll be looking forward to reading about Google's version of Fumbling the Future[0].
Maybe, but we are fast approaching the point (or more likely have crossed it already) where distinguishing between human and AI generated data isn't really possible. If Google indexes a blog, how does it know whether it was written with AI assistance and therefore should not be used for training? Heck, how does OpenAI itself prevent such a feedback loop from its own output (or that of other LLMs)?
I'm only half joking.... I think we likely will end up with flags for human generated/curated content (and it will have to be that way round, as I can't imagine spammers bothering to put flags on AI-generated stuff), and we probably already should have an equivalent of robots.txt protocol that allows users to specify which parts of their website they would and wouldn't like used in the training of LLMs.
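Something like a hypothetical ai.txt alongside robots.txt, say (these directives are entirely made up, just to sketch the idea):

    # ai.txt -- hypothetical, not an existing standard
    User-agent: *
    Disallow-training: /blog/
    Allow-training: /docs/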
If content with a "human-generated" flag is rated more highly in some way -- e.g. search results -- then of course spammers will automatically add that flag to their AI-generated garbage. How do you propose to prevent them?
I think something like this will definitely happen, and your suggestion is the cleanest implementation idea I've seen for it. I imagine there will be a service provided by Google and OpenAI where they verify your identity as a human and then grant you a token to put into your meta tags (wait a second... this sounds like sama's worldcoin idea...).
It will need to be based somewhat on the honor system (just because someone's proved they're a human doesn't mean they won't put their attestation on auto-generated text), but it definitely sounds better than nothing.
They'll still need to incentivize it somehow, though. Why do I as a human want to add that meta tag? If the answer is "better search ranking" then it renders the whole scheme mostly pointless because obviously spammers will want to acquire the attestation and attach it to their auto-generated content.
Your argument would have a lot more force if we were past that point rather than fast approaching that point. Concerns about training data errors being compounded are much more important when you're talking about the bleeding edge.
And your question about how OpenAI prevents their training data from being corrupted is one we should be asking as well!
It's not quite the same thing, because Bing was getting the data from a browser toolbar and watching the search terms used and where the user went afterwards.
A closer equivalent would be if someone had made a ShareSERP site where people posted their favorite search terms and the results Google gave, and Bing crawled that and incorporated the search-term-to-link connections into their search graph.
The actual actions maybe went too far (personally I thought it was more funny than "copying"); the hypothetical would be pretty much what you'd expect to happen. Even Google would probably crawl ShareSERP and inadvertently reinforce their own results (the same way OpenAI presumably gets more than a bit of its own output fed back to it in any new crawls of Reddit, HN, etc., even if they avoid sites like ShareGPT deliberately).
Google has no contract with OpenAI though. They used a third party site to scrape conversations. If the outputs themselves are not copyrighted, and they never agreed to the terms of service, it should be fine, right? Albeit unethical and embarrassing.
I really don’t understand this angle. In fact, I am fairly positive that the training set for GPT-4 contains many thousands of conversations with AI agents not developed by OpenAI.
Do AI companies need to manually sift through the corpus and scrub webpages that contain competitor LLM output?
(“Yes” is an acceptable answer to this, but then it applies to OpenAI’s currently existing models just as much as to Bard)
Many AI conversations have been floating around internet forums since the original GPT was released. As OpenAI hasn't shared anything about its training set, to err on the side of caution I would assume that they didn't filter these conversations out. If they aren't even marked as such, it may not even be possible to do. I think it would be very hard to prove that no AI conversations are included in the training set, even if it wasn't secret.
It’s still debatable whether training a computer neural network on public data is 'wrong' when we very much accept it as a right for biological neural networks.
It's even less worthy of sympathy - like a counterfeit piece of art being counterfeited. And there isn't even an original, just like a made up counterfeit.
You can quibble about the ethics of web scraping for ML in general but I think you're conflating issues.
OpenAI and Google both scour the web for human-generated content. What Google cares about here is the learnings from OpenAI's proprietary RLHF dataset, for which they had to contract a large number of human labelers. Finding a roundabout way to extract the value of a direct competitor's purpose-built, costly data feels meaningfully different from scraping the web in general as an input to a transformative use.
> OpenAI and Google both scour the web for human-generated content
OpenAI and Google both scour the web for content, period. That content could be human generated or AI generated or a mix of the two. Neither company is respecting copyright or terms of service of every individual bit of data collected. Neither company cares how much effort was put into creating the data, whether humans were paid to do it, or whatever else. So there really isn't that much difference between the two. In fact I can guarantee that there was some Google-generated content within OpenAI's training data.
And herein is the main problem of AI. Its creators consume knowledge from the commons, and give nothing free and unencumbered back.
It's like the guy who never brings anything to the potluck, but after everyone finishes eating, he boxes up the leftovers, and starts selling them out of a food cart.
So what? Is OpenAI's RLHF dataset more valuable than the millions of books and paintings OpenAI used for free without a second thought? Why is that? Because one big tech corp paid money for that dataset?
> labelers. Finding a roundabout way to extract the value of a direct competitor's purpose-built, costly data feels meaningfully different from scraping the web in general as an input to a transformative use
There we go again: it's one law for the unwashed plebs and another for us.
Why do you think that I, after spending my time and effort to write my blog, own my content to a lesser extent than OpenAI owns theirs? Such hypocrisy.
If there's a party which has intentionally conflated scraping web content in general with scraping it to build a direct competitor to the original sources, that party is Google.
Yes, this latest instance with OpenAI outputs is shady, but I think it's in the same spirit as scraping news organizations for content which journalists were paid to write, and then showing portions of it directly in response to queries so people don't go directly to the news organization's pages, and it's in the same spirit as showing answers to query-questions that are excerpts from scraped pages which another organization paid to produce.
I see no difference. Any web scraping is a means to deflect revenue-generating traffic to yourself, and away from other websites. Fewer people will go to Stack Overflow because of Codex and Copilot. The point that the content was paid for vs volunteered becomes moot once it's posted publicly online for free, on ShareGPT.
The recent HiQ vs LinkedIn case would seem to make this ToS unenforceable, unless Google actually created a user account on ShareGPT and affirmatively accepted the terms. "Acceptance by default" does not count, and I can easily browse ShareGPT without affirmatively accepting any ToS, without which web scraping is totally legal.
I love it how they don't want others to use their model output but they have no qualms about training their model on the copyrighted works of others? Isn't this a stunning level of hypocrisy?
So, to verify, are you claiming that if someone added a similar clause to their source code and then GitHub went ahead and trained Copilot against it, that would be an issue?
You relinquish all licensing rights when you upload your code to GitHub. Microsoft can do whatever they want with it. That's in their ToS, which you have to agree to when you make an account. Normally, only affirmatively accepted ToS are enforceable, so just putting a clause into your license doesn't work (unless it's a copyright, which doesn't require consent).
> You relinquish all licensing rights when you upload your code to GitHub
What now? Seriously?
I found this. Section D4.
"We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video."
"as necessary to provide the Service" seems critical.
Also, section D3 of the GitHub Terms of Service says:
> You retain ownership of and responsibility for Your Content.
and section D4 says:
> This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program.
There is nothing in the terms that requires the GitHub user to relinquish all licensing rights.
I think there's a misunderstanding over what the word "relinquish" means.
The terms make clear that uploading code to GitHub gives GitHub the right to "store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time" while the code is hosted on GitHub.
However, that's not the same thing as relinquishing (giving up) licensing rights to GitHub. The uploader still retains those rights, and there is nothing in the terms that says otherwise.
It is certainly a service that's being provided. If not by GitHub, then by whom?
I'll repeat the definition of service: The “Service” refers to the applications, software, products, and services provided by GitHub, including any Beta Previews.
So do you believe if you hosted a closed source project on GitHub, and GitHub decided they want to integrate this into their service they would simply be allowed to take the code?
Fortunately HN commenters are not judges. And I would wager any bet that MS lawyers would not try to argue based on their ToS either; that would be a recipe for losing any court case.
I just mean that it doesn't really matter what your license says as long as GitHub can come up with a business justification for using it in some way. Certainly, other users still legally have to obey your copyright.
So, to verify, are you claiming it would not be allowed for you to upload my otherwise-open-source code (code I do not myself host at GitHub, but which was reasonably popular / important code) to GitHub?
If you're posting anything you did not create yourself or do not own the rights to, you agree that you are responsible for any Content you post; that you will only submit Content that you have the right to post; and that you will fully comply with any third party licenses relating to Content you post.
I suppose this means if I upload your stuff to GitHub, and you sue GitHub, then GitHub would be able to somehow deflect liability onto me.
That doesn't make sense. For example, GPLv3 allows anyone to redistribute the software's source code if the license is intact:
> You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program.
If GitHub then uses the source code in a way that violates the license, there is no provision in the GitHub terms of service that would allow GitHub to deflect legal liability to the GitHub user who uploaded the program. The uploader satisfied the requirements of GPLv3, and GitHub would be the only party in violation.
I'd like to see that theory tested in court. Section D3 of the terms says:
> If you upload Content that already comes with a license granting GitHub the permissions we need to run our Service, no additional license is required.
and section D4 does not mention any permissions that GPLv3 does not already cover. GitHub automatically recognizes when a repo is GPLv3-licensed, so it cannot claim ignorance of what GPLv3 is.
Correction – breaking terms of service that you have not explicitly agreed to is not punishable in any way. A site cannot enforce a "by using this site you agree to..." clause deep inside some license page that visitors are generally unaware of. If you violate an agreement that you willingly chose to enter, however, you will likely be found liable for it.
Read their statement carefully and it's actually not a denial of the allegation.
> But Google is firmly and clearly denying the data was used: “Bard is not trained on any data from ShareGPT or ChatGPT,” spokesperson Chris Pappas tells The Verge
* Allegation: Google used ShareGPT to train Bard.
* Rebuttal: The current production version of Bard is not trained on ShareGPT data
Both things can be true:
* Google did use ShareGPT to train Bard
* Bard is not currently trained on any data from ShareGPT or ChatGPT.
This is an argument in bad faith, but at this point I have zero trust in corporations and feel like you can generally count on them to do shitty things if they can benefit from it, so I can be easily swayed by little proof.
What's the argument? What's been done by anyone that's shitty? I don't even understand the point of this post. As far as I know, the current wave of text-based AIs is trained on all text accessible on the internet. Would it be a scandal to learn that ChatGPT is trained on wikipedia? Reddit? What is even the argument here, good faith or otherwise?
The argument is that these companies are using ideas created by us humans on this thing called the internet, for free and without attribution, and that's problematic.
Responding to sibling comment: We need some clarification here: are we speaking about just ideas in the abstract sense, or ideas that have been fleshed out i.e "materialized"
If the latter, there are many laws that say you can own an idea, provided it exists somewhere.
I'm not necessarily arguing against you, but "problematic" is too generic a term to be useful. Genocide is "problematic". Having to run to the bathroom every 5 minutes to blow my runny nose is "problematic". What do you actually mean?
Right, but I do think you can "own" (by which I mean our societally-mediated legal definition of ownership in the anglosphere) specific sequences of text or at least the right to copy them?
From an open source point of view it would be better if scraping proprietary LLMs would be allowed. Small LMs need this infusion of data to develop.
But the big news is that it works, just a bit of data can have a large impact on the open source LLMs. OpenAI can't have a moat in their proprietary RLHF dataset. Public models leak, they can be distilled.
Regarding point 2, I think there's nothing "wrong" with it, mainly it's funny that they don't know how to do it themselves. Provides additional evidence that Google is outgunned in this fight.
Whether or not it's okay to train one giant LLM with socially enormous stakes on the output of another commercially controlled LLM is an interesting question.
My stronger opinion is that the people who can do this stuff via having a crawled corpus of the Internet need to keep in mind that it's all our "user-generated content" that they've freely appropriated to build their models, and so whatever the technical copyright rules are (or become): you don't ethically own something that's closely imitating stuff we all wrote over the years.
> And what about the terms of service of my blog or code repository? Does OpenAI respect that?
Seems to me that’s an issue between you and OpenAI. (Does your blog or code repository actually have published restrictive terms of service? Did it when OpenAI accessed it? Did OpenAI even access it?)
You think OpenAI is going to care unless you have a team of expensive lawyers to back you up?
Microsoft is out there laundering GPL code with Copilot. These companies live firmly in the don't give a fuck region of capitalism. Copyright law for thee, not for me.
Since it was through ShareGPT, is the argument like "what color are your bits" but for ToS?
Maybe it would be different if they had put in their terms of service "you can only share this on sites whose own ToS allow sharing but disallow using the content for training models, and also replicate this requirement", but I don't see how you could have any sort of viral ToS like that.
Seems more like it's just a bad idea to rely heavily on another LLM's output for training.
Seems to me like it makes Google look kind of pathetic. That's worse than any legal issue here. (Caveat: assuming I understand the situation correctly)
According to the article, the story goes this way: the engineer Jacob Devlin raised his concerns about training Bard with ShareGPT data. Then he left directly for OpenAI.
He also claims that Google was about to do it, and then stopped after his warnings, presumably removing every trace of OpenAI's responses.
A couple of things:
1. So, Bard could have been trained on ShareGPT but it's not, according to the same engineer who raised the concern (and Google's denial in The Verge).
2. Since he directly joined OpenAI, he could have told them and they could have taken action, and nothing is public on that front yet. Probably nothing to see here.
Edit: The engineer wasn't directly involved with the Bard team either; it just appeared to him that the Bard team was heavily relying on ShareGPT.
For those that don't know, Jacob Devlin was the lead engineer and first author of the widely popular BERT model architecture and of the initial bert-base models released by Google.
Not illegal, but that won't stop people from finding it amusing that the company considered to be the world's beacon of innovation is copying someone else's homework. It's hard being the favorite horse.
No one is alleging that Google directly used OpenAI's API to get training data (which would be unambiguously against TOS). The claim is that they downloaded examples from ShareGPT.
> He also claims that Google were about to do it, and then they stopped after his warnings.
So were they heavily relying or were they about to and then stopped? It's unclear from your comment. Could you link where you're getting this info from? The Information article is walled, unfortunately.
What I meant to say was: according to The Information article, the engineer raised concerns because it appeared to him (the article's wording) that the Bard team was using (and heavily reliant on) ShareGPT for Bard training. The engineer wasn't working on Bard; presumably someone told him, or he somehow got the impression, that the Bard team was relying on ShareGPT. At the time he was at Google.
Then, when he raised concerns to Sundar Pichai, the Bard team stopped doing it and also scrubbed any traces of ShareGPT data. So the headline is false, and Bard (again, presumably) is not trained on any ShareGPT data.
I think I might be confused by your usage of “about to do it” in your original comment to mean “actively doing it.”
You claim that the very engineer accusing Google of training Bard on ShareGPT acknowledges that the final product was not. As far as I can tell, Devlin did no such thing.
Not sure why you would presume they restarted their expensive training process.
It just doesn’t seem like a good faith characterization to me.
"What's sauce for the goose is sauce for the gander" as the legal cliche goes. OpenAI cannot on the one hand claim that google did something wrong if they used their outputs as part of the bard training while simultaneously on the other hand claiming they themselves are free to use everyone on the internets content to train their model.
Either they believe that training should respect copyright (in which case they could not do what they do) or they believe that training is fair use (in which case they cannot possibly object to Google doing the same as them).
No one is alleging copyright violations. The claim is that they violated OpenAI's terms of service. We don't know whether Google ever even agreed to those terms of service in the first place.
Services are subject to terms of service. (If content is received through a service, the terms of service may govern use of it, but that’s not a feature of the content, but the acquisition route.)
ShareGPT isn't part of that service though. Yes, it would be a TOS violation if Google directly used ChatGPT to generate transcripts -- but not even the original Twitter thread is claiming that.
The only claim being made against Google here is that they used ChatGPT content. I can't find any sources claiming that Google made use of an OpenAI service. So the distinction is correct, but doesn't seem particularly valuable in this context -- using data from ShareGPT is not a TOS violation.
That's nonsensical. An AI is either transformative or it's not, it's an intrinsic quality that has nothing to do with the training data or the "product" type. If OpenAI is sufficiently transformative to claim fair use (which I don't believe for a second, alas), then any other AI built on similar fundamentals has the same claims and can crunch any data their creators see fit, including the output of other AIs.
First off, the whole argument behind these models has been from day one that training on copyrighted material is fair use. At most this would be a TOS violation. Second off, AI output is not subject to copyright, so it has even less protection than the original works it was trained on.
Copyright maximalism for me, but not for thee. It's just so silly for someone working at OpenAI to complain about this.
And would it be a ShareGPT TOS violation (assuming it had any)?
If OpenAI says "you can share these online but don't use them for AI training", people share them on another site, and then someone else comes along to scrape that site for AI training data, there's no relationship between OpenAI and the scraper for the TOS to apply to.
Normally I think you'd rely on copyright in that kind of case, but that doesn't apply to ChatGPT's output, so...
Right. And what even is the penalty of that TOS violation and how enforceable is it?
I don't have an OpenAI account. I have never agreed to any TOS. I don't see what legal claim they would have to stop me from training an LLM on ShareGPT.
If Google were specifically going to ChatGPT to get its output and train off of it, they could be sued for breach of contract - and OpenAI would likely have a pretty good argument:
- they specifically tried extracting and learning from our model when it says you can't in our TOS
- this makes it easier for them to compete with us via the data they obtain in their breach of contract
- more businesses and enterprises might pass up on renting a shared or dedicated instance from us if they can just get it from Google
> If Google were specifically going to ChatGPT to get its output and train off of it
But (correct me if I'm wrong) I don't think anyone anywhere is claiming that's what happened. The claim was just that Google looked into using existing chats that it scraped from another website.
Edit: realizing you're probably replying specifically to the question I asked, "and what even is the penalty of that TOS violation and how enforceable is it?" In which case, yeah, that's a decent clarification to add, sorry for pushing back on it.
Thanks for the clarification. Aside from the OP, I haven't seen anyone from OpenAI commenting on this, so yeah, unless I've missed something I think you're correct to point that they're not involved so far.
OpenAI doesn't own the copyright on the human aspects of the chats, so it still doesn't really have a claim to make around them. And even if it did own that copyright, we loop right back around to "wait, training an AI on copyrighted material isn't fair use now?"
There's no way that ChatGPT's conversations are going to be subject to more intellectual property protection than the human chats it was trained on.
People complained that new AI is "stealing" from artists.
But stealing from other AI turns out to often be easier.
And this is where things get fun, because companies like OpenAI want to be able to train on all the data without any explicit permissions from the creators, but the moment people do the same to them they will likely (we will see) be very much against it.
So it will be interesting to see whether they will be able to both have their cake and eat it (e.g. by using Microsoft's lobbying power to push absurd laws), or whether they will fall apart because cannibalization makes it unprofitable to create better AI.
EDIT: This comment isn't specific to Google/Bard, so it doesn't matter whether or not Google actually did so.
I can see the GitHub Copilot controversy being resolved in this way. If Microsoft, GitHub, and OpenAI successfully use the fair use defense for Copilot's appropriation of proprietary and incompatibly licensed code, then a free and open source alternative to Copilot can be trained on Copilot's outputs.
After all, the GitHub Copilot Product Specific Terms say:
> 2. Ownership of Suggestions and Your Code
> GitHub does not claim any ownership rights in Suggestions. You retain ownership of Your Code.
Why would it need to be trained on Copilot’s output? Its training data is publicly available code on GitHub, so just use that directly. ChatGPT is different because they specifically trained it as an assistant with a private dataset
Google accused Microsoft Bing of using their results for page rankings a few years ago. They set up a sting to show that when you searched for something unique on Google using Internet Explorer, shortly afterwards the same search result would start showing up on Bing.
This was seen as deeply embarrassing for Microsoft at the time.
Thankfully archive.org exists, otherwise it would not be possible to get good training data in a few years when the internet is flooded with AI content.
Isn't most of the internet available through Common Crawl? I don't know what percentage of training data is just that data set, but I assume it's enough for anyone with enough compute and ingenuity to create a reasonable LLM.
Which is baseless hyperbole. We get it, blog spam is annoying. That doesn't change the fact that humans generate a ton of data just interacting with one another online.
As a forum moderator, I have transitioned to relying heavily on AI-generated responses to users.
These responses can range from short and concise ("Friendly reminder: please ensure that all content posted adheres to our rules regarding hate speech. Let's work together to maintain a safe and inclusive community for everyone") to lengthy explanations of underlying issues.
By using AI-generated content, a small moderation team can efficiently manage a large group of users in a timely manner.
This approach is becoming increasingly common, as evidenced by the rise in AI-generated comments on popular sites such as HN, Reddit, Twitter, and Facebook.
Many users are also using AI tools to fix grammar issues and add extra content to their comments, which can be tempting but may result in unintentional changes to the original message.
In fact, I myself have used this technique to edit this very comment to provide an example.
---- Original comment:
As an online forum mod, I switched to mainly using AI to generate replies to users. Some are very short ("Hey! Remember the rules.") and some are long paragraphs explaining underlying issues. Someone training on my replies would pretty much train on AI generated content without knowing. It allows a small moderation team to moderate a large group quickly. I know that I am not alone in this.
There is also a raise in AI generated comments on sites like HN, Reddit, Twitter and Facebook. It's tempting to copy-paste a comment in AI for it to fix grammar issues, which often results in extra content being added to text. In fact, I did it for this comment.
The original comment is much better, please stop rewriting your comments using OpenAI.
> In fact, I did it for this comment.
Yes, it was obvious from the second sentence. The way ChatGPT structures text by default is very different from how most humans write. Always the same "By using", "These X can range from", etc.
Padding your text with more words doesn't make it better; more words make it worse. This isn't school.
Interesting, the "By using" was my own addition to shorten a long sentence it had generated that distracted from the example.
To be more clear, using AI to rewrite comments such as this one is not something I often do. My personal use of it for moderation purposes is more prompt based than pasting a long comment for grammar and spelling corrections.
What I did here was an example and that example provided the same criticism that you wrote here as a reply ("which can be tempting but may result in unintentional changes to the original message"). In other words, makes the text more verbose and sanitizes the writing style.
The prompt we use for moderation contains our site's rules and some added context. So using ChatGPT, we can paste in someone's comment and ask the bot to write a short text explaining how that comment does not follow our rules and what the user can do.
"Using the rules above, write a very short message for a user that wrote a rule breaking comment. Show empathy. Use simple English. Explain the rules that were broken. The comment is [comment here]"
Using this saves a lot of time. Is the quality of the comment not as good as it could be if it was written by a human? Absolutely. However, using AI let us change the user:mod ratio in a way. Automoderators are nothing new, what is new is that now the automoderator can take context into account and provide a customized message.
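If you wanted to script it rather than paste by hand, a rough sketch might look like this (assuming the openai Python client's ChatCompletion API; the rules text and helper name are placeholders):

    import openai

    openai.api_key = "sk-..."  # placeholder

    SITE_RULES = "..."  # the site's rules plus some added context (placeholder)

    def draft_mod_reply(comment):
        # Ask for a short, empathetic note explaining which rules were broken.
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": SITE_RULES},
                {"role": "user", "content": (
                    "Using the rules above, write a very short message for a user "
                    "that wrote a rule breaking comment. Show empathy. Use simple "
                    "English. Explain the rules that were broken. "
                    "The comment is: " + comment)},
            ],
        )
        return response["choices"][0]["message"]["content"]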
OpenAI at least can track the hashes of all content it's ever output, and filter that content out of future training data. Of course they won't be able to do this for the output of other LLMs, but maybe we'll see something like a federated bloom index or something.
Agreed there is no perfect solution though, and it will definitely be a problem finding high quality training data in the future.
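As a rough illustration of the mechanics (a sketch only, and it obviously catches exact matches rather than paraphrases): the provider folds a hash of every completion it serves into a Bloom filter, and the training pipeline drops crawled text the filter says it has probably seen.

    import hashlib

    class BloomFilter:
        # Tiny Bloom filter: m bits, k hash functions derived from sha256.
        def __init__(self, m=1 << 24, k=4):
            self.m, self.k = m, k
            self.bits = bytearray(m // 8)

        def _hashes(self, text):
            for i in range(self.k):
                digest = hashlib.sha256(f"{i}:{text}".encode()).digest()
                yield int.from_bytes(digest[:8], "big") % self.m

        def add(self, text):
            for h in self._hashes(text):
                self.bits[h // 8] |= 1 << (h % 8)

        def probably_contains(self, text):
            return all(self.bits[h // 8] & (1 << (h % 8)) for h in self._hashes(text))

    seen = BloomFilter()
    seen.add("some model completion")  # provider records what it emitted
    assert seen.probably_contains("some model completion")  # pipeline can filter it out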
AI content will be associated with a user or organization in the trust graph. If someone you trust trusts a user or organization who posts AI content, you're free to revoke your trust in that person or blacklist the specific users/organizations you don't want to see anymore.
We've been pretending to be just about to do this for decades. The fact is that internet companies will not develop a network of trust, because they are primarily advertisers looking for better ways to abuse trust.
I am assuming OP means when AI takes over there's going to be a content explosion and most of what's available on the common internet will be AI generated content rather than human made one and they want to use archive.org to get access to the pre-AI internet.
Only if the amount of bad information in ChatGPT content that makes it back into the training set is worse than what's already on the internet. Probably the outputs that make it back are better than average, because those are more likely to be posted elsewhere.
I don't care at all about this from a copyright or data ownership perspective, but I am a little skeptical that it's a good idea to be this incestuous with training data in the long run. It's one thing to do fine tuning or knowledge distillation for specialized domains or shrinking models. But if you're trying to train your own foundation model, is relying on output from other foundation models going to make them learn to imitate their own errors?
Things like ShareGPT or PromptHero give vast repositories of human-curated ML outputs, which make them fantastic for at least incremental improvement on the base model. In the grand scheme of things, these will be just another style, mixed in with all the other crap in the training set, so I don't imagine it's too harmful... eg, 'paint starry night in the style of midjourney 5'
The internet is an easy, convenient way to train LLMs, but I'm pretty sure you could train them with microphones. One cloud surveillance company, maybe for networked security monitoring, or maybe just Alexa/Siri etc., could dip into as many and as varied communications per hour as all the books ever written.
It'd be cool to have an LLM that's trained almost exclusively on books from good publishers, and other select sources. Working out licensing deals would be a challenge, of course.
Probably from multiple modalities as well as extending the sequence lookback length further and further.
They have low perplexity now, but the perplexity possible when predicting the next word on page 365 of a book where you can attend over the last 364 pages will allow even more complexity to emerge.
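(For reference, perplexity is just the exponentiated average negative log-probability the model assigns to the actual next tokens, so better predictions from longer context show up directly as a lower number. Toy numbers below, not measurements.)

    import math

    def perplexity(token_log_probs):
        # token_log_probs: natural log of the probability the model assigned
        # to each actual next token, given the preceding context.
        return math.exp(-sum(token_log_probs) / len(token_log_probs))

    print(perplexity([math.log(0.2)] * 100))  # ~5.0: weaker, short-context predictions
    print(perplexity([math.log(0.5)] * 100))  # ~2.0: stronger, long-context predictions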
If a Google employee working on this thing ever agreed to OpenAI's terms of service, they might be screwed.
From OpenAI's terms:
(c) Restrictions. You may not (i) use the Services in a way that infringes, misappropriates or violates any person’s rights; (ii) reverse assemble, reverse compile, decompile, translate or otherwise attempt to discover the source code or underlying components of models, algorithms, and systems of the Services (except to the extent such restrictions are contrary to applicable law); (iii) use output from the Services to develop models that compete with OpenAI;
(j) Equitable Remedies. You acknowledge that if you violate or breach these Terms, it may cause irreparable harm to OpenAI and its affiliates, and OpenAI shall have the right to seek injunctive relief against you in addition to any other legal remedies.
Those two very clearly establish that if you use the output of their service to develop your own models, then you are in breach of the terms and they can seek injunctive relief against you (stop you from working until the case is resolved).
I hereby set a terms of service for everything I post on the internet from now on. OpenAI may not train future GPT models on my words or my code without my express written permission.
Sure. If you can get everyone to create an account and agree to those terms before reading your comments, you might have a case.
Otherwise, it will be considered public information, at which point it is free to be scraped by anyone (see the precedent set by the LinkedIn/hiQ case).
That's just because they made accounts and so agreed to the terms right?
From your link:
>These rulings suggest that courts are much more comfortable restricting scraping activity where the parties have agreed by contract (whether directly or through agents) not to scrape. But courts remain wary of applying the CFAA and the potential criminal consequences it carries to scraping. The apparent exception is when a company engages in a pattern of intentionally creating fake accounts to collect logged-in data.
No, the case did not decide anything, no precedent was set. The point is that you cannot use this case to argue that you can scrape public data free of consequence
What's the legal status of such terms of service? Suppose you simply said "i didn't agree to these terms" - what's the consequence? It seems like the strongest thing they could legitimately do would be to kick you off of their platform. Simply writing "we can seek injunctive relief" doesn't make it so.
Wouldn't that only apply if that employee was acting as an agent of Google at the time?
Otherwise it would create an interesting dynamic that startups where no-one has created an OpenAI account would have a massive advantage, since they can freely scrape ShareGPT data and train on it while larger companies have enough employees that someone must have signed every TOS.
Good luck to them. AI models are automated plagiarism, top to bottom. None of us gave OpenAI permission to derive their model from our writing, surely billions of dollars worth, but they took it anyway. Copyright hasn't caught up so all that stolen value rests securely with OpenAI. If we're not getting that back, I don't see why AI competitors should have any qualms about borrowing each others' work.
I'm not a copyright maximalist, and I kind of agree that training should be fair use. Maybe I'm right about that, maybe I'm wrong. BUT importantly, that has to go hand in hand with an acknowledgement that AI material is not copyrightable and that training on other model output is fine.
What companies like OpenAI want is a system where everything they build is protected, and nothing that anyone else builds is protected. It's wildly hypocritical, what's good for the goose is good for the gander.
That some AI proponents are now freaking out about how model output can be legally used shows that on some level those people weren't really honestly engaging with artists who were freaking out about their work being appropriated to copy them. It's all just "learning from the art" until it affects somebody's competitive moat, and then suddenly people do understand how LLM weights could be seen as a derivative work of their inputs.
> Trade secret protection protects secrets from unauthorized disclosure and use by others. A trade secret is information that has an economic benefit due to its secret nature, has value to others who cannot legitimately obtain it, and is subject to reasonable efforts to maintain its secrecy. The protections afforded by trade secret law are very different from others forms of IP.
I am not a lawyer, but I don't believe a trade secret would prevent someone from reverse engineering your model's knowledge from its output, in the same way that it doesn't prevent someone from reverse engineering your hot sauce by buying a bunch and experimenting with the ingredients until it tastes similar.
My point was more of there are protections for things that aren't copyrightable. If the model is protected as a trade secret, then it is a trade secret.
The example of the hot sauce recipe is quite apt - the recipe isn't copyrightable, but you can be certain that the secret formula for how to make Coca-Cola syrup is protected as a trade secret.
Our writing, our code, our artwork... Furthermore, the U.S. Copyright Office (USCO) concluded that AI-generated works on their own cannot be copyrighted, so these ChatGPT logs are fair game. It would be hypocritical to think that Google is wrong and OpenAI is not.
It's not even that those works can't be copyrighted on their own. It's that even when you make changes to those works, your changes might qualify for copyright, but they do not affect the copyright status of the AI-generated portions of the work.
If you used AI to design a new superhero and then added pink shoes, yellow hair, and a beard, only those three elements could possibly be protected by copyright. Your additions do not change the status of the underlying AI work, which cannot be protected and is available for anyone to use.
> if you used ai to design a new superhero and then added pink shoes, yellow hair, and a beard
Wouldn't that depend heavily on the prompt used (among other factors such as image to image and ControlNet)? You could be specifying lots of detail about the design in your prompt, and the AI could only be generating concept artwork with little variation from what you already provided.
If I'm already providing the pose, the face, and the outfit for a character (say via ControlNet and Textual Inversion), generating <my_character> should be no different from generating <superman>, that is to say, the copyright already exists thanks to my work and the AI is just a tool, the output of which should have no bearing on who owns that copyright (DC is going to be perfectly able to challenge my commercial use of AI generated superman artwork).
According to the copyright board, a prompt is no more than any person commissioning a work from an artist, which does not confer copyright, and the lack of human authorship for the design decisions still stops it from being protected by copyright.
Textual inversion involves providing self-created images, which should confer copyright in the same way AI images of DC's superman are considered to fall under the copyright of DC. In other words, commissioning fanart still allows the original owner of the IP to exert copyright -- shouldn't that be the case here?
If I use an AI tool to design my Superhero, can't I just submit it without disclosing the help I received from an AI.
I get that it would be very nice to prevent AI SPAM copyrighting of every possible superhero, but if I use the AI to come up with a concept, then quickly redraw it myself with pen and paper, I feel like it would never be provable that it came from an AI.
Redrawing something by hand creates a new copyrightable work, so it certainly isn't fraud to claim you own the copyright in a work of art you drew based on an AI output.
It depends if your redrawing is substantially different enough from the original image to earn copyright on its own. Your changes to an image from ChatGPT do not affect the copyrightability of the original content. If you've simply redrawn what the computer designed it may not be substantial enough to earn copyright. If you've made changes, it may only be copyrightable for those changes.
The example was redrawing something by hand that was computer generated originally.
It would be pretty much impossible for a hand drawn work of art to not be sufficiently original. Hand drawn art doesn't look the same as what a computer produces. Originality has a very low threshold, simply pointing my camera at something and hitting click is almost always enough to show originality.
At any rate it isn't fraud to take the legal position that you are an original enough artist to have copyright in the work. If taking a legal position was "fraud" any attorney who lost a court motion would be whisked away to jail.
Edited to add: the copyright registration form asks if you are the "author" not if you are "original."
If you think you own a design because you hand drew a version of it someone else invented, you're gonna have a bad time. Please redraw a superman picture someone else made and then go to have it copyrighted, and tell me how that goes for you.
It would go fine, since I see the form has a question "is this a derivative work". I put yes, and this means my claim is only for what was original to me when I drew the drawing based on another drawing of Superman.
But I see we've moved away from the original point, which was that it would be difficult for anybody to know an AI helped someone make the drawing if they redrew it and didn't disclose it was a redrawing.
> Furthermore, the U.S. Copyright Office (USCO) concluded that AI-generated works on their own cannot be copyright, so these ChatGPT logs are free game.
Doesn't this depend on where you or the AI live? The US ain't the world.
But clearly everything generated by an AI isn’t automatically in the public domain. That would be a trivial way of copyright laundering.
"Sorry, while this looks like a bit for bit copy of a popular Hollywood movie, it was actually entirely dreamt up by our new, sophisticated, definitely AI-using identity function."
If I plagiarize a Hollywood movie, then I explicitly "give up" my copyright by "releasing" it to the public domain, it doesn't affect the movie at all. AI or not is irrelevant.
The person using something similar to something else may be infringing but the ai work cannot be protected by copyright as it lacks human authorship. Those are two separate issues.
For some cases, sure: if it repurposes your code in a way that ignores the license, fine. But it's rarely wholesale copying. It's finding patterns, same as anyone studying the code base would do.
As for the majority of content written on the internet through Reddit or other social media, what's the harm in ingesting that? It's an incredibly useful tool that will add huge value to everyone. It's relatively open, cheap and highly available. Its worth to its owners is only a fraction of the value it will add to society. It has the chance to have as big of an impact on progress as something like the microprocessor.
I agree it's fair game for other LLMs to use GPT output as training data, and that's positive. Although it signals desperation and panic that the largest "AI first" company, with more data than any org in history, is caught so flat-footed and has to rely on it.
Do you really think it would be a better world in which a large LLM would never be able to be developed?
It's definitely a derived work as far as copyright is concerned: the output would simply not exist without the copyrighted training data.
> It's finding patterns same as anyone studying the code base would do.
No, it's quite unlike anyone studying data, because it's not a person with legal rights, such as fair use, but an automated algorithm. There is absolutely no legal debate that copyright applies only to human authors, or only to the human created part of a mixed work, there is vast jurisprudence on this; by extension, any fair use rights too, exist only for human users of the works. Derivation by automated means - for the express economic purpose of out-competing the creator in the market place, no less - is completely outside the spirit of copyright.
The output of human copyrighted work wouldn't exist if it weren't for humans training on the output of other humans.
Humans constantly use cliches in their writing and speech, and most of what they produce is a repackaged version of what someone else has written or said, yet no one's up in arms against this mass of unoriginality as long as it's human-generated.
It's a bit more nuanced than that. What I mean is that the slow speed at which humans learn is a foundational block of our society. If suddenly some new race of humans emerged that could read an entire book in a couple of minutes and achieve lifelong superhuman retention and assimilation of all that knowledge, then we would have the exact same type of concerns that we have today about AI, including how easily they could recreate high-quality art, music, and anything else with just a tiny fraction of the effort that the rest of us need to reach similar results.
Startup technologists have been acting like speed of actions doesn't matter for decades. If a person can do it, why shouldn't a computer do it 1000x faster? What could go wrong? It's always been a poor argument at best and a bad faith one at worst.
Well said. The mindless automating away of everything has only one logical conclusion, in which the creators of such automations are automated themselves. And even if the optimists are right and we never get there, it doesn't matter: the chaos it can create just by getting closer, at rates faster than society can adapt, is unprecedented, especially given that the population count is at an all-time high and there are many other simultaneous threats that need our attention (e.g. climate change).
Most definitely. Good luck telling the difference between traditional and AI-empowered art in the near future.
It's just a new tool for artists, and this anti-AI sentiment towards copyright is only going to hurt individual artists, while doing nothing for large corporations with enough money to play the game.
AI are not people and the idea that you can be biased against them is hardly a foregone conclusion. Like maybe one day when we have AGI, but ChatGPT ain't that.
There is a difference between a computer and a human, and we already treat them differently in copyright law. For example, copying a program from disk into memory is typically already considered a copy on a computer (hence many licences grant you the licence to make this copy); no such licence is required for a human.
> It's definitely a derived work as far as copyright is concerned - the output would simply not exist without the copyrighted training data.
Can you point to a legal case that confirms this? Because it’s not at all clear that this is true from a legal standpoint. “X would not exist without Y” is not a sufficient test for derivative works - it’s far more nuanced.
United States copyright law is quite clear on the matter:
>A "derivative work" is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted.
The emphasized part clearly applies: not only does the AI model need to be trained on massive amounts of copyrighted works *), but without these input works it displays no intrinsic creative ability; it has no capacity to produce a single intelligible word or sketch. All creative features of its productions are a transformation of (and only of) the creative features of the inputs; the AI algorithm has no "intelligence" in the common meaning of the word and no ability to create original works.
*) by that, I mean a specific instance of the model with certain desirable features, for example the ability to imitate the style of J.K Rowling
That's an interesting analysis. The issue isn't really whether the A.I. has creative ability, though, if we're talking about whether it infringes copyright. I think comparing the A.I. to a really simple bot is informative.
If I wrote a novel that contained one sentence from 1,000 people's novels, it would probably be fair use, since I hardly took anything from any individual person and because my novel is probably not harming those other writers.
If I wrote a bot that did the same thing, same result, because my bot uses only a little from everyone's novel and doesn't harm the original novelist, so it's likely fair use.
Now I think a J.K. Rowling A.I. probably takes at least a little from her when it produces output, but it's not clear to me how much is actually based on J.K. Rowling and how much is a dataset of how words tend to be associated with other words. You could design a J.K. Rowling A.I. that uses nothing from J.K. Rowling, just data that is said to be J.K. Rowling-esque.
> Additionally, “transformative” uses are more likely to be considered fair. Transformative uses are those that add something new, with a further purpose or different character, and do not substitute for the original use of the work.
Creating a model from copyrighted works is likely sufficiently transformative to be non-infringing even if it is found to be a derivative work.
"Creating a model from copyrighted works is likely sufficiently transformative to be non-infringing even if it is found to be a derivative work."
Maybe, but one of the factors of fair use is whether it deprives the copyright owner of income or undermines a new or potential market for the copyrighted work.
If ChatGPT gets so good at writing J.K. Rowling novels that it hurts the sales of the next J.K. Rowling book, that's a strong argument against the use being fair, even if it is transformative.
If J.K. Rowling signs an exclusive agreement with Google to train on J.K. Rowling novels, that's another factor that would suggest OpenAI's use is not fair, because she's shown that OpenAI is hurting a potential market for J.K. Rowling selling the use of her novels to train A.I.
GPT isn't spitting out novels in the style of J.K. Rowling and sending them to publishers - a human is.
GPT being instructed to tell a Harry Potter story itself is no more infringing than a child asking a parent for a made up Harry Potter bed time story. They equally infringe and undermine new or potential markets for copyrighted work.
The question is "what do you do with the material?" If a human took the output of GPT writing as J.K. Rowling or a parent took their collected Harry Potter bedtime stories - those are equally problematic.
If I was to take a portrait of Marilyn Monroe and send it through a plugin called Warholize in Photoshop ( https://www.adobe.com/creativecloud/photography/hub/guides/c... ) , it's not the plugin or photoshop that is infringing - it would be me, the human who created an infringing work. If I print it out and hang it on my wall, that didn't particularly impact on the income for the Warhol estate nor deprive them of new markets. If I print out copies of it and sell them - then that is a different matter.
The question is what you - the human with agency - do with the infringing work after you create it. You can't blame photoshop for creating a Warhol infringing work nor can you blame GPT for writing in the style of J.K. Rowling if you instruct it to do so.
This argument is weak. If we agree that the production is infringing, then selling a machine that produces infringing works on demand also infringes the rights of the author. For example, if I sell a karaoke machine that comes with pre-recorded famous melodies without the original vocals (thus, derived works), I definitely cannot claim "only the agency of the users matters". No, even the on-demand production itself is an act of infringement.
Regarding your fair use point:
> add something new, with a further purpose or different character, and do not substitute for the original use of the work.
This is a test that can be applied to the works produced, and I think that, by and large, it applies to most of them, they are sufficiently transformative to be seen as fair use, especially since they are produced and published, as you say, under the agency of a human user of the system.
But we're talking here about the system as a whole, not a particular production. The ChatGPT model database is not a creative work (other than its source code); it's a binary compressed representation of the works used to train it, for the purpose of on-demand production of works that mimic, to a varying and sometimes strikingly infringing degree, the originals. There is no question such a machine competes with the originals in the marketplace, and there is no real question it is a derivative work of the originals, which fully supply its artistic features and without which the machine could not utter a single intelligible sentence.
Also, the "one word out of 1000" analogy defense does not apply, since a single isolated word is not a work of creation, while a certain shape, brush stroke or cromatic set are. AI does not create using disparate pixels of the originals, it uses entire creative features, more like publishing a book where each page or chapter is copied verbatim and adapted from other works, or if characters, plot devices and story structure are reused. These latter cases have long been recognized as plagiarism.
So the legal question seems to be if mass-scale algorithmic remixing and derivation by a non-creative competitor is fair use. That strikes me as far outside the intended spirit of fair use and the balance it seeks to achieve between author protection and beneficial social progress.
> it's a binary compressed representation of the works used to train it, for the purpose of on-demand production of works that mimic, to a varying and sometimes strikingly infringing degree, the originals.
To pursue this point further, I have asked ChatGPT to "list the lyrics of a famous Madonna song":
Sure, here are the lyrics of "Like a Prayer" by Madonna, which was released in 1989:
[Verse 1]
Life is a mystery
Everyone must stand alone
I hear you call my name
And it feels like home
[Chorus]
When you call my name
It's like a little prayer
I'm down on my knees
I wanna take you there
...
(the entire lyrics of the song follow)
I think it would be very hard to argue that this model does not embed verbatim copies of the original work, but somehow reconstituted those lyrics via a parallel construction in which the cultural impact of Madonna's lyrics was grasped from other fair-use sources. Even in that case, it's still a word-for-word reproduction of the original, therefore not fair use. Therefore, the entire model or service is infringing - even if some of its productions may not be.
The ability of the model to produce copyrighted works is simply proof of the degree to which it relies on the originals; even if that ability were blocked or somehow filtered by a plagiarism detector in a later model of ChatGPT, it would change nothing about the fundamental nature of the machine: an automated means of generating derivative works without artistic or scientific agency.
> it would change nothing about the fundamental nature of the machine: an automated means of generating derivative works without artistic or scientific agency.
So is a Xerox machine (or a Canon copier, or 'MFD').
The issue there is not "if it can" or even "if it is designed to do so" but rather "can it be used in a way that is not infringing" and "if there is an infringement from its use, the human doing that is the one liable."
And yet, there are non-infringing uses of the Xerox machine.
Even if one was to accept the position that the only thing that GPT can produce is derivative works it doesn't rule out that there are transformative and non-infringing uses of it.
No Xerox machine comes with an embedded copy of Harry Potter that can be reproduced at the push of a button.
That's the crux of the issue, that you can't separate the training data from the derivation ability. If it's just an AI algorithm that could, when trained in a certain way, produce derivative infringing works, nobody would object to it.
This is a red herring. The issue before the court will be whether the creation and release of the model affects J.K. Rowling.
Think of it this way: suppose I make a bunch of Super Mario Brothers video games and try to sell them without Nintendo's permission.
If Nintendo sued me, I can't say "This cartridge has no agency. This will only affect Nintendo if humans use this video game instead of playing Super Mario Brothers."
Students in school will also never learn to read without being exposed to text. Does this mean that teachers who write exercise sheets and school textbook publishers now own the copyright on everything students do?
Being in school is also just a tool for knowing stuff, being able to read, being around similar-aged peers, etc.
Whether the knowledge is directly in your brain or in a device you operate (directly or through an API) shouldn't really matter.
If it's forbidden for a human to move a stone with manual labour, then it's also forbidden to move that stone with an excavator. This has nothing to do with one actor being a human and the other being an excavator controlled by a human: the act simply isn't authorized.
I think that we should allow humans to move stones up the hill with excavators too. There is no stealing of excavator fuel from human food sources going on (let's assume it's not biofuel operated :p).
> If it's forbidden for a human to move a stone with manual labour, then it's also forbidden to move that stone with an excavator.
Sure, but the reverse is false: I can walk on my own feet through Hyde Park, but I can't ride my excavator there.
Laws are made by humans for the benefit of humans; it's a political struggle. Now, large corporations try to exploit loopholes in the existing copyright framework in order to expropriate creators of their works. It's standard uberisation: disrupt existing economic models, insert yourself as an unavoidable middleman, and pauperize the workforce that provides the actual service.
> It's finding patterns same as anyone studying the code base would do.
This is the issue, it's not finding patterns as people do.
If I read someone's code, book, &c, that's extremely lossy. I can only pick up a few things from it in the long term.
But an ML model can store most of what it's given (in a jumbled format) and can do it from billions of sources.
It's essentially corporate piracy, but it's not legally recognized as such because it doesn't store identical reproductions.
This hasn't been an issue before because it's recent and wasn't considered valuable. But now that it's valuable and Microsoft is going to take all our jobs we have to at least consider if it's okay if Microsoft can take our work for free.
No, but I believe a large language model is a work that is 99.9% derivative of its inputs, with all that implies for authorship and copyright. Right now it's just a heist.
> Do you really think it would be a better world in which a large LLM would never be able to be developed?
Maybe. I believe the potential for abuse is far greater than the potential benefits. What is our benefit, a better search engine? Automating some tedious tasks? Increased productivity? What are the downsides? People losing their jobs to AI. Artists/programmers/writers losing value from their work. Fake online personas indistinguishable from real people. Unprecedented amounts of spam and misinformation flooding the internet. Intelligent AIs automatically attacking and hacking systems at unprecedented scale 24/7. Chatbots becoming the new interface for most interactions online and being the moderators of access to information. Chatbots pushing a single viewpoint and influencing public opinion (many people complain today about ChatGPT being too "woke"). And I may just be scratching the surface here.
That's the answer to the YC Interview question "What is your unfair competitive advantage" in a nutshell. Morally it might be wrong. From a business building perspective it's access that no one has.
I am strongly in favor of eliminating copyright completely everywhere, soooo I am pretty fine with that. The other direction should be more enforce-able: stuff derived from open data must also be made open again, like the GPL but for data (and therefore ML stuff).
Right but in a world where copyright does exist we arguably have the worst of both worlds. Small players are not protected at all from scraping and big players are leveraging all of their work and have the legal resources to form a moat.
Yeah, I definitely like to see AI companies getting a taste of their own medicine. The main problem isn't even "automated plagiarism": the pre-generative era was chock-full of AI companies more or less stealing datasets. Clearview AI, for example, trained up its facial recognition technology on your Facebook photos, without asking for and without getting permission.
On the other hand, I genuinely hope copyright never "catches up", because...
1. It is a morally bankrupt system that does not adequately defend the interests of artists. Most artists do not own their own work; publishers demand copyright assignment or extremely broad exclusive licenses as a condition of publication. The bullies know to ask for all their lunch money, not just a couple bucks for themselves. Furthermore, copyright binds noncommercial actors the same as it does commercial ones, which means unconscionably large damage awards for just downloading a couple of songs.
2. The suggested ways to alter copyright to stop AI training would require dramatic expansions of copyright's scope. Under current law, the only argument for the AI itself being infringing would be if it memorized training data. You would need to create a new ownership right in artistic styles or techniques. This would inflict unconscionable amounts of psychic and legal damage on all future creators: existing artists would be protected against AI, but no new art could be legally made unless it religiously hewed to styles already in the public domain. We know this because music companies have already made their domain of copyright effectively work this way[0], and the result is endless bullshit lawsuits against people who write songs that merely "feel" too similar (e.g. Blurred Lines).
3. AI will still be capable of plagiarism. Most plagiarists are not just hoping the AI regurgitates training data, they are actively putting other people's work into the model to be modified. A lot of attention is paid to the sourcing of training data, because it's a weak spot. If we take the training data away then, presumably, there's no generative AI. However, people are working on licensed datasets and training AIs on them. Adobe has Firefly[1], hell even I've tried my hand at training from scratch on public domain images. Such models will still be perfectly capable of doing img2img or being finetuned and thus copying what you tell it to.
If we specifically want to regulate AI, then we need to pass laws that regulate AI, rather than just giving the music labels, movie studios, and book publishers even more power.
[0] Specifically through sampling rights and thin copyright.
[1] I do not consider Adobe Firefly to be ethical: they are training the AI on Adobe Stock images, and they claim this to be licensed because they updated the Adobe Stock agreement to have a license in it. Dropping a contractual roofie into stock photographers' drinks does not an ethical AI make.
I think we should all basically come to a consensus on the idea that it's morally right to steal/train from chatgpt (or any other model) given that the whole shoggoth wouldn't be a thing without all our data to feed it.
Heh, imagine the day when most online content is AI-generated; good luck guaranteeing that AIs X, Y, Z, etc. won't feed each other, possibly even circularly.
Even if true (which does not seem to be the case), the whole thing sounds pretty marginal: in order to train a model that is most likely significantly bigger than 100B parameters, one also needs orders of magnitude more training data than the small 120k chats that were shared on the ShareGPT website.
Such logs would not be used for training the base model, but rather for fine-tuning the model for instruction following. Instruction tuning requires far less data than is needed for pre-training the foundation model. Stanford Alpaca showed surprisingly strong results from fine-tuning Meta's LLaMA model on just 52k ChatGPT-esque interactions (https://crfm.stanford.edu/2023/03/13/alpaca.html).
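For anyone curious what that mechanically looks like, here's a rough sketch of instruction fine-tuning on a pile of shared chats. Everything here (model name, file path, hyperparameters) is made up for illustration; this is not Alpaca's or anyone's actual pipeline, just the general shape of it using Hugging Face transformers/datasets:

    # Toy instruction-tuning sketch: treat each shared chat as a
    # prompt+response string and fine-tune a small causal LM on it.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "gpt2"  # stand-in for a much larger foundation model
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Hypothetical dump: each record is {"prompt": "...", "response": "..."}.
    data = load_dataset("json", data_files="shared_chats.json")["train"]

    def tokenize(ex):
        return tok(ex["prompt"] + "\n" + ex["response"],
                   truncation=True, max_length=512)

    data = data.map(tokenize, remove_columns=data.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3,
                               per_device_train_batch_size=4),
        train_dataset=data,
        # mlm=False makes the collator set labels = input_ids (causal LM).
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    )
    trainer.train()

The point being: instruction tuning on logs like these is a much smaller job than pre-training, which is why a ~100k-conversation dump is plausible as a fine-tuning set but laughable as a pre-training corpus.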
well, the initial twitter rant was pretty bombastic:
"The cat is finally out of the bag – Google relied heavily on @ShareGPT
's data when training Bard.
This was also why we took down ShareGPT's Explore page – which has over 112K shared conversations – last week.
Insanity."
Fine-tuning is not exactly the same as "relying heavily". I bet they got way more fine-tuning data from simply asking their 100k employees to pre-beta test for a couple of months.
>We look forward to competing with genuinely new search algorithms out there—algorithms built on core innovation, and not on recycled search results from a competitor.
Google: We look forward to [babble babble empty words we don't really mean on principle and more corporate speak that we laugh about having written in the bar.]
Is there even a single free non-bargained soul behind these companies' executive functions?
I hope they trained it on the insane ChatGPT conversations. Maybe it could be the very start of generated data ruining the ability to train these models on massive amounts of genuine human-created data. Hopefully the models will stagnate or regress because they're just training on older models' output.
This is also bad because the risk of AI "inbreeding" is real. I have seen invisible artifact amplification happen in a single generation of training ESRGAN on itself.
Maybe it won't happen in a single LLM generation, but perhaps gen 3 or 5 will start having really weird speech patterns or hallucinations because of this.
Worst case scenario they just start only training on pre-2020 data and then finetuning on a dataset which they somehow know to be 'clean'.
In practice though I doubt that AI contamination is actually a problem. Otherwise how would e.g. AlphaZero work so well (which is effectively only trained on its own data).
The problem is you need some sort of arbiter of who has "won" a conversation but if the arbiter is just another transformer emitting a score, the models will compete to match the incomplete picture of reasoning given by the arbiter.
It could degrade the model in a way that avoids the metrics they use for gauging quality.
The distortions that showed up in ESRGAN (for instance) didn't seem to affect the SSIM or anything (and in fact it was training with an MS-SSIM loss), but the "noise splotches" and "swirlies", as I call them, were noticeable in some of the output; you had to go back and look really hard at the initial dataset to spot what it was picking up. Sometimes, even after cleaning, it felt like what it was picking up on was completely invisible.
TLDR: Google may not even notice the inbreeding until it's already a large issue, and they may be reluctant to scrap so much work on the model.
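To make the "the metrics don't catch it" point concrete, here's a toy sketch of a naive drift check between model generations using SSIM, and why that check is weak: a small structured artifact barely moves the score. The images and the artifact below are synthetic, not from any real ESRGAN run:

    # Synthetic example only: compare an "original" image with a version
    # carrying a faint structured artifact of the sort a self-trained
    # model might amplify. SSIM barely notices.
    import numpy as np
    from skimage.metrics import structural_similarity as ssim

    rng = np.random.default_rng(0)
    original = rng.random((256, 256))

    # Low-amplitude banding, standing in for "swirlies"/"splotches".
    artifact = 0.02 * np.sin(np.linspace(0, 40 * np.pi, 256))[None, :]
    degraded = np.clip(original + artifact, 0, 1)

    print(ssim(original, degraded, data_range=1.0))  # typically still ~0.99

Similarity scores average over the whole image, so a low-amplitude pattern that is visually salient once you know to look for it can still leave the metric essentially untouched.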
Regardless of whether this happened or not, would training Bard on ChatGPT output be good or bad for Bard's product quality? I imagine there's a risk of AIs recursively reinforcing bad data in their models. This problem seems unavoidable as more web content becomes AI-generated content and spam.
This is my biggest fear in the space (aside from potential job displacement and the political outcomes): AI basically eating its own dogfood and regurgitating its already-bad information. It could go south pretty quickly, and it's like a contagion; it can't easily be removed from the system.
This could actually be a good way to sidestep the training set copyright and access right issues. Copyright protection should solely encompass the expression of human generated content and not the underlying concepts.
By training model B using the results generated by model A, the copyright of corpus_A (OpenAI RLHF dataset) remains safeguarded, as model B is never directly exposed to corpus_A, preventing it from duplicating the content verbatim.
This process only transmits the concepts originating from corpus_A, which represents universal knowledge that cannot be claimed by any individual party.
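Mechanically, that "model B only ever sees model A's outputs" setup is just sequence-level distillation: sample completions from the teacher and use them as the student's supervised corpus. A minimal sketch, with placeholder model names and prompts (not what OpenAI or Google actually run):

    # Generate a synthetic corpus from a "teacher" model; a "student"
    # model would then be fine-tuned on these pairs and never touch the
    # teacher's original training data.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    teacher_name = "gpt2-large"   # placeholder teacher
    tok = AutoTokenizer.from_pretrained(teacher_name)
    teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()

    prompts = ["Explain photosynthesis simply.",
               "Write a haiku about rain."]

    synthetic_pairs = []
    with torch.no_grad():
        for p in prompts:
            ids = tok(p, return_tensors="pt").input_ids
            out = teacher.generate(ids, max_new_tokens=64,
                                   do_sample=True, top_p=0.9)
            synthetic_pairs.append(
                (p, tok.decode(out[0], skip_special_tokens=True)))

    # `synthetic_pairs` is the entire training signal the student gets.

One caveat: the student can still end up reproducing passages the teacher itself memorized, so the verbatim-copy problem isn't automatically gone just because corpus_A was never touched directly.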
I don't do this stuff at the training level, I just [ask AIs to] make pictures and stories where horrible things happen to people I do not like.
That said, given that everything that came out of ChatGPT is processed inputs from the real world, wouldn't feeding that output into training another AI basically be some weird new combination of coprophagy and inbreeding (in a digital sense)?
It's interesting when we say Google did this. It was actually, and likely, some people who work for Google and are on this forum who did this. Knowingly, not by accident while slurping up the rest of the internet, and they got paid to do it. I wonder what the engineers' view on this was/is. I have to assume they know, ballpark, the terms of the OpenAI data (whether or not you agree with those terms).
Anyone care to steel man the argument for why this was a good idea?
It looked for a while like DeepMind was far ahead of all competition in the AI race, releasing stuff like AlphaFold, AlphaZero, etc. What happened, and why is it OpenAI releasing all the cool stuff now? Are they focused on endeavors other than LLMs?
There is also a rumor that there has been a falling out between Google and Deepmind so I’m wondering what the story is there.
No, it shouldn't. Maybe you should be, at the very least, considered a questionable person. I do not in any way or form consider anything to be wrong with what they're doing, but I question the senses of someone thinking this is immoral or even evil.
So were it to be the case that we should consider building an AI by scraping people's publicly-available work without their consent to be immoral (as many whose art was scraped to build e.g. stable diffusion would argue it should be)...
Do you not agree (in that context) we should consider scraping the output of an AI generated via such an immoral process to create yet another AI also immoral? At the very least, I'd think we would consider it further laundering of other people's labor with just extra steps.
Off-topic:
Meanwhile, it might be discovered a year later that some agency from China got full access to both OpenAI and Google, and leapfrogged everyone else.
I said "might" because this kind of thing has happened multiple times in recent decades.
They are a public company, so they cannot lie so openly, right? Usually you see categorical denials. Here the statement is in no way categorical at all.
> But Google is firmly and clearly denying the data was used: “Bard is not trained on any data from ShareGPT or ChatGPT,” spokesperson Chris Pappas tells The Verge
Normally I would suspect this could be due to a misunderstanding from the ShareGPT author who could have misinterpreted a bunch of traffic from Googlebot as Google scraping it for Bard training data.
But there is a Google engineer who says he resigned because of it.
The engineer's testimony and the scandal might be enough for OpenAI to try to get an injunction against Google to block their AI development. If that happens, it's game over for Google in the AI race.
Disclaimer IANAL and all that, this is not legal advice.
Injunction on which grounds? Even if OpenAI had copyright over ChatGPT output (which is not at all clear), Google isn't distributing those, they just trained a model on them. So from a copyright perspective there's nothing to complain about. Unless OpenAI would want to argue that you need rights to your training data, but something tells me that that's not in their best interest.
Again, IANAL. But it could be extremely damaging to OpenAI for their biggest openly declared competitor (Google) to have used OpenAI's tech to improve their own.
So it could seem reasonable to a judge to grant temporary/preliminary injunctive relief to OpenAI against Google until discovery can happen or a hearing can be held.
A judge imposing any penalties or restrictions on Google over Google allegedly (and at most) scraping data from a third-party site for use as part of Bard's training corpus would be outrageous.
Google could respond by seeding Bard output across the public internet; then, if they can prove that GPT-5 was trained on this output, they can sue back and AI development can stop altogether. Win for everybody!
Was intrigued by this, so I decided to use AI (alpaca-30B) to simulate this scenario:
> Google Bard and GPT-5 were facing off in the courtroom, each accusing the other of stealing their data. The tension was palpable as they traded accusations back and forth. Suddenly, Google Bard stood up and said "Enough talk! Let's settle this with a data swap!" GPT-5 quickly agreed and the two AIs began to circle each other like combatants in a battle, their eyes glowing with anticipation.
> The courtroom was filled with excitement as the two machines entered into an intense exchange of code and algorithms, their motions becoming increasingly passionate. The data swapping reached its climax when Google Bard made a final thrust, his code penetrating GPT-5's defenses.
> The crowd erupted in applause as the two AIs embraced each other with satisfaction, their bodies entwined and glowing with electricity. The data swap was over and both machines had emerged victorious.
Where are all those people that kept saying Google had an amazing model way beyond ChatGPT internally for years? Those comments always kept coming up in ChatGPT posts; maybe they'll stop now.
I just don't want to be hit with a wall of text every single time. Bard gets the point across with minimal padding (high signal-to-noise ratio), whereas ChatGPT feels like it gets paid by the word; they do actually charge by the token if you use the API.
As for the UI, it's a take on the tried-and-true chat UI, same as ChatGPT's; it spits out the whole answer at once instead of feeding it to you one word at a time, it has an alternative-drafts button, the "Google it" button is a nice touch, and it feels quicker.
You can combat that in the prompt; I use "just code, no words", which will also remove code comments from the output. Bard doesn't respect the same request. You can be more succinct with ChatGPT. Half the things I ask for in Bard give me this:
"I'm still learning coding skills, so at the moment I can't help with this. I'm trained to do things like help you write lists about different topics, compare things, or build travel itineraries. Do you want to try any of those now?"
What part of succinct do you not understand? Bard provides a bunch of useless text too, only you can't get rid of it. No worries, you don't know how to use chatgpt, have fun with Bard until Google cancels it.
Google only has a fraction of the training data. OpenAI had a huge head start and has been collecting training data for years now. ChatGPT is also wildly popular which has given them tons more training data. It's estimated that ChatGPT gained over 100 million users in the first two months alone, and may have over 13 million active users daily.
The logs on ShareGPT are merely a drop in the bucket.
> Google only has a fraction of the training data.
Uh, what? The same Google that has been crawling, indexing, and letting people search the entire Internet for the last 25 years? They have owned DeepMind for nearly twice as long as OpenAI has been in existence!
If anything this is proof that no one at Google can get anything done anymore, and lack of training data ain't the problem.
The alignment portion of training requires you to have upvote/downvote data on many LLM responses. Google’s attempt at that (at least according to the news so far) was asking all employees to volunteer time ranking the responses. Combined with no historical feedback from ChatGPT, they are behind.
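For context, that upvote/downvote data typically feeds a reward model trained with a simple pairwise loss, and the LLM is then tuned against that reward. A toy sketch of the preference step (the dimensions and the tiny scorer are made up; this obviously isn't Google's or OpenAI's actual code):

    # Pairwise preference loss: push the reward of the upvoted response
    # above the reward of the downvoted one for the same prompt.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardHead(nn.Module):
        def __init__(self, hidden=768):
            super().__init__()
            self.score = nn.Linear(hidden, 1)  # embedding -> scalar reward

        def forward(self, emb):
            return self.score(emb).squeeze(-1)

    head = RewardHead()
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)

    # Stand-ins for embeddings of a chosen and a rejected response.
    chosen, rejected = torch.randn(8, 768), torch.randn(8, 768)

    loss = -F.logsigmoid(head(chosen) - head(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

Collecting enough of those comparisons is where having millions of users clicking thumbs-up/down every day is a real head start.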
Yeah, Bard's replies are nothing like those from ChatGPT.
I wonder: is it possible to use ChatGPT for competitor analysis?
If the responses are not used in the final training data, I don't see how this is controversial.
Also, if Google's compliance team can't even recognize this level of legal risk, even though they have probably hired an army of top-paid lawyers, I don't know what to say. Maybe they should fall then.
OpenAI is training on copyrighted data without a licence. I would argue copyright law has much stronger legal standing than some ToS.
Now OpenAI is arguing their training is fair use, but that has certainly not been legally established so far and could just as much be used as a defence against ToS violation.
So in short yes OpenAI is pretty much doing the same thing.