Wondering what everyone's thoughts are regarding open sourcing your own code that you've kept under lock and key for a long time, with the rise of AI and inability to enforce licenses or control whether code is used for training.
You aren't going to have a choice - anything you make available on the web will be consumed by AI regardless of the license. That is essentially going to be the primary purpose of the web going forward: to provide training data for AI.
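To be concrete about how little control you have: the only opt-out mechanism anyone offers is robots.txt, and it's purely advisory. You can list the crawlers that have publicly announced themselves (GPTBot, CCBot, and Google-Extended are real tokens; anything beyond that is guesswork), but nothing compels them, or any scraper that doesn't announce itself, to honor it:

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

A license can't even do that much; robots.txt at least gets read by the well-behaved crawlers.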
I wonder what this means for SEO. If someone develops content and someone else then generates very similar content using AI, how do the search engines know who to rank higher?
I suspect both SEO and search engines as a concept will be made obsolete. SEO doesn't even make sense when machines are creating the content and consuming it. AI is going to be the primary UI for software applications in general, so you won't even really use a browser; you'll talk to an AI (probably integrated directly with your OS) and it will generate content or dynamic applications on the fly. You won't ever even interact with the web directly, much less with human-created content.
Everything evolves. The internet existed before the web, and we navigated its content. The web existed before search engines did, and we navigated its content. Content and ways to navigate it will continue to exist once search engines are no longer relevant. Maybe SEO simply doesn't matter?
You'll need to clarify what your specific goals and concerns are. Most of my code is open source because I want people to build derivatives. There are lots of license variations to consider.
The question is specifically about the problem with licenses, if I understand correctly. AI is now a big license-laundering machine, so there are not "lots of license variations to consider": AI just removes the license, whatever it is.
For the legally paranoid, this is already a "fruit of the poisonous tree" situation.
Some people are informally "banned" from working on whole categories of open source projects because they've had provable exposure to closed source code in the same domain.
That's part of the motivation for maintaining pseudonymity online. If no one knows who you are, and no one knows you've done kernel work at Microsoft, then, hypothetically, neither you nor the open source kernel project you're contributing to can be sued by Microsoft for "inspiration".
There are legal questions here for which there's never been precedent, so nobody knows where the line is -- and this is all before LLMs ever entered the picture. In countries whose courts don't rely on precedent, it's even more of a minefield, since the legal outcome of a case is always undefined behavior.
A lot of companies avoid using GPL code at all, because they don't want to even accidentally find their own proprietary code subject to being released under the GPL.
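That paranoia is why license audits are standard corporate due diligence. Here's a minimal sketch of the idea in Python, assuming an environment where installed packages self-report their licenses in package metadata (they often don't, or do so inaccurately, which is part of why the paranoia persists):

    # Flag installed packages whose declared license mentions the GPL.
    # Metadata is self-reported, so a clean scan proves nothing; LGPL
    # and AGPL will also match, which for a paranoid audit is a feature.
    from importlib.metadata import distributions

    for dist in distributions():
        license_field = (dist.metadata.get("License") or "").upper()
        classifiers = [c.upper() for c in dist.metadata.get_all("Classifier") or []]
        if "GPL" in license_field or any("GPL" in c for c in classifiers):
            print(f"{dist.metadata['Name']}: possibly GPL-licensed, review before shipping")

Real audits use dedicated scanners that inspect the actual source files, precisely because self-reported metadata can't be trusted.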
I can imagine that some similar licensing concept could be applied to copyrightable works with respect to AI training. Use a legally restricted work for AI training, and your entire AI training set is subject to free public release?
No such licensing concept could work. The AI companies' argument is that what they're doing counts as fair use, so they don't need to abide by any license at all. If that argument holds, such a clause would do nothing; if it doesn't, such a clause is unnecessary, because they're already violating even the most lax licenses (e.g., MIT) by failing to provide the required attribution.
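For reference, the MIT license's only real condition is one sentence:

    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.

A model that emits a substantial portion of MIT-licensed code without that notice is already out of compliance; no exotic anti-AI clause is required.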
Fair use can be a bit ambiguous, but I think there are definitely grounds to claim that what the AI companies are doing is not fair use.
I imagine it will have to be tried in court for anyone to say for certain. (And I imagine whatever court ruling happens would be contested. We may actually need multiple court rulings...)
> with the rise of AI and inability to enforce licenses or control whether code is used for training.
The inability to reliably enforce licenses has always been a problem with Open Sourcing your code. I don't understand how someone would see AI as a dealbreaker in the real world of Open Source pragmatism.
Even when violators were called out, it took them years to respond. And this was before LLMs; even then we knew that releasing things under Open Source is mostly a good-faith social contract.
These "is it still okay to x because AI exists" questions are besides the point. If you weren't considering the worst-case scenario outcome of your actions already then maybe you should. If AI forces your hand, so be it. But you can always Open Source your code, as long as you've accepted the same risks that existed for the past 20-30 years.
I think this is the wrong question. It assumes that the AI is derivative of your code. The reality is that a day will come, sooner than you'd like, when AI can replicate the application you wrote without ever having seen your code. In that world, what does it matter which code is open and which is closed?
FWIW, learning from copyrighted data should not be a copyright violation. This applies equally to both humans and machines. I see no reason to differentiate.
Well, I hope no one will listen to you and your Luddite friends because generative AI is amazing and one of the best things humans have created in my lifetime. Take all the available data and make amazing and fun things with it! That's what the Internet was created for.
> generative AI is amazing and one of the best things humans have created in my lifetime
Admittedly, humans haven't created a ton of great things in the last couple of decades. The evolution of Tech in the last 20 years is depressing. I guess I can understand how you may consider it "amazing", though I would expect that a teenager could understand how their life is going to get a lot harder in the future, and that Tech is a big part of the problem.
Damn it, I thought I was a miserable prick. It's statistically likely that I'm older than you; just not as bitter yet.
I am optimistic and cheerful considering the mind-blowing advancements in generative AI. Given my good fortune to be alive during this time, I'm pleased to see these developments unfold, and I try my best to disregard the haters who are disparaging one of the greatest achievements in my lifetime.
I, a teenager who created their account here in 2016, instructed a locally run LLM to write the above message.
Sorry, I didn't mean that you were necessarily a teenager. I was just saying that given the last 20 years in Tech, a teenager has only known it getting worse. You could be 60 and think that generative AI is one of the most amazing things invented in your lifetime, for all I know.
Kind of. Making stuff is great; having an AI make it for you is not so great. Or it's kind of great. The art is soulless; the code is wonky. It enables people to do more, but, if anything, I think what we've learned is that taking shortcuts to do more stuff is bad. Hey, let's let people download modules of code other people wrote right into their codebase so it's easier for them to do more stuff! Enter left-pad. Enter a flood of crapware. Enter dependency build issues. Enter more bloated software than you've ever seen before. Enter needing more RAM than ever to compile AUR packages during hour-long updates (assuming no dependency errors happen).
I am a luddite. I didn't use to be, and I don't want to be, and it puts me on the wrong side of history, consigning me to a diverging function of increasing bitterness, but I feel forced into it by the direction tech is going. I think we've bitten the apple in our greed, and now we have to worry about our cars spying on us because they're equipped with an autonomous and intelligent agent of oppression where, just a few years ago, that wasn't possible. "Progress", huh?
Anyway, I had to say that before saying I totally agree with you about your open source philosophy. I hate Microsoft, but, before, Microsoft employees could read my code, take inspiration, and write stuff with it. Hell, they could even paste snippets of it in. I have no way of knowing. Now, I'm uploading my code directly to Microsoft (okay: this fucks with me), and a Microsoft virtual mind is reading my code, taking inspiration, and/or pasting snippets of it in. It's a petty difference. The idea of free software is that anyone, man or machine, can see it and learn from it.
If you truly believe in the idea of free software, you should de facto be okay with AIs, Nazis, terrorists, $MEGACORP_YOU_DESPISE, ANYONE using your code.
The biggest differences are the increased probability that someone else will now be using your code and that you won't be credited or your GPL license will be violated, but I consider those pretty ethereal issues.
On the one hand, "working in tech" has become so accessible that titles like "prompt engineer" no longer sound completely ridiculous. Those people don't seem interested in understanding how technology works, but rather in making a profit off whatever low-hanging fruit they find.
On the other hand, actually competent engineers get paid a ton to ignore any kind of ethics. They just get rich while having fun building tech that the first group will use.
The combination of both (unethical tech and not-so-competent "engineers" building on top of it) is destroying the world.