Meta AI: "The Future of AI Is Open Source and Decentralized"

rkou · 2024-09-18T18:51:49 1726685509

And what about the future of social media?

This is such devious, but increasingly obvious, narrative crafting by a commercial entity that has proven itself adversarial to an open and decentralized internet / ideas and knowledge economy.

The argument goes as follows:

- The future of AI is open source and decentralized

- We want to win the future of AI instead, become a central leader and player in the collective open-source community (a corporate entity with personhood for which Mark is the human mask/spokesperson)

- So let's call our open-weight models open-source, and benefit from its imago, require all Llama developers to transfer any goodwill to us, and decentralize responsibility and liability, for when our 20 million dollar plus "AI jet engine" Waifu emulator causes harm.

Read the terms of use / contract for Meta AI products. If you deploy it, some producer finds the model spits out copyrighted content, knocks on Meta's door, Meta will point to you for the rest of the court case. If that's the future for AI, then it doesn't really matter whether China wins.

bee_rider · 2024-09-18T19:25:55 1726687555

> Read the terms of use / contract for Meta AI products. If you deploy it, some producer finds the model spits out copyrighted content, knocks on Meta's door, Meta will point to you for the rest of the court case. If that's the future for AI, then it doesn't really matter whether China wins.

As much as I hate Facebook, I think that seems pretty… reasonable? These AI tools are just tools. If somebody uses a crayon to violate copyright, the crayon is not to blame, and certainly the crayon company is not, the person using it is.

The fact that Facebook won’t voluntarily take liability for any thing their users’ users’ might do with their software means that software might not be useable in some cases. It is a reason to avoid that software if you have one of those use cases.

But I think if you find some company that says “yes, we’ll be responsible for anything your users do with with our product,” I mean… that seems like a hard promise to take seriously, right?

gyomu · 2024-09-18T23:14:01 1726701241

This is a bad analogy. The factory producing crayons doesn’t need to ingest hundreds of millions of copyrighted works as a fundamental part of its process to make crayons.

bee_rider · 2024-09-19T03:37:20 1726717040

I don’t think it is a bad analogy, it is just separating out the issues.

If the thing required breaking the law to make, it just shouldn’t have been made. But, in that case, Facebook should not accept liability for how their users use the thing. They should just not share it at all, and delete it.

dartos · 2024-09-19T04:04:42 1726718682

Crayons aren’t made by mashing people’s artwork through a gpu.

Crayons don’t generate content either.

If I download something from megaupload (rip) megaupload is the one that gets in trouble. They are storing, compressing, and shipping that information to me.

The same thing happens with AI, the information is just encoded in the model weights instead of a video or text encoding or whatever. When you download a model, you’re downloading a lossy compressed version of all the data it was trained on.

bee_rider · 2024-09-19T04:50:15 1726721415

This seems more like an argument that the model just shouldn’t have been created, or that it shouldn’t be used. If a model is just an lossy compressed version of a bunch of infringing content, why would Facebook (or OpenAI, or anybody else hosting a model and providing an API to it) be in the clear?

camillomiller · 2024-09-19T06:19:51 1726726791

To be fair, maybe yes, these models shouldn’t have been created. Well they have been created so now we need a new novel way to make sure they don’t damage other people’s work. Something like this did not exist before, and therefore needs a new set of rules that the model creators, with all their might and power, are trying to strongly lobby against.

dartos · 2024-09-19T12:38:36 1726749516

Tech likes to follow the “ask for forgiveness, not for permission “ motto.

If OpenAI, Facebook, or whoever asked for permission to gobble up all publicly visible data to train a program to output statistically similar data, I don’t believe they would’ve got the permission.

In that sense, I don’t think these models should’ve been made.

I dont think any of those companies would be in the clear. That’s my point.

AI is a copyright black hole, albeit a useful one.

thesz · 2024-09-20T05:35:45 1726810545

Let's say a factory builds a mega puzzle from many images shredded to identically-shaped puzzle pieces so you can piece them together as you want or need. Some pieces from some images are omitted due to their closeness in image space or due to them being infrequent enough.

This 70B pieces puzzle is an LLM.

You can reproduces likings of any hero from Marvel universe close enough using this puzzle. Or you can create

Who is to blame?

rkou · 2024-09-18T19:40:35 1726688435

AI safety is expensive, or even impossible, by releasing your models for local inference (not behind API). Meta AI shifts the responsibility of highly-general highly-capable AI models to smaller developers, putting ethics, safety, legal, and guard-rails responsibility on innovators who want to innovate with AI (without having the knowledge or resources to do so by themselves) as an "open-source" hacking project.

While Mark claims his Open Source AI is safer, because fully transparent and many eyes make all bugs shallow, the latest technical report makes mention of an internal, secret, benchmark that had to be developed, because available benchmarks did not suffice at that level of capabilities. For child abuse generation, it only makes mention that it investigated this, not any results of these tests or conditions under which it possibly failed. They shove all this liability on the developer, while claiming any positive goodwill generated.

It completely loses their motivation to care for AI safety and ethics if fines don't punish them, but those who used the library to build.

Reasonable for Meta? Yes. Reasonable for us to nod along when they misuse open source to accomplish this? No.

bee_rider · 2024-09-18T20:00:02 1726689602

I think this could be a somewhat reasonable argument for the position that open AI just shouldn’t exist (there are counter arguments, but I’m not interested enough to do a back and forth on that). If Facebook can’t produce something safe, maybe they shouldn’t release anything at all.

But, I think in that case the failing is not in not taking the liability for what other people do with their tool. It is in producing the tool in the first place.

rkou · 2024-09-18T20:25:06 1726691106

Perhaps Open AI simply can't exist (too hard and expensive to coordinate/crowd-source compute and hardware). If it can, then, to me, it should and would.

OpenAI produced GPT-2, but did not release it, as it couldn't be made safe under those conditions, when not monitored or patch-able. So it put it behind an API and owned its responsibility.

I didn't take issue with Meta's business methods and can respect its cunning moves. I take issue with things like them arguing "Open Source AI improves safety", so we can't focus on the legit cost-benefits of releasing advanced, ever-so-slightly risky, AI into the hands of novices and bad actors. It would be a failure on my part if I let myself get rigamaroled.

One should ideally own that hypothetical 3% failure rate to deny CSAM request when arguing for releasing your model still. Heck, ignore it for all I care, but they damn well do know how much this goes up when the model is jailbroken. But claiming instead that your open model release will make the world a better place for children's safety, so there is not even a need to have this difficult discussion?

cornholio · 2024-09-18T23:35:32 1726702532

This strange obsession with synthetic CSAM as the absolute epitome of "AI safety" says more about the collective phobias and sensibilities of our society than about any objective "safety" issues.

Of course, from a PR perspective, it would be extremely "unsafe" for a publicly traded company to release a tool that can spew out pedophile literature, to the point of being an existential threat. Twitter was economically cancelled for much less. But as far as dangerous AI goes, it's one of the most benign and inconsequential failure modes.

Calvin02 · 2024-09-18T19:24:56 1726687496

Doesn’t Threads and Fediverse indicate that they are headed that way for social as well?

redleader55 · 2024-09-18T20:04:10 1726689850

The last time we had a corporate romance between an open source protocol/project, "XMPP + Gtalk/Facebook = <3", XMPP was crappy and it was moving too slowly to the mobile age. Gtalk/Messenger gave up on XMPP and evolved their own protocols and stopped federating with the "legacy" one.

I think the success of the "Threads + Fediverse = <3" relies on the Fediverse not throwing the towel and leaving Threads as the biggest player in the space. That would mean fixing a lot of problems that that people have with Activity Pub today.

I don't want to say the big tech are awesome and without fault, but at the end of the day big-techs will be big-techs. Let's keep the Fediverse relevant and Meta will continue to support it, otherwise it will be swallowed by the bigger fish.

bee_rider · 2024-09-18T20:13:19 1726690399

For some reason, this has made me wonder if we just need more non-classical-social-media fediverse stuff. Like of course people will glom on to Threads, it means they can interact with the network while still being inside Facebook’s walled garden…

I wonder if video game engines could use it as an alternative to Steam or Discord integration.

LtWorf · 2024-09-18T22:35:46 1726698946

The problem was not that it was not evolving. The problem was that they decided they had trapped all the users of other networks they could trap.

Slack did the same killing xmpp and irc bridge. I don't see them making a matrix bridge.

tacocataco · 2024-09-18T23:36:38 1726702598

Last I checked, there was a movement in the biggest instances to defederate from meta's embrace stage of "embrace extend extinguish" playbook. I didn't check back to see if it got pushed through.

Given the nature of the fediverse, if it happened or not depends on the instance you use/follow.

amy-petrik-214 · 2024-09-19T04:05:26 1726718726

It's got nothing to do with Meta's social media business directly. Massive as the FB dataset is, it gets mogged by google who, what with their advanced non-PHP-based infra and superior coders, basically have way more and way better and way more accessible data... and their own AI CPUs they made, and a bigger cluster, and faster software, and more store, and so on. Big picture, Google is poised to steamroll Facebook AI-wise, and if no them, then openAI+microsoft

So Meta says "well we will buy tons of compute and try to make it distributed" "we'll make the model open and people will fine-tune with data that they found" and so on. Now google and openAI aren't competing versus meta, they are competing versus meta + all compute owned by amateurs + all data scrapped by all amateurs, which is non-trivial. so it's not so much as aspiring to be #1 as capping the knees of the competition who has superior competitiveness - but people love it because the common man wins here for once.

Anyway, eventually, they'll all be open models. Near future weaker models will run on a PC, bigger models on the cluster, weakest models on the phone... then just weak models on the phone and bigger on the PC.. eventually anything and everything fits on a phone and maybe iWatch. Even Google and openAI will have to run on the PC/phone at this point, it wouldn't make sense not to. Then since people have local access to these devices, it all gets reverse engineered, boom boom boom. now they're all open

eli_gottlieb · 2024-09-18T19:37:03 1726688223

If it was really open-source you'd be able to just train one yourself.

echelon · 2024-09-18T20:37:40 1726691860

This sort of puts the whole notion of "open source" at risk.

Code is a single input and is cheap to compile, modify, and distribute. It's cheap to run.

Models are many things: data sets, data set processing code, training code, inference code, weights, etc. But it doesn't even matter if all of these inputs are "open source". Models take millions of dollars to train, and the inference costs aren't cheap either.

edit:

Remember when platforms ate the open web? We might be looking at a time where giants eat small software due to the cost and scale barriers.

indymike · 2024-09-19T13:07:11 1726751231

> We might be looking at a time where giants eat small software due to the cost and scale barriers.

This assumes that abstractions are no longer possible.

nicce · 2024-09-18T20:35:36 1726691736

Only if you were a billionaire. These models are starting to be so out of reach for single researchers or even traditional academic research groups.

jongjong · 2024-09-19T00:14:31 1726704871

Maybe the road to heaven is paved with bad intentions.

bschmidt1 · 2024-09-18T19:57:11 1726689431

It's especially rich coming from Facebook who was all for regulating everyone else in social media after they had already captured the market.

Everyone tries this. Apple tried it with lawsuits and patents, Facebook did it under the guise of privacy, OpenAI will do it under the guise of public safety.

There's almost no case where a private company is going to be able to successfully argue "they shouldn't be allowed but we should" I wonder why so many companies these days try. Just hire better people and win outright.

foobar_______ · 2024-09-18T19:16:31 1726686991

It has been clear from the beginning that Meta's supposed desire for an open source AI, is just a coping mechanism for the fact that got beat out of the gate. This is an attempt to commoditize AI and reduce OpenAI/Google/Whoever's advantage. It is effective, not doubt, but all this wankery about how noble they are for creating an open-source AI future is just bullshit.

ipsum2 · 2024-09-18T21:00:43 1726693243

You're wrong here. Meta has released state of the art open source ML models prior to ChatGPT. I know a few successful startups (now valued at >$1b) that were built on top of Detectron2, a best-in-class image segmentation model.

tomjen3 · 2024-09-19T03:45:07 1726717507

It’s because Facebooks complementary good is content (primary good is ad slots) and if somebody wins the ai race they can pump out enough content to jumpstart a Facebook competitor with a ton of content.

CaptainFever · 2024-09-18T19:47:40 1726688860

I feel the same way. I'm grateful to Meta for releasing libre models, but I also understand that this is simply because they're second in the AI race. The winner always plays dirty, the underdog always plays nice.

KaiserPro · 2024-09-18T20:03:17 1726689797

but they've _always_ released their stuff. Thats part of the reason why the industry uses pytorch, that and because its better than tensorflow.

In the same way that detectron and Segment anything is an industry standard.

Sure, for LLMs openAI released a product first. but its not unusual for meta to release useful models.

jsheard · 2024-09-18T17:53:39 1726682019

Decentralized inferencing perhaps, but the training is very much centralized around Metas continued willingness to burn obscene amounts of money. The open source community simply can't afford to pick up the torch if Meta stops releasing free models.

leetharris · 2024-09-18T18:00:59 1726682459

There's plenty of open source AI out there that isn't Meta. It's just not as good.

The #1 problem is not compute, but data and the manpower required to clean that data up.

The main thing you can do is support companies and groups who are releasing open source models. They are usually using their own data.

jsheard · 2024-09-18T18:06:22 1726682782

> There's plenty of open source AI out there that isn't Meta. It's just not as good.

To my knowledge all of the notable open source models are subsidised by corporations in one way or another, whether by being the side project of a mega-corp which can absorb the loss (Meta) or coasting on investor hype (Mistral, Stability). Neither of those give me much confidence that they will continue forever, especially the latter category which will just run out of money eventually.

For open source AI to actually be sustainable it needs to stand on its own, which will likely require orders of magnitude more efficient training, and even then the data cleaning and RLHF are a huge money sink.

exe34 · 2024-09-18T18:52:23 1726685543

if you can do 100x more efficient training with open source, closeAI can simply take that and train a model that's 100x bigger/longer/more tokens.

bugglebeetle · 2024-09-18T19:28:11 1726687691

AKA why Unsloth is now YC backed for their even better (but closed source) fine-tuning.

moffkalast · 2024-09-18T18:15:15 1726683315

https://huggingface.co/datasets/HuggingFaceFW/fineweb

The #1 problem is absolutely compute. People barely get funding for fine tunes, and even if you physically buy the GPUs it'll cost you in power consumption.

That said, good data is definitely the #2 problem. But nowadays you can just get good synthetic datasets from calling closed model APIs or just using existing local LLMs to sift through trash. That'll cost you too.

citboin · 2024-09-18T18:19:52 1726683592

>The main thing you can do is support companies and groups who are releasing open source models. They are usually using their own data.

Alternatively we could create standardized open source training data like wikipedia, wikimedia as well as public domain literature and open courseware. I'm sure that there are many other such free and legal sources of data.

KaiserPro · 2024-09-18T20:04:56 1726689896

but the training data is one of the key bits that makes or breaks your model's performance.

There is a reason why datasets are private and the model weights aren't.

Der_Einzige · 2024-09-18T21:57:59 1726696679

Compute is for sure the number one problem. Look at how long it’s taking for anything better than Pony Diffusion to come out for NSFW image gen despite the insane amount of demand for it.

Look at how much computer purple AI actually has. It’s basically nothing.

cynicalpeace · 2024-09-18T18:27:40 1726684060

One area that's interesting, but easy to dismiss because it's the ultimate cross-section of hype (AI and crypto) is bittensor.

AFAICT it decentralizes the training of these models by giving you an incentive to train models which will mine the crypto if you're improving it.

I learned about it years ago, mined some crypto, lost the keys and now kicking myself cuz I would've made a pretty penny lol

jsheard · 2024-09-18T18:29:18 1726684158

Does it actually work? AIUI the current consensus is that you need massive interconnect bandwidth to train big models efficiently, and the internet is nowhere near that. I'm sure the Nvidia DGX boxes have 10x400Gb NICs for a reason.

bloatedGoat · 2024-09-18T20:07:43 1726690063

There are methods that make it feasible to train models over the internet. DiLoCo is one [1] and NousResearch has found a way to improve on that using a method they call DisTro [2].

1. https://arxiv.org/abs/2311.08105

2. https://github.com/NousResearch/DisTrO?tab=readme-ov-file

cynicalpeace · 2024-09-18T18:33:28 1726684408

I have no idea. The idea is certainly interesting but I've never actually understood how to run inference on these models... the people that run it seem to be unable to just talk simply.

CaptainFever · 2024-09-18T19:53:29 1726689209

I've seen bittensor before. I think it makes sense, as a way to incentivise people to rent their GPUs, without relying on a central platform. But I've always felt it was kind of a scam because it was so hard to find any guides on how to use it.

Also, this doesn't seem to actually solve the issue of fine tuners needing funding to rent those GPUs? One alternative is something like AI Horde, which pays GPU providers with "labour vouchers" that allow them to get priority next time they want GPU. Requires a central platform to track vouchers and ban those who exchange them. Basically a sort of real-life comparison of mutualism (AI Horde) vs capitalism (bittensor).

numpad0 · 2024-09-18T22:06:44 1726697204

Centralized production, decentralized consumption.

monkeydust · 2024-09-18T18:05:38 1726682738

Curious but is there a path where llm training or inference could be distributed across the BOINC network: https://en.m.wikipedia.org/wiki/Berkeley_Open_Infrastructure...

WithinReason · 2024-09-19T05:36:45 1726724205

Not yet, the bandwidth requirement is too high. But if someone figures this out that's when we will have true open source models. A crowdsourced supercomputer can outcompete any corporation's server farm.

monkeydust · 2024-09-19T07:14:40 1726730080

Yea seems like an inflection moment when it happens. Curious who's working on this problem.

WithinReason · 2024-09-19T07:53:15 1726732395

It's not in the interest of the big players for sure!

jmyeet · 2024-09-18T21:14:24 1726694064

Two things spring to mind:

1. Open source is for losers. I'm not calling anyone involved in open source a loser, to be clear. I have deep respect for anyone who volunteers their time for this. I'm saying that when companies push for open source it's because they're losing in the marketplace. Always. No companiy that is winning ever open sources more than a token amount for PR; and

2. Joel Spolsky's now 20+ year old letter [1]:

> Smart companies try to commoditize their products’ complements.

Meta is clearly behind the curve on AI here so they're trying to commoditize it.

There is no moral high ground these companies are operating from. They're not using their vast wisdom to predict the future. They're trying to bring about the future the most helps them. Not just Meta. Every company does this.

It's why you'll never see Meta saying the future of social media is federation, open source and democratization.

[1]: https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/

Qshdg · 2024-09-18T18:04:59 1726682699

Great, who gives me $500,000,000, Nvidia connections to actually get graphics cards and a legal team to protect against copyright lawsuits from the entities whose IP was stolen for training?

Then I can go ahead and train my open source model.

riku_iki · 2024-09-18T18:18:47 1726683527

you can pick existing pretrained foundational model from corp (google, MS, Meta) and then finetune it(much cheaper) with your innovative ideas.

hyuuu · 2024-09-18T18:27:07 1726684027

the view of the comments here seems to be quite negative for what meta is doing. Honest question, should they go to the route of openai and closed source + paid access instead? OpenAI or Claude seem to garner more positive views than llama open sourced.

naming_the_user · 2024-09-18T18:46:41 1726685201

The models are not open source, you're getting the equivalent of a precompiled binary. They are free to use.

Palmik · 2024-09-19T04:39:46 1726720786

That's a bad analogy. The weights are much closer to source code, because you can directly modify them (fine tune, merge or otherwise) using open source software that Meta released (torchtune, but there are tons of other libraries and frameworks).

progval · 2024-09-19T05:47:53 1726724873

You can also modify a precompiled binary with the right tools.

Palmik · 2024-09-19T05:55:59 1726725359

Except doing continued pre-training or fine tuning of the released model weights is the same process through which the original weights were created in the first place. There's no reverse engineering required. Meta engineers working on various products that need custom versions of the Llama model will use the same processes / tools.

RealStickman_ · 2024-09-18T19:45:42 1726688742

Free to use with restrictions, so you maybe get 1.5/4 FOSS freedoms.

meiraleal · 2024-09-18T18:58:22 1726685902

Not much would change if they did. Meta intentions and OpenAI intentions are the same: reach monopoly and take all the investment back with a 100x return. Anyone that achieves it will be as evil as the other one.

> OpenAI or Claude seem to garner more positive views than llama open sourced.

that's more about Meta than the others. Although OpenAI isn't that far from Meta already.

troupo · 2024-09-18T19:00:20 1726686020

They use "open source" to whitewash their image.

Now ask yourself a question: where does Meta's data come from? Perhaps from their users' data? And they opted everyone in by default. And made the opt-out process as cumbersome as possible: https://threadreaderapp.com/thread/1794863603964891567.html And now complain that the EU is preventing them from "collecting rich cultural context" or something https://x.com/nickclegg/status/1834594456689066225

KaiserPro · 2024-09-18T20:09:57 1726690197

> Perhaps from their users' data?

nope, not yet.

FAIR, the people that do the bigboi training, for a lot of their stuff cant even see user data, because the place they do the training can't support the access.

Its not like openAI where the lawyers don't even know whats going on, because they've not yet been properly taken to court.

at Meta, the lawyers are everywhere and if you do naughty shit to user data, you are going to be absolutely fucked.

troupo · 2024-09-19T04:20:11 1726719611

> the lawyers are everywhere and if you do naughty shit to user data, you are going to be absolutely fucked.

I even provided the links that has screenshots of their opt-out form.

--- start quote ---

Al at Meta is our collection of generative Al features and experiences, like Meta Al and Al Creative Tools, along with the models that power them.

Information you've shared on our Products and services could be things like:

- Posts

- Photos and their captions

- The messages you send to an Al

...

We may still process information about you to develop and improve Al at Meta, even if you object or don't use our Products and services. For example, this could happen if you or your information:

- Appear anywhere in an image shared on our Products or services by someone who uses them

- Are mentioned in posts or captions that someone else shares on our Products and services

--- end quote ---

See the words "Meta AI" and "models powering it"?

Meta couldn't give crap about simpler clear-cut cases like "don't track users across the internet", much less this.

KaiserPro · 2024-09-19T10:00:55 1726740055

> I even provided the links that has screenshots of their opt-out form.

and I am asking you to think like a lawyer.

The reason they are doing this is because they want to access user data. They cannot yet.

As I stated in the post, FAIR can't process user data, as a large part of their infra doesn't support it.

If the rest of the AI team want to process the shit people enter into it, they need to get explicit legal review to do so. This warning/ToC change is the direct result of that.

bear in mind that the FCC audits the place every year, so if they see that the lawyers have gone "nope do use that data until we have implied permission" and then the audit turns up that they've just ignored the lawyers, its going to cost literal billions.

> We may still process information about you

Can you outline how might someone reliably and accurately detect your face in a photo taken by a tourist in a public place?

Again, that's lawyer for covering arses.

troupo · 2024-09-19T13:04:35 1726751075

> and I am asking you to think like a lawyer.

Given that Facebook explicitly said they are going to use user data for training if their AIs and given that Facebook explicitly designed the opt-out form as cumbersome as possible while at the same time saying they will not even honor it if it suits them... they've already talked to their lawyers.

> Can you outline how might someone reliably and accurately detect your face in a photo taken by a tourist in a public place

If a friend of mine didn't go through the consent form and posts a picture of me, Facebook will use that for their AI

If a friend of mine didn't go through the consent form and posts information about me, Facebook will use that for their AI

> Again, that's lawyer for covering arses.

Lawyers explicitly covering their asses would not even allow opt in by default, and statements like "we're still using your data even if you opt out".

KaiserPro · 2024-09-19T14:03:27 1726754607

So here in GDPR land, there is the concept of reasonableness.

Facebook are explicitly not allowed to farm for PII, so unless they have explicit consent, they can't scan for faces to reject people who have opted out. Plus, how do you hold a descriptor for someone who's opted out, because you're not allowed to hold any data on them?

Therefore its unreasonable for them to guarantee them they will never process your data when submitted by a third party.

You seem to be taking my comments as a pro-meta stance. Its very much not.

If you can design a way to stop people's data being processed when uploaded by a third party, I want in on that. lets start a movement to get it working.

troupo · 2024-09-19T15:40:10 1726760410

> Facebook are explicitly not allowed to farm for PII, so unless they have explicit consent

And for the past 8 years Facebook has been fighting this in courts and kept collecting every scrap of data they could get their hands on

> Therefore its unreasonable for them to guarantee them they will never process your data when submitted by a third party.

No, it's not unreasonable. They wouldn't be in this situation if they didn't opt everyone by default in the first place. They very explicitly looked at your "not allowed to farm for PII", said "fuck it, we have deep pockets and lawyers" and literally opted every one into PII farming.

> If you can design a way to stop people's data being processed when uploaded by a third party, I want in on that.

Don't farm it in the first place. Don't opt-in everyone by default into your PII farming machine.

It's also funny how you went from "AI people have no access to user data" to "oh they cannot guarantee user data for people who explicitly opted-out won't end up in the models"

KaiserPro · 2024-09-19T17:50:05 1726768205

> It's also funny how you went from "AI people have no access to user data"

No, I said they didn't _yet_. hence this thing.

> Don't farm it in the first place.

how do you host photos?

VeejayRampay · 2024-09-18T20:07:15 1726690035

people are just way too invested in the OpenAI hype and they don't want people threatening that in any way

fullshark · 2024-09-19T00:01:55 1726704115

Nah they are just calling out shameless corporate bullshit. OpenAI were bullshitters too re: promoting open source, but they are not even pretending anymore and once they change their governance structure the idea that they were ever promoting open source will be merely an amusing memory.

abetusk · 2024-09-18T19:00:25 1726686025

This is the modern form of embrace, extend and extinguish. "Embrace" open source, "extend" the definition to make it non open/libre and finally extinguish the competition by shoring up the drawbridge to the moat they've just built.

uptownfunk · 2024-09-18T21:15:00 1726694100

It’s marketing to get the best researchers. The researchers want the meta pay and they want to hedge their careers to continue to publish. That’s the real game, it’s a war for talent. Everything else is just secondary effects.

_yid9 · 2024-09-18T17:54:43 1726682083

The future of everything you depend on is open source and decentralized.

Because all indications are that the powers over you cannot abide your freedoms of association, communication and commerce.

So, if it’s something your family needs to survive - it has better be distributed and cryptographically secured against interference.

This includes interference in the training dataset of whatever AIs you use; this has become a potent influence on the formation of beliefs, and thus extremely valuable.

caeril · 2024-09-18T20:17:53 1726690673

It's not the training dataset.

All of these models, including the "open" ones, have been RLHF'ed by teams of politically-motivated people to be "safe" after initial foundation training.

_yid9 · 2024-09-18T20:30:08 1726691408

And I’m not even remotely interested in the “corrections” supplied by some group of right-thinking meddlers!

This corruption must be disclosed as assiduously as the base dataset, if not more so.

_yid9 · 2024-09-18T20:44:31 1726692271

Or, at least package them up as "personnas" and give them an appropriate name, eg. "Church Lady", "Jr. Marxist Barista", "Undergrad Philosophy Major", ...

Actually, those seem like an apt composite description of the PoV of the typical mass-market AI... 8/

Der_Einzige · 2024-09-18T22:00:24 1726696824

Not mistrals. Mistral large is willing to tell me how to genocide minorities or NSFW without any kind of orthogonalization or fine tuning. Please actually try models instead of pontificating without evidence.

Try it for yourself: https://huggingface.co/mistralai/Mistral-Large-Instruct-2407

_yid9 · 2024-09-18T22:40:42 1726699242

I wasn’t aware that there was any publicly accessible interface to the Mistrals (or any other) models without training-wheels!

alecco · 2024-09-18T18:09:55 1726682995

* pre-trained models

* does not apply to training data

lccerina · 2024-09-19T08:37:02 1726735022

"The Future" in the meantime we will keep doing our stuff, building walled gardens of AI generated spam and slop, and claiming our AI models are open source when they are not. The faster Meta dies, the better it would be.

stonethrowaway · 2024-09-18T22:57:15 1726700235

I’ll link to my comment here from approx. 52 days ago: https://news.ycombinator.com/item?id=41090142

This is chess pieces being moved around the board at the moment.

dzonga · 2024-09-18T19:05:11 1726686311

the reason - i'm a little bearish on AI is due to its cost. small companies won't innovate on models if they don't have billions to burn to train the models.

yet when you look back at history, things that were revolutionary, it was due to low cost of production. web, bicycles, cars, steam engine cars etc.

rafaelmn · 2024-09-18T19:10:02 1726686602

> yet when you look back at history, things that were revolutionary, it was due to low cost of production.

Nuclear everything, rockets/satellites, tons of revolutionary things that are very expensive to produce and develop.

Also software scales differently.

miguelaeh · 2024-09-18T20:17:23 1726690643

The first cars, networks, and many other things were not unexpensive. They became so with time and growing adoption.

Cost of compute will continue decreasing and we will reach that point where it is feasible to have AI everywhere. I think with this particular technology we have already reached a no return point

Manuel_D · 2024-09-18T23:16:54 1726701414

I suspect that models will become smaller, getting pruned to focus on relevant tasks. Someone using an LLM to power tech support chat doesn't want, nor need, the ability to generate random short stories. In this sense, AI is akin to cars prior to assembly line manufacturing: expensive and bespoke machines, with their full potential tapped when they're later made in a more efficient manner.

farco12 · 2024-09-18T22:37:09 1726699029

I could see the cost of licensing data to train models increasing significantly, but the cost of compute for training models is only going to drop on a $/PFLOP basis.

zwijnsberg · 2024-09-18T19:44:56 1726688696

yet if the weights are made public, smaller companies can leverage these pretrained models can't they?

menacingly · 2024-09-18T19:16:56 1726687016

Decentralized on centralized hardware?

latchkey · 2024-09-18T19:22:27 1726687347

Evidence is showing that AMD MI300x are proving to be a strong contender.

YetAnotherNick · 2024-09-22T18:19:43 1727029183

> AMD MI300x

Which costs significantly more than H100 at least when renting[1]. Also the price of hardware isn't significantly lower.

Also, both AMD and Nvidia have been deliberately stopping progress in cheaper consumer graphics card by not increasing VRAM and removing things like fast interconnect.

[1]: https://getdeploying.com/runpod

latchkey · 2024-09-22T18:28:25 1727029705

Your source is a paid advertisement[0] that only lists providers who pay the tax. I'm not really interested in that sort of game. We've done the research and our MI300x pricing is competitive with H100's and even more so if you consider the amount of vram in a MI300x.

I am also not sure if it is deliberate or just realizing that running this stuff is error prone and requires a lot of capex, power and infrastructure. It is difficult to support that at the consumer level, so why bother when enterprises are now offering super computers for rent. It is not their wheelhouse, so I can see why they do not want to take on the extra risk.

[0] https://getdeploying.com/plans

amzans · 2024-09-23T08:32:33 1727080353

Hey, I'm the guy behind GetDeploying. To be clear:

This is a side project and the vast majority of companies were added by me without being paid for it. I now charge to get listed or add a banner because the site takes too much time for me to maintain.

And totally agree, AMD GPUs are not covered enough. Happy to list your company at no cost to help me fix that. Feel free to email me if interested.

latchkey · 2024-09-23T17:51:25 1727113885

Hey, thanks for the response. I'm currently not interested in being listed on a paid site like this, even if the offer is free. I don't think that is what the community needs.

What good is any of this other than driving clicks for your benefit? If I'm going to get any traffic from your site, it is all going to be driven by people just searching or quoting comparisons, not actual sales.

For example, right now, you list another MI300x provider. Right at the top of the page you parrot their bogus claims about 20k GPUs by 2024. They don't have pricing, it is just "contact us". "Based on our records, XXX has at least 2 data center locations around the world"... yet it lists both of them in the US, not "around the world". I could go on and on, but what I know is that I don't want to be associated with something like this.

Sorry for the truth bomb, but if it is taking too much time for you to maintain, you should shut it down or find someone else willing to maintain it properly. Having incomplete and bogus data isn't helpful for anyone.

amzans · 2024-09-24T22:59:23 1727218763

Thanks for the feedback. There's certainly room for improvement.

latchkey · 2024-09-25T01:02:46 1727226166

I saw your original reply, good editing.

YetAnotherNick · 2024-09-22T18:40:39 1727030439

> We've done the research and our MI300x pricing is competitive

More info would be appreciated. Because I tried finding the pricing for all the providers and they aren't similar. In my research, in almost all the cases, 2*A100 is superior than both H100 and MI300x in VRAM, performance and pricing if the usecase supports multi GPU.

latchkey · 2024-09-22T19:41:20 1727034080

If the A100 pricing you've found works better for your use case, then go for it. I'm not here to convince you into something you don't need or want.

Please bear with me though, I would like to take this opportunity to explain a bit how this industry works cause I feel like there is a lot of justified confusion.

You'll find that there is no public pricing because it is usecase dependent. Everyone needs something unique and per/gpu/hr pricing doesn't really quantify the entire hardware stack. Inference doesn't need machines with 8x400G networking. One person needs a week, others need multiple years. Some people want CFD, others want HFT. Frankly, there is also a supply/demand aspect... not many companies offer or have MI300x for rent and we've taken on that capex risk for you.

That said, I can speak about what we are doing and where we are going that aligns with our overall transparency. We've got base weekly pricing now in public (which is competitive to H100's) and we're working on publishing a set of public % discount tiers that should cover longer term rentals. Eventually, we plan to offer inference specific hardware, for even lower prices, since it has different requirements that do not cost as much. We're also going to be offering an hourly docker experience soon too.

At the end of the day though, we're not trying to be the cheapest. We will let others fight that race to zero. We're trying to be the best in our own niche. That happens by picking the best data centers, best hardware vendors, professional next business day support contracts with Dell, and white glove customer support. This sets us apart and above the rest.

Those are areas that the capex moat, is very difficult to compete with. You'll try the cheapest route first and realize that when you see things overheating or failing and taking forever to resolve, you will wish you had come to us. The idea is that we've spent quite a bit more to de-risk your business, as well as ours.

candiddevmike · 2024-09-18T18:19:50 1726683590

Stop releasing your models under a non FOSS license.

nis0s · 2024-09-18T18:15:41 1726683341

I like the idea of this! But is there any reason to be concerned about walled gardens in this case, like how Apple does with its iOS ecosystem? For example, what if access to model weights could be revoked.

There is a lot of interest in regulating open source AI, but many sources of criticism miss the point that open source AI helps democratize access to technologies. It worries me that Meta is proposing an open source and decentralized future because how does that serve their company? Or is there some hope of creating a captive audience? I hate to be a pessimist or cynic, but just wondering out loud, haha. I am happy to be proven wrong.

croes · 2024-09-18T22:24:02 1726698242

Also Meta: The future is VR

Refusing23 · 2024-09-19T06:13:42 1726726422

'AI is so expensive, we'd rather have it handled with communism!'

If that means it'll be free/cheaper... sure

exabrial · 2024-09-18T18:13:16 1726683196

only when it's financially convenient for them...

dkga · 2024-09-18T19:08:54 1726686534

Well, yes. They are a company, with shareholders and all. So while not breaching any law, they should indeed pursue strategies that they think would be profitable.

And for all the negativity seen in many of the comments here I think it’s actually quite remarkable that they make model checkpoints available freely. It’s an externality, but a positive one. Not quite there yet in terms of the ideal - which is definitely open source - and surely with an abuse of language, which I also note. But overall, the best that is achievable now I think.

The true question we should be tackling is, is there an incentive-compatible way to develop foundation models in a truly open source way? How to promote these conditions, if they do exist?

CatWChainsaw · 2024-09-18T19:14:41 1726686881

Facebook promised to connect the world in a happy circle of friendship and instead causes election integrity controversies, bizarre conspiracy theories about pandemics and immigrants to go viral, and massive increases in teen suicide. Not sure why anyone would trust them with their promises of decentralized-AI and roses.

atq2119 · 2024-09-18T20:10:23 1726690223

Good. Now compare to OpenAI. Clearly what Meta is doing is better than OpenAI from the perspective of freedom and decentralization.

CatWChainsaw · 2024-09-18T20:16:40 1726690600

Cool. What Meta is doing is better than cigarettes from the perspective of addiction. If Meta is the best we have, then we'd better create something better, or prepare for the inevitable enshittification.

jazzyjackson · 2024-09-18T21:16:24 1726694184

I love it. TOBACCO PRODUCTION MUST BE DECENTRLIZED.

CatWChainsaw · 2024-09-19T21:39:30 1726781970

Good old tobacco farms.

mrkramer · 2024-09-18T20:09:30 1726690170

Yea, I believe you Zuck, it's not like Facebook is closed centralized privacy breaking walled garden.

pie420 · 2024-09-18T20:11:29 1726690289

IBM Social Media Head: "The Future of Social Media is OPEN SOURCE and DECENTRALIZED"

This must be a sign that Meta is not confident in their AI offerings.

troupo · 2024-09-18T19:02:00 1726686120

I've had as a comment to a comment, but I'll repost it at the top level:

They use "open source" to whitewash their image.

Now ask yourself a question: where does Meta's data come from? Perhaps from their users' data? And they opted everyone in by default. And made the opt-out process as cumbersome as possible: https://threadreaderapp.com/thread/1794863603964891567.html And now complain that the EU is preventing them from "collecting rich cultural context" or something https://x.com/nickclegg/status/1834594456689066225

rkou · 2024-09-18T19:29:34 1726687774

Also known as https://en.wikipedia.org/wiki/Openwashing

> In 2012, Red Hat Inc. accused VMWare Inc. and Microsoft Corp. of openwashing in relation to their cloud products.[6] Red Hat claimed that VMWare and Microsoft were marketing their cloud products as open source, despite charging fees per machine using the cloud products.

Other companies are way more careful using "open source" in relation to their AI models. Meta now practically owns the term "Open Source AI" for whatever they take it to mean, might as well call it Meta AI and be done with it: https://opensource.org/blog/metas-llama-2-license-is-not-ope...