A bit off topic but, the power of GPT (and DL in general) is in the data. Yet, w...

belltaco · on Jan 18, 2023

I am not getting the angle here. Anyone, including you, can write GPT-like code, train the model with public data and release it for free. It may cost a few million to train in GPU costs, but if what you say is that important, surely there are folks here (I am assuming a good chunk of HN folks have a decent amount of disposable income) who will donate it for the public good? If there are not, then they either don't consider it important or are just virtue signaling. Or the effort is actually hard to implement.

I am totally okay with OpenAI being worth $30 billion or whatever when compared to crypto scams being worth billions.

Art9681 · on Jan 19, 2023

I'm interested in solving the problems you mention in this space. For the sake of simplicity, I will also agree that the data and the models are free if you know how and where to look. The problem is: what then? Who has the money and/or compute capacity to do the work at a scale that can compete with the industry behemoths?

I've been slowly building out a home lab to test mesh computing in this space. Perhaps there is a way to carve the workloads into chunks that can be deployed to a distributed mesh of trusted nodes that have hardware specs suitable for the task, then somehow aggregate the results and distribute the entire package back to the network of contributors of that compute capacity. In other words, I will agree to lend you my compute capacity in exchange for a copy of the model you are training. I'd love to collaborate with folks and grow this idea and get a legit open source project going.
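To make the chunk-and-aggregate idea concrete, here is a minimal sketch that simulates it on a single machine. It is only an illustration under loud assumptions: it swaps in plain data-parallel gradient averaging on a toy one-parameter linear model, the "nodes" are just list slices, and every function name (`split_into_chunks`, `local_gradient`, `aggregate`, `train`) is hypothetical. Networking, node trust, and fault tolerance are not shown.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair; toy data, not a real training set


def split_into_chunks(data: List, n_nodes: int) -> List[List]:
    """Carve the dataset into one chunk per node (round-robin assignment)."""
    return [data[i::n_nodes] for i in range(n_nodes)]


def local_gradient(w: float, chunk: List[Sample]) -> float:
    """Work done by one node: gradient of mean squared error for y ~ w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in chunk) / len(chunk)


def aggregate(grads: List[float], sizes: List[int]) -> float:
    """Combine per-node gradients, weighting each by its chunk size."""
    total = sum(sizes)
    return sum(g * s for g, s in zip(grads, sizes)) / total


def train(data: List[Sample], n_nodes: int = 4, lr: float = 0.01, steps: int = 200) -> float:
    """Simulated training loop; assumes every node receives at least one sample."""
    chunks = split_into_chunks(data, n_nodes)
    sizes = [len(c) for c in chunks]
    w = 0.0
    for _ in range(steps):
        # In a real mesh, each of these calls would run on a separate machine.
        grads = [local_gradient(w, c) for c in chunks]
        w -= lr * aggregate(grads, sizes)
    # The trained parameter is what would be sent back to every contributor.
    return w


if __name__ == "__main__":
    toy_data = [(float(x), 3.0 * x) for x in range(1, 9)]  # generated from y = 3x
    print(train(toy_data))
```

Weighting by chunk size makes the aggregated gradient match what a single machine would compute on the full dataset, so in this toy setup the split changes where the work happens but not the result.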

Let's build the "Constellation". If anyone wants to geek out and make this happen, I'd love to chat. art.aquino at compute dot tech

Building compute clusters and cool software is a passion of mine, so I'm looking to build a network of like-minded folks, without any commitment, just to help each other.

sandkoan · on Jan 19, 2023

For a distributed computing/BitTorrent-style method of running these LLMs, see: https://github.com/bigscience-workshop/petals.

bamboozled · on Jan 18, 2023

How can we really stop this though?

I feel like Microsoft is doing what Microsoft does yet again. They piggybacked on the whole "open source" angle but this time, rather than be "software vendor lock-in closed source assholes", they're now being "IP theft open source" assholes.

sharemywin · on Jan 18, 2023

It's not just Microsoft. Google probably has models at least on par with OpenAI.

Facebook probably too.

pjmorris · on Jan 18, 2023

> I don’t know where we took the wrong turn within the past decade but we desperately need to correct this mistake.

I don't know if these ideas are worse technically or politically, or both, but what comes to mind are these alternatives:

1. Have someone start a non-profit that curates public data goods, maybe gaining access to data through voluntary donations and through buying all the data provider feeds, and funding through subscriptions by people and organizations who want to understand the data provider infrastructure.

2. Get legislation passed that identifies public data goods and requires that they be made available to all.

fithisux · on Jan 18, 2023

Totally agree. I am also doing some Gcloud training. This should also be a public good. When I move my machines to Google/AWS/..., my business changes ownership and I become a slave chained by the "competition" and "pricing".

To fix this we need to change mentality and governments.

But people are satisfied with their:

"Mirror, Mirror on the Wall, Who's the Coolest of Them All? ... You, because you are indistinguishable from the others!!!!"