GPT-4 details leaked? (threadreaderapp.com)
661 points by bx376 on July 11, 2023 | 621 comments




Previously posted about here: https://news.ycombinator.com/item?id=36671588 and here: https://news.ycombinator.com/item?id=36674905

With the original source being: https://www.semianalysis.com/p/gpt-4-architecture-infrastruc...

The twitter guy seems to just be paraphrasing the actual blog post? That's presumably why the tweets are now deleted.

---

The fact that they're using MoE was news to me and very interesting. I'd love to know more details about how they got that to work. Variations in that implementation would explain the fluctuations in the quality of output that people have observed.

I'm still waiting for the release of their vision model, which is mentioned here but which we still know little about, aside from a few demos a few months ago.


I had to ask GPT what MoE means:

"MoE" in the context of artificial intelligence typically stands for "Mixture of Experts". This is a machine learning technique that is based on the idea of dividing a problem into sub-problems, solving each sub-problem with a specialized "expert" (or model), and then combining their outputs.


Yep they (would) basically have 8-16 "experts" that are each about the size of GPT-3. Since they each see different batches of the dataset, they learn to model those distributions independently rather than the distribution of the whole dataset. Some of the attention is shared between them however.

Then another "routing model" decides which model is most suitable for the given user prompt.

Given they use relatively few experts, each one is likely similarly capable to the others on many tasks. I assume this makes deployment easier and is a "more conservative", less risky approach. Even if the wrong model is chosen by the router, answers should still tend to be somewhat acceptable, for instance.


This is not how mixture of experts works at all. The experts are chosen on each layer, not for the whole network, and attention is shared between all of them.
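
For anyone who wants the mechanics, below is a minimal sketch of how a token-level MoE feed-forward block typically works (top-k routing per token, with the surrounding attention shared by all experts). The sizes, names, and top_k choice are illustrative assumptions, not GPT-4's actual configuration.

    # Minimal sketch of a token-level mixture-of-experts feed-forward layer
    # (sparsely-gated MoE in the spirit of Shazeer et al., 2017).
    # All sizes and the top_k choice are illustrative, not GPT-4's real config.
    import torch
    import torch.nn as nn

    class MoEFeedForward(nn.Module):
        def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, n_experts)  # tiny gating network
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):                             # x: (n_tokens, d_model)
            gate = self.router(x).softmax(dim=-1)         # (n_tokens, n_experts)
            weights, chosen = gate.topk(self.top_k, dim=-1)
            weights = weights / weights.sum(-1, keepdim=True)
            out = torch.zeros_like(x)
            for slot in range(self.top_k):                # each token goes to its top-k experts
                for e, expert in enumerate(self.experts):
                    mask = chosen[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out  # the attention layers around this block are shared by every expert

In a real MoE transformer this block replaces the dense feed-forward sublayer in (some of) the layers, so the choice of experts happens per token, per layer, rather than once per prompt.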


Oh I’m happy to admit if I’m wrong in the details. My bad.

So you’re saying the experts chosen are a more literal mixture of layers from each model? Rather than a simple “pick which model to run”?


That's interesting, because that's more or less one level above the multi-head attention.


Please don’t post assumptions while making them look like you know 100% what you are talking about…


Source?


Just theorizing from the top-level post here. No clue if it's legitimate.


To be clear, you just made up MoE details while MoE is actually well established and hails from decades old research?


This is common behavior for inference-based learners who don’t hail from strong academic backgrounds. Many developers who are self taught utilize a similar method of learning, essentially using pattern recognition to make “educated guesses” that are then internalized as potential facts and tested at the earliest opportunity. In this instance the test was to project the incorrect information out onto a public forum containing experts, using the presence or absence of contradiction as weak evidence about the validity of their newfound knowledge.

Yes, this is done in lieu of actually looking up extended details on what something means. It has its advantages though.


Are you saying that this was a form of Socratic questioning- intentionally presenting an incorrect statement in order to obtain the correction?


A bit like that, but without prior knowledge of whether the information is incorrect, rather an intuition that it is correct.


Sorry but that is just ludicrous. You do not answer a factual question with a mostly made-up answer without saying clearly "this is pure speculation" at some point, preferably early on.


When I was younger (hah, I'm only 26 now) I sure did do exactly what you say. If your statement is intended to say that "people should not", then you're absolutely correct; however, it is a learned behavior that some people must adopt after being corrected by their peers.


> It has its advantages though.

Seems the advantage is somewhat localized to the individual inference-based learner; it doesn't seem like a pro-social strategy which would optimize benefit to the group. Overall this seems like it would generalize to widespread misinformation if the majority of users adopted this behavior.

I'm guessing it's in the best interests of the wider group to try to minimize the occurrence of this type of participation.


The advantages in a social setting lie in the introduction of entropy, that is _creativity_, to a community. In a rigorous academic setting and with proper training these individuals are more likely to identify links between ideas or information that may not seem obvious at first, and tend to be your more 'eccentric' academics.

For the interests of the wider group, the best outcome is to help these individuals refine their communication to make it clear when information they present is unsubstantiated inference as opposed to verified knowledge.

Once the proper 'rules of engagement' are outlined, the contributions of these individuals are an oftentimes useful 'ingredient' in the success of many enterprises.


Given that this is an online forum, another advantage is that a conversational trail is left for others to discover. The inferences these types of individuals make are often based on a structure of knowledge and reality that others share, so the most common preconceived and incorrect notions tend to have the most documentation on how to ameliorate the incorrectness (given that these individuals are allowed to state their inferences out loud).


This has got to be the best thread I’ve ever (inadvertently) started.


It's a good thread, particularly Exuma's comment, but a memento mori from the root node:

My one word "Source?" meant: "You are adding a new episode of fictional info to a discussion about fictional info."

Even in that fictional world the post was wrong. Even if it was right in some allegorical sense, the simplistic allegory adds nothing. "Mixture of experts is like a group of experts where you pick the right expert to ask a question" isn't some hard-won cross-domain self-taught knowledge. It's something a bright 6th grader would pull off.

The self-taught feeling-stuff-out stuff matters when you're making useful connections that get practical results.

When you're just wiring stuff together online, and the wiring together is meaningless, you're doing nothing and taking the consequences of the negative signals.


I have quite deeply enjoyed this thread myself. Thanks :)


You have clarified something I have always thought about very intensely and deeply, but I haven’t really ever read anyone else who understands it so well, or rather puts it into words so clearly.

I’m an inference-based learner to an extreme and it definitely has many upsides and also downsides. The upsides are being able to learn extremely rapidly by making connections between pieces of information where there are gaps, and then using a sort of heuristic detection, like a compass, to feel out which gaps need to be filled in most. Then I follow that trail down, regardless of how hard or complex it is, just to the point where it accomplishes what I need (whether it is statistics, machine learning, transaction isolation that I've learned for the 50th time...). Another upside is significant abstract thinking ability; sometimes it feels like looking at a maze from overhead.

I’ve built over 100 projects across close to 30,000 hours of programming over like 15 years.

The downside is that when I’m around strong people of the other type, I sometimes get the sense they don’t respect this style of learning. It comes through in their words, tone, and subtle body language cues.

My friend, who is very very much the other type, with a PhD in something very hard I don’t remember … algorithms and data structures or something, said it’s because I don’t value domain knowledge. He said if you spent an entire life building, say, a database, you would not consider that a life worth living. I laughed cause that makes me sound like an asshole, but the more I thought about it the more it’s clear that I actually agree somehow. As if information on its own as a means to an end is not fulfilling to me. To me it feels like efficiency, creativity and moving from A to B very rapidly while hierarchically organizing a massive amount of chaotic information is ingrained in my DNA, but simply getting the correct, deepest domain knowledge possible is not appealing to me at all. I sort of will go to the depths that are needed, then go elsewhere.

I’m VERY thankful for those people though, as that’s where a lot, if not most, of progress is made.

It has been an internal paradox for most of my life where I can’t figure out if I’m smart or stupid. I have built companies completely on my own on the tech side, where one made over 10m and another made over 200m revenue. I’ve been told I built some things entire teams were not capable of doing on another project... This gives me signals that I’m smart. Then there are other things, like getting an F on this hard programming interview from my first employer, who is a genius-level, Harvard-graduate, academic, domain-knowledge style of person. It made me feel completely idiotic. There are many other times and situations where I often think I’m wired weird, where it “feels” like I’m stupid.

Over the years I’ve accepted this paradox, but it wasn’t until this domain knowledge piece, or the creativity aspect, that I finally just accepted it as OK and not something wrong with me.

The hardest part is VERY often being misunderstood. So much so that I often have to expend an exhausting amount of time when working with new teams to explain “how I think”, because like a fortune teller I can always predict what will be misperceived, and even when I say it up front it usually happens anyway. This is why trust is paramount with my business partners. They know I’m extremely eccentric, so to speak, but they “trust the process” when I lock myself in a room for 30 days and come out with an amazing piece of tech that was built purely on raw intuition.

The other part that often made me feel stupid is that, despite its upsides, this way of thinking is often exhausting because I don’t usually rely on past experiences to make decisions. Each situation is different. So even if I’ve done something new 30 times, I will feel this “stepping into the unknown” feeling, which takes great willpower and courage to face repeatedly, especially when other people are relying on it.

Using this method, though, I’ve also built cool things; one recent example, a platform I built, is the best-converting one in the entire industry. On that project there is a massive team on the other side, but the platform itself was built by me alone, without much starting info to go off of other than a few multi-hour brain-dump calls. One thing my PhD friend I mentioned pointed out is that Feynman was a creative learner, I think. It helped me feel better that it’s not a “wrong” or stupid way of thinking if other people that high up might share similar ways of thinking more creatively. Of course it’s not exactly everything I describe to a T, I’m not saying that, but threads of it.

Hopefully none of what I wrote sounds insulting or arrogant to anyone. I fully acknowledge that domain knowledge is what moves the world forward in many ways.


I have established multiple companies, some of which have grown significantly with over 600 employees. For quite some time, I've transitioned from development and mainly held executive roles such as CEO, Chairman, etc. Simultaneously, it's intriguing to note that I've mostly been unsuccessful in securing 'normal' jobs through interviews in the past (Google, McKinsey, Bain, Accenture etc).

I believe this poses a fascinating topic on the way people assess creativity and intelligence in general.

From my perspective, the crux of the issue lies in the inherent difficulty of accurately measuring creativity in comparison to quick problem-solving skills during job interviews. Consequently, it seems that corporations tend to favor the latter.


@Exuma, this comment is ridiculously resonant with me, the part about 'learning transaction isolation for the 50th time' is very on point too.

Everything you said I pretty much feel the same way. I've accepted it as part of how I work, and the advantages are many (and valued by many) - but yes, interacting with deep experts usually ends with feeling a bit like a fraud. I feel like I maybe was an expert at whatever the thing is at some point in time, momentarily, but then I just shed the information as soon as the next thing needs to be done, and it just ends up as part of the background inference pattern matcher.

Certain things where I'm really forced to learn something deeply do stick, but I find my ways of thinking about that domain to be very different to most 'true' experts, and rely heavily on visual models and analogies with other concepts.


haha yes! What you mentioned about analogies... I must use like 50 analogies a day. I also noticed I can use phrases like "always" and "never", and I can say them without a second of hesitation, because they are merely indications of magnitude in a predictive sense, not a literal interpretation. But someone who must understand information deeply never uses phrases like that, because they operate based on observed knowledge and a sort of "hypothesis testing", like a scientist.

It's fun to realize other people are out there who can relate. Thanks for your comment


Thanks everyone for this thread :)


This was very well said and I don't know if you could have said it any better. FWIW, I'm a person with multiple degrees in CS, but the best programmers I've worked with, and who get stuff done, have zero degrees. I have eight years of hardcore programming experience, including professional and side-project stuff - I've learned more actually doing than in any classroom. Yeah it's cool to know what a bubble sort is and how it compares to a merge sort, but knowing all the fine details isn't really needed for actually building things, especially now that we're at the point where an AI can give you the code along with complete instruction.

It sounds like you've done completely fine for yourself and built things that people want, so I would try not to be too hard on yourself.


Thank you, I really appreciate it.


We are generalists, as opposed to specialists. Generalists use information from experts across multiple domains. Specialists are the experts who build a particular domain.

As a culture, we look down on generalists - "A jack of all trades is master of none." However, a world full of specialists creates information silos, where experts solve the same problems over and over in isolation. This is where society is at the moment. We need more generalists to navigate these silos.


I really appreciate your response, thank you for sharing your perspective and experiences.

When I read your message I can't help but picture you as the storybook 'inventor' who is locked away inside his house, with strange colored smoke coming out of the chimney and weird noises heard from the street, yet when the doors open the whole town gathers to see what you made.


Haha, that would be me! In some ways you are right about the strangeness. I actually work lying down... in my bed. It allows my mind to completely dissolve into the code or problem, as if I'm weightless. My business partners all joke that "uhhh yes, the next stop on our tour, well.. this is our CTO's office but it's actually just a bed in there so we won't go in that room...." This gives me a good laugh every time.


I don’t appreciate the many bad-faith assumptions made, in particular the assertion that I’m not from a “strong academic background”. For what it’s worth, I made a mistake in a public forum and admitted to it twice. I’ve asked for accurate versions of my answer, which no one provided. Nevertheless I have continued to educate myself on the subject and only feel more confident that I wasn’t misinformed to the degree you all indicated.

You may not realize it, but not everyone on the internet is nefarious, and if you were to speak in this analytical way about, say, a classmate while they were within earshot, that person would likely be quite upset.


I take it all back. The comment I responded to appears to be both correct in assertion and implication.


Okay then. Best of luck with all of that. There’s a forum called LessWrong you would likely be interested in.


I'm not the biggest fan of LessWrong; I do, however, refer new developers who seem interested to the "Rationality: From AI to Zombies" sequences to help refine conscious development of rational thinking.

Also, I absolutely loved the fanfic "Harry Potter and the Methods of Rationality" written by the forum's creator.

** Edit **

And for the record, my comment wasn't intended to be an accurate depiction of you specifically, which honestly wasn't very effectively conveyed.

It was to highlight a common 'type of person' who makes authoritative statements on areas they're not 'experts' in, through no malice on their part, rather as a function of their default mode of behavior.

As others who replied to me highlighted, there are at least two of these people in the world and I wanted others to at least be aware of their existence and point of view; the end goal of this being they might offer others the benefit of the doubt and perhaps some constructive feedback instead of unproductive criticism in similar exchanges.

In essence, I thought other comments were being too hard on you and wanted to point out a potential scenario in which their critiques were at best unproductive.


> Yes, this is done in lieu of actually looking up extended details on what something means.

If they were capable of understanding the extended details, they would already have an academic background in the subject. Laymen aren't going to have a clue what MoE means even if they went to the trouble of digging up the paper.

> Many developers who are self taught utilize a similar method of learning, essentially using pattern recognition to make “educated guesses” that are then internalized as potential facts and tested at the earliest opportunity.

Using pattern recognition skills to make an educated guess that is internalized as a potential fact sounds an awful lot like what LLMs do. At least when humans do it, we bother with the verification step instead of just acting like we know what we're talking about.


> If they were capable of understanding the extended details, they would already have an academic background in the subject. Laymen aren't going to have a clue what MoE means even if they went to the trouble of digging up the paper.

This is, hopefully, an accidental thought experiment gone awry. "IF THEY WERE CAPABLE of understanding the extended details, they would already have an academic background in the subject" can and should == "I spent a ton of time in the library", and a follow-up apology for putting "capable" and "academic background" in the same sentence.

The whole friggin' point of this glorified LAN is that we can break down those dumb walled gardens and let kids learn from random BBS textfiles, MIT YT videos and the gathered wisdom of HN.

If you are going to just dismiss autodidacts, you're going to have to rewrite the complete history of higher education in Western society. I won't even begin to try and validate how wrong this is for Eastern history as well.


> At least when humans do it, we bother with the verification step instead of just acting like we know what we're talking about

We do??


We ask Google (and now ChatGPT) if it's true. It goes round.


I believe it’s called ‘hallucination’


Can you detail what mistakes were made? I’m brushing up on it currently but having trouble grokking it.


I would have presented it as a question versus a statement, in that case.


[flagged]


If the answer is wrong, perhaps you could post a correction so that we are all better off, instead of just insulting me.

Honestly, I've had a fairly rough day, and your answer has made me a bit more upset than perhaps I should be. At least GPT doesn't act like a jerk when I ask it a stupid question.


I think the complaint most folks have for people posting ChatGPT responses is that it adds nothing 'human' to the conversation.

It's sort of like copying and pasting a wikipedia article into a comment, which in itself isn't necessarily wrong. The 'wikipedia' comment, however, does come off as an impersonal PSA that also happens to be citing a newfangled encyclopedia that makes an easy target for ire.

If you care to accommodate those who seemingly don't care to accommodate you, try phrasing your messages to connect in some way to the conversation, with a couple of sentences about how that information changed your perspective or helped you understand something. That way, rather than making an announcement containing the definition of a word or concept, you're participating in the discourse as an active participant.


You should consider asking ChatGPT for help in clarifying your point(s).


FYI, George Hotz has been claiming to know this aspect for a couple of weeks now.

> The fact that they're using MoE was news to me and very interesting.

Maybe adds some legitimacy to the claim.


Interestingly, Google was using ~2000 experts back in the first Transformer architecture (if I understand correctly) https://www.youtube.com/watch?v=9P_VAMyb-7k&t=6m42s [sparsely-gated mixture of experts layer]


Yeah, the Mixture of Experts might not have been called out by name, but it was pretty obvious you were getting different models depending on the question.

It goes to show how LLMs are nothing like AGI. I think combining it with a calculator is just a bandaid. A useful bandaid, but it's not going to be able to do science, ever.


Sparse architectures are a way to theoretically utilize only a small portion of a general model's parameters at any given time. All "experts" are trained on the exact same data. They're not experts in the way you seem to think they are, and they're certainly not wholly different models. The "experts" work at the token level. The expert chosen for one token could be different from the expert chosen for the very next.

GPT-4 isn't "nothing like AGI" any more than its dense equivalent would be.


I don't see how LLMs using many experts means it's very different from AGI. Why would anyone assume that human AGI isn't based on multiple models running in a similar architecture? At minimum, humans are operating with a left and right brain, which process data very differently.


The previous posts are to a twitter thread that's been taken down, and the preview of a post that requires a $1000 subscription. This post however is freely available (for now at least).


And the tweeter of the twitter thread paid the $1000, copied the useful info to twitter, and then did a credit card chargeback.


Seems he summarized it and didn't copy it


A summary isn't allowed under US copyright law. The copyright office calls them "condensations", and they are considered derivative works.

His use was likely not within US copyright law. "Effect of the use upon the potential market for or value of the copyrighted work" is one of four factors a judge should use to decide if fair use applies, and it is clear that publishing the main information from an article, information which is not available elsewhere, freely, severely degrades the market for the original.


Interesting on a meta point that the more clickbaity title "GPT-4 details leaked" won out over the more dispassionate but drier "GPT-4 Architecture, Infrastructure, Training Dataset, Costs".


Clickbait has its time and place. Despite my hatred towards it, sometimes it's really needed.


is it needed when you pay for the blogpost and then immediately charge back the card like this dude did? https://twitter.com/untitled01ipynb/status/16786550120150712...

what a colossal asshole


Why is it not okay to summarize? It's clearly transformative and not a copyright violation. Yes, he's an asshole, but he should be in the clear legally.


It is needed if you want people to click on your content more.


I think in this case it's a better title in terms of highlighting the relevant info. What's really interesting is the source - these are important details that actually came from within OpenAI. Because the latter title emphasizes the type of info but not where it came from, I'd probably assume that's a blog post with some industry expert speculating about those things.


When choosing titles for my own submissions, yeah, the accurate title that HN says they desire gets no votes whatsoever. Any clickbait on here, people bring upon themselves (and this isn't even a clickbait-level title)


I don't want accurate titles because they'll make me vote for it. I want accurate titles because it helps me determine if I'll read it BEFORE clicking it.

The whole point of accurate titles is that you'll get fewer votes on uninteresting content.


But if the title requires you to click, and then you find out it's uninteresting, why'd you upvote at that point? It shouldn't get your vote at all then, having wasted your time


To be fair the latter has the meat of it behind a paywall.


If this is true, then:

1. Training took 21 yottaflops. When was the last time you saw the yotta- prefix for anything?

2. The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The google memo was right about the lack of a moat.
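
For what it's worth, these claims are at least internally consistent with the H100 figure quoted further down the thread (~8,192 H100s for ~55 days at $2 per GPU-hour); a rough, purely illustrative check:

    # Back-of-the-envelope check, using only numbers quoted in the thread.
    flops_total = 21e24                         # "21 yottaflops" = 2.1e25 FLOPs
    gpus, days, dollars_per_gpu_hour = 8192, 55, 2.00
    gpu_hours = gpus * days * 24                # ~10.8 million GPU-hours
    print(f"cost ~ ${gpu_hours * dollars_per_gpu_hour / 1e6:.1f}M")        # ~$21.6M
    sustained = flops_total / (gpus * days * 86400)                        # ~5.4e14 FLOP/s per GPU
    print(f"requires ~{sustained / 1e12:.0f} TFLOP/s sustained per H100")  # roughly half of BF16 peak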


>> The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The google memo was right about the lack of a moat.

That really doesn't change anything at all. The cheaper training large models gets, the more easily large corporations are able to train larger models than everyone else.

Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.


At a certain point though, models become good enough for particular tasks. Once that happens for whatever my application is, I don't care if OpenAI has a model that's twice as good on some metric, because it's overkill for my use-case. I'm going to be happy using a smaller, cheaper model from a competitor.


I think we're far from that point though. For the vast majority of use cases, I always wish that the answers could be more accurate.

Sure - they might be 'good enough' to build a business on. But if a competitor builds their business on top of a more accurate model, their product will work better, and they will win the market.


Yea but the bench being discussed here is FOSS. Which for me, and many, translates to can i run something useful in my closet or on my phone. I've found LLaMA neat and yea, some FOSS models are getting decent - but they're a far cry from GPT4. I pay for GPT4, use it almost daily and that's my bench.

Yes, when i can run GPT4 in my closet, OpenAI will have GPT7 or w/e - but it doesn't change the fact that i have something useful running in my closed network and that opens up all kinds of data integration that i'm unwilling to ship to OpenAI. In that day i'll probably still use GPT7, but i'll _also_ have GPT4 running in my closet and integrating with a ton of things on my local network.


My guess is you'll be running GPT4 equivalent in your closet, but with a 4K context window.

Where the big guys will have GPT-who-cares-what-version with a 100K context window.

Context size is as much of a big deal as newer generations of models imo.


Am I right in my layman's understanding that context windows scaling up requires (mainly) much more compute at run time? Or do longer context models require different/longer training?


> their product will work better, and they will win the market

Like Betamax?


One important milestone is a model that is good enough to produce an acceptable quality of answer to x% of public users' questions without any data being sent to the megacorps.


> Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

I think a better frame is: if rice got so absolutely cheap to make that anybody could spin up a bag of rice on demand, anybody whose business model was based on selling rice sacks would be in trouble, especially if their specialty was selling rice in bulk instead of, e.g., mom-and-pop restaurants selling cooked rice with flavors and a focus on customer experience.

(Not sure the metaphor is a good fit for AI. Maybe OpenAI comes up with GPT-5 and makes something so powerful that by the time OSS projects get to GPT-4 level nobody cares. But if GPT-5 is only incrementally better than GPT-4, then yeah, they have no moat.)


Surely there are diminishing returns for the AI computing though? I mean, is a model with 10x the parameter count 10x better? I think it is still possible that the training costs will be irrelevant for all players at some point with this non-linear scale. Access to data is another story


10x the parameters? Maybe not in a single model, but maybe 10x the expert models has 10x the value. I'm sure there are diminishing returns eventually, but we're probably not close to that.


It's not clear. Scaling laws still seem to hold AFAICT.

Right now the bottleneck is "how big a model can you fit on an H100". It's possible that in a few years, when bigger cards come out and/or we get better at compressing models, we'll get even better models just by increasing the scale.


It's still SO early. We are in the "640K [of memory] ought to be enough for anybody" phase of LLMs. So much more to go.


> if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

And all that rice would be useless since you could only eat one cup a day.

The richest person in the world and someone who is solidly middle class both use the exact same iPhone. After a point more dollars doesn't necessarily mean better or more useful technology. If training "good enough" models becomes cheap enough to be achievable by small-time developers then OpenAI/Google/Anthropic etc. will definitely lose some of their edge in the space.


> Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

And?


And...

...the market for rice will totally collapse because it would cost more to transport it than the farmer would make by selling it. Feel free to substitute whatever commodity becomes "too cheap to meter" for "rice".

The "invisible hand" has a tendency to bitchslap people who don't have an even modest understanding of economic principles.


Training data quality and quantity is the bottleneck.

"Chinchilla showed that we need to be using 11× more data during training than that used for GPT-3 and similar models. This means that we need to source, clean, and filter to around 33TB of text data for a 1T-parameter model." https://lifearchitect.ai/chinchilla/

GPT4 has been trained on images exactly for this reason (it might not have been worth it separately from multi-modality, but together these two advantages seem decisive).


>Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

...and billions would be lifted out of poverty, and world hunger would be solved. The rice metaphor doesn't quite apply here.

If the price of GPU training continues to drop at the present rate, then it would be possible to train a GPT-4 level LLM on a $3000 card in 10 years. The ability to run inference on it would come way sooner.


The real moat is an abundance of high quality data.


Well, OpenAI raised eyebrows by crawling the internet and using everyone's data to make a commercial product.

One day some new startup will train on all of libgen and torrent networks, but it will be very hard to prove. You'll keep getting these escalations in questionable morality and legality, and even OpenAI will complain about playing fair.


ThePile already contains some content from a torrent, and there's a lawsuit alleging that Meta has committed copyright infringement by using it.

https://www.theverge.com/2023/7/9/23788741/sarah-silverman-o...


Many people train on libgen/torrent in the form of books3 (e.g. LLaMa does this).


Google Classroom - teenagers' essays, written by humans, for learning what it means to be human, and graded by humans - is a richer dataset than anything else I can think of that no one else could get their hands on.


An awful lot of teachers can grade a 10 page essay in about 90 seconds...

Skim read it, mark out some grammar errors, assign it a grade based on the quality of the opening and closing paragraphs.


Yup, and they're doing it the whole country over, and putting that data into Google Classroom for Bard to know "this is C-grade work" and "this is A-grade work". Knowing what's deemed good and bad writing is where I'm thinking this dataset shines for training LLMs.


Yeah, they have the internet from before LLMs were used for anything, so the data is not poisoned. Not unlike carbon dating becoming useless for estimating the age of anything made after atmospheric nuclear tests, or low-background steel.


You talk as if humans weren't perfectly capable of coming up with nonsense.

Blogs upon blogs full of worthless pap that is there for SEO reasons have existed for like a decade already.


And those blogs took a decade+ to make, and now in another year we'll make that much information again. Then it will be that much information in a month. Then that much pap in a day.

And in the past it was still a million people making that much crap. Now it's a single "entity" making that much crap with its own style and mistakes.


IMO the real moat right now is expertise / smart teams and cash.


The infrastructure/training libraries already exist. I'm sure you can get people who have worked at scale and can figure out how to glue things together.

Reddit, Twitter, etc. raising prices is going to make it more expensive.


If you are right, then it just becomes a question of who wants to throw the most cash in, like a giant game of poker where you don’t know the pot odds.


... stolen without regard for copyright and licensing.


Fair use!? /s


> The google memo was right about the lack of a moat.

5 months on, and nobody has yet beaten their result quality. I think there is a moat.

Also, I think for many use cases, smarter is better. If a few cents can buy a more accurate answer, then it is always worth paying those few cents. So, as long as more hardware and more data can train a bigger, better model, that is the moat.


The moat is there until someone releases (or leaks) comparable training data.


And that gets more difficult every day, as previously accessible sources of data turn off their APIs.

Though Google may have something up its sleeve with the corpus of Google Books! I have been wondering if OpenAI secretly pulled in Sci-Hub or Z-Library to neutralize that potential advantage.


Are there any stats on how many words are in google books, vs how many words are on the open web?

My feeling is that the web has a lot more on it than the total of all libraries - simply because anyone can start a blog, but publishing a book requires quite some commitment.


I think you're right but I also think the text in published books would be at least an order of magnitude more valuable than the same length of text from the web


> great news for open source.

Yes, and great news for shills, bad actors, agitators, trolls, foreign intel, and propagandists. I'm impressed by the tech but terrified because for once I cannot conceive of what this means for the future. My guess is that this kills the open web and laws get passed which bury it.


Everybody is self-soothing with the idea that OpenAI's (frankly, half hearted) push for regulation is just mundane regulatory capture and profit seeking, and not the fact that it will, at best, absolutely destroy everything about the internet and technology that we've come to love and know. Should a 4chan torrent show up like LLaMA, with weights and code for a base GPT4-level model, modern society is done. Golden age over.


From your perspective, how would modern society be "done" if GPT-4 was generally available? How would it be substantially different from LLaMA?


GPT-4 is far more capable than LLaMA. Just as one area of impact - captchas would become permanently ineffective. If you're experienced in developing captchas and everything they do for us, you know the implications of that alone lead to a very dystopian internet and world.

I like to answer a question with a question: if you sit and think about it, what unintentional misuses and intentional abuses can you think of? It helps to write down a list of known abilities, then think up several "what if..." negative utilities or implications for each, then iterate further to see second, third, and fourth order effects.


> If you're experienced in developing captchas and everything they do for us

What they have done, fairly overtly for a long time, is train AI to defeat captchas.

That this was self-limiting was somewhat obvious.


Hey, captchas also prevent disabled people from using the internet!


> The conspiracy theory that the new GPT-4 quality had been deteriorated might be simply because they are letting the oracle model accept lower probability sequences from the speculative decoding model.

In other words: the speculation was likely right, I'll propose a specific mechanism explaining it, but then still insult the people bringing it up and keep gaslighting them.
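
For readers who haven't seen speculative decoding: the core of it is an accept/reject test where a cheap draft model proposes tokens and the big "oracle" model verifies them. A minimal sketch of that step follows; the leniency knob is a hypothetical illustration of the quoted speculation about accepting lower-probability draft sequences, not a known OpenAI parameter.

    # Sketch of the accept/reject step in speculative decoding (Leviathan et al., 2023).
    # "leniency" is a hypothetical knob added to illustrate the quoted speculation;
    # it is not a documented OpenAI setting.
    import random

    def accept_draft_token(p_target: float, p_draft: float, leniency: float = 1.0) -> bool:
        """Accept the draft model's token with probability min(1, leniency * p_target / p_draft).

        leniency = 1.0 is the standard lossless scheme (the output distribution provably
        matches the target model's); leniency > 1.0 accepts more of the cheap draft model's
        choices, trading output quality for speed.
        """
        return random.random() < min(1.0, leniency * p_target / p_draft)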


Calling something a conspiracy theory is not an insult against anybody. It's a theory because it's unproven and it's a conspiracy because people think OpenAI purposely degraded their own service, hence conspiracy theory.


That's a motte-and-bailey defense. Yes, what you say is technically correct with respect to meaning of "conspiracy" and "theory" as individual words. But it's also completely false with respect to what "conspiracy theory" means in actual use - which is to group the subjects (here: people believing GPT-4 quality has been degrading over time, in spite of OpenAI strongly implying otherwise) in the same bucket as flat earthers, vaccine denialists, UFO believers, NWO fearmongers, etc.

Calling the belief "that the new GPT-4 quality had been deteriorated" a "conspiracy theory" goes beyond claiming the belief itself is wrong - it's also claiming that holding this belief implies significantly compromised reasoning skills. That is, it's just a drive-by insult.


This guy doesn't have any idea what he is talking about. He consistently posts such bullshit on twitter. Mostly copy paste with added spice mix.


I noted several things that don't seem consistent with what people have been assuming from before.

For instance - MoE yes, but 16 experts at 111B parameters? Doesn't make sense. GPT-3 had 175B parameters. I doubt they would go smaller on base models from now on. The number that makes more sense is ~220B parameters per model and 8 expert models. That is the same inference cost in total.

The 13T tokens of training data seems pulled from thin air.
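
A quick check of the parameter arithmetic above, using only the figures in dispute:

    # 16 experts at ~111B vs 8 experts at ~220B: nearly the same total parameter count.
    print(16 * 111e9 / 1e12, 8 * 220e9 / 1e12)   # ~1.78T vs ~1.76T
    # Per-token inference cost additionally depends on how many experts the
    # router activates for each token, which the quoted material doesn't settle.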


It's Twitter, why would you think otherwise?


I'm tired of people just saying 'oh, internet' every time there is a factual inaccuracy somewhere. Yes, we know this is a social media network. Now can we get back to discussing the topic at hand?


Google has been doing research into mixture of experts for scaling LLMs. Their GLaM model, published in 2022, has 1.2 trillion parameters and 64 experts.

https://icml.cc/media/icml-2022/Slides/17378.pdf


Google is laughably behind in terms of LLMs. They've done a pretty good job at incorporating vision and audio ML models into their ecosystem, but they underestimated language.


How do you know? Do you have insider knowledge of this, or is it just based on what they share publicly?


From what I can see, GPT-3 and 4 were a bit of a rugpull for the industry. Now we're all laughing at Google because seemingly they had their hands on the rug for nearly a decade and did nothing - but from the other perspective, maybe they saw the future OpenAI has now brought us and decided against being the pioneers.


Amazingly enough, I think this is a bit of it. Some powerful enough people at Google became concerned about implications, including around "hallucinations", "poisoning", etc., and decided to put this sort of research on something of a backburner - justified, in part, by a lack of some obvious easy interfacing of this with search (scaling, hallucinations, etc.).

Of course, the 'wonderful' thing about humans / "independent agents with survival drives in competitive game-theoretic type scenarios" is: if enough people / "agents" have access / opportunity, someone WILL "push the button".

It's just delicious ... the same kinds of patterns over and over - "oh, we should really do something about X / nobody should have power like X, ... but, there's no stopping it, ... oh well".

And, the "rules" really are subtly, many levels down, in place, to make it apparently impossible to not get trapped, one way or another.

(Anyway... [Cartman voice] Screw you guys, I'm going to my other planet...)


as a fun anecdote, Google Bard's implicit code execution update from *last month*, advertised by Sundar... no longer works https://twitter.com/swyx/status/1678495067663925248

i'd love to know whats going on in that team.


Probably safety-driven terror. They really really want to get their bots going, but in every single meeting some PM or other concerned engineer talks about safety and f**s up the entire meeting.

They even made the bot not respond to arithmetic questions because the bot is bad at them, lol. Someone who knows how to modify the bot actually spent their time on something as unimportant as that.


Bard being bad at anything else doesn't seem to stop it. It hallucinates at the drop of a hat. Asking it almost any question implying some nonexistent thing X exists causes it to make that thing up.


> i'd love to know whats going on in that team.

Seems like a good time to rewatch Silicon Valley and watch Hooli scramble to keep up.


Quite. Some aspects of emergent behaviour are exciting; others must be a painful learning experience.


Their translation service is based on LLMs and is commercially a successful product.


Transformer models rather than LLMs surely. ChatGPT behaves nothing like Google Translate.


The T in ChatGPT stands for Transformer. The similarity between the OG Transformer from 2017 and GPT-3 (and other modern LLMs) is pretty big.


The data, size and training process are what's different.


The point is that LLMs are built with the Transformer architecture. They're all transformers; attention is an integral part of building worthwhile, contextual answers.


Yeah, but LLMs are more specific, so it's a subset.

Also, LLMs use "in-context learning", i.e. you actually ask it "hey, translate this" and then it has a conversation with you where you can ask it to clarify or provide word definitions.

Google Translate is more hardcoded; a big problem with it is that it can't explain any of the decisions it's made or show its uncertainty about anything.

Of course, neither of these are reliable since they hallucinate.
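
To make the "in-context learning" point concrete, here is a small sketch using the OpenAI Python client as it looked in mid-2023; the model name and prompts are just illustrative.

    # Sketch of in-context translation with a follow-up clarification, using the
    # openai package's mid-2023 ChatCompletion interface. Model name and prompts
    # are illustrative.
    import openai

    messages = [{"role": "user", "content": "Translate to French: 'The cache is warm.'"}]
    first = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    messages.append(first["choices"][0]["message"])

    # Unlike a fixed translation system, you can clarify within the same context:
    messages.append({"role": "user",
                     "content": "Here 'cache' means a CPU cache, not a hiding place. Redo it."})
    second = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    print(second["choices"][0]["message"]["content"])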


I favor ChatGPT over Google Translate now. ChatGPT has the added benefit that it is focused on providing helpful answers, so if it comes across something almost untranslatable, it will be able to address that and provide an explanation in the style of a footnote.


Maybe, but they're relatively open with their research, which is great. They also made BERT and released it for free.


Hmm “Sam Altman won't tell you that GPT-4 has 220B parameters and is 16-way mixture model with 8 sets of weights” George Hotz said this in his recent interview with Lex Fridman. It looked like Lex knew this to be true by the way he reacted.


This is unsubstantiated. The only folks who know exactly how GPT-4 works are employed at OpenAI. The rest of us can only guess.


Even if I just went with Sam Altman's public comments, I would have come to a similar conclusion: GPT-4 is big and it is hard to make it faster.

The secret sauce and moat lie in the data, though. I have heard a rumour that they paid competitive coders to write and annotate code with information like complexity for them.


GPT4 can diagram sentences using link grammar parsing (https://www.link.cs.cmu.edu/link/) which is obscure enough I really don't think they've generated data for it. So it can get pretty good without that.


It's obvious they use data from GitHub and other places. I am talking about the extra 0.00..1% of very high quality data they (likely) created.


I've been wondering how freemium services like Thread Reader still operate now that Twitter is charging prohibitive prices for API access and taking measures to prevent scraping. The cheapest API plan with read access is $100/month, which reads 10,000 tweets, so could only produce about 500 pages like this one on demand.


There was a post on HN recently with a workaround these apps are using. I don't have it handy but I'm sure you can find it if you look.


There's probably some interesting bits of info in yesterday's Nitter thread: https://news.ycombinator.com/item?id=36665406


const puppeteer = require('puppeteer'); and so on and so forth.


For all the 'I know every number' certainty of this post, there's some weird stuff:

>(Today, the pre-training could be done with ~8,192 H100 in ~55 days for $21.5 million at $2 per H100 hour.)

Why flex both system size and training time to arbitrary numbers?

>For example, MoE is incredibly difficult to deal with on inference because not every part of the model is utilized on every token generation. This means parts may sit dormant when other parts are being used. When serving users, this really hurts utilization rates.

Utilization of what? Memory? If you're that worried about inference utilization, then why not just fire up a non-MoE model?

Here's what the post said about MQA:

>Because of that only 1 head is needed and memory capacity can be significantly reduced for the KV cache

This is close but wrong. You only need one key and value (KV) head, but you still have the same number of query heads.

My guess is that this is all a relatively knowledgeable person, using formulas laid out by the 2020 scaling paper and making a fantasy system (with the correct math), based on that.

Put differently, I could probably fake my way through a similar post and be an equal level of close but definitely wrong because I'm way out of my league. That vibe makes me very suspicious.


No, the post is correct about MQA. A KV-cache only caches the key and value heads. The point of MQA is that your KV-cache is 1/heads of the usual size because of this sharing.

Having multiple query heads does not affect the cache size, which is the limiting factor in MHA decoding for both memory capacity and bandwidth reasons.
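
To put numbers on that: the cache stores keys and values only, so the number of query heads never enters the size. A rough sizing sketch with GPT-3-like dimensions (illustrative, not GPT-4's real configuration):

    # Rough KV-cache sizing, MHA vs MQA. Layer/head counts are GPT-3-like and
    # purely illustrative; bytes_per=2 assumes fp16/bf16 cache entries.
    def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per=2):
        return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per  # 2x: keys + values

    layers, heads, head_dim, seq = 96, 96, 128, 8192
    mha = kv_cache_bytes(layers, heads, head_dim, seq)   # every head keeps its own K/V
    mqa = kv_cache_bytes(layers, 1, head_dim, seq)       # one K/V head shared by all query heads
    print(f"MHA ~{mha / 2**30:.0f} GiB vs MQA ~{mqa / 2**30:.2f} GiB per 8K-token sequence")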


>Autoregressive decoder inference is a severe bottleneck for Transformer models due to the memory bandwidth overhead from loading decoder weights and all attention keys and values at every decoding step (Shazeer, 2019; Pope et al., 2022; de Jong et al., 2022). The memory bandwidth from loading keys and values can be sharply reduced through multi-query attention (Shazeer, 2019), which uses multiple query heads but single key and value heads.

Emphasis mine, source here [0]

[0] https://arxiv.org/pdf/2305.13245.pdf

FWIW the original MQA paper is called One Write head is all you need.

Here's the quote from that referencing multiple heads [1]

>We propose a variant called multi-query attention, where the keys and values are shared across all of the different attention "heads", greatly reducing the size of these tensors and hence the memory bandwidth requirements of incremental decoding. We verify experimentally that the resulting models can indeed be much faster to decode, and incur only minor quality degradation from the baseline.

[1]https://arxiv.org/pdf/1911.02150.pdf


What is this hyper dramatic nonsense tweet about, “It’s over“? What’s over?


It's a meme based on quoting this tweet.

https://twitter.com/jebbush/status/929541504187686912


The thing, dude, the thing, is over!


The wait to find out what the model is I'm guessing?


Can anyone provide an alternative link to https://twitter.com/i/web/status/1678545170508267522

I haven't registered for Twitter since it started and I'd rather not now (though I probably will if it's the only way to get leaked gpt4 training details)


Wayback failed to load the subtweets, and archive.is has a copy but it seems to stop after around 10 subtweets. The Thread Reader link that was posted has it all, though.

https://archive.is/Y72Gu


The tweet is gone. What was in it?

Also, I'm dubious about this unsubstantiated claim. The biggest past innovation (training with human feedback) actually shrunk the size of a model. Compare Bloom-366B with falcon-40B (much better). I would be mildly surprised if it turned out Gpt4 has 1.8T parameters. (even if it's a composite model as they say)

The article says they use 16 experts, 111B each. So the best thing to assume is probably that each of these experts is basically a fine-tuned version of the same initial model for some problem domain.


As a note the 366B in Bloom-366B refers to the number of tokens, not the number of parameters. Bloom had 176B parameters (still many more than Falcon)


Maybe 111B is the base GPT-3.5 model.


>If their cost in the cloud was about $1 per A100 hour, the training costs for this run alone would be about $63 million.

If someone legitimate put together a crowd funding effort, I would donate a non-insignificant amount to train an open model. Has it been tried before?


I too would be interested in donating money toward this.

Given that the price since the original training effort has already dropped to ~$20 million, and that (a) the fundraising will take time, and (b) improvements are being made every day with regard to resource usage, you could probably get away with aiming for a much lower number.

Pulling a number out of my arse, I'd guess that training a comparable model will only cost $1-5 million in 12 months time, with the hardest part of doing so once you have the funds being acquiring the training data.


Not yet; I've heard tell of several people having the same idea to train an open model, though, through either crowdfunding or some wizardry with crowdsourced GPUs.

$65 million sounds pretty high though.


Considering an effort to buy a copy of the constitution raised almost $47M, I wouldn't be so sure. [^1]

Worth noting, though, that it isn't just the computing budget that's missing here - it is also (and perhaps even more importantly) the high quality data to actually train the model.

[^1]: https://en.wikipedia.org/wiki/ConstitutionDAO


Some kind of SETI project, but for training a high-parameter-count LLM, would be awesome.


How many people have A100s at home?


The comparison to SETI@home[1] is that you just need the same amount of total processing power, not the same supercomputer setup.

People don't need to own A100s, they just need to be willing to be part of a distributed supercomputer by running a background app that downloads chunks of data, processes them, and sends the result back. The utility comes from having enough people participate (which worked quite well for SETI@home, but helping find "signals from outer space" is a little bit more interesting than "helping train an LLM")

[1] https://setiathome.berkeley.edu/


The fact they are using MoE is interesting. There are a lot of specialised open source models on HuggingFace. You just need an LLM to act as the core "brain" and a few other components.

HuggingGPT works similarly to this. It automatically chooses, downloads and runs the right "expert" model from HuggingFace https://arxiv.org/abs/2303.17580
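
A toy sketch of the idea (in the spirit of HuggingGPT, not its actual implementation): a router maps an intent to a Hugging Face pipeline and runs whatever default checkpoint that task pulls in. The hard-coded routing table here is a made-up stand-in for the LLM "brain".

    # Toy "router picks an expert model" sketch using Hugging Face pipelines.
    # This illustrates the concept only, not HuggingGPT's real code; the routing
    # table stands in for the LLM planner.
    from transformers import pipeline

    TASK_ROUTES = {
        "summarize": "summarization",
        "sentiment": "sentiment-analysis",
        "translate_en_fr": "translation_en_to_fr",
    }

    def route(intent: str, text: str):
        task = TASK_ROUTES[intent]      # a real system would ask an LLM to choose
        expert = pipeline(task)         # downloads/loads that task's default model
        return expert(text)

    print(route("sentiment", "The leaked GPT-4 details are fascinating."))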


I wonder what the legal implications of them using SciHub and Libgen would be if that's true. I'd imagine OpenAI is big enough to make deals with publishers.


Libgen / Scihub or not, if the model can provide details about the book other than just high level info like the summary and no explicit deal with the publisher has been made, you can make a strong argument that it is plagiarism.

Even if bits and pieces of the book text are distributed across the internet and you end up picking up portions of the book, you still read the book.

It is extremely sad but ChatGPT will be taken down by the end of this year and replaced by a highly neutered model next year.


I'm not a lawyer and obviously we won't get any definite answer unless it actually goes to court, all of this is just hand waving and guessing.

But I think that unless GPT starts reciting large parts outside of the context of learning/education/research, reciting smaller snippets would fall into "fair use" and not be illegal.


For it to be fair use, they still have to have legally owned the book (as far as I understand).

You can't steal a book, photocopy some pages, then claim the photocopied pages are fair use.


I think you can. It is a separate "crime". You would get two cases: one for fair use (which, if you are quoting, commenting, reviewing, or generally repurposing content, may in fact be fair) and a second case for license/terms breach and/or illegally obtaining the piece of work (for example, if you stole it from a bookstore).


If you recite enough small snippets, you make a large one.

Especially with ChatGPT you can probe the model by asking certain questions about the material at hand to see if it has seen the entire book.

Also you don’t have to be able to recite the book verbatim for it to have been in your training set. The snippets I am referring to are on the side of the training data


If I read a book and then write a summary, is that plagiarism? What's the difference? I am legitimately not familiar with copyright law, but real lawyers seem to think it is unclear whether training on copyrighted data is illegal (in Japan it's definitely not).


If that's true, then OpenAI has probably taken extreme protective measures to ensure the secret is well protected. Even if OpenAI is big enough to make deals, they probably did not spend several years making deals with all of them.

It will, however, be very interesting to see if they fund efforts to massively (re)start book digitisation.


probably just easier to use drm-free copies of books


We should default to using the thread aggregators instead of using twitter links. My God Twitter threads are unreadable.


"Open" AI, a charity to benefit us all by pushing and publishing the frontier of scientific knowledge.

Nevermind, fuckers, actually it's just to take your jobs and make a few VCs richer. We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

https://github.com/ggerganov/llama.cpp

https://github.com/openlm-research/open_llama

https://huggingface.co/TheBloke/open-llama-7b-open-instruct-...

https://huggingface.co/TheBloke/open-llama-13b-open-instruct...

You can use the above without paying OpenAI. You don't even need a GPU. There are no license issues like with the facebook llama.
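
As a minimal sketch of what "using the above" can look like in practice, here is local CPU inference through the llama-cpp-python bindings; the model filename is a placeholder for whichever GGML quantization you download (e.g. one of the TheBloke files linked above), and the prompt template is just illustrative.

    # Minimal local-inference sketch via llama-cpp-python (CPU is fine).
    # The model path is a placeholder for a GGML file you downloaded yourself.
    from llama_cpp import Llama

    llm = Llama(model_path="./open-llama-7b-open-instruct.ggmlv3.q4_0.bin")  # hypothetical filename
    out = llm("### Instruction:\nExplain mixture-of-experts in two sentences.\n\n### Response:\n",
              max_tokens=128)
    print(out["choices"][0]["text"])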


>> We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

Just to be clear, there's no science being kept secret, because there is no science being done. What OpenAI has built is a feat of engineering, borne aloft by a huge budget supporting a large team whose expertise lies in tuning neural net systems, not in doing science.

Machine learning, as it is practiced today, is not science. There is no scientific theory behind it and there is no scientific method applied. There are no scientific questions asked, or attempted to be answered. There is no new knowledge produced other than how to tune systems to beat benchmarks. The standard machine learning paper is a bunch of text and arcane-looking formulae around a glorified leaderboard: a little table with competing systems on one side and arbitrarily chosen benchmark datasets on the other side; and all our results in bold so everyone knows we're winning. That's as much doing science as is racing cool-looking sports cars.


I'm tired of science as a religion. People treat it as gospel, as if checking off some criteria suddenly makes you "scientific" and instantly grants a sense of validity and authority that you shouldn't logically get.

I judge things as "what you can do", not "what can you predict". The only demonstration of knowledge and understanding is being able to do something. Not predict. Not "scientific method" and ridiculous "peer review" (actually peer pressure), not blind trials and not rigorous statistical analysis.

At the end of the day, you either manage to do something or you don't. So much of so-called science has lost all contact with reality because our judgement of success isn't successfully doing something, it is successfully jumping through "scientific" hoops. Look at string theory and the social sciences. The scientific process, instead of being a tool, became the purpose. It became a stamp of validity to seek. A stamp of validity with gatekeepers in academia, in the peer review process, in the media coverage afterwards, all the way to social media censorship and "fact checkers".

What used to be the frontier of creative people became a stagnant bureaucratic machine worshipped like a new religion. The heretics once burned at the stake became the ones crying heresy.

Enjoy your new brand of science. I'll stick to the older brand of heretics and madmen who did whatever the prevailing orthodoxy told them to avoid doing and thinking, and I'll remind you that the only real reason those people are remembered is that they did something useful, not because of the social traditions they adhered to or the rigorous scientific standards they followed.


Science is not academia. You say you prefer to judge things as "what you can do". Well, how do you judge "what you can do"? Astrologists, homeopaths, podiatrists, Christian scientists (!!!) and other such "heretics and mad men" rejected by the scientific establishment, will all tell you that they "can do" stuff, and so will all their many paying customers. How do we know they can't do what they say?

Because science gives you the tools to know that you're wrong. If you're a good scientist, you will be wrong _all the time_. That's how science advances: one mistake at a time. But you can't make mistakes if all you ever do is do stuff with computers, like beating all the benchmarks, because that is a meaningless result judged by its own, self-chosen measure of success that can never fail; and so can never inform.

Peer review is also not science, but it has been great to catch errors in my papers. Not in conferences, mind. I try to stay away from conferences. Everybody flocks to conferences because of quick turnaround, instant gratification. Journals have the good reviewers who can take their time understanding your work and helping you find where you've gone wrong. "Reject with encouragement to resubmit" is the best review result I ever got.

>> I judge things as "what you can do", not "what can you predict".

The goal of science is not to make predictions, but to understand how the world works, and why. Put that into instrumentalism's pipe and smoke it.


Placebos are known to work. I neither overestimate nor underestimate them. I understand them for what they are: placebos. And some people actually need them.

Moreover, in social contexts, religion also "works" in many senses. A person asked me what he could do about depression and a feeling of meaninglessness. Science would prescribe anti-depressants and medicate him; he would have side effects and the problem would never be cured. I suggested that if he were religious, the meaninglessness would immediately go away and he would need no medication, no external interventions.

So again, judging religion by "what it can do", it can successfully give people meaning, which is quite a non-trivial thing that science fails to do for many people. So religion can be presumed to be acting on some truths about what drives people and gives them meaning. Religion has successfully demonstrated an understanding of that; you don't need statistical analysis or peer review to see it, it's plainly obvious. That knowledge just doesn't carry over to other things, like how celestial bodies move, and there's no reason to think it does.

I love how you're the spokesperson for "science".

Ironically, when you look at how you describe the merits of peer review, you sound quite like an instrumentalist yourself.

Also quite ironically, the new science religion keeps pushing this funny narrative that science is about "being wrong all the time", which is the complete opposite of scientific history. All of established science is just the successes. Look at the great physicists. There is no mistake in Newton's laws. Or in Einstein's. They are just as valid today as they were back then, in the areas where they were demonstrated and studied.

How did we reach this apologetic science? "A good scientist will be wrong all the time"? That sounds like gaslighting scientists in a failed system rather than an accurate historical statement. Science advances because someone eventually succeeds. In the grand scheme of things, there are no mistakes. There are corrections; there are expansions of the domain of applicability. But the mistakes are sent to the trash bin of history, and if a failed system has decided to gaslight you into rationalizing failure, stop listening to it and start seeking success.


Sorry but I don't want to continue this discussion if you're accusing me of gaslighting others, and of being gaslit myself. Or of pretending to be a "spokesperson for science". I was hoping to have an honourable exchange.


>There is no mistake in Newton's laws. Or in Einstein's. They are just as valid today as they have been back then, in the areas where they were demonstrated and studied.

A model being incomplete is just another way of saying it's wrong.

The models they came up with are wrong, but still useful.


> science gives you the tools to know that you're wrong.

And whence the tools to judge that?


> In the end of the day, you either manage to do something or you don't.

The hard sciences would like a word.


Taleb wrote well about that; it changed my view on science. Not that science should be discounted, but it's not the single source of truth. https://twitter.com/nntaleb/status/1419843561286160397


This is the reason why I left academia for startups... Seems like a better way (not perfect by any means) to innovate, actually doing things.

PS: I would be happy to connect, you can find my socials in my bio.


You would love Thomas Kuhn.


Science as a rear-view mirror.


Just add epicycles.


Add enough epicycles and you've got a Fourier transform... Epicycles were a great idea but applied for the wrong reason.
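
A purely illustrative sketch of that point with numpy: any reasonably smooth closed path can be rebuilt from a handful of "epicycles" (Fourier terms), and the error shrinks as you add circles. The curve here is made up, not any real orbit:

    import numpy as np

    # A lumpy closed loop in the complex plane (illustrative only).
    t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
    path = np.exp(np.cos(3 * t)) * np.exp(1j * t)

    # Each Fourier coefficient is one epicycle: a circle of radius |c_k| spinning at frequency k.
    coeffs = np.fft.fft(path) / len(path)
    freqs = np.fft.fftfreq(len(path), d=1 / len(path))

    def reconstruct(n_epicycles):
        # Keep the n largest circles and sum their rotations at every time step.
        biggest = np.argsort(-np.abs(coeffs))[:n_epicycles]
        return sum(coeffs[k] * np.exp(1j * freqs[k] * t) for k in biggest)

    for n in (2, 8, 32):
        err = np.abs(reconstruct(n) - path).max()
        print(f"{n:3d} epicycles -> max error {err:.4f}")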


Compressing millennia of astronomical observations into a compound of three or four sinusoidal waves whose formulae could be carried and calculated in a pocket book?


Oh, they worked really well; they were still used for numerical calculation even after Galileo. But they weren't a "demonstration of knowledge and understanding" of planetary motion.


Understanding is a moving target, though. Newtonian mechanics was incorrect for modeling the solar system as well - as it was eventually superseded - but that doesn't mean it wasn't a scientific understanding of planetary motion.

Epicycles gave a better description of the movement of the planets, based on the observations available, which was entirely falsifiable. They were eventually superseded, and that's science at work.


Interesting to say it's scientific because falsifiable. The objection is that it wasn't a theory, just fitting a function to data. It did "work" in that it captured some pattern: it was extremely good at generalizing/extrapolating/predicting. And was a "model" of something in the data. But there was no operational model behind it, of what was actually happening.

Newtonian mechanics has a model, beyond curve fitting.

BTW: I made this analogy between LLM and epicycles as a joke, but it's looking strangely isomorphic...

It's funny, because epicycles used to be the poster-child for ad hoc, overcomplex models with no conceptual basis... which is an exact match for LLM... but I don't recall the analogy being made. Not even in https://norvig.com/chomsky.html


This is incorrect: Epicycles were a model of planetary motion - the theory was that the planets moved around the earth, but also had additional circular motion as they moved along their path around the earth. This model explains the apparent geocentric motion of the planets, much as Newtonian gravity explains the apparent heliocentric motion of the planets (but is also wrong). Finding the exact parameters for the epicycles was the curve fitting part.

We now discount that model because it's based on an incorrect geocentric model of the solar system, but that doesn't mean that the model wasn't a model...

[edit] The Chomsky link is interesting - I just listened to an interview where he pooh-poohs LLMs at great length.

This point: "Statistical models have been proven incapable of learning language; therefore language must be innate, so why are these statistical modelers wasting their time on the wrong enterprise?" is interesting in that context - in the recent interview, Chomsky is now unhappy that LLMs can learn /any/ language, even unnatural ones, and therefore aren't good tools for understanding human language. Quite the reversal. I personally think it's a 'science progresses one funeral at a time' kind of situation...


At some point I'd like to carefully study the history of that early era of science, because I don't know it as well as I'd like. But I believe I understand that the epicyclical model (and it was a model, rather than a theory) did not in any way depend on geocentrism. For one thing, Copernicus's model itself, while heliocentric, retained the epicycles of the earlier, geocentric model. Instead, the assumption on which the epicyclical model depended was the shape of the planets' orbits and of the planets themselves, which were considered to be necessarily circular and spherical, respectively. I think this had to do with assumptions about the geometric perfection of the universe, as a creation of the gods. In any case, assuming that planetary orbits were circular, an explanation was needed for the apparent "retrograde" motion of the planets (meaning it looks like they double back and turn against their original heading). Explaining this apparent motion was why epicycles were hypothesised in the first place.

The first time this necessarily circular model was abandoned was with Kepler's laws of planetary motion, which correctly identified the motion of the planets as elliptical and for the first time explained their apparent retrograde motion without the need for epicycles. Then Newton's theory of universal gravitation explained how the planets could possibly be moving on elliptical orbits. In fact, I believe Newton's theory also explained how the planets could be turning around the sun without crashing down, despite not having anything to hold them up. I think this was the first big mystery that the ancients tried to answer - hence the name "firmament" for the universe.

And Newton's theory was not the end of the story of course.


Chomsky hasn't made any reversal; Norvig misrepresents what Chomsky has said.

> Chomsky is now unhappy that LLMs can learn /any/ language, even unnatural ones, and therefore aren't good tools for understanding human language

That's also not what he has said: he said they aren't useful for understanding the human language faculty, in other words, understanding how people are able to have language. As he says, it obviously can't be the same way as LLMs, because LLMs are able to learn languages humans can't learn.


I realize I've been repeating shibboleths from my postgrad without full understanding.

A problem with epicycles is they are harder to falsify. If the model doesn't match new observations, just adjust it, or add another epicycle. In contrast, Newtonian gravity can hardly be tweaked at all. So when Mercury's orbit was slightly off, they knew something was wrong.

I'm not quite clear on how I feel about this. Geocentricity is a theory, in broad terms. It seems disrespectful to say adding epicycles makes it "not a theory". As you say, there was a theory that the planets actually moved in epicyclic motion. It wasn't just calculation to them.

(I want to stress that the idea of epicycles, the mechanical craftsmanship, and actual prediction of the planets are all amazing genius.)

Yet, having more parameters than data means the model doesn't explain in simpler terms, only restates. In this sense, it's "not a theory" (by Occam's razor). It seems enough epicycles can model anything: (3Blue1Brown Fourier Series) https://youtube.com/watch?v=bL0LV0Huj1s OTOH the epicycles did predict planetary motion, so they did capture some regularity... not sure what to think.

RE Chomsky: You can see it's like epicycles: with enough parameters, an LLM is like a numerical method for curve fitting, that doesn't explain the data (any more than a fourier transform does). Curiously, they do seem to predict very accurately... yet also generalize strangely ("hallucinate"). What to think?

But it seems it's got to help! Even if only as a device, like a telescope. Also, from this interview with Terry Sejnowski (https://youtube.com/watch?v=XKC-4Tosdd8 - 3 hours!) there are instances where a technique was developed for a problem with neural nets, and an equivalent was found in the brain.

He also gives a Chomsky-like view: if you duplicated the human brain and it worked perfectly, you wouldn't have done any science if you didn't understand anything.

BTW Chomsky's point E (which I'd never heard of), the last and most minor, was based on Gold's work.


> BTW Chomsky's point E (which I'd never heard of), the last and most minor, was based on Gold's work.

Do you know where Chomsky says this exactly? I've been looking for how/if his argument is the same as Gold's.


>> RE Chomsky: You can see it's like epicycles: with enough parameters, an LLM is like a numerical method for curve fitting, that doesn't explain the data (any more than a fourier transform does). Curiously, they do seem to predict very accurately... yet also generalize strangely ("hallucinate"). What to think?

Well, that's the fundamental problem of modelling: that for any set of observations there's an arbitrary number of models that fit the data with great accuracy and even predict future observations well; and we don't know which one is the best in the long term.

The answer is that we should prefer not predictive models, but explanatory theories, that not only predict future observations but also explain why those observations should be expected to be made.

For example, the epicyclical model did not explain anything: it said nothing about why the planets should move on circular orbits with epicycles. Kepler's laws didn't explain anything because they didn't say why the planets should move on elliptical orbits. Newton's law of universal gravitation explained it all in one stroke: because gravity. And that's why we consider Newton the greatest scientist of his era, not Kepler, not Copernicus, not Galileo, but Newton, because he explained the world and didn't just describe it.

Ultimately the advantage is, like you say, that when an explanatory theory fails, we can better know why. When a predictive model fails, we have no clue.

>> BTW Chomsky's point E (which I'd never heard of), the last and most minor, was based on Gold's work.

Gold's negative learnability result was a huge upheaval that led directly to the current paradigm of machine learning. Chomsky used it to support his argument about the poverty of the stimulus, but linguistics was only one of the two fields that Gold's result turned upside down.

And it was a negative result. As I say in another comment, science gives you the tools to know when you're wrong and that's how progress is made, when we find out where we were wrong before.

With epicycles, it took almost two thousand years before we figured out where the model was wrong. Let's hope that it doesn't take that long with LLMs and neural nets also, because I doubt we have another couple thousand years to spare on a wild goose chase.

>> (I want to stress that the idea of epicycles, the mechanical craftsmanship, and actual prediction of the planets are all amazing genius.)

The epicyclical model persisted for so long because it was so good, and because there was nothing better. It is common for people who don't understand science to look at scientists of the past with derision and think they weren't even scientists, but for almost two thousand years, astronomers did exactly what a scientist must do: they accepted the best available theory, even if many of them hated it with a burning passion (and they did!). If it wasn't for the ancients stumbling and fumbling in the dark for millennia, we wouldn't today be enlightened and we owe them every respect.


> not predictive models, but explanatory theories

Maybe an advantage of an explanatory theory is in revealing more of the "black box", giving more ways to check the theory. (But I'm not sure how this could apply to Newton's gravity, since the only observations were outcomes. And no plausible way to "experiment".)

> If it wasn't for the ancients stumbling and fumbling in the dark for millennia

Is there any evidence that the epicyclic models helped scientific understanding, even indirectly? Later theories didn't seem to build on it. I wonder if it actually detoured understanding, with its misleadingly impressive accuracy, so that understanding would have progressed more quickly without it.

Thinking of pg's "great work" (https://news.ycombinator.com/item?id=36550615): to be the Newton of neural nets would seem the most ambitious aspiration of our times. But it took a bunch of geniuses just to get to Newton... and it seems an even harder problem than planetary motion. Though a difference is neural nets are based on actual neurons (loosely!).

It's looking like working human-level AI will precede understanding... perhaps by those 2000 years?


The gravity model is similar though: we posit a force that pulls things together, but we don't know /why/ that force seems to exist, no more than the ancients knew /why/ the planets seemed to move in smaller circles along their circular paths. We're really not /that/ enlightened, after all.


I think that's right, but ultimately all explanations we have are based on prior knowledge that is itself not necessarily complete. It's explanations all the way down, until we hit some primary observations or axiomatic assumptions that are the hardest to get rid of.

"Enlightened" was my bad choice of a word. I get overexcited when I think of how much we have learned in the past couple thousand years and I forget that we mainly learned how little we know. Or can explain!


Yes, ultimately it's also descriptive.

I'd like to think explaining means giving a model simpler than the observations. But this also can be true of a purely predictive model, that offers no "why". Another commenter pointed out that epicycles do simplify - so they do "explain" in this sense.

What defines an "explanation"? What makes something a "why"?


> Chomsky used it to support his argument about the poverty of the stimulus, but linguistics

Do you know where Chomsky refers (directly or indirectly) to Gold? I've been searching for a reference for some time.


No, I'm sorry. I'm not a linguist so I only know the relation between Gold's result and linguistics second-hand. I'm more interested in it from the point of view of inductive generalisation in machine learning; that's my schtick.

Just to make sure I didn't hallucinate all that, I had an admittedly perfunctory search online and I could find this paper:

https://proceedings.neurips.cc/paper/2002/file/04ad5632029cb...

Whose introduction describes how Gold's result is considered to support the arguments for linguistic nativism from the poverty of the stimulus. Then again, the author doesn't seem to be a linguist himself and he doesn't give any more specific references, so I'm now a little worried; and your question remains unanswered.

Have you tried wading through Chomsky's early work on linguistics? I don't have the courage to. The closest I've got to is I have a friend who has read a couple of Chomsky's linguistics books. My friend is making a living as an astrologist now so maybe that's a bit of a warning there :P


(not who you asked) I thought this would be in the linked transcript, but it's not. Norvig must be getting it from elsewhere (maybe in the 404ed video?), but it seems like misrepresentation.


I don't think the problem was that it wasn't a "model"; it's that it had lots of unexplained "plug-in" behavior. The heliocentric model had much less; it explained more using less, despite being less accurate.


Nobody's arguing that the heliocentric model wasn't better for a variety of reasons... simply that getting superseded doesn't make the previous approach 'not science.' All models are wrong, some are useful, and all that.

And the heliocentric model still has plenty of unexplained parameters: the major and minor radii for each body. (Not to mention those pesky perturbations in Mercury's orbit.)


Sure. But remember also that the standard “science” is largely a broken model. Academia sucks. It’s honestly bad. It’s overflowing with papers that are misleading, irrelevant, or fraudulent due to a mix of poor stats knowledge, bad incentives, and a failure to organize effectively.

While people praise the scientific method, the majority of achievements we attribute to “science” are not derived from guess and check grad students doing their thing.

The standard paper is not very good. And a lot of the reason is that our scientific model is largely designed to find tiny effect sizes. Which is fine for like… medical stuff. But the juice is in large effect sizes. Stuff so obvious you don’t really need the stats to evaluate it. It doesn’t really matter if you followed the scientific method or not when you discover penicillin. It just works, clearly. And you can demonstrate it working again.


Right. In historical context the scientific method was a huge improvement over the previous method of understanding the world, which was mostly religion.

But today, academia is a victim of organizational and political capture, making it less competitive for talent.


People struggle to differentiate the scientific method and the lifecycle of academia.


In some ways academia is akin to a religion or cult that sprang up around the scientific method, and the cult has now diverged as far from the seed as far-right Christianity has from Jesus's teachings.


Oh come on, have some perspective: academia has millions more people to slaughter before they catch up with the cult of Christianity.

https://apholt.com/2019/01/30/death-estimates-for-the-crusad...

John Shertzer Hittell– Estimates 1,000,000 total dead for the crusades to the East covering the period from 1095 to 1291.

“In the two centuries of this warfare one million persons had been slain, but it had not been without some compensations.” John Shertzer Hittell, A Brief History of Culture (1874), p.137.

[...]

John M. Robertson– Estimates 9,000,000 total dead for the crusades to the East covering the period from 1095 to 1291.

“It is a reasonable calculation that in the two centuries from the first crusade to the fall of Acre (1291) there had perished, in the attempts to recover and hold the Holy Land, nine millions of human beings, at least half of them Christians. Misery and chronic pestilence had slain most; but the mere carnage had been stupendous.” John M. Robertson, A Short History of Christianity (1902), p. 278.


You might be stuck in an argumentative mode of thought, it seems like you replied to an interpretation of my comment that left out some salient details.


If you have some salient details about millions of people being slaughtered by academia, then please post them!

Otherwise my point stands that your comparison is ridiculous, because religion has killed millions more people than academia, despite Jesus's teachings and the "Thou Shalt Not Kill" commandment.

Religion has diverged vastly and diabolically further from the teachings of Jesus than academia has from the scientific method. There's no comparison.

You're being extremely anti-intellectual while whitewashing and rationalizing the slaughter of millions of innocent people, misery, carnage, and chronic pestilence, if you think academia is anywhere near as bad a cult as Christianity.


He compared it to a cult. Not all cults have killed millions of people. But many cults do stray very far from the original stated reason for the cult existing. For example some cults will treat their members like free labor and claim the goal is to make everyone happy through hard work and lack of worry about money. But then the leader will get a lot of money and start buying nice things for himself.

Buying a nice car with other people's labor isn't as bad as killing millions of people but it is straying very far from the original declared mission.

You seem to believe that murder is the metric that can be used to measure how well an organization is doing as far as following their mission.


I remain convinced that the correct model for science is a) serious hobbyists and b) patronage for the really talented. It would weed out the careerists, at least. You just can't make a "science factory" and expect to produce discoveries like cogs. It's stupid to even try.


I think you can. That’s basically corporate research teams.

The problem with academia is that it’s comprised of people who are held hostage until they find something. Just terrible incentives.


I think the search for tiny effects, and the belief that they should drive treatment, is one of the biggest things wrong with medical research. Doctors will put millions of people on a pill that hits a vital metabolic pathway, for life, based on tiny (to my thinking) but statistically significant results (i.e. results that would occur by chance alone less than 5% of the time).

These days I only believe in large effects, like smoking causing heart disease.


Medical standards aren’t that bad. It’s just a complicated system with lots of interacting effects that only work for specific circumstances. Coupled with the fact that many medicines are not cures, but something that makes you die slower.


Pretty sure the idea of scaling is under test here; just because it's relatively straightforward to use larger models doesn't make it less of an application of the scientific method. Also, process supervision and devops are experimental for a startup like OpenAI, and mixture of experts really hasn't been used with MHA models at scale before.

I'd argue it's pedantic to assume all science has to be completely novel or revolutionary. "More neurons good" is a perfectly reasonable reason to experiment. I think you're being a bit "snooty" in setting such a high bar to gatekeep science…


> Machine learning, as it is practiced today, is not science. There is no scientific theory behind it and there is no scientific method applied. There are no scientific questions asked, or attempted to be answered.

Total horseshit. There are tons of scientific papers on ML published. In fact it is MORE like traditional science than typical CS, because it is trying to reverse engineer how something we encountered in the real world works. We know NNs do amazing things, and we don't fully yet understand how.


{Hypothesis, test, loop} is the scientific method, and I can guarantee it is being used when fine tuning an LLM.


That's a common interpretation of what science is, but it largely ends up being driven by confirmation bias. Because how do you know your hypothesis and test are even really connected? Or what you're seeing is a cause and not a correlation? This is why you need arguably the two most important factors in "real" science: predictability and falsifiability.

Predictability means that if your hypothesis is correct, then you'd be able to formulate other improbable (and ideally not-yet-tested) predictions from it. And falsifiability means that if these predictions fail to occur, then your initial hypothesis was also almost certainly wrong. So for instance Newton's hypothesis was that gravity was driven by a mathematical relationship between the masses of and distance between two bodies. It was good science because it led to the shocking ability to dramatically simplify orbital dynamics and create a complete predictive system of these bodies. It was even used to mathematically discover a completely unknown planet - Neptune.

Incidentally, his theory could also be shown to be false if any of these unexpected predictions turned out to be wrong. And that's actually exactly what happened. Observation of Mercury's orbit about the Sun showed it was off by about 43 arcseconds (roughly a hundredth of a degree) per century, relative to what was expected. And it's from there that people knew there was a mistake, one which would only be explained by Einstein hundreds of years later, who hypothesized a system with far more absurd predictions... and so the story continues.
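
For reference, the relationship Newton hypothesized is his law of universal gravitation; in modern notation (with G the gravitational constant):

    F = G \frac{m_1 m_2}{r^2}

i.e. the attractive force grows with the product of the two masses and falls off with the square of the distance between them.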


We've got lots of datasets at this point in history - it ends up being pretty reasonable to formulate a hypothesis, propose a change to the ML system, then see how it does on a host of datasets. Most papers focus on one or two datasets, getting some initial improvement going from train-to-test, but the successful ideas end up transferring to a wide range of contexts. The extension to other datasets + contexts constitutes the prediction of performance on other systems.

And sometimes (often?) techniques do fail to transfer! IME, it's only the simplest ideas which do transfer to new contexts well - there's a lot of dark heat in pushing Imagenet another hundredth of a point, which you realize when you try to take incremental 'new SOTA' papers and apply them to audio problems.

But the things which /do/ transfer constitute real advances. And that's (drumroll) science! Sometimes things look good initially and get falsified. Medicine is still science, even if drugs get disqualified in phase iii trials...


ML has both of those things. In fact the situation is better than in many other sciences because the evaluation is completely digital. When you run an ML experiment you get back an objective measure in minutes or hours telling you if the model performed better or worse. If it's worse, boom, falsified. Many current papers are making predictive testable claims -- that based on various things we know what ideal parameter values are, the types of problems current NN archs will perform well on or poorly on, etc.


That's a necessary but insufficient condition for something to be considered scientific.


It is not the scientific method at all. You didn't even include an analyze or a publish step, two critical components to the scientific method. The point is that science is a thoughtful, methodical, recorded, repeatable process that is scrutinized by not just your friends but by the entire world, including those who compete against you. In science, it's normal for your competitor to recreate your experiment exactly to see if you did it as well as you claim. Or to push it further.

What OpenAI is doing doesn't even resemble science at all.


Umm.. no. I'm pretty sure when science was "invented" (whatever that means), there was no way to share your research with "the world". You wrote it down in a book if you cared to reproduce it.

Stop gate-keeping science. Observe, Predict, Test, Repeat. That is science. Kids do science when they watch adults do things. Just because you aren’t sitting in a lab publishing things no one reads doesn’t mean you can’t “do science.”


Simply learning something isn't "doing science." The scientific method is a natural learning process, but it's not the only thing that defines science. This is incredibly light gatekeeping. It's just maintaining a basic professional definition.


If a child watching an adult is "science", then when I brush my teeth I'm a dentist.

Words have meaning and when you dilute them down this far the only result is the destruction of communication.


Words do have meaning.

den·tist ˈden-təst : one who is skilled in and licensed to practice the prevention, diagnosis, and treatment of diseases, injuries, and malformations of the teeth, jaws, and mouth and who makes and inserts false teeth

Brushing your teeth does not imply you are a dentist.

sci·ence noun 1. the systematic study of the structure and behavior of the physical and natural world through observation, experimentation, and the testing of theories against the evidence obtained. "the world of science and technology"

A child may not be a good scientist, but they observe, experiment and explore their ideas. That sounds like science to me. Observational learning is a very common way scientists begin to understand things.


The publish step of the scientific method always seemed out of place to me - it means you can't discover anything on your own, which doesn't seem correct to me.

However, to discover something new (not just new to you) does require contact with established knowledge. And if you don't publish, you can't be very sure that it really is "new"; and it does not become part of established knowledge - it has not been discovered.

i.e. the concept of discovering new knowledge means it cannot be done on a desert island.


Do not forget that with ANNs we are still dealing with technologies with obscure sides: function approximators where the original function, and what it means, is oftentimes not reconstructed.

Replication of the experiment, "verification" of the law, can still be within a largely preliminary side - the steps that describe behavioural details, patterns over "protocols" - without the explanatory side (that which causes understanding, e.g. enabling deduction).


The reason that this is Engineering as opposed to Science, is that the hypothesis is just, "hey, maybe this will work".

Nobody has really explained why it works.


Ask Feynman why magnets works... And he'll say no one knows. https://www.youtube.com/watch?v=36GT2zI8lVA

It's the same in ML. Good predictions do come from real ideas about how the systems work - ideas in information theory, entropy, cognitive science, statistical mechanics and so on - formulated in a context of our existing understanding and prior results. That's science.


Feynman was a genius. It takes a genius to say something like that, and it's surprising how many "scientists" are upset by statements like that, because they are so proud of their hard-won capabilities in modelling. Their curl and divergence operators, their Maxwell's equations... they forget that the map is not the terrain, and the universe is, at its root, irreducibly mysterious, and the fact that we understand anything about it at all is gobsmacking.


>irreducibly

What are you basing that on?


Which is a fundamental flaw in science.

Most of AI academics have spent their career theorizing complex algorithms or complex explanations of intelligence.

But the engineers have built large enough Neural Networks to give us data points that show intelligence is emergent out of relatively simple components.

Unsurprisingly, the people who believed they were the smartest were the least likely to explore the possibility that human intelligence isn't general, but specialized.

Echoes of the academics building geocentric models of the universe centuries ago.


This is my opinion, so take with a grain of salt:

I believe it's entirely possible that the scientific method breaks down past a certain level of system complexity defined somehow by thermodynamics.

This would in part be due to the infeasibility of running the proper experiments to understand the effects of single variables when tens of thousands of variables might be changing at the micro level.


I think the colloquial understanding of "science" means studying how the world works. This is more like engineering: exploiting our understanding of how the world works to build new capabilities that we don't fully understand.

I don't think you would call it "science" for a bunch of single cellular organisms to cooperatively evolve a multicellular one. Similarly you wouldn't call it science when humans create digital lifeforms that require actual science to be done to understand how they work.


...but in a technical context, not in a scientific one.

It is one thing to identify (Copper Age, Bronze Age) the best ways of smelting ore to obtain the metal through trial and error; it is another to try to understand the nature of materials.


Corporate wants you to find the difference...

That is to say, the two things you mention are the same process. "Identifying the best ways of smelting ore to obtain the metal through trial and error" is the easy part, when you get to pick low-hanging fruits in a field. But as the easy options get cleared out, continuing improvements requires increasingly complex, sophisticated methods - that's where the process transitions towards "trying and understanding the nature of materials". It's still the same process, but couple layers of abstraction up from the original "how to get better stick than neighbor and beat them instead of getting beaten".


The difference, while moving in "interdependent" directions, is in the purpose: obtaining some sufficient information on how things work versus an actual consideration of the nature of things.

It is not really (fully) the same process, because you could (in theory) "early stop" when you have achieved technically sufficient competence - the description of the optimal process - before the jump to understanding. Something works - that is the technical side; why it works is the scientific side.

Note that the approach has controversial sides: take Newton's "hypotheses non fingo" - he refused that jump explicitly:

> hypotheses, whether metaphysical or physical, or based on occult qualities, or mechanical, have no place in experimental philosophy

I.e.: Newton proposed an «experimental philosophy» which stopped short of a "true understanding of the reasons behind phenomena", in order to avoid contexts in which solidity (at his time) could not be expected. "Understanding" as "identifying universal laws" was the best that seemed affordable.


I wanted to address that in the second paragraph, which I ultimately deleted before submitting, because I couldn't phrase it right. But since you brought it up: I'd consider this an effect of specialization.

In the process of improving your object-level "how things work" goals, you end up generalizing and stacking increasingly complex theoretical models. Soon enough, you end up with people working high up the stack - not knowing or caring about the initial goals. Those people end up growing the "mound of knowledge" both upwards and sideways. The work is sort of self-justifying, but really, it's also self-similar. Where early materials science may have been driven by, say, desire for better/cheaper weapons, soon enough, you have people doing materials science because they desire to solve a puzzle. Whether the practitioners are smelting different combinations of ores to find one that will win them the war, or they're mixing up different kinds of equations to figure out a clean solution to a theoretical conundrum - it's the same process, same motivation. And it always involves play.

The kind of methodical, boring approach, with hypotheses and control groups and peer-reviewed papers? That's the boring part you have to do after play.

See also (with no implied judgement in this context): software developers that lose sight of (or care little about) business goals, and instead aim for theoretical markers of what "good code" is, and/or solve abstract puzzles of algorithms and architecture. Or the MBAs that view companies as abstract money-printing processes, running them by the rulebook that's entirely independent of whatever it is the business is actually doing or selling. Both are cases of growing complexity creating a new field of work that's independent of what brought it into existence.


> an effect of specialization

It is also a specific purpose, a deliberate endeavour. We are in front of a world, we have to digest its phenomena, we try to understand it - it is a primal process, i.e. mental digestion. In some ways it could be argued that the "scientific drive" is anterior to the technical one: first you assess, then you act. Consistently, science started as "natural philosophy".

It is relevant, in the framework of "scientific drive as (development from the natural process of) mental digestion", that terminologically "science" is not just "the realm where false statements start to exist (and are tried to be avoided)" ("discriminate" - ex scindere), but also, before that, an activity of sorting, of "separating" conceptual elements - mental digestion. The "scient" (as a participle of scire, used as "to know") is a "distinguishing" mind.

We not only «desire to solve a puzzle» - it is our nature (as per the above) as minds dealing with a phenomenical world we need to grasp, as part of the activity of "seeing" it, already before action.

«Play[ing]» is a process that both allows to refine knowledge of the world, physically (in the physical play), and to refine the grasping of the world, mentally (in the mental play).

> growing complexity

Or just abstraction, enabled by a refined framework (following which, other areas become primary, topical - in the synthetic effort which is part of the above said process).


Those steps are both necessary and usually take decades to build understanding.


Next time we get the urge to complain that GPT-n is just applying patterns seen in the training corpus we should remember humans don't do much better. We are just language agents with rich feedback from outside. We can't even write software top-down in one go without running the code.


Incidentally, that opinion is also offensive. Not just some tossed "All X are Y, goodnight", but a loaded one.

If you happen to conjure thoughts and let them roam wildly, well don't. Make an effort. Critical thinking as a conscious intentional effort must have plenty of literature. The "feedforward" output, the hallucination, is not a dish for service.


> patterns

Except we consciously (intentionally) work on those matchings.

> are

Opinions should be offered in proper places. Polling is not debating.


But the activity of "attempting to find explanations" is specific.

The attempt to formulate laws (the "better-faring hypotheses") can lead to "universal laws" or to more humble "patterns". It will contribute to the scientific effort - it does not exhaust it.


I don't think things are quite as dire as that. I have experience as a machine learning scientist, have actually never worked on the kinds of papers you're talking about, but have helped apply machine learning to scientific problems. First in a systems biology lab for tasks upstream of drug discovery, and then in industry for credit risk modeling. I've mostly just used classical machine learning or Bayesian machine learning models though.

My interest has always been in using machine learning as a tool to help understand some underlying phenomena, not in trying to push to the top of the leaderboard on benchmark datasets. I think this kind of attitude isn't uncommon in academic labs focused on doing real science, though the methods used aren't necessarily state of the art, and ML typically plays only a supporting role.

Curiously, I'm going to present at a conference later today and will give a toy example showing why those leaderboards are not necessarily reliable for distinguishing between the quality of different models.


> racing cool-looking sports cars.

I like the analogy attempt but there's a lot of science behind Formula 1


> as it is practiced today

Reminder that such a situation - science being out of focus, subordinate to the attempt to obtain practical results "empirically" - is not necessary but contingent.

The effort towards "science" is just postponed and/or relegated to other researchers in the same field.

And anyway, it is not that the effort towards "explanation" is completely absent. The situation is that many are working on the prospect of achieving big results through bets.


Of course, 99.9% of science is engineering, not "science". We do science all the time without asking questions or attempting to come up with generalized answers.

Is a scientist doing a linear regression not doing science?

Anyway, AlphaFold, while not particularly "scientific", did answer one scientific question: it is possible to predict the structure of most proteins through a combination of limited structural and extensive sequence information, combined with a sophisticated (and "non-scientific") algorithm. That was an open question for some time, and their results convinced the community that their methods were right. What's amazing is that while it's entirely nonscientific, the results have been an absolute blockbuster in the scientific field. And even better, the only reason AlphaFold was able to show this is because there was a well-defined protein structure leaderboard.


I love this diatribe! But to be fair, this kind of engineering historically can at least lead to science.


Still, the people who developed ML algorithms like neural nets, linear regression, decision trees, SVM, Naive Bayes, KNN, etc. - weren't they doing science?


There is actually an old argument about whether Computer Science is math or science. The lack of hypotheses and physicality drives the "it's math" camp. I tend to fall into that camp. Most computer science is either math or applied math.

That said, most of machine learning is not just engineering. If anything, there is little engineering done at all.


Yeah, to my newb eyes, the core of machine learning can be (and probably is) as scientific as any mathematics gets (granted they try novel ideas rather than just applying old solutions). OpenAI now sits more on the scaling/engineering side to build its business.


The note is interesting and, in a way, a time-consuming puzzle,

but do not forget that the original poster's apparent point is the lack of an approach beneath the implementations of the techniques mentioned above.

The techniques you listed substantially expanded our knowledge ("oh, we can do this ... and solve more problems") while raising few further questions.

The implementations under discussion give results but raise many more unanswered questions.

(This post is unfortunately an "immature" reply - just a provisional tentative point. The matter of "ML algorithms ∩ science" requires more time and concentration and consultation of the texts of the Great Ones.)


>> but do not forget that the original poster's apparent point is the lack of an approach beneath the implementations of the techniques mentioned above.

That's right. I think the approach that is lacking is best explained in the submission guidelines for the Journal of AI Research:

>> Papers describing systems should clearly describe the contributions or the principles underlying the system. Papers describing theoretical results should also discuss their practical utility. In general, it should be clear how the work advances the current state of understanding and why the advance matters. Papers should report on what was learned in doing the work, rather than merely on what was done.

https://www.jair.org/index.php/jair/about/submissions

Doing stuff is not science, even if it's doing stuff that's never been done before. In science, we experiment to understand why stuff can be done, and how, not just that it can be done. And of course it matters what is being done, so for example experimenting to find the best combination of spices for a dish is not an instance of science.


It's like asking whether a carpenter is a scientist because they developed a cabinet...


I hate to break it to you but that is part of science. Perhaps the major part of it too. Hypothesis: I can build a cabinet with these materials which will bear some load range. Experiment: I built it and it obviously works.

Now change "cabinet" to a particle accelerator that a giant team of other theorists and engineers designed. Am I not doing science by participating in building it? So experimental scientists aren't scientists?


To me, you're describing the differences between a cook and a chef. Just because you've built something with well-known methods doesn't make you a scientist. You didn't come up with anything new. In fact, we're starting to sound a lot like Apple. Apple is (in)famous for taking ideas that someone else did all of the hard work of developing and proving to work, and then combining various ideas like that into an actually useful, working something. (They also do a lot of pure deep research as well, so don't think I'm going too far with the analogy.)



Semantics, I say. Cooks are "chiefs" of certain things; otherwise they'd have no value on a team. Unless you're saying that chefs setting the menu makes them more of a scientist. At worst you're saying, e.g., that scientists are like chefs in that they used their creativity to make up the Standard Model. At best you're saying chefs are theorists and cooks are experimental scientists, which is, again, my own point.


I'm saying that chefs are more likely to be aware of what is happening during the whipping, baking, cooling, etc., and why it's important, versus just doing it as a step. So when they experiment with the menus, there's a bit more than basic understanding behind why they think the experiments might work. Cooks are just the kids in chemistry class following the directions, and depending on how well they follow the directions (and, to a point, how well the instructions are written), they might not blow up the lab (or burn the m.f. soufflé).


Says who? Cooks at a fine restaurant had better know everything about their part, but they chose to work under a greater chef because their eventual upward mobility as a practitioner is higher. Literally, "chef" means leader or boss; in the rest of the world that usually means less specifically qualified. It's not likely here that a chef isn't a good cook, but a good cook certainly has the ability to set a menu if they wanted to. You're not making a strong argument, imo.


Well, Wikipedia might be wrong, but this is how they describe Computer Science: "Computer science is the study of computation, information, and automation. Computer science spans theoretical disciplines (such as algorithms, theory of computation, and information theory) to applied disciplines (including the design and implementation of hardware and software). Though more often considered an academic discipline, computer science is closely related to computer programming."

You might argue, though, that Computer Science is not "real" science.


I honestly feel like that, as a "Software Engineer". As the digital world is new, and full of concepts similar to the old analogue world, it's natural that we borrow names along with the concepts. Similar to how a lot of things in the sea are named after land things, with a marine prefix, like the sea horse, sea star, sea cucumber, and so on. And now in IT we have engineers, architects, rockstars, tribes, and science, even though, very often, they have no relation to the original profession or concept, and especially don't have its responsibility or impact.


Computer science is the study, that is, looking at how computers and the use of computers can benefit mankind (in my short lay version). Software engineering is a subset of computer science where you build a real-world application of the studies.

The algorithms behind the Facebook feed and your favorite AI/ML product is the science. The code that powers the feed and chatgpt is the engineering.

Wikipedia is not wrong IMHO, but then I studied computer science 20 years ago so maybe I’m out of touch


I agree with you. There's a lot of engineering, and science, going into IT. But I'm a Software Engineer too, and while I'm clever and do clever things, I can assure you that there's no rigor or reason to call it engineering. This version of engineering is at most the re-use of the word, like how the word art covers not just artistic expression but skillfulness too, even though when you apply a skill artfully, you don't create art. Similarly, I do a lot that borrows ideas from engineering, but really I'm just writing code and maybe designing a smallish system.


Engineering happens at all levels; I've designed systems that span global infrastructure and systems that run on embedded hardware. I still think that the design process is a key part of engineering. Figuring out how to put things together is part of design. I agree that the more junior an engineer you are, the less "engineering" you see... but you can't run until you learn to walk.

I talk to my friends who sent projects into space and friends who design deep sea drilling rigs. The engineering process is very similar.


I think that you could genuinely make that argument though.


It's more a science than pure math or CS, since they are producing falsifiable models.


In what sense are they falsifiable?


if they're wrong, they don't work.


If my BST implementation isn't sorted, it also doesn't work.


The difference is that you can reason about why it didn't work.

With deep learning models not so. They are too big to reason about in the same sense as your BST algorithm.

Hence, you need a scientific approach to construct them. I.e., with lots of experimenting, hypotheses, etc.


That seems like it pushes it further from science, no? The point of a well-crafted hypothesis is that if it doesn’t bear out, you know that it’s because one+ of your assumptions was wrong. Your ability to continue your scientific inquiry is pretty much == your ability to then identify which assumption was wrong.


You don't need to do a scientific experiment to tell why your BST doesn't work. "Computer science" is a misnomer because most of its contents and methodologies are from mathematics (which is used in science but is not a science in itself).

CS uses mathematical proofs. You don't need a computer to execute your code to tell why the BST doesn't work. You can introspect your code and figure out why it works or does not work. If it's correct, CS methodology says you can "prove" that it works (without executing it).

Working with large AI models is like working with an artificial brain. It's as scientific as neuroscience in this sense. You make some hypotheses, tweak some hyperparameters, and get a result, which may or may not invalidate your hypotheses. Nobody knows why. Science is not necessarily about knowing the fundamental "whys" (amateurs think humanity has figured all the "whys" out, but that's a lie). It's about establishing some useful model of how things work.

But it's definitely possible to know why your BST does not work, even without a computer, without empirical testing. That's why CS is not a science.
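
To make the BST example concrete, here's a minimal sketch: the sortedness invariant can be checked empirically by running it, but it can also be established by reasoning about the insert code alone, which is the distinction being drawn here.

    # Minimal binary search tree: insert plus an in-order check of the sortedness invariant.
    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        # Invariant: keys smaller than node.key go left, the rest go right.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def in_order(root):
        return [] if root is None else in_order(root.left) + [root.key] + in_order(root.right)

    root = None
    for k in [5, 3, 8, 1, 4]:
        root = insert(root, k)

    keys = in_order(root)
    assert keys == sorted(keys)  # falsifiable by running it, but also provable by induction on insert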


Not sure what you are saying here. Perhaps an analogy helps.

Psychology is a science. You can make falsifiable statements about the human brain. You will need experiments to build and test theories. It's the same with deep learning.

With computer "science" (and math) it's not the same. You can reason completely about your subjects, i.e. you can determine if something will or will not work just by reasoning, no experiments needed.

For more information on the differences between math and science I recommend reading: https://en.wikipedia.org/wiki/Scientific_method#Relationship...


Seems like the distinction is mainly about which tools are available to you as a scientist (at least if we stick to comp-sci; math is in a league of its own). When, or if, we can completely model a human brain, a psychologist would no longer need to perform experiments to test their theories.

Given enough computing power, most theories could theoretically be proven or falsified purely through reasoning.


The point is: being able to run a brain inside a computer is not the same as understanding that brain. If you wanted to build a new brain, you'd have to reach for the tool all the time in an iterative way and hope for the best. Only tools that aid understanding matter. We have very few tools that help DL researchers better understand what they are doing. Hence DL is more akin to science than to math/CS or engineering.


Same is true of a bridge or a combustion engine though?


Exactly. I think the word "science" has changed its meaning a lot over the years.

For some reason people tend to consider a field that is more "formal" (like pure math, or some CS concepts like the lambda calculus) to be more of a science, even though historically formal systems came very late, and in practice very few systems can be described that way.

I really wonder whether people who regurgitate "machine learning isn't science" think theory of evolution is science or not.


Science is knowing how the model scales when you throw this much processing power at it. They won't even tell us how much processing power that is.


Are you secretly Ali Rahimi? :-D


They are doing computer science.


Obligatory xkcd: https://xkcd.com/1838/


Unfortunately I've found the current OSS models to be vastly inferior to the OpenAI models. Would love to see someone actually get close to what they can do with GPT-3.5/4, except capable of running on commodity GPUs. What's the most impressive open model so far?


LLaMA 30B or 60B can be very impressive when correctly prompted. Deploying the 60B version is a challenge though and you might need to apply 4-bit quantization with something like https://github.com/PanQiWei/AutoGPTQ or https://github.com/qwopqwop200/GPTQ-for-LLaMa . Then you can improve the inference speed by using https://github.com/turboderp/exllama .

If you prefer to use an "instruct" model à la ChatGPT (i.e. that does not need few-shot learning to output good results) you can use something like this: https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored... The interesting thing with these Uncensored models is that they don't constantly answer that they cannot help you (which is what ChatGPT and GPT-4 are doing more and more).
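
For what it's worth, a rough sketch of the 4-bit AutoGPTQ route might look something like this in Python (the repo id, device and generation settings are placeholders, not a tested recipe; check the AutoGPTQ README for the current API):

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    repo = "someone/llama-30b-4bit-gptq"  # hypothetical repo id; point this at a real GPTQ export
    tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
    # Load pre-quantized 4-bit weights onto a single GPU.
    model = AutoGPTQForCausalLM.from_quantized(repo, device="cuda:0", use_safetensors=True)
    prompt = "Explain mixture-of-experts in two sentences."
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))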


Just a reminder that LLaMA is not open—in order to use it legally you have to agree to Meta's terms, which currently means research use only. The versions circulating on torrents are essentially pirated, and while I don't have an ethical problem with that at all, you can't use it safely in a business.

The open replacements for LLaMA have yet to reach 30B, let alone 65B.


If anyone has a copyright claim to an LLM, the creators of the input data have more of a copyright claim than the company that trained it. There's a good chance they are not copyrightable at all. I'd bet there's a lot of people willing to take on that risk.

However, they might still fall under trade secret law.


Why would an LLM be any less copyrightable than any other piece of software?


The "software" part of an LLM is pretty trivial -- the interesting piece is the the weights. Since the weights are mechanically generated by a computer, it can be argued that the weights are not copyrightable, just like a photograph taken by a monkey isn't copyrightable.


The software is the matrix multiplication and gradient descent. We are talking about the numbers in the matrices. They are the output of a training algorithm, so we can only talk about the copyright on the training algorithm, and on its input data.


The model weights could be seen as a derived work, for which they didn't get the permission of the original copyright holders. Alternatively, it can be argued that LLMs are no different from a fanfic writer trying to imitate the style of their favorite author.

It's not obvious which way it will go, but I can see the point of those arguing that LLM data are ill-gotten gains.


For the same reason that phone books cannot have copyright.


People always bring this up like it’s a big deal, but most users aren’t interested in starting a business. We just wanna play with LLMs.

Frankly, I’m glad we don’t have a bunch of llamas in different skins being hawked like the current crop of “AI” startups that are just thin layers over OpenAI’s API.


That hasn’t been true for a while. Falcon 40B seemingly outperforms LLaMA 65B according to the Open LLM Leaderboard

https://huggingface.co/tiiuae/falcon-40b


Fair enough. I haven't really looked at Falcon as a replacement for LLaMA yet because it isn't supported by llama.cpp, but it looks promising.


Falcon is an open (Apache licensed) replacement for LLaMA, with a 40B version that's competitive with LLaMA 65B on benchmarks.


4-bit quantization removes a lot of the model's sophistication, and 60B parameters is still smaller than what GPT4 is using.


The point is that it's infinitely better in not being there "just to take your jobs and make a few VCs richer". Nobody even claimed it's more performant. It's like the difference between getting nothing but keeping your land, and getting glass beads but losing your land. You have to completely ignore the meat of the argument to even pretend there is a contest.

And this is without considering what would happen if we stopped feeding hostile actors and supported ourselves, instead of continuing to do the reverse. Not just here and there, but consistently for decades.


That argument seems more political than practical.

If on one hand you have a tool that you can actually use to help with your job, and on the other a tool that sounds like a very advanced chatbot but doesn't actually provide value, then the second tool being open-source doesn't change the fact that it doesn't provide value.

(Also, assuming that open-source tools aren't going to upend a ton of people's jobs seems really naive. These people aren't going to be any less bitter that their jobs are taken by freelance nerds instead of corporate nerds.)


There is no way I am going to spin up my own worse LLM so a few people will make less money. Even if it was 1-5% better. It's just not worth the time.


It's not "a few people making less money", it's a few gigantic monoliths carving up the future, like blind watch-destroying gods -- or at least wanting to, no matter how nicely they dress it up. And it's not about utility or chance of success for everyone, either, but rather trying to do something in an ethical or more clean way just because that's more fun for them.

But I have to admit to being an idealist, and while I disagree with you because of that, I don't think you should be downvoted for basically just bringing up the majority position. It's easy to complain about people not being starry-eyed idealists who make great personal sacrifices to bring along a utopia for people in 10 generations, or whatever. It's way harder to find and teach the joy of doing something for the sake of doing it, and at the same time come up with medium- and long-term ideas that are realistic enough to make working towards them fulfilling, but also genuinely beautiful and true. The whole "rather than teaching people to build a ship (which we can't even agree on!), teach them how to long for the ocean" thing. It's a really hard problem.


Wow. Beautifully articulated.

What a pleasant reply to read. I don’t have an argument regarding my position other than I agree that what you’re saying is true and that getting people like me to care and make sacrifices not just today but every day in a long term way is what makes hard problems hard.


Thank you :) But I didn't mean to say it's those pesky non-dreamers who make the problem hard. Three idealists will have ten possible utopias that are partly or completely mutually exclusive.

And to be frank, I think a lot of the finger pointing at people who don't care enough about issue X or Y is really because of not having found good ways to work constructively and make progress with however few people who do care. Partly also because people cannot agree (for long, tend to splinter into more pure sub groups and all that).

At any rate, the way can't be "I should feel bad and do better", but rather "I want what they got!". And the burden can't be on the people who aren't yet seeing anything that makes them excited to get excited anyway. And it can't be about being selfless for the benefit of others, or future generations. It has to be its own reward right here and now. It is about and for you just as much as it is anyone else, if you know what I mean. Sacrificing others and sacrificing oneself is sacrificing people in both cases. Neither is noble IMO.

I guess the best chance of fighting tech giant strangleholds is still empowering "normal people" to carve out their own little spaces. All people will not finally learn how to make websites if only we crushed Facebook and what have you, but instead if more people had fun making their own little websites, and if we could come up with good ways for them to connect (peer-to-peer on the desktop, right after Linux!), Facebook and others would play nicer. It's not that big companies are a problem, it's the abusive things they do when they're the only game in town.

And likewise, and back on topic: a really good argument would be something I don't have the knowledge for, namely things you can do with a LLM that you can fully control (or at least can wildly poke at and experiment with, or just "download mods for") versus a much more powerful LLM that you don't really control, other than your prompts.

Thanks for reading!


> 60B parameters is still smaller than what GPT4 is using

I mean if the article is right, then it's about 3.3% the size of GPT 4 (although it's a sparse model so not all of it is used on every pass).

Meta also didn't train LLaMAs on nearly as much code it seems, so they're much worse for that in general.


Does it? The GPTQ paper claims that the accuracy loss is small.


Can't lose what it didn't have in the first place.


I remember reading somewhere that GPT-4 is not a single model but several models whose parameter counts are reported as a single sum. Perhaps quite doable, but at lower speeds?


The article linked here talks about GPT4 being a mixture of experts, which is exactly what you’re describing


How do you correctly prompt it? A lot of people are not familiar with how to do this; I think explaining it would improve the experience many people have with the non-OpenAI models.


How low can you get the memory and computational power requirements that way?


You can run that model (Wizard-30) on a computer with 64 gigabytes of RAM (or smaller, I don't know how tight you can cut it). You obviously want fast RAM and a good CPU, but you don't need a GPU.


I am running 30b llama models (4 bit quantized using llama.cpp) on 32 gb of ram and no GPU. I get around 2 tokens/second.
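
If anyone wants to reproduce that kind of CPU-only setup, here is a rough sketch with the llama-cpp-python bindings (model path, context size and thread count are placeholders for your own machine and quantized file):

    from llama_cpp import Llama

    # Point this at a 4-bit GGML quantization of a 30B model on your disk.
    llm = Llama(model_path="./models/llama-30b.q4_0.bin", n_ctx=2048, n_threads=8)
    out = llm("Q: What hardware can run a 30B model? A:",
              max_tokens=128, temperature=0.7, stop=["Q:"])
    print(out["choices"][0]["text"])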


You can also travel on a bike from NY to LA.


>You can also travel on a bike from NY to LA.

You can. In fact, my brother did so a bunch of years ago. He found it to be a wonderful experience that made his life better.

He's also flown on a commercial airplane from NY to LA (as have I, as well as millions of others) and while it got him to Los Angeles, it didn't provide the levels of sensory input, personal interactions and experience that riding his bicycle did.

That's not to say everyone should ride bicycles across the US every time they need/want to make such a trip, but doing so at least once can be a more positive experience than sitting next to some strangers for five hours.

The satisfaction of doing so, or the experiences in interacting with people and the landscape during such a trip aren't quantifiable, but reducing the value of doing so (if I'm missing your point here, my apologies) to the time required to make such a trip is reductive in the extreme IMHO.

Edit: Clarified my prose.


Love this comment so much. Your brother sounds cool :)


Just imagine if Boeing lobbied the government to pass a law banning you biking from NY to LA.


AFAIK you can get away with a swapfile, no need for large amounts of RAM.


Won't that nearly kill your SSD if you do it for extended periods of time?


Most of the RAM is for storing the model; once it is loaded it is read-only, so it will not harm an SSD.


It only reads from memory, not swap directly. If it needs to read something from swap, it'll write out something from memory to swap, then read the swap into memory. Reading 1 GB of swap will essentially write 1 GB to the SSD too (rough numbers).

Correct me if I misunderstand swap?


If the underlying data hasn't changed, the page isn't written to disk. CPUs do keep track of writes that mark a memory page as "dirty".


That's basically right. I'm not sure whether Linux or Windows will keep track of the pages it read out of swap to know if they're still there and valid, but there's a better way for this that I think at least ggml supports: it stores a copy of the model unpacked and ready on disk as a cache for doing the work, rather than relying on OS virtual memory to handle it. This should be faster than the OS VMM (though probably not by much), but since it knows which pieces it needs to leave on disk and where they are, it should be much safer as far as writes go, because it knows enough not to write multiple times like that.
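
A tiny illustration of the read-only mapping idea (the file name is a placeholder): pages mapped read-only can simply be dropped and re-read from the file under memory pressure, so nothing needs to be written back to the SSD.

    import mmap

    with open("model.ggml.bin", "rb") as f:
        weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        header = weights[:16]  # faults the first pages in from disk on demand
        # No writes are possible through this mapping, so under memory pressure
        # the kernel can simply discard these pages and re-read them later,
        # instead of writing anything out to swap.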


> The interesting thing with these Uncensored models is that they don't constantly answer that they cannot help you (which is what ChatGPT and GPT-4 are doing more and more)

that's great to hear. that political correctness in gpt is annoying.


Have you tried Falcon 40B Instruct? Also take into account that ChatGPT likely has some pre-prompt, whereas when talking to Falcon or other OSS models it's all in your hands.

Furthermore, not many people discuss the significance of proper output sampling. I myself used to test open-source models with greedy decoding only. Who knows, maybe they would even beat (not at all) OpenAI with some clever output sampling scheme.
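
For example, with Hugging Face transformers the difference between greedy decoding and a basic temperature/nucleus-sampling setup is just a couple of generate() arguments (the model name here is a small placeholder, not one of the models discussed above):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder; swap in whichever open model you are testing
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tok("The easiest way to run a local LLM is", return_tensors="pt")
    # Greedy decoding: always pick the most likely next token.
    greedy = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    # Nucleus sampling: draw from the top-p probability mass at a given temperature.
    sampled = model.generate(**inputs, max_new_tokens=40, do_sample=True,
                             temperature=0.8, top_p=0.95)
    print(tok.decode(greedy[0], skip_special_tokens=True))
    print(tok.decode(sampled[0], skip_special_tokens=True))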


Does anyone have a link / instructions on getting Falcon 40b to install on Apple Silicon?

Apparently "Hugging Face" have some internal swift code that works (but it has not been released). I'm keen to see how it performs on a maxed out Mac Studio (with all that unified memory available).


I thought this too, but it sounds like the performance of the GPU ultimately holds it back. Maybe it is the software, but I have yet to see a test on Mac silicon that performed well.


It is not too bad, 7B is doing 432 tokens per second on a Macbook[1].

[1] Video: https://huggingface.co/datasets/huggingface/documentation-im...


Idk, has anyone tried falcon yet? The support for running it remains nonexistent except for one fork of llama.cpp that isn't integrated into anything. This trend of every new model breaking compatibility really needs to stop.


I have, and I am running it locally. Mostly the 7B variant, which runs pretty much at ChatGPT speed (when streaming) on my Ryzen 3700 CPU + 32 GB RAM + 2x NVMe in a mirror (it doesn't fit in memory in its entirety; a few GB go into swap).

Of course to run it like that I have to be running nothing else. No xorg, no chromium etc. Just a pure linux console.

If you want to try Falcon 40B Instruct yourself, here is a public demo: https://huggingface.co/blog/falcon

Go to the bottom of the page.


While OpenAI likely has some insights that open-source and closed-source competitors are lacking, OpenAI is mostly in the lead because they can burn absurd amounts of cash running an absurd amount of compute via their partnership with Microsoft.


What are all the researchers in universities doing? Couldn't they improve these models (they do have big brains, after all) with taxpayers' money and put the results under some cool open-source license...


Yes, they are doing the improving, but then you need loads of money to do the training, which no university can afford. So now big tech is hiring promising university researchers for good money to scale up their research. This could be solved by massive decentralization where millions of users provide compute with their GPUs, and I think it will be at some point, because I believe FOSS is more powerful than this OpenAI BS. There are people working on this, but AFAIK the techniques aren't quite there. You need a different kind of model with much more parallelization than what is currently used.


> There are people working on this, but AFAIK the techniques aren't quite there. You need a different kind of model with much more parallelization than what is currently used.

What if crypto switched from mindless hashing as proof-of-work to training AI models as proof-of-work? That would suddenly make big computing resources available.


For blockchains using proof-of-work you need two things:

1) The work has to be very hard to do (and quantifiably so), but very easy to verify.

2) The block's transactions have to be an input to the computation, and it has to be impossible to get the same output with a modified set of transactions.

Cryptographic hash functions fulfill both of these requirements. Almost nothing else does.

However, if blockchains switch to proof-of-stake, then the GPUs previously dedicated to that blockchain are available for other purposes. But the biggest GPU-based blockchain already did that, and Bitcoin uses specialized hardware that can't do anything other than Bitcoin's hash function.
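
A toy sketch of why hashes satisfy both requirements: finding a nonce takes many attempts, while anyone can re-check the winning one with a single hash over the block's transactions, and changing the transactions invalidates it.

    import hashlib

    def pow_hash(transactions, nonce):
        return hashlib.sha256(f"{transactions}|{nonce}".encode()).hexdigest()

    def mine(transactions, difficulty=4):
        # Hard: brute-force a nonce until the hash starts with `difficulty` zeros.
        nonce = 0
        while not pow_hash(transactions, nonce).startswith("0" * difficulty):
            nonce += 1
        return nonce

    def verify(transactions, nonce, difficulty=4):
        # Easy: a single hash; changing the transactions invalidates the nonce.
        return pow_hash(transactions, nonce).startswith("0" * difficulty)

    txs = "alice->bob:1;bob->carol:2"
    nonce = mine(txs)
    print(verify(txs, nonce))                       # True
    print(verify(txs + ";mallory->bob:99", nonce))  # almost certainly False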


“What if running part of the paperclip maximiser that destroys humankind was profitable?” is both a bit of a nightmare sci-fi dystopia, and a reasonable description of companies in the modern economy.


That would indeed be neat; no idea if it is possible safely. Blockchain technology sure looks like a good fit for organizing such a decentralized model. The reaction of HN to 'blockchain-AI' probably won't be kind though, lol.


Blockchains depend on small inputs and outputs, and easy verifiability of the computation. AI training is none of that. It's a massively interconnected problem, Google and NVIDIA built traditionalish fast-interconnect supercomputers to do it.

Maybe it's possible to implement it in a decentralized way with not-completely-useless performance, but by then OpenAI and others will be even more ahead. (Sure, it'd be good to have such an implementation, so maybe enthusiasts will do it eventually, but that's mostly for fun.)


Yeah, that's the current state of course; I'm betting on future developments. Neural networks are still in their infancy. Stable Diffusion showed everyone how much more powerful open models are (at generating porn /s). ChatGPT is severely limited because OpenAI fears public outrage, litigation and regulation.


Yeah I keep telling people the universe played a joke on us with the crypto winter happening right before GPT-4 being released.

Flip the timeline and crypto from the start would have had this as a goal.


I knew about the lack of money, but I didn't know about the parallelization idea. Thanks for posting!


They are busily working on all the other large scale engineering projects that they are so good at producing and maintaining.


They produce papers that contain (sometimes) useful ideas. They don't produce code (other than proofs of concept), and they certainly can't afford to train a large model.


They don't have enough money and thus computing resources. It's a huge gap.


Probably Wizard 30B, they released the training weights at every offset as well.


In my testing, the chat and instruct-tuned versions of MPT-30B are very close to 3.5 for many tasks, but of course the team who made it got bought up immediately and it’s licensed only for non-commercial use. I’m hoping the open source community runs with the base model in the same way they did with LLaMA.


> You can use the above without paying OpenAI. You don't even need a GPU. There are no license issues like with the facebook llama.

I actually wrote about getting an LLM chatbot up and running a while ago: https://blog.kronis.dev/tutorials/self-hosting-an-ai-llm-cha...

It's good that the technology and models are both available for free, and you don't even need a GPU for it. However, there are still large memory requirements (if you want output that makes sense) and using CPU does result in somewhat slower performance.

There are async use cases where it can make sense, but for something like autocomplete or other near real time situations we're not there yet. Nor is the quality of the models comparable to some of the commercial offerings, at least not yet.

So I don't have it in me to blame anyone who forks over the money to a SaaS platform instead of getting a good GPU/RAM and hosting stuff themselves.

Here's hoping the more open options keep getting better and more competitive, though!


I bet that open models win in the end because of porn. There is already a very weird and vibrant community creating "waifus" and tinkering with these models.


Huh, more power to those folks then, I guess.

But I can easily imagine more conventional forms of entertainment, as well. Like a game of D&D that's narrated by the AI, or a text based adventure set in the Mass Effect universe, Lord of the Rings, Warhammer or any other fandom, really. Maybe like those old Choose Your Own Adventure games.

I think some companies are also experimenting with characters in video games that get their dialogue from these models - where the developers give the character a persona, provide information about events in the world and let players interact with them, like the Detective Origins demo. Of course, due to the slightly unpredictable nature of these models and their hardware requirements, no idea how viable this will be.


The application in games I'm most excited about is commentators in FIFA career mode that don't have a limited set of prerecorded voice lines, and take your recent games, formation changes etc. into account too, like real commentators would. The recent installments already do that to a small degree. Of course this would also easily open the door to having multiple commentators/analysts to choose from, each with their individual "personalities". The technology for that is pretty much all already there, right?

A game like Fallout/Elder Scrolls with AI generated NPCs and questlines would be so sick too if executed well.


There is a mod for Skyrim where someone piped together multiple AI models. It goes like this: you speak into your microphone and ask an NPC something. This gets transcribed (voice to text) by Whisper. The transcript gets sent to e.g. GPT-4 with a pre-prompt engineered to give background, current information and the "personality" of the NPC you are talking to. The output of this gets piped to a text-to-speech solution like ElevenLabs with the original NPC voice.
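
A very rough sketch of that pipeline in Python (the persona text, file name and the 2023-era openai ChatCompletion client are my assumptions, and the final TTS step is left as a placeholder; this is not the mod's actual code):

    import whisper   # openai-whisper
    import openai    # pre-1.0 openai client

    # 1) Speech to text.
    stt = whisper.load_model("base")
    player_line = stt.transcribe("mic_recording.wav")["text"]
    # 2) In-character reply via a pre-prompt holding the NPC's persona (made up here).
    persona = "You are Lydia, a housecarl in Whiterun. Answer briefly and stay in character."
    reply = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system", "content": persona},
                  {"role": "user", "content": player_line}],
    )["choices"][0]["message"]["content"]
    # 3) Hand `reply` to a text-to-speech service (e.g. ElevenLabs) using the NPC's voice.
    print(reply)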


I've seen an example and the weakest link seemed to be the TTS, which sounded several generations behind.


Maybe for side quests, but even that would be debugging hell.


You are describing AI Dungeon, a third-party product using GPT-3 that OpenAI killed about two years ago.


Do you seriously think D&D and other choose your adventure games will be more popular than porn? Seriously?


If they also include porn, maybe.


Did you try AI Dungeon before OpenAI killed it?

Yes, people used it for porn.


The (even pre-emptive!) opposition and censorship seems way stronger this time around than with previous technologies. Like instead of ignoring the porn (or even profiting from it), they are trying very hard to make it impossible.


Perhaps this could happen to make Stable Diffusion win against Midjourney/Adobe. LLMs however have more than enough commercial support to overcome weird communities, and these communities care less about text than images.


Porn could definitely be a killer app for LLMs, and I strongly hope that open source models win out for a number of reasons, but I'm not sure it will happen. Microsoft and OpenAI have quickly gone to great lengths to try and limit the amount of output they find offensive, and those efforts have been fairly well received by the general public.

Despite the size of the regular online porn industry, it still gets strangled by payment processors. There is plenty of appetite out there for restricting porn in various ways.


> Microsoft and OpenAI have quickly gone to great lengths to try and limit the amount of output they find offensive, and those efforts have been fairly well received by the general public.

It offends me. I think many find this neutering annoying, and many more will. I think Bing Chat is fine this way, and for the general public, but if I'm paying to access the GPT API, there should be an uncensored version. I'm an adult. I can handle 'offensive' things. It's offensive telling me I'm not allowed to see 'offensive' things.


Porn will save the world again!


> try to pressure the government into making it illegal for you to compete with us.

I mean the guy who created GPT-4 literally demanded a ban of any system more powerful than GPT-4.


Are you sure that's not a quote from a game of telephone?

What I've seen from the horse's mouth is more like:

"""There are several other areas I mentioned in my written testimony where I believe that companies like ours can partner with governments, including ensuring that the most powerful AI models adhere to a set of safety requirements, facilitating processes to develop and update safety measures, and examining opportunities for global coordination."""

Now, sure, I could probably take a malevolent moustache twirling villain monologue and use chatGPT to turn it into a bland and prosaic statement like that, or even something utterly milquetoast, but if I try to imagine Altman playing 5D chess with the aid of his own AGI, there's the much easier solution of… just not telling anyone you have GPT-4 in the first place while using its power to manipulate etc.


The written testimony didn't say "ban", but it did stress fairly complex regulation.

"First, it is vital that AI companies–especially those working on the most powerful models–adhere to an appropriate set of safety requirements, including internal and external testing prior to release and publication of evaluation results. To ensure this, the U.S. government should consider a combination of licensing or registration requirements for development and release of AI models above a crucial threshold of capabilities, alongside incentives for full compliance with these requirement"

https://www.judiciary.senate.gov/imo/media/doc/2023-05-16%20...


> The written testimony didn't say "ban", but it did stress fairly complex regulation

You feel that quotation justifies the adjective "complex"?


I didn't mean "complex" in some sort of flippant, derogatory way. The language is suggesting regulation with active government involvement and oversight (licensing, for example). In the range of regulations in the software world, that would be on the far right end. ITAR, for example, requires licensing. People regularly describe the process as "complex".


Hmm.

While I don't doubt that ITAR is especially complex, following legislation is a pretty common basic requirement in software development (say GDPR, tax rules, COPPA, etc.), so it really doesn't seem to me that this testimony is pushing for anything specific and detailed enough to justify the claim that they're calling for a particular level of regulatory complexity.


Yeah, I'm not trying to argue that what's being proposed is bad, just trying to clarify what he said. The licensing part does make it more like ITAR and less like GDPR, HIPAA, etc. Those sorts of laws don't involve active government activity to certify and license beforehand, though there are 3rd-party orgs that will do that sort of thing if you want.


> Yeah, I'm not trying to argue that what's being proposed is bad, just trying to clarify what he said.

Ah, fair enough then; it seemed to me like you were saying it was excessive and bad.


Same as any political argument today. Strip it completely of nuance. Put it in a Tweet. Get clicks.


Yes, they want a few companies (OpenAI, Google, not sure who the others are) to gatekeep potential competition through safety certification.


I don’t understand this. Won’t that hurt their progress on GPT-5?


The public position (as opposed to the rumour mills) is that they're not working on a 5, and don't intend to at least until they can figure out how to do it safely.


And safely here means non-disruptive to established businesses. If they create a tool so powerful that it can replace entire professional classes, that would just mean people can bypass the employers and get their value directly from an API.

OpenAI needs to make sure they have a business model where they can charge enterprise fees and provide value to corporations, not directly to people. ChatGPT was merely a publicity stunt. I do believe they got scared of how much value people could derive independently from it though, hence the "regulate this" pressure.


> And safely here means non disruptive to established businesses

Why would OpenAI care about that?

No, it means safety, as in not giving out dangerous answers that get people killed.


Because they belong to Microsoft. Because they are funded by rich people who own established businesses. Because they are the privileged 0.01% who are invested in the status quo. Because their whole networks consist of people with entrenched business interests. Pick one or more. Why would you think rich American engineers are in general any more worried about the overall safety of the world than about their own self-interest? That's what they show day in, day out with their decisions.


> Because they belong to Microsoft

They don't — 49%, and they get to buy themselves out of even that, and if they do that by eating MS's lunch? Then it would suck to be an MS shareholder.

And they've been cautious since before the investment, with the cautious announcement of GPT-2 a month before the change in corporate structure to allow investors.

But even if OpenAI were owned completely by MS, why should MS care? Disrupting all employment sounds like the raison d'être for most if not all of silicon valley.

At the very least, imagine all the employees MS could let go if GPT-4 was even 80% as good as the advertisements in the form of "buy my guide for how to use GPT to make a fortune with best selling books/award winning advertisements/fantastic business proposals" keep claiming.

> Why would you think rich American engineers are in general any more worried about the overall safety of the world than about their own self-interest?

One is a subset of the other.

People regularly disagree about how risky things are, but they don't generally pick "destroy all things" over "destroy that which is close to me" when they actually believe that's the real choice and not, say, a panicked Chicken Little (which is how I think LeCun sees Yudkowsky).


MS not owning OpenAI outright is a technicality. Letting it "suck to be an MS shareholder" through things done to MS subsidiaries would constitute dereliction of duty by MS officers. MS is not Silicon Valley; they're based in Seattle and predate 95% of SV. Besides, this is just one of my points. Even if everything you claimed were to be true, OpenAI still is, as I pointed out, part of the American Corporate establishment and despite their PR posturing, they won't provide us with technology to disrupt said establishment.

They are clearly trying to angle for maximizing the profitability of that establishment.


> OpenAI still is, as I pointed out, part of the American Corporate establishment and despite their PR posturing, they won't provide us with technology to disrupt said establishment

They're not acting like they're part of the American corporate establishment, and I don't understand why you think they are.

Rather than quote my other comment comparing against 3M, I'll link to it: https://news.ycombinator.com/item?id=36677182


Pointing to a company that was fined for dishonesty as an argument for taking another company's PR at face value makes no sense.


Eh? People as a rule care about others, and this tendency actually INCREASES as they get wealthier and safer and are less worried about themselves.

Your average OpenAI machine learning expert cares plenty about not killing people, just like you do.


> as in not giving out dangerous answers that get people killed

Literally taken, that is quite close to impossible.

A few days ago there was news of somebody who committed suicide after an interaction with a bot about nuclear risks or something similar.

To avoid that, the bot would have to be a high-ranking professional psychologist with an explicit purpose not to trigger destructive reactions.

And that would conflict with the nature of a "consultant", which is something that makes "a best effort to speak the truth" - a direction incompatible with "something reassuring".


> To avoid that, the bot would have to be a high-ranking professional psychologist with an explicit purpose not to trigger destructive reactions.

Sounds fairly doable, TBH, given what else they're doing reasonably well at.

> And that would fail the nature of a "consultant", which is something "under best effort to speak the truth" - incompatible direction with "something reassuring".

I don't think they're as incompatible as you appear to.


With reference to the case I mentioned (which seems like a decent borderline case for pointing out difficulties), what you want as a consultant is something that tells you "the likelihood of the event is bounded by this and that", not something that goes "Oh, everything will be all right, dear". To put it non-figuratively: truth (say, facts, or instances of computation) and comfort may not always be compatible.

> reasonably well

But I wrote «high-ranking professional». «Reasonably well» does not cover the cases relevant to full safety.

And, by the way, such an attitude will backfire heavily on false positives.

Anyway: the case presented is there to point to a few difficulties involved in the idea of «not giving out dangerous answers».


"I can't conceive of such failure mode thus we're safe"


Sorry, I do not understand what you mean...


I think they read the first line of your reply and took it as you arguing that it's nearly impossible for an LLM to give output that would get someone killed.


I was agreeing with them. Why assume wilful ignorance? Have we become the new reddit? People just yelling in disagreement? Maybe it's time to log out and delete this password...


No, Namaria, we were just trying to understand what you meant - which to us seemed cryptic.


Apologies, I just found your post confusing and was trying to make sense of it.


If that is the bar content for public consumption has to pass, we're dangerously close to book burning again.


Because there's more profit in a corporation paying you billions than in thousands of people paying you $10/mo each.


Huh? They'd sell access to the tool to the employers. They'd make a nice rent, employers would make a nice profit by firing all their employees, and customers would still have to go to employers to get whatever service done.


Or it means they are legitimately concerned about the capabilities of models more powerful than GPT-4 and their impacts on society. Didn't a bunch of influential people sign a letter calling for a six-month halt on AI research to figure out alignment?


Safety also might mean not providing a route/framework for an api loop to create a self destructing information virus that becomes the best worm in human history :)


I think the reality is that training a "GPT-5" which is as much of a bump as GPT-4 was over GPT-3 or that was over GPT-2 would cost in the billions of dollars using today's hardware so they're choosing to wait it out until the risk of them screwing it up doesn't cost as much as a state of the art microfabrication plant.


[or until someone else starts catching up]

The same dynamic that incentivized a mass rollout of these unaligned systems will be perfectly sufficient to incentivize mass rollout of stronger, also unaligned systems.


Yeah, that's one of Yudkowsky's fears.

I'm not going to take his fears as gospel, as he's spent so long focussing on his fear there's a danger of availability heuristic/attention biases having him take the worst case for everything.

I'm still going to promote caution, as the more potent a tech the greater the downside of being wrong, but it currently looks like we can cooperate with each other in this IRL iterated prisoner's dilemma.


More cynically, they think they've hit a wall and a projected GPT-5 won't be a huge or meaningful jump in performance, so they'd like everyone else to slow down too.


I believe that the superalignment thing will basically be GPT-5.


I think it's a necessary precursor to any future model. Would you want to spend ten times as much (or whatever) red-teaming the next increase in capabilities? And what would make you more confident in advance that the red team would spot everything?


Keep in mind, that was the idea originally. Then in 2018 Elon decided he wanted to be CEO, the board rejected that idea for now obvious reasons, and he reneged on 90% of a promised $1b donation. The only way forward was to become a for-profit company and do a normal funding round, which is what happened with Microsoft. Elon’s rug pull is why this happened.


And to this day Elon slanders OpenAI for taking profit.

It’s strange, because their corporate structure is highly unusual. Returns on investment are capped, so investors can’t make that much money. Also, the profit-generating division is wholly owned and controlled by the nonprofit. I don’t understand why Elon and others have so many problems with this.


OpenAI was so destitute, having only $100,000,000, that they had no choice but to fuck us all and lobby congress to make it illegal for us to use this new technology? That doesn't make sense.

If your choice is between $100M + doing earnest work, or $1B+ and debasing our free and competitive society, and you take the latter... what does that say about your collective character as an organization and as a set of people?


$100m buys you nothing. $1b is table stakes now.


I would love to use these... but they suck. They don't even come close to what OpenAI offers.


I think most of HN has only tried ChatGPT with GPT-3.5 in December 2022 when the OpenAI servers were getting hammered. That is why they're impressed by these slow and low quality local models and mistakenly think they're on par with OpenAI's offerings.

Honestly even reddit and teenagers on TikTok have a more accurate view on OpenAI vs. local LLMs than HN.


There’s not a single thing out there that even comes close to GPT-4. Not one that I’ve used anyway. Benchmarks be damned, it’s the experience that matters and I’ve yet to have an LLM blow my mind the way GPT-4 does.


Have you used Claude? I regularly use them instead of GPT4 because of the larger context window. 4 is still useful when I want to give instructions to the system (Claude will refuse to do a lot), but generally the responses seem on par with each other.


Sorry about the lnkd.in-style links, but I just posted this there and came here and felt it might be interesting to someone; they do work!

...

Here's a round-up of open-source projects focused on allowing you to run your own models locally ('AI'); they all take slightly different approaches, although under the hood many use the same models.

https://lnkd.in/exKqJZm8 A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation.

https://lnkd.in/etVCmZHB With OpenLLM, you can run inference with any open-source large-language models, deploy to the cloud or on-premises, and build powerful AI apps. State-of-the-art LLMs: built-in supports a wide range of open-source LLMs and model runtime, including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder and more.

https://lnkd.in/e7-NKGzJ LocalAI is a drop-in replacement REST API that's compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs (and not only) locally or on-prem with consumer-grade hardware, supporting multiple model families that are compatible with the ggml format. Does not require a GPU. (See the sketch after this list for a usage example.)

https://lnkd.in/ef_Sa9AN Multi-platform desktop app to download and run Large Language Models (LLMs) locally on your computer

https://lnkd.in/e288q-Wb A desktop app for local, private, secured AI experimentation. Included out of the box are: a known-good model API and a model downloader, with descriptions such as recommended hardware specs, model license, blake3/sha256 hashes, etc.; a simple note-taking app, with inference config per note (the note and its config are output in plain-text .mdx format); and a model inference streaming server (/completion endpoint, similar to OpenAI)

https://lnkd.in/eycRJn6b Transcribe and translate audio offline on your personal computer. Powered by OpenAI's Whisper.

https://lnkd.in/eUrtE3uQ The easiest way to install and use Stable Diffusion on your computer. Does not require technical knowledge, does not require pre-installed software. 1-click install, powerful features, friendly community.
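
Since LocalAI (third link above) exposes an OpenAI-compatible REST API, you can in principle point the regular openai client at it; a sketch, assuming a LocalAI server on localhost:8080 and a model name you have configured locally:

    import openai

    openai.api_base = "http://localhost:8080/v1"  # assumed local LocalAI endpoint
    openai.api_key = "not-needed-locally"         # LocalAI doesn't check the key
    resp = openai.ChatCompletion.create(
        model="ggml-gpt4all-j",  # placeholder: whatever model name you configured
        messages=[{"role": "user", "content": "Summarize mixture-of-experts in one line."}],
    )
    print(resp["choices"][0]["message"]["content"])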


See also, Orca and Falcon models which are also open source. I'm not sure if any frontends support them yet.


Wow, the top comment is neither relevant to the post, nor friendly, nor interesting. Activism, even with false premises. Many of us have tried, and those with a little sense left know that running your local LLM without a GPU is not really useful.

Besides, what does your post add to the discussion, and why is it the top posting?

Create your local LLM, use it, tell other people about how you did it exactly, and be happy. But why the heck do you need to fight a company in that space?

Where have the times gone when someone motivated to do something nice just went ahead and did it, instead of running in circles and telling everyone else what they should NOT do.


Well, in 1929-1930 the stock market crashed and regulating market actors became necessary. Now it’s about who can regulate the market best in their own interests, the way corporations dominate with bailouts and subsidies. You’re picking a fight with a small actor while OpenAI is spending millions doing the same thing, fighting for regulation to keep the status quo.


If OpenAI is open, the Congo Free State was a free state.


Enshittification comes fast these days.


>There are no license issues like with the facebook llama.

OpenLLaMA uses a dataset which does not seem to have gotten proper commercial licensing for the training data. There are potential licensing issues because the copyright situation is not well settled.


So far as anyone knows, this is not a derivative work but a transformative one, and therefore not subject to any licensing requirement.

You're right though, that's arguably still up for debate, but I think the precedent of transformative work is pretty well attested.


is it?

Because condensation is literally part of the definition of derivative, and basically the weights are a condensed form of the input data. It's some sort of lossy compression, when looking at it from the right point of view.

Summarization and translation are also clearly derivative.

The definition of transformative I found:

  - add something new (context of other books I guess, this one might pass)
  - with a further purpose or different character (further purpose clearly yes)
  - do not substitute for the original use of the work (this one I find difficult. In the case of books, probably. In the case of github, it aims to replace quite some aspects of it)


is it compression though? It's a model of the way our language works and contains a set of knowledge.

If I read 200 physics textbooks, I couldn't recall any of them exactly, but I could write another book... that would contain largely the same knowledge.

The same is true of AI, it's not "compression", it's "learning" (by recording statistics about relationships between pieces of information) -- it can't (usually) recreate whole works, it'll make stuff up and alter it and whathaveyou. How much of the sum total of the internet and all books can you compress into an n-gigabyte model? I've got an 8 gigabyte file here that seems remarkably smart, but it can't recite the contents of most books to me, though it knows their famous quotes and some excerpts.

The "compression" argument does seem to have some merit, but it also seems thin in the end and, in my opinion, unlikely to hold up in court, but we'll see.

I do think a very strong, probably winning argument could be made in court that it is transformative, but that's the nature of this whole debate, we don't know yet with certainty.


We must destroy OpenAI!


From the post:

    [Rumors that start to become lawsuits]

    Some speculations are:
    - LibGen (4M+ books)
    - Sci-Hub (80M+ papers)
    - All of GitHub 
This is the funniest, but in the end saddest, aspect. If ChatGPT was indeed trained on pirated content and was able to be(come) such a powerful tool, then copyright laws should have been abolished yesterday. If ChatGPT was not trained on all these resources out there, then think how much more powerful a tool it would be if it were; in that case copyright laws are actively stifling advancement and should have been abolished yesterday.


That is a very dangerous way to approach copyright laws. They are definitely abused by corporations like Disney, infamously so, but abolishing them is absolutely not the answer. Art makers are already struggling en masse; taking away their ability to earn money off their work isn't an answer, especially if it's just to train a predictive text generator.


Counterpoint: there are way too many "art makers", copyright is keeping the quality ceiling down, and good content loses the monetization game to spam - all while that "predictive text generator" is, for better or worse, pretty much the most magnificent piece of technology invented this century, and - for better or worse - likely to become the next major shift in the economy and in life.


How do you believe copyright is keeping the quality down in particular? As I see it, the reason is marketability and trying to conform to standards. Making a new Marvel Movie is sure to have a payoff. Making Parasite MIGHT have a higher payoff, but the risk is way higher.

Removing copyright might help get more spin-offs and reiterations of previous work, which does increase the scope, but taking away makers' ability to profit from their work in the traditional sense will bring us closer (back) to a time when commissioned artwork was the main art you saw.


Letting "makers" (studios) profit from "their" (stolen by unbalanced contracts) work for unreasonable amounts of time has clearly proven to be a failure while crowd-comissioned spaces (patreon etc) are thriving financially AND artistically.


- HN on social media : The powers are too centralized, future is decentralization, question is how

- HN on free software: is good

- HN on copyright : WAY TOO MUCH PEASANTS CLAIMING INDIVIDUAL RIGHTS, RIGHTS THAT ARENT EVEN REAL, ART BE CENTRALIZED FOR MAXIMUM MONOPOLY


Copyright does very little for individuals. Most benefits from the copyright system are accrued to large corporations.


> Most benefits from the copyright system are accrued to large corporations

Citation fucking needed. Among those who study copyright and inequality, none suggest abandoning it [1][2].

Within the context of machine learning, one of the only pillars buttressing individuals against multi-trillion dollar corporations is copyright [3].

[1] https://journals.library.wustl.edu/lawreview/article/id/5108...

[2] https://www.jstor.org/stable/1339714

[3] https://sfstandard.com/2023/07/10/cruise-and-waymos-24-7-san...


To nitpick, the second article is from 1970; back then a concept such as the Amazon Antitrust Paradox [1] would have gotten you burned at the stake in the legal field. Similarly, we also have an Elsevier Antitrust Paradox, a Nature(.com) Antitrust Paradox, and so on.

Just the fact that I cannot freely read an article published 53 years ago is beyond ludicrous. And if I pay JSTOR ($19.50/month) and read the 1970 article, do you think the money will go to Stephen Breyer [2], the author? No, 100% of the $19.50/month will go to Ithaka Harbors, the parent company of JSTOR [3]; that's where the "accrued to large corporations" lies.

[1] 2017, https://www.yalelawjournal.org/note/amazons-antitrust-parado..., Lina M. Khan is the chairwoman of the Federal Trade Commission since 2021.

[2] https://en.wikipedia.org/wiki/Stephen_Breyer

[3] "Ithaka's total revenue was $105 million in 2019, most of it ($79 million) from JSTOR service fees", https://en.wikipedia.org/wiki/Ithaka_Harbors


> the fact that I cannot freely read an article published 53 years ago is beyond ludicrous

And has as much to do with individual versus corporate power as the health of a single tree can speak for a forest.

Nobody is saying the current situation is good or even sustainable. OP just made a big claim for which there is no evidence, despite many looking for it.


"there is no evidence"

As quoted above, "Ithaka's total revenue was $105 million in 2019, most of it ($79 million) from JSTOR service fees". All those $79 million and more JSTOR has stolen from the authors. Although JSTOR should have been dissolved long before this for effectively killing Aaron Swartz. But yes, there is no justice in this world. I suppose that's my main contention, why pretend anymore, we live in a society, yada yada.


This isn’t a cogent argument. You’re identifying troubling behaviour. But it’s not being stitched into anything cohesive. Hiding bad rhetoric behind post-modernist nihilism is in vogue, but unproductive.


Not sure where you picked up nihilism in my reply; referencing the "we live in a society" meme, I actually smiled. Nevertheless, stopping the pretence today, painful as it is, and realizing the state of the world is far from perfect is the first step toward a better tomorrow.


There is no political debate in which increasing the opposition’s nihilism is not advantageous. Nihilism keeps people from thinking. It keeps people at home, away from voting booths. There is daylight between Panglossian utopia and dystopia. If you’re cynical, one of the most productive uses of your time might be engaging the other side and convincing them civic engagement is worthless.


Again, haven't brought up nihilism or cynicism.

Incidentally, just to give you some more to rummage about, staying away from the voting booth seems like the best thing to do and civic engagement is indeed worthless. Unless you are able to say and vote "No", effectively banning all the "sides" from political action, forcing new sides to emerge. No people will ever be free, representational democracy or not, if they can't even say "No". Not being able to say "No" is also the root of the cynicism.


I'm not sure what that has to do with abolishing copyright, when it mostly has to do with the weak and skewed protections current laws offer. They're not allowed to take names off copyrighted text, which is partly why that name is still on that paywalled text. They're not forbidden by law from buying their way into corporate ownership of copyright, which is how they get to paywall it in their name.

If they (in loose usage of the pronoun) were allowed to deface art, papers, code, anything copyrighted, they would. Stripping copyright text could even become an "etiquette". Conversely, if they were not allowed to profit off a copyrighted thing, with appropriate enforcement, they wouldn't. This should go without saying for core Internet users and software engineers.

I mean, haven't you heard what Oracle just said?


I never argued that copyright should be abolished, I was responding to a comment that essentially said that anyone who was opposed to copyright is doing so at the behest of large corporations at the expense of individuals, which is a really flawed and one sided way of looking at copyright laws, the first linked article you provided being a great example.

For the record, I can’t find a single large corporation pushing for the dissolution of copyright, and many clear examples of large corporations lobbying for increased copyright protections and terms. I think this puts the onus for providing evidence on those who would argue that corporations are completely failing to act in their own interest.


> I can’t find a single large corporation pushing for the dissolution of copyright

The defendant of every copyright infringement lawsuit.


Hacker News has at least two users!


Already copyright laws do not offer protection for writers (see the current [screen]writers strike [1]), visual artists, musicians (see the recent Taylor Swift re-recording debacle [2]), not to speak of the "lower" arts, crafts & merch, where colossi such as H&M and Zara steal art, designs, concepts regularly [3] (not to even mention the sweatshops).

Copyright laws, or in fact generally laws, are for the rich and powerful, perhaps even for the corporations, not the individuals, since they paid for them to be so [4], it's just us, individually, the hoi polloi, who are still trapped into believing that there is rhyme or reason to our current system.

The joke is that while our system extolls itself as the most efficient system in history, based on competition of equals and free markets (two contradictions in terms contradicting each other), it harbours terrible inefficiencies such as the copyright laws, the patent system, and so on.

[1] https://en.wikipedia.org/wiki/2023_Writers_Guild_of_America_...

[2] https://www.vox.com/culture/22278732/taylor-swift-re-recordi...

[3] https://www.boredpanda.com/zara-stealing-designs-copying-ind...

[4] The Power Corporations Have In Changing Laws, https://www.npr.org/2021/04/02/983925056/the-power-corporati...


Ask someone on the screenwriters' strike if they'd rather copyright was eliminated so they didn't get paid at all...


Most of them already don't get paid, hence the strike [1].

[1] "Writers say they want a living wage as streaming devalues their work even as it demands more of their time.", https://www.indiewire.com/news/business/writers-strike-2023-...


No, they get paid but would, understandably, like to be paid more. Something which would be even less likely to happen if copyright was abolished so anybody could use their output at zero cost.


From what I understand they are not getting paid since the paradigm changed: "the way it's changed is most of these streaming services focus on a metric called ARPU, which is the average revenue per user", "the main difference is it used to be that the incentives were linked. So the writers and the studios were trying to get people to watch the show and get high ratings and get people to pay attention to the show. Today, the streamer is trying to get people to subscribe to their service, so they're looking more at the aggregate of the service versus the individual show." [1]

[1] https://www.npr.org/2023/05/03/1173612099/why-writers-are-ha...


I'm not sure quite why you're continuing to post articles explaining why writers get paid less as streaming services' revenue is less linked to the ratings of new shows as an argument in favour of marking down the value of the intellectual property they create to zero...

My original point still stands. Ask the writers. They're not getting paid nothing yet, and they won't approve of your passionate advocacy of a future in which they are paid nothing.


I'm not quite sure why you are not understanding that with the streaming model of business the writers, and even the actors, no longer get residuals, here is another article [1]: yes, they get paid nothing.

I am advocating for the abolishment of copyright laws, not for not being rewarded for one's work.

[1] Euphoria actress Sydney Sweeney revealed in a recent interview with The Hollywood Reporter that "They don’t pay actors like they used to, and with streamers, you no longer get residuals.", https://nofilmschool.com/streaming-services-residuals


I'm not sure why you are not understanding that not getting residuals is not the same as not getting a salary or a per writing contribution payment. Or that production companies are not paying these salaries (and the rest of the production budget) out of the kindness of their hearts, but out of revenues accrued from networks and streaming services having to pay to screen their shows.

Or that writers will not be paid anything (and certainly not by streaming services) in a copyright-free world in which anybody is allowed to use any creative work in any way free of charge. The abolition of copyright laws is literally the abolition of virtually[1] all existing rewards for creating IP.

This is not complicated stuff: if it's genuinely news to you that writers aren't working for free, perhaps you are not the person to lecture us on how copyright should work.

[1]I guess writers could still ask for donations. They could do this already, but they prefer negotiating with networks for higher pay packages...


If it's genuinely news to you that exploitation is a normal day-to-day part of any industry, perhaps these pointless replies should end here.

Yes, writers have been working for free [1].

[1] "In October 2015, Wil Wheaton created a stir when he declared that he had turned down an offer to write for the Huffington Post. He refused, according to him, because they had declined to pay for his work, in keeping with their policy of reimbursing writers with "exposure" in lieu of payment." https://www.vox.com/2016/2/26/11106006/writing-for-free Guess what writers who aren't famous have to do? Work for free, for "exposure".


Strangely, it is not news to me that exploitation is a normal day-to-day part of any industry.

(This is why I have not made any statements to that effect, never mind done anything as ludicrous as post articles about unionised screenwriters seeking to negotiate a higher pay rate as evidence that "most" of them earned "nothing")

And this is also why I am not advocating a copyright-free world in which HuffPo has the right to sell ads around everything Wil Wheaton ever wrote without paying him a penny or even seeking his permission. Guess what writers whose work isn't copyrightable will be paid in? That's right, "exposure", and not even exposure with much prospect of paid compensation if their work takes off.


Hope at least you are getting paid and you don't deploy such ridiculous amounts of bad faith for free.

My reply from above "Most of them already don't get paid, hence the strike" was specifically in the context of the streaming platforms not paying residuals in the same manner the studios do, hence the reference to the article.

What I find most funny is that you don't even know or care about alternative solutions; you just assume the copyright laws are for the best in the best of all possible worlds [1] and anything else would be chaos.

[1] Tout est pour le mieux dans le meilleur des mondes possibles, https://en.wikipedia.org/wiki/Candide


What I find most funny is that you're implying there are alternative solutions that you do know and care about.

And yet instead of addressing my actual objection - that screenwriters seeking to get paid more for co-creating IP would be unlikely to see abolition of IP protection as a solution - by articulating those alternatives and how they would allow them to get paid more, you chose to assert they don't get paid.[1]

I mean, I'm not the one advocating the radical change here, even though actually I don't "assume that the copyright laws are for the best in the best of all possible worlds". So it's not really incumbent on me to answer my own objections by imagining screenwriter-satisfying solutions involving the abolition of copyright. If you actually had one and were able to advocate it with as much zeal as you have defended the claim "most of them already don't get paid" this might be a more interesting discussion.

[1]doesn't really matter if you were doing so "specifically in the context of streaming platform residuals" since whether or not they get residuals from a particular media type is irrelevant to the fact screenwriters are not in favour of proposals which would entail them losing both the residuals they do get and their job


Ok, I'll bite, we are already at the 6th reply, who cares anymore.

The alternative solution is, briefly put, to quote Geoffrey Hinton: socialism [1]. Put at greater length: fully automated luxury communism [2].

Once the first two tiers of Maslow's pyramid (physiological and safety needs) are covered by automated systems, people will be truly free; some will express themselves in writing, and expecting copyright protection over their work will seem as ridiculous as if one of us today started charging people money for using the word "the" [3]. We are currently tasked with giving rise to the automated systems.

Sorry, to Gordian problems [4] I have only Gordian solutions.

[1] Geoffrey Hinton - Two Paths to Intelligence, https://youtu.be/rGgGOccMEiY?t=3626

[2] https://www.versobooks.com/en-gb/products/476-fully-automate...

[3] https://www.nbcnews.com/news/us-news/ohio-state-university-o...

[4] https://en.wikipedia.org/wiki/Gordian_Knot


The laws themselves are dangerous and encourage the myth of ideas being akin to somebody's personal property.


Since when are struggling artists making use of the copyright system? They are protected by it in theory alone; they aren't taking anyone to court.


Is this how Terminator begins?


Not sure if you're being sarcastic but I agree. Humans should not have AI research or advanced AI at all. It (a) removes purpose from people, (b) presents a situation that is too alien for human minds to handle, (c) increases the addictiveness of technology and thereby pushes us further into growing the technological system, (d) crosses the "adaptability threshold", i.e. the point at which the PACE of technological development is so rapid that adaptation of our society to such a pace is no longer possible.

In short, we do not have the maturity to handle AI; we are playing with fire. Every person who contributes anything to AI development is responsible for the disasters that AI will bring, and all AI research should be destroyed.


It’s ironic you say: “we are playing with fire.” Playing with fire is, in large part, literally how humans have come to dominate this planet. Why stop now?


To turn your metaphor on its head, we aren’t playing with fire when we use it constructively; rather we are very carefully and thoughtfully deploying it, no doubt due to our gradual and deadly lessons with it over time. When we “play” with it (a la fireworks or neglected campfires), it wreaks rampant destruction.

Being we are basically toddlers with this new technology, I would argue the breathless speed at which it’s finding its way into our lives tells me we are not being careful or thoughtful with it.


Counterpoint: "playing with it" is the only way we have to actually master something. "Carefully and thoughtfully deploying it" only comes way after many people first extensively played with it (for any specific "it"), first because of curiosity (i.e. for shits and giggles), then for a quick buck.


Maybe because we're on the verge of being able to create fires which can actually consume the only home we have?

Playing with fire is in large part an ego and greed issue. Yes, it allows us to dominate, but at what cost?

I'd rather live a more balanced life than a greedy and ego driven life. I may not own the world, but I can be happy and sleep sound at night, and that matters.


On the verge? We set that fire in motion a century ago. Our home is nearly consumed.

Today is the hottest day in recorded history.

Yesterday was the hottest day in recorded history.

The day before yesterday was the hottest day in recorded history.

The day before the day before yesterday was the hottest day in recorded history.

The day before the day before the day before yesterday was the hottest day in recorded history.


We have had nuclear weapons for almost 80 years and the world still hasn't ended. And I think that nuclear weapons are way more dangerous than Markov chains on steroids.


I can't launch a tactical nuke because somebody wronged me, but I can create a disinformation campaign with the tools I have and optionally 2-3 smart, motivated individuals, for free.

Both can be equally devastating.

Or, if I want to go the extra mile, I can use the latter to create motivation for the utilization of the former. e.g. I may say that a country has WMDs, and maybe try to manufacture consent for destruction of these...

Oh, wait a minute...


> can create a disinformation campaign with the tools I have and optionally 2-3 smart, motivated individuals, for free

You can, and it may cause an unbelievable nuisance, but not with the devastating outcome of a tactical nuke. Can you prove otherwise? Russian disinformation came close, such as in the 2016 election, but that was state sponsored.

> Or, if I want to go the extra mile, I can use the latter to create motivation for the utilization of the former. e.g. I may say that a country has WMDs, and maybe try to manufacture consent for destruction of these...

You cannot. That was still a state's action. Not to mention that many countries had their own intelligence that no doubt had a different assessment. They weren't blind. They used the WMD argument as an excuse to join the US-led war.


>e.g. I may say that a country has WMDs, and maybe try to manufacture consent for destruction

too soon


What if Wargames had LLMs involved?


No, playing is how humans grow up to be adults that don't play, but think.


Are you serious? Us dominating the planet is NOT a good thing.


Serious as a heart attack. I hope that we don't have to resort to military force, but it may be the only option: https://futurism.com/ai-expert-bomb-datacenters


This is such a pessimistic view to take that I cannot even begin to describe how you are wrong.


I would love to see your rebuttals, especially since I have never seen any strong arguments in favour of AI being a net benefit to society, and I have thought and read about this at length.

Of course, I always expect downvotes on my posts here since there is a strong tendency towards loving technology on this site. But what I find most interesting is that there is absolutely no taking of responsibility for any technological creations.

Before you downvote, please just ask yourself the following: is it reasonable to say that all these latest AI developments have no serious risks? And in response to your reply, humanity is in quite a precarious state now, so isn't it expected that a sober analysis of it is rather pessimistic, especially with regard to hyper-advanced technologies?


> But what I find most interesting is that there is absolutely no taking of responsibility of any technological creations.

I appreciate your willingness to talk about it, but to be honest it doesn't seem like it matters much what you, or any of us (not singling you out in particular), think about it, does it? It probably doesn't even matter who these people are who should take responsibility. This is one genie, like the internet, that probably isn't going back in the bottle anytime soon. I haven't seen any argument against "AI" that's much different than those against "the internet" and "computers" that we've heard at other times in the last 50 years when tech hit new mind-blowing milestones. It just keeps going regardless, right?


It actually does make a difference. The genie is only partially out of the bottle, and a lot depends on what we allow it to be used on. If we sit idly by and allow the ingestion of everything that's written, for instance, including what's currently being written, and let bros make derivative works for a quick buck, then we've mostly killed the writer's incentive to write or publish. If we slow down and don't allow ripping one another off, it could have the opposite effect and truly lift all the boats at once. There's a concurrent thread about Sarah Silverman suing OpenAI; that's what I'm talking about here as well.


You can slow things down, but not by more than a few years, because of the gradual democratization of training foundation models. Right now training a model competitive with ChatGPT can be done for $150K (Microsoft Orca 13B). In a few years the cost will be low enough that individuals can train models. At that point regulating it will require draconian dictatorships.

I'm also very wary of the copyright angle on this, because just like we don't prohibit people from learning copyrighted materials in their brains, it feels very wrong to regulate how we train digital brains. I'm OK with forbidding the output of copies of individual existing copyrighted works, but we already have laws on the books for that. I find it downright immoral to prohibit the generation of works "in the style of". That again reeks of the kind of draconian society I don't want to live in.

People will always be willing to pay for human-made art, just like we pay more for handmade pots, even though machines can make them better, so I think the doomsayers who predict the end of art are flat out wrong. Easy access to mass-generated AI content could be the best thing that happened to true artists, just like chess AI that can beat every human player was the best thing that happened to the chess world. We need labeling laws that show the origin of works so people can choose whether they want artificial or human-made, but please not another extension of the copyright regime to become even more suffocating and hostile to cultural flourishing.


>> You can slow things down, but not by more than a few years, because of the gradual democratization of training foundation models.

Just to be clear, what's being "democratised" is the fine-tuning of second-tier, inferior-performance models; or pre-training of third-tier ones. In the game of training large neural nets, the players that can afford to train the largest models with the most amount of data and compute at any given time will continue to dominate for the foreseeable future.

To make it plain, maybe in a couple of years you'll be able to train GPT-4 on your student laptop (unlikely, but let's allow it for the sake of argument). You'll still not be able to get anywhere near the performance of GPT-6 or whatever OpenAI and Google will be able to train by then.

Academics, hobbyists and smaller companies will continue to play second fiddle to large corporations as long as the dominant paradigm is more data and more compute.


Capabilities of the open-source models are only increasing over time by objective measurement. Yes, every one of them is demonstrably inferior to GPT-4, but we have historical precedent that the cost of compute only ever goes down.

Additionally, assuming the leaked details given here are accurate, there might not be a GPT-6. This entire approach of AI via language models very well could be approaching a local maximum and/or have already reached the point of diminishing returns.

If that is the case, OpenAI's moat is guaranteed to run dry. It should be telling that very few of the improvements over the past few months involve the base model, rather they are value-adds like plug-ins and hooking it up to a VM, things that are not protected by training difficulty.


I don't think there's a problem with the pace of "progress"; the problem is what it is fed, and we could definitely make changes around that. For example, if I were an author I wouldn't want this stuff to be fed in without my permission.


> If we sit idly and allow for ingesting all what’s written for instance

It's already happened. What do you do now? For decades, it's happened regardless of robots.txt, so you have to assume it's all been ingested, by everything, everywhere. What now?


If we were a rational civilization we'd stop all scientific research immediately. First, there's a good chance the great filter is ahead of us and will be triggered by a technological breakthrough. Second, with nuclear weapons we got lucky in that it's extremely hard to separate the fissile isotope of uranium from mineral ores; if in the future we invent a powerful weapon that's easy to produce, organizations like al-Qaeda will destroy every city on Earth.


I can't figure out if this is a rebuttal by absurdity or serious?


I agree with you that we'd stop scientific research immediately, or at least most of it.


The great filter is definitely ahead of us if we stop all scientific research. Just ask the dinosaurs.


>is it reasonable to say that all these latest AI developments have no serious risks?

There are definitely problems today with people abusing ChatGPT for malicious purposes, but given the retardedness of current LLMs I don't think we need to worry about a AI mastermind taking over anytime soon.


The only people pissed are a bunch of developers who want to use it for their own good.

GPT4 costs are ridiculously cheap for the value you get out of it. Any other company wouldn’t even release it to the public like they’ve done


In today's world, "Science equals Capitalism".

Or at least Science is allowed to progress and get funded only as long as it serves the interest of Capitalism.


That's engineering, not science.


The National Science Foundation is the architect behind the whole STEM Pipeline. It's one and the same.


> We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

This is essentially the Capitalist Credo, expressed in practical vs theoretical terms.


> pressure the government into making it

Not really, Capitalism is free trade between two parties.

The more government involvement you have the more it moves towards socialism or communism where the government controls trade.

At least this was the historical meaning. These days Capitalism is being redefined to mean private (non-government) Communism. That is, power concentrated in the hands of a few.


Capitalism is not free trade between two parties. That's free market. Capitalism is control by capital.


There is no free market without Capitalism. Capitalism is where private citizens control the capital, not the government.

It's not possible to have a free market without competitive markets, price systems, private property, property rights recognition, and voluntary exchange.


??? Bazaars existed for millennia before capitalism arose.

> Capitalism is where private citizens control the capital.

Well, that's a tautology, but based on the examples you cite in the next sentence, you're mixing up "property" with "capital". Capital is specifically the means of production. All capital is property, but not all property is capital. A factory filled with machinery is capital; your toothbrush isn't.

> It's not possible to have a free market without competitive markets, price systems, private property, property rights recognition, and voluntary exchange.

Most of these predate capitalism.


The word predates Karl Marx, but he made it popular, and the meaning usually refers to his use. I think it's great there are a million writers with different ideas, and I wish Marxists would read more of them instead of more of him and his other cult members. They should try and write a manifesto too. It's easy. Write two. Then three works of fiction, each with a different utopia.


I would recommend reading this entire page: https://plato.stanford.edu/entries/socialism/

But if you don't have the time for that, just read the intro paragraph.


> Capitalism is being redefined to mean private (non-government) Communism

I've heard this called super-capitalism, yeah.


You forgot to link where I can buy your tin foil hat at the end.


> "Open" AI, a charity to benefit us all by pushing and publishing the frontier of scientific knowledge.

> Nevermind, fuckers, actually it's just to take your jobs and make a few VCs richer. We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

1. I don't think this is the right place for this kind of content, perhaps find your way back to Twitter or Reddit

2. Have you contributed funds to OpenAI? If not, where did your sense of entitlement come from?

3. What makes you think that any of what OpenAI has produced and provided would be available without funding? I assume the answer to 2 above will be no, so how do you expect them to build without funds?

and 4. What's stopping you from creating what you thought OpenAI should be? Feel free. Nobody's stopping you.


> 2. Have you contributed funds to OpenAI? If not, where did your sense of entitlement come from?

Is your argument that we need to have given a company money before we can be opposed to their unethical business practices? We're talking about a company that wantonly disregarded copyright, broke the DMCA billions of times (proving that law is for completely destroying individuals who want to personally enjoy some media they probably otherwise wouldn't buy, not for corporations who want to hoard the wealth collectively made by all of society), charges its users even when its product is not delivered due to their own server errors, and then goes in front of congress to try to put up a regulatory moat to make competition illegal. Everything they do is based on "the law applies to you, not us." And you're saying that we're entitled for being outraged by this?

> 4. What's stopping you from creating what you thought OpenAI should be? Feel free. Nobody stopping you.

That's literally what Sam Altman is trying to get congress to do. This will literally be illegal if we don't fight back. This is literally what the poster you're arguing against is saying is happening. How many more "literally"s do I need here!?


I think his post is appropriate here, and on point. How about responding to his arguments about algorithm secrecy and the push to regulate competition?


Why the vitriol towards OpenAI?

If Elon hadn't pulled the rug out from under them after they refused his forceful takeover*, they wouldn't have had to go to Microsoft and they'd still be open.

* a takeover which he predicated on the claim that OpenAI was "doomed to fail"


It's several reasons. First, it's the lies and the abuse of a charity; that wouldn't be an issue if they had begun as a private company instead of robbing a charity.

But secondly, even if they were a private company, it's dishonest and reprehensible to claim to congress that you want to "protect the public" when you really only give a shit about protecting your moat, I'm not happy about that either.

I'm also tired of tech oligarchs' general tomfuckery in all our daily lives, as I suspect many more people here are. OpenAI is just particularly egregious about it.

I also think it's my civic duty to let other developers know that OpenAI does not have, by any stretch of the imagination, a stranglehold on this technology or any secret sauce. That's why they're lying and sweating in front of congress.


Yeah, they went from v1 to regulatory capture in the span of months


> regulatory capture

If this leak is correct, regulatory capture is likely the only moat OpenAI could have hoped for. It would explain why Sam was so absolutely adamant that this tech needed to receive oversight.

If correct, every big tech company now has a recipe to build their own GPT-4. I'd expect for the open source efforts to try to duplicate the results as well. LLMs will increase in quality across the board and become fairly interchangeable, leading to thin margins and a race to the bottom.

We could be watching $13 billion going up in flames, all of it predicated on a secret as flimsy as the Coca-Cola recipe.


$13 billion isn't $13 billion if $N billion of it has to mandatorily be spent on high margin (70%?) Azure services.


Tbf the Coca-Cola recipe is a more sustainable trade secret, since the product is "perfected".

GPT4 is a work in progress so a competitor can be objectively better.

I’m hoping for open source models to start incorporating some of these ideas.


Just wait until the RLHF class action hits them.


Did you actually read my comment or did you have a preplanned diatribe?

They didn't start the private company until the person who promised them $1B reneged. That person reneged because they tried to forcefully take over and were rebuked.

They were running out of money and forced to raise funds in a very for-profit way or fold.

-

Edit: Rate limited because the hivemind has decided that it's unacceptable to insult their leader

People keep acting like if it wasn't for OpenAI we'd be in some LLM utopia. The reality is some other big tech giant would reach the current SOTA first and we'd be in the same predicament except with a company with 1000x more machinery to do the things you're complaining about.

It's ridiculous the sense of entitlement some people must have to keep insisting that OpenAI should have crawled into a cave and died because some megalomaniac threw a tantrum.


So fold.

Running out of money to run a charity means they now suddenly need to fuck over the people the charity was supposed to help (that is, everyone on earth)?

Imagine if a feed the homeless charity accepted private investment and went around installing anti-homeless spikes everywhere after stopping feeding anyone. It's a similar style of behavior.


It's not even remotely the same as anti-homeless spikes.

We've just had the massive fine against 3M for knowing and hiding the risks of PFAS, and the top comments here were "increase the fines! Lock up the bosses!"

Now we have a company going "we had to put a lot of effort into preventing this model from cheerfully outputting Al Qaeda propaganda, explicit rape threats, and detailed instructions for an amateur to make deadly chemicals using only home supplies", and despite that effort they still had legal trouble in Italy because the output was unsuitable for minors (as well as the GDPR issues that were the primary headline at the time).

And a lot of researchers with no financial incentives saying "yup, there's danger in these AI".

The reaction here?

Disbelief OpenAI might be saying what they mean, and meaning what they say.

And people parroting "moat!" as if none of the other FAANGs could trivially cross any regulatory barriers that emerge.

It's like the entire topic of AI has been politicised harder than "is nuclear power green?"


> "outputs al qaeda propaganda"

So does Microsoft Word. Both require a human to tell the software what to output.

> as if none of the other FAANGs could trivially cross any regulatory barriers that emerge.

That's the point.

This miracle technology does not belong to a few rich men, it belongs to us all.

Tech oligarchs are provably not more responsible than the rest of us, and do not deserve to lock us out of the garden whose fruit they seek to pick.

A rich man can meet the regulations that allow him to build a robot to take your job, but YOU are not allowed to build the same robot to help boost your own income? Because that might be irresponsible?


> So does Microsoft Word. Both require a human to tell the software what to output.

Clippy does what now?

Or do you mean "I can type", because if so you're minimising the very same capabilities that you're later describing as a miracle and saying belongs to us all.

> Tech oligarchs are provably not more responsible than the rest of us, and do not deserve to lock us out of the garden whose fruit they seek to pick.

"No more responsible than the rest of us" is a dangerously low standard.

The rest of us, collectively rather than each and every one of us, play lotteries, drive dangerously, addict ourselves to drugs, pickle our livers, and win Darwin awards.

For all our sophistication and sophistry, we're all just fancy balding primates with fairly similar tribal attitudes and motivations.

> YOU are not allowed to build the same robot to help boost your own income

Yes, obviously, with literal robots there are countless examples of public liability insurance and health & safety legislation. With computers, likewise, because they're connected to stuff.


Lol, I’ve never heard a single person suggest that we’d be living in some LLM utopia if OpenAI was gone.

Who have you been talking to?


> the abuse of a charity,

A non-profit and charity are two different things.

While a charity is a form of non-profit, it has to follow certain rules to qualify as one. Their profits must go towards the charity.

A non-profit is a company that is set up not to make a profit. It is allowed to make a profit if it does. This is what OpenAI was.

They switched to a "capped" for-profit model so that they could get more funding. It also allowed their employees to invest in the company, and OpenAI gave equity to its employees.

There were no lies or abuse. Where did you get that information from?


> It is allowed to make a profit if it does.

A nonprofit is subject to the non-distribution constraint: any revenues that exceed expenses must be committed to the organization's purpose, not taken by private parties.

https://en.wikipedia.org/wiki/Nonprofit_organization


The "regulatory capture" conspiracy theory makes no sense to me. It takes 9 figures in cash to create and run one of these super big models. Only big tech was ever going to create them, and big tech is already very experienced at navigating regulation, regulation wasn't ever going to stop them from competing. And in general, our democracy works better than the nihilist libertarians give it credit for.


I bet it'll be 6 figures within 18 months.


The thing about these models is compute scales quadratically with model dimensionality and memory scales quadratically with sequence length.

We are nowhere near diminishing returns for either variable, so sure, current models may scale quickly, but the cutting edge will want as much compute as possible for a long time.

That’s kind of the humor of everyone saying this leak somehow leaves OpenAI vulnerable.

The work isn’t deciding if an MoE is the right architecture or not, it’s how to run 25k GPUs concurrently in a fault tolerant way (likely the true reason for the deep Azure links).


Memory does not scale quadratically with sequence length.


During training, you have to store a dot product of Q and V that has dimension N_ctx^2.

That's quadratic scaling, no?


Presumably you mean a dot product of Q and K, and no, you do not have to store this: https://arxiv.org/abs/2205.14135


I mean, sure you can work around it, but from your own link:

>since the time and memory complexity of self-attention are quadratic in sequence length


Except in practice this is not true, and hasn't been for more than a year. It's not just a workaround either -- FlashAttention is both faster at runtime and uses less memory.
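
For anyone who wants to see the difference concretely, here is a minimal PyTorch sketch (my own illustration, not code from the paper or from OpenAI): the naive version materializes the full N x N score matrix, while the fused scaled_dot_product_attention kernel in PyTorch 2.x computes the same result blockwise without ever storing it.

    import torch
    import torch.nn.functional as F

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    B, H, N, D = 1, 8, 2048, 64  # batch, heads, sequence length, head dim
    q = torch.randn(B, H, N, D, device=device, dtype=dtype)
    k = torch.randn(B, H, N, D, device=device, dtype=dtype)
    v = torch.randn(B, H, N, D, device=device, dtype=dtype)

    # Naive attention: materializes an (N x N) score matrix per head,
    # so activation memory grows quadratically with sequence length.
    scores = (q @ k.transpose(-2, -1)) / D**0.5      # (B, H, N, N)
    naive_out = torch.softmax(scores, dim=-1) @ v    # (B, H, N, D)

    # Fused, FlashAttention-style kernel: same math, computed in tiles,
    # never holding the full (N x N) matrix in memory.
    fused_out = F.scaled_dot_product_attention(q, k, v)

    print(torch.allclose(naive_out, fused_out, atol=1e-2, rtol=1e-2))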


Even if it was free to train, FAANGs can beat OpenAI on spending to follow regulations.


I interpreted OpenAI's regulatory capture bid as more of an attempt to create competition-hostile regulation than an attempt to reduce regulation to cut costs.


My point is that openai's competitors have no problem handling regulations.


It was a massive bait and switch; luckily AI isn't powerful enough to take over the world, or we'd be done for.


This is a really interesting spin on reality.


I think this commentary was pithier in your head...

https://hackernoon.com/how-openai-transitioned-from-a-nonpro...


A "capped" profit of 100x is absolutely ridiculous. That is not even the slightest attempt to stay in the same ballpark as a nonprofit.

Blaming any loss of Elon money for that is spin.

And he still gave them a hundred million.


Not sure what you're on about, both my comment and the article are about how they went for-profit.

They went for-profit because they needed to raise.

They raised after Elon reneged a fraction of the way into his promise of $1B. (and not 100MM deep: he silently revised his figures after TC started digging.)

It's not complicated but you seem to want to make it complicated.


They didn't need to go that far, and they gaslit people about it. Elon is not responsible for either of those.

And was their existing money pile actually not enough?


If it involves Elon being a bad guy, it is certain to have HNers salivating at the thought.


At this point, after attempting to start a literal fight with Zuckerberg, Musk is proposing a penis-measuring contest.

He appears to be decompensating in real time. If that's salivation fodder, so be it, but it just makes me sad. You hate to see it happen... or at least I do.


Maybe Russia has been injecting lead into his water supply lol.

I’m also saddened.

Musk is not really a hero or a villain, but his manic stages have given us our first realistic shot at becoming a spacefaring civilization, and moved the needle big time on the lock that the petro cartels had on the automotive industry vis-a-vis electric cars.

I hope Elon gets better. Losing a billionaire tech maximalist's manic episodes is going to set us back decades as more reasonable people chase profit instead of dreams.


It feels very ignorant to think you can diagnose someone as having manic episodes when you know nothing about them as a person, their motivations, or their mental health history, and base all your opinions on mainstream outrage over tweets that are less dumb than most people's.


I don’t have a diagnosis obviously, I mean manic as a description of his apparent behavior, not as a pathology. Still, he seems to be not doing great if his social media is to be taken at face value (which is dicey at best) so I do hope that he gets better.


He has admitted it.


He's joking but making less than no effort to communicate like someone who owns a large company normally would.


I mean, seemed like an obvious joke to me. I thought it was funny.


[flagged]


Can we please stop trying to diagnose mental health issues when we have no background to do so and don’t actually know the patient


My apologies, however Elon has identified as bipolar publicly.

https://www.dailymail.co.uk/health/article-4746914/Elon-Musk...

As someone who has bipolar as well, I think it's an important thing to talk about and I'm glad he has. Now that he has, I hope we can too without being shut down. It's not uncommon in our field but it has a huge stigma attached to it that's unhelpful to the people with it, or to the people who are affected by a friend's, coworker's, or loved one's mania or depression. When I'm under a lot of stress I tend towards mania, especially when it's "good stress" like an achievement or a great new job or something really exciting to work on. Inexorably I get drawn into a pit of despair, especially as I start to realize the impact my mania has had on my relationships and reputation. I have a good network and good self-awareness built over 30 years of meditation and Buddhist study, so the impacts are mitigated.

I suspect if people understood bipolar and were willing to discuss and learn, we might understand better what Elon does and why. He's not bad. He's just different. Bipolar is considered a dimension of neurodiversity and, like other aspects such as autism, is nothing to be ashamed of.


He literally tweeted that he's maybe not medically bipolar. Don't take Daily Mail headlines at face value.


Every company who promotes and develops AI is morally responsible for the coming disaster that it will bring on us. If I could have one wish it would be that every trace of AI research is destroyed.


> develops AI is morally responsible

Responsible for an attempt to arrive at the construction of production facilities for a good that seems to be in dire scarcity in today's world: intelligence.

If somebody comes and implements "artificial morons", that is actually out of the root that made the field of research necessary.


I recommend reading comments like this and substituting "a baby" for AI. A baby also can't be aligned and is capable of deciding to destroy the world. It's not gonna do it though.


A baby doesn't output propaganda at 6GB/s in computer readable text.


Not with that attitude! /s


"*The post about GPT-4's architecture had been removed due to a copyright claim.", https://twitter.com/Yampeleg/status/1678582275561103360


> This, of course, is “only” a batch size of 7.5 million tokens per expert due to not every expert seeing all tokens.

> Mixture of Expert Tradeoffs: There are multiple MoE tradeoffs taken: For example, MoE is incredibly difficult to deal with on inference because not every part of the model is utilized on every token generation.

Are these experts able to communicate among themselves within one query? How do they get selected? How do they know who to pass information to?

Would I be able to influence the selection of experts by how I create my questions? For example to ensure that a question about code gets passed directly to an expert in code? I feel silly asking this question, but I honestly have no idea how to interpret this.


You shouldn't take the "mixture of experts" too literally, it's yet another architecture to use internally for a gradient descent optimized graph of ops.

I obviously don't know how GPT-4 does it (or if it even does it), but think of partitioning your network into a couple of very isolated sub-graphs (the "experts"), and adding another learnable network between the input tokens and the experts that learns to route tokens to 1 or more expert sub-graphs. The gain is that you can potentially skip running the unused sub-graphs completely for that token, and you can distribute them across other GPUs since, except for the input and output, they are independent of each other.

It all depends on the problem, data, and if the gradient descent optimizer can find a way to actually partition the problem usefully using the router and "experts".
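
A toy sketch of that idea in PyTorch, in case it helps (entirely illustrative; the sizes, the top-2 routing, and everything else here are my assumptions, not anything known about GPT-4):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMoELayer(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
            super().__init__()
            # The isolated sub-graphs ("experts"): here just small MLPs.
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ])
            # The learnable router between tokens and experts.
            self.router = nn.Linear(d_model, n_experts)
            self.top_k = top_k

        def forward(self, x):                        # x: (tokens, d_model)
            gate_logits = self.router(x)             # (tokens, n_experts)
            weights, chosen = gate_logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)     # renormalize over the chosen experts
            out = torch.zeros_like(x)
            # Each token only runs through its top-k experts; the rest are skipped,
            # which is where the inference-time savings come from.
            for i, expert in enumerate(self.experts):
                token_idx, slot = (chosen == i).nonzero(as_tuple=True)
                if token_idx.numel():
                    out[token_idx] += weights[token_idx, slot, None] * expert(x[token_idx])
            return out

    layer = ToyMoELayer()
    print(layer(torch.randn(16, 512)).shape)         # torch.Size([16, 512])

Because the experts are only tied together at the input and output, each one can live on its own GPU, which is presumably part of the appeal at that scale.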


Recently I was saying how much amazing stuff there is in retro computing. One thing that keeps coming to mind for me is just how visionary Thinking Machines' Connection Machine supercomputer architecture was, with its massive parallelism built in, with neural network applications being a key predicted use case at the time. That was so long ago!

Interesting to think about in comparison to the challenges today around parallelizing 'commodity' GPUs. Scare quotes because the A100 and H100 are pretty impressive machines in and of themselves.



This is a duplicate post of pure speculation.


> The conspiracy theory that the new GPT-4 quality had been deteriorated might be simply because they are letting the oracle model accept lower probability sequences from the speculative decoding model.

Whether or not this specific theory is true, something along these lines seems like the most likely explanation for the quality degradation that many have noticed, where OpenAI's claims about not changing the model are both technically true and completely misleading.
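
For context, the acceptance rule from the published speculative-decoding papers (nothing OpenAI-specific) accepts a draft token with probability min(1, p_oracle / p_draft) and resamples on rejection, which provably preserves the oracle model's output distribution; the theory above is that loosening this test buys speed at the cost of distribution drift. A rough sketch, where the relax knob is purely hypothetical:

    import random

    def accept_draft_token(p_oracle: float, p_draft: float, relax: float = 1.0) -> bool:
        # Standard rule: accept with probability min(1, p_oracle / p_draft).
        # relax > 1.0 is a made-up knob for illustration: it accepts more of the
        # draft model's tokens, speeding decoding up but drifting away from the
        # oracle model's distribution.
        return random.random() < min(1.0, relax * p_oracle / p_draft)

    # The draft model was overconfident here (p_draft > p_oracle), so under the
    # exact rule this token is only accepted 25% of the time.
    print(accept_draft_token(p_oracle=0.10, p_draft=0.40))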


It is a bit problematic if it is being trained on copyrighted textbooks without compensation for the authors. Even for open-source science, I think it is a bit unethical if OpenAI is using publicly funded research without attribution or compensation. Taxpayers paid for those NIH grants, you know...


Wait til you see all the copyrighted and pirated data most large language models are trained on:

- https://www.washingtonpost.com/technology/interactive/2023/a...

- https://pile.eleuther.ai/ (data hosted by https://the-eye.eu/, where it's not too hard to find pirated, copyrighted books, e.g. https://the-eye.eu/public/Books/cdn.preterhuman.net/texts/li...)


I've previously noticed when playing with GPT-4 it can sometimes 'autocomplete' on different sections of the text it's feeding back, sometimes what looks like 4 or more different sections. Might be unrelated but is this MoE in action or them streaming the response in some way?


This is just an issue with their frontend that seems to occur when it encounters \n\n. The actual data coming in only changes at the end of the message.


“Leaked” seems like a strong clickbait claim from whoever wrote this, along with the “it’s over” part….


Leaked is I think an accurate term -- this (or the original post) is fairly new information leaked from openai.


> It is over.

What does this mean?


It's the tweet equivalent of one of those obnoxious YouTube 'reaction' thumbnails.


“It’s over” is the latest meme phrase to describe some kind of defeat.


If this is true, anyone [1] can now build a GPT-4 given training data and budget.

There's no magic here.

[1] That's probably twenty or so orgs right now, which will blow away OpenAI's moat and margins.


presumably the speculation


> Part of this extremely low utilization is due to an absurd number of failures requiring checkpoints that needed to be restarted from.

Hahahaha, so true for anyone who has worked with quanty types running Python code at scale on a cluster.


There's a section at the end where there is speculation on what the entire dataset entails. My guess is a chunk of it is probably from ChatGPT data (or GPT3 data from when training on your requests was opt-out rather than opt-in).


No, this is fake, a light dusting of nothing on top of a meme post that was circulating in grifting communities as early as Q4 2022. It gains a little bit in every retelling; sort of impressive to see it's almost at blog scale.


No, a meme post from 2022 did not in fact reference papers posted in 2023.

You must be thinking of some other post, or you're just making stuff up.


Like I said, it gains a little bit in every retelling. Why are you aggressively defending unsourced tripe?


well as someone out of the loop - is there a source on the Q4 2022 version then?


Not sure about the Q4 2022 version, but there was a post [1] a month ago that also claimed something like an MoE with 16 experts and got a lot of attention, and some similar-ish rumors before too, with less detail.

So it could be either more detail leaking over time, or just a random made-up post that took root a long time ago and is continually being retold with slightly more guesstimated detail added on each time to make the poster sound like they're in the know.

Impossible to tell until the actual details are confirmed I guess.

[1] https://news.ycombinator.com/item?id=36413296


That explains why I got an ad when I tried to click through to the tweet.


I wonder if any open-source MoE models are being worked on. Could I run an 8x13B model on my 16GB graphics card, only loading the expert that is needed per run?
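
Rough numbers, under the assumption of 13B parameters per expert and ignoring activations and the KV cache:

    params = 13e9  # one 13B expert
    for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
        print(f"one 13B expert at {name}: ~{params * bytes_per_param / 1e9:.1f} GB")
    # fp16  -> ~26.0 GB (does not fit in 16 GB)
    # int8  -> ~13.0 GB (barely fits)
    # 4-bit -> ~6.5 GB  (fits with room to spare)

So a single quantized expert fits, but the catch is that in the published MoE designs routing happens per token and per layer, so "only loading the expert that is needed" could mean swapping expert weights constantly unless the router's choices turn out to be very stable.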


So George Hotz was right


These words are nonsense to me. Can someone explain?


GPT-4 is the name of a machine learning (language) model that is the basis of chatGPT. This post speculates about its internals.

https://en.wikipedia.org/wiki/GPT-4


I meant things like:

- parameters

- layers

- "Mixture Of Experts"

- tokens

That's about as far as I made it


Parameters: In the context of AI and language models, parameters refer to the internal settings or variables that an AI model uses to make predictions or generate responses. Think of them as the knobs and switches that can be adjusted to fine-tune how the AI understands and generates language. These parameters are learned during the training process, where the AI model analyzes vast amounts of data to optimize its performance.

Layers: In AI, layers are like stacked building blocks within a neural network, which is the fundamental structure of many AI models. Each layer performs different computations, transforming the input data as it passes through them. Think of a layer as a specific task or filter that the AI model can utilize to understand and process information. The deeper the neural network, the more layers it has, allowing for more complex patterns and representations to be learned.

"Mixture Of Experts": An "MoE" is an approach in AI that combines multiple specialized AI models, known as "experts," to work together on a task. Each expert focuses on a particular subset or aspect of the problem, leveraging their expertise to contribute to the final result. It's like having a team of experts who specialize in different areas collaborating to provide the best solution. By dividing the task and letting each expert handle their niche, the AI model can achieve better overall performance.

Tokens: In the context of AI and language models, tokens are chunks of text that are used as input or output. They can be individual words, characters, or even subwords, depending on how the language model is designed. For example, in the sentence "I love cats," the tokens would be "I," "love," and "cats." Tokens help the AI model understand and process language by breaking it down into manageable units. They allow the model to learn patterns, context, and relationships between words to generate meaningful responses or predictions.

See: https://platform.openai.com/tokenizer
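
If you want to poke at tokens locally rather than via that page, the tiktoken library exposes the same BPE encodings (the exact split below depends on the encoding, so treat it as illustrative):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by the GPT-3.5/GPT-4 era models
    ids = enc.encode("I love cats")
    print(ids)                                  # the integer token ids
    print([enc.decode([i]) for i in ids])       # e.g. ['I', ' love', ' cats']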


> "Mixture Of Experts": An "MoE" is an approach in AI that combines multiple specialized AI models, known as "experts," (...)

Wonder when that stopped being called just an "ensemble model", which is a term I recall from 10 years ago. Terminology churn?


Mixture of experts is different from ensembles because MoE happens at every layer as opposed to joining the models once at the end


Thanks, that makes sense - and isn't obvious from the explanations I see people give.


Great explanations! How about Multi-Query Attention?


This is the original paper: https://arxiv.org/abs/1911.02150 . The idea is that with a transformer you have many heads, say 64 for LLaMa, and for each head you have 1 "query" vector one "key" vector and one "value" vector per token. Most of the cost of inferencing models is loading the key and value vectors from GPU memory to the GPU itself. the idea behind MQA is that instead of having 64 queries, 64 keys, and 64 values, you have 64 queries, 1 key, and 1 value ("Multi-Query" as opposed to "Multi-Head", the original name). This means that there is much less data to load from GPU memory to the GPU during inference.
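
A shape-level sketch of what that saves (my own illustration, with arbitrary sizes):

    import torch

    batch, seq, n_heads, d_head = 1, 1024, 64, 128

    # Multi-Head Attention: every head keeps its own K and V, so this whole
    # cache has to be streamed from GPU memory at each decoding step.
    k_mha = torch.zeros(batch, n_heads, seq, d_head)
    v_mha = torch.zeros(batch, n_heads, seq, d_head)

    # Multi-Query Attention: all 64 query heads share a single K and V.
    k_mqa = torch.zeros(batch, 1, seq, d_head)
    v_mqa = torch.zeros(batch, 1, seq, d_head)

    def cache_mib(*tensors, bytes_per_el=2):  # assume fp16 entries in the cache
        return sum(t.numel() for t in tensors) * bytes_per_el / 2**20

    print(f"MHA KV cache: {cache_mib(k_mha, v_mha):.1f} MiB per layer")  # 32.0
    print(f"MQA KV cache: {cache_mib(k_mqa, v_mqa):.1f} MiB per layer")  # 0.5, i.e. 64x smaller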


Beautiful. Thanks!


Bard taking notes…


[flagged]


Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


Everyone hates on crypto because of all the electricity use for mining. But how much electricity is the training of all the giant LLMs costing us?


LLMs actually have tangible use (granted far from bespoke yet). Crypto mining has no tangible benefit.


Not tangible, but there will always be a theoretical and idealistic appeal for decentralized app/game/protocol hosting and governance. Corporate bodies like OpenAI, Twitter, YouTube, Blizzard Entertainment have made unpopular changes to popular services they own in a centralized fashion. What if LLMs, social media, and MMOs were both open source and hosted and controlled by their users, not just the whims of a single IP owner?

It seems that much like running a co-op or commune the barrier is whether enough people care to take up the fraction of effort and cost it takes to join such an arrangement.


Decentralization is a noble goal and one I admire but we've been told it is coming for over a decade and centralized cloud and financial services are more entrenched than ever.

Even if the social winds did miraculously change, LLMs will never be decentralized for the sole reason that once they're good enough to operate a gun and servo motors every government on the planet is going to lock that shit up fast (assuming massive relentless cyber campaigns somehow don't trigger that response first).


I agree there is benefit. But my complaint is about the apparent overlap in work. I mean, you've got several giant tech companies all training models that do mostly the same thing. Sure, one might come out on top, but it just seems crazy when training takes millions of CPU hours.


To be fair, it’s still early for crypto. The best use cases of the technology are yet to emerge.


Crypto enthusiasts have been saying that for the past 14 years.


For 12 years?


Surely they'll find a use case before the sun burns out!


That's quite uncertain, given that Bitcoin is a couple of scaling rounds away from literally sucking the sun dry to power the "proof of work" scheme.


If it was trained on CS textbooks, they weren't very good ones. I asked it (GPT4) to write a quantum computer algorithm to square a number. It very confidently told me that to simplify the problem it would use two bits. Okay, fine. But then the algorithm it (again confidently) implemented did a left shift (which it reminded me was multiplying by 2, so it definitely intended this!) and then add the number to itself. It then wrote that in terms of QC gates. Tada! It took me a half beat to realize that rather than this being some new version of squaring a number that I somehow wasn't aware of, it's completely wrong. It only works on 00! Confronted, of course it did the usual "So sorry... I guess I don't know how to do this." dance. I don't get why anyone thinks that this thing is worth anything at all, except for cheating on creative writing tests.


Damn so you tried once to use it for a thing and it failed? That's crazy, truly a mystery then why so many devs continue to use it daily.


It's arguably the first useful general purpose AI. Claiming it is not worth anything at all because it can't solve a problem that 99.999% of humans would not be able to solve is a pretty ridiculous definition of 'worth'.


You (and everyone else) are missing the point of my post. (I admit to having thrown it poorly.) Forget the QC part; it confidently described an algorithm to square a number, something literally any beginning CS student could do, and it didn't even come close.

I will admit to using it all the time for simple programming tasks, and it occasionally does them correctly. Often it comes close enough that I can fix them (interestingly, in most of these cases I can't talk it into fixing itself; it kinda gets into wrong-approach ruts), and sometimes it's horribly wrong (like here).

I find the horribly wrong cases funny.


99.999% of humans who haven't seen the data.


If you had just spent 90 days observing the equivalent of millions of books, what are the odds you could recall even one thing from each of them?


Zero.


What percentile rank among college-educated Americans would that correspond to?

I'd guess that takes it out of the top 0.1%, but not the top 1%.


Call it 100 million college-educated Americans. I don’t think 100,000 people can formulate working quantum computing algorithms in ten seconds. It is probably closer to 0.001%, but those people probably can’t describe 13th century medical technology very well.


Surprisingly you need to be quite adept at 14th century medicine in order to write quantum algorithms


The tricky part is that the rest of those million college-educated folks would answer "I don't know", something LLMs really struggle with.


A product that mostly declines to be a product doesn’t seem like it would sell very well.


I must be getting old, because GPT-4 is basically magic compared to what has come before... and just a few months in people are happily dunking on it.


It's an ego problem, IMO: some folks have a co-dependency problem with being "smart", and now GenAI has taken away their trophy.


jquery is definitely old.

The dunking makes sense to me—“AI is taking our Jobs” is a real concern, so pointing out how bad ChatGPT is at coding on an arguably coding social network is one way to control the narrative.


1. I asked my child how to write a quantum computer algorithm to square a number and they didn't know. It's amazing that anyone thinks children are worth anything at all. I immediately sold mine to be harvested for organs and I suggest everyone else do the same.

2. I looked in The Art of Computer Programming for a quantum computer algorithm to square a number and it didn't have one. If it's a CS textbook it obviously isn't a very good one. In fact it's amazing that anyone thinks Knuth is worth anything at all. I immediately threw my copies in the recycling.

In fact you can divide everything into the set of things which know how to square a number on a quantum computer (let's call that the set of valuable things) and everything else. Everything else can be discarded.


It sounds like you don't know how to use it effectively is all I can see from your post.



