GPT-4 details leaked? (threadreaderapp.com)
661 points by bx376 on July 11, 2023 | 621 comments




Previously posted about here: https://news.ycombinator.com/item?id=36671588 and here: https://news.ycombinator.com/item?id=36674905

With the original source being: https://www.semianalysis.com/p/gpt-4-architecture-infrastruc...

The twitter guy seems to just be paraphrasing the actual blog post? That's presumably why the tweets are now deleted.

---

The fact that they're using MoE was news to me and very interesting. I'd love to know more details about how they got that to work. Variations in that implementation would explain the fluctuations in the quality of output that people have observed.

I'm still waiting for the release of their vision model, which is mentioned here but which we still know little about, aside from a few demos a few months ago.


I had to ask GPT what MoE means:

"MoE" in the context of artificial intelligence typically stands for "Mixture of Experts". This is a machine learning technique that is based on the idea of dividing a problem into sub-problems, solving each sub-problem with a specialized "expert" (or model), and then combining their outputs.


Yep they (would) basically have 8-16 "experts" that are each about the size of GPT-3. Since they each see different batches of the dataset, they learn to model those distributions independently rather than the distribution of the whole dataset. Some of the attention is shared between them however.

Then another "routing model" decides which model is most suitable for the given user prompt.

Given they use relatively few experts, each one is likely similarly capable to the others on many tasks. I assume this makes deployment easier and is a "more conservative", less risky approach. Even if the wrong model is chosen by the router, answers should still tend to be somewhat acceptable, for instance.


This is not how mixture of experts works at all. The experts are chosen on each layer, not for the whole network, and attention is shared between all of them.
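
For anyone who wants the mechanics, below is a minimal sketch of how a token-level MoE feed-forward block typically works (top-k routing per token, with the surrounding attention shared by all experts). The sizes, names, and top_k choice are illustrative assumptions, not GPT-4's actual configuration.

    # Minimal sketch of a token-level mixture-of-experts feed-forward layer
    # (sparsely-gated MoE in the spirit of Shazeer et al., 2017).
    # All sizes and the top_k choice are illustrative, not GPT-4's real config.
    import torch
    import torch.nn as nn

    class MoEFeedForward(nn.Module):
        def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, n_experts)  # tiny gating network
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):                             # x: (n_tokens, d_model)
            gate = self.router(x).softmax(dim=-1)         # (n_tokens, n_experts)
            weights, chosen = gate.topk(self.top_k, dim=-1)
            weights = weights / weights.sum(-1, keepdim=True)
            out = torch.zeros_like(x)
            for slot in range(self.top_k):                # each token goes to its top-k experts
                for e, expert in enumerate(self.experts):
                    mask = chosen[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out  # the attention layers around this block are shared by every expert

In a real MoE transformer this block replaces the dense feed-forward sublayer in (some of) the layers, so the choice of experts happens per token, per layer, rather than once per prompt.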


Oh I’m happy to admit if I’m wrong in the details. My bad.

So you’re saying the experts chosen are a more literal mixture of layers from each model? Rather than a simple “pick which model to run”?


That's interesting, because that's more or less one level above the multi-head attention.


Please don’t post assumptions while making them look like you know 100% what you are talking about…


Source?


Just theorizing from the top-level post here. No clue if it's legitimate.


To be clear, you just made up MoE details while MoE is actually well established and hails from decades old research?


This is common behavior for inference-based learners who don’t hail from strong academic backgrounds. Many developers who are self taught utilize a similar method of learning, essentially using pattern recognition to make “educated guesses” that are then internalized as potential facts and tested at the earliest opportunity. In this instance the test was to project the incorrect information out onto a public forum containing experts, using the presence or absence of contradiction as weak evidence about the validity of their newfound knowledge.

Yes, this is done in lieu of actually looking up extended details on what something means. It has its advantages though.


Are you saying that this was a form of Socratic questioning- intentionally presenting an incorrect statement in order to obtain the correction?


A bit like that, but without prior knowledge of whether the information is incorrect, rather an intuition that it is correct.


Sorry but that is just ludicrous. You do not answer a factual question with a mostly made-up answer without saying clearly "this is pure speculation" at some point, preferably early on.


When I was younger (hah, I'm only 26 now) I sure did do exactly what you say. If your statement is intended to say that "people should not", then you're absolutely correct; however, it is a learned behavior that some people must adopt after being corrected by their peers.


> It has its advantages though.

Seems the advantage is somewhat localized to the individual inference-based learner; it doesn't seem like a pro-social strategy which would optimize benefit to the group. Overall this seems like it would generalize to widespread misinformation if the majority of users adopted this behavior.

I'm guessing it's in the best interests of the wider group to try to minimize the occurrence of this type of participation.


The advantages in a social setting lie in the introduction of entropy, that is _creativity_, to a community. In a rigorous academic setting and with proper training these individuals are more likely to identify links between ideas or information that may not seem obvious at first, and tend to be your more 'eccentric' academics.

For the interests of the wider group, the best outcome is to help these individuals refine their communication to make it clear when information they present is unsubstantiated inference as opposed to verified knowledge.

Once the proper 'rules of engagement' are outlined, the contributions of these individuals are an oftentimes useful 'ingredient' in the success of many enterprises.


Given that this is an online forum, another advantage is that a conversational trail is left for others to discover. The inferences these types of individuals make are often based on a structure of knowledge and reality that others share, so the most common preconceived and incorrect notions tend to have the most documentation on how to ameliorate the incorrectness (given that these individuals are allowed to state their inferences out loud).


This has got to be the best thread I’ve ever (inadvertently) started.


It's a good thread, particularly Exuma's comment, but a memento mori from the root node:

My one word "Source?" meant: "You are adding a new episode of fictional info to a discussion about fictional info."

Even in that fictional world the post was wrong. Even if it was right in some allegorical sense, the simplistic allegory adds nothing. "Mixture of experts is like a group of experts where you pick the right expert to ask a question" isn't some hard-won cross-domain self-taught knowledge. It's something a bright 6th grader would pull off.

The self-taught feeling-stuff-out stuff matters when you're making useful connections that get practical results.

When you're just wiring stuff together online, and the wiring together is meaningless, you're doing nothing and taking the consequences of the negative signals.


I have quite deeply enjoyed this thread myself. Thanks :)


You have clarified something I have always thought about very intensely and deeply, but I haven’t really ever read anyone else who understands it so well, or rather puts it into words so clearly.

I’m an inference-based learner to an extreme and it definitely has many upsides and also downsides. The upsides are being able to learn extremely rapidly by making connections between pieces of information where there are gaps, and then using a sort of heuristic detection, like a compass, to feel out which gaps need to be filled in most. Then I follow that trail down, regardless of how hard or complex it is, just to the point where it accomplishes what I need (whether it is statistics, machine learning, transaction isolation that I've learned for the 50th time...). Another upside is significant abstract thinking ability; sometimes it feels like looking at a maze from overhead.

I’ve built over 100 projects across close to 30,000 hours of programming over like 15 years.

The downside is that when I’m around strong people of the other type, I sometimes get the sense they don’t respect this style of learning. It comes through in their words, tone, and subtle body language cues.

My friend, who is very very much the other type, with a PhD in something very hard I don’t remember … algorithms and data structures or something, said it’s because I don’t value domain knowledge. He said if you spent an entire life building, say, a database, you would not consider that a life worth living. I laughed cause that makes me sound like an asshole, but the more I thought about it the more it’s clear that I actually agree somehow. As if information on its own as a means to an end is not fulfilling to me. To me it feels like efficiency, creativity and moving from A to B very rapidly while hierarchically organizing a massive amount of chaotic information is ingrained in my DNA, but simply getting the correct, deepest domain knowledge possible is not appealing to me at all. I sort of will go to the depths that are needed, then go elsewhere.

I’m VERY thankful for those people though, as that’s where a lot, if not most, of progress is made.

It has been an internal paradox for most of my life where I can’t figure out if I’m smart or stupid. I have built companies completely on my own on the tech side, where one made over 10m and another made over 200m revenue. I’ve been told I built some things entire teams were not capable of doing on another project... This gives me signals that I’m smart. Then there are other things, like getting an F on this hard programming interview from my first employer, who is a genius-level, Harvard-graduate, academic, domain-knowledge style of person. It made me feel completely idiotic. There are many other times and situations where I often think I’m wired weird, where it “feels” like I’m stupid.

Over the years I’ve accepted this paradox, but it wasn’t until this domain knowledge piece, or the creativity aspect, that I finally just accepted it as OK and not something wrong with me.

The hardest part is VERY often being misunderstood. So much so that I often have to expend an exhausting amount of time when working with new teams to explain “how I think”, because like a fortune teller I can always predict what will be misperceived, and even when I say it up front it usually happens anyway. This is why trust is paramount with my business partners. They know I’m extremely eccentric, so to speak, but they “trust the process” when I lock myself in a room for 30 days and come out with an amazing piece of tech that was built purely on raw intuition.

The other part that often made me feel stupid is that, despite its upsides, this way of thinking is often exhausting because I don’t usually rely on past experiences to make decisions. Each situation is different. So even if I’ve done something new 30 times, I will feel this “stepping into the unknown” feeling, which takes great willpower and courage to face repeatedly, especially when other people are relying on it.

Using this method, though, I’ve also built cool things; one recent example, a platform I built, is the best-converting one in the entire industry. On that project there is a massive team on the other side, but the platform itself was built by me alone, without much starting info to go off of other than a few multi-hour brain-dump calls. One thing my PhD friend I mentioned pointed out is that Feynman was a creative learner, I think. It helped me feel better that it’s not a “wrong” or stupid way of thinking if other people that high up might share similar ways of thinking more creatively. Of course it’s not exactly everything I describe to a T, I’m not saying that, but threads of it.

Hopefully none of what I wrote sounds insulting or arrogant to anyone. I fully acknowledge that domain knowledge is what moves the world forward in many ways.


I have established multiple companies, some of which have grown significantly with over 600 employees. For quite some time, I've transitioned from development and mainly held executive roles such as CEO, Chairman, etc. Simultaneously, it's intriguing to note that I've mostly been unsuccessful in securing 'normal' jobs through interviews in the past (Google, McKinsey, Bain, Accenture etc).

I believe this poses a fascinating topic on the way people assess creativity and intelligence in general.

From my perspective, the crux of the issue lies in the inherent difficulty of accurately measuring creativity in comparison to quick problem-solving skills during job interviews. Consequently, it seems that corporations tend to favor the latter.


@Exuma, this comment is ridiculously resonant with me, the part about 'learning transaction isolation for the 50th time' is very on point too.

Everything you said I pretty much feel the same way. I've accepted it as part of how I work, and the advantages are many (and valued by many) - but yes, interacting with deep experts usually ends with feeling a bit like a fraud. I feel like I maybe was an expert at whatever the thing is at some point in time, momentarily, but then I just shed the information as soon as the next thing needs to be done, and it just ends up as part of the background inference pattern matcher.

Certain things where I'm really forced to learn something deeply do stick, but I find my ways of thinking about that domain to be very different to most 'true' experts, and rely heavily on visual models and analogies with other concepts.


haha yes! What you mentioned about analogies... I must use like 50 analogies a day. I also noticed I can use phrases like "always" and "never", and I can say them without a second of hesitation, because they are merely indications of magnitude in a predictive sense, not a literal interpretation. But someone who must understand information deeply never uses phrases like that, because they operate based on observed knowledge and a sort of "hypothesis testing", like a scientist.

It's fun to realize other people are out there who can relate. Thanks for your comment


Thanks everyone for this thread :)


This was very well said and I don't know if you could have said it any better. FWIW, I'm a person with multiple degrees in CS, but the best programmers I've worked with, and who get stuff done, have zero degrees. I have eight years of hardcore programming experience, including professional and side-project stuff - I've learned more actually doing than in any classroom. Yeah it's cool to know what a bubble sort is and how it compares to a merge sort, but knowing all the fine details isn't really needed for actually building things, especially now that we're at the point where an AI can give you the code along with complete instruction.

It sounds like you've done completely fine for yourself and built things that people want, so I would try not to be too hard on yourself.


Thank you, I really appreciate it.


We are generalists, as opposed to specialists. Generalists use information from experts across multiple domains. Specialists are the experts who build a particular domain.

As a culture, we look down on generalists - "A jack of all trades is master of none." However, a world full of specialists creates information silos, where experts solve the same problems over and over in isolation. This is where society is at the moment. We need more generalists to navigate these silos.


I really appreciate your response, thank you for sharing your perspective and experiences.

When I read your message I can't help but picture you as the storybook 'inventor' who is locked away inside his house, with strange colored smoke coming out of the chimney and weird noises heard from the street, yet when the doors open the whole town gathers to see what you made.


Haha, that would be me! In some ways you are right about the strangeness. I actually work lying down... in my bed. It allows my mind to completely dissolve into the code or problem, as if I'm weightless. My business partners all joke that "uhhh yes, the next stop on our tour, well.. this is our CTO's office but it's actually just a bed in there so we won't go in that room...." This gives me a good laugh every time.


I don’t appreciate the many bad-faith assumptions made, in particular the assertion that I’m not from a “strong academic background”. For what it’s worth, I made a mistake in a public forum and admitted to it twice. I’ve asked for accurate versions of my answer, which no one provided. Nevertheless I have continued to educate myself on the subject and only feel more confident that I wasn’t misinformed to the degree you all indicated.

You may not realize it, but not everyone on the internet is nefarious, and if you were to speak in this analytical way about, say, a classmate while they were within earshot, that person would likely be quite upset.


I take it all back. The comment I responded to appears to be both correct in assertion and implication.


Okay then. Best of luck with all of that. There’s a forum called LessWrong you would likely be interested in.


I'm not the biggest fan of LessWrong; I do, however, refer new developers who seem interested to the "Rationality: From AI to Zombies" sequences to help refine conscious development of rational thinking.

Also, I absolutely loved the fanfic "Harry Potter and the Methods of Rationality" written by the forum's creator.

** Edit **

And for the record, my comment wasn't intended to be an accurate depiction of you specifically, which honestly wasn't very effectively conveyed.

It was to highlight a common 'type of person' who makes authoritative statements on areas they're not 'experts' in, through no malice on their part, rather as a function of their default mode of behavior.

As others who replied to me highlighted, there are at least two of these people in the world and I wanted others to at least be aware of their existence and point of view; the end goal of this being they might offer others the benefit of the doubt and perhaps some constructive feedback instead of unproductive criticism in similar exchanges.

In essence, I thought other comments were being too hard on you and wanted to point out a potential scenario in which their critiques were at best unproductive.


> Yes, this is done in lieu of actually looking up extended details on what something means.

If they were capable of understanding the extended details, they would already have an academic background in the subject. Laymen aren't going to have a clue what MoE means even if they went to the trouble of digging up the paper.

> Many developers who are self taught utilize a similar method of learning, essentially using pattern recognition to make “educated guesses” that are then internalized as potential facts and tested at the earliest opportunity.

Using pattern recognition skills to make an educated guess that is internalized as a potential fact sounds an awful lot like what LLMs do. At least when humans do it, we bother with the verification step instead of just acting like we know what we're talking about.


> If they were capable of understanding the extended details, they would already have an academic background in the subject. Laymen aren't going to have a clue what MoE means even if they went to the trouble of digging up the paper.

This is, hopefully, an accidental thought experiment gone awry. "IF THEY WERE CAPABLE of understanding the extended details, they would already have an academic background in the subject" can and should == "I spent a ton of time in the library", and a follow-up apology for putting "capable" and "academic background" in the same sentence.

The whole friggin' point of this glorified LAN is that we can break down those dumb walled gardens and let kids learn from random BBS textfiles, MIT YT videos and the gathered wisdom of HN.

If you are going to just dismiss autodidacts, you're going to have to rewrite the complete history of higher education in Western society. I won't even begin to try and validate how wrong this is for Eastern history as well.


> At least when humans do it, we bother with the verification step instead of just acting like we know what we're talking about

We do??


We ask Google (and now ChatGPT) if it's true. It goes round.


I believe it’s called ‘hallucination’


Can you detail what mistakes were made? I’m brushing up on it currently but having trouble grokking it.


I would have presented it as a question versus a statement, in that case.


[flagged]


If the answer is wrong, perhaps you could post a correction so that we are all better off, instead of just insulting me.

Honestly, I've had a fairly rough day, and your answer has made me a bit more upset than perhaps I should be. At least GPT doesn't act like a jerk when I ask it a stupid question.


I think the complaint most folks have for people posting ChatGPT responses is that it adds nothing 'human' to the conversation.

It's sort of like copying and pasting a wikipedia article into a comment, which in itself isn't necessarily wrong. The 'wikipedia' comment, however, does come off as an impersonal PSA that also happens to be citing a newfangled encyclopedia that makes an easy target for ire.

If you care to accommodate those who seemingly don't care to accommodate you, try phrasing your messages to connect in some way to the conversation, with a couple of sentences about how that information changed your perspective or helped you understand something. That way, rather than making an announcement containing the definition of a word or concept, you're participating in the discourse as an active participant.


You should consider asking ChatGPT for help in clarifying your point(s).


FYI, George Hotz has been claiming to know this aspect for a couple of weeks now.

> The fact that they're using MoE was news to me and very interesting.

Maybe adds some legitimacy to the claim.


Interestingly, Google was using ~2000 experts back in the first Transformer architecture (if I understand correctly) https://www.youtube.com/watch?v=9P_VAMyb-7k&t=6m42s [sparsely-gated mixture of experts layer]


Yeah, the Mixture of Experts might not have been called out by name, but it was pretty obvious you were getting different models depending on the question.

It goes to show how LLMs are nothing like AGI. I think combining it with a calculator is just a bandaid. A useful bandaid, but it's not going to be able to do science, ever.


Sparse architectures are a way to theoretically utilize only a small portion of a general model's parameters at any given time. All "experts" are trained on the exact same data. They're not experts in the way you seem to think they are, and they're certainly not wholly different models. The "experts" work at the token level. The expert chosen for one token could be different from the expert chosen for the very next.

GPT-4 isn't "nothing like AGI" any more than its dense equivalent would be.


I don't see how LLMs using many experts means it's very different from AGI. Why would anyone assume that human AGI isn't based on multiple models running in a similar architecture? At minimum, humans are operating with a left and right brain, which process data very differently.


The previous posts are to a twitter thread that's been taken down, and the preview of a post that requires a $1000 subscription. This post however is freely available (for now at least).


And the tweeter of the twitter thread paid the $1000, copied the useful info to twitter, and then did a credit card chargeback.


Seems he summarized it and didn't copy it


A summary isn't allowed under US copyright law. The copyright office calls them "condensations", and they are considered derivative works.

His use was likely not within US copyright law. "Effect of the use upon the potential market for or value of the copyrighted work" is one of four factors a judge should use to decide if fair use applies, and it is clear that publishing the main information from an article, information which is not available elsewhere, freely, severely degrades the market for the original.


Interesting on a meta point that the more clickbaity title "GPT-4 details leaked" won out over the more dispassionate but drier "GPT-4 Architecture, Infrastructure, Training Dataset, Costs".


Clickbait has its time and place. Despite my hatred towards it, sometimes it's really needed.


is it needed when you pay for the blogpost and then immediately charge back the card like this dude did? https://twitter.com/untitled01ipynb/status/16786550120150712...

what a colossal asshole


Why is it not okay to summarize? It's clearly transformative and not a copyright violation. Yes, he's an asshole, but he should be in the clear legally.


It is needed if you want people to click on your content more.


I think in this case it's a better title in terms of highlighting the relevant info. What's really interesting is the source - these are important details that actually came from within OpenAI. Because the latter title emphasizes the type of info but not where it came from, I'd probably assume that's a blog post with some industry expert speculating about those things.


When choosing titles for my own submissions, yeah, the accurate title that HN says they desire gets no votes whatsoever. Any clickbait on here, people bring upon themselves (and this isn't even a clickbait-level title)


I don't want accurate titles because they'll make me vote for it. I want accurate titles because it helps me determine if I'll read it BEFORE clicking it.

The whole point of accurate titles is that you'll get fewer votes on uninteresting content.


But if the title requires you to click, and then you find out it's uninteresting, why'd you upvote at that point? It shouldn't get your vote at all then, having wasted your time


To be fair the latter has the meat of it behind a paywall.


If this is true, then:

1. Training took 21 yottaflops. When was the last time you saw the yotta- prefix for anything?

2. The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The google memo was right about the lack of a moat.
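
For what it's worth, these claims are at least internally consistent with the H100 figure quoted further down the thread (~8,192 H100s for ~55 days at $2 per GPU-hour); a rough, purely illustrative check:

    # Back-of-the-envelope check, using only numbers quoted in the thread.
    flops_total = 21e24                         # "21 yottaflops" = 2.1e25 FLOPs
    gpus, days, dollars_per_gpu_hour = 8192, 55, 2.00
    gpu_hours = gpus * days * 24                # ~10.8 million GPU-hours
    print(f"cost ~ ${gpu_hours * dollars_per_gpu_hour / 1e6:.1f}M")        # ~$21.6M
    sustained = flops_total / (gpus * days * 86400)                        # ~5.4e14 FLOP/s per GPU
    print(f"requires ~{sustained / 1e12:.0f} TFLOP/s sustained per H100")  # roughly half of BF16 peak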


>> The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The google memo was right about the lack of a moat.

That really doesn't change anything at all. The cheaper training large models gets, the more easily large corporations are able to train larger models than everyone else.

Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.


At a certain point though, models become good enough for particular tasks. Once that happens for whatever my application is, I don't care if OpenAI has a model that's twice as good on some metric, because it's overkill for my use-case. I'm going to be happy using a smaller, cheaper model from a competitor.


I think we're far from that point though. For the vast majority of use cases, I always wish that the answers could be more accurate.

Sure - they might be 'good enough' to build a business on. But if a competitor builds their business on top of a more accurate model, their product will work better, and they will win the market.


Yea but the bench being discussed here is FOSS. Which for me, and many, translates to can i run something useful in my closet or on my phone. I've found LLaMA neat and yea, some FOSS models are getting decent - but they're a far cry from GPT4. I pay for GPT4, use it almost daily and that's my bench.

Yes, when i can run GPT4 in my closet, OpenAI will have GPT7 or w/e - but it doesn't change the fact that i have something useful running in my closed network and that opens up all kinds of data integration that i'm unwilling to ship to OpenAI. In that day i'll probably still use GPT7, but i'll _also_ have GPT4 running in my closet and integrating with a ton of things on my local network.


My guess is you'll be running GPT4 equivalent in your closet, but with a 4K context window.

Where the big guys will have GPT-who-cares-what-version with a 100K context window.

Context size is as much of a big deal as newer generations of models imo.


Am I right in my layman's understanding that context windows scaling up requires (mainly) much more compute at run time? Or do longer context models require different/longer training?


> their product will work better, and they will win the market

Like Betamax?


One important milestone is a model that is good enough to produce an acceptable quality of answer to x% of public users' questions without any data being sent to the megacorps.


> Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

I think a better frame is: if rice got so absolutely cheap to make that anybody could spin up a bag of rice on demand, anybody whose business model was based on selling rice sacks would be in trouble, especially if their specialty was selling rice in bulk instead of, e.g., mom-and-pop restaurants selling cooked rice with flavors and a focus on customer experience.

(Not sure the metaphor is a good fit for AI. Maybe OpenAI comes up with GPT-5 and makes something so powerful that by the time OSS projects get to GPT-4 level nobody cares. But if GPT-5 is only incrementally better than GPT-4, then yeah, they have no moat.)


Surely there are diminishing returns for the AI computing though? I mean, is a model with 10x the parameter count 10x better? I think it is still possible that the training costs will be irrelevant for all players at some point with this non-linear scale. Access to data is another story


10x the parameters? Maybe not in a single model, but maybe 10x the expert models has 10x the value. I'm sure there are diminishing returns eventually, but we're probably not close to that.


It's not clear. Scaling laws still seem to hold AFAICT.

Right now the bottleneck is "how big a model can you fit on an H100". It's possible that in a few years, when bigger cards come out and/or we get better at compressing models, we'll get even better models just by increasing the scale.


It's still SO early. We are in the "640K [of memory] ought to be enough for anybody" phase of LLMs. So much more to go.


> if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

And all that rice would be useless since you could only eat one cup a day.

The richest person in the world and someone who is solidly middle class both use the exact same iPhone. After a point more dollars doesn't necessarily mean better or more useful technology. If training "good enough" models becomes cheap enough to be achievable by small-time developers then OpenAI/Google/Anthropic etc. will definitely lose some of their edge in the space.


> Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

And?


And...

...the market for rice will totally collapse because it would cost more to transport it than the farmer would make by selling it. Feel free to substitute whatever commodity becomes "too cheap to meter" for "rice".

The "invisible hand" has a tendency to bitchslap people who don't have an even modest understanding of economic principles.


Training data quality and quantity is the bottleneck.

"Chinchilla showed that we need to be using 11× more data during training than that used for GPT-3 and similar models. This means that we need to source, clean, and filter to around 33TB of text data for a 1T-parameter model." https://lifearchitect.ai/chinchilla/

GPT4 has been trained on images exactly for this reason (it might not have been worth it separately from multi-modality, but together these two advantages seem decisive).


>Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

...and billions would be lifted out of poverty, and world hunger would be solved. The rice metaphor doesn't quite apply here.

If the price of GPU training continues to drop at the present rate, then it would be possible to train a GPT-4 level LLM on a $3000 card in 10 years. The ability to run inference on it would come way sooner.


The real moat is an abundance of high quality data.


Well, OpenAI raised eyebrows by crawling the internet and using everyone's data to make a commercial product.

One day some new startup will train on all of libgen and torrent networks, but it will be very hard to prove. You'll keep getting these escalations in questionable morality and legality, and even OpenAI will complain about playing fair.


ThePile already contains some content from a torrent, and there's a lawsuit alleging that Meta has committed copyright infringement by using it.

https://www.theverge.com/2023/7/9/23788741/sarah-silverman-o...


Many people train on libgen/torrent in the form of books3 (e.g. LLaMa does this).


Google Classroom - teenagers' essays, written by humans, for learning what it means to be human, and graded by humans - is a richer dataset than anything else I can think of that no one else could get their hands on.


An awful lot of teachers can grade a 10 page essay in about 90 seconds...

Skim read it, mark out some grammar errors, assign it a grade based on the quality of the opening and closing paragraphs.


Yup, and they're doing it the whole country over, and putting that data into Google Classroom for Bard to know "this is C-grade work" and "this is A-grade work". Knowing what's deemed good and bad writing is where I'm thinking this dataset shines for training LLMs.


Yeah, they have the internet from before LLMs were used for anything, so the data is not poisoned. Not unlike carbon dating becoming useless for estimating the age of anything made after atmospheric nuclear tests, or low-background steel.


You talk as if humans weren't perfectly capable of coming up with nonsense.

Blogs upon blogs full of worthless pap that is there for SEO reasons have existed for like a decade already.


And those blogs took a decade+ to make, and now in another year we'll make that much information again. Then it will be that much information in a month. Then that much pap in a day.

And in the past it was still a million people making that much crap. Now it's a single "entity" making that much crap with its own style and mistakes.


IMO the real moat right now is expertise / smart teams and cash.


The infrastructure/training libraries already exist. I'm sure you can get people who have worked at scale and can figure out how to glue things together.

Reddit, Twitter, etc. raising prices is going to make it more expensive.


If you are right, then it just becomes a question of who wants to throw the most cash in, like a giant game of poker where you don’t know the pot odds.


... stolen without regard for copyright and licensing.


Fair use!? /s


> The google memo was right about the lack of a moat.

5 months on, and nobody has yet beaten their result quality. I think there is a moat.

Also, I think for many use cases, smarter is better. If a few cents can buy a more accurate answer, then it is always worth paying those few cents. So, as long as more hardware and more data can train a bigger, better model, that is the moat.


The moat is there until someone releases (or leaks) comparable training data.


And that gets more difficult every day, as previously accessible sources of data turn off their APIs.

Though Google may have something up its sleeve with the corpus of Google Books! I have been wondering if OpenAI secretly pulled in Sci-Hub or Z-Library to neutralize that potential advantage.


Are there any stats on how many words are in google books, vs how many words are on the open web?

My feeling is that the web has a lot more on it than the total of all libraries - simply because anyone can start a blog, but publishing a book requires quite some commitment.


I think you're right but I also think the text in published books would be at least an order of magnitude more valuable than the same length of text from the web


> great news for open source.

Yes, and great news for shills, bad actors, agitators, trolls, foreign intel, and propagandists. I'm impressed by the tech but terrified because for once I cannot conceive of what this means for the future. My guess is that this kills the open web and laws get passed which bury it.


Everybody is self-soothing with the idea that OpenAI's (frankly, half hearted) push for regulation is just mundane regulatory capture and profit seeking, and not the fact that it will, at best, absolutely destroy everything about the internet and technology that we've come to love and know. Should a 4chan torrent show up like LLaMA, with weights and code for a base GPT4-level model, modern society is done. Golden age over.


From your perspective, how would modern society be "done" if GPT-4 was generally available? How would it be substantially different from LLaMA?


GPT-4 is far more capable than LLaMA. Just as one area of impact - captchas would become permanently ineffective. If you're experienced in developing captchas and everything they do for us, you know the implications of that alone lead to a very dystopian internet and world.

I like to answer a question with a question: if you sit and think about it, what unintentional misuses and intentional abuses can you think of? It helps to write down a list of known abilities, then think up several "what if..." negative utilities or implications for each, then iterate further to see second, third, and fourth order effects.


> If you're experienced in developing captchas and everything they do for us

What they have done, fairly overtly for a long time, is train AI to defeat captchas.

That this was self-limiting was somewhat obvious.


Hey, captchas also prevent disabled people from using the internet!


> The conspiracy theory that the new GPT-4 quality had been deteriorated might be simply because they are letting the oracle model accept lower probability sequences from the speculative decoding model.

In other words: the speculation was likely right, I'll propose a specific mechanism explaining it, but then still insult the people bringing it up and keep gaslighting them.
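
For readers who haven't seen speculative decoding: the core of it is an accept/reject test where a cheap draft model proposes tokens and the big "oracle" model verifies them. A minimal sketch of that step follows; the leniency knob is a hypothetical illustration of the quoted speculation about accepting lower-probability draft sequences, not a known OpenAI parameter.

    # Sketch of the accept/reject step in speculative decoding (Leviathan et al., 2023).
    # "leniency" is a hypothetical knob added to illustrate the quoted speculation;
    # it is not a documented OpenAI setting.
    import random

    def accept_draft_token(p_target: float, p_draft: float, leniency: float = 1.0) -> bool:
        """Accept the draft model's token with probability min(1, leniency * p_target / p_draft).

        leniency = 1.0 is the standard lossless scheme (the output distribution provably
        matches the target model's); leniency > 1.0 accepts more of the cheap draft model's
        choices, trading output quality for speed.
        """
        return random.random() < min(1.0, leniency * p_target / p_draft)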


Calling something a conspiracy theory is not an insult against anybody. It's a theory because it's unproven and it's a conspiracy because people think OpenAI purposely degraded their own service, hence conspiracy theory.


That's a motte-and-bailey defense. Yes, what you say is technically correct with respect to meaning of "conspiracy" and "theory" as individual words. But it's also completely false with respect to what "conspiracy theory" means in actual use - which is to group the subjects (here: people believing GPT-4 quality has been degrading over time, in spite of OpenAI strongly implying otherwise) in the same bucket as flat earthers, vaccine denialists, UFO believers, NWO fearmongers, etc.

Calling the belief "that the new GPT-4 quality had been deteriorated" a "conspiracy theory" goes beyond claiming the belief itself is wrong - it's also claiming that holding this belief implies significantly compromised reasoning skills. That is, it's just a drive-by insult.


This guy doesn't have any idea what he is talking about. He consistently posts such bullshit on twitter. Mostly copy paste with added spice mix.


I noted several things that don't seem consistent with what people have been assuming from before.

For instance - MoE yes, but 16 experts at 111B parameters? Doesn't make sense. GPT-3 had 175B parameters. I doubt they would go smaller on base models from now on. The number that makes more sense is ~220B parameters per model and 8 expert models. That is the same inference cost in total.

The 13T tokens of training data seems pulled from thin air.
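
A quick check of the parameter arithmetic above, using only the figures in dispute:

    # 16 experts at ~111B vs 8 experts at ~220B: nearly the same total parameter count.
    print(16 * 111e9 / 1e12, 8 * 220e9 / 1e12)   # ~1.78T vs ~1.76T
    # Per-token inference cost additionally depends on how many experts the
    # router activates for each token, which the quoted material doesn't settle.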


It's Twitter, why would you think otherwise?


I'm tired of people just saying 'oh, internet' every time there is a factual inaccuracy somewhere. Yes, we know this is a social media network. Now can we get back to discussing the topic at hand?


Google has been doing research into mixture of experts for scaling LLMs. Their GLaM model, published in 2022, has 1.2 trillion parameters and 64 experts.

https://icml.cc/media/icml-2022/Slides/17378.pdf


Google is laughably behind in terms of LLMs. They've done a pretty good job at incorporating vision and audio ML models into their ecosystem, but they underestimated language.


How do you know? Do you have insider knowledge of this, or is it just based on what they share publicly?


From what I can see, GPT-3 and 4 were a bit of a rugpull for the industry. Now we're all laughing at Google because seemingly they had their hands on the rug for nearly a decade and did nothing - but from the other perspective, maybe they saw the future OpenAI has now brought us and decided against being the pioneers.


Amazingly enough, I think this is a bit of it. Some powerful enough people at Google became concerned about implications, including around "hallucinations", "poisoning", etc., and decided to put this sort of research on something of a backburner - justified, in part, by a lack of some obvious easy interfacing of this with search (scaling, hallucinations, etc.).

Of course, the 'wonderful' thing about humans / "independent agents with survival drives in competitive game-theoretic type scenarios" is: if enough people / "agents" have access / opportunity, someone WILL "push the button".

It's just delicious ... the same kinds of patterns over and over - "oh, we should really do something about X / nobody should have power like X, ... but, there's no stopping it, ... oh well".

And, the "rules" really are subtly, many levels down, in place, to make it apparently impossible to not get trapped, one way or another.

(Anyway... [Cartman voice] Screw you guys, I'm going to my other planet...)


as a fun anecdote, Google Bard's implicit code execution update from *last month*, advertised by Sundar... no longer works https://twitter.com/swyx/status/1678495067663925248

i'd love to know whats going on in that team.


Probably safety-driven terror. They really really want to get their bots going, but in every single meeting some PM or other concerned engineer talks about safety and f**s up the entire meeting.

They even made the bot not respond to arithmetic questions because the bot is bad at them, lol. Someone who knows how to modify the bot actually spent their time on something as unimportant as that.


Bard being bad at anything else doesn't seem to stop it. It hallucinates at the drop of a hat. Asking it almost any question implying some nonexistent thing X exists causes it to make that thing up.


> i'd love to know whats going on in that team.

Seems like a good time to rewatch Silicon Valley and watch Hooli scramble to keep up.


Quite. Some aspects of emergent behaviour are exciting; others must be a painful learning experience.


Their translation service is based on LLMs and is commercially a successful product.


Transformer models rather than LLMs surely. ChatGPT behaves nothing like Google Translate.


The T in ChatGPT stands for Transformer. The similarity between the OG Transformer from 2017 and GPT-3 (and other modern LLMs) is pretty big.


The data, size and training process are what's different.


The point is that LLMs are built with the Transformer architecture. They're all transformers; attention is an integral part of building worthwhile, contextual answers.


Yeah, but LLMs are more specific, so it's a subset.

Also, LLMs use "in-context learning", i.e. you actually ask it "hey, translate this" and then it has a conversation with you where you can ask it to clarify or provide word definitions.

Google Translate is more hardcoded; a big problem with it is that it can't explain any of the decisions it's made or show its uncertainty about anything.

Of course, neither of these are reliable since they hallucinate.
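
To make the "in-context learning" point concrete, here is a small sketch using the OpenAI Python client as it looked in mid-2023; the model name and prompts are just illustrative.

    # Sketch of in-context translation with a follow-up clarification, using the
    # openai package's mid-2023 ChatCompletion interface. Model name and prompts
    # are illustrative.
    import openai

    messages = [{"role": "user", "content": "Translate to French: 'The cache is warm.'"}]
    first = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    messages.append(first["choices"][0]["message"])

    # Unlike a fixed translation system, you can clarify within the same context:
    messages.append({"role": "user",
                     "content": "Here 'cache' means a CPU cache, not a hiding place. Redo it."})
    second = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    print(second["choices"][0]["message"]["content"])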


I favor ChatGPT over Google Translate now. ChatGPT has the added benefit that it is focused on providing helpful answers, so if it comes across something almost untranslatable, it will be able to address that and provide an explanation in the style of a footnote.


Maybe, but they're relatively open with their research, which is great. They also made BERT and released it for free.


Hmm “Sam Altman won't tell you that GPT-4 has 220B parameters and is 16-way mixture model with 8 sets of weights” George Hotz said this in his recent interview with Lex Fridman. It looked like Lex knew this to be true by the way he reacted.


This is unsubstantiated. The only folks who know exactly how GPT-4 works are employed at OpenAI. The rest of us can only guess.


Even if I just went with Sam Altman's public comments, I would have come to a similar conclusion: GPT-4 is big and it is hard to make it faster.

The secret sauce and moat lie in the data, though. I have heard a rumour that they paid competitive coders to write and annotate code with information like complexity for them.


GPT4 can diagram sentences using link grammar parsing (https://www.link.cs.cmu.edu/link/) which is obscure enough I really don't think they've generated data for it. So it can get pretty good without that.


It's obvious they use data from GitHub and other places. I am talking about the extra 0.00..1% of very high quality data they (likely) created.


I've been wondering how freemium services like Thread Reader still operate now that Twitter is charging prohibitive prices for API access and taking measures to prevent scraping. The cheapest API plan with read access is $100/month, which reads 10,000 tweets, so could only produce about 500 pages like this one on demand.


There was a post on HN recently with a workaround these apps are using. I don't have it handy but I'm sure you can find it if you look.


There's probably some interesting bits of info in yesterday's Nitter thread: https://news.ycombinator.com/item?id=36665406


const puppeteer = require('puppeteer'); and so on and so forth.


For all the 'I know every number' certainty of this post, there's some weird stuff:

>(Today, the pre-training could be done with ~8,192 H100 in ~55 days for $21.5 million at $2 per H100 hour.)

Why flex both system size and training time to arbitrary numbers?

>For example, MoE is incredibly difficult to deal with on inference because not every part of the model is utilized on every token generation. This means parts may sit dormant when other parts are being used. When serving users, this really hurts utilization rates.

Utilization of what? Memory? If you're that worried about inference utilization, then why not just fire up a non-MoE model?

Here's what the post said about MQA:

>Because of that only 1 head is needed and memory capacity can be significantly reduced for the KV cache

This is close but wrong. You only need one key and value (KV) head, but you still have the same number of query heads.

My guess is that this is all a relatively knowledgeable person, using formulas laid out by the 2020 scaling paper and making a fantasy system (with the correct math), based on that.

Put differently, I could probably fake my way through a similar post and be an equal level of close but definitely wrong because I'm way out of my league. That vibe makes me very suspicious.


No, the post is correct about MQA. A KV-cache only caches the key and value heads. The point of MQA is that your KV-cache is 1/heads of the usual size because of this sharing.

Having multiple query heads does not affect the cache size, which is the limiting factor in MHA decoding for both memory capacity and bandwidth reasons.
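
To put numbers on that: the cache stores keys and values only, so the number of query heads never enters the size. A rough sizing sketch with GPT-3-like dimensions (illustrative, not GPT-4's real configuration):

    # Rough KV-cache sizing, MHA vs MQA. Layer/head counts are GPT-3-like and
    # purely illustrative; bytes_per=2 assumes fp16/bf16 cache entries.
    def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per=2):
        return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per  # 2x: keys + values

    layers, heads, head_dim, seq = 96, 96, 128, 8192
    mha = kv_cache_bytes(layers, heads, head_dim, seq)   # every head keeps its own K/V
    mqa = kv_cache_bytes(layers, 1, head_dim, seq)       # one K/V head shared by all query heads
    print(f"MHA ~{mha / 2**30:.0f} GiB vs MQA ~{mqa / 2**30:.2f} GiB per 8K-token sequence")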


>Autoregressive decoder inference is a severe bottleneck for Transformer models due to the memory bandwidth overhead from loading decoder weights and all attention keys and values at every decoding step (Shazeer, 2019; Pope et al., 2022; de Jong et al., 2022). The memory bandwidth from loading keys and values can be sharply reduced through multi-query attention (Shazeer, 2019), which uses multiple query heads but single key and value heads.

Emphasis mine, source here [0]

[0] https://arxiv.org/pdf/2305.13245.pdf

FWIW the original MQA paper is called One Write head is all you need.

Here's the quote from that referencing multiple heads [1]

>We propose a variant called multi-query attention, where the keys and values are shared across all of the different attention "heads", greatly reducing the size of these tensors and hence the memory bandwidth requirements of incremental decoding. We verify experimentally that the resulting models can indeed be much faster to decode, and incur only minor quality degradation from the baseline.

[1]https://arxiv.org/pdf/1911.02150.pdf


What is this hyper dramatic nonsense tweet about, “It’s over“? What’s over?


It's a meme based on quoting this tweet.

https://twitter.com/jebbush/status/929541504187686912


The thing, dude, the thing, is over!


The wait to find out what the model is I'm guessing?


Can anyone provide an alternative link to https://twitter.com/i/web/status/1678545170508267522

I haven't registered for Twitter since it started and I'd rather not now (though I probably will if it's the only way to get leaked gpt4 training details)


Wayback failed to load the subtweets, and archive.is has a copy but it seems to stop after around 10 subtweets. The Thread Reader link that was posted has it all, though.

https://archive.is/Y72Gu


The tweet is gone. What was in it?

Also, I'm dubious about this unsubstantiated claim. The biggest past innovation (training with human feedback) actually shrunk the size of a model. Compare Bloom-366B with falcon-40B (much better). I would be mildly surprised if it turned out Gpt4 has 1.8T parameters. (even if it's a composite model as they say)

The article says they use 16 experts, 111B each. So the best thing to assume is probably that each of these experts is basically a fine-tuned version of the same initial model for some problem domain.


As a note the 366B in Bloom-366B refers to the number of tokens, not the number of parameters. Bloom had 176B parameters (still many more than Falcon)


Maybe 111B is the base GPT-3.5 model.


>If their cost in the cloud was about $1 per A100 hour, the training costs for this run alone would be about $63 million.

If someone legitimate put together a crowd funding effort, I would donate a non-insignificant amount to train an open model. Has it been tried before?


I too would be interested in donating money toward this.

Given that the price since the original training effort has already dropped to ~$20 million, and that (a) the fundraising will take time, and (b) improvements are being made every day with regard to resource usage, you could probably get away with aiming for a much lower number.

Pulling a number out of my arse, I'd guess that training a comparable model will only cost $1-5 million in 12 months time, with the hardest part of doing so once you have the funds being acquiring the training data.


Not yet; I've heard tell of several people having the same idea to train an open model, though, through either crowdfunding or some wizardry with crowdsourced GPUs.

$65 million sounds pretty high though.


Considering an effort to buy a copy of the constitution raised almost $47M, I wouldn't be so sure. [^1]

Worth noting, though, that it isn't just the computing budget that's missing here - it is also (and perhaps even more importantly) the high quality data to actually train the model.

[^1]: https://en.wikipedia.org/wiki/ConstitutionDAO


Some kind of SETI project, but for training a high-parameter-count LLM, would be awesome.


How many people have A100s at home?


The comparison to SETI@home[1] is that you just need the same amount of total processing power, not the same supercomputer setup.

People don't need to own A100s, they just need to be willing to be part of a distributed supercomputer by running a background app that downloads chunks of data, processes them, and sends the result back. The utility comes from having enough people participate (which worked quite well for SETI@home, but helping find "signals from outer space" is a little bit more interesting than "helping train an LLM")

[1] https://setiathome.berkeley.edu/


The fact they are using MoE is interesting. There are a lot of specialised open source models on HuggingFace. You just need an LLM to act as the core "brain" and a few other components.

HuggingGPT works similarly to this. It automatically chooses, downloads and runs the right "expert" model from HuggingFace https://arxiv.org/abs/2303.17580
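
A toy sketch of the idea (in the spirit of HuggingGPT, not its actual implementation): a router maps an intent to a Hugging Face pipeline and runs whatever default checkpoint that task pulls in. The hard-coded routing table here is a made-up stand-in for the LLM "brain".

    # Toy "router picks an expert model" sketch using Hugging Face pipelines.
    # This illustrates the concept only, not HuggingGPT's real code; the routing
    # table stands in for the LLM planner.
    from transformers import pipeline

    TASK_ROUTES = {
        "summarize": "summarization",
        "sentiment": "sentiment-analysis",
        "translate_en_fr": "translation_en_to_fr",
    }

    def route(intent: str, text: str):
        task = TASK_ROUTES[intent]      # a real system would ask an LLM to choose
        expert = pipeline(task)         # downloads/loads that task's default model
        return expert(text)

    print(route("sentiment", "The leaked GPT-4 details are fascinating."))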


I wonder what the legal implications of them using SciHub and Libgen would be if that's true. I'd imagine OpenAI is big enough to make deals with publishers.


Libgen / Scihub or not, if the model can provide details about the book other than just high level info like the summary and no explicit deal with the publisher has been made, you can make a strong argument that it is plagiarism.

Even if bits and pieces of the book text are distributed across the internet and you end up picking up portions of the book, you still read the book.

It is extremely sad but ChatGPT will be taken down by the end of this year and replaced by a highly neutered model next year.


I'm not a lawyer and obviously we won't get any definite answer unless it actually goes to court, all of this is just hand waving and guessing.

But I think that unless GPT starts reciting large parts outside of the context of learning/education/research, reciting smaller snippets would fall into "fair use" and not be illegal.


For it to be fair use, they still have to have legally owned the book (as far as I understand).

You can't steal a book, photocopy some pages, then claim the photocopied pages are fair use.


I think you can. It is a separate "crime". You would get two cases: one for fair use (which, if you are quoting, commenting, reviewing, or generally repurposing content, may in fact be fair) and a second case for license/terms breach and/or illegally obtaining the piece of work (for example, if you stole it from a bookstore).


If you recite enough small snippets, you make a large one.

Especially with ChatGPT you can probe the model by asking certain questions about the material at hand to see if it has seen the entire book.

Also you don’t have to be able to recite the book verbatim for it to have been in your training set. The snippets I am referring to are on the side of the training data


If I read a book and then write a summary, is that plagiarism? What's the difference? I am legitimately not familiar with copyright law, but real lawyers seem to think it is unclear whether training on copyrighted data is illegal (in Japan it's definitely not).


If that's true, then OpenAI has probably taken extreme protective measures to ensure the secret is well protected. Even if OpenAI is big enough to make deals, they probably did not spend several years making deals with all of them.

It will, however, be very interesting to see if they fund efforts to massively (re)start book digitisation.


probably just easier to use drm-free copies of books


We should default to using the thread aggregators instead of using twitter links. My God Twitter threads are unreadable.


"Open" AI, a charity to benefit us all by pushing and publishing the frontier of scientific knowledge.

Nevermind, fuckers, actually it's just to take your jobs and make a few VCs richer. We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

https://github.com/ggerganov/llama.cpp

https://github.com/openlm-research/open_llama

https://huggingface.co/TheBloke/open-llama-7b-open-instruct-...

https://huggingface.co/TheBloke/open-llama-13b-open-instruct...

You can use the above without paying OpenAI. You don't even need a GPU. There are no license issues like with the facebook llama.
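
As a minimal sketch of what "using the above" can look like in practice, here is local CPU inference through the llama-cpp-python bindings; the model filename is a placeholder for whichever GGML quantization you download (e.g. one of the TheBloke files linked above), and the prompt template is just illustrative.

    # Minimal local-inference sketch via llama-cpp-python (CPU is fine).
    # The model path is a placeholder for a GGML file you downloaded yourself.
    from llama_cpp import Llama

    llm = Llama(model_path="./open-llama-7b-open-instruct.ggmlv3.q4_0.bin")  # hypothetical filename
    out = llm("### Instruction:\nExplain mixture-of-experts in two sentences.\n\n### Response:\n",
              max_tokens=128)
    print(out["choices"][0]["text"])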


>> We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

Just to be clear, there's no science being kept secret, because there is no science being done. What OpenAI has built is a feat of engineering, borne aloft by a huge budget supporting a large team whose expertise lies in tuning neural net systems, not in doing science.

Machine learning, as it is practiced today, is not science. There is no scientific theory behind it and there is no scientific method applied. There are no scientific questions asked, or attempted to be answered. There is no new knowledge produced other than how to tune systems to beat benchmarks. The standard machine learning paper is a bunch of text and arcane-looking formulae around a glorified leaderboard: a little table with competing systems on one side and arbitrarily chosen benchmark datasets on the other side; and all our results in bold so everyone knows we're winning. That's as much doing science as is racing cool-looking sports cars.


I'm tired of science as a religion. People treat it as gospel, as if checking off some criteria suddenly makes you "scientific" and instantly grants a sense of validity and authority that you shouldn't logically get.

I judge things as "what you can do", not "what can you predict". The only demonstration of knowledge and understanding is being able to do something. Not predict. Not "scientific method" and ridiculous "peer review" (actually peer pressure), not blind trials and not rigorous statistical analysis.

At the end of the day, you either manage to do something or you don't. So much of so-called science has lost all contact with reality because our judgement of success isn't successfully doing something, it is successfully jumping through "scientific" hoops. Look at string theory and the social sciences. The scientific process, instead of being a tool, became the purpose. It became a stamp of validity to seek. A stamp of validity with gatekeepers in academia, in the peer review process, in the media coverage afterwards, all the way to social media censorship and "fact checkers".

What used to be the frontier of creative people became a stagnant bureaucratic machine worshipped like a new religion. The heretics once burned at the stake became the ones crying heresy.

Enjoy your new brand of science. I'll stick to the older brand of heretics and madmen who did whatever the prevailing orthodoxy told them to avoid doing and thinking, and I'll remind you that the only real reason those people are remembered is that they did something useful, not because of the social traditions they adhered to or the rigorous scientific standards they followed.


Science is not academia. You say you prefer to judge things as "what you can do". Well, how do you judge "what you can do"? Astrologists, homeopaths, podiatrists, Christian scientists (!!!) and other such "heretics and mad men" rejected by the scientific establishment, will all tell you that they "can do" stuff, and so will all their many paying customers. How do we know they can't do what they say?

Because science gives you the tools to know that you're wrong. If you're a good scientist, you will be wrong _all the time_. That's how science advances: one mistake at a time. But you can't make mistakes if all you ever do is do stuff with computers, like beating all the benchmarks, because that is a meaningless result judged by its own, self-chosen measure of success that can never fail; and so can never inform.

Peer review is also not science, but it has been great to catch errors in my papers. Not in conferences, mind. I try to stay away from conferences. Everybody flocks to conferences because of quick turnaround, instant gratification. Journals have the good reviewers who can take their time understanding your work and helping you find where you've gone wrong. "Reject with encouragement to resubmit" is the best review result I ever got.

>> I judge things as "what you can do", not "what can you predict".

The goal of science is not to make predictions, but to understand how the world works, and why. Put that into instrumentalism's pipe and smoke it.


Placebos are known to work. I neither overestimate nor underestimate them. I understand them for what they are: placebos. And some people actually need them.

Moreover, in social contexts, religion also "works" in many senses. A person asked me what he could do about depression and a feeling of meaninglessness. Science would prescribe anti-depressants and medicate him; he would have side effects and the problem would never be cured. I suggested that if he were religious, the meaninglessness would immediately go away and he would need no medication, no external interventions.

So again, judging religion by "what it can do", it can successfully give people meaning, which is quite a non-trivial thing that science fails to do for many people. So religion can be presumed to be acting on some truths about what drives people and gives them meaning. Religion has successfully demonstrated an understanding of that; you don't need statistical analysis or peer review to see it, it's plainly obvious. That knowledge just doesn't carry over to other things, like how celestial bodies move, and there's no reason to think it does.

I love how you're the spokesperson for "science".

Ironically, when you look at how you describe the merits of peer review, you sound quite like an instrumentalist yourself.

Also quite ironically, the new science religion keeps pushing this funny narrative that science is about "being wrong all the time", which is the complete opposite of scientific history. All of established science is just the successes. Look at the great physicists. There is no mistake in Newton's laws. Or in Einstein's. They are just as valid today as they were back then, in the areas where they were demonstrated and studied.

How did we reach this apologetic science? "A good scientist will be wrong all the time"? That sounds like gaslighting scientists in a failed system rather than an accurate historical statement. Science advances because someone eventually succeeds. In the grand scheme of things, there are no mistakes. There are corrections; there are expansions of the domain of applicability. But the mistakes are sent to the trash bin of history, and if a failed system has decided to gaslight you into rationalizing failure, stop listening to it and start seeking success.


Sorry but I don't want to continue this discussion if you're accusing me of gaslighting others, and of being gaslit myself. Or of pretending to be a "spokesperson for science". I was hoping to have an honourable exchange.


>There is no mistake in Newton's laws. Or in Einstein's. They are just as valid today as they have been back then, in the areas where they were demonstrated and studied.

A model being incomplete is just another way of saying it's wrong.

The models they came up with are wrong, but still useful.


> science gives you the tools to know that you're wrong.

And whence the tools to judge that?


> In the end of the day, you either manage to do something or you don't.

The hard sciences would like a word.


Taleb wrote well about that; it changed my view on science. Not that science should be discounted, but it's not the single source of truth. https://twitter.com/nntaleb/status/1419843561286160397


This is the reason why I left academia for startups... Seems like a better way (not perfect by any means) to innovate, actually doing things.

PS: I would be happy to connect, you can find my socials in my bio.


You would love Thomas Kuhn.


Science as a rear-view mirror.


Just add epicycles.


Add enough epicycles and you've got a Fourier transform... Epicycles were a great idea but applied for the wrong reason.
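
A purely illustrative sketch of that point with numpy: any reasonably smooth closed path can be rebuilt from a handful of "epicycles" (Fourier terms), and the error shrinks as you add circles. The curve here is made up, not any real orbit:

    import numpy as np

    # A lumpy closed loop in the complex plane (illustrative only).
    t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
    path = np.exp(np.cos(3 * t)) * np.exp(1j * t)

    # Each Fourier coefficient is one epicycle: a circle of radius |c_k| spinning at frequency k.
    coeffs = np.fft.fft(path) / len(path)
    freqs = np.fft.fftfreq(len(path), d=1 / len(path))

    def reconstruct(n_epicycles):
        # Keep the n largest circles and sum their rotations at every time step.
        biggest = np.argsort(-np.abs(coeffs))[:n_epicycles]
        return sum(coeffs[k] * np.exp(1j * freqs[k] * t) for k in biggest)

    for n in (2, 8, 32):
        err = np.abs(reconstruct(n) - path).max()
        print(f"{n:3d} epicycles -> max error {err:.4f}")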


Compressing millennia of astronomical observations into a compound of three or four sinusoidal waves whose formulae could be carried and calculated in a pocket book?


Oh, they worked really well; they were still used for numerical calculation even after Galileo. But they weren't a "demonstration of knowledge and understanding" of planetary motion.


Understanding is a moving target, though. Newtonian mechanics was incorrect for modeling the solar system as well - as it was eventually superseded - but that doesn't mean it wasn't a scientific understanding of planetary motion.

Epicycles gave a better description of the movement of the planets, based on the observations available, which was entirely falsifiable. They were eventually superseded, and that's science at work.


Interesting to say it's scientific because falsifiable. The objection is that it wasn't a theory, just fitting a function to data. It did "work" in that it captured some pattern: it was extremely good at generalizing/extrapolating/predicting. And was a "model" of something in the data. But there was no operational model behind it, of what was actually happening.

Newtonian mechanics has a model, beyond curve fitting.

BTW: I made this analogy between LLM and epicycles as a joke, but it's looking strangely isomorphic...

It's funny, because epicycles used to be the poster-child for ad hoc, overcomplex models with no conceptual basis... which is an exact match for LLM... but I don't recall the analogy being made. Not even in https://norvig.com/chomsky.html


This is incorrect: Epicycles were a model of planetary motion - the theory was that the planets moved around the earth, but also had additional circular motion as they moved along their path around the earth. This model explains the apparent geocentric motion of the planets, much as Newtonian gravity explains the apparent heliocentric motion of the planets (but is also wrong). Finding the exact parameters for the epicycles was the curve fitting part.

We now discount that model because it's based on an incorrect geocentric model of the solar system, but that doesn't mean that the model wasn't a model...

[edit] The Chomsky link is interesting - I just listened to an interview where he pooh-poohs LLMs at great length.

This point: "Statistical models have been proven incapable of learning language; therefore language must be innate, so why are these statistical modelers wasting their time on the wrong enterprise?" is interesting in that context - in the recent interview, Chomsky is now unhappy that LLMs can learn /any/ language, even unnatural ones, and therefore aren't good tools for understanding human language. Quite the reversal. I personally think it's a 'science progresses one funeral at a time' kind of situation...


At some point I'd like to carefully study the history of that early era of science, because I don't know it as well as I'd like. But I believe I understand that the epicyclical model (and it was a model, rather than a theory) did not in any way depend on geocentrism. For one thing, Copernicus's model itself, while heliocentric, retained the epicycles of the earlier, geocentric model. Instead, the assumption on which the epicyclical model depended was the shape of the planets' orbits and of the planets themselves, which were considered to be necessarily circular and spherical, respectively. I think this had to do with assumptions about the geometric perfection of the universe, as a creation of the gods. In any case, assuming that planetary orbits were circular, an explanation was needed for the apparent "retrograde" motion of the planets (meaning it looks like they double back and turn against their original heading). Explaining this apparent motion was why epicycles were hypothesised in the first place.

The first time this necessarily circular model was abandoned was with Kepler's laws of planetary motion, which correctly identified the motion of the planets as elliptical and for the first time explained their apparent retrograde motion without the need for epicycles. Then Newton's theory of universal gravitation explained how the planets could possibly be moving on elliptical orbits. In fact, I believe Newton's theory also explained how the planets could be turning around the sun without crashing down, despite not having anything to hold them up. I think this was the first big mystery that the ancients tried to answer - hence the name "firmament" for the universe.

And Newton's theory was not the end of the story of course.


Chomsky hasn't made any reversal; Norvig misrepresents what Chomsky has said.

> Chomsky is now unhappy that LLMs can learn /any/ language, even unnatural ones, and therefore aren't good tools for understanding human language

That's also not what he has said: he said they aren't useful for understanding the human language faculty, in other words, understanding how people are able to have language. As he says, it obviously can't be the same way as LLMs, because LLMs are able to learn languages humans can't learn.


I realize I've been repeating shibboleths from my postgrad without full understanding.

A problem with epicycles is they are harder to falsify. If the model doesn't match new observations, just adjust it, or add another epicycle. In contrast, Newtonian gravity can hardly be tweaked at all. So when Mercury's orbit was slightly off, they knew something was wrong.

I'm not quite clear on how I feel about this. Geocentricity is a theory, in broad terms. It seems disrespectful to say adding epicycles makes it "not a theory". As you say, there was a theory that the planets actually moved in epicyclic motion. It wasn't just calculation to them.

(I want to stress that the idea of epicycles, the mechanical craftsmanship, and actual prediction of the planets are all amazing genius.)

Yet, having more parameters than data means the model doesn't explain in simpler terms, only restates. In this sense, it's "not a theory" (by Occam's razor). It seems enough epicycles can model anything: (3Blue1Brown Fourier Series) https://youtube.com/watch?v=bL0LV0Huj1s OTOH the epicycles did predict planetary motion, so they did capture some regularity... not sure what to think.

RE Chomsky: You can see it's like epicycles: with enough parameters, an LLM is like a numerical method for curve fitting, that doesn't explain the data (any more than a fourier transform does). Curiously, they do seem to predict very accurately... yet also generalize strangely ("hallucinate"). What to think?

But it seems it's got to help! Even if only as a device, like a telescope. Also, from this interview with Terry Sejnowski (https://youtube.com/watch?v=XKC-4Tosdd8 - 3 hours!) there are instances where a technique was developed for a problem with neural nets, and an equivalent was found in the brain.

He also gives a Chomsky-like view: if you duplicated the human brain and it worked perfectly, you wouldn't have done any science if you didn't understand anything.

BTW Chomsky's point E (which I'd never heard of), the last and most minor, was based on Gold's work.


> BTW Chomsky's point E (which I'd never heard of), the last and most minor, was based on Gold's work.

Do you know where Chomsky says this exactly? I've been looking for how/if his argument is the same as Gold's.


>> RE Chomsky: You can see it's like epicycles: with enough parameters, an LLM is like a numerical method for curve fitting, that doesn't explain the data (any more than a fourier transform does). Curiously, they do seem to predict very accurately... yet also generalize strangely ("hallucinate"). What to think?

Well, that's the fundamental problem of modelling: that for any set of observations there's an arbitrary number of models that fit the data with great accuracy and even predict future observations well; and we don't know which one is the best in the long term.

The answer is that we should prefer not predictive models, but explanatory theories, that not only predict future observations but also explain why those observations should be expected to be made.

For example, the epicyclical model did not explain anything: it said nothing about why the planets should move on circular orbits with epicycles. Kepler's laws didn't explain anything because they didn't say why the planets should move on elliptical orbits. Newton's law of universal gravitation explained it all in one stroke: because gravity. And that's why we consider Newton the greatest scientist of his era, not Kepler, not Copernicus, not Galileo, but Newton, because he explained the world and didn't just describe it.

Ultimately the advantage is, like you say, that when an explanatory theory fails, we can better know why. When a predictive model fails, we have no clue.

>> BTW Chomsky's point E (which I'd never heard of), the last and most minor, was based on Gold's work.

Gold's negative learnability result was a huge upheaval that led directly to the current paradigm of machine learning. Chomsky used it to support his argument about the poverty of the stimulus, but linguistics was only one of the two fields that Gold's result turned upside down.

And it was a negative result. As I say in another comment, science gives you the tools to know when you're wrong and that's how progress is made, when we find out where we were wrong before.

With epicycles, it took almost two thousand years before we figured out where the model was wrong. Let's hope that it doesn't take that long with LLMs and neural nets also, because I doubt we have another couple thousand years to spare on a wild goose chase.

>> (I want to stress that the idea of epicycles, the mechanical craftsmanship, and actual prediction of the planets are all amazing genius.)

The epicyclical model persisted for so long because it was so good, and because there was nothing better. It is common for people who don't understand science to look at scientists of the past with derision and think they weren't even scientists, but for almost two thousand years, astronomers did exactly what a scientist must do: they accepted the best available theory, even if many of them hated it with a burning passion (and they did!). If it wasn't for the ancients stumbling and fumbling in the dark for millennia, we wouldn't today be enlightened and we owe them every respect.


> not predictive models, but explanatory theories

Maybe an advantage of an explanatory theory is in revealing more of the "black box", giving more ways to check the theory. (But I'm not sure how this could apply to Newton's gravity, since the only observations were outcomes. And no plausible way to "experiment".)

> If it wasn't for the ancients stumbling and fumbling in the dark for millennia

Is there any evidence that the epicyclic models helped scientific understanding, even indirectly? Later theories didn't seem to build on it. I wonder if it actually detoured understanding, with its misleadingly impressive accuracy, so that understanding would have progressed more quickly without it.

Thinking of pg's "great work" (https://news.ycombinator.com/item?id=36550615): to be the Newton of neural nets would seem the most ambitious aspiration of our times. But it took a bunch of geniuses just to get to Newton... and it seems an even harder problem than planetary motion. Though a difference is neural nets are based on actual neurons (loosely!).

It's looking like working human-level AI will precede understanding... perhaps by those 2000 years?


The gravity model is similar though: we posit a force that pulls things together, but we don't know /why/ that force seems to exist, no more than the ancients knew /why/ the planets seemed to move in smaller circles along their circular paths. We're really not /that/ enlightened, after all.


I think that's right, but ultimately all explanations we have are based on prior knowledge that is itself not necessarily complete. It's explanations all the way down, until we hit some primary observations or axiomatic assumptions that are the hardest to get rid of.

"Enlightened" was my bad choice of a word. I get overexcited when I think of how much we have learned in the past couple thousand years and I forget that we mainly learned how little we know. Or can explain!


Yes, ultimately it's also descriptive.

I'd like to think explaining means giving a model simpler than the observations. But this also can be true of a purely predictive model, that offers no "why". Another commenter pointed out that epicycles do simplify - so they do "explain" in this sense.

What defines an "explanation"? What makes something a "why"?


> Chomsky used it to support his argument about the poverty of the stimulus, but linguistics

Do you know where Chomsky refers (directly or indirectly) to Gold? I've been searching for a reference for some time.


No, I'm sorry. I'm not a linguist so I only know the relation between Gold's result and linguistics second-hand. I'm more interested in it from the point of view of inductive generalisation in machine learning; that's my schtick.

Just to make sure I didn't hallucinate all that, I had an admittedly perfunctory search online and I could find this paper:

https://proceedings.neurips.cc/paper/2002/file/04ad5632029cb...

Whose introduction describes how Gold's result is considered to support the arguments for linguistic nativism from the poverty of the stimulus. Then again, the author doesn't seem to be a linguist himself and he doesn't give any more specific references, so I'm now a little worried; and your question remains unanswered.

Have you tried wading through Chomsky's early work on linguistics? I don't have the courage to. The closest I've got to is I have a friend who has read a couple of Chomsky's linguistics books. My friend is making a living as an astrologist now so maybe that's a bit of a warning there :P


(not who you asked) I thought this would be in the linked transcript, but it's not. Norvig must be getting it from elsewhere (maybe in the 404ed video?), but it seems like misrepresentation.


I don't think the problem was that it wasn't a "model"; it's that it had lots of unexplained "plug-in" behavior. The heliocentric model had much less; it explained more using less, despite being less accurate.


Nobody's arguing that the heliocentric model wasn't better for a variety of reasons... simply that getting superseded doesn't make the previous approach 'not science.' All models are wrong, some are useful, and all that.

And the heliocentric model still has plenty of unexplained parameters: the major and minor radii for each body. (Not to mention those pesky perturbations in Mercury's orbit.)


Sure. But remember also that the standard “science” is largely a broken model. Academia sucks. It’s honestly bad. It’s overflowing with papers that are misleading, irrelevant, or fraudulent due to a mix of poor stats knowledge, bad incentives, and a failure to organize effectively.

While people praise the scientific method, the majority of achievements we attribute to “science” are not derived from guess and check grad students doing their thing.

The standard paper is not very good. And a lot of the reason is that our scientific model is largely designed to find tiny effect sizes. Which is fine for like… medical stuff. But the juice is in large effect sizes. Stuff so obvious you don’t really need the stats to evaluate it. It doesn’t really matter if you followed the scientific method or not when you discover penicillin. It just works, clearly. And you can demonstrate it working again.


Right. In historical context the scientific method was a huge improvement over the previous method of understanding the world, which was mostly religion.

But today, academia is a victim of organizational and political capture, making it less competitive for talent.


People struggle to differentiate the scientific method and the lifecycle of academia.


In some ways academia is akin to a religion or cult that sprang up around the scientific method, and the cult has now diverged as far from the seed as far-right Christianity has from Jesus's teachings.


Oh come on, have some perspective: academia has millions more people to slaughter before they catch up with the cult of Christianity.

https://apholt.com/2019/01/30/death-estimates-for-the-crusad...

John Shertzer Hittell– Estimates 1,000,000 total dead for the crusades to the East covering the period from 1095 to 1291.

“In the two centuries of this warfare one million persons had been slain, but it had not been without some compensations.” John Shertzer Hittell, A Brief History of Culture (1874), p.137.

[...]

John M. Robertson– Estimates 9,000,000 total dead for the crusades to the East covering the period from 1095 to 1291.

“It is a reasonable calculation that in the two centuries from the first crusade to the fall of Acre (1291) there had perished, in the attempts to recover and hold the Holy Land, nine millions of human beings, at least half of them Christians. Misery and chronic pestilence had slain most; but the mere carnage had been stupendous.” John M. Robertson, A Short History of Christianity (1902), p. 278.


You might be stuck in an argumentative mode of thought, it seems like you replied to an interpretation of my comment that left out some salient details.


If you have some salient details about millions of people being slaughtered by academia, then please post them!

Otherwise my point stands that your comparison is ridiculous, because religion has killed millions more people than academia, despite Jesus's teachings and the "Thou Shalt Not Kill" commandment.

Religion has diverged vastly and diabolically further from the teachings of Jesus than academia has from the scientific method. There's no comparison.

You're being extremely anti-intellectual while whitewashing and rationalizing the slaughter of millions of innocent people, misery, carnage, and chronic pestilence, if you think academia is anywhere near as bad a cult as Christianity.


He compared it to a cult. Not all cults have killed millions of people. But many cults do stray very far from the original stated reason for the cult existing. For example some cults will treat their members like free labor and claim the goal is to make everyone happy through hard work and lack of worry about money. But then the leader will get a lot of money and start buying nice things for himself.

Buying a nice car with other people's labor isn't as bad as killing millions of people but it is straying very far from the original declared mission.

You seem to believe that murder is the metric that can be used to measure how well an organization is doing as far as following their mission.


I remain convinced that the correct model for science is a) serious hobbyists and b) patronage for the really talented. It would weed out the careerists, at least. You just can't make a "science factory" and expect to produce discoveries like cogs. It's stupid to even try.


I think you can. That’s basically corporate research teams.

The problem with academia is that it’s comprised of people who are held hostage until they find something. Just terrible incentives.


I think the search for tiny effects, and the belief that they should drive treatment, is one of the biggest things wrong with medical research. Doctors will put millions of people on a pill that hits a vital metabolic pathway, for life, based on tiny (to my thinking) but statistically significant results (i.e. results that would occur by chance alone less than 5% of the time).

These days I only believe in large effects, like smoking causing heart disease.


Medical standards aren’t that bad. It’s just a complicated system with lots of interacting effects that only work for specific circumstances. Coupled with the fact that many medicines are not cures, but something that makes you die slower.


Pretty sure the idea of scaling is under test here; just because it's relatively straightforward to use larger models doesn't make it less of an application of the scientific method. Also, process supervision and devops are experimental for a startup like OpenAI, and mixture of experts really hasn't been used with MHA models at scale before.

I'd argue it's pedantic to assume all science has to be completely novel or revolutionary. "More neurons good" is a perfectly reasonable reason to experiment. I think you're being a bit "snooty" in setting such a high bar to gatekeep science…


> Machine learning, as it is practiced today, is not science. There is no scientific theory behind it and there is no scientific method applied. There are no scientific questions asked, or attempted to be answered.

Total horseshit. There are tons of scientific papers on ML published. In fact it is MORE like traditional science than typical CS, because it is trying to reverse engineer how something we encountered in the real world works. We know NNs do amazing things, and we don't fully yet understand how.


{Hypothesis, test, loop} is the scientific method, and I can guarantee it is being used when fine tuning an LLM.


That's a common interpretation of what science is, but it largely ends up being driven by confirmation bias. Because how do you know your hypothesis and test are even really connected? Or what you're seeing is a cause and not a correlation? This is why you need arguably the two most important factors in "real" science: predictability and falsifiability.

Predictability means that if your hypothesis is correct, then you'd be able to formulate other improbable (and ideally not-yet-tested) predictions from it. And falsifiability means that if these predictions fail to occur, then your initial hypothesis was also almost certainly wrong. So for instance Newton's hypothesis was that gravity was driven by a mathematical relationship between the masses of and distance between two bodies. It was good science because it led to the shocking ability to dramatically simplify orbital dynamics and create a complete predictive system of these bodies. It was even used to mathematically discover a completely unknown planet - Neptune.

Incidentally, his theory could also be shown to be false if any of these unexpected predictions turned out to be wrong. And that's actually exactly what happened. Observation of Mercury's orbit about the Sun showed it was off by about 43 arcseconds (roughly a hundredth of a degree) per century, relative to what was expected. And it's from there that people knew there was a mistake, one which would only be explained by Einstein hundreds of years later, who hypothesized a system with far more absurd predictions... and so the story continues.
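
For reference, the relationship Newton hypothesized is his law of universal gravitation; in modern notation (with G the gravitational constant):

    F = G \frac{m_1 m_2}{r^2}

i.e. the attractive force grows with the product of the two masses and falls off with the square of the distance between them.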


We've got lots of datasets at this point in history - it ends up being pretty reasonable to formulate a hypothesis, propose a change to the ML system, then see how it does on a host of datasets. Most papers focus on one or two datasets, getting some initial improvement going from train-to-test, but the successful ideas end up transferring to a wide range of contexts. The extension to other datasets + contexts constitutes the prediction of performance on other systems.

And sometimes (often?) techniques do fail to transfer! IME, it's only the simplest ideas which do transfer to new contexts well - there's a lot of dark heat in pushing Imagenet another hundredth of a point, which you realize when you try to take incremental 'new SOTA' papers and apply them to audio problems.

But the things which /do/ transfer constitute real advances. And that's (drumroll) science! Sometimes things look good initially and get falsified. Medicine is still science, even if drugs get disqualified in phase iii trials...


ML has both of those things. In fact the situation is better than in many other sciences because the evaluation is completely digital. When you run an ML experiment you get back an objective measure in minutes or hours telling you if the model performed better or worse. If it's worse, boom, falsified. Many current papers are making predictive testable claims -- that based on various things we know what ideal parameter values are, the types of problems current NN archs will perform well on or poorly on, etc.


That's a necessary but insufficient condition for something to be considered scientific.


It is not the scientific method at all. You didn't even include an analyze or a publish step, two critical components to the scientific method. The point is that science is a thoughtful, methodical, recorded, repeatable process that is scrutinized by not just your friends but by the entire world, including those who compete against you. In science, it's normal for your competitor to recreate your experiment exactly to see if you did it as well as you claim. Or to push it further.

What OpenAI is doing doesn't even resemble science at all.


Umm.. no. I'm pretty sure when science was "invented" (whatever that means), there was no way to share your research with "the world". You wrote it down in a book if you cared to reproduce it.

Stop gate-keeping science. Observe, Predict, Test, Repeat. That is science. Kids do science when they watch adults do things. Just because you aren’t sitting in a lab publishing things no one reads doesn’t mean you can’t “do science.”


Simply learning something isn't "doing science." The scientific method is a natural learning process, but it's not the only thing that defines science. This is incredibly light gatekeeping. It's just maintaining a basic professional definition.


If a child watching an adult is "science", then when I brush my teeth I'm a dentist.

Words have meaning and when you dilute them down this far the only result is the destruction of communication.


Words do have meaning.

den·tist ˈden-təst : one who is skilled in and licensed to practice the prevention, diagnosis, and treatment of diseases, injuries, and malformations of the teeth, jaws, and mouth and who makes and inserts false teeth

Brushing your teeth does not imply you are a dentist.

sci·ence noun 1. the systematic study of the structure and behavior of the physical and natural world through observation, experimentation, and the testing of theories against the evidence obtained. "the world of science and technology"

A child may not be a good scientist, but they observe, experiment and explore their ideas. That sounds like science to me. Observational learning is a very common way scientists begin to understand things.


The publish step of the scientific method always seemed out of place to me - it means you can't discover anything on your own, which doesn't seem correct to me.

However, to discover something new (not just new to you) does require contact with established knowledge. And if you don't publish, you can't be very sure that it really is "new"; and it does not become part of established knowledge - it has not been discovered.

i.e. the concept of discovering new knowledge means it cannot be done on a desert island.


Do not forget that with ANNs we are still dealing with technologies with obscure sides: function approximators where the original function, and what it means, is oftentimes not reconstructed.

Replication of the experiment, "verification" of the law, can still be within a largely preliminary side - the steps that describe behavioural details, patterns over "protocols" - without the explanatory side (that which causes understanding, e.g. enabling deduction).


The reason that this is Engineering as opposed to Science, is that the hypothesis is just, "hey, maybe this will work".

Nobody has really explained why it works.


Ask Feynman why magnets works... And he'll say no one knows. https://www.youtube.com/watch?v=36GT2zI8lVA

It's the same in ML. Good predictions do come from real ideas about how the systems work - ideas in information theory, entropy, cognitive science, statistical mechanics and so on - formulated in a context of our existing understanding and prior results. That's science.


Feynman was a genius. It takes a genius to say something like that, and it's surprising how many "scientists" are upset by statements like that, because they are so proud of their hard-won capabilities in modelling. Their curl and divergence operators, their Maxwell's equations... they forget that the map is not the terrain, and the universe is, at its root, irreducibly mysterious, and the fact that we understand anything about it at all is gobsmacking.


>irreducibly

What are you basing that on?


Which is a fundamental flaw in science.

Most of AI academics have spent their career theorizing complex algorithms or complex explanations of intelligence.

But the engineers have built large enough Neural Networks to give us data points that show intelligence is emergent out of relatively simple components.

Unsurprisingly, the people who believed they were the smartest were the least likely to explore the possibility that human intelligence isn't general, but specialized.

Echoes of the academics building geocentric models of the universe centuries ago.


This is my opinion, so take with a grain of salt:

I believe it's entirely possible that the scientific method breaks down past a certain level of system complexity defined somehow by thermodynamics.

This would in part be due to the infeasibility of running the proper experiments to understand the effects of single variables when tens of thousands of variables might be changing at the micro level.


I think the colloquial understanding of "science" means studying how the world works. This is more like engineering: exploiting our understanding of how the world works to build new capabilities that we don't fully understand.

I don't think you would call it "science" for a bunch of single cellular organisms to cooperatively evolve a multicellular one. Similarly you wouldn't call it science when humans create digital lifeforms that require actual science to be done to understand how they work.


...but in a technical context, not in a scientific one.

It is one thing to identify (Copper Age, Bronze Age) the best ways of smelting ore to obtain the metal through trial and error; it is another to try to understand the nature of materials.


Corporate wants you to find the difference...

That is to say, the two things you mention are the same process. "Identifying the best ways of smelting ore to obtain the metal through trial and error" is the easy part, when you get to pick low-hanging fruits in a field. But as the easy options get cleared out, continuing improvements requires increasingly complex, sophisticated methods - that's where the process transitions towards "trying and understanding the nature of materials". It's still the same process, but couple layers of abstraction up from the original "how to get better stick than neighbor and beat them instead of getting beaten".


The difference, while moving in "interdependent" directions, is in the purpose: obtaining some sufficient information on how things work versus an actual consideration of the nature of things.

It is not really (fully) the same process, because you could (in theory) "early stop" when you have achieved technically sufficient competence - the description of the optimal process - before the jump to understanding. Something works - that is the technical side; why it works is the scientific side.

Note that the approach has controversial sides: take Newton's "hypotheses non fingo" - he refused that jump explicitly:

> hypotheses, whether metaphysical or physical, or based on occult qualities, or mechanical, have no place in experimental philosophy

I.e.: Newton proposed an «experimental philosophy» which stopped short of a "true understanding of the reasons behind phenomena", in order to avoid contexts in which solidity (at his time) could not be expected. "Understanding" as "identifying universal laws" was the best that seemed affordable.


I wanted to address that in the second paragraph, which I ultimately deleted before submitting, because I couldn't phrase it right. But since you brought it up: I'd consider this an effect of specialization.

In the process of improving your object-level "how things work" goals, you end up generalizing and stacking increasingly complex theoretical models. Soon enough, you end up with people working high up the stack - not knowing or caring about the initial goals. Those people end up growing the "mound of knowledge" both upwards and sideways. The work is sort of self-justifying, but really, it's also self-similar. Where early materials science may have been driven by, say, desire for better/cheaper weapons, soon enough, you have people doing materials science because they desire to solve a puzzle. Whether the practitioners are smelting different combinations of ores to find one that will win them the war, or they're mixing up different kinds of equations to figure out a clean solution to a theoretical conundrum - it's the same process, same motivation. And it always involves play.

The kind of methodical, boring approach, with hypotheses and control groups and peer-reviewed papers? That's the boring part you have to do after play.

See also (with no implied judgement in this context): software developers that lose sight of (or care little about) business goals, and instead aim for theoretical markers of what "good code" is, and/or solve abstract puzzles of algorithms and architecture. Or the MBAs that view companies as abstract money-printing processes, running them by the rulebook that's entirely independent of whatever it is the business is actually doing or selling. Both are cases of growing complexity creating a new field of work that's independent of what brought it into existence.


> an effect of specialization

It is also a specific purpose, a deliberate endeavour. We are in front of a world, we have to digest its phenomena, we try to understand it - it is a primal process, i.e. mental digestion. In some ways it could be argued that the "scientific drive" is anterior to the technical one: first you assess, then you act. Consistently, science started as "natural philosophy".

It is relevant, in the framework of "scientific drive as (development from the natural process of) mental digestion", that terminologically "science" is not just "the realm where false statements start to exist (and are tried to be avoided)" ("discriminate" - ex scindere), but also, before that, an activity of sorting, of "separating" conceptual elements - mental digestion. The "scient" (as a participle of scire, used as "to know") is a "distinguishing" mind.

We not only «desire to solve a puzzle» - it is our nature (as per the above) as minds dealing with a phenomenical world we need to grasp, as part of the activity of "seeing" it, already before action.

«Play[ing]» is a process that both allows to refine knowledge of the world, physically (in the physical play), and to refine the grasping of the world, mentally (in the mental play).

> growing complexity

Or just abstraction, enabled by a refined framework (following which, other areas become primary, topical - in the synthetic effort which is part of the above said process).


Those steps are both necessary and usually take decades to build understanding.


Next time we get the urge to complain that GPT-n is just applying patterns seen in the training corpus we should remember humans don't do much better. We are just language agents with rich feedback from outside. We can't even write software top-down in one go without running the code.


Incidentally, that opinion is also offensive. Not just some tossed "All X are Y, goodnight", but a loaded one.

If you happen to conjure thoughts and let them roam wildly, well don't. Make an effort. Critical thinking as a conscious intentional effort must have plenty of literature. The "feedforward" output, the hallucination, is not a dish for service.


> patterns

Except we consciously (intentionally) work on those matchings.

> are

Opinions should be offered in proper places. Polling is not debating.


But the activity of "attempting to find explanations" is specific.

The attempt to formulate laws (the "better-faring hypotheses") can lead to "universal laws" or to more humble "patterns". It will contribute to the scientific effort - it does not exhaust it.


I don't think things are quite as dire as that. I have experience as a machine learning scientist, have actually never worked on the kinds of papers you're talking about, but have helped apply machine learning to scientific problems. First in a systems biology lab for tasks upstream of drug discovery, and then in industry for credit risk modeling. I've mostly just used classical machine learning or Bayesian machine learning models though.

My interest has always been in using machine learning as a tool to help understand some underlying phenomena, not in trying to push to the top of the leaderboard on benchmark datasets. I think this kind of attitude isn't uncommon in academic labs focused on doing real science, though the methods used aren't necessarily state of the art, and ML typically plays only a supporting role.

Curiously, I'm going to present at a conference later today and will give a toy example showing why those leaderboards are not necessarily reliable for distinguishing between the quality of different models.


> racing cool-looking sports cars.

I like the analogy attempt but there's a lot of science behind Formula 1


> as it is practiced today

Reminder that such a situation - science being out of focus, subordinate to the attempt to obtain practical results "empirically" - is not necessary but contingent.

The effort towards "science" is just postponed and/or relegated to other researchers in the same field.

And anyway, it is not that the effort towards "explanation" is completely absent. The situation is that many are working on the prospect of achieving big results through bets.


Of course, 99.9% of science is engineering, not "science". We do science all the time without asking questions or attempting to come up with generalized answers.

Is a scientist doing a linear regression not doing science?

Anyway, AlphaFold, while not particularly "scientific", did answer one scientific question: it is possible to predict the structure of most proteins through a combination of limited structural and extensive sequence information, combined with a sophisticated (and "non-scientific") algorithm. That was an open question for some time, and their results convinced the community that their methods were right. What's amazing is that while it's entirely nonscientific, the results have been an absolute blockbuster in the scientific field. And even better, the only reason AlphaFold was able to show this is because there was a well-defined protein structure leaderboard.


I love this diatribe! But to be fair, this kind of engineering historically can at least lead to science.


Still, the people who developed ML algorithms like neural nets, linear regression, decision trees, SVM, Naive Bayes, KNN, etc. - weren't they doing science?


There is actually an old argument about whether Computer Science is math or science. The lack of hypotheses and physicality drives the "it's math" camp. I tend to fall into that camp. Most computer science is either math or applied math.

That said, most of machine learning is not just engineering. If anything, there is little engineering done at all.


Yeah, to my newb eyes, the core of machine learning can be (and probably is) as scientific as any mathematics gets (granted they try novel ideas rather than just applying old solutions). OpenAI now sits more on the scaling/engineering side to build its business.


The note is interesting and, in a way, a time-consuming puzzle,

but do not forget that the original poster's apparent point is the lack of an approach beneath the implementations of the techniques mentioned above.

The techniques you listed substantially expanded our knowledge ("oh, we can do this ... and solve more problems") while raising few further questions.

The implementations under discussion give results but raise many more unanswered questions.

(This post is unfortunately an "immature" reply - just a provisional tentative point. The matter of "ML algorithms ∩ science" requires more time and concentration and consultation of the texts of the Great Ones.)


>> but do not forget that the original poster's apparent point is the lack of an approach beneath the implementations of the techniques mentioned above.

That's right. I think the approach that is lacking is best explained in the submission guidelines for the Journal of AI Research:

>> Papers describing systems should clearly describe the contributions or the principles underlying the system. Papers describing theoretical results should also discuss their practical utility. In general, it should be clear how the work advances the current state of understanding and why the advance matters. Papers should report on what was learned in doing the work, rather than merely on what was done.

https://www.jair.org/index.php/jair/about/submissions

Doing stuff is not science, even if it's doing stuff that's never been done before. In science, we experiment to understand why stuff can be done, and how, not just that it can be done. And of course it matters what is being done, so for example experimenting to find the best combination of spices for a dish is not an instance of science.


It's like asking whether a carpenter is a scientist because they developed a cabinet...


I hate to break it to you but that is part of science. Perhaps the major part of it too. Hypothesis: I can build a cabinet with these materials which will bear some load range. Experiment: I built it and it obviously works.

Now change "cabinet" to a particle accelerator that a giant team of other theorists and engineers designed. Am I not doing science by participating in building it? So experimental scientists aren't scientists?


To me, you're describing the differences between a cook and a chef. Just because you've built something with well-known methods doesn't make you a scientist. You didn't come up with anything new. In fact, we're starting to sound a lot like Apple. Apple is (in)famous for taking ideas that someone else did all of the hard work of developing and proving to work, and then combining various ideas like that into an actually useful, working something. (They also do a lot of pure deep research as well, so don't think I'm going too far with the analogy.)



Semantics, I say. Cooks are "chiefs" of certain things; otherwise they'd have no value on a team. Unless you're saying that chefs setting the menu makes them more of a scientist. At worst you're saying, e.g., that scientists are like chefs in that they used their creativity to make up the Standard Model. At best you're saying chefs are theorists and cooks are experimental scientists, which is, again, my own point.


I'm saying that chefs are more likely to be aware of what is happening during the whipping, baking, cooling, etc., and why it's important, versus just doing it as a step. So when they experiment with the menus, there's a bit more than basic understanding behind why they think the experiments might work. Cooks are just the kids in chemistry class following the directions, and depending on how well they follow the directions (and, to a point, how well the instructions are written), they might not blow up the lab (or burn the m.f. soufflé).


Says who? Cooks at a fine restaurant had better know everything about their part, but they chose to work under a greater chef because their eventual upward mobility as a practitioner is higher. Literally, "chef" means leader or boss; in the rest of the world that usually means less specifically qualified. It's not likely here that a chef isn't a good cook, but a good cook certainly has the ability to set a menu if they wanted to. You're not making a strong argument, imo.


Well, Wikipedia might be wrong, but this is how they describe Computer Science: "Computer science is the study of computation, information, and automation. Computer science spans theoretical disciplines (such as algorithms, theory of computation, and information theory) to applied disciplines (including the design and implementation of hardware and software). Though more often considered an academic discipline, computer science is closely related to computer programming."

You might argue, though, that Computer Science is not "real" science.


I honestly feel like that, as a "Software Engineer". As the digital world is new, and full of concepts similar to the old analogue world, it's natural that we borrow names along with the concepts. Similar to how a lot of things in the sea are named after land things, with a marine prefix, like the sea horse, sea star, sea cucumber, and so on. And now in IT we have engineers, architects, rockstars, tribes, and science, even though, very often, they have no relation to the original profession or concept, and especially don't have its responsibility or impact.


Computer science is the study, that is, looking at how computers and the use of computers can benefit mankind (in my short lay version). Software engineering is a subset of computer science where you build a real-world application of the studies.

The algorithms behind the Facebook feed and your favorite AI/ML product is the science. The code that powers the feed and chatgpt is the engineering.

Wikipedia is not wrong IMHO, but then I studied computer science 20 years ago so maybe I’m out of touch


I agree with you. There's a lot of engineering, and science, going into IT. But I'm a Software Engineer too, and while I'm clever and do clever things, I can assure you that there's no rigor or reason to call it engineering. This version of engineering is at most the re-use of the word, like how the word art covers not just artistic expression but skillfulness too, even though when you apply a skill artfully, you don't create art. Similarly, I do a lot that borrows ideas from engineering, but really I'm just writing code and maybe designing a smallish system.


Engineering happens at all levels; I've designed systems that span global infrastructure and systems that run on embedded hardware. I still think that the design process is a key part of engineering. Figuring out how to put things together is part of design. I agree that the more junior an engineer you are, the less "engineering" you see... but you can't run until you learn to walk.

I talk to my friends who sent projects into space and friends who design deep sea drilling rigs. The engineering process is very similar.


I think that you could genuinely make that argument though.


It's more a science than pure math or CS, since they are producing falsifiable models.


In what sense are they falsifiable?


if they're wrong, they don't work.


If my BST implementation isn't sorted, it also doesn't work.


The difference is that you can reason about why it didn't work.

With deep learning models not so. They are too big to reason about in the same sense as your BST algorithm.

Hence, you need a scientific approach to construct them. I.e., with lots of experimenting, hypotheses, etc.


That seems like it pushes it further from science, no? The point of a well-crafted hypothesis is that if it doesn’t bear out, you know that it’s because one+ of your assumptions was wrong. Your ability to continue your scientific inquiry is pretty much == your ability to then identify which assumption was wrong.


You don't need to do a scientific experiment to tell why your BST doesn't work. "Computer science" is a misnomer because most of its contents and methodologies are from mathematics (which is used in science but is not a science in itself).

CS uses mathematical proofs. You don't need a computer to execute your code to tell why the BST doesn't work. You can introspect your code and figure out why it works or does not work. If it's correct, CS methodology says you can "prove" that it works (without executing it).

Working with large AI models is like working with an artificial brain. It's as scientific as neuroscience in this sense. You make some hypotheses, tweak some hyperparameters, and get a result, which may or may not invalidate your hypotheses. Nobody knows why. Science is not necessarily about knowing the fundamental "whys" (amateurs think humanity has figured all the "whys" out, but that's a lie). It's about establishing some useful model of how things work.

But it's definitely possible to know why your BST does not work, even without a computer, without empirical testing. That's why CS is not a science.
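
To make the BST example concrete, here's a minimal sketch: the sortedness invariant can be checked empirically by running it, but it can also be established by reasoning about the insert code alone, which is the distinction being drawn here.

    # Minimal binary search tree: insert plus an in-order check of the sortedness invariant.
    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        # Invariant: keys smaller than node.key go left, the rest go right.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def in_order(root):
        return [] if root is None else in_order(root.left) + [root.key] + in_order(root.right)

    root = None
    for k in [5, 3, 8, 1, 4]:
        root = insert(root, k)

    keys = in_order(root)
    assert keys == sorted(keys)  # falsifiable by running it, but also provable by induction on insert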


Not sure what you are saying here. Perhaps an analogy helps.

Psychology is a science. You can make falsifiable statements about the human brain. You will need experiments to build and test theories. It's the same with deep learning.

With computer "science" (and math) it's not the same. You can reason completely about your subjects, i.e. you can determine if something will or will not work just by reasoning, no experiments needed.

For more information on the differences between math and science I recommend reading: https://en.wikipedia.org/wiki/Scientific_method#Relationship...


Seems like the distinction is mainly about which tools are available to you as a scientist (at least if we stick to comp-sci; math is in a league of its own). When, or if, we can completely model a human brain, a psychologist would no longer need to perform experiments to test their theories.

Given enough computing power, most theories could theoretically be proven or falsified purely through reasoning.


The point is: being able to run a brain inside a computer is not the same as understanding that brain. If you wanted to build a new brain, you'd have to reach for the tool all the time in an iterative way and hope for the best. Only tools that aid understanding matter. We have very few tools that help DL researchers better understand what they are doing. Hence DL is more akin to science than to math/CS or engineering.


Same is true of a bridge or a combustion engine though?


Exactly. I think the word "science" has changed its meaning a lot over the years.

For some reason people tend to consider a field that is more "formal" (like pure math, or some CS concepts like the lambda calculus) to be more of a science, even though historically formal systems came very late, and in practice very few systems can be described that way.

I really wonder whether people who regurgitate "machine learning isn't science" think theory of evolution is science or not.


Science is knowing how the model scales when you throw this much processing power at it. They won't even tell us how much processing power that is.


Are you secretly Ali Rahimi? :-D


They are doing computer science.


Obligatory xkcd: https://xkcd.com/1838/


Unfortunately I've found the current OSS models to be vastly inferior to the OpenAI models. Would love to see someone actually get close to what they can do with GPT-3.5/4, except capable of running on commodity GPUs. What's the most impressive open model so far?


LLaMA 30B or 60B can be very impressive when correctly prompted. Deploying the 60B version is a challenge though and you might need to apply 4-bit quantization with something like https://github.com/PanQiWei/AutoGPTQ or https://github.com/qwopqwop200/GPTQ-for-LLaMa . Then you can improve the inference speed by using https://github.com/turboderp/exllama .

If you prefer to use an "instruct" model à la ChatGPT (i.e. that does not need few-shot learning to output good results) you can use something like this: https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored... The interesting thing with these Uncensored models is that they don't constantly answer that they cannot help you (which is what ChatGPT and GPT-4 are doing more and more).
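
For what it's worth, a rough sketch of the 4-bit AutoGPTQ route might look something like this in Python (the repo id, device and generation settings are placeholders, not a tested recipe; check the AutoGPTQ README for the current API):

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    repo = "someone/llama-30b-4bit-gptq"  # hypothetical repo id; point this at a real GPTQ export
    tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
    # Load pre-quantized 4-bit weights onto a single GPU.
    model = AutoGPTQForCausalLM.from_quantized(repo, device="cuda:0", use_safetensors=True)
    prompt = "Explain mixture-of-experts in two sentences."
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))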


Just a reminder that LLaMA is not open—in order to use it legally you have to agree to Meta's terms, which currently means research use only. The versions circulating on torrents are essentially pirated, and while I don't have an ethical problem with that at all, you can't use it safely in a business.

The open replacements for LLaMA have yet to reach 30B, let alone 65B.


If anyone has a copyright claim to an LLM, the creators of the input data have more of a copyright claim than the company that trained it. There's a good chance they are not copyrightable at all. I'd bet there's a lot of people willing to take on that risk.

However, they might still fall under trade secret law.


Why would an LLM be any less copyrightable than any other piece of software?


The "software" part of an LLM is pretty trivial -- the interesting piece is the the weights. Since the weights are mechanically generated by a computer, it can be argued that the weights are not copyrightable, just like a photograph taken by a monkey isn't copyrightable.


The software is the matrix multiplication and gradient descent. We are talking about the numbers in the matrices. They are the output of a training algorithm, so we can only talk about the copyright on the training algorithm, and on its input data.


The model weights could be seen as a derived work, for which they didn't get the permission of the original copyright holders. Alternatively, it can be argued that LLMs are no different from a fanfic writer trying to imitate the style of their favorite author.

It's not obvious which way it will go, but I can see the point of those arguing that LLM data are ill-gotten gains.


For the same reason that phone books cannot have copyright.


People always bring this up like it’s a big deal, but most users aren’t interested in starting a business. We just wanna play with LLMs.

Frankly, I’m glad we don’t have a bunch of llamas in different skins being hawked like the current crop of “AI” startups that are just thin layers over OpenAI’s API.


That hasn’t been true for a while. Falcon 40B seemingly outperforms LLaMA 65B according to the Open LLM Leaderboard

https://huggingface.co/tiiuae/falcon-40b


Fair enough. I haven't really looked at Falcon as a replacement for LLaMA yet because it isn't supported by llama.cpp, but it looks promising.


Falcon is an open (Apache licensed) replacement for LLaMA, with a 40B version that's competitive with LLaMA 65B on benchmarks.


4-bit quantization removes a lot of the model's sophistication, and 60B parameters is still smaller than what GPT4 is using.


The point is that it's infinitely better in not being there "just to take your jobs and make a few VCs richer". Nobody even claimed it's more performant. It's like the difference between getting nothing but keeping your land, and getting glass beads but losing your land. You have to completely ignore the meat of the argument to even pretend there is a contest.

And this is without considering what would happen if we stopped feeding hostile actors and supported ourselves, instead of continuing to do the reverse. Not just here and there, but consistently for decades.


That argument seems more political than practical.

If on one hand you have a tool that you can actually use to help with your job, and on the other a tool that sounds like a very advanced chatbot but doesn't actually provide value, then the second tool being open-source doesn't change the fact that it doesn't provide value.

(Also, assuming that open-source tools aren't going to upend a ton of people's jobs seems really naive. These people aren't going to be any less bitter that their jobs are taken by freelance nerds instead of corporate nerds.)


There is no way I am going to spin up my own worse LLM so a few people will make less money. Even if it was 1-5% better. It's just not worth the time.


It's not "a few people making less money", it's a few gigantic monoliths carving up the future, like blind watch-destroying gods -- or at least wanting to, no matter how nicely they dress it up. And it's not about utility or chance of success for everyone, either, but rather trying to do something in an ethical or more clean way just because that's more fun for them.

But I have to admit to being an idealist, and while I disagree with you because of that, I don't think you should be downvoted for basically just bringing up the majority position. It's easy to complain about people not being starry-eyed idealists who make great personal sacrifices to bring along a utopia for people in 10 generations, or whatever. It's way harder to find and teach the joy of doing something for the sake of doing it, and at the same time come up with medium- and long-term ideas that are realistic enough to make working towards them fulfilling, but also genuinely beautiful and true. The whole "rather than teaching people to build a ship (which we can't even agree on!), teach them how to long for the ocean" thing. It's a really hard problem.


Wow. Beautifully articulated.

What a pleasant reply to read. I don’t have an argument regarding my position other than I agree that what you’re saying is true and that getting people like me to care and make sacrifices not just today but every day in a long term way is what makes hard problems hard.


Thank you :) But I didn't mean to say it's those pesky non-dreamers who make the problem hard. Three idealists will have ten possible utopias that are partly or completely mutually exclusive.

And to be frank, I think a lot of the finger pointing at people who don't care enough about issue X or Y is really because of not having found good ways to work constructively and make progress with however few people who do care. Partly also because people cannot agree (for long, tend to splinter into more pure sub groups and all that).

At any rate, the way can't be "I should feel bad and do better", but rather "I want what they got!". And the burden can't be on the people who aren't yet seeing anything that makes them excited to get excited anyway. And it can't be about being selfless for the benefit of others, or future generations. It has to be its own reward right here and now. It is about and for you just as much as it is anyone else, if you know what I mean. Sacrificing others and sacrificing oneself is sacrificing people in both cases. Neither is noble IMO.

I guess the best chance of fighting tech giant strangleholds is still empowering "normal people" to carve out their own little spaces. All people will not finally learn how to make websites if only we crushed Facebook and what have you, but instead if more people had fun making their own little websites, and if we could come up with good ways for them to connect (peer-to-peer on the desktop, right after Linux!), Facebook and others would play nicer. It's not that big companies are a problem, it's the abusive things they do when they're the only game in town.

And likewise, and back on topic: a really good argument would be something I don't have the knowledge for, namely things you can do with a LLM that you can fully control (or at least can wildly poke at and experiment with, or just "download mods for") versus a much more powerful LLM that you don't really control, other than your prompts.

Thanks for reading!


> 60B parameters is still smaller than what GPT4 is using

I mean if the article is right, then it's about 3.3% the size of GPT 4 (although it's a sparse model so not all of it is used on every pass).

Meta also didn't train LLaMAs on nearly as much code it seems, so they're much worse for that in general.


Does it? The GPTQ paper claims that the accuracy loss is small.


Can't lose what it didn't have in the first place.


I remember reading somewhere that GPT-4 is not a single model but several models whose parameter counts are reported as a single sum. Perhaps quite doable, but at lower speeds?


The article linked here talks about GPT4 being a mixture of experts, which is exactly what you’re describing


How do you correctly prompt it? A lot of people are not familiar with how to do this; I think explaining it would improve the experience many people have with the non-OpenAI models.


How low can you get the memory and computational power requirements that way?


You can run that model (Wizard-30) on a computer with 64 gigabytes of RAM (or smaller, I don't know how tight you can cut it). You obviously want fast RAM and a good CPU, but you don't need a GPU.


I am running 30b llama models (4 bit quantized using llama.cpp) on 32 gb of ram and no GPU. I get around 2 tokens/second.
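
If anyone wants to reproduce that kind of CPU-only setup, here is a rough sketch with the llama-cpp-python bindings (model path, context size and thread count are placeholders for your own machine and quantized file):

    from llama_cpp import Llama

    # Point this at a 4-bit GGML quantization of a 30B model on your disk.
    llm = Llama(model_path="./models/llama-30b.q4_0.bin", n_ctx=2048, n_threads=8)
    out = llm("Q: What hardware can run a 30B model? A:",
              max_tokens=128, temperature=0.7, stop=["Q:"])
    print(out["choices"][0]["text"])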


You can also travel on a bike from NY to LA.


>You can also travel on a bike from NY to LA.

You can. In fact, my brother did so a bunch of years ago. He found it to be a wonderful experience that made his life better.

He's also flown on a commercial airplane from NY to LA (as have I, as well as millions of others) and while it got him to Los Angeles, it didn't provide the levels of sensory input, personal interactions and experience that riding his bicycle did.

That's not to say everyone should ride bicycles across the US every time they need/want to make such a trip, but doing so at least once can be a more positive experience than sitting next to some strangers for five hours.

The satisfaction of doing so, or the experiences in interacting with people and the landscape during such a trip aren't quantifiable, but reducing the value of doing so (if I'm missing your point here, my apologies) to the time required to make such a trip is reductive in the extreme IMHO.

Edit: Clarified my prose.


Love this comment so much. Your brother sounds cool :)


Just imagine if Boeing lobbied the government to pass a law banning you biking from NY to LA.


AFAIK you can get away with a swapfile, no need for large amounts of RAM.


Won't that nearly kill your SSD if you do it for extended periods of time?


Most of the RAM is for storing the model; once it is loaded it is read-only, so it will not harm an SSD.


It only reads from memory, not swap directly. If it needs to read something from swap, it'll write out something from memory to swap, then read the swap into memory. Reading 1 GB of swap will essentially write 1 GB to the SSD too (rough numbers).

Correct me if I misunderstand swap?


If the underlying data hasn't changed, the page isn't written to disk. CPUs do keep track of writes that mark a memory page as "dirty".


That's basically right. I'm not sure whether Linux or Windows will keep track of the pages it read out of swap to know if they're still there and valid, but there's a better way for this that I think at least ggml supports: it stores a copy of the model unpacked and ready on disk as a cache for doing the work, rather than relying on OS virtual memory to handle it. This should be faster than the OS VMM (though probably not by much), but since it knows which pieces it needs to leave on disk and where they are, it should be much safer as far as writes go, because it knows enough not to write multiple times like that.
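
A tiny illustration of the read-only mapping idea (the file name is a placeholder): pages mapped read-only can simply be dropped and re-read from the file under memory pressure, so nothing needs to be written back to the SSD.

    import mmap

    with open("model.ggml.bin", "rb") as f:
        weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        header = weights[:16]  # faults the first pages in from disk on demand
        # No writes are possible through this mapping, so under memory pressure
        # the kernel can simply discard these pages and re-read them later,
        # instead of writing anything out to swap.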


> The interesting thing with these Uncensored models is that they don't constantly answer that they cannot help you (which is what ChatGPT and GPT-4 are doing more and more)

that's great to hear. that political correctness in gpt is annoying.


Have you tried Falcon 40B Instruct? Also take into account that ChatGPT likely has some pre-prompt, whereas when talking to Falcon or other OSS models it's all in your hands.

Furthermore, not many people discuss the significance of proper output sampling. I myself used to test open-source models with greedy decoding only. Who knows, maybe they would even beat (not at all) OpenAI with some clever output sampling scheme.
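
For example, with Hugging Face transformers the difference between greedy decoding and a basic temperature/nucleus-sampling setup is just a couple of generate() arguments (the model name here is a small placeholder, not one of the models discussed above):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder; swap in whichever open model you are testing
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tok("The easiest way to run a local LLM is", return_tensors="pt")
    # Greedy decoding: always pick the most likely next token.
    greedy = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    # Nucleus sampling: draw from the top-p probability mass at a given temperature.
    sampled = model.generate(**inputs, max_new_tokens=40, do_sample=True,
                             temperature=0.8, top_p=0.95)
    print(tok.decode(greedy[0], skip_special_tokens=True))
    print(tok.decode(sampled[0], skip_special_tokens=True))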


Does anyone have a link / instructions on getting Falcon 40b to install on Apple Silicon?

Apparently "Hugging Face" have some internal swift code that works (but it has not been released). I'm keen to see how it performs on a maxed out Mac Studio (with all that unified memory available).


I thought this too, but it sounds like the performance of the GPU ultimately holds it back. Maybe it is the software, but I have yet to see a test on Mac silicon that performed well.


It is not too bad, 7B is doing 432 tokens per second on a Macbook[1].

[1] Video: https://huggingface.co/datasets/huggingface/documentation-im...


Idk, has anyone tried falcon yet? The support for running it remains nonexistent except for one fork of llama.cpp that isn't integrated into anything. This trend of every new model breaking compatibility really needs to stop.


I have, and I am running it locally. Mostly the 7B variant, which runs pretty much at ChatGPT speed (when streaming) on my Ryzen 3700 CPU + 32 GB RAM + 2x NVMe in a mirror (it doesn't fit in memory in its entirety; a few GB go into swap).

Of course to run it like that I have to be running nothing else. No xorg, no chromium etc. Just a pure linux console.

If you want to try Falcon 40B Instruct yourself, here is a public demo: https://huggingface.co/blog/falcon

Go to the bottom of the page.


While OpenAI likely has some insights that open-source and closed-source competitors are lacking, OpenAI is mostly in the lead because they can burn absurd amounts of cash running an absurd amount of compute via their partnership with Microsoft.


What are all the researchers in universities doing? Couldn't they improve these models (they do have big brains, after all) with taxpayers' money and put the results under some cool open-source license...


Yes, they are doing the improving, but then you need loads of money to do the training, which no university can afford. So now big tech is hiring promising university researchers for good money to scale up their research. This could be solved by massive decentralization where millions of users provide compute with their GPUs, and I think it will be at some point, because I believe FOSS is more powerful than this OpenAI BS. There are people working on this, but AFAIK the techniques aren't quite there. You need a different kind of model with much more parallelization than what is currently used.


> There are people working on this, but AFAIK the techniques aren't quite there. You need a different kind of model with much more parallelization than what is currently used.

What if crypto switched from mindless hashing as proof-of-work to training AI models as proof-of-work? That would suddenly make big computing resources available.


For blockchains using proof-of-work you need two things:

1) The work has to be very hard to do (and quantifiably so), but very easy to verify.

2) The block's transactions have to be an input to the computation, and it has to be impossible to get the same output with a modified set of transactions.

Cryptographic hash functions fulfill both of these requirements. Almost nothing else does.

However, if blockchains switch to proof-of-stake, then the GPUs previously dedicated to that blockchain are available for other purposes. But the biggest GPU-based blockchain already did that, and Bitcoin uses specialized hardware that can't do anything other than Bitcoin's hash function.
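
A toy sketch of why hashes satisfy both requirements: finding a nonce takes many attempts, while anyone can re-check the winning one with a single hash over the block's transactions, and changing the transactions invalidates it.

    import hashlib

    def pow_hash(transactions, nonce):
        return hashlib.sha256(f"{transactions}|{nonce}".encode()).hexdigest()

    def mine(transactions, difficulty=4):
        # Hard: brute-force a nonce until the hash starts with `difficulty` zeros.
        nonce = 0
        while not pow_hash(transactions, nonce).startswith("0" * difficulty):
            nonce += 1
        return nonce

    def verify(transactions, nonce, difficulty=4):
        # Easy: a single hash; changing the transactions invalidates the nonce.
        return pow_hash(transactions, nonce).startswith("0" * difficulty)

    txs = "alice->bob:1;bob->carol:2"
    nonce = mine(txs)
    print(verify(txs, nonce))                       # True
    print(verify(txs + ";mallory->bob:99", nonce))  # almost certainly False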


“What if running part of the paperclip maximiser that destroys humankind was profitable?” is both a bit of a nightmare sci-fi dystopia, and a reasonable description of companies in the modern economy.


That would indeed be neat; no idea if it is possible safely. Blockchain technology sure looks like a good fit for organizing such a decentralized model. The reaction of HN to 'blockchain-AI' probably won't be kind though, lol.


Blockchains depend on small inputs and outputs, and easy verifiability of the computation. AI training is none of that. It's a massively interconnected problem, Google and NVIDIA built traditionalish fast-interconnect supercomputers to do it.

Maybe it's possible to implement it in a decentralized way with not-completely-useless performance, but by then OpenAI and others will be even more ahead. (Sure, it'd be good to have such an implementation, so maybe enthusiasts will do it eventually, but that's mostly for fun.)


Yeah, that's the current state of course; I'm betting on future developments. Neural networks are still in their infancy. Stable Diffusion showed everyone how much more powerful open models are (at generating porn /s). ChatGPT is severely limited because OpenAI fears public outrage, litigation and regulation.


Yeah I keep telling people the universe played a joke on us with the crypto winter happening right before GPT-4 being released.

Flip the timeline and crypto from the start would have had this as a goal.


I knew about the lack of money, but I didn't know about the parallelization idea. Thanks for posting!


They are busily working on all the other large scale engineering projects that they are so good at producing and maintaining.


They produce papers that contain (sometimes) useful ideas. They don't produce code (other than proofs of concept), and they certainly can't afford to train a large model.


They don't have enough money and thus computing resources. It's a huge gap.


Probably Wizard 30B, they released the training weights at every offset as well.


In my testing, the chat and instruct-tuned versions of MPT-30B are very close to 3.5 for many tasks, but of course the team who made it got bought up immediately and it’s licensed only for non-commercial use. I’m hoping the open source community runs with the base model in the same way they did with LLaMA.


> You can use the above without paying OpenAI. You don't even need a GPU. There are no license issues like with the facebook llama.

I actually wrote about getting an LLM chatbot up and running a while ago: https://blog.kronis.dev/tutorials/self-hosting-an-ai-llm-cha...

It's good that the technology and models are both available for free, and you don't even need a GPU for it. However, there are still large memory requirements (if you want output that makes sense) and using CPU does result in somewhat slower performance.

There are async use cases where it can make sense, but for something like autocomplete or other near real time situations we're not there yet. Nor is the quality of the models comparable to some of the commercial offerings, at least not yet.

So I don't have it in me to blame anyone who forks over the money to a SaaS platform instead of getting a good GPU/RAM and hosting stuff themselves.

Here's hoping the more open options keep getting better and more competitive, though!


I bet that open models win in the end because of porn. There is already a very weird and vibrant community creating "waifus" and tinkering with these models.


Huh, more power to those folks then, I guess.

But I can easily imagine more conventional forms of entertainment, as well. Like a game of D&D that's narrated by the AI, or a text based adventure set in the Mass Effect universe, Lord of the Rings, Warhammer or any other fandom, really. Maybe like those old Choose Your Own Adventure games.

I think some companies are also experimenting with characters in video games that get their dialogue from these models - where the developers give the character a persona, provide information about events in the world and let players interact with them, like the Detective Origins demo. Of course, due to the slightly unpredictable nature of these models and their hardware requirements, no idea how viable this will be.


The application in games I'm most excited about is commentators in FIFA career mode that don't have a limited set of prerecorded voice lines, and take your recent games, formation changes etc. into account too, like real commentators would. The recent installments already do that to a small degree. Of course this would also easily open the door to having multiple commentators/analysts to choose from, each with their individual "personalities". The technology for that is pretty much all already there, right?

A game like Fallout/Elder Scrolls with AI generated NPCs and questlines would be so sick too if executed well.


There is a mod for Skyrim where someone piped together multiple AI models. It goes like this: you speak into your microphone and ask an NPC something. This gets transcribed (voice to text) by Whisper. The transcript gets sent to e.g. GPT-4 with a pre-prompt engineered to give background, current information and the "personality" of the NPC you are talking to. The output of this gets piped to a text-to-speech solution like ElevenLabs with the original NPC voice.
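
A very rough sketch of that pipeline in Python (the persona text, file name and the 2023-era openai ChatCompletion client are my assumptions, and the final TTS step is left as a placeholder; this is not the mod's actual code):

    import whisper   # openai-whisper
    import openai    # pre-1.0 openai client

    # 1) Speech to text.
    stt = whisper.load_model("base")
    player_line = stt.transcribe("mic_recording.wav")["text"]
    # 2) In-character reply via a pre-prompt holding the NPC's persona (made up here).
    persona = "You are Lydia, a housecarl in Whiterun. Answer briefly and stay in character."
    reply = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system", "content": persona},
                  {"role": "user", "content": player_line}],
    )["choices"][0]["message"]["content"]
    # 3) Hand `reply` to a text-to-speech service (e.g. ElevenLabs) using the NPC's voice.
    print(reply)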


I've seen an example and the weakest link seemed to be the TTS, which sounded several generations behind.


Maybe for side quests, but even that would be debugging hell.


You are describing AI Dungeon, a third-party product using GPT-3 that OpenAI killed about two years ago.


Do you seriously think D&D and other choose your adventure games will be more popular than porn? Seriously?


If they also include porn, maybe.


Did you try AI Dungeon before OpenAI killed it?

Yes, people used it for porn.


The (even pre-emptive!) opposition and censorship seems way stronger this time around than with previous technologies. Like instead of ignoring the porn (or even profiting from it), they are trying very hard to make it impossible.


Perhaps this could happen to make Stable Diffusion win against Midjourney/Adobe. LLMs however have more than enough commercial support to overcome weird communities, and these communities care less about text than images.


Porn could definitely be a killer app for LLMs, and I strongly hope that open source models win out for a number of reasons, but I'm not sure it will happen. Microsoft and OpenAI have quickly gone to great lengths to try and limit the amount of output they find offensive, and those efforts have been fairly well received by the general public.

Despite the size of the regular online porn industry, it still gets strangled by payment processors. There is plenty of appetite out there for restricting porn in various ways.


> Microsoft and OpenAI have quickly gone to great lengths to try and limit the amount of output they find offensive, and those efforts have been fairly well received by the general public.

It offends me. I think many find this neutering annoying, and many more will. I think Bing Chat is fine this way, and for the general public, but if I'm paying to access the GPT API, there should be an uncensored version. I'm an adult. I can handle 'offensive' things. It's offensive telling me I'm not allowed to see 'offensive' things.


Porn will save the world again!


> try to pressure the government into making it illegal for you to compete with us.

I mean the guy who created GPT-4 literally demanded a ban of any system more powerful than GPT-4.


Are you sure that's not a quote from a game of telephone?

What I've seen from the horse's mouth is more like:

"""There are several other areas I mentioned in my written testimony where I believe that companies like ours can partner with governments, including ensuring that the most powerful AI models adhere to a set of safety requirements, facilitating processes to develop and update safety measures, and examining opportunities for global coordination."""

Now, sure, I could probably take a malevolent moustache twirling villain monologue and use chatGPT to turn it into a bland and prosaic statement like that, or even something utterly milquetoast, but if I try to imagine Altman playing 5D chess with the aid of his own AGI, there's the much easier solution of… just not telling anyone you have GPT-4 in the first place while using its power to manipulate etc.


The written testimony didn't say "ban", but it did stress fairly complex regulation.

"First, it is vital that AI companies–especially those working on the most powerful models–adhere to an appropriate set of safety requirements, including internal and external testing prior to release and publication of evaluation results. To ensure this, the U.S. government should consider a combination of licensing or registration requirements for development and release of AI models above a crucial threshold of capabilities, alongside incentives for full compliance with these requirement"

https://www.judiciary.senate.gov/imo/media/doc/2023-05-16%20...


> The written testimony didn't say "ban", but it did stress fairly complex regulation

You feel that quotation justifies the adjective "complex"?


I didn't mean "complex" in some sort of flippant, derogatory way. The language is suggesting regulation with active government involvement and oversight (licensing, for example). In the range of regulations in the software world, that would be on the far right end. ITAR, for example, requires licensing. People regularly describe the process as "complex".


Hmm.

While I don't doubt that ITAR is especially complex, following legislation is a pretty common basic requirement in software development (say GDPR, tax rules, COPPA, etc.), so it really doesn't seem to me that this testimony is pushing for anything specific and detailed enough to justify the claim that they're calling for a particular level of regulatory complexity.


Yeah, I'm not trying to argue that what's being proposed is bad, just trying to clarify what he said. The licensing part does make it more like ITAR and less like GDPR, HIPAA, etc. Those sorts of laws don't involve active government activity to certify and license beforehand, though there are 3rd-party orgs that will do that sort of thing if you want.


> Yeah, I'm not trying to argue that what's being proposed is bad, just trying to clarify what he said.

Ah, fair enough then; it seemed to me like you were saying it was excessive and bad.


Same as any political argument today. Strip it completely of nuance. Put it in a Tweet. Get clicks.


Yes, they want a few companies (OpenAI, Google, not sure who the others are) to gatekeep potential competition through safety certification.


I don’t understand this. Won’t that hurt their progress on GPT-5?


The public position (as opposed to the rumour mills) is that they're not working on a 5, and don't intend to at least until they can figure out how to do it safely.


And safely here means non-disruptive to established businesses. If they create a tool so powerful that it can replace entire professional classes, that would just mean people can bypass the employers and get their value directly from an API.

OpenAI needs to make sure they have a business model where they can charge enterprise fees and provide value to corporations, not directly to people. ChatGPT was merely a publicity stunt. I do believe they got scared of how much value people could derive independently from it though, hence the "regulate this" pressure.


> And safely here means non disruptive to established businesses

Why would OpenAI care about that?

No, it means safety, as in not giving out dangerous answers that get people killed.


Because they belong to Microsoft. Because they are funded by rich people who own established businesses. Because they are the privileged 0.01% who are invested in the status quo. Because their whole networks consist of people with entrenched business interests. Pick one or more. Why would you think rich American engineers are in general any more worried about the overall safety of the world than about their own self-interest? That's what they show day in, day out with their decisions.


> Because they belong to Microsoft

They don't — 49%, and they get to buy themselves out of even that, and if they do that by eating MS's lunch? Then it would suck to be an MS shareholder.

And they've been cautious since before the investment, with the cautious announcement of GPT-2 a month before the change in corporate structure to allow investors.

But even if OpenAI were owned completely by MS, why should MS care? Disrupting all employment sounds like the raison d'être for most if not all of silicon valley.

At the very least, imagine all the employees MS could let go if GPT-4 was even 80% as good as the advertisements in the form of "buy my guide for how to use GPT to make a fortune with best selling books/award winning advertisements/fantastic business proposals" keep claiming.

> Why would you think rich American engineers are in general any more worried about the overall safety of the world than about their own self-interest?

One is a subset of the other.

People regularly disagree about how risky things are, but they don't generally pick "destroy all things" over "destroy that which is close to me" when they actually believe that's the real choice and not, say, a panicked Chicken Little (which is how I think LeCun sees Yudkowsky).


MS not owning OpenAI outright is a technicality. Letting it "suck to be an MS shareholder" through things done to MS subsidiaries would constitute dereliction of duty by MS officers. MS is not Silicon Valley; they're based in Seattle and predate 95% of SV. Besides, this is just one of my points. Even if everything you claimed were to be true, OpenAI still is, as I pointed out, part of the American Corporate establishment and despite their PR posturing, they won't provide us with technology to disrupt said establishment.

They are clearly trying to angle for maximizing the profitability of that establishment.


> OpenAI still is, as I pointed out, part of the American Corporate establishment and despite their PR posturing, they won't provide us with technology to disrupt said establishment

They're not acting like they're part of the American corporate establishment, and I don't understand why you think they are.

Rather than quote my other comment comparing against 3M, I'll link to it: https://news.ycombinator.com/item?id=36677182


Pointing to a company that was fined for dishonesty as an argument for taking another company's PR at face value makes no sense.


Eh? People as a rule care about others, and this tendency actually INCREASES as they get wealthier and safer and are less worried about themselves.

Your average OpenAI machine learning expert cares plenty about not killing people, just like you do.


> as in not giving out dangerous answers that get people killed

Literally taken, that is quite close to impossible.

A few days ago there was news of somebody who committed suicide after an interaction with a bot about nuclear risks or something similar.

To avoid that, the bot would have to be a high-ranking professional psychologist with an explicit purpose not to trigger destructive reactions.

And that would conflict with the nature of a "consultant", which is something that makes "a best effort to speak the truth" - a direction incompatible with "something reassuring".


> To avoid that, the bot would have to be a high-ranking professional psychologist with an explicit purpose not to trigger destructive reactions.

Sounds fairly doable, TBH, given what else they're doing reasonably well at.

> And that would fail the nature of a "consultant", which is something "under best effort to speak the truth" - incompatible direction with "something reassuring".

I don't think they're as incompatible as you appear to.


With reference to the case I mentioned (which seems like a decent borderline case for pointing out difficulties), what you want as a consultant is something that tells you "the likelihood of the event is bounded by this and that", not something that goes "Oh, everything will be all right, dear". To put it non-figuratively: truth (say, facts, or instances of computation) and comfort may not always be compatible.

> reasonably well

But I wrote «high-ranking professional». «Reasonably well» does not cover the cases relevant to full safety.

And, by the way, such an attitude will backfire heavily on false positives.

Anyway: the case presented is there to point to a few difficulties involved in the idea of «not giving out dangerous answers».


"I can't conceive of such failure mode thus we're safe"


Sorry, I do not understand what you mean...


I think they read the first line of your reply and took it as you arguing that it's nearly impossible for an LLM to give output that would get someone killed.


I was agreeing with them. Why assume wilful ignorance? Have we become the new reddit? People just yelling in disagreement? Maybe it's time to log out and delete this password...


No, Namaria, we were just trying to understand what you meant - which to us seemed cryptic.


Apologies, I just found your post confusing and was trying to make sense of it.


If that is the bar content for public consumption has to pass, we're dangerously close to book burning again.


Because there's more profit in a corporation paying you billions than in thousands of people paying you $10/mo each.


Huh? They'd sell access to the tool to the employers. They'd make a nice rent, employers would make a nice profit by firing all their employees, and customers would still have to go to employers to get whatever service done.


Or it means they are legitimately concerned about the capabilities of models more powerful than GPT-4 and their impacts on society. Didn't a bunch of influential people sign a letter calling for a six-month halt on AI research to figure out alignment?


Safety also might mean not providing a route/framework for an api loop to create a self destructing information virus that becomes the best worm in human history :)


I think the reality is that training a "GPT-5" which is as much of a bump as GPT-4 was over GPT-3 or that was over GPT-2 would cost in the billions of dollars using today's hardware so they're choosing to wait it out until the risk of them screwing it up doesn't cost as much as a state of the art microfabrication plant.


[or until someone else starts catching up]

The same dynamic that incentivized a mass rollout of these unaligned systems will be perfectly sufficient to incentivize mass rollout of stronger, also unaligned systems.


Yeah, that's one of Yudkowsky's fears.

I'm not going to take his fears as gospel, as he's spent so long focussing on his fear there's a danger of availability heuristic/attention biases having him take the worst case for everything.

I'm still going to promote caution, as the more potent a tech the greater the downside of being wrong, but it currently looks like we can cooperate with each other in this IRL iterated prisoner's dilemma.


More cynically, they think they've hit a wall and a projected GPT-5 won't be a huge or meaningful jump in performance, so they'd like everyone else to slow down too.


I believe that the superalignment thing will basically be GPT-5.


I think it's a necessary precursor to any future model. Would you want to spend ten times as much (or whatever) red-teaming the next increase in capabilities? And what would make you more confident in advance that the red team would spot everything?


Keep in mind, that was the idea originally. Then in 2018 Elon decided he wanted to be CEO, the board rejected that idea for now obvious reasons, and he reneged on 90% of a promised $1b donation. The only way forward was to become a for-profit company and do a normal funding round, which is what happened with Microsoft. Elon’s rug pull is why this happened.


And to this day Elon slanders OpenAI for taking profit.

It’s strange, because their corporate structure is highly unusual. Returns on investment are capped, so investors can’t make that much money. Also, the profit-generating division is wholly owned and controlled by the nonprofit. I don’t understand why Elon and others have so many problems with this.


OpenAI was so destitute, having only $100,000,000, that they had no choice but to fuck us all and lobby congress to make it illegal for us to use this new technology? That doesn't make sense.

If your choice is between $100M + doing earnest work, or $1B+ and debasing our free and competitive society, and you take the latter... what does that say about your collective character as an organization and as a set of people?


$100m buys you nothing. $1b is table stakes now.


I would love to use these... but they suck. They don't even come close to what OpenAI offers.


I think most of HN has only tried ChatGPT with GPT-3.5 in December 2022 when the OpenAI servers were getting hammered. That is why they're impressed by these slow and low quality local models and mistakenly think they're on par with OpenAI's offerings.

Honestly even reddit and teenagers on TikTok have a more accurate view on OpenAI vs. local LLMs than HN.


There’s not a single thing out there that even comes close to GPT-4. Not one that I’ve used anyway. Benchmarks be damned, it’s the experience that matters and I’ve yet to have an LLM blow my mind the way GPT-4 does.


Have you used Claude? I regularly use them instead of GPT4 because of the larger context window. 4 is still useful when I want to give instructions to the system (Claude will refuse to do a lot), but generally the responses seem on par with each other.


Sorry about the lnkd.in-style links, but I just posted this there and came here and felt it might be interesting to someone; they do work!

...

Here's a round-up of open-source projects focused on allowing you to run your own models locally ('AI'); they all take slightly different approaches, although under the hood many use the same models.

https://lnkd.in/exKqJZm8 A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation.

https://lnkd.in/etVCmZHB With OpenLLM, you can run inference with any open-source large-language models, deploy to the cloud or on-premises, and build powerful AI apps. State-of-the-art LLMs: built-in supports a wide range of open-source LLMs and model runtime, including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder and more.

https://lnkd.in/e7-NKGzJ LocalAI is a drop-in replacement REST API that's compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs (and not only) locally or on-prem with consumer-grade hardware, supporting multiple model families that are compatible with the ggml format. Does not require a GPU. (See the sketch after this list for a usage example.)

https://lnkd.in/ef_Sa9AN Multi-platform desktop app to download and run Large Language Models (LLMs) locally on your computer

https://lnkd.in/e288q-Wb A desktop app for local, private, secured AI experimentation. Included out of the box are: a known-good model API and a model downloader, with descriptions such as recommended hardware specs, model license, blake3/sha256 hashes, etc.; a simple note-taking app, with inference config per note (the note and its config are output in plain-text .mdx format); and a model inference streaming server (/completion endpoint, similar to OpenAI)

https://lnkd.in/eycRJn6b Transcribe and translate audio offline on your personal computer. Powered by OpenAI's Whisper.

https://lnkd.in/eUrtE3uQ The easiest way to install and use Stable Diffusion on your computer. Does not require technical knowledge, does not require pre-installed software. 1-click install, powerful features, friendly community.
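
Since LocalAI (third link above) exposes an OpenAI-compatible REST API, you can in principle point the regular openai client at it; a sketch, assuming a LocalAI server on localhost:8080 and a model name you have configured locally:

    import openai

    openai.api_base = "http://localhost:8080/v1"  # assumed local LocalAI endpoint
    openai.api_key = "not-needed-locally"         # LocalAI doesn't check the key
    resp = openai.ChatCompletion.create(
        model="ggml-gpt4all-j",  # placeholder: whatever model name you configured
        messages=[{"role": "user", "content": "Summarize mixture-of-experts in one line."}],
    )
    print(resp["choices"][0]["message"]["content"])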


See also, Orca and Falcon models which are also open source. I'm not sure if any frontends support them yet.


Wow, the top comment is neither relevant to the post, nor friendly, nor interesting. Activism, even with false premises. Many of us have tried, and those with a little sense left know that running your local LLM without a GPU is not really useful.

Besides, what does your post add to the discussion, and why is it the top posting?

Create your local LLM, use it, tell other people about how you did it exactly, and be happy. But why the heck do you need to fight a company in that space?

Where have the times gone when someone motivated to do something nice just went ahead and did it, instead of running in circles and telling everyone else what they should NOT do.


Well, in 1929-1930 the stock market crashed and regulating market actors became necessary. Now it’s about who can regulate the market best in their own interests, the way corporations dominate with bailouts and subsidies. You’re picking a fight with a small actor while OpenAI is spending millions doing the same thing, fighting for regulation to keep the status quo.


If OpenAI is open, the Congo Free State was a free state.


Enshittification comes fast these days.


>There are no license issues like with the facebook llama.

OpenLLaMA uses a dataset which does not seem to have gotten proper commercial licensing for the training data. There are potential licensing issues because the copyright situation is not well settled.


So far as anyone knows, this is not a derivative work but a transformative one, and therefore not subject to any licensing requirement.

You're right though, that's arguably still up for debate, but I think the precedent of transformative work is pretty well attested.


is it?

Because condensation is literally part of the definition of derivative, and basically the weights are a condensed form of the input data. It's some sort of lossy compression, when looking at it from the right point of view.

Summarization and translation are also clearly derivative.

The definition of transformative I found:

  - add something new (context of other books I guess, this one might pass)
  - with a further purpose or different character (further purpose clearly yes)
  - do not substitute for the original use of the work (this one I find difficult. In the case of books, probably. In the case of github, it aims to replace quite some aspects of it)


is it compression though? It's a model of the way our language works and contains a set of knowledge.

If I read 200 physics textbooks, I couldn't recall any of them exactly, but I could write another book... that would contain largely the same knowledge.

The same is true of AI, it's not "compression", it's "learning" (by recording statistics about relationships between pieces of information) -- it can't (usually) recreate whole works, it'll make stuff up and alter it and whathaveyou. How much of the sum total of the internet and all books can you compress into an n-gigabyte model? I've got an 8 gigabyte file here that seems remarkably smart, but it can't recite the contents of most books to me, though it knows their famous quotes and some excerpts.

The "compression" argument does seem to have some merit, but it also seems thin in the end and, in my opinion, unlikely to hold up in court, but we'll see.

I do think a very strong, probably winning argument could be made in court that it is transformative, but that's the nature of this whole debate, we don't know yet with certainty.


We must destroy OpenAI!


From the post:

    [Rumors that start to become lawsuits]

    Some speculations are:
    - LibGen (4M+ books)
    - Sci-Hub (80M+ papers)
    - All of GitHub 
This is the funniest, but in the end saddest, aspect. If ChatGPT was indeed trained on pirated content and was able to be(come) such a powerful tool, then copyright laws should have been abolished yesterday. If ChatGPT was not trained on all these resources out there, then think how much more powerful a tool it would be if it were; in that case copyright laws are actively stifling advancement and should have been abolished yesterday.


That is a very dangerous way to approach copyright laws. They are definitely abused by corporations like Disney, infamously so, but abolishing them is absolutely not the answer. Art makers are already struggling en masse; taking away their ability to earn money off their work isn't an answer, especially if it's just to train a predictive text generator.


Counterpoint: there are way too many "art makers", copyright is keeping the quality ceiling down, and good content loses the monetization game to spam - all while that "predictive text generator" is, for better or worse, pretty much the most magnificent piece of technology invented this century, and - for better or worse - likely to become the next major shift in the economy and in life.


How do you believe copyright is keeping the quality down in particular? As I see it, the reason is marketability and trying to conform to standards. Making a new Marvel Movie is sure to have a payoff. Making Parasite MIGHT have a higher payoff, but the risk is way higher.

Removing copyright might help get more spin-offs and reiterations of previous work, which does increase the scope, but taking away makers' ability to profit from their work in the traditional sense will bring us closer (back) to a time when commissioned artwork was the main art you saw.


Letting "makers" (studios) profit from "their" (stolen by unbalanced contracts) work for unreasonable amounts of time has clearly proven to be a failure while crowd-comissioned spaces (patreon etc) are thriving financially AND artistically.


- HN on social media : The powers are too centralized, future is decentralization, question is how

- HN on free software: is good

- HN on copyright : WAY TOO MUCH PEASANTS CLAIMING INDIVIDUAL RIGHTS, RIGHTS THAT ARENT EVEN REAL, ART BE CENTRALIZED FOR MAXIMUM MONOPOLY


Copyright does very little for individuals. Most benefits from the copyright system are accrued to large corporations.


> Most benefits from the copyright system are accrued to large corporations

Citation fucking needed. Among those who study copyright and inequality, none suggest abandoning it [1][2].

Within the context of machine learning, one of the only pillars buttressing individuals against multi-trillion dollar corporations is copyright [3].

[1] https://journals.library.wustl.edu/lawreview/article/id/5108...

[2] https://www.jstor.org/stable/1339714

[3] https://sfstandard.com/2023/07/10/cruise-and-waymos-24-7-san...


To nitpick, the second article is from 1970; back then a concept such as the Amazon Antitrust Paradox [1] would have gotten you burned at the stake in the legal field. Similarly, we also have an Elsevier Antitrust Paradox, a Nature(.com) Antitrust Paradox, and so on.

Just the fact that I cannot freely read an article published 53 years ago is beyond ludicrous. And if I pay JSTOR ($19.50/month) and read the 1970 article, do you think the money will go to Stephen Breyer [2], the author? No, 100% of the $19.50/month will go to Ithaka Harbors, the parent company of JSTOR [3]; that's where the "accrued to large corporations" lies.

[1] 2017, https://www.yalelawjournal.org/note/amazons-antitrust-parado..., Lina M. Khan is the chairwoman of the Federal Trade Commission since 2021.

[2] https://en.wikipedia.org/wiki/Stephen_Breyer

[3] "Ithaka's total revenue was $105 million in 2019, most of it ($79 million) from JSTOR service fees", https://en.wikipedia.org/wiki/Ithaka_Harbors


> the fact that I cannot freely read an article published 53 years ago is beyond ludicrous

And has as much to do with individual versus corporate power as the health of a single tree can speak for a forest.

Nobody is saying the current situation is good or even sustainable. OP just made a big claim for which there is no evidence, despite many looking for it.


"there is no evidence"

As quoted above, "Ithaka's total revenue was $105 million in 2019, most of it ($79 million) from JSTOR service fees". All those $79 million and more JSTOR has stolen from the authors. Although JSTOR should have been dissolved long before this for effectively killing Aaron Swartz. But yes, there is no justice in this world. I suppose that's my main contention, why pretend anymore, we live in a society, yada yada.


This isn’t a cogent argument. You’re identifying troubling behaviour. But it’s not being stitched into anything cohesive. Hiding bad rhetoric behind post-modernist nihilism is in vogue, but unproductive.


Not sure where you picked up nihilism in my reply; referencing the "we live in a society" meme, I actually smiled. Nevertheless, stopping the pretence today, painful as it is, and realizing the state of the world is far from perfect is the first step toward a better tomorrow.


There is no political debate in which increasing the opposition’s nihilism is not advantageous. Nihilism keeps people from thinking. It keeps people at home, away from voting booths. There is daylight between Panglossian utopia and dystopia. If you’re cynical, one of the most productive uses of your time might be engaging the other side and convincing them civic engagement is worthless.


Again, haven't brought up nihilism or cynicism.

Incidentally, just to give you some more to rummage about, staying away from the voting booth seems like the best thing to do and civic engagement is indeed worthless. Unless you are able to say and vote "No", effectively banning all the "sides" from political action, forcing new sides to emerge. No people will ever be free, representational democracy or not, if they can't even say "No". Not being able to say "No" is also the root of the cynicism.


I'm not sure what that has to do with abolishing copyright, when it mostly has to do with the weak and skewed protections current laws offer. They're not allowed to take names off copyrighted text, which is partly why that name is still on that paywalled text. They're not forbidden by law from buying their way into corporate ownership of copyright, which is how they get to paywall it in their name.

If they (in loose usage of the pronoun) were allowed to deface art, papers, code, anything copyrighted, they would. Stripping copyright text could even become an "etiquette". Conversely, if they were not allowed to profit off a copyrighted thing, with appropriate enforcement, they wouldn't. This should go without saying for core Internet users and software engineers.

I mean, haven't you heard what Oracle just said?


I never argued that copyright should be abolished, I was responding to a comment that essentially said that anyone who was opposed to copyright is doing so at the behest of large corporations at the expense of individuals, which is a really flawed and one sided way of looking at copyright laws, the first linked article you provided being a great example.

For the record, I can’t find a single large corporation pushing for the dissolution of copyright, and many clear examples of large corporations lobbying for increased copyright protections and terms. I think this puts the onus for providing evidence on those who would argue that corporations are completely failing to act in their own interest.


> I can’t find a single large corporation pushing for the dissolution of copyright

The defendant of every copyright infringement lawsuit.


Hacker News has at least two users!


Already copyright laws do not offer protection for writers (see the current [screen]writers strike [1]), visual artists, musicians (see the recent Taylor Swift re-recording debacle [2]), not to speak of the "lower" arts, crafts & merch, where colossi such as H&M and Zara steal art, designs, concepts regularly [3] (not to even mention the sweatshops).

Copyright laws, or in fact generally laws, are for the rich and powerful, perhaps even for the corporations, not the individuals, since they paid for them to be so [4], it's just us, individually, the hoi polloi, who are still trapped into believing that there is rhyme or reason to our current system.

The joke is that while our system extolls itself as the most efficient system in history, based on competition of equals and free markets (two contradictions in terms contradicting each other), it harbours terrible inefficiencies such as the copyright laws, the patent system, and so on.

[1] https://en.wikipedia.org/wiki/2023_Writers_Guild_of_America_...

[2] https://www.vox.com/culture/22278732/taylor-swift-re-recordi...

[3] https://www.boredpanda.com/zara-stealing-designs-copying-ind...

[4] The Power Corporations Have In Changing Laws, https://www.npr.org/2021/04/02/983925056/the-power-corporati...


Ask someone on the screenwriters' strike if they'd rather copyright was eliminated so they didn't get paid at all...


Most of them already don't get paid, hence the strike [1].

[1] "Writers say they want a living wage as streaming devalues their work even as it demands more of their time.", https://www.indiewire.com/news/business/writers-strike-2023-...


No, they get paid but would, understandably, like to be paid more. Something which would be even less likely to happen if copyright was abolished so anybody could use their output at zero cost.


From what I understand they are not getting paid since the paradigm changed: "the way it's changed is most of these streaming services focus on a metric called ARPU, which is the average revenue per user", "the main difference is it used to be that the incentives were linked. So the writers and the studios were trying to get people to watch the show and get high ratings and get people to pay attention to the show. Today, the streamer is trying to get people to subscribe to their service, so they're looking more at the aggregate of the service versus the individual show." [1]

[1] https://www.npr.org/2023/05/03/1173612099/why-writers-are-ha...


I'm not sure quite why you're continuing to post articles explaining why writers get paid less as streaming services' revenue is less linked to the ratings of new shows as an argument in favour of marking down the value of the intellectual property they create to zero...

My original point still stands. Ask the writers. They're not getting paid nothing yet, and they won't approve of your passionate advocacy of a future in which they are paid nothing.


I'm not quite sure why you are not understanding that with the streaming model of business the writers, and even the actors, no longer get residuals, here is another article [1]: yes, they get paid nothing.

I am advocating for the abolishment of copyright laws, not for not being rewarded for one's work.

[1] Euphoria actress Sydney Sweeney revealed in a recent interview with The Hollywood Reporter that "They don’t pay actors like they used to, and with streamers, you no longer get residuals.", https://nofilmschool.com/streaming-services-residuals


I'm not sure why you are not understanding that not getting residuals is not the same as not getting a salary or a per writing contribution payment. Or that production companies are not paying these salaries (and the rest of the production budget) out of the kindness of their hearts, but out of revenues accrued from networks and streaming services having to pay to screen their shows.

Or that writers will not be paid anything (and certainly not by streaming services) in a copyright-free world in which anybody is allowed to use any creative work in any way free of charge. The abolition of copyright laws is literally the abolition of virtually[1] all existing rewards for creating IP.

This is not complicated stuff: if it's genuinely news to you that writers aren't working for free, perhaps you are not the person to lecture us on how copyright should work.

[1]I guess writers could still ask for donations. They could do this already, but they prefer negotiating with networks for higher pay packages...


If it's genuinely news to you that exploitation is a normal day-to-day part of any industry, perhaps these pointless replies should end here.

Yes, writers have been working for free [1].

[1] "In October 2015, Wil Wheaton created a stir when he declared that he had turned down an offer to write for the Huffington Post. He refused, according to him, because they had declined to pay for his work, in keeping with their policy of reimbursing writers with "exposure" in lieu of payment." https://www.vox.com/2016/2/26/11106006/writing-for-free Guess what writers who aren't famous have to do? Work for free, for "exposure".


Strangely, it is not news to me that exploitation is a normal day-to-day part of any industry.

(This is why I have not made any statements to that effect, never mind done anything as ludicrous as post articles about unionised screenwriters seeking to negotiate a higher pay rate as evidence that "most" of them earned "nothing")

And this is also why I am not advocating a copyright-free world in which HuffPo has the right to sell ads around everything Wil Wheaton ever wrote without paying him a penny or even seeking his permission. Guess what writers whose work isn't copyrightable will be paid in? That's right, "exposure", and not even exposure with much prospect of paid compensation if their work takes off.


Hope at least you are getting paid and you don't deploy such ridiculous amounts of bad faith for free.

My reply from above "Most of them already don't get paid, hence the strike" was specifically in the context of the streaming platforms not paying residuals in the same manner the studios do, hence the reference to the article.

What I find most funny is that you don't even know or care about alternative solutions; you just assume the copyright laws are for the best in the best of all possible worlds [1] and anything else would be chaos.

[1] Tout est pour le mieux dans le meilleur des mondes possibles, https://en.wikipedia.org/wiki/Candide


What I find most funny is that you're implying there are alternative solutions that you do know and care about.

And yet instead of addressing my actual objection - that screenwriters seeking to get paid more for co-creating IP would be unlikely to see abolition of IP protection as a solution - by articulating those alternatives and how they would allow them to get paid more, you chose to assert they don't get paid.[1]

I mean, I'm not the one advocating the radical change here, even though actually I don't "assume that the copyright laws are for the best in the best of all possible worlds". So it's not really incumbent on me to answer my own objections by imagining screenwriter-satisfying solutions involving the abolition of copyright. If you actually had one and were able to advocate it with as much zeal as you have defended the claim "most of them already don't get paid" this might be a more interesting discussion.

[1]doesn't really matter if you were doing so "specifically in the context of streaming platform residuals" since whether or not they get residuals from a particular media type is irrelevant to the fact screenwriters are not in favour of proposals which would entail them losing both the residuals they do get and their job


Ok, I'll bite, we are already at the 6th reply, who cares anymore.

The alternative solution is, briefly put, to quote Geoffrey Hinton: socialism [1]. Put at greater length: fully automated luxury communism [2].

Once the first two tiers of Maslow's pyramid (physiological and safety needs) are covered by automated systems, people will be truly free; some will express themselves in writing, and expecting copyright protection over their work will seem as ridiculous as if one of us today started charging people money for using the word "the" [3]. We are currently tasked with giving rise to the automated systems.

Sorry, to Gordian problems [4] I have only Gordian solutions.

[1] Geoffrey Hinton - Two Paths to Intelligence, https://youtu.be/rGgGOccMEiY?t=3626

[2] https://www.versobooks.com/en-gb/products/476-fully-automate...

[3] https://www.nbcnews.com/news/us-news/ohio-state-university-o...

[4] https://en.wikipedia.org/wiki/Gordian_Knot


The laws themselves are dangerous and encourage the myth of ideas being akin to somebody's personal property.


Since when are struggling artists making use of the copyright system? They are protected by it in theory alone; they aren't taking anyone to court.


Is this how Terminator begins?


Not sure if you're being sarcastic but I agree. Humans should not have AI research or advanced AI at all. It (a) removes purpose from people, (b) presents a situation that is too alien for human minds to handle, (c) increases the addictiveness of technology and thereby pushes us further into growing the technological system, (d) crosses the "adaptability threshold", i.e. the point at which the PACE of technological development is so rapid that adaptation of our society to such a pace is no longer possible.

In short, we do not have the maturity to handle AI; we are playing with fire. Every person who contributes anything to AI development is responsible for the disasters that AI will bring, and all AI research should be destroyed.


It’s ironic you say: “we are playing with fire.” Playing with fire is, in large part, literally how humans have come to dominate this planet. Why stop now?


To turn your metaphor on its head, we aren’t playing with fire when we use it constructively; rather we are very carefully and thoughtfully deploying it, no doubt due to our gradual and deadly lessons with it over time. When we “play” with it (a la fireworks or neglected campfires), it wreaks rampant destruction.

Being we are basically toddlers with this new technology, I would argue the breathless speed at which it’s finding its way into our lives tells me we are not being careful or thoughtful with it.


Counterpoint: "playing with it" is the only way we have to actually master something. "Carefully and thoughtfully deploying it" only comes way after many people first extensively played with it (for any specific "it"), first because of curiosity (i.e. for shits and giggles), then for a quick buck.


Maybe because we're on the verge of being able to create fires which can actually consume the only home we have?

Playing with fire is in large part an ego and greed issue. Yes, it allows us to dominate, but at what cost?

I'd rather live a more balanced life than a greedy and ego driven life. I may not own the world, but I can be happy and sleep sound at night, and that matters.


On the verge? We set that fire in motion a century ago. Our home is nearly consumed.

Today is the hottest day in recorded history.

Yesterday was the hottest day in recorded history.

The day before yesterday was the hottest day in recorded history.

The day before the day before yesterday was the hottest day in recorded history.

The day before the day before the day before yesterday was the hottest day in recorded history.


We have had nuclear weapons for almost 80 years and the world still hasn't ended. And I think that nuclear weapons are way more dangerous than Markov chains on steroids.


I can't launch a tactical nuke because somebody wronged me, but I can create a disinformation campaign with the tools I have and optionally 2-3 smart, motivated individuals, for free.

Both can be equally devastating.

Or, if I want to go the extra mile, I can use the latter to create motivation for the utilization of the former. e.g. I may say that a country has WMDs, and maybe try to manufacture consent for destruction of these...

Oh, wait a minute...


> can create a disinformation campaign with the tools I have and optionally 2-3 smart, motivated individuals, for free

You can, and it may cause an unbelievable nuisance, but not with the devastating outcome of a tactical nuke. Can you prove otherwise? Russian disinformation came close, such as in the 2016 election, but that was state sponsored.

> Or, if I want to go the extra mile, I can use the latter to create motivation for the utilization of the former. e.g. I may say that a country has WMDs, and maybe try to manufacture consent for destruction of these...

You cannot. That was still a state's action. Not to mention that many countries had their own intelligence that no doubt had a different assessment. They weren't blind. They used the WMD argument as an excuse to join the US-led war.


>e.g. I may say that a country has WMDs, and maybe try to manufacture consent for destruction

too soon


What if Wargames had LLMs involved?


No, playing is how humans grow up to be adults that don't play, but think.


Are you serious? Us dominating the planet is NOT a good thing.


Serious as a heart attack. I hope that we don't have to resort to military force, but it may be the only option: https://futurism.com/ai-expert-bomb-datacenters


This is such a pessimistic view to take that I cannot even begin to describe how you are wrong.


I would love to see your rebuttals, especially since I have never seen any strong arguments in favour of AI being a net benefit to society, and I have thought and read about this at length.

Of course, I always expect downvotes on my posts here since there is a strong tendency towards loving technology on this site. But what I find most interesting is that there is absolutely no taking of responsibility for any technological creations.

Before you downvote, please just ask yourself the following: is it reasonable to say that all these latest AI developments have no serious risks? And in response to your reply, humanity is in quite a precarious state now, so isn't it expected that a sober analysis of it is rather pessimistic, especially with regard to hyper-advanced technologies?


> But what I find most interesting is that there is absolutely no taking of responsibility of any technological creations.

I appreciate your willingness to talk about it, but to be honest it doesn't seem like it matters much what you, or any of us (not singling you out in particular), think about it, does it? It probably doesn't even matter who these people are who should take responsibility. This is one genie, like the internet, that probably isn't going back in the bottle anytime soon. I haven't seen any argument against "AI" that's much different than those against "the internet" and "computers" that we've heard at other times in the last 50 years when tech hit new mind-blowing milestones. It just keeps going regardless, right?


It actually does make a difference. The genie is only partially out of the bottle, and a lot depends on what we allow it to be used on. If we sit idly by and allow the ingestion of everything that's written, for instance, including what's currently being written, and let bros make derivative works for a quick buck, then we've mostly killed the writer's incentive to write or publish. If we slow down and don't allow ripping one another off, it could have the opposite effect and truly lift all the boats at once. There's a concurrent thread about Sarah Silverman suing OpenAI; that's what I'm talking about here as well.


You can slow things down, but not by more than a few years, because of the gradual democratization of training foundation models. Right now training a model competitive with ChatGPT can be done for $150K (Microsoft Orca 13B). In a few years the cost will be low enough that individuals can train models. At that point regulating it will require draconian dictatorships.

I'm also very wary of the copyright angle on this, because just like we don't prohibit people from learning copyrighted materials in their brains, it feels very wrong to regulate how we train digital brains. I'm OK with forbidding the output of copies of individual existing copyrighted works, but we already have laws on the books for that. I find it downright immoral to prohibit the generation of works "in the style of". That again reeks of the kind of draconian society I don't want to live in.

People will always be willing to pay for human-made art, just like we pay more for handmade pots, even though machines can make them better, so I think the doomsayers who predict the end of art are flat out wrong. Easy access to mass-generated AI content could be the best thing that happened to true artists, just like chess AI that can beat every human player was the best thing that happened to the chess world. We need labeling laws that show the origin of works so people can choose whether they want artificial or human-made, but please not another extension of the copyright regime to become even more suffocating and hostile to cultural flourishing.


>> You can slow things down, but not by more than a few years, because of the gradual democratization of training foundation models.

Just to be clear, what's being "democratised" is the fine-tuning of second-tier, inferior-performance models; or pre-training of third-tier ones. In the game of training large neural nets, the players that can afford to train the largest models with the most amount of data and compute at any given time will continue to dominate for the foreseeable future.

To make it plain, maybe in a couple of years you'll be able to train GPT-4 on your student laptop (unlikely, but let's allow it for the sake of argument). You'll still not be able to get anywhere near the performance of GPT-6 or whatever OpenAI and Google will be able to train by then.

Academics, hobbyists and smaller companies will continue to play second fiddle to large corporations as long as the dominant paradigm is more data and more compute.


Capabilities of the open-source models are only increasing over time by objective measurement. Yes, every one of them is demonstrably inferior to GPT-4, but we have historical precedent that the cost of compute only ever goes down.

Additionally, assuming the leaked details given here are accurate, there might not be a GPT-6. This entire approach of AI via language models very well could be approaching a local maximum and/or have already reached the point of diminishing returns.

If that is the case, OpenAI's moat is guaranteed to run dry. It should be telling that very few of the improvements over the past few months involve the base model, rather they are value-adds like plug-ins and hooking it up to a VM, things that are not protected by training difficulty.


I don't think there's a problem with the pace of "progress"; the problem is what it is fed, and we could definitely make changes around that. For example, if I were an author I wouldn't want this stuff to be fed in without my permission.


> If we sit idly and allow for ingesting all what’s written for instance

It's already happened. What do you do now? For decades, it's happened regardless of robots.txt, so you have to assume it's all been ingested, by everything, everywhere. What now?


If we were a rational civilization we'd stop all scientific research immediately. First, there's a good chance the great filter is ahead of us and will be triggered by a technological breakthrough. Second, with nuclear weapons we got lucky in that it's extremely hard to separate the fissile isotope of uranium from mineral ores; if in the future we invent a powerful weapon that's easy to produce, organizations like al-Qaeda will destroy every city on Earth.


I can't figure out if this is a rebuttal by absurdity or serious?


I agree with you that we'd stop scientific research immediately, or at least most of it.


The great filter is definitely ahead of us if we stop all scientific research. Just ask the dinosaurs.


>is it reasonable to say that all these latest AI developments have no serious risks?

There are definitely problems today with people abusing ChatGPT for malicious purposes, but given the retardedness of current LLMs I don't think we need to worry about a AI mastermind taking over anytime soon.


The only people pissed are a bunch of developers who want to use it for their own good.

GPT4 costs are ridiculously cheap for the value you get out of it. Any other company wouldn’t even release it to the public like they’ve done


In today's world, "Science equals Capitalism".

Or at least Science is allowed to progress and get funded only as long as it serves the interest of Capitalism.


That's engineering, not science.


The National Science Foundation is the architect behind the whole STEM Pipeline. It's one and the same.


> We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

This is essentially the Capitalist Credo, expressed in practical vs theoretical terms.


> pressure the government into making it

Not really, Capitalism is free trade between two parties.

The more government involvement you have the more it moves towards socialism or communism where the government controls trade.

At least this was the historical meaning. These days Capitalism is being redefined to mean private (non-government) Communism. That is, power concentrated in the hands of a few.


Capitalism is not free trade between two parties. That's free market. Capitalism is control by capital.


There is no free market without Capitalism. Capitalism is where private citizens control the capital, not the government.

It's not possible to have a free market without competitive markets, price systems, private property, property rights recognition, and voluntary exchange.


??? Bazaars existed for millennia before capitalism arose.

> Capitalism is where private citizens control the capital.

Well, that's a tautology, but based on the examples you cite in the next sentence, you're mixing up "property" with "capital". Capital is specifically the means of production. All capital is property, but not all property is capital. A factory filled with machinery is capital; your toothbrush isn't.

> It's not possible to have a free market without competitive markets, price systems, private property, property rights recognition, and voluntary exchange.

Most of these predate capitalism.


The word predates Karl Marx, but he made it popular, and the meaning usually refers to his use. I think it's great there are a million writers with different ideas, and I wish Marxists would read more of them instead of more of him and his other cult members. They should try and write a manifesto too. It's easy. Write two. Then three works of fiction, each with a different utopia.


I would recommend reading this entire page: https://plato.stanford.edu/entries/socialism/

But if you don't have the time for that, just read the intro paragraph.


> Capitalism is being redefined to mean private (non-government) Communism

I've heard this called super-capitalism, yeah.


You forgot to link where I can buy your tin foil hat at the end.


> "Open" AI, a charity to benefit us all by pushing and publishing the frontier of scientific knowledge.

> Nevermind, fuckers, actually it's just to take your jobs and make a few VCs richer. We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

1. I don't think this is the right place for this kind of content, perhaps find your way back to Twitter or Reddit

2. Have you contributed funds to OpenAI? If not, where did your sense of entitlement come from?

3. What makes you think that any of what OpenAI has produced and provided would be available without funding? I assume the answer to 2 above will be no, so how do you expect them to build without funds?

and 4. What's stopping you from creating what you thought OpenAI should be? Feel free. Nobody's stopping you.


> 2. Have you contributed funds to OpenAI? If not, where did your sense of entitlement come from?

Is your argument that we need to have given a company money before we can be opposed to their unethical business practices? We're talking about a company that wantonly disregarded copyright, broke the DMCA billions of times (proving that law is for completely destroying individuals who want to personally enjoy some media they probably otherwise wouldn't buy, not for corporations who want to hoard the wealth collectively made by all of society), charges its users even when its product is not delivered due to their own server errors, and then goes in front of congress to try to put up a regulatory moat to make competition illegal. Everything they do is based on "the law applies to you, not us." And you're saying that we're entitled for being outraged by this?

> 4. What's stopping you from creating what you thought OpenAI should be? Feel free. Nobody stopping you.

That's literally what Sam Altman is trying to get congress to do. This will literally be illegal if we don't fight back. This is literally what the poster you're arguing against is saying is happening. How many more "literally"s do I need here!?


I think his post is appropriate here, and on point. How about responding to his arguments about algorithm secrecy and the push to regulate competition?


Why the vitriol towards OpenAI?

If Elon hadn't pulled the rug out from under them after they refused his forceful takeover*, they wouldn't have had to go to Microsoft and they'd still be open.

* a takeover which he predicated on the claim that OpenAI was "doomed to fail"


It's several reasons. First, it's the lies and the abuse of a charity; that wouldn't be an issue if they had begun as a private company instead of robbing a charity.

But secondly, even if they were a private company, it's dishonest and reprehensible to claim to congress that you want to "protect the public" when you really only give a shit about protecting your moat, I'm not happy about that either.

I'm also tired of tech oligarchs' general tomfuckery in all our daily lives, as I suspect many more people here are. OpenAI is just particularly egregious about it.

I also think it's my civic duty to let other developers know that OpenAI does not have, by any stretch of the imagination, a stranglehold on this technology or any secret sauce. That's why they're lying and sweating in front of congress.


Yeah, they went from v1 to regulatory capture in the span of months


> regulatory capture

If this leak is correct, regulatory capture is likely the only moat OpenAI could have hoped for. It would explain why Sam was so absolutely adamant that this tech needed to receive oversight.

If correct, every big tech company now has a recipe to build their own GPT-4. I'd expect for the open source efforts to try to duplicate the results as well. LLMs will increase in quality across the board and become fairly interchangeable, leading to thin margins and a race to the bottom.

We could be watching $13 billion going up in flames, all of it predicated on a secret as flimsy as the Coca-Cola recipe.


$13 billion isn't $13 billion if $N billion of it has to mandatorily be spent on high margin (70%?) Azure services.


Tbf the Coca-Cola recipe is a more sustainable trade secret, since the product is "perfected".

GPT4 is a work in progress so a competitor can be objectively better.

I’m hoping for open source models to start incorporating some of these ideas.


Just wait until the RLHF class action hits them.


Did you actually read my comment or did you have a preplanned diatribe?

They didn't start the private company until the person who promised them $1B reneged. That person reneged because they tried to forcefully take over and were rebuked.

They were running out of money and forced to raise funds in a very for-profit way or fold.

-

Edit: Rate limited because the hivemind has decided that it's unacceptable to insult their leader

People keep acting like if it wasn't for OpenAI we'd be in some LLM utopia. The reality is some other big tech giant would reach the current SOTA first and we'd be in the same predicament except with a company with 1000x more machinery to do the things you're complaining about.

It's ridiculous the sense of entitlement some people must have to keep insisting that OpenAI should have crawled into a cave and died because some megalomaniac threw a tantrum.


So fold.

Running out of money to run a charity means they now suddenly need to fuck over the people the charity was supposed to help (that is, everyone on earth)?

Imagine if a feed the homeless charity accepted private investment and went around installing anti-homeless spikes everywhere after stopping feeding anyone. It's a similar style of behavior.


It's not even remotely the same as anti-homeless spikes.

We've just had the massive fine against 3M for knowing and hiding the risks of PFAS, and the top comments here were "increase the fines! Lock up the bosses!"

Now we have a company going "we had to put a lot of effort into preventing this model from cheerfully outputting Al Qaeda propaganda, explicit rape threats, and detailed instructions for an amateur to make deadly chemicals using only home supplies", and despite that effort they still had legal trouble in Italy because the output was unsuitable for minors (as well as the GDPR issues that were the primary headline at the time).

And a lot of researchers with no financial incentives saying "yup, there's danger in these AI".

The reaction here?

Disbelief OpenAI might be saying what they mean, and meaning what they say.

And people parroting "moat!" as if none of the other FAANGs could trivially cross any regulatory barriers that emerge.

It's like the entire topic of AI has been politicised harder than "is nuclear power green?"


> "outputs al qaeda propaganda"

So does Microsoft Word. Both require a human to tell the software what to output.

> as if none of the other FAANGs could trivially cross any regulatory barriers that emerge.

That's the point.

This miracle technology does not belong to a few rich men, it belongs to us all.

Tech oligarchs are provably not more responsible than the rest of us, and do not deserve to lock us out of the garden whose fruit they seek to pick.

A rich man can meet the regulations that allow him to build a robot to take your job, but YOU are not allowed to build the same robot to help boost your own income? Because that might be irresponsible?


> So does Microsoft Word. Both require a human to tell the software what to output.

Clippy does what now?

Or do you mean "I can type", because if so you're minimising the very same capabilities that you're later describing as a miracle and saying belongs to us all.

> Tech oligarchs are provably not more responsible than the rest of us, and do not deserve to lock us out of the garden whose fruit they seek to pick.

"No more responsible than the rest of us" is a dangerously low standard.

The rest of us, collectively rather than each and every one of us, play lotteries, drive dangerously, addict ourselves to drugs, pickle our livers, and win Darwin awards.

For all our sophistication and sophistry, we're all just fancy balding primates with fairly similar tribal attitudes and motivations.

> YOU are not allowed to build the same robot to help boost your own income

Yes, obviously, with literal robots there are countless examples of public liability insurance and health & safety legislation. With computers, likewise, because they're connected to stuff.


Lol, I’ve never heard a single person suggest that we’d be living in some LLM utopia if OpenAI was gone.

Who have you been talking to?


> the abuse of a charity,

A non-profit and charity are two different things.

While a charity is a form of non-profit, it has to follow certain rules to qualify as one. Their profits must go towards the charity.

A non-profit is a company that is set up not to make a profit. It is allowed to make a profit if it does. This is what OpenAI was.

They switched to a "capped" for-profit model so that they could get more funding. It also allowed their employees to invest in the company, and OpenAI gave equity to its employees.

There were no lies or abuse. Where did you get that information from?


> It is allowed to make a profit if it does.

A nonprofit is subject to the non-distribution constraint: any revenues that exceed expenses must be committed to the organization's purpose, not taken by private parties.

https://en.wikipedia.org/wiki/Nonprofit_organization


The "regulatory capture" conspiracy theory makes no sense to me. It takes 9 figures in cash to create and run one of these super big models. Only big tech was ever going to create them, and big tech is already very experienced at navigating regulation, regulation wasn't ever going to stop them from competing. And in general, our democracy works better than the nihilist libertarians give it credit for.


I bet it'll be 6 figures within 18 months.


The thing about these models is compute scales quadratically with model dimensionality and memory scales quadratically with sequence length.

We are nowhere near diminishing returns for either variable, so sure, current models may scale quickly, but the cutting edge will want as much compute as possible for a long time.

That’s kind of the humor of everyone saying this leak somehow leaves OpenAI vulnerable.

The work isn’t deciding if an MoE is the right architecture or not, it’s how to run 25k GPUs concurrently in a fault tolerant way (likely the true reason for the deep Azure links).


Memory does not scale quadratically with sequence length.


During training, you have to store a dot product of Q and V that has dimension N_ctx^2.

That's quadratic scaling, no?


Presumably you mean a dot product of Q and K, and no, you do not have to store this: https://arxiv.org/abs/2205.14135


I mean, sure you can work around it, but from your own link:

>since the time and memory complexity of self-attention are quadratic in sequence length


Except in practice this is not true, and hasn't been for more than a year. It's not just a workaround either -- FlashAttention is both faster at runtime and uses less memory.
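
For anyone who wants to see the difference concretely, here is a minimal PyTorch sketch (my own illustration, not code from the paper or from OpenAI): the naive version materializes the full N x N score matrix, while the fused scaled_dot_product_attention kernel in PyTorch 2.x computes the same result blockwise without ever storing it.

    import torch
    import torch.nn.functional as F

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    B, H, N, D = 1, 8, 2048, 64  # batch, heads, sequence length, head dim
    q = torch.randn(B, H, N, D, device=device, dtype=dtype)
    k = torch.randn(B, H, N, D, device=device, dtype=dtype)
    v = torch.randn(B, H, N, D, device=device, dtype=dtype)

    # Naive attention: materializes an (N x N) score matrix per head,
    # so activation memory grows quadratically with sequence length.
    scores = (q @ k.transpose(-2, -1)) / D**0.5      # (B, H, N, N)
    naive_out = torch.softmax(scores, dim=-1) @ v    # (B, H, N, D)

    # Fused, FlashAttention-style kernel: same math, computed in tiles,
    # never holding the full (N x N) matrix in memory.
    fused_out = F.scaled_dot_product_attention(q, k, v)

    print(torch.allclose(naive_out, fused_out, atol=1e-2, rtol=1e-2))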


Even if it was free to train, FAANGs can beat OpenAI on spending to follow regulations.


I interpreted OpenAI's regulatory capture bid as more of an attempt to create competition-hostile regulation than an attempt to reduce regulation to cut costs.


My point is that openai's competitors have no problem handling regulations.


It was a massive bait and switch; luckily AI isn't powerful enough to take over the world, or we'd be done for.


This is a really interesting spin on reality.


I think this commentary was pithier in your head...

https://hackernoon.com/how-openai-transitioned-from-a-nonpro...


A "capped" profit of 100x is absolutely ridiculous. That is not even the slightest attempt to stay in the same ballpark as a nonprofit.

Blaming any loss of Elon money for that is spin.

And he still gave them a hundred million.


Not sure what you're on about, both my comment and the article are about how they went for-profit.

They went for-profit because they needed to raise.

They raised after Elon reneged a fraction of the way into his promise of $1B. (and not 100MM deep: he silently revised his figures after TC started digging.)

It's not complicated but you seem to want to make it complicated.


They didn't need to go that far, and they gaslit people about it. Elon is not responsible for either of those.

And was their existing money pile actually not enough?


If it involves Elon being a bad guy, it is certain to have HNers salivating at the thought.


At this point, after attempting to start a literal fight with Zuckerberg, Musk is proposing a penis-measuring contest.

He appears to be decompensating in real time. If that's salivation fodder, so be it, but it just makes me sad. You hate to see it happen... or at least I do.


Maybe Russia has been injecting lead into his water supply lol.

I’m also saddened.

Musk is not really a hero or a villain, but his manic stages have given us our first realistic shot at becoming a spacefaring civilization, and moved the needle big time on the lock that the petro cartels had on the automotive industry vis-a-vis electric cars.

I hope Elon gets better. Losing a billionaire tech maximalist's manic episodes is going to set us back decades as more reasonable people chase profit instead of dreams.


It feels very ignorant to think you can diagnose someone as having manic episodes when you know nothing about them as a person, their motivations, or their mental health history, and base all your opinions on mainstream outrage over tweets that are less dumb than most people's.


I don’t have a diagnosis obviously, I mean manic as a description of his apparent behavior, not as a pathology. Still, he seems to be not doing great if his social media is to be taken at face value (which is dicey at best) so I do hope that he gets better.


He has admitted it.


He's joking but making less than no effort to communicate like someone who owns a large company normally would.


I mean, seemed like an obvious joke to me. I thought it was funny.


[flagged]


Can we please stop trying to diagnose mental health issues when we have no background to do so and don’t actually know the patient


My apologies, however Elon has identified as bipolar publicly.

https://www.dailymail.co.uk/health/article-4746914/Elon-Musk...

As someone who has bipolar as well, I think it's an important thing to talk about and I'm glad he has. Now that he has, I hope we can too without being shut down. It's not uncommon in our field but it has a huge stigma attached to it that's unhelpful to the people with it, or to the people who are affected by a friend's, coworker's, or loved one's mania or depression. When I'm under a lot of stress I tend towards mania, especially when it's "good stress" like an achievement or a great new job or something really exciting to work on. Inexorably I get drawn into a pit of despair, especially as I start to realize the impact my mania has had on my relationships and reputation. I have a good network and good self-awareness built over 30 years of meditation and Buddhist study, so the impacts are mitigated.

I suspect if people understood bipolar and were willing to discuss and learn, we might understand better what Elon does and why. He's not bad. He's just different. Bipolar is considered a dimension of neurodiversity and, like other aspects such as autism, is nothing to be ashamed of.


He literally tweeted that he's maybe not medically bipolar. Don't take Daily Mail headlines at face value.


Every company who promotes and develops AI is morally responsible for the coming disaster that it will bring on us. If I could have one wish it would be that every trace of AI research is destroyed.


> develops AI is morally responsible

Responsible for an attempt to arrive at the construction of production facilities for a good that seems to be in dire scarcity in today's world: intelligence.

If somebody comes and implements "artificial morons", that is actually out of the root that made the field of research necessary.


I recommend reading comments like this and substituting "a baby" for AI. A baby also can't be aligned and is capable of deciding to destroy the world. It's not gonna do it though.


A baby doesn't output propaganda at 6GB/s in computer readable text.


Not with that attitude! /s


"*The post about GPT-4's architecture had been removed due to a copyright claim.", https://twitter.com/Yampeleg/status/1678582275561103360


> This, of course, is “only” a batch size of 7.5 million tokens per expert due to not every expert seeing all tokens.

> Mixture of Expert Tradeoffs: There are multiple MoE tradeoffs taken: For example, MoE is incredibly difficult to deal with on inference because not every part of the model is utilized on every token generation.

Are these experts able to communicate among themselves within one query? How do they get selected? How do they know who to pass information to?

Would I be able to influence the selection of experts by how I create my questions? For example to ensure that a question about code gets passed directly to an expert in code? I feel silly asking this question, but I honestly have no idea how to interpret this.


You shouldn't take the "mixture of experts" too literally, it's yet another architecture to use internally for a gradient descent optimized graph of ops.

I obviously don't know how GPT-4 does it (or if it even does it), but think of partitioning your network into a couple of very isolated sub-graphs (the "experts"), and adding another learnable network between the input tokens and the experts that learns to route tokens to 1 or more expert sub-graphs. The gain is that you can potentially skip running the unused sub-graphs completely for that token, and you can distribute them across other GPUs since, except for the input and output, they are independent of each other.

It all depends on the problem, data, and if the gradient descent optimizer can find a way to actually partition the problem usefully using the router and "experts".
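
A toy sketch of that idea in PyTorch, in case it helps (entirely illustrative; the sizes, the top-2 routing, and everything else here are my assumptions, not anything known about GPT-4):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMoELayer(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
            super().__init__()
            # The isolated sub-graphs ("experts"): here just small MLPs.
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ])
            # The learnable router between tokens and experts.
            self.router = nn.Linear(d_model, n_experts)
            self.top_k = top_k

        def forward(self, x):                        # x: (tokens, d_model)
            gate_logits = self.router(x)             # (tokens, n_experts)
            weights, chosen = gate_logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)     # renormalize over the chosen experts
            out = torch.zeros_like(x)
            # Each token only runs through its top-k experts; the rest are skipped,
            # which is where the inference-time savings come from.
            for i, expert in enumerate(self.experts):
                token_idx, slot = (chosen == i).nonzero(as_tuple=True)
                if token_idx.numel():
                    out[token_idx] += weights[token_idx, slot, None] * expert(x[token_idx])
            return out

    layer = ToyMoELayer()
    print(layer(torch.randn(16, 512)).shape)         # torch.Size([16, 512])

Because the experts are only tied together at the input and output, each one can live on its own GPU, which is presumably part of the appeal at that scale.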


Recently I was saying how much amazing stuff there is in retro computing. One thing that keeps coming to mind for me is just how visionary Thinking Machines' Connection Machine supercomputer architecture was, with its massive parallelism built in, with neural network applications being a key predicted use case at the time. That was so long ago!

Interesting to think about in comparison to the challenges today around parallelizing 'commodity' GPUs. Scare quotes because the A100 and H100 are pretty impressive machines in and of themselves.



This is a duplicate post of pure speculation.


> The conspiracy theory that the new GPT-4 quality had been deteriorated might be simply because they are letting the oracle model accept lower probability sequences from the speculative decoding model.

Whether or not this specific theory is true, something along these lines seems like the most likely explanation for the quality degradation that many have noticed, where OpenAI's claims about not changing the model are both technically true and completely misleading.
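
For context, the acceptance rule from the published speculative-decoding papers (nothing OpenAI-specific) accepts a draft token with probability min(1, p_oracle / p_draft) and resamples on rejection, which provably preserves the oracle model's output distribution; the theory above is that loosening this test buys speed at the cost of distribution drift. A rough sketch, where the relax knob is purely hypothetical:

    import random

    def accept_draft_token(p_oracle: float, p_draft: float, relax: float = 1.0) -> bool:
        # Standard rule: accept with probability min(1, p_oracle / p_draft).
        # relax > 1.0 is a made-up knob for illustration: it accepts more of the
        # draft model's tokens, speeding decoding up but drifting away from the
        # oracle model's distribution.
        return random.random() < min(1.0, relax * p_oracle / p_draft)

    # The draft model was overconfident here (p_draft > p_oracle), so under the
    # exact rule this token is only accepted 25% of the time.
    print(accept_draft_token(p_oracle=0.10, p_draft=0.40))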


It is a bit problematic if it is being trained on copyrighted textbooks without compensation for the authors. Even for open-source science, I think it is a bit unethical if OpenAI is using publicly funded research without attribution or compensation. Taxpayers paid for those NIH grants, you know...


Wait til you see all the copyrighted and pirated data most large language models are trained on:

- https://www.washingtonpost.com/technology/interactive/2023/a...

- https://pile.eleuther.ai/ (data hosted by https://the-eye.eu/, where it's not too hard to find pirated, copyrighted books, e.g. https://the-eye.eu/public/Books/cdn.preterhuman.net/texts/li...)


I've previously noticed when playing with GPT-4 it can sometimes 'autocomplete' on different sections of the text it's feeding back, sometimes what looks like 4 or more different sections. Might be unrelated but is this MoE in action or them streaming the response in some way?


This is just an issue with their frontend that seems to occur when it encounters \n\n. The actual data coming in only changes at the end of the message.


“Leaked” seems like a strong clickbait claim from whoever wrote this, along with the “it’s over” part….


Leaked is I think an accurate term -- this (or the original post) is fairly new information leaked from openai.


> It is over.

What does this mean?


It's the tweet equivalent of one of those obnoxious YouTube 'reaction' thumbnails.


“It’s over” is the latest meme phrase to describe some kind of defeat.


If this is true, anyone [1] can now build a GPT-4 given training data and budget.

There's no magic here.

[1] That's probably twenty or so orgs right now, which will blow away OpenAI's moat and margins.


presumably the speculation


> Part of this extremely low utilization is due to an absurd number of failures requiring checkpoints that needed to be restarted from.

Hahahaha, so true for anyone who has worked with quanty types running Python code at scale on a cluster.


There's a section at the end where there is speculation on what the entire dataset entails. My guess is a chunk of it is probably from ChatGPT data (or GPT3 data from when training on your requests was opt-out rather than opt-in).


No, this is fake, a light dusting of nothing on top of a meme post that was circulating in grifting communities as early as Q4 2022. It gains a little bit in every retelling; sort of impressive to see it's almost at blog scale.


No, a meme post from 2022 did not in fact reference papers posted in 2023.

You must be thinking of some other post, or you're just making stuff up.


Like I said, it gains a little bit in every retelling. Why are you aggressively defending unsourced tripe?


well as someone out of the loop - is there a source on the Q4 2022 version then?


Not sure about the Q4 2022 version, but there was a post [1] a month ago that also claimed something like an MoE with 16 experts and got a lot of attention, and some similar-ish rumors before too, with less detail.

So it could be either more detail leaking over time, or just a random made-up post that took root a long time ago and is continually being retold with slightly more guesstimated detail added on each time to make the poster sound like they're in the know.

Impossible to tell until the actual details are confirmed I guess.

[1] https://news.ycombinator.com/item?id=36413296


That explains why I got an ad when I tried to click through to the tweet.


I wonder if any open-source MoE models are being worked on. Could I run an 8x13B model on my 16GB graphics card, only loading the expert that is needed per run?
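
Rough numbers, under the assumption of 13B parameters per expert and ignoring activations and the KV cache:

    params = 13e9  # one 13B expert
    for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
        print(f"one 13B expert at {name}: ~{params * bytes_per_param / 1e9:.1f} GB")
    # fp16  -> ~26.0 GB (does not fit in 16 GB)
    # int8  -> ~13.0 GB (barely fits)
    # 4-bit -> ~6.5 GB  (fits with room to spare)

So a single quantized expert fits, but the catch is that in the published MoE designs routing happens per token and per layer, so "only loading the expert that is needed" could mean swapping expert weights constantly unless the router's choices turn out to be very stable.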


So George Hotz was right


These words are nonsense to me. Can someone explain?


GPT-4 is the name of a machine learning (language) model that is the basis of chatGPT. This post speculates about its internals.

https://en.wikipedia.org/wiki/GPT-4


I meant things like:

- parameters

- layers

- "Mixture Of Experts"

- tokens

That's about as far as I made it


Parameters: In the context of AI and language models, parameters refer to the internal settings or variables that an AI model uses to make predictions or generate responses. Think of them as the knobs and switches that can be adjusted to fine-tune how the AI understands and generates language. These parameters are learned during the training process, where the AI model analyzes vast amounts of data to optimize its performance.

Layers: In AI, layers are like stacked building blocks within a neural network, which is the fundamental structure of many AI models. Each layer performs different computations, transforming the input data as it passes through them. Think of a layer as a specific task or filter that the AI model can utilize to understand and process information. The deeper the neural network, the more layers it has, allowing for more complex patterns and representations to be learned.

"Mixture Of Experts": An "MoE" is an approach in AI that combines multiple specialized AI models, known as "experts," to work together on a task. Each expert focuses on a particular subset or aspect of the problem, leveraging their expertise to contribute to the final result. It's like having a team of experts who specialize in different areas collaborating to provide the best solution. By dividing the task and letting each expert handle their niche, the AI model can achieve better overall performance.

Tokens: In the context of AI and language models, tokens are chunks of text that are used as input or output. They can be individual words, characters, or even subwords, depending on how the language model is designed. For example, in the sentence "I love cats," the tokens would be "I," "love," and "cats." Tokens help the AI model understand and process language by breaking it down into manageable units. They allow the model to learn patterns, context, and relationships between words to generate meaningful responses or predictions.

See: https://platform.openai.com/tokenizer
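
If you want to poke at tokens locally rather than via that page, the tiktoken library exposes the same BPE encodings (the exact split below depends on the encoding, so treat it as illustrative):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by the GPT-3.5/GPT-4 era models
    ids = enc.encode("I love cats")
    print(ids)                                  # the integer token ids
    print([enc.decode([i]) for i in ids])       # e.g. ['I', ' love', ' cats']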


> "Mixture Of Experts": An "MoE" is an approach in AI that combines multiple specialized AI models, known as "experts," (...)

Wonder when that stopped being called just an "ensemble model", which is a term I recall from 10 years ago. Terminology churn?


Mixture of experts is different from ensembles because MoE happens at every layer as opposed to joining the models once at the end


Thanks, that makes sense - and isn't obvious from the explanations I see people give.


Great explanations! How about Multi-Query Attention?


This is the original paper: https://arxiv.org/abs/1911.02150 . The idea is that with a transformer you have many heads, say 64 for LLaMa, and for each head you have 1 "query" vector one "key" vector and one "value" vector per token. Most of the cost of inferencing models is loading the key and value vectors from GPU memory to the GPU itself. the idea behind MQA is that instead of having 64 queries, 64 keys, and 64 values, you have 64 queries, 1 key, and 1 value ("Multi-Query" as opposed to "Multi-Head", the original name). This means that there is much less data to load from GPU memory to the GPU during inference.
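
A shape-level sketch of what that saves (my own illustration, with arbitrary sizes):

    import torch

    batch, seq, n_heads, d_head = 1, 1024, 64, 128

    # Multi-Head Attention: every head keeps its own K and V, so this whole
    # cache has to be streamed from GPU memory at each decoding step.
    k_mha = torch.zeros(batch, n_heads, seq, d_head)
    v_mha = torch.zeros(batch, n_heads, seq, d_head)

    # Multi-Query Attention: all 64 query heads share a single K and V.
    k_mqa = torch.zeros(batch, 1, seq, d_head)
    v_mqa = torch.zeros(batch, 1, seq, d_head)

    def cache_mib(*tensors, bytes_per_el=2):  # assume fp16 entries in the cache
        return sum(t.numel() for t in tensors) * bytes_per_el / 2**20

    print(f"MHA KV cache: {cache_mib(k_mha, v_mha):.1f} MiB per layer")  # 32.0
    print(f"MQA KV cache: {cache_mib(k_mqa, v_mqa):.1f} MiB per layer")  # 0.5, i.e. 64x smaller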


Beautiful. Thanks!


Bard taking notes…


[flagged]


Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


Everyone hates on crypto because of all the electricity use for mining. But how much electricity is the training of all the giant LLMs costing us?


LLMs actually have tangible use (granted far from bespoke yet). Crypto mining has no tangible benefit.


Not tangible, but there will always be a theoretical and idealistic appeal for decentralized app/game/protocol hosting and governance. Corporate bodies like OpenAI, Twitter, YouTube, Blizzard Entertainment have made unpopular changes to popular services they own in a centralized fashion. What if LLMs, social media, and MMOs were both open source and hosted and controlled by their users, not just the whims of a single IP owner?

It seems that much like running a co-op or commune the barrier is whether enough people care to take up the fraction of effort and cost it takes to join such an arrangement.


Decentralization is a noble goal and one I admire but we've been told it is coming for over a decade and centralized cloud and financial services are more entrenched than ever.

Even if the social winds did miraculously change, LLMs will never be decentralized for the sole reason that once they're good enough to operate a gun and servo motors every government on the planet is going to lock that shit up fast (assuming massive relentless cyber campaigns somehow don't trigger that response first).


I agree there is benefit. But my complaint is about the apparent overlap in work. I mean, you've got several giant tech companies all training models that do mostly the same thing. Sure, one might come out on top, but it just seems crazy when training takes millions of CPU hours.


To be fair, it’s still early for crypto. The best use cases of the technology are yet to emerge.


Crypto enthusiasts have been saying that for the past 14 years.


For 12 years?


Surely they'll find a use case before the sun burns out!


That's quite uncertain, given that Bitcoin is a couple of scaling rounds away from literally sucking the sun dry to power the "proof of work" scheme.


If it was trained on CS textbooks, they weren't very good ones. I asked it (GPT4) to write a quantum computer algorithm to square a number. It very confidently told me that to simplify the problem it would use two bits. Okay, fine. But then the algorithm it (again confidently) implemented did a left shift (which it reminded me was multiplying by 2, so it definitely intended this!) and then add the number to itself. It then wrote that in terms of QC gates. Tada! It took me a half beat to realize that rather than this being some new version of squaring a number that I somehow wasn't aware of, it's completely wrong. It only works on 00! Confronted, of course it did the usual "So sorry... I guess I don't know how to do this." dance. I don't get why anyone thinks that this thing is worth anything at all, except for cheating on creative writing tests.


Damn so you tried once to use it for a thing and it failed? That's crazy, truly a mystery then why so many devs continue to use it daily.


It's arguably the first useful general purpose AI. Claiming it is not worth anything at all because it can't solve a problem that 99.999% of humans would not be able to solve is a pretty ridiculous definition of 'worth'.


You (and everyone else) are missing the point of my post. (I admit to having thrown it poorly.) Forget the QC part; it confidently described an algorithm to square a number, something literally any beginning CS student could do, and it didn't even come close.

I will admit to using it all the time for simple programming tasks, and it occasionally does them correctly. Often it comes close enough that I can fix them (interestingly, in most of these cases I can't talk it into fixing itself; it kinda gets into wrong-approach ruts), and sometimes it's horribly wrong (like here).

I find the horribly wrong cases funny.


99.999% of humans who haven't seen the data.


If you had just spent 90 days observing the equivalent of millions of books, what are the odds you could recall even one thing from each of them?


Zero.


What percentile rank among college-educated Americans would that correspond to?

I'd guess that takes it out of the top 0.1%, but not the top 1%.


Call it 100 million college-educated Americans. I don’t think 100,000 people can formulate working quantum computing algorithms in ten seconds. It is probably closer to 0.001%, but those people probably can’t describe 13th century medical technology very well.


Surprisingly you need to be quite adept at 14th century medicine in order to write quantum algorithms


The tricky part is that the rest of those million college-educated folks would answer "I don't know", something LLMs really struggle with.


A product that mostly declines to be a product doesn’t seem like it would sell very well.


I must be getting old, because GPT-4 is basically magic compared to what has come before... and just a few months in people are happily dunking on it.


It's an ego problem, IMO: some folks have a co-dependency problem with being "smart", and now GenAI has taken away their trophy.


jquery is definitely old.

The dunking makes sense to me—“AI is taking our Jobs” is a real concern, so pointing out how bad ChatGPT is at coding on an arguably coding social network is one way to control the narrative.


1. I asked my child how to write a quantum computer algorithm to square a number and they didn't know. It's amazing that anyone thinks children are worth anything at all. I immediately sold mine to be harvested for organs and I suggest everyone else do the same.

2. I looked in The Art of Computer Programming for a quantum computer algorithm to square a number and it didn't have one. If it's a CS textbook it obviously isn't a very good one. In fact it's amazing that anyone thinks Knuth is worth anything at all. I immediately threw my copies in the recycling.

In fact you can divide everything into the set of things which know how to square a number on a quantum computer (let's call that the set of valuable things) and everything else. Everything else can be discarded.


It sounds like you don't know how to use it effectively is all I can see from your post.



