
Nothing means anything then.

In one of Karpathy's videos he said that he was a bit suspicious that the models that score highest on LMArena aren't the ones that people use the most to solve actual day-to-day problems.

Wild read, thanks for sharing.


I don't see the contradiction between "stochastic parrot" and "strong summarisation abilities".

Where I'm skeptical of LLM skepticism is that people use the term "stochastic parrot" disparagingly, as if they're not impressed. LLMs are stochastic parrots in the sense that they probabilistically guess sequences of things, but isn't it interesting how far that takes you already? I'd never have guessed. Fundamentally I question the intellectual honesty of anyone who pretends they're not surprised by this.


LLMs learn from examples where the logits are not probabilities, but how a given sentence continues (only one token is set to 1). So they don't learn probabilities; they learn how to continue the sentence with a given token. We apply softmax to the logits for mathematical reasons, and it is natural/simpler to think in terms of probabilities, but that's not what happens, nor are the neural networks they are composed of only able to approximate probabilistic functions. This "next token" probability is the source of a lot of misunderstanding. It's much better to imagine the logits as "To continue my reply I could say this word, more than the others, or maybe that one, a bit less, ..." and so forth. There is now evidence, too, that in the activations producing a given token the LLM already has an idea of how most of the sentence is going to continue.

Of course, early in training, the first function they model, to lower the error, will be the probabilities of the next tokens, since this is the simplest function that reduces the loss. Then the gradients pull in other directions, and the function the LLM eventually learns is no longer related to probabilities, but to the meaning of the sentence and what it makes sense to say next.

It's not by chance that the logits often have a huge signal in just two or three tokens, even if the sentence, probabilistically speaking, could continue in many more potential ways.
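The "huge signal in two or three tokens" claim above can be illustrated with a minimal sketch. The logit values and token names here are hypothetical, and the softmax is the standard numerically stable one, not any particular library's implementation:

```python
import math

# Hypothetical logits for four candidate next tokens (illustrative values):
# two tokens dominate, the rest trail far behind.
logits = {"word": 3.2, "token": 2.9, "cat": -1.0, "the": -2.5}

def softmax(scores):
    """Convert raw logits into a normalized distribution (stable form)."""
    m = max(scores.values())
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

probs = softmax(logits)
# Nearly all of the probability mass lands on the two highest logits,
# even though every token technically has nonzero probability.
print(sorted(probs.items(), key=lambda kv: -kv[1]))
```

With these made-up numbers, the top two tokens together carry well over 95% of the mass, which is the sense in which the distribution is "peaked" rather than a broad survey of how sentences continue in general.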


> LLMs learn from examples where the logits are not probabilities, but how a given sentence continues (only one token is set to 1).

But enough data implies probabilities. Consider 2 sentences:

"For breakfast I had oats"

"For breakfast I had eggs"

Training on this data, how do you complete "For breakfast I had..."?

There is no best deterministic answer. The best answer is a 50/50 probability distribution over "oats" and "eggs"
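The 50/50 claim above can be checked numerically. This is a toy sketch, not a real training loop: two one-hot training targets ("oats" and "eggs"), a single pair of logits, and plain gradient descent on the averaged cross-entropy, whose gradient for softmax outputs is `p - q` with `q` the average target:

```python
import math

# Two training examples continue "For breakfast I had ..." with
# "oats" and "eggs" respectively. Each target is one-hot, yet minimizing
# the averaged cross-entropy drives the model toward a 50/50 split.
tokens = ["oats", "eggs"]
logits = [2.0, -1.0]  # arbitrary starting point

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

lr = 0.5
for _ in range(500):
    p = softmax(logits)
    # Gradient of softmax + cross-entropy averaged over both targets:
    # grad_i = p_i - mean(target_i) = p_i - 0.5
    logits = [z - lr * (pi - 0.5) for z, pi in zip(logits, p)]

print(softmax(logits))  # converges toward [0.5, 0.5]
```

Even though every individual training example is deterministic (one token set to 1), the loss minimizer over the whole dataset is the empirical distribution of continuations.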


So it is still largely probabilistic pattern matching?


You can model the whole universe with probabilities!

I don't think the difference between "they learn probabilities" and "they learn how they want a sentence to continue" is material. Seems like an implementation detail to me. In fact, you can add a temperature, set it to zero, and the model becomes deterministic, so no probabilities anywhere. The fact is, they learn from examples of sequences and are very good at finding patterns in those sequences, to the point that they "sound human".
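The temperature point above is easy to demonstrate. A minimal sketch, with made-up logit values; dividing the logits by a temperature before the softmax sharpens or flattens the distribution, and in the limit T → 0 sampling reduces to argmax, i.e. deterministic decoding:

```python
import math

logits = [2.0, 1.0, 0.2]  # hypothetical scores for three candidate tokens

def sample_dist(logits, temperature):
    """Softmax with temperature; low T concentrates mass on the argmax."""
    z = [l / temperature for l in logits]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

print(sample_dist(logits, 1.0))   # spread-out distribution
print(sample_dist(logits, 0.01))  # nearly all mass on the top logit
# As T -> 0 this tends to [1, 0, 0]: greedy, deterministic decoding.
```

So whether you call the learned object "a probability distribution" or "a preference over continuations" is largely a matter of the decoding knob you turn afterwards.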

But the point of my response was just that I find it extremely surprising how well an idea as simple as "find patterns in sequences" actually works for the purpose of sounding human, and I'm suspicious of anyone who pretends this isn't incredible. Can we agree on this?


I don't find anything surprising about that. What humans generally see of each other is little more than outer shells that are made out of sequenced linguistic patterns. They generally find that completely sufficient.

(All things considered, you may be right to be suspicious of me.)


Nah, to me you're just an average person on the internet. If the recent developments don't surprise you, I just chalk it up to lack of curiosity. I'm well aware that people like you exist, most people are like that in fact. My comment was referring to experts specifically.


>how well an idea as simple as "find patterns in sequences" actually works for the purpose of sounding human

What surprises me is the assumption that there's more than "find patterns in sequences" to "sounding human" i.e. to emitting human-like communication patterns. What else could there be to it? It's a tautology.

>If the recent developments don't surprise you, I just chalk it up to lack of curiosity.

Recent developments don't surprise me in the least. I am, however, curious enough to be absolutely terrified by them. For one, behind the human-shaped communication sequences there could previously be assumed to be an actual human.


Just for anyone reading this who isn't sure, much like an LLM this is confident-sounding nonsense.


I don't understand. Deterministic and stochastic have very specific meanings. The statement: "To continue my reply I could say this word, more than the others, or maybe that one, a bit less, ..." sounds very much like a probability distribution.


If you really want to think of it as a probability, think of it as "the probability of correctly expressing the sentence/idea that was modeled in the activations of the model for that token". Which is totally different from "the probability that this sentence continues in a given way", as the latter means "how this sentence continues in general", whereas the model picks tokens based on what it is modeling in the latent space.


That's not quite how auto-regressive models are trained (the expression of "ideas" bit). There is no notion of "ideas." Words are not defined the way we humans define them; they're only related to one another.

And on the latent space bit, it's also true for classical models, and the basic idea behind any pattern recognition or dimensionality reduction. That doesn't mean it's necessarily "getting the right idea."

Again, I don't want to "think of it as a probability." I'm saying what you're describing is a probability distribution. Do you have a citation for "probability to express correctly the sentence/idea" bit? Because just having a latent space is no implication of representing an idea.


There are some who would describe LLMs as next-word predictors, akin to having a bag of magnetic words, where you put your hand in, rummage around, pick a next word and put it on the fridge, and eventually form sentences. It's "just" predicting the next word, so as an analogy for how they work, that seems reasonable. The thing is, when that bag consists of a dozen bags-in-bags, like Russian nesting dolls, and the "bag" has a hundred million words in it, the analogy stops being a useful description. It's like describing humans as multicellular organisms. It's an accurate description of what a human is, but somewhere between a simple hydra with 100,000 cells and a human with around 37 trillion cells, intelligence arises. Describing humans as merely multicellular organisms and using the hydra as your point of reference isn't going to get you very far.


Sports teams? Who cares?


This is just one of the categories. There are other ones available, and tomorrow's won't be about sports :D


I will never understand responses like this.


Speaking of formatting, one thing I've always found weird is the amount of time and effort people devote to fonts. They discuss things like readability etc. They're all readable! It's not that interesting!


Sounds like you're looking for Iosevka!

https://github.com/be5invis/Iosevka


I switched to this, as the narrow version worked best on a 12" monitor.


I'm tired of the line "computer go brr". Is that the best explanation you can give?


There are valuable things out there other than "physical goods".


Well, how are we going to charge for them?

For example, how do we deal with Steam? Most of the games I buy there (Baldur's Gate 3, for example) are not made in the USA.


And the USA gets 30% (or 20%, depending on the studio) of the price you paid. Do you use Android, Windows, or Apple, or any licensed software?


> any physical USA goods

Do you see the word "physical" in my original comment? And me saying that I don't buy such things?

I will admit to buying Intel CPUs, but aren't they manufactured in the Far East?


I am from the EU, and the only physical item proudly wearing a "Made in USA" label I have seen over the years is the Urimat pad in public toilets.

P.S. Upon inspecting my memory, I think the other one was an HP cesium clock, but that was many years ago.


HP oscilloscopes were pretty good too, back in the day. But I haven't bought one for many years, and I guess you can get good, cheap LCD ones today for much less money.

And rotating rusty hard drives, I suppose. But those are all gone too.

And don't get me started on American urinals!


I responded specifically to your Steam comment, not the original one, because I did not think your original comment was wrong.


Intel has fabs in the US, Israel, and Asia. So it depends.


At this point, capital controls might be back on the menu?


The tariffs are only about physical goods though, so I'm not sure how this point is relevant?


Yes, it's called pulling the ladder up behind you. I don't think "he was a hacker" mitigates anything whatsoever.


Meh. People also invent justifications after the fact.

