
The only thing it shows is that the author has a lot to learn.



Of course I do, who doesn't? Can you pick 1 thing I need to focus on most so I can get some value out of this comment? :D


You can't "Demystify LLMs for people" if you don't know how they work.

And an LLM definitely isn't a Markov model (https://en.wikipedia.org/wiki/Hidden_Markov_model), as you implied it is.

As far as the "emergent" apparent intelligence of SoTA LLMs goes, it's not just counting the number of words that appear after a phrase. I don't think anyone has a good explanation of how it works so far; empirically, a bunch of techniques combined together seems to do the trick, and LLMs do seem to acquire intelligence far beyond what a Markov model could acquire from the training process.
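To make the contrast concrete, here's roughly what "counting the words that appear after a phrase" amounts to; a minimal word-level Markov sketch in TypeScript (illustrative only, not anyone's actual implementation):

  // Toy word-level Markov model: count which words follow each phrase.
  function buildCounts(words: string[], order: number): Map<string, Map<string, number>> {
    const counts = new Map<string, Map<string, number>>();
    for (let i = order; i < words.length; i++) {
      const phrase = words.slice(i - order, i).join(" ");
      const next = words[i];
      const row = counts.get(phrase) ?? new Map<string, number>();
      row.set(next, (row.get(next) ?? 0) + 1);
      counts.set(phrase, row);
    }
    return counts;
  }

  // "Prediction" is a table lookup weighted by observed counts.
  function nextWord(counts: Map<string, Map<string, number>>, phrase: string): string | undefined {
    const row = counts.get(phrase);
    if (!row) return undefined; // never-seen phrase: the model has nothing to say
    let total = 0;
    for (const n of row.values()) total += n;
    let r = Math.random() * total;
    for (const [word, n] of row) {
      r -= n;
      if (r <= 0) return word;
    }
    return undefined; // unreachable while counts are positive
  }

Note the failure mode: a phrase never seen in training has no row in the table, so the model is simply silent, whereas an LLM still produces a distribution over next tokens for contexts it has never seen.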

See, e.g., https://www.reddit.com/r/LocalLLaMA/comments/1bgh9h4/the_tru...

A joke, but it's the truth as we know it.

> Of course I do, who doesn't?

Implying you (who apparently just watched a single video) are as ignorant as people who have read extensively on the subject (not me, but presumably many others here) isn't helpful. I mean, there's nothing stopping you from believing you know enough to explain what LLMs are to other people, but the GP is honestly telling you what you need to hear IMHO.


I never said or implied it's a Markov model; someone else did in this thread, and someone parroted them. A Markov chain is a much looser concept than next-token prediction: it applies to any process where the next state depends only on the current state, like a finite state machine built with the React state-management library XState. It has nothing to do with language at all.
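To illustrate how general the idea is (a toy sketch; the weather states are made up for this example, and this is not XState's actual API):

  // Toy Markov chain over non-language states: the next state
  // depends only on the current one.
  type Chain<S extends string> = Record<S, Partial<Record<S, number>>>;

  const weather: Chain<"sunny" | "rainy"> = {
    sunny: { sunny: 0.8, rainy: 0.2 },
    rainy: { sunny: 0.4, rainy: 0.6 },
  };

  function step<S extends string>(chain: Chain<S>, current: S): S {
    let r = Math.random();
    for (const [state, p] of Object.entries(chain[current]) as [S, number][]) {
      r -= p;
      if (r <= 0) return state;
    }
    return current; // guard against floating-point rounding
  }

The point is just that "predict the next state from the current state" is a fully general idea; nothing about it is specific to words.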

I don't appreciate you calling me ignorant, but I think the source of your projection is this phrase: "I don't think anyone has a good explanation of how it works so far", which translates to: you don't know the basics. I get it, it makes you mad for some reason lol. But maybe it's you who needs to read up on what a language model actually is: what next-token prediction means, and how it applies to the LLMs you know and love. It really is not rocket science :)
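For anyone following along, this is the basic shape of next-token prediction (a schematic sketch only; the types and the greedy decoding loop are simplifications, not any specific model's interface):

  // Schematic only: a language model maps a token context to a
  // probability distribution over the whole vocabulary.
  type TokenId = number;
  type LanguageModel = (context: TokenId[]) => number[]; // probs[t] = P(next = t | context)

  // Greedy decoding: append the most probable token, repeat.
  function generate(model: LanguageModel, prompt: TokenId[], steps: number): TokenId[] {
    const out = [...prompt];
    for (let i = 0; i < steps; i++) {
      const probs = model(out);
      let best = 0;
      for (let t = 1; t < probs.length; t++) {
        if (probs[t] > probs[best]) best = t;
      }
      out.push(best);
    }
    return out;
  }

A real LLM computes that distribution with a large neural network and typically samples from it rather than always taking the argmax, but the interface really is that simple.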

Thanks for your comment!


hnfong was referring to emergent abilities, not to "the basics". The definition of emergent abilities, as they apply to language models, is that they are present in large models but not in smaller ones.

That definition also implicitly means that a model without "emergence" would not be large.

If you happen to know exactly why this happens, I'll definitely read your paper!


Prepare to be underwhelmed; no need for a paper for this one: the only emergent ability of an LLM is that the model is more robust when there are more samples. When the number of samples is in the trillions instead of the thousands, there are a lot more complete concepts available for it to match to your query.

The so-called "emergent ability" of a chat-focused LLM is its accuracy, which is only possible with enough sample data, and only if it's good at matching a query to the right samples and at wording a response in a way that's pleasing to the end user, something even the mainstream LLMs struggle to do well.

IME, pretty much only GPT-4 and Mistral are even that good. Most models aren't that good at anything yet, and half the time they don't follow what I ask at all. Being a very large model is only part of what makes one good.

The sad state of Hacker News is not how many people conflate "LLM" with "chatbot", but that an academic paper holds more weight than a library you can npm install right now and try. It seems we've lost our way if abstract appeals to authority hold more weight than the evidence before your eyes.

Thanks for your comment! I do appreciate the discussion.


Love doesn't always come in the form of flattery or praise, and if you took that as an attack or even an insult (which TBH it wasn't), rather than as someone pointing out the limits of your knowledge, then that's a shame.

I mean, it's pointless to compare intellectual capabilities on the internet since we don't know each other, but perhaps I was wrong to think, even for a second, that you might actually have wanted to "get some value" out of the "author has a lot to learn" comment.


What is love? Baby, don't hurt me Don't hurt me no more

xD

Typically, falsely accusing a person and then calling them ignorant is somewhat insulting! But now that you've said it was in the name of love, I'm tearing up (in a good way).

And you're right: what was I thinking? I'm in the presence of a literal genius here. THANK YOU for pointing out the "limits of my knowledge"; sometimes it takes a really smart person like you to do it, even though you got the wrong guy (Markov) and have yet to make a single point about LLMs or next-token prediction, beyond:

"Nobody knows how LLMs work" and a link to a Reddit meme. I can't be too far behind your advanced state of intellectual genius with zingers like these.

Finally, it may not seem like it, but I have derived value from your comments, just not any realizations you'd likely care to hear :)


^ This comment being downvoted, as innocent as it is, makes HN basically Reddit. The guy called me ignorant for sharing an auto-completion library trained on a lot of data; I dealt with the immature comment humorously, kindly really, and it got downvoted. lol, whatever /thumbsdown. Hacker News, get rid of these people and promote tinkering and intelligence again.


I would suggest first removing the reference to LLM in the description:

"build fast LLMs from scratch!"


Why? If I built a chatbot from it, would it seem more like an LLM to you?


You're cargo culting "LLM".


You don't know what an LLM is - there's a difference.


You should start from chapter 1.


You've been coming back to this post for 3 days to leave more negative comments. How sad.



