Hacker News
How do transformers work? (nintyzeros.substack.com)
134 points by willem19x 11 months ago | 39 comments



Is it just me, or does this article, with its high-level, vague phrases and anecdotes that skip the essence of the many clever tricks making transformers computationally efficient, actually make it harder to grasp how transformers "really work"?

I recommend Andrej Karpathy's videos on this topic. Well delivered, clearly explaining the main techniques, and providing a Python implementation.


There's also this type of article where the first half is easily understandable by a layman, but then it suddenly drops a lot of jargon and math formulas and you get completely lost.


A friend once described this kind of explanation by analogy with a recipe that went:

Recipe for buns: First you need flour, a fine-grained white powder produced from ground wheat, which can be acquired in exchange for money (a standardised convention for storing value) at a store, which contains many such products. When mixed with the raising agent and other ingredients, you should remove the buns from the oven when golden brown.


For this situation, when it feels worth it, I've been using ChatGPT Q&A on the jargon to bridge the gap. I haven't read this article all the way through yet, so I can't recommend it, but in many cases it's a super useful contextual jargon-clearer.


Agreed, I made my own Shakespeare babbler following Karpathy's videos. I have a decent understanding of the structure and process, but I don't really grasp how they work.

It's obvious how the error reduces, but I feel like there's something semantic going on that isn't directly expressed in the code.


I'm saving the latter half for tomorrow, but so far it's making sense. People have different learning styles, and I think this one is lacking in the visual department. Parts like the vectors displayed next to a word like "cat" could have been better annotated to show visually where those numbers come from.
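For what it's worth, those per-word numbers are usually just rows of a learned lookup table. A minimal NumPy sketch (the toy vocabulary and random table here are placeholders of my own, not the article's actual values; in a real model the table entries are trained parameters):

```python
import numpy as np

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
vocab = {"the": 0, "cat": 1, "sat": 2}

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))  # 3 tokens, 4 dimensions

def embed(word):
    # The "numbers next to the word" are just this row of the table.
    return embedding_table[vocab[word]]

vec = embed("cat")
print(vec.shape)  # (4,)
```

During training, gradients flow back into the rows of the table, which is why semantically similar words end up with similar vectors.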


Super Data Science had a nice episode on this recently.


Does this apply to both types of transformers? Decepticons and Autobots?


I would actually love to see an article on the design process for transforming toys like Transformers.

It seems like such a cool design feat to come up with those. At least I thought so as a kid...


Sign me up for that. I made my own out of Legos as a kid; I managed to make two that worked as both robots and vehicles, using just hinges and the small squares with rotating centers.


I guess I should just have googled it. First result:

https://interestingengineering.com/culture/the-design-proces...


There's more to them than meets the eye.


How do you determine the required permeability of the core material?


You have something against Maximals and Predacons?


No, I was just thinking of OG Transformers. Although some of the very new stuff isn't my thing.


Is it normal not to come away from these with any kind of intuition, like "ah, that's why transformers work so well"?

Instead I just come away feeling like it's a Frankenstein of different matrices and statistics.


If it looks like Frankenstein, and it often acts like Frankenstein...

These things can be, are, and will be useful, but they're not a magic ticket to knowing things. I booted up Llama 2 at home; my first question was "how do I make you smarter?", and all it returned were responses like "tell me a book or field to research". Obviously that doesn't work, and it never suggested importing libraries or running training cycles.

When I search a Linux problem on Brave, the AI gives me the right answer 75% of the time, which is nice, but the other 25% it's just regurgitating outdated Reddit threads. It has no cognition, no sense of consequences; it just pronounces the result of a statistical model, and it doesn't know what any of it means.


And we already have people zealously arguing that LLMs have cognition, just as predicted by sci-fi.


The reason they're good is that they're not recurrent: you can do attention over the whole sequence at once.

Recurrent models had two flaws: (1) the vanishing gradient problem, and (2) they're not parallelizable.
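A rough sketch of what "attention over the whole sequence at once" means in practice: every position attends to every other position via a couple of matrix multiplies, with no loop over time steps. (Random weights here, purely to show the shapes; this is the scaled dot-product form, without masking or multiple heads.)

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the whole sequence at once."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (T, T): all pairs of positions
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)      # softmax over positions
    return weights @ V                             # one matmul, no loop over t

rng = np.random.default_rng(0)
T, d = 5, 8                                        # sequence length, model width
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Contrast with an RNN, where computing position t requires the hidden state from position t-1, forcing a sequential loop.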


Wasn't there a better name that could have been used for these things?

One that wasn't already used for an extremely common electrical device, as well as a toy/movie franchise?


Articles like this pop up often on HN, but I don't think they can lead to genuine comprehension. I doubt people outside ML circles will dive into Mamba or analogous architectures if they supplant transformers; they're just harder to understand.


Aren't they aliens? So their technology for transforming between vehicle and robot would be somewhat alien to humans.


Even as a kid, I was scared to think about what happened to the humans inside the vehicles when they transformed. You'd always be one bad pivot away from being crushed to a pulp.


Transformers use magnetic fields to transform electricity.

That's why Tesla's AC system beat out Edison's DC system.


Sir, Teslas have batteries; those are DC.


If this article is of interest to you, this was a good discussion of a nice 3D visualization of a small LLM (nanoGPT) in motion:

LLM Visualization https://news.ycombinator.com/item?id=38505211 (https://bbycroft.net/llm)


Time for the weekly LLM explainer.


Writing a "how does X work" post is a good way to learn about X. And when X is very popular, lots of people will be writing and sharing those posts, both for their own study and just for clicks.


Hence the weekly LLM explainer at the top of the Hacker News feed.


How does that work?


Teach someone something, then reflect on the experience. You'll find you put additional effort into formalizing your own understanding. I always try to teach 2-3 people something right after I learn it.


Waiting for @dang to post a list


Why do we add the positional encoding to the original vector instead of appending it? Doesn't adding positional data to the embedding destroy information?


The hope is that the network will learn to undo the positional information and understand that it's an augmentation used to tell it where the word occurred in the sequence.

"Sequence" is a misleading term here, since the input to the network is actually a set of tokens, meaning there's no positional information without the extra signal added.
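For reference, the original sinusoidal scheme from "Attention Is All You Need" adds the encoding to the token embedding, which keeps the model width fixed (appending would grow every downstream weight matrix); the network is expected to learn to separate the two signals. A sketch (dimensions are arbitrary):

```python
import numpy as np

def sinusoidal_pe(T, d):
    """Sinusoidal positional encodings: sin on even dims, cos on odd dims."""
    pos = np.arange(T)[:, None]                  # positions 0..T-1
    i = np.arange(d // 2)[None, :]               # frequency index
    angles = pos / (10000 ** (2 * i / d))        # one frequency per pair of dims
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

T, d = 10, 16
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(T, d))

# Added elementwise, not concatenated: the result is still (T, d).
X = token_embeddings + sinusoidal_pe(T, d)
print(X.shape)  # (10, 16)
```

Whether addition "destroys" information is a fair question; the usual argument is that embeddings live in a high-dimensional space where the positional signal occupies a roughly separable subspace, so the network can learn to disentangle them.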


What's up with the excessive reliance on nested bullet points? It certainly doesn't look like a good blog post.


I want to see a "how do state space models work" paper.


Oh, that RNN-CNN duality is such a nice math trick: you can do recurrence in parallel.
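The duality in a nutshell: a linear recurrence x_t = a*x_{t-1} + b*u_t unrolls into a causal convolution with kernel k_t = a^t * b, so you can compute it step by step (RNN view) or all positions at once (CNN view). A scalar toy example (coefficients and inputs picked arbitrarily):

```python
import numpy as np

# Scalar linear state space model: x_t = a*x_{t-1} + b*u_t, output y_t = x_t.
a, b = 0.9, 1.0
u = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
T = len(u)

# RNN view: step through time sequentially.
x, y_rnn = 0.0, []
for t in range(T):
    x = a * x + b * u[t]
    y_rnn.append(x)
y_rnn = np.array(y_rnn)

# CNN view: the same output is a causal convolution of u
# with the kernel k_t = a**t * b, computable in parallel.
k = b * a ** np.arange(T)
y_cnn = np.array([np.sum(k[:t + 1][::-1] * u[:t + 1]) for t in range(T)])

print(np.allclose(y_rnn, y_cnn))  # True
```

The trick only works because the recurrence is linear in the state; that's the core idea behind training state space models (S4, Mamba's predecessors) in parallel while running them recurrently at inference time.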


No, they use two coils around a metallic core. I made a few when I was a kid, so I know.
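For the electrical kind, the governing relation is at least easy to state: for an ideal (lossless) transformer, the secondary voltage scales with the turns ratio, Vs/Vp = Ns/Np. A toy calculation (the function name and numbers are my own, purely illustrative):

```python
def secondary_voltage(v_primary, n_primary, n_secondary):
    """Ideal (lossless) transformer: Vs = Vp * Ns / Np."""
    return v_primary * n_secondary / n_primary

# e.g. stepping 120 V down with a 10:1 turns ratio
print(secondary_voltage(120.0, 100, 10))  # 12.0
```

Real transformers lose a few percent to core and winding losses, so this is an upper bound.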


No, they're a type that takes a monad as a type parameter and creates a new monad extending the original monad's capabilities. The downside is needing to use lift all the time to access the outer monad.



