
Similarly, consider a series like A Song of Ice and Fire. A human reader is still consciously aware of (and waiting for) the answers to questions raised in the very first book. This is millions of tokens ago, and that's if our brains turn off when not reading the books.

I think this highlights a hurdle on the path to more human-like AGI. We keep track of so much stuff for very long periods of time, albeit perhaps with some loss of fidelity.

My guess is that there will need to be an advancement or two before we can get an AI to read all of the ASIOF books so far and ask "What really happened at the Tower of Joy?"




> Similarly, consider a series like A Song of Ice and Fire. A human reader is still consciously aware of (and waiting for) the answers to questions raised in the very first book.

Some of them, some of the time. This is best comparable with ChatGPT having those books in its training dataset.

The context window is more like short-term memory. GPT-4 can fit[0] ~1.5 chapters of Game of Thrones; GPT-4-32k almost six. Leaving space for the prompt, questions, and replies, call it one chapter for GPT-4 and five for GPT-4-32k.

Can you imagine having a whole chapter in your working memory at once? Being simultaneously aware of every word, every space, every comma, every turn of phrase, every character and every plot line mentioned in it - and then being able to take it all into account when answering questions? Humans can do it for a paragraph, a stanza, maybe half a page. Not a whole chapter in a novel. Definitely not five. Not simultaneously at every level.

I feel that in this sense, LLMs have already surpassed our low-level capacity - though the comparison is a bit flawed, since our short-term memory also keeps track of sights, sounds, smells, time, emotions, etc. My point here isn't really to compare who has more space for short-term recall - it's to point out that answering questions about just-read text is another narrow, focused task which machines can now do better than us.

----

[0] - 298000 words in the book (via [1]), over 72 chapters (via [2]), gives us 4139 words per chapter. Multiplying by 4/3, we get 5519 tokens per chapter. GPT-4-8k can fit 1.45x that; GPT-4-32k can fit 5.8x that.

[1] - https://blog.fostergrant.co.uk/2017/08/03/word-counts-popula...

[2] - https://awoiaf.westeros.org/index.php/Chapters_Table_of_cont...
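The footnote's arithmetic can be checked directly. This sketch reuses the word and chapter counts from [1] and [2] above; the 4/3 words-to-tokens ratio is the same rough heuristic the comment uses, not an exact tokenizer count, and the 8k/32k context sizes are rounded.

```python
# Back-of-the-envelope check of the footnote's numbers.
WORDS_IN_BOOK = 298_000       # via [1]
CHAPTERS = 72                 # via [2]
TOKENS_PER_WORD = 4 / 3       # rough heuristic, not a real tokenizer

words_per_chapter = WORDS_IN_BOOK / CHAPTERS              # ~4139
tokens_per_chapter = words_per_chapter * TOKENS_PER_WORD  # ~5519

for name, context in [("GPT-4-8k", 8_000), ("GPT-4-32k", 32_000)]:
    print(f"{name}: {context / tokens_per_chapter:.2f} chapters")
# GPT-4-8k: 1.45 chapters
# GPT-4-32k: 5.80 chapters
```

Using the nominal 8192/32768 token limits instead nudges the figures up slightly (~1.48 and ~5.94), but the conclusion is the same.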


Just thinking about this, I realized that as a musician I do it all the time. I can recall lyrics, chords, instrumental parts and phrasing for hundreds if not thousands of pieces of music and "play them back" in my head. Unlike a training set, though, I can usually do that after listening to a piece only a few times, and also recall what I thought of each part of each piece, and how I preferred to treat each note or phrase each time I played it, which gives me more of a catalog of possible phrasings the next time I perform it. This is much easier for me than remembering exact words I've read in prose. I suspect the relationships between all those different dimensions are what make the memory more durable. I must also be creating intermediary dimensions and vectors to do that processing, because one side effect of it is that I associate colors with pitches.


If we are trying to at least match human level then all we have to do is summarize and store information for retrieval in the context window. Emphasis on summarize.

We pull out key points explicitly so they're not summarized, and for the rest (the less important parts) we summarize and store it.

That would very likely fit and it would probably yield equal to or better recall and understanding than humans.
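The idea above can be sketched as a tiered memory: key facts are kept verbatim, everything else is compressed, and a compact context is rebuilt per query. This is only an illustration under stated assumptions - `summarize` here is a trivial first-sentence stand-in for what would really be an LLM call, and `TieredMemory` and its budget are hypothetical names, not any existing API.

```python
# Sketch of summarize-and-store memory. All names are hypothetical;
# a real system would use an LLM for abstractive summaries.

def summarize(text: str) -> str:
    """Placeholder summarizer: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."

class TieredMemory:
    def __init__(self):
        self.key_points = []   # important facts, stored verbatim
        self.summaries = []    # less important parts, stored lossily

    def remember(self, text: str, important: bool = False):
        if important:
            self.key_points.append(text)
        else:
            self.summaries.append(summarize(text))

    def build_context(self, budget_chars: int = 2000) -> str:
        # Verbatim key points first; summaries fill the remaining budget.
        context, used = [], 0
        for part in self.key_points + self.summaries:
            if used + len(part) > budget_chars:
                break
            context.append(part)
            used += len(part)
        return "\n".join(context)

mem = TieredMemory()
mem.remember("Jon Snow's parentage is a secret.", important=True)
mem.remember("A long chapter about feasts. Many dishes were served.")
print(mem.build_context())
```

The point is just that a character or token budget much smaller than the full text can still carry the plot-critical facts at full fidelity while everything else degrades gracefully - roughly what human readers seem to do.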


Some loss to fidelity? Our memories are hugely lossy, reconstructed at recall based on a bunch of concepts. It's great, but it's also very lossy.


You got me.


ASIOF spoiler below!

> At the Tower of Joy, Ned Stark defeated three members of the Kingsguard and discovered his dying sister, Lyanna, who made him promise to protect her son, Jon Snow, whose true parentage remained a closely guarded secret.

Seems like ChatGPT-3 already knows, unless there's a deeper secret that I'm not deep enough into ASIOF fandom to know.


But this is because ASIOF was in the training dataset. ChatGPT wouldn't be able to say anything about the book if it weren't in its dataset, and you wouldn't have enough tokens to present the whole book to ChatGPT.


Thinking of it as "the training dataset" vs "the context window" is the wrong way of looking at it.

There's plenty of prior art for adaptation techniques that get new data into a trained model (fine-tuning, RLHF, etc.). There's no real reason to think there won't be more techniques that turn what we now think of as the context window into something that alters the weights in the model and is serialized back to disk.


It's a reasonable way to look at it, given that's how pretty much all deployed versions of LLMs work?


Exactly.

But also, not just ASIOF is in the training set, but presumably lots of discussion about it and all the interesting events in the book.




