
With the introduction of plugins, is it feasible to give ChatGPT some kind of long-term and short-term memory model?



OpenAI is actually thinking about this too. It's buried in their open-source repo, and it's not clear exactly how ChatGPT knows to make use of it. But evidently we're already there.

https://github.com/openai/chatgpt-retrieval-plugin#memory-fe...
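
The repo doesn't spell out the mechanism, but the underlying pattern is presumably the usual one: embed each saved snippet, keep it in a vector index, and pull back the nearest snippets at query time. A minimal Python sketch of that pattern (not the plugin's actual code; the embedding model and in-memory store here are my assumptions):

    import numpy as np
    import openai  # 2023-era SDK, where openai.Embedding.create exists

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    memory = []  # list of (vector, text) pairs -- the long-term store

    def remember(text):
        memory.append((embed(text), text))

    def recall(query, k=3):
        q = embed(query)
        # ada embeddings are unit-norm, so dot product == cosine similarity
        ranked = sorted(memory, key=lambda item: -np.dot(item[0], q))
        return [text for _, text in ranked[:k]]

The model "saves" a fact by triggering remember(...) through the plugin, and the top recall(...) hits get injected back into its prompt on later turns.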


LangChain is a great workaround for that. [1]

> how to work with a memory module that remembers things about specific entities. It extracts information on entities (using LLMs) and builds up its knowledge about that entity over time (also using LLMs).

[1] https://python.langchain.com/en/latest/modules/memory/types/...
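
From memory, the docs example boils down to roughly this (2023-era LangChain import paths, which may have moved since):

    from langchain.llms import OpenAI
    from langchain.chains import ConversationChain
    from langchain.memory import ConversationEntityMemory
    from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE

    llm = OpenAI(temperature=0)
    conversation = ConversationChain(
        llm=llm,
        prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE,
        # extracts entities with the LLM and accumulates facts about each one
        memory=ConversationEntityMemory(llm=llm),
    )

    conversation.predict(input="Deven and Sam are working on a hackathon project.")
    conversation.predict(input="What do you know about Deven?")  # answered from the entity store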


There are attempts via LangChain [0]. Depending on how much context is required, I could see a summary step where the history is compressed and used to carry forward progress (rough sketch below).

An alternative could be a vector store, injecting small snippets of relevant text as a step.

0 - https://python.langchain.com/en/latest/modules/memory/key_co...
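
A rough sketch of that summary step with the plain 2023-era OpenAI chat API (prompt wording is mine; a rolling summary stands in for the full transcript):

    import openai

    summary = ""  # compressed history carried forward between turns

    def chat(user_msg):
        global summary
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "Summary of the conversation so far: " + summary},
                {"role": "user", "content": user_msg},
            ],
        )
        answer = reply["choices"][0]["message"]["content"]

        # Fold the new exchange back into the running summary instead of
        # keeping the full transcript around.
        fold = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Update this summary:\n" + summary +
                           "\n\nwith this exchange:\nUser: " + user_msg +
                           "\nAssistant: " + answer,
            }],
        )
        summary = fold["choices"][0]["message"]["content"]
        return answer

The vector-store variant is the same idea with embeddings: store each exchange, then retrieve and inject only the snippets most relevant to the current question.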


Maybe you could give it a combination of both. We'll call it long short-term memory.


The reason I ask is that I feel a memory model is one of the major bottlenecks on the path to AGI.


On a more serious note, I do agree with you that memory and self-excitation seem like they are the last push that's needed to get to something more akin to "AGI". But I don't think that Rubicon will be crossed with plugins.


> I do agree with you that memory and self-excitation seem like they are the last push that's needed to get to something more akin to "AGI"

"We show that transformer-based large language models are computationally universal when augmented with an external memory. Any deterministic language model that conditions on strings of bounded length is equivalent to a finite automaton, hence computationally limited. However, augmenting such models with a read-write memory creates the possibility of processing arbitrarily large inputs and, potentially, simulating any algorithm."

From "Memory Augmented Large Language Models are Computationally Universal"

https://deepai.org/publication/memory-augmented-large-langua...
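
A toy illustration of the read/write loop that result relies on: the model emits simple commands, a driver executes them against an external store, and the result is fed back as the next prompt. The command syntax here is made up, not the paper's:

    import re

    memory = {}  # the external read/write store (maps slot name -> value)

    def step(model_output):
        """Run one model-emitted command and return what gets fed back in."""
        write = re.match(r"WRITE (\w+) (.*)", model_output)
        if write:
            memory[write.group(1)] = write.group(2)
            return "ok"
        read = re.match(r"READ (\w+)", model_output)
        if read:
            return memory.get(read.group(1), "")
        return model_output  # anything else is treated as a final answer

    # The model only ever sees a bounded prompt, but looping
    # prompt -> command -> memory -> next prompt lets it carry state
    # across arbitrarily many steps.
    step("WRITE counter 41")
    step("WRITE counter 42")
    print(step("READ counter"))  # -> 42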


Why? Short- and long-term memory is really easy to do. Even my own basic assistant has it (running on a fine-tuned Curie model).


I suspect with a 'window' of 32k tokens, OpenAI has already done similar memory tricks.

I suspect that if you filled the context window with "1 1 1 1 1 1 1 1 1 1", and then asked "How many 1's did I just show you?", it probably wouldn't know, simply because whatever tricks they use to have such an apparently large context window don't allow it to 'see' all of it at any given moment.
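
That's easy enough to probe, something along these lines (the model name and token count are guesses; I haven't run this against the 32k model):

    import openai

    # "1 " repeated tokenizes to roughly one token each, so this is ~15k tokens of filler
    prompt = "1 " * 15000 + "\nHow many 1's did I just show you? Answer with a number only."

    resp = openai.ChatCompletion.create(
        model="gpt-4-32k",  # assumed name for the 32k-window model
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp["choices"][0]["message"]["content"])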


Ah, so you think the 32k context window works differently than, e.g., the 4k davinci context window? They didn't just increase ${hyperparam}?


Training compute goes up with approximately the 3rd power of the window size.

So turning a 4k window to a 32k window means a 512x increase in compute they'd need (just to maintain similar output quality).
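
Spelled out, under that cubic assumption:

    old_window, new_window = 4_000, 32_000
    scale = new_window / old_window  # 8x longer window
    print(scale ** 3)                # 512.0 -- hence the ~512x figure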

I suspect they must have found a better solution to be able to scale the window so big. They haven't announced what it is.


Very interesting, thanks



