One of the worst problems with LLMs at this point in time is keeping them up to date.
For instance, ChatGPT should be able to talk about the 1986 Super Bowl when the Chicago Bears trounced the New England Patriots (I remember it well because I grew up in New England!), but I couldn't expect it to have anything to say about the (other kind of football) game I saw yesterday where West Ham beat Brighton, because nothing about the latter game is in the training set.
This problem just gets worse as time passes and the world continues to change. Bing's chatbot works around it for my soccer example by running a conventional search query and then having the LLM summarize the results, which produced a pretty good summary of the game. But when I asked pointed questions about this particular game, such as "Who had the most possession?" (relevant because possession was heavily lopsided toward the losing team), it fell down. It seemed to be working off structured statistics that didn't include this data rather than off media reports of the game, which surely would have noticed it.
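A minimal sketch of that retrieve-then-summarize pattern, with hypothetical `search_news` and `complete` functions standing in for a real search API and a real LLM call (this is an assumption about the general shape of such a pipeline, not Bing's actual implementation):

```python
# Sketch of the retrieve-then-summarize workaround for stale training data.
# `search_news` and `complete` are hypothetical stand-ins for a real search
# API and a real LLM completion call.

def search_news(query: str, k: int = 5) -> list[str]:
    """Stand-in: return the top-k article snippets for a query."""
    return [f"[snippet {i} for: {query}]" for i in range(k)]

def complete(prompt: str) -> str:
    """Stand-in: call an LLM and return its completion."""
    return "[LLM answer grounded in the prompt above]"

def answer_with_retrieval(question: str) -> str:
    snippets = search_news(question)
    # The LLM only sees facts fetched at query time, so its training
    # cutoff no longer matters for this question.
    prompt = (
        "Answer the question using only the sources below.\n\n"
        + "\n".join(f"- {s}" for s in snippets)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return complete(prompt)

print(answer_with_retrieval("Who had the most possession in West Ham vs Brighton?"))
```

Whether answers like "most possession" come out right then depends entirely on what the retrieval step surfaces, which matches the failure I saw.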
With current technology they will eventually need to retrain the whole thing from scratch, which will (1) be crazy expensive and (2) invalidate all the document vectors people have computed with the old model, which is a big problem for anybody using systems like LangChain or doing embedding-based similarity search.
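To see why the stored vectors break, here is a toy sketch with made-up numbers: embeddings produced by two different model versions live in unrelated vector spaces, so similarity scores computed across versions are meaningless and the whole corpus has to be re-embedded.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example: pretend these came from "embedding-model-v1" and "-v2".
# The numbers are made up; the point is that the two models place the
# same document at unrelated coordinates.
doc_vec_v1 = np.array([0.9, 0.1, 0.0])     # stored in the vector DB last year
doc_vec_v2 = np.array([0.0, 0.3, 0.95])    # what the retrained model produces
query_vec_v2 = np.array([0.1, 0.2, 0.97])  # today's query, embedded with v2

# Comparing a v2 query against a stale v1 document vector gives a meaningless
# score, so every stored document vector must be recomputed after a retrain.
print("v2 query vs stale v1 doc   :", cosine(query_vec_v2, doc_vec_v1))
print("v2 query vs re-embedded doc:", cosine(query_vec_v2, doc_vec_v2))
```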
There's a real need for some way to update an LLM incrementally without wrecking its performance, and this kind of research [1] points to one path to that.
The most promising work along these lines centers on augmenting LLMs with an external data store ("retrieval-augmented LLMs"). I think this started with Facebook's kNN-LM ( https://arxiv.org/pdf/1911.00172.pdf ). Legal conflict may also force the industry to move toward vector DBs as the predominant place where facts are "stored" rather than in model parameters ( https://arxiv.org/pdf/2308.04430.pdf ), with the happy side effect of updateability over time.
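The core kNN-LM idea is roughly: keep a datastore of (context vector, next token) pairs, look up the k nearest stored contexts at inference time, and interpolate the neighbor-derived distribution with the LM's own prediction. Here is a toy numpy sketch of that interpolation (not the paper's code; the vocabulary, vectors, and lambda value are made up for illustration):

```python
import numpy as np

VOCAB = ["London", "Paris", "Brighton"]  # toy vocabulary
LAMBDA = 0.25  # interpolation weight; the paper tunes this on validation data

# Datastore built offline: one (context embedding, next-token id) pair per
# training token. Updating the model's knowledge = appending new pairs,
# no retraining required.
keys = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
values = np.array([1, 1, 2])  # "Paris", "Paris", "Brighton"

def knn_distribution(query: np.ndarray, k: int = 2) -> np.ndarray:
    """Turn the k nearest stored contexts into a next-token distribution."""
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest])
    weights /= weights.sum()
    p = np.zeros(len(VOCAB))
    for idx, w in zip(nearest, weights):
        p[values[idx]] += w
    return p

# The base LM's own next-token distribution for this context (made up),
# and the hidden state for the current prefix.
p_lm = np.array([0.5, 0.3, 0.2])
query_context = np.array([0.85, 0.15])

# kNN-LM interpolation: p = lambda * p_knn + (1 - lambda) * p_lm
p_knn = knn_distribution(query_context)
p_final = LAMBDA * p_knn + (1 - LAMBDA) * p_lm
print(dict(zip(VOCAB, p_final.round(3))))
```

The appeal for the staleness problem is that the datastore is just a big approximate-nearest-neighbor index, so adding yesterday's match reports means adding rows, not gradient updates.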
[1] https://arxiv.org/abs/2202.05262