PMET: Precise Model Editing in a Transformer (arxiv.org)
119 points by PaulHoule on Aug 27, 2023 | 13 comments



FYI, Meng et al. (2022) [1] is pretty much required reading in order to understand this paper.

[1] https://arxiv.org/abs/2202.05262


Yannic did a great interview with the authors some time ago https://youtu.be/_NMQyOu2HTo


This may drop the cost and significantly increase the feasibility of government- or court-mandated changes, censorship, or edits to models.



The PRC would doubtless have an interest in precisely removing all knowledge of certain historical facts from LLMs within China.


That's just one application.

One of the worst problems of LLMs at this point in time is keeping them updated.

For instance, ChatGPT should be able to talk about the Super Bowl in 1984 when the Chicago Bears trounced the New England Patriots (I remember it well because I grew up in New England!), but I couldn't expect it to have anything to say about the (other kind of football) game I saw yesterday where West Ham beat Brighton, because nothing about the latter game is in the training set.

This problem just gets worse as time passes and the world continues to change. Bing's chatbot works around this, for my soccer example, by running a conventional query and then having the LLM summarize the results, which gave a pretty good summary of the game. But when I asked it pointed questions about this particular game, such as "Who had the most possession?" (relevant because possession was really lopsided in the direction of the losing team), it fell down. It seemed to be working off structured statistics that didn't have this data, as opposed to media reports of the game, which surely would have noticed it.

With current technology they will need to rebuild the whole thing one day, which will (1) be crazy expensive and (2) break all the document vectors that people have saved from the model, which will be a big problem for anybody using systems like LangChain or doing embedding-based similarity search.

There's a lot of need for some way to update an LLM incrementally without wrecking its performance, and this kind of research points to one path toward that.


The most promising work along these lines centers on augmenting LLMs with an external data store ("retrieval-augmented LLMs"). I think this started with Facebook's kNN-LM ( https://arxiv.org/pdf/1911.00172.pdf ). Legal conflict may force the industry to move toward vector DBs as the predominant place where facts are "stored," rather than model parameters ( https://arxiv.org/pdf/2308.04430.pdf ), with the happy side effect of updateability over time.
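
For anyone who hasn't read the kNN-LM paper, the core idea is simple enough to sketch: at each step you look up the current context vector in a datastore of (context vector, next token) pairs and interpolate the resulting distribution with the base LM's. A rough, illustrative Python sketch follows; the function and parameter names, the interpolation weight, and the distance-to-weight scheme are mine, not the paper's code:

    # Rough sketch of the kNN-LM idea: mix the base LM's next-token
    # distribution with one built from the k nearest stored contexts.
    import numpy as np

    def knn_lm_next_token_probs(context_vec, p_lm, datastore_keys,
                                datastore_tokens, vocab_size,
                                k=8, lam=0.25, temperature=1.0):
        # Find the k stored contexts closest to the current one (L2 distance).
        dists = np.linalg.norm(datastore_keys - context_vec, axis=1)
        nearest = np.argsort(dists)[:k]

        # Turn distances into a distribution over the tokens that followed them.
        weights = np.exp(-dists[nearest] / temperature)
        weights /= weights.sum()
        p_knn = np.zeros(vocab_size)
        for w, idx in zip(weights, nearest):
            p_knn[datastore_tokens[idx]] += w

        # Interpolate: facts live in the datastore, fluency in the LM.
        return lam * p_knn + (1 - lam) * p_lm

The appeal for the updating problem above is that adding or removing facts only touches the datastore, not the model weights.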



Crap, I got the year wrong... it was 1986.

https://en.wikipedia.org/wiki/Super_Bowl_XX


How do you save a document vector and do similarity search with it?


There is this:

https://github.com/openai/chatgpt-retrieval-plugin

I just use SBERT, which has models I can run locally:

https://sbert.net/
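
For what it's worth, the local route is only a few lines. A minimal sketch, where the model name is just one commonly used small checkpoint, not a recommendation:

    # Minimal local embedding with sentence-transformers; "all-MiniLM-L6-v2"
    # is an illustrative small checkpoint that runs fine on CPU.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    vecs = model.encode(["West Ham beat Brighton yesterday.",
                         "The Bears won Super Bowl XX."])
    print(vecs.shape)  # (2, 384) for this checkpoint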


You encode your document with some kind of embedding, e.g. HuggingFace Sentence Transformers: https://www.sbert.net/ (probably the most commonly used) or OpenAI Embeddings: https://platform.openai.com/docs/guides/embeddings/what-are-..., and then use a vector database (Elastic, Postgres, FAISS, or whatever) to do a similarity search.
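
As a concrete (hedged) example of that flow, here's a sketch using SBERT for the embeddings and FAISS as the store; with normalized vectors, inner product is cosine similarity. The documents and query are just placeholders:

    # Sketch: embed documents, index them in FAISS, retrieve by cosine
    # similarity (inner product over normalized vectors).
    import numpy as np
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    docs = [
        "West Ham beat Brighton in yesterday's match.",
        "The Chicago Bears won Super Bowl XX in January 1986.",
    ]
    doc_vecs = np.asarray(model.encode(docs, normalize_embeddings=True),
                          dtype="float32")

    index = faiss.IndexFlatIP(doc_vecs.shape[1])  # exact inner-product search
    index.add(doc_vecs)

    query = np.asarray(model.encode(["Who won the Super Bowl?"],
                                    normalize_embeddings=True), dtype="float32")
    scores, ids = index.search(query, 1)
    print(docs[ids[0][0]], float(scores[0][0]))

Saving doc_vecs (and the mapping back to documents) is what "saving a document vector" amounts to; any of the stores mentioned above works the same way in principle.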


They could just use it without publishing the paper… I wonder what the reason could be…



