Wouldn't this approach be quite brittle? For example, where would one define sni...

Ozzie_osman · on Feb 26, 2023

It works surprisingly well, and you can see examples in the documentation for GPT-Index or LangChain (both are libraries designed to enable these types of use cases, among others). You can also get fancy. For instance, you can have GPT-3 (or any LLM) create multiple "layers" of snippets (snippets of the actual text, then summaries of each section, then summaries of each chapter), embed all of them, and pull in the relevant pieces. Or you can go back and forth with the model over multiple prompts to give or get more information.
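Here's a minimal sketch of the multi-layer idea in Python. It assumes a toy bag-of-words vector in place of a real embedding model; in practice you'd call an embedding API, and the summaries would be generated by the LLM rather than written by hand. `embed`, `build_index` and `retrieve` are hypothetical helpers, not GPT-Index or LangChain APIs. Chunks and summaries from every layer go into one flat index, and retrieval returns whichever snippets are closest to the query, regardless of layer.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (an assumption): bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Snippet:
    layer: str       # e.g. "chunk", "section_summary", "chapter_summary"
    text: str
    vector: Counter


def build_index(layers: dict[str, list[str]]) -> list[Snippet]:
    """Embed every snippet from every layer into one flat index."""
    return [
        Snippet(layer, text, embed(text))
        for layer, texts in layers.items()
        for text in texts
    ]


def retrieve(index: list[Snippet], query: str, k: int = 2) -> list[Snippet]:
    """Return the k snippets most similar to the query, from any layer."""
    q = embed(query)
    return sorted(index, key=lambda s: cosine(q, s.vector), reverse=True)[:k]


# Hand-written example layers; in practice the summaries would come from the LLM.
layers = {
    "chunk": ["The invoice total was 42 dollars.", "Shipping took five days."],
    "section_summary": ["This section covers billing and the invoice total."],
    "chapter_summary": ["The chapter describes order fulfilment."],
}
index = build_index(layers)
for s in retrieve(index, "what was the invoice total", k=2):
    print(s.layer, "|", s.text)
```

This prints the raw chunk first and the section summary second, since those two share the most words with the query. Putting every layer into a single index is just the simplest option here; the retrieved snippets are what you'd paste into the prompt.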

I'm sure the techniques will evolve over time, but for now, these sorts of patterns (pre-index, then augment the prompt at query time) seem to work best for feeding the model information or context it doesn't already know about. The other broad family of techniques tries to train the model on your custom information ("fine-tuning", etc.), but I think most practitioners would agree that this is currently less effective for these sorts of use cases. (Disclaimer: I'm not an expert by any means, but I've played around with both techniques and try to keep up to date on what the experts are saying.)
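Roughly what "pre-index, then augment the prompt at query time" looks like, as a Python sketch. Word overlap stands in for embedding similarity, and the prompt wording, `k`, and the `max_chars` budget are illustrative assumptions. `pre_index` and `augment_prompt` are hypothetical helpers. The returned string is what you'd then send to whichever LLM completion API you use; that call is left out.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding (an assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pre_index(documents: list[str]) -> list[tuple[str, set[str]]]:
    """Index time: store each snippet alongside its token set."""
    return [(doc, tokens(doc)) for doc in documents]


def augment_prompt(index: list[tuple[str, set[str]]], question: str,
                   k: int = 2, max_chars: int = 500) -> str:
    """Query time: take the k snippets sharing the most words with the question
    and paste them into the prompt, stopping before the character budget is exceeded."""
    q = tokens(question)
    ranked = sorted(index, key=lambda item: len(q & item[1]), reverse=True)
    context, used = [], 0
    for text, _ in ranked[:k]:
        if used + len(text) > max_chars:
            break
        context.append(f"- {text}")
        used += len(text)
    return (
        "Answer the question using only the context below.\n"
        "Context:\n" + "\n".join(context) + "\n\n"
        f"Question: {question}\nAnswer:"
    )


index = pre_index([
    "Our refund window is 30 days from delivery.",
    "Support is available Monday to Friday.",
    "Refunds go back to the original payment method.",
])
print(augment_prompt(index, "What is the refund window?", k=1))
```

With `k=1`, only the refund-window snippet ends up in the prompt. The character budget is a rough proxy for the model's context limit; a real setup would count tokens instead.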

umaar · on Feb 26, 2023

Excited to see what comes of it. Lots of people will have a private corpus, and the idea that we can semantically query it sounds so interesting.

Like asking 'What streaming services am I paying for, and how much have I spent on them to date?', and having some tool go over your bank statements to pick out Spotify, Netflix, etc. I could see that being useful.

https://simonwillison.net/2023/Jan/13/semantic-search-answer...