LangChain is awesome. For people not sure what it's doing: large language models (LLMs) are very powerful, but they're very general. As a common example of this limitation, imagine you want your LLM to answer questions over a large corpus.
You can't pass the entire corpus into the prompt. So you might:
- preprocess the corpus by iterating over documents, splitting them into chunks, and summarizing them
- embed those chunks/summaries in some vector space
- when you get a question, search your vector space for similar chunks
- pass those chunks to the LLM in the prompt, along with your question
This ends up being a very common pattern, where you need to do some preprocessing of some information, some real-time collecting of pieces, and then an interaction with the LLM (in some cases, you might go back and forth with the LLM). For instance, code and semantic search follows a similar pattern (preprocess -> embed -> nearest-neighbors at query time -> LLM).
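A minimal sketch of that pattern, in case it helps make it concrete. Everything here is an assumption for illustration: the bag-of-words `embed` stands in for a real embedding model, the helper names are made up, and the final LLM call is left out. This is not LangChain's API.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Preprocessing step: chunk every document and embed each chunk."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]


def top_k(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Query-time step: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(chunks: list[str], question: str) -> str:
    """Final step: put the retrieved chunks and the question into one prompt."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The string returned by `build_prompt` is what you would send to the LLM; swapping in a real embedding model and a vector store changes `embed` and `top_k` but not the overall shape.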
LangChain provides a great abstraction for composing these pieces. IMO, this sort of "prompt plumbing" is far more important than all the slick (but somewhat gimmicky) "prompt engineering" examples we see.
I suspect this will get more important as the LLMs become more powerful and more integrated, requiring more data to be provided at prompt time.
Also, another library to check out is GPT Index (https://github.com/jerryjliu/gpt_index) which takes a more "data structure" approach (and actually uses LangChain for some stuff under the hood).
Thank you for the link. I have a project with plenty of structured data from which I create summaries and short articles. Currently I'm generating prompts with context inside the prompt. GPT Index seems to be pretty well suited to that use case.
This is great! I love seeing how rapidly in the past 6 months these ideas are evolving. I've been internally calling these systems "prompt machines". I'm a strong believer that chaining together language model prompts is core to extracting real, and reproducible value from language models. I sometimes even wonder if systems like this are the path to AGI as well, and spent a full month 'stuck' on that hypothesis in October.
Specific to prompt-chaining: I've spent a lot of time ideating about where "prompts live" (are they best as API endpoints, as CLI programs, as machines with internal state, or treated as a single 'assembly instruction'? Where do "prompts" live naturally?) and eventually decided that they are most synonymous with functions (and API endpoints via the RPC concept).
mental model I've developed (sharing in case it resonates with anyone else)
a "chain" is `a = 'text'; b = p1(a); c = p2(b)` where p1 and p2 are LLM prompts.
What comes next (in my opinion) is other programming constructs: loops, conditionals, variables (memory), etc. (I think LangChain represents some of these concepts as their "areas" -> chain (function chaining), agents (loops), memory (variables))
To offer this code-style interface on top of LLMs, I made something similar to LangChain, but scoped what i made to only focus on the bare functional interface and the concept of a "prompt function", and leave the power of the "execution flow" up to the language interpreter itself (in this case python) so the user can make anything with it.
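A rough sketch of the "prompts are functions" idea, for anyone who wants to see it spelled out. `make_prompt`, `fake_llm`, and `refine` are hypothetical names invented for this example (this is not lambdaprompt's or LangChain's API), and `fake_llm` just upper-cases its input so the snippet runs without a model.

```python
from typing import Callable

LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real completion call (assumption): returns the prompt upper-cased."""
    return prompt.upper()


def make_prompt(template: str, llm: LLM = fake_llm) -> Callable[[str], str]:
    """Turn a template with a {text} slot into an ordinary Python function."""
    def prompt_fn(text: str) -> str:
        return llm(template.format(text=text))
    return prompt_fn


p1 = make_prompt("Summarize: {text}")
p2 = make_prompt("Translate to French: {text}")

# A chain is plain function composition.
a = "text"
b = p1(a)
c = p2(b)


def refine(text: str, step: Callable[[str], str],
           is_done: Callable[[str], bool], max_iters: int = 3) -> str:
    """Loop plus conditional: apply a prompt function until a check passes or we give up."""
    for _ in range(max_iters):
        if is_done(text):
            break
        text = step(text)
    return text
```

Because `p1` and `p2` are just functions, loops, conditionals, and variables come from Python itself rather than from a framework.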
I've had so much fun recently just playing with prompt chaining in general, it feels like the "new toy" in the AI space (orders of magnitude more fun than dall-e or chat-gpt for me). (I built sketch (posted the other day on HN) based on lambdaprompt)
My favorites have been things to test the inherent behaviors of language models using iterated prompts. I spent some time looking for "fractal" like behavior inside the functions, hoping that if I got the right starting point, an iterated function would avoid fixed points --> this has eluded me so far, so if anyone finds non-fixed points in LLMs, please let me know!
I'm a believer that the "next revolution" in machine-written code and behavior from LLMs will come when someone can tame LLM prompting to self-write prompt chains themselves (whether that is on lambdaprompt, langchain, or something else!)
All in all, I'm super hyped about LangChain, love the space they are in and the rapid attention they are getting~
- Iterate on a prompt with another "discriminator" prompt that determines whether the result is good / safe
- Write N-trials of an answer, then use another prompt to select the best answer
- When doing code writing (SQL or Pandas), write the output, then use a parser (e.g. `ast` in Python) to check that the code is valid; if not, feed it back into a prompt for fixes
- Logical negation checks (ask whether X and whether ~X; if the model gives opposite answers, it's likely consistent; if it answers "affirmative" to both (as the models tend to bias towards), it's definitely hallucinating)
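Two of the ideas in this list, sketched under assumptions: the `llm` argument is any callable that takes a prompt string and returns text, and the prompt wording and function names are invented for illustration. The first function is the parse-and-retry loop using Python's `ast`; the second is the negation check.

```python
import ast
from typing import Callable

LLM = Callable[[str], str]


def generate_valid_code(task: str, llm: LLM, max_attempts: int = 3) -> str:
    """Ask for code, parse it with ast, and feed syntax errors back into the prompt."""
    prompt = f"Write Python code for: {task}"
    for _ in range(max_attempts):
        code = llm(prompt)
        try:
            ast.parse(code)  # checks syntax only, not behaviour
            return code
        except SyntaxError as err:
            prompt = (
                f"This code failed to parse ({err.msg}):\n{code}\n"
                f"Fix it. Original task: {task}"
            )
    raise ValueError(f"no syntactically valid code after {max_attempts} attempts")


def negation_consistent(claim: str, llm: LLM) -> bool:
    """Ask about the claim and its negation; consistent answers should disagree."""
    says_true = llm(f"Is the following true? Answer yes or no.\n{claim}")
    says_false = llm(f"Is the following false? Answer yes or no.\n{claim}")
    yes_to_true = says_true.strip().lower().startswith("yes")
    yes_to_false = says_false.strip().lower().startswith("yes")
    return yes_to_true != yes_to_false
```

Note that `ast.parse` only validates syntax, so code that passes can still be wrong; it is a cheap first filter, not a correctness check.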
Other 'product' ideas i've tried:
- A chat style interface (I made a chat-bot last year, similar to chatGPT)
- A "google-this-for-me" style chain, that checks google, summarizes multiple results, then synthesizes a final result
Ideas I've been sitting on, that I think would be fun to prototype:
- An iterative "large document" editor: storing global intent, instructions, outline, and the raw text, where each iteration of the prompt works on these objects to build a large document.
- A "research this topic for me", similar to the above, but include the google searching, summarizing, and such
- A code-repository "AI agent" that takes `Issues` and `Pull requests` as input, and writes and edits code for you, and by adding feedback in github, it uses that to modify the branch and act as a developer. (Code via github interface, rather than an IDE)
This resonates with me. I've been slowly working on a similar project; instead of Python functions I'm doing API creation. I have a website that allows you to create templated APIs for each prompt.
The next step for me is the workflow composition part. Instead of a functional model of Python functions, I'm going to try to compose workflows with AWS Step Functions, where each step calls a particular one of the templated API endpoints.
LangChain is very cool. Templates and composability are the way forward.
I'm guessing that most folks haven't created composable prompts, so a few experiences on why this stuff is necessary. I've been building a language learning app and it makes heavy use of GPT-3 [1], which has made clear a number of things for me:
1) Composability is fundamental to leveraging language models. You can't get language models to just generate things in one go. It's rather like a human. If you're writing a book, you start with an idea, then an outline, then a chapter outline, then writing paragraphs... Or in our case, generate a flashcard deck description, then vocab, then example sentences, etc.
2) Externalized prompt templates are also important. Engineers need to be able to interface with experts who can create custom prompts. E.g. in our case, I need experts who speak other languages to build prompts specific to those languages for a language learning app [2].
3) Unit testing is critical. There is no linter for a prompt. I have made so many typos over the last year that broke things. OpenAI has released several new models over the last 2 years. Anthropic is coming out with a model. You need to have assurances your prompts work. I've actually had to start building a basic unit tester for our prompts because of this... [3] (please someone else do this so I don't have to)
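To illustrate the kind of check meant in point 3, here is a minimal sketch of a template "linter" that catches typos in placeholders before a prompt ever reaches a model. It assumes templates use Python's `str.format` syntax; `template_fields` and `check_template` are hypothetical helpers, not part of any library mentioned here.

```python
import string


def template_fields(template: str) -> set[str]:
    """Return the named placeholders in a str.format-style template."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def check_template(template: str, expected_fields: set[str]) -> list[str]:
    """Compare a template's placeholders with the fields the calling code will supply."""
    try:
        found = template_fields(template)
    except ValueError as err:  # e.g. an unbalanced brace
        return [f"malformed template: {err}"]
    problems = []
    for missing in sorted(expected_fields - found):
        problems.append(f"missing placeholder: {{{missing}}}")
    for extra in sorted(found - expected_fields):
        problems.append(f"unknown placeholder: {{{extra}}}")
    return problems
```

An empty list means the template and the calling code agree on field names. This only covers static mistakes; checking that a prompt still behaves well on a new model needs tests that actually call the model.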
4) External data is the next step forward for large language models. E.g. in my case, someone may want to learn about the history/culture of a country and we may want to reference existing articles on it since LLMs are known to hallucinate. We need to be able to interface with the web and databases easily. I'm not convinced that LangChain is the right layer of abstraction for this. I suspect/hope that the next version of GPT will have some significant advances in this regard.
So much effort and processing power going into narrowing down and coercing a vastly general thing, makes me wonder at what point it stops being worth it to use the LLM.
It’s a lot of work and it’s messy, but the multiplier on productivity is too large to ignore. At some point all this will get easier, but for now it’s do or die.
After the dust settles, what is it really multiplying? Especially accounting for the energy/gpus it takes. At the very least, why not generate all the cards once, and edit as needed?
Temperature = 0 solves a lot of it, but not all. A list of all options solves a portion of the remaining. Not sure what to do about the rest - maybe ask the LLM?
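One way to read "temperature = 0 plus a list of all options", sketched under assumptions: the `llm` callable and its `temperature` keyword are hypothetical stand-ins for whatever completion API you use, and `choose_option` is a made-up helper. The answer is accepted only if it matches one of the listed options; otherwise the prompt is retried with a correction.

```python
from typing import Callable, Optional


def choose_option(question: str, options: list[str],
                  llm: Callable[..., str], max_attempts: int = 2) -> Optional[str]:
    """Ask for one of a fixed set of options and reject anything else."""
    normalized = {option.lower(): option for option in options}
    prompt = f"{question}\nAnswer with exactly one of: {', '.join(options)}"
    for _ in range(max_attempts):
        # temperature=0.0 is passed to the hypothetical llm to reduce run-to-run variation
        answer = llm(prompt, temperature=0.0).strip().lower()
        if answer in normalized:
            return normalized[answer]
        prompt += f"\n'{answer}' is not a valid option. Choose from the list."
    return None  # caller decides what to do with the remaining cases
```

Returning `None` makes "the rest" explicit, so those cases can be logged or sent to a human instead of silently passing through.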
By far the most interesting aspect of this, for me, is that we're now seeing tools for building software infrastructure with layers of APIs that operate on -- gasp! -- natural language, which is notoriously prone to imprecision and ambiguity. And yet it works remarkably well. It's hard not to look at all this, mouth agape, in awe.
Part of me wonders, though:
Wouldn't it be better if we could compose LLMs by passing sequences of embeddings (e.g., in a standardized high-dimensional space), which are much richer representations of LLM input, internal, and output states?
I think embeddings are actually low-'resolution' representations. Like GPT's ability to parse and calculate the structure of the sentence "Hello, how are you?" is not represented in the embedding for the sentence. The embedding is a 1-dimensional vector and inside the model it interacts with other texts in 10,000+ dimensions
I wonder if there'd be any use for a "ontological" representation, somewhere in-between a natural-language string and its embedding in a particular LLM. Maybe something that balances human-readability, LLM-composability, lack of brittleness, insight into the local structure of the embedding, etc.
I wonder too. I imagine the best we could do with present technology is to get back the generated text in the form of text tokens accompanied by their corresponding deep embeddings (last hidden states): `[(text_token, deep_emb), (text_token, deep_emb), ...]`. Those deep embeddings incorporate "everything the model knows" about each generated token of text.
Maybe a mapping/representation for a "medium embedding" could be learned that strikes a balance between shallow and deep. I have no idea what a good objective-function would be, though.
I mean deep embeddings (i.e., sequences of hidden states, the ones that are computed by all those interactions), not the shallow embeddings of token ids in the first layer of the model! Those deep embeddings are much richer representations.
Imagine if you and others building apps had access to "GPT3 deep sequence embeddings v1.0" via an API.
Not quite. My understanding is that OpenAI's various embeddings APIs return only a single vector per document, instead of the sequence of hidden states corresponding to each predicted next token in the response generated by a GPT-type LLM.
Imagine getting generated text from a GPT LLM that comes with a deep embedding of each generated token's "contextual meaning":
by which measure are you making this claim? even a 95% reliability means you get 5% wrong. on top of that you have prompt injection attacks. this stuff is much less suitable the more you move away from demos to predictable business applications
Whoa, I didn't say this is suitable for predictable business applications yet!
What I did say is that I'm in awe at the fact that this stuff works as well as it does, given that natural language is so notoriously prone to imprecision and ambiguity. I mean, if you had told me six months ago that this would be working even "95%" of the time in demos, I would have said, no way.
Basically, I agree with you that at present this becomes "less suitable the more you move away from demos to predictable business applications" :-)
Perhaps, but language is the common denominator in a multi-model world. E.g., I pass the GPT output into other models which are fine tuned for that sub domain. You can do embedding to embedding conversion, but not sure it's worth the effort.
Imagine if OpenAI made GPT3's final hidden states available via an API ("GPT3 deep sequence embeddings v1.0"), next to each generated text token: [(text_token, deep_emb), (text_token, deep_emb), ...]. You and anyone else could build apps on top. Those hidden states would incorporate much more, and much richer, information than the text. Higher-level models could be trained to act on such information!
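To make the shape of that idea concrete, here is a purely hypothetical sketch. No such API is described in this thread, so the `TokenWithState` type and the tiny two-dimensional vectors are invented for illustration. It also shows what is lost when a per-token sequence is collapsed into a single vector, using mean pooling as an assumed example of such a collapse.

```python
from dataclasses import dataclass


@dataclass
class TokenWithState:
    text_token: str
    deep_emb: list[float]  # hypothetical final hidden state for this token


def mean_pool(sequence: list[TokenWithState]) -> list[float]:
    """Collapse a per-token sequence into one vector by averaging each dimension."""
    if not sequence:
        return []
    dim = len(sequence[0].deep_emb)
    return [sum(tok.deep_emb[i] for tok in sequence) / len(sequence) for i in range(dim)]


generated = [
    TokenWithState("Hello", [1.0, 0.0]),
    TokenWithState(" world", [0.0, 1.0]),
]
single_vector = mean_pool(generated)
# Reversing the tokens gives the same pooled vector: order and per-token detail are gone.
same_vector = mean_pool(list(reversed(generated)))
```

A downstream model consuming `generated` directly would see one vector per token, while one consuming `single_vector` sees only the average.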
I am prototyping new features where I work, on top of GPT3. To get it beyond fancy demos and actually delivering customer utility, you need a LOT of work to build in robustness and correctness.
Prompt chains is where I’ve landed at the moment. This library seems very relevant and timely.
It works well if you continuously provide progress to the user. In my specific use case, the data generated is progressively displayed to the user, and refined in many further chain steps. They see the evolution of this. So it feels like it's fast, but also that a bunch of work is happening.
Isn't this a chat interface to various apis? I guess that's the point. But it doesn't seem like that is utilizing the power of large language models. Maybe I'm not understanding this.
Using a Google search api + a calculator to answer a question is cool [0]. but... we could already do that?
>but... we could already do that?
that sums up so much.
the emperor has no clothes. few see it.
it's inferior to the modern purpose built tools.
we act impressed a simple program can be created from a prompt, what has every programmer been doing already for the last 10 years with google search and stackoverflow?
I think building regexes is a good example of where chatgpt/copilot is useful. It’s not that it’s particularly difficult or requires much understanding, just very time consuming compared to writing an English language description and walking through an example.
Useful compared to writing a regex in notepad yeah.
Last time I was challenged by regex I easily found a very fancy web page with so many nice features. Actual documentation, ability to select a specific regex engine (or implementation or whatever you call it), real-time results on test data, highlighting that shows how the regex works, etc.
and that was years ago I’m sure there’s even better web apps now.
I can’t imagine having a better experience asking AI chat than using a web app made for the purpose
Being able to give some examples and just state in plain English what you want the capture groups to be is pretty much my ideal regex experience (in other words, I don’t want to think about the semantics of regex ever).
That’s what I figured. Maybe it’s just me but for complicated use-cases this still takes me forever to get the right regex string, especially with capture groups.
I'm very curious as to how they are able to limit the chat bot's knowledge to their own documentation.
It seems as if it should also have the entire knowledge of the LLM, but
> Who was Thomas Jefferson?
Outputs
> Hmm, I'm not sure. I'm an AI assistant for the open source library LangChain. You can find more information about LangChain at https://langchain.readthedocs.io.
It doesn't explain why the model only refers to the documentation though.
Basically what they are doing is giving GPT-3 a prompt that includes the (semantically relevant) pieces of the documentation.
But I don't see why a User can't ask about something that is tangentially relevant ("Who is Bill Gates?") and get an answer that really comes from GPT-3 pre-existing knowledge.
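The thread does not say how the bot is actually restricted. One common approach, assumed here and not confirmed as what the LangChain docs bot does, is to put an explicit instruction and a fallback reply in the prompt next to the retrieved excerpts. `build_docs_prompt` is a hypothetical helper.

```python
def build_docs_prompt(question: str, doc_chunks: list[str],
                      fallback: str = "Hmm, I'm not sure.") -> str:
    """Build a prompt that tells the model to answer only from the given excerpts."""
    context = "\n\n".join(doc_chunks)
    return (
        "You are an AI assistant for the open source library LangChain.\n"
        "Answer using ONLY the documentation excerpts below.\n"
        f"If the answer is not in the excerpts, reply exactly: {fallback}\n\n"
        f"Documentation:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Since this is only an instruction, the model can still sometimes answer from its pre-existing knowledge, which fits the concern raised above about tangential questions.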
And very clever way for these guys to go from "hey we found this cool problem" to "well, did you notice that it ends up super complex and slow? Well well well, we could make it so much better with... wait for it... a data pipeline!"
("...and don't we just happen to sell data pipeline software! What a coincidence!")
lol.
Great read, thank you for the recommendation.
Yes, also the part I don't understand is how LangChain can be used to make the agents smarter for a particular domain. I did not see any interface to provide feedback or feed data. Maybe I am missing something?
It's usually either providing data to the LLM, or doing back-and-forth with the LLM (usually a mix of both).
For instance, if you want it to answer questions about your code-base, the model doesn't know your code base. You can't feed the entire code-base into a prompt. So, you'd use langchain to:
- preprocess your code-base, by chunking it and embedding it in some vector space
- when you get a question, see where it is in the vector space and find the "k nearest neighbors"
- pass those nearest neighbors, along with your question, to the LLM (because those neighbors are the contextually relevant pieces, and they'd fit in the prompt)
> LangChain uses pre-trained models from Hugging Face, such as BERT, GPT-2, and XLNet. For more information, please see the Getting Started Documentation[0].
That links to a 404, but I did find the correct link[1]. Oddly that doc only mentions an OpenAI API wrapper. I couldn’t find anything about the other models from huggingface.
Does LangChain have any tooling around fine tuning pre-trained LLMs like GPTNeoX[2]?
This is amazing. Thank you to all the contributors!
I am working on a project in the old media space and we were planning to build several of these elements ourselves if something like this didn’t come along. Love the open source nature and would love to contribute.
Hell yeah. Been thinking about this in my spare cycles a lot! The missing thing when interacting with GPT3 was building up layers of re-usable abstraction that could be mixed together: i.e. a prompt that's great at summarization, a prompt that's great at generating a list of 5 x related ideas, a prompt that's great at sorting a list of inputs according to some verbal sorting description.
Had a play around with this in the early GPT3 days by creating a Python decorator that let you easily write "prompt functions" and then you could combine these to create higher-level prompt generating machines.
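A guess at what such a decorator could look like, not the commenter's actual code. `prompt_function` and `echo_llm` are hypothetical names, and `echo_llm` only wraps the prompt in brackets so the example runs without a model. The decorated function builds the prompt text, and the decorator sends it to the model.

```python
import functools
from typing import Callable


def echo_llm(prompt: str) -> str:
    """Stand-in for a real completion call (assumption)."""
    return f"[completion of: {prompt}]"


def prompt_function(llm: Callable[[str], str]):
    """Decorator: the wrapped function returns a prompt, the wrapper returns the completion."""
    def decorator(build_prompt: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(build_prompt)
        def wrapper(*args, **kwargs) -> str:
            return llm(build_prompt(*args, **kwargs))
        return wrapper
    return decorator


@prompt_function(echo_llm)
def summarize(text: str) -> str:
    return f"Summarize in one sentence:\n{text}"


@prompt_function(echo_llm)
def related_ideas(text: str, n: int = 5) -> str:
    return f"List {n} ideas related to:\n{text}"


def brainstorm(text: str) -> str:
    """A higher-level 'machine' built by combining two prompt functions."""
    return related_ideas(summarize(text))
```

Keeping the prompt text inside an ordinary function means each piece can be tested and reused on its own before being combined.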
LangChain takes this to the logical end-point, awesome!
I like to think I am smart and in the loop - but would this allow me to feed all sorts of PowerPoint presentations, word docs, text files, etc... so that new hires could get a concise answer to business processes?
I have a side project at the moment to use LangChain to help build a Q&A system from domain specific documentation on a product I own. It's around 500 different PDFs. I was going to follow this guide* which looks credible and addresses how to deal with a large corpus of texts. Curious what others think of the approach.
I’ve been following both LangChain and GPT Index but to be honest I’m getting super confused which use cases are better suited for each (or a combination of both!). It would be great to see a presentation of both tools together like this :)
OpenAI already does charge for access to its API. ChatGPT is currently free, but LLM model access has been available through a paid API since before GPT-3.
This looks useful. It should compete with Rasa, which is supposed to do something similar, but in practice is mostly only able to match what you're saying to nodes in a phone tree. Has anyone done a game NPC with this?
Pretty sweet, but is there a similar setup more geared towards coding?
This would be a great way to get a pair programmer for people who are just coding for hobby or transitioning to SWE. Or to alleviate imposter syndrome.
Such a good idea! A few years ago I tried importing pre-trained Keras models into Racket Scheme, creating a function for each model. I didn't really get the idea right, LangChain looks much better.