LMQL: A query language for programming (large) language models (github.com/eth-sri)
108 points by behnamoh on May 16, 2023 | 12 comments



There's a lot of cool stuff in this library. It's been submitted to HN many times; the authors took part in the discussion here: https://news.ycombinator.com/item?id=35484673


Author here, thanks for posting this.

Be sure to play around with our entirely web-based playground IDE here: https://lmql.ai/playground and our example showcase at https://lmql.ai. We are happy to answer any questions that come up.


This looks really interesting, great work! Is there somewhere in the docs that describes the essence of how LMQL works? My initial assumption was that it's a compiler that transforms LMQL queries into prompts, but then I realized that there also needs to be some kind of query execution engine to do prompt chaining, retries, etc. More generally, I'm trying to figure out how difficult it would be to use LMQL from Node.js.


Yes, you are correct. LMQL has its own interpreter and runtime that translates your query program into the multiple LLM calls required to satisfy your overall workload. Execution is inherently multi-part: the runtime automates several consecutive calls to the LLM, based on the query program and prompt template you provide.
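
For a rough sense of what such a multi-part program looks like, here is an illustrative sketch along the lines of the examples on lmql.ai (simplified; the model name is just an example):

    # decoder clause, followed by a prompt with two template variables
    argmax
       "Q: How does a penguin build its house?\n"
       "A:[PUNCHLINE]\n"
       "Rate this joke from 1 to 10:[RATING]"
    from
       "openai/text-davinci-003"
    where
       STOPS_AT(PUNCHLINE, "\n") and INT(RATING)

The runtime first decodes PUNCHLINE (stopping at the newline), then appends the rating question to the prompt and issues further constrained calls to decode RATING as an integer, so even this small query corresponds to several LLM invocations.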

Re Node.js: We are actively investigating LMQL use from languages other than Python (see also https://github.com/eth-sri/lmql/issues/1). The current interpreter is quite closely tied to a Python environment, meaning that to use it from Node.js you will, for now, at least have to host a Python interpreter as a subprocess. We are actively looking into gRPC/inter-process communication though, hoping to improve on this in the near future.


One of the main reasons somebody may want to use such a library is to constrain the output of an LLM. The language is designed to make this easy and to abstract this part of the querying away. There are trivial cases where a value is chosen from a fixed set of options, but one can also easily constrain a word to depend on a previously generated word.
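
For example, a query roughly along these lines (a sketch in the style of the sentiment example from the LMQL docs, not copied verbatim) restricts a variable to a fixed set of choices via the where clause:

    argmax
       "Review: The food was great but the service was slow.\n"
       "Q: What is the overall sentiment of this review?\n"
       "A:[SENTIMENT]"
    from
       "openai/text-davinci-003"
    where
       # only these completions are allowed for SENTIMENT
       SENTIMENT in ["positive", "neutral", "negative"]

The runtime enforces this during decoding, so SENTIMENT can only come out as one of the listed options.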


Is it possible to roll back the output and essentially insert into a previous line of the LLM output, without losing what it is writing now?

Here's the flow I've been picturing while thinking about code-writing models. Feel free to take this idea, anyone; I'd appreciate a heads up if it works and you publish SOTA before I get around to it :)

Let's give LLMs IDE-level info. As they type, if there's a function they're starting to call, use a language server to get the tooltip docs and put them in a comment just above the line being written. Put autocomplete help, type docs, etc. in their context while they code.

Edit

Second thought: this kind of thing is very cheap in compute compared to the LLM calculations. Is this the kind of thing that could be passed up to a remote service as a wasm bundle or similar, to control the streaming output?


If I understand correctly, what you are imagining is some sort of local retrieval enhancement used to generate a specific part of the overall response, which is later removed once the generated piece of code (e.g. a function call) has finished generating.

Indeed, this is a form of LLM prompting that we are also exploring in our preview release channel with something called in-context functions. See https://next.lmql.ai and choose the "In-Context Functions" example in the New Features showcase screen.

With in-context functions, you can provide additional instructions/data to the LLM that apply only locally. Once such an in-context function returns, the additional instructions/retrieved info are removed and only the LLM-generated end result remains.


What reason is there to learn a new query language when I can program an LLM with any existing language?


Author here.

LMQL gives you a concise way to define multi-part prompts and enforce constraints on LLMs. For instance, you can make sure the model always adheres to a specific output format, where parsing of the output is automatically taken care of. It also abstracts away a number of things like API vs. local models, tokenisation and optimisation, and makes tool integration (e.g. tool/function calls during LLM reasoning) much easier.

In practice this saves you a lot of ugly text concatenation and output parsing code, letting you focus on the core logic of your project. Overall, however, you will still use your host language to call LMQL. E.g. we are fully integrated with Python, where LMQL query code simply lives in decorated functions (https://docs.lmql.ai/en/latest/python/python.html).
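
As a small illustration of the output-format point (a simplified sketch, not verbatim from our docs): the template variables double as the parsed result, so there is no string post-processing on your side:

    argmax
       "Name a fruit and its typical color.\n"
       "Fruit:[FRUIT]\n"
       "Color:[COLOR]"
    from
       "openai/text-davinci-003"
    where
       # stop each field at the end of its line
       STOPS_AT(FRUIT, "\n") and STOPS_AT(COLOR, "\n")

The query result hands back FRUIT and COLOR as separate values, and in the Python integration linked above a query like this simply lives inside a decorated function that you call like regular Python code.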


Think of this more like SQL than Python. This language doesn't replace your main language, and you could implement all of the logic in your main language, obviously. But it provides you with easy-to-access primitives for beam search, constrained responses, etc.
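
To make that concrete, here is a sketch of what those primitives look like in the query language (decoder names as in the LMQL docs; the rest is illustrative): the clause at the top selects the decoding strategy, while the where clause constrains the response:

    # beam search over 2 hypotheses instead of greedy argmax decoding
    beam(n=2)
       "Q: What is the capital of France?\n"
       "A:[ANSWER]"
    from
       "openai/text-davinci-003"
    where
       len(ANSWER) < 20

You could of course rebuild all of this against the raw API in your main language; the point of the DSL is that it is one line here.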


Where can I see an example of an actual prompt that is produced?


For a showcase of different LMQL queries and resulting model output, have a look at https://lmql.ai.

Per query, the LMQL runtime calls the underlying LM several times to execute the complete (multi-part) query program you specify.

It does not translate to just one LM prompt, but rather to a sequence of prompts, and during generation additional constraining of the LM is applied to ensure it behaves according to the provided template. That's how it supports control flow and external function calls during generation: it actually executes the queries with a proper runtime and only uses LLMs on the backend.
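
As an illustration of that expansion, here is a sketch modelled on the packing-list example from the LMQL docs (not the output of a real run): ordinary control flow in the query body turns into additional constrained LM calls on the growing prompt:

    sample(temperature=0.8)
       "A list of things not to forget when going to the beach:\n"
       # each iteration appends "-" to the prompt and decodes a new THING
       for i in range(4):
          "-[THING]\n"
    from
       "openai/text-davinci-003"
    where
       STOPS_AT(THING, "\n")

The single query above therefore expands into (at least) four LM invocations, each constrained to stop at the end of its line.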



