Anthropic's Claude is said to be very good.

Instruction-tuned LLaMA 65B/Falcon 40B are good, especially with an embeddings database.

...But OpenAI has all the name recognition and ease of use now, so it might not even matter if others ambiguously surpass OpenAI models.




The problem with Claude is that it is quite literally impossible to get off the waiting list to use it. To OpenAI’s credit they actually ship the product in an accessible way to developers.


Apart from poe.com there is also nat.dev. It even supports Claude-100K. Just pay $5 and it bills at API pricing, proportional to the number of tokens.


Poe.com. Takes 1 minute to sign up and then you can use it for 7 days for free. Pretty sweet deal. Not affiliated.


Imo the least interesting use of LLMs is stuff like chatbots. API access is a prerequisite for 99% of the interesting things they can do.


I agree - it's not that I 'can't' access Claude, it's that they're not really shipping the API at the same scale that OAI is.


I just checked out poe.com. Seems you can only buy a subscription if you own Apple hardware (first time I've ever heard that).

It's $20 a month and comes with 300 GPT-4 messages and 1000 Claude 1.2 messages.

By comparison, ChatGPT Plus gives you up to 6000 GPT-4 messages a month for the same price: 25 messages per 3-hour block, or roughly 200 a day (admittedly it would be hard to use that many).


Can you ELI5 why an embeddings database helps here? Can pinecone/milvus be used to 'extend memory' of OSS and vendor LLMs without retraining?


First some context: LLM "prompts" are actually the whole conversation + initial context. The model learns nothing between calls, hence the whole conversation gets fed into it every time, but the instruction-following ones are trained to answer your most recent chat message.

In a nutshell, part of your LLM prompt (usually your most recent question?) gets fed as a query for the embedding/vector database. It retrieves the entries most "similar" to your question (which is what an embedding database does), and that information gets pasted into the context of the LLM. It's kinda like pasting the first entry from a local Google search into the beginning of your question as "background."

Some implementations insert your old conversation turns (the ones too big to fit into the LLM's context window) into the database as they are pushed out.

This is what I have seen, anyway. Maybe some other implementations do things better.
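
Here's a minimal sketch of that retrieve-then-paste loop, with some assumptions on my part: sentence-transformers as the embedding model and a plain numpy dot product standing in for the vector database. A real implementation would differ in the details.

  import numpy as np
  from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

  embedder = SentenceTransformer("all-MiniLM-L6-v2")

  # The "database": pre-embedded text chunks (old conversations, notes, docs...)
  chunks = [
      "Alice's birthday is March 3rd.",
      "The deploy script lives in tools/deploy.sh.",
      "We migrated from Postgres to SQLite last week.",
  ]
  chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

  # Your most recent question becomes the query
  question = "When is Alice's birthday?"
  q_vec = embedder.encode([question], normalize_embeddings=True)[0]

  # Cosine similarity; the vectors are normalized, so a dot product suffices
  best = chunks[int(np.argmax(chunk_vecs @ q_vec))]

  # Paste the most similar entry into the prompt as "background"
  prompt = f"Background: {best}\n\nUser: {question}\nAssistant:"
  print(prompt)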


> part of your LLM prompt (usually your most recent question?) gets fed as a query for the embedding/vector database

How is it embedded? Using a separate embedding model, like BERT or something? Or do you use the LLM itself somehow? Also, how do you create the content for the vector database keys themselves? Just some arbitrary off-the-shelf embedding, or do you train it as part of training the LLM?


Yeah, it's completely separate. The LLM just gets some extra text in the prompt, that is all. The text you want to insert is "encoded" into the database, which is not particularly compute-expensive. You can read about one such implementation here: https://github.com/chroma-core/chroma
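
For what it's worth, basic Chroma usage looks roughly like this. Sketch from memory, so check their docs for the current API:

  import chromadb  # pip install chromadb

  client = chromadb.Client()  # in-memory by default
  collection = client.create_collection(name="notes")

  # The "encoding" happens here: Chroma embeds documents with its default model
  collection.add(
      documents=["Alice's birthday is March 3rd.",
                 "The deploy script lives in tools/deploy.sh."],
      ids=["note1", "note2"],
  )

  # The query text is embedded the same way; the nearest entries come back
  results = collection.query(query_texts=["When is Alice's birthday?"], n_results=1)
  print(results["documents"])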


One thing I don't understand is how feeding the entire conversation back as a prefix for every prompt doesn't waste the entire 4K-token context almost immediately. I'd swear that a given ChatGPT window is stateful, somehow, just for that reason alone... but everything I've read suggests that it's not.


Have you tried something like Memory Transformers (https://arxiv.org/abs/2006.11527), where you move the k/v pairs that don't fit in the context window to a vector db? Seems like a more general approach, but I haven't tested them against each other.


Any database can be used to extend the memory of LLMs. What a database does is store stuff and let you search/retrieve it. Embeddings are a different form of data that is in many (but not all) cases superior to searching through plain text.

You do not need a fancy cloud-hosted service to use an embeddings database, just as you do not need one to use a regular database (although you could).

Check https://github.com/kagisearch/vectordb for a simple implementation of a vector search database that uses local, on-premise open source tools and lets you use an embeddings database in 3 lines of code.
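
Something like this, if I'm reading their README right (the API may have drifted since):

  from vectordb import Memory  # pip install vectordb2

  memory = Memory()
  memory.save([
      "Alice's birthday is March 3rd.",
      "The deploy script lives in tools/deploy.sh.",
  ])
  results = memory.search("When is Alice's birthday?", top_n=1)
  print(results)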


I don't have access to GPT-4, but Claude is competing with GPT-3.5 (ChatGPT) and Bing AI (whatever they use).



