Is there any easy way to run the embedding logic locally? Maybe even local to the database? My understanding is that they’re hitting OpenAI’s API to get the embedding for each search query and then storing that in the database. I wouldn’t want my search function to be dependent on OpenAI if I could help it.



Ollama supports _some_ embedding models (as does llama.cpp - BERT models specifically):

  ollama pull all-minilm

  curl http://localhost:11434/api/embeddings -d '{
    "model": "all-minilm",
    "prompt": "Here is an article about llamas..."
  }'
Embedding models run quite well even on CPU since they are relatively small. There are also implementations with a library form factor, like transformers.js https://xenova.github.io/transformers.js/ and sentence-transformers https://pypi.org/project/sentence-transformers/
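
For example, a minimal sentence-transformers sketch (the model name and texts here are just placeholders, not from the thread):

  # pip install sentence-transformers
  from sentence_transformers import SentenceTransformer

  # all-MiniLM-L6-v2 is small enough to run comfortably on CPU
  model = SentenceTransformer("all-MiniLM-L6-v2")

  docs = ["Here is an article about llamas...", "A post about alpacas"]
  query = "camelid facts"

  # normalize so a plain dot product equals cosine similarity
  doc_vecs = model.encode(docs, normalize_embeddings=True)
  query_vec = model.encode(query, normalize_embeddings=True)

  scores = doc_vecs @ query_vec
  print(sorted(zip(scores, docs), reverse=True))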


If you are building on the Supabase stack (Postgres with pgvector), we just released a built-in embedding generation API yesterday. It works locally (on CPU), and you can deploy it without any modifications.

Check out this video on building semantic search in Supabase: https://youtu.be/w4Rr_1whU-U

Also, the announcement blog post, with links to text versions of the tutorials: https://supabase.com/blog/ai-inference-now-available-in-supa...


So handy! I already got some embeddings working with Supabase pgvector and OpenAI, and it worked great.

What would the cost of running this be like compared to the OpenAI embedding api?


There are no extra costs other than what we'd normally charge for Edge Function invocations (you get up to 500K in the free plan and 2M in the Pro plan).


Neat! One thing I'd really love tooling for: supporting multi-user apps where each user has their own siloed data and embeddings. I find myself having to set up databases from scratch for all my clients, which results in a lot of repetitive work. I'd love the ability to easily add users to the same DB and let them get to embedding without needing any prior knowledge.


This is possible in Supabase. You can store all the data in one table and restrict access with Row Level Security.

You also have various ways to separate the data for indexing/performance:

- use metadata filtering first (e.g. filter by customer ID prior to running a semantic search). This is fast in Postgres since it's a relational DB (rough sketch after this list)

- pgvector supports partial indexes - create one per customer based on a customer ID column

- use table partitions

- use Foreign Data Wrappers (more involved but scales horizontally)
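
A rough sketch of the metadata-filtering option from Python - the table/column names, DSN, and the psycopg + pgvector client setup are my own assumptions, not a Supabase-specific API:

  # pip install "psycopg[binary]" pgvector
  import psycopg
  from pgvector.psycopg import register_vector

  conn = psycopg.connect("postgresql://localhost/mydb")  # hypothetical DSN
  register_vector(conn)  # lets you pass numpy vectors as query parameters

  def search(customer_id, query_embedding, limit=10):
      # query_embedding: numpy array from your embedding model.
      # Filter by customer first, then order by cosine distance (pgvector's <=>).
      return conn.execute(
          """
          SELECT id, content
          FROM documents
          WHERE customer_id = %s
          ORDER BY embedding <=> %s
          LIMIT %s
          """,
          (customer_id, query_embedding, limit),
      ).fetchall()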


We provide this functionality in Lantern Cloud via our Lantern Extras extension: <https://github.com/lanterndata/lantern_extras>

You can generate CLIP embeddings locally on the DB server via:

  SELECT abstract,
         introduction,
         figure1,
         clip_text(abstract) AS abstract_ai,
         clip_text(introduction) AS introduction_ai,
         clip_image(figure1) AS figure1_ai
  INTO papers_augmented
  FROM papers;
Then you can search against those embeddings via:

  SELECT abstract, introduction FROM papers_augmented ORDER BY clip_text(query) <=> abstract_ai LIMIT 10;
The approach significantly decreases search latency and results in cleaner code. As an added bonus, EXPLAIN ANALYZE can now show the percentage of time spent on embedding generation vs. search.

The linked library enables embedding generation for a dozen open-source models and proprietary APIs (list here: <https://lantern.dev/docs/develop/generate>), and adding new ones is really easy.


Lantern seems really cool! Interestingly, we did try CLIP (OpenCLIP) image embeddings, but the results were poor for 24px by 24px icons. Any ideas?

Charlie @ v0.app


I have tried CLIP on my personal photo album collection and it worked really well there - I could write detailed scene descriptions of past road trips, and the photos I had in mind would pop up. The model is probably better suited to everyday photos than to icons.


There are a bunch of embedding models you can run on your own machine. My LLM tool has plugins for some of them:

- https://llm.datasette.io/en/stable/plugins/directory.html#em...

Here's how to use them: https://simonwillison.net/2023/Sep/4/llm-embeddings/


Yes, I use fastembed-rs[1] in a project I'm working on, and it runs flawlessly. You can store the embeddings in any boring database (they're just arrays of f32s at the end of the day). But for fast vector math (which you need for similarity search), a vector database is recommended, e.g. the pgvector[2] Postgres extension.

[1] https://github.com/Anush008/fastembed-rs

[2] https://github.com/pgvector/pgvector
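
To make the "fast vector math" part concrete, here's a brute-force version with numpy (Python rather than Rust, and the shapes/data are made up) - roughly the exhaustive scan that a vector index like pgvector's lets you avoid:

  import numpy as np

  # pretend these came from your embedding model and your "boring" database
  doc_vecs = np.random.rand(10_000, 384).astype(np.float32)
  query_vec = np.random.rand(384).astype(np.float32)

  # cosine similarity = dot product of L2-normalized vectors
  doc_norm = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
  query_norm = query_vec / np.linalg.norm(query_vec)

  scores = doc_norm @ query_norm          # one score per stored document
  top10 = np.argsort(scores)[::-1][:10]   # indexes of the 10 most similar
  print(top10, scores[top10])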


Fun timing!

I literally just published my first crate: candle_embed[1]

It uses Candle under the hood (the crate is more of a user-friendly wrapper) and lets you use any model on HF, like the new SoTA model from Snowflake[2].

[1] https://github.com/ShelbyJenkins/candle_embed

[2] https://huggingface.co/Snowflake/snowflake-arctic-embed-l


The MTEB leaderboard has you covered. It's the go-to for finding the leading embedding models, and I believe many of them can run locally.

https://huggingface.co/spaces/mteb/leaderboard


This is a good call out. OpenAI embeddings were simple to stand up, pretty good, cheap at this scale, and accessible to everyone. I think that makes them a good starting point for many people. That said, they're closed-source, and there are open-source embeddings you can run on your infrastructure to reduce external dependencies.


If you're building an iOS app, I've had success storing vectors in Core Data, using a tiny Core ML model that runs on-device for embedding, and then doing cosine similarity.


Open WebUI has LangChain built in and integrates perfectly with Ollama. They have several variations of Docker Compose files on their GitHub.

https://github.com/open-webui/open-webui



