
Yes, you can shove the embeddings in a BLOB, but then you can't do the kinds of query operations you expect to be able to do with embeddings.



You can run similarity scores with a custom SQLite function.

I use a Python one usually, but it's also possible to build a much faster one in C: https://simonwillison.net/2024/Mar/23/building-c-extensions-...
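For anyone curious, here is a minimal sketch of the Python approach. The table name `embeddings`, the float32 BLOB encoding, and the function name `cosine_similarity` are assumptions made for the example, not taken from any particular project:

```python
import math
import sqlite3
import struct


def encode(vec):
    """Pack a sequence of floats into a BLOB of little-endian float32 values."""
    return struct.pack("<%df" % len(vec), *vec)


def decode(blob):
    """Unpack a BLOB written by encode() back into a tuple of floats."""
    return struct.unpack("<%df" % (len(blob) // 4), blob)


def cosine_similarity(blob_a, blob_b):
    """Cosine similarity of two encoded vectors; None if either has zero length."""
    a, b = decode(blob_a), decode(blob_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return None
    return dot / (norm_a * norm_b)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a two-argument scalar function.
conn.create_function("cosine_similarity", 2, cosine_similarity)

# Hypothetical schema: one row per item, embedding stored as a BLOB.
conn.execute("CREATE TABLE embeddings (id TEXT PRIMARY KEY, embedding BLOB)")
conn.executemany(
    "INSERT INTO embeddings VALUES (?, ?)",
    [
        ("a", encode([1.0, 0.0])),
        ("b", encode([0.0, 1.0])),
        ("c", encode([1.0, 1.0])),
    ],
)

# Rank every stored embedding against a query vector, most similar first.
query = encode([1.0, 0.0])
rows = conn.execute(
    "SELECT id, cosine_similarity(embedding, ?) AS score "
    "FROM embeddings ORDER BY score DESC",
    (query,),
).fetchall()
```

This is a brute-force scan that calls back into Python for every row, so it is fine for small tables and gets slow on large ones.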


Right, so you could use it sort of like a cache and send the blobs to OpenAI to use their similarity API, but you couldn't really use SQL to do cosine similarity operations?

My understanding of what's going on at a technical level might be a bit limited.


Yes.

Although if you really wanted to, and normalized your data like a good little Edgar F. Codd devotee, you could write something like this:

SELECT SUM(v.dot) / (SQRT(SUM(v.v1 * v.v1)) * SQRT(SUM(v.v2 * v.v2))) FROM (SELECT v1.dimension AS dim, v1.value AS v1, v2.value AS v2, v1.value * v2.value AS dot FROM vectors AS v1 INNER JOIN vectors AS v2 ON v1.dimension = v2.dimension WHERE v1.vector_id = ? AND v2.vector_id = ?) AS v;

This assumes one table called "vectors" with columns vector_id, dimension, and value, with vector_id and dimension together forming the primary key. The inner query grabs two vectors as separate columns with some self-join trickery and computes the product of each pair of components; the outer query then runs aggregate functions over the inner query to compute the actual cosine similarity (the dot product divided by the product of the two vector norms).

No, I have not tested this on an actual database engine, so I probably screwed up the SQL somehow. And obviously it's easier to just have a database (or Postgres extension) that recognizes vector data as a distinct data type and gives you a dedicated cosine-similarity function.
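For what it's worth, here is a rough sketch that runs a query of that shape against an in-memory SQLite database from Python. The sample vectors, the `store` and `similarity` helpers, and the inner column names `x1`/`x2` are made up for the example. Note that it squares each component under the square roots, which is what the cosine formula needs, and it registers `SQRT` from Python because not every SQLite build ships the math functions:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register SQRT explicitly in case this SQLite build lacks built-in math functions.
conn.create_function("SQRT", 1, math.sqrt)

# One row per vector component, keyed by (vector_id, dimension).
conn.execute(
    """
    CREATE TABLE vectors (
        vector_id TEXT NOT NULL,
        dimension INTEGER NOT NULL,
        value REAL NOT NULL,
        PRIMARY KEY (vector_id, dimension)
    )
    """
)


def store(vector_id, values):
    """Insert one vector as a set of (vector_id, dimension, value) rows."""
    conn.executemany(
        "INSERT INTO vectors VALUES (?, ?, ?)",
        [(vector_id, i, x) for i, x in enumerate(values)],
    )


store("a", [1.0, 2.0, 3.0])
store("b", [2.0, 4.0, 6.0])   # same direction as "a"
store("c", [3.0, 0.0, -1.0])  # orthogonal to "a"

# Self-join pairs up matching dimensions; the outer query aggregates
# the dot product and the two squared norms.
QUERY = """
SELECT SUM(v.dot) / (SQRT(SUM(v.x1 * v.x1)) * SQRT(SUM(v.x2 * v.x2)))
FROM (
    SELECT v1.value AS x1, v2.value AS x2, v1.value * v2.value AS dot
    FROM vectors AS v1
    INNER JOIN vectors AS v2 ON v1.dimension = v2.dimension
    WHERE v1.vector_id = ? AND v2.vector_id = ?
) AS v
"""


def similarity(id_a, id_b):
    """Cosine similarity of two stored vectors, computed entirely in SQL."""
    return conn.execute(QUERY, (id_a, id_b)).fetchone()[0]


sim_ab = similarity("a", "b")
sim_ac = similarity("a", "c")
```

With these sample vectors, `sim_ab` should come out at about 1 and `sim_ac` at about 0, which is a quick sanity check that the aggregates are wired up correctly.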


Thanks for the explanation! Appreciate that you took the time to give an example. Makes a lot more sense why we reach for specific tools for this.



