As an individual, I love the idea of pushing to simplify even further to understand these core concepts. For the ecosystem, I like that vector stores make these features accessible to environments outside of Python.
If you ask ChatGPT for a cosine similarity function that works against two arrays of floating-point numbers, in any programming language, you'll get the code that you need.
Here's one in JavaScript (my prompt was "cosine similarity function for two javascript arrays of floating point numbers"):
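function cosineSimilarity(vecA, vecB) {
  // Cosine similarity is only defined for vectors of equal length
  if (vecA.length !== vecB.length) {
    throw new Error("Vectors do not have the same dimensions");
  }
  let dotProduct = 0.0;
  let normA = 0.0; // running sum of squares for vecA
  let normB = 0.0; // running sum of squares for vecB
  for (let i = 0; i < vecA.length; i++) {
    dotProduct += vecA[i] * vecB[i];
    normA += vecA[i] ** 2;
    normB += vecB[i] ** 2;
  }
  // A zero vector has no direction, so dividing by its length is meaningless
  if (normA === 0 || normB === 0) {
    throw new Error("One of the vectors is zero, cannot compute similarity");
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}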
Vector stores really aren't necessary if you're dealing with fewer than a few hundred thousand vectors - load them into a bunch of in-memory arrays and run a function like that against every one of them, brute force.
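To make the brute-force idea concrete, here's a rough sketch of what that search could look like. The names (bruteForceSearch, items, id, vector) and the toy three-dimensional data are my own assumptions for illustration, not from any library, and the compact cosineSimilarity here skips the input checks to keep things short:

```javascript
// Compact cosine similarity; assumes equal-length, non-zero vectors.
function cosineSimilarity(vecA, vecB) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < vecA.length; i++) {
    dot += vecA[i] * vecB[i];
    normA += vecA[i] ** 2;
    normB += vecB[i] ** 2;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score every stored vector against the query,
// sort by score (highest first), and keep the k best matches.
function bruteForceSearch(query, items, k = 3) {
  return items
    .map(({ id, vector }) => ({ id, score: cosineSimilarity(query, vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Made-up toy data; real embeddings typically have hundreds or
// thousands of dimensions.
const items = [
  { id: "a", vector: [1, 0, 0] },
  { id: "b", vector: [0.9, 0.1, 0] },
  { id: "c", vector: [0, 1, 0] },
];

const results = bruteForceSearch([1, 0, 0], items, 2);
console.log(results.map((r) => r.id)); // [ 'a', 'b' ]
```

Every query touches every vector, so the cost grows linearly with the size of the collection. That's the trade-off you accept in exchange for not running any extra infrastructure.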