love this and have been using tf/idf for embeddings and various measures of similarity for some personal pet projects. one thing i came across in my research is that cosine similarity was more useful for vectors of different lengths and that euclidean distance was useful for vectors of similar length but simon alludes to a same-length requirement. iām not formally trained in this area so i was hoping someone could shed some light on this for me.
You can use cosine similarity with embedding vectors of different lengths (or better, the vectors have all the same length, but they are sparse with most components being 0).