
One useful technique here could be to use text embeddings and cosine similarity: https://simonwillison.net/2023/Oct/23/embeddings/



Love this, and I have been using TF-IDF for embeddings and various measures of similarity for some personal pet projects. One thing I came across in my research is that cosine similarity was more useful for vectors of different lengths and that Euclidean distance was useful for vectors of similar length, but Simon alludes to a same-length requirement. I'm not formally trained in this area, so I was hoping someone could shed some light on this for me.


You can use cosine similarity with embedding vectors of different lengths (or better, the vectors all have the same length, but they are sparse, with most components being 0).
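To make the two senses of "length" concrete, here is a minimal sketch in plain Python. It assumes raw term counts instead of real TF-IDF weights, and the helper names (`to_sparse`, `cosine_similarity`, `euclidean_distance`) are made up for illustration. Each document becomes a sparse dict over one shared vocabulary, so every vector has the same number of dimensions and any missing term counts as 0. What differs between documents is the magnitude of the vector, which cosine similarity ignores and Euclidean distance does not.

```python
import math
from collections import Counter


def to_sparse(text):
    """Map a text to a sparse term-count vector: {term: count}."""
    return dict(Counter(text.lower().split()))


def dot(a, b):
    # Only terms present in both vectors contribute; absent terms count as 0.
    return sum(weight * b[term] for term, weight in a.items() if term in b)


def norm(a):
    # Magnitude (Euclidean length) of the vector.
    return math.sqrt(sum(weight * weight for weight in a.values()))


def cosine_similarity(a, b):
    denom = norm(a) * norm(b)
    return dot(a, b) / denom if denom else 0.0


def euclidean_distance(a, b):
    # Walk the union of terms so that missing components are treated as 0.
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0) - b.get(t, 0)) ** 2 for t in terms))


short = to_sparse("cats chase mice")
repeated = to_sparse("cats chase mice " * 10)  # same words, 10x the counts
other = to_sparse("dogs chase cars")

print(cosine_similarity(short, repeated))   # approximately 1.0
print(euclidean_distance(short, repeated))  # approximately 15.59
print(cosine_similarity(short, other))      # approximately 0.33
print(euclidean_distance(short, other))     # 2.0
```

In this toy data the short document and its tenfold repetition point in the same direction, so cosine similarity treats them as essentially identical, while Euclidean distance puts the short document closer to the unrelated one simply because their magnitudes match.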



