Hacker News new | past | comments | ask | show | jobs | submit login

There's a specialized algorithm specifically for running levenshtein distance over tens of thousands of strings called bitap[1]. I wrote an impl in rust a while back[2].

But your advice still applies! Run time scales with max distance.

[1] https://en.wikipedia.org/wiki/Bitap_algorithm [2] https://github.com/heyimalex/bitap




Interesting. I used a similar trick to compute a lower bound on the levenshtein distance. I was able to skip the real levenshtein distance calculation, if the lower bound already exceeded a threshold ( I used it for clustering ).




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: