Hacker News new | past | comments | ask | show | jobs | submit login

I did the same thing. I was merging our billing db with that of a business we had just bought. Due to the nature of our business, customers could exist in both.

So I had a routine that normalized the address, as best I could anyway, didn't help that it was all just one varchar field. Then implimented Levenshtein Distance, messed with the weighting a bit to fit our particular data and away it went. Saved a bunch of headaches. It wasn't perfect, but it was better than hand matching a couple thousand accounts.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: