
> it won't find anything matching exactly

This assumption is probably wrong. In my line of research (NLP), I find that many things have been said verbatim many, many times over.

You can try this yourself: take any of the many publicly available BigQuery corpora, then group and count substrings of various lengths at different offsets, e.g. character ranges [0-16], [0-32], and [0-64].
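For a rough idea of what that grouping and counting looks like, here is a minimal Python sketch. It runs on a tiny made-up list of sentences rather than a real BigQuery corpus, the function names are my own, and I'm reading the ranges as character offsets from the start of each string.

```python
from collections import Counter


def count_substrings(texts, offset, length):
    """Count how often each fixed-position substring occurs across texts."""
    counts = Counter()
    for text in texts:
        piece = text[offset:offset + length]
        # Skip texts too short to fill the whole window.
        if len(piece) == length:
            counts[piece] += 1
    return counts


def repeated(texts, offset, length):
    """Return (substring, count) pairs seen more than once, most frequent first."""
    counts = count_substrings(texts, offset, length)
    return [(s, n) for s, n in counts.most_common() if n > 1]


# Hypothetical sample data, standing in for a real corpus.
sample = [
    "Thanks for sharing, this is really interesting.",
    "Thanks for sharing, I had no idea.",
    "I think this is a duplicate of an earlier post.",
    "I think this is wrong, actually.",
    "Completely unrelated sentence here.",
]

for length in (16, 32, 64):
    print(length, repeated(sample, 0, length))
```

On a real corpus you would do the same thing with a GROUP BY over a substring expression and sort by count. The longer the window, the fewer exact repeats survive, which is what makes comparing 16, 32, and 64 informative.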



