This approach is probably inadequate. In my line of research (NLP), I find that many things have been said verbatim many, many times over.
You can try this out yourself by grouping and counting strings in any of the many publicly available BigQuery corpora, using various substring lengths and offsets, e.g. the character ranges [0-16], [0-32] and [0-64], and the same lengths taken at other offsets.
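To make the grouping idea concrete, here is a minimal local sketch in Python, not a BigQuery query. It assumes the corpus is simply a list of strings and mimics what a `GROUP BY SUBSTR(text, offset + 1, length)` with `COUNT(*)` would do in SQL (BigQuery's `SUBSTR` is 1-based, Python slicing is 0-based). The function names and the "duplicate rate" metric (share of rows whose substring appears more than once) are hypothetical choices for illustration only.

```python
from collections import Counter
from typing import Iterable


def count_substrings(texts: Iterable[str], offset: int, length: int) -> Counter:
    """Count how often each substring text[offset:offset + length] occurs.

    Strings too short to yield a full-length substring at this offset are
    skipped, so different lengths are compared on complete windows only.
    """
    counts: Counter = Counter()
    for text in texts:
        if len(text) >= offset + length:
            counts[text[offset:offset + length]] += 1
    return counts


def duplicate_rate(texts: Iterable[str], offset: int, length: int) -> float:
    """Hypothetical metric: fraction of counted rows whose substring repeats."""
    counts = count_substrings(texts, offset, length)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps",
        "the quick brown dog sleeps",
        "a completely different line",
    ]
    # Sweep the substring lengths and a few offsets, as suggested above.
    for length in (16, 32, 64):
        for offset in (0, 4, 8):
            print(length, offset, duplicate_rate(corpus, offset, length))
```

On a real corpus, the same sweep would show how quickly exact repeats fall off (or fail to) as the window grows; running it in BigQuery instead simply moves the `Counter` into a `GROUP BY`.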