Sure here it is[0]. This one is for parsing Kindle clippings but it's quite similar (except no sorting of all words, I was usually using some service for mobi to txt switch before feeding script). I should probably clean it up and switch implementation to english (but don't use it anymore).