Hello HN!
I became frustrated with the unpredictable/poor match quality and opaque "relevance scores" in existing fuzzy and fulltext search libs, so I tried something different and this is the result. The main selling point is the result quality / ordering, with best-in-class memory overhead and excellent performance being bonuses. The API is pretty stable at this point, but I'm looking for feedback before committing to 1.0.
TL;DR
The test corpus is a 4MB JSON file with 162k words/phrases, so give it a second for the initial download. You can also drag/drop your own text/JSON corpus into the UI to try it against your own dataset.
Live demo/comparison with a few other libs (there are many more in the codebase, in various states of completion):
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uF...
In isolation for perf assessment:
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uF...
To increase fuzziness and get broader results, try setting intraMax=1 (core) and enable outOfOrder (userland):
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uF...
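For anyone wondering what "userland" out-of-order matching can look like, here is a rough sketch of one possible approach: split the needle into terms, try every ordering of those terms, and keep any haystack entry that matches at least one ordering. This is not uFuzzy's code and may not be how the demo does it. `matchInOrder`, `permutations` and `searchOutOfOrder` are hypothetical names, and `matchInOrder` is a plain substring stand-in for a real in-order fuzzy matcher.

```javascript
// Hypothetical stand-in for an in-order matcher: every term must appear,
// in the given order, somewhere in the string. NOT uFuzzy's algorithm.
function matchInOrder(str, terms) {
  let pos = 0;
  for (const term of terms) {
    const idx = str.indexOf(term, pos);
    if (idx === -1) return false;
    pos = idx + term.length;
  }
  return true;
}

// All orderings of the terms (n! of them, so only sane for short needles).
function permutations(terms) {
  if (terms.length <= 1) return [terms.slice()];
  const out = [];
  terms.forEach((term, i) => {
    const rest = [...terms.slice(0, i), ...terms.slice(i + 1)];
    for (const p of permutations(rest)) out.push([term, ...p]);
  });
  return out;
}

// Returns the haystack indices that match the needle's terms in any order.
function searchOutOfOrder(haystack, needle) {
  const terms = needle.trim().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  const perms = permutations(terms);
  const idxs = [];
  haystack.forEach((str, i) => {
    if (perms.some((p) => matchInOrder(str, p))) idxs.push(i);
  });
  return idxs;
}
```

The number of orderings grows factorially with the term count, so a real implementation would likely cap the number of terms it permutes.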
Also play with the sortPreset selector to swap out the default Array.sort() for one in userland that prioritizes typeahead-ness (the resultset remains identical).
Still TODO:
- Example of stripping diacritics
- Example of using non-latin charsets
- Example of prefix-caching to improve typeahead perf even further
- Example of poor man's document search (matching multiple object properties)
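Until those examples exist, here is a minimal sketch of two of the items above, under stated assumptions: stripping diacritics with standard Unicode normalization, and a "poor man's document search" that joins several object properties into one haystack string per document. None of this is uFuzzy API. `stripDiacritics`, `buildHaystack` and `findDocs` are hypothetical helpers, the sample documents are made up, and the `includes()` call is only a placeholder for whatever matcher you actually use.

```javascript
// Strip diacritics: decompose to NFD, then drop the combining marks.
// Letters without a decomposition (e.g. "ø", "ł") pass through unchanged.
function stripDiacritics(str) {
  return str.normalize("NFD").replace(/\p{M}/gu, "");
}

// Join the chosen properties of each object into one normalized,
// lowercased string, so a single string search covers all of them.
function buildHaystack(docs, keys) {
  return docs.map((doc) =>
    stripDiacritics(keys.map((k) => String(doc[k] ?? "")).join(" ")).toLowerCase()
  );
}

// Made-up sample data for illustration only.
const docs = [
  { title: "Crème brûlée", author: "José" },
  { title: "Apple pie", author: "Zoë" },
];

const haystack = buildHaystack(docs, ["title", "author"]);

// Placeholder matcher: exact substring on the normalized strings.
// Swap in a fuzzy matcher here; haystack[i] always corresponds to docs[i].
function findDocs(needle) {
  const n = stripDiacritics(needle).toLowerCase();
  return docs.filter((_, i) => haystack[i].includes(n));
}
```

Because the haystack keeps the same order as `docs`, any matcher that returns haystack indices can be mapped straight back to the original objects.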
That's all, thanks!
I am also quite frustrated with the current state of full text search in the JavaScript world. All the libs I've tried miss the most basic examples, and their communities seem to ignore it. Will give yours a try, but it already looks much better from the comparison page.
Edit: Nope, your lib doesn't seem to handle substitution well (THE most common type of typo), so yep, we are back to square one ...