
Ok, so how does it work? Which stemming algorithm?

It's not helpful unless we know what it's doing under the covers.




I disagree. For use as a tool, for some use-cases, if it "works as advertised," all we really need to know is whether it's efficient enough for our purposes.

One (horrible) use for something like this is in a framework like Rails, where there's a cultural acceptance of generating method and field names by pluralising, conjugating and so on.

In Rails, if your `House` model has a one-to-many relationship with your `Mouse` model (`has_many :mice`), a `house` object gets a `.mice` association, and Rails singularizes `mice` to find the `Mouse` class. That sort of thing can be done with extensive rule sets, or it can be done with a black-box library.
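To show what "extensive rule sets" means in practice, here is a tiny rule-based pluralizer in Python. It is a hypothetical sketch in the spirit of Rails' inflector, not its actual rules: irregular forms are checked first, then the first matching suffix rule wins. The `IRREGULAR_PLURALS` table and `SUFFIX_RULES` list are illustrative only.

```python
import re

# Hypothetical, tiny rule set in the spirit of Rails' inflector (NOT its actual rules).
# Irregular forms are checked first; otherwise the first matching suffix rule wins.
IRREGULAR_PLURALS = {"mouse": "mice", "person": "people", "child": "children"}

SUFFIX_RULES = [
    (re.compile(r"(x|ch|sh|ss)$"), r"\1es"),   # box -> boxes, church -> churches
    (re.compile(r"([^aeiouy])y$"), r"\1ies"),  # city -> cities (but day -> days)
    (re.compile(r"$"), "s"),                   # default: house -> houses
]


def pluralize(word: str) -> str:
    """Return a plural form of a lowercase English noun using the rules above."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    for pattern, replacement in SUFFIX_RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word, count=1)
    return word  # not reached in practice: the last rule always matches


if __name__ == "__main__":
    for noun in ["mouse", "house", "box", "church", "city", "day"]:
        print(noun, "->", pluralize(noun))
```

The `mouse`/`house` pair is exactly why such rule sets need exception lists: the same `-ouse` ending pluralizes differently, and every new irregular word means another entry.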

Of course it's a horrible use-case, and you'll probably always need to deal with ambiguity and context, but for that sort of thing the implementation details matter far less than "the dimensions of the black box": how quickly does it run, how quickly does it start up, how much memory does it use, how good is it at its job, and which languages does it support (or can it be made to support)?
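Measuring those dimensions doesn't require opening the box. Below is a minimal Python harness sketch, assuming the tool can be wrapped as a `setup()` function (load rules, dictionaries, models) plus a `run(tool, word)` call, and that you have a small hand-labelled sample. `measure_black_box` and its report keys are hypothetical names. `tracemalloc` only sees Python-level allocations, so a native or JavaScript library would need OS-level tools for memory, and the timings include a little tracing overhead.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict, Iterable, Tuple


def measure_black_box(
    setup: Callable[[], Any],
    run: Callable[[Any, str], str],
    labeled: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Hypothetical harness: time start-up, time a batch of calls,
    record peak Python-level memory and score outputs against gold labels."""
    pairs = list(labeled)
    tracemalloc.start()
    try:
        t0 = time.perf_counter()
        tool = setup()  # start-up cost: loading rules, dictionaries, models
        startup_s = time.perf_counter() - t0

        t1 = time.perf_counter()
        outputs = [run(tool, word) for word, _ in pairs]
        run_s = time.perf_counter() - t1

        _, peak_bytes = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()

    correct = sum(out == gold for out, (_, gold) in zip(outputs, pairs))
    n = max(len(pairs), 1)
    return {
        "startup_s": startup_s,
        "per_call_s": run_s / n,
        "peak_bytes": float(peak_bytes),
        "accuracy": correct / n,  # "how good is it at its job"
    }


if __name__ == "__main__":
    gold = [("mice", "mouse"), ("houses", "house"), ("ran", "run")]
    # A toy lookup-table "lemmatizer" standing in for the real black box.
    report = measure_black_box(
        setup=lambda: {"mice": "mouse", "houses": "house"},
        run=lambda table, word: table.get(word, word),
        labeled=gold,
    )
    print(report)
```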


The lemmatizer is an adaptation of https://wordnet.princeton.edu/documentation/morphy7wn. It reduces verbs and adjectives to their base forms and converts plural nouns to their singular form. The wink-pos-tagger (https://github.com/winkjs/wink-pos-tagger) uses the lemmatizer to automatically find the lemma of each word according to its part of speech.
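To make the morphy reference concrete: morphy(7WN) describes checking an exception list first, then trying part-of-speech-specific suffix "detachment" rules and keeping a candidate only if it exists in WordNet. Here is a simplified Python sketch of that idea, not the wink lemmatizer's actual code. The detachment rules are the ones listed on the morphy(7WN) page; the tiny `EXCEPTIONS` and `LEXICON` tables are hypothetical stand-ins for WordNet's exception files and index. Real morphy can return several base forms and handles collocations; this sketch returns the first match or `None`.

```python
from typing import Optional

# Hypothetical stand-ins for WordNet's exception lists (noun.exc, verb.exc, adj.exc).
EXCEPTIONS = {
    "noun": {"mice": "mouse", "children": "child"},
    "verb": {"ran": "run", "went": "go"},
    "adj": {"better": "good", "best": "good"},
}

# Detachment rules per part of speech, as listed in morphy(7WN): (suffix, ending).
DETACH = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "adj": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
}

# Hypothetical miniature lexicon of base forms (the real one is WordNet itself).
LEXICON = {
    "noun": {"house", "mouse", "box", "church", "woman", "city", "bus", "child"},
    "verb": {"run", "go", "bake", "walk", "carry"},
    "adj": {"good", "big", "large", "fast"},
}


def morphy(word: str, pos: str) -> Optional[str]:
    """Simplified morphy-style lemmatizer: exceptions, then suffix rules checked against the lexicon."""
    word = word.lower()
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    if word in LEXICON[pos]:
        return word
    for suffix, ending in DETACH[pos]:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + ending
            if candidate in LEXICON[pos]:  # keep only forms the lexicon knows
                return candidate
    return None


if __name__ == "__main__":
    print(morphy("churches", "noun"), morphy("baking", "verb"), morphy("better", "adj"))
```

Note how `bigger` fails here (`bigg` and `bigge` are not in the lexicon); that is why the real WordNet adjective exception list carries such doubled-consonant forms.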

For stemming, we have wink-porter2-stemmer (https://github.com/winkjs/wink-porter2-stemmer), which implements the Porter2 stemming algorithm (Porter Stemmer V2, also known as the English Snowball stemmer) by Dr Martin F Porter.
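For anyone curious what Porter2 does under the covers: it defines regions of the word called R1 and R2 and strips suffixes in ordered steps, some of which only fire inside those regions. Here is a partial Python sketch of just the opening pieces (marking consonant `y`s, computing R1/R2, and Step 1a), written from the published description of the algorithm rather than from wink-porter2-stemmer's source. Steps 0 and 1b through 5 are omitted, so this is not a usable stemmer, and the function names are illustrative. It uses the three exceptional R1 prefixes from the original description (`gener`, `commun`, `arsen`); later Snowball revisions may list more.

```python
from typing import Tuple

VOWELS = set("aeiouy")  # an uppercase 'Y' (a y marked as a consonant) is deliberately excluded
EXCEPTIONAL_R1_PREFIXES = ("gener", "commun", "arsen")


def mark_consonant_ys(word: str) -> str:
    """Set an initial y, or a y right after a vowel, to 'Y' so it counts as a consonant."""
    chars = list(word)
    for i, c in enumerate(chars):
        if c == "y" and (i == 0 or chars[i - 1] in VOWELS):
            chars[i] = "Y"
    return "".join(chars)


def _region_after(word: str, start: int) -> int:
    """Index just after the first non-vowel that follows a vowel, both at or after `start`."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)  # region is empty


def regions(word: str) -> Tuple[int, int]:
    """Start indices of R1 and R2; a value of len(word) means that region is empty."""
    r1 = next((len(p) for p in EXCEPTIONAL_R1_PREFIXES if word.startswith(p)), None)
    if r1 is None:
        r1 = _region_after(word, 0)
    r2 = _region_after(word, r1)  # R2 is computed the same way, but inside R1
    return r1, r2


def step1a(word: str) -> str:
    """Step 1a: act on the longest matching suffix among sses, ied, ies, us, ss, s."""
    if word.endswith("sses"):
        return word[:-2]  # sses -> ss
    if word.endswith(("ied", "ies")):
        # -> i if preceded by more than one letter, else -> ie (cries -> cri, ties -> tie)
        return word[:-2] if len(word) > 4 else word[:-1]
    if word.endswith(("us", "ss")):
        return word  # do nothing
    if word.endswith("s"):
        # delete the s if the part before it has a vowel that is not immediately before the s
        return word[:-1] if any(c in VOWELS for c in word[:-2]) else word
    return word


def demo_step1a(word: str) -> str:
    """Run only the y-marking and Step 1a, then lowercase any 'Y' back to 'y'."""
    return step1a(mark_consonant_ys(word.lower())).lower()


if __name__ == "__main__":
    w = "beautiful"
    r1, r2 = regions(w)
    print(w[r1:], w[r2:])  # the R1 and R2 regions of "beautiful"
    print([demo_step1a(x) for x in ["caresses", "ties", "cries", "gas", "gaps", "kiwis"]])
```

If you want the complete algorithm in Python to compare against, NLTK's `SnowballStemmer("english")` implements Porter2.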



