
With all the eyeballs on word2vec and Gensim, how did this not get picked up before?



Gensim treats the `word2vec.c` code from the original authors of the word2vec paper as canonical and seeks to match its behavior exactly, even where it deviates from some interpretations of the paper.

If there's an actual benefit to be had here, Gensim could add it as an option, but it would likely keep defaulting to the same CBOW behavior as `word2vec.c` (and, similarly, FastText) rather than this 'koan' variant.



While I still need to read this paper in detail, I'm not sure their only change is to this scaling of the update.

The `koan` CBOW change has mixed effects on benchmarks, and it makes their implementation no longer match the canonical `word2vec.c` release from the original Google authors of the word2vec paper (or, by my understanding, the CBOW mode of the FastText code).

So all the reasoning in that issue for why Gensim didn't want to make any change still stands. Of course, if there's an alternate mode that offers proven benefits, it'd be a welcome suggestion/addition. (At this point, it's possible that simply using the `cbow_mean=0` sum-rather-than-average mode, or a different starting `alpha`, matches any claimed benefits of the koan CBOW variant.)
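
For anyone who wants to test that themselves, here's a minimal sketch of trying both Gensim modes side by side. The corpus and parameter values are toy placeholders, not a recommendation; a real comparison would use the benchmarks from the paper.

    # Minimal sketch: CBOW with averaged vs. summed context vectors in Gensim.
    from gensim.models import Word2Vec

    sentences = [["the", "quick", "brown", "fox"],
                 ["jumps", "over", "the", "lazy", "dog"]]  # toy corpus

    # Default CBOW: context vectors are averaged (cbow_mean=1).
    avg_model = Word2Vec(sentences, sg=0, cbow_mean=1, vector_size=100,
                         window=5, min_count=1, alpha=0.025, epochs=5)

    # Sum-rather-than-average CBOW: cbow_mean=0, here with a smaller alpha
    # since the summed context tends to produce larger updates.
    sum_model = Word2Vec(sentences, sg=0, cbow_mean=0, vector_size=100,
                         window=5, min_count=1, alpha=0.0125, epochs=5)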


The paper itself says the only change is normalizing by the context window size C.
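
Concretely, as I read it, the difference is whether the accumulated error is scaled back down by C before being added to each context vector. A rough sketch of one negative-sampling CBOW step (variable names are mine, not from either codebase):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cbow_step(syn0, syn1neg, context_ids, target_id, neg_ids, lr,
                  divide_by_C=False):
        """One CBOW negative-sampling update.

        divide_by_C=False mimics the word2vec.c behavior (the full accumulated
        gradient is added to every context vector); divide_by_C=True mimics
        the koan-style correction, scaling that update by 1/C.
        """
        C = len(context_ids)
        h = syn0[context_ids].mean(axis=0)   # forward pass: mean of context vectors
        neu1e = np.zeros_like(h)             # accumulated error for the context

        for wid, label in [(target_id, 1.0)] + [(n, 0.0) for n in neg_ids]:
            score = sigmoid(np.dot(h, syn1neg[wid]))
            g = (label - score) * lr
            neu1e += g * syn1neg[wid]
            syn1neg[wid] += g * h

        if divide_by_C:
            neu1e /= C                       # the paper's normalization by C
        for cid in context_ids:
            syn0[cid] += neu1e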


Ah, but I've now looked at their code, and it's not the only change! They've also eliminated the `reduced_window` method of weighting-by-distance that's present in `word2vec.c`, Gensim, and FastText.
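
The `reduced_window` trick is easy to miss: instead of explicit distance weights, `word2vec.c` draws a random shrunken window per target position, so nearer words are simply sampled more often. Roughly (a paraphrase in Python, not the actual C):

    import random

    def sample_context(sentence, pos, window):
        # word2vec.c-style dynamic window: draw b uniformly in [0, window),
        # then use an effective window of (window - b). Over many draws, a
        # word at distance d is included with probability (window - d + 1) /
        # window, so closer words are implicitly weighted more heavily.
        b = random.randint(0, window - 1)
        eff = window - b
        left = max(0, pos - eff)
        right = min(len(sentence), pos + eff + 1)
        return [sentence[i] for i in range(left, right) if i != pos]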

What if that's the real reason for their sometimes slightly-better, sometimes slightly-worse performance on some benchmarks? Perhaps there are other changes, too.

This is why I continue to think Gensim's policy of matching the reference implementations from the original authors, at least by default, is usually the right one, rather than adopting an alternate interpretation of the often-underspecified papers.


The word2vec implementation has many details that are unmentioned, or at least not emphasized much, in the paper. The source is also sparsely commented, if memory serves.

Here's another paper that's essentially about these kinds of implementation details in word2vec and GloVe and their effects on the results:

Improving Distributional Similarity with Lessons Learned from Word Embeddings: https://www.aclweb.org/anthology/Q15-1016/



