Bag of Tricks for Efficient Text Classification:
Enriching Word Vectors with Subword Information:
Both fantastic papers. For those who aren't aware, Mikolov also helped create word2vec.
One curious thing: this seems to use hierarchical softmax instead of the "negative sampling" described in their earlier paper, even though that paper reports that "negative sampling" is more computationally efficient and of similar quality. Anyone know why that might be?
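For a rough sense of the cost gap in question, here is a back-of-the-envelope sketch. It is not taken from either paper. It assumes a balanced binary tree for hierarchical softmax (real implementations typically use a Huffman tree, so frequent outputs get shorter paths) and k sampled negatives per example. The function names and the output-space size are made up for illustration.

```python
def hs_dot_products(num_outputs: int) -> int:
    """Binary decisions on one root-to-leaf path of a balanced binary tree.

    Assumption: balanced tree, so the depth is ceil(log2(num_outputs)).
    """
    if num_outputs < 1:
        raise ValueError("num_outputs must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float error.
    return (num_outputs - 1).bit_length()


def ns_dot_products(num_negatives: int) -> int:
    """One score for the true target plus one per sampled negative."""
    if num_negatives < 0:
        raise ValueError("num_negatives must be non-negative")
    return num_negatives + 1


if __name__ == "__main__":
    num_outputs = 100_000  # hypothetical output-space size
    print("hierarchical softmax:", hs_dot_products(num_outputs))
    for k in (5, 15):  # illustrative numbers of negatives
        print(f"negative sampling (k={k}):", ns_dot_products(k))
```

This only counts output-layer dot products per training example. It says nothing about accuracy, or about whether a normalized probability over all outputs is needed at prediction time.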