Slightly tangential, but does anyone know if word2vec vectors can be used in a compound form to build up "concepts"? I'm interested to know whether that could be used to identify parallelism in works of literature, e.g. detecting plagiarism, parallels between the Old and New Testament, or intertextual works like Joyce's Ulysses and the Odyssey.

One thing we found while working on this project is that converting word vectors into a paragraph vector is a problem in its own right. Apparently there are many ways to do it, and each one yields different results depending on the length and content of the text. We used a weighted average of the word vectors, with weights based on each word's frequency in a corpus.
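
For anyone who wants to try the frequency-weighted average, here is a minimal sketch in plain Python. The exact weighting we used isn't spelled out above, so the a / (a + p(w)) down-weighting of frequent words, the constant a, and the names paragraph_vector and cosine are all assumptions for illustration. The vectors and counts are toy values, not real word2vec output.

```python
import math

def paragraph_vector(tokens, word_vectors, word_freq, a=1e-3):
    """Weighted average of word vectors for one paragraph.

    Assumed weighting: w = a / (a + p(word)), where p(word) is the word's
    relative frequency in the corpus, so very common words count less.
    """
    total = sum(word_freq.values())
    dim = len(next(iter(word_vectors.values())))
    acc = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in word_vectors:
            continue  # skip out-of-vocabulary words
        p = word_freq.get(tok, 0) / total
        w = a / (a + p)
        for i, x in enumerate(word_vectors[tok]):
            acc[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return acc  # no known words: zero vector
    return [x / weight_sum for x in acc]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

# Toy 2-d vectors and corpus counts, made up for illustration.
toy_vectors = {"the": [0.0, 1.0], "sea": [1.0, 0.0], "ocean": [0.9, 0.1]}
toy_counts = {"the": 990, "sea": 5, "ocean": 5}

p1 = paragraph_vector(["the", "sea"], toy_vectors, toy_counts)
p2 = paragraph_vector(["the", "ocean"], toy_vectors, toy_counts)
similarity = cosine(p1, p2)
```

Comparing paragraphs from two texts this way is one crude route to the parallelism question upthread, though as noted the results shift with text length and with whichever weighting you pick.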

