
Latent semantic mapping is a technique that takes a large number of text documents, maps them to term-frequency vectors (vector-space semantics), and reduces those vectors to a lower-dimensional semantic space. Distances in that space then let you estimate how similar in meaning different documents are. You can use this for a variety of tasks.
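To make the pipeline concrete, here is a minimal sketch in plain Python. It is not Apple's LSM framework API: the toy corpus and all function names are made up for illustration, and a simple power iteration stands in for a proper truncated SVD. It builds term-count vectors, finds the top two latent directions, and compares documents in both spaces.

```python
import math
import random
from collections import Counter

def term_frequency_rows(docs):
    """Map each document to a term-count vector over a shared, sorted vocabulary."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for words in tokenized for w in words})
    rows = []
    for words in tokenized:
        counts = Counter(words)
        rows.append([float(counts[w]) for w in vocab])
    return vocab, rows

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def top_directions(rows, k, iters=200, seed=0):
    """Approximate the top-k right singular vectors of the document-term
    matrix A by power iteration on A^T A, deflating directions already found."""
    rng = random.Random(seed)
    n_terms = len(rows[0])
    found = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_terms)]
        for _step in range(iters):
            scores = [dot(r, v) for r in rows]                  # A v
            v = [sum(s * r[i] for s, r in zip(scores, rows))    # A^T (A v)
                 for i in range(n_terms)]
            for u in found:                                     # remove earlier directions
                c = dot(v, u)
                v = [x - c * y for x, y in zip(v, u)]
            norm = math.sqrt(dot(v, v)) or 1.0
            v = [x / norm for x in v]
        found.append(v)
    return found

def project(row, directions):
    """Coordinates of a term-count vector in the latent space."""
    return [dot(row, u) for u in directions]

# Toy corpus (assumption): documents 0 and 1 share no words,
# but document 2 links their vocabularies.
docs = ["cat purr", "kitten meow", "cat kitten",
        "stocks fell", "markets rallied", "stocks markets"]
vocab, rows = term_frequency_rows(docs)
directions = top_directions(rows, k=2)
latent = [project(r, directions) for r in rows]

print("raw similarity, doc 0 vs doc 1:   ", round(cosine(rows[0], rows[1]), 3))
print("latent similarity, doc 0 vs doc 1:", round(cosine(latent[0], latent[1]), 3))
print("latent similarity, doc 0 vs doc 3:", round(cosine(latent[0], latent[3]), 3))
```

In the raw term space the first two documents look unrelated because they have no words in common. In the two-dimensional latent space they land on the same direction, while the finance documents sit on the other one. A real system would use a library SVD and some term weighting rather than raw counts.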

Wikipedia: Latent Semantic Mapping http://en.wikipedia.org/wiki/Latent_semantic_mapping

WWDC 2011 talk, now available: "Latent semantic mapping: exposing the meaning behind words and documents" https://developer.apple.com/videos/wwdc/2011/




I would be really interested in some use cases. The examples they give are fairly limited.


Classification (e.g. spam detection) and document categorization, as well as clustering similar documents.

You can do all these tasks in the original document space, instead of in the latent space, but the advantage of the latent space is that it can capture patterns across the entire corpus. Because the latent space is learned without any labels, this step is a form of unsupervised learning.

In particular, if I have only 100 training examples (e.g. 10 examples of spam and 90 examples of ham), I will learn a better classifier if I first use LSM and then train my classifier, than if I train my classifier over the original documents. In the former case, unsupervised learning detects patterns over the entire corpus, which I use to discriminate between spam and ham. In the latter case, I can only use features from the 100 labeled documents, so it is more difficult to generalize.
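A toy sketch of that argument, under loud assumptions: the eight-document corpus is invented, only two documents carry labels, the "classifier" is just nearest labeled example by cosine similarity, and power iteration stands in for a real SVD. All names here are hypothetical.

```python
import math
import random
from collections import Counter

def counts(text, vocab):
    c = Counter(text.lower().split())
    return [float(c[w]) for w in vocab]      # words outside the vocabulary are ignored

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def top_directions(rows, k, iters=200, seed=0):
    """Top-k latent directions via power iteration on A^T A with deflation."""
    rng = random.Random(seed)
    n_terms = len(rows[0])
    found = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_terms)]
        for _step in range(iters):
            scores = [dot(r, v) for r in rows]
            v = [sum(s * r[i] for s, r in zip(scores, rows)) for i in range(n_terms)]
            for u in found:
                c = dot(v, u)
                v = [x - c * y for x, y in zip(v, u)]
            norm = math.sqrt(dot(v, v)) or 1.0
            v = [x / norm for x in v]
        found.append(v)
    return found

# Toy corpus (assumption). Only documents 0 and 4 are labeled.
corpus = ["viagra pills", "cheap viagra", "cheap offer", "offer pills",
          "meeting agenda", "lunch meeting", "lunch notes", "notes agenda"]
labeled = {0: "spam", 4: "ham"}

vocab = sorted({w for d in corpus for w in d.split()})
rows = [counts(d, vocab) for d in corpus]
directions = top_directions(rows, k=2)       # learned from all 8 documents, no labels used

def embed_raw(text):
    return counts(text, vocab)

def embed_latent(text):
    v = counts(text, vocab)
    return [dot(v, u) for u in directions]

def similarities(text, embed):
    """Cosine similarity of text to each labeled example under the given embedding."""
    q = embed(text)
    return {label: cosine(q, embed(corpus[i])) for i, label in labeled.items()}

def classify(text, embed):
    sims = similarities(text, embed)
    return max(sims, key=sims.get)

print("raw:   ", similarities("cheap offer", embed_raw))
print("latent:", similarities("cheap offer", embed_latent))
print("latent label:", classify("cheap offer", embed_latent))
```

"cheap offer" shares no words with either labeled example, so the raw term vectors give the classifier nothing to work with. The latent directions were fit on the unlabeled documents too, which tie "cheap" and "offer" to "viagra" and "pills", so the two labels are enough to separate the message.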

More examples:

* What language is this document?

* Is this document about sports?

* Is this news article similar to 50 news articles that I previously marked as "highly interesting"?


Well, for example, how about sorting out a lot of PDF documents I have in a folder called papers/? I do use Mendeley now, but there are some leftovers from before that I really don't want to sit and sort through (not to mention that I probably have multiple copies of some of them).


Curriculum review committees could reduce redundancy and fill gaps by reviewing course documents.



