Natural Language Corpus Data: Beautiful Data (2009) (norvig.com)
53 points by zerojames 12 months ago | 3 comments



Related:

Natural Language Corpus Data: Beautiful Data - https://news.ycombinator.com/item?id=13197612 - Dec 2016 (13 comments)

Natural Language Corpus Data (2009) [pdf] - https://news.ycombinator.com/item?id=6411711 - Sept 2013 (3 comments)

Natural Language Corpus Data: Beautiful Data - https://news.ycombinator.com/item?id=1483187 - July 2010 (1 comment)


I clicked on count_1w.txt, scrolled to the bottom, and found a lot of what seem to be misspellings of Google. Then I clicked on count_2w.txt and did the same, and regretted doing so.
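For anyone who would rather not scroll, here is a minimal sketch of pulling the lowest-count entries out of the file. It assumes each line has the form word, tab, integer count, and that you have a local copy saved as count_1w.txt. The helper names `parse_counts` and `rarest` are made up for illustration, and the sample lines are invented, not taken from the dataset.

```python
import heapq

def parse_counts(lines):
    """Yield (word, count) pairs from 'word<TAB>count' lines (assumed format).

    Lines that do not match that shape are skipped.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        yield parts[0], int(parts[1])

def rarest(lines, n=20):
    """Return the n entries with the smallest counts, lowest first."""
    return heapq.nsmallest(n, parse_counts(lines), key=lambda wc: wc[1])

# Invented sample lines standing in for the real file.
sample = ["the\t100\n", "of\t50\n", "googel\t2\n", "bad line\n", "gooogle\t1\n"]
print(rarest(sample, 2))

# With a local copy of the file (path is an assumption):
# with open("count_1w.txt", encoding="utf-8") as f:
#     for word, count in rarest(f, 20):
#         print(word, count)
```

Using `heapq.nsmallest` avoids sorting the whole file, and it does not depend on the file already being ordered by count.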


I have been building a word game and came across these datasets. I find the range of words delightfully quirky, and it may be useful in a game.

One thing I noticed about count_1w.txt is there are brand names like Starbucks in there.



