
The graph axis isn't the only problem. The word "toast" did not drop in usage by 40%; rather, Google's dataset shifted dramatically toward a different genre than it was previously composed of. I've been in conversations with people trying to explain those drops in the 70s, and no one (myself included) realized that the data had such a dramatic flaw.



That’s fair. The article has a very valid point, which would be even stronger without its misreading of the plots it’s critiquing, whether that was accidental or intentional. I always thought Ngrams was weird too; I remember thinking in the past that some of the dramatic shifts it shows were unlikely.


Is there no way to filter out particular data sets? This seems like a pretty huge limitation.


Sort of, but it's pretty blunt. You can choose among a few different English corpora, but it's basically fiction versus everything, with nothing finer-grained than that.



