
Because unlike the authors of this set, who went and stripped usernames and permalinks out of the posts to anonymize it, that set you mention just grabbed data out of the API as-is (at least based on what's left of its Hugging Face description).

That's the difference.




Just a reminder that anonymization is much harder than merely removing metadata:

Every time I hear "anonymous data", I think of that time AOL published anonymized search logs (for academic research). The anonymization was negligent, and an NYT reporter de-anonymized and tracked down one of the users with the local & personal info present in the search queries.

https://en.wikipedia.org/wiki/AOL_search_log_release

https://web.archive.org/web/20130404175032/http://www.nytime...
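
To make the point concrete, here's a rough sketch of why dropping the username and permalink fields isn't enough on its own. The record layout, field names, and sample text are all made up for illustration (not any real dataset's or API's schema), and the "clue" check is a toy keyword match, not an actual re-identification method:

```python
import re

# Hypothetical post record; field names and contents are assumptions for illustration.
post = {
    "username": "alice123",
    "permalink": "https://example.com/item?id=1",
    "text": "I'm a dentist in Smallville and my dog is called Rex.",
}

# The explicit identifier fields that "metadata-only" anonymization removes.
METADATA_FIELDS = {"username", "permalink"}


def strip_metadata(record):
    """Drop the explicit identifier fields and keep everything else untouched."""
    return {k: v for k, v in record.items() if k not in METADATA_FIELDS}


def leftover_clues(text, known_terms):
    """Return the terms from known_terms that still appear in the free text."""
    return [
        term
        for term in known_terms
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
    ]


anonymized = strip_metadata(post)

# Someone who already knows a few facts about a person can still match on the body text.
clues = leftover_clues(anonymized["text"], ["dentist", "Smallville", "Rex"])
print(anonymized.keys())
print(clues)
```

The metadata is gone, but the profession, town, and pet name in the body text are all still there, which is the same kind of leftover that made the AOL logs traceable.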



