
People here seem to attack only the argument that "anyone can write a blog", but I was expecting HN to also give some thought to his first point. To me it reads something like: "publish all your data out in the open for everybody to see".

I find it extremely problematic. Sure, sharing data is necessary to verify that the conclusions are supported by it and they are not due to methodological errors. But to anyone? Out in the open? I would expect better data ethics, especially from a psychologist.

Your data can contain political opinions, health records, sexual orientation, contact information, and the places a person has visited and when. People usually sign up for these studies on the understanding that information that can harm them will not be freely shared, except with people involved in studies with similar data protection systems. Data like this sometimes has to be stored on computers not connected to the Internet, to reduce the risk of a leak. Free access to this data paves the way to persecution and shaming.

If I were to sign up as a subject in a psychology study and found that a person with his ideas was leading it, I'd withdraw immediately. I'd question whether this person should be a psychology researcher at all. Sharing data is good, but protocols are there for a reason.




This is obviously a problem that will be difficult to solve for social and medical sciences, but surely for the life and engineering sciences a general stance towards openness would be beneficial to the disciplines?


Sure, I can concede that, and by the way I still maintain that data sharing, done right, belongs in the social and medical sciences too.

However, he put a warning about his field in the introduction of his post, as if he expects his arguments to apply especially to experimental psychology.


Personally identifiable details can be removed from data.


There is a seminal paper showing how carelessly adopting this point of view can lead to disasters: "87% of the U.S. population is uniquely identified by date of birth, gender, postal code" [1]

In many other cases, as pointed out, it's just not possible. I'm studying mobility patterns through cellphone metadata. Even if you replace the actual phone number with a random ID, you still know where a person is going, and can thus re-identify them if you have other public data.

[1] https://en.wikipedia.org/wiki/Latanya_Sweeney
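To make the quasi-identifier point concrete, here is a minimal sketch that measures what share of records are unique on a given combination of fields. The function name `fraction_unique` and the toy records are made up for illustration; they are not taken from the cited work, and the numbers say nothing about real populations.

```python
from collections import Counter

def fraction_unique(records, keys):
    """Share of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Hypothetical toy records with names already removed.
records = [
    {"dob": "1980-01-02", "gender": "F", "zip": "02138"},
    {"dob": "1980-01-02", "gender": "M", "zip": "02138"},
    {"dob": "1975-06-30", "gender": "F", "zip": "02139"},
    {"dob": "1975-06-30", "gender": "F", "zip": "02139"},
]

# No single field isolates anyone here, but the combination does.
print(fraction_unique(records, ["zip"]))
print(fraction_unique(records, ["dob", "gender", "zip"]))
```

Anyone holding a second dataset that pairs names with the same fields could join on them, which is why dropping the name column alone does not anonymize the data.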


full name + date of birth + place of birth.

That leaves about 500 non unique individuals in a country of > 50 million inhabitants.


The very depressing opposite conclusion was discovered in the late 1970s:

DE Denning, PJ Denning, M Schwartz, "The tracker: a threat to statistical database security", in ACM Transactions on Database Systems v 4 no 1 (1979) pp 76–96

A general tracker can always be found, unless the data released is extremely restricted. Almost anything is personally identifiable as it can be used to build a tracker into the database.

I am aware of this result from chapter 9 of Security Engineering (http://www.cl.cam.ac.uk/~rja14/book.html) by Ross Anderson, if you are more generally interested.
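A rough illustration of the idea, under my own assumptions: the `StatDB` class, its minimum query-set size `k`, and the toy rows below are all hypothetical, and this is only the simplest difference-style tracker, not the general tracker construction from the paper. The database refuses any query that matches too few or too many rows, yet two permitted queries still leak one person's value.

```python
class StatDB:
    """Toy statistical database: only answers aggregates over 'safe' query sets."""

    def __init__(self, rows, k):
        self.rows, self.k = rows, k

    def _query_set(self, pred):
        matched = [r for r in self.rows if pred(r)]
        n = len(self.rows)
        # Refuse query sets that are too small or too large.
        if not (self.k <= len(matched) <= n - self.k):
            raise PermissionError("query set size not allowed")
        return matched

    def sum(self, pred, field):
        return sum(r[field] for r in self._query_set(pred))

# Hypothetical data; the target is the only woman in the CS department.
rows = [
    {"dept": "CS", "sex": "F", "salary": 90},
    {"dept": "CS", "sex": "M", "salary": 70},
    {"dept": "CS", "sex": "M", "salary": 80},
    {"dept": "CS", "sex": "M", "salary": 60},
    {"dept": "EE", "sex": "F", "salary": 75},
    {"dept": "EE", "sex": "M", "salary": 65},
    {"dept": "EE", "sex": "F", "salary": 85},
    {"dept": "EE", "sex": "M", "salary": 55},
]
db = StatDB(rows, k=2)

def in_cs(r):
    return r["dept"] == "CS"

def tracker(r):
    # Everyone in CS except the target's group: an allowed query set.
    return r["dept"] == "CS" and r["sex"] != "F"

# Asking about the target directly is refused...
try:
    db.sum(lambda r: in_cs(r) and r["sex"] == "F", "salary")
    direct_refused = False
except PermissionError:
    direct_refused = True

# ...but the difference of two allowed answers reveals her salary.
leaked = db.sum(in_cs, "salary") - db.sum(tracker, "salary")
print(direct_refused, leaked)
```

The size restriction looks protective, but it only blocks the direct question; padding the query with extra rows and subtracting them again gets around it.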


Actually, they usually cannot, especially if you still want to be able to do research on the data (which is the motivation behind differential privacy).
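For readers who have not met differential privacy, a minimal sketch of one standard building block, the Laplace mechanism applied to a counting query, might look like this. The function names, the toy rows, and the choice of `epsilon` are illustrative assumptions, and a real deployment would also need privacy budget accounting, which is omitted here.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(rows, pred, epsilon, rng=random):
    """Noisy count; a counting query changes by at most 1 per person,
    so Laplace noise with scale 1/epsilon is used."""
    true_count = sum(1 for r in rows if pred(r))
    return true_count + laplace_noise(1 / epsilon, rng)

# Hypothetical study data: three of five participants have the condition.
rows = [{"has_condition": c} for c in (True, True, True, False, False)]

rng = random.Random(0)
noisy = private_count(rows, lambda r: r["has_condition"], epsilon=1.0, rng=rng)
print(noisy)
```

Each individual answer is blurred enough that one person's presence is hard to infer, while averages over many people remain useful. The price is that every released answer spends some of the privacy budget.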



