Hacker News
Unique in the Crowd: The privacy bounds of human mobility (nature.com)
104 points by lemming on June 12, 2013 | 19 comments



Somewhat relevant: My brother attended a conference where a speaker said that human gait is so distinctive that an individual can be identified with 100% certainty using the accelerometer of the phone in their pocket.

EDIT: Found a paper on the subject from 2009: http://www.cs.yale.edu/homes/mfn3/pub/mfn_gait_id.pdf

EDIT 2: Found another paper from 2012 claiming 99.4% accuracy: http://users.ece.cmu.edu/~juefeix/btas_2012_felix.pdf
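The papers above use their own, much richer feature pipelines. As a rough illustration of the general idea only, the sketch below estimates a step period from an accelerometer magnitude signal via autocorrelation and matches it against enrolled profiles by nearest neighbour. The synthetic signal, the 50 Hz sample rate, the two-number "signature" (step period plus signal spread) and all function names are hypothetical assumptions, not taken from either paper.

```python
import math
from statistics import mean, pstdev

def autocorr(x, lag):
    """Autocorrelation of a zero-mean series at `lag`, normalised by lag-0 energy."""
    den = sum(v * v for v in x)
    num = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
    return num / den if den else 0.0

def gait_features(magnitudes, rate_hz, min_period_s=0.3, max_period_s=2.0):
    """Toy gait signature: (dominant step period in seconds, signal standard deviation)."""
    m = mean(magnitudes)
    x = [v - m for v in magnitudes]
    lo = int(min_period_s * rate_hz)
    hi = min(int(max_period_s * rate_hz), len(x) - 1)
    # The lag with the strongest self-similarity is taken as the step period.
    best_lag = max(range(lo, hi + 1), key=lambda lag: autocorr(x, lag))
    return (best_lag / rate_hz, pstdev(magnitudes))

def identify(features, enrolled):
    """Nearest-neighbour match of a feature tuple against {name: features} profiles."""
    return min(enrolled, key=lambda name: math.dist(features, enrolled[name]))

def synth_walk(period_s, amplitude, rate_hz=50, seconds=6):
    """Hypothetical walking signal: gravity plus one sinusoidal step component."""
    samples_per_step = period_s * rate_hz
    return [9.81 + amplitude * math.sin(2 * math.pi * t / samples_per_step)
            for t in range(int(rate_hz * seconds))]

enrolled = {
    "alice": gait_features(synth_walk(1.0, 2.0), 50),
    "bob": gait_features(synth_walk(0.5, 1.0), 50),
}
print(identify(gait_features(synth_walk(0.5, 1.1), 50), enrolled))  # bob
```

Real accelerometer data is noisy and multi-axis, so published systems rely on many more features and trained classifiers; this only shows why a periodic walking signal can act as a signature.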


I worked in a lab at university that had several projects using gait analysis. It is very identifying, but much harder to collect than this kind of data -- you need a physical device on the target.


Still, it could be very useful for intelligence agencies. If they have identified that a phone is being used by a terrorist they can identify which terrorist is carrying it if they're able to get access to its accelerometer. Or they could gain access to millions of phones' accelerometers and identify which ones are being used by known terrorists. At the very least I expect to see it in some spy movies in the next few years.


Given the recent leaks, I think we should start using "terrorist" in quotes, at least when referring to NSA "terrorists".


Don't most people carry a mobile with them?


Yes, but I don't think the phone companies save your accelerometer data like they save your phone call metadata. You'd have to get access to the phone's accelerometer somehow.


I've always assumed that, given the sheer and ever-accelerating quantity of data produced by people in the 21st century, the similarly increasing collectability of that data, and the massive benefits stemming from making all data more rather than less accessible, algorithmic systems capable of knowing practically everything about everyone are utterly inevitable. Attempts at camouflage are hopeless. Data collection from every avenue is only accelerating (imagine Glass, Kinect, Leap, Streetview and Fitbit recording everything, everywhere, across the whole globe, 24/7) and even the absence of a signal is a trace.

The main flipside is that I see no reason why this power should be restricted to any one sector of society, although it will flow first from those sectors which can sustain the most focus and wield the most resources (as we see currently with the primary use of these systems by governments and large corporations). So the flipside is, okay, maybe the pattern of mouseclicks, touchscreen interactions, body movements and physiological signals of incipient terrorists can be identified, but maybe so can individuals planning government fraud or cronyism. Maybe the positive and negative traces left by surveillance agencies can also be detected and wielded against them. Code has no loyalty.

In other words, equilibrium of social power will be reached not by trying to prevent these possibilities from being explored, but by following simple economic logic and endeavouring to make use of them yourself. I don't see this as the death of anything, more as a new and inevitable frontier, a radically new state of play with massive rewards open to individuals willing to relinquish the old paradigm and embrace the new.


Just imagine the uses:

- the govt could build a really detailed voter database; they could infer a person's political leanings from the web pages they browse, their FB and Twitter feeds, or an analysis of their email and phone contacts. This list could be used to improve the efficiency of "get out the vote" campaigns and donation drives

- the govt could run a "graph rank" or "page rank" algo over the network of interconnections to determine the influencers; then, in a sensitive situation, they would know whom to silence first; this would make a political crackdown very efficient

- the govt could data mine who's committing crimes and infractions; people who imagined they would slip under the radar will be caught; in the past there was a cloak of anonymity and an asymmetry - there were too few policemen and judges to cope with the illegalities, they had to pick and choose whom to prosecute, but now they could auto-prosecute by machine learning (just like MPAA auto-sends lawsuits based on mere IP addresses)

- if they wanted, they could selectively silence certain people instead of blocking FB wholesale; applied to the list of influencers, this would wreak havoc on activist social networks

- our health, sexual, religious, political and drug use status would be used against us by governments and corporations; there would be no forgiveness and no forgetting

- economic espionage; ability to blackmail people (because they know all their secrets); ability to blackmail people inside various companies to secretly install backdoors (that's how China gets FB data, from what I read)
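The "page rank over the network of interconnections" idea in the list above can be sketched with plain power-iteration PageRank. This is a minimal illustration under stated assumptions: edges point from caller to callee, are unweighted, and the tiny graph is invented. Whether rank from incoming calls really measures "influence" is itself an assumption.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed contact graph.

    edges: iterable of (caller, callee) pairs (assumed unweighted).
    Nodes with no outgoing edges spread their rank evenly over all nodes,
    the usual fix for dangling nodes, so total rank stays at 1.
    """
    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n_nodes = len(nodes)
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not out[n])
        # Teleport share plus the redistributed rank of dangling nodes.
        new = {n: (1 - damping) / n_nodes + damping * dangling / n_nodes for n in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank

# Hypothetical call graph: three people call "hub", and "hub" calls "a" back.
ranks = pagerank([("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")])
print(max(ranks, key=ranks.get))  # hub
```

Real call graphs would likely be weighted by call frequency or duration; the unweighted version is kept here only for brevity.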

There is no escape except self-censorship. Whatever escapes our heads is public and there is no privacy left. It was inevitable.

The first to be monitored will be people working for the state and especially in NSA, politics and large corporations. They will be the first victims of their own creation. Activists too.


Self-censorship is futile. We all generate far too many data points; it's simply inescapable, and machine learning can work with anything. It doesn't have to be explicit as long as you have enough little details. But I don't think there's any reason to panic. Governments and large corps just have a head start over the public in utilising these possibilities, but there is no reason for the situation to remain one-sided forever. Besides, any serious attempt to leverage these possibilities to their fullest extent would result in essentially an instant civil uprising.


Someone please edit the title - extremely misleading!

Here's what the article states: "In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals."

In other words, they need the location from which you made the call. This data is not available in a normal CDR (call detail record) that carriers routinely use for billing. This is the same data you see on your phone bill. All it has is the date/time, to/from, duration and outcome.

As for the location, a number of papers have been floated in the past showing that anywhere from 3-5 destinations identify a person uniquely. In fact, a person's daily commute is enough to identify most people. (I don't have a link to the paper that found this.)
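The quoted 95% figure is a "unicity" estimate: pick p points from someone's trace and check whether anyone else's trace also contains all of them. The sketch below is a loose illustration of that idea, assuming each trace is a set of (hour, antenna) points and that points are sampled uniformly from the person's own trace; the toy data, trial count and function name are hypothetical, not the paper's code. The commute case is the same test with p = 2 and (home, work) as the points.

```python
import random

def unicity(traces, p, trials=1000, seed=0):
    """Fraction of random draws of p points from one user's trace that match
    exactly one trace in the dataset (i.e. single out that user).

    traces: dict user -> set of (hour, antenna_id) points (assumed format).
    """
    rng = random.Random(seed)
    users = [u for u, t in traces.items() if len(t) >= p]
    if not users:
        raise ValueError("no trace has at least p points")
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        points = rng.sample(sorted(traces[user]), p)
        matches = sum(1 for t in traces.values() if all(pt in t for pt in points))
        unique += matches == 1
    return unique / trials

# Hypothetical traces: both users start at antenna "A" at hour 0.
traces = {"u1": {(0, "A"), (8, "B")}, "u2": {(0, "A"), (8, "C")}}
print(unicity(traces, 2))  # 1.0
```

With p = 1, draws that land on the shared (0, "A") point are ambiguous, so the estimate drops; adding points can only shrink the set of matching traces.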


That's true, but according to the Guardian here: http://www.guardian.co.uk/world/2013/jun/06/nsa-phone-record... Verizon handed over location data to the NSA (the original court order states "comprehensive communications routing information" and "trunk identifiers" should be handed over). It's reasonable to assume that the NSA will have that information, which makes their defence that "we don't get your name" pretty laughable since they can almost certainly just derive it.


According to the Washington Post and other outlets, they do have approximate location data: "So the NSA is collecting information about my location as well as who I’ve called? It appears so. Cellphones make calls using the closest tower. So if the NSA knows you made a call using a specific tower, they can safely assume you were near that tower at the time of the call."

http://www.washingtonpost.com/blogs/wonkblog/wp/2013/06/06/e...
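The inference described above is essentially a join between call records and a table of tower positions. A minimal sketch, assuming the records carry a serving-tower identifier (which, as noted upthread, a plain billing CDR does not); the `CallRecord` fields, tower IDs and coordinates are all hypothetical and do not reflect any carrier's actual format.

```python
from dataclasses import dataclass

@dataclass
class CallRecord:
    caller: str
    callee: str
    timestamp: str  # ISO 8601 string, so lexical order is time order
    tower_id: str   # hypothetical serving-tower field

# Hypothetical tower table: tower_id -> (latitude, longitude), illustrative values.
TOWERS = {"T1": (47.04, -122.90), "T2": (33.49, -111.93)}

def location_trace(records, person, towers=TOWERS):
    """Approximate where `person` was for each call they placed:
    the position of the tower that carried the call."""
    return [(r.timestamp, towers[r.tower_id])
            for r in sorted(records, key=lambda r: r.timestamp)
            if r.caller == person and r.tower_id in towers]

records = [
    CallRecord("alice", "bob", "2013-06-06T18:00", "T2"),
    CallRecord("alice", "carol", "2013-06-06T09:00", "T1"),
    CallRecord("bob", "alice", "2013-06-06T10:00", "T1"),
]
print(location_trace(records, "alice"))
```

A trace built this way is exactly the kind of coarse spatio-temporal data the paper's unicity result applies to.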


Correction:

Unique in the Crowd: The surveillance bounds of human mobility


I'd absolutely agree. I'm sure I'm an outlier, but I'm definitely recognizable - my calls:

* my girlfriend, in my town (Olympia, WA)
* the ambulance company I work for (also in Olympia)
* my "daytime" employer, a software company in Scottsdale AZ

Even the convergence of 2 or 3 of these calls would likely be enough to identify most people.
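The "convergence" point can be made concrete: find the smallest number of a person's contacts that nobody else in the dataset calls in combination. A brute-force sketch with hypothetical names, assuming each person is represented only by the set of numbers they call.

```python
from itertools import combinations

def min_identifying_contacts(contacts, person):
    """Smallest k such that some k of `person`'s contacts are not all called
    by anyone else; returns None if no subset singles the person out.

    contacts: dict person -> set of numbers they call (assumed format).
    """
    mine = sorted(contacts[person])
    for k in range(1, len(mine) + 1):
        for combo in combinations(mine, k):
            others = [p for p, called in contacts.items()
                      if p != person and set(combo) <= called]
            if not others:
                return k
    return None

# Hypothetical call lists: each pair of "me"'s contacts is shared by someone else.
contacts = {
    "me": {"gf", "ambulance", "softco"},
    "x": {"gf", "ambulance"},
    "y": {"ambulance", "softco"},
    "z": {"gf", "softco"},
}
print(min_identifying_contacts(contacts, "me"))  # 3
```

Remove "z" and the pair ("gf", "softco") alone singles "me" out. The search is exponential in the number of contacts, which is fine for a toy but not for real call graphs.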


Even metadata is too much information.


Just like how it only takes a few GPS coordinates to figure out who you are with a high degree of accuracy. Anonymizing data doesn't mean much in the big data world.


Seems similar to the claim that you can identify a person from a handful of locations she visits in her regular daily routine.


*he/she, his/her


Cesar was my advisor and I was his first student. I've never been happier to see someone make a splash.
