Is the NSA Blinded by Big Data? (medium.com/surveillance-state)
25 points by mixmax on July 23, 2013 | 5 comments



The author might be underestimating the NSA's technical ability. The logic here is that each person has ~300 friends, so three hops is 300 x 300 x 300 = 27 million. This is a big number, but the NSA doesn't have to give the same scrutiny and technical resources to each person. It's well known that most of the connections won't be interesting, and there are methods to reduce the size of the list.

Which brings us to the author's dismissal of pattern recognition. Statements like "A pattern recognition algorithm would come up with “young Saudi men” as the closest heuristic for the 9/11 bombers" are equivalent to "I've designed a bad classifier in my head and I'm telling you how wrong it would be". The NSA has more than a few smart people on this and they are probably working all the harder due to so much public attention. Algorithms can get very sophisticated with a sufficient amount of data and processing cores.

The reality is that the NSA doesn't have the technical capability to find meaning in billions of communications. But they aren't stupid either. The scale of their data isn't going to hide us. Whether or not they'll use the data responsibly is another question.


Do you know what mass surveillance and analysis is good at? Chilling political movements and spotting rising leaderships.

I bet the NSA is not blind to that. Actually, they're not blind at all.


There's one huge flaw in the exponential math here: duplicates. There's a good chance that a large number of people your friends know are already included in your network – the percentages will vary, but it definitely cuts down the pure exponential growth.
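To make the duplicates point concrete, here is a rough sketch that counts distinct people within three hops on a toy graph and compares that with the naive product. The graph generator (a ring lattice with random rewiring, in the Watts-Strogatz style) and all of its parameters are assumptions chosen for illustration; nothing here is a model of a real social network.

```python
import random
from collections import deque

def make_clustered_graph(n, degree, rewire_prob, seed=0):
    """Toy graph: ring lattice with random rewiring (hypothetical stand-in for a social graph)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    half = degree // 2
    for v in range(n):
        for step in range(1, half + 1):
            w = (v + step) % n
            if rng.random() < rewire_prob:
                w = rng.randrange(n)  # occasionally link to a random person instead
            if w != v:
                adj[v].add(w)
                adj[w].add(v)
    return adj

def reach_within(adj, start, hops):
    """Count distinct people within `hops` steps of `start`, excluding `start` itself."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue  # do not expand past the hop limit
        for nb in adj[node]:
            if nb not in seen:  # duplicates are skipped here
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return len(seen) - 1

adj = make_clustered_graph(n=5000, degree=20, rewire_prob=0.1)
distinct = reach_within(adj, 0, 3)
naive = len(adj[0]) ** 3  # naive count using person 0's own contact count
print("distinct within 3 hops:", distinct, "naive product:", naive)
```

How far the distinct count falls below the naive product depends heavily on `rewire_prob`: the more clustered the graph, the more of your friends' friends are people you already know.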


Several simplifying assumptions were made. In a real-world scenario, some number of duplicates would be shared with any given contact, but the overlap between contacts is going to be low. You also need to account for the fact that some of your contacts have an order of magnitude more contacts than your average contact, and that they are more likely to be connected to other super-connectors. So your second-order contacts are likely to be richer in people who have more contacts than the average networker...

Even building a decent estimator for this sort of problem is hard, since you need good estimates for a number of distributions (number of contacts, number of duplicates between pairs of contacts, number of super-connector bilateral contacts, etc.).

My general sense is that duplication of contacts has little effect on the number of contacts of contacts of contacts, and that the number of super-connectors you know would have a greater impact.
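The super-connector effect can be sketched with a small calculation. If contacts are reached by following a random connection (an assumption that ignores any correlation between who knows whom), a person with k contacts is k times as likely to be on the other end, so the expected contact count of a contact is sum(k^2) / sum(k) rather than the plain average. The degree list below is made up for illustration.

```python
def mean_degree(degrees):
    """Average number of contacts per person."""
    return sum(degrees) / len(degrees)

def mean_contact_degree(degrees):
    """Expected number of contacts of the person at the end of a randomly chosen connection.

    Assumes people are reached in proportion to how many contacts they have.
    """
    return sum(k * k for k in degrees) / sum(degrees)

# Hypothetical population: 99 ordinary people and one super-connector.
degrees = [100] * 99 + [10_000]
print("average person:", mean_degree(degrees))
print("average contact:", mean_contact_degree(degrees))
```

When everyone has the same number of contacts the two figures coincide; a single heavily connected person is enough to pull the second figure far above the first, which is why the second hop tends to be larger than the average contact count suggests.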


So instead of exponential growth, each additional suspect causes the total to asymptotically approach 100% of the vertices in the graph. I can't help but feel that's what they're really going for with three hops.
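A back-of-the-envelope sketch of that saturation, under a strong simplifying assumption: each suspect's three-hop set covers a fraction `p` of the population, and the sets overlap as if drawn independently. Both `p` and the suspect counts are hypothetical values, not figures from the article.

```python
def coverage(p, suspects):
    """Fraction of the population inside at least one suspect's three-hop set.

    Assumes each set covers fraction p and that sets are independent of one another.
    """
    return 1.0 - (1.0 - p) ** suspects

for s in (1, 10, 50, 100):
    print(s, "suspects ->", round(coverage(0.1, s), 4))
```

In a real graph the sets are far from independent, so this only shows the shape of the curve: each extra suspect adds less than the one before, and the total creeps toward everyone.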



