
> "There's only one way to be safe. And that's to collect only minimal amounts of data for minimal apps."

No. The only way to be really safe is to go live in a cave, not to have any friends and never go anywhere near any tech ever again. I'm sure you can tell that I don't consider that to be much of an option.

It's unhelpful to label 'data collection' as the source of the problem as 'collection' happens everywhere in our interactions with the world. There are valid arguments and discussions to be had about who really owns that data. It may well be about me, but that doesn't necessarily mean that it's mine. I'd say the majority of people are only just beginning to understand that a discussion needs to take place even though technical folks have known this for a long time.

One solution to the 'ownership' problem is to make it possible for everyone to run their own infrastructure. i.e. have their own 'backend' that apps/services can be installed into, which their end-devices can then connect to. This is already possible for the technically savvy but a lot of work is needed before I feel we can trust such systems (I'm working on approaches to this [1,2], based on unikernels).

In the meantime, I applaud business models that are not built around profiling (which is the crux of the issue cf. advertising). They're the only ones where the incentives have any hope of aligning.

[1] http://nymote.org/blog/2013/introducing-nymote/

[2] http://amirchaudhry.com/brewing-miso-to-serve-nymote/




My thoughts are basically:

+ "Run your own infrastructure" is currently a pipe dream for what is 99.9% of people.

+ Companies selling data when they go bankrupt, and M&As changing what is done with the collected data, happen regularly.

+ Discussing ownership is fairly simple as well, "Does the company already have your data?"

+ Any business not based around profiling has the additional option of making more money by selling their (your) data.

Why isn't excess data collection a problem? Is it because it's very, very hard to solve and, if we pass laws, bad actors will continue to flout them? Is it because we are past the point of no return?


> + "Run your own infrastructure" is currently a pipe dream for what is 99.9% of people.

Or not even a dream, because those 99.9% prefer not to run their own infrastructure even if it's easy and cheap.

However, "local apps" (ones which don't require infrastructure) could be appealing if they could achieve parity with the enormous-backend apps.


I just find it hard to run my own infrastructure. I tried setting up an e-mail server once, and the e-mails I sent from it were directly routed to the Spam folder of any Gmail account I tried to send them to.

I tried DKIM, but the e-mails were still sent to Gmail's spam folders. Eventually I gave up on self-hosted e-mail.


> Is it because we are past the point of no return?

If you look at computing history, we have already been through several cycles: from single computers to timesharing to mainframes/supercomputers to PCs to thin clients/mobile to (?). There have been times when powerful standalone computers were in the majority, and times when thin clients backed by big server infrastructure were. The Internet also supports a more decentralised model, in which everyone could host their data on their own devices. It's just not the right time at the moment, and there is no business model behind it, so no one is pushing for that solution.


Or researching fast fully homomorphic encryption, so that they can one day mine data while preserving strong user privacy.
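
To make "computing on data you cannot read" concrete, here is a toy sketch in Python. Note the assumptions: it uses the Paillier scheme, which is only additively homomorphic (not fully homomorphic), and the tiny hard-coded primes and function names are purely illustrative and offer no real security.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key generation. p and q are tiny demo primes (assumption)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the inverse of L(g^lam mod n^2) modulo n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    n, g = pub
    # Pick a random r that is coprime with n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen()
total = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
print(decrypt(pub, priv, total))
```

The party doing the addition only ever sees ciphertexts; only the key holder can read the sum. Fully homomorphic schemes extend this idea to arbitrary computation, at a much higher cost.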


Thank you for posting a well-balanced summary of the situation.

I think the biggest difficulty with balancing privacy concerns against other factors -- or even encouraging debate about and awareness of the issues among non-technical friends and family -- is that an item of data is itself neutral. It is how that data is used or combined with other data that may or may not be in any given party's interests.

Sometimes the exact same technology or data could be used for very good purposes, useful and generally harmless purposes, or hostile purposes, depending on the context. For example, consider automated number plate tracking of motor vehicles. If your child has been kidnapped and a witness caught the plate of the kidnapper's vehicle, you're going to appreciate the police being able to find and intercept that vehicle as quickly as possible. If you're an urban planner responsible for keeping transport infrastructure as efficient as possible in the face of a rising population, an aggregated data set showing how real travellers want to move around your city could be very useful, helping you make decisions that improve the system for everyone. If you're that same urban planner but on the side you're working with a load of jewel thieves and abusing your access to historical movement records to figure out when specific wealthy residents are usually away from their homes so the thieves can break in and then abusing your access to real time tracking to confirm that the residents really are out or warn if they come home early, that's not such a happy ending for the data subject. Analogous issues arise with many kinds of personal data, including more sensitive areas like financial or health data.

In the modern world, with vast databases and powerful data analysis tools and effectively instant communications and effectively unlimited storage, sometimes seemingly innocuous data can also give away a lot more about you than you might want or need. Things like what you bought at a store, or who you were tagged with in photos on social media, or a recording of you walking across a street on CCTV, can be used to determine many apparently unrelated things about you with relatively high (but, significantly, not complete) reliability, again sometimes quite sensitive ones that you might very much prefer to keep to yourself. Consider the store loyalty programme that determines a girl is pregnant from her purchasing patterns long before her partner or parents know. What about the person outed as gay because they were tagged with a whole group of openly gay people on holiday? Oh, by the way, unlike their friends in the photos, the first person lives in a country where homosexuality is still frowned upon and being open about it has real consequences. Also, gait analysis said you looked nervous as you walked into the airport, but it couldn't tell whether you just read a news story about a plane going down or you have inside knowledge of a terrorist threat, so unfortunately you won't be flying today. Just wait until automated text analysis software reaches the point that it can effectively de-anonymise posts like this one, and suddenly a lot of people who thought posting under a pseudonym was going to hide their criticism/whistle-blowing/advocacy of some controversial subject realise they weren't as safe as they thought and those comments are now permanently recorded on some public web site.

We therefore need to move beyond black and white assessments like "data collection is dangerous" or "social media sharing is fun". What matters is not just what data is collected, but also who has access to that data, what they are allowed to use it for (including for how long it's stored, what else it can be combined with, and the like), and crucially, how these things can effectively be controlled or regulated to ensure that everyone is playing by the rules when data is data and once someone has it for one purpose it can readily be used for another.


Agreed that data is in and of itself neutral. As with a gun, it is not the thing itself but how it is used that can be detrimental. In fact, this is true of almost anything, from a pencil to a car.

I think at the core, however, the issue is not how the data is used but one of ownership and privacy. It is "my" business where I drive. It is "my" business whom I call. It is data about me and my behavior. So having that information about me used without any need for my consent, even if it could be accessed publicly, becomes the issue.

License plate readers are tantamount to placing a tail on me. The same with tracking my movements via several CCTV cameras, etc.

Retention just makes the situation worse. Yes, you could approach it with laws on appropriate use of this data, but what happens when the laws change and the data is still there? It is similar to how people gave data to Radio Shack years ago, only to find it now being sold to 3rd parties. Privacy policies are constantly shifting in the companies' best interests. Who can keep track?

The problem is that once the data is collected it is out of your hands. That is why the regulations need to be at the collection level, with strict opt-in requirements, as well as at the usage and retention level.


> It is "my" business where I drive.

True enough. However, it is also the business of the people who are responsible for building and maintaining the roads to know how those roads are being used.

> It is "my" business whom I call.

True again. However, it is also the business of the phone company who must provide the service you ordered and to which you will owe money based on the calls you make and, in some cases, who you are calling.

I do understand your concern about data being out there at all. However, the reality is that we interact with the world and the people around us all the time. Sometimes that will result in data that refers to us but also affects other people for legitimate and often unavoidable reasons. So I don't think an extremist position that no-one should ever be able to collect data about you without your explicit consent is ever going to work. As others have noted, that would mean you couldn't interact with almost anyone or anything in the modern world, and unless you're planning to live 100% off-grid as a hermit that's just not a viable possibility. Instead, we should consider issues like the retention and repurposing of data.

Licence plate readers that are automatically tracking cars through a congested area that has variable speed limits are potentially in everyone's interests: smoothing out the traffic flow makes everyone's journey faster and safer, and has basically no downside. However, once a vehicle has left that area, it is no longer necessary to keep any specific details about it for that purpose. The data can be discarded, or completely anonymised simply by turning each plate into a unique but otherwise meaningless number before it's recorded if it's useful to store aggregated data for more general traffic planning purposes. Similarly, if plate recognition is being used for enforcement of that speed limit, there is no need to record the details of anyone except those the system has determined to be exceeding the speed limit, where the evidence will be used for a subsequent prosecution. Once any resulting legal processes have run their course, the data can also then be completely discarded if no conviction resulted.
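
As a rough sketch of the "turn each plate into a meaningless number before it's recorded" idea, something like the following Python would do. Everything here is an assumption for illustration: the `ZoneTracker` class, its method names, and the choice of a keyed HMAC are hypothetical, not a description of any real system. A key is used because plate numbers come from a small space, so a plain unkeyed hash could be reversed by simply hashing every possible plate. Strictly speaking this is pseudonymisation rather than full anonymisation: whoever holds the key can still link sightings, so the key would need to be discarded or rotated as well.

```python
import hashlib
import hmac
import secrets


class ZoneTracker:
    """Hypothetical tracker for one congestion zone; raw plates are never stored."""

    def __init__(self):
        # Secret key held only in memory; replacing it breaks linkability
        # between old and new records.
        self._key = secrets.token_bytes(32)
        self._active = {}  # pseudonym -> entry time (seconds)

    def pseudonym(self, plate):
        # Normalise so "ab12 cde" and "AB12CDE" map to the same identifier
        normalized = plate.replace(" ", "").upper()
        digest = hmac.new(self._key, normalized.encode(), hashlib.sha256)
        return digest.hexdigest()[:16]

    def enter(self, plate, t):
        self._active[self.pseudonym(plate)] = t

    def leave(self, plate, t):
        # Remove the record as soon as the vehicle leaves the zone and
        # return only the transit time, or None if the entry was never seen.
        entered = self._active.pop(self.pseudonym(plate), None)
        return None if entered is None else t - entered


tracker = ZoneTracker()
tracker.enter("AB12 CDE", 100.0)
print(tracker.leave("ab12cde", 130.0))  # transit time in seconds
```

Only the transit time survives once the vehicle has left, which is all the traffic-smoothing purpose needs.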

The risk in either case is not the scanning itself, it is the retention of the data and potentially use for other purposes and correlation with other data sets later. Given robust rules about keeping personal data no longer than necessary for its stated purpose, and probably about declaring its stated purpose in a meaningful and usefully specific way, this is not so much having a tail as having a driver in a car behind who happens to be following you for a while on the same road but then forgets they ever saw you within moments of going your separate ways. I doubt even the most privacy-conscious person would consider that an unreasonable risk in other contexts or expect to be able to prevent it.



