Your address book is mine: Many iPhone apps take your data (venturebeat.com)
126 points by kirbmart on Feb 15, 2012 | 54 comments



> On the web, Twitter informs its members that it stores contacts for up to 18 months, and may use contact information to make “Who To Follow” suggestions.

Twitter is the only one of these services I use, and when I read that line it took me 5 minutes to decide whether to actually uninstall the Twitter client from my phone. I decided to try to get this bad news from the horse's mouth.

https://twitter.com/privacy contains Twitter's privacy policy. Does it actually say what the article claims?

TL;DR: No.

The number '18' occurs exactly once. It is used in this context:

Log Data: Our servers automatically record information ("Log Data") created by your use of the Services. Log Data may include information such as your IP address, browser type, the referring domain, pages visited, your mobile carrier, device and application IDs, and search terms. Other actions, such as interactions with our website, applications and advertisements, may also be included in Log Data. If we haven’t already deleted the Log Data earlier, we will either delete it or remove any common account identifiers, such as your username, full IP address, or email address, after 18 months.

The word 'contact' appears 5 times, in these contexts:

1. If you have any questions or comments about this Privacy Policy, please contact us at privacy@twitter.com.

2. We may use your contact information to send you information about our Services or to market to you.

3. If you email us, we may keep your message, email address and contact information to respond to your request.

4. If you become aware that your child has provided us with personal information without your consent, please contact us at privacy@twitter.com.

5. Page footer: © 2012 Twitter About Us Contact Blog Status Resources API Business Help Jobs Terms Privacy

The phrase 'address book' appears once, in this context:

Additional Information: You may provide us with additional information to make public, such as a short biography, your location, or a picture. You may customize your account with information such as a cell phone number for the delivery of SMS messages or your address book so that we can help you find Twitter users you know.


You think what you've quoted from their privacy policy exonerates Twitter, but you may simply be misinterpreting it.

Twitter Inc. has acknowledged that after mobile users tap the "Find friends" feature on its smartphone app, the company downloads users' entire address book, including names, email addresses and phone numbers, and keeps the data on its servers for 18 months. The company also said it plans to update its apps to clarify that user contacts are being transmitted and stored.

The company's current privacy policy does not explicitly disclose that Twitter downloads and stores user address books.

It does say that Twitter users "may customize your account with information such as a cellphone number for the delivery of SMS messages or your address book so that we can help you find Twitter users you know."

As with many online social services, Twitter allows users to look for friends that are also registered users. In the case of Twitter's iPhone app, users see a screen noting that the service will "Scan your Contacts for people you already know on Twitter." The short description of the feature does not mention that it also downloads every entry in the address book and stores it.

Twitter's current privacy policy notes that some categories of "Log Data" are stored for up to 18 months.

"Log Data may include information such as your IP address, browser type, the referring domain, pages visited, your mobile carrier, device and application IDs, and search terms," the policy says. "Other actions, such as interactions with our website, applications and advertisements, may also be included in Log Data."

http://www.latimes.com/business/technology/la-fi-tn-twitter-...


> You think what you've quoted from their privacy policy exonerates Twitter, but you may simply be misinterpreting it.

Exonerates? Not at all!

My point was just that (contrary to the article) Twitter doesn't make any claims whatsoever about how long they hold your Address Book data.

In fact, since it's mentioned along with other "data with which users may customize their account," that sort of implies that they consider it an integral and permanent part of your customized account profile. If someone told me that the data stays around until I choose to delete my account, I don't see anything in the privacy policy that would contradict that.


I really think Apple needs to do something here with their next update. For every company that does the right thing by asking permission first, who knows how many are being sneaky.


Yes. The API should be updated now to require a permissions modal before an app can access your address book. I'm actually shocked this wasn't built into the SDK from day one.


Agreed. But you're being too kind to Apple. This is the worst privacy breach from Apple in the iPhone's history.

No point blaming app programmers. The functionality for apps to acquire the address book without asking shouldn't exist.

Dear Apple, thank you for protecting me from adult material in the app store. But, can you... er, this is awkward... can you NOT GIVE MY ADDRESS BOOK AWAY WITHOUT MY PERMISSION? Thanks. And sorry for yelling, it's just, y'know, my address book and all.


The app programmers do have to take the blame. Those breaches of privacy have always been possible on desktop PCs, but app programmers usually didn’t commit them because that would have made them pariahs.

I do not know why developers for mobile apps suddenly think that has changed. But they do. That’s certainly a problem and Apple should react to it quickly. The culprits, though, are still the developers who overstepped a pretty clear line.


"Always been possible on desktop PCs"

Well, yes, but unlike on your iPhone, you could actively do something about it. E.g., let Outlook encrypt your address book, change the address book access permissions, etc. It was trivial to bar anyone or anything from accessing your address book without having to remove the software you want to use.

On the iPhone, you can only choose to install an app or not; if you choose to install it, you have to accept anything that comes with it.


Trivial? No, not at all. Not in the slightest.


I think I did not make my point clear enough. It is trivial to make it inaccessible to programs that assume that it is easily accessible. I did not mean to include solely malicious programs.

If you stick your address book into a TrueCrypt container, 100% of programs (I know there is no 100% security, but there is not enough space to spell out all 99.999999s) will not be able to access it anymore without you unlocking/mounting it first. Thus, requiring your permission.


That would hurt the virality of their apps, and therefore not make their app ecosystem as useful (or as competitive, if competing with apps on other platforms).


"Virality" of apps is not a problem users should bear. Good apps will be evangelized by users if they are genuinely useful. I'm suspicious of the utility* of apps that need to goad non-users into joining.

* Usually, 'utility' is not the primary driver of these apps, it is more in building a large userbase quickly so VC attention can be garnered.


What if your address book was hashed (sha1, bcrypt) and then uploaded? In that case, all that would be uploaded is a list of hashes for the email addresses or phone numbers of people I know.

Then, when another person signs up for an account, it's easy to see who they should suggest they should join, but nowhere is any personal data being stored.

Sounds like it'd work to me?


It would take a laughably short time to brute-force the phone number out of a hash. Knowing the algorithm, I can compute all 10,000,000,000 possible combinations and store them in one file.

Same goes for hashing IPv4 addresses. There is no way to make either one secure by hashing.
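
To make the scale concrete, here is a minimal Python sketch of that precomputation. It assumes unsalted SHA-1 over a bare digit string, and it shrinks the search to a hypothetical known 6-digit prefix (`555010`) so it runs instantly; the full 10-digit space is the same loop with more iterations. The function names are made up for illustration.

```python
import hashlib


def sha1_hex(text):
    """Unsalted SHA-1 of a string, as a hex digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_table(prefix, suffix_digits):
    """Map sha1(number) -> number for every number that starts with prefix."""
    table = {}
    for n in range(10 ** suffix_digits):
        number = prefix + str(n).zfill(suffix_digits)
        table[sha1_hex(number)] = number
    return table


# Hypothetical scenario: the prefix is known, so only 10**4 candidates remain.
table = build_table("555010", 4)

# A hash "uploaded" by some client is reversed with a single dictionary lookup.
uploaded = sha1_hex("5550101234")
recovered = table[uploaded]
```

Once the table exists, every uploaded hash is one lookup away from the original number, which is why the size of the input space matters more than the choice of hash.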


Concatenate FirstName+LastName+emailAddress+phoneNumber, then hash or HMAC that.

Your lookup table just got a lot bigger.


And the likelihood that you'll match somebody got a lot smaller. Names have different spellings, phone numbers have different spellings (to say nothing of different numbers), people have multiple emails. You can't canonicalize some of these either.


So send multiple hashes per contact!

Feral, Chimp, myEmail, myDeskPhone, myCellPhone

H1 = sha1(Feral-Chimp-myEmail-myDeskPhone-myCellPhone)

H2 = sha1(F-Chimp-myEmail-myDeskPhone-myCellPhone)

H3 = sha1(Feral-C-myEmail-myDeskPhone-myCellPhone)

H4 = sha1(F-C-myEmail-VALUEOMITTED-VALUEOMITTED)

...

...

So flexibility is available, it's just more computationally expensive for the server to do the extra comparisons. The hash calcs use a little extra battery per user, but I'm much happier to donate some of my processor/battery than I am my Address Book contents.
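
A rough Python sketch of the multiple-hashes idea, in case it helps. It assumes the `-` separator and the `VALUEOMITTED` placeholder from the list above; the function name and the exact set of variants (full name or initial, phones present or omitted) are my own choices, not anything a real app is known to do.

```python
import hashlib
from itertools import product

OMITTED = "VALUEOMITTED"


def contact_hashes(first, last, email, desk_phone, cell_phone):
    """Return the set of SHA-1 hashes for several spellings of one contact."""
    first_options = [first, first[:1]]   # full first name, or just the initial
    last_options = [last, last[:1]]      # full last name, or just the initial
    phone_options = [(desk_phone, cell_phone), (OMITTED, OMITTED)]

    hashes = set()
    for f, l, (desk, cell) in product(first_options, last_options, phone_options):
        key = "-".join([f, l, email, desk, cell])
        hashes.add(hashlib.sha1(key.encode("utf-8")).hexdigest())
    return hashes


# Two address books that spell the same person differently still overlap.
mine = contact_hashes("Feral", "Chimp", "myEmail", "myDeskPhone", "myCellPhone")
yours = contact_hashes("F", "Chimp", "myEmail", "myDeskPhone", "myCellPhone")
shared = mine & yours
```

The server only has to check whether two users' hash sets intersect; it never needs the plaintext fields to do that comparison.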


Not all names are simple or even computable transforms though. How do you deal with entries like "Mom" or "Steve," names like "St. Clair" that may be entered dozens of different ways, or non-English names like "姚明" that may have dozens or hundreds of different possible romanizations, anglicizations, francizations, and spellings under any other language?

How much extra battery are you willing to spend, and more importantly, how much effort are you willing to spend running your users' batteries down for something that you'd never yourself use?


> How do you deal with entries like "Mom" or "Steve," names like "St. Clair" that may be entered dozens of different ways, or non-English names like "姚明" that may have dozens or hundreds of different possible romanizations, anglicizations, francizations, and spellings under any other language?

That's sort of beside the point, though. The question is: How does uploading my Address Book information provide a much more efficient or complete solution to that set of problems? I don't think that it does. In each case, you're dependent on heuristics and sometimes those heuristics are going to miss a match. If I were Twitter, I'd happily accept those misses in exchange for hitting the matches where people spell each other's first names and email addresses consistently (for example).

> how much effort are you willing to spend running your users' batteries down for something that you'd never yourself use?

You lost me there. My point was just that for Twitter, my iPhone CPU/battery are an "externality." So while they might spend more cycles doing server-side comparisons on the hashes my phone sends over, at least they're not paying to compute the hash values.


The engineer in me says you could soundex/metaphone the names before adding them to the hash. The rest of me says apps just shouldn't be doing this.
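
For the curious, a minimal sketch of classic American Soundex in Python, which is the kind of normalization that could run on a name before it goes into the hash. This is the textbook algorithm as I understand it, not anything an app is known to use, and it only handles A-Z: a name like "姚明" comes out as an empty string, which is exactly the limitation raised above.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code, or '' if there are no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]
```

With this, `soundex("Steven")` and `soundex("Stephen")` both give `S315`, so the two spellings would hash to the same value.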


I'm probably out of my territory (I don't have a phone right now[1]), but my first UI instinct would be to allow address book entries to be grouped (or tagged or labelled, the actual mechanism is secondary), and on first request from the app being asked what groups I'd like to allow it access to. Bonus points for the option to create a per-app group and put some users in right there and then.

But as I said, outside of my territory.

[1] I'm not that old, really.


But all we care about is making it look secure for the hobbyist packet sniffers, right? (kidding)

Although what about hardcoding a salt into the app to hash with?


Salting wouldn't help. The data is collected to match it to other users' data, so the salt would have to be identical for all hashes for all users. An attacker would just precompute the dictionary with the hard coded salt, nullifying its purpose.


You're right in the case of SHA1. However, with a suitable work factor, you can't bruteforce bcrypt AFAIK


1) bcrypt uses salts so that won't work at all

2) if you use some other computationally expensive hash you run into the problem of low powered mobile hardware. Remember you're not hashing 1 thing, but dozens or hundreds of phone numbers or email addresses.


You know, I think you're right: if we use a different salt for each email/phone, there'd be no way of ensuring that different users use the same salts, and even if we stored the salt along with the hash, it wouldn't be much use. Hmm, this is a tricky problem!


I'm pretty sure the parent meant "precompute" (e.g. a rainbow table), not "brute force".


Again: you're right in the case of SHA1, but rainbow tables don't apply to bcrypt because of the varying work factor. Also, you can use a different salt for each person.

Crypto is actually really cool these days, there's pretty much a solution to every weakness :)


So your solution to providing a "secure" way to compare phone numbers, etc, between users is to make the hashes non-comparable? Remember, the goal is to make it so two people can come up with the same hash in order to "find" each other.
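
A quick Python sketch of the problem. It uses PBKDF2 from the standard library as a stand-in for bcrypt (an assumption on my part, since bcrypt isn't in the stdlib; the salt plays the same role in both). The iteration count and the function name are arbitrary choices for illustration.

```python
import hashlib
import os


def slow_hash(value, salt, iterations=1000):
    """Salted, iterated hash of a string (PBKDF2-HMAC-SHA256), as hex."""
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, iterations)
    return digest.hex()


number = "5550101234"

# Each user picks a random salt: the same number hashes to different values,
# so the server cannot tell that both users have this contact.
alice = slow_hash(number, os.urandom(16))
bob = slow_hash(number, os.urandom(16))

# Everyone shares one salt: the hashes match again, but the salt now has to
# ship with the app, and an attacker can build one table with it.
shared_salt = b"hard-coded-in-the-app"
alice_shared = slow_hash(number, shared_salt)
bob_shared = slow_hash(number, shared_salt)
```

So you get either comparable hashes or per-user salts, not both.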


What problem would this solve?

If my problem is that I don't want Facebook to have the phone numbers in my address book, then surely I don't want them to have the SHA1 of each of the numbers in my address book, either?

Isn't it going to be easy for an organisation with Facebook's resources to build a dictionary of the space of SHA1s of phone numbers?

And if they salt the numbers first, then they can't be compared with each other for suggestions? So what's the point?

Have I overlooked something?


It avoids accidents. With plain text data, there is a lot of room for accidentally leaking private data (for example, to server logs). Building a sha1 dictionary isn't something you do by accident.


Facebook only knows the contacts that match i.e. are already on Facebook. And that's exactly what the user wanted to find out.

It's true that they could store the hashes and build a graph of mystery contacts, but that's still not as bad as taking the actual contact info.


If users aren't asked for access to their address book, this is still a potential public relations nightmare, since many people will inevitably be distressed that a company will be able to know who was in your address book even if they don't know those people's phone numbers.


I don't think anyone has ever had a problem with explicitly asking and allowing users to opt-out. Apps that do that shouldn't be lumped in with the other ones.


Which ones are lumped in? Facebook and Instapaper are explicitly mentioned as examples of apps that prompt for permission before accessing the address book. Instagram, Foursquare and Path's prompts are only mentioned in passing, but those apps only added a prompt after the original story broke a week or so ago.


Upload or Store?

Path stored the data but I know 4sq does a search against it but does not store it. That can make a huge difference...


As can when it happens - I think 4Square was already pretty reasonable, because they only uploaded the address book when you hit "Find my Friends using Contacts" or something like it.

Path, to me, was only particularly bad because they'd upload your contacts every time you signed in.


How so? For one thing, there's no way to know if they're even being honest about whether they store the data.


That's certainly true, but at some level you have to trust us right? Consider that the whole basis of foursquare is that you're telling us where you're going or using us to figure out where to go next. That information is probably a lot more sensitive than your address book.

I think we've earned that trust over the past 3 years, and will continue to earn it over and over again into the future by sticking to our word and being transparent about what happens to your data when you send it to us.

I'm not sure how else it could work?


That's essentially what I'm saying. Installing an app on a phone with personal information is implicitly trusting the developers of that app. At least for technical people, it would be foolish to blindly trust a phone manufacturer's sandboxing and policies.


Facebook always stored. Even when you gave it access from the desktop to your webmail client, etc.

That's how it knows to suggest people that you should friend once you create an account for the first time and DON'T give it your address book: it stored people that had your email listed in their address books.


I don't think those firms necessarily view this behavior as 'broken' to begin with.


That Instagram screenshot clearly shows that the transmission is over https, which means that the whole "susceptible to would-be interceptors" thing is invalid. That is, unless you're under attack by a man-in-the-middle proxy. If this is the case, then you have bigger issues on your hands. The Foodspotting screenshot is over plain http, though, so it IS susceptible to a normal eavesdropping attack.


There's a fine difference between uploading and storing, and uploading and not storing.

The issue with Path is that they stored the data. Instagram and 4Square do not. They have to re-crunch the "numbers."


You know how every once in awhile you hear about some celebrity's phone getting "hacked" and their contact list stolen?

HHmmmmm........


"All your address books are belong to us!"


Just wondering if you could:

1. Hash first on the device (SHA-2, no salt) after converting to common case and removing extraneous characters from contact data.

2. Send hashes over secure connection (SSL, TLS).

3. Hash again on the server (SHA-2, salted with value known only to service provider)

4. Delete all data a reasonable time after comparison / mapping is done.

This way, although it's not unbreakable, there are the advantages of:

- Encrypted, pre-hashed data over the wire.

- Easily comparable data on the server.

- Reasonably secure server side storage as long as the salt is secured.

Dumb idea?
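
Steps 1 and 3 might look something like this in Python. It is only a sketch under my own assumptions: SHA-256 on the device, and HMAC-SHA256 on the server as the "salted with a secret value" step (a keyed hash rather than plain concatenation). The normalization rules, the function names, and the placeholder secret are all hypothetical.

```python
import hashlib
import hmac


def normalize_phone(raw):
    """Keep digits only, so different punctuation gives the same string."""
    return "".join(ch for ch in raw if ch.isdigit())


def normalize_email(raw):
    """Trim whitespace and lowercase the address."""
    return raw.strip().lower()


def device_hash(normalized):
    """Step 1: unsalted SHA-256 computed on the phone."""
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def server_hash(device_digest, secret):
    """Step 3: keyed hash on the server; the secret never leaves the server."""
    return hmac.new(secret, device_digest.encode("utf-8"), hashlib.sha256).hexdigest()


SERVER_SECRET = b"placeholder-secret-known-only-to-the-service"

# Two users store the same number with different punctuation.
a = server_hash(device_hash(normalize_phone("(555) 010-1234")), SERVER_SECRET)
b = server_hash(device_hash(normalize_phone("555.010.1234")), SERVER_SECRET)
```

Here `a == b`, so the server can still match contacts. The device-side hash is still open to the lookup-table attack discussed elsewhere in the thread, so this mainly protects what sits in storage, and only while the secret stays secret.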


Twitter also does it as part of the special iOS5-privileged settings panel:

http://cache.gizmodo.com/assets/images/4/2011/06/ios5twitter...

That must have had close Apple review as part of the official iOS/Twitter integration.

The explanation underneath the 'Update Contacts' button is somewhat reasonable, though it may not register with most people that, barring some unlikely fancy indirection, to 'use' email addresses and phone numbers means they're being reported to Twitter's servers.


Honestly, I don't understand the fuss. I thought everyone had figured out and come to terms with the fact years ago that social media is all about gathering as much data as possible. The degree of precision with which Facebook, Twitter, LinkedIn, etc. recommend "people I might know" makes it pretty obvious that they know a lot about me, regardless of where they got the information from. And I don't blame them in the least: these are all free consumer apps that can only exist by having a lot of users, so if they can get a few more by utilizing data that is right in front of them, they'd be crazy not to use it. If anything, people should be angry at Apple. Considering how notoriously annoying their approval process is for the sole purpose of protecting their users, they probably should have made app permissions more explicit or let users opt out of individual permissions like Facebook recently started allowing.

Additionally, nobody has touched on the fact that companies can use address books to prevent fraudulent use. For most of the companies listed there isn't too much to be gained from fraudulent use. But you can imagine for services that frequently have to address fraud and, say, don't want a single user to have multiple accounts or that want to make sure all their accounts are owned by real people, doing things like cross-validating address books can be very useful. This can still be done if you hash the names and phone numbers before uploading them, though, which is maybe what everyone should be doing.

tl;dr Data is money/power these days, it's strange that people are shocked by companies making use of all the data they have access to.


What this whole line of stories has forced me to think more about than I had in the past is the question: "Why do these companies even exist? Why'd they even create the app they created?". Take Path ... what's its purpose? Are they developing it because they want to give people a better way of sharing photos with their friends? What's in it for them? Why would they even care? Furthermore, why does that information even matter? Ultimately, the only answer I keep coming back to is that the only reason these companies exist is to collect information, and re-purpose that information in the form of intelligent advertising, or other revenue-generation relationships they have with partner companies that find value in knowing as much about someone as possible in order to generate some portion of their overall revenue by using that information to their advantage. Same goes for FB, or any other social service.

I guess I'm just deflated in that while some aspects of these services enrich my life in some way by exposing me to information I might not otherwise have easy access to ... their primary reason for existence seems misguided from the start. A service built just to advance the quest towards personal or shared wealth feels unnecessarily shallow to me when the ingredients used to generate that wealth are of such a personal nature.


If you come home from work some day, to find Facebook employees going through your dumpster, are you going to be ok with that?

Probably not, right? You know they want to gather as much data as possible, but you are angry, because you never gave them permission to go through your trash.

I think that's how some people feel if an application goes through their phonebook when they didn't give it permission to; the phonebook is not information people consider public; it's privileged. That's where there's a fuss.


That's a reasonable analogy, but I still think the anger should be directed at Apple. The way I see it, it's like I told my friend to watch my sandwich for a minute and I come back and someone else is eating it because my friend had handed it to him. I'm going to be mad at my friend; the other guy took advantage of an opportunity to get a free sandwich, which I can hardly blame him for.


Analogy isn't needed at all. Address book uploaded without permission... enough said. To your second point, I agree, Apple should have a setting on the iPhone that denies ANYONE but the owner of the iPhone access to the address book. In addition, each app should be forced to ask for permission to use contacts. The ball is in Apple's court to explain.


The thing we're shocked about is that apps could access our address books and upload them wholesale to their servers without our knowledge or permission. That's a huge privacy violation.

Facebook knows about the connections I've told it about, or others have requested and I've approved.

Yes, it may be useful - but the address book on my phone is quite personal, and not something I would hand over to a 3rd party readily. There are people in there, let's say, who I wouldn't WANT people to know are in there. There may be people in there who hate other people in there.

So.. they can take my address book and do what they want with it? No amount of "cool stuff" adds up to allowing a 3rd party company to have the contents of my personal contact list, sorry. This is really bad.



