Study of Thousands of Dropbox Projects Reveals How Successful Teams Collaborate (hbr.org)
166 points by m0nhawk on July 23, 2018 | 78 comments



>> Dropbox gave us access to project-folder-related data, which we aggregated and anonymized[...]

Wait, Dropbox gave away non-anonymized data to a third party and they then anonymized it. Wow, what could go wrong? Just thinking of the endless possibilities of where all that data is now... It's deeply troubling how much unwarranted trust there is when it comes to the handling of personal data.


This is not at all surprising.

Dropbox put Condoleezza Rice on their board, who supports warrantless wiretaps [1].

I deleted my account when they did that. Not so much because it would have any direct effect, but because it’s clear that we have differing views on how user data should be treated.

I'm surprised that people are shocked by them treating user data like this, it’s absolutely in character.

[1] http://www.drop-dropbox.com


just deleted mine (which I hadn't used in years anyways).


It's been edited:

> Dropbox gave us access to project-folder-related data, which Dropbox had aggregated and anonymized,

> we and Dropbox employees could view no personally identifiable information

> Editor’s note: We’ve clarified this article to say that Dropbox anonymized and aggregated the data before providing it for this analysis.


I just cancelled my subscription.

Truth be told I've also been dissatisfied with the price of the Plus and Pro subscriptions, relative to what they provide, with their support and their direction, so was looking for motivation to move.

This is just icing on the cake.


Given the very public failure of Netflix to "anonymize" data, they shouldn't even be giving away anonymized data without user permission, since anonymized data is often not actually anonymous.


Isn't it already in their TOS that they'll share any data they feel like with anyone they feel like?

https://www.dropbox.com/terms#privacy

It certainly says the words:

"We may share information as discussed below, ... Others working for and with Dropbox."


This is deeply troubling. As a scientist who uses* Dropbox I gave no informed consent. I know they claim personally identifiable information was removed but still I gave no consent for this.

*not for long, perhaps


I can't speak to how informed you were when you gave the consent, but if you are using the service, you provided it.

"Law & Order and the Public Interest. We may disclose your information to third parties if we determine that such disclosure is reasonably necessary to: (a) comply with any applicable law, regulation, legal process, or appropriate government request; (b) protect any person from death or serious bodily injury; (c) prevent fraud or abuse of Dropbox or our users; (d) protect Dropbox’s rights, property, safety, or interest; or (e) perform a task carried out in the public interest."

I would assume that this research fell under the "task carried out in the public interest" clause.


If THAT is their defense (we all agreed to it even if we didn’t understand it at the time), well then good luck to them.


Isn’t public interest a criterion for getting IRB approval? Having read all the recent AoIR threads on ethics, this doesn’t seem outside of the accepted norms.


Accepted by who? Obviously not the person you are responding to. Ethics is a relative field, not one full of absolutes.


Why else would Condi Rice be on the board /puts on tinfoil hat


That doesn't stand up under the GDPR any more as far as I know.


GDPR explicitly exempts anonymized data:

"The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes." [1]

[1] https://gdpr-info.eu/recitals/no-26/


I imagine the work was carried out by a processor, which could be perfectly legal if the contract between the two entities had adequate data protection clauses. This is just a guess though, I'm sure it's much more complex than that.


If they asked consent specifically for these types of studies, then it's legal.

As in: a form asking the user if their information can be used in this way and giving them the possibility of opting out. Adding one more clause to the privacy policy doesn't count.


Opting out is not OK under the GDPR. Only opt-in is allowed, or at least that's my reading.

GDPR Recital 32:

Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject's agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement. This could include ticking a box when visiting an internet website, choosing technical settings for information society services or another statement or conduct which clearly indicates in this context the data subject's acceptance of the proposed processing of his or her personal data. Silence, pre-ticked boxes or inactivity should not therefore constitute consent. Consent should cover all processing activities carried out for the same purpose or purposes. When the processing has multiple purposes, consent should be given for all of them. If the data subject's consent is to be given following a request by electronic means, the request must be clear, concise and not unnecessarily disruptive to the use of the service for which it is provided.

https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=15323486...

Am I misunderstanding?


You are correct; however, this applies when consent is relied upon as the legal basis for processing.

My guess is that they are using provision of service as the legal basis for processing, whilst relying upon the "public interest" clause in the ToS to justify the sub-processing by the third party.


That doesn't work, you need to have a legal basis for all processing. It's hard to argue that operating the service requires this sort of research, so you need another basis.

There are some public interest exceptions, but as far as I know it's not established that something like this would qualify under them.


Yes, you are correct. I think it would be extremely difficult to justify that this kind of processing was necessary for the provision of service.

It seems to me that an organisation the size of Dropbox would have a fairly watertight justification. However, if the legal basis for processing is neither consent nor provision of service, then they must have done a pretty good job of obfuscating all PII (as the article says, "...we and Dropbox employees could view no personally identifiable information"). If this is the case then this sharing of information may not even be in scope of the GDPR.

I'm not sure if the public interest exceptions would be a safe route to go down. The EU has made it clear that, like 'Legitimate Interest', the get-out-of-jail-free justification is going to be highly scrutinised.

EDIT: I have just seen that the article has been edited to say that the anonymisation and aggregation was carried out by Dropbox before being transferred to the third party, which kind of kills the discussion.


They could be relying on the "public interest" part of their TOS, in which case they'd possibly argue that this processing was necessary as part of the provision of service, and therefore wouldn't require any further consent from the user.

For the record: I'm not suggesting that what they did was ok, just trying to think about it from a GDPR perspective. Anonymising account information is great and all, but how can you be sure you've obfuscated all PII from information saved to file storage, unless you audit all that information - which in and of itself seems ropey from a data protection point of view.


Why do you believe that the work under discussion was performed in the interest of the public?

It seems hardly necessary to share data with HBR so that Dropbox can offer file-sharing services...

Based on my reading here: https://ico.org.uk/for-organisations/guide-to-the-general-da... this does not apply.


So it looks as though the article has been edited to say the anonymisation and aggregation was carried out before being transferred to a third party, which craps on our discussion a bit.

However, to answer your question anyway - I don't believe you could justify the work as being in the public interest. I think it would be an extremely tenuous link and I think you'd be a fool to try and rely on something as flimsy as public interest if you're not a government body, or processing data on behalf of one.

I suppose I was taking a stab at understanding what their thinking was to see if anyone else could provide me with something which I had not considered.



I continue to be stunned that people still somehow think that these services have any modicum of consideration for their users.

If you don't want your information accessed--run your own servers, people. That's your only option.


I wonder how they got approval from the Northwestern Institutional Review Board. Not having explicit consent from research subjects might indicate that they qualified for some form of exempt status. Did they sell the board on the idea that Dropbox collects that data as part of its normal operation, and therefore consent is not required? Did they say that Dropbox's anonymization was enough to guarantee subjects' anonymity? Did they say that Dropbox's user agreement already enrolls users into research projects?


As a paying Dropbox customer who has never heard of the Northwestern Institutional Review Board, it's not them I'm pissed off at. I haven't reconsidered my usage of Dropbox for a very long time, since I made the decision to stay with them after their no-password fiasco. Today is the first day in a long time that I have.


Looks easy enough to ask: https://irb.northwestern.edu/participants/questions-and-conc... Participant questions go to eyates@northwestern.edu and non-participant questions go to irb@northwestern.edu.


This may come across as argumentative, but it's still a valid question - What's the harm to you?


Imagine you have a folder in your Dropbox with 237 subfolders, and each of those subfolders has a certain number of files in it. The largest folder has 1,132 files, for example, the second largest has 916, the third-largest has 771, etc.

Then imagine you have a second folder with 117 subfolders with another pattern like above.

Now imagine that the first folder structure matches a torrent of embarrassing pornography and the second appears to be a superset of a project published to GitHub under your name (i.e. with some directories being gitignored)
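To make that concrete, here is a minimal sketch in Python of how file counts alone could act as a fingerprint. All numbers beyond the three in the comment above are made up, and `signature` and `is_contained` are hypothetical helper names; nothing here reflects what the researchers or Dropbox actually did.

```python
from collections import Counter

def signature(file_counts):
    """Multiset of per-subfolder file counts; no file or folder names needed."""
    return Counter(file_counts)

def is_contained(public_sig, private_sig):
    """True if every count in the public signature also occurs (at least as
    often) in the private one, i.e. the private folder could be a superset."""
    return all(private_sig[count] >= n for count, n in public_sig.items())

# Hypothetical "anonymized" record: only the file count per subfolder survives.
anonymized_folder = signature([1132, 916, 771, 40, 12, 12, 3])

# Hypothetical public artifacts that someone could index ahead of time.
public_torrent = signature([1132, 916, 771, 40, 12, 12, 3])
public_repo = signature([40, 12, 3])   # some directories gitignored
unrelated = signature([1132, 900, 5])

exact_match = anonymized_folder == public_torrent
superset_match = is_contained(public_repo, anonymized_folder)
false_match = is_contained(unrelated, anonymized_folder)

print(exact_match, superset_match, false_match)
```

With a couple of hundred subfolders the count pattern gets specific very quickly, which is why stripping names does not by itself make this kind of metadata anonymous.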


Not the op but I'll answer for myself.

I've stored non-anonymized data on Dropbox as part of my own research. IRB gave me permission to keep that data and my consent form explained it to participants. We were all working under the assumption this type of sharing by Dropbox was impossible. My school's IRB does not allow the use of Google Drive for non-anonymized data storage based on just this type of concern.


The point of consent is that it isn't your (or Dropbox's) opinion on what constitutes harm that matters. It is the opinion of the person giving consent.


Maybe the parent was performing a research project on Dropbox collaboration techniques and got scooped?

But seriously, as an example, I know people that share sensitive personal information with their accountants at tax time using Dropbox. Would suck for any of that to be made available to any third parties.


Given it was from universities, then prior disclosure of IP for patent applications could be a "harm". Even the directory names and structures could be key information in some applications.


The harm is that 1) data which seems anonymized can be de-anonymized due to carelessness or advances in analytical techniques, and 2) it’s mine. I have laptops in my house for example that I’ve not used in years and will never use again. That doesn’t give you the right to steal them even if there’s no explicit “harm” to me.


> Dropbox gave us access to project-folder-related data, which we aggregated and anonymized, for all the scientists using its platform over the period from May 2015 to May 2017 — a group that represented 1,000 universities. This included information on a user’s total number of folders, folder structure, and shared folder access

This seems like heaven for industrial espionage purposes. Just because there's some anonymisation doesn't mean that the metadata is useless. I sincerely hope they get GDPR'd over this.


If it's complete "folder structure" tree data with names removed, that could potentially be matched against public repositories that contain the same folders, like posted Zip files, GitHub projects or university web space, to identify some of the users.

From there, you could potentially identify other tools those people are using or embarrassing folder structures (e.g., deep folder tree structures people used to use to primitively conceal porn and other secret files, or signature folder structures for embarrassing repositories like erotica archives, collections of extremist literature, or piracy tools).

Hopefully they'll release more info and some sample files (like for themselves).
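A rough sketch of the matching step, assuming the released data really were a full tree with names removed (an assumption; the article does not say). `shape_hash` is a hypothetical helper that hashes a tree by shape only, so a public repository and a renamed copy of it get the same fingerprint:

```python
import hashlib

def shape_hash(tree):
    """Hash a directory tree by shape only.

    `tree` maps names to subtrees (dict) or to None for a file.
    Names are ignored and children are sorted, so renaming or
    reordering entries does not change the result.
    """
    if tree is None:
        return "f"  # every file looks the same once names are stripped
    child_hashes = sorted(shape_hash(child) for child in tree.values())
    payload = "d(" + ",".join(child_hashes) + ")"
    return hashlib.sha256(payload.encode()).hexdigest()

# Hypothetical public project layout.
public_repo = {"src": {"main.py": None, "util.py": None}, "README.md": None}

# Hypothetical "anonymized" trees with names replaced by opaque IDs.
anonymized_same = {"a1": {"x": None, "y": None}, "b2": None}
anonymized_other = {"a1": {"x": None}, "b2": None}

matches = shape_hash(public_repo) == shape_hash(anonymized_same)
differs = shape_hash(public_repo) != shape_hash(anonymized_other)
print(matches, differs)
```

Tiny trees like these would collide with lots of unrelated folders; the worry is large, distinctive trees, where a shape match is much harder to explain as coincidence.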


> Just because there's some anonymisation

"which we [...] anonymized" doesn't sound like there was any anonymisation.


It is only data from universities, not real for-profit businesses.


> It is only data from universities, not real for-profit businesses.

Plenty of universities work (often in collaboration with national laboratories and/or with corporations) on work that's far more important and critical than many "real for-profit businesses".


This is not better or worse. It's just as bad. Do you not think universities handle sensitive information?


A large part of what universities do is research in consultation with businesses and especially governments. It's not just teaching students; in many universities, that's only a small fraction of what the staff do.


Ha. Have you seen what some countries charge for tuition?


University administrators are not shareholders! Their greed is much better for society...


Why the heck would an administrator care if they 'hold shares' when they are pulling 200k-500k per year as base pay?


Ouch! Is "greed" a bad word? It's not an inaccurate one...


The lack of consent requested is ridiculous. There may be some sort of 'obscure paragraph in the terms and conditions that says Dropbox can do whatever they want' but this is horrible for privacy and business security. I'm glad I've been client-side encrypting my Dropbox files.

For the past two years I've been using a free open-source encryption app called Cryptomator (https://cryptomator.org/) for my Dropbox folder without problems. The only caveat is the mobile apps aren't free.

Another Dropbox encryption app is Boxcryptor, but I quit using them when they went subscription-only.


I've just looked on Boxcryptor, and they appear to have a free tier. Are you sure it's subscription only?


Boxcryptor used to be a one-time purchase to allow an unlimited number of cloud providers and devices.

However they decided they needed a consistent revenue stream so they renamed their software "Boxcryptor Classic", stopped updating it, and now users have to pay $48/year to get features previously available as a one-time fee. This was about 2 years ago, by now they've probably scrubbed all references to the "Classic" version on their website.

To be fair the subscription version does have new group/admin features for multiple users or businesses.


Ah, okay I see. Thanks man.


Dropbox sharing your data without consent? Nothing new. I've been using Dropbox Paper for a year now and only recently found out that by default all docs are shareable. That means that if you log into Dropbox, open a Dropbox Paper doc, and then log out, you're not safe: anybody with your browsing history can re-access the document you've been working on even after you log out. So if you use Dropbox Paper on a shared or public/library computer and log out, people will still have access to the docs you worked on. The only way to turn this off is to manually click the 'invite' button and uncheck the share option for each and every document separately. I had some personal/sensitive info in a few docs and was shocked to learn of this. Completely unacceptable from Dropbox!


If you are leaving browser history in public libraries, you have bigger problems than Dropbox Paper.


Care to expand? I wouldn't usually think of browser history as particularly sensitive.


There are a _LOT_ of services that treat the URL as a secret. If you're leaving these URLs in your browser history on a public computer, other people can access them. For example, the images served up by Google Photos, which have the form lh3.googleusercontent.com/[kilobytes of base64-encoded spew], can be accessed by anybody who has the URL. So if you use a public computer to access these, even though you need to be authenticated to browse Google Photos, and despite the fact that you conscientiously logged out, anybody with access to the history can still look at your photos. This is not by any means the only such example.
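For anyone who hasn't run into the pattern, here is a toy sketch of a "URL is the secret" design. `ToyPhotoHost` and the `photos.example` domain are hypothetical; this is not how Google Photos or Dropbox Paper is actually implemented, only an illustration of why logging out doesn't help.

```python
import secrets

class ToyPhotoHost:
    """In-memory stand-in for a service that hands out unguessable URLs."""

    def __init__(self):
        self._by_token = {}      # token -> photo bytes
        self._sessions = set()   # users currently logged in

    def login(self, user):
        self._sessions.add(user)

    def logout(self, user):
        self._sessions.discard(user)

    def upload(self, user, photo):
        if user not in self._sessions:
            raise PermissionError("login required to upload")
        token = secrets.token_urlsafe(32)  # unguessable, but not revoked on logout
        self._by_token[token] = photo
        return f"https://photos.example/{token}"

    def fetch(self, url):
        # No session check here: possession of the URL is the only credential.
        token = url.rsplit("/", 1)[-1]
        return self._by_token.get(token)

host = ToyPhotoHost()
host.login("alice")
url = host.upload("alice", b"holiday photo")
browser_history = [url]          # left behind on the library computer
host.logout("alice")

leaked = host.fetch(browser_history[0])                    # next visitor
guessed = host.fetch("https://photos.example/not-the-token")
print(leaked, guessed)
```

The token is long enough that guessing is hopeless, so the whole security of the scheme rests on the URL never leaking, and browser history is exactly such a leak.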


I love Dropbox, but this kind of data-gathering without explicit permission is bananas. What I'd really love to see added to Dropbox is client-side encryption (i.e. I want to manage my own keys so nobody can monkey with my data). And yes, I know I can store an encrypted container inside Dropbox, but that defeats the purpose of easily accessing my data from every device.



Didn't know about this. Thank you!


Dropbox may not natively add client side encryption in the near future. The way Dropbox has stored data from the beginning is by deduplicating information across all its users to save space. So if you upload a book or a movie or a song and I upload the same, Dropbox stores one single copy of it for the both of us.

This is just a simplified explanation. The actual deduplication is done in smaller blocks.
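A minimal sketch of the idea in Python, and of why client-side encryption gets in its way. Assumptions: `BlockStore` is a hypothetical content-addressed store with a tiny block size, and `toy_encrypt` is a one-time-pad XOR used only as a stand-in for real encryption with per-user keys. None of this is Dropbox's actual implementation.

```python
import hashlib
import os

BLOCK_SIZE = 4  # absurdly small, just to keep the demo readable

class BlockStore:
    """Content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # sha256 hex digest -> block bytes

    def put(self, data):
        """Store `data` and return the list of block hashes describing it."""
        recipe = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(key, block)  # no-op if the block already exists
            recipe.append(key)
        return recipe

def toy_encrypt(data, pad):
    """XOR with a random pad of equal length (stand-in for real encryption)."""
    return bytes(a ^ b for a, b in zip(data, pad))

song = b"same song bytes!"  # 16 bytes -> 4 blocks

# Two users upload the same plaintext: the second upload adds nothing.
plain_store = BlockStore()
plain_store.put(song)
plain_store.put(song)
plain_blocks = len(plain_store.blocks)

# Each user encrypts with their own key first: the server sees unrelated bytes.
enc_store = BlockStore()
enc_store.put(toy_encrypt(song, os.urandom(len(song))))
enc_store.put(toy_encrypt(song, os.urandom(len(song))))
encrypted_blocks = len(enc_store.blocks)

print(plain_blocks, encrypted_blocks)
```

Once every user holds their own key, identical files stop looking identical to the server, so cross-user deduplication no longer saves any space.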

If you want client side encryption with Dropbox, you have to add a layer before the Dropbox client sees your files on your system, using Cryptomator or Boxcryptor or encrypted volumes with Veracrypt, etc.

Or you could switch to other online backup/sync services that claim to have client side encryption, like SpiderOak and a few others.


That's true, but I'm sure businesses (and many individuals) would pay a premium for native client-side encryption with keys they hold themselves.


What kind of junk statistics are these? HBR ought to be more discerning in what it publishes.

"How successful teams collaborate"... wait, I meant "the average number of users who update the same directories in Dropbox from institutions that tend to have influential research."

Sound insights. Make sure you're collaborating with no more than 2.3 people or else you'll have to move your research projects over to Yale.


Agreed, these sound like completely arbitrary measures.

I agree that senior researchers probably bring valuable experience and insight to research projects, but I don’t think you can validly arrive at that conclusion from the number of times they open a doc in Dropbox.


Yes, this was curious. I wonder if these insights (2.3 v 3 collaborators over 180 vs 130 days with the top person contributing x%) were really effective or just a coincidence.


> 2.3 v 3 collaborators over 180 vs 130 days

Yeah, that caught my eye too, especially the missing RMS, without which we can't see whether the difference between 2.3 and 3 is significant.
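To illustrate why the spread matters, here is a small Python sketch using Welch's t statistic. The means 2.3 and 3 are from the article; the standard deviations and sample sizes are entirely hypothetical, since the article reports neither.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Same means, two hypothetical scenarios for the (unreported) spread.
t_tight = welch_t(2.3, 0.5, 50, 3.0, 0.5, 50)  # small spread
t_wide = welch_t(2.3, 3.0, 50, 3.0, 3.0, 50)   # large spread

print(round(t_tight, 2), round(t_wide, 2))
```

With the tight spread the statistic comes out at about -7, which would be hard to dismiss; with the wide spread it is about -1.2, which is well within noise. Same headline numbers, opposite conclusions, so without the spread the comparison can't be judged.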


- The researchers claimed they could see "every Dropbox folder associated with a given researcher."

- Dropbox denies giving researchers non-anonymized user data

https://www.zdnet.com/article/dropbox-denies-giving-research...


As the linked article states, all personally identifiable information is removed, but you still want to be able to say "Alice worked with Bob in folder 1, and that same Alice worked with Charlie in folder 2", so you assign unique identifiers to each user, such that you can't tie Alice to "Prof. Smith at University of Chicago", but you can tie folder 1 and folder 2 to the same Alice.
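A minimal sketch of that kind of pseudonymization in Python, assuming a keyed hash (HMAC) with a secret that only the data holder keeps. This is one common way to do it, not necessarily what Dropbox did; `pseudonym` and the example addresses are hypothetical.

```python
import hashlib
import hmac
import secrets

# Secret kept by the data holder; without it the mapping cannot be recomputed.
SECRET_KEY = secrets.token_bytes(32)

def pseudonym(user_id: str) -> str:
    """Stable opaque identifier: same input, same output, no name inside."""
    digest = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:12]

# Hypothetical raw access records: (user, folder).
events = [
    ("alice@university.example", "folder1"),
    ("bob@university.example", "folder1"),
    ("alice@university.example", "folder2"),
    ("charlie@university.example", "folder2"),
]

released = [(pseudonym(user), folder) for user, folder in events]

same_alice = released[0][0] == released[2][0]      # linkable across folders
alice_is_not_bob = released[0][0] != released[1][0]
print(released)
```

Note that this keeps the collaboration graph fully intact, which is the point for the research and also the reason such data can sometimes be re-identified from structure alone.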


The GDPR has provisions for information like this, specifically to say that small pieces of information can together still constitute personal data. Consider that you can retrieve names if you map someone's professional interactions with this kind of detail.

Regardless of whether or not the GDPR applies to these people, it's a useful tool to illustrate why this kind of data is still wrong to share (especially without any kind of consent!).


If using a Mac, it’s pretty easy to encrypt a drive and store it in Dropbox, then mount it when you want to use it. Kind of negates the whole point of Dropbox (mobile access, small sync etc) but I started doing it for more sensitive things.

Doesn’t require any 3rd party addons.

I really can’t believe they shared this data. Universities do work for businesses all the time. Imagine a folder of research subjects organized by geo/age/sex then full patient name or SSN, under a folder called HIV survey or something. I mean really?


I'm sorry, this might be a cliched post that doesn't add to the discussion, but if they do things like this to academics, what do they do with other people's private data that we don't know about?


This was a very poor read, and just listed correlations based on some numbers. I personally didn't learn anything that I, or anyone else, could apply. It's just a "correlation=causation" based list.


How is this analysis even valuable? "The average number of people on a project at a top-10% university was 2.3"

Analysis where the majority of the projects have fewer than 3 people tells you nothing about how to collaborate.


"To invesitage the impact on peformance"

Wow, can't HBR afford a proofreader?


Or a spell-checker at least.


How do you determine "whether they were senior or junior faculty" from anonymized data?


> Teams at lower-performing institutions were more likely to have one person or a small number of people doing more of the “heavy lifting.”

Is there some way to address this when it does happen? Or is it just a matter of the right people being involved?


My guess is that this is a symptom of only specific people being competent/engaged. Alternatively, this is specific team members hating process.

In the latter case, I think that means you adjust process to be lighter. In the former, there is no procedural fix, only hiring fixes.


I am surprised by the comments here. No one seems to have actually read the article nor taken the time to learn about how Dropbox partners with researchers: https://blogs.dropbox.com/business/2018/06/nico-customer-sto...


The five rules listed feel more like correlation and less like cause.

Also, there's more to true collaboration than sharing files.



