Study of Thousands of Dropbox Projects Reveals How Successful Teams Collaborate

kerng · on July 24, 2018

>> Dropbox gave us access to project-folder-related data, which we aggregated and anonymized[...]

Wait, Dropbox gave away non-anonymized data to a third party and they then anonymized it. Wow, what could go wrong? Just thinking of the endless possibilities of where all that data is now... It's deeply troubling how much unwarranted trust there is when it comes to the handling of personal data.

xevb3k · on July 24, 2018

This is not at all surprising.

Dropbox put Condoleezza Rice on their board, who supports warrantless wiretaps [1].

I deleted my account when they did that. Not so much because it would have any direct effect, but because it’s clear that we have differing views on how user data should be treated.

I'm surprised that people are shocked by them treating user data like this, it’s absolutely in character.

[1] http://www.drop-dropbox.com

TomK32 · on July 24, 2018

just deleted mine (which I hadn't used in years anyways).

j88439h84 · on July 24, 2018

It's been edited:

> Dropbox gave us access to project-folder-related data, which Dropbox had aggregated and anonymized,

> we and Dropbox employees could view no personally identifiable information

> Editor’s note: We’ve clarified this article to say that Dropbox anonymized and aggregated the data before providing it for this analysis.

bad_user · on July 24, 2018

I just cancelled my subscription.

Truth be told I've also been dissatisfied with the price of the Plus and Pro subscriptions, relative to what they provide, with their support and their direction, so was looking for motivation to move.

This is just icing on the cake.

c3534l · on July 24, 2018

Given the very public failure of Netflix to "anonymize" data, they shouldn't even be giving away anonymized data without user permission, on account of anonymized data not actually being anonymous.

greggman · on July 24, 2018

Isn't it already in their TOS that they'll share any data they feel like with anyone they feel like?

https://www.dropbox.com/terms#privacy

It certainly says the words:

"We may share information as discussed below, ... Others working for and with Dropbox."

plg · on July 23, 2018

This is deeply troubling. As a scientist who uses* Dropbox I gave no informed consent. I know they claim personally identifiable information was removed but still I gave no consent for this.

*not for long, perhaps

yborg · on July 23, 2018

I can't speak to how informed you were when you gave the consent, but if you are using the service, you provided it.

"Law & Order and the Public Interest. We may disclose your information to third parties if we determine that such disclosure is reasonably necessary to: (a) comply with any applicable law, regulation, legal process, or appropriate government request; (b) protect any person from death or serious bodily injury; (c) prevent fraud or abuse of Dropbox or our users; (d) protect Dropbox’s rights, property, safety, or interest; or (e) perform a task carried out in the public interest."

I would assume that this research fell under the "task carried out in the public interest" clause.

plg · on July 23, 2018

If THAT is their defense (we all agreed to it even if we didn’t understand it at the time), well then good luck to them.

Alex3917 · on July 24, 2018

Isn’t public interest a criterion for getting IRB approval? Having read all the recent AoIR threads on ethics, this doesn’t seem outside of the accepted norms.

krageon · on July 24, 2018

Accepted by who? Obviously not the person you are responding to. Ethics is a relative field, not one full of absolutes.

smokeyj · on July 23, 2018

Why else would Condi Rice be on the board /puts on tinfoil hat

IshKebab · on July 24, 2018

That doesn't stand up under the GDPR any more as far as I know.

TheCoelacanth · on July 24, 2018

GDPR explicitly exempts anonymized data:

"The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes." [1]

[1] https://gdpr-info.eu/recitals/no-26/

trampypizza · on July 24, 2018

I imagine the work was carried out by a processor, which could be perfectly legal if the contract between the two entities had adequate data protection clauses. This is just a guess though, I'm sure it's much more complex than that.

blub · on July 24, 2018

If they asked consent specifically for these types of studies, then it's legal.

As in: a form asking the user if their information can be used in this way and giving them the possibility of opting out. Adding one more clause to the privacy policy doesn't count.

greggman · on July 24, 2018

Opting out is not OK under the GDPR. Only opt-in is allowed, or at least that's my reading.

GDPR recital 32:

Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject's agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement. This could include ticking a box when visiting an internet website, choosing technical settings for information society services or another statement or conduct which clearly indicates in this context the data subject's acceptance of the proposed processing of his or her personal data. Silence, pre-ticked boxes or inactivity should not therefore constitute consent. Consent should cover all processing activities carried out for the same purpose or purposes. When the processing has multiple purposes, consent should be given for all of them. If the data subject's consent is to be given following a request by electronic means, the request must be clear, concise and not unnecessarily disruptive to the use of the service for which it is provided.

https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=15323486...

Am I misunderstanding?

trampypizza · on July 24, 2018

You are correct, however this is for when consent is relied upon as the legal basis for processing.

My guess is that they are using provision of service as the legal basis for processing, whilst relying upon the "public interest" clause in the ToS to justify the sub-processing by the third party.

detaro · on July 24, 2018

That doesn't work, you need to have a legal basis for all processing. It's hard to argue that operating the service requires this sort of research, so you need another basis.

There's some public interest exceptions, but from my knowledge it's not established that stuff like this would work under it.

trampypizza · on July 24, 2018

Yes, you are correct. I think it would be extremely difficult to justify that this kind of processing was necessary for the provision of service.

It seems to me that an organisation the size of Dropbox would have a fairly watertight justification. However if the legal basis for processing is neither consent nor provision of service, then they must have done a pretty good job of obfuscating all PII (as the article says, "...we and Dropbox employees could view no personally identifiable information."). If this is the case then this sharing of information may not even be in scope of GDPR.

I'm not sure if the public interest exceptions would be a safe route to go down. The EU has made it clear that, like 'Legitimate Interest', the get-out-of-jail-free justification is going to be highly scrutinised.

EDIT: I have just seen that the article has been edited to say that the anonymisation and aggregation was carried out by Dropbox before being transferred to the third party, which kind of kills the discussion.

trampypizza · on July 24, 2018

They could be relying on the "public interest" part of their TOS, in which case they'd possibly argue that this processing was necessary as part of the provision of service, and therefore wouldn't require any further consent from the user.

For the record: I'm not suggesting that what they did was ok, just trying to think about it from a GDPR perspective. Anonymising account information is great and all, but how can you be sure you've obfuscated all PII from information saved to file storage, unless you audit all that information - which in and of itself seems ropey from a data protection point of view.

blub · on July 24, 2018

Why do you believe that the work under discussion was performed in the interest of the public?

It seems hardly necessary to share data with HBR so that Dropbox can offer file-sharing services...

Based on my reading here: https://ico.org.uk/for-organisations/guide-to-the-general-da... this does not apply.

trampypizza · on July 24, 2018

So it looks as though the article has been edited to say the anonymisation and aggregation was carried out before being transferred to a third party, which craps on our discussion a bit.

However, to answer your question anyway - I don't believe you could justify the work as being in the public interest. I think it would be an extremely tenuous link and I think you'd be a fool to try and rely on something as flimsy as public interest if you're not a government body, or processing data on behalf of one.

I suppose I was taking a stab at understanding what their thinking was to see if anyone else could provide me with something which I had not considered.

raphman · on July 23, 2018

Exactly. See also https://twitter.com/kopfnuss/status/1020575998205710336 and https://twitter.com/RaphaelWimmer/status/1020582873345163264

bsder · on July 24, 2018

I continue to be stunned that people still somehow think that these services have any modicum of consideration for their users.

If you don't want your information accessed--run your own servers, people. That's your only option.

pacbard · on July 24, 2018

I wonder how they got approval from the Northwestern Institutional Review Board. Not having explicit consent from research subjects might indicate that they qualified for some form of exempt status. Did they sell them that Dropbox collects that data as part of their normal operation, therefore consent is not required? Did they say that Dropbox's anonymization was enough to guarantee subjects' anonymity? Did they say that Dropbox's user agreement already enrolls users into research projects?

ggg9990 · on July 24, 2018

As a Dropbox paying customer and never having heard of the Northwestern Institutional Review Board it's not them that pisses me off. I haven't reconsidered my usage of Dropbox for a very long time, since I made the decision to stay with them after their no-password fiasco. Today is the first day in a long time.

fhsm · on July 24, 2018

Looks easy enough to ask: https://irb.northwestern.edu/participants/questions-and-conc... Participant questions go to eyates@northwestern.edu and non-participant questions go to irb@northwestern.edu.

ErikVandeWater · on July 23, 2018

This may come across as argumentative, but it's still a valid question - What's the harm to you?

ggg9990 · on July 24, 2018

Imagine you have a folder in your Dropbox with 237 subfolders, and each of those subfolders has a certain number of files in it. The largest folder has 1,132 files, for example, the second largest has 916, the third-largest has 771, etc.

Then imagine you have a second folder with 117 subfolders with another pattern like above.

Now imagine that the first folder structure matches a torrent of embarrassing pornography and the second appears to be a superset of a project published to GitHub under your name (i.e. with some directories being gitignored)
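To make the matching concrete, here is a minimal sketch. It assumes the anonymized data exposes nothing but the number of files in each subfolder; the function names and every count other than the three quoted above are hypothetical, and real matching would have to tolerate noise.

```python
from collections import Counter


def signature(file_counts):
    """Multiset of per-subfolder file counts; folder names and order are ignored."""
    return Counter(file_counts)


def overlap(anon_counts, public_counts):
    """Fraction of the public listing's counts that also occur in the anonymized folder."""
    anon, public = signature(anon_counts), signature(public_counts)
    shared = sum((anon & public).values())
    return shared / max(1, sum(public.values()))


def is_superset(anon_counts, public_counts):
    """True if every count in the public listing is present in the anonymized folder."""
    return not (signature(public_counts) - signature(anon_counts))


# Hypothetical data: the anonymized folder has two extra (say, gitignored) subfolders.
anonymized = [1132, 916, 771, 40, 12]
public_listing = [1132, 916, 771]

print(overlap(anonymized, public_listing))
print(is_superset(anonymized, public_listing))
```

No names are needed for the match; the counts alone act as the fingerprint.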

avs733 · on July 24, 2018

Not the op but I'll answer for myself.

I've stored non-anonymized data on Dropbox as part of my own research. IRB gave me permission to keep that data and my consent form explained it to participants. We were all working under the assumption this type of sharing by Dropbox was impossible. My school's IRB does not allow the use of Google Drive for non-anonymized data storage based on just this type of concern.

another-one-off · on July 23, 2018

The point of consent is that it isn't your (or Dropbox's) opinion on what constitutes harm that matters. It is the opinion of the person giving consent.

jeffwass · on July 23, 2018

Maybe the parent was performing a research project on Dropbox collaboration techniques and got scooped?

But seriously, as an example, I know people that share sensitive personal information with their accountants at tax time using Dropbox. Would suck for any of that to be made available to any third parties.

nl · on July 24, 2018

Given it was from universities, then prior disclosure of IP for patent applications could be a "harm". Even the directory names and structures could be key information in some applications.

ggg9990 · on July 23, 2018

The harm is that 1) data which seems anonymized can be de-anonymized due to carelessness or advances in analytical techniques, and 2) it’s mine. I have laptops in my house for example that I’ve not used in years and will never use again. That doesn’t give you the right to steal them even if there’s no explicit “harm” to me.

zkms · on July 23, 2018

> Dropbox gave us access to project-folder-related data, which we aggregated and anonymized, for all the scientists using its platform over the period from May 2015 to May 2017 — a group that represented 1,000 universities. This included information on a user’s total number of folders, folder structure, and shared folder access

This seems like heaven for industrial espionage purposes. Just because there's some anonymisation doesn't mean that the metadata is useless. I sincerely hope they get GDPR'd over this.

smelendez · on July 24, 2018

If it's complete "folder structure" tree data with names removed, that could potentially be matched against public repositories that contain the same folders, like posted Zip files, GitHub projects or university web space, to identify some of the users.

From there, you could potentially identify other tools those people are using or embarrassing folder structures (e.g., deep folder tree structures people used to use to primitively conceal porn and other secret files, or signature folder structures for embarrassing repositories like erotica archives, collections of extremist literature, or piracy tools).

Hopefully they'll release more info and some sample files (like for themselves).
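A rough sketch of what matching a nameless tree could look like, assuming the folder structure is available as nested dicts. The `shape` helper and both example trees are made up for illustration, not taken from the study.

```python
def shape(tree):
    """Canonical string for a nested-dict folder tree, ignoring all folder names."""
    return "(" + "".join(sorted(shape(child) for child in tree.values())) + ")"


# Hypothetical trees: one with anonymized labels, one from a public repository.
anonymized = {"a1": {"b1": {}, "b2": {"c1": {}}}, "a2": {}}
public_repo = {"docs": {}, "src": {"tests": {"data": {}}, "lib": {}}}

print(shape(anonymized) == shape(public_repo))
```

Sorting the child encodings makes the result independent of folder order, so two trees compare equal whenever their shapes agree, regardless of what the folders are called.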

stavros · on July 23, 2018

> Just because there's some anonymisation

"which we [...] anonymized" doesn't sound like there was any anonymisation.

kimdotcom · on July 24, 2018

It is only data from universities, not real for-profit businesses.

zkms · on July 24, 2018

> It is only data from universities, not real for-profit businesses.

Plenty of universities work (often in collaboration with national laboratories and/or with corporations) on work that's far more important and critical than many "real for-profit businesses".

krageon · on July 24, 2018

This is not better or worse. It's just as bad. Do you not think universities handle sensitive information?

barrkel · on July 24, 2018

A large part of what universities do is research in consultation with businesses and especially governments. It's not just teaching students; in many universities, that's only a small fraction of what the staff do.

bigkm · on July 24, 2018

ha. Have you seen what some countries charge for tuition.

jessaustin · on July 24, 2018

University administrators are not shareholders! Their greed is much better for society...

toast_coder · on July 24, 2018

Why the heck would an administrator care if they 'hold shares' when they are pulling 200k-500k per year as base pay?

jessaustin · on July 24, 2018

Ouch! Is "greed" a bad word? It's not an inaccurate one...

bmarquez · on July 24, 2018

The lack of consent requested is ridiculous. There may be some sort of 'obscure paragraph in the terms and conditions that says Dropbox can do whatever they want' but this is horrible for privacy and business security. I'm glad I've been client-side encrypting my Dropbox files.

For the past two years I've been using a free open-source encryption app called Cryptomator (https://cryptomator.org/) for my Dropbox folder without problems. The only caveat is the mobile apps aren't free.

Another Dropbox encryption app is BoxCryptor, but I quit using them when they went subscription-only.

ycombinete · on July 24, 2018

I've just looked on Boxcryptor, and they appear to have a free tier. Are you sure it's subscription only?

bmarquez · on July 24, 2018

Boxcryptor used to be a one-time purchase to allow an unlimited number of cloud providers and devices.

However they decided they needed a consistent revenue stream so they renamed their software "Boxcryptor Classic", stopped updating it, and now users have to pay $48/year to get features previously available as a one-time fee. This was about 2 years ago, by now they've probably scrubbed all references to the "Classic" version on their website.

To be fair the subscription version does have new group/admin features for multiple users or businesses.

ycombinete · on July 24, 2018

Ah, okay I see. Thanks man.

tempaccount777 · on July 24, 2018

Dropbox sharing your data without consent? Nothing new. I've been using Dropbox Paper for a year now and only recently found out that by default all docs are shareable. That means that if you log into Dropbox, open a Dropbox Paper doc, and log out, you're not safe. Anybody with your browsing history can re-access the document you've been working on even after you log out. So essentially, if you use Dropbox Paper on a shared or public/library computer and log out, people will still have access to the docs you've worked on. The only way to turn this off is to manually click the 'invite' button and uncheck the share option for each and every document separately. I had some personal/sensitive info on a few docs and was shocked to learn of this. Completely unacceptable from Dropbox!

ebikelaw · on July 24, 2018

If you are leaving browser history in public libraries, you have bigger problems than Dropbox Paper.

j_koreth · on July 24, 2018

Care to expand? I wouldn't usually think of browser history as particularly sensitive.

ebikelaw · on July 24, 2018

There are a _LOT_ of services that treat the URL as a secret. If you're leaving these URLs in your browser history on a public computer, other people can access them. For example, the images served up by Google Photos, which have the form lh3.googleusercontent.com/[kilobytes of base64-encoded spew], can be accessed by anybody who has the URL. So if you use a public computer to access these, even though you need to be authenticated to browse Google Photos, and despite the fact that you conscientiously logged out, anybody with access to the history can still look at your photos. This is not by any means the only such example.
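A toy sketch of the "URL as a secret" pattern, to show why logging out doesn't help. The `SecretLinkStore` class and the example.invalid host are hypothetical; this is not how Google Photos or Dropbox Paper is actually implemented.

```python
import secrets


class SecretLinkStore:
    """Toy store where knowing the URL is the only credential."""

    def __init__(self):
        self._docs = {}

    def publish(self, content):
        # An unguessable token is embedded in the URL itself.
        token = secrets.token_urlsafe(32)
        self._docs[token] = content
        return f"https://example.invalid/d/{token}"

    def fetch(self, url, logged_in=False):
        # The login state is never consulted: possession of the URL is enough.
        token = url.rsplit("/", 1)[-1]
        return self._docs.get(token)


store = SecretLinkStore()
url = store.publish("holiday photo")

# Someone who finds the URL in the browser history, without logging in:
print(store.fetch(url, logged_in=False))
```

Guessing the token is impractical, but anything that records the URL (history, logs, bookmarks) effectively holds the key.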

whitepoplar · on July 23, 2018

I love Dropbox, but this kind of data-gathering without explicit permission is bananas. What I'd really love to see added to Dropbox is client-side encryption (i.e. I want to manage my own keys so nobody can monkey with my data). And yes, I know I can store an encrypted container inside Dropbox, but that defeats the purpose of easily accessing my data from every device.

swaroop · on July 24, 2018

See https://cryptomator.org/

kilroy123 · on July 24, 2018

Didn't know about this. Thank you!

newscracker · on July 24, 2018

Dropbox may not natively add client side encryption in the near future. The way Dropbox has stored data from the beginning is by deduplicating information across all its users to save space. So if you upload a book or a movie or a song and I upload the same, Dropbox stores one single copy of it for the both of us.

This is just a simplified explanation. The actual deduplication is done in smaller blocks.
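A minimal sketch of block-level deduplication and of why per-user encryption works against it. The block size, the `BlockStore` class, and the XOR "cipher" are all illustrative assumptions; the XOR is a stand-in for real encryption and is not secure, and none of this reflects Dropbox's actual storage format.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, for illustration only


class BlockStore:
    """Content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}

    def put(self, data):
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # no-op if already stored
            refs.append(digest)
        return refs


def toy_encrypt(data, key):
    """Stand-in for client-side encryption. NOT real cryptography."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


plain = BlockStore()
alice = plain.put(b"same file contents")
bob = plain.put(b"same file contents")
print(alice == bob, len(plain.blocks))  # identical refs, blocks stored once

encrypted = BlockStore()
enc_alice = encrypted.put(toy_encrypt(b"same file contents", b"k1"))
enc_bob = encrypted.put(toy_encrypt(b"same file contents", b"k2"))
print(enc_alice == enc_bob, len(encrypted.blocks))  # nothing is shared
```

Once each user encrypts with their own key, the server sees unrelated bytes and has nothing left to deduplicate.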

If you want client side encryption with Dropbox, you have to add a layer before the Dropbox client sees your files on your system, using Cryptomator or Boxcryptor or encrypted volumes with Veracrypt, etc.

Or you could switch to other online backup/sync services that claim to have client side encryption, like SpiderOak and a few others.

whitepoplar · on July 24, 2018

That's true, but I'm sure businesses (and many individuals) would pay a premium for native client-side encryption with keys they hold themselves.

unepipe · on July 23, 2018

What kind of junk statistics are these? HBR ought to be more discerning in what it publishes.

"How successful teams collaborate"... wait, I meant "the average number of users who update the same directories in Dropbox from institutions that tend to have influential research."

Sound insights. Make sure you're collaborating with no more than 2.3 people or else you'll have to move your research projects over to Yale.

jmknoll · on July 24, 2018

Agreed, these sound like completely arbitrary measures.

I agree that senior researchers probably bring valuable experience and insight to research projects, but I don’t think you can validly arrive at that conclusion from the number of times they open a doc in Dropbox.

projectramo · on July 24, 2018

Yes, this was curious. I wonder if these insights (2.3 v 3 collaborators over 180 vs 130 days with the top person contributing x%) were really effective or just a coincidence.

jpmattia · on July 24, 2018

> 2.3 v 3 collaborators over 180 vs 130 days

Yeah, that caught my eye too especially the missing RMS so we could see whether the difference between 2.3 and 3 is significant.
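To illustrate why the spread matters, a quick sketch using Welch's t-statistic. Only the two means (2.3 and 3) come from the article; the standard deviations and sample sizes below are invented, because the article reports none.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t-statistic for the difference between two sample means."""
    return (mean_a - mean_b) / math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)


# Hypothetical scenario 1: tight spread, many projects.
tight = welch_t(2.3, 0.5, 100, 3.0, 0.5, 100)

# Hypothetical scenario 2: wide spread, few projects.
wide = welch_t(2.3, 4.0, 20, 3.0, 4.0, 20)

print(round(tight, 2), round(wide, 2))
```

With the same two means, the first scenario gives a |t| far above 2 and the second a |t| well below 1, so without the spread there's no way to tell which world the article's numbers come from.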

primedteam · on July 23, 2018

- The researchers claimed they could see "every Dropbox folder associated with a given researcher."

- Dropbox denies giving researchers non-anonymized user data

https://www.zdnet.com/article/dropbox-denies-giving-research...

staticfloat · on July 24, 2018

As the linked article states, all personally identifiable information is removed, but you still want to be able to say "Alice worked with Bob in folder 1, and that same Alice worked with Charlie in folder 2", so you assign unique identifiers to each user, such that you can't tie Alice to "Prof. Smith at University of Chicago", but you can tie folder 1 and folder 2 to the same Alice.
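A minimal sketch of that kind of pseudonymization, assuming random identifiers are assigned per user. The `Pseudonymizer` class and the sample records are hypothetical, not a description of what Dropbox actually did.

```python
import secrets


class Pseudonymizer:
    """Maps each real identity to a stable random alias."""

    def __init__(self):
        self._aliases = {}

    def alias(self, real_name):
        # The same person always gets the same alias; the alias reveals nothing by itself.
        if real_name not in self._aliases:
            self._aliases[real_name] = "user-" + secrets.token_hex(8)
        return self._aliases[real_name]


# Hypothetical raw records: folder -> members.
raw = {
    "folder 1": ["Prof. Smith", "Bob"],
    "folder 2": ["Prof. Smith", "Charlie"],
}

p = Pseudonymizer()
released = {folder: [p.alias(m) for m in members] for folder, members in raw.items()}

# The same alias shows up in both folders, so collaboration can still be linked.
print(released["folder 1"][0] == released["folder 2"][0])
```

The mapping table stays with whoever did the anonymizing; the released data keeps the links between folders without the names.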

krageon · on July 24, 2018

The GDPR has provisions for information like this, specifically to say that small pieces of information can together still constitute personal data. Consider that you can retrieve names if you map someone's professional interactions with this kind of detail.

Regardless of whether or not the GDPR applies to these people, it's a useful tool to illustrate why this kind of data is still wrong to share (especially without any kind of consent!).

sbr464 · on July 24, 2018

If using a Mac, it’s pretty easy to encrypt a drive and store it in Dropbox, then mount it when you want to use it. Kind of negates the whole point of Dropbox (mobile access, small sync etc) but I started doing it for more sensitive things.

Doesn’t require any 3rd party addons.

I really can’t believe they shared this data. Universities do work for businesses all the time. Imagine a folder of research subjects organized by geo/age/sex then full patient name or SSN, under a folder called HIV survey or something. I mean really?

brian_herman · on July 24, 2018

I am sorry, this might be a cliched post that doesn't add to the discussion, but if they do things like this to academics, what do they do with other people's private data that we don't know about?

newscracker · on July 24, 2018

This was a very poor read, and just listed correlations based on some numbers. I personally didn't learn anything that I, or anyone else, could apply. It's just a "correlation=causation" based list.

mywacaday · on July 24, 2018

How is this analysis even valuable "The average number of people on a project at a top-10% university was 2.3"

Analysis where the majority of the projects have less than 3 people tells you nothing on how to collaborate.

rahimnathwani · on July 23, 2018

"To invesitage the impact on peformance"

Wow, can't HBR afford a proofreader?

Cyphase · on July 23, 2018

Or a spell-checker at least.

herf · on July 24, 2018

How do you determine "whether they were senior or junior faculty" from anonymized data?

projektir · on July 23, 2018

> Teams at lower-performing institutions were more likely to have one person or a small number of people doing more of the “heavy lifting.”

Is there some way to address this when it does happen? Or is it just a matter of the right people being involved?

munchbunny · on July 23, 2018

My guess is that this is a symptom of only specific people being competent/engaged. Alternatively, this is specific team members hating process.

In the latter case, I think that means you adjust process to be lighter. In the former, there is no procedural fix, only hiring fixes.

Dowwie · on July 24, 2018

I am surprised by the comments here. No one seems to have actually read the article nor taken the time to learn about how Dropbox partners with researchers: https://blogs.dropbox.com/business/2018/06/nico-customer-sto...

chiefalchemist · on July 24, 2018

The five rules listed feel more like correlation and less like cause.

Also, there's more to true collaboration than sharing files.