Dropship — successor to torrents? (forwardfeed.pl)
127 points by herbatnic on April 24, 2011 | 69 comments



This isn't even remotely similar to bittorrent, it's more akin to rapidshare et al. You're completely at the whim of a 3rd party, Dropbox. And I'm pretty sure something like this would violate whatever contract you agree to when signing up.

So no, not a successor to torrents.


I'm pretty sure I've seen stuff suggesting that Dropbox occasionally purges copyrighted files from its system (which is made super-easy by the hash fingerprinting system that it uses to deduplicate storage) so I agree that this is not likely to meet most people's use case for torrents; i.e. stealing copyrighted music and films.

I wonder why the github repo has been taken down.


Arash (the CTO) asked me to, in a really civil way. So I decided to respect his wish and take down the repository.

Myself, I really regarded dropship as a nice feature. As Dropbox had implemented the great idea of putting all humanity's data in one big hash-addressable vat, sharing is a logical extension. If you would cache the popular blocks locally (dropbox already does this in a way with LAN P2P), global data distribution would be pretty much a solved problem.

Obviously, this affects legal and illegal files in the same way. It's really a shame that people are still so obsessed with the illegal applications, that they become blinded to how useful this is for legal ones.


Did he give any rationale for his request?


Yes, as I kind of hinted at in my post, the main reason is that they don't want the stigma that is associated with file sharing.

Even though there is a lot of (social) legal sharing going on between users, the focus is always on illegal sharing. He has a point there, though I think it's a pity.

IMO it's not even that suited to piracy, as the deduplication means that they can find everyone that has a file! Torrents are way better for that.

The principles of dropship could be used for sharing photos, videos, public datasets, git-like source control, or even as building block for wiki-like distributed databases. The possibilities are endless when every file can be called up with just its hash.


There is a way around this. Charge the person sharing the file a certain amount of money after a certain bandwidth (rather than the person downloading the file). This would virtually prevent large scale piracy without preventing many other usages.


I've been sending TV shows to friends privately since I started using dropbox. Never seen a takedown. As long as they don't get a dmca, I doubt they care.

If they did how hard would it be to pad media files with some salt to break hashing anyways? Not hard at all...


I've been doing the same, but on a very small scale. Mostly sending a funny episode of some show to a group of friends or occasionally sending a movie to my folks. I don't doubt that if I was mass distributing these files it would attract attention.

Also, wouldn't breaking the hash nullify one of the ostensible advantages of this method (the de-dupe of the stored files)? If the goal is solving global file distribution, making each copy of every originally-identical file unique - and therefore requiring n times the storage - isn't a viable solution.
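The trade-off in the two comments above can be sketched in a few lines. This is only an illustration, assuming a store that keys content by its SHA-256 digest; the payload and the `store` dictionary are made up for the example.

```python
import hashlib

original = b"example media payload" * 1000  # stand-in for a media file
salted = original + b"\x00"                  # same file with one padding byte

h_original = hashlib.sha256(original).hexdigest()
h_salted = hashlib.sha256(salted).hexdigest()

# Hypothetical content-addressed store: identical content is kept once.
store = {}
for blob in (original, original, salted):
    store[hashlib.sha256(blob).hexdigest()] = blob

print(h_original != h_salted)  # the salt changes the fingerprint
print(len(store))              # number of distinct copies actually stored
```

Two identical uploads collapse into one entry, while the salted copy takes its own slot, which is the n-times-storage cost raised above.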


I think that's a very poor business choice by Arash. Third party developers need freedom.


They don't want their brand associated with piracy. According to the developer, they have resolved the issue in a civil way. I don't see a problem here.


Think of all the bandwidth charges Dropbox would be incurring if this took off. They'd have to make the service more expensive for everyone.


How do they decide which copyrighted content to delete? The files in the Dropbox are by default not public. Merely having copyrighted files in your Dropbox is certainly no violation of copyright law.

At which point does it become illegal? Is sharing it with one or two people ok? I would think that even putting it in your public folder is not necessarily illegal: What if you don't share the link publicly (or only with one or two people)?

Services like Rapidshare thrive on those ambiguities. They let you upload any file and give you a link, only after this link really becomes public will they take down copyrighted content (which introduces a time delay).

I have actually never seen that happen with Dropbox links (which, I think, is the right strategy for them: It would be bad for their brand if they were to become "that piracy website"), so they must be doing something different.


I have copyrighted material in my Dropbox right now. It's copyrighted by me and my business partners. We're making a film and the material on our dropbox will eventually make it into the public eye.

We're not Big Media people, but what about other content creators? Especially musical collaborators...


> Merely having copyrighted files in your Dropbox is certainly no violation of copyright law.

Actually, it could be. Copyright means exactly that: the right to copy.


Yes, and having a copy of something doesn't mean that you don't have the right to have this copy.


Copyright law is quite a bit more complicated than that. It's at any rate not only the copyright owner who is allowed to make a copy. You can, for example, rip your CDs and copy those files on your HDD as often as you want.


Actually strictly speaking under the copyright law in this country, you cannot. It says "all rights reserved" on my CDs, and one of those rights is literally the right to copy. Bear in mind these laws were written a long time ago, when consumers did not have the means to make unauthorised copies, to prevent mass infringement.


You can. 17 U.S.C. § 1008 (http://www.law.cornell.edu/uscode/17/1008.html) allows consumers to make non-commercial copies (both digital and analog).

As I said, copyright law is quite complex and full of exceptions and clarifications.


Can anyone give a reference for this or indicate if it's true? I use Dropbox to backup my purchased music downloads; the thought that when my hard disk crashes, I can't restore them from my Dropbox because they might have been "purged" is rather worrying.


If you're purchasing from iTunes, the files have your email and other personal info in them which would give them a unique hash and differentiate them from the content that was popular/being deleted.


dropbox doesn't encrypt your data either, so it's generally not a good idea to keep any kind of your sensitive data unencrypted there.

i guess storing something like a small truecrypt volume there would be just about enough.


Agreed. It's still a really neat hack, though.


From the README, in case it wasn't obvious:

"These utilities make use of the deduplication scheme of Dropbox__ to allow for "teleporting" files into your Dropbox account given only a list of hashes, provided of course that the files already exist on their servers. This enables arbitrary, anonymous transfers of files between Dropbox accounts."

Between this and the minor information leakage issue I suspect Dropbox will be making changes to their deduplication scheme.

A simple way to fix both of these issues is to require each user to upload the complete file once, regardless of whether Dropbox already has it stored. Deduplication in storage and per-user uploading is still possible.

Also interesting to note is the Github repo for this has been deleted. Tarball of the source is still available.


A napkin-cryptography sketch of how Dropbox could fix this while still getting full deduplication: currently, when the client discovers that a file has been added locally, it sends hashes of 4MB blocks, and the server considers the file added.

Additional measure at that point: the server could challenge the client to provide the values of bytes at a couple of arbitrarily chosen byte offsets of the original file. (Could precompute that, provided the queries don't repeat often).
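A minimal sketch of that challenge idea, assuming the server still holds the original bytes. The function names are hypothetical and nothing here reflects Dropbox's real protocol.

```python
import secrets

def make_challenge(size, count=4):
    """Server side: pick a few random byte offsets into the claimed file."""
    return [secrets.randbelow(size) for _ in range(count)]

def answer_challenge(data, offsets):
    """Client side: return the byte value found at each requested offset."""
    return bytes(data[o] for o in offsets)

def verify(stored, offsets, answer):
    """Server side: compare the client's answer with the stored content."""
    return answer_challenge(stored, offsets) == answer

stored = bytes(range(256)) * 4          # stand-in for a file the server holds
offsets = make_challenge(len(stored))
honest = answer_challenge(stored, offsets)
print(verify(stored, offsets, honest))  # a client with the real bytes passes
```

A client that only knows the hashes has to guess each byte, so the chance of bluffing shrinks quickly as the number of offsets grows.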


What would stop pirates from querying each other (maybe on some P2P network) for those random bytes?

Client A wants the file that Client B has so when Dropbox asks Client A for some random offset, Client A asks Client B in the background and relays the result to Dropbox.

It really depends on how far pirates would be willing to go.


Of course, Dropbox can't prevent people from sharing content out of band. But if Client A and Client B are offering arbitrary byte ranges to complete strangers, they are effectively playing BitTorrent again.


Yes, but they are only exchanging a constant amount of information to fool the server challenge, whereas we could hope to do better if the server builds challenges which use information that it knows the client has.

For some reason, this inspired me to write a blog post: http://a3nm.net/blog/deduplication_attacks.html and http://news.ycombinator.com/item?id=2489594


Does that mean that the current protocol allows users to steal arbitrary files given a hash?

For example if some web site charges per download of a file, but still has the hash posted publicly, you can try to "steal" it from someone who has it stored privately in Dropbox?

IOW, the file hash is equivalent to your account login/password combo [restricted to any given file]?


As far as I understand the original posting, you can download any file from Dropbox's servers if you know its Dropbox hash, which apparently is a sequence of SHA256 hashes of 4MB blocks.

If you have a sub-4MB sensitive file, and you publish its SHA256, and the Dropbox protocol applies the hash function in the same way as file hashing tools (e.g. doesn't include a tag meaning "this hash is computed particularly for Dropbox deduplication" into the SHA computation), yes, then apparently people can download your file.

However, I rarely see SHA256 checksums along with download links; more SHA1 and MD5.


That's still a bit worrying though; do people stop to consider that publishing a SHA256 hash bears the risk of being equivalent to publishing the file itself (assuming someone uploads it to dropbox)?

Another related attack could be to start with a known file (say, your employment contract), swap out the name with a colleague and generate a bunch of files with different salary amounts, essentially bruteforcing sha256 sums. If dropbox suddenly coughs up a file, you've revealed his salary!
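The salary scenario can be sketched as a brute-force loop against an existence check. Everything here is hypothetical: the contract template, the `known_hashes` set standing in for the service, and the salary range are invented for illustration.

```python
import hashlib

def contract(name, salary):
    """Hypothetical plain-text contract template known to the attacker."""
    return f"Employment contract\nName: {name}\nSalary: {salary} EUR\n".encode()

def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()

# Simulated service state: hashes of files that some user has already stored.
known_hashes = {sha256_hex(contract("Colleague", 52000))}

def exists(digest):
    """Stand-in for the service revealing whether it already has this content."""
    return digest in known_hashes

def guess_salary(name, candidates):
    """Try each candidate salary until the service recognises the hash."""
    for salary in candidates:
        if exists(sha256_hex(contract(name, salary))):
            return salary
    return None

found = guess_salary("Colleague", range(30000, 100001, 500))
print(found)
```

The attack only works if the attacker can reproduce the file byte for byte, so any unknown field in the document multiplies the search space.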


Assuming you know the exact structure of the file this would be a perfectly valid attack. There could be a lot of variance in rich formats like PDF files from things like compression, etc, so this might be expensive to perform on non-plaintext files.

Dropbox effectively acts as an "existence oracle". You can't ask it to cough up a file you don't have, but you can ask it if a given file exists anywhere in the system.

This would be an effective way for law enforcement or copyright civil enforcement to check for content that is clearly illegal or a certainly copyright violation to possess. They would need to query for a set of hashes of the given illegal content. If any matches returned positive data, they would be able to issue a subpoena for all users who stored the given content in their dropbox folder and pursue them further.


> for content that is clearly illegal or a certainly copyright violation to possess

How can something be "clearly" a violation? If I have an album, but copy someone else's rip instead of making my own - is that "clearly" a violation? Alternatively if I used the same application, I'd probably obtain the exact same file - is that clearly a violation too?

(grooveshark kind of operates on the assumption that it's ok)


I'm thinking of something like a pre-release album, a theatre rip of a movie, etc. Not a rip of something legitimately licensed to you, but of something not officially released to the public.


The Perkeo database used by some German police forces contains hashes of known child-porn image files. Probably not SHA256, though, given that it was started in 1998.


The employment contract scenario doesn't require download-by-hash, only deduplication. You could just measure the amount of network traffic the client needs to "upload" your file.


Just read the reappeared sourcecode (assuming it works as advertised): The hash is an SHA256 of pure 4MB blocks in the input file. They add no message type information which could prevent mixups between Dropbox-deduplication hashes and hashes computed for other purposes.

The following dropship file was assembled using only shasum, ls and vi:

         {"blocks": ["f3f754a5dcd93f271ad013a5ee84f495a36da84f152e0a1fec4646345b0c10d6"], "name": "ostseestrand.jpg", "size": 514779}

Could someone who has never shared files with me verify that it indeed produces a picture of a beach?
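For readers who would rather not assemble such a file by hand, here is a rough sketch that builds a descriptor with the same three keys as the JSON above. It assumes plain SHA-256 over consecutive 4MB blocks, as described in this comment; `describe` is a made-up helper, not part of dropship.

```python
import hashlib
import json

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB blocks, as described in the thread

def describe(name, data):
    """Build a descriptor dict: one SHA-256 hex digest per 4MB block."""
    blocks = [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]
    return {"blocks": blocks, "name": name, "size": len(data)}

sample = b"x" * (BLOCK_SIZE + 10)       # just over one block, so two digests
descriptor = describe("sample.bin", sample)
print(json.dumps(descriptor))
```

A file smaller than 4MB yields a single digest equal to what `shasum -a 256` prints for the whole file.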


When I run dropship with a file containing the JSON you quoted, it prints, "('Oops, blocks are not known: %s', [u'8_dUpdzZPyca0BOl7oT0laNtqE8VLgof7EZGNFsMENY'])".
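The ID in that error message appears to be the URL-safe base64 form of the same SHA-256 digest quoted in the JSON, with the padding removed. That reading is inferred from the two strings in this thread rather than from any documentation, and the helper names below are made up.

```python
import base64

def hex_to_block_id(hex_digest):
    """Encode a hex digest as URL-safe base64 without '=' padding."""
    raw = bytes.fromhex(hex_digest)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def block_id_to_hex(block_id):
    """Reverse the encoding, restoring the padding base64 expects."""
    padded = block_id + "=" * (-len(block_id) % 4)
    return base64.urlsafe_b64decode(padded).hex()

hex_digest = "f3f754a5dcd93f271ad013a5ee84f495a36da84f152e0a1fec4646345b0c10d6"
block_id = "8_dUpdzZPyca0BOl7oT0laNtqE8VLgof7EZGNFsMENY"

print(hex_to_block_id(hex_digest) == block_id)
print(block_id_to_hex(block_id) == hex_digest)
```

So the message refers to the block named in the descriptor, which the server did not know at the time.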


Yes, I've got the beach image in my Dropbox folder now :)


Yeah, Canon PowerShot A60 ;)


From "How does it work?" in the Readme:

    Dropbox its deduplication scheme works by breaking files into blocks. 
    Each of these blocks is hashed with the SHA256
    algorithm and represented by the digest. Only blocks that are not yet
    known are uploaded to the server when syncing.

    By using the same API as the native client, Dropship pretends to sync a
    file to the dropbox folder without actually having the contents. This bluff
    succeeds because the only proof needed server-side is the hash of each 4MB block
    of the file, which is known. The server then adds the file metadata to the folder,
    which is, as usual, propagated to all clients. These will then start downloading
    the file.

It looks like the Github repo was deleted a few hours ago, but the direct download link still works.


I still don't get it. Anyone willing to explain?


Dropbox avoids having to store multiple copies of huge files by detecting duplicated files, storing only one copy, and letting every user that stores the file download from that one copy. Dropship exploits this system for filesharing by lying to the Dropbox servers and saying that it already owns a copy of the file.

For example, Person A wants to distribute a copy of a CD or something. They upload the file to Dropbox normally. They then use Dropship to create something describing that file, which they then publish. Persons B and C download that descriptor and feed it to Dropship, which tricks Dropbox into thinking that they also own copies of the file. Dropbox then lets Person B and Person C download the file that Person A wanted to distribute, and mission accomplished.

It's all very clever. I like it.


Somewhere on a Dropbox server is a file that you want. Normally the file's owner would have to share that file with you in order for it to appear in your Dropbox. But if you know the hash of the file you can trick Dropbox into thinking you already have the file and are just adding it to your Dropbox. Apparently Dropbox notices they already have that file, and instead of you uploading it they just make it appear in your account. Then you can download it.

I wasn't impressed by the OP, but this is actually a really cool hack.


I'll try.

Let's say you want to upload files A and B to Dropbox from your computer. A is a 3MB file, and B is a 12MB file.

The dropbox client first looks at A, sees that it's <4MB, and so hashes[1] the whole file. That means that it runs a function which turns the file into a 256-bit string (a "hash") which is unique[2] to that file.

The client then sends that hash to the server, which checks to see if it has already seen that hash. If it has, then it assumes that it already has the file, and just copies it from the previous location where it stored the block with that hash. If it hasn't, then the client goes ahead and uploads the file.

The process for uploading file B is very similar, except that the client breaks it into three 4MB blocks, hashes each of those, and sends the hashes to the server to see if it's already received those blocks.

Phew. OK, now we can get to why Dropship is (was?) a neat hack. The idea is, if Alfred has uploaded file C, and Barbara wants to get a hold of file C, but doesn't want to download it, she can just send the dropbox server the hashes for each 4MB block of file C.

The server will see each hash, say "ahha! I've already got the block represented by that hash, so I won't make you upload it!", and put the file in Barbara's Dropbox.

Does that make sense?

[1]: http://en.wikipedia.org/wiki/Hash_function

[2]: Not really unique, but the idea of hash functions is that we turn each input into a "hash" which is really really really likely to be unique, so likely that we can treat it as unique.
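The walkthrough above can be condensed into a toy model. This is a sketch under heavy assumptions: `ToyDedupStore` is invented for illustration, keeps everything in memory, and treats each upload as a single block. It is not Dropbox's actual API.

```python
import hashlib

class ToyDedupStore:
    """Toy content-addressed store: blocks are kept once, files are hash lists."""

    def __init__(self):
        self.blocks = {}  # hash -> bytes, stored once
        self.files = {}   # (user, name) -> list of block hashes

    def missing(self, hashes):
        """Hashes the server has never seen, i.e. blocks that need uploading."""
        return [h for h in hashes if h not in self.blocks]

    def upload(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blocks[digest] = data
        return digest

    def commit(self, user, name, hashes):
        """Add a file to a user's folder; the hashes are the only proof asked for."""
        unknown = self.missing(hashes)
        if unknown:
            raise KeyError(f"blocks are not known: {unknown}")
        self.files[(user, name)] = list(hashes)

    def download(self, user, name):
        return b"".join(self.blocks[h] for h in self.files[(user, name)])

store = ToyDedupStore()
h = store.upload(b"contents of file C")      # Alfred uploads normally
store.commit("alfred", "c.txt", [h])
store.commit("barbara", "c.txt", [h])        # Barbara only knows the hash
data = store.download("barbara", "c.txt")
print(data)
```

Barbara never sends the content, yet the file lands in her folder because `commit` accepts a known hash as sufficient evidence.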




> Restricted Content

> This file is no longer available. For additional information contact Dropbox Support.

So, Dropbox has censorship? Ni-i-ice.


Dear Dropbox User:

We have received a notification under the Digital Millennium Copyright Act ("DMCA") from Dropbox that the following material is claimed to be infringing.

/Public/laanwj-dropship-464e1c4.tar.gz

Accordingly, pursuant to Section 512(c)(1)(C) of DMCA, we have removed or disabled access to the material that is claimed to be infringing or to be the subject of infringing activity.

-----------------

This is BULLSHIT! Dropbox is censoring this because they don't want it to get out there. What will they censor next?


Wow. Either Dropbox have some copyright issues with Dropship's code, or this is just a blatant misuse of DMCA to take down the content. Won't speculate, but I personally suspect the latter.

Wladimir did release the software under the FOSS Expat ("MIT") license so he can't really take it back. It's now up to the good will of others.

While I understand that this may put Dropbox in an unfortunate situation, such methods to take down the problematic piece of software somehow feel wrong.


Here I have mirrored Dropship on gitorious, enjoy. https://gitorious.org/dropship/dropship


Thanks a lot! How'd you manage to grab it? Your post is from just one hour ago, but it was already gone when I checked a few hours ago.


There was a direct link from the original webpage.


Forgive what is possibly a very ignorant question, but are there security concerns here? I understand that the key space is immensely huge and that for any file over 4MB in size it would be virtually impossible to guess, but what is to stop someone from just trying hashes for fun to see if they get interesting files?

Like I said, for files over 4MB it seems fine; guessing sequential hashes would be all but impossible. I assume the realistic solution is just to encrypt my files (preferably in a truecrypt volume over 4MB in size) if I'm truly concerned.

On a side note, it would be interesting to see if this could be modified to tell me how unique my overall file set is.


I see your point here. I hope non-public files are protected from Dropbox's deduplication.


They aren't. A colleague copied a whole bunch of documentation from his private Dropbox onto my computer; when I then copied it into my Dropbox it took around half a minute to sync and it was a couple hundred MB.


I think the real successor to torrents was actually its predecessor, and that's binary usenet files. Download speeds are bottlenecked at your own downstream, most providers have SSL support for encryption of everything you download, and there's a plethora of content. People don't really know about it though.


absolutely. instead of leaving a movie download overnight i leave it for a shower. i hope it never hits mainstream, especially since it's subscription based


From my experience, rapidshare-like sites have much more content than usenet. Is there some hidden way to access the vast content in usenet?


It's centralized, not distributed. You are at the whim of dropbox. This is not a successor to torrents at all.


It's a novel exploit of deduplication, but I don't see how it's practically any better than moving a file into the /Public directory and handing them a URL.


In theory, developing this a bit more would lead to searchable directories of files and an easy way to retrieve them, very similar to a torrent tracker.

That has significant benefit over shared files and I have to think would scare the heck out of dropbox because of the ire it might bring upon them. This would have to be a worst nightmare for them. Although removing deduplication would solve it for them (with significant increase in what has to be stored).


Dropbox has bandwidth limits on /Public URLs, particularly low for free accounts. This wholly circumvents that, and I suspect that's the real reason Arash asked for a takedown, not so much the loose association with piracy.


Dropbox could do the same for hashes. For example, no more than 400MB per day could be downloaded for any given hash (4MB block).
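A rough sketch of such a per-hash limit, using the 400MB and 4MB figures from the comment above. `BlockQuota` and its in-memory counter are hypothetical; a real service would need persistent, shared counters.

```python
from collections import defaultdict

DAILY_LIMIT = 400 * 1024 * 1024  # 400MB per block hash per day
BLOCK_SIZE = 4 * 1024 * 1024     # 4MB blocks

class BlockQuota:
    """Track bytes served per (day, block hash) and refuse once over the limit."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.served = defaultdict(int)

    def allow(self, day, block_hash, size=BLOCK_SIZE):
        key = (day, block_hash)
        if self.served[key] + size > self.limit:
            return False
        self.served[key] += size
        return True

quota = BlockQuota()
results = [quota.allow("2011-04-24", "abc") for _ in range(101)]
print(results.count(True))
```

With these numbers a block can be served 100 times a day, which would cap mass distribution but would also throttle popular legitimate files.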


Consider

1) User buys a file from the rights owner and copies it into his Dropbox

2) User obtains a blockwise hash of the file and runs dropship on it

3) User obtains a public URL from somebody who has the file and downloads the file from the Dropbox web server

The point is that the Dropbox server cannot distinguish 1 from 2, but can distinguish both from 3. Therefore, 2 should be more robust against takedown notices than 3.


At last. I'm surprised nobody thought of implementing this earlier. I thought about it, but this being a direct attack on Dropbox, I don't see much value in it. Apart from being unethical, it will only force Dropbox to either remove this very useful feature, or implement a challenge-like system which will render this useless. This will be short-lived code if it spreads.

In fact, I think this "feature" is one of the (many) reasons why Dropbox doesn't have an opensource client, and why it isn't exposed in its so-called "API".

Edit: I just saw that they killed the feature: http://news.ycombinator.com/item?id=2483053


Indeed, great idea/hack, but the only issue I can foresee is Dropship becoming very popular for illegal file duplication, bringing forth the attention of the RIAA/publishers etc., causing legal headaches for our beloved Dropbox.



So who is game for setting up a repository site of json hash files?


Does anybody have another mirror?



