This is a great direction for a certain class of problems, but calling it a general approach to building a supercomputer ignores the biggest challenge of all: communication performance.
Leadership-class supercomputers have large budgets for network interconnects like InfiniBand because the hardest problems require ultra-low-latency, high-throughput RDMA message passing between processes.
Absolutely true. Embarrassingly parallel problems are generally not the most interesting, because you can just throw compute power at them until they're solved. Most applications running on actual supercomputers are communication-bound.
I actually made this a few years ago for an ill-fated application to YC (I was still a sophomore in college).
The first big problem is that you can't trust any of the results you receive from the workers. You can alleviate this somewhat by not accepting a job's result until it has been replicated by several other workers (lowering the efficiency of the system), but then you're still vulnerable to Sybil attacks.
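To make the replication idea concrete, here's a minimal sketch of the quorum check (the names are hypothetical, and it assumes results are canonically serialized so equal answers compare equal as strings):

    // Accept a job result only once a quorum of independently
    // chosen workers report the same answer.
    type WorkerResult = { workerId: string; payload: string };

    function acceptResult(results: WorkerResult[], quorum: number): string | null {
      const tally = new Map<string, number>();
      for (const r of results) {
        const n = (tally.get(r.payload) ?? 0) + 1;
        tally.set(r.payload, n);
        if (n >= quorum) return r.payload; // quorum reached
      }
      return null; // no agreement yet: keep replicating or discard
    }

Note that this is exactly the efficiency hit I mentioned: every job now costs a multiple of the compute, and it still assumes the replicas were sampled independently, which is the assumption a Sybil attack breaks.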
The second issue was the total lack of privacy for the data and computations.
That's what you would initially think, but a clever adversary can produce arbitrarily many identities, so even if results agree, there is a non-zero chance that all of the agreeing nodes are compromised. Even if you begin blacklisting nodes, an attacker can simply spoof new ones.
So then you need a proof-of-work scheme to defend against Sybil attacks, but even then an attacker could choose to produce incorrect output only when they detect that a quorum group is made up entirely of their compromised nodes, and produce correct output otherwise. That would once again render blacklisting ineffective.
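A browser-side proof-of-work ticket could look something like this (a sketch using the standard Web Crypto API; the difficulty parameter and ticket format are my own invention). The idea is that the server only admits identities that have burned some CPU first, which makes mass identity creation expensive:

    // Find a nonce so that SHA-256(identity + nonce) starts with
    // `difficulty` zero bytes; the server re-hashes once to verify.
    async function mineTicket(identity: string, difficulty: number): Promise<number> {
      const enc = new TextEncoder();
      for (let nonce = 0; ; nonce++) {
        const digest = new Uint8Array(
          await crypto.subtle.digest("SHA-256", enc.encode(identity + ":" + nonce)),
        );
        if (digest.slice(0, difficulty).every((b) => b === 0)) return nonce;
      }
    }

But as I said, this only raises the price of identities; an attacker who mines enough tickets can still play the "behave until my nodes fill a quorum" game.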
I'm not saying it's impossible, but securing this sort of browser-based grid computing scheme is a nontrivial task. And that's coming from someone who is very fond of this technology.
Neither Steem nor Nano seems to be a supercomputer doing arbitrary computation.
I'm not very familiar with either of them, but Steem seems to be a decentralized database of reddit-like posts. This doesn't seem to be offloading arbitrary computations to others like OP wants to do.
Nano also doesn't seem to be offloading arbitrary computations to others.
Ethereum does allow arbitrary computations, but it isn't offloading them: it requires every node to rerun every computation, which is obviously much less efficient than a single centralized computer doing it once.
OK, this is similar to Gridcoin (https://gridcoin.io/), Golem (https://cryptoslate.com/coins/golem/), SPARC (https://sparc.network/), et al. I could keep going, but the point is, things like this already exist. And we are seeing what happens when you embed a program that takes advantage of visitors' processing power: you get things like Coinhive (https://coinhive.com/), which was meant to be benign but ended up powering botnets.
This was one of the first attempts to monetize spare CPU capacity and to build a business around massively distributed computing. It seems to be in zombie mode now, but it was launched almost 10 years ago: https://web.archive.org/web/20080529034258/http://www.gomezp...
This will only work if you can cheaply validate the work performed, or send it to several distinct visitors and compare results, driving down the likelihood that the results are bogus.
Good point. My hunch is that the only way would be to duplicate work and check for consensus. It seems like there isn't any way to cheaply validate results for any interesting work.
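As a rough sizing exercise (assuming workers are sampled independently and a fraction p of them collude on the same wrong answer), the chance that every one of r replicas is bad is about p^r, so the replication factor for a target error rate works out to:

    // How many independent replicas r do we need so that p^r <= target?
    function replicasNeeded(p: number, target: number): number {
      return Math.ceil(Math.log(target) / Math.log(p));
    }

    replicasNeeded(0.1, 1e-6); // => 6
    replicasNeeded(0.5, 1e-6); // => 20

So even a modest attacker fraction multiplies the cost of every job several times over, which is exactly the efficiency loss in question.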
Something like this could finally be a way to get rid of advertisements and help finance free services in a novel way, provided the CPU isn't overcommitted. I thought the same about crypto mining... Many people might prefer the resource-providing/computation option to ads.
> Unlike a regular computer cluster, the nodes are very ephemeral (as website visitors come and go) and can’t talk to each other (no cross site requests).
Not sure how that would actually help, but they somewhat _can_ talk to each other through WebRTC. I wonder if that would change anything here.
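To make that concrete, here's a bare-bones sketch of one side of a WebRTC data channel (the STUN server choice is arbitrary). The catch is that the offer/answer exchange still has to be relayed through a signaling server, so nodes can't find each other without the site's help:

    const pc = new RTCPeerConnection({
      iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
    });
    const channel = pc.createDataChannel("grid");

    channel.onopen = () => channel.send("partial result for a neighbor");
    channel.onmessage = (e) => console.log("from peer:", e.data);

    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    // Ship pc.localDescription to the peer via the server, then apply
    // their answer with pc.setRemoteDescription(...).

Even then, NAT traversal and browser-to-browser latency would make this nothing like a real cluster interconnect.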
This could be a good alternative for SETI or that protein folding project, but if it somehow turns into a plague of attempts to shift the backend's computing requirements onto users, our content-blocking lists will have to grow a little longer again.
There's an opportunity here to create cloud computation platforms where people submit work that gets executed on other people's machines, and those people get paid to let their browser contribute compute... eventually the price of AWS / Google Compute Engine will drop... win-win.
The big problem I see with p2p computation is trust. In the example, the problem to be solved is "Do any of these strings hash to a known value?" An adversary can accept the job, sleep for a second, report "Nope," and collect their payment without ever doing any work.
In contrast, I trust that if I run code on AWS, it'll actually run the code I give it.
These are solvable problems, but important to think about.
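One partial fix worth sketching (my own scheme, with made-up types): mix known-answer "canary" jobs into the stream, so a worker that sleeps and reports "Nope" eventually fails a check the server can verify for free:

    // Seed the queue with canaries whose answers the server already
    // knows; a lazy or lying worker will eventually get one wrong.
    type Job = { id: string; input: string; isCanary: boolean; expected?: string };

    function checkReport(job: Job, reported: string, reputation: Map<string, number>, workerId: string): void {
      if (job.isCanary && reported !== job.expected) {
        // Caught: slash reputation and withhold payment.
        reputation.set(workerId, (reputation.get(workerId) ?? 0) - 10);
      }
    }

Since workers can't tell canaries from real jobs, this catches the lazy adversary, though not one who computes honestly except when they're sure no check will catch them. It complements replication rather than replacing it.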
A bigger issue than lying (and randomly failing) clients is the fact that you're farming out your users' data to third parties. You might be willing to trust a single entity like Amazon, Microsoft, or Google, who have a reputation to maintain and can be sued (for potentially a lot); but if you're farming out to essentially anonymous individuals, there are no repercussions for, say, using their credit card information.
Now, there are definitely use cases where you don't care, which would be a cool application for this; but unless we figure out how to do homomorphic encryption at practical speeds, this doesn't seem likely to affect cloud computing prices any time soon, even if it were cheaper to simply use excess compute capacity (which I'm not convinced it is).
AWS already has very cheap ways to get compute time in the form of spot instances, if you don't particularly care about having uniform availability.
The economics of people letting you use their browser don't make sense. The pay is crap, and the time when they're browsing is exactly when they wouldn't want their computer doing a bunch of background processing. You're better off letting people rent out their CPU time through a dedicated application when they're not using their machine. And remember, electricity is not free.
Even then, it’d be tough to compete with AWS on pricing considering its spot instances do exactly this, even letting you bid for what you want to pay.
I should have explained this in my original post, but I figured it was obvious.
To suggest a solution: don't offer payments at all; maybe gamify CPU-time donation somehow instead. You might have a better chance then, but probably less participation, unless people are donating to charitable causes. Either way it's not going to do anything to affect AWS or Google Compute prices. Creating an entire startup just to get reduced prices on the compute platforms you actually want to use is madness. Just pay the money.
AWS is a premium provider charging premium prices; they've essentially positioned themselves as the most expensive provider on the market. It feels insane to describe them as "very cheap".
You might as well be calling a Bugatti Chiron "pretty cheap".
Any datacenter provider will be able to sell you compute time at a fraction of the cost of EC2. Look at OVH for example.
Seems like getting a Linux kernel and a LAMP stack running on this would be a good first step; the technical challenge of doing it sounds fun and interesting. The real payoff may come when a billing and payment system can consolidate charges from the processing clients and distribute payments to their owners based on the work performed.