Turning web traffic into a Super Computer

noahdesu · on March 3, 2018

This is a great direction for a certain class of problems, but to call this a general approach for building a super computer ignores the biggest challenge of all: communication performance.

The leadership-class super computers have large budgets for network interconnects like Infiniband, because the hardest problems require ultra low-latency, high-throughput RDMA message passing between processes.

xtreme · on March 3, 2018

Absolutely true. Embarrassingly parallel problems are generally not the most interesting, because you can just throw compute power at it until it is solved. Most applications running on actual supercomputers are communication bound.

chroem- · on March 3, 2018

I actually made this a few years ago for an ill-fated application to YC (I was still a sophomore in college).

The first big problem is that you can't trust any of the results that you receive from the workers. You can alleviate this somewhat by not accepting job results until they've been replicated by at least several other workers (lowering efficiency of the system), but then you're still vulnerable to sybil attacks.

The second issue was the total lack of privacy for the data and computations.

ikeboy · on March 4, 2018

Check 10% of submissions randomly - if some are wrong, blacklist that source.

You can send the checking to other nodes and check those randomly as well.

chroem- · on March 4, 2018

That's what you would initially think, but a clever adversary can produce arbitrarily many identities, so even if results agree, there is a non-zero chance that all of the agreeing nodes are compromised. Even if you begin blacklisting nodes, an attacker can simply spoof new ones.

So then you need a proof of work scheme to defend against sybil attacks, but even then an attacker could choose to only produce incorrect output when they detect that a quorum group is made entirely of their compromised nodes, and otherwise produce correct output. That would once again render blacklisting ineffective.

I'm not saying it's impossible, but securing this sort of browser-based grid computing scheme is a nontrivial task. And that's coming from someone who is very fond of this technology.

ikeboy · on March 4, 2018

Use homomorphic encryption to ensure attackers can't tell when they get the same thing twice.

IPs are not so cheap, anyway. If you have enough legitimate traffic it would be tough to spoof enough.

radioo7555 · on March 5, 2018

IPs are very cheap. You aren't renting them for a month. You are using them for a few requests.

Cyphase · on March 4, 2018

You're thinking in IPv4.

ikeboy · on March 4, 2018

Most traffic will be IPv4, if you only use them for work you'll do ok

mycall · on March 4, 2018

How does Steem and Nano solve this? They much have the same problems.

Buge · on March 4, 2018

Neither Steem nor Nano seem to be a supercomputer with arbitrary computation.

I'm not very familiar with either of them, but Steem seems to be a non-centralized database of reddit-like posts. This doesn't seem to be offloading arbitrary computations to others like OP wants to do.

Nano also doesn't seem to be offloading arbitrary computations to others.

Ethereum does allow arbitrary computations, but it is not offloading, it is requiring every node to rerun every computation, which is obviously much less efficient than a single centralized computer doing the computation.

hexscrews · on March 3, 2018

Ok, this is similar to Gridcoin (https://gridcoin.io/) , Golem (https://cryptoslate.com/coins/golem/) , SPARC (https://sparc.network/) , et al. I could keep going, but the point is, something like this exists. And we are seeing the issues of what happens when you embed a program that takes advantage of processing power. You get things like Coinhive (https://coinhive.com/), which were meant to be benign, turned into botnets.

huhtenberg · on March 5, 2018

Tangentially related - http://www.gomezpeerzone.com/

This was one of the first attempts to monetize spare CPU capacity and to build a business around massively distributed computing. It seems to be in a zombi mode now, but it was launched almost 10 years ago - https://web.archive.org/web/20080529034258/http://www.gomezp...

tsneed290 · on March 3, 2018

For those having trouble viewing the site: https://web.archive.org/web/20180303173013/http://ben.akrin....

clon · on March 3, 2018

This will only work if you can cheaply validate the work performed. Or send it to several distinct visitors and compare results, driving down the likelyhood that the results are bogus.

noahdesu · on March 3, 2018

good point. my hunch is that the only way would be to duplicate work and check for consensus. it seems like there isn't any way to cheaply validate results for any interesting work.

0xFFFF0000 · on March 3, 2018

Something like this could be a way to get finally rid of advertisements, and help finance free services in a novel new way if the CPU isn't over committed. I thought the same around crypto mining... Many people might prefer the resource providing/computation option compared to ads.

Dylan16807 · on March 3, 2018

Good luck getting people to not double-dip.

Also good luck making it so I don't spend 10 cents of electricity or have my phone die early to give a site <1 cent of value.

redka · on March 4, 2018

> Unlike a regular computer cluster, the nodes are very ephemeral (as website visitors come and go) and can’t talk to each other (no cross site requests).

Not sure how that would actually help but they somewhat _can_ talk to each other through WebRTC. I wonder if that would change anything here

nukeop · on March 3, 2018

This could be a good alternative for SETI or that protein folding project but if it somehow turns into a plague of attempts to shift the computing requirements of running the backend onto the users, our content blocking lists will have to grow a little bit longer again.

AtomicOrbital · on March 3, 2018

opportunity here to create cloud computation platforms where people submit work which gets executed on other people's machines who are getting paid to allow their browser to contribute compute ... eventually price of AWS / google compute engine will drop ... win win

fwip · on March 3, 2018

The big problem I see to be solved with p2p computation is trust. In the example, the problem to be solved is "Do any of these strings hash to a known value?" An adversary can accept the job, sleep for a second, and report "Nope" and collect their payment, without ever doing any work.

In contrast, I trust that if I run code on AWS, it'll actually run the code I give it.

These are solvable problems, but important to think about.

archgoon · on March 3, 2018

A bigger issue than lying (and randomly failing) clients is the fact that you're farming out your users data to third parties. You might be willing to trust a single entity like Amazon, Microsoft, or Google, who have a reputation to maintain / can be sued (for potentially a lot); but if your farming out to essentially anonymous individuals, there's no repercussions for, say, using their credit card information.

Now, there are definitely use cases where you don't care; which would be a cool application for this; but unless we figure out how to do homomorphic computing, this doesn't seem immediately likely to impact cloud computing prices; even if it was cheaper to simply use excess compute capacity (which I'm not convinced it is).

cercatrova · on March 3, 2018

This is what Golem cryptocurrency does, with the trust protocol in place through the blockchain [1]

1. https://golem.network

matte_black · on March 3, 2018

Keep dreaming.

bob_theslob646 · on March 3, 2018

Why should we keep dreaming?

matte_black · on March 3, 2018

AWS at least already has very cheap ways to get compute time if you don’t particularly care about having uniform availability of it in the form of spot instances.

The economics of people letting you use their browser doesn’t make sense. The pay is crap and the time when they are browsing is exactly when they would not want their computer doing a bunch of background processing. You’re better off letting people rent out their CPU time through a dedicated application when they are not using it. And remember electricity is not free.

Even then, it’d be tough to compete with AWS on pricing considering its spot instances do exactly this, even letting you bid for what you want to pay.

I should have explained this in my original post, but I figured it was obvious.

To suggest a solution: don’t offer payments at all but maybe instead gameify CPU time donation somehow. Maybe then you’ll have a better chance, but potentially less participation, unless people are donating to charitable causes. Either way it’s not going to do anything to affect AWS or Google Compute prices. Creating an entire startup just for the goal of getting some reduced prices on the actual compute platforms you want to use is madness. Just pay the money.

NegativeLatency · on March 3, 2018

Seti@home has been doing this for years

bradknowles · on March 3, 2018

I was thinking of that exact use case — could this be a modern alternative that might get even wider distribution due to lower barriers to entry?

sosborn · on March 3, 2018

Almost 20(!)

matte_black · on March 3, 2018

ryanlol · on March 3, 2018

>AWS at least already has very cheap ways to get compute time

No it does not. AWS and "very cheap" simply don't belong in the same sentence.

matte_black · on March 3, 2018

Compared to what? Useless to compare anything without numbers.

ryanlol · on March 3, 2018

Compared to what? The compute market in general.

AWS is a premium provider charging premium prices, they've essentially priced themselves as the most expensive provider on the market. It feels insane to describe them as "very cheap".

You might as well be calling a Bugatti Chiron "pretty cheap".

Any datacenter provider will be able to sell you compute time at a fraction of the cost of EC2. Look at OVH for example.

matte_black · on March 4, 2018

Spot instances are insanely cheap, not talking about EC2.

ryanlol · on March 4, 2018

EC2 spot instances are still EC2.

matte_black · on March 5, 2018

But cheaper.

zappo2938 · on March 3, 2018

>If you have built castles in the air, your work need not be lost; that is where they should be. Now put the foundations under them.

Henry David Thoreau

king07828 · on March 3, 2018

Seems like getting a Linux kernel and a LAMP stack running on this would be a good first step. The technical challenge of doing it sounds fun and interesting. The real payoff may be when a billing and payment system is able to consolidate billing from processing clients and to distribute the payments to the owners of the processing clients based on the work performed

king07828 · on March 5, 2018

Why did the parent get downvoted?

antoineMoPa · on March 3, 2018

I should code a fractal renderer that combines the work of many clients...

WalterGR · on March 3, 2018

What are the performance numbers for work distributed like this?