Let's keep some perspective here, folks. The idea is cool and all, but as discussed in the other thread, SHA-1 throughput on GPUs is between 200 million and 600 million hashes per second, depending on the GPU (http://forums.nvidia.com/index.php?s=&showtopic=102349...). On the other hand, the maximum throughput people are getting with JS is about 3,000/second, which is roughly a hundred thousand times slower (somewhere between 70,000x and 200,000x, depending on the GPU).
I think it is safe to conclude that browser-based crowdsourcing is not a compelling approach to numerical computation.
If Engine Yard was trying to promote their cloud services with this contest, it's rather ironic that it will probably be won by a few desktop computers with high-end GPUs, which cloud hosts usually don't even have.
That's true, and it was my first thought (which is why I asked about benchmarking the JS code). What it did do was get a pretty clever idea out into the world: crowdsourcing the answer to a contest.
Contests are usually held to fire up people's creativity, and this idea of solving it via the browser as an ad-hoc cluster is really pretty creative.
Personally, I think trying to solve this contest by brute-forcing it turns it into a simple pissing contest between the various hardware/software combinations. I'm sure some guy with an FPGA kit will do a lot better than the CUDA competitors.
It would be nicer if someone discovered a previously undetected flaw and solved the contest through a backdoor. That would be hacker news :)
This idea is really well thought out and implemented. This may be a new way of crowdsourcing. Have you considered making a Facebook app? It would definitely attract many more people than this. Of course, you should offer some carrots for users who have no idea what's going on.
It's pretty well thought out and is awesome in principle. The problem is that our interpreted JavaScript cloud network is getting owned by the awesome power of nVidia GPUs.
"Yeah, my 5 GPUs are currently cranking at a total rate of 800 M/sec. (Best score of 39 after running for a few minutes.)
(Hah! Just got lucky and hit 35 on one card after an hour.) "
- The nVidia CUDA group
Oops :) I intended to comment on racymorgan's implementation, not this one. But this is good too. Having lots of tabs open sometimes causes these mistakes. Sorry to both parties.
I hope you aren't disappointed when you see a new update on your AJAX postback page that looks like a new winning string, only to find it's simply a note from me. :)
(I also generated a few hundred thousand hashes for you)
To get more people excited, it might be a good idea to show how many cycles each browser has contributed - make it a fight of Chrome vs Firefox vs Safari ;)
As far as I can tell, the browsers only contact him if they find a string with a shorter hamming distance, otherwise it's entirely offline, including picking random words.
a quicker approach to your bit counting routines:
(in C because that's what I'm most comfortable with, but it is pretty much a drop-in replacement for your code)
#include <stdio.h>

/* dist[b] = number of set bits in the byte value b */
char dist[256];

void hamm_init(void)
{
    int i, j, n;
    for (i = 0; i < 256; i++) {
        j = i;
        n = 0;
        while (j) {
            if (j & 1) {
                n++;
            }
            j >>= 1;
        }
        dist[i] = n;
    }
}

int hamming_distance(const char *hash1, const char *hash2)
{
    /* calculate the distance between two hashes (stops at the end of the shorter string) */
    int d = 0;
    while (*hash1 && *hash2) {
        d += dist[(unsigned char)(*hash1++ ^ *hash2++)];
    }
    return d;
}

int main(void)
{
    hamm_init();
    printf("0 %d\n", hamming_distance("abcd", "abcd"));
    printf("1 %d\n", hamming_distance("accd", "abcd"));
    return 0;
}
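(Untested thought: if you happen to build with GCC or Clang, the lookup table isn't even needed - the compiler's popcount builtin does the same per-byte count. Something like:)

/* Same idea without the table, assuming a GCC/Clang-style __builtin_popcount is available. */
int hamming_distance_builtin(const char *hash1, const char *hash2)
{
    int d = 0;
    while (*hash1 && *hash2) {
        d += __builtin_popcount((unsigned char)(*hash1++ ^ *hash2++));
    }
    return d;
}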
You might also consider removing UTF8Encoding from the SHA1 implementation. I might be wrong, but I think the contest states no encoding should be used, i.e. ASCII only. If that's the case, it's a useless O(N) operation.
Yeah, my JS hamming distance implementation is pretty miserable. I'm no bit wizard, but I think I can modify the SHA1 function (before it returns the value as a hex string) and XOR the 5 vars with the hex value that they gave us, and then count the set bits. Couldn't get it working last night though, so I switched over to this.
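(For what it's worth, here's roughly what I mean, sketched in C rather than JS. The five h values stand in for the SHA1 function's internal state words right before it builds the hex string, and the target values below are made-up placeholders for the digest the contest gave us, not the real numbers.)

#include <stdio.h>
#include <stdint.h>

/* count set bits in a 32-bit word with a simple loop (no lookup table needed) */
static int popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        n += x & 1;
        x >>= 1;
    }
    return n;
}

/* bit distance between the five 32-bit SHA-1 state words and the target digest words */
static int digest_distance(const uint32_t h[5], const uint32_t target[5])
{
    int d = 0, i;
    for (i = 0; i < 5; i++) {
        d += popcount32(h[i] ^ target[i]);
    }
    return d;
}

int main(void)
{
    /* placeholder values only - you'd substitute the real challenge digest here */
    uint32_t h[5]      = { 0x12345678, 0x9abcdef0, 0x0fedcba9, 0x87654321, 0xdeadbeef };
    uint32_t target[5] = { 0x12345678, 0x9abcdef0, 0x0fedcba9, 0x87654321, 0xdeadbeee };
    printf("distance = %d\n", digest_distance(h, target)); /* prints 1 for these values */
    return 0;
}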
It's going to be a lot slower, particularly with the goal of not slowing down a person's browser. By default, at least for me, their page is doing 1 hash per 200ms, or 5/second.
On the other hand, a quick Haskell implementation using Data.Digest.Pure.SHA is doing about 6,000 hashes per second (might have been 8000, I forget now) per core on an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
So, by those numbers, you're looking at over a thousand people (6,000 / 5 = 1,200) to equal one core of a decent desktop. It's certainly an interesting approach, and has a lot of potential. I'm just not sure the marketing is there with the "help us win" motivation.
This JavaScript version: http://news.ycombinator.com/item?id=715014 (which doesn't do any throttling) is doing ~1500 hashes per second in one Chrome tab on my fairly old (2.8 GHz Pentium D) desktop. I'm curious how fast it would be on your Core 2 Duo in two Chrome tabs.
I think GPUs are going to win this: "Yeah, my 5 GPUs are currently cranking at a total rate of 800 M/sec" - they're already looking at the low 30s.
I don't think there would be much point, hah - JS will definitely be slower than optimized C. It's the crowdsourcing aspect that gives us a chance - it is very easy for anyone to just pull up the web page.
For unrelated reasons I recently did some benchmarking of SHA-1 in JavaScript vs Java. That particular benchmark was about 20x slower in the browser (both Chrome and Firefox 3.5) than on the JVM.
I would be interested to see if you could get something like this running using the Google NaCl project (http://code.google.com/p/nativeclient/). Wouldn't that give you the benefit of having native code running on any machine that went to the page and had the plugin installed? You could have the user install a plugin when they went to the page and then reload the page. Just popped into my head, probably too late to implement anything now, but might be fun to give it a go.
Would you be willing to write up the stats of your experiment once the challenge is over? I'd be really interested to know how much compute power you were able to harvest and what was the rate of compute acquisition.
Terrific idea and implementation. I have 10 tabs on Chrome.
Good luck.
Eh, I'll leave it on my PowerMac overnight. Just fyi, I've donated the following browsers...
OS X:
Safari,
Firefox 3.5,
Webkit Nightly,
Chromium Nightly,
Opera 10 beta 2,
Shiira,
Camino 2,
Seamonkey,
Sunrise,
Flock
Then I have 3 virtual machines open...
Windows XP #1:
Firefox 3.5,
Chrome,
Internet Explorer 6 (Dev testing bed, not upgrading, but take it anyway),
Opera 9,
Maxthon 2 (Surprised I even had it)
Windows XP #2:
Firefox 3.5,
Chrome,
Internet Explorer 7,
Opera 9,
Maxthon 2
Windows XP #3:
Firefox 3.5,
Chrome,
Internet Explorer 8,
Opera 9,
Maxthon
I'd donate an Ubuntu VM or two, but... they don't wanna start up at this point. shrug
This may not help, I dunno, but I'm certainly not using the computer. Have a blast, and good luck!
As a CPU contributor, I demand that you give us stats as to which browsers were the fastest for your computation! This could be an interesting comparison point for the bundled javascript engines. (Maybe :))
Chrome definitely wins, but I think TraceMonkey's tracing could be THE feature in this case - lots of unrolled loops... hopefully Mozilla gets more out of it over time.
Also, are you logging IPs for your best results? It would be cool to know if my machine was the one that came up with the shortest hamming distance (obviously IPs aren't a guaranteed way to figure this out, but still...).