Hacker News new | past | comments | ask | show | jobs | submit login
Crowdsourcing Bruteforcing: Help me win the EngineYard challenge (silentmac.com)
103 points by sil3ntmac on July 20, 2009 | hide | past | favorite | 71 comments



Let's keep some perspective here folks. The idea is cool and all, but as discussed in the other thread, SHA-1 throughput on GPUs is between 200 million and 600 million per second, depending on the GPU (http://forums.nvidia.com/index.php?s=&showtopic=102349&#...). On the other hand, the maximum throughput that people are getting with JS is about 3000/second. Which is a hundred thousand times slower.

I think it is safe to conclude that browser-based crowdsourcing is not a compelling approach to numerical computation.


Yeah, AFAICT the leader right now is http://twitter.com/seibert with a hamming distance of 31 (http://forums.nvidia.com/index.php?s=87acc31e73d939b72dffbf7...), and he says he's pushing 800M/sec across 4 GPUs (http://twitter.com/seibert/status/2748691432).

For comparison, this guy's distributed JS implementation just reached 1B a few hours ago: http://twitter.com/wondersquirrel/status/2750527789

If Engine Yard was trying to promote their cloud services with this contest it's rather ironic it appears it will probably be won be a few desktop computers with high end GPUs, which cloud hosts usually don't even have.


Even on my lowly 8800GT with quad-core I'm hitting 83+ megahashes per second -- my lowest distance is currently 37.

CUDA is pretty neat :)


That's true and it was my first thought (which is why I asked about benchmarking the js code). What it did do was to get a pretty clever idea out into the world, which is to crowdsource the answer to some contest.

Contests are usually held to fire up peoples creativity and this idea to solve it via the browser as an ad-hoc cluster is really pretty creative.

Personally I think to try to solve this contest by brute forcing it is changing it in to a simple pissing contest between the various hardware / software combination, I'm sure some guy with an fpga kit will do a lot better than the CUDA competitors.

It would be nicer if someone discovered a previously undetected flaw and solved the contest through a backdoor. that would be hacker news :)


This idea is a really well thought and implemented. This may be a new way of cloud sourcing. Have you considered making a facebook app? It would definitely attract many more people than this. Of course you shou ld offer some carrots for users that do not have any idea what's going on.


It's pretty well though out and is awesome in principle. The problem is that our interpreted javascript cloud network is getting owned by the awesome power of nVidia GPUs.

"Yeah, my 5 GPUs are currently cranking at a total rate of 800 M/sec. (Best score of 39 after running for a few minutes.)

(Hah! Just got lucky and hit 35 on one card after an hour.) " - The nVidia CUDA group


Agreed. running it on my computer with the slider set to max. Hope you win!


Ditto. Got 10 tabs in Chrome and 10 tabs in Safari all running at 1ms. Using ~60% CPU. I'll let it run for the next 4-5hrs until I go to work.

Good luck.


oops :) I intended to comment on racymorgan implementation not this one. But this is good too. Opening tabs sometimes cause these mistakes. Sorry for both parties.


I hope you aren't disappointed when you a see a new update on your AJAX postback page that looks like a new winning string, when you see it is simply a note from me. :)

(I also generated a few hundred thousand hashes for you)


uhh.. sorry if I broke it. It now says the smallest hamming distance found is zero, so either you had a major breakthrough or something went wrong.

Edit: it's back up to 43, a nice improvement from 48 from a few minutes ago.


Thanks Phil :)

I added a word checker, so no more of this tomfoolery!


To get more people excited, might be a good idea to say how many cycles each browser has contributed - make it a fight of Chrome vs Firefox vs Safari ;)


He can't.

As far as I can tell, the browsers only contact him if they find a string with a shorter hamming distance, otherwise it's entirely offline, including picking random words.


Haha, nice work guys -- great minds. We made: http://shacontest.doloreslabs.com - same approach but server side.

I upvoted you :).


Did you just get a hamming distance of 0, or is something broken?


Nah, some script kiddie messin with me. Fixed and patching the hole now...


a quicker approach to your bit counting routines: (in C because that's what I'm most comfortable with, but it is pretty much a drop-in replacement for your code)

  #include <stdio.h>

  char dist[256];

  hamm_init()

  {
        int i,j,n;

        for (i=0;i<256;i++) {
                j = i;       

                n = 0;
                while (j) {
                        if (j & 1) {
                                n++;
                        }           
                        j >>= 1;    
                }

                dist[i] = n;
        }
  }

  int hamming_distance(char * hash1,char * hash2) {
        //calculate the distance between two hashes

        int d = 0;

        while (*hash1) {
                if (*hash2 == 0) {
                        break;
                }

                d += dist[*hash1++ ^ *hash2++];
        }

        return d;
  }

  main()

  {
        hamm_init();

        printf("0 %d\n",hamming_distance("abcd","abcd"));

        printf("1 %d\n",hamming_distance("accd","abcd"));

  }


Well, 43 isn't that bad :-)


Haha, he left a message in my database.

"Oh but for your trouble I generated a few hundred thousand hashes for you. --Phil Freo"

Thanks Phil :-) Your message's hash had a hamming distance of 73


Please increase delta range down to 1ms, Chrome barely uses CPU power at 10ms. 6 tabs @10ms uses 25% of Core2Duo.


Also decrease rate at which HTML stats are updated - DOM update slows stuff down, it should update stats only once a second or so.


Updated to 1ms! I'll work on the HTML stat updates

Edit: fixed and fixed


Hah! I just generated as many hashes in a minute that I did in the hour I had it running beforehand. Good changes :)


You might also consider removing UTF8Encoding from SHA1 implementation, I might be wrong, but I think contest states no encoding should be used - i.e. ASCII only. If that's a care, that's a useless O(N) operation.


UTF8(ASCII) == ASCII

Unless you're using characters > 255, UTF-8 is identical to ASCII.


Actually it's UTF8(ASCII) == ASCII iff ASCII < 128, since the MSB signifies a multi-byte character.

http://en.wikipedia.org/wiki/UTF-8

Hm, on second thought UTF8(ASCII) is -always- == ASCII, since ASCII is only defined for 0-127, unless you count the extended variants :P

http://en.wikipedia.org/wiki/ASCII


Yeah, my JS hamming distance implementation is pretty miserable. I'm no bit wizard, but I thikn I can modify the SHA1 function (before it returns the value as a hex string) and XOR the 5 vars with the hex value that they gave us, and then count the set bits. Couldn't get it working last night though so I switched over to this


You could also make use of the fancy new worker thread stuff in browsers that support it. Should let you skip the timeout altogether.


Now we're talking! 7 tabs = 80% cpu usage!

You might also consider lookup tables for Hamming distance (google for "counting bits").


Have you compared running your code in JS on an ad-hoc cluster like this vs running it in optimized C on a local machine ?


It's going to be a lot slower, particularly with the goal of not slowing down a person's browser. By default, at least for me, their page is doing 1 hash per 200ms, or 5/second.

On the other hand, a quick Haskell implementation using Data.Digest.Pure.SHA is doing about 6,000 hashes per second (might have been 8000, I forget now) per core on an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz

So you're looking at 1000 people to equal one core of a decent desktop. It's certainly an interesting approach, and has a lot of potential. I'm just not sure the marketing is there with the "help us win" motivation.


This JavaScript version: http://news.ycombinator.com/item?id=715014 (which doesn't do any throttling) is doing ~1500 hashes per second in one Chrome tab on my fairly old (2.8 GHz Pentium D) desktop. I'm curious how fast it would be on your Core 2 Duo in two Chrome tabs.


Im getting about 3100/sec, on my duo centrino2 / thinkpad


Mine peaked at 3,450 or so on a Core2 Quad @ 3.2Ghz


I think GPU's are going to win this "Yeah, my 5 GPUs are currently cranking at a total rate of 800 M/sec" they are are already looking at the low 30's.


Maybe a Google Native Client approach would have better luck. You would have to require your users be running Chrome, but it would be damn fast.


I don't think there would be much point, hah - JS will definitely be slower than optimized C. It's the crowdsourcing aspect that gives us a chance - it is very easy for anyone to just pull up the web page.


If you cook me up a simple tag that does the work in the background without any visual distractions I'll give you a couple of thousand hosts :)


have contributed 50k hashes now, will let it run through the night :)

good luck!


For unrelated reasons I recently did some benchmarking of SHA-1 in JavaScript vs Java. That particular benchmark was about 20x slower in the browser (both Chrome and Firefox 3.5) than on the JVM.


Not running C directly, but getting about 200,000 iterations per second running through an Nvidia GPU through CUDA and the Python wrapper PyCUDA.


Just want to add that there is trick to make this incredibly easy to run in parallel - just open the page in a new tab!


Well on my dual core, opening a new tab could not fire up the other core. Opening opera side by the firefox did the trick.


I've got two tabs in Chrome using two CPU cores.


Chrome uses different processes per tab. It's their sand boxing tech for security.


I've got 6 tabs on Core2Duo in Chrome, barely uses 25% CPU total.


six tabs in safari. knew those eight cores would come in handy.


17 chrome tabs@1ms ~95%


I would be interested to see if you could get something like this running using the Google NACL project (http://code.google.com/p/nativeclient/). Wouldn't that give you then benefit of having native code running on any machine that went to the page and had the plugin installed? You could have the user install a plugin when they went to the page and then reload the page. Just popped into my head, probably too late to implement anything now, but might be fun to give it a go.


Would you be willing to write up the stats of your experiment once the challenge is over? I'd be really interested to know how much compute power you were able to harvest and what was the rate of compute acquisition.

Terrific idea and implementation. I have 10 tabs on Chrome. Good luck.


He can't.

As far as I can tell, the browsers only contact him if they find a string with a shorter hamming distance, otherwise it's entirely offline, including picking random words.


Eh, I'll leave it on my PowerMac overnight. Just fyi, I've donated the following browsers...

OS X: Safari, Firefox 3.5, Webkit Nightly, Chromium Nightly, Opera 10 beta 2, Shiira, Camino 2, Seamonkey, Sunrise, Flock

Then I have 3 virtual machines open...

Windows XP #1: Firefox 3.5, Chrome, Internet Explorer 6 (Dev testing bed, not upgrading, but take it anyway), Opera 9, Maxthon 2 (Surprised I even had it)

Windows XP #2: Firefox 3.5, Chrome, Internet Explorer 7, Opera 9, Maxthon 2

Windows XP #3: Firefox 3.5, Chrome, Internet Explorer 8, Opera 9, Maxthon

I'd donate an Ubuntu VM or two, but... they don't wanna start up at this point. shrug

This may not help, I dunno, but I'm certainly not using the computer. Have a blast, and good luck!


As a CPU contributor, I demand that you give us stats as to which browsers were the fastest for your computation! This could be an interesting comparison point for the bundled javascript engines. (Maybe :))


Heh, I'm not sure if it would be representative of a good benchmark, but it might be exciting to see how many hashes each browser has contributed ;)


chrome definitley wins. but i think, (tracemonkeys) tracing could be THE feature in this case. lots of unrolled loops ... hopefully mozilla gets more out of it over time.


I wouldn't install an app to help you out.

I wouldn't leave it running if you killed my processor.

But you made it painless, so you get at least a few hours of runtime on me @ 1ms.


Fantastic! 10 tabs open in Safari 4 @1ms using < 1core of my 2.66GHz Core2Dueo. Cheers and good luck!

(I second, or third, the browser-war idea.)


Also, are you logging IP's for your best results? Would be cool to know if my machine was the one that came up with the shortest hamming distance (obviously IP's aren't a guaranteed way to figure this out, but still...)


Someone on Twitter claimed this entry has already got the 0:

http://twitter.com/rikfaith/statuses/2743945177

I have no way to check. True or not?


Just to be clear, I never claimed it got a 0 :)


Oh right, just someone I was following pointed to it and said it was. My bad :) (Well theirs, I guess.. :P)


(The tweets I'm sending to contesttracker have the hamming distance at the front.)


Not true, distance of 38.


When I started it said: The smallest Hamming distance we've achieved so far is: 48

Now, I opened it also in Safari and says: The smallest Hamming distance we've achieved so far is: 0

0? Does that mean you got one?

Edit: Ok. Solved.


I'm going to set the script up to tweet whenever I find a closer hash, and include the IP... you can follow me @silentmac


I'd be curious to hear how many people/cycles you get to help with this.


Very cool idea, I wish you good luck!


I have just opened another 6 tabs on a different Duo Core machine, all sliders set to max. I will tell some more people ...


17M hashes overnight. :) Good luck.


Fantastic concept. Good luck!




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: