Some of the work of Distributed.net ( http://www.distributed.net/Main_Page ) is wonderful. Does anyone know if this idea could be more than it currently is? Computers (more than ever) are sitting idle, not contributing their cycles in any meaningful way. Even 5 minutes of 100% CPU usage per device could do some serious computation, theoretically superseding modern supercomputers.
Computers rely more than ever on not being at 100% CPU all the time, because the increased power consumption and heat dissipation is a problem. Instead it's all about the "race to idle": do the work, then go to sleep for a few milliseconds to cool down.
Case in point: on my MBP's battery I can get 8 hours of browsing the web, reading articles, and watching the occasional YouTube video. But if I spin up a parallel build that pegs all cores at 100% for about 15 minutes, I eat through half of my battery.
The difference between idle and full power is very large in modern CPUs: a typical laptop CPU draws an average of a few watts at idle but several dozen at full load, and it can switch between these states thousands of times per second. This is also why CPU power circuitry is not easy to design: it has to keep up with the very fast swings in current as the CPU transitions between power states.
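A back-of-envelope sketch makes the idle-vs-load gap concrete. All figures below (5 W idle, 45 W at full load, a 55 Wh battery) are illustrative assumptions, not measurements of any particular machine:

```python
# Back-of-envelope energy comparison between idle and full CPU load.
# All wattage and capacity figures are assumed for illustration.

IDLE_WATTS = 5.0    # assumed whole-laptop draw while lightly browsing
LOAD_WATTS = 45.0   # assumed draw with all cores pegged at 100%
BATTERY_WH = 55.0   # assumed battery capacity in watt-hours

def battery_fraction_used(watts: float, hours: float, capacity_wh: float) -> float:
    """Fraction of the battery consumed by drawing `watts` for `hours`."""
    return watts * hours / capacity_wh

# 15 minutes (0.25 h) of full load vs 15 minutes of idle:
full_load = battery_fraction_used(LOAD_WATTS, 0.25, BATTERY_WH)  # ~20%
idle = battery_fraction_used(IDLE_WATTS, 0.25, BATTERY_WH)       # ~2%
print(f"full load: {full_load:.1%}, idle: {idle:.1%}")
```

With these assumed numbers, a quarter hour at full tilt burns roughly ten times the battery that a quarter hour of idling does, which is the whole point of racing to idle.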
Incidentally, this is also why you can sometimes hear faint sounds from a computer when it's idle or running some particular process: the wakeups/sleeps are happening at a frequency in the hearing range, and components like capacitors and coils can act as tiny speakers.
The thing is, back when BOINC and co. were popular, all those idle cycles really were wasted. Today, computers simply enter power-saving states when not in use, saving electricity and, in effect, money.
This is because Intel is tricking you with the core count. I'm guessing you have an i5/i7/whatever with 8 cores but only two memory channels. Since it takes just two or three cores to saturate those channels, you will never be able to max out 8 cores on anything that processes much more than $CACHE_SIZE (~4 MB) of data. So you can use all 8 cores for stuff like finding primes or brute-forcing RC5 (like distributed.net), but not much else.
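The "two or three cores" figure falls out of simple arithmetic. As a sketch, assume a dual-channel DDR4-2400 setup and a per-core streaming demand of about 15 GB/s (both numbers are assumptions for illustration; real values vary by platform and workload):

```python
# Rough estimate of how many cores it takes to saturate memory bandwidth.
# All figures are illustrative assumptions for a dual-channel DDR4 laptop.

CHANNELS = 2
GB_PER_S_PER_CHANNEL = 19.2   # DDR4-2400: 2400 MT/s * 8 bytes per transfer
PER_CORE_DEMAND_GB_S = 15.0   # assumed bandwidth one streaming core can consume

total_bw = CHANNELS * GB_PER_S_PER_CHANNEL            # ~38.4 GB/s
cores_to_saturate = total_bw / PER_CORE_DEMAND_GB_S   # ~2.6 cores
print(f"{total_bw:.1f} GB/s total; ~{cores_to_saturate:.1f} cores saturate it")
```

Under these assumptions, once 2–3 cores are streaming data, the remaining 5–6 cores are just waiting on memory, unless the working set fits in cache.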
Depends on what you want and how much money you want to spend. If your workload is embarrassingly parallel, it could be cheaper to buy several servers with a CPU like the Xeon E5-1630 v3 (Edit: which has just 4 cores and 4 memory channels). Or, as you say, go the POWER8/SPARC route, which also includes the recent option of a single x86 server with up to 8x Xeon E7-8893 v3. With the latter option you may hit NUMA issues as well; it depends on your workload.
Your choice boils down to the classic "message passing" vs. "shared memory" architectural trade-off (think MPI vs. OpenMP). The optimal solution depends on the specific application, as well as how far you want to scale it.