Looks like 64 words of instruction and 128 of data for each processor. You will want to think "dataflow of simple DSPs" rather than "cluster of linux".
The power use is amazing. 84mW while 100% active on their 36 processor 475MHz unit.
It is a 13x13 array (169 slots) with five processors chopped off to make space, which brings it down to 164; three special-purpose processors are then added back in to get up to 167.
Also worth noting that although they do an 802.11a receiver in 20-30 cores, they use two big chunks of special purpose silicon for a Viterbi decoder and an FFT. I'd be curious to know how many more cores it takes without the special purpose hardware, if it is even possible.
The other very-many-cores processor design that I know of was by Intellasys (http://www.intellasys.net/), the latest company of Forth inventor Chuck Moore. It seems to have gone nowhere: http://www.colorforth.com/S40.htm
The problem with this design, as I see it, is that tiny cores can be too small to do anything practically useful on their own; inter-processor communication and I/O eat cycles. In Intellasys' case I guess the problems were multiplied by the fact that it's harder to get traction for a chip designed to be programmed in Forth, and Chuck Moore would settle for nothing less.
Maybe the University of California will have a better chance of success, e.g. if it has more runway to tweak its design.
Does anyone know of other very-many-cores processor chips?
Rigel (https://rigel.crhc.illinois.edu/) is one such project, though the silicon implementation is decidedly less far along than AsAP. The goal there is 1024+ cores in 45nm. The targeted applications are more like "things that are massively parallel but don't run well on GPUs" than "DSP kernels that decompose into pipelines", but they are both accelerators nonetheless. A more detailed paper on the architecture itself can be found here: https://netfiles.uiuc.edu/jkelm2/www/papers/kelm-isca2009.pd... .
Full disclosure: I am affiliated with this project. However, I am making no claims about the relative merits of AsAP and Rigel, just pointing out another manycore research project.
Not as helpful as I'd want, but yeah, I recall one that was getting some traction as a signal-processing aid for tasks like, e.g., radar or video analysis and possibly DPI.
Totally blanking on the name, but it had a pretty similar kit: it seemed like you wrote the raw code in C or a C-like language, then there was some software assist for laying out the dataflow between cores.
IIRC the SEAforth chips were designed with blocking reactive I/O at ports on each processor, so there was no need to waste cycles waiting on I/O or explicitly driving peripherals; you just coded as if the stream were continuous.
Motorola used to do devkits for their 56k series that featured USB or RS-232 connections to a box containing the DSP and a pair of audio I/O jacks with average DACs.
Someone's come up with a One Instruction Set Computer for this purpose -- to enable reconfigurable arrays of small processors for massively parallel signal processing.
Hotchips presentation: http://www.hotchips.org/archives/hc18/2_Mon/HC18.S5/HC18.S5T... [also of the previous 36 processor silicon]