I believe that reducing the power consumption and increasing the speed of AI inference will be best served by switching to analog, approximate circuits. We don't need perfect floating-point multiplication and addition; we just need something that takes two input voltages and produces an output voltage that is close enough to what multiplying the inputs would yield.
I know someone working in this direction; they've described the big challenges as:
* Finding ways to use extant chip fab technology to produce something that can do analog logic. I've heard CMOS flash presented as a plausible option.
* Designing something that isn't an antenna.
* You would likely have to fine-tune your model for each physical chip you're running it on (the manufacturing tolerances aren't going to give exact results).
The big advantage is that instead of using 16 wires to represent a float16, you use the voltage on 1 wire to represent that number (which plausibly has far more precision than a float32). Additionally, you can e.g. wire two values directly together rather than loading numbers into an ALU, so the die space & power savings are potentially many, many orders of magnitude.
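To make the "close enough" idea concrete, here is a minimal sketch of a dot product in which every multiplication is only approximately correct. The 1% relative noise is a made-up stand-in for analog device mismatch, not a number from any real process:

```python
# Minimal sketch: a dot product where every multiplication is "close enough"
# rather than exact. The 1% relative noise is an invented stand-in for analog
# device mismatch.
import numpy as np

rng = np.random.default_rng(0)

def noisy_dot(x, w, rel_noise=0.01):
    """Dot product in which each product is off by ~rel_noise (1 sigma)."""
    products = x * w
    products *= 1.0 + rng.normal(0.0, rel_noise, size=products.shape)
    return float(products.sum())

x = rng.normal(size=1024).astype(np.float32)
w = rng.normal(size=1024).astype(np.float32)

exact = float(x @ w)
approx = noisy_dot(x, w)
print(f"exact={exact:.4f}  approx={approx:.4f}  abs_err={abs(approx - exact):.4f}")
```

Whether real models tolerate that kind of error is exactly the open question.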
> which plausibly has far more precision than a float32
If that was true, then a DRAM cell could represent 32 bits instead of one bit. But the analog world is noisy and lossy, so you couldn't get anywhere near 32 bits of precision/accuracy.
Yes, very carefully designed analog circuits (A/D converters, say) can get over 20 bits of precision, but they are huge relative to digital circuits, consume a lot of power, have low bandwidth compared to GHz digital circuits, and require lots of shielding and power supply filtering.
This is spit-balling, but the precision of the circuits you could create for a neural-network-type chip is certainly under 8 bits, maybe 6. And it gets worse. Unlike digital circuits, where a signal can be copied losslessly, a chain of analog circuits compounds the noise and accuracy losses stage by stage. To make it work you'd need frequent requantization to prevent getting nothing but mud out.
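A toy numerical sketch of that compounding, with invented noise levels and stage counts rather than any real device model: push a value through a chain of noisy stages, then repeat with a crude requantization to 6 bits between stages.

```python
# Toy model of noise compounding: pass a value through a chain of noisy stages,
# then repeat with a crude requantization to 6 bits between stages.
import numpy as np

rng = np.random.default_rng(1)

def requantize(x, bits=6, full_scale=1.0):
    """Snap x back onto a (2**bits)-level grid spanning [-full_scale, full_scale]."""
    step = 2.0 * full_scale / (2 ** bits - 1)
    return float(np.clip(np.round(x / step) * step, -full_scale, full_scale))

def run_chain(x, stages=1000, noise=0.005, requant=False):
    for _ in range(stages):
        x = x + rng.normal(0.0, noise)   # each analog stage adds a little noise
        if requant:
            x = requantize(x)            # digital "cleanup" between stages
    return x

x0 = 0.25
print("no requantization:  ", run_chain(x0))
print("with requantization:", run_chain(x0, requant=True))
```

With these invented numbers, the un-requantized chain drifts by an amount that grows with the square root of the number of stages, while the requantized one stays pinned near a 6-bit grid point, which is the point about having to drop back to digital periodically.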
You can get 8-bit analog signal resolution reasonably easily. The Hagen mode [1] of BrainScaleS [2] is essentially that. But... yeah. There is no way in hell you are getting 16 bits out of that kind of technology, let alone more.
And those things are huge, which leads to very small network sizes. This is partially due to the fabrication node, but also simply because the tooling for analog circuits is even less well developed than that for digital ones, which in turn lags behind software compilers.
> which plausibly has far more precision than a float32
+/- 1e-45 to 3.4e38. Granted, roughly half of the representable values are between -1 and 1.
When we worked with low-power silicon, much of the optimization was running with minimal headroom - no point railing the bits to 0/1 when 0.4/0.6 will do just fine.
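The "roughly half" remark above checks out if you count representable float32 values by their bit patterns; a quick check in plain Python (nothing hardware-specific is assumed):

```python
# Counting IEEE 754 bit patterns to check the "roughly half" remark. For
# positive float32 values, the bit patterns sort in the same order as the
# numbers, so counting patterns counts representable values.
import struct

def to_bits(f: float) -> int:
    """Bit pattern of f, stored as a float32, as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", f))[0]

max_finite = 0x7F7FFFFF        # bit pattern of the largest finite float32
below_one  = to_bits(1.0) - 1  # patterns strictly between 0 and 1.0

print(f"fraction of positive finite float32 values below 1.0: "
      f"{below_one / max_finite:.3f}")   # -> 0.498, i.e. roughly half
```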
> Additionally, you can e.g. wire two values directly together rather than loading numbers into an ALU
You may want an adder. Wiring two circuit outputs directly together makes them fight, which is usually bad for signals.
An analog value in such a chip has far, far less resolution than a float32. Maybe you get 16 bits of resolution, more likely 8, and your multiplications are going to be quite imprecise. The whole thing hinges on the models being tolerant of that.
I think we're far away from analog circuits being practically useful, but one place where we might embrace the tolerance for imprecision is in noisy digital circuits: accepting that one in a million, say, of the bits in an output will be flipped in exchange for a better performance/power ratio. Probably not when working with float32s, where a single infinity[1] could totally mess things up, but for int8s the occasional 128 when you wanted a 0 seems like something that should be tolerable.
[1] Are H100s' matrix floating-point units actually IEEE 754 compliant? I don't actually know.
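A rough illustration of the difference in blast radius, using a toy dot product and a single flipped bit. The data and the choice of which bit flips are arbitrary, so treat this as a sketch rather than a measurement:

```python
# Toy comparison of a single flipped bit: an int8 dot product vs a float32 sum.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# int8 case: flipping the top bit of one activation shifts it by 128, changing
# the result by at most 128 * |weight| -- a bounded error buried in a sum of a
# million terms.
a = rng.integers(-64, 64, size=n, dtype=np.int8)
w = rng.integers(-64, 64, size=n, dtype=np.int8)
exact = int(np.dot(a.astype(np.int32), w.astype(np.int32)))
a_flipped = a.copy()
a_flipped[0] ^= np.int8(-128)                 # flip bit 7 of one element
corrupt = int(np.dot(a_flipped.astype(np.int32), w.astype(np.int32)))
print("int8 dot product:", exact, "->", corrupt)

# float32 case: flipping the top exponent bit of one element blows it up to
# around 1e38 (or to inf/NaN), which swamps or poisons the whole sum.
x = rng.normal(size=n).astype(np.float32)
bits = x.view(np.uint32).copy()
bits[0] ^= np.uint32(1 << 30)                 # flip the highest exponent bit
x_flipped = bits.view(np.float32)
print("float32 sum:", float(x.sum()), "->", float(x_flipped.sum()))
```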
I'd go a step further: something which resembles how "wet brains" (biological ones) actually work, but which could be produced easily.
Biological neural networks are nowhere near as connected as ANNs, which are typically fully connected. With biological neurons, the ingress/egress factors are < 10, so they are highly local.
It is also an entirely different model, as there is no such thing as backpropagation in biology (that we know of).
What they do have in lieu of backpropagation is feedback (cycles).
And maybe there are support cells/processes which are critical to the function of the CNS that we don't know of yet.
There could also be a fair amount of "hard coded" connectedness, even at the higher levels. We already know of some. For instance, it is known that auditory neurons from the two ears are connected, and something similar to a "convolution" is done in order to localize a sound source. It isn't an emergent phenomenon - you don't have to be "trained" to do it.
This is not surprising given that life has had billions of years and a comparable number of generations in order to figure it out.
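A rough signal-processing analogy for that "convolution": estimate which ear a sound reached first by cross-correlating the two ear signals. This illustrates the computation only; it is not a model of the actual neural wiring, and all the numbers are arbitrary.

```python
# Estimate the inter-ear delay by cross-correlating two noisy "ear" signals.
import numpy as np

rng = np.random.default_rng(3)
fs = 48_000                      # sample rate in Hz (arbitrary)
true_delay = 12                  # samples by which the right ear lags the left

sound = rng.normal(size=4096)
left  = sound + 0.05 * rng.normal(size=sound.size)
right = np.roll(sound, true_delay) + 0.05 * rng.normal(size=sound.size)

# The peak of the cross-correlation gives the inter-ear time difference.
corr = np.correlate(right, left, mode="full")
lag = int(np.argmax(corr)) - (len(left) - 1)
print(f"estimated delay: {lag} samples (~{1e6 * lag / fs:.0f} microseconds)")
```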
I guess in theory this could all be done in software. However, given the tens of billions of neurons (and trillions of synapses) in primate/human brains, this would be incredibly challenging on even the thousand-core machines we have nowadays. And before you scream "cloud", it would not have the necessary interconnectedness/latency.
It would be cool if you could successfully model, say, a worm or insect with this approach.
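For scale, here is a toy, entirely invented sketch of what "worm-scale" looks like as code: C. elegans has 302 neurons, so a sparse, mostly local network with fan-in < 10, updated recurrently (feedback loops rather than backpropagation), is trivial to simulate. The dynamics and parameters below are made up purely for illustration.

```python
# Toy "worm-scale" network: 302 neurons, small local fan-in, recurrent updates.
import numpy as np

rng = np.random.default_rng(4)
N, FAN_IN = 302, 8

# Sparse, local wiring: each neuron listens to FAN_IN neighbours drawn from a
# window of +/- 15 positions around it.
W = np.zeros((N, N), dtype=np.float32)
for i in range(N):
    window = np.arange(i - 15, i + 16) % N
    neighbours = rng.choice(window, size=FAN_IN, replace=False)
    W[i, neighbours] = rng.normal(0.0, 0.5, size=FAN_IN)

state = rng.normal(0.0, 0.1, size=N).astype(np.float32)
drive = np.zeros(N, dtype=np.float32)
drive[:5] = 1.0                      # stimulate a few "sensory" neurons

for _ in range(50):                  # the recurrent feedback loop
    state = np.tanh(W @ state + drive)

print("mean activity after 50 steps:", float(state.mean()))
```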
> What they do have in lieu of backpropagation is feedback (cycles)
I wonder where the partial data / feedback is stored. Don't want to sound like a creationist, but it seems very improbable that "how good my sound localization is" is inferred exclusively from the # of children I have.
What do you mean by impossible? You are aware that what radio equipment does is often the equivalent of analog operations like multiplication, addition, etc., just at high frequencies?
Sure, accuracy is an issue, but this is not as impossible as you may think. The main question will be whether the benefits of going analog outweigh the issues arising from it.
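For the radio point specifically, a small numerical illustration (arbitrary frequencies and sizes): multiplying two sinusoids, which is what an analog mixer does, produces the sum and difference frequencies, i.e. a frequency shift.

```python
# Multiplying two sinusoids (an analog mixer) yields sum/difference frequencies.
import numpy as np

fs = 1_000_000                              # sample rate, Hz
t = np.arange(1000) / fs                    # 1 ms of signal
tone    = np.sin(2 * np.pi * 10_000 * t)    # 10 kHz "baseband" tone
carrier = np.sin(2 * np.pi * 100_000 * t)   # 100 kHz carrier

mixed = tone * carrier                      # the analog multiplication
spectrum = np.abs(np.fft.rfft(mixed))
freqs = np.fft.rfftfreq(len(mixed), d=1 / fs)

# The two dominant components sit at 100 kHz +/- 10 kHz.
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])
print("dominant frequencies (Hz):", top_two)   # -> [ 90000. 110000.]
```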
In general, the problem with analog is that every sequential operation introduces noise. If you're just doing a couple of multiplications to frequency-shift a signal up and down, that's fine. But if you've got hundreds of sequential steps, and you're also trying to pack huge numbers of parallel steps into a very small physical area, the noise compounds.
Realistically, you'd train your model the same way it's done today and then custom-order analog chips with the weights programmed in. The advantage here would be faster inference (assuming analog circuits actually work out), but custom-manufacturing circuits would only really work at scale.
I don't think reprogrammable analog circuits would really be feasible, at least with today's tech. You'd need to modify the resistors etc. to make it work.
Maybe because that is a VERY different problem than the one discussed here.
Building a single analog chip with 1 billion neurons would cost billions of dollars in a best-case scenario. An Nvidia card with 1 billion digital neurons is in the hundreds-of-dollars range.
Those costs could come down eventually, but at that point CUDA may be long gone.