I believe that reducing the power consumption and increasing the speed of AI inference will be best served by switching to analog, approximate circuits. We don't need perfect floating-point multiplication and addition; we just need something that takes two input voltages and produces an output voltage that is close enough to what multiplying the inputs would yield.
I know someone working in this direction; they've described the big challenges as:
* Finding ways to use extant chip fab technology to produce something that can do analog logic. I've heard CMOS flash presented as a plausible option.
* Designing something that isn't an antenna.
* You would likely have to fine-tune your model for each physical chip you're running it on (the manufacturing tolerances aren't going to give exact results).
The big advantage is that instead of using 16 wires to represent a float16, you use the voltage on 1 wire to represent that number (which plausibly has far more precision than a float32). Additionally, you can e.g. wire two values directly together rather than loading numbers into an ALU, so the die space & power savings are potentially many, many orders of magnitude.
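To make the "close enough" idea concrete, here is a minimal sketch of a dot product in which every multiplication is only approximately correct. The 1% relative noise is a made-up stand-in for analog device mismatch, not a number from any real process:

```python
# Minimal sketch: a dot product where every multiplication is "close enough"
# rather than exact. The 1% relative noise is an invented stand-in for analog
# device mismatch.
import numpy as np

rng = np.random.default_rng(0)

def noisy_dot(x, w, rel_noise=0.01):
    """Dot product in which each product is off by ~rel_noise (1 sigma)."""
    products = x * w
    products *= 1.0 + rng.normal(0.0, rel_noise, size=products.shape)
    return float(products.sum())

x = rng.normal(size=1024).astype(np.float32)
w = rng.normal(size=1024).astype(np.float32)

exact = float(x @ w)
approx = noisy_dot(x, w)
print(f"exact={exact:.4f}  approx={approx:.4f}  abs_err={abs(approx - exact):.4f}")
```

Whether real models tolerate that kind of error is exactly the open question.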
> which plausibly has far more precision than a float32
If that was true, then a DRAM cell could represent 32 bits instead of one bit. But the analog world is noisy and lossy, so you couldn't get anywhere near 32 bits of precision/accuracy.
Yes, very carefully designed analog circuits (A/D converters, say) can get over 20 bits of precision, but they are huge relative to digital circuits, consume a lot of power, have low bandwidth compared to GHz digital circuits, and require lots of shielding and power supply filtering.
This is spit-balling, but the precision of the circuits you could create for a neural-network-type chip is certainly under 8 bits, maybe 6. And it gets worse. Unlike digital circuits, where a signal can be copied losslessly, a chain of analog circuits compounds the noise and accuracy losses stage by stage. To make it work you'd need frequent requantization to prevent getting nothing but mud out.
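A toy numerical sketch of that compounding, with invented noise levels and stage counts rather than any real device model: push a value through a chain of noisy stages, then repeat with a crude requantization to 6 bits between stages.

```python
# Toy model of noise compounding: pass a value through a chain of noisy stages,
# then repeat with a crude requantization to 6 bits between stages.
import numpy as np

rng = np.random.default_rng(1)

def requantize(x, bits=6, full_scale=1.0):
    """Snap x back onto a (2**bits)-level grid spanning [-full_scale, full_scale]."""
    step = 2.0 * full_scale / (2 ** bits - 1)
    return float(np.clip(np.round(x / step) * step, -full_scale, full_scale))

def run_chain(x, stages=1000, noise=0.005, requant=False):
    for _ in range(stages):
        x = x + rng.normal(0.0, noise)   # each analog stage adds a little noise
        if requant:
            x = requantize(x)            # digital "cleanup" between stages
    return x

x0 = 0.25
print("no requantization:  ", run_chain(x0))
print("with requantization:", run_chain(x0, requant=True))
```

With these invented numbers, the un-requantized chain drifts by an amount that grows with the square root of the number of stages, while the requantized one stays pinned near a 6-bit grid point, which is the point about having to drop back to digital periodically.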
You can get 8-bit analog signal resolution reasonably easily. The Hagen mode [1] of BrainScaleS [2] is essentially that. But... yeah. There is no way in hell you are getting 16 bits out of that kind of technology, let alone more.
And those things are huge, which leads to very small network sizes. This is partially due to the fabrication node, but also simply because the tooling for analog circuits is even less well developed than that for digital ones, which in turn lags behind software compilers.
> which plausibly has far more precision than a float32
+/- 1e-45 to 3.4e38. Granted, roughly half of the representable values are between -1 and 1.
When we worked with low-power silicon, much of the optimization was running with minimal headroom - no point railing the bits to 0/1 when 0.4/0.6 will do just fine.
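The "roughly half" remark above checks out if you count representable float32 values by their bit patterns; a quick check in plain Python (nothing hardware-specific is assumed):

```python
# Counting IEEE 754 bit patterns to check the "roughly half" remark. For
# positive float32 values, the bit patterns sort in the same order as the
# numbers, so counting patterns counts representable values.
import struct

def to_bits(f: float) -> int:
    """Bit pattern of f, stored as a float32, as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", f))[0]

max_finite = 0x7F7FFFFF        # bit pattern of the largest finite float32
below_one  = to_bits(1.0) - 1  # patterns strictly between 0 and 1.0

print(f"fraction of positive finite float32 values below 1.0: "
      f"{below_one / max_finite:.3f}")   # -> 0.498, i.e. roughly half
```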
> Additionally, you can e.g. wire two values directly together rather than loading numbers into an ALU
You may want an adder. Wiring two circuit outputs directly together makes them fight, which is usually bad for signals.
An analog value in such a chip has far, far less resolution than a float32. Maybe you get 16 bits of resolution, more likely 8, and your multiplications are going to be quite imprecise. The whole thing hinges on the models being tolerant of that.
I think we're far away from analog circuits being practically useful, but one place where we might embrace the tolerance for imprecision is in noisy digital circuits: accepting that one in a million, say, of the bits in an output will be flipped in exchange for a better performance/power ratio. Probably not when working with float32s, where a single infinity[1] could totally mess things up, but for int8s the occasional 128 when you wanted a 0 seems like something that should be tolerable.
[1] Are H100s' matrix floating-point units actually IEEE 754 compliant? I don't actually know.
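A rough illustration of the difference in blast radius, using a toy dot product and a single flipped bit. The data and the choice of which bit flips are arbitrary, so treat this as a sketch rather than a measurement:

```python
# Toy comparison of a single flipped bit: an int8 dot product vs a float32 sum.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# int8 case: flipping the top bit of one activation shifts it by 128, changing
# the result by at most 128 * |weight| -- a bounded error buried in a sum of a
# million terms.
a = rng.integers(-64, 64, size=n, dtype=np.int8)
w = rng.integers(-64, 64, size=n, dtype=np.int8)
exact = int(np.dot(a.astype(np.int32), w.astype(np.int32)))
a_flipped = a.copy()
a_flipped[0] ^= np.int8(-128)                 # flip bit 7 of one element
corrupt = int(np.dot(a_flipped.astype(np.int32), w.astype(np.int32)))
print("int8 dot product:", exact, "->", corrupt)

# float32 case: flipping the top exponent bit of one element blows it up to
# around 1e38 (or to inf/NaN), which swamps or poisons the whole sum.
x = rng.normal(size=n).astype(np.float32)
bits = x.view(np.uint32).copy()
bits[0] ^= np.uint32(1 << 30)                 # flip the highest exponent bit
x_flipped = bits.view(np.float32)
print("float32 sum:", float(x.sum()), "->", float(x_flipped.sum()))
```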
I'd go a step further: something which resembles how "wet brains" (biological ones) actually work, but which could be produced easily.
Biological neural networks are nowhere near as connected as ANNs, which are typically fully connected. With biological neurons, the ingress/egress factors are < 10, so they are highly local.
It is also an entirely different model, as there is no such thing as backpropagation in biology (that we know of).
What they do have in lieu of backpropagation is feedback (cycles).
And maybe there are support cells/processes which are critical to the function of the CNS that we don't know of yet.
There could also be a fair amount of "hard coded" connectedness, even at the higher levels. We already know of some. For instance, it is known that auditory neurons from the two ears are connected, and something similar to a "convolution" is done in order to localize a sound source. It isn't an emergent phenomenon - you don't have to be "trained" to do it.
This is not surprising given that life has had billions of years and a comparable number of generations in order to figure it out.
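A rough signal-processing analogy for that "convolution": estimate which ear a sound reached first by cross-correlating the two ear signals. This illustrates the computation only; it is not a model of the actual neural wiring, and all the numbers are arbitrary.

```python
# Estimate the inter-ear delay by cross-correlating two noisy "ear" signals.
import numpy as np

rng = np.random.default_rng(3)
fs = 48_000                      # sample rate in Hz (arbitrary)
true_delay = 12                  # samples by which the right ear lags the left

sound = rng.normal(size=4096)
left  = sound + 0.05 * rng.normal(size=sound.size)
right = np.roll(sound, true_delay) + 0.05 * rng.normal(size=sound.size)

# The peak of the cross-correlation gives the inter-ear time difference.
corr = np.correlate(right, left, mode="full")
lag = int(np.argmax(corr)) - (len(left) - 1)
print(f"estimated delay: {lag} samples (~{1e6 * lag / fs:.0f} microseconds)")
```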
I guess in theory this could all be done in software. However, given the tens of billions of neurons (and trillions of synapses) in primate/human brains, this would be incredibly challenging on even the thousand-core machines we have nowadays. And before you scream "cloud", it would not have the necessary interconnectedness/latency.
It would be cool if you could successfully model, say, a worm or insect with this approach.
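For scale, here is a toy, entirely invented sketch of what "worm-scale" looks like as code: C. elegans has 302 neurons, so a sparse, mostly local network with fan-in < 10, updated recurrently (feedback loops rather than backpropagation), is trivial to simulate. The dynamics and parameters below are made up purely for illustration.

```python
# Toy "worm-scale" network: 302 neurons, small local fan-in, recurrent updates.
import numpy as np

rng = np.random.default_rng(4)
N, FAN_IN = 302, 8

# Sparse, local wiring: each neuron listens to FAN_IN neighbours drawn from a
# window of +/- 15 positions around it.
W = np.zeros((N, N), dtype=np.float32)
for i in range(N):
    window = np.arange(i - 15, i + 16) % N
    neighbours = rng.choice(window, size=FAN_IN, replace=False)
    W[i, neighbours] = rng.normal(0.0, 0.5, size=FAN_IN)

state = rng.normal(0.0, 0.1, size=N).astype(np.float32)
drive = np.zeros(N, dtype=np.float32)
drive[:5] = 1.0                      # stimulate a few "sensory" neurons

for _ in range(50):                  # the recurrent feedback loop
    state = np.tanh(W @ state + drive)

print("mean activity after 50 steps:", float(state.mean()))
```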
> What they do have in lieu of backpropagation is feedback (cycles)
I wonder where the partial data / feedback is stored. Don't want to sound like a creationist, but it seems very improbable that "how good my sound localization is" is inferred exclusively from the # of children I have.
What do you mean by impossible? You are aware that what radio equipment does is often the equivalent of analog operations like multiplication, addition, etc., just at high frequencies?
Sure, accuracy is an issue, but this is not as impossible as you may think. The main question will be whether the benefits of going analog outweigh the issues arising from it.
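For the radio point specifically, a small numerical illustration (arbitrary frequencies and sizes): multiplying two sinusoids, which is what an analog mixer does, produces the sum and difference frequencies, i.e. a frequency shift.

```python
# Multiplying two sinusoids (an analog mixer) yields sum/difference frequencies.
import numpy as np

fs = 1_000_000                              # sample rate, Hz
t = np.arange(1000) / fs                    # 1 ms of signal
tone    = np.sin(2 * np.pi * 10_000 * t)    # 10 kHz "baseband" tone
carrier = np.sin(2 * np.pi * 100_000 * t)   # 100 kHz carrier

mixed = tone * carrier                      # the analog multiplication
spectrum = np.abs(np.fft.rfft(mixed))
freqs = np.fft.rfftfreq(len(mixed), d=1 / fs)

# The two dominant components sit at 100 kHz +/- 10 kHz.
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])
print("dominant frequencies (Hz):", top_two)   # -> [ 90000. 110000.]
```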
In general, the problem with analog is that every sequential operation introduces noise. If you're just doing a couple of multiplications to frequency-shift a signal up and down, that's fine. But if you've got hundreds of sequential steps, and you're also trying to pack huge numbers of parallel steps into a very small physical area, the noise compounds.
Realistically, you'd train your model the same way it's done today and then custom-order analog chips with the weights programmed in. The advantage here would be faster inference (assuming analog circuits actually work out), but custom-manufacturing circuits would only really work at scale.
I don't think reprogrammable analog circuits would really be feasible, at least with today's tech. You'd need to modify the resistors etc. to make it work.
Maybe because that is a VERY different problem than the one discussed here.
Building a single analog chip with 1 billion neurons would cost billions of dollars in a best-case scenario. An Nvidia card with 1 billion digital neurons is in the hundreds-of-dollars range.
Those costs could come down eventually, but at that point CUDA may be long gone.