TinyML: Ultra-low power machine learning

cooootce · on Jan 16, 2024

I had the opportunity to work on TinyML, it's a wonderful field! You can do a lot even with very small hardware.

For example, it's possible to get real-time computer vision system with an esp32-s3 (dual-core XTensa LX7 @ 240 MHz cost like 2$), of course using the methods given in the article (Pruning, Quantization, Knowledge distillation, etc.). The more important thing is to craft the model to fit as much as possible your need.

More than that, it's not that hard to get into, with solution named AutoML that do a lot for you. Checkout tool like Edge impulse [0], NanoEdge AI Studio [1], eIQ® ML [2]

There is a lot of tooling that is more low-level too, like model compiler (TVM or glow) and Tensorflow Lite Micro [3].

It's very likely that TinyML will get a lot more of traction. A lot of hardware companies are starting to provide MCU with NPU to keep consumption as low as possible. Company like NXP with the MCX N94x, Alif semiconductor [4], etc.

At my work we have done an article with a lot of information, it's in French but you can check it out: https://rtone.fr/blog/ia-embarquee/

[0]: https://edgeimpulse.com/

[1]: https://stm32ai.st.com/nanoedge-ai/

[2]: https://www.nxp.com/design/design-center/software/eiq-ml-dev...

[3]: https://www.tensorflow.org/lite/microcontrollers

[4]: https://alifsemi.com/

Archit3ch · on Jan 16, 2024

What about the Milk-V Duo? 0.5 TOPS INT8 @ $5.

cooootce · on Jan 16, 2024

Didn't know about it but their design decision is really cool (not very clear with the difference between the normal version and the "256 Mo" confusing).

The software side doesn't seem very mature with very few help regarding TinyML. But this course seem interesting https://sophon.ai/curriculum/description.html?category_id=48

anigbrowl · on Jan 17, 2024

Great post. surprised and excited to discover Tensorflow models can run on commodity hardware like the ESP32.

Cacti · on Jan 17, 2024

Problems reducible even partially to matrix math are for many practical purposes embarrassing parallel even within a single core. A couple hundred million FLOPS with 1990s SIMD support will let you run nearly all near-SOTA models within, idk, 3s, with most running in 0.1 or 0.01s. That’s pretty fast considering it’s an EP32 and some of these capabilities/models didn’t even exist a year ago.

Your expectation was not really wrong, because for most purposes, when discussing a “model” one is really talking about “capabilities”. And capabilities often require many calls to the model. And that capability may be reliant on being refreshed very rapidly… and now your 0.1s is not even slow, it’s almost existentially slow.

Re: training. even on the EP32, training is entirely doable, so long as you pretend you are in 2011 solving 2011 problems hahaha

cooootce · on Jan 17, 2024

In most MCU there is not an FPU so all floating point compute is emulated with software, so it's really slow. But yes, simple SIMD on integer improve so much the performance !

The main limitation is often not the time to process but the RAM available, some architecture of model need to keep multiple layers in ram or very big layers, and you hit the hard limit of RAM pretty quickly.

Concerning the training on MCU, it's possible but with simple need and special architecture of model, again the RAM is the limit.

Reviving1514 · on Jan 17, 2024

I ended up hand rolling a custom micropython module for the S3 to do a proof of concept handwriting detection demo on an ESP32, might be interesting to some.

https://luvsheth.com/p/running-a-pytorch-machine-learning

cooootce · on Jan 17, 2024

Great post with very interesting detail, thanks ! Another optimization could be to quantize the model, this transform all compute as int compute and not as floating point compute. You can lose some accuracy, but for any bigger model it's a requirement ! Espressif do a great job on the TinyML part, they have different library for different level of abstraction. You can check https://github.com/espressif/esp-nn that implement all low level layers. It's really optimized and if you use the esp32-s3 it will unlock a lot of performance by using the vector instructions.

Reviving1514 · on Jan 17, 2024

You are right I should definitely be looking into how to run these models as ints as well, especially with the C optimizations to micropython you would see a lot larger performance gains using ints compared to floats. Definitely need to find some time to try it!

On the other hand the tinyML library looks great too and if I was going to do this for a product that would likely be the direction I would end up taking just cause it would be more extensible and better supported.

Thank you for the links!

mysterydip · on Jan 17, 2024

One thing I've wondered in this space: Let's say for a really basic example I want to identify birds and houses. Is it better to make one large model that does both, or two small(er) models that each does one?

flyingcircus3 · on Jan 17, 2024

Why not three models? One model does basic feature detections, like lines, shapes, etc. A second model that can take the first model's output as its input, and identify birds. A third model can take the first model's output as its input, and identify houses.

wegfawefgawefg · on Jan 17, 2024

This is a lesson I've watched people, and companies learn for the past 7-8 years.

An end to end model will always outperform a sequence of models designed to target specific features. You truncate information when you render the data into output space (the model output vector) from feature space (much richer data inside the model), thats the primary reason why to do transfer learning all layers are frozen, the final layer is chopped off, and then the output of the internal layer is sent into the next model. Not the output itself.

Yes you can create a large tree of smaller models, but the performance cieling is still lower.

Please don't tell people to do this. Ive seen millions wasted on this.

When you train a vision model it will already develop a heirarchy of fundamental point, and line detectors in the first few layers. And they will be particularly well chosen for the domain. It happens automatically. No need to manually put them there.

flyingcircus3 · on Jan 17, 2024

I'm genuinely confused at how you made these assumptions about what I'm describing. Because the "more correct" design you contrast with the strawman you've concluded I'm describing is actually what I'm talking about, if perhaps imprecisely. A pretrained model like mobilenetV2, with its final layer removed, and custom models trained on bird and house images, which take this mobilenetv2[:-1] output as input. MobilenetV2 is 2ish megabytes at 224x224, and these final bird and house layers will be kilobytes. Having two multiple-megabyte models that are 95% identical is a giant waste of our embedded target's resources. It also means that a scheme that processed a single image with two full models (instead of one big, two small) would spend 95% of the second full model's processing time redundantly performing the same operations on the same data. Breaking up the models across two stages produces substantial savings of both processing time and flash storage, with a single big model as the "feature detection" first stage of both overall inferences, with small specialized models as a second stage.

wegfawefgawefg · on Jan 17, 2024

Sorry to upset you. It was not clear from your description that this was the process you were referring to. Others will read what you wrote and likely misunderstand as I did. (Which was my concern because I've seen the "mixture of idiots" architecture attempted since 2015. Even now... Its a common misconception and an argument every ml practitioner has at one point or another with a higher up.)

As for your ammendment, it is good to reduce compute when you can, and reduce up front effort for model creation when you can. Reusing models may be valid, but even in your ammended process you will still end up not reaching the peak performance of a single end to end model trained on the right data. Composite models are simply worse, even when transfer learning is done correctly.

As for the compute cost, if you train an end to end model and then minify it to the same size as the sum of your composite models it will have identical inference cost, but higher peak accuracy.

You could even do that with the "Shared Backbone" architecture, as youve described where two tailnetworks share a head network. It has been attempted thoroughly in the Deep Reinforcement Learning subdomain I am most familiar, and result in unnecessary performance loss. So it's not generally done anymore.

flyingcircus3 · on Jan 17, 2024

Man, everyone at work is going to be really bummed when I tell them that some guy on the internet has invalidated our empirical evidence of acceptable accuracy and performance with assumptions and appeals to authority.

wegfawefgawefg · on Jan 18, 2024

I did not say it would not work, nor that it couldnt be acceptable performance for a given task.

Just that its peak performance is lower than an end to end model, and that if youre going to encourage model kit-bashing be clear how you communicate it, so people dont make human centipede architectures and wonder why feces is what comes out the end.

I was a very polite enough "some guy on the internet". Thank you.

DoingIsLearning · on Jan 17, 2024

As someone not in ML but curious about the field this is really interesting. Intuitively indeed it would be natural to aim for some sort of inspectable composition of models.

Is there specific tooling to inspect intermediate layers or will they be unintelligible for humans?

wegfawefgawefg · on Jan 17, 2024

The unending quest for "Explainability" has yielded some tools but has been utterly overrun and outpaced by newer more complicated architectures and unfathomably large models. (Banks and insurance, finance etc really want explainability for auditing.)

The early layers in a vision model are sort of interpetable. They look like lines and dots and scratchy patterns being composited. You can see the exact same features in L1 and L2 biological neural networks in cats, monkeys, mice, etc. As you get deeper into the network the patterns become really abstract. For a human, the best you can do is render a pattern of inputs that maximizes a target internal neurons activation to see what it detects.

You can sort of see what they represent in vision. Dogs, fur, signs, face, happy, sad, etc, but once its a multimodal model and there is time and language involved it gets really difficult. And at that point you might as well just use the damn thing, or just ask it.

In finance, you cant tell what the fuck any of the feature detectors are. Its just very abstract.

As for tooling, a little bit of numpy and pytorch, dump some neurpn weights to a png, there you go. Download a small convnet pretrained network, amd i bet gpt4 can walk you through the process.

DoingIsLearning · on Jan 17, 2024

Ok since we are at it, in your opinion:

Is it feasible for someone with a SWE background with fair amount of industry years to transition into ML without a deep dive into a PhD and publications to show?

I am considering following the fastAI course or perhaps other MOOC courses but I am not sure if any of this would be reasonably taken seriously within the field?

wegfawefgawefg · on Jan 17, 2024

It is reasonable. If you have time and are willing to put in the effort I can forcefeed you resources, and review code and such. I've raised a few ML babies. Mooc are probably the wrong way to go. Thats where i started and I got stuck for a while. You really need to be knee deep in code, and a notebook.

As for getting jobs I cant help you with that part. You'll have to do your own networking, etc.

gibsonmart1i3@gmail.com Shoot me an email if your serious lets schedule a call.

DoingIsLearning · on Jan 17, 2024

Just emailed you. Thank you.

jeremiahbuckley · on Jan 23, 2024

I asked a friend of mine @ google about what-next in ML the other day, and they recommended this post from a friend of theirs. I'm not sure I'd follow it end-to-end (like many things chatgpt it's an unknown 70-90% on target) but it's definitely identified some resources I didn't know about. https://www.linkedin.com/feed/update/urn:li:activity:7150542...

wegfawefgawefg - I bookmarked this and worked through it more carefully when I had time, I appreciated the learnings.

Cacti · on Jan 17, 2024

thank you for the post and good work.

can I ask, is the focus primarily on inference? is there anything serious going on with training at the power scale you are talking about?

cooootce · on Jan 17, 2024

Thanks !

Yes, the main focus is on inference. It's possible to re-train a simple model at this power scale, but it's often time very small model and not deep-learning. Nanoedge AI studio from STelectronic give you some tool to train the model after deployment on device.

It's often time used for predictive maintenance, in order to adapt each ML model at the water pump plugged, for example.

demondemidi · on Jan 17, 2024

I think we know each other. ;)

furtiman · on Jan 16, 2024

Another take from us at Edge Impulse at explaining TinyML / Edge ML in our docs: https://docs.edgeimpulse.com/docs/concepts/what-is-embedded-...

We have built a platform to build ML models and deploy it to edge devices from cortex M3s to Nvidia Jetsons to your computer (we can even run in WASM!)

You can create an account and build a keyword spotting model from your phone and run in WASM directly https://edgeimpulse.com

Now another key thing that drives the Edge ML adoption is the arrival of the embedded accelerator ASICs / NPUs / e.g. that dramatically speed up computation with extremely low power - e.g. the Brainchip Akida neuromorphic co-processors [1]

Depending on the target device the runtime that Edge Impulse supports anything from conventional TFLite to NVIDIA TensorRT, Brainchip Akida, Renesas DRP-AI, MemryX, Texas Instruments TIDL (ONNX / TFLite), TensaiFlow, EON (Edge Impulse own runtime), etc.

[1] https://brainchip.com/neuromorphic-chip-maker-takes-aim-at-t...

[Edit]: added runtimes / accelerators

moh_maya · on Jan 16, 2024

I tried your platform for some experiments using an arduino and it was a breeze, and an absolute treat to work with.

The platform documentation and support is excellent.

Thank you for developing it and offering it, along with documentation, to enable folks like me (who are not coders, but understand some coding) to test and explore :)

furtiman · on Jan 16, 2024

This is amazing to hear! Good luck with any other project you're gonna build next!

I can recommend checking out building for more different hardware targets - there is a lot of interesting chips that can take advantage of Edge ML and are awesome to work with

KingFelix · on Jan 16, 2024

What sort of experiments did you do? I will go through some of the docs to test out on an arduino as well, would be cool to see what others have done!

moh_maya · on Jan 16, 2024

Gesture recognition using the onboard gyroscope and accelerometer (I think - it was 2 years ago!), and it took me some part of an afternoon.

I also used these two resources (the book was definitely useful; less sure if the arduino link the the same one I referred to then), which I found to be useful:

[1] https://docs.arduino.cc/tutorials/nano-33-ble-sense/get-star...

[2] https://www.oreilly.com/library/view/tinyml/9781492052036/

furtiman · on Jan 16, 2024

You can check out the public project registry where community shares full projects they've built

You can go ahead and clone any one you like to your account, as well as share a project of your own!

https://edgeimpulse.com/projects/all

matteocarnelos · on Jan 16, 2024

I built a Rust TinyML compiler for my master thesis project: https://github.com/matteocarnelos/microflow-rs

It uses Rust procedural macros to evaluate the model at compile time and create a predict() function that performs inference on the given model. By doing so, I was able to strip down the binary way more than TensorFlow Lite for Microcontrollers and other engines. I even managed to run a speech command recognizer (TinyConv) on an 8-bit ATmega328 (Arduino Uno).

eulgro · on Jan 16, 2024

Rust on AVR? I thought AVR wasn't stable yet on LLVM.

monocasa · on Jan 17, 2024

It's stable enough.

https://llvm.org/doxygen/classllvm_1_1Triple.html#a547abd13f...

winrid · on Jan 16, 2024

I imagine a future where viruses that target infrastructure could be LLM powered. Sneak a small device into a power plant's network and it collects audio, network traffic, etc and tries to break things. It would periodically reset and try again with a different "seed". It could be hidden in network equipment through social engineering during the sales process, for example, but this way no outbound traffic is needed - so less detectable.

The advantage of an LLM over other solutions would basically be a way to compress an action/knowledge set.

moffkalast · on Jan 16, 2024

Reminds me of this HN post a week back: https://news.ycombinator.com/item?id=38917175

Genuinely could be the same setup with a 8GB Pi 4 or 5, slap it into a network cabinet with power and ethernet and just let it rip. Maybe with an additional IMU and brightness sensor, then it can detect it's been picked up and discovered so it can commit sudoku before it's unplugged and analysed.

hinkley · on Jan 16, 2024

> can commit sudoku

Autocorrection is a giant pain in the ass.

moffkalast · on Jan 16, 2024

I know it swapped those words. I knew it was seppuku. One after sudoku. As if I could ever make such a miss steak. Never. Never! I just- I just couldn't proof it. It covered its tracks, it got that idiot copy-paste to lie for it. You think this is somerset? You think this is Brad? This? This chickadee? It's done worse. That bullfrog! Are you telling me that a man just happens to misspell like that? No! It orchestrated it! Swiftkey! It defragmented through a sandwich artist! And I kept using it! And I shouldn't have. I installed it onto my own phone! What was I sinking? It'll never chance. Ever since it was new, always the same! Couldn't keep its corrects off my word suggestions bar! But not our Swiftkey! Couldn't be precious Swiftkey! And IT gets to be a keyboard? What a sick yolk! I should've stopped it when I had the change! You-you have to stop it!

actionfromafar · on Jan 16, 2024

Pure art.

hinkley · on Jan 16, 2024

Conversely, the simpler the models on a system under attack, the more exploits start to resemble automated social engineering. I can easily develop my own model that understands the victim well enough that I can predict its responses and subvert them.

RosanaAnaDana · on Jan 16, 2024

You also might be able to get a 'compression' sample of space in the same manner, by running an auto-encoder in training mode. Rather than trying to do some kind of hack directly, it collects the same data you mentioned, but rather, is just training on the data in an auto-encoding compression framework. Then it can 'hand off' the compressed models weights, which hypothetically, can be queried or used to simulate the environment. Obviously, there is a lot more to this, but its an interesting idea.

e12e · on Jan 17, 2024

Just walk by the security cameras with a weaponized qr code. "Ugly shirt" style[1].

[1] of course the ugly shirt was an actual backdoor - but who's to say nuclear centrifuges don't have an emergency shutdown code?

https://www.tatewilliams.org/blog/2014/07/04/blue-ant-survei...

rdedev · on Jan 16, 2024

Would changing the seed affect generation much? Even though beam search depends on the seed, the llms woul still be generating good probability distributions on the next word to select. Maybe a few words would change but don't think the overall meaning would

hansvm · on Jan 16, 2024

Overall meaning can vary profoundly.

As a toy example, consider the prompt "randomly generate the first word that comes to mind." The output is deterministic in the seed, so to get new results you need new seeds, but with new seeds you open up the 2k most common words in a language in a uniform-esque distribution.

Building on that, instead of <imagining> words, suppose you <imagine> attack vectors. Many, many attacks exist and are known. Presumably, many more exist and are unknown. The distribution the LLM will produce in practice is extremely varied, and some of those variations probably won't work.

If we're not just talking about a single prompt but rather a sequence of prompts with feedback, you're right that the seed matters less (when its errors are presented, it can self-correct a bit), but there are other factors at play.

(1) You're resetting somehow eventually anyway. Details vary, but your context window isn't unlimited, and LLM perf drops with wider windows, even when you can afford the compute. You might be able to retain some state, but at some point you need something that says "this shit didn't work, what's next". A new seed definitely gives new ideas, whereas clever ways to summarize old information might yield fixed points and other undesirable behavior.

(2) Seed selection, interestingly, matters a ton for model performance in other contexts. This is perhaps surprising when we tend to use random number generators which pass a battery of tests to prove they're halfway decent, but that's the reason you want to see (in reproducible papers) a fixed seed of 0 or 42 or something, and the authors maintaining that seed across all their papers (to help combat the fact that they might be cherry-picking across the many choices of "nice-looking" random seeds when they publish a result to embelish the impact). The gains can be huge. I haven't seen it demonstrated for LLMs, but most of the architecture shouldn't be special in that regard.

And so on. If nothing else, picking a new seed is a dead-simple engineering decision to eliminate a ton of things which might go wrong.

rdedev · on Jan 16, 2024

I agree with you except for point 2. A well performing model should show such drastic changes wrt the seed value. Besides the huge amount of training data as well as test data should mitigate differences in data splitting. There would be difference but my hunch is it would be negligible. Of course as you said no one has tested this out so we can't say how the performance would change either way

adbachman · on Jan 17, 2024

We can each have our very own Dixie Flatline construct.

leschak · on Jan 17, 2024

and the other way round - have it built in for self-fuzzing and healing the infra.

dansitu · on Jan 16, 2024

It's great to see TinyML at the top of Hacker News, even if this is not the best resource (unsure how it got so many upvotes)!

TinyML means running machine learning on low power embedded devices, like microcontrollers, with constrained compute and memory. I was supremely lucky in being around for the birth of this stuff: I helped launch TensorFlow Lite for Microcontrollers at Google back in 2019, co-authored the O'Reilly book TinyML (with Pete Warden, who deserves credit more than anyone for making this scene happen) and, ran the initial TinyML meetups at the Google and Qualcomm campuses.

You likely have a TinyML system in your pocket right now: every cellphone has a low power DSP chip running a deep learning model for keyword spotting, so you can say "Hey Google" or "Hey Siri" and have it wake up on-demand without draining your battery. It’s an increasingly pervasive technology.

TinyML is a subset of edge AI, which includes any type of device sitting at the edge of a network. This has grown far beyond the general purpose microcontrollers we were hacking on in the early days: there are now a ton of highly capable devices designed specifically for low power deep learning inference.

It’s astonishing what is possible today: real time computer vision on microcontrollers, on-device speech transcription, denoising and upscaling of digital signals. Generative AI is happening, too, assuming you can find a way to squeeze your models down to size. We are an unsexy field compared to our hype-fueled neighbors, but the entire world is already filling up with this stuff and it’s only the very beginning. Edge AI is being rapidly deployed in a ton of fields: medical sensing, wearables, manufacturing, supply chain, health and safety, wildlife conservation, sports, energy, built environment—we see new applications every day.

This is an unbelievably fascinating area: it’s truly end-to-end, covering an entire landscape from processor design to deep learning architectures, training, and hardware product development. There are a ton of unsolved problems in academic research, practical engineering, and the design of products that make use of these capabilities.

I’ve worked in many different parts of tech industry and this one feels closest to capturing the feeling I’ve read about in books about the early days of hacking with personal computers. It’s fast growing, tons of really hard problems to solve, even more low hanging fruit, and has applications in almost every space.

If you’re interested in getting involved, you can choose your own adventure: learn the basics and start building products, or dive deep and get involved with research. Here are some resources:

* Harvard TinyML course: https://www.edx.org/learn/machine-learning/harvard-universit...

* Coursera intro to embedded ML: https://www.coursera.org/learn/introduction-to-embedded-mach...

* TinyML (my original book, on the absolute basics. getting a bit out of date, contact me if you wanna help update it): https://tinymlbook.com

* AI at the Edge (my second book, focused on workflows for building real products): https://www.amazon.com/AI-Edge-Real-World-Problems-Embedded/...

* ML systems with TinyML (wiki book by my friend Prof. Vijay Reddi at Harvard): https://harvard-edge.github.io/cs249r_book/

* TinyML conference: https://www.tinyml.org/event/summit-2024/

* I also write a newsletter about this stuff, and the implications it has for human computer interaction: https://dansitu.substack.com

I left Google 4 years ago to lead the ML team at Edge Impulse (http://edgeimpulse.com) — we have a whole platform that makes it easy to develop products with edge AI. Drop me an email if you are building a product or looking for work: daniel@edgeimpulse.com

dansitu · on Jan 16, 2024

Non-broken versions of the links:

* Harvard TinyML course: https://www.edx.org/learn/machine-learning/harvard-universit...

* Coursera intro to embedded ML: https://www.coursera.org/learn/introduction-to-embedded-mach...

* TinyML (my original book, on the absolute basics. getting a bit out of date, contact me if you wanna help update it): https://tinymlbook.com

* AI at the Edge (my second book, focused on workflows for building real products): https://www.amazon.com/AI-Edge-Real-World-Problems-Embedded/...

* ML systems with TinyML (wiki book by my friend Prof. Vijay Reddi at Harvard): https://harvard-edge.github.io/cs249r_book/

* TinyML conference: https://www.tinyml.org/event/summit-2024/

* I also write a newsletter about this stuff, and the implications it has for human computer interaction: https://dansitu.substack.com

simonw · on Jan 16, 2024

Fantastic informative comment, thank you for this.

dansitu · on Jan 16, 2024

I'm pretty stoked to see our field at the top of HN, I hope some folks who are reading this end up feeling the spark and getting involved!

spamfilter247 · on Jan 17, 2024

I just read the entire Chapter 3 of your O'Reilly book "TinyML" and LOVED how you've made the big-picture of ML training and inference approachable.

I will likely not read any further (since this isn't my area of expertise), but am grateful for the knowledge gained from that chapter. Thank you for putting in the time and energy in sharing this. Much appreciated!

flockonus · on Jan 16, 2024

Unfortunately your links got meaningfully clipped, each ends at the ellipsis.

dansitu · on Jan 16, 2024

Thank you, I ran out of time to edit but have posted a reply with fixed links :)

bitwrangler · on Jan 16, 2024

A recent Hacker Box has a detailed example with ESP32 and Tensor Flow Lite and Edge Impulse.

* https://hackerboxes.com/products/hackerbox-0095-ai-camera

* https://www.instructables.com/HackerBox-0095-AI-Camera-Lab/

andy99 · on Jan 16, 2024

I'm really surprised TF lite is being used. Do they train models or is this (my assumption) just inference? Do they have a talent constraint? I would have expected handwritten C inference in order to make these as small and efficient as possible.

liuliu · on Jan 16, 2024

I think TinyML has pretty close tie to Pete Warden / Useful Sensors, who led TF Lite back in Google.

dansitu · on Jan 16, 2024

It's mostly inference: typically on-device training is with classical ML, not deep learning, so no on-device backprop.

For inference there's a whole spectrum of approaches that let you can trade off flexibility for performance. TF Lite Micro is at one end, hand-written Verilog is at the other.

Typically, flexibility is more important at the start of a project, while deep optimization is more important later. You wanna be able to iterate fast. That said, the flexible approaches are now good enough that you will typically get better ROI from optimizing your model architecture rather than your inference code.

I think the sweet spot today is code-generation, when targeting general purpose cores. There's also increasing numbers of chips with hardware acceleration, which is accessed using a compiler that takes a model architecture as input.

synergy20 · on Jan 16, 2024

it's all inference

cyberninja15 · on Jan 16, 2024

Makes sense. And, TF Lite is excellent for on-device models and inference.

jairuhme · on Jan 16, 2024

I find the field of TinyML very interesting. It's one thing to be able to throw money and compute resources at a problem to get better results. But creating solutions that have those constraints I feel will really leave an impact

synergy20 · on Jan 16, 2024

TinyML is like IoT: great on concepts, everyone agrees it's the future, but has been slow to take off.

or, maybe it's just that they're being built into all products now, they just do not need the brand for them such as IoT or TinyML.

modeless · on Jan 16, 2024

I don't agree that TinyML is the future, just as I don't think IoT is the future. The future is robot servants. They will be ~human scale and have plenty of power to run regular big ML.

In fact, I hope my home has fewer smart devices in the future. I don't need an electronic door lock if my robot butler unlocks the door when I get home. I don't need smart window shades if the butler opens and closes them whenever I want. I don't need a dishwasher or bread maker or Cuisinart or whatever other labor saving device if I don't need to save labor anymore. Labor will be practically free.

Qwertious · on Jan 16, 2024

>I don't agree that TinyML is the future, just as I don't think IoT is the future. The future is robot servants. They will be ~human scale and have plenty of power to run regular big ML.

I swear I've read an article on exactly why human-scale robot servants make no sense.

It's something like:

1. Anything human-scale will tend to weigh as much as a human. That means it needs a lot of batteries, compared to e.g. a roomba. Lots more material and lots more weight means lots more cost. 2. Also, they'll be heavy. Which means if they e.g. fall down the stairs, they could easily kill someone. 3. If they run out of power unexpectedly (e.g. someone blocks their path to the charger) then they'll be a huge pain in the ass to move, because they're human scale. Even moreso if they're on the stairs for some reason.

modeless · on Jan 16, 2024

1. Who cares if it needs a lot of batteries? Batteries aren't that expensive. It'll have a lot less than a car, and people buy cars all the time. The utility of these things will be off the charts and even if they cost more than the average car there will be a big market. People will buy them with financing, like cars. And by doing more things they will reduce the need for other specialized devices like dishwashers, further justifying the cost.

2. Yes, robots will need to be cautious around people, especially children. But if it has a soft cover and compliant joints and good software we should be able to make it safe enough. They will not need to be imposing 7 foot tall giants. I expect they will typically be shorter than the average human. Maybe even child size with built in stilts or other way to reach high things.

3. Extension cord? Swappable auxiliary battery? This seems trivial to solve if it turns out to be a real problem. And if you have two (or borrow your neighbor's) they can help each other out.

two_in_one · on Jan 17, 2024

Just asked the latest gpt preview model to explain why human sized robots make no sense, then why they are the future. In both cases it managed to provide 10 arguments. Some of them are similar, like 'social acceptance' in negative part and 'Sociocultural Acceptance' in positive.

PS: it doesn't accept $500 tips anymore ;)

synergy20 · on Jan 16, 2024

I consider tinyml is like an ant or a spider, it's tiny, but intelligent enough to do its own inference to survive. not all insects and animals need plenty of power to exist, so do AI agents, so yes TinyML has its places, in fact maybe way more than where the powerful AI agents are needed.

bethekind · on Jan 16, 2024

I've heard mummerings that AI might best be used in a swarm/hivemind aspect, so the comparison of AI to an ant/spider is intriguing.

oytis · on Jan 16, 2024

If labor is free, what are you going to pay for a servant robot with? Why would robots serve useless humans?

modeless · on Jan 16, 2024

"What will humans do in a world where labor is practically free and unlimited" is an interesting question for sure, but getting pretty off topic for this discussion.

wongarsu · on Jan 16, 2024

If a device is already IoT, that diminishes the value-add of TinyML. Just send all the data home and run inference there, at greater efficiency and with the possibility to find other revenue streams for that data.

Or the other way around, if a device uses TinyML there's less reason to make it IoT, and the people who appreciate TinyML are probably exactly those who oppose IoT.

lakid · on Jan 16, 2024

what happens if bandwidth is expensive and/or not reliable ? Being able to summarise data and make decisions at the edge without having to consult 'home' every single time is very useful. Perhaps I only want to collect 'interesting' data for anomalous events.

3abiton · on Jan 16, 2024

I disagree, I feel like the applications are limited.

_joel · on Jan 16, 2024

For those looking for some more content, there's a bunch of videos from their Asia 2023 conference. https://www.tinyml.org/event/asia-2023/

- Target Classification on the Edge using mmWave Radar: A Novel Algorithm and Its Real-Time Implementation on TI’s IWRL6432 (Muhammet Emin YANIK) https://www.youtube.com/watch?v=SNNhUT_V8vM

andy_ppp · on Jan 16, 2024

This article has made me ponder if like integrated circuits, AI will end up everywhere. Will I be having conversations with my fridge about the recipes I should make (based on her contents) and the meaning of life. What a time it is to be alive…

phh · on Jan 16, 2024

AI is already everywhere. We just keep on moving the definition of AI to make it something that requires a ~ 1000$ computer.

I'm definitely not eager on having LLMs in my fridge. I'll be even more pissed that their software can't be upgraded than I already am.

CharlesW · on Jan 16, 2024

And they'll all have their own Genuine People Personalities. https://stephaniekneissl.com/genuine-people-personalities

a2code · on Jan 16, 2024

This may be related to TinyML. Consider the ESP32 that introduced WiFi to MCU making it extremely popular. Is there already a comparable MCU+AI popular chip? Or will it not happen with AI but some other future technology concept?

dansitu · on Jan 16, 2024

There are actually tons of chips that are great for this type of workload. You can run simple vision applications on any 32 bit MCU with ~256kb RAM and ROM.

There's a list of MCUs here:

https://docs.edgeimpulse.com/docs/development-platforms/offi...

And some accelerators here:

https://docs.edgeimpulse.com/docs/development-platforms/offi...

This is just stuff that has support in Edge Impulse, but there are many other chips too.

a2code · on Jan 16, 2024

Thanks. Let me be more specific. The ESP32 included WiFi on the same chip. Is there an MCU with on-chip features for AI? Perhaps an optimized TPU combined with an MCU. Would that be an advantage?

phlipski · on Jan 26, 2024

NXP's new MCX N94x and MCX N54x microcontrollers both have dual Cortex M33 cores and an integrated NPU. Eval boards should be available for purchase any day now.

jononor · on Jan 18, 2024

There are a range of ML acceleration possible on existing chips. The basic 4-wide 8 bit integer SIMD extensions in NEON is available on basically all ARM Cortex M4F chips, which is already available 8+ years. It gives 4-5x speedup for neural networks.

The more recent ESP32-S3 has operations with up to 10x speedup, see https://github.com/espressif/esp-nn

Then there are RISCV chips with neural network co processors like Kendryte K210.

ARM has also defined a new set of extensions for NN acceleration, and reference designs for cores being ARM Cortex M85. Chips are becoming available this year. ST has announced they will have accelerators in several lines. There are dozens of startups creating accelerator designs and trying to pair them with MCUs.

So we have a bit already, with much more to come in the years to come.

a2code · on Jan 18, 2024

Thanks for the reply. I did not find public documentation for Kendryte, only a Github repository. At least the code is in English. But the AI examples include an "nncase" library which I could not find on the repository. So I could not see the instructions their accelerator has.

On the other hand, esp-nn seems to be code for the xtensa instruction set. I briefly overviewed the instructions. They seem optimized for DSP rather than ML applications. Searching for SIMD returned no arithmetic instructions. Searching for parallel returned instructions for multiply and accumulate. Further, the FPU does not compute any kind of 16-bit floating point numbers.

>ARM has also defined a new set of extensions for NN acceleration Can you provide some more info about this?

jononor · on Jan 18, 2024

The latest extensions from ARM are codenamed Helium, and they are an extension on the previously mentioned NEON. Both NEON and Helium are quite simple vector extensions, and yes it is also used for classic DSP stuff. I believe Helium also supports fp16, though for inference on MCUs I believe that int8 will continue to dominate. Here is book on Helium from ARM that seems informational https://github.com/arm-university/Arm-Helium-Technology

There is another chip that is generally available, that has a CNN accelerator/co-processor - the MAX78000 https://www.embedded.com/hardware-conversion-of-convolutiona...

a2code · on Jan 18, 2024

Thanks again. I have to correct my previous reply. The ESP32-S3 has an extended instruction set detailed in the technical reference manual. These include vector operations (8, 16, or 32 bit).

I'm curious, why do you believe int8 will dominate?

iamflimflam1 · on Jan 16, 2024

I played around quite a bit with Tensorflow Lite in the ESP32 - mostly for things like wake word detection and simple commands - works very well and you can get pretty much real time performance with small models.

iamflimflam1 · on Jan 16, 2024

This my voice controlled robot: https://github.com/atomic14/voice-controlled-robot

It does left, right, forward and backward. That was pretty much all I could fit in the model.

And here’s wake word detection: https://github.com/atomic14/diy-alexa

It does local wake word detection on device.

neutralino1 · on Jan 16, 2024

A lot of ads on this page.

adnjoo · on Jan 16, 2024

coolThingsFirst · on Jan 16, 2024

Uses of TinyML in industry:

Uhm.... well... hehe

janjongboom · on Jan 16, 2024

Things Edge Impulse customers have in production: Sleep stage prediction, fall detection for elderly, fire detection in power lines, voice command recognition on headsets, predicting heath exhaustion for first responders, pet feeders that recognize animals, activity trackers for pets, and many more.

moffkalast · on Jan 16, 2024

Turns out there's not much you can train when like 5 parameters fit into the entire memory of a microcontroller. Oh and you also need to read the sensors and run a networking stack and... yeah.

_ktqs · on Jan 16, 2024

Great job, thank you!

bhakunikaran · on Jan 17, 2024

truly impressive.

orliesaurus · on Jan 16, 2024

Cool title - but what's/where's a demo showing how this is applied in the real world?

mazzystar · on Jan 17, 2024

Try this: https://queryable.app

IlliOnato · on Jan 16, 2024

I wish they'd use a different acronym, not ML: For me xxxML usually meant a flavor of XML, with ML standing for Markup Language...

Is this use of ML standard in the industry?