Debugging bare-metal STM32 from the seventh level of hell (jpieper.com)
166 points by jpieper on Aug 8, 2022 | 68 comments



>It seems that the ADCs on the STM32G4 do not like to be turned on in rapid succession, and if they do, bad things can happen like having the prescaler flipped to a different value without it showing in the corresponding register.

This sounds very, VERY much like an incorrectly configured clock, where some of the peripherals end up with a clock frequency slightly above what they were designed for. It will work 99% of the time and give you hell for the remaining 1%. Much more likely than stumbling upon an undiscovered erratum in a fairly popular device family with 10+ years of history.

Could also be flakey power (check your decoupling capacitors) or an outright b0rked chip/board.


For what it is worth, the G0/4 family is relatively new. I'm pretty sure it has unique ADC IP too, since the published errata (which I'm very familiar with) are different from any other ST chip I know of.

The clock should of course have been suspect (as noted in the writeup). The "bad state" in this problem was basically indistinguishable from running the ADC at too high a clock rate. In fact, the default rate when I first encountered this problem does ever so slightly overclock the ADC. It is rated for 60MHz for single ADC operation, but only 26MHz for multiple ADCs. The firmware used to run the ADCs at ~28MHz, purposefully going a tiny bit above that.

I didn't include it in the writeup since it was somewhat of a diversion, but this particular problem occurred even with the ADCs configured to be clocked slower. As mentioned, I think that their clock configuration became mis-set as a result of the underlying problem.
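To make the overclock concrete, here is a minimal sketch of the check, using the 60 MHz and 26 MHz limits quoted above. The function names are hypothetical, and the 170 MHz source clock in the usage note is an assumption for illustration, not something taken from the firmware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits as quoted in the comment above: 60 MHz with a single ADC running,
 * 26 MHz when several ADCs run together. */
#define ADC_MAX_HZ_SINGLE   60000000u
#define ADC_MAX_HZ_MULTIPLE 26000000u

/* ADC clock after the prescaler; returns 0 for an invalid (zero) divider. */
static uint32_t adc_clock_hz(uint32_t source_hz, uint32_t divider)
{
    return divider ? source_hz / divider : 0u;
}

/* True when the resulting ADC clock is within the quoted limit. */
static bool adc_clock_in_spec(uint32_t source_hz, uint32_t divider,
                              bool multiple_adcs)
{
    uint32_t limit = multiple_adcs ? ADC_MAX_HZ_MULTIPLE : ADC_MAX_HZ_SINGLE;
    uint32_t hz = adc_clock_hz(source_hz, divider);
    return hz != 0u && hz <= limit;
}
```

With an assumed 170 MHz source, a divider of 6 gives about 28.3 MHz, which fails the multi-ADC limit, while a divider of 8 gives 21.25 MHz, which passes.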

And while poor decoupling is also a likely problem, I'm 95% sure it is about as good as it can get. A high quality cap of appropriate size is immediately next to the chip on every supply pin with vias directly to the ground plane. This is a low pin count QFN part, so the only ground on the chip is the center pad, which is also via'ed directly to the ground plane.


I wonder if it would be possible to create a test jig that turns on the ADCs all at once then samples data through them (perhaps just from a function generator)?


Maybe. Anyway, it's worth opening a case with ST support. I found a bug in the STM32L0 a few years ago, and they did help (they found a software workaround which was not yet documented; it had to do with waking up from deep sleep).


Or it was just faulty firmware and he did hit the (almost) correct conclusion:

ADEN bit on g4:

"Note: The software is allowed to set ADEN only when all bits of ADC_CR registers are 0 (ADCAL = 0, JADSTART = 0, ADSTART = 0, ADSTP = 0, ADDIS = 0 and ADEN = 0) except for bit ADVREGEN which must be 1 (and the software must have wait for the startup time of the voltage regulator)"

No errata needed, it's clearly stated that you cannot just set ADEN without waiting for certain other conditions.
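As a rough sketch, the quoted precondition could be checked against a snapshot of ADC_CR like this. The bit positions are my recollection of the G4 reference manual and should be verified there; the function name is hypothetical. The regulator startup delay cannot be read from the register, so it is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* ADC_CR bit positions: assumed from the STM32G4 reference manual,
 * verify before relying on them. */
#define ADC_CR_ADEN      (1u << 0)
#define ADC_CR_ADDIS     (1u << 1)
#define ADC_CR_ADSTART   (1u << 2)
#define ADC_CR_JADSTART  (1u << 3)
#define ADC_CR_ADSTP     (1u << 4)
#define ADC_CR_ADVREGEN  (1u << 28)
#define ADC_CR_ADCAL     (1u << 31)

/* True when a snapshot of ADC_CR satisfies the quoted note: the listed
 * bits are all 0 and ADVREGEN is 1. */
static bool adc_may_set_aden(uint32_t cr)
{
    const uint32_t must_be_zero = ADC_CR_ADCAL | ADC_CR_JADSTART |
                                  ADC_CR_ADSTART | ADC_CR_ADSTP |
                                  ADC_CR_ADDIS | ADC_CR_ADEN;
    return (cr & must_be_zero) == 0u && (cr & ADC_CR_ADVREGEN) != 0u;
}
```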


Actually, if you look at the firmware at the time, the proper procedure was followed as far as I can tell. All of the necessary bits were set and checked with the appropriate delays where required.

https://github.com/mjbots/moteus/blob/dcb900c92ffd5d5c8f5405...

Notably, the datasheet is largely silent on interactions between different ADCs during initialization.


I think it's more likely that the data needs to cross a clock boundary, and multi-clock synchronisation takes time (especially if metastability is potentially an issue). The way you make it safe is by holding the data you're synchronising stable for multiple clocks and synchronising a strobe signal across the clock boundary and then back again.

My guess is that the circuit they're using gets confused if you start a second write when the first is not yet done.


That is possible, although here the consecutive writes were to different ADC peripherals. The ADC peripherals do share some common configuration and triggering, but I believe are otherwise largely independent.


But they're going over the same peripheral bus from the CPU, and being synchronised to a subsystem with likely the same ADC clock domain. I'd design that hardware once (metastability stuff is notorious for being hard to get right, especially when you are trying to transfer multiple related bits across clock boundaries at the same time, and you want them all to arrive together).


Yep, there is probably one clock domain for all the ADCs, although there are two different prescalers (one for ADC1/2 and another for 3/4/5).

I could see one of the writes getting lost. In this case though, the ADC enable is what seems to be timing sensitive, yet the ADCs always end up enabled properly. It is just that a write that was significantly earlier (the one that sets the prescaler) seems to be lost, despite the register reading back as if it had been written correctly.

I would expect that if the synchronization failed, reads back would read the wrong value?


Yikes! That sounds frustrating. As the article concludes, the way forward appears to be ST adding an erratum for this.

The elephant in the room: How is the author getting STM32 G4s? I've been F5ing this every day with no luck for months: https://octopart.com/search?q=stm32g4+ceu&currency=USD&specs...


I just bought 10 off digikey last week, to have some prototyping stock. Looks like they still got some: https://www.digikey.com/en/products/detail/stmicroelectronic...

Octopart, etc haven't been able to accurately track inventory through this disruption.


Good to know! I've been trying to get hold of certain H7 and G4 variants in a specific footprint. Family and footprint are a hard requirement, but not too picky otherwise. Checking it out.

Of note regarding the one you linked - great find; looks like that Octopart link was bad! I actually have some 491s, and they're great, except that the OSS firmware I'm using them with for this project doesn't support them. Got my own firmware working on them though.


I'm curious what other OSS firmware is using the G4 series?


Betaflight

I also have custom flight controller firmware in Rust that's running fine on G491, but Betaflight needs 47x or 48x.

No luck on getting H7s.


We can buy them in 10s at work without too much gnashing of teeth, but the thousands we need is impossible for another 18 months as far as we can tell.

Edited to add: not that it matters now, we moved to a different MCU that we could get larger shipments of, and we’re lucky enough our application of them can be flexible enough to do so!


Order far in advance....? These particular chips arrived more than a year ago.

But yes, the shortage is hitting hard here too. The last tray I received was one year ago, after which all orders have been unfulfilled.


There's plenty of STM32 of any flavor in China. Alibaba is a good source. There may be fakes though, but reputable resellers (pay attention to reviews and tenure) normally sell genuine chips.


Are fakes of lesser quality? How do you tell fake from genuine?


I've gotten some Blue Pills with (properly labeled) CKS32s on them, which might be of lesser quality, but I've also seen people reporting getting Blue Pills with (properly labeled) GD32s, which are higher quality than the ST part, at least in the sense that they run successfully at higher clock speeds. (I couldn't tell you if they have more problems turning on many peripherals at close-together times, or if they have more noise on their ADCs, or something.)


You have to research the counterfeit market for each chip/product series. There's no hard and fast rule for their quality because many of them come from the same factories that make the real chips when unscrupulous fabs have shadow shifts manufacturing their clients' designs for themselves or just straight up copy and rebrand it.


Thanks for the tip! Will check that out. Worth taking the risk.


Getting samples of G4's was pretty easy. Since it's a fairly new part, there was a large stock that distributors had for engineering samples. The story is very different for things like the F5/F7's, where even getting small sample volumes is very hard.


If you can use Cortex-M0+ (so not likely for this project, BLDC where you want floating point), maybe think about RP2040 (Raspberry Pi Pico chip)? More than 100K of them in stock at Digikey, only $1 each... But they come in only one package type, and use external flash (an advantage, IMHO).

I've recently started looking at them. The SDK is nice except that they chose to use CMake. I designed it out:

https://github.com/nklabs/libnklabs-pico


> an advantage, IMHO

it isn't

An extra chip means higher cost to produce & assemble the board, larger board size, more pins wasted on this nonsense, more fast-edge signals to route, more passives, extra risk to handle for one extra chip being out of stock, and it is much easier to extract firmware than even from a "protected" stm32

Also wasting RAM (and power for it) on code, or random (between high and very high) latency of XIP from SPI flash


But balanced against basically unlimited flash size. Many of my projects end up having an external SPI-flash for one reason or another anyway.

Another similar chip is the i.MX rt1020 from NXP (except Cortex-M7 and way more expensive). The one gotcha was that there was only one QSPI-flash controller even though there were two ports. It meant that the extra port (which we assumed would be available during architecture) was not fully usable without interfering with the firmware.


You can still get that with chips that have onboard flash, via QSPI or OctoSPI.


> Many of my projects end up having an external SPI-flash for one reason or another anyway.

for what? What sort of possible thing would you need to do on a c-m0 that needs more than 128K of CODE!?


Not so much for code, but lots of use for persistent data (logs, databases, etc.). Only a part of it would be used for XIP. Also with more flash space it's easy to have multiple banks for safe firmware update and the like.


> logs, databases

in NOR flash???

:cringe:


NOR is more durable than NAND (but smaller and more expensive per bit), but yes, you have to be careful with logs. In my most recent case, it was in the 10s of small records per day.


Wait, wasn't there some guy who got an entire Linux system booting on a microcontroller that's smaller than a Cortex-M0? https://dmitry.gr/?r=05.Projects&proj=07.%20Linux%20on%208bi...


Yup, I did. that’s why I ask


The libraries you pull in can be big. Add, e.g., LWIP for internet, USB filesystem management (data logging and USB firmware update), code to properly manage OTA or Ethernet updates, text strings, especially in multiple languages, etc.

What seems like a huge amount of Flash disappears quickly.


ADC calibration data. Plus some ID data. Plus log of very nasty events specific to my application. And there comes external SPI flash.


The initial state of doubting the test fixtures (healthy skepticism!) highlights that it's very useful to keep a few known-good hw specimens over time, so that when you get a new batch that fails, you can test your test equipment against the older, known good hardware. Also swap firmware versions back and forth on old/new hw and see how behavior/metrics change.


Tangent: reading this, all I could think was "you're allocating memory in a motor controller?!?" and then "you're using interrupts in a motor controller!?!". Surely the bug write up was good and it is interesting, but all of the actual hard real time systems I've ever seen or talked to engineers who actually worked on them always avoided interrupts and never allocate memory. Both of these activities in hard real time code can produce very unexpected results which may be similar to what the author found with enabling the ADCs in rapid succession.


Here "allocation" is all fixed size things pulled from a fixed size buffer at startup. Technically malloc is compiled in the firmware, but it isn't used for anything but some C++ runtime initialization confirmed with debugger breakpoints. The only dynamic use of memory is the call stack, which has only fixed size local variables and limited depth recursion.
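For readers unfamiliar with the pattern, a minimal sketch of "fixed size things pulled from a fixed size buffer at startup" is a bump allocator like the one below. This is an illustration only, not the moteus code; the pool size and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BYTES 4096u   /* hypothetical pool size */

/* One statically sized buffer; nothing is ever returned to it. */
static _Alignas(max_align_t) uint8_t pool[POOL_BYTES];
static size_t pool_used;

/* Hand out the next aligned chunk of the pool, or NULL when it is full.
 * `align` must be a nonzero power of two. */
static void *pool_alloc(size_t size, size_t align)
{
    size_t start = (pool_used + align - 1u) & ~(align - 1u);
    if (start > POOL_BYTES || size > POOL_BYTES - start) {
        return NULL;
    }
    pool_used = start + size;
    return &pool[start];
}
```

Because there is no free operation, memory use is fully determined once startup finishes, which is what makes this style acceptable in hard real-time code.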

Similarly, "interrupts" may not mean what you are thinking. The highest priority interrupt is one attached to the PWM timer that operates the primary control loop that operates in interrupt context. As of a few months ago this is slightly more complicated to accommodate some "soft" quadrature decoding, but the principle is still the same that all motor control is performed in an interrupt context and nearly nothing else is.

Everything else, like CAN communication, is performed in a polling manner in the "main" loop.


Are you out of timers? IIRC the STM32 series typically has timer peripherals that can be configured as encoder inputs.


No, but out of pins connected to timers with the appropriate capabilities.


The "standard" motor control setup for FOC is to do the ADC measurement synchronously with the PWM, to sample the current while the low-side switch is closed. Then the flux angles and voltages are computed immediately afterwards, and the PWM duty updated for the next cycle. This all happens in the interrupt, in a very hard real-time system -- start missing interrupts and all hell breaks loose.

PWM is run as fast as you can get away with, 20 kHz minimum (human hearing). The controller spends maybe 80% of its time in interrupt context, leaving main context for tasks that are not time-sensitive.
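To put rough numbers on that budget, here is a small sketch of the arithmetic. The 20 kHz and 80% figures come from the text above; the 170 MHz core clock in the usage note is an assumption (the G4's rated maximum), and the function names are hypothetical.

```c
#include <stdint.h>

/* CPU cycles available in one PWM period; 0 for an invalid PWM rate. */
static uint32_t cycles_per_pwm_period(uint32_t core_hz, uint32_t pwm_hz)
{
    return pwm_hz ? core_hz / pwm_hz : 0u;
}

/* Cycles per period spent in the interrupt, given its share in percent. */
static uint32_t isr_cycle_budget(uint32_t core_hz, uint32_t pwm_hz,
                                 uint32_t isr_percent)
{
    uint64_t cycles = cycles_per_pwm_period(core_hz, pwm_hz);
    return (uint32_t)((cycles * isr_percent) / 100u);
}
```

At an assumed 170 MHz and 20 kHz PWM, each 50 µs period holds 8500 cycles, so an 80% share is 6800 cycles for the control interrupt and 1700 for everything else.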


> always avoided interrupts

I'm curious... how does one achieve precise timing without interrupts?


Cycle counting is common. Other options exist like a tight loop watching a counter register etc.
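A minimal sketch of the "tight loop watching a counter register" option, written so it survives counter wraparound. The counter is passed in as a function pointer so the sketch can run on a host; on real hardware it would read a free-running timer or cycle counter. All names here are hypothetical, and the fake counter is only a stand-in for testing.

```c
#include <stdint.h>

typedef uint32_t (*counter_read_fn)(void);

/* Spin until at least `ticks` counter ticks have elapsed. The unsigned
 * subtraction gives the right elapsed count even if the counter wraps. */
static void wait_ticks(counter_read_fn read_counter, uint32_t ticks)
{
    uint32_t start = read_counter();
    while ((uint32_t)(read_counter() - start) < ticks) {
        /* busy-wait */
    }
}

/* Host-side stand-in for a hardware counter: advances 3 ticks per read
 * and starts close to the wraparound point. */
static uint32_t fake_counter_value = 0xFFFFFFF0u;

static uint32_t fake_counter_read(void)
{
    fake_counter_value += 3u;
    return fake_counter_value;
}
```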


Cycle counting is dead for ARM chips with their wait states and instruction pipelines and lazy stacking. Works great for PICs though.


> Tangent: reading this, all I could think was "you're allocating memory in a motor controller?!?" and then "you're using interrupts in a motor controller!?!".

to be fair, that STM32 from what I can see benchmarks at 550 in CoreMark which puts it above a Pentium I, it's not like they're using a 8051 or a PIC16F... there's likely a lot of spare CPU time


Forth is very handy for this kind of new iron startup work.

You can interrogate the hardware interactively for those really thorny problems and confirm your hypotheses immediately. You can write short test scripts and send them via the terminal and/or write and send short Assembler routines and test those interactively. Each routine you define is interactively usable to assist with higher level testing and/or can be further compiled together to make higher level or looping tests.

It's old and it's weird but it's still a good tool for this kind of work. It may mean you only have to enter the top domain of Hell. :-)

https://mecrisp-stellaris-folkdoc.sourceforge.io/flashing-me...

*I have no connection to mecrisp Forth but know it to have a good reputation. It is a native code compiler with an integrated text interpreter.


Yeah, I have (maybe slightly fond?) memories of using Forth to develop a tape drive reader and writer for an undergraduate lab project. It is wonderful for some things, although in this case, where the problem was literally which addresses the instructions got assigned to, it is unclear if it would have made anything better.


How did they manage to get STM32s?

My friend's small business had to close, because key MCU is not available anywhere and Chinese sell it for the price of the end product. Makes business no longer viable. He spent over a month designing board for another MCU that was available at the time, but as soon as he was ready to produce more the alternative was gone too. It's like chasing own tail.

Big corporations can post huge back-orders, but small business can't tie up that kind of sums of money for unknown period of time. He placed orders for that MCU over a year ago and distributors still tell the production is delayed. Meanwhile Chinese don't seem to be short of supply and are milking the desperate market.

Seems like a state intervention would be welcome here.


> Seems like a state intervention would be welcome here.

And that should look like how??

> can't tie up that kind of sums of money for unknown period of time.

Seriously curious now, must have a real cash cow with little development or huge volume?

Because what my company would have needed to pay to keep the MCU in stock for, say, 3 years is still dwarfed by one developer's salary for a year (and the software team alone varies between 5 and 10, not including electrical engineers or all the other kinds). So in hindsight it is just ridiculous not to have invested that money in stock when the demand and the MCU were clear (and for us they already were at that point).

Also, nothing would have been better than tying up money that way; it could have made more profit by also becoming a broker now, lol :D


It is a small business, one guy does EE the other does SE.

Their model was to buy parts as they are needed depending on order volume (JIT).

Now with the shortages it is not possible. You can calculate how many orders you may get within a year and place a back order for these parts. 1000 MCUs alone is like £20k. They were looking at spending £50k total and facing no income at all until the parts arrive.


Probably the beginning of that month of designing would have been a better time to order the microcontrollers than the end of it. The answer to "how did jpieper get STM32s" is "he ordered them from Mouser in early 02021": https://news.ycombinator.com/item?id=32383312

What kind of state intervention are you thinking of? I spent some time trying to think of a state policy that would reduce the risk that small businesses would lose access to parts crucial to their products, rather than increasing it, but I couldn't think of one.


The simplest intervention would be to offer state backed loans that could be used for back orders and to be repaid 3-6 months after the delivery.

It's not ideal, probably it will create further delays, but it would put small business on the level playing field with the big corporations when it comes to orders.

The criteria for eligibility may be difficult to define too, so that fraud is minimised.

Maybe such surge of demand could incentivise MCU manufacturers to secure more fab capacity.

State could also make scalping illegal. For instance buying MCUs with a sole purpose to hold on them and then resell at inflated price would be illegal. This could be done by setting maximum margin someone could ask, that should be no higher than Mouser or Digikey have. So anyone selling MCUs at 10x the price would have them confiscated and get fined. Foreign online stores that facilitate such sellers like Aliexpress would risk being sanctioned and banned.


The loan idea might work, but it might not. Generally big corporations are better at fulfilling criteria for eligibility than small businesses are (they can afford larger legal departments), and it might just drive up prices without increasing supply. That seems to be what happened with state-backed loans for higher education in the US, for example.

The price-controls idea would create shortages, at least locally; Mouser and Digi-Key, prohibited from allocating scarce parts to those with more willingness to pay, would allocate them according to other criteria, such as customers with the largest established volume. Companies in other countries would have a major edge because they'd have access to distributors who weren't at risk of having their stock confiscated. The upshot would be that companies in whatever country instituted those price controls would be unable to compete on the global market because they couldn't buy on Aliexpress. (I've seen this kind of thing a lot up close and personal because I live in Argentina.)


Fair points!


From my experience there’s nothing to do but order in advance with as large of a stack of cash as you can manage and pray. It’s hardly worth writing code these days until you have supply sorted. Bugging your suppliers frequently can help, it’s how we got enough STM32s for development, but for production quantities it’s a waiting game. Upper management at my company balked at tying up capital for a large chip order about a year ago. Guess what? Now we’re paying even more for fewer chips to keep the lines up.


I hope your friend's other MCU wasn't GigaDevices, because we are currently switching two products over from STM32 to that!


Hats off to you, that was a tough one to find! Learned a couple of tricks too, so thanks for sharing.


This is really just what low-level embedded land is like. Hardware is often buggy.


Thankfully it's usually not this bad. I remember spending a lot of time debugging a problem because although the signs pointed to bad hardware, it was in such a commonly-used feature that I knew it had to be my code. Otherwise, everyone else who used the part would be screaming at the manufacturer.

After I finally admitted defeat and called my TI (Texas Instruments) support engineer, he said, "oh, that's a brand new part. You guys are actually only the second company we've sold it to. Thanks for the bug report, but we just discovered it last week so it'll be fixed in the next rev."


> The initialization sequence for the ADC is documented as requiring a wait until the ADRDY flag is set, so the fix is just to wait for that for each ADC in turn before enabling the next one.

Hmm? Ok, it may be the clock or something else subtle but if they tell you to wait...


In this case, the firmware did wait for the ADRDY flag. It just waited for all 5 to be set, then moved on to enable all 5 ADCs simultaneously. The easy fix was to just do those serially instead.
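The fix described in the quoted text (enable one ADC, wait for its ADRDY, then move to the next) might be sketched as below. This is a host-runnable illustration, not the actual firmware: the two-register struct, the `step` hook that stands in for the hardware reacting, and the bit positions are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADC_CR_ADEN    (1u << 0)  /* assumed bit position */
#define ADC_ISR_ADRDY  (1u << 0)  /* assumed bit position */

/* Hypothetical two-register view of one ADC; not the real memory layout. */
typedef struct {
    volatile uint32_t isr;
    volatile uint32_t cr;
} adc_regs_t;

/* Host stand-in for the hardware reacting between polls; may be NULL. */
typedef void (*adc_step_fn)(adc_regs_t *adc);

/* Set ADEN, then poll ADRDY up to max_polls times. */
static bool adc_enable_and_wait(adc_regs_t *adc, adc_step_fn step,
                                uint32_t max_polls)
{
    adc->cr |= ADC_CR_ADEN;
    for (uint32_t i = 0; i < max_polls; ++i) {
        if (step) {
            step(adc);
        }
        if (adc->isr & ADC_ISR_ADRDY) {
            return true;
        }
    }
    return false;
}

/* Enable ADCs one at a time; returns how many became ready. A later ADC
 * is never touched until the previous one has reported ADRDY. */
static size_t adc_enable_serially(adc_regs_t *adcs[], size_t count,
                                  adc_step_fn step, uint32_t max_polls)
{
    for (size_t i = 0; i < count; ++i) {
        if (!adc_enable_and_wait(adcs[i], step, max_polls)) {
            return i;
        }
    }
    return count;
}

/* Fake hardware for host testing: ADRDY follows ADEN on the next poll. */
static void fake_adc_step(adc_regs_t *adc)
{
    if (adc->cr & ADC_CR_ADEN) {
        adc->isr |= ADC_ISR_ADRDY;
    }
}
```

The bounded poll count is a design choice of this sketch: it turns a stuck ADC into a reportable failure instead of a hang.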


This is a great writeup. Thank you for sharing.

I sure do love and miss doing this kind of engineering work.


this has "i just changed the comments and it works now but i dont know why " vibe to it.


STM32s are actually very nice things to debug by the standards of the embedded scene.

At least 1 set of stm32 cube tools will be always working.

It's not uncommon for microcontroller vendors not to provide any tooling at all, and just say "Use JTAG" without any further explanation.

And then JTAG may, or may not work. Or only work on some decade old 3rd party IDE + 3rd party USB (or even RS232...) JTAG debugger + pre sp2 win xp version combo.


Most likely, the author is using counterfeit hardware.

I haven't seen authentic ones in stock anytime this year. And bugs in rare edge cases is precisely what I would expect from a price-conscious copy.


Surprisingly, it is almost certainly genuine. These particular chips likely came from a batch delivered in April of 2021 from Mouser, who isn't known for their shoddy sourcing practices.


It's also a relatively new part which hasn't developed the kind of demand that leads to counterfeiting yet.

Most of the counterfeiters are targeting well-established old parts like the STM32F103.


Agree, I've never had bad experiences with Mouser.



