CPU rate is only quasi-continuous. At any instant a thread is either running or not; you cannot be 63% on a CPU. That figure is an illusion created by sampling and integration.
But the real trouble with CPU rate is that it represents the past, not the future. It tells you what happened to the last request and nothing about what will happen to the next request. A signal with actual predictive power is how many other requests are running or waiting to run.
In our case, non-idle cumulative thread CPU time (including system activity) was a nice metric: if it exceeds 70%, start sending some 503s (or 500s, depending on the client). A single core can easily service 100B requests per day, a few times more than we needed, so if load grew by enough to saturate every core past 70%, the right course of action was some combination of (1) scale up, and (2) notify engineering that a moderately important design assumption had been violated. We could still service 100B requests per day per core; the excess just resulted in a few 5xx errors for a minute while autoscaling kicked in. That's not good enough for every service, but it was exactly what we needed and leaps and bounds better than what we had before, with the side benefit of being "obviously" correct (simple code causes fewer unexpected bugs).
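A minimal sketch of that check, assuming a Linux host and using the system-wide counters in /proc/stat as a stand-in for per-thread CPU time (the names `CpuShedder` and `SHED_THRESHOLD` are mine, not from the original setup):

```python
SHED_THRESHOLD = 0.70  # start shedding with 503s past 70% non-idle CPU

def read_cpu_jiffies():
    """Parse the aggregate 'cpu' line of /proc/stat into (non_idle, total)."""
    with open("/proc/stat") as f:
        # user, nice, system, idle, iowait, irq, softirq, steal
        fields = [int(x) for x in f.readline().split()[1:9]]
    idle = fields[3] + fields[4]  # idle + iowait
    total = sum(fields)
    return total - idle, total

class CpuShedder:
    def __init__(self):
        self._last = read_cpu_jiffies()

    def should_shed(self):
        """True when the non-idle CPU fraction since the last call exceeds the threshold."""
        non_idle, total = read_cpu_jiffies()
        last_non_idle, last_total = self._last
        self._last = (non_idle, total)
        elapsed = total - last_total
        if elapsed == 0:
            return False
        return (non_idle - last_non_idle) / elapsed > SHED_THRESHOLD

# In a request handler, something like:
#     if shedder.should_shed():
#         return 503
```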
The core idea (in tweaking the numerator of the exponential smoothing) isn't any one easily refutable definition of CPU time, though; it's that you can tailor the algorithm to your individual needs. A related metric is the wall-clock time from when a request is queued until it is serviced (depending on your needs, perhaps compared against the actual processing time of that request). If you ever reach total_time >= 2 * processing_time, that likely indicates a queue about to exceed a critical threshold.
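A concrete illustration of that heuristic; `request.enqueued_at` and `signal_overload` are placeholders I've invented for the sketch:

```python
import time

def signal_overload():
    """Placeholder: shed load, emit a metric, or trigger scaling."""
    pass

def handle(request, process):
    # request.enqueued_at is assumed to be a time.monotonic() timestamp
    # recorded when the request entered the queue.
    start = time.monotonic()
    response = process(request)
    end = time.monotonic()
    processing_time = end - start
    total_time = end - request.enqueued_at
    # Once queueing delay matches processing time (total >= 2x processing),
    # the queue is likely growing toward a critical threshold.
    if total_time >= 2 * processing_time:
        signal_overload()
    return response
```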
> how many other requests are running or waiting to run
I referenced that idea pretty explicitly: "If you have more control over the low-level networking then queue depth or something might be more applicable, but you work with what you have."
Yes, that matters. That doesn't make other approaches invalid. Even then, you usually want a smoothed version of that stat for automated decision-making.
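For instance, an exponentially weighted moving average over instantaneous queue depth (the alpha here is an arbitrary choice, not a recommendation):

```python
class Ewma:
    """Exponentially weighted moving average for noisy load signals."""
    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value

# Feed it the raw queue depth each tick and decide on the smoothed value:
#     if smoothed.update(queue.qsize()) > limit:
#         shed()
```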