Personal experience building infrastructure and an autoscaler for these things (which performed horribly due to API lag) while having weekly meetings with our GitHub rep to get the `--ephemeral` flag in (took about a year longer than promised). Sometimes the exit code would be 0 when using `--once`, and sometimes it would not be. Sometimes it'd also be 0 if the token somehow didn't work and the worker couldn't even register no matter how often you restarted it (of course with a cryptic Azure IAM error code). Either way, we eventually just decided that throwing away the machine if the runner exits for any reason was safest.
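A minimal sketch of that "throw the machine away on any exit" policy, assuming the runner is started as a child process. The function name `run_then_discard` and the `discard_machine` callback are hypothetical stand-ins for whatever the cloud provider's teardown call would be; nothing here is taken from the actual setup described above.

```python
import subprocess
from typing import Callable, Sequence


def run_then_discard(
    runner_cmd: Sequence[str],
    discard_machine: Callable[[], None],
) -> int:
    """Run the runner process once, then discard the machine no matter what.

    The exit code is returned for logging only. It is deliberately NOT used
    to decide whether the machine is reused, since it cannot be trusted.
    """
    try:
        exit_code = subprocess.run(list(runner_cmd)).returncode
    except OSError:
        # The runner could not even be started (missing binary, bad permissions).
        exit_code = -1
    finally:
        # Hypothetical teardown hook: always called, whatever happened above.
        discard_machine()
    return exit_code
```

The point of the design is that there is no branch on `exit_code`: a 0 and a non-zero exit lead to the same teardown, so a misleading exit code cannot leave a half-broken machine in the pool.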
The linked issue is not about random exit codes though? And also doesn't really seem to support his assertion that the maintainers don't know how it works, but I'll admit I'm not very familiar with GitHub Actions.
she did not, the exit code story is a different thing, though one I told on HN before. basically i spent half a year or so writing infra and autoscaling for action runners on GCP, and there's absolutely nothing there i'm proud of. the pieces just didn't fit together. and weirdly enough, the majority of the blame isn't even with GCP and their broken image snapshotting, slow provisioning times and asinine API design.