This is usually a result of following 'cloud best practices' instead of being pragmatic.
For example, using Kubernetes with Docker, where each pod is some virtual ECS 4-core whatever instance, on the assumption that scaling the workload over 1,000 instances will be fast. In your case, that leads to each pod spending 90% of its time running 'npm install', and each pod being way slower than your desktop PC.
Different tests have different needs and bottlenecks. For example, if you're running unit tests with no external dependencies, you'd want to separate out the build process and distribute the artifacts to multiple test machines that run the tests in parallel.
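A minimal sketch of that split, assuming the build already happened once and the artifacts were copied to every test machine; WORKER_INDEX, WORKER_COUNT, the tests/ layout, and the pytest invocation are all illustrative, not tied to any particular CI system:

    import os
    import subprocess
    import sys
    from pathlib import Path

    # Hypothetical per-worker env vars the CI system would set.
    worker_index = int(os.environ.get("WORKER_INDEX", "0"))
    worker_count = int(os.environ.get("WORKER_COUNT", "1"))

    # Deterministic ordering so every worker agrees on the shard boundaries.
    all_tests = sorted(str(p) for p in Path("tests").rglob("test_*.py"))
    my_shard = all_tests[worker_index::worker_count]

    # Each worker runs only its own slice against the prebuilt artifacts.
    if my_shard:
        sys.exit(subprocess.call(["python", "-m", "pytest", *my_shard]))

The point is just that the expensive serial part (build, npm install, whatever) runs once, and only the embarrassingly parallel part fans out.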
The choice between a few big machines and more small ones is usually up to you, and is a wash cost-wise. However, I do need to point out that AWS sells 192-core machines, which I'd wager are WAY faster than whatever you're sitting in front of.
And Amdahl's law is a thing as well. If there's a 10-minute non-parallelizable build step, then a 10-minute test run on one 192-core machine (20 minutes total) versus a ~0-minute test run on infinitely many machines (still 10 minutes total) works out to only a 2x speedup.
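The arithmetic, with the assumed numbers from above (10-minute serial build, tests that parallelize perfectly):

    # Back-of-the-envelope Amdahl's law check (all numbers assumed).
    build = 10                  # minutes, non-parallelizable
    tests_one_big_box = 10      # minutes of tests on a single 192-core machine
    tests_infinite = 0          # idealized: infinitely many machines

    total_one_big_box = build + tests_one_big_box   # 20 minutes
    total_infinite = build + tests_infinite         # 10 minutes
    print(total_one_big_box / total_infinite)       # 2.0 -- only a 2x speedup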
And there's such a thing as setup costs. Spinning up an AWS machine from an image, setting up network interfaces, and configuring the instance from scratch all have a cost. Managing a cluster of 1,000 computers also has its own set of unique challenges. And if you ask Amazon for 1,000 machines at the drop of a hat and plan to use each for a minute or two, AWS will throttle the fuck out of you.
All I'm trying to say with this incoherent rambling is KISS: know your requirements and build the simplest infra that satisfies them. Having a few large stopped instances in a pool might be all the complexity you need, and while it flies in the face of cloud best practices, it's probably going to be the fastest to start up.
I would argue the problem is that all the CI systems out there at the moment are "stupid": i.e. none of them try to predictively spin up an environment for a dev who is committing on a branch, none of them have any notion of promoting different "grades" of CI worker (i.e. an incremental versus a pristine), and none have any support for doing something nice like "lightweight test on push".
All of this should be possible, but the innovation is just not there.
> notion of promoting different "grades" of CI worker (i.e. an incremental versus a pristine) and none have any support for doing something nice like "lightweight test on push".
Which ones don't support that? Anything with a Docker cache, for example, can build layers efficiently, reusing previously built layers. Build triggers let you choose when to start a job, so GitHub Actions, for example, can trigger a full test on any PR and a light test on any commit to a branch.
> And if you ask Amazon for 1,000 machines at the drop of a hat and plan to use each for a minute or two, AWS will throttle the fuck out of you.
I haven't scaled that far out, but I thought that was the whole point of cloud platforms like AWS, GCP, and Azure.