You can only have a single core running Python code per process.
You can also have a single Python process running multiple threads in parallel, as long as all but one of those threads are currently running C code that has released the GIL (NumPy, system calls, ...).
But if your program spends more than a tiny fraction of its time in interpreted Python code, the GIL will slow you down if you try to use all cores.
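As a rough illustration of the "threads running C code" case, here is a minimal sketch using `hashlib`, which is documented to release the GIL while hashing buffers larger than 2047 bytes. The helper name `digest`, the block sizes, and the worker count are made up for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block: bytes) -> str:
    # The hashing happens in C; for large buffers hashlib drops the GIL,
    # so several of these calls can run on different cores at once.
    return hashlib.sha256(block).hexdigest()


# Eight 1 MiB buffers with different contents (illustrative data only).
blocks = [bytes([i]) * (1024 * 1024) for i in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, blocks))

# Same work done on one thread, for comparison.
serial = [digest(block) for block in blocks]
```

If `digest` were a pure-Python loop instead, the threaded version would produce the same results but would not be expected to run any faster than the serial one.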
You can have multiple processes running Python at the same time -- and indeed, many Python programs work around the GIL by forking sub-processes. This tends to introduce significant extra complexity into the program (and extra memory usage, as the easiest solution is typically to copy data to all processes).
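A minimal sketch of the sub-process workaround, assuming a POSIX system where the `fork` start method is available. `count_primes` and `parallel_count` are hypothetical names, and the prime counting is only a stand-in for any CPU-bound pure-Python work.

```python
import multiprocessing as mp


def count_primes(bounds):
    """Count primes in [start, stop) with deliberately plain Python loops."""
    start, stop = bounds
    found = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return found


def parallel_count(limit, workers=4):
    """Split [0, limit) into chunks and count primes in separate processes."""
    step = -(-limit // workers)  # ceiling division; assumes limit > 0
    chunks = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    ctx = mp.get_context("fork")  # assumption: POSIX; not available on Windows
    with ctx.Pool(processes=workers) as pool:
        # Each chunk is pickled, sent to a worker, and the count is sent back.
        return sum(pool.map(count_primes, chunks))
```

Calling `parallel_count(20_000)` gives the same answer as `count_primes((0, 20_000))`, but each worker has its own interpreter and its own GIL. The arguments here are tiny; with large inputs, the pickling and copying is where the extra memory and complexity come from.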
If your inputs are immutable and you don't want to copy them, forking works too, as long as you're careful. Copy-on-write means that the data won't be duplicated by default. Of course, reference count updates mean that you will be writing to the inputs anyway, but if you're just branching off for a computation you can temporarily turn off garbage collection while you diverge, which at least keeps the collector from touching objects the child never uses. Disabling garbage collection for a short-lived forked process can even make sense on its own terms.
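Something along these lines is what I have in mind. It is a sketch for POSIX only, `fork_compute` is a made-up helper, and it does not prevent reference count updates from dirtying the pages the child actually reads.

```python
import gc
import os
import pickle


def fork_compute(func, data):
    """Run func(data) in a forked child and return its result (POSIX only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees the parent's memory copy-on-write, nothing is pickled in.
        status = 1
        try:
            os.close(read_fd)
            gc.disable()  # short-lived process, so skip cyclic collection
            payload = pickle.dumps(func(data))
            with os.fdopen(write_fd, "wb") as out:
                out.write(payload)
            status = 0
        finally:
            os._exit(status)  # leave without running the parent's cleanup
    # Parent: read the result until EOF, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as src:
        payload = src.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        raise RuntimeError("child process failed")
    return pickle.loads(payload)


# A large immutable input that is never copied or pickled (illustrative data).
readings = tuple(range(200_000))
total = fork_compute(sum, readings)
```

Only the result travels back through the pipe. Since Python 3.7 there is also `gc.freeze()`, which the `gc` docs describe for use right before a fork; I left it out to keep the sketch short.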