If you're working on CPU-bound tasks with NumPy/SciPy using threads, you have to think very hard to make sure most of the critical sections are spending their time inside NumPy's C routines, which release the GIL. That's not a reliable way to program. The approach the author describes is basically the only pure-Python way of achieving parallelism for this kind of problem.
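To make that concrete, here's a rough sketch of the "threads hitting NumPy's C code" case (assuming NumPy is installed; whether you actually see a speedup depends on the BLAS backend, which may already be multithreaded internally):

```python
import threading
import numpy as np

def worker(a, b, out, i):
    # np.dot hands off to BLAS and drops the GIL while it runs,
    # so several of these calls can execute on different cores at once.
    out[i] = np.dot(a, b)

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
out = [None] * 4

threads = [threading.Thread(target=worker, args=(a, b, out, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The moment your per-thread work drifts back into ordinary Python bytecode, the GIL serializes it again, which is why this style is so fragile.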
If you're holding a global mutex every 100 instructions while context switching between CPU-bound tasks, then yes, the GIL does suck. There is a class of IO-bound problems where threaded/evented models in Python can be used effectively, but that's not the class of problems the author is talking about here.
Considering the author's use case (mathematical modeling) and language (Python), threading and event-based models would have no real performance benefit.
Event-based models only shine when you are doing IO-bound work. They won't help you when you are chewing CPU.
Threading models in Python aren't attractive because of the GIL: if you are doing a parallel matrix operation in pure Python, only one core can execute bytecode at a time. Not attractive.
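You can see the constraint with a toy benchmark (timings are machine-dependent, and this is on a standard CPython build with the GIL):

```python
import threading
import time

def count(n):
    # Pure-Python bytecode: the GIL lets only one thread run this at a time.
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
count(N)
count(N)
print("serial:  ", time.perf_counter() - start)

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("threaded:", time.perf_counter() - start)  # roughly the same as serial, often worse
```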
STM is not a great fit for this kind of problem; there's no need for all the transaction machinery when the problem is embarrassingly parallel. In an ideal world you'd have threads that simply split the work into sections, the way you can in C/C++/Haskell.
Just curious, but could you get around this by having multiple copies of the Python interpreter installed on your system? Maybe by changing some config strings so the GIL thinks it's a different interpreter.
ProcessPoolExecutor from concurrent.futures (the futures backport on Python 2) essentially does this: it runs multiple interpreter processes and farms the work out to them, pickling the function and its arguments across process boundaries.
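A minimal sketch of that pattern (the `simulate` function here is made up purely for illustration):

```python
from concurrent.futures import ProcessPoolExecutor

def simulate(seed):
    # Pure-Python, CPU-bound work. Each call runs in its own interpreter
    # process, so one worker's GIL doesn't block the others.
    total = 0
    for i in range(1_000_000):
        total += (i * seed) % 7
    return total

if __name__ == "__main__":  # guard required on platforms that spawn workers
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(simulate, range(4)))
    print(results)
```

The trade-off is that arguments and results have to be picklable, and large arrays get copied between processes unless you reach for shared memory.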
multiprocessing also gives you memory isolation, since each worker is a separate OS process with its own address space, enforced by the hardware's memory protection.
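A quick way to see that isolation (a toy example; the global counter exists only for demonstration):

```python
from multiprocessing import Process

counter = 0

def bump():
    global counter
    counter += 1
    print("child sees:", counter)  # each child mutates its own private copy

if __name__ == "__main__":
    procs = [Process(target=bump) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("parent still sees:", counter)  # unchanged: 0
```

Each child prints 1 and the parent still prints 0, because no process can see another's writes without an explicit sharing mechanism like a Queue, Pipe, or shared memory.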