
Python is already fast where it matters: often it is just used to integrate existing C/C++ libraries like numpy or pytorch. It is more an integration language than one you write your heavy algorithms in.

As for JS: when it received its JITs, there was no cross-platform native-code alternative like Wasm yet, but JS did have to compete with plugins written in C/C++. There was also competition between browser vendors, which gave the period the name "browser wars". Nowadays the JIT's speed improvements for the end user aren't all that great anyway; Apple even provides a mode to turn off the JIT entirely, for security.




Having recently implemented parallel image rendering in corrscope (https://github.com/corrscope/corrscope/pull/450), I can say that friends don't let friends write performance-critical code in Python. Depending on prebuilt C++ libraries hampers flexibility (e.g. you can't customize the memory management or rasterization pipeline of matplotlib). Python's GIL inhibits parallelism within a process, and the workaround of multiprocessing plus shared memory is awkward, has inconsistencies between platforms, and loses performance (you can't get matplotlib to render directly into an inter-process shared-memory buffer, and the alternative of copying data from matplotlib's framebuffer to shared memory wastes CPU time).

Additionally, a lot of the libraries/ecosystem around shared memory (https://docs.python.org/3/library/multiprocessing.shared_mem...) seem poorly conceived. If you pre-open shared memory in a ProcessPoolExecutor's initializer function, you can't close it when the worker process exits (which might be fine; nobody knows!), but if you instead open and close a shared-memory segment on every executor job, it measurably reduces performance, presumably from memory-mapping overhead or TLB/page-table thrashing.


> Python's GIL inhibits parallelism within a process, and the workaround of multiprocessing and shared memory is awkward, has inconsistencies between platforms, and loses performance

Well, imho the biggest problem with this approach to parallelism is that you're stepping out of the Python world of gc'ed objects etc. and into a world of ctypes and serialization. It's like you're not even programming Python anymore, but something closer to C, at the speed of an interpreted language.


That's why the optional GIL will be so important.


> Depending on prebuilt C++ libraries hampers flexibility (eg. you can't customize the memory management or rasterization pipeline of matplotlib).

But what is the counterfactual? Implementing the whole thing in Python? That seems like much more work than forking/fixing matplotlib.


> If you pre-open shared memory in a ProcessPoolExecutor's initializer functions, you can't close them when the worker process exits

That's quite surprising to learn, as I didn't think the initializer ran in a specialized context (like a pthread_atfork postfork hook in the child).

What happens when you try to close an initializer-allocated SharedMemory object on worker exit?


ProcessPoolExecutor doesn't let you supply a callback to run on worker process exit, only startup. Perhaps I could've looked for and tried something like atexit (https://docs.python.org/3/library/atexit.html)? In any case I don't want to touch my code at the moment until I regain interest or hear of resource exhaustion, since "it works".


Fair enough. If you ever do decide to touch it to address that, I suggest (in preference order):

- Subclassing ProcessPoolExecutor such that it spawns multiprocessing.Process objects whose runloop function wraps the stdlib "_process_worker" function in a try/finally which runs your at-shutdown logic. That'll be as reliable as any try/finally (e.g. SIGKILL and certain interpreter faults can bypass it).

- Writing custom destructors for objects in your call arguments that know about, and can clean up, their associated SharedMemory objects. This is less preferred than subclassing because of the usual issues with custom destructors: no real exception handling, and objects sneaking out into long-lived/global caches can cause destructors to run late (after the interpreter has torn down things your cleanup logic needs) or not at all.
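This second option might look like the following (illustrative only; `ShmHandle` is an invented name):

```python
from multiprocessing import shared_memory

class ShmHandle:
    """Attach to an existing segment by name; detach when garbage-collected."""

    def __init__(self, name):
        self._shm = shared_memory.SharedMemory(name=name)

    @property
    def buf(self):
        return self._shm.buf

    def __del__(self):
        # Runs when the last reference dies -- but may run late, or never,
        # if the handle leaks into a long-lived cache, as noted above.
        try:
            self._shm.close()
        except Exception:
            pass  # destructors have no real exception handling
```

Passing `ShmHandle` objects in job arguments means each worker's copy detaches on collection, modulo the caveats above.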

- Atexit, as you suggest. This is least preferred because the execution context of atexit code is... weird, to say the least. Much like a signal handler or pthread_atfork callback, it's not a place where I'd put code that does complicated I/O or depends on the rest of the interpreter being in ordinary condition.


What would you use instead of Python?


Cython? :o


I think usually the term “browser wars” refers to the time when Netscape and Microsoft were struggling for dominance, which concluded in 2001.

JavaScript JITs only emerged around 2008 with SpiderMonkey’s TraceMonkey, JavaScriptCore’s SquirrelFish Extreme, and V8’s original JIT.


There were multiple browser wars, otherwise you wouldn't need -s there ;-)





