This sounds like a problem that would border on the complexity of replacing the GIL in Ruby or Python. The performance benefits are obvious but it seems like the correctness problems would be myriad and a constant source of (unpleasant) surprises.
This is different because there isn’t a whole ecosystem of packages that depend on access to a thread unsafe C API. Getting the GIL out of core Python isn’t too challenging. Getting all of the packages that depend on Python’s C API working is.
Another component of the GIL story is that removing the GIL requires adding fine-grained locks, which (aside from making VM development more complicated) significantly increases lock traffic and thus runtime costs, which noticeably impacts single-threaded performance, which is of major import.
Because Postgres starts from a shared-nothing architecture, it’s quite a bit easier to evaluate the addition of sharing.
Postgres already shares a lot of state between processes via shared memory. There's not a whole lot that would initially change from a concurrency perspective.
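For a sense of what "shared state between processes" looks like at the OS level, here is a minimal, hypothetical sketch using plain POSIX primitives: a counter in a shared mapping protected by a process-shared mutex. Postgres's real machinery (its shared-memory segments, LWLocks, and so on) is far more elaborate; this only illustrates the general idea.

```c
/* Minimal sketch: two processes sharing state through an anonymous
 * shared mapping, protected by a process-shared mutex.
 * Build with something like: cc sketch.c -pthread */
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t lock;   /* process-shared mutex protecting counter */
    long            counter;
} shared_region;

int main(void)
{
    /* Anonymous shared mapping: visible to forked children. */
    shared_region *r = mmap(NULL, sizeof(*r), PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED) { perror("mmap"); return 1; }

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&r->lock, &attr);
    r->counter = 0;

    if (fork() == 0) {                      /* child process */
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&r->lock);
            r->counter++;
            pthread_mutex_unlock(&r->lock);
        }
        _exit(0);
    }
    for (int i = 0; i < 100000; i++) {      /* parent process */
        pthread_mutex_lock(&r->lock);
        r->counter++;
        pthread_mutex_unlock(&r->lock);
    }
    wait(NULL);
    printf("counter = %ld\n", r->counter);  /* prints 200000 */
    return 0;
}
```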
> which (aside from making VM development more complicated) significantly increases lock traffic and thus runtime costs, which noticeably impacts single-threaded performance, which is of major import.
I don't think that's a fair characterization of the trade-offs. Acquiring an uncontended mutex is basically free (and fairly side-effect free), so single-threaded performance will not be noticeably impacted.
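As a rough sanity check on the "basically free" claim, a single-threaded microbenchmark like the hypothetical sketch below usually puts an uncontended lock/unlock pair in the tens-of-nanoseconds range. Exact numbers depend on the platform, libc, and compiler flags; treat it as an order-of-magnitude probe, not a rigorous measurement.

```c
/* Rough microbenchmark: cost of an uncontended pthread mutex
 * lock/unlock pair in a single thread.
 * Build with something like: cc -O2 bench.c -pthread */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    volatile long sink = 0;                /* keeps the loop body observable */
    const long iters = 100 * 1000 * 1000;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&m);
        sink++;                            /* stand-in for protected work */
        pthread_mutex_unlock(&m);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per lock/unlock pair\n", ns / iters);
    return 0;
}
```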
Every large C project I'm aware of (read: kernels) that has publicly switched from coarse locks to fine-grained locks has considered it a huge win with little to no impact on single-threaded performance. You can even gain performance if you chop up objects or allocations into finer-grained blobs to fit your finer-grained locking strategy, because it can play nicer with the cache (touching one piece of data doesn't kick the other pieces out of the cache).
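The "chop things up to fit the locking strategy" idea is the classic striped/per-bucket locking pattern. The hypothetical sketch below trades one global table lock for one mutex per hash bucket, so operations on different buckets neither contend nor bounce a single lock's cache line between cores; it's a generic illustration, not how any particular kernel or Postgres lays out its locks.

```c
/* Hypothetical sketch of fine-grained (per-bucket) locking replacing a
 * single coarse table lock. Each bucket has its own mutex, so inserts
 * into different buckets don't contend with each other. */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256

struct entry {
    struct entry *next;
    char          key[32];
    long          value;
};

struct bucket {
    pthread_mutex_t lock;       /* protects only this bucket's chain */
    struct entry   *head;
};

struct table {
    struct bucket buckets[NBUCKETS];
};

static unsigned hash(const char *key)
{
    unsigned h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h % NBUCKETS;
}

static void table_init(struct table *t)
{
    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&t->buckets[i].lock, NULL);
        t->buckets[i].head = NULL;
    }
}

static void table_put(struct table *t, const char *key, long value)
{
    struct bucket *b = &t->buckets[hash(key)];
    struct entry *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return;
    strncpy(e->key, key, sizeof(e->key) - 1);
    e->value = value;

    pthread_mutex_lock(&b->lock);           /* lock one bucket, not the table */
    e->next = b->head;
    b->head = e;
    pthread_mutex_unlock(&b->lock);
}

int main(void)
{
    static struct table t;
    table_init(&t);
    table_put(&t, "alice", 1);
    table_put(&t, "bob", 2);
    return 0;
}
```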
> which noticeably impacts single-threaded performance, which is of major import.
1) I don't buy this a priori. Almost everybody who removed a gigantic lock suddenly realizes that there was more contention than they thought and that atomizing it made performance improve.
2) Had Python bitten the bullet and removed the GIL back at Python 3.0, the performance would likely already be back to normal or better. You can't optimize hypothetically. Optimization on something like Python is an accumulation of lots of small wins.
You don’t have to buy anything; that’s been the result of every attempt so far, and a big reason for their rejection. The latest effort only gained some traction because the backers also did optimisation work which compensated (and was then merged separately).
> Almost everybody who removed a gigantic lock
See, that’s the issue with your response: you’re not actually reading the comment you’re replying to.
And the “almost” is a big tell.
> suddenly realizes that there was more contention than they thought and that atomizing it made performance improve.
There is no contention on the GIL in single-threaded workloads.
> Had Python bitten the bullet and removed the GIL back at Python 3.0
It would have taken several more years and been completely DOA.
> there isn’t a whole ecosystem of packages that depend on access to a thread unsafe C API
They mentioned a similar issue for Postgres extensions, no?
> Haas, though, is not convinced that it would ever be possible to remove support for the process-based mode. Threads might not perform better for all use cases, or some important extensions may never gain support for running in threads.
The correctness problem should be handled by a suite of automated tests, which PostgreSQL has. If all tests pass, the application must work correctly. The project is too big, and has too many developers, to make much progress without full test coverage. Where else would up-to-date documentation regarding the correct behavior of PostgreSQL exist? In some developer's head? SQLite is pretty famous for its extreme approach to testing, including out-of-memory conditions and other rare circumstances: https://www.sqlite.org/testing.html
Parallelism is often incredibly hard to write automated tests for, and this will most likely create parallelism issues that were not dreamed of by the authors of the test suite.
> If all tests pass, the application must work correctly.
These are "famous last words" in many contexts, but when talking about difficult-to-reproduce parallelism issues, I just don't think it's a particularly applicable viewpoint at all. No disrespect. :)
Even the performance benefits are not big enough compared to the GIL case.
The biggest problem with the process model might be the cost of having too many DB connections: each client needs a dedicated server process, with the attendant memory usage and context-switching overhead. And without a connection pool, the connection-setup overhead is very high.
This problem has been well addressed with a connection pool, or with middleware in front of the DB instead of exposing it directly. That has worked very well so far.
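As one concrete (and purely illustrative) example of that pattern, a PgBouncer configuration in transaction-pooling mode multiplexes a large number of client connections onto a small pool of actual Postgres backends; all values below are made-up placeholders.

```ini
; Illustrative PgBouncer config: thousands of clients share a small
; pool of real server connections. Values are examples only.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; return the server connection to the pool after each transaction
pool_mode = transaction
; how many client connections the pooler will accept
max_client_conn = 2000
; how many real backend connections per database/user pair
default_pool_size = 20
```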
Oracle has supported a thread-based model that has been usable for decades. I remember trying the thread-based configuration option (MTS, or shared server) in the 1990s, but no one likes it, at least within my Oracle DBA network.
It would be a great research project, but it would be a big problem if the community pushes this too early.