Instead of providing resources, continuation, or even references to these topics...

Game_Ender · on Aug 10, 2014

Took a little bit of reading, but the OP is the founder of a company [0] making a massively scalable geospatial database. This is the closest thing [1] I can find to an article with any real details. It lists the following technologies:

* A low level stack that bypasses the OS for at least disk IO (supposed to provide a 2-3x throughput increase)

* "polymorphic" space-filling curves for distributed parallel indexing

* hyper-dimensional spatial sieves [2]

* Allen’s Interval Algebra to parallelise SQL statements
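For anyone unfamiliar with the last item: Allen's interval algebra classifies how two intervals relate using 13 mutually exclusive relations (before, meets, overlaps, and so on). Below is a minimal Python sketch of that classification. The function names `allen_relation` and `strictly_disjoint` are made up for illustration, and the article does not say how the database actually uses the algebra; the idea that disjoint key ranges could be processed independently is only my assumption.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Intervals are (start, end) tuples with start < end.
    Hypothetical helper for illustration only.
    """
    (a0, a1), (b0, b1) = a, b
    if a0 >= a1 or b0 >= b1:
        raise ValueError("intervals must satisfy start < end")

    # No shared points at all.
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    # Touching at exactly one endpoint.
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    # Shared endpoints.
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    # Strict containment.
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if a0 < b0 else "overlapped-by"


def strictly_disjoint(a, b):
    """Assumption: ranges that never touch could be handled in parallel."""
    return allen_relation(a, b) in ("before", "after")


print(allen_relation((1, 3), (2, 5)))     # partial overlap
print(strictly_disjoint((1, 2), (3, 4)))  # no shared points
```

Each relation has an inverse (before/after, during/contains, etc.), which is why the checks come in pairs.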

There is more discussion in some of the OP's previous Hacker News posts [3].

[0] - http://spacecurve.com/

[1] - http://www.it-director.com/blogs/Bloor_IM_Blog/2014/7/how-do...

[2] - http://www.google.com/patents/US20090182837

[3] - https://news.ycombinator.com/item?id=7482151

zodiac · on Aug 10, 2014

I see a listing of what the link lacks. It's not the parent's responsibility to provide references without being asked to do so - gathering resources for this kind of post can easily take 2-3x the effort of writing up the original post, especially if the parent's author hasn't studied the subject formally recently. Also, he doesn't know that there's demand for more references without first posting, and if we expect all such posts to contain references, we discourage their authors from writing them.

jandrewrogers · on Aug 10, 2014

Most of these topics have scant literature, or what is there is very theoretical in nature rather than something directly reducible to practice.

Greedy routing theory does have a lot of literature around it because it is used a lot in Layer-2 and Layer-3 packet routing protocols to optimize aggregate throughput of inherently decentralized systems. A lot of the robustness of modern IP networks is explained by this. However, above Layer-3, and particularly at the application level, you almost never see properly designed distributed protocols; the people who design L2 routing protocols are not the people who design distributed systems, and the knowledge is not transferred. (Admittedly this is a difficult mathematical area to understand, with a lot of open problems. I know just enough to make good design decisions but otherwise do not understand the underlying math.)
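To make "greedy routing" concrete, here is a toy Python sketch of the textbook idea: each node knows only its own neighbors and forwards to whichever one is closest to the destination, with no global view of the network. This is not any particular L2/L3 protocol. The coordinate-based distance metric, the graph, and the name `greedy_route` are all assumptions for illustration.

```python
import math


def greedy_route(coords, neighbors, source, target):
    """Forward hop by hop to the neighbor nearest the target.

    coords:    node -> (x, y) position (assumed metric space)
    neighbors: node -> list of directly connected nodes
    Returns (path, delivered). Delivery fails if the route reaches a
    local minimum, i.e. no neighbor is strictly closer than the current node.
    """
    def dist(node):
        return math.dist(coords[node], coords[target])

    path = [source]
    current = source
    while current != target:
        best = min(neighbors[current], key=dist, default=None)
        if best is None or dist(best) >= dist(current):
            return path, False  # stuck: purely local decisions cannot progress
        current = best
        path.append(current)
    return path, True


coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 1), "E": (-1, 0)}

# A well-connected topology: greedy forwarding reaches C.
good = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"], "E": []}
print(greedy_route(coords, good, "A", "C"))

# Here the only way out of A leads away from C, so greedy forwarding gives up.
bad = {"A": ["E"], "E": ["A", "C"], "C": ["E"], "B": [], "D": []}
print(greedy_route(coords, bad, "A", "C"))
```

The second case shows the limitation: a path A -> E -> C exists, but a purely greedy rule never finds it because the first hop moves away from the target. Dealing with those local minima is part of what the literature addresses.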

I've stated many times that the HPC algorithm people could learn a lot from the people who design distributed databases, and that distributed database theory could learn a lot from the people who design massively parallel HPC algorithms. As far as I can tell, those two groups of people do not talk to each other. I just happen to have done considerable R&D in both fields, so I see what people in both domains are missing.

I've recently been tasked with writing about some of these topics, which should be interesting. My intent was not to bash or gloat but to highlight the point that we can do much better than we (or even I) are currently doing with distributed systems.