Implementing join planning in our open source Golang SQL query engine

thesz · on Dec 28, 2020

And here's how SQLite tackles more or less same task: https://www.sqlite.org/queryplanner-ng.html

With pitfalls explained and solution offered.

robertlagrant · on Dec 29, 2020

SQLite is just an amazing project.

londons_explore · on Dec 29, 2020

Which plan is chosen has a dramatic impact on performance. Various stats tables give clues which plan will be best, but there will always be queries where the stats still don't lead to the optimal plan...

Therefore, I believe all query engines should allow the changing of a plan mid query. Ie., when a query or subquery is not producing matching rows as rapidly as predicted, the query should be paused, a new plan selected, and the query continued with the new plan.

My testing of this on a fork of postgres found some queries can see a 1000x speedup, especially ones with tens of joins.

This leads to some complexity around cursors, but in my view, this complexity is worth it.

trhway · on Dec 29, 2020

>the query should be paused, a new plan selected, and the query continued with the new plan.

that is much more complex while in principle similar to the well known more practical approach - run a quick sampling of the related data and choose the plan according to the sampling results.

panta · on Dec 28, 2020

Nice article. Btw go-mysql-server[1] is a great project!

[1]: https://github.com/dolthub/go-mysql-server

stephen · on Dec 28, 2020

That's amazing (both the article and the project)!

I want exactly that but a) for postgres and b) written in javascript/node so I can use it for in-process access while running test suites.

I know they speak mysql as a wire protocol, just would be cool to avoid another process/port to manage.

mildbyte · on Dec 28, 2020

We have something that might work for your use case with Splitgraph [0] (co-founder here)!

Basically, it's a custom version of PostgreSQL that supports git + Docker-like operations (versioning, sharing, branching, delta compression, Dockerfiles for datasets etc). It's not in-memory but it runs in Docker and has some tooling built around it to make it simple to spin up and manage a throwaway DB for integration test purposes.

Here's an example of it in action in use cases like machine learning [1] where we can switch between different versions (test/train) of the dataset without having to change the client application.

[0] https://github.com/splitgraph/splitgraph

[1] https://github.com/splitgraph/splitgraph/blob/master/example...

zachmu · on Dec 28, 2020

Postgres support is planned, but we're concentrating on supporting MySQL 100% first. Paying customers who wanted this might make us change the priority order.

anukulrm · on Dec 29, 2020

Thanks for sharing! Are there any good resources describing query planning algorithms (other than the excellent SQLite one that was shared by another comment)?

Mrnothing_ · on Dec 28, 2020

The reason sql is popular is becouse is easy, in contrast whit graph database,what are the Real benefits of using it in commercial environments?

jimktrains2 · on Dec 28, 2020

Sql is not a valid comparison to graph databases. Sql is a language many relational databases use as their query language. There are things like Cypher for graph databases that are similar to SQL.

pram · on Dec 29, 2020

Considering relational databases have been around for 50 years, and are ubiquitous in practically every business you can imagine? For starters I'd say the massive pool of talent who understands SQL and DB infrastructure in general might be a real benefit in a commercial environment!

codeulike · on Dec 28, 2020

https://en.wikipedia.org/wiki/Relational_database

dastx · on Dec 29, 2020

Dolt has peaked my interest to say the least. Would love to know the story behind it, and also what are the use cases for it?

By the way, I can't click your about page, or the menu button (Firefox android). It just refreshes the page and goes to the data repositories on your main page.

zachmu · on Dec 29, 2020

Great question. Check out this blog post about use cases written by our founder and CEO.

https://www.dolthub.com/blog/2020-03-30-dolt-use-cases/

I've forwarded your bug report to our web team.