Hacker News new | past | comments | ask | show | jobs | submit login
Implementing join planning in our open source Golang SQL query engine (dolthub.com)
105 points by zachmu on Dec 28, 2020 | hide | past | favorite | 15 comments



And here's how SQLite tackles more or less same task: https://www.sqlite.org/queryplanner-ng.html

With pitfalls explained and solution offered.


SQLite is just an amazing project.


Which plan is chosen has a dramatic impact on performance. Various stats tables give clues which plan will be best, but there will always be queries where the stats still don't lead to the optimal plan...

Therefore, I believe all query engines should allow the changing of a plan mid query. Ie., when a query or subquery is not producing matching rows as rapidly as predicted, the query should be paused, a new plan selected, and the query continued with the new plan.

My testing of this on a fork of postgres found some queries can see a 1000x speedup, especially ones with tens of joins.

This leads to some complexity around cursors, but in my view, this complexity is worth it.


>the query should be paused, a new plan selected, and the query continued with the new plan.

that is much more complex while in principle similar to the well known more practical approach - run a quick sampling of the related data and choose the plan according to the sampling results.


Nice article. Btw go-mysql-server[1] is a great project!

[1]: https://github.com/dolthub/go-mysql-server


That's amazing (both the article and the project)!

I want exactly that but a) for postgres and b) written in javascript/node so I can use it for in-process access while running test suites.

I know they speak mysql as a wire protocol, just would be cool to avoid another process/port to manage.


We have something that might work for your use case with Splitgraph [0] (co-founder here)!

Basically, it's a custom version of PostgreSQL that supports git + Docker-like operations (versioning, sharing, branching, delta compression, Dockerfiles for datasets etc). It's not in-memory but it runs in Docker and has some tooling built around it to make it simple to spin up and manage a throwaway DB for integration test purposes.

Here's an example of it in action in use cases like machine learning [1] where we can switch between different versions (test/train) of the dataset without having to change the client application.

[0] https://github.com/splitgraph/splitgraph

[1] https://github.com/splitgraph/splitgraph/blob/master/example...


Postgres support is planned, but we're concentrating on supporting MySQL 100% first. Paying customers who wanted this might make us change the priority order.


Thanks for sharing! Are there any good resources describing query planning algorithms (other than the excellent SQLite one that was shared by another comment)?


The reason sql is popular is becouse is easy, in contrast whit graph database,what are the Real benefits of using it in commercial environments?


Sql is not a valid comparison to graph databases. Sql is a language many relational databases use as their query language. There are things like Cypher for graph databases that are similar to SQL.


Considering relational databases have been around for 50 years, and are ubiquitous in practically every business you can imagine? For starters I'd say the massive pool of talent who understands SQL and DB infrastructure in general might be a real benefit in a commercial environment!



Dolt has peaked my interest to say the least. Would love to know the story behind it, and also what are the use cases for it?

By the way, I can't click your about page, or the menu button (Firefox android). It just refreshes the page and goes to the data repositories on your main page.


Great question. Check out this blog post about use cases written by our founder and CEO.

https://www.dolthub.com/blog/2020-03-30-dolt-use-cases/

I've forwarded your bug report to our web team.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: