I'm wondering if one could write this bytecode directly (or with a higher level imperative language) instead of SQL. Often, the programmer knows exactly which index lookups need to happen in a loop, while it seems like a burden to express that in SQL. This might also be an opportunity to create a different type safe dsl for database access.
I was advocating for it for decades, but everyone dismisses it with “you don’t know better than rdbms”. That’s almost a religion. Despite the fact that most of the tables most people have are never any big (except for a few oltps that you ought to leave to specific sql server wizards anyway) and you usually do have the idea how it should work. SQL is a cool solution for a set of problems that has a very non-zero xor with a set of problems that we actually have. And most of our complex queries are trivially expressible in imperative loops. Index lookup and a complex plan building are key features of an rdbms, everything else it steals from you and forces to use an arcane inconvenient language that in practice isn’t even a standard. Make plan building and indexing core ops and leave everything else to some sort of a general purpose bytecode, everyone will be happier and will create dozens of DSLs for all their needs.
> I was advocating for it for decades, but everyone dismisses it with “you don’t know better than rdbms”. That’s almost a religion.
Which is simply not true. Query optimization is done heuristically, for the simple reason that you usually need to run the query to get the information required for "perfect" optimization. If the RDBMS really knew better, it wouldn't offer query hints.
I'm wondering the opposite - could RDBMS know better than programmer how to structure the program? The other day I had this realization, that if I squint, quite a lot of code I see and write could be seen as database tables, prepared statements, and specialized indexes. In particular, every time I use an associative table (or several) to speed up access to data along particular dimension (like X/Y coordinates in the world of a videogame), that's equivalent to making an index in SQL.
>SQL is a cool solution for a set of problems that has a very non-zero xor with a set of problems that we actually have.
My usual problems are:
1) Running queries manually, where SQL is much more friendly than writing bytecode by hand.
2) Running queries using a clueless ORM that has no idea how to optimize them, so leaving this job to the database makes sense.
I believe the actually rare problem is "writing hyper-optimized complex queries tailored to the current state of the database", where bytecode would help. But that is a very unusual usecase.
Maybe there are shops that use databases very differently from me, but it makes sense that SQL is optimized for the common use case.
And since db internals are very different, it's hard to make a single bytecode standard that makes sense to use across different databases. You can probably write something specialized for sqlite or postgres, but since nobody did, probably it's not as useful for other people too.
hyper-optimized complex queries tailored to the current state of the database", where bytecode would help
Optimization isn’t the goal here, this quote misrepresents the idea. In the spirit of dismissal through all these years, btw, some people just don‘t get it and think it’s for better performance. It is, but for better performance of a developer, not that of a database. The goal is to have a set of languages above that bytecode that would help with usual programming bookkeeping and with expressing queries in an algorithmic way because that’s how you designed it. SQL lacks DX ergonomics, it’s just a language from the clueless epoch.
I was wondering the same thing. And in particular if a new query language that avoided many of the pitfalls of SQL could compile down to that bytecode and avoid having to deal with SQL as an intermediate representation. Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
> Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
I think we already kind of have that already; one can prepare a query/statement and then use it multiple times. Regex engines also do that, except that in the case of SQlite one can bind different parameter values.
Programmers worried about SQL lexing/parsing times can compile their most used queries once for all at programmer startup. Plus, one can sometimes break a query with annoyingly variable parts into smaller queries glued with SQLite API calls.
> Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
That's exactly how those "prepared statements" work in SQLite - you get a handle to a piece of bytecode, and you can call it with different parameters.
You can do an automatic upgrade thing, where you recognise old formats and upgrade them on the fly. Possibly with a window on it where at sufficient age people need to run their elderly code through multiple upgrade cycles.
Or you can declare it an unstable interface and it's on the user to deal with it changing over time.
Or you can just leave it accessible without excessive work, but not document it, and then it's very obviously on the external tool to deal with the impedance matching.
Something akin to this is available[1] in DuckDB, itself started as a "shell" over SQLite. Using Python, you can compose your SQL plans as you like.
I recall a previous discussion about the SQLite bytecode and potential alternatives, where the main idea was that, if SQLite had gone the other way, you could have a much more general engine, where you could achieve something like DuckDB itself just as a set of extensions.
Reading the draft, it doesn't seem that extensibility of SQLite was a main factor in deciding. Maybe this trade-off also covers extensions somehow, which would be nice to see in the final document.
1) I think a simple new kind of query hint could go a long way: Declare a table as being "infinitely large" from the perspective of the query planner.
So, if a query caused a full table scan or similar against the table, it's an error, regardless of the actual content.
A lot of "typical" backend code would then simply require you to make the right indices you need to support your queries, and the query planner would be forced to use those indices in all situations.
2) You can already sort of do what you say; that "higher level imperative language" is available simply by breaking your query up into smaller chunks.
At least in MS-SQL (which is what I know), simply do
declare @tmp1 as table (...)
insert into @tmp1 ...
declare @tmp2 as table (...)
insert into @tmp2 ... from @tmp1 join ...
and so on; if you make each of those steps small enough you pretty much have the imperative language.
3) That said, I think SQL is a wonderful idea, I am also super-productive in implementing business logic in it; but it is a horrible language; the learning curve is so much higher than it needs to be and so many silly pitfalls. So fully support pushes for better languages.
Definitely yes. Databases and compilers have a lot in common. This bytecode is exactly equivalent to a compiler IR. It's called out as such in the OP where the decoupling between front end and backend is noted. Therefore you can construct the bytecode from outside of the database and feed it to the backend (may require patching sqlite to make the entry points accessible).
By analogy, this is really easy in LLVM. You can write the IR you want in a text file and feed it back into the infrastructure. This is used a lot in testing and debugging LLVM. It's much harder in GCC for reasons that never made much sense to me.
There's probably a parser for the format of the EXPLAIN text dump in sqlite somewhere intended for modifying the debug output then feeding it back into the engine. Maybe not documented, but I expect it works fine in practice.
You should watch this asianometry video on the birth of SQL, very interesting, and in fact a functional approach, based on relational algebra and tuple relational calculus, was originally what the father of the concept intended for interacting with the db. Other IBM engineers formulated the SQL language over that.
The precursors to sql rdbms, and the war over competing concepts, were also quite thought provoking.