Thanks for listing these. It's very useful. (1) is a function of what SQL you mi...

zasdffaa · on Aug 27, 2022

Interesting. Ok

> (3) just as with (2), the new thing transpiling to SQL wouldn't have to allow NULLs, and could generate SQL that uses `IS [NOT] NULL` / `IS [NOT] DISTINCT FROM NULL` to obtain the desired semantics if the new thing kept any notion of NULL.

You may disallow nulls stored, but anything like a non-inner join will produce nulls and then you're almost back to where you started.

> (4) is not a semantics issue, and anyways, [where 4 is "Finally, the execution model of SQL is always against a DBMS table or view. So in-memory relations, etc. look difficult to implement"].

Spot on and great catch. I missed that.

> I'm unconvinced that there are such problems that are very serious

Let me help you - simple sqlNG commands could transpile into more much complex normal sql target statements, breaking the optimiser. That's a very real risk.

Good post though

cryptonector · on Aug 27, 2022

> Let me help you - simple sqlNG commands could transpile into more much complex normal sql target statements, breaking the optimiser. That's a very real risk.

Breaking the optimizer is a big deal, yes. I have this fantasy that we could have an out-of-band hinting system for SQL, like "start the query plan with table source <alias>", or "use index <index-name> for table source <alias>", or "materialize table source <alias> and create these indices for it", or "index the CTEs by..." (the last one should really be automatically done by the optimizer, but when you can save it thinking time, that can be a good thing). In-band hinting has generally been problematic. Out-of-band hinting could either follow the query itself, or be provided in a SQL exec API as a separate argument, or as explicit method calls in query setup in LINKQ-like systems.

> You may disallow nulls stored, but anything like a non-inner join will produce nulls and then you're almost back to where you started.

Now that just points out that the problem arises naturally and isn't entirely SQL's fault. Null values are as much a problem for SQL as for C and many other languages.

The way modern programming languages deal with nulls is by insisting on full pattern matching on algebraic data types. So it follows that one might want to do the same in a new query language, and/or possibly as an extension to SQL. So that this would be OK because at the top-level, but not as a subquery:

  SELECT a.name, b.thing
  FROM a LEFT JOIN b USING (id)

and one would be forced to check for absence (like nullity, but with the requirement that one must check every time it's possible that a value is absent). In SQL we have `coalesce()`, `IS [NOT] NULL`, `IS [NOT] DISTINCT FROM NULL`, and so on, so we can do this pattern matching, but we're not required to, and that might be the real problem with SQL and NULL values.

> Good post though

Ditto.

zasdffaa · on Aug 28, 2022

This is getting even more interesting.

> I have this fantasy that we could have an out-of-band hinting system for SQL,

I think not breaking the optimiser, and out of band hinting appear somewhat orthogonal but I'd be very glad to be wrong. I'm looking at OOB hinting for a different reason, but what you're thinking is very close to mine, and I'm struck by the fact you see hinting the same way I do (actual, and extensive, logical vs physical separation, inc. indexes as hints).

> and one would be forced to check for [nulls]

Even more interesting. I will shortly have something of relevance. Do you have a contact e-mail?

cryptonector · on Aug 28, 2022

> I think not breaking the optimiser, and out of band hinting appear somewhat orthogonal but I'd be very glad to be wrong.

If you're writing a transpiler, you might as well make it an optimizer and use a solid hinting system to guide the target RDBMS' optimizer so as not to be at its mercy. If you were implementing from scratch rather than as a transpiler, you'd have to write an optimizer anyways.

Also, the reason optimizers can suck is precisely that there can be many ways to plan a query.

E.g., if you have 10 table sources INNER JOINed in sequence, you could start the query plan with any one of them, but not every one of those 10 tables will yield a good query plan when used as the starting table! You want to start such a query plan with the table that has a PK or index key that is most fully determined by the WHERE/ON constraints, by which I mean: the table source that will have the smallest cardinality given the constraints imposed on it by the query.

Sometimes the query that one has to optimize is a sub-query deeply buried in another, possibly in a VIEW, and the hinting needed may be specific to the context outside the view. Now inline hinting leads to having to inline sub-queries (repetition).

The many table sources INNER JOINed case is one I've actually run into.

Part of the problem seems to be that pushing constraints is a fairly advanced sort of query algebra: you need a good AST in order to be able to do it, and even then, it's not all you have to do to build a decent optimizer. If you look at the SQLite3 optimizer, it's gotten much better over the years at pushing constraints.

With every table source named, you can address hints about them, and you can do it out of band.

That or pick a target whose optimizer you can either work on or already is good enough for the transpiler's purposes. However, keeping things general is a good idea, especially when it comes to SQL (there's so much incompatible variation between SQLs!), so as to keep the transpiler generic and portable. The lack of a standard and solid hinting system is a real problem. Also, sometimes optimizers get worse.

> Do you have a contact e-mail?

  ${HN_username} @ gmail