Hacker News new | past | comments | ask | show | jobs | submit login

Are the feature comparison tables in this article supposed to reflect the state of SQL databases as of several years ago?

I've been using Postgres very consistently for the past half decade (as a dev, not a DBA) and it's had a `GROUP BY` and `WHERE` clause for at least as long.

I'm not totally sure what "base table `UNIQUE`" is supposed to mean in this context, but if it's just the bare `UNIQUE` constraint, Postgres has also had that for a long, long while.




The table is about dependency checking for those constructs.

E.g. SELECT countries.code, countries.name, count(*) FROM countries JOIN countries_languages as co_lang on co_lang.country_code = countries.code GROUP BY countries.code

This isn't supported by Postgres, because countries.name is not in the group by clause, on the other hand it does make sense, because code will be unique in the countries table, so there will always be only one name for a given code.

(see https://mysqlserverteam.com/mysql-5-7-only_full_group_by-imp... which is also linked from the original article)


And as specifically noted in that linked article you reference, it talks about how functional dependencies are a feature of SQL-99.

This aspect of the original article is really talking about how MySQL decided to implement a more up-to-date version of the spec that many competitors when they finally addresses their deficiency (as well they should, finally being SQL-92 compliant in the 2010's wouldn't really be anything to boast about).


Hi!

The first feature matrix you see is about checking of functional dependencies.

For example, PostgreSQL doesn't see the functional dependency between the columns ID and MA in the result of this query.

    SELECT id
         , max(a) ma
      FROM ...
     GROUP BY id
As a consequence, this query doesn't work, because the outer query refers to MA which is not in the GROUP BY clause of the outer query:

    SELECT COUNT(*) cnt
         , ma
      FROM (SELECT id
                 , max(a) ma
              FROM ...
             GROUP BY id) x
      GROUP BY id
Detecting these things makes life easier.


That's a fairly obtuse example. Is there a standard you're drawing from, for this? Because this example works fine:

    postgres=# create table func_test ( id int primary key, data text );
    CREATE TABLE
    postgres=# insert into func_test values ( 1, 'josh' ), ( 2, 'markus' );
    INSERT 0 2
    postgres=# select id, data from func_test group by id;
     id |  data  
    ----+--------
      2 | markus 
      1 | josh


In SQL:2016, it is

> 4.24.13 Known functional dependencies in the result of a <group by clause>

Not that this is about the result of a <group by clause>

edit: The matrix I have in the article is not complete. I've just picked a few examples from the standard and checked them to get an overview. I guess the fact that PostgreSQL supports some of them but MySQL more of them is well represented in the matrix.

The full text is below. I'm happy to correct, when I'm interpreting this wrong (which is perfectly possible).

Let T1 be the table that is the operand of the <group by clause>, and let R be the result of the <group by clause>. Let G be the set of columns specified by the <grouping column reference list> of the <group by clause>, after applying all syntactic transformations to eliminate ROLLUP, CUBE, and GROUPING SETS. The columns of R are the columns of G, with an additional column CI, whose value in any particular row of R somehow denotes the subset of rows of T1 that is associated with the combined value of the columns of G in that row. If every element of G is a column reference to a known not null column, then G is a BUC-set of R. If G is a subset of a BPK-set of columns of T1, then G is a BPK-set of R. G ↦ CI is a known functional dependency in R.


Hmmm. PostgreSQL treats output rowsets as their own thing, so this works differently in a subquery than in a top-level query. Not that that's desireable, it's just a consequence of an implementation that gives a lot more flexibility in other areas (table functions, for example, where Postgres goes well beyond the standard). So looking at the standard I agree that having your example work is superior to not having it work. I'd appreciate a footnote for that particular item for Postgres, though, since the most common forms of GROUP BY dependencies do work.


That example code seems overly complicated and redundant. Should the code be simpler like this:

  SELECT id
        , max(a) ma
         , count(*) Cnt
   FROM ...
  GROUP BY id


The purpose of the example is to demonstrate functional dependencies based on grouped tables.


I’m constantly flabagastered by what’s considered novel in the world of MySQL.

But then I remember this is the database that lied (lies?) about ACID transactions, and had no problems storing non-existent dates in a DateTime-column because its creators considered ensuring data-integrity and consistency “the job of the application layer” (which back then often was PHP).

Nothing should really surprise me.


> I’m constantly flabagastered by what’s considered novel in the world of MySQL.

What's that in reference to? The article explains that MySQL implemented group by using a newer standard than many competitors did, by looking at functional dependencies. Not exactly novel, but good for MySQL, they decided to go beyond the competition when finally catching up on this particular feature.

There are, of course, many other problems with MySQL (as you allude to), such as silently ignoring CHECK constraints, but that doesn't seem to make sense as what you were referring to in context of what they consider novel.


> this is the database that lied (lies?) about ACID transactions

Time moves on. The Innodb storage engine has had ACID transactions for over a decade.

> no problems storing non-existent dates in a DateTime-column

PostgreSQL checks date formats, but updating (or not updating) timezone databases would affect validity. So it's not as simple an issue as you make it to be.


You aren't reading that table correctly. It's about functional dependency checking, not about whether you can use a WHERE clause.


Yeah, later on in the article it has an "X" for PostgreSQL for "JSON_OBJECTAGG" and "JSON_ARRAYAGG" and I'm 100% sure postgres has the former (I used it like 10 minutes ago), and like 99% sure it has the latter.


The table is very strict, it requires that the actual standard be supported (possibly incompletely). Postgres has ways to aggregate into a json object or array, but they're non-standard. If you write standard SQL making use of those features, it will not work.


I noticed that too and PostgreSQL does have them. They're just named slightly differently: `json_object_agg()` and `json_agg()`.


https://www.postgresql.org/search/?q=json_objectagg

But there is a patch pending for this. Not sure if this gonna be included in 11.



Those gory details I've mentioned in the article.

This is how standard SQL JSON_OBJECTAGG works: https://modern-sql.com/blog/2017-06/whats-new-in-sql-2016#js...

It can actually do much more:

<JSON object aggregate constructor> ::= JSON_OBJECTAGG <left paren> <JSON name and value> [ <JSON constructor null clause> ] [ <JSON key uniqueness constraint> ] [ <JSON output clause> ] <right paren>

This free technical report of ISO has some examples: http://standards.iso.org/ittf/PubliclyAvailableStandards/c06...


I haven't done any exact test, but it looks identical to how the json_object_agg function in postgres works, but with a different syntax.

Is it really that different to not even warrant a note in the table?

Clearly you know better than I do, but the big red X surprised me as the fantastic JSON support is one of the reasons why I tend to reach for Postgres so often over other DB engines.


A big red X doesn't mean the database doesn't have other, similar but proprietary functionality. It just means: it doesn't support this function in a standard confirming way yet.

There is nothing against using non-standard functionality in that case. In this particular case. PostgreSQL got JSON support (~2012?) long before it was added to the standard (2016). Obviously, their JSON functions differ from the standard (as they didn't lobby their functions into the standard).

However, slowly but surely database will add the standard functionality, which makes life easier.

Having that said, the standard JSON_OBJECTAGG is more powerful as it gives you control over null handling (NULL ON NULL -vs- ABSENT ON NULL) and allows checking for duplicate keys ([(WITH|WITHOUT) UNIQUE KEYS]).


Maybe add some little footnotes to the X? It took me a few minutes of thinking "Of course postgres has json object aggregation here it's working perfectly and it's documented" before I realised that sure it existed but it's not the standard version, at all, which is probably why it's marked as a no, because it's not Troel's page and Markus really really cares about the standard.


Given the name difference it's possible that the semantics are slightly different.


That part of the article doesn't make much sense. It sounds like MySQL finally enabled rational GROUP BYs instead of randomly choosing a row when the grouped rows aren't distinct. This is something at least PostgreSQL has done for ages.


Nope, the old mysql was extremely stupid, but the postgres behaviour wasn't that smart because it refused anything not in the group by clause (even when constraints on the table ensure that everything in your select clause can in fact be determined from the columns used to group).




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: