Hacker News new | past | comments | ask | show | jobs | submit login
One Giant Leap for SQL: MySQL 8.0 Released (modern-sql.com)
136 points by dhd415 on April 25, 2018 | hide | past | favorite | 152 comments



PostgreSQL 9.5 manual: "Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table."

MySQL 8.0 manual: "Queries that skip locked rows return an inconsistent view of the data. SKIP LOCKED is therefore not suitable for general transactional work. However, it may be used to avoid lock contention when multiple sessions access the same queue-like table."

Flattering (I wrote those words). Of course we borrow the SKIP LOCKED syntax from yet another database (though the idea is much older, under different names).


The article says MySQL "the most popular free SQL database" but gives no reference for this claim. It seems to me that every system I know of has at least one install of SQLite somewhere (be it as something that was installed with the OS or in some commonly installed application). SQLite seems to agree with my assessment in https://sqlite.org/mostdeployed.html so I wonder where did the quoted statement above come from, and what's the basis for it?


Full quote.

> MySQL is very popular. According to db-engines.com, it’s the second most popular SQL database overall. More importantly: it is, by a huge margin, the most popular free SQL database.

The second sentence is linked: https://db-engines.com/en/ranking

The third sentence refers to the same source.


Thanks for the link, but I find this highly unlikely to be what the article or db-engines.com claims. The data comes from vague sources that don't jibe with number of installations. The word "popular" is used in such a perverse way it renders the db-engines.com ranking and that claim in the article impossible to believe.

That ranking site claims "The DB-Engines Ranking ranks database management systems according to their popularity." but then includes concepts such as "Number of mentions of the system on websites", "General interest in the system", and "Relevance in social networks" as part of the ranking (see https://db-engines.com/en/ranking_definition for details). In fact, the one thing one would typically consider to be "popularity" -- installed base size -- is not considered at all ("The DB-Engines Ranking does not measure the number of installations of the systems, or their use within IT systems."). In fact the paragraph from which I drew that quote highlights how ridiculous this site is: to understand that paragraph's use of the word "popular" you have to use the common definition, the definition they won't use in computing their so-called "popularity" ranking. This tells me that db-engine.com's ranking is not to be taken seriously.

As SQLite's page I pointed to claims, virtually every modern operating system and many popular programs come with at least one copy of SQLite. But then again, understanding the previous sentence requires using the word "popular" in a way that most people use that word most of the time, not the perverse way db-engines.com used the word. Put differently, Oracle, db-engines.com #1 pick, doesn't run on iOS, MacOS, GNU/Linux, and some variants of Windows. That means there are a considerable number of systems that host at least one copy of SQLite and will never host Oracle.

I find it very difficult to believe that there are more running or installed instances of Oracle on the world's computers than there are SQLite. The original article is plainly wrong. MySQL is nowhere nearly as "popular" as is claimed and the backing to believe otherwise hinges on a highly dubious source.


This looks like a great place to tell you about the burning white hot rage I have for Debian after my difficulties with MySQL this past weekend.

I was working on script that builds a LEMP stack with Vagrant on a Debian box. And when the script was upgraded to use Debian Stretch, everything we did to customize MySQL configuration broke, and I spent many hours of anguish not knowing why.

Until I stumbled onto the fact that Debian decided that in version 9, they'd start installing MariaDB when you typed "apt-get install mysql-server". I cannot fully explain the anger I have for this dark pattern: When I say "install x", you should either install x or say you don't have it, not install a competitor you think is close enough.


Debian MySQL maintainer here.

We wanted to maintain both MySQL and MariaDB in Debian stable. However the Debian release team forced the removal of MySQL on the basis that MariaDB only would suffice. We disagreed with this but they overruled us.

mysql-server in Debian stable became a transitional package in Debian stable (only; this doesn't affect testing or unstable) in order to make the upgrade work correctly. It does force users with MySQL server to crossgrade to MariaDB server, and does lead to "mysql-server" being a misnomer in Debian stable.

"default-mysql-server" being MariaDB is also misleading, IMHO, but it's the best we could come up with because there isn't any other collective term that covers both MySQL and MariaDB.

We didn't want this, but it was forced upon us by the Debian release team. Sorry for the trouble. As others have said, we did our best to explain this in the release notes.

We continue to maintain both MySQL and MariaDB in Debian unstable, which remains free of this mess.


I can see what the Debian release team was trying to do...but doesn't Debian also have a legal team? Did the release team consult with the legal team about having a package named "mysql-server" that uses a trademark owned by Oracle and installs software that is a competitor to the Oracle software the trademark refers to?

Oracle is litigious. This seems to be deep into "poke the bear" territory.


I totally understand the pain of being overruled by bad decisions, and another person here had also indicated a bit earlier that this was the case.

If I could still edit the comment, I would revise "burning white hot rage I have for Debian" to "burning white hot rage I have for the Debian release team". Alas I cannot. :(


> I would revise "burning white hot rage I have for Debian" to "burning white hot rage I have for the Debian release team".

Debian, as a project, appointed the release team and granted them the authority to make this decision. Therefore, IMHO, it's perfectly fair to attribute "Debian" to the decision. As long as you understand that it's not the Debian MySQL/MariaDB maintainers :)


When you upgrade to a new major release of Debian, you should take a look at the release notes. The mysql->mariadb migration has its own chapter: https://www.debian.org/releases/stable/amd64/release-notes/c...

I guess the fact that mariadb now owns the "mysql-server" package name is a side-effect of how apt manages upgrades and migrations. It's probably the best way to ensure existing users end up with a supported mysql-ish server after the upgrade, instead of being silently stranded on an obsolete debian8 package without receiving security updates.


Why does upgrading Debian need to replace my other software? If I have... Gimp installed, and I upgrade Windows from 7 to 10, I'd be understandably annoyed if Microsoft took the opportunity to replace Gimp with Adobe Photoshop. (And I'd be confused too, since neither Gimp nor Photoshop are Microsoft's. Yet here Debian has replaced software they don't own with other software they don't own, all without my permission.)

mysql-server doesn't belong to MariaDB, they can't "own" it. Doesn't Oracle hold a trademark on it?


Debian is an operating system that comes with its own curated library of software. To answer why it upgrades user software when you update the underlying operating system, you need to look at how that software is compiled in a Linux distribution.

Whereas Windows apps tend to standalone, bringing with them any libraries they need, Debian tends to have one copy of each library, runtime, and so on. If you upgrade Debian, you upgrade way more than just the kernel, the gnu tools etc. You also upgrade, for example, the Python, the windowing system, the networking stack. Apps that rely on these must also be upgraded in order to work with the new versions of whatever other packages they rely on.

It’s a domino effect and it keeps on knocking right up to GIMP or, in this case, MySQL.

My answer doesn’t explain why this particular choice was made but I hope it shows how Debian can do that sort of thing whereas Windows wouldn’t.


The way MySQL was being handled as a hybrid of open-source and closed-source for tests was causing a lot of trouble for organizations like Debian that wanted to ship truly open-source software. MariaDB was, at the time, a pretty close fork with the right license model so it was a good drop-in substitute.

These situations are rare, but it's not without precedent.


To be clear: I have no issue with Debian shipping MariaDB. I have an issue with Debian pretending MariaDB is MySQL.

If I run "apt-get install mysql-server" and get something other than MySQL, we now have a trust issue: The Debian repository cannot be trusted.


MySQL is not available in Debian repository since MariaDB has replaced it.

mysql-server package installs mariadb so that they can handle upgrades gracefully in a way that won't leave existing users in cold.

They would choose to do it the way you said but then the people who are upgrading their systems would be annoyed instead of you. Tough choice but either way it's a minor annoyance to be honest.

Addition:

Also it's not as if it isn't very clear what you're installing:

  $ apt install mysql-server
  ....
  The following NEW packages will be installed:
    default-mysql-server galera-3 libdbd-mysql-perl libhtml-template-perl libterm-readkey-perl mariadb-client-10.1
    mariadb-client-core-10.1 mariadb-common mariadb-server-10.1 mariadb-server-core-10.1 mysql-server
  ....

  $ man mysql
  MYSQL(1)                                                               MariaDB Database System                                                              MYSQL(1)

  NAME
         mysql - the MariaDB command-line tool

  ....

It is not aim of distribution maintainers to hide facts or make you use counterfeit programs, they just want to help users with transition.


> Also it's not as if it isn't very clear what you're installing

I certainly disagree that this qualifies as "very clear". The user asked to have mysql installed and the very first item on the list of stuff-to-be-installed is called "default-mysql-server". That certainly sounds like positive confirmation that you are indeed getting what you asked for.

Now, if you keep reading, you will see "mariadb-server-10.1" and stuff on that list, and if you also know two pieces of information (that mariadb is a fork of mysql, and that someone at Debian contemplated the idea of substituting one for the other), then you can see that you may have concluded the wrong thing and you can backtrack from there. Though really, at this point you still have conflicting information and it isn't conclusive.

So, I wouldn't say "very clear" is accurate. More like "possible to decipher if you have all the relevant context and are paying close attention".

As UIs go, this definitely violates the principle of least astonishment.


Okay, these are all fair points regarding package installation

(though I don't buy this was an enough excuse for GP in this specific case because it becomes clear at the moment you run `mysql' command)


It isn't correct to "upgrade" someone to a different piece of software, and it definitely wasn't "graceful". The whole source of my rage comes from the fact that everything was inexplicably broken and I did not know why.

If Debian doesn't have a piece of software, it shouldn't update it. It should leave it alone.

(And bear in mind, I was working from a script, so unless I sit there reading every line while it builds the VM, all of this unauthorized breakage of my machine is silent and without consulting me.)

And of course, the response to "man mysql" makes no sense, because MariaDB is not MySQL.


> It isn't correct to "upgrade" someone to a different piece of software, and it definitely wasn't "graceful". The whole source of my rage comes from the fact that everything was inexplicably broken and I did not know why.

> And of course, the response to "man mysql" makes no sense, because MariaDB is not MySQL.

MariaDB was supposed to be a drop-in replacement. This is why Debian (and other distros like Fedora) chose to name it and help transition of existing users. Distributions tried their best to make it as painless as possible. Sorry it wasn't the case for you but I guess it still helped more people than it hurt.

> (And bear in mind, I was working from a script, so unless I sit there reading every line while it builds the VM, all of this unauthorized breakage of my machine is silent and without consulting me.)

It was documented in Release Notes [0]. You should've read it before upgrading your script. It is always a good idea to read it regardless of this specific issue.

Even if you didn't read release notes, banner text of `mysql' command makes it very clear it's MariaDB:

  $ mysql
  Welcome to the MariaDB monitor.  Commands end with ; or \g.
  Your MariaDB connection id is 32
  Server version: 10.1.26-MariaDB-0+deb9u1 Debian 9.1

[0]: https://www.debian.org/releases/stable/amd64/release-notes/c...


> MariaDB was supposed to be a drop-in replacement.

"Supposed to" is a key point here, cause it's not: Explain works different if you need a certain thread info, AFAIK no JSON data type, GIS functions are different, lots of subtle differences that will bite you in the arse if you presume it's exactly the same engine. And as the time goes by the list of differences is just growing.


This is half correct. MySQL is not available in Debian STABLE.

It was allowed to stay in unstable:

https://packages.debian.org/sid/mysql-server-5.5

However, the maintainers are uninterested in maintaining it there, since the release and security team have overrode them and forcibly removed it from stable.


I know that but I didn't mention it because it would be an irrelevant detail in our context.


"White hot rage," "anguish," "trust issue"?

Transitional packages are a source of confusion that are never fun, but this still sounds usually harrowing for dealing with the bumps of upgrading servers to a new OS major version. Just how rushed and close to the wire was your deployment?


Coming from predominantly the Windows world, I never expected a Linux machine to silently go so far into being Microsoft.

Installing software that was not authorized on someone's machine is a serious security compromise.


Most people wouldn't blind update Windows versions (specially in a working environment), yet you updated your distro in a working environment without looking at release notes and are blaming the developers?

Just learn from your mistakes and carry on.


> Coming from predominantly the Windows world, I never expected a Linux machine to silently go so far into being Microsoft.

This is a common mistake: you are conflating Linux and Debian. You don't have a Linux operating system, you have a Debian operating system and services & applications package (a "distribution") that happens to use the Linux kernel and other common parts associated (though in many cases not exclusively) with the Linux ecosystem.

MS wouldn't touch MySQL in an update because they never provided it for you. They might break it with changes that the upgrade makes and it isn't their responsibility not to (if you kept MySQL updated then you'd be fine as the maintainers would be aware of breaking changes in the OS and would have released updates to handle them ahead of time - but the onus here is on you to monitor that situation). They wouldn't change the operation of the DBMS they provide with the OS because there is no DBMS provided with the OS for them to break. A fairer comparison is MS updating IE with Windows upgrades, potentially breaking maintained web sites/apps that depend on non-standard and/or undefined behaviours in older versions of IE. Windows releases do often break other software, as do SQL Server release upgrades, as any major update of that sort does which is why thorough testing before releasing to production or otherwise mission-critical environments is essential.

> Installing software that was not authorized on someone's machine is a serious security compromise.

The release notes quite clearly state that it is going to happen. By installing the release you effectively authorised the process. And "serious security compromise" is hyperbole here IMO. A functionality compromise, yes, so effectively a DoS, but only for software more manually installed that is therefore not Debian's responsibility to keep compatibility with at all, and by installing the update without understanding it when the documentation was there to read you DoSed yourself.

While we are throwing around hyperbolic statements of blame: Not understanding the nature of the release update by at least reading through the release notes, and not testing an update in a pre-production environment, are both serious due diligence failures.

If the upgrade had left a (now unmaintained) mySQL install in-place with warnings in the documentation someone who didn't read that documentation wouldn't have seen the warnings any more than they would see the notes about the switch from mysql to mariadb so would have been left with a potentially insecure environment down the line as flaws are found by remain unpatched as Debian don't support mysql packages in that release.

If they had simply dropped the package your system would have been completely broken without manual intervention to install mariadb in its stead leaving your environment (more) broken.

The only way to keep everyone happy would be to keep maintaining both sets of packages, which, lo and behold, for all intents & purposes they do trough the LTS program: current old-stable is effectively supported until April 2020 so if you must keep mysql for compatibility purposes with software not provided as part of the distribution then you can keep using that for a while yet giving you nearly two more years to find a solution (waiting for that software made compatible with mariadb, finding an alternative to it, using mysql from another source (compiled locally or through a third party package repo, or moving to a Linux-based OS that does maintain mysql packages).


Fedora made the same decision, so I don't know if you can blame Debian per-se. This was a community wide problem.


Not exactly. Fedora maintains MySQL as "community-mysql"[1], and MariaDB uses the "mariadb"[2] package.

[1]: https://apps.fedoraproject.org/packages/community-mysql [2]: https://apps.fedoraproject.org/packages/mariadb


`dnf install mysql` gives you MariaDB by default.


To be clear, those who were maintaining MySQL at the time this decision was made were perfectly happy maintaining both (I am one of those people, sort of, not very active the last couple years). The security team and release team is to blame, please direct your complaints to them.

https://tracker.debian.org/pkg/mysql-5.7


My experience has been that MariaDB is NOT a drop-in replacement for MySQL.


It was when they'd just forked, but over the years they have diverged.

You could argue this was inevitable, but it was a product of the conditions at the time.


But that time is not when Debian decided to install their software on my machine. They chose to patch over MySQL 5.5 with MariaDB 10.1 in 2017, and I'm going to judge them on that decision.


> But that time is not when Debian decided to install their software on my machine

No, you installed their software on your machine without reading the release notes.


Indeed. My issue with not being able to trust them to install what I tell them to install aside: If MariaDB had actually "just worked", I probably wouldn't have even noticed.


I agree. The MySQL -> MariaDB stuff is bonkers. I would like to read the discussion of it on the lists but can't be bothered to look for it.


It was a multi-year effort including us asking specifically for the policy to be spelled out at why there could be only one. IMO the Debian security team wrote a policy specifically to target Oracle in this specific instance, as a result. Digging it out is painful, as it all happened back in 2016, but you can start here:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=793316


I also hold the opinion that MariaDB's failure to simply rename key "mysql" files/dirs to "mariadb" was intentionally confusing and an effort to steal market share.

Percona has been a much better citizen with its fork.


I don't think this is entirely debian maintainers fault. Package names aren't functions, so the string doesn't necessarily map to the thing you want. Left-pad is the modern analog to what I mean here.

Mariadb probably put in for the package name, tbh I can't be bothered to go find out, but I'd bet anything debian team didn't just arbitrarily make that call for users and ship it. 100% guarantee that.

I know your feeling though, that's the sort of thing that should be printing a warning or something when it happens.


Upon later research, I am not sure of the value of your guarantee: https://www.debian.org/News/2017/20170617

In "Stretch", the default MySQL variant is now MariaDB. The replacement of packages for MySQL 5.5 or 5.6 by the MariaDB 10.1 variant will happen automatically upon upgrade.


So then it wasn't unannounced...

Seems to me this is a non issue then.

I guess not everybody reads those, and I've already conceded that its something that ought to warrant a warning prompt or something.

But they didn't just change it for no reason.


Warning prompt is right at the command line tool:

  $ mysql
  Welcome to the MariaDB monitor.  Commands end with ; or \g.
  Your MariaDB connection id is 32
  Server version: 10.1.26-MariaDB-0+deb9u1 Debian 9.1

  Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

  $ mysql --version
  mysql  Ver 15.1 Distrib 10.1.26-MariaDB, for debian-linux-gnu (i686) using readline 5.2

It is pretty clear to anybody ever run the tool that it is MariaDB.


FWIW, this is also the case on RHEL-based systems.


As far as I know, no one has stepped up to maintain MySQL for RHEL/CentOS. RHEL/CentOS 7 shipped with MariaDB. There's nothing stopping _anyone_ from volunteering to maintain "community-mysql" for RHEL/CentOS in EPEL.


Unbelievable! I must say. Is Ubuntu (which is based on Debian) affected by this issue?


[flagged]


[flagged]


> Does anyone want to explain why they think it's acceptable to install software on someone's computer without their affirmative and informed consent

This is not an accurate description of what happened. The information was in the release notes, which are several notches above EULAs for providing informed consent. Secondly, Debian doesn't install sotware, the user does. Debian helpfully provides the list of the packages and the versions before installing. There were no surprises here.


Anyone who wants to know why the real legal world doesn't take tech people seriously, just look at this thread. Y'all live in your own little bubble.


I think you would need to read the GPL to figure out that Debian can do this and you have no recourse (no warranty clause is pretty clear on this).


Explain to me why you think the GPL matters here. Last I checked, the GPL pertains to the program (i.e., apt-get). That is not where the problem is.

Additionally, contracts are not carte blanche to violate criminal law. To the extent that a contract provision would purport to allow violating a criminal law, it would be unenforceable.


[flagged]


> (That being said, I wouldn't mind if Oracle cease-and-desisted them for installing someone else's software using their trademark as the package name.)

You are filled with hatred and rage towards a project and you wish they'd be be sued to death by Oracle because of a minor package naming issue?


Maybe it's time to switch to another distro. In Fedora, trying to install MySQL will install MariaDB, but it is a redirect of sorts, the package is not named MySQL.


> In Fedora, trying to install MySQL will install MariaDB, but it is a redirect of sorts, the package is not named MySQL.

Same as in Debian. mysql-server is a transitional package which ends up installing mariadb-server package at the end.


I'm having a hard time to pick between mysql8 and postgresql10.

MySQL is under Oracle is one of the major concern, plus it's said to be good for less-write-more-read workload, which is fine.

PostgreSQL is more on the open source side, however bugs like this really concerns me: https://lwn.net/Articles/752063/ -- basically, PostgreSQL does not do its own DIO instead it depends on file system and it can lead to data corruption, and there is no quick fix...

tried to install mysql8.0 on ubuntu16.04 failed so far, need a vm to test it out sometime.


From MySQL's documentation:

https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.ht...

> Caution: Many operating systems and some disk hardware fool the flush-to-disk operation. They may tell mysqld that the flush has taken place, even though it has not. In this case, the durability of transactions is not guaranteed even with the setting 1, and in the worst case, a power outage can corrupt InnoDB data. Using a battery-backed disk cache in the SCSI disk controller or in the disk itself speeds up file flushes, and makes the operation safer. You can also try to disable the caching of disk writes in hardware caches.


Pardon my ignorance, but do you know if this would affect PG as well, since it's an OS/hardware sourced issue?


It's same issue grandparent mentioned.


great, thanks for the finding.


MySQL does not do its own IO either. I'm not sure what the complaint here is. Almost every database system is affected by that bug.


...doesn't this bug mean that affected database systems do not have working transactions?


No.

It means that IF and only if, your storage is dying, you might learn about it later than you wish. In cases your storage is dying you are likely also going to observe problems that are outside of an fsync() not reporting an error when you'd like it to. As long as your storage hardware isn't broken, fsync still guarantees durability. Obviously it'd be better to get the error out ASAP.


Yes


Unless you need partitioned tables or have significant operational experience with MySQL and not with PostgreSQL, you should pick PostgreSQL.


With O_DIRECT you won't hit that bug. MySQL/InnoDB allows you to do that (not the default). However I think you're not very likely to hit this bug. This is why you have backups and replication.


I agree, this is in some ways a storm in a teacup and unlikely to cause silent data loss (it requires transient failure, but in reality failure storage probably fails completely and you definitely know about that). But as a matter of principle, it's bad that an operating system is so callous with my data.

On the plus side, I think this is great PR for FreeBSD (any filesystem) and ZFS (any OS).

One thing I'm curious about... MySQL still calls fsync() when you use O_DIRECT. See the warnings in the documentation about why O_DIRECT_NO_FSYNC is dangerous (to avoid loss of eg XFS meta-data, otherwise it might forget that a file was truncated or extended). I wonder if in some vanishingly rare failure mode you could lose data that way even with O_DIRECT.


I did not investigate MySQL code (or other product) but I would be very surprised if PostgreSQL is the only one impacted by this issue.


Yes that seems unlikely. Almost all Unix software uses buffered IO, including several other databases or database-like things including package managers, DNS servers, mail servers, ..., and MySQL does too by default. It even uses it when you choose O_DIRECT (though I'm not sure if that's exposed to write-back error eating risk or not).

That said, as several people have said, this is a very unlikely failure mode. When filesystems die, you know about it. The type of transient failure required to lose data this way is quite unlikely.


by reading the lwn article it seems postgresql is hit especially hard per its design and "heavy" use of fsync, I have yet to find similar impact on mysql.


All SQL databases are "heavy" users of fsync. Though, this problem is not relevant on FreeBSD nor Illumos.

Error handling for a operating system is really complex stuff. I'm surprised that this isn't working properly, fsync is not about speed, but reliability. Userland applications expect it to work and behave as intended.


the key is error reporting


For what it's worth, I was in this same situation.

Without getting overly technical, MySQL's ease of use, wide supported drivers and toolsets, along with it's ubiquitous nature made me choose MySQL.

Also, almost everyone knows MySQL. Easy to find talent.

(Avoiding the technical reasons for brevity).


Not just that, but I have yet to come across a piece of software that is both free and comparable to MySQL Workbench. It makes designing the database ridiculously easy.


Absolutely - I spent way too long finding postgres tool, and even then, it was just okay.

Also, so many big players use proven MySQL setups (Uber, Github, etc), that I feel like the scalability is there if needed.


true, mysql workbench is great


I gave up on Mysql long time ago when I found out that it cannot refer to temporary tables multiple times in the same query. Not sure if it has been fixed now...

EDIT: Does not seem fixed https://dev.mysql.com/doc/refman/8.0/en/temporary-table-prob...


Oh. Apparently I don't know SQL nearly as well as I thought.


You could write an entire book on SQL and have the same thought.


The more you know SQL, the more you don't know SQL.


As your circle of knowledge increases, so does your circumference of ignorance.


I used to have the entire reference set for DB2, and it didn't accurately describe the actual behavior of the RDBMS.

Several other documented configurations for DB2 components didnt work and after months of work with IBM support were later deemed unsupported. This was circa 2002-2004 while I was an intern. Havent used DB2 since, so dont know if the situation has changed.


SQL seems to be evolving by tacking on features willy-nilly, with no clear holistic philosophy, and worse, the standards which emerged aren't followed by all major implementations and even standards which are followed differ starkly in terms of performance across implementations. The product is that we all write implementation-specific SQL, which largely defeats the purpose of a standard language altogether.

I think if someone came along and fixed these major issues and gave it a modern syntax without significantly degrading performance, it would be sufficiently different that the XKCD "competing standards" meme would not apply.

I would love to hear from someone with database specialization.


The adoption of standard SQL is really poor on many SQL products. I mentioned that in the article:

> I believe the main reasons are a lack of knowledge and interest among developers along with poor support for modern SQL in database products.

However, it IS getting better at the moment. Have you seen the "New SQL" systems that implement window functions? MySQL is finally adding those features and MariaDB picks other features from the standard.

There is movement.

I'll just keep on making my comparison matrices to make the differences visible. This will (hopefully) increase developers interest, which is basically a market demand. Vendors listen to market demand. evil grin


Browsing the featured added by SQL standards on Wikipedia:

SQL:2003 added XML functions

SQL:2006 more XML!

SQL:2008 XQuery!

SQL:2016 JSON!

My guess it's not so much "willy-nilly" but more a sign of "what was popular."


A reasonable critique, in part, but you are missing some important stuff:

SQL:2003 added Window Functions, standardized sequences, and columns with auto-generated values

SQL:2006 okay, this one sounds like it was mostly about XML, but a lot of the functionality that went into dealing with structured content is what made the JSON support workable.

SQL:2008 this gave us the MERGE version of upsert, instead of triggers (which apply to views, and are a powerful way to deal with backwards compatibility)

SQL:2016 LISTAGG (super powerful), Row Pattern Recognition (a lot more intuitive that comparable uses of WINDOW functions), standardized date/time parsing, and polymorphic table functions (which you can kinda see in, for example, BigQuery's legacy table wildcard functions)


This is pretty much what I mean by "willy-nilly". Little concern seems to be paid to how well it integrates into some holistic information access theory or paradigm. I would expect some adherence to relational algebra, and if that isn't sufficiently powerful, I would like a transition to some more powerful theory rather than welding on the new hotness. I'm not saying I could do it better, mind you.


Calling it now: in 2020 they will add YAML support.


WASM for SQL


Definitely blockchain.


Hopefully one day GIS functions and types get standardized.


Do not be mislead, MySQL is the PHP of the database world.

https://slideshare.net/anandology/ten-reasons-to-prefer-post...


The reason I think a single database release has a dramatic effect on the entire SQL ecosystem is simple: MySQL is the weakest link in the chain.

That's a heckuva deep cut.

I wonder why/how MySQL got so popular, when in my head the go-to is Postgres (perhaps because I worked on Django/Rails stuff early on). Is it because of Oracle and their marketing/sales work?


> I wonder why/how MySQL got so popular, when in my head the go-to is Postgres (perhaps because I worked on Django/Rails stuff early on). Is it because of Oracle and their marketing/sales work?

MySQL's popularity long predates the Oracle acquisition. MySQL got popular because it was usually available (alongside PHP) on shared hosting and had very fast read performances (using the default MyISAM storage, at the cost of write performances or actually caring about your data). What with the web being front-loaded on read, that made it a pretty good fit.

Meanwhile historically postgres was byzantine to set up and slow. It cared much more for your data and had plenty of features (custom types! procedural languages! transactional DDL!), but it didn't really work OOTB (still doesn't, IIRC the base configuration still assumes rotating rust and 128MB RAM or some shit) and wasn't available on most hosting platforms, especially not the really cheap shared hosts.

Also while Django defaulted to postgres, I'm pretty sure Rails used to kinda-sorta default to mysql (I'm talking 10 years back, many plugins/extensions didn't work properly on pg).


Makes sense, thanks for the thorough answer! The popularity of the LAMP stack predates me a bit, so I didn't know as much about the history.


MySQL makes some of the same assumptions about memory. You have to tune it if you want optimal performance.

https://www.percona.com/blog/2016/05/03/best-practices-for-c...


No, it's been that way long before Oracle took over.

It's because the two projects had different starting goals, and thus attracted different audiences. MySQL was designed to be small and fast, while sacrificing consistency and integrity. Postgres was intended as an enterprise class DB from day one, with the expected robustness, integrity, and feature set. It led MySQL on features for many years, but lagged on speed.

During the .COM boom of the 90's, the flurry of startups cared more about speed and ease-of-use, and were willing to sacrifice consistency. Thus it became the default choice for most open-source projects and web development. Because of this, a whole ecosystem of support and tooling developed around MySQL, which reinforced the industry trend.

As to performance, Postgres has long since bridged that gap. As to ease of setup/config - Postgres was never difficult in the first place. The perceived complexity was due to its more robust feature set and security model. There was certainly more to learn, but honestly, it was never so terribly difficult as some people then and now seem to view it.


Web startups typically didn't care that much if they lost say 1 out of every 10,000 transactions if that sacrifice kept them cheap and nimble. Since they usually didn't charge customers, losing 1 out of 10,000 is not that big of a deal. Of course, if a bank did that they'd be sued up the wazoo, and that's why transactional integrity and "big iron" database features matter to banks.


MySQL became popular because it was quick and dirty.

It provided fast relational SQL database for the early web application stack without any concern of maintaining ACID (Atomicity, Consistency, Isolation, Durability) that "real" databases like Postgres providied. It had slight speed advantage and that was enough to turn the scale against Postgres.

MySQL was demonstration of Worse Is Better in the database world. https://www.dreamsongs.com/WorseIsBetter.html


> MySQL became popular because it was quick and dirty.

Yep, over and over again, "worse is better" proves itself as the all-important software mantra. Get that flimsy "database" out the door and loop back around to fix the "eats data" bugs 5+ years later.

The key to adoption seems to be to pinpoint something that people really want ("super fast 'SQL database' that I can run virtually for free" c. 1999), rush out a one-fifth-working version of the headliner features and pretend like it's all well and good, and then work to spread that solution as widely as possible.

Once you get the momentum, it's impossible to stop, and someone like Sun swings by to ask if you'd mind taking a billion dollars off their hands.

I don't think there's really a credible argument that technical merit has anything to do with success or adoption anymore (beyond just "superficially appears to work"). MySQL is a good specimen, but JavaScript is the real poster boy for the indisputable triumph of "worse is better".


> but JavaScript is the real poster boy for the indisputable triumph of "worse is better".

Not sure it beats PHP in being worse. But at least both got substantially better with recent versions.


MySQL became popular because it was quick and dirty.

The MS-DOS of databases


If anything, Oracle's "marketing" is destroying MySQL's popularity. Oracle is where software goes to die in my opinion. Their suite of software is the biggest rats nest of convoluted crap I've ever dealt with. So glad I'm off that team in my organization.


My memory might be a bit fuzzy (I was only in my teens when LAMP stack was popular), but I remember a lot of hosts didn't have postgres support built into their PHP distribution at first. If you were using shared hosting, it was a given you'd have MySql support, but maybe not Postgres.


MySQL was actually really easy to install when Posgresql wasn't. MySQL was also much faster when it supported less SQL features. Neither of those things are really true anymore and Posgres has been my go to DB for a while now.


Yup. The early MySQL developers spent a lot of time and effort making the install and setup process as fast and smooth as possible; a lot of people stuck with what they could get working first.


Back in the day (i.e. long before Oracle purchased MySQL), if you wanted an open source database, you had two choices: Postgres, which was reliable, and MySQL, which was fast. It turned out there were a lot more people who cared about being fast than there were who cared about reliability.

Eventually Postgres got fast enough and MySQL got reliable enough, which has made the distinction between them somewhat fuzzier today. But by then MySQL had become the standard database offered by umpty zillion web hosts, and ubiquity is a hard position to dislodge someone from.


>MySQL got reliable enough

MySQL still doesn't support nested DDL's, and this has caused me extreme amounts of pain in old projects where consistency was expected and we did not receive it on crucial payment processing.


>Postgres, which was reliable

My experience recently and historically has been the exact opposite. Postgres seems to blow up in more mysterious and hair-yankingly frustrating ways than MySQL does. "Why Uber Engineering moved from Postgres to MySQL" echos a lot of the issues I've had with Postgres in large production environments.


Uber’s use of PostgreSQL was a mess. It’s probably the worst example of why not to use PostgreSQL.


https://lwn.net/Articles/752063/ PostgreSQL's fsync() surprise

this is very scary, if not fatally scary.


I think it is scary about Linux. They are still wrestling with reliable error reporting for buffered IO after decades, and while the changes in 4.13 are an improvement in some ways they make some highly questionable assumptions that don't seem to be based on POSIX. I'm not following it in detail but fortunately it sounds like something might be done to improve the situation on the Linux side?


I can't read it because it's paywalled, but how is the operating system's fsync behavior "fatally scary" with regards to PostgreSQL? How does MySQL handle durability in these cases? What's your point?



I think that the main reason for a long time was (a) being able to install mysql quickly, and (b) having a decent SQL UI, which was MySqlWorkbench.

Things might have changed now for Postgres.


The thing is, features like Postgres' native UUID column types, JSONB and HStore are killer features that give developers data flexibility they didn't use to have before, meaning with one database I can go between relational or MongoDB-style JSON dumping where needed.

This, plus strong consistency to prevent shooting yourself in the foot, is valuable for 99% of use cases where you don't need insane web scale.


My 2 cents from the trenches back in 2004 and the LAMP stacks boomed; replication. It’s really really super simple. And the binlog format is open! The commercial and open source options at the time, were painful to scale. Go ask an amazon engineer about their old 3 oracle mirrors and cdn-like Berkeley db distribution over a beer or two.


To add to other replies. For a long time, Postgres did not run on Windows which was and is the dominant OS. This was the days before affordable VPS were available for you to spin up a server. You developed your site on your laptop and FTP's it to hosting provider.

Edit: Added more context


For a heck of a long time, well before Oracle owning it, Mysql was the defacto standard for building web apps. Remember the LAMP days? I'd wager it wasn't until early-ish in this decade that Postgres became the goto for web apps etc.


It's probably all because of PHP. Over 90% of PHP apps were built on Mysql (Wordpress, Drupal, Joomla). And you know, WP alone powers like half the websites on the internet. Sad reality.


This statistic is mentioned a lot, but it would be interesting to see a breakdown by amount of traffic on those sites. In these metrics thanks to the big web companies, the reach of Wordpress is much lower and other application frameworks are more predominant.


My recollection is that MySQL got popular because it was the cool and hip and free replacement for mSQL. Certainly I recall migrating from the latter to the former in 97 or so. Postgres at the time had only just started supporting SQL and was much harder to install, administer or understand than either (for me at least!)


MySQL was popular before Oracle aquired them


It was prior to the Oracle purchase, I believe.

MySQL was the M in the LAMP stack, and was installed on many linux distros by default.


cause PHP


Celso's quote applies equally well to SQLite, which is SQL compliant until it's not. Glad to see MySQL improvements, but wish SQLite was also getting similar upgrades. It was a bit depressing to see the solid column of Xs by SQLite for all these great new features the author was relishing.


> It was a bit depressing to see the solid column of Xs by SQLite

Except WITH [RECURSIVE] which is in SQLite since early 2014. Damn, I forgot to use that trivia in my MySQL bashing... ;)

Other than that, I met Richard Hipp in person some years back. He is a VERY pragmatic guy. I wrote an article about it:

https://use-the-index-luke.com/blog/2014-05/what-i-learned-a...


I'm happy to see that MySQL is progressing, but I think that enumerating its adherence to the latest SQL standards overlooks some of its other serious flaws from the perspective of operations and reliability. Below are two of the worst operational situations I've encountered with MySQL. I've run several other databases in high-volume production environments and never encountered those kinds of problems. That MySQL's developers would have made these choices that result in those kinds of problems is truly mind-boggling to me.

* If you have pre-5.6.4 date/time columns in your table and perform any kind of ALTER TABLE statement on that table (even one that doesn't involve the date/time columns), it will TAKE A TABLE LOCK AND REWRITE YOUR ENTIRE TABLE to upgrade to the new date/time types. In other words, your carefully-crafted online DDL statement will become fully offline and blocking for the entirety of the operation. To add insult to injury, the full table upgrade was UNAVOIDABLE until 5.6.24 when an option (still defaulted to off!) was added to decline the automatic upgrade of date/time columns. If you couldn't upgrade to 5.6.24, you had two choices with any table with pre-5.6.4 types: make no DDL changes of any kind to it or accept downtime while the full table rewrite was performed. To be as fair as possible, this is documented in the MySQL docs, but it is mind-blowing to me that any database team would release this behavior into production. In other words, in what world is the upgrade of date/time types to add a bit more fractional precision so important that all online DDL operations on that table will be silently, automatically, and unavoidably converted to offline operations in order to perform the upgrade? To me, this is indicative of the same mindset that released MySQL for so many years with the unsafe and silent downgrading of data as the default mode of operation.

* Dropping a table takes a global lock that prevents the execution of ANY QUERY until the underlying files for the table are removed from the filesystem. Under many circumstances, this would go unnoticed, but I experienced a 7-minute production outage when I dropped a 700GB table that was no longer used. Apparently, this delay is due to the time it takes the underlying Linux filesystems to delete the large table file. This was an RDS instance, so I had no visibility into the filesystem used and it was probably exacerbated by the EBS backing for the RDS instance, but still, what database takes out a GLOBAL LOCK TO DROP A TABLE? After the incident, I googled for and found this description of the problem (https://www.percona.com/blog/2009/06/16/slow-drop-table/) which isn't well-documented. It's almost as if you have to anticipate every possible way in which MySQL could screw you and then google for it if you want to avoid production downtime.


Anyone know when this will make it to AUR on Arch? I've been waiting for CTEs for a long time!


I have one in my backpack I am parking now.


Are the feature comparison tables in this article supposed to reflect the state of SQL databases as of several years ago?

I've been using Postgres very consistently for the past half decade (as a dev, not a DBA) and it's had a `GROUP BY` and `WHERE` clause for at least as long.

I'm not totally sure what "base table `UNIQUE`" is supposed to mean in this context, but if it's just the bare `UNIQUE` constraint, Postgres has also had that for a long, long while.


The table is about dependency checking for those constructs.

E.g. SELECT countries.code, countries.name, count(*) FROM countries JOIN countries_languages as co_lang on co_lang.country_code = countries.code GROUP BY countries.code

This isn't supported by Postgres, because countries.name is not in the group by clause, on the other hand it does make sense, because code will be unique in the countries table, so there will always be only one name for a given code.

(see https://mysqlserverteam.com/mysql-5-7-only_full_group_by-imp... which is also linked from the original article)


And as specifically noted in that linked article you reference, it talks about how functional dependencies are a feature of SQL-99.

This aspect of the original article is really talking about how MySQL decided to implement a more up-to-date version of the spec that many competitors when they finally addresses their deficiency (as well they should, finally being SQL-92 compliant in the 2010's wouldn't really be anything to boast about).


Hi!

The first feature matrix you see is about checking of functional dependencies.

For example, PostgreSQL doesn't see the functional dependency between the columns ID and MA in the result of this query.

    SELECT id
         , max(a) ma
      FROM ...
     GROUP BY id
As a consequence, this query doesn't work, because the outer query refers to MA which is not in the GROUP BY clause of the outer query:

    SELECT COUNT(*) cnt
         , ma
      FROM (SELECT id
                 , max(a) ma
              FROM ...
             GROUP BY id) x
      GROUP BY id
Detecting these things makes life easier.


That's a fairly obtuse example. Is there a standard you're drawing from, for this? Because this example works fine:

    postgres=# create table func_test ( id int primary key, data text );
    CREATE TABLE
    postgres=# insert into func_test values ( 1, 'josh' ), ( 2, 'markus' );
    INSERT 0 2
    postgres=# select id, data from func_test group by id;
     id |  data  
    ----+--------
      2 | markus 
      1 | josh


In SQL:2016, it is

> 4.24.13 Known functional dependencies in the result of a <group by clause>

Not that this is about the result of a <group by clause>

edit: The matrix I have in the article is not complete. I've just picked a few examples from the standard and checked them to get an overview. I guess the fact that PostgreSQL supports some of them but MySQL more of them is well represented in the matrix.

The full text is below. I'm happy to correct, when I'm interpreting this wrong (which is perfectly possible).

Let T1 be the table that is the operand of the <group by clause>, and let R be the result of the <group by clause>. Let G be the set of columns specified by the <grouping column reference list> of the <group by clause>, after applying all syntactic transformations to eliminate ROLLUP, CUBE, and GROUPING SETS. The columns of R are the columns of G, with an additional column CI, whose value in any particular row of R somehow denotes the subset of rows of T1 that is associated with the combined value of the columns of G in that row. If every element of G is a column reference to a known not null column, then G is a BUC-set of R. If G is a subset of a BPK-set of columns of T1, then G is a BPK-set of R. G ↦ CI is a known functional dependency in R.


Hmmm. PostgreSQL treats output rowsets as their own thing, so this works differently in a subquery than in a top-level query. Not that that's desireable, it's just a consequence of an implementation that gives a lot more flexibility in other areas (table functions, for example, where Postgres goes well beyond the standard). So looking at the standard I agree that having your example work is superior to not having it work. I'd appreciate a footnote for that particular item for Postgres, though, since the most common forms of GROUP BY dependencies do work.


That example code seems overly complicated and redundant. Should the code be simpler like this:

  SELECT id
        , max(a) ma
         , count(*) Cnt
   FROM ...
  GROUP BY id


The purpose of the example is to demonstrate functional dependencies based on grouped tables.


I’m constantly flabagastered by what’s considered novel in the world of MySQL.

But then I remember this is the database that lied (lies?) about ACID transactions, and had no problems storing non-existent dates in a DateTime-column because its creators considered ensuring data-integrity and consistency “the job of the application layer” (which back then often was PHP).

Nothing should really surprise me.


> I’m constantly flabagastered by what’s considered novel in the world of MySQL.

What's that in reference to? The article explains that MySQL implemented group by using a newer standard than many competitors did, by looking at functional dependencies. Not exactly novel, but good for MySQL, they decided to go beyond the competition when finally catching up on this particular feature.

There are, of course, many other problems with MySQL (as you allude to), such as silently ignoring CHECK constraints, but that doesn't seem to make sense as what you were referring to in context of what they consider novel.


> this is the database that lied (lies?) about ACID transactions

Time moves on. The Innodb storage engine has had ACID transactions for over a decade.

> no problems storing non-existent dates in a DateTime-column

PostgreSQL checks date formats, but updating (or not updating) timezone databases would affect validity. So it's not as simple an issue as you make it to be.


You aren't reading that table correctly. It's about functional dependency checking, not about whether you can use a WHERE clause.


Yeah, later on in the article it has an "X" for PostgreSQL for "JSON_OBJECTAGG" and "JSON_ARRAYAGG" and I'm 100% sure postgres has the former (I used it like 10 minutes ago), and like 99% sure it has the latter.


The table is very strict, it requires that the actual standard be supported (possibly incompletely). Postgres has ways to aggregate into a json object or array, but they're non-standard. If you write standard SQL making use of those features, it will not work.


I noticed that too and PostgreSQL does have them. They're just named slightly differently: `json_object_agg()` and `json_agg()`.


https://www.postgresql.org/search/?q=json_objectagg

But there is a patch pending for this. Not sure if this gonna be included in 11.



Those gory details I've mentioned in the article.

This is how standard SQL JSON_OBJECTAGG works: https://modern-sql.com/blog/2017-06/whats-new-in-sql-2016#js...

It can actually do much more:

<JSON object aggregate constructor> ::= JSON_OBJECTAGG <left paren> <JSON name and value> [ <JSON constructor null clause> ] [ <JSON key uniqueness constraint> ] [ <JSON output clause> ] <right paren>

This free technical report of ISO has some examples: http://standards.iso.org/ittf/PubliclyAvailableStandards/c06...


I haven't done any exact test, but it looks identical to how the json_object_agg function in postgres works, but with a different syntax.

Is it really that different to not even warrant a note in the table?

Clearly you know better than I do, but the big red X surprised me as the fantastic JSON support is one of the reasons why I tend to reach for Postgres so often over other DB engines.


A big red X doesn't mean the database doesn't have other, similar but proprietary functionality. It just means: it doesn't support this function in a standard confirming way yet.

There is nothing against using non-standard functionality in that case. In this particular case. PostgreSQL got JSON support (~2012?) long before it was added to the standard (2016). Obviously, their JSON functions differ from the standard (as they didn't lobby their functions into the standard).

However, slowly but surely database will add the standard functionality, which makes life easier.

Having that said, the standard JSON_OBJECTAGG is more powerful as it gives you control over null handling (NULL ON NULL -vs- ABSENT ON NULL) and allows checking for duplicate keys ([(WITH|WITHOUT) UNIQUE KEYS]).


Maybe add some little footnotes to the X? It took me a few minutes of thinking "Of course postgres has json object aggregation here it's working perfectly and it's documented" before I realised that sure it existed but it's not the standard version, at all, which is probably why it's marked as a no, because it's not Troel's page and Markus really really cares about the standard.


Given the name difference it's possible that the semantics are slightly different.


That part of the article doesn't make much sense. It sounds like MySQL finally enabled rational GROUP BYs instead of randomly choosing a row when the grouped rows aren't distinct. This is something at least PostgreSQL has done for ages.


Nope, the old mysql was extremely stupid, but the postgres behaviour wasn't that smart because it refused anything not in the group by clause (even when constraints on the table ensure that everything in your select clause can in fact be determined from the columns used to group).




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: