Incredible, commendable work! It should be funded properly and praised by all the DB communities that benefit from it.
I am indeed curious how it would fare against Oracle or MSSQL, but maybe the commercial guys should fund the work first; otherwise they'll just get a thank-you-for-the-job for work that could potentially save them millions...
We will also consider testing DBMS with a DeWitt Clause in the future. For now, we have indirect evidence that our approaches could find bugs in Oracle. Specifically, we found a case in MySQL, which we consider a bug, but which was closed by the bug verifiers, partly because Oracle computes the same - we believe incorrect - result. Our bug report can be found here: https://bugs.mysql.com/bug.php?id=99182.
We believe that it is a bug, because:
* the invariants that we are testing using TLP, which is the approach that we used to find the bug, should hold for all DBMS;
* dropping the UNIQUE constraint would yield the result we would expect, and we believe that an index should never affect a query's result (see the sketch after this list);
* an earlier version of MySQL computed the result we would expect;
* and DBMS that strive to support the MySQL SQL dialect compute the result we would expect (e.g., TiDB).
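To illustrate the second point, here is a minimal, hypothetical sketch of the kind of check we mean (the table, index, and query are made up for this example and are not the ones from the bug report), using Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INTEGER)")
conn.execute("CREATE UNIQUE INDEX i0 ON t0(c0)")
conn.executemany("INSERT INTO t0 (c0) VALUES (?)", [(1,), (2,), (None,)])

query = "SELECT c0 FROM t0 WHERE c0 IS NOT 1"  # stand-in for the reported query

with_index = conn.execute(query).fetchall()
conn.execute("DROP INDEX i0")
without_index = conn.execute(query).fetchall()

# An index may change the query plan, but it should never change the result set.
assert sorted(with_index, key=repr) == sorted(without_index, key=repr)
```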
It really is astounding. Given the amount of testing SQLite receives, one cannot help but wonder whether writing a database is beyond the reach of our current development tools.
Thank you! SQLite was indeed a challenging target. We could find and report many bugs in it only because all of our bugs were addressed very quickly (typically within a day). For some of the other DBMS that we tested, bugs were fixed slowly, and for some we also saw many unresolved bugs in their bug tracker, which is why we did/could not comprehensively test these DBMS.
Before we proposed our testing approaches, the state of the art of finding logic bugs was writing unit tests, which does not scale well, since it requires much manual effort. We believe that we provide the first practical approaches to automatically finding logic bugs. Companies have already started adopting them. For example, PingCAP implemented all three approaches that we proposed in https://github.com/chaos-mesh/go-sqlancer to test their DBMS TiDB.
A number of tools are available to test other aspects, such as performance, or to find crash bugs. SQLsmith, for example, is an effective tool for finding crash bugs (see https://github.com/anse1/sqlsmith).
Clearly writing databases is not beyond the reach of current development tools, since we have a bunch of them and they have been doing pretty well at serving production workloads for literally decades.
Writing 100% bug-free databases is probably not possible with current development tools, but it might also not be required. Every extra "nine" of reliability costs exponentially more, whether you're looking at uptime, durability or any other metric. At some point the extra costs just don't measure up to the extra correctness gained.
This is incredibly cool, finding almost 200 bugs in SQLite is a really great achievement. I'd certainly be interested in how often these bugs come up in real applications.
Thanks! Here is one interesting example of a bug that was reported on the SQLite mailing list: https://www.mail-archive.com/sqlite-users@mailinglists.sqlit.... The SQL query that triggers the bug is rather complex. Interestingly, it was generated automatically: "I might not spell it like that myself, but a code generator would do it (and much worse!). This example was simplified from a query generated by a Django ORM queryset using `.exclude(nullable_joined_table__column=1)`, for instance."
When the user reported the bug, it had actually already been fixed on the latest SQLite3 trunk based on one of our bug reports. The query in our test case used a “NOT NULL NOT NULL” predicate, which is unnecessarily complex as well: https://www.sqlite.org/src/tktview?name=c31034044b. We believe that this is interesting evidence that even “obscure” bugs matter in practice, for example, when the queries are automatically generated, as demonstrated above.
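For readers unfamiliar with SQLite's postfix NULL operators, here is a rough illustration of such a doubly-applied predicate (illustrative only, not the exact query from the ticket), using Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INTEGER)")
conn.executemany("INSERT INTO t0 (c0) VALUES (?)", [(None,), (3,)])

# "c0 NOT NULL" is SQLite shorthand for "c0 IS NOT NULL"; applying it twice is
# legal but redundant, exactly the kind of predicate a query generator produces.
# The outer NOT NULL is always true here, so both rows are returned.
print(conn.execute("SELECT c0 FROM t0 WHERE c0 NOT NULL NOT NULL").fetchall())
```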
Amazing work; I was wondering whether SQLancer can be integrated with systems like Cosette[1] and XData[2] in a useful way? These systems are different in that they test the "correct intention" of SQL queries and are mostly engine-agnostic.
Manual correctness checking of SQL is cumbersome and error-prone since developers may write queries in several different ways, while using fixed datasets (independent of the query) may fail to catch errors in them. XData automatically generates small datasets designed to catch errors in the queries (an example error is using a left-join instead of a full outer join).
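A tiny example of such a dataset (illustrative only; XData generates these automatically), on which a LEFT JOIN and a FULL OUTER JOIN give different results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (2);
""")

# The LEFT JOIN returns only (1, None); a FULL OUTER JOIN (SQLite >= 3.39) would
# additionally return (None, 2), so this dataset exposes the wrong join type.
print(conn.execute("SELECT * FROM a LEFT JOIN b ON a.id = b.id").fetchall())
```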
Any ideas about how we can combine the utilities of SQLancer and XData / Cosette? I was a research intern at the IITB Database Group, which developed XData, and I still contribute to the project.
Thanks for your interest! I'd love to test H2, but we are currently lacking the resources to do so. We've received a number of similar requests in the last couple of days and weeks. For each DBMS that we test, we need to add some logic to randomly generate databases and queries, since the SQL dialects typically differ. I've opened an issue in SQLancer's issue tracker for now: https://github.com/sqlancer/sqlancer/issues/23
The name SQLancer suggests that it is made for SQL databases. Do you think it would be easy to adapt it for document databases like CouchDB or MongoDB?
I believe that the approaches that we designed could be implemented/extended to document-oriented DBMS. Since I don't have any experience with those DBMS, I have no intuition on whether the approaches would uncover any bugs.
Implementation-wise, supporting CouchDB and MongoDB in SQLancer would be feasible. SQLancer shares little infrastructure between the testing implementations for the individual DBMS, and thus imposes only a few restrictions on them. We would definitely welcome implementations for any NoSQL DBMS in SQLancer!
Many DBMS development teams use SQLsmith and other random query generators, which can effectively find crash bugs. In contrast, we designed and developed SQLancer to automatically find logic bugs, which are bugs that cause the DBMS to compute an incorrect result set (e.g., by omitting a record from the result set).
How do you design the tests? Do you think of complex cases and hope some DBMS may fail the test? Because that would result in many tests finding no problem and only occasionally finding a bug, which can't really be an effective way of doing it.
We automatically generate databases and queries. The main challenge is to verify that the query yields the expected result. To this end, we have designed and implemented three different approaches:
* Pivoted Query Synthesis (PQS): The main idea is to generate a query that is guaranteed to fetch a randomly-selected row, which we refer to as the pivot row. If the pivot row is not contained in the result set, we have detected a bug.
* Non-Optimizing Reference Engine Construction (NoREC): The idea is to transform a query so that it cannot be optimized effectively by the DBMS. By comparing the result set of the "optimized" query with the result set of the "unoptimized" query, we can detect bugs that affect the DBMS' query optimizer.
* Ternary Logic Partitioning (TLP): The main idea of this approach is to partition a given query into multiple "partitioning" queries, each of which computes a partition of the result, and to then combine these partitions so that they derive the same result as the original query. If the combined result differs from the original query's result, we have detected a bug in the DBMS.
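As a rough illustration of the TLP idea (a minimal sketch, not SQLancer's actual implementation; the table and predicate below are made up, and SQLancer generates far more complex ones), using Python's built-in sqlite3 module against SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INTEGER)")
conn.executemany("INSERT INTO t0 (c0) VALUES (?)", [(1,), (2,), (None,)])

p = "c0 > 1"  # a randomly generated predicate

# The original query, and the three TLP partitions: p, NOT p, and p IS NULL.
original = conn.execute("SELECT c0 FROM t0").fetchall()
partitioned = (
    conn.execute(f"SELECT c0 FROM t0 WHERE {p}").fetchall()
    + conn.execute(f"SELECT c0 FROM t0 WHERE NOT ({p})").fetchall()
    + conn.execute(f"SELECT c0 FROM t0 WHERE ({p}) IS NULL").fetchall()
)

# The combined partitions must equal the original result set (as multisets);
# any difference indicates a logic bug.
assert sorted(original, key=repr) == sorted(partitioned, key=repr)
```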
According to the PQS paper, they pick an existing row in the database and generate random predicates until they find one that’s true for that row. They then use that as a WHERE clause and verify the original row is contained in the result set. A similar procedure is used to find relevant JOIN clauses.
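A toy version of that loop might look like this (a hedged sketch under the assumptions above, not the paper's implementation; the real PQS evaluates predicates with its own expression interpreter rather than in Python):

```python
import operator
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INTEGER)")
conn.executemany("INSERT INTO t0 (c0) VALUES (?)", [(i,) for i in range(10)])

# Randomly select a pivot row.
(pivot,) = random.choice(conn.execute("SELECT c0 FROM t0").fetchall())

# Generate random predicates until one holds for the pivot row.
ops = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}
while True:
    op, const = random.choice(list(ops)), random.randint(0, 9)
    if ops[op](pivot, const):
        predicate = f"c0 {op} {const}"
        break

# The pivot row must be contained in the result set; otherwise, we found a bug.
rows = conn.execute(f"SELECT c0 FROM t0 WHERE {predicate}").fetchall()
assert (pivot,) in rows
```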
You could use a fuzzer and fuzz several DBMS products at the same time. If their database states or query results start to diverge, you have probably found a bug in one of the products.
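A rough sketch of that kind of differential testing (assuming you already have DB-API connections to the engines under test; here two in-memory SQLite databases stand in for different DBMS, and the query is fixed rather than generated):

```python
import sqlite3

def differential_check(connections, query):
    # Run the same query on every engine and flag any divergence in the results.
    results = {}
    for name, conn in connections.items():
        cur = conn.cursor()
        cur.execute(query)
        results[name] = sorted(cur.fetchall(), key=repr)
    baseline = next(iter(results.values()))
    for name, rows in results.items():
        if rows != baseline:
            print(f"divergence on {name!r} for query: {query}")

# Toy example: two in-memory SQLite databases stand in for two different DBMS.
conns = {"engine_a": sqlite3.connect(":memory:"),
         "engine_b": sqlite3.connect(":memory:")}
for c in conns.values():
    c.execute("CREATE TABLE t0 (c0 INTEGER)")
    c.executemany("INSERT INTO t0 (c0) VALUES (?)", [(1,), (2,)])

differential_check(conns, "SELECT c0 FROM t0 WHERE c0 > 1")
```

In practice the hard part is keeping the generated queries within the SQL subset that all tested engines interpret identically, since the dialects typically differ.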
Not possible, as their licence explicitly forbids users to "disclose results of any program benchmark tests without Oracle’s prior written consent" [1], and a listing of bugs found in different RDBMS by an automated approach would likely be considered a benchmark.
I doubt those terms have any legal validity. Still, it's pointless: anyone paying for Oracle probably doesn't care about bugs, since they have a support contract /s.