
I'm working on a static analyzer for SQL: https://holistic.dev

It's a useful tool for DBAs to identify issues in SQL queries automatically. Only 50 rules for now, but more than 1000 are described in the backlog :)

Funny enough, this tool was initially aimed at developers' needs.

I've built a lot of microservices that did little more than run database queries. That gave me the idea for a tool that would automatically generate all those microservices from the SQL queries themselves. The MVP of such a tool was implemented and reduced developer workload by at least a quarter. But it required queries to be written in a particular way, which made it non-universal and gave no connection to the database structure.

The next step was to create a system that relies on a text description of the database schema in SQL format (DDL) and automatically infers the types that an SQL query will return. Such a tool can automatically inform the developer about possible errors on the interface between the application and the database when the structure or the SQL queries themselves change. It can also be built into CI to provide automatic code review at the version-control level and prevent erroneous code from entering the repository.
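
To make it concrete, a toy example (heavily simplified, with a made-up schema - not the tool's actual output format):

    -- DDL: the schema the analyzer knows about
    CREATE TABLE users (
        id    serial PRIMARY KEY,
        email text NOT NULL,
        name  text              -- nullable
    );

    -- DML: the query under analysis
    SELECT id, email, name FROM users WHERE id = 1;

    -- inferred result: { id: int NOT NULL, email: text NOT NULL, name: text NULL },
    -- cardinality ONE OR NONE (id is the primary key).
    -- Drop the NOT NULL from email, and the analyzer can flag every query
    -- whose consumers assumed email is never null.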

But developers did not appreciate all these advantages, as most projects are developed using ORMs :(

But at the same time, DBAs expressed interest in a system that automatically searches for bad queries right on the production database.

Any questions are welcome :)



That looks really cool. Any chance you can share the code? Or share more about what you did? Like, what do those microservices look like and what do they do? Also, what you say about type errors makes me think of static typing. Does it relate to that concept?

I too dove into writing raw SQL a while ago, and I liked it.


Look at this, I've brought back the ability to export types :) Renderers for Flow and TS are now available in beta. You can see live how the export results change when you add columns or change constraints. If JOINs are made without foreign keys, fields can become nullable, etc. I made a simple example: https://holistic.dev/en/playground/5596577a-ad0e-40e6-a05b-e... Remove the -- before ALTER and see how the result of the type export changes.
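
The effect is roughly this (a simplified sketch, not the playground's exact schema):

    CREATE TABLE author (id int PRIMARY KEY, name text NOT NULL);
    CREATE TABLE book (
        id        int PRIMARY KEY,
        author_id int NOT NULL
    );
    -- ALTER TABLE book ADD FOREIGN KEY (author_id) REFERENCES author (id);

    SELECT b.id, a.name
    FROM book b
    LEFT JOIN author a ON a.id = b.author_id;

    -- without the FK: a.name is nullable     -> { id: number, name: string | null }
    -- with the FK:    the LEFT JOIN is guaranteed to find a row,
    --                 so a.name is NOT NULL  -> { id: number, name: string }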


Thank you for the feedback! The service will go into commercial operation as a SaaS in the coming months. I do not plan to open the source code, unfortunately.

The idea is that the tool works with nothing but the source text of the SQL queries, and I had to work hard to implement that.

The work consists of several steps: 1) Get the AST (Abstract Syntax Tree) of the database schema (DDL). At this stage, only PostgreSQL up to version 10 is supported. Soon I will deal with the parser from PostgreSQL 13. This is not a trivial task - I have to build everything from the PostgreSQL source code :)

2) Build a database model. We parse all DDL commands one by one and apply the changes: for example, apply every ALTER TABLE to the table described above it, add user-defined functions to the list of built-in functions, and so on. The goal is a complete picture of all the types, indexes, and tables described in the DDL script.
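
A trivial example: after these three commands, the model must already know that users.email is NOT NULL and has a unique index on it:

    CREATE TABLE users (id serial PRIMARY KEY, email text);
    ALTER TABLE users ALTER COLUMN email SET NOT NULL;
    CREATE UNIQUE INDEX users_email_idx ON users (email);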

3) Get the AST of a query (DML) and build a model of the result. This is the most complex and interesting part :) The task is to get the list of field names and types that the query will return after execution. You need to take into account CTEs and the list of tables specified in FROM. You need to understand which function will actually be called and what result it will produce. For example, the function ROUND is described in three variants, with different arguments and different result types; the function ABS in six variants. I.e., the argument types have to be resolved before a suitable variant can be selected :) Implicit type casts are applied along the way where necessary.
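
For example, resolving ROUND:

    SELECT ROUND(1.5);          -- numeric literal  -> round(numeric) -> numeric
    SELECT ROUND(1.5::float8);  -- double precision -> round(double precision) -> double precision
    SELECT ROUND(1.5, 1);       -- two arguments    -> round(numeric, int) -> numeric
    SELECT ROUND(2, 1);         -- no round(int, int) exists: 2 is implicitly cast to numeric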

The same is true for operators. An operator in PostgreSQL is a separate entity that you can create yourself. PostgreSQL 11, for example, describes 788 operators.
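
So nothing stops a user from defining their own operator, and the analyzer has to resolve it just like a built-in one. A contrived example:

    -- "approximately equal" for integers
    CREATE FUNCTION int4_approx_eq(int, int) RETURNS bool
        AS 'SELECT abs($1 - $2) <= 1' LANGUAGE SQL;

    CREATE OPERATOR ~= (
        LEFTARG   = int,
        RIGHTARG  = int,
        PROCEDURE = int4_approx_eq  -- FUNCTION = ... is also accepted since PostgreSQL 11
    );

    SELECT 3 ~= 4;  -- true; the analyzer must know this operator exists
                    -- and that its result type is bool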

Various kinds of syntax are taken into account. For example, SELECT *, t1.*, id, public.t2.name, t3.* FROM t1, t2, myschema.t3 will be parsed correctly.

But even this is not the most challenging part :) The most exciting thing is to be able to understand two things. A. Whether each field can be NULL or not. This depends on many factors: how the JOIN with the source column's table is made, whether there is a FOREIGN KEY, what type of JOIN is used, what conditions are in the ON section, and what is written in the WHERE conditions. B. How many records the query will return, in terms of the categories NONE, ONE, ONE OR NONE, MANY, MANY OR NONE. Again, this is affected by the conditions in JOIN and WHERE, whether there are aggregate functions, whether there is a GROUP BY, and whether there are functions that return multiple records.
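
A few examples of what the cardinality analysis yields (assuming users.id is the PRIMARY KEY):

    SELECT * FROM users WHERE id = 1;                -- ONE OR NONE (unique lookup)
    SELECT count(*) FROM users;                      -- ONE (aggregate without GROUP BY)
    SELECT * FROM users WHERE name IS NULL;          -- MANY OR NONE
    SELECT city, count(*) FROM users GROUP BY city;  -- MANY OR NONE (one row per group)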

This result-model machinery, by the way, is also used in the first step - to get the column types of a VIEW.
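
E.g.:

    CREATE TABLE users (id serial PRIMARY KEY, name text);
    CREATE VIEW named_users AS
        SELECT id, name FROM users WHERE name IS NOT NULL;
    -- inferred: named_users(id int NOT NULL, name text NOT NULL)
    -- name is NOT NULL here because the WHERE clause filters the NULLs out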

This was a brief description of the first part of the service ;) It could already be a service in itself: based on a set of DDL and DML, it is possible to generate types and all the microservice code, including JSON Schema and tests. But as I wrote above, most people prefer to use an ORM, as in Django or RoR :( For this reason, I've removed this functionality from the playground and will move it to a separate project when I get around to it. It will also include various tools, such as information about all possible exceptions a query may throw, automatic creation of migrations in CI if your DDL files are in the repository, detection of unused indexes or fields, and many other exciting things :)

And the second part of the service, which I plan to promote first, is a tool for automatically finding bugs and architectural and performance problems. The target audience here is DBAs, who have to deal with the handiwork of RoR developers and their colleagues :) This part is built entirely on the information obtained in the previous stages. Some errors can be caught from the AST alone (the linter principle), but the most interesting rules are based on knowledge of types, the NULL / NOT NULL analysis, and the number of returned records.
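
A taste of what such rules can catch (illustrative examples of the idea, not the actual rule set):

    -- AST level (linter principle): a comparison that is always NULL
    SELECT * FROM users WHERE name = NULL;   -- meant: name IS NULL

    -- type level: the cast on the column defeats a plain index on created_at
    SELECT * FROM orders WHERE created_at::date = '2020-01-01';

    -- cardinality level: a scalar subquery that the schema does not
    -- guarantee to return at most one row
    SELECT u.id, (SELECT s.token FROM sessions s WHERE s.user_id = u.id)
    FROM users u;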

There are more than a thousand such rules, and I expect there will be about 5000 of them within the next couple of years :) About 200 rules are also described that catch queries which can lead to runtime errors, but DBAs don't need those, because their job is to find problems in valid queries :) There will be more than 1000 such rules as well, since PostgreSQL defines more than 1700 runtime errors.
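
Typical examples of perfectly valid queries that can still fail at runtime (hypothetical tables):

    SELECT 10 / (price - discount) FROM items;  -- division_by_zero when price = discount
    SELECT payload::int FROM events;            -- invalid_text_representation
                                                -- when payload holds something like 'abc'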

And yes, this is all about PostgreSQL. After the PostgreSQL version goes commercial, I plan to do the same for MySQL, and then perhaps for ClickHouse, if there are no special offers of cooperation :)


This is something I had thought of doing, but you actually did it. It would be amazing if you could find a way to open-source at least part of it.

But my application was to be able to use raw SQL queries instead of an ORM in an application. By statically analyzing the SQL queries, you can, in a statically typed language, automatically type-check the parameters to and the results of the queries. Combine that with an IDE like the JetBrains ones, which provide intelligent SQL autocompletion/refactoring, and, tbh, I think it would beat an ORM in terms of ergonomics. Some people like LINQ or various Haskell/Scala ORMs because they are type-safe. This would feel totally natural to anyone who knows SQL, without the downside of the SQL being completely unanalyzed and unchecked.

It would be like the Clojure library HugSQL, but with static type-checking.
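
Concretely, I imagine something like this, with the signature derived statically from the query text (a hypothetical rendering, just to illustrate):

    -- query with one parameter
    SELECT id, email FROM users WHERE created_at > $1;
    -- derived signature, rendered as e.g. TypeScript:
    --   (createdAt: Date) => Array<{ id: number; email: string }>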




This is exactly what I did in the beginning :)

But custdev (customer development) showed that companies do not care about it.

They are ready to hire Django/RoR/Laravel developers who know only the ORM. Such developers are cheaper and easier to hire. Some people also think that it's faster to develop this way :) They prefer to start dealing with the problems only after they hit the production DB.

They shift developer problems onto the DBAs. There is research saying that the price of fixing a bug in production is, on average, 400 times the price of fixing it at the development stage.

Ok, the boss calls the shots :) I had to pivot to DBA needs. It was a bit upsetting at first, but then I realized it's even simpler this way: you need to build far more tooling for developers before they feel comfortable.

A few words about open source. At first, I used this AST parser for PostgreSQL: https://github.com/lfittl/libpg_query But it's frozen at PostgreSQL 10, and we don't know if there will be any updates.

There is no official AST parser for MySQL.

The hard way is to build a parser from a bison/ANTLR grammar.

Two weeks ago, a parser like this appeared for MySQL: https://github.com/stevenmiller888/ts-mysql-parser.

There's also a parser from vitess: https://github.com/vitessio/vitess/tree/master/go/vt/sqlpars...

The only problem is that the grammar for these parsers was written by hand and is not tied to the official MySQL repository. For example, the Vitess parser does not support MySQL 8.0 syntax at all, and covers no more than 40% of MySQL 5.7.

There is also a tool like https://www.jooq.org/. It supports its own generalized SQL dialect, which does not take the specifics of different databases into account.

Look, there is another tool that uses libpg_query - sqlc: https://news.ycombinator.com/item?id=21765689.

And the author of ts-mysql-parser also offers an analyzer: https://github.com/stevenmiller888/ts-mysql-analyzer. There are only four rules at the moment...

Anyway, all the hard work starts after you have the right AST parser in your hands. And if you have to spend all your time on the parser itself, it's difficult to get to the really interesting things :(

PS: I removed the type-export tool from the site a few days ago, along with the texts explaining all its advantages to developers :) In a couple of weeks, I will move it to a separate domain as a separate project.

I am open to personal communication outside this site, all my contacts are at https://holistic.dev/en/contacts/


Thank you for the write-up, that was super interesting. If you don't mind, I have a question about the microservices. What is their goal? Serving data from the database? If not, then there is business logic you can't possibly generate, so what exactly do you generate?


I suggested building this at a bigco where I work last year, built a prototype using Postgres, and wrote the design doc.

Sadly, it wasn't picked up.

Congrats on the product, there is a big market for it.


Could you tell us a little more about your experience, if that's acceptable? Maybe in personal messages, FB/LinkedIn?


Does your roadmap include checking queries directly from pg_stat_statements? Most of these tools go against the stats tables in the database directly ('SQL Doctor' from Idera comes to mind; that product is very solid).


Yes, absolutely. You can check queries from pg_stat_statements, the slow query log, and any other source you wish. If the repository stores SQL queries in separate files, you can check those as well (e.g., before they are merged into master).
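
For example, candidates for analysis can be pulled straight from the view (the column is total_time up to PostgreSQL 12, total_exec_time from 13):

    SELECT query, calls, total_time
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 100;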

It is crucial that you add the actual database schema (DDL) before checking.

This functionality will appear soon (a customer dashboard and an API).

Please note that the analysis is performed using static methods only! No connection to the database is required. In particular, statistics and other environment parameters that may affect query performance are not taken into account. I would like to be able to enrich the data obtained by static methods with data from the production database, but at the moment, this is a very far-reaching plan.

So the tool will make recommendations that are necessary, but not sufficient.

The actual database parameters affect the real query plan. There are a lot of such settings, and their impact changes over time.

Keeping an eye on these parameters and adjusting them is a significant layer of work that is not a priority at this stage. Excellent tools have already been built for this purpose, such as postgres.ai.

DBA work is divided into two parts: monitoring the quality of queries and the database schema, and configuring the parameters of the production servers.

My tool covers the needs of the first part and does not cover the second part.

I've talked to product owners at companies whose products or services relate to that second part, and they have agreed to integrate our products when I'm ready.



