Hacker News new | past | comments | ask | show | jobs | submit login
Write an SQL query builder in 150 lines of Python (andgravity.com)
141 points by genericlemon24 on Aug 27, 2021 | hide | past | favorite | 103 comments



SQL is an absolutely amazing domain specific language, but people keep building ORMs & other SQL abstractors. I get building one as a learning or resume project. I don't see how it's useful in most scenarios. Like, there's some difficulty breaking up the queries in a logical way (which leads to duplication), so the solution is another library that you need to write or install and learn how to use?

Even if it's easy to use, I can't imagine it'd take less time to read the documentation than to write some SQL. And then there's another dependency; Another thing to check if there's performance issues, another attack vector.


The issue with plain SQL is simply that you can not compose queries at runtime.

For example let user decide which columns to select, in which order to fetch the rows, which table to query etc. Or not even let the user decide but decide based on some config file or the system state.

You end up with a bunch of string manipulation that is fragile and does not compose well.

What solution is there except from simulating SQL via your languages nestable data structures?


This is my middle-ground solution for Python: https://github.com/bdowning/sql-athame

Still fundamentally manipulating SQL text (which is a feature as I don't want to learn a full DSL), but it handles wrangling embedded placeholders while you're composing stuff and some other common compositional tasks. It's worked well for me anyway but I'm under no illusions it'd be right for everyone.

Not an original concept regardless; my original version of this was in Node: https://github.com/bdowning/sql-assassin, but a few years after I wrote that (and mostly didn't use it) I found https://github.com/gajus/slonik which was very similar and much more fleshed-out; I rolled _some_ of its concepts and patterns into sql-athame.


Hey, this has roughly the same philosophy as mine, I might add it to my survey: https://death.andgravity.com/own-query-builder#sqlbuilder-mi...


Someone on Reddit suggested stored procedures, which seems like a good alternative.

Alas, SQLite doesn't have them, so query building it is.


Except if you build a complete query builder as a stored procedure I do not see the problem solved.

Simply stated the problem is: Viewing code as data (in a lisp sense) and transforming arbitrary data into a query that can be executed.


Stored procedures are a nightmare that shepherd your application into an illegible, unmanageable monstrosity.

Stored procedures are the slipperiest slope I've seen as a developer.


In my experience, such problems only occur when people believe that common software engineering practices do not apply when writing stored procedures. Just have your stored procedures version controlled, tested, and deployed like any other code.


The real problem is that you're almost always shifting work from a language that is well known and understood by you and/or your team to one that is less, or even poorly, understood and known, and you end up incurring the cost of novice programmers, which can be a real problem for both security and performance.

If you have good knowledge and experience in the language your preferred version of SQL implements, that's good. If you just have people that understand how to optimize schemas and queries, you might find that you encounter some of the same problems as if you shelled out to somewhat large and complex bash scripts. The value of doing so over using your core app language is debatable.


That seems to assume some very specific "scene" of developers, or at least a team environment where it's impossible to select for developers who know SQL. That may very well be the case in your situation, but I see no reason to expect it would be the case in most situations. SQL isn't exactly a new or niche language.


I'm not talking about plain SQL, I'm talking about the included procedural language that often ships with it[1] as an extension and which is much more DBMS specific.

So, the question is, does your team know SQL/PSM, or PL/SQL, or PL/pgSQL, or some other variant, and how well.

1: https://en.wikipedia.org/wiki/SQL#Procedural_extensions


My team would know the variant that we use, yes.


If your front-end is written in React and your business logic is written in SQL, is it really fair to call what's left in the middle tier a "core app language"? If you're writing a SaaS application today, you're more likely to want to rewrite your Java middle tier in Go than replacing your DBMS.


Not everyone is making a web app, or even something amenable to using React native, and even if they are, there's no guarantee that their middle tier isn't also in JavaScript.

That said, I wasn't making a case about replacing your DBMS. I specifically avoided that because yes, most people stay with what they know and used, and even if they switch, they switch for a different project, not within the same project. There are some cases where multiple DBMS back-end support is useful, but I think that's a fairly small subset (software aimed towards enterprises which wants to ease into your existing system and note add new requirements, and open source software meant to use one of the many DBMS back-ends you might have).

My actual point is more along the lines of:

- Most DBMS hosted languages I've seen are pretty shitty in comparison to what you're already using.

- The tooling for it is likely much worse or possibly non-existent.

- You are probably less familiar with it and likely to fall into the pitfalls of the language. All languages have them, shitty languages have more. See first point.

- If you accept those points and the degree to which you accept them should definitely play a role in deciding to use stored procedure you've written in the language your DBMS provides.

- I think trade offs are actually similar to what you would see writing chunks of your program in bash and calling out to that bash script. People can write well designed and safe bash programs. It's not easy, and there are a lot of pitfalls, and you can do it in the main language you're writing probably. Thus the reasons against calling out to bash for chunks of core are likely similar to the reasons against calling a stored procedure.


You make some good points. I personally have experience with business web applications, i.e. large complex data models with rather simple updates and report-like queries. These types of queries combined with a well normalized data model map well to set-based SQL. Of course, it's a different story if you're writing technical applications or games that are more about crunching numbers than querying and updating data.

To me, the shitty procedural languages you mention are just for gluing queries together. The important stuff happens in SQL and the simplicity of keeping it all in the database is worth it.


This isn’t my experience at all, but I suppose it depends on how you build your applications. I used to be of your opinion, before we moved more and more of our code based from .Net to Python an I was a linq junkie, but these days I think you’re crippling your pipeline if you don’t utilise SQL when and how it’s reasonable to do so.

There are a lot of data retrieval where a stores procedure will will save you incredibly amounts of resources because it gives you the exact dataset you need exactly when you need it.


That's a bit of an absolutist stance I think, a good programmer can use stored procedures just fine and in some cases it can even improve performance. Yes it makes it easier for bad programmers to write bad code, but bad programmers will write bad code no matter what tools they have in their toolbox.


With stored procedures you run your code inside of the SQL server process.

With SQLite, your entire application IS the process, and your SQLite data moves from disk to your app processes RAM.

I think the SQLite model is much better as you get to use your modern language and tooling (which is better than the language used for stored procedures which has not changed in 20 years, and is generally a bear to observe, test, develop).


> simulating SQL via your languages nestable data structures

I've gone down this rabbit hole... you've described the in-house query engine I work with.

Instead of sql, queries are nested json representing a dumbest-possible sql parse tree. It's only a subset of sql features, but it provides, as you point out, composability. By having the query as a data struct, all sorts of manipulations become possible... toggle fields for access control; recovery from dropped or renamed columns (schema has a changelog); auto-join related tables as required; etc.

It's magic when it works but it suffers from the drawbacks noted elsewhere in the thread about traditional ORMS... onboarding new devs, debugging when something goes wrong.

Adding new features to the dsl is a pain because the surface area is not just _the query_ but _all the queries it might form_ when composed with other queries in the system O_o


> It's magic when it works but it suffers from the drawbacks noted elsewhere in the thread about traditional ORMS...

(Not that there are lots of other devs on my project at the moment, but)

I am managing this by aggressively limiting the size / features of my query builder, so it's (relatively) easy to understand just by looking at the code.

If it ever passes that point, it's probably time to switch to a "real" query builder / ORM.


I'm in a similar environment--a very small crew.

I'm curious what feature subset you settled with. Ours got as far as aggregate ops, and the complexity of that turned out to be 'hydra'. (Surely I've cut off the last head!)

By that point we were painted into a corner with the in-house query builder... there was no off-the-rack ORM that had semantics for combining queries. (i.e. meld together query A and query B so that the result cells are union or intersection.)

You seem familiar with the problem space, I'm curious what your experience was. What feature(s) made you draw the line and say, 'when we need it we'll migrate to X'? What are your "real ORM" candidates for X to bridge that gap?


Spark SQL is great for this. SQL strings and data frames code can be used interchangeably, and performance is the same for either type of query. https://spark.apache.org/docs/2.1.1/sql-programming-guide.ht...


I propose that the issue is at the ODBC layer.

We have 2 modes of using that thing.

Tightly bound. Basically you have a bit of query string with question marks in it and you bind out your data points. Either for sending/recieving.

Loosely bound. Here is a totally composed string ready to go just run it. Also a good way to make SQL injection attacks.

Both involve string manipulation.

I think the issue is the ODBC interface does not really map to what SQL is and does. The column binding sort of does. But not table, not where, not having, not sort, etc. So we end up re-inventing something that pastes over that. Building a string to feed into an API that feeds that over the network to a string parser on the other side to decompose it again then runs it and goes back the other way in a similar manner.


Something like C++'s {fmt}, specialized for SQL.

fmt ("SELECT foo, bar FROM mytable WHERE id = {}", id);

'id' gets properly quoted, so that's not a concern anymore.


This is simpler than what gp is asking for, they want the ability to dynamically change which columns are requested too, so change it to "foo, bar, baz", but still do so with reasonable safety.


That's supported too:

fmt ("SELECT {:v}, {:v} FROM {:v} WHERE id = {}", colname1, colname2, tablename, id);

The :v means "don't quote this, insert this string exactly as specified." It also has specialized handling for NULL values (i.e. it can generate 'IS NULL' instead of '= ...' if the value passed on is intended as a NULL value).


Honestly though, how often does this scenario actually happen. Devs love to build for what if, but there's not a ton of scenarios where you need to build completely custom queries based on user input. A nice SPROC with some inputs can do most of that while maintaining performance


Pretty much every web app I've written has had some sort of search feature with a table of results, where the user can filter on multiple columns of the search. That requires either:

a. Dynamically adding "and SomeColumn like @someValue" to the where clause of the query

or

b. Having the query include all filters but ignore them when they are left blank, such as

`where @someValue is null or SomeColumn like @someValue`

The latter approach does allow you get by without string-building queries for filtering, but tends to perform worse if you have a lot of columns you might filter on.

And you still have to play the string concatenation game if you let the user click column headers to change the ORDER BY.

With a decent ORM, like Entity Framework for C#, any of this is very simple and there is no need to write SQL string manipulation code in your application. But under the covers that is what is happening.


Is this just a complaint about a lack of good tooling for SQL code? User-configurable columns are probably always going to be implemented as an array of columns that gets built at runtime and then sent to whatever database management system you're using. In any programming language and database system, isn't the only thing automatically preventing you from building a malformed array of columns at runtime some sort of type-checking or linting? It seems to be that this mistake is just as easy to make using an ORM or data access library without type-checking or linting.


The way LINQ in C# works allows a good amount of type-checking to be done on the expressions fed into the ORM. So you can compose queries in a much more sensible way than string concatenation. The only downside is that LINQ makes it easy to express certain things that SQL makes hard, so you can sometimes write LINQ queries that turn into monstrous SQL.

I authored a library for F# that did static typechecking of SQL in strings at compile time. So it could tell you if you, for example, mistyped a column name or selected a column that didn't exist on a table or passed a SQL function the wrong number or type of parameters, etc. That was nice, but it still sucked for trying to compose queries or dynamically add filtering/sorting to a query.


The main place I've run into this is a table with sorting, pagination and filtering while lazy loading data and I encounter this need a lot.

With some runtime query builder most databases will have decent performance for the space between "too much to reasonably load in memory" to "we need a dedicated query service like elastic". Unfortunately taking the example of sort by X then by Y where X and Y are dynamic I don't know of a nice solution in SQL only in MSSQL 2012 or MySQL 5.


I think you are playing into the comment, though. You'd be surprised how far you can go with a table that doesn't support sorting and filtering. Basic pagination can often be foregone, if you can instead resort to offering export and let them use whatever software they are used to for this purpose. (Pagination is also just easier if you forego custom sort and filtering.)

At some point, yes, you will want these things. But the basics can get you really far.


For sure you can offer a good approximation through SQL but I'm unlucky enough that I've needed to support dynamic sorting and query builders against tables at least 4 times for different datasets in the past years.

I'm particularly proud of the last one I built that still absolutely flies despite doing wildcard querying on text fields (no full text index) with multiple combined queries and a dynamically computed status from aggregate child properties in MySQL 5 with its lack of window functions. The solution was basically dynamically construct an efficient as possible inner query before applying the joins.

To clarify I believe in doing as much as possible in raw SQL and have a fondness for the stored procedure driven system I worked with but this problem remains unsolvable in SQL alone and I just keep encountering it.


It would be easier to do that with a template engine.



I work for an enterprise in-house web application, where over time we have a ton of tables, like over several hundred unfortunately. Writing queries for each any every table and manually mapping those to objects sounds exhausting. EntityFramework helps a lot with that, for CRUD operations.

Additionally, I think unit testing would benefit from an ORM as well. Instead of calling an actual database, you can easily have an in-memory database with an ORM with mocked data. Though, I suppose that is technically possibly with straight SQL queries as well, but I imagine it's easier with an ORM.


The main problem with in memory databases is that they are not the same database. I've had several issues which I had to find a fix that worked on one database and was legal syntax on the other. Say maybe one database allows force index but in memory does not.

I've even seen one database layer for in memory and a completely different one for real database. That means that your unit tests are not really testing your production code


You can also make the same argument about ORM's mock DB layer.


How would it be easier with ORM when switching to in-memory database for standard SQL is just changing connection string?


My main argument for ORMs is the mapping between SQL columns and your data model; having to write out SQL and code to load an SQL result set into your data model feels like a lot of duplication. And it gets a lot more difficult when you have more complicated data models spanning multiple database columns - that is, if your data model is like a document that has to be inserted and updated into a set of normalized database tables.

That said, if that wasn't as much of an issue, or if I still had the patience and typing skills to write a lot of boring code, I wouldn't object too much. I'd also treat SQL as a first class citizen, so dedicated .sql files that the application will load or embed and use. That way, the SQL can be developed, maintained and run outside of the context of your application - editors can execute them directly to a (local) development database.


What if your SQL is your data model? I believe object-oriented design is forcing us to do a lot of duplication for very little gains, just like it did on game design and the industry started moving towards data-oriented design and entity-component system.

The sooner we move away from the big lie that is DDD, the more productive we can be.


I think if sql is your data model, then it’s a good case for a query builder.

But at my company, our in house app is written using django but treats the django model asan interface to elasticsearch, Postgres, and redis.

The django model is sub optimal for sql, but it’s a good base for interacting with the system as a whole.


Agreed, lots of comments here saying that people don't want to learn SQL, I don't buy that argument. I think most people already know SQL because at some point it's inevitable, and it's also not that hard.

The value of the ORM is what the name implies: mapping your data from rows into something convenient to use in your language.

What people are really complaining about is ORMs that needlessly try to re-invent SQL, but they don't all do this. Someone else brought up Ecto, it's a brilliant ORM because the resulting SQL is always obvious.


I feel a lot of arguments for ORMs boil down to "I can't really be arsed to learn SQL properly" which is fair enough from a developer time / performance ratio point of view, but not from a purely technical point of view.

Like you say, I find there's way more mental friction dealing with new ORMs and wading through the documentation than there is in just going straight for raw SQL. There's less "magic" that way too which is always nice when you're debugging or optimising things. Different strokes for different folks I guess.


I feel the opposite, the arguments against ORMs largely boil down to "I can't really be arsed to learn SQL properly." ORMs are a leaky abstraction by nature and attempting to use one without understanding the underlying technology will only lead to pain and a developer with grudge against ORMs.

I think ORMs compliment SQL, not replace it. Without ORMs, you end up getting a lot of clunky boilerplate trying to do simple CRUD operations to your data. It's a bit like Greenspun's 10th law: "Any sufficiently advanced RDBMS-backed program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of an ORM."


Ymmv, but I like the both/and approach. I’ve seen high quality codebases where the benefits of both are leveraged: ORM for simple queries, idiomatic code reuse; raw sql for complex or performance critical queries. Being smart about when to use which is the key, but I have seen it work very well.


This is what I think is the reasonable take on ORMs. If you're using one because you don't wanna learn SQL, that is a bad reason that will most likely result in more work in the long run.

But if you're using it to increase productivity for straightforward database calls, it can be a useful tool in a lot of scenarios.


>I feel a lot of arguments for ORMs boil down to "I can't really be arsed to learn SQL properly"

I learned SQL almost 30 years ago and used it in Oracle/Informix/Sybase/IBMDB2/MSSQL/MySQL/SQLite/etc and have written exam questions on SQL for outer-joins, nested subqueries, crosstab, etc.

All that said, concatenating raw strings to form SQL that has no compile time type-checking is tedious and error prone. Thankfully, my "expert" knowledge of SQL does not blind me to the benefits of what a well-written ORM can do for productivity. (E.g. results caching, compile-time validation, auto-completion in the IDE, etc)

The real technical reason why ORMs persist as a desirable feature is the decades-old technical concept of "working memory of field variables" that mirror the disk file table structure.

In the older languages like 1960s COBOL, and later 1980s/1990s 4GL type of business languages such as dBASE, PowerBuilder, SAP ABAP... they all have data tables/records as a 1st-class language syntax without using any quoted strings:

- In COBOL, you had RECORDS and later versions of COBOL had embedded SQL without quoted strings

- dBASE/Foxpro had SCATTER/GATHER which is sort of like a built-in ORM to pull disk-based data from dbf files into memory

- SAP ABAP has built-in SQL (no quoted raw strings required) with no manual typing of variables or structs to pull data into working memory

The issue with "general purpose" languages like Python/Javascript/C++ is that SQL is not 1st-class language syntax to read database data into memory variables and write them back out. The SCATTER/GATHER concept has to be bolted-on as a library ... hence you get the reinvention of of the "db-->memory-->db" data read/write paradigm into Python via projects like SQLAlchemy.

After tediously typing out hundreds of variable names or copy-pasting fields from SQL tables via strings such as ... db.execquery("SELECT cust_id, name, address FROM customer WHERE zipcode=12345;")

... it finally dawns on some programmers that this is mindless repetitive work when you have hundreds of tables. You end up writing a homemade ORM just to reduce the error-prone insanity.

Yes, a lot of ORM projects are bad quality, with terrible performance, bugs, etc. But that it still doesn't change the fact that non-trivial projects with hundreds of tables and thousands of fields need some type of abstraction to reduce the mindless boilerplate of raw SQL strings.


I feel like there should be a distinction between full-blown ORMs and just query builders.

I dislike ORMs that hide database operations from the application code by pretending that in-memory application objects are in concept interchangeable with persistable objects; though just saying that hurts a bit because I don't want to think in terms of objects, but in data.

Query builders are fine if they allow you to build efficient queries in a type-safe and composable manner to get data in and out of the database in the form your application requires at the site of the database query, but I don't want to be forced to pretend I fetch "User" objects if all I really need from the database are the name and e-mail.


I agree with your point that adding multiple layers = more attack vectors and abstraction of a really good domain specific language. But what seems to happen on most of the projects I work on is we end up hiding away extremely common logic behind helper functions. It always starts off with SQL and then slowly gets moved into higher level functions that offer a better developer experience.

Shameless plug, but I just posted a library I wrote (for node https://github.com/vramework/postgres-typed/blob/master/READ...) which pretty much is a tiny layer ontop of pg-node (which is query based / with value parameters) and provides small util functions with typescript support derived straight from postgres tables.

In an ideal world (one I think we are getting very close to) I think we will end up having SQL queries validated in our code against the actual DB structure the same way we have any other compilation error. But until then we'll need to rely on tests or helper libraries, and for the purpose of refactoring and development I find the latter more enjoyable (although still far from perfect).


> In an ideal world (one I think we are getting very close to) I think we will end up having SQL queries validated in our code against the actual DB structure the same way we have any other compilation error.

This library for TypeScript works exactly like this

https://github.com/MedFlyt/mfsqlchecker


>SQL queries validated in our code against the actual DB structure the same way we have any other compilation error

Good point about validation. The ideal scenario you propose would indeed be ideal.


F# (and surely there are others) offers type providers, which can reach into your db schemas at compile time and typecheck linq-ish or even raw sql strings and the subsequent uses of the result set. But in the simplest case it makes compilation a function of not only source files but also db schema state. You can extract schema defs and point the type provider at the "cached" defs, but make sure rebuilding those is part of the build process.


Yeah, everyone has a strong aversion to doing it but it's kind of self perpetuating. Once you get comfortable writing SQL to do everything, the ORMs feel like a frustrating game of trying to get the library to generate the SQL you have in mind.


if you want to write one SQL query, you should not use a SQL builder and/or ORM.

if you want to write thousands of SQL queries that naturally derive from an object model of 30, 100, or 800 domain classes or so and would otherwise be thousands of lines of repetitive boilerplate if all typed out individually, then you use a query builder and/or ORM.

different tools for different jobs.


Sometimes you need dynamic SQL, because the DB doesn't have stored procedures you can use for the same purpose (e.g. SQLite).

I talk more about this use case here: https://death.andgravity.com/query-builder-why#the-problem


I always like to bring up Ecto. Really nice sweet spot IMO.

  query = from u in "users",
            where: u.age > 18 or is_nil(u.email),
            select: %{name: u.name, age: u.age}
  
  # Returns maps as defined in select
  Repo.all(query)
https://hexdocs.pm/ecto/Ecto.html#module-query


This all day. I teach intro Web App stuff (meaning I've never built anything huge) and it's just painful to be like:

Hey everyone, look at this language with this incredibly intuitive and simple syntax, perhaps the most "easy to read and understand" thing you'll see in computing...

...and now here's a bunch of painful stupid garbage that you'll have to learn about, because industry has accreted a bunch of crap around it to make it turn on and work.


benefits of ORMs:

  - allow certain apps to work with any sql dialect (though tbh I question the value of this in the age of docker)  
  - makes simple selects easier  
  - enables plugins for web libraries
I think the 3rd thing is the most valuable. If you're in an ecosystem like django / rails, you can consume plugins that integrate with your sql backend and your schema more or less automatically.


> tbh I question the value of this in the age of docker

Haha. In the world outside of HN, big companies enforce specific database servers for reasons such as centralized monitoring, security, compliance and competence. Setting all this aside because of "docker" is classic HN.


I built one because I wanted to generate all my SQL scripts from a graphql schema.

Handjamming SQL where statements for arbitrarily deeply nested types is insanity.


Doing it the other way around makes more sense, i.e. write SQL to generate code. See https://github.com/kyleconroy/sqlc, why aren't more people following this approach?


This is awesome.

Composing a SQL query is often the easiest part. The issue is that throwing that static SELECT statement string in your code has a ton of downsides (as discussed everywhere in this thread).

And fiddling with an ORM is tremendously frustrating when you're just trying to get it to compose the SQL you've already written in your head.

For a dev who's bothered to learn SQL, flipping the thing on its head makes a ton of sense.


I've written a lot raw SQL in the dialect my language supports. This is fine for static queries. When queries are dynamic (maybe they come from an admin panel or some other part with lots of optionality), static SQL isn't enough. You then get to do one of three things: ad-hoc string manipulation, rely on a query AST->compiler (like SQLAchemy's core or HoneySQL), or an ORM. ad-hoc string manipulation is a real security and reliability nightmare, and if you don't have a good standalone query AST to SQL compiler, an ORM is the next best thing.


> ad-hoc string manipulation is a real security and reliability nightmare

Parameters.

A couple of examples:

https://www.psycopg.org/docs/usage.html#passing-parameters-t...

https://www.php.net/manual/en/mysqli.quickstart.prepared-sta...


Parameters have zero bearing on whether you should dynamically construct SQL strings. If parameters can solve all of your problems, you don't have a dynamic SQL query, you have user-submitted values in a WHERE clause.

What can be parameterized depends wildly on the SQL database in question. I haven't used one that could parameterize table names (for use in dynamic JOINs or CTEs) and many cannot parameterize values outside of a WHERE clause. Dynamically selecting which function to call, clauses to add or subtract, and sort orders are just a slice of places parameters don't help.

In short, parameters alone do no eliminate the need for a query builder. A good query builder should appropriately parameterize values as the underlying database supports and hopefully uses a type system or validation to constrict the domain of values it uses to construct parts of the expression that cannot be parameterized.


How do you parametrize things like dynamic joins, where clauses, field selections, or aggregations without string manipulation or gross duplication?


You can't order/sort by parameters.


you're missing the forest for the trees. this is a good python tutorial for somebody at my skill level


Hey, thanks for that!


Your point is valid, it make sense just speak SQL to DB in some scenarios . A good ORM will enable people to do that easily as well.

However, good orms like LINQ in C# really shines in a lot of places.


Wee nitpick, LINQ isn't an ORM, it's a querying API. Maybe you meant the original LINQ to SQL ORM which was deprecated in favour of Entity Framework (EF).

> A good ORM will enable people to do that easily as well

I agree. EF isn't a terrible ORM and covers many use cases developers encounter when interacting with a database server in your average line of business app. But you do need to keep an eye on the SQL it generates with more complex queries i.e. beware of cartesian hand grenades. Fortunately MS provides some guidance on this kind of thing, but sadly many developers often don't bother to read the docs, and this is why ORM's get a bad name:

https://docs.microsoft.com/en-us/ef/core/performance/efficie...


Ah, good ol' SQL injection attacks... If the author is reading, please do add a disclaimer mentioning that the code would be vulnerable to SQL injection attacks.

From a purely learning point of view, it is indeed an interesting read, however.


(author here)

Yup, good thing to note, thank you!

I've kinda assumed everyone knows about the named parameters some bindings offer, but that's probably not true. Will add a disclaimer.

In the previous articles, I do talk about how this is an upgrade from plain SQL, not from regular full-blown query builders / ORMs.


Definitely not true, I've encountered plenty of SQL users at work who never heard of injection attacks or parameterized queries. Some of them even built some ad-hoc query builders to replace some of their own repetitive queries. (Note that parameterized queries alone are not sufficient: often people would try to parameterize table or column names.)


> I do talk about how this is an upgrade from plain SQL

This statement is about functionality / ease of use, which is fairly orthogonal from (preventing) SQL injection: with plain SQL it's perfectly possible to avoid injection attacks, in fact that's probably the most common and easiest way to do it. In that sense, if anything this is a downgrade from regular SQL.


> In that sense, if anything this is a downgrade from regular SQL.

It likely is.

Ideally, instead of dynamically building queries, one would use stored procedures.

I'm using this with SQLite, which doesn't have stored procedures, so it's an acceptable downgrade for me. Appending strings to lists gets messy quickly, example: https://death.andgravity.com/query-builder-why#preventing-co...


> query.SELECT('one').FROM('table')

I don't really like this API. SQL is weird because it's written backwards. What does `query.SELECT("one")` on its own represent? A query on any table that happens to have a field called "one"? I know you're not trying to build an ORM, but `Table("table").select("one")` makes a lot more sense for an API since the object `Table("table")` actually has a purpose and would make sense to pass around etc.


I concur, despite the downvotes. This is also why the select is last LINQ to SQL [1] queries:

    var companyNameQuery =
        from cust in nw.Customers
        where cust.City == "London"
        select cust.CompanyName;
[1] https://docs.microsoft.com/en-us/dotnet/framework/data/adone...


> What does `query.SELECT("one")` on its own represent?

data['SELECT'].append('one')

I'm intentionally trying to not depart too much from SQL / list-of-strings-for-each-clause model, since I'd have to invent/learn another one.

For a full-blown query builder, I agree Table("table").select("one") is better.


also select is really project (and where is select), but that's the battle for another day.


I highly recommend pypika by Kayak: https://github.com/kayak/pypika

Have used in multiple projects and have found it's the right balance between ORMs and writing raw SQL. It's also easily extensible and takes care of the many edge cases and nuances of rolling your own SQL generator.


Yup, after my own initial research I found it as well, and I like it a lot. I talk more about it here: https://death.andgravity.com/own-query-builder#sqlbuilder-py...


> and takes care of the many edge cases and nuances of rolling your own SQL generator

Want to elaborate?


For one, it can output more than one flavor of SQL: https://pypika.readthedocs.io/en/latest/3_advanced.html#hand...

Since SQL is ever-so-slightly different across databases, I imagine trying to cover all of them as a single dev is a nightmare (especially if that's not the problem you're trying to solve).

I wrote my own query builder because I know for sure I'm only targeting SQLite. The second I need my feed reader library to work with another database engine I'm dumping my own for something more serious – either a full blown database abstraction layer like SQLAlchemy or Peewee (likely without the ORM part), or something simpler like PyPika or python-sql.[1]

[1]: I talk more about them here: https://death.andgravity.com/own-query-builder#sqlbuilder-py...


Many SQL abstractions substitute SQL keywords (e.g., "SELECT") for a fluent interface function (e.g., "select"). The query builder and output tend to be language-specific.

Several years ago, I prototyped a relational mapping language based on XPath expressions. The result is a database-, language-, and output-agnostic mechanism to generate structured documents from flat table structurse. Although the prototype uses XML, other formats are possible by changing the RXM library.

The queries might be bidirectional. Given a document and the RXM query that generates the document, it may be possible to update the database using the same query that produced the document. I haven't tried, though.

https://bitbucket.org/djarvis/rxm/src/master/


Interesting to see what goes into an ORM library - but as others note, in my experience learning SQL ends up being better. The things that ORMs make easy are already pretty straightforward, and when you get to more advanced queries, the ORM ends up getting in the way and/or in order to use the ORM properly you have to understand SQL deeply anyways.

For learning SQL, my favorite resource to get started:

Become a SELECT star! https://wizardzines.com/zines/sql/

followed by

The Best Medium-Hard Data Analyst SQL Interview Questions https://quip.com/2gwZArKuWk7W


This reminds me my older project called xql, for node.js: https://github.com/jsstuff/xql

and fiddle: https://kobalicek.com/experiments/fiddle-xql.html

Interesting how all these builders look the same :)


What are you doing to handle more complicated WHERE clauses, such as WHERE last_name = 'Doe' AND (first_name = 'Jane' OR first_name = 'John')?


That's what the fancier __init__() at the end of the article[1] is for :)

Here's the tl;dr of a real-world example[2]:

    query = Query().SELECT(...).WHERE(...)
    for subtags in tags:
        tag_query = BaseQuery({'(': [], ')': ['']}, {'(': 'OR'})
        tag_add = partial(tag_query.add, '(')
        for subtag in subtags:
            tag_add(subtag)
        query.WHERE(str(tag_query))
It can be shortened by making a special subclass, but I only needed this once or twice so I didn't bother yet:

    for subtags in tags:
        tag_query = QList('OR')
        for subtag in subtags:
            tag_query.append(subtag)
        query.WHERE(str(tag_query))

[1]: https://death.andgravity.com/query-builder-how#more-init

[2]: https://github.com/lemon24/reader/blob/10ccabb9186f531da04db...


I've built SQL query builders to get things done behind the scenes a few times. Users just love being able to search any field, and get their results.


this is a good guide for an intermediate-level python developer. thanks.


Glad you liked it!


Please God no...don't do this.. This is exactly how security vulnerabilities are introduced. Validate input and use parametrized queries over something like this...


It's not either-or; I'm using the query builder to build _parametrized_ queries.

Here are two examples:

https://death.andgravity.com/query-builder-why#introspection https://github.com/lemon24/reader/blob/10ccabb9186f531da04db...


Or just use sqlalchemy...


I wrote an entire article about why I didn't do that: https://death.andgravity.com/own-query-builder#sqlalchemy-co...

In general, SQLAlchemy is an excellent choice; in my specific case (single dev with limited time), the overhead would be too much (even with my previous experience with it).


A curious question: why not Python's built in Sqlite3 package [0]?

[0] https://docs.python.org/3/library/sqlite3.html


I am using exactly that; the query builder goes on top of it :)


That’s a great article and a good way to go.


Thank you!


Exactly. Because nobody should ever use their dev skills to build up their understanding of a tool by recreating some aspect of it.


Exactly. Why bother when you can just write another to-do list manager.

Seriously, it’s just a joke. Chill out.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: