Why you should learn SQL (executeprogram.com)
69 points by arkj on Aug 4, 2021 | 135 comments



I used to love ORMs; I used them everywhere. Writing another language inside the language I'm coding felt wrong, and ORMs seemed to simplify my programs.

No, ORM does not simplify coding! It's a big complex adapter which does not fit many cases.

The RDBMS itself is complex enough; let's put another complex abstraction on top of it so we can forget about the tables and columns and joins and foreign keys? Complexity added upon complexity does not make a polished interface.

If you want to use an ORM, you have to learn both SQL and the ORM in at least some depth. When problems occur, you have to debug both.

20 years in, I am still learning about RDBMSes. I still hate to compose SQL in code, but there is no better way.

Databases are used in every situation. Want to store data? Database. Want to communicate between processes? Database. Want to store logs? Database. There are different solutions for different scenarios, but as long as the DB chugs along, why bother? Sometimes it's sickening.

So 20 years in, the ROI of learning SQL has been great. Please don't avoid learning about RDBMSes by using an ORM. Just learn it.

Actually, I'm not without an ORM nowadays. A lightweight ORM like Dapper serves me well: it does not abstract away the concepts of the RDBMS, but it makes things easier.


> Writing another language in the language I am coding is wrong

That's not what's wrong. Rather, writing one language inside a string literal of another is what feels wrong to you.

With better integration it'd be natural. The problem is, of course, there's no single SQL dialect.


" I still hate to compose SQL in code, but there is no better way."

The one big problem with SQL is intermixing query language and query parameters. This requires escaping and is the source of many PHP vulnerabilities. Instead of

    do_sql('SELECT * FROM tab WHERE name = "abc"')
there should have been a standard where you wrote

    do_sql('SELECT * FROM tab WHERE name = V1', 'abc')
or something like that. So you don't do string operations on the query parameters.


Aren’t you referring to parameterized/prepared queries? Any proper DB driver/library should have this feature.

    do_sql('SELECT * FROM tab WHERE name = ?', 'abc')
https://www.php.net/manual/en/pdo.prepared-statements.php

I recall getting curious when I was younger where the deranged advice of “sanitize your queries” came from — realizing that you should be able to simply tell the database that this is a string, not part of the query itself. SQL injection should barely exist as a concept, let alone be the #1 web vulnerability.

IIRC it turned out MySQL supported parameterized queries for ages, but the stdlib php MySQL library just didn’t add support for it. This discovery solidified my understanding that PHP has been giving developers brain damage for decades.


Unless you're using hardcoded values, using bind variables is the only proper way. Otherwise, the DBMS has difficulty recognizing the same query pattern for performance, and you're open to injection vulnerabilities for security.

https://www.databasestar.com/sql-bind-variables/

The big problem is the shitty workplace environment of today that discourages sharing, learning and proper software development.

A proper build pipeline will include security scanners that detect these vulnerabilities automagically.


I don't know if it's in the standard, but both Postgres and MySQL have PREPARE statements, and any sane language driver will implement them correctly.
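
For reference, the server-side flavour looks roughly like this in Postgres (MySQL's syntax differs slightly; the users table is just an invented example):

    PREPARE find_user (text) AS
        SELECT * FROM users WHERE name = $1;

    EXECUTE find_user('abc');
    DEALLOCATE find_user;

In practice you rarely type this yourself; the driver sends the prepare/execute pair for you when you pass parameters separately.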


If you use an IntelliJ IDE in the commercial version (at least PyCharm), DataGrip is integrated. Basically, if you connect to your database and let it fetch your schema, the IDE starts scanning string literals, and if it detects SQL it provides IDE features for it as well (like auto-completion, some refactoring, etc.).

I think ORMs and query builders were partly a hack to make IDEs understand SQL, when in reality this approach is what was actually needed.


No, ORM provides an inspectable central entry point for your models with a standardized API. That's the feature.

That's why you get a great Django ecosystem: the ORM abstractions let all libraries rely on the fact that the rest of the code accesses the models the same way.


The issue is that if you try to squeeze a relational model into an object-oriented model, you won't get an efficient solution.

Solutions like Django might be good when you're starting a project or when it is something very simple; otherwise you'll have to fight it to get things done more efficiently.

My point is that using plain SQL (I personally prefer asyncpg, as it provides an interface matching PostgreSQL's) is actually also easy. Especially if you have an IDE that supports it.

I also realized that with this approach I rarely need to even transform the data in any way. Usually whatever I want to do I can get in a single SQL statement (even for things that have some hierarchy, thanks to aggregation functionality). So in the end the function just gets data and displays it.


Sure, if you don't need to build an ecosystem, don't need introspection, and can forgo integration, SQL is indeed easy.

Also, SQLA proves that an ORM doesn't have to prevent you from getting an efficient solution, you just have to offer several layers of granularity.

In the end, it's not an either/or proposition. In Django you do use raw SQL when you need to; but your auth system doesn't, and a plugin will solve it for you.


One nifty use-case for ORMs is decoupling the logic from the data.

I can prototype in Python/SQLAlchemy/SQLite and deploy to Python/SQLAlchemy/PostgreSQL via a pipeline with confidence that I'm actually managing the complexity.

Cracking open some legacy code with vast swaths of embedded SQL is a source of much weeping and gnashing of teeth when the time comes for maintenance.


I did this for the longest time, testing with SQLite and running with PostgreSQL, until I discovered that it was completely unnecessary extra complexity. It turns out that setting up and starting a fresh PostgreSQL database takes about one second, so it's just as easy to spawn a Postgres for the unit tests. And now I get to enjoy some immensely useful Postgres-specific SQL syntax like "FOR UPDATE SKIP LOCKED".
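
For anyone who hasn't run into it: SKIP LOCKED makes a simple job queue almost trivial. A rough sketch (table and column names invented), where each worker claims one pending job that no other worker has already locked:

    UPDATE jobs
    SET status = 'running'
    WHERE id = (
        SELECT id FROM jobs
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 1
        FOR UPDATE SKIP LOCKED   -- skip rows other workers currently hold
    )
    RETURNING id, payload;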


> Writing another language in the language I am coding is wrong.

Yeah, but you're writing your strings in English anyway.


I've spent most of my career hoping that something - anything that's better than SQL will come along and replace it. It's like FORTRAN, except FORTRAN has had the decency to stay in use where it's really the best choice. But SQL is out there, like Clippy. "Hey, I see you're collecting some data. SELECT TRUE FROM HELP WHERE COLLECTING_DATA IS TRUE"


> I've spent most of my career hoping that something - anything that's better than SQL will come along and replace it.

So, if we put aside Clippy jokes and so on, what's your problem with SQL exactly and what does "better than SQL" mean to you?

Because what I saw in the last couple of decades is the NoSQL movement lose air, and half the products representing it add some sort of SQL dialect support, which if you think about it is a hilarious admission of defeat.

SQL is exceptionally good at being a relational algebra language, and if you think in terms of relational algebra, you might find out there's nothing wrong with it, aside from superficial remarks like "FROM should come before SELECT".


There's a lot wrong with SQL. Chris Date [1] wrote several books comparing SQL to a purer implementation of the relational algebra, Tutorial D. In Date's words:

> SQL is incapable of providing the kind of firm foundation we need for future growth and development. Instead, it’s the relational model that has to provide that foundation. [...] We see SQL as a kind of database COBOL, and we would like to see some other language become available as a better alternative to it.

Summarizing Date's points:

- Real relations don't contain duplicate tuples but SQL tables allow duplicate rows.

- Tuples and therefore relations don't ever contain nulls. Personally, I have trouble understanding how to support this limitation given LEFT JOIN.

- Domain types only provide type aliases, not a new type. For example, you can compare or join on different domain types if the base type is the same. Date uses the example of `p.weight = sp.qty` to show a comparison on domain types that shouldn't be allowed (see the sketch after this list).

- SQL has many ways to express the same query. His example shows at least 12 ways to answer "Get part numbers for parts that either are screws or are supplied by supplier S1, or both."

- SELECT DISTINCT should have been the default instead of SELECT ALL.
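
To make the domain-type point concrete (the sketch referenced above; PostgreSQL syntax, and the parts/shipments tables are assumed to exist with those column types): each column gets its own domain, yet the comparison still parses and runs because both domains decay to their shared base type.

    CREATE DOMAIN weight_t AS numeric;
    CREATE DOMAIN qty_t    AS numeric;

    -- parts.weight is weight_t, shipments.qty is qty_t, and yet this is accepted:
    SELECT * FROM parts p JOIN shipments sp ON p.weight = sp.qty;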

My personal gripes:

- SQL is quite chatty, so much so that you can omit words in some incantations.

- It's hard to dynamically build queries. It composes poorly for anything dynamic, even simple things like ordering by a column name.

[1]: https://en.wikipedia.org/wiki/Christopher_J._Date


Some of Date's gripes are valid, but some are quite questionable.

In SQL you enforce uniqueness of rows by defining a primary key. Date's concern seems to be that you can have duplicate rows if no primary key is defined, but why would you do that in the first place? A table should always have a primary key.

His gripe against nulls is controversial: E. F. Codd suggested nulls himself, so when Date claims nulls have no place in the relational model he is just stating a personal opinion. And while nulls are kind of weird, his suggested alternative is far worse.

His complaint about comparing different types is valid IMHO. SQL is weakly typed and types are silently converted. I think it would be much better if it were strongly typed and values had to be explicitly converted when e.g. comparing a string to a number. The issue with custom types is a consequence of SQL being weakly typed.
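
To illustrate the silent conversion (MySQL behaviour, if I remember the details right; it coerces the string to a number and only emits a warning):

    SELECT '1abc' = 1;    -- 1 (true)
    SELECT 'hello' = 0;   -- 1 (true): 'hello' converts to 0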


What's his alternative to null in a nutshell?


Sentinel values of the same type, e.g. -1 to indicate a missing integer.


Ouch, yeah that's bad.


It seems a lot of the things you list as "wrong" come from a lack of understanding of the reasons behind these choices, and are born out of pure idealism in a vacuum.

For example, allowing duplicate rows is irrelevant, because if you have a primary key, there are no duplicate rows. SQL doesn't require primary keys because relational algebra has no such concept as a "primary key". There are just keys. However without defining a primary key, enforcing unique rows would mean the database silently indexing the ENTIRE ROW'S CONTENTS, including potentially blobs and large text fields. This would obviously be nonsense.

Likewise, having SELECT DISTINCT as the default would make a very expensive processing step the default. An expensive step that in fact doesn't matter, because the vast majority of queries don't produce duplicate results in practice. DISTINCT is optional because the need for it is exceptional, and its cost is high.

Even you don't buy the "have no NULL" argument. So I don't have to defend this. Real-world data is not perfectly "rectangular". Optional attributes are a thing. So having a primitive for it makes sense. Once again, SQL allows you to define a field as non-nullable, so complaining about it being there if you EXPLICITLY WANT IT is silly.

Regarding having many ways to express the same query: that's true for all languages. This is one of the biggest issues in optimizing compilers, canonicalizing expressions so patterns can be recognized. There's no way to do it at the source, so complaining SQL also does it, is like shouting at clouds.

Regarding the type system, type systems can always be better, but let's not forget type systems (typically) exist to eliminate mistakes, not to enable new features. The kind of mistake where you compare "weight and quantity" is not likely.

More subtle errors are possible, like comparing metric and imperial measures, but since SQL is often used in tandem with a system's language (Java, C++) or a script, all this domain logic is offloaded to them and their type system. Even if SQL had a detailed type system, therefore, most people wouldn't bother duplicating their detailed type definitions from Java to SQL or back.

The only remaining issue is that SQL is chatty. Which is quite ironic, given the procedural code for what SQL does would be several times the size of the SQL query. SQL is a high-level language, and it being explicit is fine. A few extra letters here and there for a keyword don't make or break a language. I prefer chatty over cryptic.


Totally agree with your post but wanted to add on to this part:

> More subtle errors are possible, like comparing metric and imperial measures, but since SQL is often used in tandem with a system's language (Java, C++) or a script, all this domain logic is offloaded to them and their type system.

PostgreSQL actually supports user-defined types[1]! You could do something like define [2] a “kilogram” type and a “pound” type and then summing or joining on the column is safe.

That could still get sticky with grams and kilograms. Also, pounds are defined in terms of kilograms. So you could also define a “weight” type. This can have semantics like DATE[3] or BOOLEAN[4] so inserting “1kg”, “1kilogram”, “1000g” or “2.204623lb” all store the same value.

You can then use (define) format functions like TO_CHAR(weight, string) to display weights in grams, lb or whatever you need.

Of course, the other argument here is that all you really need are the primitive types and normalization. A “weight” table can just have a “unit”::string column. You can do conversions with another table such as “from_unit”::string, “to_unit”::string and “multiple”::float. Your “product_weight” table would then just have foreign key relationships to the weight and conversions tables.
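
A rough sketch of that normalized variant (all names invented; it assumes an identity row like ('kg','kg',1.0) exists in the conversions table):

    CREATE TABLE units (name text PRIMARY KEY);

    CREATE TABLE unit_conversions (
        from_unit text REFERENCES units (name),
        to_unit   text REFERENCES units (name),
        multiple  double precision NOT NULL,
        PRIMARY KEY (from_unit, to_unit)
    );

    CREATE TABLE product_weights (
        product_id int PRIMARY KEY,
        amount     numeric NOT NULL,
        unit       text NOT NULL REFERENCES units (name)
    );

    -- report everything in kg via the conversions table
    SELECT pw.product_id, pw.amount * c.multiple AS weight_kg
    FROM product_weights pw
    JOIN unit_conversions c ON c.from_unit = pw.unit AND c.to_unit = 'kg';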

[1]: https://www.postgresql.org/docs/current/xtypes.html

[2]: https://github.com/df7cb/postgresql-unit

[3]: https://www.postgresql.org/docs/13/datatype-datetime.html#DA...

[4]: https://www.postgresql.org/docs/13/datatype-boolean.html


When I heard "NoSQL", I thought we were getting an alternative query language for working with relational data, not throwing away the entire concept of RBMS's.


Every generation wants to rebel about something.

Some rebel about really silly things, such as RDBMS.


To borrow a turn of phrase, the RDBMS is the best database technology going today, which is to say, it's the marginally acceptable one among the set of complete bagbiting loser database technologies that we have to work with out here in the real world.


> half the products representing it adding some sort of SQL dialect support, which if you think about it is hilarious admission of defeat.

I don't see that as an admission of defeat. I see that as an attempt to provide some of the things that are good about SQL without dragging in the entire hairball of the language. Along the way, by choosing which subset of SQL to emulate the designers get to incorporate programming concepts that have come along since SQL. Things like user-defined ADTs, lambda expressions, and extensibility. Oh wait, LISP had all those before relational algebra was a gleam in the eyes of Date & Codd.


- actual abstract data types

- object (or rather, tuple) identity

- replace null with something to indicate missing or unknown. Have sensible null object behavior

And the big one that gives every relational algebra purist fits: make graph theory a first-class citizen of the data model.


It's super verbose, doesn't compose well with the host language primitives, is hard to introspect, full of gotchas and very low level.


Databases have no "host"; they have "clients", which are heterogeneous. So directing this critique at SQL is completely unfounded.

.NET added LINQ and now basic queries are integrated. It was not up to SQL, it was up to .NET

And calling it "very low-level" makes me question if you understand what SQL does and is. It's probably the only mainstream 4th generation language I can think of.

https://en.wikipedia.org/wiki/Fourth-generation_programming_...


Host language, as in the language of your app that embeds some sql code, not host related to the db. The host language may or may not be the client language.

It's low level because it lacks abstraction and anything outside the declarative data-query paradigm is subpar. It has no notion of anything outside of itself, so it's a very restrictive, inflexible DSL living in its own bubble. And even worse, it lacks elegant abstractions for problems we've had for decades, such as pagination, hierarchy, etc.

Also, using LINQ as an example rather shows a lack of familiarity with the field of abstractions over SQL. The state-of-the-art ORM is currently SQLAlchemy; the leading attempt to replace SQL as a DSL is EdgeQL. Both address SQL's pain points while aiming to let you do everything SQL can do.


I think you use "low level" in a different sense than is common. Low level (and high level) typically refer to abstraction level. SQL certainly has a very high abstraction level. But given it is a domain-specific language, it doesn't have much support for functionality outside of its intended domain (except through non-standard extensions). But a kitchen-sink language is not automatically higher level than a highly focused DSL - typically it is the other way around.


In what way is SQL "low level"? I would consider it a very high level DSL.

Agree about the verbosity. The syntax is quite clunky.


Higher level than C, sure. But compared to modern programming languages, it lacks primitives and important abstractions, asks you to do many things manually, and has way too many gotchas to be on par with what we consider high level in 2021.


SQL may look ugly but it got the fundamentals correct.

It's sort of like Git in that way.

SQL is hard to replace because most replacements are either not radical enough to make migration worth it or so radical that it's something else entirely.


The problem with SQL is that it doesn’t have the fundamentals correct — it doesn’t compose well, the language is an arbitrary hodgepodge of keywords, it conflates data administration with data manipulation/retrieval, it runs a ternary logic under boolean terms — leading to both potential for false negatives and false positives, it lacks a real standard (it has one, but it standardizes very little, and gets extended arbitrarily), it lacks any useful standard for libraries — causing your choice of DB to essentially also be your choice of standard library.

As a language, SQL is a downright mess. The only thing it really got right was being based on the relational algebra and model.

It’s also why language tooling is utter garbage in the DBA world. The lack of real autocomplete/suggestions, code formatters, useful error messages, static typing (the tables & views are strictly typed, but your queries can’t be?), the lack of interoperability, etc all stem from how awful SQL is at being a programming language. Jumping from C# to SQL is like viewing a portal into the 80’s — it’s ridiculous. Hell, the standard itself is behind a damned paywall. If that’s not a signal of something going horribly wrong, I don’t know what is.

What I’ve never understood is why open source databases don’t have alternative frontends to SQL — where’s my first-class support for datalog in Postgres? Or even just a SQL dialect that lets me put FROM before SELECT so autocomplete can do something useful for once?



Nobody else has invented any other declarative language it seems.

Come on aspiring post-docs, do your thing and launch a new programming language.

If MIT can launch Julia someone can launch a language that's not SQL.



Hmm.

It's read only, but holy shit please become popular and displace non-ANSI SQL in my life.


> Nobody else has invented any other declarative language it seems.

Postgres was originally built on one: https://en.m.wikipedia.org/wiki/QUEL_query_languages

There’s also D/D4/Dataphor, though I’m less convinced.

Unsurprisingly, that mostly died out after SQL took over everything.


There are multiple alternatives. Problem is SQL is "good enough" and ubiquitous and deeply entrenched. It will be extremely difficult for any alternative language to unseat SQL.


Edgeql is trying.


The closest that many have come to is to develop ORM libraries. LINQ + EntityFramework Core is my favorite.


I'm normally not a fan of ORMs but I've had good experiences with Entity Framework.

I really like the automatic migrations.

For small apps and a single dev it hits a sweet spot.

I find .Net really good at being kitchen sink included and business ready. If you're making internal small apps I've yet to find a better platform. Especially if you're in an MS Office heavy environment.


He said better than SQL.


LINQ is a much nicer SQL IME, at least for CRUD operations — but I generally don’t use the “objects” part of it except for very rudimentary insert/deletes.

However, group by specifically seems to be a hack, and it falls on its face in weird ways almost every time I use it. I’m not sure how much I like its “AST rewriting” model, but it works great when it works. Just the let keyword alone justifies it.


To me, ORM code is better than raw SQL.

But also, a lot of applications can be developed without really knowing the depths of SQL, with the help of the ORM itself.


Is it? All that's ever worked for me with ORMs are simple CRUD apps that don't require referential integrity. We currently need to support one built on Django/ORM where this isn't the case and e.g. deleting an object can require deleting thousands of others.... which the ORM helpfully renders into thousands of queries and can take you half an hour. Luckily the creators soon decided to switch off all timeouts....
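
(For comparison, pushing that cascade down into the database is one constraint and then a single statement; a sketch with invented table names:)

    ALTER TABLE order_items
        ADD CONSTRAINT order_items_order_fk
        FOREIGN KEY (order_id) REFERENCES orders (id)
        ON DELETE CASCADE;

    -- one round trip instead of thousands of ORM-issued DELETEs
    DELETE FROM orders WHERE id = 42;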


I have had amazing luck with Prisma.io on the oracle side



The monkey's paw - you get your wish, but now instead of one annoying query language, you have two!


Standards - the thing that we need more of and that we have too many of.


Of all of the skills I started to pick up twenty years ago, SQL and Bash are the two that have continued to serve me well for my entire career.


And probably learning where and how to find information you need.


Agree. Regex literacy is a big skill to have too.


You should learn SQL because it's likely to be a very different language to the ones you already know. In SQL you don't tell the DB how to get the data (for loops and pointers) but what the result should look like, and the DB figures out an execution plan. This style is well worth practising and appreciating.
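
A tiny example of the shift (table invented): you describe the shape of the answer, and the planner decides whether to scan, use an index, hash join, and so on.

    -- "total spent per customer this year" -- no loops, no pointers
    SELECT customer_id, sum(amount) AS total
    FROM orders
    WHERE ordered_at >= '2021-01-01'
    GROUP BY customer_id
    ORDER BY total DESC;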


> and the DB figures out an execution plan.

And then you spend the next hour trying to decipher the plan and work out how to get it to do something that isn't insanely slow.


Maybe create foreign keys and indexes in your Rails migrations


I had a crazy case where, after adding indexes to things, queries suddenly became much, much slower. I had to look over the plan and realized that removing these indexes made queries fast again. Turns out you can't take for granted that an index will speed things up; it's something you actually have to test and record before/after results for.

I'm still slightly at a loss as to what the issue was but the best I can find is that if you index something with lots of duplicate values, you make the query slower as it has to scan the index and then scan the long list of results.


Indices work best when the number of matching entries is low. There is a metric called index selectivity, which is the number of distinct values of the indexed columns divided by the total number of records. For a boolean column, this would be 2/N, effectively the worst possible index. For a perfect index, it would be 1 (every row has a unique entry in the index). The query planner can pick the wrong index if it happens to be wrong about the selectivity of your particular query, because selectivity has to be approximated.
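
A quick way to eyeball this before adding an index, plus the usual fix for a low-selectivity column (Postgres syntax, names invented):

    -- close to 1.0: an index will likely help; close to 0: probably just overhead
    SELECT count(DISTINCT status)::numeric / count(*) AS selectivity
    FROM orders;

    -- or index only the rare value you actually query for (partial index)
    CREATE INDEX orders_pending_idx ON orders (id) WHERE status = 'pending';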

See for example PostgreSQL's (a fantastic database) documentation on these approximations and imagine a number of ways it could fail [1].

> Assuming a linear distribution of values inside each bucket, [...]

> This amounts to assuming that the fraction of the column that is not any of the MCVs is evenly distributed among all the other distinct values.

> Using some rather cheesy assumptions about the frequency of different characters

[1] https://www.postgresql.org/docs/current/row-estimation-examp...


Thank you for sharing, something to remember, a valuable lesson.


When I first encountered SQL I hated it as a language. But the more I understood the workings of an RDBMS, the more I appreciated its use cases. ETL is probably best handled by the DB if you don't want to get down into optimising the ETL process. I feel having the tight loops in SQL and another programming language as the glue between them is a decent tradeoff, but you have to keep data transfer in mind.


At times it might make sense not to use SQL. That is fine. But I think there is a value in learning SQL. Some reasons:

* The learning-effort-to-reward ratio is good. A substantial subset of the syntax is valid across all of the DBs that support SQL.

* Very easy to remember the syntax. I have not used SQL in a long time but I can still write one without much effort (including the nested queries)

* Pareto: About 70-80% of data retrieval use-cases are well covered in SQL syntax (or a combination of multiple queries like union, creating temp table, etc)

* Tooling: Free & commercial tools which connect to multiple data-sources and provide SQL e.g [1]

[1] https://www.jetbrains.com/datagrip/ (I am not associated with jetbrains)


Or IntelliJ IDEA Ultimate, which contains DataGrip's functionality and that of the other languages they made IDEs for, except CLion (C/C++, Rust debugger) and Rider (C#/F#). Android Studio can connect to the app's SQLite DB. There is a plugin for Vim to connect to DBs, written by Tim Pope.

Just a content user.

(edit) Links for the curious:

https://github.com/tpope/vim-dadbod

There are a UI and completions for it too:

https://github.com/kristijanhusak/vim-dadbod-ui

https://github.com/kristijanhusak/vim-dadbod-completion


I use SQL, or something like it, on pretty much any project that needs to store any significant amount of data. Easily the most bang for my buck out of any of the skills I picked up in high school.

Even if it's not SQL proper, you should learn a query language of some sort and get used to the idea of asking an engine to help you sort and filter your data in a consistent way. Once you have that mental model down, you'll find you can migrate between the SQL-likes and the SQL-not-so-likes comfortably, and pick the best tool for the job at hand.


I just keep getting faster at SQL. It's fun. I used CROSS LATERAL JOIN last night to find the nearest regions to points. Super fast and easy: https://carto.com/blog/lateral-joins/. Right on the data. SQL does change over time, as Postgres adds features, but especially when you use MADlib, Postgis, and other extensions. I mean ... https://postgis.net/docs/reference.html. And everything I learned when I started is still relevant. It's my go-to crunch tool for data processing.
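
For anyone curious, the nearest-region query looks roughly like this (PostGIS; table and column names invented):

    -- for each point, the single closest region by distance
    SELECT p.id, nearest.name, nearest.dist
    FROM points p
    CROSS JOIN LATERAL (
        SELECT r.name, r.geom <-> p.geom AS dist
        FROM regions r
        ORDER BY r.geom <-> p.geom   -- KNN ordering, can use a GiST index
        LIMIT 1
    ) AS nearest;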


I just want to throw out there, one of the weaknesses I've seen is the lack of fast options for pivoting (wide <-> long). I've used the tablefunc module: https://www.postgresql.org/docs/13/tablefunc.html but it's not particularly fast in my experience (maybe they've improved it since it has been awhile) and the syntax is a bit odd. I'm spoiled with R (newer version) tidyr's pivot_wider and pivot_longer.
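
(For long-to-wide, the crosstab-free fallback is conditional aggregation, roughly like this with invented names; it's verbose but avoids tablefunc's odd calling convention:)

    SELECT id,
           max(reading) FILTER (WHERE metric = 'height') AS height,
           max(reading) FILTER (WHERE metric = 'weight') AS weight
    FROM measurements
    GROUP BY id;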

Gosh, yes it's been a while. I posted this, ahem, over 8 years ago: https://stackoverflow.com/questions/15415446/pivot-on-multip.... Guess what: while many things change, I still use SQL.

I should, uh, try it again. Or try PL/R, PL/Python, or PL/pgsql. There are options.


I've built a desktop app that enables you to load CSV files and perform SQL on the CSVs.

I imagine it would be handy for anyone who starts learning SQL, because you don't need to set up a database or use the command line.

If you are interested, please try it out: https://superintendent.app


It's nice. Windows only? (I mostly use a Mac at home).

EDIT: no, it's Win/Mac/Linux, apologies...


Yes, it supports all 3 platforms. Hope you enjoy it!


There’s also csvkit.


So it’s a visual wrapper around SQLite?


Yes! With a few additional bits of functionality added (e.g. regex, parse_date, TSV).

Initially, I wanted it for myself because I used Excel a lot but never loaded CSVs into a database (dunno why). Using SQL is much more convenient than Excel formulas.


> The SQL language is old, strange, and important.

I've always found it strange that something hasn't replaced SQL, particularly for traditional 'web' client/server applications. But SQL is probably the only technology/language that has survived the past 30 years of my career.

99% of these SQL databases probably don't require transactions and ACID behaviour. There nearly always needs to be a layer to convert the rows into a format for use (blobs of JSON for a front end, etc). Complex queries are hard enough to be used as standard interview questions.

I guess the out-of-the-box performance, availability and reliability of SQL databases, that 0.1% of the time when you need ACID writes and transactionality, and the (just about) interoperable SQL make it an easy choice - no one was fired for choosing SQL.


> 99% of their SQL databases probably don't require transactions and ACID behaviour

99% of applications do not require multi-statement transactions or rollbacks, but basic ACID is hard to live without (at least the 'D' and 'C' parts).

You probably don't need a massively parallel transaction system, perhaps you just need to isolate writers out but the fact is that most of what you need comes for "free" with a standard SQL database + failovers.

It's not that no one was fired for picking it, but that people can be hired to fix it when it doesn't quite work as well as it should.


Over 90% of the projects I have been involved in have required ACID. In fact I can't think of any that used a database and did not require ACID, so I think your 99% should be inverted: 99% of all databases require ACID. This is evident from how even MongoDB is ACID now with their new storage engine.


> 99% of their SQL databases probably don't require transactions and ACID behavior.

Until they do. The developer mental overhead for the data storage aspect of their applications is much reduced thanks to ACID.

Just look at the documentation of NoSQL databases. A lot of code is for integrity checks and verification.


I'd say that if you think transactions are needed in just 0.1% of use cases then you probably have a lot of bugs in your application, or you have a very simple one.

Transactions are used quite often, and there aren't really any other options.


My apps may be simple and have lots of bugs, but I'd still say that the vast majority of SQL calls in all apps are to read data that isn't being written to at all, or only very infrequently (hours/days/weeks apart), and where reading possibly temporally inconsistent data isn't important.


If you learn SQL, please take the time to learn how to pass parameters without putting them in the query.

Example in python:

  query = """Update employee set Salary = %s where id = %s"""
  tuple1 = (8000, 5)
  cursor.execute(query, tuple1)
It is very important that the parameters of a query never become part of the command string, or a whole class of injection attacks become possible.

At this point, you could still run into problems if your parameters and command string aren't in sync, so watch out for that as well. (Usually this would happen later as the code was modified for new requirements)


Working on a new SQL client https://arctype.com based on this very thesis! SQL tooling is rough though. My first intro was pgAdmin and MySQL Workbench, which I feel serve to turn a lot of beginners off. If you have to get Python 3 packaging right to even start writing queries... it's just a very tough journey.

My other gripe is about ‘intermediate SQL’ - CTEs, window functions, stored procedures. Intentional practice was needed for me to be able to use those! Unlike Python where you pretty much become a functional programmer without realizing it over time.
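
For concreteness, the kind of query I had to sit down and practise (schema invented): a CTE plus a window function to grab each customer's latest order.

    WITH ranked AS (
        SELECT o.*,
               row_number() OVER (PARTITION BY customer_id
                                  ORDER BY ordered_at DESC) AS rn
        FROM orders o
    )
    SELECT * FROM ranked WHERE rn = 1;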


I am using Spring with Hibernate. It's so powerful; I don't need to write or understand SQL anymore. Sarcasm off: I see more and more juniors without SQL skills. They give me various reasons, from the NoSQL plague to some cloud solution to the example above. SQL seems to be not so fun anymore, and I have a hard time understanding that, since data storage is such a fundamental part of your application. I would even argue that the data structure influences the application architecture to some degree, specifically if you need performance and security.


I think you are completely right, but maybe I believe so in part because of my age.

To me the general undertone in all the arguments I have heard over time is either "we now have this tool/service solving that for us" or "why would I need to learn all this (complicated) stuff, I've done fine without it".

I have just never seen the first variant be true, but that might be hard to see for someone who is reluctant to learn/understand anything about relational data organization.

The second variant I can to some degree sympathize with, because it does seem daunting to me how much time/effort the young kids these days have to spend on keeping up with the ever-changing landscape of new smoke-and-mirrors technology (very little of it fundamentally new or innovative). Learning something that appears to be far removed from the tools they already know will no doubt feel like a challenge and maybe therefore also like a waste of time.

However, the apparent resistance to learning about something like relational data organization (because this is actually about far more than just SQL) feels to me rather troubling for the quality of software design.

The way I've heard some people talk about SQL and relational data organization made me wonder whether they should even be allowed to call themselves software engineers. To me they sounded like architects telling me that concrete is such an outdated and unstructured mess that they no longer need (to know about) it to design buildings.

The problem is, buildings and software will be built anyway (for there is a need) .. and if their architects can get away with saying they don't need to know some fundamental theory, then good luck with the quality of that software. No amount of tools/services that supposedly abstract away the challenges of good structured data design can ever solve that problem.


SQL is useful, but we've spent decades abstracting it away, because the Java/SQL articulation is always convoluted, and juniors do without.

Then they load two tables in Java and join them manually in memory; or they fetch the principal record, and use the getter to retrieve the dependency, so Hibernate does one query per row.

I accuse all ORMs, especially Hibernate, of making it too easy to skip SQL, and I accuse Java of not supporting multiline strings. We have failed, in most languages, at making it easy enough to write SQL yourself. JPA is nice for caching records in memory, but the fact that it became hard to write SQL directly has been way more harmful to performance.


Anyone that does work on the client that should stay on the server, sending wasted data across the wire has already lost it.

To this day I keep writing stored procedures; no need for multiline strings.

And for the rest just use either MyBatis or jOOQ; run away from Hibernate.


You are right, regarding the wastefulness. Additionally, there are extra risks involved when data needs to be transferred between a storage and compute facility.

To a degree I get why architects these days like to separate everything into individual, narrow, easier-to-manage-and-tune services. But separating business logic from the data it runs on may equally be a fundamental mistake when it comes to guarding the integrity of both the data and the logic acting on it.

There are good arguments for advocating that actually ALL business logic should be implemented alongside the data it runs on, inside the database itself. Only the logic specific to presentation should exist separately, in whatever front-end is used. I've seen that done successfully, several times in fact. It does require some programming skills which I've not seen in most programmers these days though. Even less among the young dogs.

Maybe the key question here is: does that say something about proper software design/architecture, or about the state of what programmers these days actually know about the fundamentals?

I believe it's easy to hate on something like SQL from a position of ignorance (to be clear: SQL does have serious shortcomings, without a doubt). Still, in nearly 30 years I've not seen anything come along that is fundamentally better (and gaining enough traction to succeed).

As for those who like to complain that stored procedures are too limited because you would have to implement everything in SQL (I've heard that one a few times too many), plenty of databases have extensions for writing stored procedures in different languages (e.g. Python).


The problem with SQL is integration with the rest of the code. I’ve already mentioned that Java not supporting multiline strings made it exceedingly hard to write SQL in-app and incentivized the use of frameworks; It is equally true that it is easier to deploy a .jar rather than a jar + stored procedures.

To update a stored procedure, you have to execute SQL (as in, load a client, deal with exceptions, then actively load an instruction and execute it), then deal with "what happens if there is already a procedure with the same name, what if it's not ours, what if you don't have permissions", etc. If the upside of SQL is that it is functional, i.e. you declare what results you want and let the DBMS decide how to execute your query (the ultimate declarative language), it is really funny that the DDL is instead an imperative language where you provide commands, and it fails miserably if you give the same instruction twice. They got it all wrong! DDL should be descriptive, not prescriptive! It should not be "CREATE TABLE" but "TABLE ___ IS ___"!

It’s all about difficulty; If Postgres accepted “git push” to deploy stored procedures, we’d immediately see webservers implemented in SQL ;)


I never had that problem, Java projects aren't a tiny Python script that only runs once.


There is also a security aspect to it: with stored procedures, even with stolen credentials there is very granular access that would otherwise have to be provided with tons of views.
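
Roughly the pattern (names invented): the application role can call the procedure but cannot touch the underlying table at all.

    -- the function runs with its owner's rights (SECURITY DEFINER in Postgres),
    -- so app_role can do exactly what it allows and nothing else
    -- (you'd also revoke the default EXECUTE that Postgres grants to PUBLIC)
    REVOKE ALL ON salaries FROM app_role;
    GRANT EXECUTE ON FUNCTION give_raise(integer, numeric) TO app_role;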

I think it is a consequence of six-month bootcamps whose graduates are then allowed to call themselves "engineers" (in countries that allow such things).


> There is also a security aspect to it: with stored procedures, even with stolen credentials there is very granular access that would otherwise have to be provided with tons of views.

In fact, now that you mention this .. I've seen a few times how an application had complete granular access control on every individual record, based on the specific (personal) user credentials you connected to the database with (PostgreSQL). There are some really interesting things possible. It always makes me cringe (at least a bit) when I see yet another application access a database with just a single set of (admin) user credentials. So much missed potential.
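
The Postgres building block for that kind of per-record control is row-level security; a minimal sketch (table and column invented, and I'm assuming this is roughly what that application used):

    ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

    -- each connected role only ever sees (and modifies) its own rows
    CREATE POLICY owner_only ON documents
        USING (owner = current_user);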

> I think it is a consequence of six month bootcamps and then be allowed to call themselves "engineers"

Certainly true, but I think commercial companies that constantly market their products/services as "making everything so much easier, without a need to understand all that pesky complicated matter" are at least equally to blame for the apparent degradation of skills and knowledge.

I'm well aware that almost every generation looks at the younger generation thinking: "what little do these young people actually know". To a degree no doubt in their own ignorance, lacking real understanding of all the new things currently in use. On the other hand, I think this shift from actually understanding technology towards professionals increasingly just becoming convenient (and easy to replace/retrain) "tool jockeys" is nonetheless a real thing too.

That personally saddens me, because I know how much easier it will be for greedy commercial interests (think the current tech giants) to manipulate/hijack technology, with total disregard for actual technological progress (or even quality). A growing group of "professionals" who can't even tell the difference anymore, sure won't help.


> It always makes me cringe (at least a bit), when I see yet another application access a database with just a single set of (admin) user credentials. So much missed potential.

I wanted to use separate credentials, but the problem with app credentials which match userland credentials is that connection pools require that you use one single set of credentials. And they exist because opening a cnx to the db is slow, so it goes faster when you preopen 20 connections.


If you are able to use Kotlin, you can write multiline strings.


I detest hibernate and the whole JPA that came after it. It's more complex than using SQL directly. The whole mapping thing looks simple enough but when I create a one to many relationship on a new project from scratch I always need to reread the documentation because it is not working the way I expected it to work. The reason hibernate was created was to simplify data access. Sorry but I think it failed miserably. On personal projects I use SQL directly. It's easy, fast and simple.


Abstracting the persistence layer is important, but it's also unique to your application. ORMs come with some nice features; however, relying on a full ORM feature set can lead to leaky abstractions. Not to mention you have to learn another DSL to use most ORMs. Why not just use SQL? A nice custom wrapper around binding query parameters is all I need most of the time.

If you use an Onion or Clean Architecture you'll push the SQL into its own layer and you don't need an ORM.


We're currently moving our Druid/Kafka/Spark setup to BigQuery because of how simple SQL is.


You might find https://trino.io/ interesting. It allows you to bolt an MPP SQL execution engine on top of any data source, including pre-built connectors for Druid and Kafka.

It's all ANSI SQL, and the best part is you can combine data from heterogeneous sources, e.g. you can join data between a topic in Kafka and a table in Druid, or even between Kafka, S3 and your RDBMS.

Disclaimer: I'm a maintainer of the project.


My employer is going the other way, from Snowflake -> Druid, to reduce high response latencies / improve freshness.

I am assuming your use cases can tolerate increased latency?


Yes they can, we will move low latency to a smaller, different pipeline.


The point of this article is more on the importance of data integrity in databases than on SQL itself.


I'd like to see SQL as a first-class citizen in a statically typed language with tuples & record polymorphism.

disclaimer: I hate ORMs.

Something of interest: https://smlsharp.github.io/en/


You might look at the now defunct ur/web if you haven't. An old snapshot of the web page:

https://web.archive.org/web/20170429201116/http://impredicat...


Wow I've been digging into the ML languages these last few months. First time I heard of SMLSharp. Looks very good :)


Y'all need to try Elixir's SQL system, Ecto.

It doesn't try to wrap SQL in objects; instead it exposes a DSL that lets you write SQL in Elixir and get Elixir structures back.

The end result is that the impedance mismatch is minimal.

https://www.irrationalpixels.com/posts/database-modeling-wit...


Even as a PM, SQL is one of my most valuable skills. I can handle trivial tasks myself without bothering a data analyst, and speed up the process by a lot.


SQL is an important skill to have, in particular one should learn about database normalization and the normal forms - very important stuff!


And even if you don't want to normalize, at least don't make the DB basically incomprehensible.

Speaking from experience, an "abnormalized" DB sucks. https://news.ycombinator.com/item?id=27842820


Yeah one should know why it’s important to normalize, and then when they go a different path, it is done consciously and not because they are inexperienced and didn’t really know any better.

To the folks out there who don’t want to learn the ins and outs of database theory: please don’t skip it; read up on it, it’s a very important skill and body of knowledge to have and doesn’t take long to master. It does take a few days, perhaps weeks, for all the ideas to “click” in your head.

Whilst we have many fads in IT, there are certain sound principles of computer science, developed over 40-50 years, which one should learn :-) Know the rules before you break them!


For databases, it takes me a conscious effort not to go to at least 3rd NF.

The exception is addresses/names/telephone numbers; I have been burned by ANY assumption I made about those. Those I tend to just store as something like char(255) by default nowadays, unless business requirements force me to split them up, but even then only after voicing my concerns.

Edit: reading up on the addresses falsehoods again, 255 may not be enough.

https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

https://github.com/google/libphonenumber/blob/master/FALSEHO...

https://www.mjt.me.uk/posts/falsehoods-programmers-believe-a...


If you use Postgres or SQLite, TEXT is preferable for strings.


I just wanted to give an easily understood example. Specifically, to point out that even limiting the length can cause problems.


There are reasons for one wide table.

The traditional SQL database uses row-based data structures.

When you shift to a columnar data format for the table, Parquet or any columnar DB, the normalization rules that were developed for row-based storage become extremely different.

The brave new world now is in-memory DBs.

All those 1970s rules, which make sense for row based tables stored on disk, don't apply to data in RAM.

At all.

So it's going to get very interesting.


Shouldn't the normalization rules be exactly the same whether the data is stored row-oriented or column-oriented and on disk or memory?


This is a common misconception.

Columnar stores do indeed enable wide tables but they don’t replace the need for normalization or the benefits. Wide tables and normalization (or denormalization) solve different problems.

From the perspective of a database the difference between RAM and disk is latency. Normalization is still a factor in query performance.


> There's reasons for one wide table.

There are indeed. It should still be a conscious decision. For the DB I work with, it wasn't.


SQL is a query language. Data normalization is a relational design methodology. They aren’t actually related except that relational databases often use SQL and benefit from normalization. You can normalize data models without SQL and you can use SQL to interact with denormalized data.


All of it is completely irrelevant unnecessary complexity. Learn to use Tinkerpop.


Curious that 99.99% of applications use an SQL database, must just be a lot of really bad developers. Am I so out of touch? No, no, it’s the children who are wrong.


You can patronize me all you want, it does not rectify the fact that you are out of touch with modern database technologies and prefer what you know works.


SQL is constantly getting new features. The newest standard is SQL:2016. So while it's old, there are a lot of newer things too.


It's a real testament to Chamberlin and Boyce that no one has been able to come up with anything better in ~45 years.


This article is an ad for an online course.


select * from argument a where a.subject like '%sql%'


(2019)


It is indeed very helpful. Actually, my first class in CS college was SQL in Oracle. I think that's one of the classes that has been most valuable to me without my realising it. Even now I still remember most of the language because I used it every day; the syntax and concepts carry over even when I move to MySQL or Postgres.

Without realising it, one of the most important fundamentals is nailed into your head: suddenly you just understand, when you look at the raw SQL from your ORM while debugging, what went wrong, etc.


SQL is going to eventually die because of its three-pronged relationship model that implies query-time complexity. Learn it if you want, but especially backend and analytics people would be better off learning Gremlin or Cypher.

Edit: downvoting this is not going to make it untrue.


I for one am in theory very interested in graph DBs, but in practice I have yet to see how to implement a solution for my particular set of data (and my ideas about the data) in a graph DB without it becoming a huge ball of spaghetti.

If you're so enthusiastic about graph DBs that's good for you; as for me I enjoy having named rectangular sets of data (i.e. tables and views) consisting of smaller entities (columns / fields) that follow formally described rules (types, domains, check constraints, foreign keys and higher-order invariants). I do have a number of gripes about SQL but right now I feel quite happy about having achieved a reasonable way to integrate an RDBMS (SQLite) into my application environment (NodeJS) where I can fluently switch between formulating solutions in SQL or in JS with few restrictions.

Programming means you have to constantly keep a number of balls in the air, all the time. Having my data in handy tables is a big boon and not something I currently can expect a graph DB to do for me.


That is a truism: everything is eventually going to die. But SQL has been around longer than just about all development technologies in use today and will probably be around a lot longer than things that are popular now. I would guess that it will still be around when people have moved on from React, Gremlin, and Cypher.


Eventually is a long time. SQLite is committed to supporting its file format out to 2050 and it's the recommended format for data archival by the Library of Congress. I suspect SQL will still be going strong long after my final exit().

Thing is, it's pretty neat in many respects (simple enough to do simple things easily, surprisingly flexible), and I say that as someone who really likes graph databases.


You're getting downvoted most probably because your comment provides no arguments why. Why would I be better off with Gremlin or Cypher?


Because it eliminates the complexity of joins: both the mind-complexity and the algorithmic complexity.


For everyone (like, for example, me) who has never heard these two names before (or not as replacements for SQL at least...) maybe the following might be useful:

https://stackoverflow.com/questions/13824962/neo4j-cypher-vs...


You are getting downvoted because it's untrue. So many FAANG companies use traditional SQL, and for good reasons. Not to say your two examples are bad; they have their own benefits. But don't become the guy who believes in one replacement solution for everything without knowledge of it all. It's annoying.


And how many FAANG companies are using a graph database?


For simple queries, row adjacency will beat the graph hands down. And RDBMSs are hard to beat for transactional processing.


Plenty of graph databases, even distributed ones, come with transactional guarantees and ACID to boot. Not everything is eventual consistency.



