Select * from cloud

swalsh · on Sept 30, 2022

Since starting my career, the tech stack I use has changed numerous times. Technologies have come, gone, evolved. But except for a brief period in 2010 where we thought we could kill it, SQL is the one technology that has stayed around. It might be the only TECHNICAL knowledge I learned in the first year of my career that is still relavent. You're probably not making the wrong decision if you decide to build on top of SQL.

strbean · on Sept 30, 2022

> except for a brief period in 2010 where we thought we could kill it

Unfortunately, we never actually tried to kill SQL. We tried to kill relational databases, with the death of SQL as a side effect. This proved to be a stupid idea - non-relational data stores have value of course, but so do relational databases.

SQL, as you point out, is 48 years old. It shows. The ergonomics are awful. It uses natural-language inspired syntax that has fallen completely out of favor (with good reason imho). The ordering of fields and tables precludes the use of autocomplete. It uses orthogonal syntax for identical constructions, i.e.

    INSERT ... (col1, col2, ...) VALUES (val1, val2, ...)

vs.

    UPDATE ... col1=val1, col2=val2, ...

And worst of all, it comes with such a woefully inadequate standard that half of the universally supported features are expressed in a completely different fashion for each vendor.

It is very common to argue that its longevity is due to its excellence as a language. I think it's longevity is due to its broad target audience, which includes plenty of people in less technical roles who are high on the org chart. Tech workers are happy to jump on new technology that improves ergonomics (or shiny new toys that are strictly more interesting because they are new). Executives who's primary way of interacting with the tech stack is querying a read only replica are much less open to learning a new language.

levkk · on Sept 30, 2022

SQL expresses relational algebra [0] which is the language defining data relationships and manipulation. It hasn't changed in 50 years for the same reason why B-Trees and sorting algorithms haven't: it's foundational to computer science.

Database-specific "SQL extensions" are in my experience just administration commands, e.g. `VACUUM` or `CREATE EXTENSION` in Postgres. They help operate the DB, but have little to do with the actual data manipulation.

Killing SQL is like trying to kill computer science: you better come up with something better than a synonym.

[0] https://en.wikipedia.org/wiki/Relational_algebra

dllthomas · on Sept 30, 2022

Nothing about relational algebra says you have to put your projection before you specify what relations are involved. SQL is a bad fit for modern tooling in ways that are a property of the language and are unrelated to it expressing relational algebra.

The notion that nothing has changed about B-Trees or sorting algorithms doesn't, I think, actually hold up... but this question is more like changing the API for a standard-ish B-Tree library than about modifying how the B-Tree itself works.

scarface74 · on Sept 30, 2022

And that’s why Microsoft’s LINQ puts the entities (table names) first and then autocomplete just works. https://learn.microsoft.com/en-us/dotnet/csharp/linq/perform...

As an aside: LINQ is the only “ORM” that is truly worth using. It is integrated into the language and you can use the same expressions and query expressions are first class types that can be passed around as expression trees and are then only translated either as code if you’re using in memory lists, database specific SQL, MongoQuery etc based on the provider you pass the query to.

bb88 · on Oct 1, 2022

> LINQ is the only “ORM” that is truly worth using

I've been thinking a lot about ORMs and they almost usually fall flat at some level since they are not nearly as expressive as a first order language is. I haven't had experience with LINQ but I know the Django ORM is this way.

I'm thinking the best approach if I had time/money would be to develop a better query language that is close enough to SQL for people to learn, but then also translates back down to SQL. LINQ looks really close to this, but I'd really want it to be cross language.

One of the aspects of QUEL that I really liked was that if you couldn't describe what you wanted, you could program it out if you needed to. Say you had some complex ranking algorithm that was hard to write in SQL directly.

mondaygreens · on Oct 1, 2022

> I'm thinking the best approach if I had time/money would be to develop a better query language that is close enough to SQL for people to learn, but then also translates back down to SQL

Have a look at EdgeQL, the query language that powers edgedb.

scarface74 · on Oct 1, 2022

LINQ is a full query language you can left join, right join, inner join limits, etc. It even maps C# functions to Sql functions. If you can express it with SQL, you can express it with LINQ.

Cross language will still be an issue. LINQ isn’t really an “ORM” it is a method to translate C# commands used to work over related collections to expression trees and those expression trees are interpreted/translated at runtime by a provider.

You need support from the language/runtime to treat expression trees as a first class type.

expression tree parser. It’s not for the faint of heart.

jolux · on Oct 1, 2022

> If you can express it with SQL, you can express it with LINQ.

Not if it uses CTEs or other advanced SQL features. It can do basic select, join, group, sure. But that's just scratching the surface of how powerful SQL is. As for requiring language support: in Elixir, Ecto.Query essentially implements a LINQ-like DSL as a library (albeit with macros). See https://hexdocs.pm/ecto/Ecto.Query.html

scarface74 · on Oct 1, 2022

From what I see a CTE is just an expression that you can reference.

In C#, if you did something like this

var seniorMales = from user in users where u.Age > 65 && u.Sex == “male” select u

seniorMales is just an IQueryable<User>

Later on you could say

var gaMales = seniorMales.Where(s => s.State == “GA”)

var floridaMale = seniorMales.Where(s => s.State ==“Florida”)

All three of those are just expressions that have not hit the database yet.

Then when you want to actually run the query and limit the number of rows, you can do

var result = floridaMales.Limit(20).ToList()

It would then create a query including a limit clause just as you would expect it to.

Now if you do

floridaMales.ToList().Limit(10)

It would return all of the rows in the table to the client and it would be limited on the client side (don’t do that).

Worse case, if LINQ can’t express in query syntax, a provider can add its own extensions in function syntax and the provider can still parse the expression tree.

jolux · on Oct 1, 2022

Yes, I've been a professional C#/.NET developer before and understand LINQ and Entity Framework. The difference is that CTEs allow you to reference multiple tables at once and do inserts and updates on the intermediate result set. You can also use recursive CTEs to do hierarchical and graph-style queries, which you can't do with LINQ as far as I know.

LINQ also can't do window functions. Like I said, it covers the basics of SQL that most applications need. But as soon as you need to go beyond that you'll be writing custom SQL.

dropofwill · on Oct 1, 2022

Never used C#, but in Java land we have a similar dsl jooq, which supports everything i’ve ever needed, including window functions https://www.jooq.org/doc/latest/manual/sql-building/column-e...

jolux · on Oct 1, 2022

Yeah, there's nothing else quite like jOOQ as far as I know. I think SQLAlchemy in Python is the only other library that even comes close.

matwood · on Oct 1, 2022

jooq is amazing and one of the things I really miss any time I leave Java land.

lukaseder · on Oct 3, 2022

Then don't leave

michaelcampbell · on Oct 2, 2022

> LINQ puts the entities (table names) first and then autocomplete just works

I grew up with SQL so LINQ for a long time just reads so weird to me. I won't admit how long it was before I realized this fact you stated.

It still reads weird to me, but I get it.

Nitramp · on Sept 30, 2022

SQL does express relational algebra, but OP is arguing it expresses it poorly, in a cumbersome way. There's nothing foundational about how exactly SQL chooses to express relational algebra, you could imagine many different syntaxes and specific semantics.

I concur, SQL is a liability in many ways.

bryanrasmussen · on Sept 30, 2022

>Killing SQL is like trying to kill computer science

SQL is a language that expresses relational algebra, killing it is like killing any other language i.e. nearly impossible because there is still code sitting around written in it, thus needing competent programmers to maintain, competent programmers who will be asked to write more code and will choose to do it in SQL if they can get away with it, rinse and repeat and so forth.

Once a language has a certain installed base it may be eternal - what size that installed base is I don't know.

bb88 · on Oct 1, 2022

The installed base is kind red herring, because most of the worlds largest companies use Oracle to hold their data. So you'd have to get Oracle on board.

Could there be a better alternative, oh yes, most certainly.

E.g. Every company I worked for needed to do SQL migrations. Unfortunately that isn't a part of SQL. Could it be? Yes! Is it? No! So everyone re-invents the wheel...

samatman · on Sept 30, 2022

The Third Manifesto[0] is 27 years old at this point. If it were possible to take seriously the proposition that SQL and relational algebra over databases were equivalent before its publication, it hasn't been since.

[0]: https://en.wikipedia.org/wiki/The_Third_Manifesto

I've been trying to get my hands on a corpus of Dataphor code, because Tutorial D is unsatisfactory for different reasons than SQL. What I've been able to glean of Dataphor is quite promising, but it isn't much.

stefan_ · on Oct 1, 2022

You mean it butchers relational algebra; composition is the name of the game, and yet that's somehow incredibly cumbersome in SQL.

blacklight · on Sept 30, 2022

Db-specific extensions unfortunately aren't only admin commands. They include the whole PL/SQL thing, with concepts like triggers and stored procedures that should definitely be fundamental in SQL, but that nobody bothered to standardize, and eventually everybody implemented as they liked.

nine_k · on Sept 30, 2022

Stored procedures and even triggers are non-relational and have little to do with SQL proper. They are like Lua scripting inside nginx which are not directly related to anything HTTP.

On one hand, it might be nice to have completely portable triggers. On the other hand, the lack of standardization allowed to have approaches as differer as PL/SQL and T-SQL to emerge. It would be terrible to be stuck with some ancient COBOL-like SP / trigger syntax, required by the standard.

blacklight · on Oct 2, 2022

As someone who has worked across many DBs, I find the lack of a unified standard for procedures and triggers appalling and very time-consuming.

Granted, stored procedures and triggers aren't part of the relational model (strictly speaking). But they are the first layer built on top of the core - the one that packs multiple queries/statements together, introduces variables, and allows custom logic to be executed when the data changes.

The lack of standardization means that migrating from e.g. Oracle to Postgres/MySQL involves knowing the dialects and statements supported by each of them in order to migrate the PL/SQL or T-SQL logic - and, especially in the case of MySQL/MariaDB, some of those constructs may not be available at all.

That creates really a lot of friction when it comes to database migrations, especially for large databases. I've myself been working with this stuff for years, but I still have to regularly lookup how to write an IF or declare a variable for this or that DMBS, since we have such a proliferation of dialects and standards that one person can't keep all the variations in their mind.

nine_k · on Sept 30, 2022

Relational algebra is what makes SQL great: some actual strict math behind the design.

Ad-hoc syntax with hard-to-compose parts,is what makes SQL arbitrary and quirky. I would like a more algebraic syntax, with uniform, composable parts.

randomdata · on Sept 30, 2022

> SQL expresses relational algebra

Actually, no. Relational algebra is based on sets. SQL is not. Perhaps you are confusing SQL with QUEL, which was used by Postgres in its younger days? QUEL evolved out of Codd's original Alpha language.

matwood · on Oct 1, 2022

How is SQL not based on sets? One of the big issues I see when people first come to SQL is not thinking in sets. Instead they think in individual records and loops, which is the exact opposite on how to think about SQL.

randomdata · on Oct 1, 2022

Sets are collections of unique elements, and indeed relations are sets of tuples. But SQL only sticks to the tuples part, foregoing sets. It's quite possible, and quite likely if you're not careful, to end up with duplicate entires in your results to an unwanted effect.

SQL is only relational-like. With experience and care you can craft your queries in such a way that you do get true sets (e.g. using DISTINCT), but I see beginners – and sometimes even experts in a rush – overlook this quirk in a complex query resulting in what might seem okay in limited testing, but blows up when there is more data than the bare minimum needed to test. A language that truly follows Codd's relational model would produce sets always.

I expect SQL is considered difficult to learn exactly because of this deviation. It's quite unintuitive, especially if you go in thinking that it is actually relational. Nothing you can't work around with sufficient knowledge and experience, but that added knowledge and experience required is where the difficulty no doubt stems from and undoubtedly scares many away out of frustration when it doesn't work like you think it should in the interim.

lelanthran · on Sept 30, 2022

> It uses orthogonal syntax for identical constructions, i.e.

Regarding your complaint ... those aren't identical constructions.

Insert has to have the syntax it does because it lets you insert multiple rows all with different values in a single statement:

    INSERT ... (C1, C2) VALUES (v1, v2), (v3, v4), (v5, v6)

Update doesn't allow that. The syntax for each of the above makes sense for the specific intention of each statement.

RussianCow · on Sept 30, 2022

They could have consolidated the syntax for both, though. No reason you couldn't have this instead:

    INSERT ... VALUES (C1=v1, C2=v2), (C1=v3, C2=v4), (C1=v5, C2=v6)

This would actually be more powerful because it would allow you to arbitrarily omit columns on a per-row basis, which you can't do with SQL today.

smallnamespace · on Sept 30, 2022

You can omit columns on a per-row basis with the DEFAULT keyword, it's just more cumbersome if your rows are very sparse. Your proposed syntax of:

  INSERT ... VALUES (C1=v1, C2=v2), (C3=v3, C4=v4), (C5=v5, C6=v6)

in standard SQL is

  INSERT ... (C1, C2, ... C6) VALUES (v1, v2, DEFAULT, DEFAULT, DEFAULT, DEFAULT), (DEFAULT, DEFAULT, v3, v4, DEFAULT, DEFAULT), ...

layer8 · on Sept 30, 2022

It would maybe be more powerful, but also less efficient. If you want to insert different sets of columns, you just use multiple INSERT statements. The point of the current syntax is to be able to efficiently bulk-insert many rows using the same set of columns. That set of columns needs to be validated and mapped to the internal offsets (or whatever) only once for inserting an arbitrarily large number of rows, instead of having to repeat that validation and preprocessing for each row.

wruza · on Oct 1, 2022

And the point of gp is that SQL could do some work and sort these (a=b) values into groups that fit its inner requirements on its own. You can just omit a useful feature because it creates a minor inconvenience at a preparation phase, or you can include it. This is exactly these bad ergonomics some grand-commenter talked about.

NavinF · on Sept 30, 2022

What a bizarre use case. I highly doubt your scheme performs better than just using PREPARE AS INSERT to create a prepared statement and execution plan on the database side. Or are we still generating SQL using string interpolation in 2022?

ipaddr · on Oct 1, 2022

Most people have an orm between them and the database.

Think bulk inserting into via the command why the format exists vs what you do in your language and/or framework.

NavinF · on Oct 1, 2022

If you call row.save() on one ORM object at a time, you'll get neither the performance benefits of prepared statements nor that of multi-row inserts. If you save many objects at a time, the ORM library will use prepared statements.

regularfry · on Oct 1, 2022

We're certainly still slurping up bulk SQL commands from raw text files in 2022. Kinda odd not to come across that use case.

Shorel · on Oct 1, 2022

I think it is pretty standard usage for ETL tools.

You are not inserting a few dozen records, you are inserting a several thousand records each query.

In fact, sometimes insert speed is measured in MB per second, not records per second.

lelanthran · on Oct 2, 2022

They still aren't identical constructs, because the intended action is very different; multiple insertions result in multiple distinct records all with different values while multiple updates result in multiple records with the same new values

Having the syntax identical is inferior in every way except for the case where you want to build the string up programmatically, and that is a trivial problem to solve for the caller.

I think your proposal works better for an UPSERT keyword, as that is yet another different intended action.

aidos · on Sept 30, 2022

And further, insert … (select …)

dragonwriter · on Oct 1, 2022

> Unfortunately, we never actually tried to kill SQL. We tried to kill relational databases, with the death of SQL as a side effect.

I don’t know who “we” is, but, before NoSQL tried to replace RDBMS(well, starting before, these things overlapped it), there were efforts to provide alternatives to SQL for RDBMSs, some examples:

Query-by-Example. https://en.wikipedia.org/wiki/Query_by_Example

the D class of data languages. https://en.wikipedia.org/wiki/D_(data_language_specification...

mfashby · on Sept 30, 2022

No mention of PRQL yet, it's an effort to tackle some of these shortcomings

https://prql-lang.org/

lofaszvanitt · on Sept 30, 2022

Which is plain terrible and unnecessary. SQL works, it's easy to understand, doesn't need any kind of abstractions or stupefying measures like PRQL.

sanderjd · on Sept 30, 2022

I think your last sentence gets at the crux of it. I think SQL is truly great for that use case of analytical queries over a read only database / data warehouse / parquet / whatever (ie. "OLAP" use cases). I think it's less great for the "OLTP" use cases. It still works fine, which is why it sticks around, but other approaches are more convenient for doing lots of smaller inserts, updates, and lookups and fewer table scans and aggregations.

omegalulw · on Sept 30, 2022

You are selling SQL way short. SQL has stuck around because, compared to its complexity, it lets you do incredibly powerful things.

strbean · on Sept 30, 2022

Look at the analogy to general purpose programming languages.

Every single one, without exception, lets you do incredibly powerful things compared to its complexity.

This hasn't stopped the progression of general purpose programming languages, and I think the vast majority of developers would agree that there have been huge improvements in the last 50 years.

doctor_eval · on Sept 30, 2022

TIOBE [0] reckons that the top 10 programming languages are currently Python, C, Java, C++, C#, Visual Basic, JavaScript, "Assembly Language", SQL and PHP. The first five languages represent more than 50% of the users in TIOBE.

Apart from SQL, all of these languages are imperative, and most of them derive from C in some shape or form; K&R C was published in 1978, 44 years ago.

So, while I'd agree that there have certainly been huge incremental improvements in imperative programming in the last ~50 years, when we look at what people use on a daily basis, it's not really clear to me that the changes have been any greater in significance than the changes that SQL has gone through during the same period.

[0] https://www.tiobe.com/tiobe-index/

wruza · on Oct 1, 2022

Yet every one of these “ancient” languages allows you to have human-readable common expressions, functions, sorts of classes, identifier scoping, heavy code reuse, flow parametrization, higher order operations (even C with some effort), libraries, frameworks, package management, interoperability, to name a few.

SQL is really good at its in-place one-time this-specific-case relational querying.

Things SQL is not even mediocre at: programming.

patmorgan23 · on Oct 1, 2022

But aren't most people using ORM's? In which case the SQL code is more like a kind of byte code. SQL definitely is supper powerful, but it's also embedded deeply in all of the RDBMS's and tooling around them etc.

Lots of stuff untilizing SQL and the ability to do lots of cool stuff with SQL, doesn't actually mean that the Developer Experience or ergonomics around using SQL are good.

duxup · on Sept 30, 2022

It's also great that much of the time if I come up with a crappy query... SQL knows how to optimize it for me. Easy to use, and fixes my stupids.

OJFord · on Sept 30, 2022

That's nothing to do with SQL, that's the database engine you're using (lexing, parsing, then) planning & optimising your query as best it can.

Much like C code say, it's not a feature of C the language that gcc, for example, can optimise your crappy code for whatever target platform.

jayd16 · on Sept 30, 2022

I also feel like part of the staying power of SQL is because it's the lingua franca of the database. I wonder if it's time to build and popularize some lower level language that could be targeted by ORMs and others to speak to DB engines. If we had a more flexible assembly language we might get more freedom at the programmer language level.

johannes1234321 · on Sept 30, 2022

MySQL has something in that direction, where some operations can be expressed as expression trees by using Protobuf where a client can put some arbitrary query language on top: https://dev.mysql.com/doc/internals/en/x-protocol-messages-m... it is still a bit limited, though (like no join ...)

scarface74 · on Sept 30, 2022

C# already does that, it converts LINQ to expression trees that are then converted by providers to their target query language. I’ve used it on one project to pass around the same query expressions to SQL server and Mongo - you pass a type of IQueryable<T> which represents an expression tree.

For instance

var seniors = query(from u in Users where u.Age > 65 && u.Sex == “male” select u)

IEnumerable<Users> query(IQueryable<Users>)

matwood · on Oct 1, 2022

I've been using SQL for ~25 years and don't think the ergonomics are nearly as bad as people say. In your example the insert is done that way for good reason.

I know people hate listing the fields before the tables, but when I deal with data that's how I tend to think. I need X, Y, Z, now where do I get it from and do I need to filter.

From a vendor standpoint, it can take time getting new standards implemented, but it's definitely not half. One of the big issues here is that MySQL was and still is in some ways woefully inadequate. It's shortcomings are also often make people think they are relational database shortcomings in general.

Aeolun · on Oct 1, 2022

> The ergonomics are awful. It uses natural-language inspired syntax that has fallen completely out of favor

I think SQL is so popular and has staying power for the same reason that our ridiculous gregorian calendar has staying power.

ako · on Oct 1, 2022

The reason we can't replace SQL is that it is the lingua franca between many client tools and data sources. From programming languages, to Analytics tools, to ETL tools to bulk data transfer tools, everything knows how to define SQL queries. On the datastorage side, almost everything is standardizing on SQL as its API: RDBMSes, but also NoSQL storage (etc, hadoop + hive, etc), data virtualization layers (denodo, aws athena, steam pipe).

You can't change this by providing a better query language in one of the datasource products, or by providing a better ORM. Because of the network effects it will be very hard to come up with something better that is universally adopted. Many have tried though but it's always limited to a small stack: MDX, DAX, Linq, Dplyr, Graphql, etc.

There may be opportunities to replace SQL as soon as we need something that moves away from relational algebra. Currently you see al lot of adoption on graph storage and graph query languages in the Data Fabric space, as users need to build queries reasoning about relationships between different datasets in the enterprise.

The other reason Data Fabrics could offer an opportunity here is that they're basically adding an layer of abstraction between all data sources and data consumers, and they the possibility to translate SQL into something else, e.g. graphql.

ahallock · on Oct 1, 2022

> SQL, as you point out, is 48 years old. It shows.

That's irrelevant and a fallacy. Plenty of amazing things are old and have stood the test of time. Vim and SQL have empowered me my entire career.

denysonique · on Sept 30, 2022

> It uses orthogonal syntax for identical constructions

There is an alternative INSERT syntax matching the UPDATE SET one:

  INSERT INTO table SET col1='val1', col2='val2';

remram · on Oct 1, 2022

The SELECT syntax is the one that's backwards IMHO (and that order hurts autocomplete too).

xwolfi · on Sept 30, 2022

I disagree, the people high on the org chart are clamouring for cloud-based low structure stores on microsoft azure cloud maintained in India, while we warn them that we can migrate at a giant cost for a net loss in efficiency.

SQL and dbs are fine, they work, we could discuss beautifying sql for sure, but frankly we shouldnt solving data structuring problems that were already solved 50 years ago: we have real business to support instead.

aabbcc1241 · on Oct 4, 2022

The ergonomics issue of SQL can be workaround by transpiler.

I do like SQL the syntax but it's hard to discard the ecosystem around SQL.

For instance, I made better-sql [1] recently which generate SQL from a language similar to GraphQL and EdgeDB query language.

[1] https://better-sql.surge.sh

emmelaich · on Oct 1, 2022

I hope that those two syntaxes could be unified in some future version.

Also, surely allowing FROM and SELECT clauses to be swapped in order shouldn't stress out any parser too much.

wruza · on Oct 1, 2022

FROM and all subsequent JOINs, to be precise. They all generate a single namespace which SELECT uses. As far as I understand the grammar, FROM isn’t even a separate clause, but is itself a separator between a columnset and an expression of the form `t1 JOIN t2 ON cond12 JOIN t3 ON cond123 JOIN …`, where t<n> may be either a table name, or an arbitrarily nested subexpression of the same form.

mastersummoner · on Sept 30, 2022

When I was first learning to code, I once told someone that working with SQL felt like trying to wrestle with a mainframe, and I stand by that assertion.

codegeek · on Sept 30, 2022

To my primitive brain, NoSQL always sounded like "I am going to makeup data model and integrity on the fly and let the user figure out how to handle it". No Thanks. But may be I don't get it. GraphQL honestly gives me the same vibes. Happy to be corrected.

RussianCow · on Sept 30, 2022

> GraphQL honestly gives me the same vibes.

Not that this is particularly relevant to the discussion, but GraphQL is actually much closer to SQL than most NoSQL solutions because it's statically typed and the schema is statically defined. The GraphQL schema acts as the contract of what the API can serve to the client, much like a schema in a relational database.

blagie · on Sept 30, 2022

"Replacements" for SQL have been around as long as SQL. Remember the OODBMS rage of the eighties and nineties?

https://en.wikipedia.org/wiki/Object_database

I also wanted to kill old, obsolete, crummy SQL when I first learned it as a teenager. It reminded me of FORTRAN or COBAL or something.

It took quite a lot of theory and maturity to understand why RDBMS is the right paradigm. SQL is imperfect, but it's good enough to not be worth replacing.

Whenever I've seen benchmarks of postgresql managing JSON versus mongodb, postgresql won (at least comparing apples-to-apples; postgresql is much more conservative by default, in terms of reliability and data consistency, than mongodb). There's literally no reason to going with a lot of NoSQL databases over using a SQL database as a key-value store.

flockonus · on Sept 30, 2022

> SQL is imperfect, but it's good enough to not be worth replacing.

Great way to put it.

> There's literally no reason to going with a lot of NoSQL databases over using a SQL database as a key-value store.

To be devil's advocate, the arguments I hear range more on ease of ops/scalability for noSQL, apparently some do it better than SQL implementations, and some drivers can make trivial operations easier as well. That said, I'd pick PostgreSQL for pretty much any new project I get to work on.

blagie · on Sept 30, 2022

It turns out that most SQL databases, when used as a key-value store, shard better than most NoSQL databasing.

You run into issues with things like JOINs across shards, but there's literally no upside even there to NoSQL versus using a SQL databased in a disciplined way. I've designed systems with the same type of infinitely horizontally scalable KVS-type storage for all the changing state in SQL.

SQL also means you can do things like:

- Do local joins (if e.g. all the data for a user is in the same shard)

- Keeping a small set of relational data (not the stuff which needs to scale) and have one technology

- Use read replicas to scale some of the stuff which doesn't fit in the "infinitely horizontally scalable KVS" model

.... and so on. You can't not think about it, but if you do think about it, it's all upside and no downside.

strbean · on Sept 30, 2022

> I also wanted to kill old, obsolete, crummy SQL when I first learned it as a teenager. It reminded me of FORTRAN or COBAL or something.

> It took quite a lot of theory and maturity to understand why RDBMS is the right paradigm.

You and the guy you are replying to are doing it again, conflating SQL and relational databases. :(

blagie · on Sept 30, 2022

No. I'm not. I said:

1) "RDBMS is the right paradigm." That's high praise. RDBMS is exactly the Right Thing, with a trademark and all.

2) "SQL is imperfect, but it's good enough to not be worth replacing." That's definitely not high praise. It means that this implementation has all sorts of warts and annoyance. However, that doesn't rise over the bar of replacing.

As a teenager, I definitely did conflate the two. Perhaps that's where your misread came from?

setr · on Oct 1, 2022

It probably comes from 90% of discussion on SQL revolves around de-conflating the two. It’s pretty much impossible to complain about SQL without clarifying the difference, and even then. Really I think we should just permanently assume the other party has never conceived of such a difference, no matter how much they insist otherwise.

SQL is here to stay, if only because few can be convinced that another language could query the RDBMS directly (without first compiling down to SQL). Peolle also keep persisting in this belief it’s standardized, for god knows what reason

blagie · on Oct 1, 2022

In abstract, I don't see a lot of downside to compiling down to SQL as an intermediate format. Or vice-versa, for that matter.

In practice:

1) Most attempts to do this historically have missed the point of an RDBMS (e.g. most ORMs).

2) SQL is cumbersome, but it's not quite cumbersome enough I'd want to bother with something else.

As a footnote, I find SQL-grade standards to be incredibly helpful. In 2022, I wish there were a strict standard, but in new domains, standards like this mean:

1) I can read code for virtually any database, with maybe gentle use of search engines, unless I get into really hairy corners. Ultimately, those, I can look up too.

2) If I write code for a new database, I don't need to learn anything; I can understand syntax with a few web searches.

3) For 90+% of SQL, I can write automatic scripts quickly and easily to translate between databases (when migrating, the remaining few percent, someone can do by hand, and it's a very tractable chore).

That's not the case jumping into a graph database or other stuff.

At the same time, they don't over-constrain things. If you want your database to have a BCD type, JavaScript stored routines, or some wonky form of virtual tables, you can.

Now, in 2022, we know enough to standardize all of this stuff, but I'm not sure we did when these tools were coming out.

As a footnote: I'm working in a domain with no standards, and if everyone could just pick JSON (or XML, or just about any one thing), we'd already have 50% of the benefit of a full standard. If people standardized a few nouns and verbs (e.g. 'user' versus 'user_id' versus 'actor' versus 'agent' versus ...), we'd be another 50% of the way there. Systems do different things, and I don't think going 100% of the way makes sense until we understand the domain better, but partial standardization is a huge win. I've been pushing hard for having partial standards, and hard for not having full standards yet.

xapata · on Sept 30, 2022

Sometimes you just want to store data and worry about validation later. I'm in that situation now, with different tasks requiring different kinds of validation for the same data. Sometimes I want all the postal codes that users have entered, sometimes I want only the valid postal codes.

catlifeonmars · on Sept 30, 2022

Just to be pedantic you can use SQL to query NoSQL databases, just look at the PartiQL ExecuteStatement API for DynamoDB. I assume your question is about why anyone would use a NoSQL databases instead of a relational database. The short answer is scale. If you have well define access patterns that rely only on a primary index, you can effectively scale to billions of QPS. If you want to make relational queries, and/or can’t know your access patterns in advance, use a relational database.

tlarkworthy · on Sept 30, 2022

NoSQL is like simple assignment to storage and one layer of obvious indexing. Its lower level, and therefore much faster and predictable because there is no query planner guessing how to assemble things.

Tuning SQL on the other hand is a guessing game with the query planner. You can express a lot more but 1% of those expressions could down your DB.

I like both, but if you want performance NoSQL is the winner.

the_af · on Sept 30, 2022

> if you want performance NoSQL is the winner.

Performance for which use cases?

It matters to clarify this because if you get rid of every use case SQL+RDBMS optimize for, NoSQL is obviously going to be the winner. A car with no safety measures and without most consumer-oriented features is probably going to be "faster" than one that has them.

wyager · on Sept 30, 2022

I have plenty of beef with SQL, but your insert versus update example doesn't make sense to me. That's a perfectly reasonable difference; insert is always a dense operation while update is usually a sparse operation.

netule · on Sept 30, 2022

It's also the simplest use case of INSERT imaginable and fails to be applicable with a slightly more complex one:

  INSERT INTO table1 (col_1, col_2, col3)
  SELECT COUNT(col_6), col_4, col_5 FROM table2 INNER JOIN...

dllthomas · on Sept 30, 2022

I think in principle we could allow

    INSERT INTO table1
        col_1 = COUNT(table2.col_6),
        col_2 = table2.col_4,
        col3 = table2.col_5
    FROM table2 INNER JOIN ...

doctor_eval · on Sept 30, 2022

You can always do

    update t set (x, y) = (n, m);

At least you can in more recent versions of Postgres, and I think it’s standard SQL.

duxup · on Sept 30, 2022

Hey man if SQL is so great why are people always trying to kill it??!?!?

/s

tiffanyh · on Sept 30, 2022

Declarative vs Imperative Language.

A huge benefit of SQL is that it's a declarative language (you describe what the output should be); whereas with imperative languages like C/Python/Ruby you have to describe how to generate the output with specific instructions and procedures.

jayd16 · on Sept 30, 2022

There's plenty of improvement to be had while still keeping it declarative.

jooz · on Sept 30, 2022

I consider SQL imperative. You say exactly how to do what you want, not what do you want.

If we take as example a table with a single row:

In imperative (SQL):

INSERT INTO TABLE1 (name, wage) VALUES ('John','50k');

Then one day John gets a raise:

UPDATE TABLE TABLE1 SET wage = '60k' where name = 'John'

In declarative (pseudo code in yaml):

- table: TABLE1 - name: John wage: 50k

Then one day John gets a raise:

- table: TABLE1

  - name: John

    wage: 60k

zmmmmm · on Sept 30, 2022

It's sort of a weird hybrid monster really, some times it's the worst of both worlds.

You state what you want, yes - but you are forced to articulate it as a specific set of table navigations / logistics through the relational model, as if you were writing the implementation. Then, however, the database then may choose ignore those and do something else to resolve the data you asked for if it wants, if it can prove the outcome is equivalent.

For example I want all the ice creams bought by John. Even though the schema knows the foreign key relationship between "user" and "purchase" I have to tell it back to the database engine in my query. But even after doing that there's no requirement the database will actually implement the steps I was forced so ungraciously to specify. It may "optimise" them away and do something else.

tiffanyh · on Sept 30, 2022

You’re describing to the RDBMS what you wanted added to the database. The SQL/RDBMS takes care of how that’s done.

In C, you’d have to program how the data should be stored (data structure) and written to disk.

OJFord · on Sept 30, 2022

> In C, you’d have to program how the data should be stored (data structure) and written to disk.

Just like declaring the columns you're specifying values for and then the values for them?

If SQL was declarative, you wouldn't have an error on duplicate CREATE, there would be no CREATE OR REPLACE, and changing column types would not be an error. It would just 'table x should be like this, make it so'.

(Actual queries/projections I would say are declarative, just the nomenclature pretends they're not. (Select from join all sounds very imperative, but really you're just describing what you want, and have no say over how it's retrieved.))

runeks · on Oct 2, 2022

Is there a formal specification of what “what” and “how” means in this context? I’m familiar with your description, but it appears subjective to me because the phrases “what the output should be” and “how to generate the output” are poorly defined.

411111111111111 · on Sept 30, 2022

You might notice that the only way to create a declarative language is to use an Imperative language.

Declarative languages are basically a DSL, which (hopefully) translate the desired steps into efficient instructions. Nonetheless, your cpu will execute imperative code at the end.

SQL is an example of a very well established and generally well done declarative language, but that doesn't mean that declarative languages are inherently better.

freedomben · on Sept 30, 2022

I think you're reading into something that isn't there. GP never said anything about declarative being better than imperative, merely that they are different approaches and SQL is declarative.

But even if they had, your argument that the CPU ends up running imperative code makes it better seems silly. The CPU ends up "running" machine code and a compiler has to turn the vast majority of imperative code into a different form. Does that make machine code better than assembly? better than C?

I don't think so. They are simply different levels of abstraction, each with their own pros and cons. Neither are "better" than the other. Each are "better" at some tasks and worse at others.

GP said there was a "benefit" with SQL being declarative, because when you want to abstractly request some arbitrarily structured data, it's beneficial in most cases to not have to know exactly how to find and retrieve that data. It's certainly not "better" if you need to ensure some specific bit format on the hard drive. But it's "better" if you want to succinctly express a query that is broadly reusable and understandable even to people who have no idea how the database works on the inside.

411111111111111 · on Sept 30, 2022

You're reading something into my comment I didn't want to say too

I just felt the need to point out that declarative languages are essentially always a DSL, wherever this DSL is actually performant and should be used depends on it's implementation.

Generally speaking, SQL is very well implemented so using any of the well established databases is probably a good choice. Nonetheless, few declarative languages come even close to SQLs efficient implementation so they're very rarely the answer.

acmiyaguchi · on Sept 30, 2022

There’s nothing stopping you from writing a compiler or interpreter in a non-imperative language like prolog.

Also performing the equivalent of a table join in a non-declarative query language is going to be a burdensome task. It’s certainly beneficial to let a query planner figure out the details for you, instead of iterating over all the rows of your data.

411111111111111 · on Sept 30, 2022

Which compiles it to bytecode, which is Imperatively executed on your cpu.

You didn't think that one through, did ya?

I'm not even sure where your outage comes from. DSLs aren't inherently bad either.

pjmlp · on Sept 30, 2022

Design and analysis of hardware for high-performance Prolog

https://www.sciencedirect.com/science/article/pii/S074310669...

skwirl · on Sept 30, 2022

It means you can let the database engine determine the best way to get you what you are after. If you write imperative code, you are telling the DB exactly what to do and you don't really give it much ability to optimize your queries. Before SQL most DB engine queries were imperative and it was terrible... which is why we don't do it anymore.

0xbadcafebee · on Sept 30, 2022

People keep twisting themselves into knots trying to fit things into these labels, but the net effect is unrelated to the description.

Using a "declarative language":

  my_bucket = aws_s3_bucket(aws_region, bucket_name)

Using an "imperative language":

  if aws_s3_bucket.exists(aws_region, bucket_name):
    my_bucket = aws_s3_bucket.update(aws_region, bucket_name)
  else:
    my_bucket = aws_s3_bucket.create(aws_region, bucket_name)

I prefer the latter. In every DSL I've ever used, people end up needing to handle weird edge cases, and it's extremely hard to do that with a "declarative-only" DSL, so they end up adding imperative-ness to the DSL. Give me a regular "imperative" programming language and lots of convenience functions that do black magic behind the scenes, and I'll do regular programming when the black magic falls short.

This is basically why AWS CDK / Terraform CDK / Pulumi exist.

mirekrusin · on Sept 30, 2022

...and you have nice imperative race condition right there in your simple example.

johnthescott · on Sept 30, 2022

bravo.

0xbadcafebee · on Sept 30, 2022

From a simplistic point of view, yes, but from a practical standpoint, no. IaC assumes a global lock on any resources being managed. As soon as that expectation is violated, everything starts breaking, regardless of how good the code is.

samatman · on Sept 30, 2022

Furthermore, the only way to implement an imperative language is to use a declarative language.

Verilog and VDSL, you see, are declarative. They have to be.

It's not a better or worse thing. It's a domain thing.

daxvena · on Oct 1, 2022

Only if you limit yourself to current CPU architectures. Even GPUs right now are significantly more declarative than CPUs, and it definitely seems like some point in the future we could end up moving towards asynchronous circuits built with memristors.

munificent · on Sept 30, 2022

I'm sure this will ruffle some feathres, but the technical skill that has been the most persistently valuable across the past 25 years of my career is object-oriented programming and design. It's not the only paradigm I use, but being able to organize large programs in terms of OOP has been incredibly valuable and hugely productive for me across several languages and domains.

kortex · on Sept 30, 2022

On the contrary, I think that's a self-fulfilling perception. If assets are already organized as objects, then it'll continue to work well with oop. But if you take the approach from the get-go to use data and FP approaches, things can be quite nice. Examples:

- interface/trait based programming / structural subtyping (widely used in go/rust, increasingly in TS and python)

- terraform

- react

- aws lambda / cloud functions

- flow-based data processing (the whole of deep learning, spark/hadoop)

- and of course, anything declarative DSL based (SQL, jq and friends)

So I would counter that the more valuable skill is "how do I solve problems in terms of applying and composing transforms to data"

To clarify, since everyone has their own definition of OOP, and of the four pillars, Abstraction, Polymorphism, aren't at all unique to OOP, and Encapsulation is just Abstraction: the defining features of OOP are inheritance and poking-and-prodding-state into opaque objects. Inheritance is subsumed by interfaces / structural subtyping, and poking at state is contrasted with reactor patterns, event sourcing, persistent data structures, etc.

Oop really shines at the middlin-low level, in languages without a lifetime (state for things like IO resourcese, at the GUI widget level, and the (micro)service level, which is more like the original smalltalk sort of objects, in which case inheritance isn't a think.

munificent · on Sept 30, 2022

> On the contrary, I think that's a self-fulfilling perception.

So is using SQL. My point, as well as the parent author's point, is that I've found a technology that has been a useful source of value for my entire career. I make no claims that there aren't other useful sources of value, but I've never for a second regretted the time I've spent getting better at writing object-oriented code.

I'm also quite comfortable writing functional and dataflow-oriented code, but I've never found those to be as exciting as some people seem to.

kortex · on Oct 1, 2022

Fair enough!

And yeah, since OOP is everywhere, there's zero downside to getting better at it.

> I've never found those to be as exciting as some people seem to.

It's less about excitement (for me at least) and less about the hair pulling associated with reasoning about complex state. I find the more independent attributes an object has, the harder it gets to reason about, unless I can serialize the whole thing (which usually goes against encapsulation).

I'd say that's the number 1 thing I do to get "better at OOP": have a healthy suspicion towards any long-lived object I can't serialize/deserialize using primitive types.

Scarbutt · on Sept 30, 2022

React is implemented using OOP, which I think is great for libraries that are defining and defending boundaries.

kortex · on Oct 1, 2022

That's like saying the Glasgow Haskell Compiler is implemented in part with C, therefore Haskell is imperative.

The React "way" is very much leaning into pure immutable data that flows through components which react to changes. Components aren't objects in the OOP sense. You don't call methods on them to update their internal state.

Scarbutt · on Oct 1, 2022

Never said React was imperative.

continuational · on Sept 30, 2022

Hooks is not really OOP. The original OOP interface was abandoned because it lead to a very complicated lifecycle that was hard to reason about.

Scarbutt · on Sept 30, 2022

I'm referring to the implementation of React, not its API.

yamtaddle · on Sept 30, 2022

Can confirm, unless it's changed radically since I last looked at it, yes, everydamnthing in React ends up as an object representing a component, and I don't just mean in the way that JS functions are technically also objects.

pjmlp · on Sept 30, 2022

Except the little detail that JavaScript is an OOP language in the linage of SELF, where even basic types have methods.

mirekrusin · on Sept 30, 2022

No, it's not.

yamtaddle · on Sept 30, 2022

It is, in large part, right down at what you might call the heart of the whole thing. Though I haven't read the React source code since a bit after Hooks were added so I guess maybe that's changed—though I doubt it.

Jweb_Guru · on Sept 30, 2022

Everyone has a different experience! Object oriented programming and design have had approximately no impact on my own career, while SQL knowledge has been indispensable. I don't think they're really in competition.

pmarreck · on Sept 30, 2022

Tell me you've never worked on large FP codebases (as a point of comparison) without telling me you've never worked on large FP codebases ;)

munificent · on Sept 30, 2022

That's true, I haven't.

Weird how many large OOP codebases there are to find oneself working on and how few large FP ones. :) Maybe that's just coincidental, or maybe not.

sophacles · on Oct 1, 2022

Should the inference be:

FP doesn't scale?

- or -

FP inherently creates small code bases?

pmarreck · on Oct 1, 2022

Well, having worked on both, I was able to tell that you only worked on one since your stated opinion would likely be very different had you worked with both, so take from that what you may

runeks · on Oct 2, 2022

> Weird how many large OOP codebases there are to find oneself working on and how few large FP ones.

How on earth is that weird when OOP is a hundred times more popular than FP?

munificent · on Oct 3, 2022

The point is to ask why OOP is a hundred times more popular if, according to FP advocates, it's so much worse as a paradigm.

pmarreck · on Oct 5, 2022

Inertia is a powerful thing. Look at how many people still use C and C++ (and Java!) despite there being arguably better alternatives.

Scarbutt · on Sept 30, 2022

I can see how can that be the case if one never works with products that rely heavily on having a database.

munificent · on Sept 30, 2022

Yes, my career is possibly a bit unusual in that I've mostly not worked on heavily database-driven stuff. Mostly client applications, games, tools, and programming languages.

avgcorrection · on Sept 30, 2022

Just a slow-to-die 90’s fad in the end.

kwhitefoot · on Sept 30, 2022

OOP is considerably older than '90s. Turbo Pascal 5.5 had objects in 1989 and was far from the first.

Anyway structs plus FP is no less object oriented. You just don't have to write the methods in the same code module. See NIM Method Call Syntax for example: https://nim-lang.org/docs/tut2.html#object-oriented-programm...

Yes, I know NIM is not purely functional. Use F# instead to get a bit closer to purity, it also does OOP quite nicely as does pretty much any functional language that has closures.

OOP is a tool just as Git is a tool.

See also: https://medium.com/extreme-programming/oop-vs-fp-182475457a0...

randomdata · on Oct 1, 2022

> FP is no less object oriented.

In fact, Erlang (and Elixir, I suppose) may be the only object-oriented programming language[1].

[1] https://www.infoq.com/interviews/johnson-armstrong-oop/

avgcorrection · on Sept 30, 2022

> OOP is considerably older than '90s.

Fad means popular (in a deragetory way). 90’s was its popularity heyday.

> See NIM Method Call Syntax for example: https://nim-lang.org/docs/tut2.html#object-oriented-programm...

So oop is when you `obj.methodName(args)` instead of `methodName(obj, args)`? Well if that’s all it is about then I can get behind it...

> OOP is a tool just as Git is a tool.

An iron maiden is just a tool. Don’t be hatin’ it’s just a tool what has it done to you?

rubyist5eva · on Sept 30, 2022

I feel my abilities as a developer really took off when I took the time to really learn SQL and take advantage of the features that my database offers. For example, learning to use window functions felt like that galaxy brain meme where you become one with the universe.

Being able to take 50 lines of ruby code (manual (anti-)joins and aggregates, result partitioning, etc...) and replace it with a couple lines of sql that is much faster and less buggy is a life changing experience.

The only other time I had such a dramatic shift in the way that I look at building applications is when I learned Scala (functional programming).

johnthescott · on Sept 30, 2022

> replace it with a couple lines of sql that is much faster and less buggy is a life changing experience.

you have snatched the pea.

meken · on Oct 1, 2022

I googled this expression but didn’t find anything. What does it mean?

yCombLinks · on Oct 1, 2022

Pretty sure it's a reference to https://www.youtube.com/watch?v=9selPW2lL-M

johnthescott · on Oct 7, 2022

correct. sql moves the mind from pondering scalars to groking sets.

irrational · on Sept 30, 2022

I’ve been using HTML since the mid 1990s. That is probably the oldest technology that I still use on a regular basis. SQL would be a close second (or maybe third - I probably started learning sql around 2001 and I can’t remember when JavaScript came out. Surely before that.)

jrm4 · on Sept 30, 2022

As above has noted, Both HTML and SQL are effectively declarative, hence their staying power.

irrational · on Sept 30, 2022

And JavaScripts staying power is backwards compatibility for billions of web pages, don't want to introduce another language to the browser because it would cost a lot of money for developers to build the engine and would increase the security attack surface, and all the browsers would have to add a new engine around the same time so developers would be willing to use it.

jrm4 · on Oct 2, 2022

Sure, I don't disagree that Javascript also has some staying power.

But I suppose if you look at the web programming stack and its history (which helps explain how stupid and nonsensical it is) I'll bet that in the long run HTML and SQL stick around longer as a part, even when they're perhaps not very "good?"

As in, a good bit of the energy around Javascript is "dealing with its extreme shortcomings directly in a way that implies possible replacement?" Like, transpiling feels different from layering on top. And people transpile Javascript, but layer on top of HTML? Something like that.

youainti · on Sept 30, 2022

And yet webassembly seems to be going somewhere.

irrational · on Sept 30, 2022

It isn't replacing JavaScript. At the very least you still need JS to access the DOM. And, for backwards compatibility, I don't see JS going away during our lifetimes. And I don't see another language being added to browsers in place of JS (not including webassembly, which is not a replacement for JS).

randomdata · on Oct 1, 2022

XHTML looked like it would totally usurp HTML at one point but then we went back to HTML with avengence when everyone jumped on the HTML5 hype train.

Postgres used QUEL instead of SQL for the first decade of its life. There were several competing languages in the relational database space at one time, but SQL eventually won, no doubt thanks to Oracle and IBM's utter dominance in the space a few decades ago.

Declarative doesn't mean staying power. Reaching ubiquity through luck and random chance seems to be what achieves staying power. Javascript likely fits this category too. Browser developers 30 years from now will no doubt still be using Javascript.

that_guy_iain · on Sept 30, 2022

Honestly, mines has mostly stayed the same. I've played with various techs but I mostly work the same underlying tech. Some items get added but very rarely does something get replaced or removed. The tech gets improved and new features added .

The last time I can remember something getting replaced is vargrant for docker. And honestly, I think that's a mistake for many companies. Before that it was git for svn.

The only way I can see tech coming and going is you're always an early adoptor and need to replace things that failed. If you build on stable tech, that stuff stays around for decades even if it was terrible in the first place. A good example is PHP. It was terrible at first still around and improving year on year.

marcrosoft · on Sept 30, 2022

Can you elaborate on why vagrant to docker is a mistake? I find the docker workflow infinitely better and faster to iterate for making dev environments.

pphysch · on Sept 30, 2022

If your deployment target is a VM/bare-metal rather than container, Vagrant makes a lot of sense for development.

E.G. my dev Vagrantfile fits in a small screen, and provisioning is handled by a single Ansible playbook that works for all environments (using conditionals). So there's no need to maintain a separate dev container, and a lot of subtle headaches are avoided in the deployment process

Setting up the dev environment consists of running `vagrant up` (assuming Vagrant/Ansible are installed). That's it.

You get all the benefits of IaC (fast ~reproducible dev) without shoehorning containers in.

alfalfasprout · on Sept 30, 2022

It also comes with a lot of downsides and significantly increased complexity in many cases. Depends what you're doing.

arjvik · on Sept 30, 2022

Why do you think Vagrant->Docker was a mistake?

that_guy_iain · on Sept 30, 2022

Most companies have a large amount of devs on macbooks. Docker on the macbook has historically been painful. Mainly due to the filesystem. Whereas, vagrant doesn't really have that problem. I've been using Linux for so long that I haven't really experienced the pain myself. But I have gotten to enjoy the smugness of them all complaininga about their performance while mines is super snappy.

Then if you look at the toolchain surounding vagrant such as packer. It looks really nice. Docker doesn't seem so nice. In fact, I still use packer to build my docker container images because I like the setup nicer.

With Docker it's all about the production env. I see the benefits for prod but for dev when you're literally not using the same docker setup as you would in prod it makes no real sense to me.

0xbadcafebee · on Sept 30, 2022

Vagrant still allows people to have deviating personal environments, which destroys productivity when people have to constantly diagnose what changes they have in their local env don't match other things. Vagrant+Docker is a decent compromise. Best is cloud-native development. If everyone [on a single team] SSH's into one box in the cloud with one set of dependencies, everyone gets the same environment every time, and updates go out to everyone immediately without fail. K8s has similar projects to let you develop in a pod on the actual cluster, eliminating difference between dev<->prod.

that_guy_iain · on Oct 1, 2022

> Vagrant still allows people to have deviating personal environments,

So does docker the way many folk use it for their development. In fact, I saw in one place people set up env setups for their docker so each team member could have theirs the way they wanted it. Create a container and use forever. Very rarely in my experience are people completely rebuilding their env.

robertlagrant · on Sept 30, 2022

Docker was painful on Mac, but quite a while ago. I don't have any problems these days.

that_guy_iain · on Sept 30, 2022

The last complaints I heard were about a year ago. Which point those devs left.

quickthrower2 · on Sept 30, 2022

The problem I see here is building SQL on top of the cloud, which is like building SQL on top of a database where the tables are always changing - not just adding columns but the meaning of the columns and rows, or even the existence of the table.

I think this is a good idea, but the cloud doesn't seem to care too much about backwards compatibility. If you use MySQL/PostgreSQL/SQL Server, you can be fairly sure in 20 years time your SQL will still work :-).

The need for this library is more an indictment on "the cloud" than anything. I am mostly an Azure user so AWS/Google might be better but man it changes it's UI and CLI interfaces alot - way too much.

papito · on Oct 1, 2022

SQL is a technology that engineers should know in an out, and while in early 2000's you could not move an inch without knowing it, I find that the new generation of devs really considers database calls a freebie - hidden behind abstractions and ORMs. It's just a line of code, right?

This leads to highly non-performant software since, unless we have bent the laws of physics in the last 15 years, your data access layer is the most expensive one.

But we are crushing LeetCode, yeah?

rsync · on Sept 30, 2022

"Since starting my career, the tech stack I use has changed numerous times. Technologies have come, gone, evolved. But except for a brief period in 2010 where we thought we could kill it, SQL is the one technology that has stayed around."

Ethernet comes to mind when I think of technologies that just can't be killed ...

matwood · on Sept 30, 2022

> It might be the only TECHNICAL knowledge I learned in the first year of my career that is still relavent.

Agree. IIRC, I wrote my first SQL is 1996 and wrote some today. While not perfect, it’s amazing at what it does. I suggest reading Joe Celko to really up your SQL skills.

johnthescott · on Sept 30, 2022

yep.

and i wrote my first quel in 1984 at BLI. and sql still won the war.

api · on Sept 30, 2022

That's not random. SQL is basically math. The syntax is a little old school and arcane (but not terrible) but the principles behind it are set theory and logic.

randomdata · on Oct 1, 2022

> That's not random.

It's kind of random. There were plenty of competing languages in use until the 90s. Even our beloved Postgres used QUEL for its first decade of life. But Oracle took the dominant lead and its competition either disappeared or started adding SQL support to be compatible with it.

Had a different butterfly flapped its wings all those years ago, things could have turned out very different.

> SQL is basically math.

All programming languages are basically math.

> but the principles behind it are set theory

Thing is, one of the challenges with SQL is that it doesn't adhere to set theory. This leads to some unintuitive traps that could have been avoided had it carefully stuck to the relational algebra.

Such deviation may have been a reasonable tradeoff to help implementations, perhaps, but from a pure language perspective it does make it more terrible than it should have been.

sleiben · on Sept 30, 2022

Reminds me of Git.

irrational · on Sept 30, 2022

Eh, I’ve seen various things like Git come and go. I spent the majority of my career in the days before git existed and we used things like SVN. I wouldn’t be surprised to see Git supplanted. I would be surprised to see SQL supplanted.

swalsh · on Sept 30, 2022

Git is a bit newer, and there have been many attempts to solve the same problem it solves, but it solves that specific problem VERY well... it'll be a long time before it's dethroned in my opinion.

shepherdjerred · on Sept 30, 2022

Git is surprisingly only 17 years old while SQL is 48 years. Not that you’re wrong, it’s just interesting how comparatively newer it is

marcosdumay · on Sept 30, 2022

Also, git had major competition for almost half of that time where it was not clear at all what software people would settle on (or even if they they would actually settle down).

SQL was nearly undisputed its entire life. It quickly killed every previous architecture, and everything that came after it made a point on being compatible.

TremendousJudge · on Sept 30, 2022

>It quickly killed every previous architecture

This feat was much easier to achieve when the industry was much, much smaller, and a single company mostly owned the entire business sector (who were the people that needed databases in the first place).

marcosdumay · on Sept 30, 2022

Well, a single company with a huge unexplainable help from the US government.

But SQL stayed unopposed due to its qualities, not because of its iffy start.

TremendousJudge · on Sept 30, 2022

SQL is at least 30 years older than Git. Git has yet to prove its longevity in the same way.

nightski · on Sept 30, 2022

That statement will be true forever. It doesn’t tell you much.

TremendousJudge · on Sept 30, 2022

Let me restate in relative terms then. SQL is over twice as old as Git. When Git was originally released, SQL had been around for longer than Git has been today. Git has had a pretty good track record, but it's not comparable yet in terms of age or even in terms of mass adoption -- plenty of orgs don't use Git to manage their code.

nathanwallace · on Sept 30, 2022

Excited to see Steampipe shared here - thanks kiyanwang! I'm a lead on the project, so sharing some quick info below and happy to answer any questions.

Steampipe is open source [1] and uses Postgres foreign data wrappers under the hood [2]. We have 84+ plugins to SQL query AWS, GitHub, Slack, HN, etc [3]. Mods (written in HCL) provide dashboards as code and automated security & compliance benchmarks [3]. We'd love your help & feedback!

1 - https://github.com/turbot/steampipe 2 - https://steampipe.io/docs/develop/overview 3 - https://hub.steampipe.io/

davesque · on Sept 30, 2022

Hey Nathan. Can you comment on some of the known performance pitfalls of steampipe? I'm not super familiar with the Postgres foreign data wrappers API. I assume steampipe inherits many of its technical limitations from this API.

Having done some work in this space, I'm aware that it's no small thing to compile a high-level SQL statement that describes some analytics task to be executed on a dataset into a low-level program that will efficiently perform that task. This is especially true in the context of big data and distributed analytics. Also true if you're trying to blend different data sources that themselves might not have efficient query engines.

Would love to use this tool, but just curious about some of the details of its implementation.

nathanwallace · on Sept 30, 2022

The premise of virtual tables (Postgres FDW) is to not store the data but instead query it from the original source. So, the primary performance challenge is API throttling. Steampipe optimizes these API calls with smart caching, massive parallelization through goroutines and calling the minimum set of APIs to hydrate the exact columns requested. For example, "select instance_id from aws_ec2_instance" will do 2 API calls get 100 instances in 2 pages, while "select instance_id, tags from aws_ec2_instance" would do 2 calls (instance paging) + 100 tag API calls (one per instance). We've also implemented support for qualifiers (i.e. where clauses) so API calls can be reduced even further - e.g. get 1 EC2 instance without pagination etc.

The Postgres planner is not really optimized for foreign tables, but we can give it hints to indicate optimal paths. We've gradually ironed out many cases here in our FDW implementation particularly for joins etc.

If you can tolerate my Aussie accent, I explain many of the details in this Yow Data talk - https://www.youtube.com/watch?v=2BNzIU5SFaw

davesque · on Sept 30, 2022

Awesome, thanks so much for that! I guess then it sounds like, even if bandwidth were not an issue for a datasource, there would still be the physical limitations of the machine on which steampipe would do any later processing (such as joins between datasources). In other words, there's no sense in which steampipe distributes this work across multiple processes or machine. Is that correct?

Sorry if a dumb question. I could be thinking entirely in terms of the wrong paradigm here since my work in this space was primarily concerned with distributed computing and big data.

nathanwallace · on Sept 30, 2022

Steampipe computes the SQL part in a single Postgres instance (not distributed) - we've not found this to be a significant limit (so far). A difference in this case is that the query is really combining and organizing results that may been processed at the source. In 2022 most services are API first and offer great searching and filtering, so we can use that to do much of the query processing in many cases.

davesque · on Sept 30, 2022

Great, thanks again! Yeah, a lot of the things steampipe is doing are right up my alley. Fun to learn about! Watching your talk now and enjoying it.

cube2222 · on Sept 30, 2022

To add somewhat of a counterpoint to the other response, I've tried the Steampipe CSV plugin and got 50x slower performance vs OctoSQL[0], which is itself 5x slower than something like DataFusion[1]. The CSV plugin doesn't contact any external API's so it should be a good benchmark of the plugin architecture, though it might just not be optimized yet.

That said, I don't imagine this ever being a bottleneck for the main use case of Steampipe - in that case I think the APIs themselves will always be the limiting part. But it does - potentially - speak to what you can expect if you'd like to extend your usage of Steampipe to more than just DevOps data.

I've used the benchmark available in the OctoSQL README.

[0]: https://github.com/cube2222/octosql

[1]: https://github.com/apache/arrow-datafusion

Disclaimer: author of OctoSQL

davesque · on Oct 1, 2022

Yep, I'm aware of DataFusion and I'd expect nothing less than top notch performance from it. Interesting comparison.

klysm · on Sept 30, 2022

Love to see Postgres FDW used. It’s a really powerful and imo not utilized as much as it could be.

cube2222 · on Sept 30, 2022

Hey! Steampipe looks great and I think the architecture you chose is very smart. Some feedback from myself:

I've tried setting up steampipe with metabase to do some dashboarding. However, I’ve found that it mostly exposes "configuration" parameters, so to say. I couldn't find dynamic info, like S3 bucket size or autoscaling group instance count.

Have I done something backwards or not noticed a table, or is that a design decision of some sort? That was half a year ago, so things might've changed since then, too.

nathanwallace · on Sept 30, 2022

Steampipe mostly exposes underlying APIs or data through SQL. So, "select * from aws_s3_bucket" will return bucket names, tags, policies, etc all together. But, bucket size is not a direct API so doesn't have a direct table.

In some high value cases we've added tables to simplify / abstract / normalize data - for example AWS IAM policies - https://steampipe.io/blog/normalizing-aws-iam-policies-for-a...

This example query returns the number of instances attached to an autoscaling group - https://hub.steampipe.io/plugins/turbot/aws/tables/aws_ec2_a...

BTW, we recently published a Metabase integration guide - https://steampipe.io/docs/cloud/integrations/metabase

petercooper · on Sept 30, 2022

It's a very cool project!

It might just be a coincidence, but an hour before this HN post, I discovered it way back in our queue of things to review for https://golangweekly.com/ and featured it in today's issue. Hopefully kiyanwang is one of our readers :-D

nathanwallace · on Sept 30, 2022

Awesome - thanks for the shout out in golangweekly!

petercooper · on Sept 30, 2022

No worries, it's my job :-) I'm just so intrigued to see (well, guess, in this case) how the path of word of mouth works because most of the time it's a complete mystery!

aaronharnly · on Sept 30, 2022

Congratulations! This looks incredibly powerful and I'm excited to check it out.

Although this is pitched primarily as a "live" query tool, it feels like we could get the most value out of combining this with our existing warehouse, ELT, and BI toolchain.

Do you see people trying to do this, and any advice on approaches? For example, do folks perform joins in the BI layer? (Notably, Looker doesn't do that well.) Or do people just do bulk queries to basically treat your plugins as a Fivetran/Stitch competitor?

nathanwallace · on Sept 30, 2022

While it's extensible, the primary use case for Steampipe so far is to query DevOps data (e.g. AWS, GitHub, Kubernetes, Terraform files, etc) - so often they are working within the Steampipe DB itself and combining with CSV data etc.

But, because it's just Postgres, it can be integrated into many different data infrastructure strategies. Many users query Steampipe from standard BI tools, others use it to extract data into S3, it has also been connected with BigQuery - https://briansuk.medium.com/connecting-steampipe-with-google...

As opposed to a lake or a warehouse, we think of it as a "Data Rainbow" - structured, ephemeral queries on live API data. Because it doesn't have to import the data it works uniquely well for small, wide data and joining large data sets (e.g. search, querying logs). I spoke about this in detail at the Yow Data conference - https://www.youtube.com/watch?v=2BNzIU5SFaw

doublerebel · on Oct 1, 2022

Over the last week I’ve been wanting to use Steampipe for DevSecOps by layering Sliderule.io on top. Do you have any customers doing this or something like it already? I am going to give it a try tomorrow and take it up with our Sliderule contacts on Monday.

nathanwallace · on Oct 2, 2022

Sounds interesting - let us know how you go!

ezekg · on Sept 30, 2022

This is awesome. Makes me want to write a plugin for my SaaS, just because it looks fun.

nathanwallace · on Sept 30, 2022

I'm biased, but writing plugins is really fun - and then you can create mods / dashboards for your users as well! Please give it a go - we have a guide [1], 84+ open source plugins to copy [2], many authors in our Slack community [3] and we're happy to get your plugin added to the Steampipe Hub [4].

1 - https://steampipe.io/docs/develop/writing-plugins 2 - https://github.com/topics/steampipe-plugin 3 - https://steampipe.io/community/join 4 - https://hub.steampipe.io/plugins

breck · on Sept 30, 2022

This is amazing. Can't believe I hadn't seen it before. Nice job

VectorLock · on Sept 30, 2022

Are people usually setting up centralized shared instances of Steampipe or is it more of a "run on everyone's laptop" deployment preferred?

nathanwallace · on Sept 30, 2022

We see users running Steampipe on their desktop, in pipelines (e.g. to scan Terraform files) and also as a central service. See "service mode" in the CLI for a Postgres endpoint and a dashboard web server [1] or try docker / kubernetes [2]. We also have a cloud hosted option with dashboard hosting, multi-user support, etc [3].

1 - https://steampipe.io/docs/managing/service 2 - https://steampipe.io/docs/managing/containers 3 - https://steampipe.io/docs/cloud/overview

rmetzler · on Sept 30, 2022

My guess is, that it makes sense to get regular reports, e.g. weekly. But you also want to experiment and develop queries. So probably both. Not sure if there is something like a notebook for steampipe.

nathanwallace · on Sept 30, 2022

Because it's just Postgres, connecting Jupyter notebooks to Steampipe via SQL works really well :-)

ComodoHacker · on Sept 30, 2022

What about joins, are they supported? Can't see them in the examples.

The real power of SQL is locked until you can join different data sources.

judell · on Sept 30, 2022

Here is one of our favorite examples: https://steampipe.io/blog/use-shodan-to-test-aws-public-ip

nathanwallace · on Sept 30, 2022

Yes - joins work across tables and schemas! This (toy) example connects IAM user records with Slack user data:

    select u.name, s.id, s.display_name
    from aws_iam_user as u, slack_user as s
    where u.name = s.email

chousuke · on Sept 30, 2022

Steampipe is pretty much "just" PostgreSQL with a foreign data wrapper and some fancy extras on top. The data is in tables from the database's perspective, so pretty much everything you can do with PostgreSQL, you can do with steampipe, including creating your own tables, views, functions and whatnot.

Lorin · on Oct 1, 2022

FYI hiring page requires login to notion

nathanwallace · on Oct 1, 2022

hmmm ... I just tested in Incognito mode and it's working for me. Could you please try again? We are hiring for multiple roles!