Editor completion. Programmability "out-of-the-box" (rather than having to gener...

RadiozRadioz · 2024-11-16T06:07:49 1731737269

Why would you generate SQL using another programming language? To me that sounds like something you'd only do if you're deep in an ORM with no escape hatch. For data analysis tasks, that's extremely unergonomic and you should definitely just write normal SQL. Use the ORM for CRUD. I've never seen an ORM that won't let you drop down to regular SQL for ad-hoc queries.

Editor completion is an extremely low ranking aspect for choosing technologies for data analysis. If SQL is the better tool but you're not using it because it doesn't have editor completion, then you need a better editor. It pains me when people prioritise "developer experience in VS Code" over "actually the correct technological choice".

halfcat · 2024-11-16T06:44:58 1731739498

Can I ask what you do for version control of SQL?

I ask this sincerely, as I’ve seen many scenarios over the years where a tool like Django is used to manage a Postgres database, solely that the schema migrations are captured in version control. A .sql file can be in version control, but rolling back a git commit is not the same as being able to roll back a schema migration in the actual database.

RadiozRadioz · 2024-11-16T08:12:59 1731744779

Sure, happy to go into it. Firstly we need to distinguish between version control of DQL and DDL.

In the context of comparing SQL and Pandas, we're mostly talking about DQL, so version control for this is exactly the same as any other query code.

For the DDL side that you're asking about, indeed just version controlling SQL files won't work, you need to use a proper migration tool. There are many available that do pure SQL migrations in the same way Django works (a sequence of up/down operations to perform or revert a migration). Personally I use migrate[1].

So you really can achieve the same thing. Personally I like this approach because it can do anything Django can, but it can also do more complicated migrations that can't be expressed by an ORM's DB-agnostic abstract layer. Pure SQL migrations also decouple your database schema from your ORM/framework, which I think is sensible anyway; databases & data tend to be much stickier than apps & frameworks.

A downside here is that you need to do some extra work to keep models in sync with the schema. But the point I was making with my original post is that you can totally use ORMs for CRUD and auto-generate away all that boilerplate, _in addition to_ using raw SQL for complicated data analysis. My point is it's not binary, you can mix them, and there's nothing forcing you to generate _all_ your queries if you use an ORM for most of them. Use the ORM to manage schema, then SQL for a one-off query - still works.

[1]: https://github.com/golang-migrate/migrate

bcoates · 2024-11-16T07:11:56 1731741116

You roll back a schema migration on a dev/test database by nuking it and regenerating it, probably with the same tools you use to manage the rest of your environment.

You don't rollback a schema migration to a production database, because that is impossible. What has been done cannot be undone.

bmer · 2024-11-18T00:51:31 1731891091

What's the "right editor" for SQL?

"Correct technological choice": I think relational algebra style APIs (a la Polars) are the "correct technological choice" here. SQL is just a tool to express relational algebra, and I'm not sure it's a good one.

_boffin_ · 2024-11-20T01:32:30 1732066350

I personally love any of the IDEA platform products, such as IntelliJ Ultimate, DataGrip, and also DataSpell.

For strictly analysis, I’d recommend DataSpell as it’s more oriented towards that. Additionally, it has built in functionality to do sql queries in notebook cells and will save to a variable which is a data frame.