Fully agree. At that level, the justification for using MongoDB usually boils down to not wanting to deal with table schemas or SQL. In both cases, there are better alternatives.
In most cases you're not even removing the concept of a schema, you're just moving it from your database into your application. Which is in a sense worse, especially if multiple applications need to access the same database.
As for SQL, I have yet to see something that solves the problem of querying a database in a simpler manner.
I’ve heard stuff like this from supposedly senior people - “if we use mongo we can just store anything, we don’t need to think about a schema” (also “you can only store short strings in an SQL database”)
But typically, if you do spend some time thinking about the schema and relationships, you only have to spend a small amount of effort to map that into the appropriate place, and then you usually don't have to mess with it again. It's worth the investment.
I don’t. How are new grads supposed to learn the ups and downs of different choices they make? Just being told they’re led astray in a blog post isn’t gonna work - it’ll backfire.
I used node as a new grad for things it wasn’t meant for and that’s how I learned what it is good at and what it isn’t.
Maybe it's just me, but I think it's purely rational?
A lot of these OSS projects provided a commercial offering in the form of SaaS. Yet AWS/GCP/Azure can just take the OSS project, not contribute anything, and reap all the profit.
AFAICS, these licenses are only intended to defend against the cloud providers, not against companies just using the product commercially and internally.
Amazon, Google, and Microsoft can do anything, snuffing out some mid company is the least of their crimes.
> not contribute anything, and reap all the profit
Yeah, it's called capitalism.
What's their stance on Lina Khan breaking up the monopolies? Have they written letters criticising Reid Hoffman for pressuring Kamala?
I'll give you one counter-argument, though: maybe it's hard to appreciate all of the problems a traditional SQL RDBMS solves until you try and solve them without an RDBMS... and crash and burn badly.
But, if they're actually building a real product with real funding money and they only know MongoDB... yeah, it's intervention time.
They haven't though. What's wrong with using a tool even if it might be bad? Especially as a fresh user. It's how we learn. From both good and bad experiences.
> They need help.
Sadly it's not the fresh grad who needs help, but the "experienced" who only keep to their old experiences. Is this comment from 2010? MongoDB has improved. Maybe not to the point of being the best, but definitely not unusable.
> Especially as a fresh user. It's how we learn. From both good and bad experiences.
I'd do that, but a superior strategy is letting other people make mistakes and then learning from them. It is best to always be making choices that seem like they could be optimal, with very rare exceptions.
I agree. But unless you are at least tangentially involved, you are likely not to hear such "gone bad" stories.
First, people are not nearly as self-reflective as you'd hope, or willing to admit failures at all. And second, a bad tech decision might not be directly observable (you will then often see people fighting symptoms rather than changing course and identifying the root cause).
> If this was a superior strategy that was so obvious no 1 would be making mistakes so how does this work?
Either people aren't aware of an optimal strategy (if one exists) or they ignore it for various reasons.
The latter is surprisingly common. People know they should exercise, get enough sleep, eat healthy, stay hydrated, tackle high priority tasks instead of procrastinating, etc. - and yet they still aren't doing those things (or as much as they should).
> Either people aren't aware of an optimal strategy (if one exists) or they ignore it for various reasons.
And my point was precisely that it often doesn't exist. What is an optimal strategy?
> People know they should exercise, get enough sleep, eat healthy, stay hydrated, tackle high priority tasks instead of procrastinating, etc. - and yet they still aren't doing those things (or as much as they should).
> Knowing something is not enough.
Your example doesn't even support this. People know they have to get enough sleep but they also know they have to <insert something else>. It depends on what they are optimizing for. i.e. there is no singular optimal strategy.
You've just proved my point rather than yours.
I don't know if knowing is not enough, but clearly the people in your example don't know what the optimal strategy is. E.g. eating healthy is NOT the optimal strategy if it makes them unhappy (e.g. they don't like the taste). It's not optimal unless it's strictly better.
> If this was a superior strategy that was so obvious no 1 would be making mistakes so how does this work?
I agree with duckmysick, and also please take note that having a strategy of not making mistakes will not avoid all mistakes. Outcomes and intent never match up perfectly. But that is why it is important to learn from others right from the start.
MongoDB salesdroids rely heavily on you having a low familiarity with other database tech to spin themselves as the only game in town. "Being led astray" isn't a passive, ambient occurrence, and it makes sense to push back against it.
On the other hand I know of multiple databases where all tables had attribute_1, attribute_2,..., attribute_5 columns Just in Case™
But more seriously the one feature I like in MongoDB is the pipeline API, where you can express a complex query with multiple filters/aggregations/transformations/joins as a list of simple steps.
There are some use cases where it is very ergonomic (even if I suspect that mongo can easily stop using indexes along the steps, so performance might not be super intuitive).
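For a concrete picture, here is a minimal sketch of such a pipeline using pymongo; the connection string, database, collection and field names are all made up for illustration:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    orders = client["shop"]["orders"]                  # hypothetical collection

    # Each stage consumes the output of the previous one: filter, join, group, sort.
    pipeline = [
        {"$match": {"status": "paid"}},                              # filter
        {"$lookup": {"from": "customers", "localField": "customer_id",
                     "foreignField": "_id", "as": "customer"}},      # join-like step
        {"$unwind": "$customer"},                                    # flatten the joined array
        {"$group": {"_id": "$customer.country",
                    "revenue": {"$sum": "$total"}}},                 # aggregate
        {"$sort": {"revenue": -1}},                                  # order the result
    ]
    for row in orders.aggregate(pipeline):
        print(row)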
I forgot about what I hate about mongo, so I will rant here about it: its data model is close enough but different enough from json to be both annoying and dangerous.
The empty string is a valid json key but not a mongo document key.
Mongo uses $operator keys to serialize its datatypes to json but does not sanitise the result: which means that {"foo":100000000000000000} and {"foo":{"$numberLong":"100000000000000000"}} will collide in the json export format. (Even if you choose the fully explicit Canonical format, as there is no $document operator to wrap ambiguous documents.)
So if your plan is to dump json into mongo you should plan for that (also, sometimes $operators are evaluated and sometimes they are not; it depends on the method, and the documentation does not tell you).
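To make the collision concrete, here is a small sketch using pymongo's bson.json_util helpers (the field name and value are arbitrary; this assumes a reasonably recent pymongo):

    from bson.int64 import Int64
    from bson.json_util import dumps, loads, CANONICAL_JSON_OPTIONS

    # A document holding an actual 64-bit integer...
    real_long = {"foo": Int64(100000000000000000)}
    # ...and a document whose value literally *is* the Extended JSON wrapper.
    looks_like_long = {"foo": {"$numberLong": "100000000000000000"}}

    # Canonical export of the first document produces the wrapper form:
    print(dumps(real_long, json_options=CANONICAL_JSON_OPTIONS))
    # {"foo": {"$numberLong": "100000000000000000"}}

    # Round-tripping the second document through JSON collapses it into the first;
    # there is no escaping for keys that happen to look like $-operators.
    print(loads(dumps(looks_like_long)) == real_long)  # True -> the distinction is lost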
The official client (both csv and json) is unable to export a collection if a field is sometimes an atomic value and sometimes an object, so a collection with two documents:
{a:1} and {a:{b:1}} will cause problems if you try to export it.
My colleagues have other issues with the json DSL and how most operators exist in 2-3 different forms with different syntax or how the syntax {$operator:{arg1:..., arg2:...}} is unintuitive but I actually sort of like it.
> You know exactly what your app needs to do, up-front
No one does. MongoDB still fits perfectly.
> You know exactly what your access patterns will be, up-front
This one also no one knows when they start. We successfully scaled MongoDB from a few users a day to millions of queries an hour.
> You have a known need to scale to really large sizes of data
This is exactly a great point. When data size goes to a billion rows, Postgres is tough. MongoDB just works without issue.
> You are okay giving up some level of consistency
This has been said about MongoDB for ages. Today, it provides very good consistency.
> This is because this sort of database is basically a giant distributed hash map.
Putting MongoDB in the same category as Dynamo is a big mistake. It's NOT a giant distributed hash map.
> Arbitrary questions like "How many users signed up in the last month" can be trivially answered by writing a SQL query, perhaps on a read-replica if you are worried about running an expensive query on the same machine that is dealing with customer traffic. It's just outside the scope of this kind of database. You need to be ETL-ing your data out to handle it.
This shows the author has no idea how MongoDB aggregation works.
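For what it's worth, that particular example is a one-liner against MongoDB; a hedged sketch with made-up database, collection and field names:

    from datetime import datetime, timedelta, timezone
    from pymongo import MongoClient

    users = MongoClient("mongodb://localhost:27017")["app"]["users"]  # placeholders
    a_month_ago = datetime.now(timezone.utc) - timedelta(days=30)

    # "How many users signed up in the last month" - a plain filter is enough,
    # and an index on created_at keeps it cheap.
    signups = users.count_documents({"created_at": {"$gte": a_month_ago}})

    # Or as an aggregation, e.g. broken down per day:
    per_day = users.aggregate([
        {"$match": {"created_at": {"$gte": a_month_ago}}},
        {"$group": {"_id": {"$dateToString": {"format": "%Y-%m-%d",
                                              "date": "$created_at"}},
                    "signups": {"$sum": 1}}},
        {"$sort": {"_id": 1}},
    ])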
I don't want fresh grads to use SQL just because they learn relations (and consistency and constraints and what not). It's perfectly fine to start on MongoDB and make it the primary DB.
I have more MongoDB experience than Postgres, but my impression is that a lot of the json handling I ended up doing in mongo would have been easier and more reliable in Postgres.
Yes, you obviously can fit 10 TB of data onto a developer laptop with 10 TB of external storage. However you’ll run out of disk before you run into Postgres issues.
> This is exactly a great point. When data size goes to a billion rows, Postgres is tough. MongoDB just works without issue.
Personally, I've not seen any application that seriously needs a billion rows in a single table. (except at truly massive scale, but then you're not using Mongo)
The real solution is implementing archiving to a file store like S3 and/or ship it off to a data warehouse. You don't need billions of rows in a `record_history`/`user_audit` table going back 5 years in your production database. Nobody queries the data.
Disclaimer: I have a lot more experience with postgres than mongo. I have worked with multi billion row databases in postgres. I have not on mongo.
> When data size goes to a billion rows, Postgres is tough. MongoDB just works without issue.
Joins are tough at a billion rows in Postgres. PK lookups and the kind of simple index queries mongo is good at, Postgres is generally good at too. The main thing mongo has over postgres is ease of sharding if one is looking to scale horizontally.
> We successfully scaled MongoDB from a few users a day to millions of queries an hour.
Uh, 1 query per second is 60x60 = 3,600 queries per hour... So, 1 million queries per hour works out to roughly 280 queries per second.
So that's still very low scale - well within what the smallest possible instance handles.
A consumer laptop does 20+k queries/second on postgres, mysql etc.; a raspberry pi usually still gets 1-3k read queries/s, depending on the SD card used.
(Or, in the same units, 70+ million queries per hour for the laptop.)
You're not instilling any kind of confidence quoting numbers like that
TBH we are at around 15M queries per hour. I am sure our customers don't want us to run on RPi. Btw, it's not only query but billion+ rows which are also there.
I expected as much; usually it's me pointing out that mongodb is a decent DB depending on the data you're ingesting/storing, and its built-in clustering is significantly better than what postgres offers at the moment.
But the number was so low I couldn't help but point out that it was more likely to convince me that mongo is a joke than a usable database.
The "SQLite is just a file" thing is actually an advantage. The example of a website is actually a pretty poor one, since any website that needs to scale beyond a single box has many options. The two easiest ones are:
- Mix static and dynamic content generation (and let's face it, most websites are mostly static from a server perspective)
- Designate a writer node and use any of the multiple SQLite replication features
But, in short, if you use an ORM that supports both SQLite and Postgres you'll have the option to upgrade if your site brings in enough traffic. Which might never happen, and in that case you have a trivial backup strategy and no need to maintain, secure and tweak a database server.
You don't need to maintain, secure and tweak postgres any more than you would with SQLite. Just install it and it'll work. Postgres backup is a single command. And actually you're supposed to create sqlite backups with a special command as well; if you're copying a live file, you're doing it wrong.
I really don't see any cons with Postgres over SQLite for server applications.
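To make the backup point concrete, a small sketch; paths, the database name and the dump format are placeholders, and the SQLite side uses the backup API rather than copying the live file:

    import sqlite3
    import subprocess

    # SQLite: a consistent snapshot via the backup API
    # (the CLI equivalent is `sqlite3 app.db ".backup app-backup.db"`).
    src = sqlite3.connect("app.db")
    dst = sqlite3.connect("app-backup.db")
    src.backup(dst)
    src.close()
    dst.close()

    # Postgres: one pg_dump invocation.
    subprocess.run(
        ["pg_dump", "--format=custom", "--file=app.dump", "postgres"],
        check=True,
    )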
> You don't need to maintain, secure and tweak postgres any more than you would with SQLite.
That's not true. Postgres is another standalone process, SQLite is a library. Even if you have your service and Postgres on the same box, you need to account for yet another process that can independently go down, that is competing for resources etc...
I've never had Postgres "go down". It might if you run out of disk space, but that is going to be a bad time with any database. It is not "competing for resources" when it is running the workload your app is sending it. You may as well say Sqlite is competing for resources in that case.
Major version upgrades are not automatic, you can't just install a newer binary/library version and start it as you can for SQLite. You need to shut down the DB and run `pg_upgrade`, or write manual full export-import scripts with `pg_dump`/`pg_dumpall`/`pg_restore`/`psql`.
And good luck deciding between the different format options, as some of them are unsupported across some of these tools, some cannot export and reimport the full database cluster, there's no idempotent "just import this snapshot" operation (point-in-time restore), lack of progress reporting, etc.
Here are some notes I have on the topic:
# Note on Postgres backups
#
# Unfortunately, postgres backup+restore is not straightforward.
#
# * Backups created with `pg_dumpall`, which create an .sql file,
# cannot simply be used for point-in-time recovery.
# They need to be restored with `psql` (not `pg_restore`),
# which errors if the data already exists.
# * You could probably tell it to ignore errors, but naturally it'll just
# run through `INSERT ...`, so it's not a proper point-in-time recovery,
# because it doesn't remove data newer than the backup as expected.
# * To use `pg_restore` (which can ignore existing data, re-creating
# everything with the `--clean` flag), you need to use `pg_dump`
# (not `pg_dumpall`), which cannot backup *all* databases,
# only a single given one.
# * Further, `pg_restore` does not accept `--format=plain` SQL backups
# (the default created by `pg_dump`). Only the non-plain backups are
# accepted, which are less readable for a human to determine whether
# a given backup is the one desired to restore based on the data.
#
# As a result, we aim for restoration using `pg_restore --clean`,
# backing up only the `postgres` database using `pg_dump -d postgres`.
# This works for us because we currently store all our tables in the
# `postgres` database.
# We use `--format=tar` because it is an uncompressed format whose members are plain text, which
# * deduplicates better than compressed formats, and
# * allows a human to `grep` in plain text for desired contents.
Why isn't there a mode with which I can just tell postgres to migrate my data automatically upon startup with a newer version?
And why can't I just have postgres-as-a-library to link into my binary, like I can do with SQLite?
You also can't just run postgres as root (e.g. in a container), and have to set up UNIX users to work around that, because postgres has it hardcoded to avoid running as root. This, too, you don't need to do with SQLite.
Also, postgres is harder to secure.
You need to either use TCP and ensure that other UNIX users on the same system can't just connect, or use UNIX Domain Sockets which have a 108 char path length restriction [1] (which is of course not documented in postgres's docs [2]), so it will suddenly break your CI when its path changes from
> if you use an ORM that supports both SQLite and Postgres you'll have the option to upgrade if your site brings in enough traffic
I'll never understand this idea that Postgres and SQLite are somehow interchangeable when the time is right.
My database and Postgres are _literally_ the core definition of everything that my application does. My app is written in Rust, but that doesn't matter because it's a _Postgres_ application. I use Postgres-specific features extensively. Converting the application to SQLite would be essentially a re-write, and it would be worse in every way.
Also, I generally just don't understand this fad of running production backends on SQLite. SQLite is great for what it is, a tiny little embeddable client-side database. But it is a _terrible_ database for non trivial business applications where ref integrity, real types instead of "everything is a string", and battle-tested scaling is essential.
I don't think people often switch from Postgres to SQLite; it's probably more common (and much easier) to prototype with SQLite first and then switch.
If by referential integrity you just mean FK constraints, you can turn that on in sqlite3.
I think SQLite is pretty good for a lot of use cases. An Axum/sqlite CRUD app should be able to handle at least a few hundred requests per second on a medium powered box, which is good enough for a lot of things.
Postgres is really powerful but I don't think it's actually that common to structure your app around its unique features.
Lately I've been using sqlite in this way on small projects I'm just hacking at, but after seeing pglite this week (https://news.ycombinator.com/item?id=41224689) I'll probably give that a try next time
The fact that you use Postgres-specific features extensively is a design decision that many people would never make, regardless of their trust in the engine.
In the enterprise space, it isn't. You often have to build software that will run on different database back-ends. Just because your worldview doesn't align with other people's doesn't mean you own the truth...
IMO a downside of SQLite that isn't discussed as often as it should be is the poor support for some table operations like ALTER COLUMN. Need to change a column to null / not null? Drop a foreign key constraint? Tough luck, in some cases the only way to implement a change is recreating the table.
Even without an ORM that supports both, as long as the DB layer is reasonably separated in your application it shouldn't be too much effort to switch. And if you've scaled to the point where it matters, you probably have the resources to do so.
>> as long as the DB layer is reasonably separated in your application
I find this is easy in retrospect but tricky when you’re building a system. It’s all shades of grey when you’re building:
Should I put my queue in my DB and just avoid the whole 2PC drama (saga is a more apt word but too much opportunity for confusion in this context)?
I probably should implement that check constraint or that trigger but should I add a plugin to my DB to offer better performance and correctness of special type X or just use a trigger for that too?
Should I create my own db plugin so that triggers can publish messages themselves without going through an app layer?
In retrospect it’s easy to see when you went too far, or not far enough. At decision time the design document your team are refining starts to head past the ~10 page sweet spot limit.
That's true, even with "perfect" abstractions, switching gets more complicated as you use more complex database features.
It's only really easy if you push most of your constraints and triggers to the application. In practice, I've only ever switched databases with really simple CRUD stuff and have otherwise been able to predict that I'll eventually want Postgres/RabbitMQ/etc and build it in from the start.
Agreed. For some situations, it might well be easier to take advantage of the static nature of the site and use SQLite compared to setting up a Postgres server. For others, setting up a server could be easier than the “easy” options.
It's not worth pointing out the technical flaws in the post[1]. It is obvious the author does not have a strong grasp of the tools he is criticising. A better example of this style of post is Oxide's evaluation[2] for control plane storage that actually goes over their specific needs and context.
[1] Ok, just one, Rick Houlihan is currently at MongoDB.
> It's not worth pointing out the technical flaws in the post[1].
It might help your argument if you pointed out a real technical flaw in the content of the post, and not an example of the author being mistaken about a stranger's first name.
Sure. 1. SQLite can have more than one file when using WAL mode. 2. You don't need to know your exact DynamoDB access patterns upfront; you can evolve the schema. Again, not worth the effort to point out more.
DynamoDB is not a distributed hash map, it is a distributed forest of B-Trees. And B-Trees are what PG uses for indices and MySQL for both tables and indices. You don't denormalize with it; rather, you build and maintain table- and index-like structures, which would contain a subset of ids or other data fields, but it is the same with normal DB indexes. The difference is that you have to maintain them manually. The upside is scale, serverlessness and maybe lower latency.
I believe what the author has in mind is the fact that you need to create your LSIs at the same time as you create the table; you cannot add them later (unlike GSIs). So there's some truth to what they are saying regarding access patterns.
On the MySQL vs Postgres topic: We migrated for two reasons.
The first is that I consider everything remotely owned by Oracle as a business risk. Personal opinion and maybe too harsh, but Oracle licenses are made to be violated accidentally so you can be sued and put on the license hook once you're audited, try as you might.
But besides that, Postgres gives you more tools to keep your data consistent and the extension world can save a lot of dev-time with very good solutions.
For example, we're often exporting tenants at an SQL level and importing them somewhere else. This can turn out very weird if those are 12 year old on-prem tenants. MySQL in such a case has you turn off all foreign key validations and whatever happens happens. A lot of fun with every future DB migration is what happens. With Postgres, you just turn on deferred foreign key validation. That way it imports the dump, and if something is off it eventually complains and throws it all away. No migration issues in the future.
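For illustration, a sketch of that import path with psycopg2; it assumes the foreign keys were declared DEFERRABLE and that the dump is plain INSERT-style SQL (DSN and file name are placeholders):

    import psycopg2  # assumed driver

    conn = psycopg2.connect("dbname=tenant_import")  # placeholder DSN
    with conn, conn.cursor() as cur:
        # Checks move to COMMIT time, so the dump's rows can arrive in any
        # order without switching foreign key validation off entirely.
        cur.execute("SET CONSTRAINTS ALL DEFERRED")
        cur.execute(open("tenant_dump.sql").read())
    # The `with` block commits here; a broken dump raises instead and the whole
    # transaction rolls back rather than leaving a half-imported tenant behind.
    conn.close()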
Or the overall tooling ecosystem around PostgreSQL just feels more mature and complete to me at least. HA (Patroni and such), Backups (pgbackrest, ...), pg_crypto, pg_partman and so on just offer a lot of very mature solutions to common operational and dev-issues.
MySQL has a very mature open source HA story with the flavors of group replication, as well as being able to replicate DDL. Not to mention Orchestrator and friends.
As a matter of fact, EnterpriseDB (the largest contributor to Postgres) has a paid multi master offering, so there's anti incentives in place to improve its HA story...
Yeah there's absolutely no reason to use MySQL. You should use MariaDB if you are stuck on a MySQL project to avoid Oracle. And you should use PostgreSQL for everything new.
If Postgres already had decent temporal table support (per SQL:2011 system time + application time "bitemporal" versioning) we never would have gone down the road of building XTDB. From the perspective of anyone building applications on top of SQL with complex reporting requirements in heavily regulated sectors (FS, Insurance, Healthcare etc.), "just use temporal tables" would be the ideal default choice. To get an idea of why, see https://docs.xtdb.com/tutorials/financial-usecase/time-in-fi...
When people say "Just use SQLite. It's almost as good as Postgres and you won't need anything more" I'm trying to understand why I shouldn't just use Postgres. It's not like it's hard to install or has any significant overhead. Please enlighten me.
Go with Postgres. SQLite is somewhat barebones: you are getting like 5 datatypes grand total with it (counting null as a separate type), one type of index, and somewhat unexplored tooling around it. Also column types are not strict and you can write strings into an integer column. With Postgres you are getting a very established and rich ecosystem, with various and VERY optimized data types, indexes, extensions (like postgis), tooling around it, and lots of tutorials on how to fix it if something goes wrong.
P.S. but if your project doesn't need any of that, e.g. it's a desktop audio player, just embed sqlite - that's one less subsystem to care about
Postgres isn't hard to use, but it requires maintenance. You need more scripts, more tooling, more knowledge of DBA, and that may not be necessary. When you're using a database to store a few thousand rows of data, Postgres quickly becomes overkill. Postgres is a V8 engine when all you may need to power is a lawn mower.
I personally prefer to use abstractions like ORMs for most of my database interactions, and direct SQL when those abstractions get in the way (by generating expensive queries and not finding an easy fix without a large refactor).
This way, starting out with sqlite (good enough for most websites I reckon, easy to backup) doesn't interfere with any necessary migration to postgres (like when the need for scaling arises). This also makes setting up tests easier (except for the manually written SQL) because starting an application with a temporary in-memory database is a lot faster than starting a full container.
Unless I'm doing native apps, I'll probably always want to reserve the ability to use Postgres. Sometimes that means hooking up a Postgres account and such, but often that just means sticking with sqlite and leaving my options open for when sqlite doesn't work anymore.
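As a sketch of that "keep the options open" setup with SQLAlchemy (URLs and the table definition are made up), the same code runs against in-memory SQLite in tests and Postgres in production simply by changing the URL:

    import os
    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            create_engine, select)

    # e.g. DATABASE_URL=postgresql+psycopg2://app@localhost/app in production
    db_url = os.environ.get("DATABASE_URL", "sqlite:///:memory:")

    engine = create_engine(db_url)
    metadata = MetaData()
    notes = Table("notes", metadata,
                  Column("id", Integer, primary_key=True),
                  Column("body", String(200)))
    metadata.create_all(engine)  # emits the right DDL for either backend

    with engine.begin() as conn:
        conn.execute(notes.insert(), [{"body": "hello"}])
        print(conn.execute(select(notes)).fetchall())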
> Postgres isn't hard to use, but it requires maintenance. You need more scripts, more tooling, more knowledge of DBA, and that may not be necessary.
I don't think Postgres needs to be maintained at all for small databases, which is usually the use case for SQLite. Their default configurations would take care of most things for trivial applications.
> Starting an application with a temporary in-memory database is a lot faster than starting a full container.
Starting a container might be way slower than SQLite, but I would still consider it fast for most, if not all use cases.
> hooking up a Postgres account
You can configure Postgres to start up in trust mode, which doesn't require a password for any user. This is basically the same as the unencrypted SQLite database file but with a fixed connection string: `postgresql://postgres@localhost`
> It's not like it's hard to install or has any significant overhead.
Depends on the environment or lack thereof, postgres is a pain in the ass on windows, and then you need support for software configuration so that it can talk to postgres, and then you have to take care of the features you're using.
If you're deploying a complex server-side system with lots of moving parts, then yes postgres is basically free. But if you're deploying client-side, or want to run it in a VPS, or whatever, postgres might go from not available to extra cost to a huge chore.
> Just use SQLite. It's almost as good as Postgres and you won't need anything more
Can't say I agree with that sentiment in any way though, every time I use it sqlite frustrates me in its limitations compared to postgres, and how weak the defaults are from a safety and consistency perspective.
> Depends on the environment or lack thereof, postgres is a pain in the ass on windows, and then you need support for software configuration so that it can talk to postgres, and then you have to take care of the features you're using.
This is why everyone uses docker and .env files. The problem has already been solved and you can copy/paste starter files from project to project to make it a non issue.
I would use SQLite if the embeddable part makes things considerably easier. For example, in a desktop or mobile application where a single application process is easier, because it fits the deployment model better.
Really just reads as an article reaffirming his own bias. For Mongo at least most of it is wrong.
- Secondaries are read replicas and you can specify if you want to read from them using the drivers selecting that you are ok with eventual consistency.
- You can shard to get a distributed system but for small apps you will probably never have to. Sharding can also be geo-specific so you query for french data on the french shards etc., lowering latency while keeping a global unified system.
- JSON schema can be used to enforce integrity on collections.
- You can join, but I definitely don’t recommend it if you can avoid it.
- I personally like the pipeline concept for queries and wish there was something like this for relational databases to make writing queries easier.
- The AI query generator based on the data using Atlas has reduced the pain of writing good pipelines. ChatGPT helps a lot here too.
- The change streams are awesome and have let us create a unified trigger system that works outside of the database, and it’s easy to use (see the sketch below).
We run postgres as well for some parts of the system and it also is great. Just pick the tool that makes the most sense for your usecase.
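Here is the change-stream sketch referred to above, using pymongo; it needs a replica set (or Atlas), and the names are placeholders:

    from pymongo import MongoClient

    orders = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")["shop"]["orders"]

    # Behaves like a database-side trigger delivered to application code.
    pipeline = [{"$match": {"operationType": {"$in": ["insert", "update"]}}}]
    with orders.watch(pipeline, full_document="updateLookup") as stream:
        for change in stream:
            print(change["operationType"], change["documentKey"])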
Okay, I am very sorry that I got Rick Houlihan's name wrong.
In my defense, I hadn't watched his talks _recently_ and we've all been Berenstain Bear'ed a few times.
But also the comparison of DynamoDB/Cassandra to MongoDB comes directly from his talks. He currently works at MongoDB. I understand MongoDB has more of a flowery API with some more "powerful" operators. It is still a database where you store denormalized information and therefore is inflexible to changes in access patterns.
You can store normalized information. What you’re saying is still wrong. With respect to schema you can use Atlas Schema. If you’re not really familiar you shouldn’t make these comparisons IMO.
> It is still a database where you store denormalized information and therefore is inflexible to changes in access patterns.
It is flexible and you don't need to know your exact access patterns upfront. It may not be as flexible as your chosen technology, but that doesn't make your statement true.
Depends on your workload, immutable tuples and vacuum can hurt a lot. (Although that got much better recently) Also the mad decisions about the query planner which you can't control. Often this doesn't matter, but it's worth being aware.
Oh the contortions I've gone through with the query planner to get it not to do things.
Also, if you're using JSONB, long strings or other toast entries, your query plan and your performance will be wildly divorced since the planner doesn't factor in a lot of the toast IO and associated memory management. The lesson for others here is if you have JSONB/long text fields, store them in their own table.
It’s because PHP’s popularity is old and Postgres used to be mid. MySQL was faster and better than Postgres.
But Postgres slowly improved and then got better than MySQL while MySQL stagnated. The most basic bugs persisted, basic features never got added and consistency never seemed to be a point of improvement.
Not sure about Apache, but Linux, MySQL and PHP is still the most common combo in terms of number of sites running it. Wordpress alone is enough to establish that.
Their reasoning is that some platforms like Heroku do not support SQLite.
Why use those then and not a platform that supports it, like Glitch?
I have used Postgres, MySql etc, but having the project storage in a single file is making things so much easier, I would never ever want to lose that again.
For MySQL, for smaller deployments, I've found Galera to really be a handy HA system to get going:
> Galera Cluster is a synchronous multi-master database cluster, based on synchronous replication and MySQL and InnoDB. When Galera Cluster is in use, database reads and writes can be directed to any node. Any individual node can be lost without interruption in operations and without using complex failover procedures.
> For MySQL, for smaller deployments, I've found Galera to really be a handy HA system to get going:
Well, at least if you don't value your data.
(https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster; Galera failed Jepsen testing in 2015 and the bug is still open with a 2022 mention of basically “we have experimental support [for actually providing the data consistency we promise], but it's not clear if it's worth it because it will be very slow”)
SunOS is/was a BSD-based operating system, and Linux replaced Sun servers that were running SunOS with Intel servers, in the same way as Linux-based servers replaced most server operating systems in data centers (and today cloud providers), like Ultrix, IRIX, HP-UX etc. I meant "do" in this way.
I'd like to mention that CouchDB is really useful for one reason - a very robust sync story with clients, and a javascript version called PouchDB that can run on the browser and do bidirectional sync with remote Couch instances.
This can be done with sqlite by jumping through a few extra hoops, and now with in-browser WASM postgres, there as well with a few more hoops, but the Couch -> Pouch story is easy and robust.
> It's annoying because, especially with MongoDB, people come into it having been sold on it being a more "flexible" database. Yes, you don't need to give it a schema. Yes, you can just dump untyped JSON into collections. No, this is not a flexible kind of database. It is an efficient one.
I really like this sentence because it perfectly encapsulates a mistake that, I think, people do when considering using MongoDB.
They believe that the schemaless nature of NoSQL databases is an advantage because you don't need to do migrations when adding features (adding columns, splitting them, ...). But that's not why NoSQL databases should be used. They are used when you are at a scale where the constraints of a schema become too costly and you want your database to be more efficient.
Absolutely this. The author seems blind to Postgres shortcomings.
For example, the author notes MySQL has "features locked behind their enterprise editions." That is true for some features in MySQL, yes. But the same thing is true in Postgres for DDL logical replication and other HA-related features, which are only in EDB Postgres -- and yet those are features MySQL has had in open source for over two decades.
Every database has issues and quirks whether they be about how you design your application, how you need to scale, or how you need to maintain your database. You can play this game “just use XYZ and have no problems”, but it isn’t realistic. Production databases at scale require heavy dedicated infra to stay highly available and performant, and even out of the box solutions require you to understand what is going on and tune them else you run into “surprises” which are almost always that no one RTFM. Pretty much every mainstream database is capable of both highly available and highly consistent workloads at scale. The storage engine largely shouldn’t matter as much as the application tuning.
SQLite is easy to back up, especially if you are OK with write locking for long enough to copy a file. It now has a backup API too if you are not OK with that.
Lots of things do not scale enough to need more than one application server. A lot of the time, even though I mostly use Postgres, the DB and the application are on the same server, which gets rid of the difficulties of working over a network (more configuration, more security issues, more maintenance).
The main reasons I do not use SQLite are its far more limited data types and its lack of support for things like ALTER COLUMN (other comments have covered these individually).
It has never made sense to me why someone uses NoSQL, and then proceeds little by little to turn their implementation into one with relations.
It's way less work just to learn SQL or an ORM.
Nosql is great at being a document store.
I’ve used MySQL longer and it’s been a good default option, but the jump to how Postgres works and what it offers is too much to ignore.
Postgres can act as a queue, has many of the functions that a NoSQL store has, can handle being an embedding DB, and does so up to a decent volume. It can be the backbone of many low-code tools like Supabase, Hasura, etc. The only thing that’s different is there seems to be nice currents for MySQL but you get the hang of it pretty quick.
I can’t speak to the official decisions made by these camps/courses, but from my own experience as an undergrad, I was first introduced to MySQL, and the professors at my university did not teach using migration management tools for bringing a schema up in a database. You were either using a GUI to set up the tables, or running your own cobbled-together sql files. For class assignments this was fine. Then I had a professor introduce mongo to me. I was floored by the idea of having my schema live alongside the application code! No more messing around in SQL GUIs! Then of course over time I realized you still need to maintain a schema over time and provide some way to “upgrade” data when your schema evolves, and keep your data consistent. Then I discovered the tools around migrating mongo data are not nearly as mature as the ones you’ll find for SQL databases.
I find mongo alright at producing a short-lived prototype of an application (e.g. school assignments), but shipping it to production for a long period is too risky for the “benefit”.
I've found that key/value databases push for better architectural designs in enterprise environments. Especially in companies where different teams are responsible for a given business capability and it needs to scale above 1+ million users.
Postgres's flexibility enables designs that are hard to scale, both in terms of maintainability and performance. Enforcing K/V as the default database in one of my previous companies worked wonders.
I don't think there's anything unscalable about Postgres, or RDBMS's in general. I've seen even poorly tuned Postgres with unnormalised table designs work fine at a decent scale, to the point where I'm convinced that Postgres with a decent table design gets you very far.
As in: far enough that if you outscaled it, you'd be able to afford a team of excellent engineers to write an appropriate database system.
Almost all companies don't need the hyper scaling NoSQL databases supposedly promise. What they do often eventually realise is that they want the querying power and additional ACID guarantees of a typical relational database, so they end up developing a shitty relational database on top of a NoSQL database.
My current company uses Postgres by default and we have a lot of different usecases. Again, another million+ users company with 10+ countries. It does indeed scale.
The problem is how people think about SQL vs K/V. They fall into the normalization trap a lot and create complex procedures and read operations. This usage causes once-a-month DB CPU spikes and the occasional incident.
We are currently advocating for; de-normalized tables with K/V usage of Postgres and pushing the complexity to the application layer. Essentially, use Postgres at its bare minimums.
In short, to make Postgres scale; you essentially need to forget your "expert SQL knowledge" and use it as a K/V.
I think a balance is needed. Completely denormalised isn’t a good idea because if you have a single table with a large row, but a single column receives most the updates and it updates a lot, you’re going to have tons of dead tuples and write churn/overhead whenever updating a row in that table.
But I agree that some people go too far with normalisation. When done reasonably, with awareness of access patterns and application behaviour, I think it’s important though.
It mostly revolves around; understanding your primary business concept that write operations revolve around (aggregate root) and duplicating data for different read scenarios (view models).
For example imagine you have an "E-commerce" product which you can change details about. The "Product" would be a write-model that you store as K/V. It would accept operations such as; "change price", "change category" etc. Your key would be "product id" and the value would be the whole object represented as json etc.
For every write operation you would read the write-model from the database, deserialize, modify it, put it back. Changes to the write-model would trigger events and you could build different read-models to access the data.
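A minimal sketch of that read-modify-write loop on top of a plain Postgres jsonb table, using psycopg2; the table, column and key names are assumptions, not a prescription:

    import psycopg2                      # assumed driver
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=shop")   # placeholder DSN
    with conn, conn.cursor() as cur:
        # The whole aggregate lives in one row: key = product id, value = document.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS products (
                id  text  PRIMARY KEY,
                doc jsonb NOT NULL
            )""")

        # Read the write-model; FOR UPDATE serialises concurrent writers on the key.
        cur.execute("SELECT doc FROM products WHERE id = %s FOR UPDATE", ("p-1",))
        row = cur.fetchone()
        product = row[0] if row else {"id": "p-1", "price": 0, "category": None}

        product["price"] = 1999          # the "change price" operation

        cur.execute("""
            INSERT INTO products (id, doc) VALUES (%s, %s)
            ON CONFLICT (id) DO UPDATE SET doc = EXCLUDED.doc
        """, ("p-1", Json(product)))
    # COMMIT happens here; this is also where an event could be published so the
    # read-models get rebuilt.
    conn.close()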
* You know exactly what your app needs to do, up-front
But isn't this true of any database? Generally, adding a new index to a 50 million row table is a pain in most RDBs. As is adding a column, or in some cases, even deleting an index. These operations usually incur downtime, or some tricky table duplication with migration process that is rather compute + I/O intensive... and risky.
50M rows is really not that much, I’d guesstimate an index creation to take single-digit minutes.
None of these operations I’d expect to cause downtime, or require table duplication or to be risky
Edit: to be fair, you’re right there are footguns. Make sure index creation is done concurrently, and be careful with column defaults that might take a lock. It’s easy to do the right thing and have no problem, but also to do the wrong thing and have downtime.
Newer versions of Postgres also support dropping indexes concurrently. I recommend using the concurrently option when dropping unused or unneeded indexes on any table with active writes and reads.
https://www.postgresql.org/docs/current/sql-dropindex.html
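Roughly like this via psycopg2 (names are placeholders); the one footgun is that CONCURRENTLY refuses to run inside a transaction block, hence autocommit:

    import psycopg2  # assumed driver

    conn = psycopg2.connect("dbname=app")   # placeholder DSN
    conn.autocommit = True                  # CONCURRENTLY can't run in a transaction
    with conn.cursor() as cur:
        # Build the new index without holding a long write-blocking lock...
        cur.execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS users_created_at_idx "
                    "ON users (created_at)")
        # ...and retire an unused one the same way.
        cur.execute("DROP INDEX CONCURRENTLY IF EXISTS users_old_email_idx")
    conn.close()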
Totally agree - I have tried many databases of all flavors, but I always come back to Postgres.
HOWEVER - this blog post is missing a critical point.... the quote should be:
---> Just use Postgres
AND
---> Just use SQL
"Program the machine" stop using abstractions, ORMs, libraries and layers.
Learn how to write SQL - or at least learn how to debug the very good SQL that ChatGPT writes.
Please, use all the very powerful features of Postgres - Full-Text Search, Hstore, Common Table Expressions (CTEs) with Recursive Queries, Window Functions, Foreign Data Wrappers (FDW), put JSON in, get JSON out, Array Data Type, Exclusion Constraints, Range Types, Partial Indexes, Materialized Views, Unlogged Tables, Generated Columns, Event Triggers, Parallel Queries, Query Rewriting with RULES, Logical Replication, Policy-Based Row-Level Security (RLS), Publication/Subscription for Logical Replication.
Push all your business logic into big long stored procedures/functions - don't be pulling the data back and munging it in some other language - make the database do the work!
All this stuff you get from programming the machine. Stop using that ORM/lib and write SQL.
EDIT:
People replying saying "only use generic SQL so you can switch databases!" - to that I say - rubbish!
I nearly wrote a final sentence in the above saying "forget that old wives tale about the dangers of using a databases functionality because you'll need to switch databases in the future and then you'll be stuck!"
Because the reason people switch databases is when they switch to Postgres after finding some other thing didn't get the job done.
The old "tut tut, don't use the true power of a database because you'll need to switch to Oracle/MySQL/SQL server/MongoDB" - that just doesn't hold.
your answer was fine and good until you classified ChatGPT's SQL generation as "very good" -- which it is not. I've had _all_ GPT models spit out monstrosities and slow queries of all kinds.
ORMs are not all bad. In fact, some ORMs generate better code for really complex joins (think hundreds of tables, each with hundreds of columns) than humans, and often ensure that trivial best practices (like indexes and consistent foreign keys) are followed.
Writing SQL is a great skill, but if you tie yourself to a single database engine's idioms then you're in for a shock when you switch platforms/jobs/environments.
"Push all your business logic into big long stored procedures/functions - don't be pulling the data back and munging it in some other language - make the database do the work!"
From my courses I had at university, I've been led to believe that the current trend is doing hexagonal architecture, as that allows for better modularisation of the project and helps keep code clean over many years with many software engineers coming in and out.
As a part of that I've been taught that the only part you could then trust is your internal modules - and even the database has to be treated as an external source, whose only job is to pull data in and out.
How does that work in what you're suggesting? Is it just a different way of approaching things that will work depending on what's your goal is?
I'm just curious about this as I'm trying to get myself to learn a bit more, just to clarify
The model described in the parent comment is essentially using the database (in this case PostgreSQL, but any RDBMS would do) as the hexagonal "core" in which adapters plug in to. This is a powerful pattern that works very well when you use the full features of the RDBMS like constraints, triggers, views, etc. This does require "coupling" to the RDBMS-specific features, which makes migrating to an alternative system difficult, but in practice this rarely happens if you choose a strong RDBMS from the beginning.
You can certainly use the database as a "dumb storage" tool in the hexagonal architecture, that is, as just another adapter. But most of the time you'll end up re-creating RDBMS features in poorly written/documented application code that has to interact with the database anyways. Why not just do it all in the database? With an RDBMS core, hexagonal adapters can be pure functional components, making them much easier to reason about and maintain.
For more on this idea, and how to avoid pitfalls with the hexagonal pattern, I recommend reading Out of the Tar Pit [1]. It's a short but highly influential paper on "functional relational programming".
I see it, that makes sense! Right, I don't think that you'd have a reason to swap RDBMS unless licensing issues come up, like with Oracle. Thank you for helping out.
> "Push all your business logic into big long stored procedures/functions - don't be pulling the data back and munging it in some other language - make the database do the work!"
This is one of the categories of opinions that I’ve heard, the proponents of which suggest that databases will typically be more efficient at querying and transforming data, since you’ll only need to transfer the end result over a network and will often avoid the N+1 problem altogether.
You probably don’t want some reporting or dashboard functionality in your app to have to pull tens or hundreds of thousands of rows to the back end, just because you have decided to iterate over the dataset and do some transformations there.
That said, I’ve worked in an app where the Java back end only called various stored procedures and displayed their results in tables and while it was blazingly fast, the developer experience was miserable compared to most other projects I’ve worked with - lots of tables with bad naming (symbol limits in that RDBMS to thank), badly commented (not) procedures with obscure flags, no way to step through anything with a debugger, no proper logging, no versioning or good CI tooling, no good tools for code navigation, no refactoring suggestions, no good tracing or metrics, nothing.
Sure, it might have just been a bad codebase, but it was worse than most of the ones where too much logic is in the back end, those just run badly, so I get the other category of opinions, which suggests that trying to use the DB for everything isn’t a walk in the park either.
There’s probably a good balance to be found and using tools in ways that both perform okay and don’t make the developer experience all that bad.
Ofc I reserve the right to be wrong, just wanted to share my subjective experience, that there can be tradeoffs and there probably aren't any silver bullets.
For the most part, I think that you should put any mass/batch processing in the DB (just comment/version/test/deploy your code like you would on the back end, as best as you can with the tools available to you) and don't sweat too much about handling the CRUD operations in your back end, through whatever ORM you use or don't use (regular queries are also fine, as long as parametrized to prevent injection).
For complex schemas, a nice approach I've found is making one DB view per table/list/section of your front end, so you only need 1 DB call to load a particular component, otherwise the N+1 risk gets far greater ("Oh hey, I got this list of orders, but each order needs a delivery status, so I'll just iterate over those and fetch them for each item, whoops, the DB is spammed with requests.").
You gain: Model consistency guaranteed by the database, your backend basically only acts as an external API for the database.
You lose: Modularity, makes it harder to swap out databases. Also, you have to write SQL for business logic which many developers are bad at or dislike or both.
I've seen a system running on this approach for ten years and it survived three generations of developers programming against this API. There's Python wx frontends, web frontends, Rust software, Java software, C software, etc. They all use the same database procedures for manipulating the model so it stays consistent. Postgres is (kinda, not very) heavy for small projects but it scales for medium up to large-ish projects (where it still scales but not as trivially). One downside I've seen in this project is that some developers were afraid to change the SQL procedures so they started to work around them instead of adding new ones or changing the existing ones. So in addition to your regular work horse programming language you also have to be pretty good at SQL.
With Pglite in a few years you'll be able to treat Postgres as a library, and merely allow selecting between in-process and remote via configuration.
The first job of a database is to be a data structure for persisting data, but you're allowed to extend said data structure in your own code. As long as you can come up with a way to keep all the code in version control, test it, etc., it's fine.
Great list of Postgres features called out that highlight the extensive feature set.
Most of these are covered in my book, for anyone that’s interested in learning them. The book uses a Ruby on Rails app with Postgres instances for examples and exercises. Hope the plug is ok here as some folks may be looking for learning resources for Postgres.
https://andyatkinson.com/pgrailsbook
You should use the most powerful and convenient language you have at your disposal, which is most likely the host language.
ORMs are funny things, it's like we got stuck in the idea of making the database object oriented. MongoDB just means we don't have to pretend anymore, not that it was a good idea.
It is perfectly possible to use relational concepts in a general purpose language. Tables, Columns, Foreign Keys, Records, Indexes, Queries etc. And you can build whatever Model abstractions you need on top of that; or not, for simple CRUD you don't really need a type system.
I usually build that layer along with the foundation of the application, it still evolves slightly every time around but the basics are very tried and proven by now.
Try to avoid the bespoke features of psql in favor of generic SQL unless cornered by circumstances into doing so, methinks.
If there's one complaint I have about pg, it's that it has too many features that encourage finding cute, non standard, non obvious ways of going about things.
> Try to avoid the bespoke features of psql in favor of generic SQL unless cornered by circumstances into doing so, methinks.
Why? To make migration to another database easier? I've never had the need to migrate any application away from postgres. I usually take full advantage of what the database can do.
I’m a proponent of vendor lock in is not a big deal - you’re not going to switch from AWS to Azure on a whim and if you do, the fact that you’re using ecs instead of k8s isn’t going to slow you down.
But data ownership is the one place I get iffy. What if your db does a rug pull and changes licenses? There’s certainly precedent in this space for that.
Stored procedures suck. First of all SQL is a dubious language to write business logic in because it doesn’t have static typing and other goodies we expect nowadays. But more importantly stored procedures tend to drift out of version control. So please don’t.
Still a lot more complicated to deal with in every aspect, obviously so to anyone who has long term experience with them, what are you gaining from pretending otherwise?
For everyone saying, “Just use SQLite”, how do you deal with pathological queries causing a denial of service? SQLite is synchronous, so you end up blocking your entire application when a query takes a long time. It’s a problem in Postgres, too, especially if the query involves table locks, but with Postgres your app can generally hobble along.
Why does it even matter? I know that I need multimodal search in my product, and that is why I need a vector DB. You're not saying anything interesting by saying "AI is a bubble". If you say something like I may not actually need RAG/multimodal/semantic search/a dedicated vector db then you may have my attention.
so like if the funding for AI disappears then somehow my requirement of multimodal search also disappears, and with it all the existing solutions, some NOT VC funded, like pgvector?
I wish postgres had a library only mode that directly stored to a file like sqlite. That'd make starting development a lot easier since you don't have to jump through the hoops of setting up a postgres server. You could then switch to a "proper" DB when your application grows.
It's literally 4 lines of Python code calling subprocess.Popen to start a PostgreSQL server for a given database directory and connecting to it via a Unix socket on the filesystem. However, you can't launch multiple concurrent instances like this.
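Something along those lines, assuming `initdb -D pgdata` has already been run and the postgres binaries are on PATH; the port, socket directory and the sleep (instead of a proper `pg_isready` poll) are placeholders:

    import subprocess
    import time

    import psycopg2  # assumed driver

    server = subprocess.Popen(
        ["postgres", "-D", "pgdata", "-k", "/tmp", "-p", "5433"]  # socket in /tmp
    )
    time.sleep(2)  # crude; poll pg_isready in anything real

    conn = psycopg2.connect(host="/tmp", port=5433, dbname="postgres")
    print(conn.server_version)
    conn.close()

    server.terminate()  # SIGTERM asks the server to shut down
    server.wait()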
eg on RDS, they'll give you instances with 1TB of RAM, eg a `db.r6idn.32xlarge`, at the nice price of $75/hr ($54k/mo). Not to mention that, in a microservices architecture, assuming you're not sharing a database, you might be multiplying that figure out a few times.
So just because it's possible for it to fit in RAM doesn't mean it's economical. RAM isn't exactly getting exponentially cheaper or more spacious anymore. The hope was flash memory would be the solution, but not sure how far that's getting these days.
Personally, most of the projects I do are on self-hosted servers. The traffic isn't big. In such cases sqlite has been way better than postgres. Many times I see postgres not used well. It's meant for big projects, not small ones.
I don't hate SQL and I agree for many applications it makes sense, but I disagree 100% with "default to a SQL database" (like Postgres). Instead, figure out what you need based on your app.
Recently I had the opportunity to rewrite an application from scratch in a new language. This was a career first for me and I won't go into the why aspect. Anyway, the v1 of the app used SQL and v2 was written against MongoDb. I planned the data access patterns based on knowledge that my DB was effectively document/key/value. The end result: it is much simpler. The v1 DB had like 100+ tables with lots of relations and needs lots of documentation. The v2 DB has like 10 "tables" (or whatever mongo calls them) yet does the same thing. Granted, I could have made 10 equivalent SQL tables as well but this would have defeated the purpose of using SQL in the first place. This isn't to say MongoDB is "better". If I had tons of fancy queries and relations I needed it would be easier with SQL, but for this particular app, it is a MUCH better choice.
TL;DR Don't default to anything, look at your requirements and make an intelligent choice.
I'm always cautious with a one-size-fits-all approach. If a team is working on a small project and SQLite works then great. You can use a SQLite database on something like a $4/month DigitalOcean droplet. Can't say the same for Postgres.
> AI is a bubble
Many say this but Generative AI and LLMs have gotten bunched up with everything else. There is a clear need for vectors and multimodal search. There is no core SQL statement to find concepts within an image for example. Machine learning models support that with arrays of numbers (i.e. vectors). pgvector adds vector storage and similarity search for Postgres. There was a recent post about storing vectors in SQLite (https://github.com/asg017/sqlite-vec).
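For example, a hedged pgvector sketch via psycopg2 - it assumes the extension is installed on the server, and the 3-dimensional vectors and names are toys (real embeddings have hundreds or thousands of dimensions):

    import psycopg2  # assumed driver

    conn = psycopg2.connect("dbname=app")   # placeholder DSN
    with conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
        cur.execute("CREATE TABLE IF NOT EXISTS items ("
                    "id bigserial PRIMARY KEY, caption text, embedding vector(3))")
        cur.execute("INSERT INTO items (caption, embedding) VALUES (%s, %s)",
                    ("a red bicycle", "[0.9, 0.1, 0.0]"))
        # Nearest neighbours by L2 distance; the '<->' operator comes from pgvector.
        cur.execute("SELECT caption FROM items ORDER BY embedding <-> %s LIMIT 5",
                    ("[1.0, 0.0, 0.0]",))
        print(cur.fetchall())
    conn.close()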
> Even if your business is another AI grift, you probably only need to import openai.
There's much more than this. There are frameworks such as LangChain, LlamaIndex and txtai (disclaimer I'm the primary author of https://github.com/neuml/txtai) that handle generating embeddings locally or with APIs and storing them in databases such as Postgres.
Why can't you run Postgres on a 4 USD droplet? They seem to have 512 MB RAM, that is enough for a basic Postgres instance and a HTTP application server.
Yes, one very common use case where Postgres does not scale well is analytics. Snowflake, Vertica, and ClickHouse work much better. I say this after working on several projects where development teams hit a wall with row storage databases. Still, PostgreSQL is great, it's my default DB as well.
Here we go again...
Just use X, forever, in all cases, is misguided whatever X is - a database, programming language, ... a vehicle.
PostgreSQL is good for many things, and "default to PostgreSQL and use something else if clearly justified" is sound advice, but assuming there is no room for anything else but PostgreSQL is not.
> Missing sqlite comparison point: data types. SQLite is like JS with column datatypes, except even looser.
Also defaults (see the sketch after this list):
- sqlite has STRICT tables, you have to opt in, per table.
- sqlite does not check foreign keys by default, you have to opt in, per connection.
- sqlite has WAL mode, you have to opt in, per database. And even with that you may want / need to add a fair amount of work to ensure you're not upgrading connections lazily (fecking SQLITE_BUSY).
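A sketch of those opt-ins from Python's sqlite3 module (STRICT needs SQLite 3.37+; file and table names are placeholders):

    import sqlite3

    conn = sqlite3.connect("app.db")

    # Per-connection opt-in: FK enforcement is OFF unless you ask for it.
    conn.execute("PRAGMA foreign_keys = ON")

    # Per-database opt-in: WAL mode persists once set on the file.
    conn.execute("PRAGMA journal_mode = WAL")

    # Per-table opt-in: STRICT rejects e.g. strings stuffed into INTEGER columns.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS users (
            id  INTEGER PRIMARY KEY,
            age INTEGER NOT NULL
        ) STRICT
    """)

    conn.execute("INSERT INTO users (age) VALUES (?)", (42,))          # fine
    # conn.execute("INSERT INTO users (age) VALUES (?)", ("forty",))   # raises an error
    conn.commit()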
There is absolutely no reason you can't make SQLite go all the way. Starting with it is the only thing that makes sense to me.
It is certainly a higher performance solution in the fair comparison of a hermetically sealed VM using SQLite vs application server + Postgres instance + Ethernet cable. We're talking 3-4 orders of magnitude difference in latency. It's not even a contest.
There are also a lot of resilience strategies for SQLite that work so much better. For instance, you can just snapshot your VM in AWS every x minutes. This doesn't work for some businesses, but you can also use one of the log replication libraries (perhaps in combination with snapshots). If snapshots work for your business, it's the most trivial thing imaginable to configure and use. Hosted SQL solutions will never come close to this level of simplicity.
I personally got 4 banks to agree to the snapshot model with SQLite for a frontline application. Losing 15 minutes of state was not a big deal given that we've still not had any outages related to SQLite in the 8+ years we've been using it in prod.
Surprised to hear that losing 15 min of state is not a big deal in a banking context.
I haven't worked with banks before, genuinely curious, how do they recover from something like this? Wouldn't this potentially destroy all transactions made in that time period?
To be fair they said "frontline application", not a transaction processor specifically. Also in cases where you can't lose stuff it's common to use reliable stream processing, so maybe they can reprocess old events into the system running sqlite. (This is extra general info about "important systems"; I don't know what OP is running)
At best this will increase the load on customer support with an associated reputational hit. At worst you will need to deal with questions from financial regulators.
> There is absolutely no reason you can't make SQLite go all the way.
No reason you can't, but there are a few reasons to consider whether you should. If you're using libraries which make assumptions about your database layer, they may not like the sqlite model. Holding the write lock for too long is something I experienced in quite a few projects. For example paperless-ngx will block the web interface while batch importing documents, even though there's really no reason to do that.
It's less of an issue if you write all your own code and you explicitly target sqlite. But worth keeping in mind.
Why would you run Postgres on a different box in such a scenario? A docker run command to get a Postgres instance up and running isn’t any more complicated than linking in Sqlite, maybe even simpler. And you get proper ACID transactions for free.
Sure. I think the caveats I mentioned are real though.
Using SQLite locks you out of some platforms as a service and out of some application architectures. It also means you need to be doing stuff like snapshotting your VM every x minutes.
Given minimal certainty about the scale, future, and properties of your application and organization: Postgres is the better default.
I like this sentence way more than I should.