It may be a generational thing, a matter of familiarity with computing and computers. For someone who's lived through the 80s, "handrolling postgres" doesn't sound nearly as scary as you imagine.
I expect the cost/benefit analysis of "handrolling postgres on a beefy Hetzner server" vs "navigating the menus and options of AWS services" would be different for different teams.
I'm looking at the Ansible playbooks to set up my favourite beefy bare-metal Hetzner server (128GB RAM, Ryzen 9 5950X 16-core, 450GB fast NVMe SSD, 2x 3.5TB NVMe SSDs, €155/month):
- Install Debian 11 while booted in rescue mode.
- Set up root filesystem encryption using cryptsetup and dropbear (to enter the key over SSH during boot). Involves chroot and some fun commands.
- Set up an encrypted ZFS mirror on the two additional SSDs.
- OpenSSH hardening and Teleport installation.
- Kubernetes installation (K3s)
- Connecting Kubernetes to my Argo CD instance or an existing Kubernetes cluster.
And then through GitOps:
- Installation of openebs-zfspv
- Installation of kube-prometheus-stack helm chart
- Installation of (many) PostgreSQL instances and other crap
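To give a flavour, a single step from that list (the K3s install) might look roughly like this as an Ansible task. This is a hypothetical fragment, not my actual playbook; the host group and file names are made up:

```yaml
# Hypothetical sketch: install K3s on the server and pull back the
# kubeconfig needed to register the cluster with Argo CD.
- name: Install K3s (single-node server)
  hosts: hetzner
  become: true
  tasks:
    - name: Run the official K3s install script
      ansible.builtin.shell: curl -sfL https://get.k3s.io | sh -
      args:
        creates: /usr/local/bin/k3s

    - name: Fetch kubeconfig for Argo CD registration
      ansible.builtin.fetch:
        src: /etc/rancher/k3s/k3s.yaml
        dest: ./kubeconfig-hetzner.yaml
        flat: true
```

Each bullet above expands into a playbook of about this shape; the crypto and ZFS ones are considerably hairier.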
I have been playing with Linux servers for 20 years and I find this fun and rewarding. But I do understand people saying that baremetal Hetzner is not for everyone. Especially if you start to have requirements such as "data must be encrypted at rest".
Our terraform config for RDS is about 50 lines of configuration. We get a much smaller instance for our money, but ultimately figuring out all of what you posted isn't a good use of my time (yet).
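For reference, the heart of such an RDS config is something like this: a minimal sketch with made-up names and sizes, missing the subnet groups, parameter groups, and security groups a real setup needs.

```hcl
# Hypothetical minimal RDS Postgres instance (illustrative values only).
resource "aws_db_instance" "app" {
  identifier        = "app-db"
  engine            = "postgres"
  engine_version    = "15"
  instance_class    = "db.t4g.medium"
  allocated_storage = 50

  storage_encrypted       = true  # encryption at rest, one line
  backup_retention_period = 7     # automated daily backups
  multi_az                = false

  db_name  = "app"
  username = "app"
  password = var.db_password
}
```

Encryption at rest and backups as one-line attributes is the trade being made against the playbook list above.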
Kubernetes is not a cluster orchestration software. It’s an app orchestration software that also supports multiple nodes. If you don’t understand the difference, you’re missing the value proposition of k8s.
Well, yeah. I'm not sure I do see the value prop of k8s.
The fact that it supports multi-node means that you get all of the drawbacks of a multi-node system without any of the benefits. It's single-node deployment, but worse.
Yes, that list reminds me of the exaggerated posts about "look how hard it is to install Firefox on Linux!!". Claiming that setting up a Debian Postgres server necessarily entails knowing ZFS and Kubernetes is quite a reach.
Not sure about Debian, but I believe Ubuntu Server will let you set up an mdadm mirror, LUKS (with LVM), and install and enable a Postgres server with a few buttons in the install wizard. It can even fetch SSH authorized keys from a GitHub account, covering by far the most important SSH hardening step (disabling passwords). Most hosting providers also offer a one-click deploy that may similarly add your keys and do other common config.
A better example of something that hosted databases make a lot easier out of the box would be backup, replication, and monitoring.
I'm not sure. I'd rather use a lightweight Kubernetes, or perhaps Nomad, than hand-build everything Kubernetes does without it. That sounds even more complicated. But I agree that for a single PostgreSQL instance isolated from everything, Kubernetes is overkill.
I've been on unmanaged MySQL for ~8 years now. Considered switching to managed but I'm not seeing any performance or stability issues, so I guess I'll just keep this train going until it craps out on me, then restore a backup onto a managed service, say sorry for the downtime, and that'll be that.
I get what you're saying here, but it's again the comparison with GitHub and extremely large sites that's the problem. Most of us don't run Google/FB/GitHub-scale sites, and the backup will probably fit on an external HDD; in some cases it would even be downloadable from S3 in an hour.
That's what I can't be comfortable with: "would".
How long does it take to try it? A day?
Well then try it: either it'll work flawlessly on the first try, or you'll learn that the backup you have doesn't include the logins, passwords, and the security configuration that goes with them.
Or that the dump you took lost some data because it wasn't in the right encoding.
Or the tape drive you're using needs specific drivers that aren't available on the web anymore because the company's website has shut down.
... This is a work of fiction. Any similarity to actual events might be purely coincidental...
Yeah, I also don't see the point of having everything "managed".
RDS is crazy expensive compared to self-hosting, and if I have the DB on-prem it's much faster as well. And the admin overhead is not so big, to be honest, if you are using just one DB.
If you are Google scale of course things will change, but I think 80% of loads don't need any managed AWS stuff, replication, multiple nodes, Kubernetes, etc… just periodic backups, and it runs fine.
But people nowadays just like throwing money around I guess, instead of trying to set it up for themselves.
I don't think "modern" stacks are sane enough to "handroll" anymore. Sure, you can do it, but look at the poster in this thread who details the setup of a Debian server.
Kubernetes, "Argo CD", zero-trust, the sheer amount of "management" is off the chart.
"Installation of kube-prometheus-stack helm chart"..
"Installation of openebs-zfspv"..
A lot of "modern" stacks are just complexity for the sake of complexity. Google is doing it, so clearly our 5-man startup will face the same scaling problems, or something like that.
Many of the problems these tools solve are problems that wouldn't exist building things the old fashioned way. If you stick relatively close to the metal, operating this stuff is pretty easy.
However it's notable that a very valid reason to prefer managed services as a SaaS is to cover your ass if things go wrong. Your SLA violation is their SLA violation.
The complex stacks are insane to operate yourself but very simple to operate if you use a managed offering instead, and they do provide genuine value.
I can set up a new golang app on ECS with a load balancer and database, with a CI/CD pipeline, with 0-downtime updates, in about 30 minutes. Most of that time is waiting for AWS to give me a load balancer. Our work applications have been running with this setup for over 2 years, and the only thing I've done with infra in that time is adjust instance sizes and bump a MySQL version.
To be fair, I can set up a new service on bare metal in minutes too, mostly because I don't need to set up everything from scratch.
I don't really need to set up a database or load balancer or anything like that because it already exists on the server. Just create a new database schema, new systemd service, new nginx rule.
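Concretely, the per-service work is on the order of this systemd unit (a hypothetical sketch; the app name and paths are made up), plus an nginx location block proxying to its port:

```ini
# Hypothetical /etc/systemd/system/myapp.service
[Unit]
Description=myapp web service
After=network.target postgresql.service

[Service]
ExecStart=/usr/local/bin/myapp
User=myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then `systemctl enable --now myapp` and an nginx reload, and the new service is live.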
And more to the point, learning how to "handroll" Postgres could be beneficial. You could have learned about options for limiting the amount of memory, etc
Sure, managed is easier; use it when you can afford it easily. But before that, it's better to see how things are going (memory usage, disk usage, bottlenecks, etc.)
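For example, these are the kinds of memory knobs you'd meet in postgresql.conf. The values here are illustrative numbers for a modest box, not tuning advice:

```ini
# postgresql.conf fragment (illustrative values, not recommendations)
shared_buffers = 2GB          # main shared cache; often ~25% of RAM
work_mem = 32MB               # per sort/hash op, per connection; multiplies fast
maintenance_work_mem = 512MB  # used by VACUUM, CREATE INDEX
effective_cache_size = 6GB    # planner hint: shared_buffers + OS page cache
```

Knowing what these do pays off even on managed offerings, since most expose them through parameter groups anyway.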
Or you could use sqlite until you need postgres. I have to admit I reach for postgres immediately when in many cases sqlite would have served me just as well.
SQLite seems to be gaining popularity with even larger projects, which is surprising to me. As I see it, the big value prop of SQLite is that it runs in-process, which for a webapp is worth almost nil?
Other than that, it's not like queries are any simpler and the "simple" type system is, in my opinion, not a feature. I get that some might disagree with that.
Is there some other reason why you would prefer it?
It has an extremely low barrier of entry while providing the features of a relational database when all you need is a local data store. The files are trivially easy to transport using standard tools when needed. I've been in back-end automation/integration for my entire career and use these kinds of things all the time. The overhead of maintaining a full networked RDBMS isn't always something I want (or need) to do.
I use it in my production SaaS serving around 4 million requests per month on one of the lowest DigitalOcean tiers. The big ones for me were cost, operating simplicity and performance. I don’t need a separate process or server running which has saved me some money and time, and the app’s workload doesn’t need a ton of inserts so the speed is blazing fast.
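That simplicity is easy to demo with Python's stdlib alone; the file name and schema here are made up:

```python
import os
import shutil
import sqlite3
import tempfile

# The entire database is one file; there is no server process to run.
tmp = tempfile.mkdtemp()
db = os.path.join(tmp, "app.db")

conn = sqlite3.connect(db)
conn.execute("CREATE TABLE hits (path TEXT, n INTEGER)")
conn.execute("INSERT INTO hits VALUES (?, ?)", ("/home", 3))
conn.commit()

(total,) = conn.execute("SELECT SUM(n) FROM hits").fetchone()
print(total)  # 3
conn.close()

# "Backup" is a file copy (for a live DB, prefer conn.backup()).
shutil.copy(db, db + ".bak")
```

No daemon, no connection string, no credentials; the whole operational surface is one file on disk.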
It may not quite have all of the JSON features of Postgres, but recently JSON handling has become way more usable in SQLite. More than usable, for sure.
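For instance, `json_extract` works out of the box in any reasonably recent build (the json1 functions are compiled into SQLite by default since 3.38; older builds may lack them):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (body TEXT)")
conn.execute("INSERT INTO events VALUES (?)", ('{"user": "ada", "n": 2}',))

# json_extract pulls values out of JSON stored as plain text
user = conn.execute(
    "SELECT json_extract(body, '$.user') FROM events"
).fetchone()[0]
print(user)  # ada
conn.close()
```

Not the full Postgres jsonb toolbox, but enough for the common "filter on a field inside a JSON column" cases.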
Where possible, go with simple but abstracted cloud storage, cloud tables, and then a managed cloud DB. We use Azure mostly right now, but our storage system works across Azure Storage, Amazon S3, Google Cloud, and others. For tables, Azure Tables mainly. For a database with filtering/paging, better performance, and ACID compliance: CosmosDB currently, which is a dream with its differing APIs (SQL, Mongo, Cassandra, Tables style). The more you can avoid vendor or dev lock-in the better, so: simple formats/messaging/routing and abstracted specifics/implementations.
When you store data in storage or a cloud DB, the scaling is "infinite" and you can also snapshot or back up to another one; you never worry about data.
The front ends and APIs are mostly repos pushed to app/web services, with everything else in data storage. Super simple, and anywhere you need some special service, it can be serverless or a dedicated setup: maybe an RDBMS, chat server, network server, or WebRTC/socket endpoint that interacts with the simple side. These are managed as well if possible, though not always. Additionally, build for cheap, horizontal scaling on web/real-time servers. Vertical scaling and sharding are for suckers.
Side note: CosmosDB is like a combination of NoSQL, document databases, and GraphQL, and it is ACID compliant; you can do REST or SQL, and it can even wrap MongoDB and Cassandra and make them ACID compliant. It really feels like the best way. Not many have all that plus ACID compliance. Not even Amazon Redshift has that; DynamoDB does if specified, Google Firestore does if specified. I used to be big on RDBMSs, Oracle then MSSQL then PostgreSQL, and those are great for backing/reporting etc., but CosmosDB combines all the power of an RDBMS, NoSQL, and document databases, ACID compliant, with little worry about scale. It is vendor lock-in to Azure, which you can route around with platform abstraction, but currently it can't be beat. As you've got that clean API layer you could change later, but the best way is limited/clean and, if possible, non-breaking-change API layers/signatures.
I really do not share your experience with CosmosDB; some of its attributes that currently make me miserable include...
- Scaling is not infinite: it's up to 20GB per partition key (1), which can't be changed after document creation.
- One set of global indices, no equivalent to DynamoDB's secondary indices.
- Still can't run their docker container on mac (2) natively.
- Weird SQL-like dialect that's required for all but the simplest queries. JOINs are spectacularly awkward.
- Tooling is horrific. Because of the previous point, no existing tooling works for it (and nobody is building tooling for a DB with such minimal market share). For example, I needed to manually delete 30 or so documents/rows yesterday; the only way to achieve this is with 30 separate click-to-deletes in their UI.
- Minimises analytics options. There exists a plethora of business-intelligence-type tools that will happily sit on top of most common DBs. None of them like CosmosDB. So you're stuck with Synapse Link or whatever MS calls it now.
Overall it seems to combine the worst aspects of both RDS and document stores, with the worst aspects of both traditional and serverless infrastructure.
The partition key limitation is something you can work out with smart partitioning and horizontal scaling. We do the same already with storage/tables to prevent large data blocks at a smaller limit even for speed/lookups/map.
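One common flavour of that smart partitioning, sketched in plain Python (not Cosmos-specific API code; the function and names are made up): derive a synthetic partition key so no single logical key outgrows the per-partition size cap.

```python
import hashlib

def partition_key(tenant_id: str, doc_id: str, buckets: int = 16) -> str:
    """Spread one tenant's documents over `buckets` synthetic partition
    keys so no single key grows past the store's per-partition size cap."""
    h = int(hashlib.sha256(doc_id.encode()).hexdigest(), 16)
    return f"{tenant_id}-{h % buckets}"

# Point reads for a known doc_id still hit exactly one partition;
# whole-tenant scans fan out across all `buckets` partitions.
pk = partition_key("acme", "order-123")
```

The trade-off is that queries scoped only to the tenant become cross-partition fan-outs, which is exactly the kind of design consideration ACID-plus-scale stores push onto you.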
Tooling could be improved and will be; it is fairly new still, and the Azure Cosmos DB Emulator is not bad.
There is a CosmosDB Synapse setup that allows more analytics/intel on top like you said but same with other NoSQL, takes a bit to get worked in.
I actually like the flexibility of query types, and that they include SQL makes it a bit more standard and somewhat less vendor lock-in. You can use the other syntaxes as well: Mongo/Cassandra/Tables. For filtering, the SQL side isn't bad, but most of what we do is flat/associative and not heavily normalized. For most of our data we are very cache-heavy as well, to reduce DB hits and retries.
ACID compliance is huge and there are some design considerations.
Managed, and no need to handle backups thanks to snapshots. PG still has management concerns with size, logs, access, scale, etc. Nothing you have to do with cloud storage. I do love me some PostgreSQL but use it less and less, except for reporting or heavily filtered needs. Horizontal over vertical, for comfort and simplicity.
Having no skin in the game, I feel this is a very interesting view, especially when you are offering a managed service built on top of another.
Go through enough levels of dependencies, and someone who can do a few layers of the onion in-house can sweep in with competitive pricing/offerings, à la sherlocking.
Big agree here.
Yes, you can save stupid money by handrolling postgres on an extremely beefy Hetzner server, or you can pay someone else and keep building your product: https://onlineornot.com/self-hosting-vs-managed-services-dec...
This isn't to say, "don't bother learning how to do it yourself", but more "learn to pick your battles".