It may be a generational thing, a matter of familiarity with computing and computers. For someone who's lived through the 80s, "handrolling postgres" doesn't sound nearly as scary as you imagine.
I expect the cost/benefit analysis of "handrolling postgres on a beefy Hetzner server" vs "navigating the menus and options of AWS services" would be different for different teams.
I'm looking at the Ansible playbooks to set up my favourite beefy bare-metal Hetzner server (128GB RAM, Ryzen 9 5950X 16-core, 450GB fast NVMe SSD, 2x 3.5TB NVMe SSDs, €155/month):
- Install Debian 11 while booted in rescue mode.
- Set up root filesystem encryption using cryptsetup and dropbear (to enter the key over SSH during boot). Involves chroot and some fun commands.
- Set up an encrypted ZFS mirror on the two additional SSDs.
- OpenSSH hardening and Teleport installation.
- Kubernetes installation (K3s)
- Connecting Kubernetes to my Argo CD instance or an existing Kubernetes cluster.
And then through GitOps:
- Installation of openebs-zfspv
- Installation of kube-prometheus-stack helm chart
- Installation of (many) PostgreSQL instances and other crap
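To give a flavour, a single step from that list (the K3s install) might look roughly like this as an Ansible task. This is a hypothetical fragment, not my actual playbook; the host group and file names are made up:

```yaml
# Hypothetical sketch: install K3s on the server and pull back the
# kubeconfig needed to register the cluster with Argo CD.
- name: Install K3s (single-node server)
  hosts: hetzner
  become: true
  tasks:
    - name: Run the official K3s install script
      ansible.builtin.shell: curl -sfL https://get.k3s.io | sh -
      args:
        creates: /usr/local/bin/k3s

    - name: Fetch kubeconfig for Argo CD registration
      ansible.builtin.fetch:
        src: /etc/rancher/k3s/k3s.yaml
        dest: ./kubeconfig-hetzner.yaml
        flat: true
```

Each bullet above expands into a playbook of about this shape; the crypto and ZFS ones are considerably hairier.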
I have been playing with Linux servers for 20 years and I find this fun and rewarding. But I do understand people saying that baremetal Hetzner is not for everyone. Especially if you start to have requirements such as "data must be encrypted at rest".
Our terraform config for RDS is about 50 lines of configuration. We get a much smaller instance for our money, but ultimately figuring out all of what you posted isn't a good use of my time (yet).
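For reference, the heart of such an RDS config is something like this: a minimal sketch with made-up names and sizes, missing the subnet groups, parameter groups, and security groups a real setup needs.

```hcl
# Hypothetical minimal RDS Postgres instance (illustrative values only).
resource "aws_db_instance" "app" {
  identifier        = "app-db"
  engine            = "postgres"
  engine_version    = "15"
  instance_class    = "db.t4g.medium"
  allocated_storage = 50

  storage_encrypted       = true  # encryption at rest, one line
  backup_retention_period = 7     # automated daily backups
  multi_az                = false

  db_name  = "app"
  username = "app"
  password = var.db_password
}
```

Encryption at rest and backups as one-line attributes is the trade being made against the playbook list above.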
Kubernetes is not a cluster orchestration software. It’s an app orchestration software that also supports multiple nodes. If you don’t understand the difference, you’re missing the value proposition of k8s.
Well, yeah. I'm not sure I do see the value prop of k8s.
The fact that it supports multi-node means that you get all of the drawbacks of a multi-node system without any of the benefits. It's single-node deployment, but worse.
Yes, that list reminds me of the exaggerated posts about "look how hard it is to install Firefox on Linux!!". Claiming that setting up a Debian Postgres server necessarily entails knowing ZFS and Kubernetes is quite a reach.
Not sure about Debian, but I believe Ubuntu Server will let you set up an mdadm mirror, LUKS (with LVM), and install and enable a Postgres server with a few buttons in the install wizard. It can even fetch SSH authorized keys from a GitHub account, covering by far the most important SSH hardening step (disabling passwords). Most hosting providers also offer a one-click deploy that may similarly add your keys and do other common config.
A better example of something that hosted databases make a lot easier out of the box would be backup, replication, and monitoring.
I'm not sure. I'd rather use a lightweight Kubernetes, or perhaps Nomad, than hand-build everything Kubernetes does without it. That sounds even more complicated. But I agree that for a single PostgreSQL instance isolated from everything, Kubernetes is overkill.
I've been on unmanaged MySQL for ~8 years now. Considered switching to managed but I'm not seeing any performance or stability issues, so I guess I'll just keep this train going until it craps out on me, then restore a backup onto a managed service, say sorry for the downtime, and that'll be that.
I get what you're saying here, but it's again the comparison with GitHub and extremely large sites that's the problem. Most of us don't run Google/FB/GitHub-scale sites, and the backup will probably fit on an external HDD; in some cases it would even be downloadable from S3 in an hour.
That's what I can't be comfortable with: "would".
How long does it take to try it? A day?
Well then try it: either it'll work flawlessly on the first try, or you'll learn that the backup you have doesn't include the logins, passwords, and the security configuration that goes with them.
Or that the dump you took lost some data because it wasn't in the right encoding.
Or the tape drive you're using needs specific drivers that aren't available on the web anymore because the company's website has shut down.
... This is a work of fiction. Any similarity to actual events might be purely coincidental...
Yeah, I also don't see the point of having everything "managed".
RDS is crazy expensive compared to self-hosting, and if I have the DB on-prem it's much faster as well. And the admin overhead is not so big, to be honest, if you are using just one DB.
If you are Google scale of course things will change, but I think 80% of loads don't need any managed AWS stuff, replication, multiple nodes, Kubernetes, etc… just periodic backups, and it runs fine.
But people nowadays just like throwing money around I guess, instead of trying to set it up for themselves.
I don't think "modern" stacks are sane enough to "handroll" anymore. Sure, you can do it, but look at the poster in this thread who details the setup of a Debian server.
Kubernetes, "Argo CD", zero-trust, the sheer amount of "management" is off the chart.
"Installation of kube-prometheus-stack helm chart"..
"Installation of openebs-zfspv"..
A lot of "modern" stacks are just complexity for the sake of complexity. Google is doing it, so clearly our 5-man startup will face the same scaling problems, or something like that.
Many of the problems these tools solve are problems that wouldn't exist building things the old fashioned way. If you stick relatively close to the metal, operating this stuff is pretty easy.
However it's notable that a very valid reason to prefer managed services as a SaaS is to cover your ass if things go wrong. Your SLA violation is their SLA violation.
The complex stacks are insane to operate yourself but very simple to operate if you use a managed offering instead, and they do provide genuine value.
I can set up a new golang app on ECS with a load balancer and database, with a CI/CD pipeline, with 0-downtime updates, in about 30 minutes. Most of that time is waiting for AWS to give me a load balancer. Our work applications have been running with this setup for over 2 years, and the only thing I've done with infra in that time is adjust instance sizes and bump a MySQL version.
To be fair, I can set up a new service on bare metal in minutes too, mostly because I don't need to set up everything from scratch.
I don't really need to set up a database or load balancer or anything like that because it already exists on the server. Just create a new database schema, new systemd service, new nginx rule.
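Concretely, the per-service work is on the order of this systemd unit (a hypothetical sketch; the app name and paths are made up), plus an nginx location block proxying to its port:

```ini
# Hypothetical /etc/systemd/system/myapp.service
[Unit]
Description=myapp web service
After=network.target postgresql.service

[Service]
ExecStart=/usr/local/bin/myapp
User=myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then `systemctl enable --now myapp` and an nginx reload, and the new service is live.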
And more to the point, learning how to "handroll" Postgres could be beneficial. You could have learned about options for limiting the amount of memory, etc
Sure, managed is easier; use it when you can afford it easily. But before that, it's better to see how things are going (memory usage, disk usage, bottlenecks, etc.)
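For example, these are the kinds of memory knobs you'd meet in postgresql.conf. The values here are illustrative numbers for a modest box, not tuning advice:

```ini
# postgresql.conf fragment (illustrative values, not recommendations)
shared_buffers = 2GB          # main shared cache; often ~25% of RAM
work_mem = 32MB               # per sort/hash op, per connection; multiplies fast
maintenance_work_mem = 512MB  # used by VACUUM, CREATE INDEX
effective_cache_size = 6GB    # planner hint: shared_buffers + OS page cache
```

Knowing what these do pays off even on managed offerings, since most expose them through parameter groups anyway.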
Or you could use sqlite until you need postgres. I have to admit I reach for postgres immediately when in many cases sqlite would have served me just as well.
SQLite seems to be gaining popularity with even larger projects, which is surprising to me. As I see it, the big value prop of SQLite is that it runs in-process, which for a webapp is worth almost nil?
Other than that, it's not like queries are any simpler and the "simple" type system is, in my opinion, not a feature. I get that some might disagree with that.
Is there some other reason why you would prefer it?
It has an extremely low barrier of entry while providing the features of a relational database when all you need is a local data store. The files are trivially easy to transport using standard tools when needed. I've been in back-end automation/integration for my entire career and use these kinds of things all the time. The overhead of maintaining a full networked RDBMS isn't always something I want (or need) to do.
I use it in my production SaaS serving around 4 million requests per month on one of the lowest DigitalOcean tiers. The big ones for me were cost, operating simplicity and performance. I don’t need a separate process or server running which has saved me some money and time, and the app’s workload doesn’t need a ton of inserts so the speed is blazing fast.
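That simplicity is easy to demo with Python's stdlib alone; the file name and schema here are made up:

```python
import os
import shutil
import sqlite3
import tempfile

# The entire database is one file; there is no server process to run.
tmp = tempfile.mkdtemp()
db = os.path.join(tmp, "app.db")

conn = sqlite3.connect(db)
conn.execute("CREATE TABLE hits (path TEXT, n INTEGER)")
conn.execute("INSERT INTO hits VALUES (?, ?)", ("/home", 3))
conn.commit()

(total,) = conn.execute("SELECT SUM(n) FROM hits").fetchone()
print(total)  # 3
conn.close()

# "Backup" is a file copy (for a live DB, prefer conn.backup()).
shutil.copy(db, db + ".bak")
```

No daemon, no connection string, no credentials; the whole operational surface is one file on disk.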
It may not quite have all of the JSON features of Postgres, but recently JSON handling has become way more usable in SQLite. More than usable, for sure.
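For instance, `json_extract` works out of the box in any reasonably recent build (the json1 functions are compiled into SQLite by default since 3.38; older builds may lack them):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (body TEXT)")
conn.execute("INSERT INTO events VALUES (?)", ('{"user": "ada", "n": 2}',))

# json_extract pulls values out of JSON stored as plain text
user = conn.execute(
    "SELECT json_extract(body, '$.user') FROM events"
).fetchone()[0]
print(user)  # ada
conn.close()
```

Not the full Postgres jsonb toolbox, but enough for the common "filter on a field inside a JSON column" cases.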
Where possible, go with simple but abstracted cloud storage, cloud tables, and then a managed cloud DB. We use Azure mostly right now, but our storage system works across Azure Storage, Amazon S3, Google Cloud, and others. For tables, Azure Tables mainly. For a database with filtering/paging, better performance, and ACID compliance: CosmosDB currently, which is a dream with its differing APIs (SQL, Mongo, Cassandra, Tables style). The more you can avoid vendor or dev lock-in the better, so: simple formats/messaging/routing and abstracted specifics/implementations.
When you store data in storage or a cloud DB, the scaling is "infinite" and you can also snapshot or back up to another one; you never worry about data.
The front ends and APIs are mostly repos pushed to app/web services, with everything else in data storage. Super simple, and anywhere you need some special service, it can be serverless or a dedicated setup: maybe an RDBMS, chat server, network server, or WebRTC/socket endpoint that interacts with the simple side. These are managed as well if possible, though not always. Additionally, build for cheap, horizontal scaling on web/real-time servers. Vertical scaling and sharding are for suckers.
Side note: CosmosDB is like a combination of NoSQL, document databases, and GraphQL, and it is ACID compliant; you can do REST or SQL, and it can even wrap MongoDB and Cassandra and make them ACID compliant. It really feels like the best way. Not many have all that plus ACID compliance. Not even Amazon Redshift has that; DynamoDB does if specified, Google Firestore does if specified. I used to be big on RDBMSs, Oracle then MSSQL then PostgreSQL, and those are great for backing/reporting etc., but CosmosDB combines all the power of an RDBMS, NoSQL, and document databases, ACID compliant, with little worry about scale. It is vendor lock-in to Azure, which you can route around with platform abstraction, but currently it can't be beat. As you've got that clean API layer you could change later, but the best way is limited/clean and, if possible, non-breaking-change API layers/signatures.
I really do not share your experience with CosmosDB; some of its attributes that currently make me miserable include...
- Scaling is not infinite: it's up to 20GB per partition key (1), which can't be changed after document creation.
- One set of global indices, no equivalent to DynamoDB's secondary indices.
- Still can't run their docker container on mac (2) natively.
- Weird SQL-like dialect that's required for all but the simplest queries. JOINs are spectacularly awkward.
- Tooling is horrific. Because of the previous point, no existing tooling works for it (and nobody is building tooling for a DB with such minimal market share). For example, I needed to manually delete 30 or so documents/rows yesterday; the only way to achieve this is with 30 separate click-to-deletes in their UI.
- Minimises analytics options. There exists a plethora of business-intelligence-type tools that will happily sit on top of most common DBs. None of them like CosmosDB. So you're stuck with Synapse Link or whatever MS calls it now.
Overall it seems to combine the worst aspects of both RDS and document stores, with the worst aspects of both traditional and serverless infrastructure.
The partition key limitation is something you can work out with smart partitioning and horizontal scaling. We do the same already with storage/tables to prevent large data blocks at a smaller limit even for speed/lookups/map.
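One common flavour of that smart partitioning, sketched in plain Python (not Cosmos-specific API code; the function and names are made up): derive a synthetic partition key so no single logical key outgrows the per-partition size cap.

```python
import hashlib

def partition_key(tenant_id: str, doc_id: str, buckets: int = 16) -> str:
    """Spread one tenant's documents over `buckets` synthetic partition
    keys so no single key grows past the store's per-partition size cap."""
    h = int(hashlib.sha256(doc_id.encode()).hexdigest(), 16)
    return f"{tenant_id}-{h % buckets}"

# Point reads for a known doc_id still hit exactly one partition;
# whole-tenant scans fan out across all `buckets` partitions.
pk = partition_key("acme", "order-123")
```

The trade-off is that queries scoped only to the tenant become cross-partition fan-outs, which is exactly the kind of design consideration ACID-plus-scale stores push onto you.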
Tooling could be improved and will be; it is fairly new still, and the Azure Cosmos DB Emulator is not bad.
There is a CosmosDB Synapse setup that allows more analytics/intel on top like you said but same with other NoSQL, takes a bit to get worked in.
I actually like the flexibility of query types, and that they include SQL makes it a bit more standard and somewhat less vendor lock-in. You can use the other syntaxes as well: Mongo/Cassandra/Tables. For filtering, the SQL side isn't bad, but most of what we do is flat/associative and not heavily normalized. For most of our data we are very cache-heavy as well, to reduce DB hits and retries.
ACID compliance is huge and there are some design considerations.
Managed, and no need to handle backups thanks to snapshots. PG still has management concerns with size, logs, access, scale, etc. Nothing you have to do with cloud storage. I do love me some PostgreSQL but use it less and less, except for reporting or heavily filtered needs. Horizontal over vertical, for comfort and simplicity.
Having no skin in the game, I feel this is a very interesting view, especially when you are offering a managed service built on top of another.
Go through enough levels of dependencies, and someone who can do a few layers of the onion in-house can sweep in with competitive pricing/offerings, à la sherlocking.
Big agree here.
Yes, you can save stupid money by handrolling postgres on an extremely beefy Hetzner server, or you can pay someone else and keep building your product: https://onlineornot.com/self-hosting-vs-managed-services-dec...
This isn't to say, "don't bother learning how to do it yourself", but more "learn to pick your battles".