>By default, PostgreSQL listens on a TCP port 5432.
This post seems to outright state that by default postgres is listening to everyone via TCP for connection.
This is not true.
Unless you edit pg_ident.conf, your postgres install will not listen for connections from outside localhost. So, while it's correct to say that it listens on TCP port 5432, there is a very narrow limit on whom it's listening to. Namely, the same machine.
Postgres is pretty secure by default. It doesn't allow external connections. It also requires a username and password to connect with permissions on the database you're connecting to.
Compare that to something like redis. They at least used to listen for external connections by default and not even require a password to connect. I can imagine it's still very easy to find all kinds of interesting stuff snooping around on port 6379.
>Compare that to something like redis. They at least used to listen for external connections by default and not even require a password to connect
Not anymore since version 3.2.0 [0]
>Unfortunately many users fail to protect Redis instances from being accessed from external networks. Many instances are simply left exposed on the internet with public IPs. For this reasons since version 3.2.0, when Redis is executed with the default configuration (binding all the interfaces) and without any password in order to access it, it enters a special mode called protected mode. In this mode Redis only replies to queries from the loopback interfaces, and reply to other clients connecting from other addresses with an error, explaining what is happening and how to configure Redis properly.
Yes, the article is a bit weird. If you want to require a reverse ssh tunnel, you should be able to leave postgres in its default config - typically listening on 5432 on localhost (or, in its upstream default - only on a domain socket, I believe).
Now, you probably want to allow services to connect without needing to speak ssh - so you probably do want a bind whitelist, an IP whitelist and ssl - but the article is off to a bad start by being wrong about defaults.
To be super clear, because it can be less than perfectly obvious, this means on `localhost` but also on loopback. There is no 'real' network device talking to this port.
Hmm, I think the default pg_hba.conf uses "trust" and so will allow passwordless connections as postgres on localhost; you just have to specify the username (psql will use your OS username by default).
This is the case for the official docker image at least, and I'm fairly certain also true of distro package managers
> Unless you edit pg_ident.conf, your postgres install will not listen for connections from outside localhost.
I don't know what the defaults are, but pg_ident.conf has absolutely nothing to do with this. The main configuration file (I think postgresql.conf usually) has listen_addresses, which controls the addresses on which postgres listens, as you might guess.
pg_hba.conf (not pg_ident.conf) controls the authentication methods the server asks from the client, depending on how they're connecting.
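If you want to check what your particular install is actually doing, both are visible from psql (as a superuser); a rough sketch, assuming PostgreSQL 10+ for pg_hba_file_rules. On a stock install you'd typically see listen_addresses = 'localhost' plus local/127.0.0.1 rules only:

```sql
-- Where the server is listening (an empty string would mean Unix sockets only)
SHOW listen_addresses;

-- The effective pg_hba.conf rules: who may connect from where, and with
-- which authentication method (trust, peer, md5, scram-sha-256, ...)
SELECT line_number, type, database, user_name, address, auth_method
FROM pg_hba_file_rules;
```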
You're technically correct, but the difference is immaterial unless we're talking about outright pre-auth exploits. Or exploits of the auth itself, I guess.
(I think I recall exactly one in the history of PostgreSQL since I started using it, but it is what it is.)
Modern Linux security thinking is that any sort of code running inevitably leads to root privilege.
Put another way, any user can become root through privilege escalation, so access control is pointless, since any untrusted user can take over the machine.
The real unit of security is the whole OS (VM), not its internal user boundaries.
This kind of mindset exists but is not a consensus. And even if it became consensus, it would still take many, many years for most developers to stop relying on ssh forwarding with localhost listeners for security.
Also, the loopback interface is used as a networking interconnect or guest->host channel for sandboxed containers and VMs, so it's security-sensitive in this way.
Air gap means there’s no physical connection. Like literal air between one wire and the other. An air gap should never be thought of as something like a "logical air gap", because no such thing exists. Either the cable is physically plugged in or it isn’t.
Pretty sure that the common definition (also the one from Wikipedia [0]) is that there's no connection to a network (including wireless ones; the medium is not an important part in the concept).
The article appears to say that a database/DBMS in the ideal world is not accessible over network at all. That is, apparently only accessible by users who have physical access to the machine it is on.
"Unfortunately, this sort of air-gapped setup is not something PostgreSQL supports out-of-the-box."
Not sure what they mean by "out of the box", but you can make the listen_addresses list empty:
"listen_addresses (string)
...If the list is empty, the server does not listen on any IP interface at all, in which case only Unix-domain sockets can be used to connect to it..."
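A minimal sketch of doing that from SQL (assuming superuser access; listen_addresses only changes on a server restart):

```sql
-- Stop listening on TCP entirely; only the Unix-domain socket remains.
-- ALTER SYSTEM writes this to postgresql.auto.conf; restart to apply.
ALTER SYSTEM SET listen_addresses = '';
```

Clients then connect over the socket, e.g. `psql -h /var/run/postgresql` on Debian-style packaging (the socket directory varies by build).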
I’m sure the article author meant, by saying “out-of-the-box,” the features available on initial, default install.
I disagree with the author: a fresh install does indeed support (i.e. it allows one to configure, without rebuilding) not talking over a network at all.
That's a good question. I agree with your definition of AG. However, the only time a database (or any other kind of data store) would be useful in an AG setting would be if it's part of an AG system including network and other computers.
Perhaps we need another somewhat similar term. It's like null-routing or firewalling devices on your network.. they're technically "connected" but if they cannot dial out they're in some ways gapped. (This is handy for dubious quality IoT devices, they can't phone home, auto-patch to drop features, or share your usage information with $corp, but still respond to local network commands).
To some extent tunneling is security through obscurity (an SSH tunnel has moved the port you need to secure from 5432 to 22)
I personally secure my postgres instances by putting them in a siloed VPC without internet access. I then limit incoming connections to only those from the separate VPC containing the application server.
I then use a bastion host when I need to access ssh on the instance. The bastion host remains off and inaccessible except for when I need to perform maintenance.
The advantage of this is that there is no "always-open" access to the instance.
Probably because the blog post is an advertisement for their product, which already allows you to implement bastion hosts as you describe.
From the bottom:
> Databases do not need to be exposed on the public Internet and can safely operate in air-gapped environments using Teleport’s built-in reverse tunnel subsystem.
They've been around a while, and it actually seems quite cool. Bastion hosts are simple to set up, sure, but Teleport adds a whole bunch of porcelain on top, e.g. integration with SSO, a web UI for administration, etc.
Haven't used them myself, but I wouldn't be against trying it if I were in the market for something like that.
Yep, and your answer is the better default answer.
Bastion<->app host<->DB
Bastion can talk to the net.
App host can talk outbound but inbound only accepts bastion and DB.
DB can only talk with app host.
Obviously, you harden everything appropriately... But with this arrangement, it's very difficult to penetrate this sort of network. Think of it as a network that as a whole is default-deny.
Currently I have postgres/timescaledb running on EC2 in a VPC which has no internet access. I use a VPN tunnel to access the private local subnet, and security group settings that allow access from only my machine and one more. I usually have a jump server that I use, but I don't want to keep my ssh keys on the server or leave it behind.
One alternative to keeping the bastion off is to restrict access to the bastion by IP address. Ideally, the list of IP addresses that need access to the bastion should be relatively small.
In cloud environments, it's straightforward to update network firewall rules as IP addresses change. Residential and office IP addresses don't change much, so it's not much of a hassle in my experience. That said, it can get annoying if you find yourself working on a network that rotates your IP address frequently (e.g., a hotspot).
What value is added by using a separate VPC? Equivalent restrictions can more easily be done with security groups, including on the outbound networking
I would suggest considering segregation by subnets (in addition to security groups), using public / private subnets, where any server in a private subnet (behind a nat gateway) doesn't/can't have a public ip address, and therefore cannot be accessed via the public internet.
You could then use a bastion to access servers in the private subnet, or use something like AWS Session Manager which provides command line access via web browser in lieu of a bastion.
Adding more mechanisms on top is pointless when the effort could be invested in, for example, automated auditing of SGs, which is vastly more potent from a hardening perspective than adding additional layers of technical redundancy that are still exposed to the same flawed human processes.
When you reach a team of 10-20 folk on a project, stuff tends to get confusing and/or lazy with elaborate configurations. Security design is therefore more about managing that outcome through simplicity and process hardening than... well... I don't even know what threats a separate VPC protects against.
> When you reach a team of 10-20 folk on a project, stuff tends to get confusing
From the perspective of maintaining security in the least confusing way possible in a team setting, I could see a scenario where you have a VPC that only has private subnets, and no public subnets at all.
You could call it “Database VPC” and assuming your team doesn’t reconfigure the VPC to add public subnets, you can be comfortable with your team adding more EC2 servers / databases / whatever in that VPC since they would all be in private subnets inaccessible by the public internet.
And then you could have a 2nd VPC with private/public subnets that are less locked down than the database VPC, which might contain your load balancers, application servers, etc.
I suppose the benefit would be the logical separation of databases into a VPC without any public subnets. And the only way to gain access to subnets in that VPC would be through VPC peering.
(The above assumes you’re using AWS). Although after typing all that, I still think you can accomplish a comparably secure (on a network level) architecture using security groups, or even separate subnets, without separate VPCs.
In general I try to avoid multiple VPCs because VPC peering in AWS can get tricky (or impossible if both VPCs have overlapping CIDR blocks).
You can put other machines, with a highly restrictive set of network rules, at the boundary, so that data crosses into and out of the network only in very controlled ways.
I am curious. Most monitoring stacks include something like Telegraf/Prometheus to collect metrics and send them to influxdb. How would you do this via email?
Inside the bastion network/VPC, there can be logging, monitoring, and AD/kerb+openldap stuff to do the requisite things. And your logging and monitoring can send trigger emails when they catch bad things.
Your inbox needs to be an automatable inbox, say controlled by bleeder or another flow engine. From there you can build a full messaging dashboard and have it done in something that could pass a very stringent audit.
A good overview, but the most secure way is to not give any table-level access privileges and only allow access using SECURITY DEFINER stored procedures (the PG equivalent of the setuid bit). For instance if you have a table of users with login and hashed salted password, you could implement a check_password() procedure. If the app account is compromised, it would not have access to the password hashes or even be able to enumerate users.
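A minimal sketch of that idea, assuming pgcrypto for the hashing and invented table/role names (a users table with login and password_hash columns, and an app_role the application connects as):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Runs with the owner's privileges, not the caller's (the setuid analogy).
CREATE FUNCTION check_password(p_login text, p_password text)
RETURNS boolean
LANGUAGE sql
SECURITY DEFINER
SET search_path = public
AS $$
    SELECT EXISTS (
        SELECT 1
        FROM users
        WHERE login = p_login
          AND password_hash = crypt(p_password, password_hash)
    );
$$;

-- The application role can call the function but never read the table.
REVOKE ALL ON users FROM app_role;
REVOKE EXECUTE ON FUNCTION check_password(text, text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION check_password(text, text) TO app_role;
```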
I have the impression that many developers actively avoid stored procedures, triggers, or any 'complex' database features because they don't fit well into a typical CI/CD pipeline, which deals only with application code.
It is quite possible to test stored procedures; it just requires some additional work/infrastructure.
It's mostly a familiarity and education problem, as well as the fact MySQL didn't have good support for this. Developers are exposed to procedural programming languages at university, but seldom to databases, let alone stored procedures. Unit tests can be written in SQL or PL/PGSQL and will typically run much faster than a CI pipeline.
When used for business logic they also execute about 20x faster than the same logic encoded in a client, and in far fewer LOC. Getting rid of all those round-trips has a huge effect!
Running a slow hash operation that should take a non-negligible amount of CPU on your database server, which uses a single core per process, isn't a great idea.
That's not a rag on stored procedures more generally. It's just that this specific use case scales poorly to a large number of operations, as it's inherently CPU-bound.
You can do the slow pbkdf2 hash on your webserver and use the result for your stored procedure to check the password. If the password column is only visible to the stored procedure you have used the best features of both systems.
I agree with the password example. I don't agree with people updating queries willy-nilly. Updated queries should go through the same life-cycle of testing (etc) as code, meaning they can simply go out with continuous application deployments anyways.
I mean, it's likely you have some other stored procedure that does enumerate users, for use by app administrators. Ideally it doesn't return password hashes though, since they are really not useful for humans to look at. Also ideally, the admin stored procedures are not executable by the end-user app.
Usually an application uses a connection pool to connect to the database server. I'm not sure if that pattern can be used when different requests need different database roles.
It's doable. You have a web role in the DB that's used for the public-facing part of the website, and then an admin role that's used for the backoffice part. You can then separate these backends onto different machines, or at least different UNIX users, so that when someone breaks in via the public-facing backend, he doesn't immediately get access to everything but still has to work for it a bit.
It's not that simple, though: you still have to make sure the public backend user can't access web logs that may reveal the session ID of a backoffice admin account or other information useful for breaking into other parts of the website.
Eventually the attacker will gain access, but this is useful to slow him down enough that there's a chance he'll give up, or you notice something suspicious.
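A rough sketch of that kind of split on the database side (role and table names are made up):

```sql
-- Role used by the public-facing backend: only what the website needs.
CREATE ROLE web_app LOGIN PASSWORD 'changeme-web';
GRANT SELECT, INSERT ON orders TO web_app;

-- Role used by the backoffice backend, running as a different UNIX user
-- (ideally on a different machine): broader rights.
CREATE ROLE backoffice LOGIN PASSWORD 'changeme-admin';
GRANT SELECT, INSERT, UPDATE, DELETE ON orders, customers TO backoffice;
```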
For example pgbouncer allows you to connect via multiple roles/credentials to a database. It's just about how you configure your pool.
Not to mention that with the microservices fad you could have different server processes with different credentials in their connection pools for different operations.
It feels weird to me that this blog post would suggest acquiring Let's Encrypt certificates for encrypting connections. While it's great for something public facing that needs a certificate clients already trust by default, creating certificates for things that you probably don't want being public - like your database backend - just makes them more discoverable. For example, certificate transparency logs mean that anybody can see what certificates Let's Encrypt has handed over to you. Example: https://crt.sh/?q=ycombinator.com
Of course, this is a security through obscurity type of approach, and you'd want to secure your database whether or not somebody knew where it was running. But there's a difference between somebody seeing that you've just created `staging-psql.foo.com` that you might still be configuring, and the passive background noise of internet port scanning that's a little less targeted.
I don’t understand the value of Certificate Transparency: I could see triggering an email to the owner of a domain every time a certificate is issued, but what’s the value of a public log? It seems to me that it just increases the attack surface of your services.
This is surely a silly question (and probably a lot of devops folks will think I am an idiot), but what do you recommend to secure a PostgreSQL instance that is accessed from different IPs? We use Azure PostgreSQL, which I guess takes care of most of the security, but it allows us to set up firewall rules, such as only allowing connections from established IPs.
My (our) problem is that we use a lot of AWS Lambda functions that read and write to the database, and they always execute from different (dynamic) IPs, so what is the best solution in this case?
Seems the old adage of "beware of blog posts bearing gifts" strikes again.
For a blog post masquerading as a how-to on securing Postgres, the fact that it only fleetingly mentions the word "function" once is not cool.
One of the biggest things you can do for Postgres (or any database for that matter) is enforce the use of stored procedures ("functions" in PG-speak) rather than direct SQL queries.
SQL Injection attacks are common as muck. Stored procedures are a quick and easy way to mitigate them.
Stored procedures also have the added bonus of allowing the DBA to remain in control and ensure a higher quality of SQL query rather than upstream devs sending all manner of unoptimised SQL queries.
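To make that concrete, a hedged sketch with invented names: the app role gets EXECUTE on a function and no direct table access, and the client always passes values as bound parameters instead of splicing them into SQL strings:

```sql
CREATE FUNCTION get_user_orders(p_user_id integer)
RETURNS SETOF orders
LANGUAGE sql
STABLE
AS $$
    SELECT * FROM orders WHERE user_id = p_user_id;
$$;

REVOKE ALL ON orders FROM app_role;
GRANT EXECUTE ON FUNCTION get_user_orders(integer) TO app_role;

-- Application side: always a bound parameter, never string concatenation.
--   SELECT * FROM get_user_orders($1);
```

(The usual caveat still applies: dynamic SQL built with string concatenation inside a function reintroduces the same problem, so use parameters, or format() with %I/%L, there too.)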
I see quite a few posts discussing running Postgres in a separate VPC or network.
We currently run our Postgres databases within Kubernetes and leverage network policies to ensure that only application pods can access the database.
The application pods ingest the db credentials from Vault. The biggest concern we have today is automating credential rotation.
Curious if anyone else has a similar setup or thoughts on ours?
Vault has a nice trick where they can create a temporary DB credential to hand to clients. If that cred is stolen, it's no good later. You have to convince Vault to give you a new cred if you want to talk to the DB.
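On the PostgreSQL side a dynamic credential is just a short-lived role; the creation statements are configurable in Vault, but they amount to something like this (role names, password, and expiry invented for illustration, assuming a pre-existing readonly role that carries the actual grants):

```sql
-- A role that can log in, inherits a fixed set of grants, and whose password
-- stops working at the lease expiry; Vault also revokes it when the lease ends.
CREATE ROLE "v-app-readonly-7f3a" WITH LOGIN
    PASSWORD 'generated-by-vault'
    VALID UNTIL '2021-06-01 00:00:00+00';
GRANT readonly TO "v-app-readonly-7f3a";
```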
What exactly is the security benefit of a reverse tunnel, in comparison to just listening on a port?
Usually DB servers are "always on" and always accept connections, so the reverse tunnel also needs to be always up.
Regarding row-level security: that sounds quite awesome but, in the form described in the article, very limited in the number of use cases.
Quite often you have a web app that talks to the DB using a service account. So, with row-level security, you can only allow or disallow things for the service account, not for the user logged into the web application.
Is there a way to drop from the service account into a less-privileged role inside a transaction or so?
Disclaimer: I work at Teleport (but I am not the author of the article).
This work was done because the Teleport users who used it for SSH kept asking for the same access for their databases. The reasoning goes like:
1. Setting up a single proxy gives you the same benefits for N databases as they come online. No need to manage additional endpoints (public IPs, ports, etc).
2. You have the same centralized place to manage auth/authz for all users.
3. This allows connecting to databases on the edge, where there isn't an opportunity to have a permanent public IP and locations frequently go online/offline.
4. Finally, it's nice to have unified visibility into what's available (for users) and centralized logging/audit for the security team.
As always, all of this is possible with other tools. The world of open source is vast and full of options, but we were hoping to make it simpler, with less configuration and moving parts.
I know that in SQL Server a lot of products use the CONTEXT_INFO function for this: basically, you store some variable per connection and change it for context switching/row-level security.
Obviously you need to trust the service account enough to do that.
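The rough PostgreSQL analog is a per-transaction setting plus a row-level security policy that reads it; a sketch with invented names (and, for the SET ROLE question above, SET LOCAL ROLE also exists and reverts at the end of the transaction):

```sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Policy keyed on a custom setting rather than on the database role.
-- Note: RLS is not applied to the table's owner or to superusers.
CREATE POLICY per_app_user ON documents
    USING (owner = current_setting('app.current_user', true));

-- The service account sets the variable per transaction before querying:
BEGIN;
SET LOCAL app.current_user = 'alice';   -- reverts at COMMIT/ROLLBACK
-- or: SELECT set_config('app.current_user', 'alice', true);
SELECT * FROM documents;
-- Optionally also drop to a less-privileged role for this transaction:
-- SET LOCAL ROLE web_readonly;
COMMIT;
```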
Running your own Postgres is surprisingly easy, even with backups/monitoring/replication. Postgres replication has a reputation for being difficult to set up, but that's not what I've found. I think the reputation largely reflects how it was 15-20 years ago, not today. It's not completely turn-key I suppose, but there are many resources to learn.
Speaking of monitoring: there are far fewer options if you're on RDS. Personally my favorite is munin, where you can instantly see all kinds of stats & history at multiple levels of abstraction. There are many excellent Postgres plugins to report on transactions, locking, etc., and it's easy to write your own.
On RDS you can't install custom extensions. They have a whitelist of the most commonly-used ones, but if you find a different one (or build your own), you're out of luck. This really hampers you if you want to get the most from your database. I will say though, building custom extensions can also block you from using CI/CD solutions (or at least make them harder to set up), since they may have similar restrictions. Writing a custom extension is sort of a last resort, but it can be a huge boost for certain problems.
RDS also doesn't grant you direct access to the WAL. That means you can't use WAL-E/WAL-G (a really nice incremental backup solution) or many other helpful tools. You can't do replication except via AWS's own black-box features. (This can be especially annoying when you do upgrades.) It also matters because RDS only gives you 30 days of backups. Tons of businesses need more than that, and it's hard to achieve without using pg_dump. But for large databases, pg_dump can take hours and impact performance of other connections.
On RDS you also have to deal with EBS expense and performance limitations. PIOPS are very expensive. Running on local disks is a lot faster. On plain EC2 you can do more to work around all that. Ephemeral storage gives you real disks, but they might not be big enough (and they don't scale independently of the instance size). A better approach is RAIDing over gp2 volumes. I've heard they did this at Reddit and Citus. I've set it up before and it has worked great. You might be able to find some details in my comment history. Go back a few years... Of course, if running in your own datacenter is an option, that's even simpler (in some dimensions anyway). To me the sweet spot is renting dedicated machines, e.g. from Wholesale Internet.
RDS is sooo easy though. I have to admit it's hard not to recommend it for early-stage ventures. The limitations probably won't bite you until you're far along.
For example: how much of this would I not need to worry about if I used a managed postgres database (like from digitalocean or aws or any provider, actually)?
Teleport looks cool.
We have a pretty elaborate home-grown bastion host setup. Seems like an area that should have more competition, especially after looking at Teleport's pricing.
Is there any reason against using wireguard to prevent any external connections to postgres, if both the postgres clients and the server are under your full control (e.g. only your servers connect to postgres)?
VPNs are commonly used to connect applications to databases living "somewhere else". I don't see any particular reason why wireguard would be a bad choice for that. It's certainly easier to configure than IPsec.
Do still use SSL and password authentication too; a VPN alone isn't a complete solution.
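A minimal sketch of that part on the server side (assuming PostgreSQL 10+ and that a server certificate and key are already configured; existing passwords need to be re-set to become SCRAM verifiers):

```sql
-- Belt and braces on top of the VPN: encrypt connections and use SCRAM.
ALTER SYSTEM SET ssl = on;
ALTER SYSTEM SET password_encryption = 'scram-sha-256';
SELECT pg_reload_conf();
```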