Securing a Postgres Database (goteleport.com)
312 points by bryanmikaelian on April 2, 2021 | 93 comments



>By default, PostgreSQL listens on a TCP port 5432.

This post seems to outright state that by default postgres is listening to everyone via TCP for connections.

This is not true.

Unless you edit pg_ident.conf, your postgres install will not listen for connections anywhere other than localhost. So, while it's correct to say that it listens on TCP port 5432, there is a very narrow limit to whom it's listening: namely, the same machine.

Postgres is pretty secure by default. It doesn't allow external connections. It also requires a username and password to connect with permissions on the database you're connecting to.

Compare that to something like redis. They at least used to listen for external connections by default and not even require a password to connect. I can imagine it's still very easy to find all kinds of interesting stuff snooping around on port 6379.


>Compare that to something like redis. They at least used to listen for external connections by default and not even require a password to connect

Not anymore since version 3.2.0 [0]

>Unfortunately many users fail to protect Redis instances from being accessed from external networks. Many instances are simply left exposed on the internet with public IPs. For this reasons since version 3.2.0, when Redis is executed with the default configuration (binding all the interfaces) and without any password in order to access it, it enters a special mode called protected mode. In this mode Redis only replies to queries from the loopback interfaces, and reply to other clients connecting from other addresses with an error, explaining what is happening and how to configure Redis properly.

[0] https://redis.io/topics/security


That is very very late. This type of error was inexcusable back in 2009 when Redis first came about. There is no reason to trust this product.


3.2.0 came out 5 years ago. I think it's okay now.


Yes, the article is a bit weird. If you want to require a reverse ssh tunnel, you should be able to leave postgres in its default config - typically listening on 5432 on localhost (or, in its upstream default - only on a domain socket, I believe).

Now, you probably want to allow services to connect without needing to speak ssh - so you probably do want a bind whitelist, an IP whitelist and ssl - but the article is off to a bad start by being wrong about defaults.


I've checked the postgres repository, and the default has been to listen on localhost for at least 15 years.

https://github.com/postgres/postgres/blob/master/src/backend...


To be super clear, because it can be less than perfectly obvious, this means on `localhost`, i.e. the loopback interface. There is no 'real' network device talking to this port.


It also isn't "open"; your username has to match the postgres database username to connect.

I.e., the 'postgres' unix user is required to access databases as the 'postgres' database user.


Hmm, I think the default pg_hba.conf uses "trust" and so will allow passwordless connections as postgres on localhost; you just have to specify the username (psql will use your username by default).

This is the case for the official docker image at least, and I'm fairly certain also true of distro package managers


At least on debian / ubuntu, the default is peer auth: https://salsa.debian.org/postgresql/postgresql-common/-/blob...
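
From memory, the packaged default looks roughly like this (a sketch; exact methods vary by version and packaging):

    # TYPE  DATABASE  USER      ADDRESS        METHOD
    local   all       postgres                 peer
    local   all       all                      peer
    host    all       all       127.0.0.1/32   md5   # scram-sha-256 on newer versions
    host    all       all       ::1/128        md5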


> Unless you edit pg_ident.conf, your postgres install will not listen for connections outside of on localhost.

I don't know what the defaults are, but pg_ident.conf has absolutely nothing to do with this. The main configuration file (I think postgresql.conf usually) has listen_addresses, which controls the addresses on which postgres listens, as you might guess.

pg_hba.conf (not pg_ident.conf) controls the authentication methods the server asks from the client, depending on how they're connecting.
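
Roughly, the two knobs look like this (a sketch of the relevant lines only):

    # postgresql.conf -- which addresses postgres binds to
    listen_addresses = 'localhost'    # upstream default; '*' would bind every interface
    port = 5432

    # pg_hba.conf -- who may connect, and how they authenticate
    # TYPE  DATABASE  USER  ADDRESS       METHOD
    host    all       all   127.0.0.1/32  scram-sha-256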


You're technically correct, but the difference is immaterial unless we're talking about outright pre-auth exploits. Or exploits of the auth itself, I guess.

(I think I recall exactly one in the history of PostgreSQL since I started using it, but it is what it is.)


I saw nobody mentioning Docker here, so just a reminder: if you do `-p 5432:5432`, it is no different from listening on `0.0.0.0`.


Right, which is why it’s absolutely critical to bind to 127.0.0.1 instead of 0.0.0.0 in your docker / docker-compose config.
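
e.g. a minimal docker-compose sketch (image tag is just an example):

    services:
      db:
        image: postgres:13
        ports:
          - "127.0.0.1:5432:5432"   # published on loopback only, not 0.0.0.0

Or `docker run -p 127.0.0.1:5432:5432 ...` on the command line.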


The worst of both worlds: not accessible from the outside, yet insecure because it’s accessible to any user on the inside.

What is the advantage of listening on localhost compared to using a socket, with free access control?


Heartbleed-style OpenSSL zero-day protection. An open-to-all environment, like the public Heroku postgres environment, is a huge attack surface.

Layers of security exist for this exact reason. If one fails, hopefully something else stops it.


> What is the advantage of listening on localhost compared to using a socket, with free access control?

You can install and use it right away locally without figuring out where your distro puts the socket.

Edit: Also no need to play with the permissions of the socket in such a case.


Unless your local cluster doesn't allocate your install on 5432, e.g. after a cluster upgrade.


No need to play with permissions of the socket, because you ‘just’ use a socket that gives permission to everyone.


Modern linux security thinking is that any sort of code running inevitably leads to root privilege.

Put another way, any user can become root through privilege escalation, so access control is pointless, since any untrusted user can take over the machine.

The real unit of security is the whole OS (VM), not its internal user boundaries.


This kind of mindset exists but is not a consensus. And even if it became consensus, it would still take many, many years for most developers to stop using ssh forwarding with localhost listeners in a security reliant way.

Also, the loopback is used as a networking interconnect or guest->host channel for sandboxed containers and VMs, so it's security sensitive in this way.


> Unfortunately, this sort of air-gapped setup is not something PostgreSQL supports out-of-the-box.

Is a reverse tunnel really air gapped?

I thought AG meant one had to physically touch the device and transfer using devices without any network capability, such as a flash drive?


Air gap means there’s no physical connection. Like literal air between one wire and the other. Air gap should never be thought of as something like a “logical air gap” because no such thing exists. Either the cable is physically plugged in or it isn’t.


Pretty sure that the common definition (also the one from Wikipedia [0]) is that there's no connection to a network (including wireless ones; the medium is not an important part in the concept).

The article appears to say that a database/DBMS in the ideal world is not accessible over network at all. That is, apparently only accessible by users who have physical access to the machine it is on.

[0] https://en.wikipedia.org/wiki/Air_gap_(networking)


"Unfortunately, this sort of air-gapped setup is not something PostgreSQL supports out-of-the-box."

Not sure what they mean by "out of the box", but you can make the listen_addresses list empty:

"listen_addresses (string) ...If the list is empty, the server does not listen on any IP interface at all, in which case only Unix-domain sockets can be used to connect to it..."

https://www.postgresql.org/docs/9.3/runtime-config-connectio...
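
i.e. roughly:

    # postgresql.conf
    listen_addresses = ''                            # no TCP at all; Unix-domain socket only
    unix_socket_directories = '/var/run/postgresql'  # path varies by distro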


I’m sure the article author meant, by saying “out-of-the-box,” the features available on initial, default install.

I disagree with the author and agree that indeed a fresh install supports (i.e. allows one to configure without rebuilding) not talking over a network at all.


Ah, yes, I re-read it, and they do mention this.


That's a good question. I agree with your definition of AG. However, the only time a database (or any other kind of data store) would be useful in an AG setting would be if it's part of an AG system including network and other computers.

Perhaps we need another somewhat similar term. It's like null-routing or firewalling devices on your network.. they're technically "connected" but if they cannot dial out they're in some ways gapped. (This is handy for dubious quality IoT devices, they can't phone home, auto-patch to drop features, or share your usage information with $corp, but still respond to local network commands).

To some extent tunneling is security through obscurity (an SSH tunnel has moved the port you need to secure from 5432 to 22)


Air gapped data stores sound useful for very common scenarios like: offline backups, non-anonymized PII, infrequently accessed bank records


I.e., for long-term archiving or legal compliance.


Yep, tapes on a shelf at Iron Mountain.


I personally secure my postgres instances by putting them in a siloed VPC without internet access. I then limit incoming connections to only be allowed from the separate VPC containing the application server.

I then use a bastion host when I need to access ssh on the instance. The bastion host remains off and inaccessible except for when I need to perform maintenance.

The advantage of this is that there is no "always-open" access to the instance.

Not sure why the author does not advocate this.


> Not sure why the author does not advocate this.

Probably because the blog post is an advertisement for their product, which already allows you to implement bastion hosts as you describe.

From the bottom:

> Databases do not need to be exposed on the public Internet and can safely operate in air-gapped environments using Teleport’s built-in reverse tunnel subsystem.


Did not catch that. Sneaky self promotion there :)

Not sure what their reverse tunnel product is, but a bastion host is super easy to implement, just spin up an ec2 and walla.

Curious as to what value they are providing


Unsure whether you meant walla, but just in case this is what you meant: it's "voila", from French "et voilà".

https://www.collinsdictionary.com/dictionary/french-english/...


They've been around a while, and it actually seems quite cool. Bastion hosts are simple to set up, sure, but Teleport adds a whole bunch of porcelain on top, e.g. integration with SSO, web UI for administration, etc.

Haven't used them myself but I wouldn't be against trying it if in the market for something like that.


I find that porcelain has a bad habit of breaking when you least expect it.

The same is true for server applications that have weird 3rd-party dependencies that may go down when you least expect it.


Yep, and your answer is the better default answer.

Bastion<->app host<->DB

Bastion can talk to the net.

App host can talk outbound but inbound only accepts bastion and DB.

DB can only talk with app host.

Obviously, you harden everything appropriately... But with this arrangement, it's very difficult to penetrate this sort of network. Think of it as a network that as a whole is default-deny.


Currently I have postgres/timescaledb running on EC2 in a VPC which has no internet access. I use a VPN tunnel to access the private local subnet and have security group settings that allow access only to my machine and one other. I usually have a jump server that I use, but I don't want to keep my ssh keys on the server or leave it behind.


If the jumpserver is trusted and controlled by you then you should have a look at ssh agent forwarding. Then you avoid leaving keys on the jumphost.


A better alternative would be ProxyJump. See e.g. https://serverfault.com/questions/958222/bastion-server-use-...

Edit: To add some details - using ProxyJump you don’t have to expose anything to the jump host and instead just proxy through it.
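
A rough ~/.ssh/config sketch (host names and addresses are placeholders):

    Host db
        HostName 10.0.1.5                  # private address of the DB host
        ProxyJump bastion.example.com      # hop through the bastion; keys never leave your machine
        LocalForward 5432 localhost:5432   # expose the remote postgres port locally

Then `ssh -N db` and point psql at localhost:5432.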


One alternative to keeping the bastion off is to restrict access to the bastion by IP address. Ideally, the list of IP addresses that need access to the bastion should be relatively small.

In cloud environments, it's straightforward to update network firewall rules as IP addresses change. Residential and office IP addresses don't change much, so it's not much of a hassle in my experience. That said, it can get annoying if you find yourself working on a network that rotates your IP address frequently (e.g., a hotspot).
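
With the AWS CLI that's roughly (group id and address are placeholders):

    aws ec2 authorize-security-group-ingress \
      --group-id sg-0123456789abcdef0 \
      --protocol tcp --port 22 \
      --cidr 203.0.113.7/32

and revoke-security-group-ingress to drop the old address when it changes.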


What value is added by using a separate VPC? Equivalent restrictions can more easily be done with security groups, including on the outbound networking


I would suggest considering segregation by subnets (in addition to security groups), using public / private subnets, where any server in a private subnet (behind a nat gateway) doesn't/can't have a public ip address, and therefore cannot be accessed via the public internet.

You could then use a bastion to access servers in the private subnet, or use something like AWS Session Manager which provides command line access via web browser in lieu of a bastion.


> What value is added by using a separate VPC?

Adding more mechanisms on top is pointless when the effort could be invested in, for example, automated auditing of SGs, which is vastly more potent from a hardening perspective than adding additional layers of technical redundancy that are still exposed to the same flawed human processes.

When you reach a team of 10-20 folk on a project, stuff tends to get confusing and/or lazy with elaborate configurations. Security design is therefore more about managing that outcome through simplicity and process hardening than... well... I don't even know what threats a separate VPC protects against.


> When you reach a team of 10-20 folk on a project, stuff tends to get confusing

From the perspective of maintaining security in the least confusing way possible in a team setting, I could see a scenario where you have a VPC that only has private subnets, and no public subnets at all.

You could call it “Database VPC” and assuming your team doesn’t reconfigure the VPC to add public subnets, you can be comfortable with your team adding more EC2 servers / databases / whatever in that VPC since they would all be in private subnets inaccessible by the public internet.

And then you could have a 2nd VPC with private/public subnets that are less locked down than the database VPC, which might contain your load balancers, application servers, etc.

I suppose the benefit would be the logical separation of databases into a VPC without any public subnets. And the only way to gain access to subnets in that VPC would be through VPC peering.

(The above assumes you’re using AWS). Although after typing all that, I still think you can accomplish a comparably secure (on a network level) architecture using security groups, or even separate subnets, without separate VPCs.

In general I try to avoid multiple VPCs because VPC peering in AWS can get tricky (or impossible if both VPCs have overlapping CIDR blocks).


Agreed. What would be the best way when using something like Heroku?


Something like Nebula (https://github.com/slackhq/nebula) or Tailscale, perhaps?


Can you setup Nebula on your Heroku dynos and an RDS instance?


I use almost the same approach.

I also limit access to just one IP, which is a VPN server hosted on a different cloud provider.

You can run a bulletproof VPN server easily using something like algo (https://github.com/trailofbits/algo)


How do you monitor node health?


You can put other machines in place with a highly restrictive set of network rules that allow data to cross into and out of the network in very controlled ways.

Email is one such way.


> Email is one such way.

I'm sorry but could you clarify? You mean that you monitor and/or collect data from hosts inside of a network via email somehow?


I am curious. Most monitoring stacks include something like Telegraf/Prometheus to collect metrics and send them to InfluxDB. How would you do this via email?


Inside the bastion network/VPC, there can be logging, monitoring, and ad/kerb+openldap stuff to do the requisite thing. And your logging and monitoring can send trigger emails when they catch bad things.

Your inbox needs to be an automatable inbox, say controlled by bleeder or another flow engine. From there you can build a full messaging dashboard and have it done in something that could pass a very stringent audit.


A good overview, but the most secure way is to not give any table-level access privileges and only allow access using SECURITY DEFINER stored procedures (the PG equivalent of the setuid bit). For instance if you have a table of users with login and hashed salted password, you could implement a check_password() procedure. If the app account is compromised, it would not have access to the password hashes or even be able to enumerate users.
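
A rough sketch of that pattern (assumes the pgcrypto extension for crypt() and an "app" role; names are placeholders, not production code):

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    CREATE FUNCTION check_password(p_login text, p_password text)
    RETURNS boolean
    LANGUAGE sql
    SECURITY DEFINER                    -- runs with the owner's privileges, like setuid
    SET search_path = public
    AS $$
      SELECT COALESCE(
        (SELECT pw_hash = crypt(p_password, pw_hash)
           FROM users
          WHERE login = p_login),
        false);
    $$;

    -- the app role can call the function but never read the table itself
    REVOKE ALL ON TABLE users FROM app;
    REVOKE EXECUTE ON FUNCTION check_password(text, text) FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION check_password(text, text) TO app;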


Stored procedures are really not used enough. They’re incredibly useful, and not just for security.

People can make out-of-band updates to the query, like making it more efficient or migrating it, without requiring any changes to the app.


I have the impression that many developers actively avoid stored procedures, triggers or any 'complex' database features because they don't fit well into a typical CI/CD pipeline which deals only with application code.

It is quite possible to test stored procedures; it just requires some additional work/infrastructure.


It's mostly a familiarity and education problem, as well as the fact that MySQL didn't have good support for this. Developers are exposed to procedural programming languages at university, but seldom to databases, let alone stored procedures. Unit tests can be written in SQL or PL/pgSQL and will typically run much faster than a CI pipeline.


When used for business logic they also execute about 20x faster than the same logic encoded in a client, and in far fewer LOC. Getting rid of all those round-trips has a huge effect!


Running a slow hash operation that should take a non-negligible amount of CPU on your single-core-per-process database server isn't a great idea.

That's not a rag on stored procedures more generally. Just that this specific use case scales poorly to a large number of operations as it's inherently CPU-bound.


You can do the slow pbkdf2 hash on your webserver and use the result for your stored procedure to check the password. If the password column is only visible to the stored procedure you have used the best features of both systems.


No you can't. To do the hash you need the per-user salt before you start hashing, which would require reading from the DB.


Then you can retrieve the salt, do the hash on the server, and use the stored procedure to compare. More requests, but a tradeoff for security \o/


There is no reason you can't have a deterministic function which generates a hash for the user.

A simple sha256(lower(email)) is as secure as a completely random salt; the only requirement on a salt is to be unique.


I agree with the password example. I don't agree with people updating queries willy-nilly. Updated queries should go through the same life-cycle of testing (etc.) as code, meaning they can simply go out with continuous application deployments anyway.


I mean, it's likely you have some other stored procedure that does enumerate users, for use by app administrators. Ideally it doesn't return password hashes though, since they are really not useful for humans to look at. Also ideally, the admin stored procedures are not executable by the end-user app.


Usually an application uses a connection pool to connect to the database server. I'm not sure if that pattern can be used when different requests need different database roles.


It's doable. You have a web role in the DB that's used for the public-facing part of a website and then an admin role that's used for the backoffice part of the website. You can then separate these backends onto different machines, or at least onto different UNIX users, so that when someone breaks through via the public-facing backend, he doesn't immediately get access to everything, but still has to work for it a bit.

It's not quite that simple; you still have to make sure the public backend user can't access web logs, which may reveal the session id of a backoffice admin account or other information useful for breaking into other parts of the website.

Eventually, the attacker will gain access, but this is useful to slow him down enough, so there's a chance he'll give up, or you notice something suspicious.

For example pgbouncer allows you to connect via multiple roles/credentials to a database. It's just about how you configure your pool.


Not to mention that with the microservices fad you could have different server processes with different credentials in their connection pools for different operations.


It feels weird to me that this blog post would suggest acquiring Let's Encrypt certificates for certificate encryption. While it's great for something public facing that needs your CA installed by default, creating certificates for things that you probably don't want being public - like your database backend - just makes it more discoverable. For example, certificate transparency logs mean that anybody can see what certificates Let's Encrypt has handed over to you. Example: https://crt.sh/?q=ycombinator.com

Of course, this is a security through obscurity type of approach, and you'd want to secure your database whether or not somebody knew where it was running. But there's a difference between somebody seeing that you've just created `staging-psql.foo.com` that you might still be configuring, and the passive background noise of internet port scanning that's a little less targeted.


I don’t understand the value of Certificate Transparency: I could see triggering an email to the owner of a domain every time a certificate is issued, but what’s the value of a public log? It seems to me that it just increases the attack surface of your services.


I agree, I would just do a self-signed cert.


This is surely a silly question (and probably a lot of devops people will think I am an idiot), but what do you recommend to secure a PostgreSQL instance that is accessed from different IPs? We use the Azure PostgreSQL database, where I guess they take care of most of the security, but they allow us to set up firewall rules, such as only allowing connections from established IPs.

My (our) problem is that we use a lot of AWS Lambda functions that read and write to the database, and they always execute from different (dynamic) IPs, so what is the best solution in this case?


You can attach the Lambda functions to a VPC. So you can control their outbound IP.


Seems the old adage of "beware of blog posts bearing gifts" strikes again.

For a blog post masquerading as a how-to on securing Postgres, the fact that the article only fleetingly mentions the word "function" once is not cool.

One of the biggest things you can do for Postgres (or any database for that matter) is enforce the use of stored procedures ("functions" in PG-speak) rather than direct SQL queries.

SQL Injection attacks are common as muck. Stored procedures are a quick and easy way to mitigate them.

Stored procedures also have the added bonus of allowing the DBA to remain in control and ensure a higher quality of SQL query rather than upstream devs sending all manner of unoptimised SQL queries.
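
The gist, as a sketch (table, function and role names are made up):

    -- the SQL inside is fixed and arguments are bound parameters, so there's nothing to inject into
    CREATE FUNCTION get_orders_for_customer(p_customer_id int)
    RETURNS SETOF orders
    LANGUAGE sql STABLE
    SECURITY DEFINER
    AS $$
      SELECT * FROM orders WHERE customer_id = p_customer_id;
    $$;

    REVOKE ALL ON TABLE orders FROM app;
    GRANT EXECUTE ON FUNCTION get_orders_for_customer(int) TO app;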


I see quite a few posts discussing running Postgres in a separate VPC or network. We currently run our Postgres databases within Kubernetes and leverage network policies to ensure that only application pods can access the database.
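
Roughly along these lines (a sketch; labels are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: postgres-allow-app-only
    spec:
      podSelector:
        matchLabels:
          app: postgres
      policyTypes:
        - Ingress
      ingress:
        - from:
            - podSelector:
                matchLabels:
                  app: backend
          ports:
            - protocol: TCP
              port: 5432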

The application pods ingest the db credentials from Vault. The biggest concern we have today is automating credential rotation.

Curious if anyone else has a similar setup or thoughts on ours?


Vault has a nice trick where they can create a temporary DB credential to hand to clients. If that cred is stolen, it's no good later. You have to convince Vault to give you a new cred if you want to talk to the DB.

https://www.hashicorp.com/resources/securing-databases-with-...
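
Roughly, once the database secrets engine is configured (the role name here is just an example):

    vault secrets enable database
    # ...configure a connection and a role, then clients fetch short-lived creds:
    vault read database/creds/app-readonly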


Removing the public schema and not allowing users to connect to all databases by default is also a good idea.
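
i.e. something like (mydb/app are placeholders):

    -- lock down the default public schema (some drop it entirely)
    REVOKE CREATE ON SCHEMA public FROM PUBLIC;

    -- databases accept connections from PUBLIC by default; revoke and grant per role
    REVOKE CONNECT ON DATABASE mydb FROM PUBLIC;
    GRANT CONNECT ON DATABASE mydb TO app;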


What exactly is the security benefit of a reverse tunnel, in comparison to just listening on a port?

Usually DB servers are "always on" and always accept connections, so the reverse tunnel also needs to be always up.

Regarding row-level security: that sounds quite awesome, but in the form it's described in the article, it seems very limited in the number of use cases.

Quite often you have a web app that talks to the DB and uses a service account. So with row-level security you can only allow or disallow things for the service account, not for the user logged into the web application.

Is there a way to drop from the service account into a less-privileged role inside a transaction or so?


Disclaimer: I work at Teleport (but I am not the author of the article).

This work was done because the Teleport users who used it for SSH kept asking for the same access for their databases. The reasoning goes like this:

1. Setting up a single proxy gives you the same benefits for N databases as they come online. No need to manage additional endpoints (public IPs, ports, etc).

2. You have the same centralized place to manage auth/authz for all users.

3. This allows connecting to databases on the edge, where there isn't an opportunity to have a permanent public IP and locations frequently go online/offline.

4. Finally, it's nice to have unified visibility into what's available (for users) and centralized logging/audit for the security team.

As always, all of this is possible with other tools. The world of open source is vast and full of options, but we were hoping to make it simpler, with less configuration and moving parts.


I know that in SQL Server a lot of products use the CONTEXT_INFO function for this: basically storing some variable per connection and changing it for context switching / row-level security.

Obviously you need to trust the service account enough to do that.
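
In Postgres, the rough analogue is SET LOCAL ROLE plus a custom setting that RLS policies can read; a sketch (names are made up, and the role must be granted to the service account):

    BEGIN;
    SET LOCAL ROLE app_user;                               -- reverts automatically at COMMIT/ROLLBACK
    SELECT set_config('app.current_user_id', '42', true);  -- true = local to this transaction
    -- RLS policies can reference current_setting('app.current_user_id')
    SELECT * FROM orders;
    COMMIT;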


Possibly stupid questions.

(1) When is operating your own PostgreSQL instance desirable?

(2) Isn’t this equivalent to running an RDS instance in its own VPC with public access turned off and only allowing comms with app VPC?


1. Saving money
2. Need an extension that is not available
3. Running the whole app on a single node (to avoid network trips)
4. Want to learn


1. When you don't want your data on other people's computers, or need extensions or configuration that's impractical, or need control over your uptime


I do it because of custom extensions we need. I don't think any cloud provider allows it.


Running your own Postgres is surprisingly easy, even with backups/monitoring/replication. Postgres replication has a reputation for being difficult to set up, but that's not what I've found. I think the reputation largely reflects how it was 15-20 years ago, not today. It's not completely turn-key I suppose, but there are many resources to learn.

Speaking of monitoring: there are far fewer options if you're on RDS. Personally my favorite is munin, where you can instantly see all kinds of stats & history at multiple levels of abstraction. There are many excellent Postgres plugins to report on transactions, locking, etc., and it's easy to write your own.

On RDS you can't install custom extensions. They have a whitelist of the most commonly-used ones, but if you find a different one (or build your own), you're out of luck. This really hampers you if you want to get the most from your database. I will say though, building custom extensions can also block you from using CI/CD solutions (or at least make them harder to set up), since they may have similar restrictions. Writing a custom extension is sort of a last resort, but it can be a huge boost for certain problems.

RDS also doesn't grant you direct access to the WAL. That means you can't use WAL-E/WAL-G (a really nice incremental backup solution) or many other helpful tools. You can't do replication except via AWS's own black-box features. (This can be especially annoying when you do upgrades.) It also matters because RDS only gives you 30 days of backups. Tons of businesses need more than that, and it's hard to achieve without using pg_dump. But for large databases, pg_dump can take hours and impact performance of other connections.

On RDS you also have to deal with EBS expense and performance limitations. PIOPS are very expensive. Running on local disks is a lot faster. On plain EC2 you can do more to work around all that. Ephemeral storage gives you real disks, but they might not be big enough (and they don't scale independently of the instance size). A better approach is RAIDing over gp2 volumes. I've heard they did this at Reddit and Citus. I've set it up before and it has worked great. You might be able to find some details in my comment history. Go back a few years. . . . Of course if running in your own datacenter is an option, that's even simpler (in some dimensions anyway). To me the sweet spot is renting dedicated machines, e.g. from Wholesale Internet.

RDS is sooo easy though. I have to admit it's hard not to recommend it for early-stage ventures. The limitations probably won't bite you until you're far along.


Some input on this would be greatly appreciated.

For example: how much of this would I not need to worry about if I use a managed postgres database (like from DigitalOcean or AWS, or any provider actually)?


I'll soon be doing this for a project running on a LAN without a connection to the internet, during events once COVID is over.


Teleport looks cool. We have a pretty elaborate home-grown bastion host setup. Seems like an area that should have more competition, especially after looking at Teleport's pricing.


Disclaimer: I work at Teleport

If our Enterprise pricing is too high, take a look at Teleport's open-core version:

https://github.com/gravitational/teleport


Check out using Hashicorp Vault for dynamic postgres credentials and securely obtaining them during system init


Is there any reason against using wireguard to prevent any external connections to postgres if both the postgres clients and the server are under your full control (e.g. only your servers connect to postgres)?


VPNs are commonly used to connect applications to databases living "somewhere else". I don't see any particular reason why wireguard would be a bad choice for that. It's certainly easier to configure than IPsec.

Do still use SSL and password authentication too; a VPN alone isn't a complete solution.
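
e.g. on the DB host, roughly (keys and addresses are placeholders):

    # /etc/wireguard/wg0.conf
    [Interface]
    Address = 10.8.0.2/24
    ListenPort = 51820
    PrivateKey = <db host private key>

    # the app server
    [Peer]
    PublicKey = <app server public key>
    AllowedIPs = 10.8.0.1/32

    # and in postgresql.conf, bind only to the tunnel address:
    # listen_addresses = '10.8.0.2'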


https://supabase.io/docs is a good example of adding a nice security layer (and more) on top of postgres


https://archive.ph/FsOHN (as the article is now removed)


> log_statement = all

Is this something people really do? Wouldn't these logs be enormous and/or potentially leak unwanted information?


Assuming the logs have the same visibility as the db files themselves, I don't think it makes a difference. It can affect performance, though.

So we use it in development but not in production.
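
For reference, the relevant knobs (a sketch; values are just examples):

    # postgresql.conf
    log_statement = 'all'               # dev: log every statement
    # log_statement = 'ddl'             # prod: schema changes only
    # log_min_duration_statement = 500  # prod: log statements slower than 500 ms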



