Remote access to production infrastructure (death to the VPN) (mattslifebytes.com)
236 points by sullivanmatt on March 2, 2020 | 117 comments



In my experience, the best way to eliminate the VPN is to expose your various internal business services as websites w/ TLS1.2 & multi-factor authentication.

Obviously, this isn't practical for everything. But, if the thing you were using VPN for is already a web application, you are basically halfway there. Ideally, you just directly expose a secure web application to clients, but in some cases (i.e. very old legacy systems) you probably want to put an nginx box in front and then put the authentication at that level.

Web access has a huge range of benefits. Users are scoped directly to the system of concern rather than an entire network of hosts. You can take security to the next level with server-side rendering of web content in order to avoid additional required channels of communication or revealing of implementation secrets to the client (e.g. SPA client source).

We are at a point of placing our actual application servers directly on the public internet (with TLS1.2/MFA/ACLs/etc). Hiding behind VPNs or layers of reverse proxies seems to cause more harm than good.


> Obviously, this isn't practical for everything

If you have the engineering resources to back it up, it definitely can be. Internal services at Google usually trust the office network the same as any other -- well documented in the BeyondCorp paper if you're interested.


My understanding, based on the folks I know at Google, is that the BeyondCorp paper described a PoC implemented in part of their corp network, which is (confusingly) called Production, not to be confused with the production network that hosts their search site. That network still requires a VPN and a hardened Linux laptop to access. Not every service has been modified to implement the RPC calls / authenticated protobuf code changes.

Someone at Google please feel free to correct me on this.


BeyondCorp is not just a proof of concept. Everything at Google is accessed through it, including production-production (through a proxy maybe? Not sure the details). You're right about the requirement to have a hardened device -- which acts sort of like a token (as described in the BeyondCorp whitepapers). But it can be Windows, Linux, Mac, Chromebook, Android or iPhone. I never use VPN and I work on production stuff from outside the office all the time.

(FWIW, I don't think there's anything secret here. This stuff is very explicitly described in the whitepapers.)


As I understood the paper, their production version is called überproxy and has access to everything they host, as they moved their applications all on to it (notably, this means SSH now has to go through chrome).


BeyondCorp is the effort/program name; ÜP is one of the pieces of software that enables it to be implemented. ÜP is just a fancy Envoy/nginx reverse proxy.


With BeyondCorp, the production network you access does host all of the critical jobs including search. But of course you only get to manipulate these jobs in an approved way, e.g. using an RPC to bring up or bring down a job. Interacting with jobs by sending them RPCs requires ACLs naturally.

You don't get direct SSH access to production machines or any other lower level network access like packet sniffing on the production network.


A key reason why BeyondCorp actually works is hardly anybody needs to SSH to prod, and people who do need it, need it rarely. Everything at Google has rich RPC control surfaces and the tools are installed on users' workstations to invoke the RPCs. Status of everything is available via HTTP, in your browser. No need to SSH to a server to read logs or restart a process. Need to collect hardware PMU counters in prod? There's an RPC service for that. Not only do these rich interfaces enable BeyondCorp, they also cut down on insider risk because it's no longer considered "normal" to get an interactive shell session in production.


I want this, but with a twist.

1) I want the websites to do certificate verification on the certs I'm using on my desktop.

2) Then on top of that my website should use usb security key verification as well.

Easy enough to do #2, but I want #1 to be ubiquitous as well.

... So basically my HTTPS server will use my public key as my identity, not my username/email and password.


The Duo Network Gateway can do this with Trusted Endpoint certificates.

(I work there as an engineer.)


you can do that with mutual TLS (client side authentication)


Thanks for pointing that out. I definitely get that it's possible, but as far as I know in the open source world there isn't much in terms of infrastructure to implement these types of solutions in web applications.

Happy to be proven wrong, I'm just unaware if any popular open source HTTPS servers offer this as an integrated solution.

Or better yet, I'd like raw access to the certificate info FROM the application layer on server-side so I can manage that as needed.


There are a few. https://github.com/tink-ab/login-service is one I wrote and open sourced when I worked at the associated company. Adding something like https://github.com/dhtech/authservice + https://github.com/dhtech/prodaccess would give you client certificate provisioning and handling.

In the end you'd easily get e.g. daily U2F with monthly cert rotation. Or whatever you end up wanting.

Open source has had this covered for years, but you'll have to look for it.

Disclosure: I'm the main author of all those three projects.


As for handling the TLS part, both Apache and NGINX have the capability to do verification of certificates. The application only needs to parse the headers they pass on to determine the user connecting to the application.

If you don't care about the user (TLS client certs + standard username/password) you can get away with proxying the application through nginx and calling it a day.

Basically you turn on client verification and you're done. If you want to show an error to unauthenticated users, you can make verification optional and add something along the lines of: if ($ssl_client_verify != SUCCESS) { return 403; }


nginx can do the client cert verification for you and pass the results as http headers to the application. so all you have to do in the application is to add some request authentication interceptor that inspects the headers.
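As a rough illustration of that split (nginx terminates and verifies the client cert, the application just reads headers), something along these lines works; the paths, header names, and upstream address are placeholders, not a recommended production config:

    # Verify client certificates at the edge and pass the verdict upstream
    server {
        listen 443 ssl;
        ssl_certificate         /etc/nginx/tls/server.crt;
        ssl_certificate_key     /etc/nginx/tls/server.key;
        ssl_client_certificate  /etc/nginx/tls/client-ca.crt;   # CA that issues client certs
        ssl_verify_client       optional;                       # use "on" to hard-fail at the TLS layer

        location / {
            if ($ssl_client_verify != SUCCESS) { return 403; }
            proxy_set_header X-SSL-Client-Verify      $ssl_client_verify;
            proxy_set_header X-SSL-Client-DN          $ssl_client_s_dn;
            proxy_set_header X-SSL-Client-Fingerprint $ssl_client_fingerprint;
            proxy_pass http://127.0.0.1:8080;
        }
    }

Apache's equivalent is SSLVerifyClient plus the SSL_CLIENT_* variables.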


You don’t implement it at the Application level. You implement mutual TLS on your proxy and then each application can keep its own auth.

Disclaimer: I haven't done this myself (yet), but have read about it a bit.


Right. I may have done a poor job explaining. Imagine if the HTTPS server did all its verification / etc at the protocol level, but then EXPOSED the public key, used by the client, to the application I wrote. This way I can (at the application level) do app-related stuff like reject users if (for example) they've provided a public key not in my white-listed public keys. This would also make it seamless to build tooling around the application, such as what github (and others) do when they ask you to maintain public keys you may use when pushing / pulling to/from a repo.

But in this case users could provide public keys they will use when accessing the website from internet (as opposed to intranet).
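Not the parent's setup, just a sketch of the idea: if the TLS-terminating proxy forwards the client certificate fingerprint in a header (the header name and fingerprint values below are made up), the application can keep its own GitHub-style whitelist of enrolled keys:

    # Minimal Flask sketch: client cert fingerprint as the user's identity
    from flask import Flask, request, abort

    app = Flask(__name__)

    # Fingerprints users enrolled ahead of time (placeholder values)
    ENROLLED = {
        "d2b4c1e9aa310f5c": "alice",
        "7f3a99c0b81d2e44": "bob",
    }

    @app.before_request
    def require_enrolled_cert():
        # Only safe if this header can only be set by your own proxy
        fp = request.headers.get("X-SSL-Client-Fingerprint", "").lower()
        user = ENROLLED.get(fp)
        if user is None:
            abort(403)
        request.environ["app.user"] = user

    @app.route("/")
    def index():
        return "hello, " + request.environ["app.user"]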


You can use OpenResty with Lua to create a whitelist of certs. Example: https://gist.github.com/phamhongviet/71da4b02bb517000593cf49...


I think most open source stuff supports client certs pretty well, but the issue is getting them to end users. I personally use mTLS as a two part authentication/authorization system; services prove their identity to each other with certificates, humans prove their identity to a proxy server that generates a bearer token for the lifetime of the request. Then each application sees (source application, user) and can make an authorization decision.

I personally use Envoy as the proxy and cert-manager to manage certificates internally. You can peruse my production environment config for my personal projects at https://github.com/jrockway/jrock.us (dunno if that's the real link, my ISP broke routes to github tonight, but it's something like that).

The flow is basically:

1) At application installation time, a cert is provisioned via cert-manager. Each application gets a one-word subject alternate name that is its network identity. The application is configured to use this cert; requiring incoming connections to present a client certificate that validates against the CA, and making outgoing connections with its own certificate. (This integrates nicely with things like Postgres, that expect exactly this sort of setup.) This lets pure service-to-service communication securely validate the other side of the connection. This is nice because, in theory, I don't have to configure each application with a Postgres password, Postgres can just validate the client cert and grant privileges based on that. (I have not set this up yet, however.) I also like the ability to reliably detect misconfiguration; if you misconfigure a DNS record, instead of making requests to the wrong server, the connection just breaks. Saves you from a lot of debugging. And, of course, if the NSA is wiretapping your internal network, they don't get to observe the actual traffic. (But probably compromised your control plane too, so it's all pointless.)

2) The other half is letting things outside of the cluster make requests to things inside the cluster. I use an Envoy proxy in the middle; this terminates the end user's TLS connection, and routes requests to the desired backend, like every HTTPS reverse proxy ever. I wrote a "control plane" that automates most of the mTLS stuff (it's production/ekglue in the repository; ekglue is an open-source project that is agnostic to mTLS, my configuration adds it for my setup). At this point, users outside of the cluster will see a valid jrock.us cert, so they know they've gone to the right site, and applications inside the cluster will see that traffic is coming from the proxy, and can decide how they want to trust that. Right now, everything I run in my cluster just passes through to its native authentication, so it's pretty pointless, but the hook exists for future applications that care.

3) For applications that want a known human user (or human-authorized outside service, think dashboards or webhooks), I wrote an Envoy ext_authz plugin that exchanges cookies or bearer tokens for an internal request-scoped time-limited access token. Applications can then validate this token without calling out to a third-party service, so no latency is introduced. (They do have to be configured to do this, and the state here in the open source world is pretty abysmal. OIDC is helping, and it's trivial to write it into your own application framework. A few applications will just accept an x-remote-user HTTP header, which I found to be adequate, especially if they can trust the proxy with mTLS. Compromising the proxy lets you compromise all upstream apps, though, so I'm looking for a new design.)

I actually wrote this at my last job and don't have the code (it's theirs)... but am slowly rebuilding it in my spare time. Second system syndrome is a bitch. You can follow along at my jsso repository on Github, but it is not ready to be used and I think that most of the stuff I wrote in the design document there is going to change ;)

Anyway, where I'm going with all this is... all the pieces exist to make yourself a secure and reliable production environment. mTLS is pretty straightforward these days, and in addition to the easy route of just doing it yourself, a bunch of frameworks exist to let you get even more security (SPIFFE/Spire, Istio, etc.) For authenticating human users, most of the work has been done in the closed source world; Okta, Duo, Google's Identity Aware Proxy, etc.
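As an aside, the Postgres piece mentioned in step 1 (not yet set up there, and only a sketch here; the database, role, host, and paths are illustrative) comes down to a pg_hba.conf line using the cert auth method plus a client connection configured for mutual TLS:

    # pg_hba.conf: authenticate over TLS by client certificate;
    # with the "cert" method the certificate CN must match the database role
    hostssl  mydb  myapp  10.0.0.0/8  cert

    # libpq-style client connection string
    postgresql://myapp@db.internal/mydb?sslmode=verify-full&sslcert=/certs/myapp.crt&sslkey=/certs/myapp.key&sslrootcert=/certs/ca.crt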


Why wouldn't it be even more secure with a VPN in front of it? Is defense in depth no longer a good strategy?


This doesn't pan out in practice. Once the VPN is there, people get lazy and complacent about implementing strong security inside.

Defense in depth is a concept that should be applied with some thought; it would be good if your additional layer did something different. For example, good reactive security, endpoint attestation, etc.


We do this with a whitelist pattern to the building for 'sensitive' services. Setting up VPNs to AWS is prohibitively expensive if you're not going to build and manage hosts yourself. I don't have the time or patience to deal with it.


Isn't the point of a VPN to provide network level security in addition to application level access control security?

Multiple layers, security in depth...


Why limit yourself to TLS 1.2? What's wrong with TLS 1.3?


I think the point is more to block 1.0 and 1.1.


Authenticating the source IP address on the fly (as detected from the browser) is definitely not the way to go for many reasons:

1. With NAT and metropolitan area networks, hundreds of thousands of devices could share the same public IP.

2. Large networks with many devices often connect to the public network through trunking (load balance the connections through multiple routers), so the HTTP connection between OKTA and my browser can VERY well originate from a different IP address than my SSH session, and I would never be able to connect.

3. Many devices are mobile, and they can change their IP address when they pass from WiFi to LTE for example. This would force an unnecessary re-auth.


I want to be clear about the use case that the enterprise port knocking solution is trying to solve: it's an additional control that would not normally even be in place. In most setups, you are exposing some sort of relay to the internet, through which your users can access the services after authentication - such as an SSH bastion host or a VPN. The IP based whitelisting mechanism is simply a layer to allow you to not have to compromise and expose anything to the world wide internet. The actual authentication and authorization mechanism is the certificate-based set of SSH connections.


NAT and single IP addresses shared by multiple users are going away with IPv6?


Eventually. For now, v4 CGNAT exists too, though.


Pure Zero Trust is just as ridiculous as using Pure-VPN-around-a-garden. The first gives an attacker unlimited retries, and the second gives an attacker full system access once they breach the outer wall.

The correct solution is somewhere in the middle: block everything by default to get you to an inner courtyard, where the zero trust model is deployed... which, ironically, he suggests doing via port knocking. (Port knocking is a bad idea: TCP/UDP ports are sent in the clear and the "key" is never rotated.)

The best model is probably "block everything" by default, then allow access to the inner courtyard via a VPN, where then the Zero Trust model is deployed. You remove the ability for an attacker to have unlimited retries, but access to resources still requires individual authentication.


He doesn't ACTUALLY suggest port knocking, just the concept ("do something in order to open a firewall hole"). The proposed solution using Lambda w/ 2FA is actually pretty cool.


Correct, but his solution is a hair improvement at best over port knocking.

A VPN means an individual connection is authorized into the interior courtyard.

A Lambda with 2FA to whitelist an IP, then a cron job to cleanup means everyone at your local cafe wireless access point is also authorized into the inner courtyard.


No, the IP whitelist is only to allow access to the network entry point. So if you had a VPN, it would open the port to the VPN server. In our case, it opens the port to the cordoned-off SSH-based network entry point. It's not the replacement of an existing authorization layer; it's an addition where one usually isn't found.


Why does pure zero trust give unlimited retries? If you use rate limiting, good password policy, and strong 2FA then most motivated attackers are going to seek a different entrance.


The vulnerability in your stack may be before your limits/policies are checked.


Rate limiting across distributed systems is a notoriously hard problem to solve.


So VPN in to a subnet where only a bastion host is exposed effectively?


I used to think the same way as the author, i.e. if there is already a secured SSH bastion host and IAP-protected internal services, why do we still need VPN?

The answer is of course defense in depth.

I wonder why the author seems to reply to every comment other than this one.


Not to be particularly combative to this top-level comment author, but I did not see a reason to reply because I did not feel they had read the post particularly closely.

Obviously defense in depth can go as deep or shallow as you see fit, given an organization's resources. We believe that the short-lived SSH certificates, IP whitelisting (via "enterprise port knocking"), endpoint authentication (device trust), password authentication, and multifactor authentication are enough to protect a single production deployment. Encompassing all of that with a VPN seemed unnecessary when other protection mechanisms like the above, and additional mechanisms that we won't speak to publicly, are taken into account.

Like with anything, it's a game of risk, and it is up to each organization to decide what risk level they will tolerate. I believe most organizations have deployed VPNs in a way that gives them a higher exposure, and simply wanted to share some of the things we have learned through the process :)


Also just to note there is nothing to stop you from using a rotating port knock key, particularly if you are willing to assume the client's clock is reasonably accurate.


Yes, but not nearly as impossible as it should be. The precision needed to pull off a timing attack makes it difficult, but the computing power needed to factor 4096-bit RSA keys requires computers that don't exist [yet].


Nothing stopping you from HMACing the request IP+time with whatever crypto function you like and sending it in a series of encoded port knocks.

The only issue I've run into with port knocking is places that heavily restrict outbound ports/protocols. Though technically that is solvable too; I just haven't bothered.
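A rough sketch of generating and verifying such a token; the secret, step size, and encoding are illustrative, and turning the hex digest into an actual knock sequence (or a single SPA-style UDP packet) is left out:

    # HMAC over (client IP, time window) as a rotating knock token
    import hmac, hashlib, time

    SECRET = b"shared-secret-distributed-out-of-band"   # placeholder
    STEP = 30                                            # seconds per window, TOTP-style

    def knock_token(client_ip, when=None):
        window = int((when if when is not None else time.time()) // STEP)
        msg = ("%s|%d" % (client_ip, window)).encode()
        return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

    def verify(client_ip, token, skew=1):
        # Accept the current window plus +/- skew windows of clock drift
        now = time.time()
        return any(
            hmac.compare_digest(knock_token(client_ip, now + d * STEP), token)
            for d in range(-skew, skew + 1)
        )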


It's still sent in the clear and observable by anyone that can see your network traffic.


The only things in clear text are your current IP, the time, and the MAC itself. The attacker doing a packet capture already knows your IP and probably owns a watch, but it's unlikely they can break AES (or whatever you MAC'd with) to sign a modified message as you.


Why not zero trust on the VPN? (And zero access otherwise)


Any suggestions for a good inexpensive or open source zero trust auth solution that supports both HTTP and SSH? I considered Cloudflare Access, but you need to pay extra for Argo Tunnel if you want it to work with SSH.

The main open source option I'm aware of now with support is Pritunl Zero. Was going to actually stand that up today before I read the article.


I love Pritunl. Have been using it for 5 years. It's super easy to setup, maybe takes an hour the first time and like 15 minutes once you have done it.


CFA is not ZT. It is "simply" moving the auth point to the CF gateway. It's still a VPN (or bastion, if you will).

ZT is when you move [strong] authn and authz to the endpoint itself.


Not sure if you need SSO (it doesn't do it), but if you can bootstrap with a cert or key of some kind, I've been loving Zerotier.


If you're in GCP, then you can use their Identity Aware Proxy to achieve most of this. (https://cloud.google.com/iap/docs)

IAP supports HTTP and TCP connections, so you can put it in front of your website (say an internal admin webapp), or use it to tunnel SSH onto a machine that doesn't have a public IP, using your IAM roles.

If you're running Kubernetes in GKE, you can also wire IAP up to an Ingress, to protect any TCP/HTTP services you have in your k8s cluster. This one is a bit tricky to configure, but is very nice once you have it up and running.
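For the SSH case, once the IAM bindings and the firewall rule allowing IAP's source range exist, it's roughly a one-liner (the instance name and zone are placeholders):

    # SSH to a VM with no external IP, tunnelled through Identity-Aware Proxy
    gcloud compute ssh my-instance --zone us-central1-a --tunnel-through-iap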


Hey Matt. I appreciate images in articles, but GIFs are very distracting while trying to read. May I suggest static images next time?


Thanks all for your feedback. I have removed the images to improve readability, especially for mobile users. The post was originally written for an internal blog where we have a GIF-heavy communication culture, and I probably should have cleaned it up a bit more for general public consumption.


This article actually got me to go track down the firefox pref "image.animation_mode". "none" is a very nice choice.


I second this. I had to zoom way in so I could scroll between the animations for undisturbed reading.


The critique of VPNs is exactly right: they're such garbage compared to the standard we otherwise hold SSH, TLS, etc. to; the access granularity is too wide; and there's no transparency about how wide the access is configured. And they're very often on the wrong side of the IT dept vs. devops responsibility split, so often misconfigured.


The biggest issue I have with replacing VPNs is in server management, but not the way the author is talking. My job entails doing software development for over 1000 devices that are all fielded behind enterprise firewalls, and the PCI compliance requirements dictate that no unnecessary access be provided into those firewalls. What this means in practice is that no connections may be established which originate from a device on the internet to a device behind the firewall. We don't control the firewall as it is under the control of our customers.

What we used VPN for is allowing us to establish SSH connections to the equipment. I would really, really like a low-resource mechanism to replace this but everyone wants to deploy their solution in a 90+ megabyte Docker container, or a Snap, which is about 1.5 times as large as the entire Linux system image for our oldest equipment. So these are great solutions for when you control the entire network path from the server(or when you are using a server!) to the Internet including the firewall, but they suck terribly for eliminating VPN in cases where you can't just open an inbound port on the firewall.

As it is I'm trying to figure out how to configure an OpenSSH client to punch out through the firewall to an OpenSSH server, then immediately turn around and provide a shell to the server. This seems to be entirely contradictory to how OpenSSH is designed, but I'm hopeful I can hack something together.


Are you just looking for "ssh -fNT -R 10022:127.0.0.1:22 remote"?

If so, then that's not as much a hack as a pretty standard reverse ssh.
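For anyone following along, the two halves of that look roughly like this (hosts and users are placeholders); wrapping the first command in autossh or a systemd unit keeps the tunnel alive:

    # On the device behind the customer firewall: outbound-only reverse tunnel
    ssh -fNT -R 10022:127.0.0.1:22 ops@relay.example.com

    # On the relay host: reach the device's sshd back through that tunnel
    ssh -p 10022 deviceuser@127.0.0.1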


> We don't control the firewall as it is under the control of our customers.

> As it is I'm trying to figure out how to configure an OpenSSH client to punch out through the firewall to an OpenSSH server, then immediately turn around and provide a shell to the server. This seems to be entirely contradictory to how OpenSSH is designed, but I'm hopeful I can hack something together.

This is trivial. But if you don't control the firewall, how will you get the outbound SSH access? PCI requires that both inbound and outbound traffic from the secure zone (CDE) be controlled. If you can impose upon the customer that they punch an outbound hole, you can impose inbound requirements as well. Your inbound connection does not come from "the public internet", it comes from your managed in-scope network.


Does a VPN afford your customers belief in the fiction that the source device is no longer on the internet?


If you are interested in BeyondCorp-style access, I put together a collection of curated resources.

https://github.com/pomerium/awesome-zero-trust

PRs welcome.


Pardon my ignorance, so this updates a security group (or multiple) which presumably have access to several internal things? One wonders if you could take it a step further for web-based services (i.e. this doesn't apply to SSH access): at the conclusion of the authentication, it updates the security group for just that webapp. With how e.g. oauth2 SSO automatically auths via redirects, if the update to the security group is atomic/fast, access can be given one webapp at a time.

Also, are there any concerns about IP timeout vs explicit VPN disconnect? Obviously the latter works better in shared environments (e.g. shared terminals, wifi's that reuse IPs frequently, large NATs that have many devices behind a single IP).


There is one network entry point per deployment of our app infrastructure (eg. US and EU deploys), so the lambda does go and update both security groups simultaneously to allow the requestor's IP to hit either if they would like. If you wanted to, you could certainly make it more fine-grained than that, but the goal was simply to cut off the majority of the Internet from these mechanisms as an extra protection layer. There are all the other protection mechanisms (e.g. the mutual certs) to actually protect and authenticate the connection itself.
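Not the author's actual Lambda, but a minimal boto3 sketch of the security-group half of that flow; the group IDs and port are placeholders, and the identity/2FA verification that gates the call is omitted:

    # Whitelist the caller's IP on the network entry points' security groups
    import time
    import boto3

    ec2 = boto3.client("ec2")
    SECURITY_GROUP_IDS = ["sg-aaaa1111", "sg-bbbb2222"]   # e.g. one per US/EU deploy
    PORT = 22

    def allow_ip(caller_ip):
        for sg_id in SECURITY_GROUP_IDS:
            ec2.authorize_security_group_ingress(
                GroupId=sg_id,
                IpPermissions=[{
                    "IpProtocol": "tcp",
                    "FromPort": PORT,
                    "ToPort": PORT,
                    "IpRanges": [{
                        "CidrIp": caller_ip + "/32",
                        # Timestamp in the description so a cleanup job can expire it
                        "Description": "ephemeral-" + str(int(time.time())),
                    }],
                }],
            )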

For web apps, we simply front using an OAuth2-aware proxy. Back in the day, we used this: https://mattslifebytes.com/2018/08/07/protecting-internal-ap... Now, we utilize Kubernetes for hosting most production internal apps, so we run the oauth2-proxy Helm chart (https://github.com/helm/charts/tree/master/stable/oauth2-pro...) to handle verification of identity before sending traffic back to its destination service. Conceptually similar, as auth has to be completed before the request is sent to the back-end.


> One of my biggest pet peeves about VPNs is that they hijack all your network traffic. They can be configured not to, but our customers and security controls like NIST 800-53 SC-7(7) typically require that they do.

VPN is dead because some customers want you to route the internet interfaces of all machines through the VPN server.

How does this even make any sense?


Hijacking all your network traffic is one of the reasons for corps to use a VPN in the first place. It allows any and all outgoing requests to be routed through corporate security policies while you have access to the VPN, and cuts down on the possibility of cross-site scripting attacks.


Split tunneling is the answer here. Many folks configure their VPN solutions that way. That's exactly what I do at my present employer... traffic meant for VPN goes over VPN, and everything else gets routed through the client's internet connection.


The control I listed, NIST 800-53 SC-7(7) [which is a part of the FedRAMP Moderate suite of controls], specifically requires you implement a technical control such that your users cannot split tunnel.


> requires you implement a technical control such that your users cannot split tunnel

Is it actually possible to have a technical control like this?

Why can't I create a container or virtual machine that just runs a VPN client, and then use the virtual machine network controls to decide what host traffic gets routed to the VM and through the tunnel? How would the VPN client running inside the VM know about anything I'm doing one level up?

Or is this just another bullshit "you don't actually control the software running on your machine" technical control?


The common assumption is that the company-issued VPN client will only ever be installed on hardware owned and controlled by the company, and never inside a VM.

Realistically, the usual plan is to create controls that are impossible for most non-technical users to bypass, inconvenient for anyone else to bypass, and back them up with the threat of disciplinary action.


I think this is why VPNs have such a poor reputation among the technically literate. We're used to security where every aspect is governed by strong cryptography that's difficult for state actors to break (e.g. SSH, TLS).

There may be real cryptography over the wire, but there's nothing "strong" about the assumption you mentioned, or the disciplinary threats. If the threat model assumes that I can't extract a key from a laptop, or clone the behavior of some garbage Cisco client, that seems pretty broken to me.

Commercial VPNs are mostly just shitty software for enforcing shitty corporate policy, disguised as a remote access tool.


Apologies... didn't read into the NIST requirement. Out of curiosity, what do you normally implement that meets that requirement? Forcing all traffic when there is no security benefit (what is the advantage of getting to https://news.ycombinator.com through the VPN?) seems ripe for a compensating control.


If the business prioritizes security, getting to https://news.ycombinator.com would not be possible. If you want to get to https://news.ycombinator.com you use a separate PC.

The PC you use with the VPN is never to be directly connected to the internet. It connects to a piece of dedicated VPN hardware. (could be a Raspberry PI with special software or something far more expensive) That PC can use the VPN, and thus get to various computers within the company, but it can't go elsewhere. No other business is reachable.

The company can allocate IP addresses without NAT and without regard for the rest of the world. There just isn't any connection to the rest of the world, so conflicts can't happen.


It could have to do with monitoring what is coming and going from the VPN'ed machine. If malware can escape the VPN tunnel, it is less likely to be detected than if it is forced to go through a firewall that is already looking for suspicious traffic. Rather than let a user pick and choose what goes through the VPN, and possibly letting the malware make the decision for them, don't let them choose at all. I've worked at places that force you to use the VPN if you aren't connected to the office network. Security of the VPN client aside, it's not a bad idea to force any remote machine to be totally protected from the raw internet.


If the malware (or the user) has root it can hairpin tunnel anything it wants out the default gateway by manipulating the routing tables. It may not even require root but I'd have to tinker with it which I'm too lazy to do.

Split tunnel vs not split tunnel means nothing if the client doesn't want it to mean something.


The rule of thumb is: if you must VPN, then you have a controlled system where root is limited. Think bank or government systems.


Most people in industry I have talked to just do VPN for all traffic, even though that seems crazy to me to route your Hacker News traffic through your prod servers' networks (gross). Normally a compensating control would be appropriate, but if you are pursuing FedRAMP Moderate ATO like we did, you can only get away with a pretty small number of alternative implementations, and only Low findings can be easily accepted by the Federal stakeholders. It's tough.


If this is for complying with NIST 800-53 SC-7, then I'd be really curious how this actually works. Because SC-7 is all wrapped up in language around split tunneling I feel like this is focused on the wrong thing -- you'd still need to accommodate for controlling access of laptops etc. to external resources. Yes, in a VPN landscape that means not allowing for side-stepping the VPN, but that would be true of any other means of protection. It's clearly talking about having outbound connections controlled & secured at all times, not just your connections to internal trusted resources.


The risk of split tunneling is that the remote client has the ability to relay data in real time. Allowing it means the local printers or laptops on an open WiFi are effectively on your network.


As I alluded to in the post, it's a legacy viewpoint. These customers hand you a 300+ question security questionnaire that hasn't been updated in 10 years. When you tick the 'VPN' boxes, alarms sound, but they are thinking about the term 'VPN' in a different context, like employees accessing a network file share. But what we really have is a bastion host (aka jump box), which is fundamentally different. By not saying you run VPN software, the conversations shift significantly, especially when dealing with the F100 banks and the like that may not be as familiar with modern cloud architectures.


Because split tunnel is a pain in the ass.

More specifically it's a pain in the ass if you use AWS ALB load balancers and whitelisting. Those IPs aren't consistent and you typically can ONLY route on IP.

We do it because it's better than the alternatives, but our setup wouldn't scale past more than a few applications.


This is only the default. You can set up your VPN however you want including only routing certain subsets down the tunnel.


Meh, this sounds like it's paid for by okta. It also doesn't cover the real use case scenario of vpns - non-technical folks need to access internal services. What's presented is a reasonable approach for ssh control. Oh, and I'm pretty sure Cloud Passage has a port-knocking based solution in the real world which also gives 2fa access for ssh.


I have no way to prove to you that I am not some paid shill :) , but I have no relationship with Okta outside of being a customer through my employer, and I was not compensated or gifted anything for the creation of my post.

I blog about things that I encounter at work and find interesting. That happens to often be a cross-section of infrastructure and identity!


I can verify that he isn't an Okta shill but he also glosses over some of the problems and limitations that I've experienced with the ScaleFT product compared to our co-existing OpenVPN solution.

We have used multiple OpenVPN servers with password-protected certificates and TOTP. Even if someone were to obtain access to my credentials and certs, they wouldn't be able to access the production services without also obtaining access to the authenticator device. Once your machine is enrolled in ScaleFT and while you're authenticated with your identity provider, malware or just a malicious coworker could access the production services with a single command line.

There are upsides to ScaleFT as well, though. As long as you're all in on Okta or can federate with it, user management is a no brainer. And having the IdP integration is much more user (and malware) friendly and is likely more reliable for server to server use compared to OpenVPN. Limiting access to particular services is likely easier, too.

Downsides with this product include all sorts of recurring configuration problems where a server just disappears from the list of available services, which requires ops involvement to restore access. If you're using macOS and RDP (I just outed myself to Matt...) you have to use the sub-par FreeRDP client. And ultimately you're tunnelling TCP over TCP, which works ok in the office but which might not always work as well in mobile or higher-latency network situations.


Yes, the RDP story is very painful.

To the best of my ability, my goal was to make the post more about the network architecture (esp around the concept of SSH bastions) and less about the actual OASA product itself. I think there are a number of fungible solutions which would be just as effective (though I think the integration with Okta is a key product feature). What I find interesting and novel is more what we can do to only open ports to authenticated IP addresses, and to address connections between a single source and a single destination. To me, that's where the real power lies.


This is an apples-to-oranges comparison. Most network admins are lazy and just provide full L3 access so they don't have to deal with any access issues. Most users use VPNs as an access mechanism rather than to secure anything. Most vendors you mentioned in the article can also control access at the application level, like SSH.


I work in this space, for a company that has a commercial offering. I'll just leave this CSA spec here for reference. Please also look up SPA (Single Packet Authorization), newer than port knocking but based on the same premise. https://downloads.cloudsecurityalliance.org/initiatives/sdp/...


This talks about air-gapped networks?


> Amazon Linux 2-based EC2 instances, meaning the attack surface is extraordinarily small

WHAT?


Mainly this is a poor idea because of the whole allowing-an-IP-thing. Do not rely on IPs or ports for security, ever. Ever. Evereverevereverever. They are not secure. They do not contribute to security. Do not make your security dependent on them. Ever. Ok? Thanks.

AWS already has a vastly superior solution for this, called AWS Systems Manager Session Manager (it's quite a mouthful). You create a session with AWS using a federated login service (SAML-based SSO) and then craft IAM policies to allow a single user to ssh into a single server, over the AWS API. Not only will this be more secure, you don't have to maintain a wacky custom solution.

Logging into servers is an anti-pattern, and wherever possible you should be running away from it. Get metrics out of the server and analyze them, run commands remotely using some kind of persistent system agent, stop storing state on your servers. I know this is not the point of the article, but I want to remind people of it so they can can avoid the ssh trap early.
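Concretely (the instance ID, account, and region below are placeholders), a session looks like this, and the IAM policy is what scopes a given user to a given server:

    # Interactive shell via the AWS API; no inbound ports, no SSH keys
    aws ssm start-session --target i-0123456789abcdef0

    # IAM policy statement limiting a user to exactly that instance
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": "ssm:StartSession",
        "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"
      }]
    }

A real policy would also want ssm:TerminateSession scoped to the user's own sessions.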


If you wanted to do this "enterprise port knocking" on OpenBSD, you could use pf to do this.

https://www.openbsd.org/faq/pf/authpf.html

The FAQ entry is about building an authenticated gateway, but the same technique can be applied to open individual ports.
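Roughly what that looks like under the authpf(8) model (the interface and port here are illustrative): pf.conf hands a slice of the ruleset to authpf, and the per-user rules file uses the $user_ip macro authpf sets when someone authenticates over SSH:

    # /etc/pf.conf
    anchor "authpf/*"
    block return in on egress proto tcp to port 22

    # /etc/authpf/authpf.rules (loaded into the anchor on login)
    pass in quick on egress proto tcp from $user_ip to any port 22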


I've recently started using SOCKS proxies via SSH and I was surprised at how far you can go with such a solution.

Unless you want a permanent connection with routing and everything, an SSH SOCKS proxy works awesomely.

Point a firefox profile to use it and you can really act as if you were in a different subnet: it can proxy dns resolution too.
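For reference, the whole setup is roughly (host and port are placeholders):

    # Dynamic (SOCKS5) forward through any host you can SSH to
    ssh -D 1080 -C -N user@bastion.example.com

    # Then point a Firefox profile at SOCKS5 localhost:1080 and enable
    # "Proxy DNS when using SOCKS v5" so name resolution happens remotely too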


AWS Systems Manager provides a neat solution to do this [0], permission would be managed via IAM.

[0] https://github.com/elpy1/ssh-over-ssm <-- not made by me, but a good example
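The usual ssh_config glue for that (the Host pattern is just a convention) looks roughly like:

    # ~/.ssh/config
    Host i-* mi-*
        ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"

With that in place, `ssh ec2-user@i-0123456789abcdef0` works wherever your IAM credentials do.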


This doesn't make any sense. The people that make Okta ASA (formerly ScaleFT) -- Okta themselves -- must VPN into production. I wouldn't use ASA or the like without VPN if the people that make it won't even do that. Besides... it's an extra layer of security. Don't have the requisite certificate, username-password combo, and additional factor? Then you can't even get to port 22 to start with.


How do you know they don’t use it for production?


Interesting read! A couple of days ago I set up Cloudflare Access with Argo Tunnels and a CA. A very cool BeyondCorp-style service which looks similar to Okta's.


TLDR: Use a complicated SSH proxy instead of a VPN.

This has some serious downsides for non-SSH applications. For example, to connect to a production database cluster, one would need to ssh through the proxy to a bastion host, and then set up port forwarding from the bastion host to the database. Setting up a simple database connection now requires shell access to a production server. This is less secure and more complex than using a traditional VPN.


A great point. It does depend on your use case, and your dependence on manual operations. For our organization, almost all database interactions and maintenance are performed in code; if somebody is connecting manually, something pretty bad has happened. So for us, we are not really impacted by having to perform port forwarding like this on rare occasion. I completely agree that it could be much more impactful to other organizations.

I'm curious: why is utilizing port forwarding over these mutually authenticated SSH tunnels less secure than employing a VPN? From my perspective, port forwarding still adds a level of intentionality which reduces the likelihood of an incident/accident.


Good VPNs are mutually authenticated. Intentionality is good, but in your example it comes at a cost of complexity. Simplicity is paramount for security.

If intentionality is desired, one can use per-server VPNs.


I'm surprised to see no mention of Mutual TLS (MTLS), PIV cards, or the like


I believe in this case he's talking about MTLS:

> OASA also protects these hops by issuing client certificates with 10-minute expirations after first verifying your identity through our single sign-on provider, and then also verifying you are on a pre-enrolled (and approved) trusted company device.


Basically. Since we are focusing on SSH in this post (and keep in mind that SSH is its own protocol, separate from TLS), it's conceptually the same: client has a certificate and a key, signed by a trusted certificate authority, and the client is also in possession of the server certificate authority. So then you have a bi-directional trust established. The certificates are short-lived, and issued after a successful authentication + dial request to the OASA service.


Does anyone happen to know what this costs? They’re predictably quiet about it.


Agree on VPNs. They need to die.

Let's elect ZeroTier to be the president of remote, secure access :)


Nebula is very nice too, licensed MIT and has DNS support, something ZeroTier still hasn’t added outside of the ztdns server that someone wrote that I never did get to work properly...

hopefully ZeroTier makes some strides in 2.0.


Oh my god, thank you for mentioning this project. I just deployed ZeroTier for some local infra, but while investigating I was desperately trying to find the Nebula project name and GitHub. I was searching for "beacon", "lighthouse", "bastion", "ZeroTier compete" and everything in between. I even knew it was created at some prominent tech company (kept thinking Netflix; of course it was Slack). I was starting to think that I'd concocted a false memory and it didn't really exist.

So now I gotta go decide if I want to rip out all my existing ZT infra or not.


Hmm, static IP suffixes are a thing of the past anyway with IPv6? "Tens of millions"? More like quintillions, and that's for a single IP prefix more typical of a home connection!


Sorry, the tens of millions of IP addresses is referencing the IP space of AWS (one of which these network access points occupies at any given point in time).


No, no, no. The whole point of a VPN or SSH jumpbox is to airgap critical infrastructure with unknown vulnerabilities behind a hardened point of access. Putting production infrastructure on the public internet is beyond idiotic and regressive, and an invitation to be hacked by an unlimited and unknown number of exploits. It took forever to get departmental firewalls at a big name university where I worked because systems put in before my time like nutrition/meal planner, housing lottery draw, facilities management system and retail POS systems were getting owned left-and-right by remote malware.

I'll keep using fwknop-protected OpenSSH on OpenBSD and WireGuard; others can do whatever they want without thinking about the security vs. convenience trade-off.
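For the curious, the client side of an fwknop setup looks roughly like this (the server name is a placeholder):

    # Send one encrypted, HMAC-authenticated SPA packet asking for tcp/22,
    # resolving this client's external IP, then connect before the rule expires
    fwknop -A tcp/22 -R -D gw.example.com
    ssh user@gw.example.com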


Many of our customers use SSH jumpboxes - its a natively supported feature of Okta Advanced Server Access / ScaleFT.

From a post awhile back about using Bastions with ScaleFT:

> One of our values at ScaleFT is to do our best to support our users where they are, with the decisions and tools they’ve already selected. This means treating SSH bastions as an SSH feature, parameterizing and centralizing the associated configurations, and seamlessly integrating it into our users’ daily workflows.

https://www.scaleft.com/blog/bastion-hopping-with-ssh-and-sc...

So, if you want to layer on top VPNs, or SSH Jump Boxes, we try to let you. We also try to make parts of the chain better whenever we can.

(disclaimer, I'm ScaleFT co-founder)


> The whole point of a VPN or SSH jumpbox is to airgap critical infrastructure with unknown vulnerabilities behind a hardened point of access

Yes, I completely agree. This post is literally an endorsement of that idea, with enterprise port knocking mixed in for additional security. At no point in this post do I advocate simply opening all servers to the Internet. Quite the opposite.

If you have suggestions for how I could be clearer in the post, please let me know.


So, they have drawn their lines of defense in a position you are not used to, and therefore they are beyond idiotic and regressive?

Really?


Yeap. There is a night and day difference in attack surfaces between isolating access to a single (or HA pair) jumpbox and N boxes on the internet with no real DMZ or private admin network. Feelings and fashions don't make stupid configurations better. If you have a problem with honest opinions from someone with 25 years of experience, I think you need thicker skin or I can choose to simply not comment and let stupid fashions propagate.


I wish you had come into this discussion with constructive criticism, instead of simply swinging a hammer. I, for one, am happy to learn from somebody with a number of years of experience. However showing up on a thread and spewing negativity and name calling isn't a great way to earn respect in this industry.


Yep. Putting everything directly on the public Internet is 90's style. I remember it well. Whole offices with public IP addresses. No firewall. It's amazing anyone ever considered this sane, but it was a different time.


Better bust out the JNCO jeans and Offspring CDs because IPv6 is on its way, and you can bet some deployments will have everything accessible to everything.


Yes. That's what firewalls are for!


In 99.95% of cases, VPNs are set up to bridge a network device (such as a laptop, or even another server) into a larger network of servers (such as in the cloud or on-prem) across the Internet, protected with an additional layer of encryption.



