Show HN: Datree (YC W20): Prevent K8s misconfigurations from reaching production
144 points by shimont on Oct 19, 2021 | 26 comments
Hi HN, this is Shimon and Eyar of Datree (https://www.datree.io/).

When I was an Engineering Manager of Infrastructure at ironSource (NASDAQ:IS), supporting 400 developers, a developer made a mistake that let a misconfiguration reach production and caused major problems for the company's infrastructure.

Mistakes happen all the time - you learn from them and hope never to make them again. But how do you prevent a production issue from recurring? Or, a bigger challenge: how do you prevent the next one from the get-go?

In our case, we tried sending emails to our devs, writing Wikis, and hosting meetups and live sessions to educate our developers, but I felt that it just wasn’t driving the message home. How can developers be expected to remember to configure a liveness probe or to put a memory limit in place for their Kubernetes workload when there are so many things that a dev must remember? Infra just isn’t their primary focus.

Today, organizations want to delegate infra-as-code responsibilities to developers, but face a dilemma — even a small misconfiguration can cause major production issues. Some companies lock up infra changes and require ops teams to review all changes, which frustrates both sides. Developers want to ship features without waiting for infra. And infra teams don't want to “babysit” developers by reviewing config files all day long, essentially acting as human debuggers for misconfigurations.

That’s why I teamed up with Eyar to found Datree. Our mission is to help engineering teams prevent Kubernetes misconfigurations from reaching production. We believe that providing guardrails to developers protects their infra changes and frees up DevOps teams to focus on what matters most.

Datree provides a CLI tool (https://github.com/datreeio/datree) that runs automated policy checks against your Kubernetes manifests and Helm charts, identifies any misconfigurations within, and suggests how to fix them. The tool comes with dozens of preset, best-practice rules covering the most common mistakes that could affect your production. In addition, you can write custom rules for your policy.

Our built-in rules are based on hundreds of Kubernetes post-mortems and cover issues such as missing resource limits/requests (MEM/CPU), missing liveness and readiness probes, missing labels on resources, Kubernetes schema validation, API version deprecation, and more.
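
To make that concrete, a rule of this kind boils down to a fairly simple structural check over the parsed manifest. Here is a minimal Python sketch (an illustration only, not our actual implementation) that flags containers with no memory limit or no liveness probe:

    # Illustration only -- not Datree's implementation. Flags containers in a
    # parsed Deployment that have no memory limit or no liveness probe.
    manifest = {  # a hand-written example, as if loaded with yaml.safe_load()
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "demo"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "app", "image": "demo:1.0"}  # no limits, no probe
                    ]
                }
            }
        },
    }

    def check_containers(manifest):
        findings = []
        pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
        for c in pod_spec.get("containers", []):
            if not c.get("resources", {}).get("limits", {}).get("memory"):
                findings.append(f"{c['name']}: missing memory limit")
            if "livenessProbe" not in c:
                findings.append(f"{c['name']}: missing liveness probe")
        return findings

    for finding in check_containers(manifest):
        print(finding)
    # app: missing memory limit
    # app: missing liveness probe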

Datree comes with a centralized policy dashboard enabling the infra team to dynamically configure rules that run on dev computers during the development phase, as well as within the CI/CD process. This central control point propagates policy checks automatically to all developers/machines in your company.

We initially launched Datree as a general purpose policy engine (see our YC Launch https://news.ycombinator.com/item?id=22536228) in which you could configure all sorts of rules, but the market drove our focus toward infrastructure-as-code and, more specifically, Kubernetes, one of the most painful points of friction between developers and infrastructure teams.

When we adjusted to a Kubernetes-focused product, we pivoted our top-down sales-driven model to a bottom-up adoption-driven model focused on the user.

Our new dev tool is self-serve and open source. Hundreds of companies are using it to prevent Kubernetes misconfigurations and, in turn, are helping the tool improve by opening issues and submitting pull requests on GitHub. Our product is well suited for self-evaluation and immediate value delivery. No demo calls - just 2 quick steps to try the product yourself!

TechWorld with Nana did a deep technical review of our product, which can be viewed at https://www.youtube.com/watch?v=hgUfH9Ab258.

We look forward to hearing your feedback and answering any questions you may have. Thank you :)




I can't believe how complex a tool has to be for several startups to get funding purely to try to stop it doing bad things. Kudos to Datree for making the most of this, but it feels like something's wrong with k8s for this to be such a thing.


Have you ever tried to set up HA environments before Kubernetes and friends, let alone anything that can do failover and some form of auto-scaling? Kubernetes might have some unwarranted complexity, but let's not pretend it was a walk in the park setting environments up with ad-hoc shell scripts, HAProxy, IP failover, manual package updates and more fun stuff. Tools like Kubernetes and their general abstractions are vastly better than anything I could come up with in a time before even Puppet and Ansible existed. You have to take into consideration all that orchestrators like Kubernetes are trying to replace before you can meaningfully criticise their complexity.

Sadly enough it's all YAML, but at least it's all YAML and not random mostly undocumented arcane configuration formats...


> Have you ever tried to set up HA environments before Kubernetes and friends, let alone anything that can do failover and some form of auto-scaling?

Those were simpler days for sure.

1) Set up multiple fileservers w/ DRBD or some sort of clustered filesystem like glusterfs
2) Set up multiple database servers in a cluster
3) Set up multiple web servers, all of which can connect to the NAS and DB
4) Optionally set up a cache layer like Varnish
5) Set up a hardware load balancer (or even varnish/nginx/haproxy)

Autoscaling on bare metal wasn't really a thing for most DCs, but setting up HA environments has been possible for a long time. Things were actually a lot less complicated because there weren't as many options imo. You'd generally overprovision to handle your peak loads, or maybe buy a few more servers before busy seasons/days and cancel them later.

Managing a bare metal kubernetes installation feels like it has a lot more moving pieces and ways that it can break. Cloud providers and managed services do take away a lot of the burden though.


My first job was managing bare metal for a variety of companies including fairly large multinational labs, all the way down to plugging in the RAM and fiddling with the routers, mostly for VoIP systems but also data backups. I didn't architect any HA environments but worked on some.


The problem is right here in the blurb. It's not that k8s itself is complex. Ops is complex. Companies used to have entire dedicated IT departments full of sysadmins, storage engineers, network engineers, and security engineers with decades of experience configuring, deploying, maintaining, and monitoring servers, networks, hypervisors, and data centers.

Newer companies just push this onto their application developers, expecting them to figure this stuff out on top of being developers, "full-stack" now meaning you need to understand everything down to filesystems, overlay networks, container runtimes. This is not a reasonable expectation. Nobody can be an expert in everything.

Of course, I'm not sure full automation can really replace human expertise, either.


It's true that a developer can't be an expert in every aspect of the stack, but that's where services like Datree or many of AWS's services come in: they bring the domain expertise and only require the developer to be familiar with the subject. The experts have moved on to become domain experts, working for the companies that develop the tools.

You don't really need a resident storage expert in every company, since most companies have similar needs.


But suddenly you need to keep up with a ton of different tools and if they break you either dig into the topic or you're f**ed. This is such a productivity killer it's crazy.


IMO Kubernetes is at the level of complexity of a programming language or an OS. It just operates at a larger scope, and we don't have a lot of things in that space, so we don't have well-defined concepts like "language" or "OS" to encompass them.

There are basically entire industries dedicated to stopping programming languages from doing bad things (static analysis vendors, auditing consultancies, formal verification tool vendors, etc.) and to stopping OSes from doing bad things (application-focused monitoring tools, security-focused intrusion detection tools, policy enforcement / device management vendors, etc.), and we don't really say "Wow, something is wrong with C++" or "Wow, something is wrong with Linux." We understand that they are high-power tools and you do want additional tools to focus that power.

(Well, to be fair, I say something is wrong with C++, but my preferred solution to that is even more complicated programming languages :) )


Don't we? :). I say[0] something is wrong with programming itself, on a more fundamental level than the evolutionary history of C++.

I can't give a coherent and detailed analysis of it yet[1] - but I have this growing feeling that we're drowning in accidental complexity all across the board, at every abstraction layer. Like an inverse iceberg - where we see this whole, humongous mountain of tooling required to build and maintain software systems, but you can't shake the impression that we should be able to do the job with just the bit that's sticking above the waterline.

Speaking of k8s being "of the level of complexity of a programming language or an OS", I bet there's some formal way to show some isomorphism here - them being different incarnations of the same abstract structure. It's another kind of feeling I get when jumping up and down the software stack[2]. Maybe one day we'll figure it all out.

--

[0] - https://news.ycombinator.com/item?id=28568053

[1] - But I am collecting observations and trying to mull the problem over in my subconscious mind.

[2] - Like e.g. code is data is code; your config parser is an interpreter of a programming language. Often enough, it grows to look like a typical PL, then gets replaced by one[3]. If your config happens to describe infrastructure, at some point you might realize you're writing "function calls" for business logic that are implemented in terms of spinning clusters up and down. Or e.g. the realization that DCOM is essentially microservices, so for some two decades or more, every Windows installation had something similar to k8s deep inside its bowels.

[3] - https://mikehadlow.blogspot.com/2012/05/configuration-comple...


That may be true, but programming languages are there to give you the power to develop any idea into a working software solution. I think K8s differs because I see it as something that simplifies infra and abstracts vendor-specific infra concepts. The complexity in K8s doesn't add power, just confusion ;)


Kubernetes does simplify and abstract - just like C++ simplifies and abstracts compared to writing assembly. :) In both cases, the scope of what you can do expands significantly, which means that the systems you build with the higher-level tool are significantly more complex, which means that you see much more confusing C++ programs and Kubernetes deployments than assembly programs and five-lovingly-handcrafted-servers deployments.... but that's because you can successfully start with a more complex idea and make it happen.


I know! I think the fact that developers are dealing more and more with infra is very empowering, but on the other hand it brings new challenges. It is no longer Dev vs. Ops - now devs also need to learn infra best practices, so tools like ours help them :) Thank you for the kudos! <3


Not much different than Windows, Linux, etc (operating systems)

K8s basically implements a distributed OS that schedules across machines instead of CPU cores

Not too long ago, AWS had a significant outage due to misconfigured ulimits (a Linux OS setting)


As someone who works in a similar space (K8s configuration management and IaC), I'm curious what drove you to develop a CLI tool for enforcing policies as opposed to something that is able to integrate with K8s more closely such as OPA Gatekeeper or Kyverno?

As I understand it, the primary users of policy tools are platform teams, infrastructure teams, or some other entity that needs to be able to create, manage, and enforce policies over the domains they're responsible for.

When I look at Datree from the POV of a platform team, I see a tool that I must trust dev teams to use to enforce policies.

Yes, I can hide my K8s cluster behind a CI/CD pipeline that runs Datree, but this is limiting for organizations that actually want to let their dev teams access their K8s clusters directly or run workloads that can themselves create K8s resources (e.g. operators).

By contrast, OPA Gatekeeper or Kyverno do not have such limitations because they allow policies to be enforced at the cluster itself.

Both also allow platform teams to create new policies and detect if there are any K8s resources _already_ in the cluster that are in violation of the new policies (i.e. Day 2 operations).

Lastly, both even offer CLI tools for dev teams to use to detect issues earlier during development.

I would argue though that dev teams are actually secondary to platform teams in terms of who to focus on when building policy tools since platform teams usually have more of an interest/responsibility in enforcing policies and therefore more of a say in what policy tools to adopt for an organization.

Hence, I was curious why you started with a CLI tool which seems to be more of a dev-centric approach rather than platform-centric.

Also, more specifically, what makes Datree a better option over OPA Gatekeeper or Kyverno?


Hey, this is a great question.

We are big believers in "shift-left" and in trying to fix/avoid issues as early as possible. We started with a CLI tool because it is agnostic and can be run in the dev's IDE (like VS Code), in the terminal, and finally in the CI/CD process.

We love OPA and think that Gatekeeper is a good solution, but we want to provide feedback as early as possible, whereas Gatekeeper only blocks a deployment to the Kubernetes cluster at the end of the development process.

As a developer myself, I would rather be notified of an issue as early as possible and not find out about it at the very last second before it goes live to production.

We might add support similar to Gatekeeper in the future, but we wanted to be shift-left first :)

I hope this answers your question. Thank you!


Looks like this is a policy-as-code tool in the vein of Terraform Cloud's Sentinel Policies, or more generally Open Policy Agent, but specifically targeted at k8s use cases.

From the custom rules overview at https://hub.datree.io/custom-rules-overview (though the docs are still a WIP), I noticed these are defined as YAML/JSON somehow. That's a contrast to HashiCorp's Sentinel https://docs.hashicorp.com/sentinel/concepts/language and OPA's Rego https://www.openpolicyagent.org/docs/latest/policy-language/ . Is this an intentional design decision?


We just released support for custom rules :) From interviewing our users, we decided to start with JSON Schema [0], as it is very easy to write rules with it and you do not have to learn Rego.
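
For example, a rule like "every container must declare a memory limit" can be written as a plain JSON Schema and checked with any standard validator. Here is a simplified Python sketch using the jsonschema package (the exact rule format in our product may differ):

    # Simplified sketch with the "jsonschema" package; the exact custom-rule
    # format in Datree may differ, but the idea is the same: describe what a
    # valid resource looks like and let a generic validator do the work.
    from jsonschema import Draft7Validator

    rule = {  # "every container must declare a memory limit"
        "properties": {
            "spec": {
                "properties": {
                    "containers": {
                        "items": {
                            "required": ["resources"],
                            "properties": {
                                "resources": {
                                    "required": ["limits"],
                                    "properties": {
                                        "limits": {"required": ["memory"]}
                                    },
                                }
                            },
                        }
                    }
                }
            }
        }
    }

    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "spec": {"containers": [{"name": "app", "image": "demo:1.0"}]},
    }

    for error in Draft7Validator(rule).iter_errors(pod):
        print(error.message)  # 'resources' is a required property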

Having said that, we might add OPA .rego support in the near future :)

What would be your preferred way to write custom policy rules?

[0] - https://json-schema.org/


Is it me, or is the dashboard only for paying customers?

What does this add compared to Polaris by Fairwinds?


The dashboard is also part of our freemium offering :) We offer 1,000 policy checks per month for free, including the dashboard.

In terms of what we offer compared to Polaris: we offer pre-defined policies that come out of the box, along with the ability to write custom rules for your policy yourself.

Take us for a spin and let me know what you think! Thank you.


FYI - Polaris open source has both these things :)

(Disclosure - I'm a maintainer)


K8s manifests are YAML files that translate into tens of K8s actions changing the production environment. Misconfigurations that can be prevented by integrating the Datree CLI into the CI/CD cycle can save hours of production unreliability. For me it's a must-have phase in the K8s release flow.


Thank you Varchol :) Also, you can use Helm, which might help on top of Kubernetes manifests.


Why is this whole thread dead? Just because of the blatant appearance of astroturf or what?


I killed all the booster comments. We tell YC startups not to do that. All startups, of course, but especially YC startups.

Sometimes it happens inadvertently (e.g. users find out about the thread and rush in to 'help'), but obviously we want the discussion here to be substantive.


Hey, some of our friends were overeager to help hehe :)

I look forward to hearing your feedback! Thank you


Thanks, makes sense, appreciate the reply!



