CNCF's Cortex v1.0: scalable, fast Prometheus implementation (grafana.com)
181 points by netingle on April 2, 2020 | 46 comments



awesome job by the cortex team!

there are a lot of good questions, and some confusion, in this thread. here is my view. note: i'm definitely biased; i'm the co-founder/ceo at grafana labs.

- at grafana labs we are huge fans of prometheus. it has become the most popular metrics backend for grafana. we view cortex and prometheus as complementary. we are also very active contributors to the prometheus project itself. in fact, cortex vendors in prometheus.

- you can think of cortex as a scale-out, multi-tenant, highly available "implementation" of prometheus itself.

- the reason grafana labs puts so many resources into cortex is that it powers our grafana cloud product (which offers a prometheus backend). like grafana itself, we are also actively working on an enterprise edition of cortex that is designed to meet the security and feature requirements of the largest companies in the world.

- yes, cortex was born at weaveworks in 2016. tom wilkie (vp of product at grafana labs) co-created it while he worked there. after tom joined grafana labs in 2018, we decided to pour a lot more resources into the project, and managed to convince weave.works to move it to the cncf. this was a great move for the project and the community, and cortex has come a long long way in the last 2 years.

once again, a big hat tip to everyone who made this release possible. a big day for the project, and for prometheus users in general!

[edit: typos]


I'm worried about this statement:

> Local storage is explicitly not production ready at this time.

https://cortexmetrics.io/docs/getting-started/getting-starte...

But I want a scale-out, multitenant implementation of Prometheus with local storage that's ready for prod. What are my options then? VictoriaMetrics?


I suggest checking out M3DB[1]. My team and I use it to serve metrics for all of Uber; we have ~1500 hosts across various clusters. It's serving us quite well.

[1]: https://github.com/m3db/m3


The only one I know with "non-experimental" local-storage is VictoriaMetrics. But the big thing there is that data in VM is not replicated, so when you lose a disk/node, you lose that data.

Having said that, both Thanos and Cortex have experimental local-storage modes that are pretty good. You could also try them for now while they get production ready.


M3 provides local storage that is not experimental; on top of that, it has cluster replication (which VictoriaMetrics does not provide) and a Kubernetes operator to help scale out a cluster.

Disclosure: I work on the TSDB underlying M3 (M3DB) at Uber. Still worth checking out though!


> data in VM is not replicated, so when you lose a disk/node, you lose that data

The vmstorage component in VictoriaMetrics Server - is it RAID0-like (striping) or RAID1-like (mirroring)?

https://github.com/VictoriaMetrics/VictoriaMetrics/tree/clus...


It is easy to implement RAID1-like replication in VictoriaMetrics: just set up independent VictoriaMetrics instances (single-node or clusters) and replicate all the incoming data simultaneously to these instances. This can be done either by providing multiple `remote_write->url` values in Prometheus configs or by providing multiple `-remoteWrite.url` command-line flags in vmagent [1]. Then query the VictoriaMetrics replicas via Promxy [2].

[1] https://github.com/VictoriaMetrics/VictoriaMetrics/blob/mast...

[2] https://github.com/jacksontj/promxy
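
For illustration, here is a minimal Prometheus config sketch for this setup, duplicating every sample to two independent single-node VictoriaMetrics instances (the hostnames are placeholders):

    remote_write:
      # each VictoriaMetrics instance receives a full copy of the data
      - url: http://victoriametrics-a:8428/api/v1/write
      - url: http://victoriametrics-b:8428/api/v1/write

Each instance then keeps a full copy of the data, so losing one node doesn't lose history, and Promxy deduplicates across the replicas at query time.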


It is more like RAID0. VictoriaMetrics shards time series among the available vmstorage nodes, i.e. each vmstorage node contains a part of the data stored in the cluster. This is usually called a shared-nothing architecture [1].

As for data replication, VictoriaMetrics offloads this task to the underlying storage, since replication is hard to implement properly [2]. Proper replication must be able to perform the following tasks in addition to copying the data to multiple nodes:

* To heal the data (i.e. to restore the replication factor) after a node becomes permanently unavailable. The healing process mustn't degrade cluster performance, and it must properly handle the other cases mentioned below.

* To gracefully handle temporary unavailability of nodes.

* To survive network partitioning when nodes are temporarily split into multiple isolated subnetworks.

* To handle data corruption.

* To continue accepting new data at the normal rate when a subset of nodes is unavailable.

* To continue serving incoming requests with acceptable latency when a subset of nodes is unavailable.

* To replicate data among multiple availability zones (AZs), so that the cluster continues accepting new data and serving requests if a single AZ becomes unavailable.

I'm unsure whether popular systems that claim replication support can handle all the cases mentioned above. The only system that seems to handle these cases properly is GCP persistent disks, which are based on Colossus storage [3]. That's why it is recommended to store VictoriaMetrics data on GCP persistent disks.

[1] https://en.wikipedia.org/wiki/Shared-nothing_architecture

[2] https://github.com/VictoriaMetrics/VictoriaMetrics/tree/clus...

[3] https://medium.com/google-cloud/persistent-disks-and-replica...


There are a bunch of different solutions out there: Thanos, Influx, federated Prometheus, etc.

The local Cortex storage works pretty well, but we have a very high bar for production worthiness. Right now I'd recommend using Bigtable or DynamoDB, and if you're on premises, Cassandra. In the future the blocks storage will allow you to run MinIO.
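
For the on-premises case, that roughly means pointing the chunks schema and storage at Cassandra in the Cortex config. A minimal sketch, with placeholder addresses/keyspace (the exact keys may differ between Cortex versions, so check the docs for yours):

    schema:
      configs:
        - from: 2020-04-01        # placeholder start date for this schema
          store: cassandra        # index store
          object_store: cassandra # chunk store
          schema: v10
          index:
            prefix: index_
            period: 168h

    storage:
      cassandra:
        addresses: cassandra.example.internal:9042  # placeholder host
        keyspace: cortex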


Thanos is probably one of the other popular choices. It's being heavily used in production by a number of companies, but I don't think they've branded it as "prod ready" with a 1.0 release.


Thanos doesn't have production support for local storage either. The only stable storage providers for it are Google, Amazon, and Azure's object stores.

https://thanos.io/storage.md/
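
For reference, each Thanos storage provider is configured via a small objstore YAML document; a minimal sketch for the stable S3 provider (bucket, endpoint, and credentials are placeholders):

    type: S3
    config:
      bucket: thanos-metrics                # placeholder bucket name
      endpoint: s3.us-west-2.amazonaws.com  # placeholder endpoint
      access_key: <ACCESS_KEY>
      secret_key: <SECRET_KEY>

Pointing that S3 provider at an S3-compatible server such as MinIO is how people usually approximate "local" object storage, but that's not the same thing as a supported local-storage mode.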

Interestingly, it looks like Cortex's support for local storage and object stores comes from using Thanos's storage engine. So once it's production-ready in Thanos, it will probably be production-ready in Cortex shortly thereafter.

https://cortexmetrics.io/docs/operations/blocks-storage/

I think for Cortex your safest storage options now are Bigtable, DynamoDB, or Cassandra.


I may have misinterpreted what they meant by local storage! I was reading that as having a local copy of the TSDB available to Prometheus (e.g. how Thanos works), versus Cortex, which doesn't store metrics locally (IIRC).

What you said is correct and makes sense. Though I suspect either choice would work with any S3-compatible API that can run on local storage, but I know that isn't necessarily what's meant by "local storage".


"local storage" = I don't want to install yet another gizmo just to store data, nor do I want to use an external service for that

Batteries included.


Please note the difference between complimentary and complementary. It's a common homophone confusion in English.

The former means free of charge, or expressing praise or a compliment.

The latter means disparate things go well together and enhance each other's qualities.


thanks for the complimentary tip ;) fixed.


also props to https://weave.works for creating cortex, open-sourcing it and moving it under cncf, something this blog post leaves out.


Hi! Tom here, one of the Cortex authors. Super proud of the team and this release - let me know if you have any questions!


Hey Tom!

Can you outline how Cortex differs from some of the other available Prometheus backends?


Sure, check out this talk from PromCon I did with Bartek, the Thanos author: https://grafana.com/blog/2019/11/21/promcon-recap-two-househ...


Love that talk. :)



Great job Cortex team! Do you think this means Cortex will move to incubation in the CNCF landscape?


I hope so! Goutham is applying for incubation as we speak...


This will also depend on SIG o11y, the creation of which is currently being voted on by the CNCF TOC. The TOC vote is looking good, and projects which have been in the sandbox for some time are obvious candidates for early review.


Isn't Prometheus an implementation and not an interface? I have "prometheus" running in my cluster; if it's not Cortex, what implementation am I using?


It's kinda several things:

- The OSS product

- The Storage Format (I guess)

- The Interface for pulling metrics (https://github.com/OpenObservability/OpenMetrics)

I haven't dug into Cortex even a little, but the other comments suggest it's API compatible, and they're essentially claiming it's production ready because it gives you things the OSS project won't give you out of the box, i.e. long-term storage and RBAC.

Looks like a good thing.


> wrapping prometheus and giving you that production readyness that they're claiming the OSS project won't give you out of the box

No! Prometheus is and has been production ready for many years. Cortex is a clustered/horizontally scalable implementation of the Prometheus APIs, and Cortex has just gone production ready. Sorry for the confusion.


Just want to say, I use prometheus. It's amazing.

But readiness depends somewhat on your use case. If you're on a multi-tenanted cluster and you don't want to explicitly trust your users / admins, how do you stop them from messing with your metrics whilst allowing them to maintain their own?

I typically did it via GitHub flow, some others used the operator to give us many Proms, and some others would just suggest it's missing features.

Indeed, I could probably word my example better though. Apologies if I was putting words in your mouth.


And I have Prometheus data from 2015, so I would argue that's long-term.


You are using Prometheus.

However, Prometheus can use different storage backends. The TSDB that it comes with is horrible.

I mean, it's workable. And can store an impressive amount of data points. If you don't care about historical data or scale, it may be all you need.

However, if your scale is really large, or if you care about the data, it may not be the right solution, and you'll need something like Cortex.

For instance, Prometheus' own TSDB has no 'fsck'-like tool. From time to time, it does compaction operations. If your process (or pod in K8s) dies, you may be left with duplicate time series. And now you have to delete some (or a lot!) of your data to recover.

Prometheus documentation, last I checked, even says it is not suitable for long-term storage.


The TSDB it uses is actually pretty state of the art. I think your pain point is more that it's designed to be used on local disk, but that doesn't mean it isn't possible to store the TSDB remotely. In fact, this is exactly how Thanos works.

The docs say Prometheus is not intended for long-term storage because, without a remote_write configuration, all data is persisted locally, so you will eventually hit limits on how much can be stored and queried locally. However, that is a limitation of how Prometheus is designed, not how the TSDB is designed, and it can be overcome by using a remote_write adapter.
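
As a sketch, the Prometheus side of that is just a remote_write block in prometheus.yml; the URL below is a placeholder for whatever long-term store you run (Cortex's push path is shown as an example):

    remote_write:
      # ship every sample to a remote store as well as the local TSDB
      - url: http://cortex.example.internal/api/prom/push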


> The TSDB that it comes with is horrible.

The TSDB in Prometheus since 2.0 is excellent for its use case.


Yes, Prometheus is an implementation - the HN title has a limited number of characters, so I thought "Prometheus implementation" conveyed the fact that Cortex was trying to be a 100% API compatible implementation of Prometheus, but with scalability, replication, etc.


how about:

CNCF's Cortex v1.0: scalable, fast Prometheus API implementation ready for prod (grafana.com)

saves 1 char.


Yes, you're running the Prometheus server. But Cortex is a Prometheus API compatible service that horizontally scales and has multi-tenancy and other things built in.


Dat architecture tho: https://cortexmetrics.io/docs/architecture/ . Holy bi-gebus.


That's the "microservices" mode - you can run it as a single process and the architecture becomes super boring.

It's like looking at the module interdependencies of a reasonably large piece of software; of course it's going to look complicated.


According to Cortex docs [1], a single-process Cortex isn't production ready. It is intended for development and testing only.

[1] https://cortexmetrics.io/docs/configuration/single-process-c...


Congrats to Grafana Team!

If you're looking at scaling your Prometheus setup - also check out VictoriaMetrics.

Operational simplicity and scalability/robustness are what drive me to it.

I use it to send metrics from multiple Kubernetes clusters with Prometheus: each cluster runs Prometheus with a remote_write directive that sends metrics to a central VictoriaMetrics service.

That way my "edge" Prometheus installations are practically "stateless" and easily set up using prometheus-operator. You don't even need to add persistent storage to them.
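
For example, a sketch of what that looks like with the prometheus-operator Prometheus custom resource (the central VictoriaMetrics URL is a placeholder):

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: edge
    spec:
      replicas: 1
      serviceMonitorSelector: {}   # pick up all ServiceMonitors in scope
      remoteWrite:
        # placeholder URL for the central VictoriaMetrics service
        - url: http://victoriametrics.central.example:8428/api/v1/write
      retention: 2h                # keep only a short local buffer

With a short local retention like that, losing an edge Prometheus pod costs you almost nothing, because the history lives in the central VictoriaMetrics.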


I'm new to Cortex, but looking at a comparison of Prometheus and InfluxDB (like https://prometheus.io/docs/introduction/comparison/#promethe...), it appears that Cortex offers horizontal scalability features similar to the InfluxDB Enterprise offering. The linked comparison does note the difference between event logging and metrics recording, but I am curious (choosy beggar that I am) whether others consider them separate tooling or whether it is possible to remain performant using one solution.


This was a Weaveworks project right?


Yes, it was created at Weaveworks, but it was later donated to the CNCF and now the community is much bigger! Having said that, Weaveworks is still a major contributor!


Really exciting! Well done


Great job Cortex team!


Good job, excited!


Reminder: github star history is in no way a measure of quality.



