CNCF's Cortex v1.0: scalable, fast Prometheus implementation (grafana.com)
181 points by netingle on April 2, 2020 | 46 comments



awesome job by the cortex team!

there are a lot of good questions, and some confusion, in this thread. here is my view. note: i'm definitely biased; i'm the co-founder/ceo at grafana labs.

- at grafana labs we are huge fans of prometheus. it has become the most popular metrics backend for grafana. we view cortex and prometheus as complementary. we are also very active contributors to the prometheus project itself. in fact, cortex vendors in prometheus.

- you can think of cortex as a scale-out, multi-tenant, highly available "implementation" of prometheus itself.

- the reason grafana labs puts so many resources into cortex is that it powers our grafana cloud product (which offers a prometheus backend). like grafana itself, we are also actively working on an enterprise edition of cortex that is designed to meet the security and feature requirements of the largest companies in the world.

- yes, cortex was born at weaveworks in 2016. tom wilkie (vp of product at grafana labs) co-created it while he worked there. after tom joined grafana labs in 2018, we decided to pour a lot more resources into the project, and managed to convince weave.works to move it to the cncf. this was a great move for the project and the community, and cortex has come a long long way in the last 2 years.

once again, a big hat tip to everyone who made this release possible. a big day for the project, and for prometheus users in general!

[edit: typos]


I'm worried about this statement:

> Local storage is explicitly not production ready at this time.

https://cortexmetrics.io/docs/getting-started/getting-starte...

But I want a scale-out, multitenant implementation of Prometheus with local storage that's ready for prod. What are my options then? VictoriaMetrics?


I suggest checking out M3DB[1]. My team and I use it to serve metrics for all of Uber; we have ~1500 hosts across various clusters. It's serving us quite well.

[1]: https://github.com/m3db/m3


The only one I know with "non-experimental" local-storage is VictoriaMetrics. But the big thing there is that data in VM is not replicated, so when you lose a disk/node, you lose that data.

Having said that, both Thanos and Cortex have experimental local-storage modes that are pretty good. You could also try them for now while they get production ready.


M3 provides local storage that is not experimental; on top of that, it has cluster replication (which VictoriaMetrics does not provide) and a Kubernetes operator to help scale out a cluster.

Disclosure: I work on the TSDB underlying M3 (M3DB) at Uber. Still worth checking out though!


> data in VM is not replicated, so when you lose a disk/node, you lose that data

The vmstorage component in VictoriaMetrics Server - is it RAID0-like (striping) or RAID1-like (mirroring)?

https://github.com/VictoriaMetrics/VictoriaMetrics/tree/clus...


It is easy to implement RAID1-like replication in VictoriaMetrics: just set up independent VictoriaMetrics instances (single-node or clusters) and replicate all the incoming data simultaneously to these instances. This can be done either by providing multiple `remote_write->url` values in Prometheus configs or by providing multiple `-remoteWrite.url` command-line flags in vmagent [1]. Then query the VictoriaMetrics replicas via Promxy [2].

[1] https://github.com/VictoriaMetrics/VictoriaMetrics/blob/mast...

[2] https://github.com/jacksontj/promxy
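
For illustration, here is a minimal Prometheus config sketch for this setup, duplicating every sample to two independent single-node VictoriaMetrics instances (the hostnames are placeholders):

    remote_write:
      # each VictoriaMetrics instance receives a full copy of the data
      - url: http://victoriametrics-a:8428/api/v1/write
      - url: http://victoriametrics-b:8428/api/v1/write

Each instance then keeps a full copy of the data, so losing one node doesn't lose history, and Promxy deduplicates across the replicas at query time.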


It is more like RAID0. VictoriaMetrics shards time series among the available vmstorage nodes, i.e. each vmstorage node contains a part of the data stored in the cluster. This is usually called a shared-nothing architecture [1].

As for data replication, VictoriaMetrics offloads this task to the underlying storage, since replication is hard to implement properly [2]. Proper replication must be able to perform the following tasks in addition to copying the data to multiple nodes:

* To heal the data (i.e. to restore the replication factor) after a node becomes permanently unavailable. The healing process mustn't degrade cluster performance, and it must properly handle the other cases mentioned below.

* To gracefully handle temporary unavailability of nodes.

* To survive network partitioning when nodes are temporarily split into multiple isolated subnetworks.

* To handle data corruption.

* To continue accepting new data at the normal rate when a subset of nodes is unavailable.

* To continue serving incoming requests with acceptable latency when a subset of nodes is unavailable.

* To replicate data among multiple availability zones (AZs), so that the cluster continues accepting new data and serving requests if a single AZ becomes unavailable.

I'm unsure whether popular systems that claim replication support can handle all the cases mentioned above. The only system that seems to handle these cases properly is GCP persistent disks, which are based on Colossus storage [3]. That's why it is recommended to store VictoriaMetrics data on GCP persistent disks.

[1] https://en.wikipedia.org/wiki/Shared-nothing_architecture

[2] https://github.com/VictoriaMetrics/VictoriaMetrics/tree/clus...

[3] https://medium.com/google-cloud/persistent-disks-and-replica...


There are a bunch of different solutions out there: Thanos, Influx, federated Prometheus, etc.

The local Cortex storage works pretty well, but we have a very high bar for production worthiness. Right now I'd recommend using Bigtable or DynamoDB, and if you're on premises, Cassandra. In the future the blocks storage will allow you to run MinIO.
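
For the on-premises case, that roughly means pointing the chunks schema and storage at Cassandra in the Cortex config. A minimal sketch, with placeholder addresses/keyspace (the exact keys may differ between Cortex versions, so check the docs for yours):

    schema:
      configs:
        - from: 2020-04-01        # placeholder start date for this schema
          store: cassandra        # index store
          object_store: cassandra # chunk store
          schema: v10
          index:
            prefix: index_
            period: 168h

    storage:
      cassandra:
        addresses: cassandra.example.internal:9042  # placeholder host
        keyspace: cortex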


Thanos is probably one of the other popular choices. It's being heavily used in production by a number of companies, but I don't think they've branded it as "prod ready" with a 1.0 release.


Thanos doesn't have production support for local storage either. The only stable storage providers for it are Google, Amazon, and Azure's object stores.

https://thanos.io/storage.md/
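
For reference, each Thanos storage provider is configured via a small objstore YAML document; a minimal sketch for the stable S3 provider (bucket, endpoint, and credentials are placeholders):

    type: S3
    config:
      bucket: thanos-metrics                # placeholder bucket name
      endpoint: s3.us-west-2.amazonaws.com  # placeholder endpoint
      access_key: <ACCESS_KEY>
      secret_key: <SECRET_KEY>

Pointing that S3 provider at an S3-compatible server such as MinIO is how people usually approximate "local" object storage, but that's not the same thing as a supported local-storage mode.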

Interestingly, it looks like Cortex's support for local storage and object stores comes from using Thanos's storage engine. So once it's production-ready in Thanos, it will probably be production-ready in Cortex shortly thereafter.

https://cortexmetrics.io/docs/operations/blocks-storage/

I think for Cortex your safest storage options now are Bigtable, DynamoDB, or Cassandra.


I may have misinterpreted what they meant by local storage! I was reading that as having a local copy of the TSDB available to Prometheus (e.g. how Thanos works), versus Cortex, which doesn't store metrics locally (IIRC).

What you said is correct and makes sense. Though I suspect either choice would work with any S3-compatible API that can run on local storage, but I know that isn't necessarily what's meant by "local storage".


"local storage" = I don't want to install yet another gizmo just to store data, nor do I want to use an external service for that

Batteries included.


Please note the difference between complimentary and complementary. It's a common homophone confusion in English.

The former means free of charge, or expressing praise or a compliment.

The latter means disparate things go well together and enhance each other's qualities.


thanks for the complimentary tip ;) fixed.


also props to https://weave.works for creating cortex, open-sourcing it and moving it under cncf, something this blog post leaves out.


Hi! Tom here, one of the Cortex authors. Super proud of the team and this release - let me know if you have any questions!


Hey Tom!

Can you outline how Cortex differs from some of the other available Prometheus backends?


Sure, check out this talk from PromCon I did with Bartek, the Thanos author: https://grafana.com/blog/2019/11/21/promcon-recap-two-househ...


Love that talk. :)



Great job Cortex team! Do you think this means Cortex will move to incubation in the CNCF landscape?


I hope so! Goutham is applying for incubation as we speak...


This will also depend on SIG o11y, the creation of which is currently being voted on by the CNCF TOC. The TOC vote is looking good, and projects which have been in the sandbox for some time are obvious candidates for early review.


Isn't Prometheus an implementation and not an interface? I have "prometheus" running in my cluster; if it's not Cortex, what implementation am I using?


It's kinda several things:

- The OSS product

- The Storage Format (I guess)

- The Interface for pulling metrics (https://github.com/OpenObservability/OpenMetrics)

I haven't dug into Cortex even a little, but the other comments suggest it's API compatible, and they're essentially claiming it's production ready because it gives you things the OSS project won't give you out of the box, i.e. long-term storage and RBAC.

Looks like a good thing.


> wrapping prometheus and giving you that production readyness that they're claiming the OSS project won't give you out of the box

No! Prometheus is and has been production ready for many years. Cortex is a clustered/horizontally scalable implementation of the Prometheus APIs, and Cortex has just gone production ready. Sorry for the confusion.


Just want to say, I use prometheus. It's amazing.

But readiness depends somewhat on your use case. If you're on a multi-tenanted cluster and you don't want to explicitly trust your users / admins, how do you stop them from messing with your metrics whilst allowing them to maintain their own?

I typically did it via GitHub flow, some others used the operator to give us many Proms, and some others would just suggest it's missing features.

Indeed, I could probably word my example better though. Apologies if I was putting words in your mouth.


And I have Prometheus data from 2015, so I would argue that's long-term.


You are using Prometheus.

However, Prometheus can use different storage backends. The TSDB that it comes with is horrible.

I mean, it's workable. And can store an impressive amount of data points. If you don't care about historical data or scale, it may be all you need.

However, if your scale is really large, or if you care about the data, it may not be the right solution, and you'll need something like Cortex.

For instance, Prometheus' own TSDB has no 'fsck'-like tool. From time to time, it does compaction operations. If your process (or pod in K8s) dies, you may be left with duplicate time series. And now you have to delete some (or a lot!) of your data to recover.

Prometheus documentation, last I checked, even says it is not suitable for long-term storage.


The TSDB it uses is actually pretty state of the art. I think your pain point is more that it's designed to be used on local disk, but that doesn't mean it isn't possible to store the TSDB remotely. In fact, this is exactly how Thanos works.

The docs say Prometheus is not intended for long-term storage because, without a remote_write configuration, all data is persisted locally, so you will eventually hit limits on how much can be stored and queried locally. However, that is a limitation of how Prometheus is designed, not how the TSDB is designed, and it can be overcome by using a remote_write adapter.
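
As a sketch, the Prometheus side of that is just a remote_write block in prometheus.yml; the URL below is a placeholder for whatever long-term store you run (Cortex's push path is shown as an example):

    remote_write:
      # ship every sample to a remote store as well as the local TSDB
      - url: http://cortex.example.internal/api/prom/push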


> The TSDB that it comes with is horrible.

The TSDB in Prometheus since 2.0 is excellent for its use case.


Yes, Prometheus is an implementation - the HN title has a limited number of characters, so I thought "Prometheus implementation" conveyed the fact that Cortex was trying to be a 100% API compatible implementation of Prometheus, but with scalability, replication, etc.


how about:

CNCF's Cortex v1.0: scalable, fast Prometheus API implementation ready for prod (grafana.com)

saves 1 char.


Yes, you're running the Prometheus server. But Cortex is a Prometheus API compatible service that horizontally scales and has multi-tenancy and other things built in.


Dat architecture tho: https://cortexmetrics.io/docs/architecture/ . Holy bi-gebus.


That's the "microservices" mode - you can run it as a single process and the architecture becomes super boring.

It's like looking at the module interdependencies of a reasonably large piece of software; of course it's going to look complicated.


According to Cortex docs [1], a single-process Cortex isn't production ready. It is intended for development and testing only.

[1] https://cortexmetrics.io/docs/configuration/single-process-c...


Congrats to Grafana Team!

If you're looking at scaling your Prometheus setup - also check out VictoriaMetrics.

Operational simplicity and scalability/robustness are what drive me to it.

I use it to send metrics from multiple Kubernetes clusters with Prometheus: each cluster runs Prometheus with a remote_write directive that sends metrics to a central VictoriaMetrics service.

That way my "edge" Prometheus installations are practically "stateless" and easily set up using prometheus-operator. You don't even need to add persistent storage to them.
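
For example, a sketch of what that looks like with the prometheus-operator Prometheus custom resource (the central VictoriaMetrics URL is a placeholder):

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: edge
    spec:
      replicas: 1
      serviceMonitorSelector: {}   # pick up all ServiceMonitors in scope
      remoteWrite:
        # placeholder URL for the central VictoriaMetrics service
        - url: http://victoriametrics.central.example:8428/api/v1/write
      retention: 2h                # keep only a short local buffer

With a short local retention like that, losing an edge Prometheus pod costs you almost nothing, because the history lives in the central VictoriaMetrics.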


I'm new to Cortex, but looking at a comparison of Prometheus and InfluxDB (like https://prometheus.io/docs/introduction/comparison/#promethe...), it appears that Cortex offers horizontal scalability features similar to the InfluxDB Enterprise offering. The linked comparison does note the difference between event logging and metrics recording, but I am curious (choosy beggar that I am) whether others consider them separate tooling or whether it is possible to remain performant using one solution.


This was a Weaveworks project right?


Yes, it was created at Weaveworks, but it was later donated to the CNCF and now the community is much bigger! Having said that, Weaveworks is still a major contributor!


Really exciting! Well done


Great job Cortex team!


Good job, excited!


Reminder: github star history is in no way a measure of quality.



