CloudWatch Logs has no ingestion rate limits, which makes using it a real financial risk. Misbehaving servers can quickly generate huge AWS charges, up to $74,000 per day per account. Even 12 servers could produce $4,320 in charges over just a weekend.
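To put rough numbers on this, here is a back-of-the-envelope sketch. The flat $0.50 per GB ingestion price and the function and parameter names are assumptions for illustration only, not quoted AWS pricing and not the inputs behind the figures above; check current pricing for your region.

```python
# Back-of-the-envelope estimate of log ingestion charges.
# ASSUMPTION: a flat per-GB ingestion price. 0.50 USD/GB is a placeholder,
# not a quoted AWS price. Storage and query charges are ignored.

def ingestion_cost_usd(servers, mb_per_sec_per_server, hours, usd_per_gb=0.50):
    """Cost of `servers` machines each logging at a sustained rate."""
    seconds = hours * 3600
    # Decimal units: 1 GB = 1000 MB.
    total_gb = servers * mb_per_sec_per_server * seconds / 1000
    return total_gb * usd_per_gb


def rate_needed_mb_per_sec(target_usd, servers, hours, usd_per_gb=0.50):
    """Sustained per-server rate (MB/s) that would produce `target_usd`."""
    total_gb = target_usd / usd_per_gb
    return total_gb * 1000 / (servers * hours * 3600)
```

Under those assumptions, `rate_needed_mb_per_sec(4320, servers=12, hours=48)` comes out to roughly 4.2 MB/s per server, a rate that a process stuck in a logging loop could plausibly sustain.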
Also, the CloudWatch Logs console is unusable for even simple tasks.
Datadog and its competitors have ingestion rate limits, but they are overall quite expensive. Self-hosted log analysis tools are exceedingly complex to set up and maintain: ElasticSearch + Kibana, Grafana + Loki.
just curious (and i'll admit i'm biased, am co-founder @ grafana labs), what do you find exceedingly complex about setting up and maintaining grafana + loki? both are single binary, and can be run without any dependencies.
I'm using Terraform to maintain identical staging and prod deployments. Grafana is difficult to deploy statelessly, so this adds yet another Terraform deployment stage:
1. Deploy host running dockerd
2. Deploy Grafana server
3. Configure Grafana server admin password, organizations, users, and passwords.
If Grafana just supported file-based configuration, a whole stage could be eliminated.
"Loki does not come with any included authentication layer. Operators are expected to run an authenticating reverse proxy in front of your services, such as NGINX using basic auth or an OAuth2 proxy."
Unfortunately, this means that any process that can write logs can read all logs. This violates the principle of least privilege, a core part of system security. Prometheus suffers from this, too. To put this into concrete terms: When someone uploads a malicious image through our app which exploits our image resizing server, they will obtain log-writing credentials. Those credentials should not allow them to read all the logs and steal the user data in them. That would be catastrophic for the users and the company.
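To make the least-privilege point concrete, here is a minimal sketch of what scoped log credentials could look like. Every name in it (`Credential`, `LogStore`, `resizer`, `dashboard`) is hypothetical; it is a toy model of the idea, not how Loki, Prometheus, or any real log store handles access control.

```python
# Toy model of scoped log credentials. All names are hypothetical and this
# does not reflect the API of any real log store.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credential:
    name: str
    scopes: frozenset  # subset of {"write", "read"}


@dataclass
class LogStore:
    _lines: list = field(default_factory=list)

    def write(self, cred, line):
        # Appending requires the "write" scope.
        if "write" not in cred.scopes:
            raise PermissionError(f"{cred.name} may not write logs")
        self._lines.append(line)

    def read(self, cred):
        # Reading requires the separate "read" scope.
        if "read" not in cred.scopes:
            raise PermissionError(f"{cred.name} may not read logs")
        return list(self._lines)


# The image resizer only ever needs to append logs.
resizer = Credential("image-resizer", frozenset({"write"}))
# Only the operators' dashboard needs to read them.
dashboard = Credential("ops-dashboard", frozenset({"read"}))
```

With this split, a stolen `resizer` credential can still append junk to the logs, but `LogStore.read` raises `PermissionError` for it instead of returning user data.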
ElasticSearch supposedly has ACL support, but it is mostly undocumented and full of foot-guns. For example, their security doc omits a necessary flag which enables password enforcement. After following the guide, I discovered that the passwords I had set up were not being checked. I immediately lost all confidence in ElasticSearch as a tool to safely store user data. I deleted it.
I used InfluxDB since it lets me create write-only user accounts. Unfortunately, Grafana's integration with InfluxDB is problematic.
The lack of useful debug logging in Grafana makes troubleshooting especially difficult.