Upptime/upptime: Uptime monitor and status page powered by GitHub (github.com/upptime)
168 points by simonpure on June 30, 2022 | 58 comments



Pretty cool! But, before you jump into it... Unless GitHub has changed its terms, this is against the GitHub Actions ToS. If I recall correctly, they are meant to be used for building/testing/maintaining the project included in the repo.


Regardless, don't lean on these big-tech central points of failure. It can be taken away from you on the arbitrary whim of some exec - not a great resilience plan for an availability monitor if you're doing anything remotely serious.


You have to rely on some big tech for the status page, because you can't host it within your own infrastructure.


There are some non-big-tech status pages. I use Oh Dear (from a previous recommendation on HN).


Here's the direct link: https://docs.github.com/en/site-policy/github-terms/github-t...

> Actions should not be used for: [...]

> if using GitHub-hosted runners, any other activity unrelated to the production, testing, deployment, or publication of the software project associated with the repository where GitHub Actions are used.


Also, GitHub Actions is not necessarily free. There is a $0.008/minute fee for every run. You just don’t think about it because you have minutes included in your current plan. This tool runs every 5 minutes, which by my math equates to ~8640 runs/month. GitHub’s free edition has 2,000 minutes/month. You can see how this might actually be great for GitHub’s bottom line.
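
For a rough sense of the money involved (assuming these runs were actually billed, and that each check run burns about one billable minute - both assumptions on my part):

  # Back-of-the-envelope overage for a private repo at a 5-minute interval
  runs_per_month=$(( (60 / 5) * 24 * 30 ))   # 12 runs/hour * 24 * 30 = 8640
  overage=$(( runs_per_month - 2000 ))       # minutes beyond the 2000 free ones
  echo "scale=2; $overage * 0.008" | bc      # ~53 USD/month at $0.008/minute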


Maker here. When you have an open-source project (i.e., a public repository), you get unlimited minutes for free. The free edition's 2,000 minutes per month only apply to private repositories.


I wonder how long GitHub will allow unlimited free minutes, given that more and more non-CI projects like this are popping up.


Thanks Intel!


Isn't this maintaining?


Generally it’ll be used for monitoring uptime of another project. So even if that falls under “maintaining” it wouldn’t be for the project “in the repository”, unless you somehow combine upptime into the repo of the project you’re monitoring. But that could get very messy.


You can use `git checkout --orphan gh-pages` to make a branch with no parent commit. Put the upptime stuff there and maybe it just works in the same project. I haven't tried upptime, so ymmv.
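
A minimal sketch of that setup, assuming you want the status data living on a gh-pages branch of the project repo you're monitoring (untested, so treat it as illustrative):

  # Create an empty, history-less branch inside the existing repo
  git checkout --orphan gh-pages
  git rm -rf .                # clear the files carried over from the current branch
  # ...drop the upptime configuration/status files in here...
  git add -A
  git commit -m "Add status page on orphan branch"
  git push -u origin gh-pages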


That is irrelevant. Project to repository mapping is arbitrary.


Some people have stated that GitHub Actions (or even GitLab CI) might not have an uptime as good as you'd expect for something like this, while others have brought the ToS into question and whether this is okay to do.

On that note, I feel like this is probably a fun and clever hack, though it might also cause problems if it happened at too large a scale, much like when people tried mining crypto in CI a while ago, which was really problematic.

But then that leaves another question: what is everyone using for uptime monitoring?

A cloud service of some sort? Self-hosted software in a different region/VPS provider with some alerting integrations?

My current setup is pretty boring, but I figured I'd share nonetheless:

  - for uptime monitoring itself, I use Uptime Kuma: https://github.com/louislam/uptime-kuma
  - it's a reasonably simple and lightweight piece of software, with nice alerting capabilities
  - I have it hooked up to a self-hosted Mattermost instance for alerting, with the app on my phone
  - should the instance or the server itself fail, I have a separate solution to monitor the infrastructure itself
  - for infrastructure monitoring, I use Zabbix: https://www.zabbix.com/
  - Zabbix will send me alerts to my e-mail (also self-hosted) should any issues become apparent
  - I use Let's Encrypt, Caddy/Nginx/Apache (depends) and also containers where applicable: Swarm/Nomad/K3s (depends)
  - storage uses bind mounts and DNS records use CNAME, so I can easily move stuff around
Some of these aspects might be old fashioned - I know that people really enjoy Prometheus and Grafana, but personally it all seems to work decently. And even if Zabbix for example is a bit old fashioned, it has a pretty sane monitoring setup out of the box (using the Zabbix agent in a passive configuration).
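
For anyone curious, getting Uptime Kuma running is roughly a one-liner - this is a sketch along the lines of its documented Docker quickstart (port and volume may differ between versions, so double-check the project's README):

  # Run Uptime Kuma behind your reverse proxy of choice; data lives in a named volume
  docker run -d --restart=always \
    -p 3001:3001 \
    -v uptime-kuma:/app/data \
    --name uptime-kuma \
    louislam/uptime-kuma:1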


> I know that people really enjoy Prometheus and Grafana, but personally it all seems to work decently

Those are for hardcore metrics - not for status metrics, which are trivial and of which you only have a couple. When you have thousands of metrics, Uptime Kuma and its bunch of friends won't help you.

Detailed app/infra metrics can also run on your own infrastructure, unlike status pages, which should use something independent. In your case, if your local Mattermost fails, you will get zero notifications. It doesn't even have to fail; it's enough that your company's Internet stops working (as happened to me today: 20+ services were not available, including the local Statping [1] instance).

> even if Zabbix for example is a bit old fashioned,

It's not a problem that it's old fashioned, but that it's harder than it needs to be - compared, for example, to Telegraf/InfluxDB/Grafana, its setup on both sides is far from trivial. With InfluxDB, I can send metrics even from PowerShell scripts with zero effort [2].

[1]: https://github.com/statping/statping

[2]: https://github.com/majkinetor/psinflux
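
To illustrate the "zero effort" point about InfluxDB: with the 1.x HTTP API, pushing a metric is a single POST of line-protocol data - roughly like the sketch below (hostname, database and measurement names are made up):

  # Write one measurement via the InfluxDB 1.x /write endpoint (line protocol)
  curl -XPOST "http://influxdb.example.local:8086/write?db=monitoring" \
    --data-binary "uptime_check,host=web01 response_ms=42,status=1"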


> When you have thousands of metrics, Uptime Kuma and its bunch of friends won't help you.

This is fair! Actually Uptime Kuma still doesn't support a multi-user mode (e.g. one admin user, multiple users that can edit values/setup, maybe some for viewing data).

That said, I'm also at the scale where it makes perfect sense to use something this simplistic, and there are few things that give me more joy than running/building a container image and getting working software in less than an hour, which at my scale is also good for most if not all "day 2" concerns.

> Detailed app/infra metrics can also run on your own infrastructure, unlike status pages, which should use something independent. In your case, if your local Mattermost fails, you will get zero notifications.

Another fair point! That said, there's very little preventing you from choosing the most boring and stable multi-cloud setup that you can find. A Docker container for the software, with a reverse proxy and connected to the aforementioned infrastructure monitoring.

Has the Docker service failed? I'll get a notification. Docker bridge network down? I'll get a notification. Containers fail health checks? Might still need to work on this, but totally doable as well with minimal work.
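
As an aside, wiring up those container health checks is mostly a matter of declaring one and then alerting on its state - a hypothetical sketch (image name and endpoint are placeholders):

  # Attach a health check at run time; 'docker inspect' then exposes its status
  docker run -d --name my-app \
    --health-cmd="curl -fsS http://localhost:8080/health || exit 1" \
    --health-interval=30s --health-retries=3 \
    my-registry.example/my-app:latest

  # Zabbix (or a plain cron script) can poll this value and alert on "unhealthy"
  docker inspect --format '{{.State.Health.Status}}' my-app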

Of course, there's also a lot of variability in how you can lay everything out - for example, I run some of my personal infrastructure from nodes that are in another room at my place, and most other parts off of rented VMs from a semi-local company. My homepage, for example, has both Uptime Kuma as well as an external monitoring service connected to it, just to compare how believable those values are.

At work, though? For development/test environments, Uptime Kuma on a separate server is enough (say, if you have one that controls the container cluster or aggregates other metrics, you might as well spin up a simple container there), or any other software package that's necessary, like Apache SkyWalking, etc.

For production? Frankly, depending on what you're running, you might as well get a team of people together and come up with something that has proper redundancies in place, as well as a multi-cloud strategy.


> Has the Docker service failed? I'll get a notification. Docker bridge network down? I'll get a notification.

If you rely on cloud services, yes. If you run your own infra, then no, you will have to collect metrics and alert on that in a custom manner, as with everything else. So that thing you mention is NOT a boring technology (which should be promoted) but outsourcing (which should NOT get promoted in general).

> For development/test environments, Uptime Kuma on a separate server is enough

It doesn't matter, as your network will fail. There is nothing worse than a status page giving false positives.


> If you rely on cloud services, yes. If you run your own infra, then no, you will have to collect metrics and alert on that in a custom manner, as with everything else.

Consider this example:

  I have Zabbix on server A.
  I have an e-mail server on server B.
  I have Uptime Kuma on server C.
  I have an instance of Mattermost on server D.
  I have the application that I want to monitor on server E.
In a zero-trust model (or even just running WireGuard) there is very little preventing you from having each of these on a different cloud provider. There's also very little preventing you from having a setup like A-C on a few boxes that sit under your desk/colocated somewhere, but having D in the cloud.

Thus, one can reason about the potential failure states:

  If servers C-E run into issues (say, Docker issues), I'll get a notification thanks to A and B (Zabbix sending an e-mail).
  If servers C-E are utterly unreachable (say, network interface problems), I'll get a notification thanks to A and B (Zabbix sending an e-mail).
  If servers A-B or E run into issues, I'll get a notification thanks to C and D (Uptime Kuma sending a message).
  In the current configuration, I wouldn't be protected against a compound failure of A-D (both Zabbix and Uptime Kuma down), but those might as well run on different clouds, with different orchestrators.
Of course, you can set up failover and redundancy options, but by that point you're probably also looking into distributed file systems like GlusterFS or Ceph for any backing storage, and right now I don't need that complexity.

Furthermore, as you said, you can also rely on cloud services in addition to what you already have, so should A-D go down, then E will still be monitored by another solution as an alternative, though that's also hardly necessary for most things.

Hell, for all I care, I might as well have a Raspberry Pi on my desk that pings the servers, checks SSH connections, checks the running Docker images, does a curl call, and blinks and beeps aggressively when something isn't okay on servers that sit in a data center somewhere. It's not like there isn't an endless number of options. Of course, you can also go in the opposite direction and pick whatever is good enough, such as having A-B as a single server (or VM) and C-D as a single server (or VM), to avoid overcomplicating things.
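
That Raspberry Pi watchdog really can be a handful of lines - a hypothetical sketch of the ping/SSH/curl checks mentioned above (hostnames and URLs invented):

  #!/bin/sh
  # Tiny external watchdog: ping, SSH port and HTTP checks; print (and beep) on failure
  alarm() { printf '\a%s\n' "$1"; }            # terminal bell plus a message
  for host in app01.example.net app02.example.net; do
    ping -c 1 -W 2 "$host" > /dev/null || alarm "ping failed: $host"
    nc -z -w 2 "$host" 22              || alarm "ssh port closed: $host"
  done
  curl -fsS --max-time 5 https://status.example.net > /dev/null || alarm "homepage check failed"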


I know you can have all that :) All I'm saying is that you must rely on externals if A-E are all on the same network, as it may go down. Then your e-mail or other notification channels won't work.

Be that as it may, I think people generally tend to go overboard with redundancy. One can usually tolerate most regular services going down for an hour or two once every couple of years...


> All I'm saying is that you must rely on externals if A-E are all on the same network, as it may go down.

Thankfully, it's not too hard to take advantage of multiple networks in a hybrid/multi-cloud setup nowadays! Though, depending on the necessary access controls and auditing, such a setup might require slightly more work.

You do bring up an excellent point, though, about how the network is a serious single point of failure in many systems out there - personally, I've also seen many setups like that (the majority of them, actually). I suspect that in many cases this is done for ease of use/convenience, even if it may lead to downtime.

Of course, in some cases downtime is acceptable, so I cannot argue that it can also make sense to choose such a simpler setup - for example, for having your own company's applications/monitoring for development environments all on the same network.

Though if this topology is retained at scale, things can get a bit interesting. On a similar note, I recall Bryan Cantrill doing an interesting presentation "Debugging Under Fire: Keep your Head when Systems have Lost their Mind" that talked about restarting their whole data center and the implications of that: https://youtu.be/30jNsCVLpAE


I've been using Site24x7 for at least 6 years now, probably closer to 10. They have a "free forever" plan that's good enough for small projects: https://www.site24x7.com

It's been rock solid, haven't had a single issue.

I'm at the point where I now want a bit more, such as a status page, more frequent polling, monitoring from different continents and a record of historical outages. So I'm either going to pony up for a paid Site24x7 plan (£7/m), or self-host something myself - Kuma looks awesome BTW, I hadn't come across it before!


OK, I'll bite: Why is Zabbix old-fashioned? What would you use instead? (I have ~60 bare metal servers of various vintages and OSes to monitor.)


It's a sentiment that I've heard a lot - Zabbix focuses a lot on a core set of functionality, which is mostly just monitoring the state of some number of servers/VMs, with some optional integrations.

While the metrics that come out of the box are nice, everything does feel a bit cumbersome, personally - such as creating network maps (which aren't automatically generated), dashboards for getting input on CPU/RAM/storage at a glance, as well as viewing web monitoring statistics.

Actually, I previously used it for web monitoring, though for some reason, by default, it does not allow you to create triggers for those checks that would send an e-mail once a site goes down, even if the performance information and configuration were passable.

But overall it's just view after view, nested inside tables with weird UI/UX choices, such as having both Update and Add buttons, both of which are necessary to persist changes (for example, in the web monitoring section, when adding new monitoring steps and also wanting to save those changes).

Like another commenter suggested, there are systems out there that make ingesting and visualizing data easier and sometimes also more pleasant... Though personally, Zabbix is fine for what I need.


What made you go with Mattermost for relaying messages to your phone?


Mostly the fact that I already had an instance up and running for chatting with some people and because it was pretty trivial to integrate Uptime Kuma with it.

Sending e-mails would have also been a decent idea, but honestly configuring a Mattermost app/account/token is easier than messing around with a new mail account and all of the SMTP settings.

As for Mattermost itself, previously I also enjoyed running Rocket.Chat, though somehow Mattermost felt a bit more like Slack (what I'm used to), runs with PostgreSQL as a backing store and also has the Boards and Playbooks integrations, alongside lots of other goodies.

I do miss the ability to integrate Jitsi or something else in the actual interface like Rocket.Chat did, but it's not really a large problem overall.


Given the recurring recent HN posts about GitHub outages, I thought this was going to be a parody.


You might want to do the opposite and monitor GitHub from your infrastructure instead.


Thought the same. "Well, who's gonna monitor the monitors of the monitors?"


As someone who runs an uptime monitoring service business: you run several replicas of your stack (necessary to stay up during huge outages anyway) and monitor yourself.

Monitoring from another vendor helps too.


I think that's probably why it's been posted here... some schadenfreude.


The lengths people will go to get free stuff. Technically cool, but I think it's a misuse of GitHub Actions.


It's also against their ToS.


Cool. Maybe Cloudflare Workers would be better than using GitHub Actions?

Regardless of the hate on GitHub availability and the potential misuse of actions, it’s a very nice landing page and unique idea.

I also ran a cron-based pipeline on GitLab once to execute some shell scripts, and it worked well! Cheers


Maker here. AMA!

Previous discussion: https://news.ycombinator.com/item?id=25553445 (301 points, 82 comments)


I tried it probably a year ago, for a couple of weeks. It was very unreliable and slow for me. Changes in uptime status were not picked up very quickly and sometimes not at all. Should I give it another try?


In that case, I would say no. Not much has changed in the past few weeks, and GitHub Actions has become increasingly less reliable (i.e., wait times are higher than when we launched Upptime a year ago).

We're exploring a new CLI approach [1] which has the benefit of still supporting GitHub Actions scheduled workflows and all the bells and whistles like Slack notifications and opening issues, but it can also be fully self-hosted with just a CLI command and will always run in the background. Plus, it'll support GitLab (and really any git repository) and more status website features. When that's ready, perhaps that would be a better fit.

[1] https://github.com/upptime/cli


Do you have any plans to port this to GitLab, in case GitHub bans this kind of use?


Maker here. From another comment:

> We're exploring a new CLI approach [1] which has the benefit of still supporting GitHub Actions scheduled workflows and all the bells and whistles like Slack notifications and opening issues, but it can also be fully self-hosted with just a CLI command and will always run in the background. Plus, it'll support GitLab (and really any git repository) and more status website features. When that's ready, perhaps that would be a better fit.

[1] https://github.com/upptime/cli


GitHub Actions has really poor uptime itself. I have an action that is triggered every minute to check an external API and update data & rebuild the site as necessary; the number of times the action fails every month because the initial git checkout or some other internal thing fails is high. Occasionally things get spotty for half an hour or so at a time, even though the status page is all green.


I love it. It's simple, natural, and intuitive.

It would be great if you offered the same workflow on GitLab. Its scheduled pipelines default to 10 minutes, but on-premise installations allow them to be set to run every minute. [1]

[1]: https://docs.gitlab.com/ee/ci/pipelines/schedules.html


I love, but also kind of fear, the amount of open infrastructure being built to run specifically on GitHub (and therefore Microsoft's) servers.


I don’t love it. Overall, the number of things now that just boil down to "use GitHub/AWS/Google" has really decreased my enjoyment of building projects. Additionally, I haven't seen it provide some huge boost in productivity like everyone likes to assume is happening.


So build them on a VPS?


Part of business continuity is factoring in and planning for risks.

We plan for the possibility that an earthquake might happen, or the building burns down.

By the same merit, if you're locked into a cloud provider or service, it's your responsibility to ensure that there's a plan in place to migrate away if necessary.


Sure, it's already been done, but I'm sure a similar thing could be made on Azure Functions, well within the monthly free tier. You can also host static web apps for free, or even have GUIs served by the functions themselves. And as people stated before, this seems a gross misuse of GitHub Actions, whereas at least the above outlines a solution Microsoft actively encourages.


I feel like people are overstating this as gross misuse, to be honest. Curling some HTTP-based endpoints on a cron isn't really a gross misuse of Actions.

There are a million ways to do this. I could also tell you that using Azure Functions and having a dependency on Microsoft is overly complex for something that could be done with collectd and the cron built into the OS on most VPSs.


The ToS explicitly mentions this as being forbidden.


I built my own version entirely within AWS Lambda's free tier; it works well until you're into "continuously running serverless" territory - at that point any random VM would be significantly cheaper.


The OCI free tier would be good for this sort of thing... it's pretty nuts. You can have 4 Ampere OCPUs + 24 GB RAM + 200 GB of block storage, spread across 1-4 instances.

https://www.oracle.com/cloud/free/


Yeah, that's pretty wild. I managed to fit my entire serverless stack (roughly 1.2M checks per week) into 2x 256 MB VMs and 1x 512 MB VM; you could run a monster server on that.


AFAIK "5 minute crons" on GHA are "best effort" and may run anywhere between 5 and 20 minutes, and are also subject to some kind of "couldn't find a worker in time" skip/cancellation rules.


Maker here. Yes, this became a common-enough problem for me to write a small post: https://upptime.js.org/blog/2021/01/22/github-actions-schedu.... Some people tried with self-hosted runners on Actions [1], but that turned out not to be much better.

[1] https://github.com/upptime/upptime/issues/42


I agree there have been a few GitHub Actions issues. We've been getting Upptime errors due to GitHub Actions failing.

However, I still think Upptime is amazing! It is simple, under your control, and does the job.

Here is a review I wrote a while back on Upptime.

https://www.ayrshare.com/upptime-monitor-status-website-api/


Is this relying on scheduled GitHub Actions? We are using those and they are not very accurate. We regularly see jobs completely skipped or run with a lot of delay.


It does rely on schedule. https://github.com/upptime/upptime/blob/c203e2fd45da62963d16...

And yes, schedule is terribly unreliable: 1h+ delays are routine, 3h+ delays have been observed, and high-frequency jobs are skipped all the time. You have to use an external trigger (that is to say, triggering the workflow_dispatch event through the API) for reliable scheduling.
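
For reference, the external trigger boils down to one authenticated POST against the GitHub REST API; a sketch (OWNER/REPO, the workflow file name, the branch, and the token are placeholders, and the workflow must declare a workflow_dispatch trigger):

  # Fire the workflow_dispatch event from your own scheduler instead of GHA cron
  curl -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    https://api.github.com/repos/OWNER/REPO/actions/workflows/uptime.yml/dispatches \
    -d '{"ref":"master"}'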


This is very easy to set up and use. GitHub Actions works very well.

However, GitHub seems to consider the cron schedule for actions to be a suggestion rather than the timer you would expect it to be. As a result, it is not uncommon to find that more than 20 minutes have elapsed between subsequent status checks.

I solved it by triggering GitHub Actions using an external cron job on a server.
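
In case it helps anyone, that external trigger can literally be a crontab entry on any box you trust, calling the same dispatch endpoint mentioned elsewhere in the thread (the script path here is hypothetical):

  # Fire the dispatch call every 5 minutes from a plain Linux cron
  */5 * * * * /usr/local/bin/trigger-upptime.sh >> /var/log/trigger-upptime.log 2>&1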


Given the reliability of GitHub Actions, the status page can suffer quite significant downtime as well...


A pretty unique way to use GitHub Actions, for sure.



