Munin is my favorite monitoring tool. One reason (among many mentioned here already) is that its information density is higher than anything else's. When your site goes down and you need to know why, the page full of tight graphs is incredibly helpful. You can scan everything without any clicks at all and see if the problem is disk, RAM, database locks, etc. This compactness is the main criterion Tufte uses throughout The Visual Display of Quantitative Information. I don't think Munin is doing anything clever or original (nothing like the map of Napoleon's march), but it is far more useful than most of the custom d3 "dataviz" stuff you see. And it's way faster to navigate than something like Datadog. And since it's just serving static HTML, the page loads instantly.
I think the homepage was overdue for a redesign, but I hope they don't redesign the tool itself! :-)
If they are trying to improve their communication, one thing I would have appreciated when I got started years ago is a clearer tutorial-like explanation about where the client & server live, how both execute (daemon vs cron), and who opens a connection to whom. Just the basic big picture architecture. It seemed difficult to find that information. EDIT: This right here: "Munin has a master/node architecture in which the master connects to all the nodes at regular intervals and asks them for data." Great!
ANOTHER EDIT: I see they are emphasizing how easy it is to write plugins. I can attest to that. I've written plugins for Postgres and Phusion Passenger, and both were really simple. Also there are tons of "contrib" plugins that probably already have most of what you want (even beyond the extensive default system-level metrics).
AND ANOTHER: The graphs at http://demo.munin-monitoring.org/munin-monitoring.org/www.mu... are more painful to read than my existing deployments'. They are 497x261, and my own graphs are 499x277. That doesn't seem like enough of a difference to matter. The font is smaller, though, and having to squint makes the display a lot more effort to read. It's a small thing, but I could feel the difference immediately.
Totally agreed. The "static HTML" part is IMO munin's single most important feature. You can point any webserver at the output directory (or just download the output directory) and immediately browse the contents. There's no need for CGI or any daemon to be running in the backend, which is extremely helpful when you might be trying to diagnose why said daemon isn't running.
I also agree that documentation about the basic architecture and execution method needs improvement. At some point in the CentOS 7 lifecycle, a minor update to EPEL's munin package deleted its own crontab file, effectively disabling the master, without providing an obvious alternative such as a systemd timer. I couldn't find any explanation about why it did that.
Great, this pulls Munin solidly into the early '10s.
Jokes aside, I feel the lack of vertical whitespace makes the site feel a bit markety, like it's trying to force a lot of marketing information into a small amount of vertical space.
Thanks for your work on munin. I love it! And I appreciate your responsiveness here. Personally I don't care if my server wastes time generating HTML I never see. It's not enough work to make any difference. But I care a lot about seeing the HTML instantly when I want it, not waiting for anything to render.
I have a munin server for a fleet of a bit more than a dozen servers, and I'd be very sad if the ability to create a static site were to go.
That, and the ability to create "overall" charts (same metric for all servers in a group) as well as per-server per-metric charts is what keeps me solidly using munin.
Yeah, why would you use a tested system that just "works" when you could add some npm dependencies, node.js, and a 3-month-old database that's already out of support unless you're upgrading every week?
Zabbix dates back to 2001 and is written in C, PHP and Java. [and in my experience of it, has the typical monitoring system problem where the design seems to be "the more panicky alerts it spews, the more value it's creating"].
Because it just works. Grafana requires a lot of work to set up, especially if you want to do it in a secure way and provision it programmatically. Munin on the other hand is just an apt-get install plus some minor configuration. Yes, it is not as powerful, but on the other hand it is much easier to set up correctly.
It just works, it is easy to deploy, and it is easy to write plugins for.
A combo like prometheus+grafana and others is very good too, and definitely more modern. But, having used both, I often didn't need the extra features that the more modern solutions offer, and I was more than happy to use munin, which was definitely easier to run and maintain (for me).
I use(d) both. While I appreciated Zabbix's configurability, it just takes too many clicks to add a basic probe with an alert or a new system. Meanwhile Munin has everything in flat config files (easy to deploy with ansible or equivalent), and has reasonable alerts configured by default.
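As an example of how flat that configuration really is, registering a new node on the master is a couple of lines in munin.conf (the hostname and address below are made up for illustration):

    # /etc/munin/munin.conf on the master (illustrative host/address)
    [web1.example.com]
        address 192.0.2.10
        use_node_name yes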
And creating a munin plugin is as simple as writing a script that writes values to stdout (in whichever language you like), and copying it to /etc/munin/plugins/.
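To make that concrete, here is a minimal sketch of such a plugin in Python (the field name and graph title are made up for illustration). munin-node calls the script with "config" once to learn how to draw the graph, and with no arguments to fetch values:

    #!/usr/bin/env python3
    # Minimal munin plugin sketch (illustrative names)
    import os
    import sys

    if len(sys.argv) > 1 and sys.argv[1] == "config":
        # Describe the graph and its fields
        print("graph_title Load average (demo)")
        print("graph_vlabel load")
        print("load.label load")
    else:
        # Normal fetch: one "field.value N" line per field
        print("load.value %.2f" % os.getloadavg()[0])

Make it executable, restart munin-node, and it shows up on the next collection run.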
The advantage of zabbix is the templating and service discovery. You shouldn't have to add a new system to anything, it should just be found and monitored. The same goes for prometheus.
Monitoring new things with zabbix is also "write a value to stdout". Using telegraf and prometheus you have the same flexibility. I think these days you can even scrape prometheus exposition with zabbix. There are plenty of footguns with prometheus and zabbix, but they take big scale to run into.
But at the end of the day use what you're comfortable with.
It's lightweight, it's flexible, it's hackable and it's very orchestrate-able. I've had dashboards showing summary stats for IoT devices, the VMs they're talking to and the hypervisors the VMs are running on with very little effort.
Slightly off-topic, but from time to time my UX/UI agency offers free help/advice to spice up dev tools. Especially open source. So if you are working on a ‘boring’ but important tool that looks crappy (because you don’t have the budget to hire someone), send me a message. I’d be happy to offer some simple & actionable advice on how you can improve the design without hiring anyone. Details in bio.
I like Munin for small environments. Dead easy to set up and a very concise UI. I've written plugins... I love it.
Where I get frustrated is scalability of RRDTool. Every data set has its own file so you get into this IO nightmare. I solved this for a while by storing them in a RAM disk.
Ultimately I set up InfluxDB+Grafana. It is a lot more work, and last I tried, InfluxDB did not support downsampling of older data, so instead of two years of history we have 90 days. And the collector renamed most of the datapoints from one version to the next, so the Grafana dashboards broke...
(Oh, and of course Munin looks like the '90s, which doesn't bother me, but sometimes gives decision makers cause to believe my solution is not hip.)
As I have slowly started to shift more workloads into my homelab, I have started to ponder the right monitoring solution. I would like to be able to see hardware and OS metrics for my entire fleet in one place, regardless of host, VM vs. bare metal, etc.
Munin, Monit, Nagios, Zabbix and a few other tools are on my list to try out.
My biggest concern is around the data management. Ideally the underlying datastore can be moved, expanded, etc... as I sort out how much history I want to keep.
What do y'all use for this? I have 2 VM hosts (ESXi), 1 bare-metal box, and a handful of UBNT networking equipment.
It execs scripts for each metric and stores the data using rrdtool. If you are only collecting a few metrics and use a 5 minute collection interval it's probably fine.
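For a sense of what that storage model implies: each field ends up in its own fixed-size round-robin file, created and updated with commands roughly like these (the names and retention below are illustrative, not munin's actual defaults):

    # one RRD per field; the file never grows past its preallocated size
    rrdtool create load.rrd --step 300 \
        DS:load:GAUGE:600:0:U \
        RRA:AVERAGE:0.5:1:576    # ~2 days of 5-minute averages
    rrdtool update load.rrd N:0.42    # append the current sample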
A while ago I switched from munin to prometheus+node_exporter and with that went from collecting metrics every 5 minutes to more metrics every 15 seconds.
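For reference, that collection interval is a couple of lines in prometheus.yml (the targets here are placeholders):

    scrape_configs:
      - job_name: node
        scrape_interval: 15s
        static_configs:
          - targets: ['host1:9100', 'host2:9100']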
Munin is easy to operate initially. You can go from a script that outputs a number to a graph on your screen in a few minutes. It does get more complicated when you want to do something like graph the aggregate of a metric across a subset of machines. It is doable in the configuration, but a hell of a lot harder than just adding a widget in grafana and writing a simple promql query.
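To illustrate the gap, here's roughly what the munin side looks like, sketched from memory against the documented aggregation syntax (group, hostnames, and field names are made up), next to the PromQL equivalent:

    # munin.conf: a virtual node that sums one field across hosts
    [MyGroup;Totals]
        update no
        load.graph_title Total load
        load.total.label load
        load.total.sum \
            web1.example.com:load.load \
            web2.example.com:load.load

    # vs. the whole thing in promql:
    sum(node_load1)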
I have a munin-master running on an Atom C2338 (very low-end), which monitors about 20 hosts. The whole system uses about 300MB RAM and 40% of a CPU core. It's a little intensive because it recomputes half the images every 5 min, when the cron runs (that's configurable).
In terms of disk, munin uses 750MB, and it's not growing thanks to rrdtool; and it writes 1MB/s on average. Bandwidth usage is 5Mb/s on average (to get data from all munin nodes, also every 5 min in my config).
Munin nodes themselves are extremely lightweight. Essentially, the master calls the munin-node daemon via a TCP socket, which starts a plugin process and pipes its stdout to the socket. Plugins usually simply call a command like "df" or "netstat", parse its output, and write it in munin's format.
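The protocol is simple enough that you can drive it by hand; here's a sketch in Python of what the master's fetch amounts to (the hostname is a placeholder; 4949 is munin-node's default port):

    # Talk to a munin node the way the master does (placeholder host)
    import socket

    with socket.create_connection(("node.example.com", 4949)) as sock:
        f = sock.makefile("rw")
        print(f.readline().strip())         # banner: "# munin node at <host>"
        f.write("list\n"); f.flush()        # ask for available plugins
        print(f.readline().strip())         # e.g. "cpu df load memory ..."
        f.write("fetch load\n"); f.flush()  # run one plugin
        for line in f:
            if line.strip() == ".":         # values end with a lone dot
                break
            print(line.strip())             # e.g. "load.value 0.42"
        f.write("quit\n"); f.flush()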
Munin is fairly light, but metrics are collected through running various scripts, which is generally more resource-intensive than the usual node-exporter(-like) daemons, so the frequency at which data is updated is usually much lower.
I reckon you could take the page weight of this site down from 426KB to about 100KB with very little effort.
- There's 20KB of Bootstrap.css that could be replaced with native CSS that would be loads smaller.
- There's 96KB of jQuery that I can't see being used at all. Even if it is, it doesn't need to be the full fat version.
- There's a 138KB PNG image that can be shrunk to 91KB with OxiPNG (or to 77KB as a WebP, or 52KB as an AVIF) using a free tool like https://squoosh.app
- There's 80KB for the Material Design icons font that could be replaced with an optimized version that only includes the glyphs used on the page (using a free tool like https://www.fontsquirrel.com/tools/webfont-generator).
Your other points are great low-hanging fruit performance-wise, but replacing bootstrap in a site that has been fully designed around it goes well beyond "very little effort."
I've never tried setting something like that up but Bootstrap itself is extremely modular and you can be selective about what you want bundled with it, so that's probably a more effective starting place: https://getbootstrap.com/docs/3.4/customize/
For me, when someone spends their time checking whether there are any optimisations for something submitted to HN and gets a, let's face it, snarky "Well, raise a PR then" in response, that's quite a shame.
For me, I see an open source project that probably has limited resources/time/volunteer effort to support it.
I read that comment as "rip huge chunks of Bootstrap and jQuery out of it - should be quick"... well, I'm guessing, given Bootstrap design patterns, that they used these tools as a core library around which they built their site. They may be planning to leverage these tools in the future, or even just want to give themselves the option to... It's an over-simplification. Likely, this site got them to where they wanted to be: a much more modern-looking site without a ton of effort sunk into development.
It's also completely disingenuous to talk about these filesizes in pre-gzip numbers vs. post. When you look at these "optimizations" without taking gzip into account, you are looking at a "worst case scenario" that doesn't exist - this site is gzipping its assets over the wire. The jQuery gripe is more like 33.6Kb vs 96Kb when you look at it through the gzip lens.
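(If you want to check the over-the-wire numbers yourself, it's a few lines of Python; the URL below is just a placeholder:)

    # Compare raw vs. gzipped size of any asset (placeholder URL)
    import gzip
    import urllib.request

    data = urllib.request.urlopen("https://example.com/jquery.min.js").read()
    print(len(data), len(gzip.compress(data)))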
---
This sort of response (and subsequent defense) is why I would never show my creations here. It's emotionally difficult for me to see such a stable and mature project as Munin accomplish a bit of modernization only to be nitpicked like this. Munin is a great tool.
"Raise a PR" is perfectly valid here when it comes down to it. I think the internet will handle the extra ~100Kb/page load just fine.
Raise an issue first, gauge the reaction, and then create a PR if you think they would welcome it.
Whether it's a website or some other open source project, maintainers usually don't like it when you create a massive refactoring PR without talking with them first.
If, luckily, the maintainers are on HN, I guess talking about the "problem" here could very well substitute for raising an issue on whatever platform they use.
The question is whether the feedback is in line with the announcement.
Basically the same as pointing out a spelling error on a single slide in a CS class. You are technically correct in the feedback, but it is misplaced and not worth spending class time on.
Personally I'd say the issues are not about the design, but about infrastructure and potentially UX (depending on who the target group is, what their usage patterns are, etc.).
It's not. For starters you need to get the repo, make a branch, make the changes, add tests; you may need an account to raise the PR on whatever platform, and then there's possibly a CLA to sign. Whereas whoever raised the submission in the first place probably has some interaction with the project, so mentioning this on HN could then be taken by the submitter and raised as a feature request so someone could make a PR.
Your comment prompted me to check to see if I could do that. The website build process doesn't appear to be documented and it's been built in a template system I'm not familiar with. So I can't.
That is actually something I'd expect the GP to know/check before suggesting "submit a PR/make an issue/submit a patch". I mean, if the website repo isn't available, the only "better" way to give feedback would be to submit a link to/the text of the HN comment to the project mailing list/chat channel, etc.
I certainly think "open an issue/please make a pr" is a fine suggestion regarding FOSS projects in general.
FWIW I did mean my comment more as a suggestion of where people could look for optimizations (in this site or their own sites) rather than a criticism of what you've made. I apologize for not communicating that better.
It's like telling da Vinci, at the moment of the Mona Lisa's reveal, that he should have used more paint so it can last 100k years instead of only 98k. We are not dealing with a 10MB website with seconds-long loading times, screaming for smaller images.
As far as HN comments go, this is way better than the HN traditional "Why should I use this?" or the always popular "I'm tired of new libraries coming out, your project is literally cancer".
Someone put time into giving actual technical feedback for improvement. There are lots of developers here. This might even get people interested in contributing to the project.
It seems a fair discussion point, even if you don’t agree that the payoff is worth the effort.
I don’t read it as a negative thing, and for me <500kB seems reasonable. Page size is an aspect of design – even if many websites are suffering far greater needless bloat than this one.
Performance is a feature, or at least it should be, of any good design.
Of course it might be a very low-priority feature, so it would be valid to close any issue reports as "WONTFIX" if the devs have no time or feel their time is better spent on other matters, especially if people on slow connections are not expected to be a significant part of the target audience. But commenting on potential performance issues is perfectly valid when discussing a site (re)design, IMO.
This looks like the "before" part of a "before and after" redesign comparison. Sorry, but it looks a decade or more older visually.
(Only mentioning this because the title has "website redesigned" in it.)