Hacker News
Munin Website Redesigned (munin-monitoring.org)
119 points by ashitlerferad on April 20, 2021 | 70 comments



Munin is my favorite monitoring tool. One reason (among many mentioned here already) is that its information density is higher than anything else's. When your site goes down and you need to know why, the page full of tight graphs is incredibly helpful. You can scan everything without any clicks at all and see whether the problem is disk, RAM, database locks, etc. This compactness is the main criterion Tufte uses throughout The Visual Display of Quantitative Information. I don't think Munin is doing anything clever or original (nothing like the map of Napoleon's march), but it is far more useful than most of the custom d3 "dataviz" stuff you see. And it's way faster to navigate than something like Datadog. And since it's just serving static HTML, the page loads instantly.

I think the homepage was overdue for a redesign, but I hope they don't redesign the tool itself! :-)

If they are trying to improve their communication, one thing I would have appreciated when I got started years ago is a clearer tutorial-like explanation about where the client & server live, how both execute (daemon vs cron), and who opens a connection to whom. Just the basic big picture architecture. It seemed difficult to find that information. EDIT: This right here: "Munin has a master/node architecture in which the master connects to all the nodes at regular intervals and asks them for data." Great!

ANOTHER EDIT: I see they are emphasizing how easy it is to write plugins. I can attest to that. I've written plugins for Postgres and Phusion Passenger, and both were really simple. Also there are tons of "contrib" plugins that probably already have most of what you want (even beyond the extensive default system-level metrics).

AND ANOTHER: The graphs at http://demo.munin-monitoring.org/munin-monitoring.org/www.mu... are more painful to read than my existing deployments'. They are 497x261, and my own graphs are 499x277. That doesn't seem like enough difference to matter. The font is smaller, though. Having to squint at the graphs makes the display take a lot more effort to read. It's a small thing, but I could feel the difference immediately.


Totally agreed. The "static HTML" part is IMO munin's single most important feature. You can point any webserver at the output directory (or just download the output directory) and immediately browse the contents. There's no need for CGI or any daemon to be running in the backend, which is extremely helpful when you might be trying to diagnose why said daemon isn't running.

I also agree that documentation about the basic architecture and execution method needs improvement. At some point in the CentOS 7 lifecycle, a minor update to EPEL's munin package deleted its own crontab file, effectively disabling the master, without providing an obvious alternative such as a systemd timer. I couldn't find any explanation about why it did that.


Have you tried netdata? https://www.netdata.cloud/

I've switched to this for all hosted/decentralized monitoring.


The graphs appear to scale based on viewport size.


Great, this pulls Munin solidly into the early 2010s.

Jokes aside, I feel the lack of vertical whitespace makes the site feel a bit markety. Like it's trying to force a lot of marketing information into a small amount of vertical space.


I agree with your impression but potentially disagree with the solution if adding vertical whitespace means more scrolling for the same information.


Makes sense, as the website was redone in 2016-2017 but only went live this week ;-)


oh boy rrdtool and perl, I feel like I'm in 2002 again


I LOVE munin. It's the only system that you can:

- deploy with an apt-get install

- publish to the whole Internet as static pages without any security risk

- maintain with OS package updates and zero additional effort

In a world of resume-driven development, simplicity and security are the first victims. I wish we had more tools like Munin.


Well, in the upcoming version of Munin, I'm moving away from the static site, as it is very wasteful IMHO.

But I agree that for a couple of servers, the security it provides might outweigh the performance boost.

Let's add it again: https://github.com/munin-monitoring/munin/issues/1395


Thanks for your work on munin. I love it! And I appreciate your responsiveness here. Personally I don't care if my server wastes time generating HTML I never see. It's not enough work to make any difference. But I care a lot about seeing the HTML instantly when I want it, not waiting for anything to render.


I have a munin server for a fleet of a bit more than a dozen servers, and I'd be very sad if the ability to create a static site were to go.

That, and the ability to create "overall" charts (same metric for all servers in a group) as well as per-server per-metric charts is what keeps me solidly using munin.


Yeah, I don't really get the point of Munin now that we have things like Zabbix (coupled with Grafana) or LibreNMS?


Yeah, why would you use a tested system that just "works" if you could add some npm dependencies, node.js and a 3 month old database that is out of support already if you are not upgrading every week?


Zabbix dates back to 2001 and is written in C, PHP and Java. [and in my experience of it, has the typical monitoring system problem where the design seems to be "the more panicky alerts it spews, the more value it's creating"].

https://en.wikipedia.org/wiki/Zabbix


Because it just works. Grafana requires a lot of work to set up, especially if you want to do it in a secure way and provision it programmatically. Munin, on the other hand, is just an apt-get install plus some minor configuration. Yes, it is not as powerful, but on the other hand it is much easier to set up correctly.


It just works, it is easy to deploy, and it is easy to write plugins for. A combo like prometheus+grafana and others is very good too and definitely more modern. But, having used both, I often didn't need the extra features that the more modern solutions offer, and I was more than happy to use munin, which was definitely easier to run and maintain (for me).


I use(d) both. While I appreciated Zabbix's configurability, it just takes too many clicks to add a basic probe with an alert or a new system. Meanwhile Munin has everything in flat config files (easy to deploy with ansible or equivalent), and has reasonable alerts configured by default.

And creating a munin plugin is as simple as writing a script that writes values to stdout (in whichever language you like), and copying it to /etc/munin/plugins/.
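To make that concrete, here's a minimal sketch of such a plugin (the plugin name, field name, and metric are made up for illustration): Munin calls the script with "config" to get graph metadata, and with no arguments to get the current values.

```shell
#!/bin/sh
# Hypothetical Munin plugin "uptime_days": with "config" it prints graph
# metadata; otherwise it prints a value in Munin's "field.value N" format.
if [ "$1" = "config" ]; then
    echo "graph_title Uptime in days"
    echo "graph_vlabel days"
    echo "graph_category system"
    echo "days.label uptime"
    exit 0
fi
# /proc/uptime's first field is seconds since boot; convert to days.
awk '{ printf "days.value %.2f\n", $1 / 86400 }' /proc/uptime
```

Drop it in /etc/munin/plugins/, make it executable, restart munin-node, and the graph shows up on the next collection run.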


The advantage of zabbix is the templating and service discovery. You shouldn't have to add a new system to anything, it should just be found and monitored. The same goes for prometheus.

Monitoring new things with zabbix is also "write a value to stdout". Using telegraf and prometheus you have the same flexibility. I think these days you can even scrape prometheus exposition with zabbix. There are plenty of footguns with prometheus and zabbix, but they take big scale to run into.

But at the end of the day use what you're comfortable with.


Systems that are already configured and up and running for years have to be maintained and updated.


It's lightweight, it's flexible, it's hackable and it's very orchestrate-able. I've had dashboards showing summary stats for IoT devices, the VMs they're talking to and the hypervisors the VMs are running on with very little effort.


It showed up kind of late to the Munin game, but I wrote a library for making plugins in Go. https://github.com/alrs/muninplugin


Slightly off-topic, but from time to time my UX/UI agency offers free help/advice to spice up dev tools. Especially open source. So if you are working on a ‘boring’ but important tool that looks crappy (because you don’t have the budget to hire someone), send me a message. I’d be happy to offer some simple & actionable advice on how you can improve the design without hiring anyone. Details in bio.


I like Munin for small environments. Dead easy to set up and a very concise UI. I've written plugins .. I love it.

Where I get frustrated is scalability of RRDTool. Every data set has its own file so you get into this IO nightmare. I solved this for a while by storing them in a RAM disk.

Ultimately I set up InfluxDB+Grafana. It is a lot more work, and last I tried, InfluxDB did not support downsampling of older data, so instead of two years of history we have 90 days. The collector renamed most of the datapoints from one version to the next, so the Grafana dashboards broke ...

(Oh and of course also Munin looks like the 90s, which doesn't bother me, but sometimes gives decision makers cause to believe my solution is not hip.)


You should have a look at rrdcached. It is basically a write-back cache that batches I/O.

We took special care not to flush the cache too often, so the improvements are very noticeable.

It only works with on-demand graphing, of course.


As I have slowly started to shift more workloads into my homelab I have started to ponder the right monitoring solution. I would like to be able to see hardware and OS metrics for my entire fleet in one area, regardless of the host, vm vs bare metal etc...

Munin, Monit, Nagios, Zabbix and a few other tools are on my list to try out.

My biggest concern is around the data management. Ideally the underlying datastore can be moved, expanded, etc... as I sort out how much history I want to keep.

What do y'all use for this? I have 2 VM hosts (esxi), 1 bare metal box, and a handful of ubnt networking equipment.


Does Munin itself use up a lot of resources? I am curious to try it out


It execs scripts for each metric and stores the data using rrdtool. If you are only collecting a few metrics and use a 5 minute collection interval it's probably fine.

A while ago I switched from munin to prometheus+node_exporter and with that went from collecting metrics every 5 minutes to more metrics every 15 seconds.

Munin is easy to operate initially. You can go from a script that outputs a number to a graph on your screen in a few minutes. It does get more complicated when you want to do something like graph the aggregate of a metric across a subset of machines. It is doable in the configuration, but a hell of a lot harder than just adding a widget in grafana and writing a simple promql query.
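For anyone curious, that aggregate configuration lives in munin.conf on the master: you define a virtual host whose fields sum the corresponding fields of real hosts. A rough sketch with invented hostnames and labels (loosely following the documented `.sum` mechanism, so check the docs for exact syntax):

```
# munin.conf on the master -- hostnames and labels are invented
[example.com;totals]
    update no
    load.graph_title Combined load
    load.graph_category system
    load.combined.label load
    load.combined.sum \
        web1.example.com:load.load \
        web2.example.com:load.load
```

It works, but compared to a one-line promql query it is clearly more ceremony.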


I have a munin-master running on an Atom C2338 (very low-end), which monitors about 20 hosts. The whole system uses about 300MB RAM and 40% of a CPU core. It's a little intensive because it recomputes half the images every 5 min, when the cron runs (that's configurable).

In terms of disk, munin uses 750MB, which isn't growing thanks to rrdtool, and it writes 1MB/s on average. Bandwidth usage is 5Mb/s on average (to get data from all munin nodes, also every 5 min in my config).

Munin nodes themselves are extremely lightweight. Essentially, the master calls the munin-node daemon via a TCP socket, which starts a plugin process and pipes its stdout to the socket. Plugins usually simply call a command like "df" or "netstat", parse its output, and write it in munin's format.


Munin is fairly light, but metrics are collected through running various scripts, which is generally more resource-intensive than the usual node-exporter(-like) daemons, so the frequency at which data is updated is usually much lower.


In my experience it's pretty lightweight, but I never properly measured its impact; I just never noticed anything big enough to pay attention to.


Who monitors the monitorers?!


Oh sweet! I have been having fun playing with Jekyll lately, glad to see it’s still gaining popularity.


I reckon you could take the page weight of this site down from 426KB to about 100KB with very little effort.

- There's 20KB of Bootstrap.css that could be replaced with native CSS that would be loads smaller.

- There's 96KB of jQuery that I can't see being used at all. Even if it is, it doesn't need to be the full fat version.

- There's a 138KB PNG image that can be shrunk to 91KB using OxiPNG (or 77KB as a WebP, or 52KB using AVIF) using a free tool like https://squoosh.app

- There's 80KB for the Material Design icons font that could be replaced with an optimized version that only includes the glyphs used on the page (using a free tool like https://www.fontsquirrel.com/tools/webfont-generator).


Your other points are great low-hanging fruit performance-wise, but replacing bootstrap in a site that has been fully designed around it goes well beyond "very little effort."


Could something like purgecss be effective with removing unused bootstrap classes?


I've never tried setting something like that up but Bootstrap itself is extremely modular and you can be selective about what you want bundled with it, so that's probably a more effective starting place: https://getbootstrap.com/docs/3.4/customize/


Glad to see you are volunteering to submit PRs to resolve these issues on an open source project


For me, when someone responds to another person, who spent their time checking whether there are any optimisations for something submitted to HN, with a (let's face it) snarky "Well, raise a PR then", that's quite a shame.


For me, I see an open source project that probably has limited resources/time/volunteer effort to support it.

I read that comment as, "rip huge chunks of Bootstrap and jQuery out of it - should be quick"... well, I am guessing, given Bootstrap design patterns, that they used these tools as core libraries around which they built their site. They may be planning to leverage these tools in the future, or even just want to give themselves the option to... It's an over-simplification. Likely, this site got them to where they wanted to be: a much more modern-looking site without a ton of effort sunk into development.

It's also completely disingenuous to talk about these filesizes in pre-gzip numbers vs. post. When you look at these "optimizations" without taking gzip into account, you are looking at a "worst case scenario" that doesn't exist - this site is gzipping its assets over the wire. The jQuery gripe is more like 33.6KB vs. 96KB when you look at it through the gzip lens.

---

This sort of response (and subsequent defense) is why I would never show my creations here. It's emotionally difficult for me to see such a stable and mature project as Munin accomplish a bit of modernization only to be nitpicked like this. Munin is a great tool.

"Raise a PR" is perfectly valid here when it comes down to it. I think the internet will handle the extra ~100Kb/page load just fine.


> They may be planning to leverage these tools in the future

If that's the case then creating a PR and raising it would be a waste of time, as it'd be rejected.


Raise an issue first, gauge the reaction, and then create a PR if you think they would welcome it.

Whether it's a website or some other open source project, maintainers usually don't like it when you create a massive refactoring PR without talking with them first.

If, luckily, the maintainers are on HN, I guess talking about the "problem" here could very well substitute for raising an issue on whatever platform they use.


Completely agree. It seems there's no repo for this site anyway, judging by other comments.


The question is if the feedback is in line with the announcement.

Basically the same as pointing out a spelling error on a single slide in a CS class. You are technically correct in the feedback, but it is misplaced and not worth spending time on in class.


Given that this post was about a website redesign, I feel his comment is on topic.


Personally I'd say the issues are not about the design, but about infrastructure and potentially UX (depending on who the target group is, what their usage patterns are, etc.).


It's not. For starters you need to get the repo, make a branch, make the changes, add tests; you may need an account on whatever platform to raise the PR, and there's possibly a CLA to sign. Whereas whoever made the submission in the first place probably has some interaction with the project, so mentioning this on HN could then be taken by the submitter and raised as a feature request so someone could make a PR.


Your comment prompted me to check to see if I could do that. The website build process doesn't appear to be documented and it's been built in a template system I'm not familiar with. So I can't.


That is actually something I'd expect the GP to know/check before suggesting "submit a PR / open an issue / submit a patch". I mean, if the website repo isn't available, the only "better" way to give feedback would be to submit a link to (or the text of) the HN comment to the project mailing list/chat channel, etc.

I certainly think "open an issue/please make a pr" is a fine suggestion regarding FOSS projects in general.

Ed: isn't this the repo, built via GitHub workflows? https://github.com/munin-monitoring/munin-monitoring.github....

Beyond that: is there much cause to spend (any) effort to go from 0.5MB to less, really? Especially considering gzip?


Sorry about that. I actually commented on it at

https://munin-monitoring.org/munin/website/2021/04/20/we-mad...


FWIW I did mean my comment more as a suggestion of where people could look for optimizations (in this site or their own sites) rather than a criticism of what you've made. I apologize for not communicating that better.


Are you going to recommend this for every website posted on HN?


Seems like a valid critique for a site redesign post


It's like telling da Vinci, at the moment of the reveal of the Mona Lisa, he should have used more paint so it can last 100k years instead of only 98k years. We are not dealing with a 10MB website with seconds long loading time and screaming for smaller images.


As far as HN comments go, this is way better than the HN traditional "Why should I use this?" or the always popular "I'm tired of new libraries coming out, your project is literally cancer".

Someone put time into giving actual technical feedback for improvement. There are lots of developers here. This might even get people interested in contributing to the project.


When the story is specifically about a website redesign, yes.


Yes, a website redesign, not a page speed overhaul project.


It seems a fair discussion point, even if you don’t agree that the payoff is worth the effort.

I don’t read it as a negative thing, and for me <500kB seems reasonable. Page size is an aspect of design – even if many websites are suffering far greater needless bloat than this one.


Performance is a feature, or at least it should be, of any good design.

Of course it might be a very low priority feature, so it would be valid to close any issue reports as "WONTFIX" if the devs have no time or feel they are better spending their time on other matters, especially if people on slow connections are not expected to be a significant part of the target audience, but commenting on potential performance issues is perfectly valid when discussing a site (re)design IMO.


I don't know why I even have broadband...


What is the HN equivalent of "slashdotted" ?


The "HN Hug."


For completeness, the generic expression: http://www.catb.org/~esr/jargon/html/F/flash-crowd.html


That's cool. I had never read that story, but it makes a lot of sense.

Thanks!


Would be nice if that tiny image on the top left of the homepage were clickable so I could see the screenshot full-size.


Huginn and Munin. From Norse mythology.


Huginn is a pretty cool automation tool.

https://github.com/huginn/huginn


Old-style monitoring... I'd already forgotten about it some years ago...


This looks like the "before" part of a "before and after" redesign comparison. Sorry, but it looks a decade or more older visually. (Only mentioning this because the title has "website redesigned" in it.)



