I've started looking for a monitoring infrastructure for all my IT things, because:

  • I'm self-hosting this blog and want to know when it fails
  • I have home automation (KNX + NodeRed) and I also want to know if there are issues
  • I have periodic backups set up
  • I have a host of Docker-based services (e.g. SonarQube, paperless-ngx, Outline)
  • I want to set up a Raspberry Pi cluster with RPi 4s and the Compute Blade

This gives me a bunch of requirements:

  1. Cross-architecture
  2. Multi-site
  3. Multi-framework (bare metal, VM, Docker, Kubernetes)
  4. Capable of sending data to Grafana (via Prometheus or InfluxDB)
  5. Capable of sending notifications (e.g. via callback, API...)
  6. Pretty graphs (optional, since Grafana already covers this)
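
To make requirements 4 and 5 a bit more concrete, here is a rough Python sketch of the kind of thing I expect the tool to do for me: probe a service, expose the result in the Prometheus text format, and build a notification for a webhook. It uses only the standard library. The metric names, the JSON fields and the webhook URL are my own placeholders, not something any of the candidates below prescribes.

```python
import json
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Do one HTTP GET and return (is_up, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            up = 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        up = False
    return up, time.monotonic() - start


def to_prometheus(target, up, seconds):
    """Render one probe result in the Prometheus text exposition format.

    The metric names are assumptions picked for this example.
    """
    return (
        f'probe_success{{target="{target}"}} {1 if up else 0}\n'
        f'probe_duration_seconds{{target="{target}"}} {seconds:.3f}\n'
    )


def build_alert(target, up):
    """Build a JSON body for a webhook-style notification (made-up schema)."""
    state = "up" if up else "down"
    return json.dumps({"target": target, "state": state})


def notify(webhook_url, body):
    """POST the alert body to a webhook; the URL is whatever you configure."""
    req = urllib.request.Request(
        webhook_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5.0) as resp:
        return resp.status
```

In practice I'd call probe() on a schedule, serve the to_prometheus() output over HTTP for Prometheus to scrape, and call notify() only when the state changes. A real monitoring tool should take all of that off my hands, which is the whole point of this search.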

So, I have a list of initial candidates:

  1. Netdata - good old infrastructure monitoring framework
    Netdata is high-fidelity infrastructure monitoring and troubleshooting.
    Open-source, free, preconfigured, opinionated, and always real-time.
  2. Checkmk - monitors everything via agents
    Quickly gain a complete view of your IT infrastructure, no matter how complex.
  3. Zabbix - another monitoring tool
    Get a single pane of glass view of your whole IT infrastructure stack

There are also other tools that would be worth investigating:

  • Prometheus + Grafana - I already have Grafana (+ InfluxDB right now) and it's great. I'd need something to feed data to Prometheus, though.
  • Glances - a Python-based framework for collecting OS-level metrics. It looks like I'd need to complement it with some Docker/Kubernetes-level info.
  • Nagios - infrastructure monitoring. A long time ago, when I did network monitoring, this was the go-to tool.

Now, let the games begin!