OK – Icinga monitoring is online

When the telephone rings…

During the last few months I have been rebuilding the infrastructure monitoring at Puzzle ITC. Monitoring enables us to proactively inform customers instead of being surprised when they call. 

As my apprenticeship at Puzzle ITC was coming to an end, it was time to choose a topic for my thesis. I wanted to improve our monitoring, because entering new configuration into our existing Zenoss monitoring software was error-prone and hard to track.

At Puzzle ITC, our mindset is that configuration should not be edited manually. No one is allowed to change files or install programs by hand. Everything is documented in code, and every change is recorded together with who applied it and when.

To allow such a high level of automation, all configuration must be human-readable and located directly on the filesystem. Monitoring solutions that store their configuration in a database are out of the question. Automation also makes auto-discovery obsolete, because all infrastructure components are defined as code and are therefore already known. This makes it possible to add monitoring automatically whenever an application or website is deployed.

The topic of my thesis is: Icinga2 Monitoring Cluster. I chose to research Icinga because I assumed it would fulfill our requirements.

When thinking about open-source monitoring solutions, the first thing that comes to mind is Nagios. Nagios builds upon small scripts called plugins. Nagios itself can be described as a script scheduler and executor; the actual monitoring is done by the individual scripts. These scripts can be written in any language and must report the status of a single service each time they are executed. The success of such monitoring software therefore depends on the quantity and quality of community-submitted scripts.
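To make the plugin idea concrete, here is a minimal sketch of such a script in Python. It follows the common Nagios plugin convention: one status line on standard output and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The thresholds and the names `evaluate_load` and `main` are assumptions made for this illustration, not part of any existing plugin.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check plugin for the 1-minute load average."""
import os

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_load(load, warn, crit):
    """Map a load value to a plugin state and a one-line status message."""
    if load >= crit:
        state = CRITICAL
    elif load >= warn:
        state = WARNING
    else:
        state = OK
    # Text after "|" is optional performance data: label=value;warn;crit
    message = (
        f"LOAD {STATE_NAMES[state]} - load1 is {load:.2f} "
        f"| load1={load:.2f};{warn};{crit}"
    )
    return state, message


def main():
    """Check the local machine and return the exit code the plugin should use."""
    # Hypothetical thresholds; a real plugin would read them from its arguments.
    warn, crit = 5.0, 10.0
    try:
        load1 = os.getloadavg()[0]
    except (OSError, AttributeError):
        # os.getloadavg() is not available on every platform.
        print("LOAD UNKNOWN - load average not available on this platform")
        return UNKNOWN
    state, message = evaluate_load(load1, warn, crit)
    print(message)
    return state

# As a standalone plugin, the script would end with:
#   import sys; sys.exit(main())
```

The scheduler only needs the exit code and the first line of output, which is why the same script can be reused by any monitoring system that follows this convention.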

Icinga was originally a fork of Nagios and later became a full rewrite. The fork was created because of disagreements between contributors and the main Nagios developer. Icinga remains compatible with all Nagios plugins and supports multi-site deployments and clustering. The configuration syntax was completely rewritten to allow dynamic assignments and even scripting. It is now easy to apply a specific check on all Linux servers,

// Create an "ssh" service on every host that has an address and runs Linux
apply Service "ssh" {
  import "generic-service"
  check_command = "ssh"
  assign where host.address && host.vars.os == "Linux"
}

or increase the load threshold during a backup.

object Service "backupserver-load" {
  check_command = "load"
  host_name = "backupserver"

  // Evaluated at check time: tolerate a higher load inside the "backup" time period
  vars.load_crit = {{
    if (get_time_period("backup").is_inside) {
      return 20
    } else {
      return 5
    }
  }}
}

Notifications are another important aspect of every monitoring system, but if an email is sent for every alert, the volume quickly becomes overwhelming. To combat this, Icinga allows dependencies to be defined between services and hosts. These can be set up so that only the original source of the failure (a switch, for example) creates a notification.
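The effect of such dependencies can be illustrated with a small toy model in Python. This is only a conceptual sketch, not Icinga's implementation or configuration syntax. The function `hosts_to_notify`, the host names and the rule "stay silent if any parent is down" are assumptions chosen for the example.

```python
def hosts_to_notify(states, parents):
    """Return the failed hosts that should raise a notification.

    states:  dict mapping host name -> True (up) or False (down)
    parents: dict mapping host name -> list of hosts it depends on
    """
    notify = []
    for host, is_up in states.items():
        if is_up:
            continue
        # A failure is treated as a consequence (and stays silent) when any
        # host it depends on is down as well. Unknown parents count as up.
        parent_down = any(not states.get(p, True) for p in parents.get(host, []))
        if not parent_down:
            notify.append(host)
    return sorted(notify)


# Hypothetical topology: two web servers reachable only through one switch.
states = {"switch1": False, "web1": False, "web2": False, "db1": True}
parents = {"web1": ["switch1"], "web2": ["switch1"]}
print(hosts_to_notify(states, parents))  # ['switch1']
```

Three hosts are down, but only the switch produces a notification, because the two web servers depend on it.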

After completing my thesis, I used the knowledge I had acquired to integrate Icinga into the Puzzle ITC infrastructure. We now have almost 2000 service checks and are adding new checks every day. I can definitely recommend Icinga.
