Event-Driven Ansible in the Wild (Part 2/2)

In the first blog post of this miniseries, we showed how to set up EDA-Server and CIQ’s Ascender to run Ansible rulebooks in an enterprise environment. In this second part, we use Prometheus to monitor a group of web servers and reprovision faulty ones with EDA-Server and Ascender.

We want our Prometheus server to check whether the web pages served by two web servers (Ansible group “web”) are available. We use the Prometheus Blackbox Exporter to check the health of these web pages. Once a web server fails to deliver its web page, Prometheus notifies EDA-Server through its alerting tool Alertmanager. EDA-Server evaluates this event (i.e. the alert from Prometheus) and takes action if certain conditions (see below) are met. In our case, it triggers the job template (playbook) on Ascender that provisions the web servers.

 

Configurations

This blog post doesn’t explain how to configure Prometheus or Kubernetes in detail. If you’d like to know more about these tools, you can attend one of our workshops. In this example, we check the availability of the two URLs http://puzzle-node1.workshop.puzzle.ch and http://puzzle-node2.workshop.puzzle.ch, and set up Prometheus accordingly.

We configure the ansible.eda.alertmanager source plugin to listen on port 5000 (see rulebook below). Because EDA-Server runs on k3s in our example, we use a Kubernetes NodePort service to make the webhook reachable from outside the cluster (port 32767 in our example). The official approach described in the Red Hat documentation would be to use a Route instead. ansible.eda.alertmanager expects the events to be sent to “/endpoint”. This all leads to the following webhook configuration in /etc/alertmanager/alertmanager.yml:

receivers:
- name: eda
  webhook_configs:
    - url: http://puzzle-edaserver.workshop.puzzle.ch:32767/endpoint
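
The port mapping described above (5000 inside the cluster, 32767 from outside) can be expressed as a NodePort Service. The following is only a sketch: the Service name, the namespace and, in particular, the pod selector label are hypothetical placeholders, because the labels on the activation pod depend on how EDA-Server was deployed. Only the two port numbers are taken from our setup.

```yaml
# Sketch of a NodePort Service exposing the rulebook's webhook port.
# Name, namespace and selector are hypothetical and must be adapted.
apiVersion: v1
kind: Service
metadata:
  name: eda-alertmanager-webhook   # hypothetical name
  namespace: eda                   # assumption: namespace of the EDA-Server deployment
spec:
  type: NodePort
  selector:
    app: eda-activation            # hypothetical label; use the labels of your activation pod
  ports:
    - protocol: TCP
      port: 5000        # Service port inside the cluster
      targetPort: 5000  # port the ansible.eda.alertmanager source listens on
      nodePort: 32767   # external port used in the Alertmanager webhook URL
```

With such a Service in place, Alertmanager can reach the source plugin through any cluster node on port 32767.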

Once started, the rulebook waits for events and starts a job template on Ascender as soon as it receives an event with the alert name “WebsiteDown” and the status “firing”. This results in the rulebook prom_rulebook.yml below:

---
- name: Prometheus Alertmanager
  hosts: web
  sources:
    - ansible.eda.alertmanager:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Reprovision Webserver if Alert from Prometheus
      condition:
        all:
          - event.alert.status == "firing"
          - event.alert.labels.alertname == "WebsiteDown"
      actions:
        - run_job_template:
            name: Provision_Webserver
            organization: Puzzle
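
For the condition to match, Prometheus has to raise an alert named WebsiteDown. The rule file itself is not shown in this post. A minimal sketch, assuming the Blackbox Exporter's probe_success metric and a hypothetical job name, group name and waiting period, might look like this:

```yaml
# Sketch of a Prometheus alerting rule. Group name, job label, "for" duration
# and annotation text are assumptions; only the alert name comes from the rulebook.
groups:
  - name: website-availability         # hypothetical group name
    rules:
      - alert: WebsiteDown              # must match event.alert.labels.alertname
        # probe_success is 0 when a Blackbox Exporter probe fails
        expr: probe_success{job="blackbox"} == 0
        for: 1m                         # assumption: wait one minute before firing
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is not reachable"
```

The instance label of the failing probe is carried along in the alert, which is what makes it possible to identify the affected server later on.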

Now we create a rulebook activation on EDA-Server using this rulebook. Remember to put your rulebook in the /rulebooks folder at the top level of your Git repository so that EDA-Server can find it. Beware that EDA-Server hides any rulebook that contains syntax errors!

 

Reprovision only faulty Servers

If one web server fails, we don’t want to reprovision all the servers, but only the faulty one. The name of the faulty web server can be extracted from the Prometheus alert. If a rulebook triggers an Ansible playbook, we can use information from the event source (the alert) inside the playbook through the variable ansible_eda. In our case, the faulty URL resides in ansible_eda.event.alert.labels.instance. We just cut “http://” from the beginning of the string, and since the remainder conveniently matches the name of the server in our inventory, we can readily use it to limit our playbook webserver.yml to just this server:

---
- hosts: "{{ ansible_eda.event.alert.labels.instance | regex_replace('^http://', '') }}"
  become: true
  tasks:
    - name: Install httpd and firewalld
      ansible.builtin.dnf:
        name:
          - httpd
          - firewalld
        state: present
    - name: Start and enable httpd
      ansible.builtin.service:
        name: httpd
        state: started
        enabled: true
    - name: put default webpage
      ansible.builtin.copy:
#...
#rest omitted
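
To make the lookup path easier to follow, here is an abridged sketch of how the relevant part of ansible_eda might look as extra variables for an alert on the first node. Only the keys used in this post are shown. The real event carries more fields, and the exact layout can differ between versions of the source plugin.

```yaml
# Abridged, illustrative view of the extra variables passed to the job template.
ansible_eda:
  event:
    alert:
      status: firing
      labels:
        alertname: WebsiteDown
        # Probed URL; after stripping "http://" it matches the inventory host name.
        instance: http://puzzle-node1.workshop.puzzle.ch
```

With this value, the hosts expression in the playbook evaluates to puzzle-node1.workshop.puzzle.ch.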

Finally, we ensure that the rulebook activation is started.

Triggering the Alert

Now, when we stop the web server service on one of our web servers, Prometheus uses Alertmanager to inform EDA-Server about the alert. Because the status of the alert is “firing” and the alert name is “WebsiteDown”, the defined condition is met and the rulebook takes action by starting the job template “Provision_Webserver” on Ascender. For the ansible_eda variable to be passed to the playbook, make sure the box “Prompt on launch” is checked for the Variables field of the job template on Ascender.
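
The “Prompt on launch” setting can also be managed as code. The following play is a sketch that is not part of the setup described here. It assumes that the awx.awx collection is installed and works against your Ascender instance (Ascender is based on AWX), that connection details are supplied the way the collection expects, and that the job template already exists under the name used in the rulebook.

```yaml
# Sketch: enable "Prompt on launch" for variables on an existing job template.
# Assumes the awx.awx collection is installed and can reach Ascender.
- name: Allow extra variables on launch for the provisioning job template
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Enable prompting for variables
      awx.awx.job_template:
        name: Provision_Webserver
        organization: Puzzle
        ask_variables_on_launch: true   # corresponds to "Prompt on launch" for Variables
        state: present
```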

EDA-Server receives the alert (1), evaluates whether the condition is met (2), and instructs Ascender to start the job template (3).

Ascender shows that the job template was executed only on the faulty web server.



Are you interested in learning more about these topics? Check out our brand-new workshops.
