Network Automation: Our Journey from GUI to Git – Automating OPNsense Firewalls with Ansible
A lot has happened since our last blog post. As teased there, we successfully migrated our firewall setup to new hardware, using Ansible for most of the configuration. But how did we do it? What obstacles did we overcome, and what lessons did we learn along the way? In this blog post, you’ll find the answers to these questions.
Evaluating Hardware
As with any evaluation, we began by analyzing the current setup. We reviewed the cabling and identified what the new hardware would need to support. Key factors included firewall throughput, threat protection performance, and port speed, especially since our switches use 25 Gbps ports.
With these criteria in mind, we evaluated various hardware options.
Our evaluation resulted in two different devices: a DEC4280 as the primary instance and a DEC2752 as a secondary cold standby instance.
Evaluating Software
Obviously, we chose proprietary software for our firewalls. Just kidding – our clear choice was OPNsense. With the release of OPNsense 25.1 just around the corner, we decided to wait for it.
On the one hand, moving to the new release meant deprecating several of our own Ansible modules, with a bit of a heavy heart.
On the other hand, we welcomed the improved API support and the introduction of Automation Rules (more on that later).
This also meant that we would use more of the API-based O-X-L OPNsense Collection (fka ansibleguy.opnsense) rather than our own Puzzle OPNsense Collection.
Examining the current configuration
Now it was time to get our hands dirty. Our current configuration had grown over time and was (maybe surprisingly to you, but definitely not to us) not thoroughly documented. So, we had to manually go through every setting in the GUI, asking ourselves two key questions:
- Are these configurations still necessary?
- Is there an Ansible module to manage this setting, and if not, is it worth creating one?
With this approach in mind, we combed through every configuration page.
To give you a sense of how medieval our tactics were: three engineers from our team spent three weeks battling through aliases and rules, summarizing recurring patterns, restructuring configurations, and clearing out unnecessary entries, all with the help of a humble Calc sheet.
Hardware Setup
For our setup, we chose to configure the new instance in parallel with the current one. This approach allowed us to incrementally add configurations to the new instance (of course, after testing most of them using Molecule instances).
Ansible Setup
Within the existing Ansible automation setup for our infrastructure, we started building the configuration for the new firewall in its host vars. The configuration is then applied by an Ansible role we built, which combines the collections mentioned above and applies the settings in an order that respects the dependencies between them.
The role we wrote consists of the following tasks, of which we will explain the most interesting ones:
---
- name: OPNsense configuration
  module_defaults:
    group/ansibleguy.opnsense.all:
      firewall: "{{ ansible_host }}"
      ssl_verify: false
      api_key: "{{ opnsense_api_key }}"
      api_secret: "{{ opnsense_api_secret }}"
  block:
    - name: Configure system settings
      puzzle.opnsense.system_settings_general:
        hostname: "{{ opnsense_config.system.settings.general.hostname }}"
        domain: "{{ opnsense_config.system.settings.general.domain }}"
        timezone: "{{ opnsense_config.system.settings.general.timezone }}"
    - name: Configure unbound DNS
      ansible.builtin.import_tasks: unbound.yml
      tags: opnsense_unbound
    - name: configure interfaces
      ansible.builtin.import_tasks: interfaces.yml
      tags: opnsense_interfaces
    - name: configure packages
      ansible.builtin.import_tasks: packages.yml
    - name: Configure logging
      ansible.builtin.import_tasks: logging.yml
      tags: opnsense_logging
    - name: Configure aliases
      ansibleguy.opnsense.alias:  # noqa: args[module]
        name: "{{ alias.name }}"
        description: "{{ alias.description | default(omit) }}"
        content: "{{ alias.content }}"
        type: "{{ alias.type }}"
        state: "{{ alias.state | default('present') }}"
      with_items: "{{ opnsense_config.firewall.aliases }}"
      loop_control:
        loop_var: alias
      tags: opnsense_aliases
    - name: Configure IPSec
      ansible.builtin.import_tasks: ipsec.yml
      tags: opnsense_ipsec
    - name: Configure Gateways
      ansible.builtin.import_tasks: gateways.yml
      tags: opnsense_gateways
    - name: Configure Free Range Routing
      ansible.builtin.import_tasks: frr.yml
      tags: opnsense_frr
    - name: Configure firewall rules
      module_defaults:
        ansibleguy.opnsense.rule:
          match_fields: ["interface", "sequence", "description"]
          reload: false  # do not apply changes per default
      ansible.builtin.import_tasks: rules_main.yml
      tags: opnsense_rules
    - name: Configure NAT
      ansible.builtin.import_tasks: nat.yml
      tags: opnsense_nat
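To show how the tags in this role can be put to use, here is a minimal playbook sketch. The file name, the inventory group `opnsense_firewalls`, and the role name `opnsense_firewall` are placeholders we made up for this example; the post does not name them. It also assumes that `opnsense_api_key` and `opnsense_api_secret` are provided elsewhere, for example in vaulted host vars.

```yaml
---
# site.yml (hypothetical file name)
- name: Configure OPNsense firewalls
  hosts: opnsense_firewalls      # hypothetical inventory group
  roles:
    - role: opnsense_firewall    # hypothetical name for the role shown above
```

With such a playbook, a run can be limited to a single part of the configuration by selecting one of the tags from the role, for example `ansible-playbook site.yml --tags opnsense_rules` to touch only the firewall rules.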
Let’s take a look at how firewall rules are managed.
Firewall Rules
Managing firewall rules with the help of the ansibleguy.opnsense.rule module comes with a few advantages.
Since it configures the rules through the API, all the rules we manage with it now appear on the Firewall > Automation > Filter page.
This page comes with a handy search functionality and combines floating rules with interface specific rules.
Maybe the biggest advantage, however, is the ability to keep manually created rules under control.
Looking at the ‘traditional’ rule view of an interface, you will notice a new collapsed category of rules called ‘Rules from Automation’.
Because the automated rules are tucked away in that category, any manually created rule, whether wanted or unwanted, is visible at a glance in the interface view.
For our firewall instance, we created the host vars using the following structure:
interface_names:
  LAN: "opt1"
  WIFI: "opt2"
  DMZ: "opt3"
  WAN: "opt4"
opnsense_config:
  firewall:
    aliases:
      - name: NET_RFC1918
        content:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
    rules:
      floating:
        - description: Allow TCP DNS IPv4 requests to firewall
          interface:
            - "{{ interface_names.LAN }}"
            - "{{ interface_names.WIFI }}"
            - "{{ interface_names.DMZ }}"
            - "{{ interface_names.WAN }}"
          protocol: TCP
          destination_net: (self)
          destination_port: 53
          sequence: 1
      DMZ:
        - description: Block RFC1918
          action: block
          source_net: "{{ interface_names.DMZ }}"
          destination_net: NET_RFC1918
          sequence: 101
With the rules now defined, let’s take a look at how we apply them to the firewall.
The challenge here lies in how to track changes when applying the rules to the firewall.
Using Ansible, we can define how the uniqueness of a given rule is determined by configuring the match_fields of the ansibleguy.opnsense.rule module.
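However, there is no way to track a change to one of these match fields: once such a field changes, the definition no longer matches the existing rule, so a new rule is created and the old one stays behind on the firewall.
To prevent the buildup of such orphaned rules, we have to detect them and delete them from the firewall.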
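To illustrate the problem, here is a minimal, self-contained playbook sketch that uses only built-in Ansible features. The two lists are made-up stand-ins for what the firewall reports and what the host vars define, already reduced to the match fields; the second rule has been renamed in the host vars.

```yaml
---
- name: Illustrate how a changed match field leaves an orphaned rule behind
  hosts: localhost
  gather_facts: false
  vars:
    # Made-up example data: rules currently present on the firewall
    existing_rules:
      - {interface: opt3, sequence: 101, description: Block RFC1918}
      - {interface: opt3, sequence: 102, description: Allow HTTP to proxy}
    # Made-up example data: rules defined in the host vars after the rename
    new_rules:
      - {interface: opt3, sequence: 101, description: Block RFC1918}
      - {interface: opt3, sequence: 102, description: Allow HTTP to web proxy}
  tasks:
    - name: Show rules that exist on the firewall but are no longer defined
      ansible.builtin.debug:
        msg: "{{ existing_rules | difference(new_rules) }}"
```

Running this should report only the rule that still carries the old description, which is exactly the kind of leftover that has to be cleaned up.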
We solved this problem by implementing the following sequence in our Ansible role:
1. Write the rules to the firewall without applying the changes
2. Read all the rules from the firewall
3. Detect orphaned rules by comparing the host’s rule data with the rules from the firewall
4. Delete orphaned rules
5. Apply the changes to the firewall
In the task file it then looks like this:
---
- name: Set a new fact for rules
  ansible.builtin.set_fact:
    firewall_rules: []
- name: Apply Interfaces by interface group
  ansible.builtin.include_tasks: rules_group_deploy.yml
  loop: "{{ opnsense_config.firewall.rules | dict2items }}"
  loop_control:
    loop_var: rule_group
    index_var: rule_group_number
- name: Get current rules
  ansibleguy.opnsense.list:
    target: "rule"
  register: existing_rules_raw
  check_mode: false
- name: Prepare existing and target rules for comparison
  ansible.builtin.set_fact:
    existing_rules: "{{ existing_rules_raw.data | ansible.utils.keep_keys(target=opnsense_rule_match_fields) }}"
    target_rules: "{{ firewall_rules | ansible.utils.keep_keys(target=opnsense_rule_match_fields) }}"
# in order to be able to compare the existing_rules and target_rules the sequence attribute
# of the target_rules must be cast to int since existing_rules are provided with an int as the
# sequence from the OPNsense API.
- name: Cast sequence to int for target rules
  ansible.builtin.set_fact:
    new_rules: >-
      {{ new_rules | default([]) + [item | combine({'sequence': (item.sequence | int)})] }}
  loop: "{{ target_rules }}"
- name: Detect orphaned rules
  ansible.builtin.set_fact:
    orphaned_rules: "{{ existing_rules | difference(new_rules) }}"
- name: Delete orphaned rules
  ansibleguy.opnsense.rule:
    description: "{{ orphaned_rule.description }}"
    interface: "{{ orphaned_rule.interface }}"
    sequence: "{{ orphaned_rule.sequence }}"
    state: "absent"
    match_fields: "{{ opnsense_rule_match_fields }}"
    reload: false  # do not apply changes per default
  loop: "{{ orphaned_rules }}"
  loop_control:
    label: "{{ orphaned_rule.description }}"
    loop_var: orphaned_rule
- name: Apply rule changes
  ansibleguy.opnsense.reload:
    target: rule
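The included file rules_group_deploy.yml is not shown in this post. As a rough idea of what such a file could contain, here is a hypothetical sketch, not our actual task file. It assumes that rules without an explicit interface belong to the interface named after their group (for example DMZ maps to opt3 via interface_names), that 'pass' is the wanted default action, and that only the rule fields used in the host vars above need to be passed through. The variable `rule_interface` is made up for this example.

```yaml
---
# rules_group_deploy.yml: hypothetical sketch, not the role's actual file.
# Included once per entry of opnsense_config.firewall.rules, where
#   rule_group.key   is the group name, e.g. "floating" or "DMZ"
#   rule_group.value is the list of rule definitions of that group
- name: Deploy one rule group
  vars:
    # Assumption: fall back to the interface named after the group
    rule_interface: >-
      {{ rule.interface if rule.interface is defined
         else interface_names[rule_group.key] }}
  block:
    - name: Write the rules of this group without applying them
      ansibleguy.opnsense.rule:
        description: "{{ rule.description }}"
        sequence: "{{ rule.sequence }}"
        interface: "{{ rule_interface }}"
        action: "{{ rule.action | default('pass') }}"
        protocol: "{{ rule.protocol | default(omit) }}"
        source_net: "{{ rule.source_net | default(omit) }}"
        destination_net: "{{ rule.destination_net | default(omit) }}"
        destination_port: "{{ rule.destination_port | default(omit) }}"
        match_fields: "{{ opnsense_rule_match_fields }}"
        reload: false  # changes are applied once, at the very end
      loop: "{{ rule_group.value }}"
      loop_control:
        loop_var: rule
        label: "{{ rule.description }}"

    - name: Remember the rules of this group for the orphan detection
      ansible.builtin.set_fact:
        firewall_rules: >-
          {{ firewall_rules + [rule | combine({'interface': rule_interface})] }}
      loop: "{{ rule_group.value }}"
      loop_control:
        loop_var: rule
        label: "{{ rule.description }}"
```

Whether the interface values recorded this way compare equal to what the API returns (a list versus a single string, for instance) would need to be checked against a real instance before relying on the comparison.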
With these tasks in place, our firewall rules are managed as code and deployed automatically using Ansible.
Migration
With the new role implemented and tested, we were ready to perform the switchover of our firewalls.
Some configurations, however, could not be tested in advance.
For example, we configured the IPSec tunnels and FRR routes on the new firewall but kept the corresponding services disabled since they would interfere with the existing setup.
Other configurations do not yet have a dedicated Ansible module, which meant that we had to apply the changes manually.
For the migration, we took extra care to plan and document these manual configurations in advance, so that the process would be as smooth as possible.
Since we expected some downtime while we migrated the firewall instances, we planned for the migration to be done outside of normal business hours.
So with everything in place, we were ready to start the migration.
The plan sounded simple: unplug the ISP link from the old firewall, plug it into the new one, and run the playbook against the new firewall to enable all services. (Ordering pizza was an integral part of the process as well.)
That’s the point where the problems started to appear. Let’s take a look at what happened.
Lessons learned
The first thing that did not work as expected was the 25 Gbps fiber uplink between the core switch and the firewall.
During the migration, we could not identify the cause of the problem, but configuring the link to 10 Gbps did the trick.
Later, we were able to identify the problem:
The auto-negotiation of the FEC modes between the core switch and the firewall interfaces did not work.
Explicitly setting the FEC mode to RS-FEC on both ends of the link allowed us to get the 25 Gbps link up and running again.
Another thing that did not work as expected was that routes from our secondary site appeared to be missing.
This, however, was simply due to us forgetting to configure the necessary static routes on the new firewall.
All things considered, however, we were astonished that this seemed to be the only thing we missed.
Given how sparse and scattered the documentation of the old setup originally was, we consider this a smooth migration.
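Static routes like the ones we forgot can be captured in the role as well. The following is a hedged sketch rather than our actual configuration: the host var key `opnsense_config.routes`, the network, and the gateway name are made up, and the parameter names (`network`, `gateway`, `description`) reflect our reading of the collection's route module documentation, so check them against the version you use. The task is meant to sit inside the role's block so that the API credentials from module_defaults apply.

```yaml
---
# Hypothetical host vars entry (all values are placeholders):
#
# opnsense_config:
#   routes:
#     - description: Secondary site
#       network: 10.20.0.0/16
#       gateway: SITE2_GW
#
- name: Configure static routes
  ansibleguy.opnsense.route:
    description: "{{ route.description | default(omit) }}"
    network: "{{ route.network }}"
    gateway: "{{ route.gateway }}"
    state: "{{ route.state | default('present') }}"
  loop: "{{ opnsense_config.routes | default([]) }}"
  loop_control:
    loop_var: route
    label: "{{ route.network }}"
  tags: opnsense_routes   # hypothetical tag, following the role's naming
```

Keeping the routes next to the rules in the host vars means a forgotten route shows up as a missing entry in a reviewed file instead of as a surprise during the switchover.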
Results
We’ve automated roughly 80% of our firewall configuration using Ansible, while the remaining 20% covers edge cases that are not relevant to our day-to-day operational needs.
For example, we do not expect to change the LDAP server configuration on our OPNsense instances often, but we do expect frequent changes to firewall rules.
Our goal, however, is to gradually reach 100% automation in order to achieve a fully declarative firewall configuration.
This would help us further improve consistency and version control, and simplify firewall deployment and recovery processes.
Next steps and outlook
We still have several cloud firewall instances left to set up. Luckily, with the lessons learned and our newly created Ansible structure, this will take much less time.
Automated testing is also something we have not yet implemented.
This is definitely something we want to tackle next.
Since the majority of our core network devices are now managed through Ansible, we want to give you the opportunity to get a feel for what it’s like to do network automation using GitOps.
Stay tuned for the next post in our network automation series, where we will take a closer look at our GitOps approach.