Automating management of Linux infrastructure with Ansible

19 August 2022 - By Justin Činkelj and Jasna Simončič

This post was originally published on the XLAB Steampunk blog.

Ansible can be used to automate anything, from configuration management, application deployment, provisioning, orchestration, and networking to security. In this post, we will focus on one specific use case: setting up and updating Linux with Ansible.

Problem

Our client needed help with optimizing the management of their server fleet. The existing environment consists of a Microsoft Active Directory (AD) domain with Windows and Linux servers. The network is isolated: there is no connection to the Internet except for a few select services. For example, operating system updates are retrieved from the Internet, but only by a single dedicated server that acts as an application-level firewall and a caching proxy.

They had a perfectly fine procedure for keeping their Linux servers updated, but the problem was that the administrator had to follow it to the letter. That takes time and mental effort. It sounds easy to update a Linux OS, right? You just run yum update or apt-get update (or was it upgrade :D) and reboot. But should you reboot even if a reboot is not needed? Have you checked if old kernel files in /boot need to be deleted? Many questions come up, even when it comes to seemingly simple tasks.

To install a new server from scratch, the client has developed a rather nice Bash script. It gets the job done. However, it also requires user interaction, such as entering the password needed to join a server to the Active Directory domain. If the Bash script is interrupted in the middle of execution or fails, you need to run it again. The next time you run it, you know that there will be errors in the first half of the script (the user cannot be added because it already exists; the domain cannot be joined because it has already been joined). Surely you will read all the errors carefully and decide if they can be safely ignored?

All of this is just a nuisance that someone has to deal with. Hopefully not your administrators every single day, as they have a lot of other work to do. Instead, these operations can be easily automated with Ansible and Red Hat Ansible Automation Platform (AAP).

Solution

To help the client automate setting up and updating Linux with Ansible, we have:

developed an Ansible playbook for an initial Linux server setup, including:
- joining server to Active Directory domain,
- configuring antivirus agent,
- configuring system monitoring agent,
developed an Ansible playbook for Linux server upgrade, including:
- upgrading what needed to be upgraded,
- rebooting the server, only when it is needed.

But before the playbooks could be run against servers, we needed to get the list of servers from AD. This server list is called “inventory” in the Ansible world. For a quick test on one or two servers, we wrote a test inventory - a text file about 10 lines long. Next, we added all servers to the inventory and we kept it in sync with the AD server. Or you could choose the alternative - asking your helpful admins, who have nothing better to do, to get up at 02:00 every morning to update the inventory. Surprisingly, when presented with this option they usually refuse and mutter something angrily to themselves. Go figure.

The alternative to manually updating the inventory file is an Ansible dynamic inventory. Dynamic inventories can retrieve the list of servers from your infrastructure provider, for example the AWS EC2 cloud or an on-premises VMware ESXi deployment. In this use case, the dynamic inventory plugin needs to obtain the server list from Active Directory. That gave us three additional tasks:

implement an Active Directory inventory plugin for Ansible,
set up a CI testing environment for the AD inventory plugin,
set up a CI testing environment for the developed playbooks.

Setting up CI testing environment

We did not want to test only in the client’s staging environment, because it is on an isolated network. While developers would gain some time by skipping a proper CI testing strategy, they would spend just as much, if not more, time moving files back and forth between their workstation and the client’s environment. Not to mention that the code/test cycle would be interrupted by lengthy “copy file(s)” steps and concentration would be lost, which is even more frustrating.

Instead, we used Samba to test the AD inventory plugin, as it provides an “AD controller” mode. We wrote a Dockerfile to build a functional “mockup AD controller” container. The mockup uses predefined credentials and a custom (e.g., self-signed) certificate authority, and exposes the LDAP and LDAPS ports. Only the AD content is missing from the container image. It is injected later, at container start time, when a “data-fixture” file is attached to the container.

We asked the client for some explanations about their AD, so we could create an AD “data-fixture” with an appropriate structure. They explained how their servers are organized in AD - which groups the servers are in, which CA certificate is used for the LDAPS connection, etc. Then we logged into their AD and exported the entries of two or three test servers. This was enough to prepare a “data-fixture” describing a few hypothetical mockup Linux servers.

The mockup Linux servers in the previous paragraph were hypothetical. At this stage, we replaced them with real mockup servers. The update playbook needed a server that actually had something to update, so we wrote a docker-compose file to spin up a container from the correct (and outdated) image, and made sure that the correct IP was used and that Ansible was able to log into the container. That gave us something to test our code on.

Implementing Ansible Active Directory plugin

The client registers their infrastructure resources in a specific way that has been in place for a long time. We therefore implemented a new dynamic inventory plugin, so that administrators no longer have to manually select which servers to run tasks on each time.

So, with the testing environment running on the laptop, we started developing the AD inventory plugin. First, we defined the API:

what the plugin needed to generate as output and,
what needed to be given to it as input.

The output is pretty clear - it is dictated by Ansible and is well documented. Essentially, it is a list of hosts, each described by a DNS name or IP, login credentials, and possibly other (optional) attributes.
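As a rough illustration of that output, the sketch below builds the JSON structure that `ansible-inventory --list` prints for an inventory: one entry per group with a `hosts` list, plus a `_meta.hostvars` mapping with per-host variables. The host names, addresses, the `ansible` user and the group names are made up for the example, and a real inventory plugin would register hosts through Ansible's inventory API rather than emit JSON itself.

```python
import json


def build_inventory(hosts):
    """Build an Ansible-style inventory dict from simple host records.

    Each record is a dict with hypothetical keys: name, address, user, groups.
    """
    inventory = {"_meta": {"hostvars": {}}}
    for host in hosts:
        name = host["name"]
        # Per-host connection variables that Ansible understands.
        inventory["_meta"]["hostvars"][name] = {
            "ansible_host": host["address"],
            "ansible_user": host["user"],
        }
        # Add the host to every group it belongs to.
        for group in host.get("groups", []):
            inventory.setdefault(group, {"hosts": []})["hosts"].append(name)
    return inventory


# Made-up example data, not taken from the client's environment.
example_hosts = [
    {"name": "web01.example.test", "address": "10.0.0.11",
     "user": "ansible", "groups": ["linux_servers"]},
    {"name": "db01.example.test", "address": "10.0.0.12",
     "user": "ansible", "groups": ["linux_servers", "databases"]},
]

if __name__ == "__main__":
    print(json.dumps(build_inventory(example_hosts), indent=2))
```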

The input is dictated by the AD server. We needed:

AD server IP address or URL,
bind DN and bind password,
location of servers in the AD tree,
CA certificate chain:
- this is required when an LDAPS connection is used (as it should be) with a self-signed certificate authority.
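The inputs listed above could be collected in a small configuration object. The following is only a sketch: the class name, the field names and the `AD_*` environment variable names are assumptions made for illustration, not the names used by the actual plugin. It also assumes, more strictly than the text, that every `ldaps://` URL needs a CA file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AdInventoryConfig:
    """Settings an AD inventory plugin needs (all names are hypothetical)."""

    url: str                      # AD server IP address or URL
    bind_dn: str
    bind_password: str
    search_base: str              # location of the servers in the AD tree
    ca_cert_file: Optional[str] = None

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AdInventoryConfig":
        required = ["AD_URL", "AD_BIND_DN", "AD_BIND_PASSWORD", "AD_SEARCH_BASE"]
        missing = [key for key in required if not env.get(key)]
        if missing:
            raise ValueError("missing settings: " + ", ".join(missing))
        config = cls(
            url=env["AD_URL"],
            bind_dn=env["AD_BIND_DN"],
            bind_password=env["AD_BIND_PASSWORD"],
            search_base=env["AD_SEARCH_BASE"],
            ca_cert_file=env.get("AD_CA_CERT_FILE") or None,
        )
        # Assumption: LDAPS is always used with a private CA, so a CA file
        # must be supplied whenever the URL scheme is ldaps://.
        if config.url.lower().startswith("ldaps://") and not config.ca_cert_file:
            raise ValueError("an ldaps:// URL requires AD_CA_CERT_FILE")
        return config
```

Failing early with a list of missing settings is a deliberate choice here: a misconfigured inventory source is easier to diagnose from one clear message than from an LDAP bind error.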

The AD inventory plugin itself is just regular Python code. It is tested against our mockup AD server, so we are confident it works with the real AD server as well. In the end, they both speak the LDAP protocol, right?

Then it was time to deliver the plugin to the client. The days of instructions like “copy the zip file over, extract it to directory /a/b/c, etc.” are over. Now we have Ansible Collections. So, we developed an Ansible Collection. Once the plugin is part of a collection, it can be installed with a single command.

The dynamic inventory plugin makes it possible to collect, filter, and group resources based on the users’ configuration. Administrators can now retrieve all the servers that match their criteria with a simple query, and then automate them in a seamless way.
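To make "filter and group" a bit more concrete, here is a minimal sketch that filters directory entries by name and groups them by AD group membership. The entry layout (`dns_name`, `member_of`), the example DNs and the substring filter are assumptions for illustration; the real plugin's configuration options are not shown in this post. The DN parsing is deliberately naive and does not handle escaped commas.

```python
from collections import defaultdict


def group_names(member_of):
    """Return the CN of each group DN, e.g. 'CN=web,OU=Groups,DC=x' -> 'web'."""
    names = []
    for dn in member_of:
        first_rdn = dn.split(",", 1)[0]
        key, _, value = first_rdn.partition("=")
        if key.strip().upper() == "CN" and value:
            names.append(value.strip())
    return names


def select_and_group(entries, name_contains=""):
    """Keep entries whose DNS name contains a substring, grouped by AD group."""
    groups = defaultdict(list)
    for entry in entries:
        if name_contains not in entry["dns_name"]:
            continue
        for group in group_names(entry.get("member_of", [])):
            groups[group].append(entry["dns_name"])
    return dict(groups)


# Made-up entries resembling what an LDAP search might return.
example_entries = [
    {"dns_name": "web01.example.test",
     "member_of": ["CN=linux_web,OU=Groups,DC=example,DC=test"]},
    {"dns_name": "db01.example.test",
     "member_of": ["CN=linux_db,OU=Groups,DC=example,DC=test"]},
]

if __name__ == "__main__":
    print(select_and_group(example_entries, name_contains="web"))
```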

After developing the dynamic inventory, we also integrated the plugin into Ansible Automation Platform and provided documentation and a sample snippet showing how to use it with AAP. While it is true that one only needs to read the official documentation, it is easier to just copy-paste the snippet into your AAP.

Developing Ansible playbooks for initial OS setup and subsequent updates

This was a regular “automate these steps, please” job. Upgrading a Linux machine is a single playbook. The “is a reboot needed” question is certainly interesting: the RHEL needs-restarting command does some interesting hunting for processes that still use outdated libraries. But that is not the topic for today.
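Without going into how the command works internally, the reboot decision can be sketched as follows. This is not the playbook from the project, just an illustration in Python that relies on the documented exit codes of `needs-restarting -r` (0 when no reboot is needed, 1 when one is). The `run` parameter is an assumption added so the logic can be exercised on a machine without the command installed.

```python
import subprocess


def reboot_required(run=subprocess.run):
    """Return True if `needs-restarting -r` reports that a reboot is needed.

    `run` defaults to subprocess.run and is injectable for testing.
    """
    result = run(["needs-restarting", "-r"], capture_output=True, text=True)
    if result.returncode == 0:
        return False          # nothing requires a reboot
    if result.returncode == 1:
        return True           # core libraries or the kernel were updated
    # Any other exit code means the check itself failed.
    raise RuntimeError(f"needs-restarting failed: {result.stderr.strip()}")
```

Treating unexpected exit codes as errors, rather than as "no reboot needed", avoids silently skipping a reboot when the check could not run at all.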

However, the initial setup of the OS consists of a few different parts. As mentioned earlier, the client uses their own certificate authority (CA). They also use a commercial antivirus solution. What do these two have in common? Absolutely nothing, so it’s important not to lump them together.

We implemented each part as a dedicated Ansible role, independent of other roles. We then created a single playbook for each role. We ended up with about 5 small playbooks, not just one big one. Why? Because in AAP, we combined all the small playbooks into a single workflow. This means that:

A few of those playbooks can and should run in parallel.
If one of those playbooks fails, the others should still be run on a particular host. For example, if the monitoring setup fails (maybe because the monitoring server is unavailable for a short time), we still want to get the antivirus set up.
Later we can create another workflow, using only a subset of those playbooks. For example, maybe we only need to reconfigure the antivirus agent.
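The semantics described in the list above (independent steps run side by side, one failure does not block the rest) can be modelled in a few lines of Python. This is only a toy model of the idea, not how AAP workflows are implemented, and the step functions are placeholders invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(steps):
    """Run independent steps concurrently and record each outcome separately.

    `steps` maps a step name to a zero-argument callable. A failing step is
    reported, but it does not prevent the other steps from running.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(steps) or 1) as pool:
        futures = {name: pool.submit(func) for name, func in steps.items()}
        for name, future in futures.items():
            try:
                future.result()
                results[name] = "ok"
            except Exception as exc:
                results[name] = f"failed: {exc}"
    return results


def setup_monitoring():
    # Placeholder: simulate the monitoring server being briefly unavailable.
    raise ConnectionError("monitoring server unavailable")


def setup_antivirus():
    # Placeholder: pretend the antivirus agent was configured successfully.
    return None


if __name__ == "__main__":
    print(run_parallel({"monitoring": setup_monitoring,
                        "antivirus": setup_antivirus}))
```

Running a subset later, as in the last list item, is then just a matter of passing a smaller dictionary of steps.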

Setting up and configuring Ansible Automation Platform

We took care of the basic AAP setup - got a fresh RHEL host, downloaded the installation bundle, set up passwords, and ran setup.sh. Next, we configured AAP to run our playbooks.

AAP execution environment

AAP runs ansible-playbook commands in an AAP execution environment (EE). This is basically a (Podman) container with everything you will ever need. It includes:

roles and modules you wrote,
their prerequisites:
- all roles, modules and collections used by your roles,
- required Python packages,
- required binary packages.

We needed to write execution-environment.yml to describe all direct dependencies. Then we built an environment image with ansible-builder. Prerequisites of the direct dependencies were resolved recursively, and the EE was ready to be used.

AD inventory and Custom credential type for AD inventory plugin

The EE contains the code for the AD inventory plugin. To use this plugin in AAP, we needed to add a new inventory with a new inventory source. For the new inventory source, we selected our AD plugin and suitable credentials.

Different inventory plugins generally require different credentials. Not only the values differ, but also the structure: it is not always a username and password, it can also be a token and a URL, etc. Or there may be no credentials at all, as when the inventory file simply contains a list of all relevant hosts.

The AAP credential type is a generic solution to this problem. For our plugin, we created a new credential type. It consists of two parts:

“Input configuration” is a list of required (or optional) fields. It is what the administrator needs to fill in in the AAP GUI to grant the AD inventory plugin access to the AD server - AD URL, password, CA certificate, etc. Fields like the password should be marked as secret to ensure they are not visible in the UI and are stored encrypted.
“Injector configuration” describes how to pass the provided values to the ansible-playbook process running inside the AAP EE container. The inventory plugin gets most of the values from the UNIX process environment. That is the usual approach. Some data, like the CA certificate, is more commonly provided as a file. This is also possible; you just need to provide a Jinja2 template that includes one or more variables from the input configuration.
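The two parts could look roughly like the structures below, written here as Python dictionaries that can be dumped to JSON and pasted into the credential type form. The field ids, labels and `AD_*` environment variable names are hypothetical; the overall shape (`fields`, `required`, `env`, `file`, and the `tower.filename` variable that points at the rendered file) follows the custom credential type format documented for AWX and the automation controller.

```python
import json

# Hypothetical field ids and labels; adjust to what your plugin expects.
INPUT_CONFIGURATION = {
    "fields": [
        {"id": "ad_url", "type": "string", "label": "AD URL"},
        {"id": "bind_dn", "type": "string", "label": "Bind DN"},
        # "secret" hides the value in the UI and stores it encrypted.
        {"id": "bind_password", "type": "string",
         "label": "Bind password", "secret": True},
        {"id": "ca_cert", "type": "string",
         "label": "CA certificate chain (PEM)", "multiline": True},
    ],
    "required": ["ad_url", "bind_dn", "bind_password"],
}

INJECTOR_CONFIGURATION = {
    # The CA certificate is rendered into a temporary file ...
    "file": {"template": "{{ ca_cert }}"},
    # ... and the remaining values are passed as environment variables.
    # Hypothetical variable names; tower.filename is the rendered file path.
    "env": {
        "AD_URL": "{{ ad_url }}",
        "AD_BIND_DN": "{{ bind_dn }}",
        "AD_BIND_PASSWORD": "{{ bind_password }}",
        "AD_CA_CERT_FILE": "{{ tower.filename }}",
    },
}

if __name__ == "__main__":
    print(json.dumps(INPUT_CONFIGURATION, indent=2))
    print(json.dumps(INJECTOR_CONFIGURATION, indent=2))
```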

With a new credential type defined and used in our inventory, we can sync the inventory source and new hosts will appear. We can also set up a schedule for our inventory source to avoid waking up at 2:00 AM to click the sync button again.

The rest

We configured read-only access to the Git repository with playbooks on the Git server and used that to add a new AAP project. We synced the project, which transferred the code from Git to AAP. We added an AAP job template for each relevant playbook and ran the templates to check that they work.

With job templates defined, we then created workflows. First, we updated the OS and set up a custom CA certificate chain. Then we configured the monitoring agent, antivirus agent, and backup agent, enforced system policies, etc. - all in parallel. We then installed the actual application.

Again, a schedule can be used to automatically run a job or workflow at regular intervals.

The admin can be notified of success or failure via notifications. In addition to email, Slack and other common options, a fully customizable webhook is also available. This allows integration with other “unknown” systems.

Results

The developed Ansible playbooks allow automated, non-interactive OS updates. The update playbook is scheduled to run periodically against all relevant hosts. The initial OS installation playbook is run by the admin when a new host is being added.

The described AAP is self-contained. It can be run on an isolated network without Internet access.

The AD inventory plugin allows transparent synchronization from AD to AAP.

The Git server used by AAP runs on an isolated network. Developers typically push code to a public Git server such as github.com, so the content needs to be transferred to this isolated Git server.

The AAP EE is based on the official Red Hat EE base image. It is built and tested on a host with Internet access and then transferred to AAP.

By automating setting up and updating Linux, our client simplified the management of their Linux servers and made a big step towards unified automation of their IT processes and infrastructure. Looking for ways to optimize the management of your servers? Reach out, our experts are here to help you out.