Writing reliable Ansible playbooks

14 July 2021 - By Tadej Borovšak

This post was originally published on the XLAB Steampunk blog.

We often talk about techniques and tools that help developers write high-quality Ansible content (modules and plugins). Of course, having high-quality Ansible collections is a prerequisite for creating robust automation workflows, but Ansible playbook authors still need to use that content appropriately. And this is the topic we will talk about today.

In the first part of the post, we will look into what properties make an Ansible playbook reliable and what benefits those properties bring to Ansible users. We will dedicate the second part of the post to finding concrete solutions for problems Ansible playbook authors often stumble upon while automating processes.

Why bother with established good practices?

When we write an Ansible playbook or role, we usually do it because we want to automate tedious manual tasks. But if the Ansible playbook that implements those tasks is not reliable, we have gained nothing. Before, we had to perform error-prone steps manually, and now we have to supervise Ansible runs. And yes, watching Ansible logs is great fun during the honeymoon phase, but it gets tedious real quick ;)

Luckily, things do not have to be this way. With a little bit of discipline, everyone can write reliable Ansible playbooks that will gracefully handle errors and offer an easy way of recovering after an error occurs.

Do note that this does not mean we have to add a large amount of error checking to our Ansible playbooks. Error checking can be helpful sometimes, but in most cases, we do not need it. It is often much easier to let the error bubble up and stop the Ansible execution. Once we diagnose the culprit, we can rerun our Ansible playbook and call it a day.

Now that we know why we should care about the reliability of our Ansible playbooks, we can start looking at some general guidelines that will improve the quality of our automation.

Enforcing the desired state

Enforcing the desired state has the most significant impact on the robustness of our Ansible playbooks. It means that we can run a playbook twice in a row, and the second run will not break anything.

If you think, “Ugh, that is probably too hard for me,” we have some great news. Writing Ansible playbooks that enforce the desired state is easier than writing their action-executing counterparts. Why? Because we can test our playbook after each new task addition by rerunning it.

It is also easy to spot when we start to deviate from this best practice. If we find ourselves commenting out previously written tasks before rerunning Ansible, we are probably doing something wrong.

Most of the time, we do not have to do anything special when writing Ansible playbooks because most Ansible modules enforce state by default. But there exists one family of Ansible modules that we have to use with a bit more care: command executors.
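As a minimal sketch of what "enforcing state by default" looks like, the following playbook uses two builtin modules that describe the end result rather than the steps to get there. The paths and the configuration line are hypothetical and serve only as an illustration. Running the playbook a second time should report both tasks as ok instead of changed.

```yaml
---
- hosts: localhost
  gather_facts: false
  tasks:
    # Hypothetical path: the module creates the directory only if it is missing.
    - name: Ensure the application directory exists
      ansible.builtin.file:
        path: /tmp/example-app
        state: directory
        mode: "0755"

    # Hypothetical file and content: the line is added only if it is not there yet.
    - name: Ensure the configuration line is present
      ansible.builtin.lineinfile:
        path: /tmp/example-app/app.conf
        line: "log_level = info"
        create: true
        mode: "0644"
```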

Let us assume that we need to run a database initialization command after we install the database. Our first attempt will probably look something like this:

- name: Initialize database
  ansible.builtin.command: init_my_db with some params

But now we have a problem. If we rerun our Ansible playbook, Ansible will try to initialize the database for the second time, which is not OK. We can prevent this by telling the command module what file the command will create using the creates parameter. If the file exists, the command module will skip the initialization and report no change.

- name: Initialize database
  ansible.builtin.command: init_my_db with some params
  args:
    creates: /path/to/file/created/at/initialization.db
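When the command leaves no convenient file behind, one possible alternative is to ask the system about its state first and run the command only when needed. The sketch below assumes a hypothetical my_db_status command that exits with 0 when the database is initialized and with 1 when it is not; neither the command nor its exit codes come from a real tool.

```yaml
# Hypothetical read-only check; my_db_status and its exit codes are assumptions.
- name: Check whether the database is already initialized
  ansible.builtin.command: my_db_status --is-initialized
  register: db_status
  # A read-only check never changes the host, so never report a change.
  changed_when: false
  # Assumed exit codes: 0 - initialized, 1 - not initialized yet.
  failed_when: db_status.rc not in [0, 1]
  # Run the check even in check mode so that db_status.rc is always defined.
  check_mode: false

- name: Initialize database
  ansible.builtin.command: init_my_db with some params
  when: db_status.rc == 1
```

The check task costs one extra command per run, but it keeps the second run from touching the database at all.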

But sometimes, the command itself enforces the state. Sensu Go initialization is one such example. In this case, it is safe to rerun the initialization command, but it is still helpful to tell Ansible if things changed. And we can do that through the changed_when task keyword.

- name: Initialize backend
  ansible.builtin.command:
    cmd: sensu-backend init
  register: init
  failed_when: init.rc not in (0, 3)  # 0 - OK, 3 - already initialized
  changed_when: init.rc == 0

In the Sensu Go initialization example, Ansible will report a state change if the initialization command returns a zero status code.

Fully qualified collection names

Before the introduction of Ansible Collections, all Ansible modules lived in the same (global) namespace. In order not to break existing Ansible playbooks, Ansible Base introduced the routing table for content that Ansible developers moved from the central repository into dedicated collections. And while this is excellent news for owners of existing Ansible playbooks because they do not have to update them, new Ansible playbooks should always use fully qualified collection names (FQCNs).

The main reason for this is straightforward: if we always use FQCNs, there is less of a chance that Ansible will use a different module than we intended.

Finding an FQCN for a module might sound simple, but because Ansible Collections can redirect module names to other collections, we might have to work a bit harder than anticipated. The safest option right now is to run Ansible in verbose mode and inspect the output it prints to the console.

For example, let us take the following Ansible playbook:

---
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Create container
      docker_container:
        # Parameters go here

When we run Ansible, we will see something similar to this:

$ ansible-playbook -vv playbook.yaml
ansible-playbook [core 2.11.1]

# More output here, trimmed for brevity

TASK [Create container] *************************************************
path: /tmp/play.yaml:5 redirecting (type: modules) 
  ansible.builtin.docker_container to community.docker.docker_container

# More output here, trimmed for brevity

We can see that Ansible “renamed” the docker_container module into ansible.builtin.docker_container and then redirected it to community.docker.docker_container.
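Once we know the redirect target, we can put it straight into the playbook. The following is a sketch of the same playbook with the FQCN spelled out; the container name and image are placeholder values that are not part of the original example. With the full name in place, Ansible should no longer need to go through the redirect.

```yaml
---
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Create container
      community.docker.docker_container:
        # Placeholder values for illustration only.
        name: example
        image: docker.io/library/alpine:3.14
```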

Certified Ansible content

Being able to buy support for Ansible content we use in our Ansible playbooks is not directly related to reliability. Still, it does make a difference when finding someone to help us resolve our issues.

Determining if an Ansible collection is certified is relatively straightforward: if we installed it from Automation Hub, Red Hat or one of its partners supports it. Making sure we do not use any community-supported Ansible collections is a bit harder because the community bundles quite a few of them inside the ansible Python package, making them exceptionally convenient to use.

And to make things even “worse” (if we can call having a lot of ready-to-use content a bad thing), quite a lot of short Ansible module names redirect to community-supported modules. So this is yet another reason why we should use FQCNs if at all possible.

It is also worth mentioning that Red Hat certifies individual collections and not all of the content from a namespace. Would you guess that the ansible.windows collection is not certified yet? Well, neither did the blog post author, which was a “fun” problem to solve ;)

Conclusion

So, what did we learn today? Well, if there is one point we would like to get across, it is this:

Writing reliable Ansible playbooks is not much more complex than writing bad Ansible playbooks if we know a few pitfalls that we need to avoid.

Using Ansible a lot is one way of getting to know those pitfalls. But visiting scanner.steampunk.si can give you a speed boost and get your Ansible playbooks in top shape with a minimal amount of effort.

Cheers!