The ultimate guide for writing high-quality Ansible Playbooks

21 July 2022 - Authors: Jasna Simončič, Anže Luzar

This post was originally published on the XLAB Steampunk blog.

The point of automating IT tasks is to save time. But in reality, the process of successfully automating a task can actually take longer than the task itself if your instructions are not clear. So should we give up on automating IT and miss out on all its benefits?

Of course not. Manual management of increasingly complex IT infrastructure is simply no longer viable. So, should we just accept that automation will complicate our lives a bit before it simplifies them? Well, everything comes at a price, but if you get it right from the start, you’ll reap the invaluable benefits of automation in no time.

Having written countless Ansible Playbooks and Ansible Collections, we want to share our experience to help you become a pro at writing Ansible Playbooks that deliver safe, reliable, and secure automation.

What does quality have to do with it?

Everything. If your playbook isn’t of high quality, chances are its execution will fail, others won’t be inclined to use it, you’ll spend a lot of time debugging it, and, most importantly, it can introduce serious security risks. High-quality playbooks have short, simple, and easy-to-read instructions, so anyone can understand what they will do and use them without any hassle. If the playbook that implements the task steps isn’t reliable, we’ve gained nothing, because we end up spending our effort supervising Ansible executions instead of letting the automation do the work.

But don’t worry, following these steps will help you achieve reliable automation that you can trust.

1. Choose a high-quality Ansible Collection

High-quality Ansible Collections are a prerequisite for creating robust automation workflows. While collections give you much more control over the content you use in your Ansible Playbooks, they also make you responsible for the quality of that content. So how can you recognize a high-quality Ansible Collection?

Check these 5 things:

1. Documentation

Once you have found a potential candidate, first check its documentation. It should contain at least a quickstart tutorial with installation instructions. Another essential part of the documentation is a detailed module, plugin, and role reference guide.

2. Playbook readability

As an Ansible Playbook should serve as a human-readable description of the desired state, modules from the Ansible Collection under evaluation should have a consistent user interface and descriptive parameter names.

3. Basic functionality

Always check each Ansible module’s basic functionality. The most critical property to look for is whether a module enforces a desired state or merely executes an action. Ansible modules and roles that enforce a state are much easier to use than their action-executing counterparts, because you can update your Ansible Playbook and rerun it without risking significant breakage.
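To make the difference concrete, here is a minimal Python sketch (Python because Ansible modules themselves are written in it). The helpers append_line and ensure_line are hypothetical, and a list of strings stands in for a configuration file on a managed host. The point is only that a rerun changes nothing when the operation enforces a state.

```python
def append_line(lines: list, line: str) -> bool:
    """Action-executing: always appends, so every rerun changes the 'file'."""
    lines.append(line)
    return True  # always reports a change


def ensure_line(lines: list, line: str) -> bool:
    """State-enforcing: acts only if the desired state is not reached yet."""
    if line in lines:
        return False  # already in the desired state, nothing to do
    lines.append(line)
    return True


if __name__ == "__main__":
    action_file = ["PermitRootLogin no"]
    state_file = ["PermitRootLogin no"]
    for _ in range(2):  # simulate running the playbook twice
        append_line(action_file, "MaxAuthTries 3")
        ensure_line(state_file, "MaxAuthTries 3")
    print(action_file)  # ['PermitRootLogin no', 'MaxAuthTries 3', 'MaxAuthTries 3']
    print(state_file)   # ['PermitRootLogin no', 'MaxAuthTries 3']
```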

Modules should also support check mode and diff mode. These modes of execution are very helpful when you are creating or editing a playbook or role and want to know what it will do, especially with complex playbooks that make major changes to servers. Check mode (check_mode) lets you see what the playbook would do without actually changing anything on the remote servers. This dry run helps you find errors that could seriously damage the servers or shut them down completely. Better safe than sorry, right? In diff mode (diff_mode), Ansible shows before-and-after comparisons. You can combine check mode and diff mode for detailed validation of your playbook or role. And if you combine check mode with state enforcement, you get a configuration drift detector for free.
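The following Python sketch shows how check mode, diff mode, and state enforcement fit together. ensure_setting is a hypothetical module-like function, not part of any Ansible API, and a dictionary stands in for a server’s configuration. In check mode it only reports whether a change would happen. In diff mode it also returns the before and after states, which is what turns a check-mode run into a drift detector.

```python
def ensure_setting(config: dict, key: str, value: str,
                   check_mode: bool = False, diff_mode: bool = False) -> dict:
    """Enforce config[key] == value, honouring check mode and diff mode."""
    before = dict(config)              # snapshot of the current state
    after = {**config, key: value}     # the desired state
    changed = before != after
    if changed and not check_mode:
        config[key] = value            # only touch the target outside check mode
    result = {"changed": changed}
    if diff_mode:
        result["diff"] = {"before": before, "after": after}
    return result


if __name__ == "__main__":
    server = {"PasswordAuthentication": "no"}
    server["PasswordAuthentication"] = "yes"   # someone edits the server by hand
    report = ensure_setting(server, "PasswordAuthentication", "no",
                            check_mode=True, diff_mode=True)
    print(report["changed"])  # True: the server drifted from the desired state
    print(report["diff"])     # "before" shows "yes", "after" shows "no"
    print(server)             # still "yes": check mode changed nothing
```

Real Ansible modules declare check mode support with supports_check_mode and report changes through their result; this sketch only mimics that behaviour.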

4. Implementation robustness

Checking the continuous integration/continuous delivery (CI/CD) configuration files should give you a general idea of what is tested. Finding ansible-test and molecule commands in the test suite is an excellent sign.
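A quick way to run this check is to search the repository’s CI configuration for those commands. The Python sketch below assumes a few common CI file locations (GitHub Actions workflows, .gitlab-ci.yml, .travis.yml, tox.ini, Makefile). The list and the function name find_test_commands are illustrative, and a plain substring match is only a rough signal.

```python
import tempfile
from pathlib import Path

# Assumed CI file locations; extend the list for your CI system.
CI_GLOBS = [".github/workflows/*.yml", ".github/workflows/*.yaml",
            ".gitlab-ci.yml", ".travis.yml", "tox.ini", "Makefile"]
TEST_COMMANDS = ("ansible-test", "molecule")


def find_test_commands(repo: Path) -> dict:
    """Map each test command to the CI files (relative paths) that mention it."""
    found = {cmd: [] for cmd in TEST_COMMANDS}
    for pattern in CI_GLOBS:
        for path in sorted(repo.glob(pattern)):
            text = path.read_text(encoding="utf-8", errors="replace")
            for cmd in TEST_COMMANDS:
                if cmd in text:  # rough substring match, not a YAML parse
                    found[cmd].append(path.relative_to(repo).as_posix())
    return found


if __name__ == "__main__":
    # Demo on a throwaway repository layout.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        workflows = repo / ".github" / "workflows"
        workflows.mkdir(parents=True)
        (workflows / "ci.yml").write_text("run: ansible-test sanity --docker\n")
        (repo / "tox.ini").write_text("commands = molecule test\n")
        print(find_test_commands(repo))
```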

5. Maintenance

Take a look at the issue tracker and development activity. Finding old issues with no response from maintainers is one sign of a poorly maintained Ansible Collection.

Use Certified Ansible content

Another way to ensure that the collection is of high quality is to use certified content, as it has undergone additional quality assessment and testing and, more importantly, guarantees that the collection is maintained and fully supported by Red Hat and partners. This means that you’ll always have someone to turn to in case of problems.

Learn more about tips for Choosing an Ansible Collection that’s right for you.

2. Use the content appropriately

Choosing the right Ansible Collections is the first step in creating robust automation workflows. Now you have to use that content the right way.

1. Enforce the desired state

Enforcing the desired state has a significant impact on the reliability of our playbooks. This means that we can run our Ansible Playbooks twice in a row without breaking anything on the second run. We can test our playbook after each new task addition by rerunning it. It is also easy to spot when we start to deviate from this best practice.

Most Ansible modules check whether the desired final state has already been achieved, and if it has, they do not perform any actions, so repeating the task doesn’t change the final state. Modules that behave this way are called “idempotent”: the result is the same whether you run a playbook once or multiple times. But beware: not all playbooks and not all modules behave this way. To be safe, test your playbooks in a sandbox environment before running them repeatedly in production.

Note that it’s very important to enforce states at the task level of the playbook. You can do this with the state parameter (e.g., state: present or state: absent), which defines the desired state of a resource.
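Here is a minimal Python sketch of what a state parameter means for a module. manage_package and the in-memory set of installed packages are hypothetical stand-ins for a real package module and host. Running the same task twice shows the idempotent behaviour described above.

```python
def manage_package(installed: set, name: str, state: str = "present") -> dict:
    """Enforce state=present or state=absent for one package name."""
    if state not in ("present", "absent"):
        raise ValueError(f"unsupported state: {state!r}")
    if state == "present" and name not in installed:
        installed.add(name)
        return {"changed": True}
    if state == "absent" and name in installed:
        installed.remove(name)
        return {"changed": True}
    return {"changed": False}  # already in the desired state: do nothing


if __name__ == "__main__":
    host = {"vim"}
    for run in (1, 2):  # run the same "task" twice
        result = manage_package(host, "nginx", state="present")
        print(f"run {run}: changed={result['changed']}")
    # run 1: changed=True
    # run 2: changed=False
```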

2. Use fully qualified collection names

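Ansible Playbooks should always use fully qualified collection names (FQCNs), such as ansible.builtin.copy instead of copy, because this decreases the chance that Ansible will use a different module than we intended, for example when more than one collection provides a module with the same short name.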
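To find tasks that still use short module names, you could scan the parsed playbook. This Python sketch assumes each task is a dictionary as a YAML parser would return it, and that TASK_KEYWORDS (a partial, assumed list of task keywords) covers every non-module key in your tasks. Ansible Lint already ships an FQCN rule, so treat this as an illustration of the idea rather than a replacement.

```python
# Partial, assumed list of task keywords; everything else in a task is
# treated as the module name. Tasks nested inside block/rescue/always
# are not scanned in this sketch.
TASK_KEYWORDS = {
    "name", "when", "loop", "loop_control", "register", "become",
    "become_user", "notify", "tags", "vars", "args", "ignore_errors",
    "changed_when", "failed_when", "delegate_to", "no_log", "environment",
    "until", "retries", "delay", "check_mode", "diff", "run_once",
    "block", "rescue", "always",
}


def module_name(task: dict):
    """Return the module key of a task, or None if none is found."""
    for key in task:
        if key not in TASK_KEYWORDS and not key.startswith("with_"):
            return key
    return None


def non_fqcn_modules(tasks: list) -> list:
    """List module names not written as namespace.collection.module."""
    flagged = []
    for task in tasks:
        module = module_name(task)
        if module is not None and module.count(".") < 2:
            flagged.append(module)
    return flagged


if __name__ == "__main__":
    tasks = [
        {"name": "Copy config", "copy": {"src": "app.conf", "dest": "/etc/app.conf"}},
        {"name": "Install nginx", "ansible.builtin.package": {"name": "nginx"}},
    ]
    print(non_fqcn_modules(tasks))  # ['copy']
```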

3. Document everything and stay organized

When automating with Ansible, be sure to write thorough documentation for your Ansible content, as this will help other users understand what you have done. It’s also important to give your content some structure. If you are developing a role or collection, follow the structure described in the Ansible Documentation. If you are only using Ansible to automate your infrastructure, try to organize your playbooks into logical groups. If your playbooks get too large, split them into multiple files. If you need to automate a complex application, consider grouping your content into roles. Also, putting your Ansible content in a Git repository is probably the easiest way to store your content and track your changes. There you can also provide good documentation in a README file and share the content with the Ansible community.

4. Don’t forget to KISS

When automating with Ansible, it is important to “keep it simple, stupid” (the KISS principle), which means that you should mainly use Ansible modules and simple tasks rather than complex blocks and loops that no one can understand or debug. It is also recommended that you use Jinja templating and filters only when needed and keep inline code to a minimum. When your playbooks become too complex or contain a lot of inline shell and Python commands, think about developing your own custom Ansible module.
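One simple way to spot candidates for a custom module is to look for tasks that run raw commands. The Python sketch below assumes tasks are dictionaries as parsed from YAML; COMMAND_MODULES and command_tasks are illustrative names. A long list of results is a hint, not a verdict: command and shell tasks can still be made idempotent with options such as creates, removes, or changed_when.

```python
# Modules that run arbitrary commands, in short and fully qualified form.
COMMAND_MODULES = {
    "shell", "command", "raw", "script",
    "ansible.builtin.shell", "ansible.builtin.command",
    "ansible.builtin.raw", "ansible.builtin.script",
}


def command_tasks(tasks: list) -> list:
    """Return the names of tasks that call a command-running module."""
    return [task.get("name", "<unnamed task>")
            for task in tasks
            if COMMAND_MODULES.intersection(task)]  # compares against the task's keys


if __name__ == "__main__":
    tasks = [
        {"name": "Reload app", "ansible.builtin.shell": "systemctl reload app && touch /tmp/done"},
        {"name": "Install app", "ansible.builtin.package": {"name": "app"}},
    ]
    print(command_tasks(tasks))  # ['Reload app']
```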

Learn more about Writing reliable Ansible Playbooks.

3. Follow best practices and avoid common mistakes

So, identifying quality content and then taking the right steps to write reliable playbooks should do the trick, right? Well, following all the guidelines still doesn’t guarantee things won’t break. So, what more can you do? Don’t make mistakes. Duh, right? But when you’re familiar with the most common mistakes playbook creators make, it’s easier to avoid them.

1. Run early, run often

When writing Ansible Playbooks, we highly suggest running them after each task you add. Why? Because when we mess things up, Ansible lets us know right away. And since Ansible error messages can get quite lengthy and dense, it helps a lot to know that any new error can only come from the task we just added.

2. Run it again

No, we haven’t lost it: we suggest running playbooks twice in a row to make sure Ansible changes nothing on the second run. Why is this important? Because it forces us to think about the desired state of the target system. Ansible Playbooks that enforce a particular state are far more versatile and robust than their action-executing counterparts.
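You can automate the second-run check by reading the PLAY RECAP at the end of the output. This Python sketch assumes the default output format, in which each recap line looks like web1 : ok=5 changed=0 unreachable=0 failed=0 ... The helper names parse_recap and is_idempotent are hypothetical.

```python
import re

# One PLAY RECAP line, e.g.
# "web1 : ok=5 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0"
RECAP_LINE = re.compile(r"^(\S+)\s+:\s+(\w+=\d+(?:\s+\w+=\d+)*)$")


def parse_recap(output: str) -> dict:
    """Return {host: {counter: value}} from ansible-playbook output."""
    stats = {}
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            host, counters = match.groups()
            stats[host] = {key: int(value) for key, value in
                           (pair.split("=") for pair in counters.split())}
    return stats


def is_idempotent(second_run_output: str) -> bool:
    """True if every host finished the second run with nothing changed or broken."""
    stats = parse_recap(second_run_output)
    return bool(stats) and all(
        counts.get("changed", 0) == 0
        and counts.get("failed", 0) == 0
        and counts.get("unreachable", 0) == 0
        for counts in stats.values())
```

For example, you could capture the second run with subprocess.run(["ansible-playbook", "site.yml"], capture_output=True, text=True), where site.yml is a placeholder, and pass its stdout to is_idempotent.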

3. Check it

OK, yes, we may be fans of checking, as we suggest you run Ansible yet again. But this time, run it in check mode. (We swear this is the last Ansible run we will add to the workflow.) The main idea behind this run is to ensure that no tasks fail in check mode. Why? Because this opens up a lot of new possibilities for reusing playbooks. For example, we can run such playbooks in check mode once per day, and if any of the runs reports a changed task, we know that someone or something has tampered with our system.
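A daily drift check could look like the Python sketch below. It runs ansible-playbook with its --check and --diff options and reports the hosts whose recap shows changed tasks. The playbook and inventory file names are placeholders, and the regular expression assumes the default recap format. You could schedule run_drift_check with cron or your CI system; hosts with failed tasks deserve a separate look.

```python
import os
import re
import subprocess

# Placeholders: point these at your own playbook and inventory.
PLAYBOOK = "site.yml"
INVENTORY = "inventory.ini"

# Start of a PLAY RECAP line, e.g. "web1 : ok=4 changed=1 unreachable=0 ..."
RECAP = re.compile(r"^(\S+)\s+:\s+ok=\d+\s+changed=(\d+)", re.MULTILINE)


def drifted_hosts(check_output: str) -> list:
    """Hosts whose check-mode run predicted changes, i.e. configuration drift."""
    return [host for host, changed in RECAP.findall(check_output) if int(changed) > 0]


def run_drift_check(playbook: str = PLAYBOOK, inventory: str = INVENTORY) -> list:
    """Run the playbook in check and diff mode and return the drifted hosts."""
    result = subprocess.run(
        ["ansible-playbook", "-i", inventory, "--check", "--diff", playbook],
        capture_output=True, text=True, check=False,
        env={**os.environ, "ANSIBLE_NOCOLOR": "1"},  # keep the recap free of color codes
    )
    return drifted_hosts(result.stdout)
```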

4. Lint it

Ansible Lint is, at its core, a collection of rules that playbooks should follow. And yes, some rules are made to be broken. For example, Ansible Lint will warn you if your Jinja expressions do not have spaces inside the curly braces (e.g., {{var}} instead of {{ var }}), and if you prefer styling things your own way, you can always ignore such warnings. But even if stylistic deviations make no difference to the playbook’s functionality, a consistent style in the codebase never hurts.
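To see what the spacing rule is about, here is a simplified Python check that flags {{ not followed by whitespace and }} not preceded by whitespace. It is an illustration written for this post, not Ansible Lint’s actual implementation, and it would also flag whitespace-control forms such as {{- var }}.

```python
import re

# "{{" not followed by whitespace, or "}}" not preceded by whitespace.
OPEN_NO_SPACE = re.compile(r"\{\{(?!\s)")
CLOSE_NO_SPACE = re.compile(r"(?<!\s)\}\}")


def jinja_spacing_issues(text: str) -> list:
    """Return 1-based line numbers with Jinja expressions lacking inner spaces."""
    return [number for number, line in enumerate(text.splitlines(), start=1)
            if OPEN_NO_SPACE.search(line) or CLOSE_NO_SPACE.search(line)]


if __name__ == "__main__":
    playbook_text = 'msg: "Hello {{ user }}"\npath: "/home/{{user}}"\n'
    print(jinja_spacing_issues(playbook_text))  # [2]
```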

But there are other rules that we should never ever ignore. The risky file permissions warning is one such example that reminds us that we should pay attention to security when using modules like copy or template.
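The following Python sketch mimics the idea behind that warning: it flags copy and template tasks that don’t set an explicit mode. It assumes tasks are dictionaries as parsed from YAML and handles both dictionary and key=value string arguments. The names are illustrative, and Ansible Lint’s real rule covers more modules and cases.

```python
# copy and template in short and fully qualified form.
FILE_MODULES = {"copy", "template", "ansible.builtin.copy", "ansible.builtin.template"}


def tasks_missing_mode(tasks: list) -> list:
    """Return names of copy/template tasks that don't set an explicit mode."""
    flagged = []
    for task in tasks:
        for module in FILE_MODULES.intersection(task):
            args = task[module]
            if isinstance(args, dict):
                has_mode = "mode" in args
            else:  # free-form "key=value" arguments
                has_mode = "mode=" in str(args)
            if not has_mode:
                flagged.append(task.get("name", module))
    return flagged


if __name__ == "__main__":
    tasks = [
        {"name": "Deploy config", "ansible.builtin.template": {"src": "app.j2", "dest": "/etc/app.conf"}},
        {"name": "Copy key", "ansible.builtin.copy": {"src": "id", "dest": "/root/id", "mode": "0600"}},
    ]
    print(tasks_missing_mode(tasks))  # ['Deploy config']
```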

Learn more about Avoiding common mistakes in your Ansible Playbooks.

4. Stay on the safe side

We should not forget about security, right? Ansible Playbooks describe infrastructure with code and can contain vulnerabilities that may potentially lead to security breaches. Apart from regular static and dynamic code scanning locally and with CI/CD, it’s important to be aware of common risks and follow best practices for security.

For instance:

apply the principle of least privilege and don’t leave file permissions too open,
don’t run with admin privileges by default (avoid become: yes if you can),
use Ansible Vault to encrypt your variables and other sensitive content,
upgrade your dependencies and Ansible version regularly,
check integrity when downloading from the web (e.g., set gpgcheck: true in the yum_repository module and don’t set disable_gpg_check: true in the yum module, so that GPG signatures are checked),
store your Ansible Playbooks on secure servers,
remove code comments that reveal too much (use name to describe what each task does, but don’t disclose any security information about your infrastructure).
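A few of these checks are easy to automate. The Python sketch below scans a play (a dictionary as parsed from YAML) for become: true, disabled GPG checks, and world-writable octal file modes. security_findings and world_writable are hypothetical helpers, symbolic modes are not parsed, and none of this replaces a proper security review.

```python
import re


def world_writable(mode) -> bool:
    """True for octal modes (int or string) that let 'others' write."""
    if isinstance(mode, int):
        return bool(mode & 0o002)
    if isinstance(mode, str) and re.fullmatch(r"[0-7]{3,4}", mode):
        return bool(int(mode, 8) & 0o002)
    return False  # symbolic modes such as "u=rw,o=r" are not parsed here


def security_findings(play: dict) -> list:
    """Collect simple security findings for one play parsed from YAML."""
    findings = []
    if play.get("become") is True:
        findings.append("play: become enabled for every task")
    for task in play.get("tasks", []):
        name = task.get("name", "<unnamed task>")
        if task.get("become") is True:
            findings.append(f"{name}: uses become")
        for module, args in task.items():
            if not isinstance(args, dict):
                continue  # skip scalar keywords and free-form arguments
            if args.get("gpgcheck") is False or args.get("disable_gpg_check") is True:
                findings.append(f"{name}: GPG signature check disabled ({module})")
            if "mode" in args and world_writable(args["mode"]):
                findings.append(f"{name}: world-writable mode {args['mode']}")
    return findings
```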

So, after following all these steps and guidelines, you should be set, right? Well, we hate to disappoint you, but even all the running and rerunning and checking doesn’t guarantee you’ll catch all potential errors. But don’t get discouraged, we promised to make you a pro, so keep reading.

5. Use Steampunk Spotter

What if we told you that you don’t have to constantly think about all these steps and still make your playbooks shine? That you can have an always present little helper guiding you each step of the way, making sure the quality of your playbooks is the highest? Yeah, it would be really weird if now we’d say JK. But no, we’re serious, there is such a tool, and it’s called Steampunk Spotter.

Steampunk Spotter analyses and enhances your Ansible Playbooks to help you increase the reliability and security of your automation. The tool is currently in beta, but our team is working hard to add a bunch of new features.

Spotter checks the quality of your content, offers recommendations on how to improve your playbooks to avoid troubleshooting, and helps you understand what happens when you run your playbooks, so you don’t have to worry about breaking anything in production. Unlike existing tools that only provide syntax checks, Spotter understands the context of your playbooks and Ansible Collections, which means it knows what you’re trying to accomplish and helps you achieve that goal faster and more safely. Start Spotting!