Manage diagnostic settings & VM boot diagnostics with Azure Automation

Cloud administrators commonly want their environments to be feature-consistent, and there are several ways to achieve that. My preference is Desired State Configuration (DSC), which is part of Azure Automation. My second option is to make sure that ARM templates and Azure DevOps pipelines enforce the required settings. On top of that, Azure Automation can always report any resources that do not comply with your standards. Lately, I have been working with a customer that moved many servers in a lift-and-shift migration. In this kind of scenario, the benefits of DevOps appear when new VMs are provisioned in the cloud, but they do not help existing VMs that have been running in the environment for years.

To address this scenario, we created a script that evaluates resources and, based on their tags, deploys the settings the business requires. This article covers two of them: diagnostic settings and VM boot diagnostics. However, the same concept can be extended to any other requirement in your environment with only a few changes to the code.

How the cloud-operation runbook works

When planning automation for any scenario, we must prepare for exceptions and give administrators a way to adapt the solution to their requirements without changing much at the code level (the runbook). Managing your Azure subscription is no different: although we could create a runbook that enables the same diagnostic settings on all your IaaS (infrastructure-as-a-service) VMs, that would not be practical. For example, you may want to collect logs from IIS and SQL Server machines differently from a regular application server.

The first draft of the script behind this article uses three Azure tags. The first is OperationFlag, a string of digits in which each digit is either 0 (off) or 1 (on). The second is DiagVersion, which holds the version of the diagnostic settings currently being applied to our environment. The third is DiagWorkload, which is reserved for future use and will allow the script to apply a different configuration based on the workload.

The OperationFlag is applied at the Resource Group level, and all resources within that Resource Group receive its settings. If the Resource Group has no OperationFlag, the script stamps one filled with zeros. The OperationFlag can also be set at the resource level, where it overrides any value inherited from the Resource Group.
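The precedence rule above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the author's PowerShell runbook: tags are modeled as plain dictionaries, resolve_operation_flag is a hypothetical helper, and "stamping" the Resource Group is modeled as a write to that dictionary instead of a real tag update in Azure. The two-digit default assumes the two actions of the first draft.

```python
def resolve_operation_flag(rg_tags: dict, resource_tags: dict, digits: int = 2) -> str:
    """Return the OperationFlag that applies to one resource (hypothetical helper).

    - If the Resource Group has no OperationFlag, stamp one filled with zeros.
    - A resource-level OperationFlag overrides the Resource Group value.
    """
    if "OperationFlag" not in rg_tags:
        # Stands in for writing the tag back to the Resource Group in Azure.
        rg_tags["OperationFlag"] = "0" * digits
    return resource_tags.get("OperationFlag", rg_tags["OperationFlag"])


rg_tags = {"OperationFlag": "11"}
print(resolve_operation_flag(rg_tags, {}))                       # 11 (inherited)
print(resolve_operation_flag(rg_tags, {"OperationFlag": "00"}))  # 00 (resource override)
```

Stamping zeros rather than ones keeps the default safe: a Resource Group nobody has tagged yet gets no changes until an administrator opts in.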

The same logic applies to DiagVersion and DiagWorkload: if they do not exist at the resource level, they are added with the current version and the standard DiagWorkload value.
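A minimal sketch of that defaulting step, assuming tags are plain dictionaries. The article does not give the actual values, so "v1" (echoing the version in the rules file name) and "standard" are placeholders chosen for this example, and ensure_diag_tags is a hypothetical helper.

```python
# Hypothetical defaults: the real "current version" and "standard" workload
# values come from the runbook configuration and are not given in the article.
CURRENT_DIAG_VERSION = "v1"
DEFAULT_DIAG_WORKLOAD = "standard"


def ensure_diag_tags(resource_tags: dict) -> list:
    """Add DiagVersion/DiagWorkload only when missing; return the tags that were added."""
    defaults = {"DiagVersion": CURRENT_DIAG_VERSION, "DiagWorkload": DEFAULT_DIAG_WORKLOAD}
    added = []
    for name, value in defaults.items():
        if name not in resource_tags:
            resource_tags[name] = value
            added.append(name)
    return added


tags = {"DiagVersion": "v0"}       # existing values are left untouched
print(ensure_diag_tags(tags))      # ['DiagWorkload']
print(tags)                        # {'DiagVersion': 'v0', 'DiagWorkload': 'standard'}
```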

In the scenario below (a Resource Group containing a virtual machine, a load balancer, and a Key Vault), the first run of the script is expected to perform the following actions:

Both actions are enabled at the Resource Group level (OperationFlag: 11).
The virtual machine resource is evaluated; if either of the two actions is not enabled on the VM, the runbook enables it.
The load balancer resource receives the DiagVersion and DiagWorkload tags. The actions that apply to it are evaluated, and if the resource is not compliant, the runbook configures it accordingly.
The Key Vault resource has both actions disabled because its own resource-level OperationFlag (00) takes precedence.
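The scenario above can be reproduced with a small Python sketch. It is illustrative only: the resource names (vm01, lb01, kv01), the ACTIONS table, and enabled_actions are hypothetical, and the Key Vault override value of 00 is inferred from the fact that both of its actions end up disabled. Boot diagnostics is limited to the Microsoft.Compute/virtualMachines type, which is why the load balancer only gets diagnostic settings.

```python
VM_TYPE = "Microsoft.Compute/virtualMachines"

# Digit position -> (action name, resource types it applies to; None = all types).
ACTIONS = {
    0: ("DiagnosticSettings", None),
    1: ("BootDiagnostics", {VM_TYPE}),
}


def enabled_actions(rg_flag: str, resource: dict) -> list:
    """Return the actions the runbook should enforce on one resource."""
    # A resource-level OperationFlag takes precedence over the Resource Group flag.
    flag = resource.get("tags", {}).get("OperationFlag", rg_flag)
    result = []
    for pos, digit in enumerate(flag):
        name, applies_to = ACTIONS.get(pos, (None, None))
        if name and digit == "1" and (applies_to is None or resource["type"] in applies_to):
            result.append(name)
    return result


resources = [
    {"name": "vm01", "type": VM_TYPE},
    {"name": "lb01", "type": "Microsoft.Network/loadBalancers"},
    {"name": "kv01", "type": "Microsoft.KeyVault/vaults", "tags": {"OperationFlag": "00"}},
]
for res in resources:
    print(res["name"], enabled_actions("11", res))
# vm01 ['DiagnosticSettings', 'BootDiagnostics']
# lb01 ['DiagnosticSettings']
# kv01 []
```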

OperationFlag inner workings

The OperationFlag is the heart of this runbook. It lets the administrator define which actions are performed on the resources. The first draft of the script supports two actions: the first digit configures diagnostic settings for the resources, and the second digit configures boot diagnostics, which applies only to the VM resource type.

Note: The diagnostic settings for a VM are different from those of other resources (a load balancer or a Key Vault, for example).
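Because each digit is simply on or off, an OperationFlag can be validated and decoded by position. The following Python function is a hypothetical sketch of that step, not code from the runbook; it rejects anything other than 0 and 1 so that a typo in a tag does not silently change a resource.

```python
def decode_operation_flag(flag: str) -> dict:
    """Map each digit position to True (1 = on) or False (0 = off)."""
    if not flag or any(ch not in "01" for ch in flag):
        raise ValueError(f"OperationFlag must contain only 0s and 1s: {flag!r}")
    return {pos: ch == "1" for pos, ch in enumerate(flag)}


# Position 0 = diagnostic settings, position 1 = boot diagnostics (VM only).
print(decode_operation_flag("10"))  # {0: True, 1: False}
```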

If a configuration applies to only one resource type, you may reserve a digit just for it. If it is a more generic configuration from an Azure perspective (such as diagnostic settings, represented by the first digit), using a single digit for all resources is the recommended approach.

You may be wondering how the script knows that the second digit applies only to the VM resource type. We tackle this with a JSON file called rules.standard.v1.json (where v1 is the version of the file).

This file is the memory of the runbook. Its first section holds the global values, such as storage accounts, the workspace ID, and the resources used by the script. It is followed by one record for each resource type, listing the cmdlets the runbook uses to check, deploy, and remove each feature.

That answers the question: the per-resource-type records determine which features apply to each type, so the second digit is used only against virtual machine resources.
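To make the idea concrete, here is a Python sketch that loads a rules document and looks up the record for a resource type. The JSON layout, every key name, the storage account name, and the all-zero workspace ID are assumptions for illustration; the real rules.standard.v1.json may be structured differently. The "..." strings stand in for the check, deploy, and remove cmdlets, which are omitted here.

```python
import json

# Hypothetical layout of rules.standard.v1.json; every key name is an assumption.
RULES_JSON = """
{
  "Global": {
    "StorageAccount": "stdiaglogs01",
    "WorkspaceId": "00000000-0000-0000-0000-000000000000"
  },
  "Resources": [
    {
      "ResourceType": "Microsoft.Compute/virtualMachines",
      "S0": {"Check": "...", "Deploy": "...", "Remove": "..."},
      "S1": {"Check": "...", "Deploy": "...", "Remove": "..."}
    },
    {
      "ResourceType": "Microsoft.Network/loadBalancers",
      "S0": {"Check": "...", "Deploy": "...", "Remove": "..."}
    }
  ]
}
"""


def rule_for(rules: dict, resource_type: str):
    """Return the record for a resource type, or None when the type is not managed."""
    for record in rules["Resources"]:
        # Azure resource types are case-insensitive.
        if record["ResourceType"].lower() == resource_type.lower():
            return record
    return None


rules = json.loads(RULES_JSON)
print(sorted(rule_for(rules, "Microsoft.Compute/virtualMachines")))  # ['ResourceType', 'S0', 'S1']
print(rule_for(rules, "Microsoft.KeyVault/vaults"))                  # None
```

Only the VM record carries a second feature entry, so a lookup by resource type is enough to keep a VM-only setting away from other resources.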

Every digit in the OperationFlag is represented in the rules file as S<DigitPosition>. Positions start at 0 to match all the loops within the script and make the code easier to troubleshoot, so the first digit is S0 and the second is S1.
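The digit-to-key mapping can be sketched as follows. This is a hypothetical Python helper, assuming a rules record is a dictionary that contains an S<DigitPosition> key only for the features that apply to its resource type. Digits with no matching key are skipped, which is one way to ensure that a VM-only digit never touches a load balancer.

```python
def applicable_steps(flag: str, record: dict) -> dict:
    """Return {"S<pos>": desired_on} for each digit the rules record defines."""
    steps = {}
    for pos, digit in enumerate(flag):  # positions start at 0, matching S0, S1, ...
        key = f"S{pos}"
        if key in record:               # no S<pos> entry: feature not valid for this type
            steps[key] = digit == "1"
    return steps


vm_record = {"ResourceType": "Microsoft.Compute/virtualMachines", "S0": {}, "S1": {}}
lb_record = {"ResourceType": "Microsoft.Network/loadBalancers", "S0": {}}
print(applicable_steps("10", vm_record))  # {'S0': True, 'S1': False}
print(applicable_steps("10", lb_record))  # {'S0': True}
```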

Another feature of the script is that any text between angle brackets (<>) in the cmdlets is a placeholder that the script replaces with dynamic values at execution time.
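A minimal Python sketch of that substitution, using a regular expression to find each <Name> token. The placeholder name <ResourceId> and the sample resource ID are hypothetical; Get-AzDiagnosticSetting is a real Az.Monitor cmdlet that accepts -ResourceId, but whether the rules file uses it is an assumption. Raising an error on an unknown placeholder is a design choice that avoids running a command that still contains a literal <...> token.

```python
import re

# Matches <Name> placeholders such as <ResourceId> (placeholder names are hypothetical).
PLACEHOLDER = re.compile(r"<([A-Za-z][A-Za-z0-9_]*)>")


def expand_template(template: str, values: dict) -> str:
    """Replace every <Name> in a cmdlet template with values[Name]."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            # Fail loudly instead of running a command with a literal placeholder.
            raise KeyError(f"No value for placeholder <{name}>")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


resource_id = ("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/"
               "RG-MSLAB/providers/Microsoft.Network/loadBalancers/lb01")
print(expand_template("Get-AzDiagnosticSetting -ResourceId <ResourceId>",
                      {"ResourceId": resource_id}))
```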

Running the Azure diagnostic settings script

Now that we have an idea of how the runbook logic is defined, we can run the runbook against a single Resource Group or an entire subscription. To target a single Resource Group, add the -ResourceGroupName parameter to the Get-AzResource cmdlet (line 218 of the script); without it, Get-AzResource returns the resources of the whole subscription.

The query that finds the resources comes from the rules file, so only the resource types defined there are targeted. For each resource, the script compares the state defined by the OperationFlag with the resource's current configuration.
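The scoping described in the two paragraphs above can be sketched in Python as a filter over a resource inventory. This is an illustration only: in the runbook, Get-AzResource does the querying, while here select_resources and the sample inventory (vm01, st01, lb02, RG-OTHER) are hypothetical. The optional resource_group argument plays the role of -ResourceGroupName, and the managed types stand in for the resource types defined in the rules file.

```python
from typing import Optional


def select_resources(resources: list, managed_types: set,
                     resource_group: Optional[str] = None) -> list:
    """Keep resources whose type has a rules record, optionally within one Resource Group.

    Azure resource types and Resource Group names are compared case-insensitively.
    """
    managed = {t.lower() for t in managed_types}
    selected = []
    for res in resources:
        if res["type"].lower() not in managed:
            continue  # type not defined in the rules file
        if resource_group and res["resourceGroup"].lower() != resource_group.lower():
            continue  # outside the requested Resource Group
        selected.append(res)
    return selected


inventory = [
    {"name": "vm01", "type": "Microsoft.Compute/virtualMachines", "resourceGroup": "RG-MSLAB"},
    {"name": "st01", "type": "Microsoft.Storage/storageAccounts", "resourceGroup": "RG-MSLAB"},
    {"name": "lb02", "type": "Microsoft.Network/loadBalancers", "resourceGroup": "RG-OTHER"},
]
rule_types = {"Microsoft.Compute/virtualMachines", "Microsoft.Network/loadBalancers"}
print([r["name"] for r in select_resources(inventory, rule_types)])              # ['vm01', 'lb02']
print([r["name"] for r in select_resources(inventory, rule_types, "rg-mslab")])  # ['vm01']
```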

Any features that need to be added or removed are then applied by the script to bring the resource back to the state configured by its OperationFlag. The output is simple: the script reports a status for each resource it evaluates.
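The compare-and-reconcile step can be sketched as a diff between desired and current state. In this hypothetical Python version, both states are dictionaries keyed by S<DigitPosition>, a missing entry in the current state counts as disabled, and the status text format is invented for illustration; the real runbook would run the matching deploy or remove cmdlets from the rules file.

```python
def plan_changes(desired: dict, current: dict) -> list:
    """Return (step, action) pairs needed to move from current to desired state.

    Both arguments map a step key such as "S0" to True (enabled) or False (disabled).
    """
    changes = []
    for step, want in desired.items():
        have = current.get(step, False)  # not configured counts as disabled
        if want and not have:
            changes.append((step, "Deploy"))
        elif have and not want:
            changes.append((step, "Remove"))
    return changes


def status_line(name: str, changes: list) -> str:
    """Format a one-line status for a resource (format is hypothetical)."""
    if not changes:
        return f"{name}: compliant"
    return f"{name}: " + ", ".join(f"{action} {step}" for step, action in changes)


# A new VM with nothing configured, evaluated against OperationFlag 11.
print(status_line("vm01", plan_changes({"S0": True, "S1": True}, {})))
# vm01: Deploy S0, Deploy S1
```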

For example, a new VM was added to the RG-MSLAB Resource Group, which carries the tags that drive the runbook. Before the script ran, the VM had neither diagnostic settings nor boot diagnostics configured.

After the script ran, the Azure Automation runbook had enabled both diagnostic settings and boot diagnostics on the VM.

The current script and its supporting files are available on GitHub. The script is a work in progress, but it gives cloud administrators a good starting point for addressing common issues across resources, such as Azure diagnostic settings, from a central location with a single runbook.

A future article will explain the script itself. For now, the goal is to understand the business requirement and how to use the Azure diagnostic settings automation script.


About The Author

Anderson Patricio
