Azure Linux VM boot errors due to file system changes: How to fix them

In this article, we go over the steps required to fix a boot error on an Azure Linux VM caused by a file system mount error, and we cover some of the options available to avoid the problem. The content is useful if you are studying for a Linux certification (Red Hat or LFCS) or if you are a cloud administrator managing Linux VMs in Microsoft Azure.

We will use a simple scenario. Our corporation, AP6 Enterprises, is planning an infrastructure to support Linux in Microsoft Azure. Volumes are frequently added, changed, and removed from our Linux VMs.

We are going to keep it simple: create a single volume and attach it to the VM. We went over the entire process step by step in another article, where the main topic was managing individual disks in Azure Linux VMs.

We document only the steps required to create the environment and skip the validation process, because it was covered in the article mentioned above.

The current Linux VM is called srv001. We added a 32 GB disk and labeled it disk000 (for lack of better imagination).

After adding the new disk using the Azure Portal, we performed the following steps to create the new partition. First, we listed all the current disks on the operating system (Item 1) and noticed that the new disk is /dev/sdc (Item 2).

We executed the fdisk utility with the new disk as a parameter (Item 3). We started the creation of a new partition (Item 4), defined it as a primary partition (Item 5), numbered it 1 because it is the first partition on the disk (Item 6), accepted the default value for the first sector (Item 7), and set the size to 10 GB (Item 8). Finally, we wrote all those changes to the disk (Item 9).
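The interactive answers above can also be written down as a short script. This is a minimal sketch, not the exact procedure from the screenshots: the helper name fdisk_answers is hypothetical, lsblk is one possible way to list the disks, and feeding answers to fdisk on standard input assumes a new, empty disk where fdisk asks only these questions.

```shell
#!/bin/sh
# Print the answers fdisk expects, one per line, in the order described above:
#   n     new partition
#   p     primary
#   1     partition number
#   ''    empty line = accept the default first sector
#   +10G  partition size
#   w     write the partition table and exit
fdisk_answers() {
    printf '%s\n' n p 1 '' +10G w
}

# Show the answers without touching any disk.
fdisk_answers

# On the VM itself (destructive, run as root, double-check the device first):
#   lsblk                            # list disks and identify the new one
#   fdisk_answers | fdisk /dev/sdc   # /dev/sdc is the new disk in this scenario
```

Keeping the destructive call commented out is deliberate: the device name should always be confirmed by hand before a partition table is written.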

The next step is to format the newly created partition (Item 1) and create the folder where we will mount it (Item 2). This folder is the path the application will use to store its data.

A simple test is to mount the partition manually before committing the change to the /etc/fstab file. One important note to remember is that the /dev/sdX name (where X is the letter associated with the disk) may change if you move your disks around. It is highly recommended to reference the UUID instead of the /dev/<disk/partition> address.
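The format, mount-point, and test-mount steps could look like the sketch below. The article does not say which file system type was used, so xfs is an assumption, and the run and prepare_mount helper names are hypothetical; /dev/sdc1 and /mnt/fs come from this scenario. By default the script only prints the commands (DRY_RUN=1), so nothing is formatted by accident.

```shell
#!/bin/sh
DEVICE="${DEVICE:-/dev/sdc1}"        # partition created in the previous step
MOUNT_POINT="${MOUNT_POINT:-/mnt/fs}"
FS_TYPE="${FS_TYPE:-xfs}"            # assumption: not stated in the article

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_mount() {
    run mkfs -t "$FS_TYPE" "$DEVICE"     # format the new partition
    run mkdir -p "$MOUNT_POINT"          # create the folder used as mount point
    run mount "$DEVICE" "$MOUNT_POINT"   # test mount before editing /etc/fstab
}

prepare_mount
```

To apply the steps for real, run the script as root with DRY_RUN=0 after reviewing the printed commands.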

After mounting the disk, run blkid (Item 1), copy the UUID (Item 2) of the desired partition, and use that value in the /etc/fstab file.
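As an illustration, the entry can be generated instead of typed by hand, which avoids copy errors in the UUID. This is a sketch under assumptions: fstab_line is a hypothetical helper, the UUID in the demo is made up, xfs is assumed as the file system type, and the trailing 0 and 2 (dump and fsck pass) are common choices for a data disk rather than values taken from the article.

```shell
#!/bin/sh
# fstab_line UUID MOUNT_POINT FS_TYPE [OPTIONS]
# Prints one /etc/fstab entry: device, mount point, type, options, dump, fsck pass.
fstab_line() {
    printf 'UUID=%s  %s  %s  %s  0  2\n' "$1" "$2" "$3" "${4:-defaults}"
}

# Demo with a made-up UUID (not from a real disk).
fstab_line "11111111-2222-3333-4444-555555555555" /mnt/fs xfs

# On the VM, roughly (run as root):
#   uuid=$(blkid -s UUID -o value /dev/sdc1)   # print only the UUID value
#   cp -p /etc/fstab /etc/fstab.bak            # keep a backup first
#   fstab_line "$uuid" /mnt/fs xfs >> /etc/fstab
```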

The result is summarized in the picture below. We start by checking the content of the /etc/fstab file (Item 1), where we can see the newly added line (Item 2). After restarting the machine, we check all mounted file systems using df -h (Item 3). Indeed, we can see that /dev/sdc1 is mounted on /mnt/fs.

We are introducing chaos!

The second law of thermodynamics says that the entropy of an isolated system never decreases, meaning things tend to move from order to disorder. Let’s assume that our Linux administrator had that mindset and changed the /etc/fstab line that specifies our new disk to a nonexistent value. Another possible way to cause disruption is to detach the disk from the Azure VM.

Solution 1: Fixing the /etc/fstab

By default, when a file system listed in /etc/fstab cannot be mounted, the boot process reports an error and the system automatically drops into emergency mode, a minimal maintenance shell. If you check the console of the Linux server, you will be asked for the root password to continue.

If you don’t have the root password, check out my previous article that goes over this exact scenario in Azure Linux VMs.

After authenticating as root, edit /etc/fstab and fix the problem. There are a couple of ways to tackle the issue. First, if the disk no longer exists (a detached disk, for example), delete or comment out the line. Second, if the disk exists and the entry contains a typo, fix it and save the file.
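For the first case, commenting out the entry can be scripted so that a backup is always kept. The sketch below is an assumption about how one might do it, not the method shown in the screenshots: comment_out_mount is a hypothetical helper, and the demo runs on a throwaway file with made-up entries.

```shell
#!/bin/sh
# comment_out_mount FSTAB MOUNT_POINT
# Keeps a backup in FSTAB.bak, then comments out every active entry whose
# second field (the mount point) equals MOUNT_POINT.
comment_out_mount() {
    cp -p "$1" "$1.bak" || return 1
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp { print "# " $0; next }
        { print }
    ' "$1.bak" > "$1"
}

# Demo on a throwaway file, not on the real /etc/fstab.
demo=$(mktemp)
printf '%s\n' \
    'UUID=sample-root-uuid / xfs defaults 0 0' \
    'UUID=Xbad-uuid /mnt/fs xfs defaults 0 2' > "$demo"
comment_out_mount "$demo" /mnt/fs
cat "$demo"
rm -f "$demo" "$demo.bak"

# In the emergency shell, roughly:
#   mount -o remount,rw /     # only needed if the root file system is read-only
#   comment_out_mount /etc/fstab /mnt/fs
```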

When the changes are complete, run systemctl reboot to restart the Linux server. If there are no additional errors in /etc/fstab, the system will start normally.

Solution 2: Avoiding the problem with the nofail parameter

You may want to avoid the problem before it ever happens. If you use Red Hat or any other distribution that supports nofail or something similar, you can adopt this method as a standard when adding entries to the /etc/fstab file.

On Red Hat, add nofail to the options column, right after defaults and separated by a comma. In the image below, we added an X to the UUID to break the entry and restarted the Linux server. The server restarted without any issues; the only problem, for obvious reasons, is that the file system was not mounted.
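To make the change concrete, the sketch below appends nofail to the options of one entry. The add_nofail helper is hypothetical and the UUID is made up; the function prints the result to standard output instead of editing the file, so the new content can be reviewed before it replaces /etc/fstab.

```shell
#!/bin/sh
# add_nofail FSTAB MOUNT_POINT
# Prints FSTAB to standard output with ",nofail" appended to the options
# (fourth field) of the active entry for MOUNT_POINT, unless already present.
add_nofail() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp && ("," $4 ",") !~ /,nofail,/ { $4 = $4 ",nofail" }
        { print }
    ' "$1"
}

# Demo on a throwaway file with a made-up UUID.
demo=$(mktemp)
echo 'UUID=11111111-2222-3333-4444-555555555555 /mnt/fs xfs defaults 0 2' > "$demo"
add_nofail "$demo" /mnt/fs
# prints: UUID=11111111-2222-3333-4444-555555555555 /mnt/fs xfs defaults,nofail 0 2
rm -f "$demo"
```

Note that awk rewrites a modified line with single spaces between fields, which is still valid for /etc/fstab.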

This workaround does not solve the application's problem: it still cannot use the data until the file system is mounted. However, it is then just a matter of fixing /etc/fstab on a live system, which is much easier on the operations side than recovering a server that does not boot.

After fixing the /etc/fstab file, just run mount -a. The system rereads the file and mounts any listed file system that is not mounted yet. If the command completes without errors, the same entries should also mount cleanly at the next restart.
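Before running mount -a or restarting, a quick sanity check can catch the kind of UUID typo used in this article. The following is a sketch under assumptions: check_fstab_uuids is a hypothetical helper that only looks at UUID= entries and compares them with the names under /dev/disk/by-uuid; the demo uses a temporary folder with made-up UUIDs instead of the real system.

```shell
#!/bin/sh
# check_fstab_uuids FSTAB [BY_UUID_DIR]
# Reports every UUID= entry whose UUID has no matching name in BY_UUID_DIR.
# Returns 0 when all UUIDs are found and 1 otherwise.
check_fstab_uuids() {
    fstab="$1"
    by_uuid="${2:-/dev/disk/by-uuid}"
    missing=0
    while read -r spec mount_point _rest; do
        case "$spec" in
            UUID=*)
                uuid="${spec#UUID=}"
                if [ ! -e "$by_uuid/$uuid" ]; then
                    echo "missing: $uuid ($mount_point)"
                    missing=1
                fi
                ;;
        esac
    done < "$fstab"
    return "$missing"
}

# Demo with a fake by-uuid folder and made-up UUIDs.
work=$(mktemp -d)
mkdir "$work/by-uuid"
: > "$work/by-uuid/good-uuid"
printf '%s\n' \
    'UUID=good-uuid / xfs defaults 0 0' \
    'UUID=Xbad-uuid /mnt/fs xfs defaults,nofail 0 2' > "$work/fstab"
check_fstab_uuids "$work/fstab" "$work/by-uuid" || echo "fix the file before the next restart"
rm -rf "$work"

# On the VM, roughly (run as root):
#   check_fstab_uuids /etc/fstab && mount -a
# Recent util-linux versions also offer "findmnt --verify" for a broader check.
```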