PowerShell for Storage and File System Management (Part 3)


In the previous article, I explained that before we can develop a PowerShell script for monitoring our storage, we need to figure out what our objectives are. We also need to determine PowerShell’s capabilities. Otherwise, we could end up with objectives that don’t match what can realistically be achieved through PowerShell.

So what types of useful things might we do with our script? Well, we could monitor storage capacity to make sure that we aren’t running out of space. It would also be a good idea to make sure that the disk is healthy. Finally, we should make sure that there are no problems with the disk that might signal an impending failure.

Of course this raises the question of whether a disk being healthy and being free of problems is the same thing. Believe it or not, a disk can have problems and still be listed as healthy. A disk’s health is determined at the OS level, not at the hardware level. In order to determine a disk’s true state, we need to check the health as reported by the OS, but we also need to perform a hardware check to look for problems.

If we are to build a PowerShell tool for monitoring storage health, then there are three things that we are going to need to do. First, we have to configure the tool to check the state of the disk. In other words, we must decide what criteria need to be evaluated and then build a script to check the disk based on those criteria.

The second thing that we will have to do is to schedule the script to run on a periodic basis. Manually running a script to see if all of a server’s disks are healthy is fine for a one-off situation, but our tool would be much more useful as an automated monitoring solution. Thankfully, this is relatively easy to do.
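As a rough illustration of the scheduling step, here is a minimal sketch that uses the built-in ScheduledTasks cmdlets. The script path, task name, and 6 AM daily trigger are all assumptions made up for this example, and the commands must be run from an elevated session.

```powershell
# Hypothetical path; point this at wherever the monitoring script is saved.
$scriptPath = 'C:\Scripts\Check-DiskHealth.ps1'

# Launch PowerShell and run the script without loading a user profile.
$action = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument "-NoProfile -File `"$scriptPath`""

# Assumed schedule: once a day at 6 AM.
$trigger = New-ScheduledTaskTrigger -Daily -At 6am

# Register the task under the SYSTEM account (the task name is made up).
Register-ScheduledTask -TaskName 'Disk Health Check' -Action $action `
    -Trigger $trigger -User 'SYSTEM' -RunLevel Highest
```

A daily trigger is only one option; New-ScheduledTaskTrigger also supports other schedules such as -Weekly or -AtStartup.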

Finally, we need to produce alerts in the event that a disk’s health is not what it should be. This is easier said than done, but Windows does give us everything that we need.
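One possible way to raise an alert is to write a warning to the Application event log, where other tools can pick it up. The sketch below is only an assumption about how that might look, not the approach this series settles on. The source name and event ID are invented for the example, and New-EventLog and Write-EventLog are available in Windows PowerShell 5.1 but not in PowerShell 7.

```powershell
# Assumption: 'DiskHealthMonitor' is a made-up event source name for this sketch.
$source = 'DiskHealthMonitor'

# Registering a source requires an elevated session and only has to happen once.
if (-not [System.Diagnostics.EventLog]::SourceExists($source)) {
    New-EventLog -LogName Application -Source $source
}

# Find any disk whose OS-level health status is something other than Healthy.
$unhealthy = Get-PhysicalDisk | Where-Object { $_.HealthStatus -ne 'Healthy' }

foreach ($disk in $unhealthy) {
    # Event ID 1001 is arbitrary; pick whatever fits your own conventions.
    Write-EventLog -LogName Application -Source $source -EntryType Warning `
        -EventId 1001 `
        -Message "Disk '$($disk.FriendlyName)' reports health status '$($disk.HealthStatus)'."
}
```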

OK, so let’s get started. In the interest of simplicity, we will start out by building a script that works on a single computer. Once we have got a working script we will extend it so that it can monitor remote servers.

I think that the most important thing to check is storage health, so let’s start there. For our purposes, I want to build the script to check:

Disk health as seen by the OS
Read Errors (total)
Write Errors (total)
Temperature
SMART status

In case you are wondering, I haven’t forgotten about capacity checks. It’s just that capacity checks need to be performed at the volume level rather than the disk level (a single disk can contain multiple volumes). Therefore, I want to deal with capacity checks separately.
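To show what a volume-level check might look like, here is a minimal sketch built on Get-Volume. The calculated column names (SizeGB, FreeGB, PercentFree) are my own labels for this example, and the filter that skips volumes without a drive letter is an assumption about what is worth monitoring.

```powershell
# List each lettered volume with its size and free space.
Get-Volume |
    Where-Object { $_.DriveLetter -and $_.Size -gt 0 } |
    Select-Object DriveLetter, FileSystemLabel,
        @{ Name = 'SizeGB';      Expression = { [math]::Round($_.Size / 1GB, 1) } },
        @{ Name = 'FreeGB';      Expression = { [math]::Round($_.SizeRemaining / 1GB, 1) } },
        @{ Name = 'PercentFree'; Expression = { [math]::Round(100 * $_.SizeRemaining / $_.Size, 1) } }
```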

With regard to disk health, there are plenty of other things that we could potentially check if we wanted to. For example, we could design a script that differentiates between corrected and uncorrected read errors. Since I am building this tool as an educational exercise rather than developing a commercial product, I am going to stick to looking at the criteria listed above.

So the next question becomes, how do we retrieve this information? It is easy to get everything except for the disk’s SMART status. In fact, if I were to omit the SMART status, I could get everything that I need in two lines of code. Even so, it’s important to know if a disk is about to fail, so we really should be checking the SMART status.

Getting SMART status information isn’t overly difficult. However, that information must be retrieved using a different method than is used for acquiring the other information. This isn’t any big deal if the system only has one disk, but if a system has multiple disks then things get tricky. The reason for this is that the two different methods that we are going to need to use have completely different ways of identifying disks. Disks can be identified by friendly name, number, unique identifier, or instance name.

I’m actually getting a little bit ahead of myself. Before I delve too deeply into talking about disk identification, let’s take a look at how to get the health information for a single disk. Initially I will deal with the SMART status separately, but then we will work it into a single script later on.

The first thing that we have to do is to retrieve a list of our physical disks. You can do this by entering the Get-PhysicalDisk command. As you can see in Figure A, this command lists each of the system’s disks and its health status.

Figure A: The Get-PhysicalDisk command lists each disk in the system.

If we wanted to examine a single disk in more detail, we could append the -FriendlyName parameter and the disk’s Friendly Name. For example, we might type:

Get-PhysicalDisk -FriendlyName PhysicalDisk4
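By default the output is a short table, so seeing more detail usually means asking for specific properties. As a small sketch, the command below pipes the same disk into Select-Object. PhysicalDisk4 is simply the example name used above, and the list of properties is my own selection.

```powershell
# Show identifying and health-related properties for one disk.
Get-PhysicalDisk -FriendlyName 'PhysicalDisk4' |
    Select-Object FriendlyName, SerialNumber, MediaType, BusType, Size,
        HealthStatus, OperationalStatus
```

To see every property that the disk object exposes, pipe to Format-List * instead.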

For right now we will stick to collectively examining all of the disks. With this in mind, we can get most of the information that was listed earlier by delving into the Storage Reliability Counter. This is surprisingly easy to do. You could retrieve all of the previously listed information (except for the disk’s SMART status) by using the following two commands:

Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus

Get-PhysicalDisk | Get-StorageReliabilityCounter | Select-Object ReadErrorsTotal, WriteErrorsTotal, Temperature

You can see what this looks like in Figure B.

Figure B: Here is the health information for the disks.

As you look at the figure above, you will probably notice two things. First of all, one of the disks is producing a lot of read errors. Second, there isn’t an easy way of telling which disk is having the problem. That’s what I meant when I said that we were going to have to do some work with regard to disk identities. I will get to that in the next article. In the meantime, however, we have a disk that is having some problems, so we had better check the disk’s SMART status. We can do so by using this command:

Get-WmiObject -Namespace root\wmi -Class MSStorageDriver_FailurePredictStatus | Select-Object InstanceName, PredictFailure, Reason

As you can see in Figure C, SMART is not currently predicting a disk failure, so that is certainly good news.

Figure C: SMART is not predicting an imminent disk failure.
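Get-WmiObject is not available in PowerShell 7, so on newer systems the same class can be queried through Get-CimInstance instead. The sketch below is an assumed variation rather than the command used for the figure. It also filters the output so that only disks with a predicted failure are returned. It needs an elevated session, and not every storage driver exposes this class.

```powershell
# Query the SMART failure prediction class through CIM.
Get-CimInstance -Namespace root\wmi -ClassName MSStorageDriver_FailurePredictStatus |
    # Keep only the disks for which SMART predicts a failure.
    Where-Object { $_.PredictFailure } |
    Select-Object InstanceName, PredictFailure, Reason
```

If no disk is predicted to fail, the command returns nothing at all, which is convenient when the result feeds an alert.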

One last thing that I want to point out in the figure above is that the disks are being identified in yet another way. There is no real consistent method for identifying disks. As such, we are going to have to be creative with regard to how we use these commands within a single script.
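As a first step toward that, here is a minimal sketch of one way to keep each disk’s name next to its reliability counters by handling the disks one at a time. It is an illustration under my own assumptions and not necessarily the technique that the next article will use. It also leaves the SMART data out, because matching an InstanceName to a physical disk is a separate problem.

```powershell
# Build one combined record per disk so the counters can be traced to a disk.
foreach ($disk in Get-PhysicalDisk) {
    # Fetch the reliability counters for this specific disk only.
    $counter = $disk | Get-StorageReliabilityCounter

    [pscustomobject]@{
        FriendlyName     = $disk.FriendlyName
        SerialNumber     = $disk.SerialNumber
        HealthStatus     = $disk.HealthStatus
        ReadErrorsTotal  = $counter.ReadErrorsTotal
        WriteErrorsTotal = $counter.WriteErrorsTotal
        Temperature      = $counter.Temperature
    }
}
```

Some disks and controllers do not report every counter, so expect blank values in places.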

Conclusion

In this article, I have shown you some techniques for retrieving basic health information for a disk. In the next article, we can begin working toward building a script for monitoring storage health.
