SOS for SSDs: How to avoid solid-state drive firmware failure

A few months ago, a colleague alerted me to a critical bulletin released by the Hewlett Packard Enterprise Support Center. The bulletin warned about a firmware defect detected in certain models of solid-state drives (SSDs) used in several different HPE systems and appliances. The title of the bulletin was difficult enough to parse at first: “HPE SAS Solid State Drives - Critical Firmware Upgrade Required for Certain HPE SAS Solid State Drive Models to Prevent Drive Failure at 32,768 Hours of Operation.” The bulletin was originally released in November and has since been updated four times, most recently late last month. You can read the full bulletin here.

What it all boils down to is that if you purchased one of the affected HPE systems and turned it on, you can expect the SSD in it to fail catastrophically after exactly 32,768 hours of operation, which works out to 3 years, 270 days, and 8 hours of power-on time. Well, at least it’s nice to know when something is going to fail so you won’t have your socks knocked off when it happens.
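The arithmetic behind that figure is easy to check, and the same calculation can tell you how much runway a given drive has left. The following is a minimal sketch rather than a vendor tool: the function names, the fleet dictionary, and the 2,000-hour warning window are all assumptions made for illustration, and you would need to supply each drive's power-on hours from whatever monitoring you already have.

```python
from datetime import datetime, timedelta

# Failure threshold quoted in the HPE bulletin (numerically equal to 2**15).
FAILURE_HOURS = 32_768


def split_hours(total_hours):
    """Break a number of hours into (years, days, hours) using 365-day years."""
    days, hours = divmod(total_hours, 24)
    years, days = divmod(days, 365)
    return years, days, hours


def hours_remaining(power_on_hours):
    """Hours left before the threshold; 0 if the drive is already past it."""
    return max(FAILURE_HOURS - power_on_hours, 0)


def projected_failure(power_on_hours, now):
    """Estimated failure time, assuming the drive stays powered on from now."""
    return now + timedelta(hours=hours_remaining(power_on_hours))


def drives_at_risk(fleet, warn_within_hours=2_000):
    """Names of drives within warn_within_hours of the threshold.

    fleet is a hypothetical mapping of drive name -> power-on hours.
    """
    return sorted(
        name
        for name, power_on_hours in fleet.items()
        if hours_remaining(power_on_hours) <= warn_within_hours
    )


if __name__ == "__main__":
    years, days, hours = split_hours(FAILURE_HOURS)
    print(f"{FAILURE_HOURS} hours = {years} years, {days} days, {hours} hours")
    print(drives_at_risk({"bay1": 31_000, "bay2": 1_000}))
```

Running a check like this on a schedule would flag affected drives well before they reach the threshold, which leaves time to plan the firmware update instead of reacting to an outage.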

Of course, faulty firmware isn’t the only thing that can cause problems with SSDs. It’s well known that even SSDs that have seen only minimal use can fail suddenly and unexpectedly under certain kinds of workloads. With hard disk drives (HDDs), at least you could get SMART errors warning you that your drive was in danger of bugging out pretty soon. SSDs, on the other hand, can fail prematurely without generating any SMART error conditions. Still, the incredible speed advantage that SSDs have over slower “spinning rust” has led many companies to migrate much of their storage from HDDs to SSDs where their budget has allowed. And SSD prices continue to fall and are fast approaching parity with the cost of HDDs.

But the question remains: How can you prepare your datacenter so a firmware issue like this won’t take down your servers and other appliances? I talked with several colleagues about this and have distilled their consensus below as a series of best practices or tips you should follow.

Sign me up!

The first thing you should do if you use SSDs, or you have systems or appliances deployed that have SSDs within them, is to sign up for your vendor’s support alerts mailing list if they have one. And don’t purchase anything from a vendor that doesn’t offer a mailing list that provides alerts concerning issues with their products. Unfortunately, with some vendors it can be hard to find out where you can sign up for these kinds of support alerts or bulletins. For example, HP lets you sign up for Driver and Support eAlerts on this page, as well as other announcements that are more marketing oriented, simply by specifying your name, company, and email address. Dell lets you subscribe here to receive driver and firmware update notifications, but this requires that you first create a Dell account on MyAccount. For other vendors, however, you either have to Google for terms like “support bulletins” or “subscribe to alerts,” or just go digging around on their website for information on how to subscribe (and whether they even have a list you can subscribe to).

Befriend your TAM

If you are an enterprise customer, then you have probably been assigned a technical account manager, or TAM, who works at the vendor and whose job is to help you get answers when you need them (and convince you to buy more of their products). My advice is that you try to build a good working relationship with your TAM and not just treat them as another grasping appendage of your vendor’s sales department. A good TAM can be a lifesaver in many kinds of difficult situations. A TAM you feel comfortable talking with, and who feels comfortable reaching out to you without worrying that they’re intruding or being too pushy, is just the person you need on your side when something like a critical firmware problem is discovered in one of their products. Ask your TAM to notify you if anything like this comes up on their radar, and tell them that you’d appreciate them texting or calling you directly without delay if it does. A good TAM can not only warn you when there’s a firmware issue but can also help you find and possibly even deploy the needed firmware update once your vendor has released it. Or at least your TAM can connect you with someone on your vendor’s support team who actually knows their game and is not just following a script that was provided to them.

Make regular backups

My final word of advice should be a no-brainer, as it applies to anything in computing or networking that is storage-related: make sure you regularly back up the storage on all your systems. With server systems, this should be straightforward and there’s no need to discuss it any further. Network appliances are a different kettle of fish, however, because some of them may have SSD storage embedded within them but may not expose that storage externally, except perhaps to the vendor’s own authorized support personnel. In such cases, you may need to build some kind of load balancing capability into where your device is positioned on your network, so that if the device unexpectedly fails, its workload can be handled by another device on your network. But don’t forget the importance of doing backups wherever they are possible.
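To make the idea of handing a failed appliance's workload to another device a little more concrete, here is a minimal sketch of a health check with failover. It is a simple first-healthy-device selection rather than a real load balancer, and the function names, host names, and port numbers are hypothetical. It assumes the appliance exposes some TCP service that can be probed.

```python
import socket


def tcp_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def pick_active(devices, is_alive=tcp_alive):
    """Return the first (host, port) that passes the health check, or None.

    devices is an ordered list of (host, port) pairs, primary first.
    """
    for host, port in devices:
        if is_alive(host, port):
            return host, port
    return None


if __name__ == "__main__":
    # Hypothetical primary and standby appliances.
    candidates = [("appliance-a.example.internal", 443),
                  ("appliance-b.example.internal", 443)]
    print(pick_active(candidates))
```

In practice this job is usually done by a dedicated load balancer or the appliance vendor's own high-availability feature, so treat the sketch as an illustration of the logic, not a replacement for either.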

Mitch Tulloch

Mitch Tulloch is Senior Editor of both WServerNews and FitITproNews and is a widely recognized expert on Windows Server and cloud technologies. He has written more than a thousand articles and has authored or been series editor for over 50 books for Microsoft Press and other publishers. Mitch has also been a twelve-time recipient of the Microsoft Most Valuable Professional (MVP) award in the technical category of Cloud and Datacenter Management. He currently runs an IT content development business in Winnipeg, Canada.
