Data Tiering Strategies
Let's start with a quick review of what we've learned so far in this series:
- Storage sprawl happens when each server in your organization has its own direct-attached storage (DAS). The solution to storage sprawl is storage consolidation. This is best accomplished by migrating server data from DAS to SAN.
- Storage overprovisioning is when you buy more storage hardware than you need. This often happens in environments where DAS is used for server storage, and the result can be both underutilization (wasted storage space) and application instability due to saturation of storage devices. The solution to storage overprovisioning is a combination of storage consolidation and data tiering.
- Data tiering is the practice of storing your business data on storage devices appropriate for each type of data. This means storing hot data on Tier 1 storage that is fast, highly reliable, always available and easy to search; cool data on Tier 2 storage that is slower, has acceptable reliability, is not mirrored or replicated, and is slower to search; and cold data on Tier 3 storage for archival and regulatory purposes. Tier 1 typically uses enterprise-class drives, which can be either HDDs or SSDs. Tier 2 typically uses low-cost, large-capacity commodity HDDs. And Tier 3 typically uses tape.
Benefits of data tiering
Data tiering can provide a number of different benefits for your company. The biggest of these is obviously that you can save money. For example, let's say you've determined that your business data totals 20 TB and breaks down as follows:
- 10% of your business data is hot data (valuable data that has been recently acquired and might be needed at a moment's notice by your business applications).
- 50% of your business data is cool data (stale data that is less valuable and might be needed only occasionally by business applications).
- 40% of your business data is cold data that is no longer needed but should be stored in archive form in case the regulators ever ask for it.
Assuming you consolidate your data onto SAN storage to reduce storage sprawl and prevent storage overprovisioning, here's how the cost might work out if you don't implement storage tiering (i.e. if you use only a single tier for storage):
Figure 1: Our cost is $40,000 if we use only a single tier for data storage.
Now let's see if we can save any money by implementing data tiering as described above:
Figure 2: Implementing data tiering cuts our storage costs by half!
Of course the figures used above for cost are only approximate and will depend on the actual storage hardware used, but I think you get the basic idea that storage tiering can save you money!
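To make the arithmetic behind Figures 1 and 2 concrete, here is a minimal Python sketch using the 20 TB breakdown from above. The Tier 1 price of $2,000 per TB simply follows from the $40,000 single-tier total; the Tier 2 and Tier 3 prices are hypothetical numbers I picked so that the tiered total comes out at half, so treat them as placeholders rather than real hardware quotes.

```python
# Rough cost comparison: single-tier storage versus three-tier storage.
# The Tier 2 and Tier 3 prices are hypothetical placeholders, not vendor quotes.

TOTAL_TB = 20

# Share of business data by temperature, in percent (from the example above).
SHARE_PERCENT = {"hot": 10, "cool": 50, "cold": 40}

# Which tier each kind of data lands on.
TIER_FOR = {"hot": "tier1", "cool": "tier2", "cold": "tier3"}

# Assumed cost in dollars per TB for each tier.
COST_PER_TB = {"tier1": 2000, "tier2": 1000, "tier3": 750}


def single_tier_cost(total_tb, cost_per_tb):
    """Cost when every terabyte sits on Tier 1 storage."""
    return total_tb * cost_per_tb["tier1"]


def tiered_cost(total_tb, share_percent, cost_per_tb):
    """Cost when each kind of data sits on its own tier."""
    total = sum(
        total_tb * percent * cost_per_tb[TIER_FOR[kind]]
        for kind, percent in share_percent.items()
    )
    return total / 100  # convert from percent back to whole units


if __name__ == "__main__":
    print("Single tier:", single_tier_cost(TOTAL_TB, COST_PER_TB))
    print("Three tiers:", tiered_cost(TOTAL_TB, SHARE_PERCENT, COST_PER_TB))
```

Plug in your own per-TB prices and data breakdown to see how the savings change for your environment.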
Some of the other benefits of storage tiering include:
- Improved performance for Tier 1 business-critical applications. By greatly reducing the amount of data stored on Tier 1 storage you can improve the responsiveness for applications that work with such data. In addition, the money you save through data tiering can allow you to purchase faster enterprise-class hard drives for Tier 1 storage and implement other strategies like data replication to ensure uptime for your Tier 1 applications.
- Improved backup performance. By migrating much of your data to Tier 2 storage you can speed up the backup process for Tier 1 storage and perform less frequent backups for Tier 2 storage. You also make it easier to restore data from backup.
- Improved search. By keeping only hot data on Tier 1 you can make search more efficient for Tier 1 applications.
Why data tiering isn't implemented
If data tiering is so great, why isn't it implemented more often in business environments? There are several reasons for this, some of which I've described in the previous article of this series. Reasons for not implementing data tiering can include:
- Laziness. Someone once said that execution requires both planning and discipline. Unfortunately planning requires effort, and discipline is basically effort that is consistently applied over time.
- Busyness. Sometimes it's not that we're lazy as IT pros but simply too busy to carefully plan and consistently implement a good solution for a problem. The way around this obstacle is to learn how to prioritize your use of time effectively, and a great help for me in this regard has been Stephen Covey's four quadrants. For a quick and helpful overview see this article from White Dove Books, but you should get Covey's book if you really want to master time management.
- Fear. I've already talked about how fear of data loss makes it difficult to make infrastructure changes in an IT environment. That's because change always entails some degree of risk, and if you're afraid of making changes because you fear losing valuable business data (and hence your job) then you've got to learn how to manage risk rationally and deal with the often conflicting demands of upper management.
- Politics. Another hindrance to making infrastructure changes can be politics, especially in large enterprises where different groups own the compute, network and storage resources. You must strive to overcome such a silo mentality by clearly demonstrating the potential benefits that can be achieved through implementation of a data tiering solution.
- Ignorance. No one likes to admit they're wrong or that they don't know something or aren't familiar with a technology or practice. IT pros are no different from anyone else in this regard. If you don't know how a SAN works or how to determine which types of data should be allocated to each storage tier, you simply need to start learning. And yes, learning takes time and effort, so don't be lazy!
Strategies for implementing data tiering
At a basic level, developing and implementing a data tiering strategy is just like any other IT project and involves four things: scope, resources, schedule, and risk:
Figure 3: The four elements of any project.
This means you need to start by clearly articulating your objectives and budget. You assemble a team and assign responsibilities. The team works out the steps involved, identifies any risks associated with each step, and devises strategies to mitigate these risks. Once the project plan is finished, you perform tests in a safe environment. If the tests pass, you then pilot the project using low-risk applications and data. If the pilot is successful, you begin rolling out the changes in an orderly way, assessing each phase of the implementation. At that point you're almost done, but you still need to make sure everything is properly documented and that appropriate support is in place in case something goes wrong in the future.
After all this, your hope is that your success will be recognized and rewarded by management, but the usual reward is simply more work and being assigned other projects! Oh well, that's IT for you...
Anyway, some strategies and tips relating specifically to data tiering projects include the following:
- Identify the different types of data processed by business applications in your environment. Examples might include documents, presentations, media, email, customer data, product data, financial data, and so on. Some of this will be in the form of files (.docx, .xlsx, .pptx, etc.) and others in structured database format.
- Determine which storage tier each type of data best belongs to. The tier for data of a particular type may also depend on such factors as when the data was created or last accessed, who created the data, and other considerations. For example, the .pptx files for a presentation that will be given at next week's company meeting might be considered hot data, while .pptx files created more than 6 months ago might be considered cool data and those created more than two years ago might be considered cold data.
- Decide whether you will implement manual tiering or use some form of automation to move stale data from Tier 1 to Tier 2 and from Tier 2 to Tier 3. If you plan on consolidating your existing DAS storage onto a SAN then you should verify that the SAN you are thinking of purchasing includes some form of dynamic data tiering functionality. Otherwise you might need to cobble together your own auto-tiering solution using scripts.
- Deploy and configure your Tier 1, 2 and 3 storage devices. For SAN environments you might have both Tier 1 and 2 on the same SAN but use different types of drives for the logical unit numbers (LUNs) assigned to each storage tier. Or you might decide to keep hot data on hardware RAID arrays in each server and create scripts to migrate data from these arrays to Tier 2 storage on your SAN as the data ages and becomes stale.
- Begin by migrating cold data to Tier 3 storage (typically tape). Once this is done, you can then migrate your cool data to Tier 2 storage and either leave your hot data on your existing server storage or migrate it to new Tier 1 storage.
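If you do end up scripting your own tiering, the age-based rule from the .pptx example above might be sketched in Python along these lines. This is only an illustration under some assumptions of mine: it uses each file's last-modified time as a stand-in for when the data was created, it approximates six months and two years as 182 and 730 days, and the function names are made up for this example. It also only builds a plan and doesn't move anything, so you can review the result before touching real data.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Assumed age thresholds, taken from the .pptx example above.
COOL_AFTER = timedelta(days=182)  # roughly six months
COLD_AFTER = timedelta(days=730)  # roughly two years


def classify_by_age(last_modified, now):
    """Return 'hot', 'cool' or 'cold' based on how old the data is."""
    age = now - last_modified
    if age > COLD_AFTER:
        return "cold"
    if age > COOL_AFTER:
        return "cool"
    return "hot"


def plan_migration(root, now=None):
    """Dry run: list (path, tier) pairs for files that should leave Tier 1.

    Nothing is moved or deleted; the caller decides what to do with the plan.
    """
    now = now or datetime.now()
    plan = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Last-modified time is used here as a proxy for data age (assumption).
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        tier = classify_by_age(modified, now)
        if tier != "hot":
            plan.append((path, tier))
    return plan
```

A real solution would also need to consider who owns the data, when it was last accessed, and how applications will find it after it has moved, which is why built-in tiering on the SAN is usually the easier route.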
Finally, note that data tiering should always be implemented together with storage consolidation (e.g. migrating DAS to SAN), otherwise you'll end up with an overly complicated solution that moves data around way more often than is needed, and which can make data difficult to find when it's needed.
In the next article of this series we'll examine the importance of implementing service level agreements (SLAs) to help ensure the success of a data tiering solution.