An introduction to the world of storage (Part 2)

If you would like to read the first part in this article series, please go to An introduction to the world of storage (Part 1).

In the last storage basics article we discussed the terminology that gets thrown around when talking about storage. In this article we’ll go a little deeper and discuss some architecture. Most of this will be from an EMC perspective, as that’s what I’m most familiar with, but the general concepts should apply to all vendors. The goal is straightforward: redundant paths running from our arrays, through our storage switches, to our servers.

The following diagram shows the back of a VNX Disk Processor Enclosure (DPE) with its Storage Processors, a Brocade Fibre Channel switch, and two servers. The Disk Processor Enclosure is the brains of the array. This is what a very basic SAN looks like.

Figure 1: A basic SAN with a VNX DPE, one Fibre Channel switch, and two servers

Notice SP A is on the right and SP B is on the left, which is how it would look if you were looking at the back of the array. This is a little counter-intuitive; if you were looking at the front of the array, it would seem right to those of us who read from left to right. The links between each device are Fibre Channel cables. There can always be more, but I’ve kept it simple for this example.

The storage processors (SP A and SP B) are the mechanisms we use for data communication. There are two to allow for high availability: if SP A goes down for any reason, SP B should be able to handle all storage communication, provided everything is set up properly. As you can see, we have one cable going from SP A to the FC switch and one from SP B. Although there are different methods of multi-pathing, any of them should prevent you from experiencing an outage. Different servers can use different types of multi-pathing on the same array. If you’re using EMC arrays, it’s suggested that you use PowerPath to enable multi-pathing from Windows and Linux servers. You may use PowerPath with VMware ESXi hosts as well, but VMware does a pretty good job of multi-pathing natively.
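If PowerPath is installed on a Windows or Linux host, a quick way to confirm that both storage processors are actually visible is the powermt command. This is just a sketch of the kind of check you might run; the device names and path counts will look different in your environment:

    # List every managed device and the paths behind it
    powermt display dev=all

    # Summarize path state per storage adapter (HBA)
    powermt display paths

    # Re-check any paths that were marked dead after a failure
    powermt restore

A healthy host should show paths through both SP A and SP B for every LUN; if everything lands on one SP, something in the cabling or zoning is off.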

Common methods of multi-pathing:

  • Active/Active – This allows both storage processors to be active at the same time. They’re both communicating data from the array to the servers.
  • Active/Passive – This means that one storage processor is delivering data for a particular LUN (logical unit) to a particular server. If the active storage processor fails, the passive storage processor will pick up the slack.
  • Asymmetric Logical Unit Access (ALUA) – The host determines which is the best path, using some of the paths as active and the others as secondary. On newer versions of VMware ESXi this is the preferred method of multi-pathing. That’s probably an over-simplification: with ALUA there is usually one optimized path, through the storage processor that owns the LUN, that is used to communicate with a particular server. The host will fall back to a path that isn’t optimized if it deems it necessary, though.

These methods are assigned on the array, though you can do several things on the host to maximize storage I/O using either PowerPath or the built-in features within VMware.
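On an ESXi host that relies on the built-in Native Multipathing Plugin rather than PowerPath, you can see which path selection policies are in play, and change them per device, with esxcli. This is a minimal sketch; the device identifier below is made up, so substitute the naa ID of your own LUN:

    # Show the path selection policy and SATP claimed for each device
    esxcli storage nmp device list

    # List the available path selection policies (Fixed, MRU, Round Robin)
    esxcli storage nmp psp list

    # Switch a single LUN to Round Robin (device ID is hypothetical)
    esxcli storage nmp device set --device=naa.60060160a0b12345 --psp=VMW_PSP_RR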

It makes sense now why we have two storage processors and at least one connection from each processor to the switch. I should note that it’s not absolutely necessary to use a switch, but just as with a networking switch, you use it so you can connect multiple servers to the array. If you don’t have enough ports on the storage processors, a switch will be necessary.

Much like creating VLANs on a networking switch to allow certain parts of the infrastructure to communicate, we need to create zones on a Fibre Channel switch to allow servers to communicate with the array. As shown in the figure above, we also have two servers that connect to the switch. In each server are two HBAs (Host Bus Adapters). This again allows for multi-pathing and redundancy. On the switch we want to put each server HBA in a zone with each storage processor (a sketch of what that zoning looks like follows Figure 2 below). That way we can lose not only a storage processor but also an HBA or even a port on the switch. If we wanted to make it even more redundant, we’d use the following configuration.

Figure 2: The same SAN with a second Fibre Channel switch added, creating two storage fabrics

In this configuration I’ve added another switch, which allows for the failure of one of the switches. Each server has one HBA that goes to the first switch and one that goes to the second switch. On the array we’ve added another port to each storage processor that goes to the second switch. Now we have what are called storage fabrics. Having multiple fabrics means that you can take one fabric down for maintenance, or because of a problem, and your production systems will keep running. For example, you can upgrade your array one storage processor at a time, or upgrade your switches one switch at a time, and theoretically your production infrastructure will keep running.
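To make the zoning described above a bit more concrete, here is roughly what a single-initiator, single-target zone might look like on a Brocade switch. The aliases, WWNs, and configuration name are all placeholders invented for this example, and the lines starting with # are only annotations:

    # Aliases for one server HBA port and one storage processor port
    alicreate "ServerA_HBA0", "10:00:00:00:c9:aa:bb:cc"
    alicreate "VNX_SPA_P0", "50:06:01:60:00:11:22:33"

    # One zone containing that HBA and that SP port
    zonecreate "ServerA_HBA0__VNX_SPA_P0", "ServerA_HBA0; VNX_SPA_P0"

    # Add the zone to the fabric's configuration and activate it
    cfgadd "PROD_FABRIC_A", "ServerA_HBA0__VNX_SPA_P0"
    cfgsave
    cfgenable "PROD_FABRIC_A"

You would repeat the same pattern on the second switch for fabric B; keeping each zone down to one initiator and one target makes troubleshooting much easier.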

While we use WWNs (World Wide Names) for Fibre Channel zoning and to register a host initiator on an array, we use IQNs (iSCSI Qualified Names) when working with iSCSI. We still want redundancy as shown in the diagrams above, but we’ll use networking switches and network cables to make this work. Most likely we’ll have a software iSCSI initiator on the hosts.
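On an ESXi host, the software iSCSI initiator can be enabled and pointed at the array from the command line. This is only a sketch; the adapter name and target address below are placeholders, and yours will differ:

    # Enable the software iSCSI initiator
    esxcli iscsi software set --enabled=true

    # Confirm the adapter and note its IQN
    esxcli iscsi adapter list

    # Point dynamic discovery at the array's iSCSI portal (address is hypothetical)
    esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=192.168.50.10

The IQN shown by the adapter listing is what you register on the array in place of a WWN.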

We can also use file-level storage in our data centers, especially with vSphere. NFS is supported in recent versions of vSphere. As long as we have a VMkernel IP address for our storage, which should be on a separate VLAN, as well as an IP specified on our file system in the same VLAN, we can add datastores using NFS. NFS storage will not be formatted as VMFS (VMware File System), so you won’t get some of the features that go along with VMFS. As of vSphere 5.x we can also use multiple links to NFS storage. We need a couple of dedicated ports for our storage. Then we create two vSwitches, with one port being active and the other port being standby. On the second vSwitch we use the same ports, but flipped: the standby becomes the active and the active becomes the standby.
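Once the VMkernel networking is in place, mounting the NFS export itself is a one-liner per host. The server address, export path, and datastore name here are invented for illustration:

    # Mount an NFS export as a datastore (all values are examples)
    esxcli storage nfs add --host=192.168.60.10 --share=/vol/datastore01 --volume-name=nfs_datastore01

    # Verify the mount
    esxcli storage nfs list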

Currently, in most mid-tier datacenters, it’s common to use block storage (SAN) to build your storage architecture. Many smaller businesses will use file-level storage, though, especially with VMware. Storage has changed a lot in the past few years, so it’s good to keep an eye on what’s going on. The best thing you can do is talk to a lot of vendors to see what’s best for your particular environment. A SAN can be very expensive and complicated to implement, but a NAS may not give you everything you need as far as performance and flexibility.

As mentioned in the previous part of this series, Software Defined Storage (SDS) is also hitting the market. Some examples of vendors that take a software-based approach to storage are VMware (VSAN), Maxta, Nexenta, Nutanix, SimpliVity, and Scale Computing. In all of these cases, storage is abstracted and managed by a comprehensive software layer that adds the storage features that we’ve come to know and love, such as deduplication, compression, and more.

As you investigate the world of storage, don’t overlook this growing market segment.
