BTRFS in focus – recoverability aspects
In 2013, one of the best-known NAS vendors – NETGEAR – started using the BTRFS filesystem (short for B-tree file system) in their NAS devices. Unlike traditional filesystems used in NASes, BTRFS is not a pure filesystem but a hybrid of a filesystem and RAID. Since NETGEAR NAS devices are very popular among home users, it is worth looking at how BTRFS stores data. Knowing these specifics also helps to understand what difficulties one may face when recovering data from a failed NETGEAR NAS.
First, let's look at a typical NAS layout. Any NAS device consists of physical disks, which are combined in some way into a single storage. Usually one of two schemes is used:
- 2-layer scheme where the md Linux driver provides RAID and a filesystem is deployed over the RAID. Such a scheme is used in QNAP NASes and NETGEAR NASes configured in Flex-RAID mode.
- 3-layer scheme in which the md Linux driver provides RAID, the LVM driver combines several RAIDs into a single storage, and a filesystem stores data on the LVM volume. Synology NASes and NETGEAR NASes in X-RAID2 mode use the 3-layer scheme.
In the 3-layer scheme, LVM makes it possible to expand an array by adding disks or replacing them with larger ones without rebuilding the filesystem, thus eliminating the need to back up and restore a NAS during expansion.
As for the filesystems, any Linux filesystem can be used since NASes usually run Linux. Typically, it is a filesystem from the EXT family (EXT3 or EXT4); some vendors opted for XFS, for example in Buffalo NAS models.
The 2-layer and 3-layer schemes are quite reliable and time-tested; however, they are not that feature-rich. For example, neither EXT nor XFS supports
- snapshots at the filesystem level,
- data compression at any level,
- data and metadata checksumming.
When recovering data from a NAS working under one of these schemes, one recovers the layers one by one; the last layer, the filesystem, is well studied because EXT and XFS have been around for a long time, and it is recovered pretty well.
The BTRFS filesystem has many more features, including snapshots, compression, and data and metadata checksumming. However, it is important to understand that BTRFS is not just a filesystem but a hybrid of a filesystem and a volume manager. For example, it is possible to take two devices and create an equivalent of RAID0 or RAID1 by means of BTRFS alone.
BTRFS supports almost all RAID levels – JBOD, RAID1, RAID0, RAID5 – with varying degrees of reliability. At the moment, all RAID level implementations in BTRFS except RAID5 work reliably; that's why modern NETGEAR NASes use BTRFS only to create JBOD, thus replacing the LVM layer. Fault tolerance, however, is still ensured by the good old RAID5 technology, for which the md driver is responsible. Furthermore, the md driver has supported RAID6 for many years, while in BTRFS RAID6 was not yet production-ready in 2013, when the switch to BTRFS occurred.
The main difference between BTRFS and traditional filesystems is in the way the filesystem stores clusters on the underlying storage created by md RAID. Strictly speaking, in filesystem terminology a cluster is called a block, but we nevertheless prefer to stick to "cluster" since "block" is often used in a broader sense. Typical filesystems divide disk space into a sequentially numbered set of clusters. As a consequence, knowing only the location of the first cluster and the cluster size, you can get the location of all other clusters. BTRFS divides disk space into clusters as well; however, clusters are not necessarily arranged sequentially: clusters may go in arbitrary order, some clusters may have two copies in two different places, and some disk areas may have no clusters at all. The correspondence between physical addresses and cluster numbers is determined by an additional table (the chunk table). Like all other BTRFS metadata, the chunk table is usually stored in two copies.
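The chunk table idea can be illustrated with a toy model. The sketch below is not the BTRFS on-disk format: the `Chunk` record, its field names, the cluster size, and the sample offsets are all assumptions made up for illustration. It only shows a lookup table in which cluster ranges are placed in arbitrary order, one range is stored twice, and a cluster number covered by no chunk has no physical location at all.

```python
from dataclasses import dataclass
from typing import List, Tuple

CLUSTER_SIZE = 4096  # assumed cluster size in bytes, for illustration only


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record: a run of clusters and where its copies live."""
    first_cluster: int                 # first cluster number covered by the chunk
    cluster_count: int                 # number of clusters in the chunk
    physical_offsets: Tuple[int, ...]  # byte offset of each copy on the storage


def locate(chunk_table: List[Chunk], cluster: int) -> List[int]:
    """Return the byte offsets of every copy of a cluster (empty if unmapped)."""
    for chunk in chunk_table:
        index = cluster - chunk.first_cluster
        if 0 <= index < chunk.cluster_count:
            return [base + index * CLUSTER_SIZE for base in chunk.physical_offsets]
    return []


# Made-up table: the chunks are physically out of order and the second
# chunk is stored in two places.
CHUNK_TABLE = [
    Chunk(first_cluster=0, cluster_count=256, physical_offsets=(8_388_608,)),
    Chunk(first_cluster=256, cluster_count=256,
          physical_offsets=(1_048_576, 67_108_864)),
]
```

In this model, `locate(CHUNK_TABLE, 257)` yields two offsets because the chunk has two copies, while a cluster number beyond the last chunk yields an empty list. Without the table, neither answer can be derived from the cluster number alone.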
With a traditional scheme, the recovery of different layers – md, LVM, filesystem – is done independently and sequentially; that is, the filesystem recovery stage is the last one and is done once the arrays are restored. At this stage, information about the location of file content is collected. To translate locations taken from filesystem metadata to physical clusters on the underlying md RAID, it is enough to know the location of the first cluster and the cluster size.
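For a sequentially numbered layout, this translation is plain arithmetic. A minimal sketch follows; the function names and the sample numbers in the usage note are hypothetical, and real filesystems add details (such as block groups and reserved areas) that are left out here.

```python
from typing import Tuple


def cluster_to_offset(cluster: int, first_cluster_offset: int,
                      cluster_size: int) -> int:
    """Byte offset of a cluster when clusters are laid out sequentially."""
    if cluster < 0:
        raise ValueError("cluster number must be non-negative")
    return first_cluster_offset + cluster * cluster_size


def extent_to_byte_range(start_cluster: int, cluster_count: int,
                         first_cluster_offset: int,
                         cluster_size: int) -> Tuple[int, int]:
    """Half-open byte range [start, end) occupied by a run of clusters."""
    start = cluster_to_offset(start_cluster, first_cluster_offset, cluster_size)
    return start, start + cluster_count * cluster_size
```

For example, with the first cluster at byte 1048576 and 4096-byte clusters, a file fragment that starts at cluster 10 and spans 3 clusters occupies bytes 1089536 up to, but not including, 1101824.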
Data recovery from BTRFS is radically more complex than traditional filesystem recovery since BTRFS combines two layers of addressing. Unlike a traditional scheme, where recovery is done at each layer independently, in BTRFS you need to know simultaneously where file content is located on the filesystem and where filesystem elements are stored on the physical disks. In other words, with BTRFS, in addition to collecting information about file location on a volume, one needs the cluster address translation table (the chunk table). If both copies of the chunk table are lost, data recovery from a BTRFS volume is almost impossible.
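The dependence on the two copies can be sketched as follows. This is an illustration only: each stored copy is modelled as a payload plus the checksum recorded when it was written, and CRC-32 from Python's `zlib` is used as a stand-in, not as the checksum BTRFS actually uses. The names `StoredCopy`, `checksum`, and `first_intact_copy` are hypothetical.

```python
import zlib
from typing import Optional, Sequence, Tuple

# Each stored copy is modelled as (payload bytes, checksum recorded at write time).
StoredCopy = Tuple[bytes, int]


def checksum(payload: bytes) -> int:
    """Stand-in checksum (zlib CRC-32); an assumption, not the BTRFS algorithm."""
    return zlib.crc32(payload) & 0xFFFFFFFF


def first_intact_copy(copies: Sequence[StoredCopy]) -> Optional[bytes]:
    """Return the payload of the first copy whose checksum matches, else None."""
    for payload, recorded in copies:
        if checksum(payload) == recorded:
            return payload
    return None
```

If one copy is damaged, the other is still returned and address translation can proceed. If every copy fails verification, the function returns `None`, which corresponds to the case where there is nothing left to translate cluster numbers with.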
In summary, previous NETGEAR NASes (released before 2013) are recovered in three independent and quite simple steps. In modern NETGEAR NASes, the first step (md RAID reconstruction) is the same, while the second and the third are combined into one much more complex operation, complicated further by the fact that the crucial element, the chunk table, exists in only two copies, which can't be rebuilt if damaged.
The bottom line is that data recovery from modern NETGEAR NASes tends to be more complex than from previous models. Honestly, though, you should not take it into account at all. Recoverability was never guaranteed for any device. The only thing that can be guaranteed is a backup; that's why, when configuring your storage system, you should plan to have a backup rather than rely on modern fancy technologies.
Author bio: written by Elena Pakhomova of www.ReclaiMe.com, specializing in NAS recovery solutions for a wide range of NASes: NETGEAR, QNAP, Synology, you name it.