The first part of this series was dedicated to looking at some of the 
most basic concepts in storage. In this article, we’ll build on that 
knowledge to look at the ways you can use multiple drives to address 
shortcomings and overcome limitations in storage.
Part 1 – Hyper-V storage fundamentals
Part 2 – Drive Combinations
Part 3 – Connectivity
Part 4 – Formatting and file systems
Part 5 – Practical Storage Designs
Part 6 – How To Connect Storage
Part 7 – Actual Storage Performance
Why We Use Multiple Drives
Despite a lot of common misperceptions, the real reason we use multiple drives for storage is to increase performance. Data protection is a secondary concern, mostly because it is more difficult to achieve. When these schemes were initially created, they were notoriously poor at providing any protection. 
Drives were 
manufactured and sold in batches, so one drive failure in an array was 
usually followed very quickly by at least one other. Improvements in 
manufacturing made multiple drive systems somewhat more viable for data 
protection, for a time. 
As drive sizes increase, the ability of multi-drive builds to protect data is again on the decline. The performance improvements are the real driver behind these builds, and they are still worth the effort.
As illustrated in the first article, drives are slow. Solid state 
drives are much faster than spinning disks, but they’re still not as 
fast as other components. By combining drives in arrays where their 
contents can be accessed in parallel, we can aggregate their performance
 for much improved data throughput.
True Data Protection
Since it’s been brought up, remember that the only real way to 
protect your data is by taking regular backups. There is no stand-in, no
 workaround, no kludge, and no excuse that is satisfactory. Get good 
backups.
Traditional RAID
In the distant past, RAID stood for “Redundant Array of Inexpensive 
Disks”. But, as most people were quick to point out, disks weren’t 
inexpensive. So, the “I” can also mean “Independent”. As the name 
implies, the idea was that a disk or two could fail and, through 
redundancy, could be replaced without losing data. As indicated earlier,
 this worked better on paper than in practice. However, the multiple 
paths to store and retrieve data increased drive performance, so usage 
continued.
There are multiple levels of RAID. This post will only examine the most common.
RAID-0
The zero in the name indicates “no redundancy”. This RAID type is 
most commonly used in performance PCs and workstations. If any drive in 
the array fails, all data is lost and must be restored from backup.
RAID-0 is useful to understand because its striping mechanism is used
 in all but one of the other common types. When data is written, it is 
spread as evenly as possible across all drives in the array in a pattern
 called a stripe. This is most easily understood by diagram:
The sample file data has been color-coded to show how it is distributed across a three-disk RAID-0 array (sorry, those of you who are colorblind). You may notice that the final stripe isn't full. The next file to be written will be able to use that empty space.
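If you prefer code to pictures, here is a minimal Python sketch of the same idea. The four-byte chunk size and the sample text are toy values for illustration; real controllers use stripe units of 64 KB or more.

```python
# Minimal illustration of RAID-0 style striping; not a real controller algorithm.
CHUNK = 4  # bytes per stripe unit; real arrays use 64 KB or larger

def stripe(data: bytes, drive_count: int):
    """Distribute data across drives in round-robin stripe units."""
    drives = [bytearray() for _ in range(drive_count)]
    for i in range(0, len(data), CHUNK):
        drives[(i // CHUNK) % drive_count].extend(data[i:i + CHUNK])
    return drives

for n, d in enumerate(stripe(b"THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG", 3)):
    print(f"Drive {n}: {bytes(d)}")
```

With a 43-byte input, the final stripe comes out only partially filled, mirroring the diagram above.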
RAID-0 is used in performance systems because it is a pure 
aggregation of the drives’ abilities. When the file above is written, 
the controller writes to all three drives simultaneously. When the file 
is read, it is retrieved from all three drives simultaneously. Adding 
more drives to a RAID-0 directly improves its performance, but it also 
increases its risks as each new drive adds another potential point of 
failure that can bring the entire array down.
A RAID-0 array requires a minimum of two drives, but its upper limit 
is dictated by the controller (usually, the number of available drive 
bays is exhausted first). It is very uncommon to find such an array in a
 virtualization system.
RAID-1
RAID-1 is the only common RAID type that does not use a stripe pattern. It is known as a mirror
 array. It requires exactly two drives. Incoming data is written to both disks simultaneously, but because it is the same data, the speed is the same as if there were only a single drive. If the array controller is smart enough, data can be read from both drives simultaneously but from different locations, allowing the read speed of both drives to be combined.
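Conceptually, and only as a rough Python sketch (real mirroring happens in the controller or driver, not in application code), the behavior looks like this:

```python
# Toy model of RAID-1 behavior: every write hits both members,
# while a capable controller can alternate reads between them.
class Mirror:
    def __init__(self):
        self.members = [{}, {}]   # two drives, modeled as block -> data maps
        self._next_read = 0

    def write(self, block: int, data: bytes) -> None:
        for member in self.members:      # same data goes to both drives,
            member[block] = data         # so write speed matches a single drive

    def read(self, block: int) -> bytes:
        member = self.members[self._next_read]
        self._next_read ^= 1             # alternate members to combine read speed
        return member[block]

m = Mirror()
m.write(0, b"boot sector")
print(m.read(0))
```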
One drive in a RAID-1 can fail and the other will continue to function. The loss of a drive is called a broken mirror.
 When the drive is replaced, a rebuild operation must take place, which 
simply copies the data from the surviving drive to the replacement. As 
this is an extremely heavy I/O operation, it is not uncommon for the 
other drive to fail during a rebuild. Some controllers have the ability 
to throttle rebuild speed (although this is generally to reduce the 
impact of the rebuild on standard data operations more than anything 
else).
The RAID-1 type is most commonly used to hold operating systems and SQL log files. It is generally not used to hold virtual machine data. However, in small installations with lightly used virtual machines, it would be acceptable.
RAID-5
With RAID-5, we return to the striping mechanism. This system uses parity,
 in which part of the data saved to disk participates in an error detection and recovery scheme. As with RAID-0, this is most easily understood by diagram:
For those interested, the parity calculation is a simple bitwise XOR 
of the live data. In the event that one of the drives is lost, this 
parity data can be used to mathematically reconstruct the missing data 
on the fly. That operation takes time, but the array can function in 
this degraded mode. The purpose of distributing the parity as seen in the diagram is to minimize the effects of the loss of a drive. Instead of performing reconstruction math for an entire drive's worth of data, the array can skip the calculation for stripes in which the lost drive held only parity.
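To make the XOR relationship concrete, here is a small Python sketch (purely illustrative; the chunk contents and sizes are made up) showing parity being computed for one stripe and a lost chunk being rebuilt from the survivors:

```python
# Parity for a stripe is the bytewise XOR of the data chunks on each member.
def xor_parity(chunks):
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)

stripe = [b"DATA", b"MORE", b"BITS"]   # data chunks on three members
parity = xor_parity(stripe)            # parity chunk stored on the fourth member

# Lose one member: XOR of everything that survives reproduces the lost chunk.
survivors = [stripe[0], stripe[2], parity]
assert xor_parity(survivors) == stripe[1]
```

The same property is what makes a rebuild possible after a drive replacement: each missing block is simply the XOR of the corresponding blocks on the surviving drives.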
After a failed drive is replaced, a rebuild cycle begins. As with RAID-1, this can be a vulnerable time due to the unusually high I/O load. Many RAID-5 controllers also allow this to be throttled.
There are some other considerations for RAID-5 to speak of, but 
RAID-6 also shares these concerns so discussion will be held until after
 that section.
RAID-5 requires a minimum of three drives. Its maximum is limited by the controller and available bays.
RAID-6
RAID-6 is very similar to RAID-5 except that it uses two parity blocks per stripe instead of one. This allows the array to lose two drives and continue to function.
Concerns with Parity RAID
Parity RAID systems are beginning to fall out of favor in higher-end 
and large capacity systems. The reasoning for doing so is valid, but 
there are also a lot of FUD-spreading extremists who are advocating for 
the death of parity RAID regardless of the situation. This condemnation 
will eventually be the rule, but for now it is still very premature.
Write Performance
When a parity RAID system makes a write that only changes part of the data in a stripe, it first has to read existing data back (at minimum, the old data block and the old parity block) so that it can recalculate the parity for the whole stripe, including both the pre-existing and the replaced data. Only then can it write the new data and the new parity. 
This entire process is known as a read-modify-write (RMW). It requires an extra read cycle before the write cycle. Naturally, that drags down performance. However, there are four mitigating factors. First, most loads are heavier on reads than writes (usually 60% or higher), and reads occur at the cumulative speed of all drives in the array minus one or two (because the parity block(s) is/are not read).
Second, once the 
drives seek to where the stripe is, the entire RMW process occurs in an 
extremely localized area, which means that performance will be at the 
drives’ fastest rate. Third, a write-back cache on the array controller 
may cause the process to return to the operating system immediately such
 that the I/O effect is never felt. Finally, this write penalty is a 
known issue and should have been accounted for when the storage system 
was designed such that the minimum available IOPS remain above what is 
necessary for the connected system. 
I would also suggest a fifth 
mitigating factor: some people just have an unhealthy obsession with 
performance and it's safe to ignore them. Another thing often overlooked is that a write which encompasses all blocks in a stripe requires no preliminary read at all; it is written immediately. The best way to increase the performance of a RAID system is to add spindles, so drop another disk or two in that new system you're about to order and don't worry.
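As a back-of-the-envelope way to account for the penalty at design time, the commonly cited write-penalty factors (roughly 2 for mirrored types, 4 for RAID-5, 6 for RAID-6) can be plugged into a simple estimate. The per-drive IOPS figure and the 60/40 read/write mix below are assumptions for illustration, not measurements:

```python
# Rough effective-IOPS estimate using commonly cited RAID write penalties.
WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}

def effective_iops(drives: int, iops_per_drive: float,
                   read_fraction: float, level: str) -> float:
    raw = drives * iops_per_drive
    write_fraction = 1.0 - read_fraction
    # Each front-end write costs 'penalty' back-end I/Os; reads cost one each.
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[level])

# Example: six drives assumed at ~175 IOPS each, 60% read workload.
for level in ("RAID-10", "RAID-5", "RAID-6"):
    print(level, round(effective_iops(6, 175, 0.60, level)))
```

If the result stays above what the connected systems need, the write penalty is a non-issue; if not, add spindles or pick a different level.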
Data Vulnerability
A more pressing concern than the write penalty is the vulnerability 
of data on the disks. Data on a disk is just a particular magnetization 
of coating on a glass platter. That magnetization status can and will 
slip into a non-discrete state. As bit densities increase, and the amount of coating that constitutes a single bit therefore shrinks, the risk and effects of this demagnetization also increase. 
If the magnetization 
status of data in a stripe becomes corrupted on more than one drive, 
then the entire array is effectively corrupted. This is a very real 
concern and you should be aware of it.
The problem with the extremist FUD-spreaders is that they act as 
though this is all new information and we’re just now becoming aware of 
it. The truth is, the potential for data corruption through 
demagnetization has always been a risk and we have always known about 
it. Each drive contains its own onboard techniques to protect against 
the problem. 
This is pretty much the only reason the single drive in 
your laptop or desktop remains functional after a week or two in the 
field. Beyond that, RAID systems have their own protection scheme in the
 form of an operation that periodically scans the entire drive system 
for these errors. This process is known as a scrub, and it’s been in 
every RAID system I’ve ever seen, going back to the late ’90s.
That said, the risk is increasing and eventually we will have to stop
 using parity RAID. Disk capacities are increasing to the point that a 
scrub operation just can’t complete in a reasonable amount of time. 
Furthermore, these big drives take a long time to go through a rebuild 
operation, stretching out into days in some cases. The sustained higher 
I/O load over that amount of time puts some serious wear on the 
remaining drives and greatly increases the odds that they’ll fail while 
there is insufficient protection. RAID-6 provides that extra bit of 
protection against this and as such will be with us longer than RAID-5.
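A little arithmetic shows why. The transfer rates below are assumptions for illustration, and they describe a best case: an idle array writing the replacement drive sequentially, with no parity reconstruction overhead and no production I/O competing for the spindles.

```python
# Best-case rebuild time: capacity divided by an assumed sustained write rate.
def rebuild_hours(capacity_tb: float, mb_per_sec: float) -> float:
    return capacity_tb * 1_000_000 / mb_per_sec / 3600

print(round(rebuild_hours(0.6, 150), 1))  # 600 GB drive at 150 MB/s: ~1.1 hours
print(round(rebuild_hours(4.0, 120), 1))  # 4 TB drive at 120 MB/s:  ~9.3 hours
```

Add parity reconstruction reads and a live virtualization workload on top, and the multi-day rebuilds mentioned above are easy to reach.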
So, the risk is real. But again, it’s on the big drives. Not everyone
 is jumping to those big drives, though, especially since they’re still 
not the fastest. The performance hounds are still using 450GB and 
smaller drives. These have no problems completing scrubs. Where is the 
sweet spot? I’m not certain, exactly, but I wouldn’t guess at it, 
either. I’d have the engineers for any RAID system I was considering 
tell me. They’ll have access to the failure rates of their systems. As a
 rule of thumb, I feel comfortable using 15k 600GB drives in a RAID-5. 
Putting a stack of 5400 RPM 2TB drives in a RAID-5 is just begging for 
data loss.
RAID-10
RAID-10 is a type of hybrid RAID in which multiple 
RAID types are combined in a single array. This particular type is 
called a “stripe of mirrors”. First, you take two or more pairs of 
drives and make them into mirrors. You then create a stripe across those
 mirrors. Just look at the RAID-0 diagram and imagine that each disk is 
actually two disks.
The primary reason to use RAID-10 is performance. With a 
well-designed controller, reads can occur from separate portions of 
every disk in the array, making this by far the fastest-reading RAID 
available. Writes occur at the combined speed of half the disks.
RAID-10 has historically been the preferred medium for SQL data, 
simply for the performance. However, it is also becoming the de facto 
replacement for parity RAID. Since data protection occurs across pairs 
and not the entire array, data integrity scans are quicker. If a drive 
is lost, only that pair needs to be rebuilt, so only one disk is placed 
at high I/O risk and not the entire array.
The drawbacks of RAID-10 are logistical in nature. Fully half of the physical drive space is reserved for redundancy, making it the most expensive per gigabyte of the common RAID types. Reaching a storage capacity equivalent to a parity RAID also means filling a lot of drive bays, which can run out quickly.
RAID-10 requires a minimum of four drives and is limited by the controller and available drive bays.
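To put the capacity trade-off in numbers, here is a quick Python comparison for a hypothetical eight-bay chassis with 1.2 TB drives (the drive count and size are arbitrary illustration values, and formatting overhead is ignored):

```python
# Usable capacity for the same drive count under different RAID levels.
def usable_tb(drives: int, size_tb: float, level: str) -> float:
    if level == "RAID-10":
        return (drives // 2) * size_tb   # half the drives hold mirror copies
    if level == "RAID-5":
        return (drives - 1) * size_tb    # one drive's worth of parity
    if level == "RAID-6":
        return (drives - 2) * size_tb    # two drives' worth of parity
    raise ValueError(level)

for level in ("RAID-5", "RAID-6", "RAID-10"):
    print(f"{level}: {usable_tb(8, 1.2, level):.1f} TB usable from eight 1.2 TB drives")
```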
RAID Controller
RAID control can be performed either by dedicated hardware or in software. The preferred method is generally hardware. With modern hardware, parity calculations, scrub jobs, and mirror maintenance aren't particularly heavy loads, but they're still better handled in dedicated hardware.
Also, many hardware RAID controllers include a battery that allows for a write-back cache; in the event of a power failure, the cache contents are preserved (or flushed to non-volatile storage) so they can be committed to disk once power returns.
Software RAID is often controlled by operating systems. Of most 
relevance in this series is the Storage Spaces feature (not really a 
RAID system, but close enough for this discussion) that was released in 
Windows Server 2012, but even earlier Windows operating systems could 
run a software RAID.  
Many NAS systems employ a software RAID, although 
that’s generally obscured by a specially designed operating system. In a
 Hyper-V Server deployment, what you want to avoid is running a software
 RAID (or Storage Spaces) on the same system that’s running virtual 
machines. The division of labor can be a drain on performance.
JBOD
JBOD is something you're going to start seeing more of. It stands for “just a bunch of disks”, and it means that there are multiple disks present but no RAID. In earlier systems, this wouldn't be a sufficient platform for virtualization. Now, with the introduction of Storage Spaces, JBOD can be made into a viable configuration.
Storage Spaces
Storage Spaces is Microsoft’s new entry in the storage arena. It’s a 
large software stack that’s devoted to presenting storage. It was first 
introduced in Windows Server 2012 (and Windows 8, but I’m not going to 
discuss the user-level implementation). You can create a storage pool 
out of a single disk or across multiple disks in a JBOD. It is not 
supported to use Storage Spaces on an existing hardware RAID, although 
the system will allow it in some instances. In case you’re wondering, 
yes, the Storage Spaces system can detect a hardware RAID and, depending
 on what you’re trying to accomplish, will warn you about the hardware 
RAID or will simply refuse to use it. Storage Spaces also prefers SAS 
disks, although it can work with SATA (and with USB drives, although 
these aren’t supported for a server deployment).
Storage Spaces is not a traditional RAID system, although it does have some similarities. The simple Storage Space is effectively a RAID-0, as it stripes data across the drives in its pool. You can create a mirror
 on your physical storage, which uses two or three drives in a similar 
fashion to RAID-1. 
What’s interesting about the mirror is that you can 
continue adding disks in columns, which essentially converts your mirror into a RAID-10. You can also create a parity space, which is similar to a RAID-5. Storage Spaces also grants you the ability to designate hot spares.
 If a drive in a mirror or parity space fails or is predicted to fail, 
Windows will automatically transfer its data to one of these hot spares 
so that the system doesn’t need to wait for human intervention.
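As very rough arithmetic only (it ignores column counts, interleave, and metadata overhead, and the parity figure assumes a hypothetical four-column layout), the resiliency types trade capacity much like their RAID counterparts:

```python
# Rough usable-capacity approximation for Storage Spaces resiliency types.
def approx_usable_tb(pool_tb: float, resiliency: str) -> float:
    if resiliency == "simple":            # striped, no protection (RAID-0-like)
        return pool_tb
    if resiliency == "two-way mirror":    # two copies of everything (RAID-1/10-like)
        return pool_tb / 2
    if resiliency == "three-way mirror":  # three copies of everything
        return pool_tb / 3
    if resiliency == "parity":            # single parity (RAID-5-like), four columns assumed
        return pool_tb * 3 / 4
    raise ValueError(resiliency)

for r in ("simple", "two-way mirror", "three-way mirror", "parity"):
    print(f"{r}: {approx_usable_tb(12.0, r):.1f} TB usable from a 12 TB pool")
```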
The draw for Storage Spaces is that it can provide many of the 
most-desired capabilities of a dedicated hardware storage system at a 
lower price point than the average SAN. You use a general-purpose 
computer system running Windows Server 2012 or later and regular SAS 
drives. You also have the option to attach inexpensive JBOD enclosures 
to the system to extend the number of drives available for use. The 
storage can then be exposed to local applications (although hardware 
RAID is still preferred for that) or to remote systems through iSCSI or 
SMB 3.0. You can even create Cluster Shared Volumes on a Storage Space.
The nice thing about Storage Spaces is that it’s also available for 
hardware vendors. They can build hardware systems with an embedded copy 
of Windows Storage Server using Storage Spaces, and the result is an 
inexpensive networked storage device.
If you're considering Storage Spaces, I'd encourage you to wait until 2012 R2 is released. It adds a number of capabilities to Storage Spaces and addresses some of the shortcomings in the initial release. For instance, the 2012 version doesn't support CSVs on a parity space, while R2 does. Since there's less than a month to go (as of this writing), the wait should be tolerable.
What’s Next
This article took one step up from the fundamentals and looked at the
 ways that hard drives can be arranged in order to increase performance 
and reliability. The next part in this series will take a deep look into
 the ways that you can connect your Hyper-V systems to storage.

