This post in our series on storage for Hyper-V is devoted to the ways
that a Hyper-V host can connect to storage. This will not be a how-to
guide, but an inspection of the technologies.
Part 1 – Hyper-V storage fundamentals
Part 2 – Drive Combinations
Part 3 – Connectivity
Part 4 – Formatting and file systems
Part 5 – Practical Storage Designs
Part 6 – How To Connect Storage
Part 7 – Actual Storage Performance
Don’t Overdo It
The biggest piece of advice I hope all readers take away from this article is: do not over-architect storage connectivity.
As you saw in part one, hard drives are slow. Even when we get a bunch
of them working in parallel, they’re still slow. As you saw in my
anecdote in part two, even people who should know better will expend a
lot of effort trying to create ultra-fast connections to ultra-slow hard
drives. Unless you have racks and cabinets brimming with hard drive
enclosures and SANs, installing 16Gb/s Fibre Channel switches and 10GbE
switches to connect to storage is a waste of money.
Your primary
architectural goals for storage connectivity are stability and redundancy.
As for speed, figure out how much data your host systems need to move
and how much your drive systems are capable of moving before you even
start to worry about how fast your connectivity is. I guarantee that
most people will be surprised to see just how little they actually need.
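If you want to put a number on that, watch the disk throughput counters on an existing host across a business day. Here is a minimal sketch using PowerShell's standard performance counters; the one-hour sampling window is an arbitrary choice:

    # Sample total physical disk throughput every 30 seconds for one hour,
    # then report the busiest moment in Gb/s for comparison against link speeds.
    $samples = Get-Counter -Counter '\PhysicalDisk(_Total)\Disk Bytes/sec' `
        -SampleInterval 30 -MaxSamples 120
    $peak = ($samples.CounterSamples.CookedValue | Measure-Object -Maximum).Maximum
    'Peak disk throughput: {0:N2} Gb/s' -f ($peak * 8 / 1e9)

If the peak turns out to be a fraction of 1Gb/s, a pair of gigabit paths is already more than the drives will ever ask for.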
As a point of comparison, I managed a Hyper-V Server system with
about 35 active virtual machines that included Exchange, multiple SQL
Servers, a corporate file server, and numerous application servers for
about 300 users. There was a single backing SAN with 15 disks and a pair
of 1GbE connections. The usage monitor indicated that the iSCSI link
was generally nearly flat and almost never exceeded 1Gb/s during production
hours. The only time it got any real utilization was during backup
windows. Even then, it was only as full as the backup target could
handle. My storage network almost never reached its bandwidth capacity.
Fibre Channel
Fibre Channel (FC) is a fast and dependable way to connect to
external storage. Such external storage will almost exclusively be mid-
to high-end SAN equipment. The FC protocol is designed to be completely
lossless, which means there is very little overhead. This allows FC
communications to run at very nearly the maximum rated speed of the
equipment (4, 8, and 16 Gbps are the most common speeds).
Historically, FC was the premier method to connect to storage.
External storage was considered a top-shelf technology under any
circumstances, and nothing else had anything near FC’s speed and
reliability. Both points remain true today, but Ethernet is rapidly
closing the gap. The issue is that FC has always been very
expensive. It gets even more expensive when FC switches are required.
Many FC SANs allow multiple systems to connect directly without going
through a switch, but this limits the number of connected hosts and
removes a few options. For instance, you might want your hosts to
connect to multiple SANs or you might want to employ some form of
SAN-level replication. These are going to need switching hardware.
Another issue with FC is that it’s really its own world. It requires
understanding and implementation of World Wide Names and masking and
other terms and technologies. They aren’t terribly difficult, but you
can’t really bring in other knowledge and apply it to FC’s
idiosyncrasies. A serious side effect of this is that you can’t really
find a lot of generic information on connecting a host to an FC device.
You pretty much need to work off of manufacturer’s documentation. For a
new system, that’s not such a big deal; it probably (hopefully!) comes
with the equipment. If you inherit a device, especially an older one
that the manufacturer has no interest in or incentive for helping you with,
you might be facing an uphill battle. As the data center becomes flatter
and administrators are expected to know more about a wider range of
technologies, having portability of knowledge is more important.
When configuring FC, the best advice I can give you is to follow the
manufacturer’s directions.
You do want to avoid using any equipment that
converts the FC signal. You can find FCoE (Fibre Channel over Ethernet)
devices, but please don’t use them for storage. The FC equipment
believes that its signal will be received and doesn’t deal with loss
very well, so converting FC into Ethernet and back can cause serious
headaches — read that as “data loss”.
iSCSI
iSCSI is, in some ways, a successor to FC (although it’s a bit of a
stretch to think that it will replace FC). It’s not as fast or as
reliable, but it’s substantially cheaper. Instead of requiring expensive
FC switches and arcane configurations, it works on plain, standard
Ethernet switches. If you understand IP at all, you already know enough
to begin deploying and administering iSCSI.
What hurts iSCSI is that it relies on the TCP/IP protocol. This
protocol was designed specifically to operate across unreliable
networks, and as such has a great deal of overhead to enable detection
and retransmission of lost packets and resequencing of packets that
arrive out of order. Even if all packets show up and are in the correct
sequence, that overhead is still present. This overhead exists in the
form of control information added on to each packet and extra processing
time in packaging and unpackaging transmitted data. It’s very easy to
overstate the true impact of this overhead, but it’s foolish to pretend
that it’s not there.
A pitfall that some administrators fall into is treating iSCSI
traffic like all other TCP/IP traffic. Their configurations usually
work, just not as well as they could. The most common mistake is
allowing iSCSI traffic to cross a router. A very common piece of advice
given is to segregate iSCSI devices and hosts into separate subnets.
This makes sense. It is often extended into a recommendation to place
each NIC into its own subnet when the hosts and storage hardware carry
several of them. The quality of this advice is questionable, but if
properly implemented, not harmful. The following image represents what
this might look like.
Here, all switching is layer-2, meaning that frames travel between
source and destination systems based solely on MAC address, which sits
in the header portion of the Ethernet frame where it’s easy for the switch to
access. As long as the switch isn’t overloaded, layer-2 connectivity
happens very rapidly. It’s so fast that it’s difficult to measure the
impact of a single switch. The only real dangers here are overloading
the switch, having too much broadcast traffic on the same network as the
iSCSI traffic, or forcing iSCSI traffic to go through numerous
switches.
It’s a lot harder to overload switches than most people think,
even cheap switches, but an easy fix is to just use switching hardware
that is dedicated to iSCSI traffic. Broadcast traffic can be reduced by
putting iSCSI traffic into its own subnet. If you like, you can further
isolate it by placing it into its own VLAN. Finally, try to use no more
than one switch between your hosts and your storage devices.
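On the Windows side, keeping an iSCSI NIC on its own subnet mostly amounts to giving it a static address and deliberately no default gateway, so the stack is never tempted to route that traffic. A minimal sketch, with a hypothetical adapter name and addresses:

    # Assign a static address to the dedicated iSCSI adapter; intentionally
    # omit -DefaultGateway so storage traffic can never be sent to a router.
    New-NetIPAddress -InterfaceAlias 'iSCSI1' -IPAddress 192.168.50.10 -PrefixLength 24
    # iSCSI has no use for DNS registration on this adapter either.
    Set-DnsClient -InterfaceAlias 'iSCSI1' -RegisterThisConnectionsAddress $false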
A problem arises when iSCSI initiators (clients) connect to iSCSI
targets (hosts) that are on different subnets. This connection can only
be made through a router. This often happens when the SAN’s adapters are
teamed, when the SAN won’t allow its adapters to participate on separate
subnets, or when the configuring administrator inadvertently connects the
initiator to the wrong target. One possible misconfiguration is illustrated below:
The problem here is a matter of both latency and loss. The second
adapter in the host cannot connect directly to the SAN’s team IP
address. People often look at this and think, well, if the source
adapter knows the MAC address of the destination adapter, why doesn’t it
just send it there using layer-2? The answer is: that’s not how TCP/IP
works. The source system looks at the destination IP address, looks at
its own address and subnet mask, and realizes that it needs a router to
communicate.
The destination MAC address it’s going to use when it
creates the Ethernet frame is that of its gateway, not the SAN’s teamed
adapter. The layer-2 switch at the top will then send those frames to
the gateway because, being only layer-2, it has no way to know that the
ultimate destination is the SAN. When the gateway receives the data, it
will have to dig below the Ethernet frame into the IP header of the
packet to determine the destination IP address. It then has to repackage
the packet into a new frame with the source and destination MAC
addresses replaced and send it back to the layer-2 switch.
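To make that first step concrete, here is a rough sketch of the on-link test that TCP/IP performs before choosing a destination MAC address. The addresses mirror the misconfiguration above and are hypothetical:

    # Returns $true when two addresses share the same network bits, i.e. when
    # the source can reach the destination at layer-2 without a router.
    function Test-SameSubnet {
        param([string]$SourceIP, [string]$DestinationIP, [int]$PrefixLength)
        $src = [System.Net.IPAddress]::Parse($SourceIP).GetAddressBytes()
        $dst = [System.Net.IPAddress]::Parse($DestinationIP).GetAddressBytes()
        for ($bit = 0; $bit -lt $PrefixLength; $bit++) {
            $index = [int][math]::Floor($bit / 8)
            $mask  = 0x80 -shr ($bit % 8)
            if (($src[$index] -band $mask) -ne ($dst[$index] -band $mask)) {
                return $false   # a network bit differs: the gateway gets the frame
            }
        }
        return $true            # all network bits match: plain layer-2 delivery
    }

    Test-SameSubnet 192.168.51.10 192.168.50.100 24   # False - off to the router
    Test-SameSubnet 192.168.50.10 192.168.50.100 24   # True  - direct at layer-2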
Hopefully, it’s obvious that layer-3 communications (routing) are a
more involved operation than layer-2 switching. Layer-3 delays are much
easier to detect than layer-2, especially if the router is performing
any form of packet inspection such as firewalling. Also, routers are
much more inclined to drop packets than simple layer-2 switches are.
Remember that in TCP/IP design, data loss is accepted as a given. As far
as the router is concerned, dropping the occasional packet is not a big
deal. For your storage connections, the extra delays and misplaced
packets can accumulate to be a very big deal indeed.
1GbE vs. 10GbE iSCSI
10GbE is all the rage… for people who make money on equipment sales.
10GbE cards are not terribly expensive, but 10GbE switches are. Unless
you’re absolutely certain that your storage can make decent use of
10GbE, you should favor multi-path 1GbE in your storage connectivity
purchase decisions. Again: don’t over-architect your storage
connectivity.
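For completeness, multi-path 1GbE on a Windows host is mostly a matter of the MPIO feature plus one iSCSI session per adapter. A hedged sketch; the portal address, initiator addresses, and IQN below are all hypothetical:

    # Install MPIO and let the Microsoft DSM claim iSCSI devices
    # (a reboot may be required after the feature installs).
    Install-WindowsFeature -Name 'Multipath-IO'
    Enable-MSDSMAutomaticClaim -BusType iSCSI
    # Register the SAN's portal, then build one session per initiator NIC.
    New-IscsiTargetPortal -TargetPortalAddress 192.168.50.100
    Connect-IscsiTarget -NodeAddress 'iqn.2004-08.com.example:san1' `
        -InitiatorPortalAddress 192.168.50.10 -IsMultipathEnabled $true -IsPersistent $true
    Connect-IscsiTarget -NodeAddress 'iqn.2004-08.com.example:san1' `
        -InitiatorPortalAddress 192.168.50.11 -IsMultipathEnabled $true -IsPersistent $true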
Jumbo Frames
The size of the standard Ethernet frame is 1538 or 1546 bytes, with
1500 bytes as the payload and the rest as Ethernet information,
resulting in about 3% overhead on each frame. That’s not a great deal,
but superior ratios are available by employing jumbo frames. These allow
an Ethernet frame to be as large as 9216 bytes. The number of overhead
bytes remains the same, however, reducing it to a trivial fraction of
the frame’s size.
In aggregate, this means that far fewer frames are
required to transmit the same amount of payload, so less total data
crosses the wire and effective throughput rises. Another benefit is that
packet-processing is decreased, since fewer headers need to be packaged,
processed, and unpacked. With modern adapters and switching hardware,
the effects of the reduced processing aren’t as significant as in the
past, but they are still present.
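The arithmetic is easy to verify. A quick sketch using the figures above, assuming 38 bytes of framing per frame (a 1538-byte frame minus its 1500-byte payload) and a 9000-byte jumbo payload:

    $overhead = 38
    'Standard: {0:P2} overhead' -f ($overhead / (1500 + $overhead))   # ~2.47%
    'Jumbo:    {0:P2} overhead' -f ($overhead / (9000 + $overhead))   # ~0.42%
    # Frames needed to move 1 GB of payload:
    [math]::Ceiling(1e9 / 1500)   # 666,667 standard frames
    [math]::Ceiling(1e9 / 9000)   # 111,112 jumbo frames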
Configuring jumbo frames for a standard network adapter is just a
matter of changing its properties. Configuring in Server Core or for
Hyper-V can be somewhat more complicated.
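On Server Core, the same property change is exposed through PowerShell. A sketch, assuming a hypothetical adapter name; the exact value the driver accepts (9014 is common) varies by vendor:

    # Enable jumbo frames on a physical adapter via its advanced property.
    Set-NetAdapterAdvancedProperty -Name 'iSCSI1' `
        -RegistryKeyword '*JumboPacket' -RegistryValue 9014
    # Confirm what the driver actually accepted.
    Get-NetAdapterAdvancedProperty -Name 'iSCSI1' -RegistryKeyword '*JumboPacket'

Remember that jumbo frames only help when every device in the path, adapters and switches alike, is configured for them; a single standard-MTU hop forces fragmentation or dropped frames.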