Windows Server 2012R2 NIC Teaming

This is an attempt at giving a technical overview of how the native network teaming in Windows 2012R2 works, and how I would recommend using it. From time to time I am presented with problems “caused” by network teaming, so figuring out how it all works has been essential. Compared to the days of old, when teaming was NIC vendor dependent, today's Windows native teaming is a delight, but it is not necessarily trouble free.

Sources

Someone at Microsoft has written an excellent guide called Windows Server 2012 R2 NIC Teaming (LBFO) Deployment and Management, available here. It gives a detailed technical guide to all the available options. I have added my field experience to the mix to create this guide.

Definitions

  • NIC: Network Interface Card. Also known as Network Adapter.
  • vNIC/virtual NIC: a team adapter on a host or another computer (virtual or physical) that uses teaming.
  • Physical NIC/adapter: An adapter port that is a member of a team. Usually a physical NIC, but could be a virtual NIC if someone has made a complicated setup with teaming on a virtual machine.
  • vSwitch: A virtual switch, usually a Hyper-V switch.
  • Team member: a NIC that is a member of a team.
  • LACP: Link Aggregation Control Protocol, also IEEE 802.3ad. See https://en.wikipedia.org/wiki/Link_aggregation#Link_Aggregation_Control_Protocol

Active-Active vs Active-Passive


If none of the adapters are set as standby, you are running an Active-Active config. If one is standby and you have a total of two adapters, you are running an Active-Passive config. If you have more than two team members, you may be running a mixed Active-Active-Passive config (standby adapter set), or an Active-Active config without a standby adapter.
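The rules above can be condensed into a small Python sketch. The function name and its arguments are made up for illustration and are not part of any Windows tooling; the sketch only encodes the cases described in this section and refuses anything else.

```python
def classify_team(member_count: int, standby_count: int) -> str:
    """Name the team configuration from the number of members and standby adapters."""
    if member_count < 2 or standby_count not in (0, 1):
        raise ValueError("only the cases described in the text are handled")
    if standby_count == 0:
        return "Active-Active"
    if member_count == 2:
        return "Active-Passive"
    return "Active-Active-Passive"


print(classify_team(2, 0))  # Active-Active
print(classify_team(2, 1))  # Active-Passive
print(classify_team(4, 1))  # Active-Active-Passive
```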

If you are using a configuration with more than one active team member on a 10G infrastructure, my recommendation is to make sure that both members are connected to the same physical switch and in the same module. If not, be prepared to sink literally hundreds, if not thousands of hours into troubleshooting that could otherwise be avoided. There are far too many problems related to the switch teaming protocols used on 10G, especially with the Cisco Nexus platform. And it is not that they do not work, it is usually an implementation problem. A particularly nasty kind of device is something Cisco refers to as a FEX or fabric extender. Again, it is not that it cannot work. It’s just that when you connect it to the main switch with a long cable run it usually works fine for a couple of months. And then it starts dropping packets and pretends nothing happened. So if you connect one of your team members to a FEX, and another to a switch, you are setting yourself up for failure.

Due to the problems mentioned above and similar troubles, many IT operations have a ban on Active-Active teaming. It is just not worth the hassle. If you really want to try it out, I recommend one of the following configurations:

  • Switch independent, Hyper-V load balancing. Naturally for vSwitch connected teams only. No, do not use Dynamic.
  • LACP with Address Hash or Hyper-V load balancing. Again, do not use Dynamic mode.

Team members

I do not recommend using more than two team members in Switch Independent teaming due to artifacts in load distribution. Your servers and switches may handle everything correctly, but the rest of the network may not. For switch dependent teaming, you should be OK, provided that all team members are connected to the same switch module. I do not recommend using more than four team members though, as it seems to be the breaking point between added redundancy and too much complexity.

Make sure all team members are using the exact same network adapter with the exact same firmware and driver versions. Mixing them up will work, but even if base jumping is legal you don’t have to go jumping. NICs are cheap, so fork over the cash for a proper Intel card.

Load distribution algorithms

Be aware that the load distribution algorithm primarily affects outbound connections. The behavior of inbound connections and routing for switch independent mode is described for each algorithm. In switch dependent mode (either LACP or static) the switch will determine where to send the inbound packets.

Address hash

Using parts of the address components, a hash is created for each load/connection. There are three different modes available, but the default one available in the GUI (Port and IP) is mostly used. The other alternatives are IP only and MAC only. For traffic that does not support the default method, one of the others is used as fallback.

Address hash creates a very granular distribution of traffic initiated at the VM, as each packet/connection is load balanced independently. The hash is kept for the duration of the connection, as long as the active team members are the same. If a failover occurs, or if you add or remove a team member, the connections are rebalanced. The total outbound load from one source is limited by the total outbound capacity of the team and the distribution.

[Figure: Address hash load distribution]
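To make the idea concrete, here is a minimal Python sketch of hashing a connection onto a team member. It is not the hash Windows uses: CRC32 is a stand-in, and the function name, NIC names and addresses are all made up for illustration. It only shows the two properties described above: a connection sticks to its team member while the set of active members is unchanged, and the mapping is recalculated when that set changes.

```python
import zlib


def pick_team_member(src_ip, src_port, dst_ip, dst_port, active_members):
    """Map a connection to one active team member by hashing its addresses and ports."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    # CRC32 is a stand-in here; the real hash is an implementation detail of the team.
    return active_members[zlib.crc32(key) % len(active_members)]


members = ["NIC1", "NIC2"]
conn = ("10.0.0.5", 49152, "10.0.1.9", 443)

first = pick_team_member(*conn, members)
# Same connection and same active members: same team member every time.
assert first == pick_team_member(*conn, members)

# After a failover only NIC2 is left, so the connection is rebalanced onto it.
print(pick_team_member(*conn, ["NIC2"]))  # NIC2
```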

Inbound connections

The IP address for the vNIC is bound to the so-called primary team member, which is selected from the available team members when the team goes online. Thus, everything that uses this team will share one inbound interface. Furthermore, the inbound route may be different from the outbound route. If the primary adapter goes offline, a new primary adapter is selected from the remaining team members.

Recommended usage
  • Active/passive teams with two members
  • Never ever use this for a Virtual Switch
  • Using more than two team members with this algorithm is highly discouraged. Do not do it.

MS recommends this for VM teaming, but you should never create teams in a VM. I have yet to hear a good reason to do so in production. What you do in your lab is between you and your therapist.

Hyper-V mode

Each vNIC, be it on a VM or on the host, is assigned to a team adapter and stays connected to this as long as it is online. The advantage is a predictable network path, the disadvantage is poor load balancing. As adapters are assigned in a round robin fashion, all your high bandwidth usage may overload one team adapter while the other team adapters have no traffic. There is no rebalancing of traffic. The outbound capacity for each vNIC is limited to the capacity of the Physical NIC it is attached to.

This algorithm supports VMQ.

[Figure: Hyper-V mode load distribution, with red, green and blue connections]

It may be the case that the red connection in the example above is saturating the physical NIC, thus causing trouble for the green connection. The load will not be rebalanced as long as both physical NICs are online, even if the blue connection is completely idle.
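The saturation scenario can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names, the order in which the vNICs come online and the bandwidth figures are invented for illustration, and the only behavior taken from the text is that vNICs are handed out round robin and never moved afterwards.

```python
from itertools import cycle


def assign_round_robin(vnics, team_members):
    """Pin each vNIC to one team member, handing members out in round robin order."""
    members = cycle(team_members)
    return {vnic: next(members) for vnic in vnics}


def load_per_member(assignment, demand_gbps):
    """Sum the outbound demand of all vNICs pinned to each team member."""
    load = {member: 0 for member in assignment.values()}
    for vnic, member in assignment.items():
        load[member] += demand_gbps[vnic]
    return load


assignment = assign_round_robin(["red", "blue", "green"], ["NIC1", "NIC2"])
print(assignment)  # {'red': 'NIC1', 'blue': 'NIC2', 'green': 'NIC1'}

# Made-up demand in Gbit/s: red is busy, blue is idle.
load = load_per_member(assignment, {"red": 9, "blue": 0, "green": 2})
print(load)  # {'NIC1': 11, 'NIC2': 0}
```

With two 10G team members, NIC1 is asked to carry 11 Gbit/s while NIC2 carries nothing, and because there is no rebalancing, green stays stuck behind red.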

The upside is that the connection is attached to a physical NIC, and thus incoming traffic is routed to the same NIC as outbound traffic.

Inbound connections

Inbound connections for VMs are routed to the Physical NIC assigned to the vNIC. Inbound connections to a host are routed to the primary team member (see address hash). Thus inbound load is balanced for VMs, and we are able to utilize VMQ for better performance. Dynamic has the same inbound load balancing problems as Address hash for host inbound connections.

Recommended use

Not recommended for use on 2012R2, as Dynamic will offer better performance in all scenarios. But, if you need MAC address stability for VMs on a Switch Independent team, Hyper-V load distribution mode may offer a solution.

On 2012, recommended for teams that are connected to a vSwitch.

Dynamic

Dynamic is a mix between Hyper-V and Address hash. It is an attempt to create a best-of-both-worlds scenario by distributing outbound loads using address hash algorithms and inbound load as in Hyper-V mode, that is, each vNIC is assigned one physical NIC for inbound traffic. Outbound loads are rebalanced in real time. The team detects breaks in the communication stream where no traffic is sent. The period between two such breaks is called a flowlet. After each flowlet the team will rebalance the load if deemed necessary, expecting that the next flowlet will be equal to the previous one.

The teaming algorithm will also trigger a rebalancing of outbound streams if the total load becomes very unbalanced, a team member fails or other hidden magic black-box settings should determine that immediate rebalancing is required.
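The flowlet idea can be illustrated with a short Python sketch that groups packet send times into flowlets wherever the stream pauses. The function name, the 50 ms idle threshold and the timestamps are assumptions made for this example; the threshold the team actually uses is part of the black box mentioned above.

```python
def split_into_flowlets(send_times_ms, idle_gap_ms):
    """Group packet send times into flowlets separated by gaps longer than idle_gap_ms."""
    flowlets = []
    current = []
    for t in sorted(send_times_ms):
        if current and t - current[-1] > idle_gap_ms:
            flowlets.append(current)  # a break in the stream ends the flowlet
            current = []
        current.append(t)
    if current:
        flowlets.append(current)
    return flowlets


# Made-up send times in milliseconds, with a long pause after the third packet.
print(split_into_flowlets([0, 1, 2, 500, 501], idle_gap_ms=50))
# [[0, 1, 2], [500, 501]]
```

The boundary between the two flowlets is the point where a team could move the stream to another team member, since nothing is in flight during the pause.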

This mode supports VMQ.

[Figure: Dynamic load distribution]

Inbound connections

Inbound connections are mapped to one specific Physical NIC for each workload, be it a VM or a workload originating on a host. Thus, the inbound path may differ from the outbound path as in address hash.

Recommended use

MS recommends this mode for all teams with the following exceptions:

  • Teams inside a VM (which I do not recommend that you do no matter what).
  • LACP Switch dependent teaming
  • Active/Passive teams

I will add the following exception: If your network contains load balancers that do not employ proper routing, e.g. F5 BigIP with the “Auto Last Hop” option enabled to overcome the problems, routing will not work together with this teaming algorithm. Use Hyper-V or Address Hash Active/passive instead.

Source MAC address in Switch independent mode

Outbound packets from a VM that exit the host through the Primary adapter will use the MAC address of the VM as source address. Outbound packets that use a different physical adapter to exit the host will get another MAC address as source address to avoid triggering a MAC flapping alert on the physical switches. This is done to ensure that one MAC address is only present at one physical NIC at any one point in time. The MAC assigned to the packet is the MAC of the Physical NIC in question.

To try to clarify, for Address Hash:

  • If a packet from a VM exits through the primary team member, the MAC of the vNIC on the VM is kept as source MAC address in the packet.
  • If a packet from a VM exits through (one of) the secondary team members, the source MAC address is changed to the MAC address of the secondary team member.

For Hyper-V:

  • Every vSwitch port is assigned to a physical NIC/team member. If you use this for host teaming (no vSwitch), you have 1 vSwitch port and all inbound traffic is assigned to one physical NIC.
  • Every packet uses this team member until a failover occurs for any reason.

For Dynamic:

  • Every vSwitch port is assigned to a physical NIC. If you use this for host teaming (no vSwitch), you have 1 vSwitch port and all inbound traffic is assigned to one physical NIC.
  • Outbound traffic will be balanced. MAC address will be changed for packets on secondary adapters.

For Hyper-V and Dynamic, the primary is not the team primary but the assigned team member. It will thus be different for each VM.

For Host teaming without a vSwitch the behavior is similar. One of the team members’ MACs is chosen as the primary for host traffic, and the MAC replacement rules apply as for VMs. Remember, you should not use Hyper-V load balancing mode for host teaming. Use Address hash or Dynamic.

Algorithm     | Source MAC on primary | Source MAC on secondary adapters
Address hash  | Unchanged             | MAC of the secondary in use
Hyper-V       | Unchanged             | Not used
Dynamic       | Unchanged             | MAC of the secondary in use

Source MAC address in switch dependent mode

No MAC replacement is performed on outbound packets. To be overly specific:

Mode and algorithm   | Source MAC on primary | Source MAC on secondary adapters
Static Address hash  | Unchanged             | Unchanged
Static Hyper-V       | Unchanged             | Unchanged
Static Dynamic       | Unchanged             | Unchanged
LACP Address hash    | Unchanged             | Unchanged
LACP Hyper-V         | Unchanged             | Unchanged
LACP Dynamic         | Unchanged             | Unchanged
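Both tables boil down to one rule, sketched here in Python. The function and parameter names are hypothetical and the MAC addresses are made up; "primary" means the team primary for Address hash, and the team member assigned to the vNIC for Hyper-V and Dynamic.

```python
def source_mac(vm_mac, primary, exit_member, member_macs, switch_independent=True):
    """Return the source MAC an outbound packet carries when it leaves the host.

    primary is the team primary for Address hash, or the team member assigned
    to the vNIC for Hyper-V and Dynamic.
    """
    if not switch_independent or exit_member == primary:
        return vm_mac  # unchanged
    return member_macs[exit_member]  # replaced with the MAC of the secondary in use


# Made-up MAC addresses for illustration.
macs = {"NIC1": "AA-AA-AA-00-00-01", "NIC2": "AA-AA-AA-00-00-02"}
vm = "00-15-5D-00-00-10"

print(source_mac(vm, "NIC1", "NIC1", macs))  # 00-15-5D-00-00-10
print(source_mac(vm, "NIC1", "NIC2", macs))  # AA-AA-AA-00-00-02
print(source_mac(vm, "NIC1", "NIC2", macs, switch_independent=False))  # 00-15-5D-00-00-10
```

In Hyper-V mode the second case never occurs, since every packet from a vNIC leaves through its assigned team member.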

Author: DizzyBadger

SQL DBA Principal Analyst
