Spanning Tree Protocol(s) Explained

Or, how I learned to stop worrying and love the BPDUs.

Spanning Tree Protocol, as mentioned previously, is a protocol used by switches to build out a loop-free logical topology of an Ethernet network. The protocol itself is built off of a concept called, well, a spanning tree.

How does this work? Well, that’s what I’m here to explain.

Spanning Trees

A spanning tree is a concept from graph theory, is, formally, a tree of an undirected graph which includes all of its vertices in the least number of edges possible. For, well, normal people, literally the first image off Wikipedia is a good visual example.

In this context, a spanning tree is a set of links present in the network that, when they are the only ones used, result in a network with no loops in it.

Spanning Tree Protocol (STP), The Original

STP (as a whole) starts with one switch, called the root bridge. A spanning tree is, well, a tree, and this defines the starting point. With STP, every switch is given an identifier, called the bridge ID. This ID is the concatenation of a priority value, and the switch’s MAC address. The default priority is 32768, and is only adjustable in multiples of 4096. The switch with the lowest ID (so, the switch with the most zeros in its MAC when converted to binary, unless someone made priority changes) in the network is the root bridge, and then the tree builds from there.

Every port that’s connected is given a “cost”, which is $$1Gbis/s / port speed$$ in regular STP, meaning that switches that support higher standards like 10-gig can’t be accurately accounted for with STP.

With every port cost known, switches will then designate exactly one port as the root port, which is the port with the lowest total cost to the root bridge. When the tree is being built, every switch adds on the cost of its port to the cost it received, so it knows for every valid connection what the total cost through the entire path back to the root bridge is. Otherwise, valid links in the tree are called designated ports, and a link that would cause a loop if enabled is called a blocked port. Blocked ports are still utilized for switch to switch communication, in case that port is now required, but no normal network traffic is sent across one.

Switches talk to other switches using Bridge Protocol Data Units, or BPDUs. In STP, BPDUs are sent out about once every 2 seconds. BPDUs operate directly on the Ethernet link, meaning they do not exist on top of the Internet Protocol. You can usually identify them by their destination MAC address, 01:80:C2:00:00:00, which is the multicast MAC for STP.1

Most BPDUs are just sending existing port state, when idle, or at first setup, designating and blocking ports. However, when something changes, a Topology Change Notification (TCN) is propagated through the network, and all switches will recompute a valid spanning tree again, though this takes usually between 30 to 50 seconds.

Rapid Spanning Tree Protocol (RSTP)

Eventually, we got rapid STP, which is a little simpler in terms of operation (and possible states), and, well… rapid. And since 2004, it’s taken over, and plain STP is considered obsolete. Where, as said above, STP can take upwards of a full minute to recompute the network topology, RSTP can do it in three hello times, also known as the interval at which BPDUs are sent, the default is 2 seconds. Therefore, a change can be detected and acted on in 6 seconds by default. Honestly, “rapid” is right. And it’s backwards-compatible.

Among the changes to increase speed, the major one is that switches can actually propose spanning tree data to other switches, meaning instead of waiting for a full recalculation to propagate around the network (a few times, actually), if they agree on the proposals, that makes the calculations significantly quicker. Again, 6 seconds.

Multi Spanning Tree Protocol (MSTP)

And now, this gets weird. (R)STP works on physical Ethernet links, meaning that the shuffling around in failure conditions does not take into account the existence of VLANs at all. And since VLAN membership and filtering is configured per port on a switch, this can cause, well, let’s just say a lot of chaos. For this, we have MSTP, which I can summarize by saying it computes a valid spanning tree for each VLAN instead of one tree for the entire physical network. And this one introduces a lot of new terminology, so please just… sit tight.

MSTP is divided into regions, which are differentiated by their configuration name which is just an arbitrary string (of up to 32 characters) that a network admin can choose, a revision identifier which is an integer, from 0–65535, which is, for the most part, unused, though it’s still taken into account, meaning you can have separate regions with the same name by specifying a different rev number, and finally, an MD5 HMAC of the configuration table (that maps VLANs to trees), called the configuration digest. If all three parameters match, the two switches are in the same region.

Within a single region, there is one Internal Spanning Tree Instance (IST), and potentially many Multiple Spanning Tree Instances (MSTIs). The IST of a region is similar to the spanning tree of (R)STP, and is a default of sorts, any unmapped VLAN will follow the IST. MSTIs, however, can have multiple VLANs mapped to them, and each MSTI is a separate tree with a separate root bridge. (A VLAN can only belong to one MSTI.) MSTIs are identified with in integer, much the same a VLAN is.

There’s also a Common Spanning Tree (CST), which is the spanning tree of all regions in a network, and what bridges MSTP regions to (R)STP switches.

And just to make things confusing, the Common Internal Spanning Tree (CIST) is the spanning tree of every IST, connected across the IST roots per region, again, making it part of the default routing for unassigned VLANs.

So to summarize, MSTP will calculate a CST for the entire network, similar to if each switch was running plain (R)STP. Individual regions in a network will by default allocate an IST among themselves, and all ISTs are linked by the CIST, creating a global loop-free valid path. Finally, admins can configure MSTIs, and map VLANs to them, meaning that regardless of the defaults, different VLANs can have different trees because they might have different requirements for connectivity.2

Because that wasn’t confusing enough, I guess. Just add trees on top of trees on top of trees because One can never have enough trees Guys, guys… when we said “plant more trees”, I didn’t mean like this… But hey, it does live up to its name, so…

1. I was also going to point out the valid EtherType for STP, since that’s a thing (okay, EtherType is a two-byte field in Ethernet II, which identifies the protocol in use. 0x0800 is IPv4, and 0x86DD is IPv6, for example), but… I couldn’t find a value, but STP runs on top of IEEE 802.3 Ethernet not Ethernet II. And in 802.3, that same position is actually the size of the frame. ↩︎

2. I could very well be entirely wrong here, this is one of the few topics that I don’t have 100% understanding on… just 95%. ↩︎