What’s a Steiner Tree?

Any of you who have worked with VPLS or NG-MVPNs are likely already familiar with using Point-to-Multipoint (P2MP) LSPs to get traffic from a single ingress PE to multiple egress PEs.  The reason P2MP LSPs are desirable in these cases is that they reduce unnecessary replication by replicating traffic only where absolutely required, for example where a given P2MP LSP must diverge in order to reach two different PEs.

However, the sub-LSPs which are part of a given P2MP LSP typically traverse the shortest path from ingress to egress based on whatever user-defined constraints have been configured.  While this is fine for many applications, further optimization is sometimes desired so that additional bandwidth savings can be realized.

We will take a look at something called a Steiner Tree, which can help the network operator realize these additional savings when warranted, reducing the overall bandwidth used in the network and fundamentally changing the way in which paths are computed.

Let’s start by taking a look at a simple example in which RSVP is used to signal a particular P2MP LSP, but no constraints are defined.  All the links in this network have a metric of 10.  In this case, the sub-LSPs will simply traverse along the shortest path in the network, as can be seen in the diagram below.

Here we see a P2MP LSP where PE1 is the ingress PE and PE2, PE3, and PE4 are all egress nodes.  Since no constraints have been defined, the calculated ERO for each of the sub-LSPs follows the shortest path: one sub-LSP takes the PE1-P1-P2-PE2 path, another takes the PE1-P1-P3-PE3 path, and the third takes the PE1-P1-P4-PE4 path.  In this case, each sub-LSP has a total end-to-end cost of 30.

Shortest Tree
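
For reference, a minimal Junos-style sketch of how PE1 might signal such a P2MP LSP is shown below.  The LSP names, the p2mp tree name, and the loopback addresses are purely hypothetical, and no constraints are applied, so each sub-LSP simply follows the IGP shortest path.

protocols {
    mpls {
        label-switched-path pe2-sub-lsp {
            to 192.0.2.2;           # PE2 loopback (hypothetical)
            p2mp video-tree;        # sub-LSPs sharing this name form one P2MP LSP
        }
        label-switched-path pe3-sub-lsp {
            to 192.0.2.3;           # PE3 loopback (hypothetical)
            p2mp video-tree;
        }
        label-switched-path pe4-sub-lsp {
            to 192.0.2.4;           # PE4 loopback (hypothetical)
            p2mp video-tree;
        }
    }
}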

Under many circumstances this type of tree would be perfectly acceptable, especially when the end goal is to minimize end-to-end latency.  However, there are other cases where we may want to introduce additional hops in an effort to reduce overall bandwidth utilization.  This is where the concept of a minimum-cost tree, otherwise known as a Steiner Tree, comes into play.

This may seem counter-intuitive at first; after all, doesn’t a shortest-path tree attempt to minimize costs?  The answer is yes, but it usually only does so by looking at costs in terms of end-to-end metrics or hops through a network.  Once you understand the mechanics of the Steiner Tree algorithm, and how it attempts to minimize the total number of interconnects, it starts to make more sense.

According to Wikipedia, “the Steiner tree problem, or the minimum Steiner tree problem, named after Jakob Steiner, is a problem in combinatorial optimization, which may be formulated in a number of settings, with the common part being that it is required to find the shortest interconnect for a given set of objects”.

That’s a pretty fancy way of saying it’s attempting to optimize the path to be the shortest path possible while at the same time reducing the total number of interconnects between all devices to only those that are absolutely required.

Steiner Tree optimizations are very useful where an ingress PE must send large amounts of data to multiple PEs and overall bandwidth utilization must be kept to a minimum, perhaps because of usage-based billing scenarios in which reducing overall circuit utilization saves money.

Let’s take a look at an example, once again using the same network as before, but this time performing a Steiner Tree optimization whereby cost is measured in terms of overall bandwidth utilization.  In this case we still have the requirement to build the P2MP LSP from PE1 to PE2, PE3, and PE4.  However, this time we will compute an ERO such that replication takes place only where absolutely necessary, reducing the total number of interconnects and hence overall bandwidth utilization.

After performing a Steiner Tree path computation, we determine that PE3 is a more logical choice to perform the replication toward PE2 and PE4, even though it increases the overall end-to-end metric cost to 40.  The reason is that we have now eliminated the bandwidth utilization on the P1-P2, P2-PE2, P1-P4, and P4-PE4 links.  In effect, we’ve gone from utilizing bandwidth across seven links to only five.  If the P2MP LSP were servicing a 100 Mbps video stream, we have just reduced overall bandwidth utilization on the network as a whole by 200 Mbps.

Steiner Tree

One of the interesting side effects of this approach is that PE3 is now not only an egress node but also a transit node (for the sub-LSPs terminating at PE2 and PE4).  Because of this, the Penultimate Hop Popping (PHP) behavior on P3 is different in these types of scenarios: we don’t want P3 popping the outer label before sending frames to PE3, since PE3 may need to accommodate labeled packets heading to PE2 or PE4.  We will cover some of this in a subsequent article on the signaling mechanisms inherent in P2MP LSPs and some of the changes to the behavior in MPLS forwarding state.

Path computation for P2MP LSPs can be complex, especially when the goal is to create Steiner Trees.  The reason for this added complexity is that the placement of each sub-LSP has a direct correlation with that of the other sub-LSPs, which is contrary to what happens when shortest-path trees are calculated, where each sub-LSP may be signaled along its own unique path without regard to the placement of other sub-LSPs.

As with traditional LSPs, similar methods can be used to determine the paths through the network and hence the ERO, for example manual configuration or offline computation.
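
For example, if an offline tool (or the operator) has already computed the Steiner-style path for the PE2 sub-LSP, it can be pinned in place with an explicit path of strict hops.  The sketch below assumes hypothetical addresses and reuses the pe2-sub-lsp and video-tree names from the earlier example.

protocols {
    mpls {
        path via-pe3-to-pe2 {
            192.0.2.11 strict;      # P1 (hypothetical address)
            192.0.2.13 strict;      # P3
            192.0.2.3 strict;       # PE3 performs the replication
            192.0.2.2 strict;       # PE2
        }
        label-switched-path pe2-sub-lsp {
            to 192.0.2.2;
            p2mp video-tree;
            primary via-pe3-to-pe2; # use the manually computed ERO
        }
    }
}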

The easiest approach would be to use constructs like Link Coloring (Affinity Groups for you Cisco wonks) to influence path selection, for example, by coloring the PE1-P1, P1-P3, P3-PE3, PE3-PE2, and PE3-PE4 links with an included color, or coloring the remaining links with a different color and excluding that color from the LSP configuration.
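
A rough sketch of that approach in Junos might look like the following, assuming a hypothetical admin-group named steiner that has been assigned to each of the links listed above (only one interface is shown for brevity):

protocols {
    mpls {
        admin-groups {
            steiner 1;              # define the link color
        }
        interface ge-0/0/0.0 {      # a link on the desired tree (hypothetical interface)
            admin-group steiner;
        }
        label-switched-path pe2-sub-lsp {
            to 192.0.2.2;
            p2mp video-tree;
            admin-group {
                include-any steiner;   # CSPF considers only links carrying this color
            }
        }
    }
}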

However, this approach is merely a trick.  We are feeding constraints into the CSPF algorithm such that the calculated shortest path essentially mimics that of a Steiner Tree.  In other words, it’s not a true Steiner Tree calculation, because the goal was not to reduce the total number of interconnects but rather to only utilize links of an included color.

Furthermore, such an approach doesn’t easily accommodate failure scenarios in which PE3 may go down: even though Fast Reroute or Link/Node Protection may be desired, if the remaining links do not carry the included colors, the routers may be unable to compute an ERO for signaling the protection paths.

One workaround is to configure the Fast Reroute detours or Link/Node Protection bypass LSPs with more relaxed constraints, such that any potential path might be used.  More commonly, however, additional computations are performed using traditional offline approaches (modeling tools such as those provided by WANDL, OPNET, or Cariden) which factor in both steady-state and failure scenarios to help the operator determine the optimal placement of all elements.

An interesting side note is that there are some significant developments underway whereby online computation can be performed in such a way as to optimize all P2MP LSPs network-wide, using something known as a Path Computation Element (PCE).  A PCE is essentially any entity capable of performing path computation for a set of paths throughout a network by applying various constraints.  It looks to be especially useful in large carrier networks consisting of many LSPs, and especially so in the case of Steiner Tree P2MP LSPs, where the placement of each sub-LSP is highly dependent on that of the others.  See the charter of the PCE Working Group in the IETF for more information on this and other related developments.

As a side note, it should be fairly evident that in order to perform path optimizations on anything other than shortest-path trees (i.e. Steiner Trees or any other type of tree based on user-defined constraints), RSVP signaling must be used so that a path can be signaled along the computed ERO.  LDP can certainly be used to build P2MP LSPs (aka mLDP); however, much like traditional LSPs built via LDP, the paths simply follow the IGP shortest path.

Stay tuned for more articles on P2MP LSPs and some of the other underpinnings behind many of the next-generation MPLS services being commonly deployed…

Book Review :: JUNOS High Availability: Best Practices for High Network Uptime

JUNOS High Availability: Best Practices for High Network Uptime
by James Sonderegger, Orin Blomberg, Kieran Milne, Senad Palislamovic
Paperback: 688 pages
Publisher: O’Reilly Media
ISBN-13: 978-0596523046

5 stars: High Praises for JUNOS High Availability

Building a network capable of providing connectivity for simple business applications is a fairly straightforward and well-understood process. However, building networks capable of surviving varying degrees of failure and providing connectivity for mission-critical applications is a completely different story. After all, what separates a good network from a great network is how well it can withstand failures and how rapidly it can respond to them.

While there are a great deal of books and resources available to assist the network designer in establishing simple network connectivity, there aren’t many books which discuss the protocols, technologies, and the myriad ways in which high availability can be achieved, much less tie it all together into one consistent thread. “JUNOS High Availability” does just that, in essence providing a single, concise resource covering all of the bits and pieces which are required in highly available networks, allowing the network designer to build networks capable of sustaining five, six, or even seven nines of uptime.

In general, there are a lot of misconceptions and misunderstandings amongst Network Engineers with regards to implementing high availability in Junos. One only needs to look at the fact that Graceful Restart (GR) protocol extensions and Graceful Routing Engine Switchover (GRES) are often mistaken for the same thing, thanks in no small part to the fact that these two technologies share similar letters in their acronyms. This book does a good job of clarifying the difference between the two and steers clear of the pitfalls typically prevalent in coverage of the subject matter.

The chapter on ‘Control Plane High Availability’ covers the technical underpinnings of the underlying architecture on most Juniper platforms; coverage of topics like the separation between the control and forwarding planes, and kernel replication between the Master and Backup Routing Engine, gives the reader a solid foundation for understanding concepts like Non-Stop Routing, Non-Stop Bridging, and In-Service Software Upgrades (ISSU). In particular I found this book to be very useful on several consulting engagements in which seamless high availability was required during software upgrades, as the chapter on ‘Painless Software Upgrades’ discusses the methodology for achieving ISSU and provides a checklist of things to be performed before, during, and after the upgrade process. Similarly, I found the chapter on ‘Fast High Availability Protocols’ to be very informative, providing excellent coverage of BFD as well as the differences between Fast Reroute and Link/Node Protection.

Overall I feel this book is a valuable addition to any networking library, and I reference it often when I need to implement certain high availability mechanisms or simply to evaluate the applicability of one mechanism versus another for a certain deployment. The inclusion of cost considerations in a high availability design is a welcome addition and one that all too many authors fail to cover. Naturally, it only makes sense that costs should be factored into the equation, even when high availability is the desired end-state, in order to ensure that ultimately the business is profitable. If I had to make one suggestion for this book, it is that there should be additional coverage of implementing High Availability on the SRX Series Services Gateways using JSRP, as this is a fundamental high availability component within Juniper’s line of security products. To the authors’ credit, however, this book was written just as the SRX line was being released, so I don’t fault them for providing limited coverage. Perhaps more substantial coverage could be provided if a Second Edition is published.

The bottom line is this – if you are a Network Engineer or Architect responsible for the continuous operation or design of mission-critical networks, “JUNOS High Availability” will undoubtedly serve as an invaluable resource. In my opinion, the chapters on ‘Control Plane High Availability’, ‘Painless Software Upgrades’, and ‘Fast High Availability Protocols’ are alone worth the entire purchase price of the book. The fact that you get a wealth of information beyond that in addition to the configuration examples provided makes this book a compelling addition to any networking library.

Implementing Provider-Provisioned VPNs using Route Reflectors

MPLS/BGP Provider-Provisioned VPNs, such as those proposed in RFC 4364 (formerly RFC 2547) or draft-kompella variants, suffer from some scalability issues due to the fact that all PE routers are required to have a full iBGP mesh in order to exchange VPN-IPv4 NLRI and associated VPN label information.  In a modern network consisting of a large number of PE devices, it becomes readily apparent that this requirement can quickly become unmanageable.

The formula to compute the total number of sessions in an iBGP full mesh is n * (n-1)/2.  10 PE devices would require a total of 45 iBGP sessions (10 * 9/2 = 45).  However, by simply adding 5 additional PEs into this environment, the total number of sessions grows quadratically to 105 (15 * 14/2).  Scalability issues arise because maintaining a session from each PE to every other PE is an operational nightmare, and control plane resources are quickly exhausted.

An alternative that has gained widespread adoption is to utilize Route Reflectors to reflect the VPN-IPv4 NLRI and associated VPN label between PE devices.  However, several issues arise when using Route Reflectors in such an environment.  In a normal environment without Route Reflectors, MPLS tunnels exist between each pair of PE routers, so when the VPN-IPv4 NLRI and associated VPN label are received, a PE router can recurse through its routing table to find the underlying MPLS tunnel used to reach the remote BGP next-hop within the VPN-IPv4 NLRI.  In the Route Reflection model, the Route Reflector typically doesn’t have an MPLS tunnel to each PE for which it is receiving VPN-IPv4 NLRI.  As a result, these routes never become active and are therefore not candidates for reflection back to other client and non-client peers.

A few methods have been developed to circumvent this issue.  One method is to simply define MPLS tunnels from the Route Reflector to each PE.  This solves the problem by allowing the Route Reflector to find a recursive match (i.e. an MPLS tunnel) in order to reach the remote PE.  However, this approach has the drawback that it requires a large number of MPLS tunnels to be configured which serve only to allow the received VPN-IPv4 NLRI to be considered active.  Remember, these tunnels are completely useless for the actual forwarding of data; they are used only within the control plane to instantiate routes.

An alternative and much more graceful solution to this problem is to configure the Route Reflector with a static discard route within the routing table which is used to resolve BGP next-hops in MPLS environments (inet.3, for example, in JUNOS).  This static discard route serves only as a recursive match when incoming VPN-IPv4 NLRI are received, for the express purpose of making these routes active and therefore candidates for reflection.  In JUNOS, one can accomplish this with something along the lines of the following configuration:
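
routing-options {
    rib inet.3 {
        static {
            route 0.0.0.0/0 discard;   # recursive match for any BGP next-hop; never used for forwarding
        }
    }
}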

With the above, any VPN-IPv4 NLRI received from a PE router is immediately made active, because a static route now exists in inet.3, the routing table used in JUNOS to resolve BGP next-hops in MPLS environments.

An excellent whitepaper entitled “BGP Route Reflection in Layer 3 VPN Networks” expands upon this and describes the benefits of using Route Reflection in such environments. It also builds the case for using a distributed Route Reflection design to further enhance scalability and redundancy.

One thing to keep in mind is that with the Route Reflector approach, we have merely moved the problem from the PE devices to the Route Reflector.  Although it minimizes the number of iBGP sessions required on the PE devices, the Route Reflector must be capable of supporting a large number of iBGP sessions and, in addition, must be able to store all of the VPN-IPv4 NLRI for all of the VPNs it is servicing.  It is highly recommended that adequate memory be in place on the Route Reflector in order to store this large amount of routing information.

Finally, while using Route Reflectors is an acceptable interim solution to the scaling concerns with Provider-Provisioned VPNs, it is not clear whether this approach is sufficient for the long term.  Several other options are being examined, some of which are outlined in a presentation entitled “2547 L3VPN Control Plane Scaling” given at APRICOT in Kyoto, Japan in 2005.

Book Review :: MPLS-Enabled Applications: Emerging Developments and New Technologies

MPLS-Enabled Applications: Emerging Developments and New Technologies
by Ina Minei, Julian Lucek
Paperback: 526 pages
Publisher: Wiley
ISBN-13: 978-0470986448

5 stars: Excellent coverage of VPLS and Multicast over Layer 3 VPNs

Recently I had to work on a project which involved demonstrating Multicast over Layer 3 VPN interoperability between Cisco and Juniper. I spent several days reading through all the RFCs and working-group drafts which pertained to this subject matter, after which I still had many unanswered questions. In order to round out my understanding, I decided to order the Second Edition of ‘MPLS-Enabled Applications’. Looking back, I wish I had read this book instead of wasting my time reading the various RFCs and working-group drafts. This book answered all of my questions and went above and beyond to give me a solid understanding of the concepts and their application. As other reviewers have pointed out, often one needs to read a book to understand the technology basics, and then refer to RFCs or working-group drafts in order to keep abreast of the latest changes. Not so with this book… In fact, this book is so current that reading the working-group drafts is largely unnecessary. It is incredibly comprehensive, concise, and gives the reader a thorough understanding of the business drivers. Furthermore, it illustrates the various ways in which MPLS services can be offered and outlines the pros and cons of each approach so that the network designer can make intelligent decisions with regards to implementation.

In addition to the great coverage provided by the First Edition, the Second Edition has been updated to reflect newer trends and applications, such as the transport of IPv6 over an IPv4 MPLS core and detailed coverage of end-to-end and local protection schemes in MPLS networks.  Likewise, the chapter previously called “Point-to-Multipoint LSPs” has been renamed “MPLS Multicast”, with much more detailed coverage of the P2MP hierarchy and of forwarding-plane and control-plane operation.  The biggest value for me was the addition of a completely new chapter on “Multicast over Layer 3 VPNs”, which provides comprehensive coverage of this emerging technology and fully illustrates the operation of both the PIM/GRE approach and the NG-MVPN approach utilizing BGP and P2MP LSPs.  Finally, the addition of a chapter on “MPLS in Access Networks” is well deserved, seeing as Ethernet is quickly becoming the access technology of choice and MPLS will likely be utilized as an overlay in order to realize the full potential of Ethernet in these environments.

This book has earned a spot on my bookshelf as one of my most coveted resources, and I refer to it quite often to refresh my memory on the myriad workings of various functions within MPLS. I wish I could give this book a rating higher than five stars! I can’t overemphasize how exceptional this book is. If you are in the market for a book covering MPLS and emerging applications offered on MPLS networks, this single book should be at the top of your list!