JNCIE Tips from the Field :: Summarization Made Easy

Today we’ll start with a series of articles covering tips and techniques that might be utilized by JNCIE candidates, whether pursuing the JNCIE-SP, JNCIE-ENT, or even the JNCIE-SEC.  The tips and techniques I will be covering might prove to be useful during a lab attempt but could also be used in real-world scenarios to save time and minimize configuration burden in addition to eliminating mistakes that might otherwise be made.  I want everyone to understand that what I am about to write is simply a technique.  I am not divulging any materials or topics which are covered under NDA.

Continue reading “JNCIE Tips from the Field :: Summarization Made Easy”

IETF Provides New Guidance on IPv6 End-Site Addressing

I’ve always been at odds with the recommendation in RFC 3177 towards allocating /48 IPv6 prefixes to end-sites.  To me this seemed rather short-sighted, akin to saying that 640K of memory should be enough for anybody.  It’s essentially equivalent to giving out /12s in the IPv4 world which in this day and age might seem completely ridiculous, but let us not forget that in the early days of IPv4 it wasn’t uncommon to get a /16 or even a /8 in some cases.

Granted, I know there are quite a few more usable bits in IPv6 than there are in IPv4, but allocating huge swaths of address space simply because it’s there and we haven’t thought of all the myriad ways it could be used in the future just seems outright wasteful. Continue reading “IETF Provides New Guidance on IPv6 End-Site Addressing”

Book Review :: JUNOS High Availability: Best Practices for High Network Uptime

JUNOS High Availability: Best Practices for High Network Uptime
JUNOS_High_Availabilityby James Sonderegger, Orin Blomberg, Kieran Milne, Senad Palislamovic
Paperback: 688 pages
Publisher: O’Reilly Media
ISBN-13: 978-0596523046

5starsHigh Praises for JUNOS High Availability

Building a network capable of providing connectivity for simple business applications is a fairly straightforward and well-understood process. However, building networks capable of surviving varying degrees of failure and providing connectivity for mission-critical applications is a completely different story. After all, what separates a good network from a great network is how well it can withstand failures and how rapidly it can respond to them.

While there are a great deal of books and resources available to assist the network designer in establishing simple network connectivity, there aren’t many books which discuss the protocols, technologies, and the myriad ways in which high availability can be achieved, much less tie it all together into one consistent thread. “JUNOS High Availability” does just that, in essence providing a single, concise resource covering all of the bits and pieces which are required in highly available networks, allowing the network designer to build networks capable of sustaining five, six, or even seven nines of uptime.

In general, there are a lot of misconceptions and misunderstandings amongst Network Engineers with regards to implementing high availability in Junos. One only needs to look at the fact that Graceful Restart (GR) protocol extensions and Graceful Routing Engine Switchover (GRES) are often mistaken for the same thing, thanks in no small part to the fact that these two technologies share similar letters in their acronyms. This book does a good job of clarifying the difference between the two and steers clear of the pitfalls typically prevalent in coverage of the subject matter. The chapter on ‘Control Plane High Availability’ covers the technical underpinnings of the underlying architecture on most Juniper platforms; coverage of topics like the separation between the control and forwarding planes, and kernel replication between the Master and Backup Routing Engine give the reader a solid foundation to understand concepts like Non-Stop Routing, Non-Stop Bridging, and In-Service Software Upgrades (ISSU). In particular I found this book to be very useful on several consulting engagements in which seamless high availability was required during software upgrades as the chapter on ‘Painless Software Upgrades’ discusses the methodology for achieving ISSU and provides a checklist of things to be performed before, during, and after the upgrade process. Similarly, I found the chapter on ‘Fast High Availability Protocols’ to be very informative as well, providing excellent coverage of BFD, as well as the differences between Fast Reroute vs. Link and Node Protection.

Overall I feel this book is a valuable addition to any networking library and I reference it often when I need to implement certain high availability mechanisms, or simply to evaluate the applicability of a given mechanism versus another for a certain deployment. The inclusion of factoring costs into a high availability design is a welcome addition and one that all too many authors fail to cover. Naturally, it only makes sense that costs should be factored into the equation, even when high availability is the desired end-state, in order to ensure that ultimately the business is profitable. If I had to make one suggestion for this book it is that there should be additional coverage of implementing High Availability on the SRX Series Services Gateways using JSRP, as this is a fundamental high availability component within Juniper’s line of security products. To the authors credit however, this book was written just as the SRX line was being released, so I don’t fault the authors for providing limited coverage. Perhaps more substantial coverage could be provided in the future if a Second Edition is published.

The bottom line is this – if you are a Network Engineer or Architect responsible for the continuous operation or design of mission-critical networks, “JUNOS High Availability” will undoubtedly serve as an invaluable resource. In my opinion, the chapters on ‘Control Plane High Availability’, ‘Painless Software Upgrades’, and ‘Fast High Availability Protocols’ are alone worth the entire purchase price of the book. The fact that you get a wealth of information beyond that in addition to the configuration examples provided makes this book a compelling addition to any networking library.

Reality Check: Traditional Perimeter Security is Dead!

Recently I came across a marketing event promoted by a network integrator which touted industry leading solutions to assist customers in determining “what was lurking outside their network”, as can be seen in the screenshot below. Please note all references to the company have been removed to protect the not-so-innocent.

lurking

In this day and age, it still surprises me when supposedly network savvy folks are still thinking of network security in terms of a traditional perimeter made up of firewalls or IPS devices. The truth of the matter is that the traditional perimeter vanished quite a few years ago.

Only looking at the perimeter gives the end-user a a false sense of protection. It completely fails to recognize the dangers of mobility in today’s traditional workplace environment. Users roam. They might bring in viruses or other Trojans INSIDE your network where they are free to roam unencumbered. In the worst of these cases, the perimeter is only secured in one direction, giving outbound traffic unfettered access and completely ignoring that data which might be leaked from hosts inside your network destined to hosts outside your network, as might be the case with Keyloggers or other similar types of rogue programs.

Furthermore, in today’s environment composed of virtualized machines, the line gets even blurrier which is why we are starting to see solutions from startup vendors such as Altor Networks. It’s one thing when we are dealing with physical hosts in the traditional sense, but what about the situation when you are dealing with a multitude of virtual machines on the same physical hosts which must talk to each other?

When you take a data-focused approach instead of a technology-focused approach, the problem and its solutions start to make more sense.   The perimeter should be viewed as the demarcation between the data and any I/O fabric providing connectivity between that data and some external entity. This is the domain of things like Data Loss Prevention (DLP), Network Access Control (NAC), and Virtual Hypervisor Firewalls in addition to that of traditional security devices.

trojan-horse

To deal with the realities of today, we must start to think of network security in terms of Hotels vs. Castles. In the Castle model, we have a big wall around our infrastructure. We might have a moat and some alligators, and perhaps we only lower our drawbridge for very special visitors. This model tends to keep a good majority of the enemies at bay, but it completely ignores that which might already be inside your network (think in terms of the Trojan horse as told in Virgil’s epic poem ‘The Aeneid’).

What is more commonly being employed is that of the Hotel Model.  Initially, to gain entrance into the hotel itself, we must check in with the Concierge and get our room key.  Once we have our room key, we have limited access to our own room, and perhaps some shared facilities like the pool or the gym.  In this model, we are unable to enter into a room in which we do not have access.  The key word here is LIMITED access.

An all-inclusive security posture looks at the network from a holistic point of view.  The principles of Defense-in-Depth will make evident the failings of the traditional perimeter model.  The traditional perimeter is dead.  The perimeter is wherever the data is.

What’s the BFD with BFD?

Many networks today are striving for “five nines” high availability and beyond. What this means is that network operators must configure the network to detect and respond to network failures as quickly as possible, preferably on the order of milliseconds. This is in contrast to the failure detection inherent in most routing protocols, which is typically on the order of several seconds or more. For example, the default hold-time for BGP in JUNOS is 90 seconds, which means that in certain scenarios BGP will have to wait for upwards of 90 seconds before a failure is detected, during which time a large percentage of traffic may be blackholed. It is only after the failure is detected that BGP can reconverge on a new best path.

Another example is OSPF which has a default dead interval of 40 seconds, or IS-IS which has a default hold-time of 9 seconds (for DIS routers), and 27 seconds (for non-DIS routers). For many environments which support mission-critical data, or those supporting Voice/Video or any real-time applications, any type of failure which isn’t detected in the sub-millisecond range is too long.

While it is possible to lower timers in OSPF or IS-IS to such an extent that a failure between two neighbors can be detected rather quickly (~1 second), it comes at a cost of increased protocol state and considerable burden on the Routing Engine’s CPU.  As an example, let us consider the situation in which a router has several hundred neighbors. Maintaining subsecond Hello messages for all of these neighbors will dramatically increase the amount of work that the Routing Engine must perform. Therefore, it is a widely accepted view that a reduction in IGP timers is not the overall best solution to solve the problem of fast failure detection.

Another reason that adjusting protocol timers is not the best solution is that there are many protocols which don’t support a reduction of timers to such an extent that fast failure detection can be realized. For example, the minimum BGP holdtime is 20 seconds, which means that the best an operator can hope for is a bare minimum of 20 seconds for failure detection.

Notwithstanding, this does nothing about situations in which there is no protocol at all, for example, Ethernet environments in which two nodes are connected via a switch as can be seen in the figure below.  In this type of environment, R1 has no idea that R2 is not reachable, since R1’s local Ethernet segment connected to the switch remains up.  Therefore, R1 can’t rely on an ‘Interface Down’ event to trigger reconvergence on a new path and instead must wait for higher layer protocol timers to age out before determining that the neighbor is not reachable.  (Note to the astute reader: Yes, Ethernet OAM is certainly one way to deal with this situation, but that is a discussion which is beyond the scope of this article).

L2-Connectivity

Essentially, at the root of the problem is either a lack of suitable protocols for fast failure detection of lower layers, or worse, no protocol at all.  The solution to this was the development of Bidirectional Forwarding Detection, or BFD, developed jointly by Cisco and Juniper.  It has been widely deployed and is continuing to gain widespread acceptance, with more and more protocols being adapted to use BFD for fast failure detection. 

So what is the Big Freaking Deal with Bidirectional Forwarding Detection anyway and why are so many operators implementing it in their networks?  BFD is a simplistic hello protocol with the express purpose of rapidly detecting failures at lower layers.  The developers wanted to create a low overhead mechanism for exchanging hellos between two neighbors without all the nonessential bits which are typical in an IGP hello or BGP Keepalives.  Furthermore, the method developed had to be able to quickly detect faults in the Bidirectional path between two neighbors in the forwarding plane.  Originally, BFD was developed to provide a simple mechanism to be used on Ethernet links, as in the example above, prior to the development of Ethernet OAM capabilities.  Hence, BFD was developed with this express purpose in mind with the intent of providing fault identification in an end-to-end path between two neighbors.

Once BFD was developed, the protocol designers quickly found that it could be used for numerous applications beyond simply Ethernet.  In fact, one of the main benefits of BFD is that it provides a common method to provide for failure detection for a large number of protocols, allowing a singular, centralized method which can be reused.  In other words, let routing protocols do what they do best – exchange routing information and recalculate routing tables as necessary, but not perform identification of faults at lower layers.  An offshoot of this is that it allows network operators to actually configure higher protocol timer values for their IGPs, further reducing the burden placed on the Routing Engine.

BFD timers can be tuned such that failure detection can be realized in just a few milliseconds, allowing for failure and reconvergence to take place in similar timeframes to that of SONET Automatic Protection Switching.  A word of caution – while BFD can dramatically decrease the time it takes to detect a failure, operators should be careful when setting the intervals too low.  Very aggressive BFD timers could cause a link to be declared down even when there is only a slight variance in the link quality, which could cause flapping and other disastrous behavior to ensue.  The best current practice with regards to BFD timers is to set a transmit and receive interval of 300ms and a multiplier of 3, which equates to 900ms for failure detection.  This is generally considered fine for most environments, and only the most stringent of environments should need to set their timers more aggressive than this.

One question that is commonly asked is how is it that BFD can send hello packets in the millisecond range without becoming a burden on the router.  The answer to this question lies in the fact that BFD was intended to be lightweight and run in the forwarding plane, as opposed to the control plane (as is the case with routing protocols).  It is true that while early implementations of BFD ran on the control plane, most of the newer implementations run in the forwarding plane, taking advantage of the dedicated processors built into the forwarding plane and alleviating the burden which would otherwise be place on the RE.  In JUNOS releases prior to JUNOS 9.4, BFD Hello packets were generated via RPD running on the RE.  In order to enable BFD to operate in the PFE in JUNOS versions prior to JUNOS 9.4, the Periodic Packet Management Daemon (PPMD) had to be enabled, using the command ‘set routing-options ppm delegate processing’.  In JUNOS 9.4 and higher this is the default behavior and BFD Hello packets are automatically handled by PPMD operating within the PFE.

Implementing Provider-Provisioned VPNs using Route Reflectors

MPLS/BGP Provider-Provisioned VPNs, such as those proposed in RFC 4364 (formerly RFC 2547) or draft-kompella variants, suffer from some scalability issues due to the fact that all PE routers are required to have a full iBGP mesh in order to exchange VPN-IPv4 NLRI and associated VPN label information.  In a modern network consisting of a large number of PE devices, it becomes readily apparent that this requirement can quickly become unmanageable.

The formula to compute the number of sessions for an iBGP full mesh is n * (n-1)/2.  10 PE devices would only require a total of 45 iBGP sessions (10 * (9)/2 = 45).  However, by simply adding 5 additional PEs into this environment your total number of sessions increases exponentially to 105.  Scalability issues arise because maintaining this number of iBGP sessions on each PE is an operational nightmare; similarly control plane resources are quickly exhausted.

An alternative to this that has gained widespread adoption is to utilize Route Reflectors to reflect the VPN-IPv4 NLRI and associated VPN label between PE devices.  However, several issues arise when using Route Reflectors in such an environment.  In a normal environment without the use of Route Reflectors, MPLS tunnels exist between each PE router such that when the VPN-IPv4 NLRI and associated VPN label are received, a PE router can recurse through its routing table to find the underlying MPLS tunnel used to reach the remote BGP next-hop within the VPN-IPv4 NLRI.  In the Route Reflection model, the Route Reflector typically doesn’t have an MPLS tunnel to each PE for which it is receiving VPN-IPv4 NLRI.  Therefore, these routes never become active and are therefore not candidates for reflection back to other client and non-client peers.

A few methods have been developed which circumvent this issue.  One method is to simply define MPLS tunnels from the Route Reflector to each PE.  This solves the problem by allowing the Route Reflector to find a recursive match (i.e. MPLS tunnel) in order to reach the remote PE.  However, this approach suffers from the drawback in that it requires a whole bunch of MPLS tunnels to be configured which only serve to allow the received VPN-IPv4 NLRI to be considered active.  Remember, these tunnels are completely useless in that they will never be used for the actual forwarding of data, they are only used within the control plane to instantiate routes.

An alternative and much more graceful solution to this problem is to configure the Route Reflector with a static discard route within the routing table which is used to reference BGP next-hops in MPLS environments (inet.3 for example in JUNOS).  This static discard route only serves to function as a recursive match when incoming VPN-IPv4 NLRI are received for the express purpose of making these routes active and therefore candidates for redistribution.  In JUNOS, one can accomplish this using the following configuration:

routing-options {
    rib inet.3 {
        static {
            route 0.0.0.0/0 discard;
        }
    }
}

With the above, any VPN-IPv4 NLRI received from a PE router is immediately made active due to the fact that a static route has been created in inet.3 which is the routing table used in JUNOS to recurse for BGP next-hops in MPLS environments.

An excellent whitepaper entitled “BGP Route Reflection in Layer 3 VPN Networks” expands upon this and describes the benefits of using Route Reflection in such environments. It also builds the case for using a distributed Route Reflection design to further enhance scalability and redundancy.

One thing to keep in mind is that with the Route Reflector approach, we have merely moved the problem set from that of the PE device to that of the Route Reflector.  Although it minimizes the number of iBGP sessions required on PE devices, the Route Reflector must be capable of supporting a large number of iBGP sessions and in addition, must be able to store all of the VPN-IPv4 NLRI for all of the VPNs for which it is servicing.  It is highly recommended that adequate amounts of memory are in place on the Route Reflector in order to store this large amount of routing information.

Finally, while using Route Reflectors is an acceptable solution in the interim to addressing scaling concerns with Provider-Provisioned VPNs, it is not clear if this approach is sufficient for the long term.  There are several other options being examined, with some of them outlined in a presentation entitled 2547 L3VPN Control Plane Scaling” given at APRICOT in Kyoto, Japan in 2005.

Book Review :: IS-IS: Deployment in IP Networks

IS-IS_DeploymentIS-IS: Deployment in IP Networks
by Russ White, Alvaro Retana
Hardcover: 320 pages
Publisher: Pearson Education
ISBN-13: 978-0201657722

2starsBetter off choosing an alternative selection

As IS-IS is one of the more esoteric protocols, understood only by a few people in large scale ISP environments, I thought this book would be a welcome addition to my library as there isn’t much else on the market covering this protocol. There are of course ISO 10589 and RFC 1195 which covers these protocols, but seeing as this is a short book I thought it might be able to shed some light on an otherwise complex protocol.

In reviewing this book I’ve come up disappointed in general. There are certainly a few golden nuggets and I give the book a couple of stars just for attempting to bridge the gap between the purely theoretical and the purely vendor specific. However, the book comes up short on most other points. Often times I found myself wanting to scrap this book in favor of some of the other selections on the market, but since I have respect for these authors I read the whole book hoping that they might be able to redeem themselves by the time I finished.

Obviously the authors have a great deal of knowledge about the subject, and I don’t fault them entirely. The quality of the editing is poor with many grammatical and syntactical errors littered throughout the text. There are abundant instances throughout the book where the diagrams used do not match the text describing them. I was rather disappointed because I usually find that Addison-Wesley publishes some of the best texts on the market.

All in all, I thought this book could have been a lot better than it was. After all, these authors have several other titles under their belt, most notably “Advanced IP Network Design”. But in this case, I would say that you are better off looking for other similar titles available on the market, such as Jeff Doyle’s “Routing TCP/IP Volume 1” or “The Complete IS-IS Routing Protocol” by Hannes Gredler and Walter Goralski.