ccie blog

BGP Recursive Routing Failure

I came across a really awesome problem involving a BGP next hop recursion failure. I bet I get asked this when I do my CCIE lab exam. Let me show you a basic BGP network setup and introduce a very annoying problem to solve. Below is the network, with the current configurations underneath.


router bgp 1
 no synchronization
 bgp router-id
 bgp log-neighbor-changes
 neighbor remote-as 2
 neighbor ebgp-multihop 2
 neighbor update-source Loopback0
 no auto-summary
interface Loopback0
 ip address
ip route
router bgp 2
 no synchronization
 bgp router-id
 bgp log-neighbor-changes
 neighbor remote-as 1
 neighbor ebgp-multihop 2
 neighbor update-source Loopback0
 no auto-summary
interface Loopback0
 ip address
ip route

Understanding The OSPF Forwarding Address

There seems to be a lot of confusion on the web about what the OSPF forwarding address is. So I’ve decided to create a lab to show its purpose and how it works.

In this lab I used the topology below to recreate the scenario the forwarding address was designed for. The actual use case was for BGP rather than EIGRP, but my IOS image was crap & didn’t support BGP, so I could only use EIGRP for this lab (however, the principle is exactly the same).

R1, R2 and R3 have interfaces in OSPF area 0. R2 & R3 also have an OSPF relationship in area 1 over the LAN segment connecting to the switch. But only R2 is running EIGRP with R4.

OSPF Forwarding Address

In this scenario (before OSPF was modified to allow a non-zero value for the forwarding address) only R2 does the redistribution between EIGRP and OSPF. What this means is that all routers (including R3) have to go via R2 in order to reach the external prefix. The reason is that R2 effectively advertises: to reach this prefix, everyone must solve the shortest path to me (the ASBR), and I know the shortest path to the destination.

So when this issue was pointed out to John Moy (one of the OSPF engineers), he sat down and figured that, provided all links in this network are of equal bandwidth, wouldn’t it be better if R3 just went directly towards R4 to reach the prefix instead? Since all R2 is going to do is send the traffic to a next hop IP that is on the same Ethernet segment as R3 anyway, it doesn’t seem logical that R3 should have to take a path via R2 first. The engineers then decided to change OSPF so that R2 could say: to reach this prefix, just solve the shortest path to the next hop instead. We will take a look at how this is achieved in the lab in just a moment. But what I want you to know is that this was the original reason for allowing the forwarding address to be changed.
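The decision each router makes for an external route can be sketched roughly like this. It is a minimal Python illustration rather than what IOS actually runs, and the addresses and function name are made up for the example (documentation ranges), since the lab addressing isn't shown in this excerpt. The ASBR's router-id stands in for "the path to the ASBR".

```python
import ipaddress

def external_route_target(forwarding_address: str, asbr_router_id: str) -> str:
    """Return what a router solves its shortest path towards
    when it installs an external (type 5) route."""
    fa = ipaddress.ip_address(forwarding_address)
    if fa == ipaddress.ip_address("0.0.0.0"):
        # Forwarding address not set: traffic goes via the advertising ASBR.
        return asbr_router_id
    # Non-zero forwarding address: traffic goes straight to that next hop.
    return str(fa)

# Hypothetical values: R2's router-id and R4's address on the shared LAN.
ASBR_R2 = "192.0.2.2"
R4_LAN = "198.51.100.4"

print(external_route_target("0.0.0.0", ASBR_R2))  # path is solved towards R2
print(external_route_target(R4_LAN, ASBR_R2))     # path is solved towards R4
```

One thing the sketch leaves out: a non-zero forwarding address is only usable if the router can reach it through an OSPF intra-area or inter-area route.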

OSPF NSSA Translator Election & Forwarding Address

In this section I will look at who does the NSSA type 7 to 5 translation & how we can use the forwarding address to control routing towards external routes that have been brought into the OSPF domain via an NSSA. The topology we will use is below. Currently R4 is redistributing the prefix into OSPF and all routers have reachability to it.

OSPF Forwarding Address With An NSSA & Two ABRs

In a network like the one above, only one ABR connected to the NSSA is elected to do the LSA type 7 to 5 translation for the EIGRP route. The decision is based on who has the highest router-id. So if we look at the OSPF database for this external prefix on R1, we should see that the advertising router is R3, because its router-id is higher than R2’s and it therefore did the translation.
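The election rule is easy to sketch in Python. The router-ids below are hypothetical (the real ones are not shown in this excerpt); the only point is that router-ids are compared as 32-bit numbers, not as strings.

```python
import ipaddress

def elect_translator(abr_router_ids):
    """Pick the NSSA ABR with the numerically highest router-id."""
    return max(abr_router_ids, key=lambda rid: int(ipaddress.IPv4Address(rid)))

# Hypothetical router-ids for the two ABRs attached to the NSSA.
abrs = {"R2": "2.2.2.2", "R3": "3.3.3.3"}

winner_id = elect_translator(abrs.values())
winner = next(name for name, rid in abrs.items() if rid == winner_id)
print(winner, winner_id)  # R3 3.3.3.3
```

Comparing numerically matters: as plain strings "9.255.255.255" would sort above "10.0.0.1", which is the wrong way round.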

Understanding Default Routing With OSPF NSSAs

Default routing in NSSAs needs a little bit of thought before you do it. The reason is that you can generate a type 5 LSA or a type 7 LSA for your default route depending on the command you input, and obviously the type 5 default won’t go into the NSSA. There are some other really interesting factors that you should take into consideration too, and I will talk about these in this post. The topology I will use is shown below.

OSPF_Understanding Default Routing

Understanding OSPF External LSA Recursion

I wanted to fully understand how routers recurse LSAs for external routes. Since it’s probably the most complex thing in OSPF, I decided to lab it up and explain how it works using real examples. Since this topic is pretty intense, this post is not for the faint hearted. It will be long but extremely useful for understanding how OSPF works on a fundamental & CCIE level. I really enjoyed learning about this topic & it has significantly improved my understanding of OSPF. FYI I have also included how the metric is calculated at the bottom of each scenario.

I figured that external routes can be learned in 3 ways:

  • Scenario 1: From an Intra-Area ASBR
  • Scenario 2: From an Inter-Area ASBR
  • Scenario 3: From an ASBR in an NSSA, and then learned by area 0

My topology accommodates each of these scenarios. However, not all routers will be used in each scenario, so it’s actually easier to follow along than it looks. FYI, take note of the following facts:

  • R4 and R6 are doing mutual redistribution between OSPF and EIGRP
  • None of the router-IDs have been advertised into OSPF

Understanding LSA Recursion

Scenario 1:  Recursion of a Type 5 LSA learned from an Intra-Area ASBR

PPPoE with BT Infinity

This post is a little off-topic from the CCIE study material, but still relevant in terms of PPP. Anyway, I recently had some ethical problems with my ISP (BT) and their BT Infinity product (if you’re from the US assume I’m talking about a provider like Comcast, if you are from France assume it’s Orange, or just think of some big ISP). It’s basically a Fiber To The Cabinet (FTTC) product, with copper from the cabinet in the street to your home. Since BT reserve the right to remote-manage the router they supplied (even though it’s not a managed service and the customer technically owns the router), they regularly push updates, firewall changes and other stuff to the box using a protocol called CWMP. Long story short, their constant updates rebooted the router on a regular basis, so I am now replacing it with a Cisco 887VA. The configuration required to get the PPPoE WAN side working is shown below. Note that shutting down the ATM0 interface is absolutely required in order to get this to work. The 887VA comes with an ADSL/VDSL combination interface and you can only use the ADSL OR the VDSL. Shutting down the ATM0 interface allows you to use the VDSL connection.

interface Dialer1
 ip address negotiated
 encapsulation ppp
 ip tcp adjust-mss 1452
 dialer pool 1
 ppp mtu adaptive
 ppp authentication chap callin
 ppp chap hostname
 ppp chap password 0 bt

! ATM0 has to be shut down so the VDSL side can be used
interface ATM0
 no ip address
 shutdown

! BT require VLAN 101 for PPPoE
interface Ethernet0.101
 encapsulation dot1Q 101
 pppoe-client dial-pool-number 1

I will now run over some of the technicalities below.

MTU explanation

Ethernet uses a standard 1500 byte MTU to transmit data. However, when we use PPPoE it adds an extra 8 bytes of overhead (6 bytes for the PPPoE header and 2 bytes for the PPP Protocol ID) when it encapsulates the datagram. So in order to make sure we do not exceed the 1500 byte Ethernet MTU, we have to restrict the MTU of the PPP link to 1492. That way, when the 8 bytes are added, the result does not exceed 1500 bytes. The way I’ve done this in the configuration is by using #ppp mtu adaptive on the dialer interface. When the link parameters are negotiated using Link Control Protocol (LCP), the router simply accepts the MTU provided by the peer.
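The arithmetic above, written out as a tiny Python sketch (the constant and function names are just mine for illustration):

```python
ETHERNET_MTU = 1500    # standard Ethernet payload size, in bytes
PPPOE_HEADER = 6       # PPPoE header
PPP_PROTOCOL_ID = 2    # PPP Protocol ID field

def pppoe_mtu(ethernet_mtu: int = ETHERNET_MTU) -> int:
    """Largest PPP payload that still fits in one Ethernet frame."""
    return ethernet_mtu - PPPOE_HEADER - PPP_PROTOCOL_ID

print(pppoe_mtu())  # 1492
```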

The way I checked that the peer wasn’t going to give me some funky MTU value was just to enable a debug, shown below. Note, I also took my adaptive MTU setting off temporarily to show why we actually need it (ideally).

router#debug ppp negotiation
router#config t
router(config)#int di1
router(config-if)#no ppp mtu adaptive
router(config-if)#no shut

Feb 10 22:46:15.060: %DIALER-6-BIND: Interface Vi2 bound to profile Di1
Feb 10 22:46:15.064: %LINK-3-UPDOWN: Interface Virtual-Access2, changed state to up
Feb 10 22:46:15.064: Vi2 PPP: Sending cstate UP notification
Feb 10 22:46:15.064: Vi2 PPP: Processing CstateUp message
Feb 10 22:46:15.064: PPP: Alloc Context [8551EED0]
Feb 10 22:46:15.064: ppp4 PPP: Phase is ESTABLISHING
Feb 10 22:46:15.064: Vi2 PPP: Using dialer call direction
Feb 10 22:46:15.064: Vi2 PPP: Treating connection as a callout
Feb 10 22:46:15.064: Vi2 PPP: Session handle[85000004] Session id[4]
Feb 10 22:46:15.064: Vi2 LCP: Event[OPEN] State[Initial to Starting]
Feb 10 22:46:15.064: Vi2 PPP: No remote authentication for call-out
Feb 10 22:46:15.064: Vi2 LCP: O CONFREQ [Starting] id 1 len 10
Feb 10 22:46:15.064: Vi2 LCP:    MagicNumber 0x87665B00 (0x050687665B00)
Feb 10 22:46:15.068: Vi2 LCP: Event[UP] State[Starting to REQsent]
Feb 10 22:46:15.104: Vi2 LCP: I CONFREQ [REQsent] id 88 len 19
Feb 10 22:46:15.104: Vi2 LCP:    MRU 1492 (0x010405D4)
Feb 10 22:46:15.104: Vi2 LCP:    AuthProto CHAP (0x0305C22305)
Feb 10 22:46:15.104: Vi2 LCP:    MagicNumber 0x7CCF52D6 (0x05067CCF52D6)
Feb 10 22:46:15.104: Vi2 LCP: O CONFNAK [REQsent] id 88 len 8
Feb 10 22:46:15.104: Vi2 LCP:    MRU 1500 (0x010405DC)
Feb 10 22:46:15.104: Vi2 LCP: Event[Receive ConfReq-] State[REQsent to REQsent]

From the output we can see that the incoming Maximum Receive Unit (MRU) is 1492 (note that “I” denotes an incoming message and “O” an outgoing one). My router then stated he didn’t want to use this MRU, and wanted to use 1500 (the default Ethernet MTU size). So to get this corrected I just used the #ppp mtu adaptive command, which listens to what the other side of the PPP session wants to use and accepts that value. Another way to do this is to set the MTU on the dialer using #mtu 1492. Note, this is not the same as #ip mtu 1492, which a lot of people have mistakenly put in their configurations. The #ip mtu command would just influence the maximum size of the layer 3 PDU that can be encapsulated into the PPP payload. However, I’m not trying to do that. Instead I’m trying to tell the PPP neighbor that my MRU is 1492, which has nothing to do with any upper layer protocols such as IP.
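If you want to check the values in the debug by hand, the hex in brackets is just the raw LCP option: one byte of type, one byte of length (covering the whole option), then the value. Here is a minimal Python sketch that decodes it. The function names are mine, and it only handles the option layout, not full LCP packets.

```python
def parse_lcp_options(hex_string: str):
    """Split a run of LCP options into (type, length, value) tuples."""
    data = bytes.fromhex(hex_string)
    options = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated option header")
        opt_type = data[offset]
        opt_len = data[offset + 1]  # length covers type + length + value
        if opt_len < 2 or offset + opt_len > len(data):
            raise ValueError("malformed option length")
        options.append((opt_type, opt_len, data[offset + 2 : offset + opt_len]))
        offset += opt_len
    return options

def mru_from_option(hex_string: str) -> int:
    """Decode a single MRU option (type 1) into its value in bytes."""
    opt_type, _, value = parse_lcp_options(hex_string)[0]
    if opt_type != 1:
        raise ValueError("not an MRU option")
    return int.from_bytes(value, "big")

print(mru_from_option("010405D4"))  # 1492, from the peer's CONFREQ
print(mru_from_option("010405DC"))  # 1500, from my router's CONFNAK
```

Feeding it the three options from the incoming CONFREQ back to back gives option types 1, 3 and 5 with 15 bytes in total, which together with the 4 byte LCP header matches the "len 19" in the debug.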

Command: #ip tcp adjust-mss

This command adjusts the Maximum Segment Size (MSS), i.e. the maximum TCP payload, that is allowed to be used when sending TCP segments.

We know that Ethernet has an MTU/payload of 1500 bytes, so everything from the upper layer protocols must fit into this 1500 byte data field in the Ethernet frame before it gets transmitted across the link. We also know PPPoE adds 8 bytes of headers, IP adds 20 bytes of headers, and TCP adds 20 bytes of headers (both without options). So 1500-8-20-20 = 1452. The maximum amount of data that we could possibly send in a TCP segment is therefore 1452 bytes, and this is our MSS value. If we were to exceed this value, the packet would not fit inside the Ethernet frame, and the device would notify the upper layer protocols that fragmentation needs to occur in order to transmit the data across the link.
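Here is the same sum as a short Python sketch, assuming plain 20 byte IP and TCP headers with no options (the function name is just for illustration):

```python
def tcp_mss(link_mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """TCP payload that fits in one packet of the given MTU."""
    return link_mtu - ip_header - tcp_header

ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8  # PPPoE header plus PPP Protocol ID

print(tcp_mss(ETHERNET_MTU - PPPOE_OVERHEAD))  # 1452, the value used on the dialer
print(tcp_mss(ETHERNET_MTU))                   # 1460, plain Ethernet for comparison
```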


In order for the VDSL card to work inside the 887VA, you have to actually shut down the ATM interface, which is why I included that in this configuration. The actual VDSL configuration is done on the Ethernet 0 interface (I bet you always wondered why that interface was there! I know I did lol). Also, BT advise that for PPPoE to work on their network we have to use VLAN 101, which is the reason for the encapsulation on the Ethernet sub-interface.

Real Life DSL

This post will go into detail to explain how DSL works end to end between the customer and the ISP. It will be a pretty heavy read, so this ain’t for the faint hearted. In the scope of the CCIEv5 lab exam you have to know how to configure a broadband access group (a bba-group) on the ISP side, and also how to configure PPP on the client side. For me, that isn’t enough detail/reality to get a good understanding of how DSL works, so this post will explain how connectivity works in a real life scenario. If you are not from the UK, when I mention BT in this post, assume I’m talking about a mainline telephone/internet provider. In America the equivalent would be Comcast, in Germany Deutsche Telekom, in France Orange, etc.

Understanding connectivity using standard copper cabling
When customers need to connect to an ISP using DSL, they do so via their normal telephone lines. If you’re using a Cisco 1800/2800 series router you need a WIC card, usually a HWIC-1ADSL-M card (which is backwards compatible with ADSL, ADSL2 and ADSL2+ for Annex A, as well as supporting Annex M), or a HWIC-1VDSL if you’re using a 1900/2900 series router. Effectively, what these WICs do is provide an interface to the ATM network between your customer router and the DSLAM. The diagram below represents basic customer router connectivity to the DSLAM & the BRAS (I’ll talk about the BRAS in just a moment). The reason I’ve put a cloud between the customer router and the DSLAM is to represent that loads of customers will be connecting to this DSLAM via the ATM network.



Ok, so at the other end of the ATM cloud is where the telephone exchange is located. The copper pair from each customer goes into a huge patch panel, called the Main Distribution Frame (MDF). This is then connected to the DSLAM using something like the 50-pin RJ21 to RJ11 cable shown below. The RJ11 ends plug into the MDF, and the copper pair for each RJ11 is wired into the RJ21 connector at the other end of the cable, which then connects to the DSLAM. This one cable allows one RJ21 connector to support 24 DSL subscriber lines.

25-pair Amphenol Cable

The DSLAM now needs to connect to the Broadband Remote Access Server (BRAS). The function of the BRAS is to authenticate the end user’s PPP session in order to grant access to BT’s network. It can also send PPP sessions to other ISPs for authentication, but I’ll dig into that in just a sec. Basically, what you need to know is that a bunch of subscribers will connect from the MDF into the DSLAM, and then all these subscribers will use the uplink towards the BRAS router via the ATM cloud.

From the DSLAM to the BRAS then, the connection can either be done by Ethernet or ATM. Traditionally it was done via ATM. So the DSLAM would be connected to an ATM switch (which would just be there to take connections from a bunch of DSLAMs in the exchange) which would be configured to create a Permanent Virtual Circuit (PVC) between the BRAS and the DSLAMs using ATM sub-interfaces. This PVC is then mapped to a subscriber line-id, which is basically just a VLAN on the DSLAM that has been used for a particular local loop copper pair towards a customer.

The connection between the DSLAM and the BRAS could also be done via Ethernet instead of ATM, but that requires just a little more config to embed the subscriber line-id into the PPPoE discovery frame. All this does is tell the BRAS which DSLAM contains which subscriber-id.


Terminating PPP sessions to a third party ISP

Ok, I’ve made a simplified diagram of the network above so that I can explain how to terminate PPP sessions to a third party ISP. For this example let’s just say Plusnet is the third party ISP. See the diagram below.


So in order for another ISP to authenticate DSL subscriber PPP sessions (i.e. Plusnet’s customers), we need a way of tunneling the layer 2 PPP session over towards the ISP’s router. The way this is done is by using Layer 2 Tunneling Protocol (L2TP). When the PPP session is initiated from the customer, it includes some authentication parameters, such as the CHAP username and password. Typically the username would contain a domain name, along with some random password. Technically the domain name is considered a realm in the ISP world, and it’s this parameter that the BRAS uses to decide where to forward the PPP session. The L2TP tunnel is then created between the BRAS (also known as an L2TP Access Concentrator, LAC, since it’s creating a load of layer 2 tunnels to different ISPs/locations) and the ISP’s L2TP Network Server (LNS) router, based on the PPP realm sent from the customer.

In short, the customer creates a PPP session to the ISP. The session hits BT’s LAC, which does a lookup on the realm (usually using a RADIUS server). The LAC then forwards the session over to the LNS in Plusnet’s network. The ISP authenticates the customer’s PPP credentials, which then gives the customer access to the service provider network. So the layer 2 forwarding path, at a very high level, is just from the customer’s router to the ISP’s LNS.
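The realm lookup the LAC performs can be sketched in Python roughly as below. Everything specific here is an assumption for illustration: I'm assuming the common user@realm username format, the realm names and LNS addresses are made up (documentation ranges), and a static table stands in for the RADIUS lookup a real LAC would usually do.

```python
# Hypothetical realm-to-LNS table; a real LAC would typically ask RADIUS.
REALM_TO_LNS = {
    "isp-a.example": "192.0.2.10",
    "isp-b.example": "198.51.100.20",
}

def realm_of(username: str) -> str:
    """Return the realm part of a user@realm PPP username."""
    user, sep, realm = username.rpartition("@")
    if not sep or not user or not realm:
        raise ValueError(f"no realm in username: {username!r}")
    return realm.lower()

def select_lns(username: str, table=REALM_TO_LNS):
    """Return the LNS to tunnel this session to, or None if the realm is unknown."""
    return table.get(realm_of(username))

print(select_lns("alice@isp-a.example"))  # 192.0.2.10
print(select_lns("bob@unknown.example"))  # None
```

The LAC only needs the realm to pick the tunnel. In the flow described above, checking the customer's actual credentials is left to the ISP at the far end.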

The ISP then provides connectivity to the rest of the world using a transit ISP. Basically, what this means is that Plusnet will have an upstream transit service provider that they create a BGP peering session with. The transit provider will have a ton of BGP peers in various peering exchanges across the country that connect to other ISPs, which effectively creates the internet. Some peering exchanges I’ve used myself are LINX and LONAP in London.

Note: If you wish to see how the customer router, BRAS/LAC and LNS are set up, see my post about Configuring DSL. RADIUS server config was not applied in that example, but it should give you an idea of how it works.

Loop Guard and UDLD

Loop guard and UDLD are two ways to protect your fiber cables from causing loops in the network.  In short, loop guard is a spanning-tree optimisation, and UDLD is a layer 1/2 protocol (unrelated to spanning-tree) that protects your upper layer protocols from causing loops in the network.  To explain these features clearly, see the diagrams below.  The first diagram is the layer 2 spanning-tree topology, and the second diagram is the actual physical wiring used in the topology. You will need to use both diagrams as a reference point simultaneously in order to understand how loop guard and UDLD work in the examples I will provide.

Loop Guard and UDLD

In case you are not familiar with fiber, you need to make sure you understand the connection between Sw2 and Sw3 in the diagram on the right hand side.  These are two physical fibers: one transmits data and the other receives data. The fibers are usually plugged into an SFP such as the one shown below, and then the SFP is inserted into the switch. On the switch, this is shown as one physical port. In my diagram, it’s shown as Gi0/1 on Sw2 and Sw3.


RSTP Alternate and Backup Ports

This post identifies differences between the legacy spanning-tree (PVST+) non-designated port & the new RSTP replacement ports.

RSTP brought about a couple of new port roles compared to legacy spanning-tree, see below.

Legacy PVST+        RSTP

Root            ->  Root
Designated      ->  Designated
Non-Designated  ->  Now uses Alternate and Backup ports

In PVST+ we said that any port in a blocking state was a spanning-tree non-designated port.  RSTP has broken this blocking port down into two separate roles in order to provide faster convergence in a couple of different scenarios.

Native VLAN

Growing a little tired of reading numerous useless posts about the native VLAN, I decided to do one that describes exactly what it is.  The native VLAN has two main functions:

  1. Tags incoming un-tagged traffic on trunk links with the native VLAN.
  2. Un-tags outgoing traffic that has already been tagged with the same VLAN that is being used for the native VLAN on the trunk.

Let me elaborate on this a little bit with aid of the diagram shown below.

Native VLAN

A normal design would use the same native VLAN on both sides of the trunk.  But to explain the native VLAN properly, I’ve designed it this way instead.  So going back to the two points above (specifically point 2), because the switchport connecting to Host A has been configured to use the same access VLAN (VLAN 50) that is being used as the native VLAN on the trunk, the data sent from Host A is un-tagged as it leaves Switch 1 towards Switch 2.  This leads us to point 1 (above), where Switch 2 now receives an un-tagged frame (i.e. a frame without a VLAN tag on it). Switch 2 will always tag this currently tag-less frame with the native VLAN configured on its side of the trunk, in this case VLAN 60. So this actually leaks VLAN 50 into VLAN 60’s broadcast domain.
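The two native VLAN functions can be modelled in a few lines of Python. This is only a toy model of the trunk behaviour described above (the function names are mine), using the VLAN numbers from this example.

```python
def egress(frame_vlan: int, native_vlan: int):
    """Return the 802.1Q tag put on the wire, or None if the frame is sent un-tagged."""
    return None if frame_vlan == native_vlan else frame_vlan

def ingress(tag, native_vlan: int) -> int:
    """Return the VLAN a frame received on the trunk is placed into."""
    return native_vlan if tag is None else tag

SW1_NATIVE = 50   # native VLAN on Switch 1's side of the trunk
SW2_NATIVE = 60   # native VLAN on Switch 2's side of the trunk
HOST_A_VLAN = 50  # access VLAN of the port facing Host A

tag_on_wire = egress(HOST_A_VLAN, SW1_NATIVE)   # un-tagged, because 50 is native on Switch 1
vlan_on_sw2 = ingress(tag_on_wire, SW2_NATIVE)  # Switch 2 puts it into its own native VLAN
print(tag_on_wire, vlan_on_sw2)  # None 60
```

With the same native VLAN on both sides, the frame would simply stay in VLAN 50, which is why a normal design matches them.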
