CCIE Pursuit Blog

September 25, 2007

LFU 4 – Fat Fingers Can Doom You

I was doing a NAT lab today and came to a dead stop because I couldn’t get BGP to work between two routers.  R4 and R5 share two links: a PTP serial link (155.1.45.0/24) and a PTP Frame Relay link (155.1.0.0/24).  I was running OSPF as an IGP and everything was fine until I found that BGP was not working:

r4#sh ip bgp sum
BGP router identifier 150.1.4.4, local AS number 1
BGP table version is 1, main routing table version 1

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
150.1.5.5       4     2       0       0        0    0    0 never    Active

r5#sh ip bgp sum
BGP router identifier 150.1.5.5, local AS number 2
BGP table version is 1, main routing table version 1

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
150.1.4.4       4     1       0       0        0    0    0 never    Active

I went over the BGP config on both routers and couldn’t find any issues:

r4#sh run | sec bgp
router bgp 1
 no synchronization
 bgp router-id 150.1.4.4
 bgp log-neighbor-changes
 neighbor 150.1.5.5 remote-as 2
 neighbor 150.1.5.5 ebgp-multihop 255
 neighbor 150.1.5.5 update-source Loopback0
 no auto-summary

r5#sh run | sec bgp
router bgp 2
 no synchronization
 bgp router-id 150.1.5.5
 bgp log-neighbor-changes
 neighbor 150.1.4.4 remote-as 1
 neighbor 150.1.4.4 ebgp-multihop 255
 neighbor 150.1.4.4 update-source Loopback0
 neighbor 150.1.4.4 default-originate
 no auto-summary

I issued “clear ip bgp *” multiple times on both sides.  I removed the whole BGP configuration on both routers and then re-added it.  Finally, I reloaded both routers.  I still couldn’t get BGP to work.

I debugged BGP events:

r4#debug ip bgp event
BGP events debugging is on
*Sep 25 16:52:58.743: BGP: Regular scanner event timer
*Sep 25 16:52:58.743: BGP: Import timer expired. Walking from 1 to 1

r4#clear ip bgp *

*Sep 25 16:52:58.743: BGP: Regular scanner event timer
*Sep 25 16:52:58.743: BGP: Import timer expired. Walking from 1 to 1
*Sep 25 16:53:04.371: BGP: reset all neighbors due to User reset
*Sep 25 16:53:04.375: BGP(IPv4 Unicast): will wait 60s for the first peer to establish
*Sep 25 16:53:04.375: BGP(IPv6 Unicast): computed bestpaths, table version went from 1 to 1
*Sep 25 16:53:04.375: BGP(VPNv4 Unicast): computed bestpaths, table version went from 1 to 1
*Sep 25 16:53:04.375: BGP(IPv4 Multicast): computed bestpaths, table version went from 1 to 1
*Sep 25 16:53:04.375: BGP(IPv6 Multicast): computed bestpaths, table version went from 1 to 1
*Sep 25 16:53:04.375: BGP(NSAP Unicast): computed bestpaths, table version went from 1 to 1
*Sep 25 16:53:13.743: BGP: Regular scanner event timer
*Sep 25 16:53:13.743: BGP: Import timer expired. Walking from 1 to 1
*Sep 25 16:53:28.743: BGP: Regular scanner event timer
*Sep 25 16:53:28.743: BGP: Import timer expired. Walking from 1 to 1
*Sep 25 16:53:43.743: BGP: Regular scanner event timer
*Sep 25 16:53:43.743: BGP: Performing BGP general scanning
*Sep 25 16:53:43.743: BGP(0): scanning IPv4 Unicast routing tables
*Sep 25 16:53:43.743: BGP(1): scanning IPv6 Unicast routing tables
*Sep 25 16:53:43.743: BGP(IPv6 Unicast): Performing BGP Nexthop scanning for general scan
*Sep 25 16:53:43.743: BGP(1): Future scanner version: 16, current scanner version: 15
*Sep 25 16:53:43.743: BGP(2): scanning VPNv4 Unicast routing tables
*Sep 25 16:53:43.743: BGP(VPNv4 Unicast): Performing BGP Nexthop scanning for general scan
*Sep 25 16:53:43.743: BGP: Import walker start version 0, end version 1
*Sep 25 16:53:43.743: BGP: … start import cfg version = 0

I did a Google search on “BGP: Import timer expired. Walking from 1 to 1” and came across a post suggesting the following:

1) You don’t have a route to it.

2) You need ebgp-multihop but haven’t configured it. (If it’s not on a directly connected network or you’re using update-source loopback, you need ebgp-multihop)

3) (Unlikely, I suspect you’d get a different error) It’s not configured to talk BGP to you.

1 – check.  2 – check.  3 – ummm check.

Actually, number 1 was my issue.  Even though I had looked at the OSPF config, I never did my due diligence and actually verified that each router had a route to the other router’s loopback address.  When I finally did that, I found my problem:

r5#sh ip route 150.1.4.4
% Subnet not in table  <-this is a problem  🙂

Although I had glanced at the OSPF configurations, I didn’t notice my problem the first couple of times:

r4#sh run | sec ospf
router ospf 100
 router-id 150.1.4.4
 log-adjacency-changes
 network 155.1.0.4 0.0.0.0 area 0
 network 155.1.4.4 0.0.0.0 area 0  <-DOH!!! 150 not 155!!!
 network 155.1.45.4 0.0.0.0 area 0

r4(config)#router os 100
r4(config-router)#no network 155.1.4.4 0.0.0.0 area 0
r4(config-router)#net 150.1.4.4 0.0.0.0 area 0
r4(config-router)#^Z
r4#
*Sep 25 17:00:39.999: %BGP-5-ADJCHANGE: neighbor 150.1.5.5 Up
*Sep 25 17:00:41.255: %SYS-5-CONFIG_I: Configured from console by console
r4#sh ip bgp sum
BGP router identifier 150.1.4.4, local AS number 1
BGP table version is 1, main routing table version 1

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
150.1.5.5       4     2       2       2        0    0    0 00:00:12        0  <-success!!!!

My OSPF neighbors were established on each router and showed up under router-ids that were the same as the loopback addresses.  I didn’t think the problem through enough to realize that this says absolutely nothing about whether each router has a route to the other router’s loopback address: the router-id is just an identifier, and here it was hard-coded under the OSPF process.  I had fat-fingered the network statement in r4’s OSPF configuration, so r4’s loopback network was never advertised into OSPF.  BGP was using the loopback address as the neighbor address.  Since r5 did not have an IGP route to r4’s loopback, the BGP session never established.  It took about 45 minutes of head-scratching before I discovered the problem.

Internetwork Expert advises against using loopback addresses like 1.1.1.1 (r1) because it is pretty easy for one of the BBC routers to use those types of addresses and inject some not-so-fun troubles into your lab.  On the other hand, if your loopback addresses are very similar to your active interface networks, it becomes pretty easy to mistype a network statement, which leads to problems like the one that I had.  It also makes it a bit more difficult to spot the mistyped statement(s) when you’re quickly trying to troubleshoot.
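
Since this kind of typo is so hard to spot by eye, here is a rough sketch of how it could be caught offline by comparing OSPF network statements against interface addresses.  This is only an illustration in Python, not something I ran during the lab.  The r4 addresses and network statements are the ones shown in this post; the function names (matches, audit) and the interface names other than Loopback0 are made up for the example.  The matching rule assumed here is the usual wildcard-mask comparison: bits set in the wildcard are "don't care" bits.

```python
import ipaddress


def matches(interface_ip, network, wildcard):
    """Return True if 'network <network> <wildcard>' covers interface_ip.

    Bits set to 1 in the wildcard mask are ignored; all other bits
    must be identical in the interface address and the statement.
    """
    ip = int(ipaddress.IPv4Address(interface_ip))
    net = int(ipaddress.IPv4Address(network))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (ip & care) == (net & care)


def audit(interfaces, statements):
    """Return (interfaces no statement covers, statements covering no interface)."""
    uncovered = [
        name for name, ip in interfaces.items()
        if not any(matches(ip, n, w) for n, w in statements)
    ]
    unused = [
        (n, w) for n, w in statements
        if not any(matches(ip, n, w) for ip in interfaces.values())
    ]
    return uncovered, unused


# r4's addresses as shown in the post. "FrameRelay" and "SerialPTP"
# are placeholder names; the real interface names are not given.
R4_INTERFACES = {
    "Loopback0": "150.1.4.4",
    "FrameRelay": "155.1.0.4",
    "SerialPTP": "155.1.45.4",
}

# r4's OSPF network statements before the fix (second one is the typo).
R4_STATEMENTS = [
    ("155.1.0.4", "0.0.0.0"),
    ("155.1.4.4", "0.0.0.0"),
    ("155.1.45.4", "0.0.0.0"),
]

if __name__ == "__main__":
    uncovered, unused = audit(R4_INTERFACES, R4_STATEMENTS)
    print("Interfaces not in OSPF:", uncovered)
    print("Network statements matching nothing:", unused)
```

With the pre-fix statements, the sketch flags Loopback0 as not covered and the 155.1.4.4 statement as matching no interface, which are the two symptoms of the fat-fingered line.  A statement that matches nothing is not always a mistake, but it is worth a second look.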
