Michael Morris recently vented a bit about CO techs (‘No Love For Central Office Techs’). While I can’t agree with him about the value of unions (my father was in the IBEW and my mother is in the UAN, both of which saved our family’s ass when times were tough), I do agree with him about the flippant attitude one often encounters with carriers/LECs. I had an interesting experience about a month ago.
A DS3 link was up and up but the BGP peering was idle. I bounced the interface, cleared the BGP adjacency, deleted and re-added the BGP configuration, and (eventually) reloaded the router. Nothing changed. The interface was up and up (I could see traffic flowing in and out of the interface) but the BGP peering would not establish. I brought in our ATM subject matter expert (my ATM skills are weak) to look at the issue, and he verified that ATM was working correctly. I even looped the interface and was able to ping my interface IP address. For whatever reason I had no layer 3 connectivity to the PER.
So I opened a ticket with AT&T and pushed them to get their BGP team involved. They successfully tested the circuit to the CO. Since this was a DS3 they did not have an NIU to loop at our premises. I told them that they needed to get someone to verify the PER’s configuration (especially the BGP configuration). They eventually verified the PER configuration. I opened a TAC case to get the DS3 card replaced. While the RMA was cooking, I put up a loop on my interface and asked AT&T to test to it. They successfully tested to the loop. I dropped my loop. Luckily, that’s when the break came.
“Thanks for your time, I’ll get Cisco to replace our card.”
“No problem. I just ran another pattern to your loop and it was good as well.”
“Huh?”
“I just ran another pattern to your loop…”
“My loop?”
“Yes.”
“I dropped my loop 15 minutes ago.”
“Umm….”
“Are you testing the right circuit?”
“Yes.”
“Can you send a loopdown code?”
“I just did and I can still see your loop.”
“Is there a loop in the CO?”
“We’ll contact the LEC to look at that.”
Two hours later I got a call from an AT&T tech…well, kind of. The LEC for this area is SBC. SBC recently bought AT&T. So now the (follow me here) SBC LEC technicians refer to themselves as AT&T. The US telecom industry can make your head explode if you try to follow it too closely. 🙂 So the call was actually from the CO technician, who called me by accident and thought that I was AT&T (the carrier). The other twist to this tale is that my company (large enterprise) does not interface directly with the LEC. We only interface with the carrier/ISP.
Anyhoo…the LEC tech told me that there was a loop in the CO towards the customer’s (my!) equipment and asked if I wanted it dropped (again, he thought I was the carrier and not the customer). I asked him why the circuit was looped.
“I have no fucking idea.”
“Really? You guys looped a DS3 for no good reason?”
“Yup.”
“Drop the loop please.”
The loop dropped, the BGP peering established, and our site was back to 100% of its bandwidth capacity. When I called AT&T (the carrier) to get a reason for outage, they gave me the tired old “cleared while testing”. Nice.
Actually, there was another twist to this tale. Our NOC missed the BGP alert. We have separate routers connected to two different carriers (AT&T and MCI) at each of our sites. So we still had a DS3 connection to the MCI cloud. I don’t remember how the BGP issue eventually came to light, but it had been down for nearly a week when I got involved. It’s a testament to our bandwidth allocation (but not our network monitoring) that the site never noticed the loss of 50% of its available bandwidth. I have NO idea how this didn’t affect their VoIP. Anyhoo…once I finally got an AT&T BGP technician to look at this issue, he had the balls to annotate the ticket (we can view their tickets online) with “BGP has been down for a week and they’re just now opening a ticket?” When I spoke with him, I told him that we have dual carriers, that MCI hadn’t fucked up our circuit, and that he should probably keep comments like that out of our tickets. This was before we discovered that the issue was not our equipment. Now it was my turn to be a douchebag. When AT&T told me “cleared while testing” I told them to open a post mortem (a ticket review process) on the ticket. Then I jumped down the throat of our AT&T account manager at our next weekly meeting.
“So you’re telling me that our circuit was looped at the CO and that it took you nearly two days to figure this out AND you lied to us about the RFO? During this time our MCI circuit handled the load. Why should we maintain you as a carrier?”
We pay tens of millions of dollars (maybe more) for our bandwidth. Even hinting that we might axe one of our carriers in favor of the other is kind of dirty, but it makes the account managers shit their pants and jump into action anytime we mention it. By the end of the meeting my boss got AT&T to refund us for 3 months’ worth of charges for that DS3. It’s good to be king. 🙂