CCIE Pursuit Blog

May 6, 2008

Fun With ISPs

Filed under: BGP,Cisco,Work — cciepursuit @ 9:57 am

Michael Morris recently vented a bit about CO techs (‘No Love For Central Office Techs’).  While I can’t agree with him about the value of unions (my father was in the IBEW and my mother is in the UAN, both of which saved our family’s ass when times were tough) I do agree with him about the flippant attitude one often encounters with carriers/LECs.  I had an interesting experience about a month ago.

A DS3 link was up and up but the BGP peering was idle.  I bounced the interface, cleared the BGP adjacency, deleted and re-added the BGP configuration, and (eventually) reloaded the router.  Nothing changed.  The interface was up and up (I could see traffic flowing in and out of the interface) but the BGP peering would not establish.  I brought in our ATM subject matter expert (my ATM skills are weak) to look at the issue and he verified that ATM was working correctly.  I even looped the interface and was able to ping my interface IP address.  For whatever reason I had no layer 3 connectivity to the PER.

So I opened a ticket with AT&T and pushed them to get their BGP team involved.  They successfully tested the circuit to the CO.  Since this was a DS3 they did not have an NIU to loop at our premises.  I told them that they needed to get someone to verify the PER’s configuration (especially the BGP configuration).  They eventually verified the PER configuration.  I opened a TAC case to get the DS3 card replaced.  While the RMA was cooking, I put up a loop on my interface and asked AT&T to test to it.  They successfully tested to the loop.  I dropped my loop.  Luckily, that’s when the break came.

“Thanks for your time, I’ll get Cisco to replace our card.”
“No problem.  I just ran another pattern to your loop and it was good as well.”
“Huh?”
“I just ran another pattern to your loop…”
“My loop?”
“Yes.”
“I dropped my loop 15 minutes ago.”
“Umm….”
“Are you testing the right circuit?”
“Yes.”
“Can you send a loopdown code?”
“I just did and I can still see your loop.”
“Is there a loop in the CO?”
“We’ll contact the LEC to look at that.”

Two hours later I got a call from an AT&T tech…well kind of.  The LEC for this area is SBC.  SBC recently bought AT&T.  So now the (follow me here) SBC LEC technicians refer to themselves as AT&T.  The US telecom industry can make your head explode if you try to follow it too closely. 🙂  So the call was actually from the CO technician, who called me by accident and thought that I was AT&T (the carrier).  The other twist to this tale is that my company (large enterprise) does not interface directly with the LEC.  We only interface with the carrier/ISP.

Anyhoo…the LEC tech told me that there was a loop in the CO towards the customer’s (my!) equipment and asked if I wanted it dropped (again, he thought I was the carrier and not the customer).  I asked him why the circuit was looped.

“I have no fucking idea.”
“Really?  You guys looped a DS3 for no good reason?”
“Yup.”
“Drop the loop please.”

The loop dropped, the BGP peering established, and our site was back to 100% of its bandwidth capacity.  When I called AT&T (the carrier) to get a reason for outage, they gave me the tired old “cleared while testing”.  Nice.

Actually, there was another twist to this tale.  Our NOC missed the BGP alert.  We have separate routers connected to two different carriers (AT&T and MCI) at each of our sites.  So we still had a DS3 connection to the MCI cloud.  I don’t remember how the BGP issue eventually came to light, but it had been down for nearly a week when I got involved.  It’s a testament to our bandwidth allocation (but not our network monitoring) that the site never noticed the loss of 50% of its available bandwidth.  I have NO idea how this didn’t affect their VoIP.  Anyhoo…once I finally got an AT&T BGP technician to look at this issue, he had the balls to annotate the ticket (we can view their tickets online) with “BGP has been down for a week and they’re just now opening a ticket?”  When I spoke with him I told him that we have dual carriers and that MCI hadn’t fucked up our circuit and that he should probably keep comments like that out of our tickets.  This was before we discovered that the issue was not our equipment.  Now it was my turn to be a douchebag.  When AT&T told me “cleared while testing” I told them to open a post mortem (a ticket review process) on the ticket.  Then I jumped down the throat of our AT&T account manager at our next weekly meeting.

“So you’re telling me that our circuit was looped at the CO and that it took you nearly two days to figure this out AND you lied to us about the RFO?  During this time our MCI circuit handled the load.  Why should we maintain you as a carrier?”

We pay tens of millions of dollars (maybe more) for our bandwidth.  Even hinting that we might axe one of our carriers in favor of the other is kind of dirty, but it makes the account managers shit their pants and jump into action anytime we mention it.  By the end of the meeting my boss got AT&T to refund us for 3 months’ worth of charges for that DS3.  It’s good to be king.  🙂

 


9 Comments »

  1. I hate AT&T with a passion!!!!! That put a little smile on my face just now…

    Comment by Carl Yost Jr — May 6, 2008 @ 11:20 am | Reply

  2. The whole “cleared while testing” thing has been a pretty common practice with Sprint/Embarq out here. Is that common with AT&T?

    I would think that with a company your size, they would do everything they can to be up front with you guys. Everybody knows that “cleared while testing” response is total BS.

    Also, on average, how much bandwidth are you guys pushing over those DS3’s? We only have about 3000 employees, but I still can’t see saturating a single DS3 with everything we’re pushing.

    Comment by kintner — May 6, 2008 @ 11:26 am | Reply

  3. That reminded me of http://www.youtube.com/watch?v=I6nuwQmhrZ8

    Comment by Tassos — May 6, 2008 @ 2:49 pm | Reply

  4. On a related note, we just had some pains with AT&T/SBC whoever they are this week. We’ve been working on getting a new DS3 installed for what feels like a year now. The kicker was when we finally (or so we thought) had the circuit ready to go and went to turn it up, ended up finding out that the circuit had been tested and verified good on let’s say Feb 15 (throwing out a date as an example), and on Feb 18 a disconnect order somehow got placed. We’ve had lots of fun getting everything scheduled only to have it all blow up in our faces because one side or the other isn’t working, or isn’t complete, or doesn’t have facilities, etc.

    I wish I could be in the room when the big guns make our reps cry over the agony they put us through.

    Comment by tml — May 7, 2008 @ 7:44 am | Reply

  5. Hey, sorry for my slowness, but how do you loop an interface? Is it related to a loopback? I’ve heard ‘looped back’ too but never knew the difference between them all.

    Comment by Marko — May 7, 2008 @ 8:12 am | Reply

  6. Let me try to answer my own question… I think it’s referring to connecting the send leads to the receive leads?

    Comment by Marko — May 7, 2008 @ 8:16 am | Reply

  7. @Kintner – “Cleared while testing” seems to be code for “We have no idea what fixed the issue” or “It was our fault, but we’re not going to admit it”. At my last job we really fought tooth and nail to get the actual reason for outage from the carriers. We were pretty diligent about getting rebates for missed SLAs. In my current job, we don’t really seem to care. We have an embarrassment of riches when it comes to bandwidth. We still want the carriers to fix issues quickly, but I rarely see anyone hold their feet to the fire if our outage was caused by the carrier. That’s the reason that the “cleared while testing” RFO pissed me off. I knew that they were lying and it really didn’t make sense to lie because we were not going to hammer them over the issue.

    We rarely push more than 4 – 8 Mbps through our DS3s (on each DS3, so about 8 – 16 Mbps total). That’s most likely why one DS3 being down for a week didn’t (seem to) affect the site. Most of that traffic is VoIP. We usually grossly over-provision bandwidth because we’d rather eat the monthly cost of unused bandwidth than lose money over unavailable applications/communications due to over-utilized bandwidth. Since we are in the medical field, we also open ourselves up to lawsuits and such if we don’t communicate certain things. Last weekend a call center change control went bad while we were announcing a prescription drug withdrawal. There were a lot of suits in that war room who were not happy.
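    As a rough back-of-the-envelope check on those numbers, here is a minimal Python sketch. It assumes the nominal DS3 line rate of 44.736 Mbps and ignores framing and ATM cell overhead, so real usable capacity is somewhat lower. The function names are made up for illustration.

```python
# Nominal DS3 line rate in Mbps (assumption: framing and ATM overhead ignored).
DS3_MBPS = 44.736


def utilization(load_mbps: float, link_mbps: float = DS3_MBPS) -> float:
    """Return the load as a percentage of the link's capacity."""
    return 100.0 * load_mbps / link_mbps


def failover_load(per_link_mbps: float, links: int = 2) -> float:
    """Traffic the one surviving link carries if every other link is down."""
    return per_link_mbps * links


# The 4 and 8 Mbps per-DS3 figures come from the comment above.
for per_link in (4.0, 8.0):
    total = failover_load(per_link)
    print(
        f"{per_link:.0f} Mbps per DS3: {utilization(per_link):.1f}% normally, "
        f"{utilization(total):.1f}% on the surviving DS3"
    )
```

    Even at the high end, with all 16 Mbps landing on one DS3, the surviving link sits under 40% of its nominal rate, which fits with nobody at the site noticing.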

    That policy has its good and bad sides. One of the bad sides is that we try to fix everything with more bandwidth rather than optimizing the applications that are eating up the bandwidth. But empty pipes sure beat the hell out of trying to fit everything into too small of a pipe. 🙂

    Comment by cciepursuit — May 8, 2008 @ 11:22 am | Reply

  8. @Tassos – LOL. I love Colbert.

    Comment by cciepursuit — May 8, 2008 @ 11:27 am | Reply

  9. @tml – The whole provisioning/disconnect procedure can get really ugly. My favorite part about the “phantom disconnect” is when they try to tell me that I have to go through the provisioning process again (which can take weeks to months sometimes) to fix a mistake that they made. The circuit was working yesterday. It’s still physically connected at the prem. The carrier ‘accidentally’ disconnects the circuit. I’m responsible for fixing it (the really slow way)??? They act like someone has physically removed miles of fiber/copper when they ‘disconnect’ a circuit. Umm…just reverse out the changes you made last night and we’ll be cool, right?

    That’s one reason that I don’t issue disconnect requests for site moves/closings until after my equipment is gone. I’ve been burnt too many times by disconnects going in ‘early’. The carriers seem to think that it’s okay to disconnect a circuit days or weeks before I request it. I’d rather pay for an extra month than risk having my site drop completely.

    Comment by cciepursuit — May 8, 2008 @ 11:41 am | Reply

