- Justin Baker
- March 31, 2009
Had a really fun customer meeting the other day. The participants were your typical IT geeks/gurus, but with an extra dash of funny. The "blamestorm" part of our customer presentation really resonated with them. This is where we talk about what happens when you have a problem with your application: most people go around the room saying "nope, it's not me," and the network always gets blamed first. Our charming Mr. Network Guy said, "damn right, the only reason I come to work every day is to defend my job, it seems." It was very funny, but also sadly true. Somehow the network team always seems to get the short end of the stick.
Case in point:
A customer rolled out a new application. The next day, key transactions were failing and the application was generally sluggish. People started crying foul: "something's wrong with the network, everything's so slow." Right, the network, huh? The network hadn't changed at all from the previous day. What about the new application, which had introduced a number of changes? Luckily, with the ExtraHop system in place, we were able to identify some serious database errors as the root cause and get the poor network team off the hook.
Classic end-user error. A customer had users calling in to complain that the network was slow, claiming "this is unacceptable." After much rigmarole on the support team's part, it turned out one user had at least 20 applications open on his desktop, including Firefox, Flash, Photoshop ... With that many resource hogs running, it's easy to see why his app was performing poorly. The network didn't figure into the equation at all.
At another customer site, the network team got blamed when an Internet-facing application seemed to time out a lot. "Are we running low on bandwidth? Did you upgrade the routers to the latest firmware?" After some investigation using the ExtraHop system, they found a single media server that was serving up very out-of-date content. Whenever that particular server came up in the round-robin rotation, the transaction would simply fail. It was purely an application-layer problem, but buried so deep in the cluster that, as our champion there said, "if it wasn't for ExtraHop, we'd never have found that needle in the haystack."
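To see why one bad server in a round-robin pool looks like a flaky network from the outside, here is a minimal Python sketch. The server names, the pool size of four, and the health flags are all made up for illustration; this is not how the ExtraHop system works, just a toy model of the symptom. Seen in aggregate, a steady fraction of requests fail "at random"; broken down per server, every failure lands on the same box.

```python
from collections import Counter
from itertools import cycle

# Hypothetical pool: names and health flags are invented for this example.
# True means the server answers correctly, False means its requests fail.
SERVERS = {"media-1": True, "media-2": True, "media-3": False, "media-4": True}


def simulate(requests: int) -> Counter:
    """Send requests round-robin across the pool and count failures per server."""
    failures = Counter()
    rotation = cycle(SERVERS)  # iterates server names in a repeating order
    for _ in range(requests):
        server = next(rotation)
        if not SERVERS[server]:
            failures[server] += 1
    return failures


failures = simulate(100)
total_failed = sum(failures.values())

# The aggregate view: looks like an intermittent, network-ish problem.
print(f"overall failure rate: {total_failed / 100:.0%}")
# The per-server view: one culprit accounts for every failure.
print(f"failures by server: {dict(failures)}")
```

With four servers in the pool, one request in four fails, which is exactly the kind of pattern that gets pinned on bandwidth or router firmware until someone breaks the numbers down by server.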
The moral of the story: it's not always the network. Today's applications are supported by a complex ecosystem of servers, network devices, databases, storage devices, and the application code itself, and any one of them can be the true culprit. The key is to have complete visibility across all these layers, so you can find the root cause and avoid these knee-jerk reactions. It's no longer us vs. them. It's time for more collaborative troubleshooting.
This is a companion discussion topic for the original entry at http://www.extrahop.com/post/blog/stories/i-come-to-work-daily-to-defend-my-job/