Fred

FREDWESTON.NET - InfoWorld article about network admins
InfoWorld article about network admins
Posted under Miscellaneous on Thursday, July 18, 2013 @ 11:41:00 AM
Show Previous Article Show Next Article
InfoWorld posted an article that I agree with so much, I decided to repost here. A lot of these resonate with me. Credit to Paul Venezia.

Veteran network admin trait No. 1: We already know it's down
Few things are more annoying than having your phone blow up with automated alert messages from your monitoring systems, scrambling to dig into the issue, only to be continually bombarded with humans texting/talking/emailing/calling with the same "Is x down?" question, or even worse, "The network's down!" If the outage is significant, we already know about it, and we are trying to work on it as fast as we possibly can. Continued attempts to deliver elderly information will only impede that effort.

Veteran network admin trait No. 2: If we don't know it's down, it's probably not down
Conversely, if we get a message claiming, "The network's down!" yet we have not been notified by any monitoring system, then the problem is almost certainly the complaining user in question. To users, if there is any resource that cannot be contacted, whether that resource is internal to the network, on the Internet, or perhaps orbiting the earth, that means the network is "down." This apparently includes 404 errors from shady websites, mistyped URLs, or the lack of any sort of network connection on the user's laptop. Nothing is more grating to a network admin than someone claiming the network is "down." No, it isn't -- reboot your laptop.

Veteran network admin trait No. 3: We will ping and test several times before digging into the problem
If we begin looking into a problem, especially across a WAN or long-haul link with several providers in the middle, we will reserve judgment for the first several minutes. This is because these connections are subject to the vagaries of their path, and connectivity problems can come and go like ghosts in the night. A fiber WAN link that was stable a minute ago but is now exhibiting 30 percent packet loss will more than likely fix itself in short order. Only after a probationary period will we start digging deeper into the issue.

Veteran network admin trait No. 4: Believe it or not, we've tried turning it off and back on again
Many times, we will "fix" a problem by turning an interface off and back on again. In fact, this may be the first thing we try when troubleshooting an issue. Whether it's a problem with auto-negotiation, a sketchy cable, or sunspots, you'd be surprised at how often dropping and restarting an interface will restore proper operation. While networking may be a science, it's not without its white whale.

Veteran network admin trait No. 5: During an outage, we're not just staring at the screen -- we're following a path in our heads
If you come across a network admin working on a problem, looking at a routing table with a 1,000-yard stare, he's not experiencing performance anxiety or somehow "lost." He's running through dozens of possible scenarios in his head, following the routing and switching paths, and calculating possible problem scenarios. The uninitiated can't quite grok the way a network admin's mind works, but a parallel might be to imagine an intangible maze, then try to solve it. Somewhere in that maze lies a dead end that shouldn't be there. We're looking for that -- then we can take steps to open up the path again.

Veteran network admin trait No. 6: We calculate subnet masks and CIDR as easily as breathing
I think it's safe to say that perhaps outside of the common Class C netmask, the overwhelming majority of humans do not understand subnetting, or CIDR. For network admins, this is as intrinsic and involuntary to our brain as breathing. It's not just knowing that a /28 is a netmask of 255.255.255.240, and that it describes 16 addresses, 14 usable, or where subnet boundaries lie. It's the ability to collapse large numbers of smaller subnets into larger descriptors in order to reduce routing table sizes, ACL applications, and a wide variety of other internal networking tasks. When we see contiguous networks sliced into smaller chunks in an ACL, yet with identical rules applied, we get agita. A /19 is a much better idea than a collection of /24s. The same applies to wildmasks.

Veteran network admin trait No. 7: We do not tolerate bugs; they are of the devil
On occasion, conventional troubleshooting or building new networks run into an unexplainable blocking issue. After poring over configurations, sketching out connections, routes, and forwarding tables, and running debugs, one is brought no closer to solving the problem. This is the unholy area of networking inhabited by the software bug. Network admins think of switching and routing software bugs as personal attacks, and they will usually excoriate a vendor when one is discovered. This is because before the determination is made that the problem is due to a bug, nothing makes sense whatsoever. It completely violates years of experience and knowledge, throws waste to logic, and causes immense amounts of stress and turmoil. You might think of it as if you spontaneously transmogrified into a difference species. Everything you've ever known suddenly does not apply, yet here you are.

Veteran network admin trait No. 8: We can read live packet streams and write highly complex filters in our sleep
Few items in networking and computing are even remotely similar to how they're portrayed on TV and in movies. The terminal window with text rapidly scrolling past, however, is one of them. In the sys admin world, that's usually a log file tail. In the networking world, it's usually a packet dump. Depending on what we're trying to do, we may quickly call up a packet capture on a live circuit and watch packets fly by in real time. It's not gibberish. We're looking for telltale signs, and we'll winnow down our capture using filters until we've found what we're looking for. Speed is usually of the essence in these cases, so we're well versed in BPF filtering syntax. Also, we tend to notice really strange things like IP headers that "just don't look right" and other oddities that may as well be alphabet soup to most people. Some people speak Klingon. We speak IP.

Veteran network admin trait No. 9: We take big risks all the time
Network admins tend to work on many remote devices. Unlike server admins who can pull up a console if the server is otherwise inaccessible via the network, we usually have no such luxury. This means that changes we make to certain devices carry with them the ever-present threat of the loss of connection. Basically, we're always a missed keystroke or two away from making a problem worse, or causing a big problem where none previously existed. While making a change to a remote switch and applying that change to port 34 instead of port 35, say, we could inadvertently cause a complete lack of connectivity to the remote site, taking who knows what offline until the device is power-cycled or we -- or worse, someone else -- drive over to fix it manually. Ideally, this shouldn't happen in the middle of the night, but given the usual timing of network maintenance windows, it often does. Oh, and we sometimes have to make changes to remote devices knowing that if anything goes slightly wrong, we will lose connectivity -- such as when changing Internet providers at a remote site and reconfiguring the firewall with a script because we have no other persistent connection to it. We live with this understanding all day, every day.