Dear KV,
I posted a question on a mailing list recently about a networking problem and was asked if I had a tcpdump. The person who responded to my question (and to the whole list as well) seemed to think my lack of networking knowledge was some kind of affront to him. His response was pretty much a personal attack: If I couldn't be bothered to do the most basic types of debugging on my own, then I shouldn't expect much help from the list. Aside from the personal attack, what did he mean by this?
Dumped
Dear Dumped,
It is always interesting to me that when people study computer programming or software engineering they are taught to use the creative tools (editors to create code, compilers to take that code and turn it into an executable) but are rarely, if ever, taught how to debug a program. Debuggers are powerful tools, and once you learn to use one you become a far more productive programmer because, face it, putting printf(), or its immoral equivalent, throughout your code is a really annoying way to find bugs. In many cases, especially those related to timing issues, adding print statements just leads to erroneous results. If the number of people who actually learn how to debug a program during their studies is small, the number who learn how to debug a networking problem is minuscule. I actually don't know anyone who was ever directly taught how to debug a networking problem.
Some people (the lucky ones) are eventually led to the program you mention, tcpdump, or its graphical equivalent, wireshark, but I've never seen anyone try to teach people to use these tools. One of the nice things about tcpdump and wireshark is that they're multi-platform, running on both Unix-like operating systems and Windows. In fact, writing a packet-capture program is relatively easy, as long as the operating system you're working with gives you the ability to tap into the networking code or driver at a low enough level to sniff packets.
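If you've never run it, a minimal capture session looks something like the following sketch (the interface name and file name are placeholders; substitute your own):

   # capture full packets on interface eth0 and save them to a file
   # (-s 0 records entire packets rather than truncated headers)
   tcpdump -i eth0 -s 0 -w trouble.pcap

The -w flag writes the raw packets to a file that tcpdump or wireshark can read back later, which beats squinting at live output as it scrolls past.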
Those of us who spend our days banging our heads against networking problems eventually learn how to use these tools, sort of in the way that early humans learned to cook flesh. Let's just say that though the results may have been edible, they were not winning any Michelin stars.
Using a packet-capture tool is, to a networking person, somewhat like using a thermometer is to a parent. It is likely that if you ever felt sick when you were a child at least one of your parents would take your temperature. If they took you to the doctor, the doctor would also take your temperature. I once had my temperature taken for a broken ankle (crazy, yes, but that doctor gave the best prescriptions, so I just smiled blithely and let him have his fun). That aside, taking a child's temperature is the first thing on a parent's checklist for the question "Is my child sick?" What on earth does this have to do with capturing packets?
By far the best tool for determining what is wrong with programs that use a network, or even the network itself, is tcpdump. Why is that? Surely in the now 40-plus years since packets were first transmitted across the original ARPANET we have developed some better tools. The fact is we have not. When something in the network breaks, you want to be able to see the messages at as many layers as possible.
The other key component in debugging network problems is understanding the timing of what happens, which a good packet-capture program also records. Networks are perhaps the most nondeterministic components of any complex computing system. Finding out who did what to whom and when (another question parents often ask, usually after a fight among siblings) is extremely important.
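tcpdump stamps every packet with the time it arrived, and it can show you the gaps directly. A small example, assuming a capture file such as the hypothetical trouble.pcap above:

   # print the time delta between successive packets
   # instead of absolute timestamps
   tcpdump -ttt -r trouble.pcap

A suspicious multi-second gap in those deltas frequently points straight at a timeout or a lost message.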
All network protocols, and the programs that use them, have some sort of ordering that is important to their functioning. Did a message go missing? Did two or more messages arrive out of order at the destination? All of these questions can potentially be answered by using a packet sniffer to record network traffic, but only if you use it!
It's also important to record the network traffic as soon as you see the problem. Because of their nondeterministic nature, networks give rise to the worst types of timing bugs. Perhaps the bug happens only every so many hours, because of a rollover in a large counter; you really want to start recording the network traffic before the bug occurs, not after, because it may be many hours until the condition comes up again.
So, here are some very basic recommendations on using a packet sniffer in debugging a network problem. First, get permission (yes, it really is KV giving you this advice). People get cranky if you record their network traffic, such as instant messages, email, and banking transactions, and then post it to a mailing list. Just because some person in IT was dumb enough to give you root or admin rights on your desktop does not mean you should just record everything and send it off.
Next, record only as much information as you need to debug the problem. If you're new at this you'll probably have the program suck up every packet so you don't miss anything, but that's problematic for two reasons: the first is the previously mentioned privacy issue; and the second is that if you record too much data, finding the bug will be like finding a needle in a haystack (only you've never seen a haystack that big). Recording an hour of Ethernet traffic on your LAN can capture a few hundred million packets. No matter how good a tool you have, it's going to do a much better job at finding a bug if you narrow down the search.
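The way to narrow the search with tcpdump is a capture filter, the BPF expression given at the end of the command line. The host and port in this sketch are invented for illustration:

   # record only traffic between this machine and one peer,
   # on the one port the misbehaving protocol actually uses
   tcpdump -i eth0 -w trouble.pcap host 192.0.2.10 and port 443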
If you do record a lot of data, don't try to share it all as one huge chunk. See how these points follow each other? Most packet-capture programs have options to say, "Once the capture file is full, close it and start a new one." Limiting files to one megabyte is a nice start.
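tcpdump will do this rotation for you; a sketch, with the size and count chosen arbitrarily:

   # -C 1 starts a new file every (roughly) one megabyte;
   # -W 50 caps the set at 50 files, overwriting the oldest
   tcpdump -i eth0 -C 1 -W 50 -w trouble.pcap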
Finally, do not record your data on a network file system. There is no better way to ruin a whole set of packet-capture files than by having them capture themselves.
So there you have it: a brief introduction to capturing data so you can debug a networking problem. Perhaps now you can get yelled at on a mailing list for something more egregious than not taking your network's temperature before calling the doctor.
KV
Related articles
on queue.acm.org
Debugging in an Asynchronous World
Michael Donat
http://queue.acm.org/detail.cfm?id=945134
Kode Vicious Bugs Out
Kode Vicious
http://queue.acm.org/detail.cfm?id=1127862
A Conversation with Steve Bourne, Eric Allman, and Bryan Cantrill
http://queue.acm.org/detail.cfm?id=1413258
I would like to dedicate this column to my first editor, Mrs. B. Neville-Neil, who passed away after a sudden illness on December 9th, 2009; she was 65 years old.
My mother took language, both written and spoken, very seriously. The last thing I wanted to hear upon showing her an essay I was writing for school was, "Bring me the red pen." In those days I did not have a computer; all my assignments were written longhand or on a typewriter and so the red pen meant a total rewrite. She was a tough editor, but it was impossible to question the quality of her work or the passion that she brought to the writing process. All of the things Strunk and White have taught others throughout the years my mother taught me, on her own, with the benefit of only a high school education and a voracious appetite for reading.
It is, in large part, due to my mother's influence that I am a writer today. It is also due to her influence that I review articles, books, and code on paper, using a red pen. Her edits and her unswerving belief that I could always improve are, already, keenly missed.
George Vernon Neville-Neil III
The following letter was published in the Letters to the Editor in the June 2010 CACM (http://cacm.acm.org/magazines/2010/6/92490).
--CACM Administrator
George V. Neville-Neil's "Kode Vicious" Viewpoint "Taking Your Network's Temperature" (Feb. 2010) was thought-provoking, but two of its conclusions ("putting printf()... throughout your code is a really annoying way to find bugs" and "limiting the files to one megabyte is a good start") were somewhat misleading.
Timing was one reason Neville-Neil offered for his view that printf() can lead to "erroneous results." Debuggers and printf() both impose timing loads. Debugger timing depends on hardware support. A watch statement functions like a printf(), and a breakpoint consumes "infinite" time. In both single-threaded and multithreaded environments, a breakpoint stops thread activity. In all cases, debugger statements perturb timing in a way that's like printf().
We would expect that such stimulus, added to multithreaded applications, would produce different output. Neville-Neil expressed a similar sentiment, saying "Networks are perhaps the most nondeterministic components of any complex computing system." Both printf() and debuggers exaggerate timing differences, so the qualitative issue resolves to individual preferences, not to timing.
Choosing between a debugger and a printf() statement depends on the development stage in which each is to be used. At an early stage, a debugger might be better when timing and messaging order are less important than error detection. Along with functional integration in the program, a debugger can sometimes reach a point of diminishing returns. Programmers shift their attention to finding the first appearance of an error and the point in their programs where the error was generated. Using a debugger tends to be a trial-and-error process involving large amounts of programmer and test-bench time to find that very point. A printf() statement inserted at program creation requires no setup time and little bench time, so is, in this sense, resource-efficient.
The downside of using a printf() statement is that at program creation (when it is inserted) programmers anticipate errors but are unaware of where and when they might occur; printf() output can be overwhelming, and the aggregate time to produce diagnostic output can impede time-critical operations. The overhead load of output and time is only partially correctable.
Limiting file size to some arbitrary maximum leads programmers to assume (incorrectly) that the search is for a single error and that localizing it is the goal. Limiting file size allows programmers to focus on a manageable subset of data for analysis but misses other unrelated errors. If the point of error generation is not within the limited set of files, little insight is gained into where an error was in fact generated.
Neville-Neil's statement that "No matter how good a tool you have, it's going to do a much better job at finding a bug if you narrow down the search" might apply to "Dumped" (the "questioner" in his Viewpoint) but not necessarily to everyone else. An analysis tool is meant to discover errors, and programmers and users both win if errors are found. Trying to optimize tool execution time over error detection is a mistake.
Art Schwarz
Irvine, CA
The following letter was published in the Letters to the Editor in the June 2010 CACM (http://cacm.acm.org/magazines/2010/6/92490).
--CACM Administrator
George V. Neville-Neil's Viewpoint (Feb. 2010) said students are rarely taught to use tools to analyze networking problems. For example, he mentioned Wireshark and tcpdump, but only in a cursory way, even though these tools are part of many contemporary university courses on networking.
Sniffers (such as Wireshark and Ethereal) for analyzing network protocols have been covered at Fairleigh Dickinson University for at least the past 10 years. Widely used tools for network analysis and vulnerability assessment (such as nmap, nessus, Snort, and ettercap) are available through Fedora and nUbuntu Linux distributions. Open source tools for wireless systems include NetStumbler and AirSnort.
Fairleigh Dickinson's network labs run on virtual machines to limit inadvertent damage and the need for protection measures. We teach the basic network utilities available on Windows- and/or Posix-compliant systems, including ping, netstat, arp, tracert (traceroute), ipconfig (ifconfig in Linux/Unix and iwconfig for Linux wireless cards), and nslookup (dig in Linux). With the proper options, netstat displays the IP addresses, protocols, and ports used by all open and listening connections, as well as protocol statistics and routing tables.
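For instance (the exact option letters vary slightly from platform to platform):

   netstat -an   # all connections and listening ports, numerically
   netstat -s    # per-protocol statistics
   netstat -r    # the routing table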
The Wireshark packet sniffer identifies control information at different protocol layers. A TCP capture specification thus provides a tree of protocols, with fields for frame header and trailer (including MAC address), IP header (including IP address), and TCP header (including port address). Students compare the MAC and IP addresses found through Wireshark with those found through netstat and ipconfig. They then change addresses and check results by sniffing new packets, analyzing the arp packets that try to resolve the altered addresses. Capture filters in Wireshark support search through protocol and name resolution; Neville-Neil stressed the importance of narrowing one's search but failed to mention the related mechanisms. Students are also able to make connections through (unencrypted) telnet and PuTTY, comparing password fields.
My favorite Wireshark assignment involves viewing TCP handshakes via statistics/flow/TCP flow, perhaps following an nmap SYN attack. Students run the free security scanner nmap alongside Wireshark and watch the probes initiated by the scan options provided. I always assign a Christmas-tree scan (nmap -sX) that sends packets with different combinations of flag bits. Capturing probe packets and a receiving station's reactions enables identification of flag settings and the receiver's response to them. Operating systems react differently to illegal flag combinations, as students observe via their screen captures.
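As a sketch of that exercise (the interface and target address are placeholders, and scans should be aimed only at hosts one is authorized to probe):

   # in one window, capture the probes and the target's responses
   tcpdump -i eth0 -w xmas.pcap host 192.0.2.20
   # in another, send Christmas-tree packets (FIN, PSH, and URG set)
   nmap -sX 192.0.2.20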
Network courses and network maintenance are thus strongly enhanced by sniffers and other types of tools that yield information concerning network traffic and potential system vulnerabilities.
Gertrude Levine
Madison, NJ