The industry is in the mood for programmable networks, where an operator can dynamically deploy network functions on network devices, akin to how one deploys virtual machines on physical machines in a cloud environment. Such flexibility brings along the threat of unpredictable behavior and performance. What are the minimum restrictions that we need to impose on network functionality such that we are able to verify that a network device behaves and performs as expected, for example, does not crash or enter an infinite loop? We present the result of working iteratively on two tasks: designing a domain-specific verification tool for packet-processing software, while trying to identify a minimal set of restrictions that packet-processing software must satisfy in order to be verification-friendly. Our main insight is that packet-processing software is a good candidate for domain-specific verification, for example, because it typically consists of distinct pieces of code that share limited mutable state; we can leverage this and other properties to sidestep fundamental verification challenges. We apply our ideas on Click packet-processing software; we perform complete and sound verification of an IP router and two simple middleboxes within tens of minutes, whereas a state-of-the-art general-purpose tool fails to complete the same task within several hours.
What kind of programming interface should a network device expose to its operator? Answering this question involves a fundamental trade-off between flexibility and verifiability: the more flexible the interface we choose, the wider the range of functionality that we can deploy on a network device, but the harder it is to verify that the network device will behave and perform as expected, for example, it will not crash, no matter how we configure it or what traffic it receives.
Traditional network devices (switches and routers) do not expose any interface for frequent reprogramming. They implement a well-defined set of rarely changing algorithms, like Ethernet switching or IP forwarding. Part of this functionality is typically hardcoded in application-specific integrated circuits (ASICs) and cannot be reprogrammed. The rest is implemented in software running on programmable processors and can be reprogrammed by installing a new image on the device, but this is a potentially disruptive task that is not meant to be part of regular operation.
At the other end of the spectrum, certain Active Network devices—introduced by Tennenhouse and Wetherall in the late 1990s11—can execute code specified in the packets they receive. In that universe, the network does not implement a predefined set of algorithms; it is a distributed execution platform, closer in spirit to today's clouds than to a traditional network.
However, such flexibility brings along the threat of unpredictable behavior and performance. Any system of non-trivial size that is subject to frequent updates typically suffers from behavior and performance bugs. Consider a piece of packet-processing code that enters an infinite loop upon receiving a particular type of packet; deploying this code on a network device would enable an attacker to disrupt network operation by causing network devices to stop processing new packets.
Until recently, network operators did not see an immediate economic benefit from subjecting the network to this threat; but now the landscape is changing. Today the network implements an evolving set of algorithms that go beyond switching and forwarding, for example, load-balancing, intrusion prevention and detection, or redundancy elimination. And since traditional network devices do not expose any interface for frequent reprogramming, when a network operator wants to extend the functionality of her network, say, with a new intrusion-prevention algorithm, she typically has to deploy a new specialized middlebox. It was recently reported that specialized middleboxes can make a significant fraction of network infrastructure.25 It was also reported that one can cut network operation costs by half by getting rid of all these middleboxes in favor of flexible, programmable network devices.24
Enter the modern, industry-driven reincarnation of Active Networking: Network Function Virtualization.10 The idea is to enable a network operator to dynamically deploy pieces of functionality ("network functions") on network devices, akin to how one deploys virtual machines on physical machines in a cloud environment.
Given the industry's mood for network programming, this is a great time to revisit the question of what the network-programming interface should look like. Should it allow to deploy arbitrary network functions on network devices? Or should it restrict network functionality to enable verification? Is the flexibility/verifiability trade-off the same as it was in the days of Active Networking?
One thing that has changed since the late 1990s is software verification tools. Fully automatic, complete and sound verification of arbitrary code is still out of reach, but there is significant progress in verifying software from specific domains, for example, operating-system kernels,17 device drivers,1, 8 aviation control systems.3 Most importantly, there are verification tools like Klee4 and S2E7 that are designed to be used by researchers or developers who are not verification experts. These tools provide basic verification functions that can be used as building blocks for domain-specific verification.
Given these new verification tools, what are the minimum restrictions that we need to impose on network functionality such that we are able to verify that a network device satisfies useful properties? There are different ways to approach this question: one could start from a restricted, verification-friendly programming model and broaden it as much as possible; or, one could start from a popular, but not verification-friendly programming model and restrict it as little as necessary. We chose the latter in an effort to be practical.
In this paper, we present the result of working iteratively on two tasks: designing a domain-specific verification tool for packet-processing software, while trying to identify a minimal set of restrictions that packet-processing software must satisfy in order to be verification-friendly. As a first step, we apply our ideas on Click18 packet-processing software; we verify that an IP router, a Network Address Translation (NAT) box, and a simple traffic monitor do not crash and do not execute more than a given number of instructions per packet, independently from their configuration and the traffic they receive. Our domain-specific tool completes its proofs within tens of minutes, whereas a state-of-the-art general-purpose tool could not complete the same tasks within hours.
Our main insight is that packet-processing software is a good candidate for domain-specific verification: First, it is typically written in the form of a pipeline of distinct packet-processing elements (e.g., a firewall element, an encryption element, an IP lookup element) that do not share mutable state between them, other than the packet passed from one element to the next and associated metadata. Second, it does not typically involve recursion or loops with an unpredictable number of iterations. Intuitively, these characteristics make it easier to reason about the behavior and performance of packet-processing software than that of arbitrary software. Our verification tool has semantic information about certain common aspects of packet-processing software (e.g., a "packet" is not an arbitrary data structure, a "packet-processing loop" is not an arbitrary control structure), and this information enables it to sidestep fundamental verification challenges.
In the rest of the paper, we provide basic background (Section 2), describe the design of our verification tool (Section 3), summarize our evaluation results (Section 4), and close with a discussion of open issues (Section 5).
In this section, we provide background on network devices (Section 2.1), specify the particular network devices that our work addresses (Section 2.2), provide background on symbolic execution, the main verification technique that we build on (Section 2.3), outline the main insight that motivated our work (Section 2.4), and summarize prior work (Section 2.5).
2.1. Network devices and click
A traditional network device consists of two components: the "data plane," which implements packet-processing algorithms (e.g., IP forwarding, filtering, encryption, and encapsulation) and the "control plane," which implements the algorithms that populate the data plane's data structures (e.g., the routing algorithms that populate the IP forwarding table). The data plane can be implemented in ASICs or in software (or a combination of the two), while the control plane is typically implemented in software.
Our work concerns software data planes that are subject to certain requirements: (a) Flexibility, for example, it should be easy to extend packet processing with a new filtering algorithm, or to replace the existing IP lookup algorithm with a new, faster one. (b) Predictability, for example, a packet should not be processed for more than a few microseconds; given that a software data plane is typically CPU-bound when processing small packets or performing CPU-intensive tasks (e.g., encryption or intrusion detection), a latency constraint translates into a constraint in the number (and nature) of instructions executed per packet.
To combine flexibility with predictability, it makes sense to follow certain principles when designing and implementing a software data plane: (a) Avoid inefficient programming patterns or patterns that may result in unpredictable performance, for example, recursion or loops with an unpredictable number of iterations. (b) Structure the data plane as a pipeline of distinct packet-processing elements (e.g., a filtering element, an encryption element, an IP forwarding element) that do not share mutable state, other than passing the packet itself from one element to the next, together with associated metadata.
These principles are the result of discussing with developers and our own experience, but we do not have quantitative evidence that most software data planes follow them. Avoiding inefficient or unpredictable patterns is common sense—one can hardly envision a network device that executes unbounded loops upon receiving a packet. Structuring the data plane as a pipeline of distinct, loosely coupled elements simplifies code development and maintenance. To the best of our knowledge, the first packet-processing framework that explicitly advocated this principle is Click: "A Click router is assembled from packet processing modules called elements. Individual elements implement simple router functions like packet classification, queuing, ... A router configuration is a directed graph with elements at the vertices; packets flow along the edges of the graphs."18 Click is arguably the most popular packet-processing framework within the academic community. However, commercial middleboxes are typically not open-source, and we acknowledge that they may not follow a Click-like structure. One of the points we make with this work is that they should, because this structure is verification-friendly without compromising performance.
2.2. Pipeline structure
We focus on software data planes that consist of distinct packet-processing elements, where each element may access three types of state (Table 1):
Packet state is owned by exactly one element at any point in time. It can be read or written only by its owner; the current owner (and nobody else) may atomically transfer ownership to another element. Packet state is used for communicating packet content and metadata between elements. For each newly arrived packet, there is typically an element that reads it from the network, creates a packet object, and transfers object ownership to the next element in the pipeline. Once an element has transferred ownership of a packet, it cannot read or write it any more.
Private state is owned by one element and never changes ownership. It can be read or written only by its owner, and it persists across the processing of multiple packets. A typical example is a map in a NAT element, or a flow table in a traffic-monitoring element.
Static state can be read by any element but not written by any element. This state is immutable as far as the data plane is concerned. A typical example is an IP forwarding table.
2.3. Symbolic execution
A program can be viewed as an "execution tree," where each node corresponds to a program state, and each edge is associated with a basic block. Running the program for a given input leads to the execution of a sequence of instructions that corresponds to a path through the execution tree, from the root to a leaf. For example, the program E1 in Figure 1 may execute two instruction sequences: one for input in < 0 and the other for input in ≥ 0; hence, its execution tree (shown to the right of the program) consists of two edges, one for each "input class" and instruction sequence.
Symbolic execution4, 14 is a practical way of generating execution trees. During normal execution of a program, each variable is assigned a concrete value, and only a single path of the tree is executed. During symbolic execution, a variable may be symbolic, that is, assigned a set of values that is specified by an associated constraint. For example, a symbolic integer x with associated constraint x > 2 ∧ x < 5 is the set of concrete values x = {3, 4}. A symbolic-execution engine can take a program, make the program's input symbolic, and execute all the paths that are feasible given this input.
Consider the program E2 in Figure 1 and assume that the input in can take any integer value. To symbolically execute this program, we start at the root of the tree and execute all the feasible paths. As we go down each path, we collect two pieces of information: the "path constraint" specifies which values of in lead to this path, and the "symbolic state" maps each variable to its current value on this path. For example, at the end of path e4, the path constraint is C = (in ≥ 0 ∧ in < 10), and the symbolic state is S = {out = 10}; at the end of path e5, the path constraint is C = (in ≥ 10), and the symbolic state is S = {out = in}.
In principle, symbolic execution provides an automatic, conceptually straightforward way to verify a program, that is, (dis)prove that the program satisfies a target property. For example, suppose we want to prove that program E1 in Figure 1 never crashes, no matter what input it receives; if we symbolically execute E1 specifying no constraint for in, we will execute all two paths of the program without crashing, which constitutes proof that the program does not crash for any input. Similarly, if we symbolically execute E2 in Figure 1 specifying no constraint for in, we will execute all three paths of the program, and path e3 will end in a crash, which constitutes proof that the program can crash when in < 0.
In practice, symbolic execution is rarely used to generate proofs, because of path explosion: The number of paths in a program generally grows exponentially with the number of branching points. As a result, a real program (even one that consists of a few hundred lines of code) can have so many paths that it is impossible to execute all of them in useful time.
On the other hand, symbolic execution is successfully used to test programs. The difference between testing a program and proving something about it lies in "path coverage," which is the fraction of the program's paths that are executed. Proving something about a program through symbolic execution means executing 100% of its paths, which is not a requirement for testing. A popular metric for evaluating a testing tool is "line coverage," which is the fraction of the program's lines of code that are exercised by the tool. It is worth noting that line coverage and path coverage are very different metrics. For example, when Klee4 tests UNIX coreutils like nice or cat, it achieves line coverage above 70%, but path coverage below 1%.20
There are two main reasons why we chose to build on symbolic execution, despite its limitations: First, there exist publicly available, well-maintained symbolic execution tools4, 7 that do not require verification expertise. This is not accidental: symbolic execution can be thought of as a "brute force" approach to verification; it does not require any sophisticated reasoning about program logic, which makes it easy to use (and also hard to scale). Second, symbolic execution does not require access to the source code of the program, only the executable binary. This is consistent with the vision of a network operator verifying packet-processing software written by third parties, which may not have an incentive to provide the source code.
2.4. Starting point
We observe that symbolic execution is a good fit for software data planes, because their pipeline structure can help sidestep path explosion. In a pipeline, two elements (stages) never concurrently hold read or write permissions to the same mutable state, regardless of whether that state is a packet being processed or some other data structure. This level of isolation can help significantly with path explosion.
Our approach is to first analyze each pipeline element in isolation, then compose the results to prove properties about the entire pipeline. This can reduce the amount of work that needs to be done to prove something about the pipeline by as much as an exponential factor: If each element has n branches and roughly 2n paths, a pipeline of m such elements has roughly 2m·n paths. Analyzing each element in isolation (as opposed to the entire pipeline in one piece) can reduce the number of paths that need to be explored roughly from 2m·n to m · 2n. In the worst case, the per-element analyses yield that every single pipeline path warrants further analysis (so we end up having to consider all the paths anyway). In practice, we expect that most pipeline paths are irrelevant to the target property, and we only need to consider a small fraction.
2.5. Prior work
Our work is feasible because of advances in program analysis tools for C/C++ code, from Verisoft12 to modern model checkers19, 23 and tools based on symbolic execution.4, 5, 7, 14 Of particular relevance to our work is the idea of using compositionality to address path explosion.13 Verification techniques have been used before to debug or verify networked systems (but not data planes): Musuvathi and Engler22 adapted the CMC model checker to test the Linux TCP implementation for interoperability with the TCP specification. Bishop et al.2 contributed a formal specification of TCP/IP and the sockets API, and they tested existing implementations for conformance to their specification. Killian et al.16 contributed new algorithms for finding liveness bugs in systems like Pastry and Chord. NICE finds bugs in Open-Flow applications.6 SOFT tests OpenFlow switches for inter-operability with reference implementations.21 Guha et al.15 contributed "the first machine-verified [Software Defined Networking] controller."
In this section, we describe our system: first how it leverages the pipeline structure to sidestep inter-element path explosion (Section 3.1); second, how it leverages other aspects of packet processing to sidestep intra-element path explosion resulting from loops (Section 3.2), large data structures (Section 3.3), and mutable state (Section 3.4). We close with a quick comment on the kinds of overhead that our design introduces (Section 3.5).
As we describe each technique used by our system, we also state any extra conditions on top of pipeline structure (Section 2.2) that this technique requires from the target software in order to work well. If a software data plane does not meet these conditions, our tool may not be able to complete a proof for this data plane.
We will illustrate our system through Figure 1, which shows a pipeline consisting of two elements. We will use the term segment to refer to an instruction sequence through a single element, and the term path to refer to an instruction sequence through the entire pipeline. The input in corresponds to a newly received packet, and we assume that this may contain anything, that is, we make in symbolic and unconstrained.
We developed our system on top of S2E, which is a state-of-the-art, publicly available verification framework for general software.7 For illustration purposes, our examples simplify two aspects of our system: first, our examples input in is an integer, whereas in reality the input packet object is an array of bytes; second, our example code snippets consist of pseudo-code, whereas in reality S2E takes as input X86 code.
3.1. Pipeline decomposition
Verification consists of two main steps: step 1 searches inside each element, in isolation, for code that may violate the target property, while step 2 determines which of these potential violations are feasible once we assemble the elements into a pipeline. More specifically, we cut each pipeline path into element-level segments (Figure 1). In step 1, we obtain, for each segment, a logical expression that specifies how this segment transforms state; this allows us to identify all the "suspect segments" that may cause the target property to be violated. In step 2, we determine which of the suspect segments are feasible and indeed cause the target property to be violated, once we assemble segments into paths.
In step 1, we analyze each element in isolation: First, we symbolically execute the element assuming unconstrained symbolic input. Next, we conservatively tag as "suspect" all the segments that may cause the target property to be violated. For example, in Figure 1, if the target property is crash-freedom, segment e3 is tagged as suspect, because, if executed, it leads to a crash.
If we stopped at step 1, our verification would catch all property violations, but could yield false positives: If this step does not yield any suspect segments for any element, then we have proved that the pipeline satisfies the target property. For instance, if none of the elements ever crashes for any input, we have proved that the pipeline never crashes. However, a suspect segment does not necessarily mean that the pipeline violates the target property, because a segment that is feasible in the context of an individual element may become infeasible in the context of the full pipeline. For example, in Figure 1, if we consider element E2 alone, segment e3 leads to a crash; however, in a pipeline where E2 always follows E1, segment e3 becomes infeasible, and the pipeline never crashes. In program-analysis terminology, in step 1, we over-approximate, that is, we execute some segments that would never be executed within the pipeline that we are aiming to verify.
Step 2 discards suspect segments that are infeasible in the context of the pipeline: First, we construct each potential path pi that includes at least one suspect segment; each pi is a sequence of segments ej. Next, we compose the path constraint and symbolic state for pi based on the constraints and symbolic state of its constituent segments (that we have already obtained in step 1). Finally, for every pi, we determine whether it is feasible (based on its constraints) and whether it violates the target property (based on its symbolic state). Note that the last step does not require actually executing pi, only composing the logical expressions of its constituent segments.
For example, here is how we prove that the pipeline in Figure 1 does not crash:
Step 1:
Step 2:
Pipeline decomposition enables us to prove properties about the pipeline without having to consider every single pipeline path; but it still requires us to consider every single element segment. This is not straightforward for elements that involve loops or large data structures. We will next discuss how we address each of these scenarios.
3.2. Loops
Loops can create an impractical number of segments within an element. Consider an element that implements the processing of IP options: for each received packet, it loops over the options stored in the packet's IP header and performs the processing required by each specified option type. If the processing of one option yields up to 2n segments, then the processing of t options yields up to 2t·n segments. For example, in the IP-options element that comes with the Click distribution, the processing of 3 options yields millions of segments that—we estimated—would take months to symbolically execute.
To address this, we reuse the idea of decomposition, this time applying it not to the entire pipeline, but to each loop: If a loop has t iterations, we view it as a "mini-pipeline" that consists of t "mini-elements," each one corresponding to one iteration of the loop. We have described how, if we have a pipeline of m elements, we symbolically execute each element in isolation, then compose the results to reason about the entire pipeline. Similarly, if we have a loop of t mini-elements (iterations), we symbolically execute each mini-element in isolation, then compose the results to reason about the entire loop. Unlike a pipeline that consists of different element types, a loop of t iterations consists of the same mini-element type, repeated t times; hence, for each loop, we only need to symbolically execute one mini-element.
This brings us to our first extra condition on software data planes: To use pipeline decomposition as we do, the only mutable state shared across components must be the packet object itself. For instance, to decompose a pipeline into individual elements, we rely on the fact that the only mutable state shared across elements is the packet object (Section 3.1). Similarly, to decompose a loop into individual iterations, the only mutable state shared across iterations must be part of the packet object.
For example, consider again an IP-options element: Such an element typically includes a next variable, which points to the IP-header location that stores the next option to be processed; each iteration of the main loop starts by reading this variable and ends by incrementing it. In a conventional element, next would be a local variable. In our verification-optimized element, next is part of the packet metadata, hence part of packet. And since, in step 1, we make packet symbolic and unconstrained, next is also symbolic and unconstrained, allowing us to reason about the behavior of one iteration of the main loop, assuming that iteration may start reading from anywhere in the IP header.
CONDITION 1. Any mutable state shared across loop iterations is part of the packet metadata.
3.3. Data structures
Symbolic-execution engines lack the semantics to reason about data structures in a scalable manner. For instance, symbolically executing an element that uses a packet's destination IP address to index an array with a thousand entries will cause a symbolic-execution engine to essentially branch into a thousand different segments—independently from the array content or the logic of the code that uses the returned value. So, if we naïvely feed an element with a forwarding or filtering table of more than a few hundred entries to a symbolic-execution engine, step 1 of our verification process will not complete in useful time.
To address this, when we reason about an element, we abstract away any data-structure access; this allows us to symbolically execute the element and identify suspect segments, without requiring the symbolic-execution engine to handle any data structures. To reason about the data structures themselves, we rely on other means, for example, manual or static analysis; this restricts us to using only data structures that are manually or statically verifiable, but we have evidence that these are typically sufficient for packet-processing functionality.
This brings us to our second extra condition on software data planes: To reason about different components of the same executable separately, there must exist a well-defined interface between them. For instance, to reason about each pipeline element separately and compose the results, we rely on the existence of a well-defined interface between each pair of elements, which specifies all the state that can be exchanged between them—the packet object (Section 3.1). Similarly, to reason about a data structure separately from the element that uses it and compose the results, the data structure must expose a well-defined interface to the element.
We need an interface that abstracts each data structure as a key/value store that supports at least read, write, membership test, and expiration. The first three operations are straightforward; the last one—expiration—allows an element to indicate that a {key, value} pair will not be accessed by the element any more, hence is ready to be removed and processed by the higher layers. For example, suppose an element maintains a data structure with per-flow packet counters; when a flow completes (e.g., because a FIN packet from that flow is observed), the element can use the expiration operation to signal this completion to the control-plane process that manages traffic statistics.
CONDITION 2. Elements use data structures that expose a common key/value-store interface.
Moreover, we need data structures that expose the above interface and can be verified in useful time. When we say that a data structure is "verified," we mean that the implementation of the interface exposed by the data structure is proved to satisfy crash-freedom, bounded-execution, and correctness. The latter depends on the particular semantics of the data structure, for example, a hash-table should satisfy the following property: a "write (key, value)" followed by a "read (key)" should return "value." If an element uses only data structures for which these properties hold, then, when we reason about the element, we can abstract away all the data-structure implementations and consider only the rest of the element code plus the data-structure interfaces.
CONDITION 3. Elements use data structures that are implemented on top of verifiable building blocks, for example, pre-allocated arrays.
As evidence that such data structures exist, we implemented a hash table and a longest-prefix-match table that satisfy crash-freedom and bounded-execution. They both consist of chains of pre-allocated arrays. For instance, our hash table is a sequence of K such arrays; when adding the k-th key/value pair that hashes to the same index, if k ≤ K, the new pair is stored in the k-th array, otherwise it cannot be added (the write operation returns False).
We chose arrays as the main building block, because they combine two desirable properties: (a) They enable line-rate access to packet-processing state, because of their O(1) lookup time. (b) They are easy to verify, because of the simplicity of their semantics. For example, a write to an array (that is within the array bounds) is guaranteed not to cause a crash and not to cause the execution of more than a known number of instructions that depends on the particular CPU architecture. In contrast, a write to a dynamically-growing data structure, for example, a linked list or a radix trie, may result in a variable number of memory allocations, deallocations and accesses, which can fail in unpredictable ways.
3.4. Mutable private state
Mutable private state is hard to reason about because it may depend on a sequence of observed packets (as opposed to the currently observed packet alone). For instance, if an element maintains connection state or traffic statistics, then its private state is a function of all traffic observed since the element was initialized. Hence, it is not enough to reason about the segments of the element that can result from all possible contents of the current packet; we need to reason about the segments that can result from all possible contents of all possible packet sequences that can be observed by the element. The challenge is that symbolic-execution engines (and verification tools in general) are not yet at the point where they can handle symbolic inputs of arbitrary length in a scalable manner.
We have only scratched the surface of verifying elements with mutable state. We can currently verify two specific kinds of such elements: a NAT that maintains per-connection state and rewrites packet headers accordingly, and a traffic monitor that maintains per-flow packet counters. We believe that our approach can be generalized to other elements, but we do not expect to be able to perform complete and sound verification of an element that performs arbitrary state manipulation—claiming that would be close to claiming that we could verify arbitrary software.
Our approach is to break verification step 1 (Section 3.1) into two substeps: the first one searches for "suspect" values of the private state that would cause the target property to be violated, while the second one determines which of these potential violations are feasible given the logic of the element. In the first sub-step, we assume that the private state can take any value allowed by its type (i.e., we over-approximate). In the second sub-step, we take into account the fact that private state cannot, in reality, take any value, but is restricted by the particular type of state manipulation performed by the given element. So far, we have not exercised the second sub-step in practice: in the two stateful elements that we have experimented with, the first sub-step did not reveal any suspect states, hence the second one was not exercised.
3.5. Overhead
Our extra conditions introduce two kinds of overhead: First, existing elements may need to be rewritten to satisfy them; this involves rewriting existing loops, replacing existing data structures, and changing any line of code that accesses a data structure. To reduce this overhead, part of our work is to create a library of data structures that satisfy our conditions. Second, implementing sophisticated data structures on top of pre-allocated arrays typically requires more memory than conventional implementations. We put specific numbers on these kinds of overhead in the next section, where we summarize our evaluation results.
We tested our ideas on pipelines created with Click. In each tested pipeline, packets are generated by a "generator" element and dropped by a "sink" element; what we verify is the packet-processing code between generator and sink. We answer the following questions: Can we perform complete and sound verification (Section 4.1)? How does verification time increase with pipeline length (Section 4.2)? Can we use our tool to uncover bugs in existing code and useful performance characteristics (Section 4.3)?
4.1. Feasibility and overhead
We considered pipelines with three types of elements: (1) Elements from Click release 2.0.1 that we did not change at all: Classifier, CheckIPHeader, EtherEncap, DecIPTTL, DropBroadcasts, and Strip (Ethernet decapsulation). (2) Elements from the same release that we modified modestly: IPGWOptions and DirectIPLookup. (3) Elements that we wrote from scratch: a NAT and a traffic monitor that maintains per-flow packet counters.
For each pipeline, we proved crash-freedom (the pipeline will not crash) and bounded-execution (the pipeline will not execute more than a given number of instructions per packet). More generally, for each pipeline, we were able to answer questions of the following kind: can line X in element Y be executed with arguments Z in the context of this pipeline? If yes, what is a packet that would cause this line to be executed with these arguments?
We have not yet applied our approach widely enough to have statistically meaningful results on the amount of rewriting effort. To make the IPGWOptions and DirectIPLookup elements satisfy our conditions, we had to change, respectively, about 26 lines (12% of the element's code) and 130 lines (20% of the element's code), which took a few hours. We also tried to make the Click IPRewriter element from release 2.0.1 (which generalizes NAT functionality) satisfy our conditions, but we ended up changing most of the element, because most of its code accesses data structures; in the end, we wrote it from scratch (about 870 lines of code), which took a couple of days. Our traffic monitor is about 650 lines of code.
In terms of memory overhead: The Click IPRewriter element stores per-connection state in a hash table, implemented as an array of dynamically growing linked lists (when adding the k-th key/value pair that hashes to the same index, the new pair is stored as the k-th item of a linked list). In contrast, our NAT element uses a hash table implemented as a chain of K = 3 pre-allocated arrays (this value makes the probability of dropping a connection negligible). Hence, our NAT element may use up to three times more memory to store the same amount of state. In our opinion, sacrificing memory for verifiability is worth considering given the relative costs of memory and the human support for dealing with network problems.
4.2. Scalability
We now examine how verification time increases with pipeline length. To show the benefit of our domain-specific techniques, we use as a baseline vanilla S2E. We refer to our approach as "dataplane-specific verification" and to the baseline as "generic verification." We feed the same code to the two systems.
We consider three meaningful pipelines: (a) edge router implements a standard IP router with a small forwarding table (10 entries); (b) core router is similar but has a large forwarding table (100,000 entries); (c) network gateway implements NAT and per-flow statistics collection. Each of them presents an extra verification challenge: the first one includes a loop (processing IP options), the second one a large data structure (the IP lookup table), and the third one mutable private state (in our NAT and traffic monitor elements).
Figure 2a shows how verification time increases as we add more elements to the IP-router pipelines: Dataplane-specific verification completes in less than 20 min. Most of this time is spent on IP options, because this element has significantly more branching points than the rest. Generic verification of the edge router exceeds 12 h (at which point we abort it) the moment we allow packets to carry 2 IP options (so we do not show any data point for it beyond "+IPoption2"). Generic verification of the core router exceeds 12 h the moment we add the IP lookup element to the pipeline (so we do not show any data point for it beyond "+IPlookup"). The difference between the two tools comes from our pipeline decomposition and special treatment of loops and large data structures.
Figure 2b shows the same information for the network-gateway pipeline: Dataplane-specific verification completes in less than 6 min, whereas generic verification exceeds 12 h the moment we add either the traffic monitor or the NAT element. The difference comes from our pipeline decomposition and the fact that we abstract away data-structure implementations.
4.3. Usefulness
We said that our tool can help developers debug their code, and network operators to better understand the performance and behavior of their data planes; we now look at a few specific examples.
Bugs in existing code. While trying to prove crash-freedom and bounded-execution for various pipelines, we encountered the following instances where a malicious end-host could disable the pipeline by sending a specially crafted packet. All elements that we refer to belong to Click release 2.0.1 (without any subsequent patches).
Any pipeline that includes the IPFragmenter element, not preceded by the IPGWOptions element, will enter an infinite loop if it tries to fragment a packet that carries a zero-length IP option. This is because the current option length determines where the next iteration of the loop will start reading, hence, a zero-length option causes the loop to get stuck.a The IPGWOptions element discards any packet with a zero-length option, so including it in the pipeline prevents the bug from being exercised.
Any pipeline that includes the IPRewriter elementb will hit a failed assertionc if it receives a packet with source IP address/port tuple Ts = T and destination tuple Td = T, where T is the public IP address/port of the IPRewriter. We should note that this particular bug was fixed (independently from us) in a patch subsequent to release 2.0.1.
Longest paths in IP router. We used our tool to construct adversarial—from a performance point of view—workloads for a pipeline implementing a standard IP router. Recent research showed that such a router is capable of multi-Gbps lines rates,9 but this result was obtained using work loads of well-formed packets, not meant to exercise the pipeline's exception paths. Instead, we obtained the pipeline's 10 (it could have been any number) longest paths, as well as the packets that cause them to be executed.
It is not surprising that the longest paths are executed in response to problematic packets that trigger further packet examination and logging; what may be surprising is that these paths execute 2.5 times as many instructions as the most common path. Moreover, these extra instructions are CPU-heavy, that is, they include memory accesses and system calls for logging; an attacker may cause significant performance degradation by sending a sequence of packets that are specially crafted to exercise these particular paths. This is useful information to a developer, because it reveals to her paths that may require her attention. It is also useful to a network operator, because it reveals to her the performance limits of a pipeline and the workloads that trigger them—allowing her to decide whether it is suitable for her network.
The key enabler and at the same time limitation of our work is that we focus on software data planes that are subject to certain restrictions: (1) They obey a pipeline structure. We think that this is a natural fit for packet-processing software. Most research prototypes are already written this way, and we know of at least one industry prototype as well. Favoring an already popular programming model is, in our opinion, a modest price to pay for verifiability. (2) They satisfy our extra conditions, which may require rewriting existing code (but the resulting code is, in our opinion, easier to read and maintain); and may result in implementations with larger memory footprints than existing code (but trading off memory for verifiability is, in our opinion, worth considering).
Our approach is applicable to packet-processing platforms where each packet is handled by a single processing core and different cores never need to synchronize. We focused on such platforms, because there is compelling evidence that they lead to better performance (by minimizing the number of compulsory cache misses).9 We would need new results in order to verify platforms where different cores contend for access to the same data structures.
As we continue our work, we are coming to the following conclusion: symbolic execution is a powerful tool, but it alone cannot take us to complete and sound verification of sophisticated, stateful packet-processing elements. We were able to use it thus far, because we considered only stateless elements (IP router) or elements that maintain relatively simple forms of state (NAT and traffic monitor). To verify stateful elements, we first assume that the element's state may take any value and check whether the target property holds; if it does not hold for a particular suspect value, we then try to reason about whether the element's state can ever take that suspect value (Section 3.4). The challenge lies in the second step. In the case of our NAT and traffic monitor, there were no suspect values to begin with—and even if there were, reasoning about the feasible values of the state of these elements is relatively simple. However, an intrusion detection system (IDS) may implement sophisticated state machines that we cannot reason about, even with our domain-specific optimizations.
Hence, we are heading toward an approach where we analyze each element in isolation and compose the results on demand (as our current system does), but we use a wider range of techniques for our per-element analysis. Techniques like model checking or abstract interpretation may be necessary if we want to reason about sophisticated, stateful packet-processing elements. The challenge lies in simplifying these techniques enough to make them accessible to non-verification experts; continuing in the same vein with our current work, we will try to do this by leveraging the idiosyncrasies of packet-processing code. For example, we are exploring the idea of identifying patterns of code that are common in packet processing, and developing specialized techniques for verifying these particular patterns.
There will always be software that we cannot verify, and we by no means advocate that network operators reject it. Rather, we advocate classifying software into "verified" and "non-verified," to enable network operators to make informed decisions about deploying it on their network devices. For instance, it makes sense to deploy non-verified software in an isolated fashion (e.g., within a container, where only experimental traffic can reach it), to contain its effect on the rest of the network. Perhaps this combination of verification and dual-mode deployment is the way to reconcile the need for predictable network behavior and performance with the vision of a continuously evolving Active Network.
Thanks to Stefan Bucur, George Candea, Vitaly Chipounov, Johannes Kinder, Vova Kuznetsov, Christian Maciocco, Dimitris Melissovas, David Ott, Iris Safaka, Simon Schubert, and Cristian Zamfir. This work was supported by an Intel gift and a Swiss National Science Foundation grant.
1. Ball, T., Rajamani, S.K. The SLAM project: Debugging system software via static analysis. In Proceedings of the ACM Symposium on the Principles of Programming Languages (POPL) (2002).
2. Bishop, S., Fairbairn, M., Norrish, M., Sewell, P., Smith, M., Wansbrough K. Rigorous specification and conformance testing techniques for network protocols, as applied to TCP, UDP, and Sockets. In Proceedings of the ACM SIGCOMM Conference (2005).
3. Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X. A static analyzer for large safety-critical software. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI) (2003).
4. Cadar, C., Dunbar, D., Engler, D.R. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2008).
5. Cadar, C., Engler, D.R. EXE: Automatically generating inputs of death. In Proceedings of the ACM Conference on Computer Communication Security (CCS) (2006).
6. Canini, M., Venzano, D., Peresini, P., Kostic, D., Rexford J. A NICE way to test OpenFlow applications. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2012).
7. Chipounov, V., Kuznetsov, V., Candea, G. The S2E platform: Design, implementation, and applications. ACM Transactions on Computer Systems 30, 1 (February 2012), 2:1–2:49.
8. Cook, B., Podelski, A., Rybalchenko, A. Termination proofs for systems code. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI) (2006).
9. Dobrescu, M., Egi, N., Argyraki, K., Chun, B.-G., Fall, K., Iannaccone, G., Knies, A., Manesh, M., Ratnasamy, S. RouteBricks: Exploiting parallelism to scale software routers. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP) (2009).
10. I.S.G. European Telecommunications Standards Institute (ETSI). Network Functions Virtualisation (NFV); Use Cases. Technical Report ETSI GS NFV 001 V1.1.1, ETSI, 650 Route des Lucioles, F-06921 Sophia Antipolis Cedex, FRANCE, 2013. Reference DGS/NFV-009.
11. Feamster, N., Rexford, J., Zegura, E. The road to SDN: An intellectual history of programmable networks. ACM Comput. Commun. Rev. 44, 2 (April 2014), 87–98.
12. Godefroid, P. Model checking for programming languages using Verisoft. In Proceedings of the ACM Symposium on the Principles of Programming Languages (POPL) (1997).
13. Godefroid, P. Compositional dynamic test generation. In Proceedings of the ACM Symposium on the Principles of Programming Languages (POPL) (2007).
14. Godefroid, P., Klarlund, N., Sen, K. DART: Directed automated random testing. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI) (2005).
15. Guha, A., Reitblatt, M., Foster, N. Machine-verified network controllers. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI) (2013).
16. Killian, C., Anderson, J.W., Jhala, R., Vahdat, A. Life, death, and the critical transition: Finding liveness bugs in systems code. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2007).
17. Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Norrish, M., Kolanski, R., Sewell, T., Tuch, H., Winwood, S. seL4: Formal verification of an OS kernel. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP) (2009).
18. Kohler, E., Morris, R., Chen, B., Jannoti, J., Kaashoek, M.F. The click modular router. ACM Trans. Comput. Syst. 18, 3 (August 2000), 263–297.
19. Kroening, D., Clarke, E., Yorav, K. Behavioral consistency of C and Verilog programs using bounded model checking. In Proceedings of the Design Automation Conference (DAC) (2003).
20. Kuznetsov, V., Kinder, J., Bucur, S., Candea, G. Efficient state merging in symbolic execution. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI) (2012).
21. Kuzniar, M., Peresini, P., Canini, M., Venzano, D., Kostic, D. A SOFT way for OpenFlow switch interoperability testing. In Proceedings of the ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT) (2012).
22. Musuvathi, M., Engler, D.R. Model checking large network protocol implementations. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2004).
23. Musuvathi, M., Park, D., Chou, A., Engler, D.R., Dill, D.L. CMC: A pragmatic approach to model checking. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2002).
24. Sekar, V., Egi, N., Ratnasamy, S., Reiter, M.K., Shi, G. Design and implementation of a consolidated middlebox architecture. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2012).
25. Sherry, J., Hasan, S., Scott, C., Krishnamurthy, A., Ratnasamy, S., Sekar, V. Making middleboxes someone else's problem: network processing as a cloud service. In Proceedings of the ACM SIGCOMM Conference (2012).
a. elements/ip/ipfragmenter.cc, line 69.
b. elements/tcpudp/iprewriter.cc.
c. include/click/heap.hh, line 149.
A previous version of this paper was published in Proceedings of the 11th USENIX Conference on Networked Systems and Design and Implementation, 2014, pp. 101–114.
Figure 1. A toy pipeline that consists of two elements.
Figure 2. Verification time as a function of pipeline length. "Preproc" consists of the Click Classifier, CheckIPHeader, and Strip elements. (a) IP router. For the dataplane-specific approach, the results are the same for the edge and core pipelines. (b) Network gateway.
©2015 ACM 0001-0782/15/11
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2015 ACM, Inc.
No entries found