
Communications of the ACM

Research highlights

CoSaMP: Iterative Signal Recovery From Incomplete and Inaccurate Samples



Compressive sampling (CoSa) is a new paradigm for developing data sampling technologies. It is based on the principle that many types of vector-space data are compressible, which is a term of art in mathematical signal processing. The key ideas are that randomized dimension reduction preserves the information in a compressible signal and that it is possible to develop hardware devices that implement this dimension reduction efficiently. The main computational challenge in CoSa is to reconstruct a compressible signal from the reduced representation acquired by the sampling device. This extended abstract describes a recent algorithm, called CoSaMP, that accomplishes the data recovery task. It was the first known method to offer near-optimal guarantees on resource usage.


1. What is Compressive Sampling?

In many applications, the ambient dimension of a data vector does not reflect the actual number of degrees of freedom in that vector. In mathematical signal processing, this property is captured by the notion of a compressible signal. Natural images provide a concrete example of compressible signals because we can approximate them accurately just by summarizing the solid areas (local averages) and the edges (local differences). This representation allows us to compress images dramatically without a substantial loss of visual quality—an idea implemented in the JPEG image coders that we use every day.

Sampling is a generic term for the process of acquiring data about an object, either in the physical or the virtual world. There are many different technologies that we use to collect samples. A ubiquitous piece of sampling hardware is the CCD array inside a digital camera, which measures the average light intensity at each small square in the focal plane of the camera. Other sampling equipment includes X-ray machines, magnetic resonance imaging (MRI), and analog-to-digital converters.

In most cases, we use sampling to acquire as much information as possible about an object of interest. Indeed, when purchasing a digital camera, we often think that more pixels are better. But as soon as the image is captured, the camera invokes compression algorithms to reduce the amount of information it needs to store. This fact suggests an awkward question: if there is a limited amount of information in the object, why are we taking so many samples? Is there some way to obtain measurements that automatically sieve out the information in the object?

One way to exploit the gap between nominal dimension and degrees of freedom is to perform dimension reduction. This idea has a long tradition in computer science. Indeed, to solve geometric problems involving high-dimensional data, we often map the data to a lower dimension while approximately preserving the geometric properties that are relevant to the computation. One standard method for achieving this goal is to apply a random projection to the data.

The main idea in compressive sampling (CoSa) is to develop sampling technologies based on dimension reduction. The mathematical foundation for this approach is the fact that an appropriate random projection of a compressible signal retains most of the information in that signal. The coordinates of this randomly projected signal are interpreted as samples, and the number of samples we need is comparable with the information content of the object we are sampling. Because we have a lot of flexibility when designing projections, we have flexibility in engineering the sampling hardware.

An intriguing example of a CoSa sensing device is the Rice University single-pixel camera. This camera uses a digital micromirror device (DMD) to selectively black out a random subset of pixels from an image and to measure the total intensity of the remaining pixels. By applying many random masks in sequence, we obtain a collection of samples that captures the information in the image.

Another key fact about CoSa is that we cannot use simple linear techniques to reconstruct the data vector from the random samples. Instead, we must develop more sophisticated algorithms. One approach that has received a lot of attention is convex programming based on ℓ1 minimization. This abstract describes an algorithm, called CoSaMP, that is based on the greedy pursuit schema.18, 19 CoSaMP was the first computational technique that provably achieved near-optimal guarantees on resource usage. Using the techniques from our paper, researchers later demonstrated that other algorithms offer comparable performance guarantees. Another algorithm similar to CoSaMP, called Subspace Pursuit,9 was published at the same time, but the accompanying analysis was less complete.

The remainder of the article is organized as follows. Section 2 describes the mathematical framework for CoSa as well as the theoretical results for the CoSaMP algorithm. Section 3 presents a description of the algorithm and its implementation. Section 4 discusses related work and a comparison to our method. We conclude in Section 5 with the analysis of the algorithm and sketch the proof of the main result.


2. The Mathematics of CoSa

We continue with a description of the mathematical model for signals, sampling technologies, and signal reconstruction algorithms. The main theoretical results for the CoSaMP algorithm appear at the end of the section.

2.1. Notation

Let us instate some notation that is used throughout the paper. A signal is a vector x ∈ CN. For p ∈ [1, ∞], we write ||·||p for the usual ℓp vector norm, ||x||p = (Σi |xi|^p)^{1/p}. We reserve ||·|| for the spectral norm. For x ∈ CN, we write xr for the signal in CN that is formed by restricting x to its r largest-magnitude components (with ties broken lexicographically).

For T ⊂ {1, 2, ..., N}, we denote the restriction of the signal to the set T as x|T, which is the vector equal to x on T and 0 elsewhere. We occasionally abuse the notation and treat x|T as an element of CT. We also define the restriction ΦT of the sampling matrix Φ to be the column submatrix whose columns are listed in the index set T. Finally, we define the pseudoinverse of a tall, full-rank matrix A by the formula A† = (A*A)^{−1} A*.

2.2. Sparse and compressible signals

We say that a signal is s-sparse when it has s ≪ N nonzero entries:

$$\|x\|_0 \triangleq |\operatorname{supp}(x)| \le s \ll N.$$

This definition encapsulates the idea that the signal contains far less information than its ambient dimension suggests. Observe that it takes roughly s log2(N/s) bits to write down the locations of the nonzero entries in an s-sparse signal in CN.

While sparse signals are important for theory, they rarely arise in practice, so we consider the larger class of compressible signals. For p ∈ (0, 1), we say that x is p-compressible with magnitude R if its components, taken in sorted order, obey

$$|x|_{(i)} \le R \cdot i^{-1/p} \quad \text{for } i = 1, 2, 3, \ldots \tag{1}$$

Compressible signals are well approximated by sparse signals. The smaller the number p, the better the approximation:

$$\|x - x_s\|_1 \le C_p \cdot R \cdot s^{1 - 1/p}, \qquad \|x - x_s\|_2 \le D_p \cdot R \cdot s^{1/2 - 1/p}. \tag{2}$$

Let us emphasize that we do not know in advance which coordinates of a compressible signal are significant—only that few coordinates matter.
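To make the decay condition concrete, here is a small numerical sketch in Python with NumPy (the values of N, s, p, and R are illustrative choices, not taken from the paper). It builds a signal whose sorted magnitudes obey Equation 1 and compares the best s-term approximation error with the scale R · s^{1/2 − 1/p} appearing in Equation 2.

import numpy as np

rng = np.random.default_rng(0)
N, s, p, R = 10_000, 100, 0.5, 1.0

# Sorted magnitudes decaying like R * i^(-1/p), as in Equation 1,
# scattered over random positions with random signs.
magnitudes = R * np.arange(1, N + 1, dtype=float) ** (-1.0 / p)
x = rng.permutation(magnitudes) * rng.choice([-1.0, 1.0], size=N)

# Best s-term approximation x_s: keep the s largest-magnitude entries.
x_s = np.zeros_like(x)
keep = np.argsort(np.abs(x))[-s:]
x_s[keep] = x[keep]

print(np.linalg.norm(x - x_s))     # approximation error ||x - x_s||_2
print(R * s ** (0.5 - 1.0 / p))    # reference scale R * s^(1/2 - 1/p)

For this construction, the printed error is a constant multiple of the reference scale, in line with Equation 2.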

We limit our attention to signals that are sparse or compressible in the standard coordinate basis, although the ideas apply equally well to signals that are sparse or compressible with respect to other orthonormal bases. The generalization is important in applications. For instance, natural images are usually not compressible in the standard basis but they are compressible in an appropriate wavelet basis.

2.3. Modeling the sampling operation

We can model each of the sampling technologies mentioned in the introduction as a linear map from a signal of interest to a collection of samples. We use an m × N matrix Φ to describe the sampling operation. Thus, we write u = Φx to summarize the process of collecting a vector u of m samples from an N-dimensional signal x.

Restricted Isometry Constants: In this section, to build intuition, we focus on the class of s-sparse signals. It is important to understand what properties of the sampling matrix Φ ensure that s-sparse signals can be distinguished based on their samples. In other words, we need to make sure that Φ is a linear embedding (i.e., an injective structure-preserving map) of the set of s-sparse signals into Cm. A necessary and sufficient condition is that each 2s-column submatrix of the sampling matrix Φ forms a linearly independent set. Certain Vandermonde matrices with m = 2s rows, arising from Reed–Solomon codes, have this property. Unfortunately, these matrices contain nearly singular 2s-column submatrices, so the sampling matrix is not stably invertible on the image of the set of s-sparse signals. Noise and signal perturbations become a serious problem.

To avoid this difficulty, Candès and Tao4 insist that each 2s-column submatrix of the measurement matrix should be well conditioned. More formally, they define the rth restricted isometry constant of the matrix Φ to be the least number δr such that

$$(1 - \delta_r)\,\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_r)\,\|x\|_2^2 \quad \text{whenever } \|x\|_0 \le r. \tag{3}$$

The condition δ2s ≪ 1 implies that the measurement matrix Φ preserves the geometry (with respect to the Euclidean metric) of the set of all s-sparse signals. This condition is sufficient to ensure that sampling is stably invertible. It has become standard practice in CoSa to assume that the measurement matrix satisfies a condition of this type.

How Many Samples?: The next question is how many samples m do we need to ensure that the sampling map embeds the set of s-sparse signals into Cm. Because an s-sparse signal contains only about s log(N/s) bits of information, we hope that m = O(s log(N/s)) samples suffice for the embedding. In fact, many random sampling schemes allow us to achieve this sampling rate, or one that is just slightly worse. (It is necessary in some sense to take at least this many samples, so these sampling schemes are optimal.) Furthermore, there are technologies that implement these sampling schemes. Here are two examples.

A random sign matrix has independent, identically distributed entries that take the values ±1 with equal probability. Its restricted isometry constant5 satisfies (with high probability)

$$\delta_s \le c \quad \text{when} \quad m \ge C\, s \log(N/s).$$

In words, we can embed the set of s-sparse signals into m dimensions, where m is proportional to the sparsity and logarithmic in the ambient dimension. A random sign matrix can be used to construct the random masks required by the Rice single-pixel camera.

The subsampled Fourier matrix consists of a collection of m rows chosen uniformly at random from the discrete Fourier transform matrix. The best theoretical bound for the restricted isometry constant23 is

$$\delta_s \le c \quad \text{when} \quad m \ge C\, s \log^4(N).$$

It is conjectured that the actual scaling is m ≈ s log(N). The subsampled Fourier matrix describes measurements that can be acquired by an MRI machine. Note that a subsampled Fourier matrix can be applied to a vector using O(N log N) arithmetic operations, and it requires only O(m log N) bits of storage. These computational properties are very important in practice, as we will see.
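The following sketch (Python with NumPy; the dimensions and the sparse test signal are illustrative) builds the two measurement ensembles just described: a dense random sign matrix applied by ordinary matrix–vector multiplication, and a subsampled Fourier operator that is never stored explicitly but is applied through the FFT in O(N log N) operations.

import numpy as np

rng = np.random.default_rng(1)
N, m, s = 4096, 400, 40

# An s-sparse test signal.
x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)

# Random sign matrix: iid +/-1 entries, scaled so that the columns have unit norm.
Phi_sign = rng.choice([-1.0, 1.0], size=(m, N)) / np.sqrt(m)
u_sign = Phi_sign @ x                        # u = Phi x, O(mN) operations

# Subsampled Fourier matrix: m rows of the DFT chosen uniformly at random.
# Only the m row indices are stored; each multiply costs O(N log N) via the FFT.
rows = rng.choice(N, size=m, replace=False)

def apply_Phi(v):
    return np.fft.fft(v)[rows] / np.sqrt(m)

def apply_Phi_adj(w):
    padded = np.zeros(N, dtype=complex)
    padded[rows] = w
    return np.fft.ifft(padded) * N / np.sqrt(m)   # adjoint of the subsampled DFT

u_fourier = apply_Phi(x)

The adjoint routine apply_Phi_adj is the ingredient needed later to form the signal proxy Φ*v in the reconstruction algorithm of Section 3.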

Certain types of deterministic matrices also satisfy bounds on their restricted isometry constants. At present, all known examples require that the number m of samples satisfy m ≳ s^2. Randomness seems to provide a significant advantage when designing schemes for CoSa.

2.4. Signal reconstruction via CoSaMP

When the sampling matrix Φ stably embeds the set of s-sparse signals, it is possible in principle to recover an arbitrary s-sparse signal from its samples. The major computational question in CoSa is to determine how to reconstruct the sparse signal using a tractable algorithm. For practical applications, we also need to understand how the reconstruction is affected when the signal is merely compressible and the samples are contaminated by noise. The following issues are crucial:

  1. Can the algorithm recover the signal under a variety of sampling schemes? In other words, is the class of measurement maps for which the algorithm can succeed large?
  2. Can the algorithm reconstruct the signal from a minimal number m of samples?
  3. Is signal recovery robust when samples are contaminated with noise or the signal is not exactly sparse? Is the error bound optimal for each target signal?
  4. Does the algorithm offer provably efficient computational cost and storage?

This abstract discusses a reconstruction algorithm called Compressive Sampling Matching Pursuit (CoSaMP) that satisfies all of the foregoing criteria. Our original publications18, 19 were the first to prove that a CoSa signal reconstruction algorithm could achieve essentially optimal resource usage.

Properties of the CoSaMP Algorithm: Let us state and explain our main result. We postpone the details of the algorithm and the analysis until later.

THEOREM A (CoSaMP). Suppose that Φ is an m × N sampling matrix with restricted isometry constant δ2s ≤ c. Let u = Φx + e be a vector of samples of an arbitrary signal, contaminated with arbitrary noise.

The CoSaMP algorithm takes as input a routine to multiply Φ and Φ* by a vector, the vector u of samples, the sparsity parameter s, and a precision parameter η. The output is an s-sparse signal approximation a that satisfies

$$\|x - a\|_2 \le C \cdot \max\!\left\{\eta,\; \frac{1}{\sqrt{s}}\,\|x - x_{s/2}\|_1 + \|e\|_2\right\}.$$

The running time is O(ℒ · log(||x||2/η)), where ℒ bounds the cost of a matrix–vector multiply with Φ or Φ*. Working storage is O(N). The constants c < 1 and C > 1 are absolute.

How does this result deliver the desired performance?

  1. The algorithm requires only that the matrix satisfy a bound on the restricted isometry constant. Many types of random matrices (and certain deterministic matrices) have this property. As a consequence, the algorithm can be used with many measurement technologies.
  2. The number m of samples required to obtain the bound δ2s ≤ c depends on the type of sampling matrix. For many types of random matrices, m = O(s log^α N) for a small integer α. Since this value is optimal up to the constants, the algorithm can recover signals from a minimal number of samples.
  3. The error bound demonstrates that the algorithm succeeds for all signals, even when there is noise in the samples. As we expect, the algorithm performs best for sparse and compressible signals. Indeed, Equation 2 shows that CoSaMP produces an approximation a of a p-compressible signal x that satisfies
      
      $$\|x - a\|_2 \le C \cdot \max\!\left\{\eta,\; R \cdot s^{1/2 - 1/p} + \|e\|_2\right\},$$
      which is the best possible approximation error we could hope for in general.6
  4. The computational cost of the algorithm depends on the type of signals we are approximating. For a compressible signal, we may take the precision parameter η = R · s^{1/2 − 1/p} to see that the number of iterations required to produce a minimal approximation error is at most O(log s). See Section 1.2 of Needell and Tropp19 for details.
      The running time also depends on the type of sampling matrix we use. In particular, we can apply a subsampled Fourier matrix to a vector using ℒ = O(N log N) operations.
      It follows that if we acquire a compressible signal with a structured sampling matrix, we can perform signal recovery in time O(N log^2 N). This runtime bound is essentially optimal when the sparsity level is roughly the same order as the dimension (which is the setting in most practical problems).


3. The CoSaMP Algorithm

The CoSaMP algorithm is, at heart, a greedy iterative method for reconstructing a signal from compressive samples. This section provides an overview of the algorithm and its implementation. We also explain the basic idea behind the analysis of the algorithm, which demonstrates that each iteration reduces the error in the current signal approximation.

3.1. Description of algorithm

The input to the algorithm consists of (access to) the sampling operator Φ, the samples u, the target sparsity level s, and a halting criterion. We impose the following hypotheses:

  • The sparsity level s is fixed, and the m × N sampling operator Φ has restricted isometry constant δ4s ≤ 0.1.
  • The signal x ∈ CN is arbitrary, except where noted. The noise vector e ∈ Cm is also arbitrary.
  • The vector of samples u = Φx + e.

The algorithm is initialized with a trivial signal estimate, meaning that the initial residual is the entire unknown target signal. Each iteration then consists of five major steps.

  1. Identification. Using the current samples, the algorithm computes a vector that is highly correlated with the signal, called the signal proxy. From the proxy, components of the signal that carry a lot of energy are located.
  2. Support Merger. The set of newly identified components is united with the set of components that appear in the current approximation.
  3. Estimation. The algorithm solves a least-squares problem to approximate the target signal on the merged set of components.
  4. Pruning. The algorithm produces a new approximation by retaining only the largest entries in this least-squares signal approximation.
  5. Sample Update. Finally, the samples are updated so that they reflect the residual, the part of the signal that has not been approximated.

These five steps are repeated until the halting criterion is satisfied. In our analysis, we assume that the method uses a fixed number of iterations, but Needell and Tropp18, 19 discuss other alternatives. The CoSaMP algorithm can be summarized by the pseudocode presented as Algorithm 1.
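For readers who prefer code to pseudocode, the five steps translate directly into a short program. The following is a minimal sketch in Python with NumPy, assuming a dense real sampling matrix, an exact least-squares solve, and a fixed iteration count; it is not a substitute for the tuned implementation choices discussed in Section 3.2.

import numpy as np

def cosamp(Phi, u, s, num_iters=20):
    """Minimal CoSaMP sketch: s-sparse approximation of x from u = Phi x + e."""
    m, N = Phi.shape
    a = np.zeros(N)                    # trivial initial approximation
    v = u.copy()                       # samples of the current residual
    for _ in range(num_iters):
        y = Phi.T @ v                                         # signal proxy
        omega = np.argpartition(np.abs(y), -2 * s)[-2 * s:]   # identification: 2s largest
        T = np.union1d(omega, np.flatnonzero(a))              # support merger
        sol, *_ = np.linalg.lstsq(Phi[:, T], u, rcond=None)   # estimation on T
        b = np.zeros(N)
        b[T] = sol
        a = np.zeros(N)                                       # pruning: keep s largest
        keep = np.argpartition(np.abs(b), -s)[-s:]
        a[keep] = b[keep]
        v = u - Phi @ a                                       # sample update
    return a

With a dense Φ, each pass is dominated by the two matrix–vector multiplies and the least-squares solve, which is consistent with the operation counts discussed in Section 3.2.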

REMARK 1. We emphasize that the method we present is a specific example from a more general framework. The articles18, 19 discuss a number of variations. In particular, note that the term 2s in the identification step can be replaced by αs for other values of α. This type of tuning is valuable in practice, but it makes the proof more complicated.

REMARK 2. Some knowledge about the sparsity level is required as input to the algorithm. There are several strategies for estimating this parameter if it is not known a priori. One such method would be to use the number m of measurements to deduce the sparsity level. Since m ≈ 2s log N is often sufficient, the estimate s ≈ m/(2 log N) is reasonable. A second approach would be to run the CoSaMP algorithm using a variety of sparsity levels and select the best approximation a obtained using the empirical error ||Φa − u||2. If one varies s according to a geometric progression (e.g., s = 1, 2, 4, 8, ..., m), this variation increases the runtime by at most a factor of O(log m). See Appendix A of Needell and Tropp19 for more details.
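The second strategy in Remark 2 takes only a few lines on top of the cosamp sketch from Section 3.1 (the function name is from that sketch, not from the paper):

import numpy as np

def cosamp_sweep(Phi, u, m):
    # Try sparsity levels s = 1, 2, 4, ..., m and keep the approximation
    # with the smallest empirical error ||Phi a - u||_2.
    best_a, best_err = None, np.inf
    s = 1
    while s <= m:
        a = cosamp(Phi, u, s)
        err = np.linalg.norm(Phi @ a - u)
        if err < best_err:
            best_a, best_err = a, err
        s *= 2
    return best_a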

3.2. Implementation

CoSaMP was designed to be a practical algorithm for signal recovery, so it is worth spending some time on implementation issues. The least-squares step in the algorithm presents the largest contribution to the runtime. Fortunately, we have constructed ΦT to ensure it is well conditioned, so we can apply its pseudoinverse quickly using an iterative method. Section 5 of Needell and Tropp19 contains a thorough analysis of iterative least-squares methods as modules in CoSaMP. This analysis shows that the cost of solving the least-squares problem is O(ℒ), where ℒ bounds the cost of a matrix–vector multiply with ΦT or ΦT*.

The remaining steps of the algorithm involve standard techniques, and we can estimate the operation counts as follows.

Proxy. Forming the proxy is dominated by the cost of the matrix–vector multiply Φ*v.

Identification. We can locate the largest 2s entries of a vector in time O(N) using the approach in Cormen et al.7, Ch. 9. In practice, it may be faster to use an O(N log N) sorting algorithm (e.g., quicksort, mergesort, heapsort7, Sec. 11) on the entries of the signal and then select the first 2s of them.

Support Merger. We can merge two support sets of size O(s) in expected time O(s) via randomized hashing methods,7, Ch. 11 or by sorting both sets first and using the elementary merge procedure7, p. 29 for a total cost O(s log s).

LS estimation. We can use the conjugate gradient method or the LSQR algorithm (see, e.g., Paige and Saunders22) to compute ΦT†u. Initializing the least-squares algorithm requires a matrix–vector multiply with ΦT*, and each iteration requires one matrix–vector multiply each with ΦT and ΦT*. Since ΦT is a submatrix of Φ, the matrix–vector multiplies can also be obtained from multiplication with the full matrix. We show in Section 5 of Needell and Tropp19 that a constant number of least-squares iterations suffices for our results to hold (a sketch of this step appears after the Sample Update step below).

Pruning. As in the identification step, pruning can be implemented in time O(s), but it may be preferable to sort the components of the vector and then select the first s at a cost of O(s log s).

Sample Update. This step is dominated by the cost of the multiplication of Φ with the s-sparse vector ak.
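When Φ is available only through fast multiplies (as with the subsampled Fourier matrix), the least-squares estimation step can be carried out without ever forming ΦT explicitly. The sketch below (Python with SciPy; the routine names apply_Phi and apply_Phi_adj are illustrative, matching the Fourier sketch in Section 2.3) runs a few conjugate gradient iterations on the normal equations, which the restricted isometry property keeps well conditioned.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def estimate_on_support(apply_Phi, apply_Phi_adj, T, u, N, cg_iters=6):
    # Approximate the estimate b with b|T = pinv(Phi_T) u and b = 0 off T,
    # using only fast multiplies with Phi and Phi*.
    T = np.asarray(T)

    def pad(z):                            # embed a vector supported on T into C^N
        full = np.zeros(N, dtype=complex)
        full[T] = z
        return full

    def normal_matvec(z):                  # z -> Phi_T* Phi_T z, two fast multiplies
        return apply_Phi_adj(apply_Phi(pad(z)))[T]

    A = LinearOperator((len(T), len(T)), matvec=normal_matvec, dtype=complex)
    rhs = apply_Phi_adj(u)[T]              # Phi_T* u
    z, _ = cg(A, rhs, maxiter=cg_iters)    # a few CG steps on the normal equations
    return pad(z)

Remark 3 in Section 3.3 notes that the solver can be warm-started with the previous approximation ak−1.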

Table 1 summarizes the cost of each step for the cases in which a standard or a fast multiply is used.

Finally, we may analyze the storage requirements of the algorithm. Aside from the sampling matrix storage requirement, the algorithm constructs only one vector of length N, the signal proxy. The sample vectors u and v have length m, and thus require O(m) storage. Since the signal approximations are sparse, they can be stored using sparse data structures which require at most O(s log N) storage. Similarly, the index sets that appear require only O(s log N) storage. The total working storage is therefore O(N).

The following result summarizes these observations.

THEOREM 1 (RESOURCE REQUIREMENTS). Each iteration of CoSaMP requires O(ℒ) time, where ℒ bounds the cost of a multiplication with the matrix Φ or Φ*. The algorithm uses working storage O(N).

3.3. Error reduction bound

The theoretical analysis of the CoSaMP algorithm is based on an estimate for how much the algorithm reduces the approximation error in each iteration. This result demonstrates that the algorithm makes significant progress whenever the error is substantially larger than a certain baseline value. We call this baseline the unrecoverable energy ν in the signal:

$$\nu = \|x - x_s\|_2 + \frac{1}{\sqrt{s}}\,\|x - x_s\|_1 + \|e\|_2. \tag{4}$$

The unrecoverable energy is due to the noise in the samples and the fact that the signal is not exactly sparse. For a detailed discussion, see Section 2.5 of Needell and Tropp.19

The main technical result is the following iteration invariant, which we establish in Section 5.

THEOREM 2 (ERROR REDUCTION). For each iteration k ≥ 0, the signal approximation ak is s-sparse and

$$\|x - a^{k+1}\|_2 \le 0.5\,\|x - a^k\|_2 + 10\,\nu.$$

In particular,

$$\|x - a^k\|_2 \le 2^{-k}\,\|x\|_2 + 20\,\nu.$$

The main consequence of Theorem 2 is the fact that, after log2(||x||2/η) iterations, the approximation error is no greater than η + 20ν. See Needell and Tropp18, 19 for more discussion.

The statement of Theorem A depends on a simple upper bound for the ℓ2 term in the unrecoverable energy:

$$\|x - x_s\|_2 \le \frac{1}{\sqrt{2s}}\,\|x - x_{s/2}\|_1. \tag{5}$$

This estimate is a consequence of Lemma 7 of Gilbert et al.15

Finally, we can replace the assumption that δ4s ≤ 0.1 by a stronger bound on δ2s because of Corollary 3.4 in Needell and Tropp,19 which states that δcr ≤ c · δ2r for any positive integers c and r.


4. Related Work

CoSaMP is inspired by algorithmic ideas and analytic techniques that have appeared previously. We briefly summarize the major algorithmic approaches and compare them with CoSaMP. This work can be classified in three rough categories: convex relaxation,4, 11 greedy pursuits,12, 20, 24 and combinatorial algorithms.8, 14–16

The convex optimization methods1, 10 attempt to reconstruct signals by solving the mathematical program

$$\min_z \; \|z\|_1 \quad \text{subject to} \quad \|\Phi z - u\|_2 \le \varepsilon, \tag{6}$$

where we assume that ||e||2 ≤ ε. The intuition behind minimizing the ℓ1 norm is that this norm promotes sparsity, and so the solution to this program should approximate a compressible signal well. Candès et al.3 establish that each minimizer a of Equation 6 satisfies

$$\|x - a\|_2 \le C\left[\varepsilon + \frac{1}{\sqrt{s}}\,\|x - x_s\|_1\right] \tag{7}$$

whenever Φ has restricted isometry constant δ4s ≤ 0.2. These restricted isometry constant requirements continue to be improved.2, 13 The error bound for CoSaMP is equivalent to Equation 7, up to the precise values of the constants.

Many algorithms have been proposed to optimize Equation 6. In particular, interior-point methods1, 17 are guaranteed to solve the problem to a fixed precision in O(m^2 N^{1.5}) time.21 The runtime for CoSaMP is much better than these interior-point methods.
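For comparison, the program in Equation 6 is easy to prototype with a general-purpose convex solver. The sketch below uses the cvxpy package (our choice for illustration; it is not the specialized interior-point code cited above), with the noise level ε supplied by the caller.

import cvxpy as cp

def l1_recover(Phi, u, eps):
    # Solve  min ||z||_1  subject to  ||Phi z - u||_2 <= eps  (Equation 6).
    N = Phi.shape[1]
    z = cp.Variable(N)
    problem = cp.Problem(cp.Minimize(cp.norm1(z)),
                         [cp.norm(Phi @ z - u, 2) <= eps])
    problem.solve()
    return z.value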

Greedy algorithms such as OMP,24 StOMP,12 and ROMP20 are iterative methods for approximating a signal from compressive samples. In each iteration, they use a greedy rule to identify new elements in the support of the signal. These methods provide fast runtimes. On the other hand, to the extent that their behavior is understood, greedy algorithms require more measurements or produce worse errors than the convex optimization Equation 6.

Tropp and Gilbert proposed the greedy algorithm orthogonal matching pursuit (OMP) for signal recovery.24 OMP is similar to CoSaMP in that it uses a signal proxy to select large components of the target signal, but it selects one such component at each iteration. However, it does not provide the same uniform guarantees as convex relaxation, and it is unknown whether it succeeds in the noisy setting.

Other algorithms based on OMP have been formulated, such as regularized OMP, or ROMP,20 developed by Needell and Vershynin. ROMP is noteworthy because the authors establish that, under the restricted isometry property, the algorithm can approximately recover compressible signals from noisy samples. The error bound and measurement requirements provided by these results are analogous to those of the convex optimization method, modulo parasitic logarithmic terms. Our results for CoSaMP remove all the extraneous factors, so the performance of CoSaMP is essentially optimal.

Finally, we mention that there is a class of algorithms for signal recovery that have sublinear (in the signal dimension) runtime. Some examples are the Fourier sampling algorithm (FS) of Gilbert et al.,16 chaining pursuit,14 and HHS pursuit.15 These methods are very fast, but they require highly structured measurements.

Table 2 summarizes the behavior of the above algorithms in terms of the following five criteria.

General samples. Does the algorithm succeed for a variety of sampling schemes, or does it require structured samples? The designation "RIP" implies that a bound on a restricted isometry constant suffices. "Gauss" means that the algorithm succeeds when the sampling matrix Φ has iid subgaussian entries.

Optimal number of samples. Can the algorithm reconstruct s-sparse signals from a near-optimal number of measurements, m = O(s log N)? Or are its sampling requirements higher (by logarithmic factors)?

Uniformity. Given a fixed sampling matrix, does the algorithm recover all signals? Or do the results hold with high probability only for each fixed signal?

Stability. Does the algorithm succeed when the signal is compressible (but not sparse) and the samples are noisy?

Running time. In the worst case (without fast multiplies), how long does it take the algorithm to recover a real-valued s-sparse signal within a fixed relative precision, given a sampling matrix with no special structure? The designation LP(N, m) represents the cost of solving a linear program with N variables and m constraints (O(m^2 N^{1.5}) using an interior-point method).

Under all of these metrics, CoSaMP achieves the best performance out of the linear and superlinear methods. Of course, CoSaMP is slower than the sublinear algorithms, but it allows more general sampling matrices and demands fewer samples, which makes it more adaptable to practical applications. CoSaMP delivers the same guarantees as the best convex optimization approaches as well as rigorous bounds on computational cost and storage that are comparable with faster greedy methods. Thus, CoSaMP is essentially optimal in every important respect.


5. Proof of Results

The CoSaMP algorithm uses an approach inspired by the restricted isometry property to identify the largest s components of the target signal. Owing to the restricted isometry property, each set of s components of the proxy vector y = Φ*Φx approximates the energy in the corresponding s components of x. Since the samples are of the form u = Φx, we can obtain the proxy by applying the matrix Φ* to the samples u. Once the set T of significant locations is identified, the signal coefficients can be recovered using the pseudoinverse ΦT†.

The algorithm repeats this idea at each iteration, updating the samples to reflect the residual (the part of the signal yet to be approximated). We use least squares to estimate the signal on this support set and repeat this process until the recoverable energy in the signal has been found.

We now outline the proof of Theorem 2.

5.1. Consequences of the RIP

We begin with some important consequences of the restricted isometry property. Omitted proofs appear in Needell and Tropp.19

PROPOSITION 1 (CONSEQUENCES). Suppose Φ has restricted isometry constant δr. Let T be a set of r indices or fewer. Then

$$\|\Phi_T^* u\|_2 \le \sqrt{1 + \delta_r}\;\|u\|_2, \qquad \|\Phi_T^\dagger u\|_2 \le \frac{\|u\|_2}{\sqrt{1 - \delta_r}},$$
$$(1 \mp \delta_r)\,\|x\|_2 \le \|\Phi_T^* \Phi_T\, x\|_2 \le (1 \pm \delta_r)\,\|x\|_2, \qquad \frac{\|x\|_2}{1 \pm \delta_r} \le \|(\Phi_T^* \Phi_T)^{-1} x\|_2 \le \frac{\|x\|_2}{1 \mp \delta_r},$$

where the last two statements contain an upper and lower bound, depending on the sign chosen.

A corollary of these bounds is the fact that disjoint sets of columns from the sampling matrix span nearly orthogonal subspaces.

PROPOSITION 2 (ORTHOGONALITY). Suppose Φ has restricted isometry constant δr. Let S and T be disjoint sets of indices whose combined cardinality does not exceed r. Then

$$\|\Phi_S^*\, \Phi_T\| \le \delta_r.$$

We apply this result through a corollary.

COROLLARY 1. Suppose Φ has restricted isometry constant δr. Let T be a set of indices, and let x be a vector. Provided that r ≥ |T ∪ supp(x)|,

$$\|\Phi_T^*\, \Phi\, x|_{T^c}\|_2 \le \delta_r\, \|x|_{T^c}\|_2.$$

The last consequence of the restricted isometry property that we will need measures how much the sampling matrix inflates nonsparse vectors. Its proof uses arguments from functional analysis.

PROPOSITION 3 (ENERGY BOUND). Suppose Φ verifies the upper inequality of (3), viz.

$$\|\Phi x\|_2 \le \sqrt{1 + \delta_r}\,\|x\|_2 \quad \text{whenever } \|x\|_0 \le r.$$

Then, for every signal x,

$$\|\Phi x\|_2 \le \sqrt{1 + \delta_r}\left[\|x\|_2 + \frac{1}{\sqrt{r}}\,\|x\|_1\right].$$

5.2. Iteration invariant: sparse case

In proving Theorem 2, we first operate under the assumption that the signal x is exactly s-sparse. We remove this assumption later.

THEOREM 3 (SPARSE ERROR REDUCTION). Assume x is s-sparse. For each k ≥ 0, the signal approximation ak is s-sparse, and

$$\|x - a^{k+1}\|_2 \le 0.5\,\|x - a^k\|_2 + 7.5\,\|e\|_2.$$

In particular,

$$\|x - a^k\|_2 \le 2^{-k}\,\|x\|_2 + 15\,\|e\|_2.$$

REMARK 3. This bound assumes the least-squares step in the algorithm is performed without error. In Section 5 of Needell and Tropp,19 we study how many iterations of a least-squares solver are needed to make the resulting error negligible. We show that, if we use the conjugate gradient method for the estimation step of CoSaMP and initialize it with the current approximation ak−1, then Theorem 3 still holds.

The proof of the iteration invariant, Theorem 3, involves a sequence of short lemmas, one for each step in the algorithm. We fix an iteration k, and let a = ak−1 be the signal approximation at the beginning of the iteration. We define the residual as r = x − a, which must be 2s-sparse since both a and x are s-sparse. We also observe that the vector v of updated samples can be interpreted as noisy samples of the residual:

$$v = u - \Phi a = \Phi(x - a) + e = \Phi r + e.$$

The first step in the algorithm is the identification step, in which a set of components is found corresponding to locations where the residual signal has a lot of energy. This is summarized by the following lemma which is proven in Needell and Tropp.19

LEMMA 1 (IDENTIFICATION). The set W = supp(y2s), where y = Φ*v is the signal proxy, contains at most 2s indices, and

$$\|r|_{W^c}\|_2 \le 0.2223\,\|r\|_2 + 2.34\,\|e\|_2.$$

Next, the algorithm merges the support of the current estimate a with the new selection of components. The following simple lemma shows that components of the signal x outside this set have small energy.

LEMMA 2 (SUPPORT MERGER). Let W be a set of at most 2s indices. The set T = W ∪ supp(a) contains at most 3s indices, and

$$\|x|_{T^c}\|_2 \le \|r|_{W^c}\|_2.$$

The estimation step in the algorithm obtains values for coefficients in the set T by solving a least-squares problem. The next result bounds the error of this approximation.

LEMMA 3 (ESTIMATION19). Let T be a set of at most 3s indices, and define the least-squares signal estimate b by the formulae

$$b|_T = \Phi_T^\dagger u, \qquad b|_{T^c} = 0,$$

where u = Φx + e. Then

$$\|x - b\|_2 \le 1.112\,\|x|_{T^c}\|_2 + 1.06\,\|e\|_2.$$

Proof. Note first that

$$\|x - b\|_2 \le \|x|_{T^c}\|_2 + \|x|_T - b|_T\|_2.$$

Using the expression u = Φx + e and the fact that ΦT†ΦT = IT, we calculate that

$$\|x|_T - b|_T\|_2 = \left\|x|_T - \Phi_T^\dagger(\Phi x + e)\right\|_2 = \left\|\Phi_T^\dagger\left(\Phi\, x|_{T^c} + e\right)\right\|_2 \le \|\Phi_T^\dagger\, \Phi\, x|_{T^c}\|_2 + \|\Phi_T^\dagger e\|_2.$$

The cardinality of T is at most 3s, and x is s-sparse, so Proposition 1 and Corollary 1 imply that

$$\|\Phi_T^\dagger\, \Phi\, x|_{T^c}\|_2 \le \frac{\delta_{4s}}{1 - \delta_{3s}}\,\|x|_{T^c}\|_2 \qquad \text{and} \qquad \|\Phi_T^\dagger e\|_2 \le \frac{\|e\|_2}{\sqrt{1 - \delta_{3s}}}.$$

Combine the bounds to reach

$$\|x - b\|_2 \le \left(1 + \frac{\delta_{4s}}{1 - \delta_{3s}}\right)\|x|_{T^c}\|_2 + \frac{\|e\|_2}{\sqrt{1 - \delta_{3s}}}.$$

Finally, invoke the hypothesis that δ3s ≤ δ4s ≤ 0.1.

Lastly, the algorithm prunes the intermediate estimate b to its largest s terms. The triangle inequality bounds the error in this pruned approximation.

LEMMA 4 (PRUNING19). The pruned approximation bs satisfies

$$\|x - b_s\|_2 \le 2\,\|x - b\|_2.$$

At the end of the iteration, the new approximation is formed by setting ak = bs. The sequence of lemmas above allows us to prove the iteration invariant for the sparse case, Theorem 3. Indeed, we have:

$$\begin{aligned}
\|x - a^k\|_2 &= \|x - b_s\|_2 \le 2\,\|x - b\|_2 \\
&\le 2\left(1.112\,\|x|_{T^c}\|_2 + 1.06\,\|e\|_2\right) \\
&\le 2.224\,\|r|_{W^c}\|_2 + 2.12\,\|e\|_2 \\
&\le 2.224\left(0.2223\,\|r\|_2 + 2.34\,\|e\|_2\right) + 2.12\,\|e\|_2 \\
&\le 0.5\,\|r\|_2 + 7.5\,\|e\|_2 = 0.5\,\|x - a^{k-1}\|_2 + 7.5\,\|e\|_2.
\end{aligned}$$

To obtain the second bound in Theorem 3, we solve the error recursion and observe that

$$\|x - a^k\|_2 \le 2^{-k}\,\|x - a^0\|_2 + 7.5\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right)\|e\|_2 \le 2^{-k}\,\|x\|_2 + 15\,\|e\|_2.$$

This point completes the argument.

5.3. Extension to general signals

We now turn to the proof of the main result for CoSaMP, Theorem A. We must remove the assumption that the signal x is sparse. Although this job seems difficult at first, it turns out to have an elegant solution. We can view the noisy samples of a generic signal as samples of a sparse signal corrupted by additional noise that reflects the tail of the signal. We have the following reduction to the sparse case, of which Theorem 2 is a consequence.

LEMMA 5 (REDUCTION TO SPARSE CASE19). Let x be an arbitrary vector in CN. The sample vector u = Φx + e can also be expressed as u = Φxs + ẽ, where

$$\|\tilde{e}\|_2 \le 1.05\left[\|x - x_s\|_2 + \frac{1}{\sqrt{s}}\,\|x - x_s\|_1\right] + \|e\|_2.$$

This reduction will now allow us to prove the iteration invariant for general signals, Theorem 2. Let x be a general signal. By Lemma 5, we can write the noisy vector of samples as u = Φxs + ẽ. Applying the sparse iteration invariant, Theorem 3, we obtain

$$\|x_s - a^k\|_2 \le 2^{-k}\,\|x_s\|_2 + 15\,\|\tilde{e}\|_2.$$

We then invoke the lower and upper triangle inequalities to obtain

$$\|x - a^k\|_2 \le \|x - x_s\|_2 + \|x_s - a^k\|_2 \le \|x - x_s\|_2 + 2^{-k}\,\|x\|_2 + 15\,\|\tilde{e}\|_2.$$

Finally, recalling the estimate for ||ẽ||2 from Lemma 5 and simplifying, we reach

$$\|x - a^k\|_2 \le 2^{-k}\,\|x\|_2 + 20\,\nu,$$

where ν is the unrecoverable energy (Equation 4).


References

1. Candès, E., Romberg, J., Tao, T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete Fourier information. IEEE Trans. Info. Theory 52, 2 (Feb. 2006), 489–509.

2. Candès, E. J. The restricted isometry property and its implications for compressed sensing. C. R. Math. Acad. Sci. Paris, Serie I 346 (2008), U589–U592.

3. Candès, E. J., Romberg, J., Tao, T. Stable signal recovery from incomplete and inaccurate measurements. Commun. Pur. Appl. Math. 59, 8 (2006), 1207–1223.

4. Candès, E. J., Tao, T. Decoding by linear programming. IEEE Trans. Info. Theory 51 (2005), 4203–4215.

5. Candès, E. J., Tao, T. Near optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Info. Theory 52, 12 (Dec. 2006), 5406–5425.

6. Cohen, A., Dahmen, W., DeVore, R. Compressed sensing and best k-term approximation. J. Am. Math. Soc. 22, 1 (2009) 211–231.

7. Cormen, T., Leiserson, C., Rivest, R., Stein, C. Introduction to Algorithms, 2nd edition, MIT Press, Cambridge, MA, 2001.

8. Cormode, G., Muthukrishnan, S. Combinatorial algorithms for compressed sensing. Technical report, DIMACS, 2005.

9. Dai, W., Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Info. Theory 55, 5 (2009).

10. Donoho, D. L. Compressed sensing. IEEE Trans. Info. Theory 52, 4 (Apr. 2006), 1289–1306.

11. Donoho, D. L., Stark, P. B. Uncertainty principles and signal recovery. SIAM J. Appl. Math. 49, 3 (June 1989) 906–931.

12. Donoho, D. L., Tsaig, Y., Drori, I., Starck, J.-L. Sparse solution of underdetermined linear equations by stagewise Orthogonal Matching Pursuit (StOMP). Submitted for publication (2007)

13. Foucart, S. A note on guaranteed sparse recovery via ℓ1-minimization. Appl. Comput. Harmon. Anal. (2010). To appear.

14. Gilbert, A., Strauss, M., Tropp, J., Vershynin, R. Algorithmic linear dimension reduction in the ℓ1 norm for sparse vectors. In Proceedings of the 44th Annual Allerton Conference on Communication, Control and Computing (Sept. 2006).

15. Gilbert, A., Strauss, M., Tropp, J., Vershynin, R. One sketch for all: fast algorithms for compressed sensing. In Proceedings of the 39th ACM Symposium. Theory of Computing (San Diego, June 2007).

16. Gilbert, A. C., Guha, S., Indyk, P., Muthukrishnan, S., Strauss, M. J. Near-optimal sparse Fourier representations via sampling. In Proceedings of the 2002 ACM Symposium on Theory of Computing STOC (2002), 152–161.

17. Kim, S.-J., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D. An interior-point method for large-scale ℓ1-regularized least squares. IEEE J. Sel. Top. Signa. 1, 4 (2007), 606–617.

18. Needell, D., Tropp, J. A. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. ACM Technical Report 2008–01, California Institute of Technology, Pasadena, July 2008.

19. Needell, D., Tropp, J. A. CoSaMP: Iterative signal recovery from noisy samples. Appl. Comput. Harmon. Anal. 26, 3 (2008), 301–321.

20. Needell, D., Vershynin, R. Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE J. Sel. Top. Signa. (2007). To appear.

21. Nesterov, Y. E., Nemirovski, A. S. Interior Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, 1994.

22. Paige, C. C., Saunders, M. A. LSQR: An algorithm for sparse linear equations and sparse least squares. TOMS 8, 1 (1982), 43–71.

23. Rudelson, M., Vershynin, R. Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements. In Proceedings of the 40th Annual Conference on Info. Sciences and Systems, Princeton, Mar. 2006.

24. Tropp, J. A., Gilbert, A. C. Signal recovery from random measurements via Orthogonal Matching Pursuit. IEEE Trans. Info. Theory 53, 12 (2007), 4655–4666.


Authors

Deanna Needell ([email protected]), Stanford University, Stanford, CA.

Joel A. Tropp ([email protected]), California Institute of Technology, Pasadena, CA.


Footnotes

The original version of this paper appeared in Applied and Computational Harmonic Analysis 26, 3 (2008), 301–321.

DOI: http://doi.acm.org/10.1145/1859204.1859229


Figures

Algorithm 1. CoSaMP Recovery Algorithm


Tables

Table 1. Operation count for CoSaMP

Table 2. Comparison of several signal recovery algorithms



©2010 ACM  0001-0782/10/1200  $10.00



 
