You are given a large set of data values and are asked to compress, clean, recover, recognize, and/or predict it. Sound familiar? This is a list of fundamental scientific tasks encountered often in our daily lives. Indeed, this is the essence of the fields of signal and image processing, machine learning, data mining, and statistics in general. Common to these and many other tasks is a heavy reliance on models. It all starts with our expectation that the given data is not an arbitrary set of randomly chosen numbers; rather, there are dependencies and guidelines the data follows, implying that the true dimensionality of the data is far smaller than that of the embedding space.
Knowing these rules enables the solution of all the tasks mentioned here, which raises the question: How would we know these rules to begin with? The bad news is that we will probably never know them exactly. The good news is that we can approximate them using a model. A model is a flexible mathematical construction that is assumed to describe the data reasonably well. The better the model, the better the treatment the data gets. The progress made through the years in processing data is a reflection of the progress in the chosen models and their sophistication.
In the past decade, much effort has been devoted to the study of one specific and universal model that is based on sparsity.1 In a nutshell, this model assumes the data can be described by a matrix (the dictionary) multiplying a sparse vector (the representation). Extensive work in the past decade provided solid theoretical foundations for this model, studied numerical schemes for using it in practice, and explored various applications where state-of-the-art results are obtained.
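As a minimal illustration of this model (my own sketch, not taken from the paper), the signal is synthesized as a dictionary matrix multiplying a representation vector that has only a few nonzero entries; the dictionary size, sparsity level, and random construction below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, s = 64, 128, 5              # signal dimension, dictionary atoms, sparsity
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)    # normalize the atoms (columns) of the dictionary

# A sparse representation: only s of the m coefficients are nonzero.
a = np.zeros(m)
support = rng.choice(m, size=s, replace=False)
a[support] = rng.standard_normal(s)

x = D @ a  # the signal lives in R^n but has only s degrees of freedom
```

Although x has n = 64 coordinates, it is fully determined by the s = 5 active atoms and their coefficients, which is the sense in which the model captures low intrinsic dimensionality.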
The following paper by Deanna Needell and Joel Tropp belongs to this field, applying this very model to a task called Compressive Sampling (CoSa). In CoSa, the aim is to "sample" the given data by multiplying it by a matrix that projects it to a (much) lower dimension. This way, a few projections are expected to characterize the complete signal, yielding a compression effect. The research in CoSa concentrates on the choice of the projections to apply, on algorithms for recovering the data from the projections, and, most important of all, on deriving bounds on the number of projections required to enable recovery of the original data.
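Concretely, the sampling step is just a matrix-vector product. A small sketch, assuming a random Gaussian projection matrix (a standard choice in the compressive sampling literature, though the dimensions here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 256, 64                                   # ambient dimension, number of projections
Phi = rng.standard_normal((k, n)) / np.sqrt(k)   # random projection ("sampling") matrix

# An s-sparse signal in R^n.
x = np.zeros(n)
x[rng.choice(n, size=8, replace=False)] = rng.standard_normal(8)

y = Phi @ x  # k << n measurements that are expected to characterize x
```

The central question the field studies is how large k must be, relative to the sparsity level, for x to be recoverable from y.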
How should the data be recovered from the projections? Owing to the sparsity of the data's representation, we should seek the signal with the sparsest representation that is close to the given projected vector. This problem is known to be NP-hard, and the literature offers a series of "pursuit" algorithms for its approximation. One of the greatest surprises in this field is a set of theoretical guarantees for the success of those pursuit methods, essentially claiming that for a sufficiently sparse representation, the approximate solution is not far from the ideal one.
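To make the pursuit idea concrete, here is a sketch of one representative greedy method, orthogonal matching pursuit (OMP); this is my own illustrative implementation of a classical pursuit, not the algorithm proposed in the paper:

```python
import numpy as np

def omp(Phi, y, s):
    """Orthogonal matching pursuit: greedily build the support of a
    sparse solution, one atom at a time, as an approximation to the
    NP-hard sparsest-representation problem."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(s):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit the coefficients on the chosen support by least squares.
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs
    a = np.zeros(Phi.shape[1])
    a[support] = coeffs
    return a
```

Each iteration adds the atom that best explains what remains of the measurements, then re-solves a small least-squares problem; the guarantees mentioned above state conditions under which such greedy choices recover the true support.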
The authors propose a greedy iterative pursuit algorithm called CoSaMP and provide a theoretical analysis of its performance, giving such a guarantee for its successful operation. This work is unique in several important ways:
The authors' theoretical analysis introduces a brilliant and powerful language, adequate for an analysis of general greedy methods.
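The structure of a CoSaMP iteration can be sketched as follows. This is a simplified rendering of the algorithm's main steps (form a signal proxy, identify the 2s largest entries, merge supports, solve least squares, prune to s entries); the stopping rule and iteration count here are my own simplifications, and the paper should be consulted for the precise formulation:

```python
import numpy as np

def cosamp(Phi, y, s, n_iter=20):
    """Simplified sketch of a CoSaMP-style iteration, assuming
    y = Phi @ x for an (approximately) s-sparse x."""
    m = Phi.shape[1]
    a = np.zeros(m)
    residual = y.copy()
    for _ in range(n_iter):
        proxy = Phi.T @ residual                     # signal proxy from the residual
        omega = np.argsort(np.abs(proxy))[-2 * s:]   # indices of the 2s largest entries
        T = np.union1d(omega, np.flatnonzero(a))     # merge with the current support
        b, *_ = np.linalg.lstsq(Phi[:, T], y, rcond=None)  # least squares on T
        a = np.zeros(m)
        keep = np.argsort(np.abs(b))[-s:]            # prune: keep the s largest entries
        a[T[keep]] = b[keep]
        residual = y - Phi @ a                       # update the residual
        if np.linalg.norm(residual) < 1e-10:
            break
    return a
```

Unlike the one-atom-at-a-time greedy methods, each iteration revises a whole candidate support at once, which is central to the strong guarantees the authors derive.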
To conclude, CoSa, and sparse approximation in general, pose an appealing model and ways to use it successfully for various tasks. The following paper on CoSaMP is an important milestone in our journey to better understand the potential and implications of this research arena.
1. Bruckstein, A. M., Donoho, D. L., and Elad, M. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review 51, 1 (Feb. 2009), 34–81.
2. Dai, W. and Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Info. Theory 55, 5 (2009).
3. Giryes, R. and Elad, M. RIP-based near-oracle performance guarantees for subspace-pursuit, CoSaMP, and iterative hard-thresholding. Submitted to IEEE Trans. Info. Theory.
©2010 ACM 0001-0782/10/1200 $10.00