"A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services" presents a research deployment of Field Programmable Gate Arrays (FPGAs) in a Microsoft... James C. Hoe From Communications of the ACM | November 2016
We describe a medium-scale deployment of a composable, reconfigurable hardware fabric on a bed of 1,632 servers, and measure its effectiveness in accelerating the... Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, Doug Burger From Communications of the ACM | November 2016
"DianNao Family: Energy-Efficient Hardware Accelerators for Machine Learning" shows a deep understanding of both neural net implementations and the issues in computer... Kurt Keutzer From Communications of the ACM | November 2016
We introduce a series of hardware accelerators (i.e., the DianNao family) designed for Machine Learning (especially neural networks), with a special emphasis on... Yunji Chen, Tianshi Chen, Zhiwei Xu, Ninghui Sun, Olivier Temam From Communications of the ACM | November 2016
"Verifying Quantitative Reliability for Programs that Execute on Unreliable Hardware" by Carbin et al. addresses challenges related to a bug, how likely it is to... Todd Millstein From Communications of the ACM | August 2016
In "Probabilistic Theorem Proving," Gogate and Domingos suggest how PTP could be turned into a fast approximate algorithm by sampling from the set of children of... Henry Kautz, Parag Singla From Communications of the ACM | July 2016
Many representation schemes combining first-order logic and probability have been proposed in recent years. We propose the first method that has the full power... Vibhav Gogate, Pedro Domingos From Communications of the ACM | July 2016
"Bare-Metal Performance for Virtual Machines with Exitless Interrupts" shows how to enable a virtual machine to attain "bare metal" performance from high-speed... Steve Hand From Communications of the ACM | January 2016
We present ExitLess Interrupts (ELI), a software-only approach for handling interrupts within guest virtual machines directly and securely. Nadav Amit, Abel Gordon, Nadav Har'El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster, Dan Tsafrir From Communications of the ACM | January 2016
"Software Dataplane Verification" takes existing static checking of networks to a new level by checking the real code in the forwarding path of a Click router using... George Varghese From Communications of the ACM | November 2015
We present the result of working iteratively on two tasks: designing a domain-specific verification tool for packet-processing software, while trying to identify... Mihai Dobrescu, Katerina Argyraki From Communications of the ACM | November 2015
Specialization improves energy-efficiency in computing but only makes economic sense if there is significant demand. A balance can often be found by designing... Trevor Mudge From Communications of the ACM | April 2015
We present the Convolution Engine (CE), a programmable processor specialized for the convolution-like data-flow prevalent in computational photography, computer... Wajahat Qadeer, Rehan Hameed, Ofer Shacham, Preethi Venkatesan, Christos Kozyrakis, Mark Horowitz From Communications of the ACM | April 2015
"Neural Acceleration for General-Purpose Approximate Programs" demonstrates the significant advantages in cost, power, and latency through approximate computing... Ravi Nair From Communications of the ACM | January 2015
This paper describes a new approach that uses machine learning-based transformations to accelerate approximation-tolerant programs. Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, Doug Burger From Communications of the ACM | January 2015
As GPUs have become mainstream parallel processing engines, many applications targeting GPUs now have data locality more amenable to traditional caching. The... Stephen W. Keckler From Communications of the ACM | December 2014
This paper studies the effect of accelerating highly parallel workloads with significant locality on a massively multithreaded GPU. Timothy G. Rogers, Mike O'Connor, Tor M. Aamodt From Communications of the ACM | December 2014
"Dissection: A New Paradigm for Solving Bicomposite Search Problems," by Itai Dinur, Orr Dunkelman, Nathan Keller, and Adi Shamir, presents an elegant new algorithm... Bart Preneel From Communications of the ACM | October 2014
In this paper, we introduce the new notion of bicomposite search problems, and show that they can be solved with improved combinations of time and space complexities... Itai Dinur, Orr Dunkelman, Nathan Keller, Adi Shamir From Communications of the ACM | October 2014
Having multiple Wi-Fi Access Points with an overlapping coverage area operating on the same frequency may not be a problem anymore. Konstantina (Dina) Papagiannaki From Communications of the ACM | July 2014