"A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services" presents a research deployment of Field Programmable Gate Arrays (FPGAs) in a Microsoft...James C. Hoe From Communications of the ACM | November 2016
We describe a medium-scale deployment of a composable, reconfigurable hardware fabric on a bed of 1,632 servers, and measure its effectiveness in accelerating the... Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, Doug Burger From Communications of the ACM | November 2016
"DianNao Family: Energy-Efficient Hardware Accelerators for Machine Learning" shows a deep understanding of both neural net implementations and the issues in computer... Kurt Keutzer From Communications of the ACM | November 2016
We introduce a series of hardware accelerators (i.e., the DianNao family) designed for Machine Learning (especially neural networks), with a special emphasis on... Yunji Chen, Tianshi Chen, Zhiwei Xu, Ninghui Sun, Olivier Temam From Communications of the ACM | November 2016
As "Jupiter Rising" makes clear, many of the Internet mechanisms for maintaining large-scale networks are suboptimal when the datacenter is largely homogeneous,... Andrew W. Moore From Communications of the ACM | September 2016
We present our approach for overcoming the cost, operational complexity, and limited scale endemic to datacenter networks a decade ago. Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Hong Liu, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, Amin Vahdat From Communications of the ACM | September 2016
In "Probabilistic Theorem Proving," Gogate and Domingos suggest how PTP could be turned into a fast approximate algorithm by sampling from the set of children of... Henry Kautz, Parag Singla From Communications of the ACM | July 2016
Many representation schemes combining first-order logic and probability have been proposed in recent years. We propose the first method that has the full power... Vibhav Gogate, Pedro Domingos From Communications of the ACM | July 2016
In "Learning to Name Objects," the authors offer a method to determine a basic-level category name for an object in an image. David Forsyth From Communications of the ACM | March 2016
This paper looks at the problem of predicting category labels that mimic how human observers would name objects. Vicente Ordonez, Wei Liu, Jia Deng, Yejin Choi, Alexander C. Berg, Tamara L. Berg From Communications of the ACM | March 2016
"Bare-Metal Performance for Virtual Machines with Exitless Interrupts" shows how to enable a virtual machine to attain "bare metal" performance from high-speed... Steve Hand From Communications of the ACM | January 2016
We present ExitLess Interrupts (ELI), a software-only approach for handling interrupts within guest virtual machines directly and securely. Nadav Amit, Abel Gordon, Nadav Har'El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster, Dan Tsafrir From Communications of the ACM | January 2016
"Software Dataplane Verification" takes existing static checking of networks to a new level by checking the real code in the forwarding path of a Click router using... George Varghese From Communications of the ACM | November 2015
We present the result of working iteratively on two tasks: designing a domain-specific verification tool for packet-processing software, while trying to identify... Mihai Dobrescu, Katerina Argyraki From Communications of the ACM | November 2015
Specialization improves energy-efficiency in computing but only makes economic sense if there is significant demand. A balance can often be found by designing... Trevor Mudge From Communications of the ACM | April 2015
We present the Convolution Engine (CE), a programmable processor specialized for the convolution-like data-flow prevalent in computational photography, computer... Wajahat Qadeer, Rehan Hameed, Ofer Shacham, Preethi Venkatesan, Christos Kozyrakis, Mark Horowitz From Communications of the ACM | April 2015
"Neural Acceleration for General-Purpose Approximate Programs" demonstrates the significant advantages in cost, power, and latency through approximate computing... Ravi Nair From Communications of the ACM | January 2015
This paper describes a new approach that uses machine learning-based transformations to accelerate approximation-tolerant programs. Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, Doug Burger From Communications of the ACM | January 2015
As GPUs have become mainstream parallel processing engines, many applications targeting GPUs now have data locality more amenable to traditional caching. The... Stephen W. Keckler From Communications of the ACM | December 2014
This paper studies the effect of accelerating highly parallel workloads with significant locality on a massively multithreaded GPU. Timothy G. Rogers, Mike O'Connor, Tor M. Aamodt From Communications of the ACM | December 2014