"Automatically Accelerating Non-Numerical Programs by Architecture-Compiler Co-Design," by Simone Campanoni, et al., proposes a modest hardware extension to support...James Larus From Communications of the ACM | December 2017
We describe a medium-scale deployment of a composable, reconfigurable hardware fabric on a bed of 1,632 servers, and measure its effectiveness in accelerating the...
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, Doug Burger | From Communications of the ACM | November 2016
"Efficient Parallelization Using Rank Convergence in Dynamic Programming Algorithms" shows how some instances of dynamic programming can be effectively parallelized...James Larus From Communications of the ACM | October 2016
"Can Traditional Programming Bridge the Ninja Performance Gap for Parallel Computing Applications" advocates an appealing division of labor between a developer...James Larus From Communications of the ACM | May 2015