The Research archive provides access to all Research articles published in past issues of Communications of the ACM.
"WINOGRANDE" explores new methods of dataset development and adversarial filtering, expressly designed to keep AI systems from appearing to smash through benchmarks without making real progress.
"PlanAlyzer," by Emma Tosch et al., describes PlanAlyzer, the first tool to statically check the validity of online experiments.
We present the first approach for checking the internal validity of online experiments statically, that is, from code alone.
We introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original Winograd Schema Challenge, but adjusted to improve both the scale and the hardness of the dataset.