
Communications of the ACM

ACM TechNews

Algorithm Aims to Combat Science's Reproducibility Problem



University of Pennsylvania researchers are developing new data-mining tools designed to make it easier to know which information is relevant, and when a correlation that appears to have predictive value is in fact only the result of random chance. The researchers say the tools provide a method for testing multiple hypotheses on the same data without compromising the statistical guarantee that the conclusions are valid. The method also could increase the power of analyses done on smaller datasets by identifying when researchers are at risk of arriving at a "false discovery."
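As a hypothetical illustration of the problem described above (not an example from the article): if an analyst screens many candidate features against a target, some will correlate with it purely by chance on a small sample. The sample sizes, feature count, and 0.3 cutoff below are arbitrary choices for the sketch.

```python
import numpy as np

# 50 samples, 1,000 candidate features, and a label that is pure noise:
# no feature has any real predictive value.
rng = np.random.default_rng(42)
n_samples, n_features = 50, 1000
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# Pearson correlation of each feature with the label.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (y - y.mean()) / y.std()
r = (Xz * yz[:, None]).mean(axis=0)

# Several features will look "significantly" correlated anyway.
print("chance correlations above 0.3:", (np.abs(r) > 0.3).sum())
```

On a run like this, a handful of the 1,000 noise features typically clear the 0.3 bar, which is exactly the kind of "false discovery" the Penn tools aim to flag.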

"One thing you could do is get a totally new set of data for every time you test a hypothesis that is based on something you've tested in the past, but that means the amount of data you need to conduct your analysis is going to grow proportionally to the number of hypotheses you are testing," says Pennsylvania professor Aaron Roth.

The researchers say they developed a "reusable holdout" tool that lets scientists query the holdout set through a "differentially private" algorithm instead of testing each hypothesis on it directly. Viewed through this differentially private lens, any finding that relies on idiosyncratic outliers of a given dataset disappears.
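A minimal sketch of how such a mechanism can work, assuming a simple noisy-threshold rule: answer each query from the training set when training and holdout agree, and only touch the holdout (with added noise) when they disagree. The function name, threshold, and noise scale below are illustrative assumptions, not the researchers' exact algorithm or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def reusable_holdout_query(train_vals, holdout_vals,
                           threshold=0.04, sigma=0.01):
    """Answer one query against a reusable holdout set.

    train_vals / holdout_vals: the query evaluated on each sample of the
    training and holdout sets (values in [0, 1]).
    """
    train_mean = float(np.mean(train_vals))
    holdout_mean = float(np.mean(holdout_vals))
    # If the two sets agree (up to a noisy threshold), answer from the
    # training set alone: the analyst learns nothing new about the holdout.
    if abs(train_mean - holdout_mean) < threshold + rng.laplace(0, sigma):
        return train_mean
    # Otherwise answer from the holdout, masked with Laplace noise, so the
    # response reveals little about any individual holdout sample.
    return holdout_mean + rng.laplace(0, sigma)
```

Because answers either ignore the holdout or perturb it with noise, a finding driven by idiosyncratic holdout outliers cannot survive repeated querying, which is the intuition behind testing many adaptive hypotheses on the same data.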

From University of Pennsylvania
View Full Article

 

Abstracts Copyright © 2015 Information Inc., Bethesda, Maryland, USA


 
