Communications of the ACM

ACM TechNews

Forget the Needle, Consider the Haystack: Uncovering Hidden Structures in Massive Data Collections

By Princeton University
November 4, 2013

Princeton University doctoral student Prem Gopalan (left) and associate professor of computer science David Blei (right) have developed a method to identify groups within vast sets of data.

Credit: Frank Wojciechowski / Princeton University

Princeton University computer scientists have developed a method for analyzing big data that uses a mathematical technique to estimate the probability of a pattern repeating throughout a data subset. The researchers say their method significantly reduces the time required to uncover patterns in large data collections such as social networks, enabling researchers to pinpoint links between seemingly unrelated groups.

"The data we are interested in are graphs of networks like friends on Facebook or lists of academic citations," says Princeton professor David Blei.

The researchers developed an algorithm that analyzes a subset of a large database and estimates the likelihood that each node belongs to various groups in the database. They then created an adjustable matrix that takes in the subset's analysis and assigns weights to each data point based on its probability of belonging to different groups.
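The idea of updating a matrix of group weights from small samples of a network can be illustrated with a toy sketch. This is not the researchers' algorithm: the update rule, the step-size schedule, and every name below (`update_memberships`, `num_groups`, `shared`) are assumptions chosen for illustration. Each step samples a single link, estimates which group the two linked nodes most plausibly share, and nudges both nodes' weight rows toward that estimate.

```python
import random


def update_memberships(edges, num_nodes, num_groups, steps=2000, seed=0):
    """Toy heuristic: refine a node-by-group weight matrix from sampled links."""
    rng = random.Random(seed)

    # Start every node with a random, strictly positive, normalized weight row.
    weights = []
    for _ in range(num_nodes):
        row = [rng.random() + 0.1 for _ in range(num_groups)]
        total = sum(row)
        weights.append([w / total for w in row])

    for t in range(steps):
        # Look at one randomly chosen link instead of the whole network.
        i, j = rng.choice(edges)

        # Estimated probability that both endpoints belong to each group.
        joint = [a * b for a, b in zip(weights[i], weights[j])]
        total = sum(joint)
        shared = [p / total for p in joint]

        # Decaying step size (an assumed schedule): early samples move the
        # weights a lot, later samples only fine-tune them.
        rho = (t + 2) ** -0.7
        for node in (i, j):
            weights[node] = [
                (1 - rho) * w + rho * s for w, s in zip(weights[node], shared)
            ]

    return weights


# Hypothetical six-node network made of two separate triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
weights = update_memberships(edges, num_nodes=6, num_groups=2)
for node, row in enumerate(weights):
    print(node, [round(w, 3) for w in row])
```

Because each update is a weighted average of two probability vectors, every row of the matrix stays a valid set of group weights that sums to one. The sketch only pulls linked nodes toward agreement; a real model would also need to account for pairs of nodes that are not linked.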

The research builds on stochastic optimization, a technique that extracts a central pattern from seemingly random data. Blei compares the technique to navigating from New York to Los Angeles by asking random people for directions, which would eventually succeed given the right questions and interpretations. The researchers used the method to find patterns in connections between patents using public data from the U.S. National Bureau of Economic Research.
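A minimal sketch of the general principle behind stochastic optimization, assuming the simplest possible setting: estimating a hidden central value from noisy answers. The function name `stochastic_estimate`, the hidden value of 5.0, and the noise level are all hypothetical and do not come from the research. Each noisy answer nudges the current estimate, and the nudges shrink over time, much like correcting course a little after each set of directions.

```python
import random


def stochastic_estimate(draw_sample, steps):
    """Robbins-Monro style iteration with step size 1/t."""
    estimate = 0.0
    for t in range(1, steps + 1):
        step_size = 1.0 / t
        # Move the estimate part of the way toward the latest noisy sample.
        estimate += step_size * (draw_sample() - estimate)
    return estimate


# Noisy observations scattered around a hidden value of 5.0 (illustrative).
rng = random.Random(42)
samples = [rng.gauss(5.0, 2.0) for _ in range(10000)]
stream = iter(samples)

estimate = stochastic_estimate(lambda: next(stream), len(samples))
print(round(estimate, 2))
```

With a step size of 1/t, this iteration reproduces the running average of the samples seen so far, so no single answer has to be accurate for the estimate to settle near the underlying value.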

From Princeton University