
Communications of the ACM


Generating Synthetic Data Solves Major Privacy Issues in Research


[Illustration: face and binary data. Credit: Getty Images]

Researchers at the Finnish Center for Artificial Intelligence (FCAI) have developed a machine learning-based method that generates synthetic data from original data sets. The approach could help solve the ongoing problem of data scarcity in medical research and other fields where information is sensitive.

The generated data preserves privacy while remaining similar enough to the original data to be used for statistical analyses. With the new method, researchers can run as many analyses as they need without compromising the identities of the individuals involved in the original experiment.

Researchers have produced and used synthetic data before, but the FCAI team solved a major problem with existing methods by using probabilistic modelling. This enabled them to use prior knowledge about the original data without fitting too closely to the properties of the particular data set used as the basis for the synthetic data.
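To make the idea of combining prior knowledge with data more concrete, here is a minimal sketch. It is not the FCAI team's model: it assumes a single numeric variable, a normal distribution, and a conjugate Normal-Inverse-Gamma prior, and the names `NormalInverseGammaPrior`, `posterior`, and `synthesize` are hypothetical. The prior stands in for prior knowledge, the posterior combines that knowledge with the original data, and synthetic records are sampled from the fitted model rather than copied from the original rows. On its own, this sketch provides no formal privacy guarantee; the summary above does not describe how the published method protects individuals.

```python
# Illustrative sketch only: not the FCAI method, and not formally privacy-preserving.
import math
import random
import statistics
from dataclasses import dataclass


@dataclass
class NormalInverseGammaPrior:
    """Hypothetical prior encoding domain knowledge about one numeric variable.

    m0: prior guess for the mean; k0: confidence in that guess (pseudo-observations);
    a0, b0: shape and scale of the inverse-gamma prior on the variance.
    """
    m0: float
    k0: float
    a0: float
    b0: float


def posterior(data: list[float], prior: NormalInverseGammaPrior) -> tuple[float, float, float, float]:
    """Conjugate Normal-Inverse-Gamma update; returns posterior (mn, kn, an, bn)."""
    n = len(data)
    xbar = statistics.fmean(data)
    ss = sum((x - xbar) ** 2 for x in data)
    kn = prior.k0 + n
    mn = (prior.k0 * prior.m0 + n * xbar) / kn
    an = prior.a0 + n / 2
    bn = prior.b0 + 0.5 * ss + prior.k0 * n * (xbar - prior.m0) ** 2 / (2 * kn)
    return mn, kn, an, bn


def synthesize(data: list[float], prior: NormalInverseGammaPrior, m: int, rng: random.Random) -> list[float]:
    """Draw model parameters from the posterior, then draw m synthetic records."""
    mn, kn, an, bn = posterior(data, prior)
    # sigma^2 ~ InvGamma(an, bn): invert a Gamma(shape=an, rate=bn) draw.
    sigma2 = 1.0 / rng.gammavariate(an, 1.0 / bn)
    # mu | sigma^2 ~ Normal(mn, sigma^2 / kn)
    mu = rng.gauss(mn, math.sqrt(sigma2 / kn))
    # Synthetic records come from the sampled model, not from the original rows.
    return [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(m)]


# Usage: simulated stand-in for a sensitive measurement (no real data involved).
rng = random.Random(42)
original = [rng.gauss(120.0, 15.0) for _ in range(200)]
prior = NormalInverseGammaPrior(m0=120.0, k0=1.0, a0=2.0, b0=200.0)
synthetic = synthesize(original, prior, m=1000, rng=rng)
print(round(statistics.fmean(original), 1), round(statistics.fmean(synthetic), 1))
```

Raising `k0` pulls the synthetic data toward the prior guess `m0`; with a weak prior, the synthetic data mostly follows the mean and spread of the original sample, which is what lets simple statistical analyses on the synthetic records resemble those on the original.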

The work is described in "Privacy-Preserving Data Sharing via Probabilistic Modelling," presented at the virtual 180th Meeting of the Acoustical Society of America.

From Aalto University

