Communications of the ACM

ACM Careers

Generating Synthetic Data Solves Major Privacy Issues in Research

By Aalto University
June 8, 2021

Researchers at the Finnish Center for Artificial Intelligence (FCAI) have developed a machine-learning-based method that generates synthetic data from original data sets. The approach could solve the ongoing problem of data scarcity in medical research and other fields where information is sensitive.

The generated data preserves privacy while remaining similar enough to the original data to be used for statistical analyses. With the new method, researchers can conduct an unlimited number of analyses without compromising the identities of the individuals involved in the original experiment.

Researchers have produced and used synthetic data before, but the FCAI team solved a major problem with existing methods by making use of probabilistic modelling. This enabled them to incorporate prior knowledge about the original data without the synthetic data reproducing the properties of the particular data set it was generated from too closely.
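
The article does not spell out the algorithm, so the following is only a rough, hypothetical sketch of the general pattern behind model-based synthetic data, not the FCAI team's method. It fits a probabilistic model whose prior encodes prior knowledge, adds calibrated noise so the fitted model does not depend too closely on any single record, and then samples new records from the model alone. It assumes binary attributes modelled independently with Beta-Bernoulli distributions and uses the Laplace mechanism from differential privacy as the noise step; the function names, parameters, and data are illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace(0, scale) distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def fit_private_model(records, epsilon, prior_a=1.0, prior_b=1.0, rng=None):
    """Hypothetical helper: fit independent Beta-Bernoulli models, one per
    binary attribute, under epsilon-differential privacy.

    Assumes a non-empty list of equal-length 0/1 records and treats the
    record count n as public. Replacing one record changes each of the d
    attribute counts by at most 1 (L1 sensitivity d), so Laplace noise with
    scale d / epsilon on every count gives epsilon-DP for the whole model.
    """
    rng = rng if rng is not None else random.Random()
    n, d = len(records), len(records[0])
    scale = d / epsilon
    probs = []
    for j in range(d):
        ones = sum(r[j] for r in records)
        noisy = ones + laplace_noise(scale, rng)
        noisy = min(max(noisy, 0.0), float(n))  # clamp to the valid range
        # Beta(prior_a, prior_b) encodes prior knowledge; plugging the noisy
        # count into the posterior mean is a simplification that ignores
        # the noise distribution.
        probs.append((noisy + prior_a) / (n + prior_a + prior_b))
    return probs


def sample_synthetic(probs, m, rng=None):
    """Draw m synthetic records from the fitted model only; the original
    records are never accessed here."""
    rng = rng if rng is not None else random.Random()
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "sensitive" data: 500 records with 3 binary attributes.
    original = [[int(rng.random() < p) for p in (0.2, 0.5, 0.8)]
                for _ in range(500)]
    model = fit_private_model(original, epsilon=1.0, rng=rng)
    synthetic = sample_synthetic(model, 1000, rng=rng)
    # Analyses run on `synthetic` spend no additional privacy budget.
    for j in range(3):
        rate = sum(r[j] for r in synthetic) / len(synthetic)
        print(f"attribute {j}: synthetic rate {rate:.3f}")
```

Because the synthetic records are drawn only from the noisy model, any number of analyses on them uses no additional privacy budget (the post-processing property of differential privacy). Modelling attributes independently is a deliberate simplification: it discards correlations between attributes, which a realistic model would need to capture.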

The work is described in "Privacy-Preserving Data Sharing via Probabilistic Modelling," published in the journal Patterns.

From Aalto University