
Coalition to Build Data Set for Deepfake Detection Challenge


[Illustration: deepfake technology]

Data sets and benchmarks have been some of the most effective tools to speed progress in AI. The current renaissance in deep learning has been fueled in part by the ImageNet benchmark. Recent advances in natural language processing have been hastened by the GLUE and SuperGLUE benchmarks.

"Deepfake" techniques, which present realistic AI-generated videos of real people doing and saying fictional things, have significant implications for determining the legitimacy of information presented online. Yet the industry doesn't have a great data set or benchmark for detecting them. To catalyze more research and development in this area and ensure that there are better open source tools to detect deepfakes, Facebook CTO Mike Schroepfer announced that Facebook, the Partnership on AI, Microsoft, and academics from Cornell Tech, MIT, University of Oxford, UC Berkeley, University of Maryland, College Park, and University at Albany-SUNY are coming together to build the Deepfake Detection Challenge.

The goal of the challenge is to produce technology that everyone can use to better detect when AI has been used to alter a video in order to mislead the viewer, Schroepfer says. The Deepfake Detection Challenge will include a data set and leaderboard, as well as grants and awards, to spur the industry to create new ways of detecting and preventing media manipulated via AI from being used to mislead others. The governance of the challenge will be facilitated and overseen by the Partnership on AI's new Steering Committee on AI and Media Integrity, which is made up of a broad cross-sector coalition of organizations including Facebook, WITNESS, Microsoft, and others in civil society and the technology, media, and academic communities.

It's important to have data that is freely available for the community to use, with clearly consenting participants, and few restrictions on usage, Schroepfer says. That's why Facebook is commissioning a realistic data set that will use paid actors, with the required consent obtained, to contribute to the challenge. No Facebook user data will be used in this data set. Facebook is also funding research collaborations and prizes for the challenge to help encourage more participation. In total, it is dedicating more than $10 million to fund this industry-wide effort.

To ensure the quality of the data set and challenge parameters, the data will initially be tested through a targeted technical working session this October at ICCV 2019, the International Conference on Computer Vision. The full data set release and the Deepfake Detection Challenge launch will happen at NeurIPS 2019, the 33rd Conference on Neural Information Processing Systems, this December. Facebook will also enter the challenge but not accept any financial prize. Regular updates will be available on the Deepfake Detection Challenge website.

"This is a constantly evolving problem, much like spam or other adversarial challenges, and our hope is that by helping the industry and AI community come together we can make faster progress," Schroepfer says.

Outside experts shared their perspectives on the project.

"In order to move from the information age to the knowledge age, we must do better in distinguishing the real from the fake, reward trusted content over untrusted content, and educate the next generation to be better digital citizens," says Professor Hany Farid, Professor in the Department of Electrical Engineering & Computer Science and the School of Information, UC Berkeley. "This will require investments across the board, including in industry/university/NGO research efforts to develop and operationalize technology that can quickly and accurately determine which content is authentic."

"People have manipulated images for almost as long as photography has existed. But it's now possible for almost anyone to create and pass off fakes to a mass audience," says Antonio Torralba, Professor of Electrical Engineering & Computer Science and Director of the MIT Quest for Intelligence. "The goal of this competition is to build AI systems that can detect the slight imperfections in a doctored image and expose its fraudulent representation of reality."

"As we live in the multimedia age, having information with integrity is crucial to our lives. Given the recent developments in being able to generate manipulated information (text, images, videos, and audio) at scale, we need the full involvement of the research community in an open environment to develop methods and systems that can detect and mitigate the ill effects of manipulated multimedia," says Professor Rama Chellappa, Distinguished University Professor and Minta Martin Professor of Engineering, University of Maryland. "By making available a large corpus of genuine and manipulated media, the proposed challenge will excite and enable the research community to collectively address this looming crisis."

"To effectively drive change and solve problems, we believe it's critical for academia and industry to come together in an open and collaborative environment. At Cornell Tech, our research is centered around bridging that gap and addressing technology's societal impact in the digital age, and the Deepfake Detection Challenge is a perfect example of this," says Serge Belongie, Associate Dean and Professor, Cornell Tech. "Working with tech industry leaders and academic colleagues, we are developing a comprehensive data source that will enable us to identify fake media and ultimately lead to building tools and solutions to combat it. We're proud to be a part of this group and to share the data source with the public, allowing anyone to learn from and expand upon this research."

"Manipulated media being put out on the Internet, to create bogus conspiracy theories and to manipulate people for political gain, is becoming an issue of global importance, as it is a fundamental threat to democracy, and hence freedom. I believe we urgently need new tools to detect and characterize this misinformation, so I am happy to be part of an initiative that seeks to mobilize the research community around these goals — both to preserve the truth whilst pushing the frontiers of science.," says Professor Philip H. S. Torr, Department of Engineering Science, University of Oxford.

"Although deepfakes may look realistic, the fact that they are generated from an algorithm instead of real events captured by camera means they can still be detected and their provenance verified," says Professor Siwei Lyu, College of Engineering and Applied Sciences, University at Albany-SUNY. "Several promising new methods for spotting and mitigate the harmful effects of deepfakes are coming on stream, including procedures for adding 'digital fingerprints' to video footage to help verify its authenticity. As with any complex problem, it needs a joint effort from the technical community, government agencies, media, platform companies, and every online users to combat their negative impact."

"Technology to manipulate images is advancing faster than our ability to tell what's real from what's been faked. A problem as big as this won't be solved by one person alone. Open competitions like this one spur innovation by focusing the world's collective brainpower on a seemingly impossible goal," says Phillip Isola, Assistant Professor of Electrical Engineering and Computer Science, MIT.


 
