
Communications of the ACM

ACM TechNews

Deep Learning Turns Mono Recordings Into Immersive Sound


Converting monaural sounds into spatialized sounds.
Credit: Samuel Dixon/Unsplash

Researchers at the University of Texas and Facebook Research have taught an artificial intelligence (AI) system to convert ordinary monaural sounds into nearly three-dimensional sounds.

The researchers trained the AI to pick up visual cues that indicate a sound's originating direction; given a video of a scene and a mono recording, the machine learning system infers that direction and adjusts the interaural time and level differences to recreate the spatial effect for the listener.

The researchers said, "We call the resulting output 2.5D visual sound—the visual stream helps 'lift' the flat single channel audio into spatialized sound."
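The interaural cues the system manipulates can be illustrated with a simple hand-crafted renderer. The sketch below (an assumption for illustration, not the paper's learned model, which predicts the full binaural waveform from video) delays and attenuates the two ear channels for a source at a given azimuth, using Woodworth's spherical-head approximation for the time difference and sine-law panning for the level difference:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, nominal human head radius (assumed value)

def spatialize(mono, sample_rate, azimuth_deg):
    """Crude binaural rendering of a mono signal.

    Applies an interaural time difference (ITD) and interaural level
    difference (ILD) for a source at `azimuth_deg` (0 = straight ahead,
    positive = to the listener's right). Illustrative only.
    """
    az = np.deg2rad(azimuth_deg)
    # Woodworth's spherical-head ITD approximation.
    itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (az + np.sin(az))
    delay = int(round(abs(itd) * sample_rate))
    # Sine-law panning for the level difference.
    gain_r = np.sqrt(0.5 * (1.0 + np.sin(az)))
    gain_l = np.sqrt(0.5 * (1.0 - np.sin(az)))
    left = mono * gain_l
    right = mono * gain_r
    if itd > 0:    # source on the right: the left ear hears it later
        left = np.concatenate([np.zeros(delay), left])[:len(mono)]
    elif itd < 0:  # source on the left: the right ear hears it later
        right = np.concatenate([np.zeros(delay), right])[:len(mono)]
    return np.stack([left, right])
```

The learned system replaces the fixed geometry here with direction estimates inferred from the video frames, so the cues track moving sources automatically.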

The AI was trained on a database of about 2,000 binaural recordings of musical clips captured on video.

The researchers said their next step is "to explore ways to incorporate object localization and motion, and explicitly model scene sounds."

From Technology Review

 

Abstracts Copyright © 2019 SmithBucklin, Washington, DC, USA


