acm-header
Sign In

Communications of the ACM

China Region special section: Big trends

The Practice of Speech and Language Processing in China


mouth projecting speech signals, illustration

Credit: Wired

Although great progress has been made in automatic speech recognition (ASR), significant performance degradation still exists in very noisy environments. Over the past few years, Chinese startup AISpeech has been developing very deep convolutional neural networks (VDCNN),21 a new architecture the company recently began applying to ASR use cases.

Different than traditional deep CNN models for computer vision, VDCNN features novel filter designs, pooling operations, input feature map selection, and padding strategies, all of which lead to more accurate and robust ASR performance. Moreover, VDCNN is further extended with adaptation, which can significantly alleviate the mismatch between training and testing. Factor-aware training and cluster-adaptive training are explored to fully utilize the environmental variety and quickly adapt model parameters. With this newly proposed approach, ASR systems can improve the system robustness and accuracy, even in under very noisy and complex conditions.1


 

No entries found

Log in to Read the Full Article

Sign In

Sign in using your ACM Web Account username and password to access premium content if you are an ACM member, Communications subscriber or Digital Library subscriber.

Need Access?

Please select one of the options below for access to premium content and features.

Create a Web Account

If you are already an ACM member, Communications subscriber, or Digital Library subscriber, please set up a web account to access premium content on this site.

Join the ACM

Become a member to take full advantage of ACM's outstanding computing information resources, networking opportunities, and other benefits.
  

Subscribe to Communications of the ACM Magazine

Get full access to 50+ years of CACM content and receive the print version of the magazine monthly.

Purchase the Article

Non-members can purchase this article or a copy of the magazine in which it appears.
Sign In for Full Access
» Forgot Password? » Create an ACM Web Account