
Communications of the ACM

ACM TechNews

Framework Brings Accuracy, Efficiency to Identifying Stop Words




Researchers at Northwestern University's McCormick School of Engineering have developed an algorithmic approach for data analysis that automatically recognizes uninformative words, known as stop words, in a large collection of text.

This development could save substantial time during natural language processing and reduce the technology's energy footprint.

The researchers used information theory to develop a model that more accurately and efficiently identifies stop words.

The model relies on a "conditional entropy" metric that measures a given word's certainty of being informative.
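This summary does not give the researchers' formula, so the following is only a minimal sketch of one way a conditional-entropy score could be computed, not the team's actual model. It assumes the quantity of interest is the entropy of the document distribution given a word, H(document | word), and that a word spread evenly across many documents (high entropy) is a likelier stop-word candidate than one concentrated in a few documents (low entropy). The function name, the whitespace tokenizer, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy_by_word(docs):
    """Return H(document | word) in bits for each word in the corpus.

    Assumption: a word whose occurrences are spread evenly over many
    documents says little about which document it came from, so a high
    score marks it as a stop-word candidate.
    """
    # word -> Counter mapping document index to occurrence count
    counts = defaultdict(Counter)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():  # naive whitespace tokenizer
            counts[word][doc_id] += 1

    entropies = {}
    for word, per_doc in counts.items():
        total = sum(per_doc.values())
        # Shannon entropy of p(document | word), written as p * log2(1/p)
        entropies[word] = sum(
            (n / total) * math.log2(total / n) for n in per_doc.values()
        )
    return entropies


# Toy corpus (hypothetical): "the" appears in every document.
docs = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "the bird sang",
]
scores = conditional_entropy_by_word(docs)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # the highest-entropy word, i.e. the top stop-word candidate
```

On this toy corpus every word except "the" occurs in a single document and scores zero, so a simple threshold would flag only "the". A real pipeline would need proper tokenization and a principled way to choose that threshold, neither of which this summary specifies.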

The team tested the model by comparing its performance to common topic modeling methods, which infer the words most related to a given topic by comparing them to other text in the data set.

The new method produced improved accuracy and reproducibility across the texts measured, and was also more applicable to other languages.

From Northwestern University McCormick School of Engineering

 

Abstracts Copyright © 2019 SmithBucklin, Washington, DC, USA


 
