Researchers at Northwestern University's McCormick School of Engineering have developed an algorithmic approach for data analysis that automatically recognizes uninformative words, known as stop words, in a large collection of text.
This development could dramatically save time during natural language processing and reduce the technology's energy footprint.
The researchers used information theory to develop a model that more accurately and efficiently identifies stop words.
The model relies on a "conditional entropy" metric that quantifies how likely a given word is to be informative.
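The article does not spell out the exact formulation, but a minimal sketch of one way an entropy-based score could flag stop-word candidates is shown below; the toy corpus, the normalization, and the function conditional_entropy_score are illustrative assumptions, not the team's published metric.

```python
import math

def conditional_entropy_score(docs, word):
    """Normalized entropy of a word's distribution across documents.

    A word spread evenly over many documents (high entropy) says little
    about any particular document and is a stop-word candidate; a word
    concentrated in a few documents (low entropy) is more informative.
    This is an illustrative stand-in for the researchers' metric.
    """
    counts = [doc.count(word) for doc in docs]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Normalize by the maximum possible entropy so scores are comparable
    # across words that appear in different numbers of documents.
    max_entropy = math.log2(len(docs))
    return entropy / max_entropy if max_entropy > 0 else 0.0

# Toy corpus of tokenized documents (hypothetical example data).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stock prices fell on the news".split(),
]
vocab = {w for doc in docs for w in doc}

# Words with the highest normalized entropy are the least informative,
# so they rank first as stop-word candidates.
ranking = sorted(vocab, key=lambda w: conditional_entropy_score(docs, w), reverse=True)
print(ranking[:5])
```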
The team tested the model by comparing its performance to common topic modeling methods, which infer the words most related to a given topic by comparing them to other text in the data set. The new approach produced improved accuracy and reproducibility across the texts measured, while also being more applicable to other languages.
From Northwestern University McCormick School of Engineering