Researchers at Northwestern University's McCormick School of Engineering have developed an algorithmic approach for data analysis that automatically recognizes uninformative words, known as stop words, in a large collection of text.
This development could dramatically save time during natural language processing and reduce the technology's energy footprint.
The researchers used information theory to develop a model that more accurately and efficiently identifies stop words.
The model relies on a "conditional entropy" metric that quantifies how likely a given word is to be informative.
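The article does not spell out the exact formulation, but a minimal sketch of one way an entropy-based score could flag stop-word candidates is shown below; the toy corpus, the normalization, and the function conditional_entropy_score are illustrative assumptions, not the team's published metric.

```python
import math

def conditional_entropy_score(docs, word):
    """Normalized entropy of a word's distribution across documents.

    A word spread evenly over many documents (high entropy) says little
    about any particular document and is a stop-word candidate; a word
    concentrated in a few documents (low entropy) is more informative.
    This is an illustrative stand-in for the researchers' metric.
    """
    counts = [doc.count(word) for doc in docs]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Normalize by the maximum possible entropy so scores are comparable
    # across words that appear in different numbers of documents.
    max_entropy = math.log2(len(docs))
    return entropy / max_entropy if max_entropy > 0 else 0.0

# Toy corpus of tokenized documents (hypothetical example data).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stock prices fell on the news".split(),
]
vocab = {w for doc in docs for w in doc}

# Words with the highest normalized entropy are the least informative,
# so they rank first as stop-word candidates.
ranking = sorted(vocab, key=lambda w: conditional_entropy_score(docs, w), reverse=True)
print(ranking[:5])
```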
The team tested the model by comparing its performance to common topic modeling methods, which infer the words most related to a given topic by comparing them to other text in the data set. The new approach produced improved accuracy and reproducibility across the texts measured, while also being more applicable to other languages.
From Northwestern University McCormick School of Engineering