
Communications of the ACM

ACM TechNews

Computational Text Analysis Made Possible Regardless of Language or Domain




Aalto University's Mari-Sanna Paukkeri has devised computational methods for processing and analyzing online text regardless of its language or domain.

The methods employ algorithms that search textual data sets for statistical dependencies and structures, from which specific text properties can be extracted. Paukkeri's research focuses on applying unsupervised machine learning to natural-language processing. Such techniques require no manual pre-processing of the data set; instead, the algorithms learn on their own what the data looks like and which statistical dependencies and structures it contains.

Paukkeri describes one method, Likey, which extracts keyphrases and keywords from text documents in 11 languages. Likey determines how common individual words and sequences of two, three, and four words are in a data set. The keywords and keyphrases for a specific document are then determined according to their frequency and context within the text. Paukkeri notes that methods able to process textual data in all of a company's working languages can be especially beneficial for companies with a global reach.
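
The general idea of frequency-based keyphrase extraction can be illustrated with a minimal Python sketch. It is not Paukkeri's Likey implementation: the function names, the regular-expression tokenizer, and the smoothed frequency-ratio score are all assumptions made for illustration. The sketch counts sequences of one to four words in a reference data set and in a single document, then ranks the document's phrases by how much more frequent they are in the document than in the reference data.

```python
from collections import Counter
import re


def ngrams(text, max_n=4):
    """Yield every run of 1 to max_n consecutive lowercase word tokens."""
    # Hypothetical tokenizer: runs of word characters, no language-specific rules.
    tokens = re.findall(r"\w+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def extract_keyphrases(document, reference_corpus, top_k=5, max_n=4):
    """Rank a document's n-grams by document frequency relative to
    reference-corpus frequency (an illustrative score, not Likey's)."""
    doc_counts = Counter(ngrams(document, max_n))
    ref_counts = Counter()
    for text in reference_corpus:
        ref_counts.update(ngrams(text, max_n))

    doc_total = sum(doc_counts.values())
    ref_total = sum(ref_counts.values())

    scores = {}
    for phrase, count in doc_counts.items():
        doc_freq = count / doc_total
        # Add-one smoothing keeps the score finite for phrases
        # that never occur in the reference corpus.
        ref_freq = (ref_counts[phrase] + 1) / (ref_total + 1)
        scores[phrase] = doc_freq / ref_freq

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


if __name__ == "__main__":
    reference = ["the cat sat on the mat", "the dog sat on the log"]
    print(extract_keyphrases("the robot sat on the robot", reference, top_k=2))
```

Because the sketch relies only on counts and not on a dictionary or grammar, the same code can be pointed at text in another language, provided a suitable reference data set in that language is available and the simple tokenizer is adequate for its script.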

From Aalto University 

Abstracts Copyright © 2012 Information Inc., Bethesda, Maryland, USA 


 
