Aalto University's Mari-Sanna Paukkeri has devised computational methods for processing and analyzing online text regardless of its language or domain.
The methods employ algorithms that sift through textual data sets for statistical dependencies and structures, from which specific text properties can be extracted. Paukkeri's research focuses on applying unsupervised machine learning to natural-language processing. Such techniques do not require manual pre-processing of the data set. Instead, the algorithms are left to learn on their own the nature of the data and the types of statistical dependencies and structures it contains.
Paukkeri describes one method, Likey, that is applied to keyphrase and keyword extraction from text documents in 11 languages. Likey determines how common individual words and sequences of two, three, and four words are in a data set. The keywords and keyphrases for a specific document are then selected according to their frequency and context within the text. Paukkeri notes that methods that process textual data in every working language can be especially beneficial for companies with a global reach.
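As a rough illustration of the frequency-based idea described above, the following Python sketch counts word sequences of length one to four in a reference collection and in a single document, then ranks the document's sequences by how much more frequent they are in the document than in the reference. This is not Likey's actual scoring; the function names, the whitespace tokenizer, the add-one smoothing, and the ratio-based score are all assumptions made for this example.

```python
from collections import Counter


def tokenize(text):
    # Whitespace split only, so no language-specific resources are needed.
    return text.lower().split()


def ngrams(tokens, max_n=4):
    """Yield every run of 1 to max_n consecutive tokens as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def extract_keyphrases(document, reference_texts, top_k=5, max_n=4):
    """Rank a document's n-grams by document frequency relative to a reference set.

    Hypothetical scoring for illustration only, not the published method.
    """
    ref_counts = Counter()
    for text in reference_texts:
        ref_counts.update(ngrams(tokenize(text), max_n))
    doc_counts = Counter(ngrams(tokenize(document), max_n))

    ref_total = sum(ref_counts.values())
    doc_total = sum(doc_counts.values())
    if doc_total == 0:
        return []

    scores = {}
    for gram, count in doc_counts.items():
        doc_freq = count / doc_total
        # Add-one smoothing keeps unseen n-grams from dividing by zero.
        ref_freq = (ref_counts[gram] + 1) / (ref_total + 1)
        scores[gram] = doc_freq / ref_freq

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return [" ".join(g) for g in ranked[:top_k]]


if __name__ == "__main__":
    reference = ["the cat sat on the mat", "the dog sat on the log"]
    document = "the neural network learns the neural network"
    print(extract_keyphrases(document, reference))
```

Because the sketch relies only on whitespace tokenization and counting, it needs no dictionary or grammar for any particular language, which is the property the abstract emphasizes. A sequence that is common everywhere, such as "the" in the usage example, scores low, while sequences that recur in the document but are rare in the reference collection rise to the top.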
From Aalto University
Abstracts Copyright © 2012 Information Inc., Bethesda, Maryland, USA