Researchers at the State University of New York at Buffalo and India's International Institute of Information Technology Bangalore have developed an algorithm to convert old newspapers into searchable data by identifying and ranking people's names in order of their importance.
The researchers worked with the New York Public Library to analyze over 14,000 articles from The Sun published in November and December of 1894.
The algorithm relies exclusively on attributes drawn from text produced by optical character recognition (OCR) software, such as the context in which a name appears, any title preceding it, the length of the article, and how often the name is mentioned in the article.
Because the OCR text was garbled, the researchers modeled the attributes statistically, and tested the algorithm on raw OCR-generated text and articles cleaned up manually by schoolchildren.
They found that the algorithm ranked names with high precision even on the raw OCR text, when its results were compared with those on the cleaned-up versions.
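As a rough illustration of how attributes like these might feed a ranking, the sketch below scores each candidate name by its mention count, whether a title precedes it, and the length of the article it appears in. It is a minimal sketch under stated assumptions, not the researchers' method: the title list, the weights, and the function names `score_name` and `rank_names` are all hypothetical, and the simple weighted sum stands in for the statistical modeling the researchers used to cope with garbled OCR text.

```python
import math
import re

# Hypothetical title list; the source does not say which titles were used.
TITLES = ("Mr.", "Mrs.", "Dr.", "Gov.", "Senator", "Judge")

# Hypothetical weights, chosen only for illustration.
W_MENTIONS = 1.0
W_TITLE = 2.0
W_LENGTH = 0.5


def score_name(name, article_text):
    """Score one name within one article from three simple attributes."""
    # Count whole-word mentions of the name.
    pattern = r"\b" + re.escape(name) + r"\b"
    mentions = len(re.findall(pattern, article_text))
    if mentions == 0:
        return 0.0

    # Check whether any known title directly precedes the name.
    titled = any(f"{title} {name}" in article_text for title in TITLES)

    # Use the log of the word count so long articles do not dominate.
    word_count = len(article_text.split())

    return (
        W_MENTIONS * mentions
        + W_TITLE * (1.0 if titled else 0.0)
        + W_LENGTH * math.log1p(word_count)
    )


def rank_names(articles, candidates):
    """Rank candidate names by their total score across all articles."""
    totals = {
        name: sum(score_name(name, text) for text in articles)
        for name in candidates
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample_articles = [
        "Gov. Morton spoke at length. Morton then left the hall with Platt.",
        "Platt arrived late.",
    ]
    for name, score in rank_names(sample_articles, ["Morton", "Platt"]):
        print(f"{name}: {score:.2f}")
```

Exact string matching like this would miss names that OCR has misspelled, which is the problem the statistical modeling described above is meant to address.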
From University at Buffalo News Center