The In Codice Ratio project uses artificial intelligence and optical character recognition (OCR) software to mine the Vatican Secret Archives and make its documents available for the first time.
Traditional OCR breaks words into letter-images by looking for the spaces between letters, then compares each letter-image with the bank of letters in its memory. After deciding which letter best matches an image, the software encodes that letter as machine-readable text, which makes the document searchable.
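As a rough illustration of that pipeline, the sketch below splits a tiny binary image at blank columns and matches each piece against a bank of letter templates. It is a toy under stated assumptions, not the software any real OCR product uses: the names `split_at_blank_columns`, `best_match`, `recognize`, and the three-letter `TEMPLATES` bank are hypothetical, and pixel-by-pixel agreement stands in for whatever matching a real system performs.

```python
# Toy sketch of space-based OCR. All names and templates here are
# hypothetical; images are lists of rows holding 0 (blank) or 1 (ink).

def split_at_blank_columns(image):
    """Cut the image into letter-images wherever a column has no ink."""
    width = len(image[0])
    ink = [any(row[x] for row in image) for x in range(width)]
    glyphs, start = [], None
    # The trailing False closes a glyph that touches the right edge.
    for x, has_ink in enumerate(ink + [False]):
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            glyphs.append([row[start:x] for row in image])
            start = None
    return glyphs

def best_match(glyph, templates):
    """Return the label whose template agrees with the glyph on most pixels."""
    def score(template):
        # Templates of a different size cannot match in this toy version.
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            return -1
        return sum(a == b
                   for t_row, g_row in zip(template, glyph)
                   for a, b in zip(t_row, g_row))
    return max(templates, key=lambda label: score(templates[label]))

# Assumed bank of letters, three pixels tall.
TEMPLATES = {
    "I": [[1], [1], [1]],
    "L": [[1, 0], [1, 0], [1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

def recognize(image, templates=TEMPLATES):
    """Segment at blank columns, then label each letter-image."""
    return "".join(best_match(g, templates)
                   for g in split_at_blank_columns(image))

word = [
    [1, 0, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1, 0],
]
print(recognize(word))
```

The segmentation step depends entirely on finding a blank column between neighbouring letters, which is the assumption that joined-up handwriting breaks.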
This approach works poorly on handwritten text, so In Codice Ratio uses jigsaw segmentation to circumvent the problem by breaking words down into something closer to individual pen strokes. The OCR splits each word into a series of vertical and horizontal bands and looks for local minima, then carves the letters at these joints and chunks the pieces together to produce possible letters.
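The idea can be sketched in a few lines, with heavy simplifications. The code below is not the project's implementation: it assumes the local minima are taken over ink density, it uses vertical bands only (one per pixel column), and the names `column_profile`, `local_minima`, `cut_strokes`, and `candidate_letters`, along with the `max_pieces` limit, are hypothetical.

```python
# Simplified sketch of stroke-level segmentation. Assumptions: minima are
# measured on ink density, only vertical bands are used, and all function
# names are made up for illustration.

def column_profile(image):
    """Count the ink pixels in each column of a 0/1 image."""
    return [sum(row[x] for row in image) for x in range(len(image[0]))]

def local_minima(profile):
    """Interior indices lower than the left neighbour and no higher than
    the right one, so a flat valley yields a single cut at its left end."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]]

def cut_strokes(image):
    """Carve the word at each local minimum; return (start, end) column spans."""
    bounds = [0] + local_minima(column_profile(image)) + [len(image[0])]
    return list(zip(bounds, bounds[1:]))

def candidate_letters(pieces, max_pieces=3):
    """Chunk runs of up to max_pieces adjacent pieces into possible letters."""
    candidates = []
    for i in range(len(pieces)):
        for j in range(i, min(i + max_pieces, len(pieces))):
            candidates.append((pieces[i][0], pieces[j][1]))
    return candidates

# A connected "word": ink never drops to zero, so blank-column splitting
# would find no boundaries at all.
word = [
    [1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 0, 1],
]
pieces = cut_strokes(word)
print(pieces)
print(candidate_letters(pieces, max_pieces=2))
```

Each candidate span would then be passed to a letter classifier, and the most plausible sequence of candidates kept; that step is outside this sketch.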
Applying common-sense training to the OCR helped further refine the software's deciphering ability.
From The Atlantic
Abstracts Copyright © 2018 Information Inc., Bethesda, Maryland, USA