The National Archives and Records Administration enlisted the Texas Advanced Computing Center (TACC) to find innovative and scalable solutions for large-scale electronic records collections. Archivists try to determine the organization, contents, and characteristics of collections so they can describe them for public access. To support that work, the TACC researchers developed a multi-pronged approach that combines different data analysis methods into a visualization framework.
The TACC team adapted a treemap visualization technique to render additional information dimensions, such as technical metadata, file format correlations and preservation risk levels. The renderings are specifically designed to suit the archivist's need to compare different groups of electronic records. The researchers also developed an analysis method that combines string alignment algorithms with natural language processing methods, which will help archivists determine how a group of records is organized.
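As a rough illustration of how alignment can hint at the way a group of records is organized, the sketch below tokenizes two file names and aligns the token sequences, replacing the parts that differ with a wildcard. It is only a sketch under stated assumptions: the abstract does not say which algorithms the TACC team used, so Python's standard difflib.SequenceMatcher stands in for the string alignment step, a simple regular-expression tokenizer stands in for the language processing step, and the function names and file names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def name_tokens(filename):
    """Split a file name into lowercase alphanumeric tokens (a stand-in tokenizer)."""
    return re.findall(r"[a-z0-9]+", filename.lower())


def shared_template(name_a, name_b):
    """Align two token sequences and mark every differing region with '*'.

    Assumption: difflib's matcher is used here for illustration only;
    it is not the alignment algorithm described in the article.
    """
    a, b = name_tokens(name_a), name_tokens(name_b)
    template = []
    for tag, i1, i2, _j1, _j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            template.extend(a[i1:i2])
        elif not template or template[-1] != "*":
            # Collapse any run of differences into a single wildcard.
            template.append("*")
    return template


if __name__ == "__main__":
    # Hypothetical file names; the shared template suggests a naming convention.
    print(shared_template("report_2009_q1.pdf", "report_2010_q1.pdf"))
```

Records whose names reduce to the same template are candidates for belonging to the same series, which is the kind of grouping an archivist could then inspect by hand.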
The researchers are developing another analysis method that computes paragraph-to-paragraph similarity to discover stories from large collections of email messages.
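A minimal sketch of paragraph-to-paragraph comparison follows, assuming a bag-of-words cosine similarity and a fixed threshold. The abstract does not describe the similarity measure the researchers use, so the measure, the threshold value, and the function names here are illustrative assumptions rather than the team's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a paragraph and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(par_a, par_b):
    """Cosine similarity between the word-count vectors of two paragraphs."""
    va, vb = Counter(tokenize(par_a)), Counter(tokenize(par_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def similar_paragraph_pairs(messages, threshold=0.5):
    """Return (message_i, message_j, score) for similar paragraphs in different messages.

    Assumption: paragraphs are separated by blank lines, and the
    threshold of 0.5 is arbitrary.
    """
    paragraphs = [(i, p.strip())
                  for i, msg in enumerate(messages)
                  for p in re.split(r"\n\s*\n", msg) if p.strip()]
    pairs = []
    for x in range(len(paragraphs)):
        for y in range(x + 1, len(paragraphs)):
            (i, p), (j, q) = paragraphs[x], paragraphs[y]
            if i == j:
                continue  # only link paragraphs across different messages
            score = cosine_similarity(p, q)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

Messages linked by high-scoring paragraph pairs can be chained together, which is one plausible way a thread of related discussion might surface from a large mailbox. The pairwise loop is quadratic in the number of paragraphs, so a collection of real size would need indexing or sampling on top of this.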
From National Science Foundation
Abstracts Copyright © 2011 Information Inc., Bethesda, Maryland, USA