
Communications of the ACM

ICT Results

Access to Data of the Past



A software infrastructure is under construction that will allow scientific data from long-completed projects to be accessed and understood, while also making current digital research easier to share and use worldwide.

Since the 1970s, spacecraft have made vast numbers of readings on their travels and sent that data back to Earth. But what happens to the data when the mission ends and the software and tacit knowledge needed to interpret it are no longer available?

Generally, little money is set aside for long-term data preservation. Magnetic tapes full of valuable information have ended up sitting on shelves. A huge amount of data from the arts and humanities, as well as from scientific research, is becoming inaccessible or unusable ever more quickly.

Researchers from across Europe on the CASPAR project set out to find secure, reliable and cost-effective ways to ensure that digitally encoded information remains usable for an indefinite time period. CASPAR stands for Cultural, Artistic, and Scientific knowledge for Preservation, Access, and Retrieval. The methodologies and tools developed during the project matter for more than providing access to data from the past, suggests David Giaretta, CASPAR project coordinator and a researcher at the STFC's Rutherford Appleton Laboratory.

"The techniques that you need to preserve old digital objects—techniques that make unfamiliar digital objects usable—are exactly the same techniques you need to make newly created digital objects accessible and understandable," he says.

If the e-science concept of research facilities sharing computational processing and data collections across the Internet is to be fully realized, it will require a CASPAR-style infrastructure.

Indeed, CASPAR infrastructure will put data into a context so that it can be interpreted or understood by 'designated communities'—defined by those who are responsible for the data. For example, the infrastructure may inform us that long lists of numbers are actually calls made from a telephone over a certain period. Learning this would provide most of us with no useful information. However, for 'designated communities' such as the police investigating a crime or the telephone company's invoicing department, understanding that the numbers are telephone calls may be very valuable knowledge indeed.
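
The telephone example can be made concrete with a small sketch. It pairs a raw list of numbers with a description of what the numbers mean and who they are meant for, loosely following the OAIS idea that data plus representation information yields an information object. This is an illustration only, not CASPAR's software or API: the class names, field names, and sample values below are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationInfo:
    """What the raw values mean and who they are for (hypothetical fields)."""

    meaning: str
    designated_communities: tuple[str, ...]


@dataclass(frozen=True)
class InformationObject:
    """Raw data bundled with the context needed to interpret it."""

    data: tuple[str, ...]
    rep_info: RepresentationInfo

    def describe(self) -> str:
        # Turn an opaque list of values into a statement a reader can use.
        return f"{len(self.data)} values, each one {self.rep_info.meaning}"

    def is_relevant_to(self, community: str) -> bool:
        # The archive, not the reader, decides who the data is meant for.
        return community in self.rep_info.designated_communities


# Made-up sample values: without context these are just digits.
raw_numbers = ("01235550101", "01235550147", "01235550199")

call_log = InformationObject(
    data=raw_numbers,
    rep_info=RepresentationInfo(
        meaning="a telephone number called from one phone",
        designated_communities=("police investigators", "invoicing department"),
    ),
)

print(call_log.describe())
print(call_log.is_relevant_to("police investigators"))
print(call_log.is_relevant_to("general public"))
```

Keeping the description in a separate object from the data means the same stored numbers can later be given a revised or richer description without being rewritten, which is the property a long-lived archive needs.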

Driving Standardization and Change

Because the infrastructure developed by the EU-funded CASPAR project is a pioneering implementation of the Open Archival Information System (OAIS, ISO 14721), an ISO standard reference model for digital preservation, its influence will be felt right across the digital preservation industry. The purpose of OAIS is to increase awareness and understanding of concepts relevant to archiving digital objects, especially among non-archival institutions. It defines terminology and concepts for describing and comparing data models and archival architectures.

In fact, CASPAR's implementation of OAIS defines the methodology and infrastructure for digital preservation across Europe. It guarantees not only understandability but also the protection of digital rights as well as the authenticity of the information preserved.

CASPAR produced eleven reusable infrastructure components and toolkits to support digital preservation: registry, knowledge management, orchestration, representation information, preservation datastore, data access and security, digital rights management, finding aids, virtualization, packaging, and authenticity.

All components are independent of one another and offer web-based (and other) services. That makes the system robust because there is no single point of failure. CASPAR is an open system able to interoperate with the many different commercial digital preservation solutions on the market.

E-Science Infrastructure

"Over the next five years or so we expect to see those CASPAR components . . . integrated into the broader e-science infrastructure that is being created in Europe," Giaretta says.

"That is why it was so important that CASPAR tools could cope with all types of data, and were tested using cultural and performing arts as well as science data. There are a number of tools and toolkits within CASPAR that are closely tied to specific domains, but there are also elements that are discipline-independent, as you would expect with infrastructure.

"We expect an evolution in the use of the domain-specific tools while other parts will be made even more robust and scalable as they move over into the broader infrastructure across Europe," Giaretta concludes.

The CASPAR project received funding from the ICT strand of the Sixth Framework Program for research.

CASPAR's software releases are available online at the CASPAR website and at Sourceforge.net's digital preservation services site.

From ICT Results

 


 
