Metadata, the layer of data abstraction, is among the more enigmatic elements in information systems. Enigmatic, since IT professionals have vague and sometimes conflicting views of its role and value. While system administrators contend that no system can function robustly without metadata, business decision makers often fail to see its value and consequently find it difficult to justify metadata-related expenditures [5, 8]. Meanwhile, academic researchers have not examined metadata in depth, and consequently theoretical frameworks for assessing the value of metadata do not exist.
In the past, before commercial database management systems were widely adopted by organizations, metadata was a second-class citizen in the data management field [7]. Application and system developers who sought to implement metadata solutions considered it a Sisyphean¹ torture. Metadata requirements are complex and difficult to capture, implementation is demanding, the end result is rarely satisfactory, and enhancements or corrections require significant effort, as metadata layers are deeply embedded in systems. Why is implementing metadata solutions so difficult? Here, we explore this question in the context of a data warehouse, introducing the multiple elements that constitute metadata to illustrate its inherent complexity. We further explore the drawbacks of commercial off-the-shelf (COTS) products for managing metadata and the challenge of designing and implementing metadata solutions. Finally, we ask whether the difficulties introduced by metadata offset its benefits. The answer is not clear-cut, and many factors must still be examined.
Metadata is often viewed as a system's data dictionary, capturing definitions of data entities and the relationships among them. While not inaccurate, this is also an overly narrow view that overlooks the richness and complexity of metadata. Metadata has been categorized in many ways; for example, technical metadata is used for system operation and maintenance, and business metadata is used for data valuation and interpretation [5]. While metadata may be a technical necessity, recent studies of data quality [9], decision theory [1, 2] and knowledge management [6] suggest it has significant business implications as well. A second categorization differentiates back-end metadata (associated with data storage and processing) from front-end metadata (associated with information delivery and use) [8]. The table here lists several examples of metadata components based on these classifications. A third categorization is based on functionality and reflects design and maintenance characteristics abstracted by the metadata. It identifies six distinct metadata types [8]:
Recent studies [1, 2] have explored the effects of metadata (specifically, quality and process metadata) on decision-making efficiency and decision outcome. Decision makers may evaluate data quality both impartially (objectively) and contextually (subjectively) [9]. Impartial evaluation is based on the data itself, including missing records, miscalculated fields, and integrity violations. Contextual evaluation accounts for a variety of factors (such as the process that generated the data, the decision task the data is used for, and the decision maker's motivation and expertise). To support these evaluations, quality metadata includes pre-evaluated measurements (along the dimensions of data quality, such as accuracy, timeliness, and completeness) derived from the data itself, while process metadata offers a way to gauge quality within a decision context.
Providing these metadata components to decision makers has been shown to significantly improve their decision process efficiency [1], as well as the decision outcome [1, 2]. Figure 1 outlines how quality metadata can be used to enrich a report. Contextually, quality metadata plays a more significant role when the report is used to help determine a quarterly bonus (one context) and a less significant role when it is used for routine performance tracking (a different context).
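The impartial side of this evaluation can be sketched in code. The following is a minimal illustration, not drawn from the article, of how pre-evaluated quality metadata (here, completeness and timeliness) might be derived directly from the data itself and attached to a report; the `SaleRecord` entity and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class SaleRecord:
    amount: Optional[float]   # None marks a missing value
    recorded_on: date

def quality_metadata(records, as_of):
    """Derive impartial quality measures (completeness, timeliness)
    directly from the data, as pre-evaluated quality metadata."""
    complete = sum(1 for r in records if r.amount is not None)
    newest = max(r.recorded_on for r in records)
    return {
        "completeness": complete / len(records),   # share of usable records
        "timeliness_days": (as_of - newest).days,  # age of the freshest record
    }

records = [
    SaleRecord(100.0, date(2006, 1, 10)),
    SaleRecord(None,  date(2006, 1, 12)),
    SaleRecord(250.0, date(2006, 1, 15)),
]
meta = quality_metadata(records, as_of=date(2006, 1, 20))
```

Contextual evaluation, by contrast, cannot be computed from the data alone; it would draw on process metadata describing how these records were produced.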
The metadata categories (front-end/back-end, business/technical, and functional types) are not necessarily distinct and may reflect strong interdependencies. For example, source-target transformations (process metadata) must be mapped to the physical configuration (infrastructure metadata); data delivery information (interface metadata) must be tied to registered users (administration metadata); and quality (content) metadata must relate to the actual data element it describes (model metadata). A consequence of adopting a narrow view of metadata while failing to understand the relationships among metadata components is the creation of fragmented "metadata islands." Each island includes metadata of a specific functionality, unaware of and unable to communicate with other islands. Even when system designers and developers understand this complexity, implementing an integrated metadata layer is resource-intensive in terms of money, time, and managerial effort, as well as being a technical challenge.
Metadata is likely to be useful in rational, data-driven, analytical decision-making scenarios. It is not clear whether it provides similar benefits in decision processes that are more intuitive or politically charged.
Given the different types of metadata and the complexity of managing them, IT practitioners turn to COTS products for implementing data warehousing and metadata solutions [4, 10]. COTS products in this area are broadly classified into three categories. The first is data storage and management systems (such as Oracle, Sybase, MS-SQL, IBM-UDB, and Hyperion-Essbase). The second includes automated back-end data-processing utilities, commonly known as ETL products (such as Informatica, Oracle Warehouse-Builder, MS-SQL DTS, IBM Warehouse Manager, and Hummingbird). And the third includes reporting, or business intelligence (BI), utilities (such as MicroStrategy, Business Objects, and Cognos). Most leading data warehousing products provide GUI-supported utilities for metadata management.
An examination of the leading COTS products reveals an ambiguous picture. On one hand, all the vendors acknowledge the importance of metadata and embed it within their products. On the other, these products fail to address several critical issues:
The broad and complex functionality of metadata, coupled with insufficient support for metadata management from software products, poses several challenges for implementing metadata solutions. A successful implementation must also address other technical and managerial factors, including:
Interchangeable metadata formats. Metadata can be captured and represented in a variety of formats. For example, textual flat files are easy to implement and read but are less secure and do not readily support the capture of the relationships among metadata components. Relational models are easier to centralize and integrate, relatively secure, and equipped with a standard access method (SQL) and with well-defined data-administration utilities. However, relational implementation can be complex and expensive (in terms of RDBMS purchase costs and administration overhead). Graphical structures (such as entity relationship models) are more interpretable but require user training; they are also not easy to integrate with metadata in other formats. Documents allow business users to easily understand metadata and capture complex detail. On the flip side, integrating documents with other formats is difficult; documents also require significant administrative overhead. Proprietary data structures are customizable for specific organizational needs, but integrating them with standard formats is difficult.
Metadata implementation is likely to involve more than one format. Certain data entities may require abstraction in multiple formats, hence efficient interchangeability among formats is highly desirable. A common approach for achieving compatibility and interchangeability is to choose one format (typically the relational model) as the baseline for the others. Figure 2 outlines the concept of format interchangeability. For example, the Sale Transactions data in the figure is abstracted into three metadata formats (tabular, textual, and visual), each targeting a different user group.
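The baseline idea can be sketched as follows: one relational-style record serves as the authoritative form, and the other formats are rendered from it rather than maintained separately. The `Sale_Transactions` entity and its attributes below are illustrative, not taken from any real schema.

```python
# Baseline metadata in a relational-style structure; other formats
# (tabular, textual, XML) are derived views, never edited directly.
baseline = {
    "entity": "Sale_Transactions",
    "attributes": [
        {"name": "sale_id", "type": "INTEGER", "description": "Unique sale key"},
        {"name": "amount",  "type": "DECIMAL", "description": "Sale amount in USD"},
    ],
}

def to_tabular(meta):
    """Render as (name, type) rows, e.g. for a data dictionary table."""
    return [(a["name"], a["type"]) for a in meta["attributes"]]

def to_textual(meta):
    """Render as a flat-file description readable by business users."""
    lines = [f"Entity: {meta['entity']}"]
    lines += [f"  {a['name']} ({a['type']}): {a['description']}"
              for a in meta["attributes"]]
    return "\n".join(lines)

def to_xml(meta):
    """Render as a simple XML fragment for exchange with other tools."""
    attrs = "".join(f'<attribute name="{a["name"]}" type="{a["type"]}"/>'
                    for a in meta["attributes"])
    return f'<entity name="{meta["entity"]}">{attrs}</entity>'
```

Because every view is generated from the same baseline, a change to one attribute definition propagates to all formats, which is precisely what maintaining the formats independently would fail to guarantee.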
Integrating metadata. Without appropriate controls, metadata might evolve inconsistently across the enterprise as complex, isolated, nonreusable "pockets" tightly coupled with individual applications [7]. These pockets might lead to conflicting standards for business entities, prevent efficient communication among subsystems, and complicate system maintenance [5]. Metadata management is moving from managing decentralized metadata pockets to managing centralized repositories [7]. The metadata repository reflects this trend [3, 5, 10], providing enterprisewide storage of metadata that integrates all components, offers better control, and avoids metadata islands. Unfortunately, a comprehensive commercial solution for full-fledged metadata integration does not exist. A major obstacle for integration, as pointed out earlier, is the lack of standardization among COTS products that manage metadata.
Efforts to overcome metadata exchange and integration problems have been partially successful. The market, however, is still split between two competing metadata exchange standards: the Open Information Model (OIM) and the Common Warehouse Metamodel (CWM). The Meta Data Coalition, led by Microsoft, proposed OIM in 1999. At about the same time, the Object Management Group, led by Oracle, promoted CWM. Both standards allow data warehousing tools to keep a proprietary form of metadata while permitting access to it through a standard interface. However, they differ in their mix of exchangeable metadata elements and in their exchange formats and hence are not readily interchangeable.
The existence of competing standards complicates the selection of tools for a data warehouse implementation. To ease integration, it is preferable to select a mix of tools committed to the same metadata interchange standard. Alternatively, the gap between standards can be bridged through specialized metadata management tools (such as MetaCenter by Data Advantage Group, Advantage Data Transformer by Computer Associates, MetaBase by MetaMatrix, and Unicorn System by Unicorn). These tools provide a unified metadata infrastructure and broad coverage of technical and business metadata, claim vendor independence, and support multiple metadata exchange formats. XML, another emerging solution for integration, is the standard for data interchange among distributed applications. XML is used by CWM for metadata exchange, integrating both data and metadata into a single structure.
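The exchange pattern both standards share, a tool keeping proprietary metadata internally while exporting and importing a common XML form, can be sketched briefly. The element names below are illustrative only; they do not follow the actual OIM or CWM schemas.

```python
import xml.etree.ElementTree as ET

def export_metadata(entity, columns):
    """Producing tool: serialize its internal metadata to a shared
    XML exchange form (element names are hypothetical, not CWM's)."""
    root = ET.Element("metadata")
    ent = ET.SubElement(root, "entity", name=entity)
    for col, typ in columns.items():
        ET.SubElement(ent, "column", name=col, type=typ)
    return ET.tostring(root, encoding="unicode")

def import_metadata(xml_doc):
    """Consuming tool: parse the exchange form back into its own
    internal representation, here a (name, columns) pair."""
    ent = ET.fromstring(xml_doc).find("entity")
    return ent.get("name"), {c.get("name"): c.get("type")
                             for c in ent.findall("column")}

doc = export_metadata("Customers", {"cust_id": "INTEGER", "name": "VARCHAR"})
```

The round trip works only because both sides agree on the exchange schema, which is exactly why two incompatible standards split the market: a tool exporting OIM-shaped XML cannot be read by a tool expecting CWM-shaped XML.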
Design paradigms. Organizations face several choices when designing an enterprise repository. An elementary choice is from among the top-down, bottom-up, and hybrid strategies needed to capture requirements. A top-down approach would look at the entire set of organizational information systems and try to capture an overall metadata picture.³ A bottom-up approach, on the other hand, would start from the lower granularity of subsystems and bring their metadata specifications together into one unified set. While a top-down paradigm is more likely to ensure standardization and integration among subsystems, it might be infeasible in cases involving information systems with local metadata repositories already in place. Moreover, capturing metadata requirements for an entire organization is a complex and tedious task that might not be completed in a reasonable time. The bottom-up paradigm, focusing on specific systems, is more likely to achieve short-term results but may fail to satisfy broader integration needs.
Organizations may compromise by choosing a hybrid approach that still starts at a high level but focuses on the metadata modules more critical to the organization, as well as key functionality types (such as semantic layer, security configuration, information flow rules, and data quality assessment). The key modules serve as the base for a core repository that becomes a centralized metadata source for those specific functionality types. The metadata repository is not comprehensive or exhaustive to start with. Subsequent to the core implementation, initial modules may be expanded and others added incrementally as the repository grows. Compared to the top-down approach, the hybrid counterpart has the advantage of allowing faster, less-complex, less-costly implementations. On the other hand, it still provides a centralized, integrated solution to the key metadata elements.
The metadata repository architecture is another important design choice [8]. A centralized architecture, corresponding to a top-down paradigm, locates the organizational metadata repository on a centralized server, which becomes the only metadata source for all front-end and back-end utilities. Alternatively, a distributed architecture, corresponding to a bottom-up design paradigm, allows systems to maintain their own customized metadata. A hybrid architecture allows metadata to reside with applications but keeps control (and the key components) in a centralized repository.
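The hybrid architecture can be sketched as a simple resolution rule: an application holds its own local metadata but defers to the central repository for any definition it does not own. The class and attribute names below are hypothetical, chosen only to illustrate the control structure.

```python
class CentralRepository:
    """Centralized repository: the controlled source for key metadata."""
    def __init__(self):
        self._store = {}

    def register(self, key, definition):
        self._store[key] = definition

    def lookup(self, key):
        return self._store.get(key)

class ApplicationMetadata:
    """Application-resident metadata that defers to the central
    repository for any definition it does not hold locally."""
    def __init__(self, central):
        self._local = {}
        self._central = central

    def define_local(self, key, definition):
        self._local[key] = definition

    def resolve(self, key):
        # Local, application-specific metadata wins; everything else
        # is resolved against the centrally controlled repository.
        if key in self._local:
            return self._local[key]
        return self._central.lookup(key)

central = CentralRepository()
central.register("revenue", "Sum of sale amounts, net of returns")
app = ApplicationMetadata(central)
app.define_local("region", "Sales region code, app-specific")
```

A purely centralized architecture would drop the local store entirely, while a purely distributed one would drop the central fallback; the hybrid keeps both, trading some duplication for local autonomy under central control.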
The chosen design paradigm and derived architectural approach are likely to be influenced by the organizational structure and the complexity of its information systems. It is unlikely that a large organization with sophisticated information needs would adopt a top-down design for metadata and implement it in a centralized manner. Such organizations are likely to have many information systems, hence are more likely to apply a decentralized or hybrid architecture through the corresponding design paradigms. Smaller organizations, with less complex information needs, can afford the luxury of a top-down approach, aiming to capture the entire set of metadata requirements and implement a centralized architecture.
Metadata quality. The design and initial implementation of the metadata layer is only the beginning. As organizations enhance their business activities or transition to new ones, information systems, the underlying data, and metadata must all be updated accordingly. Poor-quality metadata can result not only from poor analysis of requirements or from an invalid design approach but also from the failure to detect changes in the business and reflect them in the metadata layer. With metadata being at the functional core of information systems, poor quality might in turn degrade the quality of the data, cause operational failures, and violate information security. To keep metadata functional and its quality high, organizations need to invest in its ongoing administration and maintenance. Metadata must be constantly updated to reflect evolving changes in data models, business rules, information flow, end-user configuration, and underlying technology.
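One routine maintenance check implied here, detecting drift between the registered model metadata and the data as it actually exists, can be expressed as a simple set comparison. The column names below are illustrative.

```python
def metadata_drift(registered, actual):
    """Compare the columns registered in the model metadata against
    those a table actually has, flagging both directions of drift."""
    return {
        "undocumented": actual - registered,  # in the data, missing from metadata
        "stale": registered - actual,         # in metadata, gone from the data
    }

# Hypothetical example: 'channel' was added to the table without
# updating the metadata, while 'region' was dropped from the table.
drift = metadata_drift(
    registered={"sale_id", "amount", "region"},
    actual={"sale_id", "amount", "channel"},
)
```

Running such checks on a schedule is one concrete form of the ongoing administration the article calls for; either direction of drift left uncorrected degrades the metadata layer's quality.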
Given these challenges, is metadata implementation worth the effort? If IT/IS researchers and practitioners understand only metadata's technical merits but not its business benefits, why should business managers care about metadata? Wouldn't it be reasonable for COTS product vendors to focus on technical metadata, designing it exclusively for IT professionals while ignoring business decision makers? The answer is not obvious, as the benefits of metadata are not yet well known. Recent studies [1, 2, 6] suggest that metadata may significantly benefit business decision makers and hence ought to be further explored by the academic research community.
Data-quality management is a promising area for metadata [9]. Due to the rapid growth of data volume and complexity, poor data quality represents a growing hazard to information systems. Metadata enables decision makers to gauge data quality and is critical for the administration of processes and security within decision-support environments (such as a data warehouse).
Managerial decision making stands to benefit from metadata [1, 2]. Understanding this benefit involves several questions:
Securing organizationwide support is typically the greatest challenge in any successful metadata implementation. Such support cannot be achieved without identifying and communicating the merits of metadata to the technical community, to business users, and to corporate decision makers alike. Those merits, however, as well as the drawbacks, have yet to be fully explored, and many questions remain to be answered before metadata value is fully known.
1. Even, A., Shankaranarayanan, G., and Watts, S. Enhancing decision making with process metadata: Theoretical framework, research tool, and exploratory examination. In Proceedings of the 39th Hawaii International Conference on System Sciences (HICSS-39) (Kauai, HI, Jan. 4-7). IEEE Press, Los Alamitos, CA, 2006.
2. Fisher, C., Chengalur-Smith, I., and Ballou, D. The impact of experience and time on the use of data quality information in decision making. Information Systems Research 14, 2 (June 2003), 170-188.
3. Jarke, M., Lenzerini, M., Vassiliou, Y., and Vassiliadis, P. Fundamentals of Data Warehouses. Springer-Verlag, Heidelberg, Germany, 2000.
4. Kimball, R., Reeves, L., Ross, M., and Thornthwaite, W. The Data Warehouse Lifecycle Toolkit. Wiley Computer Publishing, New York, 2000.
5. Marco, D. Building and Managing the Meta Data Repository: A Full Lifecycle Guide. John Wiley and Sons, Inc., New York, 2000.
6. Nevo, D. and Wand, Y. Organizational memory information systems: A transactive memory approach. Decision Support Systems 39, 4 (June 2005), 549-562.
7. Sen, A. Metadata management: Past, present, future. Decision Support Systems 37, 1 (Apr. 2004), 151-173.
8. Shankaranarayanan, G. and Even, A. Managing metadata in data warehouses: Pitfalls and possibilities. Communications of the AIS 14, 13 (Sept. 2004), 247-274.
9. Shankaranarayanan, G., Ziad, M., and Wang, R. Managing data quality in dynamic decision-making environments: An information product approach. Journal of Database Management 14, 4 (Oct.-Dec. 2003), 14-32.
10. Vaduva, A. and Vetterli, T. Metadata management for data warehousing: An overview. International Journal of Cooperative Information Systems 10, 3 (Sept. 2001), 273-298.
¹In Greek mythology, Sisyphus, the king of Corinth, was condemned to eternal torture by the gods. His punishment was to roll a heavy stone up to the top of a steep hill, and, whenever he would almost reach the top, the stone would roll down to the bottom of the hill, forcing him to start again.
²This information is based on a comprehensive review we conducted in February 2004 and revisited in September 2005 [8].
³Such an approach is supported by several commercial metadata management tools, including the Unicorn System from Unicorn and MetaCenter from Data Advantage Group.
©2006 ACM 0001-0782/06/0200 $5.00