acm-header
Sign In

Communications of the ACM

Communications of the ACM

Features to Consider in a Data Warehousing System


In order for an organization to achieve competitive advantage, voluminous data needs to be managed, analyzed, and fed into the decision-making process. Data warehouses provide decision support to organizations with the help of analytical databases and online analytical processing (OLAP) tools. Incorporating OLAP tools into decision models as part of decision support systems improves decision making [10]. Decision makers can: access analytical databases through an OLAP interface and are able to analyze corporate data on various dimensions; view corporate changes over a period of time, to obtain a macro view of the business operations as well as perform a microanalysis in a specific sub-function; perform various what-if analyses; and drill-down and discover the pattern of sales of certain products in a given period of time or find how the sales performance of an individual salesperson affects the company's revenues.

These time-order/aggregation/disaggregation features provide decision makers valuable insights into the customer/business behavior, which are of fundamental importance for better decision making. OLAP tools have benefited organizations in different ways. For example, Lockheed Martin has used OLAP tools in aircraft design and manufacturing data and cut their analyst labor costs up to 20% [4]. As a result of using data warehouse technology, First American Corporation has transformed itself and improved its financial performance from losses to profits [3]. Data warehouse technology in conjunction with OLAP has been useful in improving decision making in the community health care realm, as shown in [1].

However, despite potential benefits of data warehousing and OLAP tools, such projects were difficult to use and failed to realize benefits [9]. Corporations that invest in data warehouses often do not provide tools to end users that they can use easily, resulting in users not utilizing the tools, millions of dollars worth of unused software, and unrealized return on investment [8]. The most important determinants of new technology acceptance are perceived ease of use (PEU) and perceived usefulness (PU) [6]. PU is defined as the degree to which a person believes that using a particular system would enhance his or her job performance. PEU is defined as the degree to which a person believes that using a particular system would be free of effort. In order to derive benefits from OLAP technology, it is important to assess whether the OLAP tools, as an integral part of data warehousing, help or hinder the usage by the end user. Thus, this article is intended: To find the effect of OLAP features on perceived easy of use (PEU) and the perceived usefulness (PU) of OLAP; to provide suggestions for appropriate contexts for use of ROLAP and MOLAP systems; and to provide guidelines for better design of data warehouses with OLAP technology.

Back to Top

Data Warehouse and OLAP

Codd et al. first coined the term OLAP in 1993 as "the dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data." OLAP technology can organize data in multidimensional tables called data cubes and provides access to the data warehouse through an interactive GUI (see Figure 1). Some of the common capabilities of OLAP include: multidimensionality, aggregation, drill-down and roll-up (view detailed and aggregated data), and slicing and dicing.

The most common types of OLAP technology are Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). The differences between the two concern data processing capability and data currency [9]. In MOLAP, the data is cleaned, aggregated in multiple dimensions, and uploaded into a data cube periodically. The data is stored in multidimensional arrays [2], thus the database has precompiled organization and data arrays that can be accessed directly and faster. In ROLAP, data is aggregated and stored along with relational databases. ROLAP relies on indices to be built on tables for data access. Users generate queries using SQL on the fly, offering more flexibility in query generation and data currency.

Back to Top

Research Methodology

Measures of PU and PEU. Data was collected for each feature of OLAP and ease of use/usefulness of OLAP system as perceived by users. Perceived Usefulness is measured based on the potential OLAP benefits [6]: improves decision making, provides accurate analysis, provides all required information, improves working efficiency, and increases user productivity. Perceived Ease of Use is measured based on whether users feel that learning OLAP was easy, the system was user-friendly, OLAP was easy to use, and it was easy to get information.

Measures of OLAP Features. Seven types of tools or features are normally offered in an OLAP system. Based on previous literature, each feature and its components are described here.

Visualization allows users to create summary tables and charts interactively. This is measured using the presence of multidimensional tables and graphics.

Summarization refers to "degree of aggregation" of information. We measure this feature using number of hierarchies allowed, level of detail, and the capability to swap between summarized and detailed levels.

Navigation refers to its capability to drill-down or drill-up between levels of detail. This is measured by shareability (number of concurrent users allowed), data navigatability (availability of drill-down, slicing-dicing, and drag-drop facilities), and ability to extract detailed and real-time data.

Query Function: Query engines extract data from multidimensional databases and generate outputs in 3D graphics. This is measured using preconstructed query capability, simple query building with click-select feature, query building with query languages, and concurrent run of queries.

Sophisticated Analysis: This feature is measured by six most common types of analyses used in decision support: statistical profiling (for example, list customers with highest combined sales); moving averages; cross dimension comparison (compare product sales by region over a period of time); queries with self-defined formula; exception condition; and what-if analysis.

Dimensionality is measured using the number of allowable dimensions, capability to redefine dimension, and time for data refresh after redefinition.

Performance includes response times for four basic functions: standard report generation, customized report generation, graphic/chart generation, and data navigation.

Data Collection. To examine the effect of OLAP features on perceived ease of use (PEU) and perceived usefulness (PU), a questionnaire-based survey was conducted in Hong Kong. The questionnaire considered user demographics, measures of PEU and PU, and features of OLAP in place. Users were queried about their positions, departments, and OLAP systems they used. Questions regarding PEU and PU, for example, "OLAP system increases my productivity," used a five-point Likert scale ranging from 1 (strongly agree) to 5 (strongly disagree). Questions regarding OLAP features inquired about their satisfaction, for example, "Flexibility to swap between summarized and detailed data" (1=strongly unsatisfied to 5=very satisfied). Alternatively, the respondents for each feature may choose "not used" and "not applicable," as appropriate. The questionnaire was sent to two groups of people—ROLAP and MOLAP users. Seventy-eight questionnaires were sent to four companies with two of each using ROLAP or MOLAP systems. Approximately 58 questionnaires were returned providing a 74% return rate. Pearson correlation analysis was used to examine the relationship between OLAP features and PEU and PU.

Four companies were selected for the survey, two of which used MOLAP software and the other two used ROLAP-related software. The companies using MOLAP used either Cognos Software's PowerPlay or Oracle Express from Oracle Corporation. PowerPlay software stores the analytical data in multidimensional data sets called PowerCubes that are stored either on clients or on servers and are updated periodically by running a batch job. The PowerPlay analytical engine is aided by Impromptu reporting system and Visualizer visualization technology. Oracle Express also stores data in multidimensional "physical cubes" and allows users to "slice and dice" the data cubes. The companies using ROLAP used either Business Objects or had modules that were custom-developed internally using SQLBase RDBMS. When employing Business Objects software, a user submits a request for information through a semantic layer, which is converted to an SQL statement submitted to the database engine that accesses relational database and returns the result that is transformed into a cube for the user.

Back to Top

Results and Discussion

The significant relationships between OLAP features and PEU and PU for MOLAP and ROLAP systems are shown in Figures 2 and 3, respectively. All features (except Query function) of ROLAP are perceived as useful. On the contrary, only two features (Visualization and Summarization) of MOLAP are perceived to be useful. Furthermore, in a ROLAP system, PEU is significantly related to PU; thus, when the ROLAP features are perceived as easy to use and user-friendly, it positively impacted the usefulness of ROLAP.

The visualization feature has a positive effect on ease of use with ROLAP software, while it has a negative effect on ease of use with MOLAP. Visualization features are less prevalent in ROLAP, so any improvements in visualization with help of graphical user interfaces and help menus aided ease of use in ROLAP. On the other hand, MOLAP systems usually have adequate visualization effects. Cognos' MOLAP system, PowerPlay, presents data to users in a variety of modes, such as cross-tabs, pie charts, and graphs using Visualizer technology. In addition, users can change various visualization effects, such as, colors, formats, fonts, and so forth. It is possible that excessive presence of visualization effects in MOLAP could confuse users, resulting in a negative relationship with PEU. The visualization features of ROLAP and MOLAP have positive significant effects on the usefulness of the OLAP tools.

The summarization feature has a positive significant relationship with both PEU and PU in ROLAP and MOLAP. This implies that with increasing number of permissible detail-levels and flexibility in swapping between levels, the use of OLAP will improve.

The data navigation feature has a significant positive effect on PEU in MOLAP. Since there are only limited levels available for drill-down and slice-dice, MOLAP allowed users to navigate easily. This limitation of MOLAP resulted in a nonsignificant relationship with PU. The situation is the reverse for ROLAP. The Mercury of Business Objects, a ROLAP system, lets the users define their own dimensions, lets them perform queries at various levels of detail, and offers various reporting facilities. Since these flexible navigation facilities (real-time data access, detail data extraction, or drill-down) are possible for ROLAP, this feature has a positive effect on PU.

The Query function showed a significant positive relationship only with PEU for MOLAP. Since all reports have been predesigned in MOLAP, users need only to click and select the report. Impromptu, a companion of Cognos' PowerPlay, is easy-to-use software, but the data cube has to be built by either a database administrator or a database analyst. This predefined data cube may not meet the query needs of a user, in which case, the user needs to wait for the database specialist to modify the data cube. Although MOLAP is easy to use, users did not find it useful because of its lack of flexibility.

Sophisticated analysis has a significant positive effect on PU in ROLAP and not on PU in MOLAP. This is because ROLAP provided users with more useful functions: ad hoc queries down to detail data, customized reports, and what-if analysis. The Set Analyzer of Business Objects allows users to build complex queries from large databases as index tables, thereby enabling users to build sophisticated and flexible queries that also run quickly. Set Analyzer allows users to maintain a hierarchy of evolving queries, giving them the capability to perform sophisticated analyses.

Dimensionality for ROLAP systems has a significant effect on PU. Since ROLAP systems operate on transactional data, users could get current data in their required dimensions. In MOLAP systems, pre-aggregation has limited the flexibility of changing the definition of dimensions, resulting in users not perceiving it as useful. Oracle Express allowed users to create relationships among the existing dimensions and to define the top-level dimension. However, the users did not perceive these facilities as relevant or useful.

The positive correlation between Performance and PU signifies the importance of system performance in ROLAP. Since it takes time to execute the SQL queries for manipulating voluminous data, users perceived performance to be critical. With ROLAP systems (for example, Business Objects), large amounts of data are queried by the clients against large data sets—this further results in increase in network traffic, leading to high response times of queries.

Back to Top

Choice Between MOLAP and ROLAP

This study evaluated OLAP tools for ease of using the system and for usefulness. Following are some guidelines in choosing between MOLAP or ROLAP:

  • Choose MOLAP for non-sophisticated computer users and ROLAP for the sophisticated users. Our study found more features of MOLAP have positive effects on ease of use, compared to those of ROLAP.
  • Users who use only preset reports and have no need to monitor the daily transaction data could deploy a MOLAP system. On the other hand, users that need to analyze the market information regularly would require a ROLAP system; it is suitable for the retailing industry or manufacturers with a variety of products and a large volume of data.
  • If the information needs of users are relatively consistent over a period of time, MOLAP is preferred. If the requirements change frequently, ROLAP should be adopted because of its flexible query capability.
  • Since MOLAP uses a multidimensional data cube that is generated periodically, the data is not current. Hence, MOLAP should be used where data is relatively nonvolatile. Customers can use MOLAP for inquiring about the products, their descriptions, and prices. For a volatile data environment, for example, as in sales transaction data, they would need more current data than is possible through a ROLAP system.
  • In the initial stages of adoption of OLAP technology in organizations, MOLAP systems are recommended because of their ease of use. After considerable experience, a ROLAP system is preferred because of its flexibility and ability to handle complex queries.

Back to Top

Effective OLAP for Data Warehouses

Based on the OLAP users' perception, our findings indicate MOLAP tools make the data warehouse system easy to use but not useful; ROLAP tools make the data warehouse useful but not easy to use. Suggestions for improving the design of data warehouses with OLAP include:

Do proper planning: Because the system designs for MOLAP and ROLAP systems are quite different, IT professionals should be aware of this in requirement planning. User requirements for MOLAP systems should be clearly defined in advance so that pre-aggregated formats can be set appropriately.

Make ROLAP user-friendly: The flexibility of ROLAP system should be complemented with easy-to-use features. Software vendors should design ROLAP tools using better GUI and drag-drop technologies, so that the software is more user-friendly.

Align IT strategy with business: OLAP tools should be designed considering alignment of IT strategy with business strategy [7]. First American corporation implemented a data warehouse that is aligned with its business strategy and improved financial performance [3]. By determining information needs based on the proper alignment, OLAP tools can be made more useful for organizations and individuals. This is especially true in case of MOLAP tools, since only a few features are related to PU.

Physical data warehouse design: Better physical data warehouse design is needed in order to improve the performance of ROLAP tools. Data warehouses may be designed integrating the ROLAP relational structure and the MOLAP multidimensional cube—one way to implement this is by using a dense-region-based data cube [2]. Performance of data warehouses can also be improved by using physical design techniques, such as partitioning and access method selection [12] and parallel query processing techniques [5].

Personalize: OLAP tools should be personalizable. Personalization is an evolutionary concept in designing personal end-user tools [11]. This may be done by unbundling the features of OLAP and providing the software interface to the user that will allow access to a set of OLAP tools selected depending on the skill level and the information needs of the specific user. This will improve both ease of use and usefulness of the system.

Integrate ROLAP and MOLAP: Data warehouses should include both ROLAP and OLAP in an integrated fashion, since information needs generally comprise both batch output and online inquiries. Batch outputs could be done with MOLAP, while online ad hoc needs can be met with ROLAP tools.

Integrate OLAP with decision models: In order to make data warehouses and the associated OLAP tools more useful for decision support, analyses need to be made of the decisions to be supported, the decision processes involved, and the relevant decision models. Using decision-making processes and decision models, appropriate queries can be designed and incorporated into OLAP tools, thereby benefiting decision makers.

Improve data currency: Since a drawback of MOLAP is not having current data in its database, these data warehouses should be updated as frequently as possible, which will ensure the outputs from the data warehouse are more current. However, updating the data warehouses is time consuming and costly. So, an optimal updating frequency should be computed and used in practice.

Use data mining to improve OLAP: Data mining can extract rules based on historical data. By using these rules, the materialized views for OLAP can be designed. Since these rules are extracted from previous transaction profile, the predesigned queries or materialized views in MOLAP tend to be more useful. Furthermore, by using data-mining rules, indexes can be selected intelligently for ROLAP.

Back to Top

References

1. Berndt, D.J., Hevner, A.R., and Studnicki, J. The Catch data warehouse: Support for community health care decision-making. Decision Support Systems 35, 3 (June 2003), 367–384.

2. Cheung, D.W., et al. Towards the building of a dense-region-based OLAP system. Data and Knowledge Engineering 36, (2001), 1–27.

3. Cooper, B.L., et al. Data warehousing supports corporate strategy at First American Corporation. MIS Quarterly 24, 4 (Dec. 2000), 547–567.

4. Cope, J. New tools help Lockheed Martin prepare for takeoff. Computerworld (Mar. 17, 2000).

5. Datta, A., VanderMeer, D., and Ramamritham, K. Parallel star join + DataIndexes: Efficient query processing in data warehouses and OLAP. IEEE Trans. On Knowledge and Data Engineering 14, 6 (Nov./Dec. 2002), 1299–1316.

6. Davis, D.G. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, (Sept. 1989), 319–339.

7. Gardner, S.R. Building the data warehouse. Commun. ACM 41, 9 (Sept. 1998), 52–60.

8. Glassey, K. Seducing the end user. Commun. ACM 41, 9 (Sept. 1998), 62–69.

9. Hasan, H. and Hyland, P. Using OLAP and mltidimensional data for decision making. IT Pro, (Sept./Oct. 2001), 44–50.

10. Koutsoukis, N., Mitra, G., and Lucas, C. Adapting on-line analytical processing for decision modelling: The interaction of information and decision technologies. Decision Support Systems 26, (1999), 1–30.

11. Riecken, D. Personal end-user tools. Commun. ACM 43, 8 (Aug. 2000), 89–91.

12. Song, S. and Gorla, N. A transaction-based genetic algorithm approach to vertical fragmentation in relational databases. The Computer Journal, 43, 1 (2000), 81–93.

Back to Top

Author

Narasimhaiah Gorla ([email protected]) is an associate professor of IS at Wayne State University in Detroit, MI.

Back to Top

Figures

F1Figure 1. Data warehouse and OLAP.

F2Figure 2. MOLAP model.

F3Figure 3. ROLAP model.

Back to top


©2003 ACM  0002-0782/03/1100  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2003 ACM, Inc.