Both researchers and practitioners accept that decision support systems (DSSs) have specific needs that cannot be properly addressed by conventional information systems. Online Transaction Processing (OLTP) systems work with relatively small chunks of information at a time, while DSS applications require the analysis of huge amounts of data. Online Analytical Processing (OLAP) [1], data mining [3] and data warehouses [7] emerged during the last decade in order to fulfill the expectations of executives, managers, and analysts (also known as knowledge workers).
We have also witnessed the flourishing of component-based frameworks during the last few years (see Communications' special section on object-oriented application frameworks, Oct. 1997 and Communications' special section on component-based enterprise frameworks, Oct. 2000). These frameworks are intended to help developers to build increasingly complex systems, enhancing productivity and promoting component reuse in well-defined patterns. Nowadays those systems are widely used in enterprises throughout the world, but they usually provide only low-level information processing capabilities, since they are OLTP-application-oriented. Here, we propose the development of custom-tailored component-based frameworks to solve DSS problems, although this approach can be extended to a wide range of scientific applications.
An open framework for the development of data mining algorithms and DSSs should include capabilities to analyze huge datasets, cluster data, build classification models, and extract associations and patterns from input data. The conceptual model for such a system is shown in Figure 1.
The data miner (the user of the system) needs data mining tools to analyze large datasets. Data is gathered, and data mining algorithms build knowledge models that summarize the input data. Those models may provide the information the user needs, or they may just suggest new ways to explore the available data. Moreover, knowledge models can themselves serve as input to other mining algorithms in order to solve second-order data mining problems. Both knowledge models and dataset metadata can be stored for later use in the back-end database (for instance, the Object Pool).
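The flow just described can be illustrated with a minimal sketch. Every name in it (`KnowledgeModel`, `ObjectPool`, `frequent_values`, `dominant_value`) is hypothetical and chosen only for illustration, and the "mining algorithm" is a deliberately trivial frequency count rather than any technique proposed here.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class KnowledgeModel:
    """A summary of the input data produced by a mining algorithm."""
    name: str
    content: Any


@dataclass
class ObjectPool:
    """Hypothetical stand-in for the back-end store of models and metadata."""
    models: dict = field(default_factory=dict)

    def store(self, model: KnowledgeModel) -> None:
        self.models[model.name] = model

    def load(self, name: str) -> KnowledgeModel:
        return self.models[name]


def frequent_values(rows, attribute, min_count):
    """Toy mining algorithm: values of one attribute seen at least min_count times."""
    counts = Counter(row[attribute] for row in rows)
    frequent = {value: n for value, n in counts.items() if n >= min_count}
    return KnowledgeModel(f"frequent:{attribute}", frequent)


def dominant_value(model):
    """Second-order step: mines a previously built model instead of raw data."""
    value = max(model.content, key=model.content.get)
    return KnowledgeModel(f"dominant:{model.name}", value)


# Illustrative input data (made up for this sketch).
rows = [{"product": p} for p in ["tea", "coffee", "tea", "milk", "coffee", "tea"]]

pool = ObjectPool()
first = frequent_values(rows, "product", min_count=2)
pool.store(first)                                   # keep the model for later use
second = dominant_value(pool.load("frequent:product"))
pool.store(second)
```

The second call never touches `rows`: it works only on the stored model, which is the sense in which a knowledge model becomes the input of another mining step.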
Component-based frameworks such as Enterprise JavaBeans (see java.sun.com/products/j2ee) and Microsoft .NET (see www.microsoft.com/com/net) are based on a common architectural pattern known as the Enterprise Component Framework [5]. A simplified representation of this pattern is shown in Figure 2. This pattern, modeled as a parameterized collaboration in UML [6], contains six classifier roles, which are depicted as rectangles:
We believe every component-based data mining framework should focus its design efforts on two major objectives:
Let us consider, for example, the case of assembling the datasets that are used as input to build knowledge models. These datasets may come from heterogeneous information sources.
Data mining tools usually work with tables in the relational sense. Each table contains a set of fixed-width tuples that can be obtained either from relational databases or any other information source (ASCII or XML files, for example).
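As a rough illustration of how tuples from different kinds of sources can end up in the same tabular shape, the sketch below reads one table from a relational database and another from a comma-separated text file. It uses Python's standard `sqlite3` and `csv` modules; the helper names `table_from_sql` and `table_from_csv`, the `patient` table, and the sample values are assumptions made for this example only.

```python
import csv
import io
import sqlite3


def table_from_sql(connection, query):
    """Return (column names, tuples) for a query against a relational source."""
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    return columns, [tuple(row) for row in cursor.fetchall()]


def table_from_csv(text):
    """Return (column names, tuples) for comma-separated text with a header row."""
    reader = csv.reader(io.StringIO(text))
    columns = next(reader)
    return columns, [tuple(row) for row in reader]


# An in-memory database stands in for a real relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, 34), (2, 58)])

sql_columns, sql_rows = table_from_sql(conn, "SELECT id, age FROM patient ORDER BY id")
csv_columns, csv_rows = table_from_csv("id,age\n3,41\n4,27\n")

# Every tuple has exactly one value per column, whatever its origin.
same_width = all(len(row) == len(sql_columns) for row in sql_rows + csv_rows)
```

Note that the values read from the text file arrive as strings, so a real wrapper would still have to convert them to the data type declared for each column.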
All tabular datasets have a set of columns (also called attributes). Each one of them has a unique identifier and an associated data type (strings, numbers, dates, and so forth). A flexible tool should allow the specification of order relationships among attribute values and the grouping of attribute values to define concept hierarchies.
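One possible way to represent such column descriptions is sketched below. The `Attribute` class, its fields, and the sample values are hypothetical; the sketch only shows that an explicit value ordering and a value-to-parent mapping are enough to express order relationships and a simple concept hierarchy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    name: str                       # unique identifier within the dataset
    data_type: type                 # e.g. str, int, float, datetime.date
    order: Optional[list] = None    # explicit ordering of values, lowest first
    parents: dict = field(default_factory=dict)  # value -> more general concept

    def compare(self, a, b) -> int:
        """Return -1, 0, or 1 according to the declared order of a and b."""
        if self.order is None:
            raise ValueError(f"{self.name} has no declared order")
        ia, ib = self.order.index(a), self.order.index(b)
        return (ia > ib) - (ia < ib)

    def generalize(self, value, levels: int = 1):
        """Climb the concept hierarchy by at most `levels` steps."""
        for _ in range(levels):
            if value not in self.parents:
                break               # already at the top of the hierarchy
            value = self.parents[value]
        return value


# Illustrative attributes (values made up for this sketch).
size = Attribute("size", str, order=["small", "medium", "large"])
city = Attribute(
    "city", str,
    parents={"Granada": "Spain", "Madrid": "Spain", "Spain": "Europe"},
)
```

With these definitions, `size.compare("small", "large")` is negative even though the alphabetical order of the two strings is the opposite, and `city.generalize("Granada", 2)` groups a city under its continent.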
A data mining system should also be capable of performing heterogeneous queries over different databases and information sources. The independently retrieved datasets might then be processed further in order to join them with other datasets (data integration), to standardize concept representations and eliminate redundancies (data cleaning), to compute aggregations (data summarizing), or just to discard part of them (data filtering).
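The four kinds of post-processing can be made concrete with a small sketch over rows represented as dictionaries. The function names, the `synonyms` mapping, and the sample data are assumptions for illustration; a real system would delegate most of this work to the underlying data sources.

```python
from collections import defaultdict


def join(left, right, key):
    """Data integration: inner join of two row lists on a shared key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def clean(rows, attribute, synonyms):
    """Data cleaning: map variant values to one representation, drop duplicates."""
    seen, result = set(), []
    for row in rows:
        fixed = {**row, attribute: synonyms.get(row[attribute], row[attribute])}
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(fixed)
    return result


def summarize(rows, group_by, measure):
    """Data summarizing: total of `measure` for each value of `group_by`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return [{group_by: key, measure: total} for key, total in totals.items()]


def filter_rows(rows, predicate):
    """Data filtering: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# Two independently retrieved datasets (values made up for this sketch).
customers = [{"cust": 1, "country": "ES"}, {"cust": 2, "country": "Spain"},
             {"cust": 3, "country": "FR"}]
orders = [{"cust": 1, "amount": 10.0}, {"cust": 2, "amount": 5.0},
          {"cust": 3, "amount": 7.5}, {"cust": 1, "amount": 2.5}]

joined = join(orders, customers, "cust")
cleaned = clean(joined, "country", {"Spain": "ES"})
totals = summarize(cleaned, "country", "amount")
large = filter_rows(totals, lambda row: row["amount"] > 10)
```

Here the cleaning step matters for the result: without mapping "Spain" to "ES", the aggregation would report the same country under two different names.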
All the aforementioned operations involving datasets can be performed using powerful formal models and query languages. However, typical users are not prepared to use such models and languages to define the customized datasets they need. They would probably reject a system that requires them to learn any complex formalism. In order to improve system acceptance, we propose a bottom-up approach. A family of dataset-building components should provide users with all the primitives they need to build their own datasets from the available data sources:
These components can be combined easily in tree-like structures to build highly personalized datasets. The resulting dataset definitions are amenable to standard query optimization techniques, which improves system performance. With this approach, even users with little computing background can work with complex data mining systems simply by linking dataset-modeling components.
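A tree of dataset-building components might look like the following sketch. The component classes (`Source`, `Filter`, `Join`, `Aggregate`) and their interface are hypothetical and are not meant to reproduce the component family proposed here; they only show how small primitives compose into a tree whose root yields the final dataset.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class DatasetComponent(ABC):
    """Node of a dataset-building tree; rows() yields dictionaries."""

    @abstractmethod
    def rows(self):
        ...


class Source(DatasetComponent):
    """Leaf node: wraps rows already fetched from some data source."""

    def __init__(self, data):
        self.data = list(data)

    def rows(self):
        return iter(self.data)


class Filter(DatasetComponent):
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def rows(self):
        return (row for row in self.child.rows() if self.predicate(row))


class Join(DatasetComponent):
    def __init__(self, left, right, key):
        self.left, self.right, self.key = left, right, key

    def rows(self):
        index = defaultdict(list)
        for row in self.right.rows():
            index[row[self.key]].append(row)
        for l in self.left.rows():
            for r in index.get(l[self.key], []):
                yield {**l, **r}


class Aggregate(DatasetComponent):
    def __init__(self, child, group_by, measure):
        self.child, self.group_by, self.measure = child, group_by, measure

    def rows(self):
        totals = defaultdict(float)
        for row in self.child.rows():
            totals[row[self.group_by]] += row[self.measure]
        for key, total in totals.items():
            yield {self.group_by: key, self.measure: total}


# Illustrative data (made up for this sketch).
sales = Source([{"store": "A", "amount": 100.0}, {"store": "B", "amount": 40.0},
                {"store": "A", "amount": 60.0}, {"store": "C", "amount": 20.0}])
stores = Source([{"store": "A", "region": "north"}, {"store": "B", "region": "south"},
                 {"store": "C", "region": "north"}])

# The tree: aggregate( filter( join(sales, stores) ) ).
tree = Aggregate(
    Filter(Join(sales, stores, "store"), lambda row: row["amount"] >= 40),
    "region", "amount",
)
result = list(tree.rows())
```

Nothing is computed until `rows()` is called on the root, so a tree like this remains a declarative description that an optimizer could rearrange first, for example by applying the filter to `sales` before the join.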
Commercial enterprise application servers (the containers in the framework pattern) are currently restricted to OLTP applications. We believe it is time for system architects to focus on higher-level information processing capabilities in order to take advantage of the vast computing resources available in current corporate intranets. Although we have proposed a component-based framework model for building a data mining system here, our approach is extensible to any CPU-intensive computing application.
1. Chaudhuri, S. and Dayal, U. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, Mar. 1997.
2. Codd, E.F., Codd, S.B., and Salley, C.T. Providing OLAP to User-Analysts: An IT Mandate. Hyperion Solutions Corporation, Sunnyvale, CA, 1998; www.hyperion.com/products/whitepapers.
3. Han, J. and Kamber, M. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco, 2000.
4. Juristo, N., Windl, H. and Constantine, L. (Eds.) Usability engineering. IEEE Software 18, 1 (Jan./Feb. 2001).
5. Kobryn, C. Modeling components and frameworks with UML. Commun. ACM 43, 10 (Oct. 2000), 31–38.
6. Object Management Group. OMG Unified Modeling Language Specification, 1.4, 1st. Ed. (Sept. 2001); www.omg.org/technology/uml/.
7. Widom, J. Research problems in data warehousing. In Proceedings of the 1995 International Conference on Information and Knowledge Management (CIKM'95), Nov. 29–Dec. 2, 1995, Baltimore, MD.
Figure 1. A conceptual model for DSSs.
Figure 2. A simplified representation of the Enterprise Component Framework.
©2002 ACM 0002-0782/02/1200 $5.00