Communications of the ACM

Discovery Through Rough Set Theory


Applications of rough set theory to knowledge discovery involve collecting empirical data and building classification models from the data [1–3]. What distinguishes this approach is that it is primarily concerned with acquiring decision tables from data and then analyzing and simplifying them by identifying attribute dependencies, minimal nonredundant subsets of attributes, the most important attributes, and minimized rules. Rough set technology has been applied to practical knowledge discovery problems since the late 1980s. The first commercial software tool for developing rough set-based knowledge discovery applications was sold in the early 1990s by Reduct Systems, Inc. [4]. It was called Datalogic. Here, I briefly discuss some representative applications of this technology using Datalogic and other tools (see [5, 6] for more in-depth details). Most of these applications fall into categories such as market research, medicine, control, drug and new material design research, stock market analysis, pattern recognition, and environmental engineering.

In the market research domain, typical applications involve building predictive models of customer response to product offerings. Decision tables, or decision rules with probabilistic confidence factors, are extracted from data describing the demographics, income, and other characteristics of past customers. They are then used to segment the market and predict good prospects. The data collections processed in these applications range from tens of thousands to several hundred thousand records. The objective is to increase the likelihood of correct prediction rather than to build a deterministic decision model. The population frequency of the prediction target, that is, the estimate of the prior probability of success when the decision procedure is not based on any attribute-value information, serves as a benchmark against which the quality of the model is measured.

Knowing the likely income level of a potential customer is important information on which marketing decisions are often based. One application in this category involved predicting income level from an individual's characteristics such as educational level, value of residence, possessions, and so on. The application, done for a market research firm, involved constructing a model from survey data, with probabilistic rules for predicting income level using factors selected out of 250 survey items. The data set contained approximately 25K records.

The model, in the form of a series of "if ... then ..." rules with computed rule strength (the number of records matching the rule conditions) and probability quality indicators, provided significantly higher (two- to three-fold) prediction confidence than predictions based simply on the frequency distribution of income among surveyed individuals. The rule strength reflects the degree of persistence of a particular pattern and provides the necessary basis for a credible estimate of the rule's probability. Weaker rules are considered unreliable and normally are not retrieved.
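The quantities described above (rule strength, rule probability, and the gain over the prior frequency of the target) can be illustrated with a minimal Python sketch. The records, attribute values, and the function name `extract_rules` are hypothetical and chosen only for illustration; this is not how Datalogic computes its rules, only one simple way to obtain the same kinds of indicators.

```python
from collections import Counter, defaultdict

# Hypothetical toy survey records: (education, residence_value, income_level).
records = [
    ("university", "high", "high"),
    ("university", "high", "high"),
    ("university", "high", "low"),
    ("university", "low", "high"),
    ("secondary", "high", "low"),
    ("secondary", "low", "low"),
    ("secondary", "low", "low"),
    ("secondary", "low", "high"),
]


def extract_rules(rows, target, min_strength):
    """Build 'if conditions then target' rules with strength and probability.

    Rules matched by fewer than `min_strength` records are discarded as
    unreliable. The prior is the frequency of `target` over all records.
    """
    prior = sum(1 for *_, decision in rows if decision == target) / len(rows)

    # Count decision outcomes for every distinct combination of conditions.
    groups = defaultdict(Counter)
    for *conditions, decision in rows:
        groups[tuple(conditions)][decision] += 1

    rules = []
    for conditions, counts in groups.items():
        strength = sum(counts.values())  # records matching the conditions
        if strength < min_strength:
            continue  # too weak to give a credible probability estimate
        probability = counts[target] / strength
        rules.append({
            "if": conditions,
            "then": target,
            "strength": strength,
            "probability": probability,
            "gain_over_prior": probability / prior,
        })
    return prior, rules


prior, rules = extract_rules(records, target="high", min_strength=3)
for rule in rules:
    print(rule)
```

In this toy data the prior probability of a high income is 0.5, and only the two condition combinations supported by three records survive the strength filter. A gain above 1 indicates a rule that predicts the target better than the population frequency alone.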

Control Knowledge Acquisition from Data

Control applications involve using past experience or simulator-generated data reflecting the states of a process to develop a model that supports control decision making. The essence of the approach is extracting control knowledge from process log data. Some exemplary applications in this area are balancing an inverted pendulum, cement production control [4, 5], and emission control. While the inverted pendulum control was essentially a proof-of-concept demonstration developed with the Datalogic software, the cement production control and emission control applications were developed and implemented in industrial installations.

Cement kiln control. The quality of the cement produced from slurry in the rotary clinker kiln (Figure 1) depends on the interaction of many factors, including the kiln's revolution speed and the amount of coal being burned. Human operators with significant experience manage to acquire sufficient control skills to produce high-quality clinker. These skills are largely intuitive and not easily converted into computer control programs. In the industrial application described in [5], the control knowledge of the skilled human operator was captured by analyzing the operation log file, which detailed the system states at different times and the corresponding operator actions (setting the kiln revolution speed). A decision table was created in which factors such as the kiln's burning zone color, clinker granulation, color inside the kiln, burning zone temperature, and the first derivative of the burning zone temperature were represented in a qualitative form (for example, clinker granulation took values such as fine, fine_with_lumps, and so on) and related to the control variable setting. The decision table classified all possible observations into a relatively small number of classes with associated control actions. The analysis of the decision table involved rough sets-based detection of dependencies between state variables and the control action variable, identification of redundant state variables, and derivation of the optimized decision table (Table 1) with all redundant variables removed. The table was incorporated into a PC-based controller and tested in production runs. The tests demonstrated that the quality of control achieved with the automated system exceeded the results obtained by experienced human operators.
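The analysis steps just described (measuring how fully the state variables determine the control action, then dropping variables that add nothing) can be sketched in Python as follows. The log rows, attribute values, and action names are hypothetical, and the single-pass greedy removal is an assumption made for brevity: it yields a nonredundant subset of variables but not necessarily the smallest one, and it is not claimed to be the procedure used in [5].

```python
from collections import defaultdict

# Hypothetical qualitative kiln log. Each row holds the state variables in
# the order given by ATTRIBUTES, followed by the operator's action.
ATTRIBUTES = ["burning_zone_color", "granulation", "temperature_trend"]
log = [
    ("bright", "fine", "rising", "speed_up"),
    ("bright", "fine_with_lumps", "rising", "speed_up"),
    ("bright", "fine", "falling", "hold"),
    ("dark", "fine_with_lumps", "rising", "hold"),
    ("dark", "fine", "falling", "slow_down"),
    ("dark", "fine_with_lumps", "falling", "slow_down"),
]


def project(row, attributes):
    """Return the values of the chosen state variables for one log row."""
    state = dict(zip(ATTRIBUTES, row[:-1]))
    return tuple(state[a] for a in attributes)


def dependency(rows, attributes):
    """Fraction of rows whose action is uniquely determined by `attributes`."""
    actions_by_state = defaultdict(list)
    for row in rows:
        actions_by_state[project(row, attributes)].append(row[-1])
    consistent = sum(
        len(actions)
        for actions in actions_by_state.values()
        if len(set(actions)) == 1
    )
    return consistent / len(rows)


def drop_redundant(rows, attributes):
    """Greedily remove variables whose removal leaves the dependency intact."""
    kept = list(attributes)
    full = dependency(rows, kept)
    for attribute in attributes:
        trial = [a for a in kept if a != attribute]
        if trial and dependency(rows, trial) == full:
            kept = trial
    return kept


def reduced_table(rows, attributes):
    """Collapse the log into one control action per remaining state."""
    return {project(row, attributes): row[-1] for row in rows}


kept = drop_redundant(log, ATTRIBUTES)
table = reduced_table(log, kept)
print(kept)
for state, action in table.items():
    print(state, "->", action)
```

In this toy log, granulation turns out to be redundant: burning zone color and temperature trend alone determine the action, so six logged observations collapse into a four-row control table.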



Emission control. Rough sets were also applied to discover utility boiler control rules that minimize nitrogen oxide (NOx) emissions for an industrial client. The analysis of emission data was conducted by Reduct Systems, Inc. for a tangentially fired 150MW lignite boiler with a modified burner injection system. The burner section had been modified to facilitate optimization of the lower and upper overfire air windbox. The operating data used in the analysis was collected over a period of eight months and included 20 controllable operational variables and NOx emission analyses recorded at 8-hour intervals.

The NOx emissions were analyzed in terms of decision values reflecting the NOx levels considered acceptable for meeting environmental compliance guidelines. The analysis discovered over 50 control rules (27 of them for low emissions) and identified the key operating variables governing NOx emissions as the percentage of excess air, the windbox/furnace pressure drop, the percentage of open auxiliary air, the percentage of open lower and upper overfire air, and the percentage of fuel air. The rules confirmed known optimization strategies and yielded new ones. They also provided the combinations of parameter value ranges that need to be set to achieve low NOx emissions.
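A rule of this kind pairs ranges of operating variables with a decision value for NOx. The Python sketch below shows one simple way such range-based rules could be tabulated. The readings, cut points, NOx limit, and variable names are all hypothetical placeholders, since the article does not give the actual values or units, and the method shown (fixed cut points, keeping only range combinations that always led to low emissions) is an illustrative assumption rather than the analysis that was performed.

```python
from collections import defaultdict

# Hypothetical readings: (excess air %, upper overfire air open %, NOx level).
readings = [
    (2.5, 80, 0.38),
    (2.8, 75, 0.40),
    (2.6, 30, 0.52),
    (4.2, 85, 0.47),
    (4.5, 20, 0.61),
    (2.9, 90, 0.36),
]

# Hypothetical cut points for each variable and a hypothetical NOx limit.
CUTS = {"excess_air": [3.0], "overfire_open": [50]}
NOX_LIMIT = 0.45


def to_range(value, cut_points):
    """Map a number to a half-open range label such as '[50, inf)'."""
    lower = float("-inf")
    for cut in cut_points:
        if value < cut:
            return f"[{lower}, {cut})"
        lower = cut
    return f"[{lower}, inf)"


def low_emission_rules(rows):
    """Return range combinations that always gave low NOx, with their support."""
    outcomes = defaultdict(set)
    support = defaultdict(int)
    for excess_air, overfire_open, nox in rows:
        key = (
            to_range(excess_air, CUTS["excess_air"]),
            to_range(overfire_open, CUTS["overfire_open"]),
        )
        outcomes[key].add("low" if nox <= NOX_LIMIT else "high")
        support[key] += 1
    # Keep only combinations for which every matching reading was low.
    return {key: support[key] for key, seen in outcomes.items() if seen == {"low"}}


for ranges, count in low_emission_rules(readings).items():
    print("if (excess_air, overfire_open) in", ranges, "then NOx low; support", count)
```

With these placeholder numbers, a single combination (excess air below 3.0 and overfire air at least 50 percent open) is consistently associated with low emissions and is supported by three readings.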

Practically all applications of the rough sets methodology developed since its inception can be called knowledge discovery applications. This article presents only a small fraction of them. In the years to come, the availability of advanced development tools and increased familiarity with the merits of the methodology will result in substantial growth in the number and quality of applications based on knowledge extracted from data, rather than on the common practice of explicitly encoding human knowledge in algorithms.

References

1. Pawlak, Z. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic, 1991.

2. Grzymala-Busse, J., Pawlak, Z., Slowinski, R., and Ziarko, W. Rough sets. Commun. ACM 38, 11 (Nov. 1995), 89–95.

3. Munakata, T. Fundamentals of the New Artificial Intelligence. Springer-Verlag, Berlin, 1998.

4. Mrozek, A., and Plonka, L. Rough sets in industrial applications. In Rough Sets in Knowledge Discovery, Vol. 2. L. Polkowski and A. Skowron, Eds. Physica-Verlag, 1998, pp. 214–237.

5. Lin, T.Y., and Cercone, N., Eds. Rough Sets and Data Mining. Kluwer Academic, 1997.

6. Ziarko, W., Ed. Rough Sets, Fuzzy Sets, and Knowledge Discovery. Springer-Verlag, Berlin, 1994.

Author

Wojciech Ziarko ([email protected]) is a professor in the Department of Computer Science at the University of Regina in Regina, Saskatchewan, Canada.

Figures

Figure 1. Rotary clinker kiln.

Tables

Table 1. Data-extracted cement kiln control table.


©1999 ACM  0002-0782/99/1100  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 1999 ACM, Inc.

