According to Tim Berners-Lee, the Web will evolve toward the semantic Web: "To date, the Web has developed most rapidly as a medium of documents for people rather than of information that can be manipulated automatically. By augmenting Web pages with data targeted at computers and by adding documents solely for computers, we will transform the Web into the semantic Web.
Computers will find the meaning of semantic data by following hyperlinks to definitions of key terms and rules for reasoning about them logically. The resulting infrastructure will spur the development of automated Web services such as highly functional agents." [2]
But what if people do not represent machine-processable information at all, or not richly enough, or not in sufficient numbers, to make these services viable? The "information" that appears to be the bottleneck for the adoption of the semantic Web is not data; it is not "7" or "cat." It is the rules and meanings about data, defined precisely enough that machines, not slow, error-prone humans, can correctly interpret and quickly process that data; it is information like "sabbaticals occur every 7 years," or "cats and dogs are mammals." For the semantic Web, ontologies from the AI field are envisaged to codify this information [2].
Therefore, the future of the semantic Web is linked with the future of ontologies on the semantic Web. In order to predict the future of these ontologies, why not look at the history of something similar? Rather than models that codify information on the Web, why not go further back and examine models that codified information on paper? The aim of this article is to examine the evolution of paper-based systems, explain it using a conceptual model of system evolution, apply the model to Web-based systems analogous to paper-based systems, and finally project what happened with paper-based systems to make predictions about how ontologies will evolve, if at all, to make Berners-Lee's vision of the semantic Web viable.
In the use of simple paper documents (such as memos) and HTML, the author is responsible for authoring, and the reader for interpretation and processing. Paper dissemination requires a mechanically enabled physical infrastructure symbolized by the printing press; HTML dissemination requires an electronically enabled virtual infrastructure symbolized by the Internet.
However information is disseminated, the human mind's processing capacity is small relative to the size of the problems requiring processing for an objective solution. Simon [9] calls this bounded rationality. Fox [5] states that bounded rationality compels humans or processors to seek techniques to reduce complexity in information, task, and coordination. His model of evolution of organizational structures to reduce complexity can be applied to explain the evolution of paper-based information manipulation.
Information is too complex when it requires more processing than available in order to be properly analyzed and understood [5]. This complexity is reduced by omission and abstraction. When numerous simple documents need to be examined, requiring them all to be interpreted and processed by the reader is too taxing, especially when they are poorly written, or contain unnecessary or incomplete information. An omission strategy forces the author to submit only sets of information required for processing. An abstraction strategy allows sets of information to be abstracted from one document so that processing can be performed on a set rather than the whole document. For paper documents, this strategy is executed using business forms, which delineate the structure of the document from its contents.
According to Barnett [1], the first business form was a form letter for dispensation of sins, developed by Gutenberg in 1454. What used to be the responsibility of a simple paper document author was decomposed into forms design, and forms data entry. Designers were unlikely to be entering data, so they developed standard operating procedures that data entry clerks could use.
When the volume of actions necessary to accomplish a task becomes too great, the complexity of the task must be reduced [5] through division of labor. What used to be the responsibility of the reader of the simple paper document was decomposed into design of forms processing tasks, and task execution. Forms and task design were centralized and performed by professionals; data entry and task execution, decentralized and performed by clerks. Innovations (circa 1890–1930) [1] enabled further division of labor: counting machines for punch cards and register machines sped processing, and one-write systems and carbon paper eliminated unnecessary task steps.
One way to guide division of labor to reduce the complexity of coordinating different tasks is the near decomposability of a system [10]: construct units within which tasks are performed such that interactions between units are minimal. Strategies for reducing coordination complexity are predicated upon this principle [5]. One strategy is contracting. Information and tasks required to achieve an outcome may be too complex for one organizational unit to coordinate. That unit may choose to contract to a near decomposable unit, the contractor, which then assumes management of complexity in return for a contract price subject to contractual terms. Many businesses outsourced forms design and production to specialized printing houses such as Moore Business Forms (now called Moore Corp.), because large-scale forms production was prohibitively expensive. Low-cost office typesetting (circa 1950) changed this. A near decomposable unit, the organizational systems department (often subsuming a forms department), was created by many businesses, equipped with typesetters and staffed by forms and task designers. Hence, a specialized functional division arose, whose birth can be explained by the following: organizational substructuring toward functional or product orientation, depending on the characteristics of problems faced by an organization, also reduces coordination complexity.
The last significant electromechanical innovations (circa 1960–1970) were electrostatic and xerographic photocopying, which enabled inexpensive, high-quality, large-volume replication. As photocopiers became available outside the organizational systems department, forms users reduced their dependency on the department by photocopying "bootleg" forms: extra copies of legitimate forms as well as customized ones. The use of slack resources to decouple dependent tasks is a third coordination complexity reduction strategy. However, these bootleg forms also introduced uncertainty:
For example, a data processing clerk could not process a bootleg form that seemingly contained required information, though expressed ambiguously; or, a system tuned to process a certain volume of completed forms could not cope with additional volumes of user-photocopied forms. Uncertainty introduced by bootleg forms to an efficient forms processing system led to efficiency loss.
With the advent of widespread computerized data processing, systems based on paper business forms were transformed into systems manipulating digitized data; organizational systems departments of forms and process designers gave way to MIS departments of database and programming analysts. One aim of process reengineering (circa 1990s) was to redesign computerized systems that had gradually evolved from forms-based systems and hence were still predicated upon some mechanistic and manual restrictions of forms use that no longer applied.
If reengineers had understood how the adoption of innovations led to changes in an organization's forms-based systems, they would have been able to systematically identify the components of the evolved system most amenable to redesign as the ones developed to implement outdated innovations. Moreover, if they could explain changes to forms-based systems using a model such as Fox's, they might have been able to make some predictions about how their redesigned system would evolve as vanguard innovations were eventually adopted. Taking this approach in the early 1990s, would some prescient business process reengineering expert have designed a flexible, not necessarily optimally efficient, inventory management system that could be integrated with customers' systems using the Internet? Such an approach is taken here to predict how ontologies for the semantic Web may evolve.
XML and ontologies are two means of explicitly representing information, applied so that a reader interprets shared data as the data author intended. XML use for the Web is analogous to business forms use, since informational structure, represented in DTDs (terminology), is delineated from content, represented as XML data (<foo>7</foo>).
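As a minimal sketch of this delineation (using the article's foo example; the record wrapper element and the DTD fragment are illustrative assumptions, and the standard-library parser shown does not validate against the DTD):

    import xml.etree.ElementTree as ET

    # Structure ("the blank form"): a DTD fragment declaring which elements
    # may appear and how they nest. Terminology only, no content.
    DTD = """<!ELEMENT record (foo)>
    <!ELEMENT foo (#PCDATA)>"""

    # Content ("the filled-in form"): an XML instance supplying only values.
    DATA = "<record><foo>7</foo></record>"

    record = ET.fromstring(DATA)
    print(record.find("foo").text)  # -> '7'

The parser recovers the content, but nothing in the DTD or the data tells a machine what "foo" means.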
The definition of ontology used here is that it "consists of a representational vocabulary with precise definitions of the meanings of the terms of this vocabulary plus a set of formal axioms that constrain interpretation and well-formed use of these terms" [3]. This is more restrictive than a "lowest common denominator" definition: "an ontology may take a variety of forms, but necessarily it will include a vocabulary of terms, and some specification of their meaning" [7]. For the semantic Web, an ontology must be expressed in a formal language so that a given ontology expression can be interpreted and processed unambiguously by a machine. Models for communicating vocabulary and structure to humans, such as Yahoo!'s taxonomy [8] ("lightweight ontologies") and most conventional ER diagrams, are too informally expressed for automatic machine processing of semantics. Ontology use for the semantic Web, then, is analogous to the use of business forms with standard operating procedures, since informational structure is represented as terminology; rules governing proper interpretation of the structure, as formal definitions and constraints (semantics or meanings); and content, as ontology ground terms (foo(7)).
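By way of contrast, here is a minimal sketch of the same information recast in ontology terms, written in plain Python rather than any particular ontology language; the sabbatical reading of foo and the exact axiom are illustrative assumptions borrowed from the earlier "sabbaticals occur every 7 years" example:

    # Vocabulary: the term "foo" with a precise, machine-readable definition
    # and a formal axiom constraining its well-formed use (illustrative).
    ontology = {
        "foo": {
            "definition": "number of years between consecutive sabbaticals",
            "axiom": lambda x: isinstance(x, int) and x == 7,
        }
    }

    # Ground term foo(7): content expressed against the ontology.
    term, value = "foo", 7

    # A machine can check the data against the axiom and retrieve the meaning,
    # rather than assuming author and reader share an interpretation of "foo".
    assert ontology[term]["axiom"](value)
    print(ontology[term]["definition"])

Here the meaning and the constraint travel with the terminology instead of being left to the reader.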
Shared understanding about a community, the information its members possess, is always applied in solving problems in that community. The terminology used by community members can be codified as the community's DTDs. Ontologies, as "explicit representations of shared understanding" [6], can also be used to codify the terminology's semantics. For example, in using XML it must be assumed that the author and reader of <foo>7</foo> have the same understanding of what "foo" means. This assumption need not be made in ontology use, since "foo" can be explicitly defined. In comparing ways of codifying shared understanding for the semantic Web, it must be acknowledged that XML is a much more mature technology than ontologies in terms of the size of its user community, the availability of support tools, and the viability of business models relying on the technology. Therefore, ontologies can be expected to be adopted only in situations where the capability to represent semantics is important enough to overcome XML's maturity advantages. What are the characteristics of these situations?
For forms-based systems, innovations were adopted over existing technologies to reduce information, task, and coordination complexity, or uncertainty. If it is accepted that forms are analogous to XML/ontologies, and that XML is a more mature but less innovative technology than ontologies for the semantic Web, then it is reasonable to state the following: ontology adoption will occur in situations where complexity or uncertainty is reduced more by ontology use than by XML use. Specifically, this occurs when semantics reduce complexity or uncertainty. So, the pros and cons of XML and ontology use are first analyzed in terms of the three complexity reduction principles.
Bounded rationality. XML use is less complex since semantics are not represented. Whereas many people can identify and classify terms, only some can systematically express the meanings of those terms or represent them in a formal language. With XML use, however, there is increased uncertainty because crucial information for interpreting shared data may not be represented. In situations where it is reasonable to assume that shared understanding can be applied implicitly (for example, by assuming everyone has been uniformly trained) or informally (by assuming user manuals are referenced), the uncertainty of omission is mitigated.
Division of labor. There is a clearer delineation of responsibilities in XML use. DTD and data-sharing task designs are performed by professionals; data entry and data sharing are performed by computers with some manual intervention. It may not be possible to automate data entry for ontology use, or even to delegate it to clerical staff, because definitions and axioms must sometimes be entered, and formulating them requires skills beyond the merely clerical. Therefore, tasks for manipulating XML data are likely more efficient. However, for automated data sharing, an XML-based system will be more susceptible to data that cannot be interpreted properly than an ontology-based system, which is able to apply semantics for interpretation.
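To make the last point concrete, a small hedged sketch (the cat/dog/mammal classes echo the article's earlier example; the toy subsumption check stands in for a real reasoner) shows how an ontology-based consumer can interpret data that an XML-based consumer, seeing only a literal tag, cannot:

    # Toy subclass axioms: cat and dog are kinds of mammal.
    SUBCLASS_OF = {"cat": "mammal", "dog": "mammal"}

    def is_a(term, target):
        """Follow subclass axioms upward to decide whether term denotes a target."""
        while term is not None:
            if term == target:
                return True
            term = SUBCLASS_OF.get(term)
        return False

    incoming = "cat"  # tag of an incoming data item

    # XML-based sharing: only the literal tag is available, so data tagged
    # "cat" sent to a task expecting "mammal" cannot be interpreted.
    print(incoming == "mammal")      # False: data rejected

    # Ontology-based sharing: the subclass axiom licenses the interpretation.
    print(is_a(incoming, "mammal"))  # True: data accepted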
Near decomposability. If interactions between near decomposable units are minimal, a corollary is that interactions within a unit are great. Such a unit can then organize to reduce the complexity of those interactions, guided by the principles of bounded rationality and division of labor. As long as a unit can be considered nearly decomposable, bounded rationality and division of labor provide reasons why XML use reduces complexity. However, if near decomposability cannot be assumed, ontology use increases the likelihood that data can still be shared.
The following summarizes the XML vs. ontologies analysis: a unit is nearly decomposable for purposes of data sharing if it is reasonable to assume shared understanding can be implicitly or informally applied to interpret data within that unit (a community). Within a near decomposable unit, it is important to reduce complexity in data sharing. If near decomposability cannot be assumed, reducing uncertainty of data sharing by explicitly and formally defining semantics in ontologies may be warranted. Unless reducing uncertainty is more important than reducing complexity for using the semantic Web, XML will be a better or more proven data-sharing platform than ontologies.
This reflects Fox's statement that as an organization structures to reduce complexity, it simultaneously faces increased uncertainty [5].
Figure 1 presents models in which shared understanding is codified. They reflect structures intended to reduce coordination complexity. In the contracting model, the business network can be considered a near decomposable unit, since data is greatly shared between its companies and services, which are themselves more strongly near decomposable. According to the XML vs. ontologies analysis, XML use for data sharing within the network is then appropriate. An example of this model is Covisint, an online automotive industry exchange using Commerce One's XML-based xCBL. In the functional orientation model, the enterprise is more near decomposable than its departments and functions, so XML use within the enterprise is quite appropriate. For example, WebMethods provides XML-based tools enabling companies to perform the data integration function.
Photocopiers were used as slack resources that loosened the forms user's dependence on the designer, which led to users assuming design responsibility for some forms. A parallel effect for data sharing is the assumption of some of the enterprise's data integration responsibility by departments or other entities within the enterprise. The following presents one such slack resources model.
In this model, the analog to the photocopier is the data modeling tool. Using the tool, knowledge workers (not specialized data modelers) who apply shared understanding in their jobs also codify it. Codified shared understanding is then used to translate data and prepare it for use by an external entity. Bootleg forms produced with photocopiers introduced uncertainty because tasks had not been designed to handle them. Similarly, the data modeling tool gives knowledge workers the ability to codify idiosyncratic shared understanding, resulting in data that requires unforeseen or unexpected idiosyncratic interpretation by another entity. One way to acknowledge that uncertainty is inevitable is not to commit to how data from an entity will be interpreted, hence the "?" shown in the model (see Figure 2).
In this model, it cannot be known a priori whether an entity and another with which its data needs to be shared are enclosed within a near decomposable unit. Complexity reduction afforded by the data modeling tool's use is offset by the uncertainty that uninterpretable data is produced if XML is used. In contrast, knowledge workers can explicitly represent semantics for interpretation and introduce less uncertainty if they use ontologies. Therefore, it is predicted that ontologies for the semantic Web may be widely adopted if there are ontology development tools that can be practically used by knowledge workers, not necessarily by ontologists (specialized ontology modelers).
The tool will be evaluated on factors such as ease of use and the capability to express rich concepts without complex knowledge representation expertise. However, ontology adoption will not depend primarily on these factors. In Figure 2, the rationale for considering an entity as a near decomposable unit is not to codify shared understanding; if it were, ontologists would do the codifying. The rationale is that a business need is satisfied by knowledge workers with useful skills. A popular knowledge management (KM) principle is that people will not contribute to a knowledge base if doing so takes too much time and effort away from their own jobs [4]. Many KM tools (Intraspect is one example) are designed using this principle: information to be shared is codified as a by-product of workers using the tool for tasks important to their jobs, such as processing email.
Jasper and Uschold [7] categorize ontology applications as: neutral authoring, ontology as specification, common access to information, and ontology-based search. Only in ontology as specification (domain ontologies created and used as a basis for specifying and developing software) is the ontology developed in the course of doing some other work, namely software development, and produced as a by-product. Therefore, it is predicted that ontologies are likely to be widely adopted if an ontology developed by the knowledge worker is of use to the worker irrespective of whether it is used for data sharing. Hence, ontologies may first be widely adopted for software specification. It can be argued that "lightweight" ontologies for ontology-based search are already commonly used. However, these ontologies do not conform to the definition of ontologies used in this article, since it is not likely that machines can automatically interpret the representations in such ontologies.
An ontology for software specification is useful even if applied only once, say for a large software project [7]. For early authors on the Web, intellectual curiosity was a compelling enough reason to develop Web sites that most people would never know about. Isolated ontology development for software specification, uncoordinated with other ontology-like efforts (that is, a decentralized approach), is a way of getting practical ontologies onto the semantic Web. Few assumptions can be made about how such ontologies will be used by others, so they should be designed for flexibility and adaptability, and commit little to how they will be used. Therefore, it is predicted that the first phase in the evolution of the semantic Web may be the development of decentralized, adaptive ontologies for software specification.
This article attempts to predict the future of semantic Web ontologies (the Web-based analog to business forms cum standard operating procedures) by analyzing the history of paper-based business forms. Forms innovations were adopted to reduce information, task, and coordination complexity. However, one such innovation, the photocopier, had the by-product effect of increasing uncertainty in forms processing. In evaluating the possible adoption of competing analogs to forms for the semantic Web (XML and ontologies), it has been argued that as long as the pressing need is to reduce complexity, XML use is preferable to ontology use. It has also been posited that the innovation of modeling tools, which allows knowledge workers to codify idiosyncratic information and then expect that information to be shared, will increase uncertainty in data sharing. When this happens, the use of ontologies over XML to codify information will likely be desirable. These predictions confirm what some in the ontology community suspect may happen,¹ and place emphasis on ontology development tools practical for knowledge workers and on decentralized, adaptive ontologies for software specification.
It must be noted as a caveat that these predictions are not founded on a rigorous analytical or empirical model. Rather, they are argued using analogies and a conceptual model, and hence much further research is required to strengthen their validity. Nevertheless, they are the reasonable product of a systematic analysis, and as such will hopefully provoke thought and motivate concrete research questions about the nascent semantic Web. How would the ontology development tool work? How would decentralized, adaptive ontologies be constructed? How would such ontologies be organized for data sharing in the future? The main contribution of this article is that it provides a rationale as to why these may be the pressing questions to ask in order to understand how ontologies and the semantic Web will unfold.
1. Barnett, R. Managing Business Forms. Robert Barnett and Associates Pty Ltd., 1996.
2. Berners-Lee, T., Hendler, J., and Lassila, O. The semantic Web. Scientific American (May 2001).
3. Campbell, A.E. and Shapiro, S.C. Ontological mediation: An overview. In Proceedings of the IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing. AAAI Press, Menlo Park, CA, 1995.
4. Davenport, T.H. and Prusak, L. Working Knowledge: How Organizations Manage What They Know. Harvard Business School Press, Boston, MA, 1998.
5. Fox, M.S. An organizational view of distributed systems. IEEE Trans. Syst., Man, Cybernetics 11, 1 (1981), 70–80.
6. Gruber, T.R. Towards principles for the design of ontologies used for knowledge sharing. In Guarino, N. and Poli, R., Eds., International Workshop on Formal Ontology. Padova, Italy, 1993.
7. Jasper, R. and Uschold, M. A framework for understanding and classifying ontology applications. In IJCAI-99 Ontology Workshop. Stockholm, Sweden, July 1999.
8. Labrou, Y. and Finin, T. Yahoo! as an ontology: Using Yahoo! categories to describe documents. In Proceedings of the 8th International Conference on Information and Knowledge Management. Kansas City, MO, Nov. 1999, 180–187.
9. Simon, H.A. Models of Man. Wiley, New York, 1957.
10. Simon, H.A. The architecture of complexity. Proceedings of the American Philosophical Society 106 (1962), 467–487.
¹Once the infrastructure technologies for representing ontologies on the semantic Web are put into place (after languages like RDF, DAML, and OIL are further developed and standardized).
Figure 1. Near decomposable units for data sharing: XML appropriate.
Figure 2. Near decomposable units for data sharing: Ontology appropriate.