Sending and receiving email, writing articles and programs, browsing the Web, and interacting with newsgroups are all ubiquitous computational activities centered on documents. Most users engage in these activities using a collection of applications, each of which may be reasonably good but which collectively represent a document environment that is far from appealing. One deficiency is that each application has its own distinct set of capabilities, user interface style, way of setting preferences, perhaps even scripting language. In effect, each application is its own world to which the user has no choice but to adapt. Adapting to each application means having to learn multiple application-specific ways of doing the same conceptual task. Moreover, the ability to perform a particular task may not be available at all in some contexts, not because it is useless or impossible to provide, but because a particular application does not happen to support it.
Current document environments represent not only a fragmented world but a rigid one as well. Specifically, it is difficult even for programmers to introduce into existing applications new functionality not anticipated by the application's designers. Each application is likely to be extensible in some dimensions, but genuine innovations require deep changes only the application vendor can provide, if it is possible to supply them at all. Innovators have to be content with demonstrating new ideas in toy applications in the laboratory from which these ideas escape only through willful adoption by vendors. Because vendors target the broadest possible commercial markets, and because some innovations would appeal to only a small community, these innovations are likely to become orphan features. Moreover, the fragmented nature of the document environment compounds the extensibility limitation, since an extension that works for one application is unlikely to work for another, hence weakening the innovator's motivation for producing it in the first place.
In a sense, the difficulty of extending document applications and the fragmented nature of document environments are the same problem. Existing document applications are generally centered on a particular document format, and extending the application to handle other formats is a prime example of the type of extension we claim is difficult. Thus, Web browsers can display HTML but rely on external applications for handling other formats, perhaps through plug-ins, each of which has its own distinct modes of interaction and user interface conventions. Support for a new format by the browser is something only the vendor can supply, if it is possible to supply it at all.
To confront both problems, the user needs a single, radically extensible document environment in which even deep changes, such as support for new formats and new user interfaces, can be made available as extensions. Moreover, once a feature is provided, it would apply everywhere, or at least wherever it makes sense to do so. The details of the user's application of the feature would not depend idiosyncratically on either format or application details.
Over the past several years, we've been developing a digital document model called Multivalent Documents (MVD) [7] that promises to provide such capabilities. MVD is a highly open, distributed, and extensible document framework, in effect, a document-centric shell developers can extend to manifest arbitrary document functionality. Developers extend MVD by writing "behaviors," or pieces of code conforming to the MVD protocol. Behaviors can be used to bridge new document formats into the framework and implement generic document functions while providing capabilities not otherwise readily available today. The MVD framework provides for different behaviors to compose together gracefully, so what might be fragments of functionality written by different developers appear to a user as a coherent, if unusually powerful, document system.
Here we explore and explain some of the possibilities enabled by MVD by presenting some of the capabilities we have implemented in this framework, emphasizing a number of distributed in situ annotations over multiple document formats. In the process of presenting these capabilities, we describe how MVD supports essentially arbitrary document formats, including common ones like HTML, TeX DVI, PDF, and ASCII, as well as new genres, including "enlivened legacy documents," or digitized paper documents, together with open-ended functionality. We also outline the MVD architecture and how it supports these and other capabilities.
Users interact with MVD as they might with most other document systems, opening or creating documents, interacting with them via menus, keyboards, and mice, and saving results. But unlike other systems, MVD has neither a native document format nor built-in functionality. Instead, an MVD document specifies a collection of behaviors, each of which may include associated pieces of content, or "layers." For example, an MVD document might specify a behavior that understands HTML, along with the location of an HTML layer. Such a document would also make use of many other behaviors. Some would implement near-universal document functionality; others might represent a site- or document-specific function; for example, a behavior might implement a novel search method. Opening a multivalent document would appear to the user much like opening the corresponding HTML page with a Web browser, except the functionality provided by the novel behavior would also be available. If, instead of HTML, the document itself specified a PDF-cognizant behavior, along with a PDF layer, then opening the document would resemble opening a PDF document in a suitable "previewer," except that, again, the functionality of the novel search behavior would be available as well. Moreover, the same set of document manipulations and user interface would be available in both cases.
Most MVD documents have a primary layer, called the "base layer," from which most of the document content is drawn. (Indeed, if asked to open a non-MVD document, say, an HTML document, MVD would impute this document to be an MVD document whose base layer is of the HTML type.) It is possible (in fact, common) for other behaviors to contribute content as well. For example, suppose a user would like to add a hyperlink to a document in a format that does not support hyperlinks. One can do so by including a separate hyperlink behavior in the document. This behavior would refer to some portion of the primary document content, supplying the link destination as its content. This behavior would be responsible for managing appropriate formatting and user interactions.
How is such a behavior added to an MVD document? Typically, a user would just select some text and choose a menu item that adds a hyperlink. Behind the scenes, some behavior would have provided this menu entry, as well as the functionality for it, to produce a hyperlink. The user need not know anything about the architecture or even about the existence of behaviors. Moreover, after creating the desired hyperlink, the act of saving the document would produce a persistent representation of the behaviors needed to create the dynamic document. Thus, opening the saved document would recreate the document with the hyperlink, even though the original base layer content has not been altered and, indeed, may be unmodifiable by this particular user.
Behaviors and layers can be downloaded at any time, allowing the system to be continually upgraded, modified, and updated. Improvements can operate anywhere in existing environments, without requiring services to be updated to support new features. Any type of document can be bridged into the framework; existing behaviors operate on all types. Every aspect of the system is subject to improvement, allowing developers to enhance and customize it to taste.
We have used MVD to implement a number of document manipulations we refer to collectively as "annotation" [6]. In MVD, annotations are behaviors that are useful annotatively, that is, they allow users to attach additional content to existing content. Because annotations are just behaviors, MVD annotations are distributed; they can be stored separately from the documents they annotate yet appear in situ, as if they were an intrinsic part of document content. These annotations may perform a number of functions, including: actively manipulating the documents to which they are attached; becoming attached to any document one can view, whether or not the underlying document format supports such an annotation type; and behaving robustly in the presence of change in the underlying document. Moreover, the resulting annotated document can be shared with other users without special server-side support.
We find it convenient to divide annotative behaviors into three classes: point-to-point spans of media elements; geometric regions of a document presentation; and structures within the document tree.
Spans. Spans are sequences of document content. Behaviors associated with spans can specify or alter typographic characteristics to be used to render the span and associate actions with user events relating to the span. For example, a hyperlink can be implemented as a behavior attached to a span. The behavior would use some typographic feature (such as underlining) to indicate the hyperlink's presence; associated actions cause the mouse pointer to change when moved over the span and a new document to be opened through a mouse click on the span.
Figure 1 shows how we used MVD to add four annotations to an HTML document. We used an MVD behavior that understands HTML for "correctively" parsing the HTML layer, converting it to the internal format used by MVD, and formatting and "painting" the document for display. Other behaviors annotate this HTML content. For example, a "copy editor replacement" behavior references a span of HTML content (the word "Acrobat") and suggests it be replaced by the text "PDF." Another copyediting behavior suggests deleting the word "that," another, italicizing a phrase, and yet another, inserting an omitted parenthesis. These copyediting behaviors appear somewhat like the change-tracking functions in editor tools. However, an important difference is they are not part of the content they annotate, even though they appear in situ. Indeed, annotators need not have any write privileges on the underlying base layer nor have to make a persistent copy of the document for the sake of providing such comments.
Nevertheless, useful functionality can still be provided. In particular, copyediting behaviors can be executable. For example, double-clicking on the replacement span in Figure 1 would cause the string "Acrobat" to be replaced by the string "PDF" in the text of the base layer.
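A minimal Java sketch may make the idea of an executable copyediting span concrete. It assumes a toy base layer made of words; the class and method names (`CopyEditSketch`, `BaseLayer`, `ReplaceSpan`) are hypothetical and are not taken from the MVD implementation. The span is stored apart from the layer, can render its suggestion in place without touching the layer, and changes the layer only when executed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the MVD code base.
class CopyEditSketch {
    /** Stand-in for a base layer: an editable list of words. */
    static final class BaseLayer {
        final List<String> words = new ArrayList<>();
        BaseLayer(String text) {
            for (String w : text.split(" ")) words.add(w);
        }
        String text() { return String.join(" ", words); }
    }

    /** A replacement suggestion kept separately from the layer it annotates. */
    static final class ReplaceSpan {
        final int wordIndex;      // position of the annotated word
        final String expected;    // the word the annotator saw
        final String suggestion;  // the proposed replacement
        ReplaceSpan(int wordIndex, String expected, String suggestion) {
            this.wordIndex = wordIndex;
            this.expected = expected;
            this.suggestion = suggestion;
        }

        /** Show the suggestion in place; the layer itself is not modified. */
        String render(BaseLayer layer) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < layer.words.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(layer.words.get(i));
                if (i == wordIndex) sb.append(" [-> ").append(suggestion).append(']');
            }
            return sb.toString();
        }

        /** Apply the suggestion; returns false if the annotated word is no longer there. */
        boolean execute(BaseLayer layer) {
            if (wordIndex >= layer.words.size()) return false;
            if (!layer.words.get(wordIndex).equals(expected)) return false;
            layer.words.set(wordIndex, suggestion);
            return true;
        }
    }

    public static void main(String[] args) {
        BaseLayer layer = new BaseLayer("Save the file as Acrobat today");
        ReplaceSpan span = new ReplaceSpan(4, "Acrobat", "PDF");
        System.out.println(span.render(layer));  // annotated view, layer untouched
        span.execute(layer);                     // stands in for the double-click
        System.out.println(layer.text());
    }
}
```

Keeping `render` and `execute` separate mirrors the distinction drawn in the text: viewing a suggestion needs no write access to the base layer, while accepting it does.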
Figure 2 shows MVD manipulating a quite different document format: an "enlivened scanned page image." That is, we created a document type by writing a behavior combining scanned page images with OCR analysis of the images to enable users to manipulate such documents as "electronic paper." The document would have the appearance of a scanned image but could be manipulated as if it were a structured document in a word processing application. For example, even though users are presented with a document image, they can select text from the document, search for and highlight target terms, and perform other manipulations.
Figure 2(a) shows some spans annotating this kind of document. One span in the figure is a hyperlink presented by underscoring the associated document content in blue along with the user-interface properties users have come to expect from hyperlinks. Another span on this page is a highlight, designed to resemble a yellow highlight marker pen. The results of a search are also displayed by spans marked by red boxes. The current selection (displayed with an orange background) is also implemented as a span.
Exactly the same behaviors (or the very same object code) could be used to annotate HTML documents as well. In this case, the functionality would appear to be less novel, as these are the sort of manipulations typically available for structured documents. Note though that the MVD architecture provides the annotative use of these capabilities at no additional cost. That is, a user could use the behaviors to highlight text in someone else's HTML page or add an additional hyperlink to that page, without having write permission on the underlying document or creating a copy of the underlying document.
To illustrate the multiformat nature of MVD behaviors, Figure 2(b) shows the same page as Figure 2(a) but with two copyediting spans. The span on the left extends over the word "equation" and simply provides a comment. The span on the right extends through the word "J" and recommends it be replaced with the text "Journal." Again, rather than placing the annotations in the margin or in a separate window, they are placed as if the document includes them in situ. To fit themselves into the document presentation, the annotations request additional inter-line spacing, or leading, between lines of text; MVD responds to the request with a subtle reformatting of the scanned page image.
Unlike simple overlays, spans are anchored to document contents. Thus, during runtime document manipulations, the annotation remains properly positioned. Moreover, when spans save themselves in a persistent MVD document, they save content-anchored positioning information, so if the base document changes later, heuristics can attempt to position the spans correctly.
Lenses. This class of multivalent behaviors [1] affects the geometric regions of a document's appearance. Like spans, MVD lenses can modify content-display parameters and receive events.
Figure 3(a) shows two examples of lenses. Toward the upper-left portion of the screen is a "Show OCR" lens. Inside this lens, the image text is replaced by the results of an OCR process, rendered in the font the OCR software identified as the one used in the original text. Toward the lower-right portion of the screen is a "Bit Magnify" lens enlarging the image beneath it. The effects compose, so that, for example, positioning the "Bit Magnify" lens over the "Show OCR" lens computes and presents a magnified portion of the OCR-rendered content.
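The composition of lenses over overlapping regions can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: content at a position is modeled as a string, each lens is a rectangle plus a transformation, and lenses are applied bottom to top. The names `LensSketch`, `Lens`, and `paint` are hypothetical, and the string wrappers merely stand in for real OCR rendering and magnification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: the names and the string-based "rendering" are assumptions.
class LensSketch {
    static final class Lens {
        final String name;
        final int x0, y0, x1, y1;             // covered region: [x0, x1) by [y0, y1)
        final UnaryOperator<String> effect;   // how the lens changes what is drawn
        Lens(String name, int x0, int y0, int x1, int y1, UnaryOperator<String> effect) {
            this.name = name;
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
            this.effect = effect;
        }
        boolean covers(int x, int y) {
            return x >= x0 && x < x1 && y >= y0 && y < y1;
        }
    }

    /** Paint one position: apply, from bottom to top, every lens that covers it. */
    static String paint(String base, int x, int y, List<Lens> stack) {
        String out = base;
        for (Lens lens : stack) {
            if (lens.covers(x, y)) out = lens.effect.apply(out);
        }
        return out;
    }

    /** Two overlapping lenses, loosely modeled on the pair described in the text. */
    static List<Lens> demoStack() {
        List<Lens> stack = new ArrayList<>();
        stack.add(new Lens("Show OCR", 0, 0, 10, 10, s -> "ocr(" + s + ")"));
        stack.add(new Lens("Bit Magnify", 5, 5, 15, 15, s -> "magnify(" + s + ")"));
        return stack;
    }

    public static void main(String[] args) {
        List<Lens> stack = demoStack();
        System.out.println(paint("pixels", 2, 2, stack));    // under "Show OCR" only
        System.out.println(paint("pixels", 7, 7, stack));    // under both lenses
        System.out.println(paint("pixels", 12, 12, stack));  // under "Bit Magnify" only
        System.out.println(paint("pixels", 20, 20, stack));  // under no lens
    }
}
```

In the overlapped region the upper lens receives the output of the lower one, which is the sense in which the effects compose.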
Implementing notes is a naturally "annotative" use of lenses. Notes are just opaque lenses with their own full-featured document content. Figure 3(b) shows an HTML page annotated with a note whose contents include a hyperlink to a location further down the page, allowing the person reading it to click on the link to be transported to an off-screen comment.
Structures. Structural behaviors hook into the document tree and implement functions applicable to the document's structurally meaningful portions. For example, suppose the user were to select, in Figure 3(a), one of the bibliographic entries in the image. These entries have had associated with them instances of a "bibliographic" behavior, allowing the user to modify the selection protocol; subsequent selection moves to the cut buffer either BibTeX or reformatted text, instead of the default text, as determined automatically by the OCR process. A user could also modify this behavior to produce other types of formatting.
Combining annotations. The user often finds it useful to be able to combine several kinds of annotations. In Figure 4, we did so to achieve a flexible outline mode. A structural behavior readily implements a conventional outline mode by catching clicks on the section header to toggle a subtree's "visibility" graphics property. However, within "collapsed" sections, segments can be made visible by attaching span annotations whose properties have higher priority than those of the structural annotations. We call such spans "Notemarks," because they function as both a note of information important enough to override the collapsed section and as a bookmark that can be clicked on to open the outline and scroll to that point.
We illustrate Notemarks with another document format designed for Unix manual pages. We created a behavior specific to Unix manual pages that examines a text layer looking for manual page structures and creates a tree with spans and attached structural behaviors that present the page in some stylized fashion. Figure 4 shows an MVD document created in this way from the text of the "file" command manual page. The headings of each section are shown in a large font size, preceded by a triangle whose orientation displays the state of section collapse. Initially, the sections are all collapsed, so their contents are generally not visible. However, lines illustrating subcommands have spans overriding the collapse, so they are visible. Moreover, the behavior developer can choose to make other annotation types behave like Notemarks by having them set the visibility property and giving them high enough priority. We have made Notemarks from other annotation types for search matches, highlights, and copyediting marks, producing a concise "copy editor's summary" of all the areas that may need revision.
As in our other examples, Notemarks functionality is not limited to a particular document format. Turning on the equivalent of Notemarks in an HTML document collapses the document along major HTML structures.
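The priority rule behind Notemarks can be sketched as follows. The sketch assumes a document made of numbered lines and treats each annotation as a claim about the visibility of a range of lines; the claim with the highest priority wins. All names (`NotemarkSketch`, `VisibilityClaim`, `render`) and the numeric priorities are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of priority-based visibility; not the MVD property model.
class NotemarkSketch {
    /** One annotation's claim about whether a range of lines is visible. */
    static final class VisibilityClaim {
        final int firstLine, lastLine;  // inclusive range of covered lines
        final boolean visible;
        final int priority;             // the larger priority wins
        VisibilityClaim(int firstLine, int lastLine, boolean visible, int priority) {
            this.firstLine = firstLine;
            this.lastLine = lastLine;
            this.visible = visible;
            this.priority = priority;
        }
        boolean covers(int line) { return line >= firstLine && line <= lastLine; }
    }

    /** A line is visible unless the highest-priority claim covering it says otherwise. */
    static boolean isVisible(int line, List<VisibilityClaim> claims) {
        boolean visible = true;
        int best = Integer.MIN_VALUE;
        for (VisibilityClaim c : claims) {
            if (c.covers(line) && c.priority > best) {
                best = c.priority;
                visible = c.visible;
            }
        }
        return visible;
    }

    static List<String> render(List<String> lines, List<VisibilityClaim> claims) {
        List<String> shown = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (isVisible(i, claims)) shown.add(lines.get(i));
        }
        return shown;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "SECTION HEADING", "body line one", "subcommand example", "body line two");
        List<VisibilityClaim> claims = Arrays.asList(
            new VisibilityClaim(1, 3, false, 1),   // structural behavior: section collapsed
            new VisibilityClaim(2, 2, true, 10));  // Notemark span: show this line anyway
        System.out.println(render(lines, claims));
    }
}
```

Turning another annotation type into a Notemark then amounts to having it issue a visible claim with a sufficiently high priority.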
The MVD architecture. When evaluating these examples, it is useful to bear in mind that none of the related functionality is built into MVD. Integrating a document format, creating a new genre, implementing standard document functions, and providing more novel functionality, such as copyediting annotations and lenses, are all done by writing behaviors. Allowing the system to integrate such powerful behaviors coherently is the primary design goal of the MVD architecture [7], which includes a number of key features:
The MVD protocol suite. To allow arbitrary extensibility of all aspects of the model, we have deliberately opened each of MVD's fundamental runtime operations with an extensible protocol. Behaviors implement functionality by contributing methods to one or more protocols. The life cycle begins with document instantiation (the "restore" protocol), or the assembly of a document's components, during which behaviors and layers are loaded and the behavior methods inserted into their appropriate places in the other protocols. The "build" protocol is then started. The "build" methods use the information in the layers to create an internal data structure representing the document. The "format" protocol methods lay out the resulting document, and the "paint" protocol methods render the document on a screen or translate it to a format suitable for printing. At this point, when the document is rendered, the "events" protocol is started. An "event" loop waits for input from a keyboard, mouse, or other input device and hands it to the methods implementing the protocol. Among other things, events can trigger the "save" protocol and select a portion of the document via the "clipboard" protocol. Moreover, an event might trigger some action that requires looping back to an earlier phase to rebuild, reformat, or repaint the document.
These protocols may seem rather conventional. Indeed, that is the point. Because they are so banal, special-case support is difficult to hide in the infrastructure. Instead, developers have to rely on behaviors to extend the protocols for all interesting functionality. Documents specify an order in which protocols evaluate methods contributed by behaviors. Some protocols have "before" and "after" stages, allowing behaviors a chance to inspect document state after lower-priority behaviors have run. This scheme allows for very flexible extension. It has allowed us to implement all the capabilities described earlier, including such non-obvious ones as lenses.
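One way to picture the phased protocols and their "before" and "after" stages is the following Java sketch. It rests on assumptions the text does not spell out: that higher-priority behaviors run their "before" stage first and their "after" stage last, and that looping back to an earlier phase re-runs every later phase. The names `ProtocolSketch`, `Phase`, `Behavior`, and `runFrom` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the ordering rule is an assumption, not the MVD API.
class ProtocolSketch {
    enum Phase { RESTORE, BUILD, FORMAT, PAINT }

    static final class Behavior {
        final String name;
        final int priority;  // larger values run "before" earlier and "after" later
        Behavior(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
        void before(Phase p, List<String> trace) { trace.add(p + ":" + name + ".before"); }
        void after(Phase p, List<String> trace) { trace.add(p + ":" + name + ".after"); }
    }

    /** Run every phase from 'start' onward and return the order of the calls. */
    static List<String> runFrom(Phase start, List<Behavior> behaviors) {
        List<Behavior> ordered = new ArrayList<>(behaviors);
        ordered.sort((a, b) -> Integer.compare(b.priority, a.priority));
        List<String> trace = new ArrayList<>();
        for (Phase p : Phase.values()) {
            if (p.compareTo(start) < 0) continue;  // skip phases that need no redo
            for (Behavior b : ordered) b.before(p, trace);
            // "after" runs in reverse, so a high-priority behavior sees the
            // document state left behind by all lower-priority behaviors.
            for (int i = ordered.size() - 1; i >= 0; i--) ordered.get(i).after(p, trace);
        }
        return trace;
    }

    public static void main(String[] args) {
        List<Behavior> behaviors = Arrays.asList(
            new Behavior("adapter", 1), new Behavior("lens", 10));
        // An event that only invalidates layout loops back to FORMAT, then PAINT.
        for (String step : runFrom(Phase.FORMAT, behaviors)) System.out.println(step);
    }
}
```

Under these assumptions a lens can, for example, adjust painting parameters in its "before" stage and restore them in its "after" stage, bracketing everything drawn by lower-priority behaviors.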
Media of various types are integrated into MVD by behaviors called "media adapters." During the build stage, media adapters contribute to the construction of an internal document tree. This tree encodes medium-independent structure in its internal nodes and medium-specific data at the leaves. Behaviors other than media adapters operate on the medium-independent internal nodes and communicate with particular media types at the leaves through the protocols. Thus, such behaviors can (and should) be written once without special accommodation for any particular medium and, as much as it applies to a given medium, operate on all media types.
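The division of labor between media adapters and other behaviors can be illustrated with a small Java sketch. Two toy adapters, one for plain text and one for OCR output from a scanned page, build the same kind of tree; a search-like behavior is then written once against the tree and works on both. Every name here (`MediaTreeSketch`, `AsciiWord`, `ScannedWord`, `countMatches`) is hypothetical, and the leaf interface is reduced to a single `text()` method for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a medium-independent tree with medium-specific leaves.
class MediaTreeSketch {
    static abstract class Node {
        final List<Node> children = new ArrayList<>();
    }
    /** Medium-independent structure, such as a paragraph or section. */
    static final class Internal extends Node {
        final String role;
        Internal(String role) { this.role = role; }
    }
    /** Medium-specific content; behaviors see it only through text(). */
    static abstract class Leaf extends Node {
        abstract String text();
    }
    static final class AsciiWord extends Leaf {
        final String word;
        AsciiWord(String word) { this.word = word; }
        String text() { return word; }
    }
    static final class ScannedWord extends Leaf {
        final int x, y;    // position of the word in the page image
        final String ocr;  // what the OCR process read at that position
        ScannedWord(int x, int y, String ocr) { this.x = x; this.y = y; this.ocr = ocr; }
        String text() { return ocr; }
    }

    /** Toy media adapter for plain text. */
    static Node adaptAscii(String line) {
        Node para = new Internal("paragraph");
        for (String w : line.trim().split("\\s+")) para.children.add(new AsciiWord(w));
        return para;
    }
    /** Toy media adapter for a scanned page with OCR results. */
    static Node adaptScan(String[] ocrWords) {
        Node para = new Internal("paragraph");
        for (int i = 0; i < ocrWords.length; i++) {
            para.children.add(new ScannedWord(10 * i, 0, ocrWords[i]));
        }
        return para;
    }

    /** A behavior written once: count leaves matching a term, whatever the medium. */
    static int countMatches(Node node, String term) {
        int n = 0;
        if (node instanceof Leaf && ((Leaf) node).text().equalsIgnoreCase(term)) n++;
        for (Node child : node.children) n += countMatches(child, term);
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(adaptAscii("the lens and the span"), "the"));
        System.out.println(countMatches(adaptScan(new String[] {"The", "page", "image"}), "the"));
    }
}
```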
Robust anchoring. MVD documents are composed of layers that may be under the control of different authorities. Rather than attempt to impose a requirement of strict synchronization, the MVD implementation provides support for redundant descriptions of internal locations across layers. This description includes three different types of location descriptors: the location's structural position in the document's internal tree representation (similar to HyTime's [2] TREELOC), along with any media-type-specific offset into the leaf node; an excerpt of the surrounding content; and a unique identifier. If the document is restored later (with the edited base document or other layers upon which it depends), MVD's location-reattachment method employs a series of strategies to try to reattach the anchor at a new appropriate location.
The details of this robust intradocument location scheme are described in [8]; our (still limited) experience has shown it to be remarkably robust. In one case, we created annotations on the Defense Advanced Research Projects Agency home page and then left them unchanged for more than a year. During that time, each annotation reattached correctly despite many alterations of the page. In one such alteration, the page was apparently imported into an HTML authoring tool, which placed the entire document content into cells of a newly introduced table. In another, the page was subdivided into frames.
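The flavor of reattachment with redundant descriptors can be conveyed by a simplified Java sketch. It is not the algorithm of [8]: the document is flattened to a list of paragraphs, the structural position is a paragraph index plus character offset, and the order in which the three descriptors are tried (identifier, then saved position, then excerpt search) is an assumption. The names `AnchorSketch`, `Anchor`, and `reattach` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified sketch; not the reattachment algorithm described in [8].
class AnchorSketch {
    static final class Paragraph {
        final String id;    // unique identifier, or null if the paragraph has none
        final String text;
        Paragraph(String id, String text) { this.id = id; this.text = text; }
    }

    /** Redundant description of one location, saved with the annotation. */
    static final class Anchor {
        final String id;       // identifier of the target, or null
        final int paragraph;   // structural position when the anchor was saved
        final int offset;      // character offset within that paragraph
        final String excerpt;  // content found at the location
        Anchor(String id, int paragraph, int offset, String excerpt) {
            this.id = id;
            this.paragraph = paragraph;
            this.offset = offset;
            this.excerpt = excerpt;
        }
    }

    /** Returns {paragraph, offset} in the current document, or null if nothing fits. */
    static int[] reattach(Anchor a, List<Paragraph> doc) {
        // Strategy 1: unique identifier.
        if (a.id != null) {
            for (int i = 0; i < doc.size(); i++) {
                if (a.id.equals(doc.get(i).id)) {
                    int at = doc.get(i).text.indexOf(a.excerpt);
                    return new int[] { i, at >= 0 ? at : 0 };
                }
            }
        }
        // Strategy 2: saved structural position, accepted only if the excerpt is still there.
        if (a.paragraph < doc.size()
                && doc.get(a.paragraph).text.startsWith(a.excerpt, a.offset)) {
            return new int[] { a.paragraph, a.offset };
        }
        // Strategy 3: search the whole document for the excerpt.
        for (int i = 0; i < doc.size(); i++) {
            int at = doc.get(i).text.indexOf(a.excerpt);
            if (at >= 0) return new int[] { i, at };
        }
        return null;
    }

    public static void main(String[] args) {
        Anchor anchor = new Anchor(null, 1, 4, "mission");
        List<Paragraph> edited = Arrays.asList(
            new Paragraph(null, "News banner"),
            new Paragraph(null, "Welcome to the agency."),
            new Paragraph(null, "Today our mission is research."));
        System.out.println(Arrays.toString(reattach(anchor, edited)));
    }
}
```

In the example the saved structural position no longer matches because a paragraph was inserted and the sentence was reworded, so the excerpt search supplies the new location.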
The hub document. We noted earlier that when an MVD document is stored, information has to be saved to record the behaviors and layers in the document, so the document can be restored properly. In effect, the persistent form of MVD is a kind of "hub" document: essentially a list, in XML, of the layers and behaviors in the single conceptual document. To open a document, the framework fetches the behaviors specified in the hub document, places their methods into the appropriate protocols, and begins following the protocols.
The framework can write a new hub document and, possibly, new or updated layers as well. The new hub document may reflect changes that occurred during the course of using MVD, including behavior instances created or eliminated and changes to a layer. Each behavior's "save" protocol implementation is responsible for writing the portions of a hub document reflecting the status of behavior instances or changes to a layer.
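To make the round trip through a hub document concrete, here is a hedged Java sketch that writes a list of behaviors and their layers as XML and reads it back with the standard `javax.xml.parsers` DOM parser. The element and attribute names (`hub`, `behavior`, `class`, `layer`) are invented for this illustration and should not be read as the actual MVD hub format; `HubSketch` and `Entry` are likewise hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: the XML vocabulary below is assumed, not the MVD hub format.
class HubSketch {
    /** One behavior named in the hub, with the location of its layer (may be empty). */
    static final class Entry {
        final String behavior;
        final String layer;
        Entry(String behavior, String layer) { this.behavior = behavior; this.layer = layer; }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** The "save" side: each entry writes its own line of the hub. */
    static String save(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("<hub>\n");
        for (Entry e : entries) {
            sb.append("  <behavior class=\"").append(escape(e.behavior))
              .append("\" layer=\"").append(escape(e.layer)).append("\"/>\n");
        }
        return sb.append("</hub>\n").toString();
    }

    /** The "restore" side: recover the list of behaviors and layers to load. */
    static List<Entry> restore(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("behavior");
            List<Entry> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                out.add(new Entry(el.getAttribute("class"), el.getAttribute("layer")));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("unreadable hub document", e);
        }
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("HtmlAdapter", "page.html?a=1&b=2"),
            new Entry("HyperlinkSpan", "links.xml"));
        String hub = save(entries);
        System.out.print(hub);
        System.out.println(restore(hub).size() + " behaviors restored");
    }
}
```

Note that the base layer is referenced by location rather than copied into the hub, which is what lets annotations live apart from a document the annotator cannot modify.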
This description of the MVD architecture indicates how the model can manifest several characteristics:
Moreover, because MVD is implemented as a Java application, MVD documents can be viewed and manipulated on any Java-compliant platform.
Our current thrust is to develop enough media adapters for commonly used formats and to make the behaviors robust enough and easy enough to use that MVD becomes an attractive application for real-world use. Current HTML support is comparable to that of many Web browsers; there is also support for a subset of Cascading Style Sheets, a Web style language recommended by the World Wide Web Consortium. We are also developing media adapters for other document formats, including XML, near-image formats (PDF and LaTeX/DVI in prototype today), and "multipage" document aggregates.
MVD is especially interesting in contexts supporting document collaboration. We are developing related mechanisms for supporting annotation and document sharing and for searching for and cascading MVD documents (such as retrieving and intelligently displaying all the annotations in all the documents that annotate a given document).
We have also begun to experiment with applying MVD to data with some temporal extent (such as sound and video), spatial extent (such as geographic data sets), and both temporal and spatial extent (such as georeferenced sensor data). We have also implemented a second client to support georeferenced data (see elib.cs.berkeley.edu/gis3), but we have not yet explored whether it is possible or desirable to integrate all these data types into a single framework.
We have drawn on a great deal of other work in the area of document environments, document models, and computer-based collaborative work. For example, emacs, an extensible, customizable editor, is an inspiration in the area of extensible systems. But emacs is essentially ASCII-centric, making it difficult for programmers to extend it to many document formats (and impossible for scanned pages) and to new user interface modes (such as lenses). Emacs has many laudable qualities but is unsuitable as the basis for a modern document system.
OpenDoc and OLE both view documents as comprising multiple embedded document segments, each to be interpreted by separate software components. MVD, in effect, provides a third dimension. Thus, it is straightforward for a programmer to introduce behaviors into MVD that operate over multiple formats, whereas it is not possible to do so readily in OpenDoc or OLE.
A number of existing systems support various kinds of in-place annotation, including the annotations facility in Lotus Notes that requires "hooks" be made available for annotation attachment in a given document. ForComment supports individuals in a group making comments on documents in most common word-processing formats. Markup (see www.mstay.com) supports annotation, including copyediting marks, in the Macintosh environment. And the NeXT operating system provides blue-pencil markup over any document rendered as Display PostScript. These systems operate at the graphics level and hence make it possible for any document to be annotated. However, annotation is superficial, in that it can't manipulate document content and is not robust with respect to editing; in addition, these systems require buy-in to a particular operating system. And HyTime [2] and the XML linking language make it possible to mark and link to spans in read-only documents, although such documents must be in the same format.
Microcosm [3], ComMentor [9], and Knowledge Weasel [5] are examples of systems mindful of the fact that it takes a great deal of effort to build a document formatter/renderer; they therefore follow a strategy of interoperating with existing formatter/renderers. In contrast, we have pursued a strategy in the MVD architecture of imposing up-front costs to bridge existing application formats into the model and reproduce the desired pieces of functionality, in exchange for even greater functionality.
Our experience with the research prototype suggests that MVD is a useful step toward more powerful, flexible, intrinsically network-centric document models. We have shown that it is possible for a single platform-independent architecture to handle diverse document formats and provide a high degree of extensibility. Moreover, the MVD architecture can provide new modes of interaction beyond viewing and authoring, including "spontaneous collaboration" via distributed annotations. MVD is a step toward realizing the full potential the medium of digital documents can provide. You are invited to experiment with the prototype and to monitor its progress at www.cs.berkeley.edu/~wilensky/MVD.html.
1. Bier, E., Stone, M., Pier, K., Buxton, W., and DeRose, T. Toolglass and Magic Lenses: The see-through interface. In Proceedings of SIGGRAPH'93 (Anaheim, Calif., Aug.). ACM Press, New York, 1993, 73-80.
2. DeRose, S. and Durand, D. Making Hypermedia Work: A User's Guide to HyTime. Kluwer Academic Publishers, Boston, 1994.
3. Fountain, A., Hall, W., Heath, I., and David, H. Microcosm: An open model for hypermedia with dynamic linking. In Proceedings of the European Conference on Hypertext (ECHT'90) (Paris, France, Nov.). ACM Press, New York, 1990, 298-311.
4. Halasz, F. Reflections on NoteCards: Seven issues for the next generation of hypermedia systems. Commun. ACM 31, 7 (July 1988), 836-852.
5. Lawton, D. and Smith, I. The Knowledge Weasel Hypermedia Annotation System. In Proceedings of Hypertext'93 (Seattle, Nov. 14-18). ACM Press, New York, 1993, 106-117.
6. Marshall, C. Annotation: From paper books to the digital library. In Proceedings of the 2nd ACM Conference on Digital Libraries (Philadelphia, July 23-26). ACM Press, New York, 1997, 131-140.
7. Phelps, T. Multivalent Documents: Anytime, Anywhere, Any Type, Every Way User-Improvable Digital Documents and Systems. University of California, Berkeley, comp. sci. tech. rep. CSD-98-1026, Dec. 17, 1998.
8. Phelps, T. and Wilensky, R. Robust intra-document locations. In Proceedings of the 9th World Wide Web Conference (Amsterdam, The Netherlands, May 15-19, 2000).
9. Roscheisen, M., Mogensen, C., and Winograd, T. Beyond browsing: Shared comments, SOAPs, trails, and online communities. In Proceedings of the 3rd World Wide Web Conference (Darmstadt, Germany, Apr. 10-14). Elsevier, The Netherlands, 1995.
Figure 1. Copyediting marks on an HTML document in a blue-green font, suggesting, in order, the following actions: replacing the word "Acrobat" with "PDF"; deleting a gratuitous "that" (middle of the page); italicizing the words "Roget's Thesaurus, Fifth Edition"; and inserting an omitted parenthesis.
Figure 2. Image document with spans. The spans in (a) include a hyperlink (blue underscore) and a highlight (yellow background). Also shown are spans corresponding to the current selection (orange background) and search results (red boxes). A menu (Anno) has been pulled down, revealing options for creating hyperlink and highlight spans. If the user selects one, the current selection span becomes an additional hyperlink or highlight span. The same scanned image in (b) includes some copyediting spans; one annotates the word "equation" with the comment "which?"; another suggests the text "J." be replaced with the text "Journal."
Figure 3. Geometric, or lens, behaviors: (a) shows a "Show OCR" lens and a "Bit Magnify" lens composing in overlapped regions; (b) shows a note and a "Magnify" lens on an HTML document. The note includes a hyperlink to other annotations offscreen. Attached to the references in (b) are structural behavior instances providing for alternative selection.
Figure 4. Notemarks. Showing through an otherwise collapsed description section are subcommands, search hits, highlights, and copyediting marks. Clicking on the desired area opens the document up to that point.
©2000 ACM 0002-0782/00/0600 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.