In the Semantic Web research group at Hewlett-Packard Laboratories, Bristol, we frequently circulate items of interest (such as news articles, software tools, links to Web sites, and competitor information). We call these items snippets: information nuggets we would like to store, annotate, and share. Email is not the ideal medium for these tasks; its transient nature means the snippets are effectively lost over time. Yet the risk from using a more formal process, like a centralized database, is that it is both cumbersome to use (a barrier to entry) and overly rigid in its data model (not amenable to storing different types of information). Our need illustrates what I call decentralized, informal knowledge management [5].
We are looking for a system capable of aggregating, annotating, indexing, and searching a community's snippets. The challenges we would face in developing such a system include:
Some commentators in the technology media have suggested blogs make an ideal tool for this kind of knowledge management [10]. But blogging offers only half the solution (roughly the top three capabilities just outlined). For the rest, we must look elsewhere. First, consider today's blogging technologies. One popular blogging site, Google's www.blogger.com, defines blogging as "push-button publishing for the people." Indeed, blogging tools provide several routes to speedy publishing, ranging from online Web forms, to email, to the mobile phone. So in one sense, blogs are a method for the quick creation of Web pages. But with the help of Really Simple Syndication (RSS) and Atom (both XML-based syndication formats for blogs), blogs can publish machine-readable summaries of their content, allowing individual or community aggregators to collect, merge, sort, and index this data. In addition, hyperlinks, trackbacks (essentially reverse hyperlinks that identify who is talking about me, rather than who I am talking to), and recommended blogs (blogrolls) provide both coarse- (blog-to-blog) and fine-grain (item-to-item) link elements for the blogosphere. Blogging is thus a powerful tool for establishing and maintaining an online community.
Blogging's greatest benefit is social, not technological. First, ease of use makes it likely that more people will publish and publish more often, and that more information will be communicated. The structure of the information is often different from more static home pages, more like online journals, or (at a higher level) a series of snippets. Because blogs are interlinked, they lend themselves to a sort of digital ecology [1]. The structured nature of RSS allows bloggers and readers alike to integrate and search information feeds (such as BBC News).
Are these desirable capabilities enough to address and overcome the knowledge-management challenges outlined earlier? No, because in traditional blogging, metadata is used only for headline syndication. It is not extensible, not linked to a rich, flexible data model, and certainly not capable of supporting vocabulary mixing and inferencing. Blog metadata can, however, be made extensible, which is what the existing blog standard RSS1.0 aims to do (web.resource.org/rss/1.0/).
How might Web developers improve and extend the way information is handled on the Web? One contender is the Semantic Web, a common framework that allows data to be shared and reused across application, enterprise, and community boundaries; information is given well-defined meaning, better enabling computers and people to cooperate [3]. RSS1.0 is a Semantic Web vocabulary that provides ways to express and integrate with rich information models (web.resource.org/rss/1.0). The Semantic Web standard Resource Description Framework (RDF) specifies, in essence, a Web-scale information-modeling format (www.w3.org/RDF/). The key element in RDF is the triple, a simple subject-predicate-object construct. Triples can be joined to create a graph-like structure, with subjects and objects as the nodes and predicates as the links, or arcs, between them. Figure 1 outlines a simple example, representing statements concerning this article, expressing roughly the following assertions: "Semantic Blogging" is an article whose author is a person with name Steve Cayzer and title Research Engineer.
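To make the triple model concrete, the following is a minimal Python sketch of the statements in Figure 1, using plain tuples rather than an RDF toolkit. The identifiers `sb:semanticBlogging` and `sb:author1` and the property names (`sb:title`, `sb:author`, `sb:name`, `sb:jobTitle`) are assumptions made for illustration; the figure's actual vocabulary may differ.

```python
from typing import NamedTuple, Set, Tuple, Union


class Literal(NamedTuple):
    """A plain string value: an end point of the graph."""
    value: str


Resource = str  # a graph node, written here as a prefixed name
Triple = Tuple[Resource, str, Union[Resource, Literal]]

# Hypothetical identifiers and property names, chosen for illustration only.
ARTICLE = "sb:semanticBlogging"
AUTHOR = "sb:author1"

graph: Set[Triple] = {
    (ARTICLE, "rdf:type", "sb:Article"),
    (ARTICLE, "sb:title", Literal("Semantic Blogging")),
    (ARTICLE, "sb:author", AUTHOR),
    (AUTHOR, "rdf:type", "sb:Person"),
    (AUTHOR, "sb:name", Literal("Steve Cayzer")),
}


def objects(graph, subject, predicate):
    """Return every object of the triples matching (subject, predicate, ?)."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Because the author is a resource (a node), more statements about it
# can be attached later, possibly by someone else.
graph.add((AUTHOR, "sb:jobTitle", Literal("Research Engineer")))

author = next(iter(objects(graph, ARTICLE, "sb:author")))
print(objects(graph, author, "sb:name"))
```

Following the arc from the article to its author and then to the author's name is a two-step walk over the graph; a literal such as the name cannot be the subject of any further triple.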
RDF provides several useful features for rich information processing:
Semantic Web technologies provide a useful way to address the last three of these desiderata. Consider capturing, say, an announcement (a snippet) posted on a Web page by a chief technology officer of a particular company, then retrieving it by querying on any of the relevant attributes (such as "this month's announcements" and "snippets related to company X"). Now imagine that extra information about the company could be added by a third party or integrated from an external source, and that it could be used to retrieve the snippet. Rich information models allow inference-enabled querying, perhaps employing background ontologies (such as "Find all snippets on this company's competitors").
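The competitor query can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the approach of any particular toolkit: the `ex:` identifiers, the `ex:about` and `ex:competitorOf` properties, and the rule that competition is symmetric are all hypothetical.

```python
# Snippet metadata captured by the community (hypothetical identifiers).
snippets = {
    ("ex:snippet1", "ex:about", "ex:CompanyX"),
    ("ex:snippet1", "ex:postedBy", "ex:ctoOfX"),
    ("ex:snippet2", "ex:about", "ex:CompanyY"),
    ("ex:snippet3", "ex:about", "ex:CompanyZ"),
}

# Background knowledge added later by a third party.
background = {
    ("ex:CompanyX", "ex:competitorOf", "ex:CompanyY"),
}


def merge(*graphs):
    """Merging triple sets is a plain set union."""
    return set().union(*graphs)


def competitors(graph, company):
    """Assumed inference rule: competitorOf holds in both directions."""
    found = set()
    for s, p, o in graph:
        if p == "ex:competitorOf":
            if s == company:
                found.add(o)
            elif o == company:
                found.add(s)
    return found


def snippets_about(graph, companies):
    """Return the snippets whose ex:about value is one of the companies."""
    return {s for s, p, o in graph if p == "ex:about" and o in companies}


graph = merge(snippets, background)
result = snippets_about(graph, competitors(graph, "ex:CompanyY"))
print(result)
```

The snippet about Company X was captured with no mention of Company Y; it becomes retrievable only once the third-party statement is merged in and the symmetry rule is applied.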
The Semantic Web is best viewed as an enabling technology. Although my colleagues and I have shown it is possible to build dedicated Semantic Web snippet-management applications [2], the technology is likely to gain more traction when set within a familiar and lightweight context. In particular, a well-known problem in the Semantic Web is bootstrapping, or creating enough initial metadata to generate the network effect. In other words, how can we give information providers enough immediate value to encourage them to produce consistent, high-quality metadata in the first place? Blog software developers must do three things:
Each can be explored by embedding Semantic Web technology within a blogging framework.
Semantic blogging is an attempt to use the desirable features of both blogging and the Semantic Web; examples include:
Blog software developers must keep snippet capture lightweight, yet allow snippets to be stored in an accessible knowledge repository. Post-hoc enrichment provides adaptability and future proofing. As the carriers of snippets (think of a blog entry as an annotation attached to a snippet), blog entries can facilitate the linking of semantic blog output with other snippet data sources.
Semantic blogging demonstrator. I built a simple prototype application in 2003 as part of a pan-European project called Semantic Web Advanced Development-Europe (SWAD-E), a European Union-funded effort to bring Semantic Web technologies to a broad developer community. I set the prototype in the domain of a small group's bibliography management effort while retaining the essential characteristics of a more general snippet manager. My vision of semantic blogging is that Semantic Web technologies will enable new blogging modalities that would be difficult or impossible otherwise. For the purpose of the demonstrator, I chose three types of functionality:
Figure 2 outlines the demonstrator's basic structure in which a semantic metadata store is built beneath a blog infrastructure (see www.semanticblogging.org/blojsom-devt/blog/). Input and output mechanisms complete the metadata pipeline, over which the semantic capabilities described earlier are built. I built over a Java blogging platform called blojsom (blojsom.sf.net/), using the HP Labs Semantic Web toolkit Jena (www.hpl.hp.com/semweb/jena.htm). A full description of the demonstrator's design is in [4].
Figure 3 is a demonstrator screenshot, including simple navigation and query options in the left-hand pane. Note that queries can be about blog entries (annotations), as well as about the underlying items. That is, the author of an article is not the same as the author of a blog entry about the article, although it is reasonable to want to search on either type of author. The panel is schema-controlled, so the exact form (both presentation and content) can be customized at runtime. The right-hand pane contains a table-based view of the search results. Again, the exact form of the view returned is schema-controlled, providing a mechanism for personalization.
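The distinction between the two kinds of author can be sketched in Python over a small set of triples. This is an illustration only, not the demonstrator's query code; the `ex:` identifiers and the `ex:annotates` and `ex:author` properties are assumed names.

```python
ANNOTATES = "ex:annotates"  # hypothetical: blog entry -> item it describes
AUTHOR = "ex:author"        # hypothetical: used for entries and items alike

graph = {
    ("ex:entry1", ANNOTATES, "ex:paper1"),
    ("ex:entry1", AUTHOR, "ex:Bob"),      # Bob blogged about paper1
    ("ex:paper1", AUTHOR, "ex:Alice"),    # Alice wrote paper1
    ("ex:entry2", ANNOTATES, "ex:paper2"),
    ("ex:entry2", AUTHOR, "ex:Alice"),    # Alice blogged about paper2
    ("ex:paper2", AUTHOR, "ex:Carol"),    # Carol wrote paper2
}


def items_written_by(graph, person):
    """Items (the things being annotated) whose own author is person."""
    items = {o for s, p, o in graph if p == ANNOTATES}
    return {s for s, p, o in graph
            if p == AUTHOR and o == person and s in items}


def items_blogged_by(graph, person):
    """Items annotated by a blog entry whose author is person."""
    authored = {s for s, p, o in graph if p == AUTHOR and o == person}
    return {o for s, p, o in graph if p == ANNOTATES and s in authored}


print(items_written_by(graph, "ex:Alice"))
print(items_blogged_by(graph, "ex:Alice"))
```

The same person can appear in both roles, so the two queries return different items for Alice: the paper she wrote and the paper she blogged about.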
Metadata control means that different forms can be used for different types of input, whether papers, conference proceedings, or news articles. The demonstrator also provides a simple ontology-backed mechanism to help users create metadata. The Simple Knowledge Organisation Systems (SKOS) schema, an RDF schema for thesauri and related knowledge-organization systems [7], makes it possible to represent a group of categories as a hierarchical set of concepts. A user sharing snippets can thus associate a set of ranked (preferred and nonpreferred) indicator terms with each concept. When a blog entry is created, the demonstrator analyzes its text using a simple stemmer [9], helping the snippet sharer rank categories that might be suitable for the blog post.
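A rough Python sketch of this kind of category suggestion follows. It is not the demonstrator's implementation: the concept names, the indicator terms, the 2-to-1 weighting of preferred over nonpreferred terms, and the `crude_stem` function (a deliberately simple stand-in for the Porter stemmer [9]) are all assumptions made for illustration.

```python
import re

PREFERRED_WEIGHT = 2      # assumed weighting, not taken from the article
NONPREFERRED_WEIGHT = 1

# Hypothetical concepts with ranked indicator terms.
CONCEPTS = {
    "Blogging": {
        "preferred": ["blog", "rss"],
        "nonpreferred": ["post", "feed"],
    },
    "Semantic Web": {
        "preferred": ["rdf", "ontology"],
        "nonpreferred": ["triple", "metadata"],
    },
}


def crude_stem(word):
    """Strip one common suffix; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def rank_categories(text, concepts=CONCEPTS):
    """Score each concept by the indicator terms whose stems occur in text."""
    stems = {crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())}
    scores = {}
    for name, terms in concepts.items():
        score = sum(PREFERRED_WEIGHT for t in terms["preferred"]
                    if crude_stem(t) in stems)
        score += sum(NONPREFERRED_WEIGHT for t in terms["nonpreferred"]
                     if crude_stem(t) in stems)
        if score:
            scores[name] = score
    # Highest score first; ties broken alphabetically by concept name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


entry = "Several blogs now publish RSS feeds and triples describing each post."
print(rank_categories(entry))
```

Stemming lets "blogs" and "feeds" in the entry match the indicator terms "blog" and "feed"; the ranked list is a suggestion that the person posting can accept or override.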
Genuinely useful tool. The demonstrator was a simple prototype, designed to illustrate Semantic Web values. My colleagues and I are now planning to exploit its ideas and technologies for several purposes:
For the latest on semantic blogging, see my blog at www.semanticblogging.org.
It is fair to ask whether other approaches might bring about these same benefits. For example, do we need the Semantic Web at all? Even without semantic metadata, much can be done to move blogging toward structured knowledge management. For example, one system, www.bloglines.com, uses aggregated blogs to recommend new, interesting blogs to its subscribers. Another, called Meme Streams (www.memestreams.net/), tracks "memes," or whatever is copied from one person to another on the network, as they spread across the blogosphere. Yet another, called Waypath (www.waypath.com), uses blogged URLs as a way of linking blog entries "about" the same item. However, there is a limit to how far these mechanisms can go before they need a richer information model.
The idea of blogging for knowledge management is not new [10], but blogs are not yet widely deployed as corporate intranet solutions.1 Other solutions (such as email and threaded discussion boards) are used, along with their attendant limitations of post-hoc access (for email) and system lock-in (for discussion boards). Blogs are imperfect [12]. Consider, for example, the problem of the signal-to-noise ratio in content; more information inevitably means more irrelevant, incomplete, or inaccurate content. A key challenge is finding a way to filter, sort, and navigate through the blogosphere. The Semantic Web can help here, too, by enriching blogs with richer, structured metadata and by providing mechanisms for recommendation networks.
Structure is, perhaps, a more troublesome issue. Consider, for example, the need for shared categorization, so you can link other people's information into a conceptual scheme that makes sense to you. Here again, the Semantic Web can help by providing a standard knowledge-representation format, together with tools for decentralized ontology creation and linking.
The Semantic Web is not the only way to embed such structure in XML. Topic Maps [8] are an information-modeling technique that allows conceptual maps to be represented, linked, and shared. Indeed, much of the value of semantic blogging might also be implemented over topic maps. However, RDF does offer one interesting capability, inferencing over a rich information model, thus enabling the creation of implicit metadata.
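As one small illustration of implicit metadata, the Python sketch below applies two standard RDF Schema entailments (subclass transitivity, and type propagation along `rdfs:subClassOf`) until no new triples appear. The `ex:` class and snippet names are hypothetical, and a real application would rely on a toolkit's reasoner rather than this naive loop.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Explicit statements only (hypothetical identifiers).
graph = {
    ("ex:ConferencePaper", SUBCLASS, "ex:Paper"),
    ("ex:Paper", SUBCLASS, "ex:Publication"),
    ("ex:snippet1", TYPE, "ex:ConferencePaper"),
}


def infer(graph):
    """Return graph plus every triple entailed by the two subclass rules."""
    inferred = set(graph)
    while True:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                # Transitivity: s subClassOf o, o subClassOf o2.
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # Type propagation: s2 has type s, s subClassOf o.
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= inferred:   # fixed point reached: nothing new was derived
            return inferred
        inferred |= new


closure = infer(graph)
implicit = closure - graph
print(sorted(implicit))
```

Nobody stated that the snippet is a publication, yet a query for publications over `closure` would find it; that derived statement is metadata no author had to write.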
The idea of decentralized ontology creation is not unique. Systems like the Topic Exchange (topicexchange.com), del.icio.us (del.icio.us for social bookmarking), and Flickr (www.flickr.com for online sharing of photographs) allow a community of users to collaboratively build up a knowledge structure (actually a list of tags). However, in all of these systems the ontology lacks semantics and is both centralized and universal. The Semantic Web may be better served by precise, local, domain-specific vocabularies that are loosely coupled, rather than by a one-size-fits-all central ontology, no matter how collaborative.
Some technologies can be used to help power semantic blog applications. A blogger might, for example, build semantic capabilities into Cascading Style Sheets to provide rich, user-specific customizations [11]. One might also use data formats like Easy News Topics (ENT) (matt.blogs.it/specs/ENT/1.0/), a lightweight metadata-creation tool and central portal (the k-collector) to provide a collaborative view of a community's blog postings. The metadata-creation tool is a useful way of providing machine-assisted metadata creation, one I extended in the demonstrator application. The k-collector is much more flexible than the Topic Exchange but not as flexible as an RDF model.
The semantic blogging demonstrator is complete and the lessons learned documented [4]. Meanwhile, other groups continue to apply semantic blogging ideas to communal blogs (such as Planet RDF, www.planetrdf.com), wikis (such as Platypus, platypuswiki.sourceforge.net/), aggregators (such as the semblog platform, www.semblog.org/wiki/), and authoring tools (such as Compendium, www.compendiuminstitute.org).
Semantic blogging is not yet a paradigm but is already more than a tool. I have sought to present its key ideas, describe an initial implementation, and point to ongoing work. I look forward to the future with interest.
1. Adar, E. and Adamic, L. Tracking information epidemics in blogspace. Posted 2003; www.hpl.hp.com/research/idl/papers/blogs2/index.html.
2. Banks, D., Cayzer, S., Dickinson, I., and Reynolds, D. The ePerson Snippet Manager: A Semantic Web Application. HP Labs Technical Report HPL-2002-328; www.hpl.hp.com/techreports/2002/HPL-2002-328.html.
3. Berners-Lee, T., Hendler, J., and Lassila, O. The Semantic Web. Scientific American 284, 5 (May 2001), 34–43.
4. Cayzer, S. Semantic Blogging: Lessons Learnt. Tech. Rep. SWAD-E 12.1.8, Sept. 2003; www.w3.org/2001/sw/Europe/reports/demo-lessons-report/.
5. Davenport, T., and Prusak, L. Working Knowledge: How Organizations Manage What They Know. Harvard Business School Press, Boston, 2000.
6. Gruninger, M. and Lee, J. Ontology applications and design. Commun. ACM 45, 2 (Feb. 2002), 39–41.
7. Miles, A., Rogers, N., and Beckett, D. SKOS-Core 1.0 Guide. W3C Draft, Mar. 2004; www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/.
8. Park, J., Ed. XML Topic Maps: Creating and Using Topic Maps for the Web. Addison-Wesley Professional, Boston, 2003.
9. Porter, M. An algorithm for suffix stripping. Program 14, 3 (1980), 130–137.
10. Roll, M. Business Weblogs: A pragmatic approach for introducing Weblogs in medium and large enterprises. In BlogTalks, T. Burg, Ed. Herstellung: Books on Demand GmbH, Norderstedt, Germany, 2003.
11. Udell, J. The Semantic Blog. O'Reilly xml.com, Apr. 15, 2003; webservices.xml.com/pub/a/ws/2003/04/15/semanticblog.html.
12. Weiss, A. The last word: Your blog? Who gives a @*#%! ACM netWorker 8, 1 (2004).
Footnote 1: For example, a recent survey by the American Productivity and Quality Center found that blog use ranked 2.38 on a scale of 1 (not used) to 7 (used extensively); www.apqc.org/site/images/Guerrilla_Technologies_Proposal.pdf.
Figure 1. RDF graph encoding information concerning this article, including its type and author; sb and rdf are namespace abbreviations. Modeling the author as a "resource" (a node in the graph) allows anyone to attach other information (such as job title), perhaps post-hoc. The strings (or literals) are end points in the graph, to which no further information can be added.
Figure 2. High-level architecture of the semantic blogging demonstrator prototype application.
Figure 3. Snippets can be searched for, either through their own attributes (such as "I'm interested in snippets about HP") or through the attributes of their attached blog entry (such as "I'm interested in snippets captured by Bob"). Here, the query is summarized and the results are returned in a plain table format.
©2004 ACM 0001-0782/04/1200 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.