=Paper=
{{Paper
|id=Vol-552/paper-4
|storemode=property
|title=Knowledge Gardening as Knowledge Federation
|pdfUrl=https://ceur-ws.org/Vol-552/Park-KF08.pdf
|volume=Vol-552
}}
==Knowledge Gardening as Knowledge Federation==
Jack Park
SRI International
Menlo Park, CA
and
Knowledge Media Institute
The Open University
Milton Keynes, UK
Abstract. The term knowledge gardening, a contraction of the longer dynamic
knowledge gardening (DKG), is a direct descendant of Douglas Engelbart’s
Dynamic Knowledge Repository (DKR). A DKR exists as a combination of
humans and tools, epistemic communities and the tools they use to aggregate
information resources and work products, and to collaborate. We describe
TopicSpaces, an open source topic-map-based framework with which the
collective hypermedia discourse of epistemic communities is federated. We
define hypermedia discourse as the totality of social gestures made by such
communities: recorded dialogs; linked, annotated, and tagged Web resources;
recorded stories; in short, virtually all addressable information resources
created anywhere on the Web fall within the range of resources federated. We
define federation of resources as the specific merging processes native to topic
mapping. We contrast federation with traditional semantic integration processes,
in which artifacts of knowledge are aggregated through processes of selection.
Where selection involves “weeding” (a gardening process), federation performs
no weeding during the merge: it includes all resources, and leaves it to social
processes, including reputation, trust, and dialog, to help determine which
resources users find most valuable in their work. TopicSpaces provides a map of the federated
territory, user interface tools to facilitate some hypermedia discourse practices,
and Web services to interface with other hypermedia discourse tools.
Keywords: topic map, knowledge garden, knowledge federation, subject-
centric computing, hypermedia discourse
1 Introduction
We offer a position paper that describes one approach among many to the federation
of heterogeneous information resources and world views. Our thesis is that a subject-
centric federation is appropriate to the problem of supporting knowledge gardening
(also known as collective sensemaking—see Section 7) to find solutions to complex
and urgent problems. Our work with SRI’s Cognitive Assistant that Learns and
Organizes (CALO) project¹ has taught us numerous lessons that support the need to
federate heterogeneous world views and information resources. For instance, Park and
Cheyer [1] report on the need to federate the personal ontologies of CALO users with
the business-oriented ontology that CALO uses internally to maintain semantic
interoperability among groups of CALO installations. Our thesis project, titled
Hypermedia Discourse Federation, explores the technologies and tactics required to
federate the work products of several tools of hypermedia discourse together with
heterogeneous information resources found on the Web.
In our work, we have adopted the term knowledge gardening as a name for the
federation processes. This follows Douglas Engelbart’s term Dynamic Knowledge
Repository, which is his name for the combination of people, software tools, and
processes as improvement communities. In some illustrations of our work, we use the
term knowledge garden to name our topic map-based Web portal. Our story explores
the role of topic maps in the federation of heterogeneous information resources
through processes of subject identification and merging different representations of
the same subject in the same map. We believe that the maintenance of well-organized
information resources can contribute to improvements in knowledge gardening
processes, toward improved human dialogue.
This paper is organized as follows. We first review the two elements of
hypermedia discourse that we federate through topic maps: semantic linking
(connecting) and dialogue mapping (sometimes also known as issue mapping). We
include social bookmarking as an additional element; while hypermedia discourse²
centers on contested assertions and ideas, knowledge gardening entails the wider
range of social activities on the Web. We then review our subject-centric federation,
or knowledge gardening, approach and sketch TopicSpaces, our prototype federation
platform. Finally, we introduce the knowledge gardening process and close with
illustrative examples. Brief references to related work are given where appropriate.
2 Social Bookmarking: Tagging
Tags are associative reminders. In the CALO project, tags are the names of projects in
which CALO users are engaged. For instance, one typical CALO project is the CALO
“platform” itself, a project where CALO developers keep track of the design and
development progress on the product. The tag “Platform” would be used by CALO
developers as they surf the Web looking for information resources of value to the
team. They use that tag with Tagomizer, CALO’s social bookmarking application
written on top of the topic map engine TopicSpaces [2], [3].
Tagging is part of the larger social gardening repertoire; tags leave trails or form
scents [4] along information foraging [5] paths taken by many. Tagging is part of the
foraging and filtering aspects of knowledge gardening (see Section 7).
¹ CALO: http://www.ai.sri.com/project/CALO
² Hypermedia Discourse: http://kmi.open.ac.uk/projects/hyperdiscourse/
While tagging is generally thought to enable the formation of clusters of topics,
Brooks and Montanez report some interesting results [6] from experiments with
hand-tagged and auto-tagged articles. Using measures of pairwise similarity in the
case of human-tagged articles, they conclude that “tagging does manage to group
articles into categories, but that there is room for improvement.” They then report
on an experiment where they extract, from 500 articles, the three words with the top
term frequency-inverse document frequency (TFIDF) score from each article and use
those as “auto tags” for each article. They then cluster the auto-tagged articles. They
report better and smaller clusters when compared to human-derived tags, and suggest
that automated tagging can add great value to search for topics using tags. Our
prototype federation platform facilitates human tagging through its Tagomizer
application, while a background agent harvests tags automatically from bookmarked
Web pages.
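The auto-tagging step described above can be sketched as follows. This is a minimal illustration, not the procedure used in [6]: the tokenizer, the particular TFIDF weighting (relative term frequency times the logarithm of inverse document frequency), the tie-breaking rule, and the three sample articles are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def auto_tags(articles, k=3):
    """Return, for each article, its k terms with the highest TFIDF score."""
    token_lists = [tokenize(a) for a in articles]
    n_docs = len(token_lists)

    # Document frequency: in how many articles does each term occur?
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    tags = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically (an assumption).
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        tags.append(ranked[:k])
    return tags


# Invented sample articles, for illustration only.
articles = [
    "food prices rise as grain supplies fall",
    "grain is used as animal feed and as biofuel",
    "topic maps merge subjects and topic maps federate resources",
]
for tags in auto_tags(articles, k=3):
    print(tags)
```

Terms that occur in every article receive a score of zero under this weighting, so they are unlikely to surface as tags; clustering the tagged articles is a separate step not shown here.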
Grouping and clustering topics with tags is not the only application for tagging.
We continue to discover new applications. For instance, Razavi and Iverson [7] report
on a novel approach to using tagging to maintain groups and access control to
information resources in their OpnTag³ project.
3 Semantic Linking
In some sense the entire Semantic Web enterprise is about semantic linking. In the
sense discussed here, a narrow definition is taken: semantic linking here refers to the
creation of typed connections between ideas found in documents on the Web. In that
sense, semantic linking is subject-centric by its very nature. In 2001, the Scholarly
Ontologies Project at the Knowledge Media Institute began to envision a
“complementary infrastructure that is ‘native’ to the internet, enabling more effective
dissemination, debate, and analysis of ideas”⁴. In 1999, three authors [8] proposed that
when a new article is to be published, “authors describe the document’s main
contributions and relationships to the literature using a controlled vocabulary
analogous to a metadata scheme (but implemented using a formal ontology), and
submit the description to a networked repository.” In more recent writing [9], the
Cohere project (Figure 1) has been described as an online means by which social
processes are used to find and annotate ideas on the Web.
³ OpnTag: http://opntag.net/
⁴ ScholOnto: http://kmi.open.ac.uk/projects/scholonto/
Fig. 1. Cohere⁵ semantic linking Web portal
4 Dialogue Mapping
Dialogue mapping provides a common view of a growing structured representation of
streams of thoughts [10]. In fact, there are limits to conversation [11], which we illustrate
in Figure 2. Starting with a linear collection of thoughts, it is possible to tease out of
that collection a starting question followed by statements that answer the question,
statements that argue about the answers, and possibly statements that raise new
questions.
Fig. 2. Finding structure in streams of thoughts with Compendium
Analyzing a large body of text into such a map is called issue mapping⁶. For
instance, a recent OpEd discussion⁷ about food riots was mapped by the author as
illustrated in Figure 3.
⁵ Cohere: http://cohere.open.ac.uk/
⁶ Issue mapping: http://cognexus.org/issue_mapping.htm
⁷ OpEd: http://www.nytimes.com/2008/04/07/opinion/07krugman.html
Fig. 3. Finding structure in an OpEd with Compendium
The map reads left to right, starting, essentially, with an opening question. The
node “Food riots” leads to the columnist’s opening question: “How did this happen?”
The columnist provided his own three answers: “Long term trends”, “Bad luck”, and
“Bad policy”. From there, it is a matter of picking out questions being asked, finding
answers, and identifying any arguments made in the prose. A similar dialogue map
would result if a discussion group were facilitated by a skilled dialogue mapper and
similar questions and responses were recorded.
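The question-and-answer structure just described can be sketched as a small tree. The node labels come from the OpEd map described above; the `Node` class, the node kinds ("note", "question", "answer"), and the `outline` helper are hypothetical names chosen for this illustration and are not Compendium's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in a dialogue map, for example a question or an answer."""
    kind: str   # illustrative kinds: "note", "question", "answer"
    label: str
    children: list = field(default_factory=list)

    def add(self, kind, label):
        """Attach a new child node and return it."""
        child = Node(kind, label)
        self.children.append(child)
        return child


def outline(node, depth=0):
    """Render the map as indented lines, one per node."""
    lines = ["  " * depth + f"[{node.kind}] {node.label}"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# The opening of the OpEd map: a starting node, the columnist's question,
# and his three answers.
root = Node("note", "Food riots")
question = root.add("question", "How did this happen?")
for answer in ("Long term trends", "Bad luck", "Bad policy"):
    question.add("answer", answer)

print("\n".join(outline(root)))
```

Arguments for or against an answer, and any follow-up questions, would be attached in the same way as further children.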
5 Subject-centric Federation
We live in a vast collection of universes of discourse, each centered on different topic
domains, many of which overlap and share subjects and concerns. The issue map of
the OpEd illustrated in Figure 3 could just as easily have been generated in slightly
different forms, each representing a different interpretation by a different analyst.
That each is somehow different contributes to the heterogeneity of the information
resources with which we must all cope in our day-to-day and decision-making lives. A
goal of our work is to federate these heterogeneous resources into a coherent
representation that, we believe, affords improved knowledge gardening.
Consider just one node in our OpEd issue map, the one shown in Figure 3, for
which the label reads “700 calories of animal feed to produce 100 calories of beef”.
That is a specific quote from the OpEd text; it is reasonable to expect that other
analysts might pick up the same claim, even if placed in a different part of the map’s
graph structure.
Fig. 4. A Claim found in the OpEd and represented in the issue map
Claims such as this are at once subject to fact checking and linked to entailed subjects.
Fact checking can be the work of background agents, or the work of the crowd
engaged in knowledge gardening. Subject entailment goes with the nature of the
claim. That is, there is a relationship between animal feed and animals, and both of
those two subjects exist in a web of related (entailed) subjects. Consider the simple
concept map (Figure 5) of some (but not all) subjects entailed by the node illustrated
in Figure 4.
Fig. 5. Subjects entailed by the two subjects “Feed” and “Beef”
By creating a topic map of dialogues, and by including all entailed subjects, we
gain a broader means by which the work products of knowledge gardening can be
evaluated. By linking into that map each node created by each individual, no matter
how that node falls in its native dialogue map structure, we are performing
subject-centric federation: we are bringing together information resources that are
about the same subject, and we are connecting those resources to all resources known
to the map that are about the same or related subjects. We do so without editorial bias;
we federate regardless of whether or not we agree with the claims represented. We leave
disagreements to the gardening processes in which the map’s users are engaged.
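A minimal sketch of this merge-without-weeding idea follows. It assumes that subject identity has already been established, which is the hard part and is not shown; the `MapNode` fields, the subject identifier strings, the analyst names, and the second claim label are all invented for illustration and do not reflect the TopicSpaces implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MapNode:
    """A node contributed by one analyst's dialogue map (illustrative fields)."""
    subject_id: str   # hypothetical subject identifier, assumed already assigned
    label: str
    source_map: str


def federate(nodes):
    """Group nodes by subject, keeping every node: merging without weeding."""
    topics = defaultdict(list)
    for node in nodes:
        topics[node.subject_id].append(node)
    return dict(topics)


nodes = [
    MapNode("claim:feed-to-beef-ratio",
            "700 calories of animal feed to produce 100 calories of beef",
            "analyst-A"),
    # Invented second representation of the same claim by another analyst.
    MapNode("claim:feed-to-beef-ratio",
            "Beef takes seven times its calories in feed",
            "analyst-B"),
    MapNode("subject:beef", "Beef", "analyst-B"),
]

topics = federate(nodes)
for subject_id, members in topics.items():
    print(subject_id, [m.source_map for m in members])
```

Nothing is discarded during the merge: both representations of the claim remain attached to the same topic, and any judgment about them is left to the map's users.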
5.1 Related work
Tools that support dialogue mapping include Compendium⁸, bCisive⁹,
TruthMapping¹⁰, and DebateGraph¹¹. Compendium and bCisive are desktop tools,
while TruthMapping and DebateGraph are online portals.
Mark Klein [19] describes online dialogue mapping on a large scale. He describes
the popular communication tools (instant messaging, email, forums, wikis) as facing
“serious shortcomings from the standpoint of enhancing collective intelligence”. He
then goes on to describe the need for maintaining structure in conversations as we
discussed in Section 4.
⁸ Compendium: http://compendium.open.ac.uk/
⁹ bCisive: http://bcisive.austhink.com/
¹⁰ TruthMapping: http://truthmapping.com/
¹¹ DebateGraph: http://debategraph.org/
6 A Prototype Subject-centric Federation Platform
TopicSpaces is a servlet-based Web portal provider that includes a subject map,
which is a topic map created according to the Topic Maps Reference Model [12]. The
platform provides a servlet-driven REST API [13] for Web services, and will later
provide a tuplespace agent coordination platform [14] to coordinate harvesting agents
on the Web and those included in desktop applications.
Fig. 6. The TopicSpaces platform architecture
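As a rough illustration of the coordination role a tuplespace plays, the sketch below implements a tiny in-memory tuple space with Linda-style operations. It is not the platform cited in [14] or [15]: the class, the method names `put` and `take`, the task tuple layout, and the example URL are assumptions, and a real tuplespace would be networked and would let waiting agents block until a matching task appears.

```python
class TupleSpace:
    """A minimal in-memory tuple space with put and take operations."""

    def __init__(self):
        self._tuples = []

    def put(self, tup):
        """Post a tuple to the space."""
        self._tuples.append(tup)

    def take(self, pattern):
        """Remove and return the oldest tuple matching pattern, or None.

        None in the pattern acts as a wildcard for that position.
        """
        for index, tup in enumerate(self._tuples):
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return self._tuples.pop(index)
        return None


space = TupleSpace()

# The portal posts a harvesting task for a newly tagged resource.
space.put(("harvest", "http://example.org/new-page"))

# A harvesting agent asks for any pending harvesting task.
task = space.take(("harvest", None))
print(task)
```

Because `take` removes the tuple it returns, two agents polling the same space would not both receive the same task.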
The platform illustrated in Figure 6 anticipates the ability to run SETI@home-like
agent-based harvesting of resources found on the Web. A tuplespace platform [15]
provides the necessary agent coordination. For instance, consider the scenario where a
user tags a website that is new to the TopicSpaces portal. That new resource is sent to
a harvesting agent that can either perform harvesting tasks locally, or post a new
harvesting task to the tuplespace, where agents elsewhere on the Internet have
authenticated and are waiting for harvesting tasks. A typical harvesting task, well
suited to topic-mapped resources, is that of the TextRunner¹² process [16], where
bodies of text are parsed, not for sentence structure, but for noun and verb phrases
from which concept maps are constructed that represent the material being “read” by
the agent. The TextRunner approach parses bodies of text into lists of triples of type
{entity, relation, entity} from which concept maps, and later topic maps,
can be constructed. We believe that the topic map’s attention to the details of subject
identity can render this process more accurate: iteratively comparing the resulting
concept maps with their corresponding named topics in a topic map allows the
concept map to be refined before it is migrated into the topic map. This will be
particularly important in cases where named concepts found by the TextRunner
algorithm are determined to be ambiguous; different entities with the same name
create such ambiguities.
¹² TextRunner: http://www.cs.washington.edu/research/textrunner/
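The step from triples to a concept map, and the check for ambiguous names, might look like the sketch below. It does not reproduce TextRunner or TopicSpaces: the sample triples, the `topic_names` index (a mapping from a name to the identifiers of all topics bearing that name), and both function names are hypothetical.

```python
from collections import defaultdict


def build_concept_map(triples):
    """Index (entity, relation, entity) triples as a directed labelled graph."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def ambiguous_names(concept_map, topic_names):
    """Return entity names that match more than one topic in the topic map."""
    entities = set(concept_map)
    for edges in concept_map.values():
        entities.update(obj for _, obj in edges)
    return sorted(e for e in entities if len(topic_names.get(e, [])) > 1)


# Invented triples of the kind a harvesting agent might extract.
triples = [
    ("animal feed", "produces", "beef"),
    ("beef", "comes from", "cattle"),
]

# Hypothetical topic map index: two distinct topics share the name "beef".
topic_names = {
    "beef": ["topic:beef-food", "topic:beef-complaint"],
    "cattle": ["topic:cattle"],
}

concept_map = build_concept_map(triples)
print(ambiguous_names(concept_map, topic_names))
```

Names flagged this way would need to be resolved, by an agent or by a person, before the concept map is migrated into the topic map.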
6.1 Portals
TopicSpaces is a research platform, one that can support two classes of topic map
portals as illustrated in Figure 7. One class is the all-in-one portal where all the
context view portals, collaboration portals, and personal workspaces are part of the
same software package. TopicSpaces is built this way as a means to explore all issues
related to knowledge gardening.
Fig. 7. The TopicSpaces Web portal architecture
A second class of portal separates all the context portals, collaboration portals, and
so forth from the subject map itself. Different portals can then be crafted using
standard CMS platforms such as Drupal, WordPress, and other popular software
products. TopicSpaces can provide Web services to those portals as needed.
6.2 REST Web Services API
What is a REST Web Service? It is simply a means to use URLs as query vehicles
by way of a servlet. Web browsers make such requests routinely; type a particular
URL into a browser and the server returns the entire Web page in a single HTML
string. A Web service would, instead, return a small fragment of HTML, XML, or
JavaScript Object Notation (JSON)¹³, as requested. Bookmarklets, as used by
Tagomizer, del.icio.us, and other social websites, represent a kind of Web service
where a short JavaScript string embedded in a browser’s bookmarks is able to
transport information from a Web page to the portal that accepts the bookmarklet’s
query. When we say “API”, we are specifying that there is a particular query string
that goes in the URL, and that query string is interpreted by the portal to perform the
requested task. Some tasks return a requested bit of information, for example the
bookmarks associated with a particular tag. Other tasks update information in the topic
map, for example by adding a new bookmark.
¹³ JSON: http://www.json.org/
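To make the idea of a query string interpreted by a portal concrete, here is a small sketch of a handler that reads a URL and returns a JSON fragment. It is purely illustrative and is not the TopicSpaces API: the path, the parameter names (`action`, `tag`, `url`), the in-memory store, and the response shapes are all assumptions.

```python
import json
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory store: tag -> bookmarked URLs.
BOOKMARKS = {"Platform": ["http://example.org/design-notes"]}


def handle(url):
    """Interpret the query string of url and return a JSON response string."""
    query = parse_qs(urlparse(url).query)
    action = query.get("action", [""])[0]
    tag = query.get("tag", [""])[0]
    if action == "list":
        # Return the bookmarks associated with a particular tag.
        return json.dumps({"tag": tag, "bookmarks": BOOKMARKS.get(tag, [])})
    if action == "add":
        # Update the store with a new bookmark for the tag.
        bookmark = query.get("url", [""])[0]
        if not bookmark:
            return json.dumps({"error": "missing url"})
        BOOKMARKS.setdefault(tag, []).append(bookmark)
        return json.dumps({"status": "ok"})
    return json.dumps({"error": "unknown action"})


print(handle("http://example.org/demo?action=list&tag=Platform"))
```

A bookmarklet would construct a URL of this general kind from the page being viewed and send it to the portal.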
The TopicSpaces REST API takes the form:
/ws//