=Paper=
{{Paper
|id=Vol-209/paper-16
|storemode=property
|title=Applying the Semantic Web as a Writer's Tool
|pdfUrl=https://ceur-ws.org/Vol-209/saaw06-poster01-thomas.pdf
|volume=Vol-209
|dblpUrl=https://dblp.org/rec/conf/semweb/Thomas06
}}
==Applying the Semantic Web as a Writer's Tool==
Rick Thomas
Independent Software Developer
+1 404-966-5658
saaw2006@evenview.com
ABSTRACT

The process of writing and the practice of Semantic Web (SW) annotation are similar: each takes ideas and interprets them into progressively refined symbols. Drawing on this parallel, they may be mutually improved, conceptually and by the use of software tools. As an adjunct to writing, the SW will be more than a distributed data scheme and will be part of the creative process. Supported by the SW, writing can be more flexible and include richer, non-linear structure.

The key insight is that the basic operation of dividing a resource and commenting on the relations exposed by the division recurs throughout the process of writing. This operation is rich enough semantically to serve as the basis of interpreting and annotating text for use in the SW.

This paper describes an experimental tool that supports this operation for a writer. Intentionally, the tool is minimal. The writer produces text while organizing it at the paragraph level into related, web-addressable boxes. The spatial relations of these boxes express a few general semantics, such as list membership, likeness and currency. Thus the writer annotates with informal structure, which enhances the use of annotation tools by giving identities to and relations among more narrowly focussed portions of text. The notes and relations established are useful for the writer's access to the text and provide a scaffold for revisions.

In this basic use, overt URIs and RDF are avoided, as is embedded markup for data and links. In the implementation, though, boxes also contain the normally hidden RDF/N3 that describes box properties and relations. For the writer comfortable with RDF, the boxes can be used for full annotation.

Looking forward, this approach lends itself naturally to support for finer-resolution collaboration. Perhaps such a technique will help writers to discover and formulate relations and thus allow texts to interoperate in the future SW.

1. INTRODUCTION

We often think of the Semantic Web as a database of metadata describing resources. This view will be challenged, though, as we progressively divide those resources and model their internal relations. For text resources, metadata will outgrow the text itself as we first treat paragraphs as resources and incrementally represent the explicit meaning of sentences.

Explicit language semantics has been pursued in many efforts: for translation, for literary studies, for legal argumentation and in computational linguistics. But there have been no common standards by which these techniques could interoperate. The SW, extended for the task, may serve this purpose and become the “data bus” for modeling text. But the SW is not yet able to serve this role. Its expressivity has been circumscribed to avoid logical undecidability. And its method of reference would be quite unwieldy if used at the word level.

As a step toward this future, this paper sketches an approach for generating and using ontologies in writing. Basically, the writer’s division of a text into paragraphs is a key opportunity to capture the meaning of the work-in-progress. Each such division implies relations among the paragraphs that may be made explicit. This structure, plus the writer’s comments about the divisions, becomes a scaffold for semantic annotation.

Use of this technique depends on software that can manage the newly explicit relations in the text, while hiding the technical details. Such a tool will offer direct benefits to the writer for organizing the work-in-progress while revealing resources that may be usefully addressed by the SW.

2. WRITING AND THE SEMANTIC WEB

A written work is meaningless in the SW until semantic annotations describe its otherwise opaque content. Metadata may describe its author and related documents; keywords and references may be extracted and indexed for easy access; commentary can be added to paragraphs in the text. But only with the use of ontologies specific to the domain of the text can it be said to have meaning in the SW.

There is a gulf between SW annotation and writing: Annotation works with macro-scale resources and ontologies that are general-purpose and fixed. Writing addresses micro-scale resources and needs ontologies that are content-specific and formative.
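
The contrast between macro-scale and micro-scale resources can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the prototype described in this paper: the `example.org` URI, the `dc:` and `ex:` property names, and the helper `paragraph_resources` are all assumptions introduced here. It states a few facts about a whole document, then divides the same text into paragraph-level resources, each with its own identifier and an explicit relation to the paragraph that follows it.

```python
# Illustrative sketch only: the URI and property names are hypothetical.

DOC = "http://example.org/work/draft1"  # assumed URI for the whole document

# Macro-scale annotation: a few statements about the document as a whole.
doc_metadata = [
    (DOC, "dc:creator", "A. Writer"),
    (DOC, "dc:subject", "semantic web"),
]


def paragraph_resources(doc_uri, text):
    """Split text on blank lines and describe each paragraph as a resource.

    Returns a list of (subject, predicate, object) triples.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    triples = []
    for i, para in enumerate(paragraphs):
        uri = f"{doc_uri}#p{i + 1}"
        triples.append((uri, "ex:partOf", doc_uri))
        triples.append((uri, "ex:text", para))
        if i + 1 < len(paragraphs):
            # Make the normally implicit "next paragraph" relation explicit.
            triples.append((uri, "ex:next", f"{doc_uri}#p{i + 2}"))
    return triples


if __name__ == "__main__":
    sample = "First idea.\n\nSecond idea.\n\nThird idea."
    micro = paragraph_resources(DOC, sample)
    print(len(doc_metadata), "document-level statements")
    print(len(micro), "paragraph-level statements")
```

Even for three short paragraphs, the paragraph-level statements outnumber the document-level ones, which suggests how quickly metadata grows once a text is divided into finer resources.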
Ontologies for annotation come from many processes: 1) conventional data ontologies describe data embedded within a resource, such as people and places; 2) annotation ontologies associate comments and bookmarks with general resources; 3) general content ontologies mark the relations among portions of the work; 4) generic ontologies are used to label resources freely, which is one way to think of social tagging; and 5) emergent ontology models the domain with structure generated incidentally from some mass process.

These ontologies are similar in that they are external to the work and fixed relative to it. (This assumption is relaxed below.) On the other hand, a formative ontology is initially implicit in the work and must be drawn out to model the content as it is developed.

The work is annotated incrementally, starting with basic ontologies and gradually developing and applying more specialized ontologies. This modularity is the great strength of the SW. Data-oriented ontologies stand with ontologies that model the work directly. Multiple ontologies describe different views of the same resources, and they work together because they describe a network of relations linked by these pivotal resources. More classes of resources and more relations are distinguished among the resources, forming a network of meaning.

But if we require that the writer does nothing with explicit ontology, how can we capture structure that can be formed into peer ontologies? The premise here is that as the writer divides the initially opaque text and comments on the divisions, the significant structure of the work is revealed. The comments and relations are then used to guide the formation of provisional ontologies and the application of SW annotations.

If this process of revealing implicit meaning is to be guided by SW technologies, we need a combination of natural language and the SW that offers the best of both and makes them mutually supportive. Three areas must be addressed: expressivity, reference and usability.

A writer commands natural language, a system of unlimited combinatorial expression. Language freely mixes resources, statements, meta-statements, and ontology, though awkwardly and unsystematically. The process of writing depends extensively on knowing the provenance, standing and relations of statements at a detailed level. The language of the SW is, by contrast, limited to a subset of first-order logic. Resources are separate: statements and ontology are about resources, while the conventions to treat statements as resources are ad hoc.

But this approach to using the SW for writing immediately requires friendly reification. The writer divides the text and comments on the now explicit relation between two paragraphs. In other words, the elemental structure involved is statements about statements. While this brings the expressivity of the SW in line with language's all-in-one nature, it compromises the logical design of the SW. Yet much can be done without the use of inference, and perhaps in practice the use of reification can be isolated.

Reference in natural language is highly constrained because its origins are verbal, thus linear. Expressing complex structure clearly is difficult: reference is either relative and brittle or precise and unwieldy. Pronouns are ambiguous, clause structure is confusing, and cross reference is ugly and inconvenient. The SW solves these problems by using universal identifiers, URIs. This consistency allows for ubiquitous definitions, relations, comments and paraphrases (once its overhead is accepted). The SW is good with structure, but inelegant (to say the least) for linear expression. The approach here is to present the writer with local reference like natural language, but to undergird it with absolute URIs.

Writers cope with many fragments and much iteration on the way to a finished, linear work. The SW can easily represent the complexity, but this would be of little use without ways of manipulating writing-specific structures. This design must present views of the work that are clear and persistent, yet reflect the detailed structure of the content.

3. ANNOTATION + WRITING = SCIENCE

Before discussing design, it is helpful to explore the flexible and dynamic use of ontology by analogy with the scientific method. This analogy will expose the implicit activities in annotation and writing. In turn, that will illuminate the synergies between writing and the SW to guide the design of software.

The scientist works in several phases: observation, hypothesis formation, prediction of results and experimentation to test the predictions. Observation begins by selecting a manageable domain and collecting data that characterizes it. Hypotheses are formed to describe the regularities within the observations. Predictions project the data and hypotheses to new outcomes, and experiments test the hypotheses and contribute to the stock of observations.

Consider the parallel with annotation. Annotation starts with an interest in a domain of resources (Observation). Ontologies are chosen that will structure that domain (Hypotheses). Annotation is applied and presupposes that the annotation is correct (Prediction), and that is confirmed by a consensus on its accuracy or usefulness (Experiment).

The analogy brings out a lot that is implicit in the simple act of annotation. It first requires an active choice of resources. This selection, which may be incorrect and incomplete, colors all the annotation. It requires a distinction to be made and justified by the annotator.

Annotation requires a choice of ontologies. Ontology is usually taken to be fixed prior to its use in annotation. Sometimes, though, the annotator may not find an ontology that fits well, or may find competing ontologies, or may be working in an emerging domain, or may just prefer a casual approach. As annotation proceeds the ontology is tested, and may require changes to refine terms and add new relations, or may be replaced altogether. Thus ontology is formative in general. Likewise, each annotation is conditional and subject to revision.

All these steps depend on the care, clarity and curation of the annotator. In other words, the ontologies, the annotations, and the users’ expectations are a system that must be kept tuned.

These stand as important criticisms of the SW, but this is not intended to be dismal. Many domains are intuitive and standard, but fundamentally, semantics are negotiated in a process of annotation where divisions are made and conventions about those divisions are formed. Thus SW implementations should support revisability.

This potential for variation helps to understand writing, which is by comparison unstructured. The writer reorganizes and rewrites, iteratively. Source materials and drafts are associated and ranked, and are folded together with the writer's thoughts.

Compared to science then, writing begins with a selection of sources as context and inspiration (Observations). A plan for the work defines terms, states themes and sketches arguments (Hypotheses). The writer then drafts text to support the plan, drawing support from the context (Prediction). Reading the draft is the test that leads to a revised plan and a new draft (Experiment). (Editors and readers are essential participants, but the focus here is on the sole writer.)

The analogy illuminates the planning and testing phases of writing. In planning, the writer intuitively organizes the domain with borrowed, discovered and invented regularities. It is like ontology, but it initially has only one instance - the text itself. It is not necessarily explicit or shared. If it is explicit, it is expressed informally within the text.

Each new draft influences this intuitive ontology, which in turn influences the next draft. This formative understanding of the domain in effect carries the work's meaning from version to version: text is held constant while the writer ponders meaning; meaning is held constant while language is changed. This is a process of interpretation, where the meaning is represented with one set of symbols and then recast to new symbols - more clear, more familiar, more appropriate. This evolution is central to writing. As for general annotation, the process generates distinctions that are the basis of formative ontology and may also be incorporated into the text.

This is not to suggest that a writer will model the work explicitly (though for some works, like technical papers, this may be feasible and desirable). Even so, a tool can help manage the distinction between plan and text. If the outlines of the intuitive ontologies are revealed by the writer's actions, they can be caught and refined, and can support the fluid process of writing.

4. AN INTERFACE FOR WRITERS

The strategy for the design is simple: 1) Provide a way to partition the text of a work-in-progress into small resources. 2) Describe the resources in terms of a simple ontological structure. 3) Capture the structure of the relations among the resources and use it to guide semantic annotation.

The interface is simplified to one primary element - a box, which contains text (and hidden RDF) and which is presented on a page with other boxes. As far as possible the interface is limited to editing text in a box and positioning boxes to indicate relations.

The writer reorganizes, ranks and associates text using the boxes as proxy, and then rewrites within boxes while referring to related text. This provides a frame for the iterative writing process.

Boxes serve four purposes in the design: text editing, SW resource identity, spatial relations representing semantics and spatial relations to engage the user's visual thinking.

4.1 Text editing

Text editing is in a series of vertically stacked boxes, somewhat like writing in the cells of a spreadsheet or outliner. These boxes are in a list relation. In addition, boxes may be moved freely within the page. Two boxes may be related, which encompasses their spatial relation and a comment.

One of the goals of the design is to understand the interplay between text editing in boxes and short-range reorganization of boxes. Boxes outside the linear flow of text provide a convenient way to store snippets, but the mechanics of editing are still less convenient than a text editor. Support for moving text between boxes based on box relations would be helpful.

4.2 Semantic Web resource identity

Boxes are optimized for a division of text to the paragraph level, but not finer to the word level. The box is a proxy for its contents, and so only relations to the box are available to represent relations to the content. This compromise avoids references directly into the text - a text that is frequently changing. This limits modeling of details and may be ambiguous, but this is moderated by the limited scope of the box. (RDF within the scope of the box may be used, but this is not supported by the interface and may be difficult to maintain.)

On the other hand, at the paragraph level, relations addressing the whole resource may be accurate enough, so the need for links embedded in the text is reduced. Giving up precise location of reference yields a simpler, consistent structure and easier editing. The reference is narrower than, say, the page reference in a book index, but not as narrow as we are accustomed to with an embedded hypertext link target.

As the primary resource in this SW application, boxes have URIs, but they don't appear in the user interface. The writer recognizes a box by its text and its individual appearance - shape, position on its page and relation to other boxes - and references a box by gestures toward this visual representation.

4.3 Spatial relations for semantics

Persistent spatial relations of boxes on pages are associated with logical relations. These relations are made by user gestures. Two boxes are related by their relative position, and a comment is applied to the relation in the form of a third box. These relations are simple and general in the interest of refining the user interface gestures and learning how expressive a simple interface can be.

There are two relations: List and Like. List locates a box in a sequence of boxes. Visually, boxes are connected bottom to top down the page, making explicit the normally implicit relation between paragraphs. Like is shown by proximity and stands for any relation between boxes (other than List). Unlike graphs that depict RDF, there are no arcs that show the relation. Instead, pairs of like boxes are highlighted together and a third box commenting on the relation is shown. The comment text can say anything (and its RDF may detail the semantics).

The comment box (which may also be used with the List relation) is the key to capturing the structure of the work as a scaffold for annotation. The relation between boxes is a stand-in for any possible RDF statement. The comment is a stand-in for any possible ontology.

It is worth noting that the boxes provide a context for both text and RDF, useful for controlling scope for search and inference as well as for text indexing.

4.4 Spatial relations for thinking

Boxes are arranged on a page, implemented as a web browser tab. Pages are organized as notebooks in the order that they are created. The page is a neutral presentation area for boxes for assembly of the work. It is also a boundary for cognitive scope, a working context. The unique visual layout of its boxes engages the writer's spatial memory and reasoning.

Boxes also give the text a stronger identity by location. Even if the text scrolls within its box, the box gives a better spatial cue than remembering that a paragraph is, say, two thirds through the document. It adds an additional, natural means of recall and orientation.

The original impetus for this application was to import the content of handwritten notebooks into the computer. Scanning and transcribing are easy, but capturing the relations intended by sketches and side notes led to this approach. The spatial relations of these notes imply semantic relations.

When the box layout represents a physical page the number of relations is limited and the layout can be fixed until the user changes it. This persistence is important for later recall.

On the other hand, this annotation approach leads to the use of many more relations than resources. To work within the page, temporary layout transforms are needed. Managing these transforms is the most difficult part of this design.

Here are several cases: 1) With a large number of small boxes there may not be enough room to see the text when editing. In this case the current box is enlarged while surrounding boxes keep their same relative positions. 2) A box may be in more than one List and have many Like boxes, so the relations would be obscured if they were shown all together. Temporary or permanent selective views allow focus. 3) Comments on relations between boxes are placed on a different plane and are shown only when the boxes are addressed, unless the box is “reified” as a permanent box on the page. 4) A query of either box text or RDF yields a collection of boxes that are displayed together, on a new page if needed. 5) Importing data presents a similar problem, for example, a directory of links and files with associated metadata.

5. IMPLEMENTATION AND DEMO

A prototype of the user interface is implemented using JavaScript in the Firefox browser. Python in a local server implements the RDF processing and text indexing.

The text is formatted with a plain text markup syntax, which is extended to carry RDF/N3. Preprocessing is used to reduce repeated syntax, to expand references to box and page, and to convert some simplified notations to RDF/N3.

The current implementation is experimental, though useful, and is being used to refine the user interface design and concept.

6. CONCLUSION

This project takes a step toward a SW for less structured activity. The concepts and prototype serve to explore how SW machinery can stay invisible and still help the writer.

The key insight is that the basic operation of dividing a resource and commenting on the relations exposed by the division recurs throughout the process of writing. This operation is rich enough semantically to serve as the basis of interpreting and annotating text for use in the SW.

Looking forward, this approach lends itself naturally to support for finer-resolution collaboration. Perhaps such a technique will help writers to discover and formulate relations and thus allow texts to interoperate in the future SW.

7. REFERENCES

The influences for this work are too numerous and diffuse to acknowledge in a small space. Also, important prior work has no doubt been neglected. In the interest of correcting these omissions and continuing to improve these ideas, online references are available at http://www.evenview.com/saaw2006/

Topics include: Emergent and formative ontologies; Contexts and reification; Merging and ontology; Tools for writers; Evolutionary epistemology.

8. ACKNOWLEDGEMENTS

Thanks are due to the reviewers of this paper. They have suggested important corrections and clarifications.