=Paper=
{{Paper
|id=Vol-101/paper-10
|storemode=property
|title=Semantic Word Processing for Content Authors
|pdfUrl=https://ceur-ws.org/Vol-101/Marcelo_Tallis.pdf
|volume=Vol-101
}}
==Semantic Word Processing for Content Authors==
Semantic Word Processing for Content Authors
Marcelo Tallis
Teknowledge Corporation
4640 Admiralty Way
Marina del Rey, CA, USA
Email: mtallis@teknowledge.com
ABSTRACT
Document authors cannot routinely afford the overhead imposed by current semantic annotation tools. Some characteristics of their task can be exploited to provide them with a tool that reduces the effort required to create both the document content and its accompanying semantic annotations.

SemanticWord is such a semantic annotation tool. SemanticWord is an environment based on MS Word that integrates content and markup authoring, providing customizable tools that allow simultaneous generation of content and semantic annotations, an annotation scheme that allows annotations to be reused when content is reused, a customizable library of templates containing partially annotated text, and an automatic information extraction system with the tools for refining and augmenting its output.

Keywords
SemanticWord, Semantic Annotations, Semantic Web, Markup Authoring Tools, COTS integration.

INTRODUCTION
The vast amount of information contained in the web is beyond any individual’s grasp. Unfortunately, its content is primarily tailored to human consumption and not suitable for automatic semantic interpretation. The semantic web addresses this problem by allowing content to be annotated with machine understandable semantic descriptions.

Although current annotation tools take care of the annotation syntax and the proper reference and use of ontology terms ([4][5][6]), authoring semantic annotations remains a tedious and expensive process.

While this cost may be affordable to people who author web documents sporadically (e.g., a teacher authoring her homepage), it would be prohibitive to those who author and update documents routinely (e.g., an intelligence analyst writing intelligence reports).

Automatic Information Extraction systems have been suggested as an alternative method for generating semantic annotations. Unfortunately, this technology is only able to extract sufficient information to fill in a flat template and cannot capture the relationship graph that connects the instances ([5][7]).

Clearly, current markup authoring tools are inadequate for the task of routinely authoring content. Fortunately, some characteristics of this task, as it applies to some authors, can be exploited to reduce the cost of producing these annotations. Some of these characteristics are:

    •    The documents to be authored are primarily confined to a few topics. In this case it is worthwhile to spend some effort in setting up an environment tailored to these topics. The savings from producing multiple documents will more than recoup the tailoring cost.
    •    There is a high degree of content reuse. For example, different documents include common actors and places, share the same context, or update previous accounts. This characteristic can be exploited to reuse the annotations along with the content.

SemanticWord is a semantic annotation tool designed with this kind of task in mind. Some of the features included in SemanticWord are:

    •    An environment that integrates content and markup authoring. This environment is based on MS Word, a product that is already familiar to many authors.
    •    Customizable tools for simultaneous generation of content and semantic annotations.
    •    An annotation scheme that allows for annotations to be reused when content is reused.
    •    A customizable library of templates containing partially annotated text. Authors can include templates in their documents to speed up both content and annotation production.
    •    An automatic information extraction system and the tools for refining and augmenting its output.

SEMANTIC WORD
SemanticWord offers an environment for authoring annotated text documents based on MS Word. Its aim is to reduce the burden involved in authoring semantic annotations. Authors are given a familiar and uniform environment where the creation of content and semantic descriptions can be freely interleaved. In many cases both of them can be achieved in a single operation.

Overview
SemanticWord extends MS Word in several dimensions (see Figure 1). First, the MS Word GUI is augmented with toolbars that support the creation of semantic descriptions (or annotations) that are attached to text regions. The GUI is also extended to show these annotations embedded within the text and to support their direct manipulation through mouse gestures. Second, SemanticWord extends Word’s reach by opening a channel to the Semantic Web. Content from the Semantic Web (both ontology definitions and factual descriptions) is brought into SemanticWord to compose annotations that are later dumped back into the Semantic Web. Third, SemanticWord extends Word services by integrating AeroDAML, an automated information extraction system. AeroDAML analyzes and annotates the text of the document as it is being typed, appearing to the author as a service analogous to Word’s spelling and grammar checking. Finally, SemanticWord supports the rapid composition of annotated text through template instantiation.

The above extensions were implemented using standard Microsoft extensibility technology. Annotations are rendered with ActiveX controls that can be placed in a document, implement their own behavior, control their GUI, and save their internal state. Automatic text analysis is driven by SmartTags technology that supports background parsing and tagging of the document text as it is being typed. The rest is supported by an Office COM Add-in that responds to MS Office/MS Word built-in events (e.g., DocumentOpen) and extends Word’s menus and toolbars. The document content is manipulated through Word’s COM API.

Figure 1. SemanticWord Architecture. Components shown: Word Document (Text + Annotations), MS Word, SemanticWord, Annotated Templates, Ontology and KB Server, and AeroDAML (IE).

Semantic Annotations
SemanticWord annotations are based on the DAML+OIL language [3]. DAML+OIL is a knowledge representation language developed for the Semantic Web that supports the definition of machine-readable ontologies and the linking of terms in documents to ontologies.

SemanticWord annotations are attached to regions of text, not to the document as a whole. There are two types of annotations: instance references and triple bags. An instance reference associates a text region with a “referenceable” instance of a class. Triple bags describe the content of a text region with a collection of triples that follow DAML+OIL’s subject-predicate-object model. The subject is an instance, the predicate is a property defined in an ontology, and the object can be either an instance or a value. SemanticWord annotations are retained across text copy/cut and paste operations.

Figure 2 illustrates a fragment of an annotated document. An instance reference is rendered by enclosing the annotated text between square brackets, with an icon adjacent to the closing bracket. A triple bag is rendered by enclosing the annotated text between square brackets and displaying a checkbox and a triples table adjacent to the closing bracket. The checkbox allows the user to display or hide the table. To facilitate the handling of heavily annotated documents, the text associated with an individual annotation can be highlighted and all annotation marks can be made invisible.

An instance reference icon can be dragged and dropped over a cell corresponding to the subject or object of a triple. Cells filled using this method do not store a direct reference to the dropped instance but rather establish a link with the dragged instance reference. Updating the linked instance reference to refer to a different instance will alter the triple too. This level of indirection improves maintainability.

Triple cells can also be filled by picking instances and properties from special purpose browsers called choosers (see Figure 3). Choosers can use the values already stored in a triple to constrain the lists of choices offered to the user. For example, if the subject and object of a row are already filled in, then the corresponding property chooser will only show the properties whose domain and range are consistent with those entries. Because SemanticWord does not enforce consistency, these constraints can be relaxed. The choosers also provide other filters for constraining the choices shown. For example, the instance chooser includes filters for listing only the instances that have already been referenced in the document. The instance choosers can selectively list instances corresponding to preexisting semantic web markup (provided by the Ontology and KB Server) or new instances defined in the current document. They also allow users to create new locally defined instances or provisional instances (described below), a function that a user would invoke if the listed choices do not include the desired instance. SemanticWord does not impose any order for filling in table cells, and can persist the state of tables containing rows with one or more empty cells.

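The annotation model described in the Semantic Annotations section can be illustrated with a small Python sketch. It is not SemanticWord's implementation (which the paper describes as ActiveX controls and a COM Add-in); all class names, field names, and instance identifiers below are hypothetical, and the property filter matches classes exactly rather than reasoning over the subclass hierarchy. The sketch shows two ideas from the text: a triple cell that links to an instance reference instead of copying it, and a property chooser that filters on domain and range unless the user relaxes the filter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InstanceReference:
    """Annotation that ties a text region to an instance of a class."""
    text: str      # the annotated text region
    instance: str  # instance identifier (illustrative names, not real URIs)
    cls: str       # ontology class of the instance


@dataclass
class Cell:
    """Subject or object cell: a direct reference or a link to an annotation."""
    direct: Optional[str] = None
    link: Optional[InstanceReference] = None

    def resolve(self) -> Optional[str]:
        # A linked cell follows its annotation, so retargeting the
        # annotation changes the triple too. An empty cell resolves to None.
        return self.link.instance if self.link is not None else self.direct


@dataclass
class Triple:
    """One row of a triple bag; any cell may still be empty."""
    subject: Cell = field(default_factory=Cell)
    predicate: Optional[str] = None
    obj: Cell = field(default_factory=Cell)


@dataclass
class Property:
    name: str
    domain_cls: str
    range_cls: str


def candidate_properties(properties: List[Property],
                         subject_cls: Optional[str] = None,
                         object_cls: Optional[str] = None,
                         enforce: bool = True) -> List[Property]:
    """Properties a chooser would offer for a partly filled row."""
    if not enforce:  # the user relaxed the consistency filter
        return list(properties)
    return [p for p in properties
            if (subject_cls is None or p.domain_cls == subject_cls)
            and (object_cls is None or p.range_cls == object_cls)]


# Hypothetical instances and properties, loosely modeled on Figure 2.
cache = InstanceReference("weapons cache", "local:cache-1", "WeaponsCache")
bagram = InstanceReference("BAGRAM", "ext:Bagram", "City")
row = Triple(subject=Cell(link=cache), obj=Cell(link=bagram))

props = [Property("locatedIn", "WeaponsCache", "City"),
         Property("memberOf", "Person", "Organization")]
offered = [p.name for p in candidate_properties(props, cache.cls, bagram.cls)]
row.predicate = offered[0]

bagram.instance = "ext:Kabul"  # retarget the annotation; the linked cell follows
print(offered, row.obj.resolve())
```

Because the object cell stores a link rather than a copy, changing the instance behind the "BAGRAM" annotation is immediately visible through the triple, which is the maintainability benefit the text attributes to this indirection.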
Locally defined instances are instances that cannot be referenced from outside the document. Provisional instances are an artifact to postpone the identification of an instance that is being used to describe relationships. Ultimately, provisional instances must be replaced by references to external or locally defined instances. SemanticWord keeps track of the provisional instances and assists users in replacing them.

One obstacle that we noticed in other systems when composing a triple is that the role of the instances in the triple cannot be established before examining the definition of the predicate property. For example, determining who is the subject and who is the object in the relationship between an employee and her employer depends on how the property that relates both of them is (arbitrarily) defined. Assigning an instance to the subject or the object of a triple prematurely might preclude the possibility of establishing the relationship. To avoid this problem in SemanticWord, the property chooser can optionally list reversed properties. Reversed properties are ordinary properties that assume the subject and the object of a triple are switched. Reversed properties are only an artifact to add another degree of freedom in the order in which the triple arguments are filled: the generated DAML markup switches the subject and object of a triple when a reversed property was selected.

Figure 2. Fragment of an annotated document. The circular icon containing an I Bar (like that adjacent to “BAGRAM”) references an external instance from the semantic web. A smiley face icon (like that adjacent to “weapons cache”) references a locally defined instance. The boxed legend below the “weapons cache” instance reference is its tool tip. If the instance icon in the subject or object column of a table is overlapped by a small arrow in its lower left corner (like the one in the object column of the first row), then the cell is linked to an instance reference annotation. Modifying or deleting the linked instance reference will affect the triple too. If the instance icon is not overlapped by a small arrow (like the one in the subject column of the first row), then the cell contains a direct reference to an instance.

Taming Annotation Authoring
SemanticWord was conceived with the goal of minimizing the burden involved in authoring semantic annotations. This burden is reduced through several techniques.

Non Intrusive Annotation Environment
SemanticWord provides an environment for authoring semantic annotations that is tightly integrated with MS Word. Word is the most widely adopted product for authoring text documents. SemanticWord includes a set of tools that economize the production of semantic descriptions and exploit opportunities for the simultaneous generation of text and annotations. Two examples of these tools are personal class toolbars and the cascading class menus, both illustrated in Figure 4.

Personal Class Toolbars: Personal class toolbars are a convenient tool for generating both content and annotations together with just one mouse click. Users can create any number of personal class toolbars, each one of them tied to a single class. Each personal class toolbar contains an instance selection combo box and buttons to create instance references corresponding to the selected instance or a new one. If at the time the user creates an instance reference the document contains a selected region of text, then the instance reference will be attached to that region. If no text is currently selected, then the “label” of the instance reference will be inserted in the document at the current text insertion point, and the new reference will be associated with the inserted text.

Personal class toolbars save effort when a small percentage of classes or instances account for a substantially larger percentage of the instance references that an author will need.

Classes Cascading Menu: A cascading class menu includes an entry for every named class in the ontology attached to the document. This menu gives users access to most of the operations related to ontology classes, including defining new instances, creating personal class toolbars, and opening instance choosers. When a user executes
any of these functions from this menu, the menu entry corresponding to the selected class is duplicated and placed at the top of the menu so the user can access it easily the next time that she needs it. The cascading hierarchy is determined by the subclass hierarchy of the ontology. Classes with multiple superclasses appear in the cascade under each superclass.

Figure 4. Toolbars and Menus. The last two toolbar rows belong to SemanticWord. The first row contains two juxtaposed personal class toolbars. The first one is tied to the class “Terrorist Organization” and has selected the instance “al Qaeda”. The second one is tied to “Country” and has selected “Afghanistan”. Clicking on the Check button will generate both the text and the annotation corresponding to the selected instance. The other buttons are for defining new instances before inserting their text and annotation. The last toolbar row has its classes cascading menu opened. This menu provides access to several class related functions. The most recently chosen classes get added to the top of the menu (like Weapon, Terrorist Organization, and Country) for easy access.

Direct Manipulation of Annotations: Direct manipulation of annotations is another method of simplifying the production of semantic annotations. In SemanticWord users can compose semantic annotations by manipulating other annotations that are placed within the document. For example, the subject and object of a triple can be filled by dragging instance reference annotations over the triple. For some users this method is faster and more natural than searching for those same instances in instance browsers.

Flexible commitment order
Authors should not be forced to follow a strict order in carrying out the many steps involved in authoring semantic descriptions. Many of the features that support this principle have been introduced before. These features are summarized in this section.

    •    Elements of a triple can be entered in any order. Even the determination of which instance is the subject and which is the object can be postponed by means of the reversed properties. New instances can be created from the instance choosers, avoiding a disruption of the triple’s composition process. Unlike other annotation tools, triples are laid out in a tabular structure rather than in a tree or other structures that impose a topological dependency among their nodes.
    •    Consistency is not enforced. A user is free to compose a triple that violates ontology constraints. The user can make the changes that would fix this conflict at a time convenient to her. Consistency is taken into account when filtering suggested choices for composing a triple, but the user can deactivate these filters with a single button click.
    •    Instance identification can be postponed but the instance can still be used to describe relationships. This is achieved through the use of provisional instances, which can be used wherever definitive instances can but remind the user of the unconcluded task. SemanticWord will assist users in assigning identity to these instances.

Figure 3. Property and Instance Choosers. Panels shown: Property Chooser; Instance Chooser (Object). The choices correspond to the filling of the property and object columns of the second row of the triples table of Figure 2. The listed choices are constrained by the content of the other cells of the selected triple. These filters can be relaxed by toggling the buttons on the top toolbars. The instance chooser also supports the definition of new instances.

Annotation Reuse
Annotations are attached to text regions and are reused when those regions are reused. In particular, annotations are carried along in text cut/copy and paste operations and when fragments of a document are reused elsewhere in the same document or in other documents based on the same ontology.

Automatic Information Extraction
SemanticWord integrates an information extraction system (IES). Automatic information extraction technology promises to significantly reduce the human overhead involved in the semantic annotation task. Although this technology has not reached the level of sophistication required to capture deep relationships in text ([5][7]), it can provide useful annotation fragments. The approach taken in SemanticWord is to supply the tools that allow users to augment the annotation provided by an IES.

SemanticWord uses AeroDAML, an IES developed at Lockheed Martin [7]. AeroDAML processes text and produces DAML markup that relates instances and values to ontology classes and types. AeroDAML relies on a high performance commercial information extraction system called AeroText. The default AeroDAML is based on the default AeroText, which includes “domain independent” extraction rules capable of extracting many proper nouns and frequently occurring relations. AeroText, and consequently AeroDAML, can be tailored to particular domains through training sessions with annotated corpora.

SemanticWord provides an environment for refining and augmenting the results of IESs. We observed that the default AeroDAML does a good job at recognizing and categorizing proper nouns, but its classification tends to be overly general. It also fails to recognize most of the relations between instances. For example, AeroDAML succeeds in classifying Kabul as a Place but fails to find the more specific class City, perhaps because there was nothing in the text that might clue AeroDAML in to this fact. SemanticWord lets AeroDAML recognize and classify proper nouns but expects the user to refine the classification and to specify their relationships.

SemanticWord drives the information extraction process on the fly. As the user types the content of the document, a background thread feeds new or modified text to AeroDAML in paragraph units (roughly), obtains the extracted entities with their position in the text, and underlines those text regions with a blue wiggly line. This procedure is carried out in a way that resembles Word’s spelling and grammar checking and is implemented in terms of Microsoft SmartTags technology.

The user can examine the extracted entities and convert them into instance reference annotations. As part of this conversion the user has the option of refining the extracted type. Once an extracted entity has been transformed into an instance reference, it behaves just like a natively created instance reference. In particular, it can be dragged and dropped onto cells of triple bags to describe the relationships that AeroDAML missed.

Annotated Templates
Annotated text templates reduce the amount of work involved in authoring both semantic annotations and document content. A template consists of a text fragment annotated with semantic and template related descriptions, and persists as a (typically quite small) Word document. A template may be inserted into a document just like any other document. Both the text and annotations of the template are copied into the target document. After insertion, the copy can still be subjected to further editing and annotating.

Templates are authored in SemanticWord in template design mode. All annotation tools described previously are also available for annotating templates in template design mode, plus an additional toolbar that includes the template specific authoring tools described below. We expect that non-programmers would be able to author templates.

Instance Placeholder: An instance placeholder annotates a region of text that needs to be replaced by an instance reference when the template is used in a document. It also serves as the surrogate for an instance reference, and as such, it can participate as the subject or the object of one or more triples in the template’s triple bags.

An instance placeholder is rendered like an instance reference annotation but with a different icon. In design mode this icon can be dragged over triple tables to compose the semantic annotations that describe the template. It can also be dragged over another instance placeholder to specify a co-reference requirement. In instantiation mode, this icon is a drop site for the concrete instance that is going to be bound to the instance placeholder.

When an instance placeholder is bound to an instance reference, the label of the instance reference replaces the template’s text and all co-referential instance placeholders are bound to that instance.

Optional group: An optional group delimits a region of
text in the template that can be optionally included in the instantiation of the template. The text delimited by an optional group can contain annotations and other groups. In particular, it can contain instance placeholders. Opting to delete an optional group from an instantiated template will automatically remove any triples having a cell linked to an instance placeholder within the deleted group.

Repeated group: Like an optional group, a repeated group annotation delimits a region of text and can also contain other groups and annotations. During instantiation the user can ask that a repeated group be replicated any number of times. Each replication of the group creates its own incarnation of the instance placeholders that it contains. When the group is replicated, all triples with cells linked to the instance placeholders contained in the group are replicated as well.

The utility of annotated templates is enhanced by the IES described above. The IES analyzes the document and generates instance reference annotations corresponding to the concrete entities mentioned in the text. These instance references can be dragged over the template instance placeholders to instantiate the template and generate instantiated triples describing their relationships.

ANNOTATING TEXT REGIONS
In SemanticWord, semantic descriptions are distributed throughout the document and attached to text regions that “support” their content. This is not a requirement for the semantic web. Most of the semantic markup authoring tools reported in the literature do not adopt this practice. The descriptions they produce are associated only with a document, not with portions of that document.

We speculate that relating a semantic description to the text that supports it has advantages in terms of annotation authoring, reuse, maintenance, and validation. However, we also recognize that this practice might introduce unnecessary complications.

Some of the advantages of attaching semantic descriptions
    •    Markup that is tied to text fragments disappears if the text fragment, or a region containing it, is deleted. Generally, this is desirable because the document’s content no longer supports the statement formalized by the deleted annotation.

Among the difficulties of this approach we found:

    •    If an entity (e.g., a person or place) is mentioned several times within the text, it might be necessary to duplicate its annotation too.
    •    Some concepts might be implicit or too abstract to be located in the text.
    •    As changes are made to text within an annotated region (particularly at its boundaries), heuristics must be used to adjust the boundaries. The use of paired brackets for rendering these regions keeps the user informed of the result of these heuristics.

Although SemanticWord is biased toward the attachment of semantic annotations to text, it does not mandate it, opening a whole spectrum of hybrid compromises. For example, authors might choose to attach instance references to text but to describe their relationships in a single global triple bag. Moreover, not even the instance reference annotations are required because the triples can be filled directly from instance choosers. More serious use of SemanticWord will be required to weigh the pros and cons of this approach.

RELATED WORK
Research in semantic annotations is still in its infancy. A number of systems have been developed to date that demonstrate different capabilities. However, the approaches adopted by these systems do not necessarily compete against each other but rather address different issues.

Ont-O-Mat ([4][5]), one of the first annotation systems to appear, is the concrete implementation of CREAM [4], an annotation and content authoring framework conceived for the easy creation of relational metadata (i.e., relations between instances). Ont-O-Mat includes its own HTML
to text are:                                                    document editor for viewing and composing the content of
                                                                the document being annotated and an ontology and fact
    •    Descriptions can be reused if the text is reused.      browser for visualizing the markup collected by a crawler
         Annotations are carried over along text cut/copy       and for authoring the markup that annotates a document.
         and paste operations and when document frag-           Like SemanticWord, Ont-O-Mat also provides mechanisms
         ments are reused in other documents.                   that simplify the creation of markup, document content, or
    •    Conformity between the semantic descriptions and       both. For example, dragging text from the document editor
         the content of the document can more easily be         and dropping it on top of a class listed in the ontology and
         validated and maintained.                              fact browser could automatically create an instance of that
    •    Authors might find it natural to find annotations      class with the dragged text filling some property of the
         by finding, through familiar text search/scroll        created instance (e.g., its name). Similarly, dragging an
         mechanisms, the text to which the annotations are      instance listed in the ontology and fact browser and drop-
         attached. Contrast this with browsing the semantic     ping it at some location within the document editor could
         markup directly. For example, in SemanticWord          insert in that location the text corresponding to the filler of
         authors compose triples by dragging around in-         some property of the dropped instance and eventually
         stance references placed within the text.              could attach to that text a hyperlink that describes the in-
                                                                stance further. A meta ontology specifies the type of ac-
tions to be carried out through the dragging and dropping        CONCLUSIONS
operations.                                                      SemanticWord integrates into a widely used COTS product
S-CREAM [5] extends the CREAM framework with an                  an environment for authoring document content and anno-
information extraction component for the semi-automatic          tations. It includes several features intended to minimize
generation of annotations. In S-CREAM manual annotation          the cost involved in authoring semantic annotations: cus-
is supported by Ont-O-Mat while automatic information            tomizable tools for generating content and annotations si-
extraction is supported by Amilcare [1], an adaptive infor-      multaneously, direct manipulation of annotations embed-
mation extraction system (IES). Because the IES is unable        ded in the document, reusable annotations, annotated text
to capture relationships in a graph that connects the indi-      templates, and an information extraction system including
viduals described in the text, the output of the IES has to be   support for refining and augmenting its output.
mapped into a Discourse Representation (dependent on the
domain) before generating a set of markup hypotheses.            REFERENCES
This technique is still very rudimentary.                        [1] Ciravegna, F. Adaptive Information Extraction from
SMORE [6] provides an environment for composing the                  Text by Rule Induction and Generalisation, Proc. of
content and the inline semantic annotation of web pages,             17th International Joint Conference on Artificial Intel-
email, and other online documents. Like SemanticWord,                ligence (IJCAI 2001) , Seattle, August 2001.
SMORE aims to support semantic annotation without dis-           [2] Ciravegna, F., Dingli, A., Petrelli, D., and Wilks, Y.,
rupting the document creation process. Toward this end               Timely and Non-Intrusive Active Document Annotation
SMORE supports practices like using place holders to de-             via Adaptive Information Extraction, in Semantic Au-
fer the final determination of the markup, referencing mul-          thoring, Annotation & Knowledge Markup (SAAKM
tiple ontologies that can be brought to bear when the need           2002), ECAI 2002 Workshop, July 22-26, 2002 ,
arises, and extending ontologies if none of the known on-            Lyon, France.
tologies fit the user needs. SMORE also integrates several       [3] Connolly, D, van Harmelen, F., Horrocks, I, McGuin-
unique capabilities, like the ability to annotate parts of im-       ness, D., Patel-Schneider, P., and Stein, L.
ages using SVG, an advance ontology search capability,               DAML+OIL (March 2001) ReferenceDescription,
web scraping, and a Semantic Virtual Portal that provides            W3C Note 18 December 2001
links to semantically related material                               http://www.w3.org/TR/daml+oil-reference.
MnM [8] and Melita [2] are environments that streamline          [4] Handschuh, S and Staab, A. Authoring and Annotation
the automatic production of semantic annotations using an            of Web Pages in CREAM, in Proceedings of the
information extraction system (IES). The process supported           WWW2002 - Eleventh International World Wide Web
by these systems comprises several activities, including             Conference, Hawaii, USA, May 2002.
manually annotating web pages (for training the IES),            [5] Siegfried Handschuh, Stephen Staab, Fabio Ciravegna,
training the IES using the annotated pages, tuning the per-          S-CREAM -- Semi-automatic CREAtion of Metadata,
formance of the trained system, and running the IES to               in Semantic Authoring, Annotation & Knowledge
automatically annotate a set of pages. MnM implements a              Markup (SAAKM 2002), ECAI 2002 Workshop, July
generic process model which is also generic with respect to          22-26, 2002 , Lyon, France.
the specific ontology server and information extraction tool
used. Melita is a demonstration system that seamlessly in-       [6] Aditya Kalyanpur, James Hendler, Bijan Parsia, Jenni-
tegrates manual annotation, incremental training, and auto-          fer Golbeck, SMORE - Semantic Markup, Ontology,
matic information extraction in a timely and non-intrusive           and RDF Editor, available at
way. These systems have demonstrated that is possible to             http://www.mindswap.org/papers/SMORE.pdf
highly automate the generation of semantic annotations.          [7] Paul Kogut, William Holmes, AeroDAML: Applying
Unfortunately, the scope of these annotations is restricted          Information Extraction to Generate DAML Annota-
to only filling in one information template per document.            tions from Web Pages , First International Conference
Both systems use Amilcare as their IES.                              on Knowledge Capture (K-CAP 2001), Workshop on
Among the described annotation tools only SemanticWord               Knowledge Markup and Semantic Annotation, Victo-
provides an environment for document authoring and se-               ria, B.C. October 21, 2001 AeroDAML.
mantic annotation that extends a COTS product that au-           [8] Maria Vargas-Vera, Enrico Motta, John Domingue,
thors have already adopted (MS Word). SemanticWord is                Mattia Lanzoni, Arthur Stutt1, and Fabio Ciravegna,
also the only one that associates semantic annotations               MnM: Ontology-Driven Tool for Semantic Markup, in
within text regions and consequently facilitates annotation          Proceedings of ECAI 2002, July, 2002 , Lyon, France.
reuse and maintainability.