<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Web based Knowledge Extraction and Consolidation for Automatic Ontology Instantiation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harith Alani</string-name>
          <email>ha@ecs.soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanghee Kim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David E. Millard</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark J. Weal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wendy Hall</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul H. Lewis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nigel Shadbolt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>I.A.M. Group, ECS Dept., University of Southampton, Southampton</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Extraction</kwd>
        <kwd>Ontology Instantiation</kwd>
        <kwd>Knowledge Consolidation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Web pages are the source of vast amounts of knowledge.
This knowledge is often buried by layers of text and
scattered over numerous sites. Associating web pages with
annotations to identify their knowledge content is the ambition
of the Semantic Web [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Much research is now focused on
developing ontologies to manipulate this knowledge and
provide a variety of knowledge services. Automatically
instantiating ontologies and building knowledge bases (KB)
with knowledge extracted from the web corpus is therefore
very beneficial. Artequakt is concerned with automating
ontology instantiation with knowledge triples (subject -
relation - object) about the life and work of artists, and
providing this knowledge for biography generation services.
When analysing and extracting information from
multi-sourced documents, it is inevitable that duplicated and
contradictory information will be extracted. Handling such
information is challenging for automatic extraction and
ontology instantiation approaches [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Artequakt applies a set of
heuristics and reasoning methods in an attempt to
distinguish conflicting information, to verify it, and to identify
and merge duplicate assertions in the KB automatically.
This paper describes the main components of the Artequakt
system, focusing on the latest developments with respect to
knowledge consolidation and ontology instantiation.
      </p>
      <p>Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.</p>
      <p>K-CAP’03, October 23-25, 2003, Sanibel Island, FL, USA.</p>
      <p>Copyright 2003 ACM 1-58113-000-0/00/0000…$5.00</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>Extracting information from web pages to generate various
reports is becoming the focus of much research. The closest
work to Artequakt that we found is in the area of text
summarisation. A number of summarisation techniques have been
described to help bring together important pieces of
information from documents and present them to the user in a
compact form.</p>
      <p>
        Even though most summarisation systems deal with single
documents, some have targeted multiple resources [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ][
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
Statistics-based summarisation techniques tend to be domain
independent, but lack the sophistication required for merging
information from multiple documents [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. On the other
hand, Information Extraction (IE) based summarisations are
more capable of extracting and merging information from
various resources, but due to the use of IE, they are often
domain dependent.
      </p>
      <p>
        Radev developed the SUMMONS system [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] to extract
information and generate summaries of individual events
from MUC (Message Understanding Conferences) text
corpora. The system compares information extracted from
multiple resources, merges similar content and highlights
contradictions. However, as in most IE-based systems,
information merging is often based on linguistic and timeline
comparison of single events [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ][
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] or multiple events
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>Artequakt’s knowledge consolidation is based on the
comparison of individual knowledge fragments, rather than
linguistic analyses or timeline comparison. Furthermore,
Artequakt’s consolidation is more fine-grained, focusing on the
comparison and merging of individual entities (e.g. places,
people, dates).</p>
      <p>
        Most traditional IE systems are domain dependent due to
the use of linguistic rules designed to extract information of
specific content (e.g. bombing events (MUC systems),
earthquake news [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], sports matches [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]). Adaptive IE
systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] can ease this problem by identifying new
extraction rules induced from example annotations supplied
by users. However, training such tools can be difficult and
time consuming. Promising results are offered by more
advanced adaptive IE tools, such as Armadillo [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which
discovers new linguistic and structural patterns automatically,
thus requiring limited bootstrapping.
      </p>
      <p>
        Using ontologies to back up IE is hoped to support
information integration [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and increase domain portability
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Poibeau [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] investigated increasing domain
independence by using clustering methods on text corpora to
help users construct primitive ontologies that represent the
main corpus topics. Templates could then be generated
from the ontology to guide the IE process. Ontologies
produced by this approach are limited to the content of the
corpus, rather than representing a specific domain. In some
cases (such as in Artequakt) the corpus is very large and
diverse (e.g. the Web). Creating ontologies from such a
corpus is infeasible. Furthermore, these ontologies are likely to
be rough, shallow, and include undesired concepts that
happen to be in the text corpus. Consequently, the cost of
bringing such ontologies into shape might exceed the benefit.
Instantiating ontologies with assertions from textual
documents can be a very laborious task. A number of tools have
been developed that instantiate ontologies
semi-automatically with user-driven annotations [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. IE learning tools,
such as Amilcare [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], can be used to automate part of the
annotation process and speed up ontology instantiation
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>ARTEQUAKT</title>
      <p>The Artequakt project has implemented a system that
searches the Web and extracts knowledge about artists,
based on an ontology describing that domain, and stores
this knowledge in a KB to be used for automatically
producing personalised biographies of artists. Artequakt draws
from the expertise and experience of three separate
projects: Sculpteur<sup>1</sup>, Equator<sup>2</sup>, and AKT<sup>3</sup>. The main
components of Artequakt are described in the following sections.</p>
      <p><bold>System Overview</bold></p>
      <p>The architecture is designed to allow different
approaches to information extraction to be incorporated,
with the ontology acting as a mediation layer between
the IE and the KB.</p>
      <sec id="sec-3-1">
        <title>1 http://www.sculpteurweb.org/</title>
      </sec>
      <sec id="sec-3-2">
        <title>2 http://www.equator.ac.uk/</title>
      </sec>
      <sec id="sec-3-3">
        <title>3 http://www.aktors.org/</title>
        <p>Currently we are using textual
analysis tools to scrape web pages for knowledge, but with
the increasing proliferation of the semantic web, additional
tools could be added that take advantage of any
semantically augmented pages passing the embedded
knowledge through the KB.</p>
        <p>As well as keeping open the interface between the KB
and the extraction technology, a clear separation has
been kept between the creation of a structured document
from the knowledge base and the rendering of that
document. In the current system, the information is
rendered into an HTML page, but alternative rendering
engines could be envisaged. For example, rather than
presenting the biography as a linear textual document, the
information might be rendered into a dynamic
presentation system such as SMIL, converted into an audio
stream using text to speech tools, or perhaps used to
generate a dynamic hypertext with links referring back
to queries to the KB on items such as artists' names.</p>
        <preformat>&lt;kb:Person rdf:about="&amp;kb;Person_1"
           kb:name="Pierre-Auguste Renoir"
           rdfs:label="Person_1"&gt;
  &lt;kb:date_of_birth rdf:resource="&amp;kb;Date_1"/&gt;
  &lt;kb:place_of_birth rdf:resource="&amp;kb;Place_1"/&gt;
  &lt;kb:has_father rdf:resource="&amp;kb;Person_2"/&gt;
  &lt;kb:has_information_text rdf:resource="&amp;kb;Paragraph_1"/&gt;
&lt;/kb:Person&gt;
&lt;kb:Date rdf:about="&amp;kb;Date_1"
         kb:day="25"
         kb:month="2"
         kb:year="1841"
         rdfs:label="Date_1"&gt;
&lt;/kb:Date&gt;
&lt;kb:E53.Place rdf:about="&amp;kb;Place_1"
              kb:name="Limoges"
              rdfs:label="Place_1"/&gt;
&lt;kb:Person rdf:about="&amp;kb;Person_2"
           rdfs:label="Person_2"&gt;
  &lt;kb:has_work_information rdf:resource="&amp;kb;Work_information_1"/&gt;
&lt;/kb:Person&gt;
&lt;kb:Work_information rdf:about="&amp;kb;Work_information_1"
                     kb:job_title="tailor"
                     rdfs:label="Work_information_1"&gt;
&lt;/kb:Work_information&gt;</preformat>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Artequakt Ontology</title>
      <p>For Artequakt the requirement was to build an ontology
to represent the domain of artists and artefacts. The
main part of this ontology was constructed from selected
sections in the CIDOC Conceptual Reference Model
(CRM<sup>4</sup>) ontology. The CRM ontology is designed to
represent artefacts, their production, ownership,
location, etc.</p>
      <sec id="sec-4-1">
        <title>4 http://cidoc.ics.forth.gr/index.html</title>
        <p>This ontology was modified for Artequakt and enriched
with additional classes and relationships to represent a
variety of information related to artists, their personal
information, family relations, relations with other artists,
details of their work, etc. The Artequakt ontology and
KB are accessible via an ontology server.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>KNOWLEDGE EXTRACTION</title>
      <p>
        The aim of our knowledge extraction tool is to identify
and extract knowledge triples from text documents and
to provide them as RDF files for entry into the KB [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Artequakt uses an ontology coupled with a
general-purpose lexical database (WordNet) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and an
entity recogniser (GATE) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as guidance tools for identifying
knowledge fragments.
      </p>
      <p>Artequakt attempts to identify not just entities, but also their
relationships following ontology relation declarations and
lexical information.</p>
    </sec>
    <sec id="sec-6">
      <title>Extraction Procedure</title>
      <p>The extraction process is launched when the user requests a
biography for a specific artist that is not in the KB. The
query is passed to selected web search engines and the
search results are analysed with respect to relevancy to the
domain of artists.</p>
      <p>Each selected document is then divided into paragraphs and
sentences. Each sentence is analysed syntactically and
semantically to identify any relevant knowledge to extract.
Below is an example of an extracted paragraph:</p>
      <p>"Pierre-Auguste Renoir was born in Limoges on February
25, 1841. His father was a tailor and his mother a
dressmaker."</p>
      <p>Annotations provided by GATE and WordNet highlight
that ‘Pierre-Auguste Renoir’ is a person’s name,
‘February 25, 1841’ is a date, and ‘Limoges’ is a location.
Relation extraction is determined by the categorisation
result of the verb ‘bear’, which matches two
potential relations in the ontology: ‘date_of_birth’ and
‘place_of_birth’. Since these relations are associated
with ‘February 25, 1841’ and ‘Limoges’ respectively,
this sentence generates the following knowledge triples
about Renoir:</p>
      <list list-type="bullet">
        <list-item><p>Pierre-Auguste Renoir date_of_birth 25/2/1841</p></list-item>
        <list-item><p>Pierre-Auguste Renoir place_of_birth Limoges</p></list-item>
      </list>
      <p>The second sentence generates knowledge triples related
to Renoir’s family:</p>
      <list list-type="bullet">
        <list-item><p>Pierre-Auguste Renoir has_father Person_2</p></list-item>
        <list-item><p>Person_2 job_title Tailor</p></list-item>
        <list-item><p>Pierre-Auguste Renoir has_mother Person_3</p></list-item>
        <list-item><p>Person_3 job_title Dressmaker</p></list-item>
      </list>
      <p>
        Inaccurately extracted knowledge may reduce the
quality of the system’s output. For this reason, our extraction
rules were designed to be low-risk in order to achieve
higher extraction precision. Advanced consistency
checks can help identify some extraction inaccuracies,
e.g. a date of marriage that precedes the date of birth, or two
unrelated places of birth for the same person.
The extraction process terminates by sending the
extracted knowledge to the ontology server. Figure 2 is the
RDF representation of the extracted knowledge.
Artequakt’s IE process is out of the scope of this paper, and
is fully described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>BIOGRAPHY GENERATION</title>
      <p>Once the information has been extracted, stored and
consolidated, the Artequakt system repurposes it by
automatically generating biographies of the artists.
Figure 3 shows a biography of Renoir.</p>
      <p>
        The biographies are based on templates authored in the
Fundamental Open Hypermedia Model (FOHM) and
stored in the Auld Linky contextual structure server
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Each section of the template is instantiated with
paragraphs or sentences generated from information in
the KB. The KB informs the templates of the theme of
the sentences and paragraphs (e.g. influences, family
info, painting) and the generation tool selects the relevant
ones and structures them in the desired form and order.
Very little text generation is used in the current
implementation (e.g. Figure 3, 1st and last sentences), but this
will be the focus of the next phase.
      </p>
      <p>
        By storing conflicting information rather than discarding
it during the consolidation process, the opportunity
exists to provide biographies that set out arguments as to
the facts (with provenance, in the form of links to the
original sources) by juxtaposing the conflicting
information and allowing the reader to make up their own mind.
Different templates can be constructed for different
types of biography. Two examples are the summary
biography, which provides paragraphs about the artist
arranged in a rough chronological order, and the fact
sheet, which simply lists a number of facts about the
artist, e.g. date of birth, place of study, etc. The
biographies also take advantage of the structure server’s ability
to filter the template based on a user’s interest. If the
reader is not interested in the family life of the artist the
biography can be tailored to remove this information.
More about Artequakt’s biography generation is
available at [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>AUTOMATIC INSTANTIATION</title>
      <p>
        Storing knowledge extracted from text documents in
KBs offers new possibilities for further analysis and
reuse. Ontology instantiation refers to the insertion of
information into the KB, as described by the ontology
(sometimes referred to as ontology population).
Instantiating ontologies with a high quantity and quality of
knowledge is one of the main steps towards providing
valuable and consistent ontology-based knowledge
services. Manual ontology instantiation is very labour
intensive and time consuming. Some semi-automatic
approaches have investigated creating document
annotations and storing the results as assertions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref20">20</xref>
        ][
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] describe two frameworks for user-driven
ontology-based annotations, reinforced with the IE
learning tool Amilcare [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, the two frameworks
are manually driven and mainly focus on entity
annotations. They lack the capability of identifying
relationships reliably. In [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], relationships were added
automatically between instances, but only if these
instances already existed in the KB; otherwise, user
intervention was required.
      </p>
      <p>In Artequakt we investigate the possibility of moving
towards a fully automatic approach of feeding the
ontology with knowledge extracted from unstructured text.
Information is extracted in Artequakt with respect to a
given ontology and provided as RDF or XML files using
tags mapped directly from names of classes and
relationships in that ontology. When the ontology server
receives a new RDF file, a feeder tool is activated to
parse the file and add its knowledge triples to the KB
automatically. Once the feeding process terminates, the
consolidation tool searches for and merges any
duplicates in the KB.</p>
    </sec>
    <sec id="sec-9">
      <title>KNOWLEDGE BASE CONSOLIDATION</title>
      <p>
        Automatically instantiating an ontology from diverse
and distributed resources poses significant challenges.
One persistent problem is that of the consolidation of
duplicate information that arises when extracting similar
or overlapping information from different sources.
Tackling this problem is important to maintain the
referential integrity and quality of results of any
ontology-based knowledge service. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] relied on manually
assigned object identifiers to avoid duplication when
extracting from different documents.
      </p>
      <p>
        Little research has looked at the problem of information
consolidation in the IE domain. This problem becomes
more apparent when extracting from multiple
documents. Comparing and merging extracted information is
often based on domain dependent heuristics [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Our approach attempts to identify inconsistencies
and consolidate duplications automatically using a set of
heuristics and term expansion methods based on
WordNet [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p><bold>Duplicate Information</bold></p>
      <p>There exist two main types of duplication in our KB:
duplicate instances (e.g. multiple instances representing the
same artist), and duplicate attribute values (e.g. multiple
dates of birth extracted for the same artist).</p>
      <p>Artequakt’s IE tool treats each recognised entity (e.g.
Rembrandt, Paris) as a new instance. This may result in
creating instances with overlapping information (e.g.
two Person instances with the same name and date of
birth). The role of consolidation in Artequakt includes
analysing and comparing attribute values of the
instances of each type of concept in the KB (e.g. Person,
Date) to identify inconsistencies and duplications.
The amount of overlap between the attribute values of
any pair of instances could indicate their duplication
potential. However, this overlap is not always
measurable. IE tools are sometimes only able to extract
fragments of information about a given entity (e.g. an artist),
especially if the source document or paragraph is small
or difficult to analyse. This leads to the creation of new
instances with only one or two facts associated with
each. For example, consider two artist instances with the name
Rembrandt, where one instance has a location
relationship to Holland, while the other has a date of birth of
1606. Comparing such shallow instances will not reveal
their duplication potential. Furthermore, neither the
source information nor the information extraction is
always accurate. For example a Rembrandt instance can
be extracted with the correct family attribute values, but
with the wrong date of birth, in which case this instance
will be mismatched with other Rembrandt instances in
spite of referring to the same artist.</p>
      <p><bold>Unique Name Assumption</bold></p>
      <p>One basic heuristic applied in Artequakt is that artist
names are unique: artist instances with identical
names are merged. According to this heuristic, all
instances with the name Rembrandt are combined into one
instance. This heuristic is obviously not foolproof, but
it works well in the limited domain of artists.</p>
      <p><bold>Information Overlap</bold></p>
      <p>There are cases where the full name of an artist is not
given in the source document or its extraction fails, in
which case the instance will not be captured by the unique-name
heuristic. For example, when we extracted information
about Rembrandt and merged same-name artists, two
instances remained for this artist: Rembrandt and
Rembrandt Harmenszoon van Rijn. In such a case we
compare certain attribute values, and merge the two
instances if there is sufficient overlap. For the two
Rembrandt instances, both had the same date and place of
birth, and therefore were combined into one instance.
The duplication would not have been caught if these
attributes had different values.</p>
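      <p>The two instance-level heuristics (unique names, then information overlap) can be sketched as follows. This is a minimal illustration and not Artequakt's implementation: the Person class, the names_compatible test (the tokens of one name are contained in the other) and the min_overlap threshold of two shared attribute values are all assumptions introduced here, since the text only says that instances are merged when there is "sufficient overlap".</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Person:
    # Hypothetical instance record: attribute name -> set of extracted values.
    name: str
    facts: dict = field(default_factory=dict)


def merge_into(target, source):
    """Copy every attribute value of source into target."""
    for attr, values in source.facts.items():
        target.facts.setdefault(attr, set()).update(values)


def names_compatible(a, b):
    """Assumed test: one name's tokens are a subset of the other's."""
    ta, tb = set(a.split()), set(b.split())
    return ta.issubset(tb) or tb.issubset(ta)


def overlap(a, b):
    """Number of attributes on which the two instances share a value."""
    return sum(
        1 for attr in a.facts
        if a.facts[attr].intersection(b.facts.get(attr, set()))
    )


def consolidate(people, min_overlap=2):
    # Step 1: unique-name heuristic, merge instances with identical names.
    by_name = {}
    for p in people:
        if p.name in by_name:
            merge_into(by_name[p.name], p)
        else:
            by_name[p.name] = Person(p.name, {a: set(v) for a, v in p.facts.items()})

    # Step 2: information overlap between instances with compatible names.
    result = []
    for cand in by_name.values():
        for kept in result:
            if names_compatible(kept.name, cand.name) and overlap(kept, cand) >= min_overlap:
                merge_into(kept, cand)
                if len(cand.name) > len(kept.name):
                    kept.name = cand.name  # keep the fuller name
                break
        else:
            result.append(cand)
    return result
```
</preformat>
      <p>With this sketch, two shallow "Rembrandt" instances are first merged by name, and the result is then merged with "Rembrandt Harmenszoon van Rijn" only if they share both a date and a place of birth. If either value differed, the duplication would go undetected.</p>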
      <p><bold>Attribute Comparison</bold></p>
      <p>When the above heuristics are applied, merged instances
might end up having multiple attribute values (e.g.
multiple dates and places of birth), which in turn need to be
analysed and consolidated. Note that some of these
attributes might hold conflicting information that should
be verified and held for future comparison and use.
Comparing the values of instance attributes is not
always straightforward, as these values are often extracted
in different formats and at different specificity levels (e.g.
synonymous place names, different date styles), making them
harder to match. Artequakt applies a set of heuristics
and expansion methods in an attempt to match these
values. Consider the following sentences:</p>
      <list list-type="order">
        <list-item><p>Rembrandt was born in the 17th century in Leyden.</p></list-item>
        <list-item><p>Rembrandt was born in 1606 in Leiden, the Netherlands.</p></list-item>
        <list-item><p>Rembrandt was born on July 15 1606 in Holland.</p></list-item>
      </list>
      <p>These sentences provide the same information about an
artist, written in different formats and at different specificity levels.
Storing this information in the KB in such different
formats is confusing for the biography generator, which can
benefit from knowing which information is repetitive
and which is contradictory. Matching the above
sentences required enriching the original ontology with
some temporal and geographical reasoning.</p>
      <p>
        <bold>Geographical Consolidation</bold>
      </p>
      <p>
There has been much work on developing gazetteers of
place names, such as the Thesaurus of Geographic
Names (TGN) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Alexandria Digital Library [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Ontologies can be integrated with such sources to
provide the necessary knowledge about geographical
hierarchies, place name variations, and other spatial
information [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Artequakt derives its geographical
knowledge from WordNet [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. WordNet contains information
about geopolitical place names and their hierarchies,
providing three useful relations for the context of
Artequakt: synonym, holonym (part of), and part_meronym
(sub part). The Artequakt ontology is extended to add
this information for each new instance of place added to
the KB.
      </p>
      <p><bold>Place Name Synonyms</bold></p>
      <p>The synonym relationship is used to identify equivalent
place names. For example, the three sentences above
mention several place names where Rembrandt was born.
Using the synonym relationship in WordNet, Leyden can
be identified as a variant spelling of Leiden, and
Holland and The Netherlands as synonymous.</p>
      <p><bold>Place Specificity</bold></p>
      <p>The part-of and sub-part relationships in WordNet are
used to find any hierarchical links between the given
places. WordNet shows that Leiden is part of the
Netherlands, indicating that Leiden is the more precise
information about Rembrandt’s place of birth.</p>
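      <p>The synonym and specificity steps can be illustrated with a small sketch. The SYNONYMS and PART_OF tables below are hand-written stand-ins for the relations that the text says are taken from WordNet; they are assumptions for illustration only and do not use WordNet's data or any WordNet API.</p>
      <preformat>
```python
# Hypothetical miniature gazetteer (illustrative only, not WordNet data).
SYNONYMS = {
    "Leyden": "Leiden",
    "Holland": "Netherlands",
    "the Netherlands": "Netherlands",
    "The Netherlands": "Netherlands",
}
PART_OF = {"Leiden": "Netherlands"}


def canonical(name):
    """Map a place name to its canonical spelling via the synonym table."""
    name = name.strip()
    return SYNONYMS.get(name, name)


def ancestors(place):
    """All broader places reachable through the part-of relation."""
    seen = []
    while place in PART_OF:
        place = PART_OF[place]
        if place in seen:  # guard against cycles in the table
            break
        seen.append(place)
    return seen


def consolidate_places(names):
    """Return the most specific places after removing synonyms and containers.

    A single result means the values agree; several results mean the
    extracted places are unrelated and should be kept as conflicting.
    """
    places = {canonical(n) for n in names}
    broader = {a for p in places for a in ancestors(p)}
    return sorted(places - broader)
```
</preformat>
      <p>For the three Rembrandt sentences, the place values Leyden, Leiden, the Netherlands and Holland reduce to the single value Leiden, whereas two unrelated places would both be returned.</p>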
      <p><bold>Shared Place Names</bold></p>
      <p>It is common for places to share the same name. For
example, according to the TGN, there are 22 places worldwide
named London. This problem is less apparent with
WordNet due to its limited geographical coverage.</p>
      <p>In Artequakt, disambiguation of place names is dependent
on their specificity variations. For example after processing
the three sentences about Rembrandt, it becomes apparent
that he was born in a place named Leiden in the
Netherlands. If the last two sentences were not available, it would
not have been possible to tell for sure which Leiden is being
referred to (assuming there is more than one). One
possibility is to rely on other information, such as place of work
or place of death, to make a disambiguation decision.
However, this is likely to produce unreliable results.</p>
      <p><bold>Temporal Consolidation</bold></p>
      <p>Dates need to be analysed to identify any inconsistencies
and locate precise dates to use in the biographies.
Simple temporal reasoning and heuristics can be used to
support this task.</p>
      <p>Artequakt’s IE tool can identify and extract dates in
different formats, providing them as day, month, year,
decade, etc. This requires consolidation with respect to
precision and consistency. Going back to our previous
example, to consolidate the first date (17th century), the
process checks if the years of the other dates fall within
the given century. If this is true, then the process tries to
identify the more precise date. The date in the third
sentence is favoured over the other two dates as they are all
consistent, but the third date holds more information
than the other two. Therefore, the third date is used for
the instance of Rembrandt. If any of the given facts is
inconsistent then it will be stored for future verification
and use.</p>
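      <p>The date consolidation described above can be sketched as follows. The PartialDate class and the rule of taking the most precise date as the reference are assumptions made for this illustration; the text only states that consistent dates are reduced to the most informative one and that inconsistent ones are stored for later verification.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """A date known to some precision; a century or a year must be given."""
    century: Optional[int] = None
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None

    def year_range(self):
        if self.year is not None:
            return (self.year, self.year)
        if self.century is None:
            raise ValueError("a PartialDate needs a century or a year")
        # The Nth century spans the years (N-1)*100+1 to N*100.
        return ((self.century - 1) * 100 + 1, self.century * 100)

    def precision(self):
        # Number of known components among year, month and day.
        return sum(v is not None for v in (self.year, self.month, self.day))


def consistent(a, b):
    """True if the year ranges overlap and no shared component disagrees."""
    lo_a, hi_a = a.year_range()
    lo_b, hi_b = b.year_range()
    if not min(hi_a, hi_b) >= max(lo_a, lo_b):
        return False
    for part in ("month", "day"):
        va, vb = getattr(a, part), getattr(b, part)
        if va is not None and vb is not None and va != vb:
            return False
    return True


def consolidate_dates(dates):
    """Pick the most precise date; return it with the dates that conflict."""
    best = max(dates, key=PartialDate.precision)
    conflicting = [d for d in dates if not consistent(best, d)]
    return best, conflicting
```
</preformat>
      <p>Applied to the three Rembrandt sentences (17th century, 1606, and 15 July 1606), the sketch selects 15 July 1606 and reports no conflicts; a value such as 1607 would be returned in the conflicting list rather than discarded.</p>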
      <p>At the end of the consolidation process, the knowledge
extracted from the three sentences above will be stored
in the KB as the following two triples for the instance of
Rembrandt:</p>
      <list list-type="bullet">
        <list-item><p>Rembrandt date_of_birth 15 July 1606</p></list-item>
        <list-item><p>Rembrandt place_of_birth Leiden</p></list-item>
      </list>
      <p><bold>Inconsistent Information</bold></p>
      <p>
Some of the extracted information can be inconsistent,
for example an artist with different dates or places of
birth or death, or inconsistent temporal information,
such as a date of death that falls before the date of birth.
The source of such inconsistency can be the original
document itself, or an inaccurate extraction. Predicting
which knowledge is more reliable is not trivial.
Currently we rely on the frequency with which a piece of
knowledge is extracted as an indicator of its accuracy:
the more often a particular piece of information is extracted,
the more accurate it is considered to be. For example,
for Renoir, two distinct dates of birth emerged: 25 Feb
1841 and 5 Feb 1841. The former date was
extracted from several web sites, while the latter was
found in one site only, and is therefore considered to be
less reliable.</p>
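      <p>A frequency-based ranking of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the system's code: the function name, the (value, source) input format and the choice to count distinct source sites instead of raw extractions are all introduced here, and the example.org addresses in the usage note are placeholders.</p>
      <preformat>
```python
def rank_by_frequency(extractions):
    """Rank the values of one attribute by how many distinct sources give them.

    extractions: iterable of (value, source) pairs for a single attribute
    of a single instance. Every value is kept together with its sources,
    so less frequent values remain available for later verification.
    """
    sources = {}
    for value, source in extractions:
        sources.setdefault(value, set()).add(source)
    # Most widely supported value first; ties broken alphabetically.
    ranked = sorted(sources.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(value, sorted(srcs)) for value, srcs in ranked]
```
</preformat>
      <p>For instance, if "25 Feb 1841" is extracted from http://example.org/a and http://example.org/b while "5 Feb 1841" comes only from http://example.org/c, the first value is ranked higher, and both keep their provenance.</p>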
      <p>A more advanced approach can be based on assigning
levels of trust for each extracted piece of knowledge,
which can be derived from the reliability of the source
document, or the confidence level of the extraction of
that particular information. The knowledge
consolidation process is not aimed at finding ‘the right answers’
however. The facts extracted are stored for future use,
with references to the original material.</p>
    </sec>
    <sec id="sec-10">
      <title>PORTABILITY TO OTHER DOMAINS</title>
      <p>The use of an ontology to back up IE is meant to increase
the system’s portability to other domains. By swapping the
current artist ontology with another domain-specific one,
the IE tool should still be able to function and extract some
relevant knowledge, especially where it concerns
domain-independent relations expressed in the ontology, such
as personal information (name, date and place of birth,
family relations, etc.). However, some domain-specific
extraction rules, such as those for painting style, will eventually have to be
retuned to fit the new domain.</p>
      <p>Similarly, the generation templates are currently manually
set for biography construction. These templates may need to
be modified if a different type of output is required. We aim
to investigate developing templates that can be dynamically
instructed and modified by the ontology.</p>
      <p>Consolidation is often based on domain-dependent
heuristics. However, some of the heuristics used in Artequakt can
be suitable for other domains. For example, Artequakt’s
approach for comparing and integrating place names using
external gazetteers can be used in any domain. Similarly,
heuristics that compare specific facts to
decide whether or not two instances of people are
duplicates are also domain independent. Further work is planned
to extend the scope of information integration.
Building a cross-domain system is one of the aims of this
project, and will be fully investigated in the next stage of
development.</p>
    </sec>
    <sec id="sec-11">
      <title>EVALUATION</title>
      <p>We used the system to instantiate the KB with
information on five artists, extracted from around 50 web pages.</p>
      <p>Extraction Performance</p>
      <p>Precision and recall were calculated for a set of 10 artist
relations (about birth, death, places where they worked
or studied, who influenced them, professions of their
parents, etc.). Results showed that precision scored
higher than recall, with average values of 85 and 42
respectively. The experiment is described in more detail in [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <p>Biography Evaluation</p>
      <p>Although we have not conducted any formal evaluation
of the biographies generated by the system, we are in a
position to make a few observations. In general we
found that the system is fairly successful in reproducing
text for a given artist. We are currently looking at how
best to perform a qualitative evaluation of the
biographies, perhaps with a task-based user evaluation
comparing the Artequakt system with a traditional search
engine.</p>
      <p>Consolidation Rate</p>
      <p>Table 1 shows the reduction rate in the number of instances
and relations after consolidating the KB. Applying the
heuristics described earlier in the paper led to a
reduction in the number of instances of the Person and Date
classes by 90% and 64% respectively. Before
consolidation, 283 instances representing Rembrandt were stored.
The unique-name consolidation heuristic was the most
effective, with no identified mistakes.</p>
      <p>[Table 1: consolidation results by class (Person instance, Date instance, Place instance, Person relations).]</p>
      <p>When place instances are fed to the KB, they are
expanded using WordNet and stored alongside their
synonyms, holonyms (part of), and part meronyms
(sub-parts). The number of Place instances created in the KB
has therefore increased significantly (94% rise). This
gave the consolidation process the power to identify and
consolidate relationships to places as described in the
geographical consolidation section. Some instances (mainly
dates) were not consolidated due to slight syntactic
differences, e.g. “25th/2/1841” versus “25/2/1841”. This
highlights the need for an additional syntactic-checking
process that could eliminate such noise.</p>
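      <p>As an illustration of such a syntactic check, the following Python sketch reduces both date strings from the example to the same value before comparison. It assumes a day/month/year order, as the example suggests, and handles only English ordinal suffixes; the function name normalise_date is hypothetical and not part of Artequakt.</p>
      <preformat>
```python
import re
from datetime import date

# Ordinal suffix directly after a day number, e.g. "25th" or "1st".
_ORDINAL = re.compile(r"(\d+)(?:st|nd|rd|th)\b", re.IGNORECASE)


def normalise_date(text):
    """Parse a day/month/year string into a date, ignoring ordinal suffixes."""
    # Assumption: fields are separated by "/" and ordered day, month, year.
    cleaned = _ORDINAL.sub(r"\1", text.strip())
    day, month, year = (int(part) for part in cleaned.split("/"))
    return date(year, month, day)


a = normalise_date("25th/2/1841")
b = normalise_date("25/2/1841")
print(a == b, a.isoformat())  # prints: True 1841-02-25
```
      </preformat>
      <p>Comparing the parsed values rather than the raw strings would let two Date instances that differ only in notation be treated as the same date.</p>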
    </sec>
    <sec id="sec-12">
      <title>CONCLUSIONS</title>
      <p>This paper describes a system that automatically extracts
knowledge, instantiates an ontology with knowledge triples,
and reassembles the knowledge in the form of biographies.
Problems related to this task, such as the identification and
consolidation of duplicated knowledge and the verification
of inconsistent knowledge, are highlighted. Artequakt’s
approaches to tackling these problems are described.</p>
      <p>An initial experiment, using around 50 web pages and 5
artists, showed promising results, with nearly 3,000
unique knowledge triples extracted (before consolidation).
However, some of this knowledge was too sparse to be of
any clear benefit. This indicates that more pages need to be
processed, and further rules need to be constructed to cover
additional ontology concepts and relations and expand the
knowledge extraction scope.</p>
      <p>The generated biographies were informative and brought
together knowledge extracted from various sources.
However, reusing original text to generate biographies
highlighted several problems, including co-referencing and
other textual deixis (such as 'Later', or 'Nevertheless'). This
underlines the potential benefits of regenerating text
directly from the extracted facts, which is part of our plans
for the near future.</p>
      <p>Our consolidation techniques significantly decreased the
number of instances in the KB, by up to 90% for certain
classes and 63% for attributes related to instances of
Person. A few duplicates remained undetected, mainly due to a lack
of the information required for the knowledge comparison.</p>
      <p>Future work on Artequakt will continue to develop its
modular architecture and refine the information extraction
and consolidation processes. In addition, we are beginning
to look at how we might leverage the full power of the
underlying ontology to aid in extracting information from
multiple domains and producing different types of reports.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This research is funded in part by EU Framework 5 IST
project “Sculpteur” IST-2001-35372, EPSRC IRC project
“Equator” GR/N15986/01 and EPSRC IRC project “AKT”
GR/N15764/01.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tudhope</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Associative and Spatial Relationships in Thesaurus-Based Retrieval</article-title>
          .
          <source>Proc. 4th European Conf. on Digital Libraries</source>
          , pages
          <fpage>45</fpage>
          --
          <lpage>58</lpage>
          , Lisbon, Portugal, Sept. LNCS,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Millard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shadbolt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Automatic Extraction of Knowledge from Web Documents</article-title>
          .
          <source>Workshop on Human Language Technology for the Semantic Web and Web Services, 2nd Int. Semantic Web Conf. Sanibel Island</source>
          , Florida, USA,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The Semantic Web</article-title>
          .
          <source>Scientific American</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Adaptive Information Extraction from Text by Rule Induction and Generalisation</article-title>
          .
          <source>Proc. 17th Int. Joint Conf. on Artificial Intelligence (IJCAI)</source>
          , pages
          <fpage>1251</fpage>
          --
          <lpage>1256</lpage>
          , Seattle, USA,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tablan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>GATE: a framework and graphical development environment for robust NLP tools and applications</article-title>
          .
          <source>Proc. 40th Anniversary Meeting of the Association for Computational Linguistics</source>
          , Philadelphia, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Dingli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guthrie</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilks</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Mining Web Sites Using Unsupervised Adaptive Information Extraction</article-title>
          .
          <source>Proc. 10th Conf. of the European Chapter of the Association for Computational Linguistics</source>
          , Budapest, Hungary,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Handschuh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>S-CREAM - Semi-Automatic Creation of Metadata</article-title>
          .
          <source>Semantic Authoring, Annotation and Markup Workshop, 15th European Conf. on Artificial Intelligence</source>
          , Lyon, France,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Harpring</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Proper Words in Proper Places: The Thesaurus of Geographic Names</article-title>
          .
          <source>MDA Info</source>
          .
          <volume>2</volume>
          (
          <issue>3</issue>
          ),
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Hill</surname>
            ,
            <given-names>L.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frew</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>Geographic Names. The Implementation of a Gazetteer in a Georeferenced Digital Library</article-title>
          .
          <source>Digital Library Magazine</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>P.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Millard</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shadbolt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weal</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Artequakt: Generating Tailored Biographies with Automatically Annotated Fragments from the Web</article-title>
          .
          <source>Workshop on Semantic Authoring, Annotation &amp; Knowledge Markup, 15th Europ. Conf. on Artificial Intelligence</source>
          , pp
          <fpage>1</fpage>
          --
          <lpage>6</lpage>
          , France,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Maedche</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Bootstrapping an Ontology-based Information Extraction System</article-title>
          .
          <source>Intelligent Exploration of the Web</source>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szczepaniak</surname>
          </string-name>
          , et al. (eds.), Springer, Heidelberg,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>McKeown</surname>
            ,
            <given-names>K.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barzilay</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hatzivassiloglou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klavans</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenkova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sable</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiffman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sigelman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Tracking and Summarizing News on a Daily Basis with Columbia's Newsblaster</article-title>
          .
          <source>Proc. Human Language Technology Conf.</source>
          , San Diego, CA, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Michaelides</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Millard</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weal</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeRoure</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Auld Leaky: A Contextual Open Hypermedia Link Server</article-title>
          .
          <source>Proc. 7th Hypermedia: Openness, Structural Awareness, and Adaptivity</source>
          , pages
          <fpage>59</fpage>
          --
          <lpage>70</lpage>
          , Springer Verlag, Heidelberg,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Millard</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weal</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeRoure</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shadbolt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Generating Adaptive Hypertext Content from the Semantic Web</article-title>
          .
          <source>1st International Workshop on Hypermedia and the Semantic Web</source>
          , HyperText'03, Nottingham, UK,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beckwith</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gross</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Introduction to wordnet: An on-line lexical database</article-title>
          .
          <source>Int. J. Lexicography</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>235</fpage>
          --
          <lpage>312</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Poibeau</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Deriving a multi-domain information extraction system from a rough ontology</article-title>
          .
          <source>Proc. 17th Int. Conf. on Artificial Intelligence</source>
          , Seattle, USA,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McKeown</surname>
            ,
            <given-names>K. R.</given-names>
          </string-name>
          :
          <article-title>Generating natural language summaries from multiple on-line sources</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>24</volume>
          (
          <issue>3</issue>
          ):
          <fpage>469</fpage>
          --
          <lpage>500</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Reidsma</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuper</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Declerck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saggion</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cunningham</surname>
          </string-name>
          , H.:
          <article-title>Cross document annotation for multimedia retrieval</article-title>
          .
          <source>EACL Workshop on Language Technology and the Semantic Web</source>
          , Budapest,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maedche</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Handschuh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>An Annotation Framework for the Semantic Web</article-title>
          .
          <source>Proc. 1st Int. Workshop on MultiMedia Annotation</source>
          , Tokyo,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Vargas-Vera</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buckingham Shum</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanzoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Knowledge Extraction by using an Ontology-based Annotation Tool</article-title>
          .
          <source>Proc. Workshop on Knowledge Markup &amp; Semantic Annotation, 1st Int. Conf. on Knowledge Capture</source>
          , pp
          <fpage>5</fpage>
          --
          <lpage>12</lpage>
          , Victoria, B.C., Canada,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Vargas-Vera</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanzoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stutt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup</article-title>
          .
          <source>13th Int. Conf. on Knowledge Engineering and Management (EKAW)</source>
          , Spain,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          :
          <article-title>Using WordNet for Text Retrieval</article-title>
          . Fellbaum (ed.),
          <source>WordNet: An Electronic Lexical Database</source>
          , pages
          <fpage>285</fpage>
          --
          <lpage>303</lpage>
          , MIT Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>White</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korelsky</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pierce</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wagstaff</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Multidocument Summarization via Information Extraction</article-title>
          .
          <source>Proc. of Human Language Technology Conf. (HLT 2001)</source>
          , San Diego, CA,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>