<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Web based Knowledge Extraction and Consolidation for Automatic Ontology Instantiation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Harith</forename><surname>Alani</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sanghee</forename><surname>Kim</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">David</forename><forename type="middle">E</forename><surname>Millard</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mark</forename><forename type="middle">J</forename><surname>Weal</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Wendy</forename><surname>Hall</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paul</forename><forename type="middle">H</forename><surname>Lewis</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nigel</forename><surname>Shadbolt</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Web based Knowledge Extraction and Consolidation for Automatic Ontology Instantiation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2445C0BF605E903EED9A4EEDEA75B7D1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T12:06+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>I.2.6 Learning - Knowledge acquisition</term>
					<term>I.2.7 Natural Language Processing - Text analysis, Language parsing and understanding</term>
					<term>Information Extraction</term>
					<term>Ontology Instantiation</term>
					<term>Knowledge Consolidation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>INTRODUCTION</head><p>Web pages are the source of vast amounts of knowledge. This knowledge is often buried by layers of text and scattered over numerous sites. Associating web pages with annotations to identify their knowledge content is the ambition of the Semantic Web <ref type="bibr" target="#b2">[3]</ref>. Much research is now focused on developing ontologies to manipulate this knowledge and provide a variety of knowledge services. Automatically instantiating ontologies and building knowledge bases (KB) with knowledge extracted from the web corpus is therefore very beneficial. Artequakt is concerned with automating ontology instantiation with knowledge triples (subject - relation - object) about the life and work of artists, and providing this knowledge for biography generation services.</p><p>When analysing and extracting information from multi-sourced documents, it is inevitable that duplicated and contradictory information will be extracted. Handling such information is challenging for automatic extraction and ontology instantiation approaches <ref type="bibr" target="#b17">[18]</ref>. Artequakt applies a set of heuristics and reasoning methods in an attempt to distinguish conflicting information, to verify it, and to identify and merge duplicate assertions in the KB automatically. This paper describes the main components of the Artequakt system, focusing on the latest developments with respect to knowledge consolidation and ontology instantiation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RELATED WORK</head><p>Extracting information from web pages to generate various reports is becoming the focus of much research. The closest work we found to Artequakt is in the area of text summarisation. A number of summarisation techniques have been described to help bring together important pieces of information from documents and present them to the user in a compact form.</p><p>Even though most summarisation systems deal with single documents, some have targeted multiple resources <ref type="bibr" target="#b11">[12]</ref> <ref type="bibr" target="#b22">[23]</ref>. Statistical summarisation approaches tend to be domain independent, but lack the sophistication required for merging information from multiple documents <ref type="bibr" target="#b16">[17]</ref>. On the other hand, Information Extraction (IE) based summarisation approaches are more capable of extracting and merging information from various resources, but due to the use of IE, they are often domain dependent.</p><p>Radev developed the SUMMONS system <ref type="bibr" target="#b16">[17]</ref> to extract information and generate summaries of individual events from MUC (Message Understanding Conferences) text corpora. The system compares information extracted from multiple resources, merges similar content and highlights contradictions. However, as in most IE-based systems, information merging is often based on linguistic and timeline comparison of single events <ref type="bibr" target="#b16">[17]</ref> <ref type="bibr" target="#b22">[23]</ref> or multiple events <ref type="bibr" target="#b17">[18]</ref>. Artequakt's knowledge consolidation is based on the comparison of individual knowledge fragments, rather than linguistic analyses or timeline comparison. Furthermore, Artequakt's consolidation is more fine-grained, focusing on the comparison and merging of individual entities (e.g. 
places, people, dates).</p><p>Most traditional IE systems are domain dependent due to the use of linguistic rules designed to extract information of specific content (e.g. bombing events (MUC systems), earthquake news <ref type="bibr" target="#b22">[23]</ref>, sports matches <ref type="bibr" target="#b17">[18]</ref>). Adaptive IE systems <ref type="bibr" target="#b3">[4]</ref> can ease this problem by identifying new extraction rules induced from example annotations supplied by users. However, training such tools can be difficult and time-consuming. Promising results are offered by more advanced adaptive IE tools, such as Armadillo <ref type="bibr" target="#b5">[6]</ref>, which discovers new linguistic and structural patterns automatically, thus requiring limited bootstrapping.</p><p>It is hoped that using ontologies to back up IE will support information integration <ref type="bibr" target="#b1">[2]</ref> <ref type="bibr" target="#b17">[18]</ref> and increase domain portability <ref type="bibr" target="#b9">[10]</ref> <ref type="bibr" target="#b10">[11]</ref>. Poibeau <ref type="bibr" target="#b15">[16]</ref> investigated increasing domain independence by using clustering methods on text corpora to help users construct primitive ontologies that represent the main corpus topics. Templates could then be generated from the ontology to guide the IE process. Ontologies produced by this approach are limited to the content of the corpus, rather than representing a specific domain. In some cases (such as in Artequakt) the corpus is very large and diverse (e.g. the Web). Creating ontologies from such a corpus is infeasible. Furthermore, these ontologies are likely to be rough, shallow, and include undesired concepts that happen to be in the text corpus. Consequently, the cost of bringing such ontologies into shape might exceed the benefit.</p><p>Instantiating ontologies with assertions from textual documents can be a very laborious task. 
A number of tools have been developed that instantiate ontologies semi-automatically with user-driven annotations <ref type="bibr" target="#b19">[20]</ref>. IE learning tools, such as Amilcare <ref type="bibr" target="#b3">[4]</ref>, can be used to automate part of the annotation process and speed up ontology instantiation <ref type="bibr" target="#b6">[7]</ref> <ref type="bibr" target="#b20">[21]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ARTEQUAKT</head><p>The Artequakt project has implemented a system that searches the Web and extracts knowledge about artists, based on an ontology describing that domain, and stores this knowledge in a KB to be used for automatically producing personalised biographies of artists. Artequakt draws from the expertise and experience of three separate projects: Sculpteur<ref type="foot" target="#foot_0">1</ref>, Equator<ref type="foot" target="#foot_1">2</ref>, and AKT<ref type="foot" target="#foot_2">3</ref>. The main components of Artequakt are described in the following sections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>System Overview</head><p>Figure <ref type="figure" target="#fig_0">1</ref> illustrates Artequakt's architecture, which comprises three key areas. The first concerns the knowledge extraction tools used to extract factual information from documents and pass it to the ontology server. The second key area is information management and storage. The information is stored by the ontology server and consolidated into a KB which can be queried via an inference engine. The final area is the narrative generation. The Artequakt server takes requests from a reader via a simple Web interface. The request will include an artist and the style of biography to be generated (chronology, summary, fact sheet, etc.). The server uses story templates to render a narrative from the information stored in the KB using a combination of original text fragments and natural language generation. The architecture is designed to allow different approaches to information extraction to be incorporated, with the ontology acting as a mediation layer between the IE and the KB. Currently we are using textual analysis tools to scrape web pages for knowledge, but with the increasing proliferation of the Semantic Web, additional tools could be added that take advantage of any semantically augmented pages, passing the embedded knowledge through to the KB.</p><p>As well as keeping open the interface between the KB and the extraction technology, a clear separation has been kept between the creation of a structured document from the knowledge base and the rendering of that document. In the current system, the information is rendered into an HTML page, but alternative rendering engines could be envisaged. 
For example, rather than presenting the biography as a linear textual document, the information might be rendered into a dynamic presentation system such as SMIL, converted into an audio stream using text to speech tools, or perhaps used to generate a dynamic hypertext with links referring back to queries to the KB on items such as artists' names.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Artequakt Ontology</head><p>For Artequakt the requirement was to build an ontology to represent the domain of artists and artefacts. The main part of this ontology was constructed from selected sections in the CIDOC Conceptual Reference Model (CRM, http://cidoc.ics.forth.gr/index.html) ontology. The CRM ontology is designed to represent artefacts, their production, ownership, location, etc. This ontology was modified for Artequakt and enriched with additional classes and relationships to represent a variety of information related to artists, their personal information, family relations, relations with other artists, details of their work, etc. The Artequakt ontology and KB are accessible via an ontology server.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>KNOWLEDGE EXTRACTION</head><p>The aim of our knowledge extraction tool is to identify and extract knowledge triples from text documents and to provide them as RDF files for entry into the KB <ref type="bibr" target="#b9">[10]</ref>. Artequakt uses an ontology coupled with a general-purpose lexical database (WordNet) <ref type="bibr" target="#b13">[14]</ref> and an entity recogniser (GATE) <ref type="bibr" target="#b4">[5]</ref> as guidance tools for identifying knowledge fragments. Artequakt attempts to identify not just entities, but also their relationships, following ontology relation declarations and lexical information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Extraction Procedure</head><p>The extraction process is launched when the user requests a biography for a specific artist that is not in the KB. The query is passed to selected web search engines and the search results are analysed with respect to relevancy to the domain of artists.</p><p>Each selected document is then divided into paragraphs and sentences. Each sentence is analysed syntactically and semantically to identify any relevant knowledge to extract.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Below is an example of an extracted paragraph:</head><p>"Pierre-Auguste Renoir was born in Limoges on February 25, 1841. His father was a tailor and his mother a dressmaker."</p><p>Annotations provided by GATE and WordNet highlight that 'Pierre-Auguste Renoir' is a person's name, 'February 25, 1841' is a date, and 'Limoges' is a location. Relation extraction is determined by the categorisation result of the verb 'bear', which matches two potential relations in the ontology: 'date_of_birth' and 'place_of_birth'. Since these relations are associated with 'February 25, 1841' and 'Limoges' respectively, this sentence generates the following knowledge triples about Renoir:</p><formula xml:id="formula_0">• Pierre-Auguste Renoir date_of_birth 25/2/1841 • Pierre-Auguste Renoir place_of_birth Limoges</formula><p>The second sentence generates knowledge triples related to Renoir's family:</p><formula xml:id="formula_1">• Pierre-Auguste Renoir has_father Person_2 • Person_2 job_title Tailor • Pierre-Auguste Renoir has_mother Person_3 • Person_3 job_title Dressmaker</formula><p>Inaccurately extracted knowledge may reduce the quality of the system's output. For this reason, our extraction rules were designed to be low-risk in order to ensure higher extraction precision. Advanced consistency checks can help identify some extraction inaccuracies, for example a date of marriage that precedes the date of birth, or two unrelated places of birth for the same person. The extraction process terminates by sending the extracted knowledge to the ontology server. Figure <ref type="figure">2</ref> is the RDF representation of the extracted knowledge. Artequakt's IE process is out of the scope of this paper, and is fully described in <ref type="bibr" target="#b1">[2]</ref> and <ref type="bibr" target="#b9">[10]</ref>.</p></div>
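<div xmlns="http://www.tei-c.org/ns/1.0"><p>The consistency checks mentioned above can be illustrated with a minimal sketch. It is not Artequakt's implementation: the tuple layout, the function name find_inconsistencies, the relation name date_of_marriage and the marriage date itself are assumptions made purely for illustration.</p>

```python
from datetime import date

# Hypothetical triple store: (subject, relation, object) tuples.
# The marriage date is invented and deliberately wrong, to trigger a warning.
triples = [
    ("Pierre-Auguste Renoir", "date_of_birth", date(1841, 2, 25)),
    ("Pierre-Auguste Renoir", "place_of_birth", "Limoges"),
    ("Pierre-Auguste Renoir", "date_of_marriage", date(1830, 1, 1)),
]


def find_inconsistencies(triples):
    """Return human-readable warnings for two simple consistency rules."""
    by_subject = {}
    for subject, relation, value in triples:
        by_subject.setdefault(subject, {}).setdefault(relation, []).append(value)

    warnings = []
    for subject, relations in by_subject.items():
        births = relations.get("date_of_birth", [])
        marriages = relations.get("date_of_marriage", [])
        # Rule 1: a marriage cannot precede the birth.
        for born in births:
            for married in marriages:
                if born > married:
                    warnings.append(f"{subject}: marriage {married} precedes birth {born}")
        # Rule 2: more than one distinct place of birth needs checking.
        places = set(relations.get("place_of_birth", []))
        if len(places) > 1:
            warnings.append(f"{subject}: multiple places of birth {sorted(places)}")
    return warnings


warnings = find_inconsistencies(triples)
```

<p>Flagged triples would be held back for verification rather than discarded, in line with the consolidation policy described later in the paper.</p></div>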
<div xmlns="http://www.tei-c.org/ns/1.0"><head>BIOGRAPHY GENERATION</head><p>Once the information has been extracted, stored and consolidated, the Artequakt system repurposes it by automatically generating biographies of the artists. The biographies are based on templates authored in the Fundamental Open Hypermedia Model (FOHM) and stored in the Auld Linky contextual structure server <ref type="bibr" target="#b12">[13]</ref>. Each section of the template is instantiated with paragraphs or sentences generated from information in the KB. The KB informs the templates of the theme of the sentences and paragraphs (e.g. influences, family info, painting) and the generation tool selects the relevant ones and structures them in the desired form and order.</p><p>Very little text generation is used in the current implementation (e.g. Figure <ref type="figure" target="#fig_2">3</ref>, first and last sentences), but this will be the focus of the next phase.</p><p>By storing conflicting information rather than discarding it during the consolidation process, the opportunity exists to provide biographies that set out arguments as to the facts (with provenance, in the form of links to the original sources) by juxtaposing the conflicting information and allowing the reader to make up their own mind.</p><p>Different templates can be constructed for different types of biography. Two examples are the summary biography, which provides paragraphs about the artist arranged in a rough chronological order, and the fact sheet, which simply lists a number of facts about the artist, e.g. date of birth, place of study etc. The biographies also take advantage of the structure server's ability to filter the template based on a user's interest. If the reader is not interested in the family life of the artist, the biography can be tailored to remove this information.</p><p>More about Artequakt's biography generation is available in <ref type="bibr" target="#b13">[14]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>AUTOMATIC INSTANTIATION</head><p>Storing knowledge extracted from text documents in KBs offers new possibilities for further analysis and reuse. Ontology instantiation refers to the insertion of information into the KB, as described by the ontology (sometimes referred to as ontology population). Instantiating ontologies with a high quantity and quality of knowledge is one of the main steps towards providing valuable and consistent ontology-based knowledge services. Manual ontology instantiation is very labour-intensive and time-consuming. Some semi-automatic approaches have investigated creating document annotations and storing the results as assertions <ref type="bibr" target="#b6">[7]</ref> <ref type="bibr" target="#b19">[20]</ref> <ref type="bibr" target="#b20">[21]</ref>. <ref type="bibr" target="#b6">[7]</ref> and <ref type="bibr" target="#b19">[20]</ref> describe two frameworks for user-driven ontology-based annotations, supported by the IE learning tool Amilcare <ref type="bibr" target="#b2">[3]</ref>. However, the two frameworks are manually driven and mainly focus on entity annotations. They lack the capability of identifying relationships reliably. In <ref type="bibr" target="#b19">[20]</ref>, relationships were added automatically between instances, but only if these instances already existed in the KB; otherwise user intervention was required.</p><p>In Artequakt we investigate the possibility of moving towards a fully automatic approach to feeding the ontology with knowledge extracted from unstructured text. Information is extracted in Artequakt with respect to a given ontology and provided as RDF or XML files using tags mapped directly from names of classes and relationships in that ontology. When the ontology server receives a new RDF file, a feeder tool is activated to parse the file and add its knowledge triples to the KB automatically. 
Once the feeding process terminates, the consolidation tool searches for and merges any duplicates in the KB.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>KNOWLEDGE BASE CONSOLIDATION</head><p>Automatically instantiating an ontology from diverse and distributed resources poses significant challenges. One persistent problem is that of the consolidation of duplicate information that arises when extracting similar or overlapping information from different sources. Tackling this problem is important to maintain the referential integrity and quality of results of any ontology-based knowledge service. The approach in <ref type="bibr" target="#b17">[18]</ref> relied on manually assigned object identifiers to avoid duplication when extracting from different documents.</p><p>Little research has looked at the problem of information consolidation in the IE domain. This problem becomes more apparent when extracting from multiple documents. Comparing and merging extracted information is often based on domain dependent heuristics <ref type="bibr" target="#b16">[17]</ref> <ref type="bibr" target="#b17">[18]</ref> <ref type="bibr" target="#b22">[23]</ref>. Our approach attempts to identify inconsistencies and consolidate duplications automatically using a set of heuristics and term expansion methods based on WordNet <ref type="bibr" target="#b21">[22]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Duplicate Information</head><p>There are two main types of duplication in our KB: duplicate instances (e.g. multiple instances representing the same artist), and duplicate attribute values (e.g. multiple dates of birth extracted for the same artist).</p><p>Artequakt's IE tool treats each recognised entity (e.g. Rembrandt, Paris) as a new instance. This may result in creating instances with overlapping information (e.g. two Person instances with the same name and date of birth). The role of consolidation in Artequakt includes analysing and comparing attribute values of the instances of each type of concept in the KB (e.g. Person, Date) to identify inconsistencies and duplications.</p><p>The amount of overlap between the attribute values of any pair of instances could indicate their duplication potential. However, this overlap is not always measurable. IE tools are sometimes only able to extract fragments of information about a given entity (e.g. an artist), especially if the source document or paragraph is small or difficult to analyse. This leads to the creation of new instances with only one or two facts associated with each. For example, there may be two artist instances with the name Rembrandt, where one instance has a location relationship to Holland, while the other has a date of birth of 1606. Comparing such shallow instances will not reveal their duplication potential. Furthermore, neither the source information nor the information extraction is always accurate. For example, a Rembrandt instance can be extracted with the correct family attribute values, but with the wrong date of birth, in which case this instance will be mismatched with other Rembrandt instances in spite of referring to the same artist.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Unique Name Assumption</head><p>One basic heuristic applied in Artequakt is that artist names are unique, so artist instances with identical names are merged. According to this heuristic, all instances with the name Rembrandt are combined into one instance. This heuristic is obviously not foolproof, but it works well in the limited domain of artists.</p></div>
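<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, the unique-name heuristic could be sketched as follows. The representation of an instance as a dictionary of attribute-value sets, the attribute names and the function name merge_by_name are assumptions for this sketch, not part of the Artequakt system.</p>

```python
def merge_by_name(instances):
    """Merge instance dicts that share an identical 'name' value.

    Each instance maps an attribute name to a set of extracted values.
    """
    merged = {}
    for instance in instances:
        (name,) = instance["name"]  # assumes exactly one name per instance
        target = merged.setdefault(name, {})
        for attribute, values in instance.items():
            target.setdefault(attribute, set()).update(values)
    return list(merged.values())


# Two shallow instances named Rembrandt, plus one with the full name.
instances = [
    {"name": {"Rembrandt"}, "location": {"Holland"}},
    {"name": {"Rembrandt"}, "date_of_birth": {"1606"}},
    {"name": {"Rembrandt Harmenszoon van Rijn"}, "date_of_birth": {"1606"}},
]
result = merge_by_name(instances)
```

<p>The two shallow Rembrandt instances collapse into one, while the instance carrying the full name is left untouched because its name string differs.</p></div>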
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Information Overlap</head><p>There are cases where the full name of an artist is not given in the source document or its extraction fails, in which case the instances will not be captured by the unique-name heuristic. For example, when we extracted information about Rembrandt and merged same-name artists, two instances remained for this artist: Rembrandt and Rembrandt Harmenszoon van Rijn. In such a case we compare certain attribute values, and merge the two instances if there is sufficient overlap. For the two Rembrandt instances, both had the same date and place of birth, and therefore were combined into one instance. The duplication would not have been caught if these attributes had different values.</p></div>
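<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of an overlap test is given below. The text does not specify which attributes are compared or how much overlap counts as sufficient, so the attribute list, the threshold min_overlap and the function name should_merge are all assumptions.</p>

```python
def should_merge(a, b, keys=("date_of_birth", "place_of_birth"), min_overlap=2):
    """Return True when enough of the compared attributes share a value."""
    shared = sum(
        1 for key in keys
        if a.get(key, set()).intersection(b.get(key, set()))
    )
    return shared >= min_overlap


short = {"name": {"Rembrandt"},
         "date_of_birth": {"1606"}, "place_of_birth": {"Leiden"}}
full = {"name": {"Rembrandt Harmenszoon van Rijn"},
        "date_of_birth": {"1606"}, "place_of_birth": {"Leiden"}}
other = {"name": {"Pierre-Auguste Renoir"},
         "date_of_birth": {"1841"}, "place_of_birth": {"Limoges"}}
```

<p>Here should_merge(short, full) holds because both compared attributes agree, whereas should_merge(short, other) does not. A stricter or looser threshold trades missed duplicates against false merges.</p></div>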
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Attribute Comparison</head><p>When the above heuristics are applied, merged instances might end up having multiple attribute values (e.g. multiple dates and places of birth), which in turn need to be analysed and consolidated. Note that some of these attributes might hold conflicting information that should be verified and held for future comparison and use.</p><p>Comparing the values of instance attributes is not always straightforward as these values are often extracted in different formats and specificity levels (e.g. synonymous place names, different date styles) making them harder to match. Artequakt applies a set of heuristics and expansion methods in an attempt to match these values. Consider the following sentences:</p><p>1. Rembrandt was born in the 17th century in Leyden.</p><p>2. Rembrandt was born in 1606 in Leiden, the Netherlands.</p><p>3. Rembrandt was born on July 15 1606 in Holland.</p><p>These sentences provide the same information about an artist, written in different formats and specificity levels. Storing this information in the KB in such different formats is confusing for the biography generator which can benefit from knowing which information is repetitive and which is contradictory. Matching the above sentences required enriching the original ontology with some temporal and geographical reasoning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Geographical Consolidation</head><p>There has been much work on developing gazetteers of place names, such as the Thesaurus of Geographic Names (TGN) <ref type="bibr" target="#b7">[8]</ref> and the Alexandria Digital Library <ref type="bibr" target="#b8">[9]</ref>.</p><p>Ontologies can be integrated with such sources to provide the necessary knowledge about geographical hierarchies, place name variations, and other spatial information <ref type="bibr" target="#b0">[1]</ref>. Artequakt derives its geographical knowledge from WordNet <ref type="bibr" target="#b13">[14]</ref>. WordNet contains information about geopolitical place names and their hierarchies, providing three useful relations for the context of Artequakt: synonym, holonym (part of), and part_meronym (sub part). The Artequakt ontology is extended to add this information for each new instance of place added to the KB.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Place Name Synonyms</head><p>The synonym relationship is used to identify equivalent place names. For example, the three sentences above mention several names for the place where Rembrandt was born.</p><p>Using the synonym relationship in WordNet, Leyden can be identified as a variant spelling of Leiden, and Holland and the Netherlands can be identified as synonymous.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Place Specificity</head><p>The part-of and sub-part relationships in WordNet are used to find any hierarchical links between the given places. WordNet shows that Leiden is part of the Netherlands, indicating that Leiden is the more precise information about Rembrandt's place of birth.</p></div>
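<div xmlns="http://www.tei-c.org/ns/1.0"><p>The synonym and part-of reasoning described above can be sketched with a toy lookup table. The two dictionaries below are hand-written stand-ins for the WordNet relations and cover only the Rembrandt example; the function names are hypothetical, and a real system would query WordNet or a gazetteer instead.</p>

```python
# Toy stand-in for the WordNet relations named in the text; entries are
# limited to the Rembrandt example and are not a real gazetteer.
SYNONYMS = {"Leyden": "Leiden", "Holland": "Netherlands", "the Netherlands": "Netherlands"}
PART_OF = {"Leiden": "Netherlands"}  # holonym: Leiden is part of the Netherlands


def canonical(place):
    """Map a place name to its canonical spelling."""
    return SYNONYMS.get(place, place)


def is_within(place, region):
    """Follow part-of links upwards to test whether place lies inside region."""
    current = canonical(place)
    target = canonical(region)
    while current in PART_OF:
        current = PART_OF[current]
        if current == target:
            return True
    return False


def most_specific(places):
    """Return the place contained in all the others, or None on conflict."""
    names = {canonical(p) for p in places}
    for candidate in names:
        if all(other == candidate or is_within(candidate, other) for other in names):
            return candidate
    return None


birthplace = most_specific(["Leyden", "the Netherlands", "Holland"])
```

<p>For the three sentences about Rembrandt this selects Leiden, the most precise of the consistent place names. Unrelated places, such as Leiden and Limoges, yield None and would be kept as conflicting values.</p></div>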
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Shared Place Names</head><p>It is common for places to share the same name. For example, according to the TGN, there are 22 places worldwide named London. This problem is less apparent with WordNet due to its limited geographical coverage.</p><p>In Artequakt, disambiguation of place names is dependent on their specificity variations. For example, after processing the three sentences about Rembrandt, it becomes apparent that he was born in a place named Leiden in the Netherlands. If the last two sentences were not available, it would not have been possible to tell for sure which Leiden is being referred to (assuming there is more than one). One possibility is to rely on other information, such as place of work or place of death, to make a disambiguation decision. However, this is likely to produce unreliable results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Temporal Consolidation</head><p>Dates need to be analysed to identify any inconsistencies and locate precise dates to use in the biographies. Simple temporal reasoning and heuristics can be used to support this task.</p><p>Artequakt's IE tool can identify and extract dates in different formats, providing them as day, month, year, decade, etc. This requires consolidation with respect to precision and consistency. Going back to our previous example, to consolidate the first date (17th century), the process checks if the years of the other dates fall within the given century. If this is true, then the process tries to identify the more precise date. All three dates are consistent, but the date in the third sentence holds more information than the other two, so it is favoured and used for the instance of Rembrandt. If any of the given facts is inconsistent, it will be stored for future verification and use.</p><p>At the end of the consolidation process, the knowledge extracted from the three sentences above will be stored in the KB as the following two triples for the instance of Rembrandt:</p><p>• Rembrandt date_of_birth 15 July 1606</p><p>• Rembrandt place_of_birth Leiden</p></div>
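<div xmlns="http://www.tei-c.org/ns/1.0"><p>The date consolidation step can be sketched as below. This is a simplified illustration rather than Artequakt's temporal reasoning: the PartialDate type, its fields and the rule of preferring the date with the most known fields are assumptions, and decades are omitted for brevity.</p>

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """A date of varying precision; unknown fields are None."""
    century: Optional[int] = None
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None

    def precision(self):
        """Count the known fields (more fields means more precise)."""
        return sum(field is not None for field in self)


def century_of(year):
    """Return the century containing a year, e.g. 1606 falls in the 17th."""
    return (year - 1) // 100 + 1


def consistent(a, b):
    """Two partial dates agree when every field known in both matches."""
    a_century = a.century if a.year is None else century_of(a.year)
    b_century = b.century if b.year is None else century_of(b.year)
    if a_century is not None and b_century is not None and a_century != b_century:
        return False
    pairs = zip(a[1:], b[1:])  # year, month, day
    return all(x is None or y is None or x == y for x, y in pairs)


def consolidate(dates):
    """Return (best, conflicts): the most precise date and those contradicting it."""
    best = max(dates, key=PartialDate.precision)
    conflicts = [d for d in dates if not consistent(best, d)]
    return best, conflicts


dates = [
    PartialDate(century=17),                  # "the 17th century"
    PartialDate(year=1606),                   # "1606"
    PartialDate(year=1606, month=7, day=15),  # "July 15 1606"
]
best, conflicts = consolidate(dates)
```

<p>For the Rembrandt sentences, best is the full date of 15 July 1606 and conflicts is empty. Any date that contradicts the chosen one is returned separately so that it can be stored for later verification instead of being dropped.</p></div>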
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Inconsistent Information</head><p>Some of the extracted information can be inconsistent, for example an artist with different dates or places of birth or death, or inconsistent temporal information, such as a date of death that falls before the date of birth.</p><p>The source of such inconsistency can be the original document itself, or an inaccurate extraction. Predicting which knowledge is more reliable is not trivial. Currently we rely on the frequency with which a piece of knowledge is extracted as an indicator of its accuracy: the more often a particular piece of information is extracted, the more accurate it is considered to be. For example, for Renoir, two distinct dates of birth emerged: 25 Feb 1841 and 5 Feb 1841. The former date was extracted from several web sites, while the latter was found in one site only, and is therefore considered to be less reliable.</p><p>A more advanced approach can be based on assigning levels of trust to each extracted piece of knowledge, which can be derived from the reliability of the source document, or the confidence level of the extraction of that particular information. However, the knowledge consolidation process is not aimed at finding 'the right answers'. The facts extracted are stored for future use, with references to the original material.</p></div>
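<div xmlns="http://www.tei-c.org/ns/1.0"><p>The frequency heuristic can be sketched as a simple count of distinct sources per value. The source names and the exact number of sites below are invented for illustration (the text says only "several" and "one"), and rank_by_frequency is a hypothetical helper.</p>

```python
from collections import Counter


def rank_by_frequency(observations):
    """Rank candidate values by how many distinct sources reported them.

    observations is a list of (value, source) pairs.
    """
    unique = set(observations)  # count each source once per value
    counts = Counter(value for value, _source in unique)
    return counts.most_common()


# Hypothetical sources for the two Renoir birth dates discussed above.
observations = [
    ("25 Feb 1841", "site-a"),
    ("25 Feb 1841", "site-b"),
    ("25 Feb 1841", "site-c"),
    ("5 Feb 1841", "site-d"),
]
ranking = rank_by_frequency(observations)
```

<p>The lower-ranked value is not deleted. It stays in the ranking with its source, so that a biography can present both dates with their provenance.</p></div>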
<div xmlns="http://www.tei-c.org/ns/1.0"><head>PORTABILITY TO OTHER DOMAINS</head><p>The use of an ontology to back up IE is meant to increase the system's portability to other domains. If the current artist ontology is swapped for another domain specific one, the IE tool should still be able to function and extract some relevant knowledge, especially where it concerns domain independent relations expressed in the ontology, such as personal information (name, date and place of birth, family relations, etc.). However, some domain specific extraction rules, such as those for painting style, will eventually have to be retuned to fit the new domain.</p><p>Similarly, the generation templates are currently manually set for biography construction. These templates may need to be modified if a different type of output is required. We aim to investigate developing templates that can be dynamically instructed and modified by the ontology. Consolidation is often based on domain dependent heuristics. However, some of the heuristics used in Artequakt can be suitable for other domains. For example, Artequakt's approach for comparing and integrating place names using external gazetteers can be used in any domain. Similarly, the heuristics that compare specific facts to decide whether or not two instances of people are duplicates are also domain independent. Further work is planned to extend the scope of information integration. Building a cross-domain system is one of the aims of this project, and will be fully investigated in the next stage of development.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>EVALUATION</head><p>We used the system to instantiate the KB with information on five artists, extracted from around 50 web pages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Extraction Performance</head><p>Precision and recall were calculated for a set of 10 artist relations (about birth, death, places where they worked or studied, who influenced them, professions of their parents, etc.). Results showed that precision scored higher than recall, with average values of 85 and 42 respectively. The experiment is described in more detail in <ref type="bibr" target="#b1">[2]</ref>.</p></div>
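<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers unfamiliar with the two measures, the sketch below shows the standard definitions of precision and recall for one relation and a simple average over several relations. The function names are hypothetical, and whether the reported averages were computed in exactly this way is an assumption; the cited paper describes the actual experiment.</p>

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for one relation.

    `extracted` holds the facts produced by the IE tool and `gold` the facts
    a human annotator considers correct; both are treated as sets.
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted.intersection(gold))
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def macro_average(scores):
    """Average a sequence of (precision, recall) pairs component-wise."""
    scores = list(scores)
    if not scores:
        return 0.0, 0.0
    count = len(scores)
    avg_precision = sum(p for p, _r in scores) / count
    avg_recall = sum(r for _p, r in scores) / count
    return avg_precision, avg_recall
```

<p>High precision combined with low recall, as reported here, means that most extracted facts were correct but many of the facts present in the pages were missed.</p></div>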
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Biography Evaluation</head><p>Although we have not conducted any formal evaluation of the biographies generated by the system, we are in a position to make a few observations. In general we found that the system is fairly successful in reproducing text for a given artist. We are currently looking at how best to perform a qualitative evaluation of the biographies, perhaps with a task-based user evaluation comparing the Artequakt system with a traditional search engine.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Consolidation Rate</head><p>Table <ref type="table" target="#tab_1">1</ref> shows the reduction rate in the number of instances and relations after consolidating the KB. Applying the heuristics described earlier in the paper led to a reduction in the number of instances of the Person and Date classes by 90% and 64% respectively. Before consolidation, 283 instances representing Rembrandt were stored. The unique-name consolidation heuristic was the most effective, with no identified mistakes.</p><p>When place instances are fed to the KB, they are expanded using WordNet and stored alongside their synonyms, holonyms (part of), and part_meronym (sub parts). The number of Place instances created in the KB has therefore increased significantly (94% rise). This gave the consolidation process the power to identify and consolidate relationships to places, as described in the geographical consolidation section. Some instances (mainly dates) were not consolidated due to slight syntactic differences, e.g. "25th/2/1841" versus "25/2/1841". This highlights the need for an additional syntactic-checking process that could eliminate such noise.</p></div>
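<div xmlns="http://www.tei-c.org/ns/1.0"><p>A syntactic-checking step of the kind called for above might look like the following sketch. It is not part of Artequakt; the function name normalise_date_string is hypothetical, and the rule set (dropping English ordinal suffixes and tidying whitespace around slashes) is only an assumption about the sort of noise involved.</p>

```python
import re

# A number followed by an English ordinal suffix, with optional whitespace
# between them, e.g. "25th" or "25 th".
_ORDINAL = re.compile(r"(\d+)\s*(?:st|nd|rd|th)\b", flags=re.IGNORECASE)


def normalise_date_string(raw: str) -> str:
    """Remove ordinal suffixes and stray whitespace from a date string."""
    without_ordinals = _ORDINAL.sub(r"\1", raw)
    # Remove whitespace on either side of the "/" separator.
    tight = re.sub(r"\s*/\s*", "/", without_ordinals)
    # Collapse any remaining runs of whitespace and trim the ends.
    return " ".join(tight.split())


def same_date_string(a: str, b: str) -> bool:
    """Compare two date strings after normalisation."""
    return normalise_date_string(a) == normalise_date_string(b)
```

<p>Under these rules the two variants quoted in the text normalise to the same string and could therefore be consolidated. A production version would probably parse the strings into structured dates instead of comparing text.</p></div>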
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CONCLUSIONS</head><p>This paper describes a system that automatically extracts knowledge, instantiates an ontology with knowledge triples, and reassembles the knowledge in the form of biographies. Problems related to this task, such as the identification and consolidation of duplicated knowledge and the verification of inconsistent knowledge, are highlighted. Artequakt's approaches to tackling these problems are described.</p><p>An initial experiment, using around 50 web pages and 5 artists, showed promising results, with nearly 3 thousand unique knowledge triples extracted (before consolidation). However, some of this knowledge was too sparse to be of any clear benefit. This indicates that more pages need to be processed, and that further rules need to be constructed to cover additional ontology concepts and relations and to expand the knowledge extraction scope.</p><p>The generated biographies were informative and brought together knowledge extracted from various sources. However, reusing original text to generate biographies highlighted several problems, including co-referencing and other textual deixis (such as 'Later' or 'Nevertheless'). This underlines the potential benefits of regenerating text directly from the extracted facts, which is part of our plans for the near future.</p><p>Our consolidation techniques significantly decreased the number of instances in the KB, by up to 90% for certain classes and by 63% for attributes related to instances of Person. A few instances remained undetected, mainly due to a lack of the information required for the knowledge comparison.</p><p>Future work on Artequakt will continue to develop its modular architecture and refine the information extraction and consolidation processes. In addition, we are beginning to look at how we might leverage the full power of the underlying ontology to aid the extraction of information from multiple domains and to produce different types of reports.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1.</head><label>1</label><figDesc>Figure 1. The Artequakt Architecture</figDesc><graphic coords="2,319.68,316.72,223.20,258.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Figure 3 shows a biography of Renoir.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3.</head><label>3</label><figDesc>Figure 3. A Biography Generated Using Sentences.</figDesc><graphic coords="4,55.08,287.32,229.20,327.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1. Consolidation rates</head><label>1</label><figDesc></figDesc><table><row><cell>Class</cell><cell>Before consolidation</cell><cell>After consolidation</cell><cell>Rate (%)</cell></row><row><cell>Person instance</cell><cell>1475</cell><cell>152</cell><cell>-90</cell></row><row><cell>Date instance</cell><cell>83</cell><cell>30</cell><cell>-64</cell></row><row><cell>Place instance</cell><cell>30</cell><cell>505</cell><cell>+94</cell></row><row><cell>Person relations</cell><cell>4240</cell><cell>1562</cell><cell>-63</cell></row></table></figure>
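<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rates in Table 1 can be re-derived from the before and after counts. The paper does not state the formula used, so the sketch below is an assumption: it measures the change against the larger of the two counts, which is one convention that happens to reproduce all four published figures, including the +94 for Place instances. The name change_rate is hypothetical.</p>

```python
def change_rate(before: int, after: int) -> int:
    """Signed percentage change, measured against the larger of the two counts.

    Assumed convention: the paper does not give its formula.
    """
    larger = max(before, after)
    if larger == 0:
        return 0
    return round(100 * (after - before) / larger)


# Counts copied from Table 1 as (before consolidation, after consolidation).
TABLE_1 = {
    "Person instance": (1475, 152),
    "Date instance": (83, 30),
    "Place instance": (30, 505),
    "Person relations": (4240, 1562),
}

rates = {name: change_rate(before, after) for name, (before, after) in TABLE_1.items()}
```

<p>Note that under the more common convention of dividing by the starting count, the growth from 30 to 505 Place instances would be far larger than 94%, so the Place figure is best read as the share of the final instances that were added by the WordNet expansion.</p></div>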
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.sculpteurweb.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://www.equator.ac.uk/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://www.aktors.org/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGEMENTS</head><p>This research is funded in part by EU Framework 5 IST project "Sculpteur" IST-2001-35372, EPSRC IRC project "Equator" GR/N15986/01 and EPSRC IRC project "AKT" GR/N15764/01.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Associative and Spatial Relationships in Thesaurus-Based Retrieval</title>
		<author>
			<persName><forename type="first">H</forename><surname>Alani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tudhope</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 4th European Conf. on Digital Libraries</title>
				<meeting>4th European Conf. on Digital Libraries<address><addrLine>Lisbon, Portugal</addrLine></address></meeting>
		<imprint>
			<publisher>LNCS</publisher>
			<date type="published" when="2000-09">Sept. 2000</date>
			<biblScope unit="page" from="45" to="58" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Automatic Extraction of Knowledge from Web Documents</title>
		<author>
			<persName><forename type="first">H</forename><surname>Alani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Millard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Weal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shadbolt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop on Human Language Technology for the Semantic Web and Web Services, 2nd Int. Semantic Web Conf.</title>
				<meeting><address><addrLine>Sanibel Island, Florida, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">The Semantic Web</title>
		<author>
			<persName><forename type="first">T</forename><surname>Berners-Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hendler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Lassila</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>Scientific American</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Adaptive Information Extraction from Text by Rule Induction and Generalisation</title>
		<author>
			<persName><forename type="first">F</forename><surname>Ciravegna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 17th Int. Joint Conf. on Artificial Intelligence (IJCAI)</title>
				<meeting>17th Int. Joint Conf. on Artificial Intelligence (IJCAI)<address><addrLine>Seattle, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="1251" to="1256" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">GATE: a framework and graphical development environment for robust NLP tools and applications</title>
		<author>
			<persName><forename type="first">H</forename><surname>Cunningham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Maynard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bontcheva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tablan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 40th Anniversary Meeting of the Association for Computational Linguistics</title>
				<meeting>40th Anniversary Meeting of the Association for Computational Linguistics<address><addrLine>Philadelphia, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Mining Web Sites Using Unsupervised Adaptive Information Extraction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dingli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ciravegna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Guthrie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wilks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 10th Conf. of the European Chapter of the Association for Computational Linguistics</title>
				<meeting>10th Conf. of the European Chapter of the Association for Computational Linguistics<address><addrLine>Budapest, Hungary</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">S-CREAM - Semi Automatic Creation of Metadata</title>
		<author>
			<persName><forename type="first">S</forename><surname>Handschuh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Staab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ciravegna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Semantic Authoring, Annotation and Markup Workshop, 15th European Conf. Artificial Intelligence</title>
				<meeting><address><addrLine>Lyon, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Proper Words in Proper Places: The Thesaurus of Geographic Names</title>
		<author>
			<persName><forename type="first">P</forename><surname>Harpring</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">MDA Info</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Geographic Names. The Implementation of a Gazetteer in a Georeferenced Digital Library</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">L</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Frew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Zheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Digital Library Magazine</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Artequakt: Generating Tailored Biographies with Automatically Annotated Fragments from the Web</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Alani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">H</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Millard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shadbolt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Weal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop on Semantic Authoring, Annotation &amp; Knowledge Markup, 15th Europ. Conf. on Artificial Intelligence</title>
				<meeting><address><addrLine>France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Bootstrapping an Ontology-based Information Extraction System</title>
		<author>
			<persName><forename type="first">A</forename><surname>Maedche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Staab</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Intelligent Exploration of the Web</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Szczepaniak</surname></persName>
		</editor>
		<meeting><address><addrLine>Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Tracking and Summarizing News on a Daily Basis with Columbia&apos;s Newsblaster</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>McKeown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Barzilay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Hatzivassiloglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Klavans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nenkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sable</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schiffman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sigelman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Human Language Technology Conf</title>
				<meeting>Human Language Technology Conf<address><addrLine>San Diego, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Auld Leaky: A Contextual Open Hypermedia Link Server</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">T</forename><surname>Michaelides</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Millard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Weal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>De Roure</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 7th Hypermedia: Openness, Structural Awareness, and Adaptivity</title>
				<meeting>7th Hypermedia: Openness, Structural Awareness, and Adaptivity<address><addrLine>Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Verlag</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="59" to="70" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Generating Adaptive Hypertext Content from the Semantic Web</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Millard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Alani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Weal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>De Roure</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shadbolt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">1st International Workshop on Hypermedia and the Semantic Web, HyperText&apos;03</title>
				<meeting><address><addrLine>Nottingham, UK</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Introduction to WordNet: An on-line lexical database</title>
		<author>
			<persName><forename type="first">G</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Beckwith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fellbaum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Lexicography</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="235" to="312" />
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Deriving a multi-domain information extraction system from a rough ontology</title>
		<author>
			<persName><forename type="first">T</forename><surname>Poibeau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 17th Int. Conf. on Artificial Intelligence</title>
				<meeting>17th Int. Conf. on Artificial Intelligence<address><addrLine>Seattle, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Generating natural language summaries from multiple on-line sources</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Radev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>McKeown</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="469" to="500" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Cross document annotation for multimedia retrieval</title>
		<author>
			<persName><forename type="first">D</forename><surname>Reidsma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kuper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Cunningham</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EACL Workshop on Language Technology and the Semantic Web</title>
				<meeting><address><addrLine>Budapest</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">An Annotation Framework for the Semantic Web</title>
		<author>
			<persName><forename type="first">S</forename><surname>Staab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maedche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Handschuh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 1st Int. Workshop on MultiMedia Annotation</title>
				<meeting>1st Int. Workshop on MultiMedia Annotation<address><addrLine>Tokyo</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Knowledge Extraction by using an Ontology-based Annotation Tool</title>
		<author>
			<persName><forename type="first">M</forename><surname>Vargas-Vera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Motta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Domingue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Buckingham Shum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lanzoni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Workshop on Knowledge Markup &amp; Semantic Annotation, 1st Int. Conf. on Knowledge Capture</title>
				<meeting>Workshop on Knowledge Markup &amp; Semantic Annotation, 1st Int. Conf. on Knowledge Capture<address><addrLine>Victoria, B.C., Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="5" to="12" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup</title>
		<author>
			<persName><forename type="first">M</forename><surname>Vargas-Vera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Motta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Domingue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lanzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stutt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ciravegna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">13th Int. Conf. on Knowledge Engineering and Management (EKAW)</title>
				<meeting><address><addrLine>Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Using WordNet for Text Retrieval</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Voorhees</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WordNet: An Electronic Lexical Database</title>
				<editor>
			<persName><surname>Fellbaum</surname></persName>
		</editor>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="285" to="303" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Multidocument Summarization via Information Extraction</title>
		<author>
			<persName><forename type="first">M</forename><surname>White</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Korelsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cardie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pierce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wagstaff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of Human Language Technology Conf. (HLT 2001)</title>
				<meeting>of Human Language Technology Conf. (HLT 2001)<address><addrLine>San Diego, CA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
