<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Using Information Extraction Rules for Extending Domain Ontologies - Position Statement for the IJCAI-2001 Workshop on Ontology Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Michael</forename><surname>Sintek</surname></persName>
							<email>sintek@dfki.de</email>
							<affiliation key="aff0">
								<orgName type="department">German Research Center for Artificial Intelligence (DFKI)</orgName>
								<orgName type="laboratory">Knowledge Management Group</orgName>
								<address>
									<postBox>P.O. Box 2080</postBox>
									<postCode>D-67608</postCode>
									<settlement>Kaiserslautern</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Markus</forename><surname>Junker</surname></persName>
							<email>junker@dfki.de</email>
							<affiliation key="aff0">
								<orgName type="department">German Research Center for Artificial Intelligence (DFKI)</orgName>
								<orgName type="laboratory">Knowledge Management Group</orgName>
								<address>
									<postBox>P.O. Box 2080</postBox>
									<postCode>D-67608</postCode>
									<settlement>Kaiserslautern</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ludger</forename><surname>Van Elst</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">German Research Center for Artificial Intelligence (DFKI)</orgName>
								<orgName type="laboratory">Knowledge Management Group</orgName>
								<address>
									<postBox>P.O. Box 2080</postBox>
									<postCode>D-67608</postCode>
									<settlement>Kaiserslautern</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andreas</forename><surname>Abecker</surname></persName>
							<email>aabecker@dfki.de</email>
							<affiliation key="aff0">
								<orgName type="department">German Research Center for Artificial Intelligence (DFKI)</orgName>
								<orgName type="laboratory">Knowledge Management Group</orgName>
								<address>
									<postBox>P.O. Box 2080</postBox>
									<postCode>D-67608</postCode>
									<settlement>Kaiserslautern</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Using Information Extraction Rules for Extending Domain Ontologies - Position Statement for the IJCAI-2001 Workshop on Ontology Learning</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">7C61215660B46ED6E83AA27BDEA1A29E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T17:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In the FRODO project <ref type="bibr" target="#b0">[1]</ref> we aim to develop a "Framework for Distributed Organizational Memories" (OMs). We start from the observation that knowledge and expertise are always heavily distributed within an organization. We do not regard this as an intermediate, imperfect state to be overcome by a central, ontologically structured information system, but as a natural and meaningful situation, for three reasons: when OM systems are introduced, it is normal to start with small, focused systems that should interoperate later; much expertise is better created, held, and maintained locally; and in interorganizational collaborations or virtual teams a deeper integration of information systems cannot be achieved.</p><p>Hence, a main goal of the FRODO project is to develop a scalable, extensible OM middleware built for easy integration of new components and linking of collaborating components <ref type="bibr" target="#b1">[2]</ref>. FRODO builds upon the KnowMore framework for context-aware, ontology-based OMs <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>, but relaxes some constraints of the original model, especially the idea of a centralized OM using one overall set of organizational ontologies.</p><p>Besides the technical provisions for such a distributed, highly dynamic environment, we lay special emphasis on the considerations and methods that are necessary to realize such a scenario in industrial practice. 
In every industrial environment, apart from the smooth introduction of new technology with respect to human factors and organizational processes, and apart from modeling tools and method support for knowledge acquisition (in particular, for acquiring the ontologies that structure OMs or parts of OMs), at least two other factors are of utmost importance:</p><p>One is the predominance of informal, i.e. essentially text-based, representations of knowledge. This is not merely a matter of fact but genuinely useful: the cost of formalization is often out of proportion to its potential benefits, so keeping many parts of the scenario informal is economically reasonable <ref type="bibr" target="#b4">[5]</ref>. One implication is that methods for building formal models must also be affordable.</p><p>The other is that ontologies are not a stand-alone component built once and then left untouched, but a living element in the overall scenario, used for different purposes, communicating with other system parts, and representing knowledge about a continuously changing world <ref type="bibr" target="#b9">[10]</ref>. These two observations lead to two characteristics of our approach: • Learning ontological information from text documents should be a main component of the overall scenario. We already set this goal in <ref type="bibr" target="#b2">[3]</ref>. In the meantime we have sketched a method for business-process-oriented knowledge modeling in the company, realized as an amalgamation of the CommonKADS <ref type="bibr" target="#b5">[6]</ref> and the IDEF5 <ref type="bibr" target="#b6">[7]</ref> suites of methods <ref type="bibr" target="#b1">[2]</ref>. We build upon the Protégé-2000 knowledge acquisition and modeling tool <ref type="bibr" target="#b7">[8]</ref>, which we have already extended with modules for modeling, reasoning, and visualization (see <ref type="bibr" target="#b0">[1]</ref>). 
We are currently working on integrating the commercial MindAccess® text analysis workbench <ref type="bibr" target="#b8">[9]</ref>, which offers a number of statistical document feature extraction and document analysis functions.</p><p>• In order to cope with the complexity and dynamics of real-world usage scenarios for ontologies in a distributed OM, we are developing a methodological framework for understanding and organizing the roles, responsibilities, rights, and obligations of the actors that constitute an ontology society in a complex, agent-based OM system <ref type="bibr" target="#b9">[10]</ref>.</p><p>In the IJCAI-01 "Ontology Learning" workshop we would primarily like to discuss an approach for extending the statistically oriented learning techniques above towards a more knowledge-based one, using an ILP (Inductive Logic Programming <ref type="bibr" target="#b10">[11]</ref>) algorithm which can use more elaborate document models and can cope with different sources of sophisticated background knowledge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Ontology Learning with Information Extraction Rules</head><p>Figure <ref type="figure" target="#fig_0">1</ref> illustrates the overall idea of building ontologies with learned information extraction rules. We start with:</p><p>1. An initial, hand-crafted seed ontology of reasonable quality which already contains the relevant types of relationships between ontology concepts in the given domain.</p><p>2. An initial set of documents which exemplify, informally, substantial parts of the knowledge represented formally in the seed ontology. • My headache was cured by medication with Aspirin. • Sue's headache was addressed with acupuncture. • Cancer can be treated with chemotherapy. • Cancer is often treated with surgery.</p><p>Our main idea is that, given (i) that texts explaining the ontological knowledge are available, and (ii) that these texts express similar factual statements in sufficiently similar ways, it should be possible: 1. To take the pairs of (ontological statement, one or more textual representations) as positive examples of how specific ontological statements can be reflected in texts. There are two ways to extract such examples: • Based on the seed ontology, the system looks up the signature of a certain relation (e.g., R links a Disease with a Cure), searches for all occurrences of instances of the concept classes Disease and Cure that lie within a certain maximum distance of each other, and regards these co-occurrences as positive examples of relationship R. This approach presupposes that the seed documents have some "definitional" character, like domain-specific lexica or textbooks. • The user goes through the seed documents with a marker and manually highlights all interesting passages as instances of some relationship. This approach is more work-intensive, but promises faster learning and more precise results. 
We have already employed this approach successfully in an industrial information extraction project <ref type="bibr" target="#b11">[12]</ref>.</p><p>2. Employ a pattern learning algorithm to automatically construct information extraction rules which abstract from the specific examples, thus yielding general statements about which text patterns are evidence for a certain ontological relationship. In the example above, such an information extraction rule could have the form:</p><p>In order to detect an instance of the "Method B is a possible Cure for Disease A" relationship, search for an instance of the concept Disease, check whether there is a synonym of the word (stem) "treat" within a distance of at most two words, search for the word "with" within a distance of at most two words, directly followed by an instance of the concept Cure. In order to learn such information extraction rules, we need some prerequisites:</p><p>(a) A sufficiently detailed representation of documents, including in particular word positions (which conventional, vector-based learning algorithms usually do not keep), WordNet synsets, and part-of-speech tags. (b) A sufficiently powerful representation formalism for extraction patterns. (c) A learning algorithm which has direct access to background knowledge sources, like the already available seed ontology containing statements about known concept instances, or like the WordNet database of lexical knowledge, which links words to their synonym sets and gives access to sub- and superclasses of synonym sets, etc.</p><p>In <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref> we present an ILP-like rule learner which fulfills these requirements. It is specifically adapted to the task of pattern-based text classification, which can be solved with the same methods as the information extraction task used in the ontology learning application. In particular, this rule learner relies on a document representation in which the order of words is preserved. 
Thus, learned text patterns can test the order and distance of specific words. In <ref type="bibr" target="#b15">[16]</ref> it is shown how its implementation concepts can be mapped to standard ILP approaches; this mapping also shows how its expressive power with respect to pattern representation can be extended towards full logic programming (LP) formalisms, including recursive rules. In <ref type="bibr" target="#b14">[15]</ref> we elaborate on the integration of background knowledge sources, especially WordNet.</p><p>3. Apply these learned information extraction rules to other, new text documents to discover new or not yet formalized instances of relationship R in the given application domain.</p></div>
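<div xmlns="http://www.tei-c.org/ns/1.0"><p>The example rule for the "Method B is a possible Cure for Disease A" relationship can be made concrete with a minimal Python sketch. It is not the ILP-like learner of <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>; it only applies one hand-written rule of the described form to tokenized sentences. The names DISEASES, CURES, TREAT_WORDS, find_within, and extract_cures are hypothetical. The small word sets are assumptions standing in for the seed ontology and for a WordNet synonym lookup with stemming, and "a distance of at most two words" is interpreted here as at most two intervening tokens.</p><p>
```python
import re

# Hypothetical stand-ins: concept instances from the seed ontology and a
# hand-listed substitute for WordNet synonyms of the stem "treat".
DISEASES = {"headache", "cancer"}
CURES = {"aspirin", "acupuncture", "chemotherapy", "surgery"}
TREAT_WORDS = {"treat", "treated", "cure", "cured", "address", "addressed"}


def tokenize(sentence):
    """Lower-case the sentence and keep alphabetic tokens in their order."""
    return re.findall(r"[a-z]+", sentence.lower())


def find_within(tokens, start, candidates, max_gap):
    """Return the index of the first token in `candidates` that follows
    position `start` with at most `max_gap` tokens in between, else None."""
    stop = min(len(tokens), start + max_gap + 2)
    for i in range(start + 1, stop):
        if tokens[i] in candidates:
            return i
    return None


def extract_cures(sentence, max_gap=2):
    """Apply the rule: a Disease, then a treat-like word, then "with",
    directly followed by a Cure. Returns a list of (disease, cure) pairs."""
    tokens = tokenize(sentence)
    pairs = []
    for d, token in enumerate(tokens):
        if token not in DISEASES:
            continue
        verb = find_within(tokens, d, TREAT_WORDS, max_gap)
        if verb is None:
            continue
        prep = find_within(tokens, verb, {"with"}, max_gap)
        if prep is None:
            continue
        nxt = prep + 1  # the Cure must directly follow "with"
        if len(tokens) > nxt and tokens[nxt] in CURES:
            pairs.append((token, tokens[nxt]))
    return pairs
```
</p><p>Under these assumptions, calling extract_cures on "Cancer is often treated with surgery." yields the pair (cancer, surgery), and the other three example sentences are matched in the same way. Keeping the token order is what allows the rule to test word distance, which a bag-of-words vector representation could not express.</p></div>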
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Status</head><p>The algorithm described has not yet been implemented and tested. However, all required prerequisites are available as described above and in <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref>. Further, we are in contact with several application projects (in the nuclear and the chemical industry) in order to obtain significant test data. A critical factor for the success of the approach will be how typical the textual representations of specific (kinds of) statements are in the seed documents.</p><p>Compared to other ontology learning approaches, our technique is not restricted to learning taxonomic relationships, but can learn arbitrary relationships in an application domain. More statistically oriented approaches tend to produce too many candidate results, because many word co-occurrences are possibly relevant. In contrast, we expect that our approach needs more input and assumes more prerequisites, but that the relationship candidates it finds will be correct with a higher probability.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Overall approach for ontology learning with information extraction rules</figDesc><graphic coords="2,150.48,62.53,311.11,233.16" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="http://www.dfki.uni-kl.de/frodo/" />
		<title level="m">FRODO project</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">FRODO: A Framework for Distributed Organizations - Milestone M1: Requirements Analysis and System Architecture</title>
		<author>
			<persName><forename type="first">A</forename><surname>Abecker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bernardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Elst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Maus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schwarz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sintek</surname></persName>
		</author>
		<idno>D-01-01</idno>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
	<note type="report_type">DFKI Document</note>
	<note>In preparation. Partially in German</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Towards a Technology for Organizational Memories</title>
		<author>
			<persName><forename type="first">A</forename><surname>Abecker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bernardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hinkelmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kühn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sintek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Intelligent Systems</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="1998-05">May/June 1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Context-Aware, Proactive Delivery of Task-Specific Knowledge: The KnowMore Project</title>
		<author>
			<persName><forename type="first">A</forename><surname>Abecker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bernardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hinkelmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kühn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sintek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal on Information System Frontiers</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">3/4</biblScope>
			<date type="published" when="2000">2000</date>
			<publisher>Kluwer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Balancing Formality with Informality: User-Centred Requirements for Knowledge Management Technologies</title>
		<author>
			<persName><forename type="first">S</forename><surname>Buckingham Shum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI Spring Symposium on Artificial Intelligence in Knowledge Management</title>
				<meeting><address><addrLine>Palo Alto, CA</addrLine></address></meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="1997">1997</date>
		</imprint>
		<respStmt>
			<orgName>Stanford University</orgName>
		</respStmt>
	</monogr>
	<note>AIKM&apos;97</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Knowledge Engineering and Management: The CommonKADS Methodology</title>
		<author>
			<persName><forename type="first">G</forename><surname>Schreiber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Akkermans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Anjewierden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>De Hoog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shadbolt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Van De Velde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wielinga</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<ptr target="http://www.idef.com/" />
		<title level="m">IDEF5 Method Report</title>
				<imprint>
			<date type="published" when="1994">1994</date>
		</imprint>
		<respStmt>
			<orgName>Information Integration for Concurrent Engineering</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Knowledge Modeling at the Millennium (The Design and Evolution of Protégé-2000)</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">E</forename><surname>Grosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Eriksson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">W</forename><surname>Fergerson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Gennari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">W</forename><surname>Tu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Musen</surname></persName>
		</author>
		<idno>SMI-1999-0801</idno>
		<ptr target="URL:protege.stanford.edu" />
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
		<respStmt>
			<orgName>Stanford Medical Lab</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<ptr target="http://www.im-insiders.de/html/infomaterial.html" />
		<title level="m">Insiders information management GmbH</title>
				<meeting><address><addrLine>Kaiserslautern</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
	<note>MindAccess product description. In German</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Ontology-Related Services in Agent-Based Distributed Information Infrastructures</title>
		<author>
			<persName><forename type="first">L</forename><surname>Van Elst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abecker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Submitted to: SEKE&apos;01, The Thirteenth International Conference on Software Engineering &amp; Knowledge Engineering</title>
				<meeting><address><addrLine>Buenos Aires, Argentina</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001-06-13">June 13-15, 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Inductive Logic Programming: Techniques and Applications</title>
		<author>
			<persName><forename type="first">N</forename><surname>Lavrac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dzeroski</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1994">1994</date>
			<publisher>Ellis Horwood</publisher>
			<pubPlace>Chichester, UK</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<ptr target="http://www.dfki.de/pas/f2w.cgi?daimc/annoclass-e" />
		<title level="m">ANNOCLASS project description</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Heuristisches Lernen von Regeln für die Textkategorisierung</title>
		<author>
			<persName><forename type="first">M</forename><surname>Junker</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
		<respStmt>
			<orgName>Universität Kaiserslautern</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Dissertation. Fachbereich Informatik</note>
	<note>In German</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Learning Complex Pattern for Document Categorization</title>
		<author>
			<persName><forename type="first">M</forename><surname>Junker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abecker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI-98/ICML Workshop on Learning for Text Categorization</title>
				<meeting><address><addrLine>Madison, Wisconsin, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Exploiting Thesaurus Knowledge in Rule Induction for Text Classification</title>
		<author>
			<persName><forename type="first">M</forename><surname>Junker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abecker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">RANLP&apos;97 - Recent Advances in NLP</title>
				<meeting><address><addrLine>Tzigov Chark, Bulgaria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1997">1997</date>
			<biblScope unit="page" from="202" to="207" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Learning for Text Categorization and Information Extraction with ILP</title>
		<author>
			<persName><forename type="first">M</forename><surname>Junker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sintek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rinck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Learning Language in Logic</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2000">2000</date>
			<biblScope unit="volume">1925</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
