<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Open Information Extraction on German Wikipedia Texts</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Christian</forename><surname>Klose</surname></persName>
							<email>christian.klose@fau.de</email>
							<affiliation key="aff0">
								<orgName type="department">Chair of Technical Information Systems</orgName>
								<orgName type="institution">Friedrich-Alexander-Universität Erlangen-Nürnberg</orgName>
								<address>
									<addrLine>Lange Gasse 20</addrLine>
									<postCode>90403</postCode>
									<settlement>Nürnberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Zhou</forename><surname>Gui</surname></persName>
							<email>zhou.gui@fau.de</email>
							<affiliation key="aff0">
								<orgName type="department">Chair of Technical Information Systems</orgName>
								<orgName type="institution">Friedrich-Alexander-Universität Erlangen-Nürnberg</orgName>
								<address>
									<addrLine>Lange Gasse 20</addrLine>
									<postCode>90403</postCode>
									<settlement>Nürnberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andreas</forename><surname>Harth</surname></persName>
							<email>andreas.harth@fau.de</email>
							<affiliation key="aff0">
								<orgName type="department">Chair of Technical Information Systems</orgName>
								<orgName type="institution">Friedrich-Alexander-Universität Erlangen-Nürnberg</orgName>
								<address>
									<addrLine>Lange Gasse 20</addrLine>
									<postCode>90403</postCode>
									<settlement>Nürnberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Open Information Extraction on German Wikipedia Texts</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BDE4B2D95209E95C2CEACF1193A25DD3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:05+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Open Information Extraction</term>
					<term>Natural Language Processing</term>
					<term>Knowledge Graph Construction</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Knowledge Graphs are becoming a fundamental building block for semantic search and voice assistants. This paper deals with the automated construction of Knowledge Graphs from unstructured data. In particular, it focuses on Open Information Extraction (Open IE), an unsupervised learning approach that attempts to extract triples from plain text independently of their domain. Open IE is thus a first step towards automated Knowledge Graph Construction. Previous work has mainly applied Open IE to English texts; this paper focuses on German texts. Due to the lack of German Open IE datasets, a dataset based on German Wikipedia is created. Two Open IE systems for German are introduced, and finally, the performance of both systems is evaluated.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In his vision of the Semantic Web <ref type="bibr" target="#b0">[1]</ref>, Tim Berners-Lee described a shift from a Web of documents made by and for people to a Web of information. According to this vision, information on the Web should be processable not only by humans but also by machines. Most documents on the World Wide Web consist largely of text and are still difficult for machines to process today. For this reason, the W3C has developed the Resource Description Framework (RDF), a universal language that makes information on the Web accessible to machines. Information in RDF can be serialized in multiple formats. One common format is Turtle, a textual representation of an RDF graph that stores RDF triples in a compact, human-readable form. A large collection of RDF graphs in a specific domain can form a Knowledge Graph (KG). Virtual assistants in particular can use the facts, events and abstract concepts stored in Knowledge Graphs to provide insights to people during semantic search or question answering.</p><p>To build Knowledge Graphs, knowledge can be extracted in the form of triples from documents written in natural language. The transformation of text into a machine-readable form is, therefore, a core task for building Knowledge Graphs. It can be broadly summarized as the goal of Machine Reading <ref type="bibr" target="#b1">[2]</ref>. Machine Reading is a long-standing goal in the field of AI and is discussed in the research community under the term Information Extraction (IE). IE comprises tasks such as Named Entity Recognition (NER), Relation Extraction (RE) and Entity Linking (EL). In recent years, an unsupervised approach to Relation Extraction, namely Open Information Extraction (Open IE), has shown promising results and is therefore the main subject of this paper.</p><p>The remainder of the paper is structured as follows: Section 2 outlines related work, Section 3 describes the research method, Section 4 presents and discusses the results, and Section 5 draws a conclusion.</p></div>
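<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of RDF triples and the Turtle format, the following Python sketch serializes a single fact, the population figure from the example sentence used in Section 3.2.1, as Turtle. The namespace http://example.org/, the prefix ex and the local names Nuremberg and population are illustrative assumptions rather than an established vocabulary, and the helper supports only prefixed names and integer literals, not the full Turtle grammar.</p>
```python
from dataclasses import dataclass
from typing import Union

LT = "\N{LESS-THAN SIGN}"  # less-than sign, which opens an IRI reference


@dataclass(frozen=True)
class Triple:
    subject: str           # local name in the example namespace
    predicate: str         # local name in the example namespace
    obj: Union[str, int]   # local name, or an integer literal


def to_turtle(triples: list[Triple], prefix: str, namespace: str) -> str:
    """Serialize triples as Turtle, using a single prefix for all names."""
    lines = [f"@prefix {prefix}: {LT}{namespace}> ."]
    for t in triples:
        # Integers become plain Turtle integer literals; strings become
        # prefixed names in the same namespace.
        obj = str(t.obj) if isinstance(t.obj, int) else f"{prefix}:{t.obj}"
        lines.append(f"{prefix}:{t.subject} {prefix}:{t.predicate} {obj} .")
    return "\n".join(lines)


print(to_turtle([Triple("Nuremberg", "population", 518370)],
                "ex", "http://example.org/"))
```
<p>Running the sketch prints a prefix declaration followed by the triple ex:Nuremberg ex:population 518370 . in which the object is a plain integer literal.</p></div>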
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Open Information Extraction aims to extract all possible triples from text without knowing the relations or entities occurring in it a priori. The first Open IE system was TextRunner <ref type="bibr" target="#b2">[3]</ref>, a learning-based system developed by researchers at the University of Washington. In the following years, other systems were introduced, each attempting to improve on the state of the art by overcoming weaknesses identified in earlier systems. In addition to learning-based systems, other types emerged, namely rule-based systems, clause-based systems and systems that exploit inter-proposition relationships <ref type="bibr" target="#b3">[4]</ref>. Rule-based systems depend entirely on hand-crafted rules or patterns; examples are KrakeN <ref type="bibr" target="#b4">[5]</ref> and Exemplar <ref type="bibr" target="#b5">[6]</ref>. To improve the precision of these systems, the idea of breaking down complex sentences into smaller components (clauses) emerged. Two clause-based systems are particularly worth mentioning: ClausIE <ref type="bibr" target="#b6">[7]</ref> and Stanford Open IE <ref type="bibr" target="#b7">[8]</ref>. The system types mentioned so far share a common weakness: none of them takes the context into account, so a correct extraction cannot be guaranteed. Inter-proposition-based systems try to close this gap; examples of this category are RelNoun <ref type="bibr" target="#b8">[9]</ref>, OpenIE4 <ref type="bibr" target="#b9">[10]</ref>, NestIE <ref type="bibr" target="#b10">[11]</ref> and MinIE <ref type="bibr" target="#b11">[12]</ref>. With RnnOIE, <ref type="bibr" target="#b12">[13]</ref> were among the first to apply neural networks to Open IE, formulating the problem as a sequence labeling task. More recent models that follow a sequence labeling approach are SenseOIE <ref type="bibr" target="#b13">[14]</ref>, SpanOIE <ref type="bibr" target="#b14">[15]</ref> and iRankOIE <ref type="bibr" target="#b15">[16]</ref>. However, as <ref type="bibr" target="#b16">[17]</ref> observed, sequence labeling models cannot change the sentence structure or introduce new auxiliary words in an extraction. To overcome this limitation, <ref type="bibr" target="#b17">[18]</ref> developed CopyAttention using a different neural approach called sequence generation. Furthermore, <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b16">17]</ref> describe end-to-end approaches using seq2seq models based on the encoder-decoder framework, which alleviate these downsides and reduce the need for hand-crafted patterns.</p></div>
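<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sequence labeling formulation can be illustrated with a small Python sketch in which every token of a sentence receives a BIO tag for the triple part it belongs to. The role names ARG0, REL and ARG1 and the helper functions are illustrative choices, not necessarily the label set used by RnnOIE. Decoding the tags also shows the limitation discussed above: an extraction can only consist of tokens that already occur in the sentence, in their original order.</p>
```python
def bio_tags(tokens: list[str], spans: dict[str, tuple[int, int]]) -> list[str]:
    """Assign BIO tags for one extraction.

    spans maps a role name to a (start, end) token range with end exclusive;
    tokens outside every span are tagged "O".
    """
    tags = ["O"] * len(tokens)
    for role, (start, end) in spans.items():
        for i in range(start, end):
            tags[i] = ("B-" if i == start else "I-") + role
    return tags


def decode(tokens: list[str], tags: list[str]) -> dict[str, str]:
    """Rebuild each triple part from the tokens that carry its tag."""
    parts: dict[str, list[str]] = {}
    for token, tag in zip(tokens, tags):
        if tag != "O":
            parts.setdefault(tag[2:], []).append(token)
    return {role: " ".join(words) for role, words in parts.items()}


tokens = "Deep Learning is a subfield of Machine Learning".split()
tags = bio_tags(tokens, {"ARG0": (0, 2), "REL": (2, 6), "ARG1": (6, 8)})
print(list(zip(tokens, tags)))
print(decode(tokens, tags))
```
<p>Because the output space is one tag per input token, a model trained this way has no means of emitting a word such as an auxiliary verb that is missing from the input sentence.</p></div>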
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Research Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset Creation</head><p>One of the main challenges in Open IE is to verify the quality of the extractions made by a system. A solution is a dataset of high-quality training and test data in which triples are mapped to sentences. Currently, two approaches to generating training data automatically are considered particularly useful by researchers: first, the infobox-matching approach <ref type="bibr" target="#b20">[21]</ref>, in which Wikipedia infobox values are linked to sentences in the corpus, and second, the distantly supervised approach <ref type="bibr" target="#b21">[22]</ref>, in which existing knowledge bases are used to map triples to sentences heuristically. For the creation of the German Wikipedia dataset, the infobox-matching approach is used. In total, five steps were executed to create a clean dataset: 1) finding and downloading a Wikipedia dataset, 2) preprocessing and cleaning the text, 3) matching all infobox triples to the corresponding page text, 4) matching the triples on the sentence level, and 5) filtering out noisy training examples. After the last step, the dataset comprised 6,453 triples, which are mapped to 5,372 sentences and cover 1,324 relation types. On average, the subject consists of 2.0 words, the predicate of 1.0 and the object of 1.5 words, and a sentence is 22.8 words long. The resulting dataset, WikiGerman4OIE, and the code used to create it are published and freely accessible.<ref type="foot" target="#foot_0">2</ref> </p></div>
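<div xmlns="http://www.tei-c.org/ns/1.0"><p>Steps 3 and 4 can be pictured with the following deliberately simplified Python sketch of infobox matching. It assumes that the page title serves as the subject, that each infobox attribute-value pair yields one triple, and that a sentence supports a triple if it contains both the title and the value verbatim; these heuristics, the function names and the toy infobox are assumptions for illustration, and the published code may match differently. The helper average_words shows how word averages such as those reported above can be computed. For orientation, the reported counts correspond to about 1.2 triples per sentence (6,453 / 5,372 ≈ 1.20).</p>
```python
def match_infobox(title: str, infobox: dict[str, str],
                  sentences: list[str]) -> list[tuple[tuple[str, str, str], str]]:
    """Map infobox triples (title, attribute, value) to supporting sentences.

    Hypothetical heuristic: a sentence supports a triple if it contains
    both the page title and the infobox value verbatim.
    """
    matches = []
    for attribute, value in infobox.items():
        for sentence in sentences:
            if title in sentence and value in sentence:
                matches.append(((title, attribute, value), sentence))
    return matches


def average_words(phrases: list[str]) -> float:
    """Average number of whitespace-separated words per phrase."""
    return sum(len(p.split()) for p in phrases) / len(phrases) if phrases else 0.0


# Toy page: a title, a two-entry infobox and three sentences.
title = "Nürnberg"
infobox = {"Einwohner": "518370", "Bundesland": "Bayern"}
sentences = [
    "Im Jahr 2019 zählte Nürnberg 518370 Bewohner.",
    "Nürnberg liegt in Bayern.",
    "Die Stadt hat viele Brücken.",
]
pairs = match_infobox(title, infobox, sentences)
for triple, sentence in pairs:
    print(triple, "->", sentence)
print(average_words([s for _, s in pairs]))  # average length of matched sentences
```
<p>A real pipeline would additionally need step 5, for example dropping matches where the value occurs only as part of a longer number or name.</p></div>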
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Open IE Systems</head><p>We implemented two systems for our research. The first, turCy, is implemented as a spaCy<ref type="foot" target="#foot_1">3</ref> pipeline component in order to leverage spaCy's part-of-speech (POS) tagger and dependency parser (DP). The second, which we call NeuralGerOIE, uses an encoder-decoder seq2seq neural model.</p></div>
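<div xmlns="http://www.tei-c.org/ns/1.0"><p>The following Python sketch gives a simplified picture of the two turCy components described in Section 3.2.1; it is not turCy's actual algorithm. Here a pattern is reduced to one path of (dependency label, POS tag) pairs from the root to a single head token per triple part (for instance Bewohner rather than 518370 Bewohner), which matches the condition that at least one token of each part must be found. The extractor searches the tree recursively from the root and only descends along branches that can still lead to a pattern path. The Token class, the helper names and the hand-written parses (TIGER-style labels such as sb and oa, the root being its own head as in spaCy) are assumptions for illustration and need not coincide with Figure 1; real input would come from spaCy's tagger and parser.</p>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    text: str
    pos: str   # coarse POS tag, e.g. "PROPN"
    dep: str   # dependency label towards the head, e.g. "sb" (subject)
    head: int  # index of the head token; the root is its own head


def step(sent: list[Token], i: int) -> tuple[str, str]:
    """(edge label, POS tag) of token i; the root always gets "ROOT"."""
    return ("ROOT" if sent[i].head == i else sent[i].dep, sent[i].pos)


def path_to(sent: list[Token], i: int) -> tuple:
    """Steps from the root down to token i."""
    steps = [step(sent, i)]
    while sent[i].head != i:
        i = sent[i].head
        steps.append(step(sent, i))
    return tuple(reversed(steps))


def build_pattern(sent: list[Token], roles: dict[str, int]) -> dict[str, tuple]:
    """Pattern Builder sketch: one root path per triple part."""
    return {role: path_to(sent, i) for role, i in roles.items()}


def extract(sent: list[Token], pattern: dict[str, tuple]) -> Optional[tuple[str, str, str]]:
    """Triple Extractor sketch: recursive sub-tree search from the root."""
    found: dict[str, str] = {}

    def visit(i: int, prefix: tuple) -> None:
        here = prefix + (step(sent, i),)
        for role, path in pattern.items():
            if path == here:
                found.setdefault(role, sent[i].text)
        # Descend only while some pattern path continues below this node.
        if any(len(p) > len(here) and p[:len(here)] == here for p in pattern.values()):
            for j, tok in enumerate(sent):
                if tok.head == i and j != i:
                    visit(j, here)

    root = next(i for i, tok in enumerate(sent) if tok.head == i)
    visit(root, ())
    if all(role in found for role in ("subj", "pred", "obj")):
        return found["subj"], found["pred"], found["obj"]
    return None


# Hand-written parse of "Im Jahr 2019 zählte Nürnberg 518370 Bewohner."
nuernberg = [
    Token("Im", "ADP", "mo", 3), Token("Jahr", "NOUN", "nk", 0),
    Token("2019", "NUM", "nk", 1), Token("zählte", "VERB", "ROOT", 3),
    Token("Nürnberg", "PROPN", "sb", 3), Token("518370", "NUM", "nk", 6),
    Token("Bewohner", "NOUN", "oa", 3), Token(".", "PUNCT", "punct", 3),
]
pattern = build_pattern(nuernberg, {"subj": 4, "pred": 3, "obj": 6})

# Hand-written parse of "Anna liest ein Buch." (Anna reads a book.)
anna = [
    Token("Anna", "PROPN", "sb", 1), Token("liest", "VERB", "ROOT", 1),
    Token("ein", "DET", "nk", 3), Token("Buch", "NOUN", "oa", 1),
    Token(".", "PUNCT", "punct", 1),
]
print(extract(anna, pattern))
```
<p>A pattern learned from one sentence thus transfers to any sentence with the same labeled path structure, which is why the number and variety of stored patterns determine both coverage and run-time.</p></div>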
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">TurCy</head><p>Research by <ref type="bibr" target="#b4">[5]</ref> and <ref type="bibr" target="#b22">[23]</ref> suggests that a moderate number of POS and DP patterns suffices to extract a large variety of triples, independently of any other constraints. TurCy follows a similar approach: it is a pattern-learning system for binary extractions that consists of two essential components, the Pattern Builder and the Triple Extractor. A pattern consists of nodes that represent the POS tags of a sentence, and the relations between the nodes reflect the dependency parse tree. With respect to a sentence, a pattern represents exactly one triple. The pattern itself consists of sub-patterns, each of which represents a node in the tree and maps a word to its left and right child nodes. Figure <ref type="figure" target="#fig_0">1</ref> shows the POS and dependency tree for the sentence "Im Jahr 2019 zählte Nürnberg 518370 Bewohner." ("In 2019, Nuremberg had 518,370 inhabitants.").</p><p>At its core, the Triple Extractor of turCy is a recursive sub-tree search over the pattern list. The algorithm starts at the root node, traverses each path down to the leaves of the tree and checks whether a node and its edge match a sub-pattern of the respective pattern. If all sub-patterns match, the triple is assembled and stored for the sentence. Note that a match implies that at least one token of each of the three parts (subject, predicate, object) of a triple was found during the recursive sub-tree search. The actual algorithm is more involved; readers interested in the full mechanism are referred to the source code. 
For this reason, and to ensure full transparency of our research, turCy has been packaged as a Python library and released under an open-source license.<ref type="foot" target="#foot_2">4</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">NeuralGerOIE</head><p>The latest state-of-the-art Open IE systems use seq2seq neural networks. The second system, NeuralGerOIE,<ref type="foot" target="#foot_3">5</ref> therefore follows the sequence generation approach and was created using the Simple Transformers<ref type="foot" target="#foot_4">6</ref> library. The model was trained on the WikiGerman4OIE dataset (Section 3.1), starting from a pre-trained BART model for German.<ref type="foot" target="#foot_5">7</ref> Notably, the model was trained to output multiple extractions that share the same subject.</p><p>We fine-tuned the model for 10 epochs with a batch size of 8. The maximum input length was set to 300; longer inputs were truncated. Our approach differs from the seq2seq models discussed earlier in the type of separator used. While <ref type="bibr" target="#b17">[18]</ref> used start and end tags for each part of a triple (&lt;arg1&gt; Deep Learning &lt;/arg1&gt;&lt;rel&gt; is a subfield of &lt;/rel&gt;&lt;arg2&gt; Machine Learning &lt;/arg2&gt;), we found that one token before each part along with a final end token was sufficient. Because the model struggled to output the same separator token multiple times within a sequence, we added a number to each separator token. We also noticed that the names of the separator tokens affected the quality of the outputs. We proceeded with the following triple format: &lt;sub&gt; Deep Learning &lt;rel0&gt; is a subfield of &lt;obj0&gt; Machine Learning &lt;end&gt;. Fine-tuning took about 2 hours and 25 minutes on an NVIDIA Tesla V100 GPU with 32 GB of memory. The results are discussed in the following section.</p></div>
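<div xmlns="http://www.tei-c.org/ns/1.0"><p>The numbered separator format can be made concrete with a short Python sketch that serializes a subject with one or more (relation, object) pairs into a target sequence and parses a generated sequence back into triples. The single-triple case reproduces the example above; how several extractions sharing one subject are laid out (rel1, obj1 and so on after rel0 and obj0) is our assumption, and the function names encode, decode and tag are hypothetical.</p>
```python
import re

LT = "\N{LESS-THAN SIGN}"  # less-than sign that opens every separator token


def tag(name: str) -> str:
    """Build a separator token such as sub, rel0, obj0 or end in angle brackets."""
    return f"{LT}{name}>"


def encode(subject: str, relations: list[tuple[str, str]]) -> str:
    """Serialize one subject and its (relation, object) pairs as a target sequence."""
    parts = [tag("sub"), subject]
    for k, (relation, obj) in enumerate(relations):
        parts += [tag(f"rel{k}"), relation, tag(f"obj{k}"), obj]
    parts.append(tag("end"))
    return " ".join(parts)


def decode(sequence: str) -> list[tuple[str, str, str]]:
    """Parse a generated sequence back into (subject, relation, object) triples."""
    body = sequence.split(tag("end"))[0]  # ignore anything after the end token
    # Split on separator tokens, keeping their names (sub, rel0, obj0, ...).
    pieces = re.split(LT + r"(sub|rel\d+|obj\d+)>", body)
    fields = {name: text.strip() for name, text in zip(pieces[1::2], pieces[2::2])}
    subject = fields.get("sub", "")
    triples = []
    k = 0
    # Collect numbered relation/object pairs until one of them is missing.
    while f"rel{k}" in fields and f"obj{k}" in fields:
        triples.append((subject, fields[f"rel{k}"], fields[f"obj{k}"]))
        k += 1
    return triples


target = encode("Deep Learning", [("is a subfield of", "Machine Learning")])
print(target)
print(decode(target))
```
<p>Because decoding stops at the first missing numbered pair, a truncated or malformed generation yields only the complete triples that precede the defect.</p></div>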
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>Each of the Open IE systems is evaluated against a gold dataset, since high-quality annotations are needed to obtain accurate insights into the quality of the extractions. To this end, two of the authors annotated a subset of the WikiGerman4OIE dataset (Section 3.1). The resulting gold dataset consists of 47 sentences and 175 triples, i.e., about 3.7 triples per sentence on average; it is the basis for the evaluation process. Precision, recall and F1-score are computed with the token-based evaluation method introduced by <ref type="bibr" target="#b23">[24]</ref>, slightly adjusted to fit the systems' outputs. The evaluation process is as follows. First, the Pattern Builder created the patterns from the full WikiGerman4OIE dataset for turCy-large and from the gold dataset for turCy-small. This yielded two pattern lists of 6,453 and 175 patterns, respectively, corresponding to the number of triples in each dataset. Second, the 47 sentences were fed as the only input into the Triple Extractor and the NeuralGerOIE prediction function. Lastly, precision, recall and F1-score were calculated. As Table <ref type="table" target="#tab_0">1</ref> shows, turCy-small achieves a higher F1-score than NeuralGerOIE (0.64 vs. 0.43), mainly because it extracts more triples (114 vs. 52). At the same time, we noticed that NeuralGerOIE obtained a very high precision for sentences annotated with a single triple. A comparison between turCy-small and turCy-large, which differ only in the number of patterns and the dataset they were built from, indicates that the quality of the automatically generated dataset is lower than initially expected. The reason for this assumption is that turCy-large extracted only 29 triples, although it contains a high number of patterns built from the automatically generated dataset. TurCy-small, on the other hand, yielded considerably more extractions with far fewer patterns, created from the gold dataset. This result is counter-intuitive, as one would expect the number of extractions to grow with the number of patterns. Moreover, when comparing turCy and NeuralGerOIE, we found that while turCy can only output words that occur in the text, the neural model can learn a direct mapping between the words used in the text and the corresponding words in the gold dataset.</p><p>In addition to the quality analysis, we examined the run-time, as Open IE systems are intended to process large amounts of text at a rapid pace. To assess the run-time, we consider two metrics: the number of sentences processed per second (# sent./sec.) and the number of triples yielded per second (# triples/sec.). Furthermore, we are interested in how these two metrics relate to the number of stored patterns. Table <ref type="table" target="#tab_1">2</ref> shows that turCy-small yields the most triples per second (2.7), followed by NeuralGerOIE (1.14), while both process a similar number of sentences per second (1.12 and 1.14). As expected, the number of patterns has a major impact on the run-time, as turCy-large illustrates with only 0.47 sentences per second. This leads to the conclusion that one of the main objectives of rule-based Open IE systems is to keep the number of patterns as small as possible, but as large as necessary to maximize the number of extractions.</p></div>
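<div xmlns="http://www.tei-c.org/ns/1.0"><p>How precision, recall and F1-score combine can be sketched in Python. The matching criterion below is a simplified stand-in, not the token-based method of [24]: a predicted triple counts as correct if each of its three parts shares at least one lower-cased token with the corresponding part of some gold triple, mirroring the at-least-one-token condition of turCy's matcher. The F1 helper is the standard harmonic mean F1 = 2PR / (P + R); applied to the rounded precision and recall of turCy-small in Table 1 (0.83 and 0.52), it gives 0.64, as reported. The example triples are invented for illustration.</p>
```python
def tokens(text: str) -> set[str]:
    """Lower-cased whitespace tokens of a triple part."""
    return set(text.lower().split())


def overlaps(pred: tuple[str, str, str], gold: tuple[str, str, str]) -> bool:
    """True if subject, relation and object each share at least one token."""
    return all(not tokens(p).isdisjoint(tokens(g)) for p, g in zip(pred, gold))


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(predicted, gold):
    """Precision, recall and F1 under the simplified overlap criterion."""
    correct = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    found = sum(any(overlaps(p, g) for p in predicted) for g in gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = found / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


gold = [("Anna", "liest", "ein Buch"), ("Anna", "wohnt in", "Berlin")]
predicted = [("Anna", "liest", "Buch"), ("Anna", "kauft", "Brot")]
print(evaluate(predicted, gold))
print(round(f1(0.83, 0.52), 2))
```
<p>In this toy example one of two predictions is correct and one of two gold triples is found, so precision, recall and F1-score are all 0.5.</p></div>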
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion &amp; Outlook</head><p>The contribution of this paper is twofold. First, a dataset for Open Information Extraction based on German Wikipedia texts was created and published. Second, two different approaches to Open IE were implemented and evaluated. Several research directions for future work remain. We believe that there is still potential for improvement in terms of both dataset quality and quantity. For instance, a crowd-sourcing platform such as Amazon Mechanical Turk could be leveraged to improve the quality. In addition, the distantly supervised approach to automated training data generation can be explored. Doing so would also help to determine how the choice of training data generation approach affects the quality of the extractions made by an Open IE system. Finally, both Open IE systems can be optimized further. For the rule-based system turCy, for example, tree pruning approaches can be explored to reduce the overall number of patterns and thereby improve the extraction speed. For NeuralGerOIE, multi-subject extractions, different architectures and more recent language models might be considered.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: POS and dependency tree (pattern representation) of a simple sentence. The colored nodes are the parts of the triple in the pattern: orange marks the subject(s), purple the predicate(s) and blue the object(s). All other nodes are colored green.</figDesc><graphic coords="4,108.88,84.19,375.03,208.42" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Extraction quality of the systems on the gold dataset (precision, recall and F1-score).</figDesc><table><row><cell>System</cell><cell># Patterns</cell><cell># Extractions</cell><cell># Matches</cell><cell>Prec.</cell><cell>Recall</cell><cell>F1</cell></row><row><cell>turCy-large</cell><cell>6,453</cell><cell>29</cell><cell>21/175</cell><cell>0.71</cell><cell>0.11</cell><cell>0.19</cell></row><row><cell>turCy-small</cell><cell>175</cell><cell>114</cell><cell>94/175</cell><cell>0.83</cell><cell>0.52</cell><cell>0.64</cell></row><row><cell>NeuralGerOIE</cell><cell>-</cell><cell>52</cell><cell>40/175</cell><cell>0.72</cell><cell>0.30</cell><cell>0.43</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Run-time comparison of the Open IE systems.</figDesc><table><row><cell>System</cell><cell># Patterns</cell><cell># sent./sec.</cell><cell># triples/sec.</cell></row><row><cell>turCy-large</cell><cell>6,453</cell><cell>0.47</cell><cell>0.71</cell></row><row><cell>turCy-small</cell><cell>175</cell><cell>1.12</cell><cell>2.7</cell></row><row><cell>NeuralGerOIE</cell><cell>-</cell><cell>1.14</cell><cell>1.14</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://github.com/ChrisDelClea/WikiGerman4OIE</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">Library for advanced natural language processing: https://spacy.io</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://github.com/ChrisDelClea/turCy</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://github.com/ChrisDelClea/NeuralGermanOIE</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">https://simpletransformers.ai/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://huggingface.co/Shahm/bart-german</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research was conducted within the scope of the project Software Campus 2.0 (FAU), grant number 01IS17045, which was funded by the German government. We thank them for their sponsorship.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The semantic web</title>
		<author>
			<persName><forename type="first">T</forename><surname>Berners-Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hendler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Lassila</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific American</title>
		<imprint>
			<biblScope unit="page" from="34" to="43" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Machine reading</title>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cafarella</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Open information extraction from the web</title>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cafarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Broadhead</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCAI International Joint Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="2670" to="2676" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Niklaus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cetto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Freitas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Handschuh</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1806.05599</idno>
		<title level="m">A survey on open information extraction</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">KrakeN: N-ary facts in open information extraction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Akbik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Löser</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/W12-3010" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)</title>
				<meeting>the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="52" to="56" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Mesquita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmidek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Barbosa</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/D13-1043" />
		<title level="m">Effectiveness and efficiency of open relation extraction</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="447" to="457" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">ClausIE: clause-based open information extraction</title>
		<author>
			<persName><forename type="first">L</forename><surname>Del Corro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gemulla</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd international conference on World Wide Web</title>
				<meeting>the 22nd international conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="355" to="366" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Leveraging linguistic structure for open domain information extraction</title>
		<author>
			<persName><forename type="first">G</forename><surname>Angeli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J J</forename><surname>Premkumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</title>
		<title level="s">Long Papers</title>
		<meeting>the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="344" to="354" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Demonyms and compound relational nouns in nominal Open IE</title>
		<author>
			<persName><forename type="first">H</forename><surname>Pal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th Workshop on Automated Knowledge Base Construction</title>
				<meeting>the 5th Workshop on Automated Knowledge Base Construction</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="35" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Open information extraction systems and downstream applications</title>
		<author>
			<persName><surname>Mausam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the twenty-fifth international joint conference on artificial intelligence</title>
				<meeting>the twenty-fifth international joint conference on artificial intelligence</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="4074" to="4077" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Nested propositions in open information extraction</title>
		<author>
			<persName><forename type="first">N</forename><surname>Bhutani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jagadish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Radev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2016 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="55" to="64" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">MinIE: minimizing facts in open information extraction</title>
		<author>
			<persName><forename type="first">K</forename><surname>Gashteovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gemulla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Del Corro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Supervised open information extraction</title>
		<author>
			<persName><forename type="first">G</forename><surname>Stanovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Dagan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long Papers</title>
		<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="885" to="895" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Supervising unsupervised open information extraction models</title>
		<author>
			<persName><forename type="first">A</forename><surname>Roy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="728" to="737" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Span model for open information extraction on accurate corpus</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="9523" to="9530" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1905.13413</idno>
		<title level="m">Improving open information extraction via iterative rank-aware learning</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Kolluru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Rathore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chakrabarti</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.08178</idno>
		<title level="m">IMoJIE: Iterative memory-based joint open information extraction</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.04270</idno>
		<title level="m">Neural open information extraction</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">REBEL: Relation extraction by end-to-end language generation</title>
		<author>
			<persName><forename type="first">P.-L</forename><surname>Huguet Cabot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2021</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2370" to="2381" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Kolluru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Adlakha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chakrabarti</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.03147</idno>
		<title level="m">OpenIE6: Iterative grid labeling and coordination analysis for open information extraction</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Open information extraction using Wikipedia</title>
		<author>
			<persName><forename type="first">F</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 48th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="118" to="127" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Distant supervision for relation extraction without labeled data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mintz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bills</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Snow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/P09-1113" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP</title>
				<meeting>the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1003" to="1011" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Open language learning for information extraction</title>
		<author>
			<persName><surname>Mausam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schmitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/D12-1048" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</title>
				<meeting>the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="523" to="534" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">WiRe57: A fine-grained benchmark for open information extraction</title>
		<author>
			<persName><forename type="first">W</forename><surname>Léchelle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Langlais</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="6" to="15" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
