<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Integrating Open and Closed Information Extraction: Challenges and First Steps</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Arnab</forename><surname>Dutta</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Research Data and Web Science</orgName>
								<orgName type="institution">University of Mannheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Mathias</forename><surname>Niepert</surname></persName>
							<email>mniepert@cs.washington.edu</email>
							<affiliation key="aff1">
								<orgName type="department">Computer Science and Engineering</orgName>
								<orgName type="institution">University of Washington</orgName>
								<address>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christian</forename><surname>Meilicke</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Research Data and Web Science</orgName>
								<orgName type="institution">University of Mannheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Simone</forename><forename type="middle">Paolo</forename><surname>Ponzetto</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Research Data and Web Science</orgName>
								<orgName type="institution">University of Mannheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Integrating Open and Closed Information Extraction: Challenges and First Steps</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">17E0A0D8FC12851C73A2F5BDBBD12067</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T04:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Information extraction</term>
					<term>Entity Linking</term>
					<term>Ontologies</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Over the past years, state-of-the-art information extraction (IE) systems such as NELL <ref type="bibr" target="#b4">[5]</ref> and ReVerb [9]  have achieved impressive results by producing very large knowledge resources at web scale with minimal supervision. However, these resources lack the schema information, exhibit a high degree of ambiguity, and are difficult even for humans to interpret. Working with such resources becomes easier if there is a structured information base to which the resources can be linked. In this paper, we introduce the integration of open information extraction projects with Wikipedia-based IE projects that maintain a logical schema, as an important challenge for the NLP, semantic web, and machine learning communities. We describe the problem, present a gold-standard benchmark, and take the first steps towards a data-driven solution to the problem. This is especially promising, since NELL and ReVerb typically achieve a very large coverage, but still still lack a fullfledged clean ontological structure which, on the other hand, could be provided by large-scale ontologies like DBpedia [2] or YAGO <ref type="bibr" target="#b12">[13]</ref>.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Research on information extraction (IE) systems has experienced a strong momentum in recent years. While Wikipedia-based information extraction projects such as DBpedia <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b16">17]</ref> and YAGO <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b12">13]</ref> have been in development for several years, systems such as NELL <ref type="bibr" target="#b4">[5]</ref> and ReVerb <ref type="bibr" target="#b8">[9]</ref> that work on very large and unstructured text corpora have more recently achieved impressive results. The developers of the latter systems have coined the term open information extraction (OIE), to describe information extraction systems that are not constrained by the boundaries of encyclopedic knowledge and the corresponding fixed schemata that are, for instance, used by YAGO and DBpedia. The data maintained by OIE systems is important for analyzing, reasoning about, and discovering novel facts on the web and has the potential to result in a new generation of web search engines <ref type="bibr" target="#b6">[7]</ref>. At the same time, the data of open IE projects would benefit from a corresponding logical schema even if it was incomplete and light-weight in nature. Hence, we believe that the problem of integrating open and schema-driven information extraction projects is a key scientific challenge. In order to integrate existing IE projects we have to overcome a difficult problem of linking different manifestations of the same real world object, or more commonly the task of entity resolution. The fact that makes this task challenging is that triples from such systems are underspecified and ambiguous. 
Let us illustrate this point with an example triple from Nell in which two terms (subject and object) are linked by some relationship (predicate):</p><p>agentcollaborateswithagent(royals, mlb) In this triple, royals and mlb are two terms that are linked by the relation agentcollaborateswithagent. Interpreting these terms is difficult since they can have several meanings, including very infrequent and highly specialized ones, which are sometimes difficult to interpret even for humans. Here, royals refers to the baseball team Kansas City Royals and mlb to Major League Baseball.</p><p>In general, because information on the Web is highly heterogeneous, there can be a fair amount of ambiguity in the extracted facts. The problem becomes even more obvious when we encounter triples like: bankbankincountry(royal, ireland) Here, royal refers to a different real-world entity, namely the Royal Bank of Scotland. Hence, it is important to uniquely identify the terms in accordance with the contextual information provided by the entire triple. In this paper, we aim at aligning such polysemous terms from open IE systems to instances from a closed IE system, focusing on NELL and DBpedia in particular.</p><p>The remainder of the paper is organized as follows: in Section 2 we introduce the information extraction projects relevant to our work. We present our baseline algorithm for finding the best matching candidates for a term in Section 3, and in Section 4 we introduce a gold standard for evaluating its performance. In Section 5 we report performance results of the proposed approach. In Section 6 we discuss related work on information extraction and entity linking. Finally, we conclude the paper in Section 7.</p></div>
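The linking target sketched above can be made concrete in a few lines of Python. This is illustrative only: the lookup table is hand-made from the examples in this section (with URIs following the DBpedia naming scheme), and is a stand-in for the actual matching method, not an implementation of it.

```python
# Illustrative, hand-made context-dependent links from the examples above;
# a stand-in for the actual disambiguation method, not the method itself.

DBPEDIA = "http://dbpedia.org/resource/"

links = {
    ("royals", "agentcollaborateswithagent"): DBPEDIA + "Kansas_City_Royals",
    ("mlb", "agentcollaborateswithagent"): DBPEDIA + "Major_League_Baseball",
    ("royal", "bankbankincountry"): DBPEDIA + "The_Royal_Bank_of_Scotland",
}

def link(term, predicate):
    """Resolve a Nell term to a DBpedia URI, given the triple's predicate."""
    return links.get((term, predicate))

print(link("royals", "agentcollaborateswithagent"))
```

Note how the predicate supplies the context that distinguishes royals (the baseball team) from royal (the bank).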
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Information Extraction Projects: A Brief Overview</head><p>The Never Ending Language Learning <ref type="bibr" target="#b4">[5]</ref> (Nell) project's objective is the creation and maintenance of a large-scale machine learning system that continuously learns and extracts structured information from unstructured web pages. Its extraction algorithms operate on a large corpus of more than 500 million web pages <ref type="foot" target="#foot_0">1</ref> and not solely on the set of Wikipedia articles. The NELL system was bootstrapped with a small set of classes and relations and, for each of those, 10-15 positive and negative instances. The guiding principle of NELL is to build several semi-supervised machine learning <ref type="bibr" target="#b5">[6]</ref> components that accumulate instances of the classes and relations, re-train the machine learning algorithms with these instances as training data, and re-apply the algorithms to extract novel instances. This process is repeated indefinitely with each re-training and extraction phase called an iteration. Since numerous extraction components work in parallel and extract facts with different degrees of confidence in their correctness, one of the most important aspects of Nell is its ability to combine these different extraction algorithms into one coherent model. This is also accomplished with relatively simple linear machine learning algorithms that weigh the different components based on their past accuracy.</p><p>Nell has been running since 2010, initially fully automated and without any human supervision. Since it has experienced concepts drift for some of its relations and classes, that is, an increasingly worse extraction performance over time, Nell now is given some corrections by humans to avoid this long-term behavior. 
Nell does not adhere to any of the semantic web standards such as RDF or description logic.</p><p>DBpedia <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b16">17]</ref> is a project that aims at automatically acquiring large amounts of structured information from Wikipedia. It extracts information from infobox templates, categories, geo-coordinates, etc. However, it does not learn relations from the Wikipedia categories. This template information is mapped to an ontology. In addition, it has a fixed set of classes and relations. Moreover, with more than 1,000 different relations, the ontology is much broader than other existing ontologies like YAGO <ref type="bibr" target="#b24">[25]</ref> or semantic lexicons like BabelNet <ref type="bibr" target="#b18">[19]</ref>.</p><p>DBpedia represents its data in accordance with the best practices of publishing linked open data. The term linked data describes an assortment of best practices for publishing, sharing, and connecting structured data and knowledge over the web <ref type="bibr" target="#b1">[2]</ref>. DBpedia's relations are modeled using the resource description framework (RDF), a generic graph-based data model for describing objects and their relationships. The entities in DBpedia have unique URIs. This makes it appropriate as our reference knowledge base to which we can link the terms from Nell. In the case of the examples from Section 1, by linking the terms appropriately to DBpedia, we are able to attach an unambiguous identifier to them which was initially missing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Methodology</head><p>Wikipedia is an exhaustive source of unstructured data which has been extensively used to enrich machines with knowledge <ref type="bibr" target="#b14">[15]</ref>. In this work we use Wikipedia as an entity-tagged corpus <ref type="bibr" target="#b3">[4]</ref> in order to bridge knowledge encoded in Nell with DBpedia. Since there is a corresponding DBpedia entity for each Wikipedia article <ref type="bibr" target="#b1">[2]</ref>, we can in fact formulate our disambiguation problem as that of linking entities mentioned within Nell triples to their respective Wikipedia articles. Our problem is that, due to polysemy, often a term from Nell can refer to several different articles in Wikipedia or, analogously, instances in DBpedia. For instance, the term jaguar can refer to several articles such as the car, the animal and so on.</p><p>anchor In this work we accordingly explore the idea of using Wikipedia to find out the most probable article for a given term. Wikipedia provides regular data dumps and there are offthe-shelf preprocessing tools to parse those dumps. We used WikiPrep <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b9">10]</ref> for our purpose. WikiPrep removes redundant information from the original dumps and creates more relevant XML dumps with additional information like the number of pages in each category, incoming links to each Wikipedia article and their anchor text, and a lot more<ref type="foot" target="#foot_2">2</ref> . In our work, we are primarily interested in the link counts, namely the frequency of anchor text labels pointing to the same Wikipedia page. Table <ref type="table" target="#tab_0">1</ref> shows some of the articles the anchors jaguar or lincoln are referring to. Intuitively, out of all the outgoing links from the anchor term jaguar, 1842 links pointed to the article Jaguar Cars and so on. 
Essentially, these anchors are analogous to the NELL terms. Based on these counts, we create a ranked list of articles for a given anchor <ref type="foot" target="#foot_3">3</ref> .</p><p>As seen in Table <ref type="table" target="#tab_0">1</ref>, the output from WikiPrep can often be a long list of anchor-article pairs, some of which have a link count as low as one. Accordingly, we adopt a probabilistic approach to selecting the best possible DBpedia instance. For any given anchor in Wikipedia, the fraction of its links pointing to a particular article is proportional to the probability that the anchor term refers to that article <ref type="bibr" target="#b22">[23]</ref>. More formally, suppose some anchor e refers to N articles A_1, ..., A_N with respective link counts n_1, ..., n_N; then the conditional probability of e referring to A_j is given by P(A_j | e) = n_j / Σ_{i=1}^{N} n_i. We compute these probabilities for each term we are interested in and select the top-k candidates from the list ranked by descending P(A_j | e). The choice of k is described in Section 4. We apply this idea to the Nell data set. For each Nell triple, we take the terms occurring as subject and object, and apply the procedure above.</p></div>
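The candidate-ranking step can be sketched as follows. This is a minimal sketch assuming anchor link counts have already been extracted from the WikiPrep dumps; the function name is ours, and the counts are the truncated jaguar figures from Table 1.

```python
# Minimal sketch of the candidate-ranking step; anchor link counts are
# assumed to come from the WikiPrep dumps (truncated Table 1 numbers below).

def topk_candidates(anchor_counts, term, k=3):
    """Rank candidate articles for a term by P(A_j | e) = n_j / sum_i n_i,
    the fraction of the anchor's outgoing links pointing to each article."""
    counts = anchor_counts.get(term, {})
    total = sum(counts.values())
    if total == 0:
        return []  # no mapping candidate could be generated for this term
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(article, n / total) for article, n in ranked[:k]]

anchor_counts = {"jaguar": {"Jaguar Cars": 1842, "Jaguar Racing": 440, "Jaguar": 414}}
print(topk_candidates(anchor_counts, "jaguar"))
```

The empty-list case mirrors the situation noted in Section 4, where no mapping candidate can be generated for a Nell instance.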
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Creating a Gold Standard</head><p>Nell provides regular data dumps<ref type="foot" target="#foot_4">4</ref> consisting of facts learned from the Web. Based on this data we create a frequency distribution over the predicates. To this end, we first clean up the data from the dumps (since these contain additional information, such as, for instance, iteration of promotion, best literal strings, and so on<ref type="foot" target="#foot_5">5</ref> , which are irrelevant to our task). In Table <ref type="table" target="#tab_1">2</ref>, we list the 30 most frequent predicates. Since the gold standard should not be biased towards predicates with many assertions we randomly sampled 12 predicates from the set of predicates with at least 100 assertions (highlighted in bold in the table). In this paper, we focus on this smaller set of predicates due to the time consuming nature of the manual annotations we needed to perform. However, we plan to continuously extend the gold standard with additional predicates in the future.</p><p>For each Nell predicate we randomly sampled 100 triples. We assigned each predicate and the corresponding list of triples to an annotator. Since we wanted to annotate a large number of triples within an acceptable time frame, we first applied the method described in Section 3 to generate possible mapping candidates for the Nell subject and object of each triple. In particular, we generated the top-3 mappings, thereby avoiding generation of too many possible candidates, and presented those candidates to the annotator. Note that in some cases (see Table <ref type="table" target="#tab_3">3</ref>), our method could not determine a possible mapping candidate for a Nell instance. In this case, the triple had to be annotated without presenting a matching candidate for subject or object or both. 
In our setting, each annotation instance falls under one of the following three cases:</p><p>(i) One of the mapping candidates is chosen as the correct mapping, i.e., the simplest case. (ii) The correct mapping is not among the presented candidates (or no candidates have been generated); however, the annotator can find the correct mapping after a combined search in DBpedia, Wikipedia, or other resources available on the Web. (iii) The annotator cannot determine a DBpedia entity to which the given Nell instance should be mapped. This was the case when the term was too ambiguous, underspecified, or not represented in DBpedia. In this case the annotator marked the instance as unmatchable ('?').</p><p>Table <ref type="table" target="#tab_3">3</ref> shows four possible annotation outcomes for the bookwriter predicate. The first example illustrates cases (i) and (ii). The second example illustrates cases (i) and (iii). With respect to this example the annotator could not determine the reference for the Nell term gospel. The third example illustrates a special case of (ii) where no mapping candidate has been generated for the Nell term patricia a mckillip. The fourth example shows that the top match generated by our algorithm is not always the correct mapping; the correct mapping might instead be among the other alternatives that have been generated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Evaluation Measures</head><p>In the following, we briefly re-visit the definitions of precision and recall and explain their application in our evaluation scenario. Let A refer to the mappings generated by our algorithm, and G refer to mappings in the gold standard. Precision is defined as prec(A, G) = |A ∩ G|/|A| and recall as rec(A, G) = |A ∩ G|/|G|. The F 1 measure is the equally weighted harmonic mean of both values, i.e.,</p><formula xml:id="formula_0">F 1 (A, G) = 2 * prec(A, G) * rec(A, G)/(prec(A, G) + rec(A, G)).</formula><p>If an annotator assigned a question mark, then the corresponding Nell term could not be mapped and it does not appear in the gold standard G. This can again be seen in Table <ref type="table" target="#tab_3">3</ref>, where we present the mappings generated by  our algorithm for four triples, as well as the corresponding gold-standard annotations. If the mapping A consists of top-k possible candidates, computing precision and recall on the examples, we have the precision value for k = 1 as prec@1 = 4/7 ≈ 57% and rec@1 = 4/7 ≈ 57%. Note that precision and recall are not the same in general, because |A| = |G| in most cases. More generally, we are interested in prec@k, the fraction of top-k candidates that are correctly mapped and rec@k, the fraction of correct mappings that are in the top-k candidates.</p><p>For k = 3, we have prec@3 = 5/17 ≈ 29% and rec@3 = 5/7 ≈ 71%.</p><p>It can be expected that prec@1 will have the highest score and rec@1 will have the lowest score. When we analyze A with k &gt; 1, we focus mainly on the increase in recall. 
In particular, we are interested in the value of k beyond which the number of additional correct mappings gained by moving from k to k + 1 is negligibly small.</p><p>When generating the gold standard, we realized that finding the correct mappings is often a hard task, sometimes difficult even for a human annotator. We also observed that the difficulty of determining the gold standard varies strongly across the properties we analyzed. For some of the properties we could match all (or nearly all) subjects and objects in the chosen triples, while for other properties up to 15% of the instances could not be matched. Table <ref type="table" target="#tab_5">4</ref> presents the percentage of entities that could not be matched by the annotators, together with the main reason the annotators provided when they could not find a corresponding entity in DBpedia. A typical example of a problematic triple from the agentcollaborateswithagent property is agentcollaborateswithagent(world, greg)</p><p>In this case, the mapping for subject and object was annotated with a question mark. We also observed cases in which an uncommon description was chosen that had no counterpart in DBpedia. Some examples from the predicate animalistypeofanimal are the labels furbearers or small mammals. Fig. <ref type="figure">1</ref>. prec@1 and rec@1 of our proposed method.</p></div>
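The prec@k and rec@k measures defined above can be computed as in the following sketch. The candidate and gold-standard data below are hypothetical and only serve to exercise the definitions; they are not taken from our actual gold standard.

```python
# Sketch of prec@k / rec@k as defined above. `candidates` maps each Nell term
# to its ranked candidate list; `gold` maps each matchable term to its correct
# entity (terms annotated '?' are simply absent from `gold`). The sample data
# is hypothetical.

def prec_rec_at_k(candidates, gold, k):
    emitted = sum(min(k, len(c)) for c in candidates.values())
    correct = sum(1 for term, entity in gold.items()
                  if entity in candidates.get(term, [])[:k])
    prec = correct / emitted if emitted else 0.0
    rec = correct / len(gold) if gold else 0.0
    return prec, rec

candidates = {"odyssey": ["Odyssey", "Honda Odyssey"],
              "homer": ["Homer Simpson", "Homer", "Homer, Alaska"]}
gold = {"odyssey": "Odyssey", "homer": "Homer"}
print(prec_rec_at_k(candidates, gold, 1))  # the correct "Homer" is only at rank 2
```

As in the paper's worked example, increasing k raises rec@k (more correct mappings fall inside the top-k window) while prec@k tends to drop, since more candidates are emitted per term.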
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Results and Discussion</head><p>We run our algorithm against the gold standard<ref type="foot" target="#foot_6">6</ref> , and report the precision and recall values. In Figure <ref type="figure">1</ref>, we show the precision and recall values obtained on the set of Nell predicates. These values are for top-1 matches. Precision and recall vary across the predicates with lakeinstate having the highest precision. Using micro-average method, for the top-1 matches we achieved a precision of 82.78% and an average recall of 81.31% across all the predicates. In the case of macro-averaging, instead, we achieved precision of 82.61% and recall of 81.42%.</p><p>In Figure <ref type="figure">2</ref>, we show the values for rec@2, rec@5 and rec@10 compared to rec@1, the recall values reported in Figure <ref type="figure">1</ref>. By considering more possible candidates with increasing k, every term gets a better chance of being matched correctly, thus explaining the increases in rec@k with k. However, it must be noted, that for most of the predicates the values tend to saturate after rec@5. This reflects that after a certain k any further increase in k does not alter the correct mappings, since our algorithm already provided a match within top-1 or top-2 candidates. Still, for some we observe an increase even at rec@10 because there can be still a possibility of one correct matching candidate lying at a much lower rank in the top-k list of candidates.</p><p>In Figure <ref type="figure" target="#fig_2">3</ref>, we plot the micro-average values of the precision, recall and F 1 scores over varying k. We attain the best F 1 score of 0.82 for k = 1 and the recall values tend to saturate after k = 5.</p><p>This raises an important question regarding the upper bound of recall of our algorithm. 
In practice, we cannot achieve a recall of 1.0 because we are limited by factors such as:</p><p>the matching candidate never being referred to by the term. For example, gs refers to the company Goldman Sachs, but it never appeared among the possible candidates, since Goldman Sachs is never referred to as gs in Wikipedia. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>NELL Predicates</head><p>rec@1 rec@2 rec@5 rec@10 Fig. <ref type="figure">2</ref>. Comparison of rec@1 against rec@k.</p><p>persons being often referred to by the combination of their middle and last name. For e.g. hussein obama. It is actually talking about President Barack Obama, but with our approach we cannot find a good match. misspelled words. We have entities like missle instead of missile.</p><p>However, there are ways to further improve the recall of our method like, for instance, by means of string similarity techniques -e.g., Levenshtein edit distance. A similarity threshold (say, as high as 95%) could then be tuned to consider entities which only partially match a given term. Another alternative would be to look for sub-string matches for the terms with middle and last names of persons. For instance, hussein obama can have a possible match if terms like barrack hussein obama has a candidate match. In addition, a similarity threshold can be introduced in order to avoid matching by arbitrary longer terms.  In general, thanks to the annotation task and our experiments we were able to acquire some useful insights about the data set and the proposed task.</p><p>-Predicates with polysemous entities, like companyalso knownas, usually have lower precision. The triples for this predicate had a wide usage of abbreviated terms (the stock exchange codes for the companies) and that accounts for a lower precision value. -The Nell data is skewed towards a particular region or type. The triples involving persons and sports primarily refer to basketball or baseball. Similarly, for lakeinstate, nearly all the triples refer to lakes in United States.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Related Work</head><p>Key contributions in information extraction have concentrated on minimizing the amount of human supervision required in the knowledge harvesting process.</p><p>To this end, much work has explored unsupervised bootstrapping for a variety of tasks, including the acquisition of binary relations <ref type="bibr" target="#b2">[3]</ref>, facts <ref type="bibr" target="#b7">[8]</ref>, semantic class attributes and instances <ref type="bibr" target="#b19">[20]</ref>. Open Information Extraction further focused on approaches that do not need any manually-labeled data <ref type="bibr" target="#b8">[9]</ref>, however, the output of these systems still needs to be disambiguated by linking it to entities and relations from a knowledge base. Recent work has extensively explored the usage of distant supervision for IE, namely by harvesting sentences containing concepts whose relation is known and leveraging these sentences as training data for supervised extractors <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b13">14]</ref>. Talking of integration of open and closed IE projects, it is worthwhile to mention the work of <ref type="bibr" target="#b20">[21]</ref> where matrix factorization technique was employed for extracting relations across different domains. They proposed an universal schema which supports cross domain integration. There has been some work on instance matching in the recent past. Researchers have transformed the task into a binary classification problem and solved it with machine learning techniques <ref type="bibr" target="#b21">[22]</ref>. Some have tried to enrich unstructured data in form of text with Wikipedia entities <ref type="bibr" target="#b17">[18]</ref>. However, in our approach we consider the context of the entities while creating the gold standard which makes it bit different from these above mentioned entity linking approaches. 
Also, there are tools like Tìpalo <ref type="bibr" target="#b11">[12]</ref> for the automatic typing of DBpedia entities. They use language definitions from Wikipedia abstracts and use WordNet in the background for disambiguation. PARIS <ref type="bibr" target="#b23">[24]</ref> takes a probabilistic approach to align ontologies and utilizes the interdependence of instances and schema to compute probabilities for the instance matches. Lin et al. <ref type="bibr" target="#b15">[16]</ref> provide a novel approach to link entities across millions of documents. They take web-extracted facts and link the entities to Wikipedia by means of information from Wikipedia itself, as well as additional features like string similarity and, most importantly, context information of the extracted facts. The Silk framework <ref type="bibr" target="#b25">[26]</ref> discovers missing links between entities across linked data sources by employing similarity metrics between pairs of instances.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions</head><p>In this paper, we introduced a most-frequent-entity baseline algorithm in order to link entities from an open domain system to a closed one. We introduced a gold standard for this task and compared our baseline against it. In the near future, we plan to extend this work with more complex and robust methods, as well as extending our methodology to cover other open IE projects like ReVerb.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Recall 0 .</head><label>0</label><figDesc>0 0.2 0.4 0.6 0.8 1.0 a ct o rs ta rr e d in m o vi e a g e n tc o lla b o ra te sw ith a g e n t a n im a lis ty p e o fa n im a l a th le te le d sp o rt st e a m b a n kb a n ki n co u n tr y ci ty lo ca te d in st a te b o o kw ri te r co m p a n ya sl o kn o w n a s p e rs o n le a d so rg a n iz a tio n te a m p la ys a g a in st te a m w e a p o n m a d e in co u n tr y la ke in st a te</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig.3. Micro-average prec@k, rec@k and F1.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Snippet of those articles linked to using the anchor jaguar and lincoln.</figDesc><table><row><cell></cell><cell>Article</cell><cell>Link count</cell></row><row><cell>jaguar</cell><cell>Jaguar Cars</cell><cell>1842</cell></row><row><cell cols="2">jaguar Jaguar Racing</cell><cell>440</cell></row><row><cell>jaguar</cell><cell>Jaguar</cell><cell>414</cell></row><row><cell>. . .</cell><cell>. . .</cell><cell>. . .</cell></row><row><cell cols="2">lincoln Lincoln, England</cell><cell>1844</cell></row><row><cell cols="2">lincoln Lincoln, Nebraska</cell><cell>920</cell></row><row><cell cols="2">lincoln Lincoln (2012 film)</cell><cell>496</cell></row><row><cell>. . .</cell><cell>. . .</cell><cell>. . .</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>The 30 most frequent predicates found in Nell. The set of predicates we randomly sampled for the gold standard is in bold.</figDesc><table><row><cell>Top predicates</cell><cell>Instances</cell><cell>Random predicates</cell><cell>Instances</cell></row><row><cell>generalizations</cell><cell>1297709</cell><cell>personleadsorganization</cell><cell>716</cell></row><row><cell>proxyfor</cell><cell>5540</cell><cell>countrylocatedingeopoliticallocation</cell><cell>632</cell></row><row><cell>agentcreated</cell><cell>4354</cell><cell>actorstarredinmovie</cell><cell>537</cell></row><row><cell>subpartof</cell><cell>3262</cell><cell>athleteledsportsteam</cell><cell>294</cell></row><row><cell>atlocation</cell><cell>2877</cell><cell>personbornincity</cell><cell>285</cell></row><row><cell>mutualproxyfor</cell><cell>2803</cell><cell>bankbankincountry</cell><cell>246</cell></row><row><cell>locationlocatedwithinlocation</cell><cell>2159</cell><cell>weaponmadeincountry</cell><cell>188</cell></row><row><cell>athleteplayssport</cell><cell>2076</cell><cell>athletebeatathlete</cell><cell>148</cell></row><row><cell>citylocatedinstate</cell><cell>2010</cell><cell>companyalsoknownas</cell><cell>107</cell></row><row><cell>professionistypeofprofession</cell><cell>1936</cell><cell>lakeinstate</cell><cell>105</cell></row><row><cell>subpartoforganization</cell><cell>1874</cell><cell></cell><cell></cell></row><row><cell>bookwriter</cell><cell>1809</cell><cell></cell><cell></cell></row><row><cell>furniturefoundinroom</cell><cell>1674</cell><cell></cell><cell></cell></row><row><cell>agentcollaborateswithagent</cell><cell>1541</cell><cell></cell><cell></cell></row><row><cell>animalistypeofanimal</cell><cell>1540</cell><cell></cell><cell></cell></row><row><cell>agentactsinlocation</cell><cell>1490</cell><cell></cell><cell></cell></row><row><cell>teamplaysagainstteam</cell><cell>1448</cell><cell></cell><cell></cell></row><row><cell>athleteplaysinleague</cell><cell>1390</cell><cell></cell><cell></cell></row><row><cell>worksfor</cell><cell>1303</cell><cell></cell><cell></cell></row><row><cell>chemicalistypeofchemical</cell><cell>1303</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 .</head><label>3</label><figDesc>Four annotation examples of the bookwriter predicate (we have removed the URI prefix http://dbpedia.org/resource/ for better readability).</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4 .</head><label>4</label><figDesc>The percentage of entities per predicate that could not be matched by a human annotator.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://lemurproject.org/clueweb09/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1">royals ⇒ http://dbpedia.org/resource/Kansas_City_Royals royal ⇒ http://dbpedia.org/resource/The_Royal_Bank_of_Scotland</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2">http://www.cs.technion.ac.il/~gabr/resources/code/wikiprep/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3">Note that, while there are alternative data sets such as the Crosswiki data<ref type="bibr" target="#b22">[23]</ref>, in this work we opted instead for exploiting only Wikipedia internal-link anchors since we expect them to provide a cleaner source of data.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_4">http://rtw.ml.cmu.edu/rtw/resources</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_5">http://rtw.ml.cmu.edu/rtw/faq</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_6">The data are freely available at http://web.informatik.uni-mannheim.de/data/ nell-dbpedia/NellGoldStandard.tar.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">DBpedia: A nucleus for a web of open data</title>
		<author>
			<persName><forename type="first">Sören</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Georgi</forename><surname>Kobilarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jens</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zachary</forename><surname>Ives</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of 6th International Semantic Web Conference joint with 2nd Asian Semantic Web Conference (ISWC+ASWC 2007)</title>
				<meeting>6th International Semantic Web Conference joint with 2nd Asian Semantic Web Conference (ISWC+ASWC 2007)</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="722" to="735" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">DBpedia -A crystallization point for the web of data</title>
		<author>
			<persName><forename type="first">Christian</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jens</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Georgi</forename><surname>Kobilarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sören</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Hellmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="154" to="165" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Extracting patterns and relations from the world wide web</title>
		<author>
			<persName><forename type="first">Sergey</forename><surname>Brin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Selected papers from the International Workshop on The World Wide Web and Databases, WebDB &apos;98</title>
				<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="172" to="183" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Using encyclopedic knowledge for named entity disambiguation</title>
		<author>
			<persName><forename type="first">Razvan</forename><surname>Bunescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marius</forename><surname>Paşca</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of EACL-06</title>
				<meeting>of EACL-06</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="9" to="16" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Toward an architecture for never-ending language learning</title>
		<author>
			<persName><forename type="first">Andrew</forename><surname>Carlson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Justin</forename><surname>Betteridge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bryan</forename><surname>Kisiel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Burr</forename><surname>Settles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Estevam</forename><forename type="middle">R</forename><surname>Hruschka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tom</forename><forename type="middle">M</forename><surname>Mitchell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of AAAI-10</title>
				<meeting>of AAAI-10</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1306" to="1313" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Semi-Supervised Learning</title>
		<author>
			<persName><forename type="first">Olivier</forename><surname>Chapelle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bernhard</forename><surname>Schölkopf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexander</forename><surname>Zien</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
	<note>1st edition</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Search needs a shake-up</title>
		<author>
			<persName><forename type="first">Oren</forename><surname>Etzioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">476</biblScope>
			<biblScope unit="issue">7358</biblScope>
			<biblScope unit="page" from="25" to="26" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Web-scale information extraction in KnowItAll: (preliminary results)</title>
		<author>
			<persName><forename type="first">Oren</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Cafarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Doug</forename><surname>Downey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stanley</forename><surname>Kok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ana-Maria</forename><surname>Popescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tal</forename><surname>Shaked</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stephen</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daniel</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexander</forename><surname>Yates</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW &apos;04</title>
				<meeting>of WWW &apos;04</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="100" to="110" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Identifying relations for open information extraction</title>
		<author>
			<persName><forename type="first">Anthony</forename><surname>Fader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stephen</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oren</forename><surname>Etzioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of EMNLP-11</title>
				<meeting>of EMNLP-11</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1535" to="1545" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Overcoming the brittleness bottleneck using wikipedia: Enhancing text categorization with encyclopedic knowledge</title>
		<author>
			<persName><forename type="first">Evgeniy</forename><surname>Gabrilovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shaul</forename><surname>Markovitch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of AAAI-06</title>
				<meeting>of AAAI-06</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="1301" to="1306" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Computing semantic relatedness using wikipedia-based explicit semantic analysis</title>
		<author>
			<persName><forename type="first">Evgeniy</forename><surname>Gabrilovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shaul</forename><surname>Markovitch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of IJCAI-07</title>
				<meeting>of IJCAI-07</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="1606" to="1611" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Automatic typing of DBpedia entities</title>
		<author>
			<persName><forename type="first">Aldo</forename><surname>Gangemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrea</forename><forename type="middle">Giovanni</forename><surname>Nuzzolese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valentina</forename><surname>Presutti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Draicchio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alberto</forename><surname>Musetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Ciancarini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web -ISWC 2012</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Berlin and Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">7649</biblScope>
			<biblScope unit="page" from="65" to="81" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia</title>
		<author>
			<persName><forename type="first">Johannes</forename><surname>Hoffart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fabian</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Klaus</forename><surname>Berberich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerhard</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">194</biblScope>
			<biblScope unit="page" from="28" to="61" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Learning 5000 relational extractors</title>
		<author>
			<persName><forename type="first">Raphael</forename><surname>Hoffmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Congle</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daniel</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ACL-10</title>
				<meeting>of ACL-10</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="286" to="295" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Collaboratively built semi-structured content and Artificial Intelligence: The story so far</title>
		<author>
			<persName><forename type="first">Eduard</forename><surname>Hovy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simone</forename><forename type="middle">Paolo</forename><surname>Ponzetto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">194</biblScope>
			<biblScope unit="page" from="2" to="27" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Entity linking at web scale</title>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mausam</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Oren</forename><surname>Etzioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of AKBC-WEKEX &apos;12</title>
				<meeting>of AKBC-WEKEX &apos;12</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="84" to="88" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">DBpedia: A multilingual cross-domain knowledge base</title>
		<author>
			<persName><forename type="first">Pablo</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Max</forename><surname>Jakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of LREC-12</title>
				<meeting>of LREC-12</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Learning to link with wikipedia</title>
		<author>
			<persName><forename type="first">David</forename><surname>Milne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ian</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of CIKM &apos;08</title>
				<meeting>of CIKM &apos;08</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="509" to="518" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</title>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simone</forename><forename type="middle">Paolo</forename><surname>Ponzetto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">193</biblScope>
			<biblScope unit="page" from="217" to="250" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Organizing and searching the world wide web of facts -step two: harnessing the wisdom of the crowds</title>
		<author>
			<persName><forename type="first">Marius</forename><surname>Paşca</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW &apos;07</title>
				<meeting>of WWW &apos;07</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="101" to="110" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Relation extraction with matrix factorization and universal schemas</title>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Limin</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benjamin</forename><forename type="middle">M</forename><surname>Marlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><surname>McCallum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Joint Human Language Technology Conference/Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL &apos;13)</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A machine learning approach for instance matching based on similarity metrics</title>
		<author>
			<persName><forename type="first">Shu</forename><surname>Rong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xing</forename><surname>Niu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Evanwei</forename><surname>Xiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Haofen</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Qiang</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yong</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web -ISWC 2012</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Berlin and Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">7649</biblScope>
			<biblScope unit="page" from="460" to="475" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">A cross-lingual dictionary for english wikipedia concepts</title>
		<author>
			<persName><forename type="first">Valentin</forename><forename type="middle">I</forename><surname>Spitkovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Angel</forename><forename type="middle">X</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc of LREC-12</title>
				<meeting>of LREC-12</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="3168" to="3175" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">PARIS: Probabilistic alignment of relations, instances, and schema</title>
		<author>
			<persName><forename type="first">Fabian</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Serge</forename><surname>Abiteboul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pierre</forename><surname>Senellart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. VLDB Endow</title>
				<meeting>VLDB Endow</meeting>
		<imprint>
			<date type="published" when="2011-11">November 2011</date>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="157" to="168" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Yago: A Core of Semantic Knowledge</title>
		<author>
			<persName><forename type="first">Fabian</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gjergji</forename><surname>Kasneci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerhard</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">16th international World Wide Web conference (WWW 2007)</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Silk -A Link Discovery Framework for the Web of Data</title>
		<author>
			<persName><forename type="first">Julius</forename><surname>Volz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Gaedke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Georgi</forename><surname>Kobilarov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of LDOW &apos;09</title>
				<meeting>of LDOW &apos;09</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Open information extraction using Wikipedia</title>
		<author>
			<persName><forename type="first">Fei</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daniel</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ACL-10</title>
				<meeting>of ACL-10</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="118" to="127" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
