<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Generating Domain-Specific Knowledge Graphs: Challenges with Open Information Extraction</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nitisha</forename><surname>Jain</surname></persName>
							<email>nitisha.jain@hpi.de</email>
							<affiliation key="aff0">
								<orgName type="institution">HPI -Hasso Plattner Institute</orgName>
								<address>
									<settlement>Potsdam</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alejandro</forename><surname>Sierra-Múnera</surname></persName>
							<email>alejandro.sierra@hpi.de</email>
							<affiliation key="aff0">
								<orgName type="institution">HPI -Hasso Plattner Institute</orgName>
								<address>
									<settlement>Potsdam</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Julius</forename><surname>Streit</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">HPI -Hasso Plattner Institute</orgName>
								<address>
									<settlement>Potsdam</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Simon</forename><surname>Thormeyer</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">HPI -Hasso Plattner Institute</orgName>
								<address>
									<settlement>Potsdam</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Philipp</forename><surname>Schmidt</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">HPI -Hasso Plattner Institute</orgName>
								<address>
									<settlement>Potsdam</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maria</forename><surname>Lomaeva</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">University of Potsdam</orgName>
								<address>
									<settlement>Potsdam</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ralf</forename><surname>Krestel</surname></persName>
							<email>r.krestel@zbw.eu</email>
							<affiliation key="aff2">
								<orgName type="department">ZBW -Leibniz Centre for Economics</orgName>
								<address>
									<settlement>Kiel</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff3">
								<orgName type="institution">Kiel University</orgName>
								<address>
									<settlement>Kiel</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Generating Domain-Specific Knowledge Graphs: Challenges with Open Information Extraction</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7389CF5DB9ED311A50E918C1DCE2732B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:05+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge graphs</term>
					<term>Open information extraction</term>
					<term>Domain-specific texts</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Knowledge Graphs (KGs) are a popular way to structure and represent knowledge in a machine-readable way. While KGs serve as the foundation for many applications, the automatic construction of these KGs from texts is a challenging task where Open Information Extraction techniques are prominently leveraged. In this paper, we focus on generating a domain-specific knowledge graph based on art-historic texts from a digitized text collection. We describe the combined use and adaption of existing open information extraction methods to build an art-historic KG that can facilitate data exploration for domain experts. We discuss the challenges that were faced at each step and present detailed error analysis to identify the limitations of existing methods when working with domain-specific corpora.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Knowledge Graphs (KGs) have gained considerable popularity in both academia and industry. They are employed to represent information in a structured format after extraction from large collections of heterogeneous, diverse, and unstructured documents <ref type="bibr" target="#b0">[1]</ref>. These KGs can then be used for downstream tasks, such as question answering, logical inference, recommendation, or information retrieval. Besides general KGs that aim to capture generic knowledge about real-world data, such as DBpedia <ref type="bibr" target="#b1">[2]</ref> and Wikidata <ref type="bibr" target="#b2">[3]</ref>, domain-specific KGs have become important for targeted domains <ref type="bibr" target="#b3">[4]</ref>. They have been leveraged to support multiple information-based applications, e.g., in the context of health and life sciences <ref type="bibr" target="#b4">[5]</ref>, news search <ref type="bibr" target="#b5">[6]</ref> or fact checking <ref type="bibr" target="#b6">[7]</ref>.</p><p>There have been several efforts towards automatic construction of general purpose knowledge graphs from the Web based on machine learning techniques <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>. In the absence of a prespecified list of relations for performing pattern-based extractions, Open Information Extraction Goals. In this paper, we describe an ongoing project <ref type="foot" target="#foot_0">1</ref> for the automatic construction of a knowledge graph based on a large, private archive of art-historic documents. Instead of relying on existing ontologies to dictate the information extraction process (which might restrict the scope of the entities and relations that could be extracted from the text when the ontology is not hand-crafted for the specific dataset), we decided to pursue the schema-less Open IE approach in this work. 
We present the results from our exploration of existing Open IE techniques to generate structured information and discuss our insights in terms of their shortcomings and limited applicability when deployed for noisy, digitized data in the art domain.</p><p>We make the following contributions in this paper: (i) Construct a domain-specific knowledge graph based on a collection of digitized art-historic documents. (ii) Describe the process of automated construction of the KG with Open IE techniques. (iii) Analyze and discuss the challenges and limitations for the adaptation of Open IE tools to domain-specific datasets.</p><p>1-18</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>With the availability of digitized cultural data, several previous works have proposed KGs for art-related datasets <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b22">23]</ref>. Arco <ref type="bibr" target="#b20">[21]</ref> is a large Italian cultural heritage graph with a pre-defined ontology that was developed in a collaborative fashion with contributions from domain experts all over the country. While the Arco KG is quite broad in its coverage, ArDo <ref type="bibr" target="#b23">[24]</ref> pertains to a very specific use case of multimedia archival records. Similarly, the Linked Stage Graph <ref type="bibr" target="#b24">[25]</ref> was developed as a KG specifically for storing historical data about the Stuttgart State Theater. Increasingly, the principles of linked open data <ref type="foot" target="#foot_1">2</ref> have also been widely adopted within the cultural heritage domain to help researchers, practitioners and general users study and consume cultural objects. Notable examples include the CIDOC-CRM <ref type="bibr" target="#b25">[26]</ref>, the Rijksmuseum collection <ref type="bibr" target="#b26">[27]</ref>, the Zeri Photo Archive<ref type="foot" target="#foot_2">3</ref>, and OpenGLAM <ref type="bibr" target="#b27">[28]</ref>, among many others. Most related to our work is the ArtGraph <ref type="bibr" target="#b21">[22]</ref>, where the authors have integrated the art resources from DBpedia and WikiArt and constructed a KG with a well-defined schema that is centered around artworks and artists. 
While all these works are concerned with KGs and ontologies for specific art-related corpora, they have leveraged a schema for representing the information and are not concerned with the challenges of a schema-free extraction process, which is the main focus of this work.</p><p>Open IE approaches extract triples directly from text, without an explicit ontology or schema behind the extraction process. Several such approaches have been proposed in the past. TextRunner <ref type="bibr" target="#b11">[12]</ref> relies on a self-supervised classifier which determines trustworthy relationships between pairs of entities, while Reverb <ref type="bibr" target="#b10">[11]</ref> uses syntactic and lexical constraints to overcome incoherent and uninformative relationships. ClausIE <ref type="bibr" target="#b13">[14]</ref> relies heavily on dependency parsing to construct clauses from which the propositions are extracted. In this work, we have leveraged the Stanford CoreNLP OpenIE implementation <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b12">13]</ref>, which uses dependency parsing to minimize the phrases of the resulting clauses and was originally evaluated in a slot filling task.</p><p>The construction of domain-specific KGs has been the subject of investigation in previous works for various domains, e.g. software engineering <ref type="bibr" target="#b29">[30]</ref>, academic literature <ref type="bibr" target="#b30">[31]</ref>, and more prominently, the biomedical domain <ref type="bibr" target="#b31">[32,</ref><ref type="bibr" target="#b32">33,</ref><ref type="bibr" target="#b33">34]</ref>. However, the previously proposed automated methods are not directly applicable to the arts and cultural heritage domain, where unique challenges with respect to the heterogeneity and quality of data are prevalent. 
This work identifies and discusses the particular difficulties encountered while applying existing information extraction techniques to art-related corpora.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Automated Construction of Art-historic KG</head><p>In this section, we describe our underlying art-historic dataset as well as the steps employed for the automated extraction of information (in the form of triples) to construct an art-historic knowledge graph. Fig. <ref type="figure" target="#fig_0">1</ref> shows an overview of this process.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset</head><p>For this work, we use a large collection of recently digitized art-historical texts provided by our project partners. This collection consists of a variety of heterogeneous documents including auction catalogs, exhibition catalogs, art books, etc. that contain semi-structured as well as unstructured texts describing artists, artworks, exhibitions and so on. Art historians regularly study these data collections for art-historical analysis. Therefore, a systematic representation of this data in the form of a KG would be a valuable resource for them to explore this data swiftly and efficiently. The whole collection is quite large (≈ 1TB of data). To restrict the size of the dataset for a proof-of-concept of our KG construction process, a subset of this dataset pertaining to information about the artist Picasso was chosen. Choosing an artist-oriented subset of the collection enabled us to better understand the context and evaluate the triples that were obtained throughout the process of KG construction. The data was filtered by querying the document collection using the keyword query 'Picasso', resulting in 224,469 entries (where each entry corresponds to a page of the original digitized corpus) containing the term 'Picasso'. Due to the filtering, each entry is an independent document, in the sense that the neighboring entries do not always represent the correct context. As a result, some of the entries in our dataset contain incomplete sentences at the beginning or the end of a page. One such example is an entry starting with 'to say47-Picasso never belittled his work, until . . . ' where the tokens 'to say' belong to a sentence which started in a different entry that might no longer be a part of the dataset under consideration. 
It is important to note that the same example shows further noise, e.g., numbers are mixed in between words in the digitized version of the text. This noise in the dataset was introduced by the optical character recognition (OCR) process during the digitization of the documents (performed in a prior step by the data providers). In general, the dataset contains full sentences, such as 'Matisse's return to the study of ancient and Renaissance sculpture is significant in itself.', as well as short description phrases, figure captions or footnotes such as 'G. Bloch, Pablo Picasso, Bern, 1972, vol. III, p.142'.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Finding Named Entities</head><p>As a first step, we examined whether the named entities present in the corpus could be easily identified. A dictionary-based approach to finding the named entities would identify the mentions with high precision, but at the cost of very low recall, since it would ignore many potentially interesting entities to be discovered in the corpus. Therefore, we chose to follow a machine learning approach to named entity recognition (NER). Generic NER tools work very well for the common entity types, such as person, location, organization and so on, though fine-grained or domain-specific entities are harder to identify <ref type="bibr" target="#b34">[35]</ref>. We employed the SpaCy library <ref type="foot" target="#foot_3">4</ref> for finding named entities since its pre-trained models include a Work_Of_Art category that could potentially identify the entities that are important in the art domain (this could encompass mentions of paintings, books, statues etc.). Excluding the cardinal entities in order to reduce noise, we used the SpaCy library with the pre-trained 'en_core_web_trf' model to identify the following entity types: Work_Of_Art, Person, Product, ORG, LOC, GPE and NORP. This setup showed reasonably good results. The NER step enabled us to filter out all sentences without any entity mention, since such sentences were likely to have no useful information for the KG construction. Thus, the NER step helped with pruning the dataset for further processing, as well as improving the quality of the resulting KG.</p></div>
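<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sentence-pruning step described above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the NER tagger has already produced (mention, label) pairs for each sentence, and the sample annotations, the function names has_kept_entity and filter_sentences, and the label spellings (taken from the text above) are all assumptions.</p>

```python
# Entity types retained in the text above; CARDINAL is deliberately absent.
KEPT_TYPES = {"Work_Of_Art", "Person", "Product", "ORG", "LOC", "GPE", "NORP"}


def has_kept_entity(entities, kept_types=KEPT_TYPES):
    """Return True if any (mention, label) pair carries a retained label."""
    return any(label in kept_types for _mention, label in entities)


def filter_sentences(tagged_sentences, kept_types=KEPT_TYPES):
    """Keep only sentences with at least one entity of a retained type.

    tagged_sentences: list of (sentence, entities) pairs, where entities is a
    list of (mention, label) tuples as an NER tagger might return them.
    """
    return [s for s, ents in tagged_sentences if has_kept_entity(ents, kept_types)]


# Hypothetical NER output for three snippets; the labels are illustrative.
tagged = [
    ("Matisse's return to the study of ancient and Renaissance sculpture "
     "is significant in itself.",
     [("Matisse", "Person"), ("Renaissance", "NORP")]),
    ("It was shown on page 142.", [("142", "CARDINAL")]),
    ("The evening was quiet.", []),
]

kept = filter_sentences(tagged)
print(len(kept))
```

<p>Only the first snippet survives, because the second contains nothing but a cardinal and the third has no entity mention at all.</p></div>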
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Triple Extraction</head><p>After obtaining informative sentences from the previous step, we employed Open IE tools to extract the triples from them. It is important to note that while there are some art-related ontologies proposed in previous works such as Arco <ref type="bibr" target="#b20">[21]</ref> and ArDo <ref type="bibr" target="#b23">[24]</ref>, none of them are suitable for our corpus since they are very specific to the datasets they were designed for. Other general ontologies such as CIDOC-CRM are, on the other hand, too broad and would not be able to capture novel and interesting facts from a custom and heterogeneous corpus such as ours, where the entities and relations among them are not known beforehand. In the absence of an ontology specifically designed for the description of art-historic catalogs, we chose to employ open information extraction techniques for the construction of our KG in order to broaden the scope and utility of the extracted information.</p><p>To this end, we ran the Stanford CoreNLP OpenIE annotator <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b35">36]</ref> to extract ⟨subject, predicate, object⟩ triples from the sentences. A total of 5,057,488 triples were extracted in this process, where multiple triples could be extracted from a single sentence. Another round of filtering was performed at this stage, where any triples that did not contain a named entity in the subject or object phrase were removed. Additionally, duplicate entries and triples with serial numbers as entities were also removed. Some examples of triples that were removed are: ⟨we, have, good relationship⟩, ⟨i, be, director⟩, ⟨brothel, be in, evening⟩, ⟨drawings, acquired, work⟩. A total of 160,000 triples remained. A valid triple at this stage looked like ⟨P. Picasso, is, artiste⟩.</p></div>
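<div xmlns="http://www.tei-c.org/ns/1.0"><p>The triple filtering rules above (entity required in subject or object, no duplicates, no serial numbers as entities) might be sketched as below. The set of known mentions, the serial-number pattern and the helper names are assumptions made for illustration; in particular, the substring test is naive and would need token-level matching on real data.</p>

```python
import re

# Hypothetical set of entity mentions found by the NER step (lowercased).
ENTITY_MENTIONS = {"p. picasso", "paris", "henri matisse"}

# Assumed pattern for serial-number-like phrases, e.g. "no. 4711" or "12345".
SERIAL_RE = re.compile(r"^(no\.?\s*)?\d+$", re.IGNORECASE)


def mentions_entity(phrase):
    """True if any known entity mention occurs inside the phrase."""
    lowered = phrase.lower()
    return any(mention in lowered for mention in ENTITY_MENTIONS)


def filter_triples(triples):
    """Drop duplicates, serial-number entities and triples without an entity."""
    seen = set()
    kept = []
    for subj, pred, obj in triples:
        key = (subj.lower(), pred.lower(), obj.lower())
        if key in seen:
            continue  # duplicate entry
        seen.add(key)
        if SERIAL_RE.match(subj.strip()) or SERIAL_RE.match(obj.strip()):
            continue  # serial number used as an entity
        if not (mentions_entity(subj) or mentions_entity(obj)):
            continue  # neither argument contains a named entity
        kept.append((subj, pred, obj))
    return kept


triples = [
    ("we", "have", "good relationship"),
    ("i", "be", "director"),
    ("P. Picasso", "is", "artiste"),
    ("P. Picasso", "is", "artiste"),
    ("4711", "be in", "Paris"),
]
result = filter_triples(triples)
print(result)
```

<p>Of the five toy triples, only one copy of the Picasso triple remains: the first two lack a named entity, the fourth is a duplicate and the last has a serial number as its subject.</p></div>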
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Entity Linking</head><p>Once the triples were extracted, the entity linking component of the Stanford CoreNLP pipeline <ref type="bibr" target="#b28">[29]</ref> was used to link the entities. This component uses WikiDict as a resource, and uses the dictionary to match the entity mention text to a specific entity in Wikipedia. Since the entities in our dataset were present in multiple different surface forms, this step allowed us to partially normalize the entities and identify the unique entities. Though the number of entities was reduced as a result, the total number of triples remained the same. Note that this linking could only map entities to their Wikipedia counterpart if the entity was found as a subject or object in a triple. In many cases, though, the subject and object were noun phrases instead of obvious entities, for which this kind of linking did not work. This process was still quite useful, as 108,841 out of 337,100 entities were successfully linked to their Wikipedia form (leading to 8,369 unique entities). Some of the most frequent entities found in the dataset (along with their frequencies) were: (Pablo_Picasso, 11219), (Paris, 2178), (Artist, 1904), (Henri_Matisse, 1769), (Georges_Braque, 1352).</p></div>
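<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the effect of dictionary-based linking concrete, the following sketch maps surface forms to a shared identifier and shows that the number of triples is unchanged while the number of distinct entities shrinks. It does not use CoreNLP or WikiDict; the small SURFACE_FORMS dictionary, the toy triples and the function names are assumptions for illustration.</p>

```python
from collections import Counter

# Hypothetical surface-form dictionary (mention text -> Wikipedia title).
# The real resource named in the text, WikiDict, is far larger.
SURFACE_FORMS = {
    "picasso": "Pablo_Picasso",
    "p. picasso": "Pablo_Picasso",
    "pablo picasso": "Pablo_Picasso",
    "matisse": "Henri_Matisse",
    "paris": "Paris",
}


def link(mention):
    """Return the linked title, or the mention unchanged when no entry exists."""
    return SURFACE_FORMS.get(mention.strip().lower(), mention)


def link_triples(triples):
    """Link subject and object of every triple; predicates are left alone."""
    return [(link(s), p, link(o)) for s, p, o in triples]


triples = [
    ("P. Picasso", "live in", "Paris"),
    ("Picasso", "meet", "Matisse"),
    ("Pablo Picasso", "paint", "large canvas"),
]
linked = link_triples(triples)

entities_before = {e for s, _p, o in triples for e in (s, o)}
entities_after = {e for s, _p, o in linked for e in (s, o)}
frequencies = Counter(e for s, _p, o in linked for e in (s, o))

print(len(triples), len(linked))
print(len(entities_before), len(entities_after))
print(frequencies.most_common(1))
```

<p>The noun phrase 'large canvas' has no dictionary entry and is left untouched, which mirrors the limitation noted above for generic noun phrases.</p></div>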
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Canonicalization</head><p>One of the main challenges when constructing a KG through Open IE techniques is that of canonicalization. Multiple surface forms of the same entity or relation might be observed in the triples extracted with Open IE techniques, in the form of noun phrases or verb phrases that need to be identified and mapped to a single semantic entity or relation in the KG. Since the triples extracted from our dataset via Open IE comprised many noisy phrases, as well as new entities, such as titles of artworks, that may not be available for mapping in existing databases, entity linking techniques would not suffice in this case. Different from entity linking (which can only link entities already present in external KGs), canonicalization is able to perform clustering for the entities and relations that may not be present in existing KGs, by labelling them as OOV (out of vocabulary) instances. In this work, we chose to perform canonicalization with the help of CESI <ref type="bibr" target="#b36">[37]</ref>, which is a popular and openly available approach for this task. The CESI approach performs clustering over the non-canonicalized forms of noun phrases for entities and verb phrases for the relations. It leverages different sources of side information for noun phrases and relation phrases, such as entity linking, word senses and rule-mining systems, for learning embeddings for these phrases using the HolE <ref type="bibr" target="#b37">[38]</ref> knowledge graph embedding technique. The clustering is then performed using hierarchical agglomerative clustering (HAC) based on the cosine similarity of the phrase embeddings in vector space. In this manner, different phrases for the same entity or relation were mapped to one canonicalized form for inclusion in the KG. 
In total, we obtained 3,789 entity clusters and 3,778 relation clusters from the CESI approach that contained two or more terms.</p><p>Representative Selection. An important step in the CESI approach is the assignment of representatives for the clusters obtained for the noun and relation phrases. This is decided by calculating a mean of all the cluster members' embeddings, weighted by their frequency of occurrence. The phrase closest to this mean is selected as the representative. However, this technique did not work well for our domain-specific and noisy dataset, and many undesirable errors were noticed. For example, an entity cluster obtained from CESI was: Olga_Khokhlova, olga, khokhlova, picasso. Since Picasso is the most frequent entity in the dataset, it was chosen as representative by CESI, but this is clearly wrong since Picasso and Olga are different entities. There were several other errors observed, e.g., all days of the week were clustered together in one cluster. This is likely because the contexts, and hence the embeddings, of the days of the week are quite similar, so their vectors end up close together in the vector space. In other cases, the color blue occasionally showed up in a cluster of phrases related to the color red, certain dates got clustered, and certain related but not interchangeable words got clustered (kill vs murder vs shot). In some cases, a first name was replaced by an incorrect full name (not every david is david johnson). To mitigate the errors discussed above, we had to perform manual vetting of the clusters for verification and selection of the correct cluster representatives, which took around 2-3 person-hours. During this process, certain clusters where the entities were different were removed (such as the cluster with days of the week). 
After this, the entities and relations were canonicalized as per their chosen cluster representatives, leading to a total of 35,305 unique entities and 33,448 unique relations in the final KG<ref type="foot" target="#foot_4">5</ref>.</p></div>
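<div xmlns="http://www.tei-c.org/ns/1.0"><p>The representative selection rule described above, and the way a very frequent member can dominate it, can be illustrated with a small sketch. This is not CESI's code: the two-dimensional embeddings and the frequencies are invented, and the use of cosine similarity to measure closeness to the weighted mean is an assumption.</p>

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_representative(cluster):
    """Pick the phrase closest to the frequency-weighted mean embedding.

    cluster: dict mapping phrase -> (embedding, frequency).
    """
    total = sum(freq for _vec, freq in cluster.values())
    dim = len(next(iter(cluster.values()))[0])
    mean = [
        sum(vec[i] * freq for vec, freq in cluster.values()) / total
        for i in range(dim)
    ]
    return max(cluster, key=lambda phrase: cosine(cluster[phrase][0], mean))


# Toy 2-d embeddings and frequencies, invented for illustration only.
cluster = {
    "Olga_Khokhlova": ([0.1, 1.0], 3),
    "olga": ([0.0, 1.0], 5),
    "khokhlova": ([0.05, 1.0], 2),
    "picasso": ([1.0, 0.0], 100),
}
representative = select_representative(cluster)
print(representative)
```

<p>With these toy values the weighted mean is pulled almost entirely towards the frequent member, so 'picasso' is selected even though the other three phrases agree with each other. This is the kind of error that the manual vetting step was needed to correct.</p></div>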
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6.">Entity Typing</head><p>Since a schema or ontology was not employed to extract the triples from text, the entities in our KG do not have any entity types implicitly assigned to them. Therefore, we attempted to identify the types of as many entities in our graph as possible. With the help of NER, we assigned the types to the entities that were recognized in the triples. A total of 14,960 entities were typed with this technique to generic types such as Person, Product, ORG, LOC, GPE, NORP and Work_Of_Art, as well as numeric types such as Date, Time and Ordinal. Note that Work_Of_Art is quite a broad category that includes artworks but also movies, books and various other art forms. Since artworks such as paintings and sculptures are among the most important entities in our art-historic KG, it is worthwhile to identify the mentions and types of these entities. However, a generic NER process is neither equipped nor optimized to correctly identify such mentions. Thus, we additionally applied dictionary-based matching. This was done by compiling a large gazetteer of artwork titles by querying Wikidata with the help of the Wikidata Query Service<ref type="foot" target="#foot_5">6</ref> for the names of paintings and sculptures, retrieving approximately 15,000 artwork titles. In addition, we augmented our dictionary with the names of the artwork entities from the ArtGraph dataset <ref type="bibr" target="#b21">[22]</ref>, which contains more than 60,000 artworks derived from DBpedia and WikiArt. If a match was found for an entity in our KG in the compiled dictionary, the type was assigned as artwork accordingly. This led to the tagging of a further 1,397 entities in our KG as artworks. 
The dictionary-based matching for artworks was particularly useful in the cases where it was able to correctly identify entities that were wrongly assigned the Person type by NER, such as la_donna_gravida, portrait_of_mary_cassatt and st._paul_in_prison. Similar to artworks, we attempted to additionally identify the names of artists in our triples.</p><p>While NER could only tag entities as Person, we used a dictionary of artist names from Wikidata to identify 656 unique artist entities in our data. These included names of artists such as Piet Mondrian, Edvard Munch and Rembrandt. However, the process of entity typing described above is only able to identify and tag around half of the entities in our KG. Several domain- and corpus-specific challenges acted as bottlenecks during this process. For example, even after filtering, some triples extracted from Open IE contained either subject or object noun phrases that were generic and did not correspond to any named entity. Examples of such phrases include essay, anthology, periodical, or album, which are present in triples such as ⟨album, be_shown_in, Paris⟩. Without designing a custom ontology for this corpus, such entities cannot be expected to be typed correctly.</p><p>The categorization of the relations in the KG is a particularly complicated task due to the wide variety of relations extracted from the Open IE process. A few of the most frequent relations in the KG are will, be_in, have, show, paint and work. We hypothesized that the types of the entities could be utilized to find patterns and link the most popular edges in the KG to the relations in existing graphs such as Wikidata or ArtGraph. However, preliminary analysis led to some interesting observations. Firstly, we noted the presence of multiple relations between pairs of entities in the KG. 
For example, Picasso and June are connected by various relations such as will_be, work and take_trip_in that were extracted from different contexts in the corpus and represent separate meaningful facts. Furthermore, in general, there are several different types of semantic relations between the popular entity types in our KG. For instance, two entities of the type artist are connected by several relations including work, meet, know_well, be_with, friend_of and be_admirer_of. While this variety indicates that a large number of interesting facts have been derived by Open IE in the absence of a fixed and limiting schema, normalizing the relations to improve the quality of the KG is a difficult task that is part of the ongoing and future work.</p></div>
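<div xmlns="http://www.tei-c.org/ns/1.0"><p>The gazetteer-based typing of artworks described in this section can be sketched as a lookup that overrides the NER type whenever a normalized entity name appears in the compiled dictionary. The two gazetteer titles, the normalization rule (lowercasing and joining tokens with underscores) and the function names are assumptions for illustration; the actual gazetteer was compiled from Wikidata and ArtGraph.</p>

```python
def normalize(title):
    """Lowercase a title and join its tokens with underscores."""
    return "_".join(title.lower().split())


def build_gazetteer(titles):
    """Build a lookup set of normalized artwork titles."""
    return {normalize(t) for t in titles}


def assign_types(ner_types, artwork_gazetteer):
    """Override the NER type with 'artwork' when the entity is in the gazetteer.

    ner_types: dict mapping entity -> NER type (or None when untyped).
    """
    typed = {}
    for entity, ner_type in ner_types.items():
        if normalize(entity.replace("_", " ")) in artwork_gazetteer:
            typed[entity] = "artwork"
        else:
            typed[entity] = ner_type
    return typed


# Hypothetical gazetteer entries and NER output, for illustration only.
gazetteer = build_gazetteer(["La donna gravida", "Portrait of Mary Cassatt"])
ner_types = {
    "la_donna_gravida": "Person",
    "portrait_of_mary_cassatt": "Person",
    "piet_mondrian": "Person",
    "album": None,
}
typed = assign_types(ner_types, gazetteer)
print(typed)
```

<p>The two artwork titles that NER had labelled as Person are retyped, the artist keeps the Person type, and the generic noun 'album' stays untyped, as in the cases discussed above.</p></div>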
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Art-historic Knowledge Graph</head><p>The statistics of the KG generated from the steps as described in the previous section are shown in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Graph Features</head><p>After obtaining the refined set of triples for the first version of the art-historic KG, we performed a preliminary analysis of the graph to derive useful insights with the help of the NetworkX<ref type="foot" target="#foot_6">7</ref> package. To understand the graph structure, the number of disconnected components of the graph was measured before and after the canonicalization step. The number of disconnected components dropped from around 2,500 to around 1,500 after clustering with CESI. This indicates that canonicalization of entities and relations improved the quality of the knowledge graph by removing unnecessary disconnected parts that were created through redundant triples. Additionally, we analyzed node centrality using eigenvector centrality <ref type="bibr" target="#b38">[39]</ref> and performed link analysis using PageRank <ref type="bibr" target="#b39">[40]</ref>. For both measures, the node for Pablo Picasso was the most central. This confirms the property of the underlying dataset, which is focused on Picasso. Other central nodes corresponded to popular words in the corpus such as work, artist and painting. Overall, it is promising that the centrality analysis of the generated KG agrees well with the main entities and topics of the underlying corpus. A hand-picked example of a subset of the neighborhood of the entity Picasso is shown in Fig. <ref type="figure" target="#fig_1">2</ref>.</p></div>
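<div xmlns="http://www.tei-c.org/ns/1.0"><p>The drop in disconnected components after canonicalization can be reproduced in miniature. The analysis above was done with NetworkX; the sketch below instead uses a small union-find structure in plain Python to stay dependency-free, and the triples, the cluster mapping and the function names are invented for illustration.</p>

```python
def count_components(triples):
    """Count connected components of the undirected graph induced by triples."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for subj, _pred, obj in triples:
        root_s, root_o = find(subj), find(obj)
        if root_s != root_o:
            parent[root_s] = root_o
    return len({find(node) for node in list(parent)})


def canonicalize(triples, mapping):
    """Replace each entity by its cluster representative when one is known."""
    return [(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples]


# Toy triples and cluster mapping, invented for illustration.
triples = [
    ("P. Picasso", "live_in", "Paris"),
    ("Pablo Picasso", "paint", "portrait"),
    ("Matisse", "visit", "Nice"),
]
mapping = {"P. Picasso": "Pablo_Picasso", "Pablo Picasso": "Pablo_Picasso"}

before = count_components(triples)
after = count_components(canonicalize(triples, mapping))
print(before, after)
```

<p>Merging the two surface forms of Picasso joins two previously separate components, so the count falls from three to two in this toy graph.</p></div>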
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Evaluation</head><p>Since evaluating a KG is a challenging task due to the open world assumption <ref type="bibr" target="#b16">[17]</ref>, we attempted to perform limited evaluation in terms of the coverage of the KG in a semi-automated fashion. For this, we first created a subset of Wikidata <ref type="bibr" target="#b2">[3]</ref> by querying for triples about the entity Picasso and used this as the knowledge graph for comparison. This is motivated by the fact that Wikidata contains high quality information about Picasso and the entity linking used in our pipeline performs the linking to Wikipedia (hence, Wikidata) entities. Therefore, we expected a higher match between the surface forms of entities in our KG and the Wikipedia entities, as compared to other datasets such as DBpedia.</p><p>From the obtained Wikidata subset, 100 triples were randomly selected that related to information about Picasso as well as about museums that owned his works. Upon careful manual inspection (independently by three annotators) and resolution of conflicts through discussion, we found that the facts represented in 43% of these triples were also present in our KG as a direct match or in a different form with the same meaning. Notably, our KG was missing information about the museums that own Picasso's works, because our underlying corpus also lacks comprehensive information on this topic. Therefore, triples relating to museums from Wikidata could not be matched. Additionally, we checked how many of our entities and entity pairs are written in exactly the same way as in the Wikidata graph. Overall, around 12% of entities and 10% of entity pairs in our graph have exact matches in Wikidata. These preliminary results are promising and point towards the need for a domain-oriented construction process for further improvement of the art-historic KG. 
In particular, the precision of the triples in the art-historic KG is more important to the users and therefore, factual verification for the triples that were extracted from our dataset but are not found in Wikidata needs to be conducted by enlisting the help of domain experts.</p></div>
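<div xmlns="http://www.tei-c.org/ns/1.0"><p>The exact-match check on entities and entity pairs reported above amounts to a set intersection. The sketch below shows one way to compute such rates under stated assumptions: pairs are treated as unordered, matching is on identical strings, and both toy graphs are invented and unrelated to the reported 12% and 10% figures.</p>

```python
def exact_match_rate(ours, reference):
    """Fraction of items in ours that also occur verbatim in reference."""
    ours = set(ours)
    if not ours:
        return 0.0
    return len(ours.intersection(reference)) / len(ours)


def entity_pairs(triples):
    """Unordered (subject, object) pairs, ignoring the predicate."""
    return {frozenset((s, o)) for s, _p, o in triples}


def entities(triples):
    """All subjects and objects occurring in the triples."""
    return {e for s, _p, o in triples for e in (s, o)}


# Toy graphs, invented for illustration; they are not the paper's data.
our_kg = [
    ("Pablo_Picasso", "live_in", "Paris"),
    ("Pablo_Picasso", "meet", "Henri_Matisse"),
    ("album", "be_shown_in", "Paris"),
    ("Pablo_Picasso", "work", "June"),
]
reference_kg = [
    ("Pablo_Picasso", "residence", "Paris"),
    ("Pablo_Picasso", "country_of_citizenship", "Spain"),
]

entity_rate = exact_match_rate(entities(our_kg), entities(reference_kg))
pair_rate = exact_match_rate(entity_pairs(our_kg), entity_pairs(reference_kg))
print(entity_rate, pair_rate)
```

<p>Because the comparison ignores predicates, a matched pair only indicates that the two entities are connected in both graphs; whether the relation carries the same meaning still requires the manual inspection described above.</p></div>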
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Implementation</head><p>Taking a cue from related work <ref type="bibr" target="#b21">[22]</ref>, we have encoded our KG data into Neo4j<ref type="foot" target="#foot_7">8</ref>, which is a NoSQL graph database that provides an efficient way of capturing the diverse connections between the different entities of our knowledge graph. Additionally, the knowledge graph stored in the Neo4j database can be queried easily with the help of the Cypher language for enabling data exploration and knowledge discovery. Fig. <ref type="figure" target="#fig_3">3</ref> shows the results of a few example queries that can be executed on the KG: venues where Picasso and other artists had exhibited their work, and various art schools or movements in which Picasso was involved. Further, Fig. <ref type="figure" target="#fig_4">4</ref> shows the persons and/or art styles that Picasso influenced or was influenced by. In some cases, interesting connections with other relevant entities are also retrieved, thus providing useful cues for further exploration of the data in the KG for domain experts as well as interested users.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion and Error Analysis</head><p>Because the source corpus is heterogeneous and noisy, the Open IE process led to a number of incorrect triples in the KG despite our best efforts to eliminate the noise at each step. Here, we perform a critical analysis and look deeper into the quality of the triples in the first version of the KG. For this, we sample a few of the incorrectly extracted triples to understand the nature of the mistakes made by the automated KG generation process. Table <ref type="table" target="#tab_1">2</ref> presents some triples in the KG and the corresponding text snippets in the input data from which they were extracted.</p><p>In T1, even though the triple appears to be syntactically correct, the actual entity corresponds to the entire phrase The Third of May 1808 in Madrid, which is an artwork. The correct triple should therefore relate this artwork to the corresponding artist Francisco de Goya, perhaps including the date 1814 as well. This example illustrates the difficulty of recognizing artwork titles, given that they often contain other entities such as Madrid (location). A similar mistake can be seen in T6. Here, Appel was incorrectly recognized as a location instead of the surname of Karel Appel (person), and thus the triple represents an artist's influence on a location rather than an influence between two artists.</p><p>Examples T2 to T6 show the triples and the supporting text snippets for the results of the query depicted in Figure <ref type="figure" target="#fig_4">4</ref>, which contain a mixture of factually correct, factually incorrect, and speculative facts. In T2, a relation was correctly extracted from the text, but the head entity was incorrectly recognized as 'American'. This example points to the need for additional work on co-reference resolution in order to properly follow the connections in the text. 
A more precise triple would have been ⟨Gorky, beInfluenceBy, Pablo Picasso⟩.</p><p>T3 is an example in which the lack of context in the syntactic analysis of the sentence results in the assumption that the statement is true, although it is a suggestion by a specific person and therefore not necessarily a fact. A similar example is T4, in which the source text describes a potential influence relation between the artists that cannot be directly assumed to be a fact. These two examples illustrate that the context of the actual text might get lost during the extraction process, which may lead to erroneous facts being represented in the KG. Thus, it is important to take into account the provenance information that can help the user understand the full context and obtain the correct information. A different scenario is depicted in T5, in which the text clearly confirms the validity of the fact. One interesting observation concerns the syntactic structure of the relation phrase: the word 'doubtless' acts as an adverb emphasizing the validity of the fact, and although it divides the relation phrase 'was influenced by', the syntactic analyzer and the canonicalization step were able to normalize the relation to a canonical form. This is also evident in the diversity of relation phrases in this sample of texts. They are expressed in different tenses, with auxiliary verbs, and sometimes spread within a more complex sentence, as seen in T5. Examples T3 to T6 illustrate the need for fact-checking in our KG. In particular, the facts in the KG could be presented to domain experts, who would be able to look at the information in a user-friendly manner and then investigate further to either corroborate or contradict the triples in the automatically generated KG. We envision easy access to, and scrutiny of, the information stored in large text collections as the primary use case of this automatically generated art-historic KG.</p></div>
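<div xmlns="http://www.tei-c.org/ns/1.0"><p>One simple way to surface cases such as T3 and T4 for expert review is to flag triples whose source sentence contains a hedging cue. The sketch below is only an illustration of that idea: the cue list and the names HEDGE_PATTERN and SNIPPETS are assumptions introduced here, the snippets are shortened excerpts from Table 2, and this heuristic is not part of the pipeline described in this work.</p><p>
```python
import re

# Hypothetical, deliberately small list of hedging cues; a real list would need curation.
HEDGE_PATTERN = re.compile(
    r"\b(?:suggest\w*|probabl[ey]|possibly|perhaps|might|may have)\b",
    re.IGNORECASE,
)

# Shortened source snippets from Table 2, keyed by example identifier.
SNIPPETS = {
    "T3": "to Andrew Hudson, art critic of The Washington Post, for suggesting that "
          "Pablo Picasso has been influenced by Morris Louis and Kenneth Noland",
    "T4": "It is probable that Guevara was influenced by Picasso to experiment with "
          "the encaustic technique",
    "T5": "Picasso was influenced doubtless by Aubrey Beardsley, who had died in 1899 "
          "at the age of twenty-six",
    "T6": "In artistic respect, one could also see, that Karel Appel was strongly "
          "influenced in this period, by Picasso and Miro.",
}


def is_speculative(sentence):
    """Return True if the sentence contains one of the hedging cues."""
    return HEDGE_PATTERN.search(sentence) is not None


flagged = [key for key, sentence in SNIPPETS.items() if is_speculative(sentence)]
print(flagged)
```
</p><p>A cue list like this only catches explicit hedges. It does not detect entity typing errors such as the one in T6, which still require a separate check.</p></div>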
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Lessons Learned and Future Work</head><p>This work presented a first attempt at constructing a domain-oriented knowledge graph for the art domain in an automated fashion with Open IE techniques. Due to the noisy and heterogeneous dataset that is typical of digitized art-historic collections, we encountered challenges at various steps of the KG construction process. During the very first step, it was difficult to correctly identify the mentions of artworks (i.e., titles of paintings) in the dataset due to the noise and inherent ambiguities. This domain-specific issue needs further attention in order to improve the quality as well as the coverage of the resulting KG, as discussed in detail by previous work <ref type="bibr" target="#b34">[35]</ref>. In addition, a co-reference resolution tool <ref type="bibr" target="#b40">[41]</ref> could also help with the identification and linking of relevant entities.</p><p>While the Open IE approach allowed for the extraction of a wide variety of entities and relations, it also made canonicalization a complicated task. We observed that existing techniques for canonicalization on generic datasets, such as CESI, do not show comparable performance on domain-specific datasets. It would be interesting to investigate whether pretrained embedding and language models such as FastText and BERT could compete with the relatively older KG embeddings that were employed in CESI for obtaining better clusters. There are other recent works on canonicalization <ref type="bibr" target="#b41">[42,</ref><ref type="bibr" target="#b42">43]</ref> that demonstrate better results and would be worth exploring further for our use case in future work. Another important aspect is the incomplete tagging of the various types of entities obtained from Open IE. Owing again to the noise in the process, as well as to the lack of any underlying schema, many entities could not be assigned their correct type. 
This task needs further exploration for the enrichment of the KG.</p><p>Moreover, we have only considered English texts in this work so far, since the existing methods show their best performance with English texts. However, our art-historic collection comprises texts in multiple languages, and we would like to expand the pipeline to process multilingual texts. Taking into account the existing limitations of the methods with domain-specific corpora, this seems to be an arduous but interesting research challenge.</p><p>With regard to the implementation of the KG pipeline, while we have so far used off-the-shelf tools and libraries like SpaCy, Stanford CoreNLP and CESI, we plan to further fine-tune them to the task of domain-specific KG construction. It will also be worthwhile to explore and evaluate the performance with other available tools such as Flair <ref type="bibr" target="#b43">[44]</ref> and Blink <ref type="bibr" target="#b44">[45]</ref> for entity recognition, linking and typing, as well as OpenIE <ref type="bibr" target="#b15">[16]</ref> and MinIE <ref type="bibr" target="#b14">[15]</ref> for the extraction of triples. The scalability of these approaches and the completeness of the resulting KG in the presence of new and expanding cultural heritage datasets is also an open research question to be looked into.</p><p>The evaluation of the art-historic KG is also a crucial task worth discussing. While we have performed a semi-automated evaluation for the first version of our KG, a more rigorous and thorough evaluation of the correctness of the facts is certainly imperative before this KG can be useful to a non-expert user (as discussed in Section 5). One way to ensure this would be to maintain the provenance of the facts in the KG, in terms of their source document as well as their confidence measure. This could also facilitate a fair and complementary manual evaluation in terms of precision and recall, which could provide further insights. 
For this, we plan to closely collaborate with domain experts and enlist their help in the near future.</p></div>
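<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of keeping the source document and a confidence measure alongside each fact can be pictured with a small record type. The following is a sketch under assumptions: the class name ProvenancedTriple, its field names, the document identifiers, and the confidence values are all hypothetical and are not taken from the KG described here.</p><p>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenancedTriple:
    """A triple together with the provenance needed to verify it later."""
    head: str
    relation: str
    tail: str
    source_document: str   # hypothetical identifier of the document in the corpus
    source_sentence: str   # sentence from which the triple was extracted
    confidence: float      # hypothetical extractor score in [0, 1]


def review_queue(triples):
    """Order triples so that the least confident extractions are reviewed first."""
    return sorted(triples, key=lambda triple: triple.confidence)


# Example records; identifiers and scores are invented for illustration.
TRIPLES = [
    ProvenancedTriple(
        "Pablo Picasso", "beInfluenceBy", "Aubrey Beardsley", "doc-A",
        "Picasso was influenced doubtless by Aubrey Beardsley", 0.9,
    ),
    ProvenancedTriple(
        "Guevara", "beInfluenceBy", "Pablo Picasso", "doc-B",
        "It is probable that Guevara was influenced by Picasso", 0.4,
    ),
]

queue = review_queue(TRIPLES)
for triple in queue:
    print(triple.confidence, triple.head, triple.relation, triple.tail, triple.source_document)
```
</p><p>Sorting by confidence is only one possible policy. The point of the record is that a domain expert can always trace a triple back to the sentence and document it came from.</p></div>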
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>In this work, we have presented our approach to constructing an art-historic KG from digitized texts in an automated manner. We have leveraged existing Open IE tools for various stages of the KG construction process and discussed the limitations and challenges of adapting these generic tools to domain-specific datasets. We have presented these insights with the hope of encouraging interesting dialogue and further progress along these lines. While our limited initial analysis and evaluation have shown encouraging results, they have also clearly indicated points of improvement for creating a more refined and comprehensive version of an art-historic KG that could be used for downstream tasks such as search and querying.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Construction of art-historic KG.</figDesc><graphic coords="4,89.29,84.19,416.69,234.39" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Illustration of a subset of the KG.</figDesc><graphic coords="9,89.29,319.94,416.69,234.39" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Examples of query results on the KG (node colours assigned by Neo4j).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Illustration of a subset of KG, depicting the influence of and on Picasso (corresponding query: MATCH p=(s)-[:beinfluenceby]-(o) WHERE s.name="Pablo Picasso" RETURN p)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Statistics of the KG</figDesc><table><row><cell>Attribute</cell><cell>Total Triples</cell><cell>Unique Entities</cell><cell>Unique Relations</cell><cell>Artworks</cell><cell>Artists</cell></row><row><cell>Count</cell><cell>147,510</cell><cell>35,305</cell><cell>33,448</cell><cell>1,397</cell><cell>656</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Examples of triples in the KG with their corresponding source texts</figDesc><table><row><cell>ID</cell><cell>Triple</cell><cell>Source text</cell></row><row><cell>T1</cell><cell>⟨The Third of May 1808, beIn, Madrid⟩</cell><cell>At the center of the show, a room containing Francisco de Goya's The Third of May 1808 in Madrid (1814), Édouard Manet's The Execution of Emperor Maximilian of Mexico (1868-69) ...</cell></row><row><cell>T2</cell><cell>⟨American, beInfluenceBy, Pablo Picasso⟩</cell><cell>The more one examines Gorky's early works, the more they appear like Gorkys rather than like Picassos. Moreover, his unabashed borrowings can be seen as forward-looking: for an American to be influenced by Picasso in the heyday of American Scene painting was, art historian Meyer Schapiro points out, "an act of originality."</cell></row><row><cell>T3</cell><cell>⟨Pablo Picasso, beInfluenceBy, Morris Louis⟩</cell><cell>... to Andrew Hudson, art critic of The Washington Post, for suggesting that Pablo Picasso has been influenced by Morris Louis and Kenneth Noland, two leaders of the "post-painterly" Washington, D.C.</cell></row><row><cell>T4</cell><cell>⟨Guevara, beInfluenceBy, Pablo Picasso⟩</cell><cell>It is probable that Guevara was influenced by Picasso to experiment with the encaustic technique, which had been practised in antiquity. Hot wax was used as a medium for mixing floral and vegetable dyes.</cell></row><row><cell>T5</cell><cell>⟨Pablo Picasso, beInfluenceBy, Aubrey Beardsley⟩</cell><cell>Picasso was influenced doubtless by Aubrey Beardsley, who had died in 1899 at the age of twenty-six, but then what an excellent influence it proved to be for this portrait!</cell></row><row><cell>T6</cell><cell>⟨Appel, beInfluenceBy, Pablo Picasso⟩</cell><cell>In artistic respect, one could also see, that Karel Appel was strongly influenced in this period, by Picasso and Miro.</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://hpi.de/naumann/projects/web-science/ai4art.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Linked Open Data: http://www.w3.org/DesignIssues/LinkedData</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://fondazionezeri.unibo.it/en</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://spacy.io/usage/v3</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">It is to be noted that existing canonicalization techniques such as CESI are largely optimized for canonicalization of entities and their performance is considerably worse for relations. We also observed similar results during our analysis.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://query.wikidata.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://pypi.org/project/networkx/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://neo4j.com</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Knowledge graphs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Blomqvist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cochez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">D</forename><surname>Melo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gutierrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kirrane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E L</forename><surname>Gayo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Neumaier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys (CSUR)</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="1" to="37" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">DBpedia -A large-scale, multilingual knowledge base extracted from Wikipedia</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Isele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jentzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kontokostas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Morsey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Van Kleef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="167" to="195" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Wikidata: A free collaborative knowledge base</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vrandečić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="78" to="85" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
		<title level="m">Domain-specific knowledge graph construction</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Knowlife: A knowledge graph for health and life sciences</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ernst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Siu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 30th International Conference on Data Engineering</title>
				<meeting>the 30th International Conference on Data Engineering</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1254" to="1257" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Searching news articles using an event knowledge graph leveraged by Wikidata</title>
		<author>
			<persName><forename type="first">C</forename><surname>Rudnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ehrhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Ferret</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Teyssou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tannier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Companion Proceedings of The 2019 World Wide Web Conference</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1232" to="1239" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Computational fact checking from knowledge networks</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L</forename><surname>Ciampaglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shiralkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Rocha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bollen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Menczer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Flammini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PloS one</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">e0128193</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Incremental Knowledge Base Construction using Deepdive</title>
		<author>
			<persName><forename type="first">J</forename><surname>Shin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>De Sa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ré</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the VLDB Endowment International Conference on Very Large Data Bases</title>
				<meeting>the VLDB Endowment International Conference on Very Large Data Bases</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">1310</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Toward an Architecture for Never-Ending Language Learning</title>
		<author>
			<persName><forename type="first">A</forename><surname>Carlson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Betteridge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kisiel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Settles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">R</forename><surname>Hruschka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Mitchell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th AAAI Conference on Artificial Intelligence</title>
				<meeting>the 24th AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1306" to="1313" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Open information extraction from the web</title>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="68" to="74" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Identifying Relations for Open Information Extraction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Fader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1535" to="1545" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Textrunner: Open Information Extraction on the Web</title>
		<author>
			<persName><forename type="first">A</forename><surname>Yates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Broadhead</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Cafarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)</title>
				<meeting>Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="25" to="26" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Leveraging linguistic structure for open domain information extraction</title>
		<author>
			<persName><forename type="first">G</forename><surname>Angeli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J J</forename><surname>Premkumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</title>
		<title level="s">Long Papers</title>
		<meeting>the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="344" to="354" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">ClausIE: Clause-based open information extraction</title>
		<author>
			<persName><forename type="first">L</forename><surname>Del Corro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gemulla</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd International Conference on World Wide Web</title>
				<meeting>the 22nd International Conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">MinIE: Minimizing facts in open information extraction</title>
		<author>
			<persName><forename type="first">K</forename><surname>Gashteovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gemulla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Del Corro</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D17-1278</idno>
		<ptr target="https://aclanthology.org/D17-1278" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Copenhagen, Denmark</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="2630" to="2640" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction</title>
		<author>
			<persName><forename type="first">K</forename><surname>Kolluru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Adlakha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><surname>Mausam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chakrabarti</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.306</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.306" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3748" to="3761" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Predicting Completeness in Knowledge Bases</title>
		<author>
			<persName><forename type="first">L</forename><surname>Galárraga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Razniewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Amarilli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th ACM International Conference on Web Search and Data Mining</title>
				<meeting>the 10th ACM International Conference on Web Search and Data Mining</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="375" to="383" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Domain-Specific Knowledge Graph Construction for Semantic Analysis</title>
		<author>
			<persName><forename type="first">N</forename><surname>Jain</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Extended Semantic Web Conference (ESWC) 2020 Satellite Events</title>
				<meeting>the Extended Semantic Web Conference (ESWC) 2020 Satellite Events<address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="250" to="260" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">PaintKG: The painting knowledge graph using biLSTM-CRF</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gao</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICISE51755.2020.00094</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 International Conference on Information Science and Education (ICISE-IE)</title>
				<meeting>the 2020 International Conference on Information Science and Education (ICISE-IE)</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="412" to="417" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Building a Semantic Knowledge-base for Painting Conservators</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hunter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Odat</surname></persName>
		</author>
		<idno type="DOI">10.1109/eScience.2011.32</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2011 IEEE Seventh International Conference on eScience</title>
				<meeting>the 2011 IEEE Seventh International Conference on eScience</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="173" to="180" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">ArCo: The Italian cultural heritage knowledge graph</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">A</forename><surname>Carriero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gangemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Mancinelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Marinucci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Nuzzolese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Presutti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Veninata</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="36" to="52" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Castellano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sansaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Vessio</surname></persName>
		</author>
		<idno>arXiv-2105</idno>
		<title level="m">ArtGraph: Towards an Artistic Knowledge Graph</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Information extraction for knowledge base construction in the music domain</title>
		<author>
			<persName><forename type="first">S</forename><surname>Oramas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Espinosa-Anke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sordo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Serra</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.datak.2016.06.001</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S0169023X16300416" />
	</analytic>
	<monogr>
		<title level="j">Data and Knowledge Engineering</title>
		<imprint>
			<biblScope unit="volume">106</biblScope>
			<biblScope unit="page" from="70" to="83" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">ArDO: An ontology to describe the dynamics of multimedia archival records</title>
		<author>
			<persName><forename type="first">O</forename><surname>Vsesviatska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tietz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hoppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sprau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Meyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dessì</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sack</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 36th Annual ACM Symposium on Applied Computing</title>
				<meeting>the 36th Annual ACM Symposium on Applied Computing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1855" to="1863" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Linked Stage Graph</title>
		<author>
			<persName><forename type="first">T</forename><surname>Tietz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Waitelonis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Felgentreff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Meyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Weber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sack</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SEMANTiCS Posters &amp; Demos</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Oldman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Labs</surname></persName>
		</author>
		<title level="m">The CIDOC Conceptual Reference Model (CIDOC-CRM): PRIMER, CIDOC-CRM official web site</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">The Rijksmuseum Collection as Linked Data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Dijkshoorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jongma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Aroyo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Van Ossenbruggen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Schreiber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ter Weele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wielemaker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="221" to="230" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Van Hooland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<title level="m">Linked Data for Libraries, Archives and Museums: How to clean, link and publish your metadata</title>
				<imprint>
			<publisher>Facet publishing</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">The Stanford CoreNLP natural language processing toolkit</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Surdeanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Finkel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Bethard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>McClosky</surname></persName>
		</author>
		<ptr target="http://www.aclweb.org/anthology/P/P14/P14-5010" />
	</analytic>
	<monogr>
		<title level="m">Association for Computational Linguistics (ACL) System Demonstrations</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="55" to="60" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">HDSKG: Harvesting domain specific knowledge graph from content of webpages</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Kabir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Sawada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-W</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th International Conference on Software Analysis, Evolution and Re-engineering (SANER)</title>
				<meeting>the 24th International Conference on Software Analysis, Evolution and Re-engineering (SANER)</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="56" to="67" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">AKMiner: Domain-specific knowledge graph mining from academic literatures</title>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Web Information Systems Engineering</title>
				<meeting>the International Conference on Web Information Systems Engineering</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="241" to="255" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Constructing biomedical domain-specific knowledge graph with minimum supervision</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge and Information Systems</title>
		<imprint>
			<biblScope unit="volume">62</biblScope>
			<biblScope unit="page" from="317" to="336" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Bio2RDF: Towards a mashup to build bioinformatics knowledge systems</title>
		<author>
			<persName><forename type="first">F</forename><surname>Belleau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Nolin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tourigny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rigault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Morissette</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Informatics</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="page" from="706" to="716" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">KnowLife: A versatile approach for constructing a large knowledge graph for biomedical sciences</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ernst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Siu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Who is Mona L.? Identifying Mentions of Artworks in Historical Archives</title>
		<author>
			<persName><forename type="first">N</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Krestel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Theory and Practice of Digital Libraries</title>
				<meeting>the International Conference on Theory and Practice of Digital Libraries<address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="115" to="122" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Stanza: A Python natural language processing toolkit for many human languages</title>
		<author>
			<persName><forename type="first">P</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bolton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<ptr target="https://nlp.stanford.edu/pubs/qi2020stanza.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">CESI: Canonicalizing open knowledge bases using embeddings and side information</title>
		<author>
			<persName><forename type="first">S</forename><surname>Vashishth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Talukdar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 World Wide Web Conference</title>
				<meeting>the 2018 World Wide Web Conference</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1317" to="1327" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Holographic embeddings of knowledge graphs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Nickel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rosasco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Poggio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">30</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Power and centrality: A family of measures</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bonacich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">American Journal of Sociology</title>
		<imprint>
			<biblScope unit="volume">92</biblScope>
			<biblScope unit="page" from="1170" to="1182" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<monogr>
		<title level="m" type="main">The PageRank Citation Ranking: Bringing Order to the Web</title>
		<author>
			<persName><forename type="first">L</forename><surname>Page</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Motwani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Winograd</surname></persName>
		</author>
		<idno>1999-66</idno>
		<ptr target="http://ilpubs.stanford.edu:8090/422/" />
		<note>Previous number: SIDL-WP-1999-0120</note>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
		<respStmt>
			<orgName>Stanford InfoLab</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Deep reinforcement learning for mention-ranking coreference models</title>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<ptr target="https://nlp.stanford.edu/pubs/clark2016deep.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2016 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<monogr>
		<title level="m" type="main">Canonicalizing Open Knowledge Bases with Multi-Layered Meta-Graph Neural Network</title>
		<author>
			<persName><forename type="first">T</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Chawla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jiang</surname></persName>
		</author>
		<idno>CoRR abs/2006.09610</idno>
		<ptr target="https://arxiv.org/abs/2006.09610" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<analytic>
		<title level="a" type="main">Open knowledge graphs canonicalization using variational autoencoders</title>
		<author>
			<persName><forename type="first">S</forename><surname>Dash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rossiello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mihindukulasooriya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bagchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gliozzo</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.emnlp-main.811</idno>
		<ptr target="https://aclanthology.org/2021.emnlp-main.811" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2021 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Online and Punta Cana, Dominican Republic</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="10379" to="10394" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<analytic>
		<title level="a" type="main">FLAIR: An easy-to-use framework for state-of-the-art NLP</title>
		<author>
			<persName><forename type="first">A</forename><surname>Akbik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Bergmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Blythe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rasul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schweter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vollgraf</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-4010</idno>
		<ptr target="https://aclanthology.org/N19-4010" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)</title>
				<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="54" to="59" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<analytic>
		<title level="a" type="main">Scalable Zero-shot Entity Linking with Dense Entity Retrieval</title>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Josifoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.519</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.519" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="6397" to="6407" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
