<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>May</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Generating Domain-Specific Knowledge Graphs: Challenges with Open Information Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nitisha Jain</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alejandro Sierra-Múnera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julius Streit</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simon Thormeyer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philipp Schmidt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Lomaeva</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralf Krestel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HPI - Hasso Plattner Institute</institution>
          ,
          <addr-line>Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kiel University</institution>
          ,
          <addr-line>Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Potsdam</institution>
          ,
          <addr-line>Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>ZBW - Leibniz Centre for Economics</institution>
          ,
          <addr-line>Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>0</volume>
      <fpage>5</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>Knowledge Graphs (KGs) are a popular means of structuring and representing knowledge in a machine-readable form. While KGs serve as the foundation for many applications, the automatic construction of these KGs from texts is a challenging task where Open Information Extraction techniques are prominently leveraged. In this paper, we focus on generating a domain-specific knowledge graph based on art-historic texts from a digitized text collection. We describe the combined use and adaptation of existing open information extraction methods to build an art-historic KG that can facilitate data exploration for domain experts. We discuss the challenges that were faced at each step and present a detailed error analysis to identify the limitations of existing methods when working with domain-specific corpora.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge graphs</kwd>
        <kwd>Open information extraction</kwd>
        <kwd>Domain-specific texts</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge Graphs (KGs) have gained considerable popularity in both academia and industry.
They are employed to represent information in a structured format after extraction from large
collections of heterogeneous, diverse, and unstructured documents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These KGs can then be
used for downstream tasks, such as question answering, logical inference, recommendation,
or information retrieval. Besides general KGs that aim to capture generic knowledge about
real-world data, such as DBpedia [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Wikidata [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], domain-specific KGs have become
important for targeted domains [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. They have been leveraged to support multiple
information-based applications, e.g., in the context of health and life sciences [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], news search [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or fact
checking [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        There have been several efforts towards the automatic construction of general-purpose knowledge
graphs from the Web based on machine learning techniques [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. In the absence of a
prespecified list of relations for performing pattern-based extractions, Open Information Extraction
(Open IE) is a popular approach, where a large set of relational triples can be extracted from
text without any human input or domain expertise [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Several Open IE techniques have
been proposed to build and populate knowledge graphs from free-form texts [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15 ref16">11, 12, 13, 14,
15, 16</xref>
        ]. However, these methods for automated knowledge base construction suffer from a
number of shortcomings in terms of their coverage [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and applicability to specific domains [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Existing techniques that exhibit state-of-the-art results on standard, clean datasets fail to achieve
comparable performance for domain-specific datasets, e.g., in the art-historic domain where the
data often consists of highly heterogeneous and noisy collections [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        KG for Art. The art and cultural heritage domain provides a plethora of opportunities for
knowledge graph applications. An art knowledge graph can enable art historians, as well as
interested users, to explore interesting information that is hidden in large volumes of text in a
structured manner. With a large variety of diverse information sources and manifold application
scenarios, the (automated) construction of task-specific and domain-specific knowledge graphs
becomes even more crucial for this domain. In contrast to general purpose KGs, a KG for
the art domain could comprise a specific set of entity types, such as artworks, galleries, as
well as relevant relations, such as influenced_by, part_of_movement etc., depending on the
specific task and on the specific text collection. The important entities and relations might also
differ across different document types, such as auction catalogues, exhibition catalogues, or art
magazines. On the one hand, a general-purpose, art-oriented ontology may not be well-suited and
comprehensive enough for specific data collections. On the other hand, designing a custom
ontology for the different art corpora would be a challenging and expensive task due to the
need for significant domain expertise. In the past, several attempts have been made at creating
KGs for art and related domains [
        <xref ref-type="bibr" rid="ref19 ref20 ref21">19, 20, 21</xref>
        ], with the most recent one by Castellano et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
However, a systematic method for the construction of a knowledge graph based on a collection
of art-related documents without a well-defined ontology has not been proposed thus far.
      </p>
      <p>
        Goals. In this paper, we describe an ongoing project<sup>1</sup> for the automatic construction of a
knowledge graph based on a large, private archive of art-historic documents. Instead of relying
on existing ontologies to dictate the information extraction process (that might restrict the
scope of the entities and relations that could be extracted from the text when the ontology is not
hand-crafted for the specific dataset) we decided to pursue the schema-less Open IE approach
in this work. We present the results from our exploration of existing Open IE techniques to
generate structured information and discuss our insights in terms of their shortcomings and
limited applicability when deployed for noisy, digitized data in the art domain.
      </p>
      <p>We make the following contributions in this paper: (i) Construct a domain-specific knowledge
graph based on a collection of digitized art-historic documents. (ii) Describe the process of
automated construction of the KG with Open IE techniques. (iii) Analyze and discuss the
challenges and limitations for the adaptation of Open IE tools to domain-specific datasets.</p>
      <sec id="sec-1-1">
        <title><sup>1</sup> https://hpi.de/naumann/projects/web-science/ai4art.html</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        With the availability of digitized cultural data, several previous works have proposed KGs for
art-related datasets [
        <xref ref-type="bibr" rid="ref19 ref20 ref21 ref23">19, 20, 21, 23</xref>
        ]. Arco [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] is a large Italian cultural heritage graph with a
pre-defined ontology that was developed in a collaborative fashion with contributions from
domain experts all over the country. While the Arco KG is quite broad in its coverage, Ardo [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]
pertains to a very specific use case of multimedia archival records. Similarly, the Linked Stage
Graph [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] was developed as a KG specifically for storing historical data about the Stuttgart State
Theater. Increasingly, the principles of linked open data<sup>2</sup> have also been widely adopted within
the cultural heritage domain to help researchers, practitioners and general users study
and consume cultural objects. Notable examples include the CIDOC-CRM [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], the Rijksmuseum
collection [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], the Zeri Photo Archive<sup>3</sup>, and OpenGLAM [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] among many others. Most related
to our work is the ArtGraph [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] where the authors have integrated the art resources from
DBpedia and WikiArt and constructed a KG with a well-defined schema that is centered around
artworks and artists. While all these works are concerned with KGs and ontologies for specific
art-related corpora, they have leveraged a schema for representing the information and are not
concerned with the challenges of a schema-free extraction process, which is the main focus of
this work.
      </p>
      <p>
        Open IE approaches extract triples directly from text, without an explicit ontology or schema
behind the extraction process. Several works have been proposed in the past. TextRunner [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
relies on a self-supervised classifier that determines trustworthy relationships between pairs of
entities, while Reverb [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] uses syntactical and lexical constraints to overcome incoherent and
uninformative relationships. ClausIE [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] relies heavily on dependency parsing to construct
clauses from which the propositions will be extracted. In this work, we have leveraged the
Stanford CoreNLP OpenIE implementation [
        <xref ref-type="bibr" rid="ref13 ref29">29, 13</xref>
        ] that uses dependency parsing to minimize
the phrases of the resulting clauses, and was originally evaluated in a slot filling task.
      </p>
      <p>
        The construction of domain-specific KGs has been the subject of investigation in previous
works for various domains, e.g. software engineering [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], academic literature [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ], and
more prominently, the biomedical domain [
        <xref ref-type="bibr" rid="ref32 ref33 ref34">32, 33, 34</xref>
        ]. However, the previously proposed
automated methods are not directly applicable for the arts and cultural heritage domain, where
unique challenges with respect to the heterogeneity and quality of data are prevalent. This
work identifies and discusses the particular difficulties encountered while applying existing
information extraction techniques to art-related corpora.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Automated Construction of Art-historic KG</title>
      <p>In this section, we describe our underlying art-historic dataset as well as the steps employed
for the automated extraction of information (in the form of triples) to construct an art-historic
knowledge graph. Fig. 1 shows an overview of this process.</p>
      <sec id="sec-3-1">
        <title><sup>2</sup> Linked Open Data: http://www.w3.org/DesignIssues/LinkedData; <sup>3</sup> https://fondazionezeri.unibo.it/en</title>
        <sec id="sec-3-1-1">
          <title>3.1. Dataset</title>
          <p>For this work, we are working with a large collection of recently digitized art-historical texts
provided by our project partners. This collection consists of a variety of heterogeneous documents
including auction catalogs, exhibition catalogs, art books, etc. that contain semi-structured as
well as unstructured texts describing artists, artworks, exhibitions and so on. Art historians
regularly study these data collections for art-historical analysis. Therefore, a systematic
representation of this data in the form of a KG would be a valuable resource for them to explore
this data swiftly and efficiently. The whole collection is quite large (≈ 1 TB of data). To
restrict the size of the dataset for a proof of concept of our KG construction process, we chose a subset
of this dataset pertaining to the artist Picasso. The decision of
choosing an artist-oriented subset of the collection enabled us to better understand the context
and evaluate the triples that were obtained throughout the process of KG construction. The data
was filtered by querying the document collection using the keyword query ‘Picasso’, resulting
in 224,469 entries (where each entry corresponds to a page of the original digitized corpus)
containing the term ‘Picasso’. Due to the filtering, each entry is an independent document, in
the sense that the neighboring entries do not always represent the correct context. This led to
some of the entries in our dataset containing incomplete sentences at the beginning or the end
of a page. One such example is an entry starting with ‘to say47—Picasso never belittled his work,
until . . . ’ where the tokens ‘to say’ belong to a sentence which started in a different entry, that
might no longer be a part of the dataset under consideration. It is important to note that in the
same example we can see more noise, e.g., numbers are mixed in between words in the digitized
version of the text. This noise in the dataset was introduced by the optical character recognition
(OCR) process during the digitization of the documents (performed in a prior step by the data
providers). In general, the dataset contains full sentences, such as ‘Matisse’s return to the study
of ancient and Renaissance sculpture is significant in itself.’, as well as short description phrases,
figure captions or footnotes such as ‘G. Bloch, Pablo Picasso, Bern, 1972, vol. III, p.142’.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. Finding Named Entities</title>
          <p>
            As a first step, it was interesting to inspect if the named entities present in the corpus could be
easily identified. A dictionary-based approach to find the named entities would identify the
mentions with a high precision, but at the cost of very low recall by ignoring many potentially
interesting entities to be discovered in the corpus. Therefore, we chose to follow a machine
learning approach to named entity recognition (NER). Generic NER tools work very well for
the common entity types, such as person, location, organization and so on, though fine-grained
or domain-specific entities are harder to identify [
            <xref ref-type="bibr" rid="ref35">35</xref>
            ]. We employed the SpaCy library<sup>4</sup> for
finding named entities since its pre-trained models include a Work_Of_Art category that could
potentially identify the entities that are important in the art domain (this could encompass
mentions of paintings, books, statues etc.). Excluding the cardinal entities in order to reduce
noise, the SpaCy library with the pre-trained ‘en_core_web_trf’ model was used to identify the
following entity types - Work_Of_Art, Person, Product, ORG, LOC, GPE and NORP, which
showed reasonably good results. The process of NER enabled us to filter out any sentences
without any entity mention since such sentences were likely to have no useful information for
the KG construction. Thus, the NER step helped with pruning the dataset for further processing,
as well as improving the quality of the resulting KG.
          </p>
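          <p>The sentence-pruning step described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names and the toy recognizer are hypothetical, and the label names follow the upper-case spelling used by SpaCy's English models rather than the spelling in the text.</p>

```python
from typing import Callable, Iterable

# Entity labels kept for the KG; CARDINAL is deliberately left out to reduce noise.
KEPT_LABELS = {"WORK_OF_ART", "PERSON", "PRODUCT", "ORG", "LOC", "GPE", "NORP"}

Entity = tuple[str, str]  # (mention text, label)


def filter_sentences(
    sentences: Iterable[str], ner: Callable[[str], list[Entity]]
) -> list[tuple[str, list[Entity]]]:
    """Keep only the sentences that mention at least one entity of a kept type."""
    kept = []
    for sentence in sentences:
        entities = [(text, label) for text, label in ner(sentence) if label in KEPT_LABELS]
        if entities:
            kept.append((sentence, entities))
    return kept


def toy_ner(sentence: str) -> list[Entity]:
    """Hypothetical stand-in for a real NER model, only here to make the sketch runnable."""
    lexicon = {"Picasso": "PERSON", "Paris": "GPE", "47": "CARDINAL"}
    return [(word, label) for word, label in lexicon.items() if word in sentence]


sentences = [
    "Picasso moved to Paris.",
    "The catalogue lists 47 items.",
    "It was significant in itself.",
]
kept = filter_sentences(sentences, toy_ner)
print(kept)
```

          <p>With SpaCy, the recognizer passed as ner could be lambda s: [(e.text, e.label_) for e in nlp(s).ents], where nlp = spacy.load("en_core_web_trf").</p>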
        </sec>
        <sec id="sec-3-1-3">
          <title>3.3. Triple Extraction</title>
          <p>
            After obtaining informative sentences from the previous step, we employed Open IE tools to
extract the triples from them. It is important to note that while there are some art-related
ontologies proposed in previous works such as Arco [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] and ArDo [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ], none of them are
suitable for our corpus since they are very specific to the datasets they were designed for. Other
general ontologies such as CIDOC-CRM are, on the other hand, too broad and would not be
able to extract novel and interesting facts from a custom and heterogeneous corpus such as
ours, where the entities and relations among them are not known beforehand. In the absence
of such an ontology specifically designed for the description of art-historic catalogs, we chose
to employ open information extraction techniques for the construction of our KG in order to
broaden the scope and utility of the extracted information.
          </p>
          <p>
            To this end, we ran the Stanford CoreNLP OpenIE annotator [
            <xref ref-type="bibr" rid="ref29 ref36">29, 36</xref>
            ] to extract ⟨subject,
predicate, object⟩ triples from the sentences. A total of 5,057,488 triples were extracted in this
process, where multiple triples could be extracted from a single sentence. Another round of
filtering was performed at this stage, where any triples that did not contain a named entity in
the subject or object phrase were removed. Additionally, duplicate entries and triples with serial
numbers as entities were discarded. Some examples of triples that were removed are: ⟨we,
have, good relationship⟩, ⟨i, be, director⟩, ⟨brothel, be in, evening⟩, ⟨drawings, acquired, work⟩. A
total of 160,000 triples remained; a valid triple at this stage looked like ⟨P. Picasso, is, artiste⟩.
          </p>
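          <p>The triple filtering described above might look roughly like the following sketch. The serial-number pattern and the substring-based entity check are assumptions made for illustration; the text does not specify how either test was implemented.</p>

```python
import re

Triple = tuple[str, str, str]

# Assumed definition of a "serial number" argument: digits with optional separators.
SERIAL_PATTERN = re.compile(r"[\d\s./-]+")


def is_serial(phrase: str) -> bool:
    return SERIAL_PATTERN.fullmatch(phrase) is not None


def filter_triples(triples: list[Triple], entity_mentions: set[str]) -> list[Triple]:
    """Drop duplicates, serial-number triples, and triples with no named entity
    in either the subject or the object phrase."""
    mentions = {mention.lower() for mention in entity_mentions}

    def has_entity(phrase: str) -> bool:
        # Simplification: a phrase counts if it contains a known mention as a substring.
        return any(mention in phrase.lower() for mention in mentions)

    seen = set()
    kept = []
    for triple in triples:
        subj, _pred, obj = triple
        if triple in seen:
            continue
        seen.add(triple)
        if is_serial(subj) or is_serial(obj):
            continue
        if has_entity(subj) or has_entity(obj):
            kept.append(triple)
    return kept


# Example triples taken from the text, plus one made-up serial-number triple.
triples = [
    ("P. Picasso", "is", "artiste"),
    ("we", "have", "good relationship"),
    ("brothel", "be in", "evening"),
    ("P. Picasso", "is", "artiste"),
    ("Picasso", "be", "142"),
]
filtered = filter_triples(triples, {"P. Picasso", "Picasso"})
print(filtered)
```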
        </sec>
        <sec id="sec-3-1-4">
          <title>3.4. Entity Linking</title>
          <p>
            Once the triples were extracted, the entity linking component of the Stanford CoreNLP pipeline [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ]
was used to link the entities. This component uses WikiDict as a resource, and uses the
dictionary to match the entity mention text to a specific entity in Wikipedia. Since the entities in
our dataset were present in multiple different surface forms, this step allowed us to partially
normalize the entities and identify the unique entities. Though the number of entities was
reduced as a result, the total number of triples remained the same. Note that this linking could
only map entities to their Wikipedia counterpart if the entity was found as a subject or object
in a triple. In many cases though, the subject and object were noun phrases instead of obvious
entities, for which this kind of linking did not work. This process was still quite useful, as
108,841 out of 337,100 entities were successfully linked to their Wikipedia form (leading
to 8,369 unique entities). Some of the most frequent entities found in the dataset (along with
their frequencies) were: (Pablo_Picasso, 11219), (Paris, 2178), (Artist, 1904), (Henri_Matisse, 1769),
(Georges_Braque, 1352).
          </p>
        </sec>
        <sec id="sec-3-1-5">
          <title>3.5. Canonicalization</title>
          <p>
            One of the main challenges when constructing a KG through Open IE techniques is that of
canonicalization. Multiple surface forms of the same entity or relation might be observed in the
triples extracted with Open IE techniques in the form of noun phrases or verb phrases that need
to be identified and mapped to a single semantic entity or relation in the KG. Since the triples
extracted from our dataset via the Open IE method comprised many noisy phrases, as well as new
entities, such as titles of artworks, that may not be available for mapping in existing databases,
entity linking techniques would not suffice in this case. Unlike entity linking (which can
only link entities already present in external KGs), canonicalization is able to perform clustering
for the entities and relations that may not be present in existing KGs, by labelling them as OOV
(out of vocabulary) instances. In this work, we chose to perform canonicalization with the help
of CESI [
            <xref ref-type="bibr" rid="ref37">37</xref>
            ] which is a popular and openly available approach for this task. The CESI approach
performs clustering over the non-canonicalized forms of noun phrases for entities and verb
phrases for the relations. It leverages different sources of side information for noun phrases
and relation phrases such as entity linking, word senses and rule-mining systems for learning
embeddings for these phrases using the HolE [
            <xref ref-type="bibr" rid="ref38">38</xref>
            ] knowledge graph embedding technique. The
clustering is then performed using hierarchical agglomerative clustering (HAC) based on the
cosine similarity of the phrase embeddings in vector space. In this manner, different phrases for
the same entity or relation were mapped to one canonicalized form for inclusion in the KG. In
total, we obtained 3,789 entity clusters and 3,778 relation clusters from the CESI approach that
contained two or more terms.
          </p>
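          <p>To make the clustering step concrete, the following is a small sketch of hierarchical agglomerative clustering over cosine similarity. It is not CESI's implementation: the two-dimensional embeddings are made up, and the complete-linkage criterion and the similarity threshold of 0.9 are assumptions chosen for illustration.</p>

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def hac(embeddings: dict[str, list[float]], threshold: float) -> list[set[str]]:
    """Repeatedly merge the two most similar clusters until no pair of clusters
    reaches the similarity threshold."""
    clusters = [{phrase} for phrase in embeddings]

    def linkage(a: set[str], b: set[str]) -> float:
        # Complete linkage: similarity of the least similar cross-cluster pair.
        return min(cosine(embeddings[p], embeddings[q]) for p in a for q in b)

    while len(clusters) > 1:
        similarity, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if threshold > similarity:
            break
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]
    return clusters


# Made-up embeddings, for illustration only.
phrases = {
    "pablo picasso": [1.0, 0.1],
    "picasso": [0.9, 0.2],
    "paris": [0.1, 1.0],
    "city of paris": [0.2, 0.9],
}
clusters = hac(phrases, threshold=0.9)
print(clusters)
```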
          <p>Representative Selection. An important step in the CESI approach is the assignment of
representatives for the clusters obtained for the noun and relation phrases. The representative is chosen by
calculating the mean of all the cluster members’ embeddings, weighted by their frequency
of occurrence. The phrase closest to this mean is selected as the representative. However, this
technique did not work well for our domain-specific and noisy dataset and many undesirable
errors were noticed. For example, an entity cluster obtained from CESI was: Olga_Khokhlova,
olga, khokhlova, picasso. Since Picasso is the most frequent entity in the dataset, it was chosen
as representative by CESI, but this is clearly wrong since Picasso and Olga are different entities.
There were several other errors observed, e.g., all days of the week were clustered together in
one cluster. This could be because the contexts, and hence the embeddings, of the days of the week
are quite similar, so their vectors end up close together in the vector space. In other cases,
the color blue occasionally showed up in a cluster of phrases related to color red, certain dates
got clustered and certain related but not interchangeable words got clustered (kill vs murder vs
shot). In some cases, the first name was being replaced by the incorrect full name (not every
david is david johnson). To mitigate the above discussed errors, we had to perform manual
vetting of the clusters for verification and selection of the correct cluster representatives which
took around 2-3 person hours. During this process, certain clusters whose members were
different entities were removed (such as the cluster with days of the week). After this, the entities and
relations were canonicalized as per their chosen cluster representatives leading to a total of
35,305 unique entities and 33,448 unique relations in the final KG<sup>5</sup>.</p>
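          <p>The failure mode of frequency-weighted representative selection can be reproduced with a small sketch. It is illustrative only and not CESI's code: the embeddings and frequency counts are made up, and the use of Euclidean distance to the mean is an assumption.</p>

```python
import math

Vector = list[float]


def select_representative(embeddings: dict[str, Vector], frequencies: dict[str, int]) -> str:
    """Return the phrase whose embedding is closest to the frequency-weighted mean."""
    total = sum(frequencies[phrase] for phrase in embeddings)
    dim = len(next(iter(embeddings.values())))
    mean = [
        sum(frequencies[phrase] * vector[k] for phrase, vector in embeddings.items()) / total
        for k in range(dim)
    ]
    return min(embeddings, key=lambda phrase: math.dist(embeddings[phrase], mean))


# Made-up 2-D embeddings and counts for the cluster discussed in the text.
cluster = {
    "olga_khokhlova": [1.0, 0.0],
    "olga": [0.9, 0.1],
    "picasso": [0.0, 1.0],
}
counts = {"olga_khokhlova": 5, "olga": 3, "picasso": 100}
uniform = {phrase: 1 for phrase in cluster}

weighted_choice = select_representative(cluster, counts)
uniform_choice = select_representative(cluster, uniform)
print(weighted_choice, uniform_choice)
```

          <p>In this toy cluster the very frequent member picasso pulls the weighted mean towards itself and is selected, whereas equal weights select olga. This mirrors why a single wrongly clustered but frequent entity can take over a cluster.</p>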
        </sec>
        <sec id="sec-3-1-6">
          <title>3.6. Entity Typing</title>
          <p>
            Since a schema or ontology was not employed to extract the triples from text, the entities in
our KG do not have any entity types implicitly assigned to them. Therefore, we attempted
to identify the types of as many entities in our graph as possible. With the help of NER, we
assigned the types to the entities that were recognized in the triples. A total of 14,960 entities
were typed with this technique to generic types such as Person, Product, ORG, LOC, GPE,
NORP and Work_Of_Art, as well as numeric types such as Date, Time and Ordinal. Note
that Work_of_Art is quite a broad category that includes artworks but also movies, books and
various other art forms. Since artworks such as paintings and sculptures are one of the most
important entities in our art-historic KG, it is worthwhile to identify the mention and type of
these entities. However, a generic NER process is neither equipped nor optimized to correctly
identify such mentions. Thus, we additionally applied dictionary-based matching. This was
done by compiling a large gazetteer of artwork titles by querying Wikidata with the help of the
Wikidata Query Service<sup>6</sup> for the names of paintings and sculptures, retrieving approximately
15,000 artwork titles. In addition, we augmented our dictionary with the names of the artwork
entities from the ArtGraph dataset [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ] which contains more than 60,000 artworks derived from
DBpedia and WikiArt. If a match was found for an entity in our KG in the compiled dictionary,
the type was assigned as artwork accordingly. This led to the tagging of a further 1,397 entities
in our KG as artworks. The dictionary-based matching for artworks was particularly useful
in the cases where it was able to correctly identify entities that were wrongly assigned the
Person type by NER, such as la_donna_gravida, portrait_of_mary_cassatt and st._paul_in_prison.
          </p>
          <p><sup>5</sup> Existing canonicalization techniques such as CESI are largely optimized for the canonicalization
of entities, and their performance is considerably worse for relations. We observed similar results during our
analysis.</p>
          <p><sup>6</sup> https://query.wikidata.org/</p>
          <p>
Similar to artworks, we attempted to additionally identify the names of artists in our triples.
While NER could only tag entities as Person, we used a dictionary of artist names from Wikidata
to identify 656 unique artist entities in our data. These included names of artists such as Piet
Mondrian, Edvard Munch and Rembrandt.</p>
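          <p>The dictionary-based typing described above can be sketched as a lookup against normalized gazetteers. The normalization rule, the function names, and the precedence of artwork over artist over NER type are assumptions for illustration; the gazetteer entries stand in for the titles and names retrieved from Wikidata and ArtGraph.</p>

```python
def normalize(name: str) -> str:
    """Lowercase and unify underscores and whitespace so surface variants compare equal."""
    return " ".join(name.lower().replace("_", " ").split())


def type_entities(
    entities: list[str],
    ner_types: dict[str, str],
    artwork_titles: list[str],
    artist_names: list[str],
) -> dict[str, str]:
    """Assign a type to each entity, preferring gazetteer matches over NER types."""
    artworks = {normalize(title) for title in artwork_titles}
    artists = {normalize(name) for name in artist_names}
    typed = {}
    for entity in entities:
        key = normalize(entity)
        if key in artworks:
            typed[entity] = "artwork"  # overrides, e.g., a wrong Person tag from NER
        elif key in artists:
            typed[entity] = "artist"  # refines the generic Person tag
        elif entity in ner_types:
            typed[entity] = ner_types[entity]
        # Entities matched by neither source stay untyped.
    return typed


# Tiny stand-in gazetteers; the real ones hold thousands of entries.
entities = ["portrait_of_mary_cassatt", "Piet_Mondrian", "Paris", "album"]
ner_types = {"portrait_of_mary_cassatt": "Person", "Piet_Mondrian": "Person", "Paris": "GPE"}
typed = type_entities(
    entities,
    ner_types,
    artwork_titles=["Portrait of Mary Cassatt"],
    artist_names=["Piet Mondrian"],
)
print(typed)
```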
          <p>However, the process of entity typing described above is only able to identify and tag around
half of the entities in our KG. Several domain- and corpus-specific challenges acted as bottlenecks
during this process. For example, even after filtering, some triples extracted from Open IE
contained either subject or object noun phrases that were generic and did not correspond to any
named entity. Examples of such phrases include essay, anthology, periodical, or album that are
present in triples such as ⟨album, be_shown_in, Paris⟩. Without designing a custom ontology
for this corpus, such entities cannot be hoped to be correctly typed.</p>
          <p>The categorization of the relations in the KG is a particularly complicated task due to the
wide variety of relations extracted by the Open IE process. Some of the most frequent relations
in the KG are will, be_in, have, show, paint, work etc. We expected that the types of the entities
could be utilized to find patterns and link the most popular edges in the KG to the relations
in existing graphs such as Wikidata or ArtGraph. However, preliminary analysis led to some
interesting observations. Firstly, we noted the presence of multiple relations between pairs of
entities in the KG. For example, Picasso and June are connected by various relations such as
will_be, work and take_trip_in that were extracted from different contexts in the corpus and
represent separate meaningful facts. Furthermore, in general, there are several different types
of semantic relations between the popular entity types in our KG. For instance, two entities
of the type artist are connected by several relations including work, meet, know_well, be_with,
friend_of and be_admirer_of. While this variety indicates that a large number of interesting
facts have been derived by Open IE in the absence of a fixed and limiting schema, normalizing
the relations to improve the quality of the KG is a difficult task that is part of ongoing and
future work.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Art-historic Knowledge Graph</title>
      <p>The statistics of the KG generated from the steps as described in the previous section are shown
in Table 1.</p>
      <sec id="sec-4-1">
        <title>4.1. Graph Features</title>
        <p>
          After obtaining the refined set of triples for the first version of the art-historic KG, we performed
a preliminary analysis of the graph to derive useful insights with the help of the NetworkX<sup>7</sup>
package. To understand the graph structure, the number of disconnected components of
the graph was measured before and after the canonicalization step. It was noticed that the
number of disconnected components was reduced to around 1,500 (down from 2,500) after
clustering with CESI. This indicates that canonicalization of entities and relations improved
the quality of the knowledge graph by removing unnecessary disconnected parts that were
created through redundant triples. Additionally, we analyzed node centrality on the graph
using eigenvector centrality [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ] and link analysis using PageRank [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]. For both the measures,
the node for Pablo Picasso was the most central. This confirms the property of the underlying
dataset, which is focused on Picasso. Other central nodes corresponded to
frequent words in the corpus such as work, artist, painting etc. Overall, it is promising
that the centrality analysis of the generated KG agrees well with the main entities and
topics of the underlying corpus. A hand-picked example of a subset of the neighborhood of the
entity Picasso is shown in Fig. 2.
        </p>
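        <p>The effect of canonicalization on the number of disconnected components can be illustrated with a short sketch. It uses a plain union-find structure instead of NetworkX (whose number_connected_components function gives the same count on an undirected graph), and the triples and cluster representatives are made up.</p>

```python
Triple = tuple[str, str, str]


def count_components(triples: list[Triple]) -> int:
    """Count connected components, treating every triple as an undirected edge."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps the trees shallow
            node = parent[node]
        return node

    for subj, _pred, obj in triples:
        parent[find(subj)] = find(obj)  # union the two endpoints
    return len({find(node) for node in list(parent)})


def canonicalize(triples: list[Triple], representative: dict[str, str]) -> list[Triple]:
    """Replace each subject and object by its cluster representative, if it has one."""
    return [(representative.get(s, s), p, representative.get(o, o)) for s, p, o in triples]


# Made-up triples and clusters, for illustration only.
triples = [
    ("picasso", "paint", "guernica"),
    ("pablo_picasso", "live_in", "paris"),
    ("p. picasso", "meet", "georges_braque"),
]
representative = {"picasso": "pablo_picasso", "p. picasso": "pablo_picasso"}

before = count_components(triples)
after = count_components(canonicalize(triples, representative))
print(before, after)
```

        <p>Here three surface forms of the same artist initially yield three separate components, which collapse into one once they share a representative.</p>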
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation</title>
        <p>
          Due to the lack of any gold standard for direct comparison, the evaluation of the resulting
KG proved challenging. While measuring the absolute coverage of any KG is non-trivial
due to the open world assumption [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], we attempted a limited, semi-automated evaluation of the coverage of our KG.
For this, we first created a subset of Wikidata [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] by querying for triples about the entity Picasso and used this as the
knowledge graph for comparison. This is motivated by the fact that Wikidata contains
high-quality information about Picasso and that the entity linking used in our pipeline
links to Wikipedia (hence, Wikidata) entities. Therefore, the surface forms of entities in our KG
were likely to match Wikipedia entities more often than those of other
datasets such as DBpedia.
        </p>
        <fig>
          <label>Figure 3</label>
          <caption>
            <p>(a) Artists exhibited at (corresponding query: MATCH p=(:Artist)-[r:exhibitat]-&gt;() RETURN p). Nodes shown include Jacques Villon, Georges Braque, Wassily Kandinsky, salon_des_independants, photo_secession_gallery and annual_exhibition. (b) Picasso involved in various art schools (corresponding query: MATCH p=(s)-[r:involvedin]-&gt;() WHERE s.name="Pablo Picasso" RETURN p).</p>
          </caption>
        </fig>
        <p>From this Wikidata subset, we randomly selected 100 triples about Picasso and about
museums that own his works. After careful manual
inspection (independently by three annotators, with conflicts resolved through discussion), we
found that the facts represented in 43% of these triples were also present in our KG,
either as a direct match or in a different form with the same meaning. Notably, our KG was missing
information about the museums that own Picasso’s works because our underlying corpus
also lacks comprehensive information on this topic; triples relating to museums
from Wikidata could therefore not be matched. Additionally, we checked how many of our entities and
entity pairs are written in exactly the same way as in the Wikidata graph. Overall, around 12% of
entities and 10% of entity pairs in our graph have exact matches in Wikidata. These preliminary
results are promising and point towards the need for a domain-oriented construction process
for further improvement of the art-historic KG. In particular, the precision of the triples in the
art-historic KG is more important to the users; therefore, the triples
that were extracted from our dataset but are not found in Wikidata need to be factually verified
with the help of domain experts.</p>
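        <p>The exact-match check described above could be computed along the following lines. This is a minimal sketch under stated assumptions: the helper names, the treatment of entity pairs as ordered (head, tail) pairs, and both toy triple lists are invented for illustration; in particular, the reference list does not reproduce actual Wikidata statements.</p>
        <preformat><![CDATA[
```python
def entities(triples):
    """Return the set of entity surface forms used as head or tail."""
    return {entity for head, _, tail in triples for entity in (head, tail)}


def entity_pairs(triples):
    """Return the set of ordered (head, tail) pairs, ignoring the relation."""
    return {(head, tail) for head, _, tail in triples}


def exact_match_rates(kg_triples, reference_triples):
    """Share of KG entities and KG entity pairs that appear verbatim in the reference."""
    kg_entities = entities(kg_triples)
    kg_pairs = entity_pairs(kg_triples)
    entity_rate = len(kg_entities & entities(reference_triples)) / len(kg_entities)
    pair_rate = len(kg_pairs & entity_pairs(reference_triples)) / len(kg_pairs)
    return entity_rate, pair_rate


# Toy triples from a generated KG (illustrative only).
kg = [
    ("Pablo Picasso", "beinfluenceby", "Aubrey Beardsley"),
    ("Appel", "beinfluenceby", "Pablo Picasso"),
    ("Georges Braque", "exhibitat", "salon_des_independants"),
]

# Toy reference triples (illustrative only, not real Wikidata content).
reference = [
    ("Pablo Picasso", "influenced by", "Aubrey Beardsley"),
    ("Georges Braque", "movement", "Cubism"),
]

entity_rate, pair_rate = exact_match_rates(kg, reference)
print(f"entities: {entity_rate:.0%}, entity pairs: {pair_rate:.0%}")
```
]]></preformat>
        <p>Because the comparison is purely string-based, a surname-only entity such as Appel counts as a miss even when the reference contains the same person under a full name, which is one reason exact-match rates tend to understate the true overlap.</p>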
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Implementation</title>
        <p>
          Taking a cue from related work [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], we have encoded our KG data into Neo4j<sup>8</sup>, a NoSQL
graph database that provides an efficient way of capturing the diverse connections between
the different entities of our knowledge graph. Additionally, the knowledge graph stored in the
Neo4j database can be queried easily with the Cypher language to enable data
exploration and knowledge discovery. Fig. 3 shows the results of a few example queries that
can be executed on the KG: venues where Picasso and other artists exhibited their work,
and various art schools or movements in which Picasso was involved. Further, Fig. 4 shows the
persons and/or art styles that Picasso influenced or was influenced by. In some cases, interesting
connections with other relevant entities are also retrieved, providing useful cues for further
exploration of the data in the KG by domain experts as well as interested users.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Error Analysis</title>
      <p>Because the source corpus is heterogeneous and noisy, the Open IE process led to a number
of incorrect triples in the KG despite our best efforts to eliminate the noise at each step. Here,
we perform a critical analysis and look deeper into the quality of the triples in the first version
of the KG. For this, we sample a few of the incorrectly extracted triples to understand the nature
of the mistakes made by the automated KG generation process. Table 2 presents some triples
in the KG and the corresponding text snippets in the input data from which they were extracted.</p>
      <p>In T1, even though the triple appears to be syntactically correct, the actual entity corresponds
to the entire phrase The Third of May 1808 in Madrid, which is an artwork, and thus the correct
triples should relate this artwork to the corresponding artist Francisco de Goya, perhaps including
the date 1814 as well. This example illustrates the difficulty of recognizing artwork titles, given
that they usually contain other entities like Madrid (location). A similar mistake can be seen
in T6. Here Appel was incorrectly recognized as a location instead of the surname of Karel
Appel (person), and thus the triple represents an influence of an artist on
a location rather than an influence between two artists.</p>
      <p>Examples T2 to T6 show the triples and the supporting text snippets for the results
of the query depicted in Figure 4, which contain a mixture of factually correct, factually
incorrect, and speculative facts. In T2, a relation was correctly extracted from the text, but the
head entity was incorrectly recognized as ‘American’. This example shows the need for
additional work on co-reference resolution in order to properly follow the connections in the
text. A more precise triple would have been ⟨Gorky, beInfluenceBy, Pablo Picasso⟩.</p>
      <p>T3 is an example in which the lack of context in the syntactic analysis of the sentence results
in the assumption that the statement is true, although it is a suggestion by a specific person
and therefore not necessarily a fact. A similar example is T4, in which the source text
describes a potential influence relation between the artists that cannot be directly assumed
to be a fact. These two examples illustrate that the context of the original text may get lost
during the extraction process, which may lead to erroneous facts being represented in the KG.
Thus, it is important to retain provenance information that helps the user
understand the full context and obtain the correct information.</p>
      <fig>
        <label>Figure 4</label>
        <caption>
          <p>Persons and art styles that Picasso influenced or was influenced by (relation beinfluenceby). Nodes shown around Pablo Picasso: Morris Louis, roman_art, american, guevara, Aubrey Beardsley, philpot, Appel and femme_couchee. Node type legend: City, Artist, NORP, None.</p>
        </caption>
      </fig>
      <table-wrap>
        <label>Table 2</label>
        <caption>
          <p>Triples in the KG and the text snippets from which they were extracted.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>ID</th>
              <th>Triple</th>
              <th>Text snippet</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>T2</td>
              <td>⟨American, beInfluenceBy, Pablo Picasso⟩</td>
              <td>The more one examines Gorky’s early works, the more they appear like Gorkys rather than like Picassos. Moreover, his unabashed borrowings can be seen as forward-looking: for an American to be influenced by Picasso in the heyday of American Scene painting was, art historian Meyer Schapiro points out, “an act of originality.”</td>
            </tr>
            <tr>
              <td>T3</td>
              <td>⟨Pablo Picasso, beInfluenceBy, Morris Louis⟩</td>
              <td>. . . to Andrew Hudson, art critic of The Washington Post, for suggesting that Pablo Picasso has been influenced by Morris Louis and Kenneth Noland, two leaders of the “post-painterly” Washington, D.C.</td>
            </tr>
            <tr>
              <td>T4</td>
              <td>⟨Guevara, beInfluenceBy, Pablo Picasso⟩</td>
              <td>It is probable that Guevara was influenced by Picasso to experiment with the encaustic technique, which had been practised in antiquity. Hot wax was used as a medium for mixing floral and vegetable dyes.</td>
            </tr>
            <tr>
              <td>T5</td>
              <td>⟨Pablo Picasso, beInfluenceBy, Aubrey Beardsley⟩</td>
              <td>Picasso was influenced doubtless by Aubrey Beardsley, who had died in 1899 at the age of twenty-six, but then what an excellent influence it proved to be for this portrait!</td>
            </tr>
            <tr>
              <td>T6</td>
              <td>⟨Appel, beInfluenceBy, Pablo Picasso⟩</td>
              <td>In artistic respect, one could also see, that Karel Appel was strongly influenced in this period, by Picasso and Miro.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>A different scenario is depicted in T5, in which the text clearly confirms the validity of the fact.
One interesting observation concerns the syntactic structure of the relation phrase: the word
‘doubtless’ acts as an adverb emphasizing the validity of the fact, and although it splits the
relation phrase ‘was influenced by’, the syntactic analyzer and the canonicalization step were
able to normalize the relation to a canonical form. This is also evident in the diversity of relation
phrases in this sample of texts: they are expressed in different tenses, with auxiliary verbs, and
sometimes spread within a more complex sentence, as seen in T5. Examples T3 to T6 illustrate
the need for fact-checking in our KG. In particular, the facts in the KG could be presented to
domain experts, who would be able to look at the information in a user-friendly manner
and then investigate further to either corroborate or contradict the triples in the
automatically generated KG. We envision easy access to, and scrutiny of, the information stored
in large text collections as the primary use case of this automatically generated art-historic
KG.</p>
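      <p>One way to surface speculative triples such as T3 and T4 for expert review is to keep the supporting snippet with each fact and to flag hedging vocabulary in it. The sketch below is only an illustration of that idea and not part of the presented pipeline: the Fact class, the cue list and the document identifiers are hypothetical, and a fixed word list would miss many hedged statements.</p>
      <preformat><![CDATA[
```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately small list of hedging cues.
HEDGE_CUES = ("probable", "probably", "suggesting", "suggests", "perhaps", "possibly")


@dataclass
class Fact:
    head: str
    relation: str
    tail: str
    snippet: str  # supporting text kept as provenance
    source: str   # hypothetical document identifier

    def hedge_cues(self):
        """Return the hedging cues that occur as whole words in the snippet."""
        text = self.snippet.lower()
        return [cue for cue in HEDGE_CUES
                if re.search(r"\b" + re.escape(cue) + r"\b", text)]

    def needs_review(self):
        """A fact is queued for expert review if its snippet contains any cue."""
        return bool(self.hedge_cues())


facts = [
    Fact("Pablo Picasso", "beInfluenceBy", "Morris Louis",
         "for suggesting that Pablo Picasso has been influenced by Morris Louis "
         "and Kenneth Noland", "doc-001"),
    Fact("Guevara", "beInfluenceBy", "Pablo Picasso",
         "It is probable that Guevara was influenced by Picasso to experiment "
         "with the encaustic technique", "doc-002"),
    Fact("Pablo Picasso", "beInfluenceBy", "Aubrey Beardsley",
         "Picasso was influenced doubtless by Aubrey Beardsley, who had died "
         "in 1899 at the age of twenty-six", "doc-003"),
]

for fact in facts:
    status = "review" if fact.needs_review() else "ok"
    print(status, (fact.head, fact.relation, fact.tail), fact.hedge_cues())
```
]]></preformat>
      <p>With these snippets, the first two facts (T3 and T4) are flagged and the third (T5) is not. Entity errors such as those in T2 and T6 are not detected by this kind of check and would still require separate verification.</p>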
    </sec>
    <sec id="sec-6">
      <title>6. Lessons Learned and Future Work</title>
      <p>
        This work presented a first attempt at constructing a domain-oriented knowledge graph for
the art domain in an automated fashion with Open IE techniques. Due to the noisy and
heterogeneous dataset that is typical of digitized art-historic collections, we encountered challenges
at various steps of the KG construction process. During the very first step, it was difficult to
correctly identify the mentions of artworks (i.e., titles of paintings) in the dataset due to the
noise and inherent ambiguities. This domain-specific issue needs further attention in order to
improve the quality as well as the coverage of the resulting KG, as discussed in detail in previous
work [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]. In addition, a co-reference resolution tool [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ] could also help with the identification
and linking of relevant entities.
      </p>
      <p>
        While the Open IE approach allowed for the extraction of a wide variety of entities and
relations, it also made canonicalization a complicated task. We observed that existing
techniques for canonicalization on generic datasets, such as CESI, do not show comparable
performance on domain-specific datasets. It would be interesting to investigate whether
pretrained embeddings and language models such as FastText and BERT could compete with the
relatively older KG embeddings that were employed in CESI for obtaining better clusters. There are other recent
works on canonicalization [
        <xref ref-type="bibr" rid="ref42 ref43">42, 43</xref>
        ] that demonstrate better results and would be worth exploring
further for our use case in future work. Another important aspect is the incomplete tagging of
the various types of entities obtained from Open IE. Owing again to the noise in the
process, as well as to the lack of any underlying schema, many entities could not be assigned their
correct type. This task needs further exploration for the enrichment of the KG.
      </p>
      <p>Moreover, we have only considered English texts in this work so far, since the existing
methods show their best performance with English texts. However, our art-historic collection
comprises texts in multiple languages, and we would like to expand the pipeline to process
multilingual texts. Taking into account the existing limitations of the methods with domain-specific
corpora, this seems to be an arduous but interesting research challenge.</p>
      <p>
        With regard to the implementation of the KG pipeline, while we have so far used off-the-shelf
tools and libraries like spaCy, Stanford CoreNLP and CESI, we plan to further fine-tune them
for the task of domain-specific KG construction. It will also be worthwhile to explore and
evaluate the performance of other available tools such as Flair [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ] and Blink [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ] for entity
recognition, linking and typing, as well as OpenIE [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and MinIE [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] for the extraction of
triples. The scalability of these approaches and the completeness of the resulting KG in the
presence of new and expanding cultural heritage datasets are also open research questions to
be looked into.
      </p>
      <p>The evaluation of the art-historic KG is also a crucial task worth discussing. While we have
performed a semi-automated evaluation for the first version of our KG, a more rigorous and
thorough evaluation of the correctness of the facts is certainly imperative before this KG can
be useful to a non-expert user (as discussed in Section 5). One way to ensure this would be
to maintain the provenance of the facts in the KG, in terms of their source document as
well as their confidence measure. This could also facilitate a fair and complementary manual
evaluation in terms of precision and recall, which could provide further insights. For this, we
plan to closely collaborate with domain experts and enlist their help in the near future.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this work, we have presented our approach to constructing an art-historic KG from digitized
texts in an automated manner. We have leveraged existing Open IE tools for various stages
of the KG construction process and discussed the limitations and challenges of adapting
these generic tools to domain-specific datasets. We have presented these insights with the
hope of encouraging interesting dialogue and further progress along these lines. While our
limited initial analysis and evaluation have shown encouraging results, they have also clearly
indicated points of improvement for creating a more refined and comprehensive
version of an art-historic KG that could be used for downstream tasks such as search and
querying.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , C. d'Amato,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Melo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Kleef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , et al.,
          <article-title>DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          ,
          <source>Semantic Web</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <fpage>167</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: A free collaborative knowledge base</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kejriwal</surname>
          </string-name>
          ,
          <article-title>Domain-specific knowledge graph construction</article-title>
          , Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ernst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siu</surname>
          </string-name>
          , G. Weikum,
          <article-title>Knowlife: A knowledge graph for health and life sciences</article-title>
          ,
          <source>in: Proceedings of the 30th International Conference on Data Engineering</source>
          , IEEE,
          <year>2014</year>
          , pp.
          <fpage>1254</fpage>
          -
          <lpage>1257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ehrhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ferret</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Teyssou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tannier</surname>
          </string-name>
          ,
          <article-title>Searching news articles using an event knowledge graph leveraged by Wikidata</article-title>
          ,
          <source>in: Companion Proceedings of The 2019 World Wide Web Conference</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1232</fpage>
          -
          <lpage>1239</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Ciampaglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shiralkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bollen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Menczer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Flammini</surname>
          </string-name>
          ,
          <article-title>Computational fact checking from knowledge networks</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>10</volume>
          (
          <year>2015</year>
          )
          <elocation-id>e0128193</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>De Sa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          ,
          <article-title>Incremental Knowledge Base Construction using DeepDive</article-title>
          ,
          <source>in: Proceedings of the VLDB Endowment International Conference on Very Large Data Bases</source>
          , volume
          <volume>8</volume>
          ,
          <year>2015</year>
          , p.
          <fpage>1310</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Betteridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kisiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Settles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. R.</given-names>
            <surname>Hruschka</surname>
          </string-name>
          , T. M. Mitchell,
          <article-title>Toward an Architecture for Never-Ending Language Learning</article-title>
          ,
          <source>in: Proceedings of the 24th AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1306</fpage>
          -
          <lpage>1313</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Banko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <article-title>Open information extraction from the web</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>51</volume>
          (
          <year>2008</year>
          )
          <fpage>68</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <article-title>Identifying Relations for Open Information Extraction</article-title>
          ,
          <source>in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>1535</fpage>
          -
          <lpage>1545</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Banko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Broadhead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Cafarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          ,
          <article-title>TextRunner: Open Information Extraction on the Web</article-title>
          ,
          <source>in: Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Angeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J. J.</given-names>
            <surname>Premkumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>Leveraging linguistic structure for open domain information extraction</article-title>
          ,
          <source>in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          (Volume
          <volume>1</volume>
          : Long Papers),
          <year>2015</year>
          , pp.
          <fpage>344</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Corro</surname>
          </string-name>
          , R. Gemulla,
          <article-title>ClausIE: Clause-based open information extraction</article-title>
          ,
          <source>Proceedings of the 22nd International Conference on World Wide Web</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gashteovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gemulla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>del Corro</surname>
          </string-name>
          ,
          <article-title>MinIE: Minimizing facts in open information extraction</article-title>
          ,
          <source>in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Copenhagen, Denmark,
          <year>2017</year>
          , pp.
          <fpage>2630</fpage>
          -
          <lpage>2640</lpage>
          . URL: https://aclanthology.org/D17-1278. doi:10.18653/v1/D17-1278.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kolluru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Adlakha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , Mausam,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          Association for Computational Linguistics
          , Online,
          <year>2020</year>
          , pp.
          <fpage>3748</fpage>
          -
          <lpage>3761</lpage>
          . URL: https://aclanthology.org/2020.emnlp-main.306. doi:10.18653/v1/2020.emnlp-main.306.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Galárraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Amarilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <article-title>Predicting Completeness in Knowledge Bases</article-title>
          ,
          <source>in: Proceedings of the 10th ACM International Conference on Web Search and Data Mining</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>375</fpage>
          -
          <lpage>383</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <article-title>Domain-Specific Knowledge Graph Construction for Semantic Analysis</article-title>
          ,
          <source>in: Proceedings of the Extended Semantic Web Conference (ESWC) 2020 Satellite Events</source>
          , Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>250</fpage>
          -
          <lpage>260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>PaintKG: The painting knowledge graph using biLSTM-CRF</article-title>
          ,
          <source>in: Proceedings of the 2020 International Conference on Information Science and Education (ICISE-IE)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>412</fpage>
          -
          <lpage>417</lpage>
          . doi:10.1109/ICISE51755.2020.00094.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Odat</surname>
          </string-name>
          ,
          <article-title>Building a Semantic Knowledge-base for Painting Conservators</article-title>
          ,
          <source>in: Proceedings of the 2011 IEEE Seventh International Conference on eScience</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>180</lpage>
          . doi:10.1109/eScience.2011.32.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Carriero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Mancinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marinucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Veninata</surname>
          </string-name>
          ,
          <article-title>ArCo: The Italian cultural heritage knowledge graph</article-title>
          ,
          <source>in: International Semantic Web Conference</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>G.</given-names>
            <surname>Castellano</surname>
          </string-name>
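          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vessio</surname>
          </string-name>
          ,
          <article-title>ArtGraph: Towards an Artistic Knowledge Graph</article-title>
          ,
          <source>arXiv e-prints</source>
          (
          <year>2021</year>
          ) arXiv-2105.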
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Oramas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Espinosa-Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sordo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Serra</surname>
          </string-name>
          ,
          <article-title>Information extraction for knowledge base construction in the music domain</article-title>
          ,
          <source>Data and Knowledge Engineering</source>
          <volume>106</volume>
          (
          <year>2016</year>
          )
          <fpage>70</fpage>
          -
          <lpage>83</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0169023X16300416. doi:10.1016/j.datak.2016.06.001.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>O.</given-names>
            <surname>Vsesviatska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tietz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hoppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sprau</surname>
          </string-name>
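          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sack</surname>
          </string-name>
          ,
          <article-title>ArDO: An ontology to describe the dynamics of multimedia archival records</article-title>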
          ,
          <source>in: Proceedings of the 36th Annual ACM Symposium on Applied Computing</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1855</fpage>
          -
          <lpage>1863</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tietz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Waitelonis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Felgentreff</surname>
          </string-name>
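          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,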
          <string-name>
            <given-names>H.</given-names>
            <surname>Sack</surname>
          </string-name>
          ,
          <article-title>Linked Stage Graph</article-title>
          ,
          <source>in: SEMANTICS Posters&amp;Demos</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>D.</given-names>
            <surname>Oldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Labs</surname>
          </string-name>
          ,
          <article-title>The CIDOC Conceptual Reference Model (CIDOC-CRM): PRIMER</article-title>
          ,
          <source>CIDOC-CRM official web site</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dijkshoorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jongma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Aroyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Van Ossenbruggen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Schreiber</surname>
          </string-name>
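          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>ter Weele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          ,
          <article-title>The Rijksmuseum Collection as Linked Data</article-title>
          ,
          <source>Semantic Web</source>
          <volume>9</volume>
          (
          <year>2018</year>
          )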
          <fpage>221</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S.</given-names>
            <surname>Van Hooland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <article-title>Linked Data for Libraries, Archives and Museums: How to clean, link and publish your metadata</article-title>
          , Facet Publishing,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Surdeanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Bethard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McClosky</surname>
          </string-name>
          ,
          <article-title>The Stanford CoreNLP natural language processing toolkit</article-title>
          ,
          <source>in: Association for Computational Linguistics (ACL) System Demonstrations</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          . URL: http://www.aclweb.org/anthology/P/P14/P14-5010.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Kabir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sawada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>HDSKG: Harvesting domain specific knowledge graph from content of webpages</article-title>
          ,
          <source>in: Proceedings of the 24th International Conference on Software Analysis, Evolution and Re-engineering (SANER)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <article-title>AKMiner: Domain-specific knowledge graph mining from academic literatures</article-title>
          ,
          <source>in: Proceedings of the International Conference on Web Information Systems Engineering</source>
          , Springer,
          <year>2013</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>Constructing biomedical domain-specific knowledge graph with minimum supervision</article-title>
          ,
          <source>Knowledge and Information Systems</source>
          <volume>62</volume>
          (
          <year>2020</year>
          )
          <fpage>317</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>F.</given-names>
            <surname>Belleau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Nolin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tourigny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rigault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Morissette</surname>
          </string-name>
          ,
          <article-title>Bio2RDF: Towards a mashup to build bioinformatics knowledge systems</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>41</volume>
          (
          <year>2008</year>
          )
          <fpage>706</fpage>
          -
          <lpage>716</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ernst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <article-title>KnowLife: A versatile approach for constructing a large knowledge graph for biomedical sciences</article-title>
          ,
          <source>BMC Bioinformatics</source>
          <volume>16</volume>
          (
          <year>2015</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krestel</surname>
          </string-name>
          ,
          <article-title>Who is Mona L.? Identifying Mentions of Artworks in Historical Archives</article-title>
          ,
          <source>in: Proceedings of the International Conference on Theory and Practice of Digital Libraries</source>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>115</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>P.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bolton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>Stanza: A Python natural language processing toolkit for many human languages</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          ,
          <year>2020</year>
          . URL: https://nlp.stanford.edu/pubs/qi2020stanza.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vashishth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Talukdar</surname>
          </string-name>
          ,
          <article-title>CESI: Canonicalizing open knowledge bases using embeddings and side information</article-title>
          ,
          <source>in: Proceedings of the 2018 World Wide Web Conference</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1317</fpage>
          -
          <lpage>1327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rosasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Poggio</surname>
          </string-name>
          ,
          <article-title>Holographic embeddings of knowledge graphs</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>30</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonacich</surname>
          </string-name>
          ,
          <article-title>Power and centrality: A family of measures</article-title>
          ,
          <source>American Journal of Sociology</source>
          <volume>92</volume>
          (
          <year>1987</year>
          )
          <fpage>1170</fpage>
          -
          <lpage>1182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>L.</given-names>
            <surname>Page</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Winograd</surname>
          </string-name>
          ,
          <article-title>The PageRank Citation Ranking: Bringing Order to the Web</article-title>
          ,
          <source>Technical Report 1999-66</source>
          , Stanford InfoLab,
          <year>1999</year>
          . URL: http://ilpubs.stanford.edu:8090/422/, previous number = SIDL-WP-1999-0120.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning for mention-ranking coreference models</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2016</year>
          . URL: https://nlp.stanford.edu/pubs/clark2016deep.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>Canonicalizing Open Knowledge Bases with Multi-Layered Meta-Graph Neural Network</article-title>
          ,
          <source>CoRR</source>
          abs/2006.09610 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2006.09610. arXiv:2006.09610.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rossiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bagchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gliozzo</surname>
          </string-name>
          ,
          <article-title>Open knowledge graphs canonicalization using variational autoencoders</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Online and Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>10379</fpage>
          -
          <lpage>10394</lpage>
          . URL: https://aclanthology.org/2021.emnlp-main.811. doi:10.18653/v1/2021.emnlp-main.811.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>A.</given-names>
            <surname>Akbik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blythe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schweter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          ,
          <article-title>FLAIR: An easy-to-use framework for state-of-the-art NLP</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)</source>
          , Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>59</lpage>
          . URL: https://aclanthology.org/N19-4010. doi:10.18653/v1/N19-4010.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Josifoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>Scalable Zero-shot Entity Linking with Dense Entity Retrieval</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>6397</fpage>
          -
          <lpage>6407</lpage>
          . URL: https://aclanthology.org/2020.emnlp-main.519. doi:10.18653/v1/2020.emnlp-main.519.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>