<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vani Kanjirangat</string-name>
          <email>vanik@idsia.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Antonucci</string-name>
          <email>alessandro@idsia.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), USI-SUPSI</institution>
          ,
          <addr-line>Lugano</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
<p>Edge labelling is one of the most challenging processes in knowledge graph creation for unsupervised domains. Abstracting the relations between the entities, extracted in the form of triplets, and assigning a single label to a cluster of relations might be quite difficult without supervision, and tedious if based on manual annotations. This seems to be particularly the case for applications in literary text understanding, which is the focus of this paper. We present a simple but efficient way to label the edges between the character entities in the knowledge graph extracted from a novel or a short story, using a two-level clustering based on BERT embeddings with supersenses and hypernyms. The lack of benchmark datasets in the literary domain poses significant challenges for evaluation. In this work-in-progress paper, we discuss preliminary results to understand the potential for further research.</p>
</abstract>
      <kwd-group>
        <kwd>Edge Labels</kwd>
        <kwd>Verb Clusters</kwd>
        <kwd>Supersenses</kwd>
        <kwd>Lowest Common Hypernyms</kwd>
        <kwd>Knowledge Graphs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Extracting structured information from narrative texts is a significant challenge for contemporary AI. The complexity further increases in the case of literary texts because of the possibly ambiguous usage of words, neologisms, unique authorial writing styles, and many other subtle linguistic aspects. In fact, the analysis of literary texts involves various complex steps, such as the identification of the main characters and their relations and their typification (e.g., gender, partnerships, goodness). Moreover, the high variance in style and the lexicon with frequent use of neologisms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and figures of speech [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] further complicates the scenario. Most of
the past explorations are limited to particular application areas, such as biomedical literature
[
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], or news and social media analysis [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Different embedding techniques and the more recent attention-based models, including transformers, evolved as the state of the art for both unsupervised and supervised NLP tasks [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref7 ref8 ref9">7, 8, 9, 10, 11, 12, 13</xref>
        ].
      </p>
      <p>Identifying more abstract and meaningful edge labels for unsupervised knowledge graph extraction, and evaluating them, is a challenging process. We report here the current state of our work in the field, with preliminary experiments on the unsupervised edge labelling of knowledge graphs extracted from literary texts. A simple technique to label the edges in a reasonable way is evaluated. The code is already available in a public repository (github.com/IDSIA/novel2graph).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The onset of deep learning has given the drive to powerful data processing models to ease
NLP applications. For knowledge graphs (KGs), deep models are used to embed the triplet
information and address tasks such as link predictions and graph completion [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ] and
training embeddings [
        <xref ref-type="bibr" rid="ref16">16, 17, 18, 19</xref>
        ]. Another major shift was the introduction of attention,
and transformer models [20], with many works that adopting attention mechanisms for KG
completion and learning tasks [21, 22, 23]. There has been also focus towards unsupervised
learning of KG embeddings [24, 25].
      </p>
      <p>The automatic interpretation and visual analysis of literary texts have been explored from various perspectives in the past few years. In [26], literary characters and their network associations have been studied, while in [27] sentiment relations between (Shakespeare’s) characters have been analysed. When it comes to unsupervised KG construction, a combination of classical and deep learning NLP techniques is usually required.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>Let us first briefly discuss the entity extraction process, a necessary preprocessing step already studied in our previous works, before approaching the edge labelling approach. Since we are dealing with literary text, our entities are the characters in the given input novel or short story, which exhibit various characteristics and relations. As in [28], we used the Stanford Named Entity Recognition Tagger¹ together with character de-aliasing, i.e., unifying character names that may be referred to in different ways (e.g., Ron and Ronald). This is achieved by the DBSCAN clustering algorithm [29] paired with Levenshtein string distances. We use the partial_ratio method provided by the fuzzywuzzy module² to compute the distance matrix. This is followed by coreference resolution³ using the Stanford package and some heuristic adjustments. Each character entity is eventually represented by a unique identifier.
These entities define the nodes of the KG. The next step is to label the edges connecting these
nodes, which is the major focus of the present work.</p>
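      <p>The de-aliasing step above can be sketched in a few lines. This is a simplified, self-contained illustration, not the exact pipeline: it replaces fuzzywuzzy's partial_ratio with a difflib-based stand-in and DBSCAN with a greedy single-link grouping, and the threshold value is our own assumption.</p>

```python
from difflib import SequenceMatcher

def partial_ratio(a: str, b: str) -> int:
    # Simplified stand-in for fuzzywuzzy's partial_ratio: best match of the
    # shorter string against same-length substrings of the longer one.
    shorter, longer = sorted((a.lower(), b.lower()), key=len)
    best = 0.0
    for i in range(len(longer) - len(shorter) + 1):
        window = longer[i:i + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(best * 100)

def dealias(names, threshold=80):
    # Greedy single-link grouping: a name joins a cluster when its similarity
    # to any member exceeds the threshold (a simplified stand-in for DBSCAN
    # over the Levenshtein-style distance matrix used in the paper).
    clusters = []
    for name in names:
        for cluster in clusters:
            if any(partial_ratio(name, member) >= threshold for member in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

      <p>On a toy input such as Ron, Ronald, Harry, Harry Potter, the grouping yields one cluster per character, each of which would then be mapped to a unique identifier.</p>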
      <sec id="sec-3-1">
        <title>3.1. Verb Extraction and Embedding</title>
        <p>Following [28], we extract all the sentences containing two characters/entities and exclude self-relations (e.g., Harry, I am Harry Potter). For simplicity, we also prune the sentences where the second character appears at the end of the sentence (e.g., said Harry). To split the larger sentences, we use a constituency parsing tree⁴ to extract subtrees. Our approach traverses the tree using a depth-first search and extracts, starting from the bottom of the tree, each phrase (S) containing at least one noun phrase (NP) and one verb phrase (VP). For instance, consider the sentence:</p>
        <p>CHAR0 is talking to CHAR1, while CHAR1 is cooking for CHAR2.</p>
        <p>¹https://nlp.stanford.edu/software/CRF-NER.html ²https://pypi.org/project/fuzzywuzzy ³https://nlp.stanford.edu/projects/coref.shtml ⁴https://stanfordnlp.github.io/CoreNLP/parse.html</p>
        <p>The constituency parsing tree returns two extracted phrases (CHAR0 is talking to CHAR1 and CHAR1 is cooking for CHAR2), as depicted in Fig. 1.</p>
        <p>[Figure 1: constituency parse tree of the example sentence, showing the two extracted subtrees CHAR0 is talking to CHAR1 and CHAR1 is cooking for CHAR2.]</p>
        <p>We refer to the set of output sentences as relational sentences. Once we have all the relational sentences, the next step is to extract a representative verb for each of them. Using Part-of-Speech (POS) tagging, we extract the verbs in these relational sentences. Further, we embed the sentences using Sentence-BERT (SBERT) [30] and extract the embeddings of the corresponding verbs. SBERT uses a Siamese network structure [31] to produce meaningful sentence encodings. Once we have the embedded verbs, we group similar verbs together: since the embeddings are meant to encode semantic and contextual information, sentences with similar vector representations are expected to express similar relations.</p>
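        <p>The depth-first phrase extraction can be sketched on a bracketed parse string. This is a toy re-implementation under our own assumptions, not the CoreNLP API: the tiny parser, the rule of skipping embedded SBAR clauses (since subordinate clauses become phrases of their own), and the function names are simplifications for illustration.</p>

```python
def parse(tree_str):
    # Parse a bracketed constituency tree into (label, children) nodes;
    # leaves are plain token strings.
    tokens = tree_str.replace("(", " ( ").replace(")", " ) ").split()
    def helper(i):
        lab = tokens[i + 1]  # tokens[i] is "("
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = helper(i)
            else:
                child, i = tokens[i], i + 1
            children.append(child)
        return (lab, children), i + 1
    return helper(0)[0]

def clause_words(node):
    # Words of a clause, skipping embedded SBARs: subordinate clauses
    # are extracted as relational phrases of their own.
    if not isinstance(node, tuple):
        return [node]
    if node[0] == "SBAR":
        return []
    return [w for child in node[1] for w in clause_words(child)]

def extract_phrases(node, out=None):
    # Depth-first traversal collecting every S subtree that dominates at
    # least one noun phrase (NP) and one verb phrase (VP).
    if out is None:
        out = []
    if isinstance(node, tuple):
        for child in node[1]:
            extract_phrases(child, out)
        kids = [c[0] for c in node[1] if isinstance(c, tuple)]
        if node[0] == "S" and "NP" in kids and "VP" in kids:
            out.append(" ".join(clause_words(node)))
    return out
```

      <p>Run on the parse of the example sentence, this yields exactly the two relational phrases of Fig. 1.</p>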
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Verb Clustering and Edge Labelling</title>
        <p>Algorithm 1: Verb Clustering (Level 1)</p>
        <p>Input: Extracted verbs [v1, v2, ..., vn], supersense categories (SCs)</p>
        <p>Output: Supersense-based verb clusters
1 Compute the embedding of each verb in [v1, v2, ..., vn];
2 if the verb belongs to a single SC then assign it to that SC;
3 else if the verb belongs to multiple SCs then
4   for each such SC do
5     remove the verb from the SC;
6     compute the average embedding of the SC over the remaining verbs;
7 if the verb belongs to no SC then compute the average embedding of each SC;
8 Compute the distance between the verb embedding and the average embeddings of the SCs;
9 Assign the verb to the SC at minimum distance;
Algorithm 2: Verb Clustering (Level 2)</p>
        <p>Input: Supersense-based verb clusters</p>
        <p>Output: Triplets (e1, r, e2)
1 for each supersense-based verb cluster do
2   take all the verb pairs;
3 for each verb pair do
4   compute the lowest common hypernyms (LCHs) and store them all;
5 Sort the LCHs based on their frequency;
6 for each verb do
7   associate it with the most common LCH;
8 if no LCH is associated with a verb then consider it an outlier;
9 Associate the relation label with the corresponding LCH;
10 Generate the triplets;</p>
        <p>To achieve this, we adopt the two-level verb clustering summarised by Algs. 1 and 2. The first step involves grouping the extracted verbs into supersense clusters, as given in Alg. 1. Supersense (SS) [32] is a terminology from WordNet [33], where words are grouped into sets of synonyms called synsets. Each synset is associated with one of 45 broader semantic categories (SSs): 26 for nouns, 15 for verbs, 3 for adjectives, and 1 for adverbs. This can be regarded as a coarse-grained word sense grouping, but it can be quite helpful for many NLP tasks. We focus on the verb SS categories only, as we consider the verbs in a sentence as the input. A word can belong to multiple SS categories (as a word can have different senses), and hence SS tagging or disambiguation is another challenging research problem. In the proposed approach, we consider the 15 verb SSs as the categories or clusters to which an input verb has to be assigned, namely {body, change, cognition, communication, competition, consumption, contact, creation, emotion, motion, perception, possession, social, stative, weather}. We then compute the embeddings of the extracted verbs with SBERT. Further, we follow the steps from 2 to 8 in Alg. 1 to assign each verb to a specific SS category. If the verb belongs to multiple SS categories or to none of them, we compute the average of all verb embeddings belonging to each SS category and assign the verb to the one at minimum cosine distance.</p>
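        <p>The assignment rule of Alg. 1 can be sketched as follows, with toy two-dimensional vectors standing in for the SBERT embeddings; the dictionary names, example vectors, and category inventory below are our own illustration, not the paper's data.</p>

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def assign_supersense(verb, embeddings, supersenses):
    # Alg. 1 sketch: `embeddings` maps verbs to vectors (SBERT in the paper,
    # toy vectors here); `supersenses` maps each SS category to member verbs.
    candidates = [ss for ss, verbs in supersenses.items() if verb in verbs]
    if len(candidates) == 1:
        return candidates[0]  # unambiguous: single SS membership
    # Ambiguous verb (several SSs) or unseen verb (no SS): compare against
    # the average embedding of each candidate SS, excluding the verb itself.
    pool = candidates or list(supersenses)
    def avg(ss):
        vecs = [embeddings[v] for v in supersenses[ss]
                if v != verb and v in embeddings]
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    return min(pool, key=lambda ss: cosine_distance(embeddings[verb], avg(ss)))
```

      <p>A verb listed under a single category is assigned directly; an ambiguous or unseen verb goes to the category whose average member embedding is closest in cosine distance.</p>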
        <p>The input to the second level, described in Alg. 2, is the set of SS-based verb clusters. We take all the verb pairs in a cluster and compute the lowest common hypernyms (LCHs), i.e., the lowest common ancestor nodes of the given synsets in the WordNet hierarchy. Since each verb can have multiple synsets, we can have multiple LCHs for a verb pair. These are then sorted by frequency of occurrence, which reflects the strength of association with the verb pair, and each verb is associated with its most common LCH. This LCH is considered as the edge label, and we generate the triplets (e1, r, e2), where r is the predicate/relation and e1 and e2 are the entities/characters. E.g., for the verb cluster {save, call, pass, share, give, take, spend, buy}, the output is {Synset(’move.v.02’): [’save’, ’call’, ’pass’, ’give’, ’take’], Synset(’act.v.01’): [’share’], Synset(’give.v.03’): [’spend’, ’buy’]}.</p>
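        <p>Alg. 2 can be sketched over a toy hypernym taxonomy. In the actual pipeline the hypernym paths come from WordNet synsets, so the hand-made paths and verb names below are purely illustrative assumptions.</p>

```python
from collections import Counter
from itertools import combinations

def lowest_common_hypernyms(path_a, path_b):
    # Each path lists a synset's hypernyms from the synset itself up to the
    # root; the LCH is the deepest node the two paths share.
    common = set(path_a) & set(path_b)
    if not common:
        return []
    depth = {node: i for i, node in enumerate(reversed(path_a))}
    return [max(common, key=lambda n: depth[n])]

def label_cluster(cluster, paths):
    # Alg. 2 sketch: count LCHs over all verb pairs, then label each verb
    # with the most frequent LCH compatible with its own hypernym path.
    counts = Counter()
    for a, b in combinations(cluster, 2):
        for lch in lowest_common_hypernyms(paths[a], paths[b]):
            counts[lch] += 1
    labels = {}
    for verb in cluster:
        cands = [l for l, _ in counts.most_common() if l in paths[verb]]
        labels[verb] = cands[0] if cands else None  # None marks an outlier
    return labels
```

      <p>Verbs that share no hypernym with the rest of the cluster receive no label and are treated as outliers, mirroring step 8 of Alg. 2.</p>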
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>We use the first six books of the Harry Potter series by J.K. Rowling (885‘943 words). Tab. 1
shows the statistics of the number of sentences extracted before and after co-referencing for
the first book. K-means with cosine distance is used for sentence clustering. Algs. 1 and 2 are
applied. A snapshot of the supersense-based clusters obtained using the proposed approach
defined is in Tab. 2 (left), while the final triplets obtained from verb clusters at level 2 is in Tab. 2
(right). Semantically similar verbs are properly clustered together under the corresponding
supersense category. E.g., for category communication, we have verbs such as speak, raise,
warn, and mutter. They are closely related to each other in the sense that all these verbs a
diferent ways of communications to express the emotions and further character relations. The
preliminary experiments show that our approach yield meaningful clusters and triplets.</p>
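      <p>The paper does not detail its K-means implementation; the sketch below uses the standard trick of normalising vectors to unit length, on which Euclidean K-means coincides with clustering by cosine similarity. The naive deterministic initialisation and the toy vectors are our own simplifications.</p>

```python
import math

def normalize(v):
    # Scale a vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def kmeans_cosine(vectors, k, iters=20):
    # On unit-normalised vectors, minimising squared Euclidean distance is
    # equivalent to maximising cosine similarity, so plain K-means on the
    # normalised data clusters by cosine. Naive deterministic init: first
    # k points (a real implementation would use k-means++ or random restarts).
    data = [normalize(v) for v in vectors]
    centroids = [list(v) for v in data[:k]]
    assign = [0] * len(data)
    for _ in range(iters):
        for i, v in enumerate(data):
            assign[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(v, centroids[c])))
        for c in range(k):
            members = [data[i] for i in range(len(data)) if assign[i] == c]
            if members:
                centroids[c] = normalize([sum(col) / len(members)
                                          for col in zip(*members)])
    return assign
```
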
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We described our preliminary experiments with an unsupervised edge labelling approach for knowledge graphs. A two-level clustering approach, based on verb supersenses and lowest common hypernyms, has been used. To capture semantic similarity, we used BERT-based embeddings. The approach was empirically evaluated on a literary text. As future work, we aim to enhance sense clustering by approaches such as SenseBERT [34].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Martínez Carbajal</surname>
          </string-name>
          , et al.,
          <article-title>Neologisms in Harry Potter books</article-title>
          , Universidad de Valladolid. Facultad de Filosofía y Letras (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Å.</given-names>
            <surname>Nygren</surname>
          </string-name>
          ,
          <article-title>Essay on the linguistic features in J.K. Rowling's Harry Potter and the Philosopher's Stone</article-title>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>A hybrid model based on neural networks for biomedical relation extraction</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>81</volume>
          (
          <year>2018</year>
          )
          <fpage>83</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Wu,</surname>
          </string-name>
          <article-title>Clinical relation extraction with deep learning</article-title>
          ,
          <source>International Journal of Hybrid Information Technology</source>
          <volume>9</volume>
          (
          <year>2016</year>
          )
          <fpage>237</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L. Q.</given-names>
            <surname>Trieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Q.</given-names>
            <surname>Tran</surname>
          </string-name>
          , M.-T. Tran,
          <article-title>News classification from social media using Twitterbased doc2vec model and automatic query expansion</article-title>
          ,
          <source>in: Proceedings of the Eighth International Symposium on Information and Communication Technology</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>460</fpage>
          -
          <lpage>467</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>Towards automatic fake news classification</article-title>
          ,
          <source>Proceedings of the Association for Information Science and Technology</source>
          <volume>55</volume>
          (
          <year>2018</year>
          )
          <fpage>805</fpage>
          -
          <lpage>807</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Dagan</surname>
          </string-name>
          ,
          <article-title>Improving distributional similarity with lessons learned from word embeddings, Transactions of the Association for Computational Linguistics 3 (</article-title>
          <year>2015</year>
          )
          <fpage>211</fpage>
          -
          <lpage>225</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Jaques</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Valiati</surname>
          </string-name>
          ,
          <article-title>An analysis of hierarchical text classification using word embeddings</article-title>
          ,
          <source>Information Sciences 471</source>
          (
          <year>2019</year>
          )
          <fpage>216</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wieting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Livescu</surname>
          </string-name>
          ,
          <article-title>Towards universal paraphrastic sentence embeddings</article-title>
          ,
          <source>arXiv preprint arXiv:1511.08198</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iyyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gardner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>Deep contextualized word representations</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers),
          <year>2018</year>
          , pp.
          <fpage>2227</fpage>
          -
          <lpage>2237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          , J. Carbonell,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Xlnet: Generalized autoregressive pretraining for language understanding</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>5754</fpage>
          -
          <lpage>5764</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , L. Zettlemoyer, BART:
          <article-title>Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          , arXiv preprint arXiv:1910.13461 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B. Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <article-title>Kagnet: Knowledge-aware graph networks for commonsense reasoning</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2822</fpage>
          -
          <lpage>2832</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          , KG-BERT:
          <article-title>BERT for knowledge graph completion</article-title>
          , arXiv preprint arXiv:1909.03193 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding via dynamic mapping matrix</article-title>
          ,
          <source>in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>687</fpage>
          -
          <lpage>696</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on hyperplanes, in: C. E. Brodley, P. Stone (Eds.), Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI Press, 2014, pp. 1112–1119.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion, in: B. Bonet, S. Koenig (Eds.), Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI Press, 2015, pp. 2181–2187.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, 2013, pp. 2787–2795.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] X. Liu, H. Tan, Q. Chen, G. Lin, RAGAT: Relation aware graph attention network for knowledge graph completion, IEEE Access 9 (2021) 20840–20849.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] C. Li, X. Peng, Y. Niu, S. Zhang, H. Peng, C. Zhou, J. Li, Learning graph attention-aware knowledge graph embedding, Neurocomputing 461 (2021) 516–529.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] H. Wang, S. Li, R. Pan, M. Mao, Incorporating graph attention mechanism into knowledge graph reasoning based on deep reinforcement learning, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 2623–2631.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] N. Sheikh, X. Qin, B. Reinwald, C. Miksovic, T. Gschwind, P. Scotton, Knowledge graph embedding using graph convolutional networks with relation-aware attention, arXiv preprint arXiv:2102.07200 (2021).</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] N. Veira, B. Keng, K. Padmanabhan, A. G. Veneris, Unsupervised embedding enhancements of knowledge graphs using textual associations, in: IJCAI, 2019, pp. 5218–5225.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] A. Piper, M. Algee-Hewitt, K. Sinha, D. Ruths, H. Vala, Studying literary characters and character networks, 2017.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] E. T. Nalisnick, H. S. Baird, Character-to-character sentiment analysis in Shakespeare's plays, in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, volume 2, 2013, pp. 479–483.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] S. Mellace, V. Kanjirangat, A. Antonucci, Relation clustering in narrative knowledge graphs, in: Proceedings of AI4Narratives - Workshop on Artificial Intelligence for Narratives in conjunction with the 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI 2020), Yokohama, Japan, volume 2794 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 23–27.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] D. Birant, A. Kut, ST-DBSCAN: An algorithm for clustering spatial–temporal data, Data &amp; Knowledge Engineering 60 (2007) 208–221.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3973–3983.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: A unified embedding for face recognition and clustering, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] M. Ciaramita, M. Johnson, Supersense tagging of unknown nouns in WordNet, in: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 2003, pp. 168–175.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] G. A. Miller, WordNet: An electronic lexical database, MIT Press, 1998.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] Y. Levine, B. Lenz, O. Dagan, O. Ram, D. Padnos, O. Sharir, S. Shalev-Shwartz, A. Shashua, Y. Shoham, SenseBERT: Driving some sense into BERT, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 4656–4667.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>