<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Creativity Embedding: a vector to characterise and classify plausible triples in deep learning NLP models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe Rizzo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabeau Oliveri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Ardito</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Morisio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LINKS Foundation</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Politecnico di Torino</institution>
        </aff>
      <abstract>
        <p>In this paper we define the creativity embedding of a text based on four self-assessment creativity metrics, namely diversity, novelty, serendipity, and magnitude, together with knowledge graphs and neural networks. We use as the basic unit the notion of a triple (head, relation, tail). We investigate whether additional information about creativity improves natural language processing tasks. In this work, we focus on the triple plausibility task, exploiting the BERT model and a sample of the WordNet11 dataset. Contrary to our hypothesis, we do not detect an increase in performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Creativity Embedding</kwd>
        <kwd>Creativity Metric</kwd>
        <kwd>NLP</kwd>
        <kwd>Creativity Evaluation</kwd>
        <kwd>Triple</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Conversational agents have emerged as powerful instruments for assisting humans. Oftentimes, their cores are natural language processing (NLP) models and algorithms. However, these models are far from being an exhaustive representation of reality and language dynamics: they are trained on biased data through deep learning algorithms, where the flow of information among the various layers can result in information loss
        <xref ref-type="bibr" rid="ref15">(Wang et al., 2015)</xref>
        . As a consequence, NLP techniques still find it challenging to manage conversations they have never encountered before, reacting inefficiently to novel scenarios.
      </p>
      <p>
        One way to mitigate these issues is the integration of structured information, for which knowledge graphs are among the best-known systems. The most prominent example is the Semantic Web
        <xref ref-type="bibr" rid="ref1">(Berners-Lee et al., 2001)</xref>
        , where information is represented through linked statements, each composed of a head, a relation, and a tail, forming a triple (Figure 1). This semantic representation allows significant advantages, such as reasoning over data and operating with heterogeneous data sources.
      </p>
      <p>[Figure 1: example of a triple composed of head, relation, and tail (e.g., relation "educated at", tail "St John's College").]</p>
      <p>Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        The integration of structured information is not the only method that the literature provides to improve NLP techniques. Previous research pointed out that the analysis of creativity features could improve self-assessment evaluation, with benefits for the generated solutions and the understanding of inputs
        <xref ref-type="bibr" rid="ref13 ref4 ref6">(Lamb et al., 2018; Karampiperis et al., 2014; Surdeanu et al., 2008)</xref>
        . We specify that in this work creativity is intended as the capability to create, understand, and evaluate novel contents. The concepts of Creative AI have been discussed in their interconnections with the Semantic Web
        <xref ref-type="bibr" rid="ref7">(Ławrynowicz, 2020)</xref>
        , and are generalizable to knowledge graphs. Kuznetsova et al.
        <xref ref-type="bibr" rid="ref5">(Kuznetsova et al., 2013)</xref>
        define quantitative measures of creativity in lexical compositions, exploring different theories, such as divergent thinking, compositional structure, and creative semantic subspace. The crucial point is that not every novel combination is perceived as creative and useful: perceived creativity distinguishes what is unconventional, uncommon, or "expressive in an interesting, imaginative, or inspirational way".
      </p>
      <p>Although the interest of the scientific community in exploring this direction is clear, little research has been conducted on creativity in the NLP field. The results and the considerations made by Kuznetsova and Ławrynowicz led us to investigate the possible correlations between improvements in NLP tasks and creativity, with a particular focus on self-assessment. In this paper we introduce a novel approach for supporting deep learning algorithms with a mathematical representation of the creativity features of a text. We named it creativity embedding and based it on metrics of self-assessment creativity over a graph knowledge base.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Approach</title>
      <sec id="sec-2-1">
        <title>2.1 Self-assessment creativity metrics</title>
        <p>
          When humans face a problem they have never encountered before, they usually perform a self-assessment procedure with respect to their previous knowledge and context, generally voting for the best solution. Following the example reported in Figure 2, we can imagine that a person has to describe the colour of a grey desk. He does not remember the name of the colour at that moment, and performs a creative process: he uses a metaphor to describe the grey colour of the desk, referring to the stereotypical colour of a "mouse". This metaphor is widely accepted, and the colour would ideally be understood by the interlocutor. If in place of "mouse" the random term "mask" is used, the meaning will probably not be received unless particular context or knowledge is shared between the person and the interlocutor, resulting in an ineffective creative process. To emulate this self-assessment procedure, we propose metrics inspired by the literature on related concepts, such as recommender systems
          <xref ref-type="bibr" rid="ref9">(Monti et al., 2019)</xref>
          and machine learning
          <xref ref-type="bibr" rid="ref11 ref12">(Pimentel et al., 2014; Ruan et al., 2020)</xref>
          . The knowledge is represented by a graph of items interconnected by their relations (triples).
        </p>
        <p>We define four metrics, namely diversity (1), novelty (2), serendipity (3), and magnitude (4). These metrics make use of a similarity function: to define the similarity (or, from another angle, the diversity) between two or more items, we need a method and a representation that allow us to define a distance between them.</p>
        <p>[Figure 2: a person asked "What is the color of the desk?" evaluates possible solutions (grey, mouse, mask) against their knowledge and context.]</p>
        <p>In the literature, there is no fixed notion of similarity. However, a common strategy for texts is transforming words and sentences into vectors, taking into account and preserving their distributional properties and connections. Mathematical distance functions are subsequently applied. Under these conditions, the similarity function defines a semantic similarity between two items (words or sentences). For prompt understanding, we anticipate that in our experiment we use the cosine similarity function and BERT vectors (embeddings) as word representations, as discussed in the following sections. Nevertheless, the metrics thus defined could be computed with a different item vector representation and similarity function, as long as the adopted similarity function has output domain [0,1], with high values for high similarity.</p>
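        <p>For illustration, the following is a minimal sketch of one such similarity function, mean-pooling BERT token embeddings and rescaling the cosine similarity from [-1,1] to the required [0,1] domain; the mean pooling and the rescaling are assumptions of this sketch, not details fixed by the paper.</p>
        <preformat>
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text):
    """Mean-pool the last-layer BERT token embeddings of `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

def similarity(a, b):
    """Cosine similarity rescaled to [0,1], high values for high similarity."""
    cos = torch.nn.functional.cosine_similarity(embed(a), embed(b), dim=0)
    return (cos.item() + 1.0) / 2.0

print(similarity("mouse", "grey"))  # higher for semantically closer items
        </preformat>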
        <p>Diversity (1) represents the semantic diversity between the head h<sub>T</sub> and the tail t<sub>T</sub> of the triple T. This information tells how semantically distant these two elements are. It can be considered as the internal semantic diversity of T.</p>
        <disp-formula id="eq1"><label>(1)</label><tex-math>\mathrm{div}(T) = 1 - \mathrm{similarity}(h_T, t_T)</tex-math></disp-formula>
        <p>Novelty (2) of a triple T is its average semantic diversity with respect to the other triples in the context. The context C is the sub-graph of triples obtained by traversing the paths of length p in the knowledge graph, starting from the head h<sub>T</sub> of the triple under examination and collecting the n nearest triples. It can be considered as the external semantic diversity of T with respect to the retrieved context C.</p>
        <disp-formula id="eq2"><label>(2)</label><tex-math>\mathrm{nov}(T) = 1 - \frac{1}{n}\sum_{i=1}^{n} \mathrm{similarity}(T, C_i)</tex-math></disp-formula>
        <p>Serendipity (3) is here intended as the semantic novelty of the triple T, taking into account the s most novel triples of the knowledge graph (the refined context S). It can be considered as the novelty relevance of T.</p>
        <disp-formula id="eq3"><label>(3)</label><tex-math>\mathrm{ser}(T) = 1 - \frac{1}{s}\sum_{i=1}^{s} \mathrm{similarity}(T, S_i)</tex-math></disp-formula>
        <p>Magnitude (4) outlines the rarity of the triple, ranking (rk) each component of the triple by the number of its occurrences over the total number of items in the knowledge graph. The ranking function thus defined has output domain [0,1].</p>
        <disp-formula id="eq4"><label>(4)</label><tex-math>\mathrm{mag}(T) = \frac{\mathrm{rk}(h_T) + \mathrm{rk}(\mathit{rel}_T) + \mathrm{rk}(t_T)}{3}</tex-math></disp-formula>
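        <p>The four formulas translate directly into code. The sketch below is for illustration only: it assumes a similarity function with output in [0,1] (such as the one sketched earlier), pre-retrieved contexts C and S, and a sim_triple helper for comparing whole triples (e.g., over their concatenated surface forms), none of which are fixed by the paper.</p>
        <preformat>
from collections import Counter

def div(similarity, head, tail):
    """Diversity (1): 1 - similarity(h_T, t_T)."""
    return 1.0 - similarity(head, tail)

def nov(sim_triple, t, context):
    """Novelty (2): 1 - mean similarity to the n context triples C_i."""
    return 1.0 - sum(sim_triple(t, c) for c in context) / len(context)

def ser(sim_triple, t, refined):
    """Serendipity (3): 1 - mean similarity to the s most novel triples S_i."""
    return 1.0 - sum(sim_triple(t, s) for s in refined) / len(refined)

def mag(counts, total, head, rel, tail):
    """Magnitude (4): mean rank of the components; rk(x) is the number of
    occurrences of x over the total number of items, hence in [0,1]."""
    rk = lambda item: counts[item] / total
    return (rk(head) + rk(rel) + rk(tail)) / 3.0

# Example: occurrence counts over knowledge-graph items (placeholder data).
counts = Counter({"desk": 40, "has_color": 25, "grey": 10})
total = sum(counts.values())
print(mag(counts, total, "desk", "has_color", "grey"))
        </preformat>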
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Creativity Embedding</title>
        <p>
          There were no annotated datasets on the creativity characteristics of interest. For this reason, a direct comparison with a ground truth was hampered. To overcome this obstacle, we indirectly measured the effectiveness of this approach by applying it to an external model and judging the results on the triple plausibility task
          <xref ref-type="bibr" rid="ref10 ref15 ref16 ref17">(Yao et al., 2019; Wang et al., 2018; Wang et al., 2015; Padó et al., 2009)</xref>
          . The triple plausibility task consists of classifying a dataset's triples into plausible and not plausible classes, comparing the results with the ground truth. We chose this task to perform an indirect evaluation of our proposal, relying on the correlation between plausibility and creativity
          <xref ref-type="bibr" rid="ref6">(Lamb et al., 2018)</xref>
          , as plausibility can represent a positive outcome of an effective creative process. The current trend in machine learning and natural language processing models pushes the use of mathematical representations of meaningful information through vectors, commonly known in this field as embeddings. For these reasons, we outline and train a neural network using the computed ground truth to predict the creativity values, and we define as creativity embedding the weights of the last hidden layer. This creativity embedding can be added and adapted in its dimension. Given the above concepts, we define the subsequent research question.
        </p>
        <p>Research Question: can a creativity embedding extracted from the creativity neural network improve triple plausibility classification in deep learning models?</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Model Architecture</title>
      <sec id="sec-3-1">
        <title>3.1 BERT</title>
        <p>
          We select Bidirectional Encoder Representations from Transformers (BERT)
          <xref ref-type="bibr" rid="ref2">(Devlin et al., 2019)</xref>
          as the model for investigating the effects of the creativity embedding, due to its flexibility and modularity, as well as its being state of the art for various NLP tasks. The BERT model can be divided into three main parts: preprocessing of the input, a stack of Transformer layers, and further layers on top to perform a particular task, typically a classifier. A stack of Transformers forms the BERT core. A Transformer exploits the attention mechanism to learn the contextual relationships between the input sentences and words. The input is not considered in one direction, but figuratively in all directions at once, defining the context of a word by considering all the surrounding words. The model is trained with a sort of game, where some words or entire sentences are masked and the model has to predict them. We do not modify the core of the model; we are more interested in the preprocessing part, where we inject the creativity embedding, as explained in the next section.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Creativity Neural Network and</title>
      </sec>
      <sec id="sec-3-3">
        <title>Creativity CLS Embedding</title>
        <p>
          The outline of the architecture proposed for the task is shown in Figure 3. In the lower part, the triple flows through the BERT model. We used a modified tokenization technique of Knowledge Graph BERT (KG-BERT)
          <xref ref-type="bibr" rid="ref17">(Yao et al., 2019)</xref>
          , adapted to the structure of the triple. The triple is split into tokens with respect to the BERT vocabulary of known words. Special tokens are included in the sequence: the classification (CLS) and separator (SEP) tokens. The embeddings corresponding to CLS are in charge of representing the sentence mathematically, while SEP tokens separate different sentences. In the KG-BERT version for triple plausibility, SEP is used to separate head words from relation and tail words in three different sentences.
        </p>
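        <p>A minimal sketch of this input construction, assuming the Hugging Face BERT tokenizer; the exact segment layout of the original KG-BERT implementation may differ in minor details.</p>
        <preformat>
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode_triple(head, relation, tail):
    """Build the [CLS] head [SEP] relation [SEP] tail [SEP] token sequence."""
    tokens = (["[CLS]"] + tokenizer.tokenize(head) + ["[SEP]"]
              + tokenizer.tokenize(relation) + ["[SEP]"]
              + tokenizer.tokenize(tail) + ["[SEP]"])
    return tokenizer.convert_tokens_to_ids(tokens)

ids = encode_triple("desk", "has color", "grey")
print(tokenizer.convert_ids_to_tokens(ids))
        </preformat>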
        <p>[Figure 3: the proposed architecture. The tokenized triple feeds BERT (Devlin et al., 2019); the creativity neural network predicts the four metrics (div, nov, ser, mag) and produces the Creativity Embedding, which is injected into the CLS token; a linear classifier on top classifies the triple plausibility (0/1).]</p>
        <p>The corresponding token identifiers and embeddings are retrieved through two lookup tables provided by the BERT model. At the top of Figure 3, we show our creativity neural network. A compact and fixed-size version of the embeddings is obtained from BERT by summing the embeddings of each component of the triple. This compact version feeds the proposed neural network, which is in charge of predicting the four creativity values and producing the creativity embedding. The neural network consists of an input layer (768 × 3 neurons), an output layer (4 neurons), and 4 fully connected hidden layers with a dropout probability of 0.5. The activation function used is ReLU. This neural network structure is basic, since its main task is to have a flexible last hidden layer adaptable to the technology that would leverage the creativity embedding. The CLS token is one of the most representative tokens for performing classification and other types of predictions. This led us to exploit the CLS token to add the creativity embedding of the triple, providing the model with a non-empty CLS, the Creativity CLS Embedding. In this case, the penultimate layer has a number of neurons equal to 768, the same size as the BERT embeddings. On top of the architecture, a linear classifier is in charge of the predictions for the plausibility task, relying on the Creativity CLS Embedding.</p>
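        <p>A PyTorch sketch of this network under the description above: a 768 × 3 input, four fully connected hidden layers with ReLU and dropout 0.5, a 768-neuron penultimate layer whose output serves as the creativity embedding, and a 4-neuron output for the metric values. The intermediate hidden sizes are an assumption of this sketch, as the paper does not state them.</p>
        <preformat>
import torch
import torch.nn as nn

class CreativityNet(nn.Module):
    """Creativity network sketch: the input is the concatenation of the
    summed BERT embeddings of head, relation, and tail (768 * 3)."""
    def __init__(self, hidden=1024):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(768 * 3, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, 768), nn.ReLU(), nn.Dropout(0.5),
        )
        self.head = nn.Linear(768, 4)  # div, nov, ser, mag

    def forward(self, x):
        emb = self.body(x)           # 768-d creativity embedding
        return self.head(emb), emb   # metric predictions and embedding

net = CreativityNet()
x = torch.randn(2, 768 * 3)          # a batch of two compacted triples
preds, cre_emb = net(x)
print(preds.shape, cre_emb.shape)    # torch.Size([2, 4]) torch.Size([2, 768])
        </preformat>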
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Experiment</title>
      <p>
        In this experiment we randomly sample triples from the WordNet11
        <xref ref-type="bibr" rid="ref8">(Miller, 1995)</xref>
        dataset (50,000 train, 5,000 validation, 3,000 test, with positive and negative labels balanced).
      </p>
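      <p>A sketch of such a balanced sampling is given below; the placeholder data and the seed are choices of this sketch, not stated in the paper.</p>
      <preformat>
import random

# Placeholder positive and negative WordNet11 triples (assumed data).
pos = [("head%d" % i, "rel", "tail%d" % i) for i in range(29000)]
neg = [("head%d" % i, "rel", "bad%d" % i) for i in range(29000)]

random.seed(0)  # seed is our choice, not stated in the paper
random.shuffle(pos)
random.shuffle(neg)

def balanced_splits(pos, neg, sizes=(50000, 5000, 3000)):
    """Disjoint, label-balanced splits: half positive, half negative each."""
    splits, i = [], 0
    for size in sizes:
        half = size // 2
        splits.append(pos[i:i + half] + neg[i:i + half])
        i += half
    return splits  # [train, validation, test]

train, valid, test = balanced_splits(pos, neg)
print(len(train), len(valid), len(test))  # 50000 5000 3000
      </preformat>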
      <sec id="sec-4-1">
        <title>Creativity Neural Network</title>
        <p>As stated in the previous sections, we compute the four metrics on each triple of the dataset to create the ground truth. As similarity function we use the cosine similarity, which returns a value between 0 and 1, with high values for high similarity. We applied the cosine similarity function after transforming words and sentences into embeddings provided by the BERT model. We encountered slowdowns only with the novelty metric: the number of nodes is not predictable a priori in our setting, and the mathematical nature of the formula is sensitive to a high number of nodes, so peaks of memory allocation can occur, as well as long computation times. We limit the failures due to out-of-memory errors or timeouts of the scheduled jobs by applying the "divide et impera" paradigm and other adjustments. The length of the path p, seen as the recursion depth, is fixed to 5. For each node involved in the recursion, the maximum number of neighbor nodes n considered is fixed to 20. Once we obtain all the metric values, we can train the Creativity Neural Network as a regression problem. We use: as loss criterion, the mean squared error loss; as optimizer, AdamW with learning rate = 0.001, betas = (0.9, 0.999), epsilon = 1e-08, weight decay = 0.01; as scheduler, StepLR with step size = 10 and gamma = 0.1; we train the model for 10 epochs with a batch size of 512. To evaluate the performance on the test set we compute the explained variance score = 0.4493, mean absolute error = 0.1733, mean squared error = 0.0388, and R2 score = −6.7694. Despite the small mean squared and absolute errors, R2 tells us that the model does not approximate the distribution better than the "best-fit" line. This is probably due to the low entropy of the input metric values which, upon inspection, turn out to cluster around the 0.5 value.</p>
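        <p>A minimal PyTorch sketch of this training setup under the stated hyperparameters; net and train_loader are assumed to exist (e.g., the network sketched in Section 3.2 and a DataLoader of (compacted triple embedding, four metric values) pairs with batch size 512).</p>
        <preformat>
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR

# Hyperparameters as stated in the text.
criterion = nn.MSELoss()
optimizer = AdamW(net.parameters(), lr=0.001, betas=(0.9, 0.999),
                  eps=1e-08, weight_decay=0.01)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(10):
    for x, y in train_loader:        # batches of size 512
        optimizer.zero_grad()
        preds, _ = net(x)            # ignore the creativity embedding here
        loss = criterion(preds, y)   # regression on the four metric values
        loss.backward()
        optimizer.step()
    scheduler.step()
        </preformat>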
      </sec>
      <sec id="sec-4-2">
        <title>Triple Plausibility Task</title>
        <p>The tokenized triple is inputted to the Creativity Neural Network, obtaining the creativity embedding. This is added to the CLS embedding token, and the triple flows through the Transformer stack. The BERT model is then used to make predictions and address the triple plausibility task, putting a linear classifier on top of the Transformer stack. As loss function we use the binary cross-entropy loss. The literature suggests few epochs and samples for the fine-tuning process. We fine-tune BERT for 2 epochs; afterwards we freeze the weights of the model, training only the classifier layer for 3 epochs. We select BERT base uncased as the baseline model; as optimizer, AdamW with learning rate = 5e-05; as scheduler, a linear scheduler with warm-up proportion = 10%; for the classifier, a dropout probability of 0.5. We fix the maximum sequence length at 100 tokens, as no triple exceeds this number of tokens after tokenization.</p>
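        <p>A sketch of the injection and classification step, assuming the Hugging Face BertModel; adding the creativity embedding at the CLS position of the input embeddings is one straightforward reading of the description above, and the class and helper names are ours, not from the paper.</p>
        <preformat>
import torch
from torch import nn
from transformers import BertModel

class CreativityCLSClassifier(nn.Module):
    """Adds the 768-d creativity embedding at the CLS position of the input
    embeddings, runs the Transformer stack, and classifies plausibility."""
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.5)
        self.classifier = nn.Linear(768, 1)  # logit for binary cross-entropy

    def forward(self, input_ids, attention_mask, creativity_emb):
        embeds = self.bert.embeddings.word_embeddings(input_ids)
        cls = embeds[:, :1, :] + creativity_emb.unsqueeze(1)  # non-empty CLS
        embeds = torch.cat([cls, embeds[:, 1:, :]], dim=1)
        out = self.bert(inputs_embeds=embeds, attention_mask=attention_mask)
        pooled = self.dropout(out.last_hidden_state[:, 0, :])
        return self.classifier(pooled).squeeze(-1)

# Fine-tune end to end for 2 epochs with BCEWithLogitsLoss and AdamW
# (lr=5e-05), then freeze BERT and train only the classifier for 3 epochs:
# for p in model.bert.parameters():
#     p.requires_grad = False
        </preformat>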
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Results and Conclusion</title>
      <p>
        In this paper we investigated whether the defined creativity embedding improves the triple plausibility task, exploiting the BERT model. We do not detect an increase in performance (Table 1) when comparing ourselves to the KG-BERT results. In this comparison we should point out that the sample used is one fifth of the complete WN11 dataset. This result is somewhat contrary to our expectations, as the creativity embeddings represent, in some way, a priori information. A possible explanation might be the learning methodology of the creativity embedding: we suppose that a significant loss of information occurred in the process. Further research might explore other types of embeddings
        <xref ref-type="bibr" rid="ref3">(Grohe, 2020)</xref>
        , such as graph2vec, and different integrations of the proposed metrics. Future experimental investigations may try different parameter configurations; for example, the number of nodes considered could intuitively change the values of metrics such as novelty. Moreover, a more in-depth data analysis of the used dataset, the corresponding knowledge graph, and the data correlations could provide additional insights. In future work, we will consider different combinations of the defined metrics to train the creativity neural network. It is possible that some metrics are more or less relevant for the task; selecting strictly relevant metrics would lighten the computational effort and give us information about the correlations between metrics and results. To conclude, we aim to bring the NLP community's attention to new research topics on creativity.
      </p>
      <p>[Table 1: F1 on the triple plausibility task: 0.6379 (proposed model) vs. 0.9334 (KG-BERT).]</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Computational resources were provided by HPC@POLITO, a project of Academic Computing within the Department of Control and Computer Engineering at the Politecnico di Torino (http://www.hpc.polito.it). We thank the reviewers of the CLiC-it 2020 conference for their comments and advice.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Tim</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>James</given-names>
            <surname>Hendler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ora</given-names>
            <surname>Lassila</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>The semantic web</article-title>
          .
          <source>Scientific american</source>
          ,
          <volume>284</volume>
          (
          <issue>5</issue>
          ):
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Wei</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Bert: Pre-training of</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Martin</given-names>
            <surname>Grohe</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Word2vec, node2vec, graph2vec, x2vec: Towards a theory of vector embeddings of structured data</article-title>
          .
          <source>In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS'20, pages 1-16</source>
          , New York, NY, USA. Association for Computing Machinery.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Karampiperis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koukourikos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Koliopoulou</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Towards machines for measuring creativity: The use of computational tools in storytelling activities</article-title>
          .
          <source>In 2014 IEEE 14th International Conference on Advanced Learning Technologies</source>
          , pages
          <fpage>508</fpage>
          -
          <lpage>512</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Polina</given-names>
            <surname>Kuznetsova</surname>
          </string-name>
          , Jianfu Chen, and
          <string-name>
            <given-names>Yejin</given-names>
            <surname>Choi</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Understanding and quantifying creativity in lexical composition</article-title>
          .
          <source>In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1246</fpage>
          -
          <lpage>1258</lpage>
          , Seattle, Washington, USA, October. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Carolyn</given-names>
            <surname>Lamb</surname>
          </string-name>
          , Daniel G. Brown, and
          <string-name>
            <given-names>Charles L. A.</given-names>
            <surname>Clarke</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Evaluating computational creativity: An interdisciplinary tutorial</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>51</volume>
          (
          <issue>2</issue>
          ), February.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Agnieszka</given-names>
            <surname>Ławrynowicz</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Creative AI: A new avenue for the semantic web?</article-title>
          .
          <source>Semantic Web</source>
          , pages
          <fpage>69</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>George A</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>Wordnet: a lexical database for english</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Diego</given-names>
            <surname>Monti</surname>
          </string-name>
          , Enrico Palumbo, Giuseppe Rizzo, and
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Morisio</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Sequeval: An offline evaluation framework for sequence-based recommender systems</article-title>
          . Information,
          <volume>10</volume>
          (
          <issue>5</issue>
          ):
          <fpage>174</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Ulrike</given-names>
            <surname>Padó</surname>
          </string-name>
          , Matthew W Crocker, and Frank Keller.
          <year>2009</year>
          .
          <article-title>A probabilistic model of semantic plausibility in sentence processing</article-title>
          .
          <source>Cognitive Science</source>
          ,
          <volume>33</volume>
          (
          <issue>5</issue>
          ):
          <fpage>794</fpage>
          -
          <lpage>838</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Marco A.F.</given-names>
            <surname>Pimentel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David A.</given-names>
            <surname>Clifton</surname>
          </string-name>
          , Lei Clifton, and
          <string-name>
            <given-names>Lionel</given-names>
            <surname>Tarassenko</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A review of novelty detection</article-title>
          .
          <source>Signal Processing</source>
          ,
          <volume>99</volume>
          :
          <fpage>215</fpage>
          -
          <lpage>249</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Yu-Ping</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhen-Hua</given-names>
            <surname>Ling</surname>
          </string-name>
          , Xiaodan Zhu, Quan Liu, and
          <string-name>
            <given-names>Jia-Chen</given-names>
            <surname>Gu</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Generating diverse conversation responses by creating and ranking multiple candidates</article-title>
          .
          <source>Computer Speech Language</source>
          ,
          <volume>62</volume>
          :
          <fpage>101071</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Mihai</given-names>
            <surname>Surdeanu</surname>
          </string-name>
          , Massimiliano Ciaramita, and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Learning to rank answers on large online qa collections</article-title>
          .
          <source>In Proceedings of ACL-08: HLT</source>
          , pages
          <fpage>719</fpage>
          -
          <lpage>727</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Denny</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          and Markus Krötzsch.
          <year>2014</year>
          .
          <article-title>Wikidata: A free collaborative knowledgebase</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>57</volume>
          (
          <issue>10</issue>
          ):
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          , September.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Quan</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bin</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Li</given-names>
            <surname>Guo</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Knowledge base completion using embeddings and rules</article-title>
          .
          <source>IJCAI'15</source>
          , pages
          <fpage>1859</fpage>
          -
          <lpage>1865</lpage>
          . AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Su</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Greg</given-names>
            <surname>Durrett</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Katrin</given-names>
            <surname>Erk</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Modeling semantic plausibility by injecting world knowledge</article-title>
          .
          <source>In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>2</volume>
          (
          <issue>Short Papers)</issue>
          , pages
          <fpage>303</fpage>
          -
          <lpage>308</lpage>
          , New Orleans, Louisiana, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Liang</given-names>
            <surname>Yao</surname>
          </string-name>
          , Chengsheng Mao, and
          <string-name>
            <given-names>Yuan</given-names>
            <surname>Luo</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>KG-BERT: BERT for knowledge graph completion</article-title>
          . arXiv preprint arXiv:1909.03193.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>