    Creativity Embedding: a vector to characterise and classify plausible
                   triples in deep learning NLP models

Isabeau Oliveri, Politecnico di Torino, isabeau.oliveri@polito.it
Luca Ardito, Politecnico di Torino, luca.ardito@polito.it
Giuseppe Rizzo, LINKS Foundation, giuseppe.rizzo@linksfoundation.com
Maurizio Morisio, Politecnico di Torino, maurizio.morisio@polito.it

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


                       Abstract

English. In this paper we define the creativity embedding of a text, based on four self-assessment creativity metrics, namely diversity, novelty, serendipity, and magnitude, on knowledge graphs, and on neural networks. We use as basic unit the notion of a triple (head, relation, tail). We investigate whether this additional information about creativity improves natural language processing tasks. In this work, we focus on the triple plausibility task, exploiting the BERT model and a sample of the WordNet11 dataset. Contrary to our hypothesis, we do not detect an increase in performance.

Keywords - Creativity Embedding; Creativity Metric; NLP; Creativity Evaluation; Triple; Knowledge Graph; BERT.

Figure 1: The triple (Douglas Adams, educated at, St John's College), from the Wikidata knowledge base (Vrandečić and Krötzsch, 2014), is an example of a statement. [The diagram labels ”Douglas Adams” as the head, ”educated at” as the relation, and ”St John's College” as the tail.]
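As a minimal illustration of this basic unit, the triple of Figure 1 can be represented as a small data structure; the sketch below is ours and purely illustrative, with names that are not part of the paper.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A knowledge graph statement: (head, relation, tail)."""
    head: str
    relation: str
    tail: str

# The Wikidata statement of Figure 1.
example = Triple("Douglas Adams", "educated at", "St John's College")
```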
1   Introduction

Current conversational agents have emerged as powerful instruments for assisting humans. Oftentimes, their cores are natural language processing (NLP) models and algorithms. However, these models are far from being exhaustive representations of reality and language dynamics: they are trained on biased data through deep learning algorithms, where the flow of information across the various layers can result in information loss (Wang et al., 2015). As a consequence, NLP techniques still find it challenging to manage conversations they have never encountered before, reacting inefficiently to novel scenarios.

One way to mitigate these issues is the integration of structured information, for which knowledge graphs are one of the best-known representation systems. The most prominent example is the Semantic Web (Berners-Lee et al., 2001), where information is represented through linked statements, each one composed of (head, relation, tail) and forming a triple (Figure 1). This semantic representation allows significant advantages, such as reasoning over data and operating with heterogeneous data sources.

Integration of structured information is not the only method the literature provides to improve NLP techniques. Previous research has pointed out that the analysis of creativity features could improve self-assessment evaluation, with benefits for the solutions generated and for input understanding (Lamb et al., 2018; Karampiperis et al., 2014; Surdeanu et al., 2008). We specify that in this work creativity is intended as the capability to create, understand, and evaluate novel content. The concepts of Creative AI have been discussed in their interconnections with the Semantic Web (Ławrynowicz, 2020), and are generalizable to knowledge graphs. Kuznetsova et al. (2013) define quantitative measures of creativity in lexical composition, exploring different theories such as divergent thinking, compositional structure, and creative semantic subspace. The crucial point is that not every novel combination is perceived as creative and useful: creativity is perceived in what is unconventional, uncommon, or ”expressive in an interesting, imaginative, or inspirational way”.
Despite the clear interest of the scientific community in exploring this direction, little research has been conducted on creativity in the NLP field. The results and the considerations made by Kuznetsova and Ławrynowicz led us to investigate the possible correlations between improvements in NLP tasks and creativity, with a particular focus on self-assessment. In this paper we introduce a novel approach for supporting deep learning algorithms with a mathematical representation of the creativity features of a text. We name it creativity embedding and base it on metrics of self-assessed creativity computed over a graph knowledge base.

Figure 2: A person produces different solutions to answer the question ”What is the color of the desk?” (grey, mouse, mask). He performs a self-assessment procedure, taking into account several parameters p based on his knowledge and the context. Finally, he chooses the best possible solution. Parameters are expressed as numbers for simplicity.

2   Approach

2.1   Self-assessment creativity metrics

When humans face a problem they have never encountered before, they usually perform a self-assessment procedure with respect to their previous knowledge and context, generally voting for the best solution. Following the example reported in Figure 2, we can imagine that a person has to describe the colour of a grey desk. He does not remember the name of the colour at that moment, and performs a creative process: he uses a metaphor to describe the grey colour of the desk, referring to the stereotypical colour of a ”mouse”. This metaphor is widely accepted, and the colour would ideally be understood by the interlocutor. If, in place of ”mouse”, the random term ”mask” is used, the meaning will probably not be received unless particular context or knowledge is shared between the person and the interlocutor, resulting in an ineffective creative process. To emulate this self-assessment procedure, we propose metrics inspired by the literature on related concepts, such as recommender systems (Monti et al., 2019) and machine learning (Pimentel et al., 2014; Ruan et al., 2020). The knowledge is represented by a graph of items interconnected by their relations (triples).

We define four metrics, namely diversity (1), novelty (2), serendipity (3), and magnitude (4). In these metrics we make use of a similarity function. In fact, to define the similarity (or, from another angle, the diversity) between two or more items, we need a method and a representation that allow us to define a distance between them. In the literature, there is no fixed notion of similarity. However, a common strategy for texts is transforming words and sentences into vectors, taking into account and preserving their distributional properties and connections; subsequently, mathematical distance functions are applied. Under these conditions, such a function defines a semantic similarity between two items (words or sentences). For ease of understanding, we anticipate that in our experiment we use the cosine similarity function and BERT vectors (embeddings) as word representations, as discussed in the following sections. Nevertheless, the metrics defined here can be computed with a different item vector representation and similarity function, as long as the adopted similarity function has output domain [0,1], with high values for high similarity.
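As a concrete illustration, the sketch below implements such a similarity function under our stated choices: the vectors are assumed to be BERT embeddings computed elsewhere, and the rescaling of the cosine from [-1, 1] into [0, 1] is our assumption, since the text only constrains the output domain.

```python
import numpy as np

def similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two item vectors, rescaled to [0, 1].

    High values mean high similarity, as the metrics require. The
    (1 + cos) / 2 rescaling is our assumption: the paper only states
    that the output domain must be [0, 1].
    """
    cos = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (1.0 + cos) / 2.0
```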
Diversity (1) represents the semantic diversity between the head h_T and the tail t_T of the triple T. This information tells how far these two elements are from being semantically close. It can be considered the internal semantic diversity of T.

   div(T) = 1 - similarity(h_T, t_T)   (1)

Novelty (2) of a triple T is its average semantic diversity with respect to the other triples in its context. The context C is the sub-graph obtained by traversing the paths of length p in the knowledge graph, starting from the head h_T of the triple under examination and collecting the n nearest triples. It can be considered the external semantic diversity of T with respect to the retrieved context C.

   nov(T) = \frac{1}{n} \sum_{i=1}^{n} (1 - similarity(T, C_i))   (2)

Serendipity (3) is here intended as the semantic novelty of the triple T, taking into account the s most novel triples of the knowledge graph (the refined context S). It can be considered the novelty relevance of T.

   ser(T) = \frac{1}{s} \sum_{i=1}^{s} (1 - similarity(T, S_i))   (3)

Magnitude (4) outlines the rarity of the triple, ranking (rk) each component of the triple by the number of its occurrences over the total number of items in the knowledge graph. The ranking function thus defined has output domain [0,1].

   mag(T) = \frac{rk(h_T) + rk(rel_T) + rk(t_T)}{3}   (4)
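A minimal sketch of the four metrics follows, reusing the similarity function above. It assumes that the embeddings for heads, tails, whole triples, and context triples are precomputed (e.g., from BERT, as in our experiment) and that the occurrence-based ranks rk are already normalised to [0, 1]; the function names are ours.

```python
import numpy as np

def diversity(h_emb, t_emb):
    # Eq. (1): div(T) = 1 - similarity(h_T, t_T)
    return 1.0 - similarity(h_emb, t_emb)

def novelty(triple_emb, context_embs):
    # Eq. (2): mean of 1 - similarity(T, C_i) over the n context triples
    return float(np.mean([1.0 - similarity(triple_emb, c) for c in context_embs]))

def serendipity(triple_emb, refined_embs):
    # Eq. (3): mean of 1 - similarity(T, S_i) over the s most novel triples
    return float(np.mean([1.0 - similarity(triple_emb, s) for s in refined_embs]))

def magnitude(rk_head, rk_rel, rk_tail):
    # Eq. (4): average of the three occurrence-based ranks, each in [0, 1]
    return (rk_head + rk_rel + rk_tail) / 3.0
```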
                                                         direction, but figuratively in all ones at one time,
2.2    Creativity Embedding                              defining the context of a word considering the en-
There were no annotated datasets on the creativity       tire surrounding words. The model is trained with
characteristics of interest. For this reason, a direct   a sort of play, where some words or entire sen-
comparison with the ground truth was hampered.           tences are masked, and the model has to predict
To overcome this obstacle, we indirectly measured        them. We do not modify the core of the model;
the effectiveness of this approach by applying it        we are more interested in the preprocessing part,
to an external model and judging the results on          where we will inject the creativity embedding, as
the triple plausibility task (Yao et al., 2019; Wang     explained in the next section.
et al., 2018; Wang et al., 2015; Padó et al., 2009).
The triple plausibility task consists of classifying     3.2    Creativity Neural Network and
a dataset’s triples in plausible or not plausible               Creativity CLS Embedding
classes, comparing the result respect to the ground      The outline of the architecture proposed for the
truth. We choose this task to perform an indirect        task is shown in Figure 3. In the lower part,
evaluation of our proposal, rely on the correlation      the triple flows through the BERT model. We
between plausibility and creativity (Lamb et al.,        used a modified tokenization technique of Knowl-
2018), as plausibility could represent a positive        edge Graph BERT (KG-BERT) (Yao et al., 2019),
outcome of an effective creative process. The            adapted for the structure of the triple. The triple
current trend in machine learning and natural            is split in tokens respect the BERT vocabulary
language processing models pushes the use of             of known words. Special tokens are included in
mathematical representation of meaningful infor-         the sequence, classification (CLS) and separator
mation utilising vectors, commonly known in this         (SEP) tokens. CLS corresponding embeddings are
field as embeddings. For these reasons, we outline       in charge of representing the sentence mathemat-
and train a neural network using the computed            ically, and SEP tokens that separate different sen-
ground truth to predict creativity values, and           tences. On the KG-BERT version for triple plau-
define as creativity embedding the weight of last        sibility, SEP is used to separate head words from
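As a sketch of this input format, the snippet below builds the KG-BERT-style sequence with the HuggingFace transformers tokenizer; the tooling choice and the function name are our assumptions, since the paper does not name a library.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize_triple(head, relation, tail):
    """[CLS] head tokens [SEP] relation tokens [SEP] tail tokens [SEP]."""
    tokens = ["[CLS]"]
    for part in (head, relation, tail):
        tokens += tokenizer.tokenize(part) + ["[SEP]"]
    return tokens

tokens = tokenize_triple("Douglas Adams", "educated at", "St John's College")
ids = tokenizer.convert_tokens_to_ids(tokens)  # ids via the BERT lookup table
```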
Figure 3: For each triple, the Creativity Embedding computed by the Creativity Neural Network is added to the BERT CLS embedding, defining the Creativity CLS Embedding. A linear classifier on top performs the triple plausibility classification. [The diagram shows the input triple passing through the KG-BERT tokenizer (word token strings to word ids) and the BERT word embedding lookup table (word ids to word embeddings); the head, relation, and tail embeddings are summed into a 768 × 3 = 2304 vector that feeds the Creativity Neural Network, fully connected layers with dropout (input layer 2304; hidden layers 2048, 2048, 1024, 768, ReLU; output layer 4: div, nov, ser, mag); the Transformer attention mechanism (Vaswani et al., 2017; Devlin et al., 2019) then feeds the triple plausibility classifier: is the triple plausible? (No/Yes).]


The corresponding token identifiers and embeddings are retrieved through two lookup tables provided by the BERT model. At the top of Figure 3, we show our creativity neural network. A compact, fixed-size version of the embeddings is obtained from BERT by summing the embeddings of each component of the triple. This compact version feeds the proposed neural network, which is in charge of predicting the four creativity values and producing the creativity embedding. The neural network consists of an input layer (768 × 3 neurons), an output layer (4 neurons), and 4 fully connected hidden layers with a dropout probability of 0.5. The activation function used is ReLU. This neural network structure is kept basic, since its main task is to provide a flexible last hidden layer adaptable to the technology that will leverage the creativity embedding. The CLS token is one of the most representative tokens for performing classification and other types of predictions. This led us to exploit the CLS token by adding the creativity embedding of the triple to it, providing the model with a non-empty CLS: the Creativity CLS Embedding. In this setting, the penultimate layer is sized at 768 neurons, the same size as the BERT embeddings. On top of the architecture, a linear classifier is in charge of the predictions for the plausibility task, relying on the Creativity CLS Embedding.
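A sketch of the creativity neural network and of the injection step, written in PyTorch under our assumptions: the hidden layer sizes are read off Figure 3, and random tensors stand in for the real BERT embeddings.

```python
import torch
import torch.nn as nn

class CreativityNet(nn.Module):
    """Predicts (div, nov, ser, mag) from the summed triple embeddings.

    Sizes as in Figure 3: 2304 -> 2048 -> 2048 -> 1024 -> 768 -> 4,
    ReLU activations and dropout p = 0.5 on the fully connected layers.
    """

    def __init__(self, dropout=0.5):
        super().__init__()
        sizes = [768 * 3, 2048, 2048, 1024, 768]
        layers = []
        for d_in, d_out in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU(), nn.Dropout(dropout)]
        self.hidden = nn.Sequential(*layers)
        self.out = nn.Linear(768, 4)

    def forward(self, x):
        h = self.hidden(x)   # 768-d last hidden activations: the creativity embedding
        return self.out(h), h

# Injection sketch: add the creativity embedding to the [CLS] embedding.
triple_emb = torch.randn(8, 768 * 3)  # stand-in for summed head/rel/tail embeddings
cls_emb = torch.randn(8, 768)         # stand-in for the [CLS] token embeddings
net = CreativityNet()
metric_preds, creativity_emb = net(triple_emb)
creativity_cls = cls_emb + creativity_emb  # the Creativity CLS Embedding
```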
4   Experiment

In this experiment we randomly sample triples from the WordNet11 (Miller, 1995) dataset (50000 train, 5000 validation, 3000 test, with positive and negative labels balanced).

Creativity Neural Network. As stated in the previous sections, we compute the four metrics on each triple of the dataset to create the ground truth. As similarity function we use the cosine similarity, which returns a value between 0 and 1, with high values for high similarity. We applied the cosine similarity function after transforming words and sentences into embeddings provided by the BERT model.
We encountered slowdowns only with the novelty metric. The number of nodes is not predictable a priori in our setting, and the mathematical nature of the formula is sensitive to a high number of nodes: peaks of memory allocation can occur, as well as long computation times. We limited failures due to out-of-memory errors or timeouts of the scheduled jobs by applying the ”divide et impera” paradigm and other adjustments. The length of the paths p, seen as the recursion depth, is fixed to 5. For each node involved in the recursion, the maximum number of neighbour nodes n considered is fixed to 20.
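A sketch of this context collection follows, under our assumptions about the graph representation (a dict mapping each entity to the triples it appears in, with triples as hashable tuples); the paper describes a recursive traversal, rendered iteratively here.

```python
def collect_context(graph, triple, depth=5, max_neighbors=20):
    """Gather the context of a triple: follow paths of length up to `depth`
    from the triple's entities, keeping at most `max_neighbors` neighbouring
    triples per node (p = 5 and n = 20 in the experiment)."""
    context, frontier, seen = [], [triple], {triple}
    for _ in range(depth):
        next_frontier = []
        for head, _, tail in frontier:
            neighbors = (graph.get(head, []) + graph.get(tail, []))[:max_neighbors]
            for nb in neighbors:
                if nb not in seen:
                    seen.add(nb)
                    context.append(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return context
```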
Once we obtain all the metric values, we can train the Creativity Neural Network as a regression problem. We use: as loss criterion, the mean squared error loss; as optimizer, AdamW with learning rate = 0.001, betas = (0.9, 0.999), epsilon = 1e−08, and weight decay = 0.01; as scheduler, StepLR with step size = 10 and gamma = 0.1. We train the model for 10 epochs with a batch size of 512.
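This configuration translates directly into PyTorch; the sketch below reproduces the reported hyperparameters, reusing the CreativityNet class sketched earlier, with a random TensorDataset standing in for the WordNet11-derived ground truth.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset

model = CreativityNet()  # the regression network of Section 3.2
criterion = nn.MSELoss()
optimizer = AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                  eps=1e-8, weight_decay=0.01)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Stand-in data: summed triple embeddings -> four metric targets.
loader = DataLoader(TensorDataset(torch.randn(50000, 2304),
                                  torch.rand(50000, 4)),
                    batch_size=512, shuffle=True)

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        preds, _ = model(x)
        loss = criterion(preds, y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```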
To evaluate performance on the test set we compute the explained variance score = −0.4493, mean absolute error = 0.1733, mean squared error = 0.0388, and R2 score = −6.7694. Despite the small values of the mean squared and absolute errors, R2 tells us that the model does not approximate the distribution better than the ”best-fit” line. This is probably due to the low entropy of the input metric values, which, on inspection, cluster around the value 0.5.
Triple Plausibility Task. The tokenized triple is fed to the Creativity Neural Network, obtaining the creativity embedding. This is added to the CLS embedding token, and the triple flows through the Transformer stack. The BERT model is then used to make predictions and address the triple plausibility task, putting a linear classifier on top of the Transformer stack. We use the binary cross-entropy loss as loss function. The literature suggests few epochs and samples for the finetuning process. We finetune BERT for 2 epochs; afterwards we freeze the weights of the model, training only the classifier layer for 3 epochs. We select BERT base uncased as the baseline model; as optimizer, AdamW with learning rate = 5e−05; as scheduler, a linear scheduler with warm-up proportion = 10%; for the classifier, dropout probability = 0.5. We fix the maximum sequence length at 100 tokens, as none of the triples after tokenization exceeds this number of tokens.
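For reference, the fine-tuning setup just described can be sketched as follows (training loop omitted; the classifier head, the step counts, and the use of the transformers scheduler helper are our assumptions):

```python
import torch.nn as nn
from torch.optim import AdamW
from transformers import BertModel, get_linear_schedule_with_warmup

bert = BertModel.from_pretrained("bert-base-uncased")  # BERT base uncased baseline
classifier = nn.Sequential(nn.Dropout(0.5), nn.Linear(768, 1))
criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on plausible / not plausible

optimizer = AdamW(list(bert.parameters()) + list(classifier.parameters()), lr=5e-5)
steps_per_epoch = 98                       # placeholder: batches per epoch
total_steps = steps_per_epoch * (2 + 3)    # 2 fine-tuning + 3 classifier-only epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),  # warm-up proportion = 10%
    num_training_steps=total_steps,
)

# Phase 1: fine-tune BERT and classifier for 2 epochs (max length 100 tokens).
# Phase 2: freeze BERT and train only the classifier for 3 more epochs:
for p in bert.parameters():
    p.requires_grad = False
```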
5   Result and Conclusion

In this paper we investigated whether the defined creativity embedding improves the triple plausibility task, exploiting the BERT model. We do not detect an increase in performance (Table 1) when comparing ourselves to the KG-BERT results. In this comparison we should point out that the sample used is one fifth of the complete WN11 dataset. This result is somewhat contrary to our expectations, as the creativity embeddings represent, in some way, a priori information. A possible explanation might be the learning methodology of the creativity embedding: we suppose that a significant loss of information occurred in the process. Further research might explore other types of embeddings (Grohe, 2020), such as graph2vec, and different integrations of the proposed metrics. Future experimental investigations may try different parameter configurations; for example, the number of nodes considered could intuitively change the values of metrics such as novelty. Nevertheless, more in-depth data analysis on the dataset used, on the corresponding knowledge graph, and on data correlations could provide additional insights. In future work, we will consider different combinations of the defined metrics to train the creativity neural network. It is possible that some metrics are more relevant than others for the task; selecting only the strictly relevant metrics will lighten the computational effort and will give us information about the correlations between metrics and results. To conclude, we aim to bring the NLP community's attention to new research topics on creativity.

            Number of triples              Model Metrics
            Train    Val    Test     Accuracy  Recall  Precision     F1
 CE+BERT     50000   3000    5000      0.5093  0.8510     0.5102  0.6379
 KG-BERT    225162   5218   21088      0.9334  0.9345     0.9324  0.9334

Table 1: Triple plausibility experiment results.

Acknowledgments

Computational resources were provided by HPC@POLITO, a project of Academic Computing within the Department of Control and Computer Engineering at the Politecnico di Torino (http://www.hpc.polito.it). We thank the reviewers of the CLiC-it 2020 conference for their comments and advice.

References

Tim Berners-Lee, James Hendler, and Ora Lassila. 2001. The semantic web. Scientific American, 284(5):34–43.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
Martin Grohe. 2020. Word2vec, node2vec, graph2vec, x2vec: Towards a theory of vector embeddings of structured data. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS'20, pages 1–16, New York, NY, USA. Association for Computing Machinery.

P. Karampiperis, A. Koukourikos, and E. Koliopoulou. 2014. Towards machines for measuring creativity: The use of computational tools in storytelling activities. In 2014 IEEE 14th International Conference on Advanced Learning Technologies, pages 508–512.

Polina Kuznetsova, Jianfu Chen, and Yejin Choi. 2013. Understanding and quantifying creativity in lexical composition. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1246–1258, Seattle, Washington, USA, October. Association for Computational Linguistics.

Carolyn Lamb, Daniel G. Brown, and Charles L. A. Clarke. 2018. Evaluating computational creativity: An interdisciplinary tutorial. ACM Computing Surveys, 51(2), February.

Agnieszka Ławrynowicz. 2020. Creative AI: A new avenue for the Semantic Web? Semantic Web, pages 69–78.

George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

Diego Monti, Enrico Palumbo, Giuseppe Rizzo, and Maurizio Morisio. 2019. Sequeval: An offline evaluation framework for sequence-based recommender systems. Information, 10(5):174.

Ulrike Padó, Matthew W. Crocker, and Frank Keller. 2009. A probabilistic model of semantic plausibility in sentence processing. Cognitive Science, 33(5):794–838.

Marco A.F. Pimentel, David A. Clifton, Lei Clifton, and Lionel Tarassenko. 2014. A review of novelty detection. Signal Processing, 99:215–249.

Yu-Ping Ruan, Zhen-Hua Ling, Xiaodan Zhu, Quan Liu, and Jia-Chen Gu. 2020. Generating diverse conversation responses by creating and ranking multiple candidates. Computer Speech & Language, 62:101071.

Mihai Surdeanu, Massimiliano Ciaramita, and Hugo Zaragoza. 2008. Learning to rank answers on large online QA collections. In Proceedings of ACL-08: HLT, pages 719–727.

Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A free collaborative knowledgebase. Communications of the ACM, 57(10):78–85, September.

Quan Wang, Bin Wang, and Li Guo. 2015. Knowledge base completion using embeddings and rules. In Proceedings of IJCAI'15, pages 1859–1865. AAAI Press.

Su Wang, Greg Durrett, and Katrin Erk. 2018. Modeling semantic plausibility by injecting world knowledge. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 303–308, New Orleans, Louisiana, June. Association for Computational Linguistics.

Liang Yao, Chengsheng Mao, and Yuan Luo. 2019. KG-BERT: BERT for knowledge graph completion. arXiv preprint arXiv:1909.03193.