<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From Sartre to Frege in Three Steps: A* Search for Enriching Semantic Text Similarity Measures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Colla</string-name>
          <email>davide.colla@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrico Mensa</string-name>
          <email>enrico.mensa@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Leontino</string-name>
          <email>marco.leontino@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele P. Radicioni</string-name>
          <email>daniele.radicioni@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Turin, Computer Science Department</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we illustrate a preliminary investigation into semantic text similarity. In particular, the proposed approach is aimed at complementing and enriching the categorization results obtained by employing standard distributional resources. We found that the paths connecting entities and concepts from the documents at stake provide interesting information on the connections between document pairs. Such a semantic browsing device enables further semantic processing, aimed at unveiling contexts and hidden connections (possibly not explicitly mentioned in the documents) between text documents.<sup>1</sup></p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In the last few years much effort has been
spent on extracting information contained in text
documents, and a large number of resources have
been developed that allow exploring domain-based
knowledge, defining a rich set of specific
semantic relationships between nodes
        <xref ref-type="bibr" rid="ref2 ref25 ref27 ref30 ref34 ref39">(Vrandečić
and Krötzsch, 2014; Auer et al., 2007; Navigli
and Ponzetto, 2012)</xref>
        . Being able to extract and
to make available the semantic content of
documents is a challenging task, with beneficial impact
on different applications, such as document
categorisation
        <xref ref-type="bibr" rid="ref6">(Carducci et al., 2019)</xref>
        , keyword
extraction
        <xref ref-type="bibr" rid="ref7">(Colla et al., 2017)</xref>
        , question answering,
text summarisation, semantic text comparison,
building explanations/justifications for similarity
judgements
        <xref ref-type="bibr" rid="ref8">(Colla et al., 2018)</xref>
        and more. In this
paper we present an approach aimed at extracting
meaningful information contained in text
documents, also based on background information
contained in an encyclopedic resource such as
Wikidata
        <xref ref-type="bibr" rid="ref25 ref30 ref34 ref39">(Vrandečić and Krötzsch, 2014)</xref>
        .
      </p>
      <p>1 Copyright © 2019 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>Although our approach has been devised for a
specific application domain (PhD theses in
philosophy), we argue that it can be easily extended to
further application settings. The approach focuses
on the ability to extract relevant pieces of
information from text documents, and to map them onto
the nodes of a knowledge graph, obtained from
semantic networks representing encyclopedic and
lexicographic knowledge. In this way it is
possible to compare different documents based on their
graph-based description, which is directly
anchored to their semantic content.</p>
      <p>
        We propose a system to assess the similarity
between textual documents, hybridising the
propositional approach (such as traditional statements
expressed through RDF triples) with a
distributional description
        <xref ref-type="bibr" rid="ref12">(Harris, 1954)</xref>
        of the nodes
contained in the knowledge graph, which are
represented with word embeddings
        <xref ref-type="bibr" rid="ref23 ref35 ref5">(Mikolov et al.,
2013; Camacho-Collados et al., 2015; Speer et al.,
2017)</xref>
        . This step makes it possible to obtain similarity
measures (based on vector descriptions, and on
pathfinding algorithms) and explanations (represented
as paths over a semantic network) more focused
on the semantic definition of the concepts and entities
involved in the analysis.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>Surveying the existing approaches requires
briefly introducing the most widely used resources
along with their main features.</p>
      <sec id="sec-2-1">
        <title>Resources</title>
        <p>
          BabelNet (BN) is a wide-coverage multilingual
semantic network, originally built by integrating
WordNet
          <xref ref-type="bibr" rid="ref24">(Miller, 1995)</xref>
          and Wikipedia
          <xref ref-type="bibr" rid="ref26">(Navigli
and Ponzetto, 2010)</xref>
          . NASARI is a vectorial
resource whose senses are represented as vectors
associated with BabelNet synsets
          <xref ref-type="bibr" rid="ref5">(Camacho-Collados
et al., 2015)</xref>
          . Wikidata is a knowledge graph based
on Wikipedia, whose goal is to overcome
problems related to information access by creating new
ways for Wikipedia to manage its data on a global
scale
          <xref ref-type="bibr" rid="ref25 ref30 ref34 ref39">(Vrandečić and Krötzsch, 2014)</xref>
          .
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.1 Approaches to semantic text similarity</title>
        <p>Most of the literature on computing semantic similarity
between documents can be arranged into three
main classes.</p>
        <p>
          Word-based similarity. Word-based metrics are
used to compute the similarity between documents
based on their terms; examples of features
analysed are common morphological structures
          <xref ref-type="bibr" rid="ref16">(Islam
and Inkpen, 2008)</xref>
          and word overlap
          <xref ref-type="bibr" rid="ref15">(Huang et
al., 2011)</xref>
          between the texts. In one of the most
popular theories of similarity (Tversky’s
contrast model) the similarity of a word pair is defined
as a direct function of their common traits
          <xref ref-type="bibr" rid="ref38">(Tversky, 1977)</xref>
          . This notion of similarity has been
recently adjusted to model human similarity
judgments for short texts: the Symmetrical Tversky
Ratio Model
          <xref ref-type="bibr" rid="ref18">(Jimenez et al., 2013)</xref>
          , and employed
to compute semantic similarity between word- and
sense-pairs
          <xref ref-type="bibr" rid="ref20 ref21 ref7 ref8">(Mensa et al., 2017; Mensa et al.,
2018)</xref>
          .
        </p>
        <p>
          Corpus-based similarity. Corpus-based
measures try to identify the degree of similarity
between words using information derived from large
corpora
          <xref ref-type="bibr" rid="ref10 ref22 ref31">(Mihalcea et al., 2006; Gomaa and Fahmy,
2013)</xref>
          .
        </p>
        <p>
          Knowledge-based similarity. Knowledge-based
measures try to estimate the degree of
semantic similarity between documents by using
information drawn from semantic networks
          <xref ref-type="bibr" rid="ref22">(Mihalcea
et al., 2006)</xref>
          . In most cases only the
hierarchical structure of the information contained in the
network is considered, without considering the
relation types between nodes
          <xref ref-type="bibr" rid="ref17 ref33">(Jiang and Conrath,
1997; Richardson et al., 1994)</xref>
          ; some authors
consider the “is-a” relation
          <xref ref-type="bibr" rid="ref32">(Resnik, 1995)</xref>
          , but
leave the more domain-dependent ones unexploited.
Moreover, only concepts are usually considered,
omitting Named Entities.
        </p>
        <p>
          An emerging paradigm is that of
knowledge graphs. Knowledge graph extraction is a
challenging task, particularly popular in recent
years
          <xref ref-type="bibr" rid="ref25 ref30 ref34 ref39">(Schuhmacher and Ponzetto, 2014)</xref>
          .
Several approaches have been developed, e.g., aimed
at extracting knowledge graphs from textual
corpora, attaining a network focused on the type of
documents at hand
          <xref ref-type="bibr" rid="ref31">(Pujara et al., 2013)</xref>
          . Such
approaches may be affected by scalability and
generalisation issues. In recent years many resources
representing knowledge in a structured form
have been proposed that build on encyclopedic
resources
          <xref ref-type="bibr" rid="ref2 ref25 ref30 ref34 ref36 ref39">(Auer et al., 2007; Suchanek et al., 2007;
Vrandečić and Krötzsch, 2014)</xref>
          .
        </p>
        <p>
          As regards semantic similarity, a
framework has been proposed based on entity extraction
from documents, providing mappings to
knowledge graphs in order to compute semantic
similarities between documents
          <xref ref-type="bibr" rid="ref29">(Paul et al., 2016)</xref>
          .
Their similarity measures are mostly based on the
network structure, without introducing other
instruments such as embeddings, which are widely
acknowledged as relevant in semantic similarity.
Hecht et al. (2012) propose a framework endowed
with explanatory capabilities, built on similarity
measures based on relations between Wikipedia pages.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 The System</title>
      <p>In this Section we illustrate the generation process
of the knowledge graph from Wikidata, which will
be instrumental in building paths across documents.
Such paths are then used, at a later time, to enrich
the similarity scores computed during the
classification.</p>
      <sec id="sec-3-1">
        <title>3.1 Knowledge Graph Extraction</title>
        <p>The first step consists of extracting a
knowledge graph for the given reference domain.
Wikidata is first searched for concepts and entities
related to the domain being analysed. Starting
from the extracted elements, which constitute
the basic nodes of the knowledge graph, we again
query Wikidata for relevant semantic
relationships towards other nodes, not necessarily
already extracted in the previous step. The types
of relevant relationships depend on the domain
at hand. For the philosophical domain, we
selected a set of 30 relations relevant for
comparing the documents. For example, we considered
the relation movement, which represents the literary,
artistic, scientific or philosophical movement; the
relation studentOf, which represents the person who
taught the considered philosopher; and the
relation influencedBy, which represents the person by
whose ideas the considered philosopher’s ideas
have been influenced. In this way, we obtain a graph
where each node is a concept or entity extracted
from Wikidata; such nodes are connected by
edges labeled with specific semantic relations.</p>
        <p>[Figure 1: excerpt of the knowledge graph. Labels
recoverable from the figure: the document snippets “… the relevance
of Kant is put in perspective by …” and “… the philosophy of
Baruch Spinoza, with analysis …”; the nodes Christian Jakob Kraus,
Rationalism and René Descartes; the relations hasAwardReceived,
hasMovement and isMovementOf.]</p>
        <p>The obtained graph is then mapped onto
BabelNet. At the end of the first stage, the
knowledge graph represents the relevant domain
knowledge (Figure 1) encoded through BabelNet nodes,
which are connected through the rich set of relations
available in Wikidata. Each text document can be
linked to the knowledge graph, so that semantic
comparisons can be made by analysing the
possible paths connecting document pairs.</p>
        <p>Without loss of generality, we considered the
philosophical domain, and extracted a
knowledge graph containing 22,672 nodes and 135,910
typed edges; Wikidata entities were mapped onto
BabelNet in approximately 90% of cases.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Information extraction and semantic similarity</title>
        <p>
          The second step consists in connecting the
documents to the obtained knowledge graph. We
harvested a set of 475,383 UK doctoral theses in
several disciplines through the Electronic Theses
Online Service (EThOS) of the British National
Library.<sup>2</sup> At first, concepts and entities related to the
reference domain were extracted from the
considered documents, with a special focus on two
different types of information, namely concepts and
Named Entities. Concepts are keywords or
multiword expressions representing meaningful items
related to the domain (such as, e.g.,
‘philosophy-of-mind’, ‘Rationalism’, etc.) while Named
Entities are persons, places or organisations (mostly
universities, in the present setting) strongly related
to the considered domain. Named entities are
extracted using the Stanford CoreNLP NER
module
          <xref ref-type="bibr" rid="ref19 ref30">(Manning et al., 2014)</xref>
          improved with
extraction rules based on morphological and
syntactic patterns, considering for example sequences
of words starting with a capital letter or
associated with a particular Part-Of-Speech pattern.
Similarly, we extract relevant concepts based on
particular PoS patterns (such as
NOUN-PREPOSITION-NOUN, thereby recognizing, for example,
philosophy of mind).
        </p>
        <p>2 https://ethos.bl.uk.</p>
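        <p>The PoS-pattern step for concepts can be illustrated with a minimal Python sketch. It assumes that tokens have already been tagged by some PoS tagger; the tag names (NOUN, ADP for prepositions) and the function name match_pos_pattern are illustrative assumptions and do not reproduce the actual extraction rules used in the system.</p>
        <preformat><![CDATA[
```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, coarse PoS tag); the tagset is an assumption


def match_pos_pattern(tokens: List[Token], pattern: List[str]) -> List[str]:
    """Return every contiguous word sequence whose tags equal `pattern`."""
    matches = []
    width = len(pattern)
    # Slide a window of the pattern's width over the tagged sentence.
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if [tag for _, tag in window] == pattern:
            matches.append(" ".join(word for word, _ in window))
    return matches


# Hypothetical, pre-tagged input sentence.
tagged = [
    ("the", "DET"), ("philosophy", "NOUN"), ("of", "ADP"),
    ("mind", "NOUN"), ("is", "VERB"), ("central", "ADJ"),
]
concepts = match_pos_pattern(tagged, ["NOUN", "ADP", "NOUN"])
print(concepts)
```
]]></preformat>
        <p>On this toy sentence the NOUN-PREPOSITION-NOUN window selects the expression philosophy of mind; a real pipeline would additionally filter candidates against the knowledge graph.</p>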
        <p>
          We are aware that we are not considering the
problem of word sense disambiguation
          <xref ref-type="bibr" rid="ref20 ref28 ref35 ref37 ref7">(Navigli,
2009; Tripodi and Pelillo, 2017)</xref>
          . The
underlying assumption is that as long as we are concerned
with a narrow domain, this is a less severe
problem: e.g., if we recognise the person Kant in a
document related to philosophy, the person
cited is probably the philosopher Immanuel
Kant (please refer to Figure 1), rather than the less
philosophical Gujarati poet, playwright and
essayist Kavi Kant.<sup>3</sup>
        </p>
        <p>By mapping concepts and Named Entities
found in a document onto the graph, we gain a set
of access points to the knowledge graph. Once
the access points to the knowledge graph have been
acquired for a pair of documents, we can compute the
semantic similarity between documents by analysing
the paths that connect them.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Building Paths across Documents</title>
        <p>The developed framework is used to compute
paths between pairs of senses and/or entities
occurring in two given documents. Each edge in the
knowledge graph has an associated semantic
relation type (such as, e.g., “hasAuthor”,
“influencedBy”, “hasMovement”). Each path
connecting two documents has the form
          <disp-formula><tex-math>\mathrm{DOC}_1 \xrightarrow{\mathrm{ACCESS}} \mathrm{SaulKripke} \xrightarrow{\mathrm{influencedBy}} \mathrm{LudwigWittgenstein} \xrightarrow{\mathrm{influencedBy}} \mathrm{BertrandRussell} \xrightarrow{\mathrm{influencedBy}} \mathrm{BaruchDeSpinoza} \xleftarrow{\mathrm{ACCESS}} \mathrm{DOC}_2</tex-math></disp-formula>
        </p>
        <p>In this case we can argue in favor of the relatedness
of the two documents based on the chain of
relationships illustrating that Saul Kripke (from
document d1) has been influenced-by Ludwig
Wittgenstein, who has been influenced-by Bertrand
Russell, who in turn has been influenced-by Baruch De
Spinoza, mentioned in d2. The whole set of paths
connecting elements from a document d1 to a
document d2 can be thought of as a form of evidence
of the closeness of the two documents: documents
with numerous shorter paths connecting them are
intuitively more related. Importantly enough, such
paths over the knowledge graph do not contain
general information (e.g., Kant was a man), but
rather they are highly domain-specific (e.g., Oskar
Becker had as doctoral student Jürgen Habermas).</p>
        <p>3 https://tinyurl.com/y3s9lsp7.</p>
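        <p>A path of this form can be represented as a chain of typed edges. The following Python sketch is only illustrative: the names Edge, render_path and path_evidence are hypothetical, and summarising a set of paths by their count and mean length is just one possible reading of the intuition that numerous shorter paths indicate relatedness, not a measure defined in this paper.</p>
        <preformat><![CDATA[
```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source node, relation type, target node)


def render_path(doc_a: str, edges: List[Edge], doc_b: str) -> str:
    """Verbalise a path between two documents as a readable chain."""
    if not edges:
        raise ValueError("a path needs at least one edge")
    parts = [doc_a, "ACCESS", edges[0][0]]
    for _, relation, target in edges:
        parts += [relation, target]
    parts += ["ACCESS", doc_b]
    return " ".join(parts)


def path_evidence(paths: List[List[Edge]]) -> Tuple[int, float]:
    """Summarise a set of paths as (number of paths, mean length in edges)."""
    if not paths:
        return 0, 0.0
    return len(paths), sum(len(p) for p in paths) / len(paths)


# The example path discussed in the text.
kripke_to_spinoza = [
    ("SaulKripke", "influencedBy", "LudwigWittgenstein"),
    ("LudwigWittgenstein", "influencedBy", "BertrandRussell"),
    ("BertrandRussell", "influencedBy", "BaruchDeSpinoza"),
]
explanation = render_path("DOC1", kripke_to_spinoza, "DOC2")
print(explanation)
print(path_evidence([kripke_to_spinoza]))
```
]]></preformat>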
      </sec>
    </sec>
    <sec id="sec-4">
      <title>A* Search</title>
      <p>
        The computation of the paths is performed via a
modified version of the A* algorithm
        <xref ref-type="bibr" rid="ref13">(Hart et al.,
1968)</xref>
        . In particular, paths among access nodes are
returned in order, from the shortest to the longest
one. Given the huge size of the network,
and since we are guaranteed to retrieve the shortest
paths first, we stop the search after one second of
computation time.
      </p>
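      <p>The shortest-first enumeration with a time budget can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the modified algorithm used in the system: it uses unit edge costs and a zero heuristic (in which case A* behaves like uniform-cost search), and the toy graph, the function name paths_shortest_first and the parameter budget_seconds are hypothetical.</p>
      <preformat><![CDATA[
```python
import heapq
import time
from typing import Dict, Iterator, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(relation type, neighbour)]
Path = Tuple[List[str], List[str]]        # (visited nodes, traversed relations)


def paths_shortest_first(graph: Graph, start: str, goal: str,
                         budget_seconds: float = 1.0) -> Iterator[Path]:
    """Yield simple paths from start to goal in non-decreasing length,
    stopping as soon as the time budget is exhausted."""
    deadline = time.monotonic() + budget_seconds
    counter = 0  # unique tie-breaker, so the heap never compares lists
    frontier = [(0, counter, [start], [])]
    while frontier and time.monotonic() < deadline:
        cost, _, nodes, relations = heapq.heappop(frontier)
        current = nodes[-1]
        if current == goal:
            yield nodes, relations
            continue
        for relation, neighbour in graph.get(current, []):
            if neighbour in nodes:  # keep paths simple (no cycles)
                continue
            counter += 1
            heapq.heappush(
                frontier,
                (cost + 1, counter, nodes + [neighbour], relations + [relation]),
            )


# Toy edges for illustration only; they are not extracted from Wikidata.
toy_graph: Graph = {
    "SaulKripke": [("influencedBy", "LudwigWittgenstein")],
    "LudwigWittgenstein": [("influencedBy", "BertrandRussell"),
                           ("influencedBy", "GottlobFrege")],
    "GottlobFrege": [("relatedTo", "BertrandRussell")],
    "BertrandRussell": [("influencedBy", "BaruchDeSpinoza")],
}
found = list(paths_shortest_first(toy_graph, "SaulKripke", "BaruchDeSpinoza"))
for nodes, relations in found:
    print(len(relations), " -> ".join(nodes))
```
]]></preformat>
      <p>Because the frontier is ordered by path length, interrupting the search at the deadline can only discard paths at least as long as those already returned, which is what makes a fixed time budget a reasonable cut-off.</p>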
    </sec>
    <sec id="sec-5">
      <title>4 Experimentation</title>
      <p>In this Section we report the results of a
preliminary experimentation: given a dataset of PhD
theses, we first explore the effectiveness of standard
distributional approaches to compute the semantic
similarity between document pairs; we then
elaborate on how such results can be complemented
and enriched through the computation of paths
between entities therein.</p>
      <p>Experimental setting. We extracted 4 classes of
documents (100 for each class) from the EThOS
dataset. For each record we retrieved the title and
abstract fields, which were used for subsequent
processing. We selected documents containing
‘Antibiotics’, ‘Molecular’, ‘Hegel’ or ‘Ethics’ either
in their title (in 15 documents per class) or in their
abstract (15 documents per class). Documents
contain on average 163.5 tokens (standard
deviation = 39.3), including both title and
abstract. The underlying rationale has been that of
selecting documents from two broad areas, each
one composed of two different sets of data,
having to do with medical disciplines and molecular
biology in the former case, and with Hegelianism
and the broad theme of ethics in the latter case.
Intra-domain classes (that is, both
‘Antibiotics’-‘Molecular’ and ‘Hegel’-‘Ethics’) are not
supposed to be linearly separable, as mostly occurs
in real problems. Of course, this feature makes
the categorization problem more interesting. The
dataset was used to compute some descriptive statistics
(such as inverse document frequency)
characterizing the whole collection of considered
documents.</p>
      <p>From the aforementioned set of 400 documents
we randomly chose a subset of 20 documents, 5
documents for each of the 4 classes from those
containing the terms (either ‘Antibiotics’,
‘Molecular’, ‘Hegel’ or ‘Ethics’) in the title. This
selection strategy was aimed at selecting more clearly
individuated documents, exhibiting a higher
similarity degree within classes than across classes.<sup>4</sup></p>
      <sec id="sec-5-1">
        <title>4.1 Investigation on Text Similarity with Standard Distributional Approaches</title>
      </sec>
      <sec id="sec-5-3">
        <title>GloVe and Word Embedding Similarity</title>
        <p>
          The similarity scores were computed for each
document pair with a Word Embedding Similarity
approach
          <xref ref-type="bibr" rid="ref1">(Agirre et al., 2016)</xref>
          . In particular, each
document d has been provided with a vector
description obtained by averaging the GloVe embeddings
          <xref ref-type="bibr" rid="ref30">(Pennington et al., 2014)</xref>
          of all terms in the title and
abstract:
          <disp-formula><label>(1)</label><tex-math>\vec{N}_d = \frac{1}{|T_d|} \sum_{t_i \in T_d} \vec{t_i}</tex-math></disp-formula>
          where T<sub>d</sub> is the set of terms in the title and abstract of d, and each
          <inline-formula><tex-math>\vec{t_i}</tex-math></inline-formula>
          is the GloVe vector for the term t<sub>i</sub>.
Considering two documents d<sub>1</sub> and d<sub>2</sub>, each one
associated with a particular vector
          <inline-formula><tex-math>\vec{N}_{d_i}</tex-math></inline-formula>
          , we compare
them using the cosine similarity metric:
          <disp-formula><label>(2)</label><tex-math>\mathrm{sim}(\vec{N}_{d_1}, \vec{N}_{d_2}) = \frac{\vec{N}_{d_1} \cdot \vec{N}_{d_2}}{\|\vec{N}_{d_1}\| \, \|\vec{N}_{d_2}\|}</tex-math></disp-formula>
        </p>
        <p>
          The obtained similarities between each document
pair are reported in Figure 2(a).<sup>5</sup> The computed
similarities show that overall this approach is
sufficient to discriminate the scientific doctoral theses
from the philosophical ones. In particular, the top
green triangle shows the correlation scores among
antibiotics documents, while the bottom
triangle reports the correlation scores among
philosophical documents. The red square
graphically illustrates the poor correlation between the
two classes of documents. On the other hand,
the subclasses (Hegel-Ethics and
Antibiotics-Molecular) could not be separated. Given
that word embeddings are known to conflate all
senses in the description of each term
          <xref ref-type="bibr" rid="ref21 ref4">(Camacho-Collados and Pilehvar, 2018)</xref>
          , this approach
performed surprisingly well in comparison to a
baseline based on a one-hot vector representation, only
dealing with term-based features (Figure 2(b)).
        </p>
        <p>4 In future work we will verify such assumptions by
involving domain experts in order to validate and/or refine the
heuristics employed in the document selection.</p>
        <p>5 The plot was computed using the corrplot package in R.</p>
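        <p>The averaging and cosine steps described above can be illustrated with a short, dependency-free Python sketch. The two-dimensional toy embeddings and the function names are hypothetical, and silently skipping terms without a vector is an assumption of this sketch rather than a documented choice of the system; real GloVe vectors would be loaded from the pre-trained files.</p>
        <preformat><![CDATA[
```python
import math
from typing import Dict, List


def document_vector(terms: List[str],
                    embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the embeddings of the terms that have a vector.
    Out-of-vocabulary terms are skipped (an assumption of this sketch)."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        raise ValueError("no term of the document has an embedding")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors, for illustration only (not real GloVe values).
toy_embeddings = {
    "hegel": [1.0, 0.0],
    "ethics": [0.8, 0.2],
    "antibiotics": [0.0, 1.0],
    "molecular": [0.1, 0.9],
}
d1 = document_vector(["hegel", "ethics", "unknownterm"], toy_embeddings)
d2 = document_vector(["antibiotics", "molecular"], toy_embeddings)
print(cosine_similarity(d1, d1), cosine_similarity(d1, d2))
```
]]></preformat>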
      </sec>
      <sec id="sec-5-4">
        <title>NASARI and Sense Embedding Similarity</title>
        <p>
          We then explored the hypothesis that
semantic knowledge can be beneficial for better
separating documents: after performing word sense
disambiguation (the Babelfy service was
employed
          <xref ref-type="bibr" rid="ref25">(Moro et al., 2014)</xref>
          ), we used the NASARI
embedded version to compute the vector
          <inline-formula><tex-math>\vec{N}_d</tex-math></inline-formula>
          as
the average of all vectors associated with the senses
contained in S<sub>d</sub>, basically employing the same
formula as in Equation 1. We then computed the
similarity matrix, displayed in Figure 2(c). It clearly
emerges that NASARI is also well suited to solve
a classification task when domains are well
separated. However, also in this case the adopted
approach does not seem to discriminate well within
the two main classes: for instance, the square with
vertices E1-H1, E5-H1, E5-H5, E1-H5 should be
reddish, indicating a lower average similarity
between documents pertaining to the Hegel and Ethics
classes. We experimented with a wide variety of
conditions and parameters, obtaining slightly
better similarity scores by weighting NASARI
vectors with sense idf and sense connectivity (c,
obtained from BabelNet),
where H(s<sub>i</sub>) is the number of documents
containing the sense s<sub>i</sub>. The resulting similarity scores
are provided in Figure 2(d).
        </p>
        <p>Documents are in fact too close, and
presumably the adopted representation (merging all
senses in each document) is not as precise as
needed. In this setting, we tried to investigate
document similarity based on the connections
between their underlying sets of senses. Such
connections were computed on the aforementioned
graph.</p>
      </sec>
      <sec id="sec-5-5">
        <title>4.2 Enriching Text Similarity with Paths across Documents</title>
        <p>In order to examine the connections between the
considered documents we focused on the
philosophical portion of our dataset, and exploited the
knowledge graph described in Section 3. The
computed paths are not presently used to refine
the similarity scores, but only as a suggestion to
characterize possible connections between
document pairs. The extracted paths contain precious
information that can be easily integrated in
downstream applications, by providing specific
information that can be helpful for domain experts
to achieve their objectives (e.g., in semantically
browsing text documents, in order to find influence
relations across different philosophical schools).</p>
        <p>
          As anticipated, building paths among the
fundamental concepts of the documents allows
grasping important ties between the documents’
topics. For instance, one of the extracted paths
(between the author ‘Hegel’ and the work ‘Sense
and Reference’
          <xref ref-type="bibr" rid="ref9">(Frege, 1948)</xref>
          ) shows the
connections between the entities at stake as follows.
G.W.F. Hegel hasMovement Continental
Philosophy, which is in turn the movementOf H.L.
Bergson, who has been influencedBy G. Frege, who
finally hasNotableWork Sense and Reference. The
semantic specificity of this information provides
precious insights that allow for a proper
consideration of the relevance of the second document w.r.t.
the first one. It is worth noting that the fact that
Hegel is a continental philosopher is trivial (tacit
knowledge) for philosophers, and was most
probably left implicit in the thesis abstract, while it can
be a relevant piece of information for a system
asked to assess the similarity of two
philosophical documents. Also, this sort of path over the
extracted knowledge graph enables a form of
semantic browsing that benefits from the rich set of
Wikidata relations paired with the valuable
coverage ensured by BabelNet on domain-specific
concepts and entities.
        </p>
        <p>The illustrated approach allows the
uncovering of insightful and specific connections between
document pairs. However, this preliminary study
also pointed out some issues. One key problem is
the small number of named entities contained in the
considered documents (e.g., E5 only has one access
point, while E3 has none). Another issue has to
do with the inherently high connectivity of some
nodes of the knowledge graph (hubness). For
instance, the nodes Philosophy, Plato and Aristotle
are highly connected, which results in the extraction
of some trivial and uninteresting paths among the
specific documents. The first issue could be
tackled by also considering the main concepts of a
document if no entity can be found, whilst the second
one could be mitigated by taking into account the
connectivity of the nodes as a negative parameter
while computing the paths.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Conclusions</title>
      <p>In this paper we have investigated the
possibility of enriching semantic text similarity measures
via symbolic and human readable knowledge. We
have shown that distributional approaches allow
for a satisfactory classification of documents
belonging to different topics; however, our
preliminary experimentation showed that they are not able
to capture the subtle aspects characterizing
documents in close areas. As we have argued,
exploiting paths over graphs to explore connections
between document pairs may be beneficial in making
explicit domain-specific links between documents.</p>
      <p>
        As future work, we could refine the
methodology for extracting the concepts in the
Knowledge Graph, defining approaches based on
specific domain-related ontologies. Two relevant
works, to this end, are the PhilOnto ontology,
which represents the structure of philosophical
literature
        <xref ref-type="bibr" rid="ref11 ref15">(Grenon and Smith, 2011)</xref>
        , and the InPho
taxonomy
        <xref ref-type="bibr" rid="ref3">(Buckner et al., 2007)</xref>
        , combining
automated information retrieval methods with
knowledge from domain experts. Both resources will
be employed in order to extract a more concise,
meaningful and discriminative Knowledge Graph.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The authors are grateful to the EThOS staff for
their prompt and kind support. Marco Leontino
has been supported by the REPOSUM project,
BONG CRT 17 01 funded by Fondazione CRT.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
          , Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and
          <string-name>
            <given-names>Janyce</given-names>
            <surname>Wiebe</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation</article-title>
          .
          <source>In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
          , pages
          <fpage>497</fpage>
          -
          <lpage>511</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>Sören Auer, Christian Bizer</article-title>
          , Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and
          <string-name>
            <given-names>Zachary</given-names>
            <surname>Ives</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Dbpedia: A nucleus for a web of open data</article-title>
          .
          <source>In The semantic web</source>
          , pages
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Cameron</given-names>
            <surname>Buckner</surname>
          </string-name>
          , Mathias Niepert, and
          <string-name>
            <given-names>Colin</given-names>
            <surname>Allen</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Inpho: the indiana philosophy ontology</article-title>
          .
          <source>APA Newsletters-newsletter on philosophy and computers</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ):
          <fpage>26</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Jose</surname>
          </string-name>
          Camacho-Collados and Mohammad Taher Pilehvar.
          <year>2018</year>
          .
          <article-title>From word to sense embeddings: A survey on vector representations of meaning</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>63</volume>
          :
          <fpage>743</fpage>
          -
          <lpage>788</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>José</surname>
            Camacho-Collados, Mohammad Taher Pilehvar, and
            <given-names>Roberto</given-names>
          </string-name>
          <string-name>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>NASARI: a novel approach to a semantically-aware representation of items</article-title>
          .
          <source>In Proceedings of NAACL</source>
          , pages
          <fpage>567</fpage>
          -
          <lpage>577</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Giulio</given-names>
            <surname>Carducci</surname>
          </string-name>
          , Marco Leontino, Daniele P Radicioni, Guido Bonino, Enrico Pasini, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Tripodi</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Semantically aware text categorisation for metadata annotation</article-title>
          .
          <source>In Italian Research Conference on Digital Libraries</source>
          , pages
          <fpage>315</fpage>
          -
          <lpage>330</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Davide</given-names>
            <surname>Colla</surname>
          </string-name>
          , Enrico Mensa, and Daniele P Radicioni.
          <year>2017</year>
          .
          <article-title>Semantic measures for keywords extraction</article-title>
          .
          <source>In Conference of the Italian Association for Artificial Intelligence</source>
          , pages
          <fpage>128</fpage>
          -
          <lpage>140</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Davide</given-names>
            <surname>Colla</surname>
          </string-name>
          , Enrico Mensa,
          <string-name>
            <given-names>Daniele P.</given-names>
            <surname>Radicioni</surname>
          </string-name>
          , and Antonio Lieto.
          <year>2018</year>
          .
          <article-title>Tell me why: Computational explanation of conceptual similarity judgments</article-title>
          .
          <source>In Proceedings of the 17th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU)</source>
          ,
          <source>Special Session on Advances on Explainable Artificial Intelligence, Communications in Computer and Information Science (CCIS)</source>
          ,
          <source>Cham</source>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Gottlob</given-names>
            <surname>Frege</surname>
          </string-name>
          .
          <year>1948</year>
          .
          <article-title>Sense and reference</article-title>
          .
          <source>The Philosophical Review</source>
          ,
          <volume>57</volume>
          (
          <issue>3</issue>
          ):
          <fpage>209</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
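          <string-name>
            <given-names>Wael H</given-names>
            <surname>Gomaa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aly A</given-names>
            <surname>Fahmy</surname>
          </string-name>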
          .
          <year>2013</year>
          .
          <article-title>A survey of text similarity approaches</article-title>
          .
          <source>International Journal of Computer Applications</source>
          ,
          <volume>68</volume>
          (
          <issue>13</issue>
          ):
          <fpage>13</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Grenon</surname>
          </string-name>
          and
          <string-name>
            <given-names>Barry</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Foundations of an ontology of philosophy</article-title>
          .
          <source>Synthese</source>
          ,
          <volume>182</volume>
          (
          <issue>2</issue>
          ):
          <fpage>185</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Zellig S</given-names>
            <surname>Harris</surname>
          </string-name>
          .
          <year>1954</year>
          .
          <article-title>Distributional structure</article-title>
          .
          <source>Word</source>
          ,
          <volume>10</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>146</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Peter E.</given-names>
            <surname>Hart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nils J.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bertram</given-names>
            <surname>Raphael</surname>
          </string-name>
          .
          <year>1968</year>
          .
          <article-title>A formal basis for the heuristic determination of minimum cost paths</article-title>
          .
          <source>IEEE Transactions on Systems Science and Cybernetics</source>
          , SSC-
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>100</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Brent</given-names>
            <surname>Hecht</surname>
          </string-name>
          , Samuel H Carton, Mahmood Quaderi, Johannes Schöning, Martin Raubal, Darren Gergle, and
          <string-name>
            <given-names>Doug</given-names>
            <surname>Downey</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Explanatory semantic relatedness and explicit spatialization for exploratory search</article-title>
          .
          <source>In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>415</fpage>
          -
          <lpage>424</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Cheng-Hui</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jian</given-names>
            <surname>Yin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Fang</given-names>
            <surname>Hou</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A text similarity measurement combining word semantic information with tf-idf method</article-title>
          .
          <source>Jisuanji Xuebao(Chinese Journal of Computers)</source>
          ,
          <volume>34</volume>
          (
          <issue>5</issue>
          ):
          <fpage>856</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Aminul</given-names>
            <surname>Islam</surname>
          </string-name>
          and
          <string-name>
            <given-names>Diana</given-names>
            <surname>Inkpen</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Semantic text similarity using corpus-based word similarity and string similarity</article-title>
          .
          <source>ACM Transactions on Knowledge Discovery from Data (TKDD)</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
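          <string-name>
            <given-names>Jay J</given-names>
            <surname>Jiang</surname>
          </string-name>
          and
          <string-name>
            <given-names>David W</given-names>
            <surname>Conrath</surname>
          </string-name>
          .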
          <year>1997</year>
          .
          <article-title>Semantic similarity based on corpus statistics and lexical taxonomy</article-title>
          .
          <source>arXiv preprint cmp-lg/9709008.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Sergio</given-names>
            <surname>Jimenez</surname>
          </string-name>
          , Claudia Becerra, and Alexander Gelbukh.
          <year>2013</year>
          .
          <article-title>Softcardinality-core: Improving text overlap with distributional measures for semantic textual similarity</article-title>
          .
          <source>In Proceedings of *SEM</source>
          <year>2013</year>
          , volume
          <volume>1</volume>
          , pages
          <fpage>194</fpage>
          -
          <lpage>201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , Mihai Surdeanu, John Bauer, Jenny Finkel,
          <string-name>
            <given-names>Steven J.</given-names>
            <surname>Bethard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>David</given-names>
            <surname>McClosky</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The Stanford CoreNLP natural language processing toolkit</article-title>
          .
          <source>In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          , pages
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Enrico</given-names>
            <surname>Mensa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daniele P.</given-names>
            <surname>Radicioni</surname>
          </string-name>
          , and Antonio Lieto.
          <year>2017</year>
          .
          <article-title>MERALI at SemEval-2017 Task 2 Subtask 1: a cognitively inspired approach</article-title>
          .
          <source>In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          , pages
          <fpage>236</fpage>
          -
          <lpage>240</lpage>
          , Vancouver, Canada, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Enrico</given-names>
            <surname>Mensa</surname>
          </string-name>
          , Daniele P Radicioni, and Antonio Lieto.
          <year>2018</year>
          .
          <article-title>COVER: a linguistic resource combining common sense and lexicographic information</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>52</volume>
          (
          <issue>4</issue>
          ):
          <fpage>921</fpage>
          -
          <lpage>948</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Rada</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          , Courtney Corley,
          <string-name>
            <given-names>Carlo</given-names>
            <surname>Strapparava</surname>
          </string-name>
          , et al.
          <year>2006</year>
          .
          <article-title>Corpus-based and knowledge-based measures of text semantic similarity</article-title>
          .
          <source>In AAAI</source>
          , volume
          <volume>6</volume>
          , pages
          <fpage>775</fpage>
          -
          <lpage>780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>George A</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>WordNet: a lexical database for English</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Moro</surname>
          </string-name>
          , Alessandro Raganato, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Entity linking meets word sense disambiguation: a unified approach</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>2</volume>
          :
          <fpage>231</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          and Simone Paolo Ponzetto.
          <year>2010</year>
          .
          <article-title>BabelNet: Building a very large multilingual semantic network</article-title>
          .
          <source>In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>216</fpage>
          -
          <lpage>225</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          and Simone Paolo Ponzetto.
          <year>2012</year>
          .
          <article-title>BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>193</volume>
          :
          <fpage>217</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Word sense disambiguation: A survey</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ):
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Christian</given-names>
            <surname>Paul</surname>
          </string-name>
          , Achim Rettinger, Aditya Mogadala, Craig A Knoblock, and
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Szekely</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Efficient graph-based document similarity</article-title>
          .
          <source>In European Semantic Web Conference</source>
          , pages
          <fpage>334</fpage>
          -
          <lpage>349</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>Jay</given-names>
            <surname>Pujara</surname>
          </string-name>
          , Hui Miao, Lise Getoor, and
          <string-name>
            <given-names>William</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Knowledge graph identification</article-title>
          .
          <source>In International Semantic Web Conference</source>
          , pages
          <fpage>542</fpage>
          -
          <lpage>557</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <given-names>Philip</given-names>
            <surname>Resnik</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>Using information content to evaluate semantic similarity in a taxonomy</article-title>
          .
          <source>arXiv preprint cmp-lg/9511007.</source>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <given-names>Ray</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Smeaton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>John</given-names>
            <surname>Murphy</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Using WordNet as a knowledge base for measuring semantic similarity between words</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Schuhmacher</surname>
          </string-name>
          and Simone Paolo Ponzetto.
          <year>2014</year>
          .
          <article-title>Knowledge-based graph document modeling</article-title>
          .
          <source>In Proceedings of the 7th ACM international conference on Web search and data mining</source>
          , pages
          <fpage>543</fpage>
          -
          <lpage>552</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <given-names>Robert</given-names>
            <surname>Speer</surname>
          </string-name>
          , Joshua Chin, and
          <string-name>
            <given-names>Catherine</given-names>
            <surname>Havasi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>ConceptNet 5.5: An open multilingual graph of general knowledge</article-title>
          .
          <source>In AAAI</source>
          , pages
          <fpage>4444</fpage>
          -
          <lpage>4451</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <given-names>Fabian M</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gjergji</given-names>
            <surname>Kasneci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In Proceedings of the 16th international conference on World Wide Web</source>
          , pages
          <fpage>697</fpage>
          -
          <lpage>706</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <given-names>Rocco</given-names>
            <surname>Tripodi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Pelillo</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A game-theoretic approach to word sense disambiguation</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>43</volume>
          (
          <issue>1</issue>
          ):
          <fpage>31</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <string-name>
            <given-names>Amos</given-names>
            <surname>Tversky</surname>
          </string-name>
          .
          <year>1977</year>
          .
          <article-title>Features of similarity</article-title>
          .
          <source>Psychological Review</source>
          ,
          <volume>84</volume>
          (
          <issue>4</issue>
          ):
          <fpage>327</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name>
            <given-names>Denny</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          and Markus Krötzsch.
          <year>2014</year>
          .
          <article-title>Wikidata: A free collaborative knowledge base</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>57</volume>
          (
          <issue>10</issue>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>