<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Frame Embeddings for Event-Based Knowledge Reconciliation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mehwish Alam</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Reforgiato Recupero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Misael Mongiovi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aldo Gangemi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petar Ristoski</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISTC-CNR</institution>
          ,
          <addr-line>Rome, Catania</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIPN, Université Paris 13</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Cagliari</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper focuses on reconciling knowledge graphs generated from two text documents about similar events described differently. The proposed approach employs and extends MERGILO, a tool for reconciling knowledge graphs extracted from text, using word similarity and graph alignment. Our approach effectively handles events using Frame Embeddings and Frame Based Similarities. It is evaluated over a coreference resolution task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This study addresses the problem of knowledge reconciliation (KR) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] from
the perspective of events. KR is useful for combining
multiple graphs generated from multiple texts describing the same event. The merged
graph provides a graph-based summary of the texts that is more easily
comprehensible by users and machines, and that can be used by algorithms providing
interactive exploration of graphs and text analytics through visualization methods.
      </p>
      <p>
        MERGILO [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is a tool for reconciling knowledge graphs extracted from
text. It first computes the word similarity between the node labels and then
performs graph alignment over the whole graphs. When different verbs denote
similar events and different agents play slightly different roles, the string
matching techniques used in MERGILO may not be appropriate for the
KR process. To overcome this limitation we use Frame Semantics, which
describes a situation in the text with the help of frames and roles. To identify
frames and semantic roles of entities in a text we use FRED [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which generates
event-centered knowledge graphs from the two different texts. Then, the similarity
between these events is computed by calculating the similarity between the
corresponding FrameNet frames and semantic roles (frame elements). We adapt
WordNet similarity measures to frames and roles, and compute vector-based similarities,
using the FrameNet graph and the subsumption hierarchy of roles as defined
in Framester [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We follow the RDF2Vec approach [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to generate graph-based
frame embeddings. RDF2Vec uses graph mining algorithms such as graph walks and
graph kernels to traverse the graph and generate sequences, which are then
fed to a neural model that produces vector representations of the graph nodes. Finally, we report
experiments on Cross-document Coreference Resolution (CCR), showing
significant improvements over a baseline.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Event-Based Knowledge Reconciliation</title>
      <p>Consider the two sentences: “The Spaniards conquered the Incas.” and “The
Incas were invaded by the Spaniards.” They describe the same past event
using different words, i.e., an attack on or invasion of the Incas
by the Spaniards. Figure 1 shows the FRED graph of the first sentence. Given two
such knowledge graphs, MERGILO first performs graph compression by merging
nodes in the same graph. The two compressed graphs are then aligned by establishing
a 1-1 correspondence between nodes of the two graphs that maximizes a score
function, which combines the similarity between aligned nodes and the similarity
between aligned edges. In this example, the word similarity between “conquered” and
“invaded” is low and therefore not effective, although in this context
the two words describe the same event.</p>
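      <p>The alignment step can be illustrated with a minimal sketch. It is not MERGILO's implementation: the function names, the exhaustive search over mappings and the additive score (node similarities plus similarities of preserved edges) are assumptions made for illustration, and the toy graphs are hand-written rather than FRED output.</p>
      <preformat>
```python
from itertools import permutations

def alignment_score(mapping, edges_a, edges_b, node_sim, edge_sim):
    """Score a 1-1 node mapping: node similarities plus similarities of edges it preserves."""
    score = sum(node_sim(a, b) for a, b in mapping.items())
    for (src, dst), label_a in edges_a.items():
        image = (mapping.get(src), mapping.get(dst))
        if image in edges_b:
            score += edge_sim(label_a, edges_b[image])
    return score

def best_alignment(nodes_a, nodes_b, edges_a, edges_b, node_sim, edge_sim):
    """Exhaustively try every 1-1 mapping (feasible for toy graphs only) and keep the best."""
    best_map, best_score = {}, float("-inf")
    for perm in permutations(nodes_b, len(nodes_a)):
        mapping = dict(zip(nodes_a, perm))
        score = alignment_score(mapping, edges_a, edges_b, node_sim, edge_sim)
        if score > best_score:
            best_map, best_score = mapping, score
    return best_map, best_score

def exact_match(x, y):
    """Hypothetical string-matching similarity: 1.0 for identical labels, else 0.0."""
    return 1.0 if x == y else 0.0

# Hand-written toy graphs: edges are keyed by (source, target) and carry a role label.
nodes_a = ["conquer", "Spaniards", "Incas"]
nodes_b = ["invade", "Spaniards", "Incas"]
edges_a = {("conquer", "Spaniards"): "Agent", ("conquer", "Incas"): "Patient"}
edges_b = {("invade", "Spaniards"): "Agent", ("invade", "Incas"): "Patient"}

mapping, score = best_alignment(nodes_a, nodes_b, edges_a, edges_b, exact_match, exact_match)
```
      </preformat>
      <p>With exact label matching, the aligned verb nodes “conquer” and “invade” contribute nothing to the score; only the entity nodes and the role edges do. A frame-based node similarity would be passed in place of exact_match to credit the verb pair as well.</p>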
      <p>To compute the similarity between two nodes containing verb senses, the verb
senses are first mapped to frames using the Framester mappings. For example, in
Figure 1 the verb sense is s<sub>1</sub> = vn.data:Conquer_42030000, and for the second sentence we have
s<sub>2</sub> = vn.data:Invade_10000000. According to the Framester mappings, we obtain
s<sub>1</sub> → {Conquering} and s<sub>2</sub> → {Attack}. These nodes are replaced by their
corresponding frames. The edges containing the VN-roles are mapped to
FN-roles. For example, in Figure 1, the verb sense vn.data:Conquer_42030000
evokes the roles vn.role:Agent and vn.role:Patient, which are mapped to
fe:Conqueror.conquering and fe:Theme.conquering, respectively.</p>
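      <p>The replacement of verb senses and VN-roles can be pictured as two table lookups. The sketch below is an assumption-laden illustration: the dictionary names and the function to_frame_triples are hypothetical, the tables contain only the mappings quoted in the text (with the sense identifiers written with an underscore, which is itself an assumption), and a real system would query Framester instead.</p>
      <preformat>
```python
# Hypothetical lookup tables holding only the mappings quoted in the text.
SENSE_TO_FRAMES = {
    "vn.data:Conquer_42030000": {"Conquering"},
    "vn.data:Invade_10000000": {"Attack"},
}
ROLE_TO_FRAME_ELEMENT = {
    ("Conquering", "vn.role:Agent"): "fe:Conqueror.conquering",
    ("Conquering", "vn.role:Patient"): "fe:Theme.conquering",
}

def to_frame_triples(triples):
    """Replace verb-sense nodes by frames and VN-role edges by frame elements.

    Nodes and roles without a known mapping are left unchanged.
    """
    out = []
    for subj, role, obj in triples:
        for frame in sorted(SENSE_TO_FRAMES.get(subj, {subj})):
            frame_element = ROLE_TO_FRAME_ELEMENT.get((frame, role), role)
            out.append((frame, frame_element, obj))
    return out

# Toy triples for the first example sentence (hand-written, not FRED output).
triples = [
    ("vn.data:Conquer_42030000", "vn.role:Agent", "Spaniards"),
    ("vn.data:Conquer_42030000", "vn.role:Patient", "Incas"),
]
frame_triples = to_frame_triples(triples)
```
      </preformat>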
      <p>Then the similarities are computed in two ways: (i) by considering the
taxonomical structure imposed by the “inheritance” relation, represented as
fnschema:inheritsFrom in Framester, using Path Similarity, Wu-Palmer
Similarity and Leacock-Chodorow Similarity; (ii) using Frame Embeddings.</p>
      <p>Prefixes: vn.data: http://www.ontologydesignpatterns.org/ont/vn/vn31/data/ ;
fnschema: http://www.ontologydesignpatterns.org/ont/framenet/tbox/</p>
      <p>Frame Embeddings using RDF2Vec: To learn latent numerical representations of
the frames and roles in the FrameNet graph, we follow the RDF2Vec approach.
First we transform the graph into a set of sequences of entities, which is then
fed into a neural language model, resulting in a vector representation of all the
nodes in the graph in a latent feature space.</p>
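      <p>The three taxonomy-based measures named above can be sketched on a small hierarchy. Everything in the example is an assumption: the parent links are invented for illustration and are not real FrameNet inheritance relations, a single parent per frame is assumed, depth is counted in nodes (the root has depth 1), and the formulas follow the common WordNet-style definitions rather than the authors' exact adaptation.</p>
      <preformat>
```python
import math

# Invented single-parent hierarchy for illustration; not real FrameNet inheritance links.
PARENT = {
    "Conquering": "ToyHostility",
    "Attack": "ToyHostility",
    "ToyHostility": "ToyRoot",
    "ToyMotion": "ToyRoot",
}

def ancestors(frame):
    """Return the chain from a frame up to the root, the frame itself first."""
    chain = [frame]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(frame):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(frame))

def lowest_common_subsumer(a, b):
    """First ancestor of a (starting from a itself) that also subsumes b."""
    seen = set(ancestors(b))
    return next(f for f in ancestors(a) if f in seen)

def path_length(a, b):
    """Number of edges on the path between a and b through their common subsumer."""
    lcs = lowest_common_subsumer(a, b)
    return ancestors(a).index(lcs) + ancestors(b).index(lcs)

def path_similarity(a, b):
    return 1.0 / (1.0 + path_length(a, b))

def wu_palmer(a, b):
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))

def leacock_chodorow(a, b):
    max_depth = max(depth(f) for f in set(PARENT) | set(PARENT.values()))
    return -math.log((path_length(a, b) + 1.0) / (2.0 * max_depth))

related = wu_palmer("Conquering", "Attack")
unrelated = wu_palmer("Conquering", "ToyMotion")
```
      </preformat>
      <p>In this toy hierarchy the two sibling frames score higher than a pair that only meets at the root, which is the behaviour the measures are meant to capture.</p>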
      <p>
        To convert the graph into a set of sequences of entities we use two approaches,
i.e., graph walks and Weisfeiler-Lehman Subtree RDF Graph Kernels. (i) Graph
Walks: given a graph G = (V, E), for each vertex v ∈ V, we generate all graph
walks P<sub>v</sub> of depth d rooted in vertex v. To generate the walks, we use the
breadth-first algorithm. In the first iteration, the algorithm generates paths by exploring
the direct outgoing edges of the root node v<sub>r</sub>. In the second iteration, for each of
the previously explored edges, the algorithm visits the connected vertices. The
final set of sequences for the given graph G is the union of the sequences of
all the vertices, P<sub>G</sub> = ⋃<sub>v ∈ V</sub> P<sub>v</sub>. (ii) Graph Kernels: this approach computes the number of
subtrees shared between two or more graphs by using the Weisfeiler-Lehman [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
test of graph isomorphism. This algorithm creates labels representing subtrees.
      </p>
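      <p>The breadth-first walk generation described above can be sketched as follows. This is an illustrative reading, not the RDF2Vec code: the adjacency-list format, the function names and the toy triples are assumptions, and keeping every walk of one to d hops (rather than only the longest ones) is also an assumption.</p>
      <preformat>
```python
def graph_walks(graph, root, depth):
    """Breadth-first enumeration of all walks of one to `depth` hops starting at `root`.

    Each walk alternates vertices and edge labels: [v0, p1, v1, p2, v2, ...].
    """
    walks = []
    frontier = [[root]]
    for _ in range(depth):
        next_frontier = []
        for walk in frontier:
            for predicate, target in graph.get(walk[-1], []):
                next_frontier.append(walk + [predicate, target])
        walks.extend(next_frontier)
        frontier = next_frontier
    return walks

def all_walks(graph, depth):
    """Union of the walks rooted in every vertex of the graph."""
    vertices = set(graph) | {t for edges in graph.values() for _, t in edges}
    return [w for v in sorted(vertices) for w in graph_walks(graph, v, depth)]

# Toy graph with invented frames: vertex -> list of (edge label, target vertex).
graph = {
    "Conquering": [("fnschema:inheritsFrom", "ToyHostility")],
    "ToyHostility": [("fnschema:inheritsFrom", "ToyRoot")],
}
sequences = all_walks(graph, depth=2)
```
      </preformat>
      <p>Each resulting sequence plays the role of a sentence when the embedding model is trained.</p>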
      <p>Once the set of sequences of entities is extracted, we build a word2vec model.
Word2vec is a computationally efficient two-layer neural network model
for learning word embeddings from raw text. There are two different algorithms,
the Continuous Bag-of-Words model (CBOW) and the Skip-Gram model. The
CBOW model predicts target words from context words within a given window.
The input layer comprises all the surrounding words, for which the input
vectors are retrieved from the input weight matrix, averaged, and projected in
the projection layer. Then, using the weights from the output weight matrix,
a score for each word in the vocabulary is computed, which is the probability
of the word being a target word. The Skip-Gram model does the inverse of the
CBOW model: it predicts the context words from the target word. Once the training
is finished, the cosine similarity is computed
between the vectors of two frames or of two roles.</p>
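      <p>The CBOW forward pass and the final cosine comparison can be written out in a few lines. The sketch is a simplification under stated assumptions: it uses a full softmax over a three-item toy vocabulary instead of negative sampling, the weight matrices are made-up numbers rather than trained values, and the function names are hypothetical.</p>
      <preformat>
```python
import math

def cbow_probabilities(context_ids, w_in, w_out):
    """CBOW forward pass: average the context input vectors, score each vocabulary item, softmax."""
    dim = len(w_in[0])
    hidden = [sum(w_in[i][k] for i in context_ids) / len(context_ids) for k in range(dim)]
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in w_out]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up weights: one row per vocabulary item, two dimensions.
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

probs = cbow_probabilities([0, 1], w_in, w_out)
similarity = cosine_similarity(w_in[0], w_in[2])
```
      </preformat>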
    </sec>
    <sec id="sec-3">
      <title>Experimentation</title>
      <p>The experiments were conducted on the task of Cross-document
Coreference Resolution on RDF graphs, which focuses on associating RDF nodes about
the same entity (object, person, concept, etc.) across different RDF graphs
generated from text. Our dataset was derived from the
EECB dataset, which specifies coreferent mentions (text fragments): we
generated RDF graphs using FRED and associated text
mentions with graph nodes by manual annotation. The following metrics are used
for evaluation: (1) MUC is a link-based metric that quantifies the number of
merges necessary to cover predicted and gold clusters (ground truth). (2) B<sup>3</sup> is
a mention-based metric that quantifies the overlap between predicted and gold
clusters for a given mention. (3) CEAFM (Constrained Entity Aligned F-measure,
Mention-based) measures the number of corresponding mentions in the optimal
one-to-one alignment between gold and predicted clusters. (4) CEAFE
(Constrained Entity Aligned F-measure, Entity-based) measures the overlap between
gold and predicted clusters in their optimal one-to-one alignment. The formulas of
Precision, Recall and F1 are omitted because of space constraints.</p>
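      <p>As an illustration of the mention-based overlap idea, the sketch below computes B-cubed precision, recall and F1 using the commonly cited definition. It is not the scorer used in the experiments: the function name, the treatment of unpredicted mentions as singletons and the toy clusters are assumptions.</p>
      <preformat>
```python
def b_cubed(gold_clusters, predicted_clusters):
    """Mention-averaged B-cubed precision, recall and F1 over the gold mentions.

    Clusters are sets of mention identifiers. A gold mention missing from the
    prediction is treated as a predicted singleton (an assumption of this sketch).
    """
    gold_of = {m: cluster for cluster in gold_clusters for m in cluster}
    pred_of = {m: cluster for cluster in predicted_clusters for m in cluster}
    precision = recall = 0.0
    for mention, gold in gold_of.items():
        predicted = pred_of.get(mention, {mention})
        overlap = len(gold.intersection(predicted))
        precision += overlap / len(predicted)
        recall += overlap / len(gold)
    precision /= len(gold_of)
    recall /= len(gold_of)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example with four mentions.
gold = [{"a", "b", "c"}, {"d"}]
predicted = [{"a", "b"}, {"c", "d"}]
precision, recall, f1 = b_cubed(gold, predicted)
```
      </preformat>
      <p>For this toy input, mention c is penalised in both directions: its predicted cluster contains a mention from another gold cluster, and it misses two of its gold companions.</p>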
      <p>MERGILO was considered as the baseline. Table 1 shows the results for
Wu-Palmer similarity, Path similarity and Leacock-Chodorow similarity, and
the results for cosine similarity using (i) graph walks and (ii) graph kernels. Here
Frame2Vec and Role2Vec refer to the vector representations generated for
FrameNet frames and frame elements (i.e., semantic roles), respectively. We
built CBOW and Skip-Gram (SG) models with the following parameters: window
size = 5; number of iterations = 10; negative sampling for optimization;
negative samples = 25; with average input vector for CBOW. We experimented with
200, 500 and 800 dimensions. These results are compared with MERGILO. Each
model used for graph walks and graph kernels performs better than the baseline on all the
considered metrics (for example, MUC rises from 24.05 to 27.38 with graph walks),
showing a clear advantage of the proposed approach. The
generated models are freely available online<sup>4</sup>.</p>
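      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Frame2Vec</th><th>Role2Vec</th><th>MUC</th><th>B<sup>3</sup></th><th>CEAFM</th><th>CEAFE</th></tr>
          </thead>
          <tbody>
            <tr><td colspan="2">MERGILO Baseline</td><td>24.05</td><td>17.36</td><td>28.61</td><td>26.20</td></tr>
            <tr><th colspan="6">Similarity Measures</th></tr>
            <tr><td colspan="2">Wu-Palmer</td><td>27.14</td><td>19.91</td><td>31.91</td><td>29.41</td></tr>
            <tr><td colspan="2">Path</td><td>27.16</td><td>19.93</td><td>31.85</td><td>29.38</td></tr>
            <tr><td colspan="2">Leacock-Chodorow</td><td>27.04</td><td>19.80</td><td>31.74</td><td>29.21</td></tr>
            <tr><th colspan="6">Graph Walks</th></tr>
            <tr><td>CBOW 200</td><td>CBOW 200</td><td>27.34</td><td>19.99</td><td>32.15</td><td>29.82</td></tr>
            <tr><td>CBOW 200</td><td>SG 800</td><td>27.38</td><td>19.97</td><td>32.29</td><td>29.98</td></tr>
            <tr><td>CBOW 200</td><td>SG 500</td><td>27.28</td><td>19.95</td><td>31.99</td><td>29.54</td></tr>
            <tr><th colspan="6">Graph Kernels</th></tr>
            <tr><td>CBOW 200</td><td>CBOW 200</td><td>26.76</td><td>19.57</td><td>31.50</td><td>29.06</td></tr>
            <tr><td>CBOW 200</td><td>SG 200</td><td>26.70</td><td>19.52</td><td>31.45</td><td>28.99</td></tr>
            <tr><td>CBOW 200</td><td>SG 500</td><td>26.70</td><td>19.52</td><td>31.45</td><td>28.99</td></tr>
            <tr><td>SG 500</td><td>CBOW 200</td><td>26.90</td><td>19.68</td><td>31.58</td><td>29.08</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Ongoing work includes application of frame embeddings in real systems, such as
news series integration, knowledge graph evolution with robust event
reconciliation (e.g. text streaming) etc.</p>
      <p><sup>4</sup> http://lipn.univ-paris13.fr/~alam/Frame2Vec/</p>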
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. G. de Vries and S. de Rooij.
          <article-title>Substructure counting graph kernels for machine learning from rdf data</article-title>
          .
          <source>J. Web Sem.</source>
          ,
          <volume>35</volume>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Asprino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Reforgiato</surname>
          </string-name>
          .
          <article-title>Framester: A wide coverage linguistic linked data hub</article-title>
          .
          <source>In EKAW</source>
          <year>2016</year>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Draicchio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mongiovì</surname>
          </string-name>
          .
          <article-title>Semantic web machine reading with FRED</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>8</volume>
          (
          <issue>6</issue>
          ),
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Mongiovì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Reforgiato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Consoli</surname>
          </string-name>
          .
          <article-title>Merging open knowledge extracted from text with MERGILO</article-title>
          .
          <source>Knowl.-Based Syst.</source>
          ,
          <volume>108</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>RDF2Vec: RDF Graph Embeddings for Data Mining</article-title>
          .
          <source>In ISWC</source>
          <year>2016</year>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>