Community graph and linguistic analysis to validate relationships for knowledge base population

Rashedur Rahman (IRT SystemX, LIMSI, CNRS, Université Paris-Saclay, rashedur.rahman@irt-systemx.fr)
Brigitte Grau (LIMSI, CNRS, ENSIIE, Université Paris-Saclay, brigitte.grau@limsi.fr)
Sophie Rosset (LIMSI, CNRS, Université Paris-Saclay, sophie.rosset@limsi.fr)

Abstract

Relation extraction between entities in text plays an important role in information extraction and knowledge discovery tasks. Relation extraction systems produce a large number of candidates, many of which are not correct. A relation validation method judges a claimed relation based on the information provided by a system. In this paper, we propose features computed by analyzing the community graphs of entities, in order to account for some form of world knowledge. The proposed features significantly improve the validation of relations when they are combined with voting and state-of-the-art linguistic features.

1 Introduction

Extracting relations from texts is important for different information extraction and knowledge discovery tasks such as knowledge base population and question answering. This task requires natural language understanding of pieces of text, which is particularly complex when searching for a large number of semantic relations that describe entities in the open domain. The relationship types can concern the family of a person (spouse, children, parents, etc.) or the characteristics of a company (founder, top_members_or_employees, etc.). This task, named slot filling, is evaluated in the KBP evaluation (https://tac.nist.gov/), in which systems must extract instances of around 40 relation types related to several kinds of entities (person, organization, location and their different sub-types). In order to take advantage of the capabilities of several systems and improve results, a final step can be added that validates the results of the systems. The method described in this paper falls within the framework of this latter task: given an entity and a response provided by a system (its value and a text segment that justifies the claimed relation), it has to decide whether the value is correct or not. We focus on relations that occur between two entities.

Different approaches have been studied for validating relations, particularly by evaluating the confidence that a system can have in the source of the response, i.e. the document that justifies the response (Yu et al., 2014), and the confidence score of the system (Viswanathan et al., 2015). Nevertheless, other criteria are needed that concern validating the semantics of a relation by linguistic characteristics (Niu et al., 2012; Hoffmann et al., 2011; Yao et al., 2011; Riedel et al., 2010); these are similar to those used in the relation extraction task.

However, in most cases, relation validation methods do not take into account global information that can be computed on the collection of text documents. Collection-level global information about the object of a relation and the words around the mentions has been taken into account for web relation extraction by Augenstein (2016). Such information allows us to introduce some form of world knowledge for making choices based on criteria that are independent of how a relation is expressed in the text segment. We hypothesize that two entities having a true relationship should be linked to more common entities than a pair of entities with a proposed false relationship. For example, the spouse of a person will share more places and relationships with his/her spouse than with other people.
Therefore, we extracted a graph of entities from the collection, which allowed us to propose new characterizations of the relations by graph-based features (Han et al., 2011; Friedl et al., 2010; Solá et al., 2013). We also introduce information-theoretic measurements on the graph of entities, some of which have been successfully used in other tasks, such as entropy for knowledge discovery in publication networks (Holzinger et al., 2013) and mutual information for the validation of responses in question answering systems (Magnini et al., 2002; Cui et al., 2005). Additionally, we propose a dependency pattern edit distance for capturing the syntactic evidence of relations, and we explore word embeddings to detect unknown triggers of relation expression.

The relation validation method we propose is thus based on three categories of information: linguistic information associated with the expression of the relations in texts, information coming from the graphs of entities built on the collection, and finally information related to the systems and the proposals they made. We evaluated our relation validation system on a sub-part of the KBP CSSF-2016 corpus and show that the validation step achieves around 5% to 8% higher accuracy over the baseline features when they are combined with the proposed graph-based features.

2 Related Works

Relation validation methods have studied different kinds of features to decide whether a type of relation exists or not.

Existence and semantic assessment of relation candidates rely on linguistic features, such as syntactic paths or the existence of trigger words between the pair of entity mentions. Dependency trees (Culotta and Sorensen, 2004; Bunescu and Mooney, 2005; Fundel et al., 2007) provide clues for deciding the presence of a relation in unsupervised relation extraction. Gamallo et al. (2012) proposed rule-based dependency parsing for open information extraction; they defined relation patterns by parsing the dependencies and discovering verb clauses in the sentences.

Syntactic analysis cannot characterize the type of a relation. Therefore, words around the entity mentions in sentences have been analyzed to characterize the semantics of a relation (Niu et al., 2012; Hoffmann et al., 2011; Yao et al., 2011; Riedel et al., 2010; Mintz et al., 2009). Chowdhury and Lavelli (2012) proposed a hybrid kernel combining dependency patterns and trigger words for biomedical relation extraction. We explored these different kinds of linguistic features for validating relationships.

It can be difficult to identify the trigger words for different types of relations in the open domain. Therefore, neural network based methods have recently become popular for the relation classification task (Vu et al., 2016; Dligach et al., 2017; Zheng et al., 2016). These methods use word embeddings to automatically learn the patterns and semantics of relations without any handcrafted features. Dependency-based neural networks have also been proposed (Cai et al., 2016; Liu et al., 2015) to capture features on the shortest path.

A voting method has been proposed by Sammons et al. (2014) for ensemble systems to validate the outcomes proposed by multiple systems from different information sources. This method shows good results and remains stable from one dataset to another.

Several graph-based methods (Gardner and Mitchell, 2015; Lao et al., 2015; Wang et al., 2016) have been proposed for the knowledge base completion task by applying the Path Ranking Algorithm (Lao and Cohen, 2010). These methods basically use the relationships already existing in a knowledge base to learn an inference model and create new relations with it. Yu and Ji (2016) proposed a graph-based method for trigger-word identification in the slot filling task, using PageRank and affinity propagation on a graph built at sentence level.

Information-theoretic measurements on graphs have been successfully used in related tasks. Holzinger et al. (2013) measured entropy to discover knowledge in publication networks. Some question answering systems measured point-wise mutual information (Magnini et al., 2002; Cui et al., 2005) to exploit redundancy. In order to find the important and influential nodes in a social network, centralities of the nodes have been measured (Friedl et al., 2010). Solá et al. (2013) explored the concept of eigenvector centrality in multiplex networks. In order to validate the proposed relationships, we apply these different measures on graphs of entities constructed from the text collection.
3 Community Graph of Entities

3.1 Definition of the Graph

Let $G = (E, R)$ be a graph, $r_q$ a query relation (slot), $e_q \in E$ a query entity, and $E_c = \{e_{c_1}, e_{c_2}, \ldots, e_{c_n}\} \subseteq E$ the candidate responses such that $r_q = r(e_q, e_c) \in R$. The list of candidates is generated by different relation extraction systems. Suppose there are other relations $r_o \in R$ with $r_o \neq r_q$. We characterize whether a candidate entity $e_{c_i}$ of $E_c$ is correct or not for a query relation $r_q$ by analyzing the communities $X_q$ and $X_c$ formed by the query entity and each candidate response. A community $X_i$ contains the neighbors of $e_i$, up to several possible steps.

[Figure 1: Community graph]

Fig. 1 shows an example of such a graph, where the query entity, its type and the relationship name are Barack Obama, person and spouse respectively. The candidate responses are Michelle Robinson and Hilary Clinton, which are linked to Barack Obama by the spouse relation hypothesis. The objective is to classify Michelle Robinson as the correct response based on the community analysis. The communities of Barack Obama (green rectangle), Michelle Robinson (purple circle) and Hilary Clinton (orange ellipse) are defined by the in_same_sentence relation, which means that the pair of entities are mentioned in the same sentence in the text. The graph is thus constructed from untyped semantic relationships based on co-occurrences. It would also be possible to use typed semantic relationships provided by a relation extraction system.
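To make the community definition concrete, the following sketch builds the neighborhood of an entity on a toy co-occurrence graph. It uses networkx and invented entity names; it is an illustration of the definition above, not the paper's implementation (which sits on top of Neo4j, see Section 3.2).

```python
# A minimal sketch of the community X_e of Section 3.1: the subgraph
# induced by the neighbors of an entity up to `steps` levels.
import networkx as nx

def community(graph: nx.Graph, entity: str, steps: int = 2) -> nx.Graph:
    """Return the community of `entity`: its neighborhood up to `steps` hops."""
    # ego_graph collects every node within `steps` hops of `entity`,
    # together with the edges among them.
    return nx.ego_graph(graph, entity, radius=steps)

# Toy co-occurrence graph mirroring Fig. 1: an edge means the two
# entities were mentioned in the same sentence (in_same_sentence).
g = nx.Graph()
g.add_edges_from([
    ("Barack Obama", "Michelle Robinson"),
    ("Barack Obama", "Hilary Clinton"),
    ("Barack Obama", "Chicago"),
    ("Michelle Robinson", "Chicago"),
    ("Hilary Clinton", "Bill Clinton"),
])
X_q = community(g, "Barack Obama", steps=1)
print(sorted(X_q.nodes()))
```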
3.2 Construction of the Community Graph

The graph of entities illustrated in Fig. 1 is created from a graph representing the knowledge extracted from the texts (lower part of Fig. 2), called the knowledge graph. This knowledge graph is generated after applying named entity recognition (NER) and sentence splitting systems.

Recognition of named entities is done using the Stanford system (Manning et al., 2014) and Luxid (http://www.expertsystem.com/fr/). Luxid is a rule-based NER system that uses external information sources such as Freebase and GeoNames and performs with high precision. It is able to decompose entity mentions into components, such as first name, last name and title for a person named entity, and it classifies location named entities into country, state/province and city. When the two systems disagree, as in (Stanford: location, Luxid: person), we choose the annotation produced by Luxid because it provides more precise information about the detected entity than Stanford does.

The knowledge graph represents documents, sentences, mentions and entities as nodes, and the edges between these nodes represent the relationships between these elements.

[Figure 2: Knowledge graph (lower part) and community graph (upper part). Recoverable details: an IN_SAME_SENTENCE edge links the entities Barack Obama and Michelle; mention nodes (barack obama, michelle obama) carry name.first/name.last components and are attached to sentence (Sentence-1) and document (Doc-1) nodes.]

Multiple mentions of the same entity found in the same document are connected to the same entity node in the knowledge graph, based on the textual similarity of the references and of their possible components; this corresponds to a first step of entity linking on local criteria. This operation is performed by Luxid. However, an entity can be mentioned in different documents under different forms (e.g., Barack Obama, President Barack Obama, President Obama), which creates redundant nodes in the knowledge graph. Entities are then grouped according to the similarity of their names and the similarity of their neighboring entities, calculated by Eq. 2. This step merges the similar entities into a single entity in the community graph (upper part of Fig. 2). This latter graph is constructed from the information on the entities and relations present in the knowledge graph, and the link with the documents is always maintained. It is thus possible to know the number of occurrences of each entity and each relation. The graph is stored in Neo4j, a graph-oriented database, which makes it possible to extract the subgraphs linked to an entity by queries. We only consider entities of type person, location and organization as members of the communities.
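As an illustration of the subgraph extraction, a hedged sketch of a Neo4j query follows. The Entity label, the name/type properties and the connection settings are assumptions made for the example; only the IN_SAME_SENTENCE relationship name comes from the paper (Fig. 2).

```python
# A hedged sketch of pulling a community subgraph out of the Neo4j store.
from neo4j import GraphDatabase

QUERY = """
MATCH (e:Entity {name: $name})-[:IN_SAME_SENTENCE*1..2]-(n:Entity)
WHERE n.type IN ['person', 'location', 'organization']
RETURN DISTINCT n.name AS neighbor
"""

def fetch_community(driver, name: str) -> list[str]:
    # Collect the entities linked to `name` up to two hops away,
    # restricted to the three entity types kept in the community graph.
    with driver.session() as session:
        return [record["neighbor"] for record in session.run(QUERY, name=name)]

# Assumed local setup; URI and credentials are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))
print(fetch_community(driver, "Barack Obama"))
```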
4 Relation Validation

In order to predict whether a relationship is correct or not, we treat the problem as a binary classification task based on three categories of information. We calculate a set of features using the graphs (Section 4.1), to which we add features based on a linguistic analysis of the text that justifies the candidate and describes the relationship (Section 4.2), and an estimation of trust in the candidates according to their frequencies in the responses to each query (Section 4.3). Table 1 summarizes all the features used for the classification task.

Feature Group      | Feature Name
-------------------|---------------------------------------------------------
Graph              | Network density
                   | Eigenvector centrality
                   | Mutual information
                   | Network similarity
Linguistic         | Minimum edit distance between dependency patterns
                   | Dependency pattern length
                   | Are the query and filler mentions found in the same clause
                   | Has trigger word between mentions
                   | Has trigger word in dependency path
                   | Has trigger word in minimum subtree
                   | Is trigger-based relation
Baseline (voting)  | Filler credibility

Table 1: Relation validation features

4.1 Graph-based Features

We assume that a correct candidate for a query is an important member of the community of the query entity. A community $X_e$ of an entity is defined by the sub-graph formed by its neighbors, up to several levels. Merging the communities of two particular entities includes all the neighbors of that pair of entities. We therefore define different features related to this hypothesis.

We hypothesize that the network density (Eq. 1) of the community of a correct candidate merged with the community of the query entity must be higher than the density of an incorrect candidate's community merged with that of the same entity:

$$\rho_{X_e} = \frac{\text{number of existing edges}}{\text{number of possible edges}} \quad (1)$$

According to Fig. 1, the merged community of Michelle Robinson and Barack Obama is denser than the merged community of Hilary Clinton and Barack Obama.

We compute the network similarity (Eq. 2) between two communities and hypothesize that the similarity score between the communities of a query entity and a correct candidate will be higher than the score between that query entity and a wrong candidate:

$$\text{similarity} = \frac{|X_q \cap X_c|}{\sqrt{|X_q||X_c|}} \quad (2)$$

where $X_q$ and $X_c$ are the community members of the query entity and of a candidate entity respectively.
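A minimal sketch of these two features, assuming the communities are held as networkx graphs as in the earlier sketch:

```python
import math
import networkx as nx

def merged_density(X_q: nx.Graph, X_c: nx.Graph) -> float:
    """Eq. 1 applied to the union of the two communities:
    existing edges / possible edges."""
    merged = nx.compose(X_q, X_c)   # union of nodes and edges
    return nx.density(merged)       # |E| / (|V| * (|V| - 1) / 2)

def network_similarity(X_q: nx.Graph, X_c: nx.Graph) -> float:
    """Eq. 2: |X_q intersect X_c| / sqrt(|X_q| * |X_c|)."""
    q, c = set(X_q.nodes()), set(X_c.nodes())
    if not q or not c:
        return 0.0
    return len(q & c) / math.sqrt(len(q) * len(c))
```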
The eigenvector centrality (Bonacich and Lloyd, 2001) measures the influence of a node in a network: a node is more influential if it is connected to other influential nodes. We hypothesize that the query entity should be more influenced by the correct candidate than by the other candidates. We measure the influence of the candidates in the community of the query entity by calculating the absolute difference between the eigenvector centrality score of the query entity and that of each candidate; we assume that this difference should be smaller for a correct candidate than for an incorrect one. Suppose $A = (a_{i,j})$ is the adjacency matrix of a graph $G$. The eigenvector centrality $x_i$ of node $i$ is calculated recursively by Eq. 3:

$$x_i = \frac{1}{\lambda} \sum_k a_{k,i} x_k \quad (3)$$

where $\lambda \neq 0$ is a constant; in matrix form the equation reads $\lambda x = xA$.

We also hypothesize that the mutual information (Eq. 4) between the communities of a query entity and a correct candidate must be higher than that computed between the communities of the query entity and an incorrect candidate:

$$MI(X_q, X_c) = H(X_q) + H(X_c) - H(X_q, X_c) \quad (4)$$

where $H(X) = -\sum_{i=1}^{n} p(e_i) \log_2 p(e_i)$ and $p(e) = \frac{\text{number of edges of } e}{\text{number of edges of } X}$; $X_q$ and $X_c$ are the communities of a query entity and a candidate respectively.

The community of an entity (query entity or candidate) is extended up to the third level to measure eigenvector centrality and mutual information.
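The two remaining graph features can be sketched in the same setting. Note that the paper does not spell out how the joint entropy $H(X_q, X_c)$ is computed; taking it over the merged community is one plausible reading, flagged as an assumption in the code.

```python
import math
import networkx as nx

def centrality_gap(X_q: nx.Graph, query: str, candidate: str) -> float:
    """|x_query - x_candidate| inside the query community (Eq. 3)."""
    cent = nx.eigenvector_centrality(X_q, max_iter=1000)
    # A candidate absent from the query community gets centrality 0.
    return abs(cent[query] - cent.get(candidate, 0.0))

def entropy(X: nx.Graph) -> float:
    """H(X) with p(e) = deg(e) / |edges of X|, as written in the paper
    (note: these weights sum to 2 over the nodes, since every edge is
    counted at both endpoints; we follow the paper's formula as given)."""
    m = X.number_of_edges()
    if m == 0:
        return 0.0
    return -sum((d / m) * math.log2(d / m) for _, d in X.degree() if d > 0)

def mutual_information(X_q: nx.Graph, X_c: nx.Graph) -> float:
    """Eq. 4, with the joint entropy taken over the merged community
    (an assumption; the paper leaves this step implicit)."""
    return entropy(X_q) + entropy(X_c) - entropy(nx.compose(X_q, X_c))
```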
4.2 Linguistic Features

To assess whether a relation exists between the pair of entity mentions, we define syntactic features. To characterize the semantics of the relation, we represent it by seed words and analyze the sentence at the lexical level.

Syntactic features are calculated from a dependency analysis: the parser (Manning et al., 2014) provides a tree in which the nodes are the words of the sentence and the edges between them are labeled by their syntactic role. We collected a list of dependency patterns for each relation from annotated examples. For example, in the sentence "Paola, Queen of the Belgians is the wife of King Albert of Belgium.", the dependency pattern between Paola and King Albert is [nn, nsubj, prep_of] and the dependencies are nn(Queen, Paola), nsubj(wife, Queen) and prep_of(wife, Albert). We simplify the pattern [nn, nsubj, prep_of] to [nsubj, prep_of] by removing leading and trailing nn labels for noise reduction. We also notice that dependency patterns sometimes contain consecutive identical labels, as in [nsubj, dobj, prep_of, prep_of, poss]. In such cases, we substitute the consecutive labels with a single one, which simplifies [nsubj, dobj, prep_of, prep_of, poss] into [nsubj, dobj, prep_of, poss]. This simplification generalizes the dependency patterns.

The acquired patterns are compared to the simplified dependency path of a sentence by computing edit distances. Suppose the pre-annotated dependency patterns for a relation R are (a,b,c), (a,c,d) and (b,c,d), and the dependency pattern (a,c,b) is extracted, between the query and the filler mention, from the provenance sentence of a relation claimed to be R. We calculate the edit distance between each pair [(a,c,b), (a,b,c)], [(a,c,b), (a,c,d)] and [(a,c,b), (b,c,d)] and keep the minimum edit distance as a feature, as sketched below. Since relations are often expressed by short dependency paths, the length of the simplified pattern is also used as a feature.
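The pattern simplification and the minimum-edit-distance feature can be sketched as follows; the example values reproduce those of the running text.

```python
def simplify(pattern: list[str]) -> list[str]:
    # Drop leading and trailing `nn` labels (noise reduction) ...
    while pattern and pattern[0] == "nn":
        pattern = pattern[1:]
    while pattern and pattern[-1] == "nn":
        pattern = pattern[:-1]
    # ... and collapse runs of identical consecutive labels.
    out = []
    for label in pattern:
        if not out or out[-1] != label:
            out.append(label)
    return out

def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def min_pattern_distance(candidate, acquired_patterns) -> int:
    """Feature: distance to the closest pre-annotated pattern."""
    simple = simplify(candidate)
    return min(edit_distance(simple, simplify(p)) for p in acquired_patterns)

# Examples from the text.
print(simplify(["nn", "nsubj", "prep_of"]))             # ['nsubj', 'prep_of']
print(min_pattern_distance(["a", "c", "b"],
                           [["a", "b", "c"], ["a", "c", "d"], ["b", "c", "d"]]))
```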
The semantic analysis is based on trigger words associated with the relation types. We encode the semantic features as boolean values and define two types of trigger words: positive triggers and negative triggers. Positive trigger words are keywords that strongly support a particular relation, while negative triggers strongly negate the claimed relation. For example, wife, husband and married are positive triggers for a spouse relation, while parent, children and brother are negative triggers. We collected these seed words from the assessed dataset of the TAC KBP 2014 slot filling task. In total, we collected around 250 triggers and 553 dependency patterns for 41 relations from 3,579 annotated snippets.

Since relations are expressed by a wide variety of words, it is hard to collect all the trigger words for a relation. Therefore, we associate a word embedding with each trigger by using a pre-trained GloVe model (Pennington et al., 2014), and deciding whether a word is a trigger or not relies on the similarity of the embeddings. Suppose a and b are two words between the query and filler mentions of a claimed relation R, and x, y, z are the positive triggers for that relation. We compute the similarity between the vectors of each pair (a,x), (a,y), (a,z), (b,x), (b,y) and (b,z). If the similarity score for a word from a, b exceeds a predefined threshold (0.7), we consider that a trigger word exists. We check for positive and/or negative trigger words in three places when validating a claimed relation: 1) between the mentions at the surface level, 2) in the dependency path, and 3) in the minimum subtree, as in (Chowdhury and Lavelli, 2012).

Some relations can be expressed without any trigger word. For example, the snippet "Mr. David, from California won the prize" expresses the per:city_of_residence relation without explicitly using a trigger word. We therefore classify the relation types into two classes, those that can be expressed without a trigger word and those that cannot, and use a boolean flag (is_trigger-based_relation) as a feature.
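A hedged sketch of the embedding-based trigger test follows. The GloVe file name is an assumption; any pre-trained GloVe text file in the standard word-per-line format would do.

```python
import numpy as np

def load_glove(path: str) -> dict[str, np.ndarray]:
    """Read a GloVe text file: one word followed by its vector per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def has_trigger(words, seeds, vectors, threshold: float = 0.7) -> bool:
    """True if any word between the mentions is similar enough to any
    seed trigger (the paper's threshold is 0.7)."""
    for w in words:
        for s in seeds:
            if w in vectors and s in vectors and \
               cosine(vectors[w], vectors[s]) >= threshold:
                return True
    return False

vectors = load_glove("glove.6B.300d.txt")  # assumed local file
print(has_trigger(["is", "the", "spouse", "of"],
                  ["wife", "husband", "married"], vectors))
```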
4.3 Voting: Filler Credibility

We calculate a credibility score for the candidates based on all the responses given by the different systems to a query. Let F be the candidates for a query Q supplied by the systems S. The credibility of a candidate $F_i$ is computed by Eq. 5:

$$\text{filler\_credibility}(F_i, Q) = \frac{\text{number of occurrences of } F_i}{\text{number of occurrences of all the candidates}} \quad (5)$$

The filler credibility counts the relative vote for a candidate, which indicates the degree of agreement among the different systems that the candidate is correct. Since we can assume that the systems have already performed some linguistic and probabilistic analysis to produce their responses, filler credibility is strong evidence that a candidate is correct. Some slot filling and slot filler validation methods have used the system credibility (Yu et al., 2014) and confidence scores (Wang et al., 2013; Viswanathan et al., 2015; Rodriguez et al., 2015) of the responses for validating relations, but these features are highly system- and data-dependent. Therefore, we use only filler credibility as the baseline.
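A minimal sketch of Eq. 5 over the raw list of system responses for one query:

```python
from collections import Counter

def filler_credibility(responses: list[str]) -> dict[str, float]:
    """Map each candidate filler to its share of all votes for one query."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {filler: n / total for filler, n in counts.items()}

# Five systems answer a spouse query; three agree on the same filler.
votes = ["Michelle Robinson", "Michelle Robinson", "Michelle Robinson",
         "Hilary Clinton", "Hilary Clinton"]
print(filler_credibility(votes))
# {'Michelle Robinson': 0.6, 'Hilary Clinton': 0.4}
```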
5 Experiments and Results

5.1 Data

We perform our experiments using the TAC-KBP English cold start slot filling (CSSF) datasets of 2015 and 2016. TAC provided a reference corpus for the English CSSF-2015 evaluation task consisting of 45,000 documents, including texts from newswires and discussion forums. We parsed these texts to build the knowledge graph. We compiled our training data from the assessments of the English CSSF-2015 responses of slot filling systems. There were 9,339 round-1 queries for English CSSF-2015, and in total 330,314 round-2 queries were generated by all the slot filling systems based on the responses to the round-1 queries. NIST assessed the SF responses of around 2,000 round-1 and 2,500 round-2 queries. Many queries were answered with only wrong responses, so we do not take these queries into account when building our training corpus. We selected only the queries that were answered with both correct and wrong responses. This subset counted 1,296 slot filling queries in total (1,080 round-1 and 216 round-2).

We extracted the answers corresponding to those queries from the system assessment file, which contains the assessments of the filler values and of the relation-provenance offsets. The relation provenance offsets refer to the document ids and the begin-end positions of the text segments in the evaluation corpus. The filler assessment values can be correct (C), wrong (W) or inexact (X), while the relation provenance assessment values can be correct (C), wrong (W), short (S) or long (L), where S and L are considered inexact. We only take into account the C and W filler assessments and separate the correct and wrong responses according to the relation provenance assessment; when the relation provenance assessment is C, the filler assessment can be either C or X but not W. This results in 68,076 responses. Several features have to be computed on complete sentences, not on sentence excerpts. As the relation provenance offset of an SF response is not guaranteed to be a complete sentence, we extract from the source document the complete sentence corresponding to the relation provenance offset snippet.

The linguistic features are calculated from the analyzed sentences, in which the mentions of the pair of entities (query and candidate) must be identified. However, our system cannot find both entities in all selected sentences. This happens when either the query entity or the candidate entity is mentioned by a pronoun or a nominal anaphora, as we do not use any co-reference resolution. In addition, the named entity detection system, although it combines two efficient systems, does not detect all the entity mentions (of queries and candidates) present in the queries and hypotheses. This restriction also applies to the computation of the features based on the graph, which is constructed over the recognized named entities. This behavior corresponds to a trend generally observed in named entity recognition systems when they are applied to documents different from those on which they were trained (here web documents and blogs instead of newspaper articles or Wikipedia pages). Moreover, the constraint of finding the two entities in the same sentence causes an additional decrease in coverage. A total of 55,276 hypotheses (of the initial 68,076) could be processed to compute the linguistic features for the responses of the 1,296 queries answered with both correct and wrong candidates. Our system can extract graph-based features for only a limited number of query responses, due to the NER limitations and the in_same_sentence constraint. In summary, we can extract both the linguistic and the graph features for 4,321 responses from 260 queries (213 from round-1 and 47 from round-2). Since there are many more wrong responses than correct ones, we take a random subset of the wrong responses from the CSSF-2015 dataset for training, after removing duplicate responses, so that the ratio of wrong to correct responses for each query is approximately 2:1. After this filtering process, the training dataset contains 3,481 instances in total (1,268 positive and 2,213 negative).

A similar process was applied to the TAC KBP CSSF-2016 dataset, which we use for testing. The CSSF-2016 dataset consists of around 30,000 documents. Around 34,267 responses (for 925 queries) were assessed as correct or wrong by TAC. Our system could compute graph-based features for around 3,884 responses of 352 queries that were answered with both correct and wrong answers. Among these 3,884 responses, the test dataset contains 699 correct and 3,185 wrong responses. The statistics of the training and test datasets are shown in columns 1 to 4 of Table 4.

5.2 Results

We trained models using several classifiers and evaluated the relation validation method with the standard precision, recall, F-measure and accuracy of the different models. In this experiment we show the contribution of the proposed graph-based features for validating relations. Since community-graph analysis does not capture the semantics of a relation but does hold some evidence of how the entities are associated with each other, we expect a significant gain in precision from our relation validation method, so that a better F-score can be achieved.

We compare the classification performance of different classifiers (LibLinear, SVM, Naive Bayes, MaxEnt and Random Forest) on the best combination of features, as shown in Table 2. We obtain the best precision (32.2), recall (48.1), F-score (38.5) and accuracy (72.4) with the Random Forest classifier. The second highest precision (29.0) and accuracy (72.35) are obtained by MaxEnt, while Naive Bayes obtains the second highest recall (45.8) and F-score (34.8).

Classifier     | Precision | Recall | F-measure | Accuracy
---------------|-----------|--------|-----------|---------
LibLinear      | 26.5      | 38.5   | 31.4      | 69.72
SVM            | 23.7      | 29.9   | 26.4      | 70.06
Naive Bayes    | 28.0      | 45.8   | 34.8      | 69.07
MaxEnt         | 29.0      | 37.1   | 32.5      | 72.35
Random Forest  | 32.2      | 48.1   | 38.5      | 72.40

Table 2: Relation validation performance of different classifiers
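For illustration, the classification step could look as follows with scikit-learn (an assumption; the paper does not name the toolkit behind its classifiers). The random matrices only stand in for the real feature vectors; the 12 columns match the 12 features of Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# X: one row per response, columns = graph + linguistic + filler-credibility
# features; y: 1 = correct, 0 = wrong. Synthetic data for the sketch only.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((3481, 12)), rng.integers(0, 2, 3481)
X_test, y_test = rng.random((3884, 12)), rng.integers(0, 2, 3884)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

p, r, f, _ = precision_recall_fscore_support(y_test, pred, average="binary")
print(f"P={p:.3f} R={r:.3f} F={f:.3f} acc={accuracy_score(y_test, pred):.3f}")
```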
Since Random Forest gives the best scores overall, we examine the performance of the different feature sets with this classifier. Table 3 presents the classification performance of the different feature sets with the Random Forest classifier. Filler credibility as a single feature obtains an F-score of 34.6 and an accuracy of 62.1. It represents a strong baseline, as shown in (Sammons et al., 2014). The combination of filler credibility and linguistic features boosts the performance by around 6 points in terms of accuracy, though the F-score drops slightly. When filler credibility is combined with the graph features, it gains a significant precision (29.0) and accuracy (70.3), around 4 and 8 points higher respectively; this combination also increases the F-score slightly. This may seem surprising, as the graphs are identical for hypotheses about the same pair of entities linked by different relationships. It should be noted, however, that the relation hypotheses are already the results of different systems based on the semantics of relations. The graph-based features capture the global context of the relation hypotheses, and the experimental results show their significant contribution.

Feature Groups           | Precision | Recall | F-measure | Accuracy
-------------------------|-----------|--------|-----------|---------
Filler credibility (Fc)  | 25.1      | 55.8   | 34.6      | 62.1
Fc + Linguistic          | 26.8      | 45.2   | 33.7      | 68.0
Fc + Graph               | 29.0      | 45.1   | 35.3      | 70.3
Fc + Linguistic + Graph  | 32.2      | 48.1   | 38.5      | 72.4

Table 3: Relation validation performance of different feature sets

The best precision (32.2), F-score (38.5) and accuracy (72.4) are achieved by combining the filler credibility, linguistic and graph features. This combination improves the precision, F-score and accuracy by around 7, 4 and 10 points respectively over filler credibility alone. The precision is gained because of a low number of false positives, which indicates that the system classifies only a small number of wrong responses as correct. These results strongly demonstrate the contribution of the graph-based features for validating the claimed relations.

We also investigated the classification performance of the Fc+Linguistic+Graph model relation by relation, as shown in Table 4. We notice that the classification performance differs between relations in terms of F-score and accuracy, although we train a single model on the responses of all relations. One cause is that we do not have a similar number of training instances for all relations (e.g. per-city_of_birth versus org-subsidiaries; see column 2), which may impact the results. Additionally, in the test data, some relations have a very small number of positive responses compared to the negative ones (see column 5), which causes inconsistency in classification performance. For example, the distributions of positive and negative responses of org-top_members_employees, per-city_of_birth and per-parents are more balanced than those of per-countries_of_residence, org-subsidiaries, per-employee_or_member_of and org-country_of_headquarters; these two sets of relations therefore differ clearly in their scores. Also, some relations (org-parents, per-country_of_birth and per-stateorprovince_of_birth) have a very small number of positive instances, all of which were classified as wrong; these relations individually count zero true positives, so their precision, recall and F-score are zero. However, most of the negative instances of these relations were correctly classified as wrong. Moreover, the test dataset contains only negative instances for some relations (per-cities_of_residence, per-schools_attended, per-children, org-alternate_names and org-founded_by). All of their instances were correctly classified as wrong, which results in 100% accuracy for these relations. Interestingly, we notice that the proposed relation validation method discards all the wrong instances of the per-children relation, although the training dataset does not contain any positive or negative instances of this relation. A similar result is observed for the per-country_of_death relation, where the relation validation system obtains an accuracy of 86.7. These results show that a relation validation system trained on instances of different relations is able to predict correctly whether an instance of a relation is correct or wrong, even when it has not been trained on instances of that particular relation.

                                 | Training Data          | Test Data                                |
Relation Name                    | # Instances | Pos. (%) | # Instances | Pos. (#, %) | F-score | Accuracy
---------------------------------|-------------|----------|-------------|-------------|---------|---------
org-top_members_employees        | 147         | 46.3     | 59          | 29 (49.2%)  | 93.1    | 93.2
per-city_of_birth                | 423         | 33.3     | 92          | 70 (76.1%)  | 86.8    | 77.2
statesorprovinces_of_residence   | 81          | 33.3     | 170         | 70 (41.2%)  | 78.8    | 80.0
org-city_of_headquarters         | 402         | 33.3     | 175         | 41 (23.4%)  | 63.0    | 80.6
per-parents                      | 37          | 94.6     | 32          | 17 (53.1%)  | 61.1    | 56.3
per-country_of_death             | 0           | 0        | 120         | 6 (5%)      | 33.3    | 86.7
per-countries_of_residence       | 501         | 33.3     | 1461        | 261 (17.9%) | 27.9    | 66.0
org-country_of_headquarters      | 474         | 33.3     | 416         | 82 (19.7%)  | 22.0    | 71.0
stateorprovince_of_headquarters  | 157         | 31.2     | 369         | 60 (16.3%)  | 11.4    | 74.8
org-subsidiaries                 | 60          | 35       | 251         | 34 (13.5%)  | 15.4    | 65.0
per-employee_or_member_of        | 566         | 37.3     | 248         | 16 (6.5%)   | 11.9    | 64.1
org-parents                      | 10          | 40       | 21          | 10 (47.6%)  | 0.0     | 52.4
per-country_of_birth             | 42          | 33.3     | 66          | 6 (1.5%)    | 0.0     | 65.2
per-stateorprovince_of_birth     | 93          | 33.3     | 19          | 2 (10.5%)   | 0.0     | 52.6
per-cities_of_residence          | 78          | 33.3     | 123         | 0 (0%)      | 0.0     | 85.4
per-schools_attended             | 12          | 33.3     | 34          | 0 (0%)      | 0.0     | 100
per-children                     | 0           | 0        | 221         | 0 (0%)      | 0.0     | 100
org-alternate_names              | 65          | 49.2     | 6           | 0 (0%)      | 0.0     | 100
org-founded_by                   | 40          | 70       | 1           | 0 (0%)      | 0.0     | 100
All Together                     | 3481        | 36.4     | 3884        | 699 (18%)   | 38.5    | 72.4

Table 4: Performance of relation validation relation by relation

Table 5 shows more clearly that our system outperforms the baseline in discarding negative responses. It presents the confusion matrix (true positives (TP), false negatives (FN), false positives (FP) and true negatives (TN)) of the classification task for the different feature sets. The Fc+Linguistic+Graph combination discards 77.7% of the wrong responses, around 14 points more than the baseline. All the experimental results demonstrate the contribution of the proposed features to the task of relation validation.

Feature Groups           | TP          | FN  | FP    | TN
-------------------------|-------------|-----|-------|--------------
Filler credibility (Fc)  | 390 (55.8%) | 309 | 1,164 | 2,021 (63.5%)
Fc + Linguistic          | 316 (45.2%) | 383 | 862   | 2,323 (72.9%)
Fc + Graph               | 315 (45.1%) | 384 | 771   | 2,414 (75.8%)
Fc + Linguistic + Graph  | 336 (48.1%) | 363 | 709   | 2,476 (77.7%)

Table 5: Confusion matrix for the different feature sets (699 positive and 3,185 negative instances)

6 Conclusion

This paper presents a method for validating relationships from system outputs. We have introduced features on the linked entities that are computed at the global level of the collection. We have proposed to work with community graphs of entities, which make it possible to account for general knowledge about entities having true relationships. Experimental results have shown that our proposed features significantly improve a baseline constructed from the votes on the responses of different systems. The proposed method outperforms the baseline at discarding wrong relationships.

The calculation of the different features depends on the parsing of the texts, in particular on the results of the NER system. This part has to be improved in order to evaluate the contribution of the community graph on more responses. Although the proposed method yields a better F-score and accuracy than the baseline, it also discards some positive responses, which lowers the recall; we have to overcome this limitation.

References

Isabelle Augenstein. 2016. Web Relation Extraction with Distant Supervision. Ph.D. thesis, University of Sheffield.

Phillip Bonacich and Paulette Lloyd. 2001. Eigenvector-like measures of centrality for asymmetric relations. Social Networks 23(3):191-201.

Razvan C. Bunescu and Raymond J. Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 724-731.

Rui Cai, Xiaodong Zhang, and Houfeng Wang. 2016. Bidirectional recurrent convolutional neural network for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 756-765. http://www.aclweb.org/anthology/P16-1072.

Md Faisal Mahbub Chowdhury and Alberto Lavelli. 2012. Combining tree structures, flat features and patterns for biomedical relation extraction. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pages 420-429.
Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan, and Tat-Seng Chua. 2005. Question answering passage retrieval using dependency relations. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pages 400-407.

Aron Culotta and Jeffrey Sorensen. 2004. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, page 423.

Dmitriy Dligach, Timothy Miller, Chen Lin, Steven Bethard, and Guergana Savova. 2017. Neural temporal relation extraction. EACL 2017, page 746.

Bettina Friedl, Julia Heidemann, et al. 2010. A critical review of centrality measures in social networks. Business & Information Systems Engineering 2(6):371-385.

Katrin Fundel, Robert Küffner, and Ralf Zimmer. 2007. RelEx - relation extraction using dependency parse trees. Bioinformatics 23(3):365-371.

Pablo Gamallo, Marcos Garcia, and Santiago Fernández-Lanza. 2012. Dependency-based open information extraction. In Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP. Association for Computational Linguistics, pages 10-18.

Matt Gardner and Tom M. Mitchell. 2015. Efficient and expressive knowledge base completion using subgraph feature extraction. In EMNLP, pages 1488-1498.

Xianpei Han, Le Sun, and Jun Zhao. 2011. Collective entity linking in web text: a graph-based method. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pages 765-774.

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 541-550.

Andreas Holzinger, Bernhard Ofner, Christof Stocker, André Calero Valdez, Anne Kathrin Schaar, Martina Ziefle, and Matthias Dehmer. 2013. On graph entropy measures for knowledge discovery from publication network data. In Availability, Reliability, and Security in Information Systems and HCI. Springer, pages 354-362.

Ni Lao and William W. Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine Learning 81(1):53-67.

Ni Lao, Einat Minkov, and William W. Cohen. 2015. Learning relational features with backward random walks. In ACL (1), pages 666-675.

Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, and Houfeng Wang. 2015. A dependency-based neural network for relation classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, Beijing, China, pages 285-290. http://www.aclweb.org/anthology/P15-2047.

Bernardo Magnini, Matteo Negri, Roberto Prevete, and Hristo Tanev. 2002. Is it the right answer?: exploiting web redundancy for answer validation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, pages 425-432.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55-60. http://www.aclweb.org/anthology/P/P14/P14-5010.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2. Association for Computational Linguistics, pages 1003-1011.

Feng Niu, Ce Zhang, Christopher Ré, and Jude W. Shavlik. 2012. DeepDive: Web-scale knowledge-base construction using statistical learning and inference. VLDS 12:25-28.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543. http://www.aclweb.org/anthology/D14-1162.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Machine Learning and Knowledge Discovery in Databases. Springer, pages 148-163.

Miguel Rodriguez, Sean Goldberg, and Daisy Zhe Wang. 2015. University of Florida DSR lab system for KBP slot filler validation 2015. In Proceedings of the Eighth Text Analysis Conference (TAC 2015).

Mark Sammons, Yangqiu Song, Ruichen Wang, Gourab Kundu, Chen-Tse Tsai, Shyam Upadhyay, Siddarth Ancha, Stephen Mayhew, and Dan Roth. 2014. Overview of UI-CCG systems for event argument extraction, entity discovery and linking, and slot filler validation. Urbana 51:61801.

Luis Solá, Miguel Romance, Regino Criado, Julio Flores, Alejandro García del Amo, and Stefano Boccaletti. 2013. Eigenvector centrality of nodes in multiplex networks. Chaos: An Interdisciplinary Journal of Nonlinear Science 23(3):033131.

Vidhoon Viswanathan, Nazneen Fatema Rajani, Yinon Bentor, and Raymond Mooney. 2015. Stacked ensembles of information extractors for knowledge-base population. In Proceedings of ACL.

Ngoc Thang Vu, Heike Adel, Pankaj Gupta, and Hinrich Schütze. 2016. Combining recurrent and convolutional neural networks for relation classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, San Diego, California, pages 534-539. http://www.aclweb.org/anthology/N16-1065.

I-Jeng Wang, Edwina Liu, Cash Costello, and Christine Piatko. 2013. JHUAPL TAC-KBP2013 slot filler validation system. In Proceedings of the Sixth Text Analysis Conference (TAC 2013), volume 24.

Quan Wang, Jing Liu, Yuanfei Luo, Bin Wang, and C. Lin. 2016. Knowledge base completion via coupled path ranking. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1308-1318.

Limin Yao, Aria Haghighi, Sebastian Riedel, and Andrew McCallum. 2011. Structured relation discovery using generative models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 1456-1466.

Dian Yu, Hongzhao Huang, Taylor Cassidy, Heng Ji, Chi Wang, Shi Zhi, Jiawei Han, Clare R. Voss, and Malik Magdon-Ismail. 2014. The wisdom of minority: Unsupervised slot filling validation based on multi-dimensional truth-finding. In COLING, pages 1567-1578.

Dian Yu and Heng Ji. 2016. Unsupervised person slot filling based on graph mining. In ACL.

Suncong Zheng, Jiaming Xu, Peng Zhou, Hongyun Bao, Zhenyu Qi, and Bo Xu. 2016. A neural network framework for relation extraction: Learning entity semantic and relation pattern. Knowledge-Based Systems 114:12-23.