Community graph and linguistic analysis to validate relationships for knowledge base population

Rashedur Rahman (IRT SystemX, LIMSI, CNRS, Université Paris-Saclay, rashedur.rahman@irt-systemx.fr)
Brigitte Grau (LIMSI, CNRS, ENSIIE, Université Paris-Saclay, brigitte.grau@limsi.fr)
Sophie Rosset (LIMSI, CNRS, Université Paris-Saclay, sophie.rosset@limsi.fr)

Abstract

Relation extraction between entities in text plays an important role in information extraction and knowledge discovery tasks. Relation extraction systems produce a large number of candidates, many of which are not correct. A relation validation method judges a claimed relation based on the information provided by a system. In this paper, we propose features computed by analyzing the community graphs of entities, in order to account for some form of world knowledge. The proposed features significantly improve the validation of relations when they are combined with voting and state-of-the-art linguistic features.

1 Introduction

Extracting relations from texts is important for different information extraction and knowledge discovery tasks such as knowledge base population and question answering. This task requires natural language understanding of pieces of text, which is particularly complex when searching for a large number of semantic relations that describe entities in the open domain. The relationship types can concern the family of a person (spouse, children, parents, etc.) or the characteristics of a company (founder, top_members_or_employees, etc.). This task, named slot filling, is evaluated in the KBP evaluation (https://tac.nist.gov/), in which systems must extract instances of around 40 relation types related to several kinds of entities (person, organization, location and their different sub-types). In order to take advantage of the capabilities of several systems and improve results, a final step can be added that validates the results of the systems. The method described in this paper falls within the framework of this latter task: given an entity and a response provided by a system (its value and a text segment that justifies the claimed relation), it has to decide whether the value is correct or not. We focus on relations that occur between two entities.

Different approaches have been studied for validating relations, particularly by evaluating the confidence that a system can have in the source of the response, i.e. the document that justifies the response (Yu et al., 2014), and the confidence score of the system (Viswanathan et al., 2015). Nevertheless, other criteria are needed that concern validating the semantics of a relation by linguistic characteristics (Niu et al., 2012; Hoffmann et al., 2011; Yao et al., 2011; Riedel et al., 2010); these are similar to those used in the relation extraction task.

However, in most cases, relation validation methods do not take into account global information that can be computed on the collection of text documents. Collection-level global information about the object of a relation and the words around the mentions has been taken into account for web relation extraction by Augenstein (2016). Such information allows us to introduce some form of world knowledge for making choices based on criteria that are independent of how a relation is expressed in the text segment. We hypothesize that two entities having a true relationship should be linked to more common entities than a pair of entities with a proposed false relationship. For example, the spouse of a person will share more places and relationships with his/her spouse than with other people.
Therefore, we extracted a graph of entities from the collection, which allowed us to propose new characterizations of the relations by graph-based features (Han et al., 2011; Friedl et al., 2010; Solá et al., 2013). We also introduce information-theoretic measurements on the graph of entities, some of which have been successfully used in other tasks, such as entropy for knowledge discovery in publication networks (Holzinger et al., 2013) and mutual information for the validation of responses in question answering systems (Magnini et al., 2002; Cui et al., 2005). Additionally, we propose a dependency pattern edit distance for capturing the syntactic evidence of relations, and we explore word embeddings to detect unknown triggers of relation expression.

The relation validation method we propose is thus based on three categories of information: linguistic information associated with the expression of the relations in texts, information coming from the graphs of entities built on the collection, and finally information related to the systems and the proposals they made. We evaluated our relation validation system on a sub-part of the KBP CSSF-2016 corpus and show that the validation step achieves around 5% to 8% higher accuracy over the baseline features when they are combined with the proposed graph-based features.

2 Related Works

Relation validation methods have studied different kinds of features to decide whether a type of relation exists or not.

Existence and semantic assessment of relation candidates rely on linguistic features, such as syntactic paths or the existence of trigger words between the pair of entity mentions. Dependency trees (Culotta and Sorensen, 2004; Bunescu and Mooney, 2005; Fundel et al., 2007) provide clues for deciding the presence of a relation in unsupervised relation extraction. Gamallo et al. (2012) proposed rule-based dependency parsing for open information extraction; they defined relation patterns by parsing the dependencies and discovering verb clauses in the sentences.

Syntactic analysis cannot characterize the type of a relation. Therefore, words around the entity mentions in sentences have been analyzed to characterize the semantics of a relation (Niu et al., 2012; Hoffmann et al., 2011; Yao et al., 2011; Riedel et al., 2010; Mintz et al., 2009). Chowdhury and Lavelli (2012) proposed a hybrid kernel combining dependency patterns and trigger words for biomedical relation extraction. We explored these different kinds of linguistic features for validating relationships.

It can be difficult to identify the trigger words for different types of relations in the open domain. Therefore, neural network based methods have recently become popular for the relation classification task (Vu et al., 2016; Dligach et al., 2017; Zheng et al., 2016). These methods use word embeddings to automatically learn the patterns and semantics of relations without any handcrafted features. Dependency-based neural networks have also been proposed (Cai et al., 2016; Liu et al., 2015) to capture features on the shortest path.

A voting method has been proposed by Sammons et al. (2014) for ensemble systems to validate the outcomes proposed by multiple systems from different information sources. This method shows good results and remains stable from one dataset to another.

Several graph-based methods (Gardner and Mitchell, 2015; Lao et al., 2015; Wang et al., 2016) have been proposed for the knowledge base completion task by applying the Path Ranking Algorithm (Lao and Cohen, 2010). These methods basically use the relationships already existing in a knowledge base to learn an inference model and create new relations with it. Yu and Ji (2016) proposed a graph-based method for trigger-word identification in the slot filling task, using PageRank and affinity propagation on a graph built at sentence level.

Information-theoretic measurements on graphs have been successfully used in related tasks. Holzinger et al. (2013) measured entropy to discover knowledge in publication networks. Some question answering systems measured point-wise mutual information (Magnini et al., 2002; Cui et al., 2005) to exploit redundancy. In order to find the important and influential nodes in a social network, centralities of the nodes have been measured (Friedl et al., 2010). Solá et al. (2013) explored the concept of eigenvector centrality in multiplex networks. In order to validate the proposed relationships, we apply these different measures on graphs of entities constructed from the text collection.
3 Community Graph of Entities

3.1 Definition of the Graph

Let $G = (E, R)$ be a graph, $r_q$ a query relation (slot), $e_q \in E$ a query entity, and $E_c = \{e_{c_1}, e_{c_2}, \ldots, e_{c_n}\} \subseteq E$ the candidate responses such that $r_q = r(e_q, e_c) \in R$. The list of candidates is generated by different relation extraction systems. Suppose there are other relations $r_o \in R$ with $r_o \neq r_q$. We characterize whether a candidate entity $e_{c_i}$ of $E_c$ is correct or not for a query relation $r_q$ by analyzing the communities $X_q$ and $X_c$ formed by the query entity and each candidate response. A community $X_i$ contains the neighbors of $e_i$, up to several possible steps.

[Figure 1: Community graph]

Fig. 1 shows an example of such a graph, where the query entity, its type and the relationship name are Barack Obama, person and spouse respectively. The candidate responses are Michelle Robinson and Hilary Clinton, which are linked to Barack Obama by the spouse relation hypothesis. The objective is to classify Michelle Robinson as the correct response based on the community analysis. The communities of Barack Obama (green rectangle), Michelle Robinson (purple circle) and Hilary Clinton (orange ellipse) are defined by the in_same_sentence relation, which means that the pair of entities are mentioned in the same sentence in the text. The graph is thus constructed from untyped semantic relationships based on co-occurrences. It would also be possible to use typed semantic relationships provided by a relation extraction system.
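To make the community definition concrete, the following sketch builds the neighborhood of an entity on a toy co-occurrence graph. It uses networkx and invented entity names; it is an illustration of the definition above, not the paper's implementation (which sits on top of Neo4j, see Section 3.2).

```python
# A minimal sketch of the community X_e of Section 3.1: the subgraph
# induced by the neighbors of an entity up to `steps` levels.
import networkx as nx

def community(graph: nx.Graph, entity: str, steps: int = 2) -> nx.Graph:
    """Return the community of `entity`: its neighborhood up to `steps` hops."""
    # ego_graph collects every node within `steps` hops of `entity`,
    # together with the edges among them.
    return nx.ego_graph(graph, entity, radius=steps)

# Toy co-occurrence graph mirroring Fig. 1: an edge means the two
# entities were mentioned in the same sentence (in_same_sentence).
g = nx.Graph()
g.add_edges_from([
    ("Barack Obama", "Michelle Robinson"),
    ("Barack Obama", "Hilary Clinton"),
    ("Barack Obama", "Chicago"),
    ("Michelle Robinson", "Chicago"),
    ("Hilary Clinton", "Bill Clinton"),
])
X_q = community(g, "Barack Obama", steps=1)
print(sorted(X_q.nodes()))
```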
3.2 Construction of the Community Graph

The graph of entities illustrated in Fig. 1 is created from a graph representing the knowledge extracted from the texts (lower part of Fig. 2), called the knowledge graph. This knowledge graph is generated after applying named entity recognition (NER) and sentence splitting systems.

Recognition of named entities is done using the Stanford system (Manning et al., 2014) and Luxid (http://www.expertsystem.com/fr/). Luxid is a rule-based NER system that uses external information sources such as Freebase and GeoNames and performs with high precision. It is able to decompose entity mentions into components, such as first name, last name and title for a person named entity, and it classifies location named entities into country, state/province and city. When the two systems disagree, as in (Stanford: location, Luxid: person), we choose the annotation produced by Luxid because it provides more precise information about the detected entity than Stanford does.

The knowledge graph represents documents, sentences, mentions and entities as nodes, and the edges between these nodes represent the relationships between these elements.

[Figure 2: Knowledge graph (lower part) and community graph (upper part). Recoverable details: an IN_SAME_SENTENCE edge links the entities Barack Obama and Michelle; mention nodes (barack obama, michelle obama) carry name.first/name.last components and are attached to sentence (Sentence-1) and document (Doc-1) nodes.]

Multiple mentions of the same entity found in the same document are connected to the same entity node in the knowledge graph, based on the textual similarity of the references and of their possible components; this corresponds to a first step of entity linking on local criteria. This operation is performed by Luxid. However, an entity can be mentioned in different documents under different forms (e.g., Barack Obama, President Barack Obama, President Obama), which creates redundant nodes in the knowledge graph. Entities are then grouped according to the similarity of their names and the similarity of their neighboring entities, calculated by Eq. 2. This step merges the similar entities into a single entity in the community graph (upper part of Fig. 2). This latter graph is constructed from the information on the entities and relations present in the knowledge graph, and the link with the documents is always maintained. It is thus possible to know the number of occurrences of each entity and each relation. The graph is stored in Neo4j, a graph-oriented database, which makes it possible to extract the subgraphs linked to an entity by queries. We only consider entities of type person, location and organization as members of the communities.
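As an illustration of the subgraph extraction, a hedged sketch of a Neo4j query follows. The Entity label, the name/type properties and the connection settings are assumptions made for the example; only the IN_SAME_SENTENCE relationship name comes from the paper (Fig. 2).

```python
# A hedged sketch of pulling a community subgraph out of the Neo4j store.
from neo4j import GraphDatabase

QUERY = """
MATCH (e:Entity {name: $name})-[:IN_SAME_SENTENCE*1..2]-(n:Entity)
WHERE n.type IN ['person', 'location', 'organization']
RETURN DISTINCT n.name AS neighbor
"""

def fetch_community(driver, name: str) -> list[str]:
    # Collect the entities linked to `name` up to two hops away,
    # restricted to the three entity types kept in the community graph.
    with driver.session() as session:
        return [record["neighbor"] for record in session.run(QUERY, name=name)]

# Assumed local setup; URI and credentials are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))
print(fetch_community(driver, "Barack Obama"))
```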
4 Relation Validation

In order to predict whether a relationship is correct or not, we treat the problem as a binary classification task based on three categories of information. We calculate a set of features using the graphs (Section 4.1), to which we add features based on a linguistic analysis of the text that justifies the candidate and describes the relationship (Section 4.2), and an estimation of trust in the candidates according to their frequencies in the responses to each query (Section 4.3). Table 1 summarizes all the features used for the classification task.

Feature Group      | Feature Name
-------------------|---------------------------------------------------------
Graph              | Network density
                   | Eigenvector centrality
                   | Mutual information
                   | Network similarity
Linguistic         | Minimum edit distance between dependency patterns
                   | Dependency pattern length
                   | Are the query and filler mentions found in the same clause
                   | Has trigger word between mentions
                   | Has trigger word in dependency path
                   | Has trigger word in minimum subtree
                   | Is trigger-based relation
Baseline (voting)  | Filler credibility

Table 1: Relation validation features

4.1 Graph-based Features

We assume that a correct candidate for a query is an important member of the community of the query entity. A community $X_e$ of an entity is defined by the sub-graph formed by its neighbors, up to several levels. Merging the communities of two particular entities includes all the neighbors of that pair of entities. We therefore define different features related to this hypothesis.

We hypothesize that the network density (Eq. 1) of the community of a correct candidate merged with the community of the query entity must be higher than the density of an incorrect candidate's community merged with that of the same entity:

$$\rho_{X_e} = \frac{\text{number of existing edges}}{\text{number of possible edges}} \quad (1)$$

According to Fig. 1, the merged community of Michelle Robinson and Barack Obama is denser than the merged community of Hilary Clinton and Barack Obama.

We compute the network similarity (Eq. 2) between two communities and hypothesize that the similarity score between the communities of a query entity and a correct candidate will be higher than the score between that query entity and a wrong candidate:

$$\text{similarity} = \frac{|X_q \cap X_c|}{\sqrt{|X_q||X_c|}} \quad (2)$$

where $X_q$ and $X_c$ are the community members of the query entity and of a candidate entity respectively.
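A minimal sketch of these two features, assuming the communities are held as networkx graphs as in the earlier sketch:

```python
import math
import networkx as nx

def merged_density(X_q: nx.Graph, X_c: nx.Graph) -> float:
    """Eq. 1 applied to the union of the two communities:
    existing edges / possible edges."""
    merged = nx.compose(X_q, X_c)   # union of nodes and edges
    return nx.density(merged)       # |E| / (|V| * (|V| - 1) / 2)

def network_similarity(X_q: nx.Graph, X_c: nx.Graph) -> float:
    """Eq. 2: |X_q intersect X_c| / sqrt(|X_q| * |X_c|)."""
    q, c = set(X_q.nodes()), set(X_c.nodes())
    if not q or not c:
        return 0.0
    return len(q & c) / math.sqrt(len(q) * len(c))
```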
The eigenvector centrality (Bonacich and Lloyd, 2001) measures the influence of a node in a network: a node is more influential if it is connected to other influential nodes. We hypothesize that the query entity should be more influenced by the correct candidate than by the other candidates. We measure the influence of the candidates in the community of the query entity by calculating the absolute difference between the eigenvector centrality score of the query entity and that of each candidate; we assume that this difference should be smaller for a correct candidate than for an incorrect one. Suppose $A = (a_{i,j})$ is the adjacency matrix of a graph $G$. The eigenvector centrality $x_i$ of node $i$ is calculated recursively by Eq. 3:

$$x_i = \frac{1}{\lambda} \sum_k a_{k,i} x_k \quad (3)$$

where $\lambda \neq 0$ is a constant; in matrix form the equation reads $\lambda x = xA$.

We also hypothesize that the mutual information (Eq. 4) between the communities of a query entity and a correct candidate must be higher than that computed between the communities of the query entity and an incorrect candidate:

$$MI(X_q, X_c) = H(X_q) + H(X_c) - H(X_q, X_c) \quad (4)$$

where $H(X) = -\sum_{i=1}^{n} p(e_i) \log_2 p(e_i)$ and $p(e) = \frac{\text{number of edges of } e}{\text{number of edges of } X}$; $X_q$ and $X_c$ are the communities of a query entity and a candidate respectively.

The community of an entity (query entity or candidate) is extended up to the third level to measure eigenvector centrality and mutual information.
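The two remaining graph features can be sketched in the same setting. Note that the paper does not spell out how the joint entropy $H(X_q, X_c)$ is computed; taking it over the merged community is one plausible reading, flagged as an assumption in the code.

```python
import math
import networkx as nx

def centrality_gap(X_q: nx.Graph, query: str, candidate: str) -> float:
    """|x_query - x_candidate| inside the query community (Eq. 3)."""
    cent = nx.eigenvector_centrality(X_q, max_iter=1000)
    # A candidate absent from the query community gets centrality 0.
    return abs(cent[query] - cent.get(candidate, 0.0))

def entropy(X: nx.Graph) -> float:
    """H(X) with p(e) = deg(e) / |edges of X|, as written in the paper
    (note: these weights sum to 2 over the nodes, since every edge is
    counted at both endpoints; we follow the paper's formula as given)."""
    m = X.number_of_edges()
    if m == 0:
        return 0.0
    return -sum((d / m) * math.log2(d / m) for _, d in X.degree() if d > 0)

def mutual_information(X_q: nx.Graph, X_c: nx.Graph) -> float:
    """Eq. 4, with the joint entropy taken over the merged community
    (an assumption; the paper leaves this step implicit)."""
    return entropy(X_q) + entropy(X_c) - entropy(nx.compose(X_q, X_c))
```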
4.2 Linguistic Features

To assess whether a relation exists between the pair of entity mentions, we define syntactic features. To characterize the semantics of the relation, we represent it by seed words and analyze the sentence at the lexical level.

Syntactic features are calculated from a dependency analysis: the parser (Manning et al., 2014) provides a tree in which the nodes are the words of the sentence and the edges between them are labeled by their syntactic role. We collected a list of dependency patterns for each relation from annotated examples. For example, in the sentence "Paola, Queen of the Belgians is the wife of King Albert of Belgium.", the dependency pattern between Paola and King Albert is [nn, nsubj, prep_of] and the dependencies are nn(Queen, Paola), nsubj(wife, Queen) and prep_of(wife, Albert). We simplify the pattern [nn, nsubj, prep_of] to [nsubj, prep_of] by removing leading and trailing nn labels for noise reduction. We also notice that dependency patterns sometimes contain consecutive identical labels, as in [nsubj, dobj, prep_of, prep_of, poss]. In such cases, we substitute the consecutive labels with a single one, which simplifies [nsubj, dobj, prep_of, prep_of, poss] into [nsubj, dobj, prep_of, poss]. This simplification generalizes the dependency patterns.

The acquired patterns are compared to the simplified dependency path of a sentence by computing edit distances. Suppose the pre-annotated dependency patterns for a relation R are (a,b,c), (a,c,d) and (b,c,d), and the dependency pattern (a,c,b) is extracted, between the query and the filler mention, from the provenance sentence of a relation claimed to be R. We calculate the edit distance between each pair [(a,c,b), (a,b,c)], [(a,c,b), (a,c,d)] and [(a,c,b), (b,c,d)] and keep the minimum edit distance as a feature, as sketched below. Since relations are often expressed by short dependency paths, the length of the simplified pattern is also used as a feature.
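The pattern simplification and the minimum-edit-distance feature can be sketched as follows; the example values reproduce those of the running text.

```python
def simplify(pattern: list[str]) -> list[str]:
    # Drop leading and trailing `nn` labels (noise reduction) ...
    while pattern and pattern[0] == "nn":
        pattern = pattern[1:]
    while pattern and pattern[-1] == "nn":
        pattern = pattern[:-1]
    # ... and collapse runs of identical consecutive labels.
    out = []
    for label in pattern:
        if not out or out[-1] != label:
            out.append(label)
    return out

def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def min_pattern_distance(candidate, acquired_patterns) -> int:
    """Feature: distance to the closest pre-annotated pattern."""
    simple = simplify(candidate)
    return min(edit_distance(simple, simplify(p)) for p in acquired_patterns)

# Examples from the text.
print(simplify(["nn", "nsubj", "prep_of"]))             # ['nsubj', 'prep_of']
print(min_pattern_distance(["a", "c", "b"],
                           [["a", "b", "c"], ["a", "c", "d"], ["b", "c", "d"]]))
```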
The semantic analysis is based on trigger words associated with the relation types. We encode the semantic features as boolean values and define two types of trigger words: positive triggers and negative triggers. Positive trigger words are keywords that strongly support a particular relation, while negative triggers strongly negate the claimed relation. For example, wife, husband and married are positive triggers for a spouse relation, while parent, children and brother are negative triggers. We collected these seed words from the assessed dataset of the TAC KBP 2014 slot filling task. In total, we collected around 250 triggers and 553 dependency patterns for 41 relations from 3,579 annotated snippets.

Since relations are expressed by a wide variety of words, it is hard to collect all the trigger words for a relation. Therefore, we associate a word embedding with each trigger by using a pre-trained GloVe model (Pennington et al., 2014), and deciding whether a word is a trigger or not relies on the similarity of the embeddings. Suppose a and b are two words between the query and filler mentions of a claimed relation R, and x, y, z are the positive triggers for that relation. We compute the similarity between the vectors of each pair (a,x), (a,y), (a,z), (b,x), (b,y) and (b,z). If the similarity score for a word from a, b exceeds a predefined threshold (0.7), we consider that a trigger word exists. We check for positive and/or negative trigger words in three places when validating a claimed relation: 1) between the mentions at the surface level, 2) in the dependency path, and 3) in the minimum subtree, as in (Chowdhury and Lavelli, 2012).

Some relations can be expressed without any trigger word. For example, the snippet "Mr. David, from California won the prize" expresses the per:city_of_residence relation without explicitly using a trigger word. We therefore classify the relation types into two classes, those that can be expressed without a trigger word and those that cannot, and use a boolean flag (is_trigger-based_relation) as a feature.
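A hedged sketch of the embedding-based trigger test follows. The GloVe file name is an assumption; any pre-trained GloVe text file in the standard word-per-line format would do.

```python
import numpy as np

def load_glove(path: str) -> dict[str, np.ndarray]:
    """Read a GloVe text file: one word followed by its vector per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def has_trigger(words, seeds, vectors, threshold: float = 0.7) -> bool:
    """True if any word between the mentions is similar enough to any
    seed trigger (the paper's threshold is 0.7)."""
    for w in words:
        for s in seeds:
            if w in vectors and s in vectors and \
               cosine(vectors[w], vectors[s]) >= threshold:
                return True
    return False

vectors = load_glove("glove.6B.300d.txt")  # assumed local file
print(has_trigger(["is", "the", "spouse", "of"],
                  ["wife", "husband", "married"], vectors))
```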
4.3 Voting: Filler Credibility

We calculate a credibility score for the candidates based on all the responses given by the different systems to a query. Let F be the candidates for a query Q supplied by the systems S. The credibility of a candidate $F_i$ is computed by Eq. 5:

$$\text{filler\_credibility}(F_i, Q) = \frac{\text{number of occurrences of } F_i}{\text{number of occurrences of all the candidates}} \quad (5)$$

The filler credibility counts the relative vote for a candidate, which indicates the degree of agreement among the different systems that the candidate is correct. Since we can assume that the systems have already performed some linguistic and probabilistic analysis to produce their responses, filler credibility is strong evidence that a candidate is correct. Some slot filling and slot filler validation methods have used the system credibility (Yu et al., 2014) and confidence scores (Wang et al., 2013; Viswanathan et al., 2015; Rodriguez et al., 2015) of the responses for validating relations, but these features are highly system- and data-dependent. Therefore, we use only filler credibility as the baseline.
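A minimal sketch of Eq. 5 over the raw list of system responses for one query:

```python
from collections import Counter

def filler_credibility(responses: list[str]) -> dict[str, float]:
    """Map each candidate filler to its share of all votes for one query."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {filler: n / total for filler, n in counts.items()}

# Five systems answer a spouse query; three agree on the same filler.
votes = ["Michelle Robinson", "Michelle Robinson", "Michelle Robinson",
         "Hilary Clinton", "Hilary Clinton"]
print(filler_credibility(votes))
# {'Michelle Robinson': 0.6, 'Hilary Clinton': 0.4}
```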
5 Experiments and Results

5.1 Data

We perform our experiments using the TAC-KBP English cold start slot filling (CSSF) datasets of 2015 and 2016. TAC provided a reference corpus for the English CSSF-2015 evaluation task consisting of 45,000 documents, including texts from newswires and discussion forums. We parsed these texts to build the knowledge graph. We compiled our training data from the assessments of the English CSSF-2015 responses of slot filling systems. There were 9,339 round-1 queries for English CSSF-2015, and in total 330,314 round-2 queries were generated by all the slot filling systems based on the responses to the round-1 queries. NIST assessed the SF responses of around 2,000 round-1 and 2,500 round-2 queries. Many queries were answered with only wrong responses, so we do not take these queries into account when building our training corpus. We selected only the queries that were answered with both correct and wrong responses. This subset counted 1,296 slot filling queries in total (1,080 round-1 and 216 round-2).

We extracted the answers corresponding to those queries from the system assessment file, which contains the assessments of the filler values and of the relation-provenance offsets. The relation provenance offsets refer to the document ids and the begin-end positions of the text segments in the evaluation corpus. The filler assessment values can be correct (C), wrong (W) or inexact (X), while the relation provenance assessment values can be correct (C), wrong (W), short (S) or long (L), where S and L are considered inexact. We only take into account the C and W filler assessments and separate the correct and wrong responses according to the relation provenance assessment; when the relation provenance assessment is C, the filler assessment can be either C or X but not W. This results in 68,076 responses. Several features have to be computed on complete sentences, not on sentence excerpts. As the relation provenance offset of an SF response is not guaranteed to be a complete sentence, we extract from the source document the complete sentence corresponding to the relation provenance offset snippet.

The linguistic features are calculated from the analyzed sentences, in which the mentions of the pair of entities (query and candidate) must be identified. However, our system cannot find both entities in all selected sentences. This happens when either the query entity or the candidate entity is mentioned by a pronoun or a nominal anaphora, as we do not use any co-reference resolution. In addition, the named entity detection system, although it combines two efficient systems, does not detect all the entity mentions (of queries and candidates) present in the queries and hypotheses. This restriction also applies to the computation of the features based on the graph, which is constructed over the recognized named entities. This behavior corresponds to a trend generally observed in named entity recognition systems when they are applied to documents different from those on which they were trained (here web documents and blogs instead of newspaper articles or Wikipedia pages). Moreover, the constraint of finding the two entities in the same sentence causes an additional decrease in coverage. A total of 55,276 hypotheses (of the initial 68,076) could be processed to compute the linguistic features for the responses of the 1,296 queries answered with both correct and wrong candidates. Our system can extract graph-based features for only a limited number of query responses, due to the NER limitations and the in_same_sentence constraint. In summary, we can extract both the linguistic and the graph features for 4,321 responses from 260 queries (213 from round-1 and 47 from round-2). Since there are many more wrong responses than correct ones, we take a random subset of the wrong responses from the CSSF-2015 dataset for training, after removing duplicate responses, so that the ratio of wrong to correct responses for each query is approximately 2:1. After this filtering process, the training dataset contains 3,481 instances in total (1,268 positive and 2,213 negative).

A similar process was applied to the TAC KBP CSSF-2016 dataset, which we use for testing. The CSSF-2016 dataset consists of around 30,000 documents. Around 34,267 responses (for 925 queries) were assessed as correct or wrong by TAC. Our system could compute graph-based features for around 3,884 responses of 352 queries that were answered with both correct and wrong answers. Among these 3,884 responses, the test dataset contains 699 correct and 3,185 wrong responses. The statistics of the training and test datasets are shown in columns 1 to 4 of Table 4.

5.2 Results

We trained models using several classifiers and evaluated the relation validation method with the standard precision, recall, F-measure and accuracy of the different models. In this experiment we show the contribution of the proposed graph-based features for validating relations. Since community-graph analysis does not capture the semantics of a relation but does hold some evidence of how the entities are associated with each other, we expect a significant gain in precision from our relation validation method, so that a better F-score can be achieved.

We compare the classification performance of different classifiers (LibLinear, SVM, Naive Bayes, MaxEnt and Random Forest) on the best combination of features, as shown in Table 2. We obtain the best precision (32.2), recall (48.1), F-score (38.5) and accuracy (72.4) with the Random Forest classifier. The second highest precision (29.0) and accuracy (72.35) are obtained by MaxEnt, while Naive Bayes obtains the second highest recall (45.8) and F-score (34.8).

Classifier     | Precision | Recall | F-measure | Accuracy
---------------|-----------|--------|-----------|---------
LibLinear      | 26.5      | 38.5   | 31.4      | 69.72
SVM            | 23.7      | 29.9   | 26.4      | 70.06
Naive Bayes    | 28.0      | 45.8   | 34.8      | 69.07
MaxEnt         | 29.0      | 37.1   | 32.5      | 72.35
Random Forest  | 32.2      | 48.1   | 38.5      | 72.40

Table 2: Relation validation performance of different classifiers
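For illustration, the classification step could look as follows with scikit-learn (an assumption; the paper does not name the toolkit behind its classifiers). The random matrices only stand in for the real feature vectors; the 12 columns match the 12 features of Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# X: one row per response, columns = graph + linguistic + filler-credibility
# features; y: 1 = correct, 0 = wrong. Synthetic data for the sketch only.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((3481, 12)), rng.integers(0, 2, 3481)
X_test, y_test = rng.random((3884, 12)), rng.integers(0, 2, 3884)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

p, r, f, _ = precision_recall_fscore_support(y_test, pred, average="binary")
print(f"P={p:.3f} R={r:.3f} F={f:.3f} acc={accuracy_score(y_test, pred):.3f}")
```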
Since Random Forest gives the best scores overall, we examine the performance of the different feature sets with this classifier. Table 3 presents the classification performance of the different feature sets with the Random Forest classifier. Filler credibility as a single feature obtains an F-score of 34.6 and an accuracy of 62.1. It represents a strong baseline, as shown in (Sammons et al., 2014). The combination of filler credibility and linguistic features boosts the performance by around 6 points in terms of accuracy, though the F-score drops slightly. When filler credibility is combined with the graph features, it gains a significant precision (29.0) and accuracy (70.3), around 4 and 8 points higher respectively; this combination also increases the F-score slightly. This may seem surprising, as the graphs are identical for hypotheses about the same pair of entities linked by different relationships. It should be noted, however, that the relation hypotheses are already the results of different systems based on the semantics of relations. The graph-based features capture the global context of the relation hypotheses, and the experimental results show their significant contribution.

Feature Groups           | Precision | Recall | F-measure | Accuracy
-------------------------|-----------|--------|-----------|---------
Filler credibility (Fc)  | 25.1      | 55.8   | 34.6      | 62.1
Fc + Linguistic          | 26.8      | 45.2   | 33.7      | 68.0
Fc + Graph               | 29.0      | 45.1   | 35.3      | 70.3
Fc + Linguistic + Graph  | 32.2      | 48.1   | 38.5      | 72.4

Table 3: Relation validation performance of different feature sets

The best precision (32.2), F-score (38.5) and accuracy (72.4) are achieved by combining the filler credibility, linguistic and graph features. This combination improves the precision, F-score and accuracy by around 7, 4 and 10 points respectively over filler credibility alone. The precision is gained because of a low number of false positives, which indicates that the system classifies only a small number of wrong responses as correct. These results strongly demonstrate the contribution of the graph-based features for validating the claimed relations.

We also investigated the classification performance of the Fc+Linguistic+Graph model relation by relation, as shown in Table 4. We notice that the classification performance differs between relations in terms of F-score and accuracy, although we train a single model on the responses of all relations. One cause is that we do not have a similar number of training instances for all relations (e.g. per-city_of_birth versus org-subsidiaries; see column 2), which may impact the results. Additionally, in the test data, some relations have a very small number of positive responses compared to the negative ones (see column 5), which causes inconsistency in classification performance. For example, the distributions of positive and negative responses of org-top_members_employees, per-city_of_birth and per-parents are more balanced than those of per-countries_of_residence, org-subsidiaries, per-employee_or_member_of and org-country_of_headquarters; these two sets of relations therefore differ clearly in their scores. Also, some relations (org-parents, per-country_of_birth and per-stateorprovince_of_birth) have a very small number of positive instances, all of which were classified as wrong; these relations individually count zero true positives, so their precision, recall and F-score are zero. However, most of the negative instances of these relations were correctly classified as wrong. Moreover, the test dataset contains only negative instances for some relations (per-cities_of_residence, per-schools_attended, per-children, org-alternate_names and org-founded_by). All of their instances were correctly classified as wrong, which results in 100% accuracy for these relations. Interestingly, we notice that the proposed relation validation method discards all the wrong instances of the per-children relation, although the training dataset does not contain any positive or negative instances of this relation. A similar result is observed for the per-country_of_death relation, where the relation validation system obtains an accuracy of 86.7. These results show that a relation validation system trained on instances of different relations is able to predict correctly whether an instance of a relation is correct or wrong, even when it has not been trained on instances of that particular relation.

                                 | Training Data          | Test Data                                |
Relation Name                    | # Instances | Pos. (%) | # Instances | Pos. (#, %) | F-score | Accuracy
---------------------------------|-------------|----------|-------------|-------------|---------|---------
org-top_members_employees        | 147         | 46.3     | 59          | 29 (49.2%)  | 93.1    | 93.2
per-city_of_birth                | 423         | 33.3     | 92          | 70 (76.1%)  | 86.8    | 77.2
statesorprovinces_of_residence   | 81          | 33.3     | 170         | 70 (41.2%)  | 78.8    | 80.0
org-city_of_headquarters         | 402         | 33.3     | 175         | 41 (23.4%)  | 63.0    | 80.6
per-parents                      | 37          | 94.6     | 32          | 17 (53.1%)  | 61.1    | 56.3
per-country_of_death             | 0           | 0        | 120         | 6 (5%)      | 33.3    | 86.7
per-countries_of_residence       | 501         | 33.3     | 1461        | 261 (17.9%) | 27.9    | 66.0
org-country_of_headquarters      | 474         | 33.3     | 416         | 82 (19.7%)  | 22.0    | 71.0
stateorprovince_of_headquarters  | 157         | 31.2     | 369         | 60 (16.3%)  | 11.4    | 74.8
org-subsidiaries                 | 60          | 35       | 251         | 34 (13.5%)  | 15.4    | 65.0
per-employee_or_member_of        | 566         | 37.3     | 248         | 16 (6.5%)   | 11.9    | 64.1
org-parents                      | 10          | 40       | 21          | 10 (47.6%)  | 0.0     | 52.4
per-country_of_birth             | 42          | 33.3     | 66          | 6 (1.5%)    | 0.0     | 65.2
per-stateorprovince_of_birth     | 93          | 33.3     | 19          | 2 (10.5%)   | 0.0     | 52.6
per-cities_of_residence          | 78          | 33.3     | 123         | 0 (0%)      | 0.0     | 85.4
per-schools_attended             | 12          | 33.3     | 34          | 0 (0%)      | 0.0     | 100
per-children                     | 0           | 0        | 221         | 0 (0%)      | 0.0     | 100
org-alternate_names              | 65          | 49.2     | 6           | 0 (0%)      | 0.0     | 100
org-founded_by                   | 40          | 70       | 1           | 0 (0%)      | 0.0     | 100
All Together                     | 3481        | 36.4     | 3884        | 699 (18%)   | 38.5    | 72.4

Table 4: Performance of relation validation relation by relation

Table 5 shows more clearly that our system outperforms the baseline in discarding negative responses. It presents the confusion matrix (true positives (TP), false negatives (FN), false positives (FP) and true negatives (TN)) of the classification task for the different feature sets. The Fc+Linguistic+Graph combination discards 77.7% of the wrong responses, around 14 points more than the baseline. All the experimental results demonstrate the contribution of the proposed features to the task of relation validation.

Feature Groups           | TP          | FN  | FP    | TN
-------------------------|-------------|-----|-------|--------------
Filler credibility (Fc)  | 390 (55.8%) | 309 | 1,164 | 2,021 (63.5%)
Fc + Linguistic          | 316 (45.2%) | 383 | 862   | 2,323 (72.9%)
Fc + Graph               | 315 (45.1%) | 384 | 771   | 2,414 (75.8%)
Fc + Linguistic + Graph  | 336 (48.1%) | 363 | 709   | 2,476 (77.7%)

Table 5: Confusion matrix for the different feature sets (699 positive and 3,185 negative instances)

6 Conclusion

This paper presents a method for validating relationships from system outputs. We have introduced features on the linked entities that are computed at the global level of the collection. We have proposed to work with community graphs of entities, which make it possible to account for general knowledge about entities having true relationships. Experimental results have shown that our proposed features significantly improve a baseline constructed from the votes on the responses of different systems. The proposed method outperforms the baseline at discarding wrong relationships.

The calculation of the different features depends on the parsing of the texts, in particular on the results of the NER system. This part has to be improved in order to evaluate the contribution of the community graph on more responses. Although the proposed method yields a better F-score and accuracy than the baseline, it also discards some positive responses, which lowers the recall; we have to overcome this limitation.

References

Isabelle Augenstein. 2016. Web Relation Extraction with Distant Supervision. Ph.D. thesis, University of Sheffield.

Phillip Bonacich and Paulette Lloyd. 2001. Eigenvector-like measures of centrality for asymmetric relations. Social Networks 23(3):191-201.

Razvan C. Bunescu and Raymond J. Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 724-731.

Rui Cai, Xiaodong Zhang, and Houfeng Wang. 2016. Bidirectional recurrent convolutional neural network for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 756-765. http://www.aclweb.org/anthology/P16-1072.

Md Faisal Mahbub Chowdhury and Alberto Lavelli. 2012. Combining tree structures, flat features and patterns for biomedical relation extraction. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pages 420-429.
Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan, and Tat-Seng Chua. 2005. Question answering passage retrieval using dependency relations. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pages 400-407.

Aron Culotta and Jeffrey Sorensen. 2004. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, page 423.

Dmitriy Dligach, Timothy Miller, Chen Lin, Steven Bethard, and Guergana Savova. 2017. Neural temporal relation extraction. EACL 2017, page 746.

Bettina Friedl, Julia Heidemann, et al. 2010. A critical review of centrality measures in social networks. Business & Information Systems Engineering 2(6):371-385.

Katrin Fundel, Robert Küffner, and Ralf Zimmer. 2007. RelEx - relation extraction using dependency parse trees. Bioinformatics 23(3):365-371.

Pablo Gamallo, Marcos Garcia, and Santiago Fernández-Lanza. 2012. Dependency-based open information extraction. In Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP. Association for Computational Linguistics, pages 10-18.

Matt Gardner and Tom M. Mitchell. 2015. Efficient and expressive knowledge base completion using subgraph feature extraction. In EMNLP, pages 1488-1498.

Xianpei Han, Le Sun, and Jun Zhao. 2011. Collective entity linking in web text: a graph-based method. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pages 765-774.

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 541-550.

Andreas Holzinger, Bernhard Ofner, Christof Stocker, André Calero Valdez, Anne Kathrin Schaar, Martina Ziefle, and Matthias Dehmer. 2013. On graph entropy measures for knowledge discovery from publication network data. In Availability, Reliability, and Security in Information Systems and HCI. Springer, pages 354-362.

Ni Lao and William W. Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine Learning 81(1):53-67.

Ni Lao, Einat Minkov, and William W. Cohen. 2015. Learning relational features with backward random walks. In ACL (1), pages 666-675.

Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, and Houfeng Wang. 2015. A dependency-based neural network for relation classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, Beijing, China, pages 285-290. http://www.aclweb.org/anthology/P15-2047.

Bernardo Magnini, Matteo Negri, Roberto Prevete, and Hristo Tanev. 2002. Is it the right answer?: exploiting web redundancy for answer validation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, pages 425-432.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55-60. http://www.aclweb.org/anthology/P/P14/P14-5010.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2. Association for Computational Linguistics, pages 1003-1011.

Feng Niu, Ce Zhang, Christopher Ré, and Jude W. Shavlik. 2012. DeepDive: Web-scale knowledge-base construction using statistical learning and inference. VLDS 12:25-28.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543. http://www.aclweb.org/anthology/D14-1162.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Machine Learning and Knowledge Discovery in Databases. Springer, pages 148-163.

Miguel Rodriguez, Sean Goldberg, and Daisy Zhe Wang. 2015. University of Florida DSR lab system for KBP slot filler validation 2015. In Proceedings of the Eighth Text Analysis Conference (TAC 2015).

Mark Sammons, Yangqiu Song, Ruichen Wang, Gourab Kundu, Chen-Tse Tsai, Shyam Upadhyay, Siddarth Ancha, Stephen Mayhew, and Dan Roth. 2014. Overview of UI-CCG systems for event argument extraction, entity discovery and linking, and slot filler validation. Urbana 51:61801.

Luis Solá, Miguel Romance, Regino Criado, Julio Flores, Alejandro García del Amo, and Stefano Boccaletti. 2013. Eigenvector centrality of nodes in multiplex networks. Chaos: An Interdisciplinary Journal of Nonlinear Science 23(3):033131.

Vidhoon Viswanathan, Nazneen Fatema Rajani, Yinon Bentor, and Raymond Mooney. 2015. Stacked ensembles of information extractors for knowledge-base population. In Proceedings of ACL.

Ngoc Thang Vu, Heike Adel, Pankaj Gupta, and Hinrich Schütze. 2016. Combining recurrent and convolutional neural networks for relation classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, San Diego, California, pages 534-539. http://www.aclweb.org/anthology/N16-1065.

I-Jeng Wang, Edwina Liu, Cash Costello, and Christine Piatko. 2013. JHUAPL TAC-KBP2013 slot filler validation system. In Proceedings of the Sixth Text Analysis Conference (TAC 2013), volume 24.

Quan Wang, Jing Liu, Yuanfei Luo, Bin Wang, and C. Lin. 2016. Knowledge base completion via coupled path ranking. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1308-1318.

Limin Yao, Aria Haghighi, Sebastian Riedel, and Andrew McCallum. 2011. Structured relation discovery using generative models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 1456-1466.

Dian Yu, Hongzhao Huang, Taylor Cassidy, Heng Ji, Chi Wang, Shi Zhi, Jiawei Han, Clare R. Voss, and Malik Magdon-Ismail. 2014. The wisdom of minority: Unsupervised slot filling validation based on multi-dimensional truth-finding. In COLING, pages 1567-1578.

Dian Yu and Heng Ji. 2016. Unsupervised person slot filling based on graph mining. In ACL.

Suncong Zheng, Jiaming Xu, Peng Zhou, Hongyun Bao, Zhenyu Qi, and Bo Xu. 2016. A neural network framework for relation extraction: Learning entity semantic and relation pattern. Knowledge-Based Systems 114:12-23.