                      Rule Mining for Semantifying Wikilinks

                     Luis Galárraga, Danai Symeonidou, Jean-Claude Moissinac
                                  Télécom ParisTech, Paris, France
         {luis.galarraga, danai.symeonidou, jean-claude.moissinac}@telecom-paristech.fr




ABSTRACT
Wikipedia-centric Knowledge Bases (KBs) such as YAGO and DBpedia store the hyperlinks between articles in Wikipedia using wikilink relations. While wikilinks are signals of a semantic connection between entities, the meaning of such a connection is most of the time unknown to KBs; e.g., for 89% of wikilinks in DBpedia no other relation between the entities is known. The task of discovering the exact relations that hold between the endpoints of a wikilink is called wikilink semantification. In this paper, we apply rule mining techniques on the already semantified wikilinks to propose relations for the unsemantified wikilinks in a subset of DBpedia. By mining highly supported and confident logical rules from KBs, we can semantify wikilinks with very high precision.

Copyright is held by the owner/author(s).
WWW2015 Workshop: Linked Data on the Web (LDOW2015).

1.   INTRODUCTION
Some of the most prominent KBs, such as DBpedia [1] or YAGO [19], build upon accurate information extraction from the semi-structured parts of Wikipedia articles, such as infoboxes, Wikipedia categories and the hyperlinks between articles, namely wikilinks. Even though wikilinks account for more than 25% of the non-literal facts in DBpedia, they are rarely exploited. Nevertheless, the fact that two entities are connected via a hyperlink strongly suggests a semantic connection between them. The goal of this paper is to discover the exact meanings of such connections.

Some wikilinks are already semantified in KBs. YAGO and DBpedia, for example, know that Barack Obama links to USA and is also a citizen and the President of that country. KBs can extract such information because it is usually available in the infoboxes; however, if the information lies somewhere outside the infoboxes, KBs will not see it, leading to unsemantified wikilinks (see [9, 21] for automatic population of infoboxes from text). This is the case for 89% of wikilinks in DBpedia. For instance, the Wikipedia article of Barack Obama links to the article of the 2009 Nobel Prize, but DBpedia does not know that he won the Nobel Prize.

In some other cases, the semantic connection encoded in a wikilink can be vague and opaque, or even not modeled in the schema of the KB. For example, Obama's article also links to the articles for cocaine and ovarian cancer.

In this work, we show how to leverage the already semantified wikilinks to semantify the others. This is achieved by learning frequent semantic patterns from the relations in the KB and the wikilinks. If we observe that people often link to the countries where they come from, we can suggest that unsemantified wikilinks from people to countries convey a nationality relationship. This example also implies that the types of the entities play an important role when semantifying wikilinks. For instance, the fact that France links to Spain suggests that the implicit relation carried by the wikilink holds between countries (or even places) and therefore discards any relation with an incompatible signature. If we assume that wikilinks between countries encode a trade partnership, we can formulate this pattern as a logical rule:

linksTo(x, y) ∧ is(x, Country) ∧ is(y, Country) ⇒ deals(x, y)

Given an unsemantified wikilink between two countries, this rule will predict that they must be trade partners. Such predictions could be proposed as candidate facts to populate KBs. Still, this application scenario requires the rules to have a certain quality, i.e., they should be statistically significant and draw correct conclusions in most cases. This avoids capturing noisy or irrelevant patterns and making wrong predictions.

The process of learning logical rules from structured data is known as Rule Mining. In this paper, we resort to a method called AMIE [6] to mine logical rules from KBs. We then use the rules to draw conclusions and compute a list of the most likely candidate meanings (relations) between the entities of unsemantified wikilinks. Using a straightforward inference method, we can discover meanings for 180K unsemantified wikilinks with very high precision.

In addition to the semantification of wikilinks, and to further emphasize their value, we discuss their effect on the task of rule mining. We observe that they can sometimes increase the confidence of the obtained rules. For instance, assuming that a rule mining approach learns the rule:

currentMember(x, y) ⇒ team(x, y)

we observe that by requiring the existence of a wikilink between the entities:

linksTo(x, y) ∧ currentMember(x, y) ⇒ team(x, y)

we achieve higher confidence. This observation could be
leveraged by data inference and link prediction approaches. It also provides additional insights about the KB.

2.   RELATED WORK
Link prediction. The task of discovering semantic links between entities in KBs is often referred to in the literature as link prediction. Due to the prominence of the Semantic Web, the problem has been studied extensively under multiple paradigms.

Statistical graphical models such as Bayesian Networks [5] and Markov Logic Networks (MLNs) [17] offer a theoretically rigorous framework for data inference in KBs. Given a KB and a set of soft weighted rules expressed in first-order logic, MLNs support multiple inference tasks, such as probability calculation for queries and predictions, and MAP (Maximum a Posteriori) inference. The major drawback of such methods is that, in their original formulation, they do not scale to the size of current KBs. Nevertheless, there have been initiatives to extend the applicability of MLNs to large datasets [14].

Some approaches represent KBs as matrices or tensors [12, 13]. Under this paradigm, for instance, a KB can be represented as a three-dimensional tensor where the fact r(x, y) is encoded as 1 in the cell with coordinates (r, x, y). Methods such as RESCAL [12], among others [13, 18], resort to tensor factorization and latent factor analysis on the matrix representation of the KB in order to estimate the confidence of the missing cells, i.e., how likely the missing facts are true based on the latent features in the data. Even though the scores are often given a probabilistic interpretation, they are not probabilities in a strict sense. Unlike our approach, this line of methods does not rely on explicitly formulated rules to perform inference.

A third family of approaches [7, 20, 3] resorts to embedding models to formulate the link prediction problem. In [20], entities are represented as vectors in an embedding space, while relations are defined as transformations on those vectors, e.g., the transformation nationality maps the vector of Barack Obama to the vector of USA. Embedding-based methods are very effective at predicting values for functional relations, e.g., place of birth, and still perform fairly well for one-to-many relations, e.g., children.

Unlike the previous methods, the approach proposed in [10] relies on a graph representation for KBs and applies random walks and path ranking methods to discover new facts in large KBs. In a similar fashion, [11] mines frequent meta-paths on data graphs, i.e., sequences of data types connected by labeled edges, and uses them to predict links between entities.

All the approaches mentioned so far tackle the link prediction problem in KBs in a general way. Our approach, in contrast, has a more focused scope, since we aim at predicting semantic links for entities for which there exists a signal of semantic connection, namely a wikilink.

Wikilinks for type induction. Some approaches have leveraged the semantic value conveyed by wikilinks for the task of type inference in KBs. The work presented in [15] represents the set of wikilinks as a directed graph where each entity is replaced by its most specific type in the DBpedia type hierarchy. The method discovers frequent subgraph patterns on this graph, called Encyclopedic Knowledge Patterns (EKPs). EKPs can be used to describe classes of entities and therefore predict the types of untyped entities; e.g., instances of soccer players will often link to instances of coaches and soccer leagues. While this method also makes use of the instance information to mine patterns, it does not aim to discover relations between entities. Thus, it does not make use of any other relations holding between the endpoints of wikilinks. In the same spirit, [16] builds upon EKPs and uses the instance information to map both entities and classes to a vector space. A similarity function on this space is used to compute the distance of an entity to the prototypical vector of its classes and predict the types of untyped entities.

3.   PRELIMINARIES

3.1   Rule Mining
Our proposal to semantify wikilinks relies on logical rules mined from a KB and its wikilinks. In this paper we use a logical notation to represent rules and facts in a KB; e.g., the fact that Angela Merkel is a citizen of Germany is expressed as nationality(Angela Merkel, Germany). An atom is a fact where at least one of the arguments of the relation is a variable, e.g., nationality(x, Germany). We say that an atom holds in a KB if there exists an assignment for the variables in the atom that results in a fact in the KB. Moreover, we say that two atoms are connected if they share at least one variable. The building blocks for logical rules are conjunctions of transitively connected atoms. For example, the rule that says that married couples have the same nationality can be expressed as:

nationality(x, y) ∧ spouse(x, z) ⇒ nationality(z, y)

This is a Horn rule. The left-hand side of the implication is a conjunction of connected atoms called the body, whereas the right-hand side is the head. In this paper, we focus on closed Horn rules, i.e., rules where each variable occurs in at least two atoms of the rule. Closed Horn rules always conclude concrete facts for assignments of the variables to values in the KB. If the KB knows nationality(Barack Obama, USA) and spouse(Barack Obama, Michelle Obama), our example rule will conclude nationality(Michelle Obama, USA). If the conclusion of a rule does not exist in the KB, we call it a prediction. Rule mining approaches require a notion of counter-examples and precision for rules, to account for the cases where the rules err. In the next section we describe such notions as well as a method to learn closed Horn rules from potentially incomplete KBs.

3.2   AMIE
AMIE [6] is a system that learns closed Horn rules of the form:

B1 ∧ · · · ∧ Bn ⇒ r(x, y)    abbreviated as B ⇒ r(x, y)

AMIE assesses the quality of rules along two dimensions: statistical significance and confidence. The first dimension is measured by the support of the rule. This metric is defined according to the following formula:

supp(B ⇒ r(x, y)) := #(x, y) : ∃ z1, . . . , zm : B ∧ r(x, y)

In other words, the support is the number of distinct assignments of the head variables for which the rule concludes a fact in the KB. Support is defined to be monotonic: given a rule, the addition of a new atom will never increase its support. Moreover, support is a measure of statistical evidence; thus, it does not gauge the precision of the rule, i.e.,
how often it draws correct or incorrect conclusions. This requires a notion of negative examples. Since KBs do not encode negative information, rule mining approaches resort to different assumptions to derive counter-evidence. Methods based on traditional association rule mining [8] resort to the Closed World Assumption (CWA). Under the CWA, any conclusion of the rule that is absent from the KB is a counter-example. This mechanism, however, contradicts the Open World Assumption that KBs make. In contrast, AMIE uses the Partial Completeness Assumption (PCA) to deduce counter-examples. The PCA is the assumption that if a KB knows some r-values for an instance, then it knows all its values. If a rule predicts a second nationality for Barack Obama, knowing that he is American, the PCA will count such a deduction as a counter-example. On the other hand, if the KB did not know any nationality for Obama, such a case would be disregarded as evidence, while the CWA would still count it as negative evidence. Notice that the PCA is perfectly safe for functional relations, e.g., place of birth, and still feasible for quasi-functions such as nationality.

The confidence of a rule under the PCA follows the formula:

pcaconf(B ⇒ r(x, y)) := supp(B ⇒ r(x, y)) / (#(x, y) : ∃ z1, . . . , zk, y′ : B ∧ r(x, y′))

The PCA confidence normalizes the support of the rule (the number of positive examples) over the number of both the positive and the negative examples according to the PCA.

AMIE uses support and confidence as quality metrics for rules, and the user can threshold on these metrics. In addition, AMIE implements a set of strategies to guarantee good runtime and rules of good quality. Examples of such strategies are pruning by support and the skyline technique. To prune the search space efficiently, AMIE relies on the monotonicity of support: once a rule has dropped below the given support threshold, the system can safely discard the rule and all its derivations with more atoms. The skyline technique, on the other hand, is an application of the Occam's Razor principle: among a set of hypotheses with the same predictive power, the one with fewer assumptions (the simplest) should be preferred. If the system has already learned a rule of the form B ⇒ r(x, y) and then finds a more specific version of the rule, i.e., B ∧ rn(xn, yn) ⇒ r(x, y), the more specific rule will be output only if it has higher confidence.

4.   SEMANTIFYING WIKILINKS
Our approach to semantify wikilinks relies on the intuition that (a) wikilinks often convey a semantic connection between entities, (b) some of them are already semantified in KBs, (c) the types of the entities in a wikilink define the signature of its implicit relation, and (d) the already semantified wikilinks can help us semantify the others. The already semantified wikilinks constitute our training set. From this training set, we mine a set of semantic patterns in the form of logical rules.

To justify our intuition, we look at the types of the endpoints of semantified wikilinks in DBpedia. We restrict our analysis to the classes Person, Place and Organization. Table 1 shows the most common relations holding between pairs of those entities for which there exists at least one wikilink. For example, we observe that when a person links to a place, in 56% of the cases the person was born in that place. Similarly, when an organization links to a place, in 19% of the cases this corresponds to its location. We also observe that in our dataset, 81% of the links for these classes are not semantified. Rule mining techniques can help us learn the patterns suggested by Table 1 and semantify more links. For example, the fact that organizations link to the places where they are located can be expressed as:

linksTo(x, y) ∧ is(x, Org) ∧ is(y, Loc) ⇒ location(x, y)

Such a rule would allow us to predict the relation location for unsemantified wikilinks between organizations and locations. This is a link prediction task and has great value for web-extracted KBs such as YAGO or DBpedia.

We start by constructing a training set K from DBpedia 3.8¹ consisting of 4.2M facts and 1.7M entities, including people, places and organizations. We enhance this dataset with the type information about the entities, i.e., 8M rdf:type statements, and the wikilinks between those entities. Since we can only learn from already semantified wikilinks, we restrict the set of wikilinks to those where both endpoints participate in a relation in the data, i.e., linksTo(a, b) ∈ K iff ∃ r, r′, x, y : (r(x, a) ∨ r(a, x)) ∧ (r′(y, b) ∨ r′(b, y)). This procedure led us to a training set K with a total of 18M facts. We ran AMIE on this dataset and configured it to mine closed Horn rules of the form:

linksTo*(x, y) ∧ B ∧ is(x, C) ∧ is(y, C′) ⇒ r(x, y)

where linksTo is an alias for wikiPageWikiLink, linksTo* denotes either linksTo or linksTo⁻¹, "is" is a synonym for rdf:type, and B is a conjunction of up to 2 atoms. We call these rules semantification rules. With support and PCA confidence thresholds of 100 and 0.2 respectively, AMIE found 3546 semantification rules on the training set K. Table 2 shows examples of those rules.

We then use the rules to draw predictions of the form p := r(a, b) with r(a, b) ∉ K. We restrict the set of predictions even further by requiring the arguments to be the endpoints of unsemantified wikilinks, more precisely ∄ r′ : r′ ≠ linksTo ∧ r′(a, b) ∈ K. Recall that those predictions may have a different degree of confidence depending on the confidence of the rules used to deduce them. Moreover, a prediction can in principle be deduced by multiple rules, since AMIE explores the search space of rules in an exhaustive fashion. To take this observation into account, we define the confidence of a prediction p according to the following formula:

conf(p) := 1 − ∏_{i=1}^{|R|} (1 − [φ(Ri, p) × pcaconf(Ri)])    (1)

where R is the set of semantification rules and φ(Ri, p) = 1 if Ri ⊢ p, i.e., if p is concluded from rule Ri; otherwise φ(Ri, p) = 0. The rationale behind Formula 1 is that the more rules lead to a prediction, the higher the confidence in that prediction should be. The confidence is defined as the probability that at least one of the rules Ri that conclude p applies. This can be calculated as 1 minus the probability that none of the rules holds. The latter probability is the product of the probabilities that each rule in isolation does not hold, in other words (1 − pcaconf(Ri)). Formula 1 thus makes two strong assumptions. First, it confers a probabilistic interpretation to the PCA confidence.

¹We learn rules on DBpedia 3.8 in order to corroborate some of their predictions automatically in DBpedia 3.9.
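As a toy illustration of the support and PCA confidence metrics defined in Section 3, the following sketch computes both for a single-atom rule body(x, y) ⇒ head(x, y) over a handful of invented facts (all entity and relation names here are made up for the example):

```python
# Toy illustration of AMIE-style support and PCA confidence for a rule
# body(x, y) => head(x, y). All facts below are invented examples.

body = {("Adam", "USA"), ("Bea", "France"), ("Carl", "Peru")}
head = {("Adam", "USA"), ("Bea", "Germany")}

# Support: distinct (x, y) pairs for which body and head both hold.
support = len(body & head)

# PCA denominator: (x, y) pairs in the body such that the KB knows at
# least one head-value y' for x (Partial Completeness Assumption).
# Pairs whose x has no known head-value are ignored, not counted as
# negatives as the CWA would do.
known_x = {x for (x, _) in head}
pca_body = {(x, y) for (x, y) in body if x in known_x}
pca_conf = support / len(pca_body)

print(support)   # 1   (only the Adam-USA conclusion is confirmed)
print(pca_conf)  # 0.5 (Bea-France is a counter-example; Carl is ignored)
```

Under the CWA, Carl's pair would also count as a counter-example and the confidence would drop to 1/3; the PCA disregards it because no head-value is known for Carl.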
       Domain           Range            Relation (% of occurrences)
       Person           Person           successor 18%, associatedBand 11%, associatedMusicalArtist 11%
       Person           Place            birthPlace 56%, deathPlace 18%, nationality 8%
       Person           Organization     team 53%, almaMater 8%, party 5%
       Place            Place            isPartOf 29%, country 28%, location 13%
       Place            Person           leaderName 42%, architect 32%, saint 12%
       Place            Organization     owner 24%, tenant 16%, operatedBy 12%
       Organization     Organization     sisterStation 18%, associatedBand 15%, associatedMusicalArtist 15%
       Organization     Person           currentMember 22%, bandMember 20%, formerBandMember 20%
       Organization     Place            location 19%, city 17%, hometown 13%

Table 1: Top-3 relations encoded in wikilinks between instances of Person, Place and Organization in DBpedia.
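Statistics such as those in Table 1 can be gathered with a straightforward aggregation over the semantified wikilinks: for each (domain type, range type) pair, count how often each relation holds between linked entities. A minimal sketch with invented toy facts (the real computation runs over DBpedia triples):

```python
from collections import Counter, defaultdict

# Invented toy data: entity types and relations between wikilink endpoints.
types = {"Adam": "Person", "Paris": "Place", "Lyon": "Place"}
facts = [("Adam", "birthPlace", "Paris"),
         ("Adam", "birthPlace", "Lyon"),
         ("Paris", "isPartOf", "Lyon")]
wikilinks = {("Adam", "Paris"), ("Adam", "Lyon"), ("Paris", "Lyon")}

# For every class pair, count the relations holding between linked entities.
counts = defaultdict(Counter)
for s, r, o in facts:
    if (s, o) in wikilinks:
        counts[(types[s], types[o])][r] += 1

# Report the top-3 relations per class pair with their share of occurrences.
for pair, ctr in counts.items():
    total = sum(ctr.values())
    top3 = [(rel, round(100 * n / total)) for rel, n in ctr.most_common(3)]
    print(pair, top3)
```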

   Rule                                                                                                          PCA Conf.
   linksTo(x, y) ∧ parent(x, y) ∧ successor(y, x) ∧ is(x, Person) ∧ is(y, Person) ⇒ predecessor(x, y)                1.0
   linksTo(x, y) ∧ picture(x, y) ∧ is(x, ArchitecturalStructure) ∧ is(y, PopulatedPlace) ⇒ location(x, y)            0.94
   linksTo(y, x) ∧ owner(x, y) ∧ subsidiary(y, x) ∧ is(y, Company) ∧ is(x, Company) ⇒ owningCompany(x, y)            1.0

                         Table 2: Some semantification rules mined by AMIE on DBpedia.
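Applying a semantification rule of the shape shown in Table 2 is mechanical: for every unsemantified wikilink whose endpoints match the rule's type constraints, the head relation is proposed as a candidate meaning. A minimal sketch, with invented entities and a deliberately simplified rule (only the type atoms, no extra body atoms):

```python
# Applying a simplified semantification rule of the form
#   linksTo(x, y) ∧ is(x, C) ∧ is(y, C') ⇒ r(x, y)
# to unsemantified wikilinks. All entities below are invented.

types = {"AcmeCorp": "Org", "Berlin": "Loc", "Adam": "Person"}
wikilinks = {("AcmeCorp", "Berlin"), ("Adam", "Berlin")}
kb = set()  # no relation is known yet between the linked entities

def apply_rule(domain_type, range_type, head_relation):
    """Propose head_relation for unsemantified wikilinks whose
    endpoints match the rule's type signature."""
    return {(x, head_relation, y)
            for (x, y) in wikilinks
            if types.get(x) == domain_type
            and types.get(y) == range_type
            and not any(s == x and o == y for (s, _, o) in kb)}

predictions = apply_rule("Org", "Loc", "location")
print(predictions)  # {('AcmeCorp', 'location', 'Berlin')}
```

The wikilink from Adam is skipped because its subject is a Person, not an Org: the type atoms act as the signature filter described in the introduction.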


               Precision@1      Precision@3
                0.77 ± 0.10      0.67 ± 0.07

Table 3: Average MAP@1 and MAP@3 scores for semantification of wikilinks on DBpedia.

               Rules without wikilink          857
               Rules with wikilink            1509
               Rules with confidence gain     1389
               Weighted average gain (wag)    0.03
               Rules with gain ≥ 0.1           139

Table 5: Statistics about rule mining with and without wikilinks.

Second, it assumes that rules are independent events. While we do not claim these assumptions to be correct, they still provide a naive baseline to estimate the likelihood of facts without resorting to more sophisticated approaches for data inference. As we show later, such a naive estimator delivers satisfactory results in our scenario.

Given an unsemantified wikilink l := linksTo(a, b), Formula 1 allows us to propose a list of candidate meanings for l. If among the set of predictions there are several facts of the form ri(a, b), then each relation ri is a semantification candidate for l with confidence conf(ri(a, b)). For each unsemantified link, we propose a list of semantification candidates sorted by confidence. Our procedure proposes relation candidates for 180K unsemantified wikilinks in the training set. Since we can semantify only 1% of them by automatically checking our predictions in DBpedia 3.9, we evaluate the precision of our approach on a sample of 60 unsemantified wikilinks. We then evaluate the correctness of their rankings of semantification candidates as follows: for each wikilink we count the number of correct candidates at top 1 and top 3 of the ranking; we then add up these counts and divide them by the total number of candidates at top 1 and top 3 respectively. This gives us an estimate of the precision of our approach. Table 3 shows the estimated precision values drawn from the sample as well as the size of the Wilson score interval [4] at confidence 95%. The results imply, for example, that the precision at top 1 for the whole set of wikilinks lies in the interval 77% ± 10% with 95% probability.

Table 4 shows some examples of wikilinks and the ranking of semantification candidates proposed by our approach. The number in parentheses corresponds to the confidence of the semantification candidate. The candidates evaluated as correct according to our evaluation are in italics.

5.   WIKILINKS FOR RULE MINING
The skyline technique implemented in AMIE prevents the system from reporting low-quality rules. If AMIE finds two rules B ⇒ r(x, y) and B ∧ rn(xn, yn) ⇒ r(x, y) and the latter has lower confidence, the system will not output it, because it is worse in all dimensions, i.e., it also has lower support. We therefore investigate the confidence gain carried by the addition of wikilink atoms to rules.

We first run AMIE on the DBpedia mapping-based triples. In a second run, we add the wikilinks to the mapping-based triples and instruct the system to mine, when possible, rules of the form linksTo*(x, y) ∧ B ⇒ r(x, y), i.e., whenever the skyline technique does not prune the longer rule. In both cases, we set a threshold of 100 positive examples for support and no confidence threshold. We report our findings in Table 5. We observe that requiring the head variables to be connected via a wikilink increases the number of rules from 857 to 1509. This occurs because in the second run, AMIE sometimes mines versions of the rules both with and without the linksTo* atom. In other words, for some rules the addition of a wikilink atom provides a confidence gain. This is the case for 1389 rules, as Table 5 shows. We are interested in finding how much confidence gain is carried by those rules. Thus, we define the gain of a wikilink rule as a variant of the gain metric used in association rule mining [2]:

gain(R) := supp(R) × (pcaconf(R) − pcaconf(R¬linksTo))

That is, the gain of a wikilink rule is the product of its support and the difference in confidence with respect to the rule without the linksTo* atom. Table 5 reports an average gain of 0.03. We find, however, that for 10% of rules,
             WikiLink                                            Semantification candidates
             Interstate 76 (west) → Colorado State Highway       routeJunction (1.0)
             J. Bracken Lee → Herbert B. Maw                     predecessor (1.0), parent(0.998), governor(0.882)
             WHQX → WTZE                                         sisterStation (1.0)

 Table 4: Some examples of semantification candidates for wikilinks. The correct candidates are in italics.
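The confidences attached to the candidates in Table 4 result from combining all rules that derive a prediction via Formula 1, which is a noisy-or over the PCA confidences of the firing rules. A minimal sketch (the confidence values are example numbers, not measured ones):

```python
# Formula 1: conf(p) = 1 - prod(1 - pcaconf(R_i)) over the rules R_i
# that conclude the prediction p (a noisy-or combination).

def prediction_confidence(rule_confidences):
    """Combine the PCA confidences of all rules concluding a prediction,
    treating each rule as an independent chance of the fact holding."""
    prob_none_holds = 1.0
    for c in rule_confidences:
        prob_none_holds *= (1.0 - c)  # probability that this rule fails
    return 1.0 - prob_none_holds

# A single firing rule contributes exactly its own confidence:
print(prediction_confidence([0.882]))        # 0.882
# Several concurring rules push the combined confidence higher:
print(prediction_confidence([0.882, 0.5]))   # 0.941
```

This reflects the rationale stated for Formula 1: each additional rule that concludes the same fact can only increase the confidence of the prediction.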

 Rule                                                ∆-conf
 producer(x, y) ∧ recordLabel(x, y) ⇒ artist(x, y)    0.34
 debutTeam(x, y) ⇒ team(x, y)                         0.28
 officialLanguage(x, y) ⇒ spokenIn(x, y)              0.19

Table 6: Confidence gain for some rules when specialized with a linksTo atom on the head variables.

the gain can be higher than 0.1. We show some of those rules with their corresponding confidence gain in Table 6. It follows that, in the majority of cases, the wikilinks do not provide a significant confidence gain to rule mining in DBpedia. The reason lies in the fact that for 99% of the triples in the DBpedia mapping-based dataset, there is a wikilink between the arguments of the triples; that is, the addition of a wikilink atom does not provide additional information to the rule. On the other hand, for certain relations, the arguments are sometimes not connected by a wikilink. This is the case for 100K triples. In such cases, the addition of a linksTo* atom may convey a confidence gain that can be used to improve the quality of the rules.

All our datasets and experimental results are available under http://luisgalarraga.de/semantifying-wikilinks.

6.   CONCLUSIONS
While none of the major Wikipedia-centric KBs make further use of the wikilinks, in this work we have shown that they often encode latent relations between entities. Such relations may not be captured in KBs. We have shown that rule mining techniques and naive inference methods are a feasible alternative to accurately discover those implicit semantics. This wikilink semantification task can be seen as a

[4] L. D. Brown, T. T. Cai, and A. DasGupta. Interval estimation for a binomial proportion. Statistical Science, 2001.
[5] N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In IJCAI, 1999.
[6] L. Galárraga, C. Teflioudi, K. Hose, and F. Suchanek. AMIE: Association rule mining under incomplete evidence in ontological knowledge bases. In WWW, 2013.
[7] A. García-Durán, A. Bordes, and N. Usunier. Effective blending of two and three-way interactions for modeling multi-relational data. In ECML-PKDD, 2014.
[8] B. Goethals and J. Van den Bussche. Relational association rules: Getting WARMER. In Pattern Detection and Discovery, volume 2447. Springer, 2002.
[9] D. Lange, C. Böhm, and F. Naumann. Extracting structured information from Wikipedia articles to populate infoboxes. In CIKM, 2010.
[10] N. Lao, T. Mitchell, and W. W. Cohen. Random walk inference and learning in a large scale knowledge base. In EMNLP, 2011.
[11] C. Meng, R. Cheng, S. Maniu, P. Senellart, and W. Zhang. Discovering meta-paths in large heterogeneous information networks. In WWW, 2015.
[12] M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for collective learning on multi-relational data. In ICML, 2011.
[13] M. Nickel, V. Tresp, and H.-P. Kriegel. Factorizing YAGO: Scalable machine learning for linked data. In
particular case of the link prediction problem in KBs. With             WWW, 2012.
this work, we aim at turning the attention to the wikilinks,       [14] F. Niu, C. Ré, A. Doan, and J. Shavlik. Tuffy: Scaling
as they convey valuable information that can help improve               up statistical inference in markov logic networks using
the completeness of KBs.                                                an rdbms. VLDB Endowment., 2011.
                                                                   [15] A. Nuzzolese, A. Gangemi, V. Presutti, and
7.   ACKNOWLEDGMENTS                                                    P. Ciancarini. Encyclopedic knowledge patterns from
  This work is supported by the Chair “Machine Learning                 wikipedia links. In ISWC. 2011.
for Big Data” of Télécom ParisTech and Labex DigiCosme           [16] A. G. Nuzzolese, A. Gangemi, V. Presutti, and
(project ANR-11-LABEX-0045-DIGICOSME) operated by                       P. Ciancarini. Type inference through the analysis of
ANR as part of the program “In- vestissement d’Avenir” Idex             wikipedia links. In LDOW, 2012.
Paris-Saclay (ANR-11-IDEX-0003-02).                                [17] M. Richardson and P. Domingos. Markov logic
                                                                        networks. Mach. Learn., 62(1-2):107–136, Feb. 2006.
8.   REFERENCES                                                    [18] A. P. Singh and G. J. Gordon. Relational learning via
 [1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann,                       collective matrix factorization. In KDD, 2008.
     R. Cyganiak, and Z. Ives. DBpedia: A nucleus for a            [19] F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: A
     web of open data. In ISWC, 2007.                                   Core of Semantic Knowledge. In WWW, 2007.
 [2] R. J. Bayardo. Mining the most interesting rules.             [20] Z. Wang, J. Zhang, J. Feng, and Z. Chen. Knowledge
     pages 145–154, 1999.                                               graph embedding by translating on hyperplanes. In
 [3] A. Bordes, N. Usunier, A. Garcı́a-Durán, J. Weston,               AAAI, 2014.
     and O. Yakhnenko. Translating embeddings for                  [21] F. Wu and D. S. Weld. Autonomously semantifying
     modeling multi-relational data. In NIPS, 2013.                     wikipedia. In CIKM, 2007.