Predicting the quality of semantic relations by applying Machine Learning Classifiers to the Semantic Web

Miriam Fernandez, Marta Sabou, Petr Knoth, Enrico Motta
Knowledge Media Institute (KMi), The Open University
Walton Hall, Milton Keynes, MK7 6AA, United Kingdom
{m.fernandez, r.m.sabou, p.knoth, e.motta}@open.ac.uk

ABSTRACT
In this paper, we propose the application of Machine Learning (ML) methods to the Semantic Web (SW) as a mechanism to predict the correctness of semantic relations. For this purpose, we have acquired a learning dataset from the SW and we have performed an extensive experimental evaluation covering more than 1,800 relations of various types. We have obtained encouraging results, reaching a maximum of 74.2% of correctly classified semantic relations for classifiers able to validate the correctness of multiple types of semantic relations (generic classifiers) and up to 98% for classifiers focused on evaluating the correctness of one particular semantic relation (specialized classifiers).

Categories and Subject Descriptors
I.5.2 [Pattern Recognition]: Design Methodology – Classifier design and evaluation, Feature evaluation and selection, Pattern analysis.

General Terms
Algorithms, Measurement, Design, Experimentation.

Keywords
Semantic Web, Semantic Relations, Machine Learning.

1. INTRODUCTION
The problem of extracting the relation that holds between two terms is a well-known research problem traditionally addressed by the Natural Language Processing (NLP) community. The approaches found in the literature follow several different trends, such as: the exploitation of lexical patterns to extract relations from textual corpora [3], the generation of statistical measures that detect correlations between words based on their frequency within documents [2], or the exploitation of structured knowledge resources like WordNet¹ to detect or refine relations [1].

With the evolution of the SW notion of knowledge reuse, from an ontology-centered view to a more fine-grained perspective where individual knowledge statements (i.e., semantic relations) are reused rather than entire ontologies, a parallel problem arises: estimating the correctness of a known relation between two terms. As an illustrative example, consider the two following relations: Book – containsChapter – Chapter and Chapter ⊆ Book. While the relation Book – containsChapter – Chapter can be considered correct independently of an interpretation context, in the case of Chapter ⊆ Book, subsumption has been used incorrectly to model a meronymy relation.

One of the first attempts to address this problem is the work of Sabou et al. [4]. In this work, the authors investigate the use of the SW as a source of evidence for predicting the correctness of a semantic relation. They show that the SW is not just a motivation to investigate the problem, but a large collection of knowledge-rich resources that can be exploited to address it. Following this idea, the work presented in this paper makes use of the SW as a source of evidence for predicting the correctness of semantic relations. However, as opposed to [4], which introduces several evaluation measures based on the adaptation of existing Natural Language Processing methodologies to SW data, this work aims to approach the problem using Machine Learning (ML) techniques.

For this purpose, we have worked on: a) acquiring a medium-scale learning dataset from the SW and b) performing an experimental evaluation covering more than 1,800 relations of various types. We have obtained encouraging results, reaching a maximum of 74.2% of correctly classified semantic relations for classifiers able to validate the correctness of multiple types of semantic relations (generic classifiers) and up to 98% for classifiers focused on evaluating the correctness of one particular semantic relation (specialized classifiers).

2. ACQUIRING A LEARNING DATASET
The problem addressed in this work can be formalized as a classification task. In this type of Machine Learning problem, the learning method is presented with a set of classified examples from which it is expected to learn how to predict the classification of unseen examples.

The collection of classified examples, or learning dataset, is obtained in three phases. In the first phase, a set of manually evaluated semantic relations is acquired. These relations can be seen as a quadruple ⟨s, t, R, e⟩, where s is the source term, t is the target term, R is the relation to be evaluated, and e ∈ {T, F} is a manual Boolean evaluation provided by users, where T denotes a true or correct relation and F denotes a false or incorrect relation; e.g., ⟨Chapter, Book, ⊆, F⟩ for the incorrect subsumption discussed above. This experimental data is obtained from the datasets of the Ontology Alignment Evaluation Initiative² (OAEI) and includes the AGROVOC/NALT and the OAEI'08 datasets. These datasets comprise a total of 1,805 semantic relations of different types: ⊆, ⊇, ⊥ and named. Among them, 1,129 are evaluated as true (T), correct relations, and 676 are evaluated as false (F), incorrect relations.

¹ http://wordnet.princeton.edu/
² http://oaei.ontologymatching.org/
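To make the quadruple representation concrete, the following is a minimal sketch of how such manually evaluated relations could be encoded; the class, field names, and example records are illustrative assumptions for this sketch, not the authors' actual data format.

```python
from dataclasses import dataclass

@dataclass
class EvaluatedRelation:
    """A manually evaluated semantic relation, i.e. a quadruple <s, t, R, e>."""
    source: str    # s, the source term
    target: str    # t, the target term
    rel_type: str  # R: "subclass", "superclass", "disjoint" or a named relation
    correct: bool  # e: True for T (correct), False for F (incorrect)

# Hypothetical records mirroring the paper's running example.
dataset = [
    EvaluatedRelation("Book", "Chapter", "containsChapter", True),
    EvaluatedRelation("Chapter", "Book", "subclass", False),  # meronymy mismodeled as subsumption
]
```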
In the second phase, a set of SW mappings (occurrences of relations containing the same or equivalent source, s, and target, t, terms in the publicly available SW data) is obtained for each particular semantic relation. These mappings are extracted using the services of the Watson SW gateway. Specific details about the SW mapping extraction algorithm can be found in [4]. In the third phase, these mappings are formalized and represented in terms of the values of their features (or attributes). The attributes selected to represent each classified example are listed below (a sketch of how the relation-level counts can be derived follows the list):

• e, the relation correctness {T, F}. This is the class attribute, i.e., the one that will be predicted for future examples.
• Type(R), the type of relation to be evaluated: ⊆, ⊇, ⊥ and named relations.
• |M|, the number of mappings.
• |M⊆|, the number of subclass mappings.
• |M⊇|, the number of superclass mappings.
• |M⊥|, the number of disjoint mappings.
• |M_R|, the number of named relation mappings.
• |M_S|, the number of sibling mappings.
• For each particular mapping Mi we consider:
  - Type(Ri), the relation type of the mapping: ⊆, ⊇, ⊥, named and sibling.
  - Pl(Mi), the path length of the mapping Mi.
  - Np(Mi), the number of paths that lead to the mapping Mi. Note that for sibling and named mappings the connection can be derived from two different paths connected by a common node.
  - |Mi⊆|, the number of subclass relations in Mi.
  - |Mi⊇|, the number of superclass relations in Mi.
  - |Mi⊥|, the number of disjoint relations in Mi.
  - |Mi_R|, the number of named relations in Mi.
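As a rough illustration of the third phase, the sketch below derives the relation-level counting attributes from a list of mappings. The Mapping structure and its field names are assumptions made for the example; the authors' actual feature extraction is described only at the level of the attribute list above.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Mapping:
    """One SW mapping of a relation (hypothetical structure for this sketch)."""
    rel_type: str     # Type(Ri): "subclass", "superclass", "disjoint", "named" or "sibling"
    path_length: int  # Pl(Mi)
    num_paths: int    # Np(Mi)

def relation_features(rel_type: str, mappings: list[Mapping]) -> dict:
    """Derive the relation-level counting attributes from a relation's mappings."""
    by_type = Counter(m.rel_type for m in mappings)
    return {
        "Type(R)": rel_type,
        "|M|": len(mappings),
        "|M_subclass|": by_type["subclass"],
        "|M_superclass|": by_type["superclass"],
        "|M_disjoint|": by_type["disjoint"],
        "|M_named|": by_type["named"],
        "|M_sibling|": by_type["sibling"],
    }

# Example: two hypothetical mappings found for the relation Chapter ⊆ Book.
print(relation_features("subclass",
                        [Mapping("subclass", 1, 1), Mapping("sibling", 2, 2)]))
```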
3. EXPERIMENTS AND RESULTS
This study addressed four different classification problems: predicting the correctness of any particular semantic relation (generic classifiers) and predicting the correctness of a given type of semantic relation: ⊆, ⊇ or named (specialized classifiers). Note that the ⊥ relation has been discarded from our experiments due to the lack of negative examples. To address each of these problems, three different classifiers were used: the J48 decision tree, the NaiveBayes classifier, and the LibSVM classifier, all of them provided by Weka [5]. Each classifier was applied using either the whole set of attributes (Section 2) or a filtered set of attributes (af) obtained using a combination of the CfsSubsetEval and BestFirst algorithms [5]. To train and test the classifiers, each dataset was divided as follows: approximately 70% of the data was used for training and 30% for testing. This division was done manually to avoid mappings coming from the same semantic relation appearing in both the training and the test sets. Note that the SW mappings coming from the same semantic relation share at least the first eight attributes; therefore, it is important to keep them together in the same set (either the training or the test set) for a fair evaluation. To evaluate the classifiers and compare them against each other, the following measures were selected: the percentage of correctly classified instances, the percentage of incorrectly classified instances, and the weighted average of the values obtained using the following measures for the positive and negative class: True Positive rate (TP), False Positive rate (FP), Precision, Recall, F-Measure (F-Mea) and ROC area. More details about these measures can be found in [5].
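The experiments themselves were run in Weka. Purely as an illustrative analogue, the sketch below reproduces the two key methodological points of this setup, keeping all mappings of one semantic relation on the same side of the 70/30 split and reporting the selected measures, using scikit-learn and synthetic data; the toolchain, data shapes, and DecisionTreeClassifier (as a stand-in for J48) are assumptions of this sketch, not the authors' configuration.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, roc_auc_score

rng = np.random.default_rng(0)

# Toy stand-ins: 300 mapping examples with 8 numeric attributes,
# drawn from 100 semantic relations (3 mappings per relation).
X = rng.normal(size=(300, 8))
y = rng.integers(0, 2, size=300)        # class attribute e: 1 = T, 0 = F
groups = np.repeat(np.arange(100), 3)   # mappings of one relation share a group id

# ~70/30 split that never separates mappings of the same relation.
splitter = GroupShuffleSplit(n_splits=1, train_size=0.7, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

clf = DecisionTreeClassifier(random_state=0)  # rough analogue of Weka's J48
clf.fit(X[train_idx], y[train_idx])

# Precision, recall and F-measure per class, plus the ROC area.
pred = clf.predict(X[test_idx])
print(classification_report(y[test_idx], pred))
print("ROC area:", roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))
```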
The results obtained by the best classifier for each classification problem can be seen in Table 1.

Table 1. Best results obtained for each dataset

             Generic     ⊆           ⊇           named
Classifier   J48         J48af       NaiveBayes  J48
Correct      74.2044%    85.2077%    98.0122%    76.1555%
Incorrect    25.7956%    14.7923%    1.9878%     23.8445%
TP rate      0.742       0.852       0.98        0.762
FP rate      0.254       0.122       0.06        0.209
Precision    0.76        0.889       0.984       0.79
Recall       0.742       0.852       0.98        0.762
F-Mea        0.747       0.851       0.981       0.766
ROC          0.749       0.875       0.995       0.767

4. CONCLUSIONS AND FUTURE WORK
In this paper, we investigate the problem of predicting the correctness of semantic relations. Our hypothesis is that ML methods can be adapted to exploit the SW as a source of knowledge to perform this task. The results of our experiments are promising, reaching a maximum of 74.2% of correctly classified semantic relations for classifiers able to validate the correctness of multiple types of semantic relations (generic classifiers) and up to 98% for classifiers focused on evaluating the correctness of one particular semantic relation (specialized classifiers).

Despite the success of the classifiers in the prediction process, it is important to highlight that only 60% of the relations contained in these datasets were covered by the SW. This limits our approach to domains where semantic information is available, which constitutes an open problem for future research work.

5. REFERENCES
[1] Budanitsky, A. and Hirst, G. 2006. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1):13-47.
[2] Cilibrasi, R.L. and Vitanyi, P.M. 2007. The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering, 19(3):370-383.
[3] Cimiano, P. 2006. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer-Verlag New York, Inc.
[4] Sabou, M. and Gracia, J. 2008. Spider: Bringing Non-Equivalence Mappings to OAEI. In Proc. of the Third International Workshop on Ontology Matching.
[5] Witten, I.H. and Frank, E. 2000. Data Mining: Practical Machine Learning Tools and Techniques. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann.