Predicting the quality of semantic relations by applying Machine Learning Classifiers to the Semantic Web

Miriam Fernandez, Marta Sabou, Petr Knoth, Enrico Motta
Knowledge Media Institute (KMi), The Open University
Walton Hall, Milton Keynes, MK7 6AA, United Kingdom
{m.fernandez, r.m.sabou, p.knoth, e.motta}@open.ac.uk

ABSTRACT
In this paper, we propose the application of Machine Learning (ML) methods to the Semantic Web (SW) as a mechanism to predict the correctness of semantic relations. For this purpose, we have acquired a learning dataset from the SW and we have performed an extensive experimental evaluation covering more than 1,800 relations of various types. We have obtained encouraging results, reaching a maximum of 74.2% of correctly classified semantic relations for classifiers able to validate the correctness of multiple types of semantic relations (generic classifiers) and up to 98% for classifiers focused on evaluating the correctness of one particular semantic relation (specialized classifiers).

Categories and Subject Descriptors
I.5.2 [Pattern Recognition]: Design Methodology – Classifier design and evaluation, Feature evaluation and selection, Pattern analysis.

General Terms
Algorithms, Measurement, Design, Experimentation.

Keywords
Semantic Web, Semantic Relations, Machine Learning.

1. INTRODUCTION
The problem of extracting the relation that holds between two terms is a well-known research problem traditionally addressed by the Natural Language Processing (NLP) community. The approaches found in the literature follow several different trends, such as: the exploitation of lexical patterns to extract relations from textual corpora [3], the generation of statistical measures that detect correlations between words based on their frequency within documents [2], or the exploitation of structured knowledge resources like WordNet¹ to detect or refine relations [1].

With the evolution of the SW notion of knowledge reuse, from an ontology-centered view to a more fine-grained perspective where individual knowledge statements (i.e., semantic relations) are reused rather than entire ontologies, a parallel problem arises: estimating the correctness of a known relation between two terms. As an illustrative example, consider the two following relations: Book – containsChapter – Chapter and Chapter ⊆ Book. While the relation Book – containsChapter – Chapter can be considered correct independently of an interpretation context, in the case of Chapter ⊆ Book, subsumption has been used incorrectly to model a meronymy relation.

One of the first attempts to address this problem is the work of Sabou et al. [4]. In this work, the authors investigate the use of the SW as a source of evidence for predicting the correctness of a semantic relation. They show that the SW is not just a motivation to investigate the problem, but a large collection of knowledge-rich resources that can be exploited to address it. Following this idea, the work presented in this paper makes use of the SW as a source of evidence for predicting the correctness of semantic relations. However, as opposed to [4], which introduces several evaluation measures based on the adaptation of existing Natural Language Processing methodologies to SW data, this work aims to approach the problem using Machine Learning (ML) techniques.

For this purpose, we have worked on: a) acquiring a medium-scale learning dataset from the SW and b) performing an experimental evaluation covering more than 1,800 relations of various types. We have obtained encouraging results, reaching a maximum of 74.2% of correctly classified semantic relations for classifiers able to validate the correctness of multiple types of semantic relations (generic classifiers) and up to 98% for classifiers focused on evaluating the correctness of one particular semantic relation (specialized classifiers).

2. ACQUIRING A LEARNING DATASET
The problem addressed in this work can be formalized as a classification task. In this type of Machine Learning problem, the learning method is presented with a set of classified examples from which it is expected to learn how to predict the classification of unseen examples.

The collection of classified examples, or learning dataset, is obtained in three phases. In the first phase, a set of manually evaluated semantic relations is acquired. These relations can be seen as a quadruple ⟨s, t, R, e⟩, where s is the source term, t is the target term, R is the relation to be evaluated, and e ∈ {T, F} is a manual Boolean evaluation provided by users, where T denotes a true or correct relation and F denotes a false or incorrect relation; e.g., ⟨Chapter, Book, ⊆, F⟩ for the incorrect subsumption discussed above. This experimental data is obtained from the datasets of the Ontology Alignment Evaluation Initiative² (OAEI) and includes the AGROVOC/NALT and the OAEI'08 datasets. These datasets comprise a total of 1,805 semantic relations of different types: ⊆, ⊇, ⊥ and named. Among them, 1,129 are evaluated as true (T), correct relations, and 676 are evaluated as false (F), incorrect relations.

¹ http://wordnet.princeton.edu/
² http://oaei.ontologymatching.org/
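To make the quadruple representation concrete, the following is a minimal sketch of how such manually evaluated relations could be encoded; the class, field names, and example records are illustrative assumptions for this sketch, not the authors' actual data format.

```python
from dataclasses import dataclass

@dataclass
class EvaluatedRelation:
    """A manually evaluated semantic relation, i.e. a quadruple <s, t, R, e>."""
    source: str    # s, the source term
    target: str    # t, the target term
    rel_type: str  # R: "subclass", "superclass", "disjoint" or a named relation
    correct: bool  # e: True for T (correct), False for F (incorrect)

# Hypothetical records mirroring the paper's running example.
dataset = [
    EvaluatedRelation("Book", "Chapter", "containsChapter", True),
    EvaluatedRelation("Chapter", "Book", "subclass", False),  # meronymy mismodeled as subsumption
]
```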
In the second phase, a set of SW mappings (occurrences of relations containing the same or equivalent source, s, and target, t, terms in the publicly available SW data) is obtained for each particular semantic relation. These mappings are extracted using the services of the Watson SW gateway. Specific details about the SW mapping extraction algorithm can be found in [4]. In the third phase, these mappings are formalized and represented in terms of the values of their features (or attributes). The attributes selected to represent each classified example are listed below (a sketch of how the relation-level counts can be derived follows the list):

• e, the relation correctness {T, F}. This is the class attribute, i.e., the one that will be predicted for future examples.
• Type(R), the type of relation to be evaluated: ⊆, ⊇, ⊥ and named relations.
• |M|, the number of mappings.
• |M⊆|, the number of subclass mappings.
• |M⊇|, the number of superclass mappings.
• |M⊥|, the number of disjoint mappings.
• |M_R|, the number of named relation mappings.
• |M_S|, the number of sibling mappings.
• For each particular mapping Mi we consider:
  - Type(Ri), the relation type of the mapping: ⊆, ⊇, ⊥, named and sibling.
  - Pl(Mi), the path length of the mapping Mi.
  - Np(Mi), the number of paths that lead to the mapping Mi. Note that for sibling and named mappings the connection can be derived from two different paths connected by a common node.
  - |Mi⊆|, the number of subclass relations in Mi.
  - |Mi⊇|, the number of superclass relations in Mi.
  - |Mi⊥|, the number of disjoint relations in Mi.
  - |Mi_R|, the number of named relations in Mi.
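As a rough illustration of the third phase, the sketch below derives the relation-level counting attributes from a list of mappings. The Mapping structure and its field names are assumptions made for the example; the authors' actual feature extraction is described only at the level of the attribute list above.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Mapping:
    """One SW mapping of a relation (hypothetical structure for this sketch)."""
    rel_type: str     # Type(Ri): "subclass", "superclass", "disjoint", "named" or "sibling"
    path_length: int  # Pl(Mi)
    num_paths: int    # Np(Mi)

def relation_features(rel_type: str, mappings: list[Mapping]) -> dict:
    """Derive the relation-level counting attributes from a relation's mappings."""
    by_type = Counter(m.rel_type for m in mappings)
    return {
        "Type(R)": rel_type,
        "|M|": len(mappings),
        "|M_subclass|": by_type["subclass"],
        "|M_superclass|": by_type["superclass"],
        "|M_disjoint|": by_type["disjoint"],
        "|M_named|": by_type["named"],
        "|M_sibling|": by_type["sibling"],
    }

# Example: two hypothetical mappings found for the relation Chapter ⊆ Book.
print(relation_features("subclass",
                        [Mapping("subclass", 1, 1), Mapping("sibling", 2, 2)]))
```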
3. EXPERIMENTS AND RESULTS
This study addressed four different classification problems: predicting the correctness of any particular semantic relation (generic classifiers) and predicting the correctness of a given type of semantic relation: ⊆, ⊇ or named (specialized classifiers). Note that the ⊥ relation has been discarded from our experiments due to the lack of negative examples. To address each of these problems, three different classifiers were used: the J48 decision tree, the NaiveBayes classifier, and the LibSVM classifier, all of them provided by Weka [5]. Each classifier was applied using either the whole set of attributes (Section 2) or a filtered set of attributes (af) obtained using a combination of the CfsSubsetEval and BestFirst algorithms [5]. To train and test the classifiers, each dataset was divided as follows: approximately 70% of the data was used for training and 30% for testing. This division was done manually to avoid mappings coming from the same semantic relation appearing in both the training and the test sets. Note that the SW mappings coming from the same semantic relation share at least the first eight attributes; therefore, it is important to keep them together in the same set (either the training or the test set) for a fair evaluation. To evaluate the classifiers and compare them against each other, the following measures were selected: the percentage of correctly classified instances, the percentage of incorrectly classified instances, and the weighted average of the values obtained using the following measures for the positive and negative class: True Positive rate (TP), False Positive rate (FP), Precision, Recall, F-Measure (F-Mea) and ROC area. More details about these measures can be found in [5].
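The experiments themselves were run in Weka. Purely as an illustrative analogue, the sketch below reproduces the two key methodological points of this setup, keeping all mappings of one semantic relation on the same side of the 70/30 split and reporting the selected measures, using scikit-learn and synthetic data; the toolchain, data shapes, and DecisionTreeClassifier (as a stand-in for J48) are assumptions of this sketch, not the authors' configuration.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, roc_auc_score

rng = np.random.default_rng(0)

# Toy stand-ins: 300 mapping examples with 8 numeric attributes,
# drawn from 100 semantic relations (3 mappings per relation).
X = rng.normal(size=(300, 8))
y = rng.integers(0, 2, size=300)        # class attribute e: 1 = T, 0 = F
groups = np.repeat(np.arange(100), 3)   # mappings of one relation share a group id

# ~70/30 split that never separates mappings of the same relation.
splitter = GroupShuffleSplit(n_splits=1, train_size=0.7, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

clf = DecisionTreeClassifier(random_state=0)  # rough analogue of Weka's J48
clf.fit(X[train_idx], y[train_idx])

# Precision, recall and F-measure per class, plus the ROC area.
pred = clf.predict(X[test_idx])
print(classification_report(y[test_idx], pred))
print("ROC area:", roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))
```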
The results obtained by the best classifier for each classification problem can be seen in Table 1.

Table 1. Best results obtained for each dataset

             Generic     ⊆           ⊇           named
Classifier   J48         J48af       NaiveBayes  J48
Correct      74.2044%    85.2077%    98.0122%    76.1555%
Incorrect    25.7956%    14.7923%    1.9878%     23.8445%
TP rate      0.742       0.852       0.98        0.762
FP rate      0.254       0.122       0.06        0.209
Precision    0.76        0.889       0.984       0.79
Recall       0.742       0.852       0.98        0.762
F-Mea        0.747       0.851       0.981       0.766
ROC          0.749       0.875       0.995       0.767

4. CONCLUSIONS AND FUTURE WORK
In this paper, we investigate the problem of predicting the correctness of semantic relations. Our hypothesis is that ML methods can be adapted to exploit the SW as a source of knowledge to perform this task. The results of our experiments are promising, reaching a maximum of 74.2% of correctly classified semantic relations for classifiers able to validate the correctness of multiple types of semantic relations (generic classifiers) and up to 98% for classifiers focused on evaluating the correctness of one particular semantic relation (specialized classifiers).

Despite the success of the classifiers in the prediction process, it is important to highlight that only 60% of the relations contained in these datasets were covered by the SW. This limits our approach to domains where semantic information is available, which constitutes an open problem for future research work.

5. REFERENCES
[1] Budanitsky, A. and Hirst, G. 2006. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1):13-47.
[2] Cilibrasi, R.L. and Vitanyi, P.M. 2007. The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering, 19(3):370-383.
[3] Cimiano, P. 2006. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer-Verlag New York, Inc.
[4] Sabou, M. and Gracia, J. 2008. Spider: Bringing Non-Equivalence Mappings to OAEI. In Proc. of the Third International Workshop on Ontology Matching.
[5] Witten, I.H. and Frank, E. 2000. Data Mining: Practical Machine Learning Tools and Techniques. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann.