<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An automatic way of generating incoherent terminologies with parameters</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yu Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dantong Ouyang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Yuxin Ye</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computer Science and Technology, Jilin University</institution>
          ,
          <addr-line>Changchun 130012</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>117</fpage>
      <lpage>127</lpage>
      <abstract>
<p>The minimal incoherence preserving sub-terminologies (Mips) are defined to identify the axioms responsible for the unsatisfiable concepts in an incoherent ontology. While a great many performance evaluations have been proposed in the past, what remains to be investigated is whether we have effective reasoners for solving Mips problems, in which case a particular reasoner will be more efficient than others. After analyzing the structural complexity of terminologies, we develop a Mips Benchmark (MipsBM) to evaluate the performances of reasoners, defining six complexity metrics based on a concept dependency network model. Evaluation experiments show that the proposed metrics can effectively reflect the complexity of the benchmark data. The benchmark not only helps users determine which reasoner is likely to perform best in their applications, but also helps developers improve the performances and qualities of their reasoners.</p>
      </abstract>
      <kwd-group>
        <kwd>Incoherent terminology</kwd>
        <kwd>Mips</kwd>
        <kwd>Benchmark</kwd>
        <kwd>MipsBM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        In practice, building an ontology is a complicated process in which errors are
easily made. An ontology O is incoherent if there exists an unsatisfiable concept
in O, and the existence of an unsatisfiable concept indicates that a formal
definition is incorrect. Finding all the unsatisfiable concepts is therefore the central
challenge of ontology debugging, and researchers have proposed various methods
to debug incoherent ontologies. Ontology debugging relies on
reasoners, and most current reasoners, such as Pellet [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], HermiT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], FaCT++ [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
TrOWL [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and JFact, support the required inference tasks. A great many performance
evaluations of reasoners have been carried out in the past. What remains to be
investigated is whether we have effective reasoners for solving Mips problems,
in which case a particular reasoner will be more efficient than others. A good
benchmark test dataset must satisfy several criteria. First, it must
systematically construct several types of logical contradictions to create an incoherent
TBox. Second, it must expose a number of parameters that influence the
complexity of the benchmark data and the difficulty of reasoning.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In the research of knowledge base querying, the Lehigh University Benchmark (LUBM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is developed based on
several complexity metrics of ontology and provides 14 test queries to assess the
efficiency, correctness and completeness of knowledge base systems. However, the
correlations between the classes of LUBM are low, so Li Ma et al. extend it to the
University Ontology Benchmark (UOBM) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] by adding a series of association classes.
However, both LUBM and UOBM can only evaluate a single ontology, so Yingjie
Li et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] develop a multi-ontology synthetic benchmark that can evaluate not
only a single ontology but also federated ontologies. In the research of ontology
matching, Alfio Ferrara et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] propose a disciplined approach to the
semiautomatic generation of benchmarks called SWING (Semantic Web Instance
Generation), but all the evaluations in SWING cover only a single language, so
Christian Meilicke et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] design a benchmark for multilingual ontology
matching called MultiFarm. Besides, the work in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] presents the design of a modular
test generator to evaluate different matchers on the generated tests. In the
research of ontology reasoning and debugging, the benchmarks proposed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] are used to evaluate the classification performances of reasoners. The
work in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] focuses on the applicability of specific reasoners to certain
expressivity clusters, and evaluates the loading time, classification and conjunctive-query
performances of reasoners. JustBench [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is a typical benchmark to evaluate
reasoners for calculating justifications. In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], several machine learning
techniques are used to predict classification time and determine the metrics that can
be used to predict reasoning performance. The work in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] proposes a method
to construct justification datasets from realistic ontologies with different sizes
and expressivities.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Complexity Analysis for Incoherent TBox</title>
      <p>
        The expressivity of a particular DL is determined by the concept constructors it
provides [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. SHOIN (D) is a very expressive DL that provides the constructors
H (role hierarchies), O (nominals), I (inverse roles) and N (number restrictions),
while S is the abbreviation for ALC with transitive roles. ALC is the basic
description logic, consisting of the constructors ¬C (negation), C ⊓ D (conjunction),
C ⊔ D (disjunction), ∃r.C (existential restriction) and ∀r.C (value restriction).
Stefan Schlobach proposes the minimal unsatisfiability preserving sub-TBox
(Mups)[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to identify the axioms responsible for the unsatisfiability of concepts
in an incoherent TBox. For T in Example 1, it can be shown that the concepts
C3, C5, C6, C9, C10 are unsatisfiable by standard DL TBox reasoning. We
can get their Mups:
      </p>
      <p>
        Mups(T, C3) = {{α11}}, Mups(T, C5) = {{α13}},
Mups(T, C6) = {{α9, α10, α14}}, Mups(T, C9) = {{α10, α12, α15, α16, α17}},
Mups(T, C10) = {{α13, α18}, {α9, α10, α14, α18}, {α10, α12, α15, α16, α17, α18}}.
Definition 1 (MIPS [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]). A TBox T′ ⊆ T is a minimal incoherence preserving sub-TBox
(MIPS) of T if and only if T′ is incoherent, and every sub-TBox
T″ ⊂ T′ is coherent. The set of all MIPS of T is denoted as MIPS(T).
We abbreviate the set of MIPS of T as Mips(T). For T in Example 1 we
can get Mips(T) = {{α11}, {α13}, {α9, α10, α14}, {α10, α12, α15, α16, α17}}.
Definition 2 (Mips Size). Let Mips(T) be the Mips of an incoherent TBox
T; the number of axiom sets in Mips(T) is called the Mips size.
      </p>
      <p>Let Ms represent the Mips size. For Mips(T) = {{α11}, {α13}, {α9, α10, α14},
{α10, α12, α15, α16, α17}}, there are four axiom sets in Mips(T), thus the
Mips size Ms = 4.</p>
      <p>Definition 3 (Mips Depth). Let Mips(T) be the Mips of an incoherent TBox
T; the maximum number of axioms over all the axiom sets is called the Mips depth.
Let Md represent the Mips depth. Using the previous example again, the number
of axioms in each of the first axiom set {α11} and the second axiom set {α13}
is one, while in the third axiom set {α9, α10, α14} the number is three, and in
the last axiom set {α10, α12, α15, α16, α17} the number is five, thus the Mips
depth Md = 5.</p>
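The two metrics above can be computed directly from a Mips set. The following is a minimal Python sketch under our own assumed encoding of Mips(T) as a list of axiom-index sets (this representation is illustrative, not the paper's data format):

```python
# Compute the Mips size (Ms, Definition 2) and Mips depth (Md, Definition 3).
# Mips(T) is encoded here as a list of sets of axiom indexes.

def mips_size(mips):
    """Ms: the number of axiom sets in Mips(T)."""
    return len(mips)

def mips_depth(mips):
    """Md: the maximum number of axioms over all axiom sets in Mips(T)."""
    return max(len(axiom_set) for axiom_set in mips) if mips else 0

# The running example: Mips(T) = {{α11}, {α13}, {α9,α10,α14}, {α10,α12,α15,α16,α17}}
mips = [{11}, {13}, {9, 10, 14}, {10, 12, 15, 16, 17}]
print(mips_size(mips))   # → 4
print(mips_depth(mips))  # → 5
```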
      <p>Given a TBox T, the concept dependency network N is defined as follows.
Definition 4 (concept dependency network). A directed graph N = (V, E)
is the concept dependency network of a given TBox T, where V is
the set of vertices representing all the concepts in T, and E is the set of edges
representing all the dependencies between the concepts.</p>
      <p>Definition 5 (concept depth). In the concept dependency network of TBox
T, suppose the concept depth of C is cd(C); cd(C) can be recursively defined as
follows:
if C ≐ C1 ⊓ C2, then cd(C) = max(cd(C1), cd(C2)) + 1;
if C ≐ C1 ⊔ C2, then cd(C) = max(cd(C1), cd(C2)) + 1;
if C ≐ ∃r.C1, then cd(C) = cd(C1) + 1;
if C ≐ ∀r.C1, then cd(C) = cd(C1) + 1;
if C ≐ ¬C1, then cd(C) = cd(C1) + 1;
if C is an atom, then cd(C) = 0.</p>
      <p>The symbol ≐ stands for either ≡ or ⊑.</p>
      <p>If the concept depth of C is 1, C is called a simple concept; otherwise it is
called a complex concept. Suppose that TBox T contains p simple concepts and q
complex concepts; then the total number of concepts is m = p + q. Besides, the
maximal concept depth of T, denoted λ, is defined as λ = max(cd(Ci)), 1 ≤
i ≤ m.</p>
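The recursion of Definition 5 can be sketched in Python over a small syntax tree. The nested-tuple encoding of concepts below is an assumption made for illustration only:

```python
# Recursive concept depth cd(C) of Definition 5.
# Atoms are strings; composite concepts are tuples whose first element
# names the constructor ("and", "or", "some", "all", "not").

def cd(concept):
    """cd(C): 0 for an atom, otherwise 1 plus the (maximal) depth of the parts."""
    if isinstance(concept, str):             # atom
        return 0
    op = concept[0]
    if op in ("and", "or"):                  # C1 ⊓ C2 / C1 ⊔ C2
        return max(cd(concept[1]), cd(concept[2])) + 1
    if op in ("some", "all"):                # ∃r.C1 / ∀r.C1: (op, role, filler)
        return cd(concept[2]) + 1
    if op == "not":                          # ¬C1
        return cd(concept[1]) + 1
    raise ValueError("unknown constructor: %r" % (op,))

# ∃r.(C1 ⊓ ¬C2) has depth 3: atom → negation → conjunction → restriction
print(cd(("some", "r", ("and", "C1", ("not", "C2")))))  # → 3
```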
      <p>Definition 6 (semantic cluster). In the TBox T, a sub-TBox T′ ⊆ T which
is composed of concepts linked together by semantic dependency relationships is
called a semantic cluster of T.</p>
      <p>Suppose that the number of semantic dependencies is μ. The semantic clusters
must satisfy the constraint p + μ·Σ_{i=1}^{λ} cd(Ci) = m. Furthermore, the clustering
coefficient can be defined as:</p>
      <p>η = μ·Σ_{i=1}^{λ} cd(Ci) / m.   (1)</p>
      <p>If μ = 0, there is no semantic cluster in the TBox, so the minimum of the
clustering coefficient is ηmin = 0. If, however, p = 0, the TBox is composed only
of complex concepts; then μ·Σ_{i=1}^{λ} cd(Ci) = m, so the maximum of the
clustering coefficient is ηmax = 1.</p>
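The clustering coefficient of Eq. (1) is a one-line computation. The sketch below uses our own argument names and assumes the concept depths of a cluster are given as a list:

```python
# Clustering coefficient η = μ · Σ_{i=1..λ} cd(C_i) / m of Eq. (1),
# under the constraint p + μ · Σ cd(C_i) = m.

def clustering_coefficient(mu, depths, m):
    """mu: number of semantic dependencies; depths: the cd values summed in
    Eq. (1); m: total number of concepts in the TBox."""
    return mu * sum(depths) / m

# μ = 0: no semantic cluster, so η takes its minimum value 0.
print(clustering_coefficient(0, [2, 3], 10))   # → 0.0
# p = 0: μ · Σ cd(C_i) = m, so η takes its maximum value 1.
print(clustering_coefficient(1, [4, 6], 10))   # → 1.0
```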
    </sec>
    <sec id="sec-4">
      <title>MipsBM System</title>
      <p>MipsBM consists of two components: a satisfiable concept generator and an
unsatisfiable concept generator. According to the characteristics of the axioms appearing
in a SHOIN (D) TBox, we categorize them into two groups: constructors and
operands. The constructors group consists of concept constructors and property
constructors, and the operands group is composed of an atom set and a role set. The
constructors and operands are shown in Table 1.</p>
      <sec id="sec-4-1">
        <title>The proof for Algorithm 1 is as follows.</title>
        <p>Proof. Because there are no complement or disjoint constructors among the
Satisfiable Constructors in Table 1, the concepts generated by Algorithm 1 must
be satisfiable.</p>
        <p>The first while loop corresponds to the number of semantic clusters: in each
iteration, the algorithm creates a semantic cluster, and the value of μ is decreased
by 1 until μ = 0. The second while loop corresponds to the maximum concept
depth: in each iteration, the algorithm creates a concept whose depth is one greater
than that of the previous one, so when the loop finishes, the depth of the last
concept reaches λ. After that, the number of satisfiable concepts is known, and
the remaining concepts are created in the third while loop.</p>
        <p>In order to build an incoherent terminology, MipsBM needs to create several
unsatisfiable concepts, which is achieved by systematically constructing logical
clashes.</p>
        <p>Definition 7 (Independent Unsatisfiable Concept). C is an independent
unsatisfiable concept if the unsatisfiability of C depends on the concept definition
rather than the unsatisfiability of other concepts.</p>
        <p>Definition 8 (Dependent Unsatisfiable Concept). C is a dependent
unsatisfiable concept if the unsatisfiability of C depends on the unsatisfiability of
other concepts.</p>
        <p>From Example 1, C3, C5, C6 and C9 are independent unsatisfiable concepts;
C10 is a dependent unsatisfiable concept because its unsatisfiability depends on
the unsatisfiable concepts C5, C6 and C9.</p>
        <p>Definition 9 (Clash Sequences). Let Seq+(C) be the positive clash sequence
of C, and Seq−(C) the negative clash sequence. Seq+(C) is of the form &lt; (C1, I1, C2),
(C2, I2, C3), · · · , (Cm, Im, C) &gt;, where Ci ⊑ Ci−1 and Ii represents the indexes of
the axioms related to Ci ⊑ Ci−1 (i = 1, · · · , m). Seq−(C) is of the form
&lt; (¬C1, I′1, C′2), (C′2, I′2, C′3), · · · , (C′n, I′n, C) &gt;, where C′i ⊑ C′i−1 and I′i
represents the indexes of the axioms related to C′i ⊑ C′i−1 (i = 1, · · · , n). After that,
the unsatisfiable concept C can be generated by C ⊑ Cm ⊓ C′n.</p>
      </sec>
      <sec id="sec-4-2">
        <title>For example, the clash sequences of C9:</title>
        <p>Seq+(C9) = &lt; (A1, {α10}, C2), (C2, {α10, α15}, C7), (C7, {α10, α15, α17}, C9) &gt;,
Seq−(C9) = &lt; (¬A1, {α12}, C4), (C4, {α12, α16}, C8), (C8, {α12, α16, α17}, C9) &gt;.
Unsatisfiable concepts can be divided into two types as follows.
Complement clash: C is a complement clash concept if it is a subclass of both
a class A and the complement of A. For example,
α1 : C1 ⊑ ∀t1.A1 ⊓ ∃t1.¬A1. Then C1 is a complement clash root concept.
Cardinality clash: C is a cardinality clash concept if the at-least restriction is
bigger than the at-most restriction in its definition. For example,
α2 : C2 ≡ ≥2 t2 ⊓ ≤1 t2. Then C2 is a cardinality clash root concept.</p>
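The two clash types can be recognized mechanically from a concept definition. The following Python sketch classifies a list of top-level conjuncts under an encoding we assume purely for illustration (it is not the MipsBM data model):

```python
# Classify a clash-root definition: a complement clash pairs an atom with its
# negation; a cardinality clash pairs ≥m r with ≤n r where m > n.
# Conjuncts are tuples: ("atom", name), ("not", name),
# ("min", k, role) or ("max", k, role).

def clash_type(conjuncts):
    atoms = set(c[1] for c in conjuncts if c[0] == "atom")
    negated = set(c[1] for c in conjuncts if c[0] == "not")
    if atoms.intersection(negated):
        return "complement clash"
    mins = dict((c[2], c[1]) for c in conjuncts if c[0] == "min")
    maxs = dict((c[2], c[1]) for c in conjuncts if c[0] == "max")
    for role in set(mins).intersection(maxs):
        if mins[role] > maxs[role]:
            return "cardinality clash"
    return "no clash detected"

# α1 flattened for illustration: C1 subsumed by both A1 and ¬A1
print(clash_type([("atom", "A1"), ("not", "A1")]))        # → complement clash
# α2 : C2 ≡ ≥2 t2 ⊓ ≤1 t2
print(clash_type([("min", 2, "t2"), ("max", 1, "t2")]))   # → cardinality clash
```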
        <p>The unsatisfiable concept generator (Algorithm 2) creates the unsatisfiable
concepts by constructing clash sequences.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Algorithm 2: unsatGenerator(unsatnum, Ms, iMd)</title>
        <p>inputs: unsatnum: number of unsatisfiable concepts;
Ms: Mips size; iMd: increment of Mips depth</p>
        <p>output: U: unsatisfiable concept set; Mips(T): the Mips of T
01 U = ∅, Mips(T) = ∅, k = 0, len = 0;
02 constructor = randomSelect(UnsatConceptConstructor);
03 construct a pair of clash sequences: {Seq+, Seq−};
04 D0 ← Seq+, D′0 ← Seq−;
05 I(Ck): Ck = intersectionOf(D0, D′0);
06 CR.add(Ck), Mips.add(I(Ck)), k++, len++;
07 while (k ≤ Ms)
08   len = len + iMd;
09   construct a pair of clash sequences: {Seq+, Seq−};
10   D0 ← Seq+, D′0 ← Seq−;
11   for (i = j = 1; j &lt; len; i++, j = j + 2)
12     Sx,y ← (SatAtomSet, SatRoleSet, someValues, allValues);
13     I(Di): Di ≐ intersectionOf(Di−1, Sx);
14     I(D′i): D′i ≐ intersectionOf(D′i−1, Sy);
15   Mips.add(I(Di), I(D′i));
16   I(Ck): Ck ≐ intersectionOf(Di, D′i), CR.add(Ck), Mips.add(I(Ck));
17   U.add(CR), Mips(T).add(Mips), k++;
18 num = unsatnum − k;
19 while (m ≤ num)
20   Cr ← randomSelect(CR);
21   Sz ← (SatAtomSet, SatRoleSet, someValues, allValues);
22   Ck ≐ intersectionOf(Cr, Sz);
23   U.add(Ck), m++;
24 return U, Mips(T)
Theorem 1. The unions of the clash sequences of independent unsatisfiable concepts
are the Mips of the TBox.</p>
        <p>Proof. By Definition 1 (Incoherent TBox), we have that a TBox T is incoherent
if and only if there is a concept name in T that is unsatisfiable. Therefore,
according to Definition 3 (Mips), we can prove Theorem 1 by establishing two points:
first, one concept is unsatisfiable in the union of the clash sequences;</p>
        <p>and second, the concept is satisfiable in every proper subset of the union of the
clash sequences.</p>
        <p>We prove the first point. Let Ck be an unsatisfiable concept. According to the
unsatGenerator algorithm, Ck is created by Ck ⊑ Di ⊓ D′i, where Di ⊑ Di−1 and
D′i ⊑ D′i−1. Similarly, Di−1 ⊑ Di−2, · · · , D2 ⊑ D1 and D′i−1 ⊑ D′i−2, · · · , D′2 ⊑
D′1. The corresponding clash sequences are:
&lt; (D1, I1, D2), (D2, I2, D3), · · · , (Di, Ii, Ck) &gt;, where Ii = Ii ∪ Ii−1;
&lt; (D′1, I′1, D′2), (D′2, I′2, D′3), · · · , (D′i, I′i, Ck) &gt;, where I′i = I′i ∪ I′i−1.</p>
        <p>D1 and D′1 have the form either D1 ≡ A, D′1 ≡ ¬A or D1 ≡ ≥m t, D′1 ≡ ≤n t
(m &gt; n, and t is a role name). This implies that Ck ⊑ D1 and Ck ⊑ D′1, i.e.,
Ck ⊑ A ⊓ ¬A or Ck ⊑ ≥m t ⊓ ≤n t (m &gt; n). Therefore, Ck is unsatisfiable in
T′ = Ii ∪ I′i, i.e., Ck is unsatisfiable in the union of the clash sequences.</p>
        <p>Next, we prove the second point. Let T″ be any subset of T′ obtained by
removing one axiom αj from Ii ∪ I′i. If αj occurs in the Seq+ of Ck, we have
Di ⊑ Di−1, Di−1 ⊑ Di−2, · · · , αj : Dj ⊑ Dj−1, · · · , D2 ⊑ D1. Removing αj is
equivalent to removing Dj ⊑ Dj−1 from the Seq+ of Ck, so Di is no longer
subsumed by D1. If αj occurs in the Seq− of Ck, we have D′i ⊑ D′i−1, D′i−1 ⊑
D′i−2, · · · , αj : D′j ⊑ D′j−1, · · · , D′2 ⊑ D′1. Removing αj is equivalent to removing
D′j ⊑ D′j−1 from the Seq− of Ck, so D′i is no longer subsumed by D′1. Since
Ck ⊑ Di ⊓ D′i, Ck is no longer subsumed by both D1 and D′1. Therefore, Ck is
satisfiable in T″, i.e., Ck is satisfiable in every subset of the union of the clash
sequences.</p>
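The overall structure of Algorithm 2 can be sketched in Python. This is a deliberately simplified illustration under our own encodings (axioms are opaque index sets and the clash pairs are fabricated placeholders), not the MipsBM implementation:

```python
import itertools
import random

def unsat_generator(unsatnum, ms, imd, seed=0):
    """Simplified sketch of Algorithm 2: each of the ms root concepts carries
    one clash pair (Seq+, Seq-) whose axiom-index union is one Mips element
    (Theorem 1); the remaining unsatnum - ms concepts reuse a random root and
    are therefore dependent unsatisfiable concepts."""
    rng = random.Random(seed)
    counter = itertools.count(1)          # fresh axiom indexes
    roots, mips, u = [], [], []
    length = 1
    for k in range(ms):                   # one Mips per independent root
        length += imd                     # growing Mips depth (iMd increment)
        seq_pos = set(next(counter) for _ in range(length))   # Seq+ indexes
        seq_neg = set(next(counter) for _ in range(length))   # Seq- indexes
        root = "C%d" % k
        roots.append(root)
        mips.append(seq_pos | seq_neg)    # union of the clash sequences
        u.append(root)
    for n in range(unsatnum - ms):        # dependent unsatisfiable concepts
        u.append(rng.choice(roots) + "_dep%d" % n)
    return u, mips

u, mips = unsat_generator(unsatnum=5, ms=2, imd=1)
print(len(u), len(mips))   # → 5 2
```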
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation with MipsBM</title>
      <p>The MipsBM experiments demonstrate how to evaluate the performances of
reasoners for calculating Mips. Pellet 2.3.1, HermiT 1.3.8, FaCT++ 1.6.2,
JFact 1.0.0 and TrOWL 1.4 are five of the most widely used description logic
reasoners, and they are used in our experiments. The tests are performed on a PC
(Intel(R) Core(TM) CPU, 3.40 GHz) with 4 GB RAM. Our performance measure is
the run time (in seconds) needed to calculate Mips.</p>
      <p>From Figure 2 and the second experiment, we can conclude that TBox size
has a significant influence on the performances of the different reasoners. Therefore,
the following evaluations are viewed from two aspects: small-scale TBox (the
number of concepts m = 2000) and large-scale TBox (m = 20000).
(Reasoner homepages: http://clarkparsia.com/pellet, http://www.hermit-reasoner.com/,
http://code.google.com/p/factplusplus/, http://sourceforge.net/projects/jfact/,
http://trowl.eu/)</p>
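The run-time measure can be reproduced with a small harness like the following. This is a generic Python sketch; `compute_mips` is a hypothetical stand-in for invoking a reasoner on a generated TBox:

```python
import time

def time_reasoner(task, repeats=3):
    """Run `task` several times and report the best wall-clock time in
    seconds, i.e. the run-time performance measure used in our experiments."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        best = min(best, time.perf_counter() - start)
    return best

# `compute_mips` is a placeholder for a reasoner call on a generated TBox;
# the trivial body below only keeps the sketch runnable.
def compute_mips():
    sum(range(100000))

elapsed = time_reasoner(compute_mips)
print(elapsed >= 0.0)  # → True
```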
      <p>Fig. 4. Performance evaluations of the reasoners with respect to the complexity
metrics (in the case of a large-scale TBox and in the case of a small-scale TBox).</p>
      <p>After the evaluation experiments, we give a further analysis from two
perspectives.</p>
      <p>What makes an incoherent TBox difficult for calculating Mips? To answer
this question, we consider the impact of the construction parameters on the
structural complexity of an incoherent terminology. A large number of satisfiable
concepts means a large TBox; reasoners then have to spend a lot of time on
satisfiability checking, so the run time grows. If the concept depth is large, there are
many relevance relations between one concept and the others, and as the number
of semantic clusters increases, the number of semantic dependencies between the
concepts grows significantly. The Mips size corresponds to the scale of the minimal
conflict axiom sets that reasoners need to find in the incoherent TBox, while the
size of a semantic dependency is strictly determined by the Mips depth. According
to Definition 9, the clash sequences of unsatisfiable concepts grow with the Mips
depth: the larger the depth, the longer the clash sequences, so a larger increment
of Mips depth leads to a more complex incoherent TBox.</p>
      <p>Which is the most appropriate reasoner for solving Mips problems? Because
of their different optimization approaches, the five reasoners perform differently
on the same benchmark test data. When the number of concepts reaches 8000,
Pellet becomes faster than FaCT++; at 14000, TrOWL becomes faster than
FaCT++; and at 18000, HermiT performs better than FaCT++. In the process of
consistency checking, HermiT uses the anywhere blocking technique to limit the
sizes of the models being constructed, which gives it an advantage on ABox
reasoning. Unfortunately, the ontology test data generated by MipsBM consists only
of a TBox, so this advantage cannot be fully exploited. Our experiments show
that timeouts are the main cause of JFact's failures, especially for large inputs,
because JFact takes longer than the others to load the TBox. In the case of a
large-scale TBox, JFact fails to solve the Mips problems when the number of
clusters increases beyond 80 in the fourth experiment.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and future work</title>
      <p>This paper presents a benchmark that generates terminologies of different
complexities to evaluate the performances of description logic reasoners for calculating
Mips. Our purpose is to find out which factors result in the difficulty and high cost
of ontology debugging. Experiments show that the six construction parameters
can fully reflect the complexity of an incoherent TBox.</p>
      <p>As for future work, we plan to improve our benchmark under realistic
semantic web conditions to evaluate reasoners by using realistic TBox data, and
focus on different ontology reasoning and debugging algorithms to evaluate their
completeness and correctness by using our extended benchmark.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Sirin</given-names>
            <surname>Evren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Parsia</given-names>
            <surname>Bijan</surname>
          </string-name>
          , et al.,
          <article-title>Pellet: A practical OWL-DL reasoner</article-title>
          ,
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <year>2007</year>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>51</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Rob</given-names>
            <surname>Shearer</surname>
          </string-name>
          , Boris Motik, and Ian Horrocks,
          <article-title>HermiT: A highly-efficient owl reasoner</article-title>
          ,
          <source>in: Proceedings of the 5th International Workshop on OWL: Experiences and Directions</source>
          . Karlsruhe, Germany.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Tsarkov</surname>
          </string-name>
          and Ian Horrocks, FaCT++
          <article-title>Description Logic Reasoner: System Description</article-title>
          ,
          <source>in: Proceedings of Third International Joint Conference on Automated Reasoning</source>
          . Seattle, WA, USA,
          <year>2006</year>
          , pp.
          <fpage>292</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Edward Thomas,
          <string-name>
            <given-names>Jeff Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          , Yuan Ren,
          <source>TrOWL: Tractable OWL 2 Reasoning Infrastructure, in: Proceedings of 7th Extended Semantic Web Conference</source>
          . Heraklion, Crete, Greece,
          <year>2010</year>
          , pp.
          <fpage>431</fpage>
          -
          <lpage>435</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Yuanbo</given-names>
            <surname>Guo</surname>
          </string-name>
          , Zhengxiang Pan, and
          <article-title>Jeff Heflin, LUBM: A benchmark for OWL knowledge base systems</article-title>
          ,
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <year>2005</year>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>158</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Li</surname>
            <given-names>Ma</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>Yang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhaoming Qiu</surname>
          </string-name>
          , et al.,
          <article-title>Towards a complete owl ontology benchmark</article-title>
          ,
          <source>in: Proceedings of the 3rd European Semantic Web Conference (ESWC)</source>
          , Budva, Montenegro, June,
          <year>2006</year>
          , pp.
          <fpage>125</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Yingjie</surname>
            <given-names>Li</given-names>
          </string-name>
          ,
          <article-title>Yang Yu and Jeff Hefli, A multi-ontology synthetic benchmark for the semantic web</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop on Evaluation of Semantic Technologies</source>
          , Shanghai, China.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Alfio</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Montanelli</surname>
          </string-name>
          , et al.,
          <article-title>Benchmarking matching applications on the semantic web</article-title>
          ,
          <source>The Semantic Web: Research and Applications</source>
          , Springer Berlin Heidelberg,
          <year>2011</year>
          , pp.
          <fpage>108</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Christian</given-names>
            <surname>Meilicke</surname>
          </string-name>
          , Raul
          <string-name>
            <surname>Garca-Castro</surname>
          </string-name>
          , et al.,
          <article-title>MultiFarm: A benchmark for multilingual ontology matching</article-title>
          ,
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <year>2012</year>
          ,
          <volume>15</volume>
          :
          <fpage>62</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Maria</surname>
            <given-names>Rosoiu</given-names>
          </string-name>
          ,
          <source>Cassia Trojahn Dos Santos</source>
          , et al.,
          <article-title>Ontology matching benchmarks: generation and evaluation</article-title>
          ,
          <source>in: Proceedings of the 6th ISWC workshop on ontology matching (OM)</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Zhengxiang</surname>
            <given-names>Pan</given-names>
          </string-name>
          ,
          <article-title>Benchmarking DL Reasoners Using Realistic Ontologies</article-title>
          ,
          <source>in: Proceedings of the OWLED05 Workshop on OWL: Experiences and Directions</source>
          , Galway, Ireland, November,
          <year>2005</year>
          ,
          <volume>188</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Tom Gardiner,
          <article-title>Ian Horrocks and Dmitry Tsarkov, Automated benchmarking of description logic reasoners</article-title>
          ,
          <source>in: Proceedings of the 19th International Workshop on Description Logics</source>
          , Windermere, Lake District, UK, May,
          <year>2006</year>
          , pp.
          <fpage>167</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Jurgen</surname>
            <given-names>Bock</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Haase</surname>
          </string-name>
          , et al.,
          <string-name>
            <surname>Benchmarking</surname>
            <given-names>OWL</given-names>
          </string-name>
          reasoners,
          <source>in: Proceedings of the ARea 2008 Workshop</source>
          , Tenerife, Spain, June,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Samantha</surname>
            <given-names>Bail</given-names>
          </string-name>
          , Bijan Parsia, Ulrike Sattler,
          <article-title>JustBench: a framework for OWL benchmarking</article-title>
          ,
          <source>in: Proceedings of the 9th International Semantic Web Conference (ISWC)</source>
          , Shanghai, China, November,
          <year>2010</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Yong-Bin</surname>
            <given-names>Kang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan-Fang</surname>
            <given-names>Li</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Shonali</given-names>
            <surname>Krishnaswamy</surname>
          </string-name>
          ,
          <article-title>Predicting reasoning performance using ontology metrics</article-title>
          ,
          <source>in: Proceedings of the 11th International Semantic Web Conference (ISWC)</source>
          , Boston, MA, USA, November,
          <year>2012</year>
          , pp.
          <fpage>198</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ji</surname>
            <given-names>Qiu</given-names>
          </string-name>
          , Gao Zhiqiang,
          <string-name>
            <given-names>Huang</given-names>
            <surname>Zhisheng</surname>
          </string-name>
          , et al.,
          <article-title>Measuring effectiveness of ontology debugging systems</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          ,
          <year>2014</year>
          ,
          <volume>71</volume>
          :
          <fpage>169</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kathrin</surname>
            <given-names>Dentler</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Ronald</given-names>
            <surname>Cornet</surname>
          </string-name>
          , et al.,
          <article-title>Comparison of reasoners for large ontologies in the OWL 2 EL profile</article-title>
          ,
          <source>Semantic Web</source>
          ,
          <year>2011</year>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>71</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Stefan</surname>
            <given-names>Schlobach</given-names>
          </string-name>
          , Zhisheng Huang,
          <string-name>
            <given-names>Ronald</given-names>
            <surname>Cornet</surname>
          </string-name>
          , et al.,
          <article-title>Debugging incoherent terminologies</article-title>
          ,
          <source>Journal of Automated Reasoning</source>
          ,
          <year>2007</year>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>317</fpage>
          -
          <lpage>349</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Aditya</surname>
            <given-names>Kalyanpur</given-names>
          </string-name>
          ,
          <article-title>Debugging and repair of owl ontologies</article-title>
          , Washington DC, USA: The University of Maryland,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>