=Paper= {{Paper |id=Vol-1287/lanmr2014_paper_8 |storemode=property |title=Identification of Ontological Relations Using Formal Concept Analysis |pdfUrl=https://ceur-ws.org/Vol-1287/lanmr2014_paper_8.pdf |volume=Vol-1287 |dblpUrl=https://dblp.org/rec/conf/lanmr/VidalPMSA14 }} ==Identification of Ontological Relations Using Formal Concept Analysis== https://ceur-ws.org/Vol-1287/lanmr2014_paper_8.pdf
          Identification of ontological relations using
                   Formal Concept Analysis *

    Mireya Tovar(1,2), David Pinto(2), Azucena Montes(1,3), Gabriel González(1), and
                                 Darnes Vilariño(2)

    (1) Centro Nacional de Investigación y Desarrollo Tecnológico (CENIDET), Mexico
    (2) Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, Mexico
    (3) Engineering Institute, Universidad Nacional Autónoma de Mexico
                         {mtovar,amontes,gabriel}@cenidet.edu.mx
                                 {dpinto,darnes}@cs.buap.mx



           Abstract. In this paper we present an approach for the automatic
           identification of relations in ontologies of restricted domain. We use
           the evidence found in a corpus associated with the same domain as the
           ontology to determine the validity of the ontological relations. Our
           approach employs formal concept analysis, a method normally used for
           data analysis, here applied to the discovery of relations in a corpus
           of restricted domain. The approach uses two variants for filling the
           incidence matrix that this method employs. The resulting formal
           concepts are used for evaluating the ontological relations of the
           target ontology. The accuracy obtained was about 96% for taxonomic
           relations and 100% for non-taxonomic relations.

           Keywords: Formal concept analysis, ontology evaluation, ontological
           relations


1        Introduction

A huge amount of information is uploaded every day to the World Wide Web,
which creates the need for automatic tools able to understand the meaning of
such information. However, one of the central problems in constructing such
tools is that this information remains unstructured, despite the efforts of
different communities to give a semantic sense to the World Wide Web. The
Semantic Web research direction attempts to tackle this problem by
incorporating semantics into web data, so that it can be processed directly or
indirectly by machines in order to transform it into a data network [10]. For
this purpose, knowledge structures such as “ontologies” have been proposed for
giving semantics and structure to unstructured data. An ontology, from the
computer science perspective, is “an explicit specification of a
conceptualization” [3].
* This work has been partially supported by CONACYT and PROMEP grants:
    CONACYT 54371, PROMEP/103.5/12/4962 BUAP-792.
2       Tovar et. al.

    Ontologies can be divided into four main categories, according to their gener-
alization levels: generic ontologies, representation ontologies, domain ontologies,
and application ontologies. Domain ontologies, or ontologies of restricted do-
main, specify the knowledge for a particular type of domain, for example: med-
ical, tourism, finance, artificial intelligence, etc. An ontology typically includes
the following components: classes, instances, attributes, relations, constraints,
rules, events and axioms.
   In this paper we are interested in the process of discovering and evaluating
ontological relations; we therefore focus our attention on two types: taxonomic
relations and non-taxonomic relations. Relations of the first type are normally
referred to as “is-a” relations (hypernymy/hyponymy or subsumption).
    There are plenty of research works in the literature that address the
problem of automatic construction of ontologies. Most of those works carry out
the evaluation by using a gold standard, that is, a manually created ontology
which is supposed to be built by an expert. This approach assumes that the
expert has created the ontology correctly; however, there is no guarantee of
this. We therefore consider it very important to investigate a way to
automatically evaluate the quality of this kind of resource, which is
continuously being used in the framework of the Semantic Web.
    Our approach attempts to find evidence of the relations to be evaluated in a
reference corpus (associated with the same domain as the ontology) using formal
concept analysis. To our knowledge, this topic has barely been studied from the
formal concept analysis point of view. In [2], for example, an approach is
presented for the automatic acquisition of taxonomies from text in two domains:
tourism and finance. The authors use different measures for weighting the
contribution of each attribute, such as conditional probability and pointwise
mutual information (PMI).
    In [6], two experiments for building taxonomies automatically are presented.
In the first experiment, the attribute set includes a group of sememes obtained
from the HowNet lexicon, whereas in the second the attributes are basically a
set of context verbs obtained from a large-scale corpus; both aim at building an
ontology (taxonomy) of the Information Technology (IT) domain. Five IT experts
evaluated the results of the system, reporting 43.2% of correct answers for the
first experiment and 56.2% for the second one.
    Hele-Mai Haav [4] presents an approach to semi-automatic ontology extraction
and design by using Formal Concept Analysis combined with a rule-based
language, such as Horn clauses, for taxonomic relations. The attributes are
noun phrases of a domain-specific text describing a given entity. The
non-taxonomic relations are defined by means of predicates and rules using Horn
clauses.
   In [13], an approach is presented to derive the relevance of “events” from an
ontology of the event domain. The ontology of events is constructed using Formal
Concept Analysis. The event terms are mapped into objects, and the named
entities into attributes. These terms and entities were recovered from a corpus
in order to build the incidence matrix.
     From the point of view of ontology evaluation, some of the works mentioned
above perform an evaluation by means of a gold standard [2] in order to
determine the level of overlap between the ontology that has been built
automatically and the manually constructed ontology (the gold standard).
     Another approach for evaluating ontologies is by means of human experts, as
presented in [6].
     In our approach we use a typed dependency parser for determining the verb
of a given sentence, which is associated with the ontological concepts of a
triple whose relation component requires validation. The ontological concepts
together with their associated verbs are introduced, by means of an incidence
matrix, to a Formal Concept Analysis (FCA) system. The FCA method allows us to
find evidence of the ontological relation to be validated by searching for the
semantics implicit in the data. In order to validate our approach, we employ a
manual evaluation process by means of human experts.
     The remainder of this paper is structured as follows: Section 2 describes
the theory of formal concept analysis in more detail. In Section 3 we present
the approach proposed in this paper. Section 4 shows and discusses the results
obtained by the presented approach. Finally, Section 5 gives the findings and
the future work.


2     Formal Concept Analysis

Formal Concept Analysis (FCA) is a method of data analysis that describes
relations between a particular set of objects and a particular set of
attributes [1]. It was introduced by Rudolf Wille in the early 1980s (see [11])
as an area of research that applies a set-theoretic model to concepts and
concept hierarchies. It also provides data analysis methods for the formal
representation of conceptual knowledge. FCA produces two kinds of output from
the input data: a concept lattice and a collection of attribute implications.
The concept lattice is a collection of formal concepts of the data, which are
hierarchically ordered by a subconcept-superconcept relation. An attribute
implication describes a valid dependency in the data. FCA can be seen as a
conceptual clustering technique that provides intensional descriptions for
abstract concepts. From a philosophical point of view, a concept is a unit of
thought made up of two parts: the extension and the intension [12]. The
extension covers all objects or entities belonging to this concept, whereas the
intension comprises all the attributes or properties valid for all those
objects.
    FCA is based on set theory, and it proposes a formal representation of
conceptual knowledge [11]. FCA begins with the primitive idea of a context
defined as a triple $(G, M, I)$, where $G$ and $M$ are sets and $I$ is a binary
relation between $G$ and $M$ ($I$ is the incidence of the context); the elements
of $G$ and $M$ are named objects and attributes, respectively.
    For $A \subseteq G$, $A' = \{m \in M \mid \forall g \in A : (g, m) \in I\}$,
and dually, for $B \subseteq M$,
$B' = \{g \in G \mid \forall m \in B : (g, m) \in I\}$.

   $A'$ is the set of all attributes common to the objects of $A$, and $B'$ is
the set of all objects that have all attributes in $B$.
   A formal concept is defined as [2]: a pair $(A, B)$ is a formal concept of
$(G, M, I)$ iff $A \subseteq G$, $B \subseteq M$, $A' = B$ and $A = B'$.
   In other words, $(A, B)$ is a formal concept if the set of attributes shared
by the objects of $A$ is identical to $B$, and $A$ is the set of all the objects
that have all attributes in $B$. $A$ is the extension, and $B$ is the intension
of the formal concept $(A, B)$. The formal concepts of a given context are
ordered by the subconcept-superconcept relation defined by:
     $(A_1, B_1) \leq (A_2, B_2) \Leftrightarrow A_1 \subseteq A_2 \; (\Leftrightarrow B_2 \subseteq B_1)$
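
    The definitions above can be illustrated with a minimal Python sketch. The
toy context (three objects, three verbs) is invented for illustration and is not
taken from the paper's data; the function names are likewise hypothetical. The
enumeration closes every subset of attributes, which is exponential and only
meant for very small contexts, not a substitute for a dedicated FCA tool.

```python
from itertools import combinations

# Toy context (G, M, I). Objects stand for ontology concepts and attributes
# for verbs; every name here is illustrative, not data from the paper.
OBJECTS = ["agent", "robot", "algorithm"]
ATTRIBUTES = ["act", "learn", "compute"]
INCIDENCE = {
    ("agent", "act"), ("agent", "learn"),
    ("robot", "act"),
    ("algorithm", "learn"), ("algorithm", "compute"),
}


def common_attributes(objs):
    """A': the attributes shared by every object in objs."""
    return frozenset(m for m in ATTRIBUTES
                     if all((g, m) in INCIDENCE for g in objs))


def common_objects(attrs):
    """B': the objects that have every attribute in attrs."""
    return frozenset(g for g in OBJECTS
                     if all((g, m) in INCIDENCE for m in attrs))


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing each attribute subset."""
    concepts = set()
    for k in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(ATTRIBUTES, k):
            extent = common_objects(attrs)
            intent = common_attributes(extent)
            concepts.add((extent, intent))
    return concepts


def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]
```

    Each pair returned by `formal_concepts()` satisfies both conditions of the
definition, so `common_attributes(extent) == intent` and
`common_objects(intent) == extent` hold for every element.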
    FCA is a tool applied to various problems, such as hierarchical taxonomies,
information retrieval and data mining [1]. In this case, we use this tool for
identifying ontological relations of restricted domain.

3     Approach for evaluating semantic relations
We employ the theory of FCA to automatically identify ontological relations in a
corpus of restricted domain. The approach considers two variants in the selection
of properties or attributes for building the incidence matrix that is used by the
FCA method for obtaining the formal concepts.
    The difference between the two variants is the syntactic dependency parser
used in the preprocessing phase for getting the properties.
    The first variant uses the minipar parser [7], whereas the second variant
employs the Stanford parser [8]. For each variant, we manually selected a set of
dependency relations in order to extract verbs from each sentence of the corpus
that contains an ontology concept. These verbs are then used as properties or
attributes in the incidence matrix.
    The Stanford dependencies are triples containing the name of the relation, the
governor and the dependent. Examples of these triples are shown in Table 1. For
the purpose of our research, from each triple we have selected the governor (p=1),
the dependent (p=2) or both (p=1,2) as attributes of the incidence matrix.
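
    As a rough illustration of this selection step, the sketch below reads a
triple written as in Table 1 and returns the governor, the dependent, or both,
according to the p value listed there. It assumes the textual form
`name(governor, dependent)` without token indices, and it treats collapsed names
such as `prep_into` as their base relation; the function and variable names are
hypothetical.

```python
import re

# p values copied from Table 1: 1 = governor, 2 = dependent.
POSITIONS = {
    "nsubj": (1,), "prep": (1,), "root": (2,), "acomp": (1,),
    "advcl": (1, 2), "agent": (1,), "aux": (1, 2), "auxpass": (1, 2),
    "cop": (1, 2), "csubj": (2,), "csubjpass": (1, 2), "dobj": (1,),
    "expl": (1,), "iobj": (1,), "nsubjpass": (1,), "parataxis": (2,),
    "pcomp": (2,), "prepc": (1,), "prt": (1, 2), "tmod": (1,), "vmod": (2,),
}

# Assumed surface form: name(governor, dependent)
TRIPLE = re.compile(r"^(\w+)\((\S+),\s*(\S+)\)$")


def select_attributes(triple):
    """Return the words of a dependency triple selected by its p value."""
    match = TRIPLE.match(triple.strip())
    if match is None:
        return []
    name, governor, dependent = match.groups()
    base = name.split("_")[0]  # prep_into -> prep
    words = {1: governor, 2: dependent}
    return [words[p] for p in POSITIONS.get(base, ())]
```

    For instance, `select_attributes("root(ROOT, give)")` keeps only the
dependent, while `select_attributes("cop(funded, is)")` keeps both words.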
    In the case of the minipar parser, we use the pattern C:i:V for recovering
the verbs of the sentence. The grammatical categories that make up the pattern
are as follows: C is a clause, I is an inflectional phrase, and V is a verb or
verbal phrase. Some examples of triples recovered from the sentences are shown
in Table 2.
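
    A small sketch of this filter is given below. It assumes that each triple is
a line of three whitespace-separated fields, as printed in Table 2, and that any
category starting with V (for example VBE) counts as a verb; both points are
assumptions made for illustration, since the exact output format handled by the
authors is not described.

```python
def verbs_from_minipar(triples):
    """Collect the lemmas of triples whose middle field matches C:i:V*."""
    verbs = []
    for line in triples:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip lines that do not look like Table 2 entries
        _, pattern, lemma = fields
        parts = pattern.split(":")
        if (len(parts) == 3 and parts[0] == "C" and parts[1] == "i"
                and parts[2].startswith("V")):
            verbs.append(lemma)
    return verbs
```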
    The approach proposed in this paper involves the following three phases:
 1. Pre-processing stage. The reference corpus is split into sentences, and all
    the information (the ontology and the sentences) is normalized. In this
    case, we use the TreeTagger PoS tagger for obtaining the lemmas [9]. An
    information retrieval system is employed for filtering those sentences
    containing information referring to the concepts extracted from the
    ontology. The ontological relations are also extracted from the ontology
    (see footnote 1). Thereafter, we apply
1   We used Jena for extracting concepts and ontological relations
    (http://jena.apache.org/)




  Table 1. Dependency relations obtained using the Stanford dependency parser

Relation name  p    Meaning                             Example
nsubj          1    Nominal subject                     nsubj(specialized, research)
prep           1    Prepositional modifier              prep_into(divided, subfields)
root           2    Root of the sentence                root(ROOT, give)
acomp          1    Adjectival complement               acomp(considered, feasible)
advcl          1,2  Adverbial clause modifier           advcl(need, provide)
agent          1    Agent complement of a passive verb  agent(simulated, machine)
aux            1,2  Auxiliary verb                      aux(talked, can)
auxpass        1,2  Passive auxiliary                   auxpass(used, is)
cop            1,2  Copula                              cop(funded, is)
csubj          2    Clausal subject                     csubj(said, having)
csubjpass      1,2  Clausal passive subject             csubjpass(activated, assuming)
dobj           1    Direct object of a verbal phrase    dobj(create, system)
expl           1    Expletive                           expl(are, there)
iobj           1    Indirect object                     iobj(allows, agent)
nsubjpass      1    Passive nominal subject             nsubjpass(embedded, agent)
parataxis      2    Parataxis                           parataxis(Scientist, said)
pcomp          2    Prepositional complement            pcomp(allow, make)
prepc          1    Prepositional clausal modifier      prepc_like(learning, clustering)
prt            1,2  Phrasal verb particle               prt(find, out)
tmod           1    Temporal modifier                   tmod(take, years)
vmod           2    Reduced non-finite verbal modifier  vmod(structure, containing)




                 Table 2. Triples obtained by the Minipar parser


                                 Triples
                                 fin C:i:VBE be
                                 inf C:i:V make
                                 fin C:i:V function
   the syntactic dependency parser to each sentence associated with the
   ontology concepts. In order to extract the verbs from these sentences, we use
   the patterns shown in Table 3 for each syntactic dependency parser and each
   type of ontological relation.
   By using this information together with the ontology concepts, we construct
   the incidence matrix that feeds the FCA system.
2. Identification of ontological relations. The concepts that make up the triple
   in which the ontological relation is present are searched for in the list of
   formal concepts obtained by the FCA system (see footnote 2) [5]. The approach
   assigns a value of 1 (one) if the pair of concepts of the ontological
   relation exists in a formal concept; otherwise it assigns a value of zero. We
   consider the selection criteria shown in the third column of Table 3 for each
   type of ontological relation. As can be seen, in the Stanford approach we
   have tested three different selection criteria based on the type of verbs to
   be used.
3. Evaluation. Our approach provides a score for evaluating the ontology by
   using the accuracy formula
   $\mathrm{Accuracy}(\mathit{ontology}) = \frac{|S(R)|}{|R|}$, where $|S(R)|$
   is the total number of relations for which our approach considers that
   evidence exists in the reference corpus, and $|R|$ is the number of semantic
   relations in the ontology to be evaluated. To measure the performance of this
   approach, we compare its results with those obtained by human experts.
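
    The three phases can be sketched end to end as follows. This is only an
illustration under stated assumptions, not the authors' implementation: the
sentences, concepts and relations are invented; concept mentions are found by
whole-word matching on lemmatised text; and a pair of concepts is taken as
supported when some formal concept with a non-empty intent contains both, which
is equivalent to the two concepts sharing at least one verb. The paper does not
spell out this last criterion, so it is an assumption (without it, the top
concept would trivially contain every pair).

```python
from collections import defaultdict


def mentions(text, concept):
    """True if the lemmatised concept occurs in the text as whole words."""
    return f" {concept} " in f" {text} "


def build_incidence(sentences, concepts):
    """Map each concept (object) to the verbs (attributes) of its sentences."""
    incidence = defaultdict(set)
    for text, verbs in sentences:
        for concept in concepts:
            if mentions(text, concept):
                incidence[concept].update(verbs)
    return incidence


def shared_verbs(incidence, c1, c2):
    """Intent of the smallest formal concept whose extent holds c1 and c2."""
    return incidence.get(c1, set()) & incidence.get(c2, set())


def score_relation(incidence, relation):
    """1 if a formal concept with a non-empty intent holds both concepts."""
    c1, _, c2 = relation
    return 1 if shared_verbs(incidence, c1, c2) else 0


def accuracy(incidence, relations):
    """|S(R)| / |R| over the relations of the ontology."""
    return sum(score_relation(incidence, r) for r in relations) / len(relations)


# Hypothetical pre-processed input: (lemmatised sentence, extracted verbs).
SENTENCES = [
    ("an intelligent agent be a system that perceive its environment",
     ["be", "perceive"]),
    ("a system process input", ["process"]),
    ("a neural network learn from data", ["learn"]),
]
CONCEPTS = ["intelligent agent", "system", "neural network", "data"]
RELATIONS = [
    ("intelligent agent", "is-a", "system"),
    ("neural network", "is-a", "system"),
]
INCIDENCE = build_incidence(SENTENCES, CONCEPTS)
```

    With this toy input the first relation is supported by the shared verbs
"be" and "perceive", while the second has no shared verb, so
`accuracy(INCIDENCE, RELATIONS)` is 0.5.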


               Table 3. Patterns or relation name used by each variant

     Variant    Pattern or relation name          Type of selection          Type of relation
     minipar    C:i:V                             All verbs recovered        taxonomic,
                                                                             non-taxonomic
     stanford1  root, cop                         Only the verbs to be       taxonomic
                                                  and include
     stanford2  nsubj, prep, root, dobj, acomp,   All verbs recovered        non-taxonomic
                advcl, agent, aux, auxpass, cop,
                csubj, csubjpass, dobj, expl,
                iobj, cop, nsubjpass, parataxis,
                pcomp, prepc, prt, tmod, vmod
     stanford3  nsubj, prep, root, dobj, acomp,   Only the verbs present in  non-taxonomic
                advcl, agent, aux, auxpass, cop,  the ontological relations
                csubj, csubjpass, dobj, expl,
                iobj, cop, nsubjpass, parataxis,
                pcomp, prepc, prt, tmod, vmod



2   We used the sequential version of FCALGS: http://fcalgs.sourceforge.net/

4     Experimental results
In this section we present the results obtained in the experiments carried out.
First, we present the dataset; the results obtained by the aforementioned
approach follow; finally, a discussion of these results is given.

4.1    Dataset
We have employed an ontology of the Artificial Intelligence (AI) domain (see
footnote 3) [14] for the experiments. Table 4 presents the number of concepts
(C), taxonomic relations (TR) and non-taxonomic relations (NT) of the ontology
evaluated. The characteristics of its reference corpus are also given in the
same table: number of documents (D), number of tokens (T), vocabulary
dimensionality (V), and the number of sentences filtered (O) by the information
retrieval system (S).

                                 Table 4. Datasets

                    Domain   Ontology         Reference corpus
                             C    TR   NT     D   T       V      O    S
                    AI       276  205  61     8   11,370  1,510  475  415




4.2    Obtained results
As mentioned above, we validated the ontology relations by means of the
judgments of human experts. This manual evaluation was carried out in order to
determine the performance of our approach and, consequently, the quality of the
ontology.
    Table 5 shows the results obtained by the approach presented in this paper
when the AI ontology is evaluated. We used the accuracy criterion for
determining the quality of the taxonomic relations. The first column presents
the two variants for identifying the taxonomic relations. The second column
shows the accuracy obtained by the approach for each variant. The next three
columns indicate the quality of the system prediction according to three
different human experts (E1, E2 and E3), and the last column gives their
average.
    Table 6 shows the results obtained by the approach when the non-taxonomic
relations are evaluated.
    The results presented here were obtained with a subset of the sentences
associated with the ontological relations, because of the great effort needed to
manually evaluate their validity. Therefore, in order to have a complete
evaluation of the two types of ontological relations, we have also calculated
their accuracy considering all the sentences associated with the relations to be
evaluated. Table 7 shows the variants used for evaluating the ontological
relations and the accuracy assigned to each type of relation (Accuracy).
3   The ontology together with its reference corpus can be downloaded from
    http://azouaq.athabascau.ca/goldstandards.htm
Table 5. Accuracy of the AI ontology, and quality of the system prediction for taxo-
nomic relations

         Variation Accuracy Quality(E1 ) Quality(E2 ) Quality(E3 ) Average
         minipar     0.96      0.90         0.85         0.94       0.90
         stanford1 0.61        0.57         0.56         0.60       0.58

Table 6. Accuracy of the AI ontology and quality of the system prediction for non-
taxonomic relations

         Variation Accuracy Quality(E1 ) Quality(E2 ) Quality(E3 ) Average
         minipar     0.93      0.81         0.86         0.89       0.85
         stanford2 0.97        0.87         0.92         0.95       0.92
         stanford3 0.92        0.83         0.90         0.90       0.88

                        Table 7. Accuracy given to the AI ontology

                           Relation type  Variant    Accuracy
                           Taxonomic      minipar    96.59%
                                          stanford1  73.17%
                           Non-taxonomic  minipar    95.08%
                                          stanford2  100.00%
                                          stanford3  96.72%



    As can be seen, the approach obtained a better accuracy for non-taxonomic
relations than for taxonomic ones. This result is obtained because the approach
is able to associate, by means of the FCA method, the verbs that exist in both
the relation and the domain corpus. Therefore, when non-taxonomic relations are
evaluated, the approach has more opportunity to find evidence of their validity.
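
    As a consistency check on Table 7, the accuracy formula of Section 3 can be
applied to the relation counts of Table 4 (205 taxonomic and 61 non-taxonomic
relations). The paper does not report the number of supported relations
$|S(R)|$; the sketch below back-computes the closest integer count from each
reported percentage, so those counts are inferred values rather than figures
given by the authors.

```python
# |R| from Table 4; reported accuracies (in percent) from Table 7.
TOTAL = {"taxonomic": 205, "non-taxonomic": 61}
REPORTED = {
    ("taxonomic", "minipar"): 96.59,
    ("taxonomic", "stanford1"): 73.17,
    ("non-taxonomic", "minipar"): 95.08,
    ("non-taxonomic", "stanford2"): 100.00,
    ("non-taxonomic", "stanford3"): 96.72,
}


def implied_supported(kind, reported_pct):
    """Back-compute |S(R)| as the integer closest to reported_pct * |R| / 100."""
    return round(reported_pct * TOTAL[kind] / 100)


def accuracy_pct(supported, total):
    """Accuracy = |S(R)| / |R|, as a percentage rounded to two decimals."""
    return round(100 * supported / total, 2)


# Inferred counts of supported relations, keyed like REPORTED.
IMPLIED = {key: implied_supported(key[0], pct) for key, pct in REPORTED.items()}
```

    Under this reading, the inferred counts are 198 and 150 of 205 taxonomic
relations, and 58, 61 and 59 of 61 non-taxonomic relations; feeding them back
into `accuracy_pct` reproduces the percentages of Table 7.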


5   Conclusions

In this paper we have presented an approach based on FCA for the evaluation
of ontological relations. Two variants for constructing the incidence matrix
were employed. The Stanford variant was more accurate than the minipar one for
non-taxonomic relations, where it obtained the best results, whereas the minipar
variant obtained a good accuracy for the two types of relations evaluated
(taxonomic and non-taxonomic). The minipar variant is also quite fast in
comparison with the Stanford one.
    According to the results presented above, the approach presented in this
paper obtained an accuracy of 96% for taxonomic relations and 100% for
non-taxonomic relations. This result shows, in some way, the quality of the
ontology, but it should also be seen in terms of the ability of our system to
evaluate ontological relations. As future work, we will analyze the reasons why
the approach does not detect all the taxonomic relations.

References
 1. Belohlávek, R.: Introduction to formal concept analysis. Tech. rep., Dept. of
    Computer Science, Palacký University, Olomouc, Czech Republic (2008)
 2. Cimiano, P., Hotho, A., Staab, S.: Learning concept hierarchies from text corpora
    using formal concept analysis. J. Artif. Int. Res. 24(1), 305–339 (Aug 2005)
 3. Gruber, T.R.: Towards Principles for the Design of Ontologies Used for Knowledge
    Sharing. In: Guarino, N., Poli, R. (eds.) Formal Ontology in Conceptual Analy-
    sis and Knowledge Representation. Kluwer Academic Publishers, Deventer, The
    Netherlands (1993)
 4. Haav, H.M.: A semi-automatic method to ontology design by using FCA. In:
    Snášel, V., Bělohlávek, R. (eds.) CLA. CEUR Workshop Proceedings, vol. 110.
    CEUR-WS.org (2004)
 5. Krajca, P., Outrata, J., Vychodil, V.: Parallel recursive algorithm for FCA. In:
    Proceedings of the Sixth International Conference on Concept Lattices and Their
    Applications. vol. 433, pp. 71–82. CEUR-WS.org, Olomouc (2008)
 6. Li, S., Lu, Q., Li, W.: Experiments of ontology construction with formal concept
    analysis. In: Huang, C.R., Calzolari, N., Gangemi, A., Lenci, A., Oltramari, A.,
    Prevot, L. (eds.) Ontology and the Lexicon, pp. 81–97. Cambridge University Press
    (2010), Cambridge Books Online
 7. Lin, D.: Dependency-based evaluation of minipar. In: Proc. Workshop on the Eval-
    uation of Parsing Systems. Granada (1998)
 8. de Marneffe, M.C., MacCartney, B., Manning, C.D.: Generating typed dependency
    parses from phrase structure trees. In: LREC (2006)
 9. Schmid, H.: Probabilistic part-of-speech tagging using decision trees. In: Proceed-
    ings of the International Conference on New Methods in Language Processing.
    Manchester, UK (1994)
10. Solís, S.: La Web Semántica. Lulu Enterprises Incorporated (2007)
11. Wille, R.: Concept lattices and conceptual knowledge systems. Computers &
    Mathematics with Applications 23(6–9), 493–515 (1992)
12. Wolf, E.K.: A first course in formal concept analysis. In: SoftStat’93 Advances in
    Statistical Software 4. vol. 429, pp. 429–438. Faulbaumr
13. Xu, W., Li, W., Wu, M., Li, W., Yuan, C.: Deriving event relevance from the ontol-
    ogy constructed with formal concept analysis. In: Computational Linguistics and
    Intelligent Text Processing, 7th International Conference, CICLing 2006. Lecture
    Notes in Computer Science, Springer (2006)
14. Zouaq, A., Gasevic, D., Hatala, M.: Linguistic patterns for information extraction
    in ontocmaps. In: Blomqvist, E., Gangemi, A., Hammar, K., del Carmen Suárez-
    Figueroa, M. (eds.) WOP. CEUR Workshop Proceedings, vol. 929. CEUR-WS.org
    (2012)