A Semi-automatic Method to Ontology Design by Using
                        FCA
   A Semi-automatic Method to Ontology Design
                    by Using FCA
                                       Hele-Mai Haav

                  Institute of Cybernetics at Tallinn
                                       Hele-Mai       University of Technology
                                                   Haav
                            Akadeemia tee 21, 12618 Tallinn, Estonia
               Institute of Cyberneticshelemai@cs.ioc.ee
                                          at Tallinn University of Technology
                          Akadeemia tee 21, 12618 Tallinn, Estonia
                                     helemai@cs.ioc.ee

       Abstract. Ontology design is a complex and time-consuming process. It is
       extremely difficult for human experts to discover ontology from given data or
       texts. This paper presents a semi-automatic method for ontology extraction and
       design. The method is based on Formal Concept Analysis and a Horn clause
       model of a concept lattice. Inputs to the technique are domain-specific texts or
       data. After transformations, resulting domain-specific ontology is represented
       as a set of rules and facts according to Horn clause model of concept lattice
       based ontology representation. Ontology designer is given this initial ontology
       expression for further extension by adding concepts and relationships (part-of,
       related to, etc) by using a rule language based on Horn clauses. Validation of
       ontology is done by logical inference.


1 Introduction

    Manual construction and description of domain-specific ontology is a complex and
time-consuming process. The recent study on ontology design methodologies shows
that it is very hard for a designer to develop accurate and consistent ontology [10].
    Therefore, a number of approaches propose to improve ontology construction
using automatic discovery of taxonomic and non-taxonomic relationships from
domain data or domain-specific texts [1, 4, 8, 9, 12]. These approaches can be
classified to two groups: most of the approaches provide methods for automatic
discovery of taxonomic relationships and only few of them deal with learning
ontological relationships between concepts. Unfortunately, in the approaches
available, there is a lack of combination of the two methods, because methods for
learning ontological relationships rely to a given initial taxonomy of concepts and use
it in learning process [8, 11].
    The goal of this paper is to present a new approach to semi-automatic ontology
extraction and design by using Formal Concept Analysis (FCA) [2] combined with a
rule-based language. Our approach provides tools for automatic or semi-automatic
extraction of taxonomy of concepts from domain-specific texts, automatic
transformation this initial ontology to Horn clause language and a rule language for
expression of non-taxonomic relationships. Validation of ontology is done by logical
inference.


   c V. Snášel, R. Bělohlávek (Eds.): CLA 2004, pp. 13–24, ISBN 80-248-0597-9.
     VŠB – Technical University of Ostrava, Dept. of Computer Science, 2004.
  14      Hele-Mai Haav


   Our approach is applicable in many application domains, where domain specific
ontologies can be extracted from web catalogs, product catalogs, domain specific
dictionaries and texts etc.
   The rest of the paper is structured as follows. Section 2 briefly overviews related
works. Two basic notions used in our approach: concept lattice based ontology
expression and its first-order logic model are introduced in sections 3 and 4. General
framework of proposed approach is presented in section 5. Section 6 concludes the
work.


2 Related Works

   There are several attempts on ontology learning and ontology extraction [1, 7, 8, 9,
11, 12]. For example, in KRAFT [9], local ontology is extracted from shared
ontology. In the system Infosleuth [12] ontologies are constructed semi-automatically
from textual databases. They use ontology-learning approach, where human experts
provide a set of initial words denoting concepts from high-level ontology. In [8]
discovering non-taxonomic relationships from texts using shallow text processing
methods is presented. Their technique is integrated into ontology learning tool
TextToOnto. The method presented in [11] uses manually built initial ontology to
build conceptual hierarchy semi-automatically from texts. Resulting ontology is
presented in Description Logics.
   Similarity of our approach and those discussed above is in using texts as
descriptions of conceptualisation of domain and learning formal ontology from the
given texts.
   To contrast our approach to the research mentioned above, we like to express that
we combine automatic extraction of taxonomic relationships from domain-specific
texts with semi-automatic expression of full ontology (including also non-taxonomic
relationships) in first-order language for further reasoning and search.
   Before starting to present our framework for ontology extraction, we first introduce
two basic notions of our method: concept-lattice based ontology representation and its
first order language model.


3 Concept Lattice Based Ontology Representation

In this section, we define concept lattice based ontology expression as a reduced
concept lattice of a given context K(O,C,R), where O is object set, C is set of
attributes of objects and R is a binary relationship between objects and attributes. Our
definition is based on FCA [2], which is lattice theory applied to the analysis of
hierarchies of concepts. We assume that a reader of this paper is briefly familiar with
this theory. Most important FCA notions used in the paper will be discussed below by
illustrative examples. We defined concept lattice based ontology representation first
in [4]. In this paper we refine this definition to meet requirements of the method
presented.
                 A Semi-automatic Method to Ontology Design by Using FCA            15


3.1 A Brief Introduction to FCA and to Sample Domain

A reader is referred to [2] for detailed knowledge about FCA. In the following we
give a very basic introduction to the main principles of FCA by using examples.
   For example, a context K(O,C,R) of real estate domain can be as shown as in Table
1. The set of objects O is a collection of real estate domain specific texts (ads) about
real estate items denoted by references like A1, A2, etc. in the table 1. The set of
attributes C is the set of noun-phrases chunked from these texts by using NLP. If a
text describing a real estate item contains certain noun-phrase, then the relationship R
holds and we denote it by number 1 in the table below.

                           Table 1. Real estate domain context

   Objects                             Attributes (noun-phrases)
                 Real     Family      Country Summer Blockhouse           Skyscraper
                estate    house        house       house
      A1          1         1
      A2          1         1            1           1
      A3          1                      1
      A4          1                                              1             1
      A5          1                                              1
      A6          1                                  1


A formal concept of the context K(O, C, R) is defined as pair (A, B), where A ⊆ O, B
⊆ C, A´= B and B´=A, where A´ is the set of attributes common to all the objects in A
and B´ is the set of objects having the attributes in B. The extent of the concept (A,
B) is A and its intent is B.
         For concepts (A1, B1) and (A2, B2) in the set S of all concepts of the context
K(O,C,R) we have

                    (A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 ⇔ B1 ⊇ B2.

The relation ≤ is an order on S. It is shown in [2] that (S(K), ≤) is a complete lattice
and this lattice is known as the concept lattice of the context K(O, C, R).
          For example, the concept lattice in Fig. 1 corresponds to the context
presented in the table 1.
   Each node in this lattice (denoted by black circle) is a formal concept. For
example, one of the formal concepts of the context described in Table 1 is
{A2, A6} × {Real estate, Summerhouse}, where the set {A2, A6} is the extent of the
concept and the set {Real estate, Summerhouse} is its intent.
          Sub and super-concept relationships between the formal concepts are
represented by edges in the Hasse diagram in Fig. 1. For example, the formal concept
{A4} × {Real estate, Blockhouse, Skyscraper} is a sub-concept of the concept {A4,
A5} × {Real Estate, Blockhouse}. There are no objects that are defined by all the
attributes, so the extent of the bottom concept is empty.
  16        Hele-Mai Haav


                                             Real estate

                                        A1, A2, A3, A4, A5, A6

                                                     Real estate
          Real estate          Real estate                                   Real estate,
                                                     Country house
          Summerhouse          Blockhouse                                    Family house

                                                     A2, A3
             A2, A6            A4, A5                                        A1, A2

       Real estate, Blockhouse, Skyscraper                 Real Estate, Summerhouse,
                                                           Country house, Family house

                  A4
                                                                      A2


                          Real estate, Summerhouse, Blockhouse,
                          Country house, Family house, Skyscraper


                                             {}


                       Fig. 1. The concept lattice of a real estate domain


3.2 Concept Lattice Based Ontology Expression

   There is redundant information in concept lattice. The two kinds of redundancy can
be eliminated from concept lattice without loosing any information as shown in [3]:
redundant elements in formal concepts intents and redundant objects in formal
concepts extents. Our reduction procedure has 2 steps: elimination of redundant
elements from formal concepts intents and elimination of lattice of extents.
   Elimination of redundant elements. For a pair (A, B), B (intent) will appear in
every descendant. The inherited elements may be eliminated. Let B´ be the set of
elements in B that do not appear in any descendant of B. Then we can consider the
concept lattice, where the nodes contain the pairs (A, B´) instead of pairs (A, B). This
lattice is a result of the first step of our reduction procedure.
   Elimination of lattice of extents. After elimination of redundant elements from
concept intents, we eliminate lattice of extents LO and get as a result a reduced lattice
of intents LCR of formal concepts.
   We call the resulting lattice LCR of reduction procedure as concept lattice based
ontology expression. Fig. 2 shows the lattice LCR of our example.
   Naming concepts. After reducing concept lattice to its intentional part, we need to
give formal concepts the names. Naming in our case can be done as follows:
                  A Semi-automatic Method to Ontology Design by Using FCA             17


     1.  A concept gets a unique name that is the name of the attribute(s) of formal
         concepts, which are left after reduction procedure. For each attribute c, there
         is a most general concept whose intent includes c, this is called attribute
         concept and the name of this concept is the name of corresponding
         attribute(s). Let us recall that attributes of formal concepts indicate domain
         specific concepts in our approach.
    2. After the previous naming procedure, there might be nodes that do not get
         names. In principle, the names for these nodes need to be provided by
         domain expert or ontology designer. It is possible automatically generate
         formal names (e.g. c1, c2…) for those nodes and then ask advice from
         human expert.
  The Fig. 2 depicts real estate domain ontology produced from concept lattice
shown in Fig.1 using reduction and naming procedures.

                                      Real estate


                                               Country house
                           Blockhouse
          Summerhouse                                                 Family house


             Skyscraper                                          C1


                                      C2


                     Fig. 2. Concept lattice based ontology representation

   One may notice that 2 nodes in the lattice did not get names according to the
naming procedure. The nodes denoted by generated concept names C1, C2 do not
have labels in the lattice. Ontology designer may analyse the lattice above and give
the appropriate names to the concepts. It is also interesting that those nodes really
denote new unknown (discovered) concepts, because the domain-specific texts did not
include any noun-phrases for denoting these concepts.
   In principle, the lattice shown in figure 2 is just a complete lattice, which nodes are
labeled by concept names. We call this lattice concept lattice based ontology
representation because it is obtained from full concept lattice that is extracted by FCA
from the given context.
  18      Hele-Mai Haav


4 A Logic Model                  of    Concept        Lattice      Based      Ontology
Representation

In the previous section, we have defined domain ontology as a concept lattice reduced
to its intensional part LCR (see for example Fig. 2). In this section, we provide first
order logic model for LCR. The model was first provided in [5]. At the moment we are
not interested in extensions, which gave birth to full concept lattice for a formal
context of a domain. Nevertheless, existence of full concept lattice gives always an
opportunity to find extent of a given concept. In order to build first order logic model
for concept lattice based ontology expression we need to define mappings from lattice
structure to a first order language.


4.1 Language Constructs

  We use standard syntax for first order logic and define a simple rule language
based on Horn clauses as follows.

  An alphabet of the rule-language is defined as follows:
  1. Set of constants N that consists of the set of concept names C, names of
       properties A, and special names any (lattice top, empty top is always True)
       and nil (lattice bottom, empty bottom is always False).
  2. Set of variable names V. Uppercase letters denote the variables in V.
  3. Set of predicate symbols P

        Terms are either constants or variables. An atom (atomic formula) is a
formula of the form p(t1,...,tn) , where p is a predicate symbol and t1,...,tn are terms. A
formula is called ground if it contains no variables.

   Horn clause (rules) have at most one atom on its head and they are formulas of the
following form:
                   A ← B1, B2,...,Bn ,
                   where B1, B2,...,Bn is conjunction of atoms Bi, i=0,...,n and
universal quantification of variables is assumed.
   An interpretation for the rule-language is defined as a set of ground atoms
constructed from predicate names in P and constants in N. As the language is general,
then different inference engines can be used.


4.2 Transformation of Concept Lattice to Rule Language

The mappings from lattice to rule language are defined as follows.
                 A Semi-automatic Method to Ontology Design by Using FCA            19


Mapping concepts
   Concepts are represented using their names in ontology as constants in rule
language. For example, house, summerhouse etc. If we like to refer to extents, then
concepts can also be represented by predicates like house(X). At the moment, we are
not interested in extents of concepts.

Mapping taxonomic relationships
   The predicate subconcept(Concept1,Concept2) is used to denote that Concept1 is
an immediate subconcept of Concept2. Subconcept predicates are automatically
generated according to the given lattice LCR.
   Predicate isa is used to represent partial order relationship between concepts. For
example, the predicate isa(summerhouse, real-estate) defines partial order relationship
between the concepts summerhouse and real-estate stating that summerhouse isa real-
estate (i.e. summerhouse is subconcept of real-estate ).

Rules for lattice axioms
  As LCR is complete lattice, then the rules for lattice axioms are as follows:

  Reflexivity:
  isa(Concept, Concept)
  Transitivity:
  isa(Concept1, Concept2)←subconcept(Concept1, Concept2)
  isa(Concept1, Concept2) ← isa(Concept1, Concept3), isa(Concept3, Concept2)
  Predicate subconcept denotes that Concept1 is an immediate subconcept of
  Concept2.
  Antisymmetry:
  equal(Concept1,Concept2) ← isa(Concept1,Concept2), isa(Concept2,Concept1)

Rules for lattice operations
   As LCR is a complete lattice, then for each set of concepts, there exists always a
greatest lower bound (glb or greatest common subconcept) and a least upper bound
(lub or a least common superconcept). Lattice meet is used to calculate glb and join is
operation to calculate lub. We define these operations using the following set of rules.

   Lattice meet operation
   c_subconcept(C, C1,…,Ck)←isa(C, C1),…, isa(C, Ck)
   g_c_subconcept(C, C1,…,Ck) ← c_subconcept(C, C1,…,Ck),
                                   c_subconcept(T, C1,…,Ck), isa(T, C)
The predicate c_subconcept(C, C1,…,Ck) means that the concept C is a common
subconcept of the set of concepts {C1,…,Ck}
   The predicate g_c_subconcept(C, C1,…,Ck) means that the concept C is the
greatest common subconcept of the set of concepts {C1,…,Ck}.
   Symmetrically, we define predicates and rules for join operation as follows:
   Lattice join operation
   c_superconcept(C, C1,…,Ck)←isa(C1, C),…, isa(Ck, C)
   l_c_superconcept(C, C1,…,Ck) ← c_superconcept(C, C1,…,Ck),
                                   c_superconcept(T, C1,…,Ck), isa(C, T)
  20      Hele-Mai Haav


Logical model of a given lattice LCR can automatically be generated on the basis of
mappings presented above. This process is demonstrated in the following example.

4.3 Examples about Transformations

  The following set of ground subconcept atoms is automatically generated on the
basis of concept lattice based ontology expression shown in Fig. 2.

  subconcept(familyhouse, real-estate)
  subconcept(summerhouse, real-estate)
  subconcept(countryhouse, real-estate)
  subconcept(blockhouse, real-estate)
  subconcept(c1, countryhouse)
  subconcept(c1, summerhouse)
  subconcept(c1, familyhouse)
  subconcept(skyscraper, blockhouse)
  subconcept(c2, c1)
  subconcept(c2, skyscraper)

   On the basis of this set of atoms, partial order relationships can be derived using
predefined rules from the Horn clause model of concept lattice based ontology
representation. Also, lattice operations can be performed and consistence of ontology
expression can be checked.


5 The Ontology Design Method

The proposed semi-automatic ontology design method is based on the two concepts
introduced in previous sections: concept lattice based ontology representation and its
logic model.

5.1 General Schema of the Method

The method comprises the following steps:
1. Extracting formal context of a domain from domain-specific texts or data
2. Computing initial ontology as the concept lattice from the context using FCA and
    reduction procedures
3. Transforming initial ontology to a set of expressions in first order language
4. Extending initial ontology with additional rules and facts

  General schema of this method is drawn in the Fig. 3 as follows:
                 A Semi-automatic Method to Ontology Design by Using FCA                 21


    Domain specific
    texts or data


                 NLP based
                 context extraction
                                                                          Set of rules
                                                                          describing
                                      Concept                             initial
     Formal           FCA and         lattice      Transformations        ontology
     context          reduction       based
                                      ontology
                                      expression


                                                                          More rules
                                                                          and facts


                                                                          Complete
                                                                          set of rules
                                                        Inference         and facts
                                                                          representing
                                                                          ontology


           Fig. 3. General schema of the semi-automatic ontology design method

  In the following we describe each step of the method in more detail.


5.2 Extraction of a Formal Context

   According to the method, first task is to produce a formal context K(O,C,R) for
extractable domain-specific ontology. For our approach, objects of formal context can
be textual descriptions of domain entities written in some natural language. We
assume that those descriptions use domain specific vocabulary and are rather short.
For example, suitable descriptions can be ads of real estate items in the real estate
web catalogues, descriptions of products in product catalogues, technical descriptions
of components.
   Attributes of an object for FCA are noun-phrases present in the domain-specific
text describing a given domain-specific entity. Binary relationship R between
descriptions (texts) of domain entities and noun phrases is discovered during the NLP
process of text sources. A resulting set of noun phrases together with references to the
domain-specific text sources are stored into the database table, which represents a
context for the application domain in the form of binary relationship between
descriptions of entities and noun phrases.
  22      Hele-Mai Haav


5.3 Computing Initial Ontology

   In our approach, FCA is performed on the database, which stores formal context.
After that the resulting concept lattice is reduced and certain naming procedure is
performed in order to get concept lattice based ontology expression.
   As formal concept lattice (even its reduced form) can be large, then a user should
be given tools for browsing the lattice.
   There exist several algorithms for FCA and construction of line diagrams of
concept lattices. Excellent comparison of performance of those algorithms can be
found in [6].

5.4 Transforming Initial Ontology to Horn Logic

   Next step is to provide means for transforming concept lattice based ontology
expression into Horn logic. This process enables to produce logical expression of
ontology lattice and specify intended semantics of the descriptions in first order logic.
   For that purpose, we introduced Horn logic based formulation of concept lattice
based ontology representation in the section 4 of this paper. Initial ontology is
automatically transformed to a set of facts according to this formulation. The latter
includes also rules for partial order relationships, lattice axioms and lattice operations
in order to reason about ontology. From the logical descriptions inference can
automatically be done using one of automatic theorem provers. As our approach uses
first order language, then it is possible to attach different ontology inference engines
for practical applications by translating ontology expression to any inference engine
rule language. This was one of the reasons behind choosing Horn logic based rule
language.

5.5 Extending Initial Ontology with Additional Rules and Facts

As we have seen, taxonomic relationships between concepts can be automatically
generated from a given lattice based ontology expression using logic-based
formulation of concept lattice. In order to define non-taxonomic relationships the
corresponding groups of predicates and rules are defined.

Properties of concepts
   For defining properties of concepts, the following predicate can be used:
hasproperty(Conceptname, Propertyname).

Inheritance of properties
  Inheritance of properties can be represented by the following rule:
  hasproperty(C1, X)←isa(C1,C2), hasproperty(C2, X)

Ontological relationships
Ontological relationships like part-of, related-to etc can be easily represented via
predicates. The following predicates demonstrate opportunities adding other
ontological relationships:
   partof(C1,C2)
                 A Semi-automatic Method to Ontology Design by Using FCA          23


  related(C1,C2)
  synonyms(C1,C2), etc.

   Additional specific inference rules can easily be added to the set of predefined
lattice based inference rules by ontology designer.
   Non-taxonomic relationships give additional possibilities for ontology
representation and reasoning about ontology.

  5.6 Reasoning about Ontology

   Reasoning is important to ensure the quality of design of ontology. It can be used
to find contradictory concepts, to derive implied relationships, etc.
   Inference rules for lattice axioms and operations can be used to decide taxonomic
relationships between concepts as well as perform lattice operations.
   For example, to find the least common superconcept of the set of concepts
{summerhouse; countryhouse}, we define the following query:
   l_c_superconcept(X, summerhouse, countryhouse).
   The answer is the concept real-estate. If we are interested in finding the greatest
common subconcept of the concepts familyhouse and countryhouse, then the
corresponding query is as follows: g_c_subconcept(X, familyhouse, countryhouse).
The answer is the concept c1.
   We may be interested in all the superconcepts of the concept skyscraper, for
example. The query isa(skyscraper, X) gives the list of ground atoms as an answer. In
our example, this is the list of the following atoms isa(skyscraper, real-estate) and
isa(skyscraper, blockhouse).
   Inference about non-taxonomic relationships is made possible due to additional
rules.     For       example,       a      designer      may       add    the     fact
hasproperty(blockhouse,no_of_floors). Using inference, we may ask query
hasproperty(X, no_of_floors) and receive an answer that also the fact
hasproperty(skyscraper, no_of_floors) holds.
   Logical inference about ontology expression provides means for designing
consistent and accurate ontologies.


6 Conclusion

We have provided a method to help ontology designer automatically to extract initial
ontology from given set of domain specific texts, to map it automatically to rule
language and to use rule language for adding non-taxonomic relationships to the
ontology representation. Giving means for reasoning about ontology expression
ensures consistency and accuracy of designed ontology.
   Our main contribution resides in extracting the ontology from the NL texts by
using FCA and transforming it to Horn logic.
   Our future work is concerned about evaluation of the proposed method by
developing some prototypical tool. By now, we only made experiments on different
components of the approach.
     24    Hele-Mai Haav


Acknowledgements

   This research was partially funded by Enterprise Estonia funding as a part of RQL
project.


References

1.  Decker, S., Erdmann, M., Fensel, D., and Studer, R.: Ontobroker: Ontology Based Access
    to Distributed and Semi-Structured Information, Meersman R. et al. (eds.): Semantic
    Issues in Multimedia Systems. Proceedings of DS-8. Kluwer Academic Publisher, Boston,
    (1999), 351-369
2. Ganter, B. and Wille, R.: Formal Concept Analysis, Mathematical Foundations, Springer,
    (1999)
3. Godin, R., Missaoui, R. and Alaoui, H.: Learning Algorithms Using a Galois Lattice
    Structure, Proc. of the Third Int. Conference on Tools for Artificial Intelligence, IEEE
    Computer Society Press, CA, (1991), 22-29
4. Haav, H.-M.: Learning Ontologies for Domain-specific Information Retrieval, W.
    Abramowicz (Ed), Knowledge-Based Information Retrieval and Filtering from the Web,
    Kluwer Academic Publishers, (2003), 285-300
5. Haav, H.-M.: Combining FCA and a Logic Language for Ontology Representations, J.
    Barzdins (Ed), Databases and Information Systems, Sixth International Baltic Conference
    on Databases and Information Systems, June 6-9, Riga, 2004, Scientific papres of
    University of Latvia, Vol 672, (2004), 436-451
6. Kuznetsov, S.O. and Obiedkov, S.A.: Comparing performance of Algorithms for
    Generating Concept Lattices, Journal of Experimental & Theoretical Artificial Intelligence
    vol 14(1), (2002), Nos. 2-3, 189-216
7. Maedche, A.: Ontology Learning for the Semantic Web, Kluwer Academic Publishers,
    (2002)
8. Maedche, A. and Staab, S.: Discovering Conceptual Relations from Text. ECAI 2000.
    Proceedingsof the 14th European Conference on Artificial Intelligence, IOS Press,
    Amsterdam, (2000)
9. Preece, A.D., Hui, K.-J., Gray, W.A., Marti, P. et al: The Kraft arhitecture for knowledge
    fusion and transformation, In Proc. of 19th SGES Int. Conference, Springer-Verlag (1999)
10. Tempich, C. and Volz, R.: Towards a benchmark for Semantic Web reasoners-an analysis
    of the DAML ontology library, Sure Y (ed) Proceedings of Workshop of Evaluation of
    Ontology-based Tools (EON 2003) at 2nd Int. Semantic Web Conference (ISWC 2003),
    USA, (2003)
11. Todirascu, A. and Beuvron, F.: Using Description Logics for Ontology Extraction, In:
    Proceedings of Conference TALN 2000, Lausanne, October 16-18, (2000)
12. Woelk, D. and Tomlinson, C.: The Infosleuht project: Intelligent Search Management via
    Semantic Agents. In Second World Wide Web Conference ’94:Mosaic and the Web,
    (1994)