=Paper= {{Paper |id=None |storemode=property |title=A Study of Parallel Enumerative Structures for Ontology Building and Enrichment |pdfUrl=https://ceur-ws.org/Vol-674/Paper55.pdf |volume=Vol-674 |dblpUrl=https://dblp.org/rec/conf/ekaw/KamelR10 }} ==A Study of Parallel Enumerative Structures for Ontology Building and Enrichment == https://ceur-ws.org/Vol-674/Paper55.pdf
  Ontology Building Using Parallel Enumerative Structures
Mouna Kamel
Institut de Recherche en Informatique de Toulouse (IRIT) – CNRS – UPS
118, Route de Narbonne, 31062 Toulouse, France
(+33) 5 61 55 83 38
kamel@irit.fr

Bernard Rothenburger
Institut de Recherche en Informatique de Toulouse (IRIT) – CNRS – UPS
118, Route de Narbonne, 31062 Toulouse, France
(+33) 5 61 55 83 38
rothenburger@irit.fr

ABSTRACT
The semantics of a text is carried by both the natural language it contains and its layout. As ontology building processes have so far taken only plain text into consideration, our aim is to elicit the structure of the text as well. We focus here on parallel enumerative structures because they bear implicit or explicit hierarchical relations, they have salient visual properties, and they are frequently found in corpora. We have defined a process which identifies them in a text, translates them into ontology structures and finally links such structures to the concepts of an existing ontology. We have assessed this process on Wikipedia encyclopaedic articles, as they are rich in definitions and statements and contain many enumerations. The many ontology structures we have obtained are thus used to enrich an ontology which we had automatically built from database specification documents.

Categories and Subject Descriptors
I.2.7 Natural Language Processing - Text analysis, I.2.6 Learning - Knowledge acquisition

General Terms
Algorithms, Documentation, Languages

Keywords
Ontology building and enrichment from text, layout analysis, NLP tools.

1. MOTIVATION
Many approaches have been suggested for the construction, enrichment or population of ontologies from text. They are based on lexical, syntactic, semantic or rhetorical aspects of natural language. They encompass machine learning [1], specific natural language processing tools [2], or a combination of both [3]. These methods are usually applied to plain text. However, a large variety of layouts or structures can be found in the visual presentation of a text, with a diversity of interpretations for each of them [4]. Some of them implicitly carry ontological knowledge, as shown in example 1. The meaning carried by this structure may be expressed through the sentence in example 2. In both cases, a human being may easily deduce the conceptual framework presented in figure 1.

    Under IAU definitions, in the Solar System and in order of
    increasing distance from the Sun, there are eight planets:
      •    four terrestrials:
             - Mercury,
             - Venus,
             - Earth,
             - Mars.
      •    four gas giants:
             - Jupiter,
             - Saturn,
             - Uranus,
             - Neptune.
Example 1: a structure which carries ontological knowledge

    Under IAU definitions, there are eight planets in the Solar System.
    In order of increasing distance from the Sun, they are the four
    terrestrials, Mercury, Venus, Earth, and Mars, then the four gas
    giants, Jupiter, Saturn, Uranus, and Neptune.
Example 2: a sentential representation of example 1

Figure 1. Conceptual network corresponding to the meaning of examples 1 and 2

In the case of sentence analysis (example 2), the automatic derivation of its formal counterpart by a Natural Language Processing (NLP) tool is a very tricky issue, which requires non-trivial tasks such as anaphora resolution or the design of sophisticated multi-sentence textual patterns.

However, for layout structure analysis (example 1), the different parts of the knowledge are more easily identifiable thanks to lexical or typo-dispositional marks. We claim that it thus becomes easier to identify the corresponding conceptual network in an automated way. The above meaning-bearing layouts allow a straightforward identification of ontological relations: often hyperonymy, sometimes meronymy, and occasionally other relations.

We focus here on a specific kind of meaning-bearing layout that we call parallel enumerative structures (PES). Example 1 is typical of such a layout. These structures present some regularities and appear very frequently. Their analysis could be a relevant contribution to improving knowledge elicitation and modelling from text. Moreover, it would provide new triggers for the identification of new concepts or semantic relations, thereby making it possible to go beyond the classical ontology learning approaches, which only consider plain text.

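To make the claim about typo-dispositional marks concrete, here is a minimal Python sketch that recovers a two-level hierarchy from the layout of example 1 using only its bullet and dash markers. It is an illustration under stated assumptions, not the process described in this paper: the function name `layout_to_hierarchy`, the two regular expressions, and the choice of "•" and "-" as the only markers are all hypothetical.

```python
import re

# Example 1 from the text, reproduced as a plain string.
EXAMPLE_1 = """\
Under IAU definitions, in the Solar System and in order of increasing distance from the Sun, there are eight planets:
• four terrestrials:
  - Mercury,
  - Venus,
  - Earth,
  - Mars.
• four gas giants:
  - Jupiter,
  - Saturn,
  - Uranus,
  - Neptune.
"""

# Assumed markers: "•" opens a first-level item, "-" a second-level item.
# Trailing punctuation (":", ",", ".") is stripped from the captured term.
BULLET = re.compile(r"^\s*•\s*(.+?)[:,.]?\s*$")
DASH = re.compile(r"^\s*-\s*(.+?)[:,.]?\s*$")


def layout_to_hierarchy(text):
    """Return (introductory line, {first-level term: [second-level terms]})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    intro, body = lines[0], lines[1:]
    hierarchy = {}
    current = None
    for line in body:
        match = BULLET.match(line)
        if match:
            current = match.group(1)
            hierarchy[current] = []
            continue
        match = DASH.match(line)
        if match and current is not None:
            hierarchy[current].append(match.group(1))
    return intro, hierarchy


intro, hierarchy = layout_to_hierarchy(EXAMPLE_1)
for parent, children in hierarchy.items():
    print(parent, "->", children)
```

No anaphora resolution or multi-sentence pattern is needed here: the markers alone separate the two groups and their members. Deciding what the father concept is ("planets") and which relation holds still requires analysing the introductory line, which this sketch leaves untouched.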
2. TRANSLATION PROCESS
An enumeration is a set of items with or without semantic relations between them. An item is a co-enumerated entity which can be discerned by typographic, dispositional and/or lexico-syntactic marks. A parallel enumeration is a paradigmatic enumeration (i.e. all items are functionally equivalent, textually or syntactically) that is visually homogeneous (i.e. all items are visually equivalent) and isolated (i.e. no item is linked to any textual unit outside the enumeration). An introductory phrase, hereafter called primer, is a phrase or a sentence which introduces an enumeration and which is identifiable by lexico-syntactic and/or typo-dispositional marks. Finally, let us call parallel enumerative structure (PES) a vertical textual structure composed of a primer and a parallel enumeration.

    primer:       There are a number of diseases and conditions affecting the
                  gastrointestinal system, including:
    enumeration:  1) Cholera              ("1)" is an item marker)
                  2) Colorectal cancer    (an item)
                  3) Diverticulitis
    primer + enumeration = enumerative structure
Figure 2. Composition of an enumerative structure

Broadly speaking, the idea is to translate a PES into a single ontology structure (i.e. a one- or two-level hierarchy) according to the following principles: (1) the primer contains one father concept and one semantic relation which links this father concept to the concepts contained in the items, (2) each item contains one child concept semantically related to the father concept of the primer, (3) all child concepts are considered as belonging to the same conceptual level. An example of this correspondence is the structure of figure 1, obtained from example 1.

The syntactic structure of the primer helps to identify the father concept and the semantic relation it contains. We have characterized 3 cases:
• The primer is not syntactically correct.
  - The primer may be composed of a noun phrase. This noun phrase represents the father concept and the semantic relation is the is-a relation.
  - The primer ends with a verb phrase in the active voice. The semantic class to which this verb belongs reflects the nature of the relation, and the father concept corresponds to the main term of the noun phrase which is the subject of this verb.
• The primer is complete. It contains a lexical unit taken from a gazetteer or a number which specifies the number of items. The father concept is the term which co-occurs with this lexical marker, and the relation is the is-a relation.
• The primer is syntactically correct and not complete. The father concept may be found in the subject noun phrase or in the object noun phrase of the main clause, and may possibly be detected by means of heuristics. The relation is the is-a relation.

Our method consists in (1) identifying each enumerative structure and its different components (primer and items), (2) checking whether the enumeration is parallel, (3) identifying the father concept and the nature of the semantic relation, (4) extracting the child concepts from each item and (5) building an ontological structure. This fifth step is based on the annotations produced over the four previous steps.

3. APPLICATION
Wikipedia documents are encyclopaedic and contain a lot of definitional statements and properties. Furthermore, articles are written according to a comprehensive set of editorial and structural guidelines, which in effect encourage the writing of PES. The experiment reported in this paper concerns the enrichment of an existing ontology which is a frame of reference used to localise information relating to urbanism, environment and territorial organisations. It contains both geographical and real-world concepts. This ontology has 728 concepts. We then obtained 182 disambiguated pages which contain at least one PES (according to our criteria). From these 182 articles we exploited 276 PES, which allowed us to enrich our ontology with 349 new concepts and 201 instances that were considered relevant by the experts and knowledge engineers involved in building this ontology.

4. FUTURE WORKS
In the short term, our idea is to combine our approach with the usual approaches to ontology learning from text. For example, in order to take better advantage of Wikipedia’s articles, it would be interesting to complement the approach of Herbelot et al. [5], which exploits plain text only. We also plan to exploit redirect links and homonym pages to maximise the number of relevant articles. On the other hand, we want to improve the analysis of enumerative structures by going beyond simple parsing, particularly regarding the primer. Authors may use complex grammatical constructions or linguistic variations in their writing, even within the enumerative structures. We then face problems of anaphora resolution, ellipsis, apposition, extraposition, rhetorical forms, etc. Also, discourse analysis must be carried out to process non-parallel enumerative structures.

5. REFERENCES
[1] Nédellec, C., Nazarenko, A.: Ontology and Information Extraction. In: S. Staab & R. Studer (eds.) Handbook on Ontologies in Information Systems, Springer (2003)
[2] Giuliano, C., Lavelli, A., Romano, L.: Exploiting Shallow Linguistic Information for Relation Extraction from Biomedical Literature. In: Proc. EACL (2006)
[3] Giovannetti, E., Marchi, S., Montemagni, S.: Combining Statistical Techniques and Lexico-syntactic Patterns for Semantic Relation Extraction from Text. Fifth Workshop on Semantic Web Applications and Perspectives, FAO-UN, Roma, Italy (2008)
[4] Virbel, J., Luc, C.: Le modèle d'architecture textuelle: fondements et expérimentation. Verbum, Vol. XXIII, N. 1, p. 103-123 (2001)
[5] Herbelot, A., Copestake, A.: Acquiring ontological relationships from Wikipedia using RMRS. In: Proceedings of the International Semantic Web Conference 2006 Workshop on Web Content Mining with Human Language Technologies, Athens, GA (2006)
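
As a closing illustration of the five-step method of Section 2, the following Python sketch walks a small text block through identification, a parallelism check, father-concept detection, child extraction and structure building. It is a toy under loud assumptions and not the authors' implementation: all names (`PES`, `identify`, `looks_parallel`, `father_and_relation`, `translate`), the `NUMBER_WORDS` mini-gazetteer and the marker pattern are hypothetical, step 2 only tests visual homogeneity of the item markers, and step 3 only covers the "complete primer" case, where a number co-occurs with the father concept.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-gazetteer of number words (an assumption for this sketch).
NUMBER_WORDS = {"two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"}
# Assumed item markers: "1)", "-", "•" or "*", followed by the item text.
MARKER = re.compile(r"^\s*(\d+\)|[-•*])\s*(.+?)[,;.]?\s*$")


@dataclass
class PES:
    primer: str
    items: list    # item texts with markers stripped
    markers: list  # raw item markers, in order


def identify(block):
    """Step 1: split a block into its primer (first line) and marked items."""
    lines = [ln for ln in block.splitlines() if ln.strip()]
    pes = PES(lines[0].strip(), [], [])
    for line in lines[1:]:
        match = MARKER.match(line)
        if match:
            pes.markers.append(match.group(1))
            pes.items.append(match.group(2))
    return pes


def looks_parallel(pes):
    """Step 2 (toy): all items must share one kind of marker."""
    kinds = {"number" if m[0].isdigit() else m for m in pes.markers}
    return len(pes.items) > 1 and len(kinds) == 1


def father_and_relation(pes):
    """Step 3 (toy): 'complete primer' case only; father follows the number."""
    words = re.findall(r"[A-Za-z]+|\d+", pes.primer)
    for i, word in enumerate(words):
        if (word.lower() in NUMBER_WORDS or word.isdigit()) and i + 1 < len(words):
            return " ".join(words[i + 1:]), "is-a"
    return None, None


def translate(block):
    """Steps 4 and 5: one (child, relation, father) triple per item."""
    pes = identify(block)
    if not looks_parallel(pes):
        return []
    father, relation = father_and_relation(pes)
    if father is None:
        return []
    return [(child, relation, father) for child in pes.items]


# A hypothetical one-level PES with a complete primer.
GAS_GIANTS = """The Solar System has four gas giants:
- Jupiter,
- Saturn,
- Uranus,
- Neptune.
"""
print(translate(GAS_GIANTS))
```

Applied to the enumerative structure of Figure 2, whose primer is syntactically correct but contains no number or gazetteer entry, this sketch returns an empty list: that primer falls under the third case above, which calls for syntactic analysis and heuristics that are deliberately left out here.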