=Paper=
{{Paper
|id=None
|storemode=property
|title=A Study of Parallel Enumerative Structures for Ontology Building and Enrichment
|pdfUrl=https://ceur-ws.org/Vol-674/Paper55.pdf
|volume=Vol-674
|dblpUrl=https://dblp.org/rec/conf/ekaw/KamelR10
}}
==A Study of Parallel Enumerative Structures for Ontology Building and Enrichment ==
Ontology Building Using Parallel Enumerative Structures
Mouna Kamel, Bernard Rothenburger
Institut de Recherche en Informatique de Toulouse (IRIT) – CNRS – UPS
118, Route de Narbonne, 31062 Toulouse, France
(+33) 5 61 55 83 38
kamel@irit.fr, rothenburger@irit.fr
ABSTRACT
The semantics of a text is carried by both the natural language it contains and its layout. As ontology building processes have so far taken only plain text into consideration, our aim is to elicit its textual structure. We focus here on parallel enumerative structures because they bear implicit or explicit hierarchical relations, they have salient visual properties, and they are frequently found in corpora. We have defined a process which identifies them in a text, translates them into ontology structures and finally links such structures to the concepts of an existing ontology. We have assessed this process on Wikipedia encyclopaedic articles as they are rich in definitions and statements, and contain many enumerations. The many ontology structures we have obtained are thus used to enrich an ontology which we had automatically built from database specification documents.

Categories and Subject Descriptors
I.2.7 Natural Language Processing - Text analysis, I.2.6 Learning - Knowledge acquisition

General Terms
Algorithms, Documentation, Languages

Keywords
Ontology building and enrichment from text, layout analysis, NLP tools.

Under IAU definitions, in the Solar System and in order of increasing distance from the Sun, there are eight planets:
• four terrestrials:
- Mercury,
- Venus,
- Earth,
- Mars.
• four gas giants:
- Jupiter,
- Saturn,
- Uranus,
- Neptune.
Example 1: a structure which carries ontological knowledge

Under IAU definitions, there are eight planets in the Solar System. In order of increasing distance from the Sun, they are the four terrestrials, Mercury, Venus, Earth, and Mars, then the four gas giants, Jupiter, Saturn, Uranus, and Neptune.
Example 2: a sentential representation of Example 1
1. MOTIVATION
Many approaches have been suggested for the construction, enrichment or population of ontologies from text. They are based on lexical, syntactic, semantic or rhetorical aspects of natural language. They encompass machine learning [1], specific natural language processing tools [2], or a combination of both [3]. These methods are usually applied to plain text. However, a large variety of layouts or structures can be found in the visual presentation of a text, with a diversity of interpretations for each of them [4]. Some of them implicitly carry ontological knowledge, as shown in Example 1. The meaning carried by this structure may be expressed through the sentence in Example 2. In both cases, a human being may easily deduce the conceptual framework presented in Figure 1.

Figure 1. Conceptual network corresponding to the meaning of Examples 1 and 2

In the case of sentence analysis (Example 2), automatically deducing the formal counterpart with a Natural Language Processing (NLP) tool is a very tricky issue, which requires carrying out non-trivial tasks such as the resolution of anaphora or the design of sophisticated multi-sentence textual patterns.

However, for layout structure analysis (Example 1), the different parts of the knowledge are more easily identifiable thanks to lexical or typo-dispositional marks. We claim that it thus becomes easier to identify the corresponding conceptual network in an automated way. The above meaning-bearing layouts allow a straightforward identification of ontological relations: often hyperonymy, sometimes meronymy, and occasionally other relations.

We focus here on a specific kind of meaning-bearing layout that we call parallel enumerative structures (PES). Example 1 is typical of such a layout. These structures present some regularities and appear very frequently. Their analysis could be a relevant contribution to improving knowledge elicitation and modelling from text. Moreover, it would provide new triggers for the identification of new concepts or semantic relations, thereby making it possible to go beyond the classical ontology learning approaches, which only consider plain text.
2. TRANSLATION PROCESS
An enumeration is a set of items with or without semantic relations between them. An item is a co-enumerated entity which can be discerned by typographic, dispositional and/or lexico-syntactic marks. A parallel enumeration is a paradigmatic enumeration (i.e. all items are functionally equivalent, textually or syntactically), visually homogeneous (i.e. all items are visually equivalent) and isolated (i.e. no item is linked to any textual unit which is outside the enumeration). An introductory phrase, hereafter called primer, is a phrase or a sentence which introduces an enumeration, and which is identifiable by lexico-syntactic and/or typo-dispositional marks. Finally, let us call parallel enumerative structure (PES) a vertical textual structure composed of a primer and a parallel enumeration.

There are a number of diseases and conditions affecting the gastrointestinal system, including: (primer)
1) Cholera (item, introduced by the item marker "1)")
2) Colorectal cancer
3) Diverticulitis
(items 1 to 3 form the enumeration; the primer and the enumeration together form the enumerative structure)
Figure 2. Composition of an enumerative structure

Broadly speaking, the idea is to translate a PES into a single ontology structure (i.e. a one- or two-level hierarchy) according to the following principles: (1) the primer contains one father concept and one semantic relation which links this father concept to the concepts contained in the items, (2) each item contains one child concept semantically related to the father concept of the primer, (3) all child concepts are considered as belonging to the same conceptual level. An example of this correspondence is the structure obtained in Figure 1 from Example 1.

The syntactic structure of the primer helps to identify the father concept and the semantic relation it contains. We have characterized 3 cases:

The primer is not syntactically correct.
- The primer may be composed of a noun phrase. This noun phrase represents the father concept, and the semantic relation is the is-a relation.
- The primer ends with a verb phrase in the active form. The semantic class to which this verb belongs reflects the nature of the relation, and the father concept corresponds to the main term of the noun phrase which is the subject of this verb.

The primer is complete. It contains a lexical unit taken from a gazetteer or a number which specifies the number of items. The father concept is the term which co-occurs with this lexical marker, and the relation is the is-a relation.

The primer is syntactically correct and not complete. The father concept may be found in the subject noun phrase or in the object noun phrase of the main clause, and may eventually be detected through heuristics. The relation is the is-a relation.

Our method consists in (1) identifying each enumerative structure and its different components (primer and items), (2) checking whether the enumeration is parallel, (3) identifying the father concept and the nature of the semantic relation, (4) extracting the child concepts from each item and (5) building an ontological structure. This fifth step is based on the annotations produced during the four previous steps.

3. APPLICATION
Wikipedia documents are encyclopaedic and contain many definitional statements and properties. Furthermore, articles are written according to a comprehensive set of editorial and structural guidelines, which thus encourage the writing of PES. The experiment reported in this paper concerns the enrichment of an existing ontology which is a frame of reference used to localise information relating to urbanism, environment and territorial organisations. It contains both geographical and real-world concepts. This ontology has 728 concepts. We then obtained 182 disambiguated pages which contain at least one PES (according to our criteria). From these 182 articles we exploited 276 PES, which allowed us to enrich our ontology with 349 new concepts and 201 instances which were considered relevant by the experts and knowledge engineers involved in the building of this ontology.

4. FUTURE WORKS
In the short term, our idea is to combine our approach with the usual approaches to ontology learning from text. For example, in order to take better advantage of Wikipedia's articles, it would seem interesting to complement the approach of Herbelot et al. [5], which exploits plain text only. We also plan to exploit redirect links and homonym pages to maximise the number of relevant articles. On the other hand, we want to improve the analysis of enumerative structures by going beyond simple parsing, particularly regarding the primer. Authors may use complex grammatical constructions or linguistic variations in their writing, even within enumerative structures. We then face problems of anaphora resolution, ellipsis, apposition, extraposition, rhetorical forms, etc. Also, discourse analysis must be carried out to process non-parallel enumerative structures.

5. REFERENCES
[1] Nédellec, C., Nazarenko, A.: Ontology and Information Extraction. In: S. Staab & R. Studer (eds.) Handbook on Ontologies in Information Systems, Springer (2003)
[2] Giuliano, C., Lavelli, A., Romano, L.: Exploiting Shallow Linguistic Information for Relation Extraction from Biomedical Literature. In: Proc. EACL (2006)
[3] Giovannetti, E., Marchi, S., Montemagni, S.: Combining Statistical Techniques and Lexico-syntactic Patterns for Semantic Relation Extraction from Text. Fifth Workshop on Semantic Web Applications and Perspectives, FAO-UN, Roma, Italy (2008)
[4] Virbel, J., Luc, C.: Le modèle d'architecture textuelle: fondements et expérimentation. Verbum, Vol. XXIII, N. 1, p. 103-123 (2001)
[5] Herbelot, A., Copestake, A.: Acquiring ontological relationships from Wikipedia using RMRS. In: Proceedings of the International Semantic Web Conference 2006, Workshop on Web Content Mining with Human Language Technologies, Athens, GA (2006)
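
As an illustration of the five-step method of Section 2, the following minimal Python sketch translates one vertical enumerative structure into is-a triples. It is not the implementation used in the paper. It covers only the "complete primer" case with a number, and it replaces the NLP tools with simple regular expressions. All names (ITEM_MARKER, NUMBER_WORDS, split_structure, is_parallel, father_concept, translate) are hypothetical, the list of item markers and number words is an assumption, and the parallelism check is reduced to marker homogeneity.

```python
import re
from dataclasses import dataclass

# Assumed stand-in for a gazetteer of number words (not taken from the paper).
NUMBER_WORDS = {"two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"}

# Assumed item markers: "1)", "-" or a bullet, followed by the item text.
ITEM_MARKER = re.compile(r"^\s*(\d+\)|-|•)\s+(.*\S)\s*$")


@dataclass
class EnumerativeStructure:
    primer: str
    markers: list
    items: list


def split_structure(text):
    """Step 1: separate the primer from the items."""
    primer_lines, markers, items = [], [], []
    for line in text.strip().splitlines():
        match = ITEM_MARKER.match(line)
        if match:
            markers.append(match.group(1))
            items.append(match.group(2).rstrip(",.;"))
        elif not items:
            # Unmarked lines before the first item belong to the primer.
            primer_lines.append(line.strip())
    return EnumerativeStructure(" ".join(primer_lines), markers, items)


def is_parallel(structure):
    """Step 2 (crude proxy): every item uses the same kind of marker."""
    shapes = {re.sub(r"\d+", "N", marker) for marker in structure.markers}
    return len(structure.items) >= 2 and len(shapes) == 1


def father_concept(primer):
    """Step 3, complete-primer case: the term that follows a number."""
    words = re.findall(r"[A-Za-z]+|\d+", primer)
    for number, term in zip(words, words[1:]):
        if number.lower() in NUMBER_WORDS or number.isdigit():
            return term.lower()
    return None


def translate(text):
    """Steps 4 and 5: one child concept per item, linked by is-a."""
    structure = split_structure(text)
    if not is_parallel(structure):
        return []
    father = father_concept(structure.primer)
    if father is None:
        return []
    return [(item, "is-a", father) for item in structure.items]


# Inner level of Example 1, used here as a one-level structure.
example = """four terrestrials:
- Mercury,
- Venus,
- Earth,
- Mars."""

for triple in translate(example):
    print(triple)
```

In this sketch the relation is always is-a, which matches the complete-primer case only. The verb-phrase case, where the relation depends on the semantic class of the verb, would need a real parser and a verb classification that are outside the scope of this illustration.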