Towards ONTO6 Framework for Concept Elicitation

Uldis Straujums

University of Latvia, Faculty of Computing, Raina bulvaris 19, Riga, LV-1586, Latvia

Abstract
The article proposes an approach to simplify the process of identifying significant concepts for a given domain. The author describes his ONTO6 methodology, which is based on a semi-informal meta-ontology. The stages of applying the ONTO6 methodology are: the development of a meta-ontology instance appropriate for the domain to be informatized; the development of an initial ontology from the meta-ontology instance; and the gradual detailing of domain concepts that appear in the initial ontology, that is, the development of an enriched initial ontology. The transition from the ONTO6 methodology to the ONTO6 framework through the use of tools (LVTagger, Cellfie) is demonstrated.

Keywords
Ontology learning, domain-specific modeling, text analysis tool

1. Introduction
The article proposes an approach to simplify the process of identifying significant concepts for a given domain. The author expands on his previous research in the specific field of informatization [1, 2]. Informatization is understood as the analysis of business processes, the specification of requirements and the development of software. In his previous research the author proposed a unified description of methods and suggestions to identify the essential concepts of the domain to be informatized, to introduce notations for the various levels of detail, and to specify details for the informatization aspects.
The author’s approach helped to overcome the difficulties observed during the implementation of several informatization projects and to implement several improvements:
• The development of a unified understanding of the domain to be informatized, particularly of the essential concepts and their interpretation
• The introduction of a suitable notation for the various aspects of informatization, which is necessary for the users involved in the project and appropriate for different levels of competence
• The proposal of a general methodology for performing informatization.

The proposed ONTO6 methodology appears to be expandable to other domains with several enhancements. Namely, several tools have to be added to make the process of supplying source information more convenient for the user. Firstly, a tool for entering information into the instance of the meta-ontology for the particular domain has to be developed. Secondly, an API has to be specified and implemented to allow fine-tuning of the essential concept elicitation according to the particular domain.

In the article, the author gives the specifications of the needed enhancements of the ONTO6 methodology and describes the current state of their implementation. Thus, the original informatization-specific ONTO6 methodology is transformed into an ONTO6 framework suitable for essential concept elicitation for a given domain.

Baltic DB&IS 2022 Doctoral Consortium and Forum, July 03-06, 2022, Riga, LATVIA
EMAIL: uldis.straujums@lu.lv (U. Straujums)
ORCID: 0000-0002-2212-5435 (U. Straujums)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

2. Roots of the ONTO6 methodology
The need to identify and record significant concepts for a given domain is well accepted.
Scientists have developed several knowledge representation approaches: controlled vocabularies, thesauri, classification schemes, taxonomies, topic maps, frame languages, logical theories and meta-models. All of these approaches, as well as many others, form the basis of so-called ontologies.

2.1. Concept of ontology
The concept of ontology was formally defined by Thomas Gruber in a Stanford University publication [3], in which he redefines the concept of ontology as generally applied in philosophy. The definition of ontology as formulated by Thomas Gruber is: “An ontology is the explicit specification of a conceptualisation for a domain”. Over time the concept of ontology has been defined more precisely [4]: “An ontology defines a set of representational primitives with which to model a domain of knowledge or discourse”.

2.2. Formal ontologies
As pointed out by Ruth Wilson [5], the differences between these approaches lie in how richly terms can be described and how relationships between them can be defined. These differences at the formalization level represent an ontology spectrum. A meta-model is a clearly defined model of the domain of interest, comprising concepts and rules that are essential for the construction of specific models. Each meta-model is an ontology, but it is a far richer notion: it can be used as a set of building blocks and rules, as a model for the domain of interest and as an instance of another model.

2.3. Semiformal ontologies
At one end of the ontology spectrum is a controlled vocabulary, a list of enumerated terms. Ideally, each term should have only one meaning. In practice, however, terms are accorded different meanings in different domains. If several terms have one and the same meaning, then one term is selected as the preferred one, while the others are classified as synonyms or aliases. With controlled vocabularies, more advanced ontologies can be constructed.
For example, a thesaurus is built up by adding associative relationships to the controlled vocabulary. Frame languages have the ability to express the properties, logical constraints and detailed relationships of terms. Ontologies can be used to express and analyze taxonomical relationships, as suggested by Christopher Welty and Nicola Guarino [6].

2.4. Ontology as an understanding
Ontologies can be depicted in several ways, but the presentation must be suitable for the target user. Furthermore, the presentation must adapt to users with different backgrounds and different abilities (or willingness) to grasp formal constructions. These requirements mean that the developer of the ontology and its users must come to a common understanding regarding the level that the user can comprehend. Thomas Gruber, who introduced this notion of ontology, has advocated this approach with exceptional clarity in a 2004 interview for the AIS SIGSEMIS (Semantic Web & Information Systems) Bulletin [7].

2.5. Ontology clusters
A domain-specific ontology is usually built by a team of several people who have diverse skills within the framework of the particular domain. This approach has been described by Pepijn R. S. Visser and Valentina A. M. Tamma [8], who recommend taking advantage of individual team members who have mutually complementary knowledge about the concepts of the particular domain. A hierarchical ontology is created with an application-specific ontology at the root. The definitions of the terms in this application ontology are derived from an existing top-level ontology, for which the above-mentioned authors have chosen the English language lexical database WordNet [9]. The new ontology cluster is a derived ontology that defines new concepts using the concepts already defined in the upper ontology.

3. Methodologies of ontologies development
Methodologies for the development of ontologies reflect the formal background of the ontology developer.

3.1. Logical theories
The specification can be developed in the form of a logical theory that describes the intended meaning. Nicola Guarino [10] states this principle as follows: “An ontology is a logical theory accounting for the intended meaning of a formal vocabulary, i.e. its ontological commitment to a particular conceptualization of the world. The intended models of a logical language using such a vocabulary are constrained by its ontological commitment. An ontology indirectly reflects this commitment (and the underlying conceptualization) by approximating these intended models.”

3.2. Linguistic relativism
A specification can be developed using the concept of linguistic relativism, an approach suggested by Boris Wyssusek [11]. It is understood that the concept of linguistic expression is not uniquely definable, since separate elements can have different interpretations. The criterion for the adoption of an interpretation is its correspondence to the real world. A common understanding of the language needs to be attained; such a common understanding is a prerequisite for a stable interpretation of the language. Abel Browarnik and Oded Maimon [12] propose models for ontology learning based on linguistic knowledge and on existing, wide-coverage syntactical, lexical and semantic resources: ASIUM, Text-To-Onto, TextStorm/Clouds, SYNDIKATE, OntoLearn, CRCTOL and OntoGain.

3.3. Analysis of taxonomical relations
Concepts unified through taxonomy are analyzed according to their meta-characteristics (identity, rigidity, unity, dependence), thereby revealing more readily the intended meaning of the taxonomical relations. This is the course followed by Christopher Welty and Nicola Guarino [6].

3.4. Methodologies specific to information systems
Typical concepts of information systems are formalized: the system, the subsystem, unification. Yair Wand and Ron Weber [13] use the formal model to confirm whether the system is properly divided into components.
If information systems can be regarded as a branch of science, then the field can be analysed with a methodology that looks at several processes important to its development: inclination, learning, influence of culture, consolidation, as shown in the work of Brian O’Donovan and Dewald Roode [14]. For the analysis of an operating system with an existing descriptive ontology, Peter Green and Michael Rosemann suggest changing the ontology into an ER-based meta-model. The meta-model then permits the form of the central concept of the ontology to be determined: function, activity or thing [15]. Mauri Leppänen [16] proposes the following methodology for the analysis of the output of information systems: “A system of perspectives is composed of five perspectives. These are the systelogical perspective, the infological perspective, the conceptual perspective, the datalogical perspective, and the physical perspective.” Māris Treimanis [17] recommends an aspect-oriented approach when structuring the output of an information system. For building a taxonomy of modeling method requirements, Dimitris Karagiannis, Patrik Burzynski, Wilfrid Utz and Robert Andrei Buchmann [18] propose the metamodel CoChaCo (Concept-Characteristic-Connector) for the representation and management of modeling methods, including an evaluation protocol.

3.5. Methodological applications
Ontologies are developed using various means and differ in the way they depict the world. Standards in ontology are necessary for the regulation of the following:
• What should be included in an ontology
• What are the basic categories and entities
• How are the entities depicted, taking into account the knowledge level of the prospective user.

A great variety of backgrounds is needed for the development of ontologies. The author’s aim is to develop a methodology for the user who is an expert in the problem domain, albeit without any special knowledge of formalized knowledge engineering systems.
The term “methodology” is rather ambiguous, and several definitions exist. The author has chosen the following definition: methodology is an assorted coordinated succession of techniques or methods that constitutes a general system theory or prescribes how thought-intensive activity is to be achieved [19]. In line with this definition, the author’s approach is a succession of techniques and methods that defines the thought-intensive activities for concept elicitation.

The methods comprising the methodology consist of procedures that have been proposed for the creation of a knowledge model, for the acquisition of a conceptual schema from a knowledge model, and for the detailed description of the aspects of the given domain. The techniques comprising the methodology include processes that have been developed for the application of the methods, as well as recommendations for completing the stages of the methodology: the elicitation of knowledge in the development of the knowledge model, the derivation of a conceptual schema, and the choice of the level of detail for each aspect. The author’s ONTO6 methodology is developed for the user who is an expert in the problem domain without any special knowledge of formalized knowledge engineering systems.

4. ONTO6 methodology
The development of the ONTO6 methodology was influenced by a “6W” approach based on six questions, which, it seems, was first mentioned by the Greek rhetorician Hermagoras as early as the 1st century B.C. [20]. The 6W approach can be considered a means of obtaining essential information by asking the questions What, Where, When, How, Why, Who. The author has named his methodology ONTO6, a name that was chosen not only because an ontology is used to define the knowledge model, but also because the development of the ontology was influenced by the 6W approach.
The 6W approach has been adapted to the organization of business knowledge [21], the depiction of business structures [22], [23], journalism, police work [24], the organization of brain-storming sessions [25], the sphere of architectonic design [26], user modeling [27] and the planning of information systems [17], [28], but, to the author’s knowledge, it has not been used in the area of informatization.

The ONTO6 methodology makes use of the 6W framework: What, Where, When, How, Why, Who. It is aimed at identifying concepts, determining the interaction between the objects corresponding to those concepts and determining the functionality of the objects. The ONTO6 methodology is based on a semi-informal meta-ontology. The stages of applying the ONTO6 methodology are:
• the development of a meta-ontology instance appropriate for the domain to be informatized
• the development of an initial ontology from the meta-ontology instance; and
• the gradual detailing of domain concepts that appear in the initial ontology, that is, the development of an enriched initial ontology.

The end result is an ontology cluster, comprising a meta-ontology, a meta-ontology instance, an initial ontology, and an enriched initial ontology. The ontology cluster is examined for its comprehensibility and its suitability for the domain, thus obtaining answers to several competency questions.

To achieve a sufficiently general methodology, one that can be applied to the conceptualization of diverse domains to be informatized, a base structure has been incorporated into the methodology, together with a process for obtaining a useful model of the conceptualization of a particular domain from the base structure. This base structure in the ONTO6 methodology is a knowledge model that contains the meta-concept “aspect space”. The aspect space describes all possible aspects of the domain to be informatized by grouping them into subsets. For a given aspect set $A = \{a_1, a_2, \ldots, a_i, \ldots, a_n\}$, where $i = 1, \ldots, n$, $n$ is a natural number and $a_i$ is an aspect of the domain to be informatized, the aspect space $\mathcal{P}(A)$ is the set of all the subsets of the aspect set $A$ (the power-set). Therefore any element of the aspect space is a subset of the aspect set. The aspect space remains constant for any domain to be informatized; however, a suitable aspect space element must be allocated to each domain.

From the knowledge model a usable model for the particular domain to be informatized can be derived. The ONTO6 knowledge model is shown in Figure 1. Figure 1 shows a set of knowledge of various types about the domain to be informatized, namely, the aspect subset, sub-aspects and concept instances. The relations among them are determined by procedures that elicit sub-aspects and concept instances from the textual information on the domain.

Figure 1: ONTO6 knowledge model

In order to obtain the model for the domain to be informatized, a procedure for determining the aspect subset (an instance of the aspect space) is applied to the knowledge model: the frequency of terms corresponding to each aspect is calculated, and the least frequent aspects are not included in the aspect subset. A subjectively chosen threshold value is used for determining the essential aspects. A procedure for adding sub-aspect class instances has been developed using morphological analysis of the text.

In line with the six-question approach [20], the aspect set $A$ is chosen to be $A = \{\text{What}, \text{Where}, \text{When}, \text{How}, \text{Why}, \text{Who}\}$. It is proposed to depict the concepts of the knowledge model as classes in the language OWL. For example, the term “Who” is shown as follows in the syntax of OWL RDF/XML:

[OWL RDF/XML listing not preserved; surviving text: "Who"]

The meta-concept "Aspect space" is depicted as a class of classes with restrictions on the class elements. In OWL RDF/XML syntax this appears as:

[OWL RDF/XML listing not preserved; surviving text: "This is the power-set."]

The relationship is shown as a property of the object or data type.
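
The aspect space and the threshold-based choice of the aspect subset described above can be illustrated with a small Python sketch. It is not the author's implementation: the term-to-aspect lexicon, the sample terms and the use of a relative-frequency threshold are all assumptions made for illustration, since the text only states that term frequencies per aspect are calculated and that a subjectively chosen threshold is applied.

```python
from collections import Counter
from itertools import chain, combinations

# The aspect set A from the six-question approach.
ASPECTS = ("What", "Where", "When", "How", "Why", "Who")


def aspect_space(aspects):
    """Return the power set of the aspect set as a list of frozensets."""
    items = list(aspects)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def essential_aspects(terms, term_to_aspect, threshold):
    """Pick the aspects whose share of matched terms reaches the threshold.

    The relative-share criterion is an assumption of this sketch.
    """
    counts = Counter(term_to_aspect[t] for t in terms if t in term_to_aspect)
    total = sum(counts.values())
    if total == 0:
        return frozenset()
    return frozenset(a for a, c in counts.items() if c / total >= threshold)


# Hypothetical lexicon and term list, invented for this example only.
lexicon = {
    "school": "Who",
    "ministry": "Who",
    "curriculum": "What",
    "teaching": "What",
    "riga": "Where",
}
terms = ["school", "school", "ministry", "curriculum",
         "teaching", "teaching", "riga", "the"]

selected = essential_aspects(terms, lexicon, threshold=0.2)
print(sorted(selected))
```

With this toy data the aspects Who and What each account for three of the seven matched terms and Where for one, so a threshold of 0.2 keeps only Who and What. The selected subset is by construction one of the 64 elements of the aspect space.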
For example, the relationship "characterisedBy" is shown in the syntax of OWL RDF/XML as a property of the object "Infodomain" as follows:

[OWL RDF/XML listing not preserved]

The relationships between the concepts of the knowledge model are depicted in an ontology, which is referred to as a meta-ontology because it contains the meta-concept “Aspect space”, whose instance is the concept “Aspect subset”. The meta-ontology is an essential tool of the ONTO6 methodology. The ONTO6 methodology prescribes the development of a meta-ontology instance in conformance with the domain to be informatized, the development of an initial ontology from the meta-ontology instance and the enrichment of the initial ontology in subsequent informatization. The initial ontology does not change during the informatization process.

A visualization of the ONTO6 meta-ontology is shown in Figure 2. The circles denote the possible aspects, while the arrows show possible relationships between the aspects. In instances of the meta-ontology aspect space, some aspects (together with their arrows) and some arrows between the remaining aspects may be absent.

Figure 2: ONTO6 Meta-Ontology Highest Level Simplified Visualization

The author’s ONTO6 methodology has been successfully applied to several domains to be informatized, including the Latvian Education Informatization System (LIIS). As a result of applying the ONTO6 methodology it became clear that the LIIS domain has only two essential aspects: Who and What. Therefore LIIS can be added to the class of domains to be informatized that have the aspect subset {Who, What} as their aspect space instance. Figure 3 shows the essential LIIS domain concepts: education content, teaching, management, schools, ministry, society, etc.
Figure 3: The essential LIIS domain concepts

The ontologies obtained as a result of the ONTO6 methodology stages provide answers to the competency questions that were formulated in the development of the methodology:
• what are the essential concepts in the given problem domain? (the meta-ontology instance contains only the essential aspects: Who, What)
• what are the relevant sub-concepts of the essential concepts? (the meta-ontology instance contains some sub-aspects of the essential aspects)
• which aspects of informatization must be examined in more detail? (the initial ontology includes the sub-aspect instances, i.e. ABox elements: schools, school boards, ministries, society, educational content, training, administration, infrastructure, information services)
• what kind of functionality is inherent (desired) in the specific aspect? (refined ontologies and visualizations agreed with the user describe in detail the desired functionality)
• which problem domains are similar to the given domain? (it is natural to consider as similar those domains which have the same essential aspects as the LIIS domain).

With the ONTO6 methodology it is possible to find essential concepts for different domains.

5. From methodology to framework
Some constraints on the usage of the ONTO6 methodology are: a fixed algorithm for concept elicitation, tiresome manual work to add ontology class instances (individuals) to an ontology, and manual comparison of the results with expert results. The author has looked at several tools that could help with concept elicitation, namely, tools for finding word patterns: AntConc, WordSmith Tools, #LancsBox, SCP, corpkit, TextStat and LVTagger. The author has decided to use the Latvian language text analysis tool LVTagger developed by Peteris Paikens [29], because the fine-tuning of the essential concept elicitation according to the particular domain can be easily accomplished using LVTagger.
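
As an illustration of how morphological tagger output might feed such domain-specific fine-tuning, the following Python sketch reads tab-delimited CoNLL-X-style rows, counts noun lemmas and writes the candidates to CSV so that they can be opened in a spreadsheet. It is a sketch under stated assumptions, not part of LVTagger or Cellfie: the column order follows the general CoNLL-X convention (ID, FORM, LEMMA, CPOSTAG, ...), the noun prefix "n", the sample rows and their tags, and the function names `noun_lemma_counts` and `write_candidates` are all hypothetical.

```python
import csv
import io
from collections import Counter


def noun_lemma_counts(conll_text, noun_prefix="n"):
    """Count lemmas of tokens whose coarse POS tag starts with noun_prefix.

    Assumes columns ID, FORM, LEMMA, CPOSTAG, ... separated by tabs.
    """
    counts = Counter()
    for line in conll_text.splitlines():
        if not line.strip():
            continue  # a blank line marks a sentence boundary
        cols = line.split("\t")
        if len(cols) < 4:
            continue  # skip rows that do not have the expected columns
        lemma, cpostag = cols[2], cols[3]
        if cpostag.lower().startswith(noun_prefix):
            counts[lemma.lower()] += 1
    return counts


def write_candidates(counts, min_count, out):
    """Write lemma,count rows for lemmas seen at least min_count times."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["lemma", "count"])
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for lemma, n in ordered:
        if n >= min_count:
            writer.writerow([lemma, n])


# Invented sample rows; the tags are illustrative, not real tagger output.
SAMPLE = "\n".join([
    "1\tSkolas\tskola\tn\tncfpn4\t_",
    "2\tmāca\tmācīt\tv\tvmnipt\t_",
    "",
    "1\tSkola\tskola\tn\tncfsn4\t_",
    "2\tun\tun\tc\tcc\t_",
    "3\tministrija\tministrija\tn\tncfsn4\t_",
])

buffer = io.StringIO()
write_candidates(noun_lemma_counts(SAMPLE), min_count=1, out=buffer)
print(buffer.getvalue())
```

Raising `min_count` plays the role of a domain-specific threshold: only lemmas that occur often enough in the source text are kept as candidate individuals.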
The author has decided to use the Cellfie Plugin for Protégé 5 [30] for automatically entering class instances into an ontology for a particular domain.

5.1. Concept elicitation with LVTagger
A text relevant to a particular domain is given to LVTagger as input. The output is a morphological analysis of the text. Several output formats are supported, among them CoNLL-X and tab-delimited columns. The author has applied LVTagger to create an annotated word list in the CoNLL-X format from a large text document describing a state-level informatization project (see Figure 4).

Figure 4: LVTagger usage for CoNLL-X format word list creation

The output of LVTagger serves as the input for the process of adding ontology class instances to an ontology. The instances are added with the Cellfie Plugin for the Protégé ontology editor.

5.2. Adding individuals to a class with Cellfie plugin
The list of individuals created by LVTagger can be easily converted to an Excel spreadsheet, which can then be given to the Protégé plugin Cellfie. The author has added individuals to the LIIS ontology with Cellfie (see Figure 5).

Figure 5: Adding individuals to an ontology with Cellfie plugin

6. Conclusion
The ONTO6 methodology has proven to be useful in situations where a compact view of a complicated domain is desired. It has shown itself to be well-suited to the development of a unified user understanding of the domain and to the creation of a description of the essential domain characteristics. The ONTO6 framework will serve as a convenient way to apply the ONTO6 methodology.

7. References
[1] U. Straujums, Conceptualising Informatization with the ONTO6 Methodology, in: volume 733 of Acta Universitatis Latviensis. Computer Science and Information Technologies, University of Latvia, Riga, 2008, pp. 241-260.
[2] U. Straujums, ONTO6 Methodology, Ph.D. Thesis, University of Latvia, Riga, 2010.
[3] T. R. Gruber, Toward Principles for the Design of Ontologies Used for Knowledge Sharing. KSL-93-04, Knowledge Systems Laboratory, Stanford University, 1993.
[4] T. Gruber, Ontology. Encyclopedia of Database Systems, Ling Liu and M. Tamer Özsu (Eds.), Springer-Verlag, 2009. URL: http://tomgruber.org/writing/ontology-definition-2007.htm.
[5] R. Wilson, The Role of Ontologies in Teaching and Learning. JISC Technology and Standards Watch Report TSW0402, 2004.
[6] Ch. Welty, N. Guarino, “Supporting ontological analysis of taxonomic relationships.” Data and Knowledge Engineering 39(1), (2001): 51-74.
[7] T. Gruber, “Interview Tom Gruber.” AIS SIGSEMIS Bulletin 1(3), (2004): 4-8.
[8] P. R. S. Visser, V. A. M. Tamma, An experience with ontology clustering for information integration, in: Proceedings of the IJCAI-99 Workshop on Intelligent Information, Stockholm, Sweden, 1999.
[9] Wordnet: An Electronic Lexical Database. Ed. by Christiane Fellbaum. Bradford Books, 1998.
[10] N. Guarino, Formal Ontology and Information Systems, in: N. Guarino (ed.), Formal Ontology and Information Systems. Proceedings of FOIS’98, IOS Press, Amsterdam, Trento, Italy, 1998, pp. 3-15.
[11] B. Wyssusek, Ontology and Ontologies in Information Systems Analysis and Design: A Critique, in: Proceedings of the Tenth Americas Conference on Information Systems, 2004, pp. 4303-4308.
[12] A. Browarnik, O. Maimon, Ontology Learning from Text, in: ALLDATA 2015: The First International Conference on Big Data, Small Data, Linked Data and Open Data, Barcelona, Spain, 2015.
[13] Y. Wand, R. Weber, “An Ontological Model of an Information System.” IEEE Transactions on Software Engineering 16(11), (1990): 1282-1292.
[14] B. O’Donovan, D. Roode, “A framework for understanding the emerging discipline of information systems.” Information Technology & People 15(1), (2002): 26-41.
[15] P. Green, M. Rosemann, Ontological Analysis of Business Systems Analysis Techniques, Business Systems Analysis with Ontologies. UQ Business School, Australia; Queensland University of Technology, Idea Group Publishing, Australia, 2005.
[16] M. Leppänen, An Ontological Framework and a Methodical Skeleton for Method Engineering. Helsinki, 2005.
[17] M. Treimanis, ISTechnology – Technology Based Approach to Information system Development, in: Proceedings of the Third International Baltic Workshop “Databases and Information Systems”, vol. 2, Riga, 1998, pp. 76-90.
[18] D. Karagiannis, P. Burzynski, W. Utz, R. A. Buchmann, A Metamodeling Approach to Support the Engineering of Modeling Method Requirements, in: 2019 IEEE 27th International Requirements Engineering Conference (RE), Jeju Island, Korea (South), 2019, pp. 199-210.
[19] IEEE Standard Glossary of Software Engineering Terminology. IEEE Computer Society. IEEE Std 610.12-1990, New York, 2002.
[20] D. W. Robertson, Jr., “A Note on the Classical Origin of 'Circumstances' in the Medieval Confessional.” Studies in Philology 43(1), (1946): 6-14.
[21] Organizing Business Knowledge: The MIT Process Handbook. Ed. Malone, Thomas W., Crowston, Kevin, Herman, George A. MIT Press, 2003.
[22] J. F. Sowa, J. A. Zachman, “Extending and formalizing the framework for information systems architecture.” IBM Systems Journal 31(3), (1992): 590-616.
[23] J. A. Zachman, Framework2. The Concise Definition, 2008.
[24] SixWs. Online Encyclopedia Wikipedia, 2022. URL: http://en.wikipedia.org/wiki/Six_Ws.
[25] Mindtools. Starbursting template, 2022. URL: http://www.mindtools.com/pages/article/newCT_91.htm.
[26] Ju-Hung Lan, A Preliminary Study of Knowledge Management in Collaborative Architectural Design, in: CAADRIA2004, Seoul, Korea, 2004, pp. 35-47.
[27] M. Yudelson, T. Gavrilova, P. Brusilovsky, Towards User Modeling Meta-Ontology, in: UM2005, LNAI 3538, Edinburgh, UK, 2005, pp. 448-452.
[28] J. Iljins, M. Treimanis, From Organization Business Model to Information System: One approach and Lessons Learned, in: 19th International Conference on Information Systems. Prague, Czech Republic, 2010.
[29] P. Paikens, Latvian morphological tagger, 2022. URL: https://github.com/PeterisP/LVTagger.
[30] Cellfie Plugin, 2022. URL: https://github.com/protegeproject/cellfie-plugin/wiki.