Semantic Interoperability Issues in the Master’s
    Curriculum in Artificial Intelligence at Sofia University

                           Maria Nisheva-Pavlova[0000-0002-9917-9535]

        Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski
                                marian@fmi.uni-sofia.bg


      Abstract. In today’s world of digital transformation and big data, the issues
      related to the semantic interoperability of different information systems that
      support decision-making in the same or similar areas are becoming increasingly
      important. It is therefore natural for such issues to be a focus of education at
      master’s level in a variety of professional fields. The paper discusses the
      experience in this regard of the Master’s program in Artificial Intelligence at the
      Faculty of Mathematics and Informatics at Sofia University, focusing on some good
      examples of student projects.


      Keywords: Semantic Interoperability, Ontology, Ontology Matching, Semantic
      Enrichment, Semantic Search.


1    Introduction
In recent years, information systems have had to process huge amounts of
heterogeneous data coming from different sources and in various formats.
Working with big heterogeneous datasets requires appropriate methods and tools
for data integration and for achieving semantic interoperability of the
respective software systems.
     Semantic interoperability of two information systems requires that each of
them understand the semantics of the information sent or requested by the other,
as well as the semantics of its information sources. Currently, ontologies
underlie the only widely accepted paradigm for representing and managing open
knowledge that can be shared and reused in a way that allows automatic
interpretation and inference.
     Therefore, the study of ontology design methodologies and formal methods
for ontology matching, together with the acquisition of practical skills for using
ontologies in building different types of intelligent software systems, is one of
the important goals of university education at master’s level in the field of
Artificial Intelligence.


 Copyright © 2021 for this paper by its authors. Use permitted under
 Creative Commons License Attribution 4.0 International (CC BY 4.0).
2   Overview of the Master’s program in AI
The master’s program in Artificial Intelligence (AI) has been operating successfully
at the Faculty of Mathematics and Informatics at Sofia University for nearly 20
years. Its educational objectives include mastering of deep theoretical knowledge
in the classical and some modern areas of Artificial Intelligence and acquisition of
various practical skills needed for the application of AI methods and techniques
in a wide range of fields of Informatics and Information Technologies. The
curriculum includes courses in fundamentals of Artificial Intelligence, knowledge
modeling and design of knowledge bases, machine learning (in particular deep
learning), information retrieval, data mining and knowledge discovery in large
datasets, natural language processing, image processing and pattern recognition,
embedded and autonomous systems, neural networks and genetic algorithms,
robot control, semantic technologies, recommender systems, legal and ethical
aspects of the development and use of AI systems, etc.
     The successful graduates of the master’s program in AI are able to apply
their knowledge and skills in research and educational organizations, as well as
in leading software companies in the development of, for example:
     • software for data analysis and knowledge discovery in big data;
     • software for semantic web and semantic network services;
     • intelligent search engines;
     • intelligent user interfaces;
     • expert systems, recommender systems, intelligent virtual assistants,
         intelligent learning environments and other types of knowledge-based
         software systems;
     • smart databases;
     • image processing and image recognition tools;
     • different types of intelligent embedded systems: intelligent robots, smart
         home systems, etc.
     The education in the master’s program follows good methodological
practice: abstract theoretical knowledge is accumulated through lectures, while
workshops, homework assignments and especially course projects provide
opportunities to understand it and to acquire skills for applying it in real
problem situations. Many of the course projects act as bridges between
different subjects.

3   Theoretical aspects – knowledge modeling and reasoning
The curriculum of the master’s program includes two compulsory courses whose
syllabi consistently cover the theoretical foundations and some technological
issues of semantic interoperability.


     The Knowledge Representation and Engineering course has mostly abstract
content. Following the methodology proposed by Brachman and Levesque in [2],
it introduces the basic principles of functioning of knowledge-based systems
(KBS) and some classical and advanced methods for their design and
implementation. Special attention is paid to the problems of domain analysis and
the conceptualization of domain knowledge. The most important theoretical and
practical aspects of a set of classical and modern methods for knowledge
representation and reasoning are discussed. Students who have successfully
passed the course are expected to be able to analyze and construct conceptual
models of knowledge and to design KBS aimed at solving complex tasks with
various characteristics. The course syllabus covers the following topics:
     • Key concepts: knowledge, knowledge representation and reasoning.
        Knowledge-based systems. Knowledge engineering;
     • The language of first-order logic (FOL). Syntax, semantics and
        pragmatics of FOL;
     • Resolution. Reasoning with Horn clauses;
     • Production rule systems. RETE algorithm;
     • Object-oriented knowledge representation. Frames;
     • Structured description of knowledge. Computing entailments.
        Taxonomies and classification;
     • Inheritance. Strict inheritance. Strategies for defeasible inheritance;
     • Default reasoning. Closed-world reasoning. Circumscription;
     • Knowledge representation and reasoning with KRL;
     • Concepts and language tools for describing information resources with
        RDF/RDFS;
     • Ontologies – definition, classification, basic characteristics and
        requirements, applications. Concepts and language tools for describing
        ontologies with OWL.
     As basic readings we recommend the classic textbooks [1 – 3] as well as the
W3C standard recommendations [4 – 6].


 Fig. 1. Courses in the Master’s in AI curriculum providing semantic interoperability knowledge
 and technical skills.

     The Knowledge Bases course is the second one in the series shown in Fig. 1.
It aims to acquaint students with the current state of research and practical
developments in the field of knowledge bases, focusing primarily on modern
tools for creating knowledge bases with different characteristics and on
their application in the development of various types of KBS. Issues related to
the basic principles and technologies of the Semantic Web, the semantic
interoperability of information systems, the creation of semantic digital
libraries, etc. are also studied. The main topics of the course syllabus related
to semantic interoperability include:
     • Semantic web and semantic technologies. Language standards for the
         Semantic web;
     • Ontology engineering methodologies. Ontology mapping and merging;
     • Cyc’s knowledge base – the world’s broadest and deepest commonsense
         knowledge base. Cyc’s inference engines;
     • Semantic databases. Tools for creating and using semantic databases;
     • Semantic annotation. Semantic enhancement. Semantic search;
     • Semantic digital libraries.
     The knowledge and skills gained through these compulsory courses are
extended by a number of elective courses, among which the most relevant to
the topic discussed is the Semantic Web course.
     The Semantic Web course presents the basics of semantic technologies and
of working with RDF data and linked open data. The main types of ontologies
and examples of their application are discussed, and methods for implementing
and working with semantic knowledge graphs are presented together with their
applications. The course has been taught by experts from Ontotext¹, a
recognized leader in enterprise knowledge graph technology and semantic
database engines, and covers several specific topics, including:
     • Knowledge graphs;
     • RDF data model and its serialization formats;
     • Semantic integration of heterogeneous data;
     • Linked open data;
     • Data visualization.
     The issues related to the semantic interoperability and the reusability of
knowledge and data, in particular in unforeseen problem situations, connect these
courses and form one of the thematic areas of study in the master’s program in AI.

     ¹ https://www.ontotext.com/
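
     As a minimal illustration of the topics "RDF data model" and "Semantic
integration of heterogeneous data", the following Python sketch maps records
from two sources with different field names onto one shared vocabulary and
stores them as subject-predicate-object triples. The sources, the ex: prefix,
the mapping tables and the function names are all assumptions introduced here
for illustration; they are not taken from the course materials.

```python
# Two hypothetical sources describing the same kind of entity with different field names.
SOURCE_A = [{"id": "c1", "title": "Knowledge Bases", "ects": 5}]
SOURCE_B = [{"code": "c2", "name": "Semantic Web", "credits": 5}]

# Hypothetical mappings from each source's fields to a shared vocabulary (prefix ex:).
MAPPING_A = {"title": "ex:label", "ects": "ex:credits"}
MAPPING_B = {"name": "ex:label", "credits": "ex:credits"}


def to_triples(records, id_field, mapping):
    """Convert flat records into (subject, predicate, object) triples."""
    triples = set()
    for record in records:
        subject = "ex:" + record[id_field]
        triples.add((subject, "rdf:type", "ex:Course"))
        for field, predicate in mapping.items():
            triples.add((subject, predicate, record[field]))
    return triples


def objects(graph, predicate):
    """Return all objects of triples with the given predicate."""
    return {o for (_, p, o) in graph if p == predicate}


# After mapping, one query works over data from both sources.
graph = to_triples(SOURCE_A, "id", MAPPING_A) | to_triples(SOURCE_B, "code", MAPPING_B)
labels = sorted(objects(graph, "ex:label"))
```

     Once both sources are expressed in the same vocabulary, a consumer no
longer needs to know which system a fact came from, which is the core idea
behind semantic interoperability at the data level.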

4   Practical aspects – course projects
A significant part of the students’ independent work, aimed at understanding
and mastering the abstract theory, is related to the preparation of homework
assignments and the development of course projects.
     Most homework assignments and course projects have generic topics and
formulations that individual students may refine according to their
interests and preferences.
     Here are several examples.
     Example 1
     Design a knowledge base for a subject area of your choice. The concepts and
their properties (roles) should be described in terms of description logic (the DL
language [2]).
     Based on these descriptions, build a corresponding ontology implemented
with the means of Protégé/OWL [7] (concepts ↔ classes, roles ↔ properties,
constants ↔ individuals/instances).
     The knowledge base should include both atomic and different types of
non-atomic concepts (constructed by the operators EXISTS, FILLS, ALL, AND).
Provide appropriate statements of the three main types: d ⊑ e, d ≐ e, c → e.
     Quantitative characteristics:
     • number of concepts (classes): at least 20,
     • number of constants (individuals/instances): at least 10,
     • roles (properties): at least 10. Include properties with different
        characteristics (inverse, functional, transitive) and with appropriate
        and diverse domains and ranges.
     Describe appropriate examples for automatic reasoning (using a reasoner of
your choice) on the knowledge base – at least one inference of the type
KB ⊨ (c → e) and at least one inference of the type KB ⊨ (d ⊑ e).
     Describe at least one example of classification of the knowledge base. The
same example should be illustrated with Protégé/OWL.
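
     The two inference types requested in Example 1 can be illustrated for the
simplest case, in which only atomic concepts are involved. The Python sketch
below is not a description logic reasoner: it ignores the operators EXISTS,
FILLS, ALL and AND and only follows told statements of the form d ⊑ e. The toy
knowledge base and all function names are assumptions made for illustration.

```python
# Told subsumptions d ⊑ e between atomic concepts (hypothetical toy knowledge base).
SUBSUMPTIONS = {("Surgeon", "Doctor"), ("Doctor", "Person")}
# Told membership assertions c → e: constant c is an instance of concept e.
ASSERTIONS = {("john", "Surgeon")}


def subsumers(concept, subsumptions):
    """All concepts e with KB ⊨ concept ⊑ e (reflexive-transitive closure)."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in subsumptions:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def entails_subsumption(d, e):
    """KB ⊨ (d ⊑ e) for atomic concepts d and e."""
    return e in subsumers(d, SUBSUMPTIONS)


def entails_membership(c, e):
    """KB ⊨ (c → e): some told concept of constant c is subsumed by e."""
    return any(const == c and entails_subsumption(concept, e)
               for const, concept in ASSERTIONS)
```

     In a real solution these checks are delegated to an OWL reasoner, which
also handles defined (non-atomic) concepts and computes the full classification.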


     Example 2
     Based on a series of paragraphs (one or more) in an article of your choice in
Wikipedia, create an ontology that presents the concepts, objects, and
relationships described in the text. Implement the ontology in Protégé/OWL or
Apache Jena/OWL² and check its consistency.
     The paragraphs in the article should be selected so that the ontology contains
both primitive and defined classes.
     Describe an appropriate example for performing automatic reasoning based
on the created ontology.
     Example 3
     The project is aimed at in-memory implementation of the Web Annotation
Data Model [8]. The task has a general formulation as follows.
     Write a program that implements semantic annotation of text with support
for the main elements of the annotation model (body, target, selector – for
example text position selector).
     Your program should read a short text (no longer than two paragraphs) and
annotate it based on a publicly available ontology of your choice, tailored to the
content of the text. The result of the work of the program should include finding
the location in the text of at least three previously known (explicitly stated)
concepts from the ontology.
     For the implementation of the project, students may use a technology of
their choice. A preferred option is to use the DBpedia ontology³, to program in
Java and to present the annotations in an open format that facilitates their
use for various purposes, for example JSON-LD [9]. Thus, in the process of
developing their course project, students learn and master new technologies
and software platforms.
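
     The structure of such an annotation can be sketched as follows. The sketch
is written in Python for brevity, although the assignment names Java as the
preferred option. The keys "body", "target", "selector" and the selector type
TextPositionSelector come from the Web Annotation Data Model [8]; the concept
table, the example.org IRIs and the function name are assumptions made for
illustration. Matching is a plain case-insensitive substring search on the
first occurrence of each label, which is far simpler than what a real
annotator would do and is reliable only for plain ASCII text.

```python
import json

# Hypothetical concept labels mapped to hypothetical ontology IRIs.
CONCEPTS = {
    "university": "http://example.org/ontology/University",
    "ontology": "http://example.org/ontology/Ontology",
    "student": "http://example.org/ontology/Student",
}


def annotate(text, source_iri, concepts):
    """Build one annotation for the first occurrence of each known concept label."""
    annotations = []
    lowered = text.lower()
    for label, concept_iri in concepts.items():
        start = lowered.find(label)
        if start == -1:
            continue  # label does not occur in the text
        annotations.append({
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "id": f"http://example.org/anno/{len(annotations) + 1}",
            "type": "Annotation",
            "body": concept_iri,
            "target": {
                "source": source_iri,
                "selector": {
                    "type": "TextPositionSelector",
                    "start": start,
                    "end": start + len(label),  # position after the last character
                },
            },
        })
    return annotations


TEXT = "Every student at the university builds an ontology."
ANNOTATIONS = annotate(TEXT, "http://example.org/text/1", CONCEPTS)
JSON_LD = json.dumps(ANNOTATIONS, indent=2)
```

     Because the result is ordinary JSON-LD, it can be stored, exchanged or
loaded into an RDF store without a project-specific parser.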
     Many students take the opportunity to work on course projects on topics
that they propose themselves (and agree with the professor), and such
course projects often grow into valuable master’s theses.

5   Research and development – good practices of master’s theses
     Traditionally, graduates of our master’s program in AI develop excellent
diploma projects in modern and complex areas. The results achieved in most of them
have been published by graduates and their supervisors in authoritative specialized
scientific journals. In most cases, reusability and semantic interoperability with
other systems are part of the characteristics of the developed software products.
     As an example of good practice in this regard, we can consider the master’s
thesis entitled “Virtual health assistant” [10], defended in March 2021.

    ² https://jena.apache.org/documentation/inference/#owl
    ³ https://wiki.dbpedia.org/

     The aim of the thesis is to create a web-based virtual health assistant that
provides users with fast and convenient access to health information related to
the symptoms and treatment of various socially significant diseases. For this
purpose, the assistant uses an appropriate knowledge base. It supports automatic
collection of data from verified sources on the Internet, processing of this
data, and building and extending a knowledge graph, which is the main
component of the knowledge base.
     The health assistant provides a user-friendly interface and receives as input
a set of symptoms describing the user’s condition and complaints. As a result,
the assistant generates an appropriate answer, indicating probable diagnoses and
detailed information about each of them, including a description of the disease,
its synonyms and symptoms, as well as medications that help to treat it.
     In addition to the information provided, the health assistant asks
questions about further symptoms that the user may have missed. If the user
indicates other symptoms, the result of the assistant’s work can be updated. All
components of the health assistant can be easily extended with additional
functionality.
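
     The interaction described above (symptoms in, probable diagnoses out,
follow-up questions about further symptoms) can be sketched on a toy knowledge
graph. This is not the implementation from [10]: the graph contents use
placeholder names, and the ranking by share of matched symptoms is an
assumption chosen only to make the idea concrete.

```python
# Toy knowledge graph: hypothetical diseases linked to hypothetical symptoms.
GRAPH = {
    "disease_A": {"symptom_1", "symptom_2", "symptom_3"},
    "disease_B": {"symptom_2", "symptom_4"},
    "disease_C": {"symptom_5"},
}


def probable_diagnoses(reported, graph):
    """Rank diseases by the share of their symptoms that the user reported."""
    scores = {}
    for disease, symptoms in graph.items():
        overlap = len(symptoms & reported)
        if overlap:
            scores[disease] = overlap / len(symptoms)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def follow_up_symptoms(reported, graph):
    """Symptoms of candidate diseases that the user has not mentioned yet."""
    missing = set()
    for disease in probable_diagnoses(reported, graph):
        missing |= graph[disease] - reported
    return sorted(missing)
```

     Calling probable_diagnoses again after the user confirms one of the
follow-up symptoms corresponds to the result update mentioned in the text.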
     Another good example of a master’s thesis that successfully addresses most
issues of semantic interoperability is the one on “Intelligent system for answering
specialized questions about COVID-19”. It aims to develop a question answering
system based on information retrieval, natural language processing and text
mining techniques. The system was implemented, and the planned experiments with
it were conducted, using the freely available COVID-19 Open Research Dataset
(CORD-19) presented in one of the Kaggle competitions⁴. CORD-19 was prepared as
a reliable collection of more than 500,000 scholarly resources about COVID-19,
SARS-CoV-2 and related coronaviruses in order to assist the medical community in
preparing answers to as many high-priority questions related to COVID-19 as
possible.
     After the necessary data processing, the system uses BERT [11], a
pre-trained language model based on deep neural networks, to capture the context
of the words in the question and in the possible answers and to find the most
accurate answer to a given question.

6   Conclusion
The analysis of nearly 20 years of experience of the master’s program in
Artificial Intelligence at Sofia University shows that the chosen approach gives
very good results in the education of AI specialists at master’s level: lecture
courses that provide deep theoretical knowledge at an abstract level are combined
with the challenge for students to acquire modern technological skills while
developing appropriate course and diploma projects. The approach is particularly
suitable for training in a number of areas that underlie the acquisition of
knowledge and skills to create semantically interoperable information systems
that can work flexibly with large heterogeneous datasets.

    ⁴ https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

7    Acknowledgements
The presented work has been supported by Project BG05M2P001-1.001-0004
“Universities for Science, Informatics and Technologies in the e-Society (UNITe)”,
funded by the Operational Program “Science and Education for Smart Growth” and
co-funded by the European Regional Development Fund.

References
1. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach (3rd ed.). Pearson
    Education Ltd. (2010).
2. Brachman, R., Levesque, H.: Knowledge Representation and Reasoning. Elsevier (2004).
3. Brachman, R., Levesque, H.: Readings in Knowledge Representation. Morgan Kaufmann
    (1985).
4. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation 25 February 2014.
    http://www.w3.org/TR/rdf11-concepts/, last accessed 2021/03/30.
5. RDF Schema 1.1. W3C Recommendation 25 February 2014.
    http://www.w3.org/TR/rdf-schema, last accessed 2021/03/30.
6. OWL 2 Web Ontology Language Document Overview (Second Edition). W3C Recommendation
    11 December 2012. http://www.w3.org/TR/owl2-overview/, last accessed 2021/03/30.
7. Horridge, M., Brandt, S.: A Practical Guide to Building OWL Ontologies Using Protégé 4
    and CO-ODE Tools, Edition 1.3. University of Manchester (2011).
    http://owl.cs.manchester.ac.uk/research/co-ode, last accessed 2021/03/30.
8. Sanderson, R., Ciccarese, P., Young, B.: Web Annotation Data Model (W3C Recommendation
    23 February 2017). https://www.w3.org/TR/annotation-model/, last accessed 2021/03/30.
9. JSON for Linking Data. https://json-ld.org/, last accessed 2021/03/30.
10. Tsanova, R.: Virtual Health Assistant. Master Thesis. Faculty of Mathematics and Informatics,
    Sofia University St. Kliment Ohridski (2021).
11. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional
    Transformers for Language Understanding. Proceedings of the 2019 Conference of the North
    American Chapter of the Association for Computational Linguistics: Human Language
    Technologies, Vol. 1, pp. 4171–4186. Association for Computational Linguistics (2019).
    https://www.aclweb.org/anthology/N19-1423.pdf, last accessed 2021/03/30.

