Semantic Library as a Tool of Defining a Scientific Subject Area
Olga M. Ataeva, Vladimir A. Serebryakov

Dorodnicyn Computing Center FRC CSC of RAS, Vavilov str., 40, Moscow, 119333, Russia

                Abstract
                The paper considers an information system designed to represent a subject area associated
                with science and its features. General concepts for the formal description of such a subject
                area in the knowledge base of the semantic library are highlighted. The peculiarity of these
                areas is that the data structure is subject to frequent changes. Therefore, knowledge
                organization tools, of which the semantic library is one, should be sufficiently universal and
                should not require deep technical knowledge. The paper describes the functionality of the
                system and its use when setting up a subject area. For each area, the resources can differ
                both in format and in composition. The set of concepts that form the description of the
                library's content should be universal enough to be adapted to the needs of a particular area.
                Three levels of metadata are used to represent the data.

                Keywords
                Semantic library, ontology, knowledge representation

1. Introduction
    Researchers have dealt with the semantic organization of knowledge since ancient times. Libraries
specialized in specific areas usually use their own classifiers to organize their resources. This approach
provides a more detailed analysis of the content of documents and correlates the semantic concepts of
the library's content with a particular direction within the specialized area of knowledge.
    The accumulated data have become available to a wide range of users through the network, and the
functionality of digital libraries is becoming more and more diverse in order to satisfy the information
needs of users.
    The focus of the proposed work is subject areas related to science and their features. General
concepts for their formal description in the knowledge base are highlighted. The peculiarity of these areas
is that the data structure is subject to frequent changes [1–4]. The main emphasis is placed on the
presentation of a generalized model of a scientific subject area and its features, its implementation in
search engines, and its differences from classical approaches to information retrieval in scientific data sets.
    New problems and challenges also relate to the representation of knowledge in the information
environment for various fields of science using modern approaches. To ensure the consumption of
scientific information at a new level, it is first of all necessary to move to a semantically meaningful
representation of scientific knowledge extracted from information in the digital environment.
    To represent the data of the subject area, the system uses metadata of three levels: (1) universal
concepts without reference to the subject area, or metametadata; (2) concepts for describing a specific
subject area, or metadata, the definitions of which are given in terms of the first level; (3) application
domain data as such, represented in terms of second-level metadata. Based on this metadata, user
interaction interfaces are configured for navigation, editing and information retrieval.
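    The three levels can be pictured with a minimal sketch. It is an illustration only, not the LibMeta
implementation: the class names MetaConcept, DomainConcept and Record and the sample values are
assumptions introduced here.

```python
from dataclasses import dataclass, field


# Level 1 (metametadata): a universal concept, independent of any subject area.
@dataclass(frozen=True)
class MetaConcept:
    name: str


# Level 2 (metadata): a subject-area concept defined in terms of a level-1 concept.
@dataclass(frozen=True)
class DomainConcept:
    name: str
    defined_by: MetaConcept


# Level 3 (data): a domain record expressed in terms of a level-2 concept.
@dataclass
class Record:
    concept: DomainConcept
    values: dict = field(default_factory=dict)


def describe(record: Record) -> str:
    """Walk from the data level up to the metametadata level."""
    concept = record.concept
    return f"{concept.defined_by.name} > {concept.name} > {record.values['title']}"


# Hypothetical sample data for illustration.
RESOURCE = MetaConcept("InformationResource")
publication = DomainConcept("Publication", defined_by=RESOURCE)
paper = Record(publication, {"title": "Semantic Library", "year": 2021})
```

    In this sketch, an interface generator would only need to know the first level: everything specific to
a subject area is reachable from a record through its concept.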
    The main task of creating and describing a generalized representation of scientific knowledge for a
certain area is to help experts in organizing knowledge and providing access to it [5–9]. At the same
time, the means of organizing knowledge should be sufficiently universal and should not require deep
technical knowledge.

SSI-2021: Scientific Services & Internet, September 20–23, 2021, Moscow (online)
EMAIL: oli@ultimeta.ru (O.M. Ataeva); serebr@ultimeta.ru (V.A. Serebryakov)
ORCID: 0000-0003-0367-5575 (O.M. Ataeva); 0000-0003-1423-621X (V.A. Serebryakov)
© 2021 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

    The task was to create an information system that could take into account the whole variety of
resource types of a scientific subject area that can be stored in it, and at the same time support its
terminological description. One of the main tasks to be solved in the context of the system is to provide
the ability to integrate data from sources that support a semantic description of the data model. In
effect, such a system should be a constructor with an adaptable model of stored content, from which a
digital library of any direction can be created. An adaptable data model makes it possible to describe an
arbitrary model of the library content within a subject area fixed in terms of a thesaurus.
    A new-generation information system should therefore take into account the variety of types of
resources in a scientific subject area and at the same time support its terminological description. The
main tasks of such a system are to make it possible to integrate data from sources that support a
semantic description of the data model, and to develop an ontological representation of the content of
the subject area that allows any type of resource from the integrated sources to be described. At the
moment, the distribution kit of the semantic library has been implemented and is ready to use. The
following is a description of the main ideas of the data model and of the subsystems included in the
distribution of the information system.

2. About the data model
    In the information model of the semantic library, concepts were introduced to describe the contents
of the library for a certain subject area [10–13]. These concepts make it possible to construct a
description of any type of information resource for this area. By definition, the information objects that
make up the contents of a library have a distributed nature: data about an information object can come
from various sources and be aggregated, either by saving the data directly in the library itself or by
storing links to identical objects in the source data.
    To describe the resources that make up the content of a specific subject area, concepts are used that
are common to any of them. That is, the set of concepts that form the description of the library's content
should be so universal that it can be adapted to the needs of a particular area.
    The content of the library is closely related to the thesaurus, which maintains relationships of various
types both between concepts and between concepts and information objects. This makes it possible to
implement flexible custom search, the result of which is a balanced list of objects in the subject area.
Collections of a wide variety of resource types are defined based on the same thesaurus. This approach
is particularly useful for creating separate custom collections.
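    One way to picture a search that relies on both kinds of thesaurus relationships is the sketch below.
It is an assumption-laden illustration rather than the LibMeta algorithm: the relation name "narrower",
the depth limit, the ranking by thesaurus distance and all sample data are introduced here for the
example.

```python
from collections import deque


def expand(concept, relations, kinds=("narrower",), max_depth=2):
    """Breadth-first walk over concept-to-concept relations.

    `relations` maps (concept, relation kind) to a list of related concepts.
    Returns a dict of reached concepts and their distance from the start.
    """
    distance = {concept: 0}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if distance[current] == max_depth:
            continue
        for kind in kinds:
            for neighbour in relations.get((current, kind), ()):
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
    return distance


def semantic_search(concept, relations, links, **options):
    """Collect objects linked to the concept or to concepts reached from it.

    `links` maps a concept to the identifiers of information objects linked to it.
    Objects are ordered by the distance of their closest concept, then by identifier.
    """
    best = {}
    for reached, depth in expand(concept, relations, **options).items():
        for obj in links.get(reached, ()):
            best[obj] = min(depth, best.get(obj, depth))
    return sorted(best, key=lambda obj: (best[obj], obj))


# Hypothetical thesaurus fragment and object links.
relations = {
    ("information retrieval", "narrower"): ["semantic search", "full-text search"],
    ("semantic search", "narrower"): ["query expansion"],
}
links = {
    "information retrieval": ["obj3"],
    "semantic search": ["obj1", "obj3"],
    "query expansion": ["obj2"],
}
results = semantic_search("information retrieval", relations, links)
```

    The same two mappings could also back a collection definition: a collection would then be the set of
objects reachable from a chosen thesaurus node.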
    The concepts are divided into three categories: the first includes definitions of the concepts of the
content of the semantic library; the second defines the concepts necessary to support the terms of the
domain thesaurus; and the third includes the definitions necessary to describe the processes of
integrating the content of these resources [14–23]. Based on these definitions, the main processes are
described, such as the integration of data from different sources, categorization / classification, mapping
of different data source models to a given subject area, construction of equivalence classes, etc.

3. Architecture
    Consider a formal description of the system that defines its goals, functions, externally visible
properties and interfaces. It also includes a description of the components of the system and their
relationships, along with the principles that govern its design, operation and possible subsequent
development. This description covers the software subsystems, their externally visible properties, the
relationships between subsystems, and restrictions on their use. Moreover, each subsystem can consist of
several levels of abstraction, and each level can have its own architecture. Below is a list of the main
subsystems:
      Subsystem for describing the content of the information system,
      Thesaurus control subsystem,

       Subsystem of automated data processing and presentation,
       Subsystem for the implementation of data integration tasks,
       Recommender subsystem.

   Each of these subsystems is responsible for a specific functionality and uses its own subset of
concepts from the information model.

4. Content description subsystem
    Let us consider the subsystem that determines the basic settings of the system. The universality of
the system content definition rests on the set of concepts that make up the information model of the
LibMeta library content: an information resource and an information object, which describes resource
instances. An information resource is the main unit for describing the content of a library, and an
information object represents an instance of an information resource. Each of them has its own unique
identifier. In effect, the semantic meaning of an information resource is equivalent to the concept of an
ontology class with some restrictions in its description. The structure of the description of information
objects is determined by the concepts of attribute and attribute set, which are defined when describing
the corresponding resource. An attribute is an element describing a property of a resource, and an
attribute set is a collection of attributes of different kinds. The types of attributes are as follows:
attribute, file, object, numeric, text, string. In addition to the range of values of an attribute, its
important characteristics are its type and the number of values it may take. To describe a specific
information resource, the concept of an attribute value is used, which is closely related to the concept
of an attribute and is in effect a container for storing specific values of an information object of a
certain type.
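    The relationships between these concepts can be sketched as a small data structure. This is a
minimal illustration under stated assumptions, not the LibMeta schema: the class and field names
(Attribute, value_type, max_values, attribute_set, add_value) and the use of random identifiers are
introduced here; only the six attribute types come from the text.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# The attribute types listed in the text.
ATTRIBUTE_TYPES = {"attribute", "file", "object", "numeric", "text", "string"}


@dataclass
class Attribute:
    name: str
    value_type: str
    max_values: Optional[int] = 1  # None means "any number of values"

    def __post_init__(self) -> None:
        if self.value_type not in ATTRIBUTE_TYPES:
            raise ValueError(f"unknown attribute type: {self.value_type}")


@dataclass
class InformationResource:
    """Plays the role of an ontology class; owns an attribute set."""
    name: str
    attribute_set: Dict[str, Attribute] = field(default_factory=dict)
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)

    def define(self, attribute: Attribute) -> None:
        self.attribute_set[attribute.name] = attribute


@dataclass
class InformationObject:
    """An instance of a resource; attribute values are stored per attribute."""
    resource: InformationResource
    values: Dict[str, List[Any]] = field(default_factory=dict)
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)

    def add_value(self, attribute_name: str, value: Any) -> None:
        attribute = self.resource.attribute_set.get(attribute_name)
        if attribute is None:
            raise KeyError(f"{attribute_name} is not defined for {self.resource.name}")
        stored = self.values.setdefault(attribute_name, [])
        if attribute.max_values is not None and len(stored) >= attribute.max_values:
            raise ValueError(f"{attribute_name} accepts at most {attribute.max_values} value(s)")
        stored.append(value)


# Hypothetical resource definition and one object of that resource.
article = InformationResource("Article")
article.define(Attribute("title", "string"))
article.define(Attribute("author", "object", max_values=None))

paper = InformationObject(article)
paper.add_value("title", "Semantic Library")
paper.add_value("author", "O. M. Ataeva")
paper.add_value("author", "V. A. Serebryakov")
```

    Because the structure of an object is read from its resource at run time, changing the attribute set
changes what objects may contain without touching the code, which is the kind of adaptability the
model aims for.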
    These concepts provide a structured description of content and support its adaptability. This
approach also makes it possible to describe specific resources and their objects in the form of RDF
triples and provides a SPARQL access point for publishing data in machine-readable formats.
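    As a rough illustration of what such a triple-based description could look like, the sketch below
writes one information object as N-Triples lines using only the standard library. The base URI
http://example.org/libmeta/ and the URI layout are hypothetical, and all values are emitted as plain
string literals for brevity; the actual LibMeta serialization is not specified in this paper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/libmeta/"  # hypothetical base URI, not a real LibMeta address


def _literal(value) -> str:
    """Render a value as a plain N-Triples string literal with basic escaping."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def to_ntriples(object_id, resource_name, values):
    """Return N-Triples lines for one information object.

    `values` maps an attribute name to the list of its values.
    """
    subject = f"<{BASE}object/{object_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}resource/{resource_name}> ."]
    for attribute_name, attribute_values in values.items():
        predicate = f"<{BASE}attribute/{attribute_name}>"
        for value in attribute_values:
            lines.append(f"{subject} {predicate} {_literal(value)} .")
    return lines


# Hypothetical object used for illustration.
triples = to_ntriples("42", "Article", {"title": ['A "semantic" library']})
```

    Each line is one triple: the first states the resource the object is an instance of, and the others
carry attribute values.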
    In general, a specific implementation of the library content model can be based on an imported
ontology: its classes are converted into resources, its properties are described in terms of LibMeta
attributes, and the attribute sets in effect define the domains of the ontology properties. When
building a library resource model based on this ontology, all URIs of properties, relations and classes
of the selected ontology are saved. If necessary, when importing the selected ontology into the system,
the set of concepts can be expanded or, on the contrary, reduced by means of the system.
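    The mapping described above can be sketched as follows. This is a simplified illustration, not the
LibMeta importer: the input format (a list of class URIs and a list of property tuples), the function
name import_ontology, the keep parameter and the example.org URIs are all assumptions made for the
example.

```python
def import_ontology(classes, properties, keep=None):
    """Map ontology classes to resources and properties to attributes.

    classes:    iterable of class URIs.
    properties: iterable of (property URI, domain class URI, value type) tuples.
    keep:       optional set of class URIs to import; None imports every class.

    Original URIs are preserved on both resources and attributes. A property is
    placed in the attribute set of the resource that corresponds to its domain.
    """
    resources = {
        uri: {"uri": uri, "attribute_set": []}
        for uri in classes
        if keep is None or uri in keep
    }
    for property_uri, domain_uri, value_type in properties:
        if domain_uri in resources:
            resources[domain_uri]["attribute_set"].append(
                {"uri": property_uri, "value_type": value_type}
            )
    return resources


# Hypothetical ontology fragment.
EX = "http://example.org/onto#"
classes = [EX + "Article", EX + "Person"]
properties = [
    (EX + "title", EX + "Article", "string"),
    (EX + "name", EX + "Person", "string"),
    (EX + "pages", EX + "Article", "numeric"),
]

# Import only the Article class, reducing the set of concepts on import.
resources = import_ontology(classes, properties, keep={EX + "Article"})
```

    Restrictions such as cardinality or disjointness axioms are deliberately absent from the sketch; only
the structural part of the ontology is carried over.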
    Of course, this way of mapping the ontology to the concepts of the LibMeta system does not preserve
the entire possible list of restrictions imposed on the properties and classes of the ontology initially, but
its structural part remains, which is sufficient for solving problems defined within the system.
    Figure 1 shows the basic concepts used to construct a description of the subject area within this
subsystem.
    When describing information resources and determining a set of their attributes, the types of
attributes that form the structural description of the resource play an important role. Attributes are
divided into several overlapping types: search, descriptive, administrative, identifying. Search attributes
are central to the formation of search interfaces: they are used when performing an attribute search by
resource types. The result of such a search is a set of objects, a short description of each of which is
presented to the user through descriptive attributes.
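    The interplay of search and descriptive attributes can be sketched as below. The sketch assumes
objects are plain dictionaries and that matching is a case-insensitive substring test; the names
AttributeSpec and attribute_search and the sample data are hypothetical and do not describe the
LibMeta search engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    name: str
    roles: frozenset  # overlapping roles: search, descriptive, administrative, identifying


def attribute_search(objects, specs, query):
    """Match objects on search attributes; describe hits by descriptive attributes.

    objects: list of dicts mapping attribute name to value.
    query:   dict mapping a search attribute name to the text it must contain.
    """
    searchable = {spec.name for spec in specs if "search" in spec.roles}
    descriptive = [spec.name for spec in specs if "descriptive" in spec.roles]

    not_searchable = set(query) - searchable
    if not_searchable:
        raise ValueError(f"not search attributes: {sorted(not_searchable)}")

    hits = []
    for obj in objects:
        matches = all(
            str(wanted).lower() in str(obj.get(name, "")).lower()
            for name, wanted in query.items()
        )
        if matches:
            # Only descriptive attributes reach the short description shown to the user.
            hits.append({name: obj[name] for name in descriptive if name in obj})
    return hits


# Hypothetical attribute configuration and library content.
specs = [
    AttributeSpec("title", frozenset({"search", "descriptive"})),
    AttributeSpec("authors", frozenset({"descriptive"})),
    AttributeSpec("year", frozenset({"search"})),
    AttributeSpec("file_path", frozenset({"administrative"})),
]
objects = [
    {"title": "Ontology of a digital library", "authors": "A. Author", "year": 2018,
     "file_path": "/data/1.pdf"},
    {"title": "Data integration methods", "authors": "B. Author", "year": 2010,
     "file_path": "/data/2.pdf"},
]
hits = attribute_search(objects, specs, {"title": "ontology"})
```

    Since the roles are stored as data, a search form could be generated from the same specs list by
listing the attributes whose roles include "search".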
    Within this subsystem, the initial configuration of the library content and of its interfaces for a
specific subject area is performed. Figure 2 shows the sequence of user actions to configure the system.


Figure 1: Basic concepts and their relations


Figure 2: The sequence of user actions to configure the system


5. Basic functionality of LibMeta
   LibMeta provides the following basic functionality:
    creation / viewing / editing of information resources and their structure;
    creation / viewing / editing of information objects and their structure;
    connecting data sources;
    loading data from connected data sources, which later become part of the library's content;
    creating / viewing / editing the structure of the thesaurus of the supported subject area;

       creating / viewing / editing thesaurus concepts;
       batch loading of data that make up the content of the library;
       attribute / semantic / full-text search and navigation through the available information objects
        of the system;
       attribute / semantic / full-text search by data sources;
       creating / viewing / editing collections of information objects;
       formation of a subject area ontology by describing the structure of information resources and
        thesaurus;
       provision of data constituting the content of the system in a machine-readable format;
       highlighting links between information objects and concepts of the thesaurus;
       support for semantic labels or folksonomy [24–26] to describe the thematic focus of information
        objects;
       creating / viewing / editing the user's area of interest;
       creation of a recommendation system:
        a) based on the description of the user's interests;
        b) based on the subject area thesaurus under consideration;
       support for user micro-thesaurus based on the domain thesaurus.

    LibMeta functionality available to all public users:
     viewing information resources and their structure;
     viewing information objects and their structure;
     attribute / semantic / full-text search and navigation through the available system resources;
     attribute and semantic search over data sources;
     viewing public collections of information objects.

    From the point of view of an authorized user, the semantic library additionally provides the
following functionality:
     defining your micro-thesaurus as an extension of a certain node of the main terminological
        thesaurus defined in the system. The system also supports the creation of so-called annotation
        ontologies or user ontologies (folksonomy), which are a collective vocabulary of users compiled
        as they put semantic labels on resources;
     defining your own collections of information objects;
     organization of joint thematic collections for user groups;
     attribute and semantic search on data sources with the ability to save search results;
     the user in the role of the system administrator has access to all the functionality defined above
        and can use additional functionality available only to this role:
        a) expand descriptions of resource types or create new ones at the request of users;
        b) include users' resource objects in the public list of objects at their request;
        c) make the ability to edit certain types of resources or taxonomies available to groups of users;
        d) edit user groups and roles and the set of operations available to them;
        e) edit and configure the main terminological thesaurus and its links.
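
    The layering of functionality by user category described above can be sketched as nested sets of
operations. The role names and operation identifiers below are hypothetical labels chosen for the
example; the paper does not specify how LibMeta stores roles and permissions.

```python
# Operations available to every visitor.
PUBLIC = frozenset({
    "view_resources", "view_objects", "search", "view_public_collections",
})

# An authorized user keeps the public operations and gains personal ones.
AUTHORIZED = PUBLIC | {
    "edit_micro_thesaurus", "tag_resources", "edit_own_collections",
    "share_group_collections", "save_search_results",
}

# The administrator keeps everything above and gains system-wide operations.
ADMINISTRATOR = AUTHORIZED | {
    "edit_resource_types", "publish_user_objects", "grant_edit_rights",
    "edit_groups_and_roles", "edit_main_thesaurus",
}

ROLES = {
    "public": PUBLIC,
    "authorized": AUTHORIZED,
    "administrator": ADMINISTRATOR,
}


def is_allowed(role: str, operation: str) -> bool:
    """Return True if the role includes the operation; unknown roles get nothing."""
    return operation in ROLES.get(role, frozenset())
```

    Building each level as a superset of the previous one mirrors the statement that an administrator
has access to all the functionality defined for other users.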


6. Conclusions
    This paper has presented an information system that implements the functionality of a semantic
library for a certain subject area. With it, subject matter experts get the opportunity to carry out the
main task of the library: the semantic / intellectual construction of the scientific knowledge space for a
certain subject area, that is, endowing it with semantics by highlighting clearly meaningful connections
and supporting semantic markup. The main design tools are the ontology of the subject area, which makes
it possible to structure the resources included in the scientific knowledge space meaningfully and to
ensure connectivity between them, and unified terminological support in the form of a thesaurus of this
subject area. To make the scientific knowledge space open, the system implements the integration of
other data sources and linking with their data. Functionality for collaborative work on developing the
scientific knowledge space increases the efficiency of research carried out in it and expands the
possibilities for keeping it up to date.

References
[1]. Y. V. Leonova, A. M. Fedotov, Sozdanie prototipa sistemy upravleniya informacionnymi resur-
      sami, Vestnik Vostochno-Kazahstanskogo gos. Tekhn. Universiteta i zhurnala Vychislitel'nye
      tekhnologii, Kazahstan, (2018) 47–56.
[2]. M. V. Kulagin, A. S. Lopatenko, Nauchnye informacionnye sistemy i elektronnye biblioteki. Po-
      trebnost' v integracii, Elektronnye biblioteki: perspektivnye metody i tekhnologii, elektronnye
      kollekcii, 2001.
[3]. Y. I. Shokin, A. M. Fedotov, V. B. Barahnin, Problemy poiska informacii, 2010.
      URL: https://nsu.ru/xmlui/handle/nsu/161.
[4]. K. Börner, VIVO: A semantic approach to scholarly networking and discovery, volume 1 of
      Synthesis lectures on the Semantic Web: theory and technology, 2012.
[5]. N. B. Ngok, A. F. Tuzovskij, Obzor podhodov semanticheskogo poiska, Doklady Tomskogo gosu-
      darstvennogo universiteta sistem upravleniya i radioelektroniki, 22, 2010.
[6]. Z. V. Apanovich, P. S. Vinokurov, T. A. Kislicina, Tools for Visual Analysis of Information Con-
      tent of Portals Included in Linked Open Data Cloud, Conference “Digital libraries: Advanced
      Methods and Technologies, Digital Collections”, RCDL 2011, Voronezh, Russia, October 19–22,
      2011, pp. 113–120.
[7]. E. A. Orobinskaya, A. Y. Doroshenko, Ispol'zovanie ontologij dlya avtomaticheskoj obrabotki
      tekstov na estestvennom yazyke, 2011.
      URL: http://repository.kpi.kharkov.ua/handle/KhPI-Press/14950.
[8]. B. V. Dobrov, N. V. Lukashevich, Tezaurus RuTez kak resurs dlya resheniya zadach informacion-
      nogo poiska, Trudy Vserossijskoj Konferencii Znaniya-Ontologii-Teorii (ZONT-09), Novosibirsk,
      2009.
      URL: http://ns.math.nsc.ru/conference/zont09/reports/93Dobrov-Lukashevich.pdf.
[9]. A. C. Ngonga Ngomo, et al., Sorry, I don't speak SPARQL: translating SPARQL queries into
      natural language, Proceedings of the 22nd international conference on World Wide Web, ACM, 2013,
      pp. 977–988.
[10].V. A. Serebryakov, O. M. Ataeva, Osnovnye ponyatiya formal'noj modeli semanticheskih bibli-
      otek i formalizaciya processov integracii v nej. Programmnye produkty i sistemy 4 (2015) 180–
      187.
[11].O. M. Ataeva, V. A. Serebryakov, Personal'naya otkrytaya semanticheskaya cifrovaya biblioteka
      LibMeta, Konstruirovanie kontenta. Integraciya s istochnikami LOD. Inform. i eyo primen. 2, 11
      (2017) 85–100.
[12].O. M. Ataeva, Informacionnaya model' semanticheskoj biblioteki LibMeta. Programmnye
      produkty i sistemy 4 (2016) 36–44.
[13].O. M. Ataeva, V. A. Serebryakov, Ontologiya cifrovoj semanticheskoj biblioteki LibMeta. In-
      formatika i eyo primeneniya 12, 1 (2018) 2–10.
[14].P. A. Lomov, M. G. Shishaev, Integraciya ontologij s ispol'zovaniem tezaurusa dlya
      osushchestvleniya semanticheskogo poiska. Informacionnye tekhnologii i vychislitel'nye sistemy
      3 (2009) 49–59.
[15].Y. Katsis, Y. Papakonstantinou, View-based data integration. Encyclopedia of Database Systems
      (2009) 3332–3339.
[16].L. Xu, D. W. Embley, Combining the Best of Global-as-View and Local-as-View for Data
      Integration. ISTA 48 (2004) 123–136.
[17].M. R. Kogalovskij, Metody integracii dannyh v informacionnyh sistemah. Institut problem rynka
      RAN, volume 74, 2010. URL: http://www.ipr-ras.ru/old_site/articles/kogalov10-05.pdf.
[18].A. E. Karabach, Sistemy integracii informacii na osnove semanticheskih tekhnologij, Nauka,
      tekhnika i obrazovanie 2 (2014) 58–62.

[19].M. Lenzerini, Data integration: A theoretical perspective, Proceedings of the twenty-first ACM
     SIGMOD-SIGACT-SIGART symposium on Principles of database systems, ACM, 2002, pp. 233–
     246.
[20].D. Calvanese, G. De Giacomo, M. Lenzerini, Ontology of Integration and Integration of
     Ontologies, Description Logics, 2001.
     URL: http://www.diag.uniroma1.it/degiacom/papers/2001/CaDL01dl.pdf.
[21].N. F. Noy, Semantic integration: a survey of ontology-based approaches. ACM Sigmod Record
     33, 4 (2004) 65–70.
[22].L. Zhao, R. Ichise, Ontology integration for linked data. Journal on Data Semantics 4 (2014) 237–
     254.
[23].Le Hoaj, A. F. Tuzovskij, Razrabotka semanticheskih elektronnyh bibliotek na osnove ontolog-
     icheskih modelej, Trudy XV Vseros. nauch. konf. “Elektronnye biblioteki: perspektivnye metody
     i tekhnologii, elektronnye kollekcii”, RCDL, 2013, pp. 143–151.
[24].A. Noruzi, Folksonomies:(un) controlled vocabulary. Knowledge Organization 33, 4 (2006) 199–
     203.
[25].L. Specia, E. Motta, Integrating Folksonomies with the Semantic Web. The Semantic Web: Re-
     search and Applications (2007) 624–639.
[26].T. Gruber, Ontology of folksonomy: A mash-up of apples and oranges. International Journal on
     Semantic Web and Information Systems 3, 1 (2007) 1–11.

