=Paper= {{Paper |id=Vol-2543/spaper10 |storemode=property |title=Mathematical Modeling of the Processes of Interdisciplinary Collections Formation in the Digital Libraries Environment |pdfUrl=https://ceur-ws.org/Vol-2543/spaper10.pdf |volume=Vol-2543 |authors=Nikolay Kalenov,Irina Sobolevskaya,Alexander Sotnikov |dblpUrl=https://dblp.org/rec/conf/ssi/KalenovSS19 }} ==Mathematical Modeling of the Processes of Interdisciplinary Collections Formation in the Digital Libraries Environment== https://ceur-ws.org/Vol-2543/spaper10.pdf
         Mathematical Modeling of the Processes
 of Interdisciplinary Collections Formation in the Digital
                  Libraries Environment

 N. Kalenov1[0000-0001-5269-0988], I. Sobolevskaya1[0000-0002-9461-3750],
 A. Sotnikov1[0000-0002-0137-1255]


1 Joint Supercomputer Center of the Russian Academy of Sciences — Branch of Federal State

 Institution “Scientific Research Institute for System Analysis of the Russian Academy of
    Sciences” (JSCC RAS – Branch of SRISA), 119334, Moscow, Leninsky av., 32 a, Russia
                                    asotnikov@jscc.ru



        Abstract. The paper analyzes the task of forming a digital space of scientific
        knowledge (DSSK) and considers how this concept differs from the general concept
        of the information space. The DSSK is represented as a large number of accessible
        objects of different kinds verified by the world scientific community. The form of
        structured representation in the digital space is the semantic network, whose
        organization is based on fundamental objects and the subsequent construction of
        their hierarchy, in particular according to the principle of inheritance. A
        classification of the objects that make up the content of the DSSK is introduced,
        and the concept of a hierarchical relationship between objects is defined. Using
        the concepts of set theory in the construction of the DSSK makes it possible to
        divide information into levels of detail. The concept of levels of the object
        hierarchy of the DSSK is introduced, objects of the various levels are defined,
        and the principles of working with objects of each level are formulated. It is
        shown that the hierarchical structure of information presentation in the digital
        library environment makes it possible to form a user collection in the central
        processing center. Constructing a hierarchy that is a section with a high degree
        of detail increases the efficiency of information search and information analysis
        in the space of knowledge.


        Keywords: Semantic Network, Information Space of Knowledge, Electronic
        Library, Levels of Detail, Hierarchy of Information Objects.


1       Introduction

Information is at the forefront of many areas of our lives. The spread of IT and computing
technologies has expanded the possibilities for capturing, analyzing, disseminating,
processing and using scientific information.
   Modern requirements for professional information call for the development of a
knowledge space, that is, a digital environment in which information resources and
services from different fields of science, culture and education are integrated. A part
of the general knowledge space is the digital space of scientific knowledge (DSSK),
which differs from other components of the common space (such as Wikipedia, in
particular) in that the information objects represented in the DSSK are verified by the
world scientific community and are kept separate from information objects of an
ideological, religious or otherwise scientifically controversial character [1].

Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

   The flow of requests to the DSSK is often continuous, rapidly varying in time, not
always predictable and unrestricted in the form of the request. The software that
processes such requests often has to respond in real time and therefore cannot afford to
store and “review” the parameters of each request. The requirements for the accuracy of
data retrieval in the DSSK (in contrast to a general-purpose search engine) necessitate
the development of special methods for processing queries that map the query text
sufficiently accurately to the metadata space describing the objects of the DSSK. DSSK
metadata, moreover, include not only sets of keywords but also more complex structures,
for example, hierarchical classification systems.
   The form of a structured representation of the digital knowledge space is a semantic
network. This space is organized on the basis of a classification system of objects and
the subsequent construction of their hierarchy, in particular according to the principle
of inheritance: “macroeconomics” is a section of the “economy”, a “poetry collection” is
a publication, etc.
   In accordance with this principle, objects are classified into a number of categories
or classes based on their common features.
   Most digital data collections are a diverse information network connecting objects
of various types. For example, an electronic publication (the type of the object is
“book”), in addition to plain text, contains additional information, such as the author
of the publication (the type of the object is “person”), year of publication, publishing
house, place of publication, etc. In turn, the object “person”, in addition to a sequence
of characters that specifies a surname, is associated with the person’s biography, a
field of scientific interests (“subject matter”), etc. Thus, from the object
“publication”, a connection can be established with another object (“author”), with the
text of this publication, with the “subject of the object”, etc. In the general case, the
DSSK should support various types of relationships between its elements, both within the
same class of objects (in particular, a recursive hierarchical relationship) and between
objects of different classes.
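
   The two kinds of relationships described above can be illustrated with a minimal sketch of a typed semantic network. The class name, the relation names ("section_of", "author", "subject") and the object identifiers are assumptions introduced only for illustration and are not taken from the DSSK itself.

```python
from collections import defaultdict


class SemanticNetwork:
    """Typed objects connected by named relations (illustrative only)."""

    def __init__(self):
        self.types = {}                 # object id -> object type
        self.edges = defaultdict(set)   # (source id, relation) -> target ids

    def add_object(self, obj_id, obj_type):
        self.types[obj_id] = obj_type

    def relate(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def related(self, source, relation):
        return set(self.edges.get((source, relation), ()))

    def ancestors(self, obj_id, relation="section_of"):
        """Follow a recursive hierarchical relation within one class."""
        found, frontier = [], [obj_id]
        while frontier:
            current = frontier.pop()
            for parent in sorted(self.edges.get((current, relation), ())):
                if parent not in found:
                    found.append(parent)
                    frontier.append(parent)
        return found


net = SemanticNetwork()
net.add_object("economy", "subject")
net.add_object("macroeconomics", "subject")
net.add_object("book-1", "book")
net.add_object("person-1", "person")

# Relationship within one class (hierarchical, by inheritance).
net.relate("macroeconomics", "section_of", "economy")
# Relationships between objects of different classes.
net.relate("book-1", "author", "person-1")
net.relate("book-1", "subject", "macroeconomics")
```

   Here `ancestors` walks the hierarchical relation upwards, while `related` returns the objects of another class linked to a given object.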
   A significant number of studies have been devoted to the construction of thematic
hierarchies, a hierarchy of concepts, object models, etc., which provide hierarchical
organization of data at different levels of detail and have applications such as web
search and viewing tasks [2, 3, 4].
   In [5], the NetClus algorithm was described, which allows one to establish rela-
tionships between multi-type objects to create high-quality network clusters. The
NetClus algorithm allows reordering attribute objects in each newly defined network
cluster.
   In this paper, we consider the DSSK from the standpoint of set theory, which allows
us to approach the construction of the space and the work with it from a new point
of view.


2      Configuration of DSSK

Let Ω be the set containing all the elements of a digital scientific space located in some
(possibly distributed) storage. Ω includes, in turn, two sets. The first of them (denoted
by 𝛢) consists of digital images of real-world objects (digitized publications, archival
documents, photographs, etc.) and objects created exclusively in the digital environment
(electronic publications, 3D models, multimedia materials, etc.). All objects are numbered
in some way. The numbering should uniquely identify each object and make it possible to
retrieve it from storage.
    The second set (denoted by 𝐵) includes metadata containing multidimensional
characteristics of the objects of the first set, which ensure their selection in response
to requests to the DSSK and their presentation to users.
    The set 𝛢 consists of elements 𝑎𝑖, where 𝑖 = 1, …, 𝑁 (𝑁 is the total number of objects
reflected in Ω). These elements are objects of the following types:
          • text files (recognized digitized printed or handwritten documents) or
             documents originally generated in electronic form;
          • static images (unrecognized digitized documents, digitized or originally
             digitally generated photos);
          • digital or digitized audio recordings;
          • digital or digitized video/film materials;
          • 3D models of various objects;
          • multimedia installations (digital models of natural processes and technical
             devices, educational materials, virtual tours, etc.).
While the elements of the set 𝛢 are represented by a simple collection of pairs “object –
its number”, the set 𝐵 is, in general, a rather complex facet-hierarchical structure. Each
of its elements is represented not only by a specific value and a reference to an element
of the set 𝛢 (as is the case in traditional bibliographic information retrieval systems),
but may also include an indication of links with other elements. Thus, by an element of
the set 𝐵 we mean a structure that includes the semantic value of an object
characteristic, a reference to one or more elements of the set 𝛢 corresponding to this
characteristic, and an indication of relations with other structures, which are also
elements of the set 𝐵.
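
    A minimal sketch of such a structure is given below. It assumes that set 𝛢 can be modeled as a dictionary of numbered objects and that each element of 𝐵 carries a value, a set of object numbers and named links to other elements of 𝐵; all field names, keys and sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataElement:
    """One element of set B (field names are illustrative assumptions)."""
    key: str                                           # identifier within B
    value: str                                         # semantic value of the characteristic
    object_numbers: set = field(default_factory=set)   # numbers of elements of A
    links: dict = field(default_factory=dict)          # relation name -> keys of other B elements


# Set A as a simple collection of pairs "number - object".
A = {1: "digitized book", 2: "photograph"}

# Set B: each element refers to elements of A and, optionally, to other elements of B.
B = {
    "person:1": MetadataElement("person:1", "Author X", {1, 2}),
    "place:1": MetadataElement("place:1", "City Y"),
}
B["person:1"].links["place_of_birth"] = {"place:1"}


def objects_for(key):
    """Retrieve the elements of A that a metadata element points to."""
    return [A[number] for number in sorted(B[key].object_numbers)]
```

    The `links` field is what distinguishes this structure from a flat bibliographic record: a search that reaches "person:1" can continue to "place:1" without touching set 𝛢.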
    The constituent elements of the set 𝐵 can be indices of classification systems (such
as the State Classifier of Scientific and Technical Information, UDC, etc.), as well as
document metadata such as individual characteristics of a person (surname and name, date
and place of birth, etc.), names of events, their text descriptions, temporal and
geographical characteristics of objects, etc.
    In order to ensure the accuracy of searching for objects in the DSSK, the set 𝐵 must
include a number of non-overlapping sets characterizing various aspects of information
about the elements of the set 𝛢. Obviously, many such partitions are possible, but we
restrict ourselves to an “intuitive minimum” that nevertheless covers a wide range of
object characteristics: a data set including the classes “what (who)”, “where” and
“when”, supplemented by the “subject” class and by formal characteristics specific to the
DSSK, which are allocated to the subsets 𝐵1 (the types of objects listed above) and 𝐵2
(the conditions for providing users with the various objects of the set 𝛢).

The subset 𝐵1 of the set 𝐵 (𝐵1 ⊂ 𝐵) consists of six elements. These elements are metadata
characteristics that we call the representation types of a digital object. Namely:
      𝑏11 – text view with the ability to search for a fragment of text;
      𝑏12 – static image;
      𝑏13 – 3D object;
      𝑏14 – audio document;
      𝑏15 – video document;
      𝑏16 – multimedia object.
The subset 𝐵2 of the set 𝐵 (𝐵2 ⊂ 𝐵) consists of elements that determine the conditions
for the provision of a digital object to the user. This subset is introduced because of
various legislative requirements for the public presentation of an object. Elements of
the set 𝐵2 will be called the conditions for the provision of the object. Namely:
      𝑏21 – the object is in free access;
      𝑏22 – the object is in limited access: free of charge for a certain group of
          users (for example, a paid subscription to full-text scientific publications
          for employees of a certain institution) and inaccessible to other users;
      𝑏23 – the object is in limited access: free of charge for a certain group and
          commercial for other users (for example, a digital model of a museum exhibit
          may be available for free viewing to museum visitors, while remote viewing
          requires a fee);
      𝑏24 – the object is commercially available, i.e. the user needs to pay for
          access to this resource.
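
The subsets 𝐵1 and 𝐵2 are small closed vocabularies, so they can be sketched as enumerations. The example below is only an illustration: the member names, the labels 𝑏21 to 𝑏24 assigned in the order of listing, and the `may_access` function with its two boolean flags are assumptions, not part of the DSSK specification.

```python
from enum import Enum


class RepresentationType(Enum):
    """Subset B1: representation types of a digital object."""
    TEXT = "b11"
    STATIC_IMAGE = "b12"
    MODEL_3D = "b13"
    AUDIO = "b14"
    VIDEO = "b15"
    MULTIMEDIA = "b16"


class ProvisionCondition(Enum):
    """Subset B2: conditions for the provision of an object, in order of listing."""
    FREE = "b21"
    GROUP_ONLY = "b22"
    GROUP_FREE_OTHERS_PAID = "b23"
    COMMERCIAL = "b24"


def may_access(condition, in_group=False, has_paid=False):
    """Decide whether a user may be given an object under a provision condition."""
    if condition is ProvisionCondition.FREE:
        return True
    if condition is ProvisionCondition.GROUP_ONLY:
        return in_group
    if condition is ProvisionCondition.GROUP_FREE_OTHERS_PAID:
        return in_group or has_paid
    return has_paid  # COMMERCIAL: every user has to pay
```

A retrieval component could call `may_access` on each object found by a search before presenting it to the user.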
The subset 𝐵3 of the set 𝐵 (𝐵3 ⊂ 𝐵) contains the main characteristics of the object
necessary for its identification during the search (“what” or “who”, “where”, “when”).
    Note that the subsets 𝐵1, 𝐵2 and 𝐵3 of the set 𝐵 do not intersect each other.
    The subset 𝐵4 contains elements of the class “subjects”. It can have a rather complex
structure containing indices and names of elements of various classification systems (the
strictly hierarchical State Rubricator of Scientific and Technical Information [7], the
faceted UDC [8, 9], etc.), keywords and terms, thesauri, etc.



2.1    Examples of Subsets 𝑩𝟑

An obligatory element of the set 𝐵3 belonging to the class “what” is the name of a
specific object. It can be supplemented by elements that specify the type of object within
a given view, such as a book, article, archive document, etc., as well as by unstructured
explanations containing additional information about the object. For example, a collection
of photos of Moscow of the 30s can be supplemented by a detailed article on the
architecture of the city at that time, presented in the form of hypertext.
   Elements of the “where” class of the set 𝐵3 can take various forms directly related to
the geographical location of the object (for example, for a person, the place of birth;
for a museum object, the place of its initial discovery; for an event, the country or city
where it occurred, etc.). Elements belonging to the class “where” may also be represented
as organizations that are described, in turn, by their own metadata (for example, a
person’s place of work, the place of storage of a museum item, a publishing house for a
printed document, etc.). For example, the herbarium collection of A.N. Petunnikova
[http://www.gbmt.ru/ru/about/fund/fondovaya-kollektsiya-gerbariy/,
http://e-heritage.ru/ras/view/person/general.html?id=49901007] contains information about
the geographic location of the collection elements.
   The year of publication of printed materials, the year of a person’s birth, etc. may
act as elements of the “when” class of the set 𝐵3.


3      Generation of Collections of Elements of the Set 𝜜

If the user’s requests to the DSSK include only elements of the set 𝐵, then the task of
selecting and presenting documents is reduced to the formation of sampling conditions
consisting of query terms connected to each other by Boolean operators. Data search in the
space 𝐵 is carried out by comparing its elements with the query elements (let us call it
linear search). The result of the search is the addresses of the corresponding elements of
the set 𝛢, by which these elements are retrieved from storage and provided to the user in
accordance with the conditions reflected in the set 𝐵2.
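
    A minimal sketch of such a Boolean search is shown below. It assumes that the set 𝐵 has been flattened into an index from metadata terms to the numbers of elements of 𝛢; the term names, the tuple-based query syntax and the sample data are hypothetical.

```python
def evaluate(query, index, universe):
    """Evaluate a Boolean query against a term -> object-number index.

    A query is either a term (str) or a tuple ("AND" | "OR" | "NOT", operand, ...).
    """
    if isinstance(query, str):
        return set(index.get(query, ()))
    operator, *operands = query
    results = [evaluate(operand, index, universe) for operand in operands]
    if operator == "AND":
        return set.intersection(*results)
    if operator == "OR":
        return set.union(*results)
    if operator == "NOT":
        return universe - results[0]
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical index over B: metadata term -> numbers of elements of A.
INDEX = {
    "type:text": {1, 2, 3},
    "type:image": {4},
    "subject:poetry": {2, 3, 4},
    "year:1910": {3, 4},
}
ALL_OBJECTS = {1, 2, 3, 4}

hits = evaluate(
    ("AND", "type:text", ("OR", "year:1910", ("NOT", "subject:poetry"))),
    INDEX,
    ALL_OBJECTS,
)
```

    A term that is absent from the index, such as "Silver Age", simply evaluates to the empty set, which is the situation discussed next.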
    However, in practice the user often needs to create a collection of elements of the
set 𝛢 for requests that are not explicitly formulated in terms of the set 𝐵. For example,
it may be necessary to form a user collection of the works of the “Silver Age” poets
reflected in a digital library (DL) of 20th century publications. When the DL elements
were formed, the “Silver Age” was not indicated as a temporal characteristic. Nor is it
present in any of the classification systems that could be used in creating the database.
A query in terms of “Silver Age poets” will therefore return an empty subset of the
elements of the set 𝛢. At the same time, it is obvious that among the elements of the set
𝛢 there are objects that meet the requirements for this collection. To detect them, it is
necessary to construct a mapping of the user query to the set 𝐵 and then to perform a
linear search for a query that includes the corresponding elements of the set 𝐵.
    Note that the properties of objects of the space Ω are transferred to all objects of
the subsets of this set, which avoids a significant part of the duplication of infor-
mation [10, 11].


4      Building a Hierarchy of Digital Objects Representation
       in a Digital Library Environment

Let 𝐹 be the set of “object characteristics”. A characteristic of an object (𝑓𝑠, where
𝑠 = 1, 2, …) is understood as a given parameter according to which the objects of the set
𝐴 will be combined into a user collection. For example, such a parameter could be a
research area or an object type (minerals, Silver Age poets, minerals mentioned in Silver
Age poetry, etc.). That is, $F = \bigcup_{s=1}^{\infty} f_s$.
   Applying some mapping 𝑓𝑠 to the elements of the sets 𝐴 and 𝐵, we can obtain a so-called
collection, that is, a subset of objects of one or more types united by the given
parameter: 𝑓𝑠 (𝐵(𝐴), 𝐴).
   Applying the map 𝑓𝑠 to the set 𝐵, we obtain the subset 𝐵(𝑓𝑠 ), by which we can select
the corresponding objects from the set 𝐴. We call the class of these objects the user
collection 𝐺(𝑓𝑠 ). The collection of subsets of 𝐴 corresponding to some common
characteristic 𝐵(𝑓) is called the class set 𝐺(𝑓). The concepts and definitions described
above make it possible to build a hierarchy of the representation of digital library
environment objects, which in turn makes it possible to formalize a general approach to
the formation of user collections.
   The hierarchy levels of user collections are determined by the structure of the mapping
of user-defined object properties to the subsets of 𝐵. In general:
                            𝐵(𝑓𝑠 ) = 𝐵1 (𝑓) ∪ 𝐵3 (𝑓) ∪ 𝐵4 (𝑓)
The first level of the hierarchy (narrow-focus collections) includes collections for
which the subset 𝐵3 (𝑓) is not empty; the second level (theme-specific collections)
includes collections for which 𝐵3 (𝑓) is empty while 𝐵1 (𝑓) and 𝐵4 (𝑓) are not empty;
the third level (thematic collections) includes collections for which 𝐵1 (𝑓) and 𝐵3 (𝑓)
are empty and 𝐵4 (𝑓) is not empty.
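
The three conditions can be checked mechanically. The sketch below assumes that 𝐵1(𝑓), 𝐵3(𝑓) and 𝐵4(𝑓) are available as Python sets; the function name and the `None` result for combinations not covered by the three levels are illustrative choices.

```python
def collection_level(b1, b3, b4):
    """Return the hierarchy level (1, 2 or 3) of a user collection, or None."""
    if b3:
        return 1   # narrow-focus collection: B3(f) is not empty
    if b1 and b4:
        return 2   # theme-specific collection: B3(f) empty, B1(f) and B4(f) not empty
    if b4:
        return 3   # thematic collection: only B4(f) is not empty
    return None    # combination not covered by the three levels
```

For instance, a collection defined only by a subject heading falls into the third level, while adding a representation type from 𝐵1 moves it to the second.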


5      Conclusion

Creating such a hierarchy makes it possible to optimize the formation and maintenance of
the information funds of digital libraries and also allows the user to choose, from the
whole set of interconnected resources of the digital library, those information objects
that are united by one or more features [11–16].
Using the hierarchical structure of the information representation of objects in the
environment of the digital library, a user collection can be formed in the DSSK.
   Thus the task of creating a collection according to a prescribed criterion is reduced
to the following steps (Fig. 1):
     1. Analysis of the correspondence between the attributes of the collection and the
          elements of the set 𝐵;
     2. Separation of the collection attributes into two subsets. The first subset
          contains attributes that are elements of the set 𝐵 in explicit form (for
          example, the type of an object, typical objects, etc.). The second subset
          contains attributes that are not elements of the set 𝐵 in explicit form;
     3. Implementation of the algorithm for mapping the characteristics of the
          collection to the set 𝐵.
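
   The steps above can be sketched as follows, assuming that the elements of 𝐵 are available as a set of term strings and that the mapping of implicit attributes is given as a lookup table. The term names, the placeholder "year_range:START-END" (whose actual bounds are deliberately left unspecified) and both function names are hypothetical.

```python
def split_attributes(attributes, b_terms):
    """Steps 1 and 2: separate attributes that are explicit elements of B from the rest."""
    explicit = [a for a in attributes if a in b_terms]
    implicit = [a for a in attributes if a not in b_terms]
    return explicit, implicit


def map_to_b(attributes, b_terms, mapping):
    """Step 3: translate implicit attributes into elements of B where a mapping exists."""
    explicit, implicit = split_attributes(attributes, b_terms)
    terms, unmapped = list(explicit), []
    for attribute in implicit:
        if attribute in mapping:
            terms.extend(mapping[attribute])
        else:
            unmapped.append(attribute)
    return terms, unmapped


# Hypothetical elements of B and a hypothetical mapping table.
B_TERMS = {"publication_type:poetry", "type:text", "year_range:START-END"}
MAPPING = {"Silver Age": ["year_range:START-END"]}

terms, unmapped = map_to_b(
    ["publication_type:poetry", "Silver Age", "unknown attribute"], B_TERMS, MAPPING
)
```

   Attributes left in `unmapped` are those for which no correspondence with 𝐵 has been established; they would need a new mapping rule before a search can be run.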

                              Fig. 1. Creation of user collections

    As an example of the implementation of the algorithm for mapping a collection to a set
of metadata, let us consider the formation of a collection of materials related to the
“Silver Age” poets in the environment of the Digital Library “Scientific Heritage of
Russia” (DL SHR). Such a collection should include information about the authors and all
documents related to them (including bibliographies and the full texts of their works).
    At the first stage, the years of publication are determined, i.e. the request
parameter “Silver Age” is translated into the “language” of metadata. After that, all
objects included in the DL SHR for the given span of time are selected. Next, from the
resulting data array, persons are selected who correspond to the metadata element
“author” and who, in turn, are associated with publications whose “publication type”
metadata element has the value “poetry”. As a result, we get a list of the names of the
poets of the Silver Age.
    Then, from all the objects obtained at the first stage, all materials are selected
according to the “author” parameter.
    Thus, a collection of all DL SHR objects associated with the “Silver Age” poets is
obtained (including archival documents, photo documents, etc.).
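
    The two stages can be sketched on a toy data set. The record layout, the field names, the generic author names and the year span passed to the function are all assumptions made for illustration; they do not reproduce the DL SHR schema, and the span is not meant to define the Silver Age.

```python
# Hypothetical records; every field name and value is illustrative.
RECORDS = [
    {"id": 1, "kind": "publication", "year": 1910, "publication_type": "poetry", "author": "Poet A"},
    {"id": 2, "kind": "publication", "year": 1912, "publication_type": "monograph", "author": "Scientist B"},
    {"id": 3, "kind": "photo document", "year": 1915, "author": "Poet A"},
    {"id": 4, "kind": "publication", "year": 1950, "publication_type": "poetry", "author": "Poet C"},
    {"id": 5, "kind": "archival document", "year": 1908, "author": "Poet A"},
]


def build_collection(records, year_from, year_to):
    """Two-stage selection: find the poets first, then all their materials."""
    # Stage 1: objects within the time span, then authors of poetry publications.
    in_span = [r for r in records if year_from <= r["year"] <= year_to]
    poets = {r["author"] for r in in_span if r.get("publication_type") == "poetry"}
    # Stage 2: every object from stage 1 associated with one of those authors.
    collection = [r["id"] for r in in_span if r["author"] in poets]
    return poets, collection


poets, collection = build_collection(RECORDS, 1900, 1920)
```

    Note that the second stage picks up the photo and archival documents of "Poet A" even though they carry no "poetry" marker, which is the point of selecting by author after the poets have been identified.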
    The research is carried out by JSCC RAS – Branch of SRISA and is supported by the
Russian Foundation for Basic Research (projects 18-07-00893 and 18-00-00372).


References
    Antopol'skij, A.B., Kalenov, N.Е., Serebryakov, V.A., Sotnikov, N.A.: Tochka zreniya o
    edinom cifrovom prostranstve nauchnyh znanij. Vestnik Rossijskoj akademii nauk 89(7),
    pp. 728–735 (2019).
    Gauch, S., Chaffee, J., Pretschner, A.: Ontology-based personalized search and browsing.
    Web Intell Agent Syst. 1(3, 4), pp. 219–234 (2003).
    Sun, Y., Yu, Y., Han, J.: Ranking-based clustering of heterogeneous information networks
    with star network schema. KDD '09 Proceedings of the 15th ACM SIGKDD international
    conference on Knowledge discovery and data mining. pp. 797–806 (2009).
    Wong, W., Liu, W., Bennamoun, M.: Ontology learning from text: a look back and into
    the future. ACM Computing Surveys (CSUR) 44(4), Article 20 (2012).
      Wang, C., Liu, J., Desai, N., Danilevsky, M., Han, J.: Constructing topical hierarchies in
      heterogeneous information networks. Knowledge and Information Systems 44(3), pp. 529–
      558 (2015).
      Kalenov N.Е., Sobolevskaya I.N., Sotnikov A.N. Ierarhicheskie urovni predstavleniya in-
      formacionnyh ob"ektov v srede elektronnyh bibliotek. Informaciya i innovacii. Vol. 13.
      Iss. 2. pp. 25–31 (2018).
      Kalenov, N., Sobolevskaya, I., Sotnikov, A.: Digital museum collections and representa-
      tion of objects of natural history museum storage in the Scientific Heritage of Russia Digi-
      tal Library. Scientific and technical information Ser. 1, 10, pp. 33–38 (2016).
      Antopol'skij, A.B., Beloozerov, V.N., Markarova, T.S., Dmitrieva, Е.Y.: Ustanovlenie
      sootvetstvij rubrik GRNTI rubrikam drugih sistem klassifikacii nauchnoj i tekhnicheskoj
      informacii. Scientific and technical information Ser. 1, 3, pp. 3–18 (2015).
      Astahova, T.S.: Problemy otrazheniya sovremennogo nauchnogo znaniya v klassi-
      fikacionnyh sistemah: novoe v UDK. Sbornik trudov konferencii «Perspektivnye naprav-
      leniya nauchnyh issledovanij i kriticheskie tekhnologii v klassifikacionnyh sistemah».
      VINITI RAN, Moskva, 25-27 oktyabrya, pp. 32–35, Moscow (2017).
      Aleksandrov, P.S.: Vvedenie v teoriyu mnozhestv i obshchuyu topologiyu. Nauka, Mos-
      cow (1977).
      Steffen, L., Manat, M., Frank, S.: Reductions between types of numberings. Annals of
      Pure and Applied Logic 170 (12), 102716 (2019).
      Antopolsky, A., Atayeva, O., Serebryakov, V.: Environment of integration of data of sci-
      entific libraries, archives, and museums “LibMeta”. Information resources of Russia Vol.
      5(129), pp. 8–12 (2012).
      Kalenov, N., Sobolevskaya, I., Sotnikov, A.: On the interaction of the Scientific Heritage
      of Russia Digital Library with natural history museums. Information resources of Russia
      148, pp. 2–6 (2015).
      Ivanov, V.M., Strelkov, S.V., Kholina, A.A., Avtyushenko, A.L.: Virtual reconstructions
      in multimedia exhibitions of objects of cultural heritage. Virtual archaeology collection
      Hermitage, pp. 41–49 (2015),
      http://www.virtualarchaeology.ru/pdf/281_va_book2015.pdf, last accessed 2019/11/12.
      Barutkina, L.P.: Multimedia in a modern museum exhibition. Bulletin of St. Petersburg
      State University of Culture and Arts. SPbSUCA, pp. 106–108 (2011).
      Vassileva, S., Kovatcheva, E.: The innovative model for interactivity in Bulgarian muse-
      ums. In: 10th Annual International Conference of Education, Research and Innovation
      (ICERI), CERI Proceedings, pp. 5407–5412 (2017).
      Maggio, A., Kuffer, J., Lazzari, M.: Advances and trends in bibliographic research: Exam-
      ples of new technological applications for the cataloging of the georeferenced library her-
      itage. Journal of Librarianship and Information Science 49(3), pp. 299–312 (2017).
      Frandsen, T.F., Tibyampansha, D., Ibrahim, G.R., von Isenburg, M.: Library training to
      promote electronic resource usage: A case study in information literacy assessment. In-
      formation and Learning Science 118(11–12), pp. 618–628 (2017).
       Shahzad, F., Alwosaibi, F.M. Development of an e-Library Web application. IMSCI. In:
      11th International Multi-Conference on Society; Orlando; the United States, Cybernetics
      and Informatics, Proceedings, pp. 153–158 (2017).
      Mi, X.Y., Pollock, B.M.: Metadata Schema to Facilitate Linked Data for 3D Digi-
      tal Models of Cultural Heritage Collections: A University of South Florida Libraries Case
      Study. Cataloging & Classification Quarterly. 56(2–3), pp. 273–286 (2018).