Big Math Methods in Lobachevskii-DML Digital
                  Library

Alexander Elizarov[0000−0003−2546−6897] and Evgeny Lipachev[0000−0001−7789−2332]

Higher School of Information Technologies and Intelligent Systems; N.I. Lobachevskii
Institute of Mathematics and Mechanics of Kazan (Volga region) Federal University,
                     35 Kremlevskaya ul., Kazan, Russia, 420008
                   amelizarov@gmail.com, elipachev@gmail.com


      Abstract. We propose methods for forming digital collections from a
      set of documents (scientific articles, monographs, collections of
      reports) stored in various formats. Based on an analysis of the
      structure of the documents and the stylistic features of their design,
      we have developed an algorithm for extracting their metadata. We
      present a software tool that divides collections of articles into
      separate documents and forms their semantic representation. Using the
      collection “Proceedings of N.I. Lobachevskii Mathematical Center”,
      whose volumes differ in format and structure, as an example, we
      describe an algorithm for creating a digital collection and including
      it in Lobachevskii-DML.
      We present algorithms for enriching the electronic collections of the
      Lobachevskii-DML digital library and for generating metadata for the
      documents of these collections in selected formats. Services for
      normalizing the collection metadata of the Lobachevskii-DML digital
      library were developed in accordance with the DTD rules and the NISO
      JATS and DBLP XML schemas. We also present algorithms for creating the
      mandatory and fundamental metadata sets in accordance with the rules
      of the European Digital Mathematics Library (EuDML).

      Keywords: Electronic mathematical collections · Digital libraries · For-
      mation and extraction of metadata · Semantic links of information ob-
      jects · Metadata normalization services · Mandatory and fundamental
      metadata sets · Lobachevskii-DML digital library.


1   Introduction
The term “Big Data”, which is currently widely used in various subject ar-
eas [1], in relation to mathematics requires certain clariﬁcations: big data in
mathematics is fundamentally diﬀerent from big data in the current traditional
understanding. In mathematics, all data is essential; moreover, in mathematical
documents, many of their parts, especially formulas, are a peculiar code that
requires decoding and special interpretation.
   Information and communication technologies (ICT) have completely changed
the life cycle of scientiﬁc documents both at the stages of their preparation and


 Copyright © 2019 for this paper by its authors. Use permitted under Creative
 Commons License Attribution 4.0 International (CC BY 4.0).


dissemination, and at the stage of obtaining results. This fully applies to
scientific activities in the field of mathematics. In solving mathematical
problems, however, the expectations placed on ICT are significantly higher. Here
we can draw an analogy with the way computers have completely eliminated
manual calculations. The laboriousness and routine nature of manual
calculations can be demonstrated by the example of numerous tables of function
values. Such, for example, are the four-digit tables of V.M. Bradis [2], familiar
to all mathematicians: they were created in 1921 and reprinted more than 60
times.
    Computations have always required specific methods and non-standard
organizational solutions to cope with the volume (Volume is one of the
characteristics of big data) and to overcome the limits of an individual’s
computing power. In 1791–1802, to compile “cadastral tables” containing
logarithms (8 characters), Riche de Prony (Gaspard Clair François Marie Riche
de Prony) created a “factory for calculation” (“usine à calculer”), dividing the
staff into sections: a section of theorists consisting of five prominent
mathematicians, including Legendre; a section of “calculators”, numbering 7–8
people familiar with numerical and analytical calculations; and a section of
60–80 people who were engaged only in addition and subtraction [3]–[6]. As for
Velocity, another characteristic of big data, the duration of manual
calculations is illustrated by the computation of π: W. Shanks (William Shanks,
1873) spent 15 years calculating 707 decimal digits of this number (but only
the first 527 of them turned out to be correct).
    “Manual” calculation was the typical practice almost until the middle of
the 20th century. After that, computers freed scientists from tedious
arithmetic operations. Today, on the simplest laptop, calculating π with the
same algorithm that was used for manual computation takes less than a second.
Humankind now expects the same progress beyond calculations: intelligent
computer tools should likewise relegate time-consuming routine (and not only
routine!) operations in mathematics to the past. In addition to computing and
document preparation, intelligent search tools are needed, including
recommender systems for finding scientific articles that are close in content,
terminological annotation services, personal information assistants, and
information platforms for publishing automation.
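    To make the remark about speed concrete, here is a minimal Python sketch.
It assumes, as is generally reported, that Shanks worked with Machin’s formula
π/4 = 4 arctan(1/5) − arctan(1/239); the function names and the number of guard
digits are our own illustrative choices, not part of any cited work.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Return arctan(1/x) * scale, summing the Gregory series with integers."""
    term = scale // x              # scaled value of 1/x
    total = term
    x_squared = x * x
    n = 1
    sign = -1
    while term:
        term //= x_squared         # scaled value of 1/x^(2k+1)
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total


def pi_digits(digits: int) -> str:
    """Return pi truncated to the given number of decimal digits."""
    guard = 10                     # extra digits that absorb truncation errors
    scale = 10 ** (digits + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    text = str(pi_scaled // 10 ** guard)
    return text[0] + "." + text[1:]


if __name__ == "__main__":
    print(pi_digits(707))          # the number of digits Shanks published
```

Python’s built-in arbitrary-precision integers make the whole computation a
matter of a few hundred long divisions.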
    This article describes approaches to managing large collections of digital
mathematical documents that are based on semantic methods, are consistent with
the principles of the World Digital Mathematical Library (WDML), and relate
to the areas constituting the Big Math concept. These approaches are being
developed and have already been partially implemented within the project
for creating the Lobachevskii-DML digital mathematical library.


2   Big Data in Mathematics and Big Math

Mathematicians, like scientists in other fields, have in recent decades
faced volumes of scientific documents that require new methods of working with
information. These methods should be based primarily on intelligent software
tools. Current estimates of the growth in the volume of scientific output are
fairly approximate and take into account only articles in scientific journals.
As an example, we present the results of a calculation carried out by the
Center for Science and Technology Research at the University of Leiden (SBF
2007). According to this Center (see, for example, [7]), the number of
scientific publications in professional journals worldwide increased from about
686 thousand in 1990 to about 1,260 thousand in 2006, which corresponds to an
increase of 84%. The annual growth rate calculated on this basis was more than
5%. At the same time, the number of scientific publications is growing faster
than the world economy. In addition to journal articles, scientific knowledge
is being disseminated today through such new forms of publication as academic
blogs, social networks, and dynamic publications. These forms have already
become widespread on the Web (see, for example, [8]).
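    The growth figures quoted above can be checked with a few lines of Python.
The sketch below uses only the two publication counts given in the text. It
suggests that the “more than 5%” figure corresponds to a simple average of the
total increase over 16 years, while the compound annual rate implied by the
same numbers is somewhat lower; this reading is our own assumption.

```python
# Publication counts quoted in the text (1990 and 2006).
start, end = 686_000, 1_260_000
years = 2006 - 1990

total_growth = end / start - 1                      # relative increase over the period
simple_annual = total_growth / years                # simple (linear) average per year
compound_annual = (end / start) ** (1 / years) - 1  # compound annual growth rate

print(f"total increase:       {total_growth:.1%}")     # 83.7%
print(f"simple annual rate:   {simple_annual:.1%}")    # 5.2%
print(f"compound annual rate: {compound_annual:.1%}")  # 3.9%
```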
    Specialized software services are being developed for working with
scientific content. Currently, computer support is used at all stages of the
life cycle of a scientific document. Mathematical content has features that do
not always allow general-purpose software tools to be used with it. The
specificity of mathematical documents is determined, first of all, by the
logical structure of texts, which are presented as a strict sequence of objects:
definitions, statements and proofs. These texts contain explicit or implicit
links to objects from other documents, links that only a specialist in
mathematics can understand.
    The presence of formulas is another feature of mathematical documents. It
requires specialized software tools at all stages of the document life cycle,
in particular for the input and display of formulas. Such tools, as a rule,
are developed by mathematicians themselves (for example, [9]–[13]).
    Documents that contain similar texts may differ significantly in the
content expressed by their formulas. Moreover, theorems with identically
worded statements can differ qualitatively in the results they establish.
Examples are theorems on the improvement of approximation estimates or
reference books on special sections of mathematics (see, for example, [14]).
Therefore, effective work with mathematical documents is impossible without
methods that use the semantics of both texts and formulas [15]–[18].
    Big data in mathematics also manifests itself in studies that require
consideration and analysis of numerous cases. For example, the classification
of finite simple groups required the long-term efforts of a large group of
mathematicians and occupies more than 10,000 journal pages. An overview of this
grand study is given in [19, 20]. The well-known four color problem was
reduced to checking 1936 configurations, for which a computer verification
algorithm was created [21, 22]. The validity of the computer proof was
confirmed by G. Gonthier through a formalization in the Coq language in 2005 [23].
   J. Carette, W.M. Farmer, M. Kohlhase and F. Rabe [24] proposed to use,
by analogy with the term Big Data, the term Big Math to denote the field
of creating methods and developing software systems to support mathematical
research. They highlighted five main aspects of Big Math:

 – Inference (derivation of statements by deduction);
 – Computation (algorithmic transformation of representations of mathemati-
   cal objects into forms that are easier to understand);
 – Tabulation (creating static, speciﬁc data related to mathematical objects
   and structures that can be easily stored, queried and shared);
 – Narration (bringing the results into a form that people can assimilate);
 – Organization (modular organization of mathematical knowledge).

   The main task of mathematical software systems today is to integrate the
aspects that make up Big Math.


3   Integrating Mathematical Knowledge with Digital
    Mathematical Libraries

The system of digital mathematical libraries currently being created is intended
to consolidate and make accessible both modern mathematical knowledge and
the knowledge contained in articles and books published in the pre-digital period.
To achieve this goal, in the framework of digital libraries, methods for managing
digital information are developed that take into account the characteristics of
the presentation of mathematical content (see, for example, [25, 26]).
    The most important tasks in the management of mathematical knowledge are
highlighted in [17, 26, 27]. A substantial part of these tasks can be solved with
the help of digital mathematical libraries built using semantic technologies [26].
    An overview of digital mathematical libraries from the point of view of the
DELOS Digital Library Reference Model is given in [25]. These libraries are
mainly national and consolidate the mathematical documents of their
countries, primarily books and journal articles. Examples of such libraries
are the French digital mathematics library Numdam [28] and the All-Russian
Mathematical Portal Math-Net.Ru [29].
    In the ﬁeld of integration of mathematical knowledge, the most signiﬁcant is
the Global Digital Mathematics Library (GDML) initiative [30, 31]. The World
Digital Mathematics Library (WDML) project put forward the idea of combining
the entire corpus of digital mathematical documents in the distributed system of
electronic collections as the main task [26]. The European Digital Mathematics
Library (EuDML, https://initiative.eudml.org/) [32] project is aimed at inte-
grating European mathematical resources. This project is considered as one of
the stages of building the World Digital Mathematical Library.


4   Lobachevskii Digital Mathematical Library

In accordance with the basic principles of WDML, the Lobachevskii Digital
Mathematics Library (Lobachevskii-DML, https://lobachevskii-dml.ru/)
is being created at Kazan University [33]. The construction of this library
involves the development of management tools for mathematical content that
take into account not only the specifics of mathematical texts, but also the
peculiarities of processing Russian-language texts. Another objective of this
digital library is the integration of the mathematical resources of Kazan
University and their inclusion in the global scientific infrastructure, in
particular, Math-Net.Ru and EuDML. To this end, methods for the normalization
of metadata are being developed in accordance with the schemas of international
scientometric databases.

4.1   Use of Semantic Analysis Methods in the Organization of
      Digital Collections
The WDML project [26] proposes an object approach to the organization of
digital collections. It involves the analysis and processing of not only the
documents included in the collections, but also the objects contained in these
documents (in particular, definitions, mathematical statements and their
proofs). This section presents a number of methods that have been developed
within this approach and are used in forming the digital scientific
collections of Lobachevskii-DML. These collections were formed by applying
the specially developed methods to an array of unstructured digitized
mathematical documents presented in various formats (.pdf, .tex, .doc, .docx).
The methods were tested on the journal archive “Proceedings of N.I.
Lobachevskii Mathematical Center” for 1998–2018, containing more than 60 volumes.
     Note that the main purpose of the “Proceedings ...” is the publication of
materials of mathematical conferences. As a result, the majority of the volumes
of the “Proceedings ...” contain several dozen articles with a limited (from a
modern point of view) set of metadata. Since 1998, when the first volume was
released, several sets of style rules have been used to prepare materials,
which influenced the choice of formats and the design of articles in the
volumes. Creating a digital collection from the array of “Proceedings ...”
files required dividing the volumes into separate articles, extracting the
metadata describing each article, and generating additional metadata
containing, in particular, the bibliographic description of the article, a link
to the article file in the digital collection, and links to the profiles of the
article’s authors on academic portals and in scientometric databases (kpfu.ru,
MathNet.ru, Scopus, etc.). The main steps in creating this digital collection
are as follows.
     At the first stage, the processed archive was clustered: the volumes of
“Proceedings ...” were divided into classes according to the similarity of
their structure and design. For each class, a set of regular expression
patterns was developed that define the rules for searching for information
blocks. This algorithm is based on the approach proposed in [34, 35]. It is
implemented as programs in the C# language that process files in TeX,
OpenXML (.docx) and .pdf formats. TeX files were processed using standard
functions that implement text string operations. The PDFLib
(https://www.pdflib.com) and iTextSharp
(https://www.nuget.org/packages/iTextSharp/) libraries were used to process
PDF files. For documents presented in the form of .docx files, the
word/document.xml file was extracted from the .docx archive in accordance
with the Office OpenXML format (see, for example, [36]).
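    The .docx step can be illustrated with a short sketch. The original
programs are written in C#; the Python version below is only an illustration
under our own assumptions (the function name docx_paragraphs is hypothetical).
It relies on two facts about the format: a .docx file is a ZIP archive, and the
body text is stored in word/document.xml as w:p (paragraph) and w:t (text run)
elements.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_source) -> list:
    """Return the non-empty paragraphs of a .docx file as plain strings.

    docx_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is split into runs; join the text of all its w:t nodes.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

The resulting list of strings can then be passed to the regular expression
patterns of the class to which the volume belongs.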
    At the next stage, the metadata that describe both the volume as a whole and
the articles included in it were extracted from the array of files of the
“Proceedings...” volumes. In particular, for every article of each volume, we
extracted its title and the numbers of its first and last pages. For this, an
algorithm was developed that exploits the structural homogeneity of each volume
and the distinctive style of article design within it. In addition to the
listed metadata, this algorithm also allowed us to extract lists of authors,
bibliography blocks and other metadata (for example, e-mail addresses and
keywords), if they are present in the text.
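    A pattern set for one class of volumes might look like the following
sketch. The patterns, the table-of-contents layout (“Title ..... 15”), and the
function names are hypothetical examples and are not the patterns used for the
“Proceedings...”. The sketch also assumes that each article starts on a new
page, so the last page of an article is the page before the next article
begins.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
KEYWORDS_RE = re.compile(r"^Keywords?:\s*(.+)$", re.IGNORECASE | re.MULTILINE)
# Hypothetical contents line: a title, a dotted leader, and a page number.
TOC_RE = re.compile(r"^(?P<title>.+?)\s*\.{3,}\s*(?P<page>\d+)$", re.MULTILINE)


def toc_page_ranges(toc_text, last_page):
    """Derive the title, first page and last page of each article from a TOC."""
    entries = [(m["title"].strip(), int(m["page"])) for m in TOC_RE.finditer(toc_text)]
    ranges = []
    for i, (title, first) in enumerate(entries):
        # The next article's first page bounds the current one.
        next_first = entries[i + 1][1] if i + 1 < len(entries) else last_page + 1
        ranges.append({"title": title, "first_page": first, "last_page": next_first - 1})
    return ranges


def front_matter(text):
    """Extract e-mail addresses and keywords if they are present in the text."""
    match = KEYWORDS_RE.search(text)
    keywords = [k.strip(" .") for k in match.group(1).split(",")] if match else []
    return {"emails": EMAIL_RE.findall(text), "keywords": keywords}
```

Keeping one such pattern set per class keeps the extraction code itself
unchanged when a new design style is encountered.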
    Further, an XML language was proposed for describing digital mathematical
collections. It consists of a set of tags and XML schemas based on the Journal
Archiving and Interchange Tag Suite (https://jats.nlm.nih.gov/1.2d2/). Using
the data obtained when processing the initial array of files, a description of
the collection “Proceedings...” was produced in the notation of this language.
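    The actual tag set of this XML language is not reproduced in the paper, so
the following sketch is only an assumption about its general shape. It borrows
the JATS names article-title, contrib, fpage and lpage, stores the page numbers
as attributes, and wraps the articles in a hypothetical volume element.

```python
import xml.etree.ElementTree as ET


def build_volume_description(volume, articles):
    """Serialize a volume and its articles into a JATS-inspired XML string."""
    # "volume" is a hypothetical wrapper element, not a claim about the real schema.
    root = ET.Element("volume", {"number": str(volume["number"]),
                                 "year": str(volume["year"])})
    for art in articles:
        node = ET.SubElement(root, "article", {"fpage": str(art["first_page"]),
                                               "lpage": str(art["last_page"])})
        ET.SubElement(node, "article-title").text = art["title"]
        for name in art["authors"]:
            ET.SubElement(node, "contrib").text = name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_volume_description(
        {"number": 1, "year": 1998},
        [{"title": "Sample title", "authors": ["A. Author"],
          "first_page": 5, "last_page": 11}],
    ))
```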
    Using text analysis methods [1, 37], we extracted from the documents of the
digital collection the terms that make up the keyword sets included in the
metadata. The term extraction algorithm develops the approach proposed in
[34, 35, 38].
    The next step in creating the digital collection was dividing each volume of
“Proceedings...” into separate articles. To do this, from the XML files
containing the meta-descriptions of volumes, we read the tags whose attributes
point to the starting and ending pages of articles. Next, we divide the files
into separate documents, which are assigned names in accordance with the rules
adopted in the digital collection. The splitting was performed by a program
developed in Python using the functions of the PyPDF2 library
(http://pybrary.net/pyPdf/).
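    A minimal sketch of such a splitting program is given below. It is not the
original program: the attribute names fpage and lpage, the file naming rule and
the page_offset parameter (the number of unnumbered front-matter pages in the
PDF) are assumptions made for illustration. The PdfReader and PdfWriter classes
are those of PyPDF2 2.x and later; older releases used PdfFileReader and
PdfFileWriter.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def article_page_ranges(meta_xml, page_offset=0):
    """Return (file_name, first_index, last_index) with 0-based PDF page indexes."""
    root = ET.fromstring(meta_xml)
    ranges = []
    for n, art in enumerate(root.iter("article"), start=1):
        # Printed page numbers are 1-based; PDF page indexes are 0-based.
        first = int(art.get("fpage")) + page_offset - 1
        last = int(art.get("lpage")) + page_offset - 1
        ranges.append((f"article_{n:03d}.pdf", first, last))
    return ranges


def split_volume(pdf_path, meta_xml, out_dir, page_offset=0):
    """Write one PDF file per article described in the XML meta-description."""
    from PyPDF2 import PdfReader, PdfWriter  # third-party; imported lazily

    reader = PdfReader(pdf_path)
    for name, first, last in article_page_ranges(meta_xml, page_offset):
        writer = PdfWriter()
        for index in range(first, last + 1):
            writer.add_page(reader.pages[index])
        with open(Path(out_dir) / name, "wb") as handle:
            writer.write(handle)
```

Separating the computation of page ranges from the PDF handling makes the
former easy to check against the meta-description before any file is written.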
    Metadata such as the authors’ e-mail addresses and affiliations were
imported from the authors’ profiles on academic sites and in various
scientific databases and were refined at the same time. This procedure used the
semantic links established in the process of forming the digital collection.
The corresponding algorithm is based on the method of [33, 35, 38].
    The implementation of the algorithms described above allowed us to form a
digital collection of the “Proceedings of N.I. Lobachevskii Mathematical Center”
and to include it, together with the specified set of metadata, in the digital
library Lobachevskii-DML.

4.2   Metadata Formats and Normalization for Documents of Digital
      Math Libraries
Metadata Formats. At present, publications on mathematics are indexed in
many scientometric databases. These databases impose different requirements
on the composition of the metadata of the documents included in them and on the
schemas for their presentation. Digital math libraries, in turn, also use
various metadata formats when building their collections. This is partly
because the articles included in such collections were published in journals
according to each journal’s own rules and therefore differ in the metadata
they carry. These differences can be quite significant; they concern primarily
the composition of the metadata and their format, and are most noticeable in
the archival collections of scientific journals. For example, many articles
published before 2000 have no keywords or abstracts, and author affiliations
appeared only in articles of recent years. At the same time, the constantly
expanding set of metadata used today testifies to their increasing role in the
improvement of modern scientific communications. Thus, there is a need to
develop both methods for extracting missing metadata from documents and methods
for converting already created metadata into the formats of relevant
scientometric databases. Note also that participation in projects for the
integration of mathematical resources, such as EuDML (The European Digital
Mathematics Library, https://initiative.eudml.org/) [32, 39], involves the
provision of sets of metadata generated according to the schemas of aggregators
of mathematical resources.
    The metadata schema of the digital mathematical library EuDML is described
in [40]: the metadata is divided into mandatory, fundamental, and additional
sets [41]. To describe journal articles in the EuDML project, XML schemas
(NISO JATS V1.0) [42] are used. The mandatory set of EuDML metadata is
minimal in composition and contains the title of the article in the original
language, the names and surnames of the authors, the reference list, the
unique identifier of the article (for example, a DOI) and the URL of its full
text. The fundamental set of metadata includes, in addition to the mandatory
metadata, the abstract of the article and keywords.
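    The two metadata sets described above lend themselves to a simple
completeness check. In the sketch below, the field names are hypothetical keys
of an in-memory record, not element names of the EuDML schema; only the
composition of the two sets follows the description in the text.

```python
# Fields of the mandatory set, as listed in the text (key names are assumptions).
MANDATORY = ("title", "authors", "references", "identifier", "fulltext_url")
# Fields that the fundamental set adds to the mandatory one.
FUNDAMENTAL_EXTRA = ("abstract", "keywords")


def eudml_level(record):
    """Return the highest metadata set a record satisfies and the missing fields."""
    missing = [field for field in MANDATORY if not record.get(field)]
    if missing:
        return "incomplete", missing
    missing = [field for field in FUNDAMENTAL_EXTRA if not record.get(field)]
    if missing:
        return "mandatory", missing
    return "fundamental", []
```

A report of missing fields of this kind indicates which records still need
metadata extraction or enrichment before they are exported.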
    A number of electronic collections of the digital library Lobachevskii-DML
are physically located in other digital libraries. Our tasks are to enrich such
collections with additional metadata, to select objects automatically, and to
establish semantic links between them.
    When forming the fundamental set of metadata of electronic collections
stored on external resources, the metadata presented on these resources is
imported first. For this purpose, a program was developed that extracts
metadata from web pages and writes them in the XML format of the digital
library Lobachevskii-DML; the metadata are then supplemented and converted
according to the EuDML schemas.
    As an example, consider the archive of articles of the journal “Russian
Mathematics (Izvestiya VUZ. Matematika)”. This journal collection is digitized,
supplied with meta-descriptions, and presented on the portal MathNet.Ru
(http://www.mathnet.ru/php/journal.phtml?jrnid=ivm) (see also [29]); it is
also one of the collections of the digital library Lobachevskii-DML. The
following steps are implemented for this collection.
    Part of the metadata was imported from the “Citation in AMSBIB format”
block of the MathNet.Ru portal. Then, keywords and a hyperlink to the
Springer Link portal page (https://link.springer.com/journal/11982) with the
English version of the article were read from the web page. This information
is included in the metadata, and the corresponding hyperlink is created.
    The next step involves analyzing the web page of the English version of the
article and extracting and recording its metadata. Next, a unique identifier
of the article was generated; we proposed to form it by concatenating two
strings from the MathNet.Ru portal: the journal identifier (the value of the
“jrnid” parameter) and the article identifier (the value of the “paperid”
parameter).
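    The identifier construction can be sketched as follows. The shape of the
article URL and the paperid value in the usage example are assumptions; the
text only states that the values of jrnid and paperid are concatenated, so the
optional separator is our own addition.

```python
from urllib.parse import parse_qs, urlparse


def mathnet_article_id(url, sep=""):
    """Concatenate the jrnid and paperid query parameters of a MathNet.Ru URL."""
    query = parse_qs(urlparse(url).query)
    jrnid = query["jrnid"][0]        # journal identifier, e.g. "ivm"
    paperid = query["paperid"][0]    # article identifier on the portal
    return f"{jrnid}{sep}{paperid}"


if __name__ == "__main__":
    # Hypothetical article URL, used only to illustrate the parameter layout.
    example = "http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=ivm&paperid=1234"
    print(mathnet_article_id(example))
```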

Normalization of Metadata. By normalization, we mean the use of methods
for generating or transforming document metadata in accordance with the rules
and XML-schemas of digital libraries and scientometric databases.
    One of the most popular and respected computer science libraries is “Dblp
Computer Science Bibliography” (DBLP, https://dblp.uni-trier.de/). A
prerequisite for the inclusion of electronic collections in this library is the
reorganization and normalization of the metadata of the relevant documents.
Among the collections of the digital library Lobachevskii-DML, one such
collection is that of the “Russian Digital Libraries Journal”
(https://elbib.ru/). The archive of articles published in this journal since
2015 was chosen for preparation for indexing in DBLP. The required metadata
are: the publication identifier, the names and surnames of the authors, the
title of the work, the year of publication, the volume, the issue number, the
starting and ending pages of the article in the journal issue, and the URL of
the full text of the article.
    Normalization to the DBLP format occurs in three stages: the extraction of
the required metadata, the addition of metadata and their normalization into
the desired format.
    Using a program developed in C# with the tools of the System.Xml
namespace, the collection files are processed sequentially and, as a result, a
set of metadata is generated for each document. At the next stage, the metadata
is supplemented with information about the article and its authors in English.
This information is imported from the English version of the journal’s site
using the HtmlAgilityPack library. Since the English-language information
about the authors is incomplete (only surnames and initials are given), the
full names are transliterated from the Russian-language page. The result of
this work was the inclusion of the Russian Digital Libraries Journal and the
articles published in it in 2015–2018 in the DBLP database
(https://dblp.uni-trier.de/db/journals/rdlj/).
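    The final normalization stage can be illustrated by a sketch that
serializes the listed metadata into a record resembling a DBLP article entry.
The original tool is written in C#; this Python version, the keys of the meta
dictionary and the sample key format are our own assumptions. The element names
(author, title, pages, year, volume, journal, number, ee) follow the public
DBLP XML format.

```python
import xml.etree.ElementTree as ET


def dblp_article(meta):
    """Serialize one article's metadata as a DBLP-style <article> record."""
    art = ET.Element("article", {"key": meta["key"]})
    for name in meta["authors"]:
        ET.SubElement(art, "author").text = name
    ET.SubElement(art, "title").text = meta["title"]
    ET.SubElement(art, "pages").text = f'{meta["first_page"]}-{meta["last_page"]}'
    ET.SubElement(art, "year").text = str(meta["year"])
    ET.SubElement(art, "volume").text = str(meta["volume"])
    ET.SubElement(art, "journal").text = meta["journal"]
    ET.SubElement(art, "number").text = str(meta["number"])
    ET.SubElement(art, "ee").text = meta["url"]   # link to the electronic edition
    return ET.tostring(art, encoding="unicode")


if __name__ == "__main__":
    # All values below are made-up placeholders.
    print(dblp_article({
        "key": "journals/rdlj/Author15", "authors": ["A. Author"],
        "title": "Sample title.", "first_page": 1, "last_page": 10,
        "year": 2015, "volume": 18, "number": 1,
        "journal": "Russian Digital Libraries Journal",
        "url": "https://example.org/article",
    }))
```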

Lobachevskii-DML Metadata Factory. As a rule, the term “metadata factory”
refers to a set of software tools for managing metadata in digital libraries
(see, for example, [28]). These tools perform operations such as extracting
metadata from digital documents, improving, refining and updating metadata,
and normalizing metadata into the formats of digital libraries and
scientometric databases. The metadata factory of the digital library
Lobachevskii-DML also includes semantic transliteration services and a
recommender system for refining scientific classifiers.


4.3   Digital Mathematical Ecosystem

On the Lobachevskii-DML digital library portal, the OntoMath digital ecosystem
is presented, which is an essential part of this digital library [43]. The main
components of this ecosystem are the mathematical ontologies Mocassin,
OntoMathPro and OntoMathEdu, the semantic publishing platform, the semantic
search service OntoMathSearch, and recommender systems for the selection of
scientific classifiers, the search for related articles and terminological
annotation.


Mathematical Ontologies. The concept of the Semantic Web assumes the
semantic structuring of the Internet data space for its use by software agents;
its main tasks are the unification (compatibility) and linking of data from
different sources. The most prominent application of the Linked Data
principles is the Linked Open Data (LOD) project. Its main advantage is a
standardized approach to the structuring and storage of integrated data, which
is loaded and presented in the form of RDF, that is, as triples of the
“subject – predicate – object” type.
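     The triple model can be shown with plain Python tuples, without any RDF
library. In the sketch below, the ex: identifiers are made-up names in a
hypothetical namespace, while rdfs:subClassOf and rdfs:label are standard RDFS
predicates; the match function is an illustrative helper of our own.

```python
# Each statement is a (subject, predicate, object) triple.
TRIPLES = {
    ("ex:LagrangePolynomial", "rdfs:subClassOf", "ex:Polynomial"),
    ("ex:Polynomial", "rdfs:subClassOf", "ex:Function"),
    ("ex:LagrangePolynomial", "rdfs:label", "Lagrange polynomial"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every given (non-None) component."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # All statements about one subject.
    for triple in match(TRIPLES, s="ex:LagrangePolynomial"):
        print(triple)
```

Because every source is reduced to the same three-part statements, data from
different collections can be merged simply by taking the union of their triple
sets.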
     An important direction in the development of the Semantic Web has been
the creation of ontologies of subject domains, including ontologies for
representing mathematical knowledge [44].
     The representation and exchange of knowledge in any subject area is based
on its conceptualization (see, for example, [17]). The communication process
(both between people and between machines) uses a language with a vocabulary
containing a set of terms that denote elements of the conceptualization.
Successful communication requires that all its participants, first, share a
common conceptualization and, second, use a common vocabulary. A well-known
means of solving this problem is an ontology. An ontology defines the basic
concepts of a certain subject area and the relationships between them. The main
components of an ontology are classes, relations and axioms.


Mocassin Ontology [45] is an ontology of the logical structure of
mathematical documents, designed for the automatic analysis of mathematical
publications in the LaTeX format. This ontology formally (in the OWL language)
describes the semantics of the structural elements of mathematical documents
(for example, theorems, lemmas, proofs, definitions, etc.), expressed in the
form of classes and properties. In addition, the ontology contains cardinality
and transitivity axioms.
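The effect of a transitivity axiom can be illustrated with a small sketch.
Suppose a property, here given the hypothetical name “depends on”, links a
theorem to a lemma and the lemma to a definition; declaring the property
transitive lets a reasoner infer that the theorem also depends on the
definition. The code below computes this closure for a set of pairs; it is not
tied to the actual property names of the Mocassin ontology.

```python
def transitive_closure(pairs):
    """Return the smallest transitive relation that contains the given pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # (a, b) and (b, d) together imply (a, d).
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


if __name__ == "__main__":
    depends_on = {("Theorem 2", "Lemma 1"), ("Lemma 1", "Definition 1")}
    for pair in sorted(transitive_closure(depends_on)):
        print(pair)
```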


The ontology of professional mathematics OntoMathPro [46, 47] is an
ontology of mathematical knowledge organized in the form of two
hierarchies:

 – a hierarchy of areas of mathematics: mathematical logic, set theory, algebra,
   geometry, topology, and so on;
 – a hierarchy of mathematical objects: set, function, integral, elementary event,
   Lagrange polynomial, etc.

    The OntoMathPro ontology is developed in OWL-DL/RDFS and contains
3450 classes, 6 types of object properties, 3630 instances of the IS-A property,
and 1140 instances of the remaining properties. It contains five types of
relationships: Class → Subclass, Defined with the help of, Associative
relationship, Task → Solution method, and Area of Mathematics → Mathematical
object. Ontology concepts contain their name in Russian and English, a
definition, links to external resources from the Linked Open Data cloud, and
links to other concepts. The objects of semantic annotation also include
formulas and the text fragments that describe the variables of these formulas.


Ontology of educational mathematics OntoMathEdu. In the current version,
this ontology is developed for the systematic description of the educational
aspect of mathematical knowledge. The initial design of OntoMathEdu is based on
the OntoMathPro ontology, which we developed earlier and described above. A new
conceptualization has been created that reflects the conceptual system of
mathematics corresponding to school education. Professional terminology has
been adapted to educational activities, in particular to the language of
school mathematics. Relationships reflecting the didactic dependence between
concepts have been added to OntoMathEdu. Ontology concepts contain their names
in the English, Russian, and Tatar languages, as well as basic definitions,
relationships with other ontology concepts (associative relationships), and
links to concepts from external data sets. The OntoMathEdu ontology is built on
a set of basic OntoMathPro relationships, such as the taxonomic relation
(IS-A), the relationship between a mathematical object and an area of
mathematics, the “determined by” relationship between mathematical objects,
and the relationship between a task and a method of solving it; a new set of
didactic relations was also introduced [49].
    When creating the top level of the OntoMathEdu ontology, the planimetry
section of the school mathematics course was selected as a pilot: the current
version of the ontology contains 585 concepts related to the planimetry course
for grades 5–9 of secondary school. The ontology structure contains hierarchies
of types, hierarchies of materialized relationships, a hierarchy of roles and
a network of points of view. To take into account the specificity of school
geometric knowledge, a number of relations between concepts were singled out
when designing the ontology: “whole–part”, “determined”, the relation of
ontological dependence, “theorem–property”, “theorem–characteristic”, and
“found by formula” (see also [50, 51]).


5   Conclusion

This paper describes approaches to managing large collections of digital
mathematical documents that are based on semantic methods and are consistent
with the principles of the World Digital Mathematical Library (WDML). These
approaches and methods relate directly to the areas that make up the new
concept of Big Math. They are being developed and practically implemented as
part of the creation of the Lobachevskii-DML digital math library. The main
results are as follows.
    Methods are proposed for forming digital collections from a set of
documents (scientific articles, monographs, reports) presented in various
storage formats. Based on the analysis of the structure of documents and the
stylistic features of their design, an algorithm for extracting their metadata
has been developed.
    Given the increasing role of metadata in improving modern scientific
communications, both methods for extracting missing metadata from documents
and methods for converting existing metadata into the formats of the relevant
scientometric databases have been developed and described.
    A software tool has been developed for dividing collections of articles into
separate documents and forming their semantic presentation. Using the example of
the “Proceedings of N.I. Lobachevskii Mathematical Center”, whose volumes differ
in format and structure, we describe an algorithm for creating a digital
collection and including it in the Lobachevskii-DML digital mathematical library.
    Algorithms for enriching the electronic collections of the Lobachevskii-DML
digital library and generating metadata of documents of these collections in
selected formats are presented.
    Services for the normalization of the collection metadata of the Lobachevskii-
DML digital library have been developed in accordance with the DTD rules and
NISO JATS and DBLP XML schemas. By normalization, we mean the use of
methods for generating or transforming document metadata in accordance with
the rules and XML schemas of digital libraries and scientometric databases.
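    A normalization step of this kind might be sketched as follows. This is an
illustration under stated assumptions, not the services described above: the
element names (article-meta, title-group, contrib-group, pub-date, fpage, lpage)
belong to NISO JATS, whereas the function name normalize_record and the flat
input record with its field names are hypothetical, and the result is not
validated against the DTD. The sample record is taken from reference [35].

```python
import xml.etree.ElementTree as ET


def normalize_record(record):
    """Build a JATS-style <article-meta> element from a flat metadata record.

    The record layout (title, authors, year, volume, pages) is an assumption
    made for this sketch.
    """
    meta = ET.Element("article-meta")

    # Collapse runs of whitespace left over from text extraction.
    title_group = ET.SubElement(meta, "title-group")
    title = " ".join(record["title"].split())
    ET.SubElement(title_group, "article-title").text = title

    # Split "Surname, Initials" into the two JATS name parts.
    contribs = ET.SubElement(meta, "contrib-group")
    for author in record["authors"]:
        surname, _, given = author.partition(",")
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname.strip()
        ET.SubElement(name, "given-names").text = given.strip()

    pub_date = ET.SubElement(meta, "pub-date")
    ET.SubElement(pub_date, "year").text = record["year"]
    ET.SubElement(meta, "volume").text = record["volume"]

    # Accept both a hyphen and an en dash as the page-range separator.
    first, _, last = record["pages"].replace("–", "-").partition("-")
    ET.SubElement(meta, "fpage").text = first.strip()
    ET.SubElement(meta, "lpage").text = (last or first).strip()
    return meta


record = {
    "title": "Method for extracting  terms in digital mathematical collections",
    "authors": ["Batyrshina, R.R."],
    "year": "2017",
    "volume": "55",
    "pages": "24–26",
}
meta = normalize_record(record)
xml_text = ET.tostring(meta, encoding="unicode")
```

    Keeping the input record flat and producing the schema-specific tree in one
function means that a second target format, such as DBLP XML, would only need
another function of the same shape.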
    Algorithms for creating the mandatory and fundamental sets of metadata for
collections are presented in accordance with the rules of the European digital
mathematical library EuDML.
    The digital ecosystem OntoMath, which is the most important part of the
Lobachevskii-DML digital library, is described. The main components of this
ecosystem are the mathematical ontologies Mocassin, OntoMathPro and OntoMathEdu,
the semantic publishing platform, the semantic search service OntoMathSearch,
and recommender systems for selecting scientific classifiers, searching for
related articles and terminological annotation.

Acknowledgments

This work was funded by the subsidy allocated to Kazan Federal University for the
state assignment in the sphere of scientific activities, grant agreement
No. 1.2368.2017. The work partially contains the results of the project “Monitoring
and standardization of the development and use of technologies for storing and
analyzing big data in the digital economy of the Russian Federation”, carried out as
part of the program of the National Technological Initiative competence center
“Center for storing and analyzing big data”, supported by the Ministry of Science and
Higher Education of the Russian Federation under the agreement of Lomonosov Moscow
State University with the Project Support Fund of the National Technological
Initiative dated 08/15/2019 No. 7/1251/2019. The work was also carried out with the
partial support of the Russian Foundation for Basic Research and the Government of
the Republic of Tatarstan within the framework of scientific projects 18-29-03086
and 18-47-160012.


References
1. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and
   Presenting Data. EMC. Education Services (ed.), Wiley (2015).
 2. Bradis, V.M.: Four-digit mathematical tables. Moscow: Drofa, 2019.
 3. Riche de Prony: Tables des logarithmes, sinus et tangentes pour la division décimale
    du quart de cercle calcules avec 8 ou 9 décimales pour être imprimées avec 7
    décimales exactes au bureau du Cadastre, https://patrimoine.enpc.fr/exhibits/
    show/dataincognita/item/1817. Last accessed 16 May 2019
 4. Bulletin de bibliographie, d’histoire et de biographie mathématiques. Notice sur
    la découverte des logarithmes. Nouvelles annales de mathématiques. Journal des
    candidats aux écoles polytechnique et normale, Serie 1, vol. 14, pp. 1–204 (1855)
    (Additional pages), http://www.numdam.org/item/NAM 1855 1 14 S1 0/. Last
    accessed 16 May 2019
 5. Peaucelle, J.L.: Le détail du calendrier de calcul des tables de Prony de 1791 à
    1802. http://rybn.org/human computers/articles/calcul des tables de prony.pdf.
    Last accessed 16 May 2019
 6. Roegel, D.: A reconstruction of the “Tables des logarithmes à huit décimales” from
    the French “Service géographique de l’armée” (1891). [Research Report] 2010.
    inria-00543952. https://hal.inria.fr/inria-00543952. Last accessed 16 May 2019
 7. Binswanger, M.: Excellence by Nonsense: The Competition for Publications
    in Modern Science. In: Bartling, S., Friesike, S. (eds). Opening Science. The
    Evolving Guide on How the Internet is Changing Research, Collaboration
    and Scholarly Publishing, pp. 49–72. Springer International Publishing (2014).
    https://doi.org/10.1007/978-3-319-00026-8_3
 8. Heller, L., The, R., and Bartling, S.: Dynamic Publication Formats and Col-
    laborative Authoring. In: Bartling, S., Friesike, S. (eds). Opening Science. The
    Evolving Guide on How the Internet is Changing Research, Collaboration and
    Scholarly Publishing, pp. 191–211. Springer International Publishing (2014).
    https://doi.org/10.1007/978-3-319-00026-8_13
 9. Knuth, D.E.: The TeXbook. Addison-Wesley Publishing Company (1984, 1986,
    1991).
10. Cervone, D.: MathJax: A Platform for Mathematics on the Web. Notices of the
    AMS 59, 312–316 (2012).
11. Tantau, T.: The TikZ and PGF Packages. Manual for version 3.1.4a (2019).
    https://pgf-tikz.github.io/pgf/pgfmanual.pdf. Last accessed 16 May 2019
12. Tools & Technical Specifications. EuDML Enhancer toolset demos. https://
    initiative.eudml.org/tools-technical-specifications. Last accessed 16 May 2019
13. OpenDreamKit. https://kwarc.info/projects/odk/. Last accessed 16 May 2019
14. Polyanin, A.D. and Zaitsev, V.F.: Handbook of Ordinary Differential Equations.
    Exact Solutions, Methods, and Problems. CRC Press. Taylor & Francis Group
    (2018).
15. Kohlhase, M.: Semantic Markup in TeX/LaTeX (2019). http://ctan.altspu.ru/
    macros/latex/contrib/stex/sty/stex/stex.pdf. Last accessed 16 May 2019
16. Kohlhase, M.: OMDoc – an open markup format for mathematical documents
    [Version 1.2]. Springer, Berlin (2006).
17. Elizarov, A.M., Kirillovich, A.V., Lipachev, E.K., Nevzorova, O.A., Solovyev,
    V.D., and Zhiltsov, N.G.: Mathematical knowledge representation: semantic
    models and formalisms. Lobachevskii Journal of Mathematics 35 (4), 348–354 (2014).
    https://doi.org/10.1134/S1995080214040143
18. Elizarov, A., Kirillovich, A., Lipachev, E., and Nevzorova, O.: Semantic formula
    search in digital mathematical libraries. Proc. of the 2nd Russia and Pacific Conf.
    on Comp. Technology and Applications (RPC 2017). IEEE, pp. 39–43 (2017).
    https://doi.org/10.1109/RPC.2017.8168063
19. Gorenstein, D.: The Enormous Theorem. Scientific American 253 (6), 104–115
    (1985).
20. Solomon, R.: A brief history of the classification of the finite simple groups. Bulletin
    of the AMS. New Series 38 (3), 315–352 (2001).
21. Appel, K. and Haken, W.: Every planar map is four colorable. Bulletin of the AMS 82,
    711–712 (1976).
22. Appel, K. and Haken, W.: Every planar map is four colorable. Contemporary
    Mathematics 98 (1989).
23. Gonthier, G.: Formal Proof – The Four-Color Theorem. Notices of the AMS 55
    (11), 1382–1393 (2008).
24. Carette, J., Farmer, W.M., Kohlhase, M., and Rabe, F.: Big Math and the One-
    Brain Barrier. A Position Paper and Architecture Proposal. arXiv:1904.10405v1
    [cs.MS] 23 Apr 2019.
25. Elizarov, A.M., Lipachev, E.K., and Zuev, D.S.: Digital mathematical libraries:
    Overview of implementations and content management services. CEUR Workshop
    Proceedings 2022, 317–325 (2017).
26. Developing a 21st Century Global Library for Mathematics Research. The National
    Academies Press, Washington (2014). https://doi.org/10.17226/18619
27. Elizarov, A.M., Kirillovich, A.V., Lipachev, E.K., and Nevzorova, O.A.: Mathemat-
    ical Knowledge Management: Ontological Models and Digital Technology. CEUR
    Workshop Proceedings 1752, 44–50 (2016).
28. Bouche, T. and Labbe, O.: The New Numdam platform. CICM 2017: Intelligent
    Computer Mathematics, 70–82 (2017).
29. Chebukov, D.E., Izaak, A.D., Misyurina, O.G., Pupyrev, Yu.A., and Zhizhchenko,
    A.B.: Math-Net.Ru as a Digital Archive of the Russian Mathematical Knowledge
    from the XIX Century to Today. Intelligent Computer Mathematics. LNCS 7961,
    344–348 (2013). https://doi.org/10.1007/978-3-642-39320-4_26
30. Ion, P.: The Effort to Realize a Global Digital Mathematics Library. In: Greuel,
    G.-M. et al. (eds.). ICMS 2016, LNCS, vol. 9725, pp. 458–466. Springer (2016).
    https://doi.org/10.1007/978-3-319-42432-3_59
31. Ion, P.D.F. and Watt, S.M.: The Global Digital Mathematics Library and
    the International Mathematical Knowledge Trust. CICM 2017: Intelligent
    Computer Mathematics, 2017. LNAI, vol. 10383, pp. 56–69. Springer, 2017.
    https://doi.org/10.1007/978-3-319-62075-6_5
32. Ion, P.D.F. and Watt, S.M.: The Global Digital Mathematics Library and
    the International Mathematical Knowledge Trust. CICM 2017: Intelligent
    Computer Mathematics, 2017. LNAI, vol. 10383, pp. 56–69. Springer, 2017.
    https://doi.org/10.1007/978-3-319-62075-6_5
33. Bouche, T.: Reviving the free public scientific library in the digital age? The
    EuDML project. In: Kaiser, K., Krantz, S.G., Wegner, B. (eds.) Topics and Issues
    in Electronic Publishing JMM/AMS Special Session. FIZ Karlsruhe, pp. 57–80
    (2013). https://www.emis.de/proceedings/TIEP2013/05bouche.pdf. Last accessed
    16 May 2019
34. Elizarov, A., Khaydarov, S., and Lipachev, E.: Scientific documents ontologies for
    semantic representation of digital libraries. Second Russia and Pacific Conf. on
    Computer Technology and Applications (RPC). Vladivostok, Russky Island,
    Russia 25–29 September, pp. 1–5 (2017). https://doi.org/10.1109/RPC.2017.8168064
35. Batyrshina, R.R.: Method for extracting terms in digital mathematical collections.
    Proc. of the N.I. Lobachevskii Math. Center. Kazan: Kazan Math. Soc. Publ. 55,
    24–26 (2017).
36. Standard ECMA-376 Office Open XML File Formats.
    http://www.ecma-international.org/publications/standards/Ecma-376.htm. Last
    accessed 16 May 2019
37. Ingersoll, G.S., Morton, T.S., and Farris, A.L.: Taming Text. How to Find, Orga-
    nize, and Manipulate It. Manning Publications Co. (2013).
38. Sabitova, E.M.: Algorithm for extracting connections in scientific digital
    collections. Proc. of the N.I. Lobachevskii Math. Center. Kazan: Kazan Math. Soc.
    Publ. 55, 123–126 (2017).
39. Bouche, T. and Rákosník, J.: Report on the EuDML External Cooperation Model.
    In: Kaiser K., Krantz S.G., Wegner B. (eds.) Topics and Issues in Electronic
    Publishing, JMM, Special Session, San Diego, 99–108 (2013).
40. Jost, M., Bouche, T., Goutorbe, C., and Jorda, J.P.: D3.2: The EuDML metadata
    schema. http://www.mathdoc.fr/publis/d3.2-v1.6.pdf. Last accessed 16 May 2019
41. EuDML metadata schema specification (v2.0-final).
    https://initiative.eudml.org/eudml-metadata-schema-specification-v20-final.
    Last accessed 16 May 2019
42. Journal Article Tag Suite. NISO JATS V1.0. https://jats.nlm.nih.gov/1.0/. Last
    accessed 16 May 2019
43. Khaydarov, S. and Yamalutdinova, G.: Recommender System of Physical and
    Mathematical Documents Classification. CEUR Workshop Proceedings 2260, 480–
    486 (2018).
44. Elizarov, A., Kirillovich, A., Lipachev, E., and Nevzorova, O.: Digital Ecosystem
    OntoMath: Mathematical Knowledge Analytics and Management. CCIS 706, 33–
    46. Springer (2017). https://doi.org/10.1007/978-3-319-57135-5_3
45. Lange, C.: Ontologies and languages for representing mathematical knowl-
    edge on the Semantic Web. Semantic Web 4 (2), 119–158 (2013).
    https://doi.org/10.3233/SW-2012-0059
46. Solovyev, V. and Zhiltsov, N.: Logical Structure Analysis of Scientific Publications
    in Mathematics. Proc. of the Int. Conf. on Web Intelligence, Mining and Semantics
    (WIMS’11). ACM 21, 1–9 (2011).
47. Elizarov, A.M., Zhizhchenko, A.B., Zhil’tsov, N.G., Kirillovich, A.V., and Lipachev,
    E.K.: Mathematical Knowledge Ontologies and Recommender Systems for Collec-
    tions of Documents in Physics and Mathematics. Doklady Mathematics 93 (2),
    231–233 (2016). https://doi.org/10.1134/S1064562416020174
48. Nevzorova, O., Zhiltsov, N., Kirillovich, A., and Lipachev, E.: OntoMathPRO
    Ontology: A Linked Data Hub for Mathematics. CCIS 468, 105–119. Springer, Cham
    (2014). https://doi.org/10.1007/978-3-319-11716-4_9
49. Elizarov, A., Kirillovich, A., Lipachev, E., Nevzorova, O., and Shakirova, L.: Open
    Linked Data and Ontologies in Mathematics Education. CEUR Workshop Pro-
    ceedings 2260, 186–196 (2018).
50. Kirillovich, A., Shakirova, L., Falileeva, M., and Lipachev, E.: Towards an Ed-
    ucational Mathematical Ontology. L. Gómez Chova, et al. (eds). 13th Interna-
    tional Technology, Education and Development Conference (INTED2019), Valen-
    cia, Spain, March 11-13, 2019. IATED, 6823–6829 (2019).
51. Elizarov, A.M., Lipachev, E.K., and Khaydarov, S.M.: Method of automated
    selection of reviewers of scientific articles, implemented in the scientific journal
    information system. Proceedings of the 21st Conference Scientific Services & Internet
    (SSI 2019), Novorossiysk-Abrau, Russia, September 23-28, 2019.

