=Paper= {{Paper |id=Vol-2784/rpaper14 |storemode=property |title=The Use of Thematic Analysis Methods in Scientometric Systems |pdfUrl=https://ceur-ws.org/Vol-2784/rpaper14.pdf |volume=Vol-2784 |authors=Alexander Kozitsin,Sergey Afonin,Dmitry Shachnev |dblpUrl=https://dblp.org/rec/conf/ssi/KozitsinAS20 }} ==The Use of Thematic Analysis Methods in Scientometric Systems== https://ceur-ws.org/Vol-2784/rpaper14.pdf
 The Use of Thematic Analysis Methods in Scientometric
                       Systems

          Alexander Kozitsin [0000-0002-8065-9061], Sergey Afonin [0000-0003-3058-9269]
                       and Dmitry Shachnev [0000-0002-5940-9180]

             Research Institute of Mechanics, Lomonosov Moscow State University
                                   alexanderkz@mail.ru



        Abstract. Many modern scientometric and citation systems provide mechanisms
        for thematic search and thematic filtering of information. In most cases, a
        full-text approach is used for thematic analysis of articles and journals, which
        has a number of limitations. The use of algorithms based on
        graph analysis, both independently and in conjunction with full-text algorithms,
        eliminates these limitations and improves the completeness and accuracy of the-
        matic search. The algorithm developed by the authors and presented in this work
        uses the co-authorship graph to analyze the thematic proximity of journals. The
        algorithm is insensitive to the language of the journal and selects similar journals
        in different languages, which is difficult to implement for algorithms based on
        the analysis of full-text information. The algorithm was tested in the scientometric
        information analysis system (IAS) ISTINA. In the interface developed for this
        purpose, the user selects one journal close to their research topic, and the
        system automatically generates a list of journals that may interest the user,
        both as sources of relevant material and as venues for publishing the user's own
        articles. In the future, the developed algorithm can be adapted to search for
        similar conferences, collections of publications
        and scientific projects. Such a tool is expected to increase the publication
        activity of young employees, the citation of articles, and cross-citation
        between journals. The results of the algorithm for determining thematic proximity
        between journals, collections, conferences and scientific projects can also be used
        to build rules for data access control models based on domain ontologies.

        Keywords: thematic classification, bibliographic data, co-authorship graph, in-
        formation systems.


1       Introduction

Modern methods of thematic analysis are currently applied to the analytical processing of large
amounts of information in almost all spheres of human activity, including scientometrics. The
results of thematic analysis of scientific information can be used to refine scientometric
indicators, support management decisions, search for information, and define rules for access
to information.
Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


    Scientometric indicators are used to assess the significance of articles (citation counts),
the standing of journals (impact factor, h5-index, h5-median), the impact of individual authors
on the scientific community (Hirsch index and g-index), and the activity of organizations as a
whole (i-index) [1]. However, many authors note that the distribution of the absolute values of
scientometric indicators depends significantly on the thematic area being analyzed [2]. For
example, two-year citation counts of articles have different medians in physics and mathematics,
since mathematical articles keep being cited over a longer period but accumulate citations more
slowly. A similar inconsistency is observed for journals as a whole. For example, as of
20.04.2020, the best Russian journals for 2018 according to the Russian Science Citation Index
(RSCI), as listed on the statistics page elibrary.ru/titles_compare.asp, have the following
citation indicators in different rubrics: physics 9200; biology 4600; mathematics 3600;
mechanics 1500; computer science 1100. It is therefore incorrect to compare the absolute values
of scientometric indicators of articles, journals or authors from different thematic areas. In
such cases, the normalized average citation count [3] or other indicators that take the
thematic area of the research into account must be used. Constructing such normalized
indicators requires thematic classification of large volumes of scientific articles and
journals.
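A field-normalized indicator of this kind can be illustrated with a minimal Python sketch. It assumes a normalization in the spirit of the mean normalized citation score: each article's citation count is divided by the mean citation count of all articles in its field (taken from a reference set), and the ratios are averaged. The exact definition in [3] may differ, and the field labels and citation counts below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def field_baselines(articles):
    """Mean citation count per field; articles are (field, citations) pairs."""
    by_field = defaultdict(list)
    for field, cites in articles:
        by_field[field].append(cites)
    return {field: mean(counts) for field, counts in by_field.items()}


def normalized_citation_score(subset, baselines):
    """Average ratio of an article's citations to its field baseline."""
    return mean(cites / baselines[field] for field, cites in subset)


# Hypothetical reference set: physics articles are cited more on average.
reference = [("physics", 12), ("physics", 8), ("physics", 10),
             ("mathematics", 3), ("mathematics", 2), ("mathematics", 4)]
base = field_baselines(reference)

physicist = [("physics", 10), ("physics", 20)]
mathematician = [("mathematics", 3), ("mathematics", 6)]
print(normalized_citation_score(physicist, base))      # 1.5
print(normalized_citation_score(mathematician, base))  # 1.5
```

The raw averages (15 citations for the physicist versus 4.5 for the mathematician) differ by more than a factor of three, while the normalized scores coincide.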
    In management, the results of thematic analysis make it possible to assess the state of
various research areas, compare them with the world level, and identify new thematic areas in
order to shape the policy of allocating material resources to stimulate scientific activity.
Not only the current values of indicators must be assessed, but also their dynamics over time
in comparison with world indicators. For example, a decrease in the indicators for a certain
research topic while the same indicators are growing worldwide may indicate an outflow of
researchers from this area in the organization or obsolescence of equipment.
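The comparison with world dynamics can be sketched as follows, assuming that yearly values of an indicator (for example, publication counts) are available for each topic both locally and worldwide, and that the direction of change is estimated by the slope of a least-squares line. The topic names and numbers are hypothetical.

```python
def slope(values):
    """Least-squares slope of a yearly series, with years indexed 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den  # requires at least two years of data


def diverging_topics(local, world):
    """Topics whose local indicator falls while the world indicator grows."""
    return [topic for topic in local
            if slope(local[topic]) < 0 and slope(world[topic]) > 0]


# Hypothetical yearly publication counts for two topics.
local = {"optics": [120, 110, 95, 80], "algebra": [40, 42, 45, 47]}
world = {"optics": [9000, 9500, 10100, 10600], "algebra": [3000, 3100, 3150, 3300]}
print(diverging_topics(local, world))  # ['optics']
```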
    Another important application of thematic analysis is the creation of effective
information search mechanisms. Search objects can be publications, journals, persons,
organizations and other entities. Thematic classification and clustering make it possible to
solve the following in-demand tasks: searching for published materials on a given topic,
finding the most authoritative experts in a certain subject area, choosing journals for
publication and assessing their significance, identifying new thematic directions in any
area, and searching for research teams.
    Determining thematic links between objects of the information system [4] can also be used
to build ontologies automatically and to define data access rules in attribute-based access
control (ABAC) models [5], which have now largely supplanted the older role-based (RBAC),
mandatory (MAC) and discretionary (DAC) access control models.
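One way thematic links could feed an ABAC policy is sketched below. It is a hypothetical rule, not one described in the source: confidential resources are readable by their owner or by any subject whose topic set is sufficiently close to the resource's topics. The Jaccard coefficient stands in for a real thematic proximity measure, and the class names, attributes and the 0.3 threshold are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    name: str
    topics: set = field(default_factory=set)  # e.g. derived by thematic analysis


@dataclass
class Resource:
    owner: str
    topics: set
    confidential: bool = False


def proximity(a, b):
    """Jaccard coefficient of two topic sets (stand-in for a thematic measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def can_read(subject, resource, threshold=0.3):
    """Hypothetical ABAC rule: public resources are open to everyone;
    confidential ones are readable by the owner or by thematically close subjects."""
    if not resource.confidential or subject.name == resource.owner:
        return True
    return proximity(subject.topics, resource.topics) >= threshold


report = Resource(owner="ivanov", topics={"scientometrics", "graphs"}, confidential=True)
print(can_read(Subject("petrov", {"graphs", "scientometrics", "nlp"}), report))  # True
print(can_read(Subject("sidorov", {"optics"}), report))  # False
```

The decision depends only on attributes of the subject and the resource, which is what distinguishes ABAC from a fixed role table.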
    Many large scientometric and citation systems have tools for thematic data analysis.


2      Use of thematic analysis in modern systems

The capabilities of thematic analysis of various scientometric systems differ in the type
of information being processed, types of classifiers, information sources, and the set of
classification and clustering methods used.
    The Web of Science (WoS) project uses keyword indices and thematic classifiers for thematic
search. Keyword indexing uses Author Keywords, which the authors specify manually when adding an
article, as well as KeyWords Plus, terms extracted automatically from the titles of the
articles cited in the work. Keyword indexing allows searching and additional filtering by
user-defined terms. For indexing by thematic classifiers, two main classifiers are used: the
one-level Web of Science Categories classifier for journals (250 categories) and the two-level
Research Area classifier for articles (150 science areas). In addition, the Essential Science
Indicators classifier, consisting of 22 categories, is used. The project implements the
Manuscript Matcher service, which recommends journals based on the text of a manuscript
proposed for publication [6].
    The Google Scholar project uses a two-level classifier with 8 first-level and 400
second-level elements. At scholar.google.com/citations, this subdivision by topic can be used
to filter journals when displaying their indicators (h5-index and h5-median), which makes it
possible to build more objective rankings for each thematic area of the classifier. Thematic
filtering is available only for English-language journals; for Russian-language journals and
journals in other languages, thematic classification is not supported.
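The h5 indicators mentioned above follow Google Scholar's published definitions: the h5-index is the h-index of the articles published in the last five complete years, and the h5-median is the median citation count of the articles in that h5 core. A minimal sketch, with an invented journal given as (year, citations) pairs:

```python
from statistics import median


def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def h5_metrics(articles, last_year):
    """h5-index and h5-median for (year, citations) pairs, using the five
    complete years that end with last_year."""
    recent = [c for year, c in articles if last_year - 4 <= year <= last_year]
    h5 = h_index(recent)
    core = sorted(recent, reverse=True)[:h5]  # the articles forming the h5 core
    return h5, (median(core) if core else 0)


# Hypothetical journal; the 2012 article falls outside the 2015-2019 window.
journal = [(2015, 30), (2016, 12), (2017, 7), (2018, 4),
           (2019, 9), (2019, 1), (2012, 50)]
print(h5_metrics(journal, 2019))  # (4, 10.5)
```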
    To refine the data, in line with its basic methodology, Google actively relies on user
interaction to collect information bottom-up, allowing authors to create their own pages with a
list of articles, a photo and a description of their interests (Google Scholar Citations).
Items can be added to a profile automatically, semi-automatically (suggested options are shown
to the user), or manually by entering full bibliographic data.
    The Scopus project classifies journals using the two-level All Science Journal
Classification (ASJC), which contains 4 first-level records and about 350 second-level records.
The SciVal service (www.scival.com) shows the distribution of journals by area, relative
normalized characteristics for selected sections of the classifier, the change in the number of
publications by topic over time, and other indicators. The data are available with a paid
subscription.
    The RSCI project uses the three-level State Rubricator of Scientific and Technical
Information (GRNTI), containing about 8 thousand rubrics (elibrary.ru/rubrics.asp). This
classification can be used to search for journals and articles, and to filter lists of
journals when displaying their scientometric indicators.
    The Open Academic Graph (OAG) Project, an extended version of the Microsoft
Academic Graph (MAG), contains 170 million articles with citation references. The
project is not a scientometric system or a citation system, but the project data can be
used to test the algorithms of scientometric systems. The data can be freely downloaded
from the project website www.aminer.org/open-academic-graph.


   In addition to the above classifiers of commercial systems, there are a number of
generally accepted classifiers that are not associated with any particular citation system.
At the world level, the most famous is the three-level classifier OECD Fields of Sci-
ence, containing more than two hundred rubrics, which was planned to be used, among
other things, in the "Map of Russian Science" project. In many Russian journals, authors
classify their own articles using the more detailed Universal Decimal Classification (UDC),
which contains more than 150 thousand rubrics. The VINITI Rubricator, containing more than 53
thousand rubrics, is also used for the thematic classification of various scientific
materials, along with a number of other thematic classifiers: the Classifier of the Russian
Science Foundation (RSF) [7]; the Classifier of the Russian Foundation for Basic Research
(RFBR) [8]; the International Patent Classification (IPC) [9]; the All-Russian Classification
of Standards (RCS) [10]; the Mathematics Subject Classification (MSC) [11]; the Journal of
Economic Literature Classification (JEL) [12] and others (scs.viniti.ru/MapService/treeList.aspx).
Given such a variety of classifiers, it is natural that projects for matching them appear, for
example, the project comparing the Scopus and OECD classifiers [13] and the VINITI project [14].
   The projects listed above were aimed at computing citation indicators of scientific
publications and classifying them by field of science. The next step was the emergence, on
their basis, of systems for assessing the scientific activity of organizations as a whole.
   The Spanish project SCImago Journal & Country Rank of the University of Granada (also known
as the "Atlas of Science") evaluates aggregated data on scientific activity in Spain, Portugal
and South America based on Scopus data. The project website www.scimagojr.com provides
indicators not only for scientific journals but also for countries as a whole. The SJR index
developed by the authors of the project is an alternative to the impact factor.
   The Faculty Scholarly Productivity Index (FSPI) project evaluates the performance indicators
of US universities based on Scopus data. In addition to the number of publications and citation
rates, it uses data on awards and prizes received and on the volume of federal research
funding to calculate the rank of a university. More than 350 universities are ranked based on
the aggregated data.
   The Times Higher Education (THE) project aims to assess universities around the world [15].
Its World University Rankings index is built on WoS citation data, which make up 32.5% of the
rating [16]. In addition, subjective expert assessments, the amount of research funding, the
attraction of foreign students and teachers, and the transfer of the university's developments
to industry are taken into account.
   The QS World University Rankings project [17] assesses universities in terms of research and
teaching performance, student-to-teacher ratio, average citations per faculty member,
reputation among employers, and the number of international students and teachers.
   The Academic Ranking of World Universities (ARWU), often referred to as the
"Shanghai Ranking" (www.shanghairanking.com), takes into account the receipt of No-
bel Prizes by university alumni, the number of published articles in the "Nature" and
"Science" journals and citation rates.


   It should be noted that making such comparisons without taking into account the language of
instruction, or evaluating journals without taking into account their thematic area, does not
give accurate results [18]. For example, comparing universities all over the world by citations
in English-language journals alone mainly shows that the percentage of teachers and students
fluent in English is significantly higher at universities in the USA, England and Canada than
in Russia or other non-English-speaking countries. Likewise, comparing the proportion of foreign
students and teachers at universities with English and Japanese as the language of instruction
reflects not so much the level of education at the institution as the number of foreigners
fluent in that language.
   For scientometric systems aimed at objective and balanced assessment of the quality of
scientific output, taking into account the field of study, including language, subject area
and other similar characteristics of the analyzed data, is a necessary requirement for
constructing objective scientometric indicators.


3      Thematic analysis using textual information and classifiers

In the process of developing and maintaining the ISTINA scientometric system, special attention
has always been paid to methods of intelligent information analysis, including methods of
thematic analysis. The volume of data processed by this system is much smaller than that of
world citation systems: it covers only 28 organizations, 900 thousand publications, 70 thousand
monographs and 13 thousand patents. However, it handles many more types of data. In addition to
publications and patents, the system contains complete information on scientific projects
(R&D, grants), conference talks, dissertations and diplomas, the participation of employees in
various councils and editorial boards, the prizes and awards they receive, courses taught and
other data [19]. Furthermore, all information in the system is verified twice. The basic
principle of the system is that information moves bottom-up. At the first stage, the user, as
the most interested person, registers all their works in the system, and these works are
displayed on the user's personal page. At the second stage, responsible employees of the
departments confirm the accuracy of the data. A similar method of collecting information
through personal pages is currently used in the Google Scholar Citations project of Google,
the leader in the text processing market. However, for objective reasons, the second
verification stage cannot be organized in that system.
    One of the simplest ways to conduct thematic analysis is to use classifiers with manual
assignment of objects to thematic classes, for example, thematic classification of journal
records. This approach is used in Scopus, WoS and RSCI.
    When the ISTINA scientometric system was created, analysis methods based on categorizing
journals according to various static rubricators were implemented to analyze the activity of
employees and organizations in different thematic areas. The interactive interface of the
organization's statistics page [20] presents the distribution of the number of articles, WoS
citations, the number of authors and other aggregated characteristics over Scopus and GRNTI
rubrics. The data can be shown both for individual departments and for the organization as a
whole, with filters by year of publication, by whether the analyzed indicator exceeds a
threshold value, and by journal group: journals indexed in Scopus, WoS Top 25% journals,
journals from the Higher Attestation Commission list, collections of articles and others. The
metric used for threshold filtering and the metric displayed on the chart can be chosen
separately. For example, one can filter by the number of articles and display the number of
citations per article.
   Analysis is possible both at the level of the organization as a whole and at the level of
each department separately. The choice of aggregation level is especially useful because the
department of an individual publication is often ambiguous. In large scientific organizations,
many articles are co-authored by employees of different departments. With the traditional
counting method, in which aggregated data for individual departments are computed independently
and then added together, such articles are counted several times, which distorts the totals.
Aggregating the source data directly at either the organizational or the departmental level
makes such estimates more accurate and objective.
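The double-counting effect described above can be reproduced with a small hypothetical example. The sketch assumes that each article is recorded with the set of departments of its authors; adding up per-department counts counts a joint article once per department, while counting distinct articles over the union of departments counts it once.

```python
# Hypothetical articles, each with the set of departments of its authors.
articles = {
    "a1": {"physics"},
    "a2": {"physics", "math"},  # joint article
    "a3": {"math"},
}


def articles_of(articles, units):
    """Distinct articles with at least one author from any of the given units."""
    return {aid for aid, depts in articles.items() if depts & units}


departments = {"physics", "math"}

# Traditional method: count each department separately, then add the results.
naive_total = sum(len(articles_of(articles, {d})) for d in departments)

# Aggregating the source data at the organization level counts each article once.
org_total = len(articles_of(articles, departments))

print(naive_total, org_total)  # 4 3
```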
   This approach provides the user with the opportunity to assess the degree of publi-
cation activity of employees in various thematic areas. However, it does not allow an-
alyzing information with a sufficient degree of detail. The rubricator is static and addi-
tional detailing within one rubric is not possible.
   The second possible approach is to determine the subject and search for information by
keywords, abstracts or full texts of articles. Keywords can be specified by the authors of a
work when it is registered in the system, or computed during indexing from the abstract, the
full text, or the list of cited literature, as with Author Keywords and KeyWords Plus in WoS.
This approach makes it possible to specify the search topic more precisely, which is necessary
for tasks such as identifying new thematic areas or meeting a specific information need of the
user. The use of such thematic analysis is not limited to information search. For example,
[21] proposes using thematic analysis to assess the quality of a journal. The main hypothesis
is that in "good" scientific journals, articles should be devoted to a well-defined set of
topics, and these topics should evolve over time. Thus, after training a thematic analysis
algorithm on a training set of articles from all analyzed journals, the articles of these
journals can be classified both by topic and by time. The quality of a journal is then
proportional to the accuracy with which its articles are correctly classified by journal
affiliation and publication time interval.
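The scoring step of the approach in [21] can be sketched as follows, assuming that a classifier has already predicted a (journal, period) label for each article; the classifier itself is not shown, and the journal names, periods and predictions are hypothetical. A journal's score is the share of its articles whose journal and publication period were both predicted correctly.

```python
from collections import defaultdict


def journal_quality(records):
    """records: (true_journal, true_period, predicted_journal, predicted_period).
    Returns, for each journal, the share of its articles whose journal and
    publication period were both predicted correctly."""
    hits, totals = defaultdict(int), defaultdict(int)
    for journal, period, pred_journal, pred_period in records:
        totals[journal] += 1
        if pred_journal == journal and pred_period == period:
            hits[journal] += 1
    return {j: hits[j] / totals[j] for j in totals}


# Hypothetical classifier output for two journals.
records = [
    ("J1", "2015-2017", "J1", "2015-2017"),
    ("J1", "2018-2020", "J1", "2018-2020"),
    ("J1", "2018-2020", "J1", "2015-2017"),
    ("J1", "2015-2017", "J1", "2015-2017"),
    ("J2", "2015-2017", "J1", "2015-2017"),
    ("J2", "2018-2020", "J2", "2018-2020"),
]
print(journal_quality(records))  # {'J1': 0.75, 'J2': 0.5}
```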
   The main difficulty in using keywords for thematic analysis is the limited size of keyword
sets. When describing articles, authors usually specify fewer than 10 keywords; for example,
the average number of keywords that authors specify when registering articles in IAS ISTINA is
3.8. An additional obstacle is the subjectivity of the choice. First, the authors extract from
the article the basic concepts that, in their assessment, are significant at the moment.
Second, for each concept they specify only one keyword describing it, omitting possible
synonyms. As a result, articles on similar topics may have non-overlapping keyword sets, which
significantly reduces the accuracy of determining their thematic similarity. Nevertheless, a
similar approach to finding articles on close subjects is implemented in some citation
systems. For example, one can test the quality of the article selection produced by keyword
search in Russian-language journals on the search page of the RSCI project [22].
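The effect of synonym choice on keyword overlap can be seen with the Jaccard coefficient (shared keywords divided by all distinct keywords), used here as an assumed similarity measure. The two keyword sets are invented: both describe the same topic, and each has three keywords, close to the ISTINA average of 3.8.

```python
def jaccard(a, b):
    """Number of shared keywords divided by the number of distinct keywords."""
    a = {k.lower() for k in a}
    b = {k.lower() for k in b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical author keywords of two articles on the same topic.
art1 = {"co-authorship graph", "journal ranking", "bibliometrics"}
art2 = {"collaboration network", "journal rating", "scientometrics"}
print(jaccard(art1, art2))  # 0.0
```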
   The WoS project provides users with the Manuscript Matcher service for selecting
a journal for publication based on the text of the article. The service requires user
registration. After the title and abstract of the article to be published are submitted, the
service determines their keywords and matches them against the keywords of journals. The result
is shown as a list of journals with a description of each journal and a list of common keywords
with a measure of similarity to the submitted article. The service can be useful for authors who
use highly specialized terms, for example, in chemistry, biology or astronomy. For more general
topics, term matching is not very accurate. For example, for the article "Determining the
thematic proximity of scientific journals and conferences using Big Data technologies", the top
results are "Scientometrics" and "Journal of the association for information science and
technology"; however, the top 5 list also contains "Journal of medical systems" and "Journal of
digital imaging", which are matched through the keywords "create software tools" and
"full-text information".
   The RSCI project offers users a service for searching for similar articles. The user
can select one of the articles already indexed in the system and request a search for
similar articles by topic. However, the results of such a search are even less accurate than
the results of keyword search and of the Manuscript Matcher service. For example, for the
article "Architecture, methods and means of the basic component of the ISTINA system of
scientific information management", 14 thousand related articles are found, yet the top 10
list contains no article related to the system described in the article or to any of its
analogues, and only one article deals with scientometrics. The top 3 results of the search by
thematic similarity are: "Information technology of software architecture structural synthesis
of information system", "Analysis of the asp.net development information system", "General
overview of agris (agricultural research information system)".
   One possible way to improve the completeness of keyword search and to resolve homonymy is to
expand the set of keywords by building relationships between keywords [23] and by using
translations of terms. The ISTINA project uses Wikipedia materials, as well as free services
from ABBYY, to automate translation. Keyword search is used as the first stage of thematic
analysis in the algorithms for finding experts and selecting journals that are currently being
developed and tested on ISTINA data. The results of this ongoing research cannot yet be used in
production software; however, it can already be argued that thematic analysis based only on
keywords, abstracts and texts does not yield a satisfactory classification. For this reason,
the algorithms being developed combine full-text analysis methods with graph-theoretic methods
that use explicit or implicit connections between the classified objects.
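Keyword-set expansion can be sketched with a small relation dictionary that maps synonyms and Russian translations to a canonical term. In ISTINA such relations come from sources such as Wikipedia and ABBYY services; the hand-written dictionary, the keyword sets and the choice of Jaccard similarity here are assumptions made for illustration.

```python
# Hypothetical relation dictionary: synonyms and translations -> canonical term.
CANONICAL = {
    "collaboration network": "co-authorship graph",
    "граф соавторства": "co-authorship graph",
    "journal rating": "journal ranking",
    "наукометрия": "scientometrics",
}


def expand(keywords):
    """Return the keywords together with the canonical forms of known variants."""
    kws = {k.lower() for k in keywords}
    return kws | {CANONICAL[k] for k in kws if k in CANONICAL}


def jaccard(a, b):
    """Number of shared keywords divided by the number of distinct keywords."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


art1 = {"co-authorship graph", "journal ranking", "scientometrics"}
art2 = {"граф соавторства", "journal rating", "наукометрия"}
print(jaccard(art1, art2))                  # 0.0
print(jaccard(expand(art1), expand(art2)))  # 0.5
```

Expansion keeps the original keywords and only adds canonical forms, so an English article and a Russian one on the same subject start to overlap without losing their own terms.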


4      Thematic analysis based on links between objects

Using links between objects (that is, a graph of objects) makes it possible to supplement or
refine the analyzed data when information is lacking. Objects in the graph can be of
the same type, for example, articles and links between articles, or they can be of differ-
ent types, for example, employees and their projects. The goal of graph analysis can be
to expand the search scope or clarify the significance of objects in an existing search
scope.
    One example of supplementing graph data with objects of different types is the problem of
finding experts on a given topic [24]. To search for experts, the objects in the authorship
graph most closely related to each person (articles, monographs, projects, reports, etc.) are
determined, the weights of the corresponding edges are computed, and the keywords of these
objects are extracted. An information portrait of the user is then built from the expanded set
of keywords and the edge weights, and its proximity to the original search query is estimated.
The algorithm was tested on data from the ISTINA scientometric system. Using such algorithms in
citation systems is difficult, since their object-link graph contains only two types of
vertices: authors and publications. In full-fledged scientometric systems, the user's
information portrait is composed of a larger number of object types, which improves the
quality of the results.
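A simplified version of the information-portrait idea is sketched below. Assumptions: each object linked to a person contributes its keywords with the weight of the connecting edge, the portrait is the resulting weighted keyword vector, and proximity to the query is measured by cosine similarity. The algorithm in [24] may weight and expand keywords differently; all names and weights are hypothetical.

```python
import math
from collections import Counter


def portrait(objects):
    """Weighted keyword vector of a person.

    objects: (edge_weight, keywords) for each object linked to the person
    (articles, projects, reports, ...)."""
    p = Counter()
    for weight, keywords in objects:
        for kw in keywords:
            p[kw] += weight
    return p


def cosine(p, query_terms):
    """Cosine similarity between a portrait and a bag of query terms."""
    q = Counter(query_terms)
    dot = sum(p[t] * q[t] for t in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical candidates: edge weights per linked object are assumptions.
experts = {
    "A": [(1.0, ["graph", "scientometrics"]), (2.0, ["scientometrics", "grant"])],
    "B": [(1.0, ["optics"]), (0.5, ["graph"])],
}
query = ["scientometrics", "graph"]
ranking = sorted(experts, key=lambda e: cosine(portrait(experts[e]), query), reverse=True)
print(ranking)  # ['A', 'B']
```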
    An example of refining data on the basis of links between objects of the same type is the
algorithm for disambiguating the authors of articles [25], which is implemented in the ISTINA
system. It is assumed that author groups are fairly stable, so the probability that two authors
publish several joint articles is much higher than the probability of an article in which one
of them is replaced by a full namesake. In accordance with this hypothesis, to resolve the
ambiguity in determining the authors of an article among all possible namesakes, a
co-authorship graph is constructed and the most connected component is selected.
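The disambiguation idea can be illustrated by an exhaustive search over candidate namesakes, assuming that "most connected" means the largest total weight of known co-authorship edges (numbers of earlier joint articles) among the chosen persons. This is a simplification of the algorithm in [25], and the person identifiers and counts are invented. Exhaustive search grows exponentially with the number of ambiguous names, so it only suits short author lists.

```python
from itertools import product

# Hypothetical history: pairs of person ids and their number of joint articles.
JOINT = {
    frozenset({"ivanov_1", "petrov_1"}): 5,
    frozenset({"ivanov_1", "sidorova_1"}): 2,
    frozenset({"ivanov_2", "kuznetsov_1"}): 3,
    frozenset({"petrov_2", "sidorova_1"}): 1,
}


def connectivity(persons):
    """Total weight of co-authorship edges among the chosen persons."""
    return sum(JOINT.get(frozenset({a, b}), 0)
               for i, a in enumerate(persons) for b in persons[i + 1:])


def resolve(candidates_per_name):
    """Choose one person per author name so that the resulting group is most
    strongly connected in the co-authorship graph."""
    return max(product(*candidates_per_name), key=connectivity)


# Article authored by "Ivanov", "Petrov", "Sidorova"; two names have namesakes.
names = [["ivanov_1", "ivanov_2"], ["petrov_1", "petrov_2"], ["sidorova_1"]]
print(resolve(names))  # ('ivanov_1', 'petrov_1', 'sidorova_1')
```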
    Using the co-authorship graph, it is also possible to determine the thematic proximity of
journals without full-text analysis. The main hypothesis of this method is that a significant
share of authors publish articles in their own subject area, so journals in which the same set
of authors publish are thematically similar. Based on this hypothesis, the thematic proximity
of two journals is calculated as a weighted sum over the authors who have publications in both
journals. The weighting takes into account not only the number of the author's publications but
also the author's position in the author list of each article: the weight of an article is
distributed among all its authors, with the first authors receiving more weight than the
others. A formal description of the algorithm is given in [26]. The main difference between
this algorithm and similar algorithms based on full-text or keyword analysis is that it is
insensitive to the language of journals and can therefore find links between journals in
different languages. In addition, the algorithm does not require lengthy training on large text
corpora, while showing a fairly high accuracy of 78%.
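Since the formal description is given in [26], the following is only a sketch under stated assumptions: each article carries a total weight of 1, shared among its authors with the first author receiving twice the share of each co-author, and a shared author contributes the smaller of their weights in the two journals. The journals and authors are hypothetical; note that the language of a journal plays no role in the computation.

```python
from collections import defaultdict

FIRST_AUTHOR_FACTOR = 2.0  # assumption: the first author gets twice a co-author's share


def author_weights(articles):
    """Map journal -> author -> weight.

    articles: (journal, [authors in byline order]). Each article contributes a
    total weight of 1, split among its authors with the first author favoured.
    """
    w = defaultdict(lambda: defaultdict(float))
    for journal, authors in articles:
        shares = [FIRST_AUTHOR_FACTOR] + [1.0] * (len(authors) - 1)
        total = sum(shares)
        for author, share in zip(authors, shares):
            w[journal][author] += share / total
    return w


def proximity(w, j1, j2):
    """Weighted sum over authors who publish in both journals (assumed: min weight)."""
    common = w[j1].keys() & w[j2].keys()
    return sum(min(w[j1][a], w[j2][a]) for a in common)


# Hypothetical data: a Russian- and an English-language journal share authors.
articles = [
    ("J_ru", ["a", "b"]),
    ("J_en", ["a", "c"]),
    ("J_en", ["b", "a"]),
    ("J_bio", ["d"]),
]
w = author_weights(articles)
print(round(proximity(w, "J_ru", "J_en"), 3))  # 1.0
print(proximity(w, "J_ru", "J_bio"))           # 0
```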
    The algorithm described in [26] was further developed by automating the expansion of the
journal search area in the co-authorship graph. The main premise is that the proximity relation
is transitive for highly specialized journals: if two highly specialized journals are
thematically close to a third one, then they are close to each other. However, generalizing
this rule to all journals, including broad-scope ones, is incorrect. For example, the fact that
two journals share authors with the "Bulletin of the Russian Academy of Sciences", a journal of
general scope, does not imply that the two original journals are thematically similar. It is
therefore necessary to use mathematical models that normalize the edge weights in the graph of
journal connections [26]. The studies showed that the best result is achieved by normalizing
each edge weight by the total weight of the edges outgoing from its vertex. After normalization,
the proximity matrix between journals is calculated by comparing paths of length 3 in the
graph. This approach can significantly increase the completeness of thematic search. The final
result is constructed by combining two lists: the closest journals according to the original
thematic proximity matrix and the closest journals according to the extended matrix. Combining
these lists before showing them to the user can increase the completeness of search without
significantly reducing its accuracy.
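The normalization and list-merging steps can be sketched in plain Python. The reading of "paths of length 3" as the entries of the cube of the row-normalized weight matrix is an assumption, as are the function names and the small example graph; [26] may define the extended matrix differently.

```python
def row_normalize(W):
    """Divide every edge weight by the total weight leaving its vertex."""
    result = []
    for row in W:
        s = sum(row)
        result.append([x / s if s else 0.0 for x in row])
    return result


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def top_k(scores, source, k):
    """Indices of the k highest-scoring journals, excluding the source itself."""
    candidates = [j for j, s in enumerate(scores) if j != source and s > 0]
    return sorted(candidates, key=lambda j: scores[j], reverse=True)[:k]


def related_journals(W, source, k=2):
    """Merge the direct top-k list with the top-k list from length-3 paths."""
    P = row_normalize(W)
    P3 = matmul(matmul(P, P), P)  # assumed reading of "paths of length 3"
    direct = top_k(W[source], source, k)
    extended = top_k(P3[source], source, k)
    return direct + [j for j in extended if j not in direct]


# Hypothetical symmetric co-authorship weights between 7 journals;
# journal 4 is a general-scope journal linked to unrelated journals 5 and 6.
W = [
    [0, 9, 0, 0, 1, 0, 0],
    [9, 0, 3, 3, 0, 0, 0],
    [0, 3, 0, 3, 0, 0, 0],
    [0, 3, 3, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 5, 5],
    [0, 0, 0, 0, 5, 0, 10],
    [0, 0, 0, 0, 5, 10, 0],
]
print(related_journals(W, source=0))
```

The direct list comes first, and the extended list only appends journals that are not already present, so it can add a journal that shares no authors with the source directly but is reachable through its specialized neighbours.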
    The software implementation of the algorithm is used in the ISTINA system to provide users
with a convenient interface for thematic search of journals. To perform a search, the user
selects one journal known to them on a given topic by finding it by name and then follows the
"Related journals" link. For convenience, each row of the resulting list shows the journal's
thematic similarity to the original journal, various citation characteristics, and the number
of articles from this journal loaded into the scientometric system.
    The user can open the page of a journal or continue moving through the graph of thematic
links between journals using the links in the "Similar journals" column.
    It should be noted that this algorithm can search for thematic links not only between
journals, but also between other groups of objects with authors. In this example, the
algorithm also searches for conferences similar in topic to the given journal.
    Another important practical task that can be solved using the description of
relationships between objects in a scientometric system is determining the authority of
experts when searching for them by a thematic description. For directed graphs, the
classical algorithm for assessing the authority of vertices is PageRank, which Google
used to rank search results. The algorithm is based on the assumption that an incoming
edge confirms the authority of the vertex it points to, and that this confirmation
carries more weight the higher the authority of the vertex the edge comes from. In
scientometric systems, the algorithm can be used effectively to analyze the citation
graph. For the undirected co-authorship graph and similar graphs in scientometric
systems, a number of other characteristics can be used: degree centrality (the number of
edges incident to each vertex); closeness centrality (based on the average shortest
distance to the other vertices of the graph); betweenness centrality (the number of
shortest paths between all pairs of vertices that pass through a given vertex); influence
centrality, a weighted degree in which the contribution of each edge depends on the
influence of the neighboring vertex (for example, PageRank); cross-clique centrality (the
number of cliques a vertex belongs to); and others. Preliminary experiments on ISTINA
data show that this approach can be quite effective for ranking expert search results,
automatically detecting stable research teams, and similar tasks.
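Several of the measures listed above can be sketched in plain Python. The code below is a hypothetical illustration and not the ISTINA implementation. It computes PageRank by power iteration on a directed citation graph given as `{vertex: [cited vertices]}`, with a damping factor of 0.85 assumed as the usual default. On an undirected co-authorship graph given as `{author: set of co-authors}`, it computes degree, the average shortest distance on which closeness is based, and betweenness via Brandes' algorithm. Both graph encodings are assumptions, and cross-clique centrality is omitted for brevity.

```python
from collections import deque

# Hypothetical illustration, not the ISTINA implementation.


def pagerank(out_links, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank for a directed graph {vertex: [cited vertices]}."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by vertices without outgoing edges is spread over all vertices.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v, targets in out_links.items():
            for t in targets:
                # Each citation passes on a share of the citing vertex's rank.
                new[t] += damping * rank[v] / len(targets)
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank


def degree(adj):
    """Number of co-authors (incident edges) of each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def average_distance(adj, v):
    """Average shortest distance from v to the vertices reachable from it."""
    others = [d for u, d in bfs_distances(adj, v).items() if u != v]
    return sum(others) / len(others) if others else float("inf")


def betweenness(adj):
    """Brandes' betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2 for v, c in score.items()}  # each pair was counted twice


# Hypothetical co-authorship path a - b - c.
coauthors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(degree(coauthors))                 # {'a': 1, 'b': 2, 'c': 1}
print(average_distance(coauthors, "a"))  # 1.5
print(betweenness(coauthors))            # {'a': 0.0, 'b': 1.0, 'c': 0.0}
ranks = pagerank({"p1": ["p3"], "p2": ["p3"], "p3": []})
print(max(ranks, key=ranks.get))         # p3
```

For larger graphs, a library such as NetworkX provides `pagerank`, `degree_centrality`, `closeness_centrality`, and `betweenness_centrality`. These are preferable to a hand-written version once the graph no longer fits a toy example.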


5      Conclusion

The use of thematic analysis algorithms to solve a number of information processing
problems in scientometric systems makes it possible to create convenient services for
searching and processing information. Combining full-text and graph analysis methods can
increase the accuracy and completeness of the results presented. Currently, such services
are not widely used in scientific citation systems. Research in this area carried out on
data from the ISTINA project can provide new mechanisms for searching and processing
scientometric information.


References
 1. Akoev, M.A., Markusova, V.A., Moskaleva, O.V., Pisliakov, V.V.: Rukovodstvo po nau-
    kometrii: indikatory razvitiia nauki i tekhnologii. Izdatelstvo Uralskogo universiteta, Ekate-
    rinburg (2014).
 2. Orlov, A.I.: Naukometriia i upravlenie nauchnoi deiatelnostiu. Upravlenie bolshimi siste-
    mami. Spetsialnyi vypusk 44: Naukometriia i ekspertiza v upravlenii naukoi, 538–568
    (2013).
 3. Brichkovskii, V.V.: Naukometricheskii analiz v informatsionnom obespechenii inno-
    vatsionnoi deiatelnosti. V mire nauki, 8(174), 64–67 (2017).
 4. Afonin, S.A., Kozitsyn, A.S., Shachnev, D.A.: Software Mechanisms for Scientometrical
    Data Aggregation Based on Ontological Representation of the Relational Database Struc-
    ture. Programmnaia inzheneriia, 7(9), 408–413 (2016).
 5. Afonin, S.: Ontology models for access control systems. Proc. of the 3rd Russian-Pacific
    Conference on Computer Technology and Applications (RPC), pp. 1–6 (2018).
 6. WoS journal recommendation service, http://mjl.clarivate.com/home, last accessed
    2020/10/10.
 7. RSF Classifier, http://www.rscf.ru/node, last accessed 2020/10/10.
 8. RFBR Classifier, http://www.rfbr.ru/rffi/ru/contest_documents, last accessed 2020/10/10.
 9. IPC Classifier, http://www.fips.ru, last accessed 2020/10/10.
10. RCS Classifier, http://classinform.ru/oks.html, last accessed 2020/10/10.
11. MSC Classifier, http://www.ams.org/msc/, last accessed 2020/10/10.
12. JEL Classifier, http://www.aeaweb.org/journal/jel_class_system.html, last accessed
    2020/10/10.
13. Scopus and OECD classifiers matching project, http://report03.metrics.ekt.gr/en/appendixIII,
    last accessed 2020/10/10.
14. VINITI classifiers matching project, http://scs.viniti.ru/MapService/mapform.aspx, last ac-
    cessed 2020/10/10.
15. Times Higher Education, http://www.timeshighereducation.com, last accessed 2020/10/10.
16. World University Rankings, http://gtmarket.ru/ratings/the-world-university-rankings/info,
    last accessed 2020/10/10.
17. QS World University Rankings, http://www.topuniversities.com, last accessed 2020/10/10.
18. Kincharova, A.V.: Metodologiia mirovykh reitingov universitetov: analiz i kritika. Univer-
    sitetskoe upravlenie: praktika i analiz, (2), 70–80 (2014).
19. ISTINA project data, http://istina.msu.ru/statistics/activity/, last accessed 2020/10/10.
20. Organization statistics in ISTINA, http://istina.msu.ru/statistics/organization/214524/dynamic,
    last accessed 2020/10/10.
21. Krasnov, F.V.: Sravnitelnyi analiz kollektsii nauchnykh zhurnalov. Trudy SPIIRAN, 18,
    767–793 (2019).
22. Keywords search in RSCI, https://www.elibrary.ru/querybox.asp, last accessed 2020/10/10.
23. Afonin, S.A., Lunev, K.V.: Topic Analysis in Collection of Keyword Tuples. Programmnaia
    inzheneriia, (2), 29–39 (2015).
24. Vasenin, V., Lunev, K., Afonin, S., Shachnev, D.: Methods for intelligent data analysis
    based on keywords and implicit relations: The case of "ISTINA" data analysis system. Proc.
    of the International Conference Actual Problems of Systems and Software Engineering
    (APSSE 2019), IEEE Conference Proceedings, pp. 151–155, United States (2019).
25. Kozitsyn, A.S., Afonin, S.A.: The Resolution of Ambiguities in the Identification of Authors
    of the Publication with the Use of Co-Authors' Graphs in Large Collections of Bibliographic
    Data. Programmnaia inzheneriia, 8(12), 556–562 (2017).
26. Kozitsyn, A.S., Afonin, S.A.: Discovering hidden dependencies between objects based on
    the analysis of large arrays of bibliographic data. Proc. of the International Conference Ac-
    tual Problems of Systems and Software Engineering (APSSE 2019), IEEE Conference Pro-
    ceedings, pp. 320–328, Moscow (2019).