=Paper= {{Paper |id=Vol-3126/paper2 |storemode=property |title=Capabilities of data mining as a cognitive tool: methodological aspects |pdfUrl=https://ceur-ws.org/Vol-3126/paper2.pdf |volume=Vol-3126 |authors=Genady Shevchenko,Oleksander Shumeiko,Volodymyr Bilozubenko }} ==Capabilities of data mining as a cognitive tool: methodological aspects== https://ceur-ws.org/Vol-3126/paper2.pdf
Capabilities of Data Mining As a Cognitive Tool: Methodological
Aspects
Genady Shevchenko 1, Oleksander Shumeiko 2 and Volodymyr Bilozubenko 3
1 Scientific Center, Noosphere Company, Gagarin avenue, 103-A, Dnipro, 49055, Ukraine
2 Dniprovsk State Technical University, Dniprobudivska Street, 2, Kamyanske, 51900, Ukraine
3 Scientific Center, Noosphere Company, Gagarin avenue, 103-A, Dnipro, 49055, Ukraine

                  Abstract
                  Gaining a competitive advantage in many industries is possible only if the available digitized
                  data contains genuine knowledge. To that end, a preliminary step is needed to identify hidden
                  and non-obvious regularities in the data using Data Mining (DM) methods. It is critical to know
                  the capabilities and limits of DM methods as a cognitive tool in order to build an effective
                  strategy for addressing real-life business problems.
                  The aim of this paper is to specify, within the methodology of scientific cognition, the
                  capabilities and limits of applicability of DM methods. This will enhance the efficiency with
                  which these methods are used by experts in this field as well as by a wide range of professionals
                  in other fields who need to analyze empirical data.
                  The paper specifies and supplements the basic stages of scientific cognition in terms of using
                  DM methods. The question of what DM methods contribute to the methodology of scientific
                  cognition is raised, and the level of cognitive value of their results is determined.
                  A scheme is developed that illustrates the relationship between the levels of the methodology
                  of scientific cognition; it supplements the well-known classification schemes and demonstrates
                  the maximum capabilities of DM methods. In terms of the methodology of scientific cognition,
                  a crucial fact is established: the limit of applicability of any DM method is the lowest, first
                  level of the methodology of scientific cognition, the level of techniques. The result of the
                  processing, in the form of empirical regularities (ER), can serve as a basis for these techniques.

                  Keywords
                  Data Mining, data, scientific cognition, methodology, empirical regularity, hypothesis.


1. Introduction

   The enhancement of existing cognitive tools and the search for new ones have always aroused great interest, owing to their crucial importance for the development of human civilization: knowledge gained through the use of these tools is the primary means of transforming reality.
   In recent decades, Data Mining (DM) methods and tools have become widely used. (Data Mining is not a single method but a large variety of different methods for identifying regularities. In the English-speaking world, the term “Machine Learning” is commonly used to denote all Data Mining technologies.) This happened in response to practical needs in different sectors of the national economy, as well as in the context of the growing capacities of computers, which made it possible to accumulate and process large amounts of heterogeneous data.

ISIT 2021: II International Scientific and Practical Conference
«Intellectual Systems and Information Technologies», September
13–19, 2021, Odesa, Ukraine
EMAIL: nikk.gena@gmail.com (A. 1); shumeiko_a@ukr.net (А.
2); bvs910@gmail.com (A. 3)
ORCID: 0000-0003-3984-9266 (A. 1); 0000-0002-8170-9606 (A.
2); 0000-0003-1269-7207 (A. 3)
              ©️ 2021 Copyright for this paper by its authors. Use permitted under Creative
              Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)
2. Main result

    DM algorithms, implemented as computer programs, have in effect created new research tools. At the same time, the widespread use of DM methods raises the methodological question of whether we correctly understand their capabilities and limits, as well as the results of data processing, in terms of scientific cognition. At first glance this seems an abstract question, but clarifying it will enable the parties concerned to achieve better results and organize more effective business processes.
    It should be noted that, to varying degrees, attention has already been paid to the methodology of image recognition, as DM methods were formerly called, by internationally acclaimed scientists [1-7]. However, these scientists have not conducted an analysis in terms of the theory of cognition.
    In fact, most studies on DM methods almost always raise a question that belongs rather to the methodology of cognition²: “What knowledge can be derived from the accumulated data and what is its level?” This question demonstrates the immaturity of our concept of DM in terms of the theory of cognition, and it also summarizes multiple practical problems of DM application that are not addressed by enhancing computing capabilities or parallel computing in the field of Big Data processing [6]. Besides the difficulty of choosing and applying the right DM methods to the problems at hand, there is no full understanding of their capabilities and limits, of the process (phasing) itself, or of the results obtained in terms of the theory of cognition. At the same time, an understanding of the capabilities and limits of DM can lead to a significant modification of the methodology for studying and addressing practical problems, as well as to more efficient application of the methods under consideration.
    The practice of analytics shows that DM methods are indeed a powerful, multidisciplinary tool of scientific cognition. Moreover, it is DM methods that can serve as a basis for the convergence of approaches to scientific cognition in the humanities and in the natural sciences. A huge number of applied problems are addressed on the basis of DM, and data mining algorithms are being improved. However, very little effort is made in terms of methodology and almost no research is carried out in this field, which substantially hinders the further development of DM. Generally speaking, DM could become a basis for a disciplinary revolution in the theory of cognition and could even make it possible to generate major innovations in the field of intelligent technologies.
    The aim of the study is to specify the capabilities and limits of applying DM methods in terms of the methodology of scientific cognition.
    The process of cognition is a process of gaining and using knowledge, and it proceeds in stages [8]. The first stage of cognition is singling out and stating the problem, followed by experience, observation, experiment and study of the phenomenon. The second stage is summarizing the facts, identifying their essential parts, and forming hypotheses and conclusions on their basis, i.e. a certain abstraction from the first stage. At the third stage, the abstractions found, i.e. the hypotheses or conclusions made before, are tested. This is a universal scheme of cognition (Fig.1).
    These issues became particularly pronounced when computers started to be used for data mining. The key issue, critical in terms of cognition, is what the use of DM has introduced into the methodology of scientific cognition and what the application of its outcomes can result in.
    The application of DM tools starts only when the data has already been prepared in the form of datasets in which the objects are represented by sets of multidimensional data, for example in the form of a training dataset (TD). It is generally acknowledged that all DM methods are based on the inductive method of cognition, i.e., in the case of DM (inductive learning), the program learns from the presented empirical data. In other words, the program builds some kind of general rule from the presented empirical data, which is obtained, in particular, through observation or experiment³. With any DM method, the final outcome is represented in the form of one or another model that reflects certain regularities intrinsic to the data under study. These might logically be called empirical regularities (ER), and they are probably hypotheses in nature (as was very cautiously assumed by Zakrevsky [4]).


2 Although, most often, it is raised in purely practical terms: how far we can trust the knowledge we gain.
3 The matters of choosing the feature vector and data pre-processing are beyond the competence of DM.
Figure 1: General Scheme of scientific cognition (using DM methods)
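
The inductive step described above can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions, not the authors' method: the training dataset `TD`, its values and the function `induce_threshold_rule` are all hypothetical. It induces one general rule of the form "if x > threshold then class 1" from labelled observations, which is perhaps the simplest kind of ER a program can build from empirical data.

```python
from typing import List, Optional, Tuple

# Hypothetical training dataset (TD): each object is (feature value, class label).
# The numbers are invented purely for illustration.
TD: List[Tuple[float, int]] = [
    (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
    (3.5, 1), (4.0, 1), (4.5, 1), (5.0, 1),
]


def induce_threshold_rule(
    data: List[Tuple[float, int]],
) -> Tuple[Optional[float], float]:
    """Return (threshold, accuracy) of the best rule 'x > threshold -> class 1'."""
    values = sorted({x for x, _ in data})
    # Candidate thresholds: midpoints between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_accuracy = None, -1.0
    for t in candidates:
        # Count the objects whose label agrees with the rule's prediction.
        correct = sum((x > t) == bool(y) for x, y in data)
        accuracy = correct / len(data)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


threshold, accuracy = induce_threshold_rule(TD)
print(f"ER: if x > {threshold} then class 1 (accuracy on TD: {accuracy:.2f})")
```

The rule produced here can be read directly by an expert, but whether it says anything about the subject area is a question the rule itself cannot answer.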

   Therefore, the major outcome of applying DM methods is ER in the subject area under study, which can be represented in different forms and types. These ER are, in fact, “drafts”: critical auxiliary material for preparing the dialectical “leap”, the complicated transition from the empirical level of cognition to the theoretical one through devising hypotheses, which are the driver of science (Fig.1). To clarify the level of knowledge, in terms of the theory of scientific cognition, that is derived when analyzing data accumulated in a certain subject area, we cannot do without the methodology of scientific cognition, which “studies the methods for building the scientific knowledge and methods which are used to gain new knowledge, i.e., methods and forms of scientific study, dealing with the technical aspect to a minimum extent” [9]. It is customary to distinguish the following levels of the methodology of scientific cognition [9]:
   1. Technique: the lowest level; examples are directions, techniques, etc.
   2. Scientific method, relying on knowledge of the respective regularities, i.e. the theory of the given subject area.
   3. General scientific method: a quite general method of scientific study whose applicability extends beyond the limits of any one scientific discipline and relies on the existence of regularities common to different areas.
   4. Methods used in all sciences without exception, although in different forms and modifications. These are the most general methods of scientific cognition, and their study is the subject of philosophical methodology (philosophy of science).
   In view of the foregoing, it is proposed to supplement the above classification of the levels of the methodology of scientific cognition (items 1-4, suggested by V. Shtoff) with the scheme presented in Fig.2. It is a graphical supplement to these items that relates the outcomes of the inductive approach, which is the basis of all DM methods, in a specific subject area to the levels of scientific cognition.
   The main purpose of this scheme is to show the relationship between the levels of cognition and, most importantly, to demonstrate the limit of the capabilities of DM methods. It follows from the above and from the illustration that the highest level of the methodology of scientific cognition attainable through DM methods or tools is the lowest of these levels: the level of techniques.
   As a result, ER is readily understood by an expert in the subject area and is suitable for further processing as a basis for a possible transition to a hypothesis. The hypothesis is not an automated result of induction and not an inductive inference, but one of the possible answers to the problem encountered, including in the form of assumptions, suggestions and their implications, with further testing in practice. However, the emergence of a hypothesis is mandatory⁴.

4 The need for a hypothesis stems from the fact that laws are not directly seen in individual facts, no matter how many of them are accumulated, as the essence does not coincide with phenomena. A hypothesis is a statement whose truth or falsity has not yet been established. The process of establishing the truth or falsity of the hypothesis is the process of cognition as a dialectic unity of practical (experimental, object-tool) and theoretical activity. Eventually, however, it is only confirmation by practice that converts a hypothesis into a true theory and probable knowledge into credible knowledge; conversely, refutation in practice and experiment discards the hypothesis as a false assumption [9].
Figure 2: Relationship between the levels of cognition
   Abbreviations: ER – empirical regularities. TD – training dataset. VD – validation dataset
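
As a hedged illustration of the roles of TD and VD named in the abbreviations, the following Python sketch checks one hypothetical ER on both datasets. The rule, the records and the helper `support_and_confidence` are invented for illustration; support and confidence are used here as one common way to quantify a rule and are an assumption, not something the paper prescribes.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


def support_and_confidence(
    rule_if: Callable[[Record], bool],
    rule_then: Callable[[Record], bool],
    data: List[Record],
) -> Tuple[float, float]:
    """Support: share of objects the rule covers. Confidence: share of covered objects it gets right."""
    covered = [r for r in data if rule_if(r)]
    if not covered:
        return 0.0, 0.0
    hits = sum(1 for r in covered if rule_then(r))
    return len(covered) / len(data), hits / len(covered)


# Hypothetical ER found on the training data: "if x > 3.0 then label == 1".
def rule_if(r: Record) -> bool:
    return r["x"] > 3.0


def rule_then(r: Record) -> bool:
    return r["label"] == 1


# Invented training (TD) and validation (VD) datasets.
TD = [{"x": 1.0, "label": 0}, {"x": 2.0, "label": 0},
      {"x": 4.0, "label": 1}, {"x": 5.0, "label": 1}]
VD = [{"x": 1.5, "label": 0}, {"x": 3.2, "label": 0},
      {"x": 4.5, "label": 1}, {"x": 3.8, "label": 1}]

td_support, td_conf = support_and_confidence(rule_if, rule_then, TD)
vd_support, vd_conf = support_and_confidence(rule_if, rule_then, VD)
print(f"TD: support={td_support:.2f}, confidence={td_conf:.2f}")
print(f"VD: support={vd_support:.2f}, confidence={vd_conf:.2f}")
```

A rule that holds perfectly on TD may hold less well on VD, as it does with these invented records; such a check stays at the empirical level and does not by itself turn the ER into a hypothesis.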

    Using DM, it becomes possible to generate ER automatically; these are the “bricks” for advancing and building hypotheses as part of addressing a specific problem. That is, the emergence of a hypothesis is preceded by a very important stage of generation (search) of ER, and this is precisely the contribution of DM to the process of cognition! Furthermore, this stage occurs automatically, based on algorithms invented by human beings and implemented as computer programs (a human just selects the suitable algorithm and loads the data).
    At the same time, the possible transition from ER to a hypothesis as probable knowledge is neither easy nor straightforward. Here dialectical logic, the methodology of scientific cognition and the psychology of scientific creativity intersect or converge (Fig.3). The analysis of the structure of such a complex dialectic intersection is one of the challenges in the transition from the empirical basis to the theoretical building [9].

Figure 3: Transition from ER to hypothesis

    This transition also requires considerable and nontrivial intellectual work, certain effort by the researcher and, most probably, additional research, which to a large degree can be considered an extension of DM. This is the case with almost all known DM methods. Therefore, the ultimate outcome that can be obtained directly from applying any DM tool is the ER level and, methodologically speaking, the level of techniques. Neural networks, as a class of DM models, need to be mentioned separately. The use of neural networks in some cases yields rather good results; unfortunately, however, they produce no effect in terms of the methodology of scientific cognition: we cannot build ER in this case and, even more so, we are unable to proceed to formulating and devising hypotheses! Their level is limited to “primitive” recognition (classification), like that performed by animals, and nothing more, and this is not in itself new knowledge. From the cognitive and methodological points of view, it is a dead-end type of DM or a completely different paradigm of scientific cognition. This is also discussed in [10], where the authors try to “feel out” ways of understanding how neural networks work.
    It should be noted that it is the advancement of ER that the cytogram processing web service (URL: https://www.data4logic.net/ru/Services/CellsAttributes) focuses on, enabling cytology researchers to generate ER and, with a high probability of success, to devise on their basis hypotheses that address the problems they face. The images provided in the works on leukemia diagnostics [11, 12] can serve as an example of this approach.
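
The automatic generation (search) of ER discussed earlier in this section can be sketched as follows. This is a minimal Python illustration assuming that ER take the form of simple if-then rules between binary attributes; the data `OBJECTS`, the function `mine_rules` and the confidence threshold are hypothetical and do not reproduce any service mentioned in the text.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple

# Hypothetical observations: each object is described by the set of
# binary attributes it possesses. The data are invented for illustration.
OBJECTS: List[FrozenSet[str]] = [
    frozenset({"a", "b"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"b"}),
    frozenset({"c"}),
    frozenset({"b", "c"}),
]


def mine_rules(
    objects: List[FrozenSet[str]], min_confidence: float
) -> List[Tuple[str, str, float]]:
    """Return rules 'p -> q' between single attributes with confidence >= min_confidence."""
    attributes = sorted(set().union(*objects))
    rules = []
    for p, q in permutations(attributes, 2):
        with_p = [o for o in objects if p in o]
        if not with_p:
            continue
        # Confidence: share of objects having p that also have q.
        confidence = sum(q in o for o in with_p) / len(with_p)
        if confidence >= min_confidence:
            rules.append((p, q, confidence))
    return rules


many = mine_rules(OBJECTS, 0.5)  # permissive threshold: larger set of ER
few = mine_rules(OBJECTS, 0.9)   # strict threshold: fewer, more reliable ER
for p, q, conf in many:
    print(f"if {p} then {q} (confidence {conf:.2f})")
```

With these invented objects, the lower threshold returns more rules and the stricter one returns fewer but more reliable rules; choosing among them and turning any of them into a hypothesis remains the researcher's task.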
    In many cases, solving specific practical problems is actually limited, in terms of cognition, to the level of ER, which is used as a basis for further formulation, in the best case, of a decision-making direction or rule; it thus remains at the first, empirical level of cognition, the lowest of all possible levels [13, 14, 15]. In the short run, this suits business as a sphere of practical activity; in the long run, however, the main thing is lost: finding really new knowledge that can be implemented in innovations, or developing a new method, modus operandi, business model, etc., that will provide a higher-order competitive advantage.
    In a similar way, the level of “primitive” classification inherent to neural networks often suits business. Consequently, it can be ascertained that DM methods are capable of providing only the level of empirical cognition in the specific subject area under study, as well as the level of techniques and directions, which completely fits the schemes shown in Fig.1 and Fig.2.
    Now it becomes clear why no “breakthrough” inventions have been made using DM: such inventions can take place only in a specific subject area, and this requires close cooperation and interaction as well as full-fledged scientific communication with the representatives of that subject area, which is the biggest obstacle to such achievements.
    Hence, the following conclusions can be drawn.
    1. The methods of DM, as well as Big Data, are a new man-machine methodology of empirical cognition.
    2. These methods have their limit in the form of ER represented in different forms.
    3. ER can serve as “drafts” for the preparation, generation and formulation of hypotheses aimed at further, more in-depth cognition of the subject area.
    4. In order to select the best strategy for the use of DM tools, a clear understanding of the goals of problem-solving is needed.
    5. The use of DM tools requires close cooperation with the experts in a specific subject area, which in turn raises a number of questions related to: initiation of such cooperation; skillfulness of the experts in the subject area; statement of the problem in the respective context; building the team to solve the problem, etc.
    6. The “shifting” of DM and Big Data experts to the development of standardized software (cloud services, web services, desktop applications) does not solve the problem of in-depth cognition; there is still a limit represented by empirical cognition: obtaining ER, i.e., in fact, a provisional hypothesis for the given specific subject area. In this case, the burden of solving the specific problem, deepening cognition and clarifying the hypotheses is fully transferred to the experts in the subject area. Full-fledged interaction between the experts in subject areas and the Data Scientist is significantly more painstaking in terms of organizational and communicative cost, but, in our opinion, this approach is able to ensure major breakthroughs in the subject area. An interim option is also possible and is now beginning to be actively used in business. Many companies have realized that, without efficient “task setters” and analysts well-versed in DM tools, the mere use of desktop, web and cloud services is inefficient. From a methodological standpoint, the most critical fact has been established: the limit of applicability of any DM method is the level of ER, i.e. the level of techniques and directions in the specific subject area where data mining methods are used, or of a provisional (working) hypothesis. As of today, it is the only visible and obvious achievement of all DM algorithms. It should be noted that one of the available web services designed to find ER, suitable for researchers who have no special training in mathematics and informatics, is implemented on the ScienceHunter portal (https://www.sciencehunter.net).

3. Conclusions

    Knowing the applicability limits of DM tools makes it possible to understand more fully how to set goals when selecting appropriate DM methods: for example, to choose ones that produce a relatively large set of ER, or ones that produce a limited set of such patterns characterized by greater accuracy. From the methodological point of view, the most important fact has been established: the limit of applicability of DM methods is the level of ER. A huge number of methods and techniques, a variety of developed computer programs, cloud services and other software all end up at one thing, the level of ER. Currently, this is the only observable and obvious achievement of all DM algorithms. Should the result be considered important in terms of cognition? It is quite possible to answer positively, although it should be emphasized that all this refers to the particular subject area in which data mining methods are applied. It should be noted that DM can be understood as an evidentiary or constructive method of cognition, with all the attendant advantages and disadvantages. Finding ER today is implemented in the form of web services (for example, the ScienceHunter portal: https://www.sciencehunter.net), so future research will focus on developing a concept of an automated system for DM, suitable for researchers with no special training in mathematics and computer science.

4. References

[1] M.M. Bongard, Recognition problem, Nauka, Moscow, 1967.
[2] N.G. Zagoruiko, Recognition methods and their application, Soviet radio, Moscow, 1972.
[3] N.G. Zagoruiko, Applied methods of data and knowledge analysis, IM SO RAN, Novosibirsk, 1999.
[4] A.D. Zakrevsky, Recognition logic, Nauka i tekhnika, Minsk, 1988, 118 p.
[5] L.G. Malinovsky, Classification processes - the basis for constructing the sciences of reality, Algorithms for processing experimental data (1986) 155-182.
[6] A. Carbon, M. Jensen, A.-H. Sato, Challenges in data science: a complex systems perspective, Chaos, Solitons & Fractals 90 (2016) 1-7. doi:10.1016/j.chaos.2016.04.020
[7] L. Cao, Data Science: Challenges and Directions, Communications of the ACM 60(8) (2017) 59-68. doi:10.1145/3015456
[8] N.N. Moiseev, Man, environment, society. Problems of formalized description, Nauka, Moscow, 1982.
[9] V.A. Shtoff, Problems of the methodology of scientific knowledge, Vysshaia shkola, Moscow, 1978.
[10] Z. Chen, Y. Bei, C. Rudin, Concept Whitening for Interpretable Image Recognition, Nature Machine Intelligence 2 (2020) 772-782. doi:10.1038/s42256-020-00265-z
[11] D.F. Gluzman (Ed.), Diagnosis of leukemia. Atlas and Practical Guide, MORION, 2000.
[12] V.A. Lekakh, Sick issues of modern oncology and new approaches to the treatment of oncological diseases, Librokom, Moscow, 2011.
[13] W. Chen, H. R. Pourghasemi, S. Zhang, J. Wang, 21 – A Comparative Study of Functional Data Analysis and Generalized Linear Model Data-Mining Methods for Landslide Spatial Modeling, in H. R. Pourghasemi, C. Gokceoglu (Eds.), Spatial Modeling in GIS and R for Earth and Environmental Sciences, Elsevier, 2019, pp. 467-484. doi:10.1016/B978-0-12-815226-3.00021-1
[14] K. Gibert, J. Izquierdo, M. Sànchez-Marrè, S.H. Hamilton, I. Rodríguez-Roda, G. Holmes, Which method to use? An assessment of data mining methods in Environmental Data Science, Environmental Modelling & Software 110 (2018) 3-27. doi:10.1016/j.envsoft.2018.09.021
[15] G. Agapito, P. Guzzi, M. Cannataro, Parallel and Distributed Association Rule Mining in Life Science: a Novel Parallel Algorithm to Mine Genomics Data, Information Sciences 26.07 (2018). doi:10.1016/j.ins.2018.07.055