=Paper=
{{Paper
|id=Vol-3126/paper2
|storemode=property
|title=Capabilities of data mining as a cognitive tool: methodological aspects
|pdfUrl=https://ceur-ws.org/Vol-3126/paper2.pdf
|volume=Vol-3126
|authors=Genady Shevchenko,Oleksander Shumeiko,Volodymyr Bilozubenko
}}
==Capabilities of data mining as a cognitive tool: methodological aspects==
Capabilities of Data Mining As a Cognitive Tool: Methodological Aspects
Genady Shevchenko 1, Oleksander Shumeiko 2 and Volodymyr Bilozubenko 3
1 Scientific Center, Noosphere Company, Gagarin avenue, 103-A, Dnipro, 49055, Ukraine
2 Dniprovsk State Technical University, Dniprobudivska Street, 2, Kamyanske, 51900, Ukraine
3 Scientific Center, Noosphere Company, Gagarin avenue 103-A, Dnipro, 49055, Ukraine
Abstract
Gaining a competitive advantage in many industries is possible only if the available digitized data contains genuine knowledge. It is therefore necessary first to identify the hidden and non-obvious regularities in that data using Data Mining (DM) methods. Knowing the capabilities and limits of DM methods as a cognitive tool is critical for building an effective strategy for addressing real-life business problems.
The aim of this paper is to specify, within the methodology of scientific cognition, the capabilities and limits of applicability of DM methods. This will make the use of these methods more efficient both for experts in this field and for a wide range of professionals in other fields who need to analyze empirical data.
The paper specifies and supplements the basic stages of scientific cognition in terms of using DM methods. It raises the question of what DM methods contribute to the methodology of scientific cognition and determines the level of cognitive value of their results.
A scheme was developed that illustrates the relationship between the levels of the methodology of scientific cognition; it supplements the well-known classification schemes and demonstrates the maximum capabilities of DM methods. In terms of the methodology of scientific cognition, a crucial fact was established: the limit of applicability of any DM method is the first and lowest level of that methodology, the level of techniques. The result of data processing, in the form of empirical regularities (ER), can serve as a basis for these techniques.
Keywords
Data Mining, data, scientific cognition, methodology, empirical regularity, hypothesis.
1. Introduction
Enhancing the existing cognitive tools and searching for new ones have always aroused great interest, owing to their crucial importance for the development of human civilization: the knowledge gained by using these tools is the primary means of transforming reality.
In recent decades, Data Mining (DM) methods and tools have become widely used. (Data Mining is not a single method but a large variety of methods for identifying regularities. In the English-speaking world, the term “Machine Learning” is commonly used to denote all Data Mining technologies.) This happened in response to practical needs in different sectors of the national economy, as well as in the context of the evolving capacities of computers, which made it possible to accumulate and process large amounts of heterogeneous data.
ISIT 2021: II International Scientific and Practical Conference
«Intellectual Systems and Information Technologies», September
13–19, 2021, Odesa, Ukraine
EMAIL: nikk.gena@gmail.com (A. 1); shumeiko_a@ukr.net (А.
2); bvs910@gmail.com (A. 3)
ORCID: 0000-0003-3984-9266 (A. 1); 0000-0002-8170-9606 (A.
2); 0000-0003-1269-7207 (A. 3)
©️ 2021 Copyright for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
2. Main result
DM algorithms, implemented as computer programs, have in effect become new research tools. At the same time, the widespread use of DM methods raises the methodological question of whether we correctly understand their capabilities and limits, as well as the results of data processing, in terms of scientific cognition. At first glance this seems an abstract question, but clarifying it will enable the parties concerned to achieve better results and organize more effective business processes.
It should be noted that the methodology of image recognition, as DM methods were formerly called, has already received attention, to varying degrees, from internationally acclaimed scientists [1-7]. However, these scientists did not conduct an analysis in terms of the theory of cognition.
In fact, most studies on DM methods almost always raise a question that belongs rather to the methodology of cognition²: “What knowledge can be derived from the accumulated data and what is its level?” This question demonstrates the immaturity of our concept of DM in terms of the theory of cognition, and it also summarizes multiple practical problems of DM application, which are not addressed by enhancing computing capabilities or by parallel computing in the field of Big Data processing [6].
Besides the difficulties of choosing and applying DM methods correctly for the problems addressed, there is no full understanding of their capabilities and limits of application, or of the process (phasing) itself and the obtained results, in terms of the theory of cognition. At the same time, an understanding of the capabilities and limits of DM can lead to a significant modification of the methodology for study and for addressing practical problems, as well as to improved efficiency in applying the methods under consideration.
The practice of analytics shows that DM methods are indeed a powerful tool of scientific cognition, which is multidisciplinary in nature. Moreover, it is DM methods that can serve as a basis for the convergence of the approaches to scientific cognition in the humanities and in the natural sciences. A huge number of applied problems are addressed on the basis of DM, and data mining algorithms are being improved. However, in terms of methodology, very little effort is made and almost no research is carried out in this field. This substantially hinders the further development of DM, which, generally speaking, could become a basis for a disciplinary revolution in the theory of cognition and could even generate major innovations in the field of intelligent technologies.
The aim of the study is to specify the capabilities and limits of applying DM methods in terms of the methodology of scientific cognition.
The process of cognition is a process of gaining and using knowledge, and it proceeds in stages [8]. The first stage of cognition is singling out and stating the problem, followed by experience, observation, experiment and study of the phenomenon. The second stage is summarizing the facts, identifying their essential parts, and forming hypotheses and conclusions on their basis, i.e., a certain abstraction from the first stage. At the third stage, the abstractions found, i.e., the hypotheses or conclusions made before, are tested. This is a universal scheme of cognition (Fig.1).
These issues became particularly pronounced when computers started to be used for data mining. The key issue, critical in terms of cognition, is what the use of DM has introduced into the methodology of scientific cognition and what the application of its outcomes can result in.
The application of DM tools starts only when the data has already been prepared in the form of datasets, where the objects are represented by sets of multidimensional data, for example in the form of a training dataset (TD). It is generally acknowledged that all DM methods are based on the inductive method of cognition, i.e., in the case of DM (inductive learning), the program learns from the presented empirical data. In other words, the program builds some kind of general rule based on the presented empirical data, which are obtained, in particular, through observation or experiment³. When any DM method is used, the final outcome is represented in the form of one or another model that reflects certain regularities intrinsic to the data under study. These might logically be called empirical regularities (ER), and they are probably hypotheses in nature (as was very cautiously assumed by Zakrevsky [4]).
² Although, most often, it is raised in purely practical terms: how far we can trust the knowledge we gain.
³ The matters of choosing the feature vector and data pre-processing are beyond the competence of DM.
Figure 1: General Scheme of scientific cognition (using DM methods)
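The inductive step described in this section, in which a program builds a general rule from a training dataset (TD), can be illustrated with a minimal sketch. It is not the authors' method: the rule form (a single-feature threshold), the `Rule` and `induce_rule` names and the toy data are all assumptions made only for illustration. The output is a candidate empirical regularity, which still has to be interpreted by a subject-area expert.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical ER form: 'feature value > threshold => positive class'."""
    feature: int
    threshold: float
    accuracy: float  # share of training objects the rule classifies correctly


def induce_rule(objects: Sequence[Sequence[float]], labels: Sequence[int]) -> Rule:
    """Scan all single-feature threshold rules and return the most accurate one."""
    best: Optional[Rule] = None
    for f in range(len(objects[0])):
        values = sorted({obj[f] for obj in objects})
        # Candidate thresholds lie midway between neighbouring observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            hits = sum((obj[f] > t) == bool(y) for obj, y in zip(objects, labels))
            acc = hits / len(objects)
            if best is None or acc > best.accuracy:
                best = Rule(f, t, acc)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best


# Toy training dataset (assumed): two features per object, label 1 = positive class.
TD = [(1.0, 7.0), (2.0, 3.0), (3.0, 8.0), (6.0, 2.0), (7.0, 9.0), (8.0, 4.0)]
labels = [0, 0, 0, 1, 1, 1]

rule = induce_rule(TD, labels)
print(rule)  # the induced regularity describes the data; it is not yet a hypothesis
```

The sketch stops exactly where the section says DM stops: it returns a regularity found in the data, and explaining why the threshold holds is left to the researcher.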
Therefore, the major outcome of applying DM methods is the ER obtained with these methods in the subject area under study, which can be represented in different forms and types. These ER are, in fact, “drafts”: critical auxiliary material for preparing and developing the dialectical “leap”, the complicated transition from the empirical level of cognition to the theoretical one through devising hypotheses, which are the driver of science (Fig.1). To clarify the level of the knowledge derived, in terms of the theory of scientific cognition, when analyzing the data accumulated in a certain subject area, we cannot do without the methodology of scientific cognition, which “studies the methods for building the scientific knowledge and methods which are used to gain new knowledge, i.e., methods and forms of scientific study, dealing with the technical aspect to a minimum extent” [9]. It is customary to distinguish the following levels of the methodology of scientific cognition [9]:
1. Technique: the lowest level; examples are directions, techniques, etc.
2. Scientific method, relying on knowledge of the respective regularities, i.e., the theory of the given subject area.
3. General scientific method: a quite general method of scientific study, whose applicability extends beyond the limits of any one scientific discipline and relies on the existence of regularities common to different areas.
4. Methods used in all sciences without exception, although in different forms and modifications. These are the most general methods of scientific cognition, and their study is the subject of philosophical methodology (philosophy of science).
In view of the foregoing, it is proposed to supplement the above classification of the levels of the methodology of scientific cognition (items 1-4, suggested by V. Shtoff) with the scheme presented in Fig.2. It is a kind of graphical supplement to these items, illustrating how the outcomes of the inductive approach under study, which is the basis of all DM methods, relate to the levels of scientific cognition in a specific subject area.
The main purpose of this scheme is to show the relationship between the levels of cognition and, most importantly, to demonstrate the limit of the capabilities of DM methods. It follows from the above statement and the illustration that the highest level of the scientific cognition methodology that can be achieved through DM methods or tools is the lowest of these levels: the level of techniques.
As a result, ER is well understood by the expert in the subject area and is applicable for further processing as a basis for a possible transition to the hypothesis. The hypothesis is not the automated result of induction and not an inductive inference, but one of the possible answers to the problem encountered, including in the form of assumptions, suggestions and their implications, with further testing in practice. However, the emergence of a hypothesis is mandatory⁴.
⁴ The need for a hypothesis stems from the fact that the laws are not directly seen in individual facts, no matter how many of them are accumulated, as the essence does not coincide with phenomena. A hypothesis is a statement whose truth or falsity has not yet been established. The process of establishing the truth or falsity of the hypothesis is the process of cognition as a dialectic unity of practical (experimental, object-tool) and theoretical activity. However, eventually it is only confirmation by practice that converts a hypothesis into a true theory and probable knowledge into credible knowledge; conversely, refutation in practice and experiment discards the hypothesis as a false assumption [9].
Figure 2: Relationship between the levels of cognition
Abbreviations: ER – empirical regularities. TD – training dataset. VD – validation dataset
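One possible reading of the TD and VD abbreviations is that candidate ER derived from the training dataset are kept only if they also hold on the validation dataset. The sketch below illustrates that reading under stated assumptions: the paper does not specify this procedure, and the rule predicates, the `retain` helper, the accuracy thresholds and the data are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Obj = Tuple[float, float]
Predicate = Callable[[Obj], bool]


def accuracy(rule: Predicate, objects: Sequence[Obj], labels: Sequence[int]) -> float:
    """Share of objects for which the rule's verdict matches the label."""
    hits = sum(rule(o) == bool(y) for o, y in zip(objects, labels))
    return hits / len(objects)


def retain(candidates: Dict[str, Predicate], vd: Sequence[Obj],
           vd_labels: Sequence[int], min_accuracy: float) -> List[str]:
    """Keep the names of candidate ER whose accuracy on VD reaches the threshold."""
    return [name for name, rule in candidates.items()
            if accuracy(rule, vd, vd_labels) >= min_accuracy]


# Candidate ER, assumed to have been produced earlier from a training dataset.
candidates: Dict[str, Predicate] = {
    "x0 > 4.5": lambda o: o[0] > 4.5,
    "x1 > 5.0": lambda o: o[1] > 5.0,
    "x0 + x1 > 9.0": lambda o: o[0] + o[1] > 9.0,
}

# Toy validation dataset (assumed), label 1 = positive class.
VD: List[Obj] = [(2.0, 8.0), (3.0, 1.0), (5.0, 6.0), (9.0, 3.0)]
VD_labels = [0, 0, 1, 1]

print(retain(candidates, VD, VD_labels, min_accuracy=0.7))  # larger, less strict set
print(retain(candidates, VD, VD_labels, min_accuracy=0.9))  # smaller, more accurate set
```

Raising `min_accuracy` shrinks the retained set, so the threshold is a goal-setting choice between many ER and fewer, more accurate ones. Either way the result remains a set of regularities for an expert to turn into hypotheses.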
Using DM, it becomes possible to automatically generate ER, which are the “bricks” for advancing and building hypotheses as part of addressing a specific problem. That is, the emergence of a hypothesis is preceded by a very important stage, the generation (search) of ER, and this is precisely the contribution of DM to the process of cognition. Furthermore, this stage occurs automatically, based on algorithms invented by human beings and implemented as computer programs (a human just selects the suitable algorithm and loads the data).
At the same time, the possible transition from ER to a hypothesis as probable knowledge is neither easy nor straightforward. Here dialectical logic, the methodology of scientific cognition and the psychology of scientific creativity intersect or converge (Fig.3). The analysis of the structure of such a complex dialectic intersection is one of the challenges in the transition from the empirical basis to the theoretical building [9].
Figure 3: Transition from ER to hypothesis
This also requires considerable and nontrivial intellectual work and effort by the researcher and, most probably, additional research, which, to a large degree, can be considered an extension of DM. This is the case with almost all known DM methods. Therefore, the ultimate outcome that can be obtained directly by applying any DM tool is the level of ER and, methodologically speaking, the level of techniques. Neural networks, as a class of DM models, need to be mentioned separately. The use of neural networks in some cases yields rather good results; however, unfortunately, they produce no effect in terms of the methodology of scientific cognition: we cannot build ER in this case and, what is more, we are unable to proceed to formulate and devise hypotheses. Their level is limited to “primitive” recognition (classification), as animals do it, and nothing more, and this is not in itself new knowledge. From the cognitive and methodological points of view, it is a dead-end type of DM or a completely different paradigm of scientific cognition. This is also discussed in [10], where the authors try to “feel out” ways of understanding how neural networks work.
It should be noted that the cytogram processing web service (URL: https://www.data4logic.net/ru/Services/CellsAttributes) is focused precisely on advancing ER, enabling research cytologists to generate ER and, with a high probability of success, to devise on their basis hypotheses that address the problems they face. The images provided in the works on leukemia diagnostics [11, 12] can be used as an example of this approach.
In many cases, solving specific practical problems is actually limited, in terms of cognition, to the level of ER, which is used as a basis for further formulation, in the best-case scenario, of a decision-making direction or rule. It thus remains at the first, empirical level of cognition, the lowest of all possible levels [13, 14, 15]. In the short run, this suits business as a sphere of practical activity; in the long run, however, the main thing is lost: finding really new knowledge that can be implemented in innovations, or developing a new method, modus operandi, business model, etc., that will provide a higher-order competitive advantage.
In a similar way, the level of “primitive” classification inherent to neural networks often suits business. Consequently, it can be ascertained that DM methods are capable of providing only the level of empirical cognition in the specific subject area under study, as well as the level of techniques and directions, which completely fits the schemes shown in Fig.1 and Fig.2.
Now it becomes clear why no “breakthrough” inventions are made using DM: such inventions can take place only in a specific subject area, and this requires close cooperation and interaction, as well as full-fledged scientific communication, with the representatives of that subject area, which is the biggest obstacle to such achievements.
Hence, the following conclusions can be drawn.
1. DM methods, together with Big Data, are a new man-machine methodology of empirical cognition.
2. These methods have their limit in the form of ER represented in different forms.
3. ER can serve as “drafts” for the preparation, generation and formulation of hypotheses aimed at further, more in-depth cognition of the subject area.
4. In order to select the best strategy for the use of DM tools, a clear understanding of the goals of problem-solving is needed.
5. The use of DM tools requires close cooperation with experts in a specific subject area, which in turn raises a number of questions related to: initiating such cooperation; the skill of the experts in the subject area; stating the problem in the respective context; building the team to solve the problem, etc.
6. The “shift” of DM and Big Data experts to developing standardized software (cloud services, web services, desktop applications) does not solve the problem of in-depth cognition; there is still a limit represented by empirical cognition, i.e., obtaining ER, which are in fact provisional hypotheses for the given specific subject area. In this case, the burden of solving the specific problem, deepening cognition and clarifying the hypotheses is fully transferred to the experts in the subject area. Full-fledged interaction between the experts in subject areas and Data Scientists is significantly more demanding in terms of organizational and communicative cost but, in our opinion, this approach is able to ensure major breakthroughs in the subject area. An interim option is also possible, and it is now beginning to be actively used in business. Many companies have realized that, without efficient “task setters” and analysts well-versed in DM tools, merely using desktop, web and cloud services is inefficient. From a methodological standpoint, the most critical fact has been established: the limit of applicability of any DM method is the level of ER, i.e., the level of techniques and directions in the specific subject area where data mining methods are used, or a provisional (working) hypothesis. As of today, this is the only visible and obvious achievement of all DM algorithms. It should be noted that one of the available web services designed to find ER, suitable for researchers who have no special training in mathematics and informatics, is implemented on the ScienceHunter portal (https://www.sciencehunter.net).
3. Conclusions
Knowing the applicability limits of DM tools, it is possible to understand more fully how to set goals when selecting appropriate DM methods: for example, to choose those that produce a relatively large set of ER, or those that produce a limited set of such patterns characterized by greater accuracy. From the methodological point of view, the most important fact has been established: the limit of applicability of DM methods is the level of ER. A huge number of methods and techniques, a variety of developed computer programs, cloud services and other software all end up at one thing, the level of ER. Currently, this is the only observable and obvious achievement of all DM algorithms. Should the result be considered important in terms of cognition? It is quite possible to answer positively, although it should be emphasized that all this refers to a particular subject area in which data mining methods are applied. It should be noted that DM can be understood as an evidentiary or constructive method of cognition, with all the advantages and disadvantages. Finding
ER today is implemented in the form of web services (for example, ScienceHunter portal: https://www.sciencehunter.net), so future research will focus on the development of an automated system concept for DM, suitable for researchers with no special training in mathematics and computer science.
4. References
[1] M.M. Bongard, Recognition problem, Nauka, Moscow, 1967.
[2] N.G. Zagoruiko, Recognition methods and their application, Soviet radio, Moscow, 1972.
[3] N.G. Zagoruiko, Applied methods of data and knowledge analysis, IM SO RAN, Novosibirsk, 1999.
[4] A.D. Zakrevsky, Recognition logic, Nauka i tekhnika, Minsk, 1988, 118 p.
[5] L.G. Malinovsky, Classification processes - the basis for constructing the sciences of reality, Algorithms for processing experimental data (1986) 155-182.
[6] A. Carbon, M. Jensen, A.-H. Sato, Challenges in data science: a complex systems perspective, Chaos, Solitons & Fractals 90 (2016) 1-7. doi:10.1016/j.chaos.2016.04.020
[7] L. Cao, Data Science: Challenges and Directions, Communications of the ACM 60(8) (2017) 59-68. doi:10.1145/3015456
[8] N.N. Moiseev, Man, environment, society. Problems of formalized description, Nauka, Moscow, 1982.
[9] V.A. Shtoff, Problems of the methodology of scientific knowledge, Vysshaia shkola, Moscow, 1978.
[10] Z. Chen, Y. Bei, C. Rudin, Concept Whitening for Interpretable Image Recognition, Nature Machine Intelligence 2 (2020) 772-782. doi:10.1038/s42256-020-00265-z
[11] D.F. Gluzman (Ed.), Diagnosis of leukemia. Atlas and Practical Guide, MORION, 2000.
[12] V.A. Lekakh, Sick issues of modern oncology and new approaches to the treatment of oncological diseases, Librokom, Moscow, 2011.
[13] W. Chen, H. R. Pourghasemi, S. Zhang, J. Wang, 21 - A Comparative Study of Functional Data Analysis and Generalized Linear Model Data-Mining Methods for Landslide Spatial Modeling, in: H. R. Pourghasemi, C. Gokceoglu (Eds.), Spatial Modeling in GIS and R for Earth and Environmental Sciences, Elsevier, 2019, pp. 467-484. doi:10.1016/B978-0-12-815226-3.00021-1
[14] K. Gibert, J. Izquierdo, M. Sànchez-Marrè, S.H. Hamilton, I. Rodríguez-Roda, G. Holmes, Which method to use? An assessment of data mining methods in Environmental Data Science, Environmental Modelling & Software 110 (2018) 3-27. doi:10.1016/j.envsoft.2018.09.021
[15] G. Agapito, P. Guzzi, M. Cannataro, Parallel and Distributed Association Rule Mining in Life Science: a Novel Parallel Algorithm to Mine Genomics Data, Information Sciences 26.07 (2018). doi:10.1016/j.ins.2018.07.055