Nina O. Rizun et al. CEUR Workshop Proceedings                                                                                                      63–81


                         Data science tools for economics education: text mining
                         and topic modeling applications
                         Nina O. Rizun1 , Maryna V. Nehrey2,3 and Nataliia P. Volkova4
                         1
                           Gdańsk University of Technology, 11/12 Gabriela Narutowicza, 80-233 Gdańsk, Poland
                         2
                           National University of Life and Environmental Sciences of Ukraine, 15 Heroyiv Oborony Str., Kyiv, 03041, Ukraine
                         3
                           Eidgenössische Technische Hochschule Zürich, Main building, Rämistrasse 101, 8092 Zurich, Switzerland
                         4
                           Alfred Nobel University, 18 Sicheslavska Naberezhna Str., Dnipro, 49000, Ukraine


                                      Abstract
                                      Data science is an interdisciplinary field that uses tools, algorithms, and knowledge of mathematics and statistics
                                      to extract insights from data. Data science has a wide range of applications in various domains, such as business,
                                      marketing, banking, insurance, medicine, tourism, etc. Data science can also enhance the value of economics
                                      education by providing students with relevant skills and competencies for the modern and technologically
                                      advanced society. This paper explores the use of data science tools, especially text mining and natural language
                                      processing, for conducting scientific research and teaching economics. The paper demonstrates how text analytics
                                      and topic modeling can be used to analyze public perception of various topics, such as events, companies, products,
                                      and services. The paper also shows how text analytics and topic modeling can incorporate additional metadata,
                                      such as the characteristics of the comment authors, to reveal differences in their opinions. Furthermore, the paper
                                      reviews the data science study programs for economics at top-20 universities and identifies their strengths and
                                      weaknesses.

                                      Keywords
                                      data science, economics education, text mining, topic modeling, machine learning, natural language processing


                         1. Introduction
                         The year 2020 was a critical moment for the global society, as the COVID-19 pandemic exposed the
                         vulnerabilities and opportunities of various sectors and domains [1, 2, 3, 4, 5]. The education sector was
                         one of the most affected by the pandemic, as it had to undergo a rapid digital transformation, a shift to
                         online learning, and a suspension of educational activities [6, 7, 8, 9, 10]. The field of economics also
                         faced significant changes, such as the digitalization of processes, the adoption of remote work, and the
                         alteration of service and communication with customers [11, 12]. The fast-paced world has become
                         more digital than ever, and the demand for data literacy, data-driven decision making, and data science
                         skills has increased accordingly.
                            Data science is an interdisciplinary field that uses tools, algorithms, and knowledge of mathematics
                         and statistics to extract insights from data. Data science has a wide range of applications in various
                         domains, such as business, marketing, banking, insurance, medicine, tourism, etc. However, the potential
                         of data science in education has been relatively underexplored, and many opportunities for advancing
                         the field have not been fully exploited.
                            Data science can be used in education to address scientific problems, such as in the study of behavior
                         in economics, in macro- and microeconomics, marketing, finance, agriculture, environmental and
                         ecological economics, and so on. Data science can also be used to enhance the teaching and learn-
                         ing of economics by providing students with relevant skills and competencies for the modern and
                         technologically advanced society.

                          CoSinE 2024: 11th Illia O. Teplytskyi Workshop on Computer Simulation in Education, co-located with the XVI International
                          Conference on Mathematics, Science and Technology Education (ICon-MaSTEd 2024), May 15, 2024, Kryvyi Rih, Ukraine
                          " nina.rizun@pg.edu.pl (N. O. Rizun); marina.nehrey@gmail.com (M. V. Nehrey); npvolkova@yahoo.com (N. P. Volkova)
                          ~ https://pg.edu.pl/b5968c8562_nina.rizun (N. O. Rizun); https://scholar.google.com.ua/citations?user=NkrrNKAAAAAJ
                          (M. V. Nehrey); https://scholar.google.com.ua/citations?user=Y18aS7EAAAAJ (N. P. Volkova)
                           0000-0002-4343-9713 (N. O. Rizun); 0000-0001-9243-1534 (M. V. Nehrey); 0000-0003-1258-7251 (N. P. Volkova)
                                   © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073



2. Literature review
Data science offers a long list of tools: linear regression, logistic regression, density estimation, confidence
intervals, hypothesis testing, pattern recognition, clustering, supervised learning, time series, decision
trees, Monte Carlo simulation, naive Bayes, principal component analysis, neural networks, k-means,
recommendation engines, collaborative filtering, association rules, scoring engines, segmentation,
predictive modeling, graphs, deep learning, game theory, arbitrage, cross-validation, model fitting, etc.
Some of these tools were used in the studies reviewed below.
   The teaching of data science, for example, was introduced in [13]; big data and data science methods
are presented in [14, 15, 16, 17, 18, 19, 20]; machine learning is used in [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35]; the Monte Carlo method is presented in [36]; and artificial intelligence is presented in [37, 38, 39, 40].
Data science is developing rapidly. The volume of information, which grows with each passing year, makes
it possible to build high-precision models that simplify and partially automate the decision-making
process. Models are being developed that implement the key data science algorithms for different
areas of economics: financial data science [41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], institutional
economics [53, 54, 55, 56, 57, 58], agriculture [59, 60, 61], taxation [62], and the labor market [63].
   The development of data science for education is discussed in [64, 65, 66, 67, 68].


3. Data science: principles and tools
Data science in education is a multidisciplinary approach to the technologies, processes, and systems used to
extract knowledge from data, understand data, and support decision-making under uncertainty. Data
science draws on mathematics, statistics, statistical modeling, signal processing, computer science and
programming, database technologies, data modeling, machine learning, natural language processing,
predictive analytics, visualization, etc. Data science in education has two aspects of application:
(i) the management and processing of data and (ii) analytical methods for analysis and modeling,
and it includes nine main steps (figure 1). The first aspect covers data systems and their preparation,
including database facilities, data cleansing, engineering, visualization, monitoring, and reporting. The
second aspect covers data analytics, data mining, machine learning, text analytics, probability theory,
optimization, and visualization. The basis of the learning process is the availability of relevant data that
are of sufficient quality and appropriately organized for the task. Primary data often require pre-processing.
First of all, it is necessary to investigate whether the necessary data are available and how they can be
obtained. The data search ends with the creation of a data set in which the collected data can coexist.
Data science has a wide range of tools for data evaluation and preparation, in particular for data mining,
data manipulation (value conversion, data aggregation and reordering, table aggregation, splitting
or merging of values, etc.) and data validation (checking formats, testing value ranges, and looking
values up in tables of permitted values). The problem of missing values is solved by different analytical
methods: simulation, insertion of default values, and statistical simulation. Data science provides broad
opportunities for text analytics. In addition, the use of data science tools facilitates work with big data.
The main approaches in data science are supervised learning models and unsupervised learning models.
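
   The validation and missing-value steps described above can be illustrated by a minimal sketch. The
records, field names, value ranges and table of permitted values are hypothetical, and median imputation
is used here only as one simple form of inserting a default value.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
records = [
    {"age": 34, "region": "north", "income": 52000.0},
    {"age": -5, "region": "north", "income": None},
    {"age": 41, "region": "mars", "income": 61000.0},
    {"age": 29, "region": "south", "income": 47000.0},
]

# Hypothetical lookup table of permitted values.
PERMITTED_REGIONS = {"north", "south", "east", "west"}


def validate(record):
    """Return a list of validation problems found in one record."""
    problems = []
    if not isinstance(record["age"], int):
        problems.append("age: wrong format")
    elif not 0 <= record["age"] <= 120:
        problems.append("age: out of range")
    if record["region"] not in PERMITTED_REGIONS:
        problems.append("region: not a permitted value")
    return problems


def impute_median(rows, field):
    """Replace missing values of `field` with the median of the observed ones."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [dict(r, **{field: fill if r[field] is None else r[field]}) for r in rows]


report = [validate(r) for r in records]
clean = impute_median(records, "income")
```

   Here `report` lists the problems found in each record, and `clean` is a copy of the data in which the
missing income has been filled in.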

3.1. Supervised learning models
Supervised learning is one of the methods of machine learning, in which the model learns on the
basis of labeled data. Supervised learning can address two types of tasks: regression
and classification. The main difference between them is the type of variable that is predicted by the
corresponding algorithm. In regression it is a continuous variable; in classification it is a
categorical variable. Many algorithms have been developed to solve these problems. Among the most
common are linear regression, logistic regression, and decision trees.
  Linear regression. Regression analysis can be considered the basis of statistical research. This
approach involves a wide range of algorithms for forecasting a dependent variable using one or




Figure 1: Data science process.


more factors (independent variables). The advantages of this approach to modeling are
the simplicity and clarity of the results and the speed of training and of producing a forecast. The
disadvantage is that its accuracy is not always sufficient, since linear relationships between
variables are rare in economics and finance.
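
   As an illustration, a one-factor linear regression can be fitted by ordinary least squares in a few
lines. This is a minimal sketch; the data (advertising spend and sales) are hypothetical and were chosen
to be exactly linear so that the result is easy to check by hand.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single factor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical observations: advertising spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [5.0, 7.0, 9.0, 11.0]

a, b = fit_line(spend, sales)
forecast = a + b * 5.0  # forecast for a spend of 5.0
```

   The fitted coefficients can be read directly: `b` is the change in the dependent variable per unit
change in the factor, which is the source of the interpretability mentioned above.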
   Logistic regression is used when it is necessary to predict the value of a binary variable from a
dataset of continuous or categorical variables. Situations where the dependent variable has more than two
possible values can be modeled with a one-vs-all approach, in which a logistic classifier is built for
each possible output value, or with a one-vs-one approach, in which a logistic classifier is built for each
pair of categories of the original variable. The dependence between the independent variables and the
log-odds of the dependent variable in logistic regression is linear; the only difference from linear
regression is the sigmoid function, which converts the linear result into the probability of belonging to a
class within [0; 1]. The advantages and disadvantages of logistic regression follow from those of linear
regression: the speed of the algorithm and the interpretability of the results on the one hand, and
limited accuracy on the other. Logistic regression is often used to construct scoring models.
An important factor in this is the interpretability of its results. The influence of each factor is clearly
expressed by the magnitude of the coefficient 𝑏, which makes it possible to determine which factors
influence the decision positively and to what extent.
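
   The role of the sigmoid function can be shown with a short sketch. The coefficients, the intercept and
the meaning of the two factors below are hypothetical and serve only to show how a linear score is
converted into a probability.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, coefficients, intercept):
    """Probability of the positive class for one observation."""
    score = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(score)


# Hypothetical two-factor model; the coefficients b are illustrative only.
coefficients = [0.8, -1.2]  # e.g. scaled income, number of late payments
intercept = -0.5

p_few_late = predict_proba([2.0, 0.0], coefficients, intercept)
p_many_late = predict_proba([2.0, 3.0], coefficients, intercept)
```

   Because the second coefficient is negative, increasing the second factor lowers the predicted
probability, so `p_many_late` is smaller than `p_few_late`.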
   A decision tree is an approach to both regression and classification. It is widely used in intelligent
data analysis. The decision tree consists of “nodes” and “branches”. The tree nodes hold attributes
that are used to make decisions. To make a decision, one has to descend to the bottom
of the decision tree. The sequence of attributes in a tree, as well as the values that split the nodes
into branches, depends on such parameters as the amount of information or entropy that the attribute
adds to the prediction variable. The advantages of decision trees are the simplicity of interpretation,
greater accuracy in modeling decision-making compared with regression models, the simplicity of
visualization, and the natural modeling of categorical variables (in regression models these have to be
coded by artificial variables). However, decision trees have one significant drawback: low predictive
accuracy [69].
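
   The entropy-based choice of attributes can be sketched as follows. The sketch computes the
information gain of a candidate attribute; the attribute names and the four observations are
hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the data on `attribute`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical loan decisions described by two categorical attributes.
rows = [
    {"employed": "yes", "owns_home": "yes"},
    {"employed": "yes", "owns_home": "no"},
    {"employed": "no", "owns_home": "yes"},
    {"employed": "no", "owns_home": "no"},
]
labels = ["approve", "approve", "reject", "reject"]

gain_employed = information_gain(rows, labels, "employed")
gain_owns_home = information_gain(rows, labels, "owns_home")
```

   In this toy data the attribute `employed` separates the classes perfectly, so it has the larger gain
and would be placed at the top of the tree.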

3.2. Unsupervised learning
Unsupervised learning describes a more complex situation in which, for each observation 𝑖 = 1, ..., 𝑛,
we observe a vector of measurements 𝑥𝑖 but no output variable 𝑦𝑖. With such data, the
construction of linear or logistic regression models is impossible, since there is no variable to predict.
In such a situation, a so-called “blind” analysis is conducted. Such a task belongs to the class of tasks of



unsupervised learning because there is no output variable to guide the analysis. Unsupervised
learning algorithms can be divided into dimensionality reduction algorithms and clustering algorithms. The
main task of clustering is to find patterns in the data that allow the data to be divided into groups,
which can then be analyzed and interpreted.
   K-means is one of the most popular clustering algorithms; its main task is to divide 𝑛 observations
into 𝑘 clusters so as to minimize the sum of squared distances from each observation to the center of the
corresponding cluster. The algorithm is iterative: at each step the cluster centers are recomputed and
the observations are reassigned among them until a stable result is achieved. The benefits of this
clustering algorithm are its simplicity, speed, and ability to process large amounts of data. Its
drawbacks are that the user must specify the number of clusters before computing and that the result
is unstable (it depends on the initial assignment of points to the clusters).
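
   The iteration described above can be written compactly. The sketch below is a plain implementation
for two-dimensional points; the points and the initial centers are hypothetical, and the initial centers
are passed in explicitly to make the dependence on initialization visible.

```python
def kmeans(points, centers, max_iter=100):
    """Iterative k-means for 2-D points; `centers` is the initial guess."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:  # stable result reached
            break
        centers = new_centers
    return centers, clusters


# Hypothetical observations forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

   Running the same function with different initial centers may give a different partition, which is
the instability mentioned above.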
   Hierarchical clustering is an alternative approach to clustering that does not require the number
of clusters to be determined in advance. Moreover, hierarchical clustering ensures the
stability of the result and gives the output an attractive visualization based on the tree-like structure
of observations/clusters, the dendrogram. This clustering algorithm can use different distance metrics and
cluster agglomeration criteria, which makes it very flexible with respect to the data on which clustering
is performed. However, the disadvantage of hierarchical clustering is the need to calculate the matrix
of distances between observations before agglomeration, which complicates the application of this
algorithm to large data and data with many dimensions.
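
   A minimal sketch of agglomerative clustering is given below. It assumes one-dimensional observations,
absolute difference as the distance metric and single linkage as the agglomeration criterion; all three
choices, as well as the data, are illustrative assumptions. For brevity the distances are recomputed at
each step instead of being stored in a matrix.

```python
def single_linkage(points):
    """Agglomerate 1-D points; return the merge history (a flat dendrogram)."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


# Hypothetical observations.
merges = single_linkage([1.0, 2.0, 6.0, 7.5, 20.0])
heights = [height for _, _, height in merges]
```

   The list `merges` records which clusters were joined and at what distance, which is the information
a dendrogram displays; cutting it at a chosen height yields a partition without fixing the number of
clusters in advance.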
   Time series analysis. A time series consists of observations collected at a fixed
interval, such as daily demand, monthly profit growth rates, or the number of flights. Time
series analysis plays an important part in data analysis across a wide range of areas, from the analysis
of exchange rates to sales forecasting [70, 71]. One of the tasks of time series analysis is the extraction
of trend and seasonal components and the construction of a forecast. Many algorithms have been
developed for this purpose; we consider two models, ARIMA and Prophet.
   The ARIMA algorithm is one of the most common algorithms for forecasting time series. The basic
idea is to use previous values of the time series to predict future ones. Any number of lags can be used,
which makes this approach difficult to tune, because the parameters must be selected so as
to minimize the error without overfitting the model. ARIMA is often used for short-term forecasting. A
disadvantage is the difficulty of fitting a model when there are multiple seasonal patterns.
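
   The idea of predicting future values from previous ones can be shown on the simplest member of the
ARIMA family, a first-order autoregression (one lag, no differencing, no moving-average term). The sketch
below fits it by least squares; it is not a full ARIMA implementation, and the series is hypothetical and
was generated without noise so that the fit can be checked by hand.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_(t-1)."""
    x = series[:-1]  # lagged values
    y = series[1:]   # current values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, c, phi, steps):
    """Iterate the fitted recursion `steps` periods ahead."""
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Hypothetical series generated by y_t = 2 + 0.5 * y_(t-1), starting at 10.
series = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(series)
next_two = forecast(series, c, phi, 2)
```

   Adding more lags, differencing and moving-average terms increases the number of parameters to select,
which is the tuning difficulty noted above.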
   The Prophet algorithm was released by Facebook at the beginning of 2017 for forecasting based on
time series [70]. It is based on an additive model in which nonlinear trends are combined with annual and
weekly seasonality. This approach also makes it possible to model holidays and weekends, which helps to
anticipate the deviations they cause in a time series. In addition, Prophet is robust to missing values,
shifts in the trend, and outliers, which is an important advantage over ARIMA. Another advantage is the
rather high speed of training, as well as the ability to handle large-scale time series.
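
   The additive structure can be written compactly. The symbols below are not defined elsewhere in
this paper and are introduced here only for illustration, following the notation commonly used for
Prophet:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

   Here $g(t)$ is the (possibly nonlinear) trend, $s(t)$ collects the periodic components such as
annual and weekly seasonality, $h(t)$ represents the effects of holidays, and $\varepsilon_t$ is the
residual term.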


4. Topic modeling in data science
By text mining in natural language we mean the application of methods for the computer analysis and
representation of texts in order to achieve a quality comparable to that of “manual” processing, for
further use in various tasks and applications. One of the pressing tasks of automatic
text mining is topic modelling.

4.1. Latent Dirichlet Allocation
Topic modelling is a statistical approach to extracting the hidden semantics that occur in a collection
of documents or reviews. The Latent Dirichlet Allocation (LDA) model proposed in [72] is one of the
most notable approaches to unsupervised topic modeling; it assumes that documents and the words
within them are derived from a “generative probabilistic model”. Within the class of unsupervised
statistical topic models, themes are defined as distributions over a vocabulary of words that represent
a semantically interpretable “topic” [73]. The “meaning” of those topics (usually, in the form of topic Label




and topic Description) is an emergent quality of the relationship between words [74, 75]. The task of
recognizing topic meaning is often fraught with difficulty and requires a triangulation
approach, namely: (i) a literature review of existing topics found in the analyzed
problem domain; (ii) independent work of experts on assigning labels to topics; (iii) joint
expert discussions in order to compare and revise the obtained labelling results.
  The main assumptions of the LDA method are as follows [76]: (i) each document is represented as a
mixture of topics; (ii) each topic is present in many documents; (iii) each word within a given document
belongs to exactly one topic; (iv) each document can be represented as a vector of proportions that
denote what fraction of its words belong to each topic.
  The basic LDA model is shown in figure 2.


Figure 2: Latent Dirichlet allocation model [77].

    Figure 2 serves as a visual explanation of the model and can be described as follows: (i) we have 𝐷
documents and 𝐾 topics; (ii) each topic 𝑘 is represented by 𝛽𝑘, its distribution over the words of the
vocabulary; (iii) each document 𝑑 is represented by its topic proportions 𝜃𝑑, where 𝜃𝑑,𝑘 is
the topic proportion for topic 𝑘 in document 𝑑. Finally, we have (iv) for the 𝑛-th word in document
𝑑, a topic assignment 𝑧𝑑,𝑛 (which depends on the per-document topic proportions 𝜃𝑑) and (v) for each
document 𝑑, observed words 𝑤𝑑,𝑛, each of which is an element of the fixed vocabulary (and depends on the
topic assignment 𝑧𝑑,𝑛 and all of the topics 𝛽1:𝐾) [77].
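
    The generative process described above can be simulated directly. In the following minimal sketch
the vocabulary, the two topic-word distributions and the Dirichlet parameters are hypothetical; the
sketch only generates a document from given topics and does not estimate topics from data.

```python
import random


def dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(beta, alpha, n_words, vocabulary, rng):
    """Generate one document: theta_d, then z_(d,n) and w_(d,n) for each word."""
    theta = dirichlet(alpha, rng)  # per-document topic proportions
    assignments, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(len(beta)), weights=theta)[0]  # topic assignment
        w = rng.choices(vocabulary, weights=beta[z])[0]      # observed word
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Hypothetical vocabulary and two topics (rows of beta sum to 1).
vocabulary = ["price", "market", "nurse", "ward"]
beta = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: economic words
    [0.0, 0.0, 0.5, 0.5],  # topic 1: hospital words
]

rng = random.Random(0)
theta, assignments, words = generate_document(beta, [0.5, 0.5], 8, vocabulary, rng)
```

    Fitting LDA reverses this process: given only the observed words, it infers the topic proportions
and the topic-word distributions that could have produced them.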
    It is evident that data scientists, in cooperation with other scientific domains, increasingly seek ways
to apply NLP, and especially LDA topic modelling techniques, to extract, organize, recognize, label and
classify customers’ opinions and experiences [78]. The following examples demonstrate the possibilities of
applying LDA topic modelling to tasks in: (i) human resources management, (ii) service quality assessment,
(iii) research & development policy coordination and (iv) strategic planning in universities.
    Kobayashi et al. [79] used topic modelling to summarize worker attributes, find worker
attribute constructs, and use these to cluster jobs. In total, 140 main topics were identified, including
such skills as interpersonal communication (vocabulary of words: communication, written, oral, verbal,
interpersonal, presentation, effective, listening); analytical and problem-solving skills (vocabulary of
words: problem, solving, analytical, solver, troubleshooting, approach, abilities, capabilities); data
analytical skills (vocabulary of words: data, analysis, quantitative, research, statistics, economics,
statistical, modeling); and willingness to travel and the ability to work on a flexible schedule
(vocabulary of words: travel, willingness, willing, work, time, needed, internationally, international),
among others. As the authors noted, topic modelling showed that it is possible not only to classify job
information from vacancies but also to derive behavioral characteristics that employers value or require
from potential or existing job holders. Moreover, as a further step of this research, the authors planned
to analyse trends in the worker attributes required by organizations (i) over time, (ii) across
occupations and companies, and (iii) across geographical regions, and also (iv) to build a network of
work activities in order to examine relationships among tasks.
    Wallace et al. [80] and Sharma et al. [81] captured the main positive and negative words within latent
aspects (topics), which characterise interpersonal manner, technical competence, and systems issues




[82] from online physician reviews. Similarly to the previous work, James et al. [83], building on the
categorization of López et al. [82], examined unstructured textual feedback on physicians in order to
determine (i) how the extracted sentiment and topics compare with the traditionally identified dimensions
of service quality in healthcare and (ii) what tone and topic elements drive patients’ service quality
ratings. The main finding was the following list of topics and their tone: (1) Negative system quality:
Staff and Timeliness (vocabulary of words: office, staff, time, doctor, wait, appointment); (2) Positive
interpersonal quality: Physician Compassion (vocabulary of words: doctor, caring, great, knowledgeable,
excellent, recommend); (3) Negative system quality: Experience (vocabulary of words: told, don’t, doctor,
ask, bad, money, call); (4) Positive Technical quality: Family (vocabulary of words: doctor, questions,
staff, practice, children, son, pregnancy); (5) Positive Technical quality: Surgery (vocabulary of words:
surgery, pain, procedure, staff, hospital, knee, cancer, age); (6) Negative Technical quality: Diagnosis
(vocabulary of words: years, treatment, medical, patient, conditions, test, diagnosis, time, treated).
The obtained results allowed the authors to establish the degree to which the identified aspects
(topics) influence the general perception of a physician’s quality, as well as the behavioural
characteristics of patients when choosing a doctor online, depending on the content of comments and the
overall rating.

4.2. Structural topic modelling
When conducting research on the basis of textual documents or customer comments, researchers often
have more information “about the text” than “about the content of the text”. From the perspective of
topic modelling as a statistical approach, the existence of such information “about the text” (metadata)
allows and motivates the inclusion in the model of additional covariates that can influence the following
components of the topic model: (1) The proportion of the document devoted to the topic (“prevalence of
the topic”). For example, we may know that “clients who buy products online are more likely to talk
about delivery problems than clients who buy offline”. (2) The word rates used in discussing the topic
(“topical content”). For example, we can clarify that “when talking about delivery problems,
clients who buy products online are more likely to discuss problems with returning products, while
clients who buy offline are more likely to discuss staff rudeness issues” [84]. Such possibilities are
offered by structural topic modelling (STM), an extension of the LDA framework [74, 84, 85].
   Drawing analogies with LDA: (i) each document in STM arises as a mixture over 𝐾 topics; (ii) topic
proportions (𝜃𝑑) can be correlated (a limitation of LDA); (iii) topic prevalence 𝜃𝑑 can be influenced by
a set of covariates 𝑋 through a standard regression model; (iv) for each word 𝑤𝑛 in
document 𝑑, a topic 𝑍𝑑,𝑛 is drawn from the document-specific distribution; and (v) conditional on
that topic, a word is chosen from a multinomial distribution over words parameterized by 𝛽𝑑,𝑘,𝑣, where
𝑘 = 𝑍𝑑,𝑛. This distribution can include a second set of covariates 𝑌 [84]. Thus, the main difference
between the LDA and STM models (figure 3) is that the prevalence (content) parameters, determined
in LDA by the general prior Dirichlet parameters 𝛼 (𝜂), are replaced in the STM model with
prior structures specified in the form of generalized linear models parameterized by document-specific
covariates 𝑋 (𝑌) [86]. These covariates inform either the topic prevalence (covariates 𝑋) or the topical
content (covariates 𝑌) latent variables with information “about the text” (metadata).
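
   How a prevalence covariate can shift topic proportions may be illustrated with a simplified sketch.
It maps a linear predictor built from document covariates to topic proportions through a softmax
function and omits the random variation and topic correlations of the full model, so it should be read
as an illustration of the idea only. The topic names, the covariate and the coefficient values are
hypothetical.

```python
import math


def softmax(scores):
    """Convert real-valued scores into proportions that sum to 1."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_prevalence(x, gamma):
    """Topic proportions implied by covariates x and coefficients gamma.

    gamma[j][k] is the (hypothetical) effect of covariate j on topic k.
    """
    n_topics = len(gamma[0])
    scores = [sum(x[j] * gamma[j][k] for j in range(len(x))) for k in range(n_topics)]
    return softmax(scores)


topics = ["delivery problems", "staff rudeness"]
gamma = [
    [0.0, 0.0],  # intercept
    [1.0, 0.0],  # indicator: buys online (1) or offline (0)
]

offline = expected_prevalence([1, 0], gamma)
online = expected_prevalence([1, 1], gamma)
```

   With these illustrative coefficients, the expected share of the first topic is higher for online
clients than for offline clients, mirroring the delivery example given earlier.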


5. Example of structural modelling algorithms application in
   education
In order to study customer perception of the quality of services, assess customer satisfaction with goods
or services received, and identify factors that influence customer acceptance of new offers on the
market, students were asked to use STM tools. As a data source, 610 textual comments about hospitals
from the site http://www.ratemyhospital.ie/ (posted over two years, 2018–2019) were used. The STM
package allows additional variables to be used, which demonstrates the power of metadata for topic
modelling. To this end, the textual comment data were extended with information about (1) hospital
ownership (private, public) and (2) sentiment (positive or negative) (table 1) [87]. After that, all steps
of text pre-processing




Figure 3: A graphical illustration of the structural topic model [76].


were performed.

Table 1
Comments before pre-processing.
   Comments                                                                  Hospital Ownership   Sentiment
   A lovely friendly patient-focussed hospital                               Public               Positive
   Consultant I found seriously lacking compassion for my mother             Public               Negative
   the patient. Sniggered while informing us that while my mother’s
   condition is uncomfortable, it is not life threatening.To be frank,
   consultant spoke down to us.
   Tullamore is a very clean hospital and looks very well. All staff I had   Private              Positive
   the pleasure of meeting were lovely and very professional at all times.
   The staff in all capacities do not receive enough thanks for the jobs
   they do

   First, the STM model setup was performed. To determine the optimal number of topics, STM
models with 10 to 30 topics were built and analyzed. Semantic coherence is maximized when the
most probable words in a given topic frequently co-occur, and it is a metric that correlates well
with human judgment of topic quality. High semantic coherence is relatively easy to achieve, though, if
there are only a few topics dominated by very common words, so we looked at both the semantic
coherence and the exclusivity of words to topics. The best number of topics should therefore yield topics
that are both highly coherent and highly exclusive. Looking at figure 4, we conclude that 15 topics best
meet these criteria. Most of the topics in this case are above the average exclusivity and
have high coherence, especially compared with the other numbers of topics, whose results are often spread
out along both axes. The 15-topic STM model was selected on the basis of a subjectively optimal
combination of the average semantic coherence and exclusivity outcomes.
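
   The co-occurrence idea behind semantic coherence can be made concrete with a short sketch. It
assumes the commonly used document co-occurrence form of the metric, in which each pair of top words
contributes the logarithm of (number of documents containing both words + 1) divided by the number of
documents containing the higher-ranked word. The toy corpus and word lists are hypothetical, and the
sketch is not the implementation used by the STM package.

```python
import math


def semantic_coherence(top_words, documents):
    """Sum of log co-occurrence ratios over pairs of a topic's top words.

    `top_words` must be ordered from most to least probable, and every
    word must occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            both = doc_freq(top_words[i], top_words[j])
            score += math.log((both + 1) / doc_freq(top_words[j]))
    return score


# Hypothetical tokenized comments.
documents = [
    ["staff", "kind", "nurse"],
    ["staff", "kind"],
    ["food", "poor"],
    ["food", "staff"],
]

coherent = semantic_coherence(["staff", "kind"], documents)
mixed = semantic_coherence(["staff", "poor"], documents)
```

   Words that tend to appear in the same comments give a higher (less negative) score than words that
never appear together, which is why the metric has to be balanced against exclusivity.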
   As a result, for the 15-topic model we obtained (i) the topic-word distribution 𝛽; (ii) the
document-topic proportions 𝜃; (iii) lists of Highest Prob, FREX, Lift and Score keywords (Highest Prob:
the words with the highest probability within each topic; FREX: words that are both frequent and
exclusive, identifying words that distinguish topics; Lift: gives more weight to words that appear less
frequently in other topics by dividing their frequency in the topic by their frequency in other topics;
Score: words are weighted by dividing the log frequency of the word in the topic by the log frequency in
other topics [85, 88, 89]); and (iv) the set of documents most strongly associated with each topic.
Figure 5 provides information on




Figure 4: Semantic coherence and exclusivity of STM models.


the share of the different topics in the overall corpus.


Figure 5: Expected topic proportions over corpus.


   Second, students needed to carry out the topic labelling step. For that: (1) two students independently
labelled the topics to produce the first version of labels based on the top weighted keywords; (2) the two
students discussed the labels and resolved discrepancies in labelling; (3) the two students independently
refined the topic labels based on a computationally guided deep reading of the 20 most representative
comments for each topic; (4) the two students agreed on the final 15 topic labels and jointly developed the topic




descriptions (short summaries of the topic content) [87]. The result of topic labelling is presented
in table 2.

Table 2
Topic labels.
 #  Topic label                               Topic keywords                                                 Topic proportion, %
 1  Appointment Time Reliability              time, service, wait, appoint, nurses, clinic, profession       4.47
 2  Communication Skills                      nurses, rude, hospital, patient, found, staff, ward            6.34
 3  Service Standards                         hospital, consult, year, many, staff, standard, old            9.45
 4  Waiting Time                              staff, hospital, member, given, sever, hour, time              3.03
 5  Staff Feedback/Explanation                staff, kind, time, patient, depart, great, explain             8.09
 6  Patient-Focusing Service                  ask, hospital, doctor, day, told, week, care                   2.56
 7  Maternity Unit/Care                       baby, doctor, midwife, time, inform, midwife, week             2.89
 8  Personnel Reliability/Treatment           scare, staff, receive, excel, thank, ward, treatment           11.81
 9  Food Service                              hospital, staff, need, food, poor, good, doctor                8.10
 10 Hospital Environment                      hospital, mother, conditions, room, week, inform, doctor       4.48
 11 Care and Recovery                         nursed, care, good, great, love, doctor, patient               9.29
 12 A&E/Admission                             pain, hospital, appoint, staff, still, patient, never          5.37
 13 Information Exchange with Patient/Family  hour, doctor, wait, told, seen, blood, home                    9.99
 14 Service Rapidness                         hospital, staff, well, profession, attend, efficiency, visit   8.31
 15 Ward/Hospital’s Facilities                patient, staff, trolley, corridor, time, ward, hospital        5.82
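
The deep-reading part of the labelling procedure depends on picking the texts that are most representative of each topic. A minimal sketch of one way to do this is given below, assuming that "most representative" means the highest share of the topic in the document. The function name, the matrix theta, and the toy values are hypothetical; in the procedure described above n would be 20.

```python
from typing import List


def most_representative(theta: List[List[float]], topic: int, n: int = 20) -> List[int]:
    """Return indices of the n documents with the highest share of `topic`.

    theta[d][k] is the assumed share of topic k in document d.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    order = sorted(range(len(theta)), key=lambda d: theta[d][topic], reverse=True)
    return order[:n]


# Toy document-topic matrix: 4 documents, 3 topics (illustrative values only).
theta = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.25, 0.25, 0.50],
    [0.15, 0.55, 0.30],
]

# Two most representative documents for the first and the second topic.
top_for_first = most_representative(theta, topic=0, n=2)
top_for_second = most_representative(theta, topic=1, n=2)
print(top_for_first, top_for_second)
```

The returned indices can then be used to pull the corresponding texts for independent reading by each labeller.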


   Third, the STM covariate analysis could be performed. At this stage, we aimed to evaluate the
effect of Sentiment on the formation of more positively and more negatively oriented aspects of hospital
service quality (HSQ). Thus, we use the Sentiment metadata as a covariate in the STM model. Formally, we
can identify an aspect as negative if, according to the results of effect estimation, the proportion of
this aspect in negative comments (Sentiment = Negative) is significantly higher than in
positive comments (Sentiment = Positive). According to the results of our experiment, 5 topics (33.33%)
are positive (right side of figure 6), and 10 topics (66.67%) are negative (left side of figure 6).
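
The decision rule above can be sketched as follows, assuming that effect estimation yields, for every topic, a point estimate of the proportion difference and a 95% confidence interval. The sign convention (Positive minus Negative), the class and function names, and all numbers are assumptions made for illustration; they are not the estimates behind figure 6.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TopicEffect:
    """Hypothetical container for one topic's estimated Sentiment effect."""
    topic: str
    diff: float     # assumed: proportion in Positive minus proportion in Negative comments
    ci_low: float   # lower bound of the 95% confidence interval for diff
    ci_high: float  # upper bound of the 95% confidence interval for diff


def classify(effect: TopicEffect) -> str:
    """Label a topic by the side of zero on which its whole interval lies."""
    if effect.ci_low > 0:
        return "positive"
    if effect.ci_high < 0:
        return "negative"
    return "undetermined"  # interval covers zero: no significant difference


# Illustrative numbers only.
effects: List[TopicEffect] = [
    TopicEffect("Topic A", 0.031, 0.012, 0.050),
    TopicEffect("Topic B", -0.044, -0.061, -0.027),
    TopicEffect("Topic C", 0.004, -0.009, 0.017),
]
labels = {e.topic: classify(e) for e in effects}

# Most negative first, mirroring a left-to-right reading of an effect plot.
ranked = sorted(effects, key=lambda e: e.diff)
for e in ranked:
    print(f"{e.topic}: diff={e.diff:+.3f} -> {labels[e.topic]}")
```

Sorting by the point estimate gives the ordering from the most negative to the most positive topic, which is how a ranking of topics by power of influence can be read off.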


Figure 6: Difference in the power of Sentiment influence on topic proportion.


   The dots in figure 6 indicate the mean values of the estimated proportion differences (power
of influence, PI) with 95% confidence intervals, which allow us to evaluate the relative degree of
influence of sentiment on the aspects of hospital service quality. For example, the five most negative
topics are (1) Information Exchange with Patient/Family (Topic 13), with the highest power of negative
influence; (2) Communication Skills (Topic 2); (3) A&E/Admission (Topic 12); (4) Waiting Time (Topic 4);
and (5) Patient-Focusing Service (Topic 6). In turn, the two most positive topics are (1) Service
Rapidness (Topic 14) and (2) Personnel Reliability/Treatment (Topic 8). Knowledge about the topics with
a positive and a negative impact of comment Sentiment makes it possible to indicate the strength of
patient satisfaction or dissatisfaction with hospital service quality.
   Fourth, the power of Time influence on the dynamics of positive and negative topics (from 2018 to
2019) should be evaluated using the STM model (with Year and Sentiment as covariates). In terms of the
influence of the Time factor on service quality, the following four groups of HSQ topics can be
distinguished: (1) topics causing growth in patient satisfaction with the HSQ over time: positive
topics with a positive dynamic over time; (2) topics causing a recession in patient satisfaction with
the HSQ over time: positive topics with a negative dynamic over time; (3) topics causing growth in
patient dissatisfaction with the HSQ over time: negative topics with a positive dynamic over time;
(4) topics causing a recession in patient dissatisfaction with the HSQ over time: negative topics with
a negative dynamic over time.
   The slope of the regression (the dependence between the proportion of positive/negative aspects and
time) is used as an indicator of the direction and growth rate (GR) of change in the level of
positive or negative comments describing a topic. The four charts (figure 7 a, b, c, d) show examples
of the four possible types of influence of the Time factor on service quality:
   1. Positive impact on service quality over time: the Service Rapidness topic is characterized by
      growth (GR = 1.100763) in patient satisfaction with the HSQ over time (figure 7, b);
   2. Worsening of service quality over time: the Personnel Reliability/Treatment topic is characterized
      by a recession (GR = 0.821713) in patient satisfaction with the HSQ over time (figure 7, a);
   3. Negative impact on service quality over time: the Information Exchange with Patient/Family
      topic is characterized by growth (GR = 1.758421) in patient dissatisfaction with the HSQ over
      time (figure 7, d);
   4. Improvement of service quality over time: the Food Service topic shows a recession in patient
      dissatisfaction (GR = 0.575861) with the HSQ over time (figure 7, c).
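
The four groups above can be sketched as a small decision rule, assuming the direction of change is read from the sign of an ordinary least-squares slope of topic proportion on time. This is an illustrative simplification: the function names and the toy series are hypothetical, and the sketch does not reproduce how the GR values reported above were computed.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the least-squares line y = a + b * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def time_group(topic_sentiment: str, slope: float) -> str:
    """Map a topic's sentiment and trend direction to one of the four groups."""
    if topic_sentiment not in ("positive", "negative"):
        raise ValueError("topic_sentiment must be 'positive' or 'negative'")
    if slope == 0:
        return "no change over time"
    feeling = "satisfaction" if topic_sentiment == "positive" else "dissatisfaction"
    direction = "growth in" if slope > 0 else "recession in"
    return f"{direction} patient {feeling}"


# Hypothetical proportions of one negative topic over four consecutive periods.
periods = [0, 1, 2, 3]
shares = [0.08, 0.09, 0.11, 0.12]
slope = ols_slope(periods, shares)
print(f"slope={slope:.3f}: {time_group('negative', slope)}")
```

A negative topic with a positive slope falls into the third group (growth in patient dissatisfaction), while the same slope for a positive topic would fall into the first group.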
   As a result, students could see that the largest number of aspects (37.5%) have a negative impact on
the HSQ. The highest growth in patient dissatisfaction is shown by the A&E/Waiting Time topic.
Moreover, this growth rate is the largest not only in the Negative impact category but among all
analyzed topics. The most rapid decrease (within the whole set of topics) in the number of positive
comments is shown by the Maternity Unit/Care aspect. The group of topics for which an improvement in
quality is noted accounts for 25.1%. At the same time, Hospital Environment is characterized by the
highest rate of improvement. 16.7% of topics have a positive effect on the HSQ, among which Service
Rapidness and Maternity Unit/Treatment have the largest increase in the number of positive comments.
   Fifth, students may identify the influence of Hospital Ownership on the structure of more positively
and more negatively oriented HSQ aspects (using the Sentiment and Hospital Ownership factors as
covariates in the STM model). For this purpose, the results could be interpreted in terms of (1) the
topics more related to Public Hospital Ownership, that is, those for which, according to the results of
effect estimation, the proportion of the topic in comments about Public hospitals (Hospital
Ownership = Public) is significantly higher than in comments about Private hospitals, and vice versa;
and (2) the direction (positive or negative) of the influence of Hospital Ownership on HSQ. To achieve
the first purpose, Hospital Ownership effect estimation was performed to reveal the aspects for which
the proportion in comments about Public hospitals (Hospital Ownership = Public) is significantly
higher than in comments about Private hospitals, and vice versa.
Figure 7: Examples of identifying the influence of the Year metadata.


   To formalize the rules for the second purpose, namely discovering the influence of Hospital
Ownership on service quality, the following groups of aspects are proposed: (1) topics causing growth
in the level of patient satisfaction with service quality in Public hospitals: positive topics with a
positive dynamic from Private to Public; (2) topics causing growth in the level of patient
satisfaction with service quality in Private hospitals: positive topics with a positive dynamic from
Public to Private; (3) topics causing growth in the level of patient dissatisfaction with service
quality in Public hospitals: negative topics with a positive dynamic from Private to Public;
(4) topics causing growth in the level of patient dissatisfaction with service quality in Private
hospitals: negative topics with a positive dynamic from Public to Private.
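
The grouping rules above can be illustrated with a rough sketch. It swaps in a plain difference of mean topic shares between Public and Private comments with a normal-approximation 95% interval, which is a simplification and not the STM effect estimation used in the study. All names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def mean_difference_ci(public: Sequence[float], private: Sequence[float],
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Difference of mean topic shares (Public minus Private) with an approximate 95% CI."""
    diff = mean(public) - mean(private)
    se = sqrt(variance(public) / len(public) + variance(private) / len(private))
    return diff, diff - z * se, diff + z * se


def ownership_group(topic_sentiment: str, ci_low: float, ci_high: float) -> str:
    """Map a topic's sentiment and ownership effect to one of the four groups."""
    if topic_sentiment not in ("positive", "negative"):
        raise ValueError("topic_sentiment must be 'positive' or 'negative'")
    if ci_low > 0:
        owner = "Public"
    elif ci_high < 0:
        owner = "Private"
    else:
        return "no significant ownership difference"
    feeling = "satisfaction" if topic_sentiment == "positive" else "dissatisfaction"
    return f"growth of patient {feeling} in {owner} hospitals"


# Hypothetical shares of one negative topic in individual comments.
public_shares = [0.12, 0.15, 0.11, 0.14, 0.13]
private_shares = [0.05, 0.06, 0.04, 0.07, 0.05]
diff, low, high = mean_difference_ci(public_shares, private_shares)
print(f"diff={diff:.3f}, CI=({low:.3f}, {high:.3f}): {ownership_group('negative', low, high)}")
```

When the whole interval lies above zero the topic is treated as more related to Public hospitals, below zero as more related to Private hospitals, and an interval covering zero corresponds to a topic associated with both types.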
   According to the results of our experiment, 8 topics are more associated with Public hospitals (right
side of figure 8), 6 topics are more associated with Private hospitals (left side of figure 8), and
one topic (Topic 13) is associated with both types of hospitals. Based on these results, we can conclude
that the four topics (one positive and three negative) that most characterize Public Hospital Ownership
are (1) Service Rapidness (positive); (2) Food Service (negative); (3) Maternity Unit/Care (negative);
and (4) Patient-Focusing Service (negative). In turn, the four aspects that most characterize Private
Hospital Ownership (two positive and two negative) are (1) Appointment Time Reliability (negative);
(2) Service Standards (positive); (3) Staff Feedback/Explanation (positive); and (4) Hospital
Environment (negative).
Figure 8: Difference in the power of Hospital Ownership influence on topic proportion.


   Thus, this example of using STM modeling in teaching shows how versatile and in-depth
research can be carried out using data science. The presented examples demonstrate the nature of the
tasks and approaches that could develop students’ technical and research skills in public perception
analysis. Such approaches also allow students to gain practical experience in studying and interpreting
the influence of additional metadata characterizing the comment authors on differences in their
opinions about events, companies, goods, and services.


6. Data science study programs in economics field
Classical methods of statistical analysis, modeling methods, and data mining are used in economics.
Data analysis in these areas is aimed at the study of causation. In economics, current issues
include policy development, determining the impact of a decision, long-term and short-term planning
and forecasting, choosing the best solution from many possible ones, and many others. Drawing
conclusions is also important in economics. In addition, the modern economy and finance are
characterized by the use of big data, so it is not always possible to use classical methods. Therefore,
data science methods are well suited to economics, where they give positive results.
Data science methods were first used in economic research and gradually penetrated into practice.
Today, economics needs specialists who have knowledge in these areas and are able to apply data science
methods. In response to this market need, universities have begun to implement data science courses
and programs for students of economics. Table 3 presents the courses and programs of the top 20
universities in the world.
   A review of study programs in the field of economics at Ukrainian universities has shown that data
science courses and programs are still being introduced in Ukraine. Currently, there are separate
programs for studying data science, mainly for computer science students. Therefore, we believe that the
prospects that data science opens for modern economists necessitate the introduction of courses and
programs in data science.


7. Conclusions
Data science is a rapidly growing and evolving field that has applications in various domains, such as
research, society, and business. Data science requires significant investments and innovations from
businesses and governments, as well as adequate education and training for students and professionals.
However, as our research has shown, the integration of data science in economics education is still
in its infancy. Only a few leading universities offer data science courses and programs for economics
students, but this trend has not been widely adopted and needs to be further developed.




Table 3
Data science courses and programs for economics at top-20 universities.
     University                     Location              Programs, courses
     Massachusetts Institute        United States         MicroMasters Program in Data, Economics,
     of Technology (MIT)                                  and Development Policy; Computer Science,
                                                          Economics and Data Science – course
     Stanford University            United States         M.S. in Statistics: Data Science; Tackling Big
                                                          Questions Using Social Data Science – course
     Harvard University             United States         Data Science for Business – course;
                                                          Using Big Data to Solve Economic
                                                          and Social Problems – course
     California Institute           United States         Business Analytics – course
     of Technology
     University of Oxford           United Kingdom        MSc in Social Data Science
     ETH Zurich - Swiss Federal     Switzerland           Data Science in Techno-Socio-Economic
     Institute of Technology                              Systems – course
     University of Cambridge        United Kingdom        Economics: Data Science and Policy – course
     Imperial College London        United Kingdom        MSc Business Analytics
     University of Chicago          United States         Economic Policy Analysis – course
     UCL                            United Kingdom        Economics and Statistics BSc;
                                                          Social Sciences with Data Science BSc
     National University            Singapore             Master of Science in Business Analytics
     of Singapore
     Princeton University           United States         Statistics and Machine Learning – course
     Nanyang Technological          Singapore             Master of Science in Analytics
     University
     EPFL                           Switzerland           Master’s program in Data science
     Tsinghua University            China (Mainland)      Master’s Program in Data Science
     University of Pennsylvania     United States         Master of Information Systems Management,
                                                          Business Intelligence and Data Analytics;
                                                          MS in Information Technology,
                                                          Business Intelligence and Data Analytics;
                                                          Online Master of Science in Business Analytics
     Yale University                United States         Applied Econometrics: Politics, Sports,
                                                          Microeconomics; Applied Econometrics:
                                                          Macroeconomic and Finance Forecasting
     Cornell University             United States         Introduction to Data Science – course
     Columbia University            United States         Data Science for Social Good -
                                                          summer program
     The University of Edinburgh    United Kingdom        Statistics with Data Science MSc


   As an example of the use of data science methods in economics education, we have demonstrated the
application of STM-modeling in teaching students. STM-modeling is a technique that allows analyzing
textual data and identifying latent topics based on additional metadata, such as the characteristics of
the text authors. STM-modeling can help students develop their technological and research skills, work
with big data, and study and interpret the differences in opinions about various topics, such as events,
companies, products, and services.
   The STM-modeling technique is just one of the many methods and algorithms that can be used for
modeling and analyzing economic processes. There are numerous examples of how data science can
be applied in economics education, such as using time series analysis to predict the future value of a
cryptocurrency, using regression models to determine customer loyalty or the likelihood of customer
insolvency, etc. Data science offers a rich set of tools and techniques that can enhance the learning and
teaching of economics.
   Education should keep pace with the modern development of the digital economy, digital society,
innovation, and creative entrepreneurship. The use of data science in education should be cross-platform,
that is, applied not only in the study of specific subjects, but also in the teaching of all subjects,
in the interaction of students with each other and with teachers and real experts, in research, and in
individual learning.


References
 [1] M. Velykodna, Psychoanalysis during the COVID-19 pandemic: Several reflections on
     countertransference, Psychodynamic Practice 27 (2021) 10–28. doi:10.1080/14753634.2020.1863251.
 [2] S. Semerikov, H. Kucherova, V. Los, D. Ocheretin, Neural Network Analytics and Forecasting the
     Country’s Business Climate in Conditions of the Coronavirus Disease (COVID-19), in: V. Snytyuk,
     A. Anisimov, I. Krak, M. Nikitchenko, O. Marchenko, F. Mallet, V. V. Tsyganok, C. Aldrich, A. Pester,
     H. Tanaka, K. Henke, O. Chertov, S. Bozóki, V. Vovk (Eds.), Proceedings of the 7th International
     Conference “Information Technology and Interactions” (IT&I-2020). Workshops Proceedings, Kyiv,
     Ukraine, December 02-03, 2020, volume 2845 of CEUR Workshop Proceedings, CEUR-WS.org, 2020,
     pp. 22–32. URL: https://ceur-ws.org/Vol-2845/Paper_3.pdf.
 [3] S. O. Semerikov, T. A. Vakaliuk, I. S. Mintii, V. A. Hamaniuk, V. N. Soloviev, O. V. Bondarenko, P. P.
     Nechypurenko, S. V. Shokaliuk, N. V. Moiseienko, V. R. Ruban, Mask and Emotion: Computer
     Vision in the Age of COVID-19, in: Digital Humanities Workshop, DHW 2021, Association for
     Computing Machinery, New York, NY, USA, 2022, pp. 103–124. doi:10.1145/3526242.3526263.
 [4] M. Velykodna, I. Frankova, Psychological Support and Psychotherapy during the COVID-19
     Outbreak: First Response of Practitioners, Journal of Intellectual Disability - Diagnosis and
     Treatment 9 (2021) 148–161. URL: https://doi.org/10.6000/2292-2598.2021.09.02.1.
 [5] T. Tkachenko, O. Yeremenko, A. Kozyr, V. Mishchanchuk, W. Liming, Integration Aspect of
     Training Teachers of Art Disciplines in Pedagogical Universities, Journal of Higher Education
     Theory and Practice 22 (2022) 138–147. doi:10.33423/jhetp.v22i6.5236.
 [6] T. A. Vakaliuk, V. V. Osadchyi, O. P. Pinchuk, From the digital transformation strategy to the
     productive integration of technologies in education and training: Report 2023, in: T. A. Vakaliuk,
     V. V. Osadchyi, O. P. Pinchuk (Eds.), Proceedings of the 2nd Workshop on Digital Transformation of
     Education (DigiTransfEd 2023) co-located with 18th International Conference on ICT in Education,
     Research and Industrial Applications (ICTERI 2023), Ivano-Frankivsk, Ukraine, September 18-22,
     2023, volume 3553 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 1–8. URL:
     https://ceur-ws.org/Vol-3553/paper00.pdf.
 [7] P. P. Nechypurenko, O. D. Kushnirova, The rebirth of home chemistry experiments: An
     international perspective and the Ukrainian context, Science Education Quarterly 1 (2024) 97–102. URL:
     https://acnsci.org/journal/index.php/seq/article/view/824. doi:10.55056/seq.824.
 [8] S. G. Fashoto, Y. A. Faremi, E. Mbunge, O. Owolabi, Exploring structural equations modelling
     on the use of modified UTAUT model for evaluating online learning, Educational Technology
     Quarterly 2024 (2024) 319–336. doi:10.55056/etq.734.
 [9] S. Adewale, Is virtual learning still virtually satisfactory in the post-COVID-19 era for pre-service
     teachers?, Educational Technology Quarterly 2024 (2024) 152–165. doi:10.55056/etq.713.
[10] K. Meziane Cherif, L. Azzouz, A. Bendania, S. Djaballah, The teachers’ ban or permission of
     smartphone use in Algerian secondary school classrooms, Educational Dimension (2024).
     doi:10.55056/ed.727.
[11] A. Bielinskyi, V. Soloviev, S. Semerikov, V. Solovieva, Identifying stock market crashes by fuzzy
     measures of complexity, Neuro-Fuzzy Modeling Techniques in Economics 10 (2021) 3–45.
     doi:10.33111/nfmte.2021.003.
[12] A. Kiv, P. Hryhoruk, I. Khvostina, V. Solovieva, V. N. Soloviev, S. Semerikov, Machine learning of
     emerging markets in pandemic times, in: A. Kiv (Ed.), Proceedings of the Selected Papers of the
     Special Edition of International Conference on Monitoring, Modeling & Management of Emergent
     Economy (M3E2-MLPEED 2020), Odessa, Ukraine, July 13-18, 2020, volume 2713 of CEUR Workshop
     Proceedings, CEUR-WS.org, 2020, pp. 1–20. URL: https://ceur-ws.org/Vol-2713/paper00.pdf.




[13] R. J. Brunner, E. J. Kim, Teaching data science, Procedia Computer Science 80 (2016) 1947–1956.
[14] H. Chen, R. H. L. Chiang, V. C. Storey, Business intelligence and analytics: From big data to big
     impact, MIS Quarterly 36 (2012) 1165–1188. URL: http://www.jstor.org/stable/41703503.
[15] G. George, E. C. Osinga, D. Lavie, B. A. Scott, Big data and data science methods for management
     research, The Academy of Management Journal 59 (2016) 1493–1507.
[16] A. G. Shoro, T. R. Soomro, Big data analysis: Apache spark perspective, Global Journal of Computer
     Science and Technology: C Software & Data Engineering 15 (2015) 7–14.
[17] J. Xiong, G. Yu, X. Zhang, Research on governance structure of big data of civil aviation, Journal
     of Computer and Communications 5 (2017) 112–118.
[18] L. Cao, Data science: a comprehensive overview, ACM Computing Surveys 50 (2017) 1–42.
     doi:10.1145/3076253.
[19] A. Ignatyuk, O. Liubkina, T. Murovana, A. Magomedova, FinTech as an innovation challenge:
     From big data to sustainable development, E3S Web of Conferences 166 (2020) 13027.
     doi:10.1051/e3sconf/202016613027.
[20] M. Mazorchuk, T. Vakulenko, A. Bychko, O. Kuzminska, O. Prokhorov, Cloud technologies and
     learning analytics: Web application for pisa results analysis and visualization, CEUR Workshop
     Proceedings 2879 (2020) 484–494.
[21] E. J. Parish, K. Duraisamy, A paradigm for data-driven predictive modeling using field inversion
     and machine learning, Journal of Computational Physics 305 (2016) 758–774.
[22] L. Guryanova, R. Yatsenko, N. Dubrovina, V. Babenko, Machine learning methods and models,
     predictive analytics and applications, CEUR Workshop Proceedings 2649 (2020) 1–5.
[23] V. Babenko, A. Panchyshyn, L. Zomchak, M. Nehrey, Z. Artym-Drohomyretska, T. Lahotskyi,
     Classical machine learning methods in economics research: Macro and micro level examples,
     WSEAS Transactions on Business and Economics (2021) 209–217.
     doi:10.37394/23207.2021.18.22.
[24] S. Nosratabadi, A. Mosavi, P. Duan, P. Ghamisi, F. Filip, S. S. Band, U. Reuter, J. Gama, A. H.
     Gandomi, Data science in economics: comprehensive review of advanced machine learning and
     deep learning methods, Mathematics 8 (2020) 1799.
[25] V. Derbentsev, A. Matviychuk, V. N. Soloviev, Forecasting of Cryptocurrency Prices Using
     Machine Learning, in: L. Pichl, C. Eom, E. Scalas, T. Kaizoji (Eds.), Advanced Studies of
     Financial Technologies and Cryptocurrency Markets, Springer, Singapore, 2020, pp. 211–231.
     doi:10.1007/978-981-15-4498-9_12.
[26] A. Kiv, S. Semerikov, V. N. Soloviev, L. Kibalnyk, H. Danylchuk, A. Matviychuk, Experimental
     Economics and Machine Learning for Prediction of Emergent Economy Dynamics, in: A. Kiv,
     S. Semerikov, V. N. Soloviev, L. Kibalnyk, H. Danylchuk, A. Matviychuk (Eds.), Proceedings of
     the Selected Papers of the 8th International Conference on Monitoring, Modeling & Management
     of Emergent Economy, M3E2-EEMLPEED 2019, Odessa, Ukraine, May 22-24, 2019, volume 2422
     of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 1–4. URL:
     https://ceur-ws.org/Vol-2422/paper00.pdf.
[27] A. Kiv, P. Hryhoruk, I. Khvostina, V. Solovieva, V. N. Soloviev, S. Semerikov, Machine learning of
     emerging markets in pandemic times, in: A. Kiv (Ed.), Proceedings of the Selected Papers of the
     Special Edition of International Conference on Monitoring, Modeling & Management of Emergent
     Economy (M3E2-MLPEED 2020), Odessa, Ukraine, July 13-18, 2020, volume 2713 of CEUR Workshop
     Proceedings, CEUR-WS.org, 2020, pp. 1–20. URL: https://ceur-ws.org/Vol-2713/paper00.pdf.
[28] A. E. Kiv, V. N. Soloviev, S. O. Semerikov, H. B. Danylchuk, L. O. Kibalnyk, A. V. Matviychuk,
     A. M. Striuk, Machine learning for prediction of emergent economy dynamics III, in: A. E. Kiv,
     V. N. Soloviev, S. O. Semerikov (Eds.), Proceedings of the Selected and Revised Papers of 9th
     International Conference on Monitoring, Modeling & Management of Emergent Economy (M3E2-
     MLPEED 2021), Odessa, Ukraine, May 26-28, 2021, volume 3048 of CEUR Workshop Proceedings,
     CEUR-WS.org, 2021, pp. i–xxxi. URL: https://ceur-ws.org/Vol-3048/paper00.pdf.
[29] P. V. Zahorodko, Y. O. Modlo, O. O. Kalinichenko, T. V. Selivanova, S. O. Semerikov, Quantum
     enhanced machine learning: An overview, CEUR Workshop Proceedings 2832 (2020) 94–103.




[30] P. V. Zahorodko, S. O. Semerikov, V. N. Soloviev, A. M. Striuk, M. I. Striuk, H. M. Shalatska,
     Comparisons of performance between quantum-enhanced and classical machine learning algorithms
     on the IBM Quantum Experience, Journal of Physics: Conference Series 1840 (2021) 012021.
     doi:10.1088/1742-6596/1840/1/012021.
[31] D. S. Antoniuk, T. A. Vakaliuk, V. V. Didkivskyi, O. Vizghalov, O. V. Oliinyk, V. M. Yanchuk, Using
     a business simulator with elements of machine learning to develop personal finance management
     skills, in: V. Ermolayev, A. E. Kiv, S. O. Semerikov, V. N. Soloviev, A. M. Striuk (Eds.), Proceedings
     of the 9th Illia O. Teplytskyi Workshop on Computer Simulation in Education (CoSinE 2021)
     co-located with 17th International Conference on ICT in Education, Research, and Industrial
     Applications: Integration, Harmonization, and Knowledge Transfer (ICTERI 2021), Kherson,
     Ukraine, October 1, 2021, volume 3083 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp.
     59–70. URL: https://ceur-ws.org/Vol-3083/paper131.pdf.
[32] D. S. Antoniuk, T. A. Vakaliuk, V. V. Didkivskyi, O. Y. Vizghalov, Development of a simulator to
     determine personal financial strategies using machine learning, CEUR Workshop Proceedings
     3077 (2022) 12–26.
[33] S. Zelinska, Machine learning: Technologies and potential application at mining companies, E3S
     Web of Conferences 166 (2020) 03007. doi:10.1051/e3sconf/202016603007.
[34] H. B. Danylchuk, S. O. Semerikov, Advances in machine learning for the innovation economy:
     in the shadow of war, in: H. B. Danylchuk, S. O. Semerikov (Eds.), Proceedings of the Selected
     and Revised Papers of 10th International Conference on Monitoring, Modeling & Management
     of Emergent Economy (M3E2-MLPEED 2022), Virtual Event, Kryvyi Rih, Ukraine, November
     17-18, 2022, volume 3465 of CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp. 1–25. URL:
     https://ceur-ws.org/Vol-3465/paper00.pdf.
[35] Y. O. Hodlevskyi, T. A. Vakaliuk, O. V. Chyzhmotria, O. Chyzhmotria, O. V. Vlasenko, Finding
     Anomalies in the Operation of Automated Control Systems Using Machine Learning, in:
     T. Hovorushchenko, O. Savenko, P. T. Popov, S. Lysenko (Eds.), Proceedings of the 4th International
     Workshop on Intelligent Information Technologies & Systems of Information Security,
     Khmelnytskyi, Ukraine, March 22-24, 2023, volume 3373 of CEUR Workshop Proceedings, CEUR-WS.org,
     2023, pp. 681–698. URL: https://ceur-ws.org/Vol-3373/paper47.pdf.
[36] R. Patriarca, G. Di Gravio, F. Costantino, A Monte Carlo evolution of the Functional Resonance
     Analysis Method (FRAM) to assess performance variability in complex systems, Safety science 91
     (2017) 49–60.
[37] N. Rizun, T. Shmelova, Decision-making models of the human-operator as an element of the
     socio-technical systems, in: Strategic Imperatives and Core Competencies in the Era of Robotics
     and Artificial Intelligence, IGI Global, 2017, pp. 167–204.
[38] O. M. Haranin, N. V. Moiseienko, Adaptive artificial intelligence in RPG-game on the Unity game
     engine, CEUR Workshop Proceedings 2292 (2018) 143–150.
[39] M. V. Marienko, S. O. Semerikov, O. M. Markova, Artificial intelligence literacy in secondary
     education: methodological approaches and challenges, in: S. Papadakis (Ed.), Proceedings of the
     11th Workshop on Cloud Technologies in Education (CTE 2023), Kryvyi Rih, Ukraine, December
     22, 2023, volume 3679 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 87–97. URL:
     https://ceur-ws.org/Vol-3679/paper21.pdf.
[40] A. Bielinskyi, V. Soloviev, V. Solovieva, H. Velykoivanenko, Fuzzy time series forecasting using
     semantic artificial intelligence tools, Neuro-Fuzzy Modeling Techniques in Economics 2022 (2022)
     157–198. doi:10.33111/nfmte.2022.157.
[41] C. Brooks, A. G. F. Hoepner, D. McMillan, A. Vivian, C. W. Simen, Financial data science: the birth
     of a new financial research paradigm complementing econometrics?, The European Journal of
     Finance 25 (2019) 1627–1636. doi:10.1080/1351847X.2019.1662822.
[42] M. L. De Prado, Advances in financial machine learning, John Wiley & Sons, 2018.
[43] H. Danylchuk, N. Chebanova, N. Reznik, Y. Vitkovskyi, Modeling of investment attractiveness
     of countries using entropy analysis of regional stock markets, Global Journal of Environmental
     Science and Management 5 (2019) 227–235. URL: https://www.gjesm.net/article_35558.html.
     doi:10.22034/gjesm.2019.05.SI.25.
[44] A. O. Bielinskyi, S. V. Hushko, A. V. Matviychuk, O. A. Serdyuk, S. O. Semerikov, V. N. Soloviev,
     Irreversibility of financial time series: a case of crisis, in: A. E. Kiv, V. N. Soloviev, S. O. Semerikov
     (Eds.), Proceedings of the Selected and Revised Papers of 9th International Conference on
     Monitoring, Modeling & Management of Emergent Economy (M3E2-MLPEED 2021), Odessa, Ukraine,
     May 26-28, 2021, volume 3048 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 134–150.
     URL: https://ceur-ws.org/Vol-3048/paper04.pdf.
[45] V. Soloviev, O. Serdiuk, S. Semerikov, A. Kiv, Recurrence plot-based analysis of financial-economic
     crashes, CEUR Workshop Proceedings 2713 (2020) 21–40.
[46] V. N. Soloviev, A. Bielinskyi, O. Serdyuk, V. Solovieva, S. Semerikov, Lyapunov Exponents as
     Indicators of the Stock Market Crashes, in: O. Sokolov, G. Zholtkevych, V. Yakovyna, Y. Tarasich,
     V. Kharchenko, V. Kobets, O. Burov, S. Semerikov, H. Kravtsov (Eds.), Proceedings of the 16th
     International Conference on ICT in Education, Research and Industrial Applications. Integration,
     Harmonization and Knowledge Transfer. Volume II: Workshops, Kharkiv, Ukraine, October 06-10,
     2020, volume 2732 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 455–470. URL:
     https://ceur-ws.org/Vol-2732/20200455.pdf.
[47] V. N. Soloviev, V. Solovieva, A. Tuliakova, A. Hostryk, L. Pichl, Complex networks theory and
     precursors of financial crashes, in: A. Kiv (Ed.), Proceedings of the Selected Papers of the
     Special Edition of International Conference on Monitoring, Modeling & Management of Emergent
     Economy (M3E2-MLPEED 2020), Odessa, Ukraine, July 13-18, 2020, volume 2713 of CEUR Workshop
     Proceedings, CEUR-WS.org, 2020, pp. 53–67. URL: https://ceur-ws.org/Vol-2713/paper03.pdf.
[48] I. Khvostina, S. Semerikov, O. Yatsiuk, N. Daliak, O. Romanko, E. Shmeltser, Casual analysis of
     financial and operational risks of oil and gas companies in condition of emergent economy, in:
     A. Kiv (Ed.), Proceedings of the Selected Papers of the Special Edition of International Conference
     on Monitoring, Modeling & Management of Emergent Economy (M3E2-MLPEED 2020), Odessa,
     Ukraine, July 13-18, 2020, volume 2713 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp.
     41–52. URL: https://ceur-ws.org/Vol-2713/paper02.pdf.
[49] L. Guryanova, L. Bogachkova, O. Zyma, M. Novosel, N. Poluektova, V. Gvozdytskyi,
     Models of estimation and analysis of a systemic risk in the banking sector, in: 2020 IEEE 2nd
     International Conference on System Analysis Intelligent Computing (SAIC), 2020, pp. 1–6.
     doi:10.1109/SAIC51296.2020.9239193.
[50] O. V. Kuzmenko, S. V. Lieonov, A. O. Boiko, Data mining and bifurcation analysis of the risk of
     money laundering with the involvement of financial institutions, Journal of International Studies
     13 (2020). URL: https://www.jois.eu/files/22_871_Kuzmenko%20et%20al.pdf.
[51] N. Klymenko, O. Nosovets, L. Sokolenko, O. Hryshchenko, T. Pisochenko, Off-balance
     accounting in the modern information system of an enterprise, Academy of Accounting and
     Financial Studies Journal 23 (2019). URL:
     https://www.abacademies.org/articles/offbalance-accounting-in-the-modern-information-system-of-an-enterprise-8403.html.
[52] V. Derbentsev, S. Semerikov, O. Serdyuk, V. Solovieva, V. Soloviev, Recurrence based entropies
     for sustainability indices, E3S Web of Conferences 166 (2020) 13031.
     doi:10.1051/e3sconf/202016613031.
[53] J. Prüfer, P. Prüfer, Data science for institutional and organizational economics, Technical Report,
     2018. doi:10.2139/ssrn.3137014.
[54] Y. Hrabovskyi, V. Babenko, O. Al’Boschiy, V. Gerasimenko, Development of a Technology for
     Automation of Work with Sources of Information on the Internet, WSEAS Transactions on
     Business and Economics 17 (2020) 231–240.
[55] M. Ilchuk, N. Davydenko, Y. Nehoda, Scenario modeling of financial resources at the enterprise,
     Intellectual Economics 13 (2019). doi:10.13165/IE-19-13-2-05.
[56] M. Oliskevych, G. Beregova, V. Tokarchuk, Fuel consumption in Ukraine: Evidence from vector
     error correction model, International Journal of Energy Economics and Policy 8 (2018). URL:
     https://www.econjournals.com/index.php/ijeep/article/view/6825/3925.
[57] Y. Shi, J. Zhu, V. Charles, Data science and productivity: A bibliometric review of data science
     applications and approaches in productivity evaluations, Journal of the Operational Research
     Society 72 (2020) 975–988.
[58] A. Matviychuk, I. Strelchenko, S. Vashchaiev, H. Velykoivanenko, Simulation of the crisis contagion
     process between countries with different levels of socio-economic development, CEUR Workshop
     Proceedings 2393 (2019) 485–496.
[59] A. Kaminskyi, M. Nehrey, N. Rizun, The impact of COVID-induced shock on the risk-return
     correspondence of agricultural ETFs, CEUR Workshop Proceedings 2713 (2020) 204–218.
[60] M. Nehrey, A. Kaminskyi, M. Komar, Agro-economic models: a review and directions for research,
     Periodicals of Engineering and Natural Sciences 7 (2019) 702–711. URL:
     http://pen.ius.edu.ba/index.php/pen/article/view/579.
[61] I. Voronenko, A. Skrypnyk, N. Klymenko, D. Zherlitsyn, Y. Starychenko, Food security risk in
     Ukraine: assessment and forecast, Agricultural and Resource Economics: International Scientific
     E-Journal 6 (2020) 63–75.
[62] M. Ausloos, R. Cerqueti, T. A. Mir, Data science for assessing possible tax income manipulation:
     The case of Italy, Chaos, Solitons & Fractals 104 (2017) 238–256.
[63] M. Oliskevych, I. Lukianenko, Labor force participation in Eastern European countries: nonlinear
     modeling, Journal of Economic Studies 46 (2019) 1258–1279.
[64] National Academies of Sciences, Engineering, and Medicine, Division on Engineering and
     Physical Sciences, Division of Behavioral and Social Sciences and Education, Computer Science and
     Telecommunications Board, Board on Mathematical Sciences and Analytics, Committee on Applied
     and Theoretical Statistics, Board on Science Education, Committee on Envisioning the Data Science
     Discipline: The Undergraduate Perspective, Data science for undergraduates: Opportunities and
     options, The National Academies Press, Washington, DC, 2018. doi:10.17226/25104.
[65] N. Volkova, N. Rizun, M. Nehrey, Data science: Opportunities to transform education, CEUR
     Workshop Proceedings 2433 (2019) 48–73.
[66] I. Perevozova, V. Babenko, Z. Krykhovetska, I. Popadynets, Holistic approach based assessment
     of social efficiency of research conducted by higher educational establishments, E3S Web of
     Conferences 166 (2020) 13022.
[67] A. E. Kiv, M. P. Shyshkina, S. O. Semerikov, A. M. Striuk, Y. V. Yechkalo, AREdu 2019 – How
     augmented reality transforms to augmented learning, CEUR Workshop Proceedings 2547 (2020)
     1–12.
[68] I. Dimitrov, N. Davydenko, A. Lotko, A. Dimitrova, Comparative study of main determinants of
     entrepreneurship intentions of business students, in: 2019 International Conference on Creative
     Business for Smart and Sustainable Growth (CREBUS), IEEE, 2019, pp. 1–4.
[69] G. James, D. Witten, T. Hastie, R. Tibshirani, An introduction to statistical learning: with
     Applications in R, volume 112 of Springer Texts in Statistics, Springer, New York, 2013.
     doi:10.1007/978-1-4614-7138-7.
[70] M. Nehrey, T. Hnot, Using recommendation approaches for ratings matrixes in online marketing,
     Studia Ekonomiczne (2017) 115–130.
[71] I. Voronenko, M. Nehrey, S. Kostenko, I. Lashchyk, V. Niziaieva, Advertising strategy management
     in Internet marketing, Journal of Information Technology Management 13 (2021) 35–47.
[72] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent dirichlet allocation, Journal of Machine Learning Research
     3 (2003) 993–1022. URL: https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf.
[73] M. E. Roberts, B. M. Stewart, D. Tingley, C. Lucas, J. Leder-Luis, B. Albertson, S. Gadarian, D. Rand,
     Topic models for open ended survey responses with applications to experiments, American Journal
     of Political Science 58 (2014) 1064–1082.
[74] S. D. Robinson, Temporal topic modeling applied to aviation safety reports: A subject matter
     expert review, Safety Science 116 (2019) 275–286.
[75] P. DiMaggio, M. Nag, D. Blei, Exploiting affinities between topic modeling and the sociological
     perspective on culture: Application to newspaper coverage of US government arts funding, Poetics
     41 (2013) 570–606.
[76] M. E. Roberts, B. M. Stewart, E. M. Airoldi, A model of text for experimentation in the social
     sciences, Journal of the American Statistical Association 111 (2016) 988–1003.
[77] D. M. Blei, Probabilistic topic models, Communications of the ACM 55 (2012) 77–84.
[78] V. B. Kobayashi, S. T. Mol, H. A. Berkers, G. Kismihók, D. N. Den Hartog, Text classification for
     organizational researchers: A tutorial, Organizational Research Methods 21 (2018) 766–799.
[79] V. B. Kobayashi, S. T. Mol, H. A. Berkers, G. Kismihók, D. N. Den Hartog, Text mining in
     organizational research, Organizational Research Methods 21 (2018) 733–765.
[80] B. C. Wallace, M. J. Paul, U. Sarkar, T. A. Trikalinos, M. Dredze, A large-scale quantitative analysis of
     latent factors and sentiment in online doctor reviews, Journal of the American Medical Informatics
     Association 21 (2014) 1098–1103.
[81] R. D. Sharma, S. Tripathi, S. K. Sahu, S. Mittal, A. Anand, Predicting online doctor ratings from
     user reviews using convolutional neural networks, International Journal of Machine Learning and
     Computing 6 (2016) 149.
[82] A. López, A. Detz, N. Ratanawongsa, U. Sarkar, What patients say about their doctors online: a
     qualitative content analysis, Journal of General Internal Medicine 27 (2012) 685–692.
[83] T. L. James, E. D. V. Calderon, D. F. Cook, Exploring patient perceptions of healthcare service
     quality through analysis of unstructured feedback, Expert Systems with Applications 71 (2017)
     479–492.
[84] M. E. Roberts, B. M. Stewart, D. Tingley, Stm: An R package for structural topic models, Journal
     of Statistical Software 91 (2019) 1–40.
[85] M. E. Roberts, B. M. Stewart, D. Tingley, E. M. Airoldi, et al., The structural topic model and
     applied social science, in: Advances in neural information processing systems workshop on topic
     models: computation, application, and evaluation, volume 4, Harrahs and Harveys, Lake Tahoe,
     2013, pp. 1–20.
[86] N. Hu, T. Zhang, B. Gao, I. Bose, What do hotel customers complain about? text analysis using
     structural topic model, Tourism Management 72 (2019) 417–426.
[87] A. Ojo, N. Rizun, Structural and temporal topic models of feedbacks on service quality–a path to
     theory development?, in: Americas Conference on Information Systems (AMCIS 2020). Healthcare
     Informatics & Health Information Tech (SIGHealth), volume 15, 2020.
[88] J. Chang, lda: Collapsed Gibbs sampling methods for topic models, 2015. URL:
     https://rdrr.io/cran/lda/.
[89] T. L. Griffiths, M. Steyvers, Finding scientific topics, Proceedings of the National Academy of
     Sciences 101 (2004) 5228–5235.

