Nina O. Rizun et al. CEUR Workshop Proceedings 63–81

Data science tools for economics education: text mining and topic modeling applications

Nina O. Rizun1, Maryna V. Nehrey2,3 and Nataliia P. Volkova4

1 Gdańsk University of Technology, 11/12 Gabriela Narutowicza, 80-233 Gdańsk, Poland
2 National University of Life and Environmental Sciences of Ukraine, 15 Heroyiv Oborony Str., Kyiv, 03041, Ukraine
3 Eidgenössische Technische Hochschule Zürich, Main building, Rämistrasse 101, 8092 Zurich, Switzerland
4 Alfred Nobel University, 18 Sicheslavska Naberezhna Str., Dnipro, 49000, Ukraine

Abstract
Data science is an interdisciplinary field that uses tools, algorithms, and knowledge of mathematics and statistics to extract insights from data. It has a wide range of applications in domains such as business, marketing, banking, insurance, medicine, and tourism. Data science can also enhance the value of economics education by providing students with relevant skills and competencies for a modern, technologically advanced society. This paper explores the use of data science tools, especially text mining and natural language processing, for conducting scientific research and teaching economics. The paper demonstrates how text analytics and topic modeling can be used to analyze public perception of various topics, such as events, companies, products, and services. The paper also shows how text analytics and topic modeling can incorporate additional metadata, such as the characteristics of the comment authors, to reveal differences in their opinions. Furthermore, the paper reviews the data science study programs for economics at top-20 universities and identifies their strengths and weaknesses.

Keywords
data science, economics education, text mining, topic modeling, machine learning, natural language processing

CoSinE 2024: 11th Illia O. Teplytskyi Workshop on Computer Simulation in Education, co-located with the XVI International Conference on Mathematics, Science and Technology Education (ICon-MaSTEd 2024), May 15, 2024, Kryvyi Rih, Ukraine
Email: nina.rizun@pg.edu.pl (N. O. Rizun); marina.nehrey@gmail.com (M. V. Nehrey); npvolkova@yahoo.com (N. P. Volkova)
URL: https://pg.edu.pl/b5968c8562_nina.rizun (N. O. Rizun); https://scholar.google.com.ua/citations?user=NkrrNKAAAAAJ (M. V. Nehrey); https://scholar.google.com.ua/citations?user=Y18aS7EAAAAJ (N. P. Volkova)
ORCID: 0000-0002-4343-9713 (N. O. Rizun); 0000-0001-9243-1534 (M. V. Nehrey); 0000-0003-1258-7251 (N. P. Volkova)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

The year 2020 was a critical moment for global society, as the COVID-19 pandemic exposed the vulnerabilities and opportunities of various sectors and domains [1, 2, 3, 4, 5]. The education sector was one of the most affected by the pandemic, as it had to undergo a rapid digital transformation, a shift to online learning, and a suspension of educational activities [6, 7, 8, 9, 10]. The field of economics also faced significant changes, such as the digitalization of processes, the adoption of remote work, and changes in service and communication with customers [11, 12]. The fast-paced world has become more digital than ever, and the demand for data literacy, data-driven decision making, and data science skills has increased accordingly.

Data science is an interdisciplinary field that uses tools, algorithms, and knowledge of mathematics and statistics to extract insights from data. It has a wide range of applications in domains such as business, marketing, banking, insurance, medicine, and tourism. However, the potential of data science in education has been relatively underexplored, and many opportunities for advancing the field have not been fully exploited.

Data science can be used in education to address scientific problems, for example in the study of economic behavior, macro- and microeconomics, marketing, finance, agriculture, and environmental and ecological economics. Data science can also be used to enhance the teaching and learning of economics by providing students with relevant skills and competencies for a modern, technologically advanced society.

2. Literature review

Data science offers a long list of tools: linear regression, logistic regression, density estimation, confidence intervals, hypothesis testing, pattern recognition, clustering, supervised learning, time series, decision trees, Monte Carlo simulation, naive Bayes, principal component analysis, neural networks, k-means, recommendation engines, collaborative filtering, association rules, scoring engines, segmentation, predictive modeling, graphs, deep learning, game theory, arbitrage, cross-validation, model fitting, etc. Some of these tools were used in the following studies. Teaching data science, for example, was introduced in [13]; big data and data science methods were presented in [14, 15, 16, 17, 18, 19, 20]; machine learning was used in [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]; the Monte Carlo method was presented in [36]; and artificial intelligence was presented in [37, 38, 39, 40].

Data science is developing fast. The volume of information, which grows with each passing year, makes it possible to build high-precision models that simplify and partially automate the decision-making process.
Models are being developed that implement the key data science algorithms for different areas of economics: financial data science [41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], institutional economics [53, 54, 55, 56, 57, 58], agriculture [59, 60, 61], taxation [62], and the labor market [63]. The development of data science for education is discussed in [64, 65, 66, 67, 68].

3. Data science: principles and tools

Data science in education is a multidisciplinary approach to the technologies, processes, and systems used to extract knowledge from data, understand data, and support decision-making under uncertainty. Data science deals with mathematics, statistics, statistical modeling, signal processing, computer science and programming, database technologies, data modeling, machine learning, natural language processing, predictive analytics, visualization, etc.

Data science in education has two aspects of application, (i) the management and processing of data and (ii) analytical methods for analysis and modeling, and includes nine main steps (figure 1). The first aspect covers data systems and their preparation, including database facilities, data cleansing, engineering, visualization, monitoring, and reporting. The second aspect covers data analytics: data mining, machine learning, text analytics, probability theory, optimization, and visualization.

The basis of the learning process is the availability of relevant data that is of sufficient quality and appropriately organized for the task. Primary data often requires pre-processing. First of all, it is necessary to investigate whether the necessary data are available and how they can be obtained. The data search ends with the creation of a single data set in which the collected data are brought together consistently.
Data science has a wide range of tools for data evaluation and preparation, in particular for data mining, data manipulation (value conversion, data aggregation and reordering, table aggregation, splitting or merging of values, etc.), and data validation (checking formats, testing value ranges, and looking values up in tables of permitted values). The problem of missing values is handled with various analytical methods, such as simulation, insertion of default values, and statistical simulation. Data science provides broad opportunities for text analytics. In addition, the use of data science tools facilitates work with big data. The main approaches in data science are supervised learning models and unsupervised learning models.

Figure 1: Data science process.

3.1. Supervised learning models

Supervised learning is a machine learning method in which the model learns from labeled data. Supervised learning can be used to solve two types of tasks: regression and classification. The main difference between them is the type of variable that the corresponding algorithm predicts: in regression it is a continuous variable, and in classification it is a categorical variable. Many algorithms have been developed to solve these problems. Among the most common are linear regression, logistic regression, and decision trees.

Linear regression. Regression analysis can be considered the basis of statistical research. This approach involves a wide range of algorithms for forecasting a dependent variable using one or more factors (independent variables). The advantages of this approach to modeling are the simplicity and clarity of the results and the speed of both training and producing a forecast. The disadvantage is that precision is not always sufficiently high, since a linear relationship between variables is rare in economics and finance.
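As an illustration of the regression task described above, the following minimal sketch fits a one-factor linear regression by ordinary least squares using the closed-form solution. The function name `fit_simple_ols` and the toy data (advertising spend versus sales) are hypothetical and are not taken from the paper.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Uses the closed-form solution for one independent variable:
    slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Forecast the dependent variable for a new value of the factor."""
    return intercept + slope * x_new


# Hypothetical toy data: the relationship is exactly y = 1 + 2x.
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_ols(spend, sales)
forecast = predict(intercept, slope, 5.0)
```

The fitted coefficients are directly interpretable (the slope is the expected change in the dependent variable per unit change in the factor), which is the clarity of results mentioned above; real economic data will rarely be this close to linear.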
Logistic regression is used when it is necessary to predict the outcome of a binary variable using a dataset of continuous or categorical variables. Situations where the dependent variable has more than two possible values can be modeled with a one-vs-all approach, in which a logistic classifier is constructed for each possible outcome, or with a one-vs-one approach, in which a logistic classifier is constructed for each possible pair of categories of the dependent variable. The dependence between the independent variables and the log-odds of the outcome in logistic regression is linear; the only difference from linear regression is the sigmoid function, which converts the linear result into a probability of belonging to a class within [0; 1]. The advantages and disadvantages of logistic regression follow from those of linear regression: the speed of the algorithm and the interpretability of the results on the one hand, and limited accuracy on the other. Logistic regression is often used to construct vote-counting models. An important factor in this is the interpretability of its results. The influence of each factor is expressed by the magnitude of its coefficient $b$, which shows clearly which factors influence the decision positively and to what extent.

A decision tree is an approach to both regression and classification. It is widely used in intelligent data analysis. A decision tree consists of "nodes" and "branches". The tree nodes hold the attributes that are used to make decisions. To reach a decision, one goes down to the bottom of the tree. The sequence of attributes in a tree, as well as the values at which nodes are split into branches, depends on parameters such as the amount of information (entropy reduction) that the attribute contributes to predicting the target variable.
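To make the entropy criterion concrete, the sketch below computes the information gain of splitting on a categorical attribute, which is one common way to decide the order of attributes in a tree. The toy rows, the attribute names, and the labels are hypothetical; the paper does not specify which splitting criterion or data were used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the data on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy data: "ownership" separates the classes perfectly,
# while "weekend" carries no information about the label.
rows = [
    {"ownership": "public", "weekend": "yes"},
    {"ownership": "public", "weekend": "no"},
    {"ownership": "private", "weekend": "yes"},
    {"ownership": "private", "weekend": "no"},
]
labels = ["negative", "negative", "positive", "positive"]

gains = {a: information_gain(rows, labels, a) for a in ("ownership", "weekend")}
best_attribute = max(gains, key=gains.get)
```

A tree-building algorithm would place the attribute with the largest gain at the root and repeat the computation within each branch.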
The advantages of decision trees are the simplicity of interpretation, greater accuracy in simulating decision-making compared with regression models, the simplicity of visualization, and the natural modeling of categorical variables (in regression models these must be coded as dummy variables). However, decision trees have one significant drawback: low predictive accuracy [69].

3.2. Unsupervised learning

Unsupervised learning describes a more complex situation in which, for each observation $i = 1, \ldots, n$, we observe a vector of measurements $x_i$ but no associated response $y_i$. With such data, the construction of linear or logistic regression models is impossible, since there is no response variable to predict. In such a situation, a so-called "blind" analysis is conducted. Such a task belongs to the class of unsupervised learning tasks because there is no output variable to guide the analysis. Unsupervised learning algorithms can be divided into dimensionality reduction algorithms and clustering algorithms. The main task of clustering is to find patterns in the data that allow the data to be divided into groups, which can then be analyzed and interpreted.

K-means is one of the most popular clustering algorithms; its main task is to divide $n$ observations into $k$ clusters. The criterion being minimized is the sum of squared distances from each observation to the center of its cluster. The algorithm is iterative: at each step the cluster centers are recomputed and the observations are reassigned to them until a stable result is achieved. The benefits of this clustering algorithm are its simplicity, speed, and ability to process large amounts of data. Its drawbacks are that the user must specify the number of clusters before computing, and that the result is unstable (it depends on the initial assignment of points to clusters).
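The iterative procedure described above can be sketched as follows. This is a minimal illustration of the standard assignment/update loop (Lloyd's algorithm) with the initial centers passed in explicitly, so that the dependence on initialization is visible; the function names and the toy points are assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and center update until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest center.
        assignment = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # stable result reached
        centers = new_centers
    return centers, assignment


# Hypothetical toy data: two well-separated groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centers, assignment = kmeans(points, initial_centers=[(0.0, 0.0), (10.0, 10.0)])
```

The number of clusters is fixed by the length of `initial_centers`, and a different choice of initial centers can lead to a different final partition, which is the instability noted above.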
Hierarchical clustering is an alternative approach to clustering that does not require the number of clusters to be determined in advance. Moreover, hierarchical clustering ensures the stability of the result and provides an attractive visualization of the output based on the tree-like structure of observations and clusters, the dendrogram. This clustering algorithm can use different distance metrics and cluster agglomeration criteria, which makes it very flexible with respect to the data being clustered. However, the disadvantage of hierarchical clustering is the need to calculate the matrix of distances between observations before agglomeration, which complicates the application of this algorithm to large data sets and to data with many dimensions.

Time series analysis. A time series is built from observations collected at a fixed interval, for example daily demand, monthly profit growth rates, or the number of flights. Time series analysis plays an important part in data analysis across many areas, from the analysis of exchange rates to sales forecasting [70, 71]. One of the tasks of time series analysis is the extraction of trend and seasonal components and the construction of a forecast. Many algorithms have been developed; here we consider the ARIMA and Prophet models.

The ARIMA algorithm is one of the most common algorithms for forecasting time series. The basic idea is to use previous values of the time series to predict future ones. Any number of lags can be used, which makes the approach difficult to tune, because the parameters must be selected so as to minimize the error without overfitting the model. ARIMA is often used for short-term forecasting. A disadvantage is the difficulty of fitting a model when there are multiple seasonal patterns.

The Prophet algorithm was developed by Facebook at the beginning of 2017 for forecasting based on time series [70].
It is based on an additive model in which nonlinear trends are fit together with yearly and weekly seasonality. This approach also allows holidays and weekends to be modeled, which helps to predict irregular deviations in a time series. In addition, Prophet is robust to missing values, shifts in the trend, and large outliers, which is an important advantage over ARIMA. Another advantage is the rather high speed of training, as well as the ability to handle large-scale time series.

4. Topic modeling in data science

By text mining in natural language we mean the application of methods for the computer analysis and representation of texts that achieve a quality comparable to "manual" processing, for further use in various tasks and applications. One of the current tasks of automatic text mining is topic modelling.

4.1. Latent Dirichlet Allocation

Topic modelling is a statistical approach to extracting the hidden semantics that occur in a collection of documents or reviews. The Latent Dirichlet Allocation (LDA) model proposed in [72] is one of the most notable approaches to unsupervised topic modeling; it assumes that documents and the words within them are derived from a "generative probabilistic model". Within the class of unsupervised statistical topic models, themes are defined as distributions over a vocabulary of words that represent semantically interpretable "topics" [73]. The "meaning" of those topics (usually in the form of a topic label and a topic description) is an emergent quality of the relationship between words [74, 75].
The task of recognizing the meaning of a topic is often fraught with difficulty and requires a triangulation approach, namely: (i) a literature review of existing topics found in the analyzed problem domain; (ii) independent work by experts on assigning labels to topics; (iii) joint expert discussions to compare and revise the labelling results.

The main assumptions of the LDA method are the following [76]: (i) a document is represented as a mixture of topics; (ii) each topic is present in many documents; (iii) each word within a given document belongs to exactly one topic; (iv) each document can be represented as a vector of proportions that denote what fraction of the words belong to each topic. The basic LDA model is shown in figure 2.

Figure 2: Latent Dirichlet allocation model [77].

Figure 2 serves as a visual explanation of the model and can be described as follows: (i) we have $D$ documents and $K$ topics; (ii) each topic $k$ is represented by $\beta_k$, the distribution of words over the vocabulary within the topic; (iii) each document $d$ is represented by $\theta_d$, the topic proportions within the document, where $\theta_{d,k}$ is the proportion of topic $k$ in document $d$. Finally, we have (iv) for the $n$-th word in document $d$, the topic assignment $z_{d,n}$ (which depends on the per-document topic proportions $\theta_d$) and (v) for the $d$-th document, the observed words $w_{d,n}$, each of which is an element of the fixed vocabulary (and depends on the topic assignment $z_{d,n}$ and all of the topics $\beta_{1:K}$) [77].

Data scientists, in cooperation with other scientific domains, increasingly seek ways to apply NLP, and especially LDA topic modelling techniques, to extract, organize, recognize, label, and classify customers' opinions and experiences [78].
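The generative process summarized for figure 2 can be sketched in a few lines. This is a simulation of how LDA assumes documents arise, not an inference algorithm, and it is only an illustration: the symmetric Dirichlet hyperparameters `alpha` and `eta`, the function names, and the toy vocabulary are assumptions made for the example.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw a probability vector from a Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw one index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(n_docs, doc_len, vocab, n_topics, alpha=0.5, eta=0.5, seed=0):
    """Simulate documents under the LDA generative assumptions."""
    rng = random.Random(seed)
    # beta[k]: word distribution over the vocabulary for topic k.
    beta = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # theta: topic proportions of this document.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)    # topic assignment z_{d,n}
            w = sample_categorical(beta[z], rng)  # observed word w_{d,n}
            words.append(vocab[w])
        corpus.append((theta, words))
    return beta, corpus


# Hypothetical toy vocabulary.
vocab = ["staff", "wait", "doctor", "food", "ward", "care"]
beta, corpus = generate_corpus(n_docs=3, doc_len=8, vocab=vocab, n_topics=2)
```

Fitting LDA to real comments reverses this process: given only the observed words, it estimates `beta` and the per-document `theta` that most plausibly generated them.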
The following examples demonstrate how LDA topic modelling can be applied to (i) human resources management, (ii) service quality assessment, (iii) research and development policy coordination tasks, and (iv) strategic planning in universities.

Kobayashi et al. [79] used topic modelling to summarize worker attributes, find worker attribute constructs, and use these to cluster jobs. In total, 140 main topics were identified, including such skills as:
- interpersonal communication (vocabulary of words: communication, written, oral, verbal, interpersonal, presentation, effective, listening);
- analytical and problem-solving skills (vocabulary of words: problem, solving, analytical, solver, troubleshooting, approach, abilities, capabilities);
- data analytical skills (vocabulary of words: data, analysis, quantitative, research, statistics, economics, statistical, modeling);
- willingness to travel and the ability to work on a flexible schedule (vocabulary of words: travel, willingness, willing, work, time, needed, internationally, international).
As the authors mentioned, topic modelling showed that it is possible not only to classify job information from vacancies but also to derive the behavioral characteristics that employers value or require from potential or existing job holders. Moreover, as a continuation of this research, the authors planned to analyze trends in the worker attributes required by organizations (i) over time, (ii) across occupations and companies, and (iii) across geographical regions, and also (iv) to explore the possibility of building a network of work activities to examine the relationships among tasks.

Wallace et al. [80] and Sharma et al. [81] captured the main positive and negative words within latent aspects (topics) that characterise interpersonal manner, technical competence, and systems issues [82] from online physician reviews. Similarly to the previous work, James et al. [83], based on the categorization of López et al. [82], examined unstructured textual feedback about physicians in order to determine (i) how the extracted sentiment and topics compared to the traditionally identified dimensions of service quality in healthcare and (ii) what tone and topic elements were driving patients' service quality ratings. The main finding was the following list of topics and their tone:
(1) Negative system quality: Staff and Timeliness (vocabulary of words: office, staff, time, doctor, wait, appointment);
(2) Positive interpersonal quality: Physician Compassion (vocabulary of words: doctor, caring, great, knowledgeable, excellent, recommend);
(3) Negative system quality: Experience (vocabulary of words: told, don't, doctor, ask, bad, money, call);
(4) Positive technical quality: Family (vocabulary of words: doctor, questions, staff, practice, children, son, pregnancy);
(5) Positive technical quality: Surgery (vocabulary of words: surgery, pain, procedure, staff, hospital, knee, cancer, age);
(6) Negative technical quality: Diagnosis (vocabulary of words: years, treatment, medical, patient, conditions, test, diagnosis, time, treated).
The results allowed the authors to establish how strongly the identified aspects (topics) influence the general perception of a physician's quality, as well as the behavioural characteristics of patients when choosing a doctor online, depending on the content of comments and the overall rating.

4.2. Structural topic modelling

When conducting research on the basis of textual documents or customer comments, researchers often have more information "about the text" than "about the content of the text".
From the perspective of topic modelling as a statistical approach, the existence of such information "about the text" (metadata) makes it possible to include in the model additional covariates that can influence the following components of the topic model:
(1) The proportion of the document devoted to the topic ("prevalence of the topic"). For example, we may know that "clients who buy products online are more likely to talk about delivery problems than clients who buy offline".
(2) The rates of the words used in discussing the topic ("topical content"). For example, we may find that "when talking about delivery problems, clients who buy products online are more likely to discuss problems with product returns, while clients who buy offline are more likely to discuss staff rudeness" [84].

Such possibilities are offered by structural topic modelling (STM), an extension of the LDA framework [74, 84, 85]. Drawing analogies with LDA: (i) each document in STM arises as a mixture over $K$ topics; (ii) the topic proportions ($\theta_d$) can be correlated, which addresses a limitation of LDA; (iii) the topic prevalence $\theta_d$ can be influenced by a set of covariates $X$ through a standard regression model with covariates; (iv) for each word $w_n$ in the document $d$, (v) a topic $z_{d,n}$ is drawn from the document-specific distribution, and (vi) conditional on that topic, a word is chosen from a multinomial distribution over words parameterized by $\beta_{d,k,v}$, where $k = z_{d,n}$. This distribution can include a second set of covariates $Y$ [84].
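The role of the prevalence covariates $X$ can be illustrated with a small sketch. It assumes a logistic-normal form in which a linear predictor per topic is passed through a softmax to obtain topic proportions; the coefficient matrix `gamma`, the covariate coding, and the topic meanings are hypothetical. For brevity the optional noise is independent per topic, whereas the STM itself estimates a full covariance matrix (which is what lets topic proportions be correlated) and uses an identifying constraint that is omitted here.

```python
import math
import random


def softmax(scores):
    """Map real-valued scores to proportions that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_proportions(x, gamma, noise_sd=0.0, rng=None):
    """Topic proportions for a document with covariate vector x.

    gamma[j][k] is the (hypothetical) effect of covariate j on topic k.
    With noise_sd == 0 the result is the covariate-driven mean pattern.
    """
    rng = rng or random.Random(0)
    scores = []
    for k in range(len(gamma[0])):
        mean_k = sum(x_j * gamma[j][k] for j, x_j in enumerate(x))
        noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        scores.append(mean_k + noise)
    return softmax(scores)


# Hypothetical coefficients. Rows: [intercept, buys_online].
# Columns (topics): 0 = delivery problems, 1 = staff behaviour, 2 = price.
gamma = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]

offline = topic_proportions([1.0, 0.0], gamma)
online = topic_proportions([1.0, 1.0], gamma)
```

Under these made-up coefficients, the expected share of the delivery topic is higher for online buyers than for offline buyers, which mirrors the prevalence example in item (1) above.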
Thus, the main difference between the LDA and STM models (figure 3) is that the prevalence (content) parameters, which in LDA are determined by the general a priori Dirichlet parameters $\alpha$ ($\eta$), are replaced in the STM model with prior structures specified in the form of generalized linear models parameterized by document-specific covariates $X$ ($Y$) [86]. These covariates inform either the topic prevalence (covariates $X$) or the topical content (covariates $Y$) latent variables with information "about the text" (metadata).

Figure 3: A graphical illustration of the structural topic model [76].

5. Example of structural modelling algorithms application in education

In order to study customer perception of the quality of services, assess customer satisfaction with the goods or services received, and identify the factors that influence customer acceptance of new offers on the market, students were asked to use STM tools. As a data source, 610 textual comments about hospitals from the site http://www.ratemyhospital.ie/ (covering two years, 2018–2019) were used. The STM package allows additional variables to be used to demonstrate the power of metadata for topic modelling. With this aim, the textual comments were extended with information about (1) hospital ownership (private, public) and (2) sentiment (positive or negative) (table 1) [87]. After that, all steps of text pre-processing were performed.

Table 1
Comments before pre-processing.

| Comments | Hospital Ownership | Sentiment |
|---|---|---|
| A lovely friendly patient-focussed hospital | Public | Positive |
| Consultant I found seriously lacking compassion for my mother the patient. Sniggered while informing us that while my mother's condition is uncomfortable, it is not life threatening. To be frank, consultant spoke down to us. | Public | Negative |
| Tullamore is a very clean hospital and looks very well. All staff I had the pleasure of meeting were lovely and very professional at all times. The staff in all capacities do not receive enough thanks for the jobs they do | Private | Positive |

First, the STM model was set up. To determine the optimal number of topics, STM models with 10 to 30 topics were built and analyzed. Semantic coherence is maximized when the most probable words in a given topic frequently co-occur, and it is a metric that correlates well with human judgment of topic quality. High semantic coherence is relatively easy to achieve, though, if there are only a few topics dominated by very common words, so we looked at both semantic coherence and the exclusivity of words to topics. The best number of topics should therefore yield topics that are both very coherent and very exclusive. Looking at figure 4, we conclude that 15 topics best meet these criteria. Most of the topics in this case are above the average exclusivity and have high coherence, especially compared to models with other numbers of topics, whose topics are often spread out on both axes. The 15-topic STM model was selected based on a subjectively optimal combination of the average semantic coherence and exclusivity outcomes.

Figure 4: Semantic coherence and exclusivity of STM models.

As a result, for the 15-topic model we obtained (i) the topic-word distribution $\beta$; (ii) the document-topic proportions $\theta$; (iii) lists of Highest probability, FREX, Lift, and Score keywords [85, 88, 89]; and (iv) the set of documents most associated with each topic. The keyword types are defined as follows:
- Highest Prob: the words within each topic with the highest probability;
- FREX: the words that are both frequent and exclusive, identifying words that distinguish topics;
- Lift: words weighted by dividing their frequency in the topic by their frequency in other topics, which gives more weight to words that appear less frequently in other topics;
- Score: words weighted by dividing the log frequency of the word in the topic by the log frequency of the word in other topics.
Figure 5 provides information on the share of the different topics in the overall corpus.

Figure 5: Expected topic proportions over corpus.

Second, students needed to carry out the topic labelling step. For that: (1) two students independently labelled the topics to produce the first version of the labels based on the top weighted keywords; (2) the two students discussed the labels and resolved discrepancies in labelling; (3) the two students independently refined the topic labels based on a computationally guided deep reading of the 20 most representative comments for each topic; (4) the two students agreed on the final 15 topic labels and jointly developed the topic descriptions (a short summary of the topic content) [87]. The result of topic labelling is presented in table 2.

Table 2
Topics labels.

| # | Topic label | Topic keywords | Topic proportion, % |
|---|---|---|---|
| 1 | Appointment Time Reliability | time, service, wait, appoint, nurses, clinic, profession | 4.47 |
| 2 | Communication Skills | nurses, rude, hospital, patient, found, staff, ward | 6.34 |
| 3 | Service Standards | hospital, consult, year, many, staff, standard, old | 9.45 |
| 4 | Waiting Time | staff, hospital, member, given, sever, hour, time | 3.03 |
| 5 | Staff Feedback/Explanation | staff, kind, time, patient, depart, great, explain | 8.09 |
| 6 | Patient-Focusing Service | ask, hospital, doctor, day, told, week, care | 2.56 |
| 7 | Maternity Unit/Care | baby, doctor, midwife, time, inform, midwife, week | 2.89 |
| 8 | Personnel Reliability/Treatment | scare, staff, receive, excel, thank, ward, treatment | 11.81 |
| 9 | Food Service | hospital, staff, need, food, poor, good, doctor | 8.10 |
| 10 | Hospital Environment | hospital, mother, conditions, room, week, inform, doctor | 4.48 |
| 11 | Care and Recovery | nursed, care, good, great, love, doctor, patient | 9.29 |
| 12 | A&E/Admission | pain, hospital, appoint, staff, still, patient, never | 5.37 |
| 13 | Information Exchange with Patient/Family | hour, doctor, wait, told, seen, blood, home | 9.99 |
| 14 | Service Rapidness | hospital, staff, well, profession, attend, efficiency, visit | 8.31 |
| 15 | Ward/Hospital's Facilities | patient, staff, trolley, corridor, time, ward, hospital | 5.82 |

Third, the STM covariate analysis could be performed. At this stage, we aimed to evaluate the effect of Sentiment on the formation of the more positively and more negatively oriented aspects of hospital service quality (HSQ). Thus, we use the Sentiment metadata as a covariate in the STM model. Formally, we identify an aspect as negative if, according to the results of effect estimation, the proportion of this aspect in negative comments (Sentiment = Negative) is significantly higher than in positive comments (Sentiment = Positive). According to the results of our experiment, 5 topics (33.33%) are positive (right side of figure 6), and 10 topics (66.67%) are negative (left side of figure 6).

Figure 6: Difference in the power of Sentiment influence on topic proportion.

The dots in figure 6 indicate the mean values of the estimated differences in proportion (power of influence, PI) with 95% confidence intervals, which allow us to evaluate the relative degree of influence of sentiment on the aspects of hospital service quality. For example, the five most negative topics are (1) Information Exchange with Patient/Family (Topic 13), with the highest power of negative influence; (2) Communication Skills (Topic 2); (3) A&E/Admission (Topic 12); (4) Waiting Time (Topic 4); and (5) Patient-Focusing Service (Topic 6). In turn, the two most positive topics are (1) Service Rapidness (Topic 14) and (2) Personnel Reliability/Treatment (Topic 8). Knowing which topics are associated with positive and negative sentiment makes it possible to indicate the strength of patient satisfaction or dissatisfaction with hospital service quality.

Fourth, the influence of time on the dynamics of positive and negative topics (from 2018 to 2019) should be estimated using an STM model with Year and Sentiment as covariates.
In terms of the influence of the time factor on service quality, the following four groups of HSQ topics can be distinguished:
(1) topics causing growth in patient satisfaction with the HSQ over time: positive topics with a positive dynamic over time;
(2) topics causing a recession in patient satisfaction with the HSQ over time: positive topics with a negative dynamic over time;
(3) topics causing growth in patient dissatisfaction with the HSQ over time: negative topics with a positive dynamic over time;
(4) topics causing a recession in patient dissatisfaction with the HSQ over time: negative topics with a negative dynamic over time.

As an indicator that allows us to identify the direction and growth rate (GR) of the change in the level of positive or negative comments describing a topic, the slope of the regression (the dependence between the proportion of positive/negative aspects and time) is used. The four charts (figure 7 a, b, c, d) show examples of the four possible types of influence of the time factor on service quality:
1. Positive impact on service quality over time: the Service Rapidness topic is characterized by growth (GR = 1.100763) in patient satisfaction with the HSQ over time (figure 7, b);
2. Worsening of service quality over time: the Personnel Reliability/Treatment topic is characterized by a recession (GR = 0.821713) in patient satisfaction with the HSQ over time (figure 7, a);
3. Negative impact on service quality over time: the Information Exchange with Patient/Family topic is characterized by growth (GR = 1.758421) in patient dissatisfaction with the HSQ over time (figure 7, d);
4. Improvement of service quality over time: the Food Service topic causes a recession (GR = 0.575861) in patient dissatisfaction with the HSQ over time (figure 7, c).
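The four-group rule above can be sketched as a small classifier that combines the sentiment orientation of a topic with the sign of the regression slope of its proportion over time. This is an illustration only: the paper does not state how the reported GR values are derived from the regression, so the sketch uses just the sign of the slope, and the time points and proportions below are hypothetical.

```python
def regression_slope(times, values):
    """Least-squares slope of values regressed on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def classify_topic(is_positive, times, proportions):
    """Assign a topic to one of the four groups by sentiment and trend sign."""
    slope = regression_slope(times, proportions)
    if slope == 0:
        return "no change over time"
    rising = slope > 0
    if is_positive:
        return "growth of satisfaction" if rising else "recession in satisfaction"
    return "growth of dissatisfaction" if rising else "recession in dissatisfaction"


# Hypothetical topic proportions at four consecutive time points.
periods = [0, 1, 2, 3]
group_rising_positive = classify_topic(True, periods, [0.08, 0.09, 0.10, 0.11])
group_falling_positive = classify_topic(True, periods, [0.12, 0.11, 0.10, 0.08])
group_rising_negative = classify_topic(False, periods, [0.05, 0.07, 0.08, 0.10])
group_falling_negative = classify_topic(False, periods, [0.09, 0.08, 0.06, 0.05])
```

In practice the slope would come from the effect estimation of the fitted STM model, together with an uncertainty interval, rather than from raw proportions as in this simplified example.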
As a result, students could see that the largest share of aspects (37.5%) has a negative impact on HSQ. The highest growth in patient dissatisfaction is observed for the A&E/Waiting Time topic. Moreover, this growth rate is the largest not only in the Negative impact category but across all analyzed topics. The most rapid decrease (within the whole set of topics) in the number of positive comments is observed for the Maternity Unit/Care aspect. Topics whose quality has improved make up 25.1% of the total; among them, Hospital Environment shows the highest rate of improvement. A further 16.7% of topics have a positive effect on HSQ; among them, Service Rapidness and Maternity Unit/Treatment show the largest increase in the number of positive comments.

Fifth, students may identify the influence of Hospital Ownership on the structure of more positively and more negatively oriented HSQ aspects (using the Sentiment and Hospital Ownership factors as covariates in the STM model). For this purpose, the following interpretation of the results could be proposed: (1) the topics more related to Public Hospital Ownership are those in which, according to the results of effect estimation, the proportion of the topic in comments about Public hospitals (Hospital Ownership = Public) is significantly higher than in comments about Private hospitals, and vice versa; (2) the direction (positive or negative) of the influence of Hospital Ownership on HSQ. To achieve the first purpose, the Hospital Ownership effect estimation was performed to reveal the aspects in which the proportion of comments about Public hospitals (Hospital Ownership = Public) is significantly higher than that of comments about Private hospitals, and vice versa.
To formalize the rules for achieving the second purpose, that is, for discovering the influence of Hospital Ownership on service quality, the following groups of aspects are proposed: (1) topics causing growth in the level of patient satisfaction with service quality in Public hospitals: positive topics with a positive dynamic from Private to Public; (2) topics causing growth in the level of patient satisfaction with service quality in Private hospitals: positive topics with a positive dynamic from Public to Private; (3) topics causing growth in the level of patient dissatisfaction with service quality in Public hospitals: negative topics with a positive dynamic from Private to Public; (4) topics causing growth in the level of patient dissatisfaction with service quality in Private hospitals: negative topics with a positive dynamic from Public to Private.

Figure 7: Examples of identifying the influence of the Year metadata.

According to the results of our experiment, 8 topics are more associated with Public hospitals (right side of figure 8), 6 topics are more associated with Private hospitals (left side of figure 8), and one topic (Topic 13) relates to both types of hospitals. Based on the results, we can conclude that the four topics (one positive and three negative) that most characterize Public Hospital Ownership are (1) Service Rapidness (positive); (2) Food Service (negative); (3) Maternity Unit/Care (negative); and (4) Patient-Focusing Service (negative). In turn, the four aspects (two positive and two negative) that most characterize Private Hospital Ownership are (1) Appointment Time Reliability (negative); (2) Service Standards (positive); (3) Staff Feedback/Explanation (positive); and (4) Hospital Environment (negative).
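The four ownership groups can be expressed as a small lookup over two inputs: the polarity of a topic from the sentiment stage and the direction of the estimated ownership effect. The sketch below assumes that the effect is available as a Public-minus-Private difference in topic proportion with a 95% interval, and it treats an interval that contains zero as "both types of hospitals"; that reading, the class and function names, and the toy estimates are all assumptions made for illustration, not values from the experiment.

```python
from dataclasses import dataclass


@dataclass
class TopicEffect:
    name: str        # hypothetical topic label
    polarity: str    # "positive" or "negative", taken from the sentiment stage
    diff: float      # estimated proportion(Public) - proportion(Private)
    ci_low: float    # lower bound of the 95% interval for diff
    ci_high: float   # upper bound of the 95% interval for diff


def ownership_group(t: TopicEffect) -> str:
    """Map one topic to one of the four groups, or to 'both' if unclear."""
    if t.ci_low > 0:
        owner = "Public"
    elif t.ci_high < 0:
        owner = "Private"
    else:
        # Assumption of this sketch: an interval containing zero means the
        # topic is not tied to either ownership type.
        return "both types of hospitals"
    feeling = "satisfaction" if t.polarity == "positive" else "dissatisfaction"
    return f"growth of patient {feeling} in {owner} hospitals"


# Toy estimates for three hypothetical topics.
effects = [
    TopicEffect("Topic A", "positive", 0.021, 0.010, 0.032),
    TopicEffect("Topic B", "negative", -0.015, -0.027, -0.003),
    TopicEffect("Topic C", "negative", 0.002, -0.008, 0.012),
]
for t in effects:
    print(t.name, "->", ownership_group(t))
```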
Figure 8: Difference in the power of Hospital Ownership influence on Topic Proportion.

Thus, this example of the use of STM modeling in teaching shows how versatile and in-depth research can be carried out using data science. The presented examples demonstrate the nature of the tasks and approaches that could develop students’ technical and research skills in public perception analysis. Such approaches also allow students to gain practical experience in studying and interpreting the influence of additional metadata, characterizing the comment authors, on differences in their opinions about events, companies, goods, and services.

6. Data science study programs in economics field

Classical methods of statistical analysis, modeling methods, and data mining are used in economics. The analysis of data in these areas is aimed at the study of causation. In economics, current issues include policy development, determining the impact of a decision, long-term and short-term planning and forecasting, choosing the best solution from many possible ones, and many others. Drawing conclusions is also important in economics. In addition, modern economics and finance are characterized by the use of big data, so it is not always possible to use classical methods. Therefore, data science methods are well suited to economics, where they give positive results. Data science methods were first used in economic research and gradually penetrated into practice. Today, economics needs specialists who have knowledge in these areas and are able to apply data science methods. In response to this market need, universities have begun to implement data science courses and programs for students of economics. Table 3 presents the courses and programs of the top 20 universities in the world.
A review of study programs in the economics field at Ukrainian universities has shown that data science courses and programs are still being introduced in Ukraine. Currently, there are separate programs for studying data science, mainly within computer science. Therefore, we believe that the prospects that data science opens for modern economists necessitate the introduction of courses and programs in data science.

Table 3
Data science courses and programs for economics at top-20 universities.

| University | Location | Programs, courses |
|---|---|---|
| Massachusetts Institute of Technology (MIT) | United States | MicroMasters Program in Data, Economics, and Development Policy; Computer Science, Economics and Data Science – course |
| Stanford University | United States | M.S. in Statistics: Data Science; Tackling Big Questions Using Social Data Science – course |
| Harvard University | United States | Data Science for Business – course; Using Big Data to Solve Economic and Social Problems – course |
| California Institute of Technology | United States | Business Analytics – course |
| University of Oxford | United Kingdom | MSc in Social Data Science |
| ETH Zurich - Swiss Federal Institute of Technology | Switzerland | Data Science in Techno-Socio-Economic Systems – course |
| University of Cambridge | United Kingdom | Economics: Data Science and Policy – course |
| Imperial College London | United Kingdom | MSc Business Analytics |
| University of Chicago | United States | Economic Policy Analysis – course |
| UCL | United Kingdom | Economics and Statistics BSc; Social Sciences with Data Science BSc |
| National University of Singapore | Singapore | Master of Science in Business Analytics |
| Princeton University | United States | Statistics and Machine Learning – course |
| Nanyang Technological University | Singapore | Master of Science in Analytics |
| EPFL | Switzerland | Master’s program in Data science |
| Tsinghua University | China (Mainland) | Master’s Program in Data Science |
| University of Pennsylvania | United States | Master of Information Systems Management, Business Intelligence and Data Analytics; MS in Information Technology, Business Intelligence and Data Analytics; Online Master of Science in Business Analytics |
| Yale University | United States | Applied Econometrics: Politics, Sports, Microeconomics; Applied Econometrics: Macroeconomic and Finance Forecasting |
| Cornell University | United States | Introduction to Data Science – course |
| Columbia University | United States | Data Science for Social Good - summer program |
| The University of Edinburgh | United Kingdom | Statistics with Data Science MSc |

7. Conclusions

Data science is a rapidly growing and evolving field that has applications in various domains, such as research, society, and business. Data science requires significant investments and innovations from businesses and governments, as well as adequate education and training for students and professionals. However, as our research has shown, the integration of data science in economics education is still in its infancy. Only a few leading universities offer data science courses and programs for economics students; this trend has not yet been widely adopted and needs to be further developed.

As an example of the use of data science methods in economics education, we have demonstrated the application of STM-modeling in teaching students.
STM-modeling is a technique that allows analyzing textual data and identifying latent topics based on additional metadata, such as the characteristics of the text authors. STM-modeling can help students develop their technological and research skills, work with big data, and study and interpret the differences in opinions about various topics, such as events, companies, products, and services. The STM-modeling technique is just one of the many methods and algorithms that can be used for modeling and analyzing economic processes. There are numerous examples of how data science can be applied in economics education, such as using time series analysis to predict the future value of a cryptocurrency or using regression models to determine customer loyalty or the likelihood of customer insolvency. Data science offers a rich set of tools and techniques that can enhance the learning and teaching of economics.

Education should keep pace with the modern development of the digital economy, digital society, innovation, and creative entrepreneurship. The use of data science in education should be cross-platform, that is, used not only in the study of specific subjects, but also in the teaching of all subjects, in the interaction of students with each other and with teachers and real experts, in research, and in individual learning.