=Paper=
{{Paper
|id=Vol-3083/paper296
|storemode=property
|title=Computational modelling of stochastic processes for learning research
|pdfUrl=https://ceur-ws.org/Vol-3083/paper296.pdf
|volume=Vol-3083
|authors=Oleksandr H. Kolgatin,Larisa S. Kolgatina,Nadiia S. Ponomareva
|dblpUrl=https://dblp.org/rec/conf/icteri/KolgatinKP21
}}
==Computational modelling of stochastic processes for learning research==
Oleksandr H. Kolgatin 1, Larisa S. Kolgatina 2 and Nadiia S. Ponomareva 3,4

1 Simon Kuznets Kharkiv National University of Economics, 9A Science Ave., Kharkiv, 61166, Ukraine
2 H. S. Skovoroda Kharkiv National Pedagogical University, 29 Alchevskyh Str., Kharkiv, 61002, Ukraine
3 Kryvyi Rih State Pedagogical University, 54 Gagarin Ave., Kryvyi Rih, 50086, Ukraine
4 Kharkiv University of Technology “STEP”, 9/11 Malomyasnytska Str., Kharkiv, 61000, Ukraine

Abstract

The objectives of our work were to use computer-based statistical modelling for comparison and systematisation of various approaches to non-parametric null hypothesis significance testing. A statistical model for simulation of null hypothesis significance testing has been built for educational purposes. Fisher’s angular transformation, Chi-square, Mann-Whitney and Fisher’s exact tests were analysed. Appropriate software has been developed; it enabled us to suggest new illustrative materials describing the limitations of the analysed tests. Learning research is suggested as a method of understanding inductive statistics, taking into account that modern personal computers provide acceptable simulation times with high precision. The obtained results showed low power of the most popular non-parametric tests for small samples. Students cannot analyse the test power in traditional null hypothesis significance testing, because the real differences between samples are unknown. Therefore, it is necessary to shift the emphasis in Ukrainian statistical education, including PhD studies, from null hypothesis significance testing to statistical modelling as a modern and effective method of proving scientific hypotheses. These conclusions correlate with the surveyed scientific publications and the recommendations of the American Statistical Association.
Keywords: computational modelling, computer-based simulation, statistical hypothesis significance testing, education, learning research

1. Introduction

1.1. Statement of the problem

Computational modelling and the use of computer-based models for simulation are an essential part of educational content and methodology. On the one hand, computer-based simulation becomes one of the main methods of pedagogical research, bringing new facilities for forecasting and proving the efficiency of new pedagogical technologies. On the other hand, computational modelling and computer-based simulations carried out by students give them new experience with difficult aspects of educational content and develop their competences in independent work and research [1]. Thus, students’ learning research with computational simulations has been developed as a method for improving students’ self-management through creative learning activity [2, 3]. Semerikov et al. [4, 5] have suggested using computer simulation of neural networks in spreadsheets.

CoSinE 2021: 9th Illia O. Teplytskyi Workshop on Computer Simulation in Education, co-located with the 17th International Conference on ICT in Education, Research, and Industrial Applications: Integration, Harmonization, and Knowledge Transfer (ICTERI 2021), October 1, 2021, Kherson, Ukraine
kolgatin@ukr.net (O. H. Kolgatin); LaraKL@ukr.net (L. S. Kolgatina); ponomareva.itstep@gmail.com (N. S. Ponomareva)
http://www.is.hneu.edu.ua/?q=node/294 (O. H. Kolgatin); http://hnpu.edu.ua/uk/kolgatina-larysa-sergiyivna (L. S. Kolgatina); https://tinyurl.com/5xc89ntp (N. S. Ponomareva)
ORCID: 0000-0001-8423-2359 (O. H. Kolgatin); 0000-0003-2650-8921 (L. S. Kolgatina); 0000-0001-9840-7287 (N. S. Ponomareva)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
These results make it possible to introduce this modern technology into the educational process of a wide range of educational programs that are not directly connected with computer science. Elements of the technique of using CoCalc in studying the topic “Neural network and pattern recognition” of the special course “Foundations of Mathematical Informatics” are shown in the works of Markova et al. [6]. The method of computational simulation and modelling is supported by the work of Modlo and Semerikov [7], where new tools for modelling electromechanical technical objects in a cloud-based learning environment have been suggested. Khazina et al. [8] also considered computer modelling as a scientific means of training. So we can conclude that computational modelling and simulation is a popular and relevant learning method. This field of research is very interesting for our present study, because it promotes the development of computational modelling as a method of learning research.

One field of pedagogical investigation where new information technologies can provide a new level of understanding of the modelled processes is the comparative pedagogical experiment and statistical hypothesis testing as a part of it. Computer-based statistical analysis becomes a major part of the monitoring of the quality of learning resources [9]. In our work, we consider statistical processing of the results of a pedagogical experiment as one aspect of pedagogical research. Traditionally this problem is solved by methods of mathematical statistics on the basis of statistical hypothesis testing. Two hypotheses are put forward: the null hypothesis, which states that there are no differences between the compared random variables in the studied parameter, and the alternative hypothesis, which argues that the observed differences are caused by the studied impact.
A researcher uses some criterion that integrates the observed differences in numeric form and calculates the probability of obtaining the same or larger differences in a random process in order to accept one of these hypotheses. The number of participants in pedagogical studies is usually small, so we accept the alternative hypothesis if the probability of the Type I error (the probability that the observed differences are due to random factors) does not exceed 5 %.

Understanding the essence of statistical hypothesis testing is a hard problem. Thus, Sotos et al. [10] pointed out common misconceptions about statistical inference. They noted that, in response to the persistence of these misconceptions, educational researchers and practitioners have initiated and promoted a thorough reform of the teaching of statistics. One direction of this reform was the integration of technology into the statistics classroom, using simulations to help students understand the ideas behind statistical processes [10]. The problems of using statistical hypothesis tests are so deep that discussions continue even now, more than a hundred years after the introduction of this approach into science. For example, scientists discuss the problem of dichotomisation of p-values, because it makes matters worse (Wasserstein et al. [11]), and suggest describing the data using other approaches (McShane et al. [12]). The use of computer-based modelling provides a new look at the system of inductive methods of statistics; it makes it possible to highlight the most powerful methods and to determine the limits of their applicability, which is particularly important in psychological and pedagogical studies, where samples are small [13].

However, the practice of statistical data analysis in Ukrainian pedagogical research is grounded in the traditional approach. So we need to show the educational community modern computer-based techniques for data analysis that are based on simulation of stochastic processes.
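The phrase “the probability of obtaining the same or larger differences in a random process” can be illustrated with a small exact permutation test. The sketch below is our own illustrative example in Python (not part of the paper’s software): it enumerates every way to split the pooled observations into two groups of the original sizes and counts how often the absolute difference in means is at least as large as the observed one.

```python
from itertools import combinations

def permutation_p_value(sample_a, sample_b):
    """Two-sided exact permutation test for a difference in means."""
    pooled = sample_a + sample_b
    n_a = len(sample_a)
    observed = abs(sum(sample_a) / n_a - sum(sample_b) / len(sample_b))
    count = total = 0
    # Enumerate every assignment of n_a pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        total += 1
        if diff >= observed - 1e-12:   # tolerance for float comparison
            count += 1
    return count / total

# For [1, 2, 3] vs [4, 5, 6], only 2 of the 20 splits reach the
# observed difference of 3, so p = 0.1.
print(permutation_p_value([1, 2, 3], [4, 5, 6]))  # → 0.1
```

With small samples the enumeration is exhaustive, so the p-value is exact; for larger samples one would sample random permutations instead.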
We also need to compare these techniques with traditional criteria for null hypothesis significance testing. This work is devoted to simulation of the use of popular classical criteria of statistical hypothesis testing: Pearson’s Chi-square criterion, Fisher’s angular transformation and the Mann-Whitney U criterion. Information and communication technologies offer new perspectives for the analysis of the boundaries of application of these tests, investigation of the criteria sensitivity, and development of approaches to statistical analysis for small samples. Learning research with appropriate models will be useful not only for students, but also for researchers, to improve understanding of the essence of statistical data processing in pedagogical research.

1.2. Analysis of previous research

Recently, researchers have paid great attention to statistical modelling as an alternative approach to proving research hypotheses. Computer-based simulation has made it possible to show the boundaries of using Pearson’s Chi-square criterion in null hypothesis significance testing (Kolgatin [13]). A computational model for investigating the efficiency of statistical hypothesis testing was proposed. This model did not use any assumptions about probability distribution and test features, so it could be used for comparison of methods built on different principles. It was shown that the Chi-square test and the Fisher’s angular transformation test in the studied range of sample sizes (from 9 to 200) do not provide good accuracy for frequency tables with 2 categories. The idea of these tests is to guarantee that we obtain an error of the first type (Type I error) in 5 % of cases (a 5 % significance level was used) if the null hypothesis is true. The real value of the Type I error differed essentially, lying in the interval from 0.04 to 0.08 instead of 0.05. The accuracy of the Type I error estimation is better (in the interval from 0.04 to 0.06) if the sizes of the samples are not less than 70.
The accuracy of the Type I error estimation by the Chi-square test for frequency tables with 3 categories is better even for very small sample sizes. This accuracy essentially depends on the number of measures in the samples and is worse when one of the samples is small and the other one is large. Therefore, some recommendations for combining categories in order to use the Chi-square test for small sample sizes are debatable. Another result obtained in [13] concerns the Chi-square test power, its ability to show differences between distributions. The Type II error was quite high; it decreased with increasing sample sizes when 3 categories in the frequency tables were used instead of 2 categories. This question is discussed later in this paper in detail, but we can conclude here that these results correlate with the statement of the American Statistical Association (ASA) about the limitations of p-values (Wasserstein and Lazar [14]).

Statistical modelling as a powerful alternative to null hypothesis significance testing was described by Lang et al. [15]. They noted that statistical modelling is a more complicated approach than null hypothesis significance testing, but this added complexity affords researchers the opportunity to quantify evidence in support of specific substantive hypotheses relative to competing hypotheses, not simply against a null hypothesis [15]. The authors underlined that the purpose of statistical modelling is to represent, as accurately and completely as possible, a data generation process, with the goal of understanding and gathering evidence about its structure [15]. These authors suggested and compared Bayesian and “frequentist” models for exploring how child temperament mediates the relationship between age and developmental progress in communication and motor skills [15].

Statistical modelling as an educational tool is analysed in many scientific works. A good review of the corresponding literature was given by Jamie [16].
The main idea of the authors is to use computer simulation methods (CSMs) for the purpose of clarifying abstract and difficult concepts and theorems of statistics. Some systems of computer mathematics and spreadsheets are considered: SAS PROC IML, Excel, MINITAB, SAS, SPSS. Approaches were analysed to teaching and illustrating such parts of statistical education as the central limit theorem, Student’s t-distribution, confidence intervals, the binomial distribution, regression analysis, sampling distributions, and survey sampling [16].

Many of the models for modelling statistical hypothesis testing for educational purposes were suggested at the end of the last century. Flusser and Hanna [17] used BASIC computer programs to simulate a binomial experiment and test a simple statistical hypothesis. Taylor and Bosch [18] suggested an interactive clinical trial simulation program that provides a few thousand simulations in about 5 minutes. Bradley et al. [19] developed a comprehensive simulation laboratory for statistics that could work with real experimental data from a database and generate samples according to given parameters. This software calculated the p-value according to the F-test. Students could see that the decisions about the null hypothesis differ for various series and could analyse the Type I and Type II errors. Ricketts and Berry [20] used statistical modelling in the package Resampling Stats to demonstrate a histogram of differences between means. These results, obtained for very small samples, helped students to understand the essence of null hypothesis significance testing without any formulas. This software made it possible to demonstrate the Type I and Type II errors for students, but did not provide enough performance for analysing the qualities of the criteria used. Therefore, it is relevant to develop a computer-based model for comparison of various approaches to null hypothesis significance testing and for analysing the boundaries of their use.
Such a model will be useful not only for understanding the essence of null hypothesis significance testing, but also for understanding the limitations of the traditional null hypothesis significance testing approach, and it will motivate pedagogical scientists towards computational modelling as a promising method of statistical data analysis.

1.3. Objectives

We started this work in 2014 with the objective of using computer-based statistical modelling for comparison and systematisation of various approaches to non-parametric null hypothesis significance testing. The information accessible to Ukrainian students in textbooks and handbooks was contradictory and not sufficient for a confident and reasonable choice of statistical method for data analysis in pedagogical research. We tried to develop a statistical model for providing learning research on null hypothesis significance testing by university and postgraduate students. But now we are finishing this work with the objective of proving the advantage of statistical modelling over null hypothesis significance testing. We ground this on our own simulations, the rapid development of information and communication technologies, and the newest publications in the statistical scientific literature. The aim of this research is to show the limitations of classical null hypothesis significance testing and to motivate students and researchers towards computational modelling as an effective method of proving research hypotheses. This change in the objectives of our study leads to some inconsistency in this paper and deprives us of the opportunity to introduce this analysis directly into the educational process, because the program of statistical education should be revised taking the obtained results into account. So we can offer our results to statistical educators as a matter for critical thinking and for developing educational programs.

2. Theoretical framework

Statistical modelling of various criteria for null hypothesis significance testing requires procedures implementing these criteria. Moreover, some criteria, such as Pearson’s Chi-square and Fisher’s angular transformation, need data in the form of a frequency table. Our model generates the samples in a metric scale, so the data should be collapsed into intervals to obtain the frequency table. Pearson’s Chi-square criterion was used in the form:

χ² = Σ_{i=1}^{m} Σ_{j=1}^{k} (E_{i,j} − T_{i,j})² / T_{i,j},   (1)

where E_{i,j}, T_{i,j} are the empirical and theoretical frequencies; i, m are the index and number of categories; j, k are the index and number of samples (k = 2 in this study). The form of the Chi-square criterion with Yates’s correction for continuity was analysed by D’Agostino et al. [21] and Kolgatin [22] and was not used here. All studies in this work were carried out for a significance level of 5 %; the critical values of the Chi-square criterion were assumed according to Verma [23].

The criterion of Fisher’s angular transformation was used in the 2-tailed form:

φ* = 2 · |arcsin √(E_{1,1}/n₁) − arcsin √(E_{1,2}/n₂)| · √(n₁n₂/(n₁ + n₂)),   (2)

where E_{1,1} and E_{1,2} are the frequencies in one of the categories for samples 1 and 2; n₁ and n₂ are the sizes of samples 1 and 2 [24]. The critical value of this criterion was assumed to be 1.96 at a significance level of 5 % (2-tailed). Mostly, Fisher’s angular transformation is used as a 1-sided test with critical value 1.64 [24]; the test power is higher in that case [13]. We used the 2-sided test in this work to have a correct comparison with Pearson’s Chi-square test, which has no 1-sided form.

The words “exact test” are magical for some students and even researchers. Fisher’s exact test for consistency in a 2×2 table was analysed by D’Agostino et al. [21], Berkson [25], Liddell [26] and others. Their results were pessimistic. All these researchers believed that this test is called exact only because it does not use any approximations.
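Criteria (1) and (2) are straightforward to compute from a frequency table. The sketch below is an illustrative Python implementation under our own naming (the paper’s software is not published); the theoretical frequencies in (1) are taken from the marginal totals, as is usual for a homogeneity test.

```python
import math

def chi_square(freq):
    """Pearson's Chi-square, eq. (1): freq[i][j] is the empirical
    frequency of category i in sample j; theoretical frequencies
    T_ij are (row total * column total) / n."""
    m, k = len(freq), len(freq[0])
    n = sum(sum(row) for row in freq)
    col = [sum(freq[i][j] for i in range(m)) for j in range(k)]
    row = [sum(freq[i]) for i in range(m)]
    chi2 = 0.0
    for i in range(m):
        for j in range(k):
            t = row[i] * col[j] / n
            chi2 += (freq[i][j] - t) ** 2 / t
    return chi2

def fisher_angular(e1, n1, e2, n2):
    """Fisher's angular transformation criterion, eq. (2), 2-tailed."""
    phi1 = math.asin(math.sqrt(e1 / n1))
    phi2 = math.asin(math.sqrt(e2 / n2))
    return 2 * abs(phi1 - phi2) * math.sqrt(n1 * n2 / (n1 + n2))

# A 2x2 frequency table (category x sample): 30 of 50 vs 20 of 50.
table = [[30, 20], [20, 30]]
print(chi_square(table))               # → 4.0, above the 5 % critical value 3.841
print(fisher_angular(30, 50, 20, 50))  # about 2.01, above the 5 % critical value 1.96
```

For this table both criteria exceed their 5 % critical values, so both tests would reject the null hypothesis.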
The theoretical basis of this test is not exact, so let us try our simulations to understand and illustrate the problem. We used the 2-sided form of the Fisher’s exact test criterion, which gives the p-value (probability of the Type I error) [27, 28]. The probability of a given combination of observed frequencies can be calculated with the formula:

p* = n₁! n₂! n_a! n_b! / (a₁! b₁! a₂! b₂! n!),   (3)

where a₁, b₁, a₂, b₂ are the observed frequencies in the samples A and B in the categories 1 and 2 accordingly; n = a₁ + a₂ + b₁ + b₂ is the total number of measures; n₁ = a₁ + b₁ is the number of measures in category 1; n₂ = a₂ + b₂ is the number of measures in category 2; n_a = a₁ + a₂ is the size of sample A; n_b = b₁ + b₂ is the size of sample B. The probability of random realisation of the given combination and of all other less probable combinations is

p = p* + Σ_{∀ p_i < p*, i ∈ [0; n_a]} p_i,   (4)

where

p_i = n₁! n₂! n_a! n_b! / (i! (n₁ − i)! (n_a − i)! (n₂ − n_a + i)! n!),   (5)

and the summation in (4) runs over those i for which all the factorial arguments in (5) are non-negative.

The Mann-Whitney test and its modifications are a field of researchers’ attention now, and statistical modelling is the main method of comparing the efficiency of the various modifications [29, 30]. The assumptions of this group of tests were analysed by Fay and Proschan [31]. The Mann-Whitney test was used in our work, based on the research by Sidorenko [32], Gubler and Genkin [24], and Billiet [33], in the form

U = min(U_a, U_b),   (6)

where

U_a = n_a n_b + n_a(n_a + 1)/2 − T_a,   (7)

U_b = n_a n_b − U_a,   (8)

where n_a and n_b are the sizes of samples A and B accordingly, and T_a is the sum of ranks in sample A. The calculated values of the Mann-Whitney criterion were compared with its critical values according to the table when both n_a and n_b were not greater than 30 [33]. The Z-test for the U criterion was used in cases where at least one of the sample sizes was greater than 30 [33]:

Z = (U − n_a n_b / 2) / √(n_a n_b (n_a + n_b + 1) / 12).   (9)

3. Statistical model

The method of statistical modelling was used for the investigation.
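A minimal sketch of criteria (3)–(8) in Python (standard library only; the names are ours, not the paper’s, and the two-sided p-value follows formula (4) literally, i.e. it sums only strictly less probable tables):

```python
from math import factorial

def fisher_exact_p(a1, b1, a2, b2):
    """Two-sided Fisher's exact test p-value, eqs. (3)-(5)."""
    n1, n2 = a1 + b1, a2 + b2          # category totals
    na, nb = a1 + a2, b1 + b2          # sample sizes
    n = na + nb
    num = factorial(n1) * factorial(n2) * factorial(na) * factorial(nb)

    def p_of(i):                        # probability of the table with a1 = i
        args = (i, n1 - i, na - i, n2 - na + i)
        if any(x < 0 for x in args):    # impossible table for these margins
            return None
        d = factorial(n)
        for x in args:
            d *= factorial(x)
        return num / d

    p_star = p_of(a1)
    p = p_star
    for i in range(na + 1):
        pi = p_of(i)
        if pi is not None and i != a1 and pi < p_star:  # strictly less, per (4)
            p += pi
    return p

def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U, eqs. (6)-(8), with mid-ranks for ties."""
    pooled = sorted(sample_a + sample_b)
    def rank(v):                        # average of the 1-based tied positions
        first = pooled.index(v) + 1
        return first + (pooled.count(v) - 1) / 2
    t_a = sum(rank(v) for v in sample_a)
    na, nb = len(sample_a), len(sample_b)
    u_a = na * nb + na * (na + 1) / 2 - t_a
    u_b = na * nb - u_a
    return min(u_a, u_b)

print(fisher_exact_p(8, 2, 2, 8))      # strongly diagonal table: small p
print(mann_whitney_u([1, 2], [3, 4]))  # → 0.0
```

Note that, because (4) uses a strict inequality, tables exactly as probable as the observed one are not added; some textbook definitions of the two-sided test include them, which gives a slightly larger p-value.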
The model allows forming 2 samples from one population or from different populations that have some differences in their probability distributions.

The first regime was used for the Type I error investigation. Two series of numbers were created on the basis of the same random number generator. The obtained values were distributed into m categories; we could control the distribution to ensure a uniform distribution or the predominance of frequencies in certain categories. An empirical value of the criterion was calculated for the obtained frequency tables and compared with the critical value of this criterion at the specified level of significance, and a decision about the possibility of rejecting the null hypothesis was made. We knew that the null hypothesis was actually true, because both samples (series of numbers) were generated with one random number generator. But the alternative hypothesis was accepted in some of the tests as a result of random factors. The relative frequency of such false decisions was estimated as the probability of a Type I error and should correspond to the significance level that was used to choose the critical value of the criterion.

We needed a large number of trials to obtain satisfactory precision of the analysis. 1000000 trials were conducted in the computational experiments for each case. The precision of the obtained values of the probability of a Type I error was estimated on the basis of the standard deviation in consecutive identical trials. The estimated absolute error was about 0.0005 for a 95 % confidence interval. So all numbers are given with 2-3 significant digits, and the last digit in all shown results is a spare guard digit; we need a guard digit for further data processing. The number of trials can be smaller in students’ investigations, to save computational time, when the power of the tests is analysed.
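The quoted precision agrees with the usual normal approximation for a proportion estimated from N Bernoulli trials (a standard calculation, not taken from the paper): the 95 % confidence half-width is 1.96·√(p(1 − p)/N).

```python
import math

def half_width_95(p, n_trials):
    """95 % confidence half-width for a proportion p estimated
    from n_trials independent Bernoulli trials."""
    return 1.96 * math.sqrt(p * (1 - p) / n_trials)

# Type I error near 0.05 estimated from 1,000,000 trials:
print(round(half_width_95(0.05, 1_000_000), 5))  # → 0.00043, i.e. about 0.0005
```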
In some of the trials with very small samples, zero values of the frequencies were obtained in some categories, and it was not possible to calculate the value of the criterion. These results were removed from the analysis, and if their share in the total number of trials exceeded 1 %, the study under the corresponding conditions was not conducted.

The second regime was used for investigating the power of the tests. Two unequal random number generators were used to analyse the criteria sensitivity. In this case we knew that the alternative hypothesis was actually true, because the samples (series of numbers) were generated with different random number generators. We could control the level of variation. The relative frequency of true positive decisions corresponds to the criterion power, which is determined by the level of differences between the parameters of the random number generators used for the samples.

4. Learning researches

4.1. Motivation with leading questions

One method of motivating students for learning research is to offer them leading questions after a brief theoretical review [2]. Such questions attract students’ attention to the most problematic and debatable issues of the educational content. The limitations of using some theory, accuracy estimation, and possible practical problems in specific cases are always a problem not only for students, but also for professional researchers. Concerning statistical education in using null hypothesis significance testing, we would suggest to students the following leading questions:

• Is the Type I error fixed in null hypothesis significance testing?
• What factors affect the power of null hypothesis significance testing?
• Should we collapse metric scale data into intervals?
• Which tests should we use for small samples?
• What can we know about the test power when implementing null hypothesis significance testing in practice?
• Can we prove that the null hypothesis is true?

Students find the answers to these leading questions during independent work according to the plan of the learning research. The answers may not be determined in advance, so the process of drawing conclusions is creative for students. Students should be equipped with clear instructions for the research steps and data collection. Also, some templates for conclusions should be prepared. We show examples of tables and diagrams as possible results of the investigations. The level of detail of the instructional materials is determined by the students’ readiness for independent work.

Table 1: Type I error, when the null hypothesis is true (example for equal sample sizes). Cells show the frequency of null hypothesis rejection using each test, %; a dash marks conditions that were not simulated.

Size of sample A | Size of sample B | Chi-square, 2 categories | Chi-square, 3 categories | Chi-square, 5 categories | Fisher’s angular transformation | Fisher’s exact test | Mann-Whitney test
4 | 4 | – | – | – | 7.07 | 0.79 | –
5 | 5 | – | – | – | 6.05 | 2.17 | 3.18
6 | 6 | – | – | – | 5.79 | 0.63 | 4.11
7 | 7 | – | – | – | 5.71 | 1.29 | 3.81
8 | 8 | – | – | – | 7.64 | 2.09 | 4.96
9 | 9 | 4.99 | – | – | 9.02 | 2.10 | 4.01
10 | 10 | 4.21 | 4.83 | – | 4.84 | 1.26 | 4.03
… (data storing continues with the step given by a teacher) …
198 | 198 | 5.01 | 5.02 | 5.02 | 5.01 | 3.95 | 5.00
199 | 199 | 5.07 | 5.04 | 5.01 | 5.07 | 4.00 | 4.98
200 | 200 | 5.10 | 5.00 | 5.04 | 5.01 | 4.02 | 4.96

4.2. Learning research of the non-parametric criteria performance with a true null hypothesis

Now we address students directly. You know some recommendations about limitations in using certain null hypothesis significance tests. But how important is each of these limitations? What error can take place in each practical case of using a test? Textbooks do not give detailed information. We can find the answers in professional research papers, but this source of information may not be so easy to use.
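The two regimes of the model described in section 3 can be sketched for the simplest case, the Chi-square test with 2 categories, as follows. This is our own illustrative Python sketch with a fixed seed; the sample sizes, trial counts and distributions are illustrative, not the paper’s.

```python
import random

CRITICAL_CHI2_1DF = 3.841  # 5 % significance level, 1 degree of freedom

def chi2_2x2(a1, a2, b1, b2):
    """Pearson's Chi-square for a 2x2 table: samples A, B x categories 1, 2."""
    n = a1 + a2 + b1 + b2
    rows = (a1 + b1, a2 + b2)          # category totals
    cols = (a1 + a2, b1 + b2)          # sample sizes
    chi2 = 0.0
    for (e, r, c) in ((a1, rows[0], cols[0]), (b1, rows[0], cols[1]),
                      (a2, rows[1], cols[0]), (b2, rows[1], cols[1])):
        t = r * c / n                  # theoretical frequency
        chi2 += (e - t) ** 2 / t
    return chi2

def rejection_rate(shift, size=50, trials=2000, seed=1):
    """Share of trials in which H0 is rejected. Samples are uniform on
    [0, 1) and [shift, 1 + shift), dichotomised at 0.5:
    shift = 0 estimates the Type I error, shift > 0 the power."""
    rng = random.Random(seed)
    rejected = skipped = 0
    for _ in range(trials):
        a = sum(rng.random() < 0.5 for _ in range(size))          # sample A
        b = sum(rng.random() + shift < 0.5 for _ in range(size))  # sample B
        if a in (0, size) or b in (0, size):
            skipped += 1               # zero frequency in a category: skip
            continue
        if chi2_2x2(a, size - a, b, size - b) > CRITICAL_CHI2_1DF:
            rejected += 1
    return rejected / (trials - skipped)

print(rejection_rate(0.0))    # near 0.05: Type I error under a true H0
print(rejection_rate(0.15))   # noticeably larger: power under a false H0
```

Increasing `trials` towards the paper’s 1,000,000 narrows the confidence interval of these estimates, at a proportional cost in computation time.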
Maybe some practical questions have not been analysed in scientific works yet. So we need to master the method of statistical modelling to explore specific practical problems. Study the statistical model described above and use it to test the performance of the non-parametric criteria with a true null hypothesis. The model applies Pearson’s Chi-square, Fisher’s angular transformation, Fisher’s exact test for consistency in a 2×2 table, and the Mann-Whitney test to testing the null hypothesis for 2 samples from some given probability distribution. Both samples are random samples from the same population, so an ideal test should reject the null hypothesis in 5 % of cases (Type I error at the 5 % significance level). Try your simulations for the given probability distribution in the population according to your individual variant and fill in table 1. It will be useful to create this table in a spreadsheet by copying the data from the output of the software used. Draw diagrams according to the obtained data (see the examples in figure 1 and figure 2). Analyse your results and draw conclusions according to the templates in table 2.

Figure 1: Accuracy of Type I error estimation in Fisher’s angular transformation, Chi-square and Fisher’s exact tests for 2 categories for samples of equal sizes n_a = n_b = 4…200.

Figure 2: Accuracy of Type I error estimation in Mann-Whitney and Chi-square tests for samples of equal sizes n_a = n_b = 5…200.

Table 2: Conclusion templates on the accuracy of Type I error estimation with the analysed tests (choose one option at each “?”).

• Accuracy of Type I error estimation with the analysed tests was ? (better / worse) when processing the data organised in 2 categories.
• Accuracy of Type I error estimation with the analysed tests ? (worsened / improved) with increasing sample sizes.
• The cause of the observed periodic behaviour of the Type I error estimate when using Fisher’s exact test for 2×2 frequency tables (2 categories) is ? (the discrete nature of the base model of the criterion / low accuracy of the statistical simulations).
• The cause of the observed periodic behaviour of the Type I error estimate when using Fisher’s angular transformation and Pearson’s Chi-square test for 2×2 frequency tables (2 categories) is ? (approximation error and the discrete nature of the criteria / low accuracy of the statistical simulations).
• Fisher’s exact test was ? (less conservative / more conservative) than Fisher’s angular transformation and Pearson’s Chi-square test for 2×2 frequency tables (2 categories).
• Collapsing data of small samples into a smaller number of categories ? (did not lead to / led to) improving the Type I error estimation.
• The observed behaviour ? (can differ / will be the same) in simulations with another probability distribution in the population and other sample sizes.

Table 3: Conclusion templates on the power of the analysed tests with the given data.

• Collapsing data of small samples into a smaller number of categories ? (did not lead to / led to) improving the power of null hypothesis significance testing.
• Power of the analysed tests ? (worsened / improved) asymptotically with increasing sample sizes.
• To keep the Type II error below 5 % (power greater than 95 %) when using the Chi-square test for 2×2 frequency tables (2 categories), we needed sample sizes n_a = _____ and n_b = _____.
• To keep the Type II error below 5 % (power greater than 95 %) when using the Chi-square test for 3×2 frequency tables (3 categories), we needed sample sizes n_a = _____ and n_b = _____.
• To keep the Type II error below 5 % (power greater than 95 %) when using the Chi-square test for 5×2 frequency tables (5 categories), we needed sample sizes n_a = _____ and n_b = _____.
• To keep the Type II error below 5 % (power greater than 95 %) when using Fisher’s exact test for 2×2 frequency tables (2 categories), we needed sample sizes n_a = _____ and n_b = _____.
• To keep the Type II error below 5 % (power greater than 95 %) when using Fisher’s angular transformation test for 2×2 frequency tables (2 categories), we needed sample sizes n_a = _____ and n_b = _____.
• To keep the Type II error below 5 % (power greater than 95 %) when using the Mann-Whitney test, we needed sample sizes n_a = _____ and n_b = _____.
• The analysed tests can be ranked according to their power in the following order: 1. (the most powerful) _____; 2. _____; 3. _____; 4. _____; 5. _____; 6. _____.
• Power of some of the analysed tests can be improved in the case of one-sided hypothesis testing: 1. _____; 2. _____.
• The observed behaviour ? (can differ / will be the same) in simulations with another probability distribution in the population and other sample sizes.

4.3. Learning research of the non-parametric criteria power

We continue to address students. Let us analyse the power of the tests. Remember that, according to the rules, we reject the null hypothesis if the test allows us, but we never say that we accept the null hypothesis; we say that we cannot reject it. Now we should understand the reason for this rule.

Figure 3: Power of Fisher’s angular transformation, Chi-square, Mann-Whitney and Fisher’s exact tests for samples of equal sizes with uniform probability distributions on the ranges [−0.05; 0.95] and [0.05; 1.05].

Figure 4: Power of Fisher’s angular transformation, Chi-square, Mann-Whitney and Fisher’s exact tests for samples of equal sizes with uniform probability distributions on the ranges [−0.1; 0.9] and [0.1; 1.1].

Use the null hypothesis significance tests for two samples with different probability distributions obtained by different random generators with given
different parameters (according to your individual variant). Store the results in the form of table 1. The form is the same, but now we know that the null hypothesis is false, so the data in the table will show the power of the tests. Organise your data using diagrams and show the theoretical probability distribution in your samples, using the information about your random generator. We used uniform random generators with different means to obtain the examples (see figure 3, figure 4, figure 5). Analyse your results and draw conclusions according to the templates in table 3.

Figure 5: Power of Fisher’s angular transformation, Chi-square, Mann-Whitney and Fisher’s exact tests for samples of equal sizes with uniform probability distributions on the ranges [−0.15; 0.85] and [0.15; 1.15].

As we can see, the main problem in using null hypothesis significance testing is the unknown power of the tests in practical tasks. The power of the tests will be low if the null hypothesis is false but the differences between the compared populations are small, and the tests give us no mechanism to estimate the power in such cases. So statistical modelling is a more appropriate method of data analysis, because the model makes it possible to estimate the data distribution and confidence intervals.

5. Conclusions

A statistical model for simulation of null hypothesis significance testing has been built. Fisher’s angular transformation, Chi-square, Mann-Whitney and Fisher’s exact tests were analysed. Appropriate software has been developed; it enabled us to suggest new illustrative materials describing the limitations of the analysed tests. Learning research in inductive statistics based on statistical modelling has been suggested. These didactic materials can be useful for master’s and PhD students in pedagogy. The suggested methods contain new views on the use of null hypothesis significance testing.
We stress that collapsing data into a smaller number of categories decreases the efficiency of the tests and gives no advantage in maintaining the declared significance level. We suggest shifting the accents in Ukrainian statistical education, including PhD studies, from null hypothesis significance testing to statistical modelling as a modern and effective method of proving scientific hypotheses. This suggestion is grounded in the results of the simulations presented in this paper, the possibilities of modern information and communication technologies, the literature review and the opinion of the American Statistical Association. The field of further research is the development of courseware for teaching inductive statistics based on statistical modelling. Studying null hypothesis significance tests should be considered an auxiliary, simplified method.

References

[1] O. H. Kolgatin, L. S. Kolgatina, N. S. Ponomareva, E. O. Shmeltser, A. D. Uchitel, Systematicity of students’ independent work in cloud learning environment of the course “Educational Electronic Resources for Primary School” for the future teachers of primary schools, in: S. Semerikov, V. Osadchyi, O. Kuzminska (Eds.), Proceedings of the Symposium on Advances in Educational Technology, AET 2020, University of Educational Management, SciTePress, Kyiv, 2022.
[2] L. I. Bilousova, L. S. Kolgatina, O. H. Kolgatin, Computer simulation as a method of learning research in computational mathematics, CEUR Workshop Proceedings 2393 (2019) 880–894.
[3] L. I. Bilousova, O. H. Kolgatin, L. S. Kolgatina, O. H. Kuzminska, Introspection as a condition of students’ self-management in programming training, in: S. Semerikov, V. Osadchyi, O. Kuzminska (Eds.), Proceedings of the Symposium on Advances in Educational Technology, AET 2020, University of Educational Management, SciTePress, Kyiv, 2022.
[4] S. O. Semerikov, I. O. Teplytskyi, Y. V. Yechkalo, A. E.
Kiv, Computer Simulation of Neural Networks Using Spreadsheets: The Dawn of the Age of Camelot, CEUR Workshop Proceedings 2257 (2018) 122–147.
[5] S. O. Semerikov, I. O. Teplytskyi, Y. V. Yechkalo, O. M. Markova, V. N. Soloviev, Computer Simulation of Neural Networks Using Spreadsheets: Dr. Anderson, Welcome Back, CEUR Workshop Proceedings 2393 (2019) 833–848.
[6] O. Markova, S. Semerikov, M. Popel, CoCalc as a learning tool for neural network simulation in the special course “Foundations of mathematic informatics”, CEUR Workshop Proceedings 2104 (2018) 388–403.
[7] Y. O. Modlo, S. O. Semerikov, Xcos on Web as a promising learning tool for Bachelor’s of Electromechanics modeling of technical objects, CEUR Workshop Proceedings 2168 (2018) 34–41.
[8] S. A. Khazina, Y. S. Ramskyi, B. S. Eylon, Computer modeling as a scientific means of training prospective physics teachers, in: EDULEARN16 Proceedings, 8th International Conference on Education and New Learning Technologies, IATED, 2016, pp. 7699–7709. doi:10.21125/edulearn.2016.0694.
[9] H. M. Kravtsov, Methods and technologies for the quality monitoring of electronic educational resources, CEUR Workshop Proceedings 1356 (2015) 311–325.
[10] A. E. C. Sotos, S. Vanhoof, W. V. den Noortgate, P. Onghena, How confident are students in their misconceptions about hypothesis tests?, Journal of Statistics Education 17 (2009). doi:10.1080/10691898.2009.11889514.
[11] R. L. Wasserstein, A. L. Schirm, N. A. Lazar, Moving to a World Beyond “p < 0.05”, The American Statistician 73 (2019) 1–19. doi:10.1080/00031305.2019.1583913.
[12] B. B. McShane, D. Gal, A. Gelman, C. Robert, J. L. Tackett, Abandon Statistical Significance, The American Statistician 73 (2019) 235–245. doi:10.1080/00031305.2018.1527253.
[13] O.
Kolgatin, Computer-based simulation of stochastic process for investigation of efficiency of statistical hypothesis testing in pedagogical research, Journal of Information Technologies in Education (ITE) (2016) 007–014. URL: http://ite.kspu.edu/index.php/ite/article/view/101. doi:10.14308/ite000582.
[14] R. L. Wasserstein, N. A. Lazar, The ASA Statement on p-Values: Context, Process, and Purpose, The American Statistician 70 (2016) 129–133. doi:10.1080/00031305.2016.1154108.
[15] K. M. Lang, S. J. Sweet, E. M. Grandfield, Getting beyond the Null: Statistical Modeling as an Alternative Framework for Inference in Developmental Science, Research in Human Development 14 (2017) 287–304. doi:10.1080/15427609.2017.1371567.
[16] D. M. Jamie, Using computer simulation methods to teach statistics: A review of the literature, Journal of Statistics Education 10 (2002). doi:10.1080/10691898.2002.11910548.
[17] P. Flusser, D. Hanna, Computer simulation of the testing of a statistical hypothesis, Mathematics and Computer Education 25 (1991) 158. URL: https://www.learntechlib.org/p/144840.
[18] D. W. Taylor, E. G. Bosch, CTS: A clinical trials simulator, Statistics in Medicine 9 (1990) 787–801. doi:10.1002/sim.4780090708.
[19] D. R. Bradley, R. L. Hemstreet, S. T. Ziegenhagen, A simulation laboratory for statistics, Behavior Research Methods, Instruments, and Computers 24 (1992) 190–204. URL: https://link.springer.com/content/pdf/10.3758/BF03203496.pdf. doi:10.3758/BF03203496.
[20] C. Ricketts, J. Berry, Teaching statistics through resampling, Teaching Statistics 16 (1994) 41–44. doi:10.1111/j.1467-9639.1994.tb00685.x.
[21] R. B. D’Agostino, W. Chase, A. Belanger, The appropriateness of some common procedures for testing the equality of two independent binomial populations, The American Statistician 42 (1988) 198–202. URL: http://www.jstor.org/stable/2685002.
[22] O. H.
Kolgatin, Informatsionnyye tekhnologii v nauchno-pedagogicheskikh issledovaniyakh (Information technologies in educational researches), Upravlyayushchiye Sistemy i Mashiny (Control Systems and Machines) 255 (2015) 66–72.
[23] J. P. Verma, Data Analysis in Management with SPSS Software, Springer, India, 2013. doi:10.1007/978-81-322-0786-3.
[24] Y. V. Gubler, A. A. Genkin, Primeneniye Neparametricheskikh Metodov Statistiki v Mediko-Biologicheskikh Issledovaniyakh (Application of Nonparametric Methods of Statistics in Biomedical Research), Meditsina, Leningradskoye otdeleniye, Leningrad, 1973.
[25] J. Berkson, In dispraise of the exact test: Do the marginal totals of the 2x2 table contain relevant information respecting the table proportions?, Journal of Statistical Planning and Inference 2 (1978) 27–42. doi:10.1016/0378-3758(78)90019-8.
[26] D. Liddell, Practical tests of 2 × 2 contingency tables, Journal of the Royal Statistical Society. Series D (The Statistician) 25 (1976) 295–304. doi:10.2307/2988087.
[27] G. K. Kanji, 100 Statistical Tests, SAGE Publications, London - Thousand Oaks - New Delhi, 2006.
[28] K. J. Preacher, Calculation for Fisher’s exact test, 2021. URL: http://quantpsy.org/fisher/fisher.html.
[29] Y. Fong, Y. Huang, Modified Wilcoxon-Mann-Whitney test and power against strong null, The American Statistician 73 (2019) 43–49. doi:10.1080/00031305.2017.1328375.
[30] A. Marx, C. Backes, E. Meese, H.-P. Lenhof, A. Keller, EDISON-WMW: Exact dynamic programing solution of the Wilcoxon-Mann-Whitney test, Genomics, Proteomics and Bioinformatics 14 (2016) 55–61. doi:10.1016/j.gpb.2015.11.004.
[31] M. P. Fay, M. A. Proschan, Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules, Statistics Surveys 4 (2010) 1–39. doi:10.1214/09-SS051.
[32] Y. V. Sidorenko, Metody Matematicheskoy Obrabotki v Psikhologii (Methods of Mathematical Processing in Psychology), Rech, St. Petersburg, 2002.
URL: https://www.sgu.ru/sites/default/files/textdocsfiles/2014/02/19/sidorenko.pdf.
[33] P. Billiet, The Mann-Whitney U-test – analysis of 2-between-group data with a quantitative response variable, 2003. URL: https://psych.unl.edu/psycrs/handcomp/hcmann.PDF.