=Paper= {{Paper |id=Vol-1844/10000311 |storemode=property |title=Statistical Software in the Higher School Educational Process |pdfUrl=https://ceur-ws.org/Vol-1844/10000311.pdf |volume=Vol-1844 |authors=Taras Kobylnyk |dblpUrl=https://dblp.org/rec/conf/icteri/Kobylnyk17 }} ==Statistical Software in the Higher School Educational Process== https://ceur-ws.org/Vol-1844/10000311.pdf
    Statistical Software in the Higher School Educational
                            Process

                                      Taras Kobylnyk

    Drohobych Ivan Franko State Pedagogical University, 24 Ivan Franko Str., Drohobych,
                                        Ukraine
                            kobylnyktaras@gmail.com



       Abstract. Data processing is not possible without a computer and appropriate
       software, so the user faces the problem of choosing software for research.
       The aim of this paper is therefore to analyze software for statistical research.
       To achieve this goal, scientific and methodological sources were analyzed, and
       comparison and generalization were used to determine the state of the problem
       and promising directions for its solution. The paper presents the capabilities
       of such tools as MS Excel, Statistica, IBM SPSS and R. Special attention is given
       to calling R from other software, including Statistica, IBM SPSS, Mathematica
       and LaTeX. Popular commercial statistical software provides the ability to run
       R scripts directly in the shell of these programs. It is proposed to use the
       R package for scientific research and in the higher school educational process.
       In the educational process it is appropriate to use the project method: students
       prepare a report and a presentation by integrating R into LaTeX.

       Keywords: Statistical Analysis, MS Excel, R, IBM SPSS, Statistica, Mathematica,
       LaTeX.


       Key Terms. SoftwareSystem, ICTEnvironment, TeachingProcess


1   Introduction

    In experimental research, various statistical methods are used to test hypotheses
and to construct statistical models of objects, phenomena, regularities and processes.
Obviously, processing experimental data is impossible without computers and appropriate
software. There is a large variety of software for data processing, both general and
special purpose. Thus, the user faces the question: which software to choose? Or which
software is best for data processing?
    Standard statistical methods of data processing are implemented in spreadsheets
(Lotus, Quattro Pro, MS Excel, OpenOffice.org Calc and others), Computer Mathematics
Systems (CMS) (Maple, Mathematica, Matlab, Maxima and others) and specialized
software (R, SPSS, Statistica, SAS and others).
    In the higher school learning process, students often study and use software such
as MS Excel, IBM SPSS and Statistica for statistical analysis. The book [10] provides
comprehensive coverage of the main statistical analysis topics (data description,
statistical inference, classification and regression, factor analysis, survival data,
directional statistics) that one faces in practical problems, and discusses their
solutions with Statistica, SPSS, Matlab and R. The book [4] provides a brief and
straightforward description of how to conduct a range of statistical analyses using
SAS, version 9.2. The book [18] covers the application of many of the most powerful
statistical and machine learning techniques used in data science projects, using the
programming language R to process actual research data. The article [12] presents
various ways of measuring the popularity or market share of software for advanced
analytics. Such software is also referred to as tools for data science, statistical
analysis, machine learning, artificial intelligence, predictive analytics and business
analytics, and is also a subset of business intelligence.
    Research methods. Scientific and methodological sources were analyzed; comparison
and generalization were used to determine the state of the problem and promising
directions for its solution.


2   The Presentation of Main Results

2.1 Microsoft Excel.
   The book [5] shows how to use Microsoft Excel to perform statistical analysis.
This step-by-step guide has been updated to cover the new features and interface of
Excel 2010. The advantages of Microsoft Excel are:

─ the relative ease of mastering and practical use;
─ a significant number of built-in statistical functions;
─ the presence of the "Analysis ToolPak" add-in, which contains procedures for solving
  complex problems of statistical analysis;
─ the ability to create the user's own modules in VBA for data analysis;
─ the availability of the XLSTAT-Pro add-in (a trial version can be downloaded from
  the developer's site, www.xlstat.com).

    Despite these advantages, a full statistical analysis is not possible in MS Excel:
it is general-purpose software rather than specialized (scientific) software. It offers
only a small set of statistical tests and few methods, especially multivariate ones.
There is no specialized reporting system. This is explained by the fact that statistics
is not the primary function of a spreadsheet.
    Statistical packages are free of these deficiencies and far surpass spreadsheets
in the volume and quality of implemented statistical methods. Therefore, the final
statistical analysis should be done in software that is specifically designed for this
purpose, that is, in statistical packages. A statistical package usually contains a
fairly large range of standard statistical methods, is simple enough to learn quickly,
works with rather large databases, can exchange data with other packages and databases,
has an extensive set of tools for the graphical presentation of data and of the results
of their analysis, and has detailed documentation and a reference system [16].
   A statistical package must satisfy the following minimum requirements:
─ modularity;
─ use of a simple problem-oriented language for the formulation of user tasks;
─ maintenance of the user's data bank and of reports on the results of the analysis;
─ an interactive mode of user work with the package;
─ compatibility with other software.

    It should be noted that there is a minimal set of statistical methods that is
included in all statistical packages: descriptive statistics, nonparametric statistics,
and analysis of variance, correlation, regression, cluster, factor and discriminant
analyses.
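
    A few items of this minimal set (descriptive statistics, correlation and simple
regression) can be illustrated with a short sketch. Python is used here purely for
illustration; it is not one of the packages reviewed in this paper. The data and the
function names describe, pearson_r and simple_regression are made up for the example.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up illustrative data: hours of study and test scores
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

summary = describe(scores)
r = pearson_r(hours, scores)
slope, intercept = simple_regression(hours, scores)

print(summary)
print("r =", round(r, 3), "slope =", round(slope, 2), "intercept =", round(intercept, 2))
```

    A statistical package offers the same calculations through menus or a single
command, together with tests, diagnostics and reporting that this sketch omits.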


2.2 IBM SPSS
    IBM SPSS (originally Statistical Package for the Social Sciences) is a universal
statistical package developed by SPSS Inc. The first version of the package was
released in 1968. In 2009, IBM acquired SPSS Inc., which is why the name of the package
now includes the IBM abbreviation. IBM SPSS is modular software. Its foundation is the
base module (SPSS Base), which allows all basic types of statistical analysis to be
carried out. The latest version at the time of writing is 24. The book [7] was written
for use with version 20 of the IBM SPSS statistical package for Windows. It covers both
the Windows method and the syntax method of analysis, an approach that has proven
popular with both experienced researchers and students.
    IBM SPSS is among the most widely used programs for statistical analysis in social
science and other disciplines. IBM SPSS can take data from almost any type of file and
use them to generate tabulated reports, charts and plots of distributions and trends,
descriptive statistics, and complex statistical analyses.
    Add-on modules can be installed for extended and in-depth data analysis, such as
IBM SPSS Missing Values (finds relationships between any missing values in your data
and other variables), IBM SPSS Neural Networks (offers non-linear data modeling
procedures that enable you to discover more complex relationships in your data),
IBM SPSS Forecasting (enables analysts to predict trends and develop forecasts quickly
and easily, without being an expert statistician) and others. More detailed information
about the modules can be found at
http://www-03.ibm.com/software/products/en/spss-statistics.
    Among the advantages of IBM SPSS, the following should be noted:
─ a simple and user-friendly interface;
─ a wide variety of statistical and graphical data analysis procedures and reporting
  procedures;
─ the detailed context-based help system;
─ the ability to download a trial version from the official website of the company and
  the availability of versions in many languages;
─ compatibility with the Windows, Mac OS and Linux operating systems.

   Among the disadvantages of the IBM SPSS package, the following should be noted:
─ high system requirements (1 GB or more of RAM, 800 MB of hard disk space and a CPU
  frequency of 1 GHz or above);
─ a high price compared to similar software.


2.3 Statistica
   Statistica [17] is universal software for statistical analysis developed by
StatSoft Inc. The first version of the package (Statistica for DOS) was released in
1991. To date, version 13 of the package has been released.
   There are three standard components (modules) of Statistica:

─ the Statistica Base module provides an extensive choice of the main types of
  statistical analysis. Efficient operation of this module requires 256 MB of RAM or
  more and a CPU frequency of 500 MHz or above;
─ the Advanced Linear/Nonlinear Models module is used for modeling and forecasting,
  including the ability to automatically select the model and advanced interactive
  visualization tools;
─ Multivariate Exploratory Techniques module is used for exploratory analysis of the
  different types of data in combination with interactive visualization tools.

   The following options are available for installing Statistica:

─ a single-user version (Single-User);
─ a network version (Concurrent Network) - version for use in local area networks;
─ Enterprise – version for use in computer systems and large organizations;
─ Web-Based – version for use in large networks through a browser.

   Among the advantages of Statistica, the following should be noted:

─ communication between Statistica and Windows applications is implemented;
─ there is a built-in programming language, Statistica Basic.

   The disadvantages include the high cost of licenses and the fact that Statistica is
developed only for Windows, which somewhat reduces the number of users. Statistica does
not have a version for Mac OS or Linux, since it relies on Microsoft's .NET Framework
and other Microsoft technologies. However, Statistica can run on a Windows OS inside a
virtual machine on a Mac or Linux computer. Examples of virtualization software for the
Mac include VMware Fusion and Parallels. More information on this topic can be found at
https://support.quest.com/statistica/kb/145707.


2.4 R Environment

   First of all, R is a high-level language and an environment for data analysis and
graphics [2]. The teaching of statistical analysis with R and the support of the global
scientific community have made R scripts a standard both in articles and in informal
communication among scientists around the world.
   We list the advantages and disadvantages of the R package. Its advantages are:
─ it is free software that anyone can download from http://www.r-project.org;
─ there are implementations for the Windows, Mac OS and Linux operating systems;
─ the basic version of R takes up little space on the hard disk and includes all the
  functions needed for statistical analysis;
─ extension packages have been developed for virtually all fields of science where
  statistical analysis is used;
─ users can write the functions they need themselves;
─ data can be entered from the keyboard or imported from text files, spreadsheets,
  statistical software, CMS and database management systems.

    Among the disadvantages, the following should be noted, with some reservations:
─ orientation to programming;
─ in contrast to most commercial software, R does not have a built-in graphical user
  interface but a command line interface, so the user needs to know the necessary
  functions and the syntax of the programming language. It should be noted that there
  are several graphical user interfaces for R (RStudio, R Commander and others), but
  they are not as good as the interfaces of commercial packages.


2.5 Computer Mathematics Systems
    If statistical analysis is only part of a large-scale scientific study, a CMS can
be used, for example Maple [14], Mathematica [15] or Matlab [11]. CMS cover virtually
all areas of classical and applied mathematics and statistics. Normally, some basic
statistical functions are in the CMS kernel. Extension packages (or toolboxes) are
used to extend the data processing capabilities.
    Maple, Mathematica and Matlab are commercial and costly CMS. Maxima, a computer
algebra system, is a real alternative to commercial CMS
(http://maxima.sourceforge.net/). The Maxima source code can be compiled on many
systems, including Windows, Linux and Mac OS. In fact, it is the only system that
can compete with the commercial Mathematica and Maple, but its capabilities for
statistical analysis are rather limited.


3   Calling R from Other Software

    Today there is a trend towards integrating statistical packages with each other
and with other software, including CMS and word processors. We consider examples of
such integration. It should be noted that most advances in statistics are first
implemented in R and only then added to the menus of other software. R is powerful,
but it takes a long time to learn to use it well. You can keep using your current
software to access and manage data, then call R for just the things your current
software doesn't do [13]. Therefore, we consider integrating R into other software
such as Statistica, IBM SPSS, Mathematica and word processors.
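
    The general pattern of keeping data in a host program and calling R for a single
computation can be sketched as follows. This is only an illustration in Python and not
the mechanism used by Statistica, IBM SPSS or Mathematica, which are described in the
following subsections. It assumes that the Rscript command-line front end shipped with
R is available on the search path; the helper names build_r_command, run_r and r_vector
and the sample data are made up for the example.

```python
import shutil
import subprocess


def build_r_command(expression):
    """Build the command line that asks Rscript to evaluate one R expression."""
    return ["Rscript", "-e", expression]


def run_r(expression, timeout=60):
    """Run an R expression and return its printed output, or None if R is absent."""
    if shutil.which("Rscript") is None:
        return None  # R is not installed on this machine
    completed = subprocess.run(
        build_r_command(expression),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise an error if R reports a failure
    )
    return completed.stdout.strip()


def r_vector(values):
    """Format a list of numbers as an R vector literal such as c(1.0, 2.0)."""
    return "c(" + ", ".join(repr(float(v)) for v in values) + ")"


# Made-up sample that the host program manages
sample = [4.1, 5.3, 6.8, 5.9, 4.7]
expression = "cat(median(" + r_vector(sample) + "))"

result = run_r(expression)
if result is None:
    print("Rscript not found; the command would be:", build_r_command(expression))
else:
    print("Median computed by R:", result)
```

    Passing data as text inside the expression is practical only for small samples;
for larger data sets the host program would more likely exchange files with R.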


3.1 Calling R from Statistica
    According to [8], R tabular results are difficult to handle, R graphics cannot be
changed, and results are difficult to manage. Integration of Statistica and R extends
the capabilities of the data analysis environment by allowing R scripts to be opened
and used inside Statistica. By integrating R into Statistica, the user can run R
scripts in the Statistica shell and obtain the results in the form of Statistica
reports, workbooks and charts. R methods can also be called from Statistica Visual
Basic. Statistica interacts with other applications using the relevant standards, in
particular the COM standard, which is built into Windows. Statistica interacts with R
through statconnDCOM and, if this library is installed in the system, users can open
and execute R scripts from the Statistica environment. This makes it easier to
transfer data and present results through custom macros that run in Statistica rather
than in R. Interaction with R is available in all Statistica product lines. Starting
with version 10 of Statistica, the interaction with R is set up automatically.


3.2 Calling R from IBM SPSS

   According to [3], version 16 of the IBM SPSS package includes a free plug-in that
allows R code to be executed in IBM SPSS. With the R plug-in, you retain all the
features of an SPSS database, particularly the labels of categorical data and the long
descriptors. The R integration plug-in does two things:

─ it opens communication between IBM SPSS and R;
─ it provides R with a package of functions with which to translate IBM SPSS data
  structures into R objects.

   More about integrating R into IBM SPSS can be found in the article [3]. Thus, the
functionality of R becomes available to users who do not have skills in working with R.


3.3 Calling R from Mathematica
   Integrating R code into Mathematica allows data to be exchanged between Mathematica
and R and R code to be executed from within Mathematica. Since version 9, Mathematica
has offered built-in ways to integrate R code into the Mathematica workflow, combining
Mathematica's broad range of capabilities with those of R. This integration is provided
by the RLink package. RLink uses J/Link and the rJava/JRI Java library to exchange data
between Mathematica and R and to execute R code from Mathematica.
   Calling R from Mathematica provides:

─ full access to all R functionality from within Mathematica;
─ exchange of data between Mathematica and R, using the usual Mathematica
  representation for the data (most core R data types are supported);
─ application of built-in and user-defined R functions from within Mathematica in a
  way natural for Mathematica users;
─ creation of functions in the R workspace from within Mathematica.

    More about calling R from Mathematica can be found in [1].


3.4 Calling R from Word Processors

    Usually, any research is completed by the publication of its results. To save
results in a form suitable for publication, two tools from R are typically used:
Sweave and odfWeave. The first allows R code and its results to be embedded in tex
files to produce typographic-quality reports in PDF, PostScript and DVI formats.
Similarly, the odfWeave package allows R code and its results to be embedded in files
in ODF (Open Document Format). The final report can be edited in a word processor such
as OpenOffice Writer or Microsoft Word. More about calling R from word processors can
be found in the book [9].
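
    The idea behind such tools, computing results and placing them directly into the
source of a report, can be sketched as follows. The sketch is written in Python for
illustration only and is not Sweave, which works on noweb-style files containing R
code chunks. The @NAME@ placeholder syntax, the weave function and the data are
assumptions made for this example.

```python
import statistics

# LaTeX template with placeholders that are replaced by computed values.
# The @NAME@ placeholder syntax is an assumption made for this sketch.
TEMPLATE = r"""\documentclass{article}
\begin{document}
The sample contains @N@ observations with mean @MEAN@
and standard deviation @SD@.
\end{document}
"""


def weave(template, values):
    """Replace every @KEY@ placeholder in the template with its value."""
    text = template
    for key, value in values.items():
        text = text.replace("@" + key + "@", str(value))
    return text


# Made-up measurements
data = [12.1, 11.8, 12.6, 12.3, 11.9]

report = weave(TEMPLATE, {
    "N": len(data),
    "MEAN": f"{statistics.mean(data):.2f}",
    "SD": f"{statistics.stdev(data):.2f}",  # sample standard deviation
})
print(report)
```

    Because the numbers in the text are produced by the same script that analyzes the
data, the report does not go out of date when the data change.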


4   Conclusions and Prospects for the Further Research

    Users face the question of which software to choose for data processing. Of
course, functionality, high quality and a reasonable price are the factors in choosing
statistical software. Therefore, the following should be considered when choosing
statistical software:

─ the cost (if the system is commercial);
─ the purpose of use (for scientific research or the educational process);
─ the user qualification requirements (level of statistical knowledge and programming
  skills);
─ the hardware and software available.

    Based on the above analysis of software, we recommend using the R package because:

─ it is distributed under the GNU GPL license;
─ it is one of the best statistical software packages.

    The last argument is confirmed by the fact that popular commercial statistical
systems (IBM SPSS, Statistica, SAS) and CMS (Mathematica, Matlab) provide the ability
to run R scripts directly in the shell of these programs. Integrating R into other
software can be useful not only to specialists in statistics, but also to specialists
for whom data processing is only part of a larger scientific study.
    In the educational process it is appropriate to use the project method. Students
are encouraged to carry out a project whose end result is a report on the completion
of individual tasks. The subject of the project is related to data processing using
the R package. The report and presentation are prepared by integrating R into LaTeX.
    The use of statistical software extends the scope of application of mathematical
methods and models for the study of processes in various fields of human activity and
simplifies them. The final result of research is usually a publication. In this case
LaTeX is useful: combined with the R package or a CMS, it makes it possible to create
high-quality documents. Further research will focus on exploring the possibilities of
using the R environment for various methods of statistical analysis and on developing
methods for its use in the higher school educational process.


References
 1. Built-in Integration with R: New in Mathematica 9,
    https://www.wolfram.com/mathematica/new-in-9/built-in-integration-with-r/
 2. Crawley, M.J.: The R Book, Second Edition. John Wiley & Sons, Chichester (2013)
 3. Dalzell, C.: Calling R from SPSS,
    https://www.ibm.com/developerworks/library/ba-call-r-spss/
 4. Der, G., Everitt, B.S.: A Handbook of Statistical Analyses Using SAS, Third Edition.
    Chapman and Hall/CRC Press, Boca Raton (2008)
 5. Dretzke, B.J.: Statistics with Microsoft Excel, 5th Edition. Pearson, New Jersey (2011)
 6. Field, A.: Discovering Statistics Using IBM SPSS Statistics, 4th Edition. Sage
    Publications, London (2013)
 7. Ho, R.: Handbook of Univariate and Multivariate Data Analysis with IBM SPSS, Second
    Edition. Chapman and Hall/CRC, Boca Raton (2013)
 8. Integrating R into Statistica,
    http://statsoft.ru/products/integration/integration-with-R.php (in Russian)
 9. Kabacoff, R.I.: R in Action: Data Analysis and Graphics with R, Second Edition. Manning
    Publications, NY (2015)
10. Marques de Sá, J.P.: Applied Statistics Using SPSS, STATISTICA, MATLAB and R,
    Second Edition. Springer, Berlin (2007)
11. Martinez, W.L., Martinez, A.R.: Exploratory Data Analysis with Matlab. Chapman &
    Hall/CRC Press, Boca Raton (2005)
12. Muenchen, R.A.: The Popularity of Data Science Software,
    http://r4stats.com/articles/popularity
13. Muenchen, R.A.: Calling R from Other Software,
    http://r4stats.com/articles/calling-r/
14. Rafter, J.A., Abell, M.L., Braselton, J.P.: Statistics with Maple. Academic Press,
    Amsterdam; Boston (2002)
15. Rose, C., Smith, M.D.: Mathematical Statistics with MATHEMATICA. Springer-Verlag,
    NY (2002)
16. Tyurin, Yu.N., Makarova, A.A.: Statistical analysis of the data on the computer.
    INFRA, Moscow (1998) (in Russian)
17. Wass, J.A.: STATISTICA 10: The Power of Statistics and Data Mining Simplified,
    http://www.scientificcomputing.com/article/2011/12/statistica-10-power-statistics-and-data-mining-simplified
18. Zumel, N., Mount, J.: Practical Data Science with R. Manning Publications, NY (2014)