ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic

Advances in GMDH-based Predictive Analytics Tools for Business Intelligence Systems

Serhiy Yefimenko

Department for Information Technologies of Inductive Modelling, International Research and Training Center for Information Technologies and Systems, UKRAINE, Kyiv, 40, Ave Glushkov, email: syefim@ukr.net

Abstract: The paper analyzes approaches to the prediction of economic processes in business intelligence systems. Contemporary predictive analytics tools used for effective business decision making are considered. The concept of an advanced GMDH-based predictive analytics tool is proposed.

Keywords: business intelligence, predictive analytics, GMDH, recurrent-and-parallel computing.

I. INTRODUCTION

Achieving success and ensuring competitiveness in today's fast-changing economic conditions is impossible without reliable and timely information. Business data is becoming a significant resource for knowledge acquisition and for making important managerial decisions in different business fields. Effective decisions require reliable and complete information, which traditional information systems cannot provide.

The global information space is undergoing a rapid transformation that affects society, the market and business. The digital economy is growing fast: 25% of the world's economy will be digital by 2020 [1], whereas this share was 15% in 2005. The Internet of Things (IoT), Big Data, and mobile and cloud technologies contribute to the digitization of the economy. The influence of these technologies on business is expected to make direct ownership of physical resources unnecessary.

Business intelligence (BI) is a modern managerial tool in the digital economy. It contributes to a company's prosperity through smart management of finances, business processes and personnel when large amounts of information are involved.

The purpose of the review is to consider modern approaches to the prediction of economic, production and financial processes in BI systems, as well as existing software tools for predictive analytics.

II. PREDICTIVE ANALYTICS & PREDICTIVE MODELLING

BI encompasses strategies and technologies used by enterprises to analyze business information [2]. BI refers to the management philosophy and toolkit used to handle business information in order to make effective business decisions. BI technologies provide historical, current and predictive views of business operations.

A classification of the technologies used in business analytics is given in [3]. Predictive modeling is one of the most effective.

Organizations of different types may face problems with how effectively existing data is used in their systems. In this regard, the quality and speed of information and analytical support for business management is of particular importance for companies. Most of them use BI analytical applications based on OLAP systems for planning, analysis and control tasks. However, in the new economic conditions the functionality of such systems is not sufficient for new digital problems, since they are oriented towards retrospective analysis. Consequently, there is a need for predictive analytics, which complements and enhances BI capabilities in terms of predicting future events.

In general, there are several types of analytics that co-exist and supplement each other [4, 5]:
– descriptive analytics explores past facts in order to find the causes of previous successes or failures. It answers the question "What happened?". Descriptive analytics is still in use today. Most management reports for sales, marketing and finance use this kind of business analytics;
– diagnostic analytics goes further and gives an idea not only of the events that occurred, but also of their causes. It answers the question "Why did it happen?";
– predictive analytics answers the question "What is likely to happen?". Historical data is combined with rules, algorithms and external data in order to determine the future value or the probability of an event;
– prescriptive analytics is the next stage in predicting future events. It offers a sequence of actions to gain the most from predictions and shows the consequences of each decision. It answers the question "What should I do?".

Predictive analytics is defined in [6] as a variety of statistical techniques from predictive modelling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events. As a rule, big data arrays are used in the process of analysis. The main idea of predictive analytics is to determine one or more parameters that affect the predicted event. The process of predictive analysis can be represented as follows:

[Figure: cycle of the stages Project definition, Data collection, Data analysis, Statistics, Modeling, Deployment and Model monitoring around the central element "The goal of business doing"]

Fig.1. Predictive analytics process

Project definition. Definition of the project results, components, scope of work, business purpose, and the data set to be used.

Data collection. Data from different sources is prepared using data mining techniques.

Data analysis. Data is checked, cleaned and modeled in order to identify useful information.

Statistical analysis makes it possible to confirm assumptions and hypotheses using standard statistical models.

Predictive modeling provides the ability to automatically build accurate predictive models.

Deployment of a predictive model brings the analytical results into the decision-making process, for example in the form of reports.

Model monitoring. Models are tested to ensure that they deliver the expected results.

The outcome of applying predictive analytics is the most effective business decisions. An important requirement for a predictive model is that it fits the data as closely as possible and is statistically significant. Predictive models may be [7]:
– classification models. They describe a set of rules according to which a new object can be assigned to the relevant class;
– time series models. They describe functions that allow the prediction of continuous numerical parameters and are based on information about the change of a certain parameter over a past time period.

According to Transparency Market Research [8], the market for predictive analytics will reach $6.5 billion by 2019, while it was $3.6 billion in 2015. The global market for predictive analysis systems will grow by an average of 17.8% annually. Experience shows that the companies that survive difficult economic times are those that continue to invest in technology and innovation, and predictive analytics is one such technology.

III. SOFTWARE TOOLS FOR PREDICTIVE ANALYTICS

In 2013 Forrester Research published the report "Big Data Predictive Analytics Solutions, Q1 2013", which identifies the market leaders in predictive analytics [9]. According to it, SAS and IBM SPSS have the strongest market positions and the best strategies among the largest developers of predictive analytics tools. The evaluation was carried out against 51 parameters, from the completeness of the functionality of the main analytical system to the size of the client base and the architectural advantages offered by the solution developers.

SAS (Statistical Analysis System) Enterprise Miner [10] leads the segment of in-depth analytics, accounting for about a third of the market. It allows users to explore and analyze large amounts of data, to find patterns and relationships, and to make well-informed decisions based on facts and findings. The solution is used effectively in the banking sector, healthcare, the oil and gas sector, insurance, telecommunications, transport and power systems. The main advantages of SAS Enterprise Miner include:
– advanced predictive modeling;
– a convenient and clear interface allowing users to create predictive models on their own;
– an automated process for the routine application of models;
– the possibility of batch processing;
– rapid data collection and preparation, aggregation and analysis;
– scalability and customization of the solution;
– high system performance when working with big data.

IBM SPSS (Statistical Package for the Social Sciences) [11] is a widespread intelligent tool for predictive analytics. SPSS's predictive analytics helps to analyze the patterns in historical and current transactions in order to predict potential future events.

A key component of the toolkit is SPSS Modeler, a data mining environment that allows one to create intelligent predictive solutions by revealing patterns and relationships in the data. SPSS Modeler Server supports integration with the data mining and modelling tools provided by DBMS (database management system) developers, including IBM PureData System for Analytics. Using SPSS Modeler, one can build and store models in the database. One can combine the analytical capabilities and ease of use of SPSS Modeler with the power and performance of the DBMS, using the built-in algorithms supplied by the DBMS developers. The models are built inside databases and are available through the convenient user interface of SPSS Modeler.

Dell Statistica (since 2017, TIBCO Software) is an in-depth data analysis platform [12] aimed at data professionals and organizations that need to process data from a large number of IoT devices and heterogeneous sources. The toolkit helps to prepare structured and unstructured data, to deploy analytical tools on devices regardless of their location, and to use in-database analysis functions on the MySQL, Oracle, and Teradata platforms.

With Dell Statistica, companies are able to cope with the shortage of data analysts and the complexity of today's IoT environments, as well as to take into account new data sources and data types.

The features of Dell Statistica that simplify predictive analytics are as follows:
– dashboards with advanced visualization allowing users to easily see the results of the analysis at any stage;
– a state-of-the-art web interface allowing users to share reports that can be opened in any browser;
– effective control of manually entered data.

In addition to the developers of predictive analytics tools listed above (the list is far from complete), there is also a large number of specialized firms providing business intelligence services. One of the best known is Elder Research [13]. It has extensive experience in using many software tools (including all of the above) for developing analytical solutions, programming, and personalized data visualization.

IV. GMDH-BASED PREDICTIVE ANALYTICS TOOLS

Among the various tools for predictive analytics, several deserve special attention. Their common feature is the use of one of the most effective inductive modeling methods, the Group Method of Data Handling (GMDH) [14].

The software tool Insights [15] is developed by the German company KnowledgeMiner Software (founded in 1993). Besides GMDH, it also uses the Similar Patterns self-organizing modeling technology (also known as Analog Complexing) and fuzzy logic for modelling and prediction. It is possible to build linear and nonlinear, static and dynamic time series models, models with multiple inputs and a single output, and models with multiple inputs and multiple outputs. The outputs of the model can be represented both in analytical form (as equations with estimated parameters) and graphically (using a system graph, which reflects the interconnections of the system structure).

Fig.2. An example of using Insights for predictive analysis

Insights implements vector processing and multi-core and multiprocessor support for high-performance computing. It scales with Apple Macintosh hardware: regardless of which processor configuration is used (from a dual-core processor to two six-core processors), the software automatically uses all available computing resources.

GMDH Shell [16] is a contemporary software tool for predictive analytics. It is based on the classical GMDH algorithm and can be used for time series prediction and for solving classification and clustering problems. GMDH Shell is a powerful solution for analyzing multidimensional data from various business fields. The tool offers two data mining algorithms: self-organized neural networks and combinatorial structural optimization of models. High-performance computing on a Linux cluster is also possible.

Fig.3. An example of using GMDH Shell for predictive analysis

It should be noted that GMDH Shell does not compete with KnowledgeMiner Insights in the sense that it is intended for use on the Windows operating system.

A software tool for the modeling and prediction of complex multidimensional interrelated processes is developed in the Department for Information Technologies of Inductive Modelling [17].

The tool is implemented for use on multiprocessor cluster systems. However, it can be embedded in any contemporary business intelligence system as an analytical tool for modeling and predicting the dynamics of processes in digital economy systems, based on the detection and use of knowledge about the behavior and performance of such systems.

[Plot: real values and model output, 1996-2006]

Fig.4. An example of using the software tool with recurrent-and-parallel GMDH algorithms for predictive analysis

V. ON CONSTRUCTION OF ADVANCED GMDH-BASED PREDICTIVE ANALYTICS TOOL

Although a considerable number of predictive analytics tools exist, none of them covers the whole range of problems, so it may be concluded that there is still no single convenient solution on the market. To create accurate models and to obtain adequate predictions of various indicators, an advanced predictive analytics tool is required. It should reflect the relationships in the models comprehensively, be accessible and convenient, and allow the user to customize the model and build reliable predictions. However, to date there is no such complete solution. Developing such a tool is a topical problem in the field of analytical business solutions.

The most important features of this advanced predictive analytics tool are:
– GMDH-based software tool [18, 19];
– recurrent-and-parallel computing [20];
– intelligent user interface [21].

GMDH-based software tool. The user does not need a thorough knowledge of modeling principles when building models. Knowing only the features of their own domain, users will be able to build models with a convenient tool. Built-in intelligent algorithms automatically build models from the available data set, which greatly facilitates the user's work.

The advanced predictive analytics tool proposed in the paper is based on software for the modelling and prediction of complex multidimensional interrelated processes in the class of vector autoregressive models.

Recurrent-and-parallel computing. A fundamentally new data-based solution for the inductive modeling of complex processes achieves a high level of performance thanks to a new concept that combines the efficiency of recurrent and parallel computing. The implementation of such a solution significantly improves the efficiency and validity of managerial decisions.

Intelligent user interface. An important issue is that existing predictive analytics tools are either too complicated for users or do not contain the necessary range of options. The intelligent user interface should be friendly and should allow models to be built without deep programming knowledge, which will significantly expand the range of users and increase their confidence in BI applications.

The advanced predictive analytics tool must include an intelligent shell that helps a user with any level of qualification to solve the data-based modelling problem (from data preprocessing to the choice of the modelling algorithm). The intelligent shell makes automatic analysis and modeling procedures generally available. It takes into account the user's wishes and a priori knowledge about the modeling object, and also gives the user control over the decisions made at every step of solving the modeling problem.

VI. CONCLUSION

Contemporary capabilities and advanced techniques of predictive analytics are becoming a powerful means of increasing a company's productive efficiency. Predictive analytics is a new trend that opens up broad prospects for the further development of companies.

When applying predictive analytics systems, one should understand that such systems cannot work without sufficient historical data and are ineffective without the collection of current data. The less data is used, the less accurate the predicted values are.

The effectiveness of applying predictive analytics tools depends both on the technologies used and on the quality of the tools. The advantage here will be on the side of solutions that provide advanced methods of data mining. Knowledge-oriented intelligent modeling software tools based on GMDH are exactly such solutions.

REFERENCES

[1] https://www.accenture.com/t20160314T114937__w__/us-en/_acnmedia/Accenture/Omobono/TechnologyVision/pdf/Technology-Trends-Technology-Vision-2016.PDF.
[2] https://en.wikipedia.org/wiki/Business_intelligence.
[3] M. Goebel and L. Gruenwald, "A survey of data mining and knowledge discovery software tools", Volume 1, Issue 1 (June 1999), Publisher ACM New York, NY, USA.
[4] https://en.wikipedia.org/wiki/Business_analytics.
[5] https://www.gartner.com/it-glossary/diagnostic-analytics.
[6] https://en.wikipedia.org/wiki/Predictive_analytics.
[7] http://www.globalcio.ru/workshops/968.
[8] https://www.transparencymarketresearch.com/pressrelease/predictive-analytics-industry.htm.
[9] https://www.forrester.com/report/The+Forrester+Wave+Big+Data+Predictive+Analytics+Solutions+Q1+2013/-/E-RES85601.
[10] https://www.sas.com/ru_ua/software/enterprise-miner.html.
[11] https://www.ibm.com/analytics/data-science/predictive-analytics/spss-statistical-software.
[12] https://www.tibco.com/products/tibco-statistica.
[13] https://www.elderresearch.com.
[14] Stepashko V. "Developments and Prospects of GMDH-Based Inductive Modeling" In: Advances in Intelligent Systems and Computing II: Selected Papers from the International Conference on Computer Science and Information Technologies, CSIT 2017, September 5-8, Lviv, Ukraine. N. Shakhovska, V. Stepashko Editors. AISC book series, Volume 689, Cham: Springer, 2017, pp. 474-491.
[15] https://www.knowledgeminer.eu.
[16] https://gmdhsoftware.com.
[17] http://www.mgua.irtc.org.ua.
[18] Yefimenko S. "Building Vector Autoregressive Models Using COMBI GMDH with Recurrent-and-Parallel Computations" In: Advances in Intelligent Systems and Computing II: Selected Papers from the International Conference on Computer Science and Information Technologies, CSIT 2017, September 5-8, Lviv, Ukraine. N. Shakhovska, V. Stepashko Editors. AISC book series, Volume 689, Cham: Springer, 2017, pp. 601-613.
[19] Stepashko V.S. and Yefimenko S.M. "Technologies of Numerical Investigation and Applying of Data-Based Modeling Methods" Proceedings of the 2nd International Conference on Inductive Modelling ICIM 2008, Kyiv, 2008, pp. 236-240.
[20] Serhiy Yefimenko, Volodymyr Stepashko "Intelligent Recurrent-and-Parallel Computing for Solving Inductive Modeling Problems" Proceedings of 16th International Conference on Computational Problems of Electrical Engineering (CPEE-2015), September 2-5, 2015, Lviv, Ukraine, 2015, pp. 236-238.
[21] Stepashko V.S., Zvorygina T.F., Yefimenko S.M. "Problem of decision making intellectualization in tasks of models identification" (Problema intelektualizatsii pryiniattia rishen' u zadachakh identyfikatsii modelei), Proceedings of ISDMIT-2005 Conference, Kherson, Ukraine, 2005, Vol. 1, pp. 127-131.
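The paper refers to combinatorial structural optimization of models and to the COMBI GMDH algorithm [18] without showing the selection scheme itself. The following is a minimal sketch of that idea under stated assumptions, not the author's implementation: every subset of candidate inputs is fitted by ordinary least squares on a training sample, and the subset with the smallest squared error on a separate checking sample (an external criterion) is selected. The function names `solve_linear`, `fit_least_squares`, `design` and `combi_gmdh`, the tolerance `tol` and the toy data are hypothetical. The sketch uses plain exhaustive search and a single-output linear model; it does not reproduce the recurrent-and-parallel computations or the vector autoregressive model class discussed in the paper.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def design(rows, subset):
    """Design matrix: an intercept column plus the selected input columns."""
    return [[1.0] + [r[j] for j in subset] for r in rows]


def combi_gmdh(x_train, y_train, x_check, y_check, tol=1e-9):
    """Exhaustive search over input subsets, selected by an external criterion.

    Returns (checking_error, subset, weights), or None if nothing could be
    fitted. Subsets are visited from simple to complex, and a more complex
    model replaces the current best only if it lowers the checking error by
    more than `tol` (an assumption of this sketch).
    """
    n_inputs = len(x_train[0])
    best = None
    for size in range(1, n_inputs + 1):
        for subset in combinations(range(n_inputs), size):
            try:
                w = fit_least_squares(design(x_train, subset), y_train)
            except ValueError:
                continue  # skip degenerate (collinear) structures
            err = sum(
                (sum(wi * xi for wi, xi in zip(w, row)) - t) ** 2
                for row, t in zip(design(x_check, subset), y_check)
            )
            if best is None or err < best[0] - tol:
                best = (err, subset, w)
    return best


if __name__ == "__main__":
    # Toy data (hypothetical): the output depends on input 0 only.
    x_a = [[1.0, 2.0], [2.0, -1.0], [3.0, 0.0], [4.0, 3.0], [5.0, 1.0]]
    y_a = [5.0, 8.0, 11.0, 14.0, 17.0]
    x_b = [[6.0, 0.0], [7.0, 2.0], [8.0, -3.0]]
    y_b = [20.0, 23.0, 26.0]
    print(combi_gmdh(x_a, y_a, x_b, y_b))
```

Evaluating each candidate structure on data that was not used for parameter estimation is what distinguishes this selection from an ordinary in-sample fit, where the largest model would always win. The cost grows as 2^n with the number of candidate inputs, which is one reason why faster computation schemes are attractive for this family of algorithms.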