Implementation of probabilistic data structures in the processes of neuroevolutionary synthesis

Serhii Leoshchenko, Andrii Oliinyk, Sergey Subbotin, Viktor Lytvyn and Oleksandr Korniienko

National University "Zaporizhzhia Polytechnic", Zhukovskogo street 64, Zaporizhzhia, 69063, Ukraine

Abstract
Neuroevolutionary synthesis is an alternative to training a pre-designed neural network. The use of neuroevolution methods is motivated by fewer free parameters, less dependence on the expert, and usually better adaptive characteristics. However, neuroevolution methods are more resource-intensive, because synthesis operates on populations consisting of a large number of networks. Various mechanisms, such as parallelization, can be used to reduce synthesis time, but they usually increase memory usage even further. Mechanisms such as selective pressure only partially solve this problem. Probabilistic data structures have proven to be a fairly compact storage mechanism. Therefore, the paper provides an example and analysis of ways to use probabilistic data structures during the neuroevolutionary synthesis of artificial neural networks.

Keywords
Neuroevolution, synthesis, artificial neural networks, probabilistic data structures, encoding, sequencing, genetic operators

CMIS-2021: The Fourth International Workshop on Computer Modeling and Intelligent Systems, April 27, 2021, Zaporizhzhia, Ukraine
EMAIL: sergleo.zntu@gmail.com (S. Leoshchenko); olejnikaa@gmail.com (A. Oliinyk); subbotin@zntu.edu.ua (S. Subbotin); lytvynviktor.a@gmail.com (V. Lytvyn); al.korn95@gmail.com (O. Korniienko)
ORCID: 0000-0001-5099-5518 (S. Leoshchenko); 0000-0002-6740-6078 (A. Oliinyk); 0000-0001-5814-8268 (S. Subbotin); 0000-0003-4061-4755 (V. Lytvyn); 0000-0003-4812-5382 (O. Korniienko)

1. Introduction

Artificial neural networks (ANN) are one of the areas of scientific research in the field of creating artificial intelligence [1-3], based on the desire to imitate the human nervous system [3-5], including its ability to correct mistakes and to self-learn [5-7]. The scope of application of ANNs is constantly expanding; today they are used in such areas as:
• machine learning, a branch of artificial intelligence [1-7], based on training an ANN on millions [8-11] of examples of the same type of task. Nowadays, machine learning is actively used by the Google [12-15], Bing [16], [17], and Baidu [18], [19] search engines: based on the millions of search queries entered every day, their algorithms learn to show the most relevant results so that users can find exactly what they are looking for [14], [15];
• in robotics [20], [21], neural networks are used to develop numerous algorithms for the "iron brains" of robots. Here, training usually relies on methods that implement a reinforcement learning strategy, such as swarm intelligence or actor-critic methods (A2C and A3C) [22-25];
• computer system architects use neural networks to solve problems of parallel computing [4], [7];
• with the help of neural networks, scientists can solve various complex mathematical problems [6].

As already noted, before an ANN can be used, it must first be trained. This can be done using labeled examples (supervised learning), using information about the environment and the state of the "agent" in that environment (reinforcement learning), or without any
clear input information at all (unsupervised learning) [26-33]. Neuroevolution methods can be used with all of these approaches, because, in addition to conventional training (parametric synthesis), they usually also adjust the structure (topology) of the ANN (structural synthesis) [37], [38]. Such methods form a separate group of methods that evolve both interneuronal connection weights and network topologies (TWEANNs) [37], [38], in which the networks themselves are the individuals of the population.

However, one of the main disadvantages of neuroevolution methods is their resource intensity. The large synthesis time is usually explained by the search for an optimal ANN structure, which is achieved by applying evolutionary mechanisms to populations of individuals. On the other hand, taking into account that in conventional network training with gradient methods the topology has to be designed first and only then trained, it should be noted that neuroevolutionary synthesis automates this subprocess [39-42]. Moreover, parallelization can significantly speed up the synthesis process [39-42], and since almost all neuroevolution methods can be parallelized, this largely removes the speed problem. However, parallelization imposes additional memory requirements on the computing system. The use of selective pressure mechanisms [43] can increase the adaptability of final solutions and reduce memory usage, but in sequential systems it can lead to a low level of genetic diversity. Therefore, the question arises of using special structures to preserve information about the population. Moreover, such structures can be used as early as the stage of encoding genetic information about an individual, at the very beginning of synthesis.

Probabilistic data structures (PDS) are a group of data structures that are extremely useful for big data and streaming applications [44-46]. Generally speaking, these data structures use hash functions to randomize and compactly represent a set of elements. Hash collisions are tolerated (and usually mitigated by a number of dedicated mechanisms), and the resulting error is kept below a certain threshold. Compared to error-free approaches, these algorithms use much less memory and have constant query time. Moreover, support for union and intersection operations keeps the complexity of parallelization low.

Therefore, let us look at the stages of synthesis where the use of PDS is justified and appropriate. The main stages are shown in Figure 1.

Figure 1: Synthesis stages where PDS can be implemented

Analyzing Figure 1, note that the implementation of PDS during synthesis is advisable at the following stages:
• encoding of genetic information about the ANNs, as individuals of the population;
• crossover of individuals (since PDS support union and intersection).

A special case is the situation when information about the ANN is transmitted, for example, between the cores of parallel computing systems or between computing nodes, as shown in Figure 2.
Figure 2: ANNs forwarding in parallel systems

2. Literature Review

Statistical analysis and other analysis of big data is a common and quite relevant task, especially in such areas as smart systems, the Internet of Things (IoT), the Cognitive Internet of Things (CIoT), and web analytics [44-47]. One option for organizing data storage when analyzing big data is powerful distributed databases and data warehouses: Hydra [48], [49], Hadoop [50] and NetApp [51], [52]. For processing, the same methods can be used as in MapReduce, for example searching a HashSet [49], [52]. However, such approaches perform poorly in real-time data processing because of high-latency analytical pipelines and limited applicability. Therefore, if simple additive metrics (such as total transactions, total page views, or average conversion price) are what matters most during analytics, the raw data can obviously be summed up effectively, for example on a daily basis or using simple counters in the stream. Calculating more complex metrics (such as average humidity, the number of unique users, or the most frequently encountered elements) is harder and requires more resources if implemented naively. PDS allow these and many other metrics to be estimated while trading the accuracy of the estimates for memory consumption [44-46]. These data structures can be used as temporary data accumulators in query processing procedures or, perhaps more importantly, as a compact, sometimes surprisingly compact, replacement for raw data in streaming computing.

A number of PDS are analyzed briefly below; a detailed mathematical analysis of these structures can be found in the original articles. Preliminary comments are as follows [47]:
• for some structures, such as the LogLog counter or the Bloom filter, there are simple and practical formulas that allow the structure parameters to be determined from the expected amount of data and the required error probability. Other structures, such as Count-Min Sketch or Stream-Summary, have a complex dependence on the statistical properties of the data, and experiments are the only reasonable way to understand their applicability to real-world use cases;
• the applicability of PDS is not strictly limited to the queries mentioned above or to a single data set. Experiments and other works show that structures filled with different data sets can often be combined to handle complex queries, and other types of queries can be supported using customized versions of the described algorithms.

2.1. Bloom filter

The Bloom filter is probably the most well-known and widely used PDS [53-56]. It is similar to Linear Counting, but it is designed to maintain the identity of each element, not just statistics. A Bloom filter is a bit array of m bits initialized to 0. To add an element, it is processed by k hash functions to get k array positions, and the bits in these positions are set to 1. To query an element, it is passed to the same k hash functions to get k array positions. If any of the bits in these positions is 0, then the element is definitely not in the set. If all the bits are equal to 1, then the element may be in the set.
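As an illustration of this mechanism, a minimal Python sketch of a Bloom filter follows. The use of salted BLAKE2b digests as the k hash functions and the concrete sizes are assumptions made for the example; any fast, well-distributed hash family can be used.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array queried through k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m                     # size of the bit array
        self.k = k                     # number of hash functions
        self.bits = bytearray(m)       # one byte per bit, for simplicity

    def _positions(self, item: str):
        # Derive k array positions from k salted BLAKE2b digests of the item.
        for i in range(self.k):
            digest = hashlib.blake2b(item.encode(), salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # False means "definitely not present", True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=1024, k=3)
for element in ("x", "y", "z"):
    bf.add(element)
print("x" in bf)  # True: possibly present
print("w" in bf)  # almost certainly False: definitely not present
```

The asymmetry discussed below is visible here: a negative answer is always reliable, while a positive answer is only probably correct.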
In other words, if the filter is relatively large compared to the number of distinct elements, each element has a relatively unique signature, and it is possible to check whether a specific value has already been registered in the bit set or not. If all the bits of the corresponding signature are ones, then the answer is yes (with a certain probability, of course). A Bloom filter with a false positive rate of 1% requires only 9.6 bits per element, regardless of the size of the elements [54-56].

Similar to Linear Counting, the Bloom filter supports bit typing. However, as the description makes clear, each value is mapped not to one but to a fixed number of bits using several independent hash functions. Let us consider formulas that allow the parameters of the Bloom filter to be calculated as a function of error probability and capacity. Given the false positive probability p and the estimated number of insertions n, the length of the bit array can be calculated as [54-56]:

$m = -\dfrac{n \ln p}{(\ln 2)^2}$, (1)

where m is the filter (sketch) size in bits, p is the error probability (false positive rate), and n is the maximum cardinality (capacity). The hash functions used for a Bloom filter should generally be faster than cryptographic hash algorithms, while still having good distribution and collision resistance. The number of hash functions is

$k = \dfrac{m}{n} \ln 2$, (2)

where k is the number of hash functions.

Figure 3 shows an example of placing elements in a Bloom filter. Consider an example: we insert x, y and z into the filter with k = 3 hash functions, as shown in the figure. Each of these three elements sets three bits to 1 in the bit array. When we then look up w, the filter reports that it is not in the set, because one of its bits is not set to 1 [53-56].

Figure 3: Placing elements in Bloom filter

2.2. MinHash

To begin with, MinHash is not just a PDS. In computer science and data mining, MinHash is a method for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder (1997) [57] and was originally used in the AltaVista search engine to detect duplicate web pages and exclude them from search results [58]. It has also been used in large-scale clustering tasks, such as clustering documents based on the similarity of their word sets.

Let us introduce the Jaccard similarity coefficient [57], [58]:

$J(A, B) = \dfrac{|A \cap B|}{|A \cup B|}$. (3)

In other words, the number of elements in the intersection is divided by the number of elements in the union. This estimate is called the Jaccard coefficient; it is zero when the sets have no common elements, one when the sets are equal, and somewhere in between otherwise. In general, the calculation of this coefficient boils down to the following: first the sets are split into separate parts, which become the elements of the sets, and then the sizes of the intersection and the union have to be computed. Usually, to perform the last two operations efficiently, the sets are represented as hash tables without key-associated values; this structure works very quickly. Building a table is O(n) (two tables are needed), computing the intersection is O(n), and computing the union is also O(n), where n is the number of elements in the set [57], [58].
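The exact computation just described can be written directly with hash-based sets. The following small sketch (with an assumed word-level splitting of two example documents) is the baseline that MinHash approximates.

```python
def jaccard(a: set, b: set) -> float:
    """Exact Jaccard coefficient, formula (3): |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0  # convention for two empty sets
    return len(a & b) / len(a | b)

# Example: word sets of two short documents.
doc_a = set("the quick brown fox jumps over the lazy dog".split())
doc_b = set("the quick brown cat sleeps near the lazy dog".split())
print(round(jaccard(doc_a, doc_b), 3))  # shared words / words in the union
```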
Consider two sets A and B and a hash function h that can compute hashes for the elements of these sets. Next, define the function h_min(S), which computes h for all members of a set S and returns the lowest value. Now, let us compute h_min(A) and h_min(B) for different pairs of sets and ask: what is the probability that h_min(A) = h_min(B)? This probability is proportional to the overlap of the sets: in the absence of common elements it tends to zero, it tends to one when the sets are equal, and in intermediate cases it is somewhere in between. In fact, this probability is exactly the Jaccard coefficient J(A, B).

However, if we simply compute h_min(A) and h_min(B) for our two sets and then compare the values, this tells us very little, because there are only two outcomes: equal or not equal. We need to gather enough statistics about the sets to estimate J(A, B) close to the truth. This is done simply: instead of one function h, we introduce several independent hash functions, namely k of them [57]:

$k = \dfrac{1}{\varepsilon^2}$, (4)

where ε is the desired maximum error. So, to estimate J(A, B) with an error of no more than 0.1, 100 hash functions are needed. On the one hand, this is a lot, so let us discuss possible optimizations. First, we can precompute the so-called H_min(S) signature, i.e. the minima of all hash functions for the set S. The complexity of computing it is higher than that of building a hash table but still linear, and it only needs to be done once per document, for example when adding it to the database; afterwards it is enough to operate only on signatures [58]. Second, the signature has a fixed size for a given maximum error value, so comparing any two sets of any size requires a fixed number of operations. In addition, signatures are, in theory, much more convenient to index; in practice, however, their indexing does not map well onto relational databases and is not particularly well suited to full-text engines either.

2.3. Count-Min Sketch

Count-Min Sketch is a PDS that maintains a table of event frequencies in a data stream. It uses hash functions to map events to frequencies but, unlike a hash table, uses only sublinear space, at the cost of overcounting some events due to collisions [47], [59]. The Count-Min Sketch structure can be represented as a sketch with depth d and width w, as in Figure 4.

Figure 4: Count-min sketch with depth d and width w

The distribution of values in the Count-Min Sketch array follows the principle shown in Figure 5; this distribution can be considered a sketch.

Figure 5: Distribution of values in the structure

In general, the representation obeys the following rules (5) and (6):

$\varepsilon_{estimation} = 2 \cdot \max(cardinality_{hash\,function}) / width_{sketch}$, (5)

$\delta = 1 - 0.5^{\,depth_{sketch}}$, (6)

where ε_estimation is the estimation error, δ is the probability associated with an estimate in the Count-Min Sketch, cardinality_hash function is the cardinality (range) of the hash function, depth_sketch is the sketch depth (height), and width_sketch is the sketch width. It is worth noting that the width of the sketch limits the error value (5), while the height (depth) controls the probability that an estimate will break this limit (6) [47], [59].
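A compact sketch of the update and point-query operations described in this subsection is given below. The salted BLAKE2b hash family and the concrete depth and width are assumptions made for the example, not part of the original structure.

```python
import hashlib

class CountMinSketch:
    """Count-Min Sketch: a depth x width table of counters (cf. Figure 4)."""

    def __init__(self, depth: int, width: int):
        self.depth = depth                                  # number of rows / hash functions
        self.width = width                                  # counters per row; bounds the error
        self.table = [[0] * width for _ in range(depth)]    # all counters start at zero

    def _columns(self, item: str):
        # One salted BLAKE2b hash per row; different salts make the rows independent.
        for row in range(self.depth):
            digest = hashlib.blake2b(item.encode(), salt=row.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in enumerate(self._columns(item)):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Minimum over the rows: collisions can only inflate a counter,
        # so the estimate never underestimates the true frequency.
        return min(self.table[row][col] for row, col in enumerate(self._columns(item)))

cms = CountMinSketch(depth=4, width=256)
for event in ["a", "b", "a", "c", "a", "b"]:
    cms.add(event)
print(cms.estimate("a"), cms.estimate("d"))  # 3 and 0 (collisions could only raise these)
```

Increasing the width tightens the error bound in (5), while increasing the depth raises the confidence described by (6).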
2.4. Comparison of PDS for further work

Let us compare the analyzed PDS. The comparison is shown in Table 1.

Table 1
PDS comparison

Comparison criteria | Bloom Filter | MinHash | Count-Min Sketch
Required number of hash functions | Low | High | Low
Complexity of hash functions | Low | Low | Low
Sketch size | Medium | Big | Small
Frequency of false positives | Very low | Low | Low
Ability to add / delete items | Requires modifications | Not supported | Possible

As can be seen from the comparison, all the analyzed PDS have certain disadvantages, which may become limitations for their practical use during neurosynthesis [47], [60]. For the Bloom Filter, there are restrictions on adding and removing elements. This problem is solved in recent modifications, but such mechanisms significantly complicate the structure. MinHash requires a large number of hash functions and, although they may be quite simple, computing them still consumes time; moreover, with a large number of hash functions the resulting sketch also becomes more complicated. Count-Min Sketch looks like the best option in this comparison, precisely because of the low complexity of the sketch and the low requirements on the number and complexity of hash functions. However, it has a somewhat higher probability of false positives (although these also occur with the other PDS) [47], [60]. Therefore, the further usage scheme is planned around these shortcomings.

3. Materials and methods

Based on the features described above, the subprocesses of neurosynthesis and the analysis of the most popular PDS, the scheme for implementing PDS in the process of neuroevolutionary synthesis of ANNs is as follows:
• Bloom Filter will be used when sending information about the ANN between the cores of a parallel system;
• MinHash will be used in the sequential execution of neuroevolutionary synthesis to encode genetic information about individuals and to assess acceleration at the breeding (selection) stage;
• Count-Min Sketch will also be used to encode genetic information about individuals during neuroevolutionary synthesis.

As noted previously, all genetic information about the neural networks is encoded using direct coding sequencing [61-64]. In general, this approach is based on encoding connections: the genotype of an individual stores information about the weights (WW) of the interneuronal connections of the neuromodel, and each gene contains the indices of the source (N_i) and destination (N_o) neurons of a connection as well as its weight. When the method works with recurrent neural networks, an additional cell with the feedback weight is added; its index is determined by the index of the source neuron.

When using the Bloom filter, four hash functions will be used to sequentially encode values from the array of interneuronal connections [65]:

$H_1(N_i) \oplus H_2(N_o) \oplus H_3(WW) = f(WW)$, (7)

so, to find the value of f(WW), WW is hashed four times, three table lookups are performed, and the four values are combined with exclusive-OR. In practice, the four hashes can be reduced to a single 64-bit or 128-bit hash, which is divided into four 16-bit or 32-bit values, respectively. This operation is performed on the best individuals before sending them from the parallel cores to the main thread of the parallel system, in order to investigate the reduction of communication overhead.
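The reduction of the four hashes to a single wide digest can be sketched as follows. This is only an illustration under assumptions: the byte layout of a connection gene and the use of BLAKE2b are choices made for the example, and the lookup tables whose entries are XOR-combined in the spirit of (7) are hypothetical; their construction follows [65] and is not shown.

```python
import hashlib
import struct

def connection_hashes(n_i: int, n_o: int, ww: float):
    """Derive four 32-bit hash values from one 128-bit digest of a connection gene
    (source index N_i, destination index N_o, weight WW); the byte layout is assumed."""
    gene = struct.pack("<iid", n_i, n_o, ww)                  # 4 + 4 + 8 bytes, little-endian
    digest = hashlib.blake2b(gene, digest_size=16).digest()   # one 128-bit hash
    return struct.unpack("<IIII", digest)                     # split into four 32-bit values

def lookup_f(n_i: int, n_o: int, ww: float, tables, table_size: int) -> int:
    """Recover f(WW) by XOR-combining three table lookups and the fourth hash value,
    in the spirit of (7); `tables` are hypothetical lookup tables built as in [65]."""
    h1, h2, h3, h4 = connection_hashes(n_i, n_o, ww)
    return (tables[0][h1 % table_size]
            ^ tables[1][h2 % table_size]
            ^ tables[2][h3 % table_size]
            ^ h4)
```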
MinHash will be used for coding and subsequent comparison of individuals in sequential neuroevolutionary synthesis. However, since adding or removing data from such a structure is not possible (or not rational due to the computational cost), the population size will be increased and the number of epochs decreased [57]. When using Count-Min Sketch, the array of data about the ANN is represented as a sketch and decoded afterwards, as shown in Figure 6.

Figure 6: Encoding and decoding information about the ANN using Count-min sketch

Thus, the proposed encoding method begins with an initial representation of a matrix constructed from so-called polymer cells containing information about the input-output connections between neurons and the weights of these connections. Next, hash functions are defined (by default, we recommend taking 4 hash functions) and a matrix is created for their output, as shown in Figure 6. After that, hash outputs are calculated for each element of the ANN data stream and the corresponding counter in the matrix is incremented. By increasing the corresponding counters in the matrix, we obtain an updated matrix. In some cases, due to hash collisions, the recorded frequency of an element may be slightly higher than expected. The encoding accuracy depends on how unique the hash values returned for an element are; the larger the range of the hash functions, the more accurate the frequencies will be. The Count-Min Sketch thus allows the frequencies of big data flows to be computed in sublinear space with O(1) update and query time, with a constant time component that does not depend on the parameters of the neural network model.

4. Experimental research

For the experimental study, a moderately sized data sample was chosen that would not initially require overly complex neural networks, since such complexity can become a problem when using MinHash. This is how the Tic-Tac-Toe Endgame Data Set was selected [66-69]. The main characteristics of the sample are shown in Table 2. As a parallel computing system, the equipment of the Software Tools Department of the National University "Zaporizhzhia Polytechnic" was used: a Xeon E5-2660 v4 processor (14 cores), 4x16 GB DDR4 RAM, and the Java threads programming model.

Table 2
Characteristics of the Tic-Tac-Toe Endgame Data Set sample

Data Set Characteristics: Multivariate
Number of Instances: 958
Attribute Characteristics: Categorical
Number of Attributes: 9

To test the MinHash case, the metaparameters specified in Table 3 were set.

Table 3
Metaparameters for the MinHash use case

Metaparameter | Value
Population size | 100
Number of training epochs | 1
Elite size | 5%
Activation function (fitness function) | hyperbolic tangent
Probability of mutation | 25%
Crossover type | two-point
Types of mutation | deleting an interneuronal connection; removing a neuron; adding an interneuronal connection; adding a neuron; changing the activation function

For testing the Count-Min Sketch case, the main parameters remain unchanged, but the number of epochs is increased to 5. When testing the Bloom Filter use case, not only the metaparameters of the method but also the hardware usage parameters were fixed, as specified in Table 4.
Table 4
Metaparameters for the Bloom Filter use case

Metaparameter | Value
Population size | 100
Number of training epochs | 5
Elite size | 5%
Activation function (fitness function) | hyperbolic tangent
Probability of mutation | 25%
Type of crossover on parallel threads | two-point
Type of crossover on the main thread | uniform (two-point)
Types of mutation | deleting an interneuronal connection; removing a neuron; adding an interneuronal connection; adding a neuron; changing the activation function
Number of cores (threads) | 14

The test results for the Tic-Tac-Toe Endgame Data Set sample using MinHash are shown in Table 5, and the results using Count-Min Sketch are shown in Table 6. In both tables the baseline is the modified genetic algorithm (MGA) [64].

Table 5
Test results using MinHash

Method | Synthesis time, s | Time at the selection stage, s | Memory usage (at the selection stage) | Error in the training sample | Error in the test sample
MGA | 2487 | 561 | 91% | 0.11 | 0.24
MGA + MinHash | 2189 | 287 | 61% | 0.11 | 0.25

Table 6
Test results using Count-Min Sketch

Method | Synthesis time, s | Time at the selection stage, s | Memory usage (at the selection stage) | Error in the training sample | Error in the test sample
MGA | 5364 | 1742 | 94% | 0.024 | 0.13
MGA + Count-Min Sketch | 4865 | 1025 | 56% | 0.029 | 0.15

The test results for the Tic-Tac-Toe Endgame Data Set sample using the Bloom filter are shown in Table 7.

Table 7
Test results using Bloom filter

Method | Synthesis time, s | Average time of communication overhead | Communication overhead | Error in the training sample | Error in the test sample
MGA | 2494 | 0.4 | 199.5 | 0.022 | 0.1
MGA + Bloom filter | 1867 | 0.28 | 104.6 | 0.03 | 0.174

5. Analysis of experimental results

Analyzing the presented results, we can draw the following conclusions. When using MinHash, there were no significant improvements: memory usage at the selection stage decreased from 91% to 61%, but up to that point memory usage did not change. The acceleration was also limited to the selection stage, namely from 561 s to 287 s. However, the time for encoding and decoding networks increased significantly, precisely because of the complexity of this process, so the optimization was rather localized and cannot be recommended for future use. In addition, the accuracy was low because of the reduced number of epochs and the inability to use crossover.

When using Count-Min Sketch, the acceleration picture was similar: during the execution of the method the acceleration was noticeable, but the process of encoding and decoding individuals took too long, so the total synthesis time is almost the same. Memory usage, however, was genuinely reduced, from 94% to 56%, and, unlike MinHash, this reduction held for almost the entire process. In addition, such encoding made it possible to perform mutation and crossover without significant problems. On the other hand, there is a slight decrease in accuracy (due to false positives in the decoded network).

Using the Bloom filter made it possible to reduce the share of transfers and their time, because data was sent from the cores in a compact format. In addition, decoding networks on the main core did not impose additional complex calculations on the parallel cores. However, even this strategy did not significantly improve the overall speed. Moreover, although uniform crossover was chosen for the main core, during the development of the information modules the problem of organizing multi-parent crossover arose.
Therefore, unfortunately, it was necessary to fall back from uniform crossover to two-point crossover. In general, it can be seen that the most promising option is Count-Min Sketch, because it made it possible to perform the full synthesis process without additional complications.

6. Conclusions

The paper examines the variants and possibilities of using PDS in the neuroevolutionary synthesis of ANNs, which can significantly reduce memory usage during synthesis and speed up the process.

The scientific novelty lies in the fact that different PDS are proposed for different stages and execution schemes (sequential and parallel) of neuroevolutionary synthesis. This made it possible to study options for reducing resource intensity both locally and overall. The studies showed that, despite the popularity of PDS for storing big data, their use in neuroevolutionary synthesis meets a number of obstacles. Moreover, recommendations were developed for the use of individual PDS at individual stages and for individual approaches.

The practical significance lies in the fact that practical problems of encoding ANNs are solved, and the resulting models can later be used for diagnostics, forecasting, evaluation and modeling. The results of the experiments showed that the proposed encoding methods make it possible to encode information about the ANN more compactly for its subsequent transmission to workstations for use as a model for diagnosis, prediction, evaluation and modeling.

Prospects for further research and development include a detailed analysis of the use of the Count-Min Sketch data structure, since it gave the most acceptable results when testing the various PDS and approaches to their use.

7. Acknowledgements

The work was carried out with the support of the state budget research projects of the National University "Zaporizhzhia Polytechnic" "Intelligent methods and software for diagnostics and non-destructive quality control of military and civilian applications" (state registration number 0119U100360) and "Development of methods and tools for analysis and prediction of dynamic behavior of nonlinear objects" (state registration number 0121U107499).

8. References

[1] M. Elgendy, Deep Learning for Vision Systems, New York, Manning Publications, 2020.
[2] D. Hudgeon, R. Nichol, Machine Learning for Business: Using Amazon SageMaker and Jupyter, Manning Publications, 2020.
[3] D. Barh, Artificial Intelligence in Precision Health: From Concept to Applications, Academic Press, Cambridge, 2020.
[4] K. Warr, Strengthening Deep Neural Networks: Making AI Less Susceptible to Adversarial Trickery, O'Reilly Media, Massachusetts, 2019.
[5] Y. Goldberg, G. Hirst, Neural Network Methods in Natural Language Processing (Synthesis Lectures on Human Language Technologies), Morgan and Claypool Publishers, California, 2017.
[6] T. Neskorodieva, E. Fedorov, Method for Automatic Analysis of Compliance of Expenses Data and the Enterprise Income by Neural Network Model of Forecast, in: Proceedings of the Mathematical Modeling and Simulation of Systems (MODS'2020), Springer (2020), 156-165. doi: 10.1007/978-3-030-58124-4_15
[7] J.A.J. Alsayaydeh, W.A.Y. Khang, A.K.M.Z. Hossain, V. Shkarupylo, J. Pusppanathan, The experimental studies of the automatic control methods of magnetic separators performance by magnetic product, ARPN Journal of Engineering and Applied Sciences, vol. 15(7) (2020) 922-927.
[8] J.A.J. Alsayaydeh, W.A. Indra, W.A.Y. Khang, V. Shkarupylo, D.A.P.P.
Jkatisan, Development of vehicle ignition using fingerprint, ARPN Journal of Engineering and Applied Sciences, vol. 14(23) (2019) 4045-4053. [9] J.A.J. Alsayaydeh, W.A.Y. Khang, W.A. Indra, J.B. Pusppanathan, V. Shkarupylo, A.K.M. Zakir Hossain, S. Saravanan, Development of vehicle door security using smart tag and fingerprint system, ARPN Journal of Engineering and Applied Sciences, vol. 9(1) (2019) 3108-3114. [10] J.A.J. Alsayaydeh, W.A.Y. Khang, W.A. Indra, V. Shkarupylo, J. Jayasundar, Development of smart dustbin by using apps, ARPN Journal of Engineering and Applied Sciences, vol. 14(21) (2019) 3703-3711. [11] J.A. Alsayaydeh, M. Nj, S.N. Syed, A.W. Yoon, W.A. Indra, V. Shkarupylo, C. Pellipus, Homes appliances control using bluetooth, ARPN Journal of Engineering and Applied Sciences, vol. 14(19) (2019) 3344-3357. [12] Google, 2021. URL: https://www.google.com/ [13] S. Dresden, Search engine with neural network weighting based on parametric user data, 2004. Patent No. US20050004905A1, Filed March 3rd., 2003, Issued June 6th., 2005. [14] Search Engines and Neural Networks, 2018. URL: https://towardsdatascience.com/https- towardsdatascience-com-search-engines-and-neural-networks-97e0df4f088d [15] W. Serrano, Neural Networks in Big Data and Web Search. Neural networks in web search, 4(1):7 (2018) 1-41. doi: 10.3390/data4010007 [16] Bing, 2021. URL: https://www.bing.com/ [17] Microsoft to accelerate Bing search with neural network, 2015. URL: https://www.extremetech.com/extreme/199814-microsoft-to-accelerate-bing-search-with-neural- network [18] Baidu, 2021. URL: http://www.baidu.com/ [19] YATI & ERNIE: Machine Learning in Yandex and Baidu, 2021. URL: https://www.searchenginejournal.com/yati-ernie-machine-learning-yandex-baidu/392982/ [20] V.Mnih, A.P. Badia, M. Mirza, A. Graves, et al., Asynchronous methods for deep reinforcement learning, in: Proceedings of the 33rd International Conference on International Conference on Machine Learning, ICML'16, 48 (2016) 1928–1937. [21] L. Tai, J. Zhang, M. Liu, J. Boedecker, W. Burgard, A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation, 2018. URL: https://arxiv.org/abs/1612.07139 [22] R. Gilman, K. Wang Intuitive RL: Intro to Advantage-Actor-Critic (A2C), 2018. URL: https://hackernoon.com/intuitive-rl-intro-to-advantage-actor-critic-a2c-4ff545978752 [23] J. K.H. Franke, G. Köhler, N. Awad, F. Hutter Neural Architecture Evolution in Deep Reinforcement Learning for Continuous Control NeurIPS 2019 MetaLearn Workshop, 2019. URL: https://arxiv.org/abs/1910.12824 [24] Reinforcement Learning: Deep Q Networks, 2020. URL: https://blogs.oracle.com/datascience/reinforcement-learning-deep-q-networks [25] A. Juliani Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C), 2016. URL: https://medium.com/emergent-future/simple-reinforcement-learning- with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2 [26] P. Dangeti, Statistics for Machine Learning: Techniques for exploring supervised, unsupervised, and reinforcement learning models with Python and R, Packt Publishing, Birmingham, 2017. [27] B. Bateman, A. R. Jha, B. Johnston, I. Mathur, The Supervised Learning Workshop: A New, Interactive Approach to Understanding Supervised Learning Algorithms, 2nd Edition, Packt Publishing, Birmingham, 2020. [28] X. Zhu, A.B. 
Goldberg, Introduction to Semi-Supervised Learning (Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan and Claypool Publishers, California, 2009. [29] G. Hinton, Unsupervised Learning: Foundations of Neural Computation (Computational Neuroscience), Bradford Books, Massachusetts, 1999. [30] A. Jones, C. Kruger, B. Johnston The Unsupervised Learning Workshop: Get started with unsupervised learning algorithms and simplify your unorganized data to help make future predictions, Packt Publishing, Birmingham, 2020. [31] R.S. Sutton, A.G. Barto, Reinforcement Learning, second edition: An Introduction (Adaptive Computation and Machine Learning series) second edition, Bradford Books, Massachusetts, 2018. [32] P. Winder, Reinforcement Learning: Industrial Applications of Intelligent Agents, O'Reilly Media, Massachusetts, 2020. [33] L. Graesser, W.L. Keng, Foundations of Deep Reinforcement Learning: Theory and Practice in Python (Addison-Wesley Data & Analytics Series), Addison-Wesley Professional, Boston, 2019. [34] K. O. Stanley, J. Clune, J. Lehman, et al., Designing neural networks through neuroevolution. Nature Machine Intelligence 1 (2019) 24–35. doi: 10.1038/s42256-018-0006-z [35] Omelianenko, Hands-On Neuroevolution with Python: Build high-performing artificial neural network architectures using neuroevolution-based algorithms, Packt Publishing, Birmingham, 2019. [36] A. Bergel, Agile Artificial Intelligence in Pharo: Implementing Neural Networks, Genetic Algorithms, and Neuroevolution, Apress, New York, 2020. [37] A. Gaier, David Ha Exploring Weight Agnostic Neural Networks, 2019. URL: https://ai.googleblog.com/2019/08/exploring-weight-agnostic-neural.html [38] K O. Stanley, J. Clune Welcoming the Era of Deep Neuroevolution, 2017. URL: https://eng.uber.com/deep-neuroevolution/ [39] A.O. Oliinyk, T.A. Zaiko, S.A. Subbotin, Factor analysis of transaction data bases. Automatic Control and Computer Sciences 48(2) (2014) 87-96. doi: 10.3103/S0146411614020060 [40] A. Oliinyk, S. Skrupsky, S.A. Subbotin, Parallel computer system resource planning for synthesis of neuro-fuzzy networks. Advances in Intelligent Systems and Computing, 543 (2017) 88-96. doi: 10.1007/978-3-319-48923-0_12 [41] A. Oliinyk, S. Skrupsky, S. Subbotin, I. Korobiichuk, Parallel method of production rules extraction based on computational intelligence. Automatic Control and Computer Sciences, 51(4) (2017) 215-223. doi: 10.3103/S0146411617040058 [42] A. Oliinyk, S. Skrupsky, S. Subbotin, Experimental research and analysis of complexity of parallel method for production rules extraction. Automatic Control and Computer Sciences, 52 (2) (2018) 89-99. doi: 10.3103/S0146411618020062 [43] S. Leoshchenko, A. Oliinyk, S. Subbotin, T. Zaiko, N. Gorobii, Implementation of selective pressure mechanism to optimize memory consumption in the synthesis of neuromodels for medical diagnostics, in: Proceedings of the 2nd International Workshop on Informatics & Data- Driven Medicine, IDDM 2019, CEUR-WS, 2019, pp. 109-120. DBLP Key: conf/iddm/LeoshchenkoOSZG19 [44] A. Gakhov, Probabilistic Data Structures and Algorithms for Big Data Applications, Books on Demand, 2019. [45] H. Knebl, Algorithms and Data Structures: Foundations and Probabilistic Methods for Design and Analysis, Springer, Berlin, 2020. [46] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, The MIT Press, Cambridge, 2009. [47] Probabilistic data structures for web analytics and data mining, 2012. 
URL: https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data- mining/ [48] Hydra, 2014. URL: https://github.com/addthis/hydra [49] Big Data Hadoop Alternatives: What They Offer and Who Uses Them, 2018. URL: https://datafloq.com/read/Big-Data-Hadoop-Alternatives/1135 [50] Apache Hadoop, 2021. URL: https://hadoop.apache.org/ [51] NetApp, 2021. URL: https://www.netapp.com/ [52] Big Data Analytics solutions: The bigger the better, 2021. URL: https://www.netapp.com/artificial-intelligence/big-data-analytics/ [53] B.H. Bloom, Space/time trade-offs in hash coding with allowable errors. Communications of the ACM Т. 13(7) (1970) 422–426. https://doi.org/10.1145/362686.362692 [54] Bloom Filters: Is element x in set S?, 2017. URL: https://www.abhishek-tiwari.com/bloom- filters-is-element-x-in-set-s/ [55] How to count Big Data: Probabilistic data structures and algorithms, 2019. URL: https://www.kdnuggets.com/2019/08/count-big-data-probabilistic-data-structures- algorithms.html [56] Introduction to Probabilistic Data Structures, 2015. URL: https://dzone.com/articles/introduction- probabilistic-0 [57] A.Z. Broder, On the resemblance and containment of documents. Compression and Complexity of Sequences: Proceedings, IEEE (1997) 21–29. doi:10.1109/SEQUEN.1997.666900 [58] A.Z. Broder, M. Charikar, A.M. Frieze, M. Mitzenmacher, Min-wise independent permutations, in: Proceedings of the 30th ACM Symposium on Theory of Computing (STOC '98), New York, NY, USA: Association for Computing Machinery (1998) 327–336. doi:10.1145/276698.276781 [59] G. Cormode, Count-Min Sketch, Encyclopedia of Database Systems, Springer, 2009, 511-516. [60] I. Katsov, Introduction to Algorithmic Marketing: Artificial Intelligence for Marketing Operations, Books on Demand, 2017. [61] S. Leoshchenko, A. Oliinyk, S. Subbotin, T. Zaiko, S. Shylo, V. Lytvyn, Sequencing for encoding in neuroevolutionary synthesis of neural network models for medical diagnosis, in: Proceedings of the 3rd International Conference on Informatics & Data-Driven Medicine, IDDM 2020, CEUR-WS, 2020, pp. 62-71. [62] B.A. Pierce, Genetics: A Conceptual Approach, W.H. Freeman & Co. Ltd., New York, 2019. [63] R. Brooker, Genetics: Analysis and Principles, McGraw-Hill Education, New York, 2020. [64] S. Leoshchenko, A. Oliinyk, S. Subbotin, N. Gorobii, T. Zaiko, Synthesis of artificial neural networks using a modified genetic algorithm, in: Proceedings of the 1st International Workshop on Informatics & Data-Driven Medicine, IDDM 2018, CEUR-WS, 2018, pp. 1-13. DBLP Key: conf/iddm/PerovaBSKR18 [65] B. Reagen, U. Gupta, R. Adolf, M. Mitzenmacher, A. Rush, G.-Y. Wei, D. Brooks, Weightless: Lossy Weight Encoding For Deep Neural Network Compression, Computer Science, Mathematics, 2017. URL: http://proceedings.mlr.press/v80/reagan18a/reagan18a.pdf [66] Tic-Tac-Toe Endgame Data Set, 1991. URL: https://archive.ics.uci.edu/ml/datasets/Tic-Tac- Toe+Endgame [67] C. J. Matheus, ,L. A. Rendell Constructive induction on decision trees. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI: Morgan Kaufmann, 1989, pp. 645-650. [68] C. J. Matheus Adding domain knowledge to SBL through feature construction. Proceedings of the Eighth National Conference on Artificial Intelligence,Boston, MA: AAAI Press, 1990, pp. 803-808. [69] D. W. Aha Incremental constructive induction: An instance-based approach. 
In Proceedings of the Eighth International Workshop on Machine Learning, Evanston, ILL: Morgan Kaufmann, 1991, pp. 117-121.