Implementation of reinforcement learning strategies in the synthesis of neuromodels to solve medical diagnostics tasks

Serhii Leoshchenko, National University "Zaporizhzhia Polytechnic"

Zhukovskogo street 64, 69063 Zaporizhzhia, Ukraine

Andrii Oliinyk, National University "Zaporizhzhia Polytechnic"

Zhukovskogo street 64, 69063 Zaporizhzhia, Ukraine

Sergey Subbotin, National University "Zaporizhzhia Polytechnic"

Zhukovskogo street 64, 69063 Zaporizhzhia, Ukraine

Viktor Lytvyn, National University "Zaporizhzhia Polytechnic"

Zhukovskogo street 64, 69063 Zaporizhzhia, Ukraine

Oleksandr Korniienko, National University "Zaporizhzhia Polytechnic"

Zhukovskogo street 64, 69063 Zaporizhzhia, Ukraine

Keywords: medical diagnostics, neuromodel synthesis, reinforcement learning, penalty and reward, duel

The high accuracy of artificial neural network (ANN) diagnostic models reported in the literature indicates good prospects for using ANNs in various fields of medicine for diagnosing and forecasting diseases. Introducing diagnostic neuromodels into clinical practice can support medical decision-making, improve the accuracy of diagnosis, and speed up the examination of the patient. ANNs can also be used as models of the subject area under consideration: by changing the input data of the neural network model and observing the behavior of the output signals, it is possible to study the subject area and to identify and investigate the medical patterns that the ANN extracted during training. However, medical tasks keep growing more complicated: the nature of clinical data about the patient changes, the data is constantly updated, and both the volume of data and the number of hidden connections in it increase. An additional challenge is the increased requirements for the adaptability and sensitivity of the neuromodel to a particular patient or disease. Reinforcement learning demonstrates good training results on incomplete data and in highly specific areas. The paper investigates the possibility of using reinforcement learning strategies for the synthesis of high-precision neuromodels for subsequent use in medical diagnostics.

Introduction

ANNs are used more widely in intelligent medical systems every year. Possible use cases include [1], [2]:
• methods that look for deviations in MRI images, mammograms, and X-rays. Before the pandemic, developers often created such programs to help doctors diagnose cancer. Since the beginning of the pandemic, they have been adapted for the diagnosis of COVID-19;
• analysis of medical records and patient complaints. The doctor enters data about the patient into the database (test results, examination data), and the program suggests treatment tactics. This is how one of the most famous programs in this field, Watson from IBM, works;
• control of medical staff. It is extremely important for the head of a clinic to understand whether doctors prescribe procedures and treatment correctly. The patient's medical history contains everything: the complaint the patient came with, which tests were prescribed, and which treatment. The method looks for anomalies and points to histories where the prescribed treatment is excessive (too many procedures) or where it is less than in similar cases.

IDDM-2021: 4th International Conference on Informatics & Data-Driven Medicine, November 19-21, 2021, Valencia, Spain
EMAIL: sergleo.zntu@gmail.com (S. Leoshchenko); olejnikaa@gmail.com (A. Oliinyk); subbotin@zntu.edu.ua (S. Subbotin); lytvynviktor.a@gmail.com (V. Lytvyn); al.korn95@gmail.com (O. Korniienko)
ORCID: 0000-0001-5099-5518 (S. Leoshchenko); 0000-0002-6740-6078 (A. Oliinyk); 0000-0001-5814-8268 (S. Subbotin); 0000-0003-4061-4755 (V. Lytvyn); 0000-0003-4812-5382 (O. Korniienko)

Moreover, researchers often have to work with more specific tasks that are not so common in mass practice [3][4][5][6]:
• methods for detecting signs of early-stage Alzheimer's disease on MRI images;
• a method that looks for anomalies in X-ray images;
• methods for monitoring bedridden patients. Cameras in the wards are connected to a program that can recognize a specific situation: a patient falling out of bed. If this happens, the nurses are notified automatically;
• methods for monitoring the workload of operating tables. The program determines how evenly the load is distributed among medical teams in different operating rooms;
• a medical reference book with artificial intelligence (the doctor enters data about the patient, and the program suggests a solution).

However, all these tasks share similar problems when ANNs are implemented. One example is obtaining data. A neuromodel designed for any task must be trained on data. To teach it to see an anomaly on an X-ray, or to determine that an image shows cancer and not pneumonia, it must be shown a large number of such images (thousands, hundreds of thousands, millions). Every image must be labeled with the correct diagnosis, otherwise the program will make more mistakes [7][8][9][10].

Many researchers therefore agree that the main difficulty for developers is the lack of homogeneous, high-quality data. A developer cannot simply come to a hospital and take medical data about patients, even if the data is depersonalized, for example X-rays without a first and last name [7][8][9][10]. Such data is protected by several laws at once: on medical secrecy, on personal data, etc. Large Western universities often provide developers with data sets so that a model can be trained. But then a data compatibility problem arises. For example, the developers receive a database of postoperative X-rays: control images taken after surgery with the patient in the supine position. Screening images, however, are mostly taken with the patient standing, so a system trained on the postoperative data cannot be applied to them: X-rays of a patient lying down and standing are two very different pictures. There are also always doubts about the reliability and accuracy of other people's data. It is likewise difficult to train models that prompt a doctor to make a decision based on text data, because approaches to the treatment of certain diseases may differ from country to country [11][12][13].

Reinforcement learning is a machine learning approach in which a model (an agent) is trained that has no prior information about the system but is able to perform actions in it. Actions move the system to a new state, and the model receives some reward from the system. Because the model learns from this interaction rather than from a complete data set prepared in advance, such a strategy can be a useful practice for solving medical problems.
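The interaction described above (action, new state, reward) can be illustrated with a minimal sketch. The corridor environment, the class name `ToyEnvironment`, and the function `run_episode` are hypothetical and are not taken from the paper; they only show the shape of the agent-environment loop.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D corridor with states 0..n_states-1; the last state is terminal."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, any other action moves left (clipped to the corridor)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # reward only when the terminal state is reached
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the terminal state; return (state, action, reward) steps."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)            # the agent knows nothing about the system
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


random.seed(0)
episode = run_episode(ToyEnvironment(), lambda state: random.choice([0, 1]))
print(len(episode), sum(reward for _, _, reward in episode))
```

A learning method would replace the random policy with one that is adjusted from the collected rewards.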

The main goal of this work is therefore to develop a new method of neuroevolutionary synthesis of neuromodels for medical diagnostics that borrows strategies and mechanisms from reinforcement learning methods. This approach is intended to eliminate most of the disadvantages of neuroevolutionary methods.

Related Works

In recent years, reinforcement learning technologies have shown significant qualitative growth. While this approach initially demonstrated good results in game tasks, neuromodels trained with reinforcement learning methods are now actively used for pattern recognition, agent control in robotics, and decision-making in continuous tasks [14][15][16][17][18][19][20].

Sometimes reinforcement learning is treated not as a separate strategy but as an offshoot of supervised learning (learning with a teacher). This stems from the formulation of the task: a real or virtual environment acts as the teacher. However, this is the main mistake of this classification, because the environment reacts to the agent dynamically, and the reaction may be different each time. Thus, during training the agent receives information from the external environment, for example about where there is no exit; in this way it studies the surrounding world and learns to find a way out [14][15][16][17][18][19][20].

Several factors have influenced the rapid development of the reinforcement learning approach [14][15][16][17][18][19][20]:
• increased computing speed (powerful distributed and parallel computing systems, the many lightweight threads of modern GPUs);
• a significant increase in the amount of data suitable for training models in open repositories (for example, ImageNet);
• the spread of new ANN topologies (CNN, LSTM, GRU);
• the expansion and spread of computing infrastructure (Linux, TCP/IP, Git, ROS, PR2, AWS, AMT, TensorFlow, etc.).

In general, it can be concluded that the main impetus for recent progress is not new ideas and methods but the intensification of computing, sufficient data, and mature infrastructure. Despite significant practical results, the theoretical basis of these methods still remains simple [16][17][18][19][20].

The most common and best-studied reinforcement learning method is Policy Gradient (PG) [21][22][23][24][25][26][27][28]. The popularity of this method is explained by its theoretically supported rules for optimizing the expected reward:
• clear policy;
• transparent rules.

In general, PG can be represented as in the diagram in Fig. 1. The method then basically consists of performing 4 basic steps [21][22][23][24][25][26][27][28].

$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta \left( a_{i,t} \mid s_{i,t} \right) \right) \left( \sum_{t=1}^{T} r \left( s_{i,t}, a_{i,t} \right) \right),$$

where $J(\theta)$ is the maximized mathematical expectation of the sum of the agent's rewards $r$, and $\nabla_\theta J(\theta)$ is the gradient of this function. The policy parameters $\theta$ are then updated in the direction of this gradient.

Further research on reinforcement learning methods led to the more complete and advanced Q-learning method [21][22][23][24][25][26][27][28]. Q-learning is a method that learns the values of a special table that measures how good it is to perform a certain action in any state (this can be measured with a simple scalar value: the larger the value, the better the action). The values stored in the table are called "Q-values". They are estimates of the amount of future reward. In other words, they estimate how much more reward can be obtained before the end of the game by being in state $s_i$ and performing action $a_i$. The method obtains more information about the environment at every step, and this information is used to update the values in the table [21][22][23][24][25][26][27][28].

The basic concept of Q-learning is based on the Bellman equation:

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a'), \tag{1}$$

where $Q(s, a)$ is the Q-value of action $a$ in state $s$; $s_i$ is the sequence of agent states; $a_i$ is the sequence of agent actions; $s'$ is the next state and $a'$ is an action available in it; $r$ is the reward received on the transition to the new state; $\gamma$ is the factor that multiplies the reward in the future, devaluing future rewards.
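As an illustration, equation (1) can be turned into a table update. The sketch below is a minimal example and not the authors' implementation: the function name `q_update`, the state and action labels, and the learning rate `alpha` are assumptions (with `alpha=1.0` the update reduces exactly to the right-hand side of equation (1)).

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             gamma=0.9, alpha=1.0, terminal=False):
    """Move Q(state, action) toward r + gamma * max over a' of Q(next_state, a')."""
    # In a terminal state there is no future reward to add.
    best_next = 0.0 if terminal else max(q_table[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # alpha is an assumed learning rate; alpha = 1 writes the target directly.
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]


actions = ["left", "right"]
q_table = defaultdict(float)          # unseen state-action pairs start at 0
q_table[("s1", "left")] = 2.0
q_table[("s1", "right")] = 5.0

# Reward 1.0 for moving from s0 to s1; the best action in s1 is worth 5.0.
new_value = q_update(q_table, "s0", "right", reward=1.0,
                     next_state="s1", actions=actions, gamma=0.9)
print(new_value)  # 1.0 + 0.9 * 5.0 = 5.5
```

Smaller values of `alpha` blend the old table entry with the new target, which is the usual choice when rewards are noisy.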

The equation states that the value of Q for a certain state-action pair should be the reward received when moving to a new state (by performing this action), added to the value of the best action in the next state. To account for the hypothesis that receiving a reward right now is more valuable than receiving it in the future, a number $\gamma$ from 0 to 1 (usually from 0.9 to 0.99) is used, which is multiplied by the future reward, devaluing it [21][22][23][24][25][26][27][28].

[Figure 2 block labels: General Q-Learning (State, Action, Q Table, Q-Value); Deep Q-Learning (State, Q-Value Action 1, Q-Value Action 2, ..., Q-Value Action N)]

In deep Q-learning, the table is replaced by a neural network $Q(s, a; \theta)$ with the parameter $\theta$. To train this network, the following sequence of loss functions is optimized at iteration $i$:

$$L_i(\theta_i) = E_{s,a,r,s'} \left[ \left( y_i^{DQN} - Q(s, a; \theta_i) \right)^2 \right], \qquad y_i^{DQN} = r + \gamma \max_{a'} Q(s', a'; \theta^-),$$

where $\theta^-$ denotes the parameters of a fixed target network, updating the parameters by gradient descent such that [21][22][23][24][25][26][27][28]

$$\nabla_{\theta_i} L_i(\theta_i) = E_{s,a,r,s'} \left[ \left( y_i^{DQN} - Q(s, a; \theta_i) \right) \nabla_{\theta_i} Q(s, a; \theta_i) \right].$$

Dueling DDQN is a state-of-the-art deep Q-learning algorithm with a dueling architecture that splits the deep Q-network into separate value and advantage streams, which are combined to determine the value of the next state. Prioritized experience replay, i.e. sampling mini-batches of experience that have a large expected impact on learning, further increases efficiency [21][22][23][24][25][26][27][28].

[Figure 3 block labels: DNN, Flatten, FC, V(s), A(s, a_1), A(s, a_2), A(s, a_3), Aggregation Layer, Q(s, a_1), Q(s, a_2), Q(s, a_3)]

Figure 3: General architecture of the Q-learning method
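The temporal-difference target, the squared loss, and the aggregation of V(s) and A(s, a) into Q(s, a) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, plain lists stand in for network outputs, and the aggregation subtracts the mean advantage, which is the form commonly used for dueling networks; the paper itself only shows an "Aggregation Layer" without giving its formula.

```python
def dqn_target(reward, next_q_values, gamma=0.99, terminal=False):
    """Target y = r + gamma * max Q(s', a'); next_q_values come from the target network."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_loss(target, q_value):
    """Squared difference between the target and the current estimate Q(s, a)."""
    return (target - q_value) ** 2


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a_k) into Q(s, a_k) with mean-subtracted advantages (assumed form)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]


# Hypothetical outputs of the value and advantage streams for the next state.
q_next = dueling_q_values(state_value=2.0, advantages=[1.0, 0.0, -1.0])
y = dqn_target(reward=1.0, next_q_values=q_next, gamma=0.5)
loss = squared_td_loss(y, q_value=2.0)
print(q_next, y, loss)  # [3.0, 2.0, 1.0] 2.5 0.25
```

Subtracting the mean keeps the split between the state value and the advantages identifiable, because adding a constant to every advantage no longer changes the resulting Q-values.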

Proposed method

As shown in the previous section, reinforcement learning methods have great prospects for solving problems that are poorly formalized, have incomplete data, or involve a dynamic environment. In our work, we propose a method based on the strategies of reinforcement learning methods [29].

So it is proposed:
1. to take the neuroevolutionary synthesis of neuromodels as a basis, but with the addition of 2 separate neural networks: a network for evaluating and monitoring the environment ($NN_{glob\_crit}$) and a network for duplicating the parameters of the best agent at the next step;
3. during the synthesis and modification of the structure of the individuals, which are considered as agents in the environment, to forward all information to the global network $NN_{glob\_crit}$, whose task is to compare the current results of the agents with the reference results of the training data and to adjust the penalty or reward for each agent;
4. at the same time, after evaluating the actions of all agents, to select the agent with the best results at the iteration. The main goal of this step is to compare the results at the next iteration with the previously best ones.

This synthesis approach also assumes the presence of an additional identifier: the evaluation of the reward growth step lev Q mark [30][31][32][33][34]. Such an identifier helps to avoid areas of local extrema: if the growth of the reward falls below the specified value, the best agent in the population can be changed promptly.
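The interplay of the population, the global critic, and the clone of the best agent can be sketched as follows. This is a toy illustration under strong assumptions, not the authors' method: a linear model stands in for a neuromodel, the critic is a plain function rather than a network, the mapping of results to the rewards -1, 0, 1 is assumed, and the reaction to slow reward growth (widening the mutation step when the gain is below `min_gain`) is a stand-in chosen for this sketch, since the paper only states that the best agent can be changed in that case. All names are hypothetical.

```python
import copy
import random


def agent_error(weights, samples):
    """Mean squared error of a toy linear model that stands in for a neuromodel."""
    total = 0.0
    for features, target in samples:
        prediction = sum(w * x for w, x in zip(weights, features))
        total += (prediction - target) ** 2
    return total / len(samples)


def critic_reward(error, reference_error):
    """Assumed critic rule: reward 1, 0 or -1 relative to the stored best result."""
    if error < reference_error:
        return 1
    if error > reference_error:
        return -1
    return 0


def mutate(weights, rng, scale):
    return [w + rng.gauss(0.0, scale) for w in weights]


def synthesize(samples, n_agents=20, n_iterations=50, min_gain=1e-4, seed=0):
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
                  for _ in range(n_agents)]
    # Clone of the best agent, kept for comparison at the next iteration.
    best_clone = copy.deepcopy(min(population, key=lambda w: agent_error(w, samples)))
    best_error = agent_error(best_clone, samples)
    for _ in range(n_iterations):
        errors = [agent_error(w, samples) for w in population]
        rewards = [critic_reward(e, best_error) for e in errors]
        leader = min(range(n_agents), key=errors.__getitem__)
        gain = best_error - errors[leader]      # growth of the result at this iteration
        if gain > 0:
            best_clone = copy.deepcopy(population[leader])
            best_error = errors[leader]
        # Stand-in stagnation rule: search more widely when the gain is too small.
        scale = 0.5 if gain < min_gain else 0.1
        # Rewarded agents are kept and mutated; the others restart from the best clone.
        population = [mutate(w if r > 0 else best_clone, rng, scale)
                      for w, r in zip(population, rewards)]
    return best_clone, best_error


samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
best, error = synthesize(samples)
print(best, error)
```

Because the stored clone is replaced only by a strictly better agent, the returned error never exceeds the error of the best agent in the initial population.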

The general progress of the method is shown in Fig. 4.

Results and Discussions

A data set based on the characteristics of patients with pneumonia, recently presented by M.-A. Kim, J. Seok Park, C. W. Lee, and W.-I. Choi [35], was selected for testing. Total sample size: 77490 values. Table 1 shows the characteristics of the data set.

For this task, neuromodels will make it much easier to determine the further diagnosis of a person after data on their well-being has been collected. Given that pneumonia is one of the most important signs and complications of COVID-19 [36], [37], after additional training on extended data this model can be used to diagnose patients or to predict the further dynamics of the disease.

Number of instances: 1435

The proposed reinforcement learning method (RLNE) is compared with the modified neuroevolution genetic algorithm method (MGA), which is used to synthesize RNN and DNN [14], [15], [34]. The metaparameters used for the compared methods are listed in Table 2.

Types of mutation:
• deleting a connection between neurons;
• removing a neuron (hidden layer);
• adding a connection between neurons;
• adding a neuron (hidden layer);
• changing the type of activation function.

The test results for the synthesis task are shown in Table 3. Analyzing the results, it can be concluded that the proposed method showed a better synthesis time than MGA used for the synthesis of DNN. This is because the synthesized neuromodels were topologically simpler and their modification required less effort. However, the proposed method is slower than MGA used for RNN synthesis. A possible explanation is that during RNN synthesis there was no need to clone the best individuals to compare the results, since the presence of recurrent connections makes this process easier.

Another important characteristic is the accuracy of the synthesized solutions. The solutions obtained by RLNE were more accurate than those of MGA RNN on both training and test data, although the difference in error is not very significant. The results of MGA DNN were even better. It is likely that deep networks make it possible to encode hidden connections in the data more accurately.

The second stage of the experimental study focused on resource consumption during the synthesis of solutions. Special attention was paid to measuring the load on the CPU and RAM [38]. Such monitoring makes it possible to determine more accurately how the load is distributed over different iterations of the method. The CPU and RAM load graphs are shown in Fig. 5 and 6, respectively. With MGA, in both cases the load on the CPU and RAM changed more abruptly but did not exceed 81-82% on average. With RLNE, the load distribution was steadier, but it often reached 100%. These indicators are important when designing a parallel version of the synthesis methods. The relatively low load makes it possible to implement MGA on high-performance GPUs, whereas the high resource consumption of RLNE limits this possibility.

Conclusion

The proposed strategies and method demonstrated an acceptable level of performance. The test error of the resulting solution was reduced by 6.4% (from 0.157 for MGA RNN to 0.147). It was also possible to reduce the synthesis time in comparison with analogues by 8.5% (from 8031 s for MGA DNN to 7352 s). However, the high level of resource consumption limits the parallelization of the method, which in turn can significantly limit the genetic diversity of individuals. In the future, the main strategies of the proposed method can be implemented in parallel versions of neuroevolutionary methods for the intelligent maintenance and control of populations of solutions.

Another important direction for further research may be to simplify the proposed strategy by removing the clone of the best result at the iteration and replacing it with individual agents with recurrent connections, while tightening the control of the barrier imposed by the external global critic network. This approach would also make it possible to focus the work of the critic network on the external data of the environment.

Figure 1: General scheme of the PG method (recoverable block labels: "Generate samples (i.e. run the policy)", "Fit ...")

Recoverable fragments of the PG step list: "... step T at which the transition to the terminal state occurred; 3. ... result not agree to the extreme, repeat from point 1."

Figure 2: General scheme of the Q-learning method

rest of the population will be a set of different individual agents (

Figure 5: Load on the CPU during synthesis

Figure 6: Load on the RAM during synthesis

Table 2: Metaparameters for methods

Metaparameter of the method | Characteristics of the metaparameter
Number of individuals in population (size) | 100
Elite size of population | 5%
Neurons activation function | hyperbolic tangent
Probability of the mutation (for the MGA) | 25%
Type of the crossover | two-point
Reward | [-1; 0; 1]

Table 3: General results on data set

Method of synthesis | Synthesis time, s | Error at the training sample | Error at the test sample
RLNE | 7352 | 0.021 | 0.147
MGA RNN | 7173 | 0.03 | 0.157
MGA DNN | 8031 | 0.019 | 0.134
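The relative improvements quoted in the Conclusion can be recomputed from the values of Table 3. The short check below is only an arithmetic illustration; the dictionary layout and the function name `relative_reduction` are hypothetical.

```python
# Values copied from Table 3.
results = {
    "RLNE": {"time_s": 7352, "train_error": 0.021, "test_error": 0.147},
    "MGA RNN": {"time_s": 7173, "train_error": 0.03, "test_error": 0.157},
    "MGA DNN": {"time_s": 8031, "train_error": 0.019, "test_error": 0.134},
}


def relative_reduction(baseline, value):
    """Reduction of value relative to baseline, in percent."""
    return (baseline - value) / baseline * 100.0


# Test error of RLNE against MGA RNN, synthesis time of RLNE against MGA DNN.
error_gain = relative_reduction(results["MGA RNN"]["test_error"],
                                results["RLNE"]["test_error"])
time_gain = relative_reduction(results["MGA DNN"]["time_s"],
                               results["RLNE"]["time_s"])
print(f"{error_gain:.1f}% {time_gain:.1f}%")  # 6.4% 8.5%
```

The two percentages use different baselines: the error is compared with MGA RNN and the synthesis time with MGA DNN.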

Acknowledgements

The work was carried out with the support of the state budget research projects of the National University "Zaporizhzhia Polytechnic": "Intelligent methods and software for diagnostics and non-destructive quality control of military and civilian applications" (state registration number 0119U100360) and "Development of methods and tools for analysis and prediction of dynamic behavior of nonlinear objects" (state registration number 0121U107499).

References

[1] V. M. Adamović, D. Z. Antanasijević, M. Đ. Ristić, An optimized artificial neural network model for the prediction of rate of hazardous chemical and healthcare waste generation at the national level, J Mater Cycles Waste Manag 20 (2018). doi:10.1007/s10163-018-0741-6
[2] M. Johnson, A. Albizri, S. Simsek, Artificial intelligence in healthcare operations to enhance treatment outcomes: a framework to predict lung cancer prognosis, Ann Oper Res (2020). doi:10.1007/s10479-020-03872-6
[3] A. W. Y. Khang, J. A. J. Alsayaydeh, S. M. Idrus, J. A. B. M. Gani, W. A. Indra, J. B. Pusppanathan, Resource efficient for hybrid fiber-wireless communications links in access networks with multi response optimization algorithm, ARPN Journal of Engineering and Applied Sciences 16(1) (2021).
[4] J. A. J. Alsayaydeh, A. Aziz, A. I. A. Rahman, S. N. S. Salim, M. Zainon, Z. A. Baharudin, M. I. Abbasi, A. W. Y. Khang, Development of programmable home security using GSM system for early prevention, 16 (2021).
[5] J. A. J. Alsayaydeh, W. A. Y. Khang, W. A. Indra, J. B. Pusppanathan, V. Shkarupylo, A. K. M. Zakir Hossain, S. Saravanan, Development of vehicle door security using smart tag and fingerprint system, ARPN Journal of Engineering and Applied Sciences 9(1) (2019).
[6] J. A. J. Alsayaydeh, W. A. Y. Khang, W. A. Indra, V. Shkarupylo, J. Jayasundar, Development of smart dustbin by using apps, ARPN Journal of Engineering and Applied Sciences 14(21) (2019).
[7] G. Paragliola, M. Naeem, Risk management for nuclear medical department using reinforcement learning algorithms, J Reliable Intell Environ 5 (2019). doi:10.1007/s40860-019-00084-z
[8] Z. Tian, X. Si, Y. Zheng, Multi-step medical image segmentation based on reinforcement learning, J Ambient Intell Human Comput (2020). doi:10.1007/s12652-020-01905-3
[9] A. Kishor, C. Chakraborty, W. Jeberson, Reinforcement learning for medical information processing over heterogeneous networks, Multimed Tools Appl 80 (2021). doi:10.1007/s11042-021-10840-0
[10] M. Chitsaz, C. Seng Woo, Software Agent with Reinforcement Learning Approach for Medical Image Segmentation, J. Comput. Sci. Technol 26 (2011). doi:10.1007/s11390-011-9431-8
[11] I. Izonin, R. Tkachenko, V. Verhun, K. Zub, An approach towards missing data management using improved GRNN-SGTM ensemble method, Engineering Science and Technology, an International Journal 24(3) (2021). doi:10.1016/j.jestch.2020.10.005
[12] I. Izonin, R. Tkachenko, I. Dronyuk, P. Tkachenko, M. Gregus, M. Rashkevych, Predictive modeling based on small data in clinical medicine: RBF-based additive input-doubling method, Math Biosci Eng 18(3) (2021). doi:10.3934/mbe.2021132
[13] R. Tkachenko, I. Izonin, P. Tkachenko, Neuro-Fuzzy Diagnostics Systems Based on SGTM Neural-Like Structure and T-Controller, in: S. Babichev, V. Lytvynenko (Eds.), Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2021, Lecture Notes on Data Engineering and Communications Technologies, vol. 77, Springer, Cham, 2022. doi:10.1007/978-3-030-82014-5_47
[14] J. A. Kumar, S. Abirami, Ensemble application of bidirectional LSTM and GRU for aspect category detection with imbalanced data, Neural Comput & Applic (2021). doi:10.1007/s00521-021-06100-9
[15] A. Khan, A. Sarfaraz, RNN-LSTM-GRU based language transformation, Soft Comput 23 (2019). doi:10.1007/s00500-019-04281-z
[16] M. Lu, Z. Shahn, D. Sow, F. Doshi-Velez, L. H. Lehman, Is Deep Reinforcement Learning Ready for Practical Applications in Healthcare? A Sensitivity Analysis of Duel-DDQN for Hemodynamic Management in Sepsis Patients, in: Proceedings of the AMIA Annual Symposium, Rockville, 2020 (2021 Jan 25).
[17] M. Hausknecht, P. Stone, Deep Recurrent Q-Learning for Partially Observable MDPs, AAAI Fall Symposia, 2015.
[18] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. M. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning, CoRR (2016).
[19] N. Liu, Y. Liu, B. Logan, Learning the Dynamic Treatment Regimes from Medical Registry Data through Deep Q-network, Sci Rep 9, 1495 (2019). doi:10.1038/s41598-018-37142-0
[20] B. Guo, X. Zhang, Q. Sheng, H. Yang, Dueling Deep-Q-Network Based Delay-Aware Cache Update Policy for Mobile Users in Fog Radio Access Networks, IEEE Access 8 (2020). doi:10.1109/ACCESS.2020.2964258
[21] A pair of interrelated neural networks in Deep Q-Network, 2020.
[22] G. Delétang, Mixing policy gradient and Q-learning, 2019.
[23] Karpathy, Deep Reinforcement Learning: Pong from Pixels, 2016.
[24] G. Kesari, Catch me if you can: A simple english explanation of GANs or Dueling neural-nets, 2018.
[25] C. Yoon, Dueling Deep Q Networks. Dueling Network Architectures for Deep Reinforcement Learning, 2019.
[26] Improvements in Deep Q Learning: Dueling Double DQN, Prioritized Experience Replay, and fixed, 2018.
[27] Oppermann, Self Learning AI-Agents Part II: Deep Q-Learning, 2018.
[28] L. Vazquez, Understanding Q-Learning, the Cliff Walking problem, 2018.
[29] S. Leoshchenko, A. Oliinyk, S. Subbotin, V. Shkarupylo, Using the Actor-Critic method for population diversity in neuroevolutionary synthesis, in: Proceedings of the 2nd International Workshop on Intelligent Information Technologies and Systems of Information Security (IntelITSIS'2021), CEUR-WS, 2021.
[30] A. Oliinyk, I. Fedorchenko, A. Stepanenko, M. Rud, D. Goncharenko, Combinatorial optimization problems solving based on evolutionary approach, in: Proceedings of the 15th International Conference on the Experience of Designing and Application of CAD Systems, Lviv, Ukraine, IEEE, 2019. doi:10.1109/CADSM.2019.8779290
[31] A. Oliinyk, I. Fedorchenko, A. Stepanenko, A. Katschan, Y. Fedorchenko, A. Kharchenko, D. Goncharenko, Development of genetic methods for predicting the incidence of volumes of emissions of pollutants in air, in: Proceedings of the 2nd International Workshop on Informatics & Data-Driven Medicine, IDDM 2019, CEUR-WS, 2019.
[32] S. Leoshchenko, A. Oliinyk, S. Subbotin, T. Zaiko, Using modern architectures of recurrent neural networks for technical diagnosis of complex systems, in: Proceedings of the 5th International Scientific-Practical Conference Problems of Infocommunications. Science and Technology (PICST2018), Kharkiv, Ukraine, IEEE, 2018. doi:10.1109/INFOCOMMST.2018.8632015
[33] S. Leoshchenko, A. Oliinyk, S. Subbotin, N. Gorobii, T. Zaiko, Synthesis of artificial neural networks using a modified genetic algorithm, in: Proceedings of the 1st International Workshop on Informatics & Data-Driven Medicine, IDDM 2018, CEUR-WS, 2018.
[34] S. Leoshchenko, A. Oliinyk, S. Subbotin, T. Zaiko, Using Recurrent Neural Networks for Data-Centric Business, in: D. Ageyev, T. Radivilova, N. Kryvinska (Eds.), Data-Centric Business and Applications, Lecture Notes on Data Engineering and Communications Technologies, vol. 42, Springer, Cham. doi:10.1007/978-3-030-35649-1_4
[35] M.-A. Kim, J. Seok Park, C. W. Lee, W.-I. Choi, Pneumonia severity index in viral community acquired pneumonia in adults, PLoS One 14(3) (2019). doi:10.1371/journal.pone.0210102
[36] G. Raghu, K. Wilson, COVID-19 interstitial pneumonia: monitoring the clinical course in survivors, The Lancet Respiratory Medicine 8(9) (2020). doi:10.1016/S2213-2600(20)30349-0
[37] C. Hani, N. H. Trieu, I. Saab, S. Dangeard, S. Bennani, G. Chassagnon, M.-P. Revel, COVID-19 pneumonia: A review of typical CT findings and differential diagnosis, Diagnostic and Interventional Imaging 101(5) (2020). doi:10.1016/j.diii.2020.03.014