Machine learning assessment¹

Vladimir Konyukhov¹, Daria Musatova², Anna Zueva³, Alexey Sorokin⁴ and Serik Toygambayev⁵

¹ Russian State University of Physical Education, Sport, Youth and Tourism, 4, Lilac Boulevard house, Moscow, 105122, Russian Federation
² Lomonosov Moscow State University, 1, Leninskie Gory, Moscow, 119991, Russia
³ Financial University under the Government of the Russian Federation, 49, Leningradsky avenue, Moscow, 125993, Russian Federation
⁴ Academy of Civil Protection of the Ministry of Emergency Situations of Russia, 1, Sokolovskaya, Khimki, 141435, Russian Federation
⁵ Russian State Agrarian University – Moscow Timiryazev Agricultural Academy, Larch Alley str., 16A, bldg. 3, sq. 409, Moscow, 127550, Russian Federation
nk-kfea@mail.ru

Abstract. This article discusses the concept of machine learning, its main characteristics and types, as well as the application of machine learning to artificial intelligence units. The research raises an important question about the legal regulation of providing access to data in the machine learning of artificial intelligence units.

Keywords: machine learning, data access, artificial intelligence, data, personal data, algorithm, legal regulation.

1 Introduction

Machine learning now plays a very important role in people's lives and is expected to take a leading role in the future. Specialists in the development of learning algorithms are already considered among the most in-demand professionals. Information is the basis for data, and there is so much of it that people find it hard to comprehend and study it using only their own minds. But how does machine learning work? First, it organizes data by selecting information and putting it in order using a spectrum or range. This is more convenient for a person because it decreases the amount of time that would be needed to do the work manually, so machine learning technologies are in great demand today.
Machine learning technologies are being actively implemented in areas as important to society as medicine, transport, education, and agriculture. In practice, this process is hampered by objective problems, many of which lie in the legal field. The article discusses some of them [1-6].

¹ Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Materials and methods

To consider the concept of machine learning and its classifications, and to find a way to regulate by law the access to data used in the machine learning of artificial intelligence units, the authors use general scientific (deduction), special (structural and functional) and private legal (formal legal, comparative legal) methods. This problem has been studied by a number of scholars, including McCarthy J., Robertson J., Mitchell T. and others.

3 Results

The term “Machine learning” stands for a data analysis method that helps to automate the process of creating an analytical model. This automation improves with time, as artificial intelligence is built in such a way that it can learn and adapt through experience. This process, in which artificial intelligence collects information and statistics and uses them to make predictions, is often confused with “data mining” [7]. The difference is that data mining is used where unknown or hidden elements need to be discovered and summarized: information obtained through data mining is extracted specifically for people, whereas in machine learning artificial intelligence uses the information to improve its own processes [8]. Creating and improving machine learning algorithms is the essence of presenting new data [2]. Machine learning methods include four categories [6]:

─ Supervised machine learning algorithms. They apply previously analyzed information to new data, using labeled examples to predict future events. Starting from the analysis of a known training data set, the learning algorithm creates an inferred function to predict output values. After sufficient training, the system is able to provide targets for any new input. The learning algorithm can also compare its output with the correct, intended output and find errors in order to modify the model accordingly.
─ Unsupervised machine learning algorithms, in contrast, are used when the information used for training is neither classified nor labeled. Unsupervised learning studies how a system can infer a function describing hidden structure from unlabeled data. The system does not determine a correct output, but examines the data and draws inferences from the data set in order to describe the hidden structure in the unlabeled data.
─ Reinforcement machine learning algorithms are a learning method in which artificial intelligence produces actions and discovers errors or rewards. The software interacts with the real world and its changes, and therefore has to deal with many specific tasks. An example is a self-driving car whose systems control the distance to the car ahead. The better errors are prevented, the more correctly reinforcement learning works, allowing artificial intelligence to determine the ideal behavior. For the agent to find out which action is best, simple reward feedback is needed; this is called the reinforcement signal.
─ Semi-supervised machine learning algorithms fall between supervised and unsupervised learning, because they use both labeled and unlabeled data for training, usually a small amount of labeled data and a large amount of unlabeled data. This can considerably improve the accuracy of training. Semi-supervised learning is usually chosen when obtaining labeled data requires qualified and appropriate resources; obtaining unlabeled data, by contrast, usually does not require additional resources [9-15].

Machine learning is at the base of the decision-making system of artificial intelligence algorithms. Learning is present at different stages of the AI life cycle, and the subsequent accuracy of its work depends on how the learning process is built. The data on which the algorithm is trained forms its experience, which means there is a direct link between the training data and the decisions the algorithm will eventually make [4].

A variety of AI systems and robots can only learn effectively if they are provided with a very large amount of data for appropriate processing. In general, the more data is loaded into the algorithm, the more effective the learning process will be.

Today, there is a steady worldwide trend towards legislation that restricts access to an increasing amount of data: personal data, medical data, geolocation data, information covered by the secrecy of communications, and other information that can primarily serve to identify subjects and objects [5]. Often, it is the use of such data that is associated with the main breakthroughs in the field of machine learning.

In this regard, the question arises: should access to data that is necessary for training and for solving problems in socially important areas be provided on special, simplified grounds?

4 Discussion

According to a number of scientists, solving the problem requires developing special rules that would define the limits of the use of data by AI and make it possible to maintain a balance between the principle of preserving privacy and the availability of the variety of human data necessary for the development of machine learning technology [1; 12].
Data can be protected by means of a “safe harbour”, the requirement to obtain the consent of the data subject, and requirements to minimize the use of data. Another way to address this problem is the depersonalization of personal data. However, depersonalization does not always guarantee full protection: there have been a number of cases in which comparing depersonalized data sets with each other made it possible to identify the data subjects [16-20].

Another important factor in this area is that large corporations have far greater data processing capabilities and access to more data, which automatically puts them in the most advantageous position and points to monopolization. This leads to a dominant position for a small number of large companies that collect information about their users and, in their actions, are often guided only by their own interests and internal regulations [3].

To solve the problem of artificial intelligence bias, it is necessary to ensure that the data provided for training is as objective as possible [9]. An algorithm should not be trained on just any available information: data must be selected responsibly, choosing only data that excludes subjective assessments as far as possible. In this regard, it is necessary to develop standards for the data on which AI systems will be trained. The data used must be checked for compliance with such a standard, which in turn must meet the requirements of applicable EU law, including the GDPR [21-24].
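
The re-identification risk described earlier in this section, where depersonalized data sets are compared with each other, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from the cited sources: all records, field names and the helper functions `pseudonymize` and `link_records` are hypothetical. Names in a toy medical data set are replaced with salted hashes, but the quasi-identifiers (postcode, birth year, sex) are left intact, so joining with a second, named data set re-identifies any person whose combination of quasi-identifiers is unique in both sets.

```python
import hashlib

# Quasi-identifiers: fields that are not names but can single a person out.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def pseudonymize(name, salt="demo-salt"):
    """Replace a name with a short salted hash (hypothetical helper)."""
    return hashlib.sha256((salt + name).encode("utf-8")).hexdigest()[:12]


def link_records(depersonalized, identified, keys=QUASI_IDENTIFIERS):
    """Map names to depersonalized records whose quasi-identifier
    combination occurs exactly once in each data set."""
    def index(rows):
        grouped = {}
        for row in rows:
            grouped.setdefault(tuple(row[k] for k in keys), []).append(row)
        return grouped

    dep_index, id_index = index(depersonalized), index(identified)
    matches = {}
    for key, rows in dep_index.items():
        candidates = id_index.get(key, [])
        if len(rows) == 1 and len(candidates) == 1:
            matches[candidates[0]["name"]] = rows[0]
    return matches


# Hypothetical "depersonalized" medical data: no names, only hashes.
medical = [
    {"pid": pseudonymize("Person A"), "postcode": "105122",
     "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"pid": pseudonymize("Person B"), "postcode": "119991",
     "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"pid": pseudonymize("Person C"), "postcode": "119991",
     "birth_year": 1975, "sex": "M", "diagnosis": "hypertension"},
]

# Hypothetical second data set that contains names but no diagnoses.
register = [
    {"name": "Person A", "postcode": "105122", "birth_year": 1980, "sex": "F"},
    {"name": "Person B", "postcode": "119991", "birth_year": 1975, "sex": "M"},
    {"name": "Person C", "postcode": "119991", "birth_year": 1975, "sex": "M"},
]

reidentified = link_records(medical, register)
print(sorted(reidentified))  # ['Person A']
```

In this toy data only Person A is re-identified. Persons B and C share the same quasi-identifiers and cannot be told apart, which suggests why generalizing or suppressing such fields before release reduces the risk.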

The machine learning process should be organized only on the basis of reliable data and only using scientifically based and proven algorithms, and the amount of training data should be clearly sufficient. In addition, a ban on illegal interference in learning processes should be established, and a complete and reliable record should be kept of all information collected and processed by artificial intelligence, as well as of how each of its decisions was chosen and adopted in the learning process.

5 Conclusion

The questions that arise in connection with machine learning make us think about a global problem: there is already a clear conflict between, on the one hand, the need to respect basic human rights (the right to privacy and the restriction of access to personal data) and, on the other hand, the need in some special situations to sacrifice such rights for the sake of scientific progress and the development of society as a whole. In this conflict, the key task is to find a balance of interests: a "point of balance" between these two vectors of technology development.

In any case, the training of algorithms should be based on ensuring respect for fundamental human rights.

References

1. Asaro, P.: Robots and Responsibility. Legal Perspective, 11-14 (2007).
2. Dorschel, A.: Rethinking Data Privacy: The Impact of Machine Learning. Luminovo A. I., 211-214 (2019).
3. Kingston, J.K.C.: Artificial Intelligence and Legal Liability. Research and Development in Intelligent Systems XXXIII: Incorporating Applications and Innovations in Intelligent Systems XXIV, 270-280 (2016).
4. Krensky, P., Hare, J.: Hype Cycle for Data Science and Machine Learning. Gartner, 47-52 (2018).
5. Solum, L.: Legal Personhood for Artificial Intelligences. North Carolina Law Review, 1231-1287 (April 1992).
6. Lipton, Z., Steinhardt, J.: Troubling trends in machine learning scholarship. ICML, 110-114 (2018).
7. Mitchell, T.: Machine Learning. A Guide to Current Research. Tom M. Mitchell, Jaime G. Carbonell, Ryszard S. Michalski (Eds.). Springer Science & Business Media, 178-182 (1986).
8. Robertson, J.: Human Rights vs. Robot Rights: Forecasts from Japan. Critical Asian Studies, 571-598 (2014).
9. Willick, M.S.: Artificial Intelligence: Some Legal Approaches and Implications, 134-142 (1983).
10. McCarthy, J.: Machine Learning. What it is and why it matters, https://www.sas.com/en_us/insights/analytics/machine-learning.html, last accessed 2021/06/20.
11. Zhichkin, K., Nosov, V., Zhichkina, L.: The production costs calculation automation for planning the crops production parameters. CEUR Workshop Proceedings 2843, 20 (2021).
12. McCarthy, J.: What is artificial intelligence? http://www-formal.stanford.edu/jmc/whatisai/, last accessed 2021/06/20.
13. Sadriddinov, M.I., Mezina, T.V., Morkovkin, D.E., Romanova, Ju.A., Gibadullin, A.A.: Assessment of technological development and economic sustainability of domestic industry in modern conditions. IOP Conference Series: Materials Science and Engineering 734, 012051 (2020).
14. Fokicheva, A., Abramov, V., Istomin, E., Sokolov, A., Goloskvskaya, E., Levina, A.: Machine learning with digital generators for training sets including proteins modeling in the context of big data and blockchain technologies. Proceedings of the 33rd International Business Information Management Association Conference, IBIMA 2019: Education Excellence and Innovation Management through Vision 2020, 8638-8642 (2019).
15. Zhichkin, K., Nosov, V., Zhichkina, L., Panchenko, V., Zueva, E., Vorob'eva, D.: Modelling of state support for biodiesel production. E3S Web of Conferences 203, 05022 (2020).
16. Ermakova, A., Oznobihina, L., Avilova, T.: Analysis of the current state and features of natural resource potential management. E3S Web of Conferences 157, 3005 (2020).
17. Khayrzoda, S., Morkovkin, D., Gibadullin, A., Elina, O., Kolchina, E.: Assessment of the innovative development of agriculture in Russia. E3S Web of Conferences 176, 05007 (2020).
18. Zhichkin, K., Nosov, V., Zhichkina, L., Pavlyukova, A., Korobova, L.: Modeling the production activity of personal subsidiary plots in the regional food security system. IOP Conference Series: Earth and Environmental Science 659, 012005 (2021).
19. Istomin, E.P., Burlov, V.G., Abramov, V.M., Sokolov, A.G., Bidenko, S.I.: Decision support model within environmental economics. International Multidisciplinary Scientific GeoConference Surveying Geology and Mining Ecology Management, SGEM 19(5.3), 139-145 (2019).
20. Zimnukhova, D.I., Zubkova, G.A., Morkovkin, D.E., Stroev, P.V., Gibadullin, A.A.: Management and development of digital technologies in the electric power industry of Russia. Journal of Physics: Conference Series 1399, 033097 (2019).
21. Istomin, E.P., Abramov, V.M., Lepeshkin, O.M., Baikov, E.A., Bidenko, S.I.: Web-based tools for natural risk management while large environmental projects. International Multidisciplinary Scientific GeoConference Surveying Geology and Mining Ecology Management, SGEM 19, 953-960 (2019).
22. Morkovkin, D., Lopatkin, D., Sadriddinov, M., Shushunova, T., Gibadullin, A., Golikova, O.: Assessment of innovation activity in the countries of the world. E3S Web of Conferences 157, 04015 (2020).
23. An, D., Song, Y., Carr, M.: A comparison of two models of creativity: Divergent thinking and creative expert performance. Personality and Individual Differences 90, 78-84 (2016).
24. Popova, A., Abramov, V., Popov, N., Istomin, E., Sokolov, A., Levina, A.: Blockchain and big data technologies within geo-information support for arctic projects. Proceedings of the 33rd International Business Information Management Association Conference, IBIMA 2019: Education Excellence and Innovation Management through Vision 2020, 8575-8579 (2019).