Modified Funk SVD-Augmented Recommender Systems for Advancing Visual Data Processing in Industrial IoT

Olena Hordiichuk-Bublivska1,∗,†, Halyna Beshley1,2,†, Iryna Ivanochko2,†, Mykola Beshley1,2,† and Orest Kochan1,∗,†

1 Lviv Polytechnic National University, Bandera Str. 12, Lviv, 79013, Ukraine
2 Comenius University in Bratislava, 82005 Bratislava 25, Slovakia

Abstract
Industrial recommender systems are currently an active area of research and development, with many businesses across various industries leveraging them to improve their customer experience and increase their revenue. The growth of 5G industrial networks has led to a massive increase in the volume of visual data (images, video information) generated from various sources, making it challenging to process and analyze the data. Additionally, traditional recommendation systems used in these networks may not be able to handle the massive amount of data and provide accurate recommendations. This article proposes a modified Funk Singular-Value Decomposition (SVD) approach for enhancing collaborative filtering in recommendation systems for 5G industrial networks. The proposed approach can effectively reduce the dimensionality of the data and capture the underlying patterns and relationships between users and items, thereby enhancing the performance of the recommendation system. The study results show that using less data about users increases the speed of providing recommendations, and that including additional item features improves the accuracy of calculations. Using the two modifications together improves the accuracy of recommendations by 2% and reduces the duration of calculations by an average of 10-15%. The proposed modifications of the FunkSVD algorithm can improve the quality of the recommendations provided to users according to their requirements.
The research results can be used to optimize the operation of industrial and 5G systems with dynamic parameter changes.

Keywords
recommender systems, collaborative filtering, big data, funk singular-value decomposition algorithm, 5G industrial network

BAIT’2024: The 1st International Workshop on “Bioinformatics and applied information technologies”, October 02-04, 2024, Zboriv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
obublivska@gmail.com (O. Hordiichuk-Bublivska); halyna.v.beshlei@lpnu.ua (H. Beshley); irene.ivanochko@gmail.com (I. Ivanochko); mykola.i.beshlei@lpnu.ua (M. Beshley); orestvk@gmail.com (O. Kochan)
0000-0002-6439-549X (O. Hordiichuk-Bublivska); 0000-0001-5392-3499 (H. Beshley); 0000-0002-1936-968X (I. Ivanochko); 0000-0002-7122-2319 (M. Beshley); 0000-0002-3164-3821 (O. Kochan)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction
The modernization of industrial systems has brought significant changes to the way production processes are managed. The Industry 4.0 concept has enabled the digitization of manufacturing, leading to the implementation of modern information systems that solve several problems to deliver the appropriate quality of service. In an industrial system, data must be collected from various end devices, stored, and analyzed for further decision-making [1]. The quality of user service is one of the most important criteria for evaluating the efficiency of industrial systems. The characteristics of service provision determined for one user can be used to improve the service for others. The more sources of information are analyzed, the easier it is to determine general patterns and possible problems in the operation of industrial systems.
At the same time, not all of the data constantly arriving from various end devices is critical for computing, so data analysis and optimization methods should be used [2]. Storing and processing data received from various distributed sources is extremely important for modern industrial systems. Cloud technologies are used to improve the efficiency of big data analysis. Cloud manufacturing (CMfg) is an important component of the Industry 4.0 concept [3]. CMfg allows an industrial system to monitor its resources and quickly access them. Users can also exchange data with each other and with global control nodes and store it on cloud servers [4]. Recommender systems are widely used in various industries, including industrial networks, to provide personalized recommendations to users [5]. In industrial networks, recommender systems can be used to recommend products, services, and suppliers to businesses, based on their past purchases, interests, and behavior [6]. Overall, recommender systems have the potential to provide significant benefits to businesses in industrial networks by improving the efficiency of purchasing and reducing the time and effort required to find suppliers and products. In an industrial setting, recommender systems can be used in various ways, such as recommending maintenance schedules for machines, optimizing production processes, and recommending suppliers for raw materials. With the high bandwidth and low latency of 5G networks, these systems can provide real-time recommendations to users, allowing them to make decisions quickly and efficiently [7]. However, implementing recommender systems in this context requires careful consideration of the unique challenges and limitations of industrial networks. Since most endpoints in the Industrial Internet of Things (IIoT) are located remotely from each other, data is transmitted over wireless communication channels.
5G mobile communication systems allow serving users quickly and efficiently, providing them with a wide range of communication services [8, 9]. It often happens that the amount of information significantly exceeds the 5G bandwidth, which leads to delays or the need to reduce the quality of the transmitted content. Thus, it is necessary to solve the problem of data transmission to users, data optimization, resource allocation and system load. It is also important to consider the specific needs of the industrial setting when developing a recommender system. This may involve identifying the most relevant data sources, selecting appropriate algorithms for data analysis, and incorporating domain-specific knowledge into the recommender system. In 5G industrial networks, there can be a large number of users and items, which can result in sparse data. Sparse data refers to situations where there are few ratings or interactions between users and items. This can make it challenging to generate accurate recommendations using traditional collaborative filtering algorithms. The collection of data on sales statistics and the use of services by different categories of users allows us to better understand their needs and adapt production processes to them [10- 12]. The list of goods and services is constantly expanding, and the requirements for the quality of service are growing, so it is necessary to constantly look for new methods of data processing. The novelty of this work lies in the proposal of modifications to the Funk SVD algorithm for processing big data in industrial settings, with the aim of developing effective 5G industrial recommender systems. The main contribution of the study is in two aspects. First, we propose to use only a part of the user's data to generate recommendations, which reduces the load on 5G channels and the duration of calculations, while maintaining a high level of accuracy in data processing. 
Second, we demonstrate the effectiveness of incorporating additional item features to improve the accuracy of recommender systems. Overall, the proposed data optimization techniques have the potential to revolutionize the way 5G industrial networks operate, by enabling the development of more accurate and faster recommender systems. The study's results highlight the prospect of using modified FunkSVD algorithms to improve the quality of service for different types of 5G system users, leading to increased efficiency, reduced costs, and optimized processes.

The rest of the paper is organized as follows. In Section 2, we provide a literature review of collaborative filtering and its variants. In Section 3, we introduce the challenges and solutions for 5G industrial recommendation systems. In Section 4, we compare the SVD and Funk SVD algorithms for data optimization in industrial 5G recommender systems. In Section 5 and Section 6, we describe the proposed enhanced collaborative filtering approach using modifications of the Funk SVD. Section 7 contains the results and discussion, while the conclusion of the paper is presented in Section 8.

2. Related Works
This section analyzes current research in this field. E.P. Xing et al. [13] defined the problem of big data processing in machine learning (ML) and proposed a framework for systematizing information. In [14], V. Vashishth et al. analyzed the integration of cloud technologies and IIoT and proposed a method for optimal access to remote resources. D. Tarek et al. proposed a distributed protocol for managing large traffic volumes in IoT systems [15]. In [16], S. Sennan et al. studied the problem of data clustering in IoT systems and proposed an optimization algorithm for Cluster Head (CH) selection. In [17], M.C. Adikari and T.P. Amalan proposed a software model with optimal operating parameters for big data analytics and ML in FMCG (Fast Moving Consumer Goods). In [18], B. Raviteja et al.
investigated the relevance of using ML algorithms to improve the efficiency of Supply Chain Management (SCM) systems. G.K. Singh et al. in [19] considered the importance of Big Data analysis for solving business and logistics management problems. H. Fan et al. analyze the operation of IIoT systems as a component of the Industry 4.0 concept and propose a federated learning-based privacy-preserving data aggregation scheme [20]. In [21] W. Gao et al. determine the importance of applying federated learning in IIoT systems and propose a resource allocation scheme that allows for choosing the best learning devices. Liang et al. in [22] also investigate the problem of big data processing in IIoT systems and the selection of optimal end devices for training. The authors also propose a Deep Q-Network (DQN)-based scheme for optimal resource allocation. In [23], K. Shah et al. analyze the features of the application of recommender systems for intelligent user selection of goods and services. Z. Guo et al. in [24] investigate the use of recommender systems for IIoT. The authors offer their own implicit feedback-based group recommender system. B. Wu et al. consider the problem of determining the relationships between users and goods in IoT systems and provide a framework for improving item recommendation [25]. In [26], S.M. Kasongo investigated the importance of intrusion detection in IIoT systems and proposed their Intrusion Detection Systems (IDSs), which are more efficient and reliable than existing ones. A. Simeone et al. in [27] analyze the service of users of intelligent production and proposed the provision of personal recommendations using an intelligent decision-making recommender system. In [28], T. Vafeiadis et al. offer a smart recommender system that analyzes data from various IoT devices and enhances decision support. Y. Fu et al. investigate the problem of intelligent resource management in 5G networks [29]. In [30], A. Vulpe et al. 
consider performance indicators of the LTE RAN network to detect possible malfunctions of the terminal equipment. The authors also offer their analytics framework that can be used for 5G networks. Yongchang Wang and L. Zhu [31] investigate and compare different big data optimization methods. The authors note the advantages of the SVD algorithm for determining the most important information for processing. C. Zhang et al. [32] investigate the methods of automated processing of big data and propose a feature extraction method that uses SVD and PCA (Principal Component Analysis) algorithms. K. Birul offers a modified Funk SVD that uses data on all products and only one user to form recommendations [33]. S. Guo et al. consider the Funk SVD algorithm for improving the performance of recommender systems [34]. As is clear from the analyzed sources, processing big data in industrial and 5G systems remains an open problem. However, user data analysis using recommender systems that takes into account dynamic changes in service quality requirements, system performance indicators, etc., has not been sufficiently researched. In our previous work [35], we investigated improving the performance of recommender systems and proposed a modified federated SVD, which demonstrated the accuracy and reliability of the results. Extending that research, in this work we use the FunkSVD algorithm to process sparse data. We also propose modifying the Funk SVD algorithm for more efficient use under different system parameters and loads in 5G systems. The proposed modifications improve both the accuracy and speed of providing recommendations to users and the analytics of production processes.

3. Challenges and Solutions for 5G Industrial Recommendation Systems
5G industrial recommendation systems have the potential to revolutionize the way factories and industrial plants operate, by enabling real-time data collection and analysis, predictive maintenance, and optimized production processes.
Overall, the successful implementation of 5G industrial recommendation systems requires a combination of advanced technologies, secure data transmission protocols, and efficient data processing and analysis algorithms. For the efficient operation of ML algorithms, it is necessary to use large amounts of information. The development of information technologies and rapid digitalization contribute to the daily generation of significant data sets from various sources. Thanks to such data, it is possible to better identify features characteristic of individual tasks to accurately determine the result. However, the excess of information still complicates the work of computational algorithms. The processing and storage of large data arrays lead to the consumption of computing resources and the slowing down of calculations. The diversity of information sources also requires the adaptation of ML algorithms and distributed data processing [36-38]. Federated learning is one of the methods of effective organization of distributed data processing using ML [39]. Instead of sending all the data from different distributed nodes to each other, they are processed on local nodes. Only the results of the calculations are sent to the controller, which updates the entire machine-learning system. Federated learning is effectively used in industrial systems, as it solves the problem of processing data collected from different end devices. This improves the reliability and privacy of users' private information, as it does not need to be transmitted over the network [40-42]. Statistical analysis is extremely important for operating commercial systems that provide users with certain goods or services. Based on the previous actions of customers, it is possible to determine how satisfied they are with the services they use. Recommender systems (RS), are used to establish relationships between users and goods or services, which are generally called items. 
We can often see examples of recommender systems in everyday life. When visiting websites or applications, users are offered advertisements for items that may interest them in some way. For example, when a user searches for a certain object on a site, the recommender system also identifies similar ones according to certain parameters. Similar products or services are designated as “you may like this,” “other products you may like,” etc. There are different approaches to determining the most appropriate items for users. Recommendations can be personalized, that is, determined by taking into account the characteristics of a specific user. Non-personalized recommendations common to a certain group of users can also be calculated. Recommender systems use special methods for processing user data. The result of an RS is a set of suggestions to users about new products or services they are most likely to like. The most common approaches to the work of recommender systems are:
- Content-based filtering, which analyzes the similarity of different products to each other. If a user liked a certain product or service before, there is a high probability that they will rate similar ones highly in the future. The advantage of this approach is the simplicity of implementation. The disadvantage is that new products and products unlike the others have little chance of getting into the recommendations;
- Collaborative filtering, which forms recommendations by analyzing user profiles. When similarities are found between two or more users, they are recommended products or services previously chosen by other group members. This approach demonstrates high efficiency. At the same time, the recommender system must be prepared in advance with information about users and their preferences. In the absence of such information, problems arise in the formation of recommendations;
- A hybrid approach to the formation of recommendations, combining the above [43].
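To make the collaborative filtering idea above concrete, here is a minimal Python sketch of a user-based predictor. It is an illustration under stated assumptions, not the method evaluated in this paper: the function name `predict_user_based`, the use of cosine similarity over co-rated items, and the convention that 0 marks a missing rating are all hypothetical choices.

```python
import numpy as np

def predict_user_based(ratings, user, item):
    """Predict ratings[user, item] from users with similar rating profiles.

    Assumption: ratings is a (users x items) float array where 0 means
    'no rating'. Returns None when no similar user has rated the item.
    """
    known = ratings > 0
    weighted_sum, weight_total = 0.0, 0.0
    for other in range(ratings.shape[0]):
        if other == user or not known[other, item]:
            continue
        # Compare the two users only on items both of them have rated.
        common = known[user] & known[other]
        if not common.any():
            continue
        a, b = ratings[user][common], ratings[other][common]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        weighted_sum += sim * ratings[other, item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total > 0 else None
```

The prediction is a similarity-weighted average of the ratings that other users gave to the item, which also shows the cold-start limitation mentioned above: with no overlapping ratings, no prediction can be made.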
Intelligent production systems focus not only on producing certain products but also on their sale to end users. Determining the level of interest in certain goods or services is extremely important for the system's effective functioning [44, 45]. A situation often arises when a product is not in demand among users due to its features. By monitoring feedback and sales statistics, possible modifications and improvements of services and goods can be identified and immediately implemented. Automating the management of the intelligent production system contributes to the prompt resolution of existing problems and the prevention of new ones. The 5G wireless network is an important component of smart manufacturing. Since many end IoT devices are located at a considerable distance from each other, wireless communication makes it possible to combine them into a single data exchange system. End users can connect to the network when they need to send or receive information. 5G technology makes it possible to provide various communication services at high speed and ensure the required quality of service. However, large amounts of data exchanged between users and the system create a significant load on communication channels and processing devices. As a result, some of the services may be of lower quality or may not be available at all. The architecture of the industrial system using the 5G network for visual data communication from IIoT devices is shown in Fig. 1.

Figure 1: The architecture of the industrial system using the 5G network for visual data transmission.

ML and artificial intelligence methods are used to optimize the operation of 5G networks. Analysis of the provided services and load changes depending on various parameters helps to better adjust the system parameters. ML methods make it possible to distribute network resources according to user needs and quickly resolve problem situations.
Thus, the mobile network can flexibly adapt to different operating conditions and determine priorities in user service. Once the most heavily loaded areas of the mobile communication system are identified, additional resources can be allocated to them automatically. For this purpose, the load and performance indicators of the system are monitored constantly. The processing of user requests in 5G systems is shown in Fig. 2.

Figure 2: Processing user requests in 5G systems.

The different types of traffic provided to users of 5G systems also require special attention to service priorities and to the allocation of the necessary communication resources [46]. Multimedia data transmission uses more power than voice messages. The reliability and confidentiality of user data should also be ensured when forwarding them for processing. ML methods solve the problems of big data optimization, analysis, and decision-making. Because of the large number of end users, ML algorithms receive a large amount of training data. However, since the data often contains redundancy and is too cumbersome to compute on, it should be pre-processed, and the most important items should be selected.

4. Comprehensive Study of SVD and Funk SVD Algorithms for Visual Data Optimization in Industrial 5G Recommender Systems
The improvement of industrial systems has made it necessary to process huge data sets. The information coming from different devices is often of various types, unstructured, and of large volume. Big data processing devices usually cannot process it efficiently, so users do not get timely results. For faster calculations, data should be pre-prepared and brought to a clear and concise form. Optimizing big data and discarding redundancy is one of the most relevant tasks for modern industrial systems. ML, Deep Learning, and Data Mining methods are used for intelligent information processing.
There are many ways to transform data of a higher dimension into a smaller one while preserving its properties with a certain level of accuracy. For example, one of the most popular methods is Principal Component Analysis (PCA), which solves the problem of dimensionality reduction while preserving as many properties of the original data as possible. PCA represents data sequences as a set of linearly uncorrelated variables, the principal components (PCs). Then, the most informative variables are selected, and unimportant ones are discarded. The Latent Semantic Analysis (LSA) algorithm is similar to PCA, but it is used more to analyze text sequences. Both methods can be extended to the Singular Value Decomposition (SVD) algorithm, an improved way to identify the most important data and discard redundancy. SVD decomposes the original matrix into the product of three matrices: two orthogonal matrices and a diagonal matrix of singular values. This method is well suited to processing and optimizing matrices that are not square. Compared to the PCA algorithm, SVD more effectively determines the fundamental properties in data arrays and considers their diversity. Singular Value Decomposition involves representing the initial data matrix $A(m, n)$ as:

$$A = P \times \Delta \times Q, \tag{1}$$

where matrix $P$ has dimension $(m, m)$, matrix $Q$ has dimension $(n, n)$, and matrix $\Delta$ has dimension $(m, n)$ and demonstrates the relationship between $P$ and $Q$. SVD is widely used for recommender systems and has repeatedly proven its effectiveness [31]. However, in real systems where large data from various devices are processed, several problems arise. First, the information should be brought to a form suitable for forming recommendations. Thus, all data on the interaction of users and items should be presented as a numerical score. Methods of pre-processing and optimization of information are used to solve this problem.
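The decomposition in eq. (1) and the idea of discarding the least important components can be sketched with NumPy. This is an illustrative sketch only: the helper names `truncated_svd` and `reconstruct` are hypothetical, and `numpy.linalg.svd` returns the singular values as a vector and the third factor already in the orientation of $Q$ in eq. (1).

```python
import numpy as np

def truncated_svd(a, k):
    """Keep the k largest singular values of a and their vectors.

    Returns (P_k, s_k, Q_k) with shapes (m, k), (k,), (k, n), so that
    P_k @ diag(s_k) @ Q_k approximates a, mirroring A = P x Delta x Q.
    """
    p, s, q = np.linalg.svd(a, full_matrices=False)
    return p[:, :k], s[:k], q[:k, :]

def reconstruct(p, s, q):
    """Multiply the three factors back into a matrix of the original shape."""
    return p @ np.diag(s) @ q
```

For a matrix whose rows are all multiples of one pattern, `truncated_svd(a, 1)` already reconstructs it, which is the sense in which SVD keeps the most important information and drops redundancy.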
It is also important to consider that only under ideal conditions can we obtain almost all the data on the evaluations of each item from all users and form the recommendation matrix. In such a table, the rows correspond to users, the columns correspond to items, and the cells at their intersections hold the ratings, that is, evaluations. In real recommender systems, we receive only part of the information about users' interest in certain products. Many products have not yet received customer reviews, and recently registered users do not have a purchase history. Recommender systems are designed to fill empty cells in recommendation tables but should also process data more compactly and efficiently. Recommendation tables usually have dimensions $(n, m)$, where $n$ is the number of users and $m$ is the number of items. Because the data matrices are sparse, it does not make sense to process all of their cells: empty cells carry no information for forming recommendations but consume a lot of computing resources. To determine the most relevant services for specific categories, recommender systems that establish correspondences between users and products should be used. Thus, the efficiency of service delivery is improved. 5G recommender systems face the problem of processing extremely large data, much of which is not important for forming recommendations. Recommendation matrices look like this:

$$A(n, m) = \begin{pmatrix} a_{11} & \text{N/A} & \dots & a_{1m} \\ a_{21} & a_{22} & \dots & \text{N/A} \\ \dots & \dots & \dots & \dots \\ a_{n1} & \text{N/A} & \dots & a_{nm} \end{pmatrix}, \tag{2}$$

where $a_{ij}$ is the value of the interaction between user $i$ and item $j$, and N/A denotes data that is not available or unknown. An excess of information leads to slower calculations and overloaded devices. Sparse data matrices are better optimized by discarding empty cells or those containing unimportant information.
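As an illustration of discarding empty cells from a matrix such as eq. (2), the following Python sketch keeps only the known ratings as (user, item, rating) triplets and measures how sparse the matrix is. It assumes, for the example only, that N/A cells are encoded as NaN; the function names are hypothetical.

```python
import numpy as np

def observed_triplets(a):
    """List (user, item, rating) for every known cell of a.

    Assumption: unknown (N/A) cells are stored as NaN.
    """
    users, items = np.nonzero(~np.isnan(a))
    return [(int(u), int(i), float(a[u, i])) for u, i in zip(users, items)]

def sparsity(a):
    """Share of cells in a that are unknown, between 0 and 1."""
    return float(np.isnan(a).mean())
```

Storing triplets means that the cost of later processing grows with the number of known ratings rather than with the full size of the table.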
For sparse data processing, the paper proposes to improve the existing Funk SVD algorithm, which decomposes the initial matrix $A$ into the product of two submatrices [34, 35]:

$$A(n, m) = M(n, k) \times N^{T}(k, m), \tag{3}$$

where $k < n$ and $k < m$. The FunkSVD algorithm uses the Stochastic Gradient Descent method to gradually reduce the error between the original matrix and the resulting one. Thus, it is possible to determine the Sum of the Squared Error for each decomposition:

$$\min \sum_{i \in n,\, j \in m} \left( a_{i,j} - m_i \times n_j^{T} \right)^2 + \left( |m_i|^2 + |n_j^{T}|^2 \right), \tag{4}$$

where $a_{i,j}$ are the elements of the original recommendation matrix, and $m_i$, $n_j$ are the elements of matrices $M$ and $N$, respectively. The values of the elements $a'_{i,j}$ of the initial matrix are obtained by multiplying the rows and columns with the corresponding indices:

$$a'_{i,j} = \sum_{i,j=k} \left( m_i \times n_j^{T} \right). \tag{5}$$

The calculation error can be defined as follows:

$$Err = a_{i,j} - a'_{i,j}. \tag{6}$$

We can update the element of matrix $M$ using the learning coefficient $\varepsilon$ and the correction coefficient $\theta$:

$$m'_i = m_i + \varepsilon \left( Err \times 2 \times n_j + \theta \times m_i \right). \tag{7}$$

Let us also calculate the element of matrix $N$:

$$n'_j = n_j + \varepsilon \left( Err \times 2 \times m_i + \theta \times n_j \right). \tag{8}$$

Then the user's recommendation is determined as:

$$a'_{i,j} = m'_i \times n'^{T}_j + d_{i,j}, \tag{9}$$

where $d_{i,j}$ is the total coefficient of deviation of the user and product indicators from the total:

$$d_{i,j} = d_i + d_j + \Delta, \tag{10}$$

where $d_i$ is the deviation from the average value for the user, $d_j$ is the same for the item, and $\Delta$ is the regularization factor.
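The update rules in eqs. (6)-(8) can be sketched as a short stochastic gradient descent loop in Python. This is a minimal sketch under stated assumptions, not the program model used in the paper: the function name `funk_svd`, the default values of `k`, `eps`, `theta`, and `epochs`, and the random initialization are hypothetical, and the deviation term $d_{i,j}$ from eqs. (9)-(10) is omitted. The correction coefficient is applied with the sign written in eqs. (7)-(8), so a negative `theta` is assumed here to obtain the usual shrinking of the factors.

```python
import numpy as np

def funk_svd(triplets, n_users, n_items, k=2, eps=0.01, theta=-0.02,
             epochs=200, seed=0):
    """Factorize known ratings into M (n_users x k) and N (n_items x k).

    triplets: iterable of (user i, item j, rating a_ij) for known cells only.
    eps is the learning coefficient, theta the correction coefficient
    (all default values are illustrative assumptions).
    """
    rng = np.random.default_rng(seed)
    m = rng.normal(scale=0.1, size=(n_users, k))
    n = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for i, j, a_ij in triplets:
            err = a_ij - m[i] @ n[j]                          # eq. (6)
            m_i_old = m[i].copy()                             # keep m_i for eq. (8)
            m[i] += eps * (err * 2 * n[j] + theta * m[i])     # eq. (7)
            n[j] += eps * (err * 2 * m_i_old + theta * n[j])  # eq. (8)
    return m, n
```

A predicted rating for any cell, including an empty one, is then `m[i] @ n[j]`. Because the loop visits only known ratings, empty cells add no computation.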
The accuracy and efficiency of the recommender system can be determined by presenting its confusion matrix (Table 1):

Table 1
Confusion matrix for recommendation systems

| | Actually Interested | Actually Not Interested |
| Recommended | $K_{\text{true positive}}$ | $K_{\text{false positive}}$ |
| Not Recommended | $K_{\text{false negative}}$ | $K_{\text{true negative}}$ |

According to Table 1, $K_{\text{true positive}}$ is the probability that the recommended item interested the user; $K_{\text{false positive}}$ corresponds to the probability of a positive recommendation for an uninteresting item; $K_{\text{false negative}}$ is the probability of a negative recommendation for an interesting item; $K_{\text{true negative}}$ is the probability of a negative recommendation for an uninteresting item. The probability of giving a positive recommendation if the item is interesting to the user can be represented as:

$$K_{\text{true}}^{+} = \frac{K_{\text{true positive}}}{K_{\text{true positive}} + K_{\text{false negative}}}. \tag{11}$$

The probability of giving a negative recommendation if the product is not interesting to the user is:

$$K_{\text{true}}^{-} = \frac{K_{\text{true negative}}}{K_{\text{true negative}} + K_{\text{false positive}}}. \tag{12}$$

The accuracy (precision) of providing recommendations can be calculated as:

$$K_{\text{positive}} = \frac{K_{\text{true positive}}}{K_{\text{true positive}} + K_{\text{false positive}}}. \tag{13}$$

However, to evaluate the effectiveness of the FunkSVD algorithm, the Mean Absolute Error (MAE) of the recommendations can be determined directly for matrix $A$ with dimension $N$:

$$K_{\text{mean}} = \frac{\sum_{i,j=1}^{N} \left| a_{i,j} - a'_{i,j} \right|}{|N|}. \tag{14}$$

In this work, we use the Root Mean Square Error (RMS Error, RMSE) to check the accuracy of the recommendation calculations:

$$K_{\text{rmse}}(\text{Accuracy}) = \sqrt{\frac{\sum_{i,j=1}^{N} \left( a_{i,j} - a'_{i,j} \right)^2}{|N|}}. \tag{15}$$

To study the effectiveness of data processing by the FunkSVD algorithm, a program model was created in the Python programming language, and data from an open recommender system was used. For a better understanding of the algorithm's operation, data of different volumes and with different levels of sparseness were extracted.
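The quality measures in eqs. (11)-(15) can be sketched in Python as follows. This is an illustrative sketch, not the paper's program model: the function names are hypothetical, the confusion matrix entries are passed as plain counts, and the errors are averaged over known ratings only, with NaN assumed to mark unknown cells.

```python
import numpy as np

def confusion_rates(tp, fp, fn, tn):
    """Ratios from the confusion matrix: eqs. (11), (12) and (13)."""
    return {
        "true_positive_rate": tp / (tp + fn),   # eq. (11)
        "true_negative_rate": tn / (tn + fp),   # eq. (12)
        "precision": tp / (tp + fp),            # eq. (13)
    }

def mae(a, a_pred):
    """Mean Absolute Error over known cells of a, eq. (14)."""
    known = ~np.isnan(a)
    return float(np.abs(a[known] - a_pred[known]).mean())

def rmse(a, a_pred):
    """Root Mean Square Error over known cells of a, eq. (15)."""
    known = ~np.isnan(a)
    return float(np.sqrt(((a[known] - a_pred[known]) ** 2).mean()))
```

RMSE penalizes large individual errors more strongly than MAE, which makes it the stricter of the two measures when a few predictions are far from the true ratings.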
A comparison of calculation durations for the SVD and Funk SVD algorithms is shown in Fig. 3. Data matrices with different degrees of sparseness, i.e., different shares of cells filled with information, were studied.

[Figure 3 plot: execution time (µs, 0 to 900) versus data sparsity (5, 10, 15, 20, 30, 50, 75 %) for SVD and Funk SVD.]

Figure 3: The comparison of execution time by SVD and Funk SVD algorithms.

As Fig. 3 shows, SVD takes longer, and the difference from Funk SVD grows as the percentage of data sparsity increases. If the system processes a lot of redundant data, Funk SVD makes it possible to discard part of it and speed up the calculation.

5. Modified Funk SVD Approach for Improving the Execution Time of Recommendation Systems
Although FunkSVD works quite well with sparse data, it can still be improved to speed up the calculations. According to eq. (5), the Funk SVD algorithm uses data about all users and products to determine the unknown cell value of the initial data matrix. In this paper, we propose using a different number of users δ in the calculations to modify the existing Funk SVD algorithm:

$$a'_{i,j} = \sum_{i=\delta,\, j=k} \left( m_i \times n_j^{T} \right), \tag{16}$$

where δ