Parallel Optimization of Dimensionality Reduction Methods for Disease Prediction: PCA and LDA with Dask-ML

Lesia Mochurad (lesia.i.mochurad@lpnu.ua), Lviv Polytechnic National University, 12 Bandera street, 79013 Lviv, Ukraine

Liliana Mirchuk (liliana.mirchuk.shi.2022@lpnu.ua), Lviv Polytechnic National University, 12 Bandera street, 79013 Lviv, Ukraine

Anastasiia Veretilnyk (anastasiia.veretilnyk.shi.2022@lpnu.ua), Lviv Polytechnic National University, 12 Bandera street, 79013 Lviv, Ukraine

2024 Cambridge MA USA

ISSN 1613-0073

Keywords: medical data processing, machine learning, ResNet-50, parallel computing, cholangiocarcinoma diagnosis

In modern medicine, an urgent problem is to improve the results of cancer prognostication, in particular, to increase the accuracy and reduce the time needed to obtain a solution. In this paper, we propose to reduce the dimensionality of data at the preprocessing stage using principal component analysis (PCA) and linear discriminant analysis (LDA), and we compare the performance and efficiency of the two methods. These methods are known to be time-consuming, which is critical when solving a forecasting problem. To overcome this, the paper proposes to parallelise PCA and LDA using Dask-ML technology. The ResNet-50 model was used to diagnose the disease. The proposed approach achieved an accuracy of 85.2%, which is about 3% higher than the results reported in previous studies. The results indicate that data preprocessing and dimensionality reduction help to avoid ill-posed problems and improve the accuracy of prediction. In addition, we were able to significantly reduce preprocessing time by parallelising the PCA method. In future research, we plan to further improve medical data processing methods, including exploring other approaches to dimensionality reduction and integrating the latest machine learning algorithms to improve prediction accuracy.

Introduction

Timely diagnostics in medicine is extremely important as it allows detecting diseases at early stages, which increases the effectiveness of treatment, reduces its cost and prevents the development of complications. Early diagnostics also improves the quality of life of patients, reduces the burden on the healthcare system, and contributes to the prevention and control of infectious diseases [1]. The development of technologies, such as artificial intelligence, increases the accuracy and speed of diagnostics, making it a key element of modern medicine [2], [3].

It has been reported [4] that patients with cancer had an overall average time to diagnosis of 156.2 (164.9) days, and 15.4% of patients waited longer than 180 days before receiving a diagnosis. Computer-aided diagnostics based on deep learning and images of pathological tissues are often used in cancer diagnosis. However, despite the availability of databases for cancer detection, there is still no accurate method for predicting the disease. Histological examinations, which are very important for the diagnosis and treatment of diseases, pose significant difficulties: when detecting cancer in tissue images, researchers often face ill-posed inverse problems that require special attention [5]. In addition, huge amounts of medical data are generated every day, and their analysis is complicated by factors such as noise, missing data, and high dimensionality. For example, the diagnosis of malignant tumours requires the use of various sources of information [6]. Preprocessing is used to improve the handling of medical data and to overcome the difficulties encountered in their analysis [7]. Our analysis of the scientific sources has shown that parallelised preprocessing can provide better results than a sequential process [8].

The relevance of the research conducted in this paper is that there is a lack of efficiency in timely cancer diagnosis, which can lead to severe consequences for the health and life of patients, insufficient accuracy of the results [9] and few solutions for processing multidimensional databases consisting of a large number of images in medicine. In this study, we consider various parallel computing methods and technologies to reduce the multidimensionality of the database at the preprocessing stage, which will allow for better accuracy and not significantly increase the overall solution time.

Challenges in analysing and predicting patient diagnoses include issues such as incomplete or inaccurate data, poorly formulated models, incorrect assumptions, high data dimensionality, and lack of standards or methodological guidance. These factors can make it difficult to make accurate predictions, lead to distorted results, and degrade the quality of diagnosis. Correcting these problems requires correct data preprocessing methods, adequate mathematical models and clear standards to ensure the accuracy and reliability of predictions.

Preprocessing, as a way of solving ill-posed problems, helps to avoid complexities, in particular the 'curse of dimensionality', by reducing the multidimensionality of the database to smaller dimensions that still retain important information. This includes methods such as linear discriminant analysis (LDA) and principal component analysis (PCA), which are used to improve data quality in machine learning, in particular to increase classification and regression accuracy [10]. The optimal choice of preprocessing methods demonstrates significant improvements in prediction, emphasising the importance of this stage in data processing.

In [11], the authors propose a multidimensional choledochal database, which we used to test the effectiveness of the proposed approach. This database contains both microscopic hyperspectral images and colour RGB images in the same field of view for deep learning studies. All the images in this database have been evaluated and labelled by experienced pathologists, making them suitable for training neural networks. This database is very useful for researchers to learn new multivariate deep learning algorithms for pathological diagnosis, as it contains morphology, spectrum, and information about biochemical changes of the samples. Few three-dimensional databases for research have been published on the Internet. To date, the presented multidimensional choledochal database is the first publicly available database of choledochal pathology that contains both microscopic hyperspectral and colour RGB images with annotations of choledochal sections.

The authors of [12] propose a method for early detection of cholangiocarcinoma from hyperspectral images of microscopic tissues using the ResNet-50 model, which achieves an accuracy of 82.4%. In contrast to that work, we additionally consider reducing the multidimensionality of the database using two methods: PCA and LDA. In addition, we parallelised these methods using the modern parallel computing technologies Dask-ML and MPI, which allowed us to significantly improve data processing efficiency and diagnostic accuracy.

In [13], the authors use hyperspectral imaging (HSI), which offers a promising way to improve liver cancer diagnosis due to its ability to capture detailed continuous spectral and spatial information that is beyond the visible range of the human eye. Classification of cholangiocarcinoma using HSI is challenging due to its high dimensionality. To solve this problem, this article presents a network called MedisawHSI. As a result, they managed to achieve an accuracy of 93.35%. As we can see, the authors managed to achieve better accuracy compared to [12]. In our opinion, the accuracy of the results has improved because they used the division of the hyperspectral image into smaller overlapping regions, which are then classified individually based on their spectral characteristics.

In our study, we propose a method to enhance the accuracy and efficiency of cancer prognostication by reducing data dimensionality through PCA and LDA, parallelized using Dask-ML technology. This approach not only improves diagnostic accuracy but also significantly reduces preprocessing time. Similar to our efforts to address the challenges of time-consuming processes in medical data analysis, recent advancements in the Internet of Medical Things (IoMT) have also focused on improving the security and efficiency of data handling. For instance, a Timestamp-based Secret Key Generation (T-SKG) scheme has been developed for resource-constrained IoMT devices to ensure secure data transmission without direct key sharing, thereby addressing vulnerabilities in traditional key sharing mechanisms [14]. This parallel development in secure data processing complements our efforts to enhance the reliability and speed of medical diagnostics.

The purpose of our article is to compare the efficiency of reducing the multidimensionality of the database using PCA and LDA methods applied at the preprocessing stage and parallelised using Dask-ML and MPI technologies to reduce processing time and improve the accuracy of data analysis in medicine, in particular, in the diagnosis of cholangiocarcinoma.

The main contributions of the paper are as follows:

• An improved approach to reducing data dimensionality using PCA and LDA methods is proposed, contributing to the accuracy of disease prediction.

• For the first time, Dask-ML technologies are used to parallelize PCA and LDA methods, significantly reducing data processing time.

• A comparative analysis of the performance and efficiency of PCA and LDA methods in reducing data dimensionality to enhance forecasting results is conducted.

Methods and materials

Overview of the algorithm of the proposed approach

The proposed approach consists of two parts and is schematically presented in Figure 1:

1. First, we apply data preprocessing to reduce the dimensionality of our multidimensional database while preserving its meaningful characteristics. We consider two methods of dimensionality reduction: PCA and LDA. To determine the effectiveness of each method, we parallelise them and analyse the results. To parallelise the methods, we use the Dask-ML library [15]. Dask-ML is a toolkit that provides parallelised implementations of machine learning algorithms. Specifically, for PCA, we use Dask-ML PCA, which works with Dask Array to represent data and automatically distributes computations across multiple processors or computers.

2. Next, we work with the already processed data, namely the smaller database obtained after applying a dimensionality reduction method such as PCA or LDA. The ResNet-50 network is used for classification in order to evaluate the impact of reducing the dimensionality of the database as a way to address an ill-posed problem in cancer diagnosis.

Overview of sequential PCA and LDA methods and their comparative characteristics

PCA is a statistical procedure that uses an orthogonal transformation to convert a group of correlated variables into a group of uncorrelated variables. Instead of discarding weak predictors, PCA generates new predictors that are uncorrelated. In general, PCA is most useful when the data set contains correlated predictors; a further difficulty is the choice of the number of principal components [16]. The main goal of LDA is to project a dataset with a large number of features onto a lower-dimensional space with good class separability, which reduces computational costs [17]. Dimensionality reduction techniques often require intensive computation and do not easily scale to large datasets. Recent advances in high-throughput measurements with physical devices such as sensors, as well as complex numerical simulations, generate data of extremely high dimensionality, and it is becoming increasingly difficult to process such data sequentially. In the literature, we found that these methods have been parallelised on distributed-memory machines with MPI. The reported results show that this design provides very good scalability for large problem sizes across the entire range of tested processor configurations [18], [19].

When applying the PCA method, we first convert the data into feature vectors (one image is represented as one feature vector containing the pixels of the image). We then standardise the data so that all features have the same weight.
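The flattening and standardisation step described above can be illustrated with a minimal NumPy sketch. The function name `images_to_standardized_features`, the synthetic 4 x 5 RGB images, and the guard for constant pixels are illustrative assumptions and are not taken from the original implementation.

```python
import numpy as np

def images_to_standardized_features(images, eps=1e-12):
    """Flatten each image to one row and standardise every feature column."""
    images = np.asarray(images, dtype=np.float64)
    n_images = images.shape[0]
    features = images.reshape(n_images, -1)      # one feature vector per image
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=1)           # sample standard deviation
    std = np.where(std < eps, 1.0, std)          # constant pixels stay at zero after centring
    return (features - mean) / std

# Hypothetical toy data: 6 synthetic RGB images of size 4 x 5
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 256, size=(6, 4, 5, 3))
X_std = images_to_standardized_features(toy_images)
print(X_std.shape)   # (6, 60)
```

For images of shape (1728, 2304, 3), the same reshaping would give 1728 x 2304 x 3 = 11,943,936 features per image, which is why the cost of the subsequent steps matters.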

An important step in understanding the relationships between attributes is the covariance matrix that we build for the standardised data. A covariance matrix is a square matrix that contains the covariances between all pairs of variables in the data set. Covariance measures how much two variables change together. Figure 2 shows a part of the covariance matrix with a size of 100 × 100. In this case, we obtained:

1. The diagonal elements are equal to 1, indicating that each variable is perfectly correlated with itself. For standardised data, these values are always 1.

We can see that most of the matrix elements have positive covariance, which indicates that the variables in our dataset are likely to be correlated with each other. Now we can calculate the eigenvectors and eigenvalues. We sort the eigenvectors in descending order of their eigenvalues in order to select the principal components. In our case, the number of components was 2 because we were reducing the database to two dimensions. Each dot in Figure 3 corresponds to one image. The colours of the dots differ depending on the folder to which the image belongs: 'L', 'N' or 'P'.

This figure allows you to visually understand the distribution of images in three-dimensional space in terms of their height, width, and color representation in RGB channels.

Figure 3: 3D model
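The sequential PCA steps described above (covariance matrix of the standardised data, eigenvectors sorted by descending eigenvalue, projection onto two components) can be sketched as follows. This is a small NumPy illustration on synthetic data, not the authors' code; the function name `pca_project` and the data sizes are assumptions.

```python
import numpy as np

def pca_project(X_std, n_components=2):
    """Project standardised data onto the leading eigenvectors of its covariance matrix."""
    n_samples = X_std.shape[0]
    cov = (X_std.T @ X_std) / (n_samples - 1)    # covariance of centred data
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-order to descending eigenvalues
    components = eigvecs[:, order[:n_components]]
    return X_std @ components, eigvals[order], cov

# Hypothetical toy data: 50 samples with 8 features each
rng = np.random.default_rng(1)
raw = rng.normal(size=(50, 8))
X_std = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
scores, eigvals, cov = pca_project(X_std, n_components=2)
print(scores.shape)   # (50, 2)
```

Because the data are standardised with the sample standard deviation, the diagonal of `cov` equals 1, as noted in the discussion of Figure 2. Forming the full covariance matrix is only practical for a moderate number of features; for very wide data a decomposition of the data matrix itself is the usual alternative.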

As mentioned before, the multidimensional choledoch database consists of images of three types. That is why we did not reduce the multidimensionality of the entire database at once, but separately for L, P and N. We saved the reduced images in .h5 format. Figure 4 shows the original image. An overview of the key characteristics and differences between PCA and LDA is provided in Table 1.

Formal description of the parallel PCA and LDA algorithm using Dask-ML

The next stage of our research involves parallelizing PCA and LDA using the Dask-ML library [20], which allows us to scale computations to multiple processors or computers in a cluster. This is especially useful when working with large databases.
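One reason this kind of computation distributes well is that the covariance matrix can be assembled from partial results computed independently on blocks of rows. The sketch below illustrates the idea with NumPy and standard-library threads as a stand-in for Dask-ML; none of the names in it are Dask-ML API, and the chunk and worker counts are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def chunk_stats(chunk):
    """Per-chunk row count, column sums and raw cross-product matrix."""
    return chunk.shape[0], chunk.sum(axis=0), chunk.T @ chunk

def chunked_covariance(X, n_chunks=4, max_workers=4):
    """Covariance matrix assembled from independent per-chunk partial results."""
    chunks = np.array_split(X, n_chunks, axis=0)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(chunk_stats, chunks))
    n = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    gram = sum(p[2] for p in parts)
    mean = total / n
    # sum_i (x_i - m)(x_i - m)^T = sum_i x_i x_i^T - n * m m^T
    return (gram - n * np.outer(mean, mean)) / (n - 1)

# Hypothetical toy data: 1000 samples with 6 features each
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(1000, 6))
cov_chunked = chunked_covariance(X_demo)
```

Each chunk is touched only once and the partial results are small, so the reduction step is cheap compared with the per-chunk work.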

Stages of the parallel PCA algorithm using Dask-ML:

1. Calculation of the covariance matrix

• The input data is centered by subtracting the mean value of each feature, as given by formula (1):

$$X_{\mathrm{centered}} = X - \mathrm{mean}(X); \qquad (1)$$

• The covariance matrix is calculated from the centered data, see (2):

$$\Sigma = \frac{1}{n-1} X_{\mathrm{centered}}^{T} X_{\mathrm{centered}}. \qquad (2)$$

2. Calculation of Householder coefficients

• Based on the resulting covariance matrix, we calculate the Householder coefficient required to update the matrices Q and R, where Q is the orthogonal matrix and R is the upper triangular matrix of the QR decomposition;

• We calculate the norm of the vector v for its normalization, as given by formula (3):

$$\|v\| = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}; \qquad (3)$$

• We create a normalized vector u using formula (4):

$$u = \frac{v}{\|v\|}; \qquad (4)$$

• We calculate the coefficient β used to create the Householder reflection:

$$\beta = \frac{2}{u^{T} u}. \qquad (5)$$

3. Update of the matrices Q and R

• Divide the calculations of Q and R into parallel parts;

• Each column i of the matrix A is processed separately:

o Calculate a part of the matrix Q ($Q_i$);

o Calculate a part of the matrix R ($R_i$);

• Combine the parts along axis 1 (horizontally) to obtain the full matrices Q and R.

4. Calculation of eigenvalues and eigenvectors

• Eigenvalues are calculated from the diagonal elements of the matrix R: $\lambda = \operatorname{diag}(R)^2$;

• Eigenvectors are calculated by solving a system of linear equations with the Gaussian method for each eigenvalue: $A e_i = \lambda_i e_i$;

• This process is parallelized to speed up the calculations:

o The array of eigenvalues is divided into N subparts, where N is the number of threads;

o The eigenvectors for the subparts are calculated simultaneously.

5. Conversion of eigenvectors to the basis A

• Eigenvectors obtained from the matrix R are converted to the basis A by multiplying by the matrix Q, as shown in (6):

$$e_A = Q e_R. \qquad (6)$$
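The Householder quantities in formulas (3)-(5) can be made concrete with a sequential QR factorisation. The sketch below is a textbook construction in NumPy, not the parallel column-splitting scheme described above. The choice of the vector v (the current column plus or minus its norm along the first coordinate) is the standard one and is an assumption here, since the text does not state how v is formed. The function name `householder_qr` is illustrative.

```python
import numpy as np

def householder_qr(A):
    """QR factorisation of A by successive Householder reflections."""
    A = np.asarray(A, dtype=np.float64)
    m, n = A.shape
    Q = np.eye(m)
    R = A.copy()
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                                   # nothing to eliminate in this column
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])              # sign choice avoids cancellation
        u = v / np.linalg.norm(v)                      # formulas (3)-(4): normalised vector
        beta = 2.0 / (u @ u)                           # formula (5); equals 2 for a unit vector
        R[k:, :] -= beta * np.outer(u, u @ R[k:, :])   # apply H = I - beta * u u^T from the left
        Q[:, k:] -= beta * np.outer(Q[:, k:] @ u, u)   # accumulate Q = H_1 H_2 ... H_k
    return Q, R

# Hypothetical toy matrix
rng = np.random.default_rng(4)
A_demo = rng.normal(size=(5, 3))
Q_demo, R_demo = householder_qr(A_demo)
```

Since u has unit length, β in formula (5) is always 2, and each reflection H = I − βuuᵀ is both symmetric and orthogonal, which is why the accumulated Q stays orthogonal.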

Stages of the parallel LDA algorithm using Dask-ML:

1. Use Dask-ML to calculate the overall mean and center the data in parallel;

2. Use Dask-ML to compute the mean vector of each class in parallel;

3. Calculate the scatter matrix of each class in parallel;

4. Use Dask-ML to calculate the between-class scatter matrix in parallel;

5. Sum the scatter matrices of the individual classes in parallel;

6. Use Dask-ML to calculate eigenvectors and eigenvalues in parallel.
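The LDA stages listed above can be sketched on synthetic data as follows. This is a minimal illustration in which NumPy and standard-library threads stand in for Dask-ML: the per-class statistics are independent tasks, which is what makes them candidates for parallel execution. The function names, the toy class centres, and the use of a pseudo-inverse for the eigenproblem are assumptions, not details from the original implementation.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def class_scatter(X_c):
    """Mean vector, sample count and within-class scatter of one class."""
    mean_c = X_c.mean(axis=0)
    centred = X_c - mean_c
    return mean_c, X_c.shape[0], centred.T @ centred

def lda_fit(X, y, n_components):
    """Return the discriminant directions together with S_W and S_B."""
    classes = np.unique(y)
    groups = [X[y == c] for c in classes]
    with ThreadPoolExecutor() as pool:                 # one independent task per class
        stats = list(pool.map(class_scatter, groups))
    overall_mean = X.mean(axis=0)
    S_W = sum(s for _, _, s in stats)                  # within-class scatter
    S_B = sum(n_c * np.outer(m_c - overall_mean, m_c - overall_mean)
              for m_c, n_c, _ in stats)                # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]             # descending eigenvalues
    W = eigvecs[:, order[:n_components]].real
    return W, S_W, S_B

# Hypothetical toy data: three classes named after the image types, 4 features each
rng = np.random.default_rng(2)
centres = {"L": [0, 0, 0, 0], "N": [4, 0, 1, 0], "P": [0, 4, 0, 1]}
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(30, 4)) for c in centres.values()])
y = np.repeat(list(centres.keys()), 30)
W, S_W, S_B = lda_fit(X, y, n_components=2)
Z = X @ W
print(Z.shape)   # (90, 2)
```

With three classes the between-class scatter matrix has rank at most two, so at most two discriminant directions carry information, in line with the (k-1) limit listed in Table 1.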

Using the ResNet-50 architecture

During the analysis of the literature, several methods of disease prediction for the selected dataset proved to be effective, namely ResNet-50, InceptionResNetV2, Random Forest, etc. We decided to focus on the ResNet-50 method for the following reasons:

− The authors of the article [12] used ResNet-50 in their research and achieved good results. This confirms the effectiveness of this method for our purposes.

− According to our analysis, the InceptionResNetV2 network was used by the authors of [9], but it was designed to work with multidimensional databases. Since our goal is to reduce the dimensionality of the database using parallel PCA and LDA algorithms, the InceptionResNetV2 method does not meet our needs.

− ResNet-50 is known to be an effective method for image classification and has successful results with different types of data. This makes it suitable for our task of predicting diseases from medical images.

Thus, when choosing ResNet-50, we took into account not only the availability of this method in the study, but also its suitability for our specific goals and limitations.

The authors of [12] emphasized the importance of data preparation, which included normalization and cropping of the original images, to achieve good results. We decided to follow these steps by reducing the size of all images, which confirms the adaptability of the method to our case.

To achieve the best performance, we split the dataset into non-overlapping training (6800 images) and test (210 images) sets.

To determine the accuracy of the ResNet-50 neural network, we counted the test cases in which the network gave a correct answer and those in which it did not. We considered four cases:

• True Positive (TP) is a case where a person had cancer and the neural network reported that the person had cancer.

• True Negative (TN) is a case where a person did not have cancer and the neural network correctly reported that the person did not have cancer.

• False Negative (FN) is a case where a person had cancer, but the neural network reported that they did not.

• False Positive (FP) is a case where a person did not have cancer, but the neural network reported that they did.

To evaluate the forecasting efficiency, we chose the following metrics: Recall (the ratio of correctly identified positive cases to all actually positive cases), Precision (the ratio of correctly identified positive cases to all cases that the model identified as positive), and Accuracy (the ratio of correct predictions to the total number of observations). The formal representation of these metrics is given by formulas (7)-(9).

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad (7)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad (8)$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \qquad (9)$$
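As a worked example, the three metrics can be evaluated directly from confusion-matrix counts, following the verbal definitions given above (recall relative to all actually positive cases, precision relative to all cases predicted positive). The counts are those listed in Table 4; the function name `classification_metrics` is an illustrative assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Recall, precision and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)                     # share of actual positives that were found
    precision = tp / (tp + fp)                  # share of predicted positives that were correct
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that were correct
    return recall, precision, accuracy

# Counts taken from Table 4 (210 test images in total)
recall, precision, accuracy = classification_metrics(tp=107, fn=3, fp=28, tn=72)
print(f"recall={recall:.3f} precision={precision:.3f} accuracy={accuracy:.3f}")
# recall=0.973 precision=0.793 accuracy=0.852
```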

Results of numerical experiments

Before presenting the results of the study, we describe the characteristics of the computers on which the calculations were performed:

• Computer models: Apple M1 Pro, Asus VivoBook;

• Operating systems: Windows 10, macOS Sonoma 14.5;

• Available disk space: 200 GB;

• RAM: 16 GB;

• Number of cores: 8;

• Network connection speed: 100 Mbit/s.

Software tools we used to conduct our research: CLion, Kaggle editor, Google Colab.

In our work, we chose the choledoch dataset [21], which consists of three files:

• L - samples with parts of cancerous areas;

• N - complete cancerous areas;

• P - no cancerous areas.

Each file type includes three directories:

• annotation - contains the coordinates of the points where there are cancerous areas;

• hyper - general description of the image, including the number of columns, rows, channels, etc.;

• rgb - contains all multidimensional images.

This dataset is multidimensional. DB dimensionality: (1728, 2304, 3), where the first number is the height of the images, the second is the width, and the third is the number of dimensions.

Accordingly, separately for each type of image, we calculated the execution time of the sequential algorithms when reducing the multidimensional database using PCA and LDA methods and the total execution time. The results are presented in Table 2. The execution time of the program that implements the proposed parallel PCA and LDA methods depending on the number of cores is shown in Table 3. As we can see from Tables 2 and 3, the proposed division into threads and subtasks allowed us to reduce the execution time based on a parallel algorithm, which is important at the preprocessing stage in order to reduce the dimensionality of data without significant time costs.
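Speedup and efficiency are conventionally defined as S_p = T_1 / T_p and E_p = S_p / p, where T_p is the execution time on p cores. The sketch below applies these definitions to the parallel PCA times from Table 3. Taking the one-core time from Table 3 as the baseline T_1 is an assumption made here for illustration; a different baseline, such as the sequential times in Table 2, gives different values.

```python
def speedup_and_efficiency(times_by_cores):
    """Map each core count p to (T_1 / T_p, T_1 / (p * T_p))."""
    base = times_by_cores[1]                    # assumed baseline: time on one core
    return {p: (base / t, base / (t * p)) for p, t in times_by_cores.items()}

# Parallel PCA execution times in minutes, taken from Table 3
parallel_pca_min = {1: 5.012, 2: 3.343, 4: 2.112, 8: 2.005}
results = speedup_and_efficiency(parallel_pca_min)
for cores, (speedup, efficiency) in results.items():
    print(f"{cores} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```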

Next, it is important to analyze the overall diagnostic results. Table 4 shows the number of test cases in which the neural network gave a correct answer and the number in which it did not. The accuracy of our proposed method reached 85.2%, which is about 3% better than in [12]. This leads to the conclusion that data preprocessing and dimensionality reduction help to avoid ill-posed problems and improve the accuracy of the results. At the same time, we also managed to solve the problem of significant time costs at the preprocessing stage by parallelizing the PCA method.

Conclusions

Summarizing the results of this study, it should be emphasized that LDA is slower than PCA. We believe that this is due to the following factors. First, the calculation of covariance matrices: LDA needs to calculate a covariance matrix for each class in the dataset, which can be computationally expensive, especially for large datasets with many classes, whereas PCA only needs to compute one covariance matrix for the entire dataset. Second, the eigenvalue problem: LDA needs to solve a generalized eigenvalue problem, which can be computationally intensive, especially for large matrices, whereas PCA solves an ordinary eigenvalue problem for the covariance matrix of the standardized data, which is usually easier. Third, the number of eigenvalues: LDA typically requires only k eigenvalues to be calculated, where k is the desired projection dimension, while PCA requires all eigenvalues of the covariance matrix to be calculated. Fourth, sensitivity to noise: LDA can be more sensitive to noise in the data than PCA because it uses class label information, which can itself be noisy, while PCA does not use this information. In general, LDA can be slower than PCA due to the more complex computations it requires. When applying the parallel PCA algorithm, we obtained a maximum speedup of about 3.5 times and an efficiency of 0.81. The dimensionality was reduced from three to two, which significantly improved the performance of our diagnostic system.

Figure 1: Diagram of the algorithm of the proposed approach

2. Off-diagonal elements: we can see different colours reflecting the level of covariance between different variables. Lighter colours (closer to yellow) indicate high positive covariance (variables that change in the same direction), while darker colours (closer to black) indicate low or negative covariance (variables that change in opposite directions).

3. The scale on the right shows how the colours of our matrix are interpreted: values closer to 1 indicate high positive covariance, and values closer to -0.2 indicate negative covariance.

Figure 2: Part of the covariance matrix for standardised data

Figure 4: Original image

Figure 5: 2D model

Table 1: Comparative characteristics of PCA and LDA methods for data dimensionality reduction

| Criterion | PCA | LDA |
|---|---|---|
| Purpose | Dimensionality reduction by maximizing the total variance of the data | Reducing dimensionality by maximizing the difference between classes |
| Methodology | Eigenvectors and eigenvalues of the covariance matrix | Eigenvectors and eigenvalues of the scatter matrix between and within classes |
| Orientation | Independent of class | Class oriented |
| Application | Visualization, noise reduction, data preprocessing | Classification, improving the separation between classes |
| Data types | Any data | Class labeled data |
| Advantages | Keeps the maximum amount of variation; useful for data visualization; independent of class | Maximizes the separation between classes; effective for classification tasks; can improve classification results |
| Disadvantages | The loss of interpretation; not always suitable for classification tasks | Assumption of data normality; loss of efficiency with a large number of classes or unequal covariance matrices |
| Dimensionality reduction | To the number of principal components that retain most of the variance | To (k-1), where k is the number of classes |
| Data requirements | Does not require class labels | Requires class labels and assumes a normal distribution of data with equal covariance matrices for each class |

Table 2: Time to execute sequential PCA and LDA methods, min

| Image type | L | N | P | Total time |
|---|---|---|---|---|
| Sequential PCA | 7.01 | 6.43 | 6.57 | 20 |
| Sequential LDA | 7.10 | 7.23 | 7.02 | 21.53 |

Table 3: Execution time of parallel PCA and LDA methods depending on the number of cores, min

| Number of cores | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| Parallel PCA | 5.012 | 3.343 | 2.112 | 2.005 |
| Parallel LDA | 5.451 | 3.020 | 2.343 | 2.151 |

Table 4: ResNet-50 test results

| Actual class | Predicted positive | Predicted negative |
|---|---|---|
| Positive | 107 (TP) | 3 (FN) |
| Negative | 28 (FP) | 72 (TN) |

Next, we calculated the prediction accuracy indicators based on the proposed preprocessing stage and the use of the ResNet-50 network (see Table 5).

Table 5: Indicators of forecasting accuracy

| Result | Recall | Precision | Accuracy |
|---|---|---|---|
| Positive | 0.793 | 0.972 | 0.852 |
| Negative | 0.722 | 0.965 | - |

Acknowledgements

The authors express their gratitude to the Armed Forces of Ukraine for providing the security necessary to perform this work. This work has been made possible only through the resilience and courage of the Ukrainian Army.

References

[1] D. Chumachenko, On Intelligent Multiagent Approach to Viral Hepatitis B Epidemic Processes Simulation, in: IEEE Second International Conference on Data Stream Mining & Processing (DSMP), IEEE, Lviv, Aug. 2018. doi: 10.1109/DSMP.2018.8478602.

[2] L. Mochurad, R. Panto, A Parallel Algorithm for the Detection of Eye Disease, in: Z. Hu, Y. Wang, M. He (Eds.), Advances in Intelligent Systems, Computer Science and Digital Economics IV, Lecture Notes on Data Engineering and Communications Technologies, vol. 158, Springer Nature Switzerland, Cham, 2023. doi: 10.1007/978-3-031-24475-9_10.

[3] A. Raza, A Hybrid Deep Learning-Based Approach for Brain Tumor Classification, Electronics 11 (7) (Apr. 2022) 1146. doi: 10.3390/electronics11071146.

[4] M. Gitlin, N. McGarvey, N. Shivaprakash, Z. Cong, Time duration and health care resource use during cancer diagnoses in the United States: A large claims database analysis, J. Manag. Care Spec. Pharm. 29 (6) (Jun. 2023). doi: 10.18553/jmcp.2023.29.6.659.

[5] A. O. A. Deheyab, An Overview of Challenges in Medical Image Processing, in: Proceedings of the 6th International Conference on Future Networks & Distributed Systems, ACM, Tashkent, Uzbekistan, Dec. 2022. doi: 10.1145/3584202.3584278.

[6] S. Wang, Advances in Data Preprocessing for Biomedical Data Fusion: An Overview of the Methods, Challenges, and Prospects, Inf. Fusion 76 (Dec. 2021). doi: 10.1016/j.inffus.2021.07.001.

[7] V. V. Bozhenko, T. M. Tatarnikova, Application of Data Preprocessing in Medical Research, in: Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), IEEE, St. Petersburg, May 2023. doi: 10.1109/WECONF57201.2023.10148004.

[8] L. Mochurad, Y. Mochurad, Parallel Algorithms for Interpolation with Bezier Curves and B-Splines for Medical Data Recovery, in: 6th International Conference on Informatics and Data-Driven Medicine, vol. 3609, 2023.

[9] Janarththanan Jeyagopal, InceptionRestNetV2 Transfer Learning Approach for Cholangiocarcinoma Diagnosis utilizing Multidimensional Choledochal Database, 2021, unpublished. doi: 10.13140/RG.2.2.21993.10083.

[10] S. Nanga, Review of Dimension Reduction Methods, J. Data Anal. Inf. Process. 09 (03) (2021). doi: 10.4236/jdaip.2021.93013.

[11] Q. Zhang, Q. Li, G. Yu, L. Sun, M. Zhou, J. Chu, A Multidimensional Choledoch Database and Benchmarks for Cholangiocarcinoma Diagnosis, IEEE Access 7 (2019). doi: 10.1109/ACCESS.2019.2947470.

[12] Y. Deng, J. Yin, Y. Wang, J. Chen, L. Sun, Q. Li, ResNet-50 based Method for Cholangiocarcinoma Identification from Microscopic Hyperspectral Pathology Images, J. Phys. Conf. Ser. 1880 (1) (Apr. 2021) 012019. doi: 10.1088/1742-6596/1880/1/012019.

[13] H. Namburu, V. N. Munipalli, M. Vanga, M. Pasam, S. Sikhakolli, S. Chinnadurai, Cholangiocarcinoma Classification using MedisawHSI: A Breakthrough in Medical Imaging, in: Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE), IEEE, Vellore, India, Feb. 2024. doi: 10.1109/ic-ETITE58242.2024.10493579.

[14] S. Saif, P. Das, S. Biswas, S. Khan, M. A. Haq, V. Kovtun, A secure data transmission framework for IoT enabled healthcare, Heliyon 10 (16) (Aug. 2024). doi: 10.1016/j.heliyon.2024.e36269.

[15] S. Raschka, J. Patterson, C. Nolet, Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence, Information 11 (4) (Apr. 2020) 193. doi: 10.3390/info11040193.

[16] P. Misra, A. S. Yadav, Impact of Preprocessing Methods on Healthcare Predictions, SSRN Electron. J. (2019). doi: 10.2139/ssrn.3349586.

[17] G. T. Reddy, Analysis of Dimensionality Reduction Techniques on Big Data, IEEE Access 8 (2020). doi: 10.1109/ACCESS.2020.2980942.

[18] S. K. Samudrala, J. Zola, S. Aluru, B. Ganapathysubramanian, Parallel Framework for Dimensionality Reduction of Large-Scale Datasets, Sci. Program. 2015 (2015). doi: 10.1155/2015/180214.

[19] J. A. J. Alsayaydeh, A. Aziz, A. I. A. Rahman, Development of Programmable Home Security Using GSM System for Early Prevention, ARPN Journal of Engineering and Applied Sciences 16 (1).

[20] S. Raschka, J. Patterson, C. Nolet, Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence, Information 11 (4) (Apr. 2020) 193. doi: 10.3390/info11040193.

[21] Microscopic Hyperspectral Choledoch Dataset, Jul. 20, 2024.