Preprocessing Ground-Based Hyperspectral Image Data for Improving CNN-based Classification Andreas Schliebitz1 , Heiko Tapken1 and Martin Atzmueller2,3 1 Osnabrück University of Applied Sciences, Albrechtstr. 30, 49076 Osnabrück, Germany 2 Osnabrück University, Semantic Information Systems Group, Wachsbleiche 27, 49090 Osnabrück, Germany 3 German Research Center for Artificial Intelligence (DFKI), Hamburger Str. 24, 49084 Osnabrück, Germany Abstract Complex data – like hyperspectral image data – requires adequate preprocessing methods for tackling the issue of data quality, as a prerequisite for further machine learning approaches like deep learning. This paper addresses preprocessing in the context of ground-based hyperspectral image data: It presents novel preprocessing methods, and proposes a comprehensive preprocessing pipeline for handling complex hyperspectral image samples. Multiple preprocessing pipelines are applied on a set of hyperspectral images in the context of image classification, analyzing which preprocessing algorithms perform best, in order to draw further conclusions about methods and their combinations in our application context. Our results show trends on the application of specific methods, and indicate that the application of shorter pipelines tends to achieve better results. We also provide empirical evidence suggesting that too intensive dimensionality reduction can have detrimental effects on classifiability, regardless of contamination levels. Keywords hyperspectral image analysis, data preprocessing, artificial neural networks, image classification 1. Introduction In recent years, deep learning-based techniques have advanced significantly, in particular, in the area of computer vision, e. g., [1, 2, 3, 4]. However, one important prerequisite for their successful application is sufficient data quality [5, 6], requiring appropriate preprocessing: in particular, preprocessing needs to be applied e. g., when working with complex data like hyperspectral images [7, 8, 9] or performing advanced sampling approaches for quality estimation, cf. [10, 11]. In this paper, we target preprocessing methods in the context of analyzing hyperspectral image data: We investigate several preprocessing algorithms based on a literature review, provide efficient Python implementations (detached from data acquisition), and combine them according to the recommendations of Vidal et al. [11] into comprehensive preprocessing pipelines. These are applied to artificially contaminated hyperspectral images of potatoes. A hyperspectral image is a data cube consisting of many two-dimensional spatial bands stacked along the spectral dimension. Subsequently, the cleaned datasets are classified using an artificial neural network to assess and identify promising combinations of preprocessing methods [12, 13, 14]. LWDA’23: Lernen, Wissen, Daten, Analysen. October 09–11, 2023, Marburg, Germany $ a.schliebitz@hs-osnabrueck.de (A. Schliebitz); h.tapken@hs-osnabrueck.de (H. Tapken); martin.atzmueller@uni-osnabrueck.de (M. Atzmueller)  0000-0003-0361-7770 (A. Schliebitz); 0000-0002-0685-5072 (H. Tapken); 0000-0002-2480-6901 (M. Atzmueller) © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) 1 CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 In particular, our application context is given by an agricultural application for visual-based automated quality determination of potatoes [14], which is being developed in the Agri-Gaia research project by Osnabrück University of Applied Sciences in cooperation with Wernsing Feinkost GmbH. It addresses both the use of RGB as well as hyperspectral data described in this paper. Real-time capable RGB cameras are only suitable for determining defects visible on a potato’s surface. Therefore, the quality of dirty potatoes cannot be assessed using this approach. Due to the different penetration depths of different wavelengths of the non-visible electro- magnetic spectrum, it is suspected that hyperspectral cameras can detect defects beneath a thin layer of soil [10]. Hence, hyperspectral data can be a valuable provider of nontrivial features, which may be extracted using convolutional neural networks (CNNs). However, due to multiple contamination hazards typically encountered during data collection, hyperspectral preprocessing is often necessary for assuring data quality. Our core contributions are summarized as follows: 1. We discuss preprocessing, and propose two novel methods, i. e., spectral binning using self-reference deviation (SRD) and spatial binning via a SID-SAM sliding window (SSSW). 2. In our experimentation, we construct different preprocessing pipelines and estimate their performance on a reference dataset in the described real-world context. 3. We provide an open-source implementation for creating preprocessing pipelines in Python using the proposed methods: https://github.com/andreas-schliebitz/hipp The rest of the paper is structured as follows: Section 2 discusses related work. After that, Section 3 describes our proposed approach in detail. Next, Section 4 presents and discusses our results. Finally, Section 5 concludes with a summary and interesting directions for future work. 2. Related Work Vidal et al. [11] discuss preprocessing steps for hyperspectral data, as well as individual algorithms within the preprocessing pipeline shown in Figure 1: • Dead pixels can be detected using thresholding techniques relying on median spectra or more robust methods like genetic or evolutionary algorithms [11, p. 143]. In addition to more sophisticated methods like Minimum Volume Ellipsoid (MVE), defect pixels can also be localized by their abnormal intensity values [11, p. 143]. • Spectral spike points may be removed using median-modified Wiener filters (MMWF), wavelet transforms (Wt), signal derivatives or statistical methods detecting unlikely devia- tions from the signal’s mean [11, p. 144]. • Background removal can often be performed using manual or automatic thresholding techniques [11, p. 142]. Outlier detection should only be attempted if a normal distribution of data points can be assumed [11, p. 145]. If so, methods like Resampling by Half Means (RHM), Smallest Half Volume (SHV) or more robust estimators like MVE or Minimum Covariance Determinant (MCD) can be used [11, p. 145]. • Following the paper of Vidal et al. [11], spectral preprocessing can be split into noise reduction, scatter correction and the reduction of additive baseline shifts. Noise reduction can be performed using 3D or Daubechies wavelets [11, pp. 140-141]. Both spectral denoising and the rectification of additive baseline shifts can be attempted using the Savitzky-Golay filter [11, p. 145]. For scatter correction the use of the Standard Normal 2 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 Variate (SNV) is advisable if no reference spectrum is available. Otherwise, if a reference spectrum can be recorded a priori, then Multiplicative Scatter Correction (MSC) should be considered instead [11, p. 145] [15]. • Dimensionality reduction of a hyperspectral datacube can be achieved using variable selection techniques based on genetic algorithms or partial least squares regression (e. g. iPLS) [11, p. 140]. Other methods include spatial and spectral binning as well as factor model approaches such as Principal Component Analysis (PCA) or Multivariate Curve Resolution (MCR) [11, p. 140]. A classification of hyperspectral data cleaned with a selection of these methods is performed by Amigo et al. [16], co-author of [11]. The authors aim at distinguishing flame retardants contained in six different types of plastics using partial least squares regression. In contrast to this paper, the authors do not attempt a classification via artificial neural networks. In the context of hyperspectral image classification, neural networks, especially convolutional types, can be considered as successors of kernel-based methods like PCA and Support Vector Machines (SVM) [17]. Yu et al. [18] demonstrate the superior classification ability of convolutional neural networks (CNN) in a comparative study involving 𝑘-nearest neighbor (KNN) and SVM varia- tions. The results show that the CNN-based classifier performs best on all three hyperspectral datasets, being Indian Pines, PaviaU and Salinas. The literature today includes a number of CNN architectures specifically designed for hyperspectral image classification. Ausdebert et al. implement ten of these CNN-based classifiers in their DeepHyperX toolbox [19, 20], of which the S-CNN architecture [21] performs best in our experiments (see Table 4). 3. Methods A preprocessing pipeline can be considered as a composition of functions which takes a hyper- spectral data cube (hypercube) as input and applies a set of transformations to it. Each prepro- cessing step can affect the spatial and spectral dimensions of the resulting hypercube, which will serve as input of the subsequent step. Therefore, algorithms in a preprocessing pipeline must be able to adapt to changes in input dimension- ality. Experiments have shown that careless composi- tion of algorithms within preprocessing pipelines can have a negative impact on their effectiveness. Figure 1 (adapted from Vidal et al. [11]) depicts a reasonable sequence of individual preprocessing steps also used by the pipelines compiled in this paper. Below, the findings of a literature review are used to instantiate several such preprocessing pipelines with selected al- gorithms. The task of these pipelines is to remove unwanted noise from hyperspectral images, for three contamination levels (low, medium and high). After this preprocessing phase, the cleansed hyperspectral images are used for training a neural classifier in or- der to draw conclusions about effective combinations of different preprocessing algorithms. The selected Figure 1: Structure of a hyperspectral preprocessing algorithms are summarized in Table 1. preprocessing pipeline, cf. [11] 3 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 Table 1 Overview of algorithms used for creating preprocessing pipelines. Preprocessing step Algorithm Abbr. Ref. Dead pixel detection Standard Deviation Threshold SDT [11, sec. 6] Spike detection Standard Deviation Factor SDF [11, sec. 7] Stochastic Outlier Selection SOS [22] Outlier detection Copula Based Outlier Selection COPOD [23] Reed-Xiaoli Detector RXD [24] Savitzky-Golay Filter SGF [25] Noise reduction Daubechies Wavelet Filter Wt [26] Minimum Noise Fraction MNF [27] Standard Normal Variate SNV [28] Scatter correction Multiplicative Scatter Correction MSC [29] Principal Component Analysis PCA [30] Dimensionality reduction Spectral Binning SRD — Spatial Binning SSSW — 3.1. Methods for Dimensionality Reduction 3.1.1. Spectral Binning using Self-Reference Deviation (SRD) For hyperspectral data, spectral binning is a dimensionality reduction technique, which merges similar bands along the spectral dimension 𝑧 of the hypercube H ∈ R𝑦×𝑥×𝑧 where 𝑦 and 𝑥 denote the spatio-temporal and spatial dimensions respectively. More generally, 𝑦 represents the height and 𝑥 the width of each spectral band in pixels. Binning can be done manually for obvious groups of similar bands by specifying static bins. Since this approach is rarely practical for all samples in a dataset, a new automatic spectral binning procedure is presented below. The SRD method is based on the average distance 𝑑 ∈ R>0 of the intensity values inside the averaged spectrum I* ∈ R𝑧 (self-reference) of the hypercube. For dynamic creation of bins, SRD expects a user defined maximum deviation 𝜀 ∈ (0, 1] from this average distance 𝑑. In summary, the SRD algorithm works as follows: 1. Calculate the reference spectrum I* of H, where the 𝑖-th entry of I* is the mean 𝑏𝑖 of the 𝑖-th spectral band of H. 2. Calculate the mean distance 𝑑 of all neighboring intensity values in I* , adding the maxi- mum allowed percentage deviation: 𝑑 := 𝑑 · (1 + 𝜀) 3. Obtain the averages 𝑏𝑖 from the reference spectrum I* with 1 ≤ 𝑖 ≤ 𝑧 and identify the bins using the following case distinction: a) If 𝑖 = 1 or the distance from the current mean 𝑏𝑖 to the mean of the bands within the current bin is less than 𝑑: Add the 𝑖-th band from H to the current bin. A small deviation from the current bin’s mean indicates a high degree of similarity. b) Otherwise: Close the current bin and start with a new one. ′ 4. Create a reduced hypercube H′ ∈ R𝑦×𝑥×𝑧 by averaging the spectral bands of each bin. This operation reduces the hypercube’s depth by merging multiple similar bands into one. The storage gain 𝐺SRD achieved by spectral binning is calculated from the product of the depth difference 𝑧 − 𝑧 ′ , the unchanged spatial resolution 𝑦 × 𝑥 and the constant storage size 𝑘 of an intensity value in bits: 𝐺SRD = (𝑧 − 𝑧 ′ ) · 𝑦 · 𝑥 · 𝑘 where 0 < 𝑧 ′ ≤ 𝑧. Here, the storage gain is large if the difference 𝑧 − 𝑧 ′ is large, that is, the number of bins generated is small. 4 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 3.1.2. Spatial Binning using a SID-SAM Sliding Window (SSSW) Spatial binning, in contrast to spectral binning, achieves dimensionality reduction by merging similar point spectra located in the 𝑥𝑦 plane of the hypercube H ∈ R𝑦×𝑥×𝑧 . The spatial binning method presented in this section compares multiple point spectra in a 𝑠 × 𝑡 pixel wide window, which is moved over the 𝑥𝑦-plane of the hypercube with a horizontal step size of 𝑠 pixels. After reaching the right image boundary, the window is moved 𝑡 pixels down and repositioned at the left edge of the image. For each point spectrum within a window region, a similarity score is calculated by comparing it to the hypercube’s averaged spectrum I* using the SID- SAM similarity measure [31]. In doing so, only those spectra are merged that exhibit a high similarity to I* and thus contribute comparatively little to the hypercube’s entropy. SSSW can be used with all types of hyperspectral data cubes regardless of the window size, since the median spectrum I* does always exist. The similarity matrix R𝑦×𝑥 stores the SID-SAM score for each point spectrum. An entry of R is small if the associated point spectrum is similar to I* . Afterwards, the matrix R is subdivided into the original 𝑠 × 𝑡 wide window regions. The SID-SAM values inside these smaller matrices are averaged to obtain a single similarity measure tied to a specific image area. Subsequently, all averaged SID-SAM values are associated with their corresponding point spectra in the 𝑥𝑦-plane of the hypercube. In doing so, a maximum of 𝑢 = (⌊𝑦/𝑠⌋ + 1) · (⌊𝑥/𝑡⌋ + 1) non-overlapping image regions are considered for spatial binning. After sorting the averaged SID-SAM values in ascending order, the point spectra eligible for merging are determined through the use of a user defined percentile 𝜃 ∈ [0, 1]. In this case, it is sufficient to associate the smallest ⌊𝜃 · 𝑢⌋ SID-SAM values with their corresponding window regions and merge the contained spectra by spectral averaging. The storage gain 𝐺SSSW achieved by the SSSW algorithm increases with the window size 𝑠 × 𝑡 and the percentile 𝜃 used for binning: 𝐺SSSW ≈ ⌊𝜃 · 𝑢⌋ · (𝑠 · 𝑡 − 1) · 𝑧 · 𝑘 bits. The value 𝑘 represents the constant storage size of an intensity value in bits. Strictly speaking, this storage gain calculation is rather an approximation, since the sliding window procedure will generate irregular window sizes if either the image width or height is not a multiple of 𝑠 or 𝑡 respectively. Finally, since merging multiple point spectra creates gaps in the original data cube, a compression is performed by densely rearranging the binned point spectra resulting in different output dimensions. This procedure has the disadvantage of separating spatial features in the 𝑥𝑦-plane of the hypercube. 3.2. Preprocessing Pipelines The combinatorial complexity associated with instantiating various alternative preprocessing pipelines is significantly reduced by using the fixed order of the individual preprocessing steps depicted in Figure 1. The number of eligible pipeline combinations is further limited by purposely leaving out MSC, since the linearly related SNV [32] is faster to compute. Furthermore, due to the absence of a reference spectrum, there are no advantages justifying the use of MSC in this paper. Additional pipeline combinations are eliminated by always applying SDT and SDF at the beginning of each preprocessing pipeline. This results in the generated pipelines differing only in their preprocessing algorithms for outlier detection, noise reduction, and dimension reduction (see Figure 2). The input and output degrees of all inner nodes of this graph are initially identical by definition, since each node produces an output and passes it as input to all descendant nodes. 5 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 SDT For reasons of clarity, the three outgoing edges of each noise reduction node (MNF, Wt, SGF) in Figure 2 are SDF merged into a single edge. The same applies to the out- going edges of the SNV node. The number of pipelines SOS RXD COPOD constructed in this way can be determined by counting all possible paths from the root (SDT) to the leaf nodes MNF Wt SGF (SRD, PCA, SSSW) of the graph. Since preprocessing steps with only one algorithm do not create additional pipelines, only three of the six preprocessing steps have SNV to be considered with three alternatives each. Therefore, the graph shown in Figure 2 produces a total of 33 = 27 SRD PCA SSSW different pipeline combinations. Figure 2: Preprocessing pipelines with algorithms from Table 1. 3.3. Reference Dataset The raw dataset used in this paper consists of hyperspectral images depicting a total of 728 clean potatoes, which were manually acquired using a Helios Core 0.9-1.7 XF-PA100 240×320/CII/H330 line scan camera. Table 2 includes 18 more potatoes for a grand total of 746 since some potatoes are annotated with multiple classes. During dataset creation, samples annotated with multiple classes are duplicated to each assigned quality class. To avoid methodological errors in training and evaluation, potatoes with multiple defect classes are not included in the contiguous reference dataset. Additionally, the raw dataset is restricted to the four largest defect classes from Table 2 to ensure the availability of sufficient test spectra during evaluation. These measures reduce the size of the raw dataset from 728 to initially 622 and after deduplication to finally 606 potatoes with an average of approximately 13 700 point spectra each. Table 2 Quality classes of the hyperspectral potato dataset. # Quality class Quantity Σ # Quality class Quantity Σ 1 green 264 7 growth deformity 17 2 dry rot 171 8 blue spots 17 622 124 3 no defect 117 9 withered 14 4 growth crack 70 10 wet rot 9 5 mechanical damage 43 Total 746 6 scab 24 3.4. Dataset Contamination The spectra of the reference dataset are artificially contaminated with varying amounts of noise to investigate the effectiveness of different preprocessing pipelines. This procedure became necessary due to an unknown amount of in-camera preprocessing that was applied to the raw sensor data at acquisition time. Contamination is performed in three stages using additive and multiplicative scattering effects, Gaussian noise, spike points and dead pixels. Table 3 quantifies the degree of artificial contamination for all three contamination levels using statistical indicators applied to the whole set of intensity values contained within each dataset. 6 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 The strength of additive and multiplicative scat- Table 3: Statistical characteristics of the tering effects is dynamically calculated as a func- three contamination levels. tion of contamination level and applied to random Contam. Avg. SD (𝜎) Var. (𝜎 2 ) spectra of a fixed user-defined percentile (24 %). As low 0.374 0.241 0.058 with the other types of contamination, the spectra medium 0.353 0.246 0.061 high 0.333 0.256 0.066 are increasingly influenced by additive and mul- tiplicative scattering effects at higher contamination levels. Additive scattering effects can manifest themselves not only in a positive but also negative shift of intensity values. In contrast to additive influences, multiplicative scattering affects the slopes within a spectrum’s wavelength profile. In this paper, this phenomenon is simulated by multiplying a spectrum with by factor between 0.5 and 0.9. This ensures that the intensity values of the spectrum are not shifted out of their normalized range of values [0,1]. Since multiplying a spectrum by a number close to one has little effect on its slopes, the highest scaling factor is used at the lowest contamination level. Gaussian noise is applied to every sample with increasing variance at higher contamination levels (0.0004, 0.0012, 0.0020). Simulation of both spike points (max. 1.5 %) and defect pixels (max. 2.4 %) is implemented by randomly setting maximum (1) or minimum (0) intensity values. The influence of defect pixels is simulated within the 𝑖-th band B𝜆𝑖 ∈ R𝑦×𝑥 by randomly replacing columns, such that they consist either of the minimum or maximum value 0, 1 ∈ R𝑦 , respectively. 3.5. Classification of Preprocessed Datasets 3.5.1. Dataset Sampling Sampling of both the reference dataset and the three contaminated datasets is performed using a special sampling method that guarantees the spatial disjointness of training, test, and validation spectra. The purpose of this sampling procedure is to separate the point spectra of a spatially contiguous dataset in such a way that no two spectra of the same potato appear in more than one of the three dataset splits. The developed sampling strategy achieves this by dividing the ground truth mask of the reference dataset into three overlap-free regions (see Figure 3). This trisection is performed along the horizontal axis of the contiguous hypercube, since a vertical decomposition could result in a class imbalance within the partial datasets. The reason for this is that the potatoes inside the contiguous hypercube are sorted according to their size in order to save space by reducing the amount of empty background voxels. However, in our dataset, green potatoes were on average significantly smaller than potatoes of other quality classes and therefore located in the lower third of the ground truth mask. Performing vertical partitioning along the horizontal axis showed an improvement in the distribution of potato sizes and classes in each dataset split. Figure 3: Sampling of training, testing, and validation data from three non-overlapping regions. 7 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 3.5.2. Network Architecture Selection Classification of the contaminated datasets as well as those cleaned by the preprocessing pipelines is performed using the DeepHyperX toolbox [19, 20]. For the selection of a suitable classifier, the reference dataset consisting of 8.3 million point spectra is limited to a training and test dataset of 200 000 spectra each. The training is monitored with a validation split of 20 000 spectra. The goal of these initial training runs is to determine a neural network architecture that has a low training duration with comparatively high accuracy. Excessively long training durations are unacceptable from a practical point of view, since up to 45 training and testing iterations must be performed with the pipelines constructed in Section 3.2. The seven neural network architectures evaluated in advance to this paper are part of the DeepHyperX toolbox and trained in six epochs with a batch size of 100. Other settings and hyperparameters like learning rate, its scheduler, optimizer and loss function are left at their defaults. All training runs are performed on two Tesla V100 graphics cards of an NVIDIA DGX Station. The comparatively low amount of training spectra is due to the long training duration (4 h 20 min) of the best performing CNN architecture developed by Sharma et al. (see Table 4). Table 4 Classification accuracies and training durations of a hyperspectral CNN (sharma) and multilayer per- ceptron (nn) on spectra of the reference dataset. Arch. Train loss Val. acc. Test acc. Train 𝑡 Test 𝑡 sharma [21] 0.228 59.7 % 64.7 % 4:20:08 0:14:57 nn [19] 0.999 49.4 % 52.8 % 0:02:26 0:06:51 Due to the long training duration of the sharma architecture, the second ranked nn archi- tecture is chosen as the classifier to evaluate the performance of the different preprocessing pipelines, serving as a prototypical instance towards more advanced architectures. The nn architecture is not a convolutional neural network, but rather a simple multilayer perceptron (MLP) consisting of only four fully connected layers, each with 2048 neurons. 4. Results 4.1. Experimental Setup The nn architecture is trained on approximately 4.5 million spectra from each of the contaminated datasets in 12 epochs with a batch size of 100. Training is monitored at each epoch with approximately 225 000 validation spectra while the learning rate is dynamically adjusted by the ReduceLROnPlateau scheduler. Cross entropy is used as the loss function in conjunction with the Adam optimizer. As a regularization mechanism, Dropout is applied with a dropout probability of 𝑝 = 0.5. The evaluation of a trained model is performed on approximately 900 000 test spectra. Since the training, test, and validation datasets are created using the spatially disjoint sampling method from Section 3.5.1, no spectra from the same potato can appear in both the training and test datasets. The computed performance metric is the classification accuracy of each model on its respective test dataset. 8 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 4.2. Reference Dataset The nn classifier trained on the reference dataset achieves an accuracy of 53.61 % on the first four defect classes (cf. Table 2), which is used as a baseline for performance comparison below. 4.3. Contaminated Datasets As expected, the classification accuracy of the nn architecture decreases monotonically but not uniformly as the contamination level of the dataset increases. A test accuracy of 40.77 % was reported for low, 37.53 % for medium and 36.36 % for high contamination. The observed degradation spans 4.41 %, with a gap of 3.24 % from second to first place. This difference is almost three times larger than from third to second rank (1.17 %). Compared to the classification results obtained from the reference dataset, the nn architecture exhibits a drop in test accuracy of 12.84 % at low contamination. Hence, the generalization ability of the nn architecture is found to be significantly weaker on the contaminated datasets. 4.4. Preprocessed Datasets The classification results listed in Tables 5 and 7 correspond to the pipeline combinations implied by Figure 2. Also, the 36 rows of the first table together with the 9 rows of the second table constitute the 45 preprocessing runs mentioned in section 3.5.2. Before analyzing the reported classification accuracies, it is important to note that the tables presented in this section do not explicitly list the first two preprocessing algorithms dealing with dead pixel (SDT) and spike detection (SDF). As mentioned in Section 3.2, these procedures are always applied in this order at the beginning of each preprocessing pipeline, such that they are omitted for better readability of the tabular listings. Table 6 quantifies the performance differential between the various dimensionality reduction techniques observed in Table 7. The left half of this listing includes the mean classification accuracies of the first 19 pipeline combinations using spectral binning, which are grouped by their respective contamination level and sorted by test accuracy. This ordering also applies to the right half of Table 6, which summarizes the same metrics for the last 17 pipelines using PCA for dimensionality reduction. Table 7 reports the classification results of those preprocessing pipelines that discard the spatial features of their inputs. This is because, unlike in Table 5, outliers are detected using the COPOD algorithm and dimensionality reduction is performed through spatial (SSSW) rather than spectral binning (SRD). This leads to a new peak classification accuracy of 44.52 %, i. e., 1.81 % higher than reported in Table 5. 4.5. Discussion The classification results listed in Table 5 show a clear dichotomy in their final processing step. The first 19 pipelines largely perform spectral binning as their dimensionality reduction, whereas the last 17 pipelines apply PCA. All pipeline combinations in the upper half of Table 5 (rows 1 to 19) were able to improve the classifiability of datasets with medium and high contamination levels. None of the last 17 pipelines succeeded in achieving the same objective. The pipelines which showed the largest improvement in classifiability of their cleaned datasets are located in the first two rows of Table 5 (+6.35 %, +5.03 %). 9 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 Table 5 Classification results of preprocessed datasets with retained spatial features. # Contam. SOS RXD SGF Wt MNF SNV PCA SRD Test acc. 1 high –  – –   –  42.71 % 2 medium  – – –   –  42.56 % 3 high  – – –   –  40.74 % 4 medium –  – –   –  40.68 % 5 low  – – –   –  39.71 % 6 low  –  – –  –  39.64 % 7 low –  – –   –  39.56 % 8 medium –   – –  –  39.53 % 9 low –   – –  –  39.39 % 10 low –  –  –  –  38.79 % 11 high  –  – –  –  38.71 % 12 low  – –  –  –  38.62 % 13 medium –  – –    – 38.49 % 14 high –  –  –  –  38.43 % 15 medium  –  – –  –  38.32 % 16 high  – –  –  –  38.13 % 17 medium –  –  –  –  38.10 % 18 medium  – –  –  –  37.88 % 19 high –   – –  –  37.64 % 20 low –  – –    – 35.59 % 21 low  – – –    – 35.45 % 22 medium  – – –    – 35.24 % 23 high  – –  –   – 35.08 % 24 medium  – –  –   – 34.25 % 25 high –  –  –   – 33.71 % 26 low  – –  –   – 33.20 % 27 low –  –  –   – 32.91 % 28 high  – – –    – 32.41 % 29 high  –  – –   – 32.32 % 30 medium –  –  –   – 32.11 % 31 high –  – –    – 31.92 % 32 low  –  – –   – 30.91 % 33 high –   – –   – 30.67 % 34 low –   – –   – 30.30 % 35 medium  –  – –   – 30.25 % 36 medium –   – –   – 29.10 % Table 6 Mean test accuracies and their changes related to the first 19 and last 17 pipelines from Table 5. # Contam. ∅ Test acc. Δ Acc. (Sec. 4.3) # Contam. ∅ Test acc. Δ Acc. (Sec. 4.3) 1 high 39.39 % +3.03 % 1 low 33.06 % -7.71 % 2 medium 39.37 % +1.48 % 2 high 32.69 % -3.67 % 3 low 39.29 % -1.48 % 3 medium 32.19 % -5.34 % By examining the actual dimensionality reductions performed in Table 5, we observe that the highest ranked pipeline (42.71 % acc.) uses spectral binning to perform an average dimensionality reduction of 81 bands per sample. After replacing the spectral binning with PCA, the test accuracy drops by 10.79 % to 31.92 % (rank 31). The dimensionality reduction carried out by the PCA decreases the average depth of the respective dataset from 252 to only 3 bands. Based on 10 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 Table 7 Classification results of preprocessed datasets with discarded spatial features. # Contam. COPOD SGF Wt MNF SNV SSSW Test acc. 1 low   – –   44.52 % 2 high  – –    42.40 % 3 medium   – –   42.35 % 4 medium  – –    41.39 % 5 high   – –   40.89 % 6 low  –  –   38.62 % 7 high  –  –   35.78 % 8 medium  –  –   35.41 % 9 low  – –    34.64 % this observation, it can be concluded that a drastic dimensionality reduction using PCA can have detrimental effects on the classifiability of hyperspectral datasets. This claim is further substantiated by rows 20 to 36 of Table 5. For reference, the PCA carried out in this paper was set to obtain 99.8 % of a sample’s variance. In Table 6, we observe that the contamination levels are reversed in comparison to the classifi- cation results of the contaminated datasets (see Section 4.3). On average the best improvements in test accuracy can be obtained on the most contaminated datasets (+3.03 %). Our experiments indicate, that preprocessing datasets with low contamination levels may even lead to a deteriora- tion of their classifiability (∅ -1.48 %). In our tests, this observation holds true for each individual pipeline applied to a dataset with low contamination. The best classification result of a lightly contaminated dataset is 39.71 % (rank 5) behind two more contaminated datasets. The second half of the table shows the negative impact of PCA on the classification results. The greatest losses occur for datasets of the lowest contamination level, whose test accuracy decreases by 7.71 % on average. In second place are the most contaminated datasets, whose mean classification accuracy is least affected by PCA (-3.67 %). Regarding noise reduction, the first five rows of Table 5 indicate that the MNF algorithm is superior to SGF and Wt. This coincides with both the visual appearance of the denoised samples and the fact that MNF was specifically designed for hyperspectral noise reduction. Even the popular Savitzky-Golay filter, which seems to work well with the Reed-Xiaoli detector, cannot outperform MNF. In terms of outlier detection, the combinations of SOS and MNF (3) and RXD and MNF (2) occur almost equally often in the Top-6 results of Table 5. However, the highest ranked preprocessing pipeline uses RXD rather than SOS for outlier detection. The importance of RXD increases after the rankings are extending to the first eleven results, where RXD is used by six pipelines and SOS only by five. In the lower half of Table 5 we observe that SGF performs poorly in conjunction with PCA, whereas the wavelet transform appears to be more robust. Similar advantages emerge in the midfield, where wavelet filtering is on par with SGF and even outperforms on datasets with higher contamination levels. Overall, the best classification accuracy is achieved by a pipeline which discards features in the spatial-temporal domain (see Table 7). This pipeline preprocesses a dataset of low contamination – classified with an accuracy of 44.52 %. This result is 1.81 % higher than the previous peak of 42.71 % (see Table 5). The second best accuracy was previously obtained by a combination of RXD, MNF and spectral binning. For the results shown in Table 7, there is no superior method for noise reduction in the context of COPOD and spatial binning. Both MNF and SGF show promising results in spectral and spatial smoothing, while wavelet transform clearly underperforms. 11 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 5. Conclusions This paper considered classification methods on hyperspectral image data, specially present- ing novel preprocessing methods, which were integrated into a comprehensive preprocessing pipeline. We performed an analysis of multiple preprocessing pipelines in the context of image classification, in order to analyze which preprocessing algorithms perform best. Our results indicate that dimensionality reduction using principal component analysis can potentially have negative effects on the classifiability of hyperspectral data. Furthermore, both the Reed-Xiaoli detector and the MNF transform are preferable as dedicated hyperspectral preprocessing algorithms to general-purpose methods such as SOS or wavelet filtering. The Savitzky-Golay filter, widely used in signal processing, can also be used successfully for smooth- ing noisy spectra. The results obtained with this method can be considered satisfactory as long as no dimensional reduction is performed with PCA. For this reason, it is recommended that dimensionality reduction should be removed from a preprocessing pipeline and not applied across the board to all samples. If data reduction is still required, then the less invasive but also less effective methods of spatial and spectral binning can be used as an alternative to PCA. Furthermore, a step-by-step calculation of the results listed in the Table 5 and 7 showed that shorter pipelines correlate with better classification results. Therefore, preprocessing pipelines should contain as few algorithms as possible, focusing on those that are tailored to the type of contamination at hand. In order to determine which preprocessing steps are really useful, an examination of the contaminated dataset is recommended. In our experimentation, advantages in favoring preprocessing pipelines that preserve spatial features were not observed in terms of classification accuracy using a multilayer perceptron. However, it can be conjectured that convolutional neural networks can extract generalizable features from coherent spatio-temporal regions, which is supported by the results shown in Table 4. In summary, for classifying hyperspectral image data we recommend the following strategies: 1. The preprocessing algorithms should be as domain-specific as possible. 2. A minimal number of pre-processing algorithms should be used. 3. The preprocessing algorithms should address the specific impurities of the dataset. 4. The algorithms should be carefully parameterized and arranged according to Figure 1. 5. For further classification, a CNN designed for hyperspectral data should be used. For future work, we aim to extend the analysis of the applied neural network architectures. Also, for explainability and interpretability of the applied method [33, 34, 35, 36] both specialized preprocessing approaches as well as respective feature generation, e. g., using declarative or neuro-symbolic techniques [37, 38, 39] provide interesting directions for future research. Acknowledgments This work was partly funded by the German Federal Ministry of Economics and Climate Protec- tion as part of the research project Agri-Gaia under grant number 01MK21004G. 12 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 References [1] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, et al., Recent advances in convolutional neural networks, Pattern recognition 77 (2018) 354–377. doi:10.1016/j.patcog.2017.10.013. [2] A. Khan, A. Sohail, U. Zahoora, A. S. Qureshi, A survey of the recent architectures of deep convolutional neural networks, Artificial Intelligence Review 53 (2020) 5455–5516. doi:10.1007/s10462-020-09825-6. [3] D. Hemanth, V. Estrela, Deep Learning for Image Processing Applications, Advances in Parallel Computing, IOS Press, 2017. [4] L. Jiao, J. Zhao, A Survey on the New Generation of Deep Learning in Image Processing, IEEE Access 7 (2019) 172231–172263. doi:10.1109/ACCESS.2019.2956508. [5] S. E. Whang, J.-G. Lee, Data collection and quality challenges for deep learning, Proceedings of the VLDB Endowment 13 (2020) 3429–3432. doi:10.14778/3415478.3415562. [6] S. E. Whang, Y. Roh, H. Song, J.-G. Lee, Data collection and quality challenges in deep learning: A data-centric ai perspective, The VLDB Journal 32 (2023) 791–813. doi:10.1007/ s00778-022-00775-9. [7] A. Plaza, J. A. Benediktsson, J. W. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, et al., Recent advances in techniques for hyperspectral image processing, Remote sensing of environment 113 (2009) S110–S122. doi:10.1016/j.rse.2007.07.028. [8] M. Manley, Near-infrared spectroscopy and hyperspectral imaging: non-destructive analy- sis of biological materials, Chemical Society Reviews 43 (2014) 8200–8214. doi:10.1039/ c4cs00062e. [9] J. M. Amigo, I. Martí, A. Gowen, Hyperspectral Imaging and Chemometrics: A Perfect Combination for the Analysis of Food Structure, Composition and Quality, in: Data handling in science and technology, volume 28, Elsevier, 2013, pp. 343–370. doi:10.1016/ B978-0-444-59528-7.00009-0. [10] B. Boldrini, W. Kessler, K. Rebner, R. Kessler, Hyperspectral Imaging: A Review of Best Practice, Performance and Pitfalls for in-line and on-line Applications, Journal of Near Infrared Spectroscopy 20 (2012) 438–508. doi:10.1255/jnirs.1003. [11] M. Vidal, J. M. Amigo, Pre-processing of hyperspectral images. essential steps before image analysis, Chemometrics and Intelligent Laboratory Systems 117 (2012) 138–148. doi:10.1016/j.chemolab.2012.05.009. [12] L. Dale, A. Thewis, C. Boudry, I. Rotar, P. Dardenne, V. Baeten, J. Fernández Pierna, Hyper- spectral Imaging Applications in Agriculture and Agro-Food Product Quality and Safety Control: A Review, Applied Spectroscopy Reviews 48 (2013) 142. doi:10.1080/05704928. 2012.705800. [13] C. Wang, B. Liu, L. Liu, Y. Zhu, J. Hou, P. Liu, X. Li, A review of deep learning used in the hyperspectral image analysis for agriculture, Artificial Intelligence Review 54 (2021) 5205–5253. doi:10.1007/s10462-021-10018-y. [14] A. Schliebitz, H. Graf, T. Wamhof, H. Tapken, A. Gertzen, KI-basiertes Computer-Vision- System zur Qualitäts- und Größenbestimmung von Kartoffeln, 43. GIL-Jahrestagung, Resiliente Agri-Food-Systeme (2023). 13 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 [15] D. Pelliccia, Two scatter correction techniques for NIR spec- troscopy in Python, Online, 2018. URL: https://nirpyresearch.com/ two-scatter-correction-techniques-nir-spectroscopy-python/, retrieved: 2023-07-15. [16] J. M. Amigo, H. Babamoradi, S. Elcoroaristizabal, Hyperspectral image analysis. A tutorial, Analytica Chimica Acta 896 (2015) 34–51. doi:10.1016/j.aca.2015.09.030. [17] G. Camps-Valls, L. Bruzzone, Kernel-based methods for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 43 (2005) 1351–1362. doi:10.1109/ TGRS.2005.846154. [18] S. Yu, S. Jia, C. Xu, Convolutional neural networks for hyperspectral image classification, Neurocomputing 219 (2017) 88–98. doi:10.1016/j.neucom.2016.09.010. [19] N. Audebert, DeepHyperX, Online, 2018. URL: https://github.com/nshaud/DeepHyperX, retrieved: 2023-07-15. [20] N. Audebert, B. Le Saux, S. Lefèvre, Deep Learning for Classification of Hyperspectral Data: A Comparative Review, IEEE Geoscience and Remote Sensing Magazine 7 (2019) 159–173. doi:10.1109/MGRS.2019.2912563. [21] V. Sharma, A. Diba, T. Tuytelaars, L. Van Gool, Hyperspectral CNN for Image Clas- sification & Band Selection, with Application to Face Recognition, Technical report KUL/ESAT/PSI/1604, KU Leuven, ESAT, Leuven, Belgium (2016). [22] J. Janssens, Outlier Selection and One-Class Classification, Ph.D. thesis, 2013. Series: TiCC Ph.D. Series Volume: 27. [23] Z. Li, Y. Zhao, N. Botta, C. Ionescu, X. Hu, COPOD: Copula-Based Outlier Detection, in: 2020 IEEE International Conference on Data Mining (ICDM), 2020, pp. 1118–1123. doi:10.1109/ICDM50108.2020.00135. [24] I. Reed, X. Yu, Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution, IEEE Transactions on Acoustics, Speech, and Signal Processing 38 (1990) 1760–1770. doi:10.1109/29.60107. [25] A. Savitzky, M. J. E. Golay, Smoothing and Differentiation of Data by Simplified Least Squares Procedures., Analytical Chemistry 36 (1964) 1627–1639. doi:10.1021/ ac60214a047. [26] I. Daubechies, Ten Lectures on Wavelets, SIAM, 1992. doi:10.1137/1.9781611970104. [27] A. A. Green, M. Berman, P. Switzer, M. D. Craig, A Transformation for Ordering Mul- tispectral Data in Terms of Image Quality with Implications for Noise Removal, IEEE Transactions on geoscience and remote sensing 26 (1988) 65–74. doi:10.1109/36.3001. [28] R. J. Barnes, M. S. Dhanoa, S. J. Lister, Correction to the Description of Standard Normal Variate (SNV) and De-Trend (DT) Transformations, NIR news 5 (1994) 6–6. doi:10.1255/ jnirs.21. [29] T. Isaksson, T. Næs, The Effect of Multiplicative Scatter Correction (MSC) and Linearity Improvement in NIR Spectroscopy, Applied Spectroscopy 42 (1988) 1273–1284. doi:10. 1366/0003702884429869. [30] K. Pearson, LIII. On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin philosophical magazine and journal of science 2 (1901) 559–572. doi:10.1080/14786440109462720. [31] Y. Du, C.-I. Chang, et al., New hyperspectral discrimination measure for spectral character- ization, Optical engineering 43 (2004) 1777–1786. doi:10.1117/1.1766301. 14 Andreas Schliebitz et al. CEUR Workshop Proceedings 1–15 [32] M. S. Dhanoa, S. J. Lister, R. Sanderson, R. J. Barnes, The Link between Multiplicative Scatter Correction (MSC) and Standard Normal Variate (SNV) Transformations of NIR Spectra, J. Near Infrared Spectrosc. 2 (1994) 43–47. doi:10.1255/jnirs.30. [33] O. Biran, C. Cotton, Explanation and Justification in Machine Learning: A Survey, in: IJCAI-17 Workshop on Explainable AI, 2017. [34] A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, F. Herrera, Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Information Fusion 58 (2020) 82 – 115. doi:10.1016/j.inffus.2019.12.012. [35] L. Bertossi, F. Geerts, Data quality and explainable AI, Journal of Data and Information Quality (JDIQ) 12 (2020) 1–9. doi:10.1145/3386687. [36] S. Vollert, M. Atzmueller, A. Theissler, Interpretable Machine Learning: A Brief Survey From the Predictive Maintenance Perspective, in: Proc. IEEE International Conference on Emerging Technologies and Factory Automation (ETFA 2021), IEEE, 2021. doi:10.1109/ ETFA45728.2021.9613467. [37] M. Atzmueller, Declarative Aspects in Explicative Data Mining for Computational Sensemaking, in: D. Seipel, M. Hanus, S. Abreu (Eds.), Proc. International Confer- ence on Declarative Programming, Springer, Heidelberg, Germany, 2018, pp. 97–114. doi:10.1007/978-3-030-00801-7_7. [38] A. Holzinger, From machine learning to explainable AI, in: 2018 world symposium on digital intelligence for systems and machines (DISA), IEEE, 2018, pp. 55–66. doi:10.1109/ DISA.2018.8490530. [39] T. Wu, M. Tjandrasuwita, Z. Wu, X. Yang, K. Liu, R. Sosic, J. Leskovec, ZeroC: A Neuro- Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time, Advances in Neural Information Processing Systems 35 (2022) 9828–9840. doi:10.48550/ arXiv.2206.15049. 15