=Paper=
{{Paper
|id=Vol-3736/paper1
|storemode=property
|title=Precision Slicing for Enhanced Defect Detection in High-Resolution Wind Turbine Blade Imagery
|pdfUrl=https://ceur-ws.org/Vol-3736/paper1.pdf
|volume=Vol-3736
|authors=Serhii Svystun,Oleksandr Melnychenko,Pavlo Radiuk,Oleg Savenko,Anatoliy Sachenko
|dblpUrl=https://dblp.org/rec/conf/icyberphys/SvystunMRSS24
}}
==Precision Slicing for Enhanced Defect Detection in High-Resolution Wind Turbine Blade Imagery==
Serhii Svystun1,∗,†, Oleksandr Melnychenko1,†, Pavlo Radiuk1,†, Oleg Savenko1,† and Anatoliy Sachenko2,3,†

1 Khmelnytskyi National University, 11, Institutes str., Khmelnytskyi, 29016, Ukraine
2 Kazimierz Pulaski University of Technology and Humanities, Department of Informatics, Radom, Poland
3 Research Institute for Intelligent Computer Systems, West Ukrainian National University, Ternopil, Ukraine

Abstract

The analysis of high-resolution aerial imagery captured by unmanned aerial vehicles (UAVs) presents significant analytical challenges, primarily due to the minuscule size of observable objects and the variability in object scale caused by UAV altitude and positioning. These factors often diminish data fidelity and complicate the detection of smaller objects, which are critical in applications such as infrastructure monitoring. Traditional image processing techniques, which typically segment images into smaller, randomly cropped sections before analysis, fail to sufficiently address these challenges. In this work, we propose a novel defect detection framework for identifying minor to medium-sized damage on wind turbine blades (WTBs), a critical component in renewable energy production. The proposed framework, termed "slice-aided inference," enhances existing methodologies by incorporating both traditional patch division and a more advanced technique known as slice-aided hyper-inference. These techniques are rigorously assessed with several advanced deep learning models, emphasizing their efficiency in identifying surface defects. The empirical testing conducted as part of this study demonstrates significant enhancements in detection capabilities, leveraging a dataset of high-resolution UAV images to highlight the practical applicability and effectiveness of the proposed framework in real-world scenarios.

Keywords

Aerial imagery, drone imaging, defect detection, WTBs, slice-aided inference, hyper-inference, deep neural networks, image segmentation, high-resolution imaging, object detection

ICyberPhyS-2024: 1st International Workshop on Intelligent & CyberPhysical Systems, June 28, 2024, Khmelnytskyi, Ukraine
∗ Corresponding author.
† These authors contributed equally.
svystuns@khmnu.edu.ua (S. Svystun); melnychenko@khmnu.edu.ua (O. Melnychenko); radiukp@khmnu.edu.ua (P. Radiuk); savenko_oleg_st@ukr.net (O. Savenko); as@wunu.edu.ua (A. Sachenko)
ORCID: 0009-0009-8210-6450 (S. Svystun); 0000-0001-8565-7092 (O. Melnychenko); 0000-0003-3609-112X (P. Radiuk); 0000-0002-4104-745X (O. Savenko); 0000-0002-0907-3682 (A. Sachenko)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Renewable energy sources are increasingly recognized for their substantial benefits and are not just solutions for individual nations but global imperatives in the face of climate change. Unlike fossil fuels, they significantly mitigate CO2 emissions, which is crucial in the international strategy to curb anthropogenic global warming [1, 2]. Deploying renewable energy bolsters national energy security and diminishes dependency on imported fossil fuels [3]. The localized availability of renewable resources minimizes the risks associated with supply disruptions and price volatility, promoting energy independence [1].
As a result, there is an escalating trend among countries to invest in renewable energy infrastructure, including wind turbines and solar panels. Wind turbines convert the kinetic energy of wind into electrical power, reducing greenhouse gas emissions and enhancing energy supply diversity, security, and sustainability. These turbines harness the motion of wind to generate electricity without combustion [4]. The efficiency and output of wind turbines are significantly influenced by the condition of their blades, which are integral to maximizing energy capture [5]. Research suggests that the blades account for up to 25% of a turbine's total energy production. Maintaining these blades in optimal condition is therefore imperative for maximizing energy generation and minimizing operational downtime and associated costs [6].

Historically, the inspection of wind turbines, especially offshore ones, has been predominantly manual and labor-intensive, contributing to elevated maintenance costs and extended operational delays. Recent advancements in sensor technologies, including those for acoustic, vibration, ultrasonic, and strain measurements, have substantially improved the maintenance and condition monitoring of wind turbines [7–9]. The integration of visual sensors, which facilitate detailed examination of turbine surfaces, is expected to yield significant benefits in the maintenance regimes of these critical energy assets. There is a growing need for safer and more efficient methods of inspecting WTBs to enhance cost-effectiveness and efficiency [10]. The ideal solution would balance sensor reliability, accuracy, and affordability. By implementing an effective and economical approach, wind farm operators can optimize energy production while cutting maintenance expenses.

Technological advancements, especially in UAVs, are at the forefront in developed countries for various applications [11, 12]. UAVs have proven to be highly effective tools for aerial inspections, particularly in assessing WTBs [13]. By utilizing UAV-based inspection methodologies, practitioners have achieved a throughput of 10–12 turbines per day, which could increase to 15–20 with complete automation [13]. These advancements surpass conventional inspection methods and hold significant promise for improving inspection efficacy, reducing expenses, and enhancing energy generation with minimal operational disruption, all of which are crucial in the energy sector.

The automated scrutiny of energy infrastructure, notably WTBs, stands to gain substantially from progress in drone technology and remote image surveillance, generating cost efficiencies and supporting climate change mitigation efforts and safety protocols. Despite strides made in deep learning paradigms, challenges in object detection persist, stemming from factors such as diminished image resolution, occlusions, intricate backgrounds, and the diminutive dimensions of target objects. The essence of deep learning methodologies lies in the iterative processes of training and inference, whereby models learn to discern anomalies through optimization applied to datasets replete with defect instances [14]. Once trained, these models can identify deviations in new imagery by extrapolating learned patterns.
The overarching significance of meticulous dataset curation, judicious selection of architectural frameworks, and optimization of these facets underscores the imperative of achieving precise outcomes. The introduction of high-resolution technology in aerial imaging has significantly enhanced our ability to capture intricate details from a high vantage point [15, 16]. However, this advancement also brings its own set of challenges. Depending on the UAV's proximity to the subject, the varying scales of objects within these images can make it difficult to discern small entities as the UAV moves away from its focal point. Additionally, the large amount of background information in high-resolution imagery can hinder effective computational processing. Within deep learning detection methodologies, exemplified by convolutional neural networks (CNNs), these challenges manifest as impediments to optimal classifier training, frequently culminating in compromised detection accuracy [17]. The emergence of high-resolution data streams, typified by high-definition (HD) 4K imagery, necessitates novel analytical techniques to navigate these intricacies. Figure 1 visually represents the multifaceted challenges encountered in inspecting surface defects on WTBs using drone technology.

Figure 1: This scheme illustrates the utilization of drones for the surveillance of WTB surfaces (panel labels: Training, Inference, Zoomed Inference).

In transmitting image data to servers via 4G/5G technology, defect detection models frequently encounter the challenge of processing high-resolution images. In our proposed framework, training is conducted on image patches, with a meticulous pre-processing step that discards patches lacking relevant content. Subsequently, during the inference stage, a robust strategy becomes imperative to detect minor defects. The scheme in Figure 1 underscores the complexities inherent in high-resolution images and in this domain.

The rest of the paper is structured as follows. The related works section reviews existing methodologies and similar approaches. The suggested architecture section details the proposed defect detection framework, covers the DTU-Drones dataset and annotation process, and outlines training and inference strategies, emphasizing slice-aided hyper-inference. The results section presents experimental outcomes and performance analysis. The discussion highlights comparative analysis and practical implications. Finally, the conclusion summarizes findings, improvements in defect detection, and future research directions, followed by references.

2. Related works

The analysis of defects on WTBs encompasses a spectrum of methodologies, ranging from conventional techniques rooted in image processing to contemporary approaches leveraging hand-crafted features. Wang and Zhang, for instance, deployed Haar-like features in tandem with cascaded classifiers to discern surface cracks on WTBs, with a principal emphasis on distinguishing cracked regions from non-cracked ones [18]. Similarly, Huang and Wang extended the utility of Haar-like features, integrating them with the parallel Jaya K-means algorithm to achieve enhanced precision in surface crack detection [19]. Deng and Guo devised a novel strategy by combining an optimized Lévy flight strategy with the log-Gabor filter for defect identification [20].
To efficiently detect large-scale cracks on WTBs, Peng et al. [21] proposed an analytical framework harnessing UAV-captured imagery. Alternatively, Ruiz et al. [22] adopted a distinct approach by converting operational signals from wind turbines into grayscale representations, subsequently leveraging multichannel texture features for pattern recognition.

The advent of deep learning methodologies has catalyzed a significant paradigm shift within the realm of WTB defect detection. Shihavuddin et al. [23] spearheaded the introduction of a feature pyramid network augmented with offline data augmentation techniques tailored specifically for processing higher-resolution imagery. Their methodology involved training diverse Faster-RCNN detectors on heterogeneous private datasets, yielding promising outcomes. Subsequent explorations delved into the efficacy of YOLO models and EfficientDet, further substantiating the potential of deep learning frameworks in this domain. The superiority of CNNs over traditional descriptors has been underscored, particularly with the added advantage of ensemble classifiers. Foster et al. [17] contributed to this discourse by categorizing WTB defects through the utilization of image patches for both training and inference tasks. Yu et al. [24] addressed challenges associated with blurry imagery by deploying a super-resolution CNN model complemented by Laplacian variance pre-processing techniques. Remarkable advancements in detection performance for WTB defects have also been ascribed to other deep learning architectures. Collectively, these studies illuminate the pivotal role of deep learning methodologies in augmenting defect detection capabilities in WTB inspection.

The processing of high-resolution drone-captured images poses substantial challenges due to the variability in object scales and the requisite computational resources [25]. To mitigate these challenges, a prevalent approach involves partitioning these images into smaller patches, a practice endorsed by several studies in the field [13, 19]. This strategy alleviates computational burdens, enhances object clarity, and augments dataset dimensions, fortifying model performance. Despite the substantial body of literature dedicated to detecting WTB defects, a notable discrepancy endures in the methodologies and classification schemes employed across these studies, a gap that our research aims to address. Benchmarking the performance of WTB surface defect detectors proves particularly challenging due to the confidentiality of data [24] and the absence of annotations, even in cases where data accessibility is feasible [17, 23]. Our work strives to establish a unified approach and set a benchmark for future studies in this area.

Utilizing drones for detecting surface defects on WTBs has proven to be a cost-effective and efficient method, supported by prior research findings. However, this inspection technique comes with its own set of challenges, such as processing high-resolution images, detecting small objects, and adjusting for changes in object scale due to variations in the drone's position. This study aims to improve the accuracy of defect detection in renewable energy assets using imagery captured by UAVs. To accomplish the stated objective, this paper makes the following distinctive contributions:

• We present a defect detection framework that integrates a realistic slice-based inference strategy for object detection in high-resolution images.
• We conduct a benchmark comparison of our framework against several state-of-the-art deep learning detection baselines and slicing strategies, tailored specifically for inspecting wind turbine blades.
• We perform an extensive evaluation using a high-resolution drone image dataset, showcasing significant improvements in detecting minor and medium-sized defects on wind turbine blades.

3. Suggested architecture

Figure 2 depicts our proposed framework for WTB surface defect detection. During the pre-processing step, high-resolution images are partitioned into patches, which are subsequently incorporated into the training process. This framework aims to optimize defect detection performance by leveraging patch-based and full-resolution inference strategies, thereby addressing the challenges associated with high-resolution imagery in WTB inspection.

3.1. Dataset

This study uses the DTU-Drones inspection dataset of WTB images sourced from the Technical University of Denmark (DTU). This publicly available dataset, accessible at [26], comprises 589 high-resolution images captured between 2017 and 2018 across diverse environmental conditions. Notably, while high-resolution images typically measure 1920 × 1080 pixels, the images within this dataset measure 5280 × 2890 pixels, unequivocally classifying them as high resolution. Given the absence of surface defect bounding boxes in the DTU dataset, we embarked on an annotation process after an exhaustive analysis of the various WTB surface defects prevalent in the dataset. Several recent studies have utilized this dataset for similar purposes [17, 23]; however, those studies neither made their annotations publicly accessible for broader utilization nor provided consistent interpretations of surface defect types.

3.2. Surface defects on wind turbine blades

Identifying defects required a comprehensive review of the existing literature concerning the detection of surface defects in wind turbine blades (WTBs). Utilizing publicly available datasets (Section 3.1) and conducting a comprehensive literature search, we systematically categorized various surface defects for our study.

Figure 2: The left portion of the figure showcases different surface defect types from the DTU dataset. On the right, the workflow is depicted, beginning with dataset pre-processing and training, as shown by the black arrows. Two distinct inference scenarios are demonstrated: Scenario I, marked by red arrows, employs image patches for inference, while Scenario II, indicated by blue arrows, uses the original resolution of the test images for inference.

We carefully selected a total of 314 images, each representing one of five distinct types of surface defects, as detailed in this investigation. These images collectively encompass 879 instances, with multiple defects observed per image. Specifically, we identified the following five categories of defects, as illustrated on the left side of Figure 2.

• Missing Teeth. This defect category involves missing teeth in the vortex generator panel, a critical component of WTBs. Identifying the presence or absence of these teeth is essential for ensuring optimal blade performance.
• Paint-Off. Paint-off describes the loss or peeling of the protective paint layer on the surface of WTBs. Although not directly harmful, paint-off signals the necessity for maintenance to preserve the blade's structural integrity and extend its lifespan.
• Erosion. Erosion is a form of surface degradation in which WTBs experience gradual wear and tear caused by environmental conditions or extended exposure to natural elements. While erosion may not pose immediate threats, it necessitates regular maintenance to mitigate potential issues.
• Crack. Surface cracks in WTBs are considered critical defects owing to their potential to induce structural instability, ultimately leading to catastrophic failure. Detecting and accurately localizing surface cracks is paramount for facilitating prompt maintenance and averting further structural deterioration.
• Damaged Lightning Receptor. The lightning receptor plays a crucial role in protecting the wind turbine blade from lightning strikes. Identifying any surface damage to the lightning receptor is essential for evaluating its effectiveness and ensuring it provides sufficient protection against lightning-related damage.
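For later reference, these five categories can be encoded as a simple label map. The sketch below is purely illustrative: the integer encoding is our assumption, and the two-letter codes are chosen to match the class abbreviations (MT, PO, ER, CR, DA) that appear in Table 4.

```python
# Illustrative label map for the five defect categories (assumed encoding);
# the two-letter codes mirror the class abbreviations used in Table 4.
DEFECT_CLASSES = {
    0: ("MT", "Missing teeth in the vortex generator panel"),
    1: ("PO", "Paint-off"),
    2: ("ER", "Erosion"),
    3: ("CR", "Crack"),
    4: ("DA", "Damaged lightning receptor"),
}
```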
3.3. Data annotation

The dataset annotation process was conducted with meticulous attention to detail, explicitly targeting the defective segments of WTBs. Our annotation methodology entailed precisely localizing regions of interest corresponding to each defect type. During the annotation process, significant focus was placed on precisely identifying and delineating the specific areas of the WTBs that exhibited defects such as missing teeth in the vortex generator panel, erosion, damage to the lightning receptor, cracks, and paint-off. These surface defects were carefully localized within their respective regions, providing a thorough and detailed annotation of the defective segments.

3.4. Pre-processing

Before the learning phase, a series of pre-processing steps were executed to keep processing and model training manageable for high-resolution images while preserving the intricate details inherent in the imagery. Building upon the insights gleaned from previous research, notably [17], we undertook an empirical analysis to systematically assess the impact of various patch sizes on our study. The chosen patch sizes proved compatible with our object detection models and demonstrated a discernible performance advantage over alternative patch sizes. An automated approach was employed to select image patches containing at least one defect, excluding patches exhibiting only background or devoid of defects from the dataset. Following the patching process, the dataset was partitioned into distinct subsets for training, testing, and validation, facilitating subsequent experimental inquiries. Notably, both online (on-the-fly) and offline augmentation techniques were applied only after this partitioning, during model training and inference.
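A minimal sketch of this pre-processing step is given below, assuming annotations are available as pixel-coordinate bounding boxes. The function name and exact clipping rules are our own illustration, not the authors' released code.

```python
import numpy as np

def extract_defect_patches(image, boxes, k=1024):
    """Split a high-resolution image into non-overlapping k x k patches,
    keeping only patches that contain at least one defect bounding box.
    image: H x W x 3 array; boxes: (x1, y1, x2, y2) in image coordinates.
    Returns (patch, boxes_in_patch_coords) pairs; background-only patches
    and the right/bottom remainders are discarded for brevity."""
    h, w = image.shape[:2]
    kept = []
    for y0 in range(0, h - k + 1, k):          # non-overlapping grid
        for x0 in range(0, w - k + 1, k):
            local = []
            for x1, y1, x2, y2 in boxes:
                # clip each annotation to the current patch window
                cx1, cy1 = max(x1, x0), max(y1, y0)
                cx2, cy2 = min(x2, x0 + k), min(y2, y0 + k)
                if cx1 < cx2 and cy1 < cy2:    # the box intersects this patch
                    local.append((cx1 - x0, cy1 - y0, cx2 - x0, cy2 - y0))
            if local:                          # drop background-only patches
                kept.append((image[y0:y0 + k, x0:x0 + k], local))
    return kept
```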
3.5. Detection system framework

This section provides a comprehensive exposition of the detection framework tailored to address the nuances of high-resolution images. Figure 2 illustrates the entirety of the proposed training and inference pipeline devised for the detection of WTB surface defects from high-resolution images. The proposed detection framework unfolds across two distinct phases, namely training and inference.

Let $I \in D_r$ represent a high-resolution image in the training partition of the database $D = \{D_r, D_t\}$. Pre-processing produces the set of non-overlapping image patches

$p_I = \{(p, b) \mid p \subseteq I,\ p \in \mathbb{R}^{K \times K \times 3},\ b \neq \emptyset\}$, (1)

where $b$ is the set of bounding boxes associated with each image patch $p$, and $K = 1024$, as established in Section 4.1. A model $\mathcal{M}$ is trained using the image patches pooled over the training set,

$P = \bigcup_{i \in [1, |D_r|]} p_{I_i}$, (2)

such that $\mathcal{M} \leftarrow \mathrm{train}(P)$.

We trained three baseline neural network architectures for evaluation and benchmarking purposes: YOLOv5 and RetinaNet, both known for their efficiency, and Faster-RCNN, renowned for its accuracy but requiring additional computational resources. YOLOv5 employs a compact yet practical architecture, featuring a deep CNN backbone with 21 convolutional layers (CSPDarknet21), supplemented with a feature pyramid network (PANet) and multiple detection heads for efficient object detection. Anchor boxes and a composite loss function play pivotal roles in its training process, with non-maximum suppression refining results during inference. RetinaNet, in turn, adopts a one-stage design that emphasizes efficiency and employs anchor boxes for region proposals. It utilizes a backbone CNN (ResNet50) and a Feature Pyramid Network (FPN) to capture the multi-scale features crucial for detecting objects of various sizes. Lastly, Faster-RCNN operates as a two-stage model, featuring a Region Proposal Network (RPN) for generating proposals and RoI Align layers for feature extraction from these proposals, utilizing ResNet50 as its backbone. Despite its accuracy, Faster-RCNN demands more computational resources.

These neural network architectures were employed to train surface defect detection models, with images pre-processed as outlined in Section 3.4. All models underwent training using the standard multi-class cross-entropy loss function:

$\mathcal{L}_{CE} = -\sum_{i=1}^{C} t_i \log(s_i)$. (3)

In this formula, $t_i$ and $s_i$ represent the ground truth label and the softmax probability for the $i$-th class of $C$ total classes, respectively. The function captures the concept of cross-entropy loss, a common loss function used in classification problems to measure the difference between the true distribution $t$ and the predicted distribution $s$.
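As a worked example of Equation (3), the snippet below computes the loss for a single sample with C = 5 classes; the logit values are arbitrary and purely illustrative.

```python
import numpy as np

logits = np.array([2.0, 0.5, 0.2, -1.0, 0.1])  # illustrative raw model scores
s = np.exp(logits - logits.max())
s /= s.sum()                                   # softmax probabilities s_i
t = np.array([1.0, 0.0, 0.0, 0.0, 0.0])        # one-hot ground truth t_i
loss = -np.sum(t * np.log(s))                  # Eq. (3): cross-entropy
print(f"L_CE = {loss:.4f}")                    # only the true class contributes
```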
3.6. Inference strategies

In the subsequent phase, the inference process unfolds through two distinct strategies: Scenario I and Scenario II. Each strategy is crafted to assess the proposed method's performance under unique conditions. Figure 3 visually delineates both scenarios, offering a graphical representation of their respective methodologies. Evaluating the method's performance under these contrasting conditions yields valuable insights into the proposed framework's practical viability and efficacy.

Figure 3: A visual example that showcases the disparity between Scenarios I and II, elucidating two distinct evaluation scenarios for the proposed method. In Scenario I, the model undergoes training and testing on pre-processed image patches, whereas in Scenario II, the model is directly tested on raw high-resolution images.

3.6.1. Scenario I: Patch-based inference

Scenario I revolves around constructing the test set from image patches, as expounded in Section 3.4. Within this framework, individual patches are fed into the trained model for inference. Essentially, Scenario I illustrates a configuration where the model is trained with image patches and evaluated on test patches that, while not identical, maintain the same patch size as those used in training. This setup adheres to the conventional paradigm of machine learning model training. However, it is crucial to acknowledge that Scenario I may segment extended defects, creating separate bounding boxes within distinct image patches, which may not align with practical feasibility. While the patch-based inference process facilitates swift processing, it necessitates additional post-processing steps for consolidating and identifying corresponding image patches.

3.6.2. Scenario II: Slice-aided inference

In Scenario II, heightened realism is achieved by employing unprocessed high-resolution images for the test set. This circumvents the need for manual pre-processing, instead employing an internal pre-processing mechanism intrinsic to the testing process, as demonstrated in Equation (4). This configuration offers notable advantages, notably enabling the direct utilization of high-resolution images for prediction and amalgamating multiple detected defects in the original image rather than treating them individually, thereby better reflecting the challenges encountered in real-world scenarios.

To effectively process a high-resolution image during inference, particularly within Scenario II, standard resizing or fixed cropping methods are suboptimal for two primary reasons: (1) small objects may become nearly invisible following such transformations, potentially eluding detection; and (2) the precision between overlapping objects may be significantly compromised after resizing the original image. Scenario II therefore implements a computational technique called "slice-aided hyper-inference" for processing high-resolution images, particularly for detecting small and medium-sized objects. The image $I' \in D_t$, drawn from the test partition, is segmented into $M \times N$ patches, denoted as $p'_{mn}$. This approach facilitates the analysis of each segment individually, improving inference accuracy and efficiency in handling high-resolution data.

To effectively manage the detection of surface defects, and in particular to avoid issues with disjointed defects across patch boundaries, the method incorporates overlapping sampling of patches using a sliding window technique. Each window samples patches with a defined overlap percentage $v$ between adjacent windows. This overlapping strategy maintains continuity in the areas of interest across patches, thereby reducing the risk of missing or misidentifying defects that span multiple patches. After the patches are extracted, they are resized to a uniform size of $W \times W$ pixels. This standardization is crucial for consistent processing and analysis, ensuring that all patches are subjected to the same scale regardless of their original dimensions. Finally, the resized patches undergo patch-level model inference, in which each patch is analyzed independently by the trained model, allowing for detailed and localized defect detection within the high-resolution image. The inference is carried out as follows:

$\hat{b}_{mn} \leftarrow \mathrm{inf}\left(\mathcal{M}\left(Z(p'_{mn}, W)\right)\right) \quad \forall m \in [1, M],\ n \in [1, N]$. (4)

Here, $\hat{b}_{mn}$ represents the set of output bounding boxes determined by the analysis, and $Z(\cdot)$ is a resizing module that standardizes each image patch $p'_{mn}$ to a uniform width $W$, which is crucial for consistent processing across different patches. Once the patches are processed, the bounding boxes with confidence scores exceeding the detection threshold $T_{det}$ are retained for further analysis. These selected boxes are then subjected to Non-Maximum Suppression (NMS), a method used to eliminate redundant bounding boxes. NMS is applied based on the Intersection over Union (IoU) metric, whereby boxes overlapping more than a predefined threshold $T_{nms}$ are consolidated. This step ensures that the final set of bounding boxes is non-redundant, optimizing the clarity and accuracy of the defect detection process.
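The sketch below illustrates this slice-aided pipeline end to end, under our own assumptions about the interfaces: `model(patch)` is a stand-in returning (box, score, class) triples in patch coordinates, and the greedy class-wise NMS is one common variant of the consolidation step described above.

```python
import cv2
import numpy as np

def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def slice_aided_inference(image, model, w=800, v=0.1, t_det=0.001, t_nms=0.5):
    """Eq. (4) in code: overlapping sliding windows, per-patch inference,
    boxes shifted back to image coordinates, then class-wise greedy NMS."""
    h, wid = image.shape[:2]
    stride = max(1, int(w * (1 - v)))              # v = overlap fraction
    detections = []
    for y0 in range(0, max(h - w, 0) + 1, stride):
        for x0 in range(0, max(wid - w, 0) + 1, stride):
            patch = image[y0:y0 + w, x0:x0 + w]
            patch = cv2.resize(patch, (w, w))      # Z(.): uniform W x W size
            for (x1, y1, x2, y2), score, cls in model(patch):
                if score >= t_det:                 # keep confident boxes only
                    detections.append(
                        ((x1 + x0, y1 + y0, x2 + x0, y2 + y0), score, cls))
    detections.sort(key=lambda d: d[1], reverse=True)
    kept = []                                      # consolidate duplicates
    for box, score, cls in detections:
        if all(box_iou(box, kb) < t_nms
               for kb, _, kc in kept if kc == cls):
            kept.append((box, score, cls))
    return kept
```

For brevity, the sketch ignores the partial windows at the right and bottom edges when the image size is not a multiple of the stride; a full implementation would pad or shift the final window.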
4. Results

Experiments utilized the DTU-Drones dataset for evaluation. The following subsections detail the outcomes corresponding to the two scenarios outlined in Section 3.6, including class-wise mean average precision and comparisons among small, medium, and large objects across both scenarios.

4.1. Evaluation details

To assess the efficacy of the two methodologies, the dataset was divided into three parts: training, validation, and testing, with proportions of 70%, 15%, and 15%, respectively. In the first scenario (Scenario I), we divided the original high-resolution images into patches measuring 1024 pixels on each side, as detailed in Table 1.

Table 1: Comparison of patch sizes employed for training and inference in experiments where YOLOv5 serves as the baseline model. For Scenario I, the table shows results on the validation set, where K denotes the patch resolution and the sample count corresponds to the training dataset used. For Scenario II, W denotes the window size, with validation-set outcomes obtained using $v = 0.1$, $T_{nms} = 0.5$, and $T_{det} = 0.001$.

Scenario I:
| K | Images | Labels | mAP@0.5 | mAP@0.5-0.95 |
|------|--------|--------|---------|--------------|
| 640 | 878 | 1045 | 78.7 | 44.2 |
| 800 | 756 | 943 | 79.4 | 42.5 |
| 1024 | 588 | 786 | 84.5 | 45.9 |
| 2048 | 352 | 656 | 84.4 | 47.3 |

Scenario II:
| W | Patches | mAP@0.5 | mAP@0.5-0.95 |
|------|---------|---------|--------------|
| 511 | 83 | 81.4 | 44.4 |
| 799 | 39 | 82.1 | 44.7 |
| 1023 | 23 | 81.9 | 44.6 |
| 2047 | 5 | 80.6 | 43.8 |

This patch size was chosen to facilitate manageable training under resource constraints while preserving the image's intricate details, as discussed in Section 3.4. In the second scenario (Scenario II), detailed experimentation showed that a window width (W) of approximately 800 pixels yielded the best results, as indicated in Table 1.

4.2. Performance metric

To assess the effectiveness of our models, we evaluated their performance on the test partition by calculating the mean average precision (mAP). This metric was examined at the commonly used 0.5 IoU threshold (mAP@.50), typical in object detection analyses, and through a more detailed measure spanning IoU thresholds from 0.5 to 0.95 in increments of 0.05 (mAP@.5-.95). In alignment with the criteria set by the COCO challenge, we analyzed performance specifically for small, medium, and large objects. For this purpose, mAP@.5-.95s was calculated for small objects (area < 32²), mAP@.5-.95m for medium objects (32² < area < 96²), and mAP@.5-.95l for large objects (area > 96²).
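For clarity, these COCO-style size buckets can be expressed as a small helper, which we add purely for illustration (areas are in pixels):

```python
def coco_size_bucket(box):
    """Assign a box to the COCO size category used in the
    mAP@.5-.95 small/medium/large breakdown."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area < 32 ** 2:       # small: area < 32^2
        return "small"
    if area < 96 ** 2:       # medium: 32^2 <= area < 96^2
        return "medium"
    return "large"           # large: area >= 96^2
```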
4.3. Training configurations

In our experiments, training was performed with a batch size of 8, using the Stochastic Gradient Descent (SGD) optimizer. Learning rates were set according to standard practices for Faster-RCNN, RetinaNet, and YOLOv5. Baseline models were sourced from the Detectron2 and Ultralytics repositories. All experiments were conducted on a system equipped with an Intel i5 processor and a single NVIDIA RTX 4060 GPU.

4.4. Experimental results

4.4.1. Overall results

In Table 2, the YOLOv5 model demonstrates stable performance, with a maximum variation of 1.7 points observed in the "large" object size category.

Table 2: mAP@.5-.95 for Scenario I and Scenario II for small, medium, and large objects on the DTU test set.

| Models | Scenario I Small | Scenario I Medium | Scenario I Large | Scenario II Small | Scenario II Medium | Scenario II Large | Δ(Small) | Δ(Medium) | Δ(Large) |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv5 | 28.3 | 48.5 | 52.4 | 26.2 | 49.1 | 54.1 | -2.1 | 0.7 | 1.8 |
| Faster-RCNN | 17.0 | 43.3 | 64.8 | 30.2 | 49.5 | 51.5 | 14.2 | 7.2 | -13.3 |
| RetinaNet | 14.3 | 37.7 | 66.6 | 19.1 | 44.5 | 51.1 | 5.8 | 7.8 | -16.5 |

In contrast, Faster-RCNN displays more significant fluctuations, achieving notable improvements for "small" and "medium" objects of 14.2 and 7.2 points, respectively. However, it also shows a marked decrease of 13.3 points for "large" objects, indicating its sensitivity to scenario changes and significant variability across different object sizes. RetinaNet also shows improved performance for "small" and "medium" objects, with increases of 5.8 and 7.8 points, but experiences a drop of 15.5 points for "large" objects, mirroring the trend seen in Faster-RCNN. Overall, the comparison of detection performance across small, medium, and large objects in Scenarios I and II makes clear that the proposed framework in Scenario II significantly enhances performance on smaller objects when using the Faster-RCNN and RetinaNet models.

Table 3 delineates the comparative effectiveness of the baseline models across Scenarios I and II using two key metrics.

Table 3: Overall mAP@.5 and mAP@.5-.95 for Scenarios I and II on the DTU test set.

| Models | mAP@.50 Scenario I | mAP@.50 Scenario II | mAP@.5-.95 Scenario I | mAP@.5-.95 Scenario II |
|---|---|---|---|---|
| YOLOv5 | 82.3 | 86.1 | 42.7 | 45.2 |
| Faster-RCNN | 74.2 | 84.4 | 38.8 | 44.1 |
| RetinaNet | 71.6 | 71.4 | 33.9 | 38.9 |

In Scenario I, YOLOv5 achieves a mAP@.50 of 82.3, which increases to 86.1 in Scenario II. Similarly, Faster-RCNN exhibits a rise from 74.2 to 84.4 when moving from Scenario I to Scenario II. In contrast, RetinaNet records a slight decrease in mAP@.50, from 71.6 to 71.4. For the mAP@.5-.95 metric, all models show enhancements: YOLOv5 improves from 42.7 to 45.2, Faster-RCNN from 38.8 to 44.1, and RetinaNet from 33.9 to 38.9 when transitioning from Scenario I to Scenario II. The results indicate that RetinaNet, along with the other models, benefits significantly under the optimized conditions of Scenario II, particularly in detecting small and medium objects within high-resolution imagery captured by drones. This suggests that Scenario II surpasses the typical configurations in previous studies, enhancing overall detection capabilities.

4.4.2. Class-wise results

Table 4 presents additional observations.

Table 4: Class-wise mAP@.5-.95 for Scenario I and Scenario II on the DTU test set.

| Classes | YOLOv5 Scenario I | YOLOv5 Scenario II | Faster-RCNN Scenario I | Faster-RCNN Scenario II | RetinaNet Scenario I | RetinaNet Scenario II |
|---|---|---|---|---|---|---|
| ER | 44.7 | 45.0 | 39.9 | 48.2 | 37.6 | 47.9 |
| DA | 39.7 | 31.5 | 26.7 | 31.2 | 21.7 | 21.4 |
| CR | 28.0 | 54.9 | 31.7 | 49.8 | 24.7 | 31.2 |
| PO | 58.8 | 57.4 | 56.6 | 54.2 | 53.3 | 59.0 |
| MT | 42.4 | 38.1 | 38.7 | 37.3 | 32.0 | 35.1 |

From Table 4, in the case of YOLOv5 under our proposed framework (Scenario II), there is a nearly twofold improvement in performance for the CR class, although decreases are observed for the MT and DA classes. For Faster-RCNN, significant enhancements are noted in the ER, DA, and CR classes with the adoption of our framework. Conversely, RetinaNet shows performance gains in all classes except the DA class.
Samples in the DA class usually represent minor defects on the WTB surface, which the slice-aided setup is designed to handle well. Figure 4 provides a graphical comparison that highlights class-specific performance differences between Scenario I and Scenario II for the evaluated models. Notably, YOLOv5 consistently improves, particularly in the CR class within Scenario II. In contrast, Faster-RCNN demonstrates enhanced performance across three distinct classes under Scenario II conditions, indicating increased reliability. Furthermore, RetinaNet exhibits strong gains in Scenario II for all classes except DA, which can largely be attributed to its focal loss function; focal loss effectively tackles class imbalances within the dataset. In addition, Figure 5 presents the precision-recall curves for the DTU test set, highlighting the effectiveness of our method in detecting surface defects on WTBs.

Figure 4: The overlapping area in purple illustrates the IoU performance comparison at thresholds from 0.5 to 0.95 for YOLOv5, RetinaNet, and Faster-RCNN across different classes in the WTB dataset.

The curves in Figure 5 capture two key performance metrics: precision, which indicates the accuracy of the detections, and recall, which measures the method's ability to identify all relevant defects in the images.

Figure 5: Precision-recall curves at various IoU thresholds.

The curves illustrate the method's performance under various conditions by employing different IoU thresholds, namely C75 (IoU threshold of 0.75), C65 (IoU threshold of 0.65), C50 (IoU threshold of 0.5), and C30 (IoU threshold of 0.3). These thresholds are instrumental in assessing the robustness of the method and the trade-offs between precision and recall at various levels. Such insights are vital for refining and optimizing defect detection methods to enhance accuracy and efficiency in real-world applications.

4.4.3. Visual comparisons

Figure 6 compares key results from Scenarios I and II, using a single trained baseline model and focusing on small objects from the dataset.

Figure 6: Visual comparison of inference strategies, showcasing prediction results across different scenarios.

Close analysis reveals that our proposed framework significantly enhances the ability to detect defects, particularly in cases where Scenario I might miss or inadequately detect them. This improvement is illustrated in Figure 6: in the second row, second column, the model under Scenario I completely misses a DA defect, whereas this defect is successfully detected in Scenario II, as shown in the second row, third column. The defect remains unnoticed in a 1024-pixel setting but becomes apparent in the 800-pixel context used in Scenario II. These findings underscore the efficacy of a multi-scale image processing strategy. However, it is essential to recognize that both scenarios still encounter challenges, especially with certain defect classes that are difficult to localize. This issue is also highlighted in Figure 6, where a PO defect at the edge of the image in the second row, third column, poses localization challenges for the model in Scenario II. This example demonstrates the complex challenges present in analyzing drone-captured imagery.

4.4.4. Efficiency

Given the complexities involved in slice-aided inference, it naturally leads to longer inference times when processing a full-size image.
In Scenario II, we observed an average inference time of 0.418 seconds per patch using the YOLOv5 model, equating to approximately 27.6 seconds for an entire full-size image. However, the inference duration could be shortened by selectively processing only specific patches. A comparative analysis of inference speeds between the two scenarios reveals that Scenario I is more efficient. This efficiency advantage stems from Scenario I's patch-based processing, which contrasts with the slower performance observed in Scenario II. The latter's extended processing time is primarily due to its detailed analysis, which thoroughly considers predictions from the original high-resolution images.

The complexity of object detection models can be gauged by the number of parameters they incorporate. YOLOv5, known for its simplicity, utilizes approximately 7.2 million parameters. In contrast, RetinaNet features around 32 million parameters, and Faster-RCNN is even more complex, with about 38 million parameters.

5. Discussion

This study explored two distinct methodologies for evaluating our defect detection technique tailored for WTB inspections. Scenario I utilized segmented patches from images for both training and testing. This method proved fast but had the drawback of occasionally missing defects that span multiple patches, resulting in fragmented detections of a single defect. Additionally, detecting small objects is challenging due to the high resolution of the aerial imagery and the variable scale of objects caused by the drone's varying distance from the target. Common strategies to address this issue in high-resolution images involve randomly cropping or rescaling images before they are introduced to the model for training and testing. Nevertheless, these tactics may still result in poor representation of objects during the training phase. Alternatively, we considered segmenting the images into smaller patches for direct application in both the training and testing phases.

On the other hand, Scenario II evaluated our method on raw, high-resolution images. This approach successfully identified defects overlooked in Scenario I, especially those that were small or spanned multiple patches. Performance-wise, YOLOv5 demonstrated consistent results in both scenarios, with slight improvements for medium and large objects in Scenario II. Faster-RCNN showed substantial enhancements in detecting small and medium objects in Scenario II, though its efficiency declined for larger objects. Likewise, RetinaNet improved its detection of small and medium-sized objects in Scenario II but struggled with larger objects. The comparative analysis summarized in Table 3 underscores that the proposed method in Scenario II consistently elevates the performance of YOLOv5, Faster-RCNN, and RetinaNet across various metrics. Our approach could significantly boost defect detection in practical applications, especially for smaller objects. The technique is versatile for both on-shore and off-shore operations, requiring only an image of the turbine blade.

6. Conclusions

This paper presents a robust framework specifically designed for detecting surface defects on WTBs using high-resolution imagery. Our proposed slice-aided inference method significantly improves the detection accuracy of small and medium-sized defects in high-resolution UAV-captured images.
The experimental results show that our framework, particularly under Scenario II conditions, enhances the performance of deep learning models such as YOLOv5, Faster-RCNN, and RetinaNet. For instance, YOLOv5 achieved a mAP@.5-.95 of 45.2% in Scenario II, compared to 42.7% in Scenario I. Similarly, Faster-RCNN improved from 38.8% in Scenario I to 44.1% in Scenario II, and RetinaNet showed an increase from 33.9% to 38.9%.

Despite these significant improvements, the proposed framework has some limitations. One of the main challenges is the increased computational cost associated with slice-aided inference, which leads to longer processing times. For example, the average inference time in Scenario II was 27.6 seconds per full-size image, substantially higher than in Scenario I. Additionally, the method still faces difficulties in detecting certain defect types, such as paint-off defects at image edges, which can affect localization accuracy. Future research should focus on addressing these limitations by optimizing the inference process to reduce computational overhead and improving the detection algorithms to better handle edge cases and complex backgrounds.

References

[1] F. R. Alharbi, D. Csala, Gulf cooperation council countries' climate change mitigation challenges and exploration of solar and wind energy resource potential, Appl. Sci. 11.6 (2021) 2648. doi:10.3390/app11062648.
[2] M. Ikram, R. Sroufe, Q. Zhang, M. Ferasso, Assessment and prediction of environmental sustainability: novel grey models comparative analysis of China vs. the USA, Environ. Sci. Pollut. Res. 28 (2021) 17891–17912. doi:10.1007/s11356-020-11418-3.
[3] J. Mamkhezri, M. Khezri, Assessing the spillover effects of research and development and renewable energy on CO2 emissions: international evidence, Environ., Dev. Sustain. 26 (2023) 7657–7686. doi:10.1007/s10668-023-03026-1.
[4] E. Hernandez-Estrada, O. Lastres-Danguillecourt, J. B. Robles-Ocampo, A. Lopez-Lopez, P. Y. Sevilla-Camacho, B. Y. Perez-Sariñana, J. R. Dorrego-Portela, Considerations for the structural analysis and design of wind turbine towers: A review, Renew. Sustain. Energy Rev. 137 (2021) 110447. doi:10.1016/j.rser.2020.110447.
[5] K. A. Adeyeye, N. Ijumba, J. Colton, The effect of the number of blades on the efficiency of a wind turbine, IOP Conf. Ser. 801.1 (2021) 012020. doi:10.1088/1755-1315/801/1/012020.
[6] C. Cieslak, A. Shah, B. Clark, P. Childs, Wind-turbine inspection, maintenance and repair robotic system, in: ASME Turbo Expo 2023: Turbomachinery Technical Conference and Exposition, American Society of Mechanical Engineers, New York, NY, USA, 2023, pp. 1–11. doi:10.1115/gt2023-101713.
[7] Y. Du, S. Zhou, X. Jing, Y. Peng, H. Wu, N. Kwok, Damage detection techniques for wind turbine blades: A review, Mech. Syst. Signal Process. 141 (2020) 106445. doi:10.1016/j.ymssp.2019.106445.
[8] O. Melnychenko, O. Savenko, A self-organized automated system to control unmanned aerial vehicles for object detection, in: Proceedings of the 4th International Workshop on Intelligent Information Technologies & Systems of Information Security (IntelITSIS'2023), CEUR-WS.org, Aachen, 2023, pp. 589–600.
[9] A. I. Panagiotopoulos, D. Tcherniak, S. D. Fassois, Damage detection on an operating wind turbine blade via a single vibration sensor: A feasibility study, in: Lecture Notes in Civil Engineering, Springer International Publishing, Cham, 2021, pp. 405–414. doi:10.1007/978-3-030-64908-1_38.
[10] S. Sun, T. Wang, F. Chu, In-situ condition monitoring of wind turbine blades: A critical and systematic review of techniques, challenges, and futures, Renew. Sustain. Energy Rev. 160 (2022) 112326. doi:10.1016/j.rser.2022.112326.
[11] Z. Liu, X. Liu, K. Wang, Z. Liang, J. A. F. O. Correia, A. De Jesus, GA-BP neural network-based strain prediction in full-scale static testing of wind turbine blades, Energies 12.6 (2019) 1026. doi:10.3390/en12061026.
[12] M. Shafiee, Z. Zhou, L. Mei, F. Dinmohammadi, J. Karama, D. Flynn, Unmanned aerial drones for inspection of offshore wind turbines: A mission-critical failure analysis, Robotics 10.1 (2021) 26. doi:10.3390/robotics10010026.
[13] W. Qi, Object detection in high resolution optical image based on deep learning technique, Nat. Hazards Res. 2.4 (2022) 384–392. doi:10.1016/j.nhres.2022.10.002.
[14] O. Melnychenko, L. Scislo, O. Savenko, A. Sachenko, P. Radiuk, Intelligent integrated system for fruit detection using multi-UAV imaging and deep learning, Sensors 24.6 (2024) 1913. doi:10.3390/s24061913.
[15] R. Yang, R. Wang, Y. Deng, X. Jia, H. Zhang, Rethinking the random cropping data augmentation method used in the training of CNN-based SAR image ship detector, Remote Sens. 13.1 (2020) 34. doi:10.3390/rs13010034.
[16] O. Pavlova, O. Halytskyi, Video repeater design concept for UAV control, Comput. Syst. Inf. Technol. 1 (2024) 33–38. doi:10.31891/csit-2024-1-4.
[17] O. Melnychenko, O. Savenko, P. Radiuk, Apple detection with occlusions using modified YOLOv5-v1, in: 2023 IEEE 12th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), IEEE, New York, NY, USA, 2023, pp. 107–112. doi:10.1109/idaacs58523.2023.10348779.
[18] L. Wang, Z. Zhang, Automatic detection of wind turbine blade surface cracks based on UAV-taken images, IEEE Trans. Ind. Electron. 64.9 (2017) 7293–7303. doi:10.1109/tie.2017.2682037.
[19] L. Wang, Z. Zhang, X. Luo, A two-stage data-driven approach for image-based wind turbine blade crack inspections, IEEE/ASME Trans. Mechatron. 24.3 (2019) 1271–1281. doi:10.1109/tmech.2019.2908233.
[20] L. Deng, Y. Guo, B. Chai, Defect detection on a wind turbine blade based on digital image processing, Processes 9.8 (2021) 1452. doi:10.3390/pr9081452.
[21] L. Peng, J. Liu, Detection and analysis of large-scale WT blade surface cracks based on UAV-taken images, IET Image Process. 12.11 (2018) 2059–2064. doi:10.1049/iet-ipr.2018.5542.
[22] M. Ruiz, L. E. Mujica, S. Alférez, L. Acho, C. Tutivén, Y. Vidal, J. Rodellar, F. Pozo, Wind turbine fault detection and classification by means of image texture analysis, Mech. Syst. Signal Process. 107 (2018) 149–167. doi:10.1016/j.ymssp.2017.12.035.
[23] A. Shihavuddin, M. R. A. Rashid, M. H. Maruf, M. A. Hasan, M. A. u. Haq, R. H. Ashique, A. A. Mansur, Image based surface damage detection of renewable energy installations using a unified deep learning approach, Energy Rep. 7 (2021) 4566–4576. doi:10.1016/j.egyr.2021.07.045.
[24] Y. Yu, H. Cao, X. Yan, T. Wang, S. S. Ge, Defect identification of wind turbine blades based on defect semantic features with transfer feature extractor, Neurocomputing 376 (2020) 1–9. doi:10.1016/j.neucom.2019.09.071.
[25] V. V. Morozov, O. V. Kalnichenko, O. Mezentseva, The method of interaction modeling on basis of deep learning the neural networks in complex IT-projects, Int. J. Comput. 19.1 (2020) 88–96. doi:10.47839/ijc.19.1.1697.
[26] A. Shihavuddin, X. Chen, DTU - Drone inspection images of wind turbine, Software, v. 2, Mendeley Data, 2018. doi:10.17632/hd96prn3nc.2.