<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimizing Defect Detection: A Machine Learning-Ready Data Processing Pipeline</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Loredana Cristaldi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilia Lenzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Martinenghi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Martiri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Moschetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Letizia Tanca</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Zanoni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Milano, DEIB</institution>
          ,
          <addr-line>20133 Milano</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Printed Circuit Boards (PCBs) are essential in modern electronics, but ensuring their quality is increasingly difficult due to complex designs and common soldering defects. While AOI and X-ray inspections automate detection, manual data handling still introduces inconsistencies. This study proposes a data processing pipeline that standardizes AOI and X-ray outputs to improve consistency and support machine learning. It integrates preprocessing, harmonizes defect labels, and leverages a relational database for automated storage and analysis. Tests on real-world data show reduced false positives and less need for manual verification.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Pipeline</kwd>
        <kwd>Anomaly Detection</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>PCB</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In automated quality control for printed circuit boards (PCBs), Automated Optical Inspection
(AOI) and Automated X-ray Inspection (AXI) generate vast amounts of heterogeneous data, including images,
structured test results, and metadata. However, making effective use of this data requires a
robust processing pipeline that handles data integration, cleaning, transformation, and storage
for subsequent analysis. Traditional AOI systems, while efficient in identifying potential defects,
suffer from high false-positive rates due to rule-based heuristics that fail to generalize across
different manufacturing conditions. Similarly, AXI data is often stored in unstructured formats,
making integration with AOI results complex and prone to inconsistencies [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The primary challenge is constructing a comprehensive data pipeline that consolidates
information from multiple sources, aligns defect detection outputs, and ensures consistency
across inspection methods. This requires structured database design, schema optimization, and
automated entity resolution techniques to match test results, component references, and board
identifiers [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        A further problem for defect detection is the imbalanced nature
of the data. Non-defective components vastly outnumber defective ones, skewing machine
learning models trained on raw data. Addressing this issue necessitates a preprocessing step
that balances the dataset while preserving manufacturing variability [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>This paper presents a data pipeline that efficiently processes, integrates, and stores AOI and
AXI data, creating a structured foundation for defect detection that uses expert analysis
as ground truth. We detail the steps involved in data transformation, storage,
and retrieval, providing a scalable framework applicable to large-scale PCB manufacturing
environments. We demonstrate the effectiveness of our process by showing how the data
preparation pipeline enables the successful application of Random Forest (RF) and Convolutional
Neural Network (CNN) models in various configurations for defect detection. This approach
raises precision from 3.03% to over 80% in the binary classification
task. Furthermore, our approach introduces monitoring of false negatives, a crucial aspect
currently overlooked in the AOI system.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Automated defect detection in Printed Circuit Boards (PCBs) has transitioned from rule-based
methods to data-driven approaches. Early systems relied on handcrafted feature extraction and
rule-based classification, but these methods struggled with variations in lighting, alignment,
and defect complexity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Machine learning techniques, such as Support Vector Machines
(SVMs) and RF, improved classification accuracy by leveraging structured defect data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], but
they remained limited in handling real-world manufacturing variability.
      </p>
      <p>
        Deep learning further advanced PCB defect detection, with CNNs excelling in feature
extraction from AOI images [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. While CNN-based models outperform traditional methods, they
require extensive labeled datasets, making data preprocessing and integration crucial [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Object detection models such as YOLO and Faster R-CNN have also been
applied for precise defect localization, balancing accuracy and computational efficiency [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
However, challenges persist due to class imbalance and dataset limitations, often addressed
through data augmentation and transfer learning [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Beyond algorithmic improvements, the effectiveness of defect detection hinges on structured
data pipelines that integrate AOI, X-ray, and manual inspection data. Recent studies emphasize
the importance of data alignment across heterogeneous sources to reduce false positives and
enhance defect classification reliability [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This work builds on these findings by proposing a
data pipeline solution that ensures consistency in defect detection across multiple inspection
methods.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Data Processing Pipeline</title>
      <p>The proposed methodology addresses the integration and preprocessing of heterogeneous data
sources used in Printed Circuit Board (PCB) defect detection. The pipeline consists of three
main phases:
• Data transformation: Preprocessing to standardize formats and correct inconsistencies.
• Exploratory Data Analysis (EDA): Understanding data distribution and identifying key
challenges.
• Data integration: Merging information from different inspection sources into a unified
dataset.</p>
      <p>Each phase presents distinct challenges, which are addressed through strategies that improve
the overall reliability of defect classification models by ensuring high-quality input data, reducing
inconsistencies, and providing a structured foundation for further analysis and defect detection
algorithms.</p>
      <p>Given the increasing complexity of PCB designs and the higher accuracy demanded in defect
detection, traditional single-source inspection approaches are insufficient. By integrating AOI,
X-ray, and manual inspection data, our pipeline establishes a high-resolution defect detection
framework that minimizes false positives and maximizes classification accuracy. However,
achieving this level of integration requires addressing multiple issues related to data format
inconsistencies, identifier mismatches, and varying defect classification criteria.</p>
      <p>In the following, we describe each phase of the pipeline in detail.</p>
      <p>Data Transformation: Data was collected from multiple sources, including Automated
Optical Inspection (AOI) images, tabular test results, X-ray scans, PCB layout files, and operator
manual evaluations, which served as the ground truth for supervised learning models. These
data sources exhibited inconsistencies in structure, identifier formats, and missing
information, which made direct integration impossible and required a comprehensive transformation
strategy.</p>
      <p>To standardize AOI images, each image, which represents a portion of the whole board, was
systematically renamed following a unified naming convention to enable analysis by board
sub-region. However, inconsistencies in the existing naming conventions made automated alignment difficult.
To address this, custom scripts were developed to match images with component identifiers
based on spatial metadata and inferred relationships. PCB layout diagrams, available as PDF
files, lacked coordinate-based information, requiring manual mapping of component names to
their corresponding AOI and test results. Similarly, X-ray inspection data, originally stored in
HTML format with embedded images, required custom parsing scripts to extract structured
defect classifications and unique identifiers for the inspected PCB. Each entry was enriched
with inspection dates, retrieved from the timestamp of the X-ray inspection, and details of
the imaging equipment used. Since many entries had inconsistent labeling, an adaptive
text-matching approach was necessary to unify defect categories. For identifier standardization,
Locality-Sensitive Hashing (LSH) was used to group similar Board IDs and identify formatting
patterns, and regular expressions were then applied to extract relevant components based on
these patterns to resolve discrepancies between AOI and X-ray test results. This significantly
improved cross-referencing, although some cases involving ambiguous abbreviations
remained unresolved.</p>
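      <p>The identifier standardization step can be illustrated with a minimal sketch. It is not the implementation used in this work: the Board ID formats, the shape encoding (letters mapped to A, digits to 9), the MinHash banding parameters, and the regular expression are all assumptions chosen for illustration. IDs that share a formatting pattern fall into the same LSH bucket, and a pattern-specific regular expression then rewrites them into one canonical spelling.</p>
      <preformat>
```python
import hashlib
import re
from collections import defaultdict


def shape(board_id):
    """Reduce an ID to its formatting pattern: letters -> A, digits -> 9."""
    s = re.sub(r"[A-Za-z]", "A", board_id.strip())
    return re.sub(r"[0-9]", "9", s)


def shingles(text, k=3):
    """Set of overlapping character k-grams (the whole string if it is short)."""
    if k >= len(text):
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash(items, num_hashes=16):
    """MinHash signature: for each seeded hash, keep the minimum over the set."""
    return tuple(
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in items)
        for seed in range(num_hashes)
    )


def lsh_groups(board_ids, bands=4, rows=4):
    """Group IDs whose signatures agree on at least one band (the LSH step)."""
    buckets = defaultdict(set)
    for bid in board_ids:
        sig = minhash(shingles(shape(bid)), num_hashes=bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(bid)
    # Keep only buckets that actually group several IDs together.
    return {frozenset(g) for g in buckets.values() if len(g) > 1}


# Hypothetical pattern written after inspecting one group of similar IDs.
BOARD_ID = re.compile(r"^([A-Za-z]+)[-_ ]?(\d+)[-_ ]?([A-Za-z])$")


def canonical(board_id):
    """Rewrite an ID into one canonical spelling, or None if it does not match."""
    m = BOARD_ID.match(board_id.strip())
    if m is None:
        return None
    family, serial, rev = m.groups()
    return f"{family.upper()}-{int(serial):05d}-{rev.upper()}"
```
      </preformat>
      <p>Under these assumptions, canonical("pcb_123_a") and canonical("PCB00123A") both return PCB-00123-A, while an ID in a different format, such as 00123/PCB, returns None and would need its own pattern.</p>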
      <p>Despite these standardization efforts, several challenges remained. AOI image resolution
varied across inspections, complicating defect localization. Additionally, missing test result
entries and unexpected values without semantic meaning introduced biases, which required
mitigation through a semi-supervised labeling approach based on past inspection data.
Furthermore, bounding boxes were manually placed around components in AOI images to provide
training data for supervised models. However, this process was labor-intensive, highlighting
the need for automated annotation in future iterations.</p>
      <p>Exploratory Data Analysis: EDA was performed to evaluate the effectiveness of the data
transformation and identify key issues within the dataset.</p>
      <p>First, discrepancies between AOI and manual inspections exposed systematic differences
in defect assessment criteria. While AOI tended to over-detect minor imperfections, human
inspectors focused on severe issues, necessitating a harmonization strategy to align defect
classifications across inspection methods.</p>
      <p>The analysis revealed that AOI systems detected significantly more defects than human
inspectors, indicating a high false-positive rate. These false positives often resulted from minor
solder irregularities flagged by pre-set AOI thresholds, which required a statistical analysis to
refine these thresholds dynamically. Moreover, Table 1 highlights the limitations of the AOI
system when considering the operator’s inspection as the ground truth. Notably, the AOI system
exhibits a high false positive rate, incorrectly identifying a significant number of components as
defective when they are not (29,481 cases). This results in particularly low precision (≈ 3.03%),
potentially leading to wasted resources spent on inspecting components that were wrongly
flagged as defective. On the other hand, the zero associated with false negatives (meaning no
instances where the AOI system missed a defect identified by the operator) is misleading. This
does not indicate that the system is foolproof in detecting all defects; rather, it reflects a gap in
the verification process: false negatives are not systematically checked in the current workflow.
Nevertheless, such errors, although not directly reflected in the table, have a significant negative
impact on business processes, as undetected defects may result in additional costs and potential
non-conformities in production.</p>
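      <p>The precision figure follows directly from the counts reported in Table 1. The short check below uses only those four counts; the variable names are illustrative. It also shows why the zero false negatives are misleading: recall evaluates to 100% only because missed defects are never checked.</p>
      <preformat>
```python
# Counts reported in Table 1 (operator inspection taken as ground truth).
tp = 920          # AOI: defect,    operator: defect
fp = 29_481       # AOI: defect,    operator: no defect
fn = 0            # AOI: no defect, operator: defect (not verified in the workflow)
tn = 2_830_061    # AOI: no defect, operator: no defect

precision = tp / (tp + fp)   # share of AOI defect calls confirmed by the operator
recall = tp / (tp + fn)      # trivially 1.0 because fn is recorded as 0

print(f"precision = {precision:.2%}")   # precision = 3.03%
print(f"recall    = {recall:.2%}")      # recall    = 100.00%
```
      </preformat>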
      <p>Furthermore, the analysis of the X-ray dataset revealed an extreme class imbalance, with
defective components accounting for less than 1% of inspected samples. This imbalance
presented challenges for machine learning models, prompting the consideration of oversampling,
synthetic defect generation, and augmentation strategies as mitigation techniques.</p>
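      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>AOI results compared with operator inspection (ground truth).</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th />
              <th>Operator: No Defect (0)</th>
              <th>Operator: Defect (1)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <th>AOI: No Defect (0)</th>
              <td>2,830,061</td>
              <td>0</td>
            </tr>
            <tr>
              <th>AOI: Defect (1)</th>
              <td>29,481</td>
              <td>920</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-3-4">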
        <p>In this context, a key challenge was the lack of ground truth labels for many defect cases.
Operator classifications were prioritized as the most reliable, yet inconsistencies in their
evaluation introduced subjectivity. Additionally, defects identified by X-ray were not always detected
in AOI, revealing limitations in purely visual inspection techniques. This discrepancy arises
because X-ray inspection is typically used for components with solder joints located beneath
them, which are not visible through AOI alone. Consequently, combining AOI and X-ray
analysis ensures full defect coverage, overcoming the limitations of individual inspection methods.
Cross-validation of AOI and X-ray results was explored to enhance defect detection reliability.</p>
        <p>Data Integration: The final stage of the pipeline involved merging multiple inspection data
sources into a structured dataset. To achieve this, identifier matching was carried out, where
AOI and X-ray test results were aligned using probabilistic matching techniques and heuristics,
improving cross-system consistency. Subsequently, the defect labels from AOI and manual
inspection were standardized by mapping the descriptions to a unified defect taxonomy.</p>
        <p>Despite these efforts, identifier mismatches persisted due to variations in board numbering
conventions.</p>
        <p>In the following sections, we will explore the additional steps undertaken to prepare the final
dataset for the machine learning task and describe the final structure of the designed database.</p>
        <sec id="sec-3-4-1">
          <title>3.1. Additional preprocessing actions</title>
          <p>As discussed in the previous sections, the initial statistical assessment of the dataset
revealed a severe class imbalance, with defective components accounting for less than 1% of
all inspected samples. Some defect categories were significantly underrepresented, posing a
challenge for machine learning models prone to favoring majority classes. To address this,
two dataset configurations were developed: (i) Full Defect Set: Preserved all defect categories,
maintaining the original imbalance. While this configuration provided the most complete defect
representation, it required additional rebalancing techniques to prevent model bias; (ii) Selected
Subset: Focused only on the four most frequently occurring defect types, ensuring a more even
class distribution and improving classification reliability for high-occurrence defects.</p>
          <p>Moreover, as shown earlier, the available data included structured test results and unstructured
AOI/X-ray images, each requiring different preprocessing steps. To ensure high-quality input for
machine learning models, AOI and X-ray images were normalized and standardized to address
variations in lighting and exposure across different inspection conditions. In addition, bounding
box coordinates were adjusted based on statistical outlier detection, reducing annotation errors
in defect localization.</p>
          <p>More precisely, AOI images were systematically linked to defect annotations stored in the
structured database, ensuring consistency across multiple inspection sources. Labels were
assigned based on operator-confirmed defects, reducing misclassification risks. To address the
issue of class imbalance, a controlled sampling strategy was used to pair defective components
with comparable non-defective ones, maintaining representativeness while preventing model
bias and ensuring balanced training data.</p>
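          <p>A controlled sampling strategy of this kind can be sketched as follows. The sketch assumes that "comparable" means "same component type" and that records are dictionaries with hypothetical component_type and defect fields; the actual matching criteria used in this work may differ.</p>
          <preformat>
```python
import random
from collections import defaultdict


def pair_sample(records, seed=0):
    """For each defective record, draw one unused non-defective record of the
    same component type, giving a 1:1 balanced training set."""
    rng = random.Random(seed)

    # Pool of non-defective candidates, grouped by component type.
    pool = defaultdict(list)
    for r in records:
        if r["defect"] == 0:
            pool[r["component_type"]].append(r)
    for candidates in pool.values():
        rng.shuffle(candidates)

    balanced = []
    for r in records:
        # Defective records without a comparable candidate are skipped.
        if r["defect"] == 1 and pool[r["component_type"]]:
            balanced.append(r)
            balanced.append(pool[r["component_type"]].pop())
    return balanced
```
          </preformat>
          <p>Sampling without replacement within each component type keeps the non-defective examples distinct, and fixing the seed makes the selection reproducible.</p>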
          <p>Image preprocessing standardized resolution and intensity levels, addressing exposure
variations. Data augmentation techniques—including brightness adjustments and Gaussian
noise—were applied to improve model robustness under real-world conditions.</p>
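          <p>The two augmentations named above can be sketched on a grayscale image stored as nested lists of 8-bit pixel values. The brightness range and noise standard deviation below are illustrative assumptions, not the values used in this work, and an array library would normally replace the list comprehensions.</p>
          <preformat>
```python
import random


def adjust_brightness(image, factor):
    """Scale every pixel by `factor` and clip to the valid 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`, then clip."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]


def augment(image, rng, brightness_range=(0.7, 1.3), sigma=8.0):
    """One random augmentation: brightness change followed by Gaussian noise."""
    factor = rng.uniform(*brightness_range)
    return add_gaussian_noise(adjust_brightness(image, factor), sigma, rng)


rng = random.Random(0)                      # fixed seed for reproducibility
image = [[0, 100, 200], [50, 150, 255]]     # toy 2x3 grayscale image
augmented = augment(image, rng)
```
          </preformat>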
          <p>These preprocessing steps ensured that defect localization models could handle imaging
inconsistencies and generalize across different manufacturing conditions.</p>
          <p>As for the structured test results, the raw inspection data were transformed into
feature vectors that represented several key aspects. These included component-level attributes
such as board type, component type, test frequency, and past defect history.</p>
          <p>Inspection metadata was incorporated, which included details like the test method (whether
it was AOI, manual, or X-ray, when available), the inspection timestamp, and defect confidence
scores.</p>
          <p>Finally, defect classification features were added, which consisted of encoded defect labels,
failure severity, and the likelihood of misclassification.</p>
          <p>Feature selection was performed using mutual information ranking, ensuring that only the
most relevant predictors were retained for defect classification. Additionally, dimensionality
reduction via Principal Component Analysis (PCA) was tested to improve computational efficiency
without sacrificing classification accuracy.</p>
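          <p>Mutual information ranking can be sketched for discrete features as below. The estimator uses empirical frequencies and natural logarithms; the feature names in the example are hypothetical, and continuous features would first need to be binned. This is an illustration of the technique, not the exact procedure used in this work.</p>
          <preformat>
```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """I(X;Y) in nats for two equally long sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_features(features, labels):
    """Return feature names sorted by decreasing mutual information with labels."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


labels = [0, 0, 1, 1]
features = {
    "board_side": [0, 1, 0, 1],      # independent of the label: MI = 0
    "past_defect": [0, 0, 1, 1],     # identical to the label: MI = ln 2
}
ranking = rank_features(features, labels)   # ["past_defect", "board_side"]
```
          </preformat>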
        </sec>
        <sec id="sec-3-4-2">
          <title>3.2. Final Dataset Structure</title>
          <p>The final dataset was designed to support both structured queries and machine learning-based
defect detection. It comprises a tabular dataset, which functions as a structured database
storing the inspection results, linking each component to its corresponding defect classification,
metadata, and historical inspection outcomes, and an image dataset, consisting of cropped and
preprocessed AOI/X-ray images that were paired with the corresponding defect annotations,
formatted to serve as input for deep learning models.</p>
          <p>To facilitate scalability and rapid access, the database schema was optimized specifically
for machine learning applications. One key optimization was indexing, which was applied
to frequently queried attributes such as component IDs and defect categories, allowing for
quicker data retrieval. Additionally, partitioning was implemented for tables storing historical
inspection data, with partitions organized by time. This approach improved query speed and
enabled real-time defect monitoring. To further optimize storage, hybrid storage strategies were
employed; rather than storing the images directly in the database, file path references were
used. This not only reduced storage overhead but also ensured seamless access to the image
datasets.</p>
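          <p>The indexing and file-path-reference choices can be sketched with an in-memory SQLite database. The table, column, and index names, as well as the sample row, are hypothetical and do not reproduce the actual schema. Time-based partitioning is omitted because SQLite has no native table partitioning; a server database would be needed for that part.</p>
          <preformat>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inspection (
    inspection_id   INTEGER PRIMARY KEY,
    component_id    TEXT NOT NULL,
    defect_category TEXT NOT NULL,
    inspected_at    TEXT NOT NULL,   -- ISO 8601 timestamp
    image_path      TEXT             -- file path reference, not the image bytes
);
-- Indexes on the frequently queried attributes.
CREATE INDEX idx_inspection_component ON inspection (component_id);
CREATE INDEX idx_inspection_defect    ON inspection (defect_category);
""")

conn.execute(
    "INSERT INTO inspection (component_id, defect_category, inspected_at, image_path) "
    "VALUES (?, ?, ?, ?)",
    ("C12", "bridging", "2024-01-15T08:30:00", "images/aoi/board42_C12.png"),
)

# Retrieve the image reference for one component via the indexed column.
rows = conn.execute(
    "SELECT image_path FROM inspection WHERE component_id = ?", ("C12",)
).fetchall()
```
          </preformat>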
          <p>This structured approach ensures efficient defect retrieval, supporting both traditional
statistical analyses and modern AI-driven defect classification methods.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Machine Learning for Defect Detection</title>
      <p>The generated dataset was used to train and evaluate various machine learning models for
classifying defective and non-defective components. Following an initial exploratory phase
that tested different model architectures, the most promising candidates advanced to the final
training stage. The models selected for evaluation were:
• Random Forest (RF): Trained on numerical features extracted from AOI images and
structured test results, RF provided strong interpretability and robustness against class
imbalance.
• Convolutional Neural Networks (CNNs): Designed to process AOI images directly,
CNNs captured spatial defect patterns to enhance classification granularity.</p>
      <p>The models were trained using a 70/30 train-test split on two dataset configurations: (i) the
full defect set and (ii) a more balanced subset containing the most frequent defect types. Table 2
summarizes the classification performance. The results indicate that the RF model achieved high
accuracy, precision, and recall across both subsets, whereas the CNN’s performance declined
on the Full Defect Set due to significant dataset imbalance, particularly in defect classes with
fewer than 10 samples.</p>
      <fig id="fig-1">
        <label>Figure 1</label>
        <caption>
          <p>Predicted defect class confidence (%) for each defect class.</p>
        </caption>
      </fig>
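      <table-wrap id="tab-2">
        <label>Table 2</label>
        <caption>
          <p>Classification performance of RF and CNN on the two dataset configurations.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th rowspan="2">Approach</th>
              <th colspan="3">RF</th>
              <th colspan="3">CNN</th>
            </tr>
            <tr>
              <th>Acc.</th>
              <th>Prec.</th>
              <th>Rec.</th>
              <th>Acc.</th>
              <th>Prec.</th>
              <th>Rec.</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Selected Subset</td>
              <td>88.9%</td>
              <td>89.0%</td>
              <td>89.0%</td>
              <td>86.2%</td>
              <td>87.3%</td>
              <td>85.1%</td>
            </tr>
            <tr>
              <td>Full Defect Set</td>
              <td>89.0%</td>
              <td>89.0%</td>
              <td>89.2%</td>
              <td>82.7%</td>
              <td>82.7%</td>
              <td>82.8%</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>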
      <p>After testing the binary classification, a multi-label approach was applied using a pre-trained
ResNet model with additional convolutional layers to enhance defect classification. This
approach provides valuable insights into the probability of each defect occurring in different
components, which is crucial for improving maintainability and facilitating faster fault recovery.
The proposed method was tested on two different subsets of defective components: the first
containing 12 defect classes and the second with 11, where the “Missing Component” fault was
excluded, as it represents a fundamentally different issue compared to placement or soldering
defects. Two examples of the output of this step are shown in Figure 1. The results, presented in
Table 3, demonstrate high precision in identifying the most probable defect. However, the low
recall, particularly when the “Missing Component” class is considered, indicates that the model
sometimes misses defects that are actually present in the component. These results could
be improved by increasing the amount of available training data and adjusting fault detection
thresholds based on historical information.</p>
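      <p>To evaluate real-world applicability, models were tested under varying brightness conditions
and noise levels (Table 4). CNNs demonstrated higher robustness to brightness fluctuations,
while both RF and CNNs showed performance degradation with increasing image noise.</p>
      <table-wrap id="tab-4">
        <label>Table 4</label>
        <table>
          <thead>
            <tr>
              <th>Noise level</th>
              <th>Acc.</th>
              <th>Rec.</th>
              <th>Acc.</th>
              <th>Rec.</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>10% black pixels</td>
              <td>84.0%</td>
              <td>83.9%</td>
              <td>83.6%</td>
              <td>83.1%</td>
            </tr>
            <tr>
              <td>30% black pixels</td>
              <td>77.2%</td>
              <td>77.1%</td>
              <td>77.4%</td>
              <td>81.0%</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>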
      <sec id="sec-4-1">
        <title>4.1. Discussion of Model Performance</title>
        <p>Both RF and CNN models contribute significantly to improving AOI-based defect detection,
each excelling in different aspects. RF demonstrated strong accuracy, precision, and recall across
both dataset configurations, leveraging numerical features for structured classification while
effectively handling class imbalance. CNNs, on the other hand, excelled in capturing spatial
defect patterns directly from AOI images, offering finer classification granularity. However, their
performance declined on the highly imbalanced Full Defect Set, highlighting the need for data
augmentation and rebalancing techniques.</p>
        <p>Moreover, both methods significantly improve precision compared to the AOI system in
all configurations, effectively addressing the high false positive rate while maintaining strong
accuracy and recall and mitigating false negatives.</p>
        <p>To further enhance defect classification, both a multi-label approach and a noise-robustness
test were conducted. The multi-label approach provided insights into the probability of each
defect occurring in different components, while the robustness test evaluated the models’
resilience to image distortions caused by machinery faults. These analyses highlighted the
importance of establishing performance thresholds to monitor the overall health of the inspection
process.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>This study presents a comprehensive approach to PCB defect analysis by combining a robust data
pipeline with machine learning techniques. The proposed pipeline effectively integrates AOI,
X-ray, and manual inspection data, reducing inconsistencies through identifier standardization
and enhancing data alignment. Overall, integrating ML models into a structured defect detection
pipeline enhances classification by combining numerical inspection data with image-based
defect patterns.</p>
      <p>To further improve performance, future research should focus on automating defect
annotation using object detection models and exploring active learning strategies to iteratively refine
defect classifications. Additionally, multimodal approaches that combine structured tabular data
with visual defect patterns hold promise for enhancing robustness across inspection conditions.
Dataset expansion, adaptive
thresholding, and model fusion strategies can also be considered in future work. These advancements
are expected to improve defect classification accuracy, reduce manual intervention, and
strengthen the scalability of the proposed system in real-world manufacturing environments.</p>
      <p>This study was carried out within the MICS (Made in Italy – Circular and Sustainable) Extended
Partnership and received funding from Next-Generation EU (Italian PNRR – M4 C2, Invest 1.3 –
D.D. 1551.11-10-2022, PE00000004). CUP MICS D43C22003120001.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Generative AI to check grammar and
spelling. After using these tool(s)/service(s), the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>See</surname>
          </string-name>
          ,
          <article-title>Visual inspection: a review of the literature</article-title>
          . (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <article-title>Learning from imbalanced data</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>21</volume>
          (
          <year>2009</year>
          )
          <fpage>1263</fpage>
          -
          <lpage>1284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A. M.</given-names>
            <surname>Isa</surname>
          </string-name>
          ,
          <article-title>Printed circuit board defect detection methods based on image processing, machine learning and deep learning: A survey</article-title>
          ,
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>15921</fpage>
          -
          <lpage>15944</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Côté</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikanjam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Humeniuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Khomh</surname>
          </string-name>
          ,
          <article-title>Data cleaning and machine learning: a systematic literature review</article-title>
          ,
          <source>Autom. Softw. Eng.</source>
          <volume>31</volume>
          (
          <year>2024</year>
          )
          <fpage>54</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.-A. I.</given-names>
            <surname>Hassanin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. E.</given-names>
            <surname>Abd El-Samie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>El Banby</surname>
          </string-name>
          ,
          <article-title>A real-time approach for automatic defect detection from PCBs based on SURF features and morphological operations</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>78</volume>
          (
          <year>2019</year>
          )
          <fpage>34437</fpage>
          -
          <lpage>34457</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <article-title>Random forests</article-title>
          ,
          <source>Machine Learning</source>
          <volume>45</volume>
          (
          <year>2001</year>
          )
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          ,
          <article-title>The use of a convolutional neural network in detecting soldering faults from a printed circuit board assembly</article-title>
          ,
          <source>HighTech and Innovation Journal</source>
          <volume>3</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A. M.</given-names>
            <surname>Isa</surname>
          </string-name>
          ,
          <article-title>Printed circuit board defect detection methods based on image processing, machine learning and deep learning: A survey</article-title>
          ,
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>15921</fpage>
          -
          <lpage>15944</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Divvala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <article-title>You only look once: Unified, real-time object detection</article-title>
          ,
          in:
          <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>779</fpage>
          -
          <lpage>788</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Donahue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <article-title>Rich feature hierarchies for accurate object detection and semantic segmentation</article-title>
          ,
          in:
          <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>580</fpage>
          -
          <lpage>587</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>A survey on transfer learning</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>22</volume>
          (
          <year>2009</year>
          )
          <fpage>1345</fpage>
          -
          <lpage>1359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Martiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moschetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lenzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zanoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cristaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tanca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martinenghi</surname>
          </string-name>
          ,
          <article-title>A data pipeline to classify PCB welding defects on noisy data</article-title>
          , accepted at
          <source>IEEE I2MTC</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>