<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Unsupervised Anomaly Detection in Industrial Image Data with Autoencoders</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tulsi Kumar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gautam Malik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adriano Puglisi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer, Control and Management Engineering, Sapienza University of Rome.</institution>
          <addr-line>Via Ariosto 25, Roma, 00185</addr-line>,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>84</fpage>
      <lpage>91</lpage>
      <abstract>
        <p>Traditional quality control techniques can miss small defects in manufacturing environments, reducing the quality of the final product. Using the MVTec dataset, a commonly used benchmark in industrial visual inspection, this study investigates two types of autoencoders, denoising autoencoders (DAE) and contractive autoencoders (CAE), to solve the problem of defect identification in industrial processes. The presence of both textured and non-textured objects allows a direct comparison between materials with different surface characteristics. The VGG16 and ResNet models pre-trained on ImageNet are used as encoders. Three variants of DAE and three of CAE are designed and evaluated. Both the MSE (Mean Squared Error) loss and the SSIM (Structural Similarity Index Measure) are used to compare the reconstruction quality and the defect detection capability. The results highlight performance differences between DAE and CAE and between different object categories, providing useful insights into the effectiveness of each approach in different industrial scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>Unsupervised Learning</kwd>
        <kwd>Denoising Autoencoder</kwd>
        <kwd>Contractive Autoencoder</kwd>
        <kwd>Anomaly Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Quality control is an essential part of many manufacturing industries. It is usually performed manually, but the problem with manual visual inspection is that it is error-prone; for this reason, vision-based inspection can be used. Deep neural networks have played an important role in the automation industry, and with them visual inspection can also be automated. Many image processing and machine learning methods have already been used to achieve automated defect detection in production parts. However, image processing methods have limitations, as implicit engineering features are used for the application, which can be misleading for complex cases. Deep convolutional networks are a solution for automating quality control in the manufacturing industry, since they have the ability to obtain the best features from images, but these methods are limited by data availability. There are two problems to consider: one is the imbalance of data between normal and defective images; the other is the annotation of the data. To overcome these problems, defect detection is treated as an anomaly detection problem.
      </p>
      <p>
        Due to the absence of labels in the data, the problem can be addressed through unsupervised learning, by training convolutional networks on normal images and testing them on images containing defects [
        <xref ref-type="bibr" rid="ref23">1, 2, 3, 4, 5</xref>
        ]. In this work, convolutional neural network autoencoders [6, 7] are used to perform anomaly detection; in particular, the study focuses on the use of denoising autoencoders and contractive autoencoders to improve the effectiveness of defect detection.
      </p>
      <p>
        The autoencoder encodes the input into a lower-dimensional representation known as the latent space, from which the decoder reconstructs the output. A modification of the autoencoder, called a denoising autoencoder, stops the network from learning the identity function. To be more precise, if the autoencoder is too large, it can simply memorize the data, producing output equivalent to the input without doing any beneficial representation learning or dimensionality reduction. Denoising autoencoders address this issue by purposefully introducing errors, noise, or masking some input values [
        <xref ref-type="bibr" rid="ref13">8, 9</xref>
        ].
      </p>
      <p>
        A contractive autoencoder is an unsupervised deep learning method that aids a neural network in encoding unlabeled training input. In general, autoencoders are employed to discover a representation, or encoding, for a set of unlabeled data, typically as the initial step toward dimensionality reduction or the creation of new data models. In a contractive autoencoder, the traditional reconstruction cost function is enhanced by a penalty term: the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input. This penalty term causes a localized space contraction, which in turn produces robust features on the activation layer. The penalty aids in sculpting a representation that is more invariant to most directions orthogonal to the manifold, while also better capturing the local directions of variation required by the data, which correspond to a lower-dimensional non-linear manifold [
        <xref ref-type="bibr" rid="ref14 ref2 ref29 ref34 ref44 ref9">10</xref>
        ].
      </p>
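The contractive penalty described above can be made concrete with a small sketch. This is an illustrative example of ours, not the paper's implementation: it assumes a single-layer sigmoid encoder and a linear decoder, for which the squared Frobenius norm of the encoder Jacobian has a closed form.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_loss(x, W, b, W_dec, b_dec, lam=1e-3):
    """MSE reconstruction loss plus the contractive penalty: the squared
    Frobenius norm of the Jacobian of the encoder output w.r.t. the input."""
    h = sigmoid(x @ W + b)          # encoder activations, shape (n, k)
    x_hat = h @ W_dec + b_dec       # linear decoder reconstruction
    mse = np.mean((x - x_hat) ** 2)
    # For a sigmoid encoder, dh_i/dx_j = h_i * (1 - h_i) * W[j, i], so the
    # squared Frobenius norm factorizes per sample (Rifai et al., 2011):
    jac_frob2 = np.sum(((h * (1 - h)) ** 2) * np.sum(W ** 2, axis=0), axis=1)
    return mse + lam * np.mean(jac_frob2)
```

Setting `lam` to zero recovers the plain reconstruction loss; increasing it trades reconstruction fidelity for a flatter, more contractive encoder.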
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>For anomaly detection, SIFT and SURF are used to extract features from the images and train the model on normal images. Image features can sometimes be misleading, depending on the nature of the application. Machine learning algorithms can be used to classify anomalies against normal images. A supervised learning approach is not well suited to this application, but semi-supervised and unsupervised models increase model performance: when supervised and semi-supervised approaches are compared, the semi-supervised model performs better than the supervised one [11]. In [1] the authors propose a model based on point features of the images. A hand-crafted Harris-Laplace point detector is used in that study to detect the anomalies: the Harris corner detector finds key points, and SIFT descriptors extract the local shape around them. Different loss functions are used for the unsupervised deep learning model, and the study shows that the model performs differently for different types of objects, with a specific type of loss suitable for a specific application. To identify and categorize defects of LED chips, Lin et al. [12] suggested the LEDNet network. Cha et al. [13] suggested utilizing Faster R-CNN, which showed promising results also in other applications [14], to identify five distinct flaws for structural damage detection.</p>
      <p>The use of autoencoders for unsupervised anomaly identification based on reconstruction loss is examined in [15], highlighting both its strengths and weaknesses. Using an analogous situation from particle physics, it demonstrates that the standard autoencoder configuration is not a model-independent anomaly tagger. In the work of Lupo et al. [16], generative models are used to detect anomalies in texts, exploring approaches ranging from machine learning to deep learning; among the analyzed models, the variational autoencoder shows the most promising performance for this task. Vincent et al. [17] instead studied denoising autoencoders for the extraction of robust features from images, demonstrating that these models improve the representation of visual features and, consequently, the overall image quality. Similar architectures have also been employed in the field of audio processing, for example for the automatic identification of speech disorders, exploiting unlabeled speech signals [18]. Bionda et al. [19] propose a deep convolutional autoencoder to detect anomalies in textured images: MSE, a pixel-wise error, is not suitable for textured images as it only focuses on pixels, so SSIM is used as a loss function to improve the performance of the autoencoder, and complex wavelet SSIM performs better than MSE for textured images. Depending on the application, the loss function plays a great role in generative models [20]. In [21] the authors present a novel approach called PNI that, given neighborhood characteristics and a multi-layer perceptron network model, computes the normal distribution using conditional probability. Additionally, a histogram of typical characteristics is made for each point to use position information. The suggested technique uses an extra refining network trained on fabricated anomaly pictures, in addition to the anomaly map, to better interpolate and account for the shape and edge of the input image. Yang et al. [22] present a novel method for detecting industrial image anomalies based on a self-supervised learning and self-attentive graph convolution (SLSG) network. In SLSG, pseudo-prior knowledge of anomalies is introduced by simulated abnormal samples, and the encoder is assisted in learning the embedding of normal patterns and position connections. Holly et al. [23] suggest a technique that uses a total reconstruction error and an autoencoder to locate system problems: in order to pinpoint the source of a problem, the signals that contribute the most to the overall reconstruction error are identified by computing the individual reconstruction error for each sensor signal.</p>
      <sec id="sec-meth">
        <title>3. Methodology</title>
        <sec id="sec-meth-1">
          <title>3.1. Dataset</title>
          <p>The dataset used is the MVTec industrial images dataset, which contains images of many different industrial products. It is an industrial inspection-focused dataset for evaluating anomaly detection techniques. Over 5000 high-resolution images in fifteen different object and texture categories make up this collection. Each category includes a set of defect-free training images, as well as a test set containing both defect-free photos and photos with various types of faults [24].</p>
        </sec>
        <sec id="sec-meth-2">
          <title>3.2. Preprocessing</title>
          <p>During the training phase, only normal images, free of defects or anomalies, are used. Images containing defects of various types, specific to each product, are instead used in the testing phase. All images provided are high-resolution RGB images with dimensions of 1024×1024. To reduce computational complexity, they are downsampled to 256×256 and normalized by dividing the pixel values by 255.</p>
          <p>When using a denoising autoencoder, noise is introduced into the data to artificially corrupt it. Similarly, the test images are also corrupted, and the output of the model is compared with the original uncorrupted images. An example of the images with noise is shown in Figure 1. For the contractive autoencoder, resized but uncorrupted images are fed directly as input, and the model is trained to faithfully reproduce the same images as output.</p>
        </sec>
        <sec id="sec-meth-3">
          <title>3.3. Model Architecture</title>
          <p>The autoencoder model adopted in this work is based on a structure composed entirely of convolutional layers. Its architecture can be conceptually divided into three fundamental components, namely the encoder, the decoder and the latent space, often referred to as the bottleneck. The encoder has the task of compressing the data, reducing the dimensionality of the input image until a compact representation is obtained in the latent space. This encoded representation is then passed to the decoder, whose role is to expand the data again and reconstruct an image as similar as possible to the initial one.</p>
          <p>Four convolutional layers and four max-pooling layers form the encoder, which gradually reduces the spatial resolution of the image. The type of autoencoder used determines the structure of the bottleneck: in the denoising autoencoder, the latent space is composed directly of the final output of the encoder, while in the contractive autoencoder the representation passes through a dense layer that acts as the bottleneck. Both architectures use the same decoder, consisting of four upsampling layers and five convolutional layers. To restore the output to its initial size, a final convolutional layer is added. The complete architectures of the two models are reported in Figure 2 and Figure 3.</p>
          <p>
            To compare our model, we also took into account pre-trained encoders, in particular the VGG16 model already trained on the ImageNet dataset. This model includes sixteen layers in total, thirteen of which are convolutional and three fully connected. In our implementation, only the convolutional layers were kept, while the fully connected ones were removed [
            <xref ref-type="bibr" rid="ref17">25</xref>
            ]. The ResNet50 model pre-trained on the ImageNet dataset was also used. Although the encoder architectures are different, in both cases the decoder was kept unchanged, so as to make the comparison between the configurations fairer and more meaningful.
          </p>
        </sec>
      </sec>
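A rough Keras sketch of the convolutional autoencoder described above. The layer counts (four conv + four max-pooling in the encoder; four upsampling + five conv in the decoder, counting the final reconstruction layer) follow the text, but the filter counts and activations are not stated in the paper, so the ones below are our assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_autoencoder(input_shape=(256, 256, 3)):
    inp = layers.Input(shape=input_shape)
    x = inp
    # Encoder: four conv + four max-pooling layers (256x256 -> 16x16)
    for f in (32, 64, 128, 256):
        x = layers.Conv2D(f, 3, activation="relu", padding="same")(x)
        x = layers.MaxPooling2D(2)(x)
    # Decoder: four upsampling + four conv layers (16x16 -> 256x256) ...
    for f in (256, 128, 64, 32):
        x = layers.UpSampling2D(2)(x)
        x = layers.Conv2D(f, 3, activation="relu", padding="same")(x)
    # ... plus a final conv layer restoring the three RGB channels
    out = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)
    return models.Model(inp, out)
```

For the contractive variant, a dense bottleneck layer would be inserted between encoder and decoder, as described in the text.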
      <sec id="sec-2-1">
        <title>3.3.1. Loss Function</title>
        <p>For training and testing losses, the denoising autoencoder uses the MSE. The mean squared error computes the average of the squared differences between corresponding pixel values of two images:</p>
        <p>MSE(x, y) = (1 / (M N)) ∑_{i=1}^{M} ∑_{j=1}^{N} [x(i, j) − y(i, j)]²   (1)</p>
        <p>Another loss function, used for the textured images, is the SSIM, a measure of resemblance between two pictures. SSIM compares the brightness, contrast, and structural elements of two images to determine how similar they are. It makes use of statistical metrics including pixel intensity mean, variance, and covariance. C1 and C2 are constants added to prevent denominator instability [26]. It is often adopted as a loss function for picture-based optimization tasks, for example image denoising or super-resolution, and as a quality metric for image compression or restoration:</p>
        <p>SSIM(x, y) = ((2 μ_x μ_y + C1)(2 σ_xy + C2)) / ((μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2))   (2)</p>
        <p>In the case of a contractive autoencoder, the bottleneck, or latent dimension, of the autoencoder is used to compute the contractive penalty, which is added to the MSE; the autoencoder is trained using the sum of the two terms as its contractive loss. In both cases, training is aimed at minimizing the loss function.</p>
        <sec id="sec-2-1-1">
          <title>3.3.2. Training and Optimization</title>
          <p>To train the denoising autoencoder, input images are artificially corrupted with noise, while the corresponding clean images are used as targets. The model is optimized using the Adam optimizer, with a learning rate of 0.0001 and a batch size of 32. The contractive autoencoder is trained with Adam as well, but its loss function includes a regularization term on the encoder’s Jacobian in addition to the reconstruction loss. Both loss functions are applied to each sample type, in order to analyze the model performance in different scenarios.</p>
          <p>As for the pretrained encoders, the VGG16 and ResNet50 architectures have been considered, both optimized with the same decoder used in the other models. In the case of VGG16, the last three fully connected layers are removed, keeping the five convolutional blocks that are subsequently trained together with the decoder. After the encoder, a dense layer is inserted to act as a bottleneck. As for ResNet50, the last layer is removed and all the other layers are fine-tuned with the decoder.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>3.3.3. Thresholding and Classification</title>
          <p>For anomaly detection using autoencoders, a threshold based on the reconstruction error is adopted to distinguish between normal images and images containing defects. The error is calculated by comparing each reconstructed image with the corresponding original image, using MSE or SSIM depending on the type of sample. The threshold is determined starting from the training images, which are all free of anomalies: for each of them, the reconstruction error is calculated, after which the threshold is obtained as the average of these errors. Once established, this threshold allows the test images to be classified: those with an error lower than the threshold are considered normal, while those with an error higher than the threshold are classified as anomalous.</p>
          <p>To evaluate the effectiveness of the classification process, the accuracy and the F1 score are used. Even if there are different types of anomalies in the test dataset, in this study they are treated as belonging to a single class.</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>4. Results and Analysis</title>
          <p>The model was evaluated on the hazelnut, pill, bottle, screw and tile categories of the MVTec dataset. The pill and tile classes represent textured samples, while the others have smooth surfaces. Both autoencoders, DAE and CAE, were trained using the two loss functions, MSE and SSIM, and tested on all categories. The results for DAE are reported in Table 1.</p>
        </sec>
      </sec>
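The thresholding and classification procedure of Section 3.3.3 can be sketched in a few lines of NumPy. This is a minimal version of ours using MSE as the per-image error; the function names are illustrative, not the paper's.

```python
import numpy as np

def per_image_mse(originals, reconstructions):
    """Mean squared reconstruction error for each image in a batch."""
    err = (originals - reconstructions) ** 2
    return err.reshape(len(err), -1).mean(axis=1)

def fit_threshold(train_images, train_reconstructions):
    """Threshold = average reconstruction error on defect-free training images."""
    return float(per_image_mse(train_images, train_reconstructions).mean())

def classify(test_images, test_reconstructions, threshold):
    """Return 1 for anomalous (error above threshold), 0 for normal."""
    errors = per_image_mse(test_images, test_reconstructions)
    return (errors > threshold).astype(int)
```

The same scheme works with SSIM by replacing `per_image_mse` with a per-image (1 − SSIM) score, since defective images should reconstruct worse under either measure.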
      <sec id="sec-2-2">
        <p>For the hazelnut and bottle classes, the classic DAE with MSE loss achieved the best performance. In the case of the screw class, the use of SSIM led to a higher accuracy, probably due to the geometric complexity of the spirals. For textured samples such as pill and tile, the DAE with SSIM consistently provided better results. The VGG16 encoder, despite slightly increasing overall accuracy, reduced the ability to detect normal images, whereas the classical DAE with SSIM maintained a better balance between normal and defective cases.</p>
        <p>The performance of the contractive autoencoder is reported in Table 2. For the hazelnut and screw classes, the classical CAE with MSE performed best. For the bottle class, the CAE with VGG16 encoder and SSIM loss showed the highest performance. For textured samples, using SSIM also proved more effective; in particular, for the pill class, the CAE with VGG16 achieved the best results. For the tile class, the classical CAE showed solid performance, while the CAE with VGG16 and MSE loss achieved the highest absolute accuracy, but failed to properly distinguish normal images.</p>
        <p>The DAE proved to be more effective in most cases, although its performance drops with ResNet encoders, while the CAE maintains more consistency across classes. The latter is more stable on textured samples, but less accurate in reconstructing details.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Conclusion</title>
      <p>This study has shown that autoencoders are a valid and
promising solution for unsupervised anomaly detection
in industrial image data. In particular, the denoising
autoencoder (DAE) achieved consistently better results than
the contractive autoencoder (CAE) across most object
categories. This confirms that the introduction of noise
during training encourages more robust feature
learning and improves the generalization ability of the model
when tested on unseen defective images.</p>
      <p>The experiments demonstrated that the Structural
Similarity Index Measure (SSIM) is more effective than Mean
Squared Error (MSE) when dealing with textured surfaces.
SSIM is sensitive to structural deformations, brightness,
and contrast, and is therefore better suited for materials
where texture plays a key role in defect identification.
On smooth objects, instead, MSE remains competitive
and sometimes preferable.</p>
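As a small illustration of this point (our example, using scikit-image's `structural_similarity`): a uniform brightness shift and heavy pixel noise can yield a comparable MSE, yet SSIM stays high for the shift, which preserves structure, and drops sharply for the noise, which destroys it.

```python
import numpy as np
from skimage.metrics import structural_similarity

# Smooth horizontal-gradient test image with values in [0.2, 0.8]
img = np.tile(np.linspace(0.2, 0.8, 256), (256, 1))

shifted = img + 0.1                                             # brightness shift
noisy = img + 0.1 * np.random.default_rng(0).standard_normal(img.shape)

mse_shift = np.mean((img - shifted) ** 2)                       # exactly 0.01
mse_noise = np.mean((img - noisy) ** 2)                         # also ~0.01

# SSIM separates the two corruptions even though MSE does not:
ssim_shift = structural_similarity(img, shifted, data_range=1.0)  # stays high
ssim_noise = structural_similarity(img, noisy, data_range=1.0)    # much lower
```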
      <p>One important finding is that using pre-trained
encoders such as VGG16 or ResNet50 does not always
improve results. While VGG16 provided a slight
improvement in some categories, it sometimes reduced the correct
classification of normal samples. ResNet, in particular,
underperformed in most configurations, possibly due to
its architectural complexity and its limited adaptability to
small or subtle defect patterns after fine-tuning. This
indicates that a careful balance must be found between the
use of pre-trained knowledge and the specific needs of
anomaly detection tasks, where fine-grained pixel-level
reconstruction is crucial.</p>
      <p>From a methodological perspective, the combination of
classical convolutional autoencoders with loss functions
adapted to the type of image (MSE for regular shapes,
SSIM for textures) provides a strong and flexible
framework. Moreover, the use of a thresholding strategy based
on reconstruction errors proved simple yet effective in
binary classification between normal and defective cases.</p>
      <p>Despite the relatively small size of the training data used, the models were able to achieve good classification performance, confirming the potential of unsupervised learning techniques in real-world industrial inspection scenarios. These methods avoid the need for large labeled datasets and are capable of identifying a wide range of defects without explicit annotation.</p>
      <p>For future work, it would be beneficial to explore the integration of attention mechanisms or generative adversarial networks (GANs) to enhance reconstruction quality and reduce false positives. Furthermore, expanding the set of evaluated loss functions to include perceptual losses or multi-scale SSIM might improve results on more complex textures. Finally, applying these models in real-time settings, with hardware constraints and on-the-fly decision-making, remains a key area for further development and practical validation in industrial contexts.</p>
    </sec>
    <sec id="sec-4">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly in order to: grammar and spelling check, paraphrase and reword. After using these tools/services, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>Journal of Intelligent Systems</source>
          <volume>36</volume>
          (
          <year>2021</year>
          )
          <fpage>2443</fpage>
          -
          <lpage>2464</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>doi:10</source>
          .1002/int.22386. [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          , Keeping eyes on the
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          volume
          <volume>3695</volume>
          ,
          <year>2023</year>
          , p.
          <fpage>85</fpage>
          -
          <lpage>95</lpage>
          . [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Boutarfaia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. E</surname>
          </string-name>
          . Tiber-
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>shop Proceedings</source>
          , volume
          <volume>3695</volume>
          ,
          <year>2023</year>
          , p.
          <fpage>68</fpage>
          -
          <lpage>74</lpage>
          . [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , G. Nandi,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kala</surname>
          </string-name>
          , Static hand gesture
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>coders, in: 2014 Seventh International Conference</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>on Contemporary Computing (IC3)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>99</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          104. doi:
          <volume>10</volume>
          .1109/IC3.
          <year>2014</year>
          .
          <volume>6897155</volume>
          . [9]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Nowak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Nowicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          , C. Napoli,
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>puter Science)</source>
          , volume
          <volume>9119</volume>
          ,
          <year>2015</year>
          , p.
          <fpage>469</fpage>
          -
          <lpage>480</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>doi:10</source>
          .1007/978-3-
          <fpage>319</fpage>
          -19324-3_
          <fpage>42</fpage>
          . [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rifai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Glorot</surname>
          </string-name>
          , Y. Ben-
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>during feature extraction</source>
          ,
          <year>2011</year>
          . [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bilik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Horak</surname>
          </string-name>
          ,
          <article-title>Sift and surf based fea-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <article-title>ture extraction for the anomaly detection</article-title>
          ,
          <year>2022</year>
          . [1]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Kamoona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Gostar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bab-Hadiashar</surname>
          </string-name>
          , arXiv:
          <fpage>2203</fpage>
          .
          <fpage>13068</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoseinnezhad</surname>
          </string-name>
          , Point pattern feature-based [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shu</surname>
          </string-name>
          , S. Niu, Automated de-
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>cess 9</source>
          (
          <year>2021</year>
          )
          <fpage>158672</fpage>
          -
          <lpage>158681</lpage>
          . URL: https://doi. ing
          <volume>30</volume>
          (
          <year>2019</year>
          )
          <fpage>2525</fpage>
          -
          <lpage>2534</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>org/10</source>
          .1109%
          <fpage>2Faccess</fpage>
          .
          <year>2021</year>
          .
          <volume>3130261</volume>
          . doi:
          <volume>10</volume>
          .1109/ [13]
          <string-name>
            <given-names>Y.-J.</given-names>
            <surname>Cha</surname>
          </string-name>,
          <string-name><given-names>W.</given-names> <surname>Choi</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Suh</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Mahmoudkhani</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>Büyüköztürk</surname></string-name>,
          <article-title>Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types</article-title>,
          <source>Computer-Aided Civil and Infrastructure Engineering</source> <volume>33</volume> (<year>2018</year>) <fpage>731</fpage>-<lpage>747</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <label>[2]</label>
        <mixed-citation>
          <string-name><given-names>G.</given-names> <surname>Capizzi</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Coco</surname></string-name>,
          <string-name><given-names>G. L.</given-names> <surname>Sciuto</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Napoli</surname></string-name>,
          <article-title>A new […] Gaussian approximation</article-title>,
          <source>IEEE Signal Processing Letters</source> <volume>25</volume> (<year>2018</year>) <fpage>1615</fpage>-<lpage>1619</lpage>. doi:10.1109/LSP.2018.2866926.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <label>[3]</label>
        <mixed-citation>
          <string-name><given-names>D.</given-names> <surname>Połap</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Woźniak</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Napoli</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Tramontana</surname></string-name>,
          <article-title>[…] via cuckoo search algorithm</article-title>,
          <source>International Journal of Electronics and Telecommunications</source> <volume>61</volume> (<year>2015</year>) <fpage>333</fpage>-<lpage>338</lpage>. doi:10.1515/eletel-2015-0043.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <label>[4]</label>
        <mixed-citation>
          <string-name><given-names>G.</given-names> <surname>Capizzi</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Bonanno</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Napoli</surname></string-name>,
          <article-title>Hybrid neural […] prediction of new generation batteries storage</article-title>,
          in: <source>3rd International Conference on Clean Electrical Power: Renewable Energy Resources Impact, ICCEP 2011</source>, <year>2011</year>, pp. <fpage>341</fpage>-<lpage>344</lpage>. doi:10.1109/ICCEP.2011.6036301.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <label>[5]</label>
        <mixed-citation>
          <string-name><given-names>G.</given-names> <surname>Lo Sciuto</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Capizzi</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Shikler</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Napoli</surname></string-name>,
          <article-title>Organic solar cells defects classification by using a […]</article-title>.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <label>[14]</label>
        <mixed-citation>
          <string-name><given-names>F.</given-names> <surname>Fiani</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Puglisi</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Napoli</surname></string-name>, et al.,
          <article-title>Enhancing object detection robustness for cross-depiction […]</article-title>,
          in: <source>CEUR WORKSHOP PROCEEDINGS</source>, volume <volume>3684</volume>, CEUR-WS, <year>2023</year>, pp. <fpage>15</fpage>-<lpage>20</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <label>[15]</label>
        <mixed-citation>
          <string-name><given-names>T.</given-names> <surname>Finke</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Krämer</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Morandini</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Mück</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Oleksiyuk</surname></string-name>,
          <article-title>Autoencoders for unsupervised anomaly detection in high energy physics</article-title>,
          <source>Journal of High Energy Physics</source> <volume>2021</volume> (<year>2021</year>) <fpage>161</fpage>. URL: https://doi.org/10.1007%2Fjhep06%282021%29161. doi:10.1007/jhep06(2021)161.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <label>[16]</label>
        <mixed-citation>
          <string-name><given-names>F.</given-names> <surname>Lupo</surname></string-name>,
          <article-title>Variational autoencoder for unsupervised anomaly detection</article-title>,
          <source>Master's thesis</source>, Corso di Laurea Magistrale in Ingegneria Matematica, <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <label>[17]</label>
        <mixed-citation>
          <string-name><given-names>P.</given-names> <surname>Vincent</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Larochelle</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Bengio</surname></string-name>,
          <string-name><given-names>P.-A.</given-names> <surname>Manzagol</surname></string-name>,
          <article-title>Extracting and composing robust features with denoising autoencoders</article-title>,
          <year>2008</year>, pp. <fpage>1096</fpage>-<lpage>1103</lpage>. doi:10.1145/1390156.1390294.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <label>[18]</label>
        <mixed-citation>
          <string-name><given-names>L.</given-names> <surname>Corvitto</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Faiella</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Napoli</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Puglisi</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Russo</surname></string-name>,
          […], in: <source>CEUR WORKSHOP PROCEEDINGS</source>, volume <volume>3869</volume>, CEUR-WS, <year>2024</year>, pp. <fpage>19</fpage>-<lpage>31</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <label>[19]</label>
        <mixed-citation>
          <string-name><given-names>A.</given-names> <surname>Bionda</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Frittoli</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Boracchi</surname></string-name>,
          <article-title>Deep autoencoders for anomaly detection in textured images using CW-SSIM</article-title>,
          in: <source>Image Analysis and Processing - ICIAP 2022</source>, Springer International Publishing, <year>2022</year>, pp. <fpage>669</fpage>-<lpage>680</lpage>. URL: https://doi.org/10.1007%2F978-3-031-06430-2_56. doi:10.1007/978-3-031-06430-2_56.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <label>[20]</label>
        <mixed-citation>
          <string-name><given-names>A.</given-names> <surname>Puglisi</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Fiani</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>De Magistris</surname></string-name>, et al.,
          <article-title>Increased […] using GANs</article-title>,
          in: <source>ICYRIME</source>, <year>2023</year>, pp. <fpage>39</fpage>-<lpage>45</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <label>[21]</label>
        <mixed-citation>
          <string-name><given-names>J.</given-names> <surname>Bae</surname></string-name>,
          <string-name><given-names>J.-H.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Kim</surname></string-name>,
          <article-title>PNI: Industrial anomaly detection using position and neighborhood information</article-title>,
          <year>2023</year>. arXiv:2211.12634.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <label>[22]</label>
        <mixed-citation>
          <string-name><given-names>M.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Wu</surname></string-name>,
          <article-title>SLSG: Industrial […] feature embeddings and one-class classification</article-title>,
          <year>2023</year>. arXiv:2305.00398.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <label>[23]</label>
        <mixed-citation>
          <string-name><given-names>S.</given-names> <surname>Holly</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Heel</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Katic</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Schoefl</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Stiftinger</surname></string-name>,
          <article-title>[…] fault localization in industrial cooling systems</article-title>,
          <year>2022</year>. arXiv:2210.08011.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <label>[24]</label>
        <mixed-citation>
          <string-name><given-names>P.</given-names> <surname>Bergmann</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Fauser</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Sattlegger</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Steger</surname></string-name>,
          <article-title>MVTec AD - A comprehensive real-world dataset for unsupervised anomaly detection</article-title>,
          in: <source>2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>, <year>2019</year>, pp. <fpage>9584</fpage>-<lpage>9592</lpage>. doi:10.1109/CVPR.2019.00982.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <label>[25]</label>
        <mixed-citation>
          <string-name><given-names>K.</given-names> <surname>Simonyan</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Zisserman</surname></string-name>,
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>,
          <year>2015</year>. arXiv:1409.1556.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <label>[26]</label>
        <mixed-citation>
          <string-name><given-names>J.</given-names> <surname>Nilsson</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Akenine-Möller</surname></string-name>,
          <article-title>Understanding SSIM</article-title>,
          <year>2020</year>. arXiv:2006.13846.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>