<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Ital-IA</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Models for Accurate Anomaly Detection in Industry 5.0</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luigi Capogrosso</string-name>
          <email>luigi.capogrosso@univr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alvise Vivenza</string-name>
          <email>alvise.vivenza@univr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Chiarini</string-name>
          <email>andrea.chiarini@univr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Setti</string-name>
          <email>francesco.setti@univr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Cristani</string-name>
          <email>marco.cristani@univr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Engineering for Innovation Medicine, University of Verona</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Management, University of Verona</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>QUALYCO S.r.l., Spin-off of the University of Verona</institution>
          ,
          <addr-line>Verona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>4</volume>
      <fpage>29</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts on normal samples to mitigate problems related to unbalanced training data. These techniques often produce out-of-distribution images, resulting in systems that learn what is not a normal sample but cannot accurately identify what a defect looks like. In this paper, we show the research we are carrying out in collaboration with QUALYCO, a startup spin-off of the University of Verona, on multimodal Latent Diffusion Models (LDMs) for accurate anomaly detection in Industry 5.0. Unlike conventional image generation techniques, we work within a human feedback loop pipeline, where domain experts provide multimodal guidance to the model through text descriptions and region localization of the possible anomalies. This strategic shift enhances the interpretability of results and fosters a more robust human feedback loop, facilitating iterative improvements of the generated outputs. Remarkably, our approach operates in a zero-shot manner, avoiding time-consuming fine-tuning procedures while achieving superior performance. We demonstrate its efficacy and versatility on the challenging KSDD2 dataset, achieving state-of-the-art results.</p>
      </abstract>
      <kwd-group>
        <kwd>Diffusion Models</kwd>
        <kwd>Anomaly Detection</kwd>
        <kwd>Industry 5.0</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Surface Defect Detection (SDD) is a challenging problem in industrial scenarios, defined as the task of identifying samples containing a defect [1]. In many real-world applications, a human expert inspects every product and removes the defective pieces. Unfortunately, human experts are often inaccurate, and their outputs can be inconsistent or biased. Moreover, humans are relatively slow at this task, and their performance is subject to stress and fatigue.</p>
      <p>Automated defect detection systems [2] can easily overcome most of these issues by learning classifiers on defective and nominal training products. The main drawback is the data collection process required to train a model effectively. Indeed, defective items (i.e., positive samples) are relatively rare compared to nominal items (i.e., negative samples). Thus, the user may need to collect massive amounts of data to have enough positive samples. Moreover, with the rise of Industry 5.0 [3] and the transition towards flexible manufacturing processes where human operators and production line components actively collaborate, there is an increasing demand for systems that can quickly adapt to new production setups, i.e., customized products manufactured in small batches. Traditional automated systems cannot comply with these demands since data collection could easily involve the whole batch.</p>
      <p>Recent studies on SDD focused on limiting the impact
of the labeling process by formulating the problem under
the unsupervised learning paradigm [4] or training
exclusively on nominal samples [5], possibly using few-shot
learning strategies [6]. In both cases, the goal is to
generate an accurate model of the nominal sample distribution
and predict everything with a low probability score as
anomalies. However, due to the limited restoration
capability of these models, these approaches tend to generate
many false positives, especially on datasets with complex
structures or textures [7].</p>
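      <p>The model-fitting idea behind these nominal-only approaches can be sketched as follows: fit a simple Gaussian model to nominal feature vectors and flag every sample far from the nominal prototype as anomalous. This is an illustrative sketch only (synthetic 2-D features and a hypothetical threshold of 3.0), in the spirit of patch-distribution methods such as [11], not the exact procedure of any cited work.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal training features (e.g., pooled CNN embeddings); here synthetic 2-D data.
nominal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Fit a single Gaussian model of the nominal distribution.
mu = nominal.mean(axis=0)
cov = np.cov(nominal, rowvar=False)
cov_inv = np.linalg.inv(cov)

def mahalanobis(x):
    """Distance of sample x from the nominal prototype."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Everything farther than a predefined threshold is predicted anomalous.
threshold = 3.0
test_nominal = np.array([0.1, -0.2])
test_defect = np.array([6.0, 6.0])

print(mahalanobis(test_nominal) > threshold)  # → False (predicted nominal)
print(mahalanobis(test_defect) > threshold)   # → True (predicted anomalous)
```

      <p>Such a detector never sees a positive sample at training time, which is exactly why its false-positive behavior on complex textures becomes the limiting factor discussed above.</p>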
      <p>It is worth noting that, in industrial setups, anomalies are not generated by Gaussian processes but are the outcome of specific, often predictable, issues during the production process. Consequently, the anomalous samples are not randomly distributed outside the nominal distribution; they can be modeled as a mixture of Gaussian distributions in the feature space instead. While general, unpredictable anomalies can still happen, expert operators can easily define the main problems they can expect from the manufacturing process, such as which kind of defects, in which locations, and how often they expect them to appear. Thus, generative AI can represent a powerful tool for SDD, with defect image generation emerging as a promising approach to enhance detector performance.</p>
      <p>[Figure 1: Overview of our approach. Normal samples from the production line are combined with region localization and text descriptions (e.g., “A photo of a scratched surface”, with the negative text description “A photo of a smooth surface”) to guide image generation via a Latent Diffusion Model; the generated samples are then used for TinyML-based image classification (anomalous vs. not anomalous).]</p>
      <p>Specifically, in this paper, we report the results of our research on Latent Diffusion Models (LDMs), a powerful class of generative models, to produce fine-grained realistic defect images that can be used as positive samples to train an anomaly detection model. We name our approach DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation in the SDD task. By leveraging pre-trained LDMs with multimodal conditioning, we can exploit domain experts’ knowledge to generate plausible anomalies without needing real positive data. When using these augmented images to train an anomaly detection model, we show a notable increase in the detection performance compared to previous state-of-the-art augmentation pipelines. Specifically, this research is being carried out in collaboration with QUALYCO, a startup spin-off of the University of Verona. Figure 1 outlines our approach.</p>
      <p>The main contributions of our research are as follows:</p>
      <p>• We present a complete pipeline for training anomaly detection models based on nominal images and textual prompts. We showcase the superior outcomes achieved by utilizing generated defective samples compared to previous state-of-the-art approaches.</p>
      <p>• We dive into spatial control approaches to enable the synthesis of defect samples incorporating regional information and exhibit enhanced controllability of the image generation through a human feedback loop pipeline, effectively utilizing domain expertise to generate more plausible in-distribution anomalies.</p>
    </sec>
    <sec id="sec-2b">
      <title>2. Related Work</title>
      <p>Research on SDD has been conducted according to different setups: unsupervised approaches [8] use a mixture of unlabelled positive and negative sample images for training; supervised approaches require labeled samples in the form of binary masks representing the defects (full supervision) [9] or simply as a tag for the whole image (weak supervision) [10]. Supervised methods demonstrated superior accuracy in the identification of anomalies. Nevertheless, the effort required to provide good annotations is not always justified. Collecting positive samples can be time- and resource-consuming due to the low rate of defective products generated by industrial lines. Thus, many recent approaches adopt a “clean” setup, where the training set consists of only nominal samples.</p>
      <p>Two strategies can be adopted in clean setups: model fitting and image generation. Model fitting approaches aim at generating an accurate model of the nominal distribution, considering as an outlier every sample with a likelihood lower than –or a distance from the nominal prototype higher than– a predefined threshold [11].</p>
      <p>On the contrary, data augmentation approaches leverage generative methods to synthesize images of defects and use these images as positive samples for training a supervised model. Specifically, this work focuses on generation-based data augmentation under clean setups. The most popular data augmentation pipeline for SDD consists of a series of random standard transformations of the input image –such as mirroring, rotations, and color changes– followed by the super-imposition of noisy patches [12].</p>
      <p>In MemSeg [12], the pipeline for the generation of the abnormal synthetic examples is divided into three steps: i) a Region of Interest (ROI) indicating where the defect will be located is generated using Perlin noise and the target foreground; ii) the ROI is applied to a noise image to generate a noise foreground ROI; iii) the noise foreground ROI is super-imposed on the original image to obtain the simulated anomalous image. However, all these approaches are based on generating out-of-distribution patterns that do not faithfully represent the target-domain anomalies.</p>
      <p>More recently, the first work that draws attention to in-distribution defect data is In&amp;Out [13], in which we empirically show that diffusion models provide more realistic in-distribution defects. Here, we significantly improve the generation of in-distribution anomalous samples of [13], incorporating domain knowledge provided by an expert user through textual prompts and localization of salient regions in a training-free setup.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Multimodal Diffusion-based image generation</title>
        <p>LDMs [14, 15] are a class of deep latent variable models that work by modeling the joint distribution of the data over a Markovian inference process. This process consists of small perturbations of the data with a variance-preserving property [16], such that the limit distribution after the diffusion process is approximately identical to a known prior distribution. Starting with samples from the prior, a reverse diffusion process is learned by gradually denoising the sample so that it resembles the initial data by the end of the procedure.</p>
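        <p>The forward perturbation described above can be sketched numerically. The linear noise schedule below is a common illustrative choice (its values are assumptions, not taken from the cited works): after enough steps, the signal coefficient decays towards zero and the sample is approximately distributed as the standard normal prior.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule (assumed values, not from the paper).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention coefficient

def q_sample(x0, t):
    """Variance-preserving perturbation:
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.normal(size=(4, 4))   # stand-in for a latent image
xT = q_sample(x0, T - 1)       # sample after the full diffusion process

# Almost no signal remains: x_T is close to a draw from the N(0, I) prior.
print(float(np.sqrt(alpha_bar[-1])) < 0.01)  # → True
```

        <p>The learned reverse process simply inverts this chain step by step, which is what the pre-trained LDM provides out of the box.</p>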
        <p>We leveraged the natural ability of LDMs to incorporate multimodal conditioning in the generation process, taking inspiration from [17, 18, 19]. Specifically, we use as textual descriptions a prompt and a negative prompt, i.e., a prompt that guides the image generation “away” from the concepts it describes, resulting in high-quality images that comply with the given descriptions [20, 21].</p>
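        <p>At sampling time, the prompt and the negative prompt are typically combined through classifier-free guidance [17]: the denoiser is evaluated once per description, and the final noise prediction is pushed towards the prompt and away from the negative prompt. A minimal sketch with dummy arrays (the guidance scale of 7.5 is an assumed, commonly used value):</p>

```python
import numpy as np

def guided_noise(eps_prompt, eps_negative, guidance_scale):
    """Classifier-free guidance: extrapolate from the negative-prompt
    prediction towards the prompt prediction."""
    return eps_negative + guidance_scale * (eps_prompt - eps_negative)

# Dummy noise predictions standing in for the denoiser's two conditioned passes.
eps_prompt = np.array([1.0, 0.0])
eps_negative = np.array([0.0, 1.0])

print(guided_noise(eps_prompt, eps_negative, 7.5))  # → [ 7.5 -6.5]
```

        <p>With a scale above 1, the combination amplifies whatever distinguishes the prompt from the negative prompt, which is why a negative prompt steers the output away from the concepts it names.</p>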
        <p>In particular, we do not perform full image generation: to effectively enhance spatial control, we opt to utilize an inpainting model, as demonstrated in [14, 18]. Given an image with a masked region, inpainting seamlessly fills it with content that harmonizes with the surrounding image. Although typically employed to eliminate undesired artifacts, the inpainting process ensures that the masked area incorporates the provided prompt, effectively merging textual and visual content.</p>
      </sec>
      <sec id="sec-3-1b">
        <title>3.2. Our proposed pipeline</title>
        <p>To generate an anomalous image, the process starts by sampling a random negative image, an anomaly description, and a mask, forming a triplet. These pieces of information are then fed to a text-conditioned LDM to perform inpainting on the negative image using the mask. The anomaly description guides the generation, filling the masked region with an anomaly that complies with the prompt. To generate images resembling real anomalous samples, domain knowledge from industrial experts is exploited, providing textual descriptions of the potential anomalies’ type, shape, and spatial information.</p>
        <p>Formally, given pictures of defect-free (negative) samples, domain experts provide textual descriptions of what different anomalies may look like. At the same time, regions where these anomalies may appear on the defect-free samples are designated as a set of binary masks encoding possible anomaly shapes and locations. The LDM is then conditioned on this information to inpaint plausible anomalies on the defect-free samples. The result of this operation is an anomalous version of the negative image, where an anomaly has been inpainted in the masked region. Due to the stochastic nature of LDMs, this process can be repeated multiple times to generate an augmented set of anomalous sample images. Finally, this set can be used as data augmentation for training anomaly detection models, as presented in the following section.</p>
      </sec>
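      <p>The triplet-sampling loop described above can be sketched as follows. The file names are hypothetical and the inpaint function is a stub standing in for the text-conditioned LDM inpainting call; only the sampling structure reflects the pipeline described in this section.</p>

```python
import random

random.seed(0)

negatives = ["neg_001.png", "neg_002.png", "neg_003.png"]  # defect-free samples
descriptions = ["a photo of a scratched surface"]          # expert anomaly descriptions
masks = ["mask_a.png", "mask_b.png"]                       # expert-designated regions

def inpaint(image, prompt, mask):
    """Stub for the text-conditioned LDM inpainting call; here it only
    records the triplet it received."""
    return {"source": image, "prompt": prompt, "mask": mask}

def generate_augmented_set(n_aug):
    """Repeat the stochastic triplet sampling to build the augmented positive set."""
    augmented = []
    for _ in range(n_aug):
        triplet = (random.choice(negatives),
                   random.choice(descriptions),
                   random.choice(masks))
        augmented.append(inpaint(*triplet))
    return augmented

aug = generate_augmented_set(100)
print(len(aug))  # → 100
```

      <p>Because each draw is independent, the same negative image can yield many distinct synthetic positives, which is what makes the augmented set grow without any real defect data.</p>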
      <sec id="sec-3-2">
        <title>3.3. The anomaly detection task</title>
        <p>We approach the anomaly detection problem as a binary classification problem, where the objective is to predict whether a sample belongs to one of two classes. Specifically, we utilized a ResNet-50 [22] backbone trained with a binary cross-entropy loss function denoted as ℒBCE. Mathematically, it is defined as:</p>
        <p>ℒBCE(y, ŷ) = −(1/N) ∑_{i=1}^{N} [y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)], (1)</p>
        <p>where y represents the ground truth labels, ŷ represents the predicted probabilities, and N is the number of samples. In detail, y_i denotes the true label for sample i, which can be either 0 or 1, while ŷ_i signifies the predicted probability that sample i belongs to class 1.</p>
        <p>Ongoing developments aim to optimize the model through TinyML [23] techniques in order to have an ultra-efficient system that can work smoothly in real time on a production line.</p>
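        <p>Eq. (1) can be written directly in code; the following sketch evaluates the loss on a toy example to check the formula (the clipping is a numerical-stability detail added here, not part of the equation):</p>

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy of Eq. (1): mean over the N samples."""
    y = np.asarray(y, dtype=float)
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))

# A confident, mostly correct classifier yields a small loss.
print(round(bce_loss([1, 0], [0.9, 0.1]), 5))  # → 0.10536
```
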
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experiment setup</title>
        <p>Datasets We use the Kolektor Surface-Defect Dataset 2 (KSDD2) [10], one of the most recent, complex, and real-world SDD datasets. This dataset comprises 246 positive and 2085 negative images in the training set and 110 positive and 894 negative images in the testing set. Positive images are images with visible defects, such as scratches, spots, and surface imperfections. Since the images have different dimensions, we standardize the dataset resolution, resizing all the images to 224 × 632 pixels while keeping the number of normal and anomalous samples unchanged.</p>
        <p>ResNet-50 training and testing For a fair comparison
with [13], we use the same PyTorch implementation of
the ResNet-50 [22] as our anomaly detection model, in
which we substitute the fully connected layers after the
backbone to make it a binary classifier. The network is
trained for 50 epochs with Adam [24] as an optimizer, a
learning rate of 0.0001, and a batch size of 32. To maintain
consistency with the training and evaluation procedures
of KSDD2, our setup is the same as presented in [10, 13],
where only the images and ground truth labels are used
to train the model.</p>
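        <p>For reference, the metrics used in the quantitative evaluation (Average Precision, Precision, and Recall) can be sketched as below. This is an illustrative implementation of the standard definitions, not necessarily the exact code of the evaluation protocol in [13]:</p>

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall for binary predictions (1 = anomalous)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(y_true, scores):
    """AP as the mean of precision values at the rank of each positive sample."""
    order = np.argsort(-np.asarray(scores))
    y_sorted = np.asarray(y_true)[order]
    hits, precisions = 0, []
    for rank, label in enumerate(y_sorted, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions))

p, r = precision_recall([1, 0, 1, 0], [1, 0, 0, 0])
print(p, r)  # → 1.0 0.5
print(round(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]), 4))  # → 0.8333
```
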
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Implementation details</title>
        <p>In this section, we specify all the implementation details for reproducibility. All training and inference were conducted on an NVIDIA RTX 3090 GPU.</p>
        <p>Inpainting via Diffusion Models We use the pre-trained implementation of SDXL [21] from Diffusers as our text-conditioned LDM. Following the procedure outlined in Section 3.2, we use the negative images of KSDD2 as the set of defect-free samples. As the anomaly descriptions, we used the prompts “white marks on the wall” and “copper metal scratches”. Instead, “smooth, plain, black, dark, shadow” was used as a negative prompt to further improve the performance. These prompts were chosen after a series of tests, simulating the iterative process of our human feedback loop pipeline until the resulting images resembled plausible anomalies. We used the segmentation masks of positive samples in the KSDD2 dataset to simulate the domain experts’ definition of plausible anomalous regions. Then, these data are fed to the pre-trained SDXL model to perform inpainting on the negative images in a training-free process, generating the set of augmented anomalous images as described in Section 3.2. Finally, the generated images are added to the training set, which will be used to train the anomaly detection model.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Quantitative results</title>
        <p>Evaluation metrics The anomaly detection performance was evaluated based on Average Precision (AP), Precision, and Recall, following the evaluation protocol defined in [13].</p>
        <p>Zero-shot data augmentation Here, we emulate the situation where no original positive samples are available in the training set. This scenario makes generating augmented positive samples necessary and restricts the users to augmentation procedures that do not rely on positive images. To do this, we build the set of augmented anomalous images by generating Naug augmented positive samples with different pipelines, i.e., MemSeg [12], In&amp;Out [13], and DIAG. Then, we train the ResNet-50 model on a dataset that includes the original negative samples and the augmented positive samples. Finally, we evaluate the model on the original test set.</p>
        <p>Table 1 reports the comparison between the models trained with MemSeg, In&amp;Out, and DIAG augmented data at different values of Naug. As we can see, our proposed method achieves the highest AP (.801), recorded at 100 augmented images, while also resulting in a consistently higher AP when compared to the MemSeg and In&amp;Out pipelines. These results highlight how, through domain expertise in the form of anomaly descriptions and segmentation masks, it is possible to generate in-distribution images able to meaningfully guide an anomaly detection network, even in a complicated scenario where no real anomalous data is available.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Results of MemSeg, In&amp;Out, and DIAG when no anomalous samples are available. In bold, the best results. Underlined, the second best.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Model</th><th>Naug</th><th>AP ↑</th><th>Precision ↑</th><th>Recall ↑</th></tr>
            </thead>
            <tbody>
              <tr><td>DIAG (ours)</td><td>80</td><td>.769</td><td>.851</td><td>.673</td></tr>
              <tr><td>DIAG (ours)</td><td>100</td><td>.801</td><td>.924</td><td>.664</td></tr>
              <tr><td>DIAG (ours)</td><td>120</td><td>.739</td><td>.944</td><td>.609</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Surprisingly, the DIAG performance with Naug = 120 augmented images is lower than using a smaller number of augmented images. We hypothesize this is due to the stochastic nature of LDM image generation. While it allows the generation of various images given the same guidance, it can also lower, in some cases, the predictability of the quality of the generated samples, which sometimes may not faithfully comply with the prompt. Future works will focus on studying quality consistency in the image generation pipeline.</p>
        <p>Full-shot data augmentation To showcase DIAG as a general data augmentation technique, we also explore the scenario where real positive samples are available in the training set. To this aim, we include all the 246 real positive samples in the training set, together with the real negative images and the Naug augmented positive images.</p>
        <p>As we can see from Table 2, DIAG achieves the highest average AP yet (.924), surpassing the .782 set by the previous state-of-the-art data augmentation pipeline [13]. When comparing these results to the ones obtained in the “zero-shot data augmentation” scenario, it is clear how more in-distribution images improve model performance during training. This is highlighted by the improvement in performance of all the models when adding the real positive images to the training set. At the same time, the inclusion of DIAG augmented images allows the model to explore the anomaly distribution further, resulting in the difference in performance between the different data augmentation pipelines.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Qualitative results</title>
        <p>The main goal of our data augmentation pipeline is to generate in-distribution synthetic positive images, meaning images that closely resemble the real ones. Figure 2 shows qualitative results. It is evident that the images produced by DIAG are markedly more realistic compared to those generated by MemSeg [12] and In&amp;Out [13].</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This work presents DIAG, a novel data augmentation pipeline that leverages visual language models to produce training-free positive images for enhancing the performance of an SDD model. We introduced domain experts in the generation pipeline, asking them to describe with textual prompts how a defect should look and where it can be localized. Then, we adopt a pre-trained LDM to generate defective images and train a binary classifier for isolating the anomalous images. We focus our experiments on the KSDD2 dataset and establish ourselves as the new state-of-the-art data augmentation pipeline, surpassing previous approaches in both the zero-shot and full-shot data augmentation scenarios with an AP of .801 and .924, respectively. These results highlight the potential of in-distribution data augmentation in the anomaly detection field, where training-free generative model pipelines such as DIAG can provide meaningful data for downstream classification, making them appealing solutions in scenarios where real anomalous data is difficult to collect or unavailable. These promising results promote further exploration across various datasets, particularly investigating how robust the image generation is compared to noisy textual prompts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1"><mixed-citation>[1] T. Wang, Y. Chen, M. Qiao, H. Snoussi, A fast and robust convolutional neural network-based defect detection model in product quality control, The International Journal of Advanced Manufacturing Technology 94 (2018) 3465–3471.</mixed-citation></ref>
      <ref id="ref2"><mixed-citation>[2] S. H. Hanzaei, A. Afshar, F. Barazandeh, Automatic detection and classification of the ceramic tiles’ surface defects, Pattern Recognition 66 (2017) 174–189.</mixed-citation></ref>
      <ref id="ref3"><mixed-citation>[3] P. K. R. Maddikunta, Q.-V. Pham, B. Prabadevi, N. Deepa, K. Dev, T. R. Gadekallu, R. Ruby, M. Liyanage, Industry 5.0: A survey on enabling technologies and potential applications, Journal of Industrial Information Integration 26 (2022) 100257.</mixed-citation></ref>
      <ref id="ref4"><mixed-citation>[4] K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, P. Gehler, Towards total recall in industrial anomaly detection, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 14318–14328.</mixed-citation></ref>
      <ref id="ref5"><mixed-citation>[5] M. Rudolph, B. Wandt, B. Rosenhahn, Same same but DifferNet: Semi-supervised defect detection with normalizing flows, in: Winter Conference on Applications of Computer Vision (WACV), 2021.</mixed-citation></ref>
      <ref id="ref6"><mixed-citation>[6] Y. Song, T. Wang, P. Cai, S. K. Mondal, J. P. Sahoo, A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities, ACM Computing Surveys 55 (2023) 1–40.</mixed-citation></ref>
      <ref id="ref7"><mixed-citation>[7] Y. Chen, Y. Ding, F. Zhao, E. Zhang, Z. Wu, L. Shao, Surface defect detection methods for industrial products: A review, Applied Sciences 11 (2021) 7657.</mixed-citation></ref>
      <ref id="ref8"><mixed-citation>[8] X. Tao, D. Zhang, W. Ma, Z. Hou, Z. Lu, C. Adak, Unsupervised anomaly detection for surface defects with dual-siamese network, IEEE Transactions on Industrial Informatics 18 (2022) 7707–7717.</mixed-citation></ref>
      <ref id="ref9"><mixed-citation>[9] C. Luan, R. Cui, L. Sun, Z. Lin, A siamese network utilizing image structural differences for cross-category defect detection, in: 2020 IEEE International Conference on Image Processing (ICIP), IEEE, 2020.</mixed-citation></ref>
      <ref id="ref10"><mixed-citation>[10] J. Božič, D. Tabernik, D. Skočaj, Mixed supervision for surface-defect detection: From weakly to fully supervised learning, Computers in Industry 129 (2021) 103459.</mixed-citation></ref>
      <ref id="ref11"><mixed-citation>[11] T. Defard, A. Setkov, A. Loesch, R. Audigier, PaDiM: a patch distribution modeling framework for anomaly detection and localization, in: International Conference on Pattern Recognition (ICPR), 2021.</mixed-citation></ref>
      <ref id="ref12"><mixed-citation>[12] M. Yang, P. Wu, H. Feng, MemSeg: A semi-supervised method for image surface defect detection using differences and commonalities, Engineering Applications of Artificial Intelligence 119 (2023) 105835.</mixed-citation></ref>
      <ref id="ref13"><mixed-citation>[13] L. Capogrosso, F. Girella, F. Taioli, M. Dalla Chiara, M. Aqeel, F. Fummi, F. Setti, M. Cristani, Diffusion-based image generation for in-distribution data augmentation in surface defect detection, in: International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), 2024. doi:10.5220/0012350400003660.</mixed-citation></ref>
      <ref id="ref14"><mixed-citation>[14] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, S. Ganguli, Deep unsupervised learning using nonequilibrium thermodynamics, in: International Conference on Machine Learning (ICML), 2015.</mixed-citation></ref>
      <ref id="ref15"><mixed-citation>[15] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, Advances in Neural Information Processing Systems (NeurIPS) 33 (2020) 6840–6851.</mixed-citation></ref>
      <ref id="ref16"><mixed-citation>[16] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, B. Poole, Score-based generative modeling through stochastic differential equations, in: International Conference on Learning Representations (ICLR), 2020.</mixed-citation></ref>
      <ref id="ref17"><mixed-citation>[17] J. Ho, T. Salimans, Classifier-free diffusion guidance, arXiv preprint arXiv:2207.12598 (2022).</mixed-citation></ref>
      <ref id="ref18"><mixed-citation>[18] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.</mixed-citation></ref>
      <ref id="ref19"><mixed-citation>[19] L. Capogrosso, A. Mascolini, F. Girella, G. Skenderi, S. Gaiardelli, N. Dall’Ora, F. Ponzio, E. Fraccaroli, S. Di Cataldo, S. Vinco, et al., Neuro-symbolic empowered denoising diffusion probabilistic models for real-time anomaly detection in industry 4.0: Wild-and-crazy-idea paper, in: 2023 Forum on Specification &amp; Design Languages (FDL), IEEE, 2023, pp. 1–4.</mixed-citation></ref>
      <ref id="ref20"><mixed-citation>[20] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, M. Chen, Hierarchical text-conditional image generation with CLIP latents, arXiv preprint arXiv:2204.06125 (2022).</mixed-citation></ref>
      <ref id="ref21"><mixed-citation>[21] D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, R. Rombach, SDXL: Improving latent diffusion models for high-resolution image synthesis, arXiv preprint arXiv:2307.01952 (2023).</mixed-citation></ref>
      <ref id="ref22"><mixed-citation>[22] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016.</mixed-citation></ref>
      <ref id="ref23"><mixed-citation>[23] L. Capogrosso, F. Cunico, D. S. Cheng, F. Fummi, M. Cristani, A machine learning-oriented survey on tiny machine learning, IEEE Access (2024).</mixed-citation></ref>
      <ref id="ref24"><mixed-citation>[24] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).</mixed-citation></ref>
    </ref-list>
  </back>
</article>