=Paper=
{{Paper
|id=Vol-3762/519
|storemode=property
|title=Exploiting Multimodal Latent Diffusion Models for Accurate Anomaly Detection in Industry 5.0
|pdfUrl=https://ceur-ws.org/Vol-3762/519.pdf
|volume=Vol-3762
|authors=Luigi Capogrosso,Alvise Vivenza,Andrea Chiarini,Francesco Setti,Marco Cristani
|dblpUrl=https://dblp.org/rec/conf/ital-ia/CapogrossoVCSC24
}}
==Exploiting Multimodal Latent Diffusion Models for Accurate Anomaly Detection in Industry 5.0==
Luigi Capogrosso¹·∗, Alvise Vivenza¹·², Andrea Chiarini²·³, Francesco Setti¹·² and Marco Cristani¹·²

¹ Department of Engineering for Innovation Medicine, University of Verona, Italy
² QUALYCO S.r.l., Spin-off of the University of Verona, Verona, Italy
³ Department of Management, University of Verona, Italy
Abstract
Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained
on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter
are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by
superimposing artifacts to normal samples to mitigate problems related to unbalanced training data. These techniques often
produce out-of-distribution images, resulting in systems that learn what is not a normal sample but cannot accurately identify
what a defect looks like. In this paper, we show the research we are carrying out in collaboration with QUALYCO, a startup
spin-off of the University of Verona, on multimodal Latent Diffusion Models (LDMs) for accurate anomaly detection in
Industry 5.0. Unlike conventional image generation techniques, we work within a human feedback loop pipeline, where
domain experts provide multimodal guidance to the model through text descriptions and region localization of the possible
anomalies. This strategic shift enhances the interpretability of results and fosters a more robust human feedback loop,
facilitating iterative improvements of the generated outputs. Remarkably, our approach operates in a zero-shot manner,
avoiding time-consuming fine-tuning procedures while achieving superior performance. We demonstrate its efficacy and
versatility on the challenging KSDD2 dataset, achieving state-of-the-art results.
Keywords
Diffusion Models, Anomaly Detection, Industry 5.0
1. Introduction

Surface Defect Detection (SDD) is a challenging problem in industrial scenarios, defined as the task of identifying samples containing a defect [1]. In many real-world applications, a human expert inspects every product and removes the defective pieces. Unfortunately, human experts are often inaccurate, and their outputs can be inconsistent or biased. Moreover, humans are relatively slow in accomplishing this task, and their performance is subject to stress and fatigue.

Automated defect detection systems [2] can easily overcome most of these issues by learning classifiers on defective and nominal training products. The main drawback is the data collection process required to train a model effectively. Indeed, defective items (i.e., positive samples) are relatively rare compared to nominal items (i.e., negative samples). Thus, the user may need to collect massive amounts of data to have enough positive samples. Moreover, with the rise of Industry 5.0 [3] and the transition towards flexible manufacturing processes where human operators and production line components actively collaborate, there is an increasing demand for systems that can quickly adapt to new production setups, i.e., customized products manufactured in small batches. Traditional automated systems cannot comply with these demands, since data collection could easily involve the whole batch.

Recent studies on SDD have focused on limiting the impact of the labeling process by formulating the problem under the unsupervised learning paradigm [4] or training exclusively on nominal samples [5], possibly using few-shot learning strategies [6]. In both cases, the goal is to generate an accurate model of the nominal sample distribution and predict everything with a low probability score as an anomaly. However, due to the limited restoration capability of these models, these approaches tend to generate many false positives, especially on datasets with complex structures or textures [7].

It is worth noting that, in industrial setups, anomalies are not generated by Gaussian processes but are the outcome of specific, often predictable, issues during the production process. Consequently, the anomalous samples are not randomly distributed outside the nominal distribution; they can be modeled as a mixture of Gaussian

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
∗ Corresponding author.
Email: luigi.capogrosso@univr.it (L. Capogrosso); alvise.vivenza@univr.it (A. Vivenza); andrea.chiarini@univr.it (A. Chiarini); francesco.setti@univr.it (F. Setti); marco.cristani@univr.it (M. Cristani)
ORCID: 0000-0002-4941-2255 (L. Capogrosso); 0000-0003-4915-5145 (A. Chiarini); 0000-0002-0015-5534 (F. Setti); 0000-0002-0523-6042 (M. Cristani)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Figure 1: Our pipeline. Starting from positive samples, we leverage a Latent Diffusion Model (LDM) to synthesize novel
in-distribution high-quality images of defective surfaces based on defect localization via gesture and textual prompts by a
human feedback loop. Then, these synthetic images are used as anomaly samples to train a TinyML-based binary classifier
directly on the production line for real-time anomaly detection.
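The generation stage of the pipeline in Figure 1 — sampling a nominal image, a textual defect description, and a region mask, then inpainting the masked region — can be outlined as follows. This is a minimal sketch, not the authors' code: the inpainting call is stubbed (in the real pipeline it would invoke a pre-trained SDXL inpainting model via Diffusers), and all file names and prompts are illustrative.

```python
import random

def inpaint_anomaly(negative_image, description, mask):
    """Stub for the text-conditioned LDM inpainting step.

    In the actual pipeline this would call a pre-trained diffusion
    model to fill the masked region of `negative_image` with a defect
    matching `description`. Here it only records the triplet, so the
    control flow stays runnable without a GPU.
    """
    return {"base": negative_image, "prompt": description, "region": mask}

def generate_augmented_set(negatives, descriptions, masks, n_aug, seed=0):
    """Sample (i_n, d_a, m_a) triplets and inpaint one anomaly per triplet.

    Repeating the stochastic generation n_aug times yields the augmented
    positive set I_a used to train the downstream classifier.
    """
    rng = random.Random(seed)
    return [
        inpaint_anomaly(
            rng.choice(negatives),      # nominal image i_n
            rng.choice(descriptions),   # anomaly description d_a
            rng.choice(masks),          # binary region mask m_a
        )
        for _ in range(n_aug)
    ]

I_a = generate_augmented_set(
    negatives=["neg_001.png", "neg_002.png"],
    descriptions=["a photo of a scratched surface"],
    masks=["mask_top.png", "mask_center.png"],
    n_aug=4,
)
```

The same skeleton applies whichever inpainting backend is plugged in, since only `inpaint_anomaly` touches the model.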
distributions in the feature space instead. While general, unpredictable anomalies can still happen, expert operators can easily define the main problems they expect from the manufacturing process, such as which kinds of defects, in which locations, and how often they are likely to appear. Thus, generative AI can represent a powerful tool for SDD, with defect image generation emerging as a promising approach to enhance detector performance.

Specifically, in this paper, we report the results of our research on Latent Diffusion Models (LDMs), a powerful class of generative models, to produce fine-grained, realistic defect images that can be used as positive samples to train an anomaly detection model. We name our approach DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation in the SDD task. By leveraging pre-trained LDMs with multimodal conditioning, we can exploit domain experts' knowledge to generate plausible anomalies without needing real positive data. When using these augmented images to train an anomaly detection model, we show a notable increase in detection performance compared to previous state-of-the-art augmentation pipelines. This research is being carried out in collaboration with QUALYCO¹, a startup spin-off of the University of Verona. Figure 1 outlines our approach.

The main contributions of our research are as follows:

• We present a complete pipeline for training anomaly detection models based on nominal images and textual prompts. We showcase the superior outcomes achieved by utilizing generated defective samples compared to previous state-of-the-art approaches.
• We dive into spatial control approaches to enable the synthesis of defect samples incorporating regional information, and we exhibit enhanced controllability of the image generation through a human feedback loop pipeline, effectively utilizing domain expertise to generate more plausible in-distribution anomalies.

2. Related Work

Research on SDD has been conducted according to different setups: unsupervised approaches [8] use a mixture of unlabelled positive and negative sample images for training; supervised approaches require labeled samples in the form of binary masks representing the defects (full supervision) [9] or simply as a tag for the whole image (weak supervision) [10]. Supervised methods have demonstrated superior accuracy in the identification of anomalies. Nevertheless, the effort required to provide good annotations is not always justified. Collecting positive samples can be time- and resource-consuming due to the low rate of defective products generated by industrial lines.

Thus, many recent approaches adopt a “clean” setup, where the training set consists of only nominal samples. Two strategies can be adopted in clean setups: model fitting and image generation. Model fitting approaches aim at generating an accurate model of the nominal distribution, considering as an outlier every sample with a likelihood lower than –or a distance from the nominal prototype higher than– a predefined threshold [11].

¹ https://qualyco.com.
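The model-fitting strategy described above — flagging as anomalous every sample farther from a nominal prototype than a predefined threshold — can be sketched in a few lines. This is a toy illustration under assumed 2-D features, not the method of [11]: real systems extract deep features and calibrate the threshold on validation data.

```python
import math

def fit_nominal_prototype(features):
    """Average the feature vectors of nominal samples into a prototype."""
    dim = len(features[0])
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(dim)]

def anomaly_score(feature, prototype):
    """Euclidean distance from the nominal prototype."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feature, prototype)))

def is_anomalous(feature, prototype, threshold):
    """Flag samples whose distance from the prototype exceeds the threshold."""
    return anomaly_score(feature, prototype) > threshold

# Toy 2-D "features": nominal samples cluster around (1, 1).
nominal = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
proto = fit_nominal_prototype(nominal)

near = is_anomalous([1.0, 1.0], proto, threshold=0.5)  # nominal-like sample
far = is_anomalous([4.0, 4.0], proto, threshold=0.5)   # far outlier
```

The key limitation motivating this paper is visible even in the sketch: the threshold only encodes what is *not* nominal, so nothing in the model describes what an actual defect looks like.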
On the contrary, data augmentation approaches leverage generative methods to synthesize images of defects and use these images as positive samples for training a supervised model. Specifically, this work focuses on generation-based data augmentation under clean setups.

The most popular data augmentation pipeline for SDD consists of a series of random standard transformations of the input image –such as mirroring, rotations, and color changes– followed by the super-imposition of noisy patches [12].

In MemSeg [12], the pipeline for the generation of the abnormal synthetic examples is divided into three steps: i) a Region of Interest (ROI) indicating where the defect will be located is generated using Perlin noise and the target foreground; ii) the ROI is applied to a noise image to generate a noise foreground ROI; iii) the noise foreground ROI is super-imposed on the original image to obtain the simulated anomalous image. However, all these approaches are based on generating out-of-distribution patterns that do not faithfully represent the target-domain anomalies.

More recently, the first work that draws attention to in-distribution defect data is In&Out [13], in which we empirically show that diffusion models provide more realistic in-distribution defects. Here, we significantly improve on the generation of in-distribution anomalous samples of [13], incorporating domain knowledge provided by an expert user through textual prompts and localization of salient regions in a training-free setup.

3. Methodology

3.1. Multimodal Diffusion-based image generation

LDMs [14, 15] are a class of deep latent variable models that work by modeling the joint distribution of the data over a Markovian inference process. This process consists of small perturbations of the data with a variance-preserving property [16], such that the limit distribution after the diffusion process is approximately identical to a known prior distribution. Starting with samples from the prior, a reverse diffusion process is learned by gradually denoising the samples to resemble the initial data by the end of the procedure.

We leverage the natural ability of LDMs to incorporate multimodal conditioning in the generation process, taking inspiration from [17, 18, 19]. Specifically, we use as textual descriptions a prompt and a negative prompt, i.e., a prompt that guides the image generation “away” from undesired concepts, resulting in high-quality images that comply with the given descriptions [20, 21].

In particular, we do not perform full image generation; to effectively enhance spatial control, we opt to utilize an inpainting model, as demonstrated in [14, 18]. Given an image with a masked region, inpainting seamlessly fills it with content that harmonizes with the surrounding image. Although typically employed to eliminate undesired artifacts, the inpainting process ensures that the masked area incorporates the provided prompt, effectively merging textual and visual content.

3.2. Our proposed pipeline

To generate an anomalous image 𝑖𝑎, the process starts by sampling a random negative image, an anomaly description, and a mask, forming the triplet (𝑖𝑛, 𝑑𝑎, 𝑚𝑎). These pieces of information are then fed to a text-conditioned LDM to perform inpainting on image 𝑖𝑛 using the mask 𝑚𝑎.

The anomaly description 𝑑𝑎 guides the generation, filling the masked region of 𝑖𝑛 with an anomaly that complies with the prompt. To generate images resembling real anomalous samples, domain knowledge from industrial experts is exploited, providing textual descriptions of the potential anomalies' type, shape, and spatial information.

The LDM is then conditioned on this information to inpaint plausible anomalies on defect-free samples. Formally, given pictures of defect-free (negative) samples 𝐼𝑛, domain experts provide textual descriptions 𝐷𝑎 of what different anomalies may look like. At the same time, regions where these anomalies may appear on the defect-free samples are designated; we define this set of regions as a set of binary masks 𝑀𝑎 of possible anomaly shapes and locations. The result of this operation is 𝑖𝑎, an anomalous version of 𝑖𝑛, where an anomaly has been inpainted in the masked region 𝑚𝑎. Due to the stochastic nature of LDMs, this process can be repeated multiple times to generate an augmented set of anomalous sample images 𝐼𝑎. Finally, the set 𝐼𝑎 can be used as data augmentation for training anomaly detection models, as presented in the following section.

3.3. The anomaly detection task

We approach the anomaly detection problem as a binary classification problem, where the objective is to predict whether a sample belongs to one of two classes. Specifically, we utilize a ResNet-50 [22] backbone trained with a binary cross-entropy loss function, denoted as ℒ_BCE. Mathematically, it is defined as:

ℒ_BCE(𝑦, 𝑦̂) = −(1/𝑁) ∑_{𝑖=1}^{𝑁} [𝑦ᵢ log(𝑦̂ᵢ) + (1 − 𝑦ᵢ) log(1 − 𝑦̂ᵢ)],   (1)

where 𝑦 represents the ground-truth labels, 𝑦̂ represents the predicted probabilities, and 𝑁 is the number of samples. In detail, 𝑦ᵢ denotes the true label for sample 𝑖, which can be either 0 or 1, while 𝑦̂ᵢ signifies the predicted probability that sample 𝑖 belongs to class 1.

Ongoing developments aim to optimize the model through TinyML [23] techniques in order to obtain an ultra-efficient system that can work smoothly in real time on a production line.

4. Experiments

4.1. Experiment setup

Datasets. We use the Kolektor Surface-Defect Dataset 2 (KSDD2) [10], one of the most recent, complex, and real-world SDD datasets. This dataset comprises 246 positive and 2085 negative images in the training set and 110 positive and 894 negative images in the testing set. Positive images are images with visible defects, such as scratches, spots, and surface imperfections. Since the images have different dimensions, we standardize the dataset resolution, resizing all the images to 224 × 632 pixels while keeping the number of normal and anomalous samples unchanged.

Evaluation metrics. The anomaly detection performance was evaluated based on Average Precision (AP), Precision, and Recall, following the evaluation protocol defined in [13].

4.2. Implementation details

In this section, we specify all the implementation details for reproducibility. All training and inference were conducted on an NVIDIA RTX 3090 GPU.

Inpainting via Diffusion Models. We use the pre-trained implementation of SDXL [21] from Diffusers as our text-conditioned LDM. Following the procedure outlined in Section 3.2, we use the negative images of KSDD2 as the set 𝐼𝑛. As the set of anomaly descriptions 𝐷𝑎, we used the prompts “white marks on the wall” and “copper metal scratches”. Instead, “smooth, plain, black, dark, shadow” was used as a negative prompt to further improve performance. These prompts were chosen after a series of tests, simulating the iterative process of our human feedback loop pipeline until the resulting images resembled plausible anomalies. We used the segmentation masks of positive samples in the KSDD2 dataset to simulate the domain experts' definition of plausible anomalous regions. Then, these data are fed to the pre-trained SDXL model to perform inpainting on the negative images in a training-free process, generating the set of augmented anomalous images 𝐼𝑎 as described in Section 3.2. Finally, the generated images 𝐼𝑎 are added to the training set, which will be used to train the anomaly detection model.

ResNet-50 training and testing. For a fair comparison with [13], we use the same PyTorch implementation of ResNet-50 [22] as our anomaly detection model, in which we substitute the fully connected layers after the backbone to make it a binary classifier. The network is trained for 50 epochs with Adam [24] as the optimizer, a learning rate of 0.0001, and a batch size of 32. To maintain consistency with the training and evaluation procedures of KSDD2, our setup is the same as presented in [10, 13], where only the images and ground-truth labels are used to train the model.

4.3. Quantitative results

Zero-shot data augmentation. Here, we emulate the situation where no original positive samples are available in the training set. This scenario makes generating augmented positive samples necessary and restricts the user to augmentation procedures that do not rely on positive images. To do this, we build the set of augmented anomalous images 𝐼𝑎 by generating 𝑁𝑎𝑢𝑔 augmented positive samples with different pipelines, i.e., MemSeg [12], In&Out [13], and DIAG. Then, we train the ResNet-50 model on a dataset that includes the original negative samples 𝐼𝑛 and the augmented positive samples 𝐼𝑎. Finally, we evaluate the model on the original test set.

Table 1: Results of MemSeg, In&Out, and DIAG when no anomalous samples are available. In bold, the best results; underlined, the second best.

Model          Naug   AP ↑   Precision ↑   Recall ↑
MemSeg [12]     80    .514      .733         .436
MemSeg [12]    100    .388      .633         .432
MemSeg [12]    120    .511      .683         .470
In&Out [13]     80    .556      .530         .655
In&Out [13]    100    .626      .742         .568
In&Out [13]    120    .536      .699         .534
DIAG (ours)     80    .769      .851         .673
DIAG (ours)    100    .801      .924         .664
DIAG (ours)    120    .739      .944         .609

Table 1 reports the comparison between the models trained with MemSeg, In&Out, and DIAG augmented data at different values of 𝑁𝑎𝑢𝑔. As we can see, our proposed method achieves the highest AP (.801), recorded at 100 augmented images, while also resulting in a consistently higher AP when compared to the MemSeg and In&Out pipelines. These impressive results highlight how, through domain expertise in the form of anomaly
descriptions and segmentation masks, it is possible to generate in-distribution images able to meaningfully guide an anomaly detection network, even in a complicated scenario where no real anomalous data is available.

Surprisingly, the DIAG performance with 𝑁𝑎𝑢𝑔 = 120 augmented images is lower than with a smaller number of augmented images. We hypothesize this is due to the stochastic nature of LDM image generation. While it allows the generation of varied images given the same guidance, it can also lower, in some cases, the predictability of the quality of the generated samples, which sometimes may not faithfully comply with the prompt. Future work will focus on studying quality consistency in the image generation pipeline.

Full-shot data augmentation. To showcase DIAG as a general data augmentation technique, we also explore the scenario where real positive samples are available in the training set. To this aim, we include all the 246 real positive samples 𝐼𝑝 in the training set, together with the real negative images 𝐼𝑛 and the 𝑁𝑎𝑢𝑔 augmented positive images 𝐼𝑎.

Table 2: Results of MemSeg, In&Out, and DIAG when all the anomalous samples are available. In bold, the best results; underlined, the second best.

Model          Naug   AP ↑   Precision ↑   Recall ↑
MemSeg [12]     80    .744      .851         .691
MemSeg [12]    100    .774      .814         .752
MemSeg [12]    120    .734      .772         .707
In&Out [13]     80    .747      .764         .734
In&Out [13]    100    .775      .868         .720
In&Out [13]    120    .782      .906         .689
DIAG (ours)     80    .869      .912         .755
DIAG (ours)    100    .911      .978         .800
DIAG (ours)    120    .924      .896         .864

As we can see from Table 2, DIAG achieves its highest AP yet (.924), surpassing the .782 set by the previous state-of-the-art data augmentation pipeline [13]. When comparing these results to the ones obtained in the “zero-shot data augmentation” scenario, it is clear how more in-distribution images improve model performance during training. This is highlighted by the improvement in performance of all the models when adding the real positive images 𝐼𝑝 to the training set. At the same time, the inclusion of DIAG augmented images allows the model to explore the anomaly distribution further, resulting in the difference in performance between the different data augmentation pipelines.

4.4. Qualitative results

The main goal of our data augmentation pipeline is to generate in-distribution synthetic positive images, meaning images that closely resemble the real ones. Figure 2 shows qualitative results. It is evident that the images produced by DIAG are markedly more realistic compared to those generated by MemSeg [12] and In&Out [13].

Figure 2: The first row displays some negative samples from the KSDD2 dataset. The second row shows some images of positive samples from the same dataset. The third row shows the MemSeg-generated defect samples. The fourth row shows In&Out-generated defect samples. Lastly, the final row showcases some images generated with DIAG. Notably, the defect images that DIAG generated are more realistic and in-distribution.

5. Conclusions

This work presents DIAG, a novel data augmentation pipeline that leverages visual language models to produce training-free positive images for enhancing the performance of an SDD model. We introduced domain experts into the generation pipeline, asking them to describe with textual prompts how a defect should look and where it can be localized. Then, we adopt a pre-trained LDM to generate defective images and train a binary classifier to isolate the anomalous images. We focus our experiments on the KSDD2 dataset and establish ourselves as the new state-of-the-art data augmentation pipeline, surpassing previous approaches in both the zero-shot and full-shot data augmentation scenarios with an AP of .801 and .924, respectively. These results highlight the potential of in-distribution data augmentation in the anomaly detection field, where training-free generative model pipelines such as DIAG can provide meaningful data for downstream classification, making them appealing solutions in scenarios where real anomalous data is difficult to collect or unavailable. These promising results promote further exploration across various datasets, particularly investigating how robust the image generation is to noisy textual prompts.
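The binary cross-entropy objective of Eq. (1), which drives the classifier of Section 3.3, can be checked numerically in plain Python; this is a didactic reproduction of the formula, not the paper's PyTorch training code.

```python
import math

def bce_loss(y_true, y_pred):
    """Binary cross-entropy, Eq. (1): the mean over N samples of
    -[y_i log(p_i) + (1 - y_i) log(1 - p_i)]."""
    n = len(y_true)
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, y_pred)
    ) / n

# Confident correct predictions give a small loss...
low = bce_loss([1, 0], [0.9, 0.1])
# ...while confidently wrong ones are heavily penalized.
high = bce_loss([1, 0], [0.1, 0.9])
```

The asymmetry between the two cases is what pushes the classifier towards calibrated probabilities rather than hard guesses.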
References

[1] T. Wang, Y. Chen, M. Qiao, H. Snoussi, A fast and robust convolutional neural network-based defect detection model in product quality control, The International Journal of Advanced Manufacturing Technology 94 (2018) 3465–3471.
[2] S. H. Hanzaei, A. Afshar, F. Barazandeh, Automatic detection and classification of the ceramic tiles' surface defects, Pattern Recognition 66 (2017) 174–189.
[3] P. K. R. Maddikunta, Q.-V. Pham, B. Prabadevi, N. Deepa, K. Dev, T. R. Gadekallu, R. Ruby, M. Liyanage, Industry 5.0: A survey on enabling technologies and potential applications, Journal of Industrial Information Integration 26 (2022) 100257.
[4] K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, P. Gehler, Towards total recall in industrial anomaly detection, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 14318–14328.
[5] M. Rudolph, B. Wandt, B. Rosenhahn, Same same but differnet: Semi-supervised defect detection with normalizing flows, in: Winter Conference on Applications of Computer Vision (WACV), 2021.
[6] Y. Song, T. Wang, P. Cai, S. K. Mondal, J. P. Sahoo, A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities, ACM Computing Surveys 55 (2023) 1–40.
[7] Y. Chen, Y. Ding, F. Zhao, E. Zhang, Z. Wu, L. Shao, Surface defect detection methods for industrial products: A review, Applied Sciences 11 (2021) 7657.
[8] X. Tao, D. Zhang, W. Ma, Z. Hou, Z. Lu, C. Adak, Unsupervised anomaly detection for surface defects with dual-siamese network, IEEE Transactions on Industrial Informatics 18 (2022) 7707–7717.
[9] C. Luan, R. Cui, L. Sun, Z. Lin, A siamese network utilizing image structural differences for cross-category defect detection, in: 2020 IEEE International Conference on Image Processing (ICIP), IEEE, 2020.
[10] J. Božič, D. Tabernik, D. Skočaj, Mixed supervision for surface-defect detection: From weakly to fully supervised learning, Computers in Industry 129 (2021) 103459.
[11] T. Defard, A. Setkov, A. Loesch, R. Audigier, PaDiM: a patch distribution modeling framework for anomaly detection and localization, in: International Conference on Pattern Recognition (ICPR), 2021.
[12] M. Yang, P. Wu, H. Feng, MemSeg: A semi-supervised method for image surface defect detection using differences and commonalities, Engineering Applications of Artificial Intelligence 119 (2023) 105835.
[13] L. Capogrosso, F. Girella, F. Taioli, M. Dalla Chiara, M. Aqeel, F. Fummi, F. Setti, M. Cristani, Diffusion-based image generation for in-distribution data augmentation in surface defect detection, in: International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), 2024. doi:10.5220/0012350400003660.
[14] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, S. Ganguli, Deep unsupervised learning using nonequilibrium thermodynamics, in: International Conference on Machine Learning (ICML), 2015.
[15] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, Advances in Neural Information Processing Systems (NeurIPS) 33 (2020) 6840–6851.
[16] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, B. Poole, Score-based generative modeling through stochastic differential equations, in: International Conference on Learning Representations (ICLR), 2021.
[17] J. Ho, T. Salimans, Classifier-free diffusion guidance, arXiv preprint arXiv:2207.12598 (2022).
[18] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[19] L. Capogrosso, A. Mascolini, F. Girella, G. Skenderi, S. Gaiardelli, N. Dall'Ora, F. Ponzio, E. Fraccaroli, S. Di Cataldo, S. Vinco, et al., Neuro-symbolic empowered denoising diffusion probabilistic models for real-time anomaly detection in Industry 4.0: Wild-and-crazy-idea paper, in: 2023 Forum on Specification & Design Languages (FDL), IEEE, 2023, pp. 1–4.
[20] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, M. Chen, Hierarchical text-conditional image generation with CLIP latents, arXiv preprint arXiv:2204.06125 (2022).
[21] D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, R. Rombach, SDXL: Improving latent diffusion models for high-resolution image synthesis, arXiv preprint arXiv:2307.01952 (2023).
[22] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[23] L. Capogrosso, F. Cunico, D. S. Cheng, F. Fummi, M. Cristani, A machine learning-oriented survey on tiny machine learning, IEEE Access (2024).
[24] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).