Exploiting Multimodal Latent Diffusion Models for Accurate Anomaly Detection in Industry 5.0

Luigi Capogrosso1,∗, Alvise Vivenza1,2, Andrea Chiarini2,3, Francesco Setti1,2 and Marco Cristani1,2

1 Department of Engineering for Innovation Medicine, University of Verona, Italy
2 QUALYCO S.r.l., Spin-off of the University of Verona, Verona, Italy
3 Department of Management, University of Verona, Italy

Abstract
Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts on normal samples to mitigate problems related to unbalanced training data. These techniques often produce out-of-distribution images, resulting in systems that learn what is not a normal sample but cannot accurately identify what a defect looks like. In this paper, we present the research we are carrying out in collaboration with QUALYCO, a startup spin-off of the University of Verona, on multimodal Latent Diffusion Models (LDMs) for accurate anomaly detection in Industry 5.0. Unlike conventional image generation techniques, we work within a human feedback loop pipeline, where domain experts provide multimodal guidance to the model through text descriptions and region localization of the possible anomalies. This strategic shift enhances the interpretability of results and fosters a more robust human feedback loop, facilitating iterative improvements of the generated outputs. Remarkably, our approach operates in a zero-shot manner, avoiding time-consuming fine-tuning procedures while achieving superior performance. We demonstrate its efficacy and versatility on the challenging KSDD2 dataset, achieving state-of-the-art results.
Keywords
Diffusion Models, Anomaly Detection, Industry 5.0

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy.
∗ Corresponding author.
luigi.capogrosso@univr.it (L. Capogrosso); alvise.vivenza@univr.it (A. Vivenza); andrea.chiarini@univr.it (A. Chiarini); francesco.setti@univr.it (F. Setti); marco.cristani@univr.it (M. Cristani)
ORCID: 0000-0002-4941-2255 (L. Capogrosso); 0000-0003-4915-5145 (A. Chiarini); 0000-0002-0015-5534 (F. Setti); 0000-0002-0523-6042 (M. Cristani)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

Surface Defect Detection (SDD) is a challenging problem in industrial scenarios, defined as the task of individuating samples containing a defect [1]. In many real-world applications, a human expert inspects every product and removes the defective pieces. Unfortunately, human experts are often inaccurate, and their outputs can be inconsistent or biased. Moreover, humans are relatively slow in accomplishing this task, and their performance is subject to stress and fatigue.

Automated defect detection systems [2] can easily overcome most of these issues by learning classifiers on defective and nominal training products. The main drawback is the data collection process required to train a model effectively. Indeed, defective items (i.e., positive samples) are relatively rare compared to nominal items (i.e., negative samples). Thus, the user may need to collect massive amounts of data to have enough positive samples. Moreover, with the rise of Industry 5.0 [3] and the transition towards flexible manufacturing processes, where human operators and production line components actively collaborate, there is an increasing demand for systems that can quickly adapt to new production setups, i.e., customized products manufactured in small batches. Traditional automated systems cannot comply with these demands, since data collection could easily involve the whole batch.

Recent studies on SDD focused on limiting the impact of the labeling process by formulating the problem under the unsupervised learning paradigm [4] or training exclusively on nominal samples [5], possibly using few-shot learning strategies [6]. In both cases, the goal is to generate an accurate model of the nominal sample distribution and predict everything with a low probability score as an anomaly. However, due to the limited restoration capability of these models, these approaches tend to generate many false positives, especially on datasets with complex structures or textures [7].

Figure 1: Our pipeline. Starting from normal samples, we leverage a Latent Diffusion Model (LDM) to synthesize novel in-distribution high-quality images of defective surfaces, based on defect localization via gestures and textual prompts provided through a human feedback loop. Then, these synthetic images are used as anomaly samples to train a TinyML-based binary classifier directly on the production line for real-time anomaly detection.

It is worth noting that, in industrial setups, anomalies are not generated by Gaussian processes but are the outcome of specific, often predictable, issues during the production process. Consequently, the anomalous samples are not randomly distributed outside the nominal distribution; they can be modeled as a mixture of Gaussian
distributions in the feature space instead. While general, unpredictable anomalies can still happen, expert operators can easily define the main problems they expect from the manufacturing process, such as which kinds of defects, in which locations, and how often they are likely to appear. Thus, generative AI can represent a powerful tool for SDD, with defect image generation emerging as a promising approach to enhance detector performance.

Specifically, in this paper, we report the results of our research on Latent Diffusion Models (LDMs), a powerful class of generative models, to produce fine-grained, realistic defect images that can be used as positive samples to train an anomaly detection model. We name our approach DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation in the SDD task. By leveraging pre-trained LDMs with multimodal conditioning, we can exploit domain experts' knowledge to generate plausible anomalies without needing real positive data. When using these augmented images to train an anomaly detection model, we show a notable increase in detection performance compared to previous state-of-the-art augmentation pipelines. Specifically, this research is being carried out in collaboration with QUALYCO1, a startup spin-off of the University of Verona. Figure 1 outlines our approach.

The main contributions of our research are as follows:

• We present a complete pipeline for training anomaly detection models based on nominal images and textual prompts. We showcase the superior outcomes achieved by utilizing generated defective samples compared to previous state-of-the-art approaches.

• We dive into spatial control approaches to enable the synthesis of defect samples incorporating regional information, and exhibit enhanced controllability of the image generation through a human feedback loop pipeline, effectively utilizing domain expertise to generate more plausible in-distribution anomalies.

1 https://qualyco.com

2. Related Work

Research on SDD has been conducted according to different setups: unsupervised approaches [8] use a mixture of unlabelled positive and negative sample images for training; supervised approaches require labeled samples in the form of binary masks representing the defects (full supervision) [9] or simply as a tag for the whole image (weak supervision) [10]. Supervised methods demonstrated superior accuracy in the identification of anomalies. Nevertheless, the effort required to provide good annotations is not always justified. Collecting positive samples can be time- and resource-consuming due to the low rate of defective products generated by industrial lines.

Thus, many recent approaches adopt a "clean" setup, where the training set consists of only nominal samples. Two strategies can be adopted in clean setups: model fitting and image generation. Model fitting approaches aim at generating an accurate model of the nominal distribution, considering as an outlier every sample with a likelihood lower than, or a distance from the nominal prototype higher than, a predefined threshold [11]. On the contrary, data augmentation approaches leverage generative methods to synthesize images of defects and use these images as positive samples for training a supervised model. Specifically, this work focuses on generation-based data augmentation under clean setups.

The most popular data augmentation pipeline for SDD consists of a series of random standard transformations of the input image, such as mirroring, rotations, and color changes, followed by the superimposition of noisy patches [12]. In MemSeg [12], the pipeline for the generation of the abnormal synthetic examples is divided into three steps: i) a Region of Interest (ROI) indicating where the defect will be located is generated using Perlin noise and the target foreground; ii) the ROI is applied to a noise image to generate a noise foreground ROI; iii) the noise foreground ROI is super-imposed on the original image to obtain the simulated anomalous image. However, all these approaches are based on generating out-of-distribution patterns that do not faithfully represent the target-domain anomalies.

More recently, the first work that draws attention to in-distribution defect data is In&Out [13], in which we empirically show that diffusion models provide more realistic in-distribution defects. Here, we significantly improve the generation of in-distribution anomalous samples of [13], incorporating domain knowledge provided by an expert user through textual prompts and localization of salient regions in a training-free setup.

3. Methodology

3.1. Multimodal Diffusion-based image generation

LDMs [14, 15] are a class of deep latent variable models that work by modeling the joint distribution of the data over a Markovian inference process. This process consists of small perturbations of the data with a variance-preserving property [16], such that the limit distribution after the diffusion process is approximately identical to a known prior distribution. Starting with samples from the prior, a reverse diffusion process is learned by gradually denoising the sample to resemble the initial data by the end of the procedure.

We leverage the natural ability of LDMs to incorporate multimodal conditioning in the generation process, taking inspiration from [17, 18, 19]. Specifically, we use as textual descriptions a prompt and a negative prompt, i.e., a prompt that guides the image generation "away" from its concepts and towards the desired output, resulting in high-quality images that comply with the given descriptions [20, 21].

In particular, we do not perform full image generation: to effectively enhance spatial control, we opt to utilize an inpainting model, as demonstrated in [14, 18]. Given an image with a masked region, inpainting seamlessly fills it with content that harmonizes with the surrounding image. Although typically employed to eliminate undesired artifacts, the inpainting process ensures that the masked area incorporates the provided prompt, effectively merging textual and visual content.

3.2. Our proposed pipeline

To generate an anomalous image i_a, the process starts by sampling a random negative image, an anomaly description, and a mask, forming the triplet (i_n, d_a, m_a). These pieces of information are then fed to a text-conditioned LDM to perform inpainting on image i_n using the mask m_a.

The anomaly description d_a guides the generation, filling the masked region of i_n with an anomaly that complies with the prompt. To generate images resembling real anomalous samples, domain knowledge from industrial experts is exploited, providing textual descriptions of the potential anomalies' type, shape, and spatial information.

The LDM is then conditioned on this information to inpaint plausible anomalies on defect-free samples. Formally, given pictures of defect-free (negative) samples I_n, domain experts provide textual descriptions D_a of what different anomalies may look like. At the same time, regions where these anomalies may appear on the defect-free samples are designated. We define this set of regions as a set of binary masks M_a of possible anomaly shapes and locations. The result of this operation is i_a, an anomalous version of i_n, where an anomaly has been inpainted in the masked region m_a. Due to the stochastic nature of LDMs, this process can be repeated multiple times to generate an augmented set of anomalous sample images I_a. Finally, the set I_a can be used as data augmentation for training anomaly detection models, as presented in the following section.

3.3. The anomaly detection task

We approach the anomaly detection problem as a binary classification problem, where the objective is to predict whether a sample belongs to one of two classes. Specifically, we utilize a ResNet-50 [22] backbone trained with a binary cross-entropy loss function, denoted as L_BCE. Mathematically, it is defined as:

L_BCE(y, ŷ) = −(1/N) Σ_{i=1}^{N} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ],   (1)

where y represents the ground-truth labels, ŷ represents the predicted probabilities, and N is the number of samples. In detail, y_i denotes the true label for sample i, which can be either 0 or 1, while ŷ_i signifies the predicted probability that sample i belongs to class 1.

Ongoing developments aim to optimize the model through TinyML [23] techniques in order to have an ultra-efficient system that can work smoothly in real time on a production line.

4. Experiments

4.1. Experiment setup

Datasets. We use the Kolektor Surface-Defect Dataset 2 (KSDD2) [10], one of the most recent, complex, and realistic SDD datasets. This dataset comprises 246 positive and 2085 negative images in the training set, and 110 positive and 894 negative images in the testing set.

Table 1: Results of MemSeg, In&Out, and DIAG when no anomalous samples are available. In bold, the best results; underlined, the second best.

Model          N_aug   AP ↑   Precision ↑   Recall ↑
MemSeg [12]    80      .514   .733          .436
MemSeg [12]    100     .388   .633          .432
MemSeg [12]    120     .511   .683          .470
In&Out [13]    80      .556   .530          .655
In&Out [13]    100     .626   .742          .568
In&Out [13]    120     .536   .699          .534
DIAG (ours)    80      .769   .851          .673
DIAG (ours)    100     .801   .924          .664
DIAG (ours)    120     .739   .944          .609
Positive images are images with visible defects, such as scratches, spots, and surface imperfections. Since the images have different dimensions, we standardize the dataset resolution, resizing all images to 224 × 632 pixels while keeping the number of normal and anomalous samples unchanged.

Evaluation metrics. The anomaly detection performance was evaluated based on Average Precision (AP), Precision, and Recall, following the evaluation protocol defined in [13].

4.2. Implementation details

In this section, we specify all the implementation details for reproducibility. All training and inference were conducted on an NVIDIA RTX 3090 GPU.

Inpainting via Diffusion Models. We use the pre-trained implementation of SDXL [21] from Diffusers as our text-conditioned LDM. Following the procedure outlined in Section 3.2, we use the negative images of KSDD2 as the set I_n. As the set of anomaly descriptions D_a, we used the prompts "white marks on the wall" and "copper metal scratches", while "smooth, plain, black, dark, shadow" was used as a negative prompt to further improve performance. These prompts were chosen after a series of tests, simulating the iterative process of our human feedback loop pipeline, until the resulting images resembled plausible anomalies. We used the segmentation masks of positive samples in the KSDD2 dataset to simulate the domain experts' definition of plausible anomalous regions. Then, these data are fed to the pre-trained SDXL model to perform inpainting on the negative images in a training-free process, generating the set of augmented anomalous images I_a as described in Section 3.2. Finally, the generated images I_a are added to the training set, which will be used to train the anomaly detection model.

ResNet-50 training and testing. For a fair comparison with [13], we use the same PyTorch implementation of ResNet-50 [22] as our anomaly detection model, in which we substitute the fully connected layers after the backbone to make it a binary classifier. The network is trained for 50 epochs with Adam [24] as the optimizer, a learning rate of 0.0001, and a batch size of 32. To maintain consistency with the training and evaluation procedures of KSDD2, our setup is the same as presented in [10, 13], where only the images and ground-truth labels are used to train the model.

4.3. Quantitative results

Zero-shot data augmentation. Here, we emulate the situation where no original positive samples are available in the training set. This scenario makes generating augmented positive samples necessary and restricts users to augmentation procedures that do not rely on positive images. To do this, we build the set of augmented anomalous images I_a by generating N_aug augmented positive samples with different pipelines, i.e., MemSeg [12], In&Out [13], and DIAG. Then, we train the ResNet-50 model on a dataset that includes the original negative samples I_n and the augmented positive samples I_a. Finally, we evaluate the model on the original test set.

Table 1 reports the comparison between the models trained with MemSeg, In&Out, and DIAG augmented data at different values of N_aug. As we can see, our proposed method achieves the highest AP (.801), recorded at 100 augmented images, while also resulting in a consistently higher AP when compared to the MemSeg and In&Out pipelines. These impressive results highlight how, through domain expertise in the form of anomaly descriptions and segmentation masks, it is possible to generate in-distribution images able to meaningfully guide an anomaly detection network, even in a complicated scenario where no real anomalous data is available.
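The triplet-based augmentation loop of Sections 3.2 and 4.2 can be sketched as follows. This is a minimal illustrative sketch, not the authors' released code: the `inpaint` argument stands in for the text-conditioned LDM inpainting call, and all function and variable names are assumptions made here for clarity.

```python
import random

def generate_augmented_set(negative_images, descriptions, masks, inpaint, n_aug):
    """Build the augmented anomalous set I_a by repeated triplet sampling.

    `inpaint` is a stand-in for the text-conditioned LDM inpainting call;
    injecting it keeps the loop itself framework-agnostic and testable.
    """
    augmented = []
    for _ in range(n_aug):
        # Sample a random triplet (i_n, d_a, m_a): a defect-free image,
        # an expert-written anomaly description, and a binary region mask.
        i_n = random.choice(negative_images)
        d_a = random.choice(descriptions)
        m_a = random.choice(masks)
        # The LDM fills the masked region of i_n with an anomaly that
        # complies with the textual description d_a (Section 3.2).
        augmented.append(inpaint(image=i_n, mask=m_a, prompt=d_a))
    return augmented
```

With Hugging Face Diffusers, `inpaint` would typically wrap a call to a pre-trained SDXL inpainting pipeline, passing the prompt, negative prompt, source image, and mask image; the exact checkpoint and sampler settings depend on the deployment and are not specified here.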
Surprisingly, the DIAG performance with N_aug = 120 augmented images is lower than with a smaller number of augmented images. We hypothesize this is due to the stochastic nature of LDM image generation. While it allows the generation of various images given the same guidance, it can also lower, in some cases, the predictability of the quality of the generated samples, which sometimes may not faithfully comply with the prompt. Future works will focus on studying quality consistency in the image generation pipeline.

Table 2: Results of MemSeg, In&Out, and DIAG when all the anomalous samples are available. In bold, the best results; underlined, the second best.

Model          N_aug   AP ↑   Precision ↑   Recall ↑
MemSeg [12]    80      .744   .851          .691
MemSeg [12]    100     .774   .814          .752
MemSeg [12]    120     .734   .772          .707
In&Out [13]    80      .747   .764          .734
In&Out [13]    100     .775   .868          .720
In&Out [13]    120     .782   .906          .689
DIAG (ours)    80      .869   .912          .755
DIAG (ours)    100     .911   .978          .800
DIAG (ours)    120     .924   .896          .864

4.4. Qualitative results

The main goal of our data augmentation pipeline is to generate in-distribution synthetic positive images, meaning images that closely resemble the real ones. Figure 2 shows qualitative results. It is evident that the images produced by DIAG are markedly more realistic than those generated by MemSeg [12] and In&Out [13].

Figure 2: The first row displays some negative samples from the KSDD2 dataset. The second row shows some positive samples from the same dataset. The third row shows MemSeg-generated defect samples. The fourth row shows In&Out-generated defect samples. Lastly, the final row showcases some images generated with DIAG. Notably, the defect images that DIAG generated are more realistic and in-distribution.
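The AP values reported in Tables 1 and 2 summarize the precision-recall trade-off over the ranked classifier scores. As a reference, a minimal sketch of this metric follows; it assumes untied scores and at least one positive label, and is only an illustration (the paper's exact evaluation protocol follows [13]).

```python
def precision_recall_points(labels, scores):
    """Precision/recall after each prediction, ranked by descending score.

    Assumes untied scores; labels are 0/1 with at least one positive.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / total_pos))
    return points

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve:
    AP = sum_n (R_n - R_{n-1}) * P_n."""
    ap, prev_recall = 0.0, 0.0
    for precision, recall in precision_recall_points(labels, scores):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

A perfect ranking (all positives scored above all negatives) yields AP = 1.0, while mixing positives and negatives in the ranking lowers the score, which is why AP is a convenient single-number summary for the unbalanced SDD setting.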
Full-shot data augmentation. To showcase DIAG as a general data augmentation technique, we also explore the scenario where real positive samples are available in the training set. To this aim, we include all the 246 real positive samples I_p in the training set, together with the real negative images I_n and the N_aug augmented positive images I_a.

As we can see from Table 2, DIAG achieves the highest average AP yet (.924), surpassing the .782 set by the previous state-of-the-art data augmentation pipeline [13]. When comparing these results to the ones obtained in the "zero-shot data augmentation" scenario, it is clear how more in-distribution images improve model performance during training. This is highlighted by the improvement in performance of all the models when adding the real positive images I_p to the training set. At the same time, the inclusion of DIAG augmented images allows the model to explore the anomaly distribution further, resulting in the difference in performance between the different data augmentation pipelines.

5. Conclusions

This work presents DIAG, a novel data augmentation pipeline that leverages visual language models to produce training-free positive images for enhancing the performance of an SDD model. We introduced domain experts in the generation pipeline, asking them to describe with textual prompts how a defect should look and where it can be localized. Then, we adopt a pre-trained LDM to generate defective images and train a binary classifier for isolating the anomalous images. We focus our experiments on the KSDD2 dataset and establish ourselves as the new state-of-the-art data augmentation pipeline, surpassing previous approaches in both the zero-shot and full-shot data augmentation scenarios with an AP of .801 and .924, respectively. These results highlight the potential of in-distribution data augmentation in the anomaly detection field, where training-free generative model pipelines such as DIAG can provide meaningful data for downstream classification, making them appealing solutions in scenarios where real anomalous data is difficult to collect or unavailable. These promising results promote further exploration across various datasets, particularly investigating how robust the image generation is to noisy textual prompts.

References

[1] T. Wang, Y. Chen, M. Qiao, H. Snoussi, A fast and robust convolutional neural network-based defect detection model in product quality control, The International Journal of Advanced Manufacturing Technology 94 (2018) 3465–3471.
[2] S. H. Hanzaei, A. Afshar, F. Barazandeh, Automatic detection and classification of the ceramic tiles' surface defects, Pattern Recognition 66 (2017) 174–189.
[3] P. K. R. Maddikunta, Q.-V. Pham, B. Prabadevi, N. Deepa, K. Dev, T. R. Gadekallu, R. Ruby, M. Liyanage, Industry 5.0: A survey on enabling technologies and potential applications, Journal of Industrial Information Integration 26 (2022) 100257.
[4] K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, P. Gehler, Towards total recall in industrial anomaly detection, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 14318–14328.
[5] M. Rudolph, B. Wandt, B. Rosenhahn, Same same but differnet: Semi-supervised defect detection with normalizing flows, in: Winter Conference on Applications of Computer Vision (WACV), 2021.
[6] Y. Song, T. Wang, P. Cai, S. K. Mondal, J. P. Sahoo, A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities, ACM Computing Surveys 55 (2023) 1–40.
[7] Y. Chen, Y. Ding, F. Zhao, E. Zhang, Z. Wu, L. Shao, Surface defect detection methods for industrial products: A review, Applied Sciences 11 (2021) 7657.
[8] X. Tao, D. Zhang, W. Ma, Z. Hou, Z. Lu, C. Adak, Unsupervised anomaly detection for surface defects with dual-siamese network, IEEE Transactions on Industrial Informatics 18 (2022) 7707–7717.
[9] C. Luan, R. Cui, L. Sun, Z. Lin, A siamese network utilizing image structural differences for cross-category defect detection, in: 2020 IEEE International Conference on Image Processing (ICIP), IEEE, 2020.
[10] J. Božič, D. Tabernik, D. Skočaj, Mixed supervision for surface-defect detection: From weakly to fully supervised learning, Computers in Industry 129 (2021) 103459.
[11] T. Defard, A. Setkov, A. Loesch, R. Audigier, PaDiM: A patch distribution modeling framework for anomaly detection and localization, in: International Conference on Pattern Recognition (ICPR), 2021.
[12] M. Yang, P. Wu, H. Feng, MemSeg: A semi-supervised method for image surface defect detection using differences and commonalities, Engineering Applications of Artificial Intelligence 119 (2023) 105835.
[13] L. Capogrosso, F. Girella, F. Taioli, M. Dalla Chiara, M. Aqeel, F. Fummi, F. Setti, M. Cristani, Diffusion-based image generation for in-distribution data augmentation in surface defect detection, in: International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), 2024. doi:10.5220/0012350400003660.
[14] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, S. Ganguli, Deep unsupervised learning using nonequilibrium thermodynamics, in: International Conference on Machine Learning (ICML), 2015.
[15] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, Advances in Neural Information Processing Systems (NeurIPS) 33 (2020) 6840–6851.
[16] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, B. Poole, Score-based generative modeling through stochastic differential equations, in: International Conference on Learning Representations (ICLR), 2020.
[17] J. Ho, T. Salimans, Classifier-free diffusion guidance, arXiv preprint arXiv:2207.12598 (2022).
[18] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[19] L. Capogrosso, A. Mascolini, F. Girella, G. Skenderi, S. Gaiardelli, N. Dall'Ora, F. Ponzio, E. Fraccaroli, S. Di Cataldo, S. Vinco, et al., Neuro-symbolic empowered denoising diffusion probabilistic models for real-time anomaly detection in industry 4.0: Wild-and-crazy-idea paper, in: 2023 Forum on Specification & Design Languages (FDL), IEEE, 2023, pp. 1–4.
[20] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, M. Chen, Hierarchical text-conditional image generation with CLIP latents, arXiv preprint arXiv:2204.06125 (2022).
[21] D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, R. Rombach, SDXL: Improving latent diffusion models for high-resolution image synthesis, arXiv preprint arXiv:2307.01952 (2023).
[22] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[23] L. Capogrosso, F. Cunico, D. S. Cheng, F. Fummi, M. Cristani, A machine learning-oriented survey on tiny machine learning, IEEE Access (2024).
[24] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).