
                         Filming the sound: Anomaly Detection on Audio Tape
                         Recordings using Computer Vision Algorithms
                         Zafer Çınar1,† , Alessandro Russo1,† , Matteo Spanio1,∗,† , Niccolò Pretto2,† and
                         Sergio Canazza1,†
                         1
                           Centro di Sonologia Computazionale (CSC), Department of Information Engineering, University of Padua, Via Giovanni
                         Gradenigo, 6b, 35131, Padua, Italy
                         2
                           Media Interaction Lab, Faculty of Engineering, Free University of Bozen-Bolzano, Via Bruno Buozzi, 1, 39100, Bozen, Italy


                                     Abstract
                                     The preservation of open-reel audio tapes is critical for maintaining valuable cultural and historical audio
                                     archives, yet current digitisation and analysis operations are often error-prone due to tape degradation and the
                                     long duration of the recordings. Considering the analog nature of this kind of recording, anomaly detection
                                     algorithms, applied to the video of the tape flowing on the playback head, can be used to detect errors and details
                                     with musicological value. This paper presents a new dataset of high-quality videos and a new algorithm for
                                     anomaly detection on audio tapes. Experimental results show notable improvements in detection performance,
                                     though false positives remain a challenge at higher speeds. Additionally, the new algorithm supports a wider
                                     range of playback speeds, improving its flexibility. This improvement is an important step towards a reliable
                                     implementation of the IEEE/MPAI CAE ARP standard (3302-2022).

                                      Keywords
                                      open reel audio tapes, irregularities detection, computer vision, preservation, restoration




                         1. Introduction
                         Nowadays, audiovisual archives are increasingly facing the challenge of preserving their collections
                         from deterioration [1]. Digitization is a key solution, converting analog materials like photos, films,
                         videos, and audio recordings into digital formats to mitigate physical degradation. However, the
                         digitization process must be based on a scientific methodology to ensure minimal information loss.
                         The Centro di Sonologia Computazionale (CSC) of the University of Padua has been working in
                         audio document preservation for the last decade [2, 3], carrying out its research activity to develop a
                         preservation methodology for audio documents [4]. At CSC, digitization goes beyond simply migrating
                         the audio content; it includes gathering metadata and contextual information, such as photos and
                         video documentation. This approach became essential when working with archives of electronic music
                         composers like Luciano Berio and Luigi Nono, who left markings and notes on tapes, sometimes as
                         indications for live performances in which tapes were used almost like instruments on stage. Furthermore,
                         the video documentation captures important information about tape conditions and mechanical issues that
                         may affect playback, such as dirt, loss of magnetic paste or deformations. Since reviewing hours
                         of video can be time-consuming and prone to errors, artificial intelligence may assist archivists and

                         3rd Workshop on Artificial Intelligence for Cultural Heritage (AI4CH 2024, https://ai4ch.di.unito.it/ ), co-located with the 23rd
                         International Conference of the Italian Association for Artificial Intelligence (AIxIA 2024). 26-28 November 2024, Bolzano, Italy
                         ∗
                             Corresponding author.
                         †
                             These authors contributed equally.
                         zafer.cinar@studenti.unipd.it (Z. Çınar); alessandro.russo@dei.unipd.it (A. Russo); spanio@dei.unipd.it (M. Spanio);
                         niccolo.pretto@unibz.it (N. Pretto); sergio.canazza@unipd.it (S. Canazza)
                         https://matteospanio.github.io/ (M. Spanio);
                         https://www.unibz.it/it/faculties/engineering/academic-staff/person/47860-niccolo-pretto (N. Pretto);
                         https://www.dei.unipd.it/~canazza/ (S. Canazza)
                         ORCID: 0000-0001-6691-759X (A. Russo); 0000-0002-2436-7208 (M. Spanio); 0000-0002-3742-7150 (N. Pretto); 0000-0001-7083-4615
                         (S. Canazza)
                                     © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
researchers by automatically detecting points of interest on the surface of the tapes and, therefore,
the corresponding moment on the digitized audio recording. The methodology proposed by CSC has
been the core reference during the implementation of the IEEE/MPAI CAE ARP standard, approved
in December 2022¹. The implementation of a standard is often a long-term process that requires
continuous updates and refinements to accommodate evolving needs and technological advancements.
In the case of the IEEE/MPAI CAE ARP standard, the development of effective tools required several
iterations of improvements since its first version [5]. As new technologies emerge, software must be
re-evaluated and enhanced to address previous limitations, improve overall performance, and meet the
evolving expectations of both the archival and technical communities. This is the case for the Video
Analyzer, a component of the standard developed to detect anomalies (also referred to as
irregularities), such as splices, tape degradations or notes on the surface of the digitized tapes. This
module analyzes a video framing a close-up of the tape flowing in front of the magnetic head during the
digitization process. This paper contributes to this ongoing effort by presenting a series of significant
improvements to the anomaly detection process in open-reel audio tapes. The key contributions of this
work include:

       1. a new dataset of high-quality videos that can be used for testing new algorithms.
       2. a new detection algorithm that shows a strong improvement in comparison to the initial algorithm
          implemented in the Video Analyzer component.
       3. an extension of the supported playback speeds managed by the algorithm.

   These improvements result in a more robust and accurate system for identifying irregularities on
audio tapes, which will improve the reliability of the overall implementation of the IEEE/MPAI CAE ARP
standard and, therefore, foster correct preservation and reduce restoration and analysis efforts. The
next section provides an overview of the background and related works underlying this paper.
Section 3 provides more information about the standard. Then, the proposed algorithm is described
in Section 4. The experiment and dataset used to test the algorithm are reported in Section 5 along
with its performance, compared to the previous version. Finally, Section 6 concludes the article with a
discussion of the results and further opportunities for development.


2. Related work
Anomaly detection has been a long-standing yet active research area across various communities for several
decades [6]. Traditional methods often rely on machine learning models, such as convolutional neural
networks (CNNs) and support vector machines (SVMs), to recognize patterns in images or video frames
that deviate from normal behaviour. Studies on the application of deep learning models to anomaly
detection are still ongoing, with efforts concentrated on capturing high-dimensional representations
of normal data [7]. More recently, unsupervised approaches, including autoencoders and Generative
Adversarial Networks (GANs), have emerged as effective alternatives for detecting abnormalities,
particularly in cases where labeled data is limited [8, 9]. However, in this specific context, using deep
learning models poses several challenges. One major issue is the need for large-scale datasets, which
are difficult to obtain due to the time-consuming and error-prone nature of annotating frames for
tape anomalies. The lack of sufficient labeled data makes it impractical to train deep learning models
effectively. In the specific domain of anomaly detection in time-based media, like videos, frame-by-
frame comparisons are usually adopted to detect subtle visual differences [10]. The process typically
involves analyzing pixel-level changes and identifying unusual patterns such as signal damages or
inconsistencies. For example, background subtraction methods have been used in video surveillance to
detect anomalies, though they suffer from false positives when exposed to subtle variations like lighting
changes [11]. These techniques have also been adopted for other applications, such as medical image
analysis and industrial monitoring [12, 13].

1
    Link to the standard https://standards.ieee.org/ieee/3302/11006/ (Last Accessed: November 4, 2024)
2.1. Overview of the Video Analyzer
The Video Analyzer is one of the main components of the IEEE/MPAI CAE ARP standard (an overall
description of the standard will be provided in Section 3). This component implements an anomaly
detection algorithm on the video of the tape flowing on the playback head of the tape recorder. The frame
of each irregularity identified by the algorithm and related metadata are the output of this component.
The first version of the Video Analyzer featured an innovative method for detecting irregularities on
the surface of open-reel audio tapes by analyzing videos produced during the digitization process [14].
This approach enabled a detailed frame-by-frame inspection, allowing for the identification of physical
issues such as splices, scratches, and deformations. The system employed advanced computer vision
algorithms, notably the Generalized Hough Transform and Speeded Up Robust Features (SURF), to
identify regions of interest (ROIs) on the tapes and detect potential anomalies. The detection process
involved several key steps. First, the system focused on detecting ROIs in the section of the tape
beneath the reading head, using fixed elements like the pinch roller, a rotating rubber wheel which
accompanies and controls the movement of the tape, as reference points to ensure consistency across
frames. In the next phase, consecutive video frames were compared to identify significant differences.
This was done through a pixel-level analysis, where changes in pixel intensity were monitored. If the
number of differing pixels between two frames exceeded a set threshold, the system flagged the frame as
containing an anomaly. The system generated a difference image for each pair of frames, representing
potential irregularities in the tape. After the anomaly detection phase, the identified irregularities were
not classified immediately. Instead, the output from this process was passed to the Tape Irregularity
Classifier, a module within the IEEE/MPAI CAE ARP standard. At this stage, a convolutional neural
network was used to categorize the detected irregularities, identifying specific issues such as splices,
dirt, or other forms of damage on the tape’s surface. One of the primary limitations of the original
methodology was its reliance on the PAL (720x576) video format, which was the standard for most of the
archived video recordings. The use of PAL video introduced several challenges. For instance, the low
resolution and interlaced nature of PAL video (25 interlaced frames per second) affected the accuracy
of anomaly detection, particularly during the image classification stage. The misalignment between
odd and even lines due to interlacing reduced the precision of the convolutional neural network in
identifying anomalies, as it disrupted the visual clarity required to detect subtle tape irregularities [14].
To overcome these issues, high-definition (HD) video is necessary. By increasing the video resolution
up to Full HD (1920x1080) and frame rate to 50 fps (progressive), the enhanced method aims at reducing
motion blur by using a shutter speed of 1/100 of a second and improving the detail captured in each
frame. This allows for a more precise identification of anomalies, such as small scratches or splices
that may go unnoticed in lower-quality videos. This improvement also allows for an overall better
performance of machine learning models, which can leverage the additional details to detect more
complex irregularities. Thus, the transition to FHD video is crucial to refining the anomaly detection
process and achieving more accurate and reliable results.


3. IEEE/MPAI CAE ARP
MPAI is an international, independent, non-profit organization dedicated to developing standards for AI-
based data coding. The MPAI Context-based Audio Enhancement (MPAI-CAE) standard aims to enhance the
user experience in audio-related applications such as entertainment, communication, teleconferencing,
gaming, post-production, and restoration. The MPAI-CAE international standard was approved in May
2022 and subsequently adopted by the IEEE Standards Association as 3302-2022 in December of the
same year. The presented work falls within the Audio Recording Preservation (ARP) use case, a part
of the MPAI-CAE standard. The IEEE/MPAI CAE ARP standard provides precise software references
for audio document preservation. Its technical specifications adopt the preservation methodology
developed at the CSC, incorporating AI-based computational tools to extract information from digitized
audio/video of analog open reel tapes. The use of AI enables the automatic detection of irregularities
on the surface of the tape, improving precision and speed in selecting and extracting irregularities,
such as splices, marks, and loss of magnetic paste. The technical architecture of the standard includes
five modules that target and process different digital inputs: the Audio Analyzer, the Video Analyzer,
the Tape Irregularity Classifier, the Tape Audio Restoration, and the Packager. The initial version of
the Video Analyzer was implemented by training the system using a dataset of videos created by the
CSC during numerous digitization projects. These videos were recorded during the A/D transfer of the
signal to provide a visual record of magnetic tapes, most of which were preparatory tapes containing
electronic music recordings. Documenting the presence of splices, different tape segments, possible
alterations, annotations, and marks added by the composer can be extremely useful for reconstructing
the philological history of a given work as well as detecting parts that could require an audio restoration
[15]. Moreover, the video provides valuable information regarding the preservation conditions of audio
documents, making it possible to keep track of splices, marks, and other surface irregularities.


4. Algorithm Description
The proposed method maintains the same framework as the existing approach, where consecutive frame
pairs are compared as the video plays. The logic behind this method is to treat the problem as motion
detection. While there is constant movement in the Region of Interest (ROI), areas without irregularities
appear static, like still images. This allows for the use of a traditional motion detection algorithm,
with irregularities seen as moving objects and regular regions as the background. The modification
introduced in this work is based on frame differencing with additional filtering steps. This approach is
chosen to improve the accuracy of detecting irregularities by focusing on meaningful changes between
consecutive frames while reducing the impact of noise and irrelevant variations. By adding filtering,
the method improves at distinguishing real irregularities from minor fluctuations that don’t indicate
actual issues. The flowchart in Figure 1 shows the frame differencing process with additional steps that
incorporate Otsu’s method [16] for thresholding and determining irregularities.




Figure 1: Flowchart of the frame differencing method with additional filtering.


   The first step in this method is identifying the ROI, following established techniques: the reading
head is detected using the Generalized Hough Transform, while Speeded-Up Robust Features (SURF)
is used to locate the position of the pinch roller. In this implementation, the ROI
of the tape area is reduced by half to decrease the computational load and improve accuracy. While
this method reduces computational overhead, it may increase the number of irregularities detected
in the case of problems that extend over long portions of the tape (for example, writing on the tape
or a large scratch) or tapes running at very low speeds. While this solution creates a certain amount
of “noise”, later modules of the standard are able to handle the extra frames, and this approach also
reduces information loss. This step helps the method focus only on the relevant parts of the video,
                         (a) Original ROI                          (b) Reduced ROI
Figure 2: Comparison of original and new Regions of Interest (ROIs). The green rectangles highlight the ROIs
while red rectangles highlight the reading heads.


making it more efficient and accurate. Figure 2b shows the reduced ROI.
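As a minimal illustration of this step, the crop below assumes the head/pinch-roller detection has already produced a bounding rectangle; the `(x, y, w, h)` tuple and the choice to halve the ROI along the tape's direction of travel are assumptions of this sketch, since the exact geometry is not spelled out above.

```python
import numpy as np

def crop_reduced_roi(frame: np.ndarray, roi: tuple) -> np.ndarray:
    """Crop the tape ROI from a frame, keeping only half of it.

    `roi` is a hypothetical (x, y, width, height) rectangle assumed to come
    from the reading-head/pinch-roller detection; halving along the width
    (the tape's direction of travel) is an assumption of this sketch.
    """
    x, y, w, h = roi
    return frame[y:y + h, x:x + w // 2]
```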
   Next, the method calculates the absolute difference between pairs of consecutive frames to create a
difference image. This image shows the intensity variations between frames, with significant changes
indicating possible irregularities. By focusing on these intensity differences instead of just color changes,
the method can better detect motion and irregularities, ensuring that subtle but important differences
are not missed. This approach treats differences more selectively, unlike the previous method, which
treated all differences the same. Figure 3c shows the result of the absolute difference between the two
consecutive frames Figure 3a and Figure 3b.
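The frame-differencing step can be sketched in a few lines; this example assumes 8-bit grayscale frames and is only a stand-in for whatever routine the authors actually used:

```python
import numpy as np

def difference_image(prev_frame: np.ndarray, curr_frame: np.ndarray) -> np.ndarray:
    """Absolute per-pixel difference between two 8-bit grayscale frames.

    Frames are promoted to a signed type first so the subtraction cannot
    wrap around (uint8 arithmetic is modular).
    """
    diff = np.abs(prev_frame.astype(np.int16) - curr_frame.astype(np.int16))
    return diff.astype(np.uint8)
```

For 8-bit inputs this is equivalent to OpenCV's `cv2.absdiff(prev_frame, curr_frame)`.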




         (a) Previous frame                  (b) Current frame                 (c) Difference image

Figure 3: Comparison of previous and current frames, with the calculated difference image.


   After creating the difference image, the standard deviation is calculated to decide if a frame needs
further evaluation. For a frame to be evaluated, the standard deviation must exceed the threshold
referred to as deviation limit in Figure 1. Frames with a standard deviation below this limit are considered
regular and are excluded from further processing. This step reduces the computational load by ensuring
that only frames with significant intensity variations are processed. The deviation limit varies based on
the tape speed, with values set as follows: 2.25 for 30 inches per second (ips), 2.5 for 15 ips, 2.6 for 7.5
ips, and 2.75 for 3.75 ips. This differentiation accounts for how intensity changes with tape speed,
as higher speeds result in greater motion blur and reduced intensity. For frames that pass the standard
deviation check, Otsu’s method is used to find the right threshold for binarizing the difference image.
Depending on the result, either a global or Otsu’s threshold is selected to create a binary motion image.
To improve the accuracy of detecting irregularities, upper and lower limits are set on the threshold
values. The upper limit is 15, preventing the threshold from being so high that portions of an
irregularity are filtered out and missed. The lower limit is 5, preventing the threshold from being
so low that insignificant portions of the frame are highlighted, causing false positives.
This approach adapts the thresholding process for each frame and makes irregularity detection more
robust and reliable. Once the threshold is set, it is applied to the difference image to create a binary
motion image. This image shows potential irregularities as white pixels on a black background, making
it easier to identify significant differences between frames. The binary motion image serves as a key
intermediate step. Figure 4a shows the binarized image obtained by thresholding the difference image
in Figure 3c. To further improve the binary motion image, an opening operation is applied with a
3x3 kernel. This process, which involves erosion followed by dilation, removes small, irrelevant artifacts
caused by tape vibration or other external factors while keeping the main irregularities intact. This step
improves the clarity of potential irregularities, making them easier to detect and analyze in the final
evaluation phase. Finally, the method counts the number of white pixels in the processed image and
compares it to a set threshold of 5% of the total pixels to determine if the frame contains an irregularity.
If the count exceeds this threshold, the frame is marked as irregular; otherwise, it is classified as regular.
Figure 4b shows the image after morphological opening. Figure 5 and Figure 6 illustrate the processes
applied to an annotation and a shadow, respectively.
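Putting the thresholding steps together, a minimal sketch of the per-frame decision might look as follows. The deviation limits and the 5/15 threshold bounds are the values quoted above; the NumPy re-implementation of Otsu's method and the SciPy call for the 3x3 opening are assumptions standing in for whatever library routines the actual implementation uses.

```python
import numpy as np
from scipy import ndimage

# Per-speed standard-deviation gates, as reported in the text (ips -> limit).
DEVIATION_LIMITS = {30.0: 2.25, 15.0: 2.5, 7.5: 2.6, 3.75: 2.75}

def otsu_threshold(img: np.ndarray) -> int:
    """Otsu's method: pick the threshold maximising between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    probs = hist / img.size
    cum_w = np.cumsum(probs)                   # class-0 weight up to t
    cum_m = np.cumsum(probs * np.arange(256))  # class-0 cumulative mean mass
    best_t, best_var = 0, 0.0
    for t in range(256):
        w0, w1 = cum_w[t], 1.0 - cum_w[t]
        if w0 == 0.0 or w1 == 0.0:
            continue
        m0 = cum_m[t] / w0
        m1 = (cum_m[-1] - cum_m[t]) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def is_irregular(diff: np.ndarray, speed_ips: float,
                 t_low: int = 5, t_high: int = 15,
                 pixel_ratio: float = 0.05) -> bool:
    """Decide whether a difference image contains an irregularity."""
    # 1. Standard-deviation gate: skip visually static frame pairs.
    if diff.std() <= DEVIATION_LIMITS[speed_ips]:
        return False
    # 2. Otsu threshold, clamped to the [5, 15] range described above.
    t = min(max(otsu_threshold(diff), t_low), t_high)
    motion = diff > t
    # 3. A 3x3 morphological opening removes small vibration artifacts.
    motion = ndimage.binary_opening(motion, structure=np.ones((3, 3)))
    # 4. Flag the frame if white pixels exceed 5% of the image.
    return bool(motion.mean() > pixel_ratio)
```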




                        (a) Binarized image                     (b) Image after opening

Figure 4: Progression from the difference image to thresholding, followed by morphological opening.




                          (a) Current frame                     (b) Difference image




                         (c) Binarized image                   (d) Image after opening

Figure 5: Progression from the current frame to final image for an annotation.




                          (a) Current frame                     (b) Difference image




                         (c) Binarized image                   (d) Image after opening

Figure 6: Progression from the current frame to final image for a shadow.


  A further improvement concerns the quality of the videos provided as input to the algorithm. The
analysis was previously conducted on videos in PAL format at 25 fps interlaced with a resolution of
720x576. The new algorithm works on high-definition videos with 50 fps (progressive), a fixed shutter
speed of 1/100 of a second and a resolution of 1920x1080. This enhancement in video quality allows
for more precise detection of irregularities, as finer details and smaller anomalies can be captured
and analyzed more effectively. Since the video documentation began before the development of this
project, many videos were of lower quality and interlaced. One approach to handle the misalignment
caused by interlacing was to separate the even and odd fields, which mitigated the misalignment but
reduced the resolution to 720x288. To make effective use of these existing resources, the new method
provides a solution by employing interpolation-based deinterlacing. This approach ensures that the
captured irregularities are not limited to the reduced resolution of 720x288 but are instead maintained
at the original full resolution. By preserving the original dimensions, the method aims to enhance the
accuracy of irregularity detection and also addresses the challenge of misaligned lines, which could
impact classification in the future. Figure 7a shows a frame from the original interlaced video and
Figure 7b shows the resulting image of deinterlacing.
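A hedged sketch of interpolation-based deinterlacing: keep one field and rebuild the missing scan lines by linear interpolation. The choice of field and of a linear kernel are assumptions of this sketch (the text does not specify either); resizing the kept field back to full height with OpenCV's `cv2.resize` and `INTER_LINEAR` would achieve the same effect.

```python
import numpy as np

def deinterlace(frame: np.ndarray, keep_even: bool = True) -> np.ndarray:
    """Rebuild a full-height progressive frame from one field of an
    interlaced frame by linear interpolation between the kept scan lines."""
    field = (frame[0::2] if keep_even else frame[1::2]).astype(np.float32)
    n, full_h = field.shape[0], frame.shape[0]
    # Map every output row to a fractional position among the kept lines.
    pos = np.linspace(0.0, n - 1.0, full_h)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    # Reshape the weights so they broadcast over 2-D (gray) or 3-D (color).
    frac = (pos - lo).reshape((-1,) + (1,) * (frame.ndim - 1))
    out = field[lo] * (1.0 - frac) + field[hi] * frac
    return np.clip(out, 0, 255).astype(frame.dtype)
```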




                   (a) Original interlaced frame               (b) Deinterlaced frame

Figure 7: Comparison between an interlaced frame with misaligned lines and the deinterlaced frame.


  In this novel version, the speed options have been expanded. The previous version of the Video
Analyzer was specifically calibrated for detecting the anomalies on tapes recorded at 7.5 ips and 15
ips. However, the new updated version also supports lower and higher speeds (3.75 ips and 30 ips).
This change makes the method more flexible and adaptable, allowing it to handle a wider range of tape
playback scenarios.


5. Experiment and evaluation
To evaluate the performance of the improved anomaly detection method, an experiment was conducted
using videos of open-reel tapes featuring various manually annotated irregularities (mainly splices, as
they are the most common type). The test aimed to compare the improved detection method against
the original one across different playback speeds, providing a comprehensive analysis of precision and
recall. The videos were captured at four distinct speeds: 3.75 ips, 7.5 ips, 15 ips, and 30 ips, with tape
durations varying accordingly—10 minutes for 3.75 ips, 34 minutes for 7.5 ips, 10 minutes for 15 ips, and
9 minutes for 30 ips. At 3.75 ips, there were 14 splices present. At 7.5 ips, the tape had 66 splices, along
with 4 annotations, 3 shadows, and 3 end-of-tape markers (a full description of these kinds of irregularities
can be found in [4]). For 15 ips, the tape featured 55 splices and 1 shadow. At 30 ips, the tape had 93
splices, 2 end-of-tape markers and 2 annotations. For the sake of comparison completeness, the authors
also modified the previous version of the algorithm to support two additional speeds. The experiment
focused on two key metrics: precision (the ability to correctly identify anomalies without generating
false positives) and recall (the ability to detect all present anomalies). Both metrics were measured for
each playback speed to compare the results of the old and new methods. The results, summarized in
Table 1, show a notable difference between the two approaches. The new method successfully detected
all irregularities at every speed but produced some false positives, while the old method demonstrated
inconsistent performance, particularly struggling at lower speeds.
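For reference, both metrics reduce to simple counts of true positives (TP), false positives (FP), and false negatives (FN). The example counts below are illustrative, chosen to be consistent with the 7.5 ips row of Table 1 (76 annotated irregularities, all found, plus 6 false positives).

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative counts consistent with the 7.5 ips row of Table 1:
# 76 true irregularities all detected, plus 6 false positives.
p, r = precision_recall(tp=76, fp=6, fn=0)
print(round(p, 4), r)  # -> 0.9268 1.0
```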
   The new method detected all irregularities across the given speeds. However, in some cases, it
generated some false positives. This was especially noticeable at 30 ips, where the motion blur caused
by the higher speed required an increment of the sensitivity value for detecting the frame differences,
Table 1
Evaluation for different speed options
                     Tape Speed      Precision - New     Recall - New      Precision - Old     Recall - Old
                      3.75ips             0.9333             1.0000             0.0000            0.0000
                       7.5ips             0.9268             1.0000             0.4156            0.8421
                        15ips             0.9655             1.0000             0.9643            0.4821
                        30ips             0.8818             1.0000             0.9655            0.8660


which led to more false positives in the presence of lighting changes and vibrations. It should be
noted that some of the false positives are duplicates of the same irregularities. The method was designed
to ensure that closely occurring irregularities are not missed, which sometimes results in duplicates. In
future releases, the method can be improved by merging consecutive irregularities into one after the
classification. This change could help in reducing false positives and enhancing the overall detection
precision. The old method faced significant challenges, specifically at 3.75 ips, where it failed to detect
any irregularity. This shortcoming is likely due to its reliance on detecting large pixel differences
between frames, which are less noticeable at slower speeds. Consequently, the old method performed
better at the highest speed (30 ips), where pixel variations are more pronounced, although its recall
remained below that of the new method. Additionally, due to its reliance
on pixel count, the old method struggled to detect irregularities other than splices. This limitation
arose because irregularities such as shadows and annotations often affect only a smaller area of the
tape, making them less noticeable in comparison with splices. The new method addresses these issues
more effectively: by focusing on intensity differences between frames, and using filtering techniques, it
provides a more accurate detection. As a result, the overall performance of the novel method is better
than the previous one, providing a more accurate and consistent irregularity detection across different
speeds and various video conditions.


6. Conclusions
This work introduced significant improvements in the automatic detection of superficial irregularities on
open-reel audio tapes using computer vision techniques. The key contributions include the development
of a new dataset of high-quality videos², an enhanced detection algorithm with improved accuracy, and
an expanded range of supported playback speeds. Despite these advancements, the system still faces
certain limitations. The increased sensitivity of the new algorithm, especially at higher playback speeds, led
to false positives, particularly due to lighting changes and vibrations. Additionally, double detections of the
same irregularities highlighted the need for further refinements, such as post-processing techniques that
can merge closely occurring irregularities into a single detection. Future works will focus on addressing
these limitations. Possible improvements will include refining the filtering process to reduce false
positives and optimizing the classification module to handle more complex irregularities. Furthermore,
expanding the algorithm’s capabilities to work with other tape formats and video resolutions could
enhance its applicability to a broader range of archival materials, supporting better preservation and
restoration efforts in audiovisual archives.


Acknowledgments
This work is partially supported by the SYCURI Project, funded by the University of Padova in the
Program “World Class Research Infrastructure”.


2
    The dataset presented in this article is available on Zenodo. The assigned DOI is 10.5281/zenodo.14028922. Please refer to
    this repository to access data and additional details.
References
 [1] F. Rumsey, Will you be mine forever? Audio archiving, multitracks, and 90s digital, Journal of the
     Audio Engineering Society 68 (2020) 304–307. URL: https://aes2.org/publications/elibrary-page/
     ?id=20736.
 [2] S. Canazza, G. De Poli, Four decades of music research, creation, and education at Padua’s Centro di
     Sonologia Computazionale, Computer Music Journal 43 (2020) 58–80. doi:10.1162/comj_a_00537 .
 [3] S. Canazza, G. De Poli, A. Vidolin, Gesture, music and computer: The Centro di Sonologia Com-
     putazionale at Padova University, a 50-year history, Sensors 22 (2022). doi:10.3390/s22093465 .
 [4] N. Pretto, C. Fantozzi, E. Micheloni, V. Burini, S. Canazza, Computing methodologies supporting
     the preservation of electroacoustic music from analog magnetic tape, Computer Music Journal 42
     (2018) 59 – 74. doi:10.1162/comj_a_00487 .
 [5] M. Bosi, N. Pretto, M. Guarise, S. Canazza, Sound and music computing using AI: Designing a
     standard, in: Proceedings of the 18th Sound and Music Computing Conference, Virtual, 2021, p.
     215 – 218. doi:10.5281/zenodo.5045003 .
 [6] G. Pang, C. Shen, L. Cao, A. V. D. Hengel, Deep learning for anomaly detection: A review, ACM
     Computing Surveys 54 (2021). doi:10.1145/3439950 .
 [7] R. Chalapathy, S. Chawla, Deep learning for anomaly detection: A survey, ArXiv abs/1901.03407
     (2019). URL: https://api.semanticscholar.org/CorpusID:57825713.
 [8] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, G. Langs, Unsupervised anomaly
     detection with generative adversarial networks to guide marker discovery, in: M. Niethammer,
     M. Styner, S. Aylward, H. Zhu, I. Oguz, P.-T. Yap, D. Shen (Eds.), Information Processing in
     Medical Imaging, Springer International Publishing, Cham, 2017, p. 146 – 157. doi:10.1007/
     978- 3- 319- 59050- 9_12 .
 [9] X. Xia, X. Pan, N. Li, X. He, L. Ma, X. Zhang, N. Ding, GAN-based anomaly detection: A review,
     Neurocomputing 493 (2022) 497–535. doi:10.1016/j.neucom.2021.12.093 .
[10] W. Sultani, C. Chen, M. Shah, Real-world anomaly detection in surveillance videos, in: 2018
     IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 6479–6488. doi:10.
     1109/CVPR.2018.00678 .
[11] S. Qasim, K. N. Khan, M. Yu, M. S. Khan, Performance evaluation of background subtraction
     techniques for video frames, in: 2021 International Conference on Artificial Intelligence (ICAI),
     2021, pp. 102–107. doi:10.1109/ICAI52203.2021.9445253 .
[12] L. Nans, C. Mediavilla, D. Marez, S. Parameswaran, Leveraging motion saliency via frame differenc-
     ing for enhanced object detection in videos, in: M. S. Alam, V. K. Asari (Eds.), Pattern Recognition
     and Tracking XXXIV, volume 12527, International Society for Optics and Photonics, SPIE, 2023, p.
     125270V. doi:10.1117/12.2678373 .
[13] A. Wang, Y. Ji, M. Chen, Y. Liu, Z. Li, S. Yan, Moving object recognition on production line based
     on adaptive frame differencing algorithm, in: 2024 36th Chinese Control and Decision Conference
     (CCDC), 2024, pp. 966–971. doi:10.1109/CCDC62350.2024.10587928 .
[14] A. Russo, M. Spanio, S. Canazza, Enhancing preservation and restoration of open reel audio
     tapes through computer vision, in: G. L. Foresti, A. Fusiello, E. Hancock (Eds.), Image Analysis
     and Processing - ICIAP 2023 Workshops, Springer Nature Switzerland, Cham, 2024, pp. 297–308.
     doi:10.1007/978- 3- 031- 51026- 7_26 .
[15] N. Pretto, E. Micheloni, A. Chmiel, N. D. Pozza, D. Marinello, E. Schubert, S. Canazza, Multimedia
     archives: New digital filters to correct equalization errors on digitized audio tapes, Advances in
     Multimedia (2021). doi:10.1155/2021/5410218 .
[16] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems,
     Man, and Cybernetics 9 (1979) 62 – 66. doi:10.1109/TSMC.1979.4310076 .