<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Filming the sound: Anomaly Detection on Audio Tape Recordings using Computer Vision Algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergio Canazza</string-name>
          <email>sergio.canazza@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro di Sonologia Computazionale (CSC), Department of Information Engineering, University of Padua</institution>
          ,
          <addr-line>Via Giovanni Gradenigo, 6b, 35131, Padua</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Centro di Sonologia Computazionale (CSC), Department of Information Engineering, University of Padua</institution>
          ,
          <addr-line>Via Giovanni Gradenigo, 6b, 35131, Padua</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Media Interaction Lab, Faculty of Engineering, Free University of Bozen-Bolzano</institution>
          ,
          <addr-line>Via Bruno Buozzi, 1, 39100, Bozen</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>The preservation of open-reel audio tapes is critical for maintaining valuable cultural and historical audio archives, yet current digitization and analysis operations are often error-prone due to tape degradation and the long duration of the recordings. Considering the analog nature of this kind of recording, anomaly detection algorithms, applied to the video of the tape flowing on the playback head, can be used to detect errors and details with musicological value. This paper presents a new dataset of high-quality videos and a new algorithm for anomaly detection on audio tapes. Experimental results show notable improvements in detection performance, though false positives remain a challenge at higher speeds. Additionally, the new algorithm supports a wider range of playback speeds, improving its flexibility. This improvement is an important step towards a reliable implementation of the IEEE/MPAI CAE ARP standard.</p>
        <p>Proceedings of the 3rd Workshop on Artificial Intelligence for Cultural Heritage (AI4CH 2024, https://ai4ch.di.unito.it/), co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2024), 26-28 November 2024, Bolzano, Italy. ORCID: 0000-0001-6691-759X (A. Russo); 0000-0002-2436-7208 (M. Spanio); 0000-0002-3742-7150 (N. Pretto); 0000-0001-7083-4615 (S. Canazza).</p>
      </abstract>
      <kwd-group>
        <kwd>Algorithms</kwd>
        <kwd>open reel audio tapes</kwd>
        <kwd>irregularities detection</kwd>
        <kwd>computer vision</kwd>
        <kwd>preservation</kwd>
        <kwd>restoration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Nowadays, audiovisual archives are increasingly facing the challenge of preserving their collections
from deterioration [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Digitization is a key solution, converting analog materials like photos, films,
videos, and audio recordings into digital formats to mitigate physical degradation. However, the
digitization process must be based on a scientific methodology to ensure minimal information loss.
The Centro di Sonologia Computazionale (CSC) of the University of Padua has been working in
audio document preservation for the last decade [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], carrying out its research activity to develop a
preservation methodology for audio documents [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. At CSC, digitization goes beyond simply migrating
the audio content; it includes gathering metadata and contextual information, such as photos and
video documentation. This approach became essential when working with archives of electronic music
composers like Luciano Berio and Luigi Nono, who left markings and notes on tapes, also as indications
for live performances, when tapes were used almost like instruments on stage. Furthermore, the
video documentation captures important information about tape conditions and mechanical issues that
may affect its playback, such as dirt, loss of magnetic paste, or deformations. Since reviewing hours
of video can be time-consuming and prone to errors, artificial intelligence may assist archivists and
researchers by automatically detecting points of interest on the surface of the tapes and, therefore,
the corresponding moment on the digitized audio recording. The methodology proposed by CSC has
been the core reference during the implementation of the IEEE/MPAI CAE ARP standard, approved
in December 2022¹. The implementation of a standard is often a long-term process that requires
continuous updates and refinements to accommodate evolving needs and technological advancements.
In the case of the IEEE/MPAI CAE ARP standard, the development of effective tools required several
iterations of improvements since its first version [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. As new technologies emerge, software must be
re-evaluated and enhanced to address previous limitations, improve overall performance, and meet the
evolving expectations of both the archival and technical communities. This is the case of the Video
Analyzer, a component described in the standard and developed to detect anomalies (also referred to as
irregularities), such as splices, tape degradations or notes on the surface of the digitized tapes. This
module analyzes a video framing a close-up of the tape flowing in front of the magnetic head during the
digitization process. This paper contributes to this ongoing effort by presenting a series of significant
improvements to the anomaly detection process in open-reel audio tapes. The key contributions of this
work include:
1. a new dataset of high-quality videos that can be used for testing new algorithms;
2. a new detection algorithm that shows a strong improvement in comparison to the initial algorithm
implemented in the Video Analyzer component;
3. an extension of the supported playback speeds managed by the algorithm.
      </p>
      <p>These improvements result in a more robust and accurate system for identifying irregularities on
audio tapes, which will improve the reliability of the overall implementation of the IEEE/MPAI CAE ARP
standard and, therefore, foster correct preservation and reduce restoration and analysis efforts. The
next section provides an overall view of the background and related works at the base of this paper.
Section 3 provides more information about the standard. Then, the proposed algorithm is described
in Section 4. The experiment and dataset used to test the algorithm are reported in Section 5 along
with its performance, compared to the previous version. Finally, Section 6 concludes the article with a
discussion of the results and further opportunities for development.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Anomaly detection has been a long-standing yet active research area in various research communities for several
decades [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Traditional methods often rely on machine learning models, such as convolutional neural
networks (CNNs) and support vector machines (SVMs), to recognize patterns in images or video frames
that deviate from normal behaviour. Studies on the application of deep learning models to anomaly
detection are still ongoing, with efforts concentrated on capturing high-dimensional representations
of normal data [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. More recently, unsupervised approaches, including autoencoders and Generative
Adversarial Networks (GANs), have emerged as effective alternatives for detecting abnormalities,
particularly in cases where labeled data is limited [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. However, in this specific context, using deep
learning models poses several challenges. One major issue is the need for large-scale datasets, which
are difficult to obtain due to the time-consuming and error-prone nature of annotating frames for
tape anomalies. The lack of sufficient labeled data makes it impractical to train deep learning models
effectively. In the specific domain of anomaly detection in time-based media, like videos, frame-by-frame
comparisons are usually adopted to detect subtle visual differences [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The process typically
involves analyzing pixel-level changes and identifying unusual patterns such as signal damage or
inconsistencies. For example, background subtraction methods have been used in video surveillance to
detect anomalies, though they suffer from false positives when exposed to subtle variations like lighting
changes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. These techniques have also been adopted for other applications, such as medical image
analysis and industrial monitoring [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ].
¹ Link to the standard: https://standards.ieee.org/ieee/3302/11006/ (Last Accessed: November 4, 2024).
</p>
<p>
<bold>2.1. Overview of the Video Analyzer</bold>
</p>
<p>
The Video Analyzer is one of the main components of the IEEE/MPAI CAE ARP standard (an overall
description of the standard will be provided in Section 3). This component implements an anomaly
detection algorithm on the video of the tape flowing on the playback head of the tape recorder. The frame
of each irregularity identified by the algorithm and related metadata are the output of this component.
The first version of the Video Analyzer featured an innovative method for detecting irregularities on
the surface of open-reel audio tapes by analyzing videos produced during the digitization process [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
This approach enabled a detailed frame-by-frame inspection, allowing for the identification of physical
issues such as splices, scratches, and deformations. The system employed advanced computer vision
algorithms, notably the Generalized Hough Transform and Speeded Up Robust Features (SURF), to
identify regions of interest (ROIs) on the tapes and detect potential anomalies. The detection process
involved several key steps. First, the system focused on detecting ROIs in the section of the tape
beneath the reading head, using fixed elements like the pinch roller, a rotating rubber wheel which
accompanies and controls the movement of the tape, as reference points to ensure consistency across
frames. In the next phase, consecutive video frames were compared to identify significant differences.
This was done through a pixel-level analysis, where changes in pixel intensity were monitored. If the
number of differing pixels between two frames exceeded a set threshold, the system flagged the frame as
containing an anomaly. The system generated a difference image for each pair of frames, representing
potential irregularities in the tape. After the anomaly detection phase, the identified irregularities were
not classified immediately. Instead, the output from this process was passed to the Tape Irregularity
Classifier, a module within the IEEE/MPAI CAE ARP standard. At this stage, a convolutional neural
network was used to categorize the detected irregularities, identifying specific issues such as splices,
dirt, or other forms of damage on the tape’s surface. One of the primary limitations of the original
methodology was its reliance on the PAL (720x576) video format, which was the standard for most of the
archived video recordings. The use of PAL video introduced several challenges. For instance, the low
resolution and interlaced nature of PAL video (25 interlaced frames per second) affected the accuracy
of anomaly detection, particularly during the image classification stage. The misalignment between
odd and even lines due to interlacing reduced the precision of the convolutional neural network in
identifying anomalies, as it disrupted the visual clarity required to detect subtle tape irregularities [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
To overcome these issues, high-definition (HD) video is necessary. By increasing the video resolution
up to Full HD (1920x1080) and frame rate to 50 fps (progressive), the enhanced method aims at reducing
motion blur by using a shutter speed of 1/100 of a second and improving the detail captured in each
frame. This allows for a more precise identification of anomalies, such as small scratches or splices
that may go unnoticed in lower-quality videos. This improvement also allows for an overall better
performance of machine learning models, which can leverage the additional details to detect more
complex irregularities. Thus, the transition to FHD video is crucial to refining the anomaly detection
process and achieving more accurate and reliable results.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. IEEE/MPAI CAE ARP</title>
      <p>
        MPAI is an international, independent, non-profit organization dedicated to developing standards for
AI-based data coding. The MPAI Context-based Audio Enhancement (MPAI-CAE) standard aims to enhance the
user experience in audio-related applications such as entertainment, communication, teleconferencing,
gaming, post-production, and restoration. The MPAI-CAE international standard was approved in May
2022 and subsequently adopted by the IEEE Standards Association as 3302-2022 in December of the
same year. The presented work falls within the Audio Recording Preservation (ARP) use case, a part
of the MPAI-CAE standard. The IEEE/MPAI CAE ARP standard provides precise software references
for audio document preservation. Its technical specifications adopt the preservation methodology
developed at the CSC, incorporating AI-based computational tools to extract information from digitized
audio/video of analog open reel tapes. The use of AI enables the automatic detection of irregularities
on the surface of the tape, improving precision and speed in selecting and extracting irregularities,
such as splices, marks, and loss of magnetic paste. The technical architecture of the standard includes
five modules that target and process different digital inputs: the Audio Analyzer, the Video Analyzer,
the Tape Irregularity Classifier, the Tape Audio Restoration, and the Packager. The initial version of
the Video Analyzer was implemented by training the system using a dataset of videos created by the
CSC during numerous digitization projects. These videos were recorded during the A/D transfer of the
signal to provide a visual record of magnetic tapes, most of which were preparatory tapes containing
electronic music recordings. Documenting the presence of splices, different tape segments, possible
alterations, annotations, and marks added by the composer can be extremely useful for reconstructing
the philological history of a given work as well as detecting parts that could require an audio restoration
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Moreover, the video provides valuable information regarding the preservation conditions of audio
documents, making it possible to keep track of splices, marks, and other surface irregularities.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Algorithm Description</title>
      <p>
        The proposed method maintains the same framework as the existing approach, where consecutive frame
pairs are compared as the video plays. The logic behind this method is to treat the problem as motion
detection: although the tape moves constantly through the Region of Interest (ROI), areas without irregularities
appear static, like still images. This allows for the use of a traditional motion detection algorithm,
with irregularities seen as moving objects and regular regions as the background. The modification
introduced in this work is based on frame differencing with additional filtering steps. This approach is
chosen to improve the accuracy of detecting irregularities by focusing on meaningful changes between
consecutive frames while reducing the impact of noise and irrelevant variations. By adding filtering,
the method improves at distinguishing real irregularities from minor fluctuations that do not indicate
actual issues. The flowchart in Figure 1 shows the frame differencing process with additional steps that
incorporate Otsu’s method [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for thresholding and determining irregularities.
      </p>
      <p>The first step in this method is identifying the ROI, based on the latest techniques: the detection
of the reading head is carried out using the Generalized Hough Transform, while Speeded-Up Robust
Features (SURF) is used for detecting the position of the pinch roller. In this implementation, the ROI
of the tape area is reduced by half to decrease the computational load and improve accuracy. While
this method reduces computational overhead, it may increase the number of irregularities detected
in the case of problems that extend over long portions of the tape (for example, writing on the tape
or a large scratch) or tapes running at very low speeds. While this solution creates a certain amount
of “noise”, later modules of the standard are able to handle the extra frames, and this approach also
reduces information loss. This step helps the method focus only on the relevant parts of the video,
making it more efficient and accurate. Figure 2 shows the original ROI (a) and the reduced ROI (b).</p>
      <p>Next, the method calculates the absolute difference between pairs of consecutive frames to create a
difference image. This image shows the intensity variations between frames, with significant changes
indicating possible irregularities. By focusing on these intensity differences instead of just color changes,
the method can better detect motion and irregularities, ensuring that subtle but important differences
are not missed. This approach treats differences more selectively, unlike the previous method, which
treated all differences the same. Figure 3c shows the result of the absolute difference between the two
consecutive frames in Figure 3a and Figure 3b.</p>
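The absolute-difference step can be sketched in plain NumPy; this is an illustrative snippet, not the standard's reference software, and it assumes the frames have already been converted to grayscale uint8 arrays:

```python
import numpy as np

def difference_image(prev_gray: np.ndarray, curr_gray: np.ndarray) -> np.ndarray:
    """Absolute per-pixel intensity difference between two consecutive grayscale frames.

    Widening to int16 avoids uint8 wrap-around when the current frame is darker
    than the previous one.
    """
    return np.abs(prev_gray.astype(np.int16) - curr_gray.astype(np.int16)).astype(np.uint8)

# Tiny illustrative frames (real frames would be 1920x1080 grayscale).
prev = np.array([[10, 10], [200, 200]], dtype=np.uint8)
curr = np.array([[12, 10], [180, 255]], dtype=np.uint8)
print(difference_image(prev, curr))  # [[ 2  0] [20 55]]
```

Working on intensity differences rather than raw color channels is what lets the later thresholding steps treat bright splice edges and dark shadows uniformly.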
      <p>Figure 3: (a) Previous frame, (b) Current frame, (c) Difference image.</p>
      <p>After creating the difference image, the standard deviation is calculated to decide if a frame needs
further evaluation. For a frame to be evaluated, the standard deviation must exceed the threshold
referred to as the deviation limit in Figure 1. Frames with a standard deviation below this limit are considered
regular and are excluded from further processing. This step reduces the computational load by ensuring
that only frames with significant intensity variations are processed. The deviation limit varies based on
the tape speed, with values set as follows: 2.25 for 30 inches per second (ips), 2.5 for 15 ips, 2.6 for 7.5
ips, and 2.75 for 3.75 ips. This separation addresses the intensity change at different speeds,
as higher speeds result in greater motion blur and reduced intensity. For frames that pass the standard
deviation check, Otsu’s method is used to find the right threshold for binarizing the difference image.
Depending on the result, either a global or Otsu’s threshold is selected to create a binary motion image.
To improve the accuracy of detecting irregularities, upper and lower limits are set on the threshold
values. The upper limit is 15 to prevent the threshold from being too high, which could miss parts of an
irregularity by filtering out portions of it. The lower limit is 5 to stop the threshold from
being too low, which could cause false positives by highlighting insignificant portions of the frame.
This approach adapts the thresholding process for each frame and makes irregularity detection more
robust and reliable. Once the threshold is set, it is applied to the difference image to create a binary
motion image. This image shows potential irregularities as white pixels on a black background, making
it easier to identify significant differences between frames. The binary motion image serves as a key
intermediate step. Figure 4a shows the binarized image obtained by thresholding the difference image
in Figure 3c. To further improve the binary motion image, an opening operation is applied with a
3x3 kernel. This process, which involves erosion followed by dilation, removes small, irrelevant artifacts
caused by tape vibration or other external factors while keeping the main irregularities intact. This step
improves the clarity of potential irregularities, making them easier to detect and analyze in the final
evaluation phase. Finally, the method counts the number of white pixels in the processed image and
compares it to a set threshold of 5% of the total pixels to determine if the frame contains an irregularity.
If the count exceeds this threshold, the frame is marked as irregular; otherwise, it is classified as regular.
Figure 4b shows the image after morphological opening. Figure 5 and Figure 6 illustrate the processes
applied to an annotation and a shadow, respectively.</p>
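The per-frame decision chain described above (deviation gate, clamped Otsu threshold, 3x3 opening, 5% white-pixel rule) can be sketched in NumPy. This is a minimal re-implementation for illustration only: the function and variable names are the author's of this sketch, not taken from the Video Analyzer code, and the deviation limits are the per-speed values quoted in the text:

```python
import numpy as np

# Speed-dependent deviation limits from the text (ips -> limit).
DEVIATION_LIMITS = {30: 2.25, 15: 2.5, 7.5: 2.6, 3.75: 2.75}

def otsu_threshold(img: np.ndarray) -> int:
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    total = img.size
    cum = np.cumsum(hist)
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = cum[t - 1], total - cum[t - 1]
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_mean[t - 1] / w0
        m1 = (cum_mean[255] - cum_mean[t - 1]) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def _erode3(b: np.ndarray) -> np.ndarray:
    """3x3 erosion: a pixel survives only if its whole neighbourhood is set."""
    p = np.pad(b, 1, constant_values=False)
    out = np.ones_like(b)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= p[dy:dy + b.shape[0], dx:dx + b.shape[1]]
    return out

def _dilate3(b: np.ndarray) -> np.ndarray:
    """3x3 dilation: a pixel is set if any neighbour is set."""
    p = np.pad(b, 1, constant_values=False)
    out = np.zeros_like(b)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= p[dy:dy + b.shape[0], dx:dx + b.shape[1]]
    return out

def frame_is_irregular(diff: np.ndarray, speed_ips: float) -> bool:
    """Decide whether a difference image flags an irregularity."""
    if diff.std() <= DEVIATION_LIMITS[speed_ips]:
        return False                               # deviation gate: frame is regular
    t = min(max(otsu_threshold(diff), 5), 15)      # clamp Otsu threshold to [5, 15]
    opened = _dilate3(_erode3(diff > t))           # binarize, then morphological opening
    return bool(opened.mean() > 0.05)              # irregular if >5% of pixels stay white
```

On a synthetic 20x20 difference image with a 10x10 bright patch, the function flags an irregularity at 15 ips, while an all-zero difference image is rejected at the deviation gate before any thresholding takes place.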
      <p>Figure 4: (a) Binarized image, (b) Image after opening.</p>
      <p>A further improvement concerns the quality of the videos provided as input to the algorithm. The
analysis was previously conducted on videos in PAL format at 25 fps interlaced with a resolution of
720x576. The new algorithm works on high-definition videos with 50 fps (progressive), a fixed shutter
speed of 1/100 of a second and a resolution of 1920x1080. This enhancement in video quality allows
for more precise detection of irregularities, as finer details and smaller anomalies can be captured
and analyzed more effectively. Since the video documentation began before the development of this
project, many videos were of lower quality and interlaced. One approach to handle the misalignment
caused by interlacing was to separate the even and odd fields, which mitigated the misalignment but
reduced the resolution to 720x288. To make effective use of these existing resources, the new method
provides a solution by employing interpolation-based deinterlacing. This approach ensures that the
captured irregularities are not limited to the reduced resolution of 720x288 but are instead maintained
at the original full resolution. By preserving the original dimensions, the method aims to enhance the
accuracy of irregularity detection and also addresses the challenge of misaligned lines, which could
impact classification in the future. Figure 7a shows a frame from the original interlaced video and
Figure 7b shows the resulting image of deinterlacing.</p>
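The exact interpolation filter used is not specified in the text, so the snippet below shows one simple possibility (line averaging) as an assumption: it keeps one field's lines and reconstructs the missing lines from their vertical neighbours, so the output retains the full PAL height instead of a single 288-line field:

```python
import numpy as np

def deinterlace_by_interpolation(frame: np.ndarray) -> np.ndarray:
    """Rebuild a full-height progressive frame from one field by interpolating lines.

    Keeps the even lines of `frame` and replaces each interior odd line with the
    average of its two even neighbours, so the output keeps the original height
    (e.g. 576 lines) rather than dropping to a 288-line field.
    """
    out = frame.astype(np.float32).copy()
    out[1:-1:2] = (out[0:-2:2] + out[2::2]) / 2.0  # interpolate interior odd lines
    return out.astype(np.uint8)

# Four-line toy frame, one column wide: odd lines belong to the discarded field.
toy = np.array([[0], [90], [50], [70]], dtype=np.uint8)
print(deinterlace_by_interpolation(toy).ravel().tolist())  # [0, 25, 50, 70]
```

Averaging neighbouring lines trades some vertical sharpness for geometric consistency, which is exactly the property needed to avoid the comb artifacts that disturbed the downstream classifier.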
      <p>Figure 7: (a) Original interlaced frame, (b) Deinterlaced frame.</p>
      <p>In this novel version, the speed options have been expanded. The previous version of the Video
Analyzer was specifically calibrated for detecting anomalies on tapes recorded at 7.5 ips and 15
ips. However, the new updated version also supports lower and higher speeds (3.75 ips and 30 ips).
This change makes the method more flexible and adaptable, allowing it to handle a wider range of tape
playback scenarios.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Experiment and evaluation</title>
      <p>
        To evaluate the performance of the improved anomaly detection method, an experiment was conducted
using videos of open-reel tapes featuring various manually annotated irregularities (mainly splices, as
they are the most common ones). The test aimed at comparing the improved detection method against
the original one across different playback speeds, providing a comprehensive analysis of precision and
recall. The videos were captured at four distinct speeds: 3.75 ips, 7.5 ips, 15 ips, and 30 ips, with tape
durations varying accordingly: 10 minutes for 3.75 ips, 34 minutes for 7.5 ips, 10 minutes for 15 ips, and
9 minutes for 30 ips. At 3.75 ips, there were 14 splices present. At 7.5 ips, the tape had 66 splices, along
with 4 annotations, 3 shadows, and 3 end-of-tape markers (a full description of this kind of irregularities
can be found in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). For 15 ips, the tape featured 55 splices and 1 shadow. At 30 ips, the tape had 93
splices, 2 end-of-tape markers and 2 annotations. For the sake of comparison completeness, the authors
also modified the previous version of the algorithm to support the two additional speeds. The experiment
focused on two key metrics: precision (the ability to correctly identify anomalies without generating
false positives) and recall (the ability to detect all present anomalies). Both metrics were measured for
each playback speed to compare the results of the old and new methods. The results, summarized in
Table 1, show a notable difference between the two approaches. The new method successfully detected
all irregularities at every speed but produced some false positives, while the old method demonstrated
inconsistent performance, particularly struggling at lower speeds.
      </p>
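The two metrics can be computed directly from per-speed counts of true positives, false positives, and false negatives; the counts below are hypothetical placeholders for illustration, not the paper's actual results (those are in Table 1):

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged frames that correspond to real irregularities."""
    return true_pos / (true_pos + false_pos)

def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real irregularities that were flagged."""
    return true_pos / (true_pos + false_neg)

# Hypothetical counts for one playback speed (NOT the values from Table 1).
tp, fp, fn = 93, 7, 0
print(round(precision(tp, fp), 3), round(recall(tp, fn), 3))  # 0.93 1.0
```

A recall of 1.0 with precision below 1.0 matches the qualitative result reported in the text: the new method finds every irregularity but pays for it with some false positives.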
      <p>The new method detected all irregularities across the given speeds. However, in some cases, it
generated some false positives. This was especially noticeable at 30 ips, where the motion blur caused
by the higher speed required an increment of the sensitivity value for detecting the frame differences,
which led to more false positives in the presence of lighting changes and vibrations. It should be
noted that some of the false positives are duplicates of the same irregularities. The method was designed
to ensure that closely occurring irregularities are not missed, which sometimes results in duplicates. In
future releases, the method can be improved by merging consecutive irregularities into one after the
classification. This change could help in reducing false positives and enhancing the overall detection
precision. The old method faced significant challenges, specifically at 3.75 ips, where it failed to detect
any irregularity. This shortcoming is likely due to its reliance on detecting large pixel differences
between frames, which are less noticeable at slower speeds. Consequently, the old method showed
some improvement at the highest speed (30 ips), where pixel variations are more pronounced, albeit with
a lower detection precision in comparison to the new method. Additionally, due to its reliance
on pixel count, the old method struggled to detect irregularities other than splices. This limitation
arose because irregularities such as shadows and annotations often affect only a smaller area of the
tape, making them less noticeable in comparison with splices. The new method addresses these issues
more effectively: by focusing on intensity differences between frames and using filtering techniques, it
provides a more accurate detection. As a result, the overall performance of the novel method is better
than the previous one, providing a more accurate and consistent irregularity detection across different
speeds and various video conditions.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This work introduced significant improvements in the automatic detection of superficial irregularities on
open-reel audio tapes using computer vision techniques. The key contributions include the development
of a new dataset of high-quality videos (available on Zenodo, DOI 10.5281/zenodo.14028922), an enhanced
detection algorithm with improved accuracy, and an expanded range of supported playback speeds. Despite these advancements, the system still faces
certain limitations. The increased recall of the new algorithm, especially at higher playback speeds, led to
false positives, particularly due to lighting changes and vibrations. Additionally, double detections of the
same irregularities highlighted the need for further refinements, such as post-processing techniques that
can merge closely occurring irregularities into a single detection. Future works will focus on addressing
these limitations. Possible improvements will include refining the filtering process to reduce false
positives and optimizing the classification module to handle more complex irregularities. Furthermore,
expanding the algorithm’s capabilities to work with other tape formats and video resolutions could
enhance its applicability to a broader range of archival materials, supporting better preservation and
restoration efforts in audiovisual archives.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work is partially supported by the SYCURI Project, funded by the University of Padova in the
Program “World Class Research Infrastructure”.</p>
      <p>2. The dataset presented in this article is available on Zenodo. The assigned DOI is
10.5281/zenodo.14028922. Please refer to this repository to access data and additional details.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rumsey</surname>
          </string-name>
          ,
          <article-title>Will you be mine forever? Audio archiving, multitracks, and 90s digital</article-title>
          ,
          <source>Journal of the Audio Engineering Society</source>
          <volume>68</volume>
          (
          <year>2020</year>
          )
          <fpage>304</fpage>
          -
          <lpage>307</lpage>
          . URL: https://aes2.org/publications/elibrary-page/?id=20736.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Canazza</surname>
          </string-name>
          , G. De Poli,
          <article-title>Four decades of music research, creation, and education at Padua's Centro di Sonologia Computazionale</article-title>
          ,
          <source>Computer Music Journal</source>
          <volume>43</volume>
          (
          <year>2020</year>
          )
          <fpage>58</fpage>
          -
          <lpage>80</lpage>
          . doi:10.1162/comj_a_00537.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Canazza</surname>
          </string-name>
          , G. De Poli,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vidolin</surname>
          </string-name>
          ,
          <article-title>Gesture, music and computer: The Centro di Sonologia Computazionale at Padova University, a 50-year history</article-title>
          ,
          <source>Sensors</source>
          <volume>22</volume>
          (
          <year>2022</year>
          ). doi:10.3390/s22093465.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pretto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fantozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Micheloni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Burini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Canazza</surname>
          </string-name>
          ,
          <article-title>Computing methodologies supporting the preservation of electroacoustic music from analog magnetic tape</article-title>
          ,
          <source>Computer Music Journal</source>
          <volume>42</volume>
          (
          <year>2018</year>
          )
          <fpage>59</fpage>
          -
          <lpage>74</lpage>
          . doi:10.1162/comj_a_00487.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pretto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guarise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Canazza</surname>
          </string-name>
          ,
          <article-title>Sound and music computing using AI: Designing a standard</article-title>
          ,
          <source>in: Proceedings of the 18th Sound and Music Computing Conference</source>
          , Virtual,
          <year>2021</year>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>218</lpage>
          . doi:10.5281/zenodo.5045003.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V. D.</given-names>
            <surname>Hengel</surname>
          </string-name>
          ,
          <article-title>Deep learning for anomaly detection: A review</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>54</volume>
          (
          <year>2021</year>
          ). doi:10.1145/3439950.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Chalapathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <article-title>Deep learning for anomaly detection: A survey</article-title>
          , arXiv abs/1901.03407 (
          <year>2019</year>
          ). URL: https://api.semanticscholar.org/CorpusID:57825713.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schlegl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Seeböck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Waldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Schmidt-Erfurth</surname>
          </string-name>
          , G. Langs,
          <article-title>Unsupervised anomaly detection with generative adversarial networks to guide marker discovery</article-title>
          , in: M. Niethammer, M. Styner, S. Aylward, H. Zhu, I. Oguz, P.-T. Yap, D. Shen (Eds.),
          <source>Information Processing in Medical Imaging</source>
          , Springer International Publishing, Cham,
          <year>2017</year>
          , pp.
          <fpage>146</fpage>
          -
          <lpage>157</lpage>
          . doi:10.1007/978-3-319-59050-9_12.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , N. Ding,
          <article-title>GAN-based anomaly detection: A review</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>493</volume>
          (
          <year>2022</year>
          )
          <fpage>497</fpage>
          -
          <lpage>535</lpage>
          . doi:10.1016/j.neucom.2021.12.093.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Sultani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>Real-world anomaly detection in surveillance videos</article-title>
          ,
          <source>in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>6479</fpage>
          -
          <lpage>6488</lpage>
          . doi:10.1109/CVPR.2018.00678.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Qasim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <article-title>Performance evaluation of background subtraction techniques for video frames</article-title>
          ,
          <source>in: 2021 International Conference on Artificial Intelligence (ICAI)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>102</fpage>
          -
          <lpage>107</lpage>
          . doi:10.1109/ICAI52203.2021.9445253.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mediavilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Parameswaran</surname>
          </string-name>
          ,
          <article-title>Leveraging motion saliency via frame differencing for enhanced object detection in videos</article-title>
          , in: M. S. Alam, V. K. Asari (Eds.),
          <source>Pattern Recognition and Tracking XXXIV</source>
          , volume
          <volume>12527</volume>
          , International Society for Optics and Photonics, SPIE,
          <year>2023</year>
          , p. 125270V. doi:10.1117/12.2678373.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Moving object recognition on production line based on adaptive frame differencing algorithm</article-title>
          ,
          <source>in: 2024 36th Chinese Control and Decision Conference (CCDC)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>966</fpage>
          -
          <lpage>971</lpage>
          . doi:10.1109/CCDC62350.2024.10587928.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spanio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Canazza</surname>
          </string-name>
          ,
          <article-title>Enhancing preservation and restoration of open reel audio tapes through computer vision</article-title>
          , in: G. L. Foresti, A. Fusiello, E. Hancock (Eds.),
          <source>Image Analysis and Processing - ICIAP 2023 Workshops</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>297</fpage>
          -
          <lpage>308</lpage>
          . doi:10.1007/978-3-031-51026-7_26.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pretto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Micheloni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chmiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Pozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marinello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Canazza</surname>
          </string-name>
          ,
          <article-title>Multimedia archives: New digital filters to correct equalization errors on digitized audio tapes</article-title>
          ,
          <source>Advances in Multimedia</source>
          (
          <year>2021</year>
          ). doi:10.1155/2021/5410218.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Otsu</surname>
          </string-name>
          ,
          <article-title>A threshold selection method from gray-level histograms</article-title>
          ,
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          <volume>9</volume>
          (
          <year>1979</year>
          )
          <fpage>62</fpage>
          -
          <lpage>66</lpage>
          . doi:10.1109/TSMC.1979.4310076.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>