Pattern recognition and neural networks for acoustic
                                monitoring and conservation of the historic port of Ancona
                                Samantha Di Loreto1,*, Alessandro Ricciutelli1 , Valter Lori2, Fabio Serpilli2 and
                                Sergio Montelpare1

                                1 Università degli studi G. D’Annunzio, Viale Pindaro 42, 60123, Pescara (Italy)

                                2 Università Politecnica delle Marche, Via Brecce Bianche 12, 60124, Ancona (Italy)


                                                Abstract
                                                The soundscape of Ancona's historic port is a vital aspect of its cultural heritage, capturing the
                                                essence of its maritime and urban environment. This research focuses on the application of
                                                advanced sound event recognition techniques to characterize and preserve the unique auditory
                                                landscape of the port. Using convolutional neural networks (CNNs), we developed an
                                                automated system capable of identifying and classifying sound events within the port's diverse
                                                acoustic environment.
                                                Our approach involves the extraction of significant "soundmarks" - unique sounds that define the
                                                auditory identity of the port, such as ship horns, dockyard activity, and traditional marketplace
                                                sounds. By capturing and analyzing these soundmarks, our system not only characterizes the
                                                existing soundscape but also aids in the preservation of these auditory elements, ensuring that
                                                they are protected in the event of environmental changes or disasters.
                                                The findings of this study underscore the importance of soundscape preservation as a critical
                                                component of heritage conservation, particularly for historic sites where auditory elements play
                                                a significant role in cultural identity. Our methodology provides a framework for the automatic
                                                extraction and classification of soundmarks, offering a valuable tool for heritage conservationists
                                                and urban planners in safeguarding the acoustic heritage of historical buildings and sites globally.

                                                Keywords
                                                Soundscape, Cultural Heritage, Pattern recognition, Neural network model


                                1. Introduction
                                The soundscape strategy, which encompasses both desired and undesired sounds while
                                considering the contextual perception of the acoustic environment, represents a
                                cutting-edge approach to integrating soundscapes as a central element in architectural
                                design practices [1]. In this context, the term "soundscape" refers to the array of sounds present in
                                a specific environment, including both wanted and unwanted sounds. Soundscape-based
                                design transcends the purely visual aspect, considering the auditory experience of the
                                environment and actively seeking to shape the acoustics of designed spaces [2].
                                ____________________________________

                                VIPERC2024: 3rd International Conference on Visual Pattern Extraction and Recognition for Cultural Heritage
                                Understanding, 1 September 2024
                                ∗ Corresponding author.

                                   samantha.diloreto@unich.it (S. Di Loreto);
                                    0000-0002-1901-4280 (S. Di Loreto).
                                           © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
This perspective suggests that for a more holistic architectural practice, it is essential to
consider the soundscape as a design element, integrating it into design decisions to create
more harmonious environments that are mindful of the users' overall sensory experiences
[3]. To study the influence of the interaction between the geographic landscape and acoustic
stimuli on environmental perception, various authors have attempted to identify the
informative, aesthetic, or affective qualities of sound, which substantially contribute to the
quality of a given landscape [4].

Landscape factors, which are not negligible, have also been considered in many studies,
especially in relation to sound perception and soundscape [5]. The soundscape is typically
interpreted through the identification and description of different sound sources in a
location. Kang et al. [6] demonstrated that the human evaluation of loudness and acoustic
comfort depends on a series of intrinsic environmental factors. For instance, a large-scale
subjective survey was conducted on the underground commercial streets of Harbin, China,
to determine how individual sound sources affected loudness and the assessment of
acoustic comfort [7].
The informative, aesthetic, and affective qualities of sound that contribute to the quality
of a given landscape have likewise been examined in [8], [9]. In an urban environment,
there are different zones, and in each
zone, a dominant sound persists. Based on a series of case studies in Europe and China and
an extensive literature review, Xu and Wu [10] evaluated the basic elements of the
soundscape: sound, space, people, and environment. Evaluating the relationship
between the acoustic environment and the responses of the people inhabiting it
is crucial for characterizing the human perception of daily environmental
noise.
In [11], the authors present a system that, by means of a tangible user interface
integrated with pattern recognition and computer vision techniques, supports cultural
heritage experts in creating Smart Interactive Experiences by properly tailoring the
behavior of the smart objects involved. An experimental evaluation of the techniques
used is also presented and discussed there.

The contribution of this work lies in the development of a pattern recognition algorithm
that systematically identifies and extracts soundmarks from the historic port's acoustic
data. These soundmarks are crucial for characterizing the sonic footprint of the location. In
the unfortunate event of the historical port's loss, our methodology provides a framework
for not only structural reconstruction but also for recreating its soundscape. This ensures
that the cultural and auditory identity of the port can be preserved and restored,
maintaining a holistic approach to heritage conservation.
Our methodology provides a framework for the automatic extraction and classification of
soundmarks, offering a valuable tool for heritage conservationists and urban planners in
safeguarding the acoustic heritage of historical buildings and sites globally.
This strategy reflects a holistic approach that recognizes the fundamental role of sound in
the perception of space and the overall quality of the built environment.

2. Material and methods
2.1. The case study
   The city of Ancona has a highly varied territory that hosts numerous distinct urban
contexts, including the port.
   The transformation of the waterfront is an important topical issue that concerns not only
large urban centers, but also small and medium-sized waterfront towns.
   The port of Ancona (international abbreviation IT AOI) lies in the innermost part of the
Gulf of Ancona, so its oldest core is a natural harbour; it is Italy's leading port for
international vehicle and passenger traffic.
   The survey was carried out mainly in the historic port, the only area of the port of Ancona
that can be reached without the aid of means of transport and the oldest one, of
considerable historical interest owing to its architectural works. The historic port is
located in the north-eastern part of the port area and is one of the oldest and most
significant in the Adriatic Sea, with origins tracing back to ancient times. Established by the
Greeks from Syracuse in the 4th century BC, Ancona's port is strategically situated along the
maritime routes that linked the eastern Mediterranean with Italy and central Europe.
   During the Roman era, Ancona emerged as a crucial port due to its prime location and
the shape of its natural harbor, which offered safe refuge for ships. The port experienced
one of its most significant expansions under Emperor Trajan, who built a new pier; the
Arch of Trajan, a monument honoring the emperor and his contributions to the city, still
stands there. In the Middle Ages, Ancona continued to thrive as a key port in the Adriatic,
benefiting from its autonomy as a maritime republic, though smaller than Venice and Genoa.
During the Renaissance, Ancona became a vital trade hub with the East, owing to its
connections with the Ottoman Empire and other Mediterranean powers.
   To this day the site is an important landmark for the city. It attracts tourists who are
captivated by its history, monuments, and maritime traditions. Regular events and activities
related to the sea and seafaring help preserve the city's rich maritime heritage, and the site
functions as a community hub, offering a pleasant setting where locals enjoy
walking and spending time.
   Figure 1 shows the current state of the port.
Figure 1: Aerial view (a) and road connection and distribution (b) of the Port of Ancona.


2.2. Acoustic measurements
   Noise from transportation infrastructures is regulated by the specific implementing
regulations pursuant to Article 11 of L.447/95 [20]. Such infrastructures are not subject,
within their respective bands, to the emission and immission limits or to the attention
values provided by the D.P.C.M. of November 14, 1997, "Determination of the limit values
of sound sources" [12, p. 447]. In addition, Article 4 of that decree stipulates that the
differential immission limit values do not apply to noise produced by road, rail, airport
and maritime infrastructures.
   This insufficiency of conventional methods creates the need to experiment with
alternative approaches to environmental management, such as those focused on the
soundscape.
   Information was collected for the geographical and acoustic characterization of the area
under study by also taking advantage of the technical report on the acoustic classification
plan for the Ancona area approved by the Marche Region.
   Therefore, information such as the influence of the port area, the acoustic classification
and the measurement points of the noise monitoring network were collected and examined.
The seaport currently operates within the limits prescribed by the D.P.C.M. of 14/11/97,
which defines the environmental noise classification of areas. The port area of the city of
Ancona includes public land in three classes: class IV covers areas affected by vehicular
traffic, with high population density and commercial activities, while classes V and VI
cover the industrial area (see Figure 2).
Figure 2: Acoustic classification: framing of the port area.

    Experimental measurements allowed the determination of acoustic parameters such as
sound pressure levels (SPLs) and the main objective psychoacoustic parameters:
loudness and its percentiles, sharpness, roughness, and fluctuation strength [13].
    Sound levels for the calculation of psychoacoustic parameters were measured using a
4100-type head and torso simulator.
    The head and torso simulator, equipped with microphones at the entrance of the ear
canals, reproduces the shape, size and acoustic impedance of a listener's head and torso;
it also preserves sound directionality.
    For headphone playback, the track acquired with the binaural head was chosen. The
analysis was performed using Soundbook MK2 software (Spectra) and Sound Quality
software (PULSE, B&K).
    All data from the site were collected under the same conditions: daytime (11:00 a.m. to
4:00 p.m.) in the first week of October 2021, clear weather, and temperature
ranging between 18 °C and 21 °C.
    As for the visual data, however, these were recorded in 8k resolution as recommended
in ITU-R BT.2020.
    Table 1 shows the results of the acoustic and psychoacoustic measurements in the
historic port (named Cluster C1), and Figure 3 shows the trend of the in situ measurements.
Table 1
Acoustic and psychoacoustic parameters calculated on the binaural recording (L = left channel, R = right channel).
    Cluster  Channel  SPL     Loudness  Sharpness  Fluctuation Strength  Roughness
                      (dBA)   (phon)    (acum)     (vacil)               (asper)
    C1       L        34.9    52.33     1.11       0.33                  1.17
    C1       R        37.7    54.98     1.10       0.41                  1.58


Figure 3: Graphic trend of measurements for the C1 Cluster.
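
Table 1 reports one level per ear. Where a single figure per cluster is needed, one common option is the energetic (power) mean of the two channels. This step is not described in the measurement campaign; the Python sketch below is only an illustration, and `energetic_mean_db` is a hypothetical helper name.

```python
import math

def energetic_mean_db(levels_db):
    """Energetic (power) mean of sound levels expressed in dB."""
    # Convert each level to a relative power, average, then convert back to dB.
    powers = [10 ** (level / 10) for level in levels_db]
    return 10 * math.log10(sum(powers) / len(powers))

# Left and right A-weighted levels of cluster C1, taken from Table 1.
c1_levels_dba = [34.9, 37.7]
c1_mean_dba = energetic_mean_db(c1_levels_dba)
print(f"C1 energetic mean: {c1_mean_dba:.1f} dBA")
```

The energetic mean is always at least as large as the arithmetic mean of the dB values, because louder channels weigh more in the power domain.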


2.3. Cluster analysis and pattern recognition
To achieve this goal, an efficient vision system to recognize objects is required, because both
smart objects and custom attributes are represented by physical objects that must be
precisely detected. Object detection is, indeed, one of the areas that is maturing rapidly
thanks to deep learning innovation. Current object detection methods are typically based
on Convolutional Neural Network (CNN) models, able to automatically recognize visual
features exploiting different architectures [14]. One of the first models featuring
convolutions and shared weights was LeNet [15]. However, the spread of the deep learning
approach for image and object classification was determined by AlexNet [16], developed in
2012 as an enhanced version of LeNet. ZFNet further improved AlexNet by exploiting a
deconvolution network, while GoogLeNet introduced the Inception module, reducing the
number of network parameters. In 2016, the residual network ResNet [17] became the
state of the art for the practical use of such models.
These pattern recognition techniques and advanced neural networks are essential for the
preservation and environmental monitoring of the historic port of Ancona. By
implementing an advanced vision system, we can accurately identify and catalog historical
objects and structures, monitor changes over time, and detect any damage or deterioration.
The use of CNNs and their variants allows for the analysis of large amounts of visual data,
ensuring continuous and detailed monitoring, which is fundamental for the conservation of
such a significant historical site. Neural networks, with their capacity for learning and
adaptation, offer powerful tools for sound and image recognition, crucial for preserving
both tangible and intangible aspects of cultural heritage.

To preserve the unique soundmarks of the port, a sophisticated algorithm was constructed.
This algorithm captures, analyzes, and classifies the sound environment of the port,
providing valuable insights for heritage preservation and environmental monitoring,
ensuring that the port’s sonic identity is maintained even in the face of potential changes or
disruptions. This tool represents a significant step forward in the holistic conservation of
Ancona's maritime heritage, integrating auditory elements into broader conservation and
planning efforts.

The development of the pattern recognition algorithm involved five phases:
   i)     Image Classification: Image classification was approached as a supervised
          learning problem where a set of target classes (objects to be identified in
          images) was defined, and the model was trained to recognize them using labeled
          video examples.
   ii)    Cluster Analysis: Cluster analysis involved the application of clustering
          algorithms with the goal of finding hidden patterns or groupings in a dataset.
   iii)   Convolutional Neural Networks (CNNs): CNNs were employed due to their
          effectiveness in image classification and potential for audio classification. For
          this purpose, an audio classifier for the case study was built in MATLAB,
          designed specifically to identify the type of noise in the area. The code
          aggregates local classifications into whole-soundtrack decisions, following
          the video-classification-inspired approach of Hershey et al. [18].
   iv)    Training with Binaural Recordings: The algorithm was trained using binaural
          recordings collected during the measurement campaign. These recordings
          provided a comprehensive and immersive representation of the sound
          environment, capturing the unique acoustic characteristics of the port.
   v)     Temporal Analysis: After studying several more complex models for combining
          information over time, a simple average of the single-frame CNN classification
          outputs was adopted, with timestamps marking each analyzed region.
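
Phase ii above does not name the clustering algorithm that was applied. As an illustration only, the Python sketch below runs a plain k-means on small feature vectors. The choice of k-means, the feature pair (loudness, sharpness), and the sample values are all assumptions made for this example, not the authors' procedure or data.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means: returns (final centroids, cluster label per point)."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

# Synthetic (loudness in phon, sharpness in acum) pairs, NOT measured data.
features = [(52.0, 1.1), (54.0, 1.2), (53.0, 1.0), (70.0, 1.9), (72.0, 2.1), (71.0, 2.0)]
centers, labels = kmeans(features, centroids=[features[0], features[3]])
print(labels)
```

In practice the features would usually be standardized first, since here loudness dominates the Euclidean distance simply because of its larger numeric range.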

The system provided detailed outputs such as:
       - Sounds - Sounds detected in each region;
       - Average Scores - Average network scores corresponding to each class of sounds
          detected in the region;
       - Max Scores - Maximum network scores corresponding to each sound class
          detected in the region.
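
The three outputs listed above can be sketched as a simple aggregation of per-frame class scores within one region. The Python example below is a minimal illustration, not the MATLAB code used in the study: the function name `summarize_region`, the detection threshold of 0.3, and the frame scores are assumptions.

```python
from statistics import mean

def summarize_region(frame_scores, class_names, threshold=0.3):
    """Aggregate per-frame class scores of one region into region-level outputs.

    frame_scores: list of frames, each a list of scores (one per class).
    Returns the detected sounds with their average and maximum scores.
    """
    per_class = list(zip(*frame_scores))  # transpose: one tuple of scores per class
    average = {name: mean(scores) for name, scores in zip(class_names, per_class)}
    maximum = {name: max(scores) for name, scores in zip(class_names, per_class)}
    # Assumption: a class counts as detected when its average score reaches the threshold.
    sounds = [name for name in class_names if average[name] >= threshold]
    return {
        "sounds": sounds,
        "average_scores": {name: average[name] for name in sounds},
        "max_scores": {name: maximum[name] for name in sounds},
    }

# Hypothetical scores for three frames of one region (illustrative, not measured).
classes = ["traffic noise", "natural sounds", "sounds from human beings"]
frames = [
    [0.50, 0.20, 0.10],
    [0.25, 0.50, 0.10],
    [0.75, 0.50, 0.10],
]
region = summarize_region(frames, classes)
print(region["sounds"])
```

Averaging smooths out isolated frames, while the maximum keeps track of short but prominent events such as a horn blast.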


Figure 4: Workflow of the CNN classification.

By capturing and analyzing these soundmarks unique to the port, the system provides a
comprehensive understanding of the acoustic environment.

3. Results
   The algorithm developed for the preservation and environmental monitoring of the
historic port of Ancona effectively categorized the recorded sounds into distinct classes.
These classifications were based on the framework provided in [13] and they are crucial for
understanding the soundscape dynamics and informing conservation strategies.
   The sounds of the historic port were classified into the following categories:

      -    TRAFFIC NOISE: This category includes sounds from boats, cars, and sirens.
           These sounds are often associated with the daily operations of the port and
           nearby urban activities.
      -    OTHER NOISE: This includes sounds from construction activities, industrial
           operations, machinery, and inappropriate music. These are typically considered
           as intrusive or undesirable noises that can affect the acoustic environment.
      -    SOUNDS FROM HUMAN BEINGS: Sounds in this category come from
           conversations, laughter, children playing and footsteps. These sounds reflect the
           human presence and activities within the port area.
      -    NATURAL SOUNDS: These are sounds from the natural environment, such as
           singing birds, wind in vegetation, flowing water, and sea waves. Natural sounds
           contribute to the acoustic diversity and aesthetic quality of the soundscape.
      -    DESIGNED SOUNDS: This category includes sounds that are intentionally added
           to the environment for specific purposes, such as ambient music or public
           announcements.
      -    ACOUSTIC EFFECTS: These are sounds that result from the acoustic
           characteristics of the environment, including echoes and reverberations.

To provide a comprehensive analysis of the soundscape, a temporal sound map was created.
This map visualizes the distribution and intensity of the different sound categories over
time. Figure 5 illustrates the temporal sound map of cluster C1, highlighting the variations
in sound classification within a specific timeslot.


Figure 5: Cloud of cluster C1: Average Scores equal to 0.42 and Max Scores equal to 0.53.


    In addition, in agreement with Schafer's theory, keynotes, soundmarks and sound signals
were distinguished and identified for cluster C1 according to the following classification
(see Table 2).

Table 2
Keynotes, sound signals and soundmarks in cluster C1.
 Keynotes     Sound signals                      Soundmarks
 Traffic      Siren                              Rowing boats, canoes, kayaks,
 Air noise    Alarm                              sailboats, sea waves,
 Water        Siren of protection of the boat    clip-clop

The temporal sound map provides the maximum and minimum network scores: for each
sound class detected in the timeslot, the algorithm calculates the maximum and minimum
network scores. These scores represent the confidence level of the neural network in
classifying the detected sounds into the predefined categories. Higher scores indicate a
stronger presence or more frequent occurrence of a particular sound class during the
analyzed period.
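
The score statistics mentioned above can be written compactly. The notation is an assumption introduced here for clarity: p_t(c) denotes the network score for class c at frame t, and T is the set of frames falling in the analyzed timeslot.

```latex
\begin{align}
  S_{\mathrm{avg}}(c, T) &= \frac{1}{|T|} \sum_{t \in T} p_t(c), \\
  S_{\max}(c, T) &= \max_{t \in T} p_t(c), \\
  S_{\min}(c, T) &= \min_{t \in T} p_t(c).
\end{align}
```

By construction, S_min(c, T) <= S_avg(c, T) <= S_max(c, T), so a wide gap between the minimum and maximum scores points to a sound class that is present only intermittently within the timeslot.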

In addition, the temporal sound map is a crucial tool for several reasons:

    -   Monitoring Changes Over Time: By visualizing sound data over time, it becomes
        possible to monitor changes in the soundscape. This can help identify patterns
        related to specific events, activities, or environmental changes that affect the
        acoustic environment.
    -   Detecting Anomalies: The map helps in detecting anomalies or unexpected changes
        in the soundscape, such as sudden increases in construction noise or other
        disturbances. This is vital for timely interventions to preserve the acoustic quality
        of the historic port.
    -   Enhancing Conservation Efforts: Understanding the temporal dynamics of sounds
        allows conservationists to develop targeted strategies for preserving the unique
        sonic identity of the port. For example, measures can be taken to mitigate intrusive
        noises while enhancing the presence of natural and designed sounds.
    -   Informing Urban Planning: The insights gained from the temporal sound map can
        inform urban planning and development decisions, ensuring that the acoustic
        environment is considered alongside other factors.
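
The anomaly-detection use mentioned in the list above could, for instance, rely on a simple z-score test over the per-timeslot scores of one sound class. The paper does not specify a detection rule, so the rule, the threshold of 2.0, the function name `flag_anomalies`, and the score series in this Python sketch are all assumptions.

```python
from statistics import mean, pstdev

def flag_anomalies(scores, z_threshold=2.0):
    """Return indices of timeslots whose score deviates strongly from the series mean."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return []  # a flat series has no outliers
    return [i for i, s in enumerate(scores) if abs(s - mu) / sigma > z_threshold]

# Hypothetical per-timeslot average scores for one class (illustrative values only).
construction_scores = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.10, 0.85]
anomalous_slots = flag_anomalies(construction_scores)
print(anomalous_slots)
```

With only a handful of timeslots the z-score is bounded (it cannot exceed the square root of n - 1 for n values), so a longer monitoring series or a median-based rule may be preferable in practice.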

By utilizing binaural recordings and advanced neural network techniques, the developed
algorithm provides a powerful tool for the holistic conservation of the historic port of
Ancona. It ensures that both the tangible and intangible aspects of the port’s heritage,
including its unique soundscape, are preserved and protected for future generations.
   Thanks to this investigation, it was possible to identify the actual content of the keynotes
and sound signals in the track; this was essential for the construction of the quality metrics.
   The use of a clustering analysis allowed sound stimuli to be grouped according to
common characteristics, thus simplifying the understanding of emerging patterns and
recurring trends. This methodology helped to more clearly delineate the complexity of the
data and provide a better perspective on the inherent differences among the various sound
stimuli examined.


4. Conclusion
   The implementation of advanced pattern recognition techniques and neural networks
for the preservation and environmental monitoring of the historic port of Ancona has
yielded significant and promising results. The developed algorithm successfully categorized
the port's sounds into six distinct classes: traffic noise, other noise, sounds from human
beings, natural sounds, designed sounds, and acoustic effects. This classification was made
possible using binaural recordings collected during the measurement campaign, which
provided a rich and immersive dataset for precise training of the neural network.
   The creation of a temporal sound map for cluster C1 offered a detailed understanding of
the port's acoustic environment, illustrating the variations in sound classification over time.
By calculating the maximum and minimum network scores for each sound class, the system
highlighted the dynamic nature of the soundscape, revealing the presence and intensity of
different sound sources throughout the day. This comprehensive approach ensures a more
thorough protection of the port's cultural heritage by integrating auditory elements into the
conservation strategy.
   Furthermore, the ability to monitor changes and detect anomalies in the soundscape
allows for proactive conservation measures, ensuring the port's sonic identity is maintained
amidst environmental and urban changes. The insights gained from the sound classification
and temporal sound map can inform urban planning decisions, helping to mitigate intrusive
noises and enhance the presence of natural and designed sounds in the port area.
   In conclusion, the research demonstrates the effectiveness of combining pattern
recognition techniques and neural networks for cultural heritage conservation. The
developed system not only provides a detailed understanding of the historic port's
soundscape but also offers valuable tools for its preservation.


References
[1] M. D. Fowler, ‘Soundscape as a design strategy for landscape architectural praxis’,
    Design Studies, vol. 34, no. 1, pp. 111–128, 2013, doi: 10.1016/j.destud.2012.06.001.
[2] C.-J. Yu and J. Kang, ‘Soundscape in the sustainable living environment: A cross-cultural
    comparison between the UK and Taiwan’, Science of the Total Environment, vol. 482–
    483, no. 1, pp. 501–509, 2014, doi: 10.1016/j.scitotenv.2013.10.107.
[3] A. Kaplan, ‘Landscape architecture’s commitment to landscape concept: A missing
    link?’, Journal of Landscape Architecture, vol. 4, no. 1, pp. 56–65, 2009, doi:
    10.1080/18626033.2009.9723413.
[4] T. Van Renterghem et al., ‘Interactive soundscape augmentation by natural sounds in a
    noise polluted urban park’, Landscape and Urban Planning, vol. 194, p. 103705, 2020,
    doi: https://doi.org/10.1016/j.landurbplan.2019.103705.
[5] R. Pheasant, K. Horoshenkov, G. R. Watts, and B. Barrett, ‘The acoustic and visual
    factors influencing the construction of tranquil space in urban and rural environments
    tranquil spaces-quiet places?’, The Journal of the Acoustical Society of America, vol. 123,
    pp. 1446–57, Apr. 2008, doi: 10.1121/1.2831735.
[6] J. Kang, F. Aletta, T. Oberman, M. Erfanian, M. Kachlicka, M. Lionello, and A. Mitchell,
    ‘Towards soundscape indices’, in Proceedings of the 23rd International Congress on
    Acoustics, 2019.
[7] J. Liu, J. Kang, H. Behm, and T. Luo, ‘Effects of landscape on soundscape perception:
    Soundwalks in city parks’, Landscape and Urban Planning, vol. 123, pp. 30–40, 2014,
    doi: https://doi.org/10.1016/j.landurbplan.2013.12.003.
[8] J. W. Smith and B. C. Pijanowski, ‘Human and policy dimensions of soundscape
     ecology’, Global Environmental Change, vol. 28, no. 1, pp. 63–74, 2014, doi:
     10.1016/j.gloenvcha.2014.05.007.
[9] J. L. Carles, I. L. Barrio, and J. V. de Lucio, ‘Sound influence on landscape values’,
     Landscape and Urban Planning, vol. 43, no. 4, pp. 191–200, 1999, doi:
     https://doi.org/10.1016/S0169-2046(98)00112-1.
[10] X. Xu and H. Wu, ‘Audio-visual interactions enhance soundscape perception in China’s
     protected areas’, Urban Forestry and Urban Greening, vol. 61, 2021, doi:
     10.1016/j.ufug.2021.127090.
[11] F. Balducci, P. Buono, G. Desolda, D. Impedovo, and A. Piccinno, ‘Improving smart
     interactive experiences in cultural heritage through pattern recognition techniques’,
     Pattern Recognition Letters, vol. 131, pp. 142–149, Mar. 2020, doi:
     10.1016/j.patrec.2019.12.011.
[12] ‘Determinazione dei valori limite delle sorgenti sonore’, Recommendation Legge 14
     novembre 1997, 1997.
[13] International standard ISO, ‘Acoustics — Soundscape — Part 2: Data collection and
     reporting requirements’, International Organization for Standardization, Standard
     ISO/TS 12913-2:2018, 2018. [Online]. Available:
     https://www.iso.org/standard/75267.html
[14] F. Balducci, D. Impedovo, and G. Pirlo, ‘Detection and Validation of Tow-Away Road
     Sign Licenses through Deep Learning Methods’, Sensors, vol. 18, no. 12, 2018, doi:
     10.3390/s18124147.
[15] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, ‘Gradient-based learning applied to
     document recognition’, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov.
     1998, doi: 10.1109/5.726791.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘ImageNet Classification with Deep
     Convolutional Neural Networks’, in Advances in Neural Information Processing
     Systems, F. Pereira, C. J. Burges, L. Bottou, and K. Q. Weinberger, Eds., Curran
     Associates, Inc., 2012. [Online]. Available:
     https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
[17] K. He, X. Zhang, S. Ren, and J. Sun, ‘Deep Residual Learning for Image Recognition’, in
     2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas,
     NV, USA: IEEE, Jun. 2016, pp. 770–778. doi: 10.1109/CVPR.2016.90.
[18] S. Hershey et al., ‘CNN Architectures for Large-Scale Audio Classification’, in 2017 IEEE
     International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017,
     pp. 131–135.