<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Speech and Language Impairment Detection by Means of AI-Driven Audio-Based Techniques</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Luca</forename><surname>Corvitto</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer, Control and Management Engineering</orgName>
								<orgName type="institution">Sapienza University of Rome</orgName>
								<address>
									<addrLine>Via Ariosto 25</addrLine>
									<postCode>00185</postCode>
									<settlement>Roma</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lorenzo</forename><surname>Faiella</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer, Control and Management Engineering</orgName>
								<orgName type="institution">Sapienza University of Rome</orgName>
								<address>
									<addrLine>Via Ariosto 25</addrLine>
									<postCode>00185</postCode>
									<settlement>Roma</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christian</forename><surname>Napoli</surname></persName>
							<email>cnapoli@diag.uniroma1.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer, Control and Management Engineering</orgName>
								<orgName type="institution">Sapienza University of Rome</orgName>
								<address>
									<addrLine>Via Ariosto 25</addrLine>
									<postCode>00185</postCode>
									<settlement>Roma</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Adriano</forename><surname>Puglisi</surname></persName>
							<email>puglisi@diag.uniroma1.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer, Control and Management Engineering</orgName>
								<orgName type="institution">Sapienza University of Rome</orgName>
								<address>
									<addrLine>Via Ariosto 25</addrLine>
									<postCode>00185</postCode>
									<settlement>Roma</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Samuele</forename><surname>Russo</surname></persName>
							<email>samuele.russo@uniroma1.it</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Psychology</orgName>
								<orgName type="institution">Sapienza University of Rome</orgName>
								<address>
									<addrLine>Via dei Marsi 78</addrLine>
									<postCode>00185</postCode>
									<settlement>Roma</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Speech and Language Impairment Detection by Means of AI-Driven Audio-Based Techniques</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">28FBE5534365F18B505C0EF0C7403789</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:24+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>SLI</term>
					<term>AI</term>
					<term>audio</term>
					<term>healthcare</term>
					<term>speech</term>
					<term>learning disease</term>
					<term>feature extraction</term>
					<term>data augmentation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Speech and Language Impairments (SLI) affect a large and heterogeneous group of people. With our work, we propose a novel, easy, and immediate detection tool to help diagnose people who suffer from SLI using speech audio signals, along with a new dataset containing English speakers affected by SLI. In this work, we experiment with feature extraction methods such as Mel Spectrogram and wav2vec 2.0, as well as classification methods such as SVM, CNN, and linear neural networks. We also work on data audio augmentation trying to overcome the very common limitations imposed by data scarcity in the medical field. The overall results indicate that the wav2vec 2.0 feature extractor, paired with a linear classifier, provides the best performance with a reasonably high accuracy of over 96%.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The rapid development of Artificial Intelligence (AI) techniques across a broad range of scientific fields has helped solve real-life problems; in particular, new advancements have revolutionized a wide variety of areas such as Natural Language Processing (NLP) <ref type="bibr" target="#b0">[1]</ref>, computer vision <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>, robotics, and many more. Due to the huge volume of medical data being generated worldwide, there is a clear need to use this information efficiently to benefit health sectors around the world <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>. The medical community has taken strong notice of the potential of these new AI technologies. Machine learning (ML) thrives where data is abundant, which makes it one of the most essential and effective tools for analyzing highly complex medical data <ref type="bibr" target="#b5">[6]</ref>. For example, analyzing medical data from disease diagnosis with the aid of these tools can be far more cost-efficient. In healthcare, it is also vital that diseases are detected early during diagnosis and prognosis. The success of these AI methods has also spread across other domains, including speech recognition and music recommendation <ref type="bibr" target="#b6">[7]</ref>. Due to the relevance of such systems in our day-to-day lives, there is an increasing need for effective and efficient audio classification systems. 
Automatic classification technologies are widely applied in voice assistants <ref type="bibr" target="#b7">[8]</ref>, chatbots <ref type="bibr" target="#b8">[9]</ref>, smart safety devices <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>, and in different real-world environments <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14]</ref>.</p><p>Our project aims to reconcile these two worlds and design a Deep Learning (DL) model that can detect, from a given audio input, whether the speaker could be affected by a speech and language impairment. Individuals with a Speech and Language Impairment (SLI) represent a heterogeneous group of people with significant difficulty in learning languages, generally despite normal hearing, normal nonverbal intelligence, adequate social functioning, and no obvious signs of brain injury <ref type="bibr" target="#b14">[15]</ref>. One of the defining characteristics of SLI is speech disfluency, more specifically impaired acquisition of pattern-based components of language, such as morphology, syntax, and some aspects of phonology such as stuttering. This commonly used definition led to early hypotheses regarding the etiology of SLI, namely that an impaired language-specific learning mechanism underlies language development and disorders <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18]</ref>. This disorder is deemed "primary" or "specific" when there is no clear explanation for these lags in language skills; indeed, a defining characteristic of primary language disorder is that its cause is unknown <ref type="bibr" target="#b18">[19]</ref>. Language disorders are also linked to a heightened risk of psychiatric concerns, attentional difficulties, social-behavioral problems, and learning disabilities <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21]</ref>. 
Many current trends in audio signal processing rely on data-driven machine learning approaches to achieve state-of-the-art results <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b23">24]</ref>. However, the quantity and quality of the available data heavily influence the performance achieved on a task. Depending on the specific task, as in our case study, such data can often be hard to obtain and costly to label, typically requiring complex diagnostic procedures <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b25">26]</ref> or the aid of tools such as an electroencephalogram (EEG) <ref type="bibr" target="#b26">[27]</ref>. We want to design an easy and accessible model that can detect whether a person could be affected by an SLI without having to go through complex and time-consuming procedures. In this manner, such a model could also be implemented in robots from a human-robot interaction (HRI) perspective, allowing the machine to detect people with SLI and change its behavior and form of interaction accordingly.</p><p>This study proposes an analysis of a novel, yet simple, approach that uses exclusively audio recordings for SLI detection. Specifically, in Section 2 we start by exploring the current literature; in Section 3 we discuss the problems faced in collecting our data and how we handled them. After that, in Section 4, we go through an analysis of the techniques and models used to perform the detection, present the trials and results obtained from them in Section 5, and then discuss the limits of our approach in Section 6. We finally draw our conclusions in Section 7.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Works</head><p>In the ever-evolving landscape of computer science and artificial intelligence, the domains of audio data augmentation and feature extraction are undergoing rapid change thanks to groundbreaking research and advancements. In the following sections, we review the background and explore the state of the art of these fields.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Audio Data Augmentation</head><p>One of the most important challenges in developing an efficient and effective audio classification system is accessing a large and well-annotated dataset: a main obstacle in developing sound classifiers is the lack of a sufficient quantity of labeled data. This is due to several main reasons: class imbalance, data privacy issues, time constraints involved in data collection, high dependency on expertise for effective annotation, and so on <ref type="bibr" target="#b27">[28,</ref><ref type="bibr" target="#b28">29,</ref><ref type="bibr" target="#b29">30]</ref>. Data Augmentation (DA) is defined as the creation of new data by applying deformations that increase the variety of the data without changing its semantic value. It is well known that DA can improve an algorithm's performance, tackle the issue of overfitting <ref type="bibr" target="#b30">[31,</ref><ref type="bibr" target="#b31">32]</ref>, and improve the generalization ability of Deep Neural Networks (DNN); this happens because DA averages over the orbits of the group that keeps the data distribution invariant, which leads to variance reduction <ref type="bibr" target="#b32">[33]</ref>. DA is key when dealing with audio signals because the Convolutional Neural Network (CNN) is the most widely used model in audio applications, and when faced with small datasets, a CNN's capacity for information retention becomes a flaw: the models memorize the training data and lose performance on new data <ref type="bibr" target="#b33">[34,</ref><ref type="bibr" target="#b34">35]</ref>. In addition to increasing generalization capabilities, data augmentation also allows the designed system to improve data significance, regardless of the available data samples <ref type="bibr" target="#b35">[36,</ref><ref type="bibr" target="#b36">37]</ref>. 
These strategies include methods on raw audio signals, techniques applied to samples converted into spectrograms, and even more complex approaches such as interpolation and nonlinear mixing on the spectrum. We will now list and briefly explain the most used audio augmentation techniques. Pitch Shifting. The tone of each audio signal in the dataset is lowered or raised by a factor while preserving its duration. Time Stretching. The audio sample is slowed down or sped up by a ratio without drastically altering the pitch. Time Shifting. Time is shifted to the left or to the right by a random factor or by a predetermined amount. Volume Adjustment. The volume of the audio file is altered, the loudness is changed, or sometimes a dynamic range compression is applied. Noise Addition. Noise is introduced into the samples; besides simple random Gaussian noise, there are many types of noise such as white noise <ref type="bibr" target="#b37">[38]</ref>, babble noise, static noise <ref type="bibr" target="#b38">[39]</ref>, factory noise, etc. SpeedUp. The signal is resampled at a preset sampling rate and later returned to the original sampling rate, resulting in a speed change. Filtering. Several kinds of filters are applied to the input audio. The most common filters are band-pass, band-stop, high-pass, high-shelf, low-pass, low-shelf, and peaking filters.</p><p>This topic is so important that researchers have also designed methods that generate entirely new samples. For example, with the aid of a Generative Adversarial Network (GAN), the authors of <ref type="bibr" target="#b39">[40]</ref> created new variants of the audio samples that already existed in their dataset and then utilized an evolutionary algorithm to search the input domain and select the best generated samples; in this way they were able to generate audio in a controlled manner that improved classification performance on the original task. 
One very recent DA method proposed by Google is SpecAugment <ref type="bibr" target="#b40">[41]</ref>, in which the two-dimensional spectrogram is treated as an image with time on the horizontal axis and frequency on the vertical axis. Encoder-decoder networks are also becoming very popular in fields other than NLP because they can convert a high-dimensional input into a lower-dimensional vector in latent space; the researchers in <ref type="bibr" target="#b41">[42]</ref> experimented with a Long Short-Term Memory (LSTM) based auto-encoder to produce artificial data.</p></div>
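Several of the raw-waveform augmentation techniques listed above can be sketched in a few lines of NumPy. The snippet below is a minimal, self-contained illustration (not the implementation used by any of the cited works); pitch shifting and filtering normally rely on dedicated DSP libraries such as librosa or audiomentations and are omitted here.

```python
import numpy as np

def time_shift(signal, shift):
    """Shift the waveform left/right by `shift` samples, zero-padding the gap."""
    out = np.zeros_like(signal)
    if shift >= 0:
        out[shift:] = signal[:len(signal) - shift]
    else:
        out[:shift] = signal[-shift:]
    return out

def add_gaussian_noise(signal, amplitude=0.01, rng=None):
    """Superimpose white Gaussian noise of the given amplitude."""
    if rng is None:
        rng = np.random.default_rng(0)
    return signal + amplitude * rng.standard_normal(len(signal))

def adjust_volume(signal, gain_db):
    """Scale loudness by `gain_db` decibels (20 dB = a factor of 10)."""
    return signal * 10 ** (gain_db / 20)

def speed_up(signal, rate):
    """Resample so the clip plays `rate` times faster (pitch changes too)."""
    idx = np.arange(0, len(signal), rate)
    return np.interp(idx, np.arange(len(signal)), signal)
```

Each function maps a 1-D float waveform to an augmented copy, so they can be chained or applied independently to multiply the size of a small dataset.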
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Audio Feature Extraction and Models</head><p>It should be noted that data augmentation is not the only way to reduce overfitting and improve the generalization ability of DL models. Model structure optimization, transfer learning, and one-shot and zero-shot learning are also known strategies that deal with overfitting from different angles. We will now focus on the most common processing flow of audio classification: preprocessing the original audio data, extracting features, and feeding the features into the DL model. Audio signals have very high dimensionality, so thousands of floating point values are required to represent even a short audio signal, raising the need to explore dimensionality reduction and feature extraction methods. How well or poorly a model performs is also determined by the choice of features used: feature representation is crucial to improving the performance of learning algorithms in the sound classification task. One of the first features that comes to mind when thinking of an audio signal is the spectrogram; its characteristics have been widely used by previous researchers in different domains of sound classification, such as heartbeat sounds to detect heart diseases <ref type="bibr" target="#b42">[43]</ref>. Another method used to extract features implements the Mel-Frequency Cepstrum (MFC), which is a representation of the short-term power spectrum of a sound based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency; the Mel-Frequency Cepstral Coefficients (MFCC) were successful in representing sounds for the detection of respiratory diseases <ref type="bibr" target="#b43">[44]</ref>. Other methods that also use the MFC are the log-mel <ref type="bibr" target="#b44">[45]</ref>, mel filter bank energy <ref type="bibr" target="#b45">[46]</ref>, inverted MFCC <ref type="bibr" target="#b46">[47]</ref>, and many more. 
Although the mel spectrogram and MFCC are commonly used, researchers also employ bags of audio words <ref type="bibr" target="#b47">[48]</ref>, Discrete Gabor Transform (DGT) audio image representations <ref type="bibr" target="#b48">[49]</ref>, ZCR, entropy of energy, spectral centroid, spectral spread, spectral entropy <ref type="bibr" target="#b49">[50]</ref>, and so on.</p><p>Classification is a common task in ML and pattern recognition. DL methods applied in these tasks, such as CNN models, often do not perform as well as more traditional ML methods such as random forest, AdaBoost, etc., especially on small data <ref type="bibr" target="#b50">[51]</ref>. On the other hand, typical ML algorithms such as ensemble classifiers have been shown to learn features better and adapt more readily, with improved generalization abilities even in the case of small and imbalanced datasets. Over the past years, different ML algorithms have been used for detecting sound events and medical sounds, and the achieved results were of great significance. Classifiers such as the Support Vector Machine (SVM) have proven very effective in sound classification tasks <ref type="bibr" target="#b51">[52]</ref>; Multilayer Perceptrons (MLP) were very useful in person identification using speech and breath sounds <ref type="bibr" target="#b52">[53]</ref>, as were Hidden Markov Models (HMM) <ref type="bibr" target="#b53">[54]</ref>, logistic regression and linear discriminant analysis <ref type="bibr" target="#b54">[55]</ref>, and others. Some studies exploited the effectiveness of combining multiple simpler methods with ensemble methods such as random forests <ref type="bibr" target="#b55">[56,</ref><ref type="bibr" target="#b50">51]</ref>, XGBoost <ref type="bibr" target="#b56">[57]</ref>, and so on. 
Unfortunately, considering the complexity of sound and the occasional need to train an extremely sensitive classifier that can identify different representations of sound features, traditional ML still struggles in these kinds of tasks because its models are less complex. In this case, DL methods have proven more effective. DL methods differ from traditional ones because they can extract meaningful features from data through the application of a hierarchical structure <ref type="bibr" target="#b57">[58]</ref>; CNNs were able to achieve significant and more accurate training results <ref type="bibr" target="#b58">[59]</ref>. Researchers have tried to combine the best of these two worlds by implementing hybrid methods: for example, an SVM and a GRU-RNN were merged in <ref type="bibr" target="#b59">[60]</ref>.</p></div>
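As a concrete illustration of two of the simpler features named above, the zero-crossing rate (ZCR) and the spectral centroid can be computed directly with NumPy. This is a hedged sketch of the standard definitions, not the implementation used in the cited studies.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of the frame's spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
```

For a pure tone, the spectral centroid sits at the tone's frequency and the ZCR is roughly twice the frequency divided by the sample rate, which makes both features cheap sanity checks on audio pipelines.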
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Dataset</head><p>In the medical field, particularly for specific problems such as the one presented in this paper, data is not always freely available, or available at all. This is mostly due to privacy concerns <ref type="bibr" target="#b60">[61,</ref><ref type="bibr" target="#b61">62]</ref>. Another important reason, which is also related in some ways to privacy <ref type="bibr" target="#b61">[62]</ref>, lies in the overall low level of digitization of healthcare information <ref type="bibr" target="#b62">[63]</ref>; in fact, according to Gopal G. et al. <ref type="bibr" target="#b63">[64]</ref>, healthcare has the lowest level of digital innovation compared to other industries, such as media, finance, insurance, and retail, contributing to limited growth of labor productivity. In addition, it is worth noting that not every dataset containing the desired medical information also comes in the desired format, in which case the only remaining option is to create an entirely new dataset from scratch, which is what we did.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data Collection</head><p>The process of collecting audio data is a pivotal phase in this research. For our dataset, we aimed to collect a sufficient amount of pure, non-multimodal audio data in a waveform representation. Audio data can be stored in various formats, each with its characteristics, trade-offs, and use cases. Common audio formats include the Waveform Audio File Format (WAV), MPEG-1 Audio Layer 3 (MP3), Free Lossless Audio Codec (FLAC), and more. These formats differ in terms of compression, quality, and compatibility. For this study, we opted for the WAV format <ref type="bibr" target="#b64">[65]</ref>, an uncompressed audio file format, developed by IBM and Microsoft, that efficiently stores audio data in a waveform representation without any loss of information. The data collection process began with the identification of audio samples containing English speakers affected by Speech and Language Impairment (SLI) originating from different conditions. This diverse dataset was intentionally curated to optimize the performance of SLI detection. By including speakers with a range of impairments, the model is exposed to a broad spectrum of speech patterns and anomalies, thereby enhancing its ability to accurately detect SLI in real-world applications. To source such data, we turned to YouTube, a vast and user-friendly repository of video and audio content. The videos found were then converted into audio files in WAV format using an online converter.</p><p>We finally paired the collected data with a subset of the LibriSpeech dataset <ref type="bibr" target="#b66">[67]</ref> containing healthy English speakers only.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Data Preprocessing</head><p>To feed the waveform signals to the model, we needed to ensure that they were appropriately prepared and processed. Effective data preprocessing is fundamental to enhancing the model's performance, as it directly impacts the model's ability to extract meaningful patterns and insights from raw input data. This was performed in several steps. Firstly, we identified different time windows in each audio file to cut out unnecessary information, keeping just human speech sounds (with or without background noise). We then divided each time window into smaller ones, each containing the speech of one single person. Even though different tools exist to detect human speech, considering the scarcity of data we suffer from, we decided to perform this step manually to make sure the quality of our dataset was not affected.</p><p>Secondly, we split the resulting time windows into 3-second clips. We chose this length as a trade-off between a duration sufficient to capture fluency information and a brief clip length. Our decision was also based on the standard approach used in state-of-the-art work with wav2vec 2.0 on these kinds of tasks <ref type="bibr" target="#b67">[68,</ref><ref type="bibr" target="#b68">69]</ref>. These clips were then saved in two different subsets, creating the Train and the Test set, ensuring that the same speakers do not appear in both.</p><p>Finally, the acquired data was augmented to increase its size. We applied the following audio augmentation techniques: Time shifting, Time stretching, Pitch shifting, and Noise addition using Gaussian noise. To do so, we used the Python library audiomentations <ref type="bibr" target="#b69">[70]</ref>. 
For Time shifting, we resampled the time windows, shifting the starting time forward by 1.5 seconds; for Time stretching, we slowed down the speed of the audio by a ratio of 0.8; for Pitch shifting, we both lowered and raised the pitch by 3 tones, obtaining two additional clips for each original one; finally, for Noise addition, we added Gaussian noise with an amplitude of 0.01. Audio waveforms before and after noise addition are shown in Fig. <ref type="figure" target="#fig_0">1</ref> and Fig. <ref type="figure" target="#fig_1">2</ref>. All the augmentation techniques were applied to the original audio: Time shifting was applied directly to the time windows, while the others were applied to the initial 3-second clips.</p><p>The number of samples in the created dataset is shown in Table <ref type="table" target="#tab_1">1</ref>, while Table <ref type="table">2</ref> collects the audio data augmentation techniques used and their respective parameters.</p></div>
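The windowing step described above (cutting speech windows into fixed 3-second clips, plus the 1.5-second time-shifted variant) can be sketched as follows. Function names and the policy of dropping the trailing remainder shorter than one clip are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def split_into_clips(window, sr=16000, clip_seconds=3.0):
    """Cut a speech time window into consecutive fixed-length clips,
    dropping the trailing remainder shorter than one clip."""
    clip_len = int(sr * clip_seconds)
    n_clips = len(window) // clip_len
    return [window[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

def time_shifted_clips(window, sr=16000, shift_seconds=1.5, clip_seconds=3.0):
    """Re-cut the same window starting `shift_seconds` later, so the
    augmented clips cover different portions of the speech."""
    return split_into_clips(window[int(sr * shift_seconds):], sr, clip_seconds)
```

A 10-second window at 16 kHz, for instance, yields three 3-second clips plus two more from the shifted cut, before any of the other augmentations are applied.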
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2 Parameters used for augmentation methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Augmentations Parameters</head><p>Time-shifting: shift = +1.5 seconds; Time-stretching: ratio = 0.8; Pitch-shifting: shift = ±3 tones; Noise-addition: amplitude = 0.01 </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Data Management</head><p>The dataset contains audio files in the WAV format, so its data is affected not only by the format's advantages but also by its drawbacks. The complete dataset, which comprises both original and augmented data, was too large to be loaded in an online manner using the original files. To overcome this problem, we loaded the data in batches and concatenated them into subsets that were saved in the .arrow format <ref type="bibr" target="#b70">[71]</ref>, a columnar memory format for flat and hierarchical data, organized for efficient analytic operations. In this way, large data can be saved, loaded, and processed while avoiding memory usage problems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Models and Techniques Used</head><p>The best way to approach a problem is to know deeply every factor that influences it and how its key components work; only then can one tackle it and try to capture its essence to the fullest. In the following subsections, we present a brief description of the techniques we used and the models we implemented.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Log Mel Spectrogram</head><p>The way humans hear frequencies in sound is known as pitch; it is a subjective impression of frequency. Humans do not perceive frequencies linearly; on the contrary, they are more sensitive to differences between lower frequencies than higher ones. For example, the perceived difference between tones at 100 Hz and 200 Hz is far greater than that between 1000 Hz and 1100 Hz, even though the absolute difference is the same. In other words, humans perceive sound on a logarithmic scale rather than a linear one. The Mel Scale <ref type="bibr" target="#b71">[72]</ref> was developed to take this into account by conducting experiments with a large number of listeners. It is a scale of pitches such that each unit is judged by listeners to be equal in pitch distance from the next. The human perception of the amplitude of a sound is called loudness; similarly to frequency, loudness is also heard logarithmically rather than linearly. The Decibel scale is used to measure the loudness of a sound: for example, a sound with an amplitude of 20 dB carries ten times the power of one with an amplitude of 10 dB. We can see that, to deal with sound realistically, we need logarithmic scales, namely the Mel Scale and the Decibel scale, when dealing with the frequencies and amplitudes in our data. Spectrograms are generated from sound signals using Fourier Transforms. A Fourier Transform (FT) <ref type="bibr" target="#b72">[73]</ref> is a mathematical operation that decomposes a signal into its constituent frequencies and displays the amplitude of each frequency present in the signal. In other words, an FT converts the signal from the time domain into the frequency domain, and the result is called a spectrum. 
A spectrogram is obtained by dividing the sound signal into smaller time segments, applying the FT to each segment, and finally combining these segments into a single plot. A Mel Spectrogram makes two important changes relative to a regular spectrogram that plots frequency vs. time: it uses the Mel scale instead of frequency on the y-axis and the Decibel scale instead of amplitude to indicate color. In Fig. <ref type="figure" target="#fig_2">3</ref> we can see a normalized version of the Mel spectrogram of one of the audio clips present in the dataset.</p></div>
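The whole chain described in this section (framed FT, triangular mel filterbank, decibel conversion) can be sketched in plain NumPy. The parameter values below (16 kHz sample rate, 25 ms frames with 10 ms hop, 40 mel bands) are common illustrative defaults, not necessarily the ones used in our experiments.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=400, hop=160, n_mels=40):
    """Framed power spectrum -> mel filterbank -> decibel scale."""
    # Frame the signal and apply a Hann window before each FFT.
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spectrum = np.fft.rfft(signal[start:start + n_fft] * window)
        frames.append(np.abs(spectrum) ** 2)
    power = np.array(frames).T                        # (n_fft//2+1, n_frames)
    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    mel_power = fbank @ power
    return 10.0 * np.log10(mel_power + 1e-10)         # decibels
```

The output is an (n_mels, n_frames) matrix that can be treated like a single-channel image, which is exactly the representation fed to CNN classifiers such as the ResNet-34 discussed later.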
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Wav2vec 2.0</head><p>Wav2vec 2.0 <ref type="bibr" target="#b65">[66]</ref> is an exceptional tool that learns powerful representations from speech, mimicking the human learning experience: from the early stages of their lives, people comprehend language without labeled data, i.e. kids learn by listening to the adults around them. It is also able to outperform state-of-the-art models while using 100 times less labeled data, thus demonstrating the feasibility of training without huge amounts of labeled data, which are very hard to obtain in a field dealing with a medium as complex as audio. The model can be visualized in Fig. <ref type="figure" target="#fig_3">4</ref>; next, we describe its components.</p><p>Multi-layer convolutional feature encoder. It consists of several blocks containing a temporal convolution followed by layer normalization and a GELU activation function.</p><p>Context network. It follows the Transformer architecture; however, unlike a standard Transformer that uses fixed positional embeddings, a convolutional layer is used instead, acting as a relative positional embedding. The output of the convolution followed by a GELU is added to the inputs, and then a layer normalization is applied.</p><p>Quantization module. It discretizes the output of the feature encoder into a finite set of speech representations via product quantization. Product quantization amounts to choosing quantized representations from multiple codebooks and concatenating them. The Gumbel softmax enables choosing discrete codebook entries in a fully differentiable way.</p><p>The feature encoder 𝑓 : 𝑋 → 𝑍 takes as input the raw waveform 𝑋 and outputs the latent speech representations 𝑧1, ..., 𝑧𝑇 for 𝑇 time steps; these are then fed to the transformer 𝑔 : 𝑍 → 𝐶, which captures information from the entire sequence and outputs context representations. 
The output of the feature encoder is also discretized to 𝑞𝑡 with a quantization module to represent the targets in the self-supervised objective. During the model's pre-training, a part of the latent speech representations generated by the feature encoder is masked, and the model learns representations of speech audio by solving a contrastive task, which requires identifying the true quantized latent speech representation for a masked time step within a set of distractors. After pre-training on unlabeled speech, the model is fine-tuned on labeled data with a Connectionist Temporal Classification (CTC) loss.</p></div>
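To make the quantization module concrete, here is a small NumPy sketch of Gumbel-softmax codebook selection and product quantization. The shapes, temperature, and codebook sizes are toy values chosen for illustration; they are not the ones used by wav2vec 2.0.

```python
import numpy as np

def gumbel_softmax_select(logits, codebook, tau=0.5, rng=None):
    """Softly select one row of `codebook` (V x d) from scores `logits` (V).
    As tau -> 0 this approaches hard argmax selection while remaining
    differentiable with respect to the logits."""
    if rng is None:
        rng = np.random.default_rng(0)
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-10) + 1e-10)
    scores = logits + gumbel
    scores = (scores - scores.max()) / tau   # subtract max: stable at low tau
    weights = np.exp(scores)
    weights /= weights.sum()
    return weights @ codebook

def product_quantize(logits_per_group, codebooks, tau=0.5, rng=None):
    """Pick one entry from each of G codebooks and concatenate the picks,
    which is the essence of product quantization."""
    if rng is None:
        rng = np.random.default_rng(0)
    return np.concatenate([
        gumbel_softmax_select(lg, cb, tau, rng)
        for lg, cb in zip(logits_per_group, codebooks)
    ])
```

With G codebooks of V entries each, the concatenated output ranges over V^G possible quantized vectors, which is why a modest number of small codebooks yields a very large discrete target space.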
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Classification Methods</head><p>Classification is the most visible part of a model, since it outputs the labels used to compute the evaluation metrics. In our case, however, the classifiers are just the final piece of the puzzle: most of the work is done in the earlier steps of the pipeline. Still, the types of classifier we used deserve some attention.</p><p>The Support Vector Machine (SVM) <ref type="bibr" target="#b73">[74]</ref> is one of the first algorithms every ML practitioner learns. It is simple yet can achieve excellent results, especially on small amounts of data, where other ML algorithms tend to struggle. Its objective is to find a hyperplane in an N-dimensional space (𝑁 being the number of features) that distinctly separates the data points. Among the many possible separating hyperplanes, the SVM finds the one with the maximum margin, i.e. the maximum distance between the data points of the two classes. Maximizing the margin provides some reinforcement, so that future data points can be classified with more confidence.
The biggest difficulty we encountered when testing the SVM is that, even with small amounts of data, the model ran into memory issues, since the extracted audio features are extremely large; moreover, the SVM excels on data with few classes, and the size of the feature space made it hard to fully exploit its strengths.</p><p>One of the simplest and most efficient ways to generate labels from an ML model is to add a linear layer at the end of the pipeline. That is what we did with our wav2vec 2.0 feature extractor: we included a linear classifier 𝑓 (𝑥𝑖, 𝑊, 𝑏) = 𝑊 • 𝑥𝑖 + 𝑏 and trained its weights to output two types of labels, one for people affected by an SLI and one for the others.</p><p>Resnet34 is a well-known residual neural network, pre-trained on ImageNet-1k and released by Microsoft <ref type="bibr" target="#b74">[75]</ref>. Thanks to residual learning and skip connections, this type of model can be much deeper than ordinary convolutional neural networks. We decided to fine-tune it on the log mel spectrogram features extracted from our dataset.</p></div>
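The maximum-margin classification just described can be illustrated with scikit-learn's SVC on synthetic stand-in features (real wav2vec 2.0 embeddings are far larger); the standardization step mirrors the one used later in Section 5.1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class "audio feature" vectors standing in for extracted
# embeddings: one class is offset from the other so a separating
# hyperplane exists.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 32)),    # healthy
               rng.normal(1.5, 1.0, (50, 32))])   # SLI
y = np.array([0] * 50 + [1] * 50)

X = StandardScaler().fit_transform(X)  # zero mean, unit variance
clf = SVC(kernel="linear").fit(X, y)   # maximum-margin separating hyperplane
print(clf.score(X, y))                 # training accuracy on the toy data
```

With well-separated toy clusters the linear kernel fits easily; the memory problems mentioned above only appear with real, high-dimensional audio feature matrices.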
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>In this section, we describe in detail the different architectures we tested and then comment on the results obtained.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Architectures</head><p>Our first approach used the wav2vec 2.0 model, in particular the pre-trained wav2vec2-base model from HuggingFace <ref type="bibr" target="#b75">[76]</ref>, to perform feature extraction on the pre-processed, non-augmented dataset, and then an SVM, the Support Vector Classifier (SVC) model from scikit-learn <ref type="bibr" target="#b76">[77]</ref>, to classify the extracted features. As explained in Section 4, wav2vec 2.0 takes a raw waveform as input, 3-second clips in WAV format in our case, and extracts audio features from it according to what it learned during pre-training. The extracted features were then standardized with the StandardScaler from scikit-learn, removing the mean and scaling to unit variance. Standardization is a common requirement for many ML estimators: they may behave badly if the individual features do not look more or less like standard normally distributed data (e.g. Gaussian with zero mean and unit variance). Finally, we fitted the SVM with a linear kernel.</p><p>Using the SVM as classifier was our first attempt to cope with the limited number of samples at our disposal. Once the dataset was augmented we abandoned the SVM, due to its intrinsic limitations with large datasets, and opted for a fully DL approach.</p><p>For our second architecture, we kept the wav2vec 2.0 model for feature extraction, this time on the augmented dataset, and replaced the classifier head with a simple Fully Connected (FC), or linear, layer.
We trained the model for 5 epochs with the HuggingFace Trainer class, using batches of 32 samples, a learning rate of 2𝑒 − 5 reached after a warm-up period with ratio 0.1, and a linear decay until the end of training.</p><p>The last architecture tested was a CNN, more precisely resnet34, which received as input the log mel spectrogram of the audio clips and generated as output their labels. All the procedures to extract the spectrogram were carried out with the librosa library: first the sample was resampled at a rate of 22050 Hz, then the extracted mel spectrogram was normalized and finally scaled. Regarding the CNN, only the last layer was modified: it was replaced with a linear layer with two output channels, and the whole model was fine-tuned without freezing the previous layers. Training ran for 50 epochs; the learning rate started at 2𝑒 − 4 and decayed by a factor of 10 every 10 epochs, and the loss function was the CrossEntropyLoss. All parameters used to compute the spectrogram are shown in Table <ref type="table" target="#tab_2">3</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Evaluations</head><p>In Table <ref type="table">4</ref> we show the accuracy of our architectures, compared with other architectures from the literature <ref type="bibr" target="#b77">[78]</ref><ref type="bibr" target="#b78">[79]</ref><ref type="bibr" target="#b79">[80]</ref>. As we can see, the first model is the one with the lowest score. This means that, despite the ability of the SVM to avoid overfitting on the small quantity of data provided, it cannot accurately detect the speakers affected by SLI. This is probably due to the magnitude of the feature space extracted by the wav2vec 2.0 model. Using, instead, the augmented dataset together with a DL approach, we manage to reach a very high accuracy, the highest among our models. The wav2vec 2.0 feature extractor, having enough data to work with, managed to extract the key features and information needed to correctly identify whether a voice belongs to a healthy or an impaired speaker.</p><p>The CNN model fine-tuned on Log Mel Spectrum features also achieved high accuracy in labeling samples; unfortunately, a closer analysis of the confusion matrices shown in Figures 5, 6, and 7 revealed that the number of false negatives is extremely high compared to the false positives. In the medical field, especially for tools that assist diagnosis, it is crucial to keep false negatives to a minimum: an undetected disease is much worse than a false positive, and if medical operators miss vital anomalies they will, in time, lose trust in the system. In our case recall therefore matters far more than precision; from Table <ref type="table" target="#tab_3">5</ref> we can see that the CNN model reaches a recall of only 0.85, while wav2vec 2.0 achieves a better recall and F1 score.</p></div>
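The false-negative argument can be made concrete by computing the scores from confusion-matrix counts. The counts below are a hypothetical reconstruction consistent with the 724 SLI test samples of Table 1 and the CNN row of Table 5, not figures reported directly in the paper:

```python
# Precision, recall and F1 from confusion-matrix counts. The tp/fp/fn
# values are illustrative: a classifier with almost no false positives
# but many false negatives, the CNN-like failure mode discussed above.
def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # the metric that matters most here
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = prf(tp=616, fp=1, fn=108)   # tp + fn = 724 SLI test samples
print(round(p, 4), round(r, 4), round(f1, 4))
```

High precision with depressed recall drags F1 down, which is why the wav2vec 2.0 + FC model, with balanced precision and recall, is preferable for a diagnostic aid.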
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Limitations and Future Works</head><p>It is critically important to examine our achievements and acknowledge the constraints that affect our work. While our research has yielded promising results, the following section delves into the limitations that shape those results and lays a foundation for possible future improvements.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.">Limitations</head><p>The lack of data quantity and quality is one of our major constraints. The problem of data scarcity has already been addressed in section 3, so we now turn to quality.</p><p>In the realm of ML and DL, it is well documented that low-quality data and disparities in data collection methodologies exacerbate the inherent biases within the data used for training; a clear example is given by the societal or political biases reflected in word embeddings or large language models <ref type="bibr" target="#b80">[81,</ref><ref type="bibr" target="#b81">82]</ref>. When the training data exhibits significant variations in quality and collection techniques, it becomes more vulnerable to such intrinsic biases, which can then propagate through the training process, affecting the performance and fairness of ML and DL algorithms and, given the broad accessibility of these tools, leading to further disparities and discrimination in the real world <ref type="bibr" target="#b82">[83,</ref><ref type="bibr" target="#b83">84]</ref>. In particular, our collection of English speakers affected by SLI has the limitation of containing mostly speakers with American accents. In real-world applications this can degrade the model's performance: for example, the algorithm could achieve better results with American speakers than with Mexican ones, or with other English-speaking minority ethnic groups whose accent differs from the standard American one <ref type="bibr" target="#b83">[84]</ref>.</p><p>Another limitation of our dataset is that it does not contain child speakers.
This is because finding such materials on the web is often difficult, and creating them from scratch is even harder, due to the small number of certified children affected by SLI and, since they are minors, to stricter privacy concerns. The most used dataset in this field <ref type="bibr" target="#b84">[85]</ref> consists of one-second clips of Czech-speaking children, both healthy and affected by SLI. Although this dataset could be useful for the detection of SLI, it is limited to the Czech language and to child speakers. This kind of limitation is common in the healthcare field, especially in SLI detection. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.">Future Works</head><p>Future work should focus on the creation of a new dataset comprising speakers of different languages, since to our knowledge it is not yet known whether fluency problems generalize across languages, and covering a wide age range, given that the features and overall characteristics of the voice differ between children and adults because of their anatomical differences <ref type="bibr" target="#b85">[86]</ref>.</p><p>Given the technological advancement in the field of generative audio, with astonishing tools such as the audio manipulation software produced by ElevenLabs <ref type="bibr" target="#b86">[87]</ref>, which can clone voices, generate new ones, translate them into other languages, and make them read texts, new kinds of audio enhancement can be experimented with. Although such tools cannot be used yet, because they cannot replicate stuttering or the other fluency features that characterize people affected by SLI, they are promising and worth considering for the near future.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusions</head><p>This work proposes a novel approach to Speech and Language Impairment (SLI) detection, based solely on audio and AI audio-based techniques, together with an entirely new dataset composed of English speakers affected by SLI. The results show that, even with some limitations related to the scarcity of available data, Deep Learning methods can accurately distinguish healthy from impaired speakers. In particular, wav2vec 2.0 with a Fully Connected layer as the classification head reaches an accuracy of over 96% on our test set. Our findings also confirm that audio data augmentation techniques are fundamental to training Deep Learning models adequately. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Audio sample from our dataset</figDesc><graphic coords="4,89.29,84.19,203.37,78.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Audio sample with noise from our dataset</figDesc><graphic coords="4,89.29,198.01,203.37,78.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Log Mel Spectrogram of a sample from our dataset</figDesc><graphic coords="5,89.29,182.45,203.36,168.18" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Wav2vec 2.0 pipeline</figDesc><graphic coords="6,89.29,84.19,203.36,109.14" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Wav2vec 2.0 + SVM confusion matrix</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :Figure 7 :</head><label>67</label><figDesc>Figure 6: Wav2vec 2.0 + FC confusion matrix</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>the audio domain. As a consequence, researchers often have to deal with datasets of insufficient size or quality. Usually, diagnosis of this type of problem is carried out with human experts, with special in-loco tests</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Dataset samples</figDesc><table><row><cell></cell><cell></cell><cell>Train</cell><cell></cell><cell>Test</cell></row><row><cell></cell><cell>SLI</cell><cell>Healthy</cell><cell>SLI</cell><cell>Healthy</cell></row><row><cell cols="2">Non-augmented 1010</cell><cell>1010</cell><cell>124</cell><cell>125</cell></row><row><cell>Time-shifted</cell><cell>893</cell><cell>1010</cell><cell>104</cell><cell>125</cell></row><row><cell>Time-stretched</cell><cell>1010</cell><cell>1010</cell><cell>124</cell><cell>125</cell></row><row><cell>Pitch-shifted</cell><cell>2020</cell><cell>2020</cell><cell>248</cell><cell>250</cell></row><row><cell>Noise-addition</cell><cell>1010</cell><cell>1010</cell><cell>124</cell><cell>125</cell></row><row><cell>Total</cell><cell>5943</cell><cell>6060</cell><cell>724</cell><cell>750</cell></row><row><cell>Dataset</cell><cell></cell><cell>12003</cell><cell></cell><cell>1474</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Parameters used to compute the Spectrogram</figDesc><table><row><cell cols="2">Log Mel Spectrum Parameters</cell></row><row><cell>Sample rate</cell><cell>22050</cell></row><row><cell>Window length</cell><cell>2048</cell></row><row><cell>Hop length</cell><cell>512</cell></row><row><cell>N mels</cell><cell>128</cell></row><row><cell>Table 4</cell><cell></cell></row><row><cell>Architectures Accuracy</cell><cell></cell></row><row><cell>Models</cell><cell>Accuracy</cell></row><row><cell>LASSO (Full Model) [78]</cell><cell>0.84</cell></row><row><cell>1NN CHI Strategy [79]</cell><cell>0.8832</cell></row><row><cell>LMT BL Strategy [79]</cell><cell>0.9269</cell></row><row><cell>MLP BL Strategy [79]</cell><cell>0.9013</cell></row><row><cell>NB BL Strategy [79]</cell><cell>0.9269</cell></row><row><cell>CNN [80]</cell><cell>0.8421</cell></row><row><cell>Wav2vec2.0 + SVM (ours)</cell><cell>0.6627</cell></row><row><cell>Wav2vec2.0 + FC (ours)</cell><cell>0.9661</cell></row><row><cell>Log Mel Spectrogram + CNN (ours)</cell><cell>0.9362</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5</head><label>5</label><figDesc>Architectures overall performances</figDesc><table><row><cell>Model</cell><cell cols="2">F1 Score Precision</cell><cell>Recall</cell></row><row><cell>Wav2vec2.0 + SVM</cell><cell>0.6316</cell><cell>0.6923</cell><cell>0.5806</cell></row><row><cell>Wav2vec2.0 + FC</cell><cell>0.9655</cell><cell>0.9641</cell><cell>0.9668</cell></row><row><cell>Spectrogram + CNN</cell><cell>0.9187</cell><cell>0.9983</cell><cell>0.8508</cell></row></table></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A deep learning approach for sound event recognition using a brain inspired representation, IEEE Transactions on Information Forensics and Security 15 (2020) 3610-3624. doi:10.1109/TIFS.2020. 2994740.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Machine learning and natural language processing in mental health: Systematic review</title>
		<author>
			<persName><forename type="first">A</forename><surname>Le Glaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Haralambous</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D.-H</forename><surname>Kim-Dufor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lenca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Billot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">C</forename><surname>Ryan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Marsh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Devylder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Walter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Berrouiguet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lemey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J Med Internet Res</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page">e15708</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Computer vision and artificial intelligence in precision agriculture for grain crops: A systematic review</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">I</forename><surname>Patrício</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rieder</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.compag.2018.08.001</idno>
		<ptr target="https://doi.org/10.1016/j.compag.2018.08.001" />
	</analytic>
	<monogr>
		<title level="j">Computers and Electronics in Agriculture</title>
		<imprint>
			<biblScope unit="volume">153</biblScope>
			<biblScope unit="page" from="69" to="81" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Enhancing sentiment analysis on seed-iv dataset with vision transformers: A comparative study</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">E</forename><surname>Tibermacine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tibermacine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Guettala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Napoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Russo</surname></persName>
		</author>
		<idno type="DOI">10.1145/3638985.3639024</idno>
	</analytic>
	<monogr>
		<title level="m">ACM International Conference Proceeding Series</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="238" to="246" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Harnessing the heart of big data</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Scruggs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Watson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hermjakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Yates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Lindsey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ping</surname></persName>
		</author>
		<idno type="DOI">10.1161/CIRCRESAHA.115.306013</idno>
	</analytic>
	<monogr>
		<title level="j">Circulation Research</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">116</biblScope>
			<biblScope unit="page" from="1115" to="1119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Eyetracking system with low-end hardware: Development and evaluation</title>
		<author>
			<persName><forename type="first">E</forename><surname>Iacobelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ponzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Russo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Napoli</surname></persName>
		</author>
		<idno type="DOI">10.3390/info14120644</idno>
	</analytic>
	<monogr>
		<title level="j">Information</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Novel approach toward medical signals classifier</title>
		<author>
			<persName><forename type="first">M</forename><surname>Woźniak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Połap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Nowicki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Napoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pappalardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tramontana</surname></persName>
		</author>
		<idno type="DOI">10.1109/IJCNN.2015.7280556</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Joint Conference on Neural Networks</title>
				<meeting>the International Joint Conference on Neural Networks</meeting>
		<imprint>
			<date type="published" when="2015-09">September 2015. 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">An interpretable deep learning model for automatic sound classification</title>
		<author>
			<persName><forename type="first">P</forename><surname>Zinemanas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rocamora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Miron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Font</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Serra</surname></persName>
		</author>
		<idno type="DOI">10.3390/electronics10070850</idno>
		<ptr target="https://www.mdpi.com/2079-9292/10/7/850" />
	</analytic>
	<monogr>
		<title level="j">Electronics</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Room Identification with Personal Voice Assistants (Extended Abstract</title>
		<author>
			<persName><forename type="first">M</forename><surname>Azimi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Roedig</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-95484-0_19</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="317" to="327" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A domain-specific generative chatbot trained from little data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kapočiūtė-Dzikienė</surname></persName>
		</author>
		<idno type="DOI">10.3390/app10072221</idno>
		<ptr target="https://www.mdpi.com/2076-3417/10/7/2221" />
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tariq</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lee</surname></persName>
		</author>
		<idno type="DOI">10.1109/BigData.2018.8622587</idno>
		<title level="m">Audio iot analytics for home automation safety</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="5181" to="5186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An advanced solution based on machine learning for remote emdr therapy</title>
		<author>
			<persName><forename type="first">F</forename><surname>Fiani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Russo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Napoli</surname></persName>
		</author>
		<idno type="DOI">10.3390/technologies11060172</idno>
	</analytic>
	<monogr>
		<title level="j">Technologies</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A review of the application of acoustic emission technique in engineering</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gholizadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Leman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">T</forename><surname>Baharudin</surname></persName>
		</author>
		<idno type="DOI">10.12989/sem.2015.54.6.1075</idno>
	</analytic>
	<monogr>
		<title level="j">Structural Engineering and Mechanics</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="1075" to="1095" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Audio classification techniques in home environments for elderly/dependant people</title>
		<author>
			<persName><forename type="first">H</forename><surname>Lozano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Hernáez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Picón</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Camarena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Navas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computers Helping People with Special Needs</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Miesenberger</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Klaus</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">W</forename><surname>Zagler</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Karshmer</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg; Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="320" to="323" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Acoustic monitoring in terrestrial environments using microphone arrays: applications, technological considerations and prospectus</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">T</forename><surname>Blumstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Mennill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Clemins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Girod</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Patricelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Deppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Krakauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Cortopassi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">F</forename><surname>Hanser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mc-Cowan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N G</forename><surname>Kirschel</surname></persName>
		</author>
		<idno type="DOI">10.1111/j.1365-2664.2011.01993.x</idno>
		<ptr target="https://doi.org/10.1111/j.1365-2664.2011.01993.x" />
	</analytic>
	<monogr>
		<title level="j">Journal of Applied Ecology</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page" from="758" to="767" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">V M</forename><surname>Bishop</surname></persName>
		</author>
		<title level="m">Uncommon understanding: Development and disorders of language comprehension in children</title>
				<imprint>
			<publisher>Psychology Press/Erlbaum (UK) Taylor and Francis</publisher>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">The grammatical characterization of developmental dysphasia</title>
		<author>
			<persName><forename type="first">H</forename><surname>Clahsen</surname></persName>
		</author>
		<idno type="DOI">10.1515/ling.1989.27.5.897</idno>
		<ptr target="https://doi.org/10.1515/ling.1989.27.5.897" />
		<imprint>
			<date type="published" when="1989">1989</date>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="897" to="920" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Specific language impairment as a period of extended optional infinitive</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Rice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wexler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">L</forename><surname>Cleave</surname></persName>
		</author>
		<idno type="DOI">10.1044/jshr.3804.850</idno>
		<ptr target="https://doi.org/10.1044/jshr.3804.850" />
	</analytic>
	<monogr>
		<title level="j">Journal of Speech, Language, and Hearing Research</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="850" to="863" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Domain-specific cognitive systems: insight from grammatical-SLI</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">K</forename><surname>Van Der Lely</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.tics.2004.12.002</idno>
		<ptr target="https://doi.org/10.1016/j.tics.2004.12.002" />
	</analytic>
	<monogr>
		<title level="j">Trends in Cognitive Sciences</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="53" to="59" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Ten questions about terminology for children with unexplained language problems</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">V M</forename><surname>Bishop</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Lang. Commun. Disord</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="381" to="415" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Long-term consistency in speech/language profiles: II. behavioral, emotional, and social outcomes</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Beitchman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">B</forename><surname>Brownlie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Walters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Inglis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lancee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Am. Acad. Child Adolesc. Psychiatry</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="815" to="825" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Social and behavioral characteristics of preschoolers with specific language impairment</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Stanton-Chapman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Justice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">E</forename><surname>Skibbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">L</forename><surname>Grant</surname></persName>
		</author>
		<idno type="DOI">10.1177/02711214070270020501</idno>
		<ptr target="https://doi.org/10.1177/02711214070270020501" />
	</analytic>
	<monogr>
		<title level="j">Topics in Early Childhood Special Education</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="98" to="109" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Wagner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schiller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Seiderer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>André</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:52192644" />
		<title level="m">Deep learning in paralinguistic recognition tasks: Are hand-crafted features still relevant?</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>Interspeech</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Learning environmental sounds with end-to-end convolutional neural network</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Tokozume</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Harada</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICASSP.2017.7952651</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="2721" to="2725" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">SampleCNN: End-to-end deep convolutional neural networks using very small filters for music classification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Nam</surname></persName>
		</author>
		<idno type="DOI">10.3390/app8010150</idno>
		<ptr target="https://www.mdpi.com/2076-3417/8/1/150" />
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Identifying children with clinical language disorder: An application of machine-learning classification</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Justice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-Y</forename><surname>Ahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A R</forename><surname>Logan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Learn. Disabil</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="351" to="365" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Reinforcement learning in young adults with developmental language impairment</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">B</forename><surname>Tomblin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Brain Lang</title>
		<imprint>
			<biblScope unit="volume">123</biblScope>
			<biblScope unit="page" from="154" to="163" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">EEG-based identification of learning disabilities using machine learning algorithms</title>
		<author>
			<persName><forename type="first">N</forename><surname>Ahire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wagh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J Neurol Disord</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Data augmentation and deep learning methods in sound classification: A systematic review</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">O</forename><surname>Abayomi-Alli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Damaševičius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Qazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Adedoyin-Olowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Misra</surname></persName>
		</author>
		<idno type="DOI">10.3390/electronics11223795</idno>
		<ptr target="https://www.mdpi.com/2079-9292/11/22/3795" />
	</analytic>
	<monogr>
		<title level="j">Electronics</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Using modularity metrics to assist move method refactoring of large systems</title>
		<author>
			<persName><forename type="first">C</forename><surname>Napoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pappalardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tramontana</surname></persName>
		</author>
		<idno type="DOI">10.1109/CISIS.2013.96</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2013 7th International Conference on Complex, Intelligent, and Software Intensive Systems, CISIS 2013</title>
				<meeting>2013 7th International Conference on Complex, Intelligent, and Software Intensive Systems, CISIS 2013</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="529" to="534" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Real-time cloud-based game management system via cuckoo search algorithm</title>
		<author>
			<persName><forename type="first">D</forename><surname>Połap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Woźniak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Napoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tramontana</surname></persName>
		</author>
		<idno type="DOI">10.1515/eletel-2015-0043</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Electronics and Telecommunications</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="page" from="333" to="338" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Spectral images based environmental sound classification using CNN with meaningful data augmentation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Mushtaq</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-F</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q.-V</forename><surname>Tran</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.apacoust.2020.107581</idno>
		<ptr target="https://doi.org/10.1016/j.apacoust.2020.107581" />
	</analytic>
	<monogr>
		<title level="j">Applied Acoustics</title>
		<imprint>
			<biblScope unit="volume">172</biblScope>
			<biblScope unit="page">107581</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Graphic object feature extraction system based on cuckoo search algorithm</title>
		<author>
			<persName><forename type="first">M</forename><surname>Woźniak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Połap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Napoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tramontana</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2016.08.068</idno>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">66</biblScope>
			<biblScope unit="page" from="20" to="31" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">Invariance reduces variance: Understanding data augmentation in deep learning and beyond</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dobriban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.10905</idno>
		<ptr target="https://api.semanticscholar.org/CorpusID:198895147" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">A survey on image data augmentation for deep learning</title>
		<author>
			<persName><forename type="first">C</forename><surname>Shorten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Khoshgoftaar</surname></persName>
		</author>
		<idno type="DOI">10.1186/s40537-019-0197-0</idno>
		<ptr target="https://doi.org/10.1186/s40537-019-0197-0" />
	</analytic>
	<monogr>
		<title level="j">Journal of Big Data</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page">60</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Simplified firefly algorithm for 2d image key-points search</title>
		<author>
			<persName><forename type="first">C</forename><surname>Napoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pappalardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tramontana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Marszalek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Polap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wozniak</surname></persName>
		</author>
		<idno type="DOI">10.1109/CIHLI.2014.7013395</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium on Computational Intelligence for Human-Like Intelligence, Proceedings</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note>IEEE SSCI 2014 - 2014 IEEE Symposium Series on Computational Intelligence - CIHLI 2014</note>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<title level="m" type="main">AReN</title>
		<author>
			<persName><forename type="first">A</forename><surname>Greco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Petkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Saggese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vento</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Is the colony of ants able to recognize graphic objects?</title>
		<author>
			<persName><forename type="first">D</forename><surname>Połap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Woźniak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Napoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tramontana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Damaševičius</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-24770-0_33</idno>
	</analytic>
	<monogr>
		<title level="j">Communications in Computer and Information Science</title>
		<imprint>
			<biblScope unit="volume">538</biblScope>
			<biblScope unit="page" from="376" to="387" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Environmental sound classification using a regularized deep convolutional neural network with data augmentation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Mushtaq</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-F</forename><surname>Su</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.apacoust.2020.107389</idno>
		<ptr target="https://doi.org/10.1016/j.apacoust.2020.107389" />
	</analytic>
	<monogr>
		<title level="j">Applied Acoustics</title>
		<imprint>
			<biblScope unit="volume">167</biblScope>
			<biblScope unit="page">107389</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Analysis of DNN speech signal enhancement for robust speaker recognition</title>
		<author>
			<persName><forename type="first">O</forename><surname>Novotny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Plchot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Glembek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Cernocky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Burget</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.csl.2019.06.004</idno>
		<ptr target="https://doi.org/10.1016/j.csl.2019.06.004" />
	</analytic>
	<monogr>
		<title level="j">Computer Speech and Language</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="page" from="403" to="421" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">An evolutionary-based generative approach for audio data augmentation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mertes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Baird</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schiller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">W</forename><surname>Schuller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>André</surname></persName>
		</author>
		<idno type="DOI">10.1109/MMSP48831.2020.9287156</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">SpecAugment: A simple data augmentation method for automatic speech recognition</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-C</forename><surname>Chiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zoph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">D</forename><surname>Cubuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<idno type="DOI">10.21437/interspeech.2019-2680</idno>
		<ptr target="https://doi.org/10.21437/interspeech.2019-2680" />
	</analytic>
	<monogr>
		<title level="m">Interspeech 2019</title>
				<imprint>
			<publisher>ISCA</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Data augmentation for internet of things dialog system</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">K</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J P C</forename><surname>Rodrigues</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11036-020-01638-9</idno>
		<ptr target="https://doi.org/10.1007/s11036-020-01638-9" />
	</analytic>
	<monogr>
		<title level="j">Mobile Networks and Applications</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="158" to="171" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<analytic>
		<title level="a" type="main">Transferring cross-corpus knowledge: An investigation on data augmentation for heart sound classification</title>
		<author>
			<persName><forename type="first">T</forename><surname>Koike</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">W</forename><surname>Schuller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yamamoto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Annual International Conference of the IEEE Engineering in Medicine &amp; Biology Society (EMBC)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<analytic>
		<title level="a" type="main">Respiratory diseases recognition through respiratory sound with the help of deep neural network</title>
		<author>
			<persName><forename type="first">V</forename><surname>Basu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rana</surname></persName>
		</author>
		<idno type="DOI">10.1109/CINE48825.2020.234388</idno>
	</analytic>
	<monogr>
		<title level="m">4th International Conference on Computational Intelligence and Networks (CINE)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<analytic>
		<title level="a" type="main">LDA-based data augmentation algorithm for acoustic scene classification</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Leng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.knosys.2020.105600</idno>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">195</biblScope>
			<biblScope unit="page">105600</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b45">
	<analytic>
		<title level="a" type="main">Speech emotion recognition using data augmentation</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">M</forename><surname>Praseetha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">P</forename><surname>Joby</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10772-021-09883-3</idno>
		<ptr target="https://doi.org/10.1007/s10772-021-09883-3" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Speech Technology</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="783" to="792" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b46">
	<analytic>
		<title level="a" type="main">Investigation of multilingual and mixed-lingual emotion recognition using enhanced cues with data augmentation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lalitha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zakariah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">A</forename><surname>Alotaibi</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.apacoust.2020.107519</idno>
		<ptr target="https://doi.org/10.1016/j.apacoust.2020.107519" />
	</analytic>
	<monogr>
		<title level="j">Applied Acoustics</title>
		<imprint>
			<biblScope unit="volume">170</biblScope>
			<biblScope unit="page">107519</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b47">
	<analytic>
		<title level="a" type="main">A bag-of-audio-words approach for snore sounds&apos; excitation localisation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Schmitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Janott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Pandit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Heiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Hemmert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schuller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Speech Communication; 12</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
	<note>ITG Symposium</note>
</biblStruct>

<biblStruct xml:id="b48">
	<analytic>
		<title level="a" type="main">Optimal window and lattice in Gabor transform. Application to audio analysis</title>
		<author>
			<persName><forename type="first">H</forename><surname>Lachambre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ricaud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stempfel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Torrésani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wiesmeyr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Onchis-Moaca</surname></persName>
		</author>
		<idno type="DOI">10.1109/SYNASC.2015.25</idno>
	</analytic>
	<monogr>
		<title level="m">17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="109" to="112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<analytic>
		<title level="a" type="main">User-adaptive models for activity and emotion recognition using deep transfer learning and data augmentation</title>
		<author>
			<persName><forename type="first">E</forename><surname>Garcia-Ceja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Kvernberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Torresen</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11257-019-09248-1</idno>
		<ptr target="https://doi.org/10.1007/s11257-019-09248-1" />
	</analytic>
	<monogr>
		<title level="j">User Modeling and User-Adapted Interaction</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="365" to="393" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b50">
	<analytic>
		<title level="a" type="main">Experimental design and analysis of sound event detection systems: Case studies</title>
		<author>
			<persName><forename type="first">H</forename><surname>Ykhlef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ykhlef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chiboub</surname></persName>
		</author>
		<idno type="DOI">10.1109/ISPA48434.2019.8966798</idno>
	</analytic>
	<monogr>
		<title level="m">6th International Conference on Image and Signal Processing and their Applications (ISPA)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b51">
	<analytic>
		<title level="a" type="main">Investigation of multilingual and mixed-lingual emotion recognition using enhanced cues with data augmentation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lalitha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zakariah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">A</forename><surname>Alotaibi</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.apacoust.2020.107519</idno>
		<ptr target="https://doi.org/10.1016/j.apacoust.2020.107519" />
	</analytic>
	<monogr>
		<title level="j">Applied Acoustics</title>
		<imprint>
			<biblScope unit="volume">170</biblScope>
			<biblScope unit="page">107519</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b52">
	<analytic>
		<title level="a" type="main">Stethoscope-sensed speech and breath-sounds for person identification with sparse training data</title>
		<author>
			<persName><forename type="first">V.-T</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-H</forename><surname>Tsai</surname></persName>
		</author>
		<idno type="DOI">10.1109/JSEN.2019.2945364</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Sensors Journal</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="848" to="859" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b53">
	<analytic>
		<title level="a" type="main">Data augmentation using virtual microphone array synthesis and multi-resolution feature extraction for isolated word dysarthric speech recognition</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A M</forename><surname>Celin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nagarajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vijayalakshmi</surname></persName>
		</author>
		<idno type="DOI">10.1109/JSTSP.2020.2972161</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Journal of Selected Topics in Signal Processing</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="346" to="354" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b54">
	<analytic>
		<title level="a" type="main">Urban sound event classification based on local and global features aggregation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kobayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Murakawa</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.apacoust.2016.08.002</idno>
		<ptr target="https://doi.org/10.1016/j.apacoust.2016.08.002" />
	</analytic>
	<monogr>
		<title level="j">Applied Acoustics</title>
		<imprint>
			<biblScope unit="volume">117</biblScope>
			<biblScope unit="page" from="246" to="256" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>acoustics in Smart Cities</note>
</biblStruct>

<biblStruct xml:id="b55">
	<analytic>
		<title level="a" type="main">CoughGAN: Generating synthetic coughs that improve respiratory disease classification</title>
		<author>
			<persName><forename type="first">V</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Vatanparvar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Nemati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Nathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Rahman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kuang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annu Int Conf IEEE Eng Med Biol Soc</title>
		<imprint>
			<biblScope unit="page" from="5682" to="5688" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b56">
	<analytic>
		<title level="a" type="main">Data augmentation using GAN for sound based COVID-19 diagnosis</title>
		<author>
			<persName><forename type="first">N</forename><surname>Yella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rajan</surname></persName>
		</author>
		<idno type="DOI">10.1109/IDAACS53288.2021.9660990</idno>
	</analytic>
	<monogr>
		<title level="m">11th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS)</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="606" to="609" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b57">
	<analytic>
		<title level="a" type="main">Neural network prediction of sound quality via domain knowledge-based data augmentation and bayesian approach with small data sets</title>
		<author>
			<persName><forename type="first">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ymssp.2021.107713</idno>
		<ptr target="https://doi.org/10.1016/j.ymssp.2021.107713" />
	</analytic>
	<monogr>
		<title level="j">Mechanical Systems and Signal Processing</title>
		<imprint>
			<biblScope unit="volume">157</biblScope>
			<biblScope unit="page">107713</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b58">
	<analytic>
		<title level="a" type="main">Musical instrument tagging using data augmentation and effective noisy data processing</title>
		<author>
			<persName><forename type="first">D</forename><surname>Koszewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kostek</surname></persName>
		</author>
		<idno type="DOI">10.17743/jaes.2019.0050</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the Audio Engineering Society</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="page" from="57" to="65" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b59">
	<analytic>
		<title level="a" type="main">Snore-GANs: Improving automatic snore sound classification with synthesized data</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Janott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schuller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE J Biomed Health Inform</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="300" to="310" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b60">
	<analytic>
		<title level="a" type="main">Global healthcare fairness: We should be sharing more, not less, data</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">P</forename><surname>Seastedt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Schwab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>O'Brien</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wakida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Herrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G F</forename><surname>Marcelo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Agha-Mir-Salim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">B</forename><surname>Frigola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">B</forename><surname>Ndulue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Marcelo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Celi</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pdig.0000102</idno>
		<ptr target="https://doi.org/10.1371/journal.pdig.0000102" />
	</analytic>
	<monogr>
		<title level="j">PLOS Digital Health</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b61">
	<analytic>
		<title level="a" type="main">Digitization of healthcare sector: A study on privacy and security concerns</title>
		<author>
			<persName><forename type="first">M</forename><surname>Paul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Maglaras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Ferrag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Almomani</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.icte.2023.02.007</idno>
		<ptr target="https://doi.org/10.1016/j.icte.2023.02.007" />
	</analytic>
	<monogr>
		<title level="j">ICT Express</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="571" to="588" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b62">
	<analytic>
		<title level="a" type="main">Digital transformation in healthcare: Technology acceptance and its applications</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Stoumpos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Kitsios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Talias</surname></persName>
		</author>
		<idno type="DOI">10.3390/ijerph20043407</idno>
	</analytic>
	<monogr>
		<title level="j">Int J Environ Res Public Health</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b63">
	<analytic>
		<title level="a" type="main">Digital transformation in healthcare - architectures of present and future information technologies</title>
		<author>
			<persName><forename type="first">G</forename><surname>Gopal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Suter-Crazzolara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Toldo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Eberhardt</surname></persName>
		</author>
		<idno type="DOI">10.1515/cclm-2018-0658</idno>
		<ptr target="https://doi.org/10.1515/cclm-2018-0658" />
	</analytic>
	<monogr>
		<title level="j">Clinical Chemistry and Laboratory Medicine</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="328" to="335" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>(CCLM)</note>
</biblStruct>

<biblStruct xml:id="b64">
	<monogr>
		<ptr target="https://www.mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html" />
		<title level="m">Wave file format specification</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b65">
	<analytic>
		<title level="a" type="main">Wav2vec 2.0: A framework for self-supervised learning of speech representations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Baevski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mohamed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Auli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS&apos;20</title>
				<meeting>the 34th International Conference on Neural Information Processing Systems, NIPS&apos;20<address><addrLine>Red Hook, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates Inc</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b66">
	<analytic>
		<title level="a" type="main">Librispeech: An asr corpus based on public domain audio books</title>
		<author>
			<persName><forename type="first">V</forename><surname>Panayotov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Povey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khudanpur</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICASSP.2015.7178964</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="5206" to="5210" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b67">
	<analytic>
		<title level="a" type="main">Wav2vec2-based paralinguistic systems to recognise vocalised emotions and stuttering</title>
		<author>
			<persName><forename type="first">T</forename><surname>Grósz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Porjazovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Getman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kadiri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kurimo</surname></persName>
		</author>
		<idno type="DOI">10.1145/3503161.3551572</idno>
		<ptr target="https://doi.org/10.1145/3503161.3551572" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 30th ACM International Conference on Multimedia, MM &apos;22</title>
				<meeting>the 30th ACM International Conference on Multimedia, MM &apos;22<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="7026" to="7029" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b68">
	<analytic>
		<title level="a" type="main">Automatic speech disfluency detection using wav2vec2.0 for different languages with variable lengths</title>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wumaier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Guo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page">7579</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b69">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">I</forename><surname>Jordal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tamazian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">T</forename><surname>Chourdakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Angonin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Karpov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Dhyani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sarioglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Berk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Mirus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><surname>Marvinlvn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Solomidhero</surname></persName>
		</author>
		<author>
			<persName><surname>Alum</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.7010042</idno>
	</analytic>
	<monogr>
		<title level="j">iver56/audiomentations</title>
		<imprint>
			<biblScope unit="issue">0</biblScope>
			<biblScope unit="page">33</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b70">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Richardson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Cook</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Crane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dunnington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>François</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Keane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moldovan-Grünfeld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ooms</surname></persName>
		</author>
		<ptr target="https://arrow.apache.org/docs/r/" />
		<title level="m">Apache Arrow, arrow: Integration to &apos;Apache&apos; &apos;Arrow</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b71">
	<analytic>
		<title level="a" type="main">Handbook for acoustic ecology</title>
		<author>
			<persName><forename type="first">B</forename><surname>Truax</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Leonardo</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page">83</biblScope>
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b72">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">N</forename><surname>Bracewell</surname></persName>
		</author>
		<title level="m">The Fourier transform and its applications</title>
				<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>McGraw-Hill</publisher>
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b73">
	<analytic>
		<title level="a" type="main">Support-vector networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Cortes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vapnik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine learning</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="273" to="297" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b74">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1512.03385</idno>
		<title level="m">Deep residual learning for image recognition</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b75">
	<monogr>
		<title level="m">Wav2Vec2</title>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<ptr target="https://huggingface.co/docs/transformers/model_doc/wav2vec2" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b76">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b77">
	<analytic>
		<title level="a" type="main">Identifying children with clinical language disorder: an application of machine-learning classification</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Justice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-Y</forename><surname>Ahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Logan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of learning disabilities</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="351" to="365" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b78">
	<analytic>
		<title level="a" type="main">An evaluation of measures to dissociate language and communication disorders from healthy controls using machine learning techniques</title>
		<author>
			<persName><forename type="first">J</forename><surname>Gaspers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Thiele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cimiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Foltz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Stenneken</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tscherepanow</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium</title>
				<meeting>the 2nd ACM SIGHIT International Health Informatics Symposium</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="209" to="218" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b79">
	<analytic>
		<title level="a" type="main">Communication disorder identification from recorded speech using machine learning assisted mobile application</title>
		<author>
			<persName><forename type="first">C</forename><surname>Kanimozhiselvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Santhiya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), IEEE</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="789" to="793" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b80">
	<analytic>
		<title level="a" type="main">Semantics derived automatically from language corpora contain human-like biases</title>
		<author>
			<persName><forename type="first">A</forename><surname>Caliskan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Bryson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Narayanan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<imprint>
			<biblScope unit="volume">356</biblScope>
			<biblScope unit="page" from="183" to="186" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b81">
	<analytic>
		<title level="a" type="main">The political biases of ChatGPT</title>
		<author>
			<persName><forename type="first">D</forename><surname>Rozado</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Soc. Sci</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page">148</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b82">
	<analytic>
		<title level="a" type="main">Potential biases in machine learning algorithms using electronic health record data</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Gianfrancesco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tamang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yazdany</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Schmajuk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">JAMA Intern. Med</title>
		<imprint>
			<biblScope unit="volume">178</biblScope>
			<biblScope unit="page" from="1544" to="1547" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b83">
	<analytic>
		<title level="a" type="main">Health data poverty: an assailable barrier to equitable digital health care</title>
		<author>
			<persName><forename type="first">H</forename><surname>Ibrahim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Zariffa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Morris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Denniston</surname></persName>
		</author>
		<idno type="DOI">10.1016/S2589-7500(20)30317-4</idno>
		<ptr target="https://doi.org/10.1016/S2589-7500(20)30317-4" />
	</analytic>
	<monogr>
		<title level="j">The Lancet Digital Health</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="e260" to="e265" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b84">
	<analytic>
		<title level="a" type="main">Speech databases of typical children and children with SLI</title>
		<author>
			<persName><forename type="first">P</forename><surname>Grill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tučková</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLoS One</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page">e0150365</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b85">
	<analytic>
		<title level="a" type="main">Children&apos;s voice and voice disorders</title>
		<author>
			<persName><forename type="first">A</forename><surname>McAllister</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sjölander</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Seminars in speech and language</title>
		<imprint>
			<publisher>Thieme Medical Publishers</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="71" to="79" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b86">
	<monogr>
		<title level="m" type="main">ElevenLabs</title>
		<author>
			<persName><forename type="first">P</forename><surname>Dabkowski</surname></persName>
		</author>
		<ptr target="https://elevenlabs.io/voice-lab" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
