<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Multisource Approaches to Italian Sign Language (LIS) Recognition: Insights from the MultiMedaLIS Dataset</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gaia</forename><surname>Caligiore</surname></persName>
							<email>gaia.caligiore@unimore.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Modena Reggio-Emilia</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Raffaele</forename><surname>Mineo</surname></persName>
							<email>raffaele.mineo@phd.unict.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Catania</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Concetto</forename><surname>Spampinato</surname></persName>
							<email>concetto.spampinato@unict.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Catania</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Egidio</forename><surname>Ragonese</surname></persName>
							<email>egidio.ragonese@unict.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Catania</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Simone</forename><surname>Palazzo</surname></persName>
							<email>simone.palazzo@unict.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Catania</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sabina</forename><surname>Fontana</surname></persName>
							<email>sfontana@unict.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Catania</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Tenth Italian Conference on Computational Linguistics</orgName>
								<address>
									<addrLine>Dec 04-06</addrLine>
									<postCode>2024</postCode>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Multisource Approaches to Italian Sign Language (LIS) Recognition: Insights from the MultiMedaLIS Dataset</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CB8DF3419281E31021F4A5BBF9712E07</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Italian Sign Language</term>
					<term>Sign Language Recognition</term>
					<term>Deep Learning</term>
					<term>Computer Vision</term>
					<term>0000-0002-7087-1819 (G. Caligiore)</term>
					<term>0000-0002-1171-5672 (R. Mineo)</term>
					<term>0000-0001-6653-2577 (C. Spampinato)</term>
					<term>0000-0001-6893-7076 (E. Ragonese)</term>
					<term>0000-0002-2441-0982 (S. Palazzo)</term>
					<term>0000-0003-3083-1676 (S. Fontana)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Given the status of sign languages as unwritten visual-gestural languages, research on their automatic recognition has increasingly implemented multisource capturing tools for data collection and processing. This paper explores advancements in Italian Sign Language (LIS) recognition using a multimodal dataset in the medical domain: the MultiMedaLIS Dataset. We investigate the integration of RGB frames, depth data, optical flow, and skeletal information to develop and evaluate two computational models: the Skeleton-Based Graph Convolutional Network (SL-GCN) and the Spatiotemporal Separable Convolutional Network (SSTCN). RADAR data was collected but not included in the testing phase. Our experiments validate the effectiveness of these models in enhancing the accuracy and robustness of isolated LIS sign recognition. Our findings highlight the potential of multisource approaches in computational linguistics to improve linguistic accessibility and inclusivity for members of the signing community.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Italian Sign Language (LIS, Lingua dei Segni Italiana) is the primary means of communication within the Italian signing community. Due to their visual-gestural modality, sign languages (SLs) were initially not considered fully-fledged linguistic systems. However, since the 1960s, beginning with Stokoe's pioneering works <ref type="bibr" target="#b0">[1]</ref>, the contemporary study of SLs has evolved into a robust field of research. Over the past half-century, significant societal and scientific advancements have transformed the perception and status of SLs, which are now recognized as natural and complete languages and have received legal recognition in many countries.</p><p>In the Italian context, the study of signed communication began in the early 1980s, involving both hearing and deaf researchers. At that time, what we now call LIS was still mostly unnamed and was often referred to as 'mime' or 'gesture' by both signers and non-signers alike <ref type="bibr" target="#b1">[2]</ref>. The first significant publications on LIS <ref type="bibr" target="#b2">[3]</ref> <ref type="bibr" target="#b3">[4]</ref>, along with the collaborative efforts of deaf and hearing researchers, initiated a transformative period in SL research in the Italian context <ref type="bibr" target="#b4">[5]</ref>. This shift in perspective was influenced by factors beyond the language itself, such as increased meta-linguistic awareness and greater visibility of the community and its language to the wider public. In fact, from a societal perspective, the visibility of SL in Italy, especially in the media, has significantly changed with technological advancements, mirroring global trends.</p><p>In the late 1980s, Italy introduced subtitles in movies on television, marking a step toward content accessibility. The importance of media accessibility, through subtitles or LIS interpreting, was accentuated during the COVID-19 pandemic. 
The need for equitable access to critical information for deaf individuals became evident, with efforts born within the community stressing the central role of LIS in ensuring that deaf signers received accessible information during challenging times <ref type="bibr" target="#b5">[6]</ref>, highlighting the significant communication barriers that deaf individuals face, especially when in-person interactions were restricted. This increased visibility, along with persistent advocacy by the signing community, played a crucial role in the official recognition of LIS and Tactile LIS (LISt) in May 2021.</p><p>Within this evolving societal and linguistic framework, marked by the increased media visibility of LIS and the introduction of video-capturing tools into daily life, language collection emerges as a central issue. For SLs, the need for comprehensive collections is particularly significant. Unlike oral languages, which in some cases have developed standardized written systems, SLs must rely on video collections to capture signed communication accurately. These videos, whether raw or annotated, are essential for analyzing SLs with both qualitative and quantitative evidence.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Automatic Sign Language Recognition</head><p>The development and use of SL datasets or corpora, preferably annotated, are crucial for training and validating automatic recognition models, and access to high-quality data from diverse SLs and cultural contexts enhances the generalizability of these solutions.</p><p>Comprehensive data collections of this kind ensure that models can effectively understand and process the wide range of linguistic and cultural nuances present in different SLs.</p><p>In the domain of automatic sign language recognition (SLR) of LIS, the integration of visual and spatial information presents a complex challenge. As mentioned, LIS operates through the visual-gestural channel. More precisely, it is characterized as multimodal<ref type="foot">2</ref> (signed discourse comprises manual and body components) and multilinear (manual and body components are performed simultaneously) <ref type="bibr" target="#b1">[2]</ref>. Recent advancements in SLR have been significantly driven by annotated datasets, which serve as the basis for training and validating models <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>.</p><p>Machine learning technologies, particularly deep learning neural networks, have facilitated the development of more precise and robust models for SL interpretation. These models are able to refine their performance through training on diverse and complex datasets. Additionally, computer vision plays a central role in this field by enabling real-time analysis and interpretation of body and manual components <ref type="bibr" target="#b1">[2]</ref>, that is, hand movements, facial expressions, and body posture <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15]</ref>.</p><p>A significant challenge in applying deep learning and computer vision methods to SLR lies in ensuring the quality and adequacy of training data, which is essential for achieving optimal model performance.</p><p>Therefore, in this study, we focus on evaluating the efficacy of the MultiMedaLIS Dataset (Multimodal Medical LIS Dataset) and assessing deep learning models for SLR that interpret isolated signs by integrating diverse data types such as RGB video, depth information, optical flow, and skeletal data.</p><p>We benchmark our Dataset with two models: the Skeleton-Based Graph Convolutional Network (SL-GCN) and the Spatiotemporal Separable Convolutional Network (SSTCN). These models are trained on the MultiMedaLIS Dataset, showcasing how the incorporation of multisource data can enhance the accuracy of sign recognition. This approach aims to test the potential of integrating different data modalities to improve the robustness and performance of SLR systems.</p><note place="foot" n="2"><p>Given our group's interdisciplinarity, we found that "multimodal" can mean different things depending on one's background: in linguistics, it refers to the employment of manual and body components while signing, while in computer vision it means using multiple capturing tools. To differentiate, we use "multisource" for capturing tools; thus, "multimodal" in this text follows SL linguistics terminology.</p></note></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">State of the Art</head><p>In this section, we discuss the state of the art from two perspectives considered during our work on the Dataset: LIS data collection and SLR tools.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">LIS Data Collections</head><p>SL researchers in Italy have been actively engaged in the creation of LIS corpora and datasets. This effort involves a complex process of video data collection and annotation, as SL datasets can vary significantly depending on their intended use. Within this context, SL data collections can be categorized into two main types. The first type includes datasets that feature videos depicting continuous signing, capturing the flow and context of natural SL usage. The second type comprises datasets that focus on isolated signs, which are individual signs presented separately from continuous discourse.</p><p>The scarcity of available LIS data collections has prompted researchers to develop their own resources. Several smaller-scale LIS corpora have been independently established, each serving distinct purposes based on the type of data collected.</p><p>The methodologies employed for collecting LIS data encompass a diverse array of approaches, ranging from naming tasks to semi-structured and spontaneous interviews with deaf signers, to video recording sessions involving hearing individuals learning LIS as a second language (L2) or second modality (M2) <ref type="bibr" target="#b15">[16]</ref>. 
These collections serve equally diverse purposes, ranging from documenting the language itself to creating tools for automatic translation, highlighting the ongoing commitment of researchers to expand and enrich the available resources for studying LIS <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b23">23,</ref><ref type="bibr" target="#b24">24]</ref>.</p><p>Despite the predominantly private nature of these collections, an exception to the accessibility challenge is found in the online dictionary SpreadTheSign, a project originating in 2004. Initially conceived as a dictionary for SLs, SpreadTheSign has evolved into a versatile resource for language documentation <ref type="bibr" target="#b25">[25]</ref>. Another significant resource is the Corpus LIS, recognized as the largest collection of spontaneous, semi-structured, and structured videos in LIS by deaf signers. The primary objectives of this corpus were twofold: to collect a substantial quantity of data suitable for quantitative analysis and to establish a comprehensive representation of LIS usage in Italy <ref type="bibr" target="#b26">[26,</ref><ref type="bibr" target="#b27">27,</ref><ref type="bibr" target="#b28">28]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">SLR Tools</head><p>Like SL data collections, SLR approaches can be broadly classified into two main categories: those that rely on specialized hardware and those that use visual information. The former employ specialized hardware, such as gloves able to capture precise hand movements. While these systems can provide detailed data, they are often considered intrusive and can compromise the natural flow of communication. Additionally, they are unable to capture the full spectrum of SLs, which includes manual and body components. In contrast, vision-based approaches use visual information captured by cameras, including RGB, depth, infrared, or a combination of these. These methods are less intrusive for users, as they do not require the use of special equipment.</p><p>In SLR, a challenge lies in effectively capturing both body movements and specific motions of hands, arms, and face. For instance, <ref type="bibr" target="#b29">[29]</ref> introduces a multi-scale, multi-modal framework that focuses on spatial details across different scales. This approach involves each visual modality capturing spatial information uniquely, supported by a system operating at three temporal scales. The training methodology emphasizes precise initialization of individual modalities and progressive fusion via ModDrop, which enhances overall robustness and performance.</p><p>Another study proposes an iterative optimization alignment network tailored for weakly supervised continuous SLR <ref type="bibr" target="#b30">[30]</ref>. The framework employs a 3D residual convolutional network for feature extraction, complemented by an encoder-decoder architecture featuring LSTM decoders and Connectionist Temporal Classification (CTC).</p><p>[31] introduces a 3D convolutional neural network enhanced with an attention module, designed to extract spatiotemporal features directly from raw video data. 
In contrast, <ref type="bibr" target="#b32">[32]</ref> combines bidirectional recurrence and temporal convolutions, emphasizing the effectiveness of temporal information in sign recognition tasks, although not covering the full spectrum of movements. Moreover, <ref type="bibr" target="#b33">[33]</ref> employs CNNs, a Feature Pooling Module, and LSTM networks to generate distinctive visual representations but falls short in capturing comprehensive movements and signing.</p><p>However, as previously noted, RGB-based SLR systems can raise privacy concerns, particularly when processing visual data in cloud environments or for machine learning training <ref type="bibr" target="#b34">[34]</ref>. Addressing these issues, radio frequency (RF) sensors have emerged as a promising alternative, ensuring privacy preservation while enabling innovative data representations for SLR. In the literature, deep learning techniques have been applied to various RF modalities such as ultra-wideband (UWB) <ref type="bibr" target="#b35">[35]</ref>, Doppler <ref type="bibr" target="#b36">[36]</ref>, continuous wave (CW) <ref type="bibr" target="#b37">[37]</ref>, micro-Doppler <ref type="bibr" target="#b38">[38]</ref>, frequency modulated continuous wave (FMCW) <ref type="bibr" target="#b13">[14]</ref>, multi-antenna systems <ref type="bibr" target="#b39">[39]</ref>, and millimeter waves <ref type="bibr" target="#b40">[40]</ref>.</p><p>As part of the Dataset discussed in this work, we have also collected RADAR data and are actively analyzing it. However, preliminary results are not available at this time, so they are not included in this report. Currently, RADAR-based solutions have demonstrated robust performance across diverse environmental conditions, highlighting the value of incorporating this sensor technology in data collection efforts. 
Nevertheless, many existing RADAR solutions are tailored to recognizing a limited set of signs, highlighting the ongoing challenge of expanding vocabulary recognition capabilities in datasets like the one discussed in the following section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">The MultiMedaLIS Dataset</head><p>The MultiMedaLIS <ref type="bibr" target="#b41">[41]</ref> Dataset was created thanks to the interdisciplinary collaboration established between the Department of Humanities (DISUM) and the Department of Electrical, Electronic and Computer Engineering (DIEEI) of the University of Catania (Unict). It aims to offer a multimodal collection of LIS signs specifically focused on medical contexts.</p><p>For the data recording protocol, the DIEEI group developed customized recording software to collect the LIS data, supplemented with a desktop computer and a modified keyboard transformed into a pedal board. This pedal board, equipped with two pedals, allowed hands-free navigation of the software, enabling users to move forward (by pushing on the right pedal) or backward (by pushing on the left pedal) while maintaining a neutral recording position <ref type="foot" target="#foot_0">3</ref>. During sessions, one of 126 Italian labels or alphabet letters was displayed on a screen, with adjustable display time for preparation and transition from one sign to the next. Each recording started from a neutral position, and the right pedal marked the completion of a sign. If errors occurred, the left pedal allowed re-recording. The software's interface features a color-coded background: yellow for preparation and green for recording. Additionally, it supports flexible data expansion, accepting word lists from text files for easy customization in future collections. After the recording process, the Dataset included synchronized data capturing facial expressions and hand and body movements, comprising a total of 25,830 sign instances: 205 repetitions of each of the 100 different signs and of the 26 signs of the LIS alphabet <ref type="bibr" target="#b41">[41]</ref>. 
Beyond these 26 signs, the signs included in the MultiMedaLIS Dataset can be broadly categorized into two groups <ref type="bibr" target="#b42">[42]</ref>: semantically marked signs related to health and health issues, and non-semantically marked signs. It is important to note that while the first group of signs is categorized as semantically marked, this classification does not imply that these signs belong exclusively to a specialized jargon lexicon. The decision to categorize signs as semantically marked was driven by their significance in contexts related to health and medical interactions in the post-pandemic world (that is, when the Dataset was first conceived). However, it was also important to include additional signs that could contribute to constructing meaningful utterances in patient-doctor interactions. During the creation of the MultiMedaLIS Dataset, careful consideration was given to selecting signs that could be combined to form coherent and meaningful utterances.</p><p>Regarding the specific form of signs, the MultiMedaLIS Dataset includes a lexicon of standard, isolated signs that are not combined within utterances.</p><p>These signs reflect forms commonly found in online dictionaries and educational materials. To ensure the accuracy of the data, sign variants performed by a professional LIS interpreter during the collection of a test dataset were compared with the same variants found in the online dictionary SpreadTheSign. This comparison aimed to select documented versions of each sign for inclusion in the Dataset. By incorporating these documented variants, we aimed to enhance the Dataset's precision, reliability, and real-world applicability. 
This approach contributed to ensuring that the Dataset aligns with established standards and supports effective research and application in the field of LIS.</p><p>When discussing recording tools for state-of-the-art multimodal corpora in the Italian context, such as the Corpus LIS <ref type="bibr" target="#b27">[27]</ref> and the CORMIP <ref type="bibr" target="#b43">[43]</ref>, the emphasis is placed on the portability and non-invasiveness of these tools. This approach ensures minimal interference with the signer's natural environment and activities.</p><p>Portable and non-invasive recording tools are chosen specifically for their ability to capture data in familiar, and sometimes domestic, settings without disrupting the signer's surroundings, aiming to maintain the authenticity of the signed interactions and minimize any discomfort or distraction for the participants.</p><p>To capture LIS for recognition with minimal invasiveness, we integrated a combination of recording tools. A 60GHz RADAR sensor, employed to capture detailed manual motion data, provided Time- and Frequency-Domain data and Range Doppler Maps for distinguishing moving objects at 13 fps. For more structured depth and facial recognition data, the Realsense D455 depth camera and Kinect v1 were incorporated. The Realsense D455, equipped with dual infrared cameras and RGB mode, captured depth data at 848x480 pixels and RGB data at 1280x720 pixels, both at 30 fps, enabling the tracking of facial expressions through 68 facial points. The Zed v1 and Zed v2 cameras provided high-resolution stereoscopic data, recording at 1920x1080 pixels and 25 fps, with capabilities for generating depth maps and 3D point clouds. Additionally, the Zed v2 offered tracking for 18 body points in both 2D and 3D <ref type="bibr" target="#b41">[41]</ref>. By prioritizing portability and non-invasiveness, high-quality data can still be collected while respecting the privacy and comfort of the individuals recorded. 
Anonymization is achieved through the use of the RADAR sensor, which we introduced specifically to address privacy concerns inherent in face-to-face signed communication.</p></div>
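For reference, the capture parameters described in this section can be collected into a small configuration structure. The following is a minimal Python sketch; the class and field names are our own illustrative choices, not part of the Dataset's tooling (the RADAR sensor contributes non-image streams, so its resolution is recorded as zero):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SensorSpec:
    """Capture parameters for one recording device (names are illustrative)."""
    name: str
    width: int    # pixels; 0 for non-image sensors
    height: int
    fps: int
    streams: tuple

# Figures taken from the capture setup described in Section 4.
CAPTURE_SETUP = [
    SensorSpec("radar_60ghz", 0, 0, 13,
               ("time_domain", "frequency_domain", "range_doppler_maps")),
    SensorSpec("realsense_d455_depth", 848, 480, 30, ("depth",)),
    SensorSpec("realsense_d455_rgb", 1280, 720, 30, ("rgb", "68_facial_points")),
    SensorSpec("zed_v2", 1920, 1080, 25,
               ("stereo_rgb", "depth_map", "3d_point_cloud", "18_body_points")),
]

def frames_per_sign(spec: SensorSpec, sign_duration_s: float) -> int:
    """Approximate number of frames a sensor records for one sign."""
    return round(spec.fps * sign_duration_s)
```

For a two-second sign, for instance, the depth camera yields about 60 frames while the RADAR yields about 26, a frame-rate mismatch that any synchronization step across sensors has to account for.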
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Testing the Dataset</head><p>The MultiMedaLIS Dataset was designed with the aim of supporting the development of SLR models by enabling the collection and integration of information through various data modalities:</p><p>• RGB frames: images extracted from videos.</p><p>• Depth data: three-dimensional information for each RGB frame.</p><p>• Optical flow: to emphasize movement.</p><p>• Skeletal data: face landmarks and body joints.</p><p>One of the main components of the Dataset is the set of RGB frames, which are images extracted from videos. These frames provide a two-dimensional visual representation of the signs performed by the signer, capturing details such as hand positions and facial expressions. The Dataset includes depth data, providing a three-dimensional aspect to the images and allowing for more detailed information on the distance and relative position of elements in the scene. This type of data is particularly useful for understanding the spatial dynamics of signs.</p><p>Alongside RGB and depth data, the MultiMedaLIS Dataset also contains optical flow information, which describes the movement between consecutive frames. Optical flow is essential for capturing the direction and speed of movements, providing a more detailed understanding of the transitions between various signs. Finally, the Dataset includes skeletal data representing face landmarks and body joints, allowing for precise tracking of joint and body segment positions and facilitating the analysis of signs in terms of joint movements.</p><p>Managing this multisource data is an emerging topic in computational linguistics. By combining different sources of information, it is possible to significantly improve the performance of SLR models. For example, integrating depth data with RGB frames can provide a more complete representation of signs, while adding optical flow and skeletal data can further enrich the analysis of the temporal structure of movement. 
In our view, the MultiMedaLIS Dataset provides a solid foundation for exploring these combinations, allowing researchers to develop more effective and accurate solutions for SLR.</p></div>
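The per-clip alignment constraints among these modalities can be made explicit in code. Below is a minimal Python sketch with illustrative shapes and helper names (not the Dataset's actual storage format); note that optical flow, being computed between consecutive frames, has one frame fewer than the RGB stream it is derived from:

```python
import numpy as np

def make_sample(num_frames=16, h=112, w=112, num_joints=18):
    """Bundle the four modalities for one isolated-sign clip (shapes illustrative)."""
    return {
        "rgb":      np.zeros((num_frames, h, w, 3), dtype=np.uint8),        # color frames
        "depth":    np.zeros((num_frames, h, w), dtype=np.float32),         # per-pixel depth
        "flow":     np.zeros((num_frames - 1, h, w, 2), dtype=np.float32),  # dx, dy per pixel
        "skeleton": np.zeros((num_frames, num_joints, 2), dtype=np.float32),  # 2D joints
    }

def check_alignment(sample):
    """Verify that all modalities describe the same clip; return its length."""
    t = sample["rgb"].shape[0]
    assert sample["depth"].shape[0] == t, "depth/RGB frame mismatch"
    assert sample["flow"].shape[0] == t - 1, "flow must have one frame fewer"
    assert sample["skeleton"].shape[0] == t, "skeleton/RGB frame mismatch"
    return t
```

A check of this kind is typically run once when assembling each training sample, so that a desynchronized sensor stream fails loudly rather than silently degrading the model.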
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Models and Architectures</head><p>In the context of automatic SLR, various approaches and model architectures have been tested to leverage the characteristics of the multisource data in the MultiMedaLIS Dataset.</p><p>The SL-GCN (Skeleton-Based Graph Convolutional Network) represents a significant innovation in this field. This model generates skeletal data from videos and creates temporal graphs that capture the spatiotemporal relationships between joint movements. Through fine-tuning and the combination of different data streams, SL-GCN has demonstrated high accuracy in sign recognition <ref type="bibr" target="#b44">[44]</ref> <ref type="bibr" target="#b45">[45]</ref>.</p><p>Another prominent architecture is the SSTCN (Spatiotemporal Separable Convolutional Network) <ref type="bibr" target="#b46">[46]</ref>, which excels in feature extraction from videos using HRNet <ref type="bibr" target="#b47">[47]</ref>. This approach has shown an accuracy of 96.33%, highlighting its effectiveness in capturing the spatial and temporal dynamics of LIS signs.</p><p>RGB frames are crucial for the visual representation of signs. The process of splitting videos into frames, cropping, and normalization optimally prepares the data for analysis by deep learning models. The use of dense optical flow presents significant challenges in sign recognition. Optical flow extraction using the Farneback algorithm <ref type="bibr" target="#b48">[48]</ref> led to 56% accuracy, highlighting difficulties in capturing precise movement details, alongside computational limitations. Depth data encoded with Height, Horizontal disparity, Angle (HHA) represent another crucial resource in the MultiMedaLIS Dataset. 
Applying HHA encoding to depth frames achieved 88% accuracy using the ResNet(2+1)D architecture <ref type="bibr" target="#b49">[49]</ref>, substantiating the importance of three-dimensional information in enhancing the understanding and interpretation of signs and offering a more detailed perspective compared to two-dimensional data.</p></div>
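To illustrate the core operation behind skeleton-based models such as SL-GCN, the sketch below applies one spatial graph-convolution step to a toy five-joint skeleton. The graph, weights, and shapes are purely illustrative assumptions and do not reproduce the actual SL-GCN architecture:

```python
import numpy as np

EDGES = [(0, 1), (1, 2), (1, 3), (0, 4)]  # toy skeleton, not the real joint graph
NUM_JOINTS = 5

def normalized_adjacency(edges, n):
    """Symmetric adjacency with self-loops, row-normalized: D^-1 (A + I)."""
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a / a.sum(axis=1, keepdims=True)

def graph_conv(x, a_norm, w):
    """One layer: average each joint with its neighbors, then mix channels.
    x: (num_joints, in_ch) joint features; w: (in_ch, out_ch) learned weights."""
    return a_norm @ x @ w

a_norm = normalized_adjacency(EDGES, NUM_JOINTS)
x = np.random.default_rng(0).normal(size=(NUM_JOINTS, 2))  # 2D joint coordinates
w = np.random.default_rng(1).normal(size=(2, 8))           # random stand-in weights
y = graph_conv(x, a_norm, w)                               # (5, 8) per-joint embeddings
```

Stacking such layers, and extending the graph along the time axis, is what lets models of this family capture the spatiotemporal joint relationships described above.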
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Training and Evaluation Procedure</head><p>For the training of the models, we employed a multistream approach that integrates skeletal, RGB, and depth data to improve sign recognition accuracy. The models were trained on an NVIDIA Tesla T4 16GB GPU using the Adam optimizer with an initial learning rate of 0.001 and a batch size of 8. We applied cross-validation to ensure the robustness of the results, splitting the Dataset into training (70%) and validation (15%) subsets, and used data augmentation techniques such as color jittering (varying brightness, contrast, saturation, and hue) to increase the diversity of the training data and improve generalization.</p><p>The loss function adopted for training was categorical cross-entropy, appropriate for multi-class classification tasks. The models were trained for a maximum of 100 epochs, with an early stopping criterion set to terminate training if no improvement in validation loss was observed for 10 consecutive epochs. For evaluation, we used a test set comprising 15% of the Dataset, ensuring that the models were tested on unseen data.</p></div>
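The data split and early-stopping rules above can be sketched in a few lines; the helper names and the fixed shuffling seed below are our own assumptions, not the authors' training code:

```python
import random

def split_indices(n, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and split 70/15/15 (train/validation/test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_tr, n_va = int(n * train), int(n * val)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

class EarlyStopper:
    """Stop when validation loss has not improved for `patience` consecutive
    epochs (patience is 10 in the procedure described above)."""
    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `EarlyStopper.step` is called once per epoch after computing the validation loss, and the loop breaks as soon as it returns True.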
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Results</head><p>The results demonstrate the models' effectiveness in leveraging multisource data for improved outcomes. As can be seen in Table <ref type="table">1</ref>, the SL-GCN multi-stream model achieved the best accuracy, with a Top-1 accuracy of 97.98% and a Top-5 accuracy of 99.94%, surpassing the performance of models using single data streams such as skeletal joints, bones, or motion alone. This demonstrates the advantage of combining multiple streams of information to capture both the spatial and temporal dynamics of signs.</p></div>
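Top-1 and Top-5 accuracy, the metrics reported here, simply check whether the true label appears among the model's k highest-scoring classes. A minimal NumPy sketch, with a toy score matrix of our own for illustration:

```python
import numpy as np

def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.
    scores: (n_samples, n_classes); labels: (n_samples,)."""
    topk = np.argsort(-scores, axis=1)[:, :k]     # indices of the k best classes
    hits = (topk == labels[:, None]).any(axis=1)  # true label among them?
    return float(hits.mean())

# Toy example: 2 samples, 3 classes.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2]])
labels = np.array([2, 0])
top1 = topk_accuracy(scores, labels, k=1)  # only the second sample is correct
top2 = topk_accuracy(scores, labels, k=2)  # both labels fall within the top 2
```

Top-5 accuracy is obtained the same way with k=5, which is why it is always at least as high as Top-1.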
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Performance of the SL-GCN multi-stream model on the test set.</p><p>In Table <ref type="table">2</ref>, datasets trained on the SL-GCN model are compared. Our Dataset produced the highest accuracy (97.98%) among the datasets evaluated, outperforming larger datasets like AUTSL (95.45%).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Comparison of different datasets on the SL-GCN model.</p><p>Table <ref type="table">3</ref> presents a comparison of different methods across the entire Dataset. The SL-GCN trained on RGB frames achieved the highest accuracy (97.98%), followed by the SSTCN model with 96.33%. The ResNet(2+1)D architecture showed strong performance when applied to RGB frames (97.29%), but struggled when using optical flow data alone, reaching just 56.31% accuracy, suggesting that while optical flow provides valuable information on motion, it lacks the richness of spatial features found in RGB and depth data. The HHA-encoded depth data, when processed with the ResNet(2+1)D model, achieved an accuracy of 88.04%, confirming that depth information is complementary, but not as effective as RGB data in isolation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3 Performance of various methods on the MultiMedaLIS Dataset</head><p>These results highlight the importance of combining multiple data modalities, especially RGB and skeletal data, for improving the accuracy and robustness of SLR systems. The performance of the SL-GCN model with multi-stream data demonstrates both the model's ability to capture signs effectively and the value of the Dataset.</p></div>
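The multi-stream results above rely on combining per-stream predictions. The paper does not detail the exact fusion scheme, so the following is only a common late-fusion sketch (weighted averaging of per-class scores across streams); function names and the uniform default weights are our own assumptions.

```python
def fuse_streams(stream_scores, weights=None):
    """Late fusion: weighted average of per-class scores from several streams
    (e.g. joints, bones, joint-motion, bone-motion in SL-GCN-style pipelines).

    `stream_scores` is a list of per-class score lists, one per stream.
    """
    n_streams = len(stream_scores)
    weights = weights or [1.0 / n_streams] * n_streams  # uniform by default
    n_classes = len(stream_scores[0])
    return [sum(w * s[c] for w, s in zip(weights, stream_scores))
            for c in range(n_classes)]

def predict(stream_scores, weights=None):
    """Class index with the highest fused score."""
    fused = fuse_streams(stream_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)
```

The intuition matches the results: a stream that is weak on its own (e.g. motion) can still sharpen the fused score distribution when averaged with stronger streams.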
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9.">Discussion and Conclusion</head><p>In this study, we presented our first steps in testing the efficacy of the MultiMedaLIS Dataset for advancing the field of SLR through multisource approaches. The integration of RGB frames, depth data, optical flow, and skeletal data has provided a comprehensive basis for developing and evaluating SLR models. Our experiments with the SL-GCN and SSTCN architectures have highlighted advancements in recognizing isolated LIS signs in medical semantic contexts, given the domain of our Dataset.</p><p>The SL-GCN model, trained on skeletal data to construct temporal graphs, achieved high accuracy by capturing the spatiotemporal relationships critical to sign recognition. This approach not only enhances the precision with which LIS signs are recognized but also shows that the Dataset can support robust graph-based convolutional networks in multimodal SLR tasks. At the same time, our Dataset proved robust, precise, and varied enough to test the SSTCN model, whose spatiotemporal separable convolutions delivered strong performance in extracting spatial dynamics from RGB frames.</p><p>Having validated the visual modalities on these models, we have promising preliminary results on adapting them to accept RADAR data. We plan to extract the pre-trained RADAR data processing module and use it independently during inference, eliminating the need for RGB visual data. Furthermore, we plan to expand the Dataset by applying the same protocol with 10 deaf signers, effectively enlarging the current Dataset and enhancing generalizability across different signers. Our goal is to develop an autonomous, resource-constrained system (thanks to the exclusion of RGB data) that operates on edge or even offline. 
This cost-effective solution can be used in emergency contexts where direct access to interpreting is not available.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: User interface display presented during the recording phase (green) and preparation phase (yellow).</figDesc><graphic coords="4,94.73,282.93,187.67,63.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Combination of synchronized infrared and depth data from the MultiMedaLIS Dataset.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">The neutral recording position referenced is a seated position in which the user has their arms extended along the sides of the torso, elbows bent at 90°, and palms facing downward<ref type="bibr" target="#b41">[41]</ref>.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Sign language structure: an outline of the visual communication systems of the American deaf</title>
		<author>
			<persName><forename type="first">W</forename><surname>Stokoe</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1960">1960</date>
			<pubPlace>Buffalo, New York</pubPlace>
		</imprint>
		<respStmt>
			<orgName>University of Buffalo</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Italian Sign Language from a Cognitive and Sociosemiotic Perspective</title>
		<author>
			<persName><forename type="first">V</forename><surname>Volterra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Roccaforte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Di Renzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Fontana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Implications for a general language theory</title>
				<meeting><address><addrLine>Amsterdam-Philadelphia</addrLine></address></meeting>
		<imprint>
			<publisher>John Benjamins Publishing Company</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Dal gesto al gesto: il bambino sordo tra gesto e parola</title>
		<author>
			<persName><forename type="first">M</forename><surname>Montanini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Facchini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fruggeri</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1979">1979</date>
			<publisher>Cappelli</publisher>
			<pubPlace>Bologna</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">I segni come le parole: la comunicazione dei sordi</title>
		<author>
			<persName><forename type="first">V</forename><surname>Volterra</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1981">1981</date>
			<publisher>Boringhieri</publisher>
			<pubPlace>Torino</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Language research and language community change: Italian Sign Language (LIS) 1981-2013</title>
		<author>
			<persName><forename type="first">S</forename><surname>Fontana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Corazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Boyes-Braem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Volterra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Journal of the Sociology of Language</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">236</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The Italian Deaf Community at the Time of Coronavirus</title>
		<author>
			<persName><forename type="first">E</forename><surname>Tomasuolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gulli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Volterra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Fontana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Frontiers in Sociology</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison</title>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Opazo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 IEEE WACV</title>
				<meeting>the 2020 IEEE WACV<address><addrLine>Snowmass, CO, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1448" to="1458" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">AUTSL: A large scale multi-modal Turkish sign language dataset and baseline methods</title>
		<author>
			<persName><forename type="first">O</forename><surname>Mercanoglu Sincan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">Yalim</forename><surname>Keles</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2008.00932</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2008.00932" />
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">MS-ASL: A large-scale data set and benchmark for understanding American sign language</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">R</forename><surname>Vaezi Joze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Koller</surname></persName>
		</author>
		<idno>arXiv, 2018</idno>
		<imprint/>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The significance of facial features for automatic sign language recognition</title>
		<author>
			<persName><forename type="first">U</forename><surname>Agris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Knorr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">F</forename><surname>Kraiss</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th IEEE International Conference on Automatic Face &amp; Gesture Recognition</title>
				<meeting>the 8th IEEE International Conference on Automatic Face &amp; Gesture Recognition<address><addrLine>Amsterdam, Netherlands</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An HMM Approach with Inherent Model Selection for Sign Language and Gesture Recognition</title>
		<author>
			<persName><forename type="first">S</forename><surname>Tornay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Aran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Magimai</forename><surname>Doss</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twelfth Language Resources and Evaluation Conference</title>
				<meeting>the Twelfth Language Resources and Evaluation Conference<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="6049" to="6056" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Adversarial PoseNet: A Structure-Aware Convolutional Network for Human Pose Estimation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X. -S</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE ICCV</title>
		<imprint>
			<biblScope unit="page" from="1221" to="1230" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Training deep networks for facial expression recognition with crowd-sourced label distribution</title>
		<author>
			<persName><forename type="first">E</forename><surname>Barsoum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ferrer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th ACM ICMI</title>
				<meeting>the 18th ACM ICMI</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="279" to="283" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A Novel Detection and Recognition Method for Continuous Hand Gesture Using FMCW Radar</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="167264" to="167275" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">O</forename><surname>Yusuf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Habib</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Moustafa</surname></persName>
		</author>
		<title level="m">Real-time hand gesture recognition: Integrating skeleton-based data fusion and multi-stream CNN</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Le Lingue dei Segni nel &apos;Volume Complementare&apos; e l&apos;Insegnamento della LIS nelle Università Italiane</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cardinaletti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mantovan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Italiano Lingua Seconda. Rivista internazionale di linguistica italiana e educazione linguistica</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="113" to="128" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Iconicity and Productivity in Sign Language Discourse: An Analysis of Three LIS Discourse Registers</title>
		<author>
			<persName><forename type="first">T</forename><surname>Russo Cardona</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sign Language Studies</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="164" to="197" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Che giorno è oggi? Prime analisi e riflessioni sull&apos;espressione del tempo in LIS</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ricci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bonsignori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Di Renzo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IV Convegno Nazionale LIS &apos;La Lingua dei Segni Italiana: una risorsa per il futuro</title>
				<meeting><address><addrLine>Rome</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>Poster presentation</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">La morfologia valutativa in LIS: una descrizione preliminare</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fornasiero</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IV Convegno Nazionale LIS &apos;La Lingua dei Segni Italiana: una risorsa per il futuro</title>
				<meeting><address><addrLine>Rome</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>Poster presentation</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">L&apos;uso delle Strutture di Grande Iconicità nei testi narrativi segnati: primi dati su bambini prescolari, scolari e adulti</title>
		<author>
			<persName><forename type="first">A</forename><surname>Di Renzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Slonimska</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IV Convegno Nazionale LIS &apos;La Lingua dei Segni Italiana: una risorsa per il futuro</title>
				<meeting><address><addrLine>Rome</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>Poster presentation</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Nomi di persona e di luogo nella comunità sorda in Italia: interviste, analisi e primi risultati</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Conte</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IV Convegno Nazionale LIS &apos;La Lingua dei Segni Italiana: una risorsa per il futuro</title>
				<meeting><address><addrLine>Rome</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>Poster presentation</note>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Fontana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Raniolo</surname></persName>
		</author>
		<title level="m">Interazioni tra oralità e unità segniche: uno studio sulle labializzazioni nella Lingua dei Segni Italiana (LIS)</title>
				<editor>
			<persName><forename type="first">G</forename></persName>
		</editor>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m">Proceedings of the VII Dies Romanicus Turicensis</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Schneider</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Janner</surname></persName>
		</editor>
		<editor>
			<persName><surname>Élie</surname></persName>
		</editor>
		<meeting>the VII Dies Romanicus Turicensis<address><addrLine>Bern</addrLine></address></meeting>
		<imprint>
			<publisher>Peter Lang</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="241" to="258" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">On the Embodiment of Negation in Italian Sign Language: An Approach Based on Multiple Representation Theories</title>
		<author>
			<persName><forename type="first">V</forename><surname>Cuccio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Di Stasio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Fontana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Frontiers in Psychology</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Grammar and Experience: The Interplay Between Language Awareness and Attitude in Italian Sign Language (LIS)</title>
		<author>
			<persName><forename type="first">S</forename><surname>Fontana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Linguistics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="1" to="18" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">A multilingual dictionary for sign languages: &apos;SpreadTheSign</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hilzensauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Krammer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICERI</title>
				<meeting>ICERI<address><addrLine>Seville</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">La raccolta del Corpus LIS</title>
		<author>
			<persName><forename type="first">C</forename><surname>Cecchetto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Giudice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mereghetti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Grammatica, Lessico e Dimensioni di Variazione della LIS</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Cardinaletti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Cecchetto</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Donati</surname></persName>
		</editor>
		<meeting><address><addrLine>FrancoAngeli, Milan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="55" to="68" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">The LIS Corpus Project</title>
		<author>
			<persName><forename type="first">C</forename><surname>Geraci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Battaglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cardinaletti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cecchetto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Donati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Giudice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mereghetti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sign Language Studies</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="528" to="571" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">L&apos;Annotazione del Corpus</title>
		<author>
			<persName><forename type="first">M</forename><surname>Santoro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Poletti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Grammatica, Lessico e Dimensioni di Variazione della LIS</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Cardinaletti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Cecchetto</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Donati</surname></persName>
		</editor>
		<meeting><address><addrLine>FrancoAngeli, Milan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="69" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">ModDrop: Adaptive Multi-Modal Gesture Recognition</title>
		<author>
			<persName><forename type="first">N</forename><surname>Neverova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Taylor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Nebout</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="1692" to="1706" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Iterative alignment network for continuous sign language recognition</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4165" to="4174" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Attention-Based 3D-CNNs for Large-Vocabulary Sign Language Recognition</title>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Circuits and Systems for Video Technology</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="2822" to="2832" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Sign language recognition, generation, and translation: An interdisciplinary perspective</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bragg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Verhoef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Vogler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Morris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Koller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bellard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Berke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Boudreault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Braffort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Caselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Huenerfauth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kacorri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility</title>
				<meeting>the 21st International ACM SIGACCESS Conference on Computers and Accessibility</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="16" to="31" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Isolated Sign Language Recognition with Multi-scale Features using LSTM</title>
		<author>
			<persName><forename type="first">O</forename><surname>Mercanoglu Sincan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">O</forename><surname>Tur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">Yalim</forename><surname>Keles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th Signal Processing and Communications Applications Conference (SIU)</title>
				<meeting>the 27th Signal Processing and Communications Applications Conference (SIU)<address><addrLine>Sivas, Turkey</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">A linguistic perspective on radar microdoppler analysis of American sign language</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Z</forename><surname>Gurbuz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Gurbuz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Malaia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Griffin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Crawford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Rahman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Aksu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kurtoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mdrafi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Anbuselvam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Macks</surname></persName>
		</author>
		<author>
			<persName><surname>Ozcelik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 IEEE International Radar Conference (RADAR)</title>
				<meeting>the 2020 IEEE International Radar Conference (RADAR)<address><addrLine>Washington, DC, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="232" to="237" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Sign language/gesture recognition based on cumulative distribution density features using UWB radar</title>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Instrumentation and Measurement (TIM)</title>
		<imprint>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Sign language gesture recognition using Doppler radar and deep learning</title>
		<author>
			<persName><forename type="first">H</forename><surname>Kulhandjian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 IEEE Globecom Workshops (GC Wkshps)</title>
				<meeting>the 2019 IEEE Globecom Workshops (GC Wkshps)<address><addrLine>Waikoloa, HI, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Sign language recognition with CW radar and machine learning</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st International Radar Symposium (IRS)</title>
				<meeting>the 21st International Radar Symposium (IRS)<address><addrLine>Warsaw, Poland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="31" to="34" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Sign language recognition using micro-doppler and explainable deep learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mccleary</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Modeling in Engineering &amp; Sciences</title>
		<imprint>
			<biblScope unit="volume">139</biblScope>
			<biblScope unit="page" from="2399" to="2450" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Faster R-CNN: Towards real-time object detection with region proposal networks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Girshick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="1137" to="1149" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Near real-time ASL recognition using a millimeter wave radar</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">O</forename><surname>Adeoluwa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Kearney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kurtoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Connors</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Z</forename><surname>Gurbuz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Volume 11742 of Radar Sensor Technology XXV</title>
				<meeting>Volume 11742 of Radar Sensor Technology XXV</meeting>
		<imprint>
			<publisher>SPIE</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Sign Language Recognition for Patient-Doctor Communication: A Multimedia/Multimodal Dataset</title>
		<author>
			<persName><forename type="first">R</forename><surname>Mineo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Caligiore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Spampinato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Fontana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Palazzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ragonese</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE 8th Forum on Research and Technologies for Society and Industry Innovation (RTSI)</title>
				<meeting>the IEEE 8th Forum on Research and Technologies for Society and Industry Innovation (RTSI)</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Caligiore</surname></persName>
		</author>
		<title level="m">Codifying the body: exploring the cognitive and socio-semiotic framework in building a multimodal Italian sign language (LIS) dataset</title>
				<meeting><address><addrLine>Catania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
		<respStmt>
			<orgName>University of Catania</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b43">
	<monogr>
		<title level="m" type="main">Corpus Multimodale dell&apos;Italiano Parlato: basi metodologiche per la creazione di un prototipo</title>
		<author>
			<persName><forename type="first">L</forename><surname>Lo Re</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
			<pubPlace>Florence</pubPlace>
		</imprint>
		<respStmt>
			<orgName>University of Florence</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b44">
	<analytic>
		<title level="a" type="main">Spatial-Temporal Graph Convolutional Networks for Sign Language Recognition</title>
		<author>
			<persName><forename type="first">C</forename><surname>Correia De Amorim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Macedo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zanchettin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 International Conference on Artificial Neural Networks</title>
				<meeting>the 2019 International Conference on Artificial Neural Networks<address><addrLine>Munich, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="646" to="657" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b45">
	<analytic>
		<title level="a" type="main">Sign language recognition on video data based on graph convolutional network</title>
		<author>
			<persName><forename type="first">Ayas</forename><forename type="middle">Faikar</forename><surname>Nafis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nanik</forename><surname>Suciati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Theoretical and Applied Information Technology</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="4323" to="4333" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b46">
	<analytic>
		<title level="a" type="main">Skeleton aware multi-modal sign language recognition</title>
		<author>
			<persName><forename type="first">S</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><surname>Fu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops</title>
				<meeting>the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="5693" to="5703" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b47">
	<analytic>
		<title level="a" type="main">Deep high-resolution representation learning for human pose estimation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<meeting>the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="5693" to="5703" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b48">
	<analytic>
		<title level="a" type="main">Two-frame motion estimation based on polynomial expansion</title>
		<author>
			<persName><forename type="first">G</forename><surname>Farnebäck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science</title>
		<imprint>
			<biblScope unit="volume">2749</biblScope>
			<biblScope unit="page" from="363" to="370" />
			<publisher>Springer</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<analytic>
		<title level="a" type="main">A closer look at spatiotemporal convolutions for action recognition</title>
		<author>
			<persName><forename type="first">D</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Torresani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Paluri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="6450" to="6459" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
