A Dynamic Blurring Approach with EfficientNet and LSTM to Enhance Privacy in Video-Based Elderly Fall Detection

Ivan Ursul1, Junaid Hussain Muzamal2

1 Ivan Franko National University of Lviv, Lviv, Universitetska, 1, 79090, Ukraine
2 Fast – National University of Computer and Emerging Sciences, Lahore, Pakistan

Abstract
This research paper introduces a novel approach to address privacy concerns in video-based elderly fall detection systems without compromising the efficacy and real-time response of such technologies. The methodology integrates EfficientNetB0 for robust feature extraction from video sequences and Long Short-Term Memory networks for accurate fall classification. Despite achieving exemplary performance metrics, including 100% scores in accuracy, Area Under the Curve, recall, and precision, the pervasive issue of privacy infringement in video surveillance remains a significant challenge. To tackle this, we propose a dynamic blurring technique that selectively obscures identifiable features within video frames, such as faces and distinguishing clothing, thus maintaining individual anonymity. This method preserves the privacy of the monitored individuals while retaining the essential details necessary for the fall detection algorithm to function effectively. This paper details this privacy-preserving technique and demonstrates its feasibility without detracting from the system’s performance. Our findings indicate that integrating dynamic blurring into the fall detection pipeline offers a promising solution to the privacy concerns associated with video-based monitoring systems. It protects sensitive personal information while providing a high level of care and safety. This research contributes to the broader discourse on ethical technology use in healthcare. Moreover, it emphasizes the importance of balancing advanced monitoring capabilities with the imperative of privacy preservation.
Keywords
Elderly Fall Detection, Privacy Preservation, Video Surveillance, EfficientNetB0, Long Short-Term Memory (LSTM), Dynamic Blurring, Real-Time Monitoring, Feature Extraction

1. Introduction

The growing elderly population has brought an increased incidence of falls [1], a leading cause of morbidity and mortality in this group [2]. Recent technological solutions for fall detection have emerged as a critical component in mitigating these risks [3]. Among these, video-based fall detection systems have shown significant promise due to their non-invasiveness and capability for real-time monitoring [4]. However, video surveillance in healthcare, particularly in homes and care facilities, raises significant privacy concerns [5]. This research aims to find a balance between ensuring safety through surveillance and upholding the right to privacy.

There is an inherent tension between the efficacy of video-based fall detection and the imperative to protect individual privacy [6]. While effective in identifying falls, traditional approaches often overlook the privacy implications of constant video monitoring [7]. Possible solutions include avoiding video data altogether or implementing basic obfuscation techniques [8], but these either compromise effectiveness or provide insufficient privacy [9]. Previous research has proposed various methods, including wearable devices [10-13] and environmental sensors [14-16], to circumvent the associated privacy issues. However, these alternatives fall short in accuracy and real-time response capabilities compared to video-based systems.

In response to these challenges, this paper proposes an innovative solution that retains the advantages of video surveillance while addressing privacy concerns. Our approach employs dynamic blurring, selectively obscuring identifiable features within video frames. Thus, individuals are anonymized without compromising the system’s ability to detect falls.
This method differs from existing solutions by offering a real-time, privacy-preserving mechanism that does not detract from the system’s performance. Integrating EfficientNetB0 [17] for feature extraction and Long Short-Term Memory (LSTM) [18] networks for classifying fall events ensures high precision in fall detection.

CMIS-2024: Seventh International Workshop on Computer Modeling and Intelligent Systems, May 3, 2024, Zaporizhzhia, Ukraine
Ivanon2@gmail.com (I. Ursul); junaidhocane6728@gmail.com (J. H. Muzamal)
0009-0002-9879-8008 (I. Ursul); 0009-0007-1598-8161 (J. H. Muzamal)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

This research aims to develop a fall detection system that reconciles the need for efficient, real-time monitoring with the imperative of privacy preservation. Our objectives include designing and implementing a dynamic blurring technique within a video-based fall detection framework. We also aim to evaluate this system’s accuracy and privacy protection performance and demonstrate its applicability in real-world settings.

This research can potentially contribute to the development of ethically responsible technological solutions in healthcare, particularly in the context of elderly care. This work seeks to pave the way for broader acceptance by addressing the privacy concerns associated with video-based monitoring. Moreover, deploying such systems enhances the safety and well-being of the elderly population.

2. Literature Review

The exploration of fall detection systems, particularly for the elderly, is an area of research that has seen substantial evolution over time. In 2006, Karantonis et al. [19] implemented the first real-time fall detection system using a triaxial accelerometer.
At that time, most traditional techniques centered around simplistic, mechanical solutions and gradually transitioned towards incorporating technology [20]. Among the earliest methods were basic alert systems, which relied on the user to trigger an alert manually in case of a fall [21]. While pioneering for their time, these systems were limited by their dependence on the users to activate the alarm post-fall, which could be compromised due to injury. Advancements in technology brought in a new wave of methodologies, primarily categorized into sensor-based [14-16], wearable devices [10], [12], and video surveillance systems [22], [23- 25], alongside other innovative approaches. Sensor-based systems often utilize accelerometers and gyroscopes to detect sudden movements or orientations indicative of a fall. Wearable devices, such as smartwatches [26], integrate these sensors and offer portability. However, sensor-based and wearable systems face challenges related to user compliance, discomfort, and the potential for false positives due to non-fall-related abrupt movements [27]. In contrast, video surveillance systems offer a less intrusive alternative, capturing a broader context of the individual’s environment [28]. This method’s appeal lies in its passive nature, requiring no active input or wearables from the monitored individuals. Despite these advantages, video-based systems have challenges [29]. High-quality video processing demands significant computational resources, and managing vast data volumes poses storage and efficiency concerns. Moreover, the critical issue of privacy infringement emerges, given the intrusive nature of continuous video monitoring [9]. Traditional algorithms such as Support Vector Machines (SVMs) [30] and Decision Trees [31] were widely employed in the early stages of machine learning applications for fall detection. 
These methods primarily relied on handcrafted features extracted from sensor data or basic video analytics, including motion vectors and silhouette shapes. Flow-based methods, particularly optical flow [32], were also prominent, enabling the detection of movement patterns by analyzing the apparent motion of objects, surfaces, and edges. While effective to a certain extent, these approaches faced limitations in handling the high variability and complexity of human falls. They often struggled to distinguish falls from other activities involving rapid movements, leading to high false alarm rates [33], [34]. Additionally, their dependency on manually crafted features restricted their adaptability, as these features might not generalize well across different scenarios.

The recent emergence of deep learning architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) has changed the field considerably [35]. Advanced models such as ResNet [36], LSTM [37], and YOLO [39] marked a leap forward in fall detection. CNNs, with their ability to perform automatic feature extraction, have proven particularly adept at analyzing spatial characteristics in video frames [38]. RNNs and LSTMs excel in capturing temporal dependencies, which is crucial for understanding the sequence of movements leading to a fall. YOLO [39], an object detection model, brought further advancements by enabling real-time processing.

Despite these successes, the search for enhanced performance led to hybrid methods that combine multiple deep learning models. For instance, integrating CNNs with LSTMs allows video data to be processed both spatially and temporally, offering a better understanding of fall events [40]. These hybrid approaches [41], alongside innovative methods within deep learning frameworks, promise to address the dynamics of fall detection [42].

Recent advancements aim to address these privacy concerns while maintaining system efficacy.
Techniques such as dynamic blurring and real-time anonymization have been explored to obscure identifiable features in video feeds. This can help safeguard individual privacy without significantly compromising detection capabilities. Despite these efforts, there is a gap in the literature concerning the development of a system that seamlessly integrates high detection accuracy with robust privacy protection.

Our contribution addresses this gap by proposing a novel fall detection system that employs EfficientNetB0 for advanced feature extraction and LSTM networks for accurate temporal classification, complemented by a dynamic blurring mechanism to ensure privacy. This integrated approach promises high performance, as evidenced by optimal accuracy, recall, and precision scores. Moreover, it introduces a viable solution to the privacy concerns that have long shadowed video-based monitoring systems. By achieving this delicate balance, our research paves the way for the broader acceptance of video-based fall detection systems, ensuring the safety of the elderly population.

3. Methodology

This research presents a methodological framework to address the challenge of detecting falls through video surveillance while safeguarding the privacy of the monitored individuals. The proposed approach is founded on mathematical models and techniques chosen for precision, efficiency, and reliability. It integrates the state-of-the-art EfficientNetB0 for spatial feature extraction and LSTM networks for temporal sequence analysis. Additionally, we introduce a dynamic blurring mechanism formulated to preserve privacy by selectively obscuring identifiable features within video frames. Figure 1 shows the overall architecture of the proposed approach.

Figure 1. Overall Architecture of the Proposed Methodology

3.1. Dataset and Processing

The dataset employed in this study is the UR Fall Detection Dataset [43], which comprises 70 sequences: 30 fall events and 40 activities of daily living (ADL). The fall events were captured using two Microsoft Kinect cameras, accompanied by accelerometric data, whereas the ADL events were recorded using a single camera (camera 0) alongside accelerometer data. The accelerometric data was acquired through PS Move (60 Hz) and x-IMU (256 Hz) devices. Each sequence comprises depth and RGB images from both camera perspectives (parallel to the floor and ceiling-mounted), synchronization data, and raw accelerometer readings. Each video stream is archived separately as a sequence of PNG images. The depth data, stored in PNG16 format, must be rescaled to represent depth in millimeters (D) accurately:

$$D_i(x, y) = \frac{V(x, y) \cdot S_i}{65535} \tag{1}$$

where $D_i(x, y)$ denotes the depth at position $(x, y)$ for the i-th camera, $V(x, y)$ represents the pixel value at position $(x, y)$ in the PNG16 image, and $S_i$ is the scale ratio for the i-th camera. The scale ratios are defined as $S_0 = 6000$ for fall sequences using camera 0, $S_1 = 3640$ for fall sequences using camera 1, and $S_0 = 7000$ for ADL sequences using camera 0.

The preprocessing of video data involves a series of steps to prepare the frames for feature extraction. Initially, each video is accessed frame by frame using OpenCV’s VideoCapture functionality. Subsequently, each frame is resized to a uniform dimension of 224 × 224 pixels to align with the input requirements of the EfficientNetB0 model. This resizing operation can be mathematically represented as a function R that maps the original frame dimensions to the target dimensions, preserving the aspect ratio and interpolating pixel values as necessary:

$$R: \mathbb{R}^{w \times h \times 3} \rightarrow \mathbb{R}^{224 \times 224 \times 3} \tag{2}$$

where $w$ and $h$ denote the original width and height of the frame, respectively.
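The depth rescaling in Eq. (1) can be illustrated with a minimal sketch. It assumes the PNG16 frame has already been decoded into a 16-bit NumPy array; the function name `depth_to_mm` and the lookup table `SCALE_RATIOS` are hypothetical names introduced for illustration, while the scale values are the ones quoted in the text.

```python
import numpy as np

# Scale ratios S_i quoted in the text, keyed by (sequence type, camera index).
# The dictionary layout is an illustrative assumption.
SCALE_RATIOS = {("fall", 0): 6000.0, ("fall", 1): 3640.0, ("adl", 0): 7000.0}


def depth_to_mm(raw_png16, sequence_type, camera):
    """Apply Eq. (1): D_i(x, y) = V(x, y) * S_i / 65535."""
    scale = SCALE_RATIOS[(sequence_type, camera)]
    return raw_png16.astype(np.float64) * scale / 65535.0


# Tiny synthetic 2x2 array standing in for a decoded PNG16 depth frame.
raw = np.array([[0, 65535], [32768, 13107]], dtype=np.uint16)
depth = depth_to_mm(raw, "fall", 0)
```

Under these assumptions, the largest raw value (65535) maps to the full scale ratio of the selected camera, and a raw value of zero maps to zero millimeters.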
After resizing, the frames are normalized to scale the pixel values to the [0, 1] range, which facilitates more stable and efficient model training. For a frame F with 8-bit pixel values, the normalization is defined as:

$$F_{normalized} = \frac{F}{255} \tag{3}$$

This operation maps each pixel value in the frame to a decimal between 0 and 1, thus standardizing the input data for subsequent processing through the EfficientNetB0 architecture.

3.2. Feature Extraction Using EfficientNetB0

The feature extraction component of our methodology is built upon the EfficientNetB0 architecture, a CNN known for its scalability and efficiency. The EfficientNet family uniformly scales the network’s depth, width, and resolution, and EfficientNetB0 is its baseline model. At the core of EfficientNetB0 are its convolutional operations, which form the backbone of its feature extraction capabilities. A convolutional operation on an input image or feature map can be mathematically described as:

$$F_{out}(x, y) = \sum_{i=-a}^{a} \sum_{j=-b}^{b} K(i, j) \cdot F_{in}(x - i, y - j) \tag{4}$$

where $F_{out}$ is the output feature map, $F_{in}$ is the input image or feature map, $K$ is the kernel or filter of size $(2a + 1) \times (2b + 1)$, and $(x, y)$ denotes the pixel coordinates. This operation is applied across the entire input feature map, extracting features through the weighted summation of pixel values within the kernel’s receptive field.

EfficientNetB0 also leverages batch normalization to enhance training stability and convergence. Batch normalization can be defined as:

$$BN(x) = \gamma \left( \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \right) + \beta \tag{5}$$

where $x$ is the input to the batch normalization layer, $\mu_B$ and $\sigma_B^2$ are the mean and variance of the batch, respectively, $\gamma$ and $\beta$ are learnable parameters of the layer, and $\epsilon$ is a small constant added for numerical stability.

Furthermore, EfficientNetB0 employs depthwise separable convolutions, a technique that reduces computational cost without sacrificing depth or expressivity.
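To give a sense of the cost reduction, the sketch below compares multiply-accumulate (MAC) counts for a standard convolution and for a depthwise stage followed by a 1×1 pointwise stage. It assumes stride 1 and an output of the same spatial size as the input; the layer dimensions are hypothetical and are not taken from the EfficientNetB0 configuration.

```python
def standard_conv_macs(h, w, k, c_in, c_out):
    """MACs for a k x k standard convolution producing an h x w x c_out output."""
    return h * w * k * k * c_in * c_out


def separable_conv_macs(h, w, k, c_in, c_out):
    """MACs for a k x k depthwise stage followed by a 1 x 1 pointwise stage."""
    depthwise = h * w * k * k * c_in   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 mixing across channels
    return depthwise + pointwise


# Hypothetical layer dimensions chosen only for illustration.
h, w, k, c_in, c_out = 112, 112, 3, 32, 64
std = standard_conv_macs(h, w, k, c_in, c_out)
sep = separable_conv_macs(h, w, k, c_in, c_out)
ratio = sep / std  # algebraically equal to 1 / c_out + 1 / k**2
```

For these example dimensions the separable form needs roughly 13% of the MACs of the standard convolution, and the ratio depends only on the kernel size and the number of output channels.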
A depthwise separable convolution comprises two stages: depthwise and pointwise convolution. The depthwise convolution applies a single filter per input channel, and the pointwise convolution then combines the depthwise outputs across channels using a 1×1 convolution. This can be represented as:

$$DW(x, y, c) = \sum_{i=-a}^{a} \sum_{j=-b}^{b} K_c(i, j) \cdot F_{in}(x - i, y - j, c) \tag{6}$$

$$PW(x, y, c') = \sum_{c=1}^{C} K'(c', c) \cdot DW(x, y, c) \tag{7}$$

where $DW$ denotes the output of the depthwise convolution for channel $c$, $PW$ is the output of the pointwise convolution for channel $c'$, $K_c$ is the kernel for the depthwise convolution, and $K'$ is the 1×1 kernel for the pointwise convolution, with $K'(c', c)$ the weight connecting input channel $c$ to output channel $c'$. $C$ is the number of input channels. Activation functions such as the Swish function, defined as $f(x) = x \cdot \mathrm{sigmoid}(x)$, are applied after convolutional operations to introduce non-linearity, enabling the network to learn complex features. By integrating these elements, EfficientNetB0 provides a powerful and efficient framework for feature extraction.

3.3. Dynamic Blurring for Privacy Preservation

We address privacy concerns in video-based monitoring by implementing dynamic blurring. This process involves the selective obfuscation of regions of interest (ROI) within video frames, specifically targeting identifiable features of individuals to maintain anonymity while preserving the utility of the data for fall detection. The identification of ROIs for blurring is governed by a detection function $D(F_{in}, \theta)$, where $F_{in}$ represents an input frame and $\theta$ denotes the parameters of the detection model, which may include facial recognition, pose estimation, or other relevant feature detection algorithms. The output of this function is a set of bounding boxes $B = \{b_1, b_2, \ldots, b_n\}$, where each $b_i$ specifies the coordinates and dimensions of an ROI within the frame.
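To make the ROI-restricted blurring concrete, the following minimal sketch applies a Gaussian kernel only to the pixels inside each bounding box and leaves the rest of the frame untouched. It is an illustration under stated assumptions, not the authors' implementation: the frame is a single-channel array, each box is given as `(x, y, w, h)`, the box list is a hypothetical detector output, and the function names `gaussian_kernel` and `blur_rois` are introduced here. In an OpenCV pipeline, a library blur applied to the cropped region would be the more usual choice.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Sample a 2D Gaussian on a (2*radius+1)^2 grid and renormalize to sum to 1."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return kernel / kernel.sum()


def blur_rois(frame, boxes, sigma=1.0):
    """Blur only the pixels inside each (x, y, w, h) box of a grayscale frame."""
    radius = int(3 * sigma)  # kernel spans about three standard deviations per side
    kernel = gaussian_kernel(sigma, radius)
    source = frame.astype(np.float64)
    padded = np.pad(source, radius, mode="edge")  # replicate borders for edge pixels
    out = source.copy()
    height, width = frame.shape
    for (x, y, w, h) in boxes:
        for row in range(y, min(y + h, height)):
            for col in range(x, min(x + w, width)):
                # Window centered on (row, col) in the padded frame.
                window = padded[row:row + 2 * radius + 1, col:col + 2 * radius + 1]
                out[row, col] = np.sum(window * kernel)
    return out


rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)  # synthetic noisy frame
boxes = [(8, 8, 10, 10)]  # hypothetical detector output
blurred = blur_rois(frame, boxes)
```

Because the Gaussian kernel is symmetric, the windowed weighted sum used here is equivalent to a convolution. Every blurred pixel is computed from the original frame, so overlapping boxes do not compound the blur.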
The dynamic blurring is then applied to these identified ROIs using a Gaussian blur operation, mathematically described as:

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{8}$$

where $(x, y)$ are the coordinates relative to the center of the kernel, and $\sigma$ is the standard deviation, which controls the extent of blurring. The size of the kernel, $k \times k$, is chosen based on the desired level of blurriness and is typically set to several times the value of $\sigma$ so that the edges of the kernel contribute negligibly to the blur. The application of the Gaussian blur to an ROI $b_i$ within the frame $F_{in}$ can be represented as:

$$F_{blurred}(x, y) = (F_{in} * G)(x, y) = \sum_{m=-a}^{a} \sum_{n=-b}^{b} F_{in}(x - m, y - n) \cdot G(m, n, \sigma) \tag{9}$$

for all $(x, y)$ within $b_i$, where $*$ denotes the convolution operation, and $a$ and $b$ are half the width and height of the Gaussian kernel, respectively.

3.4. Temporal Analysis with LSTM

The LSTM network is a specialized RNN designed to model temporal dependencies in sequence data effectively. Its architecture is suited to addressing the vanishing gradient problem, enabling it to capture long-term dependencies. An LSTM unit comprises three main gates: the input gate (i), the forget gate (f), and the output gate (o), each responsible for regulating the flow of information. We utilized a bidirectional LSTM with the structure shown in Figure 2.

Figure 2. The architecture used for the LSTM Network

3.5. Classification Framework

Following the extraction of temporal features, the next step is classification, which distinguishes between fall and non-fall events.
This process typically passes the LSTM output through one or more fully connected layers followed by a SoftMax layer for binary classification:

$$z = W_h \cdot h_t + b_h \tag{10}$$

$$p_i = \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}} \tag{11}$$

where $h_t$ is the output from the LSTM at time $t$, $W_h$ and $b_h$ are the weights and biases for the dense layer, respectively, $z$ is the vector of logits, and $p$ represents the probabilities for each class obtained through the SoftMax function. The class with the highest probability is selected as the predicted class for each input sequence. This framework classifies video sequences into fall or non-fall categories based on the temporal patterns identified by the LSTM network.

3.6. Training Process

The training of the integrated model is underpinned by a mathematical framework that includes the definition of a loss function, the selection of an optimization algorithm, and the application of regularization techniques to prevent overfitting. The loss function $L$ quantifies the discrepancy between the predicted outputs $p$ and the true labels $y$. For binary classification tasks, such as fall detection, the binary cross-entropy loss is commonly used:

$$L(y, p) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] \tag{15}$$

where $N$ is the number of samples, $y_i$ is the true label, and $p_i$ is the predicted probability for the i-th sample. The model parameters are optimized through Stochastic Gradient Descent (SGD), which iteratively updates the weights $W$ based on the gradients of the loss function:

$$W_{t+1} = W_t - \eta \nabla L \tag{16}$$

where $\eta$ is the learning rate, and $\nabla L$ denotes the gradient of the loss function with respect to the weights at step $t$. L2 regularization and dropout are applied to mitigate overfitting, by adding a penalty term to the loss function and by randomly omitting units from the network during training, respectively.
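The loss in Eq. (15) and the update in Eq. (16) can be checked numerically with a small, framework-free sketch. The labels, predicted probabilities, weights, and gradients below are made-up values for illustration; the clipping constant `eps` is an added assumption that keeps the logarithm finite and is not part of the equations in the text.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Eq. (15): mean negative log-likelihood over N samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def sgd_step(weights, grads, lr):
    """Eq. (16): W_{t+1} = W_t - eta * grad L, applied element-wise."""
    return [w - lr * g for w, g in zip(weights, grads)]


labels = [1, 0, 1, 0]            # 1 = fall, 0 = non-fall (illustrative)
confident = [0.9, 0.1, 0.8, 0.2]  # predictions close to the labels
uncertain = [0.5, 0.5, 0.5, 0.5]  # uninformative predictions

loss_confident = binary_cross_entropy(labels, confident)
loss_uncertain = binary_cross_entropy(labels, uncertain)  # equals ln(2)

new_weights = sgd_step([0.5, -0.3], [0.2, -0.1], lr=0.1)
```

Predictions that agree with the labels yield a lower loss than uninformative ones, and each weight moves against the sign of its gradient by an amount set by the learning rate.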
The backpropagation process computes the gradients ∇L through the network, employing the chain rule to propagate errors from the output layer back through the LSTM and EfficientNetB0 layers, so that the model can adjust its parameters to minimize the loss function.

4. Results and Analysis

Figure 3 (left) plots the model accuracy, i.e., the proportion of correctly classified instances, against the number of epochs for both the training and validation datasets. The training accuracy shows a consistent upward trend, indicating that the model is learning and improving its performance on the training data as the epochs progress. While generally following an upward trajectory, the validation accuracy exhibits some fluctuations. This could indicate that the model encounters challenging or previously unseen data in the validation set. These expected fluctuations indicate how the model might perform when exposed to new data. Both the training and validation accuracies converge to high values close to 1.0, suggesting that the model has achieved a high level of proficiency in distinguishing between fall and non-fall events.

Figure 3 (right) shows the model’s loss over the same number of epochs. For the training set, the loss decreases sharply and continues to decline steadily, which is typical behavior as the model adjusts its weights to minimize the prediction error. For the validation set, the loss decreases in tandem with the training loss but with notable spikes at specific points. These spikes often signify that the model’s predictions were significantly off the actual labels for some batches in the validation set. This can happen if the model encounters data points that differ from the patterns learned during training.

Figure 3. Accuracy and Loss graphs of the proposed LSTM model

The confusion matrix in Figure 4 provides a quantitative assessment of the model’s classification accuracy. It shows the number of true positive (TP) and true negative (TN) predictions, along with false positive (FP) and false negative (FN) predictions. In this case, the matrix reveals perfect classification on the test data, with all fall events being correctly identified (6 TP) and no ADL events being misclassified as falls (0 FP), implying an exceptional level of model performance.

Figure 4. Confusion matrix of unseen test videos

The Receiver Operating Characteristic (ROC) curve in Figure 5 plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The area under the curve (AUC) approaches 1, which suggests excellent model performance, with a high true positive rate and a low false positive rate across threshold values.

Figure 5. ROC curve of the proposed model on unseen test data

Figure 6 shows the model’s performance on real test videos; the model correctly predicted this particular scenario as a ‘Fall,’ corroborated by the RGB image on the right, which clearly shows an individual in a prone position on the floor. Moreover, the depth image reveals the successful application of the dynamic blurring method. The individual’s features are indistinguishable, and the privacy-preserving objective of the method is evident. The contours and the general posture of the person are discernible, which is sufficient for fall detection purposes, but the finer details necessary for personal identification have been effectively obfuscated. The blurring technique implemented in the system is designed to activate upon detecting a human figure within the video frame, applying a Gaussian blur where the person is detected. This ensures that any potentially sensitive information is rendered non-identifiable, addressing the privacy concerns that are paramount in real-world applications of surveillance-based systems.
The obscured depth image confirms that the privacy-preserving measures do not impede the algorithm’s ability to detect a fall.

Figure 6. Model evaluation on the test video from the fall folder

Figure 7 shows the system’s prediction for this scene, labeled ‘ADL,’ which is validated by the RGB image on the right. It depicts an individual in an upright position, supporting the prediction that no fall has occurred. The prediction’s accuracy is a testament to the model’s ability to discern between fall and non-fall events. Furthermore, as in the previous fall scenario, the depth image demonstrates the application of the dynamic blurring technique. The individual’s detailed features are indistinct, ensuring privacy is maintained. Despite the blurring, essential characteristics for ADL recognition, such as the vertical orientation of the body and the absence of unusual postures associated with falls, are preserved and remain detectable by the system.

Figure 7. Model Evaluation on the test video from the ADL folder

The analysis of the presented results underscores the robustness and reliability of the implemented fall detection model. This is evidenced by the convergence of the accuracy and loss metrics, the unequivocal classification outcomes depicted in the confusion matrix, and the favorable diagnostic characteristics portrayed by the ROC curve. These results collectively affirm the model’s efficacy in accurately detecting fall events while preserving the privacy of individuals through dynamic blurring, as no identifiable features are discernible in the depth visualizations.

Table 1. Comparative analysis of results with other state-of-the-art models

Model/Study | TPR (%) | TNR (%) | Accuracy (%)
--- | --- | --- | ---
Eltahir et al. [40] | 95.88 | 97.02 | 97.56
Chan Su [38] | 98.07 | 99.03 | 98.06
Single stream (RGB) [22] | 100 | 96.61 | 96.99
Single stream (OF) [22] | 100 | 96.34 | 96.75
Multi-stream (RGB+OF+PE) [22] | 100 | 98.61 | 98.77
EfficientNet-B0 [41] | 93.33 | 100 | 97.14
Improved YOLOv5s [39] | – | – | 97.2
A single-frame human binary image with YOLOv5s [42] | – | – | 96.7
Our Method | 100 | 100 | 100

Table 1 compares fall detection methodologies in terms of the True Positive Rate (TPR), True Negative Rate (TNR), and overall Accuracy, the pivotal metrics for assessing each method. The model presented by Eltahir et al. [40] shows a commendable balance between sensitivity and specificity, with a TPR of 95.88% and a TNR of 97.02%, and an accuracy of 97.56%. Chan Su’s model [38] slightly improves sensitivity to 98.07% and specificity to 99.03%, with a comparable accuracy of 98.06%. These two models set a robust baseline in fall detection.

The single-stream models using RGB and Optical Flow (OF) data individually attain a TPR of 100%, indicating that they identified every fall event. However, their specificity scores of 96.61% and 96.34%, respectively, although high, suggest a slightly weaker capacity to classify non-fall activities accurately. This discrepancy is reflected in their accuracy scores of 96.99% and 96.75%, which remain below that of Chan Su’s model.

Figure 8. Comparison of Accuracy with other state-of-the-art models

The multi-stream approach combining RGB, OF, and Pose Estimation (PE) data represents a significant step forward, yielding a perfect TPR and an enhanced TNR of 98.61%, leading to an accuracy of 98.77%. This approach underscores the utility of integrating multiple data streams for improved specificity without compromising sensitivity.
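For reference, the three metrics compared here follow directly from the confusion-matrix counts. The sketch below is only illustrative: the 6 true positives come from the text, whereas the number of true negatives (8) is a hypothetical value, since the ADL test count is not stated. The function name `classification_rates` is likewise introduced here.

```python
def classification_rates(tp, tn, fp, fn):
    """Return (TPR, TNR, accuracy) from confusion-matrix counts."""
    tpr = tp / (tp + fn)                        # sensitivity / recall
    tnr = tn / (tn + fp)                        # specificity
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return tpr, tnr, accuracy


# 6 TP is taken from the text; tn=8 is a hypothetical ADL test count.
perfect = classification_rates(tp=6, tn=8, fp=0, fn=0)
one_miss = classification_rates(tp=5, tn=8, fp=0, fn=1)
```

Under these assumed counts, a single missed fall would lower the TPR from 100% to about 83% and the accuracy to about 93%, which illustrates how strongly each sample weighs on the metrics when the test set is small.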
EfficientNet-B0 [41], despite a lower TPR of 93.33%, achieves a perfect TNR of 100%. This highlights the model’s strong performance in identifying non-fall events, though its overall accuracy falls short of the multi-stream model’s. The improved YOLOv5s model [39] and the single-frame human binary image approach using YOLOv5s [42] do not disclose TPR or TNR but report accuracies of 97.2% and 96.7%, respectively. While these figures suggest competent models, the lack of detailed TPR and TNR data precludes a complete comparative analysis.

Our proposed methodology records a TPR and TNR of 100% and an accuracy of 100%, the highest values among the compared methods. This performance indicates a superior ability to correctly identify fall incidents together with high precision in confirming non-fall activities.

5. Conclusion

In this research, we have developed and evaluated a novel video-based fall detection system that prioritizes privacy without compromising the real-time detection of elderly falls. By integrating EfficientNetB0 and LSTM networks, our methodology ensures robust feature extraction and accurate fall event classification. The introduction of dynamic blurring as a privacy-preserving technique represents a significant advancement, allowing identifiable features within video frames to be anonymized while maintaining the system’s operational integrity.

Our findings reveal that this approach achieves perfect accuracy, recall, precision, and AUC scores. It also effectively addresses the critical privacy concerns of video surveillance in sensitive environments such as homes and elderly care facilities. Implementing dynamic blurring ensures that the privacy of monitored individuals is safeguarded, setting a new precedent in the ethical application of surveillance technologies in healthcare.
Our future research will focus on further enhancing the adaptability and generalizability of the system across diverse settings and populations. This includes exploring additional privacy-preserving mechanisms and integrating multimodal data sources to enrich the system’s contextual understanding.

This research contributes significantly to elderly care technology, presenting a practical solution to the long-standing challenge of balancing effective fall detection with stringent privacy requirements. Our work advances the technological capabilities in this domain and addresses critical ethical considerations. This paves the way for broader acceptance and deployment of video-based monitoring systems in healthcare settings.

References

[1] W. W. Fu, T. S. Fu, R. Jing, S. R. McFaull, and M. D. Cusimano, “Predictors of falls and mortality among elderly adults with traumatic brain injury: a nationwide, population-based study,” PloS One, vol. 12, no. 4, p. e0175868, 2017.
[2] A. Kehoe, J. E. Smith, A. Edwards, D. Yates, and F. Lecky, “The changing face of major trauma in the UK,” Emerg. Med. J., vol. 32, no. 12, pp. 911–915, 2015.
[3] S. K. Gharghan and H. A. Hashim, “A comprehensive review of elderly fall detection using wireless communication and artificial intelligence techniques,” Measurement, p. 114186, 2024.
[4] D. Egeonu and B. Jia, “A systematic literature review of computer vision-based biomechanical models for physical workload estimation,” Ergonomics, pp. 1–24, Jan. 2024.
[5] P. Khatiwada, B. Yang, J.-C. Lin, and B. Blobel, “Patient-Generated Health Data (PGHD): Understanding, Requirements, Challenges, and Existing Techniques for Data Security and Privacy,” J. Pers. Med., vol. 14, no. 3, p. 282, 2024.
[6] M. Qaraqe et al., “PublicVision: A Secure Smart Surveillance System for Crowd Behavior Recognition,” IEEE Access, vol. 12, pp. 26474–26491, 2024.
[7] R. Rajagopalan, I. Litvan, and T.-P. Jung, “Fall prediction and prevention systems: recent trends, challenges, and future research directions,” Sensors, vol. 17, no. 11, p. 2509, 2017.
[8] V. Mehta, A. Dhall, S. Pal, and S. S. Khan, “Motion and region aware adversarial learning for fall detection with thermal imaging,” in 2020 25th international conference on pattern recognition (ICPR), IEEE, 2021, pp. 6321–6328.
[9] S. Ravi, P. Climent-Pérez, and F. Florez-Revuelta, “A review on visual privacy preservation techniques for active and assisted living,” Multimed. Tools Appl., vol. 83, no. 5, pp. 14715–14755, Jul. 2023, doi: 10.1007/s11042-023-15775-2.
[10] P. Pierleoni, A. Belli, L. Palma, M. Pellegrini, L. Pernini, and S. Valenti, “A high reliability wearable device for elderly fall detection,” IEEE Sens. J., vol. 15, no. 8, pp. 4544–4553, 2015.
[11] M. N. Alrasheedy, R. C. Muniyandi, and F. Fauzi, “Text-Based Emotion Detection and Applications: A Literature Review,” in 2022 International Conference on Cyber Resilience (ICCR), IEEE, Oct. 2022, pp. 1–9.
[12] A. Kwong, J. H. Muzamal, and Z. Khan, “Privacy Pro: Spam Calls Detection Using Voice Signature Analysis and Behavior-Based Filtering,” in 2022 17th International Conference on Emerging Technologies (ICET), IEEE, Nov. 2022, pp. 184–189.
[13] E. Torti et al., “Embedded real-time fall detection with deep learning on wearable devices,” in 2018 21st euromicro conference on digital system design (DSD), IEEE, 2018, pp. 405–412.
[14] Y. S. Delahoz and M. A. Labrador, “Survey on fall detection and fall prevention using wearable and external sensors,” Sensors, vol. 14, no. 10, pp. 19806–19842, 2014.
[15] S. Nooruddin, M. M. Islam, F. A. Sharna, H. Alhetari, and M. N. Kabir, “Sensor-based fall detection systems: a review,” J. Ambient Intell. Humaniz. Comput., vol. 13, no. 5, pp. 2735–2751, 2022.
[16] A. Singh, S. U. Rehman, S. Yongchareon, and P. H. J. Chong, “Sensor technologies for fall detection systems: A review,” IEEE Sens. J., vol. 20, no. 13, pp. 6889–6919, 2020.
[17] B. Koonce, “EfficientNet,” in Convolutional Neural Networks with Swift for Tensorflow, Berkeley, CA: Apress, 2021, pp. 109–123. doi: 10.1007/978-1-4842-6168-2_10.
[18] A. Graves, “Long Short-Term Memory,” in Supervised Sequence Labelling with Recurrent Neural Networks, in Studies in Computational Intelligence, vol. 385. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 37–45. doi: 10.1007/978-3-642-24797-2_4.
[19] D. M. Karantonis, M. R. Narayanan, M. Mathie, N. H. Lovell, and B. G. Celler, “Implementation of a real-time human movement classifier using a triaxial accelerometer for ambulatory monitoring,” IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 1, pp. 156–167, 2006.
[20] A. K. Bourke, J. V. O’brien, and G. M. Lyons, “Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm,” Gait Posture, vol. 26, no. 2, pp. 194–199, 2007.
[21] D. A. Ganz, Y. Bao, P. G. Shekelle, and L. Z. Rubenstein, “Will my patient fall?,” Jama, vol. 297, no. 1, pp. 77–86, 2007.
[22] S. A. Carneiro, G. P. da Silva, G. V. Leite, R. Moreno, S. J. F. Guimaraes, and H. Pedrini, “Multi-stream deep convolutional network using high-level features applied to fall detection in video sequences,” in 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), IEEE, 2019, pp. 293–298.
[23] N. Kaur, S. Rani, and S. Kaur, “Real-time video surveillance based human fall detection system using hybrid haar cascade classifier,” Multimed. Tools Appl., pp. 1–19, 2024.
[24] A. Núñez-Marcos and I. Arganda-Carreras, “Transformer-based fall detection in videos,” Eng. Appl. Artif. Intell., vol. 132, p. 107937, 2024.
[25] J. Moore et al., “Contextualizing remote fall risk: Video data capture and implementing ethical AI,” NPJ Digit. Med., vol. 7, no. 1, p. 61, 2024.
[26] V. Fula and P. Moreno, “Wrist-Based Fall Detection: Towards Generalization across Datasets,” Sensors, vol. 24, no. 5, p. 
1679, 2024. [27] A. Bansal, R. Sharma, and M. Kathuria, “A Vision-Based Approach to Enhance Fall Detection with Fine-Tuned Faster R-CNN,” in 2023 International Conference on Advanced Computing & Communication Technologies (ICACCTech), IEEE, 2023, pp. 678–684. [28] J. Gutiérrez, V. Rodríguez, and S. Martin, “Comprehensive review of vision-based fall detection systems,” Sensors, vol. 21, no. 3, p. 947, 2021. [29] S. Ezatzadeh and M. R. Keyvanpour, “Fall detection for elderly in assisted environments: Video surveillance systems and challenges,” in 2017 9th international conference on information and knowledge technology (ikt), IEEE, 2017, pp. 93–98. Accessed: Mar. 19, 2024. [30] I. Charfi, J. Miteran, J. Dubois, M. Atri, and R. Tourki, “Optimized spatio-temporal descriptors for real-time fall detection: comparison of support vector machine and Adaboost- based classification,” J. Electron. Imaging, vol. 22, no. 4, pp. 041106–041106, 2013. [31] F.-Y. Leu, C.-Y. Ko, Y.-C. Lin, H. Susanto, and H.-C. Yu, “Fall detection and motion classification by using decision tree on mobile phone,” in Smart Sensors Networks, Elsevier, 2017, pp. 205–237. Accessed: Mar. 19, 2024. [32] Y.-Z. Hsieh and Y.-L. Jeng, “Development of home intelligent fall detection IoT system based on feedback optical flow convolutional neural network,” Ieee Access, vol. 6, pp. 6048– 6057, 2017. [33] P. Vallabh and R. Malekian, “Fall detection monitoring systems: a comprehensive review,” J. Ambient Intell. Humaniz. Comput., vol. 9, no. 6, pp. 1809–1833, 2018. [34] R. Igual, C. Medrano, and I. Plaza, “Challenges, issues and trends in fall detection systems,” Biomed. Eng. OnLine, vol. 12, no. 1, p. 66, 2013, doi: 10.1186/1475-925X-12-66. [35] Y. Fan, M. D. Levine, G. Wen, and S. Qiu, “A deep neural network for real-time detection of falling humans in naturally occurring scenes,” Neurocomputing, vol. 260, pp. 43–58, 2017. [36] D. Singh, M. Gupta, and R. 
Kumar, “BGR Images-Based Human Fall Detection Using ResNet- 50 and LSTM,” in Third Congress on Intelligent Systems, vol. 608, S. Kumar, H. Sharma, K. Balachandran, J. H. Kim, and J. C. Bansal, Eds., in Lecture Notes in Networks and Systems, vol. 608. , Singapore: Springer Nature Singapore, 2023, pp. 175–186. doi: 10.1007/978-981-19- 9225-4_14. [37] N. Lu, Y. Wu, L. Feng, and J. Song, “Deep learning for fall detection: Three-dimensional CNN combined with LSTM on video kinematic data,” IEEE J. Biomed. Health Inform., vol. 23, no. 1, pp. 314–323, 2018. [38] C. Su, J. Wei, D. Lin, L. Kong, and Y. L. Guan, “A novel model for fall detection and action recognition combined lightweight 3D-CNN and convolutional LSTM networks,” Pattern Anal. Appl., vol. 27, no. 1, pp. 1–16, 2024. [39] T. Chen, Z. Ding, and B. Li, “Elderly fall detection based on improved YOLOv5s network,” IEEE Access, vol. 10, pp. 91273–91282, 2022. [40] M. M. Eltahir et al., “Deep Transfer Learning-Enabled Activity Identification and Fall Detection for Disabled People.,” Comput. Mater. Contin., vol. 75, no. 2, 2023, Accessed: Mar. 18, 2024. [41] S. Hwang, M. Ki, S.-H. Lee, S. Park, and B.-K. Jeon, “Cut and continuous paste towards real- time deep fall detection,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2022, pp. 1775–1779. Accessed: Mar. 18, 2024. [42] Y. Wang, R. Song, and X. Zhang, “Real-time human fall recognition based on deep learning methods and single depth image with privacy requirements,” in 2022 37th Youth Academic Annual Conference of Chinese Association of Automation (YAC), IEEE, 2022, pp. 1548–1553. Accessed: Mar. 18, 2024. [43] B. Kwolek and M. Kepski, “Human fall detection on embedded platform using depth maps and wireless accelerometer,” Comput. Methods Programs Biomed., vol. 117, no. 3, pp. 489– 501, 2014.