<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>X (S. Yakovlev);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
<article-title>Hybrid deep learning model for deception detection in healthcare audio data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Yakovlev</string-name>
          <email>sergiy.yakovlev@p.lodz.pl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Khovrat</string-name>
          <email>artem.khovrat@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vitalii Volokhovskyi</string-name>
          <email>vitalii.volokhovskyi@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Kobziev</string-name>
          <email>volodymyr.kobziev@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksii Nazarov</string-name>
          <email>oleksii.nazarov1@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>14, Nauky, Ave., Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lodz University of Technology</institution>
          ,
          <addr-line>90-924 Lodz</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>V.N. Karazin Kharkiv National University</institution>
          ,
          <addr-line>4, Svobody, Sq., Kharkiv, 61022</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This paper investigates a hybrid deep learning model for detecting deception in healthcare audio data, addressing medical information falsification within insurance-based systems. A comprehensive approach transforms acoustic signals from patient-provider communications into structured representations suitable for linguistic analysis. The research proposes an integrated framework combining convolutional neural networks with bidirectional LSTM networks enhanced with attention mechanisms. The methodology includes multi-stage audio-to-text transformation with lexical analysis, statistical feature extraction, and a modified Apriori algorithm for identifying suspicious linguistic patterns. The hybrid RCNN architecture is evaluated against baseline methodologies including RNN, CNN, and Naive Bayes classifiers on medical audio datasets comprising doctor-patient communications and daily health checks. Results demonstrate 97% classification accuracy while maintaining computational efficiency, substantially outperforming alternative architectures. The hybrid approach exhibits superior discrimination by integrating local feature extraction with temporal sequence analysis, capturing both linguistic anomalies and contextual inconsistencies in manipulation attempts. Cross-dataset analysis reveals consistent performance across communication types with accuracy variation below 0.04. The findings demonstrate promising implementation prospects for smart healthcare monitoring systems where fraud detection is critical, particularly in resource-constrained environments. The study contributes to understanding hybrid neural architectures for deception detection and highlights research directions for enhancing operational capabilities in healthcare fraud prevention systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Modern healthcare monitoring tools include information recording systems between patients and
medical personnel, particularly during consultations with doctors, aimed at preserving patient
treatment history, improving the quality of medical service delivery, and simplifying documentation
management. Given the social orientation of the sector, a problem arises: falsification of patient
anamnesis and current condition to obtain unlawful benefits through the prescription of expensive
treatment at the expense of the state or insurance companies.</p>
      <p>
        The problem can be analyzed using video data; however, video recordings of doctor appointments
or conversations between medical staff and patients are not widespread in the industry and are
typically used only in telemedicine. Additionally, deepfake detection technologies, both contextual
and general, require substantial computational resources [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The most widely used approach is the analysis
of textual information [2-4]. However, it conveys only a limited amount of information about the
patient while being filtered through the perception of medical personnel. Therefore, the decision was
made to analyze audio recordings, which is a common approach in the medical field and fully
conveys the interaction between patient and doctor.
      </p>
      <p>
        Given the volume of such interactions, relying exclusively on human resources for analysis and
detection of data falsification is not viable, especially for Ukraine under conditions of military
conflict, where there is an increased need for medical services. Furthermore, outside the medical
field, data falsification detection tools have already demonstrated their effectiveness [
        <xref ref-type="bibr" rid="ref2">5</xref>
        ].
      </p>
      <p>
        At an international level, various anti-deception initiatives show promising advancements [
        <xref ref-type="bibr" rid="ref3">6</xref>
        ].
Computational verification tools spanning browser extensions and content analysis platforms have
emerged to flag suspicious text patterns and identify manipulations across multimedia channels.
Academic literature documents diverse analytical approaches within research communities
worldwide. Significant developments include MIT-led initiatives and Ukrainian research groups
examining acoustic tampering through machine learning systems [
        <xref ref-type="bibr" rid="ref4 ref5">7, 8</xref>
        ], alongside international
collaborations employing probability-based frameworks [
        <xref ref-type="bibr" rid="ref6">9</xref>
        ].
      </p>
      <p>
        Despite these innovations, most existing detection systems require substantial computational
resources and extensive labeled datasets to achieve acceptable accuracy, presenting significant
operational barriers [
        <xref ref-type="bibr" rid="ref7 ref8">10, 11</xref>
        ]. Within constrained processing environments, such requirements
create substantial implementation challenges, limiting practical deployment within medical
analytical systems.
      </p>
      <p>
        It should be noted that within the scope of this work, generative data fabrication using modern
artificial intelligence tools will not be considered, as it can only be observed in telemedicine and
requires separate instruments for detecting falsification [
        <xref ref-type="bibr" rid="ref10 ref9">12, 13</xref>
        ].
      </p>
      <p>
        To address the problem, a specialized classification model is proposed. It integrates audio-to-text
conversion capabilities and distributed processing, optimized for resource-efficient detection of
health information falsification in medical data. The approach incorporates a hybrid architecture
combining specialized convolutional neural networks with bidirectional long short-term memory
(BiLSTM) networks into an integrated analytical model (RCNN). Previous studies explored the efficiency of
multiple modalities [
        <xref ref-type="bibr" rid="ref12">14</xref>
        ] and validated similar approaches in detecting anomalies in medical
communication analysis [
        <xref ref-type="bibr" rid="ref8">11</xref>
        ]. Performance is compared with established baseline methodologies,
including conventional recurrent networks (RNN), convolutional systems (CNN), and probabilistic
classification models (especially the naïve Bayes classifier, NBC).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Linguistic manipulation indicators</title>
<p>Irrespective of technique, audio manipulation fundamentally seeks to alter information perception
to achieve specific objectives. Through systematic analysis of manipulated healthcare
communications, several key linguistic markers have been identified that frequently signal content
tampering:</p>
      <p>Strategic and medical-authority questioning patterns: Manufactured content often employs
rhetorical questions to create false uncertainty, particularly in communications carrying social
significance.</p>
      <p>Manipulated sentiment markers: Tampered content typically shows inconsistent emotional
signaling, replacing moderate terminology with extreme descriptors.</p>
      <p>Artificial emotional escalation: Fabricated messages frequently feature emotionally charged
terminology with motivation-based language, creating unnatural intensity shifts.</p>
      <p>Narrative fractures: Manipulated content commonly exhibits subtle structural inconsistencies
and logical contradictions.</p>
      <p>Atypical pronoun distribution: Manipulated content often displays statistically unusual
pronoun concentrations that attempt to mimic specific communication styles (particularly
journalistic conventions).</p>
      <p>Lexical discontinuities: Tampered recordings typically contain non-standard phrasing,
unusual transitional elements, and distinctive vocabulary shifts that signal content
boundaries.</p>
      <p>
        This analytical framework represents an evolving understanding rather than an exhaustive
model. Fabricated content frequently features condensed syntactical structures alongside various
linguistic anomalies [
        <xref ref-type="bibr" rid="ref13">15</xref>
        ]. Such elements contribute to classification complexity and influence
detection system calibration requirements. These characteristics may stem from inadequate
transcription accuracy, regional dialect particularities, or speaker-specific patterns including
code-switching behaviors.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology development</title>
      <p>Having established key manipulation indicators, this section outlines the detection approach and
implementation architecture.</p>
      <sec id="sec-3-1">
        <title>3.1. Audio-to-Text transformation</title>
        <p>A specialized transformation pipeline was developed to convert audio data into computational
representations suitable for deep analysis:</p>
        <p>
          Lexical Analysis: Input signals undergo speech-to-text conversion followed by tokenization,
stemming, and morphological normalization to create standardized linguistic units [
          <xref ref-type="bibr" rid="ref14">16</xref>
          ].
Statistical Feature Extraction: Text segments undergo feature extraction using term
frequency analysis with BM25 weighting to identify distributional anomalies [17], sentiment
intensity measurement using customized NLTK-based tools, contextual coherence metrics
measuring narrative consistency, and temporal pattern analysis examining cadence and
rhythm disruptions.
        </p>
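<p>The BM25 term weighting mentioned above can be illustrated with a minimal sketch. This is an illustration only, using the conventional defaults k1 = 1.5 and b = 0.75 and toy token lists; the paper does not specify its weighting parameters:</p>

```python
import math
from collections import Counter

def bm25_weights(docs, k1=1.5, b=0.75):
    """Weight each term in each tokenized document by BM25 (IDF times saturated TF)."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    doc_freq = Counter(term for d in docs for term in set(d))
    weighted = []
    for d in docs:
        term_freq = Counter(d)
        weights = {}
        for term, f in term_freq.items():
            idf = math.log(1.0 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            weights[term] = idf * (f * (k1 + 1.0)) / (
                f + k1 * (1.0 - b + b * len(d) / avg_len))
        weighted.append(weights)
    return weighted

# toy transcripts: repeated extreme descriptors stand out against common terms
docs = [["pain", "severe", "pain"], ["pain", "mild"], ["checkup", "routine"]]
weighted = bm25_weights(docs)
```

Terms that are both frequent within a segment and rare across the corpus receive the highest weights, which is what makes distributional anomalies visible.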
        <p>Manipulation Likelihood Calculation: Extracted features are compared against established
deception patterns using a specialized scoring model that generates a normalized
manipulation probability score.</p>
        <p>This primary pipeline was supplemented with additional analytical components:</p>
        <p>Thematic Pattern Recognition: A modified Apriori algorithm was implemented to identify
suspicious combinations of topics and terminology that frequently accompany audio deception attempts.</p>
        <p>Communication Context Classification: Audio segments are categorized into functional
types to enable context-appropriate medical analysis.</p>
        <p>Transcription Quality Assessment: The system analyzes transcription confidence scores to
modulate classification thresholds based on input quality.</p>
        <p>The Apriori algorithm selection reflects its computational efficiency, implementational flexibility,
and parallelization potential. Though originally designed for market basket analysis, this framework
was adapted for linguistic pattern identification through substantial modifications. The core
algorithm leverages the monotonicity principle in frequent pattern mining, where any subset of a
frequent pattern must also be frequent, enabling efficient candidate pruning.</p>
        <p>The modified implementation comprises four main components:</p>
        <p>Frequency Analysis: The system calculates support values S(I) = count(I)/n for linguistic
elements, where n represents the sentence count. Only elements exceeding the minimum support threshold
(empirically set at 0.15) continue to subsequent stages.</p>
        <p>Pattern Generation: The algorithm creates k+1 element combinations from k-element
patterns exceeding support thresholds. The implementation employs hash-based
acceleration techniques that reduced computational overhead by 47% compared to
conventional approaches.</p>
        <p>Association Rule Formation: The system generates statistical relationships from frequent
patterns based on confidence thresholds. Rules exceeding 0.75 confidence (determined
through cross-validation) are preserved. Additional metrics including conviction and lift
were incorporated to better evaluate rule significance.</p>
        <p>Pattern Prioritization: The system ranks identified patterns using a composite scoring
function combining support, confidence and context-specific relevance metrics.</p>
        <p>The implementation includes several performance enhancements including incremental database
reduction techniques and distributed processing using MapReduce frameworks. Benchmark testing
demonstrated 5.7x acceleration compared to sequential processing when analyzing large linguistic
datasets.</p>
        <p>The modified Apriori algorithm processes tokenized transcripts through the following
pseudocode implementation (Figure 1).</p>
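<p>The frequency-analysis, pattern-generation, and rule-formation stages can be sketched as follows. This is a minimal set-based illustration using the stated thresholds (support 0.15, confidence 0.75); the hash-based acceleration and MapReduce distribution of the actual implementation are omitted:</p>

```python
from itertools import combinations

def frequent_patterns(sentences, min_support=0.15):
    """Level-wise (Apriori) mining of frequent token sets across sentences."""
    n = len(sentences)
    sets = [frozenset(s) for s in sentences]
    def support(pattern):
        return sum(1 for s in sets if pattern.issubset(s)) / n
    # L1: single tokens meeting the minimum support threshold
    current = {frozenset([t]) for s in sets for t in s}
    current = {p for p in current if support(p) >= min_support}
    frequent = {p: support(p) for p in current}
    k = 1
    while current:
        # join step: build (k+1)-element candidates from frequent k-patterns
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # prune step (monotonicity): every k-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({p: support(p) for p in current})
        k += 1
    return frequent

def association_rules(frequent, min_confidence=0.75):
    """Form rules lhs -> rhs whose confidence meets the threshold."""
    rules = []
    for pattern, sup in frequent.items():
        if len(pattern) >= 2:
            for size in range(1, len(pattern)):
                for lhs in map(frozenset, combinations(pattern, size)):
                    confidence = sup / frequent[lhs]
                    if confidence >= min_confidence:
                        rules.append((lhs, pattern - lhs, confidence))
    return rules

# toy tokenized transcript (hypothetical sentences, not study data)
sentences = [["pain", "severe", "unbearable"], ["pain", "severe"],
             ["pain", "mild"], ["checkup"]]
patterns = frequent_patterns(sentences)
rules = association_rules(patterns)
```

<p>The prune step is where the monotonicity principle pays off: a candidate is discarded without a support scan as soon as any of its subsets is infrequent.</p>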
        <p>The algorithm complexity is O(n×m×k²) where n is sentence count, m is average sentence length,
and k is maximum pattern length. Apriori-derived features are concatenated with CNN-extracted
features before entering the BiLSTM layer, creating a 96-dimensional combined feature vector (64
from CNN + 32 from Apriori patterns). Support and confidence thresholds were determined through
grid search over ranges [0.10-0.25] and [0.65-0.85] respectively, evaluated using 3-fold
cross-validation on the training set. Sensitivity analysis showed ±0.03 accuracy variation within ±0.05
threshold adjustments, confirming reasonable stability.</p>
        <p>Following input processing, the next section examines the neural architecture for pattern
recognition and classification.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Neural network architecture</title>
        <p>The classification system employs a hybrid architecture combining feature extraction pathways with
temporal sequence analysis capabilities. This approach capitalizes on the complementary strengths
of different neural processing approaches: convolutional networks excel at identifying local
patterns and feature hierarchies, while recurrent networks capture sequential dependencies across
time steps.</p>
        <p>The architecture follows a multi-stream design illustrated in Figure 2. The system processes input
data through several coordinated stages:</p>
        <p>Feature Extraction: Linguistic embeddings first pass through three cascaded convolution and
max-pooling stages. This pathway progressively extracts increasingly abstract linguistic
features while reducing dimensionality from 300D word vectors to 64D feature
representations. This dimensional reduction addresses computational efficiency constraints
critical for deployment in resource-limited environments.</p>
        <p>Temporal Context Analysis: Processed features enter a bidirectional LSTM layer (128 units
per direction) with dropout regularization (0.3). This bidirectional approach enables
simultaneous analysis of preceding and subsequent contextual elements, capturing
dependencies that would be missed by unidirectional processing. Unlike transformer-based
approaches that require substantial computational resources, the BiLSTM implementation
achieves effective temporal modeling while maintaining deployment feasibility.</p>
        <p>Attention-Based Integration: An attention mechanism weighs the relative importance of
different sequence elements based on their contextual relevance, focusing computational
resources on the most informative segments. This approach particularly enhances
performance for longer audio sequences with varying information density.</p>
        <p>Classification Layers: The network concludes with two fully connected layers using ReLU
activation and a final softmax classification layer that outputs manipulation probability scores.</p>
        <p>Training employed consistent random seeds (42, 123, 456) across five independent runs to ensure
reproducibility. Stratified 5-fold cross-validation was applied on the training set for hyperparameter
selection, with the final test set (20%) held out completely until model selection was complete.</p>
        <p>
          Hyperparameter optimization employed Bayesian search methods guided by previous research findings [
          <xref ref-type="bibr" rid="ref1">1, 2</xref>
          ]. Key configuration decisions included:
        </p>
        <p>Convolutional Kernel Size: Optimal performance achieved with 5×5 kernels after evaluating
sizes ranging from 3×3 to 7×7.</p>
        <p>Learning Strategy: Adam optimizer with initial rate 0.001 and an exponential decay schedule.</p>
        <p>Mini-batch Size: Optimal throughput-accuracy balance at 64 samples.</p>
        <p>Training Duration Control: Early stopping with 10-epoch patience, typically converging
between 30 and 50 epochs.</p>
        <p>The loss function incorporated class weighting to reflect operational priorities, with 2.5× penalty
for false negatives (missed manipulations) compared to false positives. The complete model contains
approximately 2.3 million trainable parameters, substantially fewer than transformer-based
alternatives while maintaining competitive performance characteristics.</p>
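<p>The class weighting can be expressed as a weighted binary cross-entropy; a minimal NumPy sketch of the 2.5× false-negative penalty (an illustration, not the authors' TensorFlow code):</p>

```python
import numpy as np

def weighted_binary_cross_entropy(y_true, p_pred, fn_weight=2.5, fp_weight=1.0):
    """Binary cross-entropy in which missed manipulations (positives) cost 2.5x more."""
    eps = 1e-12
    p = np.clip(p_pred, eps, 1.0 - eps)
    loss = -(fn_weight * y_true * np.log(p)
             + fp_weight * (1.0 - y_true) * np.log(1.0 - p))
    return float(loss.mean())

# a confidently missed manipulation is penalized more than the mirror-image false alarm
miss_manipulation = weighted_binary_cross_entropy(np.array([1.0]), np.array([0.1]))
false_alarm = weighted_binary_cross_entropy(np.array([0.0]), np.array([0.9]))
```

<p>With symmetric prediction errors, the missed-manipulation loss is exactly 2.5 times the false-alarm loss, which is the operational priority the text describes.</p>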
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental design and evaluation</title>
      <p>To validate the approach, a comprehensive testing framework was created encompassing both
methodological validation and comparative performance analysis.</p>
      <sec id="sec-4-1">
        <title>4.1. Data selection and preparation</title>
        <p>Two distinct communication datasets were utilized for system evaluation:</p>
        <p>Doctor-Patient Communications Dataset: A specialized focus group recorded simulated
healthcare communications about treatment recommendations, medication advice, and
health guidance scenarios. This corpus features informal speech patterns, specialized medical
terminology, and non-standard linguistic constructions mimicking patient-provider
communications.</p>
        <p>Daily Health Checks Dataset: This dataset contains recordings of conversations during daily
patient check-ups conducted by nurses. The conversations involve observing for any changes in the
patient's condition, assessing their emotional and physical state, and providing emotional
support and clarifications about treatment.</p>
        <p>The Doctor-Patient Communications dataset comprises 2,847 audio recordings with total
duration of 127.3 hours, including 1,423 manipulated samples and 1,424 genuine communications.
The dataset includes 156 unique speakers (82 female, 74 male) with age distribution reflecting typical
patient demographics. Manipulated scenarios were created through scripted simulations where
actors deliberately incorporated deceptive linguistic patterns validated by medical fraud
investigators, while genuine communications were recorded from standardized medical role-play
exercises. The Daily Health Checks dataset contains 1,956 recordings totaling 84.6 hours, with 978
manipulated and 978 genuine samples from 94 speakers (51 female, 43 male). Both datasets
underwent independent labeling by three medical professionals with inter-annotator agreement of 0.82.</p>
        <p>Each dataset underwent initial processing through the Google Speech-to-Text API for conversion
to text format. Datasets were divided into training (80%) and evaluation (20%) segments using
stratified sampling methods to maintain representative class distribution.</p>
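<p>The stratified 80/20 partitioning can be sketched in plain Python (an illustration; the study's actual splitting utility is not specified):</p>

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, train_frac=0.8, seed=42):
    """Split samples 80/20 while preserving the class ratio in each partition."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    train, test = [], []
    for label, group in by_class.items():
        rng.shuffle(group)
        cut = int(round(train_frac * len(group)))
        train.extend((sample, label) for sample in group[:cut])
        test.extend((sample, label) for sample in group[cut:])
    return train, test

# balanced toy data mirroring the roughly 50/50 genuine/manipulated datasets above
samples = list(range(100))
labels = [i % 2 for i in range(100)]
train, test = stratified_split(samples, labels)
```

<p>Shuffling and cutting within each class, rather than over the pooled data, is what guarantees the representative class distribution mentioned above.</p>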
        <p>Implementation utilized Python 3.10 with specialized libraries including TensorFlow 2.9 for
neural network development and training, NumPy 1.24 for numerical computations and array
processing, NLTK 3.8 for natural language processing tasks including tokenization and linguistic
feature extraction, and Polars 0.18 for high-performance data manipulation and preprocessing of
large audio datasets. The audio processing pipeline incorporated librosa 0.10 for signal processing
and feature extraction, while scikit-learn 1.3 provided additional machine learning utilities for
baseline comparisons and evaluation metrics. System integration employed Kubernetes-based
orchestration for distributed processing capability, enabling horizontal scaling across multiple
computing nodes to handle large-volume audio analysis workloads. The deployment architecture
utilized Docker containerization for consistent environment management and Redis for distributed
caching of preprocessed features, reducing computational overhead during model inference phases.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Speech-to-Text Quality and Error Impact</title>
        <p>Audio-to-text conversion employed Google Speech-to-Text API with language model optimized for
Ukrainian medical terminology. Manual validation by native Ukrainian speakers with medical
transcription experience revealed STT (Speech-to-Text) accuracy of 94.7% for standard speech
patterns and 87.2% for dialectical variations. Common transcription errors included medical
terminology misrecognition (23% of errors), proper name confusion (18%), and dialect-specific
phonetic variations (31%).</p>
        <p>To assess STT error impact on downstream classification, we conducted robustness analysis by
artificially degrading transcription quality. The RCNN architecture maintained accuracy above 92%
even with 15% word error rate, demonstrating resilience to transcription imperfections. Performance
degradation became pronounced only when STT confidence scores fell below 0.65, at which point
the system automatically flags recordings for manual review. The attention mechanism proved
particularly valuable in mitigating STT errors by focusing on high-confidence segments while
downweighting uncertain transcriptions. Classification errors correlated strongly with segments
having mean STT confidence below 0.70 (r = -0.67, p &lt; 0.01).</p>
        <p>Systematic evaluation of transcription quality influence employed controlled degradation
experiments with synthetic STT errors at varying rates. Classification accuracy remained robust:
96.1% at 5% word error rate, 94.3% at 10% WER, and 92.7% at 15% WER. Critical threshold occurred
at approximately 20% WER where accuracy dropped to 88.4%. Analysis revealed errors affecting
content words degraded performance 2.3× more than function word errors, while medical
terminology misrecognitions produced 3.1× higher impact per affected word. The attention
mechanism partially mitigates STT errors by dynamically downweighting low-confidence segments,
with correlation of r = 0.71 between attention weights and STT confidence scores (p &lt; 0.001),
explaining system resilience to moderate transcription imperfections typical of real-world
deployments.</p>
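<p>The controlled-degradation experiments can be approximated by injecting synthetic word errors at a target rate. In this sketch the error model (uniform substitution/deletion, with "UNK" as the substitute marker) is an assumption, not the study's procedure:</p>

```python
import random

def inject_word_errors(tokens, wer=0.15, seed=0):
    """Simulate STT degradation: substitute or delete words at a target error rate."""
    rng = random.Random(seed)
    degraded = []
    for token in tokens:
        r = rng.random()
        if r >= wer:
            degraded.append(token)      # word transcribed correctly
        elif r >= wer / 2.0:
            degraded.append("UNK")      # substitution error (hypothetical marker)
        # otherwise: deletion error, the word is dropped
    return degraded

transcript = "patient reports severe chest pain since yesterday".split()
degraded = inject_word_errors(transcript, wer=0.20)
```

<p>Running the classifier on transcripts degraded at 5%, 10%, 15%, and 20% WER reproduces the kind of robustness curve reported above.</p>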
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Data validation and ethical considerations</title>
        <p>The medical datasets underwent rigorous validation by a panel of 12 healthcare professionals,
including practicing physicians and registered nurses. The validation methodology employed a
three-stage approach: clinical plausibility assessment, fraud pattern recognition by insurance
specialists from three major Ukrainian healthcare institutions, and cross-cultural validation to ensure
the system would not inadvertently flag legitimate regional linguistic variations. Inter-rater
reliability analysis confirmed consistent evaluation criteria across the validation panel.</p>
        <p>The study design incorporated comprehensive ethical safeguards aligned with international
research standards, specifically:</p>
        <p>Informed Consent Procedures: All participants involved in dataset creation provided written
informed consent after receiving detailed information about study objectives, data usage, and
privacy protection measures. Participants retained the right to withdraw their contributions
at any stage without penalty or explanation.</p>
        <p>Privacy Protection and Anonymization: Audio recordings underwent multi-stage
anonymization procedures including voice modulation, removal of personally identifiable
information, and replacement of specific medical details with clinically equivalent but
non-identifying alternatives. All processing occurred on secure, encrypted systems with access
limited to authorized research personnel.</p>
        <p>Cultural Sensitivity and Bias Mitigation: The dataset creation process incorporated systematic
bias assessment to ensure representative coverage across demographic groups,
socioeconomic backgrounds, and regional linguistic variations. Special attention was devoted
to preventing discrimination against vulnerable populations, including elderly patients,
individuals with disabilities, or those from minority communities.</p>
        <p>The datasets included proportional representation across age groups (18-30: 23%, 31-50: 41%,
51-70: 28%, 70+: 8%), gender distribution (52% female, 48% male), and regional linguistic variations
representing major Ukrainian dialect groups. Fabricated scenarios encompassed a broad spectrum of
medical conditions commonly encountered in Ukrainian healthcare settings. The study design
incorporated safeguards to prevent the system from flagging authentic expressions of pain, distress,
or legitimate medical concerns as potential fraud indicators. The linguistic analysis framework was
specifically calibrated to distinguish between authentic pain descriptions and artificially constructed
symptom narratives through consultation with pain management specialists. Given that mental
health conditions can affect speech patterns, the system underwent specialized testing to ensure that
symptoms of depression, anxiety, or cognitive impairment would not trigger false positive
classifications. Additionally, cultural anthropologists familiar with Ukrainian healthcare
communication norms reviewed the system to ensure that culturally specific expression patterns
would not be misinterpreted as deception indicators.</p>
        <p>All audio-to-text conversions underwent manual review by native Ukrainian speakers with
medical transcription experience, achieving accuracy rates of 94.7% for standard speech patterns and
87.2% for speech with dialectical variations. The comprehensive validation process resulted in
high-confidence dataset quality metrics: 94.2% of scenarios achieved consensus agreement on medical
accuracy, 89.7% alignment with documented real-world fraud patterns, and 96.1% approval across
diverse cultural reviewer groups. The research adhered to applicable data protection regulations,
including GDPR and Ukrainian personal data protection laws, with all data handling incorporating
access control, audit trails, and automatic deletion of raw recordings following anonymization
completion.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Performance metrics</title>
        <p>A multidimensional evaluation framework was established through consultation with 50 data
analysis specialists representing five countries. These subject matter experts helped define
appropriate weighted metrics reflecting operational priorities:</p>
        <p>Classification Quality: Assessment of precision and recall; reflecting operational contexts,
an 80:20 weighting was applied favoring precision over recall.</p>
        <p>Processing Efficiency: Evaluation of computational demands including processing time,
memory utilization, and hardware requirements. Weight coefficient: 6.</p>
        <p>Training Data Requirements: Assessment of minimum sample volume required to achieve
80% classification accuracy. Weight coefficient: 4.</p>
        <p>These metrics reflect the operational priorities in deployment scenarios where missed
manipulations carry greater consequences than false alarms, but where resource utilization remains
a critical constraint. To ensure comprehensive evaluation, a linear additive convolution (LAC)
formula was employed, combining the normalized metrics:

LAC = 0.5·S_A + 0.3·S_E + 0.2·S_D, (1)

where S_A is the classification accuracy score, S_E is the processing efficiency score, and S_D is the
data requirements score.</p>
        <p>To mitigate measurement variability, each metric was calculated through ten independent
measurement cycles with statistical outlier removal.</p>
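<p>Equation (1) combines the three normalized scores with weights 0.5, 0.3, and 0.2; a one-line sketch, where every example score except the reported 0.97 accuracy is a hypothetical placeholder:</p>

```python
def lac_score(accuracy, efficiency, data_requirements):
    """Linear additive convolution (1) over normalized [0, 1] metric scores."""
    return 0.5 * accuracy + 0.3 * efficiency + 0.2 * data_requirements

# the 0.97 accuracy is reported; the other two scores are hypothetical placeholders
composite = lac_score(0.97, 0.60, 0.50)
```

<p>Because the inputs are normalized to [0, 1] and the weights sum to 1, the composite score also lies in [0, 1].</p>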
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Resource Efficiency Measurements</title>
        <p>Computational efficiency metrics were measured on standardized hardware: Intel Xeon Gold 6248R
CPU (3.0 GHz, 24 cores), NVIDIA Tesla V100 GPU (32GB VRAM), and 128GB DDR4 RAM running
Ubuntu 20.04 with CUDA 11.8. Inference times represent mean processing duration for 15-second
audio segments (including STT conversion) averaged across 1,000 test samples. The RCNN achieves
average inference latency of 267ms with GPU memory footprint of 2,847MB and throughput of 3.7
samples/second. This represents 3.1× faster processing than the RNN baseline (748ms), while being
3.0× slower than NBC (86ms), reflecting the accuracy-efficiency trade-off. The
relative efficiency factor (PE scores in Table 1) normalizes these metrics against NBC using weighted
geometric mean accounting for both latency and memory utilization.</p>
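<p>The relative efficiency factor can be sketched as a weighted geometric mean of latency and memory ratios against the NBC baseline; the equal weight split and the NBC memory figure below are assumptions, as neither is stated in the paper:</p>

```python
def relative_efficiency(latency_ms, memory_mb, base_latency_ms, base_memory_mb,
                        w_latency=0.5, w_memory=0.5):
    """Weighted geometric mean of latency and memory ratios vs. the NBC baseline;
    values below 1.0 indicate the model is costlier than the baseline."""
    latency_ratio = base_latency_ms / latency_ms
    memory_ratio = base_memory_mb / memory_mb
    return (latency_ratio ** w_latency) * (memory_ratio ** w_memory)

# RCNN (267 ms, 2847 MB) against NBC; the 86 ms latency is reported above,
# while the 150 MB NBC memory footprint is a hypothetical placeholder
pe_rcnn = relative_efficiency(267.0, 2847.0, 86.0, 150.0)
```

<p>A geometric rather than arithmetic mean keeps the score scale-free: doubling both cost ratios halves the factor regardless of the units involved.</p>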
        <p>Comparative testing revealed significant performance variations across the evaluated architectural
approaches. Table 1 presents normalized performance metrics across all evaluated systems.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and analysis</title>
      <p>
        It should be noted that all metrics are normalized to the [0, 1] scale, with higher values
indicating better performance.
      </p>
      <sec id="sec-5-1">
        <title>5.1. Architectural performance comparison</title>
        <p>Experimental findings reveal distinct performance characteristics across the different
architectural approaches.</p>
        <p>The hybrid RCNN approach demonstrated exceptional classification performance (0.97 accuracy),
substantially outperforming alternative architectures. This superior discrimination capability stems
from the synergistic integration of local feature extraction with temporal sequence analysis. By
combining these complementary processing pathways, the system effectively captures both localized
linguistic anomalies and the broader contextual inconsistencies that typically accompany
manipulation attempts.</p>
        <p>The RNN implementation achieved moderate accuracy (0.87) by leveraging sequential contextual
processing but showed limitations in feature extraction efficiency. Most significantly, this
architecture exhibited the poorest computational performance profile, requiring approximately 2.8×
longer processing times than the RCNN implementation. This inefficiency primarily
results from the inherently sequential nature of recurrent processing that limits parallelization
opportunities.</p>
        <p>The CNN framework delivered acceptable accuracy (0.82) but showed particularly poor data
efficiency (0.237), requiring substantially larger training datasets to achieve reasonable performance.
This finding aligns with established understanding that convolutional architectures typically require
extensive example exposure to effectively generalize across diverse input variations. In
resource-constrained operational environments, this data requirement presents a significant deployment
barrier.</p>
        <p>The NBC implementation demonstrated superior computational efficiency but the lowest
classification accuracy (0.80). This probabilistic approach required minimal processing resources,
executing approximately 3.1× faster than the RCNN implementation, but showed inadequate
discrimination capabilities when confronted with manipulation patterns that maintain superficial
linguistic consistency while altering core meaning.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Integrated performance analysis</title>
        <p>Applying the weighted evaluation formula to the experimental results yielded these composite
performance scores: RCNN: 0.927, CNN: 0.487, RNN: 0.532, NBC: 0.891.</p>
        <p>These metrics demonstrate the RCNN architecture's superior overall performance despite
moderate computational demands. While the NBC approach achieved a respectable composite score,
this primarily resulted from its exceptional processing efficiency rather than effective detection
capability. The substantial performance gap between the RCNN implementation and alternative
approaches (a 0.036 margin over the next-best performer) suggests robust performance
advantages across various operational scenarios.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Cross-dataset performance stability</title>
        <p>Table 2 presents detailed per-dataset performance metrics demonstrating the RCNN architecture's
consistent discrimination capabilities across different healthcare communication contexts.</p>
        <p>(Table 2 layout: Accuracy, Precision, Recall, and F1-Score for each of RCNN, CNN, RNN, and
NBC, reported separately for the Doctor-Patient and Daily Health Checks datasets.)</p>
        <sec id="sec-5-3-11">
          <title>Per-dataset analysis</title>
          <p>Cross-dataset analysis reveals that RCNN maintains accuracy variation below 0.01 between
medical communication types, substantially outperforming alternative implementations. The CNN
degrades most noticeably on Daily Health Checks featuring non-standard conversational patterns,
while RNN and NBC likewise exhibit larger cross-dataset variation than the hybrid model.</p>
          <p>The stability of the RCNN framework across communication types can be attributed to several
key linguistic processing capabilities. Register formality variations, which distinguish formal
doctor-patient communications containing standardized medical terminology from informal daily health
checks featuring conversational lexicon, are effectively handled by the hybrid architecture's
contextual processing components. The bidirectional LSTM elements demonstrate particular
robustness in managing syntactic complexity differences, where manipulated texts exhibit
anomalous structures that manifest differently across communication types: through violations of
medical terminological hierarchy in formal dialogues versus artificially complex grammatical
constructions in informal conversations.</p>
          <p>Semantic coherence detection remains stable across contexts due to the attention mechanisms
that focus on semantic anomalies regardless of lexical content variations. The system's ability to
identify emotional congruence mismatches between stated emotional states and linguistic markers
proves particularly valuable in medical contexts, where authentic symptom descriptions typically
demonstrate natural emotional consistency while fabricated descriptions contain emotional breaks
or artificially intensified expressive elements. Additionally, discourse marker analysis reveals that
manipulated content frequently exhibits unusual patterns in connectivity markers (however,
therefore, consequently) that remain detectable across both formal and informal communication
types, contributing to consistent cross-dataset performance.</p>
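          <p>The discourse-marker analysis described above can be approximated by a simple
token-frequency profile; the marker list below is an illustrative subset, not the system's actual
lexicon.</p>
          <preformat>
```python
# Illustrative subset of connectivity markers (not the system's full lexicon).
DISCOURSE_MARKERS = {"however", "therefore", "consequently", "moreover", "thus"}

def marker_rate(text: str) -> float:
    """Fraction of tokens that are discourse/connectivity markers."""
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    if not tokens:
        return 0.0
    return sum(t in DISCOURSE_MARKERS for t in tokens) / len(tokens)
```
          </preformat>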
        </sec>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Generalization and Robustness Analysis</title>
        <p>To assess real-world applicability, the RCNN underwent evaluation on unseen speakers, dialectical
variations, and recording conditions not present in training data. Leave-one-speaker-out
cross-validation showed stable accuracy across held-out speakers (variation within 0.09), indicating robust
generalization beyond training speaker characteristics. Testing on regional dialect samples from
Lviv, Odesa, and Poltava oblasts (not represented in training) yielded an accuracy range of 91.7%-95.1%,
with performance degradation primarily attributable to STT confidence reduction in dialect-heavy
speech (mean confidence 0.73 vs 0.89 for standard Ukrainian).</p>
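        <p>The leave-one-speaker-out protocol can be sketched as follows; the sample layout (feature
vector, label, speaker id) is hypothetical.</p>
        <preformat>
```python
from typing import Iterator

Sample = tuple[list[float], int, str]  # (features, label, speaker_id)

def leave_one_speaker_out(
        samples: list[Sample]) -> Iterator[tuple[str, list[Sample], list[Sample]]]:
    """Yield (held_out_speaker, train_split, test_split) for every speaker."""
    for speaker in sorted({s[2] for s in samples}):
        train = [s for s in samples if s[2] != speaker]
        test = [s for s in samples if s[2] == speaker]
        yield speaker, train, test
```
        </preformat>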
        <p>Environmental noise robustness was evaluated by adding synthetic noise at varying SNR levels
(25dB, 15dB, 10dB) to test recordings. The system maintained above 90% accuracy down to 15dB SNR,
with graceful degradation to 83.4% at 10dB, a level typical of challenging clinical environments. Channel
effect simulation (telephone bandwidth limitation, codec artifacts) reduced accuracy by 6.2
percentage points, suggesting the need for channel-aware preprocessing in telephonic healthcare
applications. These results confirm reasonable generalization capabilities while highlighting specific
domains requiring targeted adaptation.</p>
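        <p>The noise-injection protocol can be reproduced in a few lines; white Gaussian noise is
assumed here and scaled so that the mixture reaches the target SNR.</p>
        <preformat>
```python
import math
import random

def add_noise_at_snr(signal, snr_db, rng=None):
    """Mix white Gaussian noise into `signal` at the requested SNR (dB)."""
    rng = rng or random.Random(0)
    p_signal = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_noise = sum(n * n for n in noise) / len(noise)
    # Scale noise so that p_signal / p_scaled_noise == 10 ** (snr_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(signal, noise)]
```
        </preformat>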
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Ablation Study</title>
        <p>Systematic ablation experiments quantified individual component contributions to system
performance. Removing the BiLSTM pathway produced the largest accuracy drop of 0.13, confirming
temporal sequence modeling as the architecture's most critical component. Eliminating CNN layers
reduced accuracy by 0.08, demonstrating substantial contribution from local feature extraction. The
attention mechanism provided 0.04 improvement, particularly benefiting longer audio segments
where selective focus proves valuable. Apriori-derived linguistic patterns contributed 0.03 accuracy
gain, validating integration of rule-based pattern mining with neural processing. Model compression
experiments indicate deployment-oriented optimization potential for
resource-constrained environments without catastrophic performance degradation.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Limitations and future research</title>
      <p>While the hybrid architecture demonstrates significant advantages in manipulation detection,
several important limitations warrant acknowledgment and suggest promising research directions.</p>
      <sec id="sec-6-1">
        <title>6.1. Current system constraints</title>
        <p>Despite implementation optimizations, the approach faces several operational challenges.</p>
        <p>Computational Resource Requirements: The bidirectional LSTM components create
substantial processing demands that may limit deployment in severely resource-constrained
environments. Although the architecture requires considerably fewer resources than
transformer-based alternatives, further optimization remains necessary for deployment on
edge devices with minimal processing capabilities.</p>
        <p>Training Data Dependencies: While the system demonstrates superior data efficiency
compared to alternatives, performance continues to depend on representative training
samples, a persistent challenge given the rapidly evolving nature of manipulation
technologies.</p>
        <p>Processing Latency Under Load: The current implementation achieves acceptable processing
speed (267ms average latency for 15-second audio segments) under ideal conditions, but
experiences significant performance degradation under resource contention or when
processing multiple streams simultaneously.</p>
        <p>Modality Limitations: The framework focuses exclusively on linguistic content analysis
without incorporating acoustic feature examination. This single-modality approach creates
potential vulnerabilities against falsification techniques that maintain linguistic consistency
while manipulating emotional tone.</p>
        <p>Adversarial Robustness: The system has not been evaluated against sophisticated adversarial
attacks specifically designed to evade detection. Malicious actors with knowledge of the
detection methodology could potentially craft manipulations that exploit architectural blind
spots, particularly by maintaining linguistic consistency metrics while introducing subtle
semantic distortions. Future work should assess robustness against adaptive adversaries
through red-team testing exercises.</p>
        <p>Real-world Fraud Complexity: The dataset comprises simulated manipulations created under
controlled conditions. Actual healthcare fraud may exhibit different characteristics,
including combinations of truthful and fabricated information, partial symptom
exaggeration rather than complete fabrication, and collaborative deception involving
multiple parties. System performance on genuine fraud cases requires validation through
partnerships with insurance investigation units.</p>
        <p>Language Dependence: While the system demonstrates cross-linguistic capability within
Slavic language families, performance on more structurally distinct languages remains
unverified. The linguistic markers driving detection may manifest differently across
language families.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Future research directions</title>
        <p>These limitations suggest several promising research opportunities. Future work should explore
architectural optimization through knowledge distillation techniques to create lightweight
deployment models, selective attention mechanisms to focus computational resources on potentially
problematic segments, and more efficient alternatives to LSTM components such as simplified GRU
units or attention-only architectures.</p>
        <p>Operational versatility could be extended through cross-domain generalization via domain
adaptation techniques to maintain performance across varied communication contexts, invariant
representation learning to capture domain-agnostic manipulation indicators, and transfer learning
approaches to leverage knowledge across related detection tasks. Addressing these research
directions would substantially enhance system capabilities while expanding potential application
domains.</p>
        <p>Multimodal integration represents another promising direction, incorporating acoustic features
alongside linguistic content to detect manipulation attempts that maintain textual consistency while
altering prosodic elements. Real-time processing capabilities through stream processing
architectures could enable continuous analysis of ongoing communications, while
privacy-preserving techniques such as federated learning could facilitate deployment across healthcare
institutions without centralizing sensitive data. These enhancements could create more robust
detection systems capable of identifying sophisticated manipulation attempts while meeting the
stringent requirements of healthcare environments.</p>
        <p>Additional promising directions include explainability enhancements through attention
visualization techniques that highlight specific linguistic patterns triggering classification decisions,
enabling medical staff to understand system reasoning and identify potential false positives.
Incremental learning capabilities could allow the system to adapt to evolving manipulation
techniques without complete retraining, addressing the challenge of rapidly changing fraud patterns.
Integration with electronic health record systems through standardized APIs would enable seamless
deployment within existing healthcare IT infrastructure, reducing implementation barriers. Finally,
multilingual extension beyond Slavic language families through cross-lingual transfer learning could
expand system applicability to diverse healthcare contexts, though this requires careful validation of
linguistic marker transferability across typologically distinct languages.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This study evaluated a hybrid deep learning architecture for detecting manipulated audio content in
healthcare contexts. Key contributions include the development of a specialized linguistic processing
framework, implementation of a multi-pathway neural architecture integrating convolutional and
recurrent elements, and creation of a comprehensive evaluation methodology balancing accuracy
with operational constraints. Experimental results validate the exceptional effectiveness of the
approach, achieving 97% classification accuracy while maintaining reasonable computational
efficiency and data requirements. Performance remained consistent across varied communication
types, suggesting strong generalization capabilities, though resource utilization analysis indicates
opportunities for further architectural optimization.</p>
      <p>The demonstrated superiority of hybrid architectures suggests broader applications beyond audio
manipulation detection, while the system's high accuracy with moderate training data requirements
presents advantages for operational deployment in specialized domains. Practical deployment will
require further optimization to reduce processing overhead while maintaining detection capabilities,
potentially through knowledge distillation.</p>
      <p>These findings contribute to broader research by demonstrating the effectiveness of specialized
hybrid architectures for deception detection, while highlighting critical research directions to
enhance operational capabilities in contested information environments.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The authors would like to thank the Armed Forces of Ukraine for the opportunity to carry out this
work during the full-scale invasion of the Russian Federation on the territory of Ukraine. Also, the
authors wish to extend their gratitude to Kharkiv National University of Radio Electronics for
providing licences for additional software to prepare algorithms and the paper. This study was partly
funded by the National Science Centre of Poland (project no. 2023/05/Y/ST6/00263, "…-Time Advanced
Computational Intelligence for Deep Fake Video…").</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly Edu and a submodule of Microsoft
365 in order to check grammar and spelling. After using these services, the authors reviewed and
edited the content as needed and take full responsibility for the publication's content.</p>
      <p>[2] H. Padalko, V. Chomko, and D. Chumachenko, A novel approach to fake news classification
using LSTM-based deep learning models, Frontiers in Big Data, Sec. Machine Learning and Artificial
Intelligence, vol. 6, pp. 1-18, 2023, doi: 10.3389/fdata.2023.1320800.</p>
      <p>[3] C. Fuller et al., in 12th Americas Conference on Information Systems, Acapulco, Mexico,
Aug. 4-6, 2006. AISeL, 2006, pp. 3465-3472.</p>
      <p>[4] L. Zhou et al.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>-based computer-mediated</article-title>
          <source>36th Annual Hawaii International Conference on System Sciences, Big Island, USA, Jan. 6-9</source>
          ,
          <year>2003</year>
          . IEEE Explore,
          <year>2003</year>
          , pp.
          <fpage>1</fpage>
          <lpage>10</lpage>
          . doi: 10.1109/HICSS.2003.1173793.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khovrat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kobziev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Nazarov</surname>
          </string-name>
          , and
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>Family to Increase the Efficiency of Forecasting Market Indicators During Social Disaster</article-title>
          , in Inform.
          <source>Technology &amp; Implementation</source>
          , Kyiv, Ukraine,
          <source>Nov. 30 Dec. 2</source>
          ,
          <year>2022</year>
          . CEUR Workshop,
          <year>2023</year>
          , pp.
          <fpage>222</fpage>
          <lpage>233</lpage>
          . Accessed: Jul. 7,
          <year>2025</year>
          . [Online]. Available: https://ceur-ws.org/Vol-3347/Paper_19.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Burgoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Blair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Nunamaker</surname>
          </string-name>
          , in
          <source>Intelligence and Security Informatics, First NSF/NIJ Symposium, Tucson, USA, June 2-3</source>
          ,
          <year>2003</year>
          . Springer Nature,
          <year>2003</year>
          , pp.
          <fpage>91</fpage>
          <lpage>101</lpage>
          . doi: 10.1007/3-540-44853-5_7.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>[7] Using transfer learning, spectrogram audio classification, and MIT app inventor to facilitate machine learning understanding 7</article-title>
          ,
          <year>2025</year>
          . [Online]. Available: https://dspace.mit.edu/handle/1721.1/127379.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yakovlev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khovrat</surname>
          </string-name>
          , and
          <string-name>
            <surname>V.</surname>
          </string-name>
          <article-title>Using Parallelized Neural Networks to Detect Falsified Audio Information in Socially Oriented Systems in Inform</article-title>
          .
          <source>Technology &amp; Implementation</source>
          , Kyiv, Ukraine,
          <source>Nov. 20 Nov. 21</source>
          ,
          <year>2023</year>
          . CEUR Workshop,
          <year>2024</year>
          , pp.
          <fpage>220</fpage>
          <lpage>238</lpage>
          . Accessed: Jul. 7,
          <year>2025</year>
          . [Online]. Available: https://ceur-ws.org/Vol-3624/Paper_19.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [9]
          <article-title>Discrete Hidden Markov Model for SMS Spam Detection</article-title>
          ,
          <source>Applied Sciences</source>
          , vol.
          <volume>10</volume>
          (
          <issue>14</issue>
          ),
          <year>2020</year>
          , art. 5011, doi: 10.3390/app10145011.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Alonso</surname>
          </string-name>
          et al.
          <article-title>Sentiment Analysis for Fake News Detection</article-title>
          ,
          <source>Electronics</source>
          , vol.
          <volume>10</volume>
          (
          <issue>11</issue>
          ),
          <year>2021</year>
          , art. 1348, doi: 10.3390/electronics10111348.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tolosana</surname>
          </string-name>
          et al.,
          <article-title>Deepfakes and beyond: A Survey of face manipulation and fake detection</article-title>
          ,
          <source>Information Fusion</source>
          , vol.
          <volume>64</volume>
          , pp.
          <fpage>131</fpage>
          <lpage>148</lpage>
          ,
          <year>2021</year>
          , doi: 10.1016/j.inffus.2020.06.014.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mcuba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Ikuesan</surname>
          </string-name>
          , and H. Venter, pp.
          <fpage>211</fpage>
          <lpage>219</lpage>
          ,
          <year>2023</year>
          , doi: 10.1016/j.procs.2023.01.283.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <article-title>Linguistic feature based learning model for fake news detection and classification</article-title>
          ,
          <source>Expert Systems with Applications</source>
          , vol.
          <volume>169</volume>
          ,
          <year>2021</year>
          , art. 114171, doi: 10.1016/j.eswa.2020.114171.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>Procedia Computer Science</source>
          , vol.
          <volume>219</volume>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Elbatanouny</surname>
          </string-name>
          et al.
          , doi: 10.1016/j.eswa.2025.127601.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Djenouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Belhadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. C.-W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Advanced Pattern-Mining System for Fake News Analysis</article-title>
          ,
          <source>IEEE Transactions on Computational Social Systems</source>
          , vol.
          <volume>10</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>2949</fpage>
          -
          <lpage>2958</lpage>
          ,
          <year>2023</year>
          , doi: 10.1109/TCSS.2022.3233408.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Padalko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chomko</surname>
          </string-name>
          , and
          <string-name>
            <surname>D. Chumachenko</surname>
          </string-name>
          <article-title>The Impact of Stopwords Removal on Disinformation Detection in Ukrainian language during Russian Ukrainian war in 4th</article-title>
          <source>International Workshop of IT-professionals on Artificial Intelligence</source>
          , Cambridge, MA, USA,
          <source>Sep. 25 Sep. 27</source>
          ,
          <year>2023</year>
          . CEUR Workshop,
          <year>2024</year>
          , pp.
          <fpage>87</fpage>
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>Foundations and Trends in Information Retrieval</source>
          , vol.
          <volume>3</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          ,
          <year>2023</year>
          , doi: 10.1561/1500000019.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>