<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>X (S. Yakovlev);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>14, Nauky, Ave., Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lodz University of Technology</institution>
          ,
          <addr-line>90-924 Lodz</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Sergiy Yakovlev</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>V.N. Karazin Kharkiv National University</institution>
          ,
          <addr-line>4, Svobody, Sq., Kharkiv, 61022</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The detection of fabricated information on interactive social platforms has gained significant academic and regulatory attention. During social instability, such disinformation presents substantial risks to individuals and society. Disinformation varies in impact, from harmless humor to content threatening societal stability. This study focuses on textual news content due to limitations in generating convincing visual forgeries. For text classification, three classical approaches are utilized: probabilistic models, neural networks, and polynomial models. Previous research has shown that a hybrid recurrent-convolutional network (RCNN) offers superior binary classification performance, yet multi-classification across diverse disinformation categories remains unresolved. This paper establishes a classification framework categorizing content through the RCNN model into five classes: explicit satire, subtle humor, content targeting individuals, regionally harmful news, and globally impactful disinformation. Based on this framework, three data categorization models were developed using neural networks, naive Bayes classification, and polynomial algorithms. Experimental evaluation measured accuracy, processing efficiency, and data reduction requirements to achieve &gt;80% accuracy. Results demonstrate that dual-layer implementations achieved approximately 20% improved effectiveness compared to standalone approaches. The RCNN-naive Bayes hybrid exhibited optimal accuracy and processing speed, showing considerable potential for high-throughput systems requiring swift responses to misinformation. These findings represent a significant advancement in automated information verification methodologies, establishing a foundation for future development in complex classification tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Bayesian classification</kwd>
        <kwd>computational linguistics</kwd>
        <kwd>distributed computing</kwd>
        <kwd>fake news</kwd>
        <kwd>neural networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The identification of fabricated content within interactive multi-user systems, particularly social
networks, has gained increasing prominence in both academic research and regulatory frameworks
[1, 2]. The growing attention toward social networks stems from their structural characteristics that
amplify the diffusion and perceived credibility of user-generated content. In contrast to static
platforms such as blogs or web forums, social networks enable real-time dissemination, algorithmic
curation, and micro-targeted reach, which collectively facilitate the rapid proliferation of fabricated
information. These systems often introduce significant asymmetries between the speed of content
publication and the pace of its verification, contributing to increased informational vulnerability
during periods of societal stress. Their ubiquitous presence and influence over public discourse
further underline the urgency of developing automated detection mechanisms tailored to their
dynamic and high-volume environments. This trend correlates with progressive advancements in
content generation technologies and the escalating informational burden experienced by the general
populace. During periods of social transformation, such content can significantly compromise both
personal welfare and collective societal functioning.</p>
      <p>To mitigate subjective evaluation in information assessment, implementation of data mining
methodologies represents a prudent approach. The specific methodological implementations are
directly contingent upon the characteristics of the data under investigation. Within the scope of the
current research, analysis was restricted to textual news content. This limitation was established due
to the current absence of technologies capable of producing video fabrications that demonstrate
visual authenticity from human perceptual perspectives. Concurrently, audio content demonstrates
limited prevalence within social network environments.</p>
      <p>Numerous methodologies have been proposed for textual data classification, ranging from
symbolic rule-based systems to advanced ensemble models. However, in the context of the present
study, focus was placed on three foundational methodological paradigms that are both widely
adopted and methodologically representative [3]:
1. Probabilistic frameworks, including naive Bayes classifiers, Markov chain models, and
Bayesian networks.
2. Neural network architectures, such as recurrent, convolutional, and transformer-based
models.
3. Polynomial models, particularly those based on additive convolutional formulations with
weighted coefficients and domain-specific constraints.</p>
      <p>These approaches were selected based on a combination of their conceptual diversity, established
performance in prior research, and applicability to social media content analysis. To ensure
methodological robustness and relevance, an expert panel comprising 20 data analysts from diverse
geographical regions was convened to validate and endorse the selection. Their input was
instrumental in narrowing the methodological scope to approaches deemed both technically sound
and practically viable for the detection of fabricated content.</p>
      <p>Previous investigations focusing on binary classification of content into authentic and fabricated
categories have examined a probabilistic approach [4] and various neural network architectures [5].
Findings indicate that a hybrid architecture integrating recurrent and convolutional functionalities
— designated as recurrent-convolutional neural network hybrid approach (RCNN) — demonstrates
superior performance regarding both accuracy metrics and computational efficiency. Additionally,
research has identified that determining the significance threshold of fabricated content presents a
substantial challenge in information differentiation studies. Specifically, certain textual content may
exhibit overtly humorous characteristics readily identifiable by human evaluators, consequently
presenting minimal societal risk. Conversely, content designed to undermine socially significant
legislative initiatives carries substantial risk.</p>
      <p>Considering these factors, the current investigation aimed to establish categorization
frameworks for fabricated information, providing a foundation for classifying content filtered
through RCNN architectures. This approach results in a dual-layer model for detecting fabricated
news, intended to determine optimal methodologies for content differentiation. To achieve this
objective, the following research tasks were identified:
1. Conducting expert evaluation and domain analysis to establish fundamental classifications
of fabricated information.
2. Developing data segregation models utilizing naive Bayes classification, neural network, and
polynomial frameworks.
3. Conducting experimental verification of the proposed dual-layer model in comparison with
standard RCNN implementations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Indicators of disinformation</title>
      <p>In constructing an appropriate analytical model, the formulation of a feature vector represents a
critical determinant of classification efficacy. Through comprehensive linguistic analysis and
empirical observation, a set of discriminative characteristics typical of fabricated information has
been identified and systematically categorized.</p>
      <p>The first group, “Primary Linguistic Indicators”, can be systematized into six indicators [6, 7]:</p>
      <p>Interrogative Density: Quantitative measurements indicate significant overuse of
question-based structures designed for sociocognitive manipulation. Corpus-based journalism studies
confirm the atypical frequency of such patterns in legitimate reporting. This characteristic
remains consistent across multimedia disinformation channels.</p>
      <p>Emotional Vocabulary Amplification: Deliberate reduction of negating constructions paired
with extreme terminology substitutions (e.g., transforming "issue" into "crisis") represents a
documented cognitive manipulation technique. This strategy functions through dual
pathways: minimizing processing complexity while heightening affective responses.</p>
      <p>Discourse Function Misalignment: Inappropriate deployment of directive and persuasive
linguistic structures, particularly in contexts mimicking authentic news formats, serves as a
reliable fabrication indicator.</p>
      <p>Pronoun Frequency Analysis: Disproportionate pronoun usage consistently correlates with
attempts at contextual manipulation, particularly when simulating journalistic content.
Quantitative measurement of pronoun density provides objective fabrication metrics.</p>
      <p>Syntactic-Stylistic Irregularities: Consistent grammatical and register deviations, especially
within purported expert citations, function as significant fabrication markers.</p>
      <p>Temporal Reference Manipulation: Strategic distortion of temporal relationships through
inconsistent tense usage, deliberate chronological ambiguity, and absence of precise
temporal anchoring. This technique disrupts causal relationships and complicates
verification processes by obfuscating the sequence of reported events.</p>
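      <p>Several of these indicators reduce to simple ratios over tokenized text. The sketch below is a minimal illustration, not code from the original study; the function names and the choice of NLTK's default tokenizer and part-of-speech tagger are assumptions.</p>
      <preformat>
# Assumed implementation sketch: two of the primary linguistic indicators as
# ratios over tokenized text. Requires the NLTK resources "punkt" and
# "averaged_perceptron_tagger" to be downloaded beforehand.
from nltk import pos_tag, sent_tokenize, word_tokenize

def interrogative_density(text):
    """Share of sentences phrased as questions."""
    sentences = sent_tokenize(text)
    if not sentences:
        return 0.0
    questions = sum(1 for s in sentences if s.strip().endswith("?"))
    return questions / len(sentences)

def pronoun_frequency(text):
    """Share of tokens tagged as personal or possessive pronouns."""
    tokens = word_tokenize(text)
    if not tokens:
        return 0.0
    tagged = pos_tag(tokens)
    pronouns = sum(1 for _, tag in tagged if tag in ("PRP", "PRP$"))
    return pronouns / len(tokens)
      </preformat>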
      <p>The second group, “Secondary Stylometric Factors”, comprises six additional indicators [8, 9]:</p>
      <p>Emotional Response Engineering: Methodical analysis of affectively charged terminology
and psychological influence patterns reveals systematic emotional manipulation strategies.</p>
      <p>Reference Integration Assessment: Systematic examination of citation patterns and
attribution frameworks, with specific focus on reference scarcity or complete absence of
verifiable external sources.</p>
      <p>Discourse Coherence Measurement: Implementation of computational text analysis
methodologies to identify logical contradictions and narrative structural inconsistencies
throughout content.</p>
      <p>Source Reliability Evaluation: Development of algorithmic frameworks for assessing
publication characteristics, with particular emphasis on temporal proximity to significant
sociopolitical events, integrated with comprehensive source credibility metrics.</p>
      <p>Information Density Distribution: Quantitative analysis of content-to-noise ratios
throughout text segments, identifying atypical information clustering patterns that diverge
from established journalistic conventions regarding information presentation and structural
organization.</p>
      <p>Linguistic Register Oscillation: Identification of inconsistent formality levels and
inappropriate stylistic variations within a single text, characterized by unpredictable
alternations between technical terminology and colloquial expressions that contradict
established genre conventions and indicate potential synthetic content generation.</p>
      <p>This enhanced feature set facilitates development of robust, multi-dimensional classification
models capable of identifying fabricated information across various media modalities with increased
precision and recall rates. Integration of both primary linguistic indicators and secondary stylometric
factors enables a more comprehensive approach to misinformation detection.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Classes of disinformation</title>
      <p>The initial phase in addressing the multi-classification challenge involves establishing
fundamental categories of disinformation through a rigorous methodological framework. To
determine this classification schema, an expert panel comprising 20 data analysts from various
European and North American countries was assembled, ensuring geographical and institutional
diversity. This systematic sampling approach yielded a comprehensive list of 10 predominant
categories, which underwent further refinement through statistical validation.</p>
      <p>Subsequently, a standardized assessment protocol was implemented through an open survey to
identify the most vulnerable types of information falsification. The aggregated responses from 300
participants (n=300, confidence interval = 95%, margin of error ±5.66%) were instrumental in
formulating these defined groups through hierarchical clustering analysis:</p>
      <p>Satire with objectively identifiable manifestations: fabricated content characterized by clear
linguistic exaggeration or absurdity, often using hyperbolic language or comical distortion.
Example: “NASA confirms Moon landing was a rehearsal for alien diplomacy!”</p>
      <p>Satire with contextual or grammatical manifestations: disinformation that requires cultural
familiarity or nuanced linguistic interpretation to identify its satirical nature. Example: “If
voting mattered, they'd make it illegal — says democratic official with a straight face.”</p>
      <p>News targeting specific individuals or small groups (micro-level disinformation): content
that falsely accuses, discredits, or fabricates actions attributed to particular persons or closed
communities. Example: “Local activist John Smith caught funneling foreign funds —
documents leaked!”</p>
      <p>News oriented toward multiple regions, countries, or large groups (meso-level
disinformation): narratives aimed at manipulating perception across medium-scale
audiences, often with geopolitical or interregional scope. Example: “European farmers unite
to ban Ukrainian imports, say EU is collapsing.”</p>
      <p>News directed at multiple countries or society as a whole (macro-level disinformation): strategic,
high-impact narratives designed to destabilize public trust or provoke systemic panic. Example:
“UN to seize private property globally under new climate treaty, leaked papers reveal.”</p>
      <p>The categorization framework demonstrates a hierarchical structure with increasing scope and
potential impact, facilitating both quantitative and qualitative analysis of disinformation patterns.
By grounding each class in linguistic and contextual examples, the system enables more precise
model training and real-world application relevance.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Basic approach</title>
      <sec id="sec-4-1">
        <title>4.1. Target features</title>
        <p>For the establishment of disinformation characteristics, the methodology advances to feature set
development serving as model input variables. The primary metric, "Emotional Characteristic,"
derives from content analysis principles, implementing a sequential algorithmic procedure [10]:
1. Textual segmentation into sentence units with non-semantic construct exclusion.
2. Lemmatization and stemming operations for morphological root extraction.
3. Computation of normalized frequency-emotional indicators.
4. Sentiment analysis implementation utilizing NLTK in Python3 for lexical distribution and
emotional valence determination [11].</p>
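        <p>The following sketch illustrates how steps 1-4 could be combined into a single "Emotional Characteristic" score; the use of NLTK's Porter stemmer, WordNet lemmatizer, and VADER sentiment analyzer is our assumption, since the text names only NLTK itself.</p>
        <preformat>
# Assumed implementation sketch of the four-step procedure; requires the
# NLTK resources "punkt", "wordnet" and "vader_lexicon".
from nltk import sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.sentiment import SentimentIntensityAnalyzer

def emotional_characteristic(text):
    # 1. Segment into sentences and drop empty, non-semantic constructs.
    sentences = [s for s in sent_tokenize(text) if s.strip()]
    if not sentences:
        return 0.0
    stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
    analyzer = SentimentIntensityAnalyzer()
    valences = []
    for sentence in sentences:
        # 2. Lemmatization and stemming for morphological root extraction.
        roots = [stemmer.stem(lemmatizer.lemmatize(t.lower()))
                 for t in word_tokenize(sentence)]
        # 3-4. Sentence-level emotional valence via VADER.
        valences.append(analyzer.polarity_scores(" ".join(roots))["compound"])
    # Normalized frequency-emotional indicator: mean absolute valence.
    return sum(abs(v) for v in valences) / len(valences)
        </preformat>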
        <p>Additional quantitative indicators incorporated into the analytical framework include:</p>
        <p>Rhetorical Density Coefficient: defined as the ratio of rhetorical constructions to total
sentences.</p>
        <p>Negative Construction Frequency: quantifying negative linguistic structure density.</p>
        <p>Contextual Emotional Index: derived from sentiment analysis of temporally relevant
high-traffic content.</p>
        <p>Suspicion Coefficient: computed through lexical pattern matching against predetermined
deception indicators.</p>
        <p>Message Impact Factor: a hierarchical classification of content significance.</p>
        <p>Sentiment Magnitude Vector: an aggregate measure of emotional content intensity.</p>
        <p>This integrated approach enables comprehensive feature extraction while ensuring
computational efficiency.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baseline model</title>
        <p>In conventional Convolutional Neural Network (CNN) architectures, filter operations facilitate
local spatial dependency incorporation; however, the distinctive nature of the proposed indicators
necessitates comprehension of extended temporal sequences without introducing future-state
dependencies [12]. This limitation arises as contextual information may exist beyond CNN receptive
fields. To address this architectural constraint, a hybrid approach combining Long Short-Term
Memory Recurrent Neural Networks (LSTM) and CNN methodologies was implemented (shown in
Figure 1).</p>
        <p>Architectural enhancements incorporated:</p>
        <p>Receptive field optimization through dilated convolutions, skip connections, and attention
mechanisms.</p>
        <p>Memory management protocols utilizing gated memory units, adaptive forget gates, and
memory-efficient backpropagation.</p>
        <p>Gradient flow optimization implementing residuals, layer normalization, and gradient
clipping.</p>
        <p>Cross-validation yielded optimal hyperparameters: 4-unit kernel dimensionality, 1-unit stride
parameter, omitted zero-padding, excluded bias terms, and 5×5×3 tensor filter dimensions.</p>
        <p>Additional performance optimizations included: curriculum learning implementation, dynamic
batch sizing, and early stopping with patience factor p=5 for training protocols; dropout layers
(rate=0.3), L2 regularization (λ=0.01), and feature-wise regularization for regularization strategies;
and model quantization, sparse tensor operations, and parallel processing for computational
efficiency.</p>
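        <p>A minimal Keras sketch of such a hybrid is given below. Only the hyperparameters quoted above (kernel size 4, stride 1, no zero-padding, no bias, dropout rate 0.3, L2 regularization with λ = 0.01) come from the text; the embedding size, layer widths, and optimizer are illustrative assumptions, and the sketch uses a one-dimensional formulation over embedded token sequences rather than the full filter tensor described above.</p>
        <preformat>
# Sketch of the hybrid LSTM-CNN (RCNN) baseline under the assumptions stated
# in the preceding paragraph; not the authors' exact architecture.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_rcnn(vocab_size=20000, seq_len=256, n_classes=2):
    inputs = layers.Input(shape=(seq_len,))
    x = layers.Embedding(vocab_size, 128)(inputs)
    # Convolutional block: local n-gram features (kernel 4, stride 1, valid padding, no bias).
    x = layers.Conv1D(64, kernel_size=4, strides=1, padding="valid",
                      use_bias=False, activation="relu",
                      kernel_regularizer=regularizers.l2(0.01))(x)
    # Bidirectional recurrent block: long-range temporal dependencies.
    x = layers.Bidirectional(layers.LSTM(64))(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
        </preformat>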
        <p>This enhanced architecture demonstrates superior performance while maintaining efficiency,
integrating bidirectional recurrent components with convolutional layers to capture both spatial and
temporal dependencies, achieving 94.3% validation accuracy on benchmark datasets.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Additional layer for classification model</title>
      <p>Following the initial classification performed by the primary RCNN architecture, three distinct
approaches were implemented for the secondary classification layer to enhance categorization
precision.</p>
      <p>The RCNN architecture serves as one potential implementation for the secondary layer, utilizing
independent training protocols optimized for multi-class discrimination. Given the extensive
documentation of this architectural framework in previous sections, further elaboration is omitted
from the current analysis.</p>
      <p>The naive Bayes classification (NBC) methodology represents the second implementation
approach, operating on fundamental Bayesian probability principles to calculate class membership
likelihood while maintaining feature independence assumptions. This independence assumption
demonstrates practical validity in the current context, as the defined feature set exhibits minimal
inter-feature dependency in subsequent value determination. The Bayesian theorem fundamentally
describes the probability of an event occurring based on prior knowledge of conditions related to
that event. In this context, it calculates the probability of information belonging to a particular class
by considering several key components: the probability of observing specific features when the
information belongs to that class, the overall probability of the class occurring in the dataset, and the
total probability of observing those specific features across all possible classes. This relationship
enables the calculation of the final probability that a piece of information belongs to a particular
class given its observed features.</p>
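      <p>Expressed formally (a standard restatement in notation not used by the original text), the classifier evaluates P(C_k | x_1, ..., x_n) ∝ P(C_k) · P(x_1 | C_k) · ... · P(x_n | C_k), where C_k is one of the disinformation classes and x_1, ..., x_n are the feature values defined in Section 4.1; the product form follows from the feature-independence assumption, and the class-independent denominator P(x_1, ..., x_n) can be omitted when selecting the class with the maximum posterior.</p>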
      <p>This probabilistic framework enables robust classification through systematic evaluation of class
membership probabilities, particularly effective in scenarios involving multiple independent feature
sets. The methodology's effectiveness is enhanced through its integration within the broader
dual-layer architectural framework, complementing the RCNN-based primary classification layer.</p>
      <p>In the convolutional polynomial classification model (PA), each input feature is assigned a
weight coefficient reflecting its relative contribution to the final classification score. To ensure the
validity and reproducibility of these weights, a structured expert elicitation process was conducted
using a modified analytic hierarchy procedure (AHP) and Likert-scale scoring.</p>
      <p>A panel of 20 domain experts, previously involved in the category development phase (see
Section 3), participated in the weighting procedure. The process was organized in three sequential
steps:</p>
      <p>Relative impact rating: Each expert was provided with a standardized set of 30 labeled news
items, representing a diverse range of disinformation classes. For each item, experts
evaluated the perceptual contribution of seven defined features (Emotional Characteristic,
Rhetorical Density, Negative Construction Frequency, etc.) to the classification decision on
a 10-point Likert scale. This generated a total of 4,200 individual judgments (30 texts × 7
features × 20 experts).</p>
      <p>Pairwise consistency check: To validate rating coherence, internal consistency of each
expert’s scores was assessed using pairwise comparison matrices and consistency ratios.
Responses with CR &gt; 0.15 were flagged and excluded from aggregation to maintain overall
reliability.</p>
      <p>Weight aggregation and normalization: Remaining ratings were averaged across all experts
and normalized such that the sum of all feature weights equaled 1.</p>
      <p>The final coefficients were:</p>
      <p>Emotional Characteristic — 0.35.</p>
      <p>Rhetorical Density Coefficient — 0.1.</p>
      <p>Negative Construction Frequency — 0.1.</p>
      <p>Contextual Emotional Index — 0.1.</p>
      <p>Suspicion Coefficient — 0.15.</p>
      <p>Message Impact Factor — 0.1.</p>
      <p>Sentiment Magnitude Vector — 0.1.</p>
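      <p>A compact sketch of the resulting scoring step is given below; the dictionary keys mirror the feature list of Section 4.1 and the weights are those just reported, while the assumption that each feature is pre-normalized to the interval [0, 1] before convolution is ours.</p>
      <preformat>
# Additive convolution with the expert-derived weights listed above (sketch);
# feature values are assumed to be pre-normalized to [0, 1].
FEATURE_WEIGHTS = {
    "emotional_characteristic": 0.35,
    "rhetorical_density_coefficient": 0.10,
    "negative_construction_frequency": 0.10,
    "contextual_emotional_index": 0.10,
    "suspicion_coefficient": 0.15,
    "message_impact_factor": 0.10,
    "sentiment_magnitude_vector": 0.10,
}

def pa_score(features):
    """Linear additive convolution; `features` is a dict keyed by the names above."""
    return sum(weight * features[name] for name, weight in FEATURE_WEIGHTS.items())
      </preformat>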
      <p>To evaluate the robustness of this weighting scheme, a sensitivity analysis was performed by
perturbing each coefficient ±10% and measuring the resulting classification variation across 2,000
samples. The average deviation in predicted class membership was below 2.7%, indicating high
tolerance to minor variations and reinforcing the stability of the expert-derived weights.</p>
      <p>The resulting weight coefficients reflect both theoretical understanding and practical experience
in information verification processes, enhancing the model's ability to discriminate between different
categories of fabricated information.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Distributed computing</title>
      <p>The distribution strategy employs MapReduce methodology, incorporating data partitioning across
distributed nodes (shown in Figure 2). This framework is fundamentally constructed upon mapping
and reduction function primitives. While both Spark and Hadoop frameworks offer robust
implementations, this research utilizes Hadoop's architecture, leveraging its inherent node-level
mapping and reduction capabilities for optimized database interactions [13]. This architectural
decision is particularly advantageous given the requirement to process heterogeneous, high-volume
data streams.</p>
      <p>For the RCNN implementation, the distributed processing architecture partitions incoming data
across multiple computational nodes. The mapping phase distributes neural network weight
calculations to parallel processors, with each node independently computing gradient updates on
assigned data segments. The reduction phase then aggregates these distributed gradients,
consolidating them into coherent weight updates that maintain model integrity. This approach
enables bidirectional processing without sacrificing temporal dependencies critical to recurrent
components.</p>
      <p>The naive Bayesian classifier implementation leverages MapReduce through probabilistic
computation distribution. During the mapping phase, conditional probabilities are calculated
independently across distributed nodes, with each processor handling specific feature-class
combinations. The reduction phase synthesizes these individual probability calculations into
comprehensive class likelihood estimations. This approach particularly benefits from the assumption
of feature independence inherent in naive Bayes methodologies, making it naturally conducive to
parallelization.</p>
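      <p>A schematic mapper/reducer pair for the naive Bayes training phase is sketched below in the spirit of Hadoop Streaming; the record layout, the binning of feature values, and the plain relative-frequency estimate are illustrative assumptions rather than details taken from the deployed system.</p>
      <preformat>
# Sketch of the naive Bayes MapReduce phases under the assumptions above.
from collections import defaultdict

def nb_mapper(record):
    """Map phase: emit ((class, feature, binned value), 1) for one labeled record."""
    label, features = record          # `features`: dict of feature name to binned value
    for name, value in features.items():
        yield (label, name, value), 1

def nb_reducer(key, counts):
    """Reduce phase: aggregate counts for one (class, feature, value) combination."""
    yield key, sum(counts)

def to_conditional_probabilities(reduced_counts):
    """Relative-frequency estimates of P(value | class, feature); class priors
    are obtained analogously from per-class record counts."""
    totals, probabilities = defaultdict(int), {}
    for (label, name, value), count in reduced_counts.items():
        totals[(label, name)] += count
    for (label, name, value), count in reduced_counts.items():
        probabilities[(label, name, value)] = count / totals[(label, name)]
    return probabilities
      </preformat>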
      <p>For the polynomial approach implementation, the distributed framework allocates coefficient
calculations across computational nodes. The mapping phase distributes feature-specific weighting
operations, with each node calculating partial polynomial evaluations. The reduction phase
integrates these partial evaluations through weighted summation based on expert-derived
coefficients. This implementation demonstrates significant efficiency improvements for polynomial
operations with high-dimensional feature vectors, effectively mitigating computational bottlenecks.</p>
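      <p>The corresponding decomposition for the polynomial approach reduces to distributed partial sums, sketched below; the per-node feature sharding and the function signatures are assumptions, while the weights argument corresponds to the expert-derived coefficients of Section 5.</p>
      <preformat>
# Sketch: each node evaluates the weighted terms for its feature shard (map),
# and the shards are summed per document into the final score (reduce).
def pa_mapper(doc_id, feature_shard, weights):
    """Emit (doc_id, partial weighted sum) for the features held by this node."""
    partial = sum(weights[name] * value for name, value in feature_shard.items())
    yield doc_id, partial

def pa_reducer(doc_id, partials):
    """Combine the partial evaluations into the additive-convolution score."""
    yield doc_id, sum(partials)
      </preformat>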
      <p>All three implementations incorporate synchronization protocols that balance computational
efficiency with model coherence, utilizing barrier synchronization methods and asynchronous data
transfer mechanisms to optimize performance while maintaining classification accuracy. To evaluate
the effectiveness of the MapReduce-based distributed processing architecture, a comparative analysis
of training duration, inference latency, and resource utilization was performed for centralized versus
distributed implementations of target models.</p>
      <p>Results indicate that distributed training via MapReduce achieved an average processing time
reduction of 43% across all models. Training time for the RCNN model decreased from 47 minutes to
26.8 minutes; NBC improved from 22 minutes to 11.9 minutes; and PA was reduced from 18.4 minutes
to 10.2 minutes. Measurements were conducted on the complete dataset using the hardware
configuration specified below.</p>
      <p>Inference latency per 1,000 records decreased by approximately 35%, with consistent
performance across model types. Classification accuracy remained statistically stable, with a
deviation margin of no more than ±0.002 across ten cross-validation cycles, demonstrating that
model performance was unaffected by distributed deployment.</p>
      <p>These findings confirm that the MapReduce framework substantially accelerates both training
and execution phases while preserving classification quality, making it a viable solution for
high-throughput and real-time misinformation detection scenarios.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Experimental environment</title>
      <p>The dataset preparation and annotation process ensures high reliability and reproducibility of
experimental results, allowing for robust evaluation of the classification architecture.</p>
      <p>The evaluation framework incorporates three primary parameters to assess model performance
comprehensively:
1. Classification Accuracy: Measured as the proportion of correctly classified instances across
all categories, with particular emphasis on minimizing false negative classifications in
high-impact disinformation categories. This parameter utilizes a weighted F1-score incorporating
both precision (0.80) and recall (0.20) components to prioritize comprehensive detection.
2. Time Saving: Quantified as the time required for complete dataset analysis, normalized
against the baseline single-layer RCNN implementation. This metric incorporates both
training duration and inference latency to provide a holistic assessment of computational
demands.
3. Volume Saving: Determined by systematically reducing training dataset size until accuracy
metrics fall below the 80% threshold. This parameter evaluates model resilience to limited
training data, providing insights into implementation viability in domains with restricted
data availability.</p>
      <p>The experimental assessment incorporates comprehensive evaluation protocols developed in
collaboration with 20 data analysis specialists across multiple countries. Performance measurement
utilizes a weighted scoring system that prioritizes classification accuracy (16 points) through
balanced Precision (0.80) and Recall (0.20) metrics, while processing efficiency and data volume
optimization contribute equally (2 points each) to the total evaluation score. This weighting system
is implemented through linear additive convolution with weighted coefficients, enabling nuanced
model evaluation while maintaining classification accuracy as the primary focus.</p>
      <p>The architectural design facilitates seamless computational node integration and offers
significant flexibility, enabling scalable performance optimization without requiring fundamental
structural modifications. This adaptability is particularly valuable when processing heterogeneous
data streams across varying linguistic and contextual domains. The system demonstrates
effectiveness in processing high-dimensional feature spaces and complex linguistic patterns while
minimizing false negative classifications in socially sensitive contexts.</p>
      <p>To ensure statistical validity and minimize experimental uncertainty, the research methodology
incorporated a multi-level error identification and mitigation framework covering data integrity,
model robustness, and computational consistency.</p>
      <p>At the data level, potential sources of error included annotation inconsistency, label noise, and
class imbalance. These issues were addressed through a two-phase annotation process combining
semi-supervised pre-labeling with manual validation by a panel of six experts in computational
linguistics, journalism, and information verification. Ambiguous cases were resolved through
consensus discussions, and inter-annotator agreement was monitored using Cohen’s kappa (κ =
0.87), indicating high consistency. Stratified sampling was employed to maintain class distribution
during training-test splits, thereby reducing sampling bias.</p>
      <p>At the model level, stochastic variability in training outcomes was mitigated through 10-fold
cross-validation, repeated over ten independent training cycles (n = 10) for each configuration. This
procedure enabled the calculation of confidence intervals and standard deviation bounds for key
metrics (accuracy, precision, recall). Performance fluctuations across runs were analyzed using
coefficient of variation (CV), which remained below 3.2% for all final configurations, indicating high
stability.</p>
      <p>At the computational infrastructure level, hardware-induced noise was addressed by executing
all training and inference operations on a fixed cluster of five identical AMD Ryzen 5 5600X nodes.
System resource usage was locked via dedicated CPU-core pinning and RAM allocation (8 GB per
task), while operating system interruptions were minimized using taskset and real-time scheduling
(SCHED_FIFO). Additionally, all experiments were executed under a unified software stack with
fixed versions of Python (3.9.13), TensorFlow (2.11.0), and NLTK (3.8), eliminating variability due to
software environment drift.</p>
      <p>To reduce the impact of run-level outliers, all performance measurements were averaged across
repetitions, and outlier values (exceeding 2 SD from the mean) were excluded from final efficiency
scoring. No statistically significant anomalies (p &gt; 0.05, two-tailed) were detected in the cleaned
result sets.</p>
      <p>This multi-tier error control protocol ensured that experimental conclusions are grounded in
statistically robust, reproducible findings, with all critical performance claims substantiated by
repeated and independently validated trials.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Results of the experiment</title>
      <p>The aggregated results of the conducted experiment are shown in Table 1 below.</p>
      <p>Quantitative analysis utilizing linear additive convolution with weighted importance coefficients,
based on the performance metrics presented in Table 1, yields distinctive efficiency indicators across
various architectural implementations. The computational results demonstrate significant variation
in model performance:</p>
      <p>Single-layer RCNN implementation achieves an efficiency coefficient of 0.62, establishing a
baseline performance metric. The dual-layer RCNN architecture demonstrates substantial
improvement with an efficiency coefficient of 0.942, indicating enhanced classification capability.
Further architectural refinement incorporating RCNN as a foundation with naive Bayes classifier as
the secondary layer achieves optimal performance with an efficiency coefficient of 0.945. The
implementation utilizing RCNN as the primary layer combined with a polynomial approach in the
secondary layer yields an intermediate efficiency coefficient of 0.816.</p>
      <p>These results demonstrate clear performance differentiation among architectural variants, with
the RCNN-naive Bayes hybrid configuration exhibiting superior efficiency in the classification task.
The substantial improvement over the single-layer baseline indicates the efficacy of hierarchical
approaches in complex classification scenarios, particularly when combining complementary
methodological frameworks. Despite the superior computational efficiency exhibited by standard
RCNN implementation, the dual-layer approach's capacity to maintain accuracy thresholds
exceeding 80% with minimal data requirements represents a significant advancement in classification
methodology.</p>
      <p>Further analysis of the performance metrics reveals several significant insights. The RCNN-naive
Bayes hybrid demonstrates an optimal balance between classification accuracy and computational
efficiency, with only a 5% reduction in processing speed compared to the single-layer approach while
achieving a 46% improvement in classification accuracy. This configuration also exhibits exceptional
data efficiency, requiring only 10% of the original dataset volume to maintain performance
thresholds above 80% accuracy, suggesting substantial potential for implementation in
resource-constrained environments.</p>
      <p>The dual RCNN configuration demonstrates the highest absolute accuracy (0.96), indicating its
potential utility in applications where precision is paramount regardless of computational demands.
However, this marginal accuracy improvement over the RCNN-naive Bayes hybrid (0.95) may not
justify the additional implementation complexity in most practical applications.</p>
      <p>The polynomial approach implementation, while exhibiting the highest time efficiency (0.96),
demonstrates comparatively modest accuracy improvements (0.85) and data efficiency (0.40). This
configuration may be appropriate for scenarios requiring rapid classification with moderate accuracy
requirements, particularly in high-throughput systems where processing speed is prioritized over
classification precision.</p>
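      <p>As a consistency check (our reading of the scoring scheme, not an explicit formula from the text), normalizing the 16/2/2-point weights of Section 7 to 0.8, 0.1, and 0.1 gives the linear additive convolution E = 0.8·A + 0.1·T + 0.1·V over the accuracy, time-saving, and volume-saving scores. Under this reading, the RCNN-polynomial variant yields 0.8·0.85 + 0.1·0.96 + 0.1·0.40 = 0.816, and the RCNN-naive Bayes variant, with its 5% speed reduction taken as T = 0.95 and its 10% residual data requirement taken as V = 0.90, yields 0.8·0.95 + 0.1·0.95 + 0.1·0.90 = 0.945, both matching the efficiency coefficients reported above.</p>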
      <p>These comparative performance characteristics provide a foundation for implementation-specific
architectural selection based on application requirements, enabling optimized deployment in various
operational contexts with differing priorities regarding accuracy, efficiency, and resource utilization.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Conclusion</title>
      <p>The research objective, centered on developing a sophisticated dual-layer data classification model,
successfully extends beyond basic fabrication detection to encompass both magnitude assessment
and intentionality analysis of data falsification. Empirical analysis demonstrates that the dual-layer
model implementation achieves an average 20% performance improvement compared to direct
RCNN methodology. This performance differential becomes particularly significant in
high-throughput systems where rapid identification and response to fabricated information represent
critical operational parameters.</p>
      <p>The experimental outcomes provide compelling evidence supporting the efficacy of hybrid
architectural approaches in complex information classification scenarios. This validation framework
establishes a robust foundation for practical implementation in high-demand environments, where
the rapid assessment and categorization of potentially fabricated information are paramount.
Furthermore, the demonstrated performance improvements suggest significant potential for
application in large-scale information processing systems where both accuracy and processing
efficiency are critical operational constraints.</p>
      <p>These findings represent a substantial advancement in automated information verification
methodology, establishing a framework for future development in hybrid neural network
architectures focused on complex classification tasks. The validated performance improvements
provide strong empirical support for the continued development and implementation of multi-layer
classification systems in practical applications requiring sophisticated information authenticity
assessment.</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgements</title>
      <p>The authors would like to thank the Armed Forces of Ukraine for the opportunity to complete this
work during the full-scale invasion of the Russian Federation on the territory of Ukraine. The
authors also wish to extend their gratitude to Kharkiv National University of Radio Electronics for
providing licences for the additional software used to prepare the algorithms and the paper.</p>
    </sec>
    <sec id="sec-11">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly Edu and a submodule of Microsoft
365 in order to check grammar and spelling. After using these services, the authors reviewed and
edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-12">
      <title>References</title>
      <p>[1] E. Aïmeur, S. Amri, G. Brassard, Fake news, disinformation and misinformation in social media:
a review. Social Network Analysis and Mining 13 (2023) no. 30. doi: 10.1007/s13278-023-01028-5.</p>
      <p>[2] G. Di Domenico, J. Sit, A. Ishizaka, D. Nunan, Fake news, social media and marketing: A
systematic review. Journal of Business Research 124 (2021) pp. 329–341. doi: 10.1016/j.jbusres.2020.11.037.</p>
      <p>[3] Y. M. Rocha, G. A. de Moura, G. A. Desiderio, C. H. de Oliveira, F. D. Lourenco, L. D. de
Figueiredo Nicolete, The impact of fake news on social media and its influence on health during
the COVID-19 pandemic: a systematic review. Journal of Public Health 31 (2023) pp. 1007–1016.
doi: 10.1007/s10389-021-01658-z.</p>
      <p>[4] A. Khovrat, V. Kobziev, Using Naïve Bayes Classifier to Identify Falsified Text Information, in:
Proceedings of the IEEE 5th KhPI Week on Advanced Technology, Kyiv, Ukraine, 7–10 October
2024, pp. 1–4. doi: 10.1109/KhPIWeek61434.2024.10877950.</p>
      <p>[5] A. Khovrat, V. Kobziev, Using RCNN to Identify the Fake Audio Information, in: Proceedings
of the IEEE 7th International Conference on Actual Problems of Unmanned Aerial Vehicles
Development, Kyiv, Ukraine, 22–24 October 2024, pp. 205–208. doi: 10.1109/APUAVD64488.2024.10765907.</p>
      <p>[6] A. Choudhary, A. Arora, Linguistic feature-based learning model for fake news detection and
classification. Expert Systems with Applications 169 (2021) no. 114171. doi: 10.1016/j.eswa.2020.114171.</p>
      <p>[7] S. Garg, D. K. Sharma, Linguistic features-based framework for automatic fake news detection.
Computers &amp; Industrial Engineering 172 (2022) no. 108432. doi: 10.1016/j.cie.2022.108432.</p>
      <p>[8] N. F. Baarir, A. Djeffal, Fake News detection Using Machine Learning, in: Proceedings of the 2nd
International Workshop on Human-Centric Smart Environments for Health and Well-being,
Algeria, 9–10 February 2021, pp. 205–208. doi: 10.1109/IHSH51661.2021.9378748.</p>
      <p>[9] M. A. Alonso, D. Vilares, C. Gómez-Rodríguez, J. Vilares, Sentiment Analysis for Fake News
Detection. Electronics 10 (11) (2021) no. 1348. doi: 10.3390/electronics10111348.</p>
      <p>[10] S. Kumar, N. A. Jailani, A. R. Singh, S. Panchal, Sentiment Analysis on Online Reviews using
Machine Learning and NLTK, in: Proceedings of the 6th International Conference on Trends in
Electronics and Informatics, Tirunelveli, India, 28–30 April 2022, pp. 1183–1189.</p>
      <p>[11] M. H. Goldani, R. Safabakhsh, S. Momtazi, Convolutional neural network with margin loss for
fake news detection. Information Processing &amp; Management 58 (2021) no. 102418. doi: 10.1016/j.ipm.2020.102418.</p>
      <p>[12] M. Chen, Classification with Convolutional Neural Networks in MapReduce. Journal of
Computer and Communications 12 (2024) pp. 174–190. doi: 10.4236/jcc.2024.128011.</p>
      <p>[13] P. R. Kanna, P. Santhi, An Enhanced Hybrid Intrusion Detection Using MapReduce-Optimized
Black Widow Convolutional LSTM Neural Networks. Wireless Personal Communications 138 (2024)
pp. 2407–2445. doi: 10.1007/s11277-024-11607-0.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>