<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Rule-Based Method for Aligning Media Content with Ukrainian Legislation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khrystyna Lipianina-Honcharenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ihor Ihnatiev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hennadii Bohuta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Khrystyna Yurkiv</string-name>
          <email>kh.yurkiv@wunu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stanislav Novosad</string-name>
          <email>stasnovosad1998@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>West Ukrainian National University</institution>
          ,
          <addr-line>Lvivska str., 11, Ternopil, 46009</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>0</volume>
      <issue>81</issue>
      <abstract>
        <p>We introduce a novel rule-based system for automatically mapping unstructured media texts to specific articles of Ukrainian legislation, addressing the growing need for transparent, explainable tools in legal and security monitoring. Leveraging expert-crafted lexical dictionaries for twelve regulatory norms and a threshold-based matching algorithm, our method achieves balanced performance (Precision = 0.84, Recall = 0.83, F1 = 0.84) on a diverse test set of news articles. The system is delivered as an interactive Streamlit application that supports dynamic dictionary updates and simultaneous sentiment analysis, enabling users to assess both legal relevance and emotional tone of content. Through 15 real-world case studies, we demonstrate the approach's practical utility in governmental and media-watch contexts and discuss paths for expanding dictionary coverage and lowering detection thresholds for shorter texts. Our work extends prior research on rule-based text analysis in domains such as cybersecurity and social media, and contributes a reproducible Explainable AI framework tailored for Ukrainian legal monitoring.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Building on the previous study [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which implemented a thematic text classification system
based on emotional tone and achieved an accuracy of 92%, this paper proposes an improved
approach to a related but significantly narrower task: automatically matching an input text to one
of the predefined articles of Ukrainian legislation. Unlike general categorization aimed at broad
semantic classification, the proposed method establishes a direct normative link
between the content and a specific legal provision.
      </p>
      <p>
        Amid hybrid threats and information attacks, automated legal monitoring systems are becoming
increasingly relevant, especially for ensuring the internal security
of the state. Previous research in cybersecurity has demonstrated the effectiveness of
neural networks in detecting attacks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and in building intelligent cyber defense systems based on
artificial immunity models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which confirms the potential of specialized intelligent approaches for
protecting the information space. At the same time, decision support models in the field
of internal security [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and modeling user responses on social networks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] indicate a growing need
for transparent and reproducible algorithms for analyzing sensitive content. The system we propose
for formalized alignment of media texts with legal norms is a logical extension of these approaches
and has the potential to be integrated into the information security architecture at the state or
institutional monitoring level.
      </p>
      <p>The methodology is based on a rule-based approach that utilizes expert-defined dictionaries of
key terms for each of the 12 legal articles. By applying a phrase occurrence counter and threshold
filtering, the system enables highly interpretable identification of the most relevant article or
confirmation of its absence. This approach is particularly valuable in the context of automated
systems for preliminary legal analysis, such as those monitoring news, social media, citizen
appeals, or expert commentary.</p>
      <p>The study also implements a sentiment analysis component based on lexicon-oriented evaluation,
which allows for simultaneous assessment of both normative relevance and the overall emotional
tone of a message. The integration of these two dimensions — normative and tonal — enables the
development of an Explainable AI tool for legal monitoring that is both transparent and applicable
without the need for machine learning.</p>
      <p>The structure of the paper is as follows: Section 2 provides an overview of rule-based solutions
in applied domains (medicine, finance, smart contracts, public administration, social research);
Section 3 formalizes the task of matching text with legal articles and details the relevance calculation
algorithm based on expert dictionaries; Section 4 presents the system implementation as a Streamlit
application, including validation results and a case study of 15 news articles; and Section 5
summarizes the findings, compares them with existing approaches, and outlines conclusions
regarding the effectiveness of the proposed method for legal monitoring of media content.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Overview of Existing Solutions</title>
      <p>To justify the choice of a rule-based approach in this study, a review is provided of the most
representative works in which expert dictionaries and formal rules have ensured high classification
quality across various domains. Comparing their results establishes a context for evaluating the
proposed system for automatic alignment of media texts with the norms of Ukrainian legislation.</p>
      <p>
        The study by Raees &amp; Fazilat (2024) examined the effectiveness of various classification models
for lexicon-based sentiment analysis. Using tweets without emoticons, the researchers achieved an
F1-score above 85%, demonstrating the advantages of a structured polarity dictionary [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The article by Abd et al. (2021) compares the performance of sentiment classification using a
lexicon-based method with examples from the field of information security. The average accuracy
exceeded 80% thanks to the optimization of the Jaccard metric [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The review by Ullah et al. (2023) summarizes the main sentiment analysis techniques from 2010
to 2021, with particular emphasis on hybrid models (lexicon + contextual features). The article
reports accuracy of up to 90% for the most recent approaches [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        The work by Balshetwar &amp; Tuganayat (2019) proposes a frame-based analysis to determine tone
and mood. By combining frame semantic structures with lexical analysis, the authors achieve a
precision of approximately 82% [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        In the study by Catelli et al. (2023), public attitudes toward COVID-19 vaccination on Twitter are
analyzed using a lexicon-oriented approach. A model based on BERT and an expert dictionary
demonstrates a recall of over 88% [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Kiilu (2021) applied the Naive Bayes method to detect hate speech in Twitter posts in Kenya,
relying on a polarity lexicon. The author demonstrated over 80% effectiveness in categorization [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>The publication by Satapathy et al. (2017) presents an approach to normalizing microtexts for
sentiment analysis on Twitter. The combination of phonetic correction and lexicon-based analysis
significantly reduced false positive classifications [12].</p>
      <p>Itani (2018) developed a sentiment analysis system for informal Arabic used in social media. Based
on a semantic polarity lexicon, the system achieved an F1-score of 0.86 [13].</p>
      <p>The article by Guarasci et al. (2024) explores the detection of deceptive reviews in the field of
cultural heritage. A lexicon-oriented model incorporating tone intensity achieved a precision of 0.84
[14].</p>
      <p>Finally, the review by Kumar et al. (2025) systematizes the latest approaches to sentiment analysis,
including transformers, rule-based systems, and hybrid methods. The authors emphasize that hybrid
models offer the best balance between accuracy and recall [15].</p>
      <p>The study by Zhang et al. (2025) implemented rule-based methods to identify diseases and
rehabilitation activities in Q&amp;A communities. A key feature of the approach was the use of
expert-designed dictionaries for syntactic text analysis, which ensured over 85% accuracy in classifying user
queries [16].</p>
      <p>The publication by Lashkari (2024) focuses on detecting vulnerabilities in smart contracts using
rule-based classification with dictionaries of key patterns. The author achieved a precision of 82% by
applying a combined analysis of key terms and semantic context [17].</p>
      <p>The article by Perron et al. (2024) justifies the use of local LLMs for analyzing sensitive texts in
social research, where rule-based structures and expert-defined matching criteria play a central role.
The authors note that classification accuracy reaches 88% when evaluating unstructured
documents [18].</p>
      <p>Thöni (2015) explored the application of text mining in monitoring sustainable development,
where supplier ranking is performed using a lexicon-based approach. The classification model
achieved a recall of 0.81, enabling effective detection of risks related to child labor [19].</p>
      <p>There is also a review by Yeo et al. (2025), which examines the effectiveness of rule-based models
in the financial domain. The study notes that the use of lexical patterns enables F1-scores above 80%
in explainable AI tasks [20].</p>
      <p>The study by Narayanan &amp; Georgiou (2013) demonstrated rule-based classification of behavioral
patterns based on linguistic indicators, with a focus on expert-compiled lexicons. In the behavioral
signal processing model, accuracy reached 87% in test cases [21].</p>
      <p>The work by Chiarello (2019) examines a rule-based approach to extracting technical knowledge,
where expert-defined rules are employed. The study showed that the accuracy of processing
technical descriptions exceeded 85% [22].</p>
      <p>In Nai (2025), rule-based models were applied to analyze public administration expenditures,
including expert queries to align legal content with budget documentation. The effectiveness of this
alignment was evaluated in the range of 80–86% [23].</p>
      <p>Liu &amp; Li (2023) examine the limitations of rule-based translation systems, noting that
expert-defined rules achieve accuracy above 80% only in limited domains. However, such methods remain
valuable in legal translation [24].</p>
      <p>Rule-based approaches (Table 2) remain an important alternative to statistical and
transformer-based models when full transparency of classification logic is required or when labeled data are
limited. In medical Q&amp;A communities, the system by Zhang et al. (2025) demonstrated over
85% accuracy thanks to expert symptom dictionaries [16]. Similarly, Lashkari (2024) achieved a
precision of 0.82 in detecting smart contract vulnerabilities by combining lexical patterns with
contextual rules [17]. In social research, Perron et al. (2024) reported 88% accuracy by integrating
local LLMs with rule-based compliance criteria [18]. The studies by Thöni (2015), Yeo et al. (2025),
and Narayanan &amp; Georgiou (2013) confirm the versatility of such methods in sustainable
development, finance, and behavioral analysis, respectively [19–21]. However, little research has
addressed legal monitoring of media through automatic alignment of news texts with articles of
national legislation. Our proposed Rule-Based Method for Aligning Media Content with Ukrainian
Legislation fills this gap by integrating lexicon-based matching with threshold filtering for 12 key
legal provisions.</p>
      <p>Given the increasing volume of information flows and the need for rapid legal content analysis,
the rule-based method we propose is both timely and in demand. Unlike existing studies focused on
general sentiment detection or domain-specific categories, the developed system is the first to
integrate lexicon-based key phrase matching with threshold filtering for the automatic alignment of
news content with specific articles of Ukrainian legislation. Experiments demonstrated balanced
performance with a Precision of 0.84, Recall of 0.83, and F1-score of 0.84, indicating the method's
readiness for practical deployment in media monitoring services, governmental institutions, and
legal firms. Thus, this research makes a significant contribution to the development of Explainable
AI in the legal domain by offering a transparent, reproducible, and adaptive tool for rapid assessment
of the normative relevance of media content.</p>
      <table-wrap>
        <caption>
          <p>Rule-based approaches in applied domains</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Domain</th>
              <th>Approach</th>
              <th>Reported result</th>
              <th>Source</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Sustainable Development / Supply Chains</td>
              <td>Lexicon-based supplier ranking</td>
              <td>Recall = 0.81</td>
              <td>Thöni, 2015 [19]</td>
            </tr>
            <tr>
              <td>Financial Reports</td>
              <td>Explainable AI + lexical rules</td>
              <td>F1-score &gt; 0.80</td>
              <td>Yeo et al., 2025 [20]</td>
            </tr>
            <tr>
              <td>Behavioral Signal Processing</td>
              <td>Linguistic indicators + expert rules</td>
              <td>Accuracy = 87%</td>
              <td>Narayanan &amp; Georgiou, 2013 [21]</td>
            </tr>
            <tr>
              <td>Technical Documentation</td>
              <td>Rule-based knowledge extraction</td>
              <td>Accuracy &gt; 85%</td>
              <td>Chiarello, 2019 [22]</td>
            </tr>
            <tr>
              <td>Public Budgets</td>
              <td>Dictionaries of regulatory terms</td>
              <td>Accuracy = 80–86%</td>
              <td>Nai, 2025 [23]</td>
            </tr>
            <tr>
              <td>Legal Translation</td>
              <td>Rule-based MT</td>
              <td>Accuracy &gt; 80% (limited domains)</td>
              <td>Liu &amp; Li, 2023 [24]</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <sec id="sec-3-1">
        <title>3.1. Formal Problem Statement</title>
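        <p>Let <inline-formula><tex-math>T</tex-math></inline-formula> denote the input text, represented as a sequence of characters or words and normalized to
lowercase to unify substring search. Let the set <inline-formula><tex-math>A = \{a_1, a_2, \ldots, a_m\}</tex-math></inline-formula> contain the names of the legal
articles, each of which is associated with a predefined set of key phrases.</p>
        <p>For each article <inline-formula><tex-math>a_i \in A</tex-math></inline-formula>, a dictionary <inline-formula><tex-math>D_i = \{k_{i,1}, k_{i,2}, \ldots, k_{i,L_i}\}</tex-math></inline-formula> is defined, where each element <inline-formula><tex-math>k_{i,j}</tex-math></inline-formula> is a key
phrase or expression that best identifies the subject of that article. These dictionaries are compiled
by legal experts based on an analysis of the legal corpus and the practical application of laws.</p>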
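        <p>The objective of the method is to compute, for each <inline-formula><tex-math>i</tex-math></inline-formula>-th dictionary, the number of its key phrases that occur in the
text <inline-formula><tex-math>T</tex-math></inline-formula>. Formally, we introduce a counter</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>n_i = \sum_{j=1}^{L_i} \mathbf{1}\{k_{i,j} \subset T\},</tex-math>
        </disp-formula>
        <p>where <inline-formula><tex-math>\mathbf{1}\{\cdot\}</tex-math></inline-formula> denotes the indicator function for the occurrence of a substring.</p>
        <p>After computing the vector <inline-formula><tex-math>N = (n_1, n_2, \ldots, n_m)</tex-math></inline-formula>, the task reduces to finding the argument of the
maximum:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>i^{*} = \arg\max_{i \in \{1, \ldots, m\}} n_i.</tex-math>
        </disp-formula>
        <p>The final decision is made based on a threshold condition: if <inline-formula><tex-math>n_{i^{*}} \geq \tau</tex-math></inline-formula> (where <inline-formula><tex-math>\tau = 5</tex-math></inline-formula> is set by expert
judgment), the text is considered relevant to article <inline-formula><tex-math>a_{i^{*}}</tex-math></inline-formula>; otherwise, the algorithm returns the result "Not Found".</p>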
        <p>Thus, the formalization of the task allows for unambiguous and efficient alignment of the text
with legal articles, ensuring clarity of the decision and the ability to flexibly adjust the threshold
depending on specific application conditions.</p>
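        <p>A toy illustration of the decision rule, using hypothetical counter vectors for m = 3 articles and the threshold τ = 5 (the numbers are invented for illustration and are not taken from the experiments):</p>
        <preformat>
```latex
% Hypothetical counter vectors for m = 3 articles, threshold \tau = 5
\begin{gather*}
N = (6, 2, 5):\quad i^{*} = \arg\max_{i} n_i = 1,\quad n_{1} = 6 \geq \tau = 5
  \;\Rightarrow\; \text{the text is relevant to } a_{1};\\
N = (3, 4, 1):\quad i^{*} = 2,\quad n_{2} = 4 \not\geq \tau = 5
  \;\Rightarrow\; \text{``Not Found''}.
\end{gather*}
```
        </preformat>
        <p>In the first case the third article also reaches the threshold, but only the article with the maximum counter is returned.</p>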
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Algorithm</title>
        <p>Below is a detailed description of each step of the algorithm, which not only enables its
implementation in code but also ensures transparency and reproducibility of all internal operations:</p>
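        <p>Step 1. Text and dictionary normalization [25, 26]. The input text <inline-formula><tex-math>T</tex-math></inline-formula> is converted to lowercase to
eliminate case sensitivity. Similarly, each key phrase <inline-formula><tex-math>k_{i,j}</tex-math></inline-formula> is transformed to lowercase, ensuring the
method is case-insensitive.</p>
        <p>Step 2. For each dictionary <inline-formula><tex-math>D_i</tex-math></inline-formula>, a variable <inline-formula><tex-math>n_i = 0</tex-math></inline-formula> is initialized; it
will serve as a counter for the number of key phrase matches from dictionary <inline-formula><tex-math>D_i</tex-math></inline-formula> in the text.</p>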
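        <p>Step 3. Match search. For each <inline-formula><tex-math>i</tex-math></inline-formula>-th dictionary, the presence of each phrase <inline-formula><tex-math>k_{i,j}</tex-math></inline-formula> in text <inline-formula><tex-math>T</tex-math></inline-formula> is
iteratively checked. If the condition <inline-formula><tex-math>k_{i,j} \subset T</tex-math></inline-formula> is satisfied, the counter <inline-formula><tex-math>n_i</tex-math></inline-formula> is incremented by 1.</p>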
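        <p>Step 4. Candidate identification. After processing all dictionaries, a vector <inline-formula><tex-math>N = (n_1, \ldots, n_m)</tex-math></inline-formula> is
formed. The index <inline-formula><tex-math>i^{*} = \arg\max_{i} n_i</tex-math></inline-formula> is computed, indicating the dictionary with the highest number
of matches.</p>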
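        <p>Step 5. Threshold check and result output. If <inline-formula><tex-math>n_{i^{*}} \geq \tau = 5</tex-math></inline-formula>, the algorithm returns the name of article <inline-formula><tex-math>a_{i^{*}}</tex-math></inline-formula>;
if no counter reaches the threshold, the result is "Not Found". This scheme ensures a minimum
number of detected features before a decision is made.</p>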
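        <p>The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name match_article, the dictionary layout (article name mapped to a list of key phrases), and the example phrases are hypothetical, and ties between equal counters are resolved in favor of the first dictionary, a detail the method description does not specify.</p>
        <preformat>
```python
# Minimal sketch of the five-step matching procedure (Steps 1-5).
# Article names and key phrases are hypothetical placeholders,
# not the expert dictionaries used in the paper.

NOT_FOUND = "Not Found"


def match_article(text: str, dictionaries: dict[str, list[str]], tau: int = 5) -> str:
    """Return the best-matching article name, or "Not Found"."""
    # Step 1: lowercase the text so substring search is case-insensitive.
    normalized_text = text.lower()

    best_article = NOT_FOUND
    best_count = -1
    for article, phrases in dictionaries.items():
        # Step 2: initialise the counter n_i for this dictionary.
        count = 0
        # Step 3: add 1 for every key phrase that occurs in the text
        # (each phrase counts at most once, as in the indicator sum).
        for phrase in phrases:
            if phrase.lower() in normalized_text:
                count += 1
        # Step 4: track the arg max; on ties the first dictionary wins (assumption).
        if count > best_count:
            best_article, best_count = article, count

    # Step 5: accept the candidate only if its counter reaches the threshold tau.
    return best_article if best_count >= tau else NOT_FOUND


# Hypothetical dictionaries for illustration only.
demo_dictionaries = {
    "Article A (hypothetical)": ["drone", "missile", "air raid", "shelling", "casualties"],
    "Article B (hypothetical)": ["court", "verdict"],
}
print(match_article("Overnight shelling: a drone and missile air raid caused casualties.", demo_dictionaries))
# Article A (hypothetical)
print(match_article("The court delayed the verdict after the drone strike.", demo_dictionaries))
# Not Found
```
        </preformat>
        <p>Because matching is plain substring search, a short key phrase can also fire inside a longer word (for example, "eta" inside "beta"); this mirrors the substring indicator in the formal statement but may inflate counts for very short phrases.</p>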
        <p>The following section is devoted to the quantitative analysis of the accuracy and interpretability
of the results. In a series of experimental scenarios, the system will be evaluated using precision,
recall, and F1-score metrics in order to compare the effectiveness of the proposed approach with
existing solutions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation</title>
      <sec id="sec-4-1">
        <title>4.1. Technical Description of the Developed System</title>
        <p>The developed automated text analysis system is implemented as a web application using the
Streamlit framework [27], which enables interactive user engagement and dynamic interface updates
without the need for separate server deployment. The user interface consists of a sidebar control
panel that allows the selection of a thematic category (legal article) and provides options for
managing the corresponding key phrase dictionary: adding, deleting, or viewing the current list of
terms. This structure ensures the system’s adaptability to changes in legal terminology and discourse
contexts, allowing experts to independently configure the dictionaries according to current needs.</p>
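        <p>The dictionary management functions of the sidebar (view, add, delete) can be sketched as a small in-memory store. The class name PhraseDictionaries and its method names are hypothetical, and the Streamlit widgets that would call these methods are omitted; phrases are stored in lowercase, as in Step 1 of the algorithm, and stripping surrounding whitespace is an additional assumption.</p>
        <preformat>
```python
# Hypothetical in-memory store for the per-article key phrase dictionaries
# that the sidebar panel lets experts view, extend, and prune.


class PhraseDictionaries:
    def __init__(self) -> None:
        # article name -> ordered list of normalized (lowercase) key phrases
        self._phrases: dict[str, list[str]] = {}

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Lowercase as in Step 1; stripping outer whitespace is an assumption.
        return phrase.strip().lower()

    def view(self, article: str) -> list[str]:
        """Return a copy of the current phrase list for an article."""
        return list(self._phrases.get(article, []))

    def add(self, article: str, phrase: str) -> bool:
        """Add a phrase; return False if it is empty or already present."""
        normalized = self._normalize(phrase)
        if not normalized:
            return False
        phrases = self._phrases.setdefault(article, [])
        if normalized in phrases:
            return False
        phrases.append(normalized)
        return True

    def delete(self, article: str, phrase: str) -> bool:
        """Remove a phrase if present; return whether anything was removed."""
        normalized = self._normalize(phrase)
        phrases = self._phrases.get(article, [])
        if normalized in phrases:
            phrases.remove(normalized)
            return True
        return False


store = PhraseDictionaries()
print(store.add("Article A (hypothetical)", "  Air Raid "))  # True
print(store.add("Article A (hypothetical)", "air raid"))     # False: duplicate after normalization
print(store.view("Article A (hypothetical)"))                # ['air raid']
```
        </preformat>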
        <p>The main part of the interface offers two input modes: manual text entry or providing a URL link
to a news article, which is automatically processed by the system to extract textual content.
Additionally, a sentiment inversion option is implemented, which is useful when analyzing texts that
may contain rhetorical devices such as irony or sarcasm. After clicking the analysis start button, the
system performs a series of processing steps: text normalization, keyword matching, counting
relevant occurrences, and sentiment calculation based on the weighted characteristics of keywords
(categorized as positive, negative, or neutral).</p>
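        <p>The sentiment calculation and the inversion option can be illustrated with a weighted lexicon score. This is a sketch under explicit assumptions: the lexicon DEMO_LEXICON, the weights (+1 positive, −1 negative, 0 neutral), and the normalization (sum of the weights of matched words divided by their number, expressed as a percentage) are hypothetical, since the exact weighting scheme of the system is not specified in the text.</p>
        <preformat>
```python
import re

# Hypothetical lexicon: word -> weight (+1 positive, -1 negative, 0 neutral).
DEMO_LEXICON = {"support": 1.0, "success": 1.0, "attack": -1.0, "killed": -1.0, "report": 0.0}


def sentiment_percent(text: str, lexicon: dict[str, float], invert: bool = False) -> float:
    """Weighted lexicon score in percent, optionally inverted for irony or sarcasm."""
    # \w+ is Unicode-aware in Python 3, so Cyrillic words are tokenized too.
    tokens = re.findall(r"\w+", text.lower())
    weights = [lexicon[token] for token in tokens if token in lexicon]
    if not weights:
        return 0.0  # no sentiment-bearing words found: treat the message as neutral
    # Neutral words (weight 0) dilute the score because they count in the denominator.
    score = 100.0 * sum(weights) / len(weights)
    # The inversion option flips the polarity of the whole message.
    return -score if invert else score


demo_text = "Report: attack repelled with support and success."
print(sentiment_percent(demo_text, DEMO_LEXICON))               # 25.0
print(sentiment_percent(demo_text, DEMO_LEXICON, invert=True))  # -25.0
```
        </preformat>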
        <p>The analysis results are displayed in a user-friendly format, indicating the legal category,
sentiment index, and the processed text content. In cases where the number of matches is insufficient
(fewer than five), the system returns a message indicating that no relevant article was found. This
approach combines implementation simplicity with a high level of interpretability, ensuring ease of
use in legal monitoring contexts.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation of Classification Model Accuracy</title>
        <p>To evaluate the effectiveness of the algorithm for matching texts to legal articles, testing was
conducted on a control dataset with a uniform distribution across 12 classes. Based on the obtained
results, a confusion matrix was constructed (Figure 2), illustrating the relationship between actual
and predicted classes. Most observations are concentrated along the main diagonal, indicating high
classification accuracy.</p>
        <p>The overall accuracy is 83.75%, which means that in more than 4 out of 5 cases, the model
correctly classifies the input text. The precision score of 0.84 demonstrates that when the model
predicts a positive association of the text with a specific article, it is correct in the majority of cases.</p>
        <p>The recall of 0.83 indicates that the model identifies most of the relevant
articles in the test set, although it misses some correct matches. In turn, the F1-score of 0.84
reflects a balance between precision and recall, indicating that neither false positives nor false
negatives dominate the model's errors.</p>
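        <p>For reference, the reported metrics can be computed from a confusion matrix as follows. The sketch assumes macro averaging over classes (the averaging scheme is not stated in the text), and the 3×3 matrix in the usage example is a made-up illustration rather than the matrix in Figure 2.</p>
        <preformat>
```python
def classification_metrics(matrix: list[list[int]]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1.

    matrix[a][p] counts texts whose actual class is a and predicted class is p.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(k))

    precisions, recalls, f1_scores = [], [], []
    for c in range(k):
        tp = matrix[c][c]
        predicted_c = sum(row[c] for row in matrix)  # column sum
        actual_c = sum(matrix[c])                    # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1_scores) / k,
    }


# Illustrative 3-class matrix (not the paper's data).
toy_matrix = [[8, 2, 0], [1, 9, 0], [0, 0, 10]]
print({name: round(value, 3) for name, value in classification_metrics(toy_matrix).items()})
# {'accuracy': 0.9, 'precision': 0.902, 'recall': 0.9, 'f1': 0.9}
```
        </preformat>
        <p>Macro averaging weights every legal article equally, which suits a control set with a uniform class distribution.</p>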
        <p>Some degree of inter-class confusion is observed (particularly between K1↔K2 and K6↔K7),
which may be due to overlapping key terms in certain articles. However, the number of such errors
is relatively small compared to the total number of correct classifications.</p>
        <p>Thus, the results indicate a high level of agreement between the model's predictions and the actual
matches in the dataset, making it suitable for tasks involving automated analysis of the legal
relevance of texts.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Case Studies</title>
        <p>Among the 15 processed cases (Table 2), the largest proportion pertains to materials related to
“Military-Political Leadership of Ukraine at All Levels” (7/15, 46.7%), followed by “Ukraine’s
International Image in the USA, Canada, and the United Kingdom” (2/15, 13.3%), “Ukraine’s
International Image in the EU” (2/15, 13.3%), “Law Enforcement Agencies of Ukraine” (2/15, 13.3%),
and “Armed Forces of Ukraine” (2/15, 13.3%).</p>
        <p>Table 2. Case studies of 15 news articles (sources include [28]–[31]), with the thematic category, sentiment percentage, and identified legal article for each case.</p>
        <p>Thus, the analysis of the 15 cases showed the following.</p>
        <p>The sentiment percentage ranged from −10% to +30%, with the following distribution:
a. Positive (≥ +10%): 8 cases (53.3%);
b. Negative (≤ −10%): 4 cases (26.7%);
c. Neutral (0%): 3 cases (20%).</p>
        <p>The average sentiment value was +6%, the median was +10%, and the standard deviation
was approximately 11.5%.</p>
        <p>In 5 out of 15 cases (33.3%), a relevant legal article was automatically identified:
a. Constitution of Ukraine, Part 4, Article 32 (2 cases);
b. Civil Code of Ukraine, Article 278 (2 cases);
c. Criminal Code of Ukraine, Article 109 (1 case);
d. Criminal Code of Ukraine, Article 182 (1 case);
e. Criminal Code of Ukraine, Article 259 (1 case).</p>
        <p>In 10 cases (66.7%), the keyword occurrence threshold (τ = 5) was not reached, and
therefore no relevant article was identified.</p>
        <p>The highest percentage of successful matches to legal articles was observed in the categories “Law
Enforcement Agencies of Ukraine” (1/2, 50%) and “Military-Political Leadership of Ukraine” (3/7,
42.9%). In the thematic “international image” categories, no correct matches were identified (0/4, 0%),
indicating insufficient representation of the relevant keyword forms in these groups.</p>
        <p>These quantitative indicators suggest satisfactory performance of the sentiment analysis,
but a limited ability of the algorithm to identify specific legal articles.</p>
        <p>To increase the frequency of successful matches to legal norms, the expert dictionaries should be
expanded and enriched with synonymous forms, and lowering the threshold τ for shorter news
materials should be considered.</p>
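        <p>The case-study summary statistics (sentiment distribution, mean, median, standard deviation, and match rate) can be recomputed from per-case values with Python's statistics module. The sketch is hypothetical: the function names are illustrative, nonzero values strictly between −10% and +10% are labeled "unclassified" because the text does not assign them to a category, the sample standard deviation is an assumed choice, and the input values are invented rather than taken from the 15 cases in Table 2.</p>
        <preformat>
```python
import statistics
from collections import Counter
from typing import Optional


def sentiment_bucket(value: float) -> str:
    """Map a sentiment percentage to the categories used in the case studies."""
    if value >= 10:
        return "positive"
    if value == 0:
        return "neutral"
    if value > -10:
        # Assumption: the text does not name this band (nonzero, between -10% and +10%).
        return "unclassified"
    return "negative"  # value is -10% or lower


def summarize_cases(sentiments: list[float], matched_articles: list[Optional[str]]) -> dict:
    n = len(sentiments)
    buckets = Counter(sentiment_bucket(v) for v in sentiments)
    found = sum(1 for article in matched_articles if article is not None)
    return {
        # category -> (number of cases, share of all cases in percent)
        "distribution": {name: (count, round(100 * count / n, 1)) for name, count in buckets.items()},
        "mean": statistics.mean(sentiments),
        "median": statistics.median(sentiments),
        "stdev": statistics.stdev(sentiments),  # sample SD; statistics.pstdev gives the population SD
        "match_rate_percent": round(100 * found / len(matched_articles), 1),
    }


# Invented example values, not the paper's case-study data.
demo_sentiments = [-10, 0, 10, 20, 30]
demo_matches = ["Article A", None, None, "Article B", None]
print(summarize_cases(demo_sentiments, demo_matches))
```
        </preformat>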
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The conducted evaluation showed that the proposed rule-based method for automatically
aligning media texts with articles of Ukrainian legislation demonstrates an overall accuracy of 83.8%,
corresponding to Precision = 0.84, Recall = 0.83, and F1-score = 0.84 on a control sample of 12 legal
classes. These metrics are only slightly lower than the results of previous rule-based studies in
medical Q&amp;A domains (Accuracy &gt; 85%) and social research (Accuracy = 88%) [16, 18], indicating
the competitiveness of the approach even within the highly specialized legal domain. At the same
time, our metrics show a close balance between precision and recall, which is critical for
real-world media monitoring tasks where classification errors may have significant legal implications.</p>
      <p>A detailed analysis of the confusion matrix revealed that most errors occurred between closely
related categories such as “Military-Political Leadership of Ukraine” and “Law Enforcement Agencies
of Ukraine” (K1↔K2, K6↔K7), which share several key terms. Furthermore, in a series of 15
real-world case studies, the algorithm identified a relevant article in only 33.3% of cases
(5/15), which is likely due to the fixed threshold τ = 5 and the relatively short length of typical news
content. The highest rate of successful matches was recorded in the category “Law Enforcement
Agencies of Ukraine” (50%), and the lowest in the international image categories (0%), highlighting
the need to further refine expert dictionaries specifically for “international” topics.</p>
      <p>The obtained results highlight the value of the proposed Explainable AI tool: the system operates
without the need for trained models, offering full transparency of its decision logic and flexible
terminology customization through the user interface. This architecture is particularly beneficial for
governmental institutions and legal firms, where it is critically important to reconstruct why, and by
which rules, a text was matched to a specific legal article. Furthermore, the combination
of normative alignment with sentiment analysis opens opportunities for comprehensive monitoring
of reputational risks and compliance with media standards.</p>
      <p>Future research will focus on expanding and enriching expert dictionaries with new synonyms,
idiomatic expressions, and polysemous lexemes; adapting the threshold τ based on text length and
genre to increase sensitivity to shorter messages; integrating semantic methods (e.g., word
embeddings or topic ontologies) to reduce inter-class confusion; developing hybrid approaches that
combine rule-based logic with lightweight statistical or transformer modules; and enabling
multilingual support and cross-national comparative analysis of legal texts. Together, these
enhancements aim to significantly improve the system's legal coverage, increase the frequency of
successful matches, and broaden its practical applications in legal analysis of media content.</p>
      <p>Declaration on Generative AI</p>
      <p>During the preparation of this paper, the authors used GPT-4 and DeepL for grammar and spelling checks.
After using these tools, the authors reviewed and edited the content as necessary and are solely responsible
for the content of the publication.</p>
      <p>[12] R. Satapathy, C. Guerreiro, I. Chaturvedi, Phonetic-based microtext normalization, in: 2017
International Conference on Advances in Computing, Communications and Informatics (ICACCI),
IEEE, 2017. URL: https://ieeexplore.ieee.org/abstract/document/8215691/.</p>
      <p>[13] M. Itani, Sentiment analysis for informal Arabic text, Ph.D. thesis, Sheffield Hallam University,
2018. URL: https://shura.shu.ac.uk/23402/1/Itani_2018_phd_SentimentAnalysisAnd.pdf.</p>
      <p>[14] R. Guarasci, R. Catelli, M. Esposito, Deceptive reviews detection in cultural heritage domain,
Expert Systems with Applications (2024). URL:
https://www.sciencedirect.com/science/article/pii/S0957417424009977.</p>
      <p>[15] M. Kumar, L. Khan, H. T. Chang, Evolving techniques in sentiment analysis, PeerJ Computer
Science (2025). URL: https://peerj.com/articles/cs-2592/.</p>
      <p>[16] Y. Zhang, T. Wang, Y. Wang, J. Cao, Knowledge discovery of diseases symptoms and rehabilitation
measures in Q&amp;A communities, 2025. URL:
https://www.nature.com/articles/s41598-025-983009.</p>
      <p>[17] B. Lashkari, Classification and Vulnerability Detection of Ethereum Energy Smart Contracts, 2024.
URL: https://era.library.ualberta.ca/items/5d6a38c4-f7f8-4c24-bb5c-23ba247ac1e7.</p>
      <p>[18] B. E. Perron, H. Luan, B. G. Victor, Moving Beyond ChatGPT: Local Large Language Models (LLMs)
and the Secure Analysis of Confidential Unstructured Text Data in Social Work Research, 2024.
URL: https://journals.sagepub.com/doi/abs/10.1177/10497315241280686.</p>
      <p>[19] A. Thöni, Sustainability risk monitoring in supply chains: ranking suppliers using text mining and
Bayesian networks with a focus on child labor, Dissertation, Technische Universität Wien, 2015.
URL: https://doi.org/10.34726/hss.2015.30640.</p>
      <p>[20] W. J. Yeo, W. Van Der Heever, R. Mao, E. Cambria, A comprehensive review on financial
explainable AI, 2025. URL: https://link.springer.com/article/10.1007/s10462-024-11077-7.</p>
      <p>[21] S. Narayanan, P. G. Georgiou, Behavioral signal processing: Deriving human behavioral
informatics from speech and language, Proc. IEEE 101 (5) (2013) 1203–1233.</p>
      <p>[22] F. Chiarello, Mining Technical Knowledge, Ph.D. thesis, 2019. URL:
https://tesidottorato.depositolegale.it/handle/20.500.14242/131397.</p>
      <p>[23] R. Nai, Analysis of Public Administration Procurement and Expenditures Related to Energy
Efficiency Improvements, Ph.D. thesis, 2025. URL:
https://tesidottorato.depositolegale.it/handle/20.500.14242/199436.</p>
      <p>[24] X. Liu, C. Li, Artificial intelligence and translation, in: Routledge Encyclopedia of Translation
Technology, Routledge, 2023, pp. 280–302.</p>
      <p>[25] K. Lipianina-Honcharenko, A. Melnychuk, K. Yurkiv, G. Hladiy, M. Telka, Integrated Approach
to the International Aspects of Online Dispute Resolution Formation, in: Proceedings of the First
International Workshop of Young Scientists on Artificial Intelligence for Sustainable Development,
Ternopil, Ukraine, May 2024, pp. 88–98.</p>
      <p>[26] K. Lipianina-Honcharenko, D. Lendiuk, M. Nazar Melnyk, T. L. Komar, Evaluation of the
Keyword Selection Methods Effectiveness for the Fake News Classification, 2024.</p>
      <p>[27] Streamlit app, Streamlit, n.d. Retrieved April 24, 2025. URL:
https://nelczgkwcsghtwdkw7buxn.streamlit.app/.</p>
      <p>[28] BBC News, At least eight people killed and more than 80 injured in overnight attack on Kyiv, BBC
News, April 24, 2025. URL: https://www.bbc.com/news/articles/cd7v0lgg18xo.</p>
      <p>[29] Hromadske, Тіло журналістки Рощиної, яку закатували в полоні, повернули в Україну
[The body of journalist Roshchyna, who was tortured to death in captivity, has been returned to Ukraine],
Hromadske, 2025. URL:
https://hromadske.ua/viyna/243651-tilo-zurnalistky-roshchynoyi-iakuzakatuvaly-v-poloni-povernuly-v-ukrayinu-zastupnyk-hlavy-mzs.
[30] Hromadske Rehiony, На Тернопільщині почали ексгумацію польських жертв Волинської
трагедії, Hromadske, 2025. URL:
https://hromadske.ua/rehiony/243654-na-ternopilshchynipochaly-ekshumatsiiu-polskykh-zertv-volynskoyi-trahediyi-rmf-fm.
[31] BBC News, Pope Francis to be buried at Santa Maria Maggiore, BBC News, April 24, 2025. URL:
https://www.bbc.com/news/articles/cgrgzl0vyqpo.
[32] Hromadske, Вижити в «автобусі смерті». Репортаж із Сум, Hromadske, 2025. URL:
https://hromadske.ua/viyna/243172-vyzyty-v-avtobusi-smerti-reportaz-iz-sum.
[33] Fox News, Key Karen Read witness admits grand jury testimony wasn’t true, Fox News, 2025.</p>
      <p>URL:
https://www.foxnews.com/us/key-karen-read-witness-admits-grand-jury-testimonywasnt-true.
[34] Magnolia-TV, У Києві чоловіка судитимуть за обвинуваченням у вуличному збуті
психотропів, Magnolia-TV, 2025. URL:
https://magnolia-tv.com/news/124548-u-kyyevi</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lendiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lipianina-Honcharenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dobrowolski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Boguta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bytsyura</surname>
          </string-name>
          ,
          <article-title>Method of determining the text sentiment by thematic rubrics</article-title>
          ,
          <source>in: Intell. Syst. Workshop CoLInS 2024</source>
          ,
          <year>2024</year>
          . URL: https://doi.org/10.31110/colins/2024-3/026 (accessed Apr. 24, 2025).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Komar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dorosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hladiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sachenko</surname>
          </string-name>
          ,
          <article-title>Deep neural network for detection of cyber attacks</article-title>
          ,
          <source>in: Proceedings of the 2018 IEEE 1st International Conference on System Analysis and Intelligent Computing (SAIC 2018)</source>
          , IEEE,
          <year>2018</year>
          . Article ID:
          <fpage>8516753</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Komar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bezobrazov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Golovko</surname>
          </string-name>
          ,
          <article-title>Intelligent cyber defense system using artificial neural network and immune system techniques</article-title>
          ,
          <source>Communications in Computer and Information Science</source>
          <volume>783</volume>
          (
          <year>2017</year>
          )
          <fpage>36</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dyvak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Melnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kedrin</surname>
          </string-name>
          ,
          <article-title>Interval model of the user reactions to messages in thematic groups of social networks</article-title>
          ,
          <source>in: 2022 IEEE 16th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>837</fpage>
          -
          <lpage>840</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Kovalchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kasianchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karpinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shevchuk</surname>
          </string-name>
          ,
          <article-title>Decision-making supporting models concerning the internal security of the state</article-title>
          ,
          <source>International Journal of Electronics and Telecommunications</source>
          (
          <year>2023</year>
          )
          <fpage>301</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Raees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fazilat</surname>
          </string-name>
          ,
          <article-title>Lexicon-based sentiment analysis on text polarities with evaluation of classification models</article-title>
          ,
          <source>arXiv preprint arXiv:2409.12840</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Abd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Sadiq</surname>
          </string-name>
          ,
          <article-title>Analyzing sentiment system to specify polarity by lexicon-based</article-title>
          ,
          <source>Bulletin of Electrical Engineering and Informatics</source>
          (
          <year>2021</year>
          ). URL: https://beei.org/index.php/EEI/article/view/2471.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Nawi</surname>
          </string-name>
          ,
          <article-title>Review on sentiment analysis for text classification techniques</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          (
          <year>2023</year>
          ). URL: https://link.springer.com/article/10.1007/s11042-022-14112-3.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Balshetwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Tuganayat</surname>
          </string-name>
          ,
          <article-title>Frame tone and sentiment analysis</article-title>
          ,
          <source>ResearchGate preprint</source>
          (
          <year>2019</year>
          ). URL: https://www.researchgate.net/publication/335811449_Frame_Tone_and_Sentiment_Analysis.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Catelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pelosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Comito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pizzuti</surname>
          </string-name>
          ,
          <article-title>Lexicon-based sentiment analysis on Twitter in Italy</article-title>
          ,
          <source>Computers in Biology and Medicine</source>
          (
          <year>2023</year>
          ). URL: https://www.sciencedirect.com/science/article/pii/S0010482523003414.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Kiilu</surname>
          </string-name>
          ,
          <article-title>Sentiment Classification for Hate Tweet Detection in Kenya</article-title>
          , Master's thesis, Jomo Kenyatta University of Agriculture and Technology (
          <year>2021</year>
          ). URL: http://ir.jkuat.ac.ke/handle/123456789/5521.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>