<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>unstructured Ukrainian texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victoria Vysotska</string-name>
          <email>victoria.a.vysotska@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kirill Smelyakov</string-name>
          <email>kirillsmelyakov@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiya Chupryna</string-name>
          <email>anastasiya.chupryna@nure.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kateryna Liulina</string-name>
          <email>kateryna.liulina@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Internal Affairs</institution>
          ,
          <addr-line>L. Landau Avenue 27 61080 Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Nauky Ave 14, 61166, Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents a hybrid method for identifying named location entities in unstructured Ukrainian text. The proposed system combines machine learning, using the Stanza library for initial named entity detection, with rule-based algorithms that leverage the Universal Dependencies framework to identify prepositions of place and to expand entities through dependency parsing. This enables the recognition of complex location cues, such as relative directions and descriptive elements. Testing on a dataset of 200 Ukrainian texts, including fiction and social media posts, showed that the hybrid approach achieves an F-score of 80.79%, a precision of 82.88%, and a recall of 78.81%, outperforming both the machine learning module alone (F-score of 44.31%) and the rule-based approach alone. The proposed named entity recognition scheme for locations performs well on texts taken from news and social media, but the results highlight the need for greater diversity in training data to handle literary styles. The proposed method can be integrated into modern emergency response systems, delivery services, and geographic information systems.</p>
      </abstract>
      <kwd-group>
        <kwd>named entity recognition</kwd>
        <kwd>machine learning</kwd>
        <kwd>location</kwd>
        <kwd>rule-based algorithm</kwd>
        <kwd>Ukrainian language</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Natural language processing systems continue to be explored and studied for different tasks,
especially named entity recognition for extraction of structured information from unstructured
text. While significant progress has been made for languages such as English, Chinese, Spanish,
and other European languages, substantial challenges remain for other languages. These
include limited annotated corpora, the morphological complexity of languages, and their
linguistic specificity. Ukrainian has a well-developed morphology, including a complex case
system, extensive inflection, and diverse syntactic constructions. Because grammatical roles are
marked by case, word order in Ukrainian is relatively free. This creates specific problems for
recognizing named entities in text.</p>
      <p>Location recognition is crucial in emergency response systems, delivery services, and
geographic information systems. These systems require not only the identification of standard
toponyms (cities, countries, streets), but also more complex location details, which may include
relative directions, descriptive words, and contextual elements that establish precise
georeferencing.</p>
      <p>This research examines methods and approaches for recognizing named entities of locations and
places in unstructured Ukrainian-language text. We propose a combined approach to entity
recognition using statistical machine learning and rule-based methods. Our methodology utilizes
the Universal Dependencies framework to identify relationships between words. Analysis of these
relationships enables the recognition of complex location descriptions by detecting syntactically
related components.</p>
      <p>The primary contributions of this work include the development of a hybrid system for
recognizing named entities of locations and places in Ukrainian-language text and the evaluation of
various architectural configurations combining machine learning and rule-based methods.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Research in the field of intelligent text data processing</title>
        <p>
          The article “Effectiveness of Modern Text Recognition Solutions and Tools for Common Data
Sources” examines the capabilities of optical character recognition (OCR) using tools such as
EasyOCR and TesserOCR [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The authors evaluate their performance on various data types,
including electronic documents, web pages, and advertising banners, and offer recommendations
for the use of these tools, taking into account the degree of data corruption. This study is of
interest in the context of this work, as it proposes mechanisms for expanding the capabilities of
converting various data types into a format suitable for further text processing. For example, in the
context of emergency response systems, this solution could be implemented to analyze images
provided by witnesses to automatically identify additional information about the scene.
        </p>
        <p>
          The paper “Methods of Multilingual Question Answering” examines question-answering
systems for various languages, with a particular focus on Ukrainian, for which
machine learning data is typically lacking [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The authors explore various approaches to building such
systems using BERT-based language models and find that tuning the model on English data
improves its performance on Ukrainian. Together, these studies illustrate how
text information is processed in intelligent systems today: text is first extracted from visual
sources, and the system then recognizes the text, interprets its meaning, analyzes questions, and
finds appropriate answers.
        </p>
        <p>Both studies are important in that they offer practical solutions that expand the capabilities of
obtaining information about incident locations and location markers and help convert
user-provided information into an acceptable format for further text analysis and extraction of useful
information, particularly named location entities.</p>
        <p>
          Modern researchers also point out that the development of new Ukrainian-language datasets is
a pressing issue [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
          ]. This problem stems from the fact that some machine learning algorithms
require significant datasets for research, especially when applied to a specific application domain.
These considerations also suggest that text processing methods that are less dependent on the
comprehensiveness of the provided dataset may be of particular interest.
        </p>
        <p>
          It is worth noting that research in natural language processing, as applied to the Ukrainian
language, remains relevant. Contemporary researchers are developing approaches to improving the
quality of Ukrainian source text for further processing, specifically proposing solutions for
eliminating errors in Ukrainian text [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Common approaches to Named Entity Recognition</title>
        <p>
          Named Entity Recognition methods can be based individually or simultaneously on lexicon-based,
rule-based, and machine learning-based approaches. Lexicon-based methods can be used in settings
where a corpus is not available. These models combine the results of morphological analysis, a set
of lexicons, and stemming and lemmatization methods. Rule-based methods can be used for entities
that have a specific structure that can be constructed according to predefined patterns and rules.
They can be used to recognize entities such as phone numbers, email addresses, personal data (ID
numbers, bank card numbers), dates, and other data in a predefined format [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Machine
learning-based methods are more adaptive and require the creation of training data for predictive text
labeling based on predefined named entities and their categories.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Applied research in Named Entity Recognition</title>
        <p>
          Studies in named entity recognition explore different ways of dealing with limited data and
language-specific problems. The paper “Named Entity Recognition for Sensitive Data Discovery in Portuguese”
examined and analyzed methods for recognizing named entities in Portuguese source texts. The
authors proposed a comprehensive solution for named entity recognition that involves rule-based
and grammatical pattern-based methods; dictionary-based methods; and machine learning
methods. It was found that the most effective machine learning methods for identifying named
entities of locations in Portuguese are Bi-LSTM (bidirectional LSTM) and CRF (Conditional
Random Fields) models [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Similar methods have been used to recognize named location entities in
Chinese text and have demonstrated high efficiency in quickly identifying locations in
mission-critical systems [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ].
        </p>
        <p>
          The paper “Geographic Named Entity Recognition by Employing Natural Language Processing
and an Improved BERT Model” demonstrated that the developed model can effectively identify
fine-grained categories of geographic entities in Chinese [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. This model incorporates components
such as ALBERT (A Lite BERT), a lightweight version of BERT for efficient feature extraction, a
bidirectional LSTM for capturing contextual dependencies, and a CRF layer for optimal sequential
labeling.
        </p>
        <p>
          The paper “Location Named-Entity Recognition using a Rule-Based Approach for Balinese
Texts” investigated the effectiveness of rule-based methods for identifying named location entities
in Balinese source texts. The authors found that rule-based methods are effective in identifying
specific locations based on given rules and patterns, whereas location entities that do not follow
expected patterns are better recognized using machine learning methods [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>Researchers often note that hybrid methods are effective for recognizing named location entities
when processing text in different natural languages. For example, the paper “Location Reference
Recognition from Texts: A Survey and Comparison” found that combining regular expressions and
grammar rules with machine learning classifiers enables higher F1 scores for location extraction
tasks [10].</p>
        <p>The patent “Efficient and Accurate Method and Apparatus for Recognizing Named Entities”
describes an algorithm for extracting and recognizing named entities in digital documents and text.
This patent describes an approach to recognizing named entities using machine learning
algorithms in combination with rule-based algorithms. The results of both algorithms are analyzed
separately and combined to make a final decision on whether the recognized entity is the desired
one. This patent proposes the use of two sets of rules for identifying named entities: a general set
and a specific set of rules. The first set includes statistical information about how frequently used
words typically refer to certain named entities or are used as a specific part of speech. The specific
set of rules defines patterns and regular expressions that typically describe specific types of named
entities [11].</p>
        <p>Thus, to solve the problem of recognizing specific types of named entities, it is worthwhile
both to study named entity recognition methods based on machine learning and to analyze the
features of the language in order to determine a set of specific rules that help recognize
concrete types of named entities.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>To solve the problem of recognizing named location entities in Ukrainian text, we propose a hybrid
system consisting of several modules: a module for preprocessing the incoming text; a module for
recognizing named location entities using machine learning; a module for recognizing prepositions
of place using a rule-based algorithm; a module for expanding recognized named entities using a
rule-based algorithm; and a module for outputting the text processing results. The schematic of the
proposed solution is shown in Figure 1.</p>
      <p>At the initial stage, the text preprocessing module splits the text into sentences, and the
sentences into tokens; determines parts of speech; analyzes morphological features of words; and
identifies relationships between words in a sentence. To implement this module, we propose using
the Stanza library, which supports the Ukrainian language and provides ready-made models trained
on the lang-uk corpus [12, 13].</p>
      <p>The second stage involves recognizing keywords in named location entities. We propose using a
module for recognizing named location entities using machine learning. This module uses the
Stanza library, which provides recognition capabilities for named location entities in text. As a
result, keywords indicating locations are recognized. To detect dependent words, an additional
module recognizing prepositions of place is proposed. This module analyzes the morphological
properties of a word and, if a spatial preposition is detected, marks it as a separately recognized
named location entity. Spatial prepositions in Ukrainian include words such as “у” (in), “на” (on),
“біля” (near), “за” (behind), and others. They signal potential location references.</p>
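      <p>As an illustration of the first two stages, the following Python sketch shows how the output of a Stanza pipeline might be queried for location entities and for adpositions attached by the Universal Dependencies case relation. It is a minimal sketch, not the implementation used in this work: the helper names, the entity label passed as location_label, and the hand-built stand-in document are assumptions made for the example. The stand-in replaces a real pipeline run so that the snippet executes without downloading models.</p>
      <preformat>
```python
from types import SimpleNamespace


def build_pipeline():
    """Create a Stanza pipeline for Ukrainian with its default processors.

    Stanza is imported lazily so the helpers below work without it installed.
    The Ukrainian models must be downloaded once, e.g. stanza.download("uk").
    """
    import stanza

    return stanza.Pipeline("uk")


def location_entities(doc, location_label="LOC"):
    """Return (text, start_char, end_char) for entities labelled as locations.

    The label name is an assumption; inspect the labels the loaded model emits.
    """
    return [
        (ent.text, ent.start_char, ent.end_char)
        for ent in doc.ents
        if ent.type == location_label
    ]


def case_adpositions(doc):
    """Return the text of every adposition attached by the UD case relation."""
    return [
        word.text
        for sentence in doc.sentences
        for word in sentence.words
        if word.upos == "ADP" and word.deprel == "case"
    ]


# Hand-built stand-in for the document a pipeline would return for
# "Я живу у Харкові" ("I live in Kharkiv"); the annotations are illustrative.
words = [
    SimpleNamespace(text="Я", upos="PRON", deprel="nsubj"),
    SimpleNamespace(text="живу", upos="VERB", deprel="root"),
    SimpleNamespace(text="у", upos="ADP", deprel="case"),
    SimpleNamespace(text="Харкові", upos="PROPN", deprel="obl"),
]
demo_doc = SimpleNamespace(
    sentences=[SimpleNamespace(words=words)],
    ents=[SimpleNamespace(text="Харкові", type="LOC", start_char=9, end_char=16)],
)

demo_locations = location_entities(demo_doc)
demo_adpositions = case_adpositions(demo_doc)
```
</preformat>
      <p>With Stanza and its Ukrainian models installed, the document returned by calling the object from build_pipeline() on a text could be passed to the same two helpers in place of the stand-in.</p>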
      <p>The third stage uses a module to expand recognized named entities using a rule-based
algorithm. This stage is necessary for identifying dependent words that can clarify or complement
the keywords of named location entities. Clarifying words include adjectives, conjunctions, relative
nouns, and numerals. Additional location information is extracted by analyzing the relationships
between words. This is done using the Universal Dependencies framework, which describes the
relationships between words.</p>
      <sec id="sec-3-1">
        <title>3.1. A module for recognizing prepositions of place using a rule-based algorithm</title>
        <p>The Universal Dependencies framework describes and systematizes the annotation of grammars for
various natural languages [14, 15]. The framework is intended to describe parts of speech,
the morphological features of words, and the syntactic relationships between them [16-21]. An
analysis of the grammar of the Ukrainian language reveals that spatial prepositions are used in
most cases to indicate place and location in sentences. Some prepositions and examples of their use
are listed in Table 1.
        </p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Examples of Ukrainian spatial prepositions and their use</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Example in Ukrainian</th>
                <th>English translation</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>У місті, в Ужгороді, у саду, в театрі</td>
                <td>In the city, in Uzhhorod, in the garden, in the theater</td>
              </tr>
              <tr>
                <td>На дорозі, на столі</td>
                <td>On the road, on the table</td>
              </tr>
              <tr>
                <td>Від дороги, від школи</td>
                <td>From the road, from the school</td>
              </tr>
              <tr>
                <td>Під столом, під деревом</td>
                <td>Under the table, under the tree</td>
              </tr>
              <tr>
                <td>За будинком, за лікарнею</td>
                <td>Behind the house, behind the hospital</td>
              </tr>
              <tr>
                <td>Перед будинком, перед машиною</td>
                <td>In front of the house, in front of the car</td>
              </tr>
              <tr>
                <td>По вулиці, по воді</td>
                <td>On the street, on the water</td>
              </tr>
              <tr>
                <td>До школи, до зупинки</td>
                <td>To school, to the bus stop</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>When a phrase is formed with a preposition, a dependency arises between the words: the
preposition of place is the dependent word, and the noun or pronoun forming the phrase acts as
the head word. The Universal Dependencies framework annotates such dependencies as “Case”. A
“Case” dependency is used to indicate relationships between nouns and other parts of speech
(adjectives, numerals, prepositions). There are also subtypes of Case dependencies, one of which is
“Loc”. The “Loc” identifier marks relationships between words that indicate place or time.
For example, the sentence “In September I was in Greece” contains two “Loc”-type case
dependencies: “In September”, which denotes time, and “in Greece”, which denotes place.
Identifying such dependencies makes it possible to discover keywords that indicate
either place or time.</p>
        <p>“Loc”-type case dependency analysis can be used to develop a rule-based algorithm for
identifying locations in unstructured text. This approach makes it possible to recognize additional
named location entities that would otherwise be missed by machine-learning-based algorithms.</p>
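        <p>The rule described in this subsection can be sketched in Python as follows. The sketch assumes that the “Loc” marker corresponds to the Universal Dependencies morphological feature Case=Loc on the head noun and that the preposition is attached to it by the case relation; the Token class, the function names, and the hand-written annotation of the example sentence are hypothetical and serve only as an illustration. A real parser may attach individual words differently.</p>
        <preformat>
```python
from dataclasses import dataclass


@dataclass
class Token:
    """One UD-annotated word; ids are 1-based and head 0 marks the root."""

    id: int
    text: str
    upos: str
    head: int
    deprel: str
    feats: str = ""


def has_feature(token, name, value):
    """Check a UD feature string such as 'Case=Loc|Number=Sing'."""
    return f"{name}={value}" in token.feats.split("|")


def find_locative_phrases(sentence):
    """Return (preposition, head word) pairs for locative prepositional phrases.

    Rule: an adposition attached by `case` to a word whose Case feature is Loc.
    """
    by_id = {tok.id: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        if tok.upos != "ADP" or tok.deprel != "case":
            continue
        head = by_id.get(tok.head)
        if head is not None and has_feature(head, "Case", "Loc"):
            pairs.append((tok.text, head.text))
    return pairs


# "У вересні я був у Греції" ("In September I was in Greece"),
# annotated by hand for illustration only.
sentence = [
    Token(1, "У", "ADP", 2, "case"),
    Token(2, "вересні", "NOUN", 6, "obl", "Case=Loc|Gender=Masc|Number=Sing"),
    Token(3, "я", "PRON", 6, "nsubj", "Case=Nom|Number=Sing|Person=1"),
    Token(4, "був", "AUX", 6, "cop"),
    Token(5, "у", "ADP", 6, "case"),
    Token(6, "Греції", "PROPN", 0, "root", "Case=Loc|Gender=Fem|Number=Sing"),
]

locative_pairs = find_locative_phrases(sentence)
```
</preformat>
        <p>For the example sentence the rule returns both “У вересні” and “у Греції”, which mirrors the observation that such dependencies indicate either place or time; separating the two would need an additional check that is not shown here. Phrases whose noun stands in another case, such as the genitive after “до”, would likewise require an extra rule, for example a lexicon of spatial prepositions.</p>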
      </sec>
      <sec id="sec-3-2">
        <title>3.2. A module for expanding named entities using a rule-based algorithm</title>
        <p>The entity expansion algorithm uses recursive analysis of syntactic relationships to identify words
related to location words. It distinguishes between the expansion of head words by dependent
words and vice versa.</p>
        <p>For nominal relationships (nmod, amod, nummod, det), bidirectional expansion is used. That is,
when a head word is detected, all dependent words are included, and when a dependent word is
detected, the head word and its other dependent words are included. This helps accurately define
complex location descriptions while avoiding redundancy.</p>
        <p>For structural relationships (appos, conj), related elements are considered equivalent, and
expansion involves identifying dependent words for each equivalent component. For acl (adnominal
clause) relationships, the algorithm analyzes the structure of the dependent clause to capture
detailed descriptive content.</p>
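        <p>A minimal Python sketch of this recursive expansion is given below. It is an illustration under stated assumptions rather than the exact algorithm: tokens are represented as hypothetical (id, text, head, relation) tuples, an acl clause is simply kept whole, and the example phrase is annotated by hand.</p>
        <preformat>
```python
from collections import defaultdict

NOMINAL = {"nmod", "amod", "nummod", "det"}
STRUCTURAL = {"appos", "conj"}


def expand_entity(tokens, seed_ids):
    """Expand seed word ids into a location phrase.

    tokens: list of (id, text, head_id, deprel) with 1-based ids and head 0
    for the root. Returns the sorted ids of all words in the expanded phrase.
    """
    head_of = {tid: head for tid, _, head, _ in tokens}
    rel_of = {tid: rel for tid, _, _, rel in tokens}
    children = defaultdict(list)
    for tid, _, head, _ in tokens:
        children[head].append(tid)

    selected = set()

    def take_subtree(tid):
        # Assumption of this sketch: an acl clause is included as a whole.
        selected.add(tid)
        for child in children[tid]:
            take_subtree(child)

    def visit(tid):
        if tid in selected:
            return
        selected.add(tid)
        # Downward: include dependents attached by a tracked relation.
        for child in children[tid]:
            rel = rel_of[child]
            if rel in NOMINAL or rel in STRUCTURAL:
                visit(child)
            elif rel == "acl":
                take_subtree(child)
        # Upward: a tracked dependent brings in its head, whose own
        # dependents are then collected by the recursive call.
        tracked = rel_of[tid] in NOMINAL or rel_of[tid] in STRUCTURAL
        if tracked and head_of[tid] != 0:
            visit(head_of[tid])

    for seed in seed_ids:
        visit(seed)
    return sorted(selected)


# "у старому будинку біля клініки" ("in the old house near the clinic"),
# annotated by hand for illustration only.
phrase = [
    (1, "у", 3, "case"),
    (2, "старому", 3, "amod"),
    (3, "будинку", 0, "root"),
    (4, "біля", 5, "case"),
    (5, "клініки", 3, "nmod"),
]

from_dependent = expand_entity(phrase, [5])
from_head = expand_entity(phrase, [3])
expanded_words = [text for tid, text, _, _ in phrase if tid in from_dependent]
```
</preformat>
        <p>Starting from either the dependent word “клініки” or the head word “будинку” yields the same span, which illustrates the bidirectional behaviour for nominal relationships. In this sketch, words attached by the case relation are not pulled in, on the assumption that the preposition module reports them separately.</p>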
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset Construction</title>
        <p>The lang-uk corpus, comprising 262 texts, was used as the Ukrainian-language corpus
for training named entity recognition models. Its primary source is the open corpus of Ukrainian
texts, the Brown Corpus of the Ukrainian Language.</p>
        <p>Ukrainian fiction and a set of messages from Telegram news channels were used as additional
data for experiments on the effectiveness of location entity recognition. These data include
sentences specifying locations using complex literary expressions, as well as live messages in
Ukrainian from modern native speakers.</p>
        <p>The following text sources were used for analyzing location entities from fiction: the social and
everyday novella “Kaidasheva Family” by Ivan Nechuy-Levytsky, the autobiographical novella
“Ocharovannaya Desna” by Oleksandr Dovzhenko, and the novel “Do Oxen Roar as if the Manger
is Full?” by Panas Myrnyi.</p>
        <p>When forming the test data sample, 20% of the data was taken from fiction and 80% from news
channels. This distribution was chosen primarily because the channels contain texts in the format
in which system users might specify and describe locations in messages to third-party services. For
example, Telegram messages include texts such as “If anyone can, please check if anyone lives
on Solnechnaya Street and Molochnaya Street, 5 (opposite the Department Store)” or “Does anyone
know where Alex and his family are now? They lived on Nizhnyaya Street and were in the house
near the clinic.” These messages include both partial address information and additional details that
could help pinpoint a more precise location.</p>
        <p>To conduct the experiments, a test dataset in Ukrainian containing 200 records was created:
40 records were taken from fiction and 160 from Telegram news channels.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation Metrics</title>
        <p>Standard information retrieval metrics are used to evaluate the system's performance: precision,
recall, and F-score. These metrics are calculated at the level of individual words, rather than entire
named entities. This is necessary to account for complex, multi-word location descriptions, which are
the primary focus of this paper.</p>
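        <p>The Precision metric calculates the percentage of correctly identified entities among all
recognized entities (see Formula 1):</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[\mathrm{Precision} = \frac{TP}{TP + FP},]]></tex-math>
        </disp-formula>
        <p>where TP is the number of true positive results and FP is the number of false positive results.</p>
        <p>The Recall metric reflects the percentage of correctly recognized named entities among those
that should have been recognized (see Formula 2):</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math><![CDATA[\mathrm{Recall} = \frac{TP}{TP + FN},]]></tex-math>
        </disp-formula>
        <p>where TP is the number of true positive results and FN is the number of false negative results.</p>
        <p>The F-score is a balanced metric of the method's performance and is calculated as the harmonic
mean of precision and recall (see Formula 3):</p>
        <disp-formula>
          <label>(3)</label>
          <tex-math><![CDATA[\text{F-score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.]]></tex-math>
        </disp-formula>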
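        <p>The word-level evaluation can be illustrated with a short Python sketch. It assumes that gold and predicted annotations are given as sets of word positions labelled as part of a location, and it uses the standard harmonic-mean definition of the F-score, which reproduces the F-scores reported in this paper from the reported precision and recall. The function names and the toy data are hypothetical.</p>
        <preformat>
```python
def word_level_counts(gold, predicted):
    """Count true positives, false positives and false negatives.

    gold, predicted: sets of word positions labelled as part of a location.
    """
    tp = len(gold.intersection(predicted))
    fp = len(predicted.difference(gold))
    fn = len(gold.difference(predicted))
    return tp, fp, fn


def precision(tp, fp):
    """Share of predicted location words that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of gold location words that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Toy example: five gold location words, four predicted ones, three in common.
gold_words = {2, 3, 4, 5, 9}
predicted_words = {3, 4, 5, 6}

tp, fp, fn = word_level_counts(gold_words, predicted_words)
toy_precision = precision(tp, fp)
toy_recall = recall(tp, fn)
toy_f = f_score(toy_precision, toy_recall)

# Consistency check against the precision and recall reported for the
# hybrid system and for the machine learning module alone.
hybrid_f = round(f_score(0.8288, 0.7881), 4)
ml_only_f = round(f_score(0.9741, 0.2868), 4)
```
</preformat>
        <p>Counting at the level of individual words means that a partially recognized multi-word description still contributes true positives for the words it covers, instead of being counted as a complete miss.</p>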
      </sec>
      <sec id="sec-4-3">
        <title>4.3. System Configuration</title>
        <p>The experiment assesses the effectiveness of each system module used to solve the problem of
recognizing named location entities. The following system configurations are
proposed:</p>
        <list list-type="bullet">
          <list-item>
            <p>Using only the named entity recognition module based on machine learning (Configuration
1 – ML module).</p>
          </list-item>
          <list-item>
            <p>Using the named entity recognition module based on machine learning and a module for
expanding recognized named entities using a rule-based algorithm (Configuration 2 – ML
module + Extending module).</p>
          </list-item>
          <list-item>
            <p>Using a module for recognizing prepositions of place using a rule-based algorithm and a
module for expanding recognized named entities using a rule-based algorithm
(Configuration 3 – Rule-based module + Extending module).</p>
          </list-item>
          <list-item>
            <p>Using the named entity recognition module based on machine learning, a module for
recognizing prepositions of place using a rule-based algorithm, and a module for
expanding recognized named entities using a rule-based algorithm (Configuration 4 – ML
module + Rule-based module + Extending module).</p>
          </list-item>
        </list>
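        <p>The four configurations differ only in which modules are switched on, which can be sketched in Python as a small table of flags. The sketch below is illustrative: the function and variable names are hypothetical, and the three toy modules are simple stand-ins that do not reproduce the behaviour of the actual modules.</p>
        <preformat>
```python
# Which modules are active in each configuration.
CONFIGURATIONS = {
    1: {"ml": True, "rules": False, "expand": False},
    2: {"ml": True, "rules": False, "expand": True},
    3: {"ml": False, "rules": True, "expand": True},
    4: {"ml": True, "rules": True, "expand": True},
}


def run_configuration(number, tokens, ml_module, rule_module, expand_module):
    """Combine the enabled modules and return sorted word positions."""
    flags = CONFIGURATIONS[number]
    seeds = set()
    if flags["ml"]:
        seeds |= set(ml_module(tokens))
    if flags["rules"]:
        seeds |= set(rule_module(tokens))
    if flags["expand"]:
        seeds = set(expand_module(tokens, seeds))
    return sorted(seeds)


# Toy stand-ins for the three modules (not the modules evaluated here).
def toy_ml(tokens):
    # Pretend the statistical model found only the last word.
    return {len(tokens) - 1}


def toy_rules(tokens):
    # Pretend the rule module found the two prepositions.
    return {i for i, tok in enumerate(tokens) if tok in {"у", "біля"}}


def toy_expand(tokens, seeds):
    # Pretend expansion adds the word immediately before each seed.
    return set(seeds) | {i - 1 for i in seeds if i >= 1}


tokens = ["у", "старому", "будинку", "біля", "клініки"]
results = {
    number: run_configuration(number, tokens, toy_ml, toy_rules, toy_expand)
    for number in CONFIGURATIONS
}
```
</preformat>
        <p>Keeping the module choice in a table of flags lets the same evaluation code run over all four configurations, so that differences in the scores can be attributed to the modules rather than to the surrounding pipeline.</p>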
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <title>5.1. Overall Performance</title>
        <p>Although the pure machine learning module (Configuration 1) shows very high precision (97.41%),
its recall is low (28.68%), resulting in a mediocre F-score (44.31%). This suggests that this
method is good at identifying frequently occurring, standard place-name descriptors but misses
non-standard location descriptions.</p>
        <p>Combining the machine learning module with the module for extracting additional words
using a rule-based algorithm (Configuration 2) increases recall to 54.38% while
maintaining acceptable precision (81.46%). As a result, the F-score increases by 20.91 percentage
points to 65.22%. This demonstrates that syntactic relationship analysis helps identify complete
named location entities, rather than only individual words.</p>
        <p>The combination of rule-based modules for detecting location prepositions and expanding them
(Configuration 3) performs better than Configuration 2, with an F-score of 71.19%. Thus, using
linguistic rules to identify spatial prepositions improves the retrieval of named location entities
compared to the machine learning-based configurations.</p>
        <p>The hybrid system (Configuration 4) demonstrates the best results, achieving an F-score of
80.79%, and allows for the most accurate detection of named location entity boundaries in text.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Domain-Specific Analysis</title>
        <p>Tables 3 and 4 present a performance analysis showing significant differences depending on the
type of text: literary, news, or social media content.</p>
        <p>The machine learning component performs better on news texts (F-score 52.01%) than on
literary texts (F-score 6.89%). The hybrid system also shows significantly better results on news and
social media texts (F-score 85.16%) than on literary texts (F-score 59.37%). This is because the lang-uk
training corpus consists primarily of news and web texts, which match the characteristics of
the target texts.</p>
        <p>Machine learning methods, even with the recognized named entity extension module enabled,
face challenges when working with literary texts, resulting in a significant performance drop in all
configurations. Specifically, the machine learning component is ineffective on literary material,
achieving minimal recall (3.57% and 5.95%) despite high precision. This suggests that the training
set lacks literary texts, which limits the applicability of machine learning methods and calls for
a specialized dataset for a specific application. Approaches based on linguistic rules perform
better on literary content, achieving an F-score of 56.00% compared to 11.23% for Configuration 2.
This indicates that spatial prepositions and syntactic structures remain consistent
across different text types, whereas statistical patterns learned from news texts do not
generalize to literary language.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Module Usage Analysis</title>
        <p>An analysis of the contributions of the individual components reveals different trends in the
precision-recall tradeoff, as well as complementary strengths. The machine learning component is
better at identifying common place names and well-represented feature types, but struggles with
morphological inflections and descriptive place names.</p>
        <p>The rule-based components demonstrate more consistent results in precision and recall by
consistently identifying spatial prepositions and associated noun phrases. This approach allows for
the detection of place descriptions that lack explicit names but contain location markers.</p>
        <p>Identifying syntactically related words consistently improves results for all major
configurations, especially for machine learning. The increase in F-score of 20.91 percentage points
from the machine learning component alone to the machine learning component combined with the
related-word extraction module demonstrates the importance of syntactic expansion for fully
recognizing named location entities, which may include descriptive elements.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <sec id="sec-6-1">
        <title>6.1. Effectiveness of Hybrid Architecture</title>
        <p>Analysis shows that different parts of the system have distinct characteristics in terms of precision
and recall, as well as how they complement each other. Machine learning is good at recognizing
common place names and known object types, but struggles with word inflections and descriptive
names.</p>
        <p>The parts of the system based on linguistic rules perform more consistently in terms of
precision and recall. When recognizing named location entities in unstructured text, identifying
spatial prepositions and related word groups is an effective solution. Algorithms based on linguistic
text analysis can accurately identify locations that are not explicitly named but contain location
references.</p>
        <p>Analyzing the dependencies of recognized keywords in named location entities consistently
improves system performance, especially for machine learning. The improvement in F-score of 20.91
percentage points when the module for expanding named location entities with dependent qualifying or
descriptive words is added to machine learning suggests that syntactic analysis is essential for
accurately identifying place names.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Domain Transfer Challenges</title>
        <p>The significant difference in the performance of the hybrid model between news and literary texts
(F-score 85.16% versus 59.37%) indicates that there is still potential for developing systems for
recognizing named entities in Ukrainian text in specific domains.</p>
        <p>Literary texts include archaic place names, poetic descriptions, and narrative language
structures, which differ markedly from the modern news content used to train the model. The
complete failure of machine learning methods on literary material (F-score 6.89%) suggests the
need to retrain the neural network model using a more balanced dataset. This means that robust
Ukrainian NER systems require diverse training materials, including different text types and
historical language variants.</p>
        <p>Recognition methods based on linguistic rules demonstrate good transferability across domains,
maintaining acceptable performance on literary material (F-score of 56.00%) compared to their
results on news text (F-score of 74.63%). This stability reflects the robustness of syntactic structures
and the use of prepositions of place across different text types and historical periods.</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Practical Application</title>
        <sec id="sec-6-3-1">
          <title>The proposed system can be implemented in the following domains:</title>
          <p>• Food and service delivery services;
• Emergency services, including fire departments, medical services, police services, etc.;
• Location analysis in news and chats;
• Taxi service systems.</p>
          <p>Rapid response systems can benefit from an 80.79% F-score when extracting location data
from user reports and emergency messages.</p>
          <p>Delivery services can use this system to extract addresses and identify geographic landmarks in
customer correspondence. The balance between precision and recall (82.88% and 78.81%,
respectively) enables automated processing of unstructured text with the option of human review.</p>
          <p>News analysis and social media monitoring can leverage the system's high performance on modern
text (F-score 85.16%) to determine the geographic location of events and analyze location-based
content. The proposed scheme for named location entity recognition can process complex location
descriptions written in Ukrainian, which supports more comprehensive extraction of geographic
information.</p>
        </sec>
      </sec>
      <sec id="sec-6-4">
        <title>6.4. Limitations and Future Work</title>
        <p>The current system has several limitations that point to avenues for future research. Problems with
transfer across domains indicate the need for more training data, including literary works,
historical texts, and specialized materials.</p>
        <p>Rule-based methods for recognizing named location entities rely on knowledge of a specific
language, so knowledge transfer across languages is an interesting avenue for research. The rules
partially described in this paper can be migrated to other languages, but each rule must be
re-examined against the grammar of the target language. Making the model applicable to texts in
other languages would therefore require a separate module that automatically generates
morphologically correct forms and rules for the target language. At the same time, the overall
recognition scheme remains applicable to other languages, including Slavic ones, because it builds
on the standardized morphological and syntactic annotation provided by the Universal Dependencies
framework.</p>
        <p>Furthermore, when recognizing named location entities in Ukrainian text, the set of rules can be
expanded based on the specific features of this language, thereby improving the module's ability to
recognize named location entities using rule-based algorithms.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>This paper proposed a hybrid model for recognizing named location entities in unstructured
Ukrainian text. The model consists of three main components. The first component processes the
input text, extracts information about the words in each sentence and the relationships between
them, and recognizes named location entities using a trained neural network model; it is
implemented with the Stanza library. The second component, introduced to improve recognition
quality, identifies keywords of named location entities with rule-based algorithms that operate on
the morphological analysis of the text. This module is independent of the dataset configuration
and relies solely on the specifics of a particular natural language. The third component expands
named location entities by analyzing their dependent words, so that locations described in the
text are captured more completely.</p>
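      <p>The second and third components can be illustrated with a minimal sketch. It is not the authors' rule set: the preposition list, the set of dependency relations and all function names are assumptions chosen for illustration. The tokens are hand-written dictionaries that mirror Universal Dependencies fields; in practice such fields could be read from the words produced by a parser such as Stanza (id, text, lemma, upos, head, deprel).</p>
      <preformat>
```python
# Hypothetical, deliberately tiny list of Ukrainian prepositions of place.
PLACE_PREPOSITIONS = {"біля", "коло", "навпроти", "перед", "поблизу"}

# UD relations followed when expanding a location keyword (an assumed selection).
EXPANSION_RELATIONS = {"case", "amod", "nmod", "nummod", "flat", "flat:name", "appos"}


def find_location_heads(tokens):
    """Return ids of words governed by a preposition of place via the UD 'case' relation."""
    return [
        tok["head"]
        for tok in tokens
        if tok["upos"] == "ADP"
        and tok["deprel"] == "case"
        and tok["lemma"] in PLACE_PREPOSITIONS
    ]


def expand(tokens, head_id):
    """Join the head word with its transitive dependents over EXPANSION_RELATIONS."""
    selected = {head_id}
    changed = True
    while changed:
        changed = False
        for tok in tokens:
            if (
                tok["id"] not in selected
                and tok["head"] in selected
                and tok["deprel"] in EXPANSION_RELATIONS
            ):
                selected.add(tok["id"])
                changed = True
    # Keep the original word order when rebuilding the phrase.
    return " ".join(tok["text"] for tok in tokens if tok["id"] in selected)


def extract_location_phrases(tokens):
    """Apply the keyword rule, then the expansion rule, to one parsed sentence."""
    return [expand(tokens, head_id) for head_id in find_location_heads(tokens)]


# Hand-written parse of "Чекаю біля старої аптеки" ("I am waiting near the old pharmacy").
SENTENCE = [
    {"id": 1, "text": "Чекаю", "lemma": "чекати", "upos": "VERB", "head": 0, "deprel": "root"},
    {"id": 2, "text": "біля", "lemma": "біля", "upos": "ADP", "head": 4, "deprel": "case"},
    {"id": 3, "text": "старої", "lemma": "старий", "upos": "ADJ", "head": 4, "deprel": "amod"},
    {"id": 4, "text": "аптеки", "lemma": "аптека", "upos": "NOUN", "head": 1, "deprel": "obl"},
]

print(extract_location_phrases(SENTENCE))  # ['біля старої аптеки']
```
      </preformat>
      <p>In this sketch the location phrase is found without any named entity label, which is the kind of descriptive location cue that a purely neural tagger may miss.</p>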
      <p>This study demonstrates that combining different named entity recognition methods is well
suited to identifying place names in unstructured Ukrainian text. An F-score of 80.79% was achieved
by combining machine learning-based and rule-based algorithms for recognizing location names.
Experimental studies revealed that the different approaches complement each other. To obtain
complete information about the location indicators in a text, it is important to analyze the
morphological relationships between words, which supports accurate identification of location
names. The significant gain from the combined approach (the F-score increased by 36.48 percentage
points compared to machine learning alone) demonstrates that the proposed framework effectively
identifies and extracts location information. Based on this, the proposed model can be recommended
for use in software systems developed for emergency response services, logistics, and data
analysis.</p>
      <p>The study revealed challenges in recognizing named location entities in texts of varying styles.
This suggests the need to create more diverse and comprehensive Ukrainian-language datasets for
training neural networks when developing systems focused on named entity recognition.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The review of related work was conducted by Kirill Smelyakov, Kateryna Liulina and Victoria
Vysotska. The hybrid scheme for location entity recognition in Ukrainian texts was proposed and
implemented by Kateryna Liulina, Kirill Smelyakov and Anastasiya Chupryna. The experimental
plan was prepared by Victoria Vysotska and Anastasiya Chupryna. The experiments were
conducted and analyzed by Kateryna Liulina, Kirill Smelyakov, Anastasiya Chupryna and Victoria
Vysotska.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <sec id="sec-9-1">
        <title>The authors have not employed any Generative AI tools.</title>
        <p>[10] X. Hu, Z. Zhou, H. Li, Y. Hu, F. Gu, J. Kersten, H. Fan, F. Klan, Location reference recognition
from texts: a survey and comparison, ACM Comput. Surveys 56(5) (2023) Article 112, 1–37.
doi:10.1145/3625819.
[11] World Intellectual Property Organization, Efficient and accurate named entity recognition
method and apparatus, WO2020118741, 2020. Available at:
https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2020118741.
[12] Stanza. Named entity recognition, n.d. Available at:
https://stanfordnlp.github.io/stanza/ner.html
[13] Lang-uk. NER annotation of the Ukrainian corpus, n.d. Available at:
https://github.com/languk/ner-uk.
[14] Universal Dependencies. Universal Dependencies project, n.d. Available at:
https://universaldependencies.org/
[15] M. de Marneffe, C.D. Manning, J. Nivre, D. Zeman, Universal Dependencies, Computational</p>
        <p>Linguistics 47(2) (2021) 255–308. doi:10.1162/coli_a_00402.
[16] V. Vysotska, Computer linguistic system architecture for Ukrainian language content
processing based on machine learning, in: CEUR Workshop Proceedings, vol. 3723, 2024, pp.
133–181.
[17] V. Vysotska, Computer linguistic system modelling for Ukrainian language processing, in:</p>
        <p>CEUR Workshop Proceedings, vol. 3722, 2024, pp. 288–342.
[18] V. Vysotska, Computer linguistic systems design and development features for Ukrainian
language content processing, in: CEUR Workshop Proceedings, vol. 3688, 2024, pp. 229–271.
[19] V. Vysotska, Linguistic intellectual analysis methods for Ukrainian textual content processing,
in: CEUR Workshop Proceedings, vol. 3722, 2024, pp. 490–552.
[20] V. Vysotska, K. Przystupa, Y. Kulikov, S. Chyrun, Y. Ushenko, Z. Hu, D. Uhryn, Recognizing
fakes, propaganda and disinformation in Ukrainian content based on NLP and
machinelearning technology, Int. J. Comput. Netw. Inf. Secur. 17(1) (2025) 92–127.
[21] D. Levkivskyi, V. Vysotska, L. Chyrun, Y. Ushenko, D. Uhryn, C. Hu, Agile methodology of
information engineering for semantic annotations categorization and creation in scientific
articles based on NLP and machine learning methods, Int. J. Inf. Eng. Electron. Business 17(2)
(2025) 1–50. doi:10.5815/ijieeb.2025.02.01.</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>