<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Conference on Informatics and Applied Mathematics, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A NLP and Rule-Based Approach to Extract Spatial Entities and Relationships in Arabic Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Atmane HADJI</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Khireddine Kholladi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Farid Boumaza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Mohamed El Bachir El Ibrahimi</institution>
          ,
          <addr-line>Bordj Bou Arreridj 34030</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Mathematics and Computer Science, Faculty of Exact Sciences , University of El-Oued , , HAMMA Lakhdar</institution>
          ,
          <addr-line>El-Oued</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LAPECI Laboratory , University of Oran1</institution>
          ,
          <addr-line>Oran 31000</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LISI Laboratory, Computer Science Department, University Center A. Boussouf Mila</institution>
          ,
          <addr-line>43000 Mila</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>MISC Laboratory of Abdelhamid Mehri university of Constantine 2</institution>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Bejaia, Faculty of exact sciences, Department of Computer Science</institution>
          ,
          <addr-line>06000 Bejaia</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>0</volume>
      <fpage>4</fpage>
      <lpage>05</lpage>
      <abstract>
        <p />
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Information extraction</kwd>
        <kwd>spatial information</kwd>
        <kwd>NLP Arabic</kwd>
        <kwd>rules JAPE</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, the massive growth of digital information, particularly on the Internet and within Big
Data, has highlighted the need to develop eficient information processing systems. This exponential data
increase, especially in georeferenced information, has amplified the challenge of information overload,
making quick and accurate access to relevant data increasingly crucial, especially in specialized fields
like geographic information systems (GIS). In this context, spatial information extraction from raw
texts has become a vital area of research, encompassing disciplines such as natural language processing
(NLP), information extraction (IE), information retrieval (IR), and GIS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Spatial information extraction, especially in the Arabic language, ofers significant advantages across
various sectors. It enriches geospatial databases, enhances the accuracy of geographic information
systems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and optimizes location-based services (LBS) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It also supports decision-making in critical
ifelds such as urban planning, natural resource management, and disaster response. The extraction
process involves transforming unstructured textual data into structured information, thereby identifying
geospatial entities, relationships, semantic roles, and events for deeper analysis.
      </p>
      <p>However, despite these potential benefits, spatial information extraction from Arabic texts remains
a major challenge. Due to the morphological richness of the language and its semantic ambiguities,
traditional information extraction methods, whether based on statistical techniques or machine learning,
often fall short in addressing these challenges. The Arabic language presents linguistic and grammatical
complexities that complicate the identification of georeferenced information, making integration into
GIS systems even more challenging. This underscores the importance of developing advanced
techniques that can efectively handle these linguistic specifics and overcome the limitations of traditional
approaches.</p>
      <p>Our approach leverages the complementary strengths of NLP to handle the linguistic intricacies of
Arabic while addressing the growing needs of GIS users in the Arab world.</p>
      <p>In this work, we aimed to address the challenges posed by spatial information extraction in
Arabiclanguage texts, an underexplored field in GIS contexts. To this end, we developed innovative solutions
based on NLP techniques and JAPE rules. The objective is to overcome the limitations of traditional
approaches to structure geospatial knowledge and facilitate the indexing and extraction of spatial
entities and their relationships.</p>
      <p>In the first section, we introduce our new approach based on JAPE rules. Section 2 provides a review
of related works on information extraction systems in various domains. In Section 3, we present the
proposed approach along with the system architecture, detailing the components involved. Section 4
focuses on the application and implementation of our approach. Finally, in Section 5, we discuss the
results obtained with our method and perform a comparative evaluation with other approaches.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        Rule-based methods have proven their efectiveness in various areas of information extraction, not
least thanks to their ability to capture specific relationships by applying defined syntactic and semantic
rules. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] reported a method for extracting and combining spatial and temporal information from
Arabic texts that enhances search and exploration capabilities using the GATE (General Architecture for
Text Engineering) architecture. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] Introduced “drNER”, a novel rule-based Named Entity Recognition
(NER) method designed to extract dietary concepts, and this approach showed significant results for
the extraction of evidence-based dietary recommendations.
      </p>
      <p>
        In the bibliographic domain, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] applied a rule-based information extraction process to bibliographic
data, aiming to establish a database of relevant concepts, refine the retrieved data and automate the
local retrieval process. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] developed a system combining information extraction and ontology creation
to facilitate the extraction and visualization of clinical information.
      </p>
      <p>
        Furthermore, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] addressed the challenge of automatic information structure extraction from PDF
books, proposing an intelligent rule-based approach to accurately extract logical metadata from these
documents widely used on the semantic web. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] presented the VALET (Very Agile Language Extraction
Toolkit) framework, a rule-based information extraction system that combines lexical, orthographic,
syntactic and corpus-analytic information in a flexible syntax.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposed an approach integrating automatic natural language processing (ANLP) techniques,
rules and gazetteers to extract spatial entities and their relationships from texts, ofering a viable
solution for enriching GIS with accurate spatial information. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], demonstrated the efectiveness of
a rule-based approach for extracting spatial relationships from annotated corpora, particularly for
simple directional relationships. [12] also proposed a system that automatically generates extraction
rules from complex Chinese literal features. [13] demonstrated how cross-linguistic alignment based
on specific grammatical rules can enrich Open IE datasets for under-represented languages such as
Brazilian Portuguese. Finally, [14] illustrated how AIS (Automatic Identification System) data from
ifshing vessels can be exploited to extract precise spatial information, aimed at improving marine
resource management.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed JAPE rule-based method</title>
      <p>The rule-based method is a classic and widely used approach in the field of information extraction. This
method relies on a set of predefined rules that are designed to identify and extract specific information
from text or other types of data. These rules are usually expressed in the form of models or patterns
that correspond to specific linguistic structures or patterns in the data.</p>
      <p>The general architecture of the proposed approach Figure 1 consists of four distinct phases.</p>
      <sec id="sec-3-1">
        <title>3.1. Creation of JAPE rules</title>
        <p>In the first phase, concepts related to Arab entities and spatial relationships are identified and collected.
These concepts are then used to formulate specific JAPE rules [ 15]. which are used to annotate and
extract relevant spatial information from Arabic texts. JAPE rules are advanced regular expressions
developed in Java, enabling the detection of complex patterns in text.</p>
        <p>JAPE rules ofer significant flexibility in natural language processing, particularly for extracting
information from unstructured text. Their main strength lies in the ease of adding or modifying new
rules. It is straightforward to integrate new words or expressions into an existing system without
disrupting the functionality of previously defined rules. This ability to quickly update the rules based
on domain evolution or analysis needs makes JAPE a particularly adaptable and eficient tool for tasks
such as entity recognition and contextual information extraction.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Text processing</title>
        <p>The second phase consists of applying natural language processing modules to prepare the raw text.
This process includes steps such as normalization, tokenization and annotation of the spatial entities
present in the text. These modules are crucial to ensuring that JAPE (Java Annotation Patterns Engine)
rules can be applied eficiently and accurately.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Combination and Extraction</title>
        <p>The third phase is based on the application of the JAPE rules created in the first phase. These rules are
used to associate text segments with defined classes, subclasses or instances. This phase is essential for
automatically extracting structured spatial information from unstructured text, taking into account the
linguistic and contextual specificities of the Arabic language.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Disambiguation and classification</title>
        <p>The fourth phase focuses on disambiguation and classification of the extracted spatial entities. This
step ensures that each entity and relationship is correctly interpreted in its specific context. JAPE rules
are also used here to refine the results, applying disambiguation and classification criteria to improve
the accuracy of the extracted data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Application and realization</title>
      <sec id="sec-4-1">
        <title>4.1. Implementation phase</title>
        <p>Our JAPE rule-based system architecture consists of two main phases, each playing a crucial role in
the extraction of geographic information from natural language text (Figure 2). The first phase uses
advanced Natural Language Processing (NLP) techniques to prepare and normalize text data. This
preparation includes text cleaning, sentence segmentation and initial annotation of linguistic elements,
facilitating better rule application.</p>
        <p>The second phase focuses on matching JAPE rules to extract specific information. This phase involves
the definition and creation of rules, the matching of these rules with the text, disambiguation and the
extraction of relevant information. Finally, post-processing is carried out to filter and structure the
extracted data, making it ready for further analysis or integration into geospatial databases. Together,
these phases ensure accurate and eficient information extraction, tailored to the needs of geographic
analysis.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Application environment</title>
        <p>We have chosen to use the GATE environment, a linguistic engineering framework developed by the
University of Shefield and widely adopted since its first release in 1996 for teaching and research. GATE
ofers a suite of reusable processing resources in JAVA, integrated into an information extraction system
called ANNIE (aNearly-New Information Extraction System) [16].</p>
        <p>By default, ANNIE is configured for languages other than Arabic. To adapt this tool to our target
language, we will use specialized components such as the Arabic tokenizer, sentence splitter, POS
tagger and Arabic morphological analyzer. To avoid interference with previous executions, we’ll apply
the “reset” option to remove all traces of previous processes. Annotations in GATE will be performed
by selecting words in the text and creating new annotation categories, enabling precise extraction of
geographical features and other relevant information.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Phase One: NLP techniques</title>
        <p>
          The first phase of our architecture implements NLP techniques that are essential for processing and
understanding natural language text. The aim of this phase is to prepare the textual data in such a
way as to facilitate the extraction of relevant information, bearing in mind that we have used the same
dataset or corpus discussed in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <sec id="sec-4-3-1">
          <title>4.3.1. Linguistic pre-processing</title>
          <p>The cleaning of Arabic text is an essential step before applying Natural Language Processing (NLP)
techniques. Here are the main steps specific to the cleaning of Arabic texts:
• Removal of diacritical characters (Tashkeel): Arabic texts may contain diacritics (harakats)
such as Fatha, Damma, Kasra and so on. These diacritics can be removed, as they are often
unnecessary for analysis;
• Removal of special characters and punctuation: As in other languages, special characters
(such as !, @, #, etc.) and punctuation can be removed to simplify the text;
• Character standardization: In Arabic, some characters can be written in more than one way.</p>
          <p>For example, @, @, and @ are often normalized to @. Similarly, ø can be transformed into ø.
• Removing superfluous spaces: Arabic texts may contain multiple spaces or spaces before or
after punctuation. These spaces need to be normalized to ensure correct analysis.
These cleaning steps are crucial to obtaining accurate and relevant results when analyzing Arabic text
using Automatic Natural Language Processing (ANLP) techniques. Properly cleaned text reduces noise
and errors, enabling analysis algorithms to better understand the complex linguistic structures of Arabic,
improve the accuracy of results and ensure better interpretation of textual data. Thus, these cleaning
practices are essential for any NLP application, be it named entity recognition, text classification or
machine translation.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. Application of TALN techniques</title>
          <p>TALN techniques such as Document Reset, Arabic Tokeniser, Sentence Splitter, Post Tagging and
Morphological Analyser were explained in detail in the next sections. In this study, we will focus on
the practical application of these techniques using the GATE platform [17]. This in-depth exploration
is intended to provide a better understanding of GATE and to serve as a practical guide to its use,
particularly in the context of Arabic text. Given that online documentation is relatively limited, this
chapter plays a vital role in filling this gap and ofering clear instructions for taking full advantage of
GATE’s features.</p>
        </sec>
        <sec id="sec-4-3-3">
          <title>4.3.3. Sentence Splitter</title>
        </sec>
        <sec id="sec-4-3-4">
          <title>4.3.4. Tokenization</title>
          <p>Figure 4 shows a screenshot of the GATE platform, illustrating the tokenization process. It shows how
GATE segments text into basic units (tokens) for further linguistic analysis.</p>
        </sec>
        <sec id="sec-4-3-5">
          <title>4.3.5. Post Tagging and Morphological Analyser</title>
          <p>Morpho-syntactic tagging (Post Tagging) and morphological analysis (Morphological Analyser) are
essential processes in automatic natural language processing, particularly when integrated into advanced
systems such as GATE (General Architecture for Text Engineering) [18]. In this context, these steps
are exploited by JAPE (Java Annotation Patterns Engine) rules, which enable sophisticated annotation
patterns to be defined for detecting specific linguistic structures within a corpus. When JAPE rules are
executed, the annotations generated by Post Tagging and morphological analysis enrich the corpus by
adding detailed metadata on grammatical categories and word morphological structure. Although these
annotations do not afect the visible display of the text, they play a crucial role in providing invisible
but fundamental information for subsequent linguistic analysis and accurate information extraction.</p>
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Second phase: Application of JAPE rules</title>
        <p>The second phase focuses on the application of JAPE rules for the extraction of specific spatial
information. This phase follows the well-defined steps between rules and spatial information, i.e. the
classification of spatial entities and relationships according to the following table</p>
        <p>The Table 1 below presents a classification of spatial entities, including natural entities, non-natural
entities and entities corresponding to place names or locations.</p>
        <p>The Table 2 presents a classification of spatial relationships, detailing the following categories:
topological relationships, directional relationships, distance relationships and orientation relationships.
These categories help us understand how spatial entities position, orient and relate to each other in a
given space. Topological relations describe relationships of contiguity or inclusion, directional relations
indicate relative orientations, distance relations measure distances between entities, and orientation</p>
        <sec id="sec-4-4-1">
          <title>Spatial Entities Table 2</title>
          <p>, éÊK , Z@Qm  , QîE ,Qm '. , éK. A« , ù£A , éJ. ë , X@ð , ÉJ. k.
èQ m '.
, Ñª¢Ó , ñ , éªÓAg. , éJ. JºÓ , ù® Ó , éPYÓ , Yj. Ó
, YJ. ªÓ , é J» ,Qåk. , YJ¯ , PA¢Ó , é¢m× , ©JÓ , úaeê®Ó
P@X , I. ªÊÓ</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>Instances</title>
          <p>úÎ« ; øñJÓ úÎ« ; úÎ« ; ú¯ ; Ég@X ; ¡ð ; á K. ; © . ; ZQk. ; ªK.
ÐAÓ@ ; ­Êg ; H. Q¯ ; Èñk ; I. KAm. '. ; é¯Ag úÎ« ;Pñm×
; Qå H. ñ J k. ; H. Q « ÈA ÖÞ ; Qå ÈA ÖÞ ; H. Q « ; Qå ; H. ñ J k. ; ÈA ÖÞ
ñm ' ; úaek ; ÈCg áÓ ;Q. « ; Y¯ ; H. ñ ; èAm. 'AK. ;ñm ' ; H. Q« H. ñJk.
; É®B@ ñm ' ; úÎ«B@
YJªK. ; áÓ I. KQ¯ ; AJ¯ ; H. Q¯ úÎ« ;ñKX ; H. Q¯ ; YªJ. K ; YªK. úÎ« ; é¯AÓ
úÎ« ; èQ ¯ é¯AÓ úÎ« ; AJ .  I. KQ¯ ; AJ .  YJªK. ; é¯AÓ úÎ« ; á«
...... éÊKñ£ é¯AÓ
úÎ« ; øñJÓ úÎ« ; úÎ« ; ú¯ ; Ég@X ; ¡ð ; á K. ; © . ; ZQk. ; ªK.
ÐAÓ@ ; ­Êg ; H. Q¯ ; Èñk ; I. KAm. '. ; é¯Ag úÎ« ;Pñm×
relations specify alignments or angles between them.</p>
        </sec>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Creation of JAPE Rules</title>
        <p>A JAPE grammar consists of a set of phases, each containing a series of pattern/action [15]. The phases
execute sequentially, forming a cascade of finite-state transducers on the annotations. The left-hand
side (LHS) of the rules comprises an annotation pattern description, while the right-hand side (RHS)
contains instructions for manipulating the annotations. Matching annotations on the LHS of a rule can
be referenced on the RHS using labels attached to pattern elements. Below is an example of a JAPE
rule (Figure 5) for extracting named entities from the "Non-Natural Object" class containing specified</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Rule Matching</title>
        <p>instances: ÉJ. k. (mountain), éJ. ë (plateau), ù£A (beach), éK. A« (forest), X@ð (valley), Qm '. (sea), Z@Qm 
The defined rules (Figure 5) are applied to the text or data to identify segments that match the specified
patterns. A rule could be designed to identify geographic entities by searching for phrases containing
keywords such as "region," "city," or "country." In our method, the option control = appelt is used to
specify that the rules should be executed sequentially. This ensures that each rule is applied in a precise
order, thereby maximizing the accuracy of the extraction.</p>
      </sec>
      <sec id="sec-4-7">
        <title>4.7. Information Extraction</title>
        <p>When matching segments are identified, relevant information is extracted (Figure 6). This may include
capturing specific words or phrases or identifying relationships between diferent entities. For example,
the rule "Extract Natural Object" in our JAPE script checks for text matching one of the specified
instances in a list of natural objects, such as é K. A « (forest), X@ð (valley), or Q m '. (sea). When a token
matches a word in the text, a named entity "Natural Object" is created, and specific attributes, such as
class = "Natural Object" and type (containing the character string of the identified entity), are associated
with this entity.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Evaluation</title>
      <p>In this section, we examine the results of experiments aimed at evaluating the efectiveness of our
spatial information extraction system, which uses JAPE rules. This method is based on applying these
rules to automatically extract spatial entities and relationships from a corpus of Arabic texts.</p>
      <p>To evaluate and compare the methods we studied, we will use metrics: Precision, Recall, and F-scale.
Precision refers to the correctness of the retrieval, while recall refers to the completeness of the retrieval.
The F-measure provides the harmonic mean between precision and recall [17].</p>
      <p>According to [18]:
• Precision is the fraction of the valid annotations over the total number of identified annotations.</p>
      <p>It is formally defined as:
  = ()/( + )
(1)
• Recall is the fraction of the valid annotations over the total amount of annotations. It is formally
defined as:
 = /( +  )
(2)
• F-measure is defined as the harmonic mean of two factors, precision and recall. It is formally as:
 −  = (2 *   * )/((  + ))
(3)</p>
      <p>The following Tables present the evaluation results for the extraction of information related to natural
disasters. The corpus used for this evaluation is the same as described in the study (Hadji et al., 2024),
comprising a total of 9,008 words extracted from four diferent Algerian newspapers. This corpus was
annotated to identify 908 spatial information elements, of which 611 are spatial entities and 197 are
spatial relationships. The results obtained (Table 3) show the distribution of annotations across the
diferent newspapers and indicate that spatial entities make up the majority of annotations, representing
68.4% of the total, while spatial relationships account for 31.6%.</p>
      <p>The results (Table 4) show that the most frequently annotated spatial entities are non-natural objects
(265) and places (301), while natural objects are less represented. This may reflect a focus on entities
deemed more relevant in the contexts of the newspapers studied. Regarding spatial relationships,
directional relationships are the most commonly annotated (88), followed by orientation relationships
(58). Topological and distance relationships are much less frequent, which could indicate that they are
considered less important or less complex in the analyzed corpora.</p>
      <p>The following Table 5 shows the number of correct, incorrect, and missing annotations for Algerian
newspapers. This data provides an assessment of the accuracy of spatial annotations performed on
press articles, ofering an overview of the quality of the results obtained during the extraction process.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Analysis and Discussion</title>
      <p>In evaluating the efectiveness of our proposed JAPE rule-based approach for extracting spatial
information from Arabic texts, we compared our results with those of various other methodologies, including
rule-based and hybrid approaches. The following table summarizes the precision, recall, and F-measure
of each method.</p>
      <sec id="sec-6-1">
        <title>6.1. Analysis</title>
        <p>
          • Precision: Our approach demonstrates a high precision of 0.90, indicating that 90% of the
spatial information extracted is relevant and correct. This is a significant advantage, especially
in applications where accuracy is paramount, such as in disaster response scenarios or when
processing sensitive geographic data. In comparison, the rule-based methods [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and [19] show
lower precision values of 0.80 and 0.85, respectively. This discrepancy suggests that while these
methods can recall a broader range of information, they may also include a higher number of
false positives.
• Recall: In terms of recall, our approach achieves a score of 0.85, which indicates a solid ability to
capture a significant proportion of the actual relevant spatial information present in the texts.
Although this recall rate is lower than that of [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] (0.91), it remains competitive, especially when
considering that higher recall often comes at the cost of lower precision. The approaches based
on rules [19] and the hybrid method exhibit similar performance levels in recall, with scores of
0.88 and 0.95, respectively. This suggests that while our method may miss some relevant entities
compared to the others, it does so while maintaining a high level of accuracy.
• F-measure: The F-measure, which balances precision and recall, is another critical metric for
assessing the overall performance of the approaches. Our method achieves an F-measure of 0.87,
which reflects a strong performance overall. The hybrid approach [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] leads in this area with an
impressive F-measure of 0.94, underscoring the efectiveness of combining diferent techniques to
leverage their respective strengths. While our approach does not outperform this hybrid model, it
still outshines the purely rule-based approaches, [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and [19], which both yield lower F-measure
scores of 0.85.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Discussion</title>
        <p>The analysis indicates that while our JAPE rule-based method excels in precision, making it a robust
option for applications that require accuracy, it falls slightly behind in recall compared to some other
methods. This presents a critical trade-of in the context of information extraction: achieving a high
precision often limits the breadth of recall. The hybrid method, although potentially more complex to
develop and implement, demonstrates the highest overall efectiveness, suggesting that a combined
strategy could yield the best results in future applications.</p>
        <p>Moving forward, our findings advocate for a nuanced approach to spatial information extraction that
considers the specific requirements of each task. For instance, in scenarios where precision is paramount,
our JAPE method stands out as an ideal choice. Conversely, for applications requiring extensive coverage
of information, exploring hybrid methodologies could enhance performance significantly. Further
research could involve developing a hybrid model that integrates the best features of our JAPE approach
with the comprehensive capabilities of hybrid and machine learning methods, aiming to improve both
precision and recall without sacrificing eficiency.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This research introduced an innovative approach for extracting spatial information from Arabic texts
within Geographic Information Systems (GIS), utilizing JAPE rule-based techniques. This
methodological choice proved efective for annotating and identifying spatial entities, such as natural objects,
artificial objects, and locations, as well as spatial relations, including distance, topology, orientation, and
directional relationships. The use of JAPE rules presents several advantages: it simplifies the creation of
specific linguistic patterns, making it a swift and suitable solution for systems with focused objectives
where ambiguities are minimal. Thus, for targeted applications and well-defined contexts, the JAPE
approach ensures reliable and systematic extraction of spatial information.</p>
      <p>However, our study also highlighted the limitations of this approach in addressing the linguistic
nuances of the Arabic language, which often require labor-intensive manual adjustments and advanced
linguistic expertise. In comparison, ontology-based and machine learning methods, though promising in
terms of generalization and adaptability, demand significant resources to build comprehensive ontologies
and annotated datasets, making them less accessible for applications requiring rapid deployment.</p>
      <p>In conclusion, our work underscores the relevance of the JAPE rule-based approach for extraction
systems where simplicity and quick implementation are paramount. For future applications, it would
be beneficial to explore the hybridization of this method with machine learning and deep learning
techniques, aiming to combine their precision with the adaptability and contextualization capacities that
these approaches ofer. Such a combination could lead to more robust and versatile spatial information
extraction systems, tailored to the diverse challenges presented by Arabic texts and GIS contexts.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.
[12] Y. Liao, J. Hua, L. Luo, W. Ping, X. Lu, Y. Zhong, Aprcoie: An open information extraction system
for chinese, SoftwareX 26 (2024) 101649.
[13] X. Zhao, A. Rios, Utsa-nlp at chemotimelines 2024: Evaluating instruction-tuned language models
for temporal relation extraction, in: Proceedings of the 6th Clinical Natural Language Processing
Workshop, 2024, pp. 604–615.
[14] L. Zhong, J. Wu, Q. Li, H. Peng, X. Wu, A comprehensive survey on automatic knowledge graph
construction, ACM Computing Surveys 56 (2023) 1–62.
[15] D. Thakker, T. Osman, P. Lakin, Gate jape grammar tutorial, Nottingham Trent University, UK,</p>
      <p>Phil Lakin, UK, Version 1 (2009).
[16] Gate.ac.uk, JAPE: Regular Expressions over Annotations, https://gate.ac.uk/sale/tao/splitch8.html,
2024. Last accessed July 23, 2024.
[17] A. Hadji, M. K. Kholladi, Advanced nlp methods for disaster information extraction: Analyzing
jape rules, ontologies, and machine learning approaches, in: Proceedings of the 3rd International
Conference on Computer Science’s Complex System and their Application (CCSA’2024), Computer
Science Book Series, Springer Nature, 2024. In press.
[18] A. Hadji, M.-K. Kholladi, Automatic opinion extraction from football-related social media: A
gazetteer and rule-based approach, NCAIA’2023 (2023) 61.
[19] S. Panda, A. Pradhan, V. Behera, A. Mohanty, A rule-based information extraction system,
International Journal of Innovative Technology and Exploring Engineering 8 (2019) 1613–1617.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hadji</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-K. Kholladi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Borisova</surname>
          </string-name>
          ,
          <article-title>Enhancing spatial information extraction from arabic text: A hybrid approach with ontology and rule-based</article-title>
          ,
          <source>Ingenierie des Systemes d'Information</source>
          <volume>29</volume>
          (
          <year>2024</year>
          )
          <fpage>1261</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Aguilar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinos-Navarrete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Domingo Jaramillo</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. L. de la</surname>
          </string-name>
          Hoz-Torres,
          <article-title>Geographic information systems and web gis in higher education: a collaborative tool for the analysis of accessibility in the urban and built environment</article-title>
          ,
          <source>in: Teaching Innovation in Architecture and Building Engineering: Challenges of the 21st Century</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>401</fpage>
          -
          <lpage>415</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Anusha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jhade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dhanasekaran</surname>
          </string-name>
          ,
          <article-title>Progressive collaborative method for protecting users privacy in location-based services</article-title>
          ,
          <source>in: MATEC Web of Conferences</source>
          , volume
          <volume>392</volume>
          ,
          <string-name>
            <given-names>EDP</given-names>
            <surname>Sciences</surname>
          </string-name>
          ,
          <year>2024</year>
          , p.
          <fpage>01089</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Feriel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kholladi</surname>
          </string-name>
          ,
          <article-title>Automatic extraction of spatio-temporal information from arabic text documents</article-title>
          ,
          <source>Int. J. Comput. Sci. Inf. Technol</source>
          <volume>7</volume>
          (
          <year>2015</year>
          )
          <fpage>97</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Eftimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Koroušić</given-names>
            <surname>Seljak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Korošec</surname>
          </string-name>
          ,
          <article-title>A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations</article-title>
          ,
          <source>PloS one 12</source>
          (
          <year>2017</year>
          )
          <article-title>e0179488</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>Makhija</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahuja</surname>
          </string-name>
          ,
          <article-title>Rule based text extraction from a bibliographic database</article-title>
          .,
          <source>DESIDOC Journal of Library &amp; Information Technology</source>
          <volume>38</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jusoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Obeid</surname>
          </string-name>
          ,
          <article-title>The use of ontology in clinical information extraction</article-title>
          ,
          <source>in: Journal of Physics: Conference Series</source>
          , volume
          <volume>1529</volume>
          ,
          <string-name>
            <given-names>IOP</given-names>
            <surname>Publishing</surname>
          </string-name>
          ,
          <year>2020</year>
          , p.
          <fpage>052083</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alamoudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alomari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Alwarthan</surname>
          </string-name>
          , et al.,
          <article-title>A rule-based information extraction approach for extracting metadata from pdf books, ICIC Express Letters</article-title>
          ,
          <source>Part B: Applications</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>121</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cadigan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sasseen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kalmar</surname>
          </string-name>
          , Valet:
          <article-title>Rule-based information extraction for rapid deployment</article-title>
          ,
          <source>in: Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>524</fpage>
          -
          <lpage>533</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hassini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mahmoudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Faiz</surname>
          </string-name>
          ,
          <article-title>A hybrid approach for spatial information extraction from natural language text</article-title>
          ,
          <source>in: 2023 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>Spatially oriented convolutional neural network for spatial relation extraction from natural language texts</article-title>
          ,
          <source>Transactions in GIS 26</source>
          (
          <year>2022</year>
          )
          <fpage>839</fpage>
          -
          <lpage>866</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>