<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Temporal Pattern Extraction in the Arabic Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hajer Omri</string-name>
          <email>hajer.omri2010@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zeineb Neji</string-name>
          <email>zeineb.neji@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariem Ellouze</string-name>
          <email>mariem.ellouze@planet.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Department, MIRACL Laboratory, Faculty of Economics and Management, University of Sfax, Sfax, Tunisia</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <abstract>
        <p>Despite the importance of temporal inference in several domains, especially question answering systems, research on it for Arabic is still at an early stage compared with other languages. This article deals with the automatic construction of temporal relation patterns for question answering systems. We have implemented this approach in a temporal inference system called TPE (Temporal Pattern Extraction).</p>
      </abstract>
      <kwd-group>
        <kwd>inference</kwd>
        <kwd>Question answering system</kwd>
        <kwd>temporal inference</kwd>
        <kwd>Arabic language</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, a main objective of researchers has been to build machines that can
learn, communicate, see, manipulate objects and, above all, reason, because reasoning is
considered one of the biggest challenges in many fields. Although reasoning, or
inferring, has always been specific to human beings and will not be easy to reproduce, it
remains a research objective and a motivation to keep imitating the functions of
the human brain.</p>
      <p>Inference is a mental operation that allows the reader to deduce the unspoken or
implicit elements of a text by drawing on their knowledge of the world, their "personal
encyclopedia". Making an inference means producing new information from
available information.</p>
      <p>There are many kinds of inference: researchers do not agree on a single definition, and they
classify inference into categories that are not mutually exclusive. One of these categories
is temporal inference. Temporal inference, also called temporal reasoning, makes it possible
to deduce temporal relations.</p>
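      <p>As a minimal illustration of what deducing a temporal relation means (our own sketch, not the method of any system cited in this article), the following Python code assumes a single qualitative relation, before, and derives new relations from its transitivity. The event names are hypothetical.</p>
      <preformat preformat-type="code">
```python
from itertools import product


def infer_before(facts: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Close a set of 'before' facts under transitivity.

    Each pair (a, b) means "event a happens before event b".
    """
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        # Compare every known pair with every other known pair.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            # before(a, b) and before(b, d) imply before(a, d)
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


facts = {("birth", "first_work"), ("first_work", "death")}
print(infer_before(facts) - facts)  # {('birth', 'death')}
```
      </preformat>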
      <p>Example (all Arabic examples are transliterated with the Buckwalter scheme, http://www.qamus.org/transliteration.htm):
"ولد موزارت في 27 يناير 1756 في سالزبورغ بالنمسا مؤلف موسيقي نمساوي يعتبر من أشهر العباقرة المبدعين في تاريخ الموسيقى رغم أن حياته كانت قصيرة، فقد مات عن عمر يناهز الـ 35 عاماً بعد أن نجح في إنتاج 626 عمل موسيقي"
“Mozart was born on 27 January 1756 in Salzburg, Austria. An Austrian composer, he is considered one of the most famous geniuses in the history of music, although his life was short: he died at the age of 35 after producing 626 musical works.”
Question: “متى ولد موزارت؟ / mtY wld mwzArt? / When was Mozart born?”
Answering this question correctly requires an intelligent analysis of the text. This analysis is
called inference, and more precisely temporal inference, since it processes temporal
information.</p>
      <p>Because of its importance, temporal inference is relevant to several domains and
disciplines. It plays a major role in question answering, a field concerned
with building systems that automatically answer questions posed in natural language by
extracting a precise answer from a corpus of documents.</p>
      <p>Temporal information can be stated clearly (explicit) or left unspoken
(implicit), in which case the interlocutor must infer it. A speaker
may choose to leave some temporal information unstated, and a machine that must
extract an answer that is not clearly expressed faces several difficulties;
hence the need for a system able to extract temporal information even when it is
only implicitly represented.</p>
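      <p>In the Mozart example, the birth date is explicit, whereas the year of death is only implicit: the text gives the age at death instead. The sketch below shows, under the assumption that "died at the age of 35" means between the 35th and the 36th birthday, how such an implicit value could be deduced as a range of years. The function name is hypothetical.</p>
      <preformat preformat-type="code">
```python
from datetime import date


def death_year_range(birth: date, age_at_death: int) -> tuple[int, int]:
    """Earliest and latest possible death year, given a birth date and age at death."""
    # Earliest: the birthday on which the person reached age_at_death.
    earliest = birth.year + age_at_death
    # Latest: the day before the following birthday, which falls in the next
    # calendar year unless the birthday is 1 January.
    latest = earliest if (birth.month, birth.day) == (1, 1) else earliest + 1
    return earliest, latest


# Explicit: born 27 January 1756; died at the age of 35.
print(death_year_range(date(1756, 1, 27), 35))  # (1791, 1792)
```
      </preformat>
      <p>The answer is a range because the day of death is not stated; narrowing it down requires another source of information.</p>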
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>In this section, we present previous work on temporal inference. Although research
on Arabic has expanded and the volume of Arabic textual data on the Web has grown
over the last decade, work on Arabic temporal inference is still at an early stage compared
with other languages such as English. Several factors explain this slower progress in Arabic
research.</p>
      <p>Understanding that a piece of information X can be deduced from information Y is a
simple deduction for a human being, but not for a machine. This is
why researchers have proposed several approaches to this problem, which can be
classified as follows:</p>
      <sec id="sec-2-1">
        <title>Rule-based methods</title>
        <p>Rule-based methods are the oldest type of extraction method. In these methods,
the system designer manually writes a set of rules for locating and extracting the
desired data. These rules are extraction patterns, often implemented as automata,
but writing them is long and costly work.</p>
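        <p>To make the idea of rules implemented as automata concrete, the sketch below encodes one hand-written rule, DAY MONTH YEAR, as a small deterministic automaton over word classes. The rule, the month list and all names are illustrative assumptions, not taken from the systems cited below; tokens are assumed to be whitespace-separated with punctuation removed.</p>
        <preformat preformat-type="code">
```python
# Illustrative subset of Gregorian month names, including the Maghrebi form
# "دجنبر"; a real rule base would list every variant.
MONTHS = {"يناير", "فبراير", "مارس", "أبريل", "مايو", "يونيو", "يوليو",
          "أغسطس", "سبتمبر", "أكتوبر", "نوفمبر", "ديسمبر", "دجنبر"}


def token_class(token: str) -> str:
    """Map a token to the word class read by the automaton."""
    if token.isdigit():
        if len(token) in (1, 2):
            return "DAY"
        return "YEAR" if len(token) == 4 else "NUM"
    return "MONTH" if token in MONTHS else "OTHER"


# Transitions of a deterministic automaton for the rule DAY MONTH YEAR.
TRANSITIONS = {(0, "DAY"): 1, (1, "MONTH"): 2, (2, "YEAR"): 3}
ACCEPTING = 3


def find_dates(text: str) -> list[str]:
    """Return every token span accepted by the automaton."""
    tokens = text.split()
    found = []
    for start in range(len(tokens)):
        state = 0
        for end in range(start, len(tokens)):
            state = TRANSITIONS.get((state, token_class(tokens[end])))
            if state is None:  # no transition: the rule does not apply here
                break
            if state == ACCEPTING:
                found.append(" ".join(tokens[start:end + 1]))
                break
    return found


print(find_dates("ولد موزارت في 27 يناير 1756 في سالزبورغ"))  # ['27 يناير 1756']
```
        </preformat>
        <p>Supporting another date format means adding transitions by hand, which illustrates why building such rule sets is long and costly.</p>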
        <p>Rule-based systems include the following:</p>
        <p>
          Cohen-Solal et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] reason about time at different granularities while modeling
imprecise, gradual and intuitive relations such as “just before” or “almost
touches”. To deduce new relations, their approach uses not only the classical operators but also
new granularity-conversion operators, upward “↑” and downward “↓”, which
convert a relation from one granularity to another.
        </p>
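        <p>To give an intuition for upward and downward conversion, the sketch below uses plain day and year intervals. It is a simplified illustration under our own assumptions (closed intervals, a coarse four-way relation), not the granular algebra of [1]: the same pair of events is in the before relation at day granularity but equal at year granularity. All names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from datetime import date

DayInterval = tuple[date, date]   # closed interval [start, end] of days
YearInterval = tuple[int, int]    # closed interval [start, end] of years


def up_to_years(iv: DayInterval) -> YearInterval:
    """Upward conversion: view a day-level interval at year granularity."""
    return iv[0].year, iv[1].year


def down_to_days(iv: YearInterval) -> DayInterval:
    """Downward conversion: refine a year-level interval to the days it covers."""
    return date(iv[0], 1, 1), date(iv[1], 12, 31)


def relation(a: tuple, b: tuple) -> str:
    """Coarse relation between two closed intervals of the same granularity."""
    if a == b:
        return "equal"
    if b[0] > a[1]:
        return "before"
    if a[0] > b[1]:
        return "after"
    return "overlaps"  # every other configuration, lumped together here


birth = (date(1756, 1, 27), date(1756, 1, 27))
event = (date(1756, 12, 5), date(1756, 12, 5))   # hypothetical later event
print(relation(birth, event))                              # before
print(relation(up_to_years(birth), up_to_years(event)))    # equal
```
        </preformat>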
        <p>
          OpenNARS, the implementation of the Non-Axiomatic Reasoning System described by Hammer et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], expresses temporal information at different levels of granularity and
precision. It integrates temporal inference with other inferences and uses a uniform memory for declarative,
episodic and procedural knowledge. Its temporal inference is characterized by
the use of a temporal window, temporal chaining and interval
manipulation, together with projection, eternalization and anticipation.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Semantic methods</title>
        <p>
          HuTO [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is an ontology that provides an RDFS conceptual model for
representing temporal expressions and annotating RDF resources. It proposes a set of rules
that normalize the representation of temporal data, as well as inference
and implication rules, expressed as SPARQL CONSTRUCT queries,
that make as much temporal information as possible explicit so that the data can be
reasoned over.
        </p>
        <p>
          CHRONOS [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is a reasoning system for temporal information in OWL
ontologies. It represents both qualitative and quantitative temporal
information. Based on Allen's interval relations, CHRONOS deduces
implicit relations and detects inconsistencies while ensuring soundness,
completeness and tractability over all supported relations.
        </p>
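        <p>CHRONOS works with Allen's full set of interval relations. As a much simpler illustration of inconsistency detection, assuming only point events and a strict before relation, a set of constraints is contradictory exactly when the before graph contains a cycle; the sketch below finds such a cycle with a depth-first search. It is not CHRONOS's algorithm, and the function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
from typing import Optional


def find_inconsistency(before: list[tuple[str, str]]) -> Optional[list[str]]:
    """Return a cycle of 'before' constraints (a contradiction), or None if consistent."""
    graph: dict[str, list[str]] = {}
    for a, b in before:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])

    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == ON_PATH:  # nxt would have to precede itself
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNSEEN:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_inconsistency([("birth", "marriage"), ("marriage", "death")]))  # None
print(find_inconsistency([("a", "b"), ("b", "c"), ("c", "a")]))  # ['a', 'b', 'c', 'a']
```
        </preformat>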
      </sec>
      <sec id="sec-2-3">
        <title>Hybrid methods</title>
        <p>
          Work on temporal inference has grown in recent years in several areas. Some
researchers focus on the clinical domain, such as Tapi Nzali et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], who develop a
hybrid method for adapting the extraction of temporal expressions to a corpus of
patient clinical records. The hybridization combines a symbolic approach, namely a
manual enrichment of the HeidelTime rules specific to the clinical
domain, with a supervised sequence-prediction approach based on CRFs (conditional
random fields).
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Particularities of the Arabic language and temporal constraints</title>
      <p>Arabic is a very rich language; however, this richness requires special handling,
which leaves standard NLP systems designed for other languages unable to
manage it. Arabic is spoken by nearly 300 million people worldwide and is
the religious language of more than a billion people. It gained prominence with the
Quranic revelation, which conferred on it the status of a sacred language. Its unique
character and beauty have won the admiration of Muslims across ethnic and
geographical differences.</p>
      <p>One manifestation of this richness is that names, notions and concepts
have a very wide range of nuances, which allows them to be expressed with great
precision. The names of the months of the year are a good example: the same month
can be designated by several different words. We therefore need a system of equivalence
between the different representations of the same temporal information in order to
resolve this ambiguity.</p>
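      <p>Example of ambiguity: the temporal information 03/12/2000 can be represented in several ways:</p>
      <list list-type="bullet">
        <list-item>
          <p>03-12-2000</p>
        </list-item>
        <list-item>
          <p>الثالث من دجنبر 1421 هجري / AlvAlv mn djnbr 1421 hjry</p>
        </list-item>
        <list-item>
          <p>الثالث من ذو الحجة 1421 هجري / AlvAlv mn *w AlHjp 1421 hjry</p>
        </list-item>
        <list-item>
          <p>اليوم الثالث من شهر ديسمبر 2000 ميلادي / Alywm AlvAlv mn $hr dysmbr 2000 mylAdy</p>
        </list-item>
      </list>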
      <p>For the word "ديسمبر / December / dysmbr", we also find the regional (Maghrebi) form "دجنبر / djnbr"
and, in Hijri dating, the twelfth month "ذو الحجة / *w AlHjp". Likewise, the year 2000 may appear as
the explicitly Gregorian year "2000 ميلادي / 2000 mylAdy" or through the corresponding Hijri year
"1421 هجري / 1421 hjry" (1421 Hijri).</p>
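      <p>One way to sketch such an equivalence system is a normalizer that maps every surface form to a canonical tuple. The table below is illustrative: it adds the Levantine form كانون الأول and the genitive form ذي الحجة as assumptions, and covers only a few ordinals. Hijri and Gregorian dates are kept as separate calendars, since equating them requires a calendar conversion that this sketch does not attempt. The function and table names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import re
from typing import Optional

# Illustrative equivalence table (an assumption, not the authors' resource):
# each surface form of a month name maps to (calendar, month number).
MONTH_FORMS = {
    "ديسمبر": ("gregorian", 12),        # dysmbr
    "دجنبر": ("gregorian", 12),         # djnbr (Maghrebi form)
    "كانون الأول": ("gregorian", 12),   # Levantine form
    "ذو الحجة": ("hijri", 12),          # *w AlHjp
    "ذي الحجة": ("hijri", 12),          # same month, genitive form
}
ORDINAL_DAYS = {"الأول": 1, "الثاني": 2, "الثالث": 3}  # truncated for brevity
NUMERIC = re.compile(r"(\d{1,2})[-/](\d{1,2})[-/](\d{4})")


def normalize_date(expr: str) -> Optional[tuple[str, int, int, int]]:
    """Map a date expression to (calendar, year, month, day), or None if unrecognized."""
    numeric = NUMERIC.fullmatch(expr.strip())
    if numeric:  # assume day-month-year order, as in 03-12-2000
        day, month, year = (int(g) for g in numeric.groups())
        return "gregorian", year, month, day
    # Longer forms are tried first, in case one form contains another.
    month = next((MONTH_FORMS[f] for f in sorted(MONTH_FORMS, key=len, reverse=True)
                  if f in expr), None)
    # The first ordinal token gives the day (so "الأول" in "كانون الأول" is skipped).
    day = next((ORDINAL_DAYS[t] for t in expr.split() if t in ORDINAL_DAYS), None)
    year = re.search(r"\d{4}", expr)
    if month is None or day is None or year is None:
        return None
    calendar, month_number = month
    return calendar, int(year.group()), month_number, day


print(normalize_date("03-12-2000"))                                # ('gregorian', 2000, 12, 3)
print(normalize_date("اليوم الثالث من شهر ديسمبر 2000 ميلادي"))    # ('gregorian', 2000, 12, 3)
```
      </preformat>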
    </sec>
    <sec id="sec-4">
      <title>Proposed approach</title>
      <p>The method proposed in this section aims to automate the construction
of temporal relation patterns for question answering systems. It is a
rule-based method composed of three modules, as shown in Fig. 1.</p>
      <p>The first module analyzes the question and extracts its named entities
and its verbs. The second module builds our corpus by automatically downloading
from Wikipedia the articles corresponding to the named entities already extracted.
After a set of pre-processing steps on the corpus, the last module extracts the
candidate sentences, which yields a set of relevant sentences from which the
temporal relation patterns are constructed.</p>
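      <p>The data flow between the three modules can be sketched as a chain of functions. The names run_tpe and QuestionAnalysis are hypothetical, and each module is passed in as a callable so that the actual components described below can be plugged in; the toy lambdas in the usage example only illustrate the types.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QuestionAnalysis:
    entities: list[str]  # named entities found in the question
    verbs: list[str]     # verbs found in the question


def run_tpe(
    question: str,
    analyze: Callable[[str], QuestionAnalysis],                            # module 1
    build_corpus: Callable[[list[str]], list[str]],                        # module 2
    select_sentences: Callable[[list[str], QuestionAnalysis], list[str]],  # module 3
    build_patterns: Callable[[list[str]], list[str]],                      # module 3
) -> list[str]:
    """Chain the modules: question -> corpus -> candidate sentences -> patterns."""
    analysis = analyze(question)
    corpus = build_corpus(analysis.entities)
    candidates = select_sentences(corpus, analysis)
    return build_patterns(candidates)


patterns = run_tpe(
    "متى انتهت الحرب العالمية الأولى؟",
    analyze=lambda q: QuestionAnalysis(["الحرب العالمية الأولى"], ["انتهت"]),
    build_corpus=lambda entities: ["انتهت الحرب العالمية الأولى سنة 1918", "نص آخر"],
    select_sentences=lambda corpus, a: [s for s in corpus if a.entities[0] in s],
    build_patterns=lambda sentences: [s.replace("الحرب العالمية الأولى", "{NAME}")
                                      for s in sentences],
)
print(patterns)  # ['انتهت {NAME} سنة 1918']
```
      </preformat>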
      <p>In the following, we detail each module and the phases that constitute it.</p>
      <p>This step analyzes an Arabic question that asks for temporal information only.
Our program must enforce this constraint: our starting point is a question bearing on
temporal information only, and other types of questions are outside the scope of this work.</p>
      <p>A question is considered temporal if it begins with a temporal signal. To find these
signals, we used the list of questions produced in the TERQAS Workshop (an ARDA workshop
on Temporal and Event Recognition for Question Answering Systems,
www.cs.brandeis.edu/_jamesp/arda/time/readings.html), illustrated in Table 2, and
translated this list into Arabic to obtain the candidate temporal signals.</p>
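      <p>The test on temporal signals could look like the following sketch. The signal list is a small illustrative assumption, not the translated TERQAS list of Table 2, and temporal_signal is a hypothetical name. Note that كم also introduces non-temporal quantity questions (how many), so a real implementation would need to inspect the words that follow it.</p>
      <preformat preformat-type="code">
```python
from typing import Optional

# Illustrative Arabic temporal signals (assumed; Table 2 holds the actual list).
TEMPORAL_SIGNALS = ["منذ متى", "إلى متى", "متى", "في أي سنة", "في أي عام",
                    "كم من الوقت", "كم"]


def temporal_signal(question: str) -> Optional[str]:
    """Return the temporal signal that opens the question, or None if it is not temporal."""
    text = question.strip()
    # Try longer signals first so that "كم من الوقت" wins over "كم".
    for signal in sorted(TEMPORAL_SIGNALS, key=len, reverse=True):
        if text.startswith(signal):
            return signal
    return None


print(temporal_signal("متى ولد موزارت؟"))   # متى
print(temporal_signal("من هو موزارت؟"))     # None
```
      </preformat>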
      <p>This first stage comprises two phases, detailed below.</p>
      <sec id="sec-4-1">
        <title>Extraction of named entities.</title>
        <p>
          At this level, we extract the named entities (NEs) present in each question using the NooJ-based recognizer of Fehri et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], in order to build an NE base that we then use to
construct our corpus from the corresponding articles.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Extraction of verbs.</title>
        <p>
          Here, the question is decomposed in order to extract its verbs using NooJ [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The extracted verbs are used by the following modules, as detailed
in the following sections.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Construction of the corpus</title>
      </sec>
      <sec id="sec-4-4">
        <title>Downloading articles.</title>
        <p>This phase is the first step in constructing our corpus. It consists of retrieving
articles from the online encyclopedia Wikipedia: we automatically download, in XML
format, the articles corresponding to the NEs extracted in the previous step.</p>
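        <p>A minimal sketch of the download step, assuming that articles are fetched through MediaWiki's Special:Export page, which returns an article as XML. The names export_url and download_article_xml are hypothetical, and the User-Agent value is a placeholder (Wikimedia asks clients to identify themselves).</p>
        <preformat preformat-type="code">
```python
from urllib.parse import quote
from urllib.request import Request, urlopen

EXPORT_BASE = "https://ar.wikipedia.org/wiki/Special:Export/"


def export_url(title: str) -> str:
    """URL of the XML export of an Arabic Wikipedia article."""
    # Page titles use underscores for spaces; non-ASCII text is percent-encoded.
    return EXPORT_BASE + quote(title.strip().replace(" ", "_"))


def download_article_xml(title: str, timeout: float = 30.0) -> str:
    """Fetch the XML export of one article (performs a network request)."""
    request = Request(export_url(title),
                      headers={"User-Agent": "TPE-corpus-builder/0.1 (research prototype)"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# For each NE extracted from the question, e.g.:
# xml_text = download_article_xml("الحرب العالمية الأولى")
```
        </preformat>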
      </sec>
      <sec id="sec-4-5">
        <title>Pre-processing of articles.</title>
        <p>Because of the way Wikipedia articles are structured, they require pre-processing:
removing parentheses, words that are not in Arabic, links and
images.</p>
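        <p>The clean-up rules above could be approximated with regular expressions over the article wikitext, as in the sketch below. The patterns and the name clean_wikitext are simplifying assumptions: image links whose captions contain nested links are not handled, and the visible label of an internal link is kept rather than dropped, since it often carries a named entity.</p>
        <preformat preformat-type="code">
```python
import re

IMAGE_LINK = re.compile(r"\[\[(?:File|Image|ملف|صورة):[^\]]*\]\]")
WIKI_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")  # [[target|label]] or [[label]]
PARENTHESES = re.compile(r"\([^()]*\)")
NON_ARABIC_WORD = re.compile(r"[A-Za-z]+")


def clean_wikitext(text: str) -> str:
    """Apply the clean-up rules, then collapse whitespace."""
    text = IMAGE_LINK.sub("", text)        # drop images
    text = WIKI_LINK.sub(r"\1", text)      # keep only the visible label of links
    text = PARENTHESES.sub("", text)       # drop parenthesized asides
    text = NON_ARABIC_WORD.sub("", text)   # drop Latin-script words
    return re.sub(r"\s+", " ", text).strip()


raw = "ولد [[فولفغانغ أماديوس موزارت|موزارت]] (Mozart) في [[سالزبورغ]] [[ملف:Mozart.jpg|تصغير]] Salzburg"
print(clean_wikitext(raw))  # ولد موزارت في سالزبورغ
```
        </preformat>
        <p>Removing parentheses also removes parenthesized dates such as (1914-1918), which can themselves carry an answer; whether to keep them is a design choice.</p>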
        <p>In a first phase, we extract the textual content of the automatically downloaded
articles to obtain our corpus.</p>
        <p>During this phase, we also retrieve the Infobox, when one exists, because we use
it for verification in later steps. We then retrieve the raw text of the article; after
this step, the corpus is in plain-text (TXT) format.</p>
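        <p>Retrieving the Infobox can be sketched as locating the first infobox template and reading its "| key = value" lines. The sketch assumes that Arabic Wikipedia infobox templates start with صندوق معلومات (or Infobox) and that each field sits on its own line; infobox_fields is a hypothetical name, and the field names in the test data are illustrative.</p>
        <preformat preformat-type="code">
```python
import re

# Assumption: infobox template names start with "صندوق معلومات" or "Infobox".
INFOBOX_START = re.compile(r"\{\{\s*(?:صندوق معلومات|Infobox)")
# One "| key = value" field per line, as infoboxes are usually written.
FIELD = re.compile(r"^[ \t]*\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def infobox_fields(wikitext: str) -> dict[str, str]:
    """Return the key/value fields of the first infobox in the wikitext."""
    match = INFOBOX_START.search(wikitext)
    if match is None:
        return {}
    start = i = match.start()
    depth = 0
    # Scan forward, counting opening and closing double braces,
    # until the infobox template closes.
    while len(wikitext) - i >= 2:
        pair = wikitext[i:i + 2]
        if pair in ("{{", "}}"):
            depth += 1 if pair == "{{" else -1
            i += 2
            if depth == 0:
                return dict(FIELD.findall(wikitext[start:i]))
        else:
            i += 1
    return {}  # unbalanced braces: no complete infobox found
```
        </preformat>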
      </sec>
      <sec id="sec-4-6">
        <title>Segmentation.</title>
        <p>
          In a third phase, we segment [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] the articles. In linguistics,
segmentation is a pre-processing step that prepares one or more textual documents for
later processing (morphological analysis, semantic analysis, etc.). This
operation is language-specific, because each language has its own characteristics that
must be taken into account, and it is important for locating the segments
that contain the information.
        </p>
        <p>The result of this stage serves as input to the step that extracts the so-called
temporal, or candidate, sentences.</p>
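        <p>Belguith et al. [7] segment Arabic text through a contextual analysis of punctuation marks and certain particles. The sketch below is deliberately simpler: it only splits on sentence-final punctuation (including the Arabic question mark ؟ and semicolon ؛) and on line breaks, so it is a baseline rather than a reimplementation. It would, for instance, wrongly split a decimal number such as 3.5.</p>
        <preformat preformat-type="code">
```python
import re

# A sentence is a run of characters up to (and including) a sentence-final mark:
# Latin . ! ? and the Arabic question mark (؟) and semicolon (؛).
SENTENCE = re.compile(r"[^.!?؟؛\n]+[.!?؟؛]?")


def segment(text: str) -> list[str]:
    """Split text into sentences on punctuation and line breaks."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


print(segment("ولد موزارت سنة 1756. متى ولد موزارت؟ توفي سنة 1791"))
# ['ولد موزارت سنة 1756.', 'متى ولد موزارت؟', 'توفي سنة 1791']
```
        </preformat>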
      </sec>
      <sec id="sec-4-7">
        <title>Extraction of temporal sentences.</title>
        <p>The goal is to discard unnecessary information and keep only the information
considered relevant, so that decisions can be made as quickly as possible.</p>
        <p>
          Once the text part of each cleaned article has been segmented, we
select only the sentences that contain temporal information
(the relevant ones) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
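        <p>The selection of temporal sentences can be approximated by a lexical filter, as in the sketch below; in the article this step relies on NooJ [8]. The vocabulary and the function name are illustrative assumptions. Word boundaries prevent, for instance, the month name مارس from matching inside the verb يمارس.</p>
        <preformat preformat-type="code">
```python
import re

# Illustrative temporal vocabulary (an assumption): temporal nouns, era
# markers and Gregorian month names.
TEMPORAL_WORDS = ["سنة", "عام", "قرن", "هجري", "ميلادي",
                  "يناير", "فبراير", "مارس", "أبريل", "مايو", "يونيو", "يوليو",
                  "أغسطس", "سبتمبر", "أكتوبر", "نوفمبر", "ديسمبر", "دجنبر"]
# A year of three or four digits, or one of the temporal words as a whole word.
TEMPORAL = re.compile(r"\b\d{3,4}\b|\b(?:" + "|".join(TEMPORAL_WORDS) + r")\b")


def temporal_sentences(sentences: list[str]) -> list[str]:
    """Keep only the sentences that contain a year or a temporal word."""
    return [s for s in sentences if TEMPORAL.search(s)]


print(temporal_sentences(["ولد موزارت في 27 يناير 1756.", "كان موزارت مؤلفا موسيقيا."]))
# ['ولد موزارت في 27 يناير 1756.']
```
        </preformat>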
      </sec>
      <sec id="sec-4-8">
        <title>Part-of-speech Tagging.</title>
        <p>
          This stage identifies the morphological characteristics [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] of the words
of each temporal sentence of our corpus. What interests us most in this
morphological analysis is locating the verbs.
        </p>
        <p>Recall that the verb of each analyzed question was detected in the first phase.
We now compare the verbs of the identified sentences with the verb detected in the
question.</p>
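        <p>Before comparing verbs, it helps to normalize Arabic spelling, since the same verb may be written with or without diacritics or with different alef forms (for example إستمر and استمر in the examples of this article). The sketch below is our assumption of a minimal normalization, with hypothetical function names; it does not lemmatize, so inflected forms such as دامت and دام still differ unless the tagger returns lemmas.</p>
        <preformat preformat-type="code">
```python
import re

# Tanween, short vowels, shadda, sukun (U+064B..U+0652) and tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Hamzated and wasla alef forms are mapped to bare alef.
ALEF_FORMS = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ٱ": "ا"})


def normalize_ar(word: str) -> str:
    """Remove diacritics and tatweel, then unify alef forms."""
    return DIACRITICS.sub("", word).translate(ALEF_FORMS)


def shares_verb(sentence_verbs: list[str], question_verbs: list[str]) -> bool:
    """True if a sentence verb equals a question verb after normalization."""
    wanted = {normalize_ar(v) for v in question_verbs}
    return any(normalize_ar(v) in wanted for v in sentence_verbs)


print(normalize_ar("إستمر"))                    # استمر
print(shares_verb(["إستمر"], ["استمر"]))        # True
```
        </preformat>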
      </sec>
      <sec id="sec-4-9">
        <title>Construction of patterns</title>
      </sec>
      <sec id="sec-4-10">
        <title>Extraction of synonyms and antonyms.</title>
        <p>In this step, we extract from Arabic WordNet (AWN) the lists of synonyms and antonyms
of the verbs detected in the initial temporal questions.</p>
        <p>This extraction required a conversion step: AWN entries are encoded in Buckwalter
transliteration, so we converted them to obtain the synonyms and antonyms in Arabic script.</p>
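        <p>The conversion from Buckwalter transliteration to Arabic script is a character-for-character mapping. The sketch below uses the standard Buckwalter table for letters and the main diacritics; the function names bw2ar and ar2bw are ours. Two table keys are written as the escapes \x3c and \x26, which stand for the less-than sign and the ampersand.</p>
        <preformat preformat-type="code">
```python
# Buckwalter -> Arabic script. "\x3c" and "\x26" are the less-than sign and the
# ampersand (alef with hamza below, waw with hamza), written as escapes here.
BW2AR = {
    "'": "ء", "|": "آ", ">": "أ", "\x26": "ؤ", "\x3c": "إ", "}": "ئ",
    "A": "ا", "b": "ب", "p": "ة", "t": "ت", "v": "ث", "j": "ج", "H": "ح",
    "x": "خ", "d": "د", "*": "ذ", "r": "ر", "z": "ز", "s": "س", "$": "ش",
    "S": "ص", "D": "ض", "T": "ط", "Z": "ظ", "E": "ع", "g": "غ", "_": "ـ",
    "f": "ف", "q": "ق", "k": "ك", "l": "ل", "m": "م", "n": "ن", "h": "ه",
    "w": "و", "Y": "ى", "y": "ي", "{": "ٱ",
    # Diacritics (combining marks), given as code points for readability.
    "F": "\u064B", "N": "\u064C", "K": "\u064D", "a": "\u064E",
    "u": "\u064F", "i": "\u0650", "~": "\u0651", "o": "\u0652", "`": "\u0670",
}
AR2BW = {ar: bw for bw, ar in BW2AR.items()}
TO_ARABIC = str.maketrans(BW2AR)
TO_BUCKWALTER = str.maketrans(AR2BW)


def bw2ar(text: str) -> str:
    """Buckwalter transliteration to Arabic script; other characters pass through."""
    return text.translate(TO_ARABIC)


def ar2bw(text: str) -> str:
    """Arabic script to Buckwalter transliteration."""
    return text.translate(TO_BUCKWALTER)


print(bw2ar("mtY wld mwzArt"))  # متى ولد موزارت
```
        </preformat>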
        <p>We use the antonyms for temporal questions about duration. For example:
كم دامت الحرب العالمية الاولى؟ / km dAmt AlHrb AlEAlmyp AlAwlY? / How long did the First World War last?</p>
        <sec id="sec-4-10-1">
          <title>Two situations can arise:</title>
          <list list-type="bullet">
            <list-item>
              <p>We can obtain a direct answer through a synonymy relation:
استمرت الحرب العالمية الاولى لمدة أربعة سنوات / Astmrt AlHrb AlEAlmyp AlAwlY lmdp &gt;rbEp snwAt / The First World War continued for four years
[دام / dAm / lasted = إستمر / Astmr / continued].</p>
            </list-item>
            <list-item>
              <p>Or we can extract the answer through an antonymy relation:
إنتهى لهيب الحرب العالمية الاولى بعد أربعة سنوات من جحيم / &lt;nthY lhyb AlHrb AlEAlmyp AlAwlY bEd &gt;rbEp snwAt mn jHym / The flames of the First World War ended after four years of hell
[إنتهى / &lt;nthY / ended ≠ دام / dAm / lasted].</p>
            </list-item>
          </list>
        </sec>
      </sec>
      <sec id="sec-4-11">
        <title>Extraction of relevant sentences.</title>
        <p>This phase is the most difficult one, and it determines how well the resulting patterns evaluate.
A sentence is considered relevant if:</p>
        <list list-type="bullet">
          <list-item>
            <p>it contains the NE detected in the initial question;</p>
          </list-item>
          <list-item>
            <p>it contains the NE or a signal noun, together with the question verb or a verb from the list of synonyms of this verb;</p>
          </list-item>
          <list-item>
            <p>it contains the NE detected in the initial question or a signal noun, together with a verb from the list of antonyms of the question verb.</p>
          </list-item>
        </list>
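        <p>The three relevance criteria can be sketched as a function that reports which criterion a sentence meets. Two points are our assumptions: when several criteria apply, the verb-based ones take precedence, and the sentence verbs are supplied by the part-of-speech tagging step. The function name is hypothetical, and the lexicon values in the usage example are taken from the war examples above.</p>
        <preformat preformat-type="code">
```python
from typing import Optional


def relevance(sentence: str, verbs: set[str], entity: str, signal_nouns: set[str],
              question_verb: str, synonyms: set[str], antonyms: set[str]) -> Optional[str]:
    """Return the relevance criterion a sentence satisfies, or None.

    `verbs` holds the verbs found in the sentence by the POS-tagging step.
    """
    has_entity = entity in sentence
    anchored = has_entity or any(noun in sentence.split() for noun in signal_nouns)
    if anchored and verbs.intersection({question_verb} | synonyms):
        return "synonym"   # criterion 2: question verb or one of its synonyms
    if anchored and verbs.intersection(antonyms):
        return "antonym"   # criterion 3: a verb from the antonym list
    if has_entity:
        return "entity"    # criterion 1: the NE of the question alone
    return None


war = "الحرب العالمية الاولى"
lexicon = dict(entity=war, signal_nouns={"سنة"}, question_verb="دامت",
               synonyms={"استمرت"}, antonyms={"انتهى"})
print(relevance("استمرت الحرب العالمية الاولى لمدة أربعة سنوات", {"استمرت"}, **lexicon))  # synonym
print(relevance("انتهى لهيب الحرب العالمية الاولى بعد أربعة سنوات من جحيم", {"انتهى"}, **lexicon))  # antonym
```
        </preformat>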
        <p>We thus obtain a set of relevant sentences, one or more of which contain the
correct answer(s). To identify the correct answer, we compare the temporal information
contained in these relevant sentences with the Infobox, which generally contains the
most important temporal information.</p>
        <p>When the two match, after the temporal constraints have been resolved, the
sentences are retained as candidates.</p>
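        <p>The Infobox check can be sketched as keeping the sentences that share at least one year with the Infobox values. This is a simplification under our own assumptions: it compares four-digit years only and does not solve temporal constraints, so a duration answer such as "four years" is not verified by it. The function names and the Infobox field names in the example are hypothetical.</p>
        <preformat preformat-type="code">
```python
import re

YEAR = re.compile(r"\b\d{4}\b")


def years(text: str) -> set[str]:
    """All four-digit numbers in the text, read as years."""
    return set(YEAR.findall(text))


def verified_candidates(sentences: list[str], infobox: dict[str, str]) -> list[str]:
    """Keep the relevant sentences that share at least one year with the Infobox."""
    reference = set().union(*(years(value) for value in infobox.values()))
    return [s for s in sentences if years(s).intersection(reference)]


infobox = {"البداية": "28 يوليو 1914", "النهاية": "11 نوفمبر 1918"}
sentences = ["بدأت الحرب العالمية الأولى سنة 1914 و انتهت سنة 1918",
             "استمرت الحرب العالمية الاولى لمدة أربعة سنوات"]
print(verified_candidates(sentences, infobox))
# ['بدأت الحرب العالمية الأولى سنة 1914 و انتهت سنة 1918']
```
        </preformat>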
        <p>The result of this module serves as input to the last module, which extracts the
patterns.</p>
      </sec>
      <sec id="sec-4-12">
        <title>Construction of patterns.</title>
        <p>The candidate sentences are considered to be responses to the temporal questions
asked at the outset.</p>
        <p>We can then associate with each question one or more regular answer forms, called
patterns.</p>
        <p>For example, consider the question "متى انتهت الحرب العالمية الأولى؟ / mtY Antht AlHrb AlEAlmyp AlAwlY? / When did the First
World War end?". The answer to this question can be presented in different ways in the text.</p>
        <sec id="sec-4-12-1">
          <title>Example:</title>
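          <list list-type="bullet">
            <list-item>
              <p>الحرب العالمية الأولى (1914-1918) / the First World War (1914-1918)</p>
            </list-item>
            <list-item>
              <p>بدأت الحرب العالمية الأولى سنة 1914 و انتهت سنة 1918 / the First World War began in 1914 and ended in 1918</p>
            </list-item>
          </list>
          <p>The patterns are:</p>
          <list list-type="bullet">
            <list-item>
              <p>&lt;اسم&gt; &lt;تاريخ بدأ&gt; &lt;تاريخ انتهاء&gt; / &lt;name&gt; &lt;start date&gt; &lt;end date&gt;</p>
            </list-item>
            <list-item>
              <p>بدأت &lt;اسم&gt; سنة &lt;تاريخ بدأ&gt; و انتهت سنة &lt;تاريخ انتهاء&gt; / &lt;name&gt; began in &lt;start date&gt; and ended in &lt;end date&gt;</p>
            </list-item>
          </list>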
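          <p>Building and reusing such patterns can be sketched as follows: the answer parts of a candidate sentence are replaced with slots, and the resulting pattern is turned into a regular expression that extracts the same slots from new sentences. The slot names, the regular expressions attached to them and the function names are assumptions; the article writes the slots with angle brackets, while this sketch uses braces.</p>
          <preformat preformat-type="code">
```python
import re
from typing import Optional

# Slots are written with braces in this sketch (angle brackets in the article).
SLOT_REGEX = {"{NAME}": "(.+?)", "{START}": r"(\d{4})", "{END}": r"(\d{4})"}
SLOT_SPLIT = re.compile(r"(\{NAME\}|\{START\}|\{END\})")


def make_pattern(sentence: str, name: str, start: str, end: str) -> str:
    """Generalize a candidate sentence by replacing its answer parts with slots."""
    return sentence.replace(name, "{NAME}").replace(start, "{START}").replace(end, "{END}")


def match_pattern(pattern: str, sentence: str) -> Optional[dict[str, str]]:
    """Apply a pattern to a new sentence and return the filled slots, if it matches."""
    parts = SLOT_SPLIT.split(pattern)
    slots = [p.strip("{}") for p in parts if p in SLOT_REGEX]
    # Literal text is escaped; slots become capturing groups.
    regex = "".join(SLOT_REGEX[p] if p in SLOT_REGEX else re.escape(p) for p in parts)
    match = re.search(regex, sentence)
    return dict(zip(slots, match.groups())) if match else None


war = "الحرب العالمية الأولى"
p1 = make_pattern(war + " (1914-1918)", war, "1914", "1918")
p2 = make_pattern("بدأت " + war + " سنة 1914 و انتهت سنة 1918", war, "1914", "1918")
print(p1)  # {NAME} ({START}-{END})
print(match_pattern(p2, "بدأت الحرب العالمية الثانية سنة 1939 و انتهت سنة 1945"))
# {'NAME': 'الحرب العالمية الثانية', 'START': '1939', 'END': '1945'}
```
          </preformat>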
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>The aim of the evaluation is to analyze in detail the capabilities of the method
proposed in the previous section. In this section, we present its evaluation.</p>
      <p>For a first evaluation, we collected an Arabic corpus initially composed of 100
heterogeneous temporal questions from several domains, and we increased the number
of questions at each iteration to evaluate the results of our TPE system. The questions
were taken from the question sets of the TREC international conference (Text REtrieval
Conference, http://trec.nist.gov/) for the years 1999 to 2003 and from the list of
questions produced in the TERQAS Workshop.</p>
      <p>Once the patterns were extracted, we asked a domain expert to judge their
semantics, for greater precision.</p>
      <sec id="sec-5-1">
        <title>Conclusion</title>
        <p>Identifying temporal relations with a pattern-based approach is a
particularly interesting entry point in several areas, such as question answering systems.</p>
        <p>The work presented in this article contributes to the identification of temporal
relations. In this context, we proposed a method for identifying temporal relations
based on a semantic, pattern-based approach.</p>
        <p>We began this article with an overview of temporal inference. Next, we proposed
a method for automatically extracting patterns to identify temporal
relations. Then, we presented our system, TPE, which implements the proposed
method and builds temporal patterns from a corpus of texts.</p>
        <p>In future work, we aim to extend the temporal information base in order to build a
dedicated temporal dictionary that can be useful in different domains.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>We would like to express our appreciation to everyone involved in writing this
article: first of all to the Computer Department for its guidance, and
especially to our advisors, Madam Mariem Ellouze and Zeineb Neji, for guiding and
assisting us until we completed this article.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Quentin</given-names>
            <surname>Cohen-Solal</surname>
          </string-name>
          et al., “
          <article-title>Une algèbre des relations temporelles granulaires pour le raisonnement qualitatif</article-title>
          ”,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Hammer</surname>
          </string-name>
          et al., “
          <article-title>The OpenNARS implementation of the Non-Axiomatic Reasoning System</article-title>
          ”,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Papa Fary</given-names>
            <surname>Diallo</surname>
          </string-name>
          et al., “
          <article-title>HuTO: une Ontologie Temporelle Narrative pour les Applications du Web Sémantique</article-title>
          ”,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Eleftherios</given-names>
            <surname>Anagnostopoulos</surname>
          </string-name>
          et al., “
          <article-title>CHRONOS: A Reasoning Engine for Qualitative Temporal Information in OWL</article-title>
          ”,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Mike Donald</given-names>
            <surname>Tapi Nzali</surname>
          </string-name>
          , Aurélie Névéol, Xavier Tannier, “
          <article-title>Analyse d'expressions temporelles dans les dossiers électroniques patients</article-title>
          ”,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Spence</given-names>
            <surname>Green</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , “<article-title>Better Arabic Parsing: Baselines, Evaluations, and Analysis</article-title>”,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Lamia</given-names>
            <surname>Hadrich Belguith</surname>
          </string-name>
          , Leila Baccour and Ghassan Mourad, “
          <article-title>Segmentation de textes arabes basée sur l'analyse contextuelle des signes de ponctuations et de certaines particules</article-title>
          ”,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Max</given-names>
            <surname>Silberztein</surname>
          </string-name>
          ,
          <source>NooJ: A Linguistic Annotation System for Corpus Processing</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Héla</given-names>
            <surname>Fehri</surname>
          </string-name>
          et al., “
          <article-title>Reconnaissance et traduction d'entités nommées en arabe avec NooJ en utilisant un nouveau modèle de représentation</article-title>
          ”,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>