Abjad numerals recognition in medieval arabic
mathematical texts
Hadj Mohammed Djamel1,*,† , Nacéra Bensaou1,†
1
 USTHB University, Laboratory for Research in Artificial Intelligence (LRIA), BP 32 El Alia, Bab Ezzouar, Algiers,
Algeria


                                         Abstract
                                         Abjad numerals, also called hisāb al-jumal, form a numeral system based on the twenty-eight letters of
                                         the Arabic alphabet, taken in an order different from the dictionary order. In ancient Arabic mathematics,
                                         all problem and solution sentences were expressed entirely in natural language, with no mathematical
                                         symbolism. The present paper is the first attempt to automatically analyze and recognize Abjad numerals
                                         in medieval Arabic mathematical texts. Since the hisāb al-jumal system is unambiguous, we also translate
                                         Abjad numerals written in natural language into the modern numeral system. To support our study, we
                                         construct a new dataset named the Hj-Tagged Corpus. According to the experimental results, the proposed
                                         method efficiently analyzes and recognizes Abjad numerals and mathematical components (such as
                                         numerical constants, Abjad numbers, and mathematical operations). We also translate the Abjad terms
                                         detected in the previous step into the modern numeral system, achieving an F1 score of 98.1%.




1. Introduction
In several medieval Arabic manuscripts, such as mathematical, geographical, and astronomical
texts [1], numbers are written in a system of Arabic alphanumeric notation. In this system,
each of the 28 Arabic letters has a specific numerical value, known as the ’Adad of that letter,
and the value of a word is the sum of the values of the letters that compose it. The system
was known as hisāb al-jumal (حساب الجمل), which meant "numerological calculations", and
also sometimes as Abjad (أبجد), an acronym referring to ’alif (ا), bā’ (ب), jīm (ج), and
dāl (د), the first four letters in the Abjad order [2].
   The numbers from 1 to 9 were represented by the first 9 letters: the first letter, ’alif (ا),
represents 1; the second letter, bā’ (ب), represents 2; and so on. The numbers 10, 20, 30, .., 90
are represented by the next nine letters (10 = yā’ (ي), 20 = kāf (ك), 30 = lām (ل), etc.), and
100, 200, 300, .., 1000 by the next letters (100 = qāf (ق), 200 = rā’ (ر), 300 = shīn (ش), etc.);
see Table 1 for more details.

VIPERC2022: 1st International Virtual Conference on Visual Pattern Extraction and Recognition for Cultural Heritage
Understanding, 12 September 2022
*
  Corresponding author.
†
  These authors contributed equally.
$ hadjmohd@mcmaster.ca (H. M. Djamel); i.tiddi@vu.nl (N. Bensaou)
 0000-0001-6301-5152 (H. M. Djamel)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                       CEUR Workshop Proceedings (CEUR-WS.org)
Table 1
The 28 letters of the Arabic alphabet and their assigned numerical values (based on the Abjad order).


Hisāb al-jumal numbers were used for all mathematical purposes, and also for the creation of
chronograms, which "consist of grouping into one meaningful and characteristic word or short
phrase letters whose numerical values when totaled give the year of a past or future event" [3].
For example, a poet used it when writing about the rules of Tajweed. He composed a poem stating
the rules of the Arabic letters, and at the end of the poem he wrote: تاريخها بشرى لمن يتقنها
[the date of this poem is a good tiding to the one who masters it]. When calculated in the hisāb
al-jumal system (see Table 2), the phrase gives the year he authored that poem, which was the
year 1198 (512 + 120 + 566) AH.
   When a number is written in hisāb al-jumal notation, it becomes difficult to recognize it as
a number, especially if the hisāb al-jumal word makes sense on its own. For example, in the
sentence "We wanted to multiply law (لو) minutes by kah seconds" (Miftāh al-Hisāb [4]), the
hisāb al-jumal numbers have a unique conversion into ordinary decimal notation, yielding "We
wanted to multiply 36 minutes by 29 seconds".
   In this work, we explore the use of a bi-directional long short-term memory (Bi-LSTM)
network with a conditional random field (CRF) layer to automatically analyze and recognize
Abjad numerals in mathematical expressions in medieval Arabic mathematical texts. Additionally,
we translate the hisāb al-jumal terms detected in the previous step, which are written in natural
language, into the modern numeral system (such as decimal numbers).
   In the past few years, recurrent neural networks (RNNs) [5, 6] and their variants (such as
the LSTM and the gated recurrent unit (GRU)) have become some of the most widely used
techniques in natural language processing, for tasks such as part-of-speech (POS) tagging [7, 8]
and named entity recognition (NER) [9, 10]. Recently, [11] applied an RNN approach, using a
Bi-LSTM with a CRF over word- and character-level features, to the task of drug NER. Similarly,
[12] combined the outputs of a Bi-LSTM and a CRF as input to a Support Vector Machine (SVM)
classifier for disease name recognition.

Table 2: Hisāb al-jumal is a calculation that assigns every letter in Arabic a specific numerical value.
For sequence tagging tasks, [13] proposed a variant of the Bi-LSTM with a CRF. [14] showed
that a combination of a Bi-LSTM with a CRF and an external word embedding model achieves
impressive results on a Russian NER task. [15] adapted a rule-based machine translation system
using a dictionary approach (DA) to automatically generate modern (symbolic) mathematical
equations from natural language in medieval Arabic algebra. In this paper, we propose a novel
approach for automatically recognizing Abjad numerals in medieval Arabic mathematical texts
and translating them into the modern numeral system.
   Following this introduction, the remainder of this paper is organized as follows. Section 2
describes LSTM networks, Bi-LSTM networks, and Bi-LSTM-CRF networks. Section 3 explains
how hisāb al-jumal terms are translated into the modern numeral system. Section 4 presents the
experimental setup, including the dataset construction, model architecture, and training process.
Finally, Section 5 summarizes our methods and results, and discusses future work.


2. Bi-LSTM-CRF Model
Recurrent neural networks (RNNs) have proved to be effective at learning from sequential data,
including language models [17, 18] and natural language processing [19, 20]. An RNN is a
neural network that consists of an input layer x, a hidden layer h, and an output layer y. For
instance, given a sentence 𝑥 = (𝑥1 , ..., 𝑥𝑛 ), an RNN uses a hidden state representation ℎ = (ℎ1 , ..., ℎ𝑛 )
to map the input 𝑥 to the output sequence 𝑦 = (𝑦1 , ..., 𝑦𝑛 ).
   However, standard RNNs suffer from both the exploding and vanishing gradient problems [21].
In contrast, RNNs with gating units, such as the LSTM-RNN [22], are among the most effective
sequence models in practical applications, thanks to the extra memory cell they add to the
standard RNN.

Figure 1: A bidirectional LSTM-CRF network for sequence tagging.
   The LSTM cell can be described mathematically by the following six fundamental operations:

    ∙   Input gate: 𝑖𝑡 = 𝜎(𝑊 (𝑖) 𝑥𝑡 + 𝑈 (𝑖) ℎ𝑡−1 )
    ∙   Forget gate: 𝑓𝑡 = 𝜎(𝑊 (𝑓 ) 𝑥𝑡 + 𝑈 (𝑓 ) ℎ𝑡−1 )
    ∙   Output/exposure gate: 𝑜𝑡 = 𝜎(𝑊 (𝑜) 𝑥𝑡 + 𝑈 (𝑜) ℎ𝑡−1 )
    ∙   New memory cell: 𝑐̃𝑡 = tanh(𝑊 (𝑐) 𝑥𝑡 + 𝑈 (𝑐) ℎ𝑡−1 )
    ∙   Final memory cell: 𝑐𝑡 = 𝑓𝑡 ⊙ 𝑐𝑡−1 + 𝑖𝑡 ⊙ 𝑐̃𝑡
    ∙   Final hidden state: ℎ𝑡 = 𝑜𝑡 ⊙ tanh(𝑐𝑡 )

where 𝑥𝑡 is the input vector at time 𝑡, and ℎ𝑡 denotes the hidden state vector storing all the useful
information at (and before) time 𝑡. The 𝑈 and 𝑊 terms denote the weight matrices for each gate.
The symbol 𝜎 represents the sigmoid activation function, and ⊙ is the element-wise multiplication.
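The six operations above can be illustrated with a toy scalar LSTM step (the weights are illustrative, not trained values, and bias terms are omitted, as in the equations above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U):
    # W and U hold one scalar weight per gate (i, f, o) and candidate (c).
    i_t = sigmoid(W["i"] * x_t + U["i"] * h_prev)        # input gate
    f_t = sigmoid(W["f"] * x_t + U["f"] * h_prev)        # forget gate
    o_t = sigmoid(W["o"] * x_t + U["o"] * h_prev)        # output/exposure gate
    c_tilde = math.tanh(W["c"] * x_t + U["c"] * h_prev)  # new memory cell
    c_t = f_t * c_prev + i_t * c_tilde                   # final memory cell
    h_t = o_t * math.tanh(c_t)                           # final hidden state
    return h_t, c_t

# Toy weights, purely for illustration.
W = {"i": 0.5, "f": 0.5, "o": 0.5, "c": 0.5}
U = {"i": 0.1, "f": 0.1, "o": 0.1, "c": 0.1}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, W=W, U=U)
```

In a real network each gate is a vector operation with matrix-valued 𝑊 and 𝑈 ; the scalar form only makes the gating arithmetic easy to follow.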
   In this paper, we propose to apply a Bi-LSTM neural network [23] instead of a single forward
network. In doing so, we can efficiently make use of past features (via forward states) and future
features (via backward states) for a specific time frame. Finally, we construct our neural network
model by feeding the output vectors of the Bi-LSTM into a conditional random field (CRF) layer
[24] to jointly decode the best sequence of tags. Consider an input sentence 𝑋 = {𝑥0 , 𝑥1 , .., 𝑥𝑛 }
and let 𝑦 = {𝑦1 , 𝑦2 , ..., 𝑦𝑛 } be the corresponding sequence of tags for sentence 𝑋. We consider 𝑃
to be the matrix of scores output by the Bi-LSTM network. 𝑃 is of size 𝑛 × 𝑘, where 𝑘 is the
number of distinct tags and 𝑃𝑖,𝑗 is the score of the 𝑗 𝑡ℎ tag for the 𝑖𝑡ℎ word. The score of a tag
sequence is defined as follows [25]:
$$ s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i} $$

   where A is a matrix of transition scores such that A𝑖,𝑗 represents the score of a transition
from tag 𝑖 to tag 𝑗. 𝑦0 and 𝑦𝑛 are the start and end tags of the sentence, which we add to the
set of possible tags. A is therefore a square matrix of size 𝑘 + 2.
   𝑌 (𝑋) denotes the set of possible tag sequences for 𝑋. The probabilistic model for a sequence
CRF defines a family of conditional probabilities 𝑝(𝑦|𝑋) over all possible tag sequences 𝑦 given
𝑋, of the following form:

$$ p(y|X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y(X)} e^{s(X, \tilde{y})}} $$

During training, the log-probability of the correct tag sequence, log 𝑝(𝑦|𝑋), is maximized.
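For small 𝑘 and 𝑛, the score and its softmax normalization can be checked by brute force. The sketch below uses toy scores and, for brevity, omits the start and end padding tags:

```python
import itertools
import math

def crf_score(A, P, y):
    """s(X, y): transition scores A along the tag path plus emission scores P."""
    s = sum(A[y[i]][y[i + 1]] for i in range(len(y) - 1))  # transition terms
    s += sum(P[i][y[i]] for i in range(len(y)))            # emission terms
    return s

def crf_prob(A, P, y):
    """p(y|X): softmax of s(X, y) over all k^n candidate tag sequences."""
    n, k = len(P), len(P[0])
    Z = sum(math.exp(crf_score(A, P, y_tilde))
            for y_tilde in itertools.product(range(k), repeat=n))
    return math.exp(crf_score(A, P, y)) / Z

# Toy scores: a 3-word sentence with 2 tags. P[i][j] scores tag j for word i,
# and A[i][j] scores the transition from tag i to tag j.
P = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
A = [[0.5, -0.5], [-0.5, 0.5]]
probs = [crf_prob(A, P, y) for y in itertools.product(range(2), repeat=3)]
```

In practice the normalizer Z is computed with the forward algorithm rather than by enumerating all tag sequences; the brute-force version is only feasible for tiny examples.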
  Figure 1 illustrates the main architecture of our neural network model for the medieval
mathematical entity recognition system, in which each word is tagged with other (O) or one of six
entity types: hisāb al-jumal (H-jumal), root (Root), square (Square), cube (Cube), equal (Equal),
and operation (Op). The sentence مال وحج أجذار تعدل س درهما [A square and ḥāj roots are
equal to sīn dirhams] is tagged as {Square Op H-jumal Root Equal H-jumal O}.


3. Equations and Hisāb al-Jumal Calculation in Medieval
   Arabic Algebra
In the following, we translate some of the basic Arabic mathematical terms and notations used
throughout the medieval period into modern symbols:

    ∙   Shay’ (شيء) or jidhr (جذر) refers to the unknown value (𝑥).
    ∙   Māl (مال) and ka’b (كعب) represent 𝑥2 and 𝑥3 , respectively.
    ∙   Powers greater than or equal to four can be formed by combining the two words māl and
        ka’b. For example, māl māl (𝑥4 ), māl ka’b (𝑥5 ), and so on.
    ∙   Dirhams (درهما) or mina al-’adad (من العدد) represent a simple number.
    ∙   The verb ’ādala (عادل) is used to indicate equality ("=") in an equation.
    ∙   The one-letter word wa (و) takes the meaning of the modern addition ("+"), depending
        on the context.
    ∙   The hisāb al-jumal system is often used to write numbers during the medieval Arabic
        period.

   The numerological calculation of hisāb al-jumal terms requires a dictionary approach that
relates every letter of the Arabic alphabet to its numerical equivalent (see Table 1): each letter
is recognized and mapped to its numerical value.
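This dictionary approach can be sketched in Python as follows (the dictionary and function names are ours; the letter values follow Table 1):

```python
# Abjad letter values (Table 1), in the hisāb al-jumal order: 1-9, 10-90, 100-1000.
ABJAD = {
    "ا": 1, "ب": 2, "ج": 3, "د": 4, "ه": 5, "و": 6, "ز": 7, "ح": 8, "ط": 9,
    "ي": 10, "ك": 20, "ل": 30, "م": 40, "ن": 50, "س": 60, "ع": 70, "ف": 80,
    "ص": 90, "ق": 100, "ر": 200, "ش": 300, "ت": 400, "ث": 500, "خ": 600,
    "ذ": 700, "ض": 800, "ظ": 900, "غ": 1000,
    "ى": 10,  # alif maqsura, counted as ya'
}

def adad(word):
    """The 'Adad of a word: the sum of the values of its letters."""
    return sum(ABJAD[letter] for letter in word)
```

For instance, `adad` applied to the chronogram phrase of Section 1 reproduces its three terms: بشرى = 512, لمن = 120, and يتقنها = 566, totaling 1198.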
   Let 𝑆 be a sequence of 𝑛 words {𝑤1 , 𝑤2 , .., 𝑤𝑛 }. To capture the correspondence between a
word 𝑤 and its numerical value 𝑡, we define an alignment 𝑝 as a set of pairs (𝑤, 𝑡), where 𝑤 is a
token in 𝑆 and 𝑡 is the sum of the values of the letters of 𝑤. For example, consider the sentence
𝑆: Mālan wa kab ’ashyā’ ta’dilu nad dirhaman (مالان وكب أشياء تعدل ند درهما) [Two māls and
kab things equal nad dirhams]. Given the above definitions, and knowing that the terms kab
(كب) and nad (ند) are hisāb al-jumal terms, once 𝑡 is calculated for all hisāb al-jumal words
{𝑡1 = (ند, 50+4), 𝑡2 = (كب, 20+2)}, 𝑆 can be rewritten as مالان و22 أشياء تعدل 54 درهما [Two māls
and twenty-two things equal fifty-four dirhams]. The passage from a sequence of words of any
length to its numerical value notation is thus quite easy. Moreover, no ambiguity is possible,
because there is exactly one translation of each 𝑤𝑖 into 𝑡𝑖 .
   Consider the previous example sentence (see Section 2):
            مال وحج أجذار تعدل س درهما,
which was tagged as {Square Op H-jumal Root Equal H-jumal O}. By applying the numerological
calculation of hisāb al-jumal to the words tagged H-jumal, this sentence is transformed into
مال و11 أجذار تعدل 60 درهما.
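The tag-driven rewriting above can be sketched as follows (the token list, tag names, and helper names are illustrative; the letter-value subset follows Table 1):

```python
# Letter values needed for this example (a subset of Table 1).
VALUES = {"ح": 8, "ج": 3, "س": 60}

def adad(word):
    """Sum of the letter values of a hisab al-jumal word."""
    return sum(VALUES[letter] for letter in word)

def translate(tokens, tags):
    """Replace every token tagged H-jumal with its computed numerical value."""
    return [str(adad(w)) if t == "H-jumal" else w for w, t in zip(tokens, tags)]

tokens = ["مال", "و", "حج", "أجذار", "تعدل", "س", "درهما"]
tags = ["Square", "Op", "H-jumal", "Root", "Equal", "H-jumal", "O"]
out = translate(tokens, tags)
```

Here حج (8 + 3) becomes 11 and س becomes 60, matching the transformed sentence above.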


4. Experiments
In this section we first present our proposed architecture, shown in Figure 2, for automatically
recognizing and translating hisāb al-jumal entities in medieval algebraic equations and expressions.
Next, we discuss the construction of the new Hj-Tagged Corpus and the training details,
followed by the results.

4.1. Dataset Construction and Evaluation
We evaluate our proposed system on the Hj-Tagged Corpus, constructed from the AMAK dataset
[15], which consists of medieval-modern equation pairs. We implement a simple dictionary-
based method to detect and replace all numbers in the collected medieval equations (the numbers
are referred to as the 𝑛𝑢𝑚 token) with a random numerical entity in the hisāb al-jumal system.
We also added several algebraic expressions that already have numbers written in the hisāb
al-jumal system, obtained from Al-Khwārizmı̄’s book, 9𝑡ℎ century [26][27], Al-Kāshı̄’s book, 15𝑡ℎ
century [4], and Al-Yazdı̄’s book [28], in order to increase the diversity of our training examples.
The Hj-Tagged Corpus contains 2,262 collected equations with 23,454 words and a vocabulary
of 5,049 words. It has been manually tagged using our own tagset: every word is tagged with O
(other) or one of six entity tags: H-jumal (hisāb al-jumal), Root (root), Square (square), Cube
(cube), Equal (equal), and Op (operation).

Figure 2: The proposed system architecture.
   Table 3 shows some examples from the Hj-Tagged Corpus, which consist of sequences of
words and their tags. For evaluation, we report the precision, recall, and F1 scores for all tagged
entities in the test set.
   To ensure that the model does not see any context from the test set during training, we first
split our collected dataset into training, validation, and test sets, of sizes 2,062, 100, and 100
respectively.
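A possible way to produce such a split is sketched below; only the split sizes come from the paper, while the seed and the shuffling strategy are our assumptions:

```python
import random

def split_dataset(examples, n_val=100, n_test=100, seed=0):
    """Shuffle once, then carve off test and validation sets; the rest is training."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    examples = list(examples)
    rng.shuffle(examples)
    test = examples[:n_test]
    val = examples[n_test:n_test + n_val]
    train = examples[n_test + n_val:]
    return train, val, test

# 2,262 equations -> 2,062 / 100 / 100, as reported for the Hj-Tagged Corpus.
train, val, test = split_dataset(range(2262))
```

Shuffling before splitting keeps the three sets disjoint while avoiding any ordering bias from the way the equations were collected.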

4.2. Training
The Bi-LSTM-CRF model was implemented using TensorFlow and Keras [29], a flexible
neural network library written in Python. The general settings of our neural network model
are listed below:
    ∙   Dimension of the word embedding vectors: 20.
    ∙   Dimension of the hidden layer: 50 (for each LSTM: forward layer and backward layer).
    ∙   Learning method: SGD optimizer, learning rate: 0.01.
    ∙   Number of examples used in each iteration (batch size): 30.
Table 3
Some examples from the Hj-Tagged Corpus.
  Medieval Arabic algebra equation template                      Tag sequences (from left to right)
                    邮K ú¯ ¯ H. Qå• ú¯                                 O Op H-jumal O O
                     èP@ Yg. @ ñ“ ÈYªK ÈAÓ                             Square Equal H-jumal Root
                   P@ Yg. @  ƒ ÈYªK ÈAÓ ð                      H-jumal Square Equal H-jumal Root
                          I.« ÈYªK ÈAÓ                                    Square Equal H-jumal
                                        ªë ð ÈAÓ
           AÒëPX ÉK. @ ÈYªK èP@ Yg. @ Qj                      Square Op H-jumal Root Equal H-jumal O
        AÒëPX AK. @Q¢ ÈYªK èP@ Yg. @ ‘« ð ÈAÓ A¯       H-jumal Square Op H-jumal Root Equal H-jumal O
             P@ Yg. B@ Y’Ë ÈYªK I.ªºË@ ©®“                         H-jumal Cube Equal H-jumal O
   XYªË@ áÓ YJë ð àBAÓ ÈYªK èP@ Yg. @ Y« ð I.ª»         Cube Op H-jumal Root Equal Square Op H-jumal O O
 èP@ Yg. @ ák ð I.ª» l»X ÈYªK ÈAÖÏ @ ÈAÓ QK@ ð àBAÓ   Square Op H-jumal Square Square Equal H-jumal Cube Op H-jumal O
                        ........                                                    .......


    ∙   We fix the dropout rate at 0.5 for all dropout layers throughout all the experiments.
    ∙   Supervised learning was applied with up to 100 epochs for training the network.

4.3. Results and Discussions
In what follows (see Table 4), we use classification metrics such as precision, recall, and F1-score
to evaluate our methods. The F1-score evaluates the performance of the system by comparing
the detections of the machine against the judgments of a human evaluator. It is the harmonic
mean of precision and recall:

                                        2 × 𝑃 𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 × 𝑅𝑒𝑐𝑎𝑙𝑙
                                   𝐹1 =                                                            (1)
                                          𝑃 𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑅𝑒𝑐𝑎𝑙𝑙
   Precision = TP/(TP + FP) is the fraction of all positive predictions that are true positives, while
Recall = TP/(TP + FN) is the fraction of all actual positives that are predicted positive. More
precisely, in this system a true positive (TP) is an Abjad number the system identified correctly,
a false positive (FP) is a token wrongly selected as an Abjad number, and a false negative (FN)
is an Abjad number wrongly classified as not an Abjad number.
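These definitions can be computed directly; the TP/FP/FN counts below are hypothetical, chosen purely for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 97 Abjad numbers found correctly, 1 spurious, 3 missed.
p, r, f1 = precision_recall_f1(tp=97, fp=1, fn=3)
```

Because F1 is a harmonic mean, it stays close to the smaller of precision and recall, penalizing systems that trade one for the other.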
   First of all, we notice that the Bi-LSTM-CRF network performs remarkably well on the Hj-Tagged
Corpus, with a mean F1 score of 98.1%. Additionally, using the same parameters, we compare the
Bi-LSTM-CRF model's performance to that of a plain Bi-LSTM network. Table 4 reports the precision,
recall, and F1 scores of both models; one can see that adding the CRF layer significantly improves
prediction. Besides that, the training phase requires fewer than 60 epochs to converge and in
general takes a few minutes. Finally, our experimental results suggest that the Bi-LSTM-CRF
network is less sensitive to the training data size and to the impact of noise in the tags.
Table 4
Precision, recall, and F1 scores of Bi-LSTM-CRF and Bi-LSTM models on the Hj-Tagged Corpus.
                                Model              Metric      Score
                                Bi-LSTM + CRF      Precision   99.0
                                                   Recall      97.2
                                                   F1          98.1
                                Bi-LSTM            Precision   97.1
                                                   Recall      96.7
                                                   F1          96.9



  This paper focuses on recognizing and tagging the components of a mathematical expression in
medieval Arabic text. First, we want to mention here that our model was trained only on the
Hj-Tagged Corpus. The training set is small, which limits ensemble diversity and may reduce
the network's ability to generalize to new test examples. Second, we did not perform any dataset
preprocessing, apart from replacing every decimal number in the collected equations with a
random numerical entity in the hisāb al-jumal system.
  Another important point is that manually tagging such a limited-vocabulary dataset makes
the system extremely sensitive to noise.
  On the other hand, our model was able to correctly predict sentences containing ambiguities
in the test phase. For example, the word wa (و) can act as the addition operator (tagged Op),
or itself be a hisāb al-jumal entity with the value 6, as in مال و و أجذار [Square Op H-jumal
Root].
    Finally, we implement a simple tag-based method that translates all the hisāb al-jumal terms
detected in the previous step into the modern numeral system, using the methodology described
in Section 3. For example, the sentence مال وحج أجذار تعدل س درهما [Square Op H-jumal Root
Equal H-jumal O] is transformed into مال و11 أجذار تعدل 60 درهما.


5. Conclusion
This paper presented the first attempt to automatically analyze and recognize Abjad numerals
in medieval Arabic mathematical texts, using a Bi-LSTM-CRF model. We also translated the
hisāb al-jumal terms detected in the previous step into the modern numeral system. An additional
key strength of this work is the time and effort spent on manually building a new dataset, the
Hj-Tagged Corpus, which consists of 2,262 tagged medieval mathematical sentences.
   In the future, we can improve the intermediate representations learned by our network by
training the model jointly with named entity recognition (NER) tags. We also plan to enrich
the training examples by expanding the Hj-Tagged Corpus. Another interesting direction is to
apply our model to data from other Arabic sources in many different fields, such as geography,
physics, chemistry, medicine, architecture, astronomy, and so on.
   Experimental results on the Hj-Tagged Corpus demonstrate that the proposed method is an
important step in the analysis of medieval Arabic mathematics, enabling scientists to understand
and explore medieval mathematical texts.


References
 [1] Chrisomalis, S. (2021) "Numerals as Letters: Ludic Language in Chronographic Writing."
 [2] Farooqi, Mehr Afshan. (2003) "The Secret of Letters: Chronograms in Urdu Literary
     Culture." Edebiyat 13.2 : 147-58.
 [3] Ifrah, Georges. (2000) "The universal history of numbers: From prehistory to the invention
     of the computer, translated by David Vellos, EF Harding, Sophie Wood and Ian Monk." .
 [4] Miftāh al Hisāb Ghiyāth al-Dı̄n Jamshı̄d Mas’ud al-Kāshı̄, edited by: A.S al Damardāshı̄ &
     all, Dar al Kitāb al ’Arabi, Le Caire, 1967, 357 pages
 [5] Hinton, Geoffrey E., D. E. Rumelhart, and Ronald J. Williams. (1986) "Learning representa-
     tions by back-propagating errors." Nature 323.9 : 533-536.
 [6] Werbos, Paul J. (1988) "Generalization of backpropagation with application to a recurrent
     gas market model." Neural networks 1.4 : 339-356.
 [7] AlKhwiter, Wasan, and Nora Al-Twairesh. (2021) "Part-of-speech tagging for Arabic tweets
     using CRF and Bi-LSTM." Computer Speech & Language 65 : 101138.
 [8] Kamath, Shilpa, Chaitra Shivanagoudar, and K. G. Karibasappa. (2021) "Part of Speech
     Tagging Using Bi-LSTM-CRF and Performance Evaluation Based on Tagging Accuracy."
     Advances in Computing and Network Communications. Springer, Singapore. 299-310.
 [9] Jin, Guozhe, and Zhezhou Yu. (2021) "A Korean named entity recognition method using
     bi-LSTM-CRF and masked self-attention." Computer Speech & Language 65 : 101134.
[10] Wintaka, Deni Cahya, Moch Arif Bijaksana, and Ibnu Asror. (2019) "Named-entity recog-
     nition on Indonesian tweets using bidirectional LSTM-CRF." Procedia Computer Science
     157 : 221-228.
[11] Zeng, Donghuo, et al. (2017) "LSTM-CRF for drug-named entity recognition." Entropy 19.6
     : 283.
[12] Wei, Qikang, et al. (2016) "Disease named entity recognition by combining conditional
     random fields and bidirectional recurrent neural networks." Database 2016 .
[13] Huang, Zhiheng, Wei Xu, and Kai Yu. (2015) "Bidirectional LSTM-CRF models for sequence
     tagging." arXiv preprint arXiv:1508.01991 .
[14] Anh, Le T., Mikhail Y. Arkhipov, and M. S. Burtsev. (2017) "Application of a Hybrid
     Bi-LSTM-CRF model to the task of Russian Named Entity Recognition." arXiv preprint
     arXiv:1709.09686 .
[15] Djamel, Hadj Mohammed, and Nacéra Bensaou. (2018) "Automatic Extraction of Equations
     in Medieval Arabic Algebra." 2018 IEEE/ACS 15th International Conference on Computer
     Systems and Applications (AICCSA). IEEE.
[16] Gacek, Adam. Arabic manuscripts: a vademecum for readers. Vol. 98. Brill, 2009.
[17] Mikolov, Tomáš, et al. (2010) "Recurrent neural network based language model." Eleventh
     annual conference of the international speech communication association. .
[18] Mikolov, Tomáš, et al. (2011) "Strategies for training large scale neural network language
     models." 2011 IEEE Workshop on Automatic Speech Recognition & Understanding. IEEE.
[19] Jackson, Richard G., et al. (2017) "Natural language processing to extract symptoms of se-
     vere mental illness from clinical text: the Clinical Record Interactive Search Comprehensive
     Data Extraction (CRIS-CODE) project." BMJ open 7.1 : e012012.
[20] Swartz, Jordan, et al. (2017) "Creation of a simple natural language processing tool to sup-
     port an imaging utilization quality dashboard." International journal of medical informatics
     101 : 93-99.
[21] Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. (1994) "Learning long-term depen-
     dencies with gradient descent is difficult." IEEE transactions on neural networks 5.2 :
     157-166.
[22] Hochreiter, Sepp, and Jürgen Schmidhuber. (1997) "Long short-term memory." Neural
     computation 9.8 : 1735-1780.
[23] Graves, Alex, and Jürgen Schmidhuber. (2005) "Framewise phoneme classification with
     bidirectional LSTM networks." Proceedings. 2005 IEEE International Joint Conference on
     Neural Networks, 2005.. Vol. 4. IEEE, .
[24] John Lafferty, Andrew McCallum, and Fernando CN Pereira. (2001). Conditional random
     fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of
     ICML2001, volume 951, pages 282–289.
[25] Lample, Guillaume, et al. "Neural architectures for named entity recognition." arXiv preprint
     arXiv:1603.01360 (2016).
[26] MUŠARRAFA, Alı Mustafı et AHMAD, Muhammad Mursı. Al-Khwārizmı̄. Kitab al-
     mukhtasar fı hisāb al-jabr wa’l-muqabalah, 1939.
[27] Roshdi Rashed, Al-Khwārizmı̄, Le commencement de l’algèbre, éd. 2009.
[28] "Muhammad Bāqir Zayn al-’Ābidı̄n al-Yazdı̄", ’Uyun al-Hissab (Les Fontaines du calcul),
     Manuscript undated - Harvard University, http://pds.lib.harvard.edu/pds/view/11328976?
     n=1&imagesize=1200&jp2Res=.25&printThumbnails=no.
[29] TensorFlow's implementation of the Keras API, https://www.tensorflow.org/guide/keras.