<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Constructing an Annotated Resource for Part-Of-Speech Tagging of Mishnaic Hebrew</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emiliano Giovannetti</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Albanesi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Bellandi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Marchi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Pecchioli</string-name>
        </contrib>
        <aff>Istituto di Linguistica Computazionale, Via G. Moruzzi, Pisa, name.surname@ilc.cnr.it</aff>
        <aff>Progetto Traduzione Talmud Babilonese S.c.a r.l., Lungotevere Sanzio, Roma, alepec@gmail.com</aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces research on Part-Of-Speech tagging of Mishnaic Hebrew carried out within the Babylonian Talmud Translation Project. Since no tagged resource was available to train a stochastic POS tagger, a portion of the Mishna of the Babylonian Talmud has been morphologically annotated using an ad hoc tool connected to the database containing the Talmudic text being translated. The final aim of this research is to add linguistic support to the Translation Memory System of Traduco, the Computer-Assisted Translation tool developed and used within the Project.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The present work has been conducted within the Babylonian Talmud Translation Project (in Italian, Progetto Traduzione Talmud Babilonese, PTTB), which aims at the translation of the Babylonian Talmud (BT) into Italian.</p>
      <p>
        The translation is being carried out with the
aid of tools for text and language processing
integrated into an application, called Traduco
        <xref ref-type="bibr" rid="ref5">(Bellandi et al., 2016)</xref>
        , developed by the
Institute of Computational Linguistics “Antonio
Zampolli” of the CNR in collaboration with
the PTTB team. Traduco is a collaborative
computer-assisted translation (CAT) tool
conceived to ease the translation, revision and
editing of the BT.
      </p>
      <p>The research described here fits exactly in
this context: we want to provide the system
with additional informative elements as a
further aid in the translation of the Talmud. In
particular, we intend to linguistically analyze
the Talmudic text starting from the automatic
attribution of the Part-Of-Speech to words by
adopting a stochastic POS tagging approach.</p>
      <p>The first difficulty that emerged concerns the text and the languages it contains. Simplifying, we can say that the Babylonian Talmud is essentially composed of two languages which, in turn, correspond to two distinct texts: the Mishna and the Gemara. The former is the older of the two and is written in Mishnaic Hebrew, one of the most homogeneous and coherent languages appearing in the Talmud; for this reason, it was chosen as the starting point for the POS tagging experiment.</p>
      <p>The main purpose of linguistic analysis in
the context of our translation project is to
improve the suggestions provided by the
system through the so-called Translation Memory
(TM).</p>
      <p>Moreover, a linguistically annotated text makes it possible to carry out linguistically informed searches, which are useful both for the scholar (in this case a Talmudist) and, during the translation work, for the reviser and the curator, who can, for example, perform bulk edits of polysemous words by discarding the occurrences with an undesired POS.</p>
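      <p>As an illustration of such a POS-based search, the following minimal sketch filters the occurrences of an ambiguous word form by the POS assigned to each of them. The token representation (dictionaries with the hypothetical keys <monospace>form</monospace>, <monospace>pos</monospace> and <monospace>sentence_id</monospace>), the tag names and the function name are assumptions made for this example and do not reflect the actual data model of Traduco.</p>
      <preformat preformat-type="code">
```python
def filter_by_pos(tokens, form, wanted_pos):
    """Return the occurrences of `form` whose POS tag is in `wanted_pos`."""
    return [t for t in tokens if t["form"] == form and t["pos"] in wanted_pos]


# Toy data: transliterated placeholder forms, not taken from the corpus.
tokens = [
    {"form": "dbr", "pos": "noun", "sentence_id": 1},
    {"form": "dbr", "pos": "verb", "sentence_id": 2},
    {"form": "kl", "pos": "adjective", "sentence_id": 2},
]

# Keep only the nominal occurrences of the ambiguous form, discarding the rest.
noun_hits = filter_by_pos(tokens, "dbr", {"noun"})
```
      </preformat>
      <p>In a bulk-editing scenario, only the occurrences returned by such a filter would be offered to the reviser for modification.</p>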
      <p>The rest of the paper is organized as follows: Section 2 summarizes the state of the art in NLP of Hebrew. The construction of the linguistically annotated corpus is described in Section 3. The training and evaluation of the POS taggers used in the experiments are detailed in Section 4. Lastly, Section 5 outlines the next steps of the research.</p>
    </sec>
    <sec id="sec-2">
      <title>State of the art</title>
      <p>
        The aforementioned linguistic richness and the
intrinsic complexity of the Babylonian Talmud
make automatic linguistic analysis of the BT
particularly hard
        <xref ref-type="bibr" rid="ref4">(Bellandi et al., 2015)</xref>
        .
      </p>
      <p>
        However, some linguistic resources for ancient Hebrew and Aramaic have been (and are being) developed, among which we cite: i) the Hebrew Text Database (ETCBC) (Van Peursen and Sikkel, 2014), accessible through SHEBANQ<sup>1</sup>, an online environment for the study of Biblical Hebrew (with emphasis on syntax) developed by the Eep Talstra Centre for Bible and Computer of the Vrije Universiteit in Amsterdam; ii) the Historical Dictionary<sup>2</sup> project of the Academy of the Hebrew Language of Israel; iii) the Comprehensive Aramaic Lexicon (CAL)<sup>3</sup>, developed by the Hebrew Union College of Cincinnati; iv) the Digital Mishna<sup>4</sup> project, concerning the creation of a digital scholarly edition of the Mishna, conducted by the Maryland Institute for Technology in the Humanities.
      </p>
      <p>
        Apart from the aforementioned resources, to date no NLP tools are available that are suitable for processing ancient North-Western Semitic languages, such as the different Aramaic idioms and the historical variants of Hebrew attested in the BT. The only existing projects and tools for the processing of Jewish languages
        <xref ref-type="bibr" rid="ref11">(Kamir et al., 2002)</xref>
        <xref ref-type="bibr" rid="ref7">(Cohen and Smith, 2007)</xref>
        have been developed for Modern Hebrew, a language that has been artificially revitalized since the end of the 19th century and that does not correspond to the idioms recurring in the BT. Among them we cite HebTokenizer<sup>5</sup> for tokenization; MILA
        <xref ref-type="bibr" rid="ref3">(Bar-haim et al., 2008)</xref>
        , HebMorph<sup>6</sup>, MorphTagger<sup>7</sup> and NLPH<sup>8</sup> for morphological analysis and lemmatization; and yap<sup>9</sup>, hebdepparser<sup>10</sup> and UD_Hebrew<sup>11</sup> for syntactic analysis. We conducted some preliminary tests, starting with MILA’s (ambiguous) morphological analyzer, applied to the three main languages of the Talmud:
        <fn id="fn1"><label>1</label><p>shebanq.ancient-data.org</p></fn>
        <fn id="fn2"><label>2</label><p>maagarim.hebrew-academy.org.il</p></fn>
        <fn id="fn3"><label>3</label><p>cal.huc.edu</p></fn>
        <fn id="fn4"><label>4</label><p>www.digitalmishnah.org</p></fn>
      </p>
      <list list-type="order">
        <list-item>
          <p>Aramaic: Hebrew and Aramaic are different languages. There are even some cases in which the very same root has different semantics in the two languages. In addition, MILA did not recognize many Aramaic roots and tagged the words derived from them as proper nouns.</p>
        </list-item>
        <list-item>
          <p>Biblical Hebrew: MILA recognized most of the words, since Modern Hebrew has preserved almost the entire biblical lexicon. However, the syntax of Modern Hebrew is quite different from that of Biblical Hebrew, leading MILA to output wrong analyses.</p>
        </list-item>
        <list-item>
          <p>
            Mishnaic Hebrew: this is the language on which MILA performed best. Modern Hebrew inherits some of the morphosyntactic features of Mishnaic Hebrew; however, the two idioms differ substantially in the lexicon, since many archaic words have been lost in Modern Hebrew
            <xref ref-type="bibr" rid="ref16">(Skolnik and Berenbaum, 2007)</xref>
            .
          </p>
        </list-item>
      </list>
      <p>In the light of the above, we decided to create a novel linguistically annotated resource to start developing our own tools for the processing of ancient Jewish languages. In the next section, we describe how the resource was built.</p>
    </sec>
    <sec id="sec-3">
      <title>Building the resource</title>
      <p>The linguistic annotation of Semitic languages poses several problems. Although we discuss here the analysis of Hebrew, many of the critical points that must be taken into account are common to other languages belonging to the same family. As already mentioned in the previous section, the first problem concerns access to existing linguistic resources and analytical tools which, in the case of Hebrew, are available exclusively for the modern language.
        <fn id="fn5"><label>5</label><p>www.cs.bgu.ac.il/~yoavg/software/hebtokenizer</p></fn>
        <fn id="fn6"><label>6</label><p>code972.com/hebmorph</p></fn>
        <fn id="fn7"><label>7</label><p>www.cs.technion.ac.il/~barhaim/MorphTagger</p></fn>
        <fn id="fn8"><label>8</label><p>github.com/NLPH/NLPH</p></fn>
        <fn id="fn9"><label>9</label><p>github.com/habeanf/yap</p></fn>
        <fn id="fn10"><label>10</label><p>tinyurl.com/hebdepparser</p></fn>
        <fn id="fn11"><label>11</label><p>github.com/UniversalDependencies/UD_Hebrew</p></fn>
      </p>
      <p>One of the major challenges posed by the morphological analysis of Semitic languages is the orthographic disambiguation of words. Since writing is almost exclusively consonantal, every word can have multiple readings. The problem of orthographic ambiguity, crucial in all studies on large corpora (typically in Hebrew and Modern Arabic), is far less severe when the text under examination is vocalized.</p>
      <p>The edition of the Talmud used in the project is vocalized and the text is, consequently, orthographically unambiguous. An additional critical aspect is the definition of the tagset. Most computational studies on language analysis have been conducted on Indo-European languages (especially English).</p>
      <p>
        As a result, it may be difficult to reuse tagsets created for these languages. Not surprisingly, there is still much discussion about how best to classify some parts of speech, and each language has its own open issues. Each tagset must ultimately be created in the light of a specific purpose. For example, the tagging of the (Modern) Hebrew Treebank developed at the Technion
        <xref ref-type="bibr" rid="ref15">(Sima’an et al., 2001)</xref>
        was syntax-oriented, while the work on Hebrew participles described in
        <xref ref-type="bibr" rid="ref1">(Adler et al., 2008)</xref>
        was more lexicon-oriented. We considered adopting the tagset used in the already cited Universal Dependencies corpus for Hebrew. However, its 16 tags appeared too coarse-grained for our purposes.<sup>12</sup> In particular, the UD tagset lacks all the prefix tags that we needed. For this reason we decided to define our own tagset.
      </p>
      <p>Once the tagset has been defined, it remains to decide which grammatical category is the most suitable to associate with each token. Essentially two types of information can be collected, and the problem is whether and how both can be kept: i) the definition of the token from a syntagmatic perspective (i.e. what the token represents in context) and ii) the lexical information that the token gives by itself (without context).
        <fn id="fn12"><label>12</label><p>github.com/UniversalDependencies/UD_HebrewHTB/blob/master/stats.xml</p></fn>
To give a couple of examples:
• Verb/noun: רידִ מַ הַ ת אֶ תּ שְּ אִ → is רידִ מַ הַ “the
one who makes a vow” or “the vowing”?
(the one who consecrates his wife): should
it be assigned to verb or noun category?
• Adjective/verb: ם אִ ןי לִכ יְּ לי חִ תְּ הַ לְּ רמ גְּ לִוְּ ד עַ
אֹל שֶ וּעי גִיַ הרָ וּשּׁ לַ - וּלי חִ תְּיַ → is ןי לִכ יְּ
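adjective or verb (given that most dictionaries of Mishnaic Hebrew provide both options)?</p>
      <p>One could debate which category would be the best in each case and why but, for now, we decided to keep both by introducing two parallel annotations: by “category” (without context) and by “function” (in context). The tagset used for this work comprises the following tags (Italian abbreviations, with English glosses): agg. (adjective), avv. (adverb), cong. (conjunction), interiez. (interjection), nome pr. (proper noun), num. card. (cardinal numeral), num. ord. (ordinal numeral), pref. art. (article prefix), pref. cong. (conjunction prefix), pref. prep. (preposition prefix), pref. pron. rel. (relative pronoun prefix), prep. (preposition), pron. dim. (demonstrative pronoun), pron. indef. (indefinite pronoun), pron. interr. (interrogative pronoun), pron. pers. (personal pronoun), pron. suf. (suffix pronoun), punt. (punctuation), sost. (noun), vb. (verb).</p>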
      <p>
        The tagset could also be refined by adding the tags interrogative, modal, negation, and quantifier
        <xref ref-type="bibr" rid="ref2">(Adler, 2007)</xref>
        <xref ref-type="bibr" rid="ref12">(Netzer and Elhadad, 1998)</xref>
        <xref ref-type="bibr" rid="ref13">(Netzer et al., 2007)</xref>
        .
      </p>
      <p>As anticipated, in order to build the
morphologically annotated resource, all of the
Mishna sentences were extracted from the
Talmud and annotated using an ad hoc developed
Web application (Fig. 1).</p>
      <p>All the annotations have been made with
the aim of training a stochastic POS tagger in
charge of the automatic analysis of the entire
Mishna: to obtain a good accuracy it was thus
necessary to manually annotate as many
sentences as possible. To date, 10442 tokens have
been annotated.</p>
      <p>The software created for the annotation
shows, in a tabular form, the information of
the analysis carried out on a sentence by
sentence basis.</p>
      <p>Once a sentence is selected for annotation, the system checks whether the tokens composing it have already been analyzed and, if so, computes a possible subdivision into subtokens (i.e. the stems, prefixes and suffixes constituting each word) by exploiting previous annotations. If the system finds that a word is associated with multiple different annotations, it proposes the most frequent one.</p>
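      <p>The proposal step described above can be sketched as a frequency-based lookup over previously confirmed annotations. This is only an illustrative sketch: the class <monospace>AnnotationMemory</monospace>, its methods and the representation of an analysis as a list of (subtoken, tag) pairs are assumptions made for the example, not the implementation of the annotation tool, and the word forms are transliterated placeholders.</p>
      <preformat preformat-type="code">
```python
from collections import Counter, defaultdict


class AnnotationMemory:
    """Stores confirmed analyses and proposes the most frequent one per word."""

    def __init__(self):
        # word form -> Counter of analyses, each a tuple of (subtoken, tag) pairs
        self._seen = defaultdict(Counter)

    def record(self, word, analysis):
        """Register an analysis confirmed by the annotator."""
        self._seen[word][tuple(analysis)] += 1

    def propose(self, word):
        """Return the most frequent known analysis, or None for an unseen word."""
        counts = self._seen.get(word)
        if not counts:
            return None
        best_analysis, _ = counts.most_common(1)[0]
        return list(best_analysis)


memory = AnnotationMemory()
as_noun = [("w", "pref. cong."), ("h", "pref. art."), ("mlk", "sost.")]
as_verb = [("w", "pref. cong."), ("h", "pref. art."), ("mlk", "vb.")]
memory.record("whmlk", as_noun)
memory.record("whmlk", as_noun)
memory.record("whmlk", as_verb)

proposal = memory.propose("whmlk")
```
      </preformat>
      <p>Here the nominal analysis has been recorded twice and the verbal one once, so the former is proposed; the annotator remains free to override the suggestion.</p>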
      <p>
        Regarding the linguistic annotation, the
grammar of Pérez Fernández
        <xref ref-type="bibr" rid="ref8">(Fernández and
Elwolde, 1999)</xref>
        was adopted and, for
lemmatization, the dictionary of M. Jastrow
        <xref ref-type="bibr" rid="ref10">(Jastrow,
1971)</xref>
        .
      </p>
      <p>The software makes it possible to gather as much information as possible for each word by providing a double annotation: by “category”, to represent the POS from a grammatical point of view, and by “function”, to describe the function the word assumes in its context. For the POS tagging experiments described below, we used the annotation by “function”.</p>
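      <p>One possible way to represent the double annotation, and to derive from it the word/tag pairs used to train a tagger on the “function” layer, is sketched below. The record layout, the field names and the helper <monospace>training_pairs</monospace> are hypothetical, and the forms, lemmas and tag values shown are purely illustrative placeholders rather than annotations taken from the corpus.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedToken:
    form: str      # surface form (transliterated placeholder)
    lemma: str     # dictionary entry
    category: str  # POS of the word taken by itself, without context
    function: str  # POS of the word in its context


def training_pairs(sentence, layer="function"):
    """Project a sentence onto (form, tag) pairs for the chosen layer."""
    return [(token.form, getattr(token, layer)) for token in sentence]


sentence = [
    AnnotatedToken(form="hmdyr", lemma="ndr", category="vb.", function="sost."),
    AnnotatedToken(form="at", lemma="at", category="prep.", function="prep."),
]

function_pairs = training_pairs(sentence)
category_pairs = training_pairs(sentence, layer="category")
```
      </preformat>
      <p>Keeping both layers in the same record means that either one can be exported for training without re-annotating the text.</p>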
    </sec>
    <sec id="sec-4">
      <title>Training and testing of POS taggers</title>
      <p>
        Once the Mishnaic corpus had been linguistically annotated, three of the most widely used POS tagging algorithms were trained and evaluated: HunPos
        <xref ref-type="bibr" rid="ref9">(Halácsy et al., 2007)</xref>
        , the Stanford Log-linear Part-Of-Speech Tagger
        <xref ref-type="bibr" rid="ref17">(Toutanova et al., 2003)</xref>
        , and TreeTagger
        <xref ref-type="bibr" rid="ref14">(Schmid, 1994)</xref>
        . All three implement supervised stochastic models and, consequently, need to be trained on a manually annotated corpus.
      </p>
      <p>
        To evaluate the accuracy of the algorithms we adopted k-fold cross-validation
        <xref ref-type="bibr" rid="ref6">(Brink et al., 2016)</xref>
        , with k set to 10, thus dividing the corpus into 10 partitions.
      </p>
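      <p>The evaluation scheme can be sketched as follows. The sketch assumes that the corpus is split at the sentence level and that accuracy is computed over all tokens of all test folds; neither detail is stated in the text. A simple most-frequent-tag baseline stands in for the tagger, since HunPos, the Stanford tagger and TreeTagger are external tools; all function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import random
from collections import Counter, defaultdict


def k_fold_splits(sentences, k=10, seed=0):
    """Yield (train, test) pairs; every sentence is tested exactly once."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, folds[i]


def train_baseline(train):
    """Stand-in tagger: most frequent tag per word, global fallback otherwise."""
    per_word = defaultdict(Counter)
    all_tags = Counter()
    for sentence in train:
        for word, tag in sentence:
            per_word[word][tag] += 1
            all_tags[tag] += 1
    default = all_tags.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lambda word: lexicon.get(word, default)


def cross_validate(sentences, k=10):
    """Token-level accuracy accumulated over the k test folds."""
    correct = total = 0
    for train, test in k_fold_splits(sentences, k):
        tagger = train_baseline(train)
        for sentence in test:
            for word, gold in sentence:
                correct += int(tagger(word) == gold)
                total += 1
    return correct / total


# Toy corpus of 20 identical two-token sentences (placeholder forms).
corpus = [[("w1", "sost."), ("w2", "vb.")] for _ in range(20)]
accuracy = cross_validate(corpus, k=10)
```
      </preformat>
      <p>To evaluate a real tagger, <monospace>train_baseline</monospace> would be replaced by a wrapper that trains the external tool on the training folds and tags the held-out fold.</p>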
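      <p>Table 1 summarizes the results of the experiment by showing the tagging accuracy of the three tested algorithms. With slightly more than ten thousand tokens, the Stanford POS tagger provided the best results, outperforming HunPos and TreeTagger with an accuracy of 87.90%.</p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Tagging accuracy</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Stanford</th>
              <th>HunPos</th>
              <th>TreeTagger</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>87.90%</td>
              <td>86.34%</td>
              <td>86.74%</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>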
    </sec>
    <sec id="sec-5">
      <title>Next steps</title>
      <p>In this work, the tagging experiments have been limited to the attribution of the Part-Of-Speech: the next natural step will be the addition of the lemma. Furthermore, we will try to modify the parameters affecting the behaviour of the three adopted POS taggers (left at their default values for the experiments) and see how they influence the results.</p>
      <p>Once the Mishna has been lemmatized, Traduco, the software used to translate the Talmud into Italian, will be able to exploit this additional information, mainly to provide translators with translation suggestions based on lemmas, but also to allow users to query the Mishnaic text by POS and lemma.</p>
      <p>As a further step we will also address the linguistic annotation of the portions of the Babylonian Talmud written in other languages, starting from Babylonian Aramaic, the language of the Gemara, which constitutes the later portion of the Talmud.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was conducted in the context of the
TALMUD project and the scientific
cooperation between S.c.a r.l. PTTB and ILC-CNR.</p>
      <p>Wido Van Peursen and Constantijn Sikkel. 2014. Hebrew Text Database ETCBC4. Type: dataset.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Meni</given-names>
            <surname>Adler</surname>
          </string-name>
          , Yael Netzer, Yoav Goldberg, David Gabay, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Elhadad</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Tagging a Hebrew corpus: the case of participles</article-title>
          .
          In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, editors,
          <source>Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)</source>
          , Marrakech, Morocco, May.
          <publisher-name>European Language Resources Association (ELRA)</publisher-name>
          . http://www.lrec-conf.org/proceedings/lrec2008/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Menahem (Meni)</given-names>
            <surname>Adler</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Hebrew Morphological Disambiguation: An Unsupervised Stochastic Word-based Approach</article-title>
          .
          <source>PhD Thesis</source>
          , Ben-Gurion University of the Negev.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Roy</given-names>
            <surname>Bar-haim</surname>
          </string-name>
          , Khalil Sima'an, and
          <string-name>
            <given-names>Yoad</given-names>
            <surname>Winter</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Part-of-speech Tagging of Modern Hebrew Text</article-title>
          .
          <source>Nat. Lang. Eng.</source>
          ,
          <volume>14</volume>
          (
          <issue>2</issue>
          ):
          <fpage>223</fpage>
          -
          <lpage>251</lpage>
          , April.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Bellandi</surname>
          </string-name>
          , Alessia Bellusci, and
          <string-name>
            <given-names>Emiliano</given-names>
            <surname>Giovannetti</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Computer Assisted Translation of Ancient Texts: the Babylonian Talmud Case Study</article-title>
          .
          <source>In Natural Language Processing and Cognitive Science, Proceedings</source>
          <year>2014</year>
          , Berlin/Munich. De Gruyter Saur.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Bellandi</surname>
          </string-name>
          , Davide Albanesi, Giulia Benotto, and
          <string-name>
            <given-names>Emiliano</given-names>
            <surname>Giovannetti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Il Sistema Traduco nel Progetto Traduzione del Talmud Babilonese</article-title>
          .
          <source>IJCoL</source>
          , Vol.
          <volume>2</volume>
          , n.
          <issue>2</issue>
          , December 2016. Special Issue on “NLP and Digital Humanities”. Accademia University Press.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Henrik</given-names>
            <surname>Brink</surname>
          </string-name>
          , Joseph Richards, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Fetherolf</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <source>Real-World Machine Learning</source>
          .
          <publisher-name>Manning Publications Co.</publisher-name>
          ,
          <publisher-loc>Greenwich, CT, USA</publisher-loc>
          , 1st edition.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Shay B.</given-names>
            <surname>Cohen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Noah A.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Joint Morphological and Syntactic Disambiguation</article-title>
          .
          <source>In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Miguel</given-names>
            <surname>Pérez Fernández</surname>
          </string-name>
          and
          <string-name>
            <given-names>John F.</given-names>
            <surname>Elwolde</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>An Introductory Grammar of Rabbinic Hebrew</article-title>
          . Interactive Factory, Leiden, The Netherlands.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Péter</given-names>
            <surname>Halácsy</surname>
          </string-name>
          , András Kornai, and
          <string-name>
            <given-names>Csaba</given-names>
            <surname>Oravecz</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>HunPos: An Open Source Trigram Tagger</article-title>
          .
          <source>In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions</source>
          ,
          <source>ACL '07</source>
          , pages
          <fpage>209</fpage>
          -
          <lpage>212</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Marcus</given-names>
            <surname>Jastrow</surname>
          </string-name>
          .
          <year>1971</year>
          .
          <article-title>A dictionary of the Targumim, the Talmud Babli and Yerushalmi, and the Midrashic literature</article-title>
          . Judaica Press.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Dror</given-names>
            <surname>Kamir</surname>
          </string-name>
          , Naama Soreq, and
          <string-name>
            <given-names>Yoni</given-names>
            <surname>Neeman</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew</article-title>
          .
          <source>In Proceedings of the ACL-02 Workshop on Computational Approaches to Semitic Languages, SEMITIC '02</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Yael</given-names>
            <surname>Dahan Netzer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Elhadad</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Generating Determiners and Quantifiers in Hebrew</article-title>
          .
          <source>In Proceedings of the Workshop on Computational Approaches to Semitic Languages, Semitic '98</source>
          , pages
          <fpage>89</fpage>
          -
          <lpage>96</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Yael</given-names>
            <surname>Netzer</surname>
          </string-name>
          , Meni Adler, David Gabay, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Elhadad</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Can You Tag the Modal? You Should</article-title>
          .
          <source>In Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          , Prague, Czech Republic.
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Helmut</given-names>
            <surname>Schmid</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Part-of-speech tagging with neural networks</article-title>
          .
          <source>In Proceedings of the 15th Conference on Computational Linguistics - Volume 1, COLING '94</source>
          , pages
          <fpage>172</fpage>
          -
          <lpage>176</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Khalil</given-names>
            <surname>Sima'an</surname>
          </string-name>
          , Alon Itai, Yoad Winter, Alon Altman, and
          <string-name>
            <given-names>Noa</given-names>
            <surname>Nativ</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Building a treebank of Modern Hebrew text</article-title>
          .
          <source>TAL. Traitement automatique des langues</source>
          ,
          <volume>42</volume>
          (
          <issue>2</issue>
          ):
          <fpage>347</fpage>
          -
          <lpage>380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Fred</given-names>
            <surname>Skolnik</surname>
          </string-name>
          and Michael Berenbaum, editors.
          <year>2007</year>
          .
          <source>Encyclopaedia Judaica</source>
          , vol.
          <volume>8</volume>
          .
          <publisher-name>Macmillan Reference USA</publisher-name>
          ,
          <edition>2</edition>
          edition. Chaim Brovender, Joshua Blau, Eduard Y. Kutscher, Yochanan Breuer, and Eli Eytan, s.v. “Hebrew Language”.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Dan Klein,
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yoram</given-names>
            <surname>Singer</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Feature-rich Part-of-speech Tagging with a Cyclic Dependency Network</article-title>
          .
          <source>In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL '03</source>
          , pages
          <fpage>173</fpage>
          -
          <lpage>180</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>