<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Text Normalization and Spelling Correction in Kazakh Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gaukhar Slamova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meruyert Mukhanova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Suleyman Demirel University</institution>
          ,
          <addr-line>Engineering and Natural Sciences, Information Systems, 040900, Kaskelen, Almaty</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Text normalization is a significant step in preprocessing informal, social media, and short texts for Natural Language Processing (NLP) tasks. Research in the field focuses mostly on English rather than on agglutinative languages such as Kazakh, Korean, and Japanese, which are morphologically rich and more complex than English. In this paper, we present text normalization and automatic word correction for the Kazakh language: we convert informal text into a grammatically correct form. For the auto-correction task, we first consider keyboard errors made while typing words, then choose the best match among the resulting candidates. Additionally, we categorize words into several groups and separate the text into modules of words. The exact match score of the overall system on the provided datasets is 85.40 per cent.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Text normalization is the task of transforming informal writing into its standard
form in the language. It is an important processing step for a wide range of Natural
Language Processing (NLP) tasks such as text-to-speech synthesis, speech
recognition, information extraction, parsing, and machine translation
        <xref ref-type="bibr" rid="ref19">(Richard Sproat, Alan
W. Black, Stanley F. Chen, Shankar Kumar, Mari Ostendorf, Christopher Richards,
2001)</xref>
        . Text normalization involves merging different written forms of a token into a
canonical normalized form; for example, a document may contain the equivalent
tokens “Mr.”, “Mr”, “mister”, and “Mister” that would all be normalized to a single
form
        <xref ref-type="bibr" rid="ref15">(Nitin Indurkhya, Fred J. Damerau, 2010)</xref>
        .
      </p>
      <p>Normalization poses multiple challenges. It is the task of mapping all
out-of-vocabulary non-standard word tokens to in-vocabulary standard forms. To
accomplish it, we convert raw text into grammatically correct sentences by modifying
punctuation and capitalization, and by adding, removing, or reordering words. We also
assign specific values to certain token types such as dates, phone numbers, currency
amounts, and URLs. Informal texts usually contain many mistakes, so it is useful to
correct them. For the spelling correction task, we consider keyboard typing mistakes,
character repetition, and other error sources. In this paper, we propose spelling
correction and text preprocessing based on the techniques mentioned above, which
yield higher accuracy than the other approaches we evaluated.</p>
      <p>The rest of this paper is organized as follows. In Section 2 we discuss previous
approaches to the normalization problem. Section 3 presents our normalization
framework, including the actual normalization and learning procedures. In Section 4 we
introduce the evaluation metric and present experimental results of our model with
respect to several categories. Finally, we conclude in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>Early studies of text normalization applied machine learning to
text-to-speech and social media, including neural network approaches. In this paper, we use
a method similar to those of works that investigated text normalization in social media:
because of the recent rise of heavily informal writing in messaging applications, text
normalization is a major problem for every language.</p>
      <p>
        Previous works handled text normalization by treating noisy text as normalized
text that has passed through a noisy channel; this approach is called the noisy channel
model.
        <xref ref-type="bibr" rid="ref13">(Eric Brill and Robert C. Moore, 2000)</xref>
        presented a method for modelling
spelling correction as a noisy channel based on string-to-string edits; this model
gave significant improvements over earlier studies.
        <xref ref-type="bibr" rid="ref8">(Kristina Toutanova and
Robert C. Moore, 2002)</xref>
        enhanced the string-to-string edit model by modelling
pronunciation similarities between words and achieved a substantial performance
improvement over the previous best-performing models for spelling correction.
        <xref ref-type="bibr" rid="ref12">(Monojit
Choudhury, Rahul Saraf, Vijit Jain, Animesh Mukherjee, Sudeshna Sarkar, and
Anupam Basu, 2007)</xref>
        introduced a supervised HMM channel model that adopted
the spell-checking metaphor based on character-level edits. It was extended by
        <xref ref-type="bibr" rid="ref18">(Paul Cook and Suzanne Stevenson, 2009)</xref>
        , who used an unsupervised noisy channel
model with probabilistic models for common abbreviations and various types of
spelling errors.
        <xref ref-type="bibr" rid="ref7">(Catherine Kobus, François Yvon, and Géraldine Damnati, 2008)</xref>
        normalized French
SMS messages by normalizing the orthography with a
combination of statistical machine translation and automatic speech recognition approaches.
        <xref ref-type="bibr" rid="ref2">(Bo Han and Timothy Baldwin, 2011)</xref>
        presented a model for identifying and
normalizing ill-formed words, generating correction candidates based on morphophonemic
similarity over an SMS corpus and Twitter.
        <xref ref-type="bibr" rid="ref6">(Joseph Kaufmann and Jugal Kalita, 2010)</xref>
        used a machine translation approach with a pre-processor for syntactic
rather than lexical normalization.
        <xref ref-type="bibr" rid="ref10">(Deana Pennell and Yang Liu, 2011)</xref>
        presented a two-phase method
for expanding abbreviations: the first phase uses a machine translation system trained
at the character level, and the second phase applies an in-domain
language model in the context of neighbouring words.
        <xref ref-type="bibr" rid="ref4">(Fei Liu, Fuliang Weng, and Xiao
Jiang, 2012)</xref>
        proposed a cognitively driven normalization system that integrates
different human perspectives in normalizing nonstandard tokens, including
enhanced letter transformation, visual priming, and string/phonetic similarity.
      </p>
      <p>
        Fewer studies have been done on agglutinative languages than on English.
        <xref ref-type="bibr" rid="ref5">(Gülşen Eryiğit, Dilara Torunoğlu-Selamet, 2017)</xref>
        introduced social media text
normalization for Turkish by analyzing Web 2.0 Turkish texts, categorizing them into
seven types, and providing candidate spelling corrections.
        <xref ref-type="bibr" rid="ref11">(Mohammad Saloot,
Norisma Idris, Rohana Mahmud, 2014)</xref>
        proposed an approach to normalizing Malay
Twitter messages based on corpus-driven analysis.
        <xref ref-type="bibr" rid="ref17">(Panchapagesan Krishnamurthy,
P.P. Talukdar, N Sridhar, A.G. Ramakrishnan, 2004)</xref>
        described a novel approach to Hindi
text normalization in which tokenization and initial token classification are combined
into one stage, followed by a second level of token sense disambiguation.
        <xref ref-type="bibr" rid="ref16">(O. De Clercq, B. Desmet, S. Schulz, E. Lefever, V. Hoste, 2013)</xref>
        used a multi-module
approach that relies on machine translation and a transliteration-based system for
social media messages in Dutch. Agglutinative languages tend to have
longer words than fusional ones
        <xref ref-type="bibr" rid="ref20">(Steffen Eger et al., 2016)</xref>
        , and a spelling correction
model for them is more complex because of the morphology.
      </p>
      <p>To our knowledge, the work presented here is the first to study
normalization for the Kazakh language using an auto-correction methodology and value
categorization.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Normalization Framework</title>
      <p>In this section we introduce our normalization framework, which considers both
spelling correction and text preprocessing. Morphologically rich languages
such as Kazakh, Korean, Finnish, Arabic, and Turkish are highly
inflectional: one stem in these languages may have hundreds
of possible forms.</p>
      <sec id="sec-3-1">
        <title>3.1 Spelling Correction</title>
        <p>Spelling errors are categorized into two classes: typographic and cognitive.
Cognitive errors arise from the phonetic or orthographic similarity of words, when a
person does not know how to spell a word. Typographic errors are related to the
keyboard and to hand or finger movement: they happen because the keys for two letters
are close together on the keyboard
          <xref ref-type="bibr" rid="ref9">(Kukich, 1992)</xref>
          .</p>
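        <p>The notion of key closeness can be made concrete with an adjacency map. The following is a minimal sketch, not the implementation used in this work: the key rows are an assumed rendering of the standard Kazakh Cyrillic layout and should be checked against the actual keyboard, and the names <monospace>ROWS</monospace>, <monospace>build_adjacency</monospace> and <monospace>is_typographic_substitution</monospace> are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Assumed key rows of the standard Kazakh Cyrillic layout, top to bottom.
# This layout is an assumption for illustration; adjust it to the real keyboard.
ROWS = [
    list('"әіңғ,.үұқөһ'),  # digit row, keys 1 through =
    list("йцукенгшщзхъ"),
    list("фывапролджэ"),
    list("ячсмитьбю"),
]


def build_adjacency(rows):
    """Map each key to the set of keys that physically touch it."""
    adjacent = {key: set() for row in rows for key in row}
    for r, row in enumerate(rows):
        for k, key in enumerate(row):
            # Left and right neighbours in the same row.
            for j in (k - 1, k + 1):
                if 0 <= j < len(row):
                    adjacent[key].add(row[j])
            # Rows are staggered: a key sits above keys k-1 and k of the next row.
            if r + 1 < len(rows):
                below = rows[r + 1]
                for j in (k - 1, k):
                    if 0 <= j < len(below):
                        adjacent[key].add(below[j])
                        adjacent[below[j]].add(key)
    return adjacent


ADJACENT = build_adjacency(ROWS)


def is_typographic_substitution(typed, intended):
    """True if the two letters differ and sit on neighbouring keys."""
    return typed != intended and intended in ADJACENT.get(typed, set())
```
]]></preformat>
        <p>Under this assumed layout, typing "г" for "н" counts as a typographic slip because the two keys are neighbours, whereas "м" for "н" does not and would have to be explained by another error source.</p>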
        <p>
          For spelling correction, errors have been classified into four types:
deletion, insertion, substitution, and transposition
          <xref ref-type="bibr" rid="ref3">(Damerau, 1964)</xref>
          . Deletion errors in contexts
where a character is repeated, as in қаты→қатты, are observed significantly more
frequently than in non-repeating contexts, which shows that visually conspicuous errors
tend to be corrected. Substitution errors involving visually similar characters (e.g., ага→аға)
are in fact very common
          <xref ref-type="bibr" rid="ref21">(Yukino Baba and Hisami Suzuki, 2012)</xref>
          .
        </p>
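        <p>We perform correction with four components:</p>
        <list list-type="bullet">
          <list-item><p>Selection mechanism: chooses the candidate with the highest probability.</p></list-item>
          <list-item><p>Candidate model: proposes candidate corrections for the given word.</p></list-item>
          <list-item><p>Language model: the probability that a candidate occurs in the text.</p></list-item>
          <list-item><p>Error model: the probability that the observed word was typed when the author meant a given candidate.</p></list-item>
        </list>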
        <p>We look for the candidate correction x, out of all possible candidates, that has
the maximum probability of being the intended correction given the observed word w:
          <disp-formula><tex-math><![CDATA[\hat{x} = \arg\max_{x} P(x \mid w)]]></tex-math></disp-formula>
By Bayes' theorem, this is equivalent to:
          <disp-formula><tex-math><![CDATA[\hat{x} = \arg\max_{x} \frac{P(w \mid x)\, P(x)}{P(w)}]]></tex-math></disp-formula>
Since P(w) is the same for every possible candidate x, we can factor it out, giving:
          <disp-formula><tex-math><![CDATA[\hat{x} = \arg\max_{x} P(w \mid x)\, P(x)]]></tex-math></disp-formula>
Consider the misspelled word "сенін" and the two candidates "сенім" and "сенің".
The candidate "сенің" seems good because the words look similar and the only
change is "ң" to "н"; it is the genitive form of the pronoun "сен". On the other hand,
"сенім" is a very common word and a noun, and it is the correct spelling here. The point
is that to estimate P(x|w) we consider both the probability of the candidate, P(x), and
the probability of the change from x to w, P(w|x).</p>
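        <p>The four components can be sketched together as follows. This is a minimal illustration under stated assumptions, not the system evaluated in this paper: the unigram counts in <monospace>WORD_COUNTS</monospace> are toy values, the error model is a placeholder that treats every single edit as equally likely, and all function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# The 42 letters of the Kazakh Cyrillic alphabet.
KAZAKH_ALPHABET = "аәбвгғдеёжзийкқлмнңоөпрстуұүфхһцчшщъыіьэюя"

# Toy unigram counts standing in for a language model (hypothetical values).
WORD_COUNTS = {"сенім": 50, "сенің": 30, "сен": 80, "кеме": 12}
TOTAL = sum(WORD_COUNTS.values())


def language_model(word):
    """P(x): relative frequency of the candidate in the toy counts."""
    return WORD_COUNTS.get(word, 0) / TOTAL


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutes = [a + c + b[1:] for a, b in splits if b for c in KAZAKH_ALPHABET]
    inserts = [a + c + b for a, b in splits for c in KAZAKH_ALPHABET]
    return set(deletes + transposes + substitutes + inserts)


def candidates(word):
    """Candidate model: the word itself if known, else known words one edit away."""
    if word in WORD_COUNTS:
        return {word}
    return {c for c in edits1(word) if c in WORD_COUNTS} or {word}


def error_model(typed, candidate):
    """P(w|x): placeholder that treats every single edit as equally likely."""
    return 1.0 if typed == candidate else 0.01


def correct(word):
    """Selection mechanism: argmax over candidates of P(w|x) * P(x)."""
    return max(candidates(word),
               key=lambda x: error_model(word, x) * language_model(x))
```
]]></preformat>
        <p>With these toy counts, both "сенім" and "сенің" are one substitution away from "сенін", so the language model decides and <monospace>correct("сенін")</monospace> selects "сенім". A keyboard-aware error model would replace the constant 0.01 with a value that depends on the letters involved.</p>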
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Replacement Rules</title>
        <p>Kazakh is a morphologically rich language; one stem has a very large number of
word forms. It is not efficient to use a lexicon lookup to store and check all
possible word forms in the dataset. A morphological analyzer, however, helps to
find all possible word forms, lemmas, and inflectional or derivational structures.</p>
        <p>Kazakh is generally verb-final, though various permutations of subject–object–verb
word order can be used. Inflectional and derivational morphology, both verbal
and nominal, exists in Kazakh almost exclusively in the form
of agglutinative suffixes. Kazakh is a nominative-accusative, head-final,
left-branching, dependent-marking language
          <xref ref-type="bibr" rid="ref14">(Raikhangul Mukhamedova, 2015)</xref>
          .</p>
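        <p>To illustrate why suffix analysis scales better than storing every word form, the sketch below strips the case suffixes tabulated in this section and checks the remaining stem against a small lexicon. It is a simplified illustration, not a morphological analyzer: the stem set <monospace>STEMS</monospace> and the function <monospace>analyze</monospace> are hypothetical, and vowel harmony and consonant assimilation are not validated.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Case suffixes as listed in the paper's case table.
CASE_SUFFIXES = {
    "Acc": ["ні", "ны", "ді", "ды", "ті", "ты", "н"],
    "Gen": ["нің", "ның", "дің", "дың", "тің", "тың"],
    "Dat": ["ге", "ға", "ке", "қа", "не", "на"],
    "Loc": ["де", "да", "те", "та"],
    "Abl": ["ден", "дан", "тен", "тан", "нен", "нан"],
    "Inst": ["менен", "бенен", "пенен", "мен", "бен", "пен"],
}

# Hypothetical stem lexicon; a real system would use a morphological analyzer.
STEMS = {"шелек", "кеме", "бас", "тұз"}


def analyze(word):
    """Return (stem, case) pairs whose stem is found in the lexicon."""
    analyses = []
    if word in STEMS:
        analyses.append((word, "Nom"))
    for case, suffixes in CASE_SUFFIXES.items():
        for suffix in suffixes:
            stem = word[: -len(suffix)]
            if word.endswith(suffix) and stem in STEMS:
                analyses.append((stem, case))
    return analyses
```
]]></preformat>
        <p>Four stems and a few dozen suffixes cover all the declined forms in the table, whereas a lexicon lookup would need one entry per form. Because harmony is not checked, this sketch would also accept ill-formed combinations, which a full analyzer would reject.</p>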
      </sec>
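      <sec id="sec-3-3">
        <table-wrap>
          <table>
            <thead>
              <tr><th>Case</th><th>Possible forms</th><th>шелек</th><th>кеме</th><th>бас</th><th>тұз "salt"</th></tr>
            </thead>
            <tbody>
              <tr><td>Nom</td><td>(no suffix)</td><td>шелек</td><td>кеме</td><td>бас</td><td>тұз</td></tr>
              <tr><td>Acc</td><td>-ні, -ны, -ді, -ды, -ті, -ты, -н</td><td>шелекті</td><td>кемені</td><td>басты</td><td>тұзды</td></tr>
              <tr><td>Gen</td><td>-нің, -ның, -дің, -дың, -тің, -тың</td><td>шелектің</td><td>кеменің</td><td>бастың</td><td>тұздың</td></tr>
              <tr><td>Dat</td><td>-ге, -ға, -ке, -қа, -не, -на</td><td>шелекке</td><td>кемеге</td><td>басқа</td><td>тұзға</td></tr>
              <tr><td>Loc</td><td>-де, -да, -те, -та</td><td>шелекте</td><td>кемеде</td><td>баста</td><td>тұзда</td></tr>
              <tr><td>Abl</td><td>-ден, -дан, -тен, -тан, -нен, -нан</td><td>шелектен</td><td>кемеден</td><td>бастан</td><td>тұздан</td></tr>
              <tr><td>Inst</td><td>-мен(ен), -бен(ен), -пен(ен)</td><td>шелекпен</td><td>кемемен</td><td>баспен</td><td>тұзбен</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-4">
        <p>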
          <xref ref-type="bibr" rid="ref1">(I. Zitouni and R. Sarikaya, 2009)</xref>
          list the following problems that arise with
agglutinative languages:
          <list list-type="bullet">
            <list-item><p>Increase in dictionary size;</p></list-item>
            <list-item><p>Poor language model probability estimation;</p></list-item>
            <list-item><p>Higher out-of-vocabulary rate;</p></list-item>
            <list-item><p>Inflection gap for machine translation.</p></list-item>
          </list>
        </p>
        <p>We generate candidates for nonstandard word forms. Informal texts
mostly contain slang, abbreviations, character repetitions, logograms, wrong letter
case, pronunciation-related spelling errors, and vowel misspellings. To normalize
such words, we apply the following candidate generation layer:</p>
        <list list-type="bullet">
          <list-item><p>Letter case transformations;</p></list-item>
        </list>
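        <p>A candidate generation layer of this kind can be sketched as below, covering letter case transformations and the character repetitions mentioned above. The sketch is illustrative only: the function names and the rule that runs of three or more identical letters are reduced to one or two letters are assumptions, not the rules used in this work.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import groupby, product


def repetition_candidates(token):
    """Reduce every run of 3+ identical letters to one or two letters.

    Runs of one or two letters are kept unchanged, because Kazakh has
    legitimate double letters (for example, қатты).
    """
    options = []
    for char, run in groupby(token):
        length = len(list(run))
        if length >= 3:
            options.append((char, char * 2))
        else:
            options.append((char * length,))
    return {"".join(choice) for choice in product(*options)}


def generate_candidates(token):
    """Lower-case the token, collapse repetitions, and add capitalized variants."""
    lowered = token.lower()
    result = set()
    for variant in repetition_candidates(lowered):
        result.add(variant)
        result.add(variant.capitalize())
    return result
```
]]></preformat>
        <p>For the informal token "СӘЛЕЕЕМ" this yields "сәлем" and "сәлеем" together with their capitalized forms; a language model or lexicon is then needed to choose among them.</p>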
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Evaluation</title>
      <p>We evaluated both word spelling correction and replacement rules.
For the training dataset we used one of the most popular and valued novels of Kazakh
literature, “Abai Zholy” (The Path of Abai) by Mukhtar Auezov, which consists of
16893 words.</p>
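      <table-wrap>
        <table>
          <thead>
            <tr><th>Misspelling</th></tr>
          </thead>
          <tbody>
            <tr><td>Қыздартың</td></tr>
            <tr><td>Сагыз</td></tr>
            <tr><td>Атам</td></tr>
            <tr><td>Сенің</td></tr>
            <tr><td>Сагыніш</td></tr>
            <tr><td>Жанын</td></tr>
          </tbody>
        </table>
      </table-wrap>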
      <p>As shown in Table 5, spelling correction using the keyboard error model
gave higher accuracy than word replacement in finding the normalized form of a value.
Correction of noisy non-standard words does not insert words into the dataset; it
generates the best-fit candidate for the misspelled word. We tested on 500 words,
constructing the test dataset from words in “Abai Zholy”. Instead of using lexicon
lookup, we propose to use a keyboard model for the Kazakh language.</p>
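      <p>The exact match score reported in this paper is not defined in this section. A common definition, assumed here, is the percentage of test words whose predicted normalized form is identical to the reference form. The sketch below uses toy data, not the paper's test set, and the function name is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def exact_match_score(predictions, references):
    """Percentage of words whose predicted form equals the reference form."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Toy example (hypothetical system output, not the paper's data).
predicted = ["сенім", "аға", "қаты", "кеме"]
gold = ["сенім", "аға", "қатты", "кеме"]
score = exact_match_score(predicted, gold)
```
]]></preformat>
      <p>In this toy example three of the four predictions match the reference exactly, so the score is 75 per cent.</p>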
    </sec>
    <sec id="sec-5">
      <title>5 Conclusions</title>
      <p>NLP is a recent field of science in Kazakhstan, and there is a lack of tools for
preprocessing and spelling correction. In this research, we aimed to explore the
necessary components for text normalization of a morphologically rich language, Kazakh,
to support further studies in this field.</p>
      <p>In this article, we proposed a social media and messaging normalization
technique for the Kazakh language. We hope to have provided better insight into spelling
correction based on keyboard usage for the Kazakh alphabet, which contains 42 letters,
16 more than English.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>We thank the anonymous reviewers for helpful comments and suggestions. We
also thank Kessikbayeva Gulshat for her comments on a preliminary version of this
work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>I.</given-names>
            <surname>Zitouni</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sarikaya</surname>
          </string-name>
          . (
          <year>2009</year>
          ).
          <article-title>Arabic diacritic restoration approach based on maximum entropy models</article-title>
          . London, UK: Computer Speech &amp; Language.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Bo</given-names>
            <surname>Han</surname>
          </string-name>
          and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          . (
          <year>2011</year>
          ).
          <article-title>Lexical Normalisation of Short Text Messages: Makn Sens a #twitter</article-title>
          . Portland, Oregon, USA: Proceedings of ACL-HLT.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Damerau</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>1964</year>
          ).
          <article-title>A technique for computer detection and correction of spelling errors</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <fpage>659</fpage>
          -
          <lpage>664</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Fei</given-names>
            <surname>Liu</surname>
          </string-name>
          , Fuliang Weng, and
          <string-name>
            <surname>Xiao Jiang.</surname>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>A broad-coverage normalization system for social media language</article-title>
          .
          <source>ACL</source>
          ,
          <fpage>1035</fpage>
          -
          <lpage>1044</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Gülşen</given-names>
            <surname>Eryiğit</surname>
          </string-name>
          , Dilara Torunoğlu-Selamet. (
          <year>2017</year>
          ).
          <article-title>Social media text normalization for Turkish</article-title>
          .
          <source>Natural Language Engineering</source>
          ,
          <fpage>835</fpage>
          -
          <lpage>875</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Kaufmann</surname>
          </string-name>
          and Jugal Kalita. (
          <year>2010</year>
          ).
          <article-title>Syntactic normalization of Twitter messages</article-title>
          .
          <source>Kharagpur, India: International Conference on Natural Language Processing.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Catherine</given-names>
            <surname>Kobus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>François</given-names>
            <surname>Yvon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Géraldine</given-names>
            <surname>Damnati</surname>
          </string-name>
          . (
          <year>2008</year>
          ).
          <article-title>Transcrire les SMS comme on reconnaît la parole</article-title>
          .
          <source>Actes de la Conférence sur le Traitement Automatique des Langues</source>
          (pp.
          <fpage>128</fpage>
          -
          <lpage>138</lpage>
          ). Avignon, France: TALN'08
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          and
          <string-name>
            <surname>Robert C. Moore.</surname>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Pronunciation modeling for improved spelling correction</article-title>
          . Philadelphia, USA:
          <article-title>Proceedings of the 40th Annual Meeting on Association for Computational Linguistics</article-title>
          , ACL.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Kukich</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>1992</year>
          ).
          <article-title>Techniques for automatically correcting</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <volume>24</volume>
          (
          <issue>4</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Deana</given-names>
            <surname>Pennell</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yang</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          (
          <year>2011</year>
          ).
          <article-title>A character-level machine translation approach for normalization of SMS abbreviations</article-title>
          .
          <source>IJCNLP</source>
          ,
          <fpage>974</fpage>
          -
          <lpage>982</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Mohammad</given-names>
            <surname>Saloot</surname>
          </string-name>
          , Norisma Idris, Rohana Mahmud. (
          <year>2014</year>
          ).
          <article-title>An architecture for Malay Tweet normalization</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <fpage>621</fpage>
          -
          <lpage>633</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Monojit</given-names>
            <surname>Choudhury</surname>
          </string-name>
          , Rahul Saraf, Vijit Jain, Animesh Mukherjee, Sudeshna Sarkar, and
          <string-name>
            <surname>Anupam Basu.</surname>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Investigation and modeling of the structure of texting language</article-title>
          .
          <source>International Journal of Document Analysis and Recognition</source>
          ,
          <fpage>157</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Eric</given-names>
            <surname>Brill</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robert C.</given-names>
            <surname>Moore</surname>
          </string-name>
          . (
          <year>2000</year>
          ).
          <article-title>An improved error model for noisy channel spelling correction</article-title>
          . Englewood Cliffs, NJ, USA:
          <source>Proceedings of the 38th Annual Meeting on Association for Computational Linguistics</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Mukhamedova</surname>
            ,
            <given-names>Raikhangul.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Kazakh: A Comprehensive Grammar</article-title>
          .
          <source>Routledge(ISBN 9781317573081).</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Nitin</given-names>
            <surname>Indurkhya</surname>
          </string-name>
          ,
          <string-name>
            <surname>Fred J. Damerau.</surname>
          </string-name>
          (
          <year>2010</year>
          ).
          <source>Handbook of Natural Language Processing</source>
          (2 ed.). New York, US: Taylor&amp;Francis Group, LLC.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>De Clercq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Desmet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lefever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hoste</surname>
          </string-name>
          . (
          <year>2013</year>
          ).
          <article-title>Normalization of Dutch user-generated content</article-title>
          .
          <source>Proceedings of the 9th International Conference on Recent Advances in Natural Language Processing</source>
          (pp.
          <fpage>179</fpage>
          -
          <lpage>188</lpage>
          ). Hissar, Bulgaria: RANLP'13
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Panchapagesan</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.P.</given-names>
            <surname>Talukdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N</given-names>
            <surname>Sridhar</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.G. Ramakrishnan.</surname>
          </string-name>
          (
          <year>2004</year>
          ).
          <source>Hindi Text Normalization. Conference: Fifth International Conference on Knowledge Based Computer Systems (KBCS)</source>
          (p.
          <fpage>10</fpage>
          ). Hyderabad, India: KBCS.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Paul</given-names>
            <surname>Cook</surname>
          </string-name>
          and
          <string-name>
            <given-names>Suzanne</given-names>
            <surname>Stevenson</surname>
          </string-name>
          . (
          <year>2009</year>
          ).
          <article-title>An unsupervised model for text message normalization</article-title>
          .
          <source>Boulder, USA: CALC 09: Proceedings of the Workshop on Computational Approaches</source>
          to Linguistic Creativity.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Richard</given-names>
            <surname>Sproat</surname>
          </string-name>
          , Alan W. Black, Stanley F. Chen, Shankar Kumar, Mari Ostendorf, Christopher Richards. (
          <year>2001</year>
          ).
          <article-title>Normalization of non-standard</article-title>
          .
          <source>Computer Speech &amp; Language</source>
          ,
          <volume>15</volume>
          (
          <issue>3</issue>
          ),
          <fpage>287</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Eger</surname>
          </string-name>
          et al. (
          <year>2016</year>
          ).
          <article-title>A comparison of four character-level string-to-string translation models for (OCR) spelling error correction</article-title>
          .
          <source>The Prague Bulletin of Mathematical Linguistics</source>
          ,
          <fpage>77</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Yukino</given-names>
            <surname>Baba</surname>
          </string-name>
          , Hisami Suzuki. (
          <year>2012</year>
          ).
          <article-title>How Are Spelling Errors Generated and Corrected? A Study of Corrected and Uncorrected Spelling Errors Using Keystroke Logs</article-title>
          .
          <source>Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics</source>
          (pp.
          <fpage>373</fpage>
          -
          <lpage>377</lpage>
          ). Jeju, Republic of Korea:
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>