<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Morphological and phonological learning</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Promodes: A probabilistic generative model for word decomposition</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Sebastian Spiegler, Bruno Golénia, Peter Flach Machine Learning Group, Computer Science Department, University of Bristol</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2002</year>
      </pub-date>
      <volume>6</volume>
      <fpage>11</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>For the Morpho Challenge 2009 we present an algorithm for unsupervised morphological analysis called Promodes1 which is based on a probabilistic generative model. The model considers segment boundaries as hidden variables and includes probabilities for letter transitions within segments. Promodes purely concentrates on segmenting words whereas its labeling method is simplistic. Morpheme labels are the segments themselves. The algorithm can be employed in different degrees of supervision. For the challenge, however, we demonstrate three unsupervised versions. The first one uses a simple segmenting algorithm on a small subset of the data which is based on letter succession probabilities in substrings and then estimates the model parameters using a maximum likelihood approach. The second version estimates its parameters through expectation maximization. Independently of the parameter estimation, we utilized each model to decompose words from the original language data. A third method is a committee of unsupervised learners where each learner corresponds to the second version, however, with different initializations of the expectation maximization. The solution is then found by majority vote which decides whether to segment in a word position or not. In this paper, we describe the details of the probabilistic model, how parameters are estimated and how the most likely decomposition of an input word is found. We have tested Promodes on Arabic (vowelized and non-vowelized), English, Finnish, German and Turkish. All three methods achieved competitive results in the Morpho Challenge 2009.</p>
      </abstract>
      <kwd-group>
        <kwd>I</kwd>
        <kwd>2 [Artificial Intelligence]</kwd>
        <kwd>I</kwd>
        <kwd>2</kwd>
        <kwd>6 Learning</kwd>
        <kwd>Parameter learning</kwd>
        <kwd>I</kwd>
        <kwd>2</kwd>
        <kwd>7 Natural Language Processing</kwd>
        <kwd>Language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        word morphology, word decomposition, probabilistic generative model, expectation maximization,
committee of unsupervised learners
The goal of the Morpho Challenge is to advance algorithms for unsupervised morphological
analysis. In general, morphological analysis is a subdiscipline of linguistics and can be defined as the
study of the internal structure of words [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. There are four tasks assigned to it: 1) decomposing
words into morphemes, 2) building a morpheme dictionary, 3) defining morphosyntactical rules
1Promodes stands for PRObabilistic generative MOdel for different DEgrees of Supervision.
which state how morphemes can be combined to valid words and 4) defining
morphophonological rules which specify phonological changes when morphemes are combined [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Unsupervised
means absence of supervision in machine learning. Most generally, machine learning refers to a
computational program which improves its results with increasing experience in terms of data and
intervention of an external teacher [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. If there is no intervention of a teacher and the data does
not contain any information towards the task to be solved, the learning process is referred to as
unsupervised. For the Morpho Challenge, the task is to perform unsupervised morpheme analysis
for words contained in a word list using a generic algorithm without any further information. Test
languages are Arabic, English, German, Finnish and Turkish.
1.1
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        Over the recent years, a number of algorithms for unsupervised morphological analysis have been
produced. Goldsmith [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] presents a morphological analyzer called Linguistica which learns
signatures from a word list. Signatures are sets of suffixes which are used with certain sets of stems.
They are built by initially splitting words into stem and affix candidates and then by heuristically
changing stem-affix boundaries until the description length of the signatures is minimized. A
similar approach has been chosen by Monson [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] who developed Paramor, an algorithm which learns
paradigmatic structures of morphology as sets of mutually substitutable morphological operations.
Another frequently cited morphological analyzer is Morfessor developed by Creutz et al. [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]
who implemented a model family for unsupervised morphology induction. The two main methods
are Morfessor baseline and Morfessor Categories-MAP.2 The first one is a recursive algorithm for
morphological decomposition based on minimum description length (MDL) and the latter method
is based on a probabilistic maximum a posteriori (MAP) framework and furthermore distinguishes
between different categories of morphemes (prefixes, stems, suffixes). Linguistica, Paramor and
Morfessor carry out morphological analysis in terms of word decomposition, learning a morpheme
dictionary and finding morphosyntactical rules. Other approaches [
        <xref ref-type="bibr" rid="ref2 ref6">2, 6</xref>
        ] focus on word
decomposition by analyzing words based on transition probabilities or letter successor variety which
originates in Harris’ approach [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Snover [15, 16] describes a generative model for unsupervised
learning of morphology, however, it differs from ours in the fact that Snover searches, similar to
Monson, for paradigms and we are interested in word decomposition based on the probability of
having a boundary in a certain position and the resulting letter transition of morphemes.
      </p>
      <p>The remainder of this paper is structured as follows. In Section 2 we will present our algorithm
Promodes, its mathematical model, different ways of estimating its parameters and different
methods for word decompositions. In Section 3 and 4 experiments are explained, results analysed
and conclusions drawn.
2</p>
      <sec id="sec-2-1">
        <title>Algorithm</title>
        <p>
          In this section we will explain our algorithm Promodes. Its name stands for probabilistic
generative model for different degrees of supervision. It can be applied in an unsupervised manner where
model parameters are estimated using expectation maximization (EM) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] or (semi-) supervised
by computing maximum likelihood estimates (MLE) from a pre-segmented training set. For the
Morpho Challenge, we used a simple unsupervised segmenting algorithm to segment words in the
training set and then estimated the model parameters by MLE. Independently of the parameter
estimation, we applied the generative model to the original data and decomposed all words. Apart
from the two different ways of estimating the model parameters, we also carried out experiments
with a committee of unsupervised learners where we combined different results by majority vote.
        </p>
        <p>In Figure 1, an overview over the algorithm is given. The probabilistic generative model which
will be introduced in Section 2.1 supplies us with an objective function which assigns a probability
to a single word analysis. In Section 2.2 two ways of estimating parameters will be explained. In
Section 2.3 we will show how we used a committee of unsupervised learners to decompose words.
2Both morphological analyzers are the reference algorithms for the Morpho Challenge.</p>
        <sec id="sec-2-1-1">
          <title>Objective function Search method</title>
          <p>1) Maximum likelihooda
2) Expectation maximizationb
aPromodes 1
bPromodes 2
cPromodes Committee
In a general way, one can say that a probabilistic generative model (PGM) is used to describe the
process of data generation based on observed variables X and hidden or target variables Y with
the goal of forming a conditional probability distribution P (Y jX). In morphological analysis, we
try to decompose words into morphemes. Therefore, the observables correspond to the original
words and the hidden variables to their segmentation or boundaries. Knowing the parameters
of the model we can find the best segmentation of a given word. However, we can also learn
the parameters from a word list by using either maximum likelihood estimates or expectation
maximization as described in Section 2.2.1 and 2.2.2. Subsequently, we will describe how the
PGM has been constructed.</p>
          <p>Words and segmentations The data we are dealing with is a list W of words wj with 1
j jW j. A word consists of n letters and has m = n 1 positions where a boundary can be
inserted. We denote a segmentation of a word with b which describes a binary vector hb1; : : : ; bmi
with 1 i m. In position i a boundary value bi can be f0; 1g for putting a boundary or not.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Letter transition probability distribution In the Markovian spirit we describe a word by</title>
          <p>transitions from a letter x to a letter y within a morpheme where y is drawn from an alphabet
A and x from AB = A [ B where B is a silent start symbol pointing to the first letter of the
morpheme. By introducing such a silent start symbol it is guaranteed that different segmentations
of a word have the same number of transitions.</p>
          <p>px;y = P r(Xi+1 = yjXi = x) with</p>
          <p>X px;y = 1 8x 2 AB and 1
y2A
i
m:
(1)</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>Probability distribution over boundary and non-boundary positions A simple way of</title>
          <p>describing a segmentation is in a position-independent and identically distributed manner.
However, assuming that boundaries are equally likely over the segmentation is a strong simplification.
For this reason we chose a position-dependent and non-identical distribution. Each position i is
therefore assigned to a Bernoulli random variable Zi and the existence of a boundary corresponds
to a single trial with positive outcome.</p>
          <p>P r(Zi = 1jm) = pzi;m with P r(Zi = 0jm) + P r(Zi = 1jm) = 1 and 1
i
m
with Zi 2 Z = [Z1; : : : ; Zm] for the entire segmentation. Both above distributions describe the
parameters of our generative model which can be summarized as
= fX; Zg
(2)
(3)</p>
          <p>P r(bijm; ) = pbi;m
P r(bijm; ) =
r;bi =
1
Y pr;m r;bi
r=0
(1;
0;
if bi = r,
otherwise
Finding the best segmentation of a word is done as follows.
bmax;i =
(1;
0;
if P r(Zi = 1jm) P r(Xi+1 = yjXi = B) &gt; P r(Zi = 0jm) P r(Xi+1 = yjXi = x)
otherwise.
where we iterate over possible boundary values r = f0; 1g and have a function r;i which eliminates
all r’s in the product which do not correspond to bi.</p>
        </sec>
        <sec id="sec-2-1-4">
          <title>Probability of a letter transition in position i If the segmentation of the word is known</title>
          <p>we can assign a probability to its letter structure in terms of letter transitions (which are
lengthindependent).</p>
          <p>P r(wjijbi; ) = px;y
where in the jth word the letter x is in position i and y in i + 1 given the information whether
there is a boundary or not in position i expressed by bi. For later derivations, we rewrite this
equation such that we iterate over the alphabet using x0 and y0, and eliminate all probabilities
which do not correspond to the original x and y by using the function xy;x0y0 .</p>
          <p>P r(wjijbi; ) = Y px;y xy;x0y0 x0 2 AB
xy;x0y0 =
y02A
(1;
0;
if x0 = x and y0 = y in wj at ith position,
otherwise.
with X being the probability distribution over transitions and Z the probability distribution over
boundary and non-boundary positions in a segmentation.</p>
          <p>Position-based perspective and probability of segmenting or not in position i Instead
of looking for the best segmentation bjk given a word wj in an exponential search space with
1 k 2m, we make a decision in each position i and therefore turn the search into a linear
problem. The probability of putting a boundary in position i or not is then defined as
where pbi;m is the probability of having a boundary value bi = f0; 1g in position i given the length
of the segmentation m. For later derivations we rewrite this equation as</p>
          <p>bmax = hbmax;1; : : : ; bmax;mi
If the product of the transition probability from the abstract start symbol B to letter i + 1 and
the probability for segmenting in i is higher than product of the transition from letter i to i + 1
and not segmenting in i, segment, otherwise do not segment.
2.2</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Parameter estimation</title>
      <p>Before applying the probabilistic model described in the previous section, its parameters have to
be estimated. For the Morpho Challenge we present two ways of doing this, at first by maximum
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
likelihood estimates and then by expectation maximization. A training set was prepared which
constituted a subset of the original data. Approximately 100,000 word forms3 were selected for
each language which did not contain special characters like apostrophes or hyphens.
2.2.1</p>
      <sec id="sec-3-1">
        <title>Maximum likelihood estimates</title>
        <p>
          We segmented each training set using a heuristic similar to the successor variety [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] in a separate
pre-processing step. All possible substrings of every word were collected in a forward trie4 along
with statistical information as their frequency. A particular word was then analysed from the first
letter onwards where the conditional probability P (lt+1jl1; : : : ; lt) with l being any letter from the
alphabet was evaluated. If the conditional probability dropped in t + 1, the word was segmented
in position t and the evaluation process was reinitiated in position t + 1. Since this is a rather
crude method, it tends to over-segment and solutions for similar words might vary a lot. From the
segmentations we estimated the parameters for Promodes 1 using maximum likelihood estimates.
2.2.2
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Expectation maximization</title>
        <p>
          Parameter estimation by expectation maximization (EM) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] was used in Promodes 2. The EM
algorithm iteratively alternates between two distinctive steps, the expectation or E-step and the
maximization or M-step, until a convergence criterion is met. In the E-step the log-likelihood of
the current estimates for the model parameters are computed. In the M-step the parameters are
updated such that the log-likelihood is maximized. First, we will describe how the likelihood is
calculated and then we will show how the re-estimation is done using partial derivatives.
        </p>
        <p>The Q function as the expected value of the log likelihood function is defined as follows:
c1 = X px0;y0 = 1 with x0 2 AB, and c2 =
y02A</p>
        <p>1
X pz=r0;m = 1
r0=0
L = Q( ; t)
1(c1
1)
2(c2</p>
        <p>1)
2
jW j mj 1 1
= X X X P (bi = rjwji; ) 4log Y pr;m r;bi + log Y px;y xy;x0y0 5
j=1 i=1 r=0 r=0 y02A
02 3 1
3
1A
2
" 1</p>
        <p>X pz=r0;m
r0=0
#
!
1
jW j mj 1
Q( ; t) = X X X P (bi = rjwji; ) log P (wji; bi = rj t)</p>
        <p>j=1 i=1 r=0
t = arg max Q( ; t)</p>
        <p>t</p>
        <p>Our objective function L which we want to maximize during the M-step is built of the Q
function from Equation 12 and includes constraints c1 and c2 and Lagrange multiplier 1 and 2:
px;y =
3For Arabic we used the whole data set since there were less than 20,000 word forms given in the vowelized and
non-vowelized data set.</p>
        <p>
          4A trie is a tree structure where each node corresponds to a single letter. If a set of strings contain a common
prefix, they hang off the same node and the path from the root to the last common node corresponds to the prefix.
(12)
(13)
(14)
(15)
(16)
Although both derivatives are bulky, they have an intuitive interpretation. In Equation 16 we
first count the occurrence of a certain letter transition from x to y multiplied by the probability
P (birjwji; t) and divide it by the weighted sum of all transitions from x. In Equation 17 we
calculate the weighted sum for putting a boundary in position i of words with length mj and
divide it by the weighted sum of all boundaries and non-boundaries in position i.
Since different initializations of the above described EM algorithm may converge in different local
optima, the actual models might give slightly different analyses for a single word. Therefore it
seems natural to average results from a set of different initializations and combine them to a single
solution. Our third method for word decomposition Promodes Committee does exactly this by
using a committee of unsupervised learners. This approach is similar to Atwell’s [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] in the Morpho
Challenge 2006. In general, a committee is a group of individuals which have been appointed to
perform a certain task or to arrive at a certain decision. In machine learning, a committee can
combine results from different algorithms or in our case different initializations. Each member of
the committee can vote for a certain partial or complete solution. The weight of each vote can
be either uniform or non-uniform, e.g. based on the algorithm’s performance or the confidence in
the algorithm. The approach we are presenting is completely unsupervised and purely based on
majority vote for putting a boundary in a certain position. Given analyses for a single word wj
in position i we introduce scorej;i as
(17)
(18)
scorej;i = X
        </p>
        <p>h;j;i
h;j;i =
h=1
8&gt;1;
&lt;
&gt;
: 1;
if analysis h contains a boundary
in position i for word wj ,
otherwise.
and put a boundary at the ith position of word wj if scorej;i &gt; 0.
3</p>
        <sec id="sec-3-2-1">
          <title>Experimental results</title>
          <p>For the Morpho Challenge 2009 we applied three methods, Promodes 1, Promodes 2 and
Promodes Committee, to five languages. Two of these languages, Turkish and Finnish, are
agglutinating languages where words are composed mostly by joining morphemes, however, in
certain situations morphemes also undergo phonological changes during the word building process.
One of these phenomena is called vowel harmony which is defined as long-distance phonological
interactions constraining what kind of vowels can co-occur in a word. The other three languages,
namely German, English and Arabic, are rather fusional languages where morphemes are tightly
fused together and are more difficult to separate. The Promodes methods are intended for
agglutinating languages. They decompose words into their morphemes. Morphosyntactic rules
are implicitly stored as statistics in terms of probabilities for segmenting in certain word positions
and resulting letter transitions within morphemes. There is no further grammatical analysis like
building signatures or paradigms. Morpheme labels are either the morphemes themselves or simple
labels consisting of morpheme+index number which we generated during a post-processing step.
The results across all language are listed in Figure 2 and in Table 1 where the best results for
precision, recall and F-measure are written in bold.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>General setup of the experiments</title>
      <p>For method Promodes 1 (P1) we estimated the algorithm’s parameters from a pre-segmented
subset which was generated with a simple segmenting algorithm based on a forward trie that
contained all possible n-grams of words. Boundaries were put in word positions where the
conditional probability of seeing a certain letter after a substring suddenly dropped. This kind of
pre-processing did not give precise morpheme boundaries for various reasons, mostly because it is
a crude heuristic and it does not distinguish between common substrings from different locations
in a word. By using maximum likelihood estimates we averaged the statistics across the subset.
Subsequently, the model was applied to the entire data set to decompose all words.</p>
      <p>
        Promodes 2 (P2) was the version which used expectation maximization (EM) to estimate its
parameters. Initially, words from a subset were randomly segmented and then the EM algorithm
improved the parameter estimates until a convergence criterion was met. As convergence criterion
we used the Kullback-Leibler divergence [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] which measures the difference between the model’s
probability distributions before and after each iteration of the EM. The resulting probabilistic
model was then applied to the entire data set. Since the EM converges to local optima we
restarted it with different initializations varying the rate of the random segmentation.
      </p>
      <p>Promodes Committee (PC) made use of the different initializations and the resulting
analyses of Promodes 2. Instead of having to choose a single result it combined different solutions
into one by using majority vote for either segmenting or not in a certain position of a word. The
idea was that when different initializations led to a word boundary in the same position, it should
be placed and if they disagreed, the boundary can be omitted.
3.2</p>
    </sec>
    <sec id="sec-5">
      <title>Analysis of results</title>
      <p>Comparing our three methods to each other, we can say that P1 had the highest precision aside
from Finnish and Turkish where PC came first. For the recall, P2 had the highest results except
for English and Turkish where P1 had higher values. Looking at the F-measure as the harmonic
mean of precision and recall, P2 had the highest performance for Arabic (non-vowelized, vowelized),
German and Turkish. The highest performance for English was achieved by P1 and for Finnish by
PC. Comparing our methods to other participant’s methods in this year’s challenge, we can state
that P2 ranked first for vowelized and non-vowelized Arabic, and PC came third for Finnish. For
the other languages, P1, P2 and PC were in top or mid ranks.</p>
      <p>Since the analysis of precision and recall also depends on the distribution of the underlying data,
we compared our methods to two default approaches shown in Table 2, segmenting words after
each letter called the segment-all (SA) approach and only assigning a common morpheme to all
words without further analysis called the one-common-morpheme-only (CM) approach. SA shows
whether words which share a morpheme label in the gold standard also have letters in common.
CM shows the different underlying distributions in terms of average number of morphemes in each
language. For CM, we would expect a recall close to 100% if word pairs of the gold standard have
on average a morpheme in common. For SA, we also expect a recall near 100% if words labeled with
the same morpheme tag in the gold standard have a similar spelling. For the languages English,
Finnish, German and Turkish the precision of all our methods were significantly above the SA
and CM method. Arabic was a challenging language because it contained the most morpheme
labels per word as shown in Table 3, independently from being vowelized (vw) or non-vowelized
(nv). The winning method P1 still managed to have a higher precision, however, P2 and PC
were slightly below the default solution. For the recall, SA and CM represented an upper bound.
Since the recall has been calculated by looking at word pairs in the gold standard which share
a common morpheme and should share a morpheme in the result file as well, CM would always
return a correct answer because all words have a common morpheme. SA would return a correct
answer if two words share at least a letter which is the case if they share a morpheme with the
same or similar spelling. Expectedly, all our methods have recall values below the default solutions
by SA and CM.</p>
      <p>Looking at the overall performance expressed by the F-measure, results were significantly better
than default for English, Finnish and German. For Turkish, the F-measure of our methods were
slightly lower than the one of the CM approach. For vowelized and non-vowelized Arabic, our
methods were also slightly below the SA approach. SA’s higher F-measure has been caused by the
number of morphemes per word which exceeds the number of letters. There is also a difference
between the vowelized and the non-vowelized version where non-vowelized Arabic yields a lower
F-measure. This can be explained by the fact that non-vowelized words are shorter and contain
therefore less linguistical information.
4</p>
      <sec id="sec-5-1">
        <title>Conclusions</title>
        <p>We have presented different settings of an algorithm called Promodes which is based on a
probabilistic generative model. Promodes has been developed from a machine learning perspective
with little linguistical considerations and a simple labeling scheme where each morpheme has a
unique morpheme label, either the morpheme itself or an index number. Two ways of estimating
parameters from a subset of each language have been described, maximum likelihood estimates
(Promodes 1) and expectation maximization (Promodes 2). Subsequently, we applied the two
models to decompose an entire data set. As a third method we presented a committee of
unsupervised learners (Promodes Committee) which combined different analyses of Promodes 2. All
three methods achieved competitive results in the Morpho Challenge 2009 whereas Promodes 2
ranked first for vowelized and non-vowelized Arabic, and Promodes Committee came third for
Finnish. For the other languages, our methods were in top or mid ranks.</p>
        <p>In general, we believe that Promodes has the following strengths. The algorithm focuses on
decomposing words and it predicts well how many morphemes two words have in common. The
algorithm does not make assumptions about the structure of the language. It does not matter
whether a language uses only suffixes or also prefixes. Furthermore, it can be applied to different
degrees of supervision. In a (semi-) supervised or unsupervised setting it can be utilized to average
over a small segmented set by learning its parameters from it. Another advantage is that instead
of building a morpheme dictionary and morphosyntactic rules which are likely to be incomplete, it
applies the statistics of a small training set to a larger test data set. Furthermore, the deployment
of expectation maximization showed another way of estimating the model parameters. Doing so,
we achieved similar or slightly better results than with the first method and we can choose a model,
which meets our expectations the best, from different initializations. Another interesting method
was the committee of unsupervised learners which combined results from different analyses.</p>
        <p>Our future work includes investigating the behaviour of the committee in terms of number of
members and how members should be best initialized. Moreover, we want to examine in greater
detail what the impact of the training set size is and to continue in our ongoing optimization of
the probabilistic model.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Acknowledgements</title>
        <p>We would like to thank Aram Harrow for various fruitful discussions on the mathematical
background of this paper, and our team colleagues Roger Tucker and Ksenia Shalonova for
consulting us on general issues in morphological analysis. The work was sponsored by EPSRC grant
EP/E010857/1 Learning the morphology of complex synthetic languages.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Eric</given-names>
            <surname>Atwell</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Roberts</surname>
          </string-name>
          .
          <article-title>Combinatory hybrid elementary analysis of text (cheat)</article-title>
          .
          <source>Proceedings of the PASCAL Challenges Workshop on Unsupervised Segmentation of Words into Morphemes</source>
          , Venice, Italy,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Delphine</given-names>
            <surname>Bernhard</surname>
          </string-name>
          .
          <article-title>Simple morpheme labelling in unsupervised morpheme analysis</article-title>
          .
          <source>Working notes for the CLEF 2007 Workshop</source>
          , Budapest, Hungary,
          <volume>1</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Geert</given-names>
            <surname>Booij</surname>
          </string-name>
          .
          <article-title>The Grammar of Words: An Introduction to Linguistic Morphology</article-title>
          . Oxford University Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Mathias</given-names>
            <surname>Creutz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Krista</given-names>
            <surname>Lagus</surname>
          </string-name>
          .
          <article-title>Inducing the morphological lexicon of a natural language from unannotated text</article-title>
          .
          <source>Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'05)</source>
          ,
          <volume>1</volume>
          :
          <fpage>106</fpage>
          -
          <lpage>113</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Mathias</given-names>
            <surname>Creutz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Krista</given-names>
            <surname>Lagus</surname>
          </string-name>
          .
          <article-title>Unsupervised models for morpheme segmentation and morphology learning</article-title>
          .
          <source>ACM Trans. Speech Lang. Process.</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Minh</given-names>
            <surname>Thang</surname>
          </string-name>
          Dang and
          <string-name>
            <given-names>Saad</given-names>
            <surname>Choudri</surname>
          </string-name>
          .
          <article-title>Simple unsupervised morphology analysis algorithm (sumaa)</article-title>
          .
          <source>Proceedings of the PASCAL Challenges Workshop on Unsupervised Segmentation of Words into Morphemes</source>
          , Venice, Italy,
          <volume>1</volume>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Arthur</given-names>
            <surname>Dempster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nan</given-names>
            <surname>Laird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Donald</given-names>
            <surname>Rubin</surname>
          </string-name>
          .
          <article-title>Maximum likelihood from incomplete data via the em algorithms</article-title>
          .
          <source>Journal of the Royal Statistical Society</source>
          ,
          <volume>39</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>John</given-names>
            <surname>Goldsmith</surname>
          </string-name>
          .
          <article-title>Unsupervised learning of the morphology of a natural language</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>27</volume>
          :
          <fpage>153</fpage>
          -
          <lpage>198</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>John</given-names>
            <surname>Goldsmith</surname>
          </string-name>
          .
          <article-title>The Handbook of Computational Linguistics, chapter Segmentation and morphology</article-title>
          .
          <source>Blackwell</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Hafer</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.F</given-names>
            <surname>Weiss</surname>
          </string-name>
          .
          <article-title>Word segmentation by letter successor varieties</article-title>
          .
          <source>Information Storage and Retrieval</source>
          ,
          <volume>10</volume>
          :
          <fpage>371âĂŞ385</fpage>
          ,
          <year>1974</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Zellig</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Harris</surname>
          </string-name>
          .
          <article-title>From phoneme to morpheme</article-title>
          .
          <source>Language</source>
          ,
          <volume>31</volume>
          (
          <issue>2</issue>
          ):
          <fpage>190</fpage>
          -
          <lpage>222</lpage>
          ,
          <year>1955</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kullback</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Leibler</surname>
          </string-name>
          .
          <article-title>On information and sufficiency</article-title>
          .
          <source>Annals of Mathematical Statistics</source>
          ,
          <volume>22</volume>
          :
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          ,
          <year>1951</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Tom</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Mitchell</surname>
          </string-name>
          . Machine Learning.
          <string-name>
            <surname>McGraw-Hill</surname>
            <given-names>Science</given-names>
          </string-name>
          /Engineering/Math,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Monson</surname>
          </string-name>
          .
          <article-title>ParaMor: From Paradigm Structure To Natural Language Morphology Induction</article-title>
          .
          <source>PhD thesis</source>
          , Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>