<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Unsupervised Learning of Morphology by Using Syntactic Categories</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Burcu Can, Suresh Manandhar Department of Computer Science, University of York York YO10 5DD</institution>
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Morphological Analysis</institution>
          ,
          <addr-line>Syntax, Unsupervised Learning, Clustering</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a method for unsupervised learning of morphology that exploits the syntactic categories of words. Previous research [4][12] on learning of morphology and syntax has shown that both kinds of knowledge affect each other, making it possible to use one type of knowledge to help the other. In this work, we make use of syntactic information, i.e. Part-of-Speech (PoS) tags of words, to aid morphological analysis. We employ an existing unsupervised PoS tagging algorithm for inducing the PoS categories. A distributional clustering algorithm is developed for inducing morphological paradigms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        MDL-based models (Brent [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Brent et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Goldsmith [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ][
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Creutz and Lagus [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) aim to minimize the space used by the corpus and the model by morphologically segmenting the words in the corpus. A letter successor variety (LSV) model was employed by Bordag [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], who used letter frequencies to find split points in words. Snover et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] describe a probabilistic model in which morphological paradigms are created gradually by choosing the number of stems, morphemes, and paradigms in a probabilistic and generative manner. Another generative model is due to Creutz [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], in which the lengths and the frequencies of the morphemes are used as prior information. Schone and Jurafsky [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] used latent semantic analysis (LSA) to capture the semantic relatedness of words to aid morphological segmentation.
      </p>
      <p>
        All the above work has primarily been employed to learn simple, i.e. non-recursive, concatenative morphology, but it does not directly address the recursive nature of the morphology of agglutinative languages. Monson [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] proposed a system for handling the morphology of agglutinative languages. His system achieved a precision of 52% on Turkish as evaluated in the Morpho Challenge Workshop, 2008.
      </p>
      <p>
        There has been some work on the joint unsupervised learning of morphology and PoS tags. Hu et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] extend the minimum description length (MDL) based framework due to Goldsmith [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to explore the link between morphological signatures and PoS tags. Clark [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] experiments with the fixed endings of words for PoS clustering, so that morphologically similar words tend to belong to the same PoS cluster.
      </p>
      <p>Our current work follows a similar direction. In particular, we show that unsupervised PoS tagging can be effectively employed for learning of morphology. However, the work presented here is not a method for simultaneous learning of PoS categories and morphology. It is limited to learning of morphology given that the PoS categories have already been induced using an unsupervised PoS tagger.</p>
    </sec>
    <sec id="sec-2">
      <title>Inducing Syntactic Categories</title>
      <p>
        For the induction of syntactic categories, we used Clark's distributional clustering approach [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
        which can be considered an instance of average-link clustering. It should be emphasised, however, that any other method for unsupervised induction of PoS tags can be substituted without affecting the method presented in this paper. Following Clark's approach, each word is clustered by using its context. A context consists of the previous word and the following word. Each word has a context distribution over all ordered pairs of left-context/right-context words. To measure the distributional similarity between words, the KL divergence is used, which is defined as:
        <disp-formula><label>(1)</label><tex-math><![CDATA[D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}]]></tex-math></disp-formula>
        where p and q are the context distributions of the words being compared and x ranges over contexts. In his approach [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the probability of a context for a target word is defined as:
        <disp-formula><label>(2)</label><tex-math><![CDATA[p(\langle w_1, w_2 \rangle) = p(\langle c(w_1), c(w_2) \rangle)\, p(w_1 \mid c(w_1))\, p(w_2 \mid c(w_2))]]></tex-math></disp-formula>
        where c(w<sub>1</sub>) and c(w<sub>2</sub>) denote the PoS clusters of the words w<sub>1</sub> and w<sub>2</sub>, respectively.
      </p>
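      <p>As an illustration of Equation 1, the following minimal Python sketch computes the KL divergence between two context distributions stored as dictionaries that map (left word, right word) pairs to probabilities. The function name kl_divergence, the toy distributions and the small floor eps used for contexts that are missing from q are assumptions made for this example; they are not part of Clark's algorithm.</p>
      <preformat><![CDATA[
```python
import math


def kl_divergence(p, q, eps=1e-10):
    """D(p || q) = sum over contexts x of p(x) * log(p(x) / q(x)).

    p and q map contexts, i.e. (left_word, right_word) pairs, to probabilities.
    eps is an assumed floor for contexts missing from q (not from the paper).
    """
    total = 0.0
    for context, p_x in p.items():
        if p_x == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        q_x = max(q.get(context, 0.0), eps)
        total += p_x * math.log(p_x / q_x)
    return total


# Toy context distributions (illustrative values only)
p = {("the", "is"): 0.5, ("a", "was"): 0.5}
q = {("the", "is"): 0.25, ("a", "was"): 0.75}

print(kl_divergence(p, p))  # identical distributions give 0.0
print(kl_divergence(p, q))
```
]]></preformat>
      <p>Note that the KL divergence is not symmetric, so the order of the two arguments matters.</p>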
      <p>The algorithm requires the number of clusters K to be specified in advance. In addition to the K clusters, one spare cluster is employed that contains all unclustered words. During each iteration, the word in the spare cluster that has the minimum KL divergence to one of the K clusters is chosen and moved to that cluster. For each cluster, its context distribution is computed as the averaged distribution of all words in the cluster. In addition, the KL divergences between clusters are computed after each iteration, and clusters are merged if their divergence is below a manually set threshold.</p>
      <p>We set K=77, the number of tags defined in the CLAWS tagset used for tagging the BNC (British National Corpus). We used the same number of clusters for Turkish and German. The final clusters show that the PoS clusters correspond to the major syntactic categories. The system finds PoS clusters that can be identified as proper nouns, verbs in past tense form, verbs in present continuous form, nouns, adjectives, adverbs, and so on.</p>
      <sec id="sec-2-1">
        <title>Some sample clusters are given below for English:</title>
        <p>Cluster 1: much far badly deeply strongly thoroughly busy rapidly slightly heavily neatly
widely closely easily profoundly readily eagerly</p>
        <p>Cluster 2: made found held kept bought heard played left passed finished lost changed etc</p>
      </sec>
      <sec id="sec-2-2">
        <p>Cluster 3: should may could would will might did does etc</p>
        <p>Cluster 4: working travelling flying fighting running moving playing turning etc</p>
        <p>Cluster 5: people men women children girls horses students pupils staff families etc</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Inducing Morphological Paradigms</title>
      <p>After the induction of the syntactic categories, the conditional probability of each morpheme x given its PoS cluster c, p(x | c), is computed. The possible morphemes in each PoS cluster are found by splitting each word in the cluster at all split points, and the morphemes are then ranked according to their maximum likelihood estimates in that PoS cluster.</p>
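      <p>The candidate generation and ranking step can be sketched as follows. This is a minimal illustration rather than the implementation used in our experiments: the function name rank_morphemes is hypothetical, only word endings are treated as candidate morphemes, and the estimate of p(m | c) is assumed to be the fraction of words in the cluster that end with m, since the exact normalisation is not spelled out above.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def rank_morphemes(cluster_words):
    """Rank candidate endings of one PoS cluster by an estimate of p(m | c).

    Every word is split at every split point and each resulting ending is a
    candidate morpheme. Assumption: p(m | c) is estimated as the fraction of
    words in the cluster that end with m.
    """
    words = set(cluster_words)
    counts = Counter()
    for word in words:
        for i in range(1, len(word)):  # split points that leave a non-empty stem
            counts[word[i:]] += 1
    estimates = {m: n / len(words) for m, n in counts.items()}
    # Highest estimate first; ties are broken alphabetically
    return sorted(estimates.items(), key=lambda item: (-item[1], item[0]))


cluster = ["working", "running", "moving", "playing", "turning"]
ranked = rank_morphemes(cluster)
print(ranked[:3])
```
]]></preformat>
      <p>In this toy cluster the endings 'g', 'ng' and 'ing' are shared by every word and therefore receive the highest estimate, which shows why a further step is needed to choose among competing candidates.</p>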
      <p>The highest-ranked morphemes are listed in Table 1 for English, German, and Turkish.</p>
      <p>This ranking is used to eliminate potential non-morphemes with a low conditional probability, hence reducing the search space. In the next step, morphemes across PoS clusters are incrementally merged, forming the basis of the paradigm capturing mechanism. In each iteration, the morpheme pair across two different PoS clusters with the highest number of common stems is chosen for merging. Once a morpheme pair is merged, the words that belong to this newly formed paradigm are removed from their respective PoS clusters. Once a word is assigned to a paradigm, it cannot be part of any other paradigm. Thus, we postulate that a word can only belong to a single morphological paradigm.</p>
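      <p>The selection of a morpheme pair with the highest number of common stems can be sketched as below. The sketch makes several simplifying assumptions that are not taken from the text: clusters are plain sets of words, morphemes are suffix strings, the null morpheme is written as the empty string, and the helper names stems_for and best_pair are hypothetical.</p>
      <preformat><![CDATA[
```python
def stems_for(words, morpheme):
    """Stems of the words that end with the given morpheme."""
    return {w[: len(w) - len(morpheme)] for w in words
            if w.endswith(morpheme) and len(w) != len(morpheme)}


def best_pair(words1, morphemes1, words2, morphemes2):
    """Return (m1, m2, common_stems) maximising the number of shared stems."""
    best = (None, None, set())
    for m1 in morphemes1:
        stems1 = stems_for(words1, m1)
        for m2 in morphemes2:
            common = stems1.intersection(stems_for(words2, m2))
            if len(common) > len(best[2]):
                best = (m1, m2, common)
    return best


# Two toy PoS clusters with their surviving candidate morphemes
past = {"played", "passed", "changed"}
progressive = {"playing", "passing", "moving"}

m1, m2, common = best_pair(past, ["ed", "d"], progressive, ["ing", "g"])
print(m1, m2, sorted(common))
```
]]></preformat>
      <p>After such a pair is chosen, the words formed from the common stems would be removed from both clusters and attached to the new paradigm, so that no word ends up in two paradigms.</p>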
      <p>
        Since, in our current framework, morphemes are tied to PoS clusters, our definition of a paradigm deviates from that of Goldsmith [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] in that a paradigm is a list of morpheme/cluster pairs, i.e. φ = {m<sub>1</sub>/c<sub>1</sub>, …, m<sub>n</sub>/c<sub>n</sub>}. Associated with each paradigm is a list of stems, i.e. the stems that can combine with each of the morphemes m<sub>i</sub> to produce a word belonging to the PoS category c<sub>i</sub>.
      </p>
      <p>Algorithm 1 describes the complete paradigm capturing process. Some sample paradigms captured are given below:</p>
      <p>English:
ed ing : reclaim aggravat hogg trimm expell administer divert register stimulat shap rehabilitat
exempt stiffen spar deceiv contaminat disciplin implement stabiliz feign mistreat extricat mimick
alert seal etc
s d : implicate ditche amuse overcharge equate despise torpedoe curse plie supersede preclude
snare tangle eclipse relinquishe ambushe reimburse alienate conceive vetoe waive envie negotiate
diagnose etc
er ing : brows wring worship cropp cater stroll zipp moneymak tun chok hustl angl windsurf
swindl cricket painkill climb heckl improvis scream scaveng panhandl lawmak bark clean lifesav
beekeep toast matchmak bodybuild etc
e ed : subsid liquidat redecorat exorcis amputat fertiliz reshap regulat foreclos infring eradicat
reverberat chim centralis restructur crippl rehabilitat symbolis reinstat etc
ly er : dark cheap slow quiet fair light high poor rich cool quick broad deep bright calm crisp
mild clever etc
0 s : benchmark instrument pretzel wheelchair scapegoat spike infomercial catastrophe beard
paycheck reserve abduction</p>
      <p>Turkish:
i e : zemin faaliyetin torenler secim incelemeler eyalet nem takvim makineler yontemin becerisin
gorusmeler teknigin merkezin iklim goruntuler etc
i a : cevab bakimin mektuplar esnaf olayin akisin miktar kayd yasamay bulgular sular masraflarin
heyecanin kalan haklarin anlamin etc
i in : sanayiin degerlerin esin denizler duman teminat erkekler kurullarin birbirin vatandaslarimiz
gelismesin milletvekillerin partisin
de e : bolgesin duzeyin yonetimin dergisin sektorun birimlerin bolgelerin tumun bolumlerin
tesislerin donemin kongresin evin etc
mesi en : izlen yurutul degis uretil gerceklestiril desteklen gelistiril etc
i 0 : iman cekim mahkemelerin orneklem gaflet yazman sanat trendler mahalleler eviniz hamamlar
piller ogretim olimpiyat</p>
      <p>German:
r n : kurze ehemalige eidgenoessische professionelle erste bescheidene ungewoehnliche ethnische
unbekannte besondere nationalsozialistische deutsche
e en : praechtig gesichert dauerhaft bescheiden vereinbart biologisch natuerlich oekumenisch
kantonal unterirdisch wissenschaftlich nahegelegen chinesisch
t en : funktionier konkurrier schneid mitwirk ansteig plaedier pfeif aufklaer schluck ausgleich
weitermach abhol ankomm spazier speis aussteig aufhoer
er ung : versteiger unterdrueck erneuer vermarkt beschleunig besetz geschaeftsfuehr
wirtschaftsfoerder finanzverwalt verhandl
s 0 : potential instrument flohmarkt vorhang pilotprojekt idol rechner thriller ensemble
bebauungsplan empfinden defekt aufschwung</p>
      <p>Algorithm 1 Algorithm for paradigm capturing using syntactic categories</p>
      <preformat>1: Apply unsupervised PoS clustering to the input corpus
2: For each PoS cluster c and morpheme m, compute maximum likelihood estimates of p(m | c)
3: Keep all m (in c) with p(m | c) &gt; t, where t is a threshold
4: repeat
5:   for all PoS clusters c1, c2 do
6:     Pick morphemes m1 in c1 and m2 in c2 with the highest number of common stems
7:     Store φ = {m1/c1, m2/c2} as the new paradigm
8:     Remove all words in c1 with morpheme m1 and associate these words with φ
9:     Remove all words in c2 with morpheme m2 and associate these words with φ</preformat>
      <p>For capturing more general paradigms, paradigm merging is performed. We rank potential paradigms by the ratio of common stems to the total number of stems captured by the paradigm. More precisely, given paradigms φ<sub>1</sub> and φ<sub>2</sub>, let P be the total number of common stems. Let N<sub>1</sub> be the total number of stems in φ<sub>1</sub> that are not present in φ<sub>2</sub>. Similarly, let N<sub>2</sub> be the total number of stems in φ<sub>2</sub> that are not present in φ<sub>1</sub>. Then, we can define the expected paradigm accuracy of φ<sub>1</sub> with respect to φ<sub>2</sub> by:
        <disp-formula><label>(3)</label><tex-math><![CDATA[Acc_1 = \frac{P}{P + N_1}]]></tex-math></disp-formula>
        Acc<sub>2</sub> is defined analogously.</p>
      <p>We use the average of Acc<sub>1</sub> and Acc<sub>2</sub> to compute the combined (averaged) expected accuracy of the merged paradigms φ<sub>1</sub>, φ<sub>2</sub>:
        <disp-formula><label>(4)</label><tex-math><![CDATA[Acc(\phi_1, \phi_2) = \frac{\frac{P}{P + N_1} + \frac{P}{P + N_2}}{2}]]></tex-math></disp-formula>
      </p>
      <p>During each iteration, all the paradigm pairs having an expected accuracy greater than a given
threshold value are merged. Once two paradigms are merged, stems that occur in only one of the
paradigms inherit the morphemes from the other paradigm. This mechanism helps create a more
general paradigm and helps recover missing word forms. Thus, although some of the word forms
do not exist in the corpus, it becomes possible to capture these forms.</p>
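      <p>The merging criterion of Equations 3 and 4 can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions: a paradigm is represented as a pair (set of morphemes, set of stems) with morphemes as plain strings rather than morpheme/cluster pairs, and the names expected_accuracy and merge as well as the toy data are hypothetical.</p>
      <preformat><![CDATA[
```python
def expected_accuracy(stems1, stems2):
    """Averaged expected accuracy of merging two paradigms (Equations 3 and 4)."""
    common = len(stems1.intersection(stems2))  # P
    only1 = len(stems1.difference(stems2))     # N1
    only2 = len(stems2.difference(stems1))     # N2
    if common == 0:
        return 0.0
    acc1 = common / (common + only1)
    acc2 = common / (common + only2)
    return (acc1 + acc2) / 2


def merge(paradigm1, paradigm2):
    """Union of morphemes and stems, so every stem inherits every morpheme."""
    return (paradigm1[0].union(paradigm2[0]), paradigm1[1].union(paradigm2[1]))


T = 0.75  # merging threshold on the expected accuracy

paradigm1 = ({"ed", "ing"}, {"play", "pass", "walk", "jump"})
paradigm2 = ({"s"}, {"play", "pass", "walk"})

accuracy = expected_accuracy(paradigm1[1], paradigm2[1])
print(accuracy)
if accuracy > T:
    merged = merge(paradigm1, paradigm2)
    print(sorted(merged[0]), sorted(merged[1]))
```
]]></preformat>
      <p>In this toy example the stem 'jump' occurs only in the first paradigm; after merging it also inherits the morpheme 's', which corresponds to recovering a word form that was not observed with that stem.</p>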
      <p>Some example paradigms that are found by the system are given below:</p>
      <p>English:
es ing e ed : sketch chew nipp debut met factor profit occurr err trudg participat necessitat
stomp streak siphon stroll sprint drizzl firm climax gestur whipp roll tripp stemm dangl shuffl
kindl broker chalk latch rippl collaborat chok summ propp pedal paralyz parad plough cramm
slack wad saddl conjur tipp gallop totall catalogu bundl barg whittl retaliat straighten tick peek
jabb slimm
s ing ed 0 : benchmark mothball weed snicker thread queue jack paw yacht implement import
bracket whoop conflict spoof stunt bargain honor bird fingerprint excerpt handcuff veil comment</p>
      <p>Turkish:
u a e i : yapabileceklerin kredisin hizmetleri'n sevdikleriniz yeter' transferlerin sevkin elimiz
tehlikelerin sas mucizey tehditlerin bakir muhasebesin ed gayrimenkuller ecevit' defterim izlemelerin
tescilin minarey tahsilin lastikler yerlestirmey
i lar li in : ruhsat semt ikilem reaksiyonlar harc tip prim gidilmis kaldirmis degistirmis
bulunmayacak aktarmis bulunacak kapanacak yazilabilecek devredilmis degisecek gelmemis</p>
      <p>German:
er 0 e en : kassiert beguenstigt eingeholt genuegt angelastet beruehrt beinhaltet
zurueckgegeben beschleunigt initiiert abgestellt bewirkt mitgenommen abgebrochen beruhigt besichtigt
te ung er ten t en lich e : fahr gebrauch blockier identifizier studier entfalt gestalt agier
passier sprech berat tausch kauf such weck beug erreich bearbeit beobacht erleid ueberrasch halt
helf oeffn pruef uebertreff bezahl spring fuell toet
0 te t er : lichtenberg limburg hill trier elmshorn dreieich praunheim heusenstamm
heddernheim hellersdorf schmitt muehlheim lueneburg kassel schluechtern preungesheim rodgau bieber
osnabrueck rodheim muenchen london lissabon seoul wedding treptow</p>
    </sec>
    <sec id="sec-4">
      <title>Morphological Segmentation</title>
      <p>For Morpho Challenge 2009, we first clustered all the words in the given corpora, thereby creating a set of PoS clusters. We then followed the steps described in the previous sections to induce the morphological paradigms.</p>
      <p>The word lists provided in Morpho Challenge 2009 contain the words that need to be segmented. To assign a PoS cluster to a given word w from the word list, the context distribution of w is first computed. The word is then assigned to the PoS cluster with the minimal KL divergence. In this case, we only consider words with a frequency greater than 10 to eliminate noise.</p>
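      <p>The cluster assignment for a word from the word list can be sketched as follows. The sketch is illustrative only: the names kl_divergence, assign_cluster and MIN_FREQUENCY are hypothetical, the divergence is assumed to be taken from the word's distribution to the cluster's distribution, and a small floor eps is assumed for contexts unseen in a cluster.</p>
      <preformat><![CDATA[
```python
import math

MIN_FREQUENCY = 10  # only words occurring more often than this are clustered


def kl_divergence(p, q, eps=1e-10):
    """D(p || q) over contexts; eps is an assumed floor for unseen contexts."""
    return sum(p_x * math.log(p_x / max(q.get(x, 0.0), eps))
               for x, p_x in p.items() if p_x != 0.0)


def assign_cluster(word_dist, frequency, cluster_dists):
    """Return the name of the closest cluster, or None for words that are too rare."""
    if frequency <= MIN_FREQUENCY:
        return None
    return min(cluster_dists,
               key=lambda name: kl_divergence(word_dist, cluster_dists[name]))


# Toy context distributions of two clusters and of one word (illustrative values)
clusters = {
    "verbs": {("to", "the"): 0.9, ("the", "is"): 0.1},
    "nouns": {("the", "is"): 0.8, ("to", "the"): 0.2},
}
word = {("the", "is"): 0.7, ("to", "the"): 0.3}

print(assign_cluster(word, 25, clusters))
print(assign_cluster(word, 3, clusters))
```
]]></preformat>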
      <p>To segment a word in the word lists, we first check whether it exists in one of the existing paradigms. We followed different algorithms for known, unknown and compound words, as described below.</p>
      <sec id="sec-4-1">
        <title>Handling known words</title>
        <p>If the word exists in one of the paradigms, it is segmented by using the morphemes of the paradigm in which the word is found. For example, suppose the following paradigm exists:</p>
        <p>s ing ed 0 : benchmark mothball weed snicker thread queue jack paw yacht implement import bracket whoop</p>
        <p>If the word 'importing' is to be morphologically analyzed, it is automatically segmented as 'import' + 'ing' by using the morpheme 'ing'.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Handling unknown words</title>
        <p>If the word does not exist in any of the paradigms, a sequence of segmentation rules is applied. Using the paradigms, we created a morpheme dictionary to split the words that do not belong to any paradigm. For each paradigm, all of its morphemes are included in the morpheme dictionary if their initial letters are not all the same. If the initial letters of all the morphemes in a paradigm are the same, only the longest morpheme is included in the dictionary. Using the morpheme dictionary, the word is scanned from the rightmost letter to check whether any of its endings exists in the dictionary. The longest ending of the word that exists in the dictionary is chosen to split the word. The same process is repeated on the remainder after each split until no further split can be applied.</p>
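        <p>The construction of the morpheme dictionary and the right-to-left splitting can be sketched as below. This is a minimal illustration under stated assumptions: paradigms are given as lists of suffix strings, the null morpheme is written as the empty string and ignored, a split must leave a non-empty remainder, and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
def build_morpheme_dictionary(paradigms):
    """Collect morphemes from paradigms given as lists of suffix strings.

    If all morphemes of a paradigm start with the same letter, only the longest
    one is kept; otherwise all of them are kept. Null morphemes ("") are skipped.
    """
    dictionary = set()
    for morphemes in paradigms:
        non_empty = [m for m in morphemes if m]
        if not non_empty:
            continue
        initials = {m[0] for m in non_empty}
        if len(initials) == 1:
            dictionary.add(max(non_empty, key=len))
        else:
            dictionary.update(non_empty)
    return dictionary


def segment_unknown(word, dictionary):
    """Strip the longest dictionary ending repeatedly, from right to left."""
    suffixes = []
    rest = word
    while True:
        # A candidate must be a proper ending, so the remainder stays non-empty
        candidates = [m for m in dictionary
                      if rest.endswith(m) and len(m) != len(rest)]
        if not candidates:
            break
        longest = max(candidates, key=len)
        suffixes.insert(0, longest)
        rest = rest[: len(rest) - len(longest)]
    return [rest] + suffixes


morpheme_dictionary = build_morpheme_dictionary(
    [["s", "ing", "ed", ""], ["er", "ing"], ["ly", "er"]])
print(sorted(morpheme_dictionary))
print(segment_unknown("quickly", morpheme_dictionary))
print(segment_unknown("walkers", morpheme_dictionary))
```
]]></preformat>
        <p>Because the procedure keeps stripping endings as long as any dictionary entry matches, it can over-segment words whose stems happen to end in a frequent morpheme.</p>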
      </sec>
      <sec id="sec-4-3">
        <title>Handling compounds</title>
        <p>For compounds, such as 'hausaufgaben' in German or 'railway' in English, a recursive approach is applied to both known and unknown words. The compounding rules split a word recursively from the rightmost end to the left. If an ending sequence of letters exists as a word in the corpus, the sequence is split off, and the same procedure is repeated until no remaining ending is itself a valid word in the corpus. When there are multiple matches, the longest match is chosen. This recursive search is also able to find prefixes, as it searches for the valid sub-words within the words.</p>
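        <p>A minimal sketch of the right-to-left compound splitting is given below. It is an illustration under assumptions that go beyond the description above: the vocabulary is a plain set of corpus word forms, the parameter min_len (a lower bound on the length of a sub-word) is introduced here only to avoid very short accidental matches, and sub-words that have been split off are not split any further.</p>
        <preformat><![CDATA[
```python
def split_compound(word, vocabulary, min_len=3):
    """Split a word into sub-words from the rightmost end, longest match first.

    vocabulary is the set of word forms seen in the corpus. min_len is an
    assumed lower bound on sub-word length (not taken from the paper).
    """
    parts = []
    rest = word
    while True:
        match = None
        # Start at index 1 so the left part is never empty; a smaller start
        # index means a longer ending, so the first hit is the longest match.
        for start in range(1, len(rest) - min_len + 1):
            if rest[start:] in vocabulary:
                match = start
                break
        if match is None:
            break
        parts.insert(0, rest[match:])
        rest = rest[:match]
    return [rest] + parts


vocabulary = {"rail", "way", "haus", "auf", "gaben", "aufgaben"}
print(split_compound("railway", vocabulary))
print(split_compound("hausaufgaben", vocabulary))
```
]]></preformat>
        <p>The left-over part on the left does not have to be a corpus word itself, which is how prefixes can surface as separate segments.</p>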
        <p>The algorithm for segmenting the words is given in Algorithm 2.</p>
        <p>Algorithm 2 Morphological Segmentation</p>
        <preformat>1: for all given words w to be segmented do
2:   if w already exists in a paradigm φ then
3:     Split w using φ as w = u + m
4:   else
5:     u = w
6:   end if
7:   If possible, split u recursively from the rightmost end by using the morpheme dictionary as u = s1 + ... + sn; otherwise s1 = u
8:   If possible, split s1 into its sub-words recursively from the rightmost end as s1 = w1 + ... + wn
9: end for</preformat>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <sec id="sec-5-1">
        <title>Datasets</title>
        <p>The model was evaluated in the Morpho Challenge 2009 competition. Here we describe the datasets we used and our model parameters, and finally give our evaluation results.</p>
        <p>We used the datasets provided by Morpho Challenge 2009 and the Cross Language Evaluation Forum (CLEF) to train our system on three different languages: English, German, and Turkish. For PoS clustering, we used the corpora provided by Morpho Challenge 2009. For the clustering of the words in the word lists to be segmented, we used the datasets supplied by the CLEF organization. We used the CLEF dataset to obtain context distributions of the words in German and in English. For Turkish, we used a manually collected set of newspaper archives.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Model Parameters</title>
        <p>Our model is unsupervised, but it requires two parameters to be set manually. First, the threshold t on the conditional probability of a morpheme given its PoS cluster, p(m | c), needs to be fixed. We tested different values of this parameter for each language to find a suitable value through trial and error, and we set t = 0.1. Thus, only morphemes m with p(m | c) &gt; 0.1 were considered. Second, the threshold T on the expected accuracy of merging two paradigms φ<sub>1</sub>, φ<sub>2</sub>, given in Equation 4, needs to be set. Smaller values of this threshold lead to bigger paradigms with more stems, but they decrease the accuracy. Several experiments were performed to find its optimum value for the different languages, and a value of T = 0.75 was chosen. Once set, both thresholds t and T were left unchanged across all experiments reported in this paper.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Evaluation &amp; Results</title>
        <p>The system was evaluated in Competition 1 of Morpho Challenge 2009. Precision is calculated
by sampling pairs of words with the same morpheme(s) from the system output and checking
this against the gold standard. Recall is calculated in a similar way but this time pairs of words
with the same morpheme(s) are sampled from the gold standard and checked against the system
output. F-measure is the harmonic mean of the precision, and the recall:</p>
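        <p>
          <disp-formula><label>(5)</label><tex-math><![CDATA[\mathrm{F\text{-}measure} = \frac{2}{\frac{1}{\mathrm{Precision}} + \frac{1}{\mathrm{Recall}}}]]></tex-math></disp-formula>
        </p>
        <p>Evaluation results corresponding to the English language are given in Table 2.</p>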
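        <p>As a small worked illustration of the harmonic mean of precision and recall, the following Python sketch computes the F-measure for a pair of values. The function name f_measure and the numbers used are purely illustrative and are not results from our experiments.</p>
        <preformat><![CDATA[
```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall: 2 / (1/P + 1/R)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2 / (1 / precision + 1 / recall)


# Illustrative values only: high precision combined with low recall
print(f_measure(0.5, 0.5))
print(f_measure(0.8, 0.4))
```
]]></preformat>
        <p>The harmonic mean is dominated by the smaller of the two values, so a system with high precision but low recall still receives a modest F-measure.</p>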
        <p>Evaluation results corresponding to the German language are given in Table 3. We conducted two different experiments for German. In the first experiment, we used only the compounding rules (see Section 5.3, "Handling compounds") for the German word list, since German makes heavy use of compounds. The results in Table 3 show that the compounding rules alone have high precision but low recall. In the second experiment, we used the unsupervised model developed in this paper.</p>
        <p>Evaluation results for Turkish are given in Table 4. Two different experiments were conducted for Turkish. In the first experiment, a validity check was performed while splitting the word recursively to decide whether to split the word. The validity check simply tests the membership of the remaining word in the Turkish corpus: if the rest of the word after splitting off one morpheme exists in the corpus, the validity condition is assumed to be met. In the second experiment, no validity check was performed. Instead, the morpheme dictionary was used. The morpheme dictionary is constructed from the learnt morphological paradigms by extracting all of their morphemes. The word is split recursively from the rightmost end by matching its endings against the morphemes in the morpheme dictionary. Our experiments show that the precision is higher when a validity check is performed, but the recall is reduced. Since the Turkish dataset does not include all the forms of every word, the validity check is not reliable, leading to a lower recall.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion &amp; Future Work</title>
      <p>In this paper, we have developed a model for unsupervised morphology learning that exploits the PoS categories induced by an unsupervised PoS tagger. To our knowledge, there has been very limited work on the combined learning of syntactic categories and morphology. Our results demonstrate that it is meaningful to use syntactic category information for morphology learning. One problem with the current approach is that it requires a large corpus for PoS clustering. If a word does not have enough context information due to the corpus size, it cannot be clustered. The system then segments such a word by using just the morpheme dictionary, which in turn leads to inaccurate segmentations. The current system also requires the manual setting of some thresholds. Furthermore, the system is very sensitive to these thresholds.</p>
      <p>In the near future, we plan to address the above issues with the current model. In particular,
we are interested in generative models for joint learning of morphology and PoS categories.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Stephan</given-names>
            <surname>Bordag</surname>
          </string-name>
          .
          <article-title>Unsupervised and knowledge-free morpheme segmentation and analysis</article-title>
          .
          <source>In Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum (CLEF)</source>
          , pages
          <fpage>881</fpage>
          –
          <lpage>891</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Michael R.</given-names>
            <surname>Brent</surname>
          </string-name>
          .
          <article-title>Minimal generative models: A middle ground between neurons and triggers</article-title>
          .
          <source>In Proceedings of the 5th International Workshop on Artificial Intelligence and Statistics</source>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Michael R.</given-names>
            <surname>Brent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sreerama K.</given-names>
            <surname>Murthy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Lundberg</surname>
          </string-name>
          .
          <article-title>Discovering morphemic suffixes: a case study in MDL induction</article-title>
          .
          <source>In Fifth International Workshop on AI and Statistics</source>
          , pages
          <fpage>264</fpage>
          –
          <lpage>271</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Clark</surname>
          </string-name>
          .
          <article-title>Combining distributional and morphological information for part of speech induction</article-title>
          .
          <source>In Proceedings of the 10th Annual Meeting of the European Association for Computational Linguistics (EACL)</source>
          , pages
          <fpage>59</fpage>
          –
          <lpage>66</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Clark</surname>
          </string-name>
          .
          <article-title>Inducing syntactic categories by context distribution clustering</article-title>
          .
          <source>In The Fourth Conference on Natural Language Learning (CoNLL)</source>
          , pages
          <fpage>91</fpage>
          –
          <lpage>94</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Mathias</given-names>
            <surname>Creutz</surname>
          </string-name>
          .
          <article-title>Unsupervised segmentation of words using prior distributions of morph length and frequency</article-title>
          .
          <source>In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics (ACL)</source>
          , pages
          <fpage>280</fpage>
          –
          <lpage>287</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Mathias</given-names>
            <surname>Creutz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Krista</given-names>
            <surname>Lagus</surname>
          </string-name>
          .
          <article-title>Unsupervised discovery of morphemes</article-title>
          .
          <source>In Proceedings of the ACL workshop on Morphological and phonological learning</source>
          , pages
          <fpage>21</fpage>
          –
          <lpage>30</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Herve</given-names>
            <surname>Dejean</surname>
          </string-name>
          .
          <article-title>Morphemes as necessary concept for structures discovery from untagged corpora</article-title>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>John</given-names>
            <surname>Goldsmith</surname>
          </string-name>
          .
          <article-title>Unsupervised learning of the morphology of a natural language</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>27</volume>
          (
          <issue>2</issue>
          ):
          <fpage>153</fpage>
          –
          <lpage>198</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>John</given-names>
            <surname>Goldsmith</surname>
          </string-name>
          .
          <article-title>An algorithm for the unsupervised learning of morphology</article-title>
          .
          <source>In Natural Language Engineering</source>
          , volume
          <volume>12</volume>
          , pages
          <fpage>353</fpage>
          –
          <lpage>371</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Zellig Sabbettai</given-names>
            <surname>Harris</surname>
          </string-name>
          .
          <article-title>Distributional structure</article-title>
          .
          <source>Word</source>
          ,
          <volume>10</volume>
          (
          <issue>2–3</issue>
          ):
          <fpage>146</fpage>
          –
          <lpage>162</lpage>
          ,
          <year>1954</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Yu</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Matveeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldsmith</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Sprague</surname>
          </string-name>
          .
          <article-title>Using morphology and syntax together in unsupervised learning</article-title>
          .
          <source>In Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition</source>
          , pages
          <fpage>20</fpage>
          –
          <lpage>27</lpage>
          ,
          <month>June</month>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Fred</given-names>
            <surname>Karlsson</surname>
          </string-name>
          .
          <article-title>Finnish grammar</article-title>
          . WSOY, Juva,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Monson</surname>
          </string-name>
          .
          <article-title>Paramor: From Paradigm Structure to Natural Language Morphology Induction</article-title>
          .
          <source>PhD thesis</source>
          , Language Technologies Institute, School of Computer Science, Carnegie Mellon University,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Schone</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <article-title>Knowledge-free induction of morphology using latent semantic analysis</article-title>
          .
          <source>In Proceedings of CoNLL-2000 and LLL-2000</source>
          , pages
          <fpage>67</fpage>
          –
          <lpage>72</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Matthew G.</given-names>
            <surname>Snover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gaja E.</given-names>
            <surname>Jarosz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael R.</given-names>
            <surname>Brent</surname>
          </string-name>
          .
          <article-title>Unsupervised learning of morphology using a novel directed search algorithm: taking the first step</article-title>
          .
          <source>In Proceedings of the ACL workshop on Morphological and phonological learning</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>