<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Hand-Written Rewrite Rules to Induce Underlying Morphology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael A. Tepper</string-name>
          <email>mtepper@u.washington.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Washington</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Allomorphic variation, that is, variation in form among morphemes that share the same referential meaning, is often cited as a stumbling block for unsupervised morphological induction. To address this problem directly, we present a hybrid approach that uses a small amount of linguistic knowledge, in the form of orthographic rewrite rules, to refine an existing segmentation. Our goal is to learn when surface morphs (the units of the segmentation) should be counted together as the same underlying morpheme. To do this, we customize the Morfessor algorithm and model developed by Mathias Creutz and Krista Lagus, adding segmentation analyses generated by the orthographic rewrite rules along with a statistical framework that predicts when those analyses should be used as underlying morphemes. An initial segmentation produced by Morfessor Categories-MAP 0.9.2 serves as input. Proposing underlying morphemes currently requires a set of language-specific orthographic rules. Although we are not officially part of the Challenge competition, we report contest F-measures of 62.22% for English and 54.83% for Turkish, which amount to improvements of 2% and 48%, respectively, over the top unsupervised entrants for those languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Morphological induction</kwd>
        <kwd>Allomorphic variation</kwd>
        <kwd>Knowledge-lite</kwd>
        <kwd>Word segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body />
  <back>
    <ref-list />
  </back>
</article>