<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Twelve Years of Unsupervised Dependency Parsing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David Mareček</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics Charles University in Prague</institution>
          ,
          <addr-line>Malostranské nám. 25, 118 00, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1649</volume>
      <fpage>56</fpage>
      <lpage>62</lpage>
      <abstract>
        <p>In the last 12 years, the field of unsupervised dependency parsing has made considerable progress. Individual approaches, however, sometimes differ in their motivation and in the definition of the problem. Some of them allow the use of resources that others forbid, treating them as a kind of supervision. The goal of this paper is to define the variants of the unsupervised dependency parsing problem and to show their motivation, their progress, and the best results achieved. We also discuss the usefulness of unsupervised parsing in general, both for formal linguistics and for applications.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Dependency parsing is one of the traditional tasks in
natural language processing. It takes a tokenized sentence as
input (in most cases, the individual tokens (words) are labelled
with part-of-speech (POS) tags) and produces a rooted
dependency tree, in which the nodes correspond to words and
the edges correspond to syntactic relations between the words.</p>
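      <p>As a minimal illustration (the sentence, tags, and head indices below are a made-up example, not taken from any treebank), such a tree can be represented simply as an array of head positions, with 0 standing for the artificial root:</p>
      <preformat>
# A parsed sentence as parallel lists: words, POS tags, and head positions.
# Position 0 is reserved for the artificial root node.
words = ["The", "dog", "barked"]
tags  = ["DET", "NOUN", "VERB"]
heads = [2, 3, 0]   # "The" -> "dog", "dog" -> "barked", "barked" -> root

def edges(heads):
    """Yield (dependent, governor) pairs using 1-based word positions."""
    for dep, gov in enumerate(heads, start=1):
        yield dep, gov

print(list(edges(heads)))   # [(1, 2), (2, 3), (3, 0)]
      </preformat>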
      <p>
        Rule-based approaches to dependency parsing were
superseded by statistical dependency parsers, which
achieve higher quality when evaluated against human
annotations. Important milestones in dependency parsing were
the CoNLL shared tasks in 2006 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and 2007 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. They
provided about 20 treebanks of different languages
in the same format, which has in fact remained the standard for
measuring the quality of dependency parsers up to
now.1
      </p>
      <p>At the same time, there have been efforts to develop
parsers that do not need any annotated data. Unsupervised
parsers infer dependency structures from language- and
tagset-independent properties of dependency trees, mainly
the low entropy of governor-dependent word pairs and the
low entropy of word fertilities (numbers of dependents).</p>
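      <p>To make this intuition concrete, the following sketch (an illustrative assumption, not a particular published algorithm) computes one such statistic, the conditional entropy of the dependent's POS tag given the governor's POS tag, over a set of hypothesized trees; grammar inference prefers hypotheses for which this entropy, and the analogous entropy of fertilities, is low:</p>
      <preformat>
import math
from collections import Counter

def dependent_entropy(trees):
    """H(dependent tag | governor tag) over trees given as lists of
    (dependent_tag, governor_tag) edges."""
    pair_counts = Counter()
    gov_counts = Counter()
    for tree in trees:
        for dep_tag, gov_tag in tree:
            pair_counts[(dep_tag, gov_tag)] += 1
            gov_counts[gov_tag] += 1
    total = sum(pair_counts.values())
    entropy = 0.0
    for (dep_tag, gov_tag), count in pair_counts.items():
        p_joint = count / total                  # p(dependent, governor)
        p_cond = count / gov_counts[gov_tag]     # p(dependent | governor)
        entropy -= p_joint * math.log2(p_cond)
    return entropy

# Two toy tree hypotheses over the same tags; the first is more regular.
regular   = [[("DET", "NOUN"), ("NOUN", "VERB")], [("DET", "NOUN"), ("NOUN", "VERB")]]
irregular = [[("DET", "NOUN"), ("NOUN", "VERB")], [("NOUN", "DET"), ("VERB", "NOUN")]]
print(dependent_entropy(regular), dependent_entropy(irregular))   # the first value is lower
      </preformat>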
      <p>One general motivation is to be able to parse languages
for which no annotated treebanks exist. A less compelling
motivation is to create dependency structures that better
suit a particular NLP application, e.g. machine
translation.</p>
      <p>
        This is a survey paper about unsupervised dependency
parsers. Since different approaches have different
motivations and allow different kinds of data and different
amounts of knowledge about them, they cannot be compared
directly because of their different degrees of (un)supervision. The
aim of this paper is to cluster the approaches into several
groups within which they are comparable and to present the most
important ones together with their results.
      </p>
      <p>
        1 In recent years, many researchers have been working on a project called
Universal Dependencies [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], a collection of treebanks for many languages
(51 treebanks and 40 languages in its current version 1.3), in which the
morphological and dependency annotation styles are unified across the
languages.
      </p>
      <p>The paper is structured as follows: In Section 2, we
define the different unsupervised parsing problem settings and
summarize their motivations and advantages. Section 3
describes the evaluation measures developed for
unsupervised parsers. In Section 4, we go through the work
done in this field and describe the most important
approaches. In Section 5, we compare the results across data,
parsers, and languages. Section 6 discusses the general
usefulness of unsupervised parsing methods in linguistics
and in applications. Section 7 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Problem Settings</title>
      <p>Different unsupervised parsing approaches use different kinds
of data for grammar inference. What is
used in one approach may be treated as a disallowed kind of supervision
in another. We therefore categorize the approaches into
four groups. They are described in the following
subsections and sorted from the least unsupervised (more data
and knowledge) to the most unsupervised (less data and
knowledge).</p>
      <sec id="sec-2-1">
        <title>2.1 Using supervised POS tags and some knowledge about them</title>
        <p>The first group comprises parsers that need
the sentences labelled with supervised POS tags, i.e. with a
manually designed tagset. On top of that, they also
somehow utilize knowledge about the tagset. For example,
they know which tags are used for verbs and therefore treat
them differently during grammar inference. This is
the main difference from the second group (Section 2.2),
and it is sometimes considered a bit of cheating: if we
know the meaning of the POS tags, we could easily build
a simple rule-based parser, which would definitely not be
unsupervised. This also relates to so-called delexicalized
parsing, where a parser is trained on a different language
with the same POS tagset and the model can then be used
for languages without treebanks; this is however beyond
the scope of this paper. The approaches assigned to
this group use only a small amount of such knowledge, which
helps the inferred structures to take the required shape.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Using supervised POS tags without any knowledge about them</title>
        <p>The majority of works on unsupervised
dependency parsing utilize supervised POS tags without any
knowledge about them. In other words, the parsers take
the POS tags only as labels without any meaning. It may seem
strange not to tell the parser anything, for example that
“ADJ tags are adjectives and often depend on the following
NOUN”, when such a possibility exists; however, allowing
it would move the parsers into the first group (Section 2.1),
whose unsupervisedness is sometimes disputed.</p>
        <p>Nevertheless, what is stranger still is the use of
supervised POS tags at all. POS tags carry a lot of
syntactic information: given a sequence of POS tags such as “ADJ
NOUN VERB PREP ADJ NOUN”, one could easily
build the most probable dependency tree by hand. The motivation
for this problem setting may be:
1. We want to compare supervised and unsupervised
parsers operating on the same tagset.
2. We want to evaluate an unsupervised parser now and, in
the future, use unsupervised word classes
instead of the supervised tags on low-resourced
languages, hoping that it will work equally well.
3. We have a language without a treebank and we have
a POS tagger, but we are not able to find out the
meaning of the POS tags used.</p>
        <p>The third option is rather hypothetical: we can always find
someone who speaks the language, or a parallel
corpus from which the basic meanings of individual
words and tags could be obtained.</p>
        <p>It is also worth mentioning that almost all the
experiments and evaluations in the literature were done using
gold-standard POS tags, i.e. POS tags assigned manually
by human annotators. This is not surprising: while the
quality of unsupervised parsers is substantially lower
than that of supervised parsers, it is not worthwhile to
run experiments with predicted POS tags as well.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Using unsupervised POS tags</title>
        <p>Less attention has been given to fully unsupervised
parsers using unsupervised POS tags (word classes). The
only resource they use is raw text. The motivation is obvious
here: if we want to analyze a language without any
manually annotated resources, we need exactly this approach.
Another motivation could be the need for structures
different from those present in annotated treebanks. The
majority of works here use the same parsers as for supervised
POS tags (Section 2.2) and obtain the unsupervised POS
tags from one of the best available word clustering tools.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4 Direct parsing from raw text without POS tags</title>
        <p>The last setting we describe is unsupervised parsing from
raw text. Here we do not use any POS tags or word
classes; the only units the parser works with are the words themselves.
In theory, the results should be comparable with the
previous category (Section 2.3), where unsupervised word clustering
is used. However, the word classes are typically inferred
from much larger text corpora than the dependency trees are.
These approaches therefore use much less data for the
inference, which is why we assign them to a separate
category. Such approaches would be the most elegant way of
parsing; however, they naturally achieve very poor results.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Parsing Evaluation</title>
      <p>
        The unsupervised parsing approaches sometimes differ
also in their evaluation metrics. The standard attachment score
is sometimes found too strict for evaluating the inferred
structures, and therefore new, more tolerant metrics have been
designed. The following three evaluation metrics exist:
1. Directed attachment score (unlabeled attachment
score2) is the standard metric for measuring
dependency parsing quality. It is the percentage of words
correctly attached to their parents. It does not allow even
the slightest local structural differences, which might
be caused merely by more or less arbitrary linguistic or
technical conventions.
2. Undirected attachment score disregards the
directions of edges and is therefore less biased towards
such conventions. For example, there is no difference
whether the parser attaches prepositions to nouns or
nouns to prepositions. Nevertheless, this holds for all
edges, including those with undoubted directions.
3. Neutral edge direction3 metric, proposed by [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], is
even more tolerant in assessing parsing errors than
the undirected attachment score. It treats not only
a node’s parent and child as the correct answer, but also
its grandparent.
      </p>
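      <p>As an informal sketch (not tied to any particular evaluation script), the first two scores can be computed from arrays of head positions as follows:</p>
      <preformat>
def attachment_scores(gold_heads, pred_heads):
    """Directed and undirected attachment scores for one sentence.
    Both arguments are lists of governor positions (0 = artificial root),
    indexed by dependent position 1..n."""
    n = len(gold_heads)
    directed = undirected = 0
    for dep, gov in enumerate(pred_heads, start=1):
        if gold_heads[dep - 1] == gov:
            directed += 1
            undirected += 1
        elif gov >= 1 and gold_heads[gov - 1] == dep:
            # the same edge exists in the gold tree with the opposite direction
            undirected += 1
    return directed / n, undirected / n

# Gold tree: 1->2, 2->3, 3->root; the predicted tree reverses the edge 2-3.
print(attachment_scores([2, 3, 0], [2, 0, 2]))   # (0.333..., 0.666...)
      </preformat>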
      <p>Even though these alternative scores were proposed and
are sometimes used, the majority of experiments are
evaluated with the directed attachment score, probably because
of its simplicity and the tradition in the field, and also
because the other two did not prove to be substantially better.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Unsupervised Dependency Parsers</title>
      <p>
        In this section, we summarize and describe the most
important works in the field of unsupervised dependency
parsing over the last 12 years. Even though there were
a couple of works before, the first paper with results better
than a chain baseline4 was the Dependency Model with
Valence by Klein and Manning [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>2 We do not use the abbreviation UAS for the unlabeled
attachment score here, since it could be mistaken for the undirected attachment
score.</p>
      <p>3 http://www.cs.huji.ac.il/~roys02/softwae/ned.html</p>
      <p>We first describe the methods using supervised POS
tags without any other knowledge (Section 2.2) in
Sections 4.1 and 4.2, and then switch to the other settings. A
detailed table with results over the different methods and
problem settings is shown in Section 5.</p>
      <sec id="sec-4-1">
        <title>4.1 Dependency Model with Valence</title>
        <p>
          We start with the Dependency Model with Valence (DMV),
introduced by Klein and Manning [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. It is
the most popular approach; it has been followed by many
other researchers and improved in many ways. It is a
generative model that generates dependency trees using two
submodels:
• Stop model pstop(·|tg, dir) represents the probability of
not generating another dependent in direction dir of
a node with POS tag tg. The direction dir can be left
or right. If pstop = 1, the node with the tag tg cannot
have any dependent in direction dir; if it is 1 in both
directions, the node is a leaf.
• Attach model pattach(td|tg, dir) represents the probability
that a dependent of the node with POS tag tg in
direction dir is labeled with POS tag td.
        </p>
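        <p>A minimal sketch of how these two submodels generate the dependents of a single head may make the process concrete; the probability tables below are made-up toy values, not learned DMV parameters, and the full model is more refined (e.g. the stop decision also depends on whether a dependent has already been generated in that direction):</p>
        <preformat>
import random

# Toy parameters; real DMV parameters are learned, e.g. with EM.
p_stop = {("VERB", "left"): 0.4, ("VERB", "right"): 0.3,
          ("NOUN", "left"): 0.5, ("NOUN", "right"): 0.8,
          ("DET", "left"): 1.0,  ("DET", "right"): 1.0}
p_attach = {("NOUN", "VERB", "left"): 0.7, ("DET", "VERB", "left"): 0.3,
            ("NOUN", "VERB", "right"): 0.9, ("DET", "VERB", "right"): 0.1,
            ("DET", "NOUN", "left"): 1.0,  ("NOUN", "NOUN", "right"): 1.0}

def generate_dependents(head_tag, direction):
    """Keep attaching dependents in one direction until the stop model fires."""
    dependents = []
    while random.random() > p_stop[(head_tag, direction)]:
        candidates = [d for (d, h, dr) in p_attach if h == head_tag and dr == direction]
        weights = [p_attach[(d, head_tag, direction)] for d in candidates]
        dependents.append(random.choices(candidates, weights=weights)[0])
    return dependents

print(generate_dependents("VERB", "left"))    # e.g. ['NOUN'] or ['DET', 'NOUN'] or []
print(generate_dependents("DET", "right"))    # always [] because p_stop is 1
        </preformat>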
        <p>
          The grammar, consisting of the probability distributions
pstop and pattach, is learned using the Expectation
Maximization inside-outside algorithm [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The learning
was further improved by Smith [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and Cohen et
al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Headden et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] introduce the Extended
Valence Grammar and add lexicalization and smoothing;
besides the POS tags, the parser begins to operate with word
forms as well. Blunsom and Cohn [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] use tree
substitution grammars, which allow learning of larger
dependency fragments by employing the Pitman-Yor
process. Spitkovsky et al. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] improve the inference using
iterated learning on increasingly longer sentences. Further
improvements are achieved by better handling of
punctuation [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] and by new “boundary” models [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ]. Spitkovsky et al.
also improve the learning itself in [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] and [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ].
        </p>
        <p>
          Mareček and Straka [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] use the so-called reducibility
principle to estimate the pstop probabilities for individual POS tags
from raw texts, add them to the Dependency Model with
Valence, and use Gibbs sampling to infer the grammar.
In [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], Mareček and Žabokrtský assume that function words, which can
be identified by their shortness, have a fixed low number of
dependents, and push the parsing results even a bit higher.
        </p>
        <p>4 In the left or right chain baseline, each word is attached to the next
or the previous one, respectively.</p>
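        <p>As a sketch (using the same head-position convention as in the introduction example), the two chain baselines amount to:</p>
        <preformat>
def left_chain(n):
    """Each word is attached to the following word; the last word hangs on the root."""
    return [i + 1 for i in range(1, n)] + [0]

def right_chain(n):
    """Each word is attached to the preceding word; the first word hangs on the root."""
    return [0] + list(range(1, n))

print(left_chain(4))    # [2, 3, 4, 0]
print(right_chain(4))   # [0, 1, 2, 3]
        </preformat>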
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Other approaches using a supervised POS tagset</title>
        <p>
          There are also approaches not based on DMV, even though
their models are not far from it. Mareček and
Žabokrtský [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] use a fertility model to model the number of children of
particular POS tags instead of the pstop model.
        </p>
        <p>
          Søgaard [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] explores a completely different view, in
which a dependency structure is, among other things, a
partial order on the nodes in terms of centrality or saliency.
        </p>
        <p>
          Cohen et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] perform the grammar inference
multilingually on several languages. The data do not need to be
parallel; they only have to share the tagset. The inference
is then less prone to being skewed towards bad solutions by the
differences between languages.
        </p>
        <p>
          Bisk and Hockenmaier [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] use Combinatory
Categorial Grammars for dependency structure induction.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Approaches using some knowledge about the</title>
      </sec>
      <sec id="sec-4-4">
        <title>POS tags</title>
        <p>
          The “less unsupervised” approaches utilizing external
knowledge of the POS tagset often reach better
attachment scores than the previous approaches. Any additional
knowledge about the tags used can be very strong and
can change the inferred structures dramatically. For
example, Naseem et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] follow Eisner [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and make use
of manually specified universal dependency rules such as
Verb→Noun or Noun→Adjective to guide grammar
induction, improving the results by a wide margin. Mareček
and Žabokrtský [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] show that merely the information that
“the POS tags for nouns are more frequent than the POS
tags for verbs” improves the baseline considerably. This
however fails, for example, when the POS tags for nouns
are subcategorized in some way; then we would need to
know which POS tags stand for nouns and group them
together. Rasooli and Faili [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] identify the last verb in the
sentence, minimize its probability of reduction, and push it
to the root position, which also yields a large improvement.
        </p>
        <p>Such approaches achieve better results; however, they
are useless for grammar induction in languages for which
no tagger is available.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.4 Approaches using unsupervised POS tags</title>
        <p>
          These approaches mostly do not bring any new methods.
The authors take the unsupervised parsers
presented in Section 4.1, use a word clustering tool to
produce unsupervised POS tags, and run their parsers on them.
Spitkovsky et al. [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] took the clustering tools by Clark [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and
Brown et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and showed that parsing with
supervised POS tags can be outperformed for English if the
word classes are used instead. Mareček [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] performed
similar experiments on 30 languages and showed that on
some of them the use of unsupervised word classes instead
of supervised POS tags improves the parsing accuracy; the
average score across the languages was however
significantly worse.
        </p>
        <p>
          Christodoulopoulos et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] attempt to infer the POS
tags and the dependency structure jointly. After a random
initialization, they alternate between predicting the structure
based on the POS tags and predicting the POS tags
based on the structure.
        </p>
      </sec>
      <sec id="sec-4-6">
        <title>4.5 Approaches using raw text only</title>
        <p>
          There are a couple of approaches that do not need any
word categorization. We mention only the incremental
parsing by Yoav Seginer [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. His algorithm collects lists
of labels for each word, based on neighboring words, and
then uses these labels directly to parse.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Results</title>
      <p>
        In Tables 1, 2, and 3, we summarize the results over the
individual parsers, data, and settings. Unfortunately,
different parsers were evaluated on different data. In the
beginning, the parsers were evaluated mainly on the
English Penn Treebank [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] (converted to dependencies),
and some only on short sentences of length up to 10
words (ptb10), since shorter sentences were easier to parse
and the resulting scores did not look so bad. See Table 1.
      </p>
      <p>After the unsupervised parsers were improved and
achieved much better results than simple baselines, they
started to be evaluated across languages and on sentences
of all lengths (Table 2).</p>
      <p>
        In 2012, there was a shared task on unsupervised
dependency parsing called “The PASCAL Challenge on
Grammar Induction” [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Seven competing parsers were
evaluated on new datasets comprising ten different
languages, including the simpler English used by small children
(CHILDES). See Table 3.
      </p>
      <p>Unfortunately, some of the parsers were evaluated on
non-standard data or with non-standard metrics, and
therefore their results could not be added to any of the three
tables.</p>
      <p>All the tables share the same format: each method is
labelled by a link to the references and by a group label: SP
for using supervised POS tags, UP for using unsupervised
POS tags, and SP+K when additional knowledge about
the supervised tags was used.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Usefulness of Unsupervised Parsers in Linguistics and Applications</title>
      <p>A lot of work has been done in the field of
unsupervised parsing in the last 12 years. The quality of the induced
structures is better than before, but supervised parsers
are still better than unsupervised ones by a wide
margin. However, for low-resourced languages, for which no
annotated data exist, this is the way to obtain their
syntactic structures.</p>
      <p>A more serious problem with unsupervised parsing is
that, to our knowledge, there have so far been no
works incorporating any kind of unsupervised parsing into
applications, even though many papers mention that in
some cases unsupervised structures, different from
manual annotations following a given scheme, may be
very beneficial.</p>
      <p>Moreover, in the last two years, no new strong paper
about unsupervised parsing has appeared at NLP conferences.
Instead, a new technique has arrived: recurrent
neural networks, which may fulfill the earlier motivation
for unsupervised parsing – to find a structure of language
that helps machines to understand it better. Instead
of dependency trees, such structures are hidden in the hidden
states of deep neural networks.</p>
      <p>From the linguistic point of view, the structures inferred
by unsupervised parsers can be compared to manually
annotated treebanks. What are the differences? How do the
unsupervised methods deal with phenomena for which it is not
clear how to parse them? Should prepositions depend on nouns
or vice versa? And what about coordination? Many such
questions could be answered; however, this topic has not
been studied so far either.</p>
    </sec>
    <sec id="sec-7">
      <title>7 Conclusions</title>
      <p>We have categorized the unsupervised dependency parsers into
four groups according to the data they need, so that they
can be fairly compared. We surveyed the most
important papers and works that reached state-of-the-art
results when they were published, and we showed a
comparison of the results across methods and languages. It is
apparent that there is a large variance in the attachment
scores for individual languages: good performance
of a method on one language tells nothing about its
performance on another. We hope that this paper
brings some order into the world of
unsupervised parsing for the reader.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work has been supported by the grant 14-06548P of
the Czech Science Foundation.</p>
      <p>[Tables 1–3: attachment scores of the individual methods. Each method is
labelled by a reference link and a group label (SP, UP, or SP+K). Table 1 reports
scores on the Penn Treebank (sentences up to 10 words, and all sentences);
Tables 2 and 3 report scores for Arabic, Basque, Czech, Danish, Dutch,
English (CHILDES), English (PTB), Portuguese, Slovenian, and Swedish,
together with the average.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Yonatan</given-names>
            <surname>Bisk</surname>
          </string-name>
          and
          <string-name>
            <given-names>Julia</given-names>
            <surname>Hockenmaier</surname>
          </string-name>
          .
          <article-title>Induction of linguistic structure with combinatory categorial grammars</article-title>
          .
          <source>The NAACL-HLT Workshop on the Induction of Linguistic Structure, page 90</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Phil</given-names>
            <surname>Blunsom</surname>
          </string-name>
          and
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Cohn</surname>
          </string-name>
          .
          <article-title>Unsupervised induction of tree substitution grammars for dependency parsing</article-title>
          .
          <source>In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10</source>
          , pages
          <fpage>1204</fpage>
          -
          <lpage>1213</lpage>
          , Stroudsburg, PA, USA,
          <year>2010</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Peter F. Brown</surname>
            , Peter V. deSouza, Robert L. Mercer,
            <given-names>Vincent J. Della</given-names>
          </string-name>
          <string-name>
            <surname>Pietra</surname>
          </string-name>
          , and
          <string-name>
            <surname>Jenifer</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Lai</surname>
          </string-name>
          .
          <article-title>Class-based n-gram models of natural language</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>18</volume>
          (
          <issue>4</issue>
          ):
          <fpage>467</fpage>
          -
          <lpage>479</lpage>
          ,
          <year>December 1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Sabine</given-names>
            <surname>Buchholz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Erwin</given-names>
            <surname>Marsi</surname>
          </string-name>
          .
          <article-title>CoNLL-X shared task on multilingual dependency parsing</article-title>
          .
          <source>In Proceedings of the Tenth Conference on Computational Natural Language Learning</source>
          , CoNLL-X '
          <volume>06</volume>
          , pages
          <fpage>149</fpage>
          -
          <lpage>164</lpage>
          , Stroudsburg, PA, USA,
          <year>2006</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Christos</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          , Sharon Goldwater, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Steedman</surname>
          </string-name>
          .
          <article-title>Turning the pipeline into a loop: Iterated unsupervised dependency parsing and PoS induction</article-title>
          .
          <source>In Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure</source>
          , pages
          <fpage>96</fpage>
          -
          <lpage>99</lpage>
          ,
          <year>June 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Clark</surname>
          </string-name>
          .
          <article-title>Combining distributional and morphological information for part of speech induction</article-title>
          .
          <source>Proceedings of 10th EACL</source>
          , pages
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Shay</surname>
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <surname>Dipanjan Das</surname>
          </string-name>
          , and
          <string-name>
            <surname>Noah</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Smith.</surname>
          </string-name>
          <article-title>Unsupervised structure prediction with non-parallel multilingual guidance</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11</source>
          , pages
          <fpage>50</fpage>
          -
          <lpage>61</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Shay</surname>
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Cohen</surname>
          </string-name>
          , Kevin Gimpel, and
          <string-name>
            <surname>Noah</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Smith.</surname>
          </string-name>
          <article-title>Logistic normal priors for unsupervised probabilistic grammar induction</article-title>
          .
          <source>In Neural Information Processing Systems</source>
          , pages
          <fpage>321</fpage>
          -
          <lpage>328</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jason</given-names>
            <surname>Eisner</surname>
          </string-name>
          .
          <article-title>Three New Probabilistic Models for Dependency Parsing: An Exploration</article-title>
          .
          <source>In Proceedings of the 16th International Conference on Computational Linguistics (COLING-96)</source>
          , pages
          <fpage>340</fpage>
          -
          <lpage>345</lpage>
          , Copenhagen,
          <year>August 1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Douwe</surname>
            <given-names>Gelling</given-names>
          </string-name>
          , Trevor Cohn, Phil Blunsom, and
          <string-name>
            <given-names>Joao</given-names>
            <surname>Graca</surname>
          </string-name>
          .
          <article-title>The PASCAL Challenge on Grammar Induction</article-title>
          .
          <source>In Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure</source>
          , pages
          <fpage>64</fpage>
          -
          <lpage>80</lpage>
          , Montréal, Canada,
          <year>June 2012</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>William P. Headden</surname>
            <given-names>III</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mark</surname>
            <given-names>Johnson</given-names>
          </string-name>
          , and
          <string-name>
            <surname>David McClosky</surname>
          </string-name>
          .
          <article-title>Improving unsupervised dependency parsing with richer contexts and smoothing</article-title>
          .
          <source>In Proceedings of Human Language Technologies</source>
          :
          <article-title>The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics</article-title>
          ,
          <source>NAACL '09</source>
          , pages
          <fpage>101</fpage>
          -
          <lpage>109</lpage>
          , Stroudsburg, PA, USA,
          <year>2009</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Klein</surname>
          </string-name>
          .
          <article-title>The Unsupervised Learning of Natural Language Structure</article-title>
          .
          <source>PhD thesis</source>
          , Stanford University,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Klein</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Corpus-based induction of syntactic structure: models of dependency and constituency</article-title>
          .
          <source>In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics</source>
          , ACL '04,
          <string-name>
            <surname>Stroudsburg</surname>
          </string-name>
          , PA, USA,
          <year>2004</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Mitchell</surname>
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Marcus</surname>
          </string-name>
          , Beatrice Santorini, and
          <string-name>
            <surname>Mary</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Marcinkiewicz</surname>
          </string-name>
          .
          <article-title>Building a Large Annotated Corpus of English: The Penn Treebank</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>19</volume>
          (
          <issue>2</issue>
          ):
          <fpage>313</fpage>
          -
          <lpage>330</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>David</given-names>
            <surname>Mareček</surname>
          </string-name>
          .
          <article-title>Multilingual unsupervised dependency parsing with unsupervised pos tags</article-title>
          . In Grigorii Sidorov and
          <string-name>
            <given-names>N.</given-names>
            <surname>Sofía</surname>
          </string-name>
          Galicia-Haro, editors,
          <source>Advances in Artificial Intelligence and Soft Computing: 14th Mexican International Conference on Artificial Intelligence, MICAI</source>
          <year>2015</year>
          , Cuernavaca, Morelos, Mexico,
          <source>October 25-31</source>
          ,
          <year>2015</year>
          , Proceedings,
          <string-name>
            <surname>Part</surname>
            <given-names>I</given-names>
          </string-name>
          , pages
          <fpage>72</fpage>
          -
          <lpage>82</lpage>
          , Cham,
          <year>2015</year>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>David Mareček</surname>
            and
            <given-names>Milan</given-names>
          </string-name>
          <string-name>
            <surname>Straka</surname>
          </string-name>
          .
          <article-title>Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing</article-title>
          .
          <source>In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>281</fpage>
          -
          <lpage>290</lpage>
          , Sofia, Bulgaria,
          <year>August 2013</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>David Mareček and Zdeněk Žabokrtský</surname>
          </string-name>
          .
          <article-title>Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing</article-title>
          .
          <source>In Proceedings of RANLP Workshop on Robust Unsupervised and Semisupervised Methods in Natural Language Processing</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          , Hissar, Bulgaria,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>David Mareček and Zdeněk Žabokrtský</surname>
          </string-name>
          .
          <article-title>Exploiting reducibility in unsupervised dependency parsing</article-title>
          .
          <source>In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL '12</source>
          , pages
          <fpage>297</fpage>
          -
          <lpage>307</lpage>
          , Stroudsburg, PA, USA,
          <year>2012</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>David Mareček and Zdeněk Žabokrtský</surname>
          </string-name>
          .
          <article-title>Dealing with function words in unsupervised dependency parsing</article-title>
          .
          <source>In Computational Linguistics and Intelligent Text Processing, CICLing 2014</source>
          , pages
          <fpage>250</fpage>
          -
          <lpage>261</lpage>
          , Kathmandu, Nepal,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Tahira</surname>
            <given-names>Naseem</given-names>
          </string-name>
          , Harr Chen, Regina Barzilay, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Johnson</surname>
          </string-name>
          .
          <article-title>Using universal linguistic knowledge to guide grammar induction</article-title>
          .
          <source>In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10</source>
          , pages
          <fpage>1234</fpage>
          -
          <lpage>1244</lpage>
          , Stroudsburg, PA, USA,
          <year>2010</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Joakim</surname>
            <given-names>Nivre</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marie-Catherine de Marneffe</surname>
          </string-name>
          , Filip Ginter, Yoav Goldberg,
          <string-name>
            <surname>Jan</surname>
            <given-names>Hajič</given-names>
          </string-name>
          , Christopher Manning,
          <string-name>
            <surname>Ryan</surname>
            <given-names>McDonald</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Slav</given-names>
            <surname>Petrov</surname>
          </string-name>
          , Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman.
          <article-title>Universal dependencies v1: A multilingual treebank collection</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC</source>
          <year>2016</year>
          ), Portorož, Slovenia,
          <year>2016</year>
          . European Language Resources Association.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Joakim</surname>
            <given-names>Nivre</given-names>
          </string-name>
          , Johan Hall, Sandra Kübler,
          <string-name>
            <surname>Ryan</surname>
            <given-names>McDonald</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Jens</given-names>
            <surname>Nilsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Riedel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Deniz</given-names>
            <surname>Yuret</surname>
          </string-name>
          .
          <article-title>The CoNLL 2007 Shared Task on Dependency Parsing</article-title>
          .
          <source>In Proceedings of the CoNLL Shared Task Session of EMNLPCoNLL</source>
          <year>2007</year>
          , pages
          <fpage>915</fpage>
          -
          <lpage>932</lpage>
          , Prague, Czech Republic,
          <year>June 2007</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Mohammad</given-names>
            <surname>Sadegh</surname>
          </string-name>
          Rasooli and
          <string-name>
            <given-names>Heshaam</given-names>
            <surname>Faili</surname>
          </string-name>
          .
          <article-title>Fast unsupervised dependency parsing with arc-standard transitions</article-title>
          .
          <source>In Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP, ROBUS-UNSUP</source>
          <year>2012</year>
          , pages
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          , Stroudsburg, PA, USA,
          <year>2012</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Roy</surname>
            <given-names>Schwartz</given-names>
          </string-name>
          , Omri Abend, Roi Reichart, and
          <string-name>
            <given-names>Ari</given-names>
            <surname>Rappoport</surname>
          </string-name>
          .
          <article-title>Neutralizing linguistically problematic annotations in unsupervised dependency parsing evaluation</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</source>
          , pages
          <fpage>663</fpage>
          -
          <lpage>672</lpage>
          , Portland, Oregon, USA,
          <year>June 2011</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Seginer</surname>
          </string-name>
          .
          <article-title>Fast Unsupervised Incremental Parsing</article-title>
          .
          <source>In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics</source>
          , pages
          <fpage>384</fpage>
          -
          <lpage>391</lpage>
          , Prague, Czech Republic,
          <year>2007</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Noah</surname>
            <given-names>Ashton</given-names>
          </string-name>
          <string-name>
            <surname>Smith.</surname>
          </string-name>
          <article-title>Novel estimation methods for unsupervised discovery of latent structure in natural language text</article-title>
          .
          <source>PhD thesis</source>
          , Baltimore,
          <string-name>
            <surname>MD</surname>
          </string-name>
          , USA,
          <year>2007</year>
          . AAI3240799.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Anders</given-names>
            <surname>Søgaard</surname>
          </string-name>
          .
          <article-title>From ranked words to dependency trees: two-stage unsupervised non-projective dependency parsing</article-title>
          .
          <source>In Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing, TextGraphs-6</source>
          , pages
          <fpage>60</fpage>
          -
          <lpage>68</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Anders</given-names>
            <surname>Søgaard</surname>
          </string-name>
          .
          <article-title>Two baselines for unsupervised dependency parsing</article-title>
          .
          <source>In Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure</source>
          , pages
          <fpage>81</fpage>
          -
          <lpage>83</lpage>
          , Montréal, Canada,
          <year>June 2012</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Valentin</surname>
            <given-names>I. Spitkovsky</given-names>
          </string-name>
          , Hiyan Alshawi,
          <string-name>
            <surname>Angel</surname>
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Chang</surname>
          </string-name>
          , and Daniel Jurafsky.
          <article-title>Unsupervised dependency parsing without gold part-of-speech tags</article-title>
          .
          <source>In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP</source>
          <year>2011</year>
          ),
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Valentin</surname>
            <given-names>I. Spitkovsky</given-names>
          </string-name>
          , Hiyan Alshawi, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <article-title>From baby steps to leapfrog: how "less is more" in unsupervised dependency parsing</article-title>
          .
          <source>In Human Language Technologies</source>
          :
          <article-title>The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics</article-title>
          ,
          <source>HLT '10</source>
          , pages
          <fpage>751</fpage>
          -
          <lpage>759</lpage>
          , Stroudsburg, PA, USA,
          <year>2010</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Valentin</surname>
            <given-names>I. Spitkovsky</given-names>
          </string-name>
          , Hiyan Alshawi, and Daniel Jurafsky. Lateen EM:
          <article-title>Unsupervised training with multiple objectives, applied to dependency grammar induction</article-title>
          .
          <source>In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP</source>
          <year>2011</year>
          ),
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Valentin</surname>
            <given-names>I. Spitkovsky</given-names>
          </string-name>
          , Hiyan Alshawi, and Daniel Jurafsky. Punctuation:
          <article-title>Making a point in unsupervised dependency parsing</article-title>
          .
          <source>In Proceedings of the Fifteenth Conference on Computational Natural Language Learning (CoNLL2011)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Valentin</surname>
            <given-names>I. Spitkovsky</given-names>
          </string-name>
          , Hiyan Alshawi, and Daniel Jurafsky.
          <article-title>Three Dependency-and-Boundary Models for Grammar Induction</article-title>
          .
          <source>In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLPCoNLL</source>
          <year>2012</year>
          ),
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Valentin</surname>
            <given-names>I. Spitkovsky</given-names>
          </string-name>
          , Hiyan Alshawi, and Daniel Jurafsky.
          <article-title>Breaking out of local optima with count transforms and model recombination: A study in grammar induction</article-title>
          .
          <source>In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1983</fpage>
          -
          <lpage>1995</lpage>
          , Seattle, Washington, USA,
          <year>October 2013</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Kewei</given-names>
            <surname>Tu</surname>
          </string-name>
          .
          <article-title>Combining the sparsity and unambiguity biases for grammar induction</article-title>
          .
          <source>In Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure</source>
          , pages
          <fpage>105</fpage>
          -
          <lpage>110</lpage>
          , Montréal, Canada,
          <year>June 2012</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>