<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extracting Verbal Multiword Data from Rich Treebank Annotation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eduard Bejček</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Hajič</string-name>
          <email>hajic@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Straňák</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zdeňka Urešová</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague, Faculty of Mathematics and Physics</institution>
          ,
          <addr-line>ÚFAL</addr-line>
        </aff>
      </contrib-group>
      <fpage>13</fpage>
      <lpage>24</lpage>
      <abstract>
        <p>The PARSEME Shared Task on automatic identification of verbal multiword expressions aims at identifying such expressions in running texts. A typology of verbal multiword expressions, very detailed annotation guidelines and gold-standard data for as many languages as possible will be provided. Since the Prague Dependency Treebank includes Czech multiword expression annotation, it was natural to attempt an automatic conversion of the data into the Shared Task format. However, since the Czech treebank predates the Shared Task annotation guidelines, a prior examination was necessary to determine to what extent the conversion can be fully automatic and how much manual work remains. In this paper, we show that the information contained in the Prague Dependency Treebank is sufficient to extract all of the Shared Task categories of verbal multiword expressions relevant for Czech, even if these categories are originally annotated differently; nevertheless, some manual checking and annotation would still be necessary, e.g. for distinguishing borderline cases.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        The goal of the PARSEME [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ] Shared Task (PST)<fn><p>http://multiword.sourceforge.net/sharedtask2017</p></fn> is to develop automatic
detection of verbal multiword expressions (VMWEs) for a wide range of languages
from different language families. It includes data preparation for the task
participants, based on annotation guidelines that were tested on real data for almost
twenty languages [16].<fn><p>Also at http://parsemefr.lif.univ-mrs.fr/guidelines-hypertext.</p></fn> The training and testing data for the PST (3,500 instances
per language) are being annotated; while manual annotation is necessary for many
languages, reusing existing annotated data is preferred whenever possible.
      </p>
      <p>
        This preference led us to explore the Prague Dependency Treebank (PDT,
[
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ]), which includes quite a rich annotation of MWEs.<fn><p>Some VMWE categories were annotated during the creation of the original PDT 2.0, others were annotated specifically for PDT 2.5; PDT 3.0 contains all of them.</p></fn> However, the
annotation of the PDT preceded the PARSEME typology of VMWEs, and thus it is
understandable that the information encoded there is not straightforwardly
transformable into the PST categories and format. Nevertheless, we hoped that the PDT
annotation did contain all the necessary information. If confirmed, it would prove
that the original scheme of rich annotation was well conceived and, in particular,
that the MWE annotation in the PDT in fact followed the principles recommended in
[
        <xref ref-type="bibr" rid="ref9">10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Introduction</title>
      <p>
        We believe that for the Czech language, annotation of VMWEs already encoded
in the data of the Prague Dependency Treebank 3.0 (PDT) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] presents suitable
material for the PST and satisfies the task needs in both (i) the amount of annotated
data and (ii) the types of VMWEs that correspond to the types proposed in the PST.
      </p>
      <p>The PARSEME Shared Task identifies six groups of VMWEs: light verb
constructions (LVC), idioms (ID), verb-particle combinations (VPC), inherently
reflexive verbs (IReflV), language-specific types, and other verbal MWEs (OTH).</p>
      <p>The various types of VMWEs required by the PST are annotated in a
number of diverse ways in the PDT, and the information is spread across several
layers of annotation. We therefore first had to relate the PDT annotation to the PST
guidelines in order to confirm that the PDT data can be reused for the Shared Task;
only then could the extraction of all types of VMWEs relevant for Czech and their
conversion into the PST format take place.</p>
      <p>
        At the same time (or even more importantly), we were testing the following
four principles for good-quality MWE treebank design published in [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ], which
are based on a survey of as many as 23 different treebanks (dependency-based,
constituency-based, HPSG, LFG, mixed):
        <list list-type="simple">
          <list-item><p>Principle A: to annotate MWEs as such,</p></list-item>
          <list-item><p>Principle B: to mark MWEs in a distinctive and specific way,</p></list-item>
          <list-item><p>Principle C: to annotate even discontinuous MWEs and MWEs of varying forms,</p></list-item>
          <list-item><p>Principle D: to allow for searching MWEs by their type.</p></list-item>
        </list>
      </p>
      <p>After thorough analysis of the PDT we have concluded that Principles A and
B are clearly fulfilled in the PDT due to its explicit MWE annotation. Principle
C is also followed thanks to the explicit links between the PDT’s annotation
layers. Principle D is, from the PST point of view, followed only partially, since the
respective typologies do not match one-to-one.</p>
      <p>A thorough inspection of the PDT annotation scheme resulted in an automatic
conversion procedure with rules formulated for each of the PST types. Manual
checks and some amount of manual annotation are still necessary, even if only for a
fraction of the data.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Conversion of Czech data</title>
      <p>As already explained, the creation of the Czech language data for the PST takes
advantage of the existing rich annotation of the PDT, including explicit annotation
of VMWEs.</p>
      <p>
        The treatment of verbal idioms (part of the ID category) and LVCs in the PDT
is related to valency, as the valency formalism allows for morphological, syntactic
and semantic description of VMWEs in the treebank [
        <xref ref-type="bibr" rid="ref12 ref2 ref3">2, 3, 13</xref>
        ]. These VMWEs
are recorded in the related valency lexicon, PDT-Vallex [
        <xref ref-type="bibr" rid="ref13">14</xref>
        ], as specific “senses”
of the base lemma. For the annotation of verb-noun idiomatic combinations and
some other types of MWEs in the PDT style treebanks and in the associated
valency lexicons see [
        <xref ref-type="bibr" rid="ref14 ref8">9, 15</xref>
        ]. PDT-Vallex has been available since the original
PDT 2.0 treebank [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Afterwards, explicit general annotation of MWEs, including
verbal phrases that now correspond to the ID, LVC and OTH categories, was
carried out (see [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ]). The MWE annotation became part of later PDT releases,
including the most recent, PDT 3.0 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Reflexive verbs (IReflV) are treated as
“words with spaces” on the deep syntactic annotation layer, with the reflexive particle being
part of such words.
      </p>
      <sec id="sec-3-1">
        <title>Prague Dependency Treebank 3.0</title>
      </sec>
      <sec id="sec-3-2">
        <title>PARSEME Shared Task</title>
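        <p>[Figure 1: Simplified view of the Prague Dependency Treebank 3.0 annotation and the corresponding PARSEME Shared Task annotation of one sentence; the surviving tree labels are dostat_se (PRED) and nevidomý (ACT).]</p>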
        <p>To sum up, the different PST types of VMWEs are obtained from various
information sources available at the different layers of annotation in the PDT. See Figure 1
for an illustration of three of them (the annotation view is simplified to cover only
MWE-related phenomena); the annotation of the non-verbal MWE “rehabilitační
pracovník” (rehabilitation worker), which is not converted for the PST, is also
shown.</p>
        <p>In this section, we describe the PDT-style annotation of the proposed six types
of VMWEs recognized in the PST as well as their conversion into the common PST
format (Sections 3.1–3.6). Two special aspects are discussed, namely deverbative
variants (Section 3.7) and cases of overlapping annotation (Section 3.8).</p>
        <sec id="sec-3-2-1">
          <title>3.1 Light Verb Constructions</title>
          <p>In the PDT annotation, LVCs consist of two lexical units: a semantically empty
(or “light”) verb and a noun carrying the main lexical meaning of the entire phrase.
The nominal part of the LVCs is labeled by the CPHR functor (Compound PHRase).
For example: to come<sub>PRED</sub> into force<sub>CPHR</sub>, to undertake<sub>PRED</sub> preparations<sub>CPHR</sub>.
LVCs are identified as depicted in Figure 2.</p>
          <p>[Figure 2: Identification of an LVC. (1) Input text: “Zákon tak vstoupil v platnost.” (lit. Law so came into force; “By that the law has come into force.”) (2) PDT t-layer: vstoupit (PRED), zákon (ACT), tak (MANN), platnost (CPHR). (3) PDT a-layer: vstoupil (Pred), Zákon (Sb), tak (Adv), v (AuxP), platnost (Obj). (4) Output annotation: &lt;MWE category="LVC"&gt; marking “vstoupil v platnost” (has come into force).]</p>
          <p>Three more things have to be taken into account:
            <list list-type="order">
              <list-item><p>Prepositions, if they are part of the LVC, must be retrieved from the surface syntactic layer, since they are not present on the deep layer. If there is any extra node between the node for the predicate and the node for the CPHR, it is part of the LVC.</p></list-item>
              <list-item><p>If reflexive particles are part of the verb lemma (see IReflV in Section 3.4), they also have to become part of the LVC.</p></list-item>
              <list-item><p>The CPHR functor is also used for a specific type of phrase with the verb “to be” (it is necessary<sub>CPHR</sub> to leave). These phrases, not envisaged by the PST guidelines, are excluded by checking the lemma of the verb.</p></list-item>
            </list>
          </p>
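          <p>The three rules above can be illustrated with a minimal Python sketch. It is not the conversion script used for the PST data: the ANode and TNode classes, the a_refs links between layers and the functions tokens_between and extract_lvcs are hypothetical names introduced here for illustration, and the tree is a hand-built toy version of the sentence in Figure 2.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class ANode:
    """Surface-syntactic (a-layer) node, i.e. one token of the sentence."""
    idx: int      # 1-based token position
    form: str
    parent: int   # idx of the a-layer parent, 0 for the technical root


@dataclass
class TNode:
    """Deep-syntactic (t-layer) node linked to one or more tokens (assumed layout)."""
    lemma: str
    functor: str
    parent: "TNode | None" = None
    # token indices: the lexical token first, then e.g. a reflexive particle
    a_refs: list = field(default_factory=list)


def tokens_between(a_nodes, child_idx, ancestor_idx):
    """Tokens on the a-layer path strictly between a token and its ancestor."""
    by_idx = {n.idx: n for n in a_nodes}
    path, cur = [], by_idx[child_idx].parent
    while cur not in (0, ancestor_idx):
        path.append(cur)
        cur = by_idx[cur].parent
    return path if cur == ancestor_idx else []


def extract_lvcs(t_nodes, a_nodes):
    """Return one sorted list of token indices per LVC."""
    lvcs = []
    for noun in t_nodes:
        verb = noun.parent
        if noun.functor != "CPHR" or verb is None:
            continue
        if verb.lemma == "být":      # rule 3: skip phrases with "to be"
            continue
        tokens = set(verb.a_refs)    # rule 2: includes se/si if linked to the verb
        tokens.add(noun.a_refs[0])
        # rule 1: prepositions and other extra a-layer nodes in between
        tokens.update(tokens_between(a_nodes, noun.a_refs[0], verb.a_refs[0]))
        lvcs.append(sorted(tokens))
    return lvcs


# "Zákon tak vstoupil v platnost." (the sentence of Figure 2), built by hand
a_nodes = [ANode(1, "Zákon", 3), ANode(2, "tak", 3), ANode(3, "vstoupil", 0),
           ANode(4, "v", 3), ANode(5, "platnost", 4)]
verb = TNode("vstoupit", "PRED", None, [3])
t_nodes = [verb, TNode("zákon", "ACT", verb, [1]),
           TNode("tak", "MANN", verb, [2]),
           TNode("platnost", "CPHR", verb, [5])]

lvcs = extract_lvcs(t_nodes, a_nodes)
print([" ".join(a_nodes[i - 1].form for i in lvc) for lvc in lvcs])
# prints ['vstoupil v platnost']
```
]]></preformat>
          <p>In this sketch the preposition “v” is picked up only because it lies on the surface path between the noun and the verb, which mirrors the first rule; the actual PDT data formats and links are richer than this toy structure.</p>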
          <p>There are 2496 LVCs in the PDT extracted by the above rules. Minor details
aside, LVCs as defined for the PST can be identified on the basis of the existing
PDT annotation without any additional manual annotation.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2 Verbal Idioms</title>
          <p>These VMWEs, denoted as ID in the PST guidelines, form quite a large group
that contains not only traditional idioms. We process it in two steps.</p>
          <p>Part of the VMWEs defined as IDs, namely those which are quite fixed idioms,
are understood similarly in the PDT and in the guidelines for the Shared Task, e.g.:
“házet klacky pod nohy” lit. to-throw sticks under feet (= to put obstacles in one’s
way), “brát vítr z plachet” (= to take the wind out of someone’s sails). These
verbal idioms (similarly to LVCs) always consist of two nodes in the PDT: the
governing verb part and the dependent node (with the DPHR functor = Dependent
part of PHRaseme). These idioms can be thus easily extracted by looking for the
DPHR functor. The DPHR node represents all other lexical components of the
idiom, should there be more than one (lemma of the deep syntactic layer is e.g.
“klacky_pod_nohy” or “vítr_z_plachet”), since these are quite fixed expressions in
terms of (the impossibility of) insertion or other modification. Even prepositions
are part of it, and their detection is even easier than with a CPHR. See an example
in Figure 3.</p>
          <p>[Figure 3: Identification of a fixed verbal idiom. (1) Input text: “Odezva na sebe nedala čekat.” (lit. Reaction on itself not-gave wait; “The reaction didn't keep us waiting.”) (2) PDT t-layer: dát (PRED), odezva (ACT), #Neg (RHEM), na_sebe_čekat (DPHR). (3) Output annotation: &lt;MWE category="ID"&gt; marking “na sebe nedala čekat” (didn't keep us waiting).]</p>
          <p>
            The other group of VMWEs categorized as ID in PST is not so fixed. VMWEs
from this group do not fulfill the criteria for DPHR annotation in the PDT, but they
still qualify as an ID in the PST. They have been annotated together with all
other MWEs in PDT 3.0 [
            <xref ref-type="bibr" rid="ref11">12</xref>
            ]. The problem is they are marked neither as idioms,
nor even as verbal expressions. Moreover, they are recorded on the deep syntactic
layer as a set of nodes (i.e. content words), neglecting auxiliary words.
          </p>
          <p>Our approach finds a head in the syntactic tree of such a set. If it is a verb, the
MWE is a verbal one (Figure 4). Then other auxiliary nodes (e.g., prepositions)
referred to by the annotated content words are added. (The exception is a
conjunction introducing the whole phrase: it does not belong to the VMWE.) The resulting
VMWE gets the ID mark, unless it overlaps with CPHR or DPHR annotation (see
Section 3.8).</p>
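          <p>The head-finding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the TNode class, its pos, lex_ref and aux_refs fields, and the functions mwe_head and verbal_mwe_tokens are hypothetical, and the phrase-introducing conjunction is assumed to be attached to the head node as an auxiliary of kind "conj".</p>
          <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class TNode:
    """Deep-syntactic node of an annotated MWE (hypothetical layout)."""
    lemma: str
    pos: str                       # coarse part of speech, e.g. "V", "N", "A"
    parent: "TNode | None" = None
    lex_ref: int = 0               # token index of the content word
    aux_refs: list = field(default_factory=list)  # (token index, kind) pairs


def mwe_head(nodes):
    """Return the single node whose parent lies outside the MWE, else None."""
    heads = [n for n in nodes if not any(n.parent is m for m in nodes)]
    return heads[0] if len(heads) == 1 else None


def verbal_mwe_tokens(nodes):
    """Token indices of a verbal MWE, or None if its head is not a verb."""
    head = mwe_head(nodes)
    if head is None or head.pos != "V":
        return None
    tokens = set()
    for n in nodes:
        tokens.add(n.lex_ref)
        # add auxiliaries such as prepositions, but skip a conjunction that
        # introduces the whole phrase (assumed here to hang on the head)
        tokens.update(i for i, kind in n.aux_refs
                      if not (n is head and kind == "conj"))
    return sorted(tokens)


# "Nevěřícně kroutím hlavou nad legislativou." (Figure 4): the MWE is the
# node set {kroutit, hlava}; token indices are 1-based positions.
kroutit = TNode("kroutit", "V", None, 2)
hlava = TNode("hlava", "N", kroutit, 3)
id_tokens = verbal_mwe_tokens([kroutit, hlava])
print(id_tokens)  # prints [2, 3]

# A nominal MWE such as "rehabilitační pracovník" is not verbal.
pracovnik = TNode("pracovník", "N", None, 2)
rehabilitacni = TNode("rehabilitační", "A", pracovnik, 1)
print(verbal_mwe_tokens([rehabilitacni, pracovnik]))  # prints None
```
]]></preformat>
          <p>The resulting token set would then receive the ID mark unless it overlaps with CPHR or DPHR annotation, which is decided in a separate step.</p>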
          <p>We have identified 2107 IDs using either the PDT 3.0 MWE or DPHR
annotation.</p>
          <p>[Figure 4: Identification of a less fixed verbal idiom from the PDT 3.0 MWE annotation. (1) Input text: “Nevěřícně kroutím hlavou nad legislativou.” (lit. Disbelievingly I-shake head over legislation; “I am shaking my head in disbelief on the legislation.”) (2) PDT t-layer: kroutit (PRED; a verb and the root of the MWE lexeme), #PersPron (ACT), nevěřícný (MANN), hlava (PAT), legislativa (REG). (3) Output annotation: &lt;MWE category="ID"&gt; marking “kroutím hlavou” (shaking my head).]</p>
          <p>Verb-particle combinations (VPC) are not present in Czech. A phenomenon similar
to VPCs is realized in Czech by verbal prefixes (the result being another single
lexical unit, i.e., not an MWE).</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.4 Inherently Reflexive Verbs</title>
          <p>Inherently Reflexive Verbs (IReflV) contain one of two possible clitics in Czech:
“se” or “si”, e.g. “bát se” (= to be afraid), “hledět si” (= to mind sth). Such a verb
is considered a separate lexical unit (different from the verb appearing without the
particle, if such a verb exists at all), and both its parts are represented by just one node
at the deep syntactic layer of the PDT. The node’s lemma matches the
PDT-Vallex lexical unit, which includes the appropriate particle as part of the headword
in the lexicon. This annotation was used for exactly the two types qualified as
IReflV in the PST guidelines, namely for the case when the non-reflexive
counterpart verb does not exist and the case when its meaning is markedly changed. Using this
annotation, all IReflVs can be extracted from the PDT texts and converted, see
Figure 5.</p>
          <p>[Figure 5: Identification of an IReflV. (1) Input text: “Opatření se týká zejména domovníků.” (“The measure involves chiefly housekeepers.”) (2) PDT t-layer: týkat_se (PRED), opatření (ACT), zejména (RHEM), domovník (PAT). (3) PDT a-layer: týká (Pred), Opatření (Sb), se (AuxT), zejména (AuxZ), domovníků (Obj). (4) Output annotation: &lt;MWE category="IReflV"&gt; marking “se týká” (involves).]</p>
          <p>It should also be possible to extract IReflVs without the deep syntactic layer:
the analytical function of an IReflV reflexive particle should be either AuxT or
AuxO on the surface syntactic layer; other values (AuxR, Obj, or Adv) are reserved
for reflexive particles used in other than IReflV contexts, e.g. in passive
constructions. Suspicious cases (705 verb occurrences) in which the information from
the two layers of annotation clashes were detected by looking for a
discrepancy between the lemma and the corresponding analytical function; they were manually
checked and corrected when necessary (330 cases). There are some borderline
cases where the PDT annotation differs from the PST guidelines; however, these
are mainly errors in annotation and not a true difference between the PST and PDT
guidelines.</p>
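          <p>The cross-layer check described above can be sketched as a small decision function. This is only an illustration under assumptions: the function names and the three labels "IReflV", "check" and "other" are hypothetical, and the deep-layer lemma is assumed to carry the particle as a suffix joined by an underscore, as in týkat_se.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Analytical functions that signal an inherently reflexive use of se/si
IREFLV_AFUNS = {"AuxT", "AuxO"}


def is_reflexive_lemma(t_lemma):
    """True if the deep-layer lemma includes the particle, e.g. 'týkat_se'."""
    return t_lemma.endswith(("_se", "_si"))


def classify(t_lemma, particle_afun):
    """Compare the two layers for one verb occurrence.

    particle_afun is the analytical function of the se/si token attached
    to the verb on the surface layer, or None if there is no such token.
    """
    by_lemma = is_reflexive_lemma(t_lemma)
    by_afun = particle_afun in IREFLV_AFUNS
    if by_lemma and by_afun:
        return "IReflV"   # both layers agree: extract automatically
    if by_lemma != by_afun:
        return "check"    # layers clash: send to manual checking
    return "other"        # e.g. reflexive passive, not an IReflV


print(classify("týkat_se", "AuxT"))  # prints IReflV
print(classify("týkat_se", "AuxR"))  # prints check
print(classify("mýt", "Obj"))        # prints other
```
]]></preformat>
          <p>Occurrences labelled "check" correspond to the suspicious cases that were inspected manually.</p>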
          <p>By this approach, 10,266 VMWEs of the IReflV type were extracted from
the PDT. The conversion was automatic except for the 705 manually checked
occurrences.</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>3.5 Others</title>
          <p>This category (OTH) is specified in the PST guidelines as a VMWE that does not
fit into any of the other categories, as described in the previous sections. Namely,
it applies to “coordinations of verbs, e.g. to drink and drive, and compound verbs,
e.g. to short-circuit, to pretty-print, to voice act”. The second subtype usually
results in a one-word expression in Czech, so we need to search only for coordinated
verbs.</p>
          <p>
            For this category, the PDT 3.0 MWEs annotation [
            <xref ref-type="bibr" rid="ref11">12</xref>
            ] is useful again. All
MWEs containing two verbs connected by a coordinating conjunction are marked
as an OTH, see Figure 6. This is a very marginal category; we have found only two
OTHs in the data.
          </p>
          <p>[Figure 6: Identification of an OTH. (1) Input text: “Doktorand je studentem, jak se sluší a patří.” (lit. PhD-student is student, as &lt;REFL&gt; suits and befits; “A PhD student is a student, as he should be.”) (2) PDT t-layer: být (PRED), doktorand (ACT), student (PAT), a (CONJ), #Gen (ACT), slušet_se (RSTR), patřit_se (RSTR). (3) PDT a-layer: two verbs joined by a conjunction. (4) Output annotation: &lt;MWE category="OTH"&gt; (as he should be).]</p>
          <p>
The PARSEME Shared Task guidelines also recognize other, non-verbal variants of
verbal MWEs, such as relative clauses (heart which he broke), gerunds (heart
breaking), nominal groups (heart-breaking), or adjectival groups (breaking her heart).
In Czech, nominalization is a common type of verbal MWE variation, see [
            <xref ref-type="bibr" rid="ref5 ref6 ref7">7, 6, 8</xref>
            ].
          </p>
          <p>
            There is no nominal group annotated as CPHR in the PDT and thus no LVC
variant. There are several nominal MWEs annotated as DPHR, but only seven of
them are derived from verbal MWEs; we selected them manually. During the
PDT 3.0 MWE annotation project [
            <xref ref-type="bibr" rid="ref11">12</xref>
            ], annotators were asked to mark deverbative
variants with the verbal lexicon entry. This annotation, although it is not frequent,
is also used.
          </p>
          <p>The situation is quite different for IReflV, where many non-verbal lemmas
also contain the reflexive particles “se” or “si”. These cases qualify as
nominal or adverbial variants of inherently reflexive verbs.</p>
          <p>To sum up, there are deverbative MWEs in the PST Czech data; however, they
are not frequent.</p>
          <p>We are also preparing other deverbative MWEs using data from a database-based
idiom recognizer, upgraded for deverbatives by Milena Hnátková [5].</p>
        </sec>
        <sec id="sec-3-2-5">
          <title>3.8 Overlaps</title>
          <p>Since the data for the PST are extracted from various pieces of annotation, it can easily
happen that they are duplicated or that they overlap. All these cases have to be resolved
properly, as described below.</p>
          <p>3.8.1 Coordination</p>
          <p>Part of a VMWE can be coordinated while the other part is used only once, as
in “Ministerstvo poskytuje malým podnikatelům informační služby a poradenskou
činnost.” (The ministry provides information services and counselling activities to
small businesses.), where two LVCs are present: to provide services and to provide
activities. Such a case is correct and both VMWEs should be preserved and marked
in the output data, with the verb “provide” being part of both.</p>
          <p>3.8.2 Duplicates due to added nodes in the PDT</p>
          <p>Since a large part of the MWE annotation in the PDT is encoded at the deep
syntactic layer, sometimes a VMWE is found that has no direct realization in the surface
form of the sentence, although it is present in its deep structure. An example is The
measure can be taken for six months at most and only for selected items., which
in fact means The measure can be taken for six months at most and the measure
can be taken only for selected items. In the PDT, two light verb constructions are
annotated and both of them are linked to the same words. This would result in
duplicate annotation of the words “measure”, “be” and “taken” in the sentence. Such
duplicates are detected and removed before the data are exported.</p>
          <p>3.8.3 Overlapping different types of VMWEs</p>
          <p>As described previously, we combine explicit idiomatic annotation (DPHR),
explicit light verb annotation (CPHR) and the verbal MWE annotation from PDT 3.0.
If they overlap, the type of the MWE (ID or LVC) is always determined by the
explicit DPHR/CPHR annotation. If only the PDT 3.0 MWE annotation is present,
it always gets the ID type as the most probable case; however, this could be checked
manually in the future.</p>
          <p>Whenever an IReflV overlaps with another, usually larger, MWE, both are
correct and should remain in the output. Other overlaps of different types of VMWEs
are not possible due to the source data we work with.</p>
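          <p>The overlap rules described in this section can be summarized in a short sketch. It is a simplified illustration, not the export procedure itself: the source labels "DPHR", "CPHR", "MWE" and "REFL", the TYPE_OF table and the resolve function are hypothetical, and the sketch simply keeps the explicit annotation's word range when a PDT 3.0 MWE overlaps it, which is an assumption because the treatment of differing ranges is still open.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# PST type assigned to each annotation source (labels are illustrative)
TYPE_OF = {"DPHR": "ID", "CPHR": "LVC", "MWE": "ID", "REFL": "IReflV"}


def resolve(candidates):
    """Resolve duplicates and overlaps among extracted VMWE candidates.

    candidates: list of (source, frozenset of token indices) for one sentence.
    Returns a list of (PST type, frozenset of token indices).
    """
    explicit = [(s, t) for s, t in candidates if s in ("DPHR", "CPHR")]
    out = []
    for source, tokens in candidates:
        # a PDT 3.0 MWE overlapping explicit DPHR/CPHR annotation is dropped,
        # so the explicit annotation determines the type (assumed: and range)
        if source == "MWE" and any(tokens & t for _, t in explicit):
            continue
        item = (TYPE_OF[source], tokens)
        if item not in out:          # remove exact duplicates
            out.append(item)
    return out


candidates = [
    ("CPHR", frozenset({3, 4, 5})),   # an LVC
    ("CPHR", frozenset({3, 4, 5})),   # the same LVC from an added deep node
    ("MWE", frozenset({3, 5})),       # PDT 3.0 MWE overlapping the LVC
    ("REFL", frozenset({2, 3})),      # IReflV inside a larger MWE: kept
    ("MWE", frozenset({7, 8})),       # isolated PDT 3.0 MWE: gets ID
]
resolved = resolve(candidates)
for vmwe_type, tokens in resolved:
    print(vmwe_type, sorted(tokens))
# prints:
# LVC [3, 4, 5]
# IReflV [2, 3]
# ID [7, 8]
```
]]></preformat>
          <p>Two LVCs that share a coordinated verb have different token sets, so both survive the duplicate check, in line with the treatment of coordination.</p>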
          <p>It is yet to be determined what to do with cases where an ID from the DPHR annotation and
an ID from the PDT 3.0 MWE annotation overlap but cover different word ranges.</p>
        </sec>
        <sec id="sec-3-2-6">
          <title>3.9 Results</title>
          <p>After removing the overlaps, there are over 14,000 verbal multiword expressions
exported in the PST format. Table 1 presents the numbers of individual types of
VMWEs.</p>
          <p>[Table 1: number of extracted VMWEs by VMWE type.]</p>
          <p>It can be concluded that, thanks to the well-founded, rich annotation scheme used in the
Prague Dependency Treebank, which also conforms to most of the four PARSEME
MWE annotation principles, we can almost fully automatically transform the
original MWE annotation into the PARSEME Shared Task verbal MWE types. In this way,
we can extract 14,032 VMWEs.</p>
          <p>In the near future, we still want to manually check some borderline cases
mentioned above, e.g. whether an isolated verbal PDT 3.0 MWE should always be
an ID, or how to resolve overlapping annotation of the same type but of a different
range. We will include deverbative MWEs from separate automatic lexicon-based
annotation.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Acknowledgement</title>
      <p>The work described here has been supported by the project PARSEME, LD14117,
by the Ministry of Education, Youth and Sports of the Czech Republic, and carried
out within the framework of the project COST IC1207 PARSEME. The project
used data distributed by the LINDAT/CLARIN repository, supported by the
Ministry of Education, Youth and Sports of the Czech Republic (projects LM2010013,
LM2015071). We also thank our colleague Milena Hnátková, who kindly extracted
deverbative variants of VMWEs using her phraseme database; we are working
on incorporating this data into our outputs.</p>
      <p>[5] Milena Hnátková. Značkování frazémů a idiomů v Českém národním
korpusu s pomocí Slovníku české frazeologie a idiomatiky. Slovo a slovesnost,
2002.</p>
      <p>[16] Veronika Vincze, Agata Savary, Marie Candito, Carlos Ramisch, and
Fabienne Cap. Annotation guidelines for the PARSEME shared task on automatic
detection of verbal multiword expressions, version 6.0, 2016. http://typo.uni-konstanz.de/parseme/images/shared-task/guidelines/PARSEME-ST-annotation-guidelines-v6.pdf.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Eduard</given-names>
            <surname>Bejček</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Eva</given-names>
            <surname>Hajičová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jan</given-names>
            <surname>Hajič</surname>
          </string-name>
          , Pavlína Jínová, Václava Kettnerová,
          <string-name>
            <given-names>Veronika</given-names>
            <surname>Kolářová</surname>
          </string-name>
          , Marie Mikulová, Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lucie Poláková,
          <string-name>
            <given-names>Magda</given-names>
            <surname>Ševčíková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jan</given-names>
            <surname>Štěpánek</surname>
          </string-name>
          , and Šárka Zikánová.
          <source>Prague Dependency Treebank 3.0</source>
          ,
          <year>2013</year>
          . Data available from LINDAT/CLARIN, http://hdl.handle.net/11858/00-097C-0000-0023-1AAF-3.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Hajič</surname>
          </string-name>
          , Jarmila Panevová, Zdeňka Urešová, Alevtina Bémová, Veronika Kolářová, and Petr Pajas.
          <article-title>PDT-VALLEX: Creating a large-coverage valency lexicon for treebank annotation</article-title>
          .
          <source>In Joakim Nivre and Erhard Hinrichs</source>
          , editors,
          <source>Proceedings of The Second Workshop on Treebanks and Linguistic Theories</source>
          , volume
          <volume>9</volume>
          of Mathematical Modeling in Physics, Engineering and Cognitive Sciences, pages
          <fpage>57</fpage>
          -
          <lpage>68</lpage>
          , Vaxjo, Sweden,
          <year>2003</year>
          . Vaxjo University Press.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Hajič</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zdeňka</given-names>
            <surname>Urešová</surname>
          </string-name>
          .
          <article-title>Linguistic annotation: from links to crosslayer lexicons</article-title>
          .
          <source>In Joakim Nivre and Erhard Hinrichs</source>
          , editors,
          <source>Proceedings of The Second Workshop on Treebanks and Linguistic Theories</source>
          , volume
          <volume>9</volume>
          of Mathematical Modeling in Physics, Engineering and Cognitive Sciences, pages
          <fpage>69</fpage>
          -
          <lpage>80</lpage>
          , Vaxjo, Sweden,
          <year>2003</year>
          . Vaxjo University Press.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Hajič</surname>
          </string-name>
          , Jarmila Panevová,
          <string-name>
            <given-names>Eva</given-names>
            <surname>Hajičová</surname>
          </string-name>
          , Petr Sgall, Petr Pajas, Jan Štěpánek, Jiří Havelka, Marie Mikulová, Zdeněk Žabokrtský, Magda Ševčíková Razímová, and Zdeňka Urešová.
          <source>Prague Dependency Treebank 2.0</source>
          ,
          <year>2006</year>
          . LDC2006T01. Philadelphia, PA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Veronika</given-names>
            <surname>Kolářová</surname>
          </string-name>
          .
          <article-title>Valence deverbativních substantiv v češtině</article-title>
          .
          <source>PhD thesis</source>
          , Univerzita Karlova v Praze,
          <article-title>Matematicko-fyzikální fakulta</article-title>
          , Praha, Czechia,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Veronika</given-names>
            <surname>Kolářová</surname>
          </string-name>
          .
          <source>Valency of Deverbal Nouns in Czech. The Prague Bulletin of Mathematical Linguistics</source>
          ,
          <volume>86</volume>
          :
          <fpage>5</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Veronika</given-names>
            <surname>Kolářová</surname>
          </string-name>
          .
          <article-title>Special valency behavior of Czech deverbal nouns</article-title>
          , chapter
          <volume>2</volume>
          , pages
          <fpage>19</fpage>
          -
          <lpage>60</lpage>
          . Studies in Language Companion Series,
          <volume>158</volume>
          . John Benjamins Publishing Company, Amsterdam, The Netherlands,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Adam</given-names>
            <surname>Przepiórkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jan</given-names>
            <surname>Hajič</surname>
          </string-name>
          , Elżbieta Hajnicz, and Zdeňka Urešová.
          <article-title>Phraseology in two Slavic valency dictionaries: limitations and perspectives</article-title>
          .
          <source>International Journal of Lexicography</source>
          , (
          <volume>1</volume>
          ):
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Victoria</given-names>
            <surname>Rosén</surname>
          </string-name>
          , Koenraad De Smedt, Gyri Losnegaard,
          <string-name>
            <given-names>Eduard</given-names>
            <surname>Bejček</surname>
          </string-name>
          , Agata Savary, and Petya Osenova.
          <article-title>MWEs in treebanks: From survey to guidelines</article-title>
          . In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors,
          <source>Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC</source>
          <year>2016</year>
          ), pages
          <fpage>2323</fpage>
          -
          <lpage>2330</lpage>
          , Paris, France,
          <year>2016</year>
          . European Language Resources Association.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Agata</given-names>
            <surname>Savary</surname>
          </string-name>
          , Manfred Sailer, Yannick Parmentier, Michael Rosner, Victoria Rosén, Adam Przepiórkowski, Cvetana Krstev, Veronika Vincze, Beata Wójtowicz, Gyri Smørdal Losnegaard, Carla Parra Escartín, Jakub Waszczuk, Matthieu Constant, Petya Osenova, and
          <string-name>
            <given-names>Federico</given-names>
            <surname>Sangati</surname>
          </string-name>
          .
          <article-title>PARSEME - PARSing and Multiword Expressions within a European multilingual network</article-title>
          .
          <source>In 7th Language</source>
          &amp; Technology Conference:
          <article-title>Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC</article-title>
          <year>2015</year>
          ), Poznań, Poland,
          <year>November 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Straňák</surname>
          </string-name>
          .
          <article-title>Annotation of Multiword Expressions in The Prague Dependency Treebank</article-title>
          .
          <source>PhD thesis</source>
          , Charles University in Prague,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Zdeňka</given-names>
            <surname>Urešová</surname>
          </string-name>
          .
          <article-title>Valence sloves v Pražském závislostním korpusu</article-title>
          .
          <source>Studies in Computational and Theoretical Linguistics</source>
          .
          <article-title>Ústav formální a aplikované lingvistiky</article-title>
          , Praha, Czechia,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Zdeňka</given-names>
            <surname>Urešová</surname>
          </string-name>
          .
          <article-title>Valenční slovník Pražského závislostního korpusu (PDT-Vallex)</article-title>
          .
          <article-title>Studies in Computational and Theoretical Linguistics. Ústav formální a aplikované lingvistiky</article-title>
          , Praha, Czechia,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Zdeňka</given-names>
            <surname>Urešová</surname>
          </string-name>
          , Eva Fučíková,
          <string-name>
            <given-names>Jan</given-names>
            <surname>Hajič</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jana</given-names>
            <surname>Šindlerová</surname>
          </string-name>
          .
          <article-title>An analysis of annotation of verb-noun idiomatic combinations in a parallel dependency corpus</article-title>
          .
          <source>In The 9th Workshop on Multiword Expressions (MWE</source>
          <year>2013</year>
          ), pages
          <fpage>58</fpage>
          -
          <lpage>63</lpage>
          , Atlanta, Georgia, USA,
          <year>2013</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>