<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Automatic Detection of Applicable Diatheses</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Vernerová</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markéta Lopatková</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics</institution>
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>1003</volume>
      <fpage>10</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>The valency behavior (argument structure) of lexical items is so varied that it cannot be described by general rules and must be captured in lexicons separately for each lexical item. For verbs, lexicons typically describe only unmarked usage-the active form-while natural languages allow for certain regular changes in the number, type and/or realization of complementations (e.g. passivization). Thanks to their regularity, such changes may be described in a separate rule component of the lexicon; however, they are typically seen in many but not all verbs and their applicability to a given lexical unit (verb meaning) is not predictable from its valency alone. In this paper, we describe our initial experiments with using a large morphologically annotated corpus of Czech for determining which diatheses are applicable to a given lexical unit.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Valency refers to the argument structure of lexical units.<sup>1</sup>
In the Functional Generative Description (FGD), valency
belongs to the so-called tectogrammatical layer [
        <xref ref-type="bibr" rid="ref16 ref20">16, 20</xref>
        ],
i.e. the layer of linguistically structured meaning. It
is captured by so-called valency frames specifying the
valency complementations (arguments that are either
required or specifically permitted by the given lexical unit).
For each valency complementation, both its semantics (in
the form of a tectogrammatical functor, which captures
a coarse-grained semantic role) and its
syntactic/morphological form must be specified.
• vyzývat ‘to appeal, to challenge’
      </p>
    </sec>
    <sec id="sec-2">
      <title>ACT<sub>Nom</sub> ADDR<sub>Acc</sub> PAT<sub>k+Dat, na+Acc, aby, ať, že</sub></title>
      <p>vyzvat někoho<sub>ADDR.Acc</sub>, aby se uklidnil<sub>PAT.aby-Clause</sub>
‘to ask somebody to calm down’</p>
      <p>vyzvat někoho<sub>ADDR.Acc</sub> na souboj<sub>PAT.na+Acc</sub>
‘to challenge somebody to a duel’</p>
      <p>• apelovat ‘to appeal’</p>
      <p>ACT<sub>Nom</sub> ADDR<sub>na+Acc</sub> PAT<sub>aby, ať, že</sub></p>
      <p>apelovat na kolegy<sub>ADDR.na+Acc</sub>, aby práci
dokončili<sub>PAT.aby-Clause</sub> včas
‘to appeal to his colleagues to finish the work in
time’</p>
      <p>• apelovat ‘to put emphasis’</p>
      <p><sup>1</sup> Whereas the term lexeme roughly corresponds to a dictionary verb
item with all its meanings, by a lexical unit (LU) we refer to a verb in a
given meaning. See Section 3.1 for more details.</p>
      <p><sup>2</sup> The frames and examples are taken from Vallex 2.6,
http://ufal.mff.cuni.cz/vallex/2.6/data/html.</p>
    </sec>
    <sec id="sec-3">
      <title>ACT<sub>Nom</sub> PAT<sub>na+Acc</sub></title>
      <p>v jeho rodině se stále apeluje na morálku<sub>PAT.na+Acc</sub>
‘in his family emphasis is always put on morality’</p>
      <p>The above examples demonstrate how valency behavior
varies even among semantically close lexical units (LUs),
both when they belong to the same lexeme and when they
belong to different lexemes. It must therefore be captured
for each lexical unit of a verb separately in the form of a
lexical entry listed in the valency lexicon. On the other
hand, certain changes in the valency structure are regular
and can be described in the form of rules which can be
specified in a separate component of the lexicon. Such
changes are typically seen in many but not all verbs and
their applicability to a given lexical unit is not predictable
from its valency frame alone.</p>
      <p>A lexical entry does not list all of its possible forms but
only one—usually the structure corresponding to the
active form of the verb, which is considered to be its
unmarked use—and a list of rules for creating other
possible structures (the marked uses). This description is both
economical (less space is needed for storing the
information about all available realizations of the LU) and
linguistically adequate (it captures generalizations which would
not be obvious if all possible surface forms were listed).</p>
      <p>Valency lexicons are created with many applications in
mind: they help to maintain consistency of corpus
annotation, provide syntactic and morphological
information during parsing and natural language generation, and
may even prove useful in word sense disambiguation and
machine translation; moreover, lexicon data is consulted
by linguists during their theoretical research and provides
useful information for students of Czech. All of these tasks
involve actual occurrences of the valency patterns in the
natural language, and so the unmarked structures from the
lexicon need to be converted into all structures that may
appear in the actual data.</p>
      <p>
        A rule-based approach to creating derived valency
structures has already been used during the annotation of the
Prague Dependency Treebank<sup>3</sup> (PDT) [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Frames in
the valency lexicon PDT-Vallex describe the unmarked
structure but all possible structures may appear in actual
treebank data. During consistency checks, general rules
were used to generate frames describing the marked
valency structures; then it was checked whether any of these
marked structures matches the data and the annotation
in the treebank. (The derived structures carry
information about the required form of the verb, and the number
and type of the valency complementations including their
functors, obligatoriness and permitted forms.) The rules
that were used for the conversions are described in detail
in [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>Because correctness of the underlying PDT data was
assumed, the rules were allowed to over-generate heavily.
For example, “passive” frames of the verb mít ‘to have’
were generated although, in reality, this verb does not form
a passive in Czech. While this is a reasonable strategy for
consistency checks of annotated data, other tasks that utilize
a valency lexicon would benefit from lists of diatheses
applicable to any given lexical unit. Manual annotation
provides a number of examples of lexical units occurring in
different types of diatheses; however, the
tectogrammatically annotated PDT data is too small for us to
draw any conclusions from the fact that a lexical
unit does not occur in a diathesis. Therefore, we are trying
to draw evidence from a much larger, automatically
morphologically annotated corpus. We have decided to use
SYN, a non-referential corpus of 1,300 million
automatically morphologically tagged words.</p>
      <p>
        For Czech, [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] used simple heuristics for
determining which diatheses are applicable to which lexical units
(both kinds of passive for verbs with complementations
realized as a prepositionless object, infinitive or dependent
clause; only reflexive passive for intransitives and verbs
where all complementations are realized as prepositional
phrases; no passive for reflexives). For other languages,
most authors have only studied the applicability of
alternations and diatheses to whole lexemes rather than to
individual lexical units [
        <xref ref-type="bibr" rid="ref11 ref14 ref15 ref19">11, 14, 15, 19</xref>
        ]. We also draw
inspiration from the work on automatic extraction of whole
frames from corpora, which has been attempted for several
languages including English [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Czech [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], and Polish
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Diatheses</title>
        <p>
          Regular changes of the valency structure of a lexical unit,
in the English-language literature usually called
alternations, typically allow the speaker to express the same
situational meaning (i.e., propositional content characterized
by the set of situational participants) in different ways
that result in different perspectives from which the
situation is viewed. Alternations have already been studied
extensively for several decades [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. In the description of
Czech, we follow the classification given by Kettnerová
et al. in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>Here we focus on diatheses—specific relations
stemming from the changes in the linking of situational
participants, valency complementations and surface syntactic
positions. Diatheses belong to the group of grammaticalized
alternations: they are realized by the use of specific
morphological and/or syntactical means, including the
grammatical category of voice of the verb and the surface forms
of the complementations. They relate different surface
syntactic structures of a single lexical unit of a verb. They
also belong to the group of conversive alternations: the
transformation acts as a permutation on the assignment of
valency complementations to surface syntactic positions,
typically shifting Actor away from the prominent subject
position and filling it with some other complementation.</p>
        <sec id="sec-3-1-1">
          <title>Types of grammatical diatheses in Czech</title>
          <p>
            In this section, we summarize the description of Czech
diatheses as given by [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] and [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], and comment on some
of the issues that need to be solved and decisions that need
to be made for their automatic analysis.
          </p>
          <p>The unmarked member of the diathesis. The unmarked
usage is described in the lexical entry in the lexicon. The
verb appears in an active form or as an infinitive; the
complementations are realized in the forms specified for them
in the lexicon entry. All complementations specified in
the entry as obligatory are present on the
tectogrammatical layer, although some of them may be elided in the
surface realization of the sentence (if their value is either
clear from the context or general); inner participants<sup>4</sup> that
are not specified in the lexical entry must not appear as
arguments of the verb, but free modifications may.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Diatheses with past participle.</title>
          <p>1. passive diathesis (periphrastic passive)
e.g. Neustále jsem byl někým vyzýván, abych se
legitimoval. ‘All the time - I was - someone<sub>Instr</sub> - asked
to show my ID.’ – ‘I kept being asked to show my ID’
The form of the verb in this diathesis consists of the
past participle of the main verb + the verb být ‘to
be’ (in a finite or non-finite form). The subject slot of
the passive construction either remains empty, or it
is filled by a complementation which originally filled
an object slot (typically that of an Accusative object,
but realization through infinitive, clause, genitive, or
phrase jako+Acc ‘as something’ is also possible); if
the complementation is expressed as a noun phrase, it
is turned into the Nominative case. The Actor (which
in the active construction fills the subject slot) may be
realized either in the Instrumental case, or as a
prepositional phrase od+Gen ‘by/from+Gen’.</p>
          <p>2. resultative diathesis with the auxiliary verb být ‘to be’
e.g. Jídlo je uvařeno. ‘The food is cooked.’
This form of resultative diathesis differs from the
periphrastic passive only in meaning, not in the surface
form or structure. In many cases, it is not clear which
of the two possible readings the speaker has in mind.
For example, the sentence okno je otevřeno may be
interpreted as a case of the resultative diathesis,
describing a state, i.e. ‘the window is (already) open’,
or as a case of the passive diathesis, describing an
event, i.e. ‘the window is (being) opened’. This
ambiguity is called event–state homonymy in Czech
linguistics. Because it is so common, we assume that the
passive diathesis is possible whenever the resultative
diathesis is possible and vice versa.</p>
          <p><sup>3</sup> See http://ufal.mff.cuni.cz/pdt2.5/ for information
about the current version.</p>
          <p><sup>4</sup> Inner participants are complementations with any of the functors
ACTor, PATient, ADDRessee, EFFect or ORIGin.</p>
          <p>Moreover, Czech also exhibits a competition between
past participles and deverbal adjectives. The kind of
deverbal adjectives that we have in mind are formed
by adding vowel endings to past participles; both the
participle and the adjective can then be used to
express the resultative meaning, while only the
participle can be used to express the passive meaning. On
the one hand, this interchangeability of the “short”
(participle) and “long” (adjectival) forms is often used
as a guideline in determining whether a given
sentence should be considered resultative: if the
participle can be replaced with the adjective, the resultative
interpretation is valid. On the other hand, participle
forms are sometimes used in purely adjectival
meaning, such as in the sentence stále ještě nebyl najeden
‘he still was not full’, which features the word form
najeden (past participle of the reflexive verb najíst se
‘satiate oneself, eat so much that one is full’). If we
were to read this as a diathesis, this would have to
be a case of periphrastic passive or of the resultative
diathesis with auxiliary verb být ‘to be’ formed from
the sentence najedl se ‘he ate to be full’. However, it
is not possible that the same complementation would
fill the subject position in both the active and the
passive/resultative diathesis. We have to read this
sentence as a sentence with the adjective najedený ‘full,
satiated’.</p>
          <p>3. possessive resultative diathesis
e.g. Maminka má jídlo uvařeno. ‘The mother has the
food cooked.’
In this type of construction, the auxiliary verb mít ‘to
have’ is used together with the past participle of the
main verb.</p>
          <p>Note that the conversive aspect is crucial for our
theoretical concept of a diathesis. For example, only one
of the two possible readings of the example sentence
above is considered to be a diathesis:
Mamince včera jídlo připravila tetička. Maminka má
tedy již jídlo uvařeno. ‘The aunt prepared the
food for the mother yesterday. Therefore, the mother
has the food cooked already.’ This case is considered
to be a diathesis, because the Actor of the first
sentence (the aunt) has moved away from the subject
position (and is not expressed in the resultative variant
at all).</p>
          <p>Maminka vařila celé dopoledne a nyní již má jídlo
uvařeno. ‘Mother has been cooking all morning and
now she already has the food cooked.’ This case is
not considered to be a diathesis, because the same
complementation corresponds to the subject in
both cases.</p>
          <p>
            In practice, however, it is often impossible to
distinguish between the two readings (Panevová et al. [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]
claim that out of 60 cases of a resultative diathesis
in the PDT, 23 are ambiguous), and the difference is
usually only obvious from the context, so it is
inaccessible to the kind of naive, syntax-based automatic
methods that we are trying to use. Our automatic
method does not differentiate between the two
readings.</p>
          <p>4. recipient passive diathesis
e.g. Dostal jsem zaplaceno (od šéfa). ‘I got paid (by
the boss).’
The most visible characteristic of the recipient
diathesis is the auxiliary verb dostat ‘to get’ combined with the
past participle. The original frame must contain a
complementation in dative or a benefactor; this
complementation becomes the subject of the diathetic
construction. The actor is expressed in the
Instrumental case, or as a prepositional phrase od+Gen ‘by’.
If there is a semantic patient, it keeps its form and
agrees with the participle in gender and case. All
these conditions together are fairly specific and
therefore allow for a fairly accurate search for corpus
concordances.
          </p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Diatheses with the reflexive particle se.</title>
          <p>5. deagentive diathesis (reflexive passive)
e.g. Vařilo se tu pro emigranty. ‘Cooked - reflexive
- here - for - emigrants.’ – ‘It was cooked here for
emigrants.’
The only surface marks of this diathesis are a verb in
the third person (agreeing with the subject in
number and gender, or singular neuter for subjectless
sentences) and the free reflexive morpheme se. The
Actor is not expressed in this kind of construction at all.
Rules for forming the deagentive diathesis have
almost the same conditions as the rules for forming the
passive diathesis (and both types can be applied to
almost any frame), and also the ways in which the
Patient, Addressee or Effect are moved into the
subject position are the same. However, the sets of verbs
that allow the two diatheses are different.</p>
          <p>6. dispositional diathesis (mediopassive)
e.g. Dobře<sub>adverb</sub> se (mi<sub>Dat</sub>) tu hrál tenis.
‘Well - reflexive - to-me - here - played - tennis.’ –
‘For me, this was a good place to play tennis. I
enjoyed playing tennis here.’
A characteristic feature of the dispositional diathesis
is an evaluative element, usually an adverb such as
dobře ‘well’, pomalu ‘slowly’. The verb form is the
same as in the deagentive diathesis, i.e. a third
person verb agreeing with the subject (or singular neuter
in subjectless sentences) + reflexive particle se.
However, the Actor may be expressed on the surface in the
dative case.</p>
          <p>
            Although people do not have difficulties
distinguishing between the deagentive and the dispositional
diathesis, the difference is hard to grasp for an
automatic procedure when the Actor in the dative is
elided from the sentence (in a sample from the
corpus SYN2005 cited in [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], the Actor was only
expressed in 22 sentences out of 143). For example, the
following sentence is deagentive, although its surface
structure is similar to the example of a dispositional
diathesis given above:
Odpoledne<sub>adverb</sub> se tu hrál tenis.
‘In the afternoons - reflexive - here - played - tennis.’
– ‘In the afternoon, tennis was played here.’
Moreover, this diathesis is very rare—according to
[
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], it appears only 8 times in the
tectogrammatically annotated part of the PDT. We therefore follow
the strategy used by Skoumalová [21, p. 47] and
assume that any imperfective verb which can form the
deagentive diathesis can also form the dispositional
diathesis.
          </p>
        </sec>
        <sec id="sec-3-1-4">
          <title>Diatheses with the reflexive particle si.</title>
          <p>7. causative diathesis
e.g. Nechal si od Gesy vařit. ‘He let - reflexive - by
Gesy - cook.’ – ‘He let Gesy cook for him.’
This verbal form roughly corresponds to the English
‘have something done’. For lexicographic purposes,
we view the causative diathesis as a separate sense of
the verbs nechat/dát ‘let/give’. One reason for this
treatment lies in the fact that two Actors appear in the
construction: the Actor of the verb nechat/dát and
the Actor of the dependent infinitive.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Treatment of diatheses in Vallex and PDT-Vallex</title>
        <p>
          Our proposal is primarily formulated for the purpose of the
description of valency in valency lexicons of Czech verbs
built within the framework of the Functional Generative
Description (FGD). We are working with two lexicons,
VALLEX 2.6<sup>5</sup>, see [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], and PDT-Vallex 2.0<sup>6</sup>, see [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ],
although this phenomenon needs to be addressed in any valency
lexicon. We work with the common format developed for
the two lexicons by Bejček et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>Both lexicons are divided into two components: a data
component and a rule component.</p>
        <sec id="sec-3-3-1">
          <title>The data component</title>
          <p>The data component consists of word entries
corresponding to verb lexemes. A lexeme is an abstract twofold
data structure which associates lexical form(s) and
lexical unit(s). Lexical forms are all possible manifestations
of a lexeme in an utterance (e.g. a lemma or a group of
lemmas,<sup>7</sup> all morphological forms of these lemmas, and
their reflexive and irreflexive forms). All lexical forms of
a lexeme are represented by its lemma(s).</p>
          <p>In the lexicon, each lexical unit (a sense of a verb) is
characterized by a gloss (a verb or a paraphrase roughly
synonymous with the given sense) and by example(s)
(sentence fragment(s) containing the given verb used in the
given sense). The core information on valency
characteristics of a lexical unit is encoded as (exactly one) valency
frame reflecting the unmarked (active) use of the verb.<sup>8</sup></p>
          <p>In an ideal model of the lexicon, information on the
possible application of diatheses is stored in each lexical
unit in a special attribute -diat. This attribute has not
been implemented in either of the lexicons yet, but the
attribute -rfl (reflexivity) that is present in Vallex overlaps
with the proposed attribute -diat to a certain extent. It
lists possible syntactic functions of the reflexive morpheme
se/si. The values of the -rfl attribute are pass for
reflexive passives in verbs with accusative complements, pass0
for reflexive passives in intransitive verbs, and cor4 and
cor3 for cases where se and si fill the position of an object
in accusative (cor4) or in dative (cor3), showing that the
subject is performing an action on itself. If the verb allows
for any of these constructions with se/si, the possibility has
been exemplified with made-up examples (the annotators
simply converted the examples given for the active
diathesis into a passive/reciprocal construction). Many of these
examples do not sound natural. We hope that our methods
will provide some more natural corpus examples.
Moreover, we intend to cover other diatheses that the current
annotation does not cover.</p>
          <p><sup>5</sup> http://ufal.mff.cuni.cz/vallex/2.6/doc/home.html</p>
          <p><sup>6</sup> This is the version that has been published as part of
the Prague Czech-English Dependency Treebank 2.0; it is
available from http://ufal.mff.cuni.cz/pcedt2.0/publications/vallex3.xml
and can be browsed at http://ufal.mff.cuni.cz/lindat/PDT-Vallex.html.</p>
          <p><sup>7</sup> Vallex lexemes comprise perfective, imperfective and iterative
variants, as well as spelling variants, so that the lexicon covers almost twice
as many lemmas as lexemes. On the other hand, there is a one-to-one
correspondence between lexemes and lemmas in PDT-Vallex.</p>
          <p><sup>8</sup> See the Introduction for more details about valency frames.</p>
          <p>[Table 1. Row labels: lexemes<sup>7</sup>; lexical units (LU); lemmas (L);
LUs (separated by lemma); reflexive; nonreflexive, 0 occurrences
of past participles (tag “Vs”); nonreflexive, some
occurrences of Vs, 1 LU; nonreflexive, 0 occurrences
in sentences with “se” tagged as “P7-X4-*”.]</p>
          <p>See Table 1 for counts of lexemes, lemmas and
lexical units in both lexicons. Both lexicons are available
in machine-tractable XML format and also as
human-friendly web pages.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>The rule component</title>
          <p>The proposed rule component of the lexicon consists of
a set of formal syntactic rules determining changes in the
mapping of valency complementations onto surface
syntactic positions. They make it possible to obtain all
possible surface syntactic manifestations of lexical units of
verbs (i.e., number of complementations, their types and
possible morphological forms).</p>
          <p>
            At present, we use transformational rules formulated for
the purposes of the description of diatheses in PDT-Vallex,
the lexicon of the Prague Dependency Treebank, see [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>Methodology</title>
        <p>Due to the size of the lexicon, it is preferable to
minimize the necessary manual work involved in augmenting
the lexicon with information about applicable diatheses.
Moreover, experience suggests that annotators tend to be
positively biased towards assuming the applicability of the
diatheses. Also, examples given by annotators tend to be
contrived or unnatural. To address these problems, we would
like to have a semi-automatic method which should, where
possible
• automatically decide whether a diathesis is
applicable,
• provide natural corpus examples of the diathesis to be
included in the lexicon,
and in uncertain cases
• provide corpus evidence on the basis of which the
annotators can quickly make the decision.</p>
        <p>
          Below we describe such a method in some detail. The
method works by iterating over the frames in three passes.
The first pass is a negative pass which filters out lexical
units where the diathesis is not applicable due to either
grammatical concerns or insufficient corpus evidence. The
second pass is a positive pass where lexical units with
sufficient evidence for applicability are dealt with. In the
final step, corpus evidence is gathered for the remaining
unclear lexical units. This evidence is then presented to the
annotator for a manual decision. If the second or third
phase yields a large number of examples, the automatic
method should also order them so that simple, clear
examples come first. The method of ordering corpus examples
used by [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is well-suited for our purposes.
        </p>
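        <p>The three passes described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the class LexicalUnit, the function names, the thresholds min_hits and sure_hits, and the length-based ordering (a stand-in for the ordering method of [8]) are all assumptions introduced here for illustration.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    EXCLUDED = "excluded"      # negative pass: diathesis ruled out
    APPLICABLE = "applicable"  # positive pass: enough corpus evidence
    UNCLEAR = "unclear"        # third pass: evidence shown to the annotator


@dataclass
class LexicalUnit:
    # Hypothetical record; the real lexicon entries are richer.
    lu_id: str
    lemma: str
    reflexive: bool = False
    # Corpus sentences that look like instances of the diathesis.
    hits: list = field(default_factory=list)


def triage(units, min_hits=1, sure_hits=20):
    """Assign each lexical unit to one of the three passes.

    The two thresholds are illustrative assumptions, not values
    taken from the paper.
    """
    verdicts = {}
    for lu in units:
        if lu.reflexive or len(lu.hits) < min_hits:
            verdicts[lu.lu_id] = Verdict.EXCLUDED
        elif len(lu.hits) >= sure_hits:
            verdicts[lu.lu_id] = Verdict.APPLICABLE
        else:
            verdicts[lu.lu_id] = Verdict.UNCLEAR
    return verdicts


def order_examples(hits):
    """Stand-in ordering: shorter sentences first (an assumption)."""
    return sorted(hits, key=lambda sentence: len(sentence.split()))
```
]]></preformat>
        <p>In such a setup only the lexical units marked as unclear would reach the annotator, together with their ordered corpus examples.</p>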
        <p>Due to the difficulties in distinguishing some of the
diatheses mentioned above, the proposed semi-automatic
procedure only strives to identify cases of the following
diatheses: periphrastic passive, possessive resultative,
recipient, and deagentive (reflexive passive).</p>
        <sec id="sec-3-4-1">
          <title>Negative pass — excluding frames</title>
          <p>In the negative pass we use various methods for excluding
inapplicable diatheses. In some cases, we exclude whole
lexemes (reflexive verbs and lexemes for which no
corpus evidence suggesting a possibility of the diathesis was
found); the rule-based exclusion, on the other hand, may
exclude some lexical units of a lexeme while others proceed
to the next phase.</p>
          <p>Reflexives. We assume that none of the diatheses is
applicable to a lexeme with a reflexive lemma. These cases
include reflexiva tantum (bát se ‘to fear’) and derived
reflexives (šířit se ‘to spread (itself)’). This assumption
covers 1528 out of 4789 lemmas occurring in Vallex 2.6, and
1590 out of 7116 verb lemmas occurring in PDT-Vallex
2.0. It can be seen from Table 1 that so far this is the most
effective step in the negative pass.</p>
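          <p>As an illustration only, the reflexive filter could be sketched as follows. The sketch assumes that a lemma is stored as a plain string whose last whitespace-separated token is the reflexive morpheme se or si; this representation and both function names are assumptions made here, not the lexicon format.</p>
          <preformat><![CDATA[
```python
def is_reflexive_lemma(lemma: str) -> bool:
    """True if the lemma carries the reflexive morpheme se/si.

    Assumes lemmas look like "bát se" or "zvyknout si".
    """
    parts = lemma.split()
    return len(parts) > 1 and parts[-1] in ("se", "si")


def drop_reflexives(lemmas):
    """Return the non-reflexive lemmas and the number excluded."""
    kept = [lemma for lemma in lemmas if not is_reflexive_lemma(lemma)]
    return kept, len(lemmas) - len(kept)
```
]]></preformat>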
          <p>We are aware of the fact that this assumption is only
approximately valid. According to [9, p. 93], derived
reflexives do not form passive (neither periphrastic nor
reflexive), but some reflexiva tantum do; [21, p. 43] is only
aware of two reflexive verbs that form a periphrastic
passive, the reflexiva tantum tázat se ‘to ask’ and obávat se
‘to fear’, and otherwise assumes that reflexive verbs do
not form passives. While [23, p. 124] discusses the
limited possibility of forming the reflexive passive of reflexiva
tantum, she also gives a (possibly made-up) example of a
stylistically non-neutral sentence smálo se, až se plakalo ‘it was
laughed so much that it was cried’ with a reflexive passive
of reflexiva tantum.</p>
          <p>
            We have found several other cases where reflexive verbs
form a diathesis:
1. To se lehko pamatuje. ‘This is easy to remember. It is
easy to remember it.’ (derived from pamatovat si ‘to
remember’)
Na to se lehko zvykne. ‘This is easy to get used to. It
is easy to get used to it.’ (derived from zvyknout si ‘to
get used to’)
Na všechno se zvykne. ‘Everything gets used to.
People get used to everything.’ (derived from zvyknout si
‘to get used to’)
This usage is almost idiomatic; the first two
examples are cases of the dispositional diathesis, and the
third seems to be derived from it. We expect that
further research will show that this type of construction
is productive even among reflexive verbs.
2. Prezident Václav Havel je lidmi<sub>Instr</sub> nejméně oblíben
od té doby, kdy začal prezidentovat. ‘President
Václav Havel is by-the-people least liked since he
started presidenting.’ – ‘President Václav Havel’s
popularity is the least since he became president.’
(derived from oblíbit si ‘get to like’)
The corpus contains many instances of je oblíben ‘is
liked’ which can be easily analyzed as cases of the
verbo-nominal predicate být oblíben(ý) ‘to be liked’,
not as passive. However, this particular sentence also
contains the Actor lidmi ‘by the people’ in the
Instrumental case, which is typical of a passive
construction. One option is to claim that lidmi is a valency
complementation of the adjective (oblíben kým ‘to be
liked by whom’). The other option is to admit that
this is a case of a passive construction, possibly
related to the historical existence of the verb oblíbit ‘get
to like’ without a reflexive particle (as documented in
[
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]<sup>9</sup>).
3. Zdálo se, že toto úsilí už už začne nést ovoce, bylo
vděčně povšimnuto čtenáři. ‘It seemed that the effort
will soon bear fruit, it was noticed by the readers.’
(derived from povšimnout si ‘notice’)
Here, the reading as a verbo-nominal predicate seems
even less likely than in the previous example.
          </p>
          <p>The possibility to form passives of reflexive verbs is
certainly an interesting area for further research.</p>
          <p>
            Rule-based exclusion. Some of the diatheses require a
particular grammatical structure to be applicable. It is
therefore possible to exclude frames where this structure
is absent. Here we rely on [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ] where a machine-readable
list of the necessary structures for each diathesis is
compiled. The effectiveness of this exclusion depends on the
type of diathesis. The diatheses that are formed with the
past participle can be applied to almost any structure. This
step is a little more useful in the cases where the
diathesis is formed using the particle se.</p>
          <p><sup>9</sup> At http://psjc.ujc.cas.cz/, a search for oblíbiti gives 60
instances documented on write-out cards; the relevant entry from the
lexicon can be found by searching for oblíbiti si.
          </p>
          <p>Corpus-based exclusion. We start with a very naive
implementation of this step, excluding the applicability of the
diatheses for whole lexemes. Applicability of the
diatheses formed with the past participle may be ruled out if
the past participle is not found in the corpus. Similarly,
we may exclude the applicability of the reflexive passive
whenever the verb does not appear in the same sentence as
the particle se anywhere in the corpus. Table 1 shows that
we need to refine these criteria, especially for the
exclusion of the reflexive passive.</p>
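          <p>The naive lexeme-level exclusion can be sketched as follows. This is only an illustration under stated assumptions: the corpus is represented as sentences made of (lemma, tag) pairs, and the simplified tags VERB, PTCP and SE are hypothetical stand-ins for the real morphological tags of the corpus; none of the names below come from the tools actually used.</p>
          <preformat>
```python
from collections import defaultdict

# Hypothetical simplified tags; the real corpus uses a much richer tagset.
PAST_PARTICIPLE = "PTCP"
SE_PARTICLE = "SE"
VERB_TAGS = {"VERB", PAST_PARTICIPLE}


def collect_evidence(sentences):
    """Record, per verb lemma, whether a past participle was seen and
    whether the verb ever shared a sentence with the particle se."""
    seen = defaultdict(lambda: {"participle": False, "se": False})
    for sentence in sentences:
        has_se = any(tag == SE_PARTICLE for _, tag in sentence)
        for lemma, tag in sentence:
            if tag not in VERB_TAGS:
                continue
            entry = seen[lemma]
            if tag == PAST_PARTICIPLE:
                entry["participle"] = True
            if has_se:
                entry["se"] = True
    return seen


def excluded_diatheses(lemma, seen):
    """Return the groups of diatheses ruled out for the whole lexeme."""
    entry = seen.get(lemma, {"participle": False, "se": False})
    excluded = set()
    if not entry["participle"]:
        excluded.add("participle-based")
    if not entry["se"]:
        excluded.add("reflexive passive")
    return excluded


toy_corpus = [
    [("tančit", "VERB"), ("se", "SE")],
    [("psát", "PTCP")],
]
evidence = collect_evidence(toy_corpus)
```
          </preformat>
          <p>In this toy corpus, a verb that never occurs at all has both groups of diatheses excluded, while tančit loses only the participle-based ones.</p>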
          <p>The mere presence of the se token is not necessarily
indicative of the given diathesis. First of all, the se need
not be a particle at all, e.g. in the sentence tančil se
ženou ‘he danced with (his) wife’, the word se ‘with’ is in
fact a preposition. (The morphological tagger used to tag
the corpus SYN is accurate enough to overcome this
ambiguity.) But even as a particle, se can be part of a
different grammatical structure, e.g. in the sentence snažil se
tančit ‘he tried reflexive to dance’ the word se belongs to
the reflexivum tantum snažit se ‘to try’, not to the verb
tančit ‘to dance’. Limiting the search to segments
enclosed by punctuation might exclude some genuine
examples of diatheses: minule se, pokud si pamatuji, tančilo
až do rána ‘the last time reflexive, as far as I remember,
danced until morning’ – ‘as far as I remember, the last
time dancing continued until morning’; we do not want to
take the risk of missing some existing evidence already in
this phase. Thus, naive corpus search does not suffice to
exclude more than a tiny number of verbs (as can be seen
from Table 1): auxiliary methods such as (shallow) parsing
or at least clause detection are needed. The Prague
Dependency Treebank is too small for the purpose of rejecting
the applicability of a diathesis, especially a rare one such
as the possessive resultative. (E.g., there are only about
70 instances of the possessive resultative in the whole of
PDT.) Corpus SYN, albeit more adequate in size, is not
parsed, so a different, inherently less reliable method must
be used. We could, for example, base our decision as to
whether the se is connected to the relevant verb or not on
their distance in the sentence.</p>
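          <p>A distance-based decision of this kind might look like the following sketch. The token format, the simplified tags and the default window of three tokens are assumptions made purely for illustration; no threshold has been established experimentally.</p>
          <preformat>
```python
# Hypothetical token format: (word form, simplified tag); "SE" marks the
# reflexive particle. The default window size is an arbitrary assumption.
def se_near_verb(tokens, verb_index, max_distance=3):
    """Return True if a reflexive particle lies within max_distance
    tokens of the verb at position verb_index."""
    start = max(0, verb_index - max_distance)
    end = verb_index + max_distance + 1
    window = tokens[start:end]
    return any(tag == "SE" for _, tag in window)


distant = [("minule", "ADV"), ("se", "SE"), (",", "PUNCT"), ("pokud", "CONJ"),
           ("si", "SI"), ("pamatuji", "VERB"), (",", "PUNCT"), ("tančilo", "VERB")]
adjacent = [("snažil", "VERB"), ("se", "SE"), ("tančit", "VERB")]

print(se_near_verb(distant, 7))                  # False: genuine example missed
print(se_near_verb(distant, 7, max_distance=6))  # True: found with a wider window
print(se_near_verb(adjacent, 2))                 # True: se of snažit se wrongly attached
```
          </preformat>
          <p>The two example sentences from the text show why such a heuristic is inherently unreliable: a narrow window misses the genuine reflexive passive, while any window at all accepts the se that belongs to snažit se.</p>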
          <p>Combination of the rule-based and corpus-based
methods. The rules allow us to identify frames
describing structures in which a given lexical unit may appear in
a diathesis. These structures can be turned automatically
into patterns for corpus search. In general, no significant
conclusions can be drawn from the fact that the resulting
search does not produce any results: Czech is a pro-drop
language, so even semantically obligatory elements can
be elided in the actual sentence. Only the dispositional
diathesis contains an element that is obligatory on the
surface, but the range of possible morphemic realizations of
this evaluative element needs to be further researched.
          </p>
        </sec>
        <sec id="sec-3-4-2">
          <title>Positive corpus-based pass</title>
          <p>In the positive pass we search the corpus for evidence
showing the applicability of a given diathesis. An
occurrence of a past participle is especially indicative of a
diathesis (although concerns about the competition between past
participles and adjectives need to be addressed). The three
kinds of diatheses with past participle forms that we
intend to distinguish—periphrastic passive, possessive
resultative and recipient passive—moreover differ in the
auxiliary verbs. Therefore we assume that instances of past
participles found in the corpus can be assigned to a
diathesis with a fair amount of certainty. The situation with the
passive constructions built with the reflexive particle se is
more complex, but the techniques developed for the first
pass will hopefully help here as well.</p>
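          <p>The idea of telling the participle-based diatheses apart by their auxiliary can be sketched as below. The pairing of the lemmas být, mít and dostat with the three diatheses is stated here as an assumption of the sketch, as are the clause representation and the simplified tags AUX and PTCP. The sketch also ignores the competition between participles and adjectives mentioned above.</p>
          <preformat>
```python
# Assumed auxiliary lemma for each participle-based diathesis
# (an assumption of this sketch, not a resource from the project).
AUXILIARY_TO_DIATHESIS = {
    "být": "periphrastic passive",
    "mít": "possessive resultative",
    "dostat": "recipient passive",
}


def classify_participle_clause(clause):
    """Guess the diathesis of a clause given as (lemma, tag) pairs.

    Returns None when the clause has no past participle, no known
    auxiliary, or more than one competing auxiliary.
    """
    if not any(tag == "PTCP" for _, tag in clause):
        return None
    labels = {
        AUXILIARY_TO_DIATHESIS[lemma]
        for lemma, tag in clause
        if tag == "AUX" and lemma in AUXILIARY_TO_DIATHESIS
    }
    return labels.pop() if len(labels) == 1 else None
```
          </preformat>
          <p>Returning None for clauses with conflicting auxiliaries keeps the sketch conservative: such occurrences are left undecided rather than counted as evidence.</p>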
          <p>The automatic method must be able to assign the
evidence found to a particular diathesis and to a particular
lexical unit (it does not suffice to know that a verb with many
meanings appears in the passive diathesis in the given
sentence; we are looking for examples which we can
disambiguate). Sometimes, the first pass will give us a single
candidate. In other instances, we apply the rules to the
remaining frames, derive the description of the full
structures corresponding to a diathesis, and then search the
corpus for patterns with elements that are unique to only one
of the candidates.
          </p>
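          <p>Finding the elements that are unique to a single candidate can be sketched as a simple set computation. The frame identifiers and the element labels (such as Ins or o+Loc) are hypothetical placeholders for whatever surface description the rules produce.</p>
          <preformat>
```python
from collections import Counter


def distinguishing_elements(candidates):
    """Map each candidate frame to the elements found in no other candidate.

    candidates: dict from a frame identifier to the set of surface
    elements expected when that frame appears in the given diathesis.
    """
    counts = Counter(
        element
        for elements in candidates.values()
        for element in elements
    )
    return {
        frame_id: {element for element in elements if counts[element] == 1}
        for frame_id, elements in candidates.items()
    }


# Hypothetical candidate frames of one verb in one diathesis.
frames = {
    "frame_a": {"Nom", "Ins"},
    "frame_b": {"Nom", "o+Loc"},
}
unique = distinguishing_elements(frames)
```
          </preformat>
          <p>A candidate whose set of unique elements is empty cannot be singled out by this method, so occurrences matching it remain ambiguous.</p>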
        </sec>
        <sec id="sec-3-4-3">
          <title>Corpus evidence for manual annotation</title>
          <p>Finally, methods similar to those in the second phase will be
used, but examples with ambiguous status will be
output. We expect that the examples will be automatically
assigned to a diathesis with high precision. Thus, for each
combination of a lexical unit and a diathesis that remains
undecided after the previous pass, the system will be able
to provide the annotator with a selection of sentences that
could be instances of this LU in the given diathesis with
high likelihood. The annotator will then either select a
couple of examples that demonstrate the applicability of
the diathesis, or will decide that the diathesis is not
applicable to the given LU.
          </p>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>Conclusions</title>
        <p>
          We introduced a (semi-)automatic method for
identifying lexical units that undergo individual diatheses, and we
have discussed some of the difficulties that stand in the
way of a fully automatic procedure. We have also shown
that the question of whether a diathesis is applicable to a
lexical unit may be answered in several different ways:
• The least strict measure is the applicability of a rule
for forming the given diathesis. This is a necessary,
yet not sufficient condition. The rules have been
described in detail in [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] and it is known that they
heavily overgenerate.
• If corpus evidence is found for the applicability of the
diathesis, the amount/reliability of this evidence may
be just as important (especially if the decision is not
reviewed by an annotator). Even a single corpus
occurrence provides evidence that it is possible to form
the diathesis, yet (if the verb itself is frequent) it also
provides evidence that for some reason, that
possibility is not widely used by the users of the language.
• Lack of corpus evidence leads to the exclusion of
some LUs that pass the first test. We expect to find
cases where no corpus evidence of the applicability
of the diathesis will be found, yet an annotator
presented with the LU might still feel that it cannot be
excluded completely. (This is essentially the same
case as we discussed in the previous paragraph—
a possibility that is exploited only rarely—only this
time for diathesis-verb combinations that did not
appear in the corpus.) We believe that in such a case,
and if the entry has been reviewed by an annotator, it
is best to provide this information to the user of the
lexicon.
        </p>
      </sec>
      <sec id="sec-3-6">
        <title>Acknowledgments</title>
        <p>The research reported in this paper was supported by
the grant of the Czech Science Foundation GAČR No.
P406/12/0557.</p>
        <p>The first author was partially supported by the grant
SVV-2013-267314.</p>
        <p>This work has been using language resources developed
and/or stored and/or distributed by the LINDAT-Clarin
project of the Ministry of Education of the Czech Republic
(project LM2010013).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Bejček</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kettnerová</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lopatková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Advanced searching in the valency lexicons using PML-TQ search engine</article-title>
          . In Sojka, P.,
          <string-name>
            <surname>Horák</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kopeček</surname>
          </string-name>
          , I., and
          <string-name>
            <surname>Pala</surname>
          </string-name>
          , K., editors,
          <source>Text, Speech and Dialogue. 13th International Conference</source>
          , volume
          <volume>6231</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>51</fpage>
          -
          <lpage>58</lpage>
          , Berlin / Heidelberg. Masarykova univerzita, Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] Dębowski, Ł. (
          <year>2009</year>
          ).
          <article-title>Valence extraction using EM selection and co-occurrence matrices</article-title>
          .
          <source>Language resources and evaluation</source>
          ,
          <volume>43</volume>
          (
          <issue>4</issue>
          ):
          <fpage>301</fpage>
          -
          <lpage>327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Hajič</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>Complex corpus annotation: The Prague Dependency Treebank</article-title>
          . In Šimková, M., editor,
          <source>Insight into Slovak and Czech Corpus Linguistics</source>
          , pages
          <fpage>54</fpage>
          -
          <lpage>73</lpage>
          . Veda, Bratislava.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Hajič</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panevová</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajičová</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>Sgall</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pajas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Štěpánek</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Havelka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikulová</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Žabokrtský</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ševčíková-Razímová</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>Prague Dependency Treebank 2.0</article-title>
          . CD-ROM. LDC Catalog No. LDC2006T01.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hujer</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smetánka</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weingart</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Havránek</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Šmilauer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Získal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>1933</year>
          -
          <year>1957</year>
          ).
          <article-title>Příruční slovník jazyka českého</article-title>
          . Státní nakladatelství, Státní nakladatelství učebnic, Státní pedagogické nakladatelství, Praha.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Kettnerová</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lopatková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>The lexicographic representation of Czech diatheses: Rule based approach</article-title>
          . In Majchráková, D. and
          <string-name>
            <surname>Garabík</surname>
          </string-name>
          , R., editors,
          <source>Natural Language Processing</source>
          , Multilinguality, pages
          <fpage>89</fpage>
          -
          <lpage>100</lpage>
          , Bratislava, Slovakia. Tribun EU.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Kettnerová</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopatková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bejček</surname>
          </string-name>
          , E. (
          <year>2012</year>
          ).
          <article-title>The syntax-semantics interface of Czech verbs in the valency lexicon</article-title>
          . In Fjeld, R. and
          <string-name>
            <surname>Torjusen</surname>
          </string-name>
          , J., editors,
          <source>Proceedings of the 15th EURALEX International Congress</source>
          , pages
          <fpage>434</fpage>
          -
          <lpage>443</lpage>
          , Oslo, Norway. Department of Linguistics and Scandinavian Studies, University of Oslo.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Kilgarriff</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Husak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McAdam</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rundell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rychlý</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>GDEX: Automatically finding good dictionary examples in a corpus</article-title>
          . In Bernal, E. and
          <string-name>
            <surname>DeCesaris</surname>
          </string-name>
          , J., editors,
          <source>Proceedings of the 13th EURALEX International Congress</source>
          , Barcelona, Spain. Institut Universitari de Lingüística Aplicada. Universitat Pompeu Fabra; Documenta Universitaria.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Kopečný</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>1962</year>
          ).
          <article-title>Základy české skladby</article-title>
          .
          <source>Státní pedagogické nakladatelství, Praha</source>
          , 2nd edition.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Korhonen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Subcategorization Acquisition</article-title>
          .
          <source>PhD thesis</source>
          , University of Cambridge.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Lapata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Acquiring lexical generalizations from corpora: A case study for diathesis alternations</article-title>
          .
          <source>In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics</source>
          , pages
          <fpage>397</fpage>
          -
          <lpage>404</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Levin</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          (
          <year>1993</year>
          ).
          <article-title>English Verb Classes and Alternations: A Preliminary Investigation</article-title>
          . The University of Chicago Press, Chicago and London.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Lopatková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Žabokrtský</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kettnerová</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Valenční slovník českých sloves</article-title>
          . Karolinum, Praha.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>McCarthy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Lexical acquisition at the syntax-semantics interface: diathesis alternations, subcategorization frames and selectional preferences</article-title>
          .
          <source>PhD thesis</source>
          , University of Sussex.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>McCarthy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Korhonen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>Detecting verbal participation in diathesis alternations</article-title>
          .
          <source>In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2, ACL '98</source>
          , pages
          <fpage>1493</fpage>
          -
          <lpage>1495</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Panevová</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Valency frames and the meaning of the sentence</article-title>
          . In Luelsdorff, P. A., editor,
          <source>The Prague School of Structural and Functional Linguistics</source>
          , pages
          <fpage>223</fpage>
          -
          <lpage>243</lpage>
          . John Benjamins Publishing Company, Amsterdam, Philadelphia.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Panevová</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.
          <source>(manuscript)</source>
          .
          <source>Syntax současné češtiny (na základě anotovaného korpusu)</source>
          .
          <source>Nakladatelství Karolinum</source>
          , Praha.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Sarkar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zeman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Automatic extraction of subcategorization frames for Czech</article-title>
          .
          <source>In Proceedings of the 18th International Conference on Computational Linguistics</source>
          , COLING 2000, volume
          <volume>2</volume>
          , pages
          <fpage>691</fpage>
          -
          <lpage>697</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Schulte im Walde</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Clustering verbs semantically according to their alternation behaviour</article-title>
          .
          <source>In Proceedings of the 18th conference on Computational linguistics - Volume 2, COLING '00</source>
          , pages
          <fpage>747</fpage>
          -
          <lpage>753</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Sgall</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bémová</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borota</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajičová</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajičová</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jirků</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panevová</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piťha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plátek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Vrbová</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1986</year>
          ).
          <article-title>Úvod do syntaxe a sémantiky</article-title>
          .
          <source>Academia.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Skoumalová</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Czech Syntactic Lexicon</article-title>
          .
          <source>PhD thesis</source>
          , Charles University in Prague.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Urešová</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2011a</year>
          ).
          <article-title>Valenční slovník Pražského závislostního korpusu (PDT-Vallex)</article-title>
          .
          <article-title>Studies in Computational and Theoretical Linguistics. Ústav formální a aplikované lingvistiky</article-title>
          , Praha, Czech Republic.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Urešová</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2011b</year>
          ).
          <article-title>Valence sloves v Pražském závislostním korpusu</article-title>
          .
          <source>Studies in Computational and Theoretical Linguistics</source>
          .
          <article-title>Ústav formální a aplikované lingvistiky</article-title>
          , Praha, Czech Republic.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>