=Paper=
{{Paper
|id=Vol-1779/07dekok
|storemode=property
|title=Extracting a PP Attachment Data Set from a German Dependency Treebank Using Topological Fields
|pdfUrl=https://ceur-ws.org/Vol-1779/07dekok.pdf
|volume=Vol-1779
|authors=Daniël de Kok,Corina Dima,Jianqiang Ma,Erhard Hinrichs
|dblpUrl=https://dblp.org/rec/conf/tlt/KokDMH17
}}
==Extracting a PP Attachment Data Set from a German Dependency Treebank Using Topological Fields==
<pdf width="1500px">https://ceur-ws.org/Vol-1779/07dekok.pdf</pdf>
<pre>
   Extracting a PP Attachment Data Set from a
 German Dependency Treebank Using Topological
                      Fields
      Daniël de Kok, Corina Dima, Jianqiang Ma and Erhard Hinrichs

              SFB 833 and Seminar für Sprachwissenschaft
                   University of Tübingen, Germany
    {daniel.de-kok,corina.dima,jianqiang.ma,erhard.hinrichs}
                        @uni-tuebingen.de


                                         Abstract

           PP-attachment has traditionally been tackled as a binary classification
       task where a preposition is attached to the immediately preceding noun or to
       the main verb. In this paper, we provide an analysis of PP-attachment in Ger-
       man to show that the assumption that prepositions have only two head candi-
       dates does not hold. We propose a realistic PP-attachment data set, in which
       each preposition has multiple head candidates. The data set is extracted au-
       tomatically from a dependency treebank with topological field annotations.
       Finally, we show that the task of PP-attachment is substantially more difficult
       with this realistic data set than with a binary classification data set.


1     Introduction
Treebanks are constructed to provide empirical data on language use and its
syntactic structure. Apart from being of importance for linguistic research, the
availability of treebanks has also made it possible to train statistical models that
select the most plausible parse of a sentence from the (typically) exponential
number of available parses.
     Prepositional phrase attachment (PP-attachment) is known to be one of the
difficult problems in parse selection [7]. In dependency parsing this problem man-
ifests itself in that the preposition of a PP can have a variety of tokens as its head.
This variety concerns both the category and the position of the head. The correct
attachment of the preposition is typically dictated by semantics. In the example
below the preposition with can be attached syntactically either to the verb kill or to
the noun birds. In this case, the verbal attachment is the only one that makes sense
semantically. However, the syntactically correct nominal attachment can also pro-
vide valuable information about potential, yet semantically incompatible heads.


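(1)   to kill two birds with one stone .
      [Dependency arcs: PP from kill to with (verbal attachment), PP from birds to
      with (nominal attachment), PN from with to stone]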


     A treebank containing gold syntax trees can only provide examples of cor-
rect attachments. In order to train a good model for parse selection, examples of
bad attachments are necessary as well. Stochastic rule-based parsers solve this
problem by producing a parse forest with all the analyses for a sentence that are
generated by the grammar and lexicon. The parse selection model is then trained
on all parses [1, 9] or a representative sample thereof [12] and learns to discrim-
inate between correct and incorrect attachments. A transition-based dependency
parser [11] cannot apply the same strategy: since it is optimized for finding the
most plausible reading, it cannot be used to enumerate the total range of syntacti-
cally correct variations.
     Our goal in this paper is to enrich the training material available to transition-
based dependency parsers by approximating the range of candidates for potential
PP attachments that rule-based parsers can create.1 Moreover, our approach is
more robust in the presence of out-of-vocabulary words since it does not use fine-
grained syntactic analysis. The necessary information, namely the positions in a
sentence where the potential heads of a preposition can reside, is already implicitly
available in the treebank.
     PP attachment disambiguation has generally been framed as a binary decision
task [7, 10, 13, 16] where the head of the preposition is either the noun2 imme-
diately preceding the preposition or the verb. A more realistic setup is to decide
on a PP attachment only after considering every potential attachment point in the
sentence. While considering all the nouns and verbs in the sentence as possible
heads for the preposition has been suggested [5], this approach overestimates the
set of candidate heads and fails to rule out candidates that can be eliminated on
the basis of structural, syntactic information. An illustration is example (2) from
section 2.2, where, for reasons described in detail in the next section, Goetsch is a
highly unlikely candidate head for the preposition von.
     In this paper, we report on the creation of a new PP-attachment data set for
German that is extracted from the dependency version of the TüBa-D/Z [14, 15]
and includes multiple candidate heads for each preposition. We use the topological
field model to investigate the distribution of PPs and their heads in order to select
only those candidate heads that are plausible competition during parsing, thus ap-
proximating the candidate set that a rule-based parser would produce. Although
the data set concerns German PP-attachment, the techniques that are presented in
this paper are generally applicable for Germanic languages.
   1 One could argue that a rule-based parser should be used to extract all possible attachments.
However, hand-crafted grammar rules for wide-coverage parsers are only available for a small number
of languages.
   2 In this paper, we use the terms noun and nominal for nouns and proper names.


2     Analysis of ambiguous PP attachments
In this section, we will provide an analysis of PP-attachment in terms of the
topological field model of German clause structure. This allows us to determine the
head candidates in a more fine-grained manner than simply including any noun or
verb within the clause or within a certain window of words.

2.1   The topological field model of German
The topological field model can be used to account for regularities in word order
across different clause types in German [3, 4, 6, 8]. This model postulates that
each clause type has a left bracket (LK) and a right bracket (RK), which appear left
and right of the middle field (MF). In verb-second declarative clauses, the LK is
preceded by an initial field (VF), while the RK can optionally be followed by a final
field (NF). Table 1 illustrates the topological field annotation for different types of
clauses. As can be seen in these examples, the RK contains the verb cluster. In
main clauses the finite verb moves to the LK, whereas in subordinate clauses the
LK holds the complementizer(s).
                         VF       LK           MF            RK           NF
               MC:     Gestern     hat     er häufiger    angerufen    als heute
                      Yesterday    has    he more-often     called    than today
               MC:       Er       ruft        häufig          an
                         He       calls     frequently        up
               SC:                 der    noch häufiger     anruft      als er
                                  who      more often        calls    than him

Table 1: Topological field structure of a main clause with an auxiliary verb, a main
clause without an auxiliary/modal verb, and a subordinate clause.

    It has been shown that the distribution of the fields in which the heads and
dependents of a particular dependency relation lie provides information that can
improve dependency parsing [2]. For example, it is very likely that the subject is
in the VF of a main clause and highly unlikely that it is in the NF. Therefore, an
analysis that attaches an NP in the NF as the subject is probably incorrect. We
expect PP-attachment to have similar properties. For example, it seems similarly
implausible that a preposition in the NF attaches to a head in the VF. The scope
of head candidate selection can thus be constrained by using the distribution of
the PP-attachment relation in combination with topological fields. The use of the
topological fields model also has practical benefits since topological fields can be
predicted accurately using words and part-of-speech tags [2].
    Table 2 shows the distribution of head fields of prepositions in the VF, MF,
and NF in TüBa-D/Z release 10. Two overarching properties can be observed. (1)
Prepositions in all fields can attach to verbs in both brackets. This is due to the fact
that prepositions in the dependency version of the TüBa-D/Z are usually attached
to the main (non-auxiliary/modal) verb, with one exception that we discuss later.
(2) If the head is not in one of the brackets it is most likely to be in the same field
as the preposition. In the next section, we will explore the per-field attachment
properties of PPs in more detail.
                               Prep. head field        Preposition field
                                                       VF       MF       NF
                               nominal   VF         41.16     0.24     0.57
                               nominal   MF          1.73    33.47     6.15
                               nominal   NF          0.00     0.05    35.74
                               verbal    LK         55.24    22.19    18.17
                               verbal    RK          1.87    44.05    39.37

       Table 2: Distribution of prepositions and their heads (in % per preposition field).
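
The percentages in Table 2 are relative frequencies per preposition field: for each
field that contains a preposition, the share of heads found in each field or bracket.
The following is a minimal sketch of that computation, assuming the attachments are
available as (preposition field, head field) pairs. The function name and the toy
pairs are hypothetical and are not taken from the original extraction code.

```python
from collections import Counter, defaultdict

def head_field_distribution(pairs):
    """Return {prep_field: {head_field: percentage}} from (prep_field, head_field) pairs."""
    counts = defaultdict(Counter)
    for prep_field, head_field in pairs:
        counts[prep_field][head_field] += 1
    dist = {}
    for prep_field, head_counts in counts.items():
        total = sum(head_counts.values())
        # Normalise within one preposition field, as in the columns of Table 2.
        dist[prep_field] = {h: 100.0 * c / total for h, c in head_counts.items()}
    return dist

# Toy pairs, invented for illustration (not treebank counts).
toy_pairs = [("MF", "RK"), ("MF", "MF"), ("MF", "RK"), ("MF", "LK"), ("NF", "NF")]
dist = head_field_distribution(toy_pairs)
```

Heads in the VF, MF, or NF correspond to nominal heads and heads in the LK or RK to
verbal heads, so the nominal/verbal split of Table 2 follows from the head field.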


2.2      Analysis per topological field
Mittelfeld The distribution of the heads of prepositions in the MF (Table 2) reveals
an interesting property: they are very rarely in the VF or NF. More specifically,
the verbal candidate head for an MF preposition is in the LK or RK, while
the nominal candidate heads are mostly in the MF. The set of candidate heads
in the MF can be further restricted, given that German prepositions with nominal
heads typically attach leftwards [16].3 Another assumption that is sometimes made
[10, 16] is that only the noun that immediately precedes the preposition is a nominal
head candidate. However, in 12.41% of the preposition-noun attachments in the
TüBa-D/Z there is another noun between the head and the preposition. Typically,
the interspersed material is a genitive modifier (der Opposition in Example 2) or a
prepositional phrase (mit verschiedenen Sprachen in Example 3) that attaches to the
same head as the preposition. There are, however, also cases where the interspersed
material attaches to another head (auf der Kundgebung in Example 4).
(2)  [VF Goetsch ] [LK bezog    ] [MF sich      auf einen Vorschlag der    Opposition von voriger Woche ] .
         Goetsch       referred       (herself) to  a     proposal  of-the opposition of  last    week    .
     [PP arc from Vorschlag to von]

(3)  [VF Wir ] [LK waren ] [MF verschiedene Leute  mit  verschiedenen Sprachen  aus  einem Land    ] .
         We        were        different    people with different     languages from one   country   .
     [PP arc from Leute to aus]

(4)  [VF Es    ] [LK gibt ] [MF viel Beifall  auf der Kundgebung für Margret Mönig-Raane ] .
         There       is         much approval at  the rally      for Margret Mönig-Raane   .
     [PP arcs to auf and to für; für attaches to Beifall]

    As mentioned before, verbal attachments of the preposition always attach to
the main verb in the TüBa-D/Z dependency scheme. The main verb is in the LK if
the clause is verb-second declarative without an auxiliary/modal verb (Example 2).
Otherwise, it is in the RK (Example 5).
      3 In TüBa-D/Z 97.51% of the preposition-noun attachments in the MF are leftward.


Vorfeld Two things stand out in the distribution of heads of VF prepositions in
Table 2: (1) in contrast to prepositions in the MF, VF prepositions can have nominal
heads that lie outside their field, namely in the MF and (2) the vast majority of
verbal heads are in the LK. It should not be surprising that nominal heads can be
in the MF, since German permits topicalization of PPs. In Example 5 the PP Für
diese Behauptung is topicalized. Additionally, this example shows that in the case
of noun-attachment, the head is not necessarily the first noun of the MF. Instead,
the preposition für attaches to the direct object Nachweis and not to the first MF
noun, Beckmeyer. The consequence for candidate extraction is that this generates
a lot of candidates since nominal heads of a preposition in topicalized PPs can lie
anywhere in the MF.
(5)  [VF Für diese Behauptung ] [LK hat ] [MF Beckmeyer keinen Nachweis ] [RK geliefert ] .
         For this  assertion        has       Beckmeyer no     proof          provided    .
     [PP arc from Nachweis to Für]

     When the PP in the VF is not topicalized, it becomes very likely that the prepo-
sition attaches to a noun in the VF. Typical for this case is that the preposition is
immediately preceded by a noun. Table 3 gives the distribution of such preposi-
tions and shows that all nominal attachments are now in the VF. The immediately
preceding noun is the head of the preposition in 88.65% of these cases, while in
the other 11.35% of cases the head is another preceding VF noun.
                                 Prep. head field        Preposition field
                                                         VF          NF
                                 nominal   VF         98.34        0.07
                                 nominal   MF          0.01        0.47
                                 nominal   NF          0.00       94.85
                                 verbal    LK          1.65        1.74
                                 verbal    RK          0.00        2.88

Table 3: Distribution of prepositions and their heads (in % per preposition field) when a
preposition is in the VF or NF, and is immediately preceded by a noun.

    Zooming back out on the overall distribution in Table 2, we see that the
overwhelming majority of verbal heads of VF prepositions (55.24% versus 1.87%) is
in the LK. This is an artifact of the dependency conversion of the TüBa-D/Z, which
attaches PPs in the VF to the LK. For consistency, we reattach the preposition to
the main verb when the main verb is not in the LK.

Nachfeld Prepositions in the NF regularly attach to heads in every bracket or
field, except in the VF. NF prepositions with nominal heads again show a marked
preference for attachments to heads in the same field (35.74% nominal NF attachments
in Table 2), while at the same time allowing for the most nominal attachments
to another field, namely to the MF (6.15%). Similarly to the VF, if the
preposition is immediately preceded by a noun in the NF, the head virtually always
lies in the NF or brackets (Table 3).


3      Data set construction
The PP-attachment data set is constructed using a set of rules based on the in-
sights of the previous section. Using this set of rules, which is summarized in
Appendix A, we extracted 72,878 prepositions with at least two candidate heads
from TüBa-D/Z release 10.
    In Gloss 6 we show an example from our PP attachment data set. All the
words bounded by boxes would be considered possible heads for the underlined
preposition an.4 However, using the insights from the previous section, we can
remove the seven words highlighted in red from the candidate set and retain only
the other four candidates. The correct candidate head geflossen is highlighted in
green, while the incorrect candidates are blue. Compared to a crude extraction
procedure, seven of the eleven original candidates are eliminated. The remaining
four candidates are included in the dataset, specifying in each case whether the
candidate is the actual head or not.
    (6) [VF 165.000 Mark aus  der bundesweiten Geldsammlung für die Flutopfer     in Südpolen ]
            165,000 Mark from the nation-wide  fundraiser   for the flood-victims in south-Poland
        sind [MF über das Konto   des    Bremer    Landesverbandes der    AWO [P an ] die Caritas in
        are      via  the account of-the of-Bremen state-chapter   of-the AWO    to   the Caritas in
        Danzig ] geflossen .
        Danzig   flowed    .
        165,000 Mark from the nation-wide fundraiser for flood victims in south Poland flowed to Caritas in
        Danzig through the account of the Bremen state chapter of the AWO.

    This selection of heads is representative of our data set. Table 4 shows the
average number of possible heads before and after our candidate selection. On
average, the number of candidates is reduced from 10.34 to 3.15. Moreover, the
thesis that PP-attachment is not a binary classification task is confirmed by the
average number of candidates in our data set. In the next section, we will explore
the ramifications of the average number of candidates further.
                      Prep. field    Instances    Possible heads    Candidate heads
                         VF             21560               9.42               3.42
                         MF             48250              10.67               3.04
                         NF               3068             11.55               3.11
                         All            72878              10.34               3.15

Table 4: Average number of candidate heads for prepositions before and after
selection. Only instances with at least two candidates are counted.
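
The "All" row of Table 4 can be checked as an instance-weighted average of the
per-field rows. The following small sketch uses only the numbers printed in the
table; the variable names are hypothetical. Because the per-field averages are
themselves rounded to two decimals, the recomputed values only agree with the
"All" row up to that rounding.

```python
# (instances, avg. possible heads, avg. candidate heads) per preposition field,
# copied from Table 4.
table4 = {
    "VF": (21560, 9.42, 3.42),
    "MF": (48250, 10.67, 3.04),
    "NF": (3068, 11.55, 3.11),
}

total = sum(n for n, _, _ in table4.values())
# Weight each per-field average by the number of instances in that field.
avg_possible = sum(n * p for n, p, _ in table4.values()) / total
avg_candidates = sum(n * c for n, _, c in table4.values()) / total
reduction_factor = avg_possible / avg_candidates
```

Under these numbers the candidate selection reduces the average number of heads
per preposition by a factor of roughly three.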


     4 Of course, the noun Caritas is the complement of the preposition; if the PP is already bracketed
this noun can be immediately removed from the candidate list.

4      Consequences for the PP-attachment task
As discussed in Section 1, PP-attachment is typically treated as a binary classification
task, where a preposition can be attached to the main verb or the noun that
immediately precedes the preposition. We have shown that this approach to
PP-attachment is unsound, both because the preposition is not necessarily preceded by
a noun in German (Section 2.2) and because the average number of candidate heads is
larger than two (Section 3). This raises two interesting questions: how many prepositions
in ambiguous positions were missed because they were not immediately preceded
by a noun; and is the task of preposition attachment more difficult when there are
more than two candidate heads?
     In order to answer these questions, we extract a data set from the PP-attachment
set that was described in Section 3 where each preposition only has two candidate
heads (binary data set). First, we remove all instances where either a preposition is
not preceded by a noun or the head is a noun that is not the immediately preceding
noun. From the remaining instances we remove all noun candidates, except for the
noun that immediately precedes the preposition.
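
The two filtering steps can be sketched as follows. This is a minimal illustration
under stated assumptions, not the original implementation: the Instance class and
its field names are hypothetical, and "immediately preceded by a noun" is
interpreted as the token directly before the preposition being one of the nominal
candidates.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Instance:
    prep_index: int             # token position of the preposition
    noun_candidates: List[int]  # token positions of nominal candidate heads
    verb_candidate: int         # token position of the main-verb candidate
    head: int                   # token position of the gold head

def to_binary(inst: Instance) -> Optional[Instance]:
    """Reduce an instance to two candidates, or return None if it is dropped."""
    adjacent = inst.prep_index - 1
    # Drop: the preposition is not immediately preceded by a noun candidate.
    if adjacent not in inst.noun_candidates:
        return None
    # Drop: the gold head is a noun other than the immediately preceding one.
    if inst.head in inst.noun_candidates and inst.head != adjacent:
        return None
    # Keep only the adjacent noun and the verb as candidates.
    return Instance(inst.prep_index, [adjacent], inst.verb_candidate, inst.head)

kept = to_binary(Instance(5, [2, 4], 7, 7))
dropped_far_head = to_binary(Instance(5, [2, 4], 7, 2))
dropped_no_noun = to_binary(Instance(5, [2, 3], 7, 7))
```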
     The binary data set contains 32.8% fewer training instances than the full data
set. This answers our first question: in treating preposition attachment as a binary
classification task, almost one third of the ambiguous prepositions are either missed
because they have no immediately preceding noun, or treated incorrectly because
a noun other than the immediately preceding noun is the head.
     To answer the second question, we train feed-forward neural networks that
estimate attachment probabilities for each candidate, on the binary data set and on
the data set with multiple candidate heads, respectively.5 In the evaluation of both
networks, the attachment with the highest probability is regarded as the attachment
chosen by the network. The networks use a hidden ReLU layer, a sigmoid output
layer, and the feature set proposed by Kübler et al. [10]: the preposition, the object
of the preposition, the candidate head, and their part-of-speech tags (where each
word or tag with one of these three relations is encoded as a binary feature). In
addition, the absolute and relative distances between the candidate head and the
preposition are added as integer features.
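
The scoring scheme described above (one probability per candidate, highest
probability wins) can be sketched in plain Python. This is an illustration only:
the weights and feature vectors below are invented toy values, the feature encoding
is not the one from [10], and training is not shown.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attachment_probability(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden ReLU layer followed by a single sigmoid output unit."""
    hidden = [relu(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

def choose_head(candidate_features, params):
    """Score every candidate independently and return (best index, probabilities)."""
    probs = [attachment_probability(f, *params) for f in candidate_features]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs

# Toy parameters and feature vectors (invented; a trained model would supply these).
params = ([[1.0, -1.0, 0.5], [-0.5, 1.0, 0.0]], [0.0, 0.0], [1.0, -1.0], 0.0)
candidates = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.4], [1.0, 1.0, 0.6]]
best, probs = choose_head(candidates, params)
```

Because each candidate is scored on its own, the same network can be applied to
instances with two candidates and to instances with an arbitrary number of them.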
     Table 5 shows the results on the two data sets (with binary vs. multiple
candidates) using an 80/20% split for training and evaluation data. We can clearly see
that the realistic task with multiple head candidates is considerably more difficult
than the binary classification task.
                      Data set   Train samples    Eval samples    Accuracy (%)
                      Binary             39179            9795           78.00
                      Multiple           39179            9795           68.79

Table 5: Preposition attachment accuracy with only two head candidates (binary)
and multiple candidates (multiple).
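
The reported figures are mutually consistent, which can be checked with a few lines
of arithmetic. The sketch assumes that the size of the binary data set equals the
sum of the training and evaluation samples in Table 5; the variable names are
hypothetical.

```python
full_instances = 72878                    # Section 3 / Table 4
binary_train, binary_eval = 39179, 9795   # Table 5

# Assumed size of the binary data set: training plus evaluation samples.
binary_instances = binary_train + binary_eval
# Share of instances lost when moving from the full to the binary data set.
reduction = 1 - binary_instances / full_instances
# Share of the binary data set used for training.
train_share = binary_train / binary_instances
# Accuracy gap between the binary and the multiple-candidate setting (Table 5).
accuracy_gap = 78.00 - 68.79
```

Under this assumption the reduction comes out at the 32.8% stated in the text, and
the training share matches the 80/20 split.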

   5 Since the binary data set is the smaller of the two sets, we first take a random sample of the set
with multiple candidates so that both data sets have the same size.


5     Conclusion
Most previous work in German PP-attachment has assumed that a preposition at-
taches either to the immediately preceding noun or the main verb. However, the
qualitative analysis in the present work provides evidence that prepositions do not
only attach to immediately neighboring nouns (Examples 2, 3, and 5). The quanti-
tative analysis shows that such contexts, where there are more than two candidates,
are indeed very common. Consequently, PP-attachment should rather be consid-
ered to be a ranking task, supporting the thesis of Foth and Menzel [5].
     Based on these insights, we have constructed a PP-attachment data set for
German that includes all the realistic attachment points for a preposition, using
topological field analyses. We expect that this data set can facilitate future research in
PP-attachment, since our preliminary analysis in Section 4 has shown that the task
is considerably harder in the presence of multiple head candidates.
     The data set is provided as a stand-off annotation for the TüBa-D/Z treebank.
This allows users of the data set to extract the information that is relevant to the
task at hand from the annotation layers provided by TüBa-D/Z. This stand-off
annotation will be made available to licensees of the TüBa-D/Z.6

    6 http://www.sfs.uni-tuebingen.de/ascl/ressourcen/corpora/tueba-dz.html


Acknowledgments
Financial support for the research reported in this paper was provided by the
German Research Foundation (DFG) as part of the Collaborative Research Center “The
Construction of Meaning” (SFB 833), project A3.


References
 [1] Steven P Abney. Stochastic attribute-value grammars. Computational
     Linguistics, 23(4):597–618, 1997.

 [2] Daniël de Kok and Erhard W. Hinrichs. Transition-based dependency parsing
     with topological fields. In Proceedings of the 54th Annual Meeting of the
     Association for Computational Linguistics, ACL 2016, August 7-12, 2016,
     Berlin, Germany, Volume 2: Short Papers, 2016.

 [3] Erich Drach. Grundgedanken der Deutschen Satzlehre. Frankfurt/Main,
     1937.

 [4] Oskar Erdmann. Grundzüge der deutschen Syntax nach ihrer geschichtlichen
     Entwicklung dargestellt. Stuttgart: Cotta, 1886. Erste Abteilung.

 [5] Kilian A. Foth and Wolfgang Menzel. The benefit of stochastic PP attachment
     to a rule-based parser. In Proceedings of the COLING/ACL on Main
     conference poster sessions, pages 223–230. Association for Computational
     Linguistics, 2006.
 [6] Simon Herling. Über die Topik der deutschen Sprache. In Abhandlungen
     des frankfurterischen Gelehrtenvereins für deutsche Sprache, pages 296–362,
     394. Frankfurt/Main, 1821. Drittes Stück.
 [7] Donald Hindle and Mats Rooth. Structural ambiguity and lexical relations.
     Computational linguistics, 19(1):103–120, 1993.
 [8] Tilman Höhle. Der Begriff ‘Mittelfeld’. Anmerkungen über die Theorie der
     topologischen Felder. In A. Schöne, editor, Kontroversen alte und neue. Ak-
     ten des 7. Internationalen Germanistenkongresses Göttingen, pages 329–340.
     Tübingen: Niemeyer, 1986.
 [9] Mark Johnson, Stuart Geman, Stephen Canon, Zhiyi Chi, and Stefan Rie-
     zler. Estimators for stochastic unification-based grammars. In Proceedings of
     the 37th annual meeting of the Association for Computational Linguistics on
     Computational Linguistics, pages 535–541. Association for Computational
     Linguistics, 1999.
[10] Sandra Kübler, Steliana Ivanova, and Eva Klett. Combining Dependency
     Parsing with PP Attachment. In Fourth Midwest Computational Linguistics
     Colloquium, 2007.
[11] Joakim Nivre. An efficient algorithm for projective dependency parsing.
     In Proceedings of the 8th International Workshop on Parsing Technologies
     (IWPT), pages 149–160, 2003.
[12] Miles Osborne. Estimation of stochastic attribute-value grammars using an
     informative sample. In Proceedings of the 18th conference on Computa-
     tional linguistics-Volume 1, pages 586–592. Association for Computational
     Linguistics, 2000.
[13] Adwait Ratnaparkhi. Statistical models for unsupervised prepositional phrase
     attachment. In Proceedings of the 17th international conference on Computa-
     tional linguistics-Volume 2, pages 1079–1085. Association for Computational
     Linguistics, 1998.
[14] Heike Telljohann, Erhard W. Hinrichs, Sandra Kübler, Heike Zinsmeister,
     and Kathrin Beck. Stylebook for the Tübingen treebank of written German
     (TüBa-D/Z). In Seminar für Sprachwissenschaft, Universität Tübingen,
     Tübingen, Germany, 2006.
[15] Yannick Versley. Parser evaluation across text types. In Proceedings of the
     Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), 2005.
[16] Martin Volk. Exploiting the WWW as a corpus to resolve PP attachment
     ambiguities. In Proceedings of Corpus Linguistics, volume 200, 2001.


A    Extraction rules
In this appendix, we give a per-field description of the extraction rules. For all
three fields, the verb candidate is the main verb. The main verb is found by (transi-
tively) resolving the AUX relation (which is used to attach verbs to an auxiliary or
modal) until we encounter a verb that is not an auxiliary or modal verb.
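
The main-verb resolution can be sketched as a loop over AUX dependents. This is a
minimal illustration, not the original code: the Token class is hypothetical, the
part-of-speech tags are assumed to be STTS tags, and all relation labels in the toy
sentence other than AUX, PP, and PN are placeholders (only AUX matters here).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    form: str
    pos: str    # part-of-speech tag
    head: int   # index of the head token, -1 for the root
    rel: str    # dependency relation to the head

def main_verb(tokens: List[Token], verb: int) -> int:
    """Follow AUX dependents from `verb` until a verb without one is reached."""
    seen = set()
    while verb not in seen:  # guard against cyclic annotations
        seen.add(verb)
        aux_deps = [i for i, t in enumerate(tokens)
                    if t.head == verb and t.rel == "AUX"]
        if not aux_deps:
            break
        verb = aux_deps[0]
    return verb

# Example 5 with an illustrative annotation.
tokens = [
    Token("Für", "APPR", 6, "PP"),
    Token("diese", "PDAT", 2, "DET"),
    Token("Behauptung", "NN", 0, "PN"),
    Token("hat", "VAFIN", -1, "ROOT"),
    Token("Beckmeyer", "NE", 3, "SUBJ"),
    Token("keinen", "PIAT", 6, "DET"),
    Token("Nachweis", "NN", 7, "OBJA"),
    Token("geliefert", "VVPP", 3, "AUX"),
]
resolved = main_verb(tokens, 3)
```

Starting from the finite auxiliary hat, the loop ends at geliefert, which has no
AUX dependent of its own.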

Mittelfeld To find the set of candidate heads in the MF, we scan backwards from
a preposition until we find a token that forms the LK. Every noun on this path is
marked as a candidate head. If the clause under consideration is a main clause,
the finite verb is in the LK. We resolve for the main verb using the LK and add
it to the candidate set. If the clause is a subordinate clause, the LK is normally a
complementizer, which has an attachment to the finite verb in the RK. We use this
verb to find the main verb and add it to the candidate set.
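
The Mittelfeld rule can be sketched as a backward scan. The sketch makes several
assumptions that are not in the original: the Token class and function names are
hypothetical, nouns are identified by the STTS tags NN and NE, verbs by tags
starting with "V", and the clause is flat (no embedded clauses inside the MF).

```python
from dataclasses import dataclass

NOUN_TAGS = {"NN", "NE"}  # assumed STTS tags for nouns and proper names

@dataclass
class Token:
    form: str
    pos: str
    field: str      # "VF", "LK", "MF", "RK" or "NF"
    head: int = -1  # index of the head token, -1 if unknown or root
    rel: str = ""

def main_verb(tokens, verb):
    """Follow AUX dependents until a verb without one is reached."""
    seen = set()
    while verb not in seen:
        seen.add(verb)
        deps = [i for i, t in enumerate(tokens) if t.head == verb and t.rel == "AUX"]
        if not deps:
            break
        verb = deps[0]
    return verb

def mf_candidates(tokens, prep):
    """Nouns between the LK and the preposition, plus the main verb."""
    nouns = []
    i = prep - 1
    while i >= 0 and tokens[i].field != "LK":
        if tokens[i].pos in NOUN_TAGS:
            nouns.append(i)
        i -= 1
    if i < 0:
        return nouns  # no left bracket found
    lk = tokens[i]
    # Main clause: the LK is the finite verb. Subordinate clause: the LK is a
    # complementizer whose head is assumed to be the finite verb in the RK.
    start = i if lk.pos.startswith("V") else lk.head
    if start < 0:
        return nouns
    return nouns + [main_verb(tokens, start)]

# Example 2; heads are left unspecified because no AUX relation is involved.
T = Token
tokens = [T("Goetsch", "NE", "VF"), T("bezog", "VVFIN", "LK"), T("sich", "PRF", "MF"),
          T("auf", "APPR", "MF"), T("einen", "ART", "MF"), T("Vorschlag", "NN", "MF"),
          T("der", "ART", "MF"), T("Opposition", "NN", "MF"), T("von", "APPR", "MF"),
          T("voriger", "ADJA", "MF"), T("Woche", "NN", "MF")]
cands = mf_candidates(tokens, 8)
```

For the preposition von, the sketch yields Opposition, Vorschlag, and bezog, while
Goetsch in the VF is never considered.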

Vorfeld While extracting from the VF, we should take two different scenarios
into account: (1) the preposition is immediately preceded by a noun in the VF or
(2) the preposition is not immediately preceded by a noun in the VF. In the former
case, we only add nominal candidates in the VF that precede the preposition. In the
latter case, nouns in the MF are added as candidates as well. The verb candidate is
found by scanning rightward from the preposition until we find the LK. The verb
in the LK is then used to find the main verb.
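
The two Vorfeld scenarios can be sketched as follows. As before, this is an
illustration under assumptions rather than the original implementation: the Token
class and function names are hypothetical, nouns are identified by the STTS tags NN
and NE, and the MF is taken to be the contiguous run of MF tokens directly after
the LK.

```python
from dataclasses import dataclass

NOUN_TAGS = {"NN", "NE"}  # assumed STTS tags for nouns and proper names

@dataclass
class Token:
    form: str
    pos: str
    field: str      # "VF", "LK", "MF", "RK" or "NF"
    head: int = -1  # index of the head token, -1 if unknown or root
    rel: str = ""

def main_verb(tokens, verb):
    """Follow AUX dependents until a verb without one is reached."""
    seen = set()
    while verb not in seen:
        seen.add(verb)
        deps = [i for i, t in enumerate(tokens) if t.head == verb and t.rel == "AUX"]
        if not deps:
            break
        verb = deps[0]
    return verb

def vf_candidates(tokens, prep):
    # Nouns in the VF that precede the preposition.
    nouns = []
    i = prep - 1
    while i >= 0 and tokens[i].field == "VF":
        if tokens[i].pos in NOUN_TAGS:
            nouns.append(i)
        i -= 1
    # Scan rightward from the preposition until the left bracket is found.
    lk = prep + 1
    while lk < len(tokens) and tokens[lk].field != "LK":
        lk += 1
    if lk == len(tokens):
        return nouns
    preceded_by_noun = (prep > 0 and tokens[prep - 1].field == "VF"
                        and tokens[prep - 1].pos in NOUN_TAGS)
    if not preceded_by_noun:
        # Scenario (2): nouns in the MF are candidates as well.
        j = lk + 1
        while j < len(tokens) and tokens[j].field == "MF":
            if tokens[j].pos in NOUN_TAGS:
                nouns.append(j)
            j += 1
    return nouns + [main_verb(tokens, lk)]

# Example 5; only the AUX relation is annotated in this toy input.
T = Token
tokens = [T("Für", "APPR", "VF"), T("diese", "PDAT", "VF"), T("Behauptung", "NN", "VF"),
          T("hat", "VAFIN", "LK"), T("Beckmeyer", "NE", "MF"), T("keinen", "PIAT", "MF"),
          T("Nachweis", "NN", "MF"), T("geliefert", "VVPP", "RK", 3, "AUX")]
cands = vf_candidates(tokens, 0)
```

For the topicalized preposition Für, the sketch proposes both MF nouns and the main
verb geliefert, which matches the observation in Section 2.2 that nominal heads of
topicalized PPs can lie anywhere in the MF.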

Nachfeld Processing of the NF is similar to the VF: when the preposition is im-
mediately preceded by a noun, nouns in the NF immediately preceding the prepo-
sition are added as candidates. If the preposition is not preceded by a noun, nouns
in the MF are added as well. To find the main verb candidate, we scan leftward
until we find a bracket and resolve for the main verb.
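
A corresponding sketch for the Nachfeld rule is given below. The same caveats
apply: the Token class and function names are hypothetical, nouns are identified by
the STTS tags NN and NE, and the clause is assumed to be flat. The test sentence is
an invented toy sentence, not a treebank example.

```python
from dataclasses import dataclass

NOUN_TAGS = {"NN", "NE"}  # assumed STTS tags for nouns and proper names

@dataclass
class Token:
    form: str
    pos: str
    field: str      # "VF", "LK", "MF", "RK" or "NF"
    head: int = -1  # index of the head token, -1 if unknown or root
    rel: str = ""

def main_verb(tokens, verb):
    """Follow AUX dependents until a verb without one is reached."""
    seen = set()
    while verb not in seen:
        seen.add(verb)
        deps = [i for i, t in enumerate(tokens) if t.head == verb and t.rel == "AUX"]
        if not deps:
            break
        verb = deps[0]
    return verb

def nf_candidates(tokens, prep):
    # Nouns in the NF that precede the preposition.
    nouns = []
    i = prep - 1
    while i >= 0 and tokens[i].field == "NF":
        if tokens[i].pos in NOUN_TAGS:
            nouns.append(i)
        i -= 1
    # Scan leftward: remember the first bracket and collect MF nouns up to the LK.
    bracket, mf_nouns = None, []
    while i >= 0 and tokens[i].field in ("MF", "RK", "LK"):
        t = tokens[i]
        if t.field in ("RK", "LK") and bracket is None:
            bracket = i
        if t.field == "MF" and t.pos in NOUN_TAGS:
            mf_nouns.append(i)
        if t.field == "LK":
            break
        i -= 1
    preceded_by_noun = (prep > 0 and tokens[prep - 1].field == "NF"
                        and tokens[prep - 1].pos in NOUN_TAGS)
    if not preceded_by_noun:
        nouns += mf_nouns
    if bracket is not None:
        nouns.append(main_verb(tokens, bracket))
    return nouns

# Invented toy sentence: "Er hat ein Buch gekauft für Maria".
T = Token
tokens = [T("Er", "PPER", "VF"), T("hat", "VAFIN", "LK"), T("ein", "ART", "MF"),
          T("Buch", "NN", "MF"), T("gekauft", "VVPP", "RK", 1, "AUX"),
          T("für", "APPR", "NF"), T("Maria", "NE", "NF")]
cands = nf_candidates(tokens, 5)
```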



</pre>