<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Corpus-Based Model of Semantic Plausibility for German Bracketing Paradoxes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Corina Dima</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jianqiang Ma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Bücking</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frauke Buscher</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johanna Herdtfelder</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julia Lukassek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Prysłopska</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erhard Hinrichs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniël de Kok</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia Maienborn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SFB 833, Deutsches Seminar and Seminar für Sprachwissenschaft, University of Tübingen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>64</fpage>
      <lpage>70</lpage>
      <abstract>
        <p>In this paper, we investigate German constructions composed of an adjective and a two-part nominal compound, such as katholisches Kirchenoberhaupt ('Catholic church.leader'), focusing on two issues: (i) what are the prerequisites for semantically possible adjective-nominal compound constructions; (ii) which semantic factors determine the availability of bracketing paradox readings (i.e., readings in which the adjective modifies the first noun) in such constructions. We test theory-driven hypotheses using a corpus-based frequency model and evaluate the performance of the model with respect to human annotations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Nominal compounds that are modified by an adjective, as in Ex. (1), can have
an iconic or an anti-iconic reading. In Ex. (1), the phrase may refer to a church leader
who is Catholic (iconic reading) or to the leader of the Catholic Church
(anti-iconic reading). The latter reading is discussed in the literature as a bracketing
paradox (BP) because the semantic bracketing [[adjective noun] noun] does
not iconically match the structural bracketing [adjective [noun noun]], which
is invariant in German. Formally explicit approaches to this
phenomenon focus on systematically deriving the ambiguity ([2, 4]), but fail to
account satisfactorily for the puzzling ungrammaticality of Ex. (2). While the iconic
reading of this example, four-storeyed owner of a house, is semantically
impossible, the anti-iconic reading, the owner of a four-storeyed house, should be fine.
Surprisingly, however, native speakers judge this reading to be impossible in German.
      </p>
      <p>(1) katholisches Kirchenoberhaupt ‘Catholic church.leader’</p>
      <p>(2) * vierstöckiger Hausbesitzer ‘four-storeyed house.owner’</p>
      <p>We investigate such German constructions composed of an adjective (A) and
a two-part nominal compound (N1N2), focusing on two issues: (i) what are the
prerequisites for semantically possible A-N1N2 constructions; (ii) which semantic
factors determine the availability of the iconic or anti-iconic reading. We analyze
these constructions from a theoretical perspective (Section 2) and test the
theoretical assumptions using a corpus-based frequency model (Sections 3 and 4). In
a second step, the performance of the model is evaluated with respect to human
annotations (Sections 5 and 6).</p>
    </sec>
    <sec id="sec-2">
      <title>Modeling Bracketing Paradoxes</title>
      <p>
        The perceived ungrammaticality of Ex. (2) suggests a prerequisite for any A-N1N2
construction, namely, that A-N2 must be semantically possible (see also [1]).
We formulate this intuition as Hypothesis 1 (H1):
      </p>
      <p>H1: If an A-N1N2 construction is semantically possible, then A-N2 is semantically possible.</p>
      <p>This restriction makes no assumptions about the distinction
between the iconic and the anti-iconic interpretation of A-N1N2 constructions. We
hypothesize that this distinction is based on the relative semantic plausibility ([5, 8])
of the A-N1 and A-N2 constructions. We formulate this intuition as Hypothesis 2
(H2):</p>
      <p>H2: For examples where H1 holds, the higher the semantic plausibility of A-N1
relative to A-N2, the more likely it is that A-N1N2 is a bracketing paradox.</p>
      <p>
        The effects of H2 are exemplified by Ex. (1), (3) and (4). Ex. (1) is a
bracketing paradox because the semantic plausibility of the phrase Catholic Church is
presumably greater than that of the phrase Catholic leader. Ex. (3) and (4) are
different: the phrase Catholic table in Ex. (3) is semantically impossible, and thus
trivially less plausible than Catholic prayer. The semantic plausibilities of Catholic
company and Catholic leader from Ex. (4) do not differ significantly. Therefore,
Ex. (3) is not considered a bracketing paradox, whereas Ex. (4) is a borderline case
that can be interpreted both iconically and anti-iconically.
      </p>
      <p>(3) katholisches Tischgebet ‘Catholic table.prayer’</p>
      <p>(4) katholisches Firmenoberhaupt ‘Catholic company.leader’</p>
    </sec>
    <sec id="sec-3">
      <title>Frequency-based Semantic Plausibility Model</title>
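      <p>We verify H1 and H2 using a frequency-based model derived from the 11.6-billion-token
DECOW14AX corpus [6]. The model computes all frequencies over the lemmatised forms of the
words. For an A-N1N2 construction we compute the following:</p>
      <list list-type="bullet">
        <list-item><p><inline-formula><tex-math>\mathit{freq}_{A\,N_1}</tex-math></inline-formula>, the frequency (number of corpus occurrences) of A-N1;</p></list-item>
        <list-item><p><inline-formula><tex-math>\mathit{freq}_{A\,N_2}</tex-math></inline-formula>, the frequency of A-N2;</p></list-item>
        <list-item><p><inline-formula><tex-math>\mathit{freq}_{A\,N_1N_2}</tex-math></inline-formula>, the frequency of A-N1N2;</p></list-item>
        <list-item><p><inline-formula><tex-math>\mathit{rf}_{A\,N_1N_2}</tex-math></inline-formula>, the relative frequency of A-N1 and A-N2:
          <disp-formula><tex-math>\mathit{rf}_{A\,N_1N_2} = \frac{\mathit{freq}_{A\,N_1}}{\mathit{freq}_{A\,N_2}}</tex-math></disp-formula></p></list-item>
      </list>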
      <p>We use the frequency of a construction in the corpus to model the notions of
semantic possibility and semantic plausibility and to make judgments regarding
the two hypotheses formulated in Section 2. If the frequency of a construction is
higher than 0, the construction is considered semantically possible. We consider
constructions that do not occur in the corpus to be semantically impossible.</p>
      <p>The semantic plausibility score for an adjective-noun pair is given by its
frequency count. The relative semantic plausibility score is the relative frequency of
the two adjective-noun pairs in an A-N1N2 construction
(<inline-formula><tex-math>\mathit{freq}_{A\,N_1} / \mathit{freq}_{A\,N_2}</tex-math></inline-formula>).</p>
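      <p>The definitions above can be illustrated with a minimal Python sketch. The container Counts and the helper names are hypothetical, the counts are invented for illustration, and leaving the score undefined when A-N2 is unattested is an assumption rather than something the model description specifies.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Counts:
    """Lemma-based corpus counts for one A-N1N2 construction (hypothetical container)."""
    a_n1: int    # occurrences of the A-N1 pair
    a_n2: int    # occurrences of the A-N2 pair
    a_n1n2: int  # occurrences of the full A-N1N2 construction


def is_possible(freq: int) -> bool:
    """A construction counts as semantically possible iff its frequency exceeds 0."""
    return freq > 0


def relative_plausibility(c: Counts) -> Optional[float]:
    """Relative frequency freq(A-N1) / freq(A-N2).

    Returns None when A-N2 is unattested (assumption: the score is
    left undefined in that case).
    """
    if not is_possible(c.a_n2):
        return None
    return c.a_n1 / c.a_n2


# Invented counts, for illustration only (not taken from the corpus).
example = Counts(a_n1=2000, a_n2=4, a_n1n2=12)
score = relative_plausibility(example)
```
]]></preformat>
      <p>A large score means that A-N1 is far more frequent than A-N2, which is the configuration the model associates with an anti-iconic reading.</p>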
    </sec>
    <sec id="sec-4">
      <title>Testing the Hypotheses using the Frequency-based Semantic Plausibility Model</title>
      <p>We test our two hypotheses using a dataset of 198 A-N1N2 constructions compiled
based on the theoretical literature.</p>
      <p>
        To test H1, an A-N1N2 construction is considered semantically possible if its
corpus frequency is greater than 0. The dataset contained 77 semantically
possible A-N1N2 constructions (i.e., constructions that actually occurred in the
corpus<sup>1</sup>). For 70 of these examples, our model predicted that A-N2 is also semantically
possible, resulting in a 90.9% prediction accuracy for H1. An interesting case is
the phrase in Ex. (2), vierstöckiger Hausbesitzer: the full construction is
considered semantically possible because it has 10 occurrences in the corpus (as part of
meta-discussions concerning its semantic impossibility). The same reasoning holds
for the construction verregnete Feriengefahr ‘rainy vacation.danger’, which occurs
twice in the corpus. The respective A-N2 pairs of these constructions, however, do
not occur, pointing to a logical discrepancy caused by the false initial assumption
(namely, that every attested construction is semantically possible).
      </p>
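      <p>The H1 check described above amounts to a simple proportion. The following Python sketch shows the arithmetic; the function name is hypothetical and the count pairs are placeholders that reproduce only the reported totals (198 items, 77 attested, 70 with an attested A-N2), not the actual corpus frequencies.</p>
      <preformat><![CDATA[
```python
from typing import Iterable, Tuple


def h1_accuracy(counts: Iterable[Tuple[int, int]]) -> float:
    """Share of attested A-N1N2 constructions whose A-N2 pair is also attested.

    Each item is a pair (frequency of A-N1N2, frequency of A-N2).
    """
    attested = [(full, a_n2) for full, a_n2 in counts if full > 0]
    if not attested:
        raise ValueError("no attested A-N1N2 constructions")
    consistent = sum(1 for _, a_n2 in attested if a_n2 > 0)
    return consistent / len(attested)


# Placeholder counts that mirror only the reported totals.
toy = [(1, 1)] * 70 + [(1, 0)] * 7 + [(0, 0)] * 121
accuracy = h1_accuracy(toy)
print(f"{100 * accuracy:.1f}%")
```
]]></preformat>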
      <p>For H2, the model computes a relative semantic plausibility score for each
semantically possible A-N1N2 construction in our dataset. We identify this score
with the BP-ness of an A-N1N2 construction. An initial inspection of the
constructions with a high relative semantic plausibility score shows that these are indeed
constructions with an anti-iconic reading: katholisches Kirchenoberhaupt has a
score of 1328, europäischer Auslandsaufenthalt (‘European foreign-country.stay’)
has a score of 7327. In contrast, constructions with a very low score, like
verrückter Chemieprofessor (‘crazy chemistry.professor’, score 0.003) or ambulante
Unfallbehandlung (‘outpatient accident.treatment’, score 0.0003), clearly have no
anti-iconic reading. Borderline cases include bedrohliches Krankheitssymptom
(‘menacing disease.symptom’, score 4.86) and politische Satiresendung
(‘political satire.broadcast’, score 1.81).
        <fn><label>1</label><p>The discrepancy between the initial and the attested number of constructions can be explained
as follows: on the one hand, many of the examples were ungrammatical contrasts to grammatical
examples; on the other hand, some examples were constructed by analogy with existing examples, which of course
does not imply that they actually occur in the corpus.</p></fn></p>
      <p>The model confirmed H1, and thereby provided good evidence for the
assumption that for all A-N1N2, A-N2 must be semantically possible, irrespective
of whether A can modify N1 or not. The initial observations also suggest that the
frequency model can be successfully used to test H2. To test H2, we annotated
the set of 77 A-N1N2 constructions with regard to their perceived iconicity. The
annotation is described in the next section.</p>
    </sec>
    <sec id="sec-5">
      <title>Annotation</title>
      <sec id="sec-5-1">
        <title>Annotation guidelines</title>
        <p>The dataset contained those 77 A-N1N2 constructions that are semantically
possible according to the frequency model. These items were annotated by 5 PhD
students in linguistics (3 women, 2 men; native speakers of German); two of them are
co-authors of this paper. The items were presented to the annotators in an
Excel spreadsheet in one of 5 randomized orders. The annotators worked independently
of each other.</p>
        <p>The annotators judged each item according to the following two questions<sup>2</sup>:</p>
        <p>Q1 Is the A-N1N2 construction as a whole grammatical? yes/no</p>
        <p>Q2 Which reading is preferred? anti-iconic, iconic, equal preference</p>
      </sec>
      <sec id="sec-5-2">
        <title>Results &amp; discussion</title>
        <p>Grammaticality (Q1) We categorized the 77 items according to the annotation
question Q1. As a prerequisite, at least 4 (out of 5) annotators had to agree upon
an answer. If there was no such agreement, the item was not categorized.
73 out of 77 items were judged as grammatical, 2 items were judged as
ungrammatical, and 2 items could not be categorized; the latter 4 items were excluded from
the evaluation of Q2. The inter-rater agreement had a Fleiss’ κ value [3] of 0.45
(moderate agreement). Notably, the 2 items that were judged as ungrammatical are
vierstöckiger Hausbesitzer and verregnete Feriengefahr. These are exactly those
cases that are falsely classified as semantically possible by the frequency-based
model, as they occur in meta-discussions of their semantic impossibility (see
Section 4).
          <fn><label>2</label><p>We also asked the annotators whether A-N1 and A-N2 are semantically possible. As the answers
do not directly bear on H2, we will not report them here.</p></fn></p>
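        <p>For readers unfamiliar with the agreement statistic, the following Python sketch computes Fleiss’ kappa from a table of per-item category counts, following the standard definition in [3]. The function name and the toy table are illustrative assumptions; the toy ratings are invented and are not the annotation data of this study.</p>
        <preformat><![CDATA[
```python
from typing import Sequence


def fleiss_kappa(table: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of per-item category counts.

    table[i][j] is the number of raters who assigned item i to category j;
    every row must sum to the same number of raters.
    """
    if not table:
        raise ValueError("empty rating table")
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each item needs the same number (at least 2) of ratings")

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_items

    # Chance agreement from the overall category proportions.
    n_cats = len(table[0])
    p_j = [sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Invented ratings: 3 items, 5 raters, categories (yes, no).
toy = [[5, 0], [3, 2], [0, 5]]
kappa = fleiss_kappa(toy)
```
]]></preformat>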
        <p>Preference (Q2) We categorized the remaining 73 grammatical items according
to the annotation question Q2. As a prerequisite, at least 4 (out of 5) annotators
had to agree upon an answer (if all 5 annotators judged the item as grammatical),
or at least 3 (out of 4) annotators had to agree upon an answer (if only 4
annotators judged the item as grammatical). 11 items could not be categorized, as the
required majority was not obtained. Of the remaining 62 items, 16 were
perceived to have an anti-iconic reading, while the other 46 were perceived to have
an iconic reading. The inter-rater agreement had a Fleiss’ κ value of 0.58
(moderate agreement). The results yield a two-way distinction between anti-iconic and
iconic readings. Notably, no items were annotated as being truly ambiguous.
However, for 11 items, the annotators did not agree upon a preferred reading. Among
these, several examples are prototypes for bracketing paradoxes according to the
theoretical literature; in fact, 4 have a tendency towards being ambiguous (e.g.,
politische Satiresendung) and 2 have a tendency towards being anti-iconic (e.g., katholisches
Kirchenoberhaupt). As this result is in need of further clarification, we excluded
the problematic data points from the evaluation of the frequency-based model. The
remaining 62 items annotated for iconicity are used to train and test the
frequency-based model.</p>
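        <p>The majority rule used for categorization can be written down compactly. The Python sketch below is illustrative only: the function name and label strings are assumptions, and it takes as input the ratings of those annotators who judged the item as grammatical.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import Optional, Sequence


def categorize(labels: Sequence[str]) -> Optional[str]:
    """Return the agreed label, or None if the required majority is missing.

    Thresholds follow the text: 4 of 5 ratings, or 3 of 4 ratings.
    """
    required = {5: 4, 4: 3}.get(len(labels))
    if required is None:
        raise ValueError("expected 4 or 5 ratings per item")
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= required else None


# Invented ratings for three items.
agreed = categorize(["iconic"] * 4 + ["anti-iconic"])
split = categorize(["iconic"] * 3 + ["anti-iconic"] * 2)
four_raters = categorize(["anti-iconic"] * 3 + ["equal"])
```
]]></preformat>
        <p>Items for which the function returns None correspond to the uncategorized items that were excluded from the evaluation.</p>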
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Results of the Frequency-based Semantic Plausibility Model</title>
      <p>This section presents the results of using the frequency-based semantic plausibility
model introduced in Section 3 to predict the iconicity of A-N1N2 constructions.
The dataset consists of the 62 items that were judged grammatical (Q1) and
were assigned the same preferred reading by the required majority of the human annotators
(Q2).</p>
      <p>The task at hand is to predict if a particular A-N1N2 construction is considered
a bracketing paradox or not using only the frequency information, in particular the
relative semantic plausibility score which we normalize across the examples in our
dataset.</p>
      <p>We use logistic regression, a widely used linear machine learning method, to
train a prediction model. Table 1 reports the average F1 score [7] and accuracy
figures obtained for 10-fold cross-validation. The model hyper-parameter (the
regularization coefficient) is chosen individually for each fold, using a grid search
over 10 equally spaced values in the interval [1e-4, 1e+4]. The results show
that despite the imbalanced number of instances for each class, the relative
semantic plausibility score is a very good predictor for the preferred interpretation of a
particular construction.</p>
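      <p>As a reminder of how the two evaluation measures are defined, the following Python sketch computes accuracy and F1 from gold and predicted labels for a single fold. It assumes that F1 is computed for the bracketing paradox class, which the text does not state explicitly; the function name and the toy labels are likewise illustrative.</p>
      <preformat><![CDATA[
```python
from typing import Sequence, Tuple


def accuracy_and_f1(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float]:
    """Accuracy and F1 for the positive class (True = bracketing paradox)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    correct = sum(1 for g, p in pairs if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(pairs), f1


# Invented labels for one small fold.
gold = [True, True, False, False, False]
pred = [True, False, False, False, True]
acc, f1 = accuracy_and_f1(gold, pred)
```
]]></preformat>
      <p>In a cross-validation setting, these per-fold values would be averaged over the folds.</p>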
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Data set</th><th>F1 score</th><th>Accuracy (%)</th></tr>
          </thead>
          <tbody>
            <tr><td>16 BP, 46 non-BP (62 total)</td><td>0.90</td><td>95.71</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The corpus-based frequency model confirmed both H1 and H2. First, for all
A-N1N2 constructions, A-N2 must be semantically possible, irrespective of whether
A can modify N1 or not (H1). Second, the higher the relative semantic plausibility
score of an A-N1N2 construction, the more likely it is that the construction is a
bracketing paradox (H2). This result is based on our evaluation of the frequency-based
model using human annotation as a gold standard.</p>
      <p>Our study also pointed to issues that need to be investigated in future
work. From a theoretical perspective, it is surprising that katholisches
Kirchenoberhaupt, the prototypical textbook example of a bracketing paradox, received mixed
ratings from the annotators. This suggests that the distinction might not necessarily
be a binary one. We plan to complement our results with a rating study that
elicits graded judgments for the perceived iconicity of A-N1N2 constructions. From
the perspective of computational modeling, we discovered some limitations of
the frequency-based model. For example, in the construction intelligenter
Tierarzt ‘intelligent animal.doctor’, the pair intelligenter Arzt is very infrequent, as the
adjective ‘intelligent’ spells out an implied attribute of the ‘doctor’, whereas the
locution intelligentes Tier is much more frequent. Thus, the model predicts that this
construction should have an anti-iconic interpretation (‘doctor for intelligent
animals’), which is clearly wrong. Another shortcoming is the inability of the
model to make predictions in the absence of frequency information, which meant that
only a part of the initial dataset could be analyzed. In order to circumvent these
shortcomings, we plan to use distributional semantics models, which have the ability to
integrate information about the semantics of the construction’s constituents.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>Financial support for the research reported in this paper was provided by the
German Research Foundation (DFG) as part of the Collaborative Research Center “The
Construction of Meaning” (SFB 833), projects A1 and A3. We thank the
anonymous reviewers for their comments and our annotators for their support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref>
        <mixed-citation>
          [1] Rolf Bergmann. Verregnete Feriengefahr und Deutsche Sprachwissenschaft. Zum Verhältnis von Substantivkompositum und Adjektivattribut. Sprachwis
        </mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Markus</given-names>
            <surname>Egg</surname>
          </string-name>
          .
          <source>Flexible semantics for reinterpretation phenomena</source>
          .
          <publisher-name>CSLI Publications</publisher-name>
          ,
          <publisher-loc>Stanford</publisher-loc>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Joseph L.</given-names>
            <surname>Fleiss</surname>
          </string-name>
          .
          <article-title>Measuring nominal scale agreement among many raters</article-title>
          .
          <source>Psychological bulletin</source>
          ,
          <volume>76</volume>
          (
          <issue>5</issue>
          ):
          <fpage>378</fpage>
          ,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Richard K.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Events and modification in nominals</article-title>
          .
          <source>In Semantics and Linguistic Theory</source>
          , volume
          <volume>8</volume>
          , pages
          <fpage>145</fpage>
          -
          <lpage>168</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Angeliki</given-names>
            <surname>Lazaridou</surname>
          </string-name>
          , Eva Maria Vecchi, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          .
          <article-title>Fish transporters and miracle homes: How compositional distributional semantics can help NP parsing</article-title>
          .
          In
          <source>Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2013)</source>
          , Seattle, USA,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Roland</given-names>
            <surname>Schäfer</surname>
          </string-name>
          .
          <article-title>Processing and querying large web corpora with the COW14 architecture</article-title>
          .
          In
          <source>Challenges in the Management of Large Corpora (CMLC-3)</source>
          , page 28,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>van Rijsbergen</surname>
          </string-name>
          .
          <source>Information Retrieval</source>
          .
          <publisher-name>Butterworth-Heinemann</publisher-name>
          , 2nd edition
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Eva Maria</given-names>
            <surname>Vecchi</surname>
          </string-name>
          , Marco Baroni, and Roberto Zamparelli.
          <article-title>(Linear) maps of the impossible: capturing semantic anomalies in distributional space</article-title>
          .
          <source>In Proceedings of the Workshop on Distributional Semantics and Compositionality</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>