=Paper=
{{Paper
|id=Vol-1347/paper01
|storemode=property
|title=Implicative structure and joint predictiveness
|pdfUrl=https://ceur-ws.org/Vol-1347/paper01.pdf
|volume=Vol-1347
|dblpUrl=https://dblp.org/rec/conf/networds/BonamiB15
}}
==Implicative structure and joint predictiveness==
<pdf width="1500px">https://ceur-ws.org/Vol-1347/paper01.pdf</pdf>
<pre>
                    Implicative structure and joint predictiveness

Olivier Bonami
Université Paris-Sorbonne
Laboratoire de linguistique formelle (U. Paris Diderot & CNRS)
olivier.bonami@paris-sorbonne.fr

Sacha Beniamine
Université Paris Diderot
Laboratoire de linguistique formelle & Alpage (Inria & U. Paris Diderot)

Copyright © by the paper’s authors. Copying permitted for private and academic
purposes. In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure
and Word Usage. Proceedings of the NetWordS Final Conference, Pisa,
March 30-April 1, 2015, published at http://ceur-ws.org


1 Introduction

Ackerman et al. (2009) define the PARADIGM CELL FILLING PROBLEM (PCFP), which we
paraphrase in (1), as the cornerstone of the study of inflectional paradigms.

(1)  How do speakers know how to inflect the full paradigm of a lexeme on the
     basis of exposure to only some of its forms?

Ackerman et al. (2009) go on to argue that speakers rely on knowledge of the
IMPLICATIVE STRUCTURE of paradigms (Wurzel, 1984): paradigms are structured in
such a way that there are reliable correlations between the form filling one
paradigm cell A and the form filling another cell B. The reliability of these
correlations depends on the particular pair of cells A and B under scrutiny; it
can be assessed quantitatively by examining the statistical distribution of the
operations required to go from A to B in the lexicon.

This presentation focuses on one particular aspect of implicative structure,
which we call JOINT PREDICTIVENESS. In some situations, joint knowledge of two
paradigm cells A and B provides more information on cell C than could be
inferred from knowledge of either A or B alone. Table 1 below provides a simple
example from French, using lexemes that illustrate 7 patterns which together
account for 95% of the verbs documented in the Flexique phoneticized lexicon
(Bonami et al., 2014). In French conjugation, predicting the past participle
from the infinitive is hard because of the opacity between second conjugation
infinitives, such as BÂTIR, and some third conjugation infinitives, such as
TENIR, OUVRIR and MOURIR. Predicting the past participle from present SG forms
is also hard, this time because some first conjugation verbs with a stem in -i
(e.g. RELIER) are not distinguished from second conjugation verbs. A different
subset of first conjugation verbs (e.g. RATISSER) raises similar problems for
PL forms. Overall, no other cell in the paradigm is a very good predictor of the
past participle. However, joint knowledge of some pairs of paradigm cells
radically improves the quality of prediction. For instance, joint knowledge of
the infinitive and some present plural form removes all uncertainty in the
sample in Table 1: knowledge of the infinitive form partitions the set of
lexemes into two classes, within each of which the PRS.3PL is fully predictive
of the past participle.

Although the existence of joint predictiveness is acknowledged in the
literature (Matthews, 1972; Thymé et al., 1994; Ackerman et al., 2009; Stump and
Finkel, 2013; Blevins, in press; Sims, 2015), little attention has been given to
quantifying its importance. In this paper we first give further arguments that
joint predictiveness is a crucial aspect of implicative structure, and that a
careful empirical examination of joint predictiveness is essential to both the
linguistic and the psycholinguistic assessment of the PCFP and related issues.
We then propose and illustrate a method for the quantitative evaluation of joint
predictiveness. We end with a discussion of principal part systems.

   Lexeme              INF      PRS.3SG   PRS.3PL   PST.PTCP    # verbs
   LIVRER ‘deliver’    livKe    livK      livK      livKe          4108
   RELIER ‘link’       K@lje    K@li      K@li      K@lje           210
   RATISSER ‘rake’     Katise   Katis     Katis     Katise           22
   BÂTIR ‘build’       batiK    bati      batis     bati            327
   TENIR ‘hold’        t@niK    tjẼ       tjEn      t@ny             37
   OUVRIR ‘open’       uvKiK    uvK       uvK       uvEK              8
   MOURIR ‘die’        muKiK    mœK       mœK       mOK               1

Table 1: Exemplary paradigms for the inflection patterns of 4-cell subparadigms
of French verbs (data from Flexique; the 5% of lemmas illustrating minor
patterns have been excluded)


2 The relevance of joint predictiveness

We start by establishing that speakers do have the opportunity to use joint
predictiveness. Figure 1 plots how the number of forms per lemma evolves when
walking through the 1.6 billion words of the FrWaC web corpus (Baroni et al.,
2009), restricting attention to the 6847 verbs documented in the Lefff lexicon
(Sagot, 2010) to compensate for tagging errors.¹ Note that 1.6 billion words is
of the same order of magnitude as the overall linguistic exposure of an adult
speaker. The distribution strongly suggests that, as speakers get exposed to
more words, paradigms fill slowly on average, so that predicting unknown forms
stays relevant; at the same time, speakers are massively exposed to multiple
forms of the same lexemes, which makes knowledge of joint predictiveness
relevant to addressing the PCFP.

   ¹ This restriction leads to overestimating the average number of forms per
   lemma, as neologisms, very rare words and hapaxes not present in the lexical
   resource are not included. We count distinct forms rather than distinct
   paradigm cells, as there is currently no tagger for French that reliably
   disambiguates homographic forms of the same lexeme. French verbs have 51
   paradigm cells, and the average number of distinct forms per verb in the
   Lefff lexicon is 35.8.

A second relevant observation is that speakers do manifest knowledge of joint
predictiveness. Although this topic deserves dedicated experimental studies that
are beyond the scope of this paper, circumstantial evidence from speech errors
is easy to find. One common conjugation error in French (Kilani-Schoch and
Dressler, 2005) is to use mouru as the past participle of MOURIR, whereas mouri
is almost never used (140 relevant occurrences of mouru in the full FrWaC
corpus, 0 of mouri). This would be surprising if speakers were analogizing from
a single paradigm cell: given knowledge of the infinitive alone, mouri would be
the most likely regularization; given knowledge of some present form, mouré or
meuré would be expected.² Speakers thus seem to be sensitive to the existence of
an allomorphic relation between the infinitive and the present stem, and hence
to employ joint predictiveness from two cells to infer the likely form of the
participle.

   ² A reviewer points out that if speech errors are due to analogy to the
   nearest (frequent) neighbor, mouru is unsurprising, as courir (past
   participle couru) is the most frequent of the verbs whose infinitive is at a
   minimal edit distance from mourir. This assumption, however, is not
   plausible. Witness the case of the verb dire, whose present 2PL dites is very
   commonly overregularized to disez. The most frequent phonological neighbor of
   dire is lire; however, according to the Lexique database (New et al., 2007),
   dire is 8 times more frequent than lire in written French, and 17 times more
   frequent in spoken French. It is thus not plausible that analogical
   regularization is driven by the closest neighbor; rather, it is driven by
   general patterns applying across lexemes. For instance, dire is one of a
   handful of exceptions to the regular Xons ∼ Xez alternation between 1PL and
   2PL, which is overwhelmingly prevalent in both type and token frequency.

The final observation is that there are important linguistic generalizations
that can only be obtained by looking at joint predictiveness. To supplement the
French data presented in the introduction, let us consider a spectacular example
from European Portuguese, concerning the prediction of the infinitive from the
present singular forms. Table 2 presents relevant data. Because it does not
contain a theme vowel, the present 1SG is a bad predictor of the infinitive: a
priori, any present 1SG could correspond to a first, second or third
conjugation verb. 2SG and 3SG forms are slightly better predictors, as they
distinguish first conjugation endings (-5S, -5) from second/third conjugation
endings (-@S, -@); the distinction between the last two conjugations is still
neutralized. However, if a verb has a mid prethematic vowel in the 2SG and 3SG,
that vowel is raised to high-mid in the 1SG in the second conjugation (witness
RECEBER, RECORRER), and to high in the third conjugation (witness SEGUIR,
SUBIR). Whether one sees this phenomenon as the result of a synchronic vowel
harmony in the 1SG operating prior to theme vowel deletion (Mateus and
d’Andrade, 2000) or as a historical remnant with no synchronic motivation, it
remains that on the surface, for verbs with a mid prethematic vowel in the 2SG
and 3SG, knowledge of the 1SG disambiguates whether the verb belongs to the
second or third conjugation, and thus helps predict the infinitive.

[Figure 1 (plot not reproduced): mean number of forms per lemma (left axis,
0 to 20) and percentage of lemmas with more than one form (right axis, 0 to 100)
against corpus size (x-axis, 0 to 1.6 × 10^9 words).]

Figure 1: Mean number of forms per lemma and proportion of lemmas with multiple
forms as a function of corpus size (FrWaC corpus)

            INF         1SG        2SG         3SG        1PL           2PL          3PL
LEVAR       l@"vaR      "lEvu      "lEv5S      "lEv5      l@"v5muS      l@"vaiS      "lEv5̃ũ
NOTAR       nu"taR      "nOtu      "nOt5S      "nOt5      nu"t5muS      nu"taiS      "nOt5̃ũ
RECEBER     r@s@"beR    r@"sebu    r@"sEb@S    r@"sEb@    r@s@"bemuS    r@s@"b5iS    r@"sEb5̃ĩ
RECORRER    r@ku"reR    r@"koru    r@"kOr@S    r@"kOr@    r@ku"remuS    r@ku"r5iS    r@"kOr5̃ĩ
SEGUIR      s@"giR      "sigu      "sEg@S      "sEg@      s@"gimuS      s@"giS       "sEg5̃ĩ
SUBIR       su"biR      "subu      "sOb@S      "sOb@      su"bimuS      su"biS       "sOb5̃ĩ

Table 2: Selected European Portuguese verbs in the infinitive and present
indicative (" precedes the stressed syllable)


3 Quantifying joint predictiveness

To assess the importance of joint predictiveness, we build on previous proposals
by Bonami and Boyé (2014) and Bonami and Luís (2014) on the evaluation of
predictiveness from a single paradigm cell, themselves improving on Ackerman et
al. (2009) and Ackerman and Malouf (2013). Specifically, for every pair of
paradigm cells A and B, we infer a classification of the patterns of
alternation relating these two cells. These patterns are then used to define a
random variable A∼B over pairs of forms, corresponding to the distribution of
patterns, and a random variable A_{A∼B} classifying possible forms for A on the
basis of the patterns they could possibly instantiate. For instance, going back
to the data in Table 1, INF∼PST.PTCP partitions the set of pairs into 5 subsets
corresponding to the patterns Xe∼Xe, XiK∼Xi, XiK∼Xy, XKiK∼XEK and XuKiK∼XOK,
while INF_{INF∼PST.PTCP} partitions the set of infinitive forms into 4 sets,
depending on whether they end in -e, -uKiK, -VKiK with V ≠ u, or -XiK with
X ≠ K.

H(A∼B | A_{A∼B}), the conditional entropy of the pattern relating A and B given
relevant features of the form filling A, evaluates how well A predicts B.

Crucial to this computation is the choice of a strategy for the exhaustive
classification of patterns of alternation between pairs of forms. Since the
design of an algorithm finding an optimal classification of this kind from raw
data is an open research question,³ we opportunistically use the algorithm
sketched in (2), which we know to give satisfactory results for the languages
at hand.

   ³ The problem can be presented as that of finding, for any set of pairs of
   forms, a minimal set of subsequential finite-state transducers such that one
   of the transducers maps each input form to the correct output. Even if that
   problem were solved, it is entirely possible for there to be more than one
   such minimal set, leading to competing classifications of the pairs and thus
   to different assessments of predictiveness.

(2)  a. For any pair of strings ⟨φ_1, φ_2⟩, find strings α, γ, β_1, β_2, δ_1
        and δ_2 such that φ_1 = αβ_1γδ_1 and φ_2 = αβ_2γδ_2, where β_1 and β_2
        have the same length; segments in β_1 and β_2 (resp. δ_1 and δ_2) match
        in category (vowel vs. consonant), starting from the left; and the
        length of α is maximal. Classify the pair as instantiating pattern
        [Xβ_1Yδ_1 ∼ Xβ_2Yδ_2 / α γ].
     b. For all patterns instantiating the same alternation
        [x∼y / α_1 γ_1], …, [x∼y / α_n γ_n], determine maximally specific
        feature descriptions of the sets of strings {α_1, …, α_n}
        and {γ_1, …, γ_n}, using the Minimal Generalization strategy of
        Albright (2002).

Joint predictiveness can then be assessed by looking at joint random variables:
predicting C from A and B is evaluated by (3). We assess the uncertainty
associated with predicting both the pattern relating A to C and the pattern
relating B to C, given knowledge of relevant properties of A, relevant
properties of B, and the pattern relating A and B. Notice that this easily
generalizes to prediction given joint knowledge of n different cells.

(3)  H(A∼C, B∼C | A_{A∼C}, B_{B∼C}, A∼B)

Table 3 shows the average conditional entropy when predicting from 1 or 2
cells, for 5000 French verbs and 2000 European Portuguese verbs respectively.⁴
In both languages, knowing a second cell substantially reduces uncertainty on
average: by a factor of about 3 in French and about 2 in Portuguese.

   ⁴ The French dataset was extracted from Flexique (Bonami et al., 2014). The
   Portuguese dataset was derived from the University of Coimbra pronunciation
   dictionary (Veiga et al., 2012) for the purpose of Bonami and Luís (2013).

   # of predictor cells    French    Portuguese
   1                       0.1670    0.1649
   2                       0.0540    0.0818

Table 3: Average conditional entropy (in bits) when predicting from 1 or 2 cells


4 Principal part systems

A system of principal parts is a set of paradigm cells such that knowledge of
the forms filling these cells is sufficient to derive the rest of the paradigm
(Hockett, 1967; Matthews, 1972; Finkel and Stump, 2007; Stump and Finkel,
2013).⁵ The validity of a principal part system thus rests on the existence of
systematic categorical joint predictiveness, and the evaluation method outlined
in the preceding section may be used to infer sets of principal parts.

   ⁵ We focus here on traditional ‘static’ principal part systems. See Bonami
   and Boyé (2007), Finkel and Stump (2007) and Stump and Finkel (2013) for
   alternative formulations of the notion of principal part, in which different
   sets of paradigm cells serve as predictors depending on the lexeme.

Exploring this issue on the European Portuguese dataset, we find 177 such
systems. All these systems include a form with a 3-way contrast of theme
vowels, such as the infinitive, and a form with stress on the prethematic vowel,
such as the present 3SG. This corresponds to the observation in Bonami and Luís
(2014) that such pairs of cells have complementary predictive power. The sheer
number of alternative principal part systems highlights the arbitrariness of
choosing a particular set of principal parts (Matthews, 1972; Ackerman et al.,
2009; Blevins, in press).

Turning to French, we found no set of principal parts of cardinality 2, as
already observed by Stump and Finkel (2013). This is testament to the
prevalence of erratic stem allomorphy in French conjugation, which leads to
numerous situations of unpredictability local to a small subpart of the
paradigm (Bonami and Boyé, 2002). However, this observation should be qualified
in two ways.

First, our method yields 396 sets of principal parts of cardinality 3, whereas
Stump and Finkel (2013) found no set of cardinality smaller than 5. This
difference seems to be due to the fact that, under the methodology used here,
the applicability of a pattern of alternation is sensitive to phonotactic
properties of the stem (thanks to the use of the Minimal Generalization strategy
in (2b)), whereas Stump and Finkel (2013) only look at exponence. Arguably,
then, the present method provides a superior evaluation of the diagnostic value
of paradigm cells.

Second, although there is no pair of cells with categorical diagnostic value,
some come very close. There are 25 pairs of cells (among them pairs of very
frequent cells such as the present 3PL and the infinitive) such that predicting
any other cell from the pair yields an entropy below 0.005. This means that,
given knowledge of these two cells, guessing any other cell is no harder than
predicting an event with a 99.95% probability of occurrence.⁶ This casts doubt
both on the pedagogical value of categorical principal part systems and on the
usefulness of principal part systems, as opposed to graded evaluations of joint
predictiveness, for the study of morphological competence.

   ⁶ If X is a binary random variable one of whose values has a probability of
   0.9995, H(X) > 0.0062 bits, which is above the 0.005 threshold.


Acknowledgments

This work was partially supported by a public grant overseen by the French
National Research
Agency (ANR) as part of the “Investissements d’Avenir” program (reference:
ANR-10-LABX-0083).


References

Farrell Ackerman and Robert Malouf. 2013. Morphological organization: the low
   conditional entropy conjecture. Language, 89:429–464.

Farrell Ackerman, James P. Blevins, and Robert Malouf. 2009. Parts and wholes:
   implicative patterns in inflectional paradigms. In James P. Blevins and
   Juliette Blevins, editors, Analogy in Grammar, pages 54–82. Oxford
   University Press, Oxford.

Adam C. Albright. 2002. The Identification of Bases in Morphological
   Paradigms. Ph.D. thesis, University of California, Los Angeles.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009.
   The wacky wide web: A collection of very large linguistically processed
   web-crawled corpora. Language Resources and Evaluation, 43:209–226.

James P. Blevins. in press. Word and Paradigm Morphology. Oxford University
   Press, Oxford.

Olivier Bonami and Gilles Boyé. 2002. Suppletion and stem dependency in
   inflectional morphology. In Franck Van Eynde, Lars Hellan, and Dorothee
   Beerman, editors, The Proceedings of the HPSG ’01 Conference, pages 51–70.
   CSLI Publications, Stanford.

Olivier Bonami and Gilles Boyé. 2007. Remarques sur les bases de la
   conjugaison. In Elisabeth Delais-Roussarie and Laurence Labrune, editors,
   Des sons et des sens, pages 77–90. Hermès, Paris.

Olivier Bonami and Gilles Boyé. 2014. De formes en thèmes. In Florence
   Villoing, Sarah Leroy, and Sophie David, editors, Foisonnements
   morphologiques. Etudes en hommage à Françoise Kerleroux, pages 17–45.
   Presses Universitaires de Paris Ouest.

Olivier Bonami and Ana R. Luís. 2013. Causes and consequences of complexity in
   Portuguese verbal paradigms. In 9th Mediterranean Morphology Meeting,
   Dubrovnik, September.

Olivier Bonami and Ana R. Luís. 2014. Sur la morphologie implicative dans la
   conjugaison du portugais : une étude quantitative. In Jean-Léonard Léonard,
   editor, Morphologie flexionnelle et dialectologie romane. Typologie(s) et
   modélisation(s), number 22 in Mémoires de la Société de Linguistique de
   Paris, pages 111–151. Peeters, Leuven.

Olivier Bonami, Gauthier Caron, and Clément Plancq. 2014. Construction d’un
   lexique flexionnel phonétisé libre du français. In Franck Neveu, Peter
   Blumenthal, Linda Hriba, Annette Gerstenberg, Judith Meinschaefer, and
   Sophie Prévost, editors, Actes du quatrième Congrès Mondial de Linguistique
   Française, pages 2583–2596.

Raphael Finkel and Gregory T. Stump. 2007. Principal parts and morphological
   typology. Morphology, 17:39–75.

Charles F. Hockett. 1967. The Yawelmani basic verb. Language, 43:208–222.

Marianne Kilani-Schoch and Wolfgang Dressler. 2005. Morphologie naturelle et
   flexion du verbe français. Gunter Narr Verlag, Tübingen.

Maria Helena Mateus and Ernesto d’Andrade. 2000. The Phonology of Portuguese.
   Oxford University Press, Oxford.

P. H. Matthews. 1972. Inflectional Morphology. A Theoretical Study Based on
   Aspects of Latin Verb Conjugation. Cambridge University Press, Cambridge.

Boris New, Marc Brysbaert, Jean Veronis, and Christophe Pallier. 2007. The use
   of film subtitles to estimate word frequencies. Applied Psycholinguistics,
   28:661–677.

Benoît Sagot. 2010. The Lefff, a freely available and large-coverage
   morphological and syntactic lexicon for French. In Proceedings of LREC 2010.

Andrea Sims. 2015. Inflectional Defectiveness. Cambridge University Press,
   Cambridge.

Gregory T. Stump and Raphael Finkel. 2013. Morphological Typology: From Word to
   Paradigm. Cambridge University Press, Cambridge.

Ann Thymé, Farrell Ackerman, and Jeff Elman. 1994. Finnish nominal inflection:
   Paradigmatic patterns and token analogy. In Susan D. Lima, Roberta
   Corrigan, and Gregory K. Iverson, editors, The Reality of Linguistic Rules.
   John Benjamins.

Arlindo Oliveira da Veiga, Sara Candeias, and Fernando Perdigão. 2012.
   Generating a pronunciation dictionary for European Portuguese using a
   joint-sequence model with embedded stress assignment. Journal of the
   Brazilian Computer Society, 88.

Wolfgang Ulrich Wurzel. 1984. Flexionsmorphologie und Natürlichkeit. Ein
   Beitrag zur morphologischen Theoriebildung. Akademie-Verlag, Berlin.
   Translated as (?).
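
The conditional entropies of Section 3 can be illustrated on the seven verb
classes of Table 1. The Python sketch below is illustrative only and rests on
stated assumptions: it does not implement the pattern classification in (2),
but swaps in a much cruder, hypothetical stand-in (pattern) that strips the
longest common prefix of two forms; it approximates the form classes A_{A∼B}
by the set of observed patterns whose left-hand side a form ends with
(applicable); and it weights each verb class by its count in Table 1. All
function names are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict
from math import log2

# Illustrative sketch; all names are hypothetical, not the authors' code.
# Table 1 transcriptions: INF, PRS.3PL, PST.PTCP, number of verbs.
TABLE1 = {
    "LIVRER":   ("livKe",  "livK",  "livKe",  4108),
    "RELIER":   ("K@lje",  "K@li",  "K@lje",  210),
    "RATISSER": ("Katise", "Katis", "Katise", 22),
    "BÂTIR":    ("batiK",  "batis", "bati",   327),
    "TENIR":    ("t@niK",  "tjEn",  "t@ny",   37),
    "OUVRIR":   ("uvKiK",  "uvK",   "uvEK",   8),
    "MOURIR":   ("muKiK",  "mœK",   "mOK",    1),
}
INF, PRS3PL, PSTPTCP = 0, 1, 2  # column indices into the tuples above


def pattern(a, b):
    """Hypothetical stand-in for (2): strip the longest common prefix of a
    and b and return the two remainders as the alternation (a_rest, b_rest)."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return (a[i:], b[i:])


def applicable(form, patterns):
    """Approximation of A_{A~B}: the observed patterns whose left-hand side
    the form ends with, i.e. the patterns it could instantiate."""
    return frozenset(p for p in patterns if form.endswith(p[0]))


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return sum(c / total * log2(total / c) for c in counts if c > 0)


def cond_entropy(triples):
    """H(Y | X) from (x, y, weight) triples."""
    joint = defaultdict(lambda: defaultdict(int))
    for x, y, w in triples:
        joint[x][y] += w
    total = sum(w for _, _, w in triples)
    return sum(sum(ys.values()) / total * entropy(list(ys.values()))
               for ys in joint.values())


def predict_from_one(a, b):
    """H(A~B | A_{A~B}) over the verb classes of Table 1."""
    rows = list(TABLE1.values())
    pats = {pattern(r[a], r[b]) for r in rows}
    return cond_entropy([(applicable(r[a], pats), pattern(r[a], r[b]), r[3])
                         for r in rows])


def predict_from_two(a, b, c):
    """H(A~C, B~C | A_{A~C}, B_{B~C}, A~B), following (3)."""
    rows = list(TABLE1.values())
    pats_ac = {pattern(r[a], r[c]) for r in rows}
    pats_bc = {pattern(r[b], r[c]) for r in rows}
    triples = []
    for r in rows:
        given = (applicable(r[a], pats_ac), applicable(r[b], pats_bc),
                 pattern(r[a], r[b]))
        target = (pattern(r[a], r[c]), pattern(r[b], r[c]))
        triples.append((given, target, r[3]))
    return cond_entropy(triples)


h_inf = predict_from_one(INF, PSTPTCP)
h_prs = predict_from_one(PRS3PL, PSTPTCP)
h_joint = predict_from_two(INF, PRS3PL, PSTPTCP)
print(f"from INF: {h_inf:.4f}  from PRS.3PL: {h_prs:.4f}  joint: {h_joint:.4f}")
print(f"footnote 6: {entropy([9995, 5]):.4f}")  # binary variable, p = 0.9995
```

On this toy sample the stand-in happens to yield 5 INF∼PST.PTCP patterns and 4
classes of infinitives, the same counts as reported in Section 3, although the
pattern labels differ (BÂTIR, for instance, comes out as XK∼X rather than
XiK∼Xi). Predicting the past participle from either the infinitive or the
PRS.3PL alone leaves some uncertainty (about 0.04 bits each), whereas the joint
computation following (3) yields 0, in line with the observation in Section 1
that these two cells jointly remove all uncertainty in Table 1. The last line
reproduces the value given in footnote 6.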

</pre>