=Paper=
{{Paper
|id=Vol-1347/paper01
|storemode=property
|title=Implicative structure and joint predictiveness
|pdfUrl=https://ceur-ws.org/Vol-1347/paper01.pdf
|volume=Vol-1347
|dblpUrl=https://dblp.org/rec/conf/networds/BonamiB15
}}
==Implicative structure and joint predictiveness==
Olivier Bonami, Université Paris-Sorbonne, Laboratoire de linguistique formelle (U. Paris Diderot & CNRS), olivier.bonami@paris-sorbonne.fr

Sacha Beniamine, Université Paris Diderot, Laboratoire de linguistique formelle & Alpage (Inria & U. Paris Diderot)

Copyright © by the paper’s authors. Copying permitted for private and academic purposes. In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

===1 Introduction===
Ackerman et al. (2009) define the ''Paradigm Cell Filling Problem'' (PCFP), which we paraphrase in (1), as the cornerstone of the study of inflectional paradigms.

: (1) How do speakers know how to inflect the full paradigm of a lexeme on the basis of exposure to only some of its forms?

Ackerman et al. (2009) go on to argue that speakers rely on knowledge of the ''implicative structure'' of paradigms (Wurzel, 1984): paradigms are structured in such a way that there are reliable correlations between the form filling one paradigm cell A and the form filling another cell B. The reliability of these correlations depends on the particular pair of cells A and B under scrutiny; it can be assessed quantitatively by examining the statistical distribution of operations required to go from A to B in the lexicon.

This presentation focuses on one particular aspect of implicative structure, which we call ''joint predictiveness''. In some situations, joint knowledge of two paradigm cells A and B provides more information on cell C than could be inferred from knowledge of either A or B. Table 1 below provides a simple example from French, using lexemes illustrating 7 patterns corresponding to 95% of the verbs documented in the Flexique phoneticized lexicon (Bonami et al., 2014).

{| class="wikitable"
|+ Table 1: Exemplary paradigms for inflection patterns for 4-cell subparadigms of French verbs (data from Flexique; the 5% of lemmas illustrating minor patterns have been excluded)
! Lexeme !! INF !! PRS.3SG !! PRS.3PL !! PST.PTCP !! #
|-
| LIVRER ‘deliver’ || livʁe || livʁ || livʁ || livʁe || 4108
|-
| RELIER ‘link’ || ʁəlje || ʁəli || ʁəli || ʁəlje || 210
|-
| RATISSER ‘rake’ || ʁatise || ʁatis || ʁatis || ʁatise || 22
|-
| BÂTIR ‘build’ || batiʁ || bati || batis || bati || 327
|-
| TENIR ‘hold’ || təniʁ || tjɛ̃ || tjɛn || təny || 37
|-
| OUVRIR ‘open’ || uvʁiʁ || uvʁ || uvʁ || uvɛʁ || 8
|-
| MOURIR ‘die’ || muʁiʁ || mœʁ || mœʁ || mɔʁ || 1
|}

In French conjugation, predicting the past participle from the infinitive is hard, because second conjugation infinitives, such as BÂTIR, are not distinguished from some third conjugation infinitives, such as TENIR, OUVRIR, MOURIR. Predicting the past participle from present SG forms is also hard, this time because some first conjugation verbs with a stem in -i (e.g. RELIER) are not distinguished from second conjugation verbs. A different subset of first conjugation verbs (e.g. RATISSER) raises similar problems for PL forms. Overall, no other cell in the paradigm is a very good predictor of the past participle. However, joint knowledge of some pairs of paradigm cells radically improves the quality of prediction. For instance, joint knowledge of the infinitive and of some present plural form removes all uncertainty in the sample in Table 1: knowledge of the infinitive form partitions the set of lexemes in two classes within which the PRS.3PL is fully predictive of the past participle.

Although the existence of joint predictiveness is acknowledged in the literature (Matthews, 1972; Thymé et al., 1994; Ackerman et al., 2009; Stump and Finkel, 2013; Blevins, in press; Sims, 2015), little attention has been given to quantifying its importance. In this paper we first give further arguments that joint predictiveness is a crucial aspect of implicative structure, and that a careful empirical examination of joint predictiveness is essential to both linguistic and psycholinguistic assessment of the PCFP and related issues. We then propose and illustrate a method for the quantitative evaluation of joint predictiveness. We end with a discussion of principal part systems.

===2 The relevance of joint predictiveness===
We start by establishing that speakers do have the opportunity to use joint predictiveness. Figure 1 plots how the number of forms per lemma evolves when walking through the 1.6 billion words of the FrWaC web corpus (Baroni et al., 2009), restricting attention to the 6847 verbs documented in the Lefff lexicon (Sagot, 2010) to compensate for tagging errors.<sup>1</sup> Note that 1.6 billion words is in the order of magnitude of the overall linguistic exposure of an adult speaker. The distribution strongly suggests that, as speakers get exposed to more words, paradigms fill slowly on average, so that predicting unknown forms stays relevant; at the same time, speakers are massively exposed to multiple forms of the same lexemes, which makes knowledge of joint predictiveness relevant to addressing the PCFP.

Figure 1: Mean number of forms per lemma and proportion of lemmas with multiple forms as a function of vocabulary size (FrWaC corpus). [Plot not reproduced. Horizontal axis: size of the corpus, from 0 to 1.6 × 10<sup>9</sup> words; left axis: mean number of forms per lemma; right axis: % of lemmas with more than 1 form.]

A second relevant observation is that speakers do manifest knowledge of joint predictiveness. Although this topic deserves dedicated experimental studies that are beyond the scope of this paper, circumstantial evidence from speech errors is easy to find. One common conjugation error in French (Kilani-Schoch and Dressler, 2005) is to use ''mouru'' as the past participle of MOURIR, whereas ''mouri'' is almost never used (140 relevant occurrences of ''mouru'' in the full FrWaC corpus, 0 of ''mouri''). This would be surprising if speakers were analogizing from a single paradigm cell: given knowledge of the sole infinitive, ''mouri'' would be the most likely regularization; given knowledge of some present form, ''mouré'' or ''meuré'' would be expected.<sup>2</sup> Thus the property speakers seem to be sensitive to is the existence of an allomorphic relation between the infinitive and the present stem; that is, they employ joint predictiveness from two cells to infer the likely form of the participle.

The final observation is that there are important linguistic generalizations that can only be obtained by looking at joint predictiveness. To supplement the French data presented in the introduction, let us consider a spectacular example from European Portuguese, concerning the prediction of the form of the infinitive from those of the present singular. Table 2 presents relevant data.

{| class="wikitable"
|+ Table 2: Selected European Portuguese verbs in the infinitive and present indicative
! Lexeme !! INF !! 1SG !! 2SG !! 3SG !! 1PL !! 2PL !! 3PL
|-
| LEVAR || ləˈvaɾ || ˈlɛvu || ˈlɛvɐʃ || ˈlɛvɐ || ləˈvɐmuʃ || ləˈvaiʃ || ˈlɛvɐ̃ũ
|-
| NOTAR || nuˈtaɾ || ˈnɔtu || ˈnɔtɐʃ || ˈnɔtɐ || nuˈtɐmuʃ || nuˈtaiʃ || ˈnɔtɐ̃ũ
|-
| RECEBER || rəsəˈbeɾ || rəˈsebu || rəˈsɛbəʃ || rəˈsɛbə || rəsəˈbemuʃ || rəsəˈbɐiʃ || rəˈsɛbɐ̃ĩ
|-
| RECORRER || rəkuˈreɾ || rəˈkoru || rəˈkɔrəʃ || rəˈkɔrə || rəkuˈremuʃ || rəkuˈrɐiʃ || rəˈkɔrɐ̃ĩ
|-
| SEGUIR || səˈgiɾ || ˈsigu || ˈsɛgəʃ || ˈsɛgə || səˈgimuʃ || səˈgiʃ || ˈsɛgɐ̃ĩ
|-
| SUBIR || suˈbiɾ || ˈsubu || ˈsɔbəʃ || ˈsɔbə || suˈbimuʃ || suˈbiʃ || ˈsɔbɐ̃ĩ
|}

Because it does not contain a theme vowel, the present 1SG is a bad predictor of the infinitive: a priori, any present 1SG could correspond to a first, second or third conjugation verb. 2SG and 3SG forms are slightly better predictors, as they distinguish first conjugation endings (-ɐʃ, -ɐ) from second/third conjugation endings (-əʃ, -ə); the distinction between the last two conjugations is still neutralized. However, if a verb has a mid prethematic vowel in the 2SG and 3SG, that vowel is raised to high-mid in the 1SG in the second conjugation (witness RECEBER, RECORRER), and to high in the third conjugation (witness SEGUIR, SUBIR). Whether one sees this phenomenon as the result of a synchronic vowel harmony in the 1SG operating prior to theme vowel deletion (Mateus and d’Andrade, 2000) or as a historical remnant with no synchronic motivation, it remains that on the surface, for verbs with a mid prethematic vowel in the 2SG and 3SG, knowledge of the 1SG disambiguates whether the verb belongs to the second or third conjugation and thus helps predict the infinitive.

===3 Quantifying joint predictiveness===
To assess the importance of joint predictiveness, we build on previous proposals by Bonami and Boyé (2014) and Bonami and Luís (2014) on the evaluation of predictiveness from a single paradigm cell, themselves improving on Ackerman et al. (2009) and Ackerman and Malouf (2013). Specifically, for every pair of paradigm cells A and B, we infer a classification of patterns of alternation relating these two cells. These patterns are then used to define a random variable <math>A{\sim}B</math> over pairs of forms corresponding to the distribution of patterns, and a random variable <math>A_{A\sim B}</math> classifying possible forms for A on the basis of the patterns they could possibly instantiate. For instance, going back to the data in Table 1, <math>\mathrm{INF}{\sim}\mathrm{PST.PTCP}</math> partitions the set of pairs in 5 subsets corresponding to the patterns Xe∼Xe, Xiʁ∼Xi, Xiʁ∼Xy, Xʁiʁ∼Xɛʁ and Xuʁiʁ∼Xɔʁ, while <math>\mathrm{INF}_{\mathrm{INF}\sim\mathrm{PST.PTCP}}</math> partitions the set of infinitive forms in 4 sets, depending on whether they end in -e, -uʁiʁ, -Vʁiʁ with V ≠ u, or -Xiʁ with X ≠ ʁ.

<math>H(A{\sim}B \mid A_{A\sim B})</math>, the conditional entropy of the pattern relating A and B given relevant features of the form filling A, evaluates how well A predicts B.

Crucial to this computation is the choice of a strategy of exhaustive classification of patterns of alternation between pairs of forms. Since the design of an algorithm finding an optimal such classification from raw data is an open research question,<sup>3</sup> we opportunistically use the algorithm sketched in (2), which we know to give satisfactory results for the languages at hand.

: (2) a. For any pair of strings <math>\langle\varphi_1, \varphi_2\rangle</math>, find strings <math>\alpha, \gamma, \beta_1, \beta_2, \delta_1</math> and <math>\delta_2</math> such that <math>\varphi_1 = \alpha\beta_1\gamma\delta_1</math> and <math>\varphi_2 = \alpha\beta_2\gamma\delta_2</math>, where <math>\beta_1</math> and <math>\beta_2</math> have the same length; segments in <math>\beta_1</math> and <math>\beta_2</math> (resp. <math>\delta_1</math> and <math>\delta_2</math>) match in category (vowel vs. consonant), starting from the left; and the length of <math>\alpha</math> is maximal. Classify the pair as instantiating pattern <math>[X\beta_1 Y\delta_1 \sim X\beta_2 Y\delta_2 \,/\, \alpha\ \gamma]</math>.
: (2) b. For all patterns instantiating the same alternation <math>[x{\sim}y \,/\, \alpha_1\ \gamma_1], \ldots, [x{\sim}y \,/\, \alpha_n\ \gamma_n]</math>, determine maximally specific feature descriptions of the sets of strings <math>\{\alpha_1, \ldots, \alpha_n\}</math> and <math>\{\gamma_1, \ldots, \gamma_n\}</math>, using the Minimal Generalization strategy of Albright (2002).

Joint predictiveness can then be assessed by looking at joint random variables. Predicting C from A and B is evaluated by (3): we assess the uncertainty associated with predicting both the pattern relating A to C and the pattern relating B to C, given knowledge of relevant properties of A, relevant properties of B, and the pattern relating A and B. Notice that this easily generalizes to prediction given joint knowledge of n different cells.

: (3) <math>H(A{\sim}C,\ B{\sim}C \mid A_{A\sim C},\ B_{B\sim C},\ A{\sim}B)</math>

Table 3 shows the average entropy from 1 or 2 cells for 5000 French verbs and 2000 European Portuguese verbs respectively.<sup>4</sup> In both languages, knowing a second cell significantly reduces uncertainty on average.

{| class="wikitable"
|+ Table 3: Average conditional entropy when predicting from 1 or 2 cells
! # of predictor cells !! French !! Portuguese
|-
| 1 || 0.1670 || 0.1649
|-
| 2 || 0.0540 || 0.0818
|}

===4 Principal part systems===
A system of principal parts is a set of paradigm cells such that knowledge of the forms filling these cells is sufficient to derive the rest of the paradigm (Hockett, 1967; Matthews, 1972; Finkel and Stump, 2007; Stump and Finkel, 2013).<sup>5</sup> The validity of a principal part system thus rests on the existence of systematic categorical joint predictiveness, and the evaluation method outlined in the preceding section may be used to infer sets of principal parts.

Exploring this issue on the European Portuguese dataset, we find that there are 177 such systems. All these systems include a form with a 3-way contrast of theme vowels, such as the infinitive, and a form with stress on the prethematic vowel, such as the present 3SG. This corresponds to the observation in Bonami and Luís (2014) that such pairs of cells have complementary predictive power. The sheer number of alternative principal part systems highlights the arbitrariness of the choice of a particular set of principal parts (Matthews, 1972; Ackerman et al., 2009; Blevins, in press).

Turning to French, we found no set of principal parts of cardinality 2, as already observed by Stump and Finkel (2013). This is testament to the prevalence of erratic stem allomorphy in French conjugation, leading to numerous situations of unpredictability local to a small subpart of the paradigm (Bonami and Boyé, 2002). However, this observation should be qualified in two ways.

First, our method yields 396 sets of principal parts of cardinality 3, whereas Stump and Finkel (2013) found no set of cardinality smaller than 5. This difference seems to be due to the fact that, under the methodology used here, the applicability of a pattern of alternation is sensitive to phonotactic properties of the stem (thanks to the use of the Minimal Generalization strategy in (2b)), whereas Stump and Finkel (2013) only look at exponence. Arguably then, the present method provides a superior evaluation of the diagnostic value of paradigm cells.

Second, although there is no pair of cells with categorical diagnostic value, some come very close. There are 25 pairs of cells (among which pairs of very frequent cells such as the present 3PL and the infinitive) such that predicting any other cell from this pair yields an entropy below 0.005. This means that given knowledge of these two cells, trying to guess any other cell will be about as hard as predicting an event with a 99.95% probability of occurrence.<sup>6</sup> This casts doubts both on the pedagogical value of categorical principal part systems and on the usefulness of principal part systems, as opposed to graded evaluations of joint predictiveness, for the study of morphological competence.

===Notes===
# Note that this restriction leads to overestimating the average number of forms per lemma, as neologisms, very rare words and hapaxes not present in the lexical resource are not included. We are counting distinct forms rather than distinct paradigm cells, as there is currently no tagger for French that reliably disambiguates homographic forms of the same lexeme. French verbs have 51 paradigm cells, and the average number of distinct forms per verb in the Lefff lexicon is 35.8.
# A reviewer points out that if speech errors are due to analogy to the nearest (frequent) neighbor, ''mouru'' is unsurprising, as ''courir'' (past participle ''couru'') is the most frequent of the verbs whose infinitive is at a minimal edit distance from ''mourir''. This assumption however is not plausible. Witness the case of the verb ''dire'', whose present 2PL ''dites'' is very commonly overregularized to ''disez''. The most frequent phonological neighbor of ''dire'' is ''lire''; however, according to the Lexique database (New et al., 2007), ''dire'' is 8 times more frequent than ''lire'' in written French, and 17 times in spoken French. It is thus not plausible that analogical regularization is driven by the closest neighbor; rather, it is driven by general patterns applying across lexemes. For instance, ''dire'' is one of a handful of exceptions to the regular Xons ∼ Xez alternation between 1PL and 2PL, which is overwhelmingly prevalent both in type and token frequency.
# The problem can be presented as that of finding, for any set of pairs of forms, a minimal set of subsequential finite-state transducers such that one of the transducers maps each input form to the correct output. Even if that problem were solved, it is entirely possible for there to be more than one such minimal set, leading to competing classifications of the pairs and thus to different assessments of predictiveness.
# The French dataset was extracted from Flexique (Bonami et al., 2014). The Portuguese dataset was derived from the University of Coimbra pronunciation dictionary (Veiga et al., 2012) for the purpose of (Bonami and Luís, 2013).
# We focus here on traditional ‘static’ principal part systems. See (Bonami and Boyé, 2007; Finkel and Stump, 2007; Stump and Finkel, 2013) for alternative formulations of the notion of principal part where different sets of paradigm cells serve as predictor depending on the lexeme.
# If X is a binary random variable one of whose values has a probability of 0.9995, H(X) > 0.0062.

===Acknowledgments===
This work was partially supported by a public grant overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” program (reference: ANR-10-LABX-0083).

===References===
* Farrell Ackerman and Robert Malouf. 2013. Morphological organization: the low conditional entropy conjecture. Language, 89:429–464.
* Farrell Ackerman, James P. Blevins, and Robert Malouf. 2009. Parts and wholes: implicative patterns in inflectional paradigms. In James P. Blevins and Juliette Blevins, editors, Analogy in Grammar, pages 54–82. Oxford University Press, Oxford.
* Adam C. Albright. 2002. The Identification of Bases in Morphological Paradigms. Ph.D. thesis, University of California, Los Angeles.
* Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The wacky wide web: A collection of very large linguistically processed web-crawled corpora. In Language Resources and Evaluation, volume 43, pages 209–226.
* James P. Blevins. in press. Word and Paradigm Morphology. Oxford University Press, Oxford.
* Olivier Bonami and Gilles Boyé. 2002. Suppletion and stem dependency in inflectional morphology. In Franck Van Eynde, Lars Hellan, and Dorothee Beerman, editors, The Proceedings of the HPSG ’01 Conference, pages 51–70. CSLI Publications, Stanford.
* Olivier Bonami and Gilles Boyé. 2007. Remarques sur les bases de la conjugaison. In Elisabeth Delais-Roussarie and Laurence Labrune, editors, Des sons et des sens, pages 77–90. Hermès, Paris.
* Olivier Bonami and Gilles Boyé. 2014. De formes en thèmes. In Florence Villoing, Sarah Leroy, and Sophie David, editors, Foisonnements morphologiques. Etudes en hommage à Françoise Kerleroux, pages 17–45. Presses Universitaires de Paris Ouest.
* Olivier Bonami and Ana R. Luís. 2013. Causes and consequences of complexity in portuguese verbal paradigms. In 9th Mediterranean Morphology Meeting, Dubrovnik, septembre.
* Olivier Bonami and Ana R. Luís. 2014. Sur la morphologie implicative dans la conjugaison du portugais : une étude quantitative. In Jean-Léonard Léonard, editor, Morphologie flexionnelle et dialectologie romane. Typologie(s) et modélisation(s)., number 22 in Mémoires de la Société de Linguistique de Paris, pages 111–151. Peeters, Leuven.
* Olivier Bonami, Gauthier Caron, and Clément Plancq. 2014. Construction d’un lexique flexionnel phonétisé libre du français. In Franck Neveu, Peter Blumenthal, Linda Hriba, Annette Gerstenberg, Judith Meinschaefer, and Sophie Prévost, editors, Actes du quatrième Congrès Mondial de Linguistique Française, pages 2583–2596.
* Raphael Finkel and Gregory T. Stump. 2007. Principal parts and morphological typology. Morphology, 17:39–75.
* Charles F. Hockett. 1967. The Yawelmani basic verb. Language, 43:208–222.
* Marianne Kilani-Schoch and Wolfgang Dressler. 2005. Morphologie naturelle et flexion du verbe français. Gunter Narr Verlag, Tübingen.
* Maria Helena Mateus and Ernesto d’Andrade. 2000. The Phonology of Portuguese. Oxford University Press, Oxford.
* P. H. Matthews. 1972. Inflectional Morphology. A Theoretical Study Based on Aspects of Latin Verb Conjugation. Cambridge University Press, Cambridge.
* Boris New, Marc Brysbaert, Jean Veronis, and Christophe Pallier. 2007. The use of film subtitles to estimate word frequencies. Applied Psycholinguistics, 28:661–677.
* Benoît Sagot. 2010. The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. In Proceedings of LREC 2010.
* Andrea Sims. 2015. Inflectional defectiveness. Cambridge University Press, Cambridge.
* Gregory T. Stump and Raphael Finkel. 2013. Morphological Typology: From Word to Paradigm. Cambridge University Press, Cambridge.
* Ann Thymé, Farrell Ackerman, and Jeff Elman. 1994. Finnish nominal inflection: Paradigmatic patterns and token analogy. In Susan D. Lima, Roberta Corrigan, and Gregory K. Iverson, editors, The Reality of Linguistic Rules. John Benjamins.
* Arlindo Oliveira da Veiga, Sara Candeias, and Fernando Perdigão. 2012. Generating a pronunciation dictionary for european portuguese using a joint-sequence model with embedded stress assignment. Journal of the Brazilian Computer Society, 88.
* Wolfgang Ulrich Wurzel. 1984. Flexionsmorphologie und Natürlichkeit. Ein Beitrag zur morphologischen Theoriebildung. Akademie-Verlag, Berlin. Translated as (?).
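
As a rough illustration of the measures defined in Section 3, the following Python sketch computes the single-cell conditional entropy and the joint measure in (3) on the seven lexemes of Table 1, weighted by their type counts. It is not the authors' implementation. Patterns are approximated by the suffix alternation left after stripping the longest common prefix, a simplifying assumption that stands in for algorithm (2) and for Minimal Generalization. Forms are given in IPA, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# Table 1 sample: (INF, PRS.3SG, PRS.3PL, PST.PTCP, type count), in IPA.
INF, SG, PL, PTCP, FREQ = range(5)
TABLE1 = [
    ("livʁe", "livʁ", "livʁ", "livʁe", 4108),   # LIVRER
    ("ʁəlje", "ʁəli", "ʁəli", "ʁəlje", 210),    # RELIER
    ("ʁatise", "ʁatis", "ʁatis", "ʁatise", 22), # RATISSER
    ("batiʁ", "bati", "batis", "bati", 327),    # BÂTIR
    ("təniʁ", "tjɛ̃", "tjɛn", "təny", 37),       # TENIR
    ("uvʁiʁ", "uvʁ", "uvʁ", "uvɛʁ", 8),         # OUVRIR
    ("muʁiʁ", "mœʁ", "mœʁ", "mɔʁ", 1),          # MOURIR
]

def pattern(a, b):
    """Suffix alternation left after removing the longest common prefix.
    Assumption: a crude stand-in for the pattern inference in (2)."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return (a[i:], b[i:])

def applicable(form, patterns):
    """Class of a form: the set of patterns whose input suffix it ends in."""
    return frozenset(p for p in patterns if form.endswith(p[0]))

def conditional_entropy(triples):
    """H(outcome | condition) in bits, from (condition, outcome, weight)."""
    by_cond = defaultdict(Counter)
    for cond, outcome, weight in triples:
        by_cond[cond][outcome] += weight
    total = sum(sum(c.values()) for c in by_cond.values())
    h = 0.0
    for counts in by_cond.values():
        n = sum(counts.values())
        h -= sum(w * log2(w / n) for w in counts.values()) / total
    return h

def single(a, b):
    """H(A~B | A_{A~B}): predicting cell b from cell a alone."""
    pats = {pattern(r[a], r[b]) for r in TABLE1}
    return conditional_entropy(
        (applicable(r[a], pats), pattern(r[a], r[b]), r[FREQ]) for r in TABLE1
    )

def joint(a, b, c):
    """H(A~C, B~C | A_{A~C}, B_{B~C}, A~B): predicting c from a and b."""
    pats_ac = {pattern(r[a], r[c]) for r in TABLE1}
    pats_bc = {pattern(r[b], r[c]) for r in TABLE1}
    return conditional_entropy(
        (
            (applicable(r[a], pats_ac), applicable(r[b], pats_bc),
             pattern(r[a], r[b])),
            (pattern(r[a], r[c]), pattern(r[b], r[c])),
            r[FREQ],
        )
        for r in TABLE1
    )

h_inf = single(INF, PTCP)      # infinitive alone
h_pl = single(PL, PTCP)        # PRS.3PL alone
h_joint = joint(INF, PL, PTCP) # both cells together
print(h_inf, h_pl, h_joint)
```

On this toy sample each cell taken alone leaves a few hundredths of a bit of uncertainty about the past participle, while the joint measure drops to zero, in line with the observation made in the introduction. With only seven lexemes the result mostly reflects the sample, so it should not be read as a reproduction of the figures in Table 3.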
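
The bound given in footnote 6 can be checked directly. The short sketch below computes the entropy, in bits, of a binary random variable whose more likely value has probability 0.9995; `binary_entropy` is a hypothetical helper name introduced only for this check.

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a binary variable with probabilities p and 1 - p."""
    return -(p * log2(p) + (1 - p) * log2(1 - p))

h = binary_entropy(0.9995)
print(f"{h:.4f}")  # 0.0062
```

The value lies just above 0.0062, so a conditional entropy below 0.005 corresponds to a guess that is at least as easy as predicting such an event.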