<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Grammar Assistance Using Syntactic Structures (GAUSS)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olga Zamaraeva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorena S. Allegue</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Gómez-Rodríguez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Margarita Alonso-Ramos</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiia Ogneva</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidade da Coruña, CITIC, Department of Computer Science and Information Technologies.</institution>
          <addr-line>15071 A Coruña</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidade da Coruña, CITIC, Department of Humanities (“Letras”).</institution>
          <addr-line>15071 A Coruña</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidade de Santiago de Compostela, Department of Developmental Psychology</institution>
          ,
          <addr-line>15782 Santiago de Compostela</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automatic grammar coaching serves an important purpose: advising on standard grammar varieties while not imposing social pressures or reinforcing established social roles. Such systems already exist, but most of them are for English and few of them offer meaningful feedback. Furthermore, they typically rely completely on neural methods and require huge computational resources which most of the world cannot afford. We propose a grammar coaching system for Spanish that relies on (i) a rich linguistic formalism capable of giving informative feedback; and (ii) a faster parsing algorithm which makes using this formalism practical in a real-world application. The approach is feasible for any language for which there is a computerized grammar and is less reliant on expensive and environmentally costly neural methods. We seek to contribute to Greener AI and to address global education challenges by raising the standards of inclusivity and engagement in grammar coaching.</p>
      </abstract>
      <kwd-group>
        <kwd>grammar engineering</kwd>
        <kwd>grammar coaching</kwd>
        <kwd>second language acquisition</kwd>
        <kwd>HPSG</kwd>
        <kwd>syntactic theory</kwd>
        <kwd>syntax</kwd>
        <kwd>parsing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[Figure residue: HPSG attribute-value matrices of type adj-fem-pl with PERNUM and GEN agreement features.]</p>
    </sec>
    <sec id="sec-2">
      <title>2. State of the art at the start of the project</title>
      <p>Most grammar coaching systems available today are
purely statistical and do not use explicit linguistic
knowledge. Based on purely statistical methods and lacking
interpretability, they “guess” based on the context and
are not aware of concepts like agreement. Their feedback
is divorced from the methodology of suggesting a better
sentence, opening possibilities for wrong feedback. Such
systems are often only available for English, because their
neural architectures require huge quantities of training
data. Such systems are also ecologically problematic [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
      <p>The HPSG theory covers many syntactic phenomena
and has been developed and tested using a variety of data
from a variety of languages. One of the approaches to
the empirical testing of this theory is implementing it on
the computer and then automatically parsing data and
inspecting the results for correctness and consistency.
Efforts of this kind include ParGram [6], CoreGram [7]
and DELPH-IN [8, 9]. It is this approach that gave rise to
the SRG.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>The GAUSS project is the result of the collaboration between research areas such as computer science, NLP, theoretical linguistics, and applied linguistics. The intersectional nature of the project is realized by the combination of NLP techniques and theoretically formalized grammars. In particular, the project relies on the Spanish Resource Grammar [SRG; 2, 3, 4], a grammar of Spanish implemented in the Head-driven Phrase Structure Grammar formalism (HPSG).</p>
      <sec id="sec-3-1">
        <title>3.1. HPSG syntax theory</title>
        <p>Head-driven Phrase Structure Grammar [HPSG; 5] is a constraint unification theory of syntax. A sentence is analyzed as a structure whose parts can be constrained to be identical to each other. For example, a verb’s agreement values (e.g. third person) can be constrained to be identical to the agreement values of the verb’s subject. Similarly, adjectives can be constrained with respect to the agreement values of the noun they modify, as shown in Figure 2. Crucially, ungrammatical strings of words will violate the constraints required for well-formed structures and as such will not be covered by an HPSG grammar.</p>
        <p>Structures like the ones in Figure 2 are instances of more general types and can be seen in the specific results of deploying the grammar on some data. The grammar itself contains the types, not the instances. The types are instantiated through interfacing with the lexicon and, in some cases, an external morphophonological analyzer.</p>
      </sec>
      <sec id="sec-3-1-2">
        <title>3.2. DELPH-IN Consortium</title>
        <p>The DELPH-IN research consortium is an international effort for grammar engineering using HPSG: the Deep Linguistic Processing with HPSG Initiative. It is committed to using a particular version of the HPSG formalism that was defined originally in [8]. The consortium develops tools such as parsers, including the parser we used in this project, the ACE parser [10]. Another set of relevant tools includes the software for automatic profiling of test data known as [incr tsdb()] (pronounced ‘tsdb++’) [11, 12] and a related tool, the “full-forest treebanker” (ftb) [13]. These tools allow us to inspect differences between grammar versions systematically.</p>
        <p>Grammars are tested on sentences automatically, using a parser. The first time a grammar is run on a sentence, an expert must verify the correctness of the output. Often it makes sense to do this by looking at the semantic (dependency) structure; we can assume that if the semantics is correct, then the syntactic structure that corresponds to it is adequate. The semantics in DELPH-IN grammars is modeled with the Minimal Recursion Semantics formalism [MRS; 14]. An MRS structure is a bag of predications encoding dependencies as well as modifier and negation scope, information structure, and more. It can be automatically converted to a dependency structure familiar to natural language processing (NLP) practitioners (Figure 3). When the parser analyzes a sentence according to the grammar, the resulting structure includes an MRS, the adequacy of which is easy to establish manually (whether the meaning of the sentence is the intended one). The adequacy of the obtained analyses on corpora serves as accumulating evidence for the validity of the theory of syntax.</p>
      </sec>
      <sec id="sec-3-1-1">
        <title>3.3. Spanish Resource Grammar</title>
        <p>(1) *Mis abuelos son personas famosos.
my.3pl grandparent.masc.pl be.3pl.pres.ind person.fem.3pl famous.masc.pl
Intended: ‘My grandparents are famous people.’ [spa; Yamada et al. 18]</p>
      </sec>
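      <p>The agreement mechanism described in Section 3.1 can be illustrated with a minimal sketch. This is a toy illustration, not SRG or ACE code: the real grammar uses typed feature structures, while here a feature structure is a flat dictionary. The failed unification corresponds to the gender clash in example (1).</p>

```python
# Toy sketch of unification-based agreement checking (not DELPH-IN code).
# A feature structure is modeled as a flat dict of feature-value pairs.

def unify(fs1, fs2):
    """Unify two feature structures; return None if any feature conflicts."""
    result = dict(fs1)
    for feat, val in fs2.items():
        if feat in result and result[feat] != val:
            return None  # conflicting values: unification fails
        result[feat] = val
    return result

# PNG (person-number-gender) values, as in the structures of Figure 2.
noun_personas = {"PERNUM": "3pl", "GEN": "fem"}   # 'personas'
adj_famosos = {"PERNUM": "3pl", "GEN": "masc"}    # 'famosos'
adj_famosas = {"PERNUM": "3pl", "GEN": "fem"}     # 'famosas'

# A noun-modifier structure is well-formed only if the PNG values unify.
assert unify(noun_personas, adj_famosos) is None  # *personas famosos
assert unify(noun_personas, adj_famosas) == {"PERNUM": "3pl", "GEN": "fem"}
```

      <p>Because unification fails for *personas famosos, no well-formed structure covers the string, which is how an HPSG grammar rejects ungrammatical input.</p>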
      <sec id="sec-3-2">
        <p>At the core of the project’s methodology is the digital representation of Spanish syntax, the Spanish Resource Grammar [2, 3, 4]. The SRG consists of 54,510 lemmas in the lexicon, 543 lexical types to instantiate those lemmas, 504 lexical rule types serving morphophonological analysis, and 226 phrasal types. It is the second largest DELPH-IN grammar (after the English Resource Grammar [15, 16]). The SRG was first developed prior to the ACE parser, and one of the objectives of the GAUSS project ended up being the complete reimplementation of the SRG morphophonological interface. The outcome is that the SRG can now be used with the ACE parser [4]. As before, it relies on an external morphophonological analyzer, FreeLing [17].</p>
        <p>One major outcome of this is that we could reparse the portions of the AnCora corpus previously released as the TIBIDABO treebank [3]. The previously released version was partially verified for the correctness of the structure, but the accuracy figures corresponding to that verification were never reported (as far as we can tell). One of the outcomes of GAUSS is the re-parsed, re-verified, and re-released portions of TIBIDABO (currently 2291 sentences) [4]. The updated version of the SRG along with the verified treebanks are open-source and are released on GitHub: https://github.com/delph-in/srg</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Using the SRG with learner data</title>
        <p>The main idea behind the GAUSS project is that we can use the SRG to model constructions characteristic of learners of Spanish (as opposed to native speakers). We create a version of the SRG that is modified specifically to cover learner constructions, starting with gender agreement constructions, like the one illustrated in example (1).</p>
        <p>The grammar will detect such learner structures using what is called ‘mal-rules’ [19], a technical term for HPSG types designed specifically to cover productions characteristic of learners. For example, the grammar has to have a way to ignore the incompatible agreement values in Figure 4. We achieve this with only a small set of modifications to the grammar. We use the interface of the grammar with the external morphophonological analyzer to recognize any noun or adjective as potentially belonging to either gender (this requires 40 short additional entries in the lexical rule section of the grammar, one corresponding to each possible FreeLing noun or adjective tag). We associate each such lexical rule with a special LEARNER feature, so that ultimately any sentence that uses one or more of such rules can be detected as a learner production. No changes in the syntax part of the grammar are required, in principle. However, deploying the grammar on the learner sentences without modifications revealed a number of overgeneration issues in the original grammar, which we were able to fix thanks to this experiment. Overgeneration is when a grammar covers an ungrammatical sentence or produces a nonsensical structure for a sentence along with the correct one(s). When we saw instances of the original grammar covering learner productions, we investigated such cases and found 4 syntactic types (so far) which were underconstrained with respect to the agreement values. We added the missing agreement constraints, which resulted in reduced overgeneration and ambiguity of the SRG with respect to the TIBIDABO treebank. In this way, modeling learner constructions helped us improve the analysis of agreement in the original SRG.</p>
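        <p>The detection logic just described can be sketched as follows. The rule names and the derivation format here are hypothetical; in the SRG, the mal-rules are lexical rules tied to FreeLing tags and the LEARNER feature lives in the grammar’s type system.</p>

```python
# Sketch of learner-production detection via mal-rules carrying a
# LEARNER feature. Rule names and the flat derivation format are
# illustrative, not the SRG's actual identifiers.

LEARNER_RULES = {
    "noun-either-gender-mal-rule",  # lets a noun carry either GEN value
    "adj-either-gender-mal-rule",   # same for adjectives
}

def is_learner_production(derivation):
    """A parse (list of rule names used in its derivation) is a learner
    production iff at least one rule carrying the LEARNER feature fired."""
    return any(rule in LEARNER_RULES for rule in derivation)

# *personas famosos parses only via an either-gender mal-rule.
assert is_learner_production(
    ["np-head-mod", "adj-either-gender-mal-rule", "noun-lex-rule"])
# A native-like parse uses no mal-rules.
assert not is_learner_production(["np-head-mod", "noun-lex-rule"])
```
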
        <p>After all the necessary mal-rules are implemented, the plan is to (1) accompany each model of a learner construction with meaningful feedback; and (2) deploy the grammar as a web-based service such that it can be tested by learners of Spanish. This is work in progress.</p>
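        <p>Step (1) amounts to mapping each fired mal-rule to an explanation. A minimal sketch of that mapping follows; the rule name and the feedback message are illustrative, not the project’s actual ones.</p>

```python
# Sketch of attaching meaningful feedback to mal-rules (planned work).
# The rule name and message below are hypothetical examples.

FEEDBACK = {
    "adj-either-gender-mal-rule":
        "The adjective does not agree in gender with the noun it modifies "
        "(e.g. 'personas famosos' should be 'personas famosas').",
}

def feedback_for(derivation):
    """Collect a feedback message for every mal-rule fired in the parse."""
    return [FEEDBACK[rule] for rule in derivation if rule in FEEDBACK]

msgs = feedback_for(["np-head-mod", "adj-either-gender-mal-rule"])
assert len(msgs) == 1 and "agree in gender" in msgs[0]
```
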
        <sec id="sec-3-2-1">
          <title>3.5. Parsing speed bottleneck</title>
          <p>The main challenge in HPSG parsing speed is that large feature structures combinatorially lead to a huge search space. As a result, HPSG parsing is comparatively slow in practice. For example, the ACE parser takes about 3 seconds per sentence on average on a corpus of 100K sentences (some of these sentences take minutes while others take less than a second) [20]. The GAUSS project attempts to address this challenge by a combination of methodologies: (1) improving analyses in the grammar to reduce meaningless ambiguity (overgeneration) and thus reduce the size of the parse chart; (2) integrating top-down parsing; and (3) filtering lexical entries and grammar rules so that fewer rules are considered at each step. Method (1) is what we employed while addressing the overgeneration we discovered by deploying the grammar on the learner corpus. We have managed to improve the SRG’s performance by up to 60% on sentences of length 8-10. Method (2) has been underexplored in HPSG but has seen rekindled interest recently [21]. HPSG parsers are overwhelmingly bottom-up, but for long sentences a lot can be learned immediately from the start of the sentence (the top of the syntax tree), discarding many irrelevant search paths. Method (3) includes developing a neural supertagger (filter) for HPSG. The supertagger reduces the number of possibilities the parser needs to explore by discarding unlikely word meanings. Statistical filtering was successfully applied to HPSG [22], and we are now researching how neural methods can improve the state of the art. We started by applying method (3) to the English Resource Grammar treebanks and obtained a speed-up of a factor of three compared to the baseline. However, when we attempted the method on the Spanish treebanks, the results were not yet satisfactory, apparently because the Spanish treebanks were not big enough at the start of the GAUSS project. Now that we have added more verified items in the treebanks, we can attempt to train a neural supertagger for Spanish once again.</p>
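          <p>Method (3) can be sketched as a simple filter over lexical candidates. This is a toy illustration under stated assumptions: the tagger is a stub probability table, whereas in GAUSS it is a neural model, and the lexical type names are invented for the example.</p>

```python
# Toy sketch of supertag filtering (method 3): keep only lexical entries
# whose type the tagger considers likely, shrinking the parser's search
# space. The probability table stands in for a neural supertagger.

def filter_lexicon(token, candidate_types, tag_probs, threshold=0.01):
    """Discard lexical types the supertagger deems unlikely for a token."""
    kept = [t for t in candidate_types
            if tag_probs.get((token, t), 0.0) >= threshold]
    # Never filter everything out: fall back to all candidates so the
    # parser cannot lose coverage, only speed.
    return kept or candidate_types

# Hypothetical lexical types for Spanish 'son' ('are'):
probs = {("son", "verb-ser-lex"): 0.95, ("son", "noun-lex"): 0.001}
assert filter_lexicon("son", ["verb-ser-lex", "noun-lex"], probs) == ["verb-ser-lex"]
```

          <p>The fallback branch reflects the design constraint that a filter may only prune the search space, never make a parseable sentence unparseable.</p>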
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>4. Planning and Team</title>
        <p>The GAUSS project consists of three Research Objectives (RO) and four Work Packages (WP). They are summarized in Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>The GAUSS project is funded by the European Union’s Horizon Europe Framework Programme under the Marie Skłodowska-Curie postdoctoral fellowship grant HORIZON-MSCA-2021-PF-01 (GAUSS, grant agreement No 101063104). The project is carried out in the Language and Society Information research group (LyS) of Universidade da Coruña.</p>
    </sec>
    <sec id="sec-5">
      <title>References</title>
      <p>[2] M. Marimon, The Spanish Resource Grammar, in: LREC, 2010.</p>
      <p>[3] M. Marimon, N. Bel, L. Padró, Automatic selection of HPSG-parsed sentences for treebank construction, Computational Linguistics 40 (2014) 523–531.</p>
      <p>[4] O. Zamaraeva, L. S. Allegue, C. Gómez-Rodríguez, Spanish Resource Grammar version 2023, in: COLING-2024, in press.</p>
      <p>[5] C. Pollard, I. Sag, Head-Driven Phrase Structure Grammar, CSLI, 1994.</p>
      <p>[6] M. Butt, T. H. King, Urdu and the parallel grammar project, in: Proceedings of the 3rd workshop on Asian language resources and international standardization-Volume 12, Association for Computational Linguistics, 2002, pp. 1–3.</p>
      <p>[7] S. Müller, The CoreGram project: Theoretical linguistics, theory development and verification, Journal of Language Modelling 3 (2015) 21–86.</p>
      <p>[8] A. Copestake, Appendix: Definitions of typed feature structures, Natural Language Engineering 6 (2000) 109–112.</p>
      <p>[9] E. M. Bender, G. Emerson, Computational linguistics and grammar engineering, in: S. Müller, A. Abeillé, R. D. Borsley, J.-P. Koenig (Eds.), Head-Driven Phrase Structure Grammar: The handbook, 2021.</p>
      <p>[10] B. Crysmann, W. Packard, Towards efficient HPSG generation for German, a non-configurational language, in: COLING, 2012, pp. 695–710.</p>
      <p>[11] S. Oepen, D. Flickinger, Towards systematic grammar profiling. Test suite technology 10 years after, Computer Speech &amp; Language 12 (1998) 411–435.</p>
      <p>[12] S. Oepen, [incr tsdb()] competence and performance laboratory. User and reference manual, 1999.</p>
      <p>[13] W. Packard, UW-MRS: Leveraging a deep grammar for robotic spatial commands, SemEval 2014 (2014) 812.</p>
      <p>[14] A. Copestake, D. Flickinger, C. Pollard, I. A. Sag, Minimal recursion semantics: An introduction, Research on Language and Computation 3 (2005) 281–332.</p>
      <p>[15] D. Flickinger, On building a more efficient grammar by exploiting types, Natural Language Engineering 6 (2000) 15–28.</p>
      <p>[16] D. Flickinger, Accuracy v. robustness in grammar engineering, in: E. M. Bender, J. E. Arnold (Eds.), Language from a Cognitive Perspective: Grammar, Usage and Processing, CSLI, Stanford, CA, 2011, pp. 31–50.</p>
      <p>[17] X. Carreras, I. Chao, L. Padró, M. Padró, FreeLing: An open-source suite of language analyzers, in: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), 2004.</p>
      <p>[18] A. Yamada, S. Davidson, P. Fernández-Mira, A. Carando, K. Sagae, C. Sánchez-Gutiérrez, COWS-L2H: A corpus of Spanish learner writing, Research in Corpus Linguistics 8 (2020) 17–32.</p>
      <p>[19] D. Schneider, K. McCoy, Recognizing syntactic errors in the writing of second language learners, in: ACL, 1998, pp. 1198–1204.</p>
      <p>[20] O. Zamaraeva, C. Gómez-Rodríguez, Revisiting supertagging for HPSG, 2023. arXiv:2309.07590.</p>
      <p>[21] L. Chiruzzo, D. Wonsever, Statistical deep parsing for Spanish using neural networks, in: IWPT, 2020, pp. 132–144.</p>
      <p>[22] R. Dridan, Ubertagging: Joint segmentation and supertagging for English, in: EMNLP, 2013, pp. 1201–1212.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dodge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          , Green AI, Communications of the ACM
          <volume>63</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>