<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
<title level="a" type="main">Testing the Syntactic Competence of Large Language Models with a Translation Task: Dative Ambiguity in Russian</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Edyta</forename><surname>Jurkiewicz-Rohrbacher</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Universität Hamburg</orgName>
								<address>
									<addrLine>Mittelweg 177</addrLine>
									<postCode>22222</postCode>
									<settlement>Hamburg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Universität Regensburg</orgName>
								<address>
									<addrLine>Universitätsstr. 34</addrLine>
									<postCode>93333</postCode>
									<settlement>Regensburg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
<title level="a" type="main">Testing the Syntactic Competence of Large Language Models with a Translation Task: Dative Ambiguity in Russian</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7519BAD1E55B7E1A428FBD4CC4D9B0B8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:16+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>syntax</term>
					<term>ambiguity</term>
					<term>translation task</term>
					<term>linguistics</term>
					<term>linguistic competence of large language models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The paper explores opportunities for using a translation task to obtain knowledge about the syntactic competence of large language models. It reports the accuracy achieved in a Russian-English translation task on Russian sentences containing highly ambiguous structures with two dative personal pronouns. Seven tools (systems and agents) based on pre-trained generative models were tested in their function as machine translators on a data set obtained from several web corpora. The study shows that the principles of reference assignment relevant to the syntax of human language users (referential prominence and linear order of pronouns and predicates) are also statistically relevant for pre-trained generative models.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The rapid developments in generative pre-trained language models have resulted in agents that deliver relatively well-formed texts in various natural languages. The economic result of this process is a large number of cheaply produced but relatively well-written machine-generated texts (MGTs) freely circulating and spreading online. For linguists, this means that language users are being exposed to automatically generated content on an equal footing with human-generated content. The language varieties emerging from MGTs are thus quite naturally becoming an object of linguistic research next to human varieties such as slangs, dialects, idiolects, etc. Consequently, linguistics as a discipline is facing new challenges pertaining to the methods through which knowledge about artificially emerging lects can be obtained. The new question before the linguistic community is: can the established corpus-, psycho-, and neurolinguistic methods be applied in research on rapidly emerging LLMs? In general, tasks intuitively formulated as instructions, where the input has a similar structure to the output, are better processed in zero-shot prompts than tasks presented in other ways, for example, as finishing an incomplete sentence <ref type="bibr" target="#b0">[1]</ref>. This study aims to explore to what extent translation, a method well known from typological questionnaires, can be applied to explore the syntactic competence of LLMs. In the subsections that follow, translation as a task and the selected phenomenon of dative case ambiguity in Russian are described. Section 2 presents the study design. The central quantitative results are provided in Section 3, while minor results, which might feed into future studies, are presented in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Translation task</head><p>Translation has been used as a data elicitation task in typological and psycholinguistic research in various ways. In linguistic fieldwork, translational questionnaires are frequently constructed to examine how a particular area of grammar with a known representation in language A is represented by native speakers in language B [cf. 2]. In psycholinguistics, translation is in itself an object of study as a cognitive process <ref type="bibr" target="#b2">[3]</ref>, but it is also used as a method for accessing the linguistic performance and competence of multilingual speakers, e.g., for exploring their multilingual lexicons <ref type="bibr" target="#b3">[4]</ref>.</p><p>Previous studies <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref> suggest that pre-trained generative models do capture syntactic information. However, accessing this information seems computationally demanding and, for various practical reasons, impossible in the case of very large, commercially developed models. To address this, the present study employs a translation task to access knowledge of the principles governing syntactic parsing by having various types of pre-trained systems or agents perform it.</p><note>4th Workshop on Humanities-Centred Artificial Intelligence 2024 <ref type="bibr">(CHAI 2024)</ref>. Author contact: edyta.jurkiewicz-rohrbacher@uni-hamburg.de, https://www.slm.uni-hamburg.de/slavistik/personen/jurkiewicz-rohrbacher.html, ORCID 0000-0001-6737-7847 (E. Jurkiewicz-Rohrbacher).</note></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Test Case: Dative Ambiguity in Russian</head><p>Recent reports show that neural machine translation (NMT) systems still have shortcomings in the area of co-reference resolution and lexical cohesiveness, which results in inaccurate translation of pronouns <ref type="bibr" target="#b6">[7]</ref>. Syntactically ambiguous structures pose another type of problem <ref type="bibr" target="#b7">[8]</ref>, one which I assume challenges whatever correlates of syntactic constituency parsing might be found in generative pre-trained language models. A typical example of such a structure is prepositional phrase attachment, as in the often-cited sentence A man saw a woman with a telescope, where the phrase with a telescope can be parsed as an attribute of a woman or as an adjunct of the predicate saw. Nevertheless, ambiguity is an inherent feature of natural languages. Some scholars propose that it is a desirable quality because it facilitates efficient, that is, short and simple communication <ref type="bibr" target="#b8">[9]</ref>. To explore the feasibility of using translation as a task in research on the syntactic competence of large language models, I examined ambiguous Russian structures containing two personal pronouns in the dative case placed adjacently in a complex sentence.</p><p>Although Slavic languages have extremely flexible word order, ambiguity in syntactic role assignment is rather rare because of their rich morphology. However, when two arguments have identical lexicogrammatical properties and the same morphological marking on the sentence surface, ambiguity is possible even in the case of two full NPs, as shown in example (1).</p><p>(1) Miškata mouse.f.sg.def vižda see.ipfv.prs.3sg kotkata. 
cat.f.sg.def 'The mouse sees the cat / The cat sees the mouse' (Bg, <ref type="bibr" target="#b9">[10]</ref>)</p><p>Studies on the Russian dative case with infinitive structures <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15]</ref> mention in passing that the co-occurrence of two dative arguments in one sentence is possible, being predominantly observed in sentences with a free infinitive.<ref type="foot" target="#foot_0">1</ref> Such sentences are ambiguous because in several types of Russian clauses the dative case is not assigned solely to the syntactic role of indirect object (third argument), but also to the so-called 'logical subject',<ref type="foot" target="#foot_1">2</ref> as shown in (<ref type="formula">2</ref>).</p><p>It has been suggested <ref type="bibr" target="#b12">[13]</ref> that such structures might generally be avoided in language use. However, where they occurred, semantic-syntactic role assignment would follow the linear principle, correlating with the syntactic hierarchy of arguments (Agent over Recipient or other Participant). Others <ref type="bibr" target="#b13">[14]</ref> claim that a word order of dative arguments which is at variance with the syntactic hierarchy is marked only prosodically. Therefore, such structures could pose a challenge for large language models that are not trained on acoustic data.</p><p>Another work <ref type="bibr" target="#b14">[15]</ref> argues that the context clarifies role assignment. 
For example, in (3), it is the negative personal pronoun nekomu 'to/for nobody', and not the linearly first dative mne 'to/for me', which is higher in the syntactic structure, and therefore more subject-like.</p><p>(3) Mne zvonit' nekomu - ja i ne slušaju.</p><formula xml:id="formula_0">me.dat call.inf nobody.dat - I foc neg listen.1sg</formula><p>'Nobody calls me, so I'm not listening (for the telephone).' [I. Grekova. Letom v gorode (1962)] (after <ref type="bibr" target="#b14">[15]</ref>)</p><p>Finally, scholars have yet to provide an overview of structures in which two dative arguments interact in one sentence in Russian, limiting themselves to structures that are, at least overtly, mono-predicative infinitive structures.<ref type="foot" target="#foot_2">3</ref> Hence, the order of predicates governing dative arguments is usually neglected as a factor.</p><p>A study on adjacent dative pronouns in Russian natural data originating from written text corpora <ref type="bibr" target="#b20">[21]</ref> establishes that such structures do occur in language use, albeit mostly in overtly bipredicative structures, in combination with embedded infinitival complements, as shown in example (4).<ref type="foot" target="#foot_3">4</ref></p><p>(4) 'The source also mentions some interesting speculations regarding the plans of Intel and NVIDIA, but we would like to dedicate a separate article to them.'</p><p>The linearly first dative pronoun im 'them' is governed by the infinitival complement posvjatit' 'dedicate', while the linearly second and adjacent pronoun nam 'us' is governed by the complement-taking matrix predicate chotelos' 'wish.refl'. 
Note that this sentence does not contain an explicit subject in the nominative.</p><p>According to the analysis in question <ref type="bibr" target="#b20">[21]</ref>, two factors significantly impact the probability of obtaining different word orders of arguments: the order of the main and embedded predicate, and the type of referential prominence that the pronouns represent, locuphoric pronouns (first/second person) being more likely to be assigned the agentive role than aliophoric ones (third person) <ref type="bibr" target="#b21">[22]</ref>.</p><p>For better comprehension, the model is shown in Figure <ref type="figure" target="#fig_0">1</ref>. Considerable variation is observed in sentences with an infinitival complement preceding the matrix verb in the linear order of the sentence (CM type, marked on the abscissa). In such environments, two locuphoric pronouns are more likely to comply with the deep syntactic order of the predicates rather than with the shallow order suggested by the surface.</p><p>In combinations with at least one aliophoric pronoun, the picture is more complex (marked with a red circle). Sentences where the referential prominence hierarchy is retained show the highest variation in pronoun order, as the probabilities of the two word orders occurring are nearly equal. When the referential prominence hierarchy is violated (an aliophoric pronoun takes a high position in syntax), the pronoun order corresponding to the order of the predicates on the surface is preferred, and so is significantly more likely to occur. Nonetheless, without context it is impossible to distinguish between these two conditions, which is a source of ambiguity.</p><p>It may be predicted that in a translation task, adjacent dative pronouns in Russian structures with embedded infinitives would be a source of error for LLMs. 
A particularly high error rate is to be expected for sentences with an infinitival complement preceding the matrix predicate in the surface linear order of the sentence, and for sentences with a combination of a locuphoric and an aliophoric pronoun.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Study Design</head><p>In this section I describe the translation task conducted in the study. The primary sources of data were the Russian Timestamped JSI web corpus 2014-2021 <ref type="bibr" target="#b22">[23]</ref> and the ruTenTen17 corpus <ref type="bibr" target="#b23">[24]</ref>, from which I extracted 74 stimulus excerpts of 200-1100 characters each.<ref type="foot" target="#foot_4">5</ref> Every excerpt contained a sentence with a two-predicate structure,<ref type="foot" target="#foot_5">6</ref> in which the embedded predicate (complement) was placed earlier in the linear structure of the sentence than the embedding predicate (matrix), and which contained two adjacent personal pronouns in the dative case, one locuphoric and the other aliophoric (see example 5).</p><p>Further in the paper I use the following notation: M stands for matrix predicate (syntactically higher predicate), C for complement predicate (syntactically lower predicate, embedded by M), D1 for dative pronoun governed by M, D2 for dative pronoun governed by C.</p><p>(5) … nothing.gen 'They asked for plain water, but we had nothing to give them.'</p><p>The obtained data set was used to test the performance of three specialized translation tools based on neural network architectures, DeepL<ref type="foot" target="#foot_6">7</ref>, Google Translate<ref type="foot" target="#foot_7">8</ref>, and Yandex<ref type="foot" target="#foot_8">9</ref>, and four chatbots with similar architectures: Google Gemini<ref type="foot" target="#foot_9">10</ref>, Perplexity AI<ref type="foot" target="#foot_10">11</ref>, ChatGPT Turbo, and ChatGPT Omni<ref type="foot" target="#foot_11">12</ref>.</p><p>The choice of commercial tools has clear drawbacks. First, the exact details of commercial models' architectures and the structure of the training data are not disclosed. 
Second, the computations performed by the models cannot be controlled, nor can the models be fine-tuned to improve their accuracy. However, training methods are beyond the scope of this study. My objective was to verify to what extent errors in MGTs can be predicted on the basis of statistically significant regularities detected in the behavior of human language users. The selected pre-trained generative agents are all based on encoder-decoder architectures. This paper treats them similarly to human agents in usage-based theories of language acquisition <ref type="bibr" target="#b24">[25]</ref>. In these theories, language acquisition is possible not thanks to universal grammar but is based on cognitive skills, in particular intention-reading and pattern-finding. Since the latter is clearly relevant to pre-trained generative models, I assume that their linguistic competence is emergent <ref type="bibr" target="#b25">[26]</ref>.</p><p>The training data, fed to the model during the training process from which its linguistic competence emerges, represents the performance of multiple competent language users, with the difference that each model has been exposed to a much larger amount of linguistic (written) data than any human agent ever can be. Knowledge about the linguistic competence of LLMs is accessed indirectly in this study, by evaluating performance in a specially developed translation task, just as is done when the linguistic competence of human beings is studied. Therefore, in selecting tools, high competence had a greater priority than control over the language model. Another important argument in favor of commercial models was that typologically interesting Slavic languages, characterized by rich morphology and very flexible word order, are still rarely available in open-source multilingual models such as the Llama family. 
<ref type="foot" target="#foot_12">13</ref> Although ambiguity per se is not rare in language, keeping as many factors as possible constant, and thus focusing on only one type of ambiguity, leads to considerable data reduction. The construction examined in this study is rather complex and relatively rare. Therefore, correct performance requires substantial computational capacity and state-of-the-art technology.</p><p>In the period 5-18 June 2024, the sentences in their authentic contexts were fed into the translation systems as chunks of 200-800 characters. The size of each chunk depended on the place in the text where the context necessary for disambiguation was located. It was mainly found either before or after the tested sentence. In rare cases, both the pre- and post-context were necessary for an unambiguous interpretation of pronouns. The chatbots were zero-shot prompted with the command "Translate the following passage from Russian to English: [passage]".<ref type="foot" target="#foot_13">14</ref> For each stimulus, the chat was restarted and no feedback regarding performance was given. In this way, 518 observations were obtained.</p><p>In the study, I focused only on the correct assignment of syntactic roles to the dative pronouns, as rendered in the process of translation. Other types of translation errors were disregarded.</p></div>
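The data-collection loop described above can be sketched as follows. The prompt template is quoted from the study; the function names and the chunk-length validation are illustrative assumptions, not the author's actual tooling:

```python
# Sketch of the zero-shot prompting procedure described in Section 2.
# PROMPT_TEMPLATE is quoted from the study; everything else is an
# illustrative assumption about how the loop might be organized.

PROMPT_TEMPLATE = "Translate the following passage from Russian to English: {passage}"

def build_prompt(passage: str) -> str:
    """Build the zero-shot prompt for one stimulus chunk (200-800 characters)."""
    if not 200 <= len(passage) <= 800:
        raise ValueError("chunk should contain 200-800 characters of context")
    return PROMPT_TEMPLATE.format(passage=passage)

def collect_observations(stimuli, translators):
    """One observation per (stimulus, translator) pair. Each chat is
    restarted per stimulus, so no feedback or history carries over."""
    observations = []
    for stimulus in stimuli:
        prompt = build_prompt(stimulus)
        for name, translate in translators.items():
            observations.append((stimulus, name, translate(prompt)))
    return observations
```

With the study's 74 stimuli and 7 tools, this loop yields exactly the 518 observations reported in the text.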
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>Table <ref type="table" target="#tab_3">1</ref> demonstrates that the dative pronoun disambiguation task is not straightforward and that error rates vary primarily between the specialized translation systems and the agents. The best-performing ChatGPT Omni achieved an accuracy of nearly 0.95, while all the other chatbots had a strikingly similar accuracy of 0.89. The best translation system performed below this rate, reaching an accuracy of 0.85. The other two systems showed far poorer accuracy: 0.74 in the case of Google Translate and 0.67 in the case of Yandex.</p><p>Interestingly, single instances of misclassification were observed only for translation systems, but not for generative agents. In other words, whenever an agent misclassified a stimulus, there existed a system that misclassified it too. In order to find out whether word order and the prominence hierarchy principle were considerable factors, the data were annotated for these two features. The distribution of errors across them is presented in Table <ref type="table">2</ref>.</p><p>It may be observed that the D2D1 word order which preserved the prominence hierarchy was clearly the easiest to classify, causing barely any errors. Violation of one of the principles led to a considerable deterioration in the models' performance. I observed a decrease in accuracy when either the surface word order of the pronouns did not replicate the surface order of the predicates or the pronouns did not comply with the prominence hierarchy, i.e., when an aliophoric pronoun was higher in the syntactic hierarchy than a locuphoric one. 
However, if both of these principles were violated, the language models seemed to act quite randomly: performance dropped to 0.51.</p><p>A mixed-effects logistic regression model<ref type="foot" target="#foot_14">15</ref> <ref type="bibr" target="#b26">[27]</ref> performed in the R environment <ref type="bibr" target="#b27">[28]</ref>, in which stimulus and translator were treated as random effects, confirmed the intuition formulated above. The model shows that both factors (word order and prominence hierarchy) play a significant role in modeling the performance of the studied LLMs (cf. Table <ref type="table" target="#tab_4">3</ref>) and have a positive impact on correct reference assignment in the translation task. According to the model, the probability that sentences violating both principles will be accurately classified is 0.55, whereas sentences complying with both principles have a probability of 0.99 of being classified correctly. For sentences where only the prominence hierarchy is violated, the model predicts correct reference assignment with a probability of 0.93, while for sentences violating only the surface word order correspondence between governors and pronouns the respective figure is 0.94.</p></div>
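The predicted probabilities reported above can be recovered from the fixed-effects estimates in Table 3 with the inverse-logit function. The mapping of coefficients to conditions below is my own reading of the model (random effects at their zero mean; the intercept taken as the condition violating both principles, which matches the reported 0.55), so it is a back-of-the-envelope cross-check rather than a reproduction of the original R analysis:

```python
import math

def inv_logit(x: float) -> float:
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Fixed-effects estimates from Table 3.
intercept = 0.2053      # assumed reference level: D1D2 order, D1 aliophoric (both principles violated)
b_d2d1 = 2.3771         # D2D1 order (pronoun order matches the surface order of the governors)
b_d1_locu = 2.5861      # D1 locuphoric (prominence hierarchy preserved)
b_interaction = 0.9296  # D2D1 : D1 locuphoric

p_both_violated = inv_logit(intercept)                                   # ~0.55
p_hierarchy_violated = inv_logit(intercept + b_d2d1)                     # ~0.93
p_order_violated = inv_logit(intercept + b_d1_locu)                      # ~0.94
p_both_kept = inv_logit(intercept + b_d2d1 + b_d1_locu + b_interaction)  # ~0.99
```

All four values agree with the probabilities quoted in the text, which supports the assumed coding of the factor levels.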
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion and Future Prospects</head><p>The results obtained in the study are in line with prior predictions that "sentences with reversed order of predicates (CM), where two dative pronouns represent different levels of the prominence hierarchy can pose interpretation problems for NMT systems and other tools for NLP" <ref type="bibr" target="#b20">[21]</ref>. It appears that in such structures, contextually available information might not be sufficient for correct disambiguation by a machine; for example, the key features might not be identified, as in sentence (6),<ref type="foot" target="#foot_15">16</ref> which was misclassified by six out of seven translators. Both hierarchical prominence and concordance of the word order of the governors and pronouns turned out to be relevant factors that might facilitate or hinder the task of disambiguation for the purpose of role assignment, for instance when there are no contextual cues. It should be pointed out that, typically, sentences with two dative pronouns do not contain a nominative phrase, which in traditional syntax would be interpreted as the canonical subject. Consequently, such sentences most likely place a greater burden on processing, assuming that a correlate of syntactic parsing emerges in LLMs <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>. In other words, phenomena studied in theoretical linguistics and typology seem to be relevant to and retrievable from the linguistic behavior of large language models, also through established linguistic methods. Although the way pre-trained language models process language is still comparable to a black box, I argue that the methods used to study the linguistic behavior of the human species can be adjusted to studying the linguistic behavior of machines. After all, human natural language users can be considered a black box too, since linguistic knowledge is never directly accessible. 
Another interesting finding of this study is that models pre-trained to perform various tasks communicated within a conversation performed better than specially trained machine translators. This result should by no means suggest that agents are in general better at translation tasks than translation systems, as only one particular aspect was evaluated. Nonetheless, in the future it should be examined whether this observation holds for other phenomena and what the reason might be. I cannot rule out that it is related to the number of parameters, the size of the context window, or the type of training data. Nevertheless, it is important to repeat that in this study the agents misclassified only structures which were also misclassified by systems, that is, a subset of the stimuli misclassified by systems and not a disjoint set of stimuli. This suggests a systematicity in the errors made by the agents. Note that for all stimuli, disambiguation was always potentially possible due to the available context. Presumably, agents use context better than translation systems do. This could be due to the fact that agents are trained to be multifunctional: contextually given information is necessary to perform other types of tasks and is therefore used more effectively, also in translation. Multilingualism is, in a sense, a byproduct. Verifying this claim would certainly require further studies on context processing in reference resolution tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Limitations</head><p>Given that ambiguity is an intrinsic feature of natural language, this phenomenon is pertinent to the processing of any natural language by machines. This paper focuses on syntactic ambiguity, which is associated with flexible word order and with languages where a single morpheme can be used to encode various syntactic arguments. The Slavic branch is an illustrative example, where the scope of roles encoded by the dative is particularly extensive. This case study demonstrates that translation tasks can be employed to evaluate the capabilities of LLMs in a systematic manner and can serve as a foundation for future research.</p><p>The present study was limited to commercial products, which does not allow for evaluation of improvements on the training set. Moreover, the current tasks might permit improvement of the studied tools, and thus the obtained results might not be replicable in the future.</p><p>To an extent, the study is limited by the small set of test sentences (which will be enlarged in the future) and by the neglect of contextual factors. However, the point of the study was to ascertain whether translation tasks can bring insights into the linguistic competences of LLMs. Furthermore, the same problem might be relevant for automatic dependency annotation.</p><p>Finally, the study was limited to automatic text processing tools only. Although it would be possible to perform a similar study with human language users, I do not expect that the results obtained in the same or a similar task would surpass the best-performing ChatGPT Omni. The set of stimuli is cognitively quite demanding. Therefore, I assume that in an artificial experimental setting, language users would base their choices on the two linguistic principles discussed in this paper, rather than spending time on re-analyzing the full context, because the latter is cognitively more demanding. 
However, it is precisely for this reason that I postulate that some notional linguistic rules, such as those observed in theoretical and general linguistics, might be common to humans and machines, notwithstanding the fact that one would expect the latter group to make fewer mistakes and to follow the logic provided by the context.</p><p>In addition, human users, unlike chatbots (agents), could make mistakes in different stimuli than the erroneous translation systems, which is an interesting implication of this study worth further investigation.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Probability of obtaining a reversed order of pronominal arguments in bipredicative structures with two dative arguments <ref type="bibr" target="#b20">[21]</ref>. Red-marked items are predicted to be a source of interpretation errors for LLMs.</figDesc><graphic coords="4,138.26,65.61,318.75,300.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 1</head><label>1</label><figDesc>Accuracy in the translation task.</figDesc><table>
<row><cell>Translation Systems</cell><cell>DeepL</cell><cell>Google</cell><cell>Yandex</cell></row>
<row><cell>Accuracy</cell><cell>0.85</cell><cell>0.74</cell><cell>0.66</cell></row>
<row><cell>Chatbots</cell><cell>Omni</cell><cell>Turbo</cell><cell>Gemini</cell><cell>Perplexity</cell></row>
<row><cell>Accuracy</cell><cell>0.95</cell><cell>0.89</cell><cell>0.89</cell><cell>0.89</cell></row>
<row><cell>Table 2: Distribution of the studied factors (error rates in parentheses).</cell></row>
<row><cell>Incorrectly classified</cell><cell>Prominence hierarchy: no</cell><cell>yes</cell><cell>Correctly classified</cell><cell>Prominence hierarchy: no</cell><cell>yes</cell></row>
<row><cell>Word order D1D2</cell><cell>24 (0.49)</cell><cell>43 (0.17)</cell><cell>D1D2</cell><cell>25</cell><cell>216</cell></row>
<row><cell>Word order D2D1</cell><cell>14 (0.15)</cell><cell>2 (0.02)</cell><cell>D2D1</cell><cell>77</cell><cell>117</cell></row>
</table></figure>
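The parenthesized shares in Table 2 can be recomputed from the raw counts. The following sketch (my own cross-check, not part of the original analysis) reproduces them as per-cell error rates and derives the overall accuracy across all observations:

```python
# Raw counts from Table 2: (incorrect, correct) per condition,
# keyed by (word order, prominence hierarchy preserved).
counts = {
    ("D1D2", "no"):  (24, 25),
    ("D1D2", "yes"): (43, 216),
    ("D2D1", "no"):  (14, 77),
    ("D2D1", "yes"): (2, 117),
}

def error_rate(incorrect: int, correct: int) -> float:
    """Share of misclassified items within one cell."""
    return incorrect / (incorrect + correct)

rates = {cond: round(error_rate(i, c), 2) for cond, (i, c) in counts.items()}

total = sum(i + c for i, c in counts.values())                  # 518 observations
overall_accuracy = sum(c for _, c in counts.values()) / total   # ~0.84
```

The computed rates match the values printed in Table 2, and the cell totals sum to the 518 observations reported in Section 2.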
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3</head><label>3</label><figDesc>Results of the performed regression model.</figDesc><table>
<row><cell>Random effects:</cell><cell>Name</cell><cell>Variance</cell><cell>Std.Dev.</cell></row>
<row><cell>SentID</cell><cell>(Intercept)</cell><cell>3.5459</cell><cell>1.8831</cell></row>
<row><cell>Translator</cell><cell>(Intercept)</cell><cell>0.9743</cell><cell>0.9871</cell></row>
<row><cell>Fixed effects:</cell><cell>Estimate</cell><cell>Std. Error</cell><cell>z value</cell><cell>Pr(&gt;|z|)</cell></row>
<row><cell>(Intercept)</cell><cell>0.2053</cell><cell>0.9090</cell><cell>0.226</cell><cell>0.82133</cell></row>
<row><cell>D2D1</cell><cell>2.3771</cell><cell>1.0624</cell><cell>2.237</cell><cell>0.02525*</cell></row>
<row><cell>D1 locuphoric</cell><cell>2.5861</cell><cell>0.9511</cell><cell>2.719</cell><cell>0.00655*</cell></row>
<row><cell>D2D1:D1 locuphoric</cell><cell>0.9296</cell><cell>1.5092</cell><cell>0.616</cell><cell>0.53791</cell></row>
</table></figure>
<note xmlns="http://www.tei-c.org/ns/1.0">(6) Ja ne kriču i ne zovu na pomošč' - ėto bessmyslenno: rebjata naverchu, dostatočno daleko ot menja, kinut' 𝐶 im 𝐷1 mne 𝐷2 nečego 𝑀 (a esli by i kinuli verëvku, to čem za neë uchvatit'sja?).
I neg shout.1sg and neg call on help it pointless kids top enough far from me throw.inf them.dat me.dat nothing conj if cond foc throw.pst.3pl rope then ins for she.acc catch.inf.refl
'I'm not shouting or calling for help - it's pointless: the guys upstairs are far enough away from me, they have nothing to throw to me (and even if they threw a rope, how would I grab it?).'</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">A sentence where the main predicate is expressed as an infinitive.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">There is no general agreement as to which syntactic role should be assigned to such datives; for a recent review of the topic, see <ref type="bibr" target="#b15">[16]</ref>. I refrain from generalizing on this matter here, as the present study also involves object control structures in which the syntactically highest dative argument carries the syntactic role of indirect object of the matrix predicate while at the same time controlling the non-overt subject of the complement clause.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">It is unclear whether sentences with a free infinitive are monoclausal <ref type="bibr" target="#b16">[17,</ref> <ref type="bibr" target="#b17">18]</ref> or biclausal <ref type="bibr" target="#b18">[19,</ref> <ref type="bibr" target="#b19">20]</ref>. In the latter case, scholars assume the existence of a copula, which is not marked overtly in present-tense sentences.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://overclockers.ru/hardnews/show/93720/kazhdyj-kvartal-sledujuschego-goda-budet-prinosit-novye-graficheskie-resheniya-amd</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">I applied a CQL query for two adjacent pronominal lowercase word forms using the Sketch Engine corpus manager. The retrieved data were manually checked by a native-speaker annotator and verified for errors by a second native-speaker annotator. Because the context was always available, the task was usually straightforward; still, only sentences that raised no doubts for either annotator were selected for the current task.</note>
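Footnote 5 describes the corpus query only in prose. A hypothetical CQL sketch of such a query is given below; the `tag` attribute and the `P.*` pronoun tag value are assumptions that depend on the tagset of the corpus used, not the author's exact query:

```
[tag="P.*" & word="[а-яё]+"] [tag="P.*" & word="[а-яё]+"]
```

In Sketch Engine's CQL, each bracketed expression matches one token, so the two adjacent position patterns restrict the search to two consecutive pronominal word forms written entirely in lowercase Cyrillic letters.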
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">For instance, subject or object control constructions, predicatives, or modal-existential wh-predicates.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://www.deepl.com, licensed account</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://translate.google.com, free user account</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">https://translate.yandex.com, free user account</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">https://gemini.google.com/app, free user account</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_10">https://www.perplexity.ai/, free user account</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_11">Access to both models was provided via https://uhhgpt.uni-hamburg.de under the Universität Hamburg's license.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_12">The newest Llama 3.2 officially supports only English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="14" xml:id="foot_13">Some prompts had to be modified for Google Gemini, which sometimes failed to perform the task for various reasons.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="15" xml:id="foot_14">Formula: Correct ∼ Order + hierarchy + Order*hierarchy + (1 | SentID) + (1 | Translator)</note>
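The formula in footnote 15 corresponds to a binomial mixed-effects model fitted with lme4 <ref type="bibr" target="#b26">[27]</ref>. A minimal sketch in R, assuming a data frame `dat` with the columns named in the formula (the frame name is an assumption):

```r
library(lme4)

# Logistic mixed model: fixed effects for order and hierarchy plus their
# interaction; random intercepts for sentence item and translator.
# In R, Order * hierarchy already expands to Order + hierarchy + Order:hierarchy.
m <- glmer(Correct ~ Order * hierarchy + (1 | SentID) + (1 | Translator),
           data = dat, family = binomial)
summary(m)  # reports estimates, std. errors, z values, and Pr(>|z|)
```

The `summary()` output has the same column layout (Estimate, Std. Error, z value, Pr(&gt;|z|)) as the results table reproduced above.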
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="16" xml:id="foot_15">https://viktorkotl.livejournal.com/167122.html</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The research has been partly supported by the Representative for Equal Opportunities and Academic Research Sabbatical Fund of the University of Regensburg. I thank Roman Fisun, Konstanzia Lüke and Irina Maykova for help with the preparation of the data set.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Ethics Statement</head><p>This work complies with the ACL Ethics Policy. Prior to the current study, I took no steps to train or prime the systems for the purposes of this task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Online Resources</head><p>The full list of stimuli and their translations is available via GitHub.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Finetuned language models are zero-shot learners</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Guu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Lester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2109.01652" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">From questionnaires to parallel corpora in typology</title>
		<author>
			<persName><forename type="first">Ö</forename><surname>Dahl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sprachtypologie und Universalienforschung</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="page" from="172" to="181" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Psycholinguistic and Cognitive Inquiries into Translation and Interpreting</title>
		<editor>A. Ferreira, J. W. Schwieter</editor>
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>John Benjamins</publisher>
			<pubPlace>Amsterdam</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Some applications of translation to psycholinguistic research</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Włosowicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Linguistica Silesiana</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="127" to="145" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Syntax representation in word embeddings and neural networks – a survey</title>
		<author>
			<persName><forename type="first">T</forename><surname>Limisiewicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mareček</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th Conference Information Technologies - Applications and Theory (ITAT 2020)</title>
		<title level="s">CEUR-Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">M</forename><surname>Holeňa</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Horváth</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Kelemenová</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Mráz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Pardubská</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Plátek</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Sosík</surname></persName>
		</editor>
		<meeting>the 20th Conference Information Technologies - Applications and Theory (ITAT 2020)<address><addrLine>Košice, Slovakia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="48" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Do transformers parse while predicting the masked word?</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panigrahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Arora</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.emnlp-main.1029</idno>
		<ptr target="https://aclanthology.org/2023.emnlp-main.1029" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Bouamor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bali</surname></persName>
		</editor>
		<meeting>the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="16513" to="16542" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Document-level neural MT: A systematic comparison</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lopes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Farajian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bawden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F T</forename><surname>Martins</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, European Association for Machine Translation</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Martins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Moniz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Fumega</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Martins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Batista</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Coheur</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Parra</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Trancoso</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Turchi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Bisazza</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Moorkens</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Guerberof</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Nurminen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Marg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Forcada</surname></persName>
		</editor>
		<meeting>the 22nd Annual Conference of the European Association for Machine Translation, European Association for Machine Translation<address><addrLine>Lisboa, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="225" to="234" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Going beyond the sentence: Contextual Machine Translation of Dialogue</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bawden</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<pubPlace>Orsay, France</pubPlace>
		</imprint>
		<respStmt>
			<orgName>LIMSI, CNRS, Université Paris-Sud, Université Paris-Saclay</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The communicative function of ambiguity in language</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Piantadosi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Tily</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gibson</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.cognition.2011.10.004</idno>
		<ptr target="https://doi.org/10.1016/j.cognition.2011.10.004" />
	</analytic>
	<monogr>
		<title level="j">Cognition</title>
		<imprint>
			<biblScope unit="volume">122</biblScope>
			<biblScope unit="page" from="280" to="291" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Korytkowska</surname></persName>
		</author>
		<title level="m">Gramatyka konfrontatywna bułgarsko-polska, volume V, Slavistyczny Ośrodek Wydawniczy</title>
				<meeting><address><addrLine>Warsaw</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Konstrukcija tipa negde spat&apos;: sintaksis, semantika, leksikografija</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Apresjan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">L</forename><surname>Iomdin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semiotika i informatika</title>
		<imprint>
			<biblScope unit="page" from="34" to="92" />
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Infinitif et datif en polonais contemporain: un couple malheureux?</title>
		<author>
			<persName><forename type="first">D</forename><surname>Weiss</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Complétude et incomplétude dans les langues romanes et slaves. Actes du VI Colloque international de linguistique romane et slave</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Karolak</surname></persName>
		</editor>
		<meeting><address><addrLine>Cracovie; Cracow</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1991-10-03">29 sept.-3 oct. 1991. 1993</date>
			<biblScope unit="page" from="443" to="487" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Maurice</surname></persName>
		</author>
		<title level="m">Der modale Infinitiv in der modernen russischen Standardsprache</title>
				<meeting><address><addrLine>Munich</addrLine></address></meeting>
		<imprint>
			<publisher>Peter Lang</publisher>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Bonč-Osmolovskaja</surname></persName>
		</author>
		<title level="m">Konstrukcii s dativnym subjektom v russkom jazyke</title>
				<meeting><address><addrLine>Moscow</addrLine></address></meeting>
		<imprint>
			<publisher>MGU</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Otricatel&apos;nye mestoimenija-predikativy (na ne-)</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">V</forename><surname>Padučeva</surname></persName>
		</author>
		<ptr target="http://rusgram.ru/new/chapter/pos/pronoun/#label_otritsateljnye_mestoimeniya-predikativy__na_ne-_" />
	</analytic>
	<monogr>
		<title level="m">Russkaja korpusnaja grammatika</title>
				<imprint>
			<date type="published" when="2015">2015 (accessed 26.02.2022)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Subject</title>
		<author>
			<persName><forename type="first">B</forename><surname>Hansen</surname></persName>
		</author>
		<idno type="DOI">10.1163/2589-6229_ESLO_COM_032471</idno>
		<ptr target="http://dx.doi.org/10.1163/2589-6229_ESLO_COM_032471" />
	</analytic>
	<monogr>
		<title level="m">Encyclopedia of Slavic Languages and Linguistics Online</title>
				<editor>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Greenberg</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Dva tipa dativnych predloženij v russkom jazyke</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zimmerling</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Slovo - čistoe vesel&apos;e: Sbornik statej v čest&apos;</title>
				<editor>
			<persName><forename type="first">A</forename><forename type="middle">B</forename><surname>Pen'kovskogo</surname></persName>
		</editor>
		<meeting><address><addrLine>Moscow</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="471" to="489" />
		</imprint>
	</monogr>
	<note>Jazyki slavjanskich kul&apos;tur</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Dative infinitive constructions in Russian: Are they really biclausal?</title>
		<author>
			<persName><forename type="first">E</forename><surname>Tsedryk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Formal approaches to Slavic Linguistics 25 : The third Cornell meeting 2016</title>
		<editor>
			<persName><forename type="first">W</forename><surname>Browne</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Despić</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Enzinna</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Harmath De Lemos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Karlin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Zec</surname></persName>
		</editor>
		<meeting><address><addrLine>Ann Arbor</addrLine></address></meeting>
		<imprint>
			<publisher>Michigan Slavic Publications</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="298" to="317" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Modal possessive constructions: Evidence from Russian</title>
		<author>
			<persName><forename type="first">I</forename><surname>Livitz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lingua</title>
		<imprint>
			<biblScope unit="volume">122</biblScope>
			<biblScope unit="page" from="714" to="747" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Quantificational properties of neg-wh items in Russian</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kondrashova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Šimík</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of North East Linguistics Society 40</title>
				<meeting>North East Linguistics Society 40<address><addrLine>Amherst</addrLine></address></meeting>
		<imprint>
			<publisher>Graduate Linguistic Student Association</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="15" to="28" />
		</imprint>
		<respStmt>
			<orgName>University of Massachusetts</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Dative ambiguity in Russian: A corpus induced study</title>
		<author>
			<persName><forename type="first">E</forename><surname>Jurkiewicz-Rohrbacher</surname></persName>
		</author>
		<idno type="DOI">10.2478/jazcas-2023-0025</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Linguistics/Jazykovedný časopis</title>
		<imprint>
			<biblScope unit="volume">74</biblScope>
			<biblScope unit="page" from="70" to="80" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Role-reference associations and the explanation of argument coding splits</title>
		<author>
			<persName><forename type="first">M</forename><surname>Haspelmath</surname></persName>
		</author>
		<idno type="DOI">10.1515/ling-2020-0252</idno>
		<ptr target="https://doi.org/10.1515/ling-2020-0252" />
	</analytic>
	<monogr>
		<title level="j">Linguistics</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page" from="123" to="174" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The internals of an aggregated web news feed</title>
		<author>
			<persName><forename type="first">M</forename><surname>Trampuš</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Novak</surname></persName>
		</author>
		<ptr target="http://ailab.ijs.si/dunja/SiKDD2012/Papers/Trampus_Newsfeed.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of 15th Multiconference on Information Society 2012 (IS-2012)</title>
				<meeting>15th Multiconference on Information Society 2012 (IS-2012)</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">The TenTen corpus family</title>
		<author>
			<persName><forename type="first">M</forename><surname>Jakubíček</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kilgarriff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kovář</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rychlý</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Suchomel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">7th International Corpus Linguistics Conference (CL 2013)</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="125" to="127" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Constructing a Language: A Usage-Based Theory of Language Acquisition</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tomasello</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2003">2003</date>
			<publisher>Harvard University Press</publisher>
			<pubPlace>Cambridge, MA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Emergent grammar</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Hopper</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Berkeley Linguistics Society</title>
		<imprint>
			<biblScope unit="page" from="139" to="157" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Fitting linear mixed-effects models using lme4</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mächler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bolker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Walker</surname></persName>
		</author>
		<idno type="DOI">10.18637/jss.v067.i01</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Statistical Software</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="page" from="1" to="48" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Team</surname></persName>
		</author>
		<ptr target="https://www.R-project.org/" />
		<title level="m">R: A language and environment for statistical computing</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
