=Paper=
{{Paper
|id=Vol-1607/conrod
|storemode=property
|title=We Who Tweet: Pronominal Relative Clauses on Twitter
|pdfUrl=https://ceur-ws.org/Vol-1607/conrod.pdf
|volume=Vol-1607
|authors=Kirby Conrod,Rachael Tatman,Rik Koncel-Kedziorski
|dblpUrl=https://dblp.org/rec/conf/clif/ConrodTK16
}}
==We Who Tweet: Pronominal Relative Clauses on Twitter==
Kirby Conrod, Rachael Tatman and Rik Koncel-Kedziorski

Department of Linguistics, University of Washington

kconrod@uw.edu, rctatman@uw.edu, kedzior@uw.edu

===Abstract===
Pronominal relative clauses were previously reported to be unproductive in English, appearing only in Bible verses and proverbs. This corpus study of Twitter data shows that pronominal relative clauses are a productive part of contemporary English, and can be used in both literary and nonliterary registers.

===1 Introduction===
Pronominal relative clauses, also known as Voldemort phrases (Zobel, 2015), are relative clauses headed by pronouns. Pronominal relative clauses (PRCs) can be restrictive (PRRCs) or non-restrictive. Previous work on PRCs has claimed that the construction is not productive and is found only in particularly literary contexts (Curme, 1912; Elbourne, 2013; Zobel, 2015). To better determine whether PRCs are a productive part of modern English syntax, and to test empirical generalizations that have been made about the construction, we collected and analyzed Twitter data containing PRCs.

PRRCs are pronominal relative clauses which have a restrictive reading available. In the Twitter data that we collected, we found many examples of PRRCs:

(1) He who has great power should use it lightly. [twi.82]

(2) We who #FeelTheBern feel the same about you! [twi.7511]

The data we collected included many PRRCs of a 'literary' style: Bible verses, quotes from famous politicians, or old adages that may be at this point idiomatic. (1) above is one such 'literary' PRRC, in this case a translation of a quote from Seneca, a Roman Stoic philosopher. In contrast, (2) is a completely non-literary example of a PRRC, and cannot be traced to any religious or idiomatic history.

Non-restrictive pronominal relative clauses, by contrast, are appositive relative clauses headed by referential pronouns:

(3) I can do all things in him who strengthens me. [twi.4746]

(4) How is it that he, who showed a CLOCK to an ENGINEERING teacher got arrested? [twi.727]

We also found both literary and non-literary uses of this non-restrictive version throughout the Twitter data. We used repetition and uniqueness to encode literariness. Our data shows that PRCs are a productive part of contemporary English. Through further analysis of PRRCs we estimate that a PRRC is tweeted every thirty seconds. Additionally, of the PRCs that we sampled, 37% (1222 tweets) were unique and non-literary.

===2 Background of Pronominal Relative Clauses===
Elbourne's (2013) syntactic analysis of PRRCs (Voldemort phrases) proposes a structure in which the pronominal head is a definite determiner. The analysis of relative clauses in Elbourne (2013) is agnostic as to whether the pronominal head originates within the relative clause or merges externally. Elbourne (2013) does include a nominal layer in the structure that constitutes a null element meaning something like 'person.'

(5) [[he [person [who ...]]]] (Elbourne, 2013)

Elbourne (2013) does not propose a structural difference between restrictive and non-restrictive relative clauses.

In a recent semantic analysis, Zobel (2015) proposes that PRRCs constitute generic statements about generic kinds: that is, that PRCs such as ''he who prays'' are equivalent to ''the kind of man who prays''. The insight of this semantic derivation captures what makes pronouns, which otherwise would be referential expressions, allowable as heads of restrictive relative clauses. In order to head a restrictive relative clause, a pronoun must not be referential, must not have an antecedent, and must not be interpreted as specific. Zobel (2015) derives a semantic structure of PRRCs that reduces down to generic kinds: ''he'' is interpreted as something more like ''the sort of man''. This is well in keeping with the syntactic definiteness of the pronoun.

In these two analyses of PRRCs and an oft-cited earlier description by Curme (1912), however, several generalizations have been made about PRRCs without robust support. Elbourne (2013), Zobel (2015), and Curme (1912) all work with the assumption (citing Curme's (1912) generalization) that PRRCs are not a fully productive construction in contemporary English. Following this assumption, the examples that Zobel (2015) and Elbourne (2013) use are Bible quotes, proverbs, or literary quotes:

(6) He who abides in love abides in God. (Zobel, 2015: 7) (1 John 4:16 NKJV)

(7) He who hesitates is lost. (Elbourne, 2013: 205)

In keeping with these assumptions, the examples that Elbourne (2013) and Zobel (2015) use are themselves proverbs and quotes. This study provides evidence that the construction is not in fact limited to Biblical language or literary reference.

===3 Methods===
Twitter data was extracted from the Twitter public application programming interface (API) with an R (R Core Team, 2015) script using the plyr (Wickham, 2011) and twitteR (Gentry, 2015) libraries. (Footnote 1: All code and data used will be made available.) Up to the 1000 most recent tweets (on September 22, 2015) were collected for ten exact phrases of the form ''PRO who'', one for each pronoun.

To facilitate hand-analysis, we applied a cascade of strategic filters to remove unwanted tweets (i.e. non-PRCs) from our data. For each we provide a description of the filter and its intended targets below. Many of these filters are designed to discard specific syntactic phenomena that would be more readily recognizable given a parse structure. However, due to the casual nature of Twitter orthography, accurate automatic syntactic parsing is rarely available. We use simple, strategic filters to approximately characterize undesired constructions. They are designed with a bias toward removing too few rather than too many tweets.

First, we apply exact string match to remove duplicates. From the remaining tweets we remove those where the last non-whitespace character before the word ''who'' is anything but a letter or a comma. This also excludes utterance-initial ''who''. We then remove tweets with the pattern ''of PRO who'' (where ''PRO'' varies over English pronouns), as these are typically partitives rather than PRRCs. We then filter tweets where ''it'' precedes ''who'' by 2 to 4 words, e.g. ''It is we who'' or ''It may have been we who''. These cleft constructions are not PRCs. Finally, we remove tweets where the ''PRO who'' combination is preceded by a clause-taking verb from among the inflected forms of ''ask'', ''tell'', ''wonder'', ''inform'', and ''show''. This filtered out tweets with clausal complements, e.g. ''show me who did it''.

This filtering process left us with 3261 tweets of the original 10,000. We sampled 300 of these for detailed hand analysis.

We cast the remaining 3261 tweets into two classes, based upon the uniqueness of the words immediately following ''who''. If the two-word sequence following ''who'' is unique, we consider this evidence that it is a non-literary use. This non-literary class consists of 1222 examples.

The data was hand-tagged for availability of a restrictive reading, availability of an appositive reading, the head of the relative clause, the case of the pronominal head, and what role the relative clause played in the matrix clause.

Restrictive and appositive readings were not taken to be mutually exclusive. A relative clause was tagged as restrictive if there was an interpretation available where the relative clause denoted some subset of a (usually generic) set of entities denoted by the head. A relative clause was tagged as appositive if there was an interpretation available where the relative clause denoted a property held by the entire set of entities denoted by the head. Relative clauses could be tagged as both restrictive and appositive if both readings were available; no relative clauses were tagged as neither restrictive nor appositive.

Heads were tagged simply as the pronouns ''he'', ''him'', ''she'', ''her'', ''they'', ''them'', ''I'', ''me'', ''we'', ''us'', and ''you''. The case of the head was tagged as nominative (NOM) if it was ''he'', ''she'', ''they'', ''I'', or ''we'', and accusative (ACC) if it was ''him'', ''her'', ''them'', ''me'', or ''us''. We did not tag ''you'' as either NOM or ACC because there is no overt morphological difference between the two forms.

The role of the relative clause in the matrix clause was tagged S if the RC was a subject, O if the RC was an object, P if the RC was a predicate, and F if the RC was a fragment, or did not play a role in a matrix clause.

===4 Results===
====4.1 Literariness and Uniqueness====
The data we collected allowed us to address several generalizations made about PRCs. One restriction Zobel (2015) places on PRCs is based on Curme's (1912) claim that PRCs are archaic and not productive in English, and that they appear only in Bible texts, proverbs, and other sayings. This is a claim not about the structure of PRCs, but about their historicity, register, and productivity as a syntactic construction. Firstly, Zobel's (2015) and Elbourne's (2013) ability to produce novel examples of PRCs for syntactic and semantic analysis indicates that speakers of English retain the ability to create novel sentences using this syntactic structure. Secondly, 37% of the tweets containing PRRCs that we sampled (1222 tweets) were unique, which we take as evidence that they are non-literary.

Our method for determining literariness, while less ideal than comparing against a large corpus of literature, is quite conservative and has the advantage of capturing facts about contemporary usage. For instance, noting that many PRCs deal with biblical topics, we searched the King James Version of the Bible (KJV) for the 4-grams ''PRO who w1 w2'', where ''PRO'' is some pronoun and ''w1 w2'' are words that follow ''who'' in one or more of the 3261 filtered tweets. This resulted in a mere 227 matches. However, there are more than 227 Biblical tweets which are considered literary by our metric. This is due in part to the fact that phrases from the Bible whose original context is non-PRRC are often used in PRRC constructions (for example, ''PRO who comes in the name of the Lord'' does not appear in the KJV, but does appear frequently in our Twitter data).

By using 4-grams such as ''PRO who w1 w2'' to test for repetition, we labelled as literary exact repetitions of phrases as well as near-repetitions that use templatic patterns for the purpose of literary allusion. This is an inclusive measure of literariness that labels more tweets as literary than those that exactly correspond to texts such as the KJV Bible; our hope is that the tweets remaining tagged as non-literary should be more reliably unique. Both measures of uniqueness versus literariness (the comparisons within our own data set and the comparisons to the KJV Bible) indicate that pronominal relative clauses are not always either literary or Biblical.

====4.2 Possible Pronominal Heads====
While this is not central to her semantic analysis, Zobel (2015) also claims that PRRCs are only headed by masculine pronouns, while other (non-''he'') pronominal RCs are non-restrictive. Contrary to this claim, we found many instances of PRRCs headed by the full range of English pronouns, including ''he'', ''she'', ''him'', ''her'', ''they'', ''them'', ''we'', ''us'', and ''you''.

(8) But value he who shows you respect, honesty & trust. [twi.310]

(9) A preacher that fears the powers that are contemporary, and dismisses the power of him who is eternally in power, is not fit to lead people [twi.4673]

(10) She who leads rules, so play nice or I won't let you [twi.1455]

(11) Every moment is a golden one for him/her who has the vision to recognize it as such. [twi.5951]

(12) they who control the pumpkin spice control the universe [twi.2610]

(13) It's a funny old game ain't it but them who take part in it wouldn't change it [twi.6439]

(14) And we who know and realize this should always be willing and eager to save others and not condemn them [twi.7637]

(15) u can't tease us who weren't there with a new song and not let us hear!!!! [twi.9453]

(16) you who make me smile, you are what makes my heart [twi.8765]

Of the restrictive constructions in the data we analysed, the only pronominal head we did not find was ''it''. While there is a clear preference for ''he'', this is by no means a universal constraint, and so cannot be relied upon to provide the semantic denotation derived by PRRCs. In addition, this preference is somewhat weaker when looking only at the unique tweets, as can be seen in Figure 1. (This difference was not, however, statistically significant at α = 0.05: χ²(9, N = 225) = 15.44, p = 0.079.)

Figure 1: Proportion of PRRCs with each head by literariness. (Bar chart; x-axis: pronominal heads ''he'', ''we'', ''she'', ''us'', ''him'', ''they'', ''them'', ''he/she'', ''her'', ''you''; y-axis: proportion of tweets; series: repeated and unique.)

====4.3 Role in the Matrix Clause====
Zobel (2015) states that PRRCs “combine with object-level predicates” at the matrix level: that is, the PRRC is usually at the left periphery, or is the subject of, the matrix sentence. To test whether this was the case, we separated out unique and repeated PRRCs and analyzed their syntactic role in the matrix sentence. Each tweet was tagged with a matrix syntactic role: S (subject), O (object), P (predicate), or F (fragment). Among all PRRCs the subject position was the most common, accounting for 49% of PRRCs in the data set; object position (23%) and fragments (19%) were also quite common. Predicate PRRCs (8%) were by far the rarest. These proportions held across both types, unique and repeated. This is summarized in Figure 2. This suggests that there is more diversity in these constructions than has previously been posited.

Figure 2: Proportion of PRRCs in each role in the matrix sentence. (Bar chart; x-axis: role in the matrix clause, S, O, F, P; y-axis: proportion of tweets; series: repeated and unique.)

===5 Restrictives and Appositives===
Restrictive and appositive tweets were also analyzed by uniqueness. There was no statistically reliable difference between unique and repeated tweets in their use of either restrictive or appositive structures (χ²(1, N = 184) = 2.45, p = 0.117 and χ²(1, N = 184) = 2.09, p = 0.147, respectively). This is summarized in Figure 3. As can be seen in the figure, roughly the same proportion of unique and repeated tweets were tagged as restrictive or appositive.

Figure 3: Proportion of unique and repeated clauses that are restrictive and appositive. (Two bar-chart panels, Restrictive and Appositive; series: unique and repeated.)

===6 Conclusions===
Based on the data collected, we conclude that pronominal restrictive relative clauses are a productive and robust part of contemporary English. Based on our data, we estimate that a PRRC is tweeted every thirty seconds. Additionally, of the PRCs that we sampled, 37% (1222 tweets) were unique and thus, we argue, non-literary. This is evidence that, while they may be stylistically marked in some way, PRCs are a construction that is productively available to contemporary English speakers.

Further, there are no predictable differences between repeated and unique PRCs, which seems to suggest that productive and non-productive uses are making use of the same underlying structures.
The most striking difference between repeated and unique tweets was that the latter were more likely to be headed by pronouns other than he or she, which is a semantic rather than syntactic distinction.

===References===
George O. Curme. 1912. A history of English relative constructions. The Journal of English and Germanic Philology, 11(3):355–380.

Paul Elbourne. 2013. Definite descriptions. OUP Oxford.

Jeff Gentry. 2015. twitteR: R Based Twitter Client. R package version 1.1.9.

R Core Team. 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Hadley Wickham. 2011. The split-apply-combine strategy for data analysis. Journal of Statistical Software, 40(1):1–29.

Sarah Zobel. 2015. Voldemort phrases in generic sentences. Grazer Linguistische Studien.
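
As an illustration of the filter cascade and the uniqueness split described in the Methods section (3), the following is a minimal Python sketch. It is not the authors' R script, which is not reproduced in the paper. The pronoun list (taken from the hand-tagging description), the regular expressions, and the function names filter_tweets and split_by_uniqueness are all assumptions made for this example. In particular, "it precedes who by 2 to 4 words" is read here as 2 to 4 words intervening between it and who, and only the first who in a tweet is checked for its preceding character.

```python
import re
from collections import Counter

# Assumed pronoun list (from the hand-tagging description, not the search list).
PRONOUNS = ["he", "him", "she", "her", "they", "them", "i", "me", "we", "us", "you"]
PRO = r"(?:%s)" % "|".join(PRONOUNS)

# Inflected forms of the five clause-taking verbs named in the text.
VERBS = (r"(?:ask(?:s|ed|ing)?|tell(?:s|ing)?|told|wonder(?:s|ed|ing)?|"
         r"inform(?:s|ed|ing)?|show(?:s|ed|n|ing)?)")

WHO = re.compile(r"\bwho\b", re.I)
PARTITIVE = re.compile(rf"\bof\s+{PRO}\s+who\b", re.I)          # "many of us who"
CLEFT = re.compile(r"\bit\s+(?:\S+\s+){2,4}who\b", re.I)        # "It is we who"
CLAUSAL = re.compile(rf"\b{VERBS}\s+{PRO},?\s+who\b", re.I)     # "show me who"
BIGRAM = re.compile(rf"\b{PRO},?\s+who\s+(\S+)\s+(\S+)", re.I)  # PRO who w1 w2


def bad_context_before_who(tweet):
    """True if the first 'who' is utterance-initial or is preceded by a
    character other than a letter or a comma (whitespace ignored)."""
    match = WHO.search(tweet)
    if match is None:
        return True
    before = tweet[:match.start()].rstrip()
    return not before or not (before[-1].isalpha() or before[-1] == ",")


def filter_tweets(tweets):
    """Apply the cascade: exact duplicates, bad context before 'who',
    partitives, clefts, and clausal complements are removed."""
    seen, kept = set(), []
    for tweet in tweets:
        if tweet in seen:
            continue
        seen.add(tweet)
        if bad_context_before_who(tweet):
            continue
        if PARTITIVE.search(tweet) or CLEFT.search(tweet) or CLAUSAL.search(tweet):
            continue
        kept.append(tweet)
    return kept


def following_bigram(tweet):
    """Return the lowercased two-word sequence after 'PRO who', or None."""
    match = BIGRAM.search(tweet)
    return (match.group(1).lower(), match.group(2).lower()) if match else None


def split_by_uniqueness(tweets):
    """Split tweets into (unique, repeated) by whether the two words after
    'who' occur once or more than once; tweets without a bigram are skipped."""
    bigrams = [following_bigram(t) for t in tweets]
    counts = Counter(b for b in bigrams if b is not None)
    unique = [t for t, b in zip(tweets, bigrams) if b is not None and counts[b] == 1]
    repeated = [t for t, b in zip(tweets, bigrams) if b is not None and counts[b] > 1]
    return unique, repeated
```

Calling filter_tweets on a list of tweet strings and passing the result to split_by_uniqueness yields the two classes that the paper calls unique (non-literary) and repeated. Like the filters described in the text, the regular expressions are deliberately loose and err toward keeping a tweet rather than discarding it.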