=Paper=
{{Paper
|id=Vol-1607/conrod
|storemode=property
|title=We Who Tweet: Pronominal Relative Clauses on Twitter
|pdfUrl=https://ceur-ws.org/Vol-1607/conrod.pdf
|volume=Vol-1607
|authors=Kirby Conrod,Rachael Tatman,Rik Koncel-Kedziorski
|dblpUrl=https://dblp.org/rec/conf/clif/ConrodTK16
}}
==We Who Tweet: Pronominal Relative Clauses on Twitter==
<pdf width="1500px">https://ceur-ws.org/Vol-1607/conrod.pdf</pdf>
<pre>
              We Who Tweet: Pronominal Relative Clauses on Twitter


                Kirby Conrod, Rachael Tatman and Rik Koncel-Kedziorski
                               Department of Linguistics
                               University of Washington
              kconrod@uw.edu, rctatman@uw.edu, kedzior@uw.edu


                                   Abstract

     Pronominal relative clauses were previously reported to be unproductive
     in English, appearing only in Bible verses and proverbs. This corpus
     study of Twitter data shows that pronominal relative clauses are a
     productive part of contemporary English, and can be used in both
     literary and non-literary registers.

1    Introduction

Pronominal relative clauses, also known as Voldemort phrases (Zobel, 2015),
are relative clauses headed by pronouns. Pronominal relative clauses (PRCs)
can be restrictive (PRRCs) or non-restrictive. Previous studies of PRCs have
claimed that the construction is not productive and is found only in
particularly literary contexts (Curme, 1912; Elbourne, 2013; Zobel, 2015).
To better determine whether PRCs are a productive part of modern English
syntax, and to test empirical generalizations that have been made about the
construction, we collected and analyzed Twitter data containing PRCs.

   PRRCs are pronominal relative clauses which have a restrictive reading
available. In the Twitter data that we collected, we found many examples of
PRRCs:

    (1)   He who has great power should use it lightly.           [twi.82]

    (2)   We who #FeelTheBern feel the same about you!          [twi.7511]

The data we collected included many PRRCs of a ‘literary’ style: Bible
verses, quotes from famous politicians, or old adages that may be at this
point idiomatic. (1) above is one such ‘literary’ PRRC, in this case a
translation of a quote from Seneca, a Roman Stoic philosopher. In contrast,
(2) is a completely non-literary example of a PRRC, and cannot be traced to
any religious or idiomatic history.

   Non-restrictive pronominal relative clauses, by contrast, are appositive
relative clauses headed by referential pronouns:

    (3)   I can do all things in him who strengthens me.        [twi.4746]

    (4)   How is it that he, who showed a CLOCK to an ENGINEERING teacher
          got arrested?                                          [twi.727]

We also found both literary and non-literary uses of this non-restrictive
version throughout the Twitter data. We used repetition and uniqueness to
encode literariness. Our data shows that PRCs are a productive part of
contemporary English. Through further analysis of PRRCs we estimate that a
PRRC is tweeted every thirty seconds. Additionally, of the PRCs that we
sampled, 37% (1222 tweets) were unique and non-literary.

2    Background of Pronominal Relative Clauses

Elbourne’s (2013) syntactic analysis of PRRCs (Voldemort phrases) proposes a
structure in which the pronominal head is a definite determiner. The analysis
of relative clauses in Elbourne (2013) is agnostic as to whether the
pronominal head originates within the relative clause or merges externally.
Elbourne (2013) does include a nominal layer in the structure that
constitutes a null element meaning something like ‘person.’

    (5)   [[he [person [who ...]]]]                       (Elbourne, 2013)

   Elbourne (2013) does not propose a structural difference between
restrictive and non-restrictive relative clauses.

   In a recent semantic analysis, Zobel (2015) proposes that PRRCs constitute
generic statements about generic kinds: that is, that PRCs such as he who
prays are equivalent to the kind of man who prays. The insight of this
semantic derivation captures what makes pronouns, which would otherwise be
referential expressions, allowable as heads of restrictive relative clauses.
In order to head a restrictive relative clause, a pronoun must not be
referential, must not have an antecedent, and must not be interpreted as
specific. Zobel (2015) derives a semantic structure of PRRCs that reduces
down to generic kinds: he is interpreted as something more like the sort of
man. This is well in keeping with the syntactic definiteness of the pronoun.

   In these two analyses of PRRCs and an oft-cited earlier description by
Curme (1912), however, several generalizations have been made about PRRCs
without robust support. Elbourne (2013), Zobel (2015), and Curme (1912) all
work with the assumption (citing Curme’s (1912) generalization) that PRRCs
are not a fully productive construction in contemporary English. Following
this assumption, the examples that Zobel (2015) and Elbourne (2013) use are
Bible quotes, proverbs, or literary quotes:

    (6)   He who abides in love abides in God.            (Zobel, 2015: 7)
                                                       (1 John 4:16 NKJV)

    (7)   He who hesitates is lost.                 (Elbourne, 2013: 205)

   In keeping with these assumptions, the examples that Elbourne (2013) and
Zobel (2015) use are themselves proverbs and quotes. This study provides
evidence that the construction is not in fact limited to Biblical language
or literary reference.

3    Methods

Twitter data was extracted from the Twitter public application programming
interface (API) with an R (R Core Team, 2015) script using the plyr
(Wickham, 2011) and twitteR (Gentry, 2015) libraries.¹ Up to the 1000 most
recent tweets (on September 22, 2015) were collected for ten exact phrases
PRO who, one for each pronoun.

    ¹ All code and data used will be made available.

   To facilitate hand-analysis, we applied a cascade of strategic filters to
remove unwanted tweets (i.e. non-PRCs) from our data. For each we provide a
description of the filter and its intended targets below. Many of these
filters are designed to discard specific syntactic phenomena that would be
more readily recognizable given a parse structure. However, due to the
casual nature of Twitter orthography, accurate automatic syntactic parsing is
rarely available. We use simple, strategic filters to approximately
characterize undesired constructions. They are designed with a bias toward
removing too few rather than too many tweets.

   First, we apply exact string match to remove duplicates. From the
remaining tweets we remove those where the last non-whitespace character
before the word who is anything but a letter or a comma. This also excludes
utterance-initial who. We then remove tweets with the pattern "of PRO who"
(where PRO varies over English pronouns), as these are typically partitives
rather than PRRCs. We then filter tweets where it precedes who by 2 to 4
words, e.g. It is we who or It may have been we who. These cleft
constructions are not PRCs. Finally, we remove tweets where the PRO who
combination is preceded by a clause-taking verb from among the inflected
forms of ask, tell, wonder, inform, and show. This filters out tweets with
clausal complements, e.g. show me who did it.

   This filtering process left us with 3261 tweets of the original 10,000.
We sampled 300 of these for detailed hand analysis.

   We cast the remaining 3261 tweets into two classes, based upon the
uniqueness of the words immediately following who. If the two-word sequence
following who is unique, we consider this evidence that it is a non-literary
use. This non-literary class consists of 1222 examples.

   The data was hand-tagged for availability of a restrictive reading,
availability of an appositive reading, the head of the relative clause, the
case of the pronominal head, and what role the relative clause played in the
matrix clause.

   Restrictive and appositive readings were not taken to be mutually
exclusive. A relative clause was tagged as restrictive if there was an
interpretation available where the relative clause denoted some subset of a
(usually generic) set of entities denoted by the head. A relative clause was
tagged
as appositive if there was an interpretation available where the relative
clause denoted a property held by the entire set of entities denoted by the
head. Relative clauses could be tagged as both restrictive and appositive if
both readings were available; no relative clauses were tagged as neither
restrictive nor appositive.

   Heads were tagged simply as the pronouns he, him, she, her, they, them, I,
me, we, us, and you. The case of the head was tagged as nominative (NOM) if
it was he, she, they, I, or we, and accusative (ACC) if it was him, her,
them, me, or us. We did not tag you as either NOM or ACC because there is no
overt morphological difference between the two forms.

   The role of the relative clause in the matrix clause was tagged S if the
RC was a subject, O if the RC was an object, P if the RC was a predicate, and
F if the RC was a fragment, or did not play a role in a matrix clause.

4    Results

4.1    Literariness and Uniqueness

The data we collected allowed us to address several generalizations made
about PRCs. One restriction Zobel (2015) places on PRCs is based on the same
claim by Curme (1912): that PRCs are archaic and not productive in English,
and that they appear only in Bible texts, proverbs, and other sayings. This
is a claim not about the structure of PRCs, but about their historicity,
register, and productivity as a syntactic construction. Firstly, Zobel’s
(2015) and Elbourne’s (2013) ability to produce novel examples of PRCs for
syntactic and semantic analysis indicates that speakers of English retain the
ability to create novel sentences using this syntactic structure. Secondly,
37% of the tweets containing PRRCs that we sampled (1222 tweets) were unique,
which we take as evidence that they are non-literary.

   Our method for determining literariness, while less ideal than comparing
against a large corpus of literature, is quite conservative and has the
advantage of capturing facts about contemporary usage. For instance, noting
that many PRCs deal with biblical topics, we searched the King James Version
of the Bible (KJV) for the 4-grams PRO who w1 w2, where PRO is some pronoun
and w1 w2 are words that follow who in one or more of the 3261 filtered
tweets. This resulted in a mere 227 matches. However, there are more than 227
Biblical tweets which are considered literary by our metric. This is due in
part to the fact that phrases from the Bible whose original context is
non-PRRC are often used in PRRC constructions (for example, PRO who comes in
the name of the Lord does not appear in the KJV, but does appear frequently
in our Twitter data).

   By using 4-grams such as PRO who w1 w2 to test for repetition, we labelled
as literary exact repetitions of phrases as well as near-repetitions that use
templatic patterns for the purpose of literary allusion. This is an inclusive
measure of literariness that labels more tweets as literary than those that
exactly correspond to texts such as the KJV Bible; our hope is that the
tweets remaining tagged as non-literary should be more reliably unique. Both
measures of uniqueness versus literariness (the comparisons within our own
data set and the comparisons to the KJV Bible) indicate that pronominal
relative clauses are not always either literary or Biblical.

4.2    Possible Pronominal Heads

While this is not central to her semantic analysis, Zobel (2015) also claims
that PRRCs are only headed by masculine pronouns, while other (non-he)
pronominal RCs are non-restrictive. Contrary to this claim, we found many
instances of PRRCs headed by the full range of English pronouns, including
he, she, him, her, they, them, we, us, and you.

    (8)   But value he who shows you respect, honesty & trust. [twi.310]

    (9)   A preacher that fears the powers that are contemporary, and
          dismisses the power of him who is eternally in power, is not fit
          to lead people [twi.4673]

   (10)   She who leads rules, so play nice or I won’t let you [twi.1455]

   (11)   Every moment is a golden one for him/her who has the vision to
          recognize it as such. [twi.5951]

   (12)   they who control the pumpkin spice control the universe [twi.2610]

   (13)   It’s a funny old game ain’t it but them who take part in it
          wouldn’t change it [twi.6439]

   (14)   And we who know and realize this should always be willing and
          eager to save others and not condemn them [twi.7637]

Figure 1: Proportion of PRRCs with each head by literariness.
          [Bar chart. x-axis: Pronominal Heads (he, we, she, us, him, they,
          them, he/she, her, you); y-axis: Proportion of Tweets (0.00 to
          0.25); series: Repeated, Unique.]

Figure 2: Proportion of PRRCs in each role in the matrix sentence.
          [Bar chart. x-axis: Role in the Matrix Clause (S, O, F, P);
          y-axis: Proportion of Tweets (0.0 to 0.5); series: Repeated,
          Unique.]


   (15)   u can’t tease us who weren’t there with a new song and not let us
          hear!!!! [twi.9453]

   (16)   you who make me smile, you are what makes my heart [twi.8765]

   Of the restrictive constructions in the data we analysed, the only
pronominal head we did not find was it. While there is a clear preference
for he, this is by no means a universal constraint, and so cannot be relied
upon to provide the semantic denotation derived by PRRCs. In addition, this
preference is somewhat weaker when looking only at the unique tweets, as can
be seen in Figure 1, although the difference was not statistically
significant at α = 0.05: χ²(9, N = 225) = 15.44, p = 0.079.

4.3    Role in the Matrix Clause

Zobel (2015) states that PRRCs “combine with object-level predicates” at the
matrix level: that is, the PRRC is usually at the left periphery of, or is
the subject of, the matrix sentence. To test whether this was the case, we
separated out unique and repeated PRRCs and analyzed their syntactic role in
the matrix sentence. Each tweet was tagged with a matrix syntactic role: S
(subject), O (object), P (predicate), or F (fragment). Among all PRRCs,
subject position was the most common, accounting for 49% of PRRCs in the
data set; object position (23%) and fragments (19%) were also quite common.
Predicate PRRCs (8%) were by far the rarest. These proportions held across
both types, unique and repeated. This is summarized in Figure 2. This
suggests that there is more diversity in these constructions than has
previously been posited.

5    Restrictives and Appositives

Restrictive and appositive tweets were also analyzed by uniqueness. There was
no statistically reliable difference between unique and repeated tweets in
their use of either restrictive or appositive structures
(χ²(1, N = 184) = 2.45, p = 0.117 and χ²(1, N = 184) = 2.09, p = 0.147,
respectively). This is summarized in Figure 3. As can be seen in the figure,
roughly the same proportion of unique and repeated tweets were tagged as
restrictive or appositive.

Figure 3: Proportion of unique and repeated clauses that are restrictive and
          appositive.
          [Two panels: Restrictive (Restrictive vs. Not) and Appositive
          (Appos. vs. Not); y-axis: proportion (0.0 to 1.0); series: Unique,
          Repeated.]

6    Conclusions

Based on the data collected, we conclude that pronominal restrictive relative
clauses are a productive and robust part of contemporary English. Based on
our data, we estimate that a PRRC is tweeted every thirty seconds.
Additionally, of the PRCs that we sampled, 37% (1222 tweets) were unique and
thus, we argue, non-literary. This is evidence that, while they may be
stylistically marked in some way, PRCs are a construction that is
productively available to contemporary English speakers.

   Further, there are no predictable differences between repeated and unique
PRCs, which seems to suggest that productive and non-productive uses are
making use of the same underlying structures. The most striking difference
between repeated and unique tweets was that the latter were more likely to be
headed by pronouns other than he or she, which is a semantic rather than
syntactic distinction.
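
The filter cascade described in Section 3 can be sketched roughly as follows.
This is an illustrative Python sketch, not the authors' R script: the regular
expressions, the list of inflected verb forms, the function names, and the
choice to inspect only the first occurrence of "who" are all assumptions made
for the example.

```python
import re

# Pronouns searched for; lower-cased, matched case-insensitively.
PRONOUNS = ["he", "him", "she", "her", "they", "them", "i", "me", "we", "us", "you"]
PRO = r"(?:%s)" % "|".join(PRONOUNS)

# Assumed inflected forms of ask, tell, wonder, inform, show
# (the paper does not enumerate them).
VERBS = (r"(?:ask(?:s|ed|ing)?|tell(?:s|ing)?|told|wonder(?:s|ed|ing)?"
         r"|inform(?:s|ed|ing)?|show(?:s|ed|ing|n)?)")

WHO = re.compile(r"\bwho\b", re.I)
# Partitives such as "some of them who".
PARTITIVE = re.compile(r"\bof\s+%s,?\s+who\b" % PRO, re.I)
# Clefts: "it" followed by 2 to 4 words and then "who" ("It is we who").
CLEFT = re.compile(r"\bit(?:\s+\S+){2,4}\s+who\b", re.I)
# Clausal complements such as "show me who did it".
CLAUSAL = re.compile(r"\b%s\s+%s,?\s+who\b" % (VERBS, PRO), re.I)


def bad_char_before_who(text):
    """True if the last non-space character before the first 'who' is
    neither a letter nor a comma (also true for utterance-initial 'who')."""
    match = WHO.search(text)
    if match is None:
        return True
    before = text[:match.start()].rstrip()
    if not before:
        return True
    return not (before[-1].isalpha() or before[-1] == ",")


def filter_tweets(tweets):
    """Apply the filters in order and return the surviving tweets."""
    seen = set()
    kept = []
    for tweet in tweets:
        if tweet in seen:          # exact-string duplicate
            continue
        seen.add(tweet)
        if bad_char_before_who(tweet):
            continue
        if PARTITIVE.search(tweet) or CLEFT.search(tweet) or CLAUSAL.search(tweet):
            continue
        kept.append(tweet)
    return kept
```

Each filter only removes a tweet on a positive match, which mirrors the stated
bias toward removing too few rather than too many tweets.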

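The uniqueness split from Section 3 (a tweet counts as non-literary when the
two words following "who" occur only once in the filtered data) could be
sketched as below. This is a minimal Python illustration under assumptions:
the tokenization, the punctuation stripping, and the optional include_pronoun
flag (which keys on the full 4-gram PRO who w1 w2 instead) are hypothetical
choices, not details given in the paper.

```python
import re
import string
from collections import Counter

# Word before "who" (optionally followed by a comma), then the next two tokens.
FOLLOWING = re.compile(r"\b(\w+),?\s+who\s+(\S+)\s+(\S+)", re.I)


def key_after_who(tweet, include_pronoun=False):
    """Return the normalized (w1, w2) key, or (pro, w1, w2) if requested."""
    match = FOLLOWING.search(tweet)
    if match is None:
        return None
    pro, w1, w2 = (g.lower().strip(string.punctuation) for g in match.groups())
    return (pro, w1, w2) if include_pronoun else (w1, w2)


def split_by_uniqueness(tweets, include_pronoun=False):
    """Split tweets into (unique, repeated) by how often their key occurs."""
    keys = [key_after_who(t, include_pronoun) for t in tweets]
    counts = Counter(k for k in keys if k is not None)
    unique, repeated = [], []
    for tweet, key in zip(tweets, keys):
        if key is None:
            continue  # no usable "PRO who w1 w2" sequence
        (unique if counts[key] == 1 else repeated).append(tweet)
    return unique, repeated
```

Keying on (w1, w2) alone treats "He who hesitates is" and "she who hesitates
is" as repetitions of one template, which is the more inclusive notion of
literariness.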

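As a consistency check on the test statistics reported in Sections 4.2 and 5,
the p-values can be recomputed from each χ² value and its degrees of freedom.
The sketch below is not the authors' analysis code; it uses the standard
closed form for the chi-square upper tail with odd degrees of freedom, and
the function name is a hypothetical choice. The nine degrees of freedom in
Section 4.2 would be consistent with a 10 x 2 table (ten heads by two
uniqueness classes), though the paper does not state the table layout.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with odd degrees of freedom."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form covers odd degrees of freedom only")
    # df = 1: the tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(x / 2.0))
    # Each further pair of degrees of freedom adds one series term:
    # sqrt(2x/pi) * exp(-x/2) * x**(r-1) / (1*3*...*(2r-1)), r = 1, 2, ...
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * r + 1)
    return p


# Statistics as reported in the text: (label, chi-square, df, reported p).
reported = [
    ("heads by uniqueness", 15.44, 9, 0.079),
    ("restrictive by uniqueness", 2.45, 1, 0.117),
    ("appositive by uniqueness", 2.09, 1, 0.147),
]
for label, stat, df, p_reported in reported:
    print(label, round(chi2_upper_tail(stat, df), 4), "reported:", p_reported)
```

The recomputed values fall within about 0.002 of the reported p-values; small
differences are expected because the statistics are rounded to two decimals.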
References
George O. Curme. 1912. A history of English relative constructions. The
  Journal of English and Germanic Philology, 11(3):355–380.

Paul Elbourne. 2013. Definite Descriptions. Oxford University Press, Oxford.

Jeff Gentry. 2015. twitteR: R Based Twitter Client. R package version 1.1.9.

R Core Team. 2015. R: A Language and Environment for Statistical Computing.
  R Foundation for Statistical Computing, Vienna, Austria.

Hadley Wickham. 2011. The split-apply-combine strategy for data analysis.
  Journal of Statistical Software, 40(1):1–29.

Sarah Zobel. 2015. Voldemort phrases in generic sentences. Grazer
  Linguistische Studien.


</pre>