<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Antwerp, Belgium
∗Corresponding author.
hannah.seemann@rub (H. Seemann); tatjana.scheffler@rub.de (T. Scheffler)
https://tscheffler.github.io/ (T. Scheffler)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Differentiating Social Media Texts via Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hannah Seemann</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tatjana Scheffler</string-name>
        </contrib>
        <aff>Germanistisches Institut, Ruhr-Universität Bochum, Germany</aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>We propose to use clustering of documents based on their fine-grained linguistic properties in order to capture and validate text type distinctions such as medium and register. Correlating the bottom-up, linguistic-feature-driven clustering with text type distinctions (medium and register) enables us to quantify the influence of individual author choice and medium/register conventions on variable linguistic phenomena. Our pilot study applies the method to German particles and intensifiers in a multimedia corpus, annotated for register. We show that German particles and intensifiers differ across both register and medium. The clustering based on the linguistic features most closely corresponds to the medium distinction, while the stratification into registers is reflected to a lesser extent.</p>
      </abstract>
      <kwd-group>
        <kwd>clustering</kwd>
        <kwd>social media</kwd>
        <kwd>register</kwd>
        <kwd>media</kwd>
        <kwd>German modal particles</kwd>
        <kwd>intensifiers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In this pilot study, we investigate the use of clustering to capture macro-level distinctions
between texts. We construct a bottom-up view of textual similarities by clustering texts based on
their specific linguistic features. We compare the results with annotations of the medium and
register of the texts to see to what extent effects of medium and register can be
differentiated from individual authors’ variability.</p>
      <p>
        It is known that the linguistic phenomena (e.g., word choice, use of tenses, punctuation, etc.)
found in a text are shaped by many factors. In particular, highly variable phenomena such
as discourse particles are known to be influenced by a wide range of aspects. Such aspects
of text-level variation can be sociolinguistic factors like author demographics and identity,
author persona, or simply individual style, as investigated by sociolinguists [
        <xref ref-type="bibr" rid="ref35 ref20 ref27 ref30 ref1">35, 20, 27, 30, 1</xref>
        ]
and corpus linguists [
        <xref ref-type="bibr" rid="ref21 ref12 ref28">21, 12, 28</xref>
        ]. Furthermore, writers also adapt to external circumstances of
the utterance situation, such as the mode, medium, topic, or register (the situational context
of language use) [
        <xref ref-type="bibr" rid="ref2 ref4">4, 2</xref>
        ]. For example, the “conceptual orality” theory proposes that a
conceptual mode (spoken or written) is realized by a language producer by using different linguistic
means in informal (conceptually spoken) vs. formal (conceptually written) language [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
Different media are located in different places on the conceptual orality scale from typical spoken
interaction to written text. Other research proposes that the register of a text influences the
linguistic features that can be found in it, to the extent that linguistic features can be used to
distinguish between different registers [
        <xref ref-type="bibr" rid="ref5 ref8">5, 8</xref>
        ].
      </p>
      <p>So while both the author and various external aspects are known to influence linguistic
variables, it is difficult to pinpoint to what extent each linguistic feature depends on each of these
influences. The reason is that natural corpus data typically covers only a single medium
or register, or conflates all categories: individual authors only contribute in one medium, each
medium contains different registers or wildly different topics than the others, or the corpus is
balanced for genre but it is not possible to track individual authors.<sup>1</sup></p>
      <p>In this paper, we make use of a social media corpus containing data from two different
media (blogs and tweets), but covering the same set of 44 authors, the same topic (parenting and
family life), and the same three registers (more detail below). We cluster the texts in our
corpus using the relative frequency of two highly variable linguistic features, German modal and
intensifying particles, found in each user’s texts, divided by medium and register. We then
compare the resulting clustering with the groupings based on register or medium to assess
whether the linguistic features reflect these external aspects of the utterance situation.</p>
      <p>We find that both medium and register are positively correlated with the clustering of
documents based on linguistic features, where the alignment is better for the medium distinction
than for register. We argue that our method makes it possible to tease apart the individual
influence of medium, register, and individual author properties on the linguistic features
studied.</p>
      <p>The tables and scripts used in this paper can be accessed via the Open Science Framework.<sup>2</sup></p>
    </sec>
    <sec id="sec-2">
      <title>2. Categorizing texts: Register and medium</title>
      <p>
        Various concepts have been used to characterize the situational circumstances in which a
discourse is produced, as these directly or indirectly influence the way the discourse is shaped:
text type, genre, topic, register, and others [
        <xref ref-type="bibr" rid="ref4 ref19">4, 19</xref>
        ]. In this study, we focus on the dimensions
of medium and register.
      </p>
      <p>
        The medium is the specific communication channel via which an utterance is made and
reaches its addressee, such as television, phone, oral speech, Twitter, or Facebook. This notion
is helpful in distinguishing between different communication situations specifically related to
different so-called social media, as each medium carries its own affordances. The affordances a
medium offers its users determine in which way the user and medium can interact [
        <xref ref-type="bibr" rid="ref13 ref36">13, 36</xref>
        ], and
have subtle influences on the linguistic behavior of users (e.g., whether a post will be publicly
visible or only visible to my friends might influence whether I will use a swear word). In our work, we
study two media, blog posts and tweets. Both are written, but exhibit many informal and
variable linguistic properties. They occupy different locations in the conceptual orality space
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The data will be described in more detail in §3.
      </p>
      <p><sup>1</sup> One example is the Ontonotes corpus, which contains a range of text types in both spoken and written language,
but no overlap between medium and register, or individual authors: https://catalog.ldc.upenn.edu/LDC2013T19.
A notable exception, pointed out to us by a reviewer, is the Early Modern Multiloquent Authors (EMMA) Corpus,
which tracks changes of authors’ language use over their lifetime in different spoken and written registers:
https://www.uantwerpen.be/en/projects/mind-bending-grammars/emma-corpus/.</p>
      <p><sup>2</sup> https://osf.io/kjnsu/</p>
      <p>While the notion of medium is based on the technical implementation of a discourse, register
takes various aspects of the extralinguistic context into account, such as whether a specific
discourse is interactive, what the relation of the discourse participants is like, whether it is
emotionally charged or its purpose is merely the exchange of information, etc. [<xref ref-type="bibr" rid="ref4">4</xref>]. Due to this
interplay of contextual properties, register has, following Biber [<xref ref-type="bibr" rid="ref3">3</xref>], frequently been
characterized as multidimensional. Some researchers even propose to do away with a language-external
inventory of register altogether [<xref ref-type="bibr" rid="ref6">6</xref>], and want to instead represent registers as combinations
of linguistic features present in the text. We do not follow this approach here, since we want
to specifically investigate the influence of register on linguistic features – and therefore the
registers themselves must be delineated independently.</p>
      <p>In this work, we distinguish the registers Informative, Narrative, and Persuasive, based on
situational properties such as the purpose of conversation (passing on information, reporting
on life events, argumentation, respectively), the interactivity with the addressee, and the author
involvement (both ranging from low for Informative to high for Persuasive). Linguistic features,
with the exception of pronouns, were not used to distinguish between register dimensions. All
registers are present in each of the two media.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <p>To determine whether register, medium, or individual authorship has the most influence on
text similarity, we need to look at texts from the same author in different media
and registers. To this end, we collected a corpus of German-language blog posts and tweets
from the same 44 individuals, all in a single domain: parenting. The community of parenting
bloggers is relatively coherent and writes about similar topics both in their blogs and on
Twitter.</p>
      <p>Blogs are a long-form text format with limited interactivity, while tweets are short posts
(all our tweets are still under 140 characters) which allow direct responses; both media are
public. Thus, the two media offer different types of communicative situations, but they are
both available for all three registers introduced in §2, depending on the individual usage.</p>
      <p>
        We constructed the corpus using the Twitter API and the RSS feed of each user’s corresponding
blog. The initial data collection was carried out in February 2017, and the data used here
comprises the 500 most recent tweets and the 5 or 10 (depending on availability) most recent
blog posts. A more detailed description of the corpus and data collection can be found in
[
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and on its website.<sup>3</sup> The resulting corpus consists of data from 44 authors, comprising
390 blog posts (∼350k tokens) and 20,131 tweets (∼300k tokens). All data has been manually
pseudonymized.
      </p>
      <p>We manually annotated each blog post with one register (Informative, Narrative, or
Persuasive). Since the tweets are often too short to be assigned a clear register, we grouped them
together and assigned one register to the entire tweet collection from one author, capturing
the main usage of Twitter by that author. For tweet collections, we additionally allowed the
intermediate registers Narrative-Informative and Narrative-Persuasive, denoting a mix between
these registers.</p>
      <p><sup>3</sup> http://staff.germanistik.rub.de/digitale-forensische-linguistik/forschung/textkorpus-sprachliche-variation-in-sozialen-medien/</p>
      <p>In addition, all modal and intensifying particles were manually identified and disambiguated
in the corpus, with the help of word lists, annotation guidelines, and additional trained
linguistics students. We define these phenomena in the next section.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Variable linguistic features</title>
      <p>
        German modal and intensifying particles are used differently in different communicative
situations (spoken vs. written and formal vs. informal communication) and also depending on the
author using them. Both types of particles are noninflected and modify the element in their
scope, and both are generally assumed to be more frequent in speech or conceptually spoken
language [
        <xref ref-type="bibr" rid="ref14 ref32 ref34">14, 34, 32</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>4.1. Modal particles</title>
        <p>
          German modal particles are used to express the author’s attitude towards a proposition or
to make assumptions concerning the “common ground”, the shared knowledge of author and
reader [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], but they do not affect the truth conditions of a sentence [37]. Example (1) illustrates
this function: in (1-a), the modal particle ‘ja’ is used to indicate that the fact that the author
is writing is known from the sentence before. In (1-b), ‘doch’ is used to express the author’s
(negative) attitude towards the idea that only fathers who are in fact not able to pay refrain
from paying.
        </p>
        <p>(1) a. Wenn ich schreibe, kann ich immerhin nicht einschlafen wie gestern beim
Staatsanwalt. Allerdings bekomme ich trotzdem nicht mit was passiert, weil ich ja schreibe.
‘While I’m writing I cannot fall asleep, as happened yesterday at the prosecutor’s
office. But I still don’t get what’s happening because I’m JA writing.’
(blogposts-5487-3)<sup>4</sup></p>
        <p>(1) b. @[USERNAME] Zu denken, nur diejenigen Väter würden nicht zahlen, die es nicht
können, ist doch völlig weltfremd.
‘@[USERNAME] It is DOCH naive to think that only fathers who are not able to
pay won’t do so.’ (tweets-1123)</p>
        <p>Because one modal particle can express multiple functions, its meaning
can vary across contexts. Additionally, not all modal particles have
an exact match in other languages, and they cannot be directly translated to English [<xref ref-type="bibr" rid="ref10">10</xref>].
Kratzer shows two examples of ‘ja’ that include a translation to English. In both cases, there is
no word that matches the meaning of ‘ja’ exactly; it is rather the function of the modal particle
that is translated:</p>
        <p>(2) Ich bin ja ein Einzelkind.
‘As you know, I am an only child.’</p>
        <p><sup>4</sup> If not indicated otherwise, examples are from our corpus.</p>
        <p>
          The use of modal particles varies between individuals and linguistic modes [
          <xref ref-type="bibr" rid="ref34 ref11">34, 11</xref>
          ]. We
therefore expect to see differences in particle use between different authors, but also between
different media and registers. Figure 1 shows the distribution of the ten most frequent modal
particles in our corpus divided by our register dimensions. As expected, there are differences
in how frequently modal particles are used in different media and registers. With
<inline-formula><tex-math>\chi^2 = 2188</tex-math></inline-formula>
and p &lt; 0.01, it can be assumed that there is a dependency between modal particle count and
medium/register. This indicates that modal particles can be used as a linguistic feature to
cluster documents by medium and register.
        </p>
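        <p>As an illustration of how a dependency between particle counts and medium/register groups can be tested, the following minimal sketch computes a Pearson chi-squared statistic for a contingency table in plain Python. It assumes that the reported statistic comes from a test of this kind; the function name and all counts are invented for illustration and are not taken from the corpus. A p-value would additionally require the chi-squared distribution (available, for example, via SciPy's chi2_contingency).</p>
        <preformat>
```python
# Pearson chi-squared statistic for a contingency table (pure Python sketch).
# Rows: particles; columns: medium/register groups. All counts are invented.

def chi_squared(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if particle choice and group were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts of two particles in two groups (e.g. blogs vs. tweets).
observed = [
    [30, 10],  # 'ja'
    [10, 30],  # 'doch'
]
stat, dof = chi_squared(observed)
print(f"chi2 = {stat:.2f}, dof = {dof}")
```
        </preformat>
        <p>The sketch assumes that every row and column total is nonzero; otherwise an expected count of zero would cause a division error.</p>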
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Intensifiers</title>
        <p>
          Intensifying particles (for short, ‘intensifiers’) can be used to boost or tone down the intensity
of a gradable expression or utterance [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Similar to modal particles, there is inter-individual
variation in the use of intensifiers, but intensifiers are subject to much more rapid change of use
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Even though they are assumed to be more frequent in speech than in written language,
it has been shown that they are frequently used in written social media [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], and they can be
found in our social media corpus as well (see (4)).
        </p>
        <p>(4) a. @[USERNAME] wieso kann ein Tattoo so brillante Farben haben? Wo hast du das
machen lassen?
‘@[USERNAME] How can a tattoo have such brilliant colours? Where did you get
it?’ (tweets-4677)</p>
        <p>(4) b. Da gibt es wirklich tolle Sachen - @[USERNAME]
‘There you can find really great things - @[USERNAME]’
(tweets-7846)</p>
        <p>
          An overview of German intensifiers can be found in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]; Breindl discusses (issues with) the
categorization of German intensifiers [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. A large-scale corpus study of intensifiers in spoken
German was conducted by Stratton, showing that intensifiers are used quite frequently in
spoken language and that the use of intensifiers varies by individual demographic characteristics
[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. This was previously shown for English intensifiers as well [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ].
        </p>
        <p>Based on the previous sociolinguistic results, we expect that the use of intensifiers in social
media varies by individual demographic factors of the authors (as shown for speech), but may
also vary by medium and register. Figure 2 shows the distribution of the ten most frequent
intensifiers in our corpus divided by our register dimensions. Similar to modal particles,
different intensifiers are used more or less frequently in different media or registers. With
<inline-formula><tex-math>\chi^2 = 1062.2</tex-math></inline-formula>
and p &lt; 0.01, it can be assumed that there is a dependency between intensifier count and
medium as well as register. This indicates that intensifiers, too, can be used as a linguistic
feature to cluster documents by medium and register.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Clustering</title>
      <p>It is our hypothesis that register influences the low-level linguistic choices in addition to the
medium or the author style. Starting from this hypothesis, we carry out a pilot study to cluster
texts in a data-driven way based on their linguistic features. We want to find out whether
these features enable us to distinguish registers from each other, e.g. rather than clustering
each user’s tweet collection with their blog posts (as would be expected if the features reflect
only individual linguistic style). The features we use are the per-sentence frequency of the top
10 modal particles and intensifiers found. We use the relative frequency of every feature to
take the different lengths of the blog posts/tweet collections into account. Each document is
represented by a vector containing the relative frequencies of the particles and intensifiers (see
Table 1).</p>
      <p>Each user’s texts are split by medium (blogs and tweets) as well as register into a minimum
of 2 (and a maximum of 4) documents per user. For example, the user 1095 is represented by
three documents: b_1095_I (containing all Informative blog posts), b_1095_N (Narrative blog
posts), and t_1095_I (containing all tweets, which were annotated as Informative). Document
names reflect the medium, user id, and register, in that order.</p>
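      <p>The document representation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripts used in the study: the feature list is a hypothetical four-item stand-in for the ten most frequent particles and intensifiers, the helper names are invented, and the tokens and sentence counts are toy data.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical feature inventory; the study uses the ten most frequent
# modal particles and intensifiers found in the corpus.
FEATURES = ["ja", "doch", "so", "wirklich"]


def document_name(medium, user_id, register):
    """Build a name such as 'b_1095_I' from medium, user id, and register."""
    return f"{medium}_{user_id}_{register}"


def feature_vector(particles, n_sentences):
    """Per-sentence relative frequency of each feature in one document."""
    counts = Counter(particles)
    return [counts[f] / n_sentences for f in FEATURES]


# Invented toy data: annotated particle tokens and sentence count per document.
docs = {
    document_name("b", 1095, "I"): (["ja", "ja", "so", "doch"], 8),
    document_name("t", 1095, "I"): (["so", "wirklich"], 4),
}
vectors = {name: feature_vector(tokens, n) for name, (tokens, n) in docs.items()}
for name, vec in vectors.items():
    print(name, vec)
```
      </preformat>
      <p>Dividing by the number of sentences keeps long blog documents and short tweet collections comparable, which is the purpose of using relative rather than absolute frequencies.</p>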
      <p>
        We used the agglomerative clustering algorithm implemented in Python’s scikit-learn
package [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Agglomerative clustering successively groups the most similar documents together,
until all clusters have been merged. Figure 3 shows the clustering results.
      </p>
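      <p>The merging procedure can be illustrated with a small plain-Python sketch. It is not scikit-learn's AgglomerativeClustering implementation, and the text does not state which linkage criterion or distance was used; average linkage and Euclidean distance are assumptions made here, and the document names and vectors are invented toy data.</p>
      <preformat>
```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)


def agglomerate(vectors, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Returns the remaining clusters and the merge history in order.
    """
    clusters = [(name,) for name in sorted(vectors)]
    history = []
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        history.append((c1, c2))
    return clusters, history


# Invented vectors (relative particle frequencies) for four toy documents.
vectors = {
    "b_1_N": [0.30, 0.10],
    "b_2_N": [0.28, 0.12],
    "t_1_I": [0.05, 0.40],
    "t_2_I": [0.08, 0.37],
}
clusters, history = agglomerate(vectors, n_clusters=2)
print(clusters)
print(history[0])
```
      </preformat>
      <p>Running the loop down to a single cluster yields the full merge history that a dendrogram visualizes; stopping at a fixed number of clusters corresponds to cutting the hierarchy at that level.</p>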
      <p>Out of nine groups of clusters, three contain mainly data points labelled as using Narrative
register (document names shown in blue), two contain mainly data points labelled as Narrative
or Informative (yellow) and one cluster contains data points labelled as Narrative or Persuasive
(red). The last three groups of clusters contain data points from all of the register dimensions in
equal amounts. Out of these nine groups of clusters, six contain data points labelled as coming
from blog posts and of the remaining three clusters, only one contains more data points labelled
as coming from tweet collections than from blog posts.</p>
      <p>Out of 64 pairs of documents that were clustered directly together, 22 had the same register
label. 18 out of these 22 cases are nodes where blog posts and tweet collections were clustered
together; 23 other cases are nodes which contain data points from the same medium. In one
node, neither medium, register, nor author is the same. There are only two cases where
the same user’s blog posts and tweets were clustered together. For one author, both documents
were also labelled as Informative (blog post and tweet collection), for the other author, one was
labelled as Informative and one as Narrative.</p>
      <p>Even though particles and intensifiers do not provide enough information to cluster documents
of the same register together in all cases, the algorithm still tends to cluster documents from the same register
together, as opposed to grouping the same user’s blog posts and tweet collection. This indicates
that medium and register influence how users write, and that writing in a specific register has
an impact on linguistic choice that is independent of the medium in which the user writes and of
their personal linguistic style.</p>
      <p>
        For evaluation, we compared the correlation of the clustering by linguistic features with the
register distribution on the one hand, and the medium on the other. As a quality measure, we
used the V-measure [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], which balances homogeneity (whether a cluster contains only
documents of one class, i.e. belonging to one register/medium) with completeness (to what extent
all documents from one class are put into the same cluster). We applied the V-measure
implemented in Python’s scikit-learn package to the comparison between a 20-cluster crosscut of the
hierarchical clustering shown in Figure 3, and the grouping by register/medium as indicated
by the document labels. The results show that the clustering corresponds more closely to the
grouping by medium (V = 0.2246) than the grouping by register (V = 0.0839). Homogeneity
of clusters is also higher for medium (0.5449) than for register (0.1419), though both show a
positive correlation.<sup>5</sup>
      </p>
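      <p>To make the relation between homogeneity, completeness, and the V-measure concrete, the following plain-Python sketch computes all three from two parallel label lists using conditional entropies, with equal weighting of both components. It is an illustrative re-implementation rather than scikit-learn's v_measure_score, and the class labels and cluster ids are invented toy data.</p>
      <preformat>
```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """H(targets | given) computed from two parallel label lists."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum(
        c / n * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(classes, clusters):
    """Return (homogeneity, completeness, V) with equal weighting."""
    h_c, h_k = entropy(classes), entropy(clusters)
    hom = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    com = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if hom + com == 0:
        return hom, com, 0.0
    return hom, com, 2 * hom * com / (hom + com)


# Invented toy data: medium labels of four documents and their cluster ids.
medium = ["b", "b", "t", "t"]
cluster_ids = [0, 1, 2, 2]
hom, com, v = v_measure(medium, cluster_ids)
print(f"homogeneity={hom:.4f} completeness={com:.4f} V={v:.4f}")
```
      </preformat>
      <p>In the toy data every cluster is pure, so homogeneity is maximal, but the blog documents are split over two clusters, which lowers completeness and therefore V. A similar effect may partly explain why, with a 20-cluster cut compared against only a few classes, homogeneity can be considerably higher than the V-measure.</p>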
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We proposed clustering documents in a multi-media and multi-register corpus of German
parenting bloggers by their usage of modal and intensifying particles. The method can be used to
generate bottom-up clusters of documents (based on linguistic features) and to compare these
clusters to groupings of the same documents by medium and register. We show that both the
medium and the register dimensions are reflected in the variation in our linguistic features. For
our feature groups, modal and intensifying particles, the medium has a bigger effect than the
register. We argue that clustering enables us to determine the relative importance
of individual author properties and text-level properties (medium and register) for the linguistic
expressions found in a text.</p>
      <p><sup>5</sup> A reviewer suggests that Twitter’s length restrictions for tweets lead to authors choosing shorter and, in general,
fewer intensifiers than in blog posts. This effect can also be seen for modal particles, though it is less strong. Thus,
the question arises whether conflating both types of particles to cluster the documents is reasonable. In fact, using
only intensifiers for clustering leads to a slightly better V-measure for grouping by register (V = 0.1293), but does
worse for grouping by medium (V = 0.1628). Using only modal particles leads to worse V-measure results than
using only intensifiers. The prevalence of the medium probably arises from this differing use of particles in both
media, possibly due to length restrictions in tweets.</p>
      <p>
        After having conducted this pilot study, we will apply this method to a different dataset to
test the reproducibility of our results. Another natural next step would be to integrate other
linguistic phenomena as features in the clustering. On the one hand, one could choose
phenomena that have been argued to vary based on register or medium. On the other hand, linguistic
variation in small-scale features has been used to account for individual author style, for
example in authorship attribution or author profiling [
        <xref ref-type="bibr" rid="ref16 ref31">16, 31</xref>
        ]. If the features proposed in authorship
analyses are integrated in our clustering account, it may be possible to tease apart influences
based on medium or register from individual author style choices.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We would like to thank Lesley-Ann Kern for discussion, and the annotators for their help in preparing the
corpus data. We are grateful to the anonymous reviewers for their helpful comments. Funded
by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project ID
317633480, SFB 1287.</p>
      <p>[37] M. Zimmermann. “Discourse Particles”. In: Semantics. Ed. by P. Portner, C. Maienborn,
and K. v. Heusinger. Vol. 2. Handbücher zur Sprach- und Kommunikationswissenschaft
HSK. Berlin: Mouton de Gruyter, 2011, pp. 2011–2038.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          .
          <source>Deutsche Jugendsprache: Untersuchungen zu ihren Strukturen und Funktionen</source>
          . Frankfurt a. M.: Peter Lang,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          . “
          <article-title>Mediatization and Sociolinguistic Change. Key Concepts, Research Traditions, Open Issues”</article-title>
          . In:
          <source>Mediatization and sociolinguistic change</source>
          . Ed. by
          <string-name>
            <given-names>J.</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          . Linguae &amp; litterae v.
          <volume>36</volume>
          . Berlin ; Boston: De Gruyter,
          <year>2014</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Biber</surname>
          </string-name>
          . “
          <article-title>Using Register-Diversified Corpora for General Language Studies”</article-title>
          . In:
          <source>Computational Linguistics</source>
          <volume>19</volume>.<issue>2</issue>
          (
          <year>1993</year>
          ), pp.
          <fpage>219</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Biber</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Conrad</surname>
          </string-name>
          .
          <source>Register, Genre, and Style</source>
          . 2nd ed. Cambridge Textbooks in Linguistics. Cambridge: Cambridge University Press,
          <year>2019</year>
          . doi:
          <pub-id pub-id-type="doi">10.1017/9781108686136</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Biber</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Egbert</surname>
          </string-name>
          . “
          <article-title>Register Variation on the Searchable Web: A Multi-Dimensional Analysis”</article-title>
          . In:
          <source>Journal of English Linguistics</source>
          <volume>44</volume>.<issue>2</issue>
          (
          <year>2016</year>
          ), pp.
          <fpage>95</fpage>
          -
          <lpage>137</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1177/0075424216628955</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
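        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bildhauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pankratz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          .
          <source>Corpus, Inference, and Models of Register Distribution</source>
          . Talk.
          <year>2021</year>
          .
        </mixed-citation>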
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Breindl</surname>
          </string-name>
          . “
          <article-title>Intensitätspartikeln”</article-title>
          . In:
          <source>Handbuch der deutschen Wortarten</source>
          . Berlin, New York: De Gruyter,
          <year>2007</year>
          , pp.
          <fpage>397</fpage>
          -
          <lpage>422</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Clarke</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Grieve</surname>
          </string-name>
          . “
          <article-title>Stylistic Variation on the Donald Trump Twitter Account: A Linguistic Analysis of Tweets Posted between 2009 and 2018”</article-title>
          . In:
          <source>Plos One</source>
          <volume>14</volume>.<issue>9</issue>
          (
          <year>2019</year>
          ). Ed. by
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Danforth</surname>
          </string-name>
          . doi:
          <pub-id pub-id-type="doi">10.1371/journal.pone.0222062</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>U.</given-names>
            <surname>Claudi</surname>
          </string-name>
          . “
          <article-title>Intensifiers of Adjectives in German”</article-title>
          . In:
          <source>Language Typology and Universals</source>
          <volume>59</volume>.<issue>4</issue>
          (
          <year>2006</year>
          ), pp.
          <fpage>350</fpage>
          -
          <lpage>369</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1524/stuf.2006.59.4.350</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Degand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cornillie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pietrandrea</surname>
          </string-name>
          , eds.
          <source>Discourse Markers and Modal Particles: Categorization and Description. Pragmatics &amp; beyond new series</source>
          volume
          <volume>234</volume>
          . Amsterdam ; Philadelphia: John Benjamins Publishing Company,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Diewald</surname>
          </string-name>
          . “Abtönungspartikel”.
          <source>In: Handbuch der deutschen Wortarten</source>
          . Berlin, New York: de Gruyter,
          <year>2009</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Fonteyn</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Nini</surname>
          </string-name>
          . “
          <article-title>Individuality in Syntactic Variation: An Investigation of the Seventeenth-Century Gerund Alternation”</article-title>
          .
          <source>In: Cognitive Linguistics 31.2</source>
          (
          <year>2020</year>
          ), pp.
          <fpage>279</fpage>
          -
          <lpage>308</lpage>
          . doi: 10.1515/cog-2019-0040.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Gibson</surname>
          </string-name>
          . “
          <article-title>The Theory of Affordances”</article-title>
          . In: The Ecological Approach to Visual Perception. Psychology Press. New York London: Taylor &amp; Francis Group,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hartmann</surname>
          </string-name>
          . “
          <article-title>Syntaktische Funktionen der Partikeln eben, eigentlich, einfach, nämlich, ruhig, vielleicht und wohl. Zur Grundlegung einer diachronischen Untersuchung von Satzpartikeln im Deutschen”</article-title>
          . In: Die Partikeln der deutschen Sprache. Ed. by
          <string-name>
            <given-names>H.</given-names>
            <surname>Weydt</surname>
          </string-name>
          . Berlin, New York: De Gruyter,
          <year>1979</year>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ito</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Tagliamonte</surname>
          </string-name>
          . “
          <article-title>Well Weird, Right Dodgy, Very Strange, Really Cool: Layering and Recycling in English Intensifiers”</article-title>
          .
          <source>In: Language in Society 32.2</source>
          (
          <year>2003</year>
          ), pp.
          <fpage>257</fpage>
          -
          <lpage>279</lpage>
          . doi: 10.1017/s0047404503322055.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tschuggnall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Specht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          . “
          <article-title>Overview of the Author Identification Task at PAN-2018: Cross-domain Authorship Attribution and Style Change Detection”</article-title>
          . In: Clef.
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Koch</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Oesterreicher</surname>
          </string-name>
          . “
          <article-title>Sprache der Nähe - Sprache der Distanz: Mündlichkeit und Schriftlichkeit im Spannungsfeld von Sprachtheorie und Sprachgeschichte”</article-title>
          .
          <source>In: Romanistisches Jahrbuch</source>
          <volume>36</volume>
          (
          <year>1985</year>
          ), pp.
          <fpage>15</fpage>
          -
          <lpage>43</lpage>
          . doi: 10.15496/publikation-20410.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kratzer</surname>
          </string-name>
          . “
          <article-title>Beyond 'Ouch' and 'Oops'. How Descriptive and Expressive Meaning Interact”</article-title>
          .
          <source>In: Cornell Conference on Theories of Context Dependency</source>
          (
          <year>1999</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D. Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          . “
          <article-title>Genres, Registers, Text Types, Domains and Styles: Clarifying the Concepts and Navigating a Path through the BNC Jungle”</article-title>
          .
          <source>In: Language Learning and Technology 5.3</source>
          (
          <year>2001</year>
          ), pp.
          <fpage>37</fpage>
          -
          <lpage>72</lpage>
          . url: https://ro.uow.edu.au/artspapers/598.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>MacKenzie</surname>
          </string-name>
          . “
          <article-title>Perturbing the Community Grammar: Individual Differences and Community-Level Constraints on Sociolinguistic Variation”</article-title>
          .
          <source>In: Glossa: a journal of general linguistics 4.1</source>
          (
          <year>2019</year>
          ). doi: 10.5334/gjgl.622.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nini</surname>
          </string-name>
          . “
          <article-title>An Authorship Analysis of the Jack the Ripper letters”</article-title>
          .
          <source>In: Digital Scholarship in the Humanities 33.3</source>
          (
          <year>2018</year>
          ), pp.
          <fpage>621</fpage>
          -
          <lpage>636</lpage>
          . doi: 10.1093/llc/fqx065.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>C. v.</given-names>
            <surname>Os</surname>
          </string-name>
          .
          <article-title>Aspekte der Intensivierung im Deutschen</article-title>
          .
          <source>Studien zur deutschen Grammatik 37. Tübingen: Narr</source>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          . “
          <article-title>Scikit-learn: Machine Learning in Python”</article-title>
          .
          <source>In: Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ), pp.
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Hirschberg</surname>
          </string-name>
          . “
          <article-title>V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure”</article-title>
          .
          <source>In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)</source>
          . Prague, Czech Republic: Association for Computational Linguistics,
          <year>2007</year>
          , pp.
          <fpage>410</fpage>
          -
          <lpage>420</lpage>
          . url: https://aclanthology.org/D07-1043.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Scheffler</surname>
          </string-name>
          . “
          <article-title>Conversations on Twitter”</article-title>
          .
          <source>In: Investigating Computer-Mediated Communication: Corpus-Based Approaches to Language in the Digital World</source>
          . Ed. by
          <string-name>
            <given-names>D.</given-names>
            <surname>Fišer</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Beißwenger</surname>
          </string-name>
          . Ljubljana: University Press,
          <year>2017</year>
          , pp.
          <fpage>124</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Scheffler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-A.</given-names>
            <surname>Kern</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Seemann</surname>
          </string-name>
          . “
          <article-title>Individuelle linguistische Variabilität in sozialen Medien”</article-title>
          .
          <source>In: Neue Entwicklungen in der Korpuslandschaft der Germanistik: Beiträge zur IDS-Methodenmesse 2022</source>
          . Ed. by
          <string-name>
            <given-names>M.</given-names>
            <surname>Kupietz</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          .
          <series>Korpuslinguistik und Interdisziplinäre Perspektiven auf Sprache (CLIP) 11</series>
          . Tübingen: Narr, forthcoming.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>E.</given-names>
            <surname>Schleef</surname>
          </string-name>
          . “
          <article-title>Individual Differences in Intra-Speaker Variation: T-Glottalling in England and Scotland”</article-title>
          .
          <source>In: Linguistics Vanguard</source>
          <volume>7</volume>
          .s2 (
          <year>2021</year>
          ). doi: 10.1515/lingvan-2020-0033.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>H.-J.</given-names>
            <surname>Schmid</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mantlik</surname>
          </string-name>
          . “
          <article-title>Entrenchment in Historical Corpora? Reconstructing Dead Authors' Minds from their Usage Profiles”</article-title>
          .
          <source>In: Anglia 133.4</source>
          (
          <year>2015</year>
          ), pp.
          <fpage>583</fpage>
          -
          <lpage>623</lpage>
          . doi: 10.1515/ang-2015-0056.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>R.</given-names>
            <surname>Stalnaker</surname>
          </string-name>
          . “Assertion”.
          <source>In: Syntax and Semantics</source>
          <volume>9</volume>
          : Pragmatics. Ed. by
          <string-name>
            <given-names>P.</given-names>
            <surname>Cole</surname>
          </string-name>
          . Vol.
          <volume>9</volume>
          . New York, NY, USA: Academic Press,
          <year>1978</year>
          , pp.
          <fpage>315</fpage>
          -
          <lpage>332</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Stratton</surname>
          </string-name>
          . “
          <article-title>Adjective Intensifiers in German”</article-title>
          .
          <source>In: Journal of Germanic Linguistics 32.2</source>
          (
          <year>2020</year>
          ), pp.
          <fpage>183</fpage>
          -
          <lpage>215</lpage>
          . doi: 10.1017/s1470542719000163.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Woodard</surname>
          </string-name>
          . “What Represents “style” in Authorship Attribution?” In: Coling.
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Tagliamonte</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Denis</surname>
          </string-name>
          . “
          <article-title>Linguistic Ruin? LOL! Instant Messaging and Teen Language”</article-title>
          .
          <source>In: American Speech 83.1</source>
          (
          <year>2008</year>
          ), pp.
          <fpage>3</fpage>
          -
          <lpage>34</lpage>
          . doi: 10.1215/00031283-2008-001.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Tagliamonte</surname>
          </string-name>
          . “
          <article-title>So Different and Pretty Cool! Recycling Intensifiers in Toronto, Canada”</article-title>
          .
          <source>In: English Language and Linguistics 12.2</source>
          (
          <year>2008</year>
          ), pp.
          <fpage>361</fpage>
          -
          <lpage>394</lpage>
          . doi: 10.1017/s1360674308002669.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>Thurmair</surname>
          </string-name>
          .
          <article-title>Modalpartikeln und ihre Kombinationen</article-title>
          .
          <source>Linguistische Arbeiten</source>
          <volume>223</volume>
          .
          Tübingen: M. Niemeyer
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wolfram</surname>
          </string-name>
          . “
          <article-title>Variation and Language: Overview”</article-title>
          .
          <source>In: Encyclopedia of Language &amp; Linguistics</source>
          . Elsevier,
          <year>2006</year>
          , pp.
          <fpage>333</fpage>
          -
          <lpage>341</lpage>
          . doi: 10.1016/b0-08-044854-2/04256-5.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          . “
          <article-title>Conceptualizing Perceived Affordances in Social Media Interaction Design”</article-title>
          .
          <source>In: Aslib Proceedings 65.3</source>
          (
          <year>2013</year>
          ), pp.
          <fpage>289</fpage>
          -
          <lpage>303</lpage>
          . doi: 10.1108/00012531311330656.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>