<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Building It-tok: an Italian TikTok Corpus</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luisa Troncone</string-name>
          <email>ltroncone@unisa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>STL CNRS UMR 8163, University of Lille, Rue du Barreau</institution>
          ,
          <addr-line>59650 Villeneuve d'Ascq</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Salerno</institution>
          ,
          <addr-line>Via Giovanni Paolo II, 84084 Fisciano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This contribution describes the process of building a corpus of TikTok discourse. In particular, it documents the choices made during the construction of It-tok, a corpus of Italian TikTok videos collected to identify the linguistic functional correlates of digital discourse on TikTok. It-tok includes two subsets of videos: the first is centered on videos concerning themes of interest for the public debate (e.g., abortion, femicide, racism, internal politics); the second is made up of videos with no specific theme and is intended as the control sample for the observations made on the first sub-corpus.</p>
      </abstract>
      <kwd-group>
        <kwd>TikTok</kwd>
        <kwd>public discourse</kwd>
        <kwd>CMC corpora</kwd>
        <kwd>modality</kwd>
        <kwd>linguistic functional correlates</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The major goal of this contribution is to present some of
the choices made during the process of building It-tok,
an Italian TikTok corpus. The It-tok project was born
with three aims:
<list list-type="bullet">
<list-item><p>to provide a first assessment of the linguistic
functional correlates (LFCs) displayed by
TikTok content, and, subsequently, of the
modality of communication on this specific
social network;</p></list-item>
<list-item><p>to highlight how themes of interest for the
public debate are treated on this social
network;</p></list-item>
<list-item><p>to compare the LFCs found in general TikTok
discourse with those emerging in thematically
focused content.</p></list-item>
</list></p>
      <p>By linguistic functional correlates, we mean the set of
features that characterize language across different
modalities; consequently, spoken texts exhibit specific
correlates compared to written, read, or digitally
produced texts (see Section 3 for a more in-depth
discussion). For instance, the LFCs that previous studies
identified for spoken language, as opposed to written
language, include a significantly different distribution of
PoS [1] and a higher count of deictics [2] or of
demonstratives [3]. LFCs thus describe the effects that the
modality in which a communicative event takes place has on
language use. The focus on LFCs poses significant challenges
for corpus design, precisely because of the multimodality of
the platform’s content. Indeed, the final product, i.e., the
TikTok video, can take highly diverse forms and can employ a
number of semiotic means to convey its message: spoken
language, text on screen, music, sound effects, gestures,
visual editing techniques, or a combination of these. Beyond
thematic relevance, therefore, the selection process is
complicated by the inherently heterogeneous nature of TikTok
videos, and a number of choices had to be made to obtain a
selection of content that was both replicable in its
methodology and suitable for our objective. These choices are
of various kinds and must account for the platform’s
multimodal nature, which challenges both linguistic analysis
and methodological consistency from the outset. In this work,
we chose to focus solely on verbal content, although TikTok
would allow for analysis at a variety of other levels.</p>
      <p>To accomplish the goals illustrated above, we
decided to subdivide the collection stage into two
phases, one concerned with general discourse (Gen) and
one concerned with political and social discourse
(PolSo).</p>
      <p>During the collection, a number of methodological
issues arose; they are described here together with
the solutions we opted for. Before turning to these
decisions, we briefly introduce
TikTok and the reasons it was chosen.</p>
      <p>ORCID: 0000-0003-0791-8714.
© 2025 Copyright for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-4">
      <title>2. TikTok: Characteristics, Meaningfulness, Employment in Linguistics</title>
      <p>TikTok is a social media platform which allows for the
publication of video content. It has gained great popularity
in recent years, especially among the youngest parts
of the population [4][5]. A tiktok is a (usually) short
video: while the maximum duration the platform
allows for a tiktok is 10 minutes, the mean duration of
tiktoks is around 50 seconds<sup>2</sup>. This format can
also be found on other social media (e.g., reels on
Instagram); the reasons which led us to choose TikTok are
linked to the amount and the kind of popularity it has
recently reached, rather than to the specific format of the
content considered.</p>
      <p>
        With the beginning of the post-digital era, and the
intersection and overlap of online and offline lives,
digital content has begun to have a considerable effect on
our analog life [6]. This is especially true for themes of
public debate, and, in recent years, this influence has
been especially clear for TikTok content, which is
becoming central to political discourse. Some examples give an idea
of the importance TikTok has gained in the process of
public opinion building.
Consider the ban imposed by Donald Trump: as
newly elected President of the US, at the end of January 2025
he imposed the closure of the platform in the US, since
he held that the Chinese government was receiving
sensitive data about the US government and citizens
through TikTok users<sup>3</sup>. Given the amount of public
disagreement with this decision, also manifested
through many Americans signing up to Xiaohongshu,
another Chinese social network [7][8], Trump postponed the
closure, and TikTok went dark for only one day on
20 January. Conversely, political and social topics are
increasingly present in TikTok discourse: the political
importance of the platform can also be seen in its role in spreading
and documenting major political events and in boosting
discussion about major socio-political issues, such as the
Black Lives Matter protests (2020) [
        <xref ref-type="bibr" rid="ref8">9</xref>
        ], the killing of
Mahsa Amini (2022) [
        <xref ref-type="bibr" rid="ref7 ref9">10</xref>
        ], the war in Gaza
(2023-present) [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ], the #metoo movement (2020) [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ], and the
suspension by the University of Leuven of a rapist who was
found guilty but not sentenced (2025) [
        <xref ref-type="bibr" rid="ref12">13</xref>
        ].
        <fn id="fn2"><label>2</label><p>According to Statista, Average TikTok video length in 2023 and
2024.</p></fn>
        <fn id="fn3"><label>3</label><p>The Trump-TikTok controversy was already ongoing in 2020, during
his first administration.</p></fn>
        <fn id="fn4"><label>4</label><p>For a theoretical perspective on communication dynamics on
TikTok, see [<xref ref-type="bibr" rid="ref16">17</xref>].</p></fn>
      </p>
      <p>
        Given the importance social media now hold
in our society, there is no doubt that TikTok is
now a fundamental political means [
        <xref ref-type="bibr" rid="ref13">14</xref>
        ][
        <xref ref-type="bibr" rid="ref14">15</xref>
        ][
        <xref ref-type="bibr" rid="ref15">16</xref>
        ], which
makes it a viable field of study for our aims.
      </p>
      <sec id="sec-4-1">
        <title>2.1. TikTok and Linguistics</title>
        <p>
          Several studies in the field of linguistics (especially
acquisitional, clinical, and variational) have already
considered data coming from TikTok<sup>4</sup>. Many of them
concern specific domains and were built with an
ad hoc methodology, which usually does not aim at producing
an open resource. The main fields of application of these
studies are language learning and teaching
practices enhanced through TikTok [
          <xref ref-type="bibr" rid="ref17">18</xref>
          ][
          <xref ref-type="bibr" rid="ref18">19</xref>
          ] [
          <xref ref-type="bibr" rid="ref19">20</xref>
          ][
          <xref ref-type="bibr" rid="ref20">21</xref>
          ],
the study of code-switching dynamics detected on the
platform [
          <xref ref-type="bibr" rid="ref21">22</xref>
          ][
          <xref ref-type="bibr" rid="ref22">23</xref>
          ][
          <xref ref-type="bibr" rid="ref23">24</xref>
          ], language creativity [
          <xref ref-type="bibr" rid="ref24">25</xref>
          ][
          <xref ref-type="bibr" rid="ref25">26</xref>
          ], or
hate speech detection and moderation [
          <xref ref-type="bibr" rid="ref26">27</xref>
          ][
          <xref ref-type="bibr" rid="ref27">28</xref>
          ][
          <xref ref-type="bibr" rid="ref28">29</xref>
          ].
Still, a description of the communicative
modality/ies employed on TikTok, and especially in
tiktoks, is missing. Furthermore, corpora of Computer
Mediated Communication [
          <xref ref-type="bibr" rid="ref29">30</xref>
          ] have mainly
concentrated (and this is also true for TikTok studies) on
thematic corpora, which on their own can provide only a
partial portrait of the discourse on a platform. Considering
Italian CMC corpora alone,
the only example we were able to find of a methodology
leading to a generalist corpus is that of TWITA [
          <xref ref-type="bibr" rid="ref30">31</xref>
          ]<sup>5</sup>,
while others mainly exploit thematic hashtags
[
          <xref ref-type="bibr" rid="ref31">32</xref>
          ][
          <xref ref-type="bibr" rid="ref32">33</xref>
          ][
          <xref ref-type="bibr" rid="ref33">34</xref>
          ][
          <xref ref-type="bibr" rid="ref34">35</xref>
          ] or specific pages [
          <xref ref-type="bibr" rid="ref35">36</xref>
          ][
          <xref ref-type="bibr" rid="ref36">37</xref>
          ] for the
extraction. The reason for building a generalist (“control-like”)
corpus lies in the fact that, for assertions on
specific subsections (or thematic sections) to be solid,
they should be checked against how language is
generally used on the specific platform. It-tok aims at
providing both a description of the path chosen for the
creation of a generalist corpus of tiktoks and a
characterization of the modality displayed in this content
format.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3. Linguistic Functional Correlates</title>
      <p>As explained in the previous section, TikTok has
already been the object of a number of investigations in
linguistics. Still, a bottom-up description of
the functional features characterizing the platform is
missing. Most existing studies tend to adopt a top-down
approach, focusing on specific trends or phenomena,
without accounting for the underlying structural
features of TikTok communication as shaped by the
platform’s multimodal and technologically mediated
nature. Addressing this gap is one of the central aims of
the It-tok project, which seeks to identify TikTok’s
LFCs.
<fn id="fn5"><label>5</label><p>TWITA exploits an extraction method which would not have been
very effective for tiktoks, as it is based on the extraction of tweets
containing the most frequent Italian words, whereas tiktoks cannot be extracted
based on the words in the video, since automatic subtitles are not
searchable.</p></fn></p>
      <p>
        The LFCs of a specific modality of communication
consist of the set of features which primarily describe
that modality and characterize it with respect to
others [
        <xref ref-type="bibr" rid="ref37">38</xref>
        ]. By modality, we mean the combination of
semiotic resources (e.g., speech, gesture, text, image,
sound), interactional dynamics (e.g., synchronicity,
turn-taking), and cognitive constraints (e.g., processing
time, spontaneity) that shape linguistic production in a
given environment. For instance, the spoken modality is
typically associated with the gesture-auditory-visual
channel, real-time interaction, prosody, and a high
degree of context-dependence. In contrast, written
modalities tend to involve planning, permanence, and
syntactic density, often favoring nominal constructions
over verbal ones [
        <xref ref-type="bibr" rid="ref38">39</xref>
        ].
      </p>
      <p>It is important to distinguish
between modality and channel. While channel refers
specifically to the physical means of transmission (e.g.,
auditory, visual, tactile), modality encompasses the
broader communicative framework that includes social
conventions, technological constraints, and the
multimodal configuration of the medium. In the case of
TikTok, the modality is particularly complex and hybrid,
since it combines features of spoken interaction (e.g.,
spontaneous speech, direct address to an audience) with
elements of edited visual media (e.g., cuts, overlays,
subtitles, background music), thereby creating a
composite, dynamic communicative environment.</p>
      <p>As noted in the literature on this topic, LFCs do not
depend on sociolinguistic features of speakers; instead,
they stay the same across diastratically and
diatopically different speakers. For this very reason, the
construction of It-tok could set aside issues of
sociolinguistic representativeness,
focusing instead on capturing the linguistic regularities
that emerge specifically from the platform’s multimodal
communicative modality. The primary goal was to
ensure that the corpus would be suitable for identifying
these modality-driven patterns, rather than for mapping
speaker-based variation.</p>
    </sec>
    <sec id="sec-6">
      <title>4. Building It-tok</title>
      <p>
        Some issues with building a corpus from TikTok videos
have already been pointed out in [
        <xref ref-type="bibr" rid="ref39">40</xref>
        ]: namely, the
authors refer to the different formats of the videos, the need
for manual supervision of the automatic transcriptions, and
ethical considerations. Throughout our work, we tried
to address these issues by making choices that were
as solid as possible.
      </p>
      <p>To identify the LFCs of TikTok discourse (subcorpus Gen),
and to compare them to the LFCs of that sub-part of
TikTok discourse which concerns themes of interest for
the public debate (subcorpus PolSo), we proceeded
through a two-phase data collection.</p>
      <sec id="sec-6-1">
        <title>4.1. Corpus Building Process</title>
        <p>The TikTok API allows for the extraction of a maximum of
100 videos per request, which must come from a 30-day
time period, so the extraction was carried out month by
month. The affordances of this research API do not
allow for queries of tokens within the automatically
generated captions (which would have been the
preferred path), but they do allow for querying hashtags.
TikTok shares several characteristics with
other platforms. One of these is the affordance of
hashtags. Hashtags are (short strings of) words which
function as hyperlinks and link a piece of content directly to
others containing the same hashtag. Most hashtags
are thematic, in the sense that they describe the topic of
the content. But this is not the only function they have
on social media: hashtags can also be exploited
to gain followers or views, and in this case their form is
somewhat different. While thematic hashtags
display no particularities on TikTok,
hashtags with this second function usually have
a very transparent form on other platforms
(#followforfollow, #followme). This is not the case for
TikTok hashtags. Here hashtags are exploited by users
in a way which, according to them, would boost the
algorithm and make them gain more views<sup>6</sup>, but their
form is far less transparent: we find #foryou,
#fyp, #perte, which all refer to the for you page of the app.
This type of hashtag is by far the most used on TikTok<sup>7</sup>,
compared to thematic ones<sup>8</sup>. The so-called for you page
(It. per te), so commonly cited in hashtags, is the main
page of the app, where users get the content TikTok
suggests to them based on what they liked or watched for
longer<sup>9</sup>. What distinguishes the fyp from the other
scrollable pages is the fact that in the fyp users are
reached by content not necessarily published by people
they follow. Therefore, to get into other people’s fyp
means to get more visibility on the app. For this reason,
users tend to exploit hashtags connected to the fyp.
<fn id="fn6"><label>6</label><p>This was also confirmed by a questionnaire we distributed during March
2025, concerned with the use of hashtags on TikTok.</p></fn>
<fn id="fn7"><label>7</label><p>TikTok for Business,
https://ads.tiktok.com/business/creativecenter/inspiration/popular/hashtag/pc/en.</p></fn>
<fn id="fn8"><label>8</label><p>The same questionnaire cited in note 6 revealed that users of
TikTok are well aware that there are hashtags specifically useful (or
thought to be useful) for TikTok content rather than for other social
media.</p></fn>
<fn id="fn9"><label>9</label><p>An equivalent of the TikTok fyp is the search page on Instagram,
or the tl (timeline) on X.</p></fn></p>
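        <p>The month-by-month querying described at the beginning of this subsection can be sketched as follows. This is only an illustrative sketch and not the script used for It-tok: the function name <monospace>monthly_windows</monospace> and the <monospace>max_days</monospace> parameter are hypothetical, and the sketch assumes that the 30-day limit counts both the first and the last day of a window, so that a 31-day month is split into two windows.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import calendar
from datetime import date, timedelta


def monthly_windows(first_year, first_month, last_year, last_month, max_days=30):
    """Return (start, end) date pairs covering each calendar month.

    Each window spans at most max_days days, both ends included
    (an assumption about how the 30-day limit is counted).
    """
    windows = []
    year, month = first_year, first_month
    while (year, month) <= (last_year, last_month):
        month_end = date(year, month, calendar.monthrange(year, month)[1])
        start = date(year, month, 1)
        while start <= month_end:
            end = min(start + timedelta(days=max_days - 1), month_end)
            windows.append((start, end))
            start = end + timedelta(days=1)
        # Move on to the next calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return windows


# Collection period reported in the text: October 2024 to January 2025.
windows = monthly_windows(2024, 10, 2025, 1)
for start, end in windows:
    print(start.isoformat(), end.isoformat())
```
]]></preformat>
        <p>Each resulting window would then correspond to one or more requests of at most 100 videos each.</p>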
        <p>Another way to boost the popularity of one’s
content consists in using thematic trending or popular
hashtags, even for videos which have nothing to do with
them.</p>
        <p>All these features considerably affected our
data retrieval methodology, which had to take into account
the peculiarities of the platform.</p>
        <p>Because of the peculiarities of TikTok hashtags, we
paid special attention to the hashtags used
for the query: in particular, we had to avoid both keywords
whose scope was too broad and those
whose scope was too restricted, since with the latter we would
have risked ending up with no results. We extracted a
minimum of 15 videos per month for each of the
subcorpora, selecting them by duration (&gt;60s) and
region of publication ("IT"=Italy). The videos extracted
were all published between October 2024 and January
2025. The extraction was performed during February
and March 2025. Among the videos retrieved, only the
ones showing the voice_to_text feature, namely the
TikTok automatic transcriptions, were considered viable
for It-tok. This way we could avoid video memes
(usually shorter than 60s), videos where the
message is carried by the music rather than by speech,
and videos in which there is no speech at all. Note that
we did not use the automatic transcription as the final
transcript: its presence was employed solely as a filter to
exclude videos that did not contain any
spoken language. This way, we isolated the materials
containing spoken language, whether continuous or
discontinuous, explicitly excluding content such as
memes, image carousels, or other materials lacking
spoken language.</p>
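        <p>The three selection criteria above (duration, region, presence of an automatic transcription) can be expressed as a simple filter over video metadata. The sketch below is illustrative only: the dictionary keys <monospace>duration</monospace>, <monospace>region_code</monospace> and <monospace>voice_to_text</monospace> are assumed field names for one metadata record, and the sample records are invented.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def is_viable(video, min_duration=60, region="IT"):
    """Check one metadata record against the three selection criteria.

    The keys "duration", "region_code" and "voice_to_text" are assumed
    field names, used here for illustration only.
    """
    # The transcription is used only as a filter: it must exist and be non-empty.
    has_transcript = bool((video.get("voice_to_text") or "").strip())
    return (
        video.get("duration", 0) > min_duration
        and video.get("region_code") == region
        and has_transcript
    )


# Invented sample records.
sample = [
    {"id": "a", "duration": 95, "region_code": "IT", "voice_to_text": "ciao a tutti"},
    {"id": "b", "duration": 20, "region_code": "IT", "voice_to_text": "ciao"},
    {"id": "c", "duration": 120, "region_code": "IT", "voice_to_text": ""},
    {"id": "d", "duration": 80, "region_code": "FR", "voice_to_text": "bonjour"},
    {"id": "e", "duration": 70, "region_code": "IT"},
]
viable_ids = [v["id"] for v in sample if is_viable(v)]
print(viable_ids)
```
]]></preformat>
        <p>Only record <monospace>a</monospace> passes: the others are too short, lack a transcription, or come from another region.</p>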
        <p>
          Finally, we obtained a total of 196 viable videos. These
videos were automatically downloaded and transcribed
with OpenAI Whisper in Python [
          <xref ref-type="bibr" rid="ref40">41</xref>
          ], both
in aligned .txt and .eaf files. The transcriptions were
annotated using the CLIPS [
          <xref ref-type="bibr" rid="ref41">42</xref>
          ] standard [
          <xref ref-type="bibr" rid="ref42">43</xref>
          ]. We
decided to add some tags, which we thought would be
useful for detecting specific sections of the texts. Table 1
summarizes the tags to be found in the annotated
transcription.
        </p>
        <p>
          Finally, the .txt files were automatically tagged
(with spaCy [
          <xref ref-type="bibr" rid="ref43">44</xref>
          ]), producing a .conllu file.
        </p>
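        <p>To sum up, for each video It-tok provides:
<list list-type="bullet">
<list-item><p>a .mp4 file;</p></list-item>
<list-item><p>a .txt file;</p></list-item>
<list-item><p>an antr.txt file;</p></list-item>
<list-item><p>a .conllu file;</p></list-item>
<list-item><p>a .eaf file.</p></list-item>
</list></p>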
        <p>
          The PROPN tags in the CoNLL-U files were exploited to carry
out the anonymization of the files.
For now, the CoNLL-U files have been checked only for the
PoS and lemma columns. Here we also tagged discourse
markers (DMs), in order to make them easily retrievable.
We chose to tag DMs because we thought they could
provide a measure of the extent to which TikTok
discourse can be compared to spoken language, and
because they are also among the features which make
up LFCs [
          <xref ref-type="bibr" rid="ref38">39</xref>
          ].
        </p>
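        <p>A PROPN-based anonymization of a CoNLL-U file can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure actually applied to It-tok: the function name <monospace>anonymize_conllu</monospace> and the placeholder string <monospace>ANON</monospace> are hypothetical, and only the FORM and LEMMA columns of tokens tagged PROPN are replaced.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def anonymize_conllu(text, placeholder="ANON"):
    """Replace FORM and LEMMA of every PROPN token in a CoNLL-U string."""
    out = []
    for line in text.splitlines():
        # Comment lines and blank sentence separators are copied unchanged;
        # a comment such as "# text = ..." would need separate treatment.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        # CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
        if len(cols) == 10 and cols[3] == "PROPN":
            cols[1] = placeholder
            cols[2] = placeholder
        out.append("\t".join(cols))
    return "\n".join(out)


# Invented two-token sentence.
example = "\n".join([
    "# sent_id = 1",
    "1\tCiao\tciao\tINTJ\t_\t_\t0\troot\t_\t_",
    "2\tMarta\tMarta\tPROPN\t_\t_\t1\tvocative\t_\t_",
])
anonymized = anonymize_conllu(example)
print(anonymized)
```
]]></preformat>
        <p>Since this approach relies entirely on the PROPN tag, its output is only as reliable as the PoS tagging it starts from.</p>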
        <p>
          During the extraction, for both PolSo
and Gen, we noticed that the minimum number of
extractions needed to reach the minimum of 15
viable videos per month differed considerably from month to
month, as can be seen in Table 2. We supposed that this
depended on the period of the year to which the videos we were
extracting belonged. In particular, we decided to
extract videos from October, November and December
2024 and January 2025. As is well known, the
amount of posting and the quality of posts on social
media depend heavily on the time of posting. In
particular, during the last months of the year more
“seasonal” posting happens [
          <xref ref-type="bibr" rid="ref44">45</xref>
          ][
          <xref ref-type="bibr" rid="ref45">46</xref>
          ][
          <xref ref-type="bibr" rid="ref46">47</xref>
          ], which may be
due to specific festivities (Halloween, Christmas, New
Year’s Eve) or to the whole “end of the year wrapped”
period. This seasonal posting primarily consists of
videos that likely do not meet our extraction criteria, as
they are probably shorter than 60 seconds and/or lack
spoken language. Consequently, to check whether this was
contingent on the peculiarities of the months
considered, we attempted a subsequent extraction for
February and March 2025, which provided evidence in
favor of our hypothesis, as these months show a rate of
videos featuring voice_to_text similar to the one
displayed by January. This happens because the trends
that usually develop or spread at the end of the year
tend not to produce videos that would
be considered viable for our data (i.e., they are
usually short, with songs or media carrying the message
rather than words, and consequently do not feature
voice_to_text).
        </p>
        <p>[Table 2: PolSo and Gen, by month (Oct, Nov, Dec, Jan, Feb, Mar).]</p>
      </sec>
      <sec id="sec-6-1-1">
        <title>4.2. PolSo</title>
        <p>Our thematic section was collected by extracting videos
whose description included at least one of a list of
hashtags. Because hashtags are thematic in origin,
they usually have a general form, which made
us prefer them over keywords in video
descriptions, as the latter would have required a broader
consideration of their inflected and/or derived forms (e.g.,
femminista ‘feminist.SG’, femministe ‘feminist.PL.F’,
femministi ‘feminist.PL.M’, femminismo ‘feminism’).</p>
        <p>
          The selection of the hashtags was based
on our common sense as users, and on the most recent
surveys about what worries Gen Z<sup>10</sup> the most (especially
compared to Gen X), carried out by IPSOS in 2022<sup>11</sup> [
          <xref ref-type="bibr" rid="ref48">49</xref>
          ].
Furthermore, to ensure that this preference aligned with
the interests of Italian youth, we distributed a brief
online questionnaire via Google Forms to a random
sample of individuals under the age of 27, selected
through cluster sampling. The responses confirmed the
primary areas of interest and, to some extent, introduced
additional hashtags related to foreign policy, an area
that, however, is not currently taken into consideration
for It-tok. The themes mainly regard civil rights and
internal politics issues and can be subdivided into four
groups: environmental and ecological crisis, national
identities and policies, politicians, and social
intersectional rights. Table 3 shows the selected
hashtags, together with their category.
          <fn id="fn10"><label>10</label><p>Gen Z is the most present generation on TikTok [<xref ref-type="bibr" rid="ref47">48</xref>].</p></fn>
          <fn id="fn11"><label>11</label><p>The survey showed that while Gen X members are more
interested in themes such as taxes, (un)employment levels and the job
market, Gen Zers care more about the environment, education and
civil rights.</p></fn>
        </p>
        <table-wrap id="tab3">
          <label>Table 3</label>
          <caption><title>List of the hashtags chosen for the extraction</title></caption>
          <table>
            <thead>
              <tr><th>Category</th><th>Hashtags</th></tr>
            </thead>
            <tbody>
              <tr><td>environmental and ecological crisis</td><td>ecologismo ‘ecologism’, ecoansia ‘ecoanxiety’, ecoterrorismo ‘ecoterrorism’, overtourism, antispecismo ‘antispeciesism’, specismo ‘speciesism’, ecofemminismo ‘ecofeminism’</td></tr>
              <tr><td>national identities and policies</td><td>capitalismo ‘capitalism’, anticapitalismo ‘anticapitalism’, migrante ‘migrant’, migranti ‘migrants’, rifugiati ‘refugees’, antifascista ‘antifascist’, antirazzista ‘antiracist’, razzismo ‘racism’</td></tr>
              <tr><td>social intersectional rights</td><td>femminismo ‘feminism’, feminist, metoo, femminicidio ‘femicide’, patriarcato ‘patriarchy’, violenzadigenere ‘gender-based violence’, aborto ‘abortion’, misoginia ‘misogyny’, omofobia ‘homophobia’, transfobia ‘transphobia’, omolesbobitransfobia ‘homo-lesbo-bi-transphobia’, dirittiLGBT ‘LGBT rights’, abilismo ‘ableism’, grassofobia ‘fatphobia’, femminismointersezionale ‘intersectional feminism’, intersezionale ‘intersectional’, privilegio ‘privilege’, woke</td></tr>
              <tr><td>politicians’ names</td><td>giorgiameloni, governomeloni, matteosalvini, giuseppeconte, ellyschlein, antoniotajani</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Following the questionnaire, we plan to expand
the PolSo section with themes from foreign
politics<sup>12</sup>.
          <fn id="fn12"><label>12</label><p>Since LFCs do not depend on the theme, they should not be
affected by the specific topic of the video, so they should remain the
same as the ones we will extract from PolSo as it is.</p></fn>
        </p>
      </sec>
      <sec id="sec-6-1-2">
        <title>4.3. Gen</title>
        <p>As regards the generalist section, our modus operandi
was completely different. Since we could not find any
reasonably exhaustive methodology for building a generalist
TikTok corpus, we opted for a
format-based extraction strategy. Specifically, we
selected three formats that are widely used on the platform
and that differ
primarily in their degree of (perceived)
interactionality: storytimes, answers, and stitches.
The extraction of the last two types was
based on the external caption TikTok automatically
produces when a video is created in these formats:
namely, risposta a ‘answer to’ and #stitch con ‘stitch
with’. Storytimes were extracted through hashtags.
In order to maintain internal consistency and
comparability, we also aimed to keep the duration of the
videos across the three formats as uniform as possible.</p>
        <sec id="sec-6-1-2-1">
          <title>4.3.1. Storytime</title>
          <p>The first format we exploited was the storytime,
extracted through the corresponding hashtag. A
storytime video shows a person, usually speaking
directly to the camera, telling a story, usually from their
personal life and unsolicited by anyone. Therefore,
storytimes are strongly monological. They constitute a
format of their own on a number of platforms<sup>13</sup>, and
have also been published on YouTube since 2015.
            <fn id="fn13"><label>13</label><p>The massive presence of such a format is linked to its efficiency
in being an instrument for creating online communities [<xref ref-type="bibr" rid="ref49">50</xref>].</p></fn>
          </p>
        </sec>
        <sec id="sec-6-1-2-2">
          <title>4.3.2. Answer</title>
          <p>The second format we selected consists of answers.
In these videos, the creator selects a comment on one of
their videos and answers it through a
tiktok, rather than by writing another comment.
This affordance is exploited for two reasons: either
because writing would have taken too much time, or
because a video answer is content in itself, whereas a
written answer is not, and the TikTok algorithm is said to
boost accounts that post frequently. Since answers
directly refer to one comment, they can be considered
more interactional than storytimes. This can also
be seen in the linguistic features they display:
more deictic expressions and more second person pronouns and
verb forms.</p>
        </sec>
        <sec id="sec-6-1-2-3">
          <title>4.3.3. Stitch</title>
          <p>Stitches are somewhere in the middle between answers
and storytimes. A stitch is a video which starts
with/from a section of another video, usually lasting no
more than 10 seconds, by another creator<sup>14</sup>. The stitched
video (i.e., the original one) can either be used as a basis
for speaking one’s mind on a subject introduced by the
stitched video itself, which then merely clarifies the
context, or have the same function as the base
comment in answer videos; in the latter case, as
happens for answer videos, linguistic features include
second person pronouns and verb forms. So, because of
their very nature, stitches can be seen as a format more
interactional than storytimes, but less interactional than
answers.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>4.4. Semi-supervised Automatic Collection</title>
        <p>All the processes the videos went through had to be
checked manually, as automated processing turned out
not to be always completely reliable. As for the
transcriptions, this could be due to the quality of the
sound, the presence of dialect or of strong regional
markedness, or the (highly variable) speech rate.
Regarding the tagging, as mentioned earlier (4.1), we
have not yet checked the syntactic tagging,
but the PoS tagging and lemmatization outputs had to
be manually checked, as they presented some
inaccuracies. Lemmatization had to be corrected
especially in cases such as:
<list list-type="bullet">
<list-item><p>less frequent verbs or verb forms, e.g., tatuo
‘tattoo.PRS.1SG’ was lemmatized as tatuere
instead of tatuare, and future forms like ripeterai
‘repeat.FUT.2SG’ were lemmatized as ripeterai instead
of ripetere;</p></list-item>
<list-item><p>irregular verb forms, e.g., sai ‘know.PRS.2SG’
was lemmatized as saare instead of sapere;</p></list-item>
<list-item><p>verb forms displaying suppletion, e.g., vai
‘go.PRS.2SG’ was lemmatized as vai instead of
andare.</p></list-item>
</list></p>
        <p>Regarding PoS tagging, the main issues concerned:
<list list-type="bullet">
<list-item><p>loans, marked as proper names;</p></list-item>
<list-item><p>big numbers, such as years, which in the
transcription were written in words (e.g., not
20 but twenty) and were marked as proper
names;</p></list-item>
<list-item><p>deverbal nouns, e.g., (il) ritorno ‘(the) return’,
tagged as VERB instead of NOUN.</p></list-item>
</list></p>
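        <p>Recurring lemmatization errors of the kind listed above could be corrected semi-automatically with a lookup table of manual fixes, leaving all remaining tokens for manual inspection. The sketch below is illustrative and is not the procedure used for It-tok: the names <monospace>LEMMA_FIXES</monospace> and <monospace>fix_lemmas</monospace> are hypothetical, and the table contains only the four examples reported in this section.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Manual corrections taken from the examples in the text:
# (form, wrong lemma) -> corrected lemma.
LEMMA_FIXES = {
    ("tatuo", "tatuere"): "tatuare",
    ("ripeterai", "ripeterai"): "ripetere",
    ("sai", "saare"): "sapere",
    ("vai", "vai"): "andare",
}


def fix_lemmas(tokens, fixes=LEMMA_FIXES):
    """Return (form, lemma, upos) triples with known lemma errors corrected."""
    fixed = []
    for form, lemma, upos in tokens:
        key = (form.lower(), lemma.lower())
        # Only verbs are corrected, since all listed errors concern verb forms.
        if upos in ("VERB", "AUX") and key in fixes:
            lemma = fixes[key]
        fixed.append((form, lemma, upos))
    return fixed


# Invented tagger output for a short utterance.
tagged = [("Sai", "saare", "VERB"), ("dove", "dove", "ADV"), ("vai", "vai", "VERB")]
corrected = fix_lemmas(tagged)
print(corrected)
```
]]></preformat>
        <p>Keying the table on both the form and the wrong lemma keeps the correction conservative: a token is changed only when it reproduces an error that has already been verified by hand.</p>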
      </sec>
    </sec>
    <sec id="sec-8">
      <title>5. Current Status and Future Perspectives</title>
      <sec id="sec-8-1">
        <title>5.1. Current status of It-tok</title>
        <p>
          As of April 2025, we had extracted a total of 196 videos, with
a total duration of more than 7 hours (see Table 4 for a
more detailed breakdown). We are in the process of analyzing some
LFCs, focusing on particular lexicosyntactic traits.
We are well aware that a number of
sociophonetic and morphological features could also be
analyzed, but we chose to focus only on some levels. In
the meantime, It-tok is expected to be published online by the
beginning of 2026. Concerning the publication and the
treatment of data, we considered what was done
in other studies based on TikTok data.
In particular, since the text data cannot be traced back
directly to the original videos, and since the content
we extracted is publicly accessible online, the data is
to be considered in the public domain [
          <xref ref-type="bibr" rid="ref39">40</xref>
          ][
          <xref ref-type="bibr" rid="ref51">52</xref>
          ].
        </p>
        <p>
          14 According to TikTok support: “Stitch allows you to combine a
video on TikTok with one you're creating” [
          <xref ref-type="bibr" rid="ref50">51</xref>
          ].
        </p>
      </sec>
      <sec id="sec-8-2">
        <title>5.2. Preliminary Observations</title>
        <p>Table 6 confirms what we asserted about the way
hashtags are used on TikTok. On the one hand, it is
true that some of the most used hashtags, excluding the
ones we explicitly searched videos for, are thematic; but
this is a very specific way of using hashtags, since the
only thematic ones we find are connected to PolSo themes. Videos
from PolSo, in fact, display an overabundance of
hashtags compared to the ones in Gen (i.e., 9,37 vs. 3,2
hashtags per video, on average). On the other hand, the most
frequent hashtags remain #perte and #fyp, both very
TikTok-specific and aimed at gaining views
and/or boosting the content through the algorithm.</p>
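        <p>The two measures discussed here, the mean number of hashtags per video and the most frequent hashtags once the search keys are excluded, can be sketched in a few lines of Python. The function name and the toy hashtag lists below are invented for illustration; they do not reproduce It-tok data or the author's scripts.</p>

```python
from collections import Counter
from statistics import mean


def hashtag_profile(videos, searched=()):
    """Summarize hashtag use for one subcorpus.

    videos: list of hashtag lists, one list per video.
    searched: hashtags used as search keys, excluded from the frequency count.
    """
    searched = {tag.lower() for tag in searched}
    per_video = mean(len(tags) for tags in videos)
    counts = Counter(
        tag.lower()
        for tags in videos
        for tag in tags
        if tag.lower() not in searched
    )
    return per_video, counts


# Invented toy data, not taken from It-tok.
POLSO = [["#aborto", "#perte", "#fyp", "#diritti"], ["#aborto", "#perte"]]
GEN = [["#perte"], ["#fyp", "#perte", "#storytime"]]

if __name__ == "__main__":
    for name, videos in (("PolSo", POLSO), ("Gen", GEN)):
        per_video, counts = hashtag_profile(videos, searched=["#aborto"])
        print(name, per_video, counts.most_common(2))
```

        <p>Excluding the search keys before counting keeps the ranking from being dominated by the hashtags that defined the sample in the first place.</p>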
        <p>
          Turning to LFCs, we preliminarily looked at the
distribution of PoS in the It-tok corpus. PoS were chosen
for a first assessment of some modality characteristics
because it has been shown that they correlate with the
spokenness of texts. In particular, nouns and verbs, and
their respective modifiers, adjectives and adverbs, have
been said to act as pivotal units in the construction of a
text. Their frequency offers significant insights into how
different types of texts are syntactically structured and
how modality influences linguistic composition.
Specifically, nouns tend to be more prominent in written
texts, while the frequency of verbs increases
progressively as one moves towards more natural
spoken language [
          <xref ref-type="bibr" rid="ref38">39</xref>
          ]. Such a tendency is also seen for
It-tok: Table 7 shows the PoS occurrence percentages of
It-tok, along with a comparison with the following corpora:
        </p>
        <p>
          • Lessico dell’Italiano Parlato (LIP), a corpus of
spoken Italian [
          <xref ref-type="bibr" rid="ref52">53</xref>
          ];
• Primo Tesoro della lingua italiana letteraria del
Novecento (PTLLI), a corpus of literary Italian
from the 1900s [
          <xref ref-type="bibr" rid="ref53">54</xref>
          ];
• Corpus Scritto (CS), a corpus of written Italian
[
          <xref ref-type="bibr" rid="ref54">55</xref>
          ].
        </p>
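        <p>[Table 7. PoS occurrence percentages of It-tok compared with LIP, PTLLI and CS. Surviving It-tok values: 17,9%; 5,6%.]</p>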
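        <p>PoS occurrence percentages of the kind compared across corpora here can be derived directly from the UPOS column of CoNLL-U files. The sketch below is only one possible way to do this; the function name and the four-token sample sentence are assumptions introduced for the example, not the procedure actually used for It-tok.</p>

```python
from collections import Counter


def pos_percentages(conllu_text):
    """Return {UPOS: percentage of tokens} for a CoNLL-U string."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Skip malformed lines, multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        counts[cols[3]] += 1
    total = sum(counts.values())
    if not total:
        return {}
    return {upos: 100 * n / total for upos, n in counts.items()}


# Invented sample sentence: "io vado a casa" ('I go home').
SENTENCE = "\n".join([
    "# text = io vado a casa",
    "1\tio\tio\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tvado\tandare\tVERB\t_\t_\t0\troot\t_\t_",
    "3\ta\ta\tADP\t_\t_\t4\tcase\t_\t_",
    "4\tcasa\tcasa\tNOUN\t_\t_\t2\tobl\t_\t_",
])

if __name__ == "__main__":
    for upos, share in sorted(pos_percentages(SENTENCE).items()):
        print(upos, round(share, 1))
```

        <p>Counting only lines with a plain integer ID avoids counting contracted forms twice, which matters for Italian articulated prepositions when they are split into syntactic words.</p>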
        <p>
          The data for LIP, PTLLI and CS in Table 7 are taken
from [
          <xref ref-type="bibr" rid="ref39">40</xref>
          ]. The corpora shown above are those on which
previous research was carried out, and for which the
tendencies mentioned had been observed. However, similar
tendencies also emerge from the CORIS [
          <xref ref-type="bibr" rid="ref55">56</xref>
          ], ItWac [
          <xref ref-type="bibr" rid="ref56">57</xref>
          ]
and Paisà [
          <xref ref-type="bibr" rid="ref57">58</xref>
          ] written corpora (see Table 8). Note that
the last two corpora were collected from online sources.
With respect to these as well, It-tok aligns more closely to
        </p>
        <p>[Table 8. PoS occurrence percentages in CORIS, ItWac and Paisà. Surviving column labels: NOUN, ADJ.]</p>
        <p>It must be considered, however, that the two
subcorpora differ greatly in the linguistic features
they display. As an example, Figure 1 shows the difference
in the occurrence of DMs, which are strongly associated
with the spoken modality, relative to the total number of tokens
in the different texts<sup>15</sup>.</p>
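        <p>A comparison of per-text DM rates between two subcorpora, for which the paper reports a Mann-Whitney test, can be sketched as follows. This is a minimal pure-Python version that uses a plain normal approximation without tie or continuity correction, so its p-values are only indicative; the function name and the DM rates are invented for the example, and a statistics package would normally be preferred.</p>

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test with a plain normal approximation.

    No tie or continuity correction is applied, so this is only a rough
    stand-in for a statistics package. Returns (U for sample_a, p-value).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # U counts the pairs in which the value from sample_a is larger;
    # tied pairs count as one half.
    u_a = sum((a > b) + 0.5 * (a == b) for a in sample_a for b in sample_b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value


# Invented DM rates (DM tokens divided by all tokens, one value per text).
POLSO_DM = [0.020, 0.030, 0.025, 0.018]
GEN_DM = [0.050, 0.060, 0.045, 0.070]

if __name__ == "__main__":
    u, p = mann_whitney_u(POLSO_DM, GEN_DM)
    print("U =", u, "p =", round(p, 4))
```

        <p>A rank-based test is a natural choice here because per-text rates from short videos are unlikely to be normally distributed.</p>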
        <p>As can be seen in Figure 1, even though the two
subcorpora are equally spoken, the
thematic one seems to employ a kind of speech that is
probably less hesitant. Further research will include a
differentiation based on the different themes within
PolSo, and based on different features between
PolSo and Gen.</p>
      </sec>
      <sec id="sec-8-3">
        <title>5.3. Future perspectives</title>
        <p>The next steps in building It-tok involve exploring
the language features found on TikTok (e.g., newly
imported constructions or neologisms), broadening It-tok
and its scope, and building a treebank of TikTok
discourse.</p>
        <p>Although It-tok is still a very small corpus, it
has the potential to reveal a number of linguistic
uses that are hard to find in traditional corpora, such as creative
uses or newly registered loans (see 1-3), for which It-tok
could also be searched.</p>
        <p>15 p &lt; 0.001 for the Mann-Whitney test.</p>
        <p>(1) venire blastato da mio nipote è stato meraviglioso (0125_S)
‘to be blasted by my nephew was wonderful’</p>
        <p>In (1), blastato &lt; blastare &lt; en. to blast stands for ‘getting
humiliated’ through words. An adapted loan can also be seen
in (2), where flexare sth. &lt; en. to flex stands for
‘show one’s ability in sth.’.</p>
        <p>(2) […] possibilità di flexare un po' di statistica (G0125_16)
‘[...] opportunity to flex some statistics’</p>
        <p>In (3), it is the whole passive construction that is
borrowed.</p>
        <p>(3) Un calciatore della Juventus è stato fatto outing (1224_F)
‘A Juventus player was outed’</p>
        <p>
          Another set of features, phonetic in nature, that could
thus be investigated concerns the so-called “influencer
accent”, which has been noticed around the internet but has
not yet been assessed [
          <xref ref-type="bibr" rid="ref58">59</xref>
          ][
          <xref ref-type="bibr" rid="ref59">60</xref>
          ].
        </p>
        <p>
          Furthermore, due to its informal nature, TikTok
could provide naturalistic data for a number of
areas of linguistic interest, e.g., neologisms and
gendered neologisms [
          <xref ref-type="bibr" rid="ref60">61</xref>
          ][
          <xref ref-type="bibr" rid="ref61">62</xref>
          ]<sup>16</sup>, or code
switching/mixing phenomena [
          <xref ref-type="bibr" rid="ref62">63</xref>
          ].
        </p>
        <p>The expansion we foresee for It-tok concerns Gen,
but also, in part, PolSo. Given the
methodology applied for the extraction, Gen could
easily be broadened systematically, making it a potential
monitor corpus for Italian TikTok discourse through a
yearly update. Furthermore, PolSo will be widened to
include themes of foreign policy. A further expansion
could concern a set of videos that were systematically
excluded by the present methodology, in particular
video memes, which could constitute the basis for
studies on innovative linguistic forms and could be
extracted through hashtags such as #memetok. The
pseudosuffix -tok can apply to any word X, i.e. X-tok,
standing for ‘section of TikTok regarding X’. Examples
of usage include booktok ‘section of TikTok regarding
books’, cattok ‘section of TikTok regarding cats’,
footballtok ‘section of TikTok regarding football’,
feministtok ‘section of TikTok dedicated to feminism’,
lefttok ‘section of TikTok filled with leftists’. From this
pseudosuffix, It-tok takes its name.</p>
        <p>16 In fact, some gendered neologisms, such as girl dinner, actually
originated from a trend on TikTok, and then spread to other
social networks.</p>
        <p>
          Finally, we will be building a treebank of at least 10%
of It-tok, based on the methodology implemented for the
KIPARLA forest project [
          <xref ref-type="bibr" rid="ref63">64</xref>
          ][
          <xref ref-type="bibr" rid="ref64">65</xref>
          ]. This will allow for
syntactic queries, and make visible the LFCs that
pertain to the syntactic level of analysis (e.g., types of
clauses, syntactic dependencies, subordination,
syntactic heaviness).
        </p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>6. Conclusions</title>
      <p>With this contribution, we aimed to provide a brief
overview of the methods adopted and the decisions
made during the construction of an Italian TikTok
corpus. Our choices were guided both by the specific
communicative dynamics of the platform and by our
research objectives, namely, to assess certain LFCs of
TikTok discourse and, where applicable, to distinguish
between generalist and thematic subtypes.</p>
      <p>It-tok is structured to represent the first generalist
corpus of spoken Italian on TikTok. Besides its main
aims concerning LFCs and the characteristics of online political and
social discourse, it can offer a way to open
TikTok to systematic linguistic studies, thanks to its
replicable methodology, which is also applicable to the creation of
comparable corpora for other languages.</p>
      <p>Because we aimed to describe LFCs in both general
TikTok discourse and discourse on political and social
topics, we adopted a split extraction strategy. Manual
supervision was required at all stages of the automated
processing to ensure consistency, accuracy, and
compliance with the criteria established for corpus
inclusion.</p>
      <p>
        Importantly, the multimodal nature of TikTok, as a
platform where language coexists with visual, auditory,
and gestural elements, means that its texts are
inherently complex and shaped by multiple interacting
variables. While these characteristics pose unique challenges
for corpus design and analysis, they also provide
valuable insights into modern digital communication
practices, both in terms of parallel communication
channels and the simultaneous use of multiple semiotic
modes to construct a message (e.g., use of emojis, or
particular visual rendering of verbal language, such as
the SpongeBob Mocking meme to convey derision [
        <xref ref-type="bibr" rid="ref65">66</xref>
        ]).
      </p>
      <p>Once completed, It-tok will provide a linguistically
annotated corpus of Italian TikTok discourse, featuring
transcriptions formatted according to the CLIPS
conventions and annotated at multiple levels, including
PoS tagging, morphological features, and syntactic
dependencies in UD.</p>
      <p>17 Website of the observatory available at URL:
https://olindinum.huma-num.fr.</p>
      <p>The corpus will also include a small but
representative treebank, offering structured syntactic
analyses of selected texts that reflect the linguistic
complexity of this emerging multimodal variety.</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgements</title>
      <p>I would like to thank Prof. Voghera, and the whole
linguistics nucleus of the University of Salerno, for the
suggestions and the discussions about the project.
This project is part of the program of education and
awareness-raising on online public discourse of the
OLiNDiNUM<sup>17</sup> - Observatoire LINguistique du DIscours
NUMérique project.</p>
    </sec>
    <sec id="sec-11">
      <title>A. Online Resources</title>
      <p>The corpus repository, which documents the corpus and
treebank construction processes and the challenges
encountered in the syntactic annotation of spoken data,
is available on GitHub
(https://github.com/cabinsix/ItTok). The repository will host the transcribed and
anonymized files, along with their corresponding
CoNLL-U formatted versions. The treebank is currently
under development and can be accessed via the
Arborator platform
(https://arborator.grew.fr/?#/projects/It-tok).</p>
      <p>Declaration on Generative AI
During the preparation of this work, the author(s) used ChatGPT (OpenAI) in order to: Paraphrase
and reword and Grammar and spelling check. After using these tool(s)/service(s), the author(s)
reviewed and edited the content as needed and take(s) full responsibility for the publication’s
content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Policarpi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rombi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Voghera</surname>
          </string-name>
          ,
          <article-title>Nomi e verbi in sincronia e diacronia: multidimensionalità della variazione</article-title>
          , in: A.
          <string-name>
            <surname>Ferrari</surname>
          </string-name>
          (Ed.),
          <article-title>Sintassi storica e sincronica dell'italiano. Subordinazione, coordinazione, giustapposizione. Atti del X Congresso della Società Internazionale di Linguistica e Filologia Italiana (Basilea, 30 giugno -</article-title>
          3 luglio
          <year>2008</year>
          ), Cesati, Firenze, vol.
          <source>I (543-560)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Sammarco</surname>
          </string-name>
          ,
          <article-title>Il contributo delle costruzioni senza verbo nell'espressione delle relazioni spaziali nel Parlato</article-title>
          , in Testi e Linguaggi,
          <volume>14</volume>
          (
          <year>2020</year>
          ),
          <fpage>91</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Gaudino-Fallegger</surname>
          </string-name>
          ,
          <article-title>I dimostrativi nell'italiano parlato</article-title>
          .
          <source>Wilhelmsfeld: Egert</source>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>TikTok user demographics: what's the average age of TikTok users? SOAX</article-title>
          . URL: https://soax.com/research/average-age-of-tiktokusers.
          <source>Last accessed on 28th April</source>
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>TikTok: distribution of global audience, by age and gender</article-title>
          . Statista. URL: https://www.statista.com/statistics/1299771/tiktok-global-user-age-distribution/
          .
          <source>Last accessed on 28th April</source>
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>I.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Gourlai</surname>
          </string-name>
          ,
          <article-title>Postdigital / More-Than-Digital Meaning-Making</article-title>
          ,
          <source>Postdigital Science and Education</source>
          <volume>6</volume>
          (
          <year>2024</year>
          ):
          <fpage>735</fpage>
          -
          <lpage>742</lpage>
          . doi.org/10.1007/s42438-024-00512-1
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>American Defiance Against TikTok Ban Fuels Surge in Alternative Social Media Platforms</article-title>
          , Legal News Feed. Last accessed on 26th April
          <year>2025</year>
          . URL: https://legalnewsfeed.com/2025/01/14/americandefiance-against-tiktok-ban-fuels-surge-inalternative-social-media-platforms/?
          <article-title>TikTok users in US flock to 'China's Instagram', RedNote, ahead of ban</article-title>
          , Al Jazeera. Last accessed on 26th April
          <year>2025</year>
          . URL: https://www.aljazeera.com/amp/economy/2025/1/15/tiktok-users-in-us-flock-to-chinas-instagramahead-of-ban
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>[9] TikTok serves as hub for #blacklivesmatter activism</article-title>
          ,
          <source>CNN. Last accessed on 26th April</source>
          <year>2025</year>
          . URL: https://edition.cnn.com/
          <year>2020</year>
          /06/04/politics/tiktok-black
          <article-title>-lives-matter/index</article-title>
          .html
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Walsh</surname>
          </string-name>
          , “
          <article-title>TikTok as a site of social protest in Iran's Gen-Z uprising</article-title>
          .”
          <source>Discourse &amp; Society, 35.5</source>
          (
          <year>2024</year>
          ):
          <fpage>625</fpage>
          -
          <lpage>650</lpage>
          . doi.org/10.1177/09579265241234351
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Abu</surname>
          </string-name>
          <string-name>
            <surname>Laban</surname>
          </string-name>
          , “
          <article-title>The Role of TikTok in Disseminating the Palestinian Narrative during the War on Gaza from the Perspective of Palestinian University Students</article-title>
          .”
          <source>Advances in Journalism and Communication</source>
          ,
          <volume>11</volume>
          (
          <year>2023</year>
          ):
          <fpage>394</fpage>
          -
          <lpage>408</lpage>
          . doi.org/10.4236/ajc.
          <year>2024</year>
          .123021
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Boyd</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>McEwan</surname>
          </string-name>
          ,
          <article-title>Viral paradox: The intersection of “me too” and #</article-title>
          <string-name>
            <surname>MeToo</surname>
          </string-name>
          , New Media &amp; Society, 26.6 (
          <year>2022</year>
          ):
          <fpage>3454</fpage>
          -
          <lpage>3471</lpage>
          . doi.org/10.1177/14614448221099187
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Leuven</given-names>
            <surname>Public</surname>
          </string-name>
          <article-title>Prosecutor appeals verdict of medical student rape case</article-title>
          .
          <source>The Brussels Times. Last accessed on 28th April</source>
          <year>2025</year>
          . URL: https://www.brusselstimes.com/1518910/leuvenpublic-prosecutor
          <article-title>-appeals-verdict-of-medicalstudent-rape-case</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>I.</given-names>
            <surname>Literat</surname>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          <article-title>Kligler-Vilenchik, TikTok as a Key Platform for Youth Political Expression: Reflecting on the Opportunities and Stakes Involved</article-title>
          .
          <source>Social Media + Society, 9.1</source>
          (
          <year>2023</year>
          )
          <article-title>: doi</article-title>
          .org/10.1177/20563051231157595
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>From</given-names>
            <surname>Viral</surname>
          </string-name>
          <article-title>Dances to Political Movements: The Impact of TikTok Challenges and Memes, Medium</article-title>
          .
          <source>Last accessed on 28th April</source>
          <year>2025</year>
          . URL: https://medium.com/%40wilsonrolypaul/
          <article-title>fromviral-dances-to-political-movements-the-impactof-tiktok-challenges-and-memes-609632842f3e</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [16]
          <article-title>The Weapon of the Century: Contemporary Politics Through the TikTok Algorithm, The Harvard Political Review</article-title>
          .
          <source>Last accessed on 28th April</source>
          <year>2025</year>
          . URL: https://theharvardpoliticalreview.com/tiktokpolitics-algorithm/
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marino</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          Surace (Eds.), TikTok.
          <article-title>Capire le dinamiche della comunicazione iper-social</article-title>
          .
          <source>Hoepli editore, Milano</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nu</surname>
          </string-name>
          <article-title>'man</article-title>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Indriana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ainul &amp; R. D. Hasti</surname>
          </string-name>
          .
          <article-title>Improving Verbal Linguistic Intelligence in Early Childhood Through the Use of Tiktok Media</article-title>
          ,
          <source>Jurnal Obsesi Jurnal Pendidikan Anak Usia Dini</source>
          ,
          <volume>6</volume>
          .3 (
          <year>2022</year>
          )
          <fpage>2316</fpage>
          -
          <lpage>2324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Fitria</surname>
          </string-name>
          ,
          <article-title>Value Engagement of TikTok: A Review of TikTok as Learning Media for Language Learners in Pronunciation Skill</article-title>
          . EBONY,
          <source>Journal of English Language Teaching</source>
          , Linguistics, and Literature,
          <volume>3</volume>
          .2 (
          <issue>2023</issue>
          ),
          <fpage>91</fpage>
          -
          <lpage>108</lpage>
          . doi.org/10.37304/ebony.v3i2.
          <fpage>9605</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Fitria</surname>
          </string-name>
          ,
          <article-title>Using TikTok application as an English teaching media: a literature review</article-title>
          ,
          <source>Journal of English Language Teaching, Applied Linguistics, and Literature</source>
          ,
          <volume>6</volume>
          .2 (
          <issue>2023</issue>
          ),
          <fpage>109</fpage>
          -
          <lpage>124</lpage>
          . doi.org/10.20527/jetall.v6i2.
          <fpage>16058</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>G.</given-names>
            <surname>Leon Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          &amp;
          <string-name>
            <surname>M. T. Feng</surname>
          </string-name>
          , TikTok Refugees,
          <article-title>Digital Migration, and the Expanding Affordances of Xiaohongshu (RedNote) for Informal Language Learning</article-title>
          ,
          <source>International Journal of TESOL Studies</source>
          (
          <year>2025</year>
          ),
          <volume>250123</volume>
          . doi.org/10.58304/ijts.250123
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Daulay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Nst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. R.</given-names>
            <surname>Ningsih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Beretu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Irham</surname>
          </string-name>
          &amp;
          <string-name>
            <surname>R. Mahmudah</surname>
          </string-name>
          ,
          <article-title>Code Switching in the Social Media Era: A Linguistic Analysis of Instagram and TikTok Users</article-title>
          ,
          <source>Humanitatis: Journal of Language and Literature</source>
          ,
          <volume>10</volume>
          .2 (
          <issue>2024</issue>
          ),
          <fpage>373</fpage>
          -
          <lpage>385</lpage>
          .
          <fpage>10</fpage>
          .30812/humanitatis.v10i2.
          <fpage>3837</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Investigating translanguaging strategies and online self-presentation through internet slang on Douyin (Chinese TikTok)</article-title>
          , Applied Linguistics Review,
          <volume>15</volume>
          .6 (
          <issue>2024</issue>
          ),
          <fpage>2823</fpage>
          -
          <lpage>2855</lpage>
          . doi.org/10.1515/applirev-2023
          <source>-0094</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>E.</given-names>
            <surname>Nurchurifiani</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Hanum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Damiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Oktariyani</surname>
          </string-name>
          ,
          <article-title>A code mixing usage on social media: a linguistic analysis of video on TikTok</article-title>
          , KLAUSA: Kajian Linguistik, Pembelajaran Bahasa,
          <source>dan Sastra - Journal of Linguistics, Literature, and Language Teaching</source>
          ,
          <volume>9</volume>
          .1 (
          <issue>2025</issue>
          ),
          <fpage>90</fpage>
          -
          <lpage>101</lpage>
          . doi.org/10.33479/klausa.v9i1.
          <fpage>1194</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ugoala</surname>
          </string-name>
          ,
          <article-title>Generation Z's lingos on TikTok: analysis of emerging structures</article-title>
          ,
          <source>Journal of Language of Communication</source>
          ,
          <volume>11</volume>
          .2 (
          <issue>2024</issue>
          ),
          <fpage>211</fpage>
          -
          <lpage>224</lpage>
          .
          <fpage>10</fpage>
          .47836/jlc.11.02.08
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tomenchuk</surname>
          </string-name>
          &amp; T. Tiushka,
          <article-title>The impact of TikTok on the English language: slang and trends</article-title>
          ,
          <source>Věda a perspektivy</source>
          ,
          <volume>11</volume>
          .42 (
          <year>2024</year>
          ),
          <fpage>441</fpage>
          -
          <lpage>447</lpage>
          . doi.org/10.52058/
          <fpage>2695</fpage>
          -1592-2024-
          <volume>11</volume>
          (
          <issue>42</issue>
          )
          <string-name>
            <surname>-</surname>
          </string-name>
          441-447
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Calhoun</surname>
          </string-name>
          &amp;
          <string-name>
            <surname>A. Fawcett</surname>
          </string-name>
          , “
          <article-title>They Edited Out her Nip Nops”: Linguistic Innovation as Textual Censorship Avoidance on TikTok</article-title>
          .
          <source>Language@Internet</source>
          ,
          <volume>21</volume>
          (
          <year>2023</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
          <fpage>10</fpage>
          .14434/li.v21.
          <fpage>37371</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Rauchberg</surname>
          </string-name>
          ,
          <article-title>#Shadowbanned: Queer, trans, and disabled creator responses to algorithmic oppression on TikTok</article-title>
          , in: P. Pain (Ed.),
          <source>LGBTQ digital cultures: A global perspective</source>
          (pp.
          <fpage>196</fpage>
          -
          <lpage>209</lpage>
          ). Routledge.
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>N.</given-names>
            <surname>Fadhilah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Suswanto</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Y. P.</given-names>
            <surname>Utami</surname>
          </string-name>
          ,
          <article-title>Forensic Linguistics: Netizens' Hate Speech Implicature on the Issue of the 2024 Presidential Election (TikTok Social Media Case Study)</article-title>
          ,
          <source>Technium Social Sciences Journal</source>
          ,
          <volume>50</volume>
          (
          <year>2023</year>
          ),
          <fpage>204</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>C.</given-names>
            <surname>Thurlow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lengel</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomic</surname>
          </string-name>
          ,
          <source>Computer Mediated Communication</source>
          , Sage Publications, London,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis on Italian tweets</article-title>
          , in: A. Balahur, E. van der Goot &amp; A. Montoyo (Eds.),
          <source>Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          (pp.
          <fpage>100</fpage>
          -
          <lpage>107</lpage>
          ), ACL,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Donati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Polidori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vernillo</surname>
          </string-name>
          , G. Gagliardi,
          <article-title>Building a corpus on Eating Disorders from TikTok: challenges and opportunities</article-title>
          , in:
          <source>Proceedings of the Ninth Italian Conference on Computational Linguistics</source>
          (CLiC-it 2023),
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>M.</given-names>
            <surname>Palermo</surname>
          </string-name>
          ,
          <article-title>La rappresentazione multimodale dei dialetti su TikTok</article-title>
          ,
          <source>Italiano LinguaDue</source>
          ,
          <volume>14</volume>
          .2 (
          <year>2023</year>
          ),
          <fpage>131</fpage>
          -
          <lpage>139</lpage>
          . doi.org/10.54103/2037-3597/19652
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>I.</given-names>
            <surname>Caiazzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Dimitri</surname>
          </string-name>
          &amp; L. Tronci,
          <article-title>IncluInstIT: Un nuovo corpus per lo studio di linguaggio inclusivo su Instagram</article-title>
          , in: S. Rebora, M. Rospocher &amp; S. Bazzaco (Eds.),
          <source>Diversità, Equità e Inclusione: Sfide e Opportunità per l'Informatica Umanistica nell'Era dell'Intelligenza Artificiale. Proceedings del XIV Convegno Annuale AIUCD 2025</source>
          (pp.
          <fpage>35</fpage>
          -
          <lpage>39</lpage>
          ), AIUCD, Verona,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Cignarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <article-title>TWITTIRÒ: a Social Media Corpus with a Multi-layered Annotation for Irony</article-title>
          , in: R. Basili,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          &amp; G. Satta (Eds.),
          <source>Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017</source>
          , 11-12 December 2017, Rome (pp.
          <fpage>101</fpage>
          -
          <lpage>106</lpage>
          ), Accademia University Press,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ferrini</surname>
          </string-name>
          ,
          <article-title>Il parlato-digitato dell'italiano come heritage language nei gruppi Facebook: riflessioni e modellizzazioni da un corpus multilingue</article-title>
          .
          <source>Italica</source>
          ,
          <volume>98</volume>
          .1 (
          <year>2021</year>
          ):
          <fpage>112</fpage>
          -
          <lpage>128</lpage>
          . doi.org/10.5406/23256672.98.1.08
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Martari</surname>
          </string-name>
          ,
          <article-title>Come scrivono i politici italiani su Facebook. Appunti per un'analisi linguistica comparativa</article-title>
          ,
          <source>L'Analisi Linguistica e Letteraria</source>
          ,
          <volume>26</volume>
          .2 (
          <year>2018</year>
          ),
          <fpage>81</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Luzón</surname>
          </string-name>
          ,
          <article-title>Forms and functions of intertextuality in academic tweets composed by research groups</article-title>
          ,
          <source>Journal of English for Academic Purposes</source>
          <volume>64</volume>
          (
          <year>2023</year>
          ),
          101254. doi.org/10.1016/j.jeap.2023.101254
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>M.</given-names>
            <surname>Voghera</surname>
          </string-name>
          ,
          <article-title>Dal parlato alla grammatica. Costruzione e forma dei testi spontanei</article-title>
          , Carocci, Roma,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>M.</given-names>
            <surname>Donati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vernillo</surname>
          </string-name>
          ,
          <article-title>La linguistica dei corpora nell'era dei social media: Le nuove sfide poste da TikTok</article-title>
          , in: S. Mattiola, M. Miličević Petrović (Eds.),
          <source>CLUB Working Papers in Linguistics</source>
          , volume
          <volume>8</volume>
          , University of Bologna, Bologna,
          <year>2024</year>
          , doi.org/10.6092/unibo/amsacta/8065
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Brockman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>McLeavey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Robust Speech Recognition via Large-Scale Weak Supervision</article-title>
          ,
          <source>International Conference on Machine Learning</source>
          (
          <year>2022</year>
          ). doi.org/10.48550/arXiv.2212.04356
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>F.</given-names>
            <surname>Albano Leoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sobrero</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Paoloni</surname>
          </string-name>
          ,
          <article-title>Corpora e lessici di italiano parlato e scritto (CLIPS)</article-title>
          ,
          <source>Bollettino di italianistica, Rivista di critica, storia letteraria, filologia e linguistica</source>
          <volume>2</volume>
          (
          <year>2007</year>
          ):
          <fpage>121</fpage>
          -
          <lpage>0</lpage>
          , doi: 10.7367/71826
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>R.</given-names>
            <surname>Savy</surname>
          </string-name>
          ,
          <article-title>CLIPS. Specifiche per la trascrizione ortografica annotata dei testi raccolti</article-title>
          .
          <source>Università del Salento</source>
          . URL: https://www.unisalento.it/documents/20152/5018562/Specifiche+per+la+trascrizione+ortografica.pdf/414d183f-fd6a-2d31-7fbe44ac7ff63772?version=1.0
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>M.</given-names>
            <surname>Honnibal</surname>
          </string-name>
          , I. Montani,
          <string-name>
            <given-names>S.</given-names>
            <surname>Van Landeghem</surname>
          </string-name>
          , &amp; A. Boyd,
          <source>spaCy: Industrial-strength Natural Language Processing in Python</source>
          (
          <year>2020</year>
          ). https://doi.org/10.5281/zenodo.1212303
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , G. Ande,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Toward Maximizing the Visibility of Content in Social Media Brand Pages: A Temporal Analysis</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          <volume>8</volume>
          .11 (
          <year>2018</year>
          ). doi: 10.1007/s13278-018-0488-z
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Takayasu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Havlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Takayasu</surname>
          </string-name>
          ,
          <article-title>Identifying long-term periodic cycles and memories of collective emotion in online social media</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>14</volume>
          .3 (
          <year>2019</year>
          ): e0213843. doi.org/10.1371/journal.pone.0213843
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>N.</given-names>
            <surname>Okano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Higashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <article-title>The Influence of Social Media Writing on Online Search Behavior for Seasonal Events: The Sociophysics Approach</article-title>
          ,
          <source>2018 IEEE International Conference on Big Data (Big Data)</source>
          , Seattle, WA, USA,
          <year>2018</year>
          ,
          <fpage>4339</fpage>
          -
          <lpage>4345</lpage>
          , doi: 10.1109/BigData.2018.8622186.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [48]
          <article-title>TikTok leads time spent on social for most US adults</article-title>
          , E-marketer. Last accessed on 27th April
          <year>2025</year>
          . URL: https://www.emarketer.com/content/tiktok-leadstime-spent-on-social-most-us-adults#
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [49]
          <string-name>
            <surname>IPSOS</surname>
          </string-name>
          ,
          <article-title>Elezioni politiche 25 settembre 2022: il confronto tra Generazione Z e Millennials</article-title>
          .
          Last accessed on 29th April
          <year>2025</year>
          . URL: https://www.ipsos.com/it-it/millenialsgenerazione-z-rapporto-giovani-politica-italia
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Papacharissi</surname>
          </string-name>
          ,
          <source>Affective Publics: Sentiment, Technology, and Politics</source>
          , Oxford University Press, Oxford,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [51]
          <article-title>What is a stitch</article-title>
          , TikTok support. Last accessed on 20th April
          <year>2025</year>
          . URL: https://support.tiktok.com/en/usingtiktok/creating-videos/stitch
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>S. S. C.</given-names>
            <surname>Herrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hallward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Duncan</surname>
          </string-name>
          ,
          <article-title>"This is just how I cope": An inductive thematic analysis of eating disorder recovery content created and shared on TikTok using #EDrecovery</article-title>
          ,
          <source>Int J Eat Disord</source>
          ,
          <volume>54</volume>
          .4 (
          <year>2021</year>
          ):
          <fpage>516</fpage>
          -
          <lpage>526</lpage>
          . doi: 10.1002/eat.23463.
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>T.</given-names>
            <surname>De Mauro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mancini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vedovelli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Voghera</surname>
          </string-name>
          ,
          <source>Lessico di frequenza dell'italiano Parlato (LIP)</source>
          , Etaslibri, Milano,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>T.</given-names>
            <surname>De Mauro</surname>
          </string-name>
          ,
          <source>Primo Tesoro della lingua italiana letteraria del Novecento (PTLLI)</source>
          , UTET, Torino,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mancini</surname>
          </string-name>
          ,
          <article-title>L'elaborazione automatica del corpus</article-title>
          , in: T. De Mauro,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mancini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vedovelli</surname>
          </string-name>
          , M. Voghera (Eds.),
          <source>Lessico di frequenza dell'italiano Parlato (LIP)</source>
          , Etaslibri, Milano,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rossini Favretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tamburini</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>C.</given-names>
            <surname>De Santis</surname>
          </string-name>
          ,
          <article-title>CORIS/CODIS: A corpus of written Italian based on a defined and a dynamic model</article-title>
          , in: A. Wilson,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rayson</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>T.</given-names>
            <surname>McEnery</surname>
          </string-name>
          (Eds.),
          <source>A Rainbow of Corpora: Corpus Linguistics and the Languages of the World</source>
          (pp.
          <fpage>27</fpage>
          -
          <lpage>38</lpage>
          ), München, LINCOM-Europa,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>M.</given-names>
            <surname>Baroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bernardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferraresi</surname>
          </string-name>
          , E. Zanchetta,
          <article-title>The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora</article-title>
          ,
          <source>Language Resources &amp; Evaluation</source>
          ,
          <volume>43</volume>
          ,
          <fpage>209</fpage>
          -
          <lpage>226</lpage>
          (
          <year>2009</year>
          ). doi.org/10.1007/s10579-009-9081-4
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>V.</given-names>
            <surname>Lyding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stemle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Borghetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brunello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Castagnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dittmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          , &amp;
          <string-name>
            <given-names>V.</given-names>
            <surname>Pirrelli</surname>
          </string-name>
          ,
          <article-title>The PAISÀ Corpus of Italian Web Texts</article-title>
          , in: F. Bildhauer &amp; R. Schäfer (Eds.),
          <source>Proceedings of the 9th Web as Corpus Workshop (WaC-9)</source>
          ,
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          , Gothenburg, Sweden, Association for Computational Linguistics,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [59]
          <article-title>How TikTok created a new accent - and why it might be the future of English</article-title>
          , BBC. Last accessed on 2nd May
          <year>2025</year>
          . URL: https://www.bbc.com/future/article/20240123-what-tiktok-voice-sounds-like-internet-influencer
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>N.</given-names>
            <surname>Adomaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Trieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>The TikTok Influencer Voice: Do Sociolinguistic Features Influence the Success of TikTok Videos?</article-title>
          ,
          <source>Languaged Life - Studies in language and society</source>
          , UCLA
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>O.</given-names>
            <surname>Foubert</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lemmens</surname>
          </string-name>
          ,
          <article-title>Gender-biased neologisms: the case of man-X</article-title>
          ,
          <source>Lexis Journal in English Lexicology</source>
          (Lexical and Semantic Neology in English),
          <volume>12</volume>
          ,
          <year>2018</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          [62]
          <string-name>
            <given-names>M.</given-names>
            <surname>Szymańska</surname>
          </string-name>
          ,
          <article-title>Gendered Neologisms Beyond Social Media: the Current Use of Mansplaining</article-title>
          ,
          <source>Research in Language</source>
          , vol.
          <volume>20</volume>
          .3,
          <year>2020</year>
          ,
          <fpage>259</fpage>
          -
          <lpage>276</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          [63]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Wardhani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Arifin</surname>
          </string-name>
          ,
          <article-title>Code switching and code mixing in Ritsuki's vlog on Digita Media TikTok: a study of sociolinguistics</article-title>
          ,
          <source>Esteem Journal of English Education Study Programme</source>
          ,
          <volume>8</volume>
          .1 (
          <year>2025</year>
          ),
          <fpage>200</fpage>
          -
          <lpage>205</lpage>
          . doi.org/10.31851/esteem.v8i1.18124.
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          [64]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pannitto</surname>
          </string-name>
          ,
          <article-title>Towards the first UD Treebank of Spoken Italian: the KIParla forest</article-title>
          . ArXiv, abs/2410.04589. doi.org/10.48550/arXiv.2410.04589
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          [65]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pannitto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mauri</surname>
          </string-name>
          ,
          <article-title>The KIPARLA Forest treebank of spoken Italian: an overview of initial design choices</article-title>
          . ArXiv, abs/2411.06554. doi.org/10.48550/arXiv.2411.06554
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          [66]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yazell</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>A.</given-names>
            <surname>Wohlmann</surname>
          </string-name>
          ,
          <article-title>Memes in the Literature Studies Classroom</article-title>
          ,
          <source>Narrative Works: Issues, Investigations, &amp; Interventions</source>
          ,
          <volume>12</volume>
          .1 (
          <year>2023</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          . doi.org/10.7202/1111279ar.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>