<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Life and Death of Fakes: on Data Persistence for Manipulative Social Media Content</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Olga</forename><surname>Uryupina</surname></persName>
							<email>uryupina@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Information Engineering and Computer Science</orgName>
								<orgName type="institution">University of Trento</orgName>
							</affiliation>
							<affiliation key="aff1">
								<address>
									<addrLine>Dec 04-06, 2024</addrLine>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Life and Death of Fakes: on Data Persistence for Manipulative Social Media Content</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">2CDDB1B931E181711FFE0AECAD97811D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>fact checking, replicability</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This work presents an in-depth investigation of data decay for publicly fact-checked online content. We monitor compromised posts on major social media platforms (Facebook, Instagram, Twitter, TikTok) for one year, tracking the changes in their visibility and availability. We show that data persistence is an important issue for manipulative content, on a larger scale than previously reported for online content in general. Our findings also suggest a (much) higher data decay rate for the platforms suffering most from online disinformation, indicating an important area for data collection and preservation.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Manipulative online content is becoming an increasingly pervasive issue for modern society: by deliberately biasing our information flow, unscrupulous content writers can and do affect our emotional state, beliefs, reasoning and both online and offline behaviour. It is therefore not surprising that this has become a central issue for various stakeholders, from journalists and fact-checkers to NLP researchers both in academia and in industry. Given the current rapid growth in data-driven studies of manipulative content, it is essential to have a reliable overview of data persistence issues in this specific domain: compromised content is often very dynamic and changes or becomes unavailable over time, raising reproducibility concerns. From the readers' perspective, the visibility of compromised content over time directly affects its impact: a removed or strongly downgraded document is unlikely to be read or recovered and cannot be used to promote or support other fakes. From the research and development perspective, data persistence is crucial for benchmarking, ensuring fair comparison between models as well as simply providing them with high-quality real-life training and testing examples.</p><p>Already a decade ago, NLP benchmarking campaign studies <ref type="bibr" target="#b0">[1]</ref> reported data persistence issues for online content used in various shared tasks, with around 10% of entries missing compared to the original (gold-standard) datasets. These shared tasks, however, are based almost exclusively on Twitter and do not focus specifically on compromised content. We believe that a large proportion of manipulative content is created on purpose by professional copywriters who might have different goals and motivations to keep their texts online (e.g., for click-bait purposes) or remove them (e.g., to reduce the reputation loss from being exposed as unreliable).</p><p>Our work focuses specifically on the lifespan of fact-checked compromised content. We go beyond the naive binary present vs. removed view, studying more nuanced cases as well. In particular, we track compromised online posts over time for the appearance of explicit platform-specific reliability labels (e.g. "out of context"), obfuscation (the common situation when the online content is - fully or partially - rendered either very blurred or as a black/white box, with a message raising awareness of its limited reliability; this content, however, is still accessible to the user upon an extra click), and author-generated edits, as well as complete content removal.</p><p>More specifically, we address the following research questions: RQ1: How persistent is compromised content? How does its visibility and availability change over time? RQ2: What is the typical timeline for interaction between content generators and fact-checkers? How - if at all - do content writers alter their posts after being exposed as problematic by fact-checkers? RQ3: Are the trends different across platforms?</p><p>To this end, we analyze two datasets (in English) of social media documents, fact-checked by PolitiFact.<ref type="foot" target="#foot_0">1</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Multiple studies report on data persistence issues for online content. These works, however, mostly focus on Twitter datasets, as used for various challenges and shared tasks.</p><p>Zubiaga <ref type="bibr" target="#b1">[2]</ref> provides an exhaustive report on data persistence for multiple Twitter datasets, showing an average data decay of around 20% over 4 years.</p><p>Küpfer <ref type="bibr" target="#b2">[3]</ref> argues, again for Twitter, that data persistence is not random, becoming drastically more of an issue for emotionally charged or controversial content. Indeed, both Bastos <ref type="bibr" target="#b3">[4]</ref> and Duan et al. <ref type="bibr" target="#b4">[5]</ref> report much higher tweet decay rates for #Brexit and #BlackLivesMatter content, respectively.</p><p>To our knowledge, there have been no studies explicitly assessing data persistence issues for fakes. For some datasets, the creators provide estimates of content decay. For example, Bianchi et al. <ref type="bibr" target="#b5">[6]</ref> estimate that around 25% of the tweets in their corpus on harmful speech online were no longer available at the paper publication time. It is, however, unspecified how this estimate was obtained.</p><p>We hope to bring new insights to our understanding of data persistence issues for compromised content by addressing the following novel angles: (i) we aim at a targeted analysis of manipulative content (fake news), (ii) we provide a more nuanced approach, tracking subtler changes in data availability for users and machines (e.g., obfuscation), and (iii) we go beyond Twitter, targeting all the major social media platforms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data</head><p>For our study, we use two datasets of real-life suspicious online posts, analyzed by PolitiFact. A 2-month dataset (PolitiFact reports from 15 May - 15 July 2023, around 200 entries) has been thoroughly monitored for data visibility and persistence up to the present. A larger and older 8-month dataset (PolitiFact reports from January - September 2022, around 800 entries) has been analyzed twice to assess longer-term trends.</p><p>The two datasets include all the posts in English from the major social media platforms as reported by PolitiFact during the above-mentioned periods (i.e., the original publications slightly predate May 15, 2023 and January 1, 2022, respectively).</p><p>The analysis involves the following dimensions:</p><p>• visibility: visible (possibly with a warning), obfuscated, removed;</p><p>• persistence: original, edited, removed;</p><p>• extra labelling: any platform-specific add-ons, e.g. "missing context".</p><p>While some of these aspects are crucial for algorithmic NLP (e.g., data persistence is important for benchmarking and - in critical cases - even training ML models), others are more relevant for understanding the impact of manipulative content on human readers (e.g., obfuscation is an unambiguous warning the platform sends to the reader on the low reliability of the information). The 2-month dataset has been analyzed every two days for the first two months and then on a weekly basis for the following year. The 8-month dataset has been analyzed in May and October 2024, when the documents were 1.5-2 and 2-2.5 years old, respectively.</p></div>
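The monitoring protocol above can be sketched in code. This is an illustrative reconstruction only, not the authors' actual tooling: the `Observation` record and the `checkpoints` helper are hypothetical names, mirroring the dimensions and the schedule described in Section 3 (every two days for the first two months, then weekly up to one year).

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one observation of a fact-checked post;
# the field values mirror the dimensions listed in Section 3.
@dataclass
class Observation:
    day: int                    # days since the PolitiFact report
    visibility: str             # "visible" | "obfuscated" | "removed"
    persistence: str            # "original" | "edited" | "removed"
    extra_label: Optional[str]  # e.g. "missing context", or None


def checkpoints(horizon_days: int = 365) -> list[int]:
    """Monitoring schedule for the 2-month dataset: every two days
    for the first two months (days 0-60), then weekly up to one year."""
    days = list(range(0, 61, 2))
    days += list(range(67, horizon_days + 1, 7))
    return days
```

A monitoring run would then record one `Observation` per post at each checkpoint; the tables in Sections 4.2-4.4 aggregate exactly such per-checkpoint snapshots.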
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Compromised content: timeline</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">From publication to fact-checking</head><p>For this project, we start monitoring the content the day it appears on PolitiFact. Obviously, this doesn't happen the very moment the content gets published by its creators: it takes some time for the content to reach PolitiFact and then an extra period to perform fact-checking. This lag may depend on numerous factors: for example, some fakes are simple and repetitive, thus requiring less investigative effort, whereas others lead PolitiFact journalists to request third-party expert analytics, involving time-consuming communications with various public figures and organizations.</p><p>Table <ref type="table" target="#tab_0">1</ref> shows time lag statistics (in days) between the content publication date (as reported by the platforms) and the appearance of the corresponding fact-checking report. It suggests that PolitiFact is doing an outstanding job at reacting to online misinformation in a timely manner: an average suspicious post is analyzed within 4 days, with a large share of reports appearing as early as the next day. We observe no platform-based difference in PolitiFact reaction times, thus confirming their neutrality in this respect.</p><p>PolitiFact maintains active collaborations with the major social media platforms.<ref type="foot" target="#foot_2">2</ref> As a result, in most cases the content is marked by the platform as somewhat spurious (e.g. "false" or "out of context") shortly after or even before the publication on the PolitiFact website. This marking, as we will see below, often leads to immediate content modification or withdrawal.</p></div>
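The lag statistics of Table 1 boil down to simple date arithmetic. A minimal sketch; the (publication, report) date pairs below are invented toy examples, not taken from the dataset:

```python
from datetime import date
from statistics import mean, median

# Toy (publication date, PolitiFact report date) pairs -- invented
# for illustration, not the paper's data.
pairs = [
    (date(2023, 5, 20), date(2023, 5, 21)),  # reported the next day
    (date(2023, 5, 18), date(2023, 5, 22)),
    (date(2023, 6, 1),  date(2023, 6, 12)),
]

# Lag in days between publication and the fact-checking report.
lags = [(report - published).days for published, report in pairs]
print(f"mean={mean(lags):.1f} median={median(lags)}")
```

Note that the publication dates are those reported by the platforms themselves, so any platform-side correction of timestamps would propagate into such statistics.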
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Content availability after fact-checking</head><p>Tables <ref type="table" target="#tab_1">2 and 3</ref> illustrate data availability over time for the 2-month set. We distinguish between two categories: visible and available. Available content can be accessed by either a human or a machine, possibly with some effort (e.g., an extra click). Visible content can be accessed as-is. In other words, non-visible accessible content comprises fully or partially obfuscated posts. We see several important trends here. First of all, already at the fact-checking date, around 12% of documents are no longer available. This number grows rapidly: after one year, the unavailable content comprises 38% of the datapoints in our 2-month set. This figure is much more pessimistic than common estimations of online data persistence <ref type="bibr" target="#b1">[2]</ref>. It raises an important and very urgent issue: as a community, we should invest a more focused and consistent effort in saving samples of compromised documents in a timely manner for ongoing and future research and benchmarking. From the human reader perspective, only one third of posts are clearly visible after one year (and even in such cases, they might contain explicit markings, such as "partially false").</p><p>We also observe a striking difference across platforms: while most tweets remain online, almost half of the compromised Instagram posts are no longer available after 12 months. This is truly problematic: while the NLP community focuses mainly on Twitter data, fakes on other platforms are more prevalent - and keep appearing and disappearing at an alarming rate, leaving us virtually no opportunity to model the underlying trends.</p></div>
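The visible vs. available distinction can be made concrete with a small helper. This is a sketch under our own naming assumptions (the `rates` function and the state strings are hypothetical), and the snapshot counts are illustrative, not dataset values:

```python
def rates(states):
    """Fraction of posts that are visible as-is, and the fraction that
    is available at all (visible or obfuscated, i.e. reachable with an
    extra click); removed posts count towards neither."""
    n = len(states)
    visible = sum(s == "visible" for s in states)
    available = sum(s in ("visible", "obfuscated") for s in states)
    return visible / n, available / n


# Toy snapshot: 6 visible, 2 obfuscated, 2 removed posts.
snapshot = ["visible"] * 6 + ["obfuscated"] * 2 + ["removed"] * 2
vis, avail = rates(snapshot)
print(vis, avail)  # 0.6 0.8
```

By construction, availability can never fall below visibility at any checkpoint; this invariant is a useful sanity check when aggregating per-platform snapshots into tables like 2 and 3.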
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Content adjustment</head><p>As we have seen above, once a document has been fact-checked and deemed false, the most typical reaction is its - rather fast - removal. This is a rather natural reaction: most creators do not enjoy having their content (and their name) marked as unreliable. In some cases, however, the users<ref type="foot" target="#foot_3">3</ref> prefer keeping the compromised content online. Such content - proven to be problematic by a publicly available fact-checking report - may trigger a reaction from (a) the hosting social media platform, (b) the community and (c) the authors themselves. The observed reactions for visible documents are summarized in Table <ref type="table" target="#tab_2">4</ref>.</p><p>Facebook and Instagram adopt their own labels to mark questionable content, distinguishing between "false", "out-of-context" and "partly false" documents.<ref type="foot" target="#foot_4">4</ref> Although PolitiFact stays in an active collaboration with both platforms, there is no direct correspondence between the labels. The labels get assigned rather quickly and stay unchanged (almost all of the observed label change is due to the complete removal of the document).</p><p>Twitter relies on its own community to highlight problematic content. This measure was introduced after the start of our project and therefore we cannot assess directly how quickly the posts become marked as potentially problematic. Finally, the users themselves might react verbally to fact-checking reports or consequent actions by social media platforms, editing their original posts. The modifications might range from acknowledging the fact-checking findings and posting clear and unambiguous updates all the way to claiming to be ironic or actively attacking fact-checkers and arguing against their findings. We have also observed a higher percentage of edits from non-anonymous accounts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Longer-term trends</head><p>Table <ref type="table" target="#tab_3">5</ref> shows similar statistics for our 8-month dataset, covering PolitiFact reports published from January to September 2022. We have computed them in May and October 2024, when most posts were almost 2 and 2.5 years old, respectively.</p><p>These numbers support our initial findings: almost half (44.8%) of compromised documents are no longer available after 2 years. The decay is more pronounced for TikTok and Instagram.</p><p>A considerably larger percentage of Facebook posts remains visible (non-obfuscated) in our 8-month dataset: this might be attributed to a rendering policy change.</p><p>Finally, the 2022 (8-month) dataset contains a larger share of tweets. The decay rate for Twitter is at 17% after 2 years (compared to just 6% after 1 year for the 2-month 2023 dataset). We believe that the considerable change in the platform's governance in the past two years has affected the way content writers use Twitter (both publishing and removing content). A larger-scale study is needed to provide more reliable Twitter-specific estimates under the new policies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This paper aims at an in-depth analysis of data persistence for publicly fact-checked online content. After one year of thoroughly monitoring online posts fact-checked by PolitiFact, we have observed the following. First, data persistence is a crucial and underrated issue for compromised content, with considerable decay rates. Second, the decay trends differ across platforms, with Facebook, TikTok and Instagram showing much less data persistence. Third, the decay starts immediately, with 12% of the compromised posts getting deleted at (or before) the publication of the PolitiFact report and 20% becoming unavailable within a week. This suggests an urgent need for a concentrated effort on timely collection of real-life fakes if we want to go beyond synthetic or simplistic datasets and train impactful fact-checking models.</p><p>In the future, we want to analyze further aspects of the decay issues for compromised content. First, we plan to add more fact-checking outlets beyond PolitiFact to see if there are any effects due to the report itself. Second, we plan to study in more detail the difference in online behaviour (content removal) between anonymous users, non-anonymous users and public figures. Finally, we plan to expand our research on the interaction between content writers and fact-checkers ("editing").</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Assessing the time required for professional fact-checking (fc): statistics for the 2-month dataset, in days.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Statistics for the 2-month dataset: data availability at fact-checking day and one week, 1, 3 and 12 months afterwards: % of available (visible or obfuscated) documents. Table 3 (second table below): the same checkpoints for visibility: % of visible documents.</figDesc><table><row><cell></cell><cell>% d0</cell><cell>% d7</cell><cell>% d30</cell><cell>% d100</cell><cell>% d365</cell><cell>total</cell></row><row><cell>all</cell><cell>88.02%</cell><cell>80.72%</cell><cell>75.52%</cell><cell>69.27%</cell><cell>61.97%</cell><cell>192</cell></row><row><cell>fb</cell><cell>83.72%</cell><cell>80.23%</cell><cell>75.58%</cell><cell>70.93%</cell><cell>63.95%</cell><cell>86</cell></row><row><cell>twitter</cell><cell>93.75%</cell><cell>93.75%</cell><cell>87.5%</cell><cell>93.75%</cell><cell>93.75%</cell><cell>16</cell></row><row><cell>tiktok</cell><cell>94.11%</cell><cell>82.35%</cell><cell>76.47%</cell><cell>64.7%</cell><cell>58.82%</cell><cell>17</cell></row><row><cell>instagram</cell><cell>90.27%</cell><cell>77.77%</cell><cell>72.22%</cell><cell>63.88%</cell><cell>54.16%</cell><cell>72</cell></row></table><table><row><cell></cell><cell>% d0</cell><cell>% d7</cell><cell>% d30</cell><cell>% d100</cell><cell>% d365</cell><cell>total</cell></row><row><cell>all</cell><cell>48.43%</cell><cell>46.87%</cell><cell>43.22%</cell><cell>40.1%</cell><cell>36.97%</cell><cell>192</cell></row><row><cell>fb</cell><cell>41.86%</cell><cell>39.53%</cell><cell>36.04%</cell><cell>32.55%</cell><cell>27.9%</cell><cell>86</cell></row><row><cell>twitter</cell><cell>93.75%</cell><cell>93.75%</cell><cell>87.5%</cell><cell>93.75%</cell><cell>93.75%</cell><cell>16</cell></row><row><cell>tiktok</cell><cell>94.11%</cell><cell>82.35%</cell><cell>76.47%</cell><cell>64.7%</cell><cell>58.82%</cell><cell>17</cell></row><row><cell>instagram</cell><cell>34.72%</cell><cell>36.11%</cell><cell>33.33%</cell><cell>31.94%</cell><cell>30.55%</cell><cell>72</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>Reactions to fact-checking by social media platforms, community and users: % of documents carrying each reaction at the given checkpoint.</figDesc><table><row><cell></cell><cell>% day0</cell><cell>% day7</cell><cell>% day30</cell><cell>% day100</cell><cell>% day365</cell><cell>at some point</cell></row><row><cell cols="7">Platform labels</cell></row><row><cell>missing context</cell><cell>11.5%</cell><cell>10.9%</cell><cell>12.0%</cell><cell>10.4%</cell><cell>8.9%</cell><cell>13.5%</cell></row><row><cell>partly false</cell><cell>8.9%</cell><cell>8.9%</cell><cell>9.4%</cell><cell>9.4%</cell><cell>8.9%</cell><cell>11.5%</cell></row><row><cell cols="7">Community labels</cell></row><row><cell>reader's context</cell><cell>0.5%</cell><cell>1.0%</cell><cell>2.1%</cell><cell>3.1%</cell><cell>3.1%</cell><cell>3.1%</cell></row><row><cell cols="7">Authors' intervention</cell></row><row><cell>editing</cell><cell>1.6%</cell><cell>2.6%</cell><cell>2.1%</cell><cell>1.6%</cell><cell>1.6%</cell><cell>2.6%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5</head><label>5</label><figDesc>Statistics for the 8-month dataset: data persistence across platforms, assessed in May and October 2024 (1.5-2 and 2-2.5 years after publication, respectively): number and % of documents per state.</figDesc><table><row><cell></cell><cell cols="2">visible</cell><cell cols="2">obfuscated</cell><cell cols="2">removed</cell><cell>total</cell></row><row><cell></cell><cell>May 2024</cell><cell>Oct 2024</cell><cell>May 2024</cell><cell>Oct 2024</cell><cell>May 2024</cell><cell>Oct 2024</cell><cell></cell></row><row><cell>all</cell><cell>363 (44.21%)</cell><cell>346 (42.14%)</cell><cell>128 (15.59%)</cell><cell>107 (13.03%)</cell><cell>330 (40.19%)</cell><cell>368 (44.82%)</cell><cell>821</cell></row><row><cell>fb</cell><cell>170 (33.53%)</cell><cell>164 (32.35%)</cell><cell>106 (20.9%)</cell><cell>90 (17.75%)</cell><cell>231 (45.56%)</cell><cell>253 (49.90%)</cell><cell>507</cell></row><row><cell>twitter</cell><cell>156 (81.25%)</cell><cell>157 (81.77%)</cell><cell>3 (1.56%)</cell><cell>2 (1.04%)</cell><cell>33 (17.18%)</cell><cell>33 (17.18%)</cell><cell>192</cell></row><row><cell>tiktok</cell><cell>3 (25%)</cell><cell>1 (8.33%)</cell><cell>0 (0%)</cell><cell>0 (0%)</cell><cell>9 (75%)</cell><cell>11 (91.67%)</cell><cell>12</cell></row><row><cell>instagram</cell><cell>29 (28.15%)</cell><cell>23 (22.33%)</cell><cell>19 (18.44%)</cell><cell>15 (14.56%)</cell><cell>55 (53.39%)</cell><cell>65 (63.11%)</cell><cell>103</cell></row><row><cell>youtube</cell><cell>5 (83.33%)</cell><cell>5 (83.33%)</cell><cell>0 (0%)</cell><cell>0 (0%)</cell><cell>1 (16.66%)</cell><cell>1 (16.66%)</cell><cell>6</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">PolitiFact (https://www.politifact.com/) is an independent journalistic agency and one of the most experienced fact-checking organizations, providing detailed analytics for non-transparent online content since 2007.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2">For example, https://www.facebook.com/help/1952307158131536? helpref=related and https://www.tiktok.com/safety/en/ safety-partners/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3">We do not have any reliable estimations on the content removal by the major online platforms themselves. In this study, we assume, albeit unrealistically, that the content gets removed by the users.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_4">The exact labels vary across platforms (e.g. "out of context" vs. "missing context").</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We thank the Autonomous Province of Trento for the financial support of our project via the AI@TN initiative.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">TweetNorm: a benchmark for lexical normalization of Spanish tweets</title>
		<author>
			<persName><forename type="first">I</forename><surname>Alegria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Aranberri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Comas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gamallo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Padró</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>San Vicente</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Turmo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zubiaga</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10579-015-9315-6</idno>
	</analytic>
	<monogr>
		<title level="j">Language Resources and Evaluation</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A longitudinal assessment of the persistence of twitter datasets</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zubiaga</surname></persName>
		</author>
		<idno type="DOI">10.1002/asi.24026</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">69</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Nonrandom tweet mortality and data access restrictions: Compromising the replication of sensitive twitter studies</title>
		<author>
			<persName><forename type="first">A</forename><surname>Küpfer</surname></persName>
		</author>
		<idno type="DOI">10.1017/pan.2024.7</idno>
	</analytic>
	<monogr>
		<title level="j">Political Analysis</title>
		<imprint>
			<biblScope unit="page" from="1" to="14" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">This account doesn&apos;t exist: Tweet decay and the politics of deletion in the Brexit debate</title>
		<author>
			<persName><forename type="first">M</forename><surname>Bastos</surname></persName>
		</author>
		<idno type="DOI">10.1177/0002764221989772</idno>
	</analytic>
	<monogr>
		<title level="j">American Behavioral Scientist</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="page">000276422198977</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">&quot;This tweet is unavailable&quot;: #BlackLivesMatter tweets decay</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hemsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">O</forename><surname>Smith</surname></persName>
		</author>
		<idno type="DOI">10.5210/spir.v2023i0.13414</idno>
		<ptr target="https://spir.aoir.org/ojs/index.php/spir/article/view/13414" />
	</analytic>
	<monogr>
		<title level="m">AoIR Selected Papers of Internet Research</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">it&apos;s not just hate&quot;: A multi-dimensional perspective on detecting harmful speech online</title>
		<author>
			<persName><forename type="first">F</forename><surname>Bianchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hills</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rossini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hovy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tromble</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tintarev</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.emnlp-main.553</idno>
		<ptr target="https://aclanthology.org/2022.emnlp-main.553" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Kozareva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</editor>
		<meeting>the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Abu Dhabi, United Arab Emirates</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="8093" to="8099" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
