TAR on Social Media: A Framework for Online Content Moderation

Eugene Yang1, David D. Lewis2 and Ophir Frieder3
1 IR Lab, Georgetown University, Washington, DC, USA
2 Reveal Brainspace, Chicago, IL, USA
3 IR Lab, Georgetown University, Washington, DC, USA

DESIRES 2021 – 2nd International Conference on Design of Experimental Search & Information REtrieval Systems, September 15–18, 2021, Padua, Italy
eugene@ir.cs.georgetown.edu (E. Yang); desires2021paper@davelewis.com (D. D. Lewis); ophir@ir.cs.georgetown.edu (O. Frieder)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Abstract
Content moderation (removing or limiting the distribution of posts based on their contents) is one tool social networks use to fight problems such as harassment and disinformation. Manually screening all content is usually impractical given the scale of social media data, and the need for nuanced human interpretations makes fully automated approaches infeasible. We consider content moderation from the perspective of technology-assisted review (TAR): a human-in-the-loop active learning approach developed for high recall retrieval problems in civil litigation and other fields. We show how TAR workflows, and a TAR cost model, can be adapted to the content moderation problem. We then demonstrate on two publicly available content moderation data sets that a TAR workflow can reduce moderation costs by 20% to 55% across a variety of conditions.

Keywords
Technology-assisted review, active learning, social media, content moderation, cost analysis

1. Introduction

Online social networks are powerful platforms for personal communication, community building, and free expression. Unfortunately, they can also be powerful platforms for harassment, disinformation, and perpetration of criminal and terrorist activities. Organizations hosting social networks, such as Facebook, Twitter, Reddit, and others, have deployed a range of techniques to counteract these threats and maintain a safe and respectful environment for their users.

One such approach is content moderation: removal (hard moderation) or demotion (soft moderation) of policy-violating posts [1, 2]. Despite recent progress in machine learning, online content moderation still relies heavily on human review [3]. Facebook's CEO Mark Zuckerberg stated that language nuances could get lost when relying on automated detection approaches, emphasizing the necessity of human judgment.1 Ongoing changes in what is considered inappropriate content further complicate the use of machine learning [4]. Policy experts have argued that complete automation of content moderation is socially undesirable regardless of algorithmic accuracy [5].

It is thus widely believed that both human moderation and automated classification will be required for online content moderation for the foreseeable future [1, 5, 6]. This has meant not just capital investments in machine learning tools for moderation, but also massive ongoing personnel expenses for teams of human reviewers [7].

Surprisingly, the challenge of reducing costs when both machine learning and manual review are necessary has been an active area of interest for almost two decades, but in a completely different area: civil litigation. Electronic discovery (eDiscovery) projects involve teams of attorneys, sometimes billing the equivalent of hundreds of euros per person-hour, seeking to find documents responsive to a legal matter [8]. As the volume of electronically produced documents grew, machine learning began to be integrated into eDiscovery workflows in the early 2000s, a history we review elsewhere [9].

The result in the legal world has been technology-assisted review (TAR): human-in-the-loop active learning workflows that prioritize the most important documents for review [10, 11]. One-phase (continuous model refinement) and two-phase (with separate training and deployment phases) TAR workflows are both in use [9, 12].

1 https://www.businessinsider.com/zuckerberg-nuances-content-moderation-ai-misinformation-hearing-2021-3
Because of the need to find most or all relevant documents, eDiscovery has been referred to as a high recall review (HRR) problem [13, 14, 15]. HRR problems also arise in systematic reviews in medicine, sunshine law requests, and other tasks [16, 17, 18]. Online content moderation is an HRR problem as well, in that a very high proportion of inappropriate content should be identified and removed.

Our contributions in this paper are two-fold. First, we describe how to adapt TAR and its cost-based evaluation framework to the content moderation problem. Second, we test this approach using two publicly available content moderation datasets. Our experiments show substantial cost reductions using the proposed TAR framework over both manual review of unprioritized documents and training of prioritized models on random samples.

2. Background

Content moderation on online platforms is a necessity [19, 20] and has been argued by some to be the defining feature of an online platform [6]. Despite terms of service and community rules on each platform, users produce inappropriate content, particularly when anonymous [21]. Inappropriate content includes toxic content such as hate speech [22], offensive content [23], and mis- and disinformation [4, 23]. It also includes content that is inappropriate for legal or commercial reasons, such as potential copyright violations [5, 24].
The identification of toxic content can require subtle human insight [4, 22], both due to attempts at obfuscation by posters, and because the inappropriateness of the content is often tied to its cultural, regional, and temporal context [1, 3]. Mis- and disinformation often consist of subtle mixtures of truthful and misleading content whose detection requires human common sense inferences and other background knowledge [4, 23].

Social media organizations have deployed numerous techniques for implementing community policies, including graph- and time-based analyses of communication patterns, user profile information, and others [25]. Our focus here, however, is on methods that use the content of a post.

Content monitoring falls into three categories: manual moderation, text classification, and human-in-the-loop methods. The latter two approaches leverage machine learning models and are sometimes collectively referred to as algorithmic content moderation in policy research [5].

Manual moderation is the oldest approach, dating back to email mailing lists. It is, however, extremely expensive at the scale of large social networks and suffers from potential human biases. Additionally, mental health concerns are an issue for moderators exposed to large volumes of toxic content [25, 26, 27].

The simplest text classification approaches are keyword filters, but these are susceptible to embarrassing mistakes2 and countermeasures by content creators. More effective text classification approaches to content moderation are based on supervised machine learning [28, 29]. Content types that have been addressed include cyberbullying [29, 30, 32], hate speech [22, 31, 33, 34, 35, 36], and offensive language in general [23, 37, 38, 39, 40, 41, 42].

However, some moderation judgments are inevitably too subtle for purely automated methods3, particularly when content is generated with the intent of fooling automated systems [1, 25, 43]. Content that is recontextualized from its original problematic setting, for example through reposting, screenshotting, or embedding in new contexts, further complicates moderation [2]. Additionally, bias in automated systems can arise both from learning from biased labels and from numerous other choices in data preparation and algorithmic settings [27, 44, 45]. Biased models risk further marginalizing and disproportionately censoring groups that already face discrimination [1]. Differences in cultural and regulatory contexts further complicate the definition of appropriateness, creating another dimension of complexity when deploying automated content moderation [4].

Human-in-the-loop approaches, where AI systems actively manage which materials are brought to the attention of human moderators, attempt to address the weaknesses of both approaches while gathering training data to support supervised learning components [25, 46]. Platforms use filtering mechanisms that proactively present only approved content (pre-moderation) and/or removal mechanisms that take down inappropriate content after the fact, depending on the intensity of the problem [4]. Reviewing protocols could shift from one to the other based on the frequency of violations or during a specific event, such as elections4. Regardless of the workflow, the core and arguably most critical component is human review. However, the primary research focus of human-in-the-loop content moderation has been on classification algorithm design and bias mitigation, rarely on the investigation of the overall workflow.

Like content moderation, eDiscovery is a high recall retrieval task applied to large bodies of primarily textual content (typically enterprise documents, email, and chat) [11, 12]. Both fixed data set and streaming task structures have been explored, though the streaming context tends to be bursty (e.g., all data from a single person arriving at once) rather than continuous. Since cost minimization is a primary rationale for TAR [47], research on TAR has focused on training regimens and workflows for minimizing the number, or more generally the cost, of documents reviewed [9, 12]. A new TAR approach is typically evaluated for its ability to meet an effectiveness target while minimizing cost, or to meet a cost target while maximizing effectiveness [18, 48, 49]. This makes approaches developed for TAR natural to consider for content moderation.

2 https://www.techdirt.com/articles/20200912/11133045288/paypal-blocks-purchases-tardigrade-merchandise-potentially-violating-us-sanctions-laws.shtml
3 https://venturebeat.com/2020/05/23/ai-proves-its-a-poor-substitute-for-human-content-checkers-during-lockdown/
4 https://www.washingtonpost.com/technology/2020/11/07/facebook-groups-election/
3. Applying TAR to Content Moderation

In most TAR applications, at least a few documents of the (usually rare) category of interest are available at the start of the workflow. These are used to initialize an iterative pool-based active learning workflow [50]. Reviewed documents are used to train a predictive model, which in turn is used to select further documents based on predicted relevance [51], uncertainty [52], or composite factors. Workflows may be batch-oriented (mimicking pre-machine learning manual workflows common in the law), or a stream of documents may be presented through an interactive interface with training done in the background. These active learning workflows have almost completely displaced training from random examples when supervised learning is used in eDiscovery.

Two workflow styles can be distinguished [9]. In a one-phase workflow, iterative review and training simply continues until a stopping rule is triggered [49, 53, 54]. Stopping may be conditioned on estimated effectiveness (usually recall), cost limits, and other factors [53, 55, 56]. Two-phase workflows stop training before review is finished, and deploy the final trained classifier to rank the remaining documents for review. The reviewed documents are typically drawn from the top of the ranking, with the depth in the ranking chosen so that an estimated effectiveness target is reached [18, 48]. Two-phase workflows are favored when labeling of training data needs to be done by more expensive personnel than are necessary for routine review.

The cost of both one- and two-phase TAR workflows can be captured in a common cost model [9]. The model defines the total cost of a one-phase review terminated at a particular point as the cost incurred in reviewing documents to that point, plus a penalty if the desired effectiveness target (e.g., a minimum recall value) has not been met. The penalty is simply the cost of continuing on to an optimal second-phase review from that point, i.e., the minimum number of prioritized documents is reviewed to hit the effectiveness target. For a two-phase workflow, we similarly define total cost to be the cost of the training phase plus the cost of an optimal second phase using the final trained model.

These costs in both cases are idealizations in that there may be additional cost (e.g., a labeled random sample) to choose a phase-two cutoff. However, the model allows a wide range of workflows to be compared on a common basis, as well as allowing differential costs for review of positive vs. negative documents, or phase-one vs. phase-two documents.

While developed for eDiscovery, the above cost model is also a good fit for content moderation. As discussed in the previous section, the human-in-the-loop moderation approaches used in social media are complex, but in the end reduce to some combination of machine-assisted manual decisions (phase one) and automated decisions based on deploying a trained model (phase two). Operational decisions such as flagging and screening all posts from an account, or massive reviewing of posts related to certain events [4, 6], are all results of applying previously trained models, which is also a form of deployment. Broadly applying the model to filter content also vastly reduces the moderation burden when similar content is rapidly being published on the platform, at the risk of false removals [4]. We do not claim this simplified model is optimal for evaluating content moderation; rather, it is an initial effort at modeling the human-in-the-loop moderation process.

When applying the model to content moderation, however, we assume uniform review costs for all documents. This seems the best assumption given the short length of texts reviewed and what is known publicly about the cost structure of moderation [6].

In the next section, we describe our experimental setting for adapting and evaluating TAR for content moderation.
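To make the cost accounting concrete, the following is a minimal Python sketch of the idealized total-cost computation under the assumptions above (uniform unit review cost, a fixed recall target, and oracle labels for the remaining documents). The function and variable names are ours for illustration; this is not the authors' released code.

# Illustrative sketch only: idealized total cost of a review stopped after some
# documents have been reviewed, assuming a uniform review cost of 1 per document.
def total_cost(reviewed_labels, ranked_unreviewed_labels, recall_target=0.8):
    # reviewed_labels: True/False labels of documents reviewed so far.
    # ranked_unreviewed_labels: labels of the remaining documents, sorted by the
    # current model's estimated probability of relevance (highest first).
    found = sum(reviewed_labels)
    needed = recall_target * (found + sum(ranked_unreviewed_labels))
    penalty = 0                      # optimal phase-two review down the ranking
    for is_positive in ranked_unreviewed_labels:
        if found >= needed:
            break
        penalty += 1
        found += is_positive
    return len(reviewed_labels) + penalty

Differential costs for positive vs. negative documents, or for phase-one vs. phase-two review, which the model also supports, would replace the unit increments above with per-document cost weights.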
4. Experiment Design

Here we review the data sets, evaluation metric, and implementation details for our experiment.

4.1. Data Sets

We used two fully labeled and publicly available content moderation data sets with a focus on inappropriate user-generated content. The Wikipedia personal attack data set [32] consists of 115,737 Wikipedia discussion comments with labels obtained via crowdsourcing. An example comment is presented in Figure 1(a). Eight annotators assigned one of five mutually exclusive labels to each document: Recipient Target, Third Party Target, Quotation Attack, Other Attack, and No Attack (our names). We defined three binary classification tasks corresponding to distinguishing Recipient Target, Third Party Target, or Other Attack from all other classes. (Quotation Attack had too low a prevalence.) A fourth binary classification task distinguished the union of all attacks from No Attack. A document was a positive example if 5 or more annotators put it in the positive class. The proportion of the positive class ranged from 13.44% to 0.18%.

The ASKfm cyberbullying dataset [29] contains 61,232 English utterance/response pairs, each of which we treated as a single document. An example conversation is presented in Figure 1(b). Linguists annotated both the poster and responder with zero or one of four mutually exclusive cyberbullying roles, as well as annotating the pair as a whole for any combination of 15 types of textual expressions related to cyberbullying. We treated these annotations as defining 23 binary classifications for a pair, with prevalence of the positive examples ranging from 4.63% to 0.04%.

(a) Wikipedia collection:
shut up mind your own business and go f*** some one else over

(b) ASKfm collection:
: being in love with a girl you dont even know yours is sadder
: f*** off you f***ing c***!

Figure 1: Example content in the collections.

For both data sets we refer to the binary classification tasks as topics and the units being classified as documents. Documents were tokenized by separating at punctuation and whitespace. Each distinct term became a feature. We used log tf weighting for the features of the underlying classification model: the value of a feature was 0 if the term was not present, and 1 + log(tf) otherwise, where tf is the number of occurrences of that term in the document.
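As an illustration of this representation, a minimal sketch of the tokenization and log-tf weighting just described (lowercasing and the natural logarithm are our own assumptions; they are not specified above):

import math
import re
from collections import Counter

def log_tf_features(text):
    # Split at punctuation and whitespace; weight each distinct term as 1 + log(tf).
    tokens = [t for t in re.split(r"[^\w]+", text.lower()) if t]
    return {term: 1.0 + math.log(tf) for term, tf in Counter(tokens).items()}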
4.2. Algorithms and Workflow

Our experiments simulated a typical TAR workflow. The first training round is a seed set consisting of one random positive example (simulating manual input) and one random negative example. At the end of each round, a logistic regression model was trained and applied to the unlabeled documents. The training batch for the next round was then selected by one of three methods: a random sampling baseline, uncertainty sampling [52], or relevance feedback (top-scoring documents) [51]. Variants of the latter two are widely used in eDiscovery [57]. Labels for the training batch were looked up, the batch was added to the training set, and a new model was trained to repeat the cycle. Batches of size 100 and 200 were used, and training continued for 80 and 40 iterations respectively, resulting in 8002 coded training documents at the end.

We implemented the TAR workflow in libact5 [58], an open-source framework for active learning experiments. We fit logistic regression models using Vowpal Wabbit6 with default parameter settings. Our experiment framework is available on GitHub7.

5 https://github.com/ntucllab/libact
6 https://vowpalwabbit.org/
7 https://github.com/eugene-yang/TAR-Content-Moderation
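For concreteness, the sketch below shows one round of this simulated workflow, with scikit-learn's logistic regression standing in for the libact plus Vowpal Wabbit implementation actually used; the function and variable names are ours, and labels are assumed to be encoded as 0/1.

import numpy as np
from sklearn.linear_model import LogisticRegression

def next_batch(X, labels, train_idx, strategy="uncertainty", batch_size=100):
    # Train on the currently coded documents and score the remaining pool.
    model = LogisticRegression().fit(X[train_idx], labels[train_idx])
    pool = np.setdiff1d(np.arange(X.shape[0]), train_idx)
    scores = model.predict_proba(X[pool])[:, 1]   # P(positive) for unreviewed docs

    if strategy == "relevance":        # relevance feedback: top-scoring documents
        order = np.argsort(-scores)
    elif strategy == "uncertainty":    # uncertainty sampling: scores closest to 0.5
        order = np.argsort(np.abs(scores - 0.5))
    else:                              # random sampling baseline
        order = np.random.permutation(len(pool))
    return pool[order[:batch_size]], model

Labels for the selected batch would then be looked up, the batch appended to the training set, and the loop repeated for 80 rounds of 100 documents (or 40 rounds of 200), with the cost metric described next evaluated after each round.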
4.3. Evaluation

Our metric was total cost to reach 80% recall as described in Section 3. This was computed at the end of each training round as the number of training documents, plus the ideal second-phase review cost as a penalty, which is the number of additional top-ranked documents (if any) needed to bring recall up to 80%. Ranking was based on sorting the non-training documents by probability of relevance using the most recent trained model. Note that we used 80% recall as an example; the TAR workflow can be run with an arbitrary recall target, such as the 95% common in systematic review [18, 56].

In actual TAR workflows, recall would be estimated from a labeled random sample. Since the cost of this sample would be constant across our experimental conditions, we used an oracle for recall instead.

5. Results and Analysis

Our core finding was that, as in eDiscovery, active selection of which documents to review reduces costs over random selection. Figure 2 shows mean cost to reach 80% recall over 20 replications (different seed sets and random samples) for six representative categories. On all six categories, all TAR workflows within a few iterations beat the baseline of reviewing a random 80% of the data set (horizontal line labeled Manual Review).

Figure 2: Total cost for TAR alternatives to identify 80% of positive documents for Wikipedia Attack, Other Attack, and Recipient Attack, and ASKfm Curse Exclusion, General Insult, and Sexism classifications. Values are averaged over 20 replicates, and a 99% confidence interval on costs is shown as shading around each curve. The horizontal line is the cost to review a random 80% of the data set.

The Wikipedia Attack category is typical of low to moderate prevalence categories (p = 0.1344). Uncertainty sampling strongly dominates both random sampling (too few positives chosen) and relevance feedback (too many redundant positives chosen for good training). Costs decrease uniformly with additional training. We plot 99% confidence intervals under the assumption that costs are normally distributed across replicates. Costs are not only higher for relevance feedback, but less predictable.

The ASKfm Curse Exclusion (p = 0.0169) and Wikipedia Other Attack (p = 0.0019) categories are typical low prevalence categories. Uncertainty sampling and relevance feedback act similarly in such circumstances: even top-scoring documents are at best uncertainly positive. Average cost across replicates levels off and starts to increase after 44 iterations for uncertainty sampling and 45 iterations for relevance feedback. This is the point at which additional training no longer pays for itself by improving the ranking of documents. For this category (and typically) this occurs shortly before 80% recall is reached on the training data alone (iteration 48 for uncertainty sampling and iteration 52 for relevance feedback).

Tasks such as the ASKfm Sexism category (p = 0.0030), which deal with nuances of human language, require more training data to produce a stable classifier. While obtaining training data by random sampling stops reducing the cost after the first iteration, uncertainty sampling and relevance feedback continue to take advantage of additional training data to minimize the cost and become more predictable.

Note that the general relationship between the prevalence of the task and the cost of reaching a certain recall target using TAR workflows is discussed in Yang et al. [9].

Table 1 looks more broadly at the two datasets, averaging costs both over all topics and over 20 replicate runs for each topic, for batch sizes of both 100 and 200. By 20 iterations with batch size 100 (2002 training documents), TAR workflows with both relevance feedback and uncertainty sampling significantly reduce costs versus TAR with random sampling. (Significance is based on paired t-tests assuming non-identical variances and making a Bonferroni correction for 72 tests.) All three TAR methods in turn dominate reviewing a random 80% of the dataset, which costs 92,590 for Wikipedia and 90,958 for ASKfm.
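As a rough illustration of this significance test (our own sketch; the exact test configuration used for Table 1 may differ), a per-topic comparison of replicate costs with a Bonferroni correction could be computed as:

from scipy import stats

def significantly_better(costs_baseline, costs_method, n_tests=72, alpha=0.01):
    # Paired t-test over the replicate costs, Bonferroni-corrected for n_tests comparisons.
    _, p_value = stats.ttest_rel(costs_baseline, costs_method)
    return p_value < alpha / n_tests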
The cost improvement plateaued after the training sets reached 5000 documents for ASKfm, but continued for Wikipedia. Categories in Wikipedia (p = 0.1344 to 0.0018) are generally more frequent compared to ASKfm (p = 0.0463 to 0.001), providing more advantage for training to identify more positive documents. A larger batch size slightly reduces the improvement, as the underlying classifiers are retrained less frequently. In practice, batch sizes depend on the cost structure of review and the specific workflows of each organization. However, as long as the classifiers are frequently updated with more coded documents, the total cost is reduced over the iterations.

Besides the overall cost reduction, Figure 3 shows a heatmap of mean precision across 20 replicates for batches 1 to 81 with batch size 100, to give insight into the moderator experience of TAR workflows. Precision for relevance feedback starts high and declines very gradually. Uncertainty sampling maintains relatively constant precision. For the very low prevalence category Curse Exclusion we cut off the heatmap at 52 iterations for relevance feedback and 48 iterations for uncertainty sampling, since on average 80% recall is obtained on training data alone by those iterations. For both categories, even uncertainty sampling, which is intended to improve the quality of the classifier, improves the batch precision over random sampling by a significant amount.

Table 1
Total review cost to reach 80% recall. Values are averaged over all topics for a data set and 20 replicates. Percentages show relative cost reduction over the random sample training baseline. A * indicates that the difference over the random sample training baseline is statistically significant with 99% confidence by a paired t-test with Bonferroni correction.

Batch size 100
# Train   ASKfm Random   ASKfm Relevance      ASKfm Uncertainty    Wikipedia Random   Wikipedia Relevance   Wikipedia Uncertainty
202       47685.53       *49833.73 ( -4.50)   *50273.21 ( -5.43)   52948.45           *60751.69 (-14.74)    52210.00 ( 1.39)
1002      46327.93       *43329.31 (  6.47)   *42723.12 (  7.78)   49010.71           52931.28 ( -8.00)     *39879.78 (18.63)
2002      45139.15       *38179.79 ( 15.42)   *37938.19 ( 15.95)   47805.25           46673.34 (  2.37)     *29387.06 (38.53)
3002      44148.28       *34909.72 ( 20.93)   *34719.50 ( 21.36)   47065.66           *38964.91 ( 17.21)    *25676.82 (45.44)
4002      43731.25       *33439.69 ( 23.53)   *32795.05 ( 25.01)   47234.75           *34408.14 ( 27.16)    *24202.29 (48.76)
5002      43469.91       *32261.33 ( 25.78)   *31957.57 ( 26.48)   47125.79           *31267.88 ( 33.65)    *22746.94 (51.73)
6002      42973.85       *31767.73 ( 26.08)   *31384.51 ( 26.97)   47300.02           *28945.59 ( 38.80)    *21922.42 (53.65)
7002      42563.09       *30567.00 ( 28.18)   *30502.95 ( 28.33)   47086.42           *27356.89 ( 41.90)    *21301.92 (54.76)
8002      42385.43       *30708.85 ( 27.55)   *30441.77 ( 28.18)   47106.34           *25949.51 ( 44.91)    *21144.28 (55.11)

Batch size 200
# Train   ASKfm Random   ASKfm Relevance      ASKfm Uncertainty    Wikipedia Random   Wikipedia Relevance   Wikipedia Uncertainty
202       47685.53       *49302.36 ( -3.39)   *49339.93 ( -3.47)   52948.45           *58866.41 (-11.18)    55747.35 (-5.29)
1002      46327.93       45014.51 (  2.84)    44733.10 (  3.44)    49010.71           *55302.14 (-12.84)    *42896.71 (12.47)
2002      45139.15       *40473.12 ( 10.34)   *39894.98 ( 11.62)   47805.25           49968.88 ( -4.53)     *33981.56 (28.92)
3002      44148.28       *37050.02 ( 16.08)   *36902.63 ( 16.41)   47065.66           42521.55 (  9.65)     *28332.55 (39.80)
4002      43731.25       *35310.13 ( 19.26)   *34888.22 ( 20.22)   47234.75           *37492.98 ( 20.62)    *25667.95 (45.66)
5002      43469.91       *33690.33 ( 22.50)   *33519.15 ( 22.89)   47125.79           *34933.90 ( 25.87)    *24070.44 (48.92)
6002      42973.85       *32425.25 ( 24.55)   *32612.13 ( 24.11)   47300.02           *33004.90 ( 30.22)    *22839.39 (51.71)
7002      42563.09       *31488.77 ( 26.02)   *31813.08 ( 25.26)   47086.42           *31664.04 ( 32.75)    *22084.88 (53.10)
8002      42385.43       *31198.75 ( 26.39)   *31171.80 ( 26.46)   47106.34           *29346.76 ( 37.70)    *21837.84 (53.64)

Figure 3: Precision in each batch for TAR workflows on Wikipedia Attack (p = 0.1344) and ASKfm Curse Exclusion (p = 0.0169) classifications. The x-axis shows the iteration number. A lighter color in an iteration block indicates higher precision.

6. Summary and Future Work

Our results suggest that TAR workflows developed for legal review tasks may substantially reduce costs for content moderation tasks. Other legal workflow techniques, such as routing near duplicates and conversational threads in batches to the same reviewer, may be worth testing as well.

This preliminary experiment omitted complexities that should be explored in more detailed studies. Both content moderation and legal cases involve (at different time scales) a streaming collection of data, and concomitant constraints on the time available to make a review decision. Batching and prioritization must reflect these constraints. Moderation in addition must deal with temporal variation in both textual content and the definitions of sensitive content, as well as scaling across many languages and cultures. As litigation and investigations become more international, these challenges may be faced in the law as well, providing opportunity for the legal and moderation fields to learn from each other.

References
[1] N. Duarte, E. Llanso, A. Loup, Mixed messages? The limits of automated social media content analysis, in: Conference on Fairness, Accountability and Transparency, PMLR, 2018, pp. 106–106.
[2] J. A. Gallo, C. Y. Cho, Social Media: Misinformation and Content Moderation Issues for Congress, Technical Report R46662, Congressional Research Service, 2021. URL: https://crsreports.congress.gov/product/pdf/R/R46662.
[3] M. Ruckenstein, L. L. M. Turunen, Re-humanizing the platform: Content moderators and the logic of care, New Media & Society (2019) 1461444819875990.
[4] Cambridge Consultants, Use of AI in Online Content Moderation, 2019. URL: https://www.ofcom.org.uk/__data/assets/pdf_file/0028/157249/cambridge-consultants-ai-content-moderation.pdf.
[5] R. Gorwa, R. Binns, C. Katzenbach, Algorithmic content moderation: Technical and political challenges in the automation of platform governance, Big Data & Society 7 (2020) 2053951719897945.
[6] T. Gillespie, Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media, Yale University Press, 2018. URL: https://books.google.com/books?id=cOJgDwAAQBAJ.
[7] Fact.MR, Content Moderation Solutions Market Forecast, Trend Analysis & Competition Tracking - Global Market Insights 2019 to 2029, Technical Report FACT4522MR, Fact.MR, 2020. URL: https://www.factmr.com/report/4522/content-moderation-solutions-market.
[8] M. Surguy, International E-discovery: A Global Handbook of Law and Technology, Global Law and Business Limited, 2018. URL: https://books.google.com/books?id=pfK3swEACAAJ.
[9] E. Yang, D. D. Lewis, O. Frieder, On minimizing cost in legal document review workflows, in: Proceedings of the 21st ACM Symposium on Document Engineering, 2021.
[10] M. R. Grossman, G. V. Cormack, Quantifying Success: Using Data Science to Measure the Accuracy of Technology-Assisted Review in Electronic Discovery, in: Data-Driven Law: Data Analytics and the New Legal Services, CRC Press, 2018, pp. 127–152.
[11] J. Baron, R. Losey, M. Berman, Perspectives on Predictive Coding: And Other Advanced Search Methods for the Legal Practitioner, American Bar Association, Section of Litigation, 2016. URL: https://books.google.com/books?id=TdJ2AQAACAAJ.
[12] J. Tredennick, TAR for Smart People, Catalyst Repository Systems, 2015.
[13] D. W. Oard, J. R. Baron, B. Hedin, D. D. Lewis, S. Tomlinson, Evaluation of information retrieval for e-discovery, Artificial Intelligence and Law 18 (2010) 347–386.
[14] A. Roegiest, G. V. Cormack, M. R. Grossman, C. Clarke, TREC 2015 total recall track overview, in: TREC, 2015.
[15] M. R. Grossman, G. V. Cormack, A. Roegiest, TREC 2016 total recall track overview, in: TREC, 2016.
[16] J. R. Baron, N. Payne, Dark archives and edemocracy: strategies for overcoming access barriers to the public record archives of the future, in: 2017 Conference for E-Democracy and Open Government (CeDEM), IEEE, 2017, pp. 3–11.
[17] I. J. Marshall, B. C. Wallace, Toward systematic review automation: a practical guide to using machine learning tools in research synthesis, Systematic Reviews 8 (2019) 163.
[18] B. C. Wallace, T. A. Trikalinos, J. Lau, C. Brodley, C. H. Schmid, Semi-automated screening of biomedical citations for systematic reviews, BMC Bioinformatics 11 (2010) 55.
[19] V. Bekkers, A. Edwards, D. de Kool, Social media monitoring: Responsive governance in the shadow of surveillance?, Government Information Quarterly 30 (2013) 335–342.
[20] A. Veglis, Moderation techniques for social media content, in: International Conference on Social Computing and Social Media, Springer, 2014, pp. 137–148.
[21] K. Langvardt, Regulating online content moderation, The Georgetown Law Journal 106 (2017) 1353.
[22] S. MacAvaney, H.-R. Yao, E. Yang, K. Russell, N. Goharian, O. Frieder, Hate speech detection: Challenges and solutions, PloS One 14 (2019) e0221152.
[23] T. Xiang, S. MacAvaney, E. Yang, N. Goharian, ToxCCIn: Toxic content classification with interpretability, in: 11th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, 2021. URL: https://arxiv.org/abs/2103.01328.
[24] A. Holland, C. Bavitz, J. Hermes, A. Sellars, R. Budish, M. Lambert, N. Decoster, Intermediary liability in the United States, Network of Centers–Publixphere (2014).
[25] A. Halevy, C. C. Ferrer, H. Ma, U. Ozertem, P. Pantel, M. Saeidi, F. Silvestri, V. Stoyanov, Preserving integrity in online social networks, arXiv preprint arXiv:2009.10311 (2020).
[26] S. Akhtar, V. Basile, V. Patti, Modeling annotator perspective and polarized opinions to improve hate speech detection, in: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, volume 8, 2020, pp. 151–154.
[27] M. Sap, D. Card, S. Gabriel, Y. Choi, N. A. Smith, The risk of racial bias in hate speech detection, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1668–1678.
[28] J. Pavlopoulos, P. Malakasiotis, I. Androutsopoulos, Deeper attention to abusive user content moderation, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 1125–1135.
[29] C. Van Hee, G. Jacobs, C. Emmery, B. Desmet, E. Lefever, B. Verhoeven, G. De Pauw, W. Daelemans, V. Hoste, Automatic detection of cyberbullying in social media text, PloS One 13 (2018) e0203794.
[30] K. Reynolds, A. Kontostathis, L. Edwards, Using machine learning to detect cyberbullying, in: 2011 10th International Conference on Machine Learning and Applications and Workshops, volume 2, IEEE, 2011, pp. 241–244.
[31] A. Schmidt, M. Wiegand, A survey on hate speech detection using natural language processing, in: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, 2017, pp. 1–10.
[32] E. Wulczyn, N. Thain, L. Dixon, Ex machina: Personal attacks seen at scale, in: Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2017, pp. 1391–1399.
[33] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Eleventh International AAAI Conference on Web and Social Media, 2017.
[34] N. Djuric, J. Zhou, R. Morris, M. Grbovic, V. Radosavljevic, N. Bhamidipati, Hate speech detection with comment embeddings, in: Proceedings of the 24th International Conference on World Wide Web, ACM, 2015, pp. 29–30.
[35] P. Fortuna, S. Nunes, A survey on automatic detection of hate speech in text, ACM Computing Surveys (CSUR) 51 (2018) 1–30.
[36] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, Y. Chang, Abusive language detection in online user content, in: Proceedings of the 25th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2016, pp. 145–153.
[37] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Predicting the type and target of offensive posts in social media, arXiv preprint arXiv:1902.09666 (2019).
[38] R. Kumar, A. N. Reganti, A. Bhatia, T. Maheshwari, Aggression-annotated corpus of Hindi-English code-mixed data, in: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018), 2018.
[39] G. K. Pitsilis, H. Ramampiaro, H. Langseth, Detecting offensive language in tweets using deep learning, arXiv preprint arXiv:1801.04433 (2018).
[40] S. Sotudeh, T. Xiang, H.-R. Yao, S. MacAvaney, E. Yang, N. Goharian, O. Frieder, GUIR at SemEval-2020 Task 12: Domain-tuned contextualized models for offensive language detection, arXiv preprint arXiv:2007.14477 (2020).
[41] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, SemEval-2019 Task 6: Identifying and categorizing offensive language in social media (OffensEval), arXiv preprint arXiv:1903.08983 (2019).
[42] M. Zampieri, P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, Ç. Çöltekin, SemEval-2020 Task 12: Multilingual offensive language identification in social media (OffensEval 2020), arXiv preprint arXiv:2006.07235 (2020).
[43] R. Binns, M. Veale, M. Van Kleek, N. Shadbolt, Like trainer, like bot? Inheritance of bias in algorithmic content moderation, in: International Conference on Social Informatics, Springer, 2017, pp. 405–415.
[44] L. Dixon, J. Li, J. Sorensen, N. Thain, L. Vasserman, Measuring and mitigating unintended bias in text classification, in: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 2018, pp. 67–73.
[45] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, A. Galstyan, A survey on bias and fairness in machine learning, arXiv preprint arXiv:1908.09635 (2019).
[46] D. Link, B. Hellingrath, J. Ling, A human-is-the-loop approach for semi-automated content moderation, in: ISCRAM, 2016.
[47] N. M. Pace, L. Zakaras, Where the money goes: Understanding litigant expenditures for producing electronic discovery, RAND Corporation, 2012.
[48] M. Bagdouri, W. Webber, D. D. Lewis, D. W. Oard, Towards minimizing the annotation cost of certified text classification, in: CIKM 2013, ACM, 2013, pp. 989–998.
[49] G. V. Cormack, M. R. Grossman, Autonomy and reliability of continuous active learning for technology-assisted review, arXiv preprint arXiv:1504.06868 (2015).
[50] B. Settles, Active learning literature survey (2009).
[51] J. Rocchio, Relevance feedback in information retrieval, The SMART Retrieval System: Experiments in Automatic Document Processing (1971) 313–323.
[52] D. D. Lewis, W. A. Gale, A sequential algorithm for training text classifiers, in: SIGIR 1994, 1994, pp. 3–12.
[53] G. V. Cormack, M. R. Grossman, Engineering Quality and Reliability in Technology-Assisted Review, in: SIGIR, ACM Press, Pisa, Italy, 2016, pp. 75–84. URL: http://dl.acm.org/citation.cfm?doid=2911451.2911510. doi:10.1145/2911451.2911510.
[54] D. D. Lewis, E. Yang, O. Frieder, Certifying one-phase technology-assisted reviews (2021).
[55] E. Yang, D. D. Lewis, O. Frieder, Heuristic stopping rules for technology-assisted review, in: Proceedings of the 21st ACM Symposium on Document Engineering, 2021.
[56] D. Li, E. Kanoulas, When to stop reviewing in technology-assisted reviews: Sampling from an adaptive distribution to estimate residual relevant documents, ACM Transactions on Information Systems (TOIS) 38 (2020) 1–36.
[57] G. V. Cormack, M. R. Grossman, Evaluation of machine-learning protocols for technology-assisted review in electronic discovery, SIGIR 2014 (2014) 153–162. doi:10.1145/2600428.2609601.
[58] Y.-Y. Yang, S.-C. Lee, Y.-A. Chung, T.-E. Wu, S.-A. Chen, H.-T. Lin, libact: Pool-based Active Learning in Python, Technical Report, National Taiwan University, 2017. URL: https://github.com/ntucllab/libact, available as arXiv preprint https://arxiv.org/abs/1710.00379.