TAR on Social Media: A Framework for Online Content Moderation

Eugene Yang1, David D. Lewis2 and Ophir Frieder3
1 IR Lab, Georgetown University, Washington, DC, USA
2 Reveal Brainspace, Chicago, IL, USA
3 IR Lab, Georgetown University, Washington, DC, USA

DESIRES 2021 – 2nd International Conference on Design of Experimental Search & Information REtrieval Systems, September 15–18, 2021, Padua, Italy
eugene@ir.cs.georgetown.edu (E. Yang); desires2021paper@davelewis.com (D. D. Lewis); ophir@ir.cs.georgetown.edu (O. Frieder)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Abstract
Content moderation (removing or limiting the distribution of posts based on their contents) is one tool social networks use to fight problems such as harassment and disinformation. Manually screening all content is usually impractical given the scale of social media data, and the need for nuanced human interpretations makes fully automated approaches infeasible. We consider content moderation from the perspective of technology-assisted review (TAR): a human-in-the-loop active learning approach developed for high recall retrieval problems in civil litigation and other fields. We show how TAR workflows, and a TAR cost model, can be adapted to the content moderation problem. We then demonstrate on two publicly available content moderation data sets that a TAR workflow can reduce moderation costs by 20% to 55% across a variety of conditions.

Keywords
Technology-assisted review, active learning, social media, content moderation, cost analysis

1. Introduction

Online social networks are powerful platforms for personal communication, community building, and free expression. Unfortunately, they can also be powerful platforms for harassment, disinformation, and perpetration of criminal and terrorist activities. Organizations hosting social networks, such as Facebook, Twitter, Reddit, and others, have deployed a range of techniques to counteract these threats and maintain a safe and respectful environment for their users.

One such approach is content moderation: removal (hard moderation) or demotion (soft moderation) of policy-violating posts [1, 2]. Despite recent progress in machine learning, online content moderation still relies heavily on human review [3]. Facebook's CEO Mark Zuckerberg stated that language nuances could get lost when relying on automated detection approaches, emphasizing the necessity of human judgment.1 Ongoing changes in what is considered inappropriate content further complicate the use of machine learning [4]. Policy experts have argued that complete automation of content moderation is socially undesirable regardless of algorithmic accuracy [5].

It is thus widely believed that both human moderation and automated classification will be required for online content moderation for the foreseeable future [1, 5, 6]. This has meant not just capital investments in machine learning tools for moderation, but also massive ongoing personnel expenses for teams of human reviewers [7].

Surprisingly, the challenge of reducing costs when both machine learning and manual review are necessary has been an active area of interest for almost two decades, but in a completely different area: civil litigation. Electronic discovery (eDiscovery) projects involve teams of attorneys, sometimes billing the equivalent of hundreds of euros per person-hour, seeking to find documents responsive to a legal matter [8]. As the volume of electronically produced documents grew, machine learning began to be integrated into eDiscovery workflows in the early 2000s, a history we review elsewhere [9].

The result in the legal world has been technology-assisted review (TAR): human-in-the-loop active learning workflows that prioritize the most important documents for review [10, 11]. One-phase (continuous model refinement) and two-phase (with separate training and deployment phases) TAR workflows are both in use [9, 12].

1 https://www.businessinsider.com/zuckerberg-nuances-content-moderation-ai-misinformation-hearing-2021-3
Because of the need to find most or all relevant documents, eDiscovery has been referred to as a high recall review (HRR) problem [13, 14, 15]. HRR problems also arise in systematic reviews in medicine, sunshine law requests, and other tasks [16, 17, 18]. Online content moderation is an HRR problem as well, in that a very high proportion of inappropriate content should be identified and removed.

Our contributions in this paper are two-fold. First, we describe how to adapt TAR and its cost-based evaluation framework to the content moderation problem. Second, we test this approach using two publicly available content moderation datasets. Our experiments show substantial cost reductions using the proposed TAR framework over both manual review of unprioritized documents and training of prioritized models on random samples.

2. Background

Content moderation on online platforms is a necessity [19, 20] and has been argued by some to be the defining feature of an online platform [6]. Despite terms of service and community rules on each platform, users produce inappropriate content, particularly when anonymous [21]. Inappropriate content includes toxic content such as hate speech [22], offensive content [23], and mis- and disinformation [4, 23]. It also includes content that is inappropriate for legal or commercial reasons, such as potential copyright violations [5, 24].
The identification of toxic content can require subtle human insight [4, 22], both due to attempts at obfuscation by posters, and because the inappropriateness of the content is often tied to its cultural, regional, and temporal context [1, 3]. Mis- and disinformation often consist of subtle mixtures of truthful and misleading content whose detection requires human common sense inferences and other background knowledge [4, 23].

Social media organizations have deployed numerous techniques for implementing community policies, including graph- and time-based analyses of communication patterns, user profile information, and others [25]. Our focus here, however, is on methods that use the content of a post.

Content monitoring falls into three categories: manual moderation, text classification, and human-in-the-loop methods. The latter two approaches leverage machine learning models and are sometimes collectively referred to as algorithmic content moderation in policy research [5].

Manual moderation is the oldest approach, dating back to email mailing lists. It is, however, extremely expensive at the scale of large social networks and suffers from potential human biases. Additionally, mental health concerns are an issue for moderators exposed to large volumes of toxic content [25, 26, 27].

The simplest text classification approaches are keyword filters, but these are susceptible to embarrassing mistakes2 and countermeasures by content creators. More effective text classification approaches to content moderation are based on supervised machine learning [28, 29]. Content types that have been addressed include cyberbullying [29, 30, 32], hate speech [22, 31, 33, 34, 35, 36], and offensive language in general [23, 37, 38, 39, 40, 41, 42].

However, some moderation judgments are inevitably too subtle for purely automated methods3, particularly when content is generated with the intent of fooling automated systems [1, 25, 43]. Content that is recontextualized from its original problematic setting, for example through reposting, screenshotting, or embedding in new contexts, further complicates moderation [2]. Additionally, bias in automated systems can arise both from learning from biased labels and from numerous other choices in data preparation and algorithmic settings [27, 44, 45]. Biased models risk further marginalizing and disproportionately censoring groups that already face discrimination [1]. Differences in cultural and regulatory contexts further complicate the definition of appropriateness, creating another dimension of complexity when deploying automated content moderation [4].

Human-in-the-loop approaches, where AI systems actively manage which materials are brought to the attention of human moderators, attempt to address the weaknesses of both approaches while gathering training data to support supervised learning components [25, 46]. Platforms use filtering mechanisms that proactively present only approved content (pre-moderation) and/or removal mechanisms that take down inappropriate content after the fact, depending on the intensity of the problem [4]. Reviewing protocols could shift from one to the other based on the frequency of violations or during a specific event, such as elections4. Regardless of the workflow, the core and arguably most critical component is human review. However, the primary research focus of human-in-the-loop content moderation has been on classification algorithm design and bias mitigation, rarely on the investigation of the overall workflow.

Like content moderation, eDiscovery is a high recall retrieval task applied to large bodies of primarily textual content (typically enterprise documents, email, and chat) [11, 12]. Both fixed data set and streaming task structures have been explored, though the streaming context tends to be bursty (e.g., all data from a single person arriving at once) rather than continuous. Since cost minimization is a primary rationale for TAR [47], research on TAR has focused on training regimens and workflows for minimizing the number, or more generally the cost, of documents reviewed [9, 12]. A new TAR approach is typically evaluated for its ability to meet an effectiveness target while minimizing cost, or to meet a cost target while maximizing effectiveness [18, 48, 49]. This makes approaches developed for TAR natural to consider for content moderation.

2 https://www.techdirt.com/articles/20200912/11133045288/paypal-blocks-purchases-tardigrade-merchandise-potentially-violating-us-sanctions-laws.shtml
3 https://venturebeat.com/2020/05/23/ai-proves-its-a-poor-substitute-for-human-content-checkers-during-lockdown/
4 https://www.washingtonpost.com/technology/2020/11/07/facebook-groups-election/
3. Applying TAR to Content Moderation

In most TAR applications, at least a few documents of the (usually rare) category of interest are available at the start of the workflow. These are used to initialize an iterative pool-based active learning workflow [50]. Reviewed documents are used to train a predictive model, which in turn is used to select further documents based on predicted relevance [51], uncertainty [52], or composite factors. Workflows may be batch-oriented (mimicking pre-machine learning manual workflows common in the law), or a stream of documents may be presented through an interactive interface with training done in the background. These active learning workflows have almost completely displaced training from random examples when supervised learning is used in eDiscovery.

Two workflow styles can be distinguished [9]. In a one-phase workflow, iterative review and training simply continues until a stopping rule is triggered [49, 53, 54]. Stopping may be conditioned on estimated effectiveness (usually recall), cost limits, and other factors [53, 55, 56]. Two-phase workflows stop training before review is finished, and deploy the final trained classifier to rank the remaining documents for review. The reviewed documents are typically drawn from the top of the ranking, with the depth in the ranking chosen so that an estimated effectiveness target is reached [18, 48]. Two-phase workflows are favored when labeling of training data needs to be done by more expensive personnel than are necessary for routine review.

The cost of both one- and two-phase TAR workflows can be captured in a common cost model [9]. The model defines the total cost of a one-phase review terminated at a particular point as the cost incurred in reviewing documents to that point, plus a penalty if the desired effectiveness target (e.g., a minimum recall value) has not been met. The penalty is simply the cost of continuing on to an optimal second-phase review from that point, i.e., the minimum number of prioritized documents is reviewed to hit the effectiveness target. For a two-phase workflow, we similarly define total cost to be the cost of the training phase plus the cost of an optimal second phase using the final trained model.

These costs in both cases are idealizations in that there may be additional cost (e.g., a labeled random sample) to choose a phase-two cutoff. However, the model allows a wide range of workflows to be compared on a common basis, as well as allowing differential costs for review of positive vs. negative documents, or phase-one vs. phase-two documents.

While developed for eDiscovery, the above cost model is also a good fit for content moderation. As discussed in the previous section, the human-in-the-loop moderation approaches used in social media are complex, but in the end reduce to some combination of machine-assisted manual decisions (phase one) and automated decisions based on deploying a trained model (phase two). Operational decisions such as flagging and screening all posts from an account, or massive reviewing of posts related to certain events [4, 6], are all results of applying previously trained models, which is also a form of deployment. Broadly applying the model to filter content also vastly reduces the moderation burden when similar content is rapidly being published on the platform, at the risk of false removals [4]. We do not claim this simplified model is optimal for evaluating content moderation; rather, it is an initial effort at modeling the human-in-the-loop moderation process.

When applying the model to content moderation, however, we assume uniform review costs for all documents. This seems the best assumption given the short length of texts reviewed and what is known publicly about the cost structure of moderation [6].

In the next section, we describe our experimental setting for adapting and evaluating TAR for content moderation.
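To make the cost accounting concrete, the following is a minimal Python sketch of the idealized total-cost computation under the assumptions above (uniform unit review cost, a fixed recall target, and oracle labels for the remaining documents). The function and variable names are ours for illustration; this is not the authors' released code.

# Illustrative sketch only: idealized total cost of a review stopped after some
# documents have been reviewed, assuming a uniform review cost of 1 per document.
def total_cost(reviewed_labels, ranked_unreviewed_labels, recall_target=0.8):
    # reviewed_labels: True/False labels of documents reviewed so far.
    # ranked_unreviewed_labels: labels of the remaining documents, sorted by the
    # current model's estimated probability of relevance (highest first).
    found = sum(reviewed_labels)
    needed = recall_target * (found + sum(ranked_unreviewed_labels))
    penalty = 0                      # optimal phase-two review down the ranking
    for is_positive in ranked_unreviewed_labels:
        if found >= needed:
            break
        penalty += 1
        found += is_positive
    return len(reviewed_labels) + penalty

Differential costs for positive vs. negative documents, or for phase-one vs. phase-two review, which the model also supports, would replace the unit increments above with per-document cost weights.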
4. Experiment Design

Here we review the data sets, evaluation metric, and implementation details for our experiment.

4.1. Data Sets

We used two fully labeled and publicly available content moderation data sets with a focus on inappropriate user-generated content. The Wikipedia personal attack data set [32] consists of 115,737 Wikipedia discussion comments with labels obtained via crowdsourcing. An example comment is presented in Figure 1(a). Eight annotators assigned one of five mutually exclusive labels to each document: Recipient Target, Third Party Target, Quotation Attack, Other Attack, and No Attack (our names). We defined three binary classification tasks corresponding to distinguishing Recipient Target, Third Party Target, or Other Attack from all other classes. (Quotation Attack had too low a prevalence.) A fourth binary classification task distinguished the union of all attacks from No Attack. A document was a positive example if 5 or more annotators put it in the positive class. The proportion of the positive class ranged from 13.44% to 0.18%.

The ASKfm cyberbullying dataset [29] contains 61,232 English utterance/response pairs, each of which we treated as a single document. An example conversation is presented in Figure 1(b). Linguists annotated both the poster and responder with zero or one of four mutually exclusive cyberbullying roles, as well as annotating the pair as a whole for any combination of 15 types of textual expressions related to cyberbullying. We treated these annotations as defining 23 binary classifications for a pair, with prevalence of the positive examples ranging from 4.63% to 0.04%.

(a) Wikipedia collection:
shut up mind your own business and go f*** some one else over

(b) ASKfm collection:
: being in love with a girl you dont even know yours is sadder
: f*** off you f***ing c***!

Figure 1: Example content in the collections.

For both data sets we refer to the binary classification tasks as topics and the units being classified as documents. Documents were tokenized by separating at punctuation and whitespace. Each distinct term became a feature. We used log tf weighting for the features of the underlying classification model: the value of a feature was 0 if the term was not present, and 1 + log(tf) otherwise, where tf is the number of occurrences of that term in the document.
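As an illustration of this representation, a minimal sketch of the tokenization and log-tf weighting just described (lowercasing and the natural logarithm are our own assumptions; they are not specified above):

import math
import re
from collections import Counter

def log_tf_features(text):
    # Split at punctuation and whitespace; weight each distinct term as 1 + log(tf).
    tokens = [t for t in re.split(r"[^\w]+", text.lower()) if t]
    return {term: 1.0 + math.log(tf) for term, tf in Counter(tokens).items()}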
4.2. Algorithms and Workflow

Our experiments simulated a typical TAR workflow. The first training round is a seed set consisting of one random positive example (simulating manual input) and one random negative example. At the end of each round, a logistic regression model was trained and applied to the unlabeled documents. The training batch for the next round was then selected by one of three methods: a random sampling baseline, uncertainty sampling [52], or relevance feedback (top-scoring documents) [51]. Variants of the latter two are widely used in eDiscovery [57]. Labels for the training batch were looked up, the batch was added to the training set, and a new model was trained to repeat the cycle. Batches of size 100 and 200 were used, and training continued for 80 and 40 iterations respectively, resulting in 8002 coded training documents at the end.

We implemented the TAR workflow in libact5 [58], an open-source framework for active learning experiments. We fit logistic regression models using Vowpal Wabbit6 with default parameter settings. Our experiment framework is available on GitHub7.

5 https://github.com/ntucllab/libact
6 https://vowpalwabbit.org/
7 https://github.com/eugene-yang/TAR-Content-Moderation
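For concreteness, the sketch below shows one round of this simulated workflow, with scikit-learn's logistic regression standing in for the libact plus Vowpal Wabbit implementation actually used; the function and variable names are ours, and labels are assumed to be encoded as 0/1.

import numpy as np
from sklearn.linear_model import LogisticRegression

def next_batch(X, labels, train_idx, strategy="uncertainty", batch_size=100):
    # Train on the currently coded documents and score the remaining pool.
    model = LogisticRegression().fit(X[train_idx], labels[train_idx])
    pool = np.setdiff1d(np.arange(X.shape[0]), train_idx)
    scores = model.predict_proba(X[pool])[:, 1]   # P(positive) for unreviewed docs

    if strategy == "relevance":        # relevance feedback: top-scoring documents
        order = np.argsort(-scores)
    elif strategy == "uncertainty":    # uncertainty sampling: scores closest to 0.5
        order = np.argsort(np.abs(scores - 0.5))
    else:                              # random sampling baseline
        order = np.random.permutation(len(pool))
    return pool[order[:batch_size]], model

Labels for the selected batch would then be looked up, the batch appended to the training set, and the loop repeated for 80 rounds of 100 documents (or 40 rounds of 200), with the cost metric described next evaluated after each round.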
4.3. Evaluation

Our metric was total cost to reach 80% recall as described in Section 3. This was computed at the end of each training round as the number of training documents, plus the ideal second-phase review cost as a penalty, which is the number of additional top-ranked documents (if any) needed to bring recall up to 80%. Ranking was based on sorting the non-training documents by probability of relevance using the most recent trained model. Note that we used 80% recall as an example; the TAR workflow can be run with an arbitrary recall target, such as the 95% common in systematic review [18, 56].

In actual TAR workflows, recall would be estimated from a labeled random sample. Since the cost of this sample would be constant across our experimental conditions, we used an oracle for recall instead.

5. Results and Analysis

Our core finding was that, as in eDiscovery, active selection of which documents to review reduces costs over random selection. Figure 2 shows mean cost to reach 80% recall over 20 replications (different seed sets and random samples) for six representative categories. On all six categories, all TAR workflows within a few iterations beat the baseline of reviewing a random 80% of the data set (horizontal line labeled Manual Review).

Figure 2: Total cost for TAR alternatives to identify 80% of positive documents for Wikipedia Attack, Other Attack, and Recipient Attack, and ASKfm Curse Exclusion, General Insult, and Sexism classifications. Values are averaged over 20 replicates, and a 99% confidence interval on costs is shown as shading around each curve. The horizontal line is the cost to review a random 80% of the data set.

The Wikipedia Attack category is typical of low to moderate prevalence categories (p = 0.1344). Uncertainty sampling strongly dominates both random sampling (too few positives chosen) and relevance feedback (too many redundant positives chosen for good training). Costs decrease uniformly with additional training. We plot 99% confidence intervals under the assumption that costs are normally distributed across replicates. Costs are not only higher for relevance feedback, but less predictable.

The ASKfm Curse Exclusion (p = 0.0169) and Wikipedia Other Attack (p = 0.0019) categories are typical low prevalence categories. Uncertainty sampling and relevance feedback act similarly in such circumstances: even top-scoring documents are at best uncertainly positive. Average cost across replicates levels off and starts to increase after 44 iterations for uncertainty sampling and 45 iterations for relevance feedback. This is the point at which additional training no longer pays for itself by improving the ranking of documents. For this category (and typically) this occurs shortly before 80% recall is reached on the training data alone (iteration 48 for uncertainty sampling and iteration 52 for relevance feedback).

Tasks such as the ASKfm Sexism category (p = 0.0030), which deal with nuances of human language, require more training data to produce a stable classifier. While obtaining training data by random sampling stops reducing the cost after the first iteration, uncertainty sampling and relevance feedback continue to take advantage of additional training data to minimize the cost and become more predictable.

Note that the general relationship between the prevalence of the task and the cost of reaching a certain recall target using TAR workflows is discussed in Yang et al. [9].

Table 1 looks more broadly at the two datasets, averaging costs both over all topics and over 20 replicate runs for each topic, for batch sizes of both 100 and 200. By 20 iterations with batch size 100 (2002 training documents), TAR workflows with both relevance feedback and uncertainty sampling significantly reduce costs versus TAR with random sampling. (Significance is based on paired t-tests assuming non-identical variances and making a Bonferroni correction for 72 tests.) All three TAR methods in turn dominate reviewing a random 80% of the dataset, which costs 92,590 for Wikipedia and 90,958 for ASKfm.
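As a rough illustration of this significance test (our own sketch; the exact test configuration used for Table 1 may differ), a per-topic comparison of replicate costs with a Bonferroni correction could be computed as:

from scipy import stats

def significantly_better(costs_baseline, costs_method, n_tests=72, alpha=0.01):
    # Paired t-test over the replicate costs, Bonferroni-corrected for n_tests comparisons.
    _, p_value = stats.ttest_rel(costs_baseline, costs_method)
    return p_value < alpha / n_tests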
The cost improvement plateaued after the training sets reached 5000 documents for ASKfm, but continued for Wikipedia. Categories in Wikipedia (p = 0.1344 to 0.0018) are generally more frequent compared to ASKfm (p = 0.0463 to 0.001), providing more advantage for training to identify more positive documents. A larger batch size slightly reduces the improvement, as the underlying classifiers are retrained less frequently. In practice, batch sizes depend on the cost structure of review and the specific workflows of each organization. However, as long as the classifiers are frequently updated with more coded documents, the total cost is reduced over the iterations.

Besides the overall cost reduction, Figure 3 shows a heatmap of mean precision across 20 replicates for batches 1 to 81 with batch size 100, to give insight into the moderator experience of TAR workflows. Precision for relevance feedback starts high and declines very gradually. Uncertainty sampling maintains relatively constant precision. For the very low prevalence category Curse Exclusion we cut off the heatmap at 52 iterations for relevance feedback and 48 iterations for uncertainty sampling, since on average 80% recall is obtained on training data alone by those iterations. For both categories, even uncertainty sampling, which is intended to improve the quality of the classifier, improves the batch precision over random sampling by a significant amount.

Table 1
Total review cost to reach 80% recall. Values are averaged over all topics for a data set and 20 replicates. Percentages show relative cost reduction over the random sample training baseline. A * indicates that the difference over the random sample training baseline is statistically significant with 99% confidence by a paired t-test with Bonferroni correction.

Batch size 100
# Train   ASKfm Random   ASKfm Relevance      ASKfm Uncertainty    Wikipedia Random   Wikipedia Relevance   Wikipedia Uncertainty
202       47685.53       *49833.73 ( -4.50)   *50273.21 ( -5.43)   52948.45           *60751.69 (-14.74)    52210.00 ( 1.39)
1002      46327.93       *43329.31 (  6.47)   *42723.12 (  7.78)   49010.71           52931.28 ( -8.00)     *39879.78 (18.63)
2002      45139.15       *38179.79 ( 15.42)   *37938.19 ( 15.95)   47805.25           46673.34 (  2.37)     *29387.06 (38.53)
3002      44148.28       *34909.72 ( 20.93)   *34719.50 ( 21.36)   47065.66           *38964.91 ( 17.21)    *25676.82 (45.44)
4002      43731.25       *33439.69 ( 23.53)   *32795.05 ( 25.01)   47234.75           *34408.14 ( 27.16)    *24202.29 (48.76)
5002      43469.91       *32261.33 ( 25.78)   *31957.57 ( 26.48)   47125.79           *31267.88 ( 33.65)    *22746.94 (51.73)
6002      42973.85       *31767.73 ( 26.08)   *31384.51 ( 26.97)   47300.02           *28945.59 ( 38.80)    *21922.42 (53.65)
7002      42563.09       *30567.00 ( 28.18)   *30502.95 ( 28.33)   47086.42           *27356.89 ( 41.90)    *21301.92 (54.76)
8002      42385.43       *30708.85 ( 27.55)   *30441.77 ( 28.18)   47106.34           *25949.51 ( 44.91)    *21144.28 (55.11)

Batch size 200
# Train   ASKfm Random   ASKfm Relevance      ASKfm Uncertainty    Wikipedia Random   Wikipedia Relevance   Wikipedia Uncertainty
202       47685.53       *49302.36 ( -3.39)   *49339.93 ( -3.47)   52948.45           *58866.41 (-11.18)    55747.35 (-5.29)
1002      46327.93       45014.51 (  2.84)    44733.10 (  3.44)    49010.71           *55302.14 (-12.84)    *42896.71 (12.47)
2002      45139.15       *40473.12 ( 10.34)   *39894.98 ( 11.62)   47805.25           49968.88 ( -4.53)     *33981.56 (28.92)
3002      44148.28       *37050.02 ( 16.08)   *36902.63 ( 16.41)   47065.66           42521.55 (  9.65)     *28332.55 (39.80)
4002      43731.25       *35310.13 ( 19.26)   *34888.22 ( 20.22)   47234.75           *37492.98 ( 20.62)    *25667.95 (45.66)
5002      43469.91       *33690.33 ( 22.50)   *33519.15 ( 22.89)   47125.79           *34933.90 ( 25.87)    *24070.44 (48.92)
6002      42973.85       *32425.25 ( 24.55)   *32612.13 ( 24.11)   47300.02           *33004.90 ( 30.22)    *22839.39 (51.71)
7002      42563.09       *31488.77 ( 26.02)   *31813.08 ( 25.26)   47086.42           *31664.04 ( 32.75)    *22084.88 (53.10)
8002      42385.43       *31198.75 ( 26.39)   *31171.80 ( 26.46)   47106.34           *29346.76 ( 37.70)    *21837.84 (53.64)

Figure 3: Precision in each batch for TAR workflows on Wikipedia Attack (p = 0.1344) and ASKfm Curse Exclusion (p = 0.0169) classifications. The x-axis shows the iteration number. A lighter color in an iteration block indicates higher precision.

6. Summary and Future Work

Our results suggest that TAR workflows developed for legal review tasks may substantially reduce costs for content moderation tasks. Other legal workflow techniques, such as routing near duplicates and conversational threads in batches to the same reviewer, may be worth testing as well.

This preliminary experiment omitted complexities that should be explored in more detailed studies. Both content moderation and legal cases involve (at different time scales) a streaming collection of data, and concomitant constraints on the time available to make a review decision. Batching and prioritization must reflect these constraints. Moderation in addition must deal with temporal variation in both textual content and the definitions of sensitive content, as well as scaling across many languages and cultures. As litigation and investigations become more international, these challenges may be faced in the law as well, providing opportunity for the legal and moderation fields to learn from each other.

References
[1] N. Duarte, E. Llanso, A. Loup, Mixed messages? The limits of automated social media content analysis, in: Conference on Fairness, Accountability and Transparency, PMLR, 2018, pp. 106–106.
[2] J. A. Gallo, C. Y. Cho, Social Media: Misinformation and Content Moderation Issues for Congress, Technical Report R46662, Congressional Research Service, 2021. URL: https://crsreports.congress.gov/product/pdf/R/R46662.
[3] M. Ruckenstein, L. L. M. Turunen, Re-humanizing the platform: Content moderators and the logic of care, New Media & Society (2019) 1461444819875990.
[4] Cambridge Consultants, Use of AI in Online Content Moderation, 2019. URL: https://www.ofcom.org.uk/__data/assets/pdf_file/0028/157249/cambridge-consultants-ai-content-moderation.pdf.
[5] R. Gorwa, R. Binns, C. Katzenbach, Algorithmic content moderation: Technical and political challenges in the automation of platform governance, Big Data & Society 7 (2020) 2053951719897945.
[6] T. Gillespie, Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media, Yale University Press, 2018. URL: https://books.google.com/books?id=cOJgDwAAQBAJ.
[7] Fact.MR, Content Moderation Solutions Market Forecast, Trend Analysis & Competition Tracking - Global Market Insights 2019 to 2029, Technical Report FACT4522MR, Fact.MR, 2020. URL: https://www.factmr.com/report/4522/content-moderation-solutions-market.
[8] M. Surguy, International E-discovery: A Global Handbook of Law and Technology, Global Law and Business Limited, 2018. URL: https://books.google.com/books?id=pfK3swEACAAJ.
[9] E. Yang, D. D. Lewis, O. Frieder, On minimizing cost in legal document review workflows, in: Proceedings of the 21st ACM Symposium on Document Engineering, 2021.
[10] M. R. Grossman, G. V. Cormack, Quantifying Success: Using Data Science to Measure the Accuracy of Technology-Assisted Review in Electronic Discovery, in: Data-Driven Law: Data Analytics and the New Legal Services, CRC Press, 2018, pp. 127–152.
[11] J. Baron, R. Losey, M. Berman, Perspectives on Predictive Coding: And Other Advanced Search Methods for the Legal Practitioner, American Bar Association, Section of Litigation, 2016. URL: https://books.google.com/books?id=TdJ2AQAACAAJ.
[12] J. Tredennick, TAR for Smart People, Catalyst Repository Systems, 2015.
[13] D. W. Oard, J. R. Baron, B. Hedin, D. D. Lewis, S. Tomlinson, Evaluation of information retrieval for e-discovery, Artificial Intelligence and Law 18 (2010) 347–386.
[14] A. Roegiest, G. V. Cormack, M. R. Grossman, C. Clarke, TREC 2015 total recall track overview, in: TREC, 2015.
[15] M. R. Grossman, G. V. Cormack, A. Roegiest, TREC 2016 total recall track overview, in: TREC, 2016.
[16] J. R. Baron, N. Payne, Dark archives and edemocracy: strategies for overcoming access barriers to the public record archives of the future, in: 2017 Conference for E-Democracy and Open Government (CeDEM), IEEE, 2017, pp. 3–11.
[17] I. J. Marshall, B. C. Wallace, Toward systematic review automation: a practical guide to using machine learning tools in research synthesis, Systematic Reviews 8 (2019) 163.
[18] B. C. Wallace, T. A. Trikalinos, J. Lau, C. Brodley, C. H. Schmid, Semi-automated screening of biomedical citations for systematic reviews, BMC Bioinformatics 11 (2010) 55.
[19] V. Bekkers, A. Edwards, D. de Kool, Social media monitoring: Responsive governance in the shadow of surveillance?, Government Information Quarterly 30 (2013) 335–342.
[20] A. Veglis, Moderation techniques for social media content, in: International Conference on Social Computing and Social Media, Springer, 2014, pp. 137–148.
[21] K. Langvardt, Regulating online content moderation, The Georgetown Law Journal 106 (2017) 1353.
[22] S. MacAvaney, H.-R. Yao, E. Yang, K. Russell, N. Goharian, O. Frieder, Hate speech detection: Challenges and solutions, PloS One 14 (2019) e0221152.
[23] T. Xiang, S. MacAvaney, E. Yang, N. Goharian, ToxCCIn: Toxic content classification with interpretability, in: 11th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, 2021. URL: https://arxiv.org/abs/2103.01328.
[24] A. Holland, C. Bavitz, J. Hermes, A. Sellars, R. Budish, M. Lambert, N. Decoster, Intermediary liability in the United States, Network of Centers–Publixphere (2014).
[25] A. Halevy, C. C. Ferrer, H. Ma, U. Ozertem, P. Pantel, M. Saeidi, F. Silvestri, V. Stoyanov, Preserving integrity in online social networks, arXiv preprint arXiv:2009.10311 (2020).
[26] S. Akhtar, V. Basile, V. Patti, Modeling annotator perspective and polarized opinions to improve hate speech detection, in: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, volume 8, 2020, pp. 151–154.
[27] M. Sap, D. Card, S. Gabriel, Y. Choi, N. A. Smith, The risk of racial bias in hate speech detection, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1668–1678.
[28] J. Pavlopoulos, P. Malakasiotis, I. Androutsopoulos, Deeper attention to abusive user content moderation, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 1125–1135.
[29] C. Van Hee, G. Jacobs, C. Emmery, B. Desmet, E. Lefever, B. Verhoeven, G. De Pauw, W. Daelemans, V. Hoste, Automatic detection of cyberbullying in social media text, PloS One 13 (2018) e0203794.
[30] K. Reynolds, A. Kontostathis, L. Edwards, Using machine learning to detect cyberbullying, in: 2011 10th International Conference on Machine Learning and Applications and Workshops, volume 2, IEEE, 2011, pp. 241–244.
[31] A. Schmidt, M. Wiegand, A survey on hate speech detection using natural language processing, in: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, 2017, pp. 1–10.
[32] E. Wulczyn, N. Thain, L. Dixon, Ex machina: Personal attacks seen at scale, in: Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2017, pp. 1391–1399.
[33] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Eleventh International AAAI Conference on Web and Social Media, 2017.
[34] N. Djuric, J. Zhou, R. Morris, M. Grbovic, V. Radosavljevic, N. Bhamidipati, Hate speech detection with comment embeddings, in: Proceedings of the 24th International Conference on World Wide Web, ACM, 2015, pp. 29–30.
[35] P. Fortuna, S. Nunes, A survey on automatic detection of hate speech in text, ACM Computing Surveys (CSUR) 51 (2018) 1–30.
[36] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, Y. Chang, Abusive language detection in online user content, in: Proceedings of the 25th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2016, pp. 145–153.
[37] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Predicting the type and target of offensive posts in social media, arXiv preprint arXiv:1902.09666 (2019).
[38] R. Kumar, A. N. Reganti, A. Bhatia, T. Maheshwari, Aggression-annotated corpus of Hindi-English code-mixed data, in: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018), 2018.
[39] G. K. Pitsilis, H. Ramampiaro, H. Langseth, Detecting offensive language in tweets using deep learning, arXiv preprint arXiv:1801.04433 (2018).
[40] S. Sotudeh, T. Xiang, H.-R. Yao, S. MacAvaney, E. Yang, N. Goharian, O. Frieder, GUIR at SemEval-2020 Task 12: Domain-tuned contextualized models for offensive language detection, arXiv preprint arXiv:2007.14477 (2020).
[41] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, SemEval-2019 Task 6: Identifying and categorizing offensive language in social media (OffensEval), arXiv preprint arXiv:1903.08983 (2019).
[42] M. Zampieri, P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, Ç. Çöltekin, SemEval-2020 Task 12: Multilingual offensive language identification in social media (OffensEval 2020), arXiv preprint arXiv:2006.07235 (2020).
[43] R. Binns, M. Veale, M. Van Kleek, N. Shadbolt, Like trainer, like bot? Inheritance of bias in algorithmic content moderation, in: International Conference on Social Informatics, Springer, 2017, pp. 405–415.
[44] L. Dixon, J. Li, J. Sorensen, N. Thain, L. Vasserman, Measuring and mitigating unintended bias in text classification, in: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 2018, pp. 67–73.
[45] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, A. Galstyan, A survey on bias and fairness in machine learning, arXiv preprint arXiv:1908.09635 (2019).
[46] D. Link, B. Hellingrath, J. Ling, A human-is-the-loop approach for semi-automated content moderation, in: ISCRAM, 2016.
[47] N. M. Pace, L. Zakaras, Where the money goes: Understanding litigant expenditures for producing electronic discovery, RAND Corporation, 2012.
[48] M. Bagdouri, W. Webber, D. D. Lewis, D. W. Oard, Towards minimizing the annotation cost of certified text classification, in: CIKM 2013, ACM, 2013, pp. 989–998.
[49] G. V. Cormack, M. R. Grossman, Autonomy and reliability of continuous active learning for technology-assisted review, arXiv preprint arXiv:1504.06868 (2015).
[50] B. Settles, Active learning literature survey (2009).
[51] J. Rocchio, Relevance feedback in information retrieval, The SMART Retrieval System: Experiments in Automatic Document Processing (1971) 313–323.
[52] D. D. Lewis, W. A. Gale, A sequential algorithm for training text classifiers, in: SIGIR 1994, 1994, pp. 3–12.
[53] G. V. Cormack, M. R. Grossman, Engineering Quality and Reliability in Technology-Assisted Review, in: SIGIR, ACM Press, Pisa, Italy, 2016, pp. 75–84. URL: http://dl.acm.org/citation.cfm?doid=2911451.2911510. doi:10.1145/2911451.2911510.
[54] D. D. Lewis, E. Yang, O. Frieder, Certifying one-phase technology-assisted reviews (2021).
[55] E. Yang, D. D. Lewis, O. Frieder, Heuristic stopping rules for technology-assisted review, in: Proceedings of the 21st ACM Symposium on Document Engineering, 2021.
[56] D. Li, E. Kanoulas, When to stop reviewing in technology-assisted reviews: Sampling from an adaptive distribution to estimate residual relevant documents, ACM Transactions on Information Systems (TOIS) 38 (2020) 1–36.
[57] G. V. Cormack, M. R. Grossman, Evaluation of machine-learning protocols for technology-assisted review in electronic discovery, SIGIR 2014 (2014) 153–162. doi:10.1145/2600428.2609601.
[58] Y.-Y. Yang, S.-C. Lee, Y.-A. Chung, T.-E. Wu, S.-A. Chen, H.-T. Lin, libact: Pool-based Active Learning in Python, Technical Report, National Taiwan University, 2017. URL: https://github.com/ntucllab/libact, available as arXiv preprint https://arxiv.org/abs/1710.00379.