<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Fifth International Workshop on Systems and Algorithms for Formal Argumentation, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Reassessing the Impact of Reading Behaviour in Online Debates Under the Lens of Gradual Semantics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jordan Thieyre</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aurélie Beynier</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Maudet</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Srdjan Vesic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CRIL - CNRS - Univ. Artois</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIP6 - CNRS, Sorbonne Université</institution>
          ,
          <addr-line>4 place Jussieu, F-75005 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>17</volume>
      <issue>2024</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>While it is unrealistic to assume that users of online debate platforms read and interpret all the available arguments, it is important to understand how positions emerge on the basis of a fraction of those arguments. Exactly which arguments users access depends on assumptions about the platform design or the readers' behaviours. Young et al. were the first to explore this question and report results in the context of an underlying extension-based semantics. We undertake a similar study in the context of gradual semantics, using a more comprehensive set of metrics and testing a larger number of behaviours, and come to different conclusions. We show in particular that a reading behaviour balancing supports and attacks provides interesting results.</p>
      </abstract>
      <kwd-group>
        <kwd>Online debates</kwd>
        <kwd>reading behaviour</kwd>
        <kwd>bipolar gradual argumentation frameworks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Internet allows people to express their opinions by participating in online discussions,
sometimes involving many users and comments. These debates can provide a wealth of relevant
information for users curious about the subjects under discussion. The main obstacle that such
users may encounter is the sheer volume of comments exchanged, making it difficult for a
human being to read all the arguments of a debate and to assess the relevance of each point of
view in a reasonable amount of time. Indeed, under time constraints, the reading behaviour, i.e.
the way users read the arguments of the debate, affects the subset of arguments they are exposed
to, and consequently, their assessment of the acceptability or strength of those arguments. In
this paper, we explore several “natural” reading behaviours and experimentally investigate how
these behaviours influence the user’s perception of a debate.</p>
      <p>Online debates often take the form of an initial topic to which many participants have
responded with comments, which themselves receive responses, and so on. These
debates can therefore be represented as graphs, more specifically trees, where the
nodes are the comments and the edges are the attack or support relationships between two
comments. In short, these debates are well suited to the use of argumentation theory. In this
paper, we focus on the Kialo platform<sup>1</sup> and we note that the number of arguments can be high.
It is unlikely that an average user will have the time to read all the arguments.</p>
      <p>
        The question investigated here is how the order in which the arguments are
considered can affect the view readers have of the acceptability (as defined in the extension-based
semantics introduced by Dung [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) of the most central arguments in an online debate. In the
first attempt to answer this question, Young et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] presented a study of “comment sorting
policies”<sup>2</sup> on Kialo data to determine how the order of presentation of arguments dynamically
affects the acceptability status. As Kialo debates contain attacks and supports, Young et al. use
a flattening<sup>3</sup> approach, borrowed from Cayrol and Lagasquie-Schiex [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], to transform scraped
debates into attack-only argumentation graphs in order to readily use extension-based
semantics. Then, for each debate and for each of the four reading behaviours they introduce, they
compute the grounded extension of the debate graph and the grounded extension of the first n
arguments read following the considered reading behaviour. Young et al. then calculate a
distance between the two extensions obtained.
      </p>
      <p>We take inspiration from this study, but we depart from the methodology used by Young et
al. for the following reasons. First, the flattening approach they use might lead to information
loss. Let us illustrate this with an example in which x is the central proposition, attacked by one
argument a and supported by two arguments b and c. The algorithm used by Young et al. to
flatten this graph results here in ignoring the supports (they are deleted). In our work, we
take the supports into account in a completely distinct way, namely by directly using
bipolar gradual semantics rather than a flattening, which is inappropriate in the case of online
debates.</p>
      <p>
        Another issue with the work of Young et al. is that they use grounded semantics. As previously
emphasized, extension-based semantics suffer from a lack of smoothness [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It is undesirable if
a single argument results in drastic changes in the acceptability degrees. This “Christmas tree”
behaviour, where arguments suddenly change their acceptability status when a new
argument is added, is not intuitive in the context of online debates, where one needs more robustness. As
a consequence, even though we borrow the methodology proposed by Young et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we use a
model we believe to be better suited, namely that of bipolar gradual argumentation semantics.
It addresses both problems mentioned above. We use several state-of-the-art gradual semantics
to conduct our experiments. Besides these key differences, our approach departs from the work
of Young et al. in the following manner: (1) we use a more diverse set of metrics to assess our
results; (2) we focus on the evaluation of the most central arguments (as opposed to all the
arguments of the debate); and (3) we introduce and study many more reading behaviours.
      </p>
      <p>
        Recently, behavioural studies have become more popular as a way to assess formal
argumentation frameworks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Most of these studies focus on the reasoning perspective, testing
in particular how argumentation semantics match observed human behaviour. On the other
hand, the outcomes of gradual semantics have been compared [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], but only using synthetic (and
complete) data. Finally, incomplete argumentation frameworks have been studied [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], but
from a theoretical perspective and in the extension-based framework. We stress again that we do
not take the (best case) perspective of a designer trying to optimize the sequence of arguments
to be read, or the (worst case) adversarial perspective of a reader who could manipulate the
system. Instead, we study “natural” reading behaviours to see how they affect some global
metrics evaluating the distance to some ground-truth under complete information.
        <fn id="fn1"><label>1</label><p>www.kialo.com</p></fn>
        <fn id="fn2"><label>2</label><p>Young et al. study “comment sorting policies”, but we take a more user-oriented perspective and talk about
“reading behaviours”, thus acknowledging that the order in which arguments are read is not only determined by the
designer’s choice of how they are presented on the platform. In practice both aspects interplay, but we keep things
simple in the context of this paper.</p></fn>
        <fn id="fn3"><label>3</label><p>Flattening is the procedure used to delete supports, preferences or other data and obtain a vanilla argumentation
graph that is somehow equivalent to the initial graph.</p></fn>
      </p>
      <p>The remainder of this paper is organised as follows. In Section 2 we present Kialo and the data. Section
3 provides the background on gradual semantics. We detail the reading behaviours and the
metrics used in Sections 4 and 5, respectively. The experimental results are reported in Section 6.</p>
      <p>The full code, together with a notebook for exploring many other parameters, semantics
and reading behaviours, is available on our Git<sup>4</sup> repository.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Kialo Dataset</title>
      <sec id="sec-2-1">
        <title>2.1. Presentation of Kialo</title>
        <p>Kialo is an online platform for structured debates. When a user initiates a debate, she posts
an initial claim which stands for the central question. Users can then add claims that respond
directly to the question if it is closed-ended, or to one of the existing alternatives if the question
is open-ended, or to another claim already in the discussion. Claims are classified as “PRO” or
“CON” depending on whether they support or attack the claim to which they are attached. The
platform allows moderators to rephrase claims, to move or merge claims, or
to delete claims that do not fulfil the requirements of the platform. Finally, users can vote on
the claims by choosing a score from 0 to 4 (integers only).</p>
        <p>Example 1 (A debate of Kialo). One of Kialo’s biggest debates (in terms of number of claims) is
“The Ethics of Eating Animals: Is Eating Meat Wrong?”. The central proposition debated is “Humans
should stop eating animal meat.” An example of a “PRO” claim for this proposition is “Eating meat,
in the majority of cases, involves the cruel and immoral treatment of animals.” An example of a
“CON” claim is “The taste of meat is delicious and brings many people pleasure in a manner that
vegetarian food cannot fully imitate.” This claim received 185 votes, distributed as follows: 66 users
voted for 0, 31 users for 1, 26 users for 2, 17 users for 3, and 45 users for 4.</p>
        <p>Kialo debates are “clean” thanks to the moderation carried out by users: there are no insults,
claims must be concise, be based on logic and facts, and present a single point related to the
theme of the debate. Claims can therefore be considered as arguments. Furthermore, claims
must not duplicate other claims. Finally, Kialo has been designed in such a way that the most
general claims are the closest to the central question, allowing users to dive into more details as
they move down the graph.
          <fn id="fn4"><label>4</label><p>https://gitlab.com/jthieyre/reassessing-the-impact-of-reading-behaviour-in-online-debates-under-the-lens-ofgradual-semantics</p></fn>
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Scraping and Resulting Dataset</title>
        <p>First, we used Kialo’s API (Application Programming Interface) to retrieve the list of public
debate IDs. These IDs were then used to ask the API for the data relating to each debate. For every
claim, this data comprises its ID, text, votes, dates of creation and of last modification, and the ID
of its author. For every relation between a claim and another, it comprises its ID, type (attack or
support), dates of creation and of last modification, and the ID of its author. Of this
large amount of data, we have only kept the IDs, dates and votes of the claims, and the dates
and types of the relationships between claims, to protect the privacy of users as much as
possible. In total, we recovered data from 2,959 public debates. We analysed the data collected
and observed that a large proportion of the data in each debate corresponded to “archived” data,
in other words, outdated claims or relationships. Using a depth-first search (DFS) algorithm, we
were able to isolate the claims displayed by the Kialo front-end from
those that are archived. The outcome is 377,182 claims spread over 2,959 cleaned debates.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. First analysis of debates</title>
        <p>The distribution of claims per debate is illustrated in Figure 1a and shows that only a small
number of debates have a significant number of claims. This is consistent with the distribution
of maximum depths depicted in Figure 1c, which shows that half of the debate graphs have
a maximum depth of 5 or less. Together, Kialo’s claims received a total of 590,261 votes. The
distribution of votes by claim is depicted in Figure 1b and is very uneven.
The closed-question debates are relatively balanced, as shown in Figures 1d and 1e. Figure 1d
gives the difference between the number of PRO claims and the number of CON claims, as a
proportion of the number of claims in the debate. For example, in 25% of debates, CON claims
outnumber PRO claims by at least 9% of the total number
of claims in these debates. Figure 1e depicts the distribution of the difference between the
number of claims that contribute to supporting the central question and the number of claims
that contribute to attacking it, as a proportion of the number of claims in the debate. As with
Figure 1d, we can see that in 10% of debates, the claims
supporting the central proposition outnumber those attacking it by
at least 33% of the number of claims in these debates. Figure 1f illustrates the distribution
of incoming degrees, and shows that at least half of the claims have neither an attacker nor a
supporter.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Background</title>
      <sec id="sec-3-1">
        <title>3.1. Bipolar gradual semantics</title>
        <p>[Figure 1, panels (a) to (f). Axis labels: number of claims, number of votes, max depth, balance ratio, in-degree.]</p>
        <p>
          As stated in Section 1, the Kialo debates are well suited to the use of argumentation theory.
Abstract argumentation theory was first introduced in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The idea is to model argumentative
debates using graphs: the nodes represent the arguments and the edges the attack relations
between arguments. Initially, an argument can be “accepted” or “rejected” depending on the
arguments that attack it. Since then, a large number of papers have enriched the modeling
possibilities. The most advanced approaches today include gradual bipolar argumentation
frameworks [
          <xref ref-type="bibr" rid="ref10 ref11 ref12 ref9">9, 10, 11, 12</xref>
          ]: the arguments are still represented by nodes, but the edges represent
support relationships as well as attack relationships. The arguments are associated with two
real numbers: the first can be interpreted as representing the initial score (or acceptability) of
an argument, i.e. when it is considered on its own; the second can represent the final score
(or acceptability), i.e. when it is considered with the arguments that attack and/or support
it. The “functions” that enable the final score to be calculated from the initial score and the
relations between arguments are called bipolar gradual semantics. We could not study every
bipolar gradual semantics from the literature here. We have chosen to study three of the most
prominent ones: Discontinuity Free Quantitative Argumentation Debate (DF-QuAD),
Euler-based Semantics and Quadratic Energy Model (QuEM). These semantics are diverse as they
satisfy different sets of principles [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], as we will discuss later. The approach developed here is
designed to handle any bipolar gradual semantics. Our objective is to empirically study reading
behaviours and draw conclusions that are as general as possible, regardless of the semantics
used.
        </p>
        <p>Definition 1. A Bipolar Argumentation Framework (BAF) is a triple (A, R<sup>−</sup>, R<sup>+</sup>) such that:
• A is a finite set of arguments,
• R<sup>−</sup> ⊆ A × A is an acyclic binary relation on A describing attack relations,
• R<sup>+</sup> ⊆ A × A is an acyclic binary relation on A describing support relations.
R<sup>−</sup>(a) hence denotes the set of direct attackers of an argument a and R<sup>+</sup>(a) its set of direct
supporters. In the following, we will generalize the equations by using the notation R<sup>*</sup>, which
can designate either R<sup>+</sup> or R<sup>−</sup>.</p>
        <p>
          The first semantics we studied, DF-QuAD [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], was introduced to overcome the discontinuities
of the QuAD semantics [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. A single function is used to define the strength of the attackers and
supporters separately, and the function for the final score is simplified (compared to QuAD). In
the following, s<sub>i</sub> denotes the initial score of an argument and s<sub>f</sub> its final score.
        </p>
        <p>Definition 2 (DF-QuAD, Discontinuity Free QuAD). Let f<sup>*</sup> be the function used to calculate
the strength of the attackers or supporters such that:
          <disp-formula><tex-math>f^{*} : A \to [0, 1], \qquad f^{*}(a) = 1 - \prod_{b \in R^{*}(a)} \bigl(1 - s_f(b)\bigr)</tex-math></disp-formula>
where s<sub>f</sub> is the function used to calculate the final score (see below), and f<sup>*</sup> stands for f<sup>+</sup>
(respectively f<sup>−</sup>) if we consider the set of supporters (respectively attackers). Note that R<sup>*</sup> stands for R<sup>+</sup>
or R<sup>−</sup> (but not both at the same time). By convention, if R<sup>*</sup>(a) = ∅, the product ∏<sub>b ∈ R<sup>*</sup>(a)</sub> (1 − s<sub>f</sub>(b)) equals 1.
The function s<sub>f</sub> computing the final score of an argument is defined as:
          <disp-formula><tex-math>s_f(a) = \begin{cases} s_i(a) - s_i(a)\,\bigl(f^{-}(a) - f^{+}(a)\bigr) &amp; \text{if } f^{-}(a) \geq f^{+}(a) \\ s_i(a) + \bigl(1 - s_i(a)\bigr)\,\bigl(f^{+}(a) - f^{-}(a)\bigr) &amp; \text{otherwise} \end{cases}</tex-math></disp-formula>
        </p>
        <p>Definition 3 (Euler-based semantics). The function s<sub>f</sub> computing the final score of an
argument is defined as:
          <disp-formula><tex-math>s_f : A \to [0, 1[, \qquad s_f(a) = 1 - \frac{1 - s_i(a)^{2}}{1 + s_i(a)\, e^{E(a)}} \quad \text{where} \quad E(a) = \sum_{b \in R^{+}(a)} s_f(b) - \sum_{b \in R^{-}(a)} s_f(b)</tex-math></disp-formula>
        </p>
        <p>
          Finally, the last semantics we considered is the Quadratic Energy Model (QuEM) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. This
semantics uses the same definition of E as the Euler-based semantics, but it proposes a different
way of calculating the final score of an argument in order to guarantee a symmetric impact of
attacks and supports. Like the Euler-based semantics, QuEM satisfies all the desirable properties
identified by Amgoud and Ben-Naim.
        </p>
        <p>Definition 4 (QuEM, Quadratic Energy Model). Let e ∈ ℝ; the impact of e is given by h
such that:
          <disp-formula><tex-math>h : \mathbb{R} \to [0, 1], \qquad h(e) = \frac{\max(e, 0)^{2}}{1 + \max(e, 0)^{2}}</tex-math></disp-formula>
The function s<sub>f</sub> used to calculate the final score of an argument is defined as:
          <disp-formula><tex-math>s_f(a) = s_i(a) + \bigl(1 - s_i(a)\bigr)\, h\bigl(E(a)\bigr) - s_i(a)\, h\bigl(-E(a)\bigr)</tex-math></disp-formula>
        </p>
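        <p>As an illustration, the three semantics can be sketched in Python on the small example of Section 1, where x is attacked by a and supported by b and c. This is a minimal sketch under stated assumptions rather than the code of our repository: the helper names (final_scores, df_quad, euler, quem) are hypothetical, every initial score is set to 0.5, and the final-score steps of DF-QuAD and QuEM follow the usual formulation of these semantics in the literature.</p>
        <preformat><![CDATA[
```python
from math import exp, prod


def final_scores(initial, attackers, supporters, combine):
    """Evaluate an acyclic BAF with memoised recursion over parents."""
    cache = {}

    def score(a):
        if a not in cache:
            att = [score(b) for b in attackers.get(a, [])]
            sup = [score(b) for b in supporters.get(a, [])]
            cache[a] = combine(initial[a], att, sup)
        return cache[a]

    return {a: score(a) for a in initial}


def df_quad(si, att, sup):
    # f*(a) = 1 - prod(1 - s_f(b)); an empty product is 1, so f* is 0.
    f_minus = 1 - prod(1 - s for s in att)
    f_plus = 1 - prod(1 - s for s in sup)
    if f_minus >= f_plus:
        return si - si * (f_minus - f_plus)
    return si + (1 - si) * (f_plus - f_minus)


def euler(si, att, sup):
    energy = sum(sup) - sum(att)  # E(a)
    return 1 - (1 - si ** 2) / (1 + si * exp(energy))


def h(e):
    """Impact function of the Quadratic Energy Model."""
    return max(e, 0) ** 2 / (1 + max(e, 0) ** 2)


def quem(si, att, sup):
    energy = sum(sup) - sum(att)  # same E(a) as the Euler-based semantics
    return si + (1 - si) * h(energy) - si * h(-energy)


# Assumed toy instance: x attacked by a, supported by b and c.
INITIAL = {"x": 0.5, "a": 0.5, "b": 0.5, "c": 0.5}
ATTACKERS = {"x": ["a"]}
SUPPORTERS = {"x": ["b", "c"]}

for name, combine in (("DF-QuAD", df_quad), ("Euler", euler), ("QuEM", quem)):
    scores = final_scores(INITIAL, ATTACKERS, SUPPORTERS, combine)
    print(name, round(scores["x"], 4))
```
]]></preformat>
        <p>With these inputs, x ends at 0.625 under DF-QuAD, at about 0.589 under the Euler-based semantics and at 0.6 under QuEM. In all three cases the two supporters outweigh the single attacker, whereas deleting the supports would leave x with an attacker only.</p>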
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Score Initialization</title>
        <p>
          All studied semantics need an initial score s<sub>i</sub> for each argument. In this paper, we
chose to leverage the votes on arguments provided by Kialo debates to initialize the scores.
As mentioned before, on the Kialo platform, each participant can vote for an argument by
choosing an integer score in [0 · · · 4]. For each argument a, the votes are summarized by a
vector v(a) ∈ ℝ<sup>5</sup> indicating how many participants voted for each score. We define
the initial score of each argument a by performing a normalized weighted average as proposed
by Young et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In the following, w ∈ ℝ<sup>5</sup> denotes the vector of weights. The initial score of
each argument is then calculated as follows:
          <disp-formula><tex-math>s_i : A \times \mathbb{R}^{5} \to [0, 1], \qquad s_i(a, w) = \begin{cases} 0.5 &amp; \text{if } v(a) = [0, 0, 0, 0, 0] \\ \dfrac{v(a) \cdot w}{\sum_{k \in v(a)} k} &amp; \text{otherwise} \end{cases}</tex-math></disp-formula>
        </p>
        <p>As explained by Young et al., we can interpret the five vote values as representing a “rational”
reader’s belief in the truth of a claim, with 0 for those thinking it is false and 4 for those believing
it is true. Then, we use weights w = [0, 0.25, 0.5, 0.75, 1] for score initialization, where each
step reflects an equal increase in confidence.</p>
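        <p>The initialization can be sketched as follows. The function name initial_score is hypothetical, and the vote counts are those of the “CON” claim of Example 1.</p>
        <preformat><![CDATA[
```python
WEIGHTS = (0.0, 0.25, 0.5, 0.75, 1.0)  # w, one weight per vote value 0..4


def initial_score(votes, weights=WEIGHTS):
    """Normalized weighted average of the vote counts; 0.5 without votes."""
    total = sum(votes)
    if total == 0:
        return 0.5
    return sum(v * w for v, w in zip(votes, weights)) / total


# Vote vector of the "CON" claim in Example 1: 66, 31, 26, 17 and 45 votes
# for the values 0, 1, 2, 3 and 4 respectively (185 votes in total).
EXAMPLE_VOTES = [66, 31, 26, 17, 45]
print(round(initial_score(EXAMPLE_VOTES), 3))
```
]]></preformat>
        <p>For this claim the weighted sum is 78.5, so the initial score is 78.5 / 185, i.e. about 0.424.</p>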
      </sec>
      <sec id="sec-3-3">
        <title>3.3. First observations about the impact of the semantics</title>
        <p>
          The semantics presented in the previous section propose different ways to update the score
of an argument. Before studying different reading behaviours, we want to highlight how
these semantics affect the final scores obtained in debates. As discussed before, the
axiomatic analysis of gradual semantics may provide valuable insights. For instance, the notion
of open-mindedness has been proposed by Potyka [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] to capture the fact that final scores can
reach the bounds of their interval of definition, regardless of their initial score. It was established
that Euler-based and DF-QuAD do not satisfy this axiom, while QuEM does. In practice though,
we observe that the score variation is higher with DF-QuAD than with QuEM on our Kialo
debates. Figure 2 illustrates the distributions of the differences between the final score and
the initial score of all arguments in the Kialo dataset as a function of the semantics used, when
these differences are different from 0. In fact, the majority of arguments keep the same score
value. This is due to the fact that they have neither attacker nor supporter, as can be seen in
Figure 1f. Besides that, Figure 2 shows that the Euler-based semantics modifies the score less
than the other semantics, but is biased towards positive variation, while the other semantics
show a symmetric behaviour. This asymmetric behaviour of the Euler-based semantics is in
line with axiomatic analysis [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>[Figure 2. Vertical axis: score variation; horizontal axis: semantics (DF-QuAD, Euler, QuEM).]</p>
        <p>Secondly, to get a better intuition regarding the impact of debates’ depth on the scores of
arguments for each semantics, we illustrate how they vary on single-path attack-only debates of
increasing lengths. As depicted in Table 1, we iteratively derive from an argumentation graph
G<sub>1</sub> containing a single argument (a), 5 other graphs (G<sub>2</sub> to G<sub>6</sub>) organized as a line. A node
corresponds to an argument and arrows represent attack relations between two arguments.
From each graph G<sub>i</sub>, we build G<sub>i+1</sub> by adding an attacking argument to the last added node.
Table 1 reports the variation of the final score of argument a for the different graphs. Note that
we assume that each argument has an initial score of 0.5. Graph G<sub>1</sub> is not mentioned since
the score of a remains the same (0.5) for each semantics (a has neither attacker nor supporter). For all
studied semantics, the variation of the final score of a decreases as the depth of the graph, i.e.
the length of the attack path, increases. Due to the way we extend the graphs, the sign of the
variation is alternating: the last argument attacks a or attacks an attack on a. Nonetheless, we
can see that the relation between the depth and impact of arguments is very different among
semantics. We observe that the Euler-based semantics leads to almost no variation in the score
of argument a already from depth 3. On the other hand, the largest variations are observed for
DF-QuAD.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Reading Behaviours</title>
      <p>
        Debates can involve a large number of arguments leading to an information overload, so that
users are rarely able to read all the arguments. In this section, we introduce various debate
reading behaviours that seem natural for humans. In particular, we assume that the reader starts
with the central question being debated and the arguments directly related to it. As reported by
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] in the context of participatory platforms, stakeholders “are mainly looking for assessments
and arguments related to their proposal”. These proposals are typically the main alternatives
under discussion, which can be found at the first level.
      </p>
      <p>Since Kialo debates are represented as trees, each behaviour is based on a tree traversal method,
optionally combined with information about the arguments (number of
votes, chronological order, PRO/CON classification).</p>
      <p>We considered the following methods for traversing the debate tree:
• Depth First Search (DFS),
• Breadth First Search (BFS),
• Hybrid Traversal (HT): based on a DFS, but forced to explore all the child nodes of an
argument before exploring in depth the sub-tree of one of these children,
• No traversal (NT), which does not exploit the structure of the tree.</p>
      <p>We combine these traversal methods with a ranking method for ordering the nodes (i.e.
arguments) within the same level:
• Chronological Order (CO): oldest to newest (based on the timestamps of the arguments),
• Descending likes (DL): arguments are ranked by decreasing initial score s<sub>i</sub>,
• Descending likes with PRO-CON Diversity (PCD): the PRO argument with the highest
like value is ranked first, then the CON argument with the highest like value, then the
PRO argument with the second highest like value, and so on.</p>
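      <p>The traversal and ranking methods above can be sketched as follows. Several points are assumptions made for illustration only: the toy tree and its like values are hypothetical, BFS applies the ranking to a whole level at once whereas DFS and HT apply it among siblings, HT is read as “list all the children of a node, then descend into each child in turn”, and PCD appends the remaining arguments once one of the two polarities is exhausted. Chronological order and the NT behaviour are omitted.</p>
      <preformat><![CDATA[
```python
from itertools import zip_longest

# Hypothetical toy debate rooted at "a": parent -> children, with the
# polarity and like value (initial score) of every non-root claim.
CHILDREN = {"a": ["b", "c", "d"], "b": ["e", "f"], "c": ["g"]}
POLARITY = {"b": "PRO", "c": "CON", "d": "PRO",
            "e": "CON", "f": "PRO", "g": "PRO"}
LIKES = {"b": 0.4, "c": 0.9, "d": 0.7, "e": 0.6, "f": 0.2, "g": 0.5}


def descending_likes(claims):
    """DL: rank claims by decreasing like value."""
    return sorted(claims, key=lambda c: -LIKES[c])


def pro_con_diversity(claims):
    """PCD: best PRO, best CON, second-best PRO, second-best CON, ..."""
    ranked = descending_likes(claims)
    pros = [c for c in ranked if POLARITY[c] == "PRO"]
    cons = [c for c in ranked if POLARITY[c] == "CON"]
    mixed = []
    for pro, con in zip_longest(pros, cons):
        mixed.extend(c for c in (pro, con) if c is not None)
    return mixed


def dfs(root, rank):
    """Depth first: read a claim, then each ranked child's whole sub-tree."""
    order = [root]
    for child in rank(CHILDREN.get(root, [])):
        order.extend(dfs(child, rank))
    return order


def bfs(root, rank):
    """Breadth first: read the tree level by level, ranking each level."""
    order, level = [], [root]
    while level:
        order.extend(level)
        level = rank([c for n in level for c in CHILDREN.get(n, [])])
    return order


def hybrid(root, rank):
    """HT: read all children of a claim before descending into any of them."""
    order = [root]

    def expand(node):
        kids = rank(CHILDREN.get(node, []))
        order.extend(kids)   # all children first ...
        for kid in kids:     # ... then each child's sub-tree in turn
            expand(kid)

    expand(root)
    return order


print("DFS+PCD:", dfs("a", pro_con_diversity))
print("BFS+PCD:", bfs("a", pro_con_diversity))
print("HT+PCD: ", hybrid("a", pro_con_diversity))
```
]]></preformat>
      <p>On this toy tree the three traversals agree on the first two arguments read after the root (d, then c) and differ afterwards, which is where the choice of reading behaviour starts to matter for a reader with a limited budget.</p>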
      <p>
        We cannot present all the combinations of traversal and ranking methods, so we focus on
the following reading behaviours<sup>5</sup>:
1. Behaviour NT+CO: chronological order with no traversal,
2. Behaviour DFS+CO: depth first search + chronological order,
3. Behaviour DFS+DL: depth first search + descending likes (introduced as “likes” policy by
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]),
4. Behaviour DFS+PCD: depth first search + PRO-CON diversity,
5. Behaviour BFS+PCD: breadth first search + PRO-CON diversity,
6. Behaviour HT+PCD: hybrid traversal + PRO-CON diversity,
7. Behaviour MDHT+PCD: HT with maximum depth + PRO-CON diversity. A variant of
HT+PCD where we first rank all the arguments whose depth is less than or equal to 3 and then
rank the remaining arguments. Indeed, we believe that a standard reader rarely ventures
further than this depth. This intuition is supported by the fact that 51% of votes are
located at a depth of 3 or less.
Note that the behaviours NT+CO, DFS+CO and DFS+DL correspond to the comment sorting
policies proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
        <fn id="fn5"><label>5</label><p>More reading behaviours are available on our Git repository.</p></fn>
      </p>
      <p>Example 2. We illustrate the reading order of the arguments obtained by the different reading
behaviours on the graph below. Argument a is the source node. Numerical values correspond to the
like value of each node. Arguments are assumed to be labeled in chronological order (a is the oldest
one and k is the newest). Solid edges represent attacks and dashed ones represent supports. PRO
(resp. CON) arguments are depicted in green (resp. red).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Metrics</title>
      <p>
        The question is now how to compare the different reading behaviours presented in the previous
section. Recall that the objective is to compare the situation where someone would have entirely
read the debate with the situation where someone has partially read the debate. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], in
line with their choice of studying an extension-based semantics, Young et al. use the Jaccard
coefficient over the set of accepted arguments obtained from the whole debate and the set of
accepted arguments obtained from the partially read debate. In our gradual setting, we will use
the Root-Mean-Square-Error, Kendall Tau distance, and Jaccard coefficient in the same way to
compare final scores of arguments. Before that, we need to define some useful notation. Let:
• F = (A, R<sup>−</sup>, R<sup>+</sup>) be a BAF,
• N = |A| the number of arguments,
• n ∈ [1 · · · N] the number of arguments read,
• rb(F, n) a reading behaviour returning the ordered list of the first n read arguments of F,
• s<sub>f</sub>(a, rb(F, n)) (respectively s<sub>f</sub>(a, rb(F, N))) the final score of a when we consider the
partially read graph (respectively the entire graph),
• S<sub>n</sub> = (s<sub>f</sub>(a, rb(F, n)))<sub>a ∈ rb(F, n)</sub> the list of scores of the first n read arguments of F when
we consider the partially read graph,
• S<sub>N</sub> = (s<sub>f</sub>(a, rb(F, N)))<sub>a ∈ rb(F, n)</sub> the list of scores of the first n read arguments of F when
we consider the entire graph.
In the following, S<sub>n</sub>(i) (resp. S<sub>N</sub>(i)) denotes the score of the i-th element of S<sub>n</sub> (resp. S<sub>N</sub>). We
assume the i-th elements of S<sub>n</sub> and S<sub>N</sub> correspond to the same argument a. Note that s<sub>f</sub> being
recursive, its result depends on the size of the considered graph, but we keep only the results of the
first n arguments given by rb. We can now define the three considered metrics:
• the Root-Mean-Square-Error (RMSE) between the obtained scores:
      </p>
      <p>
        <disp-formula><tex-math>\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{a \in rb(F,n)} \bigl( s_f(a, rb(F,n)) - s_f(a, rb(F,N)) \bigr)^{2}}</tex-math></disp-formula>
      </p>
      <p>
        • the Kendall Tau (KT) (normalized) distance between the rankings of the arguments induced
by the scores, as was done in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]:
      </p>
      <p>
        <disp-formula><tex-math>\mathrm{KT} = \frac{2\,\bigl|\{(i,j) : i &lt; j,\ B(i,j) \lor C(i,j)\}\bigr|}{n(n-1)}</tex-math></disp-formula>
      </p>
      <p>where B(i, j) = (S<sub>n</sub>(i) &lt; S<sub>n</sub>(j)) ∧ (S<sub>N</sub>(i) &gt; S<sub>N</sub>(j)) and C(i, j) = (S<sub>n</sub>(i) &gt; S<sub>n</sub>(j)) ∧
(S<sub>N</sub>(i) &lt; S<sub>N</sub>(j)). KT counts the number of pairwise disagreements between two ranking
lists. Thus, B(i, j) checks whether the i-th argument is ranked before the j-th in S<sub>n</sub>, and
after in S<sub>N</sub>. Similarly, C(i, j) checks whether the i-th argument is ranked after the j-th
in S<sub>n</sub>, and before in S<sub>N</sub>.
• the Jaccard coefficient (JC) between the sets of accepted arguments, where arguments are
considered as accepted in our setting when their score exceeds a predefined threshold
(set to 0.5 in this paper):</p>
      <p>
        <disp-formula><tex-math>\mathrm{JC} = \frac{|D_n \cap D_N|}{|D_n \cup D_N|}</tex-math></disp-formula>
      </p>
      <p>where
        <disp-formula><tex-math>D_n = \{a \in rb(F,n) \mid s_f(a, rb(F,n)) &gt; 0.5\}, \qquad D_N = \{a \in rb(F,n) \mid s_f(a, rb(F,N)) &gt; 0.5\}</tex-math></disp-formula>
      </p>
      <p>JC measures similarity between D<sub>n</sub>, the set of accepted arguments when we consider the
partially read graph, and D<sub>N</sub>, the set of accepted arguments when we consider the entire
graph and keep only the first n arguments given by rb. Indeed, when a reader
has seen n arguments, she cannot expect to have seen more than n accepted arguments.</p>
      <p>By doing this, JC is not biased when n is less than the final number of accepted arguments.</p>
      <p>Example 3. Let s = ⟨0.8, 0.2, 0.7, 0.6, 0.1⟩ and s′ = ⟨0.6, 0.4, 0.7, 0.1, 0.2⟩ be two score vectors
for some arguments a, b, c, d, e. The RMSE between these two vectors of argument scores
is 0.261. The induced rankings are respectively a ≻ c ≻ d ≻ b ≻ e and c ≻ a ≻ b ≻ e ≻ d.
Three of the ten pairs, namely (a, c), (b, d) and (d, e), are ordered differently,
yielding a KT distance of 0.3. Finally, the set of accepted arguments of s is {a, c, d} while it is
{a, c} for s′, hence the JC is 2/3.</p>
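      <p>The values of Example 3 can be checked numerically. The following is a minimal Python sketch, not the authors' pipeline: the function names are hypothetical, score vectors are plain lists aligned on the same arguments, and counting two empty accepted sets as perfect agreement is an assumption of the sketch.</p>
      <preformat><![CDATA[
```python
from itertools import combinations
from math import sqrt


def rmse(s_partial, s_full):
    """Root-mean-square error between two aligned score vectors."""
    n = len(s_partial)
    return sqrt(sum((p - f) ** 2 for p, f in zip(s_partial, s_full)) / n)


def kendall_tau(s_partial, s_full):
    """Normalized Kendall tau distance between the induced rankings."""
    n = len(s_partial)
    disagreements = 0
    for i, j in combinations(range(n), 2):  # all pairs with i < j
        b = s_partial[i] < s_partial[j] and s_full[i] > s_full[j]
        c = s_partial[i] > s_partial[j] and s_full[i] < s_full[j]
        if b or c:
            disagreements += 1
    return 2 * disagreements / (n * (n - 1))


def jaccard(s_partial, s_full, threshold=0.5):
    """Jaccard coefficient between the two sets of accepted arguments."""
    d_n = {i for i, score in enumerate(s_partial) if score > threshold}
    d_full = {i for i, score in enumerate(s_full) if score > threshold}
    union = d_n | d_full
    # Assumption of this sketch: two empty sets count as perfect agreement.
    return len(d_n & d_full) / len(union) if union else 1.0


# Score vectors of Example 3, for arguments a, b, c, d, e in that order.
s = [0.8, 0.2, 0.7, 0.6, 0.1]
s_prime = [0.6, 0.4, 0.7, 0.1, 0.2]

print(round(rmse(s, s_prime), 3))
print(round(kendall_tau(s, s_prime), 3))
print(round(jaccard(s, s_prime), 3))
```
]]></preformat>
      <p>On these inputs the sketch gives back the RMSE of 0.261, the KT distance of 0.3 and the JC of 2/3 stated in Example 3.</p>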
      <p>
        Furthermore, we parametrize our metrics so that the focus of the study can be
put on specific arguments of the debates. For instance, top-k focused metrics are only concerned
with the central question, together with arguments up to depth k. Technically, it suffices to
weight the different metrics introduced above. As mentioned in Section 4, because stakeholders
of participatory platforms “are mainly looking for assessments and arguments related to their
proposal” [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], our interest will be the top-1: in our metrics, the root argument and the first
level have a weight of 1, and all other arguments have a weight of 0. Our pipeline
can nevertheless handle any weighting. On average, we found slightly fewer than 9 arguments when
considering the top-1 of a debate.
      </p>
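      <p>To make the weighting concrete, here is a hedged Python sketch of a top-k weighted RMSE. The text only states that the metrics are weighted, so normalizing by the total weight is an assumption of this sketch, as are the function names and the toy scores, which are invented for illustration.</p>
      <preformat><![CDATA[
```python
from math import sqrt


def top_k_weights(depth, k):
    """Weight 1 for arguments at depth <= k (the root has depth 0), else 0."""
    return {arg: 1.0 if d <= k else 0.0 for arg, d in depth.items()}


def weighted_rmse(s_partial, s_full, weights):
    """RMSE in which each squared error is scaled by the argument's weight.

    Assumption: the sum is divided by the total weight, so that uniform
    weights of 1 give back the unweighted RMSE.
    """
    total = sum(weights[arg] for arg in s_partial)
    if total == 0:
        raise ValueError("at least one read argument needs a non-zero weight")
    squared = sum(
        weights[arg] * (s_partial[arg] - s_full[arg]) ** 2 for arg in s_partial
    )
    return sqrt(squared / total)


# Hypothetical toy debate: root q, first-level arguments a and b, deeper c.
depth = {"q": 0, "a": 1, "b": 1, "c": 2}
s_partial = {"q": 0.7, "a": 0.5, "b": 0.4, "c": 0.9}
s_full = {"q": 0.6, "a": 0.5, "b": 0.6, "c": 0.1}

top1 = top_k_weights(depth, k=1)
uniform = {arg: 1.0 for arg in depth}

print(weighted_rmse(s_partial, s_full, top1))     # error on c is ignored
print(weighted_rmse(s_partial, s_full, uniform))  # error on c dominates
```
]]></preformat>
      <p>In this toy case the large error on the depth-2 argument c does not affect the top-1 value, which is the intended effect of focusing the metric on the central question and its first level.</p>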
    </sec>
    <sec id="sec-6">
      <title>6. Experimental Results</title>
      <p>Observation 1. Whatever the metric or semantics used, behaviour BFS+PCD, followed by
MDHT+PCD, outperforms all the others.</p>
      <p>Observation 2. Under BFS+PCD, the true ranking can be retrieved for all semantics after reading
17% of the graph in the best case and 50% of the graph in the worst case.</p>
      <p>Observation 3. The values of the metrics tend to decrease for all the reading behaviours, except for
NT+CO under the DF-QuAD semantics, for which the error first increases and then decreases.</p>
      <p>We have investigated the differences in errors between the behaviours and identified three
main explanations:
1. Traversal direction: adding an argument in breadth rather than in depth leads to a
stronger reduction of the median error. This can be observed when comparing DFS+PCD,
HT+PCD, MDHT+PCD and BFS+PCD. HT behaves much like a DFS, while MDHT is close
to a BFS. This is consistent with Figure 3, where we can see that the results of these behaviours can
be ranked (from best to worst) as: BFS+PCD ≻ MDHT+PCD ≻ HT+PCD ≻ DFS+PCD. Finally,
when we analyse the type of traversal that NT+CO produces at the start of reading, we find
that it is similar to a DFS.</p>
      <p>2. Respecting the final attack and support balance: let a be one of the arguments
considered by our metrics, i.e. with a non-zero weight. When we analyse the factors
influencing the evaluation of the final score of a, we see that the balance between the attackers
and the supporters of a plays an important role. Indeed, the final score depends on the number
of attackers and supporters, and on their respective scores. Therefore, any behaviour that,
once n arguments have been read, yields a balance too different from the one obtained when all
the arguments have been read will produce a greater difference between the partial and final scores of
a, and therefore a greater error. In practice, this can be observed when we compare
DFS+CO, DFS+DL and DFS+PCD: the ranking method PCD produces a more faithful balance
(by alternating between PRO and CON arguments and placing the strongest arguments first)
throughout the reading, and so it performs better. For instance, at the maximum RMSE for
the DF-QuAD semantics, i.e. when a reader has read between 11% and 12% of the graph, 62% of the
arguments added by NT+CO are PRO arguments, compared with 54% for BFS+PCD. However,
we did not observe any significant difference between the average scores of the arguments
added by these two behaviours in this example.</p>
      <p>3. The semantics used: although the relative performance is preserved from one semantics
to another, the error values differ. In addition, the DF-QuAD
semantics is more sensitive to the two factors described above than
the Euler-based and QuEM semantics.</p>
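      <p>The first two factors can be illustrated on a toy debate. The sketch below is an illustration under stated assumptions rather than the experimental code: the tree, the PRO/CON labels and the function names are all hypothetical, and sibling order is simply the order in which children are listed. It contrasts a breadth-first and a depth-first reading order, then counts how many direct supporters and attackers of the root have been read after n arguments.</p>
      <preformat><![CDATA[
```python
from collections import deque

# Hypothetical toy debate: children of each argument in display order,
# and the polarity of each non-root argument towards its parent.
children = {
    "q": ["p1", "c1", "p2"],
    "p1": ["p3", "p4"],
    "p3": ["p5"],
    "c1": ["c2"],
}
polarity = {"p1": "PRO", "p2": "PRO", "p3": "PRO", "p4": "PRO",
            "p5": "PRO", "c1": "CON", "c2": "CON"}


def bfs_order(root):
    """Reading order that exhausts each level before going deeper."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    return order


def dfs_order(root):
    """Reading order that follows each thread to its end (pre-order)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so that the first child is read first.
        stack.extend(reversed(children.get(node, [])))
    return order


def direct_balance(order, n, target):
    """Count (PRO, CON) direct children of target among the first n read."""
    read = set(order[:n])
    direct = [c for c in children.get(target, []) if c in read]
    pros = sum(polarity[c] == "PRO" for c in direct)
    return pros, len(direct) - pros


for name, order in (("BFS", bfs_order("q")), ("DFS", dfs_order("q"))):
    print(name, order, direct_balance(order, 4, "q"))
```
]]></preformat>
      <p>After four arguments, the breadth-first order has already read the root's two direct supporters and its direct attacker, which is its final balance in this toy tree, whereas the depth-first order has read only one direct supporter.</p>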
      <p>One important finding is thus that NT+CO produces an unbalanced and in-depth reading of
the debate. From a behavioural point of view, this can be explained by the fact that on Kialo,
the first arguments are added by the creator of the debate, who is more likely to be biased by
her own opinion on the issue and therefore to add, at the first stage of the debate, arguments that
support the central question, as well as arguments that support her other arguments. The oldest
arguments are hence more likely to support the central question.</p>
      <p>
        Even though our approach differs from [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in many respects, it is instructive to compare our
respective results. Recall that, contrary to us, they single out DFS as the best policy for their
metric. We found that the relative performance of the behaviours strongly depends on the
weighting applied to the metrics. In our case of a strong focus on the most central arguments,
behaviours of type BFS perform best. But when no weighting is applied, as in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we also obtain
results showing that the DFS type behaviours are the ones that minimise the error.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>
        In this paper we performed a new assessment of the impact of the behaviour of users when
reading through online debates, following the methodology proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In a departure from
this work, we explore a setting of gradual semantics natively designed for bipolar argumentation
frameworks. We study a large number of natural behaviours and evaluate them through various
metrics, focusing on the core arguments of the debate. Our results differ from those
of Young et al.: when the focus is on the core arguments of the debate, breadth-first reading gives
a fairly accurate evaluation with as few as a third of the arguments read. They also illustrate the
value of keeping a good diversity when skimming through debates. The complex interplay
between the reading behaviours, the semantics used and the focus of the metrics paints a rich
landscape which calls for further investigation, and one of our contributions is also to offer a
fully available software environment to perform complementary studies. In future work, we
plan to investigate whether our approach can provide guidance for the design of comment sorting policies
in the case of arguments elicited as important by the debate owner. While our metrics
are flexible enough to address these cases, it may be challenging to come up with general
guidelines when the arguments of interest lie between the limit cases of all the arguments and only
the most central arguments. We also intend to further explore our conjecture that the debate
creator’s bias strongly affects the initial chronological reading.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work has been supported by ANR (French National Research Agency) project AGGREEY
(ANR-22-CE23-0005).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Dung</surname>
          </string-name>
          ,
          <article-title>On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>77</volume>
          (
          <year>1995</year>
          )
          <fpage>321</fpage>
          -
          <lpage>358</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Joglekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Boschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <article-title>Ranking comment sorting policies in online debates</article-title>
          ,
          <source>Argument &amp; Computation</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>265</fpage>
          -
          <lpage>285</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cayrol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Lagasquie-Schiex</surname>
          </string-name>
          ,
          <article-title>On the acceptability of arguments in bipolar argumentation frameworks</article-title>
          , in:
          <string-name>
            <given-names>L.</given-names>
            <surname>Godo</surname>
          </string-name>
          (Ed.),
          <source>Symbolic and Quantitative Approaches to Reasoning with Uncertainty</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2005</year>
          , pp.
          <fpage>378</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Leite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <article-title>Social abstract argumentation</article-title>
          , in: T. Walsh (Ed.),
          <source>Proceedings of the 22nd International Joint Conference on Artificial Intelligence</source>
          ,
          <source>IJCAI/AAAI</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>2287</fpage>
          -
          <lpage>2292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cerutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guillaume</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hadoux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Polberg</surname>
          </string-name>
          ,
          <source>Empirical Cognitive Studies About Formal Argumentation</source>
          ,
          <year>2021</year>
          , p.
          <source>Chapter</source>
          <volume>14</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bonzon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Delobelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Konieczny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Maudet</surname>
          </string-name>
          ,
          <article-title>An empirical and axiomatic comparison of ranking-based semantics for abstract argumentation</article-title>
          ,
          <source>Journal of Applied Non-Classical Logics</source>
          <volume>33</volume>
          (
          <year>2023</year>
          )
          <fpage>328</fpage>
          -
          <lpage>386</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Baumeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Järvisalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Neugebauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Niskanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rothe</surname>
          </string-name>
          ,
          <article-title>Acceptance in incomplete argumentation frameworks</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>295</volume>
          (
          <year>2021</year>
          )
          <fpage>103470</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.-G.</given-names>
            <surname>Mailly</surname>
          </string-name>
          ,
          <article-title>Extension-based semantics for incomplete argumentation frameworks: properties, complexity and algorithms</article-title>
          ,
          <source>Journal of Logic and Computation</source>
          <volume>33</volume>
          (
          <year>2023</year>
          )
          <fpage>406</fpage>
          -
          <lpage>435</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Baroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aurisicchio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bertanza</surname>
          </string-name>
          ,
          <article-title>Automatic evaluation of design alternatives with quantitative argumentation</article-title>
          ,
          <source>Argument &amp; Computation</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <fpage>24</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rago</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aurisicchio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Baroni</surname>
          </string-name>
          ,
          <article-title>Discontinuity-free decision support with quantitative argumentation debates</article-title>
          ,
          <source>in: Proceedings of the Fifteenth International Conference on Principles of Knowledge Representation and Reasoning</source>
          ,
          <source>KR</source>
          <year>2016</year>
          , AAAI Press,
          <year>2016</year>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Amgoud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ben-Naim</surname>
          </string-name>
          ,
          <article-title>Evaluation of arguments in weighted bipolar graphs</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Antonucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cholvy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Papini</surname>
          </string-name>
          (Eds.),
          <source>Symbolic and Quantitative Approaches to Reasoning with Uncertainty</source>
          , Springer International Publishing, Cham,
          <year>2017</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Potyka</surname>
          </string-name>
          ,
          <article-title>Continuous dynamical systems for weighted bipolar argumentation</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Thielscher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          (Eds.),
          <source>Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, KR</source>
          <year>2018</year>
          , AAAI Press,
          <year>2018</year>
          , pp.
          <fpage>148</fpage>
          -
          <lpage>157</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Potyka</surname>
          </string-name>
          ,
          <article-title>Open-mindedness of gradual argumentation semantics</article-title>
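          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Ben Amor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Quost</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          (Eds.),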
          <source>Scalable Uncertainty Management</source>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>249</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>W.</given-names>
            <surname>Aboucaya</surname>
          </string-name>
          ,
          <article-title>Collaborative systems for large-scale online citizen participation</article-title>
          ,
          <source>PhD thesis</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>