Exploring the potential of defeasible argumentation for quantitative inferences in real-world contexts: An assessment of computational trust

Lucas Rizzo[0000−0001−9805−5306], Pierpaolo Dondio[0000−0001−7874−8762], and Luca Longo[0000−0002−2718−5426]

Technological University Dublin, Dublin, Ireland
{lucas.rizzo,pierpaolo.dondio,luca.longo}@tudublin.ie

Abstract. Argumentation has recently shown appealing properties for inference under uncertainty and conflicting knowledge. However, there is a lack of studies examining its capacity to exploit real-world knowledge bases for performing quantitative, case-by-case inferences. This study analyses the inferential capacity of a set of argument-based models, designed by a human reasoner, for the problem of trust assessment. Precisely, these models are exploited using data from Wikipedia, and are aimed at inferring the trustworthiness of its editors. A comparison against non-deductive approaches revealed that these models were superior according to the values inferred for recognised trustworthy editors. This research contributes to the field of argumentation by employing a replicable modular design which is suitable for modelling reasoning under uncertainty applied to distinct real-world domains.

Keywords: Defeasible Argumentation, Argumentation Theory, Explainable Artificial Intelligence, Non-monotonic Reasoning, Computational Trust

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Trust is a crucial human construct investigated within several disciplines, such as psychology, sociology and philosophy, with many applications. It is an ill-defined construct, whose formalisation lies, among others, in the domain of knowledge representation and reasoning. It is a complex phenomenon, essential to support decision-making processes and delegation in uncertain domains. Many definitions of trust can be found in the literature [19]. Briefly, it can be described as a prediction that a trusted entity will bring to completion the expectations of a trustor in some specific context. A computational model of trust is one that brings this prediction to fruition when software agents are involved. Such models have emerged, aimed at making use of the notion of human trust in open digital worlds [8]. They help an agent to collect, aggregate, quantify and classify evidence to inform its decision about how, and whether, to interact with another agent. The reasoning applied in the definition of computational models of trust is likely suitable to be modelled by defeasible argumentation [7]. Within Artificial Intelligence, defeasible argumentation is aimed at developing computational models of arguments [21]. These models are typically built upon layers specialised for the definition of the internal structure of arguments, the resolution of conflicts between arguments and the possible resolution strategies for reaching a justifiable conclusion. Here, the modelling of reasoning via defeasible argumentation, applied in turn to the inference of computational trust, is proposed in the context of Wikipedia editors. The goal is to design knowledge-driven, argument-based models capable of assigning a trust value in the range [0, 1] ⊂ R to editors on a case-by-case basis. A value of 1 means complete trust should be assigned to an editor, while 0 means an absence of trust.
These models are built upon domain knowledge and instantiated with quantitative data, and thus can provide numerical inferences. As in [8], assigning trust is assumed to be a reasoning process, or a rational decision grounded on evidence, made by rational agents. Moreover, it is assumed to be a defeasible reasoning process, whose underlying beliefs can be negated by new information. For example, an initial analysis might conclude that a Wikipedia editor should be assigned a high trustworthiness value, due to a large amount of previous interactions performed by him/her. However, if the reputation achieved by this agent after performing these interactions is not positive, then a new, low trustworthiness value might be inferred instead, retracting the previous conclusion. The fact that these pieces of evidence and arguments can be withdrawn in light of new information allows this process to be seen as a form of defeasible reasoning activity. If successful, this reasoning activity might reinforce the generalisability of defeasible argumentation for carrying out quantitative, case-by-case inferences with uncertain and conflicting evidence, as performed in other domains [23, 25, 26]. Thus, the research question under investigation is: “Can the consideration of conflicts and their resolution through defeasible argumentation lead to a better inference of trust of Wikipedia editors than a non-deductive aggregation of evidence?”

The remainder of this paper continues with Section 2 providing the related work on computational trust. Section 3 defines the concept of better inference of trust in the context of this study, and presents the design of an empirical experiment for tackling the research question. The results, the analysis and the discussion of this experiment are provided in Section 4. Lastly, Section 5 concludes the study and suggests future work.

2 Related Work

The first computational model of trust was proposed in [15]. Its goal was to enable artificial agents to make trust-based decisions in the domain of Distributed Artificial Intelligence. In general, trust evidence includes recommendations, reputation, past interactions, credentials and many other factors that might lead to contradicting assessments of trust. In this paper, the context under evaluation comes from the Wikipedia project. This project is under constant change by different types of contributors, ranging from domain experts and casual contributors to vandals and committed editors. Several works have attempted to compute the trust of Wikipedia editors and Wikipedia articles. For instance, [1] presents a content-driven reputation system for Wikipedia editors, assuming that the reputation of editors can be used as a rough guide to the trust assigned to articles edited by them. In turn, reputation is assigned according to the longevity of the text inserted and the longevity of the text edited by each editor. In a subsequent work, [2] computes the trust of a word in a Wikipedia article according to the reputation of the original editor of the word, as well as the reputation of editors who edited content in the vicinity of the word. The study demonstrates that text labelled as high trust has a significantly lower chance of being edited in the future. Similarly, [29] explores the revision history of an article to assess the trustworthiness of the article through a dynamic Bayesian network.
A trust value is defined in the range [0, 1] ⊂ R, where 0 means complete untrustworthiness and 1 means complete trustworthiness. A set of 200 articles was evaluated and correctly classified in approximately 83% of cases according to a trust value threshold. The classes considered were featured articles (assumed to be highly trustworthy for being thoroughly reviewed) and clean-up articles (marked for major revision by editors). In addition, other works evaluate the trust of Wikipedia's contributors through a multi-agent trust model [11] and the reputation of Wikipedia editors through the stability of the content they insert [10]. Several works have examined the relation between defeasible reasoning and computational trust [16, 18], or proposed argument-based approaches for reasoning about trust [3, 28]. However, to the best of the authors' knowledge, the use of defeasible argumentation, instantiated by quantitative information, for the inference of trust of Wikipedia editors as a numerical scalar has not been attempted so far. Hence, it is expected that inferential models built with defeasible argumentation might provide a useful approach to produce knowledge-driven, case-by-case inferences of trust. This investigation also extends previous works [23, 14, 25, 26] which have adopted a similar approach, but in different domains of application. Thus, an additional goal comes from enhancing the generalisability of defeasible argumentation as an effective approach to reason with quantitative, uncertain and conflicting information in real-world contexts.

3 Design and Methodology

A primary research study was designed, which included a comparison between the inferences produced by defeasible argumentation models and two baseline inferences. The baselines were computed by measures of central tendency (average and weighted average) of the features employed for the inference of computational trust, also resulting in a value in the range [0, 1] ⊂ R. Two knowledge bases, in the form of logical expressions that can be adapted as computational arguments, were produced by the first author of this paper. These were employed for the development of argument-based models. These models follow the five-layer modelling approach proposed in [12] and employed in other studies [23, 25, 26]: 1) definition of the structure of arguments, 2) definition of their conflicts, 3) the evaluation of these conflicts, 4) the computation of the acceptance status of each argument and 5) their final accrual. A comparison of the inferences produced by defeasible argumentation models and baseline measures was done by assessing the values assigned to Barnstar editors. A Barnstar (https://en.wikipedia.org/wiki/Wikipedia:Barnstars) represents an award used by Wikipedia to recognise valuable editors. It is a non-automatic award bestowed by one Wikipedia editor on another. Therefore, it is not a ground truth for trust. Instead, it is used as a proxy measure in order to evaluate the produced inferences. Two metrics are employed for the comparison of inferential models: the rank of Barnstars and the spread of the values assigned to Barnstars. When sorting editors in descending order by their assigned trust values, it is assumed that the ranking of the best models will result in Barnstar editors being placed at the highest positions. Non-Barnstar editors may also be highly trustworthy. Nonetheless, Barnstar editors should still, presumably, be ranked at the highest positions. Moreover, since trust is not a binary concept, it is expected that the distribution of the trust values assigned by these same models to Barnstar editors should have a positive, continuous spread. Spread is measured by the standard deviation of the values assigned to Barnstar editors.
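For concreteness, the two metrics can be computed as in the following Python sketch. This is an illustration rather than the authors' implementation: the function name is hypothetical, ties are handled as later specified in Table 4 (non-Barnstars tied with Barnstars are ranked above them), and the exact normalisation of the rank sum to [0, 100] is an assumption, since only the target range is stated.

```python
import statistics

def barnstar_metrics(trust, barnstars):
    """Rank and spread metrics of Section 3 (illustrative sketch).

    trust     -- dict mapping editor id -> inferred trust value in [0, 1]
    barnstars -- set of editor ids holding a Barnstar award
    """
    # Sort editors by trust in descending order; on ties, non-Barnstars are
    # placed above Barnstars (the pessimistic choice described in Table 4).
    ordered = sorted(trust, key=lambda e: (-trust[e], e in barnstars))
    ranks = [i + 1 for i, e in enumerate(ordered) if e in barnstars]

    # Normalise the sum of Barnstar ranks to [0, 100]: 0 when all Barnstars
    # occupy the top positions, 100 when they occupy the bottom positions
    # (assumed normalisation; the paper only specifies the range).
    n, b = len(ordered), len(ranks)
    best, worst = sum(range(1, b + 1)), sum(range(n - b + 1, n + 1))
    rank_metric = 100 * (sum(ranks) - best) / (worst - best)

    # Spread: standard deviation of the trust values assigned to Barnstars.
    spread = statistics.stdev(trust[e] for e in barnstars)
    return rank_metric, spread
```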
Figure 1 summarises the design of the research.

Fig. 1: Design and evaluation strategy schema. Two knowledge bases [24] and the dataset instantiate the argument-based models (structure of arguments, conflicts of arguments, evaluation of conflicts, acceptance status, accrual of arguments); their inferences are compared, in terms of the spread and rank of Barnstar editors, against inferences obtained through the average and weighted average of the features.

3.1 Dataset

An XML dump of the Portuguese-language Wikipedia was selected for examination (file ptwiki-20190201-stub-meta-history.xml, downloaded on 2 January 2019 from https://dumps.wikimedia.org/). It contained 1,076,396 articles, 1,798,363 editors and 67 Barnstar editors up to December 2018. The rationale behind this decision was merely the suitability of the dump for the available computational resources. No natural-language content of the articles was analysed, only quantitative data related to editors. Each Wikipedia page is identified by its title and has a number of associated revisions containing: i) its own ID; ii) a time stamp; iii) a contributor (editor) identified by a user name, or by an IP address if anonymous; iv) an optional commentary left by the editor; v) the number of bytes of the page at the current revision; and vi) an optional tag indicating whether the revision is minor or major and should be reviewed by other editors. From the data contained in each revision, the author applied his knowledge and intuition in this domain to design a set of quantitative features believed to be useful for the inference of trust (another human reasoner might have produced a different set of features, which could lead to different assignments of trust). Table 1 lists this set of features associated with each editor (including anonymous ones identified by their IP). Some of these features, such as presence, regularity and frequency, were first proposed in [13]. A time window of 30 days was selected for the evaluation of the frequency and regularity factors, in line with the statistical examination performed by the Wikimedia Foundation's Analytics, which also selects this time window for some of its analyses of Wikipedia dumps. The designed features were in turn employed for constructing two knowledge bases, as exemplified in the next sections. Due to space limitations, these can be found in a public repository [24].

Table 1: Summary of features employed by a human reasoner for trust assessment.
- Pages: integer number [1, 1,076,396] of unique pages edited by the user.
- Activity: integer number [1, 694,239] of edits performed by the user.
- Anonymity: categorical value (Yes [1], No [0]) indicating whether the user is anonymous; anonymous users are identified by their IP.
- Not Minor: ratio [0, 1] of edits flagged by the editor for revision; 1 (0) means all (no) edits of the editor were flagged by him or herself as not minor.
- Comments: ratio [0, 1] of edits in which a comment was included; one comment allowed per edit.
- Presence: ratio [0, 1] between the registration date of the user and the date of the beginning of the system (January 2001).
- Frequency: frequency ratio [0, 1] of edits per time window of 30 days over the editor's life cycle; maximum value limited to 1.
- Regularity: regularity ratio [0, 1] per time window of 30 days; 1 means at least one interaction every 30 days over the editor's life cycle.
- Bytes: overall integer number [-1·10^8, 8·10^8] of bytes edited by the user; insertions/deletions respectively increase/decrease the amount of bytes.

3.2 Defeasible Argumentation Models and Knowledge Bases

Layer 1 - Definition of the structure of arguments

The first step of this argumentation process focuses on the construction of forecast arguments [16]. Here, these are extended in order to allow the manipulation of numerical inputs, similarly to the structure adopted in fuzzy inference rules [27].

Definition 1 (Forecast argument). A generic forecast argument arg is defined, without loss of generalisability for the AND and OR operators, as:

arg: (i1 ∈ [l1, u1] AND i2 ∈ [l2, u2]) OR (i3 ∈ [l3, u3] AND i4 ∈ [l4, u4]) → conclusion ∈ [lc, uc]

where in ∈ R is the input value of the feature n, with numerical range [ln, un], ln, un ∈ R; [lc, uc], lc, uc ∈ R, is the numerical range of the conclusion being inferred, in this case the trust level; and AND and OR are boolean logical operators.

This structure includes a set of premises (believed to influence the conclusion being inferred) and a conclusion derivable by applying an inference rule →. It is an uncertain implication which is used to represent a defeasible argument. Premises and conclusions are strictly bounded in numerical ranges. In order to facilitate the reasoning process, natural language terms (for instance low and high) are also mapped to these numerical ranges. Both linguistic terms and numerical ranges are usually provided by the knowledge base designer. In this paper, some of these values were defined based on the statistical analysis of Wikipedia dumps provided by the Wikimedia Foundation's Analytics, while others were defined intuitively, based on the author's experience with digital collaborative environments. Examples of forecast arguments using natural language terms and their respective numerical ranges employed in this study are:

- arg1: low activity factor [0, 5] → low trust [0, 0.25)
- arg2: high regularity factor [0.75, 1] → high trust [0.75, 1]
- arg3: medium-low presence factor [0.25, 0.50) → medium-low trust [0.25, 0.50)
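As an illustration of Definition 1 and of the example arguments above, the following minimal Python sketch encodes single-premise forecast arguments and checks whether they are activated for a given editor. It is not taken from the paper: class, field and variable names are assumptions, and interval bounds are treated as closed for simplicity.

```python
from dataclasses import dataclass

@dataclass
class ForecastArgument:
    """Single-premise instance of the forecast argument of Definition 1
    (illustrative sketch; names are not from the paper)."""
    name: str
    feature: str              # input feature the premise refers to
    premise_range: tuple      # [l, u] interval the input must fall into
    conclusion_range: tuple   # [lc, uc] trust interval being inferred

    def is_activated(self, inputs: dict) -> bool:
        # An argument is activated when its premise holds for the given case.
        l, u = self.premise_range
        return l <= inputs[self.feature] <= u

# The three example arguments of Layer 1, with their numerical ranges.
arg1 = ForecastArgument("arg1", "activity", (0, 5), (0.0, 0.25))
arg2 = ForecastArgument("arg2", "regularity", (0.75, 1), (0.75, 1.0))
arg3 = ForecastArgument("arg3", "presence", (0.25, 0.50), (0.25, 0.50))

# A hypothetical editor: low activity, high regularity, low presence.
editor = {"activity": 3, "regularity": 0.9, "presence": 0.1}
activated = [a.name for a in (arg1, arg2, arg3) if a.is_activated(editor)]
print(activated)  # ['arg1', 'arg2']
```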
Layer 2 - Definition of the conflicts of arguments

In order to evaluate inconsistencies, the notion of mitigating argument [16] is introduced. Mitigating arguments are arguments that attack other forecast arguments or other mitigating arguments. Both forecast and mitigating arguments are special defeasible rules, as defined in [17]: informally, if their premises hold then presumably (defeasibly) their conclusions also hold. Different types of attacks, and consequently of mitigating arguments, exist in the literature [20, 4]. In the present study, three types are employed: undermining, undercutting and rebuttal attacks. Table 2 lists their definitions and examples. Note that the coexistence of arguments inferring different conclusions might be possible according to some expert's reasoning; hence, not all arguments with different conclusions lead to rebuttal attacks.

Table 2: Types of attacks employed by mitigating arguments for the modelling of conflicts among arguments.
- Undermining: a forecast argument and an inference ⇒ to an argument B (forecast or mitigating): forecast argument ⇒ ¬B. Example: arg1 ⇒ ¬arg2.
- Undercutting: a set of premises and an inference ⇒ to an argument B (forecast or mitigating): premises ⇒ ¬B. Example: low frequency factor AND low regularity factor AND low activity factor ⇒ ¬arg3.
- Rebuttal: a bi-directional inference ⇔ between forecast arguments that support mutually exclusive conclusions: forecast arg. ⇔ forecast arg. Example: arg2 ⇔ arg3.

The computation of the acceptability status of arguments, and of the final numerical scalar produced by such models, is performed in the next layers. This computation is made via abstract argumentation theory as proposed by [9]. In this case, all attacks are seen as a binary relation. All the designed arguments and attacks can now be seen as an argumentation framework (AF), depicted in Fig. 2.

Fig. 2: Graphical representation of the AFs extracted from knowledge bases 1 (a) and 2 (b). The internal structure of the labelled arguments can be seen in [24]. In this figure, it is important to observe the topology of the argumentation graphs.

Layer 3 - Evaluation of the conflicts of arguments

At this stage an AF can be elicited with data. Forecast and mitigating arguments can be activated or discarded, based on whether their premises evaluate to true or false. Attacks between activated arguments are then evaluated, before being activated as well. As mentioned in the previous layer, attacks usually have the form of a binary relation: a successful (activated) attack occurs whenever both its source (the attacking argument) and its target (the argument being attacked) are activated. However, this study also makes use of the notion of strength of arguments, as presented in [20]. In this case, an attack is considered successful only if the strength of its source is equal to, or greater than, the strength of its target. To define the strength of an argument, feature weights are defined based on a pairwise comparison between the 9 employed features (Table 1) performed by the knowledge base designer. Hence, they are numbers in the range [0, 8] ⊂ N, being 0 if a feature is considered less important than any other feature for the inference of computational trust, and 8 if it is considered more important than any other feature. The weight of a feature also represents the strength of the argument employing this feature. These weights can be seen in the full knowledge bases [24].

Layer 4 - Definition of the acceptance status of arguments

Given a set of activated attacks and arguments, acceptability semantics [9, 6] are applied to compute the acceptance status of each argument, that is, its acceptability. Extension-based and ranking-based semantics are used to evaluate the overall interaction of arguments across the set, in order to select the arguments that should ultimately be accepted. In this study, two extension-based semantics (grounded and preferred [9]) and one ranking-based semantics (categoriser [6]) are employed.
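The following Python sketch illustrates one way Layers 3 and 4 can be operationalised; it is an illustration under stated assumptions rather than the authors' implementation, and the function names and the toy framework (which reuses the attacks exemplified in Table 2) are hypothetical. Attacks between activated arguments are retained only when the source is at least as strong as the target, and the grounded extension is then computed as the least fixed point of Dung's characteristic function [9].

```python
def successful_attacks(attacks, strength, activated):
    """Layer 3 sketch: keep only attacks whose source and target are both
    activated and whose source is at least as strong as its target.
    With a plain binary relation, the strength test would simply be dropped."""
    return {(a, b) for (a, b) in attacks
            if a in activated and b in activated
            and strength.get(a, 0) >= strength.get(b, 0)}

def grounded_extension(arguments, attacks):
    """Layer 4 sketch: iterate Dung's characteristic function from the empty
    set, accepting every argument whose attackers are all counter-attacked
    by already accepted arguments, until a fixed point is reached."""
    accepted = set()
    while True:
        defended = {a for a in arguments
                    if all(any((d, c) in attacks for d in accepted)
                           for (c, b) in attacks if b == a)}
        if defended == accepted:
            return accepted
        accepted = defended

# Toy framework reusing the attacks of Table 2: arg1 undermines arg2,
# while arg2 and arg3 rebut each other.
args = {"arg1", "arg2", "arg3"}
atts = {("arg1", "arg2"), ("arg2", "arg3"), ("arg3", "arg2")}
print(grounded_extension(args, atts))  # {'arg1', 'arg3'}
```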
Layer 5 - Accrual of acceptable arguments

In the last step of the reasoning process, a final inference must be produced. In the case of extension-based semantics, if multiple extensions are computed, the cardinality of an extension (the number of accepted arguments) is used as a mechanism for the quantification of its credibility. Intuitively, a larger extension of arguments might be seen as more credible than smaller extensions. If the computed extensions all have the same cardinality, they are all brought forward in the reasoning process. After the selection of the larger extension/s or best-ranked argument/s, a single scalar is produced through the accrual of the values inferred by forecast arguments. Mitigating arguments have already completed their role by contributing to the resolution of conflicting information and thus are not considered in this layer. In order to infer a crisp value at the end of the reasoning process, it is also necessary to infer a crisp value for each accepted forecast argument. Following Definition 1, this is done as proposed in [22]:

Definition 2 (Crisp conclusion of a forecast argument). The crisp value of a conclusion mapped to a numerical range [lc, uc] in a generic forecast argument arg (Definition 1) is given by the function:

f(arg) = |uc − lc| / (Rmax − Rmin) · (v − Rmax) + uc, where
  v = min[max(i1, i2), max(i3, i4)]
  Rmax = min[max(u1, u2), max(u3, u4)]
  Rmin = min[max(l1, l2), max(l3, l4)]

Three cases are possible depending on the values of lc and uc:
1. lc < uc: the higher the value of the premises of arg, the higher the value of f(arg).
2. lc > uc: the higher the value of the premises of arg, the lower the value of f(arg).
3. lc = uc: arg already infers a crisp value and Definition 2 is not necessary.

Finally, the accrual of the crisp values inferred by forecast arguments results in the trust value inferred by this reasoning process. This accrual can be made in different ways, for instance by considering measures of central tendency. The average is adopted in this study for models that use a binary relation of attacks, while the weighted average is adopted for models that use the notion of strength of arguments. Note that in the case of two preferred extensions with the same number of accepted forecast arguments, the outcome of the preferred semantics is the mean of its two extensions. (A minimal computational sketch of this layer is given at the end of this subsection.)

Table 3 summarises the design of the argument-based models with different parameters for each of their layers. Let us point out that the literature on defeasible argumentation is vast [7, 17], and allows for many other configurations. Hence, we do not propose an optimal set of models. Instead, we have borrowed well-known parameters that are believed to be enough for an initial account of the proposed assessment of trust and adequate for the knowledge bases at hand.

Table 3: Models built with defeasible argumentation; each entry lists the knowledge base providing arguments and conflicts (Layers 1-2), the attack relation (Layer 3), the semantics (Layer 4) and the accrual (Layer 5). Due to space limitations, the knowledge bases (KB) are detailed in [24].
- A1 (A7): KB1 (KB2); binary; preferred; cardinality + average.
- A2 (A8): KB1 (KB2); binary; categoriser; average.
- A3 (A9): KB1 (KB2); binary; grounded; average.
- A4 (A10): KB1 (KB2); strength of arguments; preferred; cardinality + weighted average.
- A5 (A11): KB1 (KB2); strength of arguments; categoriser; weighted average.
- A6 (A12): KB1 (KB2); strength of arguments; grounded; weighted average.
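To complement Definition 2 and the accrual step, the following Python sketch shows one possible reading of Layer 5; the function names, the argument layout and the final example are assumptions made for illustration only.

```python
def crisp_conclusion(inputs, premise_ranges, conclusion_range):
    """Definition 2 sketch for the generic four-premise forecast argument of
    Definition 1; a single-premise argument can simply repeat its values.
    inputs           -- (i1, i2, i3, i4)
    premise_ranges   -- ((l1, u1), (l2, u2), (l3, u3), (l4, u4))
    conclusion_range -- (lc, uc)
    """
    lc, uc = conclusion_range
    if lc == uc:                                   # case 3: already crisp
        return lc
    i1, i2, i3, i4 = inputs
    (l1, u1), (l2, u2), (l3, u3), (l4, u4) = premise_ranges
    v = min(max(i1, i2), max(i3, i4))
    r_max = min(max(u1, u2), max(u3, u4))
    r_min = min(max(l1, l2), max(l3, l4))
    return abs(uc - lc) / (r_max - r_min) * (v - r_max) + uc

def accrue(values, weights=None):
    """Layer 5 sketch: average for the binary attack relation, weighted
    average when strengths of arguments are used."""
    if weights is None:
        return sum(values) / len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical example: arg2 (high regularity [0.75, 1] -> high trust [0.75, 1])
# instantiated with a regularity of 0.9 yields a crisp value of about 0.9.
print(crisp_conclusion((0.9,) * 4, ((0.75, 1),) * 4, (0.75, 1.0)))
```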
4 Results and Discussion

The data extracted from the Portuguese Wikipedia dump was used to elicit the designed argument-based models (Table 3) and the baseline instruments. The inferences produced by them were employed for the evaluation of the rank and spread of the trust values assigned to Barnstar editors. Table 4 lists the procedure for the calculation of each metric, while Figure 3 depicts the respective results.

Table 4: Calculation of the metrics employed to assess the performed trust inferences.
- Rank of Barnstars: sort all editors by their trust values in descending order; non-Barnstars tied with Barnstars are ranked above them. Sum the ranks of the Barnstar editors and normalise the result to the range [0, 100] ⊂ R. 0 means all Barnstars with an assigned trust value are ranked above any non-Barnstar, while 100 means they are ranked below any non-Barnstar.
- Spread: standard deviation of the trust values assigned to Barnstars.

Fig. 3: Results achieved by each designed model of inference and by the baselines: (a) rank of Barnstars, (b) standard deviation (spread) of the trust values assigned to Barnstars, and (c) sum of the models' ranks across both metrics. Model A9 was removed due to 52.33% of undecided cases.

Fig. 3a depicts the resulting normalised sum of Barnstar ranks, indicating whether Barnstar editors were ranked at the highest positions or not. It is possible to observe that the ranks computed by the argument-based models were effective, ranging from 0.87 to 12.95. This suggests that defeasible argumentation was capable of capturing, to some degree, the notions of the ill-defined construct of trust. In contrast, the baseline instruments (average and weighted average) presented poor performance (ranks equal to 21.1 and 40.4 respectively). This implies that the reasoning performed by the argument-based models was able to greatly improve the use of the selected features for the ranking of Barnstars. Among the argument-based models, the inferences produced by those built with KB1 (A{1-6}) did not result in significantly different ranks of Barnstars. A possible reason might be the simplified topology of KB1 (Figure 2a). In contrast, note that the inferences produced by the models built with KB2 resulted in a higher variance of the rank of Barnstars (1.102 to 12.955), with model A9 not being reported due to the high number of cases with no inference (52.33%). This implies that, as expected, acceptability semantics are more significant when employed over AFs of greater topological complexity. The other metric evaluated, the spread of the trust values assigned to Barnstar editors, was measured through the standard deviation (σ) of these values. Figure 3b depicts the results for this metric. The inferences of the models built with KB1 (A{1-6}) had low variance and robust results. The inferences of the models built with KB2 (A{7-12}) had higher variance, with the best results achieved by the categoriser and preferred semantics with no strength of arguments (A7 and A8).
In comparison to the baseline instruments, the argument-based models achieved better results except when built with KB2 and the strength of arguments (A{10-12}). It might be argued that a single set of strengths was selected for all the exploited data, and is thus not adequate for case-by-case reasoning. Figure 3c reports the sum of the ranks achieved by each model for each metric of evaluation. While relative differences are lost when models are ranked, this sum still provides a general account of their performance. The argument-based models seem to confirm the likely superior inferential capacity of defeasible argumentation compared to the selected baselines. In particular, models A4 and A6, built with KB1, presented the best general solutions. The exception comes from model A9 (grounded semantics, no strength of arguments, and KB2). This configuration of parameters led to a high number of cases with no inference (52.33%). The grounded semantics, with no strength of arguments, is a sceptical approach and, as expected, is likely unable to solve a high number of rebuttals. In summary, the use of defeasible argumentation for the inference of trust of Wikipedia editors can be seen as more appealing than the compared baseline instruments, since such instruments do not take into account possible conflicts among the selected pieces of evidence. Thus, the results of this study indicate that the assumption of assigning a numerical trust value to Wikipedia editors as a form of defeasible reasoning process is likely valid. Hence, it is a promising reasoning technique because it offers a flexible approach for translating different knowledge bases and beliefs of human reasoners into computational rules. Moreover, it allows the creation of models that can be extended, falsified and replicated, supporting the enhancement of the understanding of computational trust itself. These advantages are also observed against data-driven techniques, even those able to produce interpretable solutions such as decision trees.

5 Conclusions and Future Work

This study presented an empirical evaluation of defeasible argumentation for the inference of computational trust in the context of the Wikipedia project. It employed two knowledge bases formed by computational rules and grounded on the domain knowledge of a human reasoner. A primary research study was conducted, including the construction of inferential models using defeasible argumentation. These were employed to represent the reasoning applied to assess and infer the trust of Wikipedia editors as a numerical metric. Moreover, they were elicited with real-world, quantitative data provided by publicly available Wikipedia dumps. The outputs of these models were scalars representing a trust value assigned to each editor in the range [0, 1] ⊂ R. The selected metrics for the evaluation of their inferential capacity were the spread and rank of the trust values assigned to editors recognised as trustworthy by the Wikipedia community. Findings indicated that the models built with defeasible argumentation outperformed the non-deductive calculations in both metrics. Therefore, the assessment of computational trust as a form of defeasible reasoning process is presumably plausible. Thus, this research contributes to the field of defeasible argumentation by exemplifying a practical use of this reasoning approach seldom reported in the literature.
This use is done via a modular design which is suitable for modelling reasoning applied to distinct real-world domains. For instance, previous works have employed this design for the inference of other phenomena, such as human mental workload [23] and the risk of mortality in elderly individuals [25, 26]. Therefore, the results presented here reinforce the generalisability of defeasible argumentation for knowledge representation and the production of quantitative inferences in distinct domains characterised by uncertain and conflicting evidence. Future work will concentrate on replicating this experiment by considering other reasoning approaches, such as fuzzy reasoning and expert systems, and by taking into account knowledge bases built by multiple reasoners and/or including human-in-the-loop alternatives [5] for the automation of the creation of arguments and attacks.

References

1. Adler, B.T., de Alfaro, L.: A content-driven reputation system for the Wikipedia. In: Proceedings of the 16th Int. Conf. on World Wide Web. pp. 261–270. WWW '07, ACM, New York, NY (2007)
2. Adler, B.T., Chatterjee, K., de Alfaro, L., Faella, M., Pye, I., Raman, V.: Assigning trust to Wikipedia content. In: Proc. of the 4th Int. Symposium on Wikis. pp. 26:1–26:12. WikiSym '08, ACM (2008)
3. Amgoud, L., Demolombe, R.: An argumentation-based approach for reasoning about trust in information sources. Argument & Computation 5(2-3), 191–215 (2014)
4. Amgoud, L., Vesic, S.: Rich preference-based argumentation frameworks. International Journal of Approximate Reasoning 55(2), 585–606 (2014)
5. Barakat, N., Bradley, A.P.: Rule extraction from support vector machines: A review. Neurocomputing 74(1), 178–190 (2010)
6. Besnard, P., Hunter, A.: A logic-based theory of deductive arguments. Artificial Intelligence 128(1-2), 203–235 (2001)
7. Bryant, D., Krause, P.: A review of current defeasible reasoning implementations. The Knowledge Engineering Review 23(3), 227–260 (2008)
8. Dondio, P., Longo, L.: Computing trust as a form of presumptive reasoning. In: Web Intelligence (WI) and Intelligent Agent Technologies (IAT), IEEE/WIC/ACM Int. Joint Conf. on. vol. 2, pp. 274–281 (2014)
9. Dung, P.M.: On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence 77(2), 321–358 (1995)
10. Javanmardi, S., Lopes, C., Baldi, P.: Modeling user reputation in wikis. Statistical Analysis and Data Mining: The ASA Data Science Journal 3(2), 126–139 (2010)
11. Krupa, Y., Vercouter, L., Hübner, J.F., Herzig, A.: Trust based evaluation of Wikipedia's contributors. In: Aldewereld, H., Dignum, V., Picard, G. (eds.) Engineering Societies in the Agents World X. pp. 148–161. Springer Berlin Heidelberg (2009)
12. Longo, L.: Argumentation for knowledge representation, conflict resolution, defeasible inference and its integration with machine learning. In: Machine Learning for Health Informatics. pp. 183–208 (2016)
13. Longo, L., Dondio, P., Barrett, S.: Temporal factors to evaluate trustworthiness of virtual identities. In: Third International Conference on Security and Privacy in Communications Networks and the Workshops - SecureComm 2007. pp. 11–19. IEEE (2007)
14. Longo, L., Rizzo, L., Dondio, P.: Examining the modelling capabilities of defeasible argumentation and non-monotonic fuzzy reasoning. Knowledge-Based Systems (in press) (2020)
15. Marsh, S.P.: Formalizing Trust as a Computational Concept. Ph.D. thesis, University of Stirling, Department of Computer Science and Mathematics (1994)
16. Matt, P.A., Morge, M., Toni, F.: Combining statistics and arguments to compute trust. In: 9th International Conference on Autonomous Agents and Multiagent Systems, Toronto. vol. 1, pp. 209–216. ACM (May 2010)
17. Modgil, S., Prakken, H.: A general account of argumentation with preferences. Artificial Intelligence 195, 361–397 (2013)
18. Parsons, S., Atkinson, K., Li, Z., McBurney, P., Sklar, E., Singh, M., Haigh, K., Levitt, K., Rowe, J.: Argument schemes for reasoning about trust. Argument & Computation 5(2-3), 160–190 (2014)
19. Parsons, S., McBurney, P., Sklar, E.: Reasoning about trust using argumentation: A position paper. In: Workshop on Argumentation in Multi-Agent Systems. pp. 159–170 (2010)
20. Pollock, J.L.: Cognitive Carpentry: A Blueprint for How to Build a Person. MIT Press (1995)
21. Prakken, H., Sartor, G.: The role of logic in computational models of legal argument: A critical survey. In: Kakas, A.C., Sadri, F. (eds.) Computational Logic: Logic Programming and Beyond: Essays in Honour of Robert A. Kowalski, Part II, pp. 342–381. Springer, Berlin, Heidelberg (2002)
22. Rizzo, L.: Evaluating the Impact of Defeasible Argumentation as a Modelling Technique for Reasoning under Uncertainty. Ph.D. thesis, Technological University Dublin (2020)
23. Rizzo, L., Longo, L.: An empirical evaluation of the inferential capacity of defeasible argumentation, non-monotonic fuzzy reasoning and expert systems. Expert Systems with Applications 147 (in press) (2020)
24. Rizzo, L., Longo, L.: Structured knowledge bases for the inference of computational trust of Wikipedia editors (2020, accessed May 5, 2020), doi.org/10.6084/m9.figshare.12249770
25. Rizzo, L., Majnaric, L., Dondio, P., Longo, L.: An investigation of argumentation theory for the prediction of survival in elderly using biomarkers. In: Iliadis, L., Maglogiannis, I., Plagianakos, V. (eds.) Artificial Intelligence Applications and Innovations. pp. 385–397. Springer International Publishing, Cham (2018)
26. Rizzo, L., Majnaric, L., Longo, L.: A comparative study of defeasible argumentation and non-monotonic fuzzy reasoning for elderly survival prediction using biomarkers. In: Ghidini, C., Magnini, B., Passerini, A., Traverso, P. (eds.) AI*IA 2018 – Advances in Artificial Intelligence. pp. 197–209. Springer International Publishing, Cham (2018)
27. Ross, T.J.: Fuzzy Logic with Engineering Applications. McGraw-Hill, New York (1995)
28. Tang, Y., Cai, K., McBurney, P., Sklar, E., Parsons, S.: Using argumentation to reason about trust and belief. Journal of Logic and Computation 22(5), 979–1018 (2012)
29. Zeng, H., Alhossaini, M.A., Ding, L., Fikes, R., McGuinness, D.L.: Computing trust from revision history. In: Intl. Conf. on Privacy, Security and Trust (2006)