Proceedings of CMNA 2016 - Floris Bex, Floriana Grasso, Nancy Green (eds)

Argumentation Devices in Reasoning About Health

Sally Jackson and Jodi Schneider
University of Illinois at Urbana-Champaign, sallyj@illinois.edu
University of Pittsburgh, jos188@pitt.edu

Abstract

Health controversies are infused with products of expert reasoning, often interpreted by non-experts. To understand these controversies, we must pay closer attention both to the field-dependent devices that characterize expert reasoning, and to how non-experts engage with experts’ evidence and reasoning in their own argumentative practices. We describe two argumentation devices that have emerged in medical research and discuss the role of these devices within health controversies.

1 Introduction

Argumentation is a constantly evolving social practice, one that builds on thousands of years of human experience. The ubiquitous human practice of seeking advice from experts, for example, has very long historical roots, but it is also a basis for decision-making that is in constant flux as the grounds for expert opinion change. Expert fields do not just accumulate information; they also invent specialized ways of reasoning about information. Toulmin [1958] noted this fact and discussed at length the possibility that warrants (or backing for warrants) might justify the movement from data to claim only within particular fields. The Argument Interchange Format [Chesñevar et al., 2006] acknowledges field dependence in argumentation by including context in the core model and assuming that context may include domain-specific argumentation rules that are direct counterparts of domain-independent schemes. Our goal in this paper is to explore field-dependent patterns of reasoning in health and medicine and to consider how these can be modeled.

Several examples drawn from a contemporary health controversy illustrate an important fact: As expert fields innovate in their own reasoning practices, arguments built by non-experts on the prior arguments of experts may take forms quite unlike the canonical form of argument from expert opinion. Each example mentions a conclusion drawn by an expert or group of experts, and at first glance, it would seem that each would pass all of the tests defined by standard lists of critical questions for the expert opinion scheme [Walton et al., 2008, p. 15], including the “backup evidence question.”

Example 1, Press Release from AutismSpeaks non-profit¹
In the largest-ever study of its kind, researchers again found that the measles-mumps-rubella (MMR) vaccine did not increase risk for autism spectrum disorder (ASD). This proved true even among children already considered at high risk for the disorder. In all, the researchers analyzed the health records of 95,727 children, including more than 15,000 children unvaccinated at age 2 and more than 8,000 still unvaccinated at age 5. Nearly 2,000 of these children were considered at risk for autism because they were born into families that already had a child with the disorder. The report appears today in JAMA, the Journal of the American Medical Association.

Example 2, The New York Times²
According to Dr. Paul Offit, an infectious disease specialist at Children's Hospital of Philadelphia, young children readily handle the immune challenges of multiple vaccines. For example, studies have shown the five-in-one vaccine Pediarix against hepatitis B, polio, tetanus, diphtheria and pertussis is as safe and effective as giving each of these vaccines individually.

Example 3, The Guardian³
The evidence of no link between MMR and autism is now extremely strong. In February 2012, the Cochrane Collaboration - which compiles gold-standard reviews of medical evidence - conducted a huge study into the safety of MMR. This mega-review brought together evidence from 54 difference(sic) scientific studies using a variety of methodologies and involving 14.7 million children from around the world.

¹ https://www.autismspeaks.org/science/science-news/no-mmr-autism-link-large-study-vaccinated-vs-unvaccinated-kids
² http://well.blogs.nytimes.com/2015/08/10/not-vaccinating-children-is-the-greater-risk/?_r=0
³ http://www.theguardian.com/society/2013/apr/25/measles-mmr-the-essential-guide

These passages are typical of the appearance of expert knowledge in the public discussion of childhood vaccination. But the critical questions associated with the argument from expert opinion scheme will not provide the kind of searching evaluation that these examples require.

The basis for the expert opinion is in each case not only field-specific information (“backup evidence”) but also some field-dependent inference strategy, applied directly by the expert source mentioned in Examples 1 and 3, and indirectly (by the expert’s own expert sources) in Example 2. How should the differences among texts like these be represented, and what new critical questions do these arguments invite?

2 Field-dependent argumentation devices

Expert fields may build up repertoires of reasoning strategies over time, resulting in field-dependent inference rules. When any such new inference rule is proposed, other experts may challenge it, describing undercutters or rebuttals to the strategy (as we will describe in 2.1 and 2.2). Iterative repair and critique continue, often over long periods of time, until the strategy is defeated, abandoned, or stabilized.

We will use the term argumentation device to describe a stable inference rule, currently accepted within a given field as a repeatable method for generating new, valid arguments within the field’s domain. An argumentation device may contain material components that augment human reasoning in various ways and institutional components that underwrite their dependability.

In many respects, argumentation devices resemble argumentation schemes. Schemes, though, are generally assumed to be domain-independent and stable over long periods of time [Chesñevar et al., 2006, p. 297], while the inventions we call argumentation devices are deeply entwined with the state of knowledge in a given domain. They work like schemes (as rules that justify drawing a conclusion from data); and like schemes, they have specifiable critical questions. However, the critical questions needed to evaluate the output of an argumentation device need to be discovered for each such device, often by seeing how the device fares in actual debate among experts, and then again, in larger contexts (like public debate) where the output of the device may be used as evidence for some further conclusion. They may change in response to change in the substantive knowledge of the field, as when some newly discovered fact about the phenomena exposes a previously undetectable way for the device to go wrong.

Argumentation devices can be extremely complex, incorporating material and institutional components that simply do not figure in ordinary schemes. For domains advancing high-stakes claims, like medical research, there are many different motivations for critical scrutiny (scientific commitment to empirical adequacy, pragmatic interest in quality of health care, patient concern for safety, financial interest in health care products, and more), and any of these motivations can lead either to the discovery of new critical questions or to the invention of new strategies for disarming them. In the next two sections, we introduce two argumentation devices that have emerged over the past half-century and co-evolved rapidly, supported by significant investment in material and institutional resources.

2.1 Randomized controlled trials

Establishing and defending claims about medical treatments is central to health science and practice. Although the problem has existed throughout human history, our standards for defense of such claims have changed dramatically in the last century, with the invention of the Randomized Controlled Trial (RCT). RCTs combine three features: (1) a comparison of a treatment of interest with a control condition (or with an alternative treatment); (2) random allocation of patients to treatment conditions; and (3) “blinding” of patients and researchers to the treatment any given individual receives.

Meldrum [2000] provides an illuminating account of the emergence of RCTs, documenting the series of innovations that, when combined into a single experimental design, became the standard against which all other medical evidence has come to be compared. We summarize her account here to highlight the fact that specific innovations (like random allocation) serve specific argumentative functions, so much so that their omission is said to make the experiment invalid as evidence for a conclusion about the effect of a treatment.

Prior to the 1900s, controlled experiments in human health were rare, and according to Meldrum, even more rarely conducted on treatments that could be administered to individual patients. Medical practitioners engaged in careful observation and sharing of results, and the literature was filled with case reports of what had worked in individual cases, but without the procedural controls needed for strong inference from these observations.

Proliferation of treatments – particularly drugs and patent medicines – led to the formation of assessment agencies in the early 1900s, including the American Medical Association’s Council on Pharmacy and Chemistry, and the first U.S. federal bureau empowered to review “the extravagant claims” made by the pharmaceutical industry of the time [Meldrum, 2000, p. 749]. Of central importance to our treatment of RCTs as an argumentation device is the role agencies played in challenging these extravagant claims.

To understand RCTs as an argumentation device, it is important to understand how profoundly doubt, disagreement, and error have affected the elaboration of this device over time. Scientists working with human subjects had to discover the need for randomization in the assignment of patients or other subjects to experimental conditions; the general superiority of comparisons based on randomly assigned groups is counterintuitive, but is nowadays universally acknowledged to be the best defense against bias or suspicion of bias. Other innovations like double-blinding were added as standard features of experiments on human subjects, not because logic requires them, but because of the practical discovery that patients’ and experimenters’ expectations could affect health outcomes, leading to novel criticisms of experiments for falling prey to “the placebo effect.” RCTs with various forms of blinding are the present standard for evidence in medicine, but they achieved their present status only slowly, and only incrementally. At each stage of development, the RCT has been a device meant to disarm known objections to the conclusions drawn from a set of observations.

RCTs stabilized into a standardized, widely accepted form only in the late 1950s [Meldrum, 2000, p. 754], about ten years after the first large-scale trials were initiated (1946 in the US, 1947 in the UK). A decade later RCTs gained institutional status. In the wake of thalidomide-associated birth defects, the U.S. Food and Drug Administration began to investigate new approaches for reviewing drugs for safety [Meldrum, 2000]. This led to a 1970 regulation enshrining the RCT in U.S. law.

RCTs are not by any means a secure defense for a claim about a treatment effect. A series of RCTs, each competently executed, can come to different conclusions about a treatment. And each one remains vulnerable to subtle counterarguments that only expert researchers are likely to discover (previously unknown confounds, for example). However, RCTs handily defeat most other forms of evidence that might be advanced for the same class of claims. They are a “package deal” of evidence for a claim and evidence against a standard set of possible rebuttals, creating a strong but still defeasible conclusion.

2.2 Cochrane Reviews

As noted briefly above, RCTs on a particular treatment may accumulate within a scientific literature, each reporting some measurement of the effect of the treatment. Despite the widely acknowledged value of RCTs for evaluating treatment effects, expertise in interpretation is still necessary. One of the things experts know is that random variability is always present in the results of any series of identically designed experiments on human subjects. This creates an opportunity for confirmation bias to operate as readers cherry-pick results that support their beliefs and ignore or discount results that do not. Accompanying the rise of RCTs in medicine is another important invention, the systematic research review designed to aggregate evidence from many individual studies into a statement of what the research as a whole may be taken to support. Over just the past three decades, a highly standardized form of systematic review has emerged, known as the Cochrane Review.

Cochrane Reviews are named for Archie Cochrane, a Scottish doctor and epidemiologist, who championed the use of RCTs for guidance of clinical practice. In 1989, the publication of a 2-volume work on pregnancy and childbirth marked what Cochrane regarded as “a real milestone in the history of randomised trials and in the evaluation of care” [Chalmers et al., 1989; and Cochrane’s Foreword]. This was the first major systematic review in health science, a massive undertaking involving ten years of effort to review over 3000 controlled trials published since 1950 [Review, 1990]. A Cochrane Review is a review of literature conducted using very well-defined procedures outlined in an official handbook.⁴ These procedures include exhaustive search for relevant studies; use of scoring rubrics for evaluation of the relevance and strength of evidence in each study; prescribed methods for combining information quantitatively; preferred methods for presentation of findings; and more.

⁴ http://handbook.cochrane.org

Unlike RCTs, systematic reviews do not generate new observations. They assemble evidence that already exists in a scientific literature and draw inferences from this evidence in a highly disciplined way. Evidence that would be considered inconsistent from a common-sense point of view is taken as input to the review, and interpreted in light of what experts know about variability. A Cochrane Review treats study-to-study variation in findings from multiple RCTs as normal and unremarkable, and because all relevant evidence is included, it offers good defense against any charge of cherry-picking. New reviewing standards emerge in response to problems noticed in the quality of argumentation produced by a review. For example, the Cochrane handbook includes cautions against “common mistakes” made in reviewing, such as concluding that there is evidence of no effect of an intervention when all that is really justified by the literature is that there is no evidence of an effect.⁵ Against a charge that the Cochrane Review is only as good as the body of primary research available for aggregation, the Cochrane Collaboration (more than 37,000 contributors from over 130 countries) has adopted a formal practice of “grading” the strength of the evidence base itself.

⁵ Cochrane Handbook Part 2 section 12.7.2

Although systematic review methods are still in a period of rapid methodological innovation, the Cochrane Review has already achieved the status of a trusted argumentation device, largely because its procedures are so explicitly linked to critical questions on which earlier styles of research synthesis regularly failed. The methodical search procedures required for a Cochrane Review make it hard for a critic to object that evidence was assembled to fit the reviewer’s own hypothesis. Counter-arguing individual studies (a once-common practice in narrative reviews of literature) is replaced with careful and explicit coding decisions applied impartially to the entire corpus of potentially relevant studies. Reviewer bias is further minimized through highly structured reporting methods: For example, if the review includes meta-analysis (a technique for transforming results of each individual study into a quantitative effect size measure), the results must be displayed as a “forest plot” that allows readers to inspect results on a study-by-study basis.

3 Modeling the role of argumentation devices

In a very preliminary way, we want to consider the challenges of including argumentation devices like these in formal models. Argumentation devices resemble schemes in most respects; they serve as reusable links between different collections of data and conclusions drawn from these data. They are applied to data, and although devices do not need defense in each application, they do have a context-independent defense that can be attacked either in the particular occasion of use or in a general critique of all arguments using the device. In an argument network, they would be better represented as a scheme node than as an information node. In a Toulmin diagram, the device is the warrant for conclusions drawn from data. The field-dependence of an argumentation device will commonly be most apparent in what appears in the backing for the device.

An argumentation device gains its status through incorporation of various assurances of its own ability to deliver reliable conclusions, including new institutional resources that underwrite the device as a whole. The Cochrane device is a particularly clear example, since it depends very openly on the growth of institutional resources to assure that a conclusion from a Cochrane Review is based on the most exhaustive search possible for relevant evidence. Although a machine-searchable database of medical research literature (MEDLINE, from the US National Library of Medicine) has been available since the 1960s, the Cochrane Collaboration has created a specialized database specifically for controlled trials, known as CENTRAL (Cochrane Central Register of Controlled Trials), that includes both a subset of MEDLINE entries and other items retrieved from a variety of sources, including manual search of conference programs by members of the Cochrane Collaboration. Reviewers are expected to search both MEDLINE and CENTRAL to identify every possible relevant item, and to examine each item for whether it meets inclusion criteria. A typical Cochrane Review will identify thousands of potentially relevant items and winnow these to a few dozen studies that actually provide relevant data.

The resources that are required for an argumentation device to operate at all need some presence in any graph, diagram, or other formal representation of an argument from expertise that is itself an argument from some field-dependent device: arguments like those presented in Examples 1, 2, and 3. These resources are meant as strengtheners of the expert argument, but they are also a system of delegations in which responsibility for the validity of any one conclusion has been spread throughout a huge collective of participants. The individual performers of Cochrane Reviews take responsibility for faithful adherence to Cochrane procedures, but responsibility for the exhaustiveness of the search is delegated to databases; the responsibility for what is available to be retrieved is delegated to funding agencies that set research priorities; and the responsibility for establishing hierarchies of evidence is delegated to trusted working groups within the Cochrane Collaboration. These delegations are themselves an interesting fact about contemporary argumentation [Jackson, 2015a] that could be better understood if they were explicitly included in formal models of argumentation. Figure 1 illustrates how these delegations might be incorporated in a Toulmin diagram, as forms of backing for the Cochrane Review procedure.

The most distinctive differences between argumentation devices and familiar argumentation schemes are their field-specificity and their openness to redesign [Jackson, 2015b]. The primary purpose of an argumentation device is to provide convincing evidence for a conclusion to people who understand the workings of the device and have confidence in it. Both RCTs and Cochrane Reviews share a well-defined context consisting of an audience of medical experts, a pre-existing literature, and other features whose argumentative relevance is as yet unclear. Both have developed iteratively from critique within the field, and both are still being elaborated to eliminate vulnerabilities in their conclusions. Argumentation devices demand consideration of context: not only the community within which they emerge but also the state of play within that community.

Figure 1. General form of a Cochrane Review’s argument, with delegations of responsibility in the backing for the warrant.

4 Critical questions about devices

A Cochrane Review is organized around both presentation of data and response to critical questions about the gathering and interpretation of the data. In other words, much of the text of a Cochrane Review consists of explicit answers to the questions other experts would be presumed to have. An enormous advantage that comes with use of an established argumentation device is that the device itself does not need defense for each occasion of use. It can function as a warrant for many specific conclusions, each of which has its own unique body of evidence.

Although an argumentation device may be applied in a completely uncontroversial way within an expert field, that is no protection against questions or challenges from beyond the field. The fact that a device has earned the confidence of a group of experts is not quite sufficient to earn trust from other potential audiences. The testing ground for any new argumentation device is argumentation itself. The device must earn its status by withstanding critique. We end by considering what kinds of questions might arise, reasonably or even unreasonably, as devices like Cochrane Reviews enter new testing grounds.

To begin with, critical questions relevant to arguments supported by Cochrane Reviews share some similarities with critical questions for arguments from expert opinion. The accuracy of an arguer’s understanding of expert opinion is always relevant. Consider again Example 3, the Guardian’s appeal to a Cochrane Review of 54 studies as evidence against any link between autism and MMR. The review [Demicheli et al., 2012] did in fact look at 54 studies, but only 10 included autism as an outcome variable, and by the reviewers’ assessments of quality, it does not appear that they would agree that the 10 studies relevant to this particular claim provide “extremely strong” evidence. (None of the 10 were RCTs, and none individually offered a strong design for detecting a link between MMR and autism. Reviewers classified all 10 of the autism-related studies as containing either “high” risk of bias or “moderate/unknown” risk of bias.) Where the Guardian has gone wrong here is in assuming that a “gold standard” procedure can produce “extremely strong” evidence from a research literature that is inadequate, a failure to understand that any limitations of the primary research literature are inherited by the review.

But in addition to questions similar to those relevant to assessment of argument from expert opinion, any device of this kind will be vulnerable to challenges specific to the device. A significant feature of the current design of the Cochrane Review is that it aggregates evidence from scientific literature (sometimes including unpublished data, but mostly from reports published in some form and included in a database). By design, a Cochrane Review ignores evidence that could, in principle, be relevant. This includes the very wide range of evidence types that can be supplied by ordinary people paying attention to their own health and their own reactions to treatments. For the vaccination controversy, this includes evidence that is highly credible to many members of the public (first-hand parent observations of adverse reactions to vaccines); the fact that no serious effort has been made to systematically review these reports is a reason for those affected to question the credibility of the institutions that back the Cochrane device. So one class of critical questions has to do with whether there are forms of evidence the device does not (or cannot) ingest.

Another class of critical questions has to do with biases built into the device. The device is always designed to answer some set of questions but not others, and to assume those things that its expert users assume. To illustrate, a common notion within anti-vaccination discourse is that the institutions responsible for the production of the primary research have so strong an interest in mass immunization that they conceal or suppress evidence of serious risks, a notion characterized as conspiracy thinking by Oliver and Wood [2014]. While no one seriously expects scientists to respond to conspiracy theories, it is certainly reasonable to ask what interests and assumptions shared within an expert community might make the community blind to certain evidence or deaf to certain arguments.

If argumentation devices are seen as an encapsulation of how the expert community reasons, questions can be asked not only about the individual use of the device in one argument, but also about the assumptions the device encapsulates. This is an important shift of scale that involves questions that may need to be asked to correct an unsuspected bias. Such questions can sometimes be formulated more easily by non-experts than by the experts themselves, because non-experts come from a perspective with its own biases, but different ones.

In health controversies where much is at stake, both experts and non-experts will fully explore the possible grounds for disagreement with conclusions drawn from experts’ argumentation devices, and the devices themselves will improve in order to better withstand critique. An important goal in modeling argumentation devices is to expose avenues for productive examination of the devices by non-experts, and to assist experts in responding productively to even the most skeptical critique.

Acknowledgments

The second author was supported by training grant 5T15LM007059-29 from the National Library of Medicine and National Institute of Dental and Craniofacial Research.

References

[Chalmers et al., 1989] Iain Chalmers, Murray Enkin, and Marc J.N.C. Keirse. Effective care in pregnancy and childbirth: Pregnancy. Oxford University Press, 1989.

[Chesñevar et al., 2006] Carlos Chesñevar et al. Towards an Argument Interchange Format. The Knowledge Engineering Review, 21(4): 293–316, December 2006.

[Demicheli et al., 2012] Vittorio Demicheli, Alessandro Rivetti, Maria Grazia Debalini, and Carlo di Pietrantonj. Vaccines for measles, mumps and rubella in children. Cochrane Database of Systematic Reviews 2012 Issue 2. Art. No.: CD004407.

[Jackson, 2015a] Sally Jackson. Deference, distrust, and delegation: Three design hypotheses. Reflections on Theoretical Issues in Argumentation Theory (pp. 227–243). Springer International Publishing, August 2015.

[Jackson, 2015b] Sally Jackson. Design thinking in argumentation theory and practice. Argumentation, 29(3): 243–263, August 2015.

[Meldrum, 2000] Marcia L. Meldrum. A brief history of the randomized controlled trial: From oranges and lemons to the gold standard. Hematology/oncology clinics of North America, 14(4): 745–760, August 2000.

[Oliver and Wood, 2014] J. Eric Oliver and Thomas Wood. Medical conspiracy theories and health behaviors in the United States. JAMA Intern Med, 174(5): 817–818, 2014.

[Review, 1990] Book review of [Chalmers et al., 1989]. Birth, 17(1): 55–62, March 1990.

[Toulmin, 1958] Stephen E. Toulmin. The uses of argument. Cambridge University Press, 1958.

[Walton et al., 2008] Douglas Walton, Chris Reed, and Fabrizio Macagno. Argumentation Schemes, 1st edn. Cambridge University Press, August 2008.