Are We Overconfident in Our Understanding of Overconfidence?

Raymond R. Panko
Shidler College of Business
University of Hawai`i
2404 Maile Way
Honolulu, HI 96821
001.808.377.1149
Ray@Panko.com

ABSTRACT
In spreadsheet error research, there is a Grand Paradox. Although many studies have looked at spreadsheet errors and have found, without exception, error rates that are unacceptable in organizations, organizations continue to ignore spreadsheet risks. They do not see the need to apply the software engineering disciplines long seen to be necessary in software development, in which error types and rates are similar to those in spreadsheet development. Traditionally, this Grand Paradox has been attributed to overconfidence. This paper introduces other possible approaches for understanding the Grand Paradox. It focuses on risk blindness, which is our unawareness of errors when they occur.

Categories and Subject Descriptors
K.8.1: Spreadsheets. D.2.5: Testing and Debugging.

General Terms
Experimentation, Verification.

Keywords
Methodology, Spreadsheet Experiments, Experiments, Inspection, Sampling, Statistics.

1. INTRODUCTION
Despite overwhelming and unanimous evidence that spreadsheet errors are widespread and material, companies have continued to ignore spreadsheet error risks. In the past, this Grand Paradox has been attributed to overconfidence. Human beings are overconfident in most things, from driving skills to their ability to create large error-free spreadsheets. In one of the earliest spreadsheet experiments, Brown and Gould [1] noted that developers were extremely confident in their spreadsheets' accuracy, although every participant made at least one undetected error during the development process. Later experimenters also remarked on overconfidence.

Panko [10] conducted an experiment to see if feedback would reduce overconfidence, as has been the case in some general overconfidence studies. The study found a statistically significant reduction in confidence and error rates, but the error rate reduction was minimal. Goo [3] performed another experiment to see if feedback could reduce overconfidence and errors. There was some reduction in overconfidence but no statistically significant reduction in errors.

2. RISK BLINDNESS IN BEHAVIORAL STUDIES
This paper introduces other possible approaches for understanding the Grand Paradox. It focuses on risk blindness, which is our unawareness of errors when they occur.

Naatanen and Summala [9] first articulated the idea that humans are largely blind to risks. Expanding on this idea, Howarth [5] studied drivers who approached children wanting to cross at an intersection. Fewer than 10% of drivers took action, and those actions would have come too late if the children had started crossing the street. Svenson [14] studied drivers approaching blind bends in a road. Unfamiliar drivers slowed down. Familiar drivers did not, approaching at speeds that would have made accident avoidance impossible.

Fuller [2] suggested that risk blindness in experienced people stems from something like operant conditioning. If we speed in a dangerous area, we get to our destination faster. This positive feedback reinforces risky speeding behavior. In spreadsheet development, developers who do not do comprehensive error checking finish faster and avoid onerous testing work. In contrast, negative reinforcement in the form of accidents is uncertain and rare.

Even near misses may reinforce risky behavior rather than reduce it. In a simulation study of ship handling, Habberley, Shaddick, and Taylor [4] observed that skilled watch officers consistently came hazardously close to other vessels. In addition, when risky behavior required error-avoiding actions, watch officers experienced a gain in confidence in their "skills" because they had successfully avoided accidents. Similarly, in spreadsheet development, if we catch some errors as we work, we may believe that we are skilled in catching errors and so have no need for formal post-development testing.

Another possible explanation comes from modern cognitive/neuroscience. Although we see comparatively little of what is in front of us well and pay attention to much less, our brain's constructed reality (CR) gives us the illusion that we see what is in front of us clearly [11]. To cope with limited cognitive processing power, the CR construction process edits out anything irrelevant to the constructed vision. Part of this editing is not making us aware of the many errors we make [11]. Error editing makes sense for optimal performance, but it means that humans have very poor intuition about their error rates and their ability to avoid errors [11]. For the CR process this is an acceptable tradeoff, but it makes us confident that what we are doing works well.

Another explanation from cognitive/neuroscience is System 1 thinking, which has been discussed in depth by Kahneman [7]. System 1 thinking uses parallel processing to generate conclusions. It is fast and easy, but its workings are opaque. If we are walking down a street and a dog on a leash snaps at us, we jump. This is fast, or System 1, thinking. It is very effective and dominates nearly all of our actions, but it has drawbacks. First, it gives no indication that it may be wrong. Unless we actively turn on slow System 2 thinking, which we cannot do all the time, we will accept System 1 suggestions uncritically. One problem with doing so is that System 1 thinking, when faced with an impossible or at least very difficult task, may solve a simpler task and make a decision on that basis. For instance, if you are told that a bat and ball cost a dollar and ten cents and that the bat costs a dollar more than the ball, a typical System 1 response is that the ball costs ten cents. This is wrong, of course (the ball costs five cents, because $0.05 + $1.05 = $1.10), but System 1 thinking tends to solve the simpler problem, $1.10 - $1.00. If we do not force ourselves to engage in slow and odious System 2 thinking, we are likely to accept the System 1 solution to the alternative problem.

This may be why, when developers are asked whether a spreadsheet they have just completed has errors, they quickly say no, on the basis of something other than reasoned risk. Reithel, Nichols, and Robinson [13] had participants look at a small poorly formatted spreadsheet, a small nicely formatted spreadsheet, a large poorly formatted spreadsheet, and a large nicely formatted spreadsheet. Participants rated their confidence in the four spreadsheets. Confidence was modest for three of the four spreadsheets. It was much higher for the large well-formatted spreadsheet. Logically, this does not make sense. Larger spreadsheets are more likely to have errors than smaller spreadsheets. This sounds like System 1 alternative problem solving.

3. CONCLUSION
If we are to address the Grand Paradox successfully and convince organizations and individuals that they need to create spreadsheets more carefully, we must understand its causes so that we can be persuasive. Beyond that, we must address the Spreadsheet Software Engineering Paradox: computer scientists and information systems researchers have focused on the spreadsheet creation aspects of software engineering, largely ignoring the importance and complexity of testing after the development of modules, functional units, and complete spreadsheets. In software engineering, it is accepted that reducing errors during development is good but never gets close to success. Commercial software developers spend 30% to 50% of their development resources on testing [6,8], and this does not count rework costs after errors are found. Yet spreadsheet engineering discussions typically downplay or completely ignore this five-ton elephant in the room. It may be that spreadsheets are simply newer than software development, but spreadsheets have been in use for a generation, and strong evidence of error risks has been around almost that long.

We have only looked at the situation at the individual level. Testing must be accepted by groups and even corporations. Even at the group level, this paper has not explored such theories as the diffusion of innovations. If spreadsheet testing is mandated, that will reduce risks. However, user developers must have the freedom to explore their problem spaces by modifying their spreadsheets as their understanding grows. Testing methods must reflect the real process of software development.

4. REFERENCES
[1] Brown, P. S. and Gould, J. D. 1987. An experimental study of people creating spreadsheets. ACM Transactions on Office Information Systems. 5, 3 (Nov. 1987), 258-272.

[2] Fuller, R. 1990. Learning to make errors: evidence from a driving simulation. Ergonomics, 33, 10/11 (Oct/Nov. 1990), 1241-1250.

[3] Goo, Justin M. W. 2002. The effect of feedback on confidence calibration in spreadsheet development. Doctoral Dissertation, University of Hawaii.

[4] Habberley, J. S., Shaddick, C. A., and Taylor, D. H. 1986. A behavioural study of the collision avoidance task in bridge watchkeeping. College of Marine Studies, Southampton, England. Cited in Reason, 1990.

[5] Howarth, C. I. 1990. The relationship between objective risk, subjective risk, and behavior. Ergonomics, 31, 527-535. Cited in Wagenaar & Reason, 1990.

[6] Jones, T. C. 1998. Estimating software costs. McGraw-Hill, New York, NY.

[7] Kahneman, D. 2011. Thinking, fast and slow. Farrar, Straus and Giroux, New York, NY.

[8] Kimberland, K. 2004. Microsoft's pilot of TSP yields dramatic results. news@sei, No. 2. http://www.sei.cmu.edu/news-at-sei/.

[9] Naatanen, R. and Summala, H. 1976. Road user behavior and traffic accidents. North-Holland, Amsterdam. Cited in Wagenaar & Reason, 1990.

[10] Panko, R. R. 2007. Two experiments in reducing overconfidence in spreadsheet development. Journal of Organizational and End User Computing, 19, 1 (January–March 2007), 1-23.

[11] Panko, R. R. 2013. The cognitive science of spreadsheet errors: Why thinking is bad. Proceedings of the 46th Hawaii International Conference on System Sciences (Maui, Hawaii, January 7-10, 2013).

[12] Reason, J. 1990. Human error. Cambridge University Press, Cambridge, England.

[13] Reithel, B. J., Nichols, D. L., and Robinson, R. K. 1996. An experimental investigation of the effects of size, format, and errors on spreadsheet reliability perception. Journal of Computer Information Systems, 54-64.

[14] Svenson, O. 1977. Risks of road transportation from a psychological perspective: A pilot study. Report 3-77, Project Risk Generation and Risk Assessment in a Social Perspective, Committee for Future-Oriented Research, Stockholm, Sweden, 1977. Cited in Fuller, 1990.

[15] Wagenaar, W. A. and Reason, J. T. 1990. Types and tokens in road accident causation. Ergonomics, 33, 10/11 (Nov. 1990), 1365-1375.