Diagnosis at Scale: Detecting the Expertise Level and Knowledge States of Lifelong Professional Learners

Oluwabukola Mayowa Ishola, Gord McCalla
Department of Computer Science
University of Saskatchewan, Saskatoon, Canada
bukola.ishola@usask.ca, mccalla@cs.usask.ca

ABSTRACT
Our research is about identifying gaps in the knowledge of professional software developers, as part of an ongoing project to provide tools to support their lifelong learning needs. We developed metrics that, when applied to programmers' online activities in Stack Overflow, allowed us to determine the knowledge states of users on specific topics. These metrics indicate what each user knows they know and their knowledge "gaps": both what they know they don't know and what they don't know they don't know. Further, we found patterns showing that at all levels of expertise there are still "unknown unknowns". These are particularly dangerous, since the software professional is unaware of their weaknesses in these areas.

KEYWORDS: knowledge states, diagnosis, lifelong learning

1. INTRODUCTION
Advanced learning technology research has begun to take on a complex challenge: supporting lifelong learning [1]. Professional learning is an important subset of lifelong learning that is (at least somewhat) more tractable than the full lifelong learning challenge. Professional lifelong learning is an ever more critical issue as the rate at which knowledge is generated in almost every professional discipline continues to accelerate. Of course, professionals will evolve and develop their skills in the day-to-day practice of their profession. However, workplace skills are not exactly the same as professional development, because these skills are specific to their job role or even their particular workplace [2]. Professionals can be so overwhelmed with work responsibilities that they remain unaware of important new knowledge in their field.

Our goal in this research is to be able to diagnose the expertise of software professionals. We turned to a categorization of knowledge made by several different people [3,4]. In this categorization, knowledge can be divided into 4 knowledge states:
- the things we know we know, the "known knowns" (KK);
- the things we know we don't know, the "known unknowns" (KU);
- the things we are not aware we know but do know, the "unknown knowns" (UK);
- and, lastly, the things we don't know we don't know, the "unknown unknowns" (UU) [4].

We collectively call the known unknowns and the unknown unknowns the "gaps" in a person's knowledge. The most worrisome of these are the unknown unknowns, since the person is ignorant of their own ignorance.

2. DIAGNOSIS OF EXPERTISE
The experimental test bed for our research is the well-known online programmers' forum Stack Overflow (SO). We wanted to look for patterns in SO posts that would allow us to diagnose the expertise of SO users. Posts were grouped under their related tags. Tags were then mapped into appropriate knowledge areas, represented by leaf nodes in the hierarchy shown in Figure 1 (akin to that in [8]).

Only 5 tags were used in this study: java, python, cplusplus, mysql, and sql. We restricted the choice of tags used in modelling knowledge so as to have enough data about each category. Although just 5 tags were considered, the total number of posts under each of these tags was large. It ranged from a low of 238,487 posts regarding SQL to a high of 708,533 posts regarding java.

Figure 1. Hierarchical Structure Model Employed in Diagnosis of Expertise

We then determined each user's expertise level in each of the 5 leaf areas, based on their SO reputation scores in that area. In SO, the reputation data fits a power law: the majority of users have a low reputation score, and the higher the score, the fewer the users. Using the method explored by Jiang [5], we fit a power curve to the actual SO reputation data for each area and computed Xmin and α. Xmin is the reputation score at which power-law behavior begins in the dataset, and α is the scaling exponent of the power law.

Users below Xmin were considered to be beginners. We then divided the remaining users into two equal-sized groups: the intermediate users (the lower-scoring half) and the expert users (the higher-scoring half).

Having diagnosed the expertise level of each user in each area, we then inferred their expertise level at the higher-level nodes. For each non-leaf node, we took the user's highest level of expertise on the leaf nodes beneath it and assigned that level to the non-leaf node. This recognizes that high expertise in one sub-area transfers to the more generic category, even in the absence of direct evidence. This was done recursively, up the hierarchy.

3. DIAGNOSIS OF KNOWLEDGE STATES
Next we wanted to diagnose the knowledge states of each user. Again, we considered only the 5 basic knowledge areas. Each state was identified as follows:
- The "known knowns" were determined by looking at the distinct answers the user has given under each tag that were up-voted.
- The "known unknowns" were determined by looking at the tags of questions the user has asked.
- The "unknown unknowns" were determined by looking at the tags of questions the user has answered where the answer was down-voted.

At this stage of the work, no metric has been defined for the "unknown knowns", i.e., the things the user knows but is not aware that they know.

To determine each user's knowledge state on each of the 5 topics, we simply count the number of KK, KU, and UU posts for a given tag for a given user. We then determine the relative percentage of each. The state with the highest percentage is diagnosed as the user's knowledge state for the topic represented by that tag. For instance, suppose a user's evidence for java is 70% KK, 20% KU and 10% UU. That user will be determined to know java, i.e., java is a known known for that user. This process is carried out for all 5 tags.

4. RESULTS
In analyzing the data, we computed the average percentage of KK, KU and UU for users in each expertise class, for each of the knowledge areas represented by the 5 tags. For example, we took all users who posted under java and whose competency level in java is "beginner", and computed their average KK, KU and UU percentages. Aggregate results from the 5 knowledge areas (for all 3 expertise levels) are shown in Figure 2.

Figure 2. Aggregate Distribution over All Knowledge Areas

Figure 2 shows that as a professional's competency level increases, the proportion of their knowledge that consists of known knowns also increases. This is true for all 5 knowledge areas. This is reasonable, since presumably one measure of a professional's growing capability is that they come to know more (and that they know they know more). Similarly, across all 5 knowledge areas, the proportion of unknown unknowns steadily declines as expertise increases.

The known unknowns seem to make up about the same proportion of a professional's knowledge at the intermediate level as at the beginner level. Meanwhile, their known knowns are a higher proportion of their knowledge than when they were beginners. This suggests that at the intermediate stage professionals not only come to know more, but also come to know more about what they don't know.

Reassuringly, across all knowledge areas, the proportion of known unknowns declines as a professional of intermediate capability becomes an expert. Again, this suggests that the professional has growing expertise and has acted to reduce his or her known weaknesses.

Perhaps the most interesting overall lesson from this analysis is that experts still have a considerable residue of unknown unknowns. Experts themselves may indeed find it difficult to believe that the knowledge they have learned and practiced for years is not as comprehensive as they thought. This suggests the need for tools that will enhance the self-awareness of professionals about their knowledge states, especially their unknown unknowns.

5. DISCUSSION
In the past, the competency of professionals has been determined mainly by tracking their job performance [6]. This is not sufficient to judge their overall competence in their profession. The job (and the workplace) will likely require only a subset of the skills they need to be fully capable professionals. Moreover, Ley and Kump [7] argued that tasks performed alone are a weak measure for assessing the competency of professionals. Competency can then at most be judged in comparison to fellow workers, rather than to professionals in society at large.

Working in the professional programming domain, our study goes beyond these limitations in several ways:
- We define competence in terms of knowledge states, with a particular focus on what is known and unknown to the professional.
- Rather than restricting ourselves to examining job performance for evidence of capability, we look at the actual social interactions of professional programmers as they seek and receive help in a professional forum. Competency is judged in the context of other professionals who are mostly outside their own workplaces.
- Our approach scales to a large number of users (we had access to the data of 888,603 active professionals).
- The approach also scales temporally. As a discipline evolves new knowledge over time, that knowledge will automatically filter into professional interactions. The knowledge states of users on this new knowledge can then be readily diagnosed (assuming that the ontology and tag-to-ontology mappings are updated).

To be sure, there is much more to be done:
- We need to confirm the results of this first experiment with further evidence that our diagnoses are accurate.
- We need to explore other competency and performance metrics that can be mined from SO data.
- We need to create more refined ontologies that we hope will allow tracking knowledge at a finer grain size.
- Ultimately, we wish to create an open user modeling system that can reflect the diagnoses back to the professional user.

We believe this approach to "diagnosis at scale" has a promising future in supporting the lifelong learning needs of professionals.

6. ACKNOWLEDGEMENTS
Thanks to the Natural Sciences and Engineering Research Council of Canada and the University of Saskatchewan for funding this research.

7. REFERENCES
[1] Kay, J., & Kummerfeld, B. (2009). Lifelong user modelling goals, issues and challenges. In Proceedings of the Lifelong User Modelling Workshop at UMAP 2009 (pp. 27-34).
[2] Bruce, C. S. (1999). Workplace experiences of information literacy. International Journal of Information Management, 19(1), 33-47.
[3] Dunning, D. (2011). The Dunning-Kruger effect: On being ignorant of one's own ignorance. Advances in Experimental Social Psychology, 44, 247.
[4] Rumsfeld, D. (2011). Known and unknown: A memoir. Penguin.
[5] Jiang, B. (2013). Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution. The Professional Geographer, 65(3), 482-494.
[6] Ley, T., Ulbrich, A., Scheir, P., Lindstaedt, S. N., Kump, B., & Albert, D. (2008). Modeling competencies for supporting work-integrated learning in knowledge work. Journal of Knowledge Management, 12(6), 31-47.
[7] Ley, T., & Kump, B. (2013). Which user interactions predict levels of expertise in work-integrated learning? In Scaling up Learning for Sustained Impact (pp. 178-190). Springer Berlin Heidelberg.
[8] Ishola, O. M., Shoewu, O., & Olatinwo, S. O. (2013). A conceptual design of analytical hierarchical process model to the Boko Haram crisis in Nigeria. In Information and Knowledge Management (Vol. 3, No. 3, pp. 1-19).
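The expertise classification in Section 2 works in three steps. A power law is fit to an area's reputation scores, users below Xmin are labelled beginners, and the rest are split in half. This can be sketched in Python as below. It is not the authors' code. The paper cites Jiang [5] for its fitting method, whereas this sketch substitutes a different technique: the Kolmogorov-Smirnov minimisation estimator of Clauset, Shalizi and Newman (2009) for continuous data. The names `fit_power_law` and `classify_expertise` and the `min_tail` cutoff are hypothetical.

```python
import math
from bisect import bisect_left
from typing import Dict, List, Tuple


def fit_power_law(scores: List[float], min_tail: int = 10) -> Tuple[float, float]:
    """Estimate (xmin, alpha) for a continuous power law.

    Stand-in technique (assumption): the KS-minimisation estimator of
    Clauset, Shalizi and Newman (2009), not necessarily the paper's procedure.
    """
    xs = sorted(s for s in scores if s > 0)
    best = None  # (ks_distance, xmin, alpha)
    for xmin in sorted(set(xs)):
        tail = xs[bisect_left(xs, xmin):]  # all scores >= xmin
        n = len(tail)
        if n < min_tail:
            break  # tails only shrink as xmin grows
        log_sum = sum(math.log(x / xmin) for x in tail)
        if log_sum == 0:
            continue  # every tail value equals xmin, so alpha is undefined
        alpha = 1.0 + n / log_sum  # maximum-likelihood exponent
        # One-sample KS statistic between the tail and the fitted CDF
        # F(x) = 1 - (x / xmin) ** (1 - alpha).
        ks = 0.0
        for i, x in enumerate(tail):
            cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
            ks = max(ks, abs(cdf - i / n), abs(cdf - (i + 1) / n))
        if best is None or ks < best[0]:
            best = (ks, xmin, alpha)
    if best is None:
        raise ValueError("too few positive scores to fit a power law")
    return best[1], best[2]


def classify_expertise(reputation: Dict[str, float], xmin: float) -> Dict[str, str]:
    """Beginner below xmin; remaining users split by rank into two halves."""
    levels = {user: "beginner" for user, s in reputation.items() if s < xmin}
    tail = sorted((u for u, s in reputation.items() if s >= xmin),
                  key=lambda u: reputation[u])
    cut = len(tail) // 2  # with an odd count, the extra user becomes expert
    for rank, user in enumerate(tail):
        levels[user] = "intermediate" if rank < cut else "expert"
    return levels
```

To use the sketch, call `fit_power_law` on one tag's reputation scores and pass the returned xmin to `classify_expertise`. This gives the three-way split per area. The scan over every candidate xmin is quadratic in the number of distinct scores. That is acceptable for a sketch but would need binning or sampling for hundreds of thousands of users. Splitting by rank rather than by score guarantees equal-sized halves, at the cost that users tied at the cut can land in different classes.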
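In Section 2, each non-leaf node takes the highest expertise level among the leaves beneath it, applied recursively up the hierarchy. This is a simple tree maximum, sketched below. Figure 1 is not reproduced in this text, so the interior node names in `HIERARCHY` are hypothetical; only the five leaf tags come from the paper. `propagate_levels` is likewise an illustrative name, not the authors' code.

```python
from typing import Dict, List, Optional

# Ordering of expertise levels, so that "highest" is well defined.
LEVEL_RANK = {"beginner": 0, "intermediate": 1, "expert": 2}

# Hypothetical hierarchy: interior node names are illustrative only;
# the five leaf tags (java, python, cplusplus, mysql, sql) are from the paper.
HIERARCHY: Dict[str, List[str]] = {
    "software_development": ["programming_languages", "databases"],
    "programming_languages": ["java", "python", "cplusplus"],
    "databases": ["mysql", "sql"],
}


def propagate_levels(node: str,
                     leaf_levels: Dict[str, str],
                     hierarchy: Dict[str, List[str]],
                     out: Dict[str, str]) -> Optional[str]:
    """Assign each non-leaf node the highest level found among its children.

    Recurses depth-first, so the maximum is carried all the way up the tree.
    Nodes with no evidence beneath them are left out of ``out``.
    """
    children = hierarchy.get(node, [])
    if not children:
        level = leaf_levels.get(node)  # leaf: use the diagnosed level, if any
    else:
        # A list (not a generator) so that every subtree is visited and recorded.
        child_levels = [propagate_levels(c, leaf_levels, hierarchy, out)
                        for c in children]
        known = [lvl for lvl in child_levels if lvl is not None]
        level = max(known, key=LEVEL_RANK.__getitem__) if known else None
    if level is not None:
        out[node] = level
    return level
```

For example, take a user who is a beginner in java, an expert in python and intermediate in mysql. That user would be rated expert for the hypothetical `programming_languages` node, intermediate for `databases`, and expert at the root. Leaves with no evidence (here cplusplus and sql) are simply left unrated.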
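The knowledge-state diagnosis of Section 3 and the averaging of Section 4 can be sketched together. The sketch assumes a hypothetical `Post` record carrying a net vote score. It treats a positive score as "up-voted" and a negative one as "down-voted". The paper does not say how zero-score answers or ties between states were handled, so both choices below are assumptions. All function names are illustrative.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

STATES = ("KK", "KU", "UU")
# (user, tag) -> (dominant state, percentage of each state)
StateMap = Dict[Tuple[str, str], Tuple[str, Dict[str, float]]]


@dataclass(frozen=True)
class Post:
    """Hypothetical, simplified post record; not the Stack Overflow schema."""
    user: str
    tag: str
    kind: str   # "question" or "answer"
    score: int  # net votes; using its sign as up-/down-voted is an assumption


def evidence(post: Post) -> Optional[str]:
    """Map one post to the knowledge state it is evidence for, if any."""
    if post.kind == "question":
        return "KU"  # asking under a tag signals a known unknown
    if post.kind == "answer" and post.score > 0:
        return "KK"  # up-voted answer
    if post.kind == "answer" and post.score < 0:
        return "UU"  # down-voted answer
    return None      # zero-score answers carry no evidence (assumption)


def knowledge_states(posts: Iterable[Post]) -> StateMap:
    """Per (user, tag): KK/KU/UU percentages and the dominant state."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for p in posts:
        state = evidence(p)
        if state is not None:
            counts[(p.user, p.tag)][state] += 1
    result: StateMap = {}
    for key, c in counts.items():
        total = sum(c.values())
        pct = {s: 100.0 * c[s] / total for s in STATES}
        # Ties go to the first state in STATES order (an assumption).
        result[key] = (max(STATES, key=pct.__getitem__), pct)
    return result


def average_by_level(states: StateMap,
                     levels: Dict[Tuple[str, str], str]) -> Dict[str, Dict[str, float]]:
    """Mean KK/KU/UU percentage per expertise level over (user, tag) pairs."""
    sums: Dict[str, Counter] = defaultdict(Counter)
    n: Counter = Counter()
    for key, (_, pct) in states.items():
        level = levels.get(key)  # expertise level of this user in this tag
        if level is None:
            continue
        n[level] += 1
        for s in STATES:
            sums[level][s] += pct[s]
    return {lvl: {s: sums[lvl][s] / n[lvl] for s in STATES} for lvl in n}
```

As a check, seven up-voted answers, two questions and one down-voted answer under java yield 70% KK, 20% KU and 10% UU. This reproduces the worked example in Section 3. `average_by_level` pools all (user, tag) pairs. Restricting its input to a single tag gives the per-area averages. The paper does not specify how the five areas were combined for Figure 2.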