Diagnosis at Scale: Detecting the Expertise Level and
Knowledge States of Lifelong Professional Learners
Oluwabukola Mayowa Ishola, Gord McCalla
Department of Computer Science
University of Saskatchewan, Saskatoon, Canada
bukola.ishola@usask.ca, mccalla@cs.usask.ca

ABSTRACT
Our research is about identifying gaps in the knowledge of professional software developers, as part of an ongoing project to provide tools to support their lifelong learning needs. We developed metrics that, when applied to programmers’ online activities in Stack Overflow, allowed us to determine the knowledge states of users on specific topics, indicating what each user knows they know and their knowledge “gaps”, both what they know they don’t know and what they don’t know they don’t know. Further, we were able to find patterns showing that at all levels of expertise there are still “unknown unknowns”, and these are particularly dangerous since the software professional is unaware of their weaknesses in these areas.
KEYWORDS: knowledge states, diagnosis, lifelong learning

1. INTRODUCTION
Advanced learning technology research has begun to take on a complex challenge: supporting lifelong learning [1]. Professional learning is an important subset of lifelong learning that is (at least somewhat) more tractable than the full lifelong learning challenge. Professional lifelong learning is an ever more critical issue as the rate at which knowledge is generated in almost every professional discipline continues to accelerate. Of course, professionals will evolve and develop their skills in the day-to-day practice of their profession, but workplace skills are not exactly the same as professional development because these skills are specific to their job role or even their particular workplace [2]. Professionals can be so overwhelmed with work responsibilities that they are ignorant of important new knowledge that exists.

Our goal in this research is to be able to diagnose the expertise of software professionals. We turned to a categorization of knowledge made by several different people [3,4]. In this categorization, knowledge can be divided into 4 knowledge states: the things we know we know, the “known knowns” (KK); the things we know we don’t know, the “known unknowns” (KU); the things we are not aware we know but we do know, the “unknown knowns” (UK); and, lastly, the things we don’t know we don’t know, the “unknown unknowns” (UU) [4]. The known unknowns and the unknown unknowns we collectively call the “gaps” in a person’s knowledge, and the most worrisome of these are the unknown unknowns, since a person is ignorant of their own ignorance.

2. DIAGNOSIS OF EXPERTISE
The experimental test bed for our research is the well-known online programmers’ forum called Stack Overflow (SO). We wanted to look for patterns in SO posts that allowed us to diagnose the expertise of the SO users. Posts were grouped under their related tags, and tags were mapped into appropriate knowledge areas as represented by leaf nodes in the hierarchy shown in Figure 1 (akin to that in [8]). Only 5 tags were used in this study: java, python, cplusplus, mysql, and sql. The choice of tags used in modelling knowledge was restricted so as to have enough data about each category. Although just 5 tags were considered, the total number of posts under each of these tags was large, ranging from a low of 238,487 posts regarding SQL to a high of 708,533 posts regarding java.

Figure 1. Hierarchical Structure Model Employed In Diagnosis of Expertise

We then determined for each user their expertise level in each of the 5 leaf areas, based on their SO reputation scores in the area. In SO, the reputation data fit a power law in which the majority of users have a low reputation score; the higher the score, the fewer the users. Using the method explored by Jiang [5], we fit a power curve to the actual SO reputation data for each area and computed Xmin and α, where Xmin represents the point where the power-law behavior begins in the dataset and α is the scaling exponent. Users below Xmin were considered to be beginners. We then divided the remainder of the users into two equal-sized chunks, the intermediate and expert users. Having diagnosed the expertise level of each user in each area, we then inferred their expertise level at the higher-level nodes. In making this inference, we took the highest level of expertise of the user on the leaf nodes beneath a non-leaf node and assigned this level to the non-leaf node (recognizing that high expertise in one sub-area transfers to the more generic category, even in the absence of direct evidence). This was done recursively, up the hierarchy.

3. DIAGNOSIS OF KNOWLEDGE STATES
Next we wanted to diagnose the knowledge states of each user. Again, we considered only the 5 basic knowledge areas. The “known knowns” were determined by looking at the distinct answers the user has given under each tag that were up-voted. The “known unknowns” were determined by looking at the tags of questions the user has asked. The “unknown unknowns” were determined by looking at the tags of questions that the user has answered where the answer was down-voted. At this stage in this work, no metric has been defined for the “unknown knowns”, i.e. the things the user knows but is not aware that they know. To determine the knowledge state of each user on each of the 5 topics represented by the tags, we simply count the number of KK, KU, and UU posts for a given tag for a given user and determine the
relative percentage of each. The highest percentage exhibited by the user is diagnosed to be their knowledge state for the topic represented by that specific tag. For instance, a user whose evidence of KK for java is 70%, KU for java is 20% and UU for java is 10% will be determined to know java, i.e. java is a known known. This process is carried out for all 5 tags, to determine the knowledge state a user exhibited for the topic represented by each tag.

4. RESULTS
In analyzing the data, we computed the average percentage of KK, KU and UU for users in various expertise classes for each of the knowledge areas represented by the 5 tags. For example, considering all users who posted in java whose competency level is ‘beginner’, the average percentage for the KK, KU and UU was computed. Aggregate results from the 5 knowledge areas (for all 3 expertise levels) are represented in Figure 2 below.

Figure 2. Aggregate Distribution over All Knowledge Areas

Figure 2 shows that as a professional’s competency level increases, the proportion of their knowledge that consists of known knowns also increases. This is true for all 5 knowledge areas. This is reasonable, since presumably one measure of a professional’s growing capability is that they come to know more (and that they know they know more). Similarly, across all 5 knowledge areas, the proportion of unknown unknowns steadily declines as expertise increases. The overall trend seems to be that the known unknowns continue to constitute about the same proportion of their knowledge when they are of intermediate capability as when they are beginners. Since their known knowns are a higher proportion of their knowledge than when they were beginners, this suggests that at the intermediate stage professionals not only come to know more, but also come to know more about what they don’t know. Reassuringly, across all knowledge areas, the proportion of known unknowns declines as a professional of intermediate capability becomes an expert. Again, this suggests that the professional has growing expertise and has acted to reduce his or her known weaknesses. Perhaps the most interesting overall lesson from this analysis is that experts still have a considerable residue of unknown unknowns. The expert himself or herself may indeed find it difficult to believe that the knowledge they have learned and practiced for years is not as comprehensive as they thought. This suggests the need for tools that will enhance the self-awareness of professionals about their knowledge states, especially their unknown unknowns.

5. DISCUSSION
The competency of professionals has been determined in the past mainly by tracking their job performance [6]. This is not sufficient to judge their overall competence in their profession, since the job (and the workplace) will likely require only a subset of the skills they need to be fully capable professionals. Moreover, Ley and Kump [7] argued that tasks performed alone are a weak measure for assessing the competency of professionals, as competency will at most be judged in comparison to fellow workers rather than with professionals in society at large.

Working in the professional programming domain, our study goes beyond these limitations in several ways. First, we define competence in terms of knowledge states, with a particular focus on what is known and unknown to the professional. Further, rather than restricting ourselves to examining job performance for evidence of capability, we look at the actual social interactions of professional programmers as they seek and receive help in a professional forum. Competency is judged in the context of other professionals who are mostly outside their own workplaces. Our approach also scales to a large number of users (we had access to the data of 888,603 active professionals). The approach also scales temporally: as a discipline evolves new knowledge over time, that knowledge will automatically filter into professional interactions, and thus the knowledge states of users on this new knowledge can be readily diagnosed (assuming that the ontology and tag-to-ontology mappings are updated).

To be sure, there is much more to be done. We need to confirm the results of this first experiment with further evidence that our diagnoses are accurate. We need to explore other competency and performance metrics that can be mined from SO data. We need to create more refined ontologies that we hope will allow tracking knowledge at a finer grain size. And, ultimately, we wish to create an open user modeling system that can reflect the diagnoses back to the professional user. We believe this approach to “diagnosis at scale” has a promising future in supporting the lifelong learning needs of professionals.

6. ACKNOWLEDGEMENTS
Thanks to the Natural Sciences and Engineering Research Council of Canada and the University of Saskatchewan for funding this research.

7. REFERENCES
[1] Kay, J., & Kummerfeld, B. (2009). Lifelong user modelling goals, issues and challenges. In Proceedings of the Lifelong User Modelling Workshop at UMAP-2009 (pp. 27-34).
[2] Bruce, C. S. (1999). Workplace experiences of information literacy. International Journal of Information Management, 19(1), 33-47.
[3] Dunning, D. (2011). The Dunning-Kruger Effect: On Being Ignorant of One's Own Ignorance. Advances in Experimental Social Psychology, 44, 247.
[4] Rumsfeld, D. (2011). Known and unknown: a memoir. Penguin.
[5] Jiang, B. (2013). Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution. The Professional Geographer, 65(3), 482-494.
[6] Ley, T., Ulbrich, A., Scheir, P., Lindstaedt, S. N., Kump, B., & Albert, D. (2008). Modeling competencies for supporting work-integrated learning in knowledge work. Journal of Knowledge Management, 12(6), 31-47.
[7] Ley, T., & Kump, B. (2013). Which User Interactions Predict Levels of Expertise in Work-Integrated Learning. In Scaling up Learning for Sustained Impact (pp. 178-190). Springer Berlin Heidelberg.
[8] Ishola, O. M., Shoewu, O., & Olatinwo, S. O. (2013). A Conceptual Design of Analytical Hierarchical Process Model to the Boko Haram Crisis in Nigeria. In Information and Knowledge Management (Vol. 3, No. 3, pp. 1-19).
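
The diagnosis steps described in Sections 2 and 3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the study. All function names, the example users and the example hierarchy are hypothetical. The cutoff xmin is taken as an input rather than fitted; the users at or above xmin are assumed to be split into halves after sorting by reputation (with the extra user, if any, going to the expert half); and ties between knowledge states are broken in the order KK, KU, UU. None of these details are specified in the text above.

```python
# Ordered from lowest to highest expertise.
LEVELS = ["beginner", "intermediate", "expert"]


def classify_expertise(reputation, xmin):
    """Assign an expertise level per user for a single tag.

    reputation: dict mapping user -> reputation score under the tag.
    xmin: start of the power-law tail (assumed given, not fitted here).
    """
    levels = {u: "beginner" for u, r in reputation.items() if r < xmin}
    # Assumption: remaining users are sorted by score and cut in half.
    tail = sorted((u for u, r in reputation.items() if r >= xmin),
                  key=lambda u: reputation[u])
    half = len(tail) // 2
    for i, user in enumerate(tail):
        levels[user] = "intermediate" if i < half else "expert"
    return levels


def roll_up(node, children, leaf_level):
    """Expertise at a node = highest level among the leaves beneath it.

    children: dict mapping a non-leaf node -> list of child nodes.
    leaf_level: dict mapping leaf node -> level for ONE user.
    Returns None when the user has no evidence under the node.
    """
    kids = children.get(node, [])
    if not kids:
        return leaf_level.get(node)
    found = [lvl for lvl in (roll_up(k, children, leaf_level) for k in kids)
             if lvl is not None]
    return max(found, key=LEVELS.index) if found else None


def knowledge_state(kk, ku, uu):
    """Diagnose one user's state on one tag from post counts.

    kk: up-voted answers, ku: questions asked, uu: down-voted answers.
    Returns (state, percentages); state is None if there is no evidence.
    """
    counts = {"KK": kk, "KU": ku, "UU": uu}
    total = sum(counts.values())
    if total == 0:
        return None, {}
    pct = {s: 100.0 * c / total for s, c in counts.items()}
    # max() keeps the first maximal key, so ties resolve as KK, KU, UU.
    return max(pct, key=pct.get), pct


if __name__ == "__main__":
    # Hypothetical users and scores, for illustration only.
    rep = {"ann": 3, "bob": 40, "cy": 90, "dee": 500, "eve": 1200}
    print(classify_expertise(rep, xmin=10))

    # Hypothetical hierarchy; the real one is the model in Figure 1.
    tree = {"root": ["languages", "databases"],
            "languages": ["java", "python"],
            "databases": ["sql"]}
    print(roll_up("root", tree, {"java": "beginner", "python": "expert"}))

    # The worked example from Section 3: 70% KK, 20% KU, 10% UU.
    print(knowledge_state(7, 2, 1))
```

Keeping the three steps as separate functions mirrors the text: expertise is diagnosed per tag from reputation, propagated up the hierarchy by taking the maximum, and the knowledge state is diagnosed independently from vote and question counts.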