<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Diagnosis at Scale: Detecting the Expertise Level and Knowledge States of Lifelong Professional Learners</article-title>
      </title-group>
      <contrib-group>
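        <contrib contrib-type="author">
          <name>
            <surname>Ishola</surname>
            <given-names>Oluwabukola Mayowa</given-names>
          </name>
          <xref ref-type="aff" rid="aff0" />
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>McCalla</surname>
            <given-names>Gord</given-names>
          </name>
          <xref ref-type="aff" rid="aff0" />
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Saskatchewan</institution>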
          ,
          <addr-line>Saskatoon</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Our research is about identifying gaps in the knowledge of professional software developers, as part of an ongoing project to provide tools that support their lifelong learning needs. We developed metrics that, when applied to programmers' online activities in Stack Overflow, allowed us to determine the knowledge states of users on specific topics: what each user knows they know and their knowledge “gaps”, that is, both what they know they don't know and what they don't know they don't know. Further, we found patterns showing that at all levels of expertise there are still “unknown unknowns”, which are particularly dangerous since the software professional is unaware of their weaknesses in these areas.</p>
      </abstract>
      <kwd-group>
        <kwd>knowledge states</kwd>
        <kwd>diagnosis</kwd>
        <kwd>lifelong learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Advanced learning technology research has begun to take on a
complex challenge: supporting lifelong learning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Professional
learning is an important subset of lifelong learning that is (at least
somewhat) more tractable than the full lifelong learning
challenge. Professional lifelong learning is an ever more critical
issue as the rate at which knowledge is generated in almost every
professional discipline continues to accelerate. Of course,
professionals will evolve and develop their skills in the day-to-day
practice of their profession, but workplace skills are not exactly
the same as professional development because these skills are
specific to their job role or even their particular workplace [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Professionals can be so overwhelmed with work responsibilities
that they are ignorant of important new knowledge that exists.
      </p>
      <p>
        Our goal in this research is to be able to diagnose the
expertise of software professionals. We draw on a
categorization of knowledge used by several authors
[
        <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
        ]. In this categorization, knowledge can be divided into 4
knowledge states: the things we know we know, the “known
knowns” (KK); the things we know we don’t know, the “known
unknowns” (KU); the things we are not aware we know but we
do know, the “unknown knowns” (UK); and, lastly, the things
we don’t know we don’t know, the “unknown unknowns” (UU)
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The known unknowns and the unknown unknowns we
collectively call the “gaps” in a person’s knowledge, and the
most worrisome of these are the unknown unknowns, since a
person is ignorant of their own ignorance.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. DIAGNOSIS OF EXPERTISE</title>
      <p>
        The experimental test bed for our research is the well-known
online programmers’ forum Stack Overflow (SO). We
wanted to look for patterns in SO posts that would allow us to
diagnose the expertise of SO users. Posts were grouped under
their related tags, and tags were mapped into appropriate
knowledge areas as represented by leaf nodes in the hierarchy
shown in Figure 1 (akin to that in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Only 5 tags were used in
this study: java, python, cplusplus, mysql, and sql. We restricted
the tags used in modelling knowledge so
as to have enough data about each category. Although just 5 tags
were considered, the total number of posts under each of these
tags was large, ranging from a low of 238,487 posts regarding
SQL to a high of 708,533 posts regarding java.
      </p>
      <p>
        We then determined for each user their expertise level in each of
the 5 leaf areas, based on their SO reputation scores in the area. In
SO, the reputation data fit a power law in which the majority of
users have a low reputation score; the higher the score, the fewer
the users. Using the method explored by Jiang [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we fit a
power-law curve to the actual SO reputation data for each area and
computed <italic>X</italic><sub>min</sub> and α, where <italic>X</italic><sub>min</sub> is the point at which
power-law behavior begins in the dataset and α is the power-law
exponent. Users below <italic>X</italic><sub>min</sub> were considered to be beginners. We then
divided the remaining users into two equal-sized groups, the
intermediate and expert users. Having diagnosed the expertise
level of each user in each area, we then inferred their expertise
level at the higher-level nodes. In making this inference, we took
the highest level of expertise of the user on the leaf nodes beneath
a non-leaf node and assigned this level to the non-leaf node
(recognizing that high expertise in one sub-area transfers to the
more generic category, even in the absence of direct evidence).
This was done recursively, up the hierarchy.
      </p>
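      <p>The classification and roll-up steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: <italic>X</italic><sub>min</sub> is passed in rather than estimated, α is computed with the standard continuous maximum-likelihood estimator for a power-law tail, the lower-ranked half of the remaining users is taken to be “intermediate” and the upper half “expert”, and all function names and the toy hierarchy in the demo are hypothetical.</p>
      <preformat>
```python
import math

LEVELS = ["beginner", "intermediate", "expert"]  # ordered from low to high


def estimate_alpha(scores, x_min):
    """Continuous maximum-likelihood estimate of the power-law exponent.

    Assumption: x_min is already known; only scores >= x_min are used.
    """
    tail = [s for s in scores if s >= x_min]
    log_sum = sum(math.log(s / x_min) for s in tail)
    if log_sum == 0:
        raise ValueError("need at least one score above x_min")
    return 1.0 + len(tail) / log_sum


def classify_users(reputation, x_min):
    """Map each user to an expertise level for one knowledge area.

    Users below x_min are beginners; the rest are split into two groups of
    (nearly) equal size by reputation rank. Assumption: the lower-ranked
    group is 'intermediate' and the higher-ranked group is 'expert'.
    """
    levels = {}
    rest = []
    for user, score in reputation.items():
        if score >= x_min:
            rest.append(user)
        else:
            levels[user] = "beginner"
    rest.sort(key=lambda u: reputation[u])
    half = len(rest) // 2
    for rank, user in enumerate(rest):
        levels[user] = "expert" if rank >= half else "intermediate"
    return levels


def roll_up(hierarchy, node, leaf_levels):
    """Return the level of a node: its own level if it is a leaf, otherwise
    the highest level found among its children (applied recursively)."""
    children = hierarchy.get(node, [])
    if not children:
        return leaf_levels.get(node)  # None when there is no evidence
    found = [roll_up(hierarchy, child, leaf_levels) for child in children]
    found = [lvl for lvl in found if lvl is not None]
    return max(found, key=LEVELS.index) if found else None


if __name__ == "__main__":
    # Invented reputation scores for one tag; not data from the study.
    reputation = {"ann": 3, "bob": 40, "cat": 900, "dan": 15, "eve": 120}
    print(classify_users(reputation, x_min=10))

    # Hypothetical two-level hierarchy over some of the tags.
    hierarchy = {
        "root": ["programming", "databases"],
        "programming": ["java", "python"],
        "databases": ["sql"],
    }
    one_user = {"java": "beginner", "python": "expert", "sql": "intermediate"}
    print(roll_up(hierarchy, "root", one_user))
```
      </preformat>
      <p>Taking the maximum over child nodes mirrors the rule stated above that the highest leaf-level expertise is assigned to the parent; a node with no evidence beneath it is left undiagnosed in this sketch.</p>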
    </sec>
    <sec id="sec-3">
      <title>3. DIAGNOSIS OF KNOWLEDGE STATES</title>
      <p>Next, we wanted to diagnose the knowledge states of each user.
Again, we considered only the 5 basic knowledge areas. The
“known knowns” were determined by looking at the distinct
answers the user has given under each tag that were up-voted. The
“known unknowns” were determined by looking at the tags of
questions the user has asked. The “unknown unknowns” were
determined by looking at the tags of questions that the user has
answered where the answer was down-voted. At this stage of the
work, no metric has been defined for the “unknown knowns”, i.e.,
the things the user knows but is not aware that they know. To
determine the knowledge state of each user on each of the 5 topics
represented by the tags, we simply count the number of KK, KU,
and UU posts for a given tag for a given user and determine the
relative percentage of each. The state with the highest percentage
is diagnosed to be the user’s knowledge state for the topic
represented by that specific tag. For instance, a user whose
evidence of KK for java is 70%, KU for java is 20%, and UU for
java is 10% will be determined to know java, i.e., java is a known
known. This process is carried out for each of the 5 tags to determine the
knowledge state the user exhibits for the topic represented by that
tag.</p>
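      <p>The counting rule above can be sketched in a few lines. This is an illustrative sketch rather than the study's code: the post record layout (<monospace>user</monospace>, <monospace>tag</monospace>, <monospace>kind</monospace>, <monospace>score</monospace>) is hypothetical, “up-voted” and “down-voted” are approximated by the sign of an answer's net score, and ties between states are broken in the order KK, KU, UU, which the text does not specify.</p>
      <preformat>
```python
from collections import Counter

STATES = ("KK", "KU", "UU")


def count_evidence(posts):
    """Tally KK/KU/UU evidence per (user, tag) from simplified post records.

    Each record is a hypothetical dict with keys 'user', 'tag', 'kind'
    ('question' or 'answer') and 'score' (net votes on the post).
    """
    tallies = {}
    for p in posts:
        if p["kind"] == "question":
            state = "KU"  # asking a question signals a known unknown
        elif p["score"] > 0:
            state = "KK"  # up-voted answer
        elif p["score"] == 0:
            continue  # answers with no net votes give no evidence here
        else:
            state = "UU"  # down-voted answer
        tallies.setdefault((p["user"], p["tag"]), Counter())[state] += 1
    return tallies


def state_percentages(counts):
    """Convert raw KK/KU/UU counts for one user and tag to percentages."""
    total = sum(counts.get(s, 0) for s in STATES)
    if total == 0:
        return None  # no evidence for this user on this tag
    return {s: 100.0 * counts.get(s, 0) / total for s in STATES}


def diagnose(counts):
    """Return the state with the highest percentage, or None without evidence.

    Assumption: ties are broken in the order KK, KU, UU.
    """
    shares = state_percentages(counts)
    if shares is None:
        return None
    return max(STATES, key=lambda s: shares[s])


if __name__ == "__main__":
    # The worked example from the text: 70% KK, 20% KU, 10% UU for java.
    print(diagnose({"KK": 7, "KU": 2, "UU": 1}))
```
      </preformat>
      <p>Because the diagnosis is a simple arg-max over three percentages, a user with little activity on a tag can flip state with a single new post; a minimum-evidence threshold would be one possible refinement.</p>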
    </sec>
    <sec id="sec-4">
      <title>4. RESULTS</title>
      <p>In analyzing the data, we computed the average percentage of KK,
KU and UU for users in various expertise classes for each of the
knowledge areas represented by the 5 tags. For example,
considering all users who posted in java whose competency level
is ‘beginner’, the average percentage for the KK, KU and UU was
computed. Aggregate results from the 5 knowledge areas (for all 3
expertise levels) are shown in Figure 2.
Figure 2 shows that as a professional’s competency level
increases, the proportion of their knowledge that consists of
known knowns also increases. This is true for all 5 knowledge
areas. This is reasonable, since presumably one measure of a
professional’s growing capability is that they come to know more
(and that they know they know more). Similarly, across all 5
knowledge areas, the proportion of unknown unknowns steadily
declines as expertise increases. The overall trend seems to be that
the known unknowns continue to constitute about the same
proportion of their knowledge when they are of intermediate
capability as when they are beginners. Since their known knowns
are a higher proportion of their knowledge than when they were
beginners, this suggests that at the intermediate stage
professionals not only come to know more, but also come to know
more about what they don’t know. Reassuringly, across all
knowledge areas, the proportion of known unknowns declines as a
professional of intermediate capability becomes an expert. Again,
this suggests that the professional has growing expertise and has
acted to reduce their known weaknesses. Perhaps the most
interesting overall lesson from this analysis is that experts still
have a considerable residue of unknown unknowns. Experts
themselves may indeed find it difficult to believe that the
knowledge they have learned and practiced for years is not as
comprehensive as they thought. This suggests the need for tools
that will enhance the self-awareness of professionals about their
knowledge states, especially their unknown unknowns.</p>
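      <p>The aggregation behind these averages can be sketched as follows. The record layout and the function name are hypothetical, and the numbers in the demo are invented toy values, not results from the study.</p>
      <preformat>
```python
from collections import defaultdict

STATES = ("KK", "KU", "UU")


def average_shares(records):
    """Average KK/KU/UU percentages over users, grouped by (tag, level).

    Each record is a hypothetical tuple (tag, level, shares), where shares
    maps each state to one user's percentage for that tag.
    """
    sums = defaultdict(lambda: dict.fromkeys(STATES, 0.0))
    sizes = defaultdict(int)
    for tag, level, shares in records:
        key = (tag, level)
        sizes[key] += 1
        for state in STATES:
            sums[key][state] += shares[state]
    return {
        key: {state: sums[key][state] / sizes[key] for state in STATES}
        for key in sums
    }


if __name__ == "__main__":
    # Invented toy values for three users who posted under the java tag.
    records = [
        ("java", "beginner", {"KK": 20, "KU": 50, "UU": 30}),
        ("java", "beginner", {"KK": 40, "KU": 30, "UU": 30}),
        ("java", "expert", {"KK": 80, "KU": 10, "UU": 10}),
    ]
    for key, means in average_shares(records).items():
        print(key, means)
```
      </preformat>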
    </sec>
    <sec id="sec-5">
      <title>5. DISCUSSION</title>
      <p>
        The competency of professionals has been determined in the past
mainly by tracking their job performance [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This is not sufficient
to judge their overall competence in their profession since the job
(and the workplace) will likely require only a subset of the skills
they need to be fully capable professionals. Moreover, Ley and
Kump [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] argued that tasks performed alone are a weak measure for
assessing the competency of professionals, as competency will at
most be judged in comparison with fellow workers rather than with
professionals in society at large.
      </p>
      <p>Working in the professional programming domain, our study
goes beyond these limitations in several ways. First, we define
competence in terms of knowledge states with a particular focus
on what is known and unknown to the professional. Further, rather
than restricting ourselves to examining job performance for
evidence of capability, we look at the actual social interactions of
professional programmers as they seek and receive help in a
professional forum. Competency is judged in the context of other
professionals who are mostly outside their own workplaces. Our
approach also scales to a large number of users (we had access to
the data of 888,603 active professionals). The approach also
scales temporally: as a discipline develops new knowledge over
time, that knowledge will automatically filter into professional
interactions, and thus the knowledge states of users on this new
knowledge can be readily diagnosed (assuming that the ontology
and tag-to-ontology mappings are updated).</p>
      <p>To be sure, there is much more to be done. We need to confirm
the results of this first experiment with further evidence that our
diagnoses are accurate. We need to explore other competency and
performance metrics that can be mined from SO data. We need to
create more refined ontologies that we hope will allow tracking
knowledge at a finer grain size. And, ultimately, we wish to create
an open user modeling system that can reflect the diagnoses back
to the professional user. We believe this approach to “diagnosis at
scale” has a promising future in supporting the lifelong
learning needs of professionals.</p>
    </sec>
    <sec id="sec-6">
      <title>6. ACKNOWLEDGEMENTS</title>
      <p>Thanks to the Natural Sciences and Engineering Research Council
of Canada and the University of Saskatchewan for funding this research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Kay</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kummerfeld</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Lifelong user modelling goals, issues and challenges</article-title>
          .
          <source>In Proceedings of the Lifelong User Modelling Workshop at UMAP-2009</source>
          (pp.
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bruce</surname>
            ,
            <given-names>C. S.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Workplace experiences of information literacy</article-title>
          .
          <source>International journal of information management</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ),
          <fpage>33</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Dunning</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>The Dunning-Kruger Effect: On Being Ignorant of One's Own Ignorance</article-title>
          .
          <source>Advances in experimental social psychology</source>
          ,
          <volume>44</volume>
          ,
          <fpage>247</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Rumsfeld</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>Known and unknown: a memoir</source>
          .
          <publisher-name>Penguin</publisher-name>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution</article-title>
          .
          <source>The Professional Geographer</source>
          ,
          <volume>65</volume>
          (
          <issue>3</issue>
          ),
          <fpage>482</fpage>
          -
          <lpage>494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Ley</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ulbrich</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheir</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lindstaedt</surname>
            ,
            <given-names>S. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kump</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Albert</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Modeling competencies for supporting work-integrated learning in knowledge work</article-title>
          .
          <source>Journal of Knowledge Management</source>
          ,
          <volume>12</volume>
          (
          <issue>6</issue>
          ),
          <fpage>31</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ley</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kump</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Which User Interactions Predict Levels of Expertise in Work-Integrated Learning</article-title>
          .
          <source>In Scaling up Learning for Sustained Impact</source>
          (pp.
          <fpage>178</fpage>
          -
          <lpage>190</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Ishola</surname>
            ,
            <given-names>O. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shoewu</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Olatinwo</surname>
            ,
            <given-names>S. O.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>A Conceptual Design of Analytical Hierarchical Process Model to the Boko Haram Crisis in Nigeria</article-title>
          .
          <source>In Information and Knowledge Management</source>
          (Vol.
          <volume>3</volume>
          , No.
          <issue>3</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>