<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1145/2470654.2470707</article-id>
      <title-group>
        <article-title>Personality Modeling: Potential and Pitfalls</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David N. Chin</string-name>
          <email>chin@hawaii.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Hawaiʻi at Mānoa Dept. of Information &amp; Computer Sciences</institution>
          <addr-line>1680 East West Rd, POST 317 Honolulu, HI 96822</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Personality affects how users respond to many things, so a model of personality would be very useful for user adaptation. Since assessing personality requires long questionnaires that may be impractical or can be falsified, there is a need for techniques that infer personality from other user artifacts. This research field is too new to have established best-practice research procedures. Pitfalls include overfitting the data and incorrectly comparing different classifiers or multiple regression models when testing accuracy differences for statistical significance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2. POTENTIAL</title>
      <p>The typical Big Five personality questionnaire has 50 items (IPIP),
with a shorter 20-item version (Mini-IPIP) available, and
the standard Myers-Briggs Type Indicator (MBTI) questionnaire
is a 93-item assessment with longer versions (144 and 222 items)
available. In many applications, users either will not or cannot
take the time to answer that many questions. In other
applications, users can easily cheat on personality
self-assessments, invalidating the results. This is particularly
problematic in high-stakes contexts such as job applications and
applications to medical schools.</p>
      <p>
        By using indirect indicators to infer personality, user modeling
systems can bypass the drudgery of answering many questions in
a personality assessment. For example, [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ] infers personality in
the massively multiplayer online game (MMOG) World of
Warcraft from player actions such as the ratio of dungeon-based
achievements versus all achievements and the ratio of need rolls
versus greed rolls, features of guild/character names such as the
number of negative or positive words in the name, and social
network measures such as degree centrality and frequency of
playing with different numbers of other characters.
      </p>
      <p>
        Also, if these indicators are collected from user artifacts produced
before users apply for jobs or medical school, the likelihood
of cheating can be greatly reduced. Even if the indicators are
collected as part of the application process, indirect indicators
may prove more difficult for applicants to manipulate. For example,
many researchers have found correlations between text and
personality [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ]. If an essay is included in a medical
school application, that text could be analyzed to predict
personality, and cheating on the essay to masquerade as a different
personality would likely be much more difficult than picking
different answers on a personality questionnaire. Also, the
predictions from the text analysis could be compared to results
from a personality questionnaire to catch cheaters.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. PITFALLS</title>
      <p>
        Inferring personality is still a very new research area, so
researchers have yet to establish best-practice research procedures.
The most common methodology is to use machine learning to
train a classifier or derive a multiple regression equation for each
personality trait. As with all machine learning tasks, care must be
taken to avoid overfitting the data, which typically happens
when the number of candidate training features approaches the
number of data points (users with personality profiles). Even the
largest known dataset of personality profiles, the myPersonality
Facebook dataset [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], with over 6M personality profiles, is eclipsed
by the number of possible words (from an estimated 20K for daily
newspapers to over 1M in comprehensive dictionaries), bigrams
(the number of words squared), and trigrams (the number of words cubed) in English.
As always, test datasets should be strictly segregated from
training and tuning datasets so that no test set data is ever used for
anything other than testing, including feature selection. To avoid
overfitting, the number of features should be trimmed using a
cutoff unrelated to their predictive value (e.g., information-theoretic
measures such as pointwise mutual information). For
example, features could be trimmed based purely on their
frequency in the training dataset. Although there is no commonly
agreed-upon ratio of the number of features to the sample
size (number of users), [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] found that regression equations
stabilize only after reaching a ratio of 100 users per predictor
feature.
      </p>
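      <p>As a minimal sketch of the frequency-based trimming described above, assuming each user's text has already been tokenized into a list of words, the following Python code holds out a test set first and then ranks words by how many training users use them. The function names <monospace>split_users</monospace> and <monospace>select_features_by_frequency</monospace>, the 20% test fraction, and the use of the 100-users-per-feature ratio as a hard cap on the number of features are illustrative assumptions, not procedures taken from the cited work.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
import random


def split_users(user_ids, test_fraction=0.2, seed=0):
    """Shuffle user ids and hold out a test set.

    The held-out ids must never be used for feature selection or tuning.
    Returns (train_ids, test_ids).
    """
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def select_features_by_frequency(train_docs, users_per_feature=100):
    """Keep the most frequent words, capped at len(train_docs) // users_per_feature.

    Frequency is counted once per user (document frequency) on training data
    only, so the cutoff does not depend on how well a word predicts a trait.
    """
    max_features = len(train_docs) // users_per_feature
    doc_freq = Counter()
    for tokens in train_docs:
        doc_freq.update(set(tokens))
    # Sort by descending frequency, then alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:max_features]]
```
]]></preformat>
      <p>With 240 training users and the assumed ratio of 100 users per feature, this sketch would keep only the two most frequent words; test users play no part in the ranking.</p>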
      <p>Another pitfall is how to compare classifiers and multiple
regression models. Researchers building personality classifiers
seem to have settled on binary classifiers that divide the
population evenly into high and low classes for each personality
trait according to whether scores fall above or below the mean. Occasionally,
three-class models divide the population into high/medium/low at one
standard deviation above and below the mean. Because
different datasets are usually used, classifiers and regression models
cannot be directly compared. A higher classification accuracy or
a smaller root mean squared error (RMSE) means
nothing if the two results come from different datasets, and even less if
those datasets come from totally different domains.</p>
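      <p>The two labeling schemes described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the population standard deviation is used, and the choice to label a score exactly equal to the mean as "low" is an assumption, since the text does not say how ties are handled.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean, pstdev


def binary_labels(scores):
    """Label each trait score 'high' if above the sample mean, else 'low'."""
    m = mean(scores)
    return ["high" if s > m else "low" for s in scores]


def three_class_labels(scores):
    """Label scores high/medium/low at one standard deviation from the mean."""
    m, sd = mean(scores), pstdev(scores)
    labels = []
    for s in scores:
        if s > m + sd:
            labels.append("high")
        elif s < m - sd:
            labels.append("low")
        else:
            labels.append("medium")
    return labels
```
]]></preformat>
      <p>Note that a split at the mean yields two equal-sized classes only when the trait scores are roughly symmetrically distributed, so the class balance should be checked before reporting accuracy.</p>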
      <p>
        Even when the same dataset is used, comparing two classifiers
or regression equations is still problematic. It may very
well be that using a different dataset from the same general
population would reverse the accuracy ordering. One would like
to know whether the differences in accuracy are statistically significant.
The recommended practice is to use pairwise comparisons. For
example, [8] recommend comparing only those datapoints that
one classifier got right and the other got wrong, using a
binomial test. A t-test is the wrong statistical test because it
assumes independence of the datasets for each treatment (each
classifier), which is clearly false since the classifiers are being
tested on the same test dataset. For multiple regression models,
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] also recommend pairwise comparisons of RMSE for each
datapoint. Thus, to test the statistical significance of differences in
accuracy between classifiers or regression equations, researchers
need access not only to the same datasets, but also to either the
prediction (classifiers) or RMSE (regression) for each datapoint in
the test set.
      </p>
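      <p>A minimal sketch of the pairwise binomial comparison described above, using only the Python standard library, is given below. It counts the test items that exactly one classifier got right and computes a two-sided exact binomial p-value under the null hypothesis that each such item is equally likely to favor either classifier (this is commonly known as the exact form of McNemar's test). The function names and the two-sided formulation are assumptions made for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count test items that exactly one of the two classifiers got right."""
    a_only = b_only = 0
    for truth, a, b in zip(y_true, pred_a, pred_b):
        if a == truth and b != truth:
            a_only += 1
        elif b == truth and a != truth:
            b_only += 1
    return a_only, b_only


def binomial_two_sided_p(a_only, b_only):
    """Two-sided exact binomial p-value with success probability 0.5.

    Items that both classifiers got right, or both got wrong, carry no
    information about which classifier is better and are ignored.
    """
    n = a_only + b_only
    if n == 0:
        return 1.0
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```
]]></preformat>
      <p>For example, if classifier A is right on eight test items that classifier B gets wrong, and B is never right where A is wrong, the two-sided p-value is 2/256, or about 0.008. Computing it requires the per-item predictions of both classifiers on the same test set.</p>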
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Feelders</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Verkooijen</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <article-title>On the statistical comparison of inductive learning methods</article-title>
          . In D. Fisher and H.-J. Lenz (Eds.),
          <source>Learning from Data: Artificial Intelligence and Statistics V</source>
          , pages
          <fpage>271</fpage>
          -
          <lpage>279</lpage>
          . Springer-Verlag
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Arvid</given-names>
            <surname>Karsvall</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Personality preferences in graphical interface design</article-title>
          .
          <source>In Proceedings of the second Nordic conference on Human-computer interaction (NordiCHI '02)</source>
          . ACM, New York, NY, USA,
          <fpage>217</fpage>
          -
          <lpage>218</lpage>
          . DOI=http://dx.doi.org/10.1145/572020.572049
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Kosinski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gosling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Stillwell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2015</year>
          )
          <article-title>Facebook as a Social Science Research Tool: Opportunities, Challenges, Ethical Considerations and Practical Guidelines</article-title>
          .
          <source>American Psychologist</source>
          <volume>70</volume>
          (
          <issue>6</issue>
          ).
          <fpage>543</fpage>
          -
          <lpage>556</lpage>
          . http://dx.doi.org/10.1037/a0039210
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Lievens</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ones</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dilchert</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Personality scale validities increase throughout medical school</article-title>
          .
          <source>J. Appl. Psychol.</source>
          Nov;
          <volume>94</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1514</fpage>
          -
          <lpage>35</lpage>
          . DOI=10.1037/a0016137
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Oded</given-names>
            <surname>Nov</surname>
          </string-name>
          , Ofer Arazy, Claudia López, and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Brusilovsky</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Exploring personality-targeted UI design</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Osborne</surname>
            ,
            <given-names>Jason W.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Prediction in multiple regression</article-title>
          .
          <source>Practical Assessment, Research &amp; Evaluation</source>
          ,
          <volume>7</volume>
          (
          <issue>2</issue>
          ). Retrieved May 24, 2016 from http://PAREonline.net/getvn.asp?v=7&amp;n=2
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>King</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>Linguistic styles: language use as an individual difference</article-title>
          .
          <source>Journal of personality and social psychology</source>
          ,
          <volume>77</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1296</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Sfetsos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamelos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angelis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Deligiannis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <year>2006</year>
          .
          <article-title>Investigating the impact of personality types on communication and collaboration-viability in pair programming - an empirical study</article-title>
          . In
          <source>Extreme programming and agile processes in software engineering</source>
          (pp.
          <fpage>43</fpage>
          -
          <lpage>52</lpage>
          ). Springer Berlin Heidelberg
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brdiczka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ducheneaut</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yee</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Begole</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Inferring Personality of Online Gamers by Fusing Multiple-View Predictions</article-title>
          . In
          <source>User Modeling, Adaptation, and Personalization: 20th International Conference, UMAP 2012, Montreal, Canada, July 16-20, 2012, Proceedings</source>
          , pages
          <fpage>261</fpage>
          -
          <lpage>273</lpage>
          . Springer Berlin Heidelberg
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>H. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eichstaedt</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dziurzynski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramones</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , …
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L. H.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach</article-title>
          .
          <source>PLoS ONE</source>
          ,
          <volume>8</volume>
          (
          <issue>9</issue>
          ),
          <elocation-id>e73791</elocation-id>
          . http://doi.org/10.1371/journal.pone.0073791
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>W.R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Chin</surname>
            ,
            <given-names>D.N.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Personality Profiling from Text: Introducing Part-of-Speech N-Grams</article-title>
          . In V. Dimitrova, T. Kuflik, D. Chin, F. Ricci, P. Dolog, G.-J. Houben (Eds.),
          <source>User Modeling, Adaptation, and Personalization, 22nd International Conference, UMAP 2014, Aalborg, Denmark, July 7-11, 2014, Proceedings</source>
          , pp.
          <fpage>243</fpage>
          -
          <lpage>253</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>