CEUR-WS Vol-1137: Automated Cognitive Presence Detection in Online Discussion Transcripts
PDF: https://ceur-ws.org/Vol-1137/LA_machinelearning_submission_1.pdf
DBLP: https://dblp.org/rec/conf/lak/KovanovicJGH14
Automated Cognitive Presence Detection in Online Discussion Transcripts

Vitomir Kovanovic, Simon Fraser University, Vancouver, BC, Canada (vitomir_kovanovic@sfu.ca)
Srecko Joksimovic, Simon Fraser University, Vancouver, BC, Canada (sjoksimo@sfu.ca)
Dragan Gasevic, Athabasca University, Edmonton, AB, Canada (dgasevic@acm.org)
Marek Hatala, Simon Fraser University, Vancouver, BC, Canada (mhatala@sfu.ca)

ABSTRACT
In this paper we present the results of an exploratory study that examined the use of text mining and text classification for automating the content analysis of discussion transcripts in the context of distance education. We used the Community of Inquiry (CoI) framework and focused on the content analysis of the cognitive presence construct, given its central position within the CoI model. Our results demonstrate the potential of the proposed approach: the developed classifier achieved 58.4% accuracy and Cohen's kappa of 0.41 on the 5-category classification task. We analyze different classification features and describe the main problems and lessons learned from the development of such a system. Furthermore, we evaluated several novel classification features based on the specifics of the cognitive presence construct, and our results indicate that some of them significantly improve classification accuracy.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
LAK '14, March 24-28, 2014, Indianapolis, IN, USA
Copyright 2014 ACM 978-1-4503-2664-3/14/03 ...$15.00.

1.    INTRODUCTION
One of the important aspects of modern distance education is the focus on the social construction of knowledge by means of asynchronous discussion groups [2]. Their increased use in distance education has produced an abundance of records of the learning process [7]. Educational researchers recognized the importance of this "gold-mine of information" [14] about the learning process, but used it mainly for research, usually long after the courses were over. Nowadays, there is a need to analyze this learner-generated data in an automatic and continuous fashion in order to inform instructors and students about current student performance and possible learning outcomes. Learning Analytics, an emerging research field that aims to make sense of large volumes of educational data in order to understand and improve learning [21], is a promising area of research that could be used to analyze and understand discussion transcript logs in their full complexity. However, at the moment the majority of approaches to the analysis of discussion transcripts are not grounded in established theories of educational research, and focus mostly on the quantitative aspects of trace and log data. Given the need to assess the qualitative aspects of learning products, this is not enough. To address this issue, we base our transcript analysis approach on the well-established Community of Inquiry (CoI) model of distance education [10, 11], which has been used for more than a decade to answer this type of question.

In this paper we present the results of a study that focused on automating the content analysis of discussion transcripts using the Community of Inquiry coding technique. We developed an SVM-based classifier for automatic classification of discussion transcripts in accordance with the CoI framework, and we discuss in detail the challenges and issues of this type of text classification, most notably the creation of relevant classification features.

2.    BACKGROUND WORK
We based our work on the theoretical foundations of the Community of Inquiry framework and on previous work in the field of text classification. In this section we present an overview of the Community of Inquiry framework and the relevant findings in the text classification field that informed our approach.

2.1    Community of Inquiry (CoI) Framework
Among the different techniques for assessing the quality of distance education environments, one of the best-researched models that comprehensively explains the different dimensions of social learning is the Community of Inquiry (CoI) model [10, 11]. The model consists of three interdependent constructs that together provide comprehensive coverage of the distance learning phenomenon [10, 11]: i) Social presence describes relationships and the social climate in a learning community [10], ii) Cognitive presence describes the different phases of students' cognitive engagement and knowledge construction [11], and iii) Teaching presence explains the instructor's role in course planning and execution [10].

For our study the most important construct is Cognitive Presence, which is defined as "an extent to which the participants in any particular configuration of a community of inquiry are able to construct meaning through sustained communication" [10, p. 89]. The model defines four different phases of cognitive presence:

   1. Triggering event: In this phase an issue, dilemma, or problem is identified. In formal educational contexts these are often explicitly defined by the instructors, but they can also be created by any student who participates in the discussions [11].

   2. Exploration: In this phase students move between their private reflective world and the shared world where social construction of knowledge happens [11].

   3. Integration: This phase is characterized by the synthesis of the ideas generated in the exploration phase and, ultimately, the construction of meaning.

   4. Resolution: In this phase students analyze the practical applicability of the generated knowledge, test different hypotheses, and ultimately start a new cycle of knowledge construction by generating a new triggering event.

   ID    Phase               Messages      (%)
   0     Other                    140    8.01%
   1     Triggering Event         308   17.63%
   2     Exploration              684   39.17%
   3     Integration              508   29.08%
   4     Resolution               107    6.12%
         All phases              1747     100%

Table 1: Number of Messages in Different Phases of Cognitive Presence
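The class distribution in Table 1 directly determines the majority-vote baseline used as a point of reference in the classification experiments. A minimal Python sketch (illustrative, not part of the original study) reproduces that baseline from the message counts:

```python
# Majority-class baseline for the 5-category task, computed from
# the per-phase message counts reported in Table 1.
counts = {
    "Other": 140,
    "Triggering Event": 308,
    "Exploration": 684,
    "Integration": 508,
    "Resolution": 107,
}

total = sum(counts.values())             # 1747 coded messages
majority = max(counts, key=counts.get)   # most frequent phase
baseline_acc = counts[majority] / total  # accuracy of always predicting it

print(majority, round(baseline_acc, 3))  # Exploration 0.392
```

Predicting the most frequent category (Exploration) for every message yields roughly 39% accuracy, which is the figure quoted for the majority-vote baseline.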

The framework comes with its own content analysis scheme, and it has attracted a lot of attention in the research community, resulting in a fairly large number of replication studies and empirical tests of the framework [12]. However, even though the Community of Inquiry proved to be a viable model for assessing learning quality in online educational contexts, the practical issues of applying CoI analysis and its coding scheme remain: it is still a manual, time-consuming process, which makes the coding of messages very expensive. For example, for the study presented here, it took approximately one month for two coders to manually code the 1747 discussion messages. This need for manual coding has been pointed out as one of the main reasons why many transcript analysis techniques have had almost no impact on educational practice and never moved out of the domain of educational research [7]. To support broader adoption of the CoI framework, there is a need for automation of the coding process, and that is the exact purpose of this study. We focus on the coding of cognitive presence; however, the overall goal is to automate content analysis for all three CoI presences in order to provide a comprehensive picture of the learning process. This would allow instructors to adopt the CoI framework for guiding instructional interventions, and to provide learners with feedback, making them more aware of their own learning and the learning of their peers.

2.2    Text Classification and Automatic Content Analysis Approaches
In order to automate the content analysis of discussion transcripts, we adopted text mining classification techniques [1]. As cognitive presence is a latent construct and not directly observable, we built on previous work that also focused on mining latent constructs. The work on opinion mining of online product reviews [3, 15, 23], gender style differences [13], and sentiment analysis [4] are some of the main areas of research that informed our classification approach.

Text classification has been studied in the context of several different areas. In general, the majority of studies extensively used lexical features such as N-grams, Part-of-Speech (PoS) tags, and word dependency triplets, or some mixture of them, as their main type of features. For example, for the problem of classifying online product reviews as based on either qualified or unqualified claims, Arora et al. [3] used a combination of N-grams, PoS bigrams, and dependency triplets with an approximation of syntactic scope [3]. The authors achieved Cohen's kappa of 0.353 and classification accuracy of 72.6% on their binary classification task. For a similar problem, Joshi and Penstein-Rosé [15] used word dependency triplets <Rel, HeadWord, ModifierWord> as features, where Rel is a grammatical relation between the words (e.g., Adjective), while HeadWord and ModifierWord are either concrete words (e.g., Camera, Great) or PoS classes (e.g., Noun, Adverb). Their study showed that, in the context of opinion mining, using a PoS class as the HeadWord and a concrete word as the ModifierWord provides a small but statistically significant improvement over the baseline unigram model [15].

Another type of feature that has been utilized is the word pattern feature. For example, for sarcasm detection in online product reviews, Tsur and Davidov [23] used a K-nearest neighbors (KNN) classifier with patterns of 1-6 content words and 2-6 high-frequency words (i.e., words that occur frequently in many reviews) as classification features. In the context of stylistic differences between genders, the idea of pattern features was further expanded by Gianfortoni et al. [13] with a more complex notion of word patterns; however, the reported results showed only a modest improvement in classification accuracy, achieving Cohen's kappa of only 0.18 in the best case.

Finally, there are other approaches as well, most notably those based on the use of Latent Semantic Analysis (LSA) in the context of automated assessment of student essays [8], or the use of more complex features built with genetic programming [4, 18].

In terms of the classification methods used, the majority of approaches use K-Nearest Neighbors (KNN) or Support Vector Machine (SVM) algorithms. SVM is a particularly popular algorithm for text classification, and according to Aggarwal and Zhai [1], "text data is ideally suited for SVM classification because of the sparse high-dimensional nature of text, in which few features are irrelevant, but they tend to be correlated with one another and generally organized into linearly separable categories" [p. 195]. SVM classifiers also work well with a large number of weak predictors, which is the case in text classification, where typically the majority of features are only weakly predictive of the class label [18].

3.    METHODS

3.1    Data set
For the purpose of our study, we used a data set obtained from a graduate-level Software Engineering course at a Canadian fully distance learning university. The data set consists of 1747 messages, which were coded by two human coders for the levels of cognitive presence. The coders achieved excellent interrater agreement (percent agreement = 98.1%, Cohen's kappa = 0.974), indicating the quality of the content analysis scheme. The most frequent type of message was the exploration message, occurring on average 39% of the time (Table 1), while the least frequent was the resolution message, occurring on average in 6% of the cases. These large differences in the category distributions are not surprising, as they have been shown by previous work in the CoI research field. The reason for this is that the majority of students do not progress to the later stages of cognitive presence [11], which in turn limits the potential for development of their critical thinking skills. Thus, even though we have 5 categories, the baseline accuracy of the simplest majority-vote classifier is 39%.
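The interrater reliability figures reported for the two coders follow the standard Cohen's kappa formulation. The following Python sketch shows the calculation; the labels are toy values standing in for the actual coding data, which is not reproduced here:

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the
    observed agreement and p_e is the agreement expected by chance
    from each coder's marginal label distribution."""
    n = len(coder1)
    p_o = sum(a == b for a, b in zip(coder1, coder2)) / n
    c1, c2 = Counter(coder1), Counter(coder2)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy labels (phase IDs 0-4) standing in for the 1747 coded messages.
coder_a = [2, 2, 3, 1, 2, 4, 0, 2, 3, 3]
coder_b = [2, 2, 3, 1, 2, 4, 0, 2, 3, 1]
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.865
```

Unlike raw percent agreement, kappa discounts the agreement two coders would reach by chance alone, which matters here given the highly skewed category distribution in Table 1.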
3.2    Feature Extraction
Based on our literature review described in Section 2, we extracted a wide variety of features frequently used in similar studies (Table 2). We extracted the commonly used N-gram features (i.e., unigrams, bigrams, and trigrams) and Part-of-Speech (PoS) bigrams and trigrams. In addition, similarly to the work of Joshi and Penstein-Rosé [15], we extracted: i) back-off versions of bigrams and trigrams, created by replacing one or more words in an N-gram with the corresponding PoS tag, and ii) word dependency triplets and their back-off versions. Finally, in addition to the features found in the research literature, we extracted an additional set of features which we thought might be useful given the specifics of the cognitive presence construct.

   Feature Type                    Feature Names
   N-grams                         unigrams, bigrams, trigrams
   Part-of-Speech N-grams          pos-bigrams, pos-trigrams
   Back-Off N-grams                bo-bigrams, bo-trigrams
   Dependency Triplets             dep-triplets
   Back-Off Dependency Triplets    h-bo-triplets, m-bo-triplets, hm-bo-triplets
   Named Entities                  entity-count
   Thread Position Features        is-first, is-reply-first

Table 2: Extracted Features

Given the differences among the phases of cognitive presence, we extracted the entity-count feature, which counts the number of named entities mentioned in a message, using the DBpedia Spotlight [19] web service. The rationale behind this feature is that different phases of cognitive presence could potentially be characterized by different numbers of concepts discussed in a message. For example, it might be the case that exploration messages contain, on average, a larger number of concepts, as one of the key characteristics of exploration is brainstorming of different problem solutions and ideas [11].

Another important aspect of cognitive presence is that it develops over time through communication with other students [11]. In practice this means that triggering and exploration messages are more likely to be observed in the early stages of discussions, while integration and resolution messages are more likely in the later stages. To test this hypothesis, as a first step we extracted two simple features: i) is-first, which indicates whether a message is the first in the discussion topic, and ii) is-reply-first, which indicates whether a message is a reply to the original discussion-opening message.

3.3    Classifier Implementation
For the purpose of this study we decided to use SVM classification, as it is a well-known and popular technique especially well suited to text classification, as described in Section 2. In order to maximize classification quality and assess the usefulness of different types of features, we experimented with several different sets of features and evaluated them using 10-fold cross-validation, which is considered a good compromise between the sizes of the training and test data [20]. We used only features with a support threshold of 10 or more (i.e., occurring 10 or more times in the data) in order to keep the number of features reasonable and to protect against overfitting the classifier to the noise captured by low-support features. We used a linear kernel and default parameter values (C = 1, gamma = 1/k). To compare different sets of features we used McNemar's test [9], as it has been shown to have a low Type I error rate [6].

To implement the classifier and feature extraction we used several popular open-source tools and libraries. In the feature extraction step we used the Stanford CoreNLP suite of tools (http://nlp.stanford.edu/software/corenlp.shtml) for tokenization, Part-of-Speech tagging [22], and dependency parsing [17]. We used the popular Weka [24] data mining toolkit and the LibSVM library [5] for developing the classifier, and to implement McNemar's test we used the Java Statistical Classes (JSC) library (http://www.jsc.nildram.co.uk/index.htm).

4.    RESULTS
Table 3 shows the results of our classification experiment. The baseline unigram model achieved 54.7% accuracy, slightly lower than the more complex models with larger numbers of features. The biggest improvement was observed when adding the back-off version of trigrams, which raised classification accuracy to 58.4% and Cohen's kappa to 0.41, accompanied by the largest increase in the feature space.

   Feature Set               Additional   Classification   Cohen's   P-value
                               Features         Accuracy     Kappa
   majority vote baseline            0            0.392      0.000
   unigrams baseline              2241            0.547      0.364
   + bigrams                      3155            0.556      0.376     0.427
   + trigrams                      911            0.554      0.374     0.571
   + pos-bigrams                   737            0.561      0.385     0.249
   + pos-trigrams                 2810            0.560      0.382     0.304
   + bo-bigrams                   6953            0.560      0.381     0.323
   + bo-trigrams *               17986            0.584      0.410     0.006
   + dep-triplets                 1435            0.564      0.386     0.062
   + h-bo-triplets *              1931            0.571      0.396     0.031
   + m-bo-triplets *              2771            0.579      0.406     0.003
   + hm-bo-triplets               1375            0.558      0.379     0.359
   + entity-count *                  1            0.559      0.381     0.030
   + is-first *                      1            0.555      0.375     0.037
   + is-reply-first                  1            0.550      0.367     0.665

Table 3: Classification Results. Asterisks indicate statistically significant feature sets (McNemar's test, p < 0.05).

Our results are similar to those of Arora et al. [3] and Joshi and Penstein-Rosé [15], with our classifier having somewhat lower absolute accuracy and slightly higher values of the Cohen's kappa metric. Our results also show that adding the head-backoff and modifier-backoff versions of dependency triplets improves classification accuracy, as do the ordinary dependency triplets. With respect to the three features that we proposed, the indicators for the number of named entities (i.e., entity-count) and for discussion starters (i.e., is-first) also showed statistically significant improvement over the baseline unigram model. In addition, these features have an almost nonexistent impact on the classifier feature space, making the building of the classification model faster and more interpretable.

5.    CONCLUSIONS AND FUTURE WORK
As our results show, the proposed approach for automating content analysis seems promising. The current level of Cohen's kappa
is at the lower end of the 0.4-0.7 range, which is considered fair to good agreement [16]. However, in order to replace manual message coding, Cohen's kappa should be above the 0.7 level, which is still out of reach.

One important aspect of coding discussion transcripts that we observed, and which does not affect the work we reviewed, is message quoting. We observed many instances in which a student puts a direct quotation of another's message into his own, which poses a problem for classification based on lexical features such as N-grams, PoS tags, or dependency triplets. In future work we will look for ways to address this issue and to estimate the impact of quoting on classification accuracy.

We also showed the potential of novel features based on a deeper theoretical understanding of the latent construct of interest and its coding instrument. They can provide a significant improvement in classification accuracy without a big impact on the feature space complexity.

References
 [1] C. C. Aggarwal and C. Zhai. Mining Text Data. Springer, Feb. 2012.
 [2] T. Anderson and J. Dron. Three generations of distance education pedagogy. The International Review of Research in Open and Distance Learning, 12(3):80-97, Nov. 2010.
 [3] S. Arora, M. Joshi, and C. P. Rosé. Identifying types of claims in online customer reviews. In Proceedings of HLT-NAACL 2009, pages 37-40, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
 [4] S. Arora, E. Mayfield, C. Penstein-Rosé, and E. Nyberg. Sentiment classification using automatically extracted subgraph features. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pages 131-139, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
 [5] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011.
 [6] T. G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7):1895-1923, Oct. 1998.
 [7] R. Donnelly and J. Gardner. Content analysis of computer conferencing transcripts. Interactive Learning Environments, 19(4):303-315, 2011.
 [8] R. M. Duwairi. A framework for the computerized assessment of university student essays. Computers in Human Behavior, 22(3):381-388, May 2006.
 [9] B. Everitt. The Analysis of Contingency Tables. Chapman and Hall, 1977.
[10] D. R. Garrison, T. Anderson, and W. Archer. Critical inquiry in a text-based environment: Computer conferencing in higher education. The Internet and Higher Education, 2(2-3):87-105, 1999.
[11] D. R. Garrison, T. Anderson, and W. Archer. Critical thinking, cognitive presence, and computer conferencing in distance education. American Journal of Distance Education, 15(1):7-23, 2001.
[12] D. R. Garrison, T. Anderson, and W. Archer. The first decade of the community of inquiry framework: A retrospective. The Internet and Higher Education, 13(1-2):5-9, Jan. 2010.
[13] P. Gianfortoni, D. Adamson, and C. P. Rosé. Modeling of stylistic variation in social media with stretchy patterns. In Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties, pages 49-59, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
[14] F. Henri. Computer conferencing and content analysis. In A. R. Kaye, editor, Collaborative Learning Through Computer Conferencing, pages 117-136. Springer Berlin Heidelberg, Jan. 1992.
[15] M. Joshi and C. Penstein-Rosé. Generalizing dependency features for opinion mining. In Proceedings of the ACL-IJCNLP 2009 Conference, pages 313-316, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
[16] K. Krippendorff. Content Analysis: An Introduction to Its Methodology. Sage Publications, Dec. 2003.
[17] M.-C. de Marneffe, B. MacCartney, and C. D. Manning. Generating typed dependency parses from phrase structure parses. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), pages 449-454, 2006.
[18] E. Mayfield and C. Penstein-Rosé. Using feature construction to avoid large feature spaces in text classification. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 1299-1306, New York, NY, USA, 2010. ACM.
[19] P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia Spotlight: Shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems, pages 1-8, New York, NY, USA, 2011. ACM.
[20] P. Refaeilzadeh, L. Tang, and H. Liu. Cross-validation. In L. Liu and M. T. Özsu, editors, Encyclopedia of Database Systems, pages 532-538. Springer US, Jan. 2009.
[21] G. Siemens, D. Gasevic, C. Haythornthwaite, S. Dawson, S. B. Shum, R. Ferguson, E. Duval, K. Verbert, and R. S. J. d. Baker. Open learning analytics: An integrated & modularized platform. Proposal to design, implement and evaluate an open platform to integrate heterogeneous learning analytics techniques, 2011.
[22] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of HLT-NAACL 2003, pages 252-259, 2003.
[23] O. Tsur and D. Davidov. ICWSM - A great catchy name: Semi-supervised recognition of sarcastic sentences in product reviews. In Proceedings of the International AAAI Conference on Weblogs and Social Media, 2010.
[24] I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. Morgan Kaufmann, Jan. 2011.