=Paper=
{{Paper
|id=Vol-1137/LA_machinelearning_submission_1
|storemode=property
|title=Automated Cognitive Presence Detection in Online Discussion Transcripts
|pdfUrl=https://ceur-ws.org/Vol-1137/LA_machinelearning_submission_1.pdf
|volume=Vol-1137
|dblpUrl=https://dblp.org/rec/conf/lak/KovanovicJGH14
}}
==Automated Cognitive Presence Detection in Online Discussion Transcripts==
Vitomir Kovanovic, Simon Fraser University, Vancouver, BC, Canada (vitomir_kovanovic@sfu.ca)
Srecko Joksimovic, Simon Fraser University, Vancouver, BC, Canada (sjoksimo@sfu.ca)
Dragan Gasevic, Athabasca University, Edmonton, AB, Canada (dgasevic@acm.org)
Marek Hatala, Simon Fraser University, Vancouver, BC, Canada (mhatala@sfu.ca)

LAK '14, March 24–28, 2014, Indianapolis, IN, USA. Copyright 2014 ACM 978-1-4503-2664-3/14/03 $15.00.

ABSTRACT

In this paper we present the results of an exploratory study that examined the use of text mining and text classification for automating the content analysis of discussion transcripts within the context of distance education. We used the Community of Inquiry (CoI) framework and focused on the content analysis of the cognitive presence construct, given its central position within the CoI model. Our results demonstrate the potential of the proposed approach: the developed classifier achieved 58.4% accuracy and Cohen's kappa of 0.41 for the 5-category classification task. In this paper we analyze different classification features and describe the main problems and lessons learned from the development of such a system. Furthermore, we analyzed the use of several novel classification features based on the specifics of the cognitive presence construct, and our results indicate that some of them significantly improve classification accuracy.

1. INTRODUCTION

One of the important aspects of modern distance education is the focus on the social construction of knowledge by means of asynchronous discussion groups [2]. Their increased usage in distance education has produced an abundance of records of the learning process [7]. Educational researchers recognized the importance of this "gold-mine of information" [14] about the learning process, but used it mainly for research, usually long after the courses were over. Nowadays, there is a need to analyze this learner-generated data in an automatic and continuous fashion in order to inform instructors and students about current student performance and possible learning outcomes. Learning Analytics, an emerging research field that aims to make sense of the large volume of educational data in order to understand and improve learning [21], is a promising area of research that could be used to analyze and understand discussion transcript logs in their full complexity. However, at the moment the majority of approaches to the analysis of discussion transcripts are not based on established theories of educational research, and focus mostly on the quantitative aspects of trace and log data. Given the need to assess the qualitative aspects of learning products, this is not enough. To address this issue, we base our transcript analysis approach on the well-established Community of Inquiry (CoI) model of distance education [10, 11], which has been used for more than a decade to answer this type of question.

In this paper we present the results of a study that focused on automating the content analysis of discussion transcripts using the Community of Inquiry coding technique. We developed an SVM-based classifier for automatic classification of discussion transcripts in accordance with the CoI framework, and we discuss in detail the challenges and issues with this type of text classification, most notably the creation of relevant classification features.

2. BACKGROUND WORK

We based our work on the theoretical foundations of the Community of Inquiry framework and on previous work in the field of text classification. In this section we present an overview of the Community of Inquiry framework and the relevant findings in the text classification field that informed our approach.

2.1 Community of Inquiry (CoI) Framework

Among the different techniques for assessing the quality of distance education environments, one of the best-researched models that comprehensively explains the different dimensions of social learning is the Community of Inquiry (CoI) model [10, 11]. The model consists of three interdependent constructs that together provide comprehensive coverage of the distance learning phenomena [10, 11]: i) Social presence describes the relationships and social climate in a learning community [10]; ii) Cognitive presence describes the different phases of students' cognitive engagement and knowledge construction [11]; and iii) Teaching presence explains the instructor's role in course planning and execution [10].

For our study the most important is the cognitive presence construct, which is defined as "an extent to which the participants in any particular configuration of a community of inquiry are able to construct meaning through sustained communication" [10, p. 89]. The model defines four different phases of cognitive presence:

1. Triggering event: In this phase some issue, dilemma or problem is identified. In the formal educational context these are often explicitly defined by the instructors, but they can also be created by any student participating in the discussions [11].

2. Exploration: In this phase students move between their private reflective world and the shared world where the social construction of knowledge happens [11].

3. Integration: This phase is characterized by the synthesis of the ideas generated in the exploration phase and, ultimately, the construction of meaning.

4. Resolution: In this phase students analyze the practical applicability of the generated knowledge, test different hypotheses, and ultimately start a new cycle of knowledge construction by generating a new triggering event.

The framework comes with its own content analysis scheme, and it has attracted a lot of attention in the research community, resulting in a fairly large number of replication studies and empirical tests of the framework [12]. However, even though the Community of Inquiry proved to be a viable model for assessing learning quality in online educational contexts, the practical issues of applying CoI analysis and its coding scheme remain: it is still a manual, time-consuming process which makes the coding of messages very expensive. For example, for the study presented here it took approximately one month for the two coders to manually code the 1747 discussion messages. This need for manual coding has been pointed out as one of the main reasons why many transcript analysis techniques had almost no impact on educational practice and never moved out of the domain of educational research [7]. In order to support broader adoption of the CoI framework there is a need for automation of the coding process, and that is the exact purpose of this study. We focus on the coding of cognitive presence; however, the overall goal is to automate content analysis for all three CoI presences in order to provide a comprehensive picture of the learning process. This would allow instructors to adopt the CoI framework for guiding instructional interventions, and to provide learners with feedback making them more aware of their own learning and the learning of their peers.

2.2 Text Classification and Automatic Content Analysis Approaches

In order to automate the content analysis of discussion transcripts, we adopted text mining classification techniques [1]. As cognitive presence is a latent construct and not directly observable, we based our work on previous studies that also focused on mining latent constructs. The work on opinion mining of online product reviews [3, 15, 23], gender style differences [13] and sentiment analysis [4] are some of the main areas of research that informed our classification approach.

Text classification tasks have been studied in several different areas. In general, the majority of studies extensively used lexical features such as N-grams, part-of-speech (PoS) tags and word dependency triplets, or some mixture of them, as their main type of features. For example, for the problem of classifying online product reviews as based on either qualified or unqualified claims, Arora et al. [3] used a combination of N-grams, PoS bigrams and dependency triplets with an approximation of syntactic scope [3]. The authors achieved Cohen's kappa of 0.353 and classification accuracy of 72.6% for their binary classification task. For a similar problem, Joshi and Penstein-Rosé [15] used word dependency triplets as features, where Rel is a grammatical relation between the words (e.g., Adjective), while HeadWord and ModifierWord are either concrete words (e.g., Camera, Great) or PoS classes (e.g., Noun, Adverb). Their study showed that, in the context of opinion mining, using the PoS class as a HeadWord and the concrete word as a ModifierWord provides a small but statistically significant improvement over the baseline unigram model [15].

Another type of features utilized are word pattern features. For example, for sarcasm detection in online product reviews, Tsur and Davidov [23] used a K-nearest neighbors (KNN) classifier with patterns of 1-6 content words and 2-6 high-frequency words (i.e., words that occur frequently in many reviews) as classification features. In the context of stylistic differences among genders, the idea of pattern features was further expanded by Gianfortoni et al. [13] with a more complex notion of word patterns; however, the reported results showed only very modest improvement in classification accuracy, achieving Cohen's kappa of only 0.18 in the best case.

Finally, there are other approaches as well, most notably those based on the use of Latent Semantic Analysis (LSA) in the context of automated assessment of student essays [8], or the use of more complex features built with genetic programming [4, 18].

In terms of classification methods, the majority of approaches use K-nearest neighbors (KNN) or Support Vector Machines (SVM). SVM is a particularly popular algorithm for text classification; according to Aggarwal and Zhai [1], "text data is ideally suited for SVM classification because of the sparse high-dimensional nature of text, in which few features are irrelevant, but they tend to be correlated with one another and generally organized into linearly separable categories" [p. 195]. SVM classifiers also work well with a large number of weak predictors, which is the case in text classification, where typically the majority of features are only very weakly predictive of the class label [18].

3. METHODS

3.1 Data set

For the purpose of our study, we used a data set obtained from a graduate-level course in software engineering at a Canadian fully distance learning university. The data set consists of 1747 messages which were coded by two human coders for the levels of cognitive presence. The coders achieved excellent interrater agreement (percent agreement = 98.1%, Cohen's kappa = 0.974), indicating the quality of the content analysis scheme. The most frequent type of messages were exploration messages, occurring on average 39% of the time (Table 1), while the least frequent were resolution messages, occurring on average in 6% of the cases. These large differences in the category distributions are not surprising, as they have been shown by previous work in the CoI research field. The reason for this is that the majority of students do not progress to the later stages of cognitive presence [11], which in turn limits the potential for developing their critical thinking skills. Thus, even though we have 5 categories, the baseline accuracy using the simplest majority vote classification is 39%.

ID  Phase             Messages       %
0   Other                  140   8.01%
1   Triggering Event       308  17.63%
2   Exploration            684  39.17%
3   Integration            508  29.08%
4   Resolution             107   6.12%
    All phases            1747    100%

Table 1: Number of Messages in Different Phases of Cognitive Presence
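The majority-vote baseline and the interrater agreement figures above follow directly from the class counts in Table 1 and the standard Cohen's kappa formula. A minimal sketch (the function and variable names are ours, not from the paper's toolchain):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labelled independently with their own marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Message counts per cognitive-presence phase (Table 1).
counts = {"Other": 140, "Triggering Event": 308, "Exploration": 684,
          "Integration": 508, "Resolution": 107}
total = sum(counts.values())             # 1747 messages
baseline = max(counts.values()) / total  # always predict "Exploration"
print(round(baseline, 3))                # → 0.392, the 39% baseline of Section 3.1
```

Kappa corrects raw agreement for chance: two raters who always agree get 1.0, while raters whose agreement matches the chance level get 0.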
Feature Type                   Feature Names
N-grams                        unigrams, bigrams, trigrams
Part-of-Speech N-grams         pos-bigrams, pos-trigrams
Back-Off N-grams               bo-bigrams, bo-trigrams
Dependency Triplets            dep-triplets
Back-Off Dependency Triplets   h-bo-triplets, m-bo-triplets, hm-bo-triplets
Named Entities                 entity-count
Thread Position Features       is-first, is-reply-first

Table 2: Extracted Features

Feature Set              Additional Features   Accuracy   Kappa   P-val
majority vote baseline                     0      0.392   0.000       -
unigrams baseline                       2241      0.547   0.364       -
+ bigrams                               3155      0.556   0.376   0.427
+ trigrams                               911      0.554   0.374   0.571
+ pos-bigrams                            737      0.561   0.385   0.249
+ pos-trigrams                          2810      0.560   0.382   0.304
+ bo-bigrams                            6953      0.560   0.381   0.323
+ bo-trigrams *                        17986      0.584   0.410   0.006
+ dep-triplets                          1435      0.564   0.386   0.062
+ h-bo-triplets *                       1931      0.571   0.396   0.031
+ m-bo-triplets *                       2771      0.579   0.406   0.003
+ hm-bo-triplets                        1375      0.558   0.379   0.359
+ entity-count *                           1      0.559   0.381   0.030
+ is-first *                               1      0.555   0.375   0.037
+ is-reply-first                           1      0.550   0.367   0.665

Table 3: Classification Results (* marks statistically significant feature sets)

3.2 Feature Extraction

Based on the literature review in Section 2, we extracted a wide variety of features frequently used in similar studies (Table 2). We extracted the commonly used N-gram features (i.e., unigrams, bigrams and trigrams) and part-of-speech (PoS) bigrams and trigrams. In addition, similarly to the work of Joshi and Penstein-Rosé [15], we extracted: i) back-off versions of bigrams and trigrams, created by replacing one or more words in an N-gram with the corresponding PoS tag, and ii) word dependency triplets and their back-off versions. Finally, in addition to the features found in the research literature, we extracted an additional set of features which we thought might be useful given the specifics of the cognitive presence construct.

Given the differences among the phases of cognitive presence, we extracted the entity-count feature, which counts the named entities mentioned in a message, using the DBpedia Spotlight [19] web service. The rationale behind this feature is that the different phases of cognitive presence could potentially be characterized by a different number of constructs discussed in a message. For example, it might be the case that exploration messages contain on average a larger number of concepts, as one of the key characteristics of exploration is the brainstorming of different problem solutions and ideas [11].

Another important aspect of cognitive presence is that it develops over time through communication with other students [11]. In practice this means that triggering and exploration messages are more likely to be observed in the early stages of discussions, while integration and resolution messages are more likely in the later stages. To test this hypothesis, as a first step we extracted two simple features: i) is-first, which indicates whether a message is the first in the discussion topic, and ii) is-reply-first, which indicates whether a message is a reply to the original discussion-opening message.

3.3 Classifier Implementation

For the purpose of this study we decided to use SVM classification, as it is a well-known and popular technique especially well suited to text classification, as described in Section 2. In order to maximize classification quality and assess the usefulness of different types of features, we experimented with several different sets of features and evaluated them using 10-fold cross-validation, which is considered a good compromise between the sizes of the training and test data [20]. We used only features with a support threshold of 10 or more (i.e., occurring 10 or more times in the data) in order to keep the number of features reasonable and to protect the classifier from overfitting to the noise in the data captured by low-support features. We used a linear kernel and default parameter values (C = 1, gamma = 1/k). To compare different sets of features we used McNemar's test [9], as it has been shown to have a low Type I error rate [6].

To implement the classifier and feature extraction we used several popular open-source tools and libraries. In the feature extraction step we used the Stanford CoreNLP suite (http://nlp.stanford.edu/software/corenlp.shtml) for tokenization, part-of-speech tagging [22] and dependency parsing [17]. We used the popular Weka [24] data mining toolkit and the LibSVM library [5] for developing the classifier, and to implement McNemar's test we used the Java Statistical Classes (JSC) library (http://www.jsc.nildram.co.uk/index.htm).

4. RESULTS

Table 3 shows the results of our classification experiment. The baseline unigram model achieved 54.72% accuracy, which is slightly less than the more complex models with larger numbers of features. The biggest improvement was observed when adding the back-off version of trigrams, which improved classification accuracy to 58.38% and Cohen's kappa to 0.41, and which was accompanied by the largest increase in the feature space.

Our results are similar to those of Arora et al. [3] and Joshi and Penstein-Rosé [15], with our classifier having somewhat lower absolute levels of accuracy but slightly higher values of Cohen's kappa. Our results also show that adding both the head-backoff and modifier-backoff versions of dependency triplets improves classification accuracy, as do the ordinary dependency triplets. With respect to the three features that we proposed, the indicators for the number of named entities (i.e., entity-count) and for discussion starters (i.e., is-first) also showed statistically significant improvement over the baseline unigram model. In addition, the use of those features has an almost nonexistent impact on the classifier feature space, making the classification model faster to build and more interpretable.
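The back-off N-grams described in Section 3.2 replace one or more words of an N-gram with their PoS tags. A minimal sketch of generating such variants, assuming tokens already carry PoS tags (the function name is ours; the paper's actual extraction used Stanford CoreNLP and may differ in detail):

```python
from itertools import product

def backoff_ngrams(tokens, tags, n=2):
    """All back-off variants of the n-grams of `tokens`: each position is
    either the word itself or its PoS tag, with at least one position backed off."""
    features = set()
    for i in range(len(tokens) - n + 1):
        words, pos = tokens[i:i + n], tags[i:i + n]
        # choice[j] == 0 keeps the word; choice[j] == 1 backs off to the PoS tag.
        for choice in product((0, 1), repeat=n):
            if sum(choice) == 0:
                continue  # the plain n-gram itself, not a back-off variant
            features.add(tuple(t if c else w
                               for w, t, c in zip(words, pos, choice)))
    return features

feats = backoff_ngrams(["great", "camera"], ["JJ", "NN"])
# Variants of the bigram: ('great', 'NN'), ('JJ', 'camera'), ('JJ', 'NN')
```

Backing off trades lexical specificity for generality, which is consistent with the large feature-space growth of bo-trigrams in Table 3: each trigram spawns up to seven back-off variants.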
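The p-values in Table 3 come from McNemar's test, which compares two classifiers using only the items on which they disagree. A sketch of one common continuity-corrected formulation (the paper used the JSC library, so its exact variant may differ):

```python
def mcnemar_chi2(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-square statistic for two classifiers
    evaluated on the same items; large values mean their error patterns differ."""
    # b: A correct and B wrong; c: A wrong and B correct (the discordant pairs).
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    if b + c == 0:
        return 0.0  # the classifiers never disagree
    return (abs(b - c) - 1) ** 2 / (b + c)

# With 1 degree of freedom, a statistic above 3.84 corresponds to p < 0.05.
```

Because items where both classifiers are right (or both wrong) carry no information about which is better, the test conditions on the discordant pairs only, which is what gives it the low Type I error rate noted in Section 3.3.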
5. CONCLUSIONS AND FUTURE WORK

As our results show, the proposed approach for automating content analysis seems promising. The current level of Cohen's kappa is at the lower end of the 0.4-0.7 range, which is considered to be fair to good agreement [16]. However, in order to replace manual message coding, Cohen's kappa should be above the 0.7 level, which is still out of reach.

One important aspect of coding discussion transcripts that we observed, and which does not affect the work that we reviewed, is message quoting. We observed many instances in which a student puts a direct quotation of another student's message into his own, which poses a problem for classification based on lexical features such as N-grams, PoS tags or dependency triplets. In our future work we will look for ways to address this issue and to estimate the impact of quoting on classification accuracy.

We also showed the potential of novel features based on a deeper theoretical understanding of the latent construct under study and its coding instrument. They can provide a significant improvement in classification accuracy without a big impact on feature space complexity.

References

[1] C. C. Aggarwal and C. Zhai. Mining Text Data. Springer, Feb. 2012.

[2] T. Anderson and J. Dron. Three generations of distance education pedagogy. The International Review of Research in Open and Distance Learning, 12(3):80–97, Nov. 2010.

[3] S. Arora, M. Joshi, and C. P. Rosé. Identifying types of claims in online customer reviews. In Proceedings of HLT-NAACL 2009, pages 37–40, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.

[4] S. Arora, E. Mayfield, C. Penstein-Rosé, and E. Nyberg. Sentiment classification using automatically extracted subgraph features. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pages 131–139, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

[5] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.

[6] T. G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7):1895–1923, Oct. 1998.

[7] R. Donnelly and J. Gardner. Content analysis of computer conferencing transcripts. Interactive Learning Environments, 19(4):303–315, 2011.

[8] R. M. Duwairi. A framework for the computerized assessment of university student essays. Computers in Human Behavior, 22(3):381–388, May 2006.

[9] B. Everitt. The Analysis of Contingency Tables. Chapman and Hall, 1977.

[10] D. R. Garrison, T. Anderson, and W. Archer. Critical inquiry in a text-based environment: Computer conferencing in higher education. The Internet and Higher Education, 2(2–3):87–105, 1999.

[11] D. R. Garrison, T. Anderson, and W. Archer. Critical thinking, cognitive presence, and computer conferencing in distance education. American Journal of Distance Education, 15(1):7–23, 2001.

[12] D. R. Garrison, T. Anderson, and W. Archer. The first decade of the community of inquiry framework: A retrospective. The Internet and Higher Education, 13(1–2):5–9, Jan. 2010.

[13] P. Gianfortoni, D. Adamson, and C. P. Rosé. Modeling of stylistic variation in social media with stretchy patterns. In Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties, pages 49–59, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.

[14] F. Henri. Computer conferencing and content analysis. In A. R. Kaye, editor, Collaborative Learning Through Computer Conferencing, pages 117–136. Springer Berlin Heidelberg, Jan. 1992.

[15] M. Joshi and C. Penstein-Rosé. Generalizing dependency features for opinion mining. In Proceedings of the ACL-IJCNLP 2009 Conference, pages 313–316, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.

[16] K. H. Krippendorff. Content Analysis: An Introduction to Its Methodology. Sage Publications, Dec. 2003.

[17] M.-C. de Marneffe, B. MacCartney, and C. D. Manning. Generating typed dependency parses from phrase structure parses. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), pages 449–454, 2006.

[18] E. Mayfield and C. Penstein-Rosé. Using feature construction to avoid large feature spaces in text classification. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 1299–1306, New York, NY, USA, 2010. ACM.

[19] P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia Spotlight: Shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems, pages 1–8, New York, NY, USA, 2011. ACM.

[20] P. Refaeilzadeh, L. Tang, and H. Liu. Cross-validation. In L. Liu and M. T. Özsu, editors, Encyclopedia of Database Systems, pages 532–538. Springer US, Jan. 2009.

[21] G. Siemens, D. Gasevic, C. Haythornthwaite, S. Dawson, S. B. Shum, R. Ferguson, E. Duval, K. Verbert, and R. S. J. d. Baker. Open learning analytics: An integrated & modularized platform. Proposal to design, implement and evaluate an open platform to integrate heterogeneous learning analytics techniques, 2011.

[22] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of HLT-NAACL 2003, pages 252–259, 2003.

[23] O. Tsur and D. Davidov. ICWSM – A great catchy name: Semi-supervised recognition of sarcastic sentences in product reviews. In Proceedings of the International AAAI Conference on Weblogs and Social Media, 2010.

[24] I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. Morgan Kaufmann, Jan. 2011.