PRET: Prerequisite-Enriched Terminology. A Case Study on Educational Texts

Chiara Alzetta, Frosina Koceva, Samuele Passalacqua, Ilaria Torre, Giovanni Adorni
DIBRIS, University of Genoa (Italy)
{chiara.alzetta,frosina.koceva}@edu.unige.it, samuele.passalacqua@dibris.unige.it, {ilaria.torre,adorni}@unige.it

Abstract

English. In this paper we present PRET, a gold dataset annotated for prerequisite relations between educational concepts extracted from a computer science textbook, and we describe the language- and domain-independent approach used to create the resource. Additionally, we have created an annotation tool to support, validate and analyze the annotation.

Italiano. In this paper we present PRET, a dataset manually annotated with the prerequisite relation between concepts extracted from a computer science textbook, and we describe the language- and domain-independent methodology used to create the resource. To support the annotation, we created a tool for annotation support, validation and analysis.

1 Introduction

Educational Concept Maps (ECM) are acyclic graphs which formally represent a domain's knowledge and make explicit the pedagogical dependency relations between concepts (Adorni and Koceva, 2016). A concept, in an ECM, is an atomic piece of knowledge of the subject domain. From a pedagogical point of view, the most important dependency relation between concepts is the prerequisite relation, which makes explicit which concepts a student has to learn before moving on to the next. Several approaches have been proposed to extract prerequisite relations from various educational sources (Vuong et al., 2011; Yang et al., 2015; Gordon et al., 2016; Wang et al., 2016; Liang et al., 2017; Liang et al., 2018; Adorni et al., 2018). Textbooks in particular are a valuable resource for this task since they are designed to support the learning process respecting the prerequisite relation.

In the literature, the evaluation of the extracted prerequisite relations is usually performed through comparison with a gold standard produced by human subjects who annotate relations between concepts (see, among others, Talukdar and Cohen (2012), Liang et al. (2015) and Fabbri et al. (2018)). However, most of the evaluations lack a systematic approach, or simply lack the details that would allow them to be repeated. In this paper, we present our experience in building PRET (Prerequisite-Enriched Terminology), a gold dataset annotated with the prerequisite relation between pairs of concepts. The issues that emerged while building PRET led us to define a methodology and a tool for manual prerequisite annotation. The goal of the tool is to support the creation of gold datasets for validating the automatic extraction of prerequisites. Both the PRET dataset and the tool are available online at http://teldh.dibris.unige.it/pret.

PRET was constructed in two main steps: first we exploited computational linguistics methods to extract relevant terms from a textbook (for the annotation we used chapter 4 of the computer science textbook "Computer Science: An Overview" (Brookshear and Brylow, 2015)), then we asked humans to manually identify and annotate the prerequisite relations between educational concepts. Since the terminology creation step was extensively described in Adorni et al. (2018), this paper mainly focuses on the annotation phase.

The annotation task consists in making explicit the prerequisite relation between two distinct concepts whenever the relation is somehow inferable from the text in question. We represent a concept as a domain-specific term denoting domain entities, expressed either by single nominal terms (e.g. internet, network, software) or by complex nominal structures with modifiers (e.g. malicious software, trojan horse, HyperText Document). Figure 1 shows a sample of the ECM resulting from PRET. According to the PRET dataset, an example of prerequisite relation is that network is a prerequisite of internet, since a student has to know network before learning internet.

Figure 1: Sample of PRET represented as an ECM.
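To make the example concrete, the short sketch below (our illustration, not part of the PRET pipeline) encodes the Figure 1 fragment as a directed acyclic graph using the networkx library, where an edge A -> B reads "A is a prerequisite of B".

```python
# A minimal sketch of an ECM fragment as a directed graph, assuming the
# networkx library; the concept names follow the Figure 1 example in the text.
import networkx as nx

ecm = nx.DiGraph()
# Each edge (A, B) encodes "A is a prerequisite of B".
ecm.add_edge("computer", "network")
ecm.add_edge("network", "internet")

# An ECM is required to be acyclic.
assert nx.is_directed_acyclic_graph(ecm)

# "network is a prerequisite of internet"
print(ecm.has_edge("network", "internet"))  # True
```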
The paper is organized as follows. The related work pertaining to the proposed method is discussed in Section 2. Section 3 describes the methodology used for the creation of the PRET dataset, and Section 4 presents the characteristics of the obtained gold dataset and the agreement computed for each pair of annotators, together with other statistics about the data. Section 5 describes the main features of the annotation tool we designed. Section 6 concludes the paper.

2 Related Work

Automatic prerequisite identification is a task that has gained growing interest in recent years, especially among scholars interested in the automatic synthesis of study plans (Gasparetti et al., 2015; Yang et al., 2015; Agrawal et al., 2016; Alsaad et al., 2018). When applying automatic prerequisite extraction methods, a baseline for evaluation is needed. Despite being time consuming, creating manually annotated datasets is more effective and produces gold resources, which are still rare.

To the best of our knowledge, Talukdar and Cohen (2012) is the only case where crowd-sourcing is employed for annotation: they infer prerequisite relationships between concepts by exploiting hyperlinks in Wikipedia pages and use crowd-sourcing to validate those relations in order to obtain a gold training dataset for a classifier.

More frequently, the annotation of prerequisite relations is performed by domain experts (Liang et al., 2015; Liang et al., 2018; Fabbri et al., 2018) or by students with a certain competence in the domain (Wang et al., 2015; Pan et al., 2017). When the annotation is performed by non-experts, agreement is usually very low, so an expert can be consulted (Chaplot et al., 2016; Gordon et al., 2016).

Regardless of the annotation methodology, we observe that in the mentioned related works the properties of the prerequisite relation (i.e. irreflexivity, anti-symmetry, etc.) are rarely taken into account in the annotation instructions given to annotators. For example, the fact that a concept cannot be annotated as a prerequisite of itself is usually left unspecified.

To support the annotation of prerequisites between pairs of concepts, Gordon et al. (2016) developed an interface showing, for each concept of the domain, the list of relevant terms and documents. Although this can be of some support for the annotation, providing certain useful information, it cannot be considered an annotation tool itself. To our knowledge, a tool specifically designed for prerequisite structure annotation which also features agreement metrics is still missing.

3 Annotation Methodology

In Section 4 we will describe the PRET dataset, while here we present the annotation methodology that we used to build PRET and that we refined on the basis of that experience.

Concept identification. Our methodology for prerequisite annotation requires that concepts are extracted from educational materials, which we broadly define as the Document (D), and provided to annotators. Although we are aware that a concept, as a mental structure, might entail multiple terms, we simplify the problem of concept identification by assuming that each relevant term of D represents a concept (Novak and Cañas, 2006). Thus, our list of concepts is a terminology (T) of domain-specific terms (either single or complex nominal structures), ordered according to the first appearance of the terms of T in D, where each concept corresponds to a single term.
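As a minimal sketch of this ordering step, assuming the candidate terms have already been extracted (the terms and the document text below are illustrative placeholders, not the PRET data), the terminology can be obtained by sorting terms by the position of their first occurrence in D:

```python
# A minimal sketch: order an extracted term list by first occurrence in the
# document D. The terms and text are illustrative placeholders; the simple
# substring search stands in for a proper term matcher.
def build_terminology(document_text, candidate_terms):
    text = document_text.lower()
    positions = {}
    for term in candidate_terms:
        idx = text.find(term.lower())
        if idx >= 0:                 # keep only terms actually found in D
            positions[term] = idx
    # Terminology T: terms sorted by the position of their first appearance.
    return sorted(positions, key=positions.get)

doc = "A network connects computers. The internet is a network of networks."
print(build_terminology(doc, ["internet", "network", "computer"]))
# ['network', 'computer', 'internet']
```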
For the task of prerequisite annotation, it does not matter whether concepts are extracted automatically, manually or semi-automatically. To build PRET, we extracted concepts automatically. To identify our terminology T, we relied on Text-To-Knowledge (T2K2) (Dell'Orletta et al., 2014), a software platform developed at the Institute of Computational Linguistics "A. Zampolli" of the CNR in Pisa. T2K2 exploits Natural Language Processing, statistical text analysis and machine learning to extract and organize the domain knowledge from a linguistically annotated text.

We applied T2K2 to a text of 20,378 tokens distributed over 751 sentences. 185 terms were recognized as concepts of the domain (around 20% of the total number of nouns in the corpus). As expected, the extracted terminology contained both single nominal structures, such as computer, network and software, and complex nominal structures with modifiers, like hypertext transfer protocol, world wide web and hypertext markup language. The set of concepts did not go through any post-processing phase.

Annotators selection. The role of annotators is fundamental in order to obtain a gold dataset that represents the pedagogical relations expressed in the educational material. Consequently, the choice of annotators is crucial. As mentioned above, in the literature annotators are often domain experts (Liang et al., 2015; Liang et al., 2018; Fabbri et al., 2018) or students with some knowledge of the domain (Wang et al., 2015; Pan et al., 2017). Based on our experience with different types of annotators, we suggest that annotators should have enough knowledge to understand the content of the educational material; otherwise, the annotation can be distorted by a wrong comprehension of the relations between concepts. On the other hand, experts should not rely on their background knowledge to identify relations, since the goal of the annotation is to capture the knowledge embodied in the educational resource. To build PRET we recruited 6 annotators among professors and PhD students working in fields related to computer science, but in the end 2 of them turned out not to have enough knowledge for the task.

Annotation task. A prerequisite relation between two concepts A and B is defined as a dependency relation which represents what a learner must know/study (concept A) before approaching concept B. Thus, by definition, the prerequisite relation has the following properties: i) asymmetry: if concept A is a prerequisite of concept B, the opposite cannot be true (e.g. network is a prerequisite of internet, so internet cannot be a prerequisite of network); ii) irreflexivity: a concept cannot be a prerequisite of itself; iii) transitivity: if concept A is a prerequisite of concept B, and concept B of concept C, then concept A is also a prerequisite of concept C (e.g. browser is a prerequisite of HTTP, HTTP is a prerequisite of WWW, hence browser is a prerequisite of WWW according to the transitive property).
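These properties translate directly into simple checks over a set of annotated pairs. The sketch below is our illustration (the helper names are not part of the PRET tool) and assumes the annotation is represented as (A, B) tuples meaning "A is a prerequisite of B":

```python
# A minimal sketch of the three prerequisite-relation properties, expressed as
# checks over a set of (A, B) pairs meaning "A is a prerequisite of B".
# Function names are illustrative, not part of the PRET tool.
def reflexive_violations(pairs):
    # Irreflexivity: no concept may be annotated as its own prerequisite.
    return {(a, b) for (a, b) in pairs if a == b}

def symmetric_violations(pairs):
    # Asymmetry: (A, B) and (B, A) must not both be present.
    return {(a, b) for (a, b) in pairs if (b, a) in pairs}

def transitive_closure(pairs):
    # Transitivity: derive the implied relations (A, C) from (A, B) and (B, C).
    closure = set(pairs)
    changed = True
    while changed:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        changed = not new <= closure
        closure |= new
    return closure

pairs = {("browser", "HTTP"), ("HTTP", "WWW")}
print(("browser", "WWW") in transitive_closure(pairs))  # True
```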
To keep the annotation as uniform as possible, we provided the annotators with suggestions on how to perform the task, together with the book chapter and the terminology extracted from it. Given the material supplied, we asked annotators to trust the text, considering only pairs of distinct concepts of T and annotating the existence of a prerequisite relation between two concepts only if it is derivable from D. In our method, annotators should read the text and, for each new concept (i.e. one never mentioned in the previous lines), identify all its prerequisites; if no prerequisite can be identified, they should not enter any annotation. We also wanted the properties of the pedagogical relation to be preserved, so we asked annotators to respect the irreflexivity property by not annotating self-prerequisites, and to avoid adding transitive relations. Considering the topology of an ECM, we also asked annotators not to enter cycles in the annotation, because they represent conceptually wrong relations. To better understand this point, consider the ECM in Figure 1: given a prerequisite relation between computer and network and one between network and internet, entering a relation where internet is a prerequisite of computer would create a cycle (loop).

The output of the annotation of each annotator is an enriched terminology: a set of concept pairs enhanced with the prerequisite relation. The enriched terminology can be used to create an ECM where each concept is a node and the edges are the prerequisite relations identified by humans (see Figure 1).

Annotation validation. Human annotators are not immune from making mistakes and violating the supplied recommendations. The tool we propose addresses this issue by introducing controls that prevent the annotators from making errors (e.g. cycles, reflexive relations, symmetric relations). In the next section we will describe the approach we used to identify some mistakes by using graph analysis algorithms.
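The following sketch illustrates the kind of graph checks just mentioned, assuming the annotation is loaded into a networkx directed graph; it is our own example, not the tool's actual implementation:

```python
# A minimal sketch of graph-based validation of one annotator's ECM, assuming
# networkx; it illustrates the checks described in the text.
import networkx as nx

def validate_annotation(pairs):
    g = nx.DiGraph(pairs)                       # edge (A, B): A prerequisite of B
    report = {
        "reflexive": list(nx.nodes_with_selfloops(g)),
        "symmetric": [(u, v) for u, v in g.edges if g.has_edge(v, u) and u < v],
        "cycles": list(nx.simple_cycles(g)),
    }
    # Explicitly annotated transitive edges: (u, v) is redundant if v is still
    # reachable from u after removing the direct edge.
    redundant = []
    for u, v in list(g.edges):
        g.remove_edge(u, v)
        if nx.has_path(g, u, v):
            redundant.append((u, v))
        g.add_edge(u, v)
    report["transitive"] = redundant
    return report

pairs = [("computer", "network"), ("network", "internet"),
         ("computer", "internet")]              # the last edge is transitive
print(validate_annotation(pairs)["transitive"])  # [('computer', 'internet')]
```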
Annotators agreement evaluation. Our experience and the literature (Fabbri et al., 2018) show that human judgments about prerequisite identification can vary considerably, even when strict guidelines are provided. This can depend on several factors, including the subjectivity of the annotators and the type and complexity of D. Evaluating the annotators' agreement can be useful to assess whether the gold dataset can be trusted or further annotators are required. Section 4 will describe the measures we used to evaluate annotators' agreement in PRET.

The final combination of the enriched terminologies produced by each annotator is a necessary step to build a gold dataset but, due to space constraints, below we only present our approach, while a survey on combination metrics is out of the scope of this paper.

4 The PRET Dataset

The PRET gold dataset consists of 34,225 concept pairs obtained from all possible combinations of the elements in the concept set (excluding self-prerequisites). Pairs vary with respect to their relation weight, computed for each pair by dividing the number of annotators who annotated the pair by the total number of annotators. Only 1.54% (526) of the pairs have a relation weight higher than 0 (i.e. they were annotated as a prerequisite by at least one annotator). Details about the distribution of prerequisite relations and the respective weights are reported in Table 1.

Relation Type                Weight   Count (%)
Non-prerequisite             0        33,699 (98.46%)
Prerequisite (all weights)            526 (1.54%)
  1 annotator                0.25     293 (55.70%)
  2 annotators               0.50     131 (24.90%)
  3 annotators               0.75     75 (14.26%)
  4 annotators               1        27 (5.13%)
Total number of pairs                 34,225

Table 1: Relations and weight distribution in the PRET dataset.

55.70% (293) of the prerequisite pairs were identified by only one annotator, meaning that it is hard for humans to agree on what a prerequisite is. We further investigate this aspect in Section 4.1.

The analysis of the dataset carried out before applying validation checks highlighted some critical issues: some transitive relations were explicitly annotated and some cycles were erroneously added to the dataset, violating the instructions. While cycles are due to distraction, transitive relations are hard to recognize per se, especially when broad terms are involved (e.g. computer, software, machine).

In order to study how these issues impact the dataset, each annotation was validated against cycles and transitive relations, obtaining 5 dataset variations in addition to the original annotation. The validation was conducted on the ECM derived from the enriched terminology of each annotator, using a graph analysis algorithm operating on cycles and transitive relations. In some variations, transitive relations were added whenever the pair of concepts in the ECM is connected by a path shorter than a certain threshold, defined by considering the ECM diameter, while cycles were either preserved or removed depending on the variation we wanted to obtain. Eventually, we obtained the following annotation variations: no cycles (removing cycles), cycles and transitive (preserving cycles and adding transitive relations), cycles and non-transitive (preserving cycles and keeping only direct links), no cycles and transitive (removing cycles and adding transitivity) and no cycles and non-transitive (removing both cycles and transitivity).
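As an illustration, the sketch below builds the "no cycles and transitive" variation with networkx. The paper does not specify which edge of a cycle is dropped nor the exact path-length threshold, so the choices made here (dropping one edge per detected cycle and using the longest finite shortest path as the diameter) are assumptions:

```python
# A minimal sketch of building the "no cycles and transitive" variation,
# assuming networkx. The cycle-breaking policy and the diameter-based
# threshold are our assumptions; the paper does not specify them exactly.
import networkx as nx

def no_cycles_and_transitive(pairs):
    g = nx.DiGraph(pairs)

    # Remove cycles: drop one edge per detected cycle until none remain.
    while True:
        try:
            cycle = nx.find_cycle(g)
        except nx.NetworkXNoCycle:
            break
        g.remove_edge(*cycle[-1][:2])

    # "Diameter" of the acyclic ECM: longest finite directed shortest path.
    lengths = dict(nx.all_pairs_shortest_path_length(g))
    diameter = max(d for dist in lengths.values() for d in dist.values())
    threshold = diameter            # assumed threshold; tune as needed

    # Add a transitive edge for every pair connected by a path shorter
    # than the threshold.
    for u, dist in lengths.items():
        for v, d in dist.items():
            if 1 < d < threshold:
                g.add_edge(u, v)
    return g
```

The other variations can be obtained by skipping either the cycle-removal loop or the transitive-edge step.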
4.1 Annotators Agreement in PRET

Following Artstein and Poesio (2008), we computed the agreement between multiple annotators using Fleiss' k (Fleiss, 1971) and between pairs of annotators using Cohen's k (Cohen, 1960). According to the scale defined by Landis and Koch (1977), the Fleiss' k values show fair agreement, suggesting that prerequisite annotation is difficult. Similar tasks obtained comparable or lower values, confirming our hypothesis: Gordon et al. (2016) measured the agreement as Pearson correlation, obtaining 36%, while Fabbri et al. (2018) and Chaplot et al. (2016) obtained respectively 30% and 19% Fleiss' k.

Compared to the other variations, removing cycles and adding transitive relations showed the highest improvement in agreement, also for pairs of annotators (Table 2). Our results suggest that different competence levels entail different annotations and different values of agreement, confirming previous results (Gordon et al., 2016): lower agreement can be observed when annotator 4 (quasi-expert) is involved, possibly due to a lower competence level compared to the other annotators. Annotator 4 is also the one who considered the highest number of transitive relations, producing a more connected ECM: it is likely that, when the competence in the domain is lower, a person tends to consider a higher number of prerequisites for each concept. On the other hand, annotators with more experience show even moderate (pairs A1-A3 and A2-A3) or substantial agreement (pair A2-A3 for the no-cycles-and-transitive variation).

Metric                    Orig.     No Cycl. & Trans.   Diff
Fleiss' k   All raters    38.50%    39.94%              +1.44
Cohen's k   A1-A2         34.46%    42.81%              +8.35
            A1-A3         57.80%    50.84%              -6.96
            A1-A4         37.59%    39.29%              +1.70
            A2-A3         56.50%    63.62%              +7.12
            A2-A4         28.02%    29.42%              +1.40
            A3-A4         25.35%    25.71%              +0.36

Table 2: Agreement values and differences for two annotation variations.

Adding transitive relations and removing cycles generally improves the agreement values also when we consider pairs of annotators: we notice an increase of 8.35 points for A1-A2. The only exception is observed for the pair A1-A3, which experienced a decrease of almost 7 points. The cause is thought to be the number of transitive relations considered by annotator 3, which is around one third of the transitive relations annotated by annotator 1: the validation creates more distance between the two annotations, reducing the agreement.
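A minimal sketch of how these measures can be computed on the pair-level 0/1 labels, assuming scikit-learn and statsmodels (the data layout below is illustrative, not the actual PRET matrix):

```python
# A minimal sketch of the agreement computation, assuming scikit-learn and
# statsmodels; `labels` is an (n_pairs x n_annotators) 0/1 matrix where each
# row is a concept pair and each column an annotator (illustrative layout).
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

labels = np.array([
    [1, 1, 0, 1],   # e.g. (network, internet) annotated by A1, A2 and A4
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
])

# Relation weight per pair (cf. Table 1): fraction of annotators marking it.
print(labels.mean(axis=1))

# Cohen's k between a pair of annotators (here A1 and A2).
print(cohen_kappa_score(labels[:, 0], labels[:, 1]))

# Fleiss' k over all annotators: convert ratings to per-item category counts.
counts, _ = aggregate_raters(labels)
print(fleiss_kappa(counts))
```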
As a support for the annotation, the experts used an n x n matrix of the terminology T, in which they entered a binary value at the intersection between two concepts to indicate the presence of a prerequisite relation. We believe that our results were partially influenced by the instrument we used to perform the annotation: a large matrix structure is likely to cause distraction errors and does not perform validation checks during the annotation. Based on this experience and the issues encountered, we developed an annotation tool able to support and validate the annotation. It will be described in the next section.

5 Annotation and Analysis Tool

We provide a language- and domain-independent prototype tool which aims, on the one hand, to support and validate the annotation process and, on the other hand, to perform annotation analysis. All its main features have been designed taking into account real problems encountered while building PRET. Thus, the tool is highly valuable for annotators because it specifically addresses annotators' needs and, at the same time, avoids possible annotation biases. In particular, the tool has three main functionalities: annotation support, annotation representation and analysis of the results.

To support the annotation, the user is provided with the terminology T as a list L of concepts ordered by their first occurrence in the text. This is done in order to give the annotator an overview of the context in which each concept occurs. We observed that the textual context plays a crucial role in deciding which concepts are prerequisites of the one under observation, so for each term we show the list of the other terms with a visual indication of the progress in the text. Additionally, as said before, the tool validates the map resulting from the annotation against the existence of symmetric relations, transitivity and cycles.

Once the annotation is completed, the user can choose to generate different types of visualization of her/his annotation. The goal of this functionality is to provide information visualization and data summarization for analyzing and exploring the result of the annotation. We provide the following views: Matrix (ordered by concept frequency, clusters, temporal, occurrence or alphabetic order), Arc Diagram, Graph and Clusters. Furthermore, the Data Synthesis task provides the number of concepts, the number of relations, and the number and list of disconnected nodes and transitive relations.

Lastly, the tool computes the agreement between the relations inserted by all the annotators who took part in the task (see Section 4.1) and provides a visualization of the final dataset, which results from the combination of all the users' annotations. This feature also outputs a Data Synthesis that provides the number of relations of every annotator, the number of transitive relations and the direction of conflicting relations between annotators.
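The conflicting-direction report can be illustrated with a few lines of plain Python (our sketch, not the tool's code), where a conflict is a concept pair annotated in opposite directions by two annotators:

```python
# A minimal sketch of detecting conflicting relation directions between
# annotators; `annotations` maps an annotator id to a set of (A, B) pairs.
# This is an illustration, not the tool's actual implementation.
def conflicting_directions(annotations):
    conflicts = []
    ids = sorted(annotations)
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            for a, b in annotations[x]:
                if (b, a) in annotations[y]:
                    conflicts.append((x, y, (a, b)))   # x says A->B, y says B->A
    return conflicts

annotations = {
    "A1": {("network", "internet")},
    "A2": {("internet", "network")},
}
print(conflicting_directions(annotations))
# [('A1', 'A2', ('network', 'internet'))]
```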
The demo version of the tool is available online at the URL provided in the Introduction.

6 Conclusion and Future Work

In this paper, we described PRET, a gold dataset manually annotated for prerequisite relations between pairs of concepts; moreover, we presented the methodology we adopted and a tool to support prerequisite annotation. The case study, even though limited in the number of annotators and in the educational material, was a reasonably good training ground for setting the basis of a methodology for prerequisite annotation and for identifying the major issues related to this task. Moreover, the analysis of the annotation provided insights for the automatic identification of concepts and prerequisites, which will be investigated in future work.

References

Giovanni Adorni and Frosina Koceva. 2016. Educational concept maps for personalized learning path generation. In Conference of the Italian Association for Artificial Intelligence, pages 135–148. Springer.

Giovanni Adorni, Felice Dell'Orletta, Frosina Koceva, Ilaria Torre, and Giulia Venturi. 2018. Extracting dependency relations from digital learning content. In Italian Research Conference on Digital Libraries, pages 114–119. Springer.

Rakesh Agrawal, Behzad Golshan, and Evangelos Papalexakis. 2016. Toward data-driven design of educational courses: a feasibility study. Journal of Educational Data Mining, 8(1):1–21.

Fareedah Alsaad, Assma Boughoula, Chase Geigle, Hari Sundaram, and Chengxiang Zhai. 2018. Mining MOOC lecture transcripts to construct concept dependency graphs. In Proceedings of the 11th International Conference on Educational Data Mining, EDM 2018, Buffalo, NY, USA, July 15-18, 2018.

Ron Artstein and Massimo Poesio. 2008. Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555–596.

Glenn Brookshear and Dennis Brylow. 2015. Computer Science: An Overview, Global Edition, chapter 4, Networking and the Internet. Pearson Education Limited.

Devendra Singh Chaplot, Yiming Yang, Jaime G. Carbonell, and Kenneth R. Koedinger. 2016. Data-driven automated induction of prerequisite structure graphs. In Proceedings of the 9th International Conference on Educational Data Mining, EDM 2016, Raleigh, North Carolina, USA, June 29 - July 2, 2016, pages 318–323.

Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.

Felice Dell'Orletta, Giulia Venturi, Andrea Cimino, and Simonetta Montemagni. 2014. T2K2: a system for automatically extracting and organizing knowledge from texts. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014).

Alexander R. Fabbri, Irene Li, Prawat Trairatvorakul, Yijiao He, Wei Tai Ting, Robert Tung, Caitlin Westerfield, and Dragomir R. Radev. 2018. TutorialBank: A manually-collected corpus for prerequisite chains, survey extraction and resource recommendation. In ACL.

Joseph L. Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378.

Fabio Gasparetti, Carla Limongelli, and Filippo Sciarrone. 2015. Exploiting Wikipedia for discovering prerequisite relationships among learning objects. In 2015 International Conference on Information Technology Based Higher Education and Training, ITHET 2015, Lisbon, Portugal, June 11-13, 2015, pages 1–6.

Jonathan Gordon, Linhong Zhu, Aram Galstyan, Prem Natarajan, and Gully Burns. 2016. Modeling concept dependencies in a scientific corpus. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 866–875.

J. Richard Landis and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174.

Chen Liang, Zhaohui Wu, Wenyi Huang, and C. Lee Giles. 2015. Measuring prerequisite relations among concepts. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1668–1674.

Chen Liang, Jianbo Ye, Zhaohui Wu, Bart Pursel, and C. Lee Giles. 2017. Recovering concept prerequisite relations from university course dependencies. In AAAI, pages 4786–4791.

Chen Liang, Jianbo Ye, Shuting Wang, Bart Pursel, and C. Lee Giles. 2018. Investigating active learning for concept prerequisite learning. In Proc. EAAI.

Joseph D. Novak and Alberto J. Cañas. 2006. The theory underlying concept maps and how to construct and use them. Research report 2006-01 Rev 2008-01, Florida Institute for Human and Machine Cognition.

Liangming Pan, Chengjiang Li, Juanzi Li, and Jie Tang. 2017. Prerequisite relation learning for concepts in MOOCs. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1447–1456.

Partha Pratim Talukdar and William W. Cohen. 2012. Crowdsourced comprehension: predicting prerequisite structure in Wikipedia. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pages 307–315. Association for Computational Linguistics.

Annalies Vuong, Tristan Nixon, and Brendon Towle. 2011. A method for finding prerequisites within a curriculum. In Proceedings of the 4th International Conference on Educational Data Mining, Eindhoven, The Netherlands, July 6-8, 2011, pages 211–216.

Shuting Wang, Chen Liang, Zhaohui Wu, Kyle Williams, Bart Pursel, Benjamin Brautigam, Sherwyn Saul, Hannah Williams, Kyle Bowen, and C. Lee Giles. 2015. Concept hierarchy extraction from textbooks. In Proceedings of the 2015 ACM Symposium on Document Engineering, pages 147–156. ACM.

Shuting Wang, Alexander Ororbia, Zhaohui Wu, Kyle Williams, Chen Liang, Bart Pursel, and C. Lee Giles. 2016. Using prerequisites to extract concept maps from textbooks. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 317–326. ACM.

Yiming Yang, Hanxiao Liu, Jaime Carbonell, and Wanli Ma. 2015. Concept graph learning from educational data. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 159–168. ACM.