Semantic Textual Similarity of Course Materials at a Distance-Learning University

Niels Seidel, Moritz Rieger, Tobias Walle
FernUniversität in Hagen
niels.seidel@fernuni-hagen.de, moritz.rieger@posteo.de, tobias.walle@student.fernuni-hagen.de

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

Choosing computer science courses from a wide range of courses is not an easy task for students - especially in the first semesters. To overcome the shortcomings of course descriptions and vague recommendations by acquaintances, we provide a method to identify and visualize semantic similarities between courses using textual learning materials. To achieve this goal, a complete set of course materials (94 courses, 572 course units / PDF textbooks) from the Faculty of Mathematics and Computer Science at FernUniversität in Hagen was vectorized as document embeddings and then compared using the cosine similarity of the vectors. The process can be fully automated and does not require labeled data. The results were compared with the semantic similarity assessed by domain experts. In addition, the similarity of consecutive courses and of sections within the same course was evaluated against the average similarity of all courses. The presented approach has been integrated into a course recommendation system, a course dashboard for teachers and a component of an adaptive learning environment.

Keywords

NLP, Semantic Textual Similarity, Document Embedding, Educational Data Mining

1. INTRODUCTION

Before each semester, students are faced with the question of which courses to take. In order to reach the goal of a course of study, the examination regulations provide information on the optional and compulsory modules and courses. Study plans of the Student Advisory Service flank this framework with recommendations on the number, sequence and selection of courses for the individual semesters. Ultimately, the dates of the courses result in further organizational requirements with which the individual timetable must be brought into line. Despite these organizational restrictions, the internal autonomy of the universities opens up many options for selecting courses according to content criteria and interests. However, only module handbooks and course websites are usually available for decision-making purposes. Learning materials published in advance as textbooks or as Open Educational Resources (OER) are the exception. In both cases, however, the amount of information is difficult to manage. The linear format of the module handbooks, which often contain more than one hundred pages, makes it difficult to identify courses that are similar in content or build on each other. Moreover, the concise descriptions of the modules represent only a fraction of the learning content, and courses that are not assigned to the course of study do not appear in the module handbook at all. Many students therefore seek advice from friends and fellow students or follow recommendations from teachers. However, prospective students and first-year students do not yet have these contacts.

This challenge becomes particularly clear when looking at the example of the FernUniversität in Hagen. With over 74,000 registered students and an offering of over 1,600 courses, the distance-learning university is the largest university in Germany. The Faculty of Mathematics and Computer Science alone accounts for 134 courses, of which 94 are courses and 40 are seminars or internships. For students at the faculty, choosing from this large number of courses is a particular challenge. In contrast to attendance universities, it is usually not possible to benefit from the experience of fellow students. Furthermore, the authors or supervisors are usually not personally known to the students, so contacts with lecturers can hardly make the decision easier. For decision-making, students can use the short descriptions of course contents and learning goals in the module handbooks (approx. 100-200 words) as well as short readings of one chapter of the script. For universities with a very large number of courses and a very wide range of options, planning the study program is therefore time-consuming and complex.

Teachers who wish to avoid redundancies with other courses, and who want to build on previous knowledge or develop it further for other courses when planning and creating learning materials, face a similar hurdle. In view of the large number of courses, the people concerned do not always know exactly what their colleagues teach in detail in their courses. Consequently, overlaps and gaps in content remain undetected and potential for cooperation in teaching is not recognized.

In this paper a method for the analysis of semantic similarity of courses using text-based learning materials is presented. In the second section, related works regarding methods for determining the semantic similarity of texts as well as on course selection recommendation systems are presented. Subsequently, the method of document embeddings used here for the analysis of semantic similarities is presented in section 3 using the example of a corpus of 94 courses of the FernUniversität in Hagen. The results are evaluated in section 4. Based on the semantic relations of the course materials, we present three prototypical applications in section 5: i) a tool for the exploration and recommendation of courses, ii) a teacher dashboard, and iii) adaptive course recommendations for long study texts. The article ends with a summary and an outlook.

2. RELATED WORKS

The processing of natural language using Natural Language Processing (NLP) techniques has made enormous progress in recent years. Conventional NLP methods generate vectors from a text document using Bag of Words (BOW) or frequency-based methods such as Term Frequency-Inverse Document Frequency (TF-IDF) or Latent Dirichlet Allocation (LDA) and then calculate the distance between these vectors [19]. However, these methods either cannot capture the semantic distance or are very computationally intensive [21] and usually do not achieve good results. Newer machine learning methods achieve much better results in the analysis of semantic text representations [10]. A central challenge is the determination of Semantic Textual Similarity (STS), which is used in machine translation, semantic search, question answering and chatbots. With the help of developments in the field of distributed representations, especially neural networks and word embeddings such as Word2Vec [21] and GloVe [23], semantic properties of words can be mapped into vectors. Le and Mikolov have shown with Doc2Vec that the same principles can also be applied to entire documents [17].

The similarity of extensive book collections has so far been investigated in only a few works. The Skip-Thought Vectors presented by Kiros et al. train an encoder-decoder model that attempts to reconstruct the surrounding sentences of an encoded passage [16]. However, the experiments were based on a comparatively small corpus of books [30]. Spasojevic and Poncin, on the other hand, determined the semantic similarity at the level of individual pages and entire books for the corpus of Google Books, which contains about 15 million books [26]. The similarity of two books was determined from the Jaccard index of the permuted hash values of normalized word groups (features). Liu et al., however, point out that the semantic structure of longer documents cannot be taken into account in this way and therefore propose a representation as a Concept Interaction Graph [20]. Keywords are determined from a pair of documents and combined into concepts (nodes) using community detection algorithms. These nodes are connected by edges that represent the interactions between the nodes based on sentences from the documents. Although the method seems very promising, it has so far only been investigated on the basis of news articles. SemEval-2018 Task 7 [12] pursues a similar goal to this paper with regard to STS, in that semantic relations are to be found in abstracts of scientific articles. The gold standard used for its evaluation is based on named entities (persons, places, organizations), which cannot be annotated with reasonable effort for large amounts of text.

Brackhage et al. had experts manually assign keywords to the module descriptions of several universities, visualized these data together with further metadata in a web application as a force-directed layout graph and an adjacency matrix heatmap, and made them searchable with the help of complex filters [7]. However, the keywording proved to be extremely time-consuming and has to be updated frequently. Baumann, Endraß and Alezard used study history data to visualize "on the one hand the distribution of students across the modules in a study program and on the other hand the distribution of students in a module across different study programs" [4], without, however, concretizing the benefit for the intended support in the choice of courses. Askinadze and Conrad used examination data from one study program to illustrate the progress and discontinuation of studies in various visualisations [3]. Beyond that, there is a large number of applications and approaches for recommending courses to students. Lin et al. used the sparse linear method to develop top-N recommendations based on occupancy data of specific groups of students [19]. With the help of K-Means and Apriori association rules, Aher and Lobo presented a recommendation system for courses in the learning management system Moodle [2]. Zablith et al. present several recommendation systems based on linked data from the Open University UK (see http://data.open.ac.uk/, last accessed 15.06.2020) [29]. The Social Study application, for example, suggests courses to learners based on their Facebook profile, while Linked OpenLearn offers learners media and courses from the distance-learning university's OER. The recommendations are based on course-related metadata and links to other courses and media, but do not consider the semantics of the courses. D'Aquin and Jay try to reconstruct the missing semantics with the help of different linked data sources (e.g. DBpedia) in order to trace frequently occurring course occupancies (frequent sequences) [11]. The analysis of semantic similarities on the basis of the complete learning materials does not only provide insights into the content relations of learning resources, but also opens up the possibility to understand the temporal structure and patterns of course assignments in the decision-making process when choosing a course.

From the perspective of course planning, Kardan et al. have developed a prediction model for the number of course bookings with the help of a neural network [15]. Ognjanovic et al. have also modeled course occupancy several semesters in advance [22]. However, the authors of this paper could not find any contributions in the literature on a didactically motivated use of occupancy statistics. The same applies to the use of these data for modeling learners within adaptive or at least personalized learning environments.

3. DETERMINATION OF THE SEMANTIC SIMILARITY OF TEXTS

In this section, a procedure for analyzing the semantic similarity of courses using text-based learning materials is presented using the example of the study texts of the Faculty of Mathematics and Computer Science of the FernUniversität in Hagen. The procedure consists of four steps, which are mainly based on the work of [blinded]. First, a corpus of course materials is created. These data are then vectorized in order to determine the similarity between the documents in the third step. Finally, an evaluation with a gold standard and other comparison parameters is carried out.

A corpus is a collection of related documents. In order to create the corpus, source data of 94 courses from all 20 subject areas of the faculty were available. A course consists of 3 to 10 documents, which we call course units or units. The course units were available as PDF documents of between 20 and 60 pages. The PDFs differed in terms of their format (e.g. PDF/A, PDF/X), their PDF versions and the tools used to create them. The formatting of the type area was also not uniform. For these reasons, a programmatic extraction of chapters using regular expressions and PDF outlines proved to be unreliable and had to be discarded. The cover pages as well as redundant tables of contents and keyword indexes within a course were removed. The PDF documents were therefore first converted to text and divided into sentences and words using NLTK [5]. To avoid errors with mathematical formulas and dotted lines in the tables of contents, the text was also normalized. The resulting corpus contains 572 course units, consisting of 654,367 sentences with a total of 9,507,770 words. The vocabulary contains 179,078 different words.
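The corpus construction can be illustrated with a minimal sketch. It assumes that the PDFs have already been converted to plain-text files (one file per course unit, cover pages and indexes already removed) and that the study texts are German; the directory layout, file naming and the normalization rule are illustrative rather than the exact pipeline used in this study:

    import os
    import nltk
    from nltk.tokenize import sent_tokenize, word_tokenize

    nltk.download("punkt")  # sentence and word tokenizer models used by NLTK

    def build_corpus(text_dir):
        """Collect course units as lists of tokenized sentences."""
        corpus = {}
        for filename in sorted(os.listdir(text_dir)):
            with open(os.path.join(text_dir, filename), encoding="utf-8") as f:
                raw = f.read()
            # crude normalization: drop dotted leader lines left over from tables of contents
            lines = [line for line in raw.splitlines() if "....." not in line]
            text = " ".join(lines)
            corpus[filename] = [word_tokenize(sentence, language="german")
                                for sentence in sent_tokenize(text, language="german")]
        return corpus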
Document Embeddings, also called Paragraph Vectors (PV) by Le and Mikolov, were used to vectorize the documents [17]. Since document embeddings are based on word embeddings, the latter must be created first. For this purpose, the words enter a neural network in one-hot encoding. The network serves an estimation task in which the word most likely to appear in the context of a given word is to be estimated. The neural network is trained with tuples from the text: a window is pushed through the entire corpus and the resulting tuple combinations within the window are recorded. By feeding the estimation error back into the re-estimation, the weights of the weight matrix W are optimized. As a consequence, the weights of the estimation task assume similar values for semantically close words, since comparable tuples were used in the training. The weights of the estimation task represent the word embeddings.
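The window that is pushed through the corpus can be made concrete with a small sketch that generates the (context, target) tuples of the estimation task; the window size and the example sentence are toy values:

    def cbow_pairs(tokens, window=2):
        """Yield (context words, target word) tuples for the CBOW estimation task."""
        for i, target in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if context:
                yield context, target

    sentence = "the paragraph vector acts as a memory for the document".split()
    for context, target in cbow_pairs(sentence, window=2):
        print(context, "->", target)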
In order to represent whole documents semantically as vectors, the idea of word embeddings is extended to whole texts. For this purpose, a paragraph vector, a column of another weight matrix D, is combined with the word vectors to estimate the next word from a given context (see Fig. 1). Since the word vectors capture the semantics of the words as an indirect consequence of the estimation task, this happens in a similar way with document embeddings. One can imagine the training of the PV as the training of another word: a PV acts as a kind of memory that contains information about missing words in the context within a document. For this reason, this model is also called the Distributed Memory Model of PV. Building on the word embeddings, PVs were trained to represent entire documents. The PVs can then be processed as characteristics of the documents in order to recognize semantic similarities between the documents.

Figure 1: Continuous Bag-of-Words model together with the paragraph ID (orange), which is included in the estimation of w(t) in addition to the context words for document embeddings.

Before the documents are compared with each other, the semantic similarity of texts is first examined in general. To find a commonality of all terms, similarity has to be thought of as a "complex network of similarities" [28] of different entities. This complex form of similarity in natural language means that two documents cannot be considered semantically similar on the basis of common features alone; rather, similarity is to be understood as the interaction of many direct and indirect relationships between the words contained in them [13]. This concept of similarity is taken into account in the training of word embeddings. The weight matrix, which ultimately contains the word embeddings, results from the use of the words in all contexts of the entire text corpus and thus represents the complex network of similarities described by Wittgenstein. To compare distributed representations, the cosine similarity is usually used as a metric [27, 13]. For normalized PVs there is a linear relationship to the Euclidean distance.
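A minimal sketch of the vectorization step with the gensim implementation of Paragraph Vectors is given below. It assumes a dictionary corpus_tokens that maps each course-unit identifier to a flat list of word tokens; the hyperparameter values anticipate those selected in section 4, epochs and workers are illustrative, and in older gensim versions the document vectors are accessed via model.docvecs instead of model.dv:

    import numpy as np
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # corpus_tokens: {unit_id: [token, token, ...]} (assumed, see corpus sketch above)
    documents = [TaggedDocument(words=tokens, tags=[unit_id])
                 for unit_id, tokens in corpus_tokens.items()]

    model = Doc2Vec(documents,
                    dm=1,             # Distributed Memory variant of Paragraph Vectors
                    vector_size=140,  # dimension of the paragraph vectors
                    window=20,        # window size of the estimation task
                    min_count=20,     # minimum frequency of occurrence of words
                    epochs=20,
                    workers=4)

    def cosine_similarity(a, b):
        """Cosine similarity, the metric used to compare the paragraph vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # similarity of two (hypothetical) course units
    sim = cosine_similarity(model.dv["unit_A"], model.dv["unit_B"])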
4. EVALUATION OF NLP SYSTEMS

Since vectorizing the documents as PVs is an unsupervised machine learning method, there is no underlying test data against which the system can be tested. Following the SemEval competitions, a gold standard was therefore developed, which consists of a set of test and training data. However, this gold standard could not be generated by crowdsourcing [1], as is the case with many SemEval tasks, since a high degree of competence in the respective fields of knowledge is required for the assessment of semantic similarity. For this reason, three experts, who are authors of course texts themselves, were asked to compare one of their own courses with a course that they considered similar. In total, they selected six unique courses. By evaluating documents that are related in terms of topic or content, monotonous gold standards that do not show any similarities could be avoided. Each of the three experts evaluated a pair of courses consisting of 4 and 7 units, respectively, so that each evaluator made 28 comparisons. The similarity was indicated on a continuous scale from 0 (not similar) to 100 (identical). A nominal gradation of the scale was omitted due to expected problems of understanding with regard to the valence and equidistance of the scale values. Half of the gold standard data was used for training different hyperparameters, as shown in Fig. 3.

The hyperparameters comprise the window size of the Continuous Bag of Words, the dimension of the PV and the minimum frequency of occurrence of the words considered. The values for the individual parameters are based on plausibility tests and lie within the value ranges known from the literature (e.g. [21]). To avoid overfitting, the values for each parameter were only roughly graded. The minimum mean square error was obtained for a window size of 20, a PV dimension of 140 and a value of 20 as the lower limit for the frequency of occurrence of words (see the solid blue dot in Fig. 3). Based on these hyperparameters, a model was trained and tested with the second part of the gold standard (test data). Pearson's r as a measure of the linear relationship reached a value of 0.598. However, since in the present case whole documents were compared instead of individual sentences, the gold standard and the PV are more fuzzy: fine fluctuations in cosine similarity are not reflected in the gold standard. In view of the subjective assessment on the continuous scale, which can be freely interpreted by the evaluators, the value for Pearson's r must nevertheless be regarded as high. To establish the monotonic relationship between cosine similarity and the gold standard, Kendall's τ was determined as a rank correlation coefficient, with a value of 0.451. In general, smaller correlation values are obtained for Kendall's τ compared to Pearson's r. However, the low value is also due to the individual definition of the concept of similarity and the individual mapping of the subjectively perceived similarity to the scale. Looking at the areas of high similarity shown in Fig. 4, the correlation is more obvious.

Figure 3: Minimizing the mean square error for multiple configurations of hyperparameters. Each dot represents a hyperparameter configuration; the highlighted solid blue dot represents the best parameter combination.

Figure 4: Ratio of gold standard (orange) to cosine similarity (blue) for the individual test pairs.
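One plausible reading of this evaluation procedure is sketched below: a rough grid over the three hyperparameters, the mean square error between the cosine similarities and the rescaled expert ratings on the training half of the gold standard, and Pearson's r and Kendall's τ on the test half. train_pairs, test_pairs and the wrapper train_doc2vec are assumed to exist, and the grid values are illustrative:

    from itertools import product
    import numpy as np
    from scipy.stats import pearsonr, kendalltau

    # gold standard: (unit_id_1, unit_id_2, expert rating in [0, 100]) triples,
    # split into train_pairs and test_pairs (assumed)
    def mse(model, pairs):
        errors = [(cosine_similarity(model.dv[a], model.dv[b]) - rating / 100.0) ** 2
                  for a, b, rating in pairs]
        return float(np.mean(errors))

    best_error, best_model = None, None
    for window, dim, min_count in product([5, 10, 20], [100, 140, 200], [5, 20, 50]):
        model = train_doc2vec(documents, window=window, vector_size=dim,
                              min_count=min_count)  # wraps the Doc2Vec call above
        error = mse(model, train_pairs)
        if best_error is None or error < best_error:
            best_error, best_model = error, model

    predicted = [cosine_similarity(best_model.dv[a], best_model.dv[b])
                 for a, b, _ in test_pairs]
    rated = [rating for _, _, rating in test_pairs]
    r, _ = pearsonr(predicted, rated)      # linear relationship (0.598 in this study)
    tau, _ = kendalltau(predicted, rated)  # rank correlation (0.451 in this study)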
In addition to the gold standard, the NLP system was checked for the plausibility of its results. Two hypotheses were put forward for this purpose:

H1: Course units of consecutive courses are more similar than units of other courses.

H2: Course units of one course are more similar than units of other courses.

In order to test the first hypothesis, eight courses were initially identified which, given the numbering contained in the course title, clearly build on each other. The mean cosine similarity of the consecutive courses is 0.32, which is above the average of the whole corpus (0.18). Hypothesis 1 is thus confirmed. The second hypothesis can already be recognized by the strongly colored rectangular artifacts along the diagonal of the adjacency matrix in Fig. 2. The mean similarity of units within the same course is 0.51 and is thus significantly greater than the mean cosine similarity of the whole corpus (see Fig. 5). Hypothesis 2 is therefore also confirmed. A further part of the plausibility check consisted, among other things, in excluding undesired effects of the document size on the semantic similarity: there is no correlation between the difference in the word count of two documents and their cosine similarity (r = 0.013).

Figure 2: Adjacency matrix of course unit relations. Courses are represented by a running number along the axes. The darker the boxes, the greater the semantic similarity. The dark rectangular artifacts along the diagonal indicate the high similarity of units of the same course.

Figure 5: Distribution and mean value of the cosine similarity in the entire corpus (blue), between consecutive courses (orange), and between units of the same course (green).
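The plausibility checks can be expressed in a few lines; course_units (mapping each course to its unit identifiers), consecutive_pairs (unit pairs from the eight consecutive courses) and word_count are assumed to exist, while model and cosine_similarity are reused from the sketches above:

    from itertools import combinations
    from scipy.stats import pearsonr
    import numpy as np

    def mean_similarity(pairs):
        return float(np.mean([cosine_similarity(model.dv[a], model.dv[b])
                              for a, b in pairs]))

    all_units = list(model.dv.index_to_key)
    corpus_pairs = list(combinations(all_units, 2))
    within_pairs = [pair for units in course_units.values()
                    for pair in combinations(units, 2)]

    baseline = mean_similarity(corpus_pairs)   # 0.18 in this study
    h2 = mean_similarity(within_pairs)         # 0.51 -> supports H2
    h1 = mean_similarity(consecutive_pairs)    # 0.32 -> supports H1

    # check for an undesired document-length effect
    diffs = [abs(word_count[a] - word_count[b]) for a, b in corpus_pairs]
    sims = [cosine_similarity(model.dv[a], model.dv[b]) for a, b in corpus_pairs]
    r, _ = pearsonr(diffs, sims)               # close to zero (r = 0.013)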
5. APPLICATIONS

5.1 Course exploration and recommendation

The hurdles in the choice of courses addressed in the introduction motivate an application in which learners can explore the semantic similarity of courses and course units by means of visualizations in the form of chord diagrams, force-directed layout graphs and heat maps. Node-link diagrams are primarily suitable for small graphs, since the visualization quickly becomes confusing due to overlapping edges. Heatmaps, in particular, may contain many nodes, but they require a lot of space and their readability depends largely on the arrangement of the elements. Due to this limitation, it seemed necessary to realize the exploration over the entire set of courses not graphically but textually. Besides the given structuring of the courses according to study programs, chairs and lecturers, we tried to identify overlapping topics. Using Latent Dirichlet Allocation, 11 topics were determined based on the word distribution [24]. For each topic, the 20 most heavily weighted terms were displayed in a word cloud. After the user has made a pre-selection (e.g. by choosing a topic), a limited set of up to 20 courses including their course units can be explored. For this purpose, various interactive node-link diagrams were created as Data-Driven Documents [6].
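The topic overview can be reproduced with the LDA implementation in gensim [24]; the number of topics (11) and the 20 terms per topic follow the description above, while the dictionary filtering and the number of passes are illustrative. unit_tokens is assumed to be the list of tokenized course units from section 3:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # unit_tokens: list of token lists, one entry per course unit (assumed)
    dictionary = Dictionary(unit_tokens)
    dictionary.filter_extremes(no_below=20, no_above=0.5)  # illustrative thresholds
    bow_corpus = [dictionary.doc2bow(tokens) for tokens in unit_tokens]

    lda = LdaModel(bow_corpus, id2word=dictionary, num_topics=11,
                   passes=10, random_state=1)

    for topic_id in range(11):
        # the 20 most heavily weighted terms per topic, as shown in the word clouds
        print(topic_id, lda.show_topic(topic_id, topn=20))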
The recommendation of courses is based on two approaches. Firstly, other courses with a high cosine similarity are proposed for a course. The suggestions are justified by a list of the particularly similar course units (see Fig. 6). In this way, the algorithmic decision can be understood on the basis of the available texts.

Figure 6: Course details view with a list of related courses.

Secondly, the Alternating Least Squares algorithm by Hu, Koren and Volinsky [14] was used for collaborative filtering in order to create a recommendation system based on the courses that other students have enrolled in in the past. Collaborative filtering often works with explicit feedback based on user ratings. Course enrollment data, however, does not express an assessment but a learner's preference, which is called implicit feedback. By choosing a course, a student indirectly expresses his or her preferences, and students who have taken similar courses may be interested in similar courses in the future. The numerical result of the implicit feedback indicates the confidence, but not the student's preference for a course. The user behavior can thus be used to deduce which courses a user is likely to prefer. Fig. 7 shows a screenshot of the recommender system.

Figure 7: Course recommendations based on the individual course of study and the data on the enrollment of all students in the study program.
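One way to realize this collaborative filtering step is the implicit library, which provides an implementation of the ALS algorithm of Hu, Koren and Volinsky [14]. The enrollment matrix construction, the factor count and the variable names are illustrative, and the exact fit/recommend signatures differ between library versions, so the sketch should be read as an outline rather than as the implementation used here:

    import numpy as np
    import scipy.sparse as sp
    import implicit

    # enrollments: list of (student_index, course_index) pairs from past semesters (assumed)
    n_students, n_courses = 10000, 134
    rows, cols = zip(*enrollments)
    user_items = sp.csr_matrix((np.ones(len(rows)), (rows, cols)),
                               shape=(n_students, n_courses))

    model = implicit.als.AlternatingLeastSquares(factors=64, regularization=0.05,
                                                 iterations=20)
    model.fit(user_items)  # older versions of the library expect the transposed matrix

    # top five course suggestions for one student, filtering courses already taken
    ids, scores = model.recommend(student_id, user_items[student_id], N=5)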
The filtering procedure, described here only briefly, has clear limits. For example, the order in which courses are taken is not considered, although this can be highly relevant: a student should not be recommended to take more basic courses at the end of his or her studies. The method also always interprets the attendance of a course as a positive factor. This is not always the case, for example when a student attends a course but does not perceive it as interesting or valuable. Furthermore, there are compulsory modules in many courses of study, which must be attended in any case. This, however, is a general disadvantage of recommendations based on implicit feedback. The chosen approach of collaborative filtering cannot make recommendations for prospective students who have not yet taken a course; in this case, the usual introductory courses of a degree program can be recommended. Apart from the examination of certain subjects, the course choice is not constrained by study regulations or other prerequisites at our faculty. Such constraints might have to be considered for other course recommender systems.

5.2 Teacher dashboard

The second application scenario is primarily aimed at teachers and authors of learning materials. In a Learning Analytics Dashboard [25], course occupancy statistics are linked with the semantic relations of the course materials. By including the semantic textual similarity of other courses and course chapters, responsible teachers can identify connections to other courses as well as possible content duplications. The dashboard consists of six tiles in a three-column layout: (1) an adjacency matrix shows the similarity of the course units contained in the course (Fig. 8, left); (2) the five most similar courses are shown in a matrix (Fig. 8, middle); (3) a line chart shows the course attendance of the last few years (Fig. 8, right). In addition, the dashboard contains statistics of the most frequently (4) previously, (5) simultaneously and (6) subsequently attended courses in the form of horizontal bar charts.

Figure 8: Extract from the dashboard for teachers.

5.3 Adaptive course recommendations for long study texts

In the third use case, adaptive navigation support in the sense of direct guidance [8] was integrated into the online learning environment Moodle. The Moodle standard page plugin (mod_page) has been enhanced for the readability of long texts [18], so that the course texts, some of which are over 60 DIN-A4 pages long, can also be used on screen.

The marginal columns of the text are used to point readers to chapters of other courses that are very similar to the currently displayed text paragraph. The recommendations are limited to two links per text paragraph, and no recommendations are made for paragraphs of less than 100 words. The threshold value for the degree of similarity was chosen relatively high in order to avoid recommendations of courses that show only little similarity.

In terms of adaptive learning, it is taken into account whether the learner has already taken the recommended course. This information is analyzed in relation to the learning progress in the current Moodle course. In case of low progress, comparatively low quiz results and only few points achieved in the assignments, we want to encourage the learner to make use of the previous knowledge he or she has acquired in earlier courses. Consequently, the recommended links point to courses that the learner already knows and which are semantically related to the currently displayed text paragraph. In the second case, high-performing students or those who have almost completed the current Moodle course are provided with links to courses they have not enrolled in so far. Often these are more advanced courses, if the students are at the beginning of their studies or have already enrolled in the basic courses. In this way, we would like to encourage students to deepen their knowledge in a specific area through targeted course recommendations.
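The adaptation rule described above can be summarized roughly as follows; the attribute names, the progress thresholds and the similarity cut-off are invented for illustration and do not correspond to the exact values used in the Moodle plugin:

    def adaptive_links(paragraph, learner, similar_chapters, max_links=2):
        """Select up to two margin links for one text paragraph.

        similar_chapters: chapters of other courses, ordered by descending
        cosine similarity to the paragraph (assumed to be precomputed).
        """
        if paragraph.word_count < 100:
            return []  # no recommendations for very short paragraphs

        struggling = (learner.progress < 0.5 and learner.quiz_score < 0.5
                      and learner.assignment_points < 0.5)
        if struggling:
            # point back to semantically related courses the learner already knows
            candidates = [c for c in similar_chapters if c.course in learner.enrolled]
        else:
            # high-performing or almost finished: suggest courses not yet taken
            candidates = [c for c in similar_chapters if c.course not in learner.enrolled]

        candidates = [c for c in candidates if c.similarity >= 0.5]  # illustrative threshold
        return candidates[:max_links]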
6. CONCLUSION AND OUTLOOK

An expandable corpus of the Faculty of Mathematics and Computer Science of the FernUniversität in Hagen was created. Special attention was paid to the fact that this corpus can be extended without manual effort. The corpus allows storage-efficient access to single course units or to several units per faculty, chair and course, so that it can serve as a basis for further studies. Subsequently, methods for feature extraction from the documents were investigated, with a focus on the mapping of semantics in the vector representation. For the selected PV model from [17] it was shown that PVs can map semantic information even in texts with several thousand words. The results were evaluated with a gold standard and show a high correlation with it. In contrast to comparable studies (e.g. [9]), this paper compared much larger texts with several thousand sentences instead of individual sentences, which can be assigned semantically more precisely. By means of word and document embeddings, the similarity of two courses can be justified to the users of the system by considering the subordinate course units belonging to a course. In a next development step, a chapter-by-chapter or page-by-page analysis could make the relations of the units comprehensible by means of the relations of the chapters contained in them. In order to improve the reliability of the evaluation, we have presented an approach for defining a gold standard and two metrics (H1 and H2) for assessing STS for larger texts. However, the gold standard needs to be extended to allow better conclusions about the quality of the approach, and there is also a need for other metrics that can be determined with less effort in order to assess the similarity of large texts.

In this article it was shown by way of example how STS can be examined for the extensive textual learning resources of a distance-learning university. However, the methods are also transferable to traditional universities, which work more with presentation slides and online resources. Furthermore, it is conceivable to compare courses and study programs of different universities [7] and thus facilitate the choice of study places. From the administrative perspective of course planning and accreditation, further fields of application of the technology could arise. This only works as far as textual representations of learning materials such as presentation slides, video transcripts or online courses cover the content of a course.

The STS approach used here is subject to some limitations, which at the same time indicate a need for further research. In connection with document embeddings, intrinsic information on the content of the documents has not been considered so far (see [10] and LDA or LSA). Homonyms have not been considered either, but could be learned from labeled texts and applied to other texts. In order to reproduce the learning materials of a course completely in the corpus, texts from diagrams and other visualizations should also be included. The prototypes presented in section 5 illustrate possible fields of application for the semantic relations of study texts, but require further investigation – especially user studies.

In all three use cases it becomes clear that the textual similarity of the learning materials alone is not sufficient to recommend courses, present comprehensive data to course authors or make meaningful recommendations in an adaptive learning environment. Apart from that, the identification of course duplicates and overlaps might be another interesting use case for the corpus of study materials. In order to enable further research of this kind, we are trying to publish the text corpus as research data.

7. REFERENCES

[1] E. Agirre, M. Diab, D. Cer, and A. Gonzalez-Agirre. SemEval-2012 Task 6: A pilot on semantic textual similarity, 2012.
[2] S. B. Aher and L. Lobo. Combination of machine learning algorithms for recommendation of courses in e-learning system based on historical data. Knowledge-Based Systems, 51:1–14, 2013.
[3] A. Askinadze and S. Conrad. Development of an educational dashboard for the integration of German state universities' data. In Proceedings of the 11th International Conference on Educational Data Mining, pages 508–509, 2018.
[4] A. Baumann, M. Endraß, and A. Alezard. Visual Analytics in der Studienverlaufsplanung. In Mensch & Computer, pages 467–469, 2015.
[5] S. Bird, E. Klein, and E. Loper. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, 2009.
[6] M. Bostock, V. Ogievetsky, and J. Heer. D3: Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309, 2011.
[7] N. Brackhage, C. Schaarschmidt, E. Schön, and N. Seidel. ModuleBase: Inter-university database of study programme modules, 2016.
[8] P. Brusilovsky. Adaptive navigation support. In The Adaptive Web: Methods and Strategies of Web Personalization, pages 263–290, 2007.
[9] D. Cer, M. Diab, E. Agirre, I. Lopez-Gazpio, and L. Specia. SemEval-2017 Task 1: Semantic textual similarity – multilingual and cross-lingual focused evaluation. arXiv, 2017.
[10] A. M. Dai, C. Olah, Q. V. Le, T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1507.0, 2013.
[11] M. D'Aquin and N. Jay. Interpreting data mining results with linked data for learning analytics: Motivation, case study and directions. In Third Conference on Learning Analytics and Knowledge, LAK '13, pages 155–164. ACM, 2013.
[12] K. Gábor, H. Zargayouna, I. Tellier, D. Buscaldi, and T. Charnois. Exploring vector spaces for semantic relations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1814–1823. Association for Computational Linguistics, 2017.
[13] E. Grefenstette. Analysing Document Similarity Measures. PhD thesis, University of Oxford, 2009.
[14] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In 2008 Eighth IEEE International Conference on Data Mining, pages 263–272, 2008.
[15] A. A. Kardan, H. Sadeghi, S. S. Ghidary, and M. R. F. Sani. Prediction of student course selection in online higher education institutes using neural network. Computers & Education, 65:1–11, 2013.
[16] R. Kiros, Y. Zhu, R. Salakhutdinov, R. S. Zemel, A. Torralba, R. Urtasun, and S. Fidler. Skip-thought vectors. CoRR, abs/1506.0, 2015.
[17] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.
[18] Q. Li, M. R. Morris, A. Fourney, K. Larson, and K. Reinecke. The impact of web browser reader views on reading speed and user experience. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI '19. ACM, 2019.
[19] J. Lin, H. Pu, Y. Li, and J. Lian. Intelligent recommendation system for course selection in smart education. Procedia Computer Science, 129:449–453, 2018.
[20] B. Liu, T. Zhang, D. Niu, J. Lin, K. Lai, and Y. Xu. Matching long text documents via graph convolutional networks. CoRR, abs/1802.0, 2018.
[21] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space, 2013.
[22] I. Ognjanovic, D. Gasevic, and S. Dawson. Using institutional data to predict student course selections in higher education. The Internet and Higher Education, 29:49–62, 2016.
[23] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[24] R. Rehurek and P. Sojka. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, 2010. ELRA.
[25] B. Schwendimann, M. Rodriguez-Triana, A. Vozniuk, L. Prieto, M. Boroujeni, A. Holzer, D. Gillet, and P. Dillenbourg. Perceiving learning at a glance: A systematic literature review of learning dashboard research. IEEE Transactions on Learning Technologies, 2016.
[26] N. Spasojevic and G. Poncin. Large scale page-based book similarity clustering. In Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), 2011.
[27] S. M. Weiss, N. Indurkhya, and T. Zhang. Fundamentals of Predictive Text Mining. Texts in Computer Science. Springer, London, 2015.
[28] L. Wittgenstein. Philosophical Investigations. Blackwell, 1953.
[29] F. Zablith, M. Fernandez, and M. Rowe. Production and consumption of university linked data. Interactive Learning Environments, 23(1):55–78, 2015.
[30] Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. arXiv:1506.06724, 2015.