The Comparison of Self-Reported and Real Effects of Using Corpus-Based Exercises in ESP Course to Improve Students’ Language Skills

Inga V. Kuznetsova
ITMO University, Kronverksky Pr. 49, Saint Petersburg, 197101, Russia

Abstract
This paper presents the outcomes of a controlled experiment in which we studied the impact of corpus-based classroom activities and the inquiry-based learning method (a form of active learning) on students' academic progress in their English for Specific Purposes (ESP) course. The research objective was to compare the actual change in students’ language skills with the self-reported one. Students’ feedback is often used to adjust and improve a course. Our literature review indicated that students’ opinions often lack objectivity and cannot be fully relied on when evaluating the usefulness of a course. In this study we called into question the validity of students’ evaluations of the effectiveness of a corpus-based approach to learning ESP. In our experiment, we used two groups of third-year ITMO University students majoring in Biotechnology. The methods used in this study included a questionnaire and pre- and post-tests. The statistical analysis of test scores indicated that using corpus-based tasks, in addition to the regular ESP course, resulted in considerable improvement of students’ language skills in the experimental group. Interestingly, the questionnaire answers of students in the experimental group revealed that the majority of them failed to realize the real scale of the improvement in their language skills due to their work with corpora. Therefore, the results of our study suggest that students might underestimate the value of using corpus-based activities in the classroom.

Keywords
Corpus linguistics, ESP, language skills, DDL, corpus-based exercises

1. 
Introduction

One of the main teaching approaches in higher education today involves making students participate actively in their own educational process. One of the branches of Computer Assisted Language Learning (CALL) is the active approach to learning called Data-Driven Learning (DDL), in which students access corpora in order to discover the behavior of language themselves. Over the past three decades, the DDL approach has received a lot of attention in the English for Specific Purposes (ESP) research community [1]. A number of studies have shown that using corpora for teaching a foreign language contributes to the development of students’ autonomy and motivation. Students’ ability to access corpora and study the meaning of science terms and their collocations is very important for the development of lexical skills. The vocabulary of ESP includes numerous words related to science which, while easily recognizable in writing, have completely different meanings or pronunciation in English. Corpora can provide students with information about the pronunciation, collocations and usage of ESP terms in different contexts. Therefore, students’ knowledge about how to use corpus technology can have a huge impact on their ability to self-correct errors in pronunciation, vocabulary and grammar, dramatically improving their language skills. In reality, though, there is a gap between using corpora in linguistic research and using them for ESP classroom teaching. According to several studies, the gap is still growing [2, p.3; 3, p.32; 4, p.461]. 

IMS 2021 - International Conference "Internet and Modern Society", June 24-26, 2021, St. Petersburg, Russia
PART 2: Computational Linguistics
EMAIL: ingakuznetsova2014@mail.ru
ORCID: 0000-0002-4404-3884
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

Some reasons for this gap include teachers’ fear that a DDL course might receive negative evaluations from students. Indeed, students’ increased motivation and high enthusiasm about DDL are regarded by many researchers as evidence of a successful outcome of using corpus tools to teach language [5, 6, 7, 8]. For example, a recent meta-analysis of 64 studies focusing on the effects of using corpus linguistics for teaching foreign languages demonstrated that DDL can be both effective and efficient in almost any context [6]. However, there is always the possibility that students might dislike using DDL and corpora. Some teachers also believe that technical issues and students’ unwillingness to learn actively might result in negative learning outcomes. Furthermore, despite publications featuring successful use of corpus applications for a broad range of language teaching purposes, some teachers still believe that using corpora to teach language might affect learners’ perceptions of the learning process and have a negative impact on students’ evaluations of their language course. Although course evaluations by students are used as a feedback tool in many colleges and universities, there is still much uncertainty about the objectivity of students’ answers to questions about the usefulness of an ESP course enhanced with corpus-based exercises. According to some researchers, students may not accurately assess the changes in their performance. A critical look at students’ evaluations might reveal that students are not always capable of fully appreciating the benefits of active learning. For instance, recently published research from Harvard University demonstrated students’ misperception of their learning outcomes. This research studied students’ reactions to active learning under controlled conditions and was conducted in a college physics course which was taught using both active and passive instruction [8, p.19251]. 
Having observed the students’ self-reported perception of learning, the researchers concluded that most of the students preferred the passive way of teaching and considered it more effective. However, at the end of the course, their test results proved the opposite. Students who were taught using an active approach got higher test scores than students who just listened to the lecturers. Therefore, the researchers reported an anti-correlation between the self-reported and the actual results of active teaching [8, p. 19256]. The authors claimed that students cannot always fully appreciate the value of being actively engaged in the learning process because, in their minds, it is associated with increased cognitive effort, which students often regard as a sign of poor learning [8, p.19251]. According to the authors, this explains why students and faculty prefer traditional lectures to active learning [8].

In our study we called into question the validity of students’ evaluations of the effectiveness of a corpus-based approach to learning ESP. In our controlled experiment, we used two groups of third-year ITMO University students majoring in Biotechnology. We developed corpus-based exercises promoting inquiry-based learning (a form of active learning) for our experiment. The students in the experimental group were introduced to corpus technology and completed our corpus-based exercises in addition to the regular ESP course, which was used in the control group. The methods implemented in this study included a questionnaire and pre- and post-tests. We compared students’ self-reported improvement of their language skills with the actual improvement of skills as indicated by their test scores.

2. Methods

2.1. Research questions

RQ1: Is there any statistically significant difference between the mid-course test results of the two groups, before the experiment?
RQ2: Is there any statistically significant difference between the end-of-course (EOC) test results of the two groups, after the experiment?
RQ3: Does students’ self-perception of their learning correlate with the actual change of their test scores?

In this section we describe the approach we took to answer our research questions. We describe the tests and statistics used to evaluate differences in test scores between the two groups. We also describe the questionnaire that students in both groups used to rate their courses. Finally, we describe the corpus-based activities designed for the experimental group in order to show how inquiry-based learning was implemented in our experiment.

2.2. Tests and questionnaire description

In order to find answers to our research questions, we used two groups of ITMO University students in their third year. The students from both groups majored in biotechnology and studied ESP. Group A (control) had 13 students and Group B (experimental) had 12 students. All students were randomly assigned and had comparable English language levels, ranging from B2 to C1. Both groups received identical ESP class content, but the students in Group B were taught using corpus-based exercises in addition to their main ESP course. Both groups completed two tests: the midcourse test, before the experiment, and the End of Course Test (EOC), at the end of the semester, after the experiment. Both tests were the same for the two groups. Each test consisted of ten tasks: seven lexical tasks, checking the mastery of professional vocabulary; two listening tasks, which checked listening skills; and one written task, in which students had to describe one of the methods for biomaterial processing. Students’ speaking skills were assessed during their oral presentations on a biotechnology topic chosen individually by each student. 
The first aim of our study was to check whether both groups had the same level of English before the experiment. For this purpose, the midcourse test scores of Group A and Group B were compared using an unpaired t-test. Our null hypothesis stated that there was no significant difference between the means of the two groups. The p value was calculated to decide whether the null hypothesis should be rejected or retained.

The second aim of our study was to check whether both groups had the same level of English knowledge after the experiment. To answer the second research question, the EOC test scores of Group A and Group B were likewise compared using an unpaired t-test, and the p value was used to decide whether the null hypothesis of no significant difference between the two groups’ test results should be rejected or retained.

The third aim of the study was to see whether students’ self-perception of their learning correlates with the actual change of their test scores. Thus, to answer our third research question, we compared the actual changes in students’ test scores with their self-reports on language skills improvement. Both groups were asked to complete a questionnaire at the end of their ESP course, in which students rated their level of agreement with statements on a 5-point Likert scale, where 1 represents “strongly disagree” and 5 represents “strongly agree”. Students evaluated 4 statements: “The course was boring”, “I enjoyed the course”, “The course was very interesting”, and “My speaking and writing skills improved”.

2.3. 
Corpus exercises description

The corpus-based exercises were created on the assumption that motivation increases as learners become engaged in activities based on their own intentions, concerns and interests. Students used corpora to complete tasks given by the teacher in order to study professional vocabulary. Then they used their findings to improve their productive skills in speaking and writing. Students were required to post their completed assignments on Moodle in a Forum task. This way they were able to share their research results, give each other feedback, and use corpus findings for writing and speaking. Students were also required to update their Moodle Glossary with new terms and collocations found in corpora. Students used information about terms’ synonyms and collocations to paraphrase texts from research articles in their fields and write summaries of the texts. The corpus-based exercises also included using corpora for making oral presentations in groups, in pairs and individually. During this experiment students accessed two corpora: NOW [9] and iWeb [10]. Below are some examples of the exercises used in the ESP course for the experimental group.

2.3.1. iWeb corpus exercises used in the experiment

The exercises demonstrate how an inquiry-based teaching approach was used in our experiment. This method taught students to discover different patterns in authentic language use.

iWeb corpus exercises.
Read the text from the article “Colony Sequencing: Direct Sequencing of Plasmid DNA from Bacterial Colonies” [11]: “In sequencing projects, the preparation of templates, involving either the growth of bacteria and subsequent plasmid purification or the amplification by polymerase chain reaction (PCR) of inserts in vectors (1–3), is one of the most costly steps in terms of reagents and time. 
We have developed a simple method for directly sequencing plasmids from bacterial colonies that requires only heat-induced lysis of bacterial colonies followed by cycle sequencing (5), thus circumventing template preparation” [11].

1. Go to the website https://www.english-corpora.org/iweb/ and create an account. Enter an underlined term from the text into the search window and highlight the option “Word” above it. Then click the button “See detailed info for word”. Please start with the noun “plasmid”.
2. Click on the icon demonstrating the word’s pronunciation. Repeat the word as many times as you need. Repeat this activity for all the underlined terms.
3. Work in pairs and practice pronunciation. Ask your partner to read the text, focusing your attention on his/her correct pronunciation of the terms. Then switch your roles.
4. Click on the icon and see the visual representation of the term’s meaning.
5. Click on the tab “Russian language” in the window next to the icon and choose one of the applications (Google, WordRef, Reverso, or Linguee) to read the Russian translation of the term and its definition both in Russian and English.
6. Scroll down and click on websites/virtual corpora to see the table with websites ranked by the percentage of the term “plasmid” in them. Choose the website that has the word “plasmid” listed as one of the key words for this website. Click on the website, and find the information about your term and other key words that can be found there.
7. Click on “collocates” of the word “plasmid” and see the list of nouns, adjectives and verbs that collocate with this term. Complete Table 1 (below) and post your findings in a forum on Moodle.
8. Click on the tab “Clusters” above the table. The most frequent word clusters with this term are shown at the top of the table and are highlighted in deep blue. Study the clusters for each term in Table 2. Complete Table 2 and post your findings in a forum on Moodle. 
Think of 2 sentences with the clusters you found and complete Table 3. Did you use your cluster as a subject or an object in your sentence? Post your sentences in a forum on Moodle.
9. Click on the cluster number 60 “with the plasmid” and then on the source of the text. You will be taken to the website with the full text. Read this text and discuss it with your partner.
10. Choose another term and study clusters for it. Find full texts with the clusters of your choice. Read the text. Did you learn something new in your field? Retell this text to your partner.

Repeat this exercise for other terms and find more articles in the field of biotechnology. Choose one and get ready to make a short presentation about it (3-5 minutes). Use iWeb for reference to find synonyms, word clusters and collocations for your presentation. Make sure you practice the pronunciation of new terms. Give your presentation in front of the class and teach your peers new terms. Get ready to answer your peers’ questions about your mini-presentation.

Table 1
Collocations from the iWeb corpus

Term | Collocation 1 (verb or noun) | Collocation 2 (adjective)
Example: Plasmid | Carry, Contain, Encode | Recombinant, Circular, Linear
Template | Your example | Your example
PCR | Your example | Your example
Vector | Your example | Your example
Purification | Your example | Your example
Amplification | Your example | Your example
Lysis | Your example | Your example
Reagent | Your example | Your example
Colony | Your example | Your example
Sequencing | Your example | Your example
Circumventing | Your example | Your example
Induced | Your example | Your example

Table 2
Clusters from the iWeb corpus

Term | Cluster #1 | Cluster #2
Example: Plasmid | Plasmid encoding | Plasmid transfection
Template | Your example | Your example
PCR | Your example | Your example
Vector | Your example | Your example
Purification | Your example | Your example
Amplification | Your example | Your example
Lysis | Your example | Your example
Reagent | Your example | Your example
Colony | Your example | Your example
Sequencing | Your example | Your example
Circumventing | Your example | Your example
Induced | Your example | Your example

Read the examples in Table 3 and use the iWeb corpus to find answers to the following questions:

Task 1. Read the sentences:
1. Researchers also gained some insight into how tea plants came to acquire the genes that encode for caffeine.
2. This way mRNA can encode for several different proteins.
3. The best approach is to encode to MP4 files, and then repackage as necessary for the target platforms.
4. You cannot encode to 10 bits with this system.
Can you explain the reason for the different prepositions after the term “encode” in these sentences?

Table 3
Make sentences with clusters from the iWeb corpus

Cluster | Your sentence #1 | Your sentence #2
Example: Plasmid encoding | In this research we focus on characteristics of plasmids encoding these specific genes. | We have developed a new technique allowing the growth of plasmids encoding for specific proteins.
Example: Plasmid transfection | In our experiment we performed plasmid transfection into mammalian cells. | The possibility of viral plasmid transfection was eliminated.
Template | Your example | Your example
PCR | Your example | Your example
Vector | Your example | Your example
Purification | Your example | Your example
Amplification | Your example | Your example
Lysis | Your example | Your example
Reagent | Your example | Your example
Colony | Your example | Your example
Sequencing | Your example | Your example
Circumventing | Your example | Your example
Induced | Your example | Your example

Task 2. Use the iWeb corpus to search for the phrase “encode for proteins”. How many examples did you find? Now search for the phrase “encode proteins”. Are there more or fewer examples of this phrase in the corpus? What conclusion can you make based on these findings?

Task 3. Use the iWeb corpus to study the cluster “plasmid transfection” and find texts containing this cluster. What does “transfection” mean? How is the term “transduction” different from the term “transfection”? List three collocations and three clusters used with the term “transduction”.

2.3.2. NOW corpus exercises used in the experiment

Task 1. Fix mistakes
Work in groups of four to five people. Read the sentence: “Why don't you pay your attention for investing cure?” What is wrong with it? Use the iWeb or NOW corpus to find and fix the mistakes in this sentence.
1. How many mistakes did you find?
2. What queries did you use to fix mistakes?

Task 2. Countable/uncountable nouns
Work in pairs and choose 3 terms to study. Access the NOW corpus to study these terms: acid, alkali, weight, liquid, pressure, light, achievement, evidence, knowledge, reference, persuasiveness, guidance, advice, proposal, implication, distortion.
What corpus queries can help you to find out if a noun is countable? Which of these terms are countable? What articles can be used with uncountable nouns? Can we use no article with a countable noun? 
What quantifiers can be used with uncountable nouns?

Task 3. Singular/plural form
Access the NOW corpus to get the singular form of these nouns: bacteria, data, criteria, analyses, theses, species, syllabi. What queries did you use?

Task 4. In research articles you can often see these two structures: “it is ________ believed”, “it is ________ accepted”. Access the NOW corpus and study the most frequent adverbs used in these structures (list 4-5 adverbs). What corpus queries did you have to make? Use your findings to think of and write down 3 sentences relevant to your field of study.

3. Results

All calculations were conducted using an online t-test calculator [12]. The comparison of the test results in the control and experimental groups before the experiment showed no statistically significant difference.

P value and statistical significance before the experiment: the two-tailed P value equaled 0.0868, so p > 0.05.
Confidence interval: the mean of Group One minus Group Two equals -0.89; the 95% confidence interval of this difference is from -1.93 to 0.14.

Our null hypothesis stated that there was no significant difference between the means of the two groups. Since this difference is considered to be not quite statistically significant, the null hypothesis could not be rejected. This is consistent with approximately the same level of language skills in the two groups before the start of the experiment. The chart in Figure 1 shows the results of preliminary testing in the control and experimental groups. The spread of students' grades is approximately the same in both groups. 
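The pre-test comparison reported above can be sketched in a few lines of Python. This is a minimal illustration only, not the calculator used in the study: it assumes the standard pooled-variance (Student's) form of the unpaired t-test, since the paper does not state whether a correction for unequal variances was applied. The function names and the two short score lists are hypothetical and are not the study's data. The last part of the sketch works only from the figures reported in the text (mean difference -0.89, 95% confidence interval from -1.93 to 0.14, group sizes 13 and 12, hence 23 degrees of freedom) and recovers the implied standard error and t statistic.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Pooled-variance unpaired t-test: returns (mean difference, SE, t, df)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean(a) - mean(b)
    return diff, se, diff / se, df


def confidence_interval(diff, se, t_crit):
    """Confidence interval for the mean difference, given a critical t value."""
    return diff - t_crit * se, diff + t_crit * se


def se_from_interval(lower, upper, t_crit):
    """Recover the standard error from a reported confidence interval."""
    return (upper - lower) / (2 * t_crit)


# Hypothetical scores, for illustration only (NOT the study's data)
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
diff, se, t, df = unpaired_t(group_a, group_b)

# Consistency check using only the figures reported in the text.
# 2.0687 is the two-tailed 5% critical value of t for 23 degrees of freedom.
T_CRIT_DF23 = 2.0687
reported_diff = -0.89
reported_se = se_from_interval(-1.93, 0.14, T_CRIT_DF23)
reported_t = reported_diff / reported_se
print(f"implied SE = {reported_se:.3f}, implied t = {reported_t:.3f}")
```

The implied t statistic has an absolute value of about 1.78, which lies between the 10% and 5% two-tailed critical values for 23 degrees of freedom (about 1.714 and 2.069). This agrees with the reported P value of 0.0868. Computing an exact P value requires the t-distribution, which a library routine such as `scipy.stats.ttest_ind` provides.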
Figure 1: The chart of the preliminary test results (mid-course test) in the control and the experimental groups

The comparison of the test results in the control and experimental groups after the experiment showed a statistically significant difference.

P value and statistical significance after the experiment: the calculated two-tailed P value was less than 0.0001. By conventional criteria, this difference is considered to be extremely statistically significant.
Confidence interval: the mean of Group One minus Group Two equals -2.10; the 95% confidence interval of this difference is from -2.71 to -1.49.

Our null hypothesis stated that there was no significant difference between the means of the two groups. Therefore, the null hypothesis had to be rejected.

The chart in Figure 2 shows that the students in the experimental group significantly increased the level of their language skills, as assessed by the test at the end of the course. Unlike students in the control Group A, students in the experimental Group B significantly improved their language skills, getting much better grades for their EOC tests. While nobody in Group A received a score higher than 95%, around eighty percent of students from the experimental Group B demonstrated excellent results, getting EOC test scores of 96% and higher.

Figure 2: The chart of the end-of-course test results (EOC test) in the control and the experimental groups

We also compared the actual improvement of the test scores in both the control and experimental groups with students’ self-reports on their improvements (Figure 3). Slightly more students in Group B than in Group A claimed that their speaking and writing skills had improved. Analysis of the questionnaire answers also showed that nobody in the experimental Group B agreed with the statement “The course was boring”. 
A higher percentage of students in Group B agreed with the statement that the course was very interesting; however, fewer students in Group B enjoyed their course.

Figure 3: The chart of statements made by students in the control and the experimental groups

4. Discussion

The students in the experimental group were introduced to active learning and had to complete corpus-based exercises in addition to the regular ESP course, which was used in the control group. The study participants in both groups took tests before and after the experiment. All test scores in both the control and the experimental groups were statistically evaluated and compared using unpaired t-tests and the p value. The experimental data showed a statistically significant difference between the end-of-course (EOC) test results of the experimental and control groups. Based on the calculations, it can be concluded that, as a result of working with the corpus-based exercises, the experimental group showed significant improvement in their test scores. Evaluation of students’ oral presentations in both groups showed that students in the experimental group had better speaking skills by the end of the semester. Compared to students in the control group, the students in the experimental group spoke English with more confidence, made fewer grammatical errors, and used more complex grammar structures in their speech. Thus, in general, the results of the experiment confirmed that corpus-based activities have a positive effect on students’ English skills. Students in both groups were asked to complete a questionnaire in which they self-reported the improvement of their language skills by the end of the semester and evaluated their ESP course. 
Students in the experimental Group B showed statistically significant improvement in their EOC test results relative to the control group; however, this did not correlate with their self-reports, which suggested only slight improvement in skills. These findings correspond to the results of the research by Deslauriers et al. at Harvard University, which showed that students are not always capable of fully appreciating the true value of being actively engaged in the lesson [8]. In our study, fewer students enjoyed the course in Group B than in Group A. Students’ answers suggested that they resisted active learning and disliked being actively engaged in learning. Apparently, the cognitive effort associated with active learning can influence some students’ motivation in a negative way, making them perceive this effort as something unpleasant, or as a sign of failure [8]. This may explain why the DDL approach to teaching languages might be met by some students without enthusiasm. These results are also in line with the findings in [8], where students preferred being taught in a passive way, while their actual test results showed that active learning was more beneficial for them. While student surveys are often used for collecting data in language program evaluations, according to this research students are not always able to understand the value of being actively involved in the learning process and may consider it an ineffective way of teaching. Students might prefer passive learning; however, active learning is more beneficial for the development of their language skills. The results of this study suggest that teachers should not feel discouraged from using corpora in their language classroom. Corpus-based exercises promote active learning in the classroom and encourage students to actively participate in the process of forming their language skills, using corpora as a reference. 
Using corpora exposes students to authentic ESP language, teaching them to work autonomously on improving their language skills. Corpora provide easy access to different types of authentic materials and can be used for designing various exercises for an ESP course.

5. Conclusion

In this study students in the experimental group used corpus technology as a learning tool (for vocabulary and grammar) and as a reference resource (for writing and speaking tasks and self-correction of errors). The research results demonstrated that students’ motivation and their self-perceived improvement of skills did not correlate with the scale of the actual effectiveness of using corpora for teaching ESP. Therefore, we believe that students’ perception of the impact of data-driven learning on their language skills does not always mirror the real improvement. The limitation of the experiment is its small sample size. The results, therefore, require confirmation in a more representative sample. In the future, we plan to conduct a similar study with a greater number of students.

6. References

[1] S. Marinow, Training ESP students in corpus use - challenges of using corpus-based exercises with students of non-philological studies, Teaching English with Technology, 13(4), 2013, pp. 49-76.
[2] Y.A. Breyer, Corpora in language teaching and learning: potential, evaluation, challenges, Frankfurt am Main, Lang, 2011.
[3] G. Bennett, English for Specific Purposes, Vol. 57, 2020, pp. 32-33. https://doi.org/10.1016/j.esp.2018.11.003
[4] F. Meunier, Corpus linguistics and second/foreign language learning: exploring multiple paths, Revista Brasileira de Linguística Aplicada, 11, 2010, pp. 459-477. https://doi.org/10.1590/S1984-63982011000200008
[5] C. Gbollie, H. Keamu, Student Academic Performance: The Role of Motivation, Strategies, and Perceived Factors Hindering Liberian Junior and Senior High School Students Learning, Education Research International, 2017, pp. 1-11. https://doi.org/10.1155/2017/1789084
[6] A. Boulton, T. Cobb, Corpus use in language learning: A meta-analysis, Language Learning, 67(2), 2017, pp. 348-393. https://doi.org/10.1111/lang.12224
[7] A. Boulton, Integrating corpus tools and techniques in ESP courses, ASp, 69, 2016, pp. 113-137.
[8] L. Deslauriers, L.S. McCarty, K. Miller, K. Callaghan, G. Kestin, Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom, Proceedings of the National Academy of Sciences, 116(39), 2019, pp. 19251-19257. https://doi.org/10.1073/pnas.1821936116
[9] M. Davies, Corpus of News on the Web (NOW), 2016. https://www.english-corpora.org/now/
[10] M. Davies, The iWeb Corpus, 2018. https://www.english-corpora.org/iWeb/
[11] “Colony Sequencing”: Direct Sequencing of Plasmid DNA from Bacterial Colonies, BioTechniques, 22, March 1997, pp. 412-418.
[12] t Test Calculator. GraphPad. https://www.graphpad.com/quickcalcs/ttest1.cfm