<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Investigating the Use of Lexical Bundles and Keyness in B2 and C1 ESL Learners' Academic Writing</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Liverpool</institution>
          ,
          <addr-line>Liverpool L69 3BX</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>This research investigates whether there is a relationship between the use of three- and four-word lexical bundles and language proficiency. The study conducts both quantitative and qualitative analyses to see whether learners from different CEFR level groups exhibit the same behaviour in the use of lexical bundles. In the first stage, it compares two levels, B2 and C1, in terms of the frequency, structures and functions of lexical bundles to give an overview of some of the linguistic features that differentiate the levels. In the second stage, a longitudinal study investigates the development of ESL learners’ use of lexical bundles across the levels to trace the increase in proficiency level. A major finding is that, generally, ESL learners favoured using more signalling bundles in their writing, and three-word bundles turned out to be the most frequent bundles in the ESL sub-corpora. Moreover, significant progress was found in the variability of the structures and functions of lexical bundles: C1 writers were found to use a variety of structures and functions, as professional writers do, in their academic writing. As for the development of lexical bundles in relation to the CEFR levels, the findings indicate that there is no significant relationship between the increased use of lexical bundles and academic performance. However, multiple regression analysis revealed a direct proportionality between variation in the use of lexical bundles and the CEFR levels, as C1 students wrote more like professional writers and used more varied structures and functions than B2 students.</p>
      </abstract>
      <kwd-group>
        <kwd>Academic writing</kwd>
        <kwd>Lexical bundles</kwd>
        <kwd>ESL Learners</kwd>
        <kwd>Corpus-based</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Lexical bundles are word combinations that can be defined as continuous multiword
sequences that recur frequently to satisfy specified frequency and dispersion
thresholds; for example, occurring at least 20-40 times per million words in five texts, or in
at least 10% of texts [
        <xref ref-type="bibr" rid="ref4 ref8">4, 8</xref>
        ]. Lexical bundles have captured the attention of many
linguists since Biber et al. (1999) first introduced the notion in the Longman Grammar of
Spoken and Written English. Considerable attention has been given to lexical bundles
within the area of corpus linguistics, and interest has increased since it became widely
agreed that lexical bundles are widespread in spoken and written registers, serving as
“building blocks of discourse”, where “frequent use of these bundles is indicative
of fluency in linguistic production” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These bundles have been found to be used by
both native and non-native speakers of a language to fulfill specific discourse
functions within a particular context [
        <xref ref-type="bibr" rid="ref5 ref9">5, 9</xref>
        ].
      </p>
      <p>Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Lexical bundles are important elements by which to measure learners’ language
development: both native and non-native speakers indicate their language proficiency
by using lexical bundles in their academic writing, and the absence of these bundles
signals a novice writer. This idea has been supported by empirical evidence showing
that the competent use of lexical bundles contributes to fluent language production
[
        <xref ref-type="bibr" rid="ref12 ref6">6, 12</xref>
        ]. For example, Biber et al.’s (1999) investigation of lexical bundles in
conversation and academic prose found that bundles constituted approximately 21% of the
written discourse. Cortes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] agrees that using lexical bundles is an indication of a
competent language user, and Ellis et al. (2008) argue that frequent use of lexical bundles
results in native-like language use.
      </p>
      <p>
        Many studies have investigated the use of lexical bundles by non-native
speakers of different levels across a range of registers and academic disciplines.
According to these studies, although non-native speakers’ use of lexical bundles has
increased, it is limited to specific bundles, causing
them to overuse some expressions compared to others and making their writing appear
non-native [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Some studies have argued that expert writers use lexical bundles in
a way that is functionally different from novice authors and, in general, that lexical
bundles are used much more frequently by experts than by novice writers [
        <xref ref-type="bibr" rid="ref1 ref11">1, 11</xref>
        ]. Römer
(2009) states that expertise is more important than nativeness and that the distinction
between novices and experts is more important than the L1 and L2 distinction. Similarly,
Staples et al. (2013) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] investigated idiomaticity through the use of lexical bundles
in written responses across three proficiency levels in the Test of English as a Foreign
Language Internet-Based Test (TOEFL iBT), in a controlled environment. The study
found an increase in the number of lexical bundles used as proficiency level
increased.
      </p>
      <p>To the best of the researcher’s knowledge, while most previous studies have paid
considerable attention to the use of lexical bundles across different registers and a
number of disciplines, little research has investigated whether learners
from different proficiency level groups exhibit the same behaviour in their use
of lexical bundles. This research investigates whether there is a relationship
between the use of three- and four-word lexical bundles and language competence. The
study utilises both quantitative and qualitative analyses to determine whether learners
from different CEFR (Common European Framework of Reference) level groups
exhibit the same behaviour in the use of lexical bundles. Additionally, this study
examines the development of lexical bundles across proficiency levels. Specifically, it
compares two levels, B2 and C1, in terms of the frequency,
structures, and functions of lexical bundles to give an overview of some of the linguistic
features that differentiate the levels. This study addresses the following
questions:</p>
      <p>– What are the most frequently used three- and four-word lexical bundles in the B2 and C1 sub-corpora?</p>
      <p>– What does a keyness analysis reveal about lexical bundles identified in the B2 and C1 sub-corpora?</p>
      <p>– How do lexical bundles in the B2 sub-corpus differ from those in the C1 sub-corpus in terms of structure and function?</p>
      <p>– Is there any growth in the lexical bundles identified in the study between B2 and C1 learners?</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <sec id="sec-2-1">
        <title>Data</title>
        <p>
          This study is first interested in the relationship between the use of lexical bundles and
academic performance; thus, the author compared the B2 and C1 sub-corpora of ESL learners
(for the frequency, structures, and functions of lexical bundles) with each other and then
with a reference corpus. The data came from 130 written essays, with titles
equivalent to those of the IELTS test, contributed by 42 intermediate and advanced
ESL learners from different mother-tongue backgrounds who have studied in the UK.
These learners write academic essays to test their progress and to be placed
at new levels if they meet the requirements of the English Language Centre
(ELC). Only argumentative or expository pieces written by L2 learners were chosen
for the sub-corpora. The decision to use learner sub-corpora was based on the
assumption that they are useful for exploring and identifying the similarities and differences in
the use of recurrent word combinations across L2 proficiencies in “actual language in
use” [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>The second stage of the study consisted of second language development research,
which compares learners’ language across proficiency levels (CEFR levels). A
longitudinal study investigated the development, over three months, of two ESL learners’ use
of lexical bundles in their academic essays across the levels to trace the increases in
proficiency level. The participants were two ESL students (one male and one female)
at the upper-intermediate level who moved to the advanced level after two months and
contributed 36 essays to the investigation.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Determination of CEFR levels</title>
        <p>
          The procedure for determining the CEFR level originates from the manual for
Relating Language Examinations to the CEFR for Languages [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Using the manual helps
to choose the appropriate samples – for standardisation purposes – from the collected
essays, which are considered representative of the B2 and C1 levels [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Three
experienced examiners working at the British Council and teaching IELTS preparation were
trained to rate the essays using a Writing Assessment Scale developed by the CEFR.
Each essay was marked by two raters independently; any essay given different scores
was re-rated by a third rater and therefore received three ratings rather than two.
If an essay received three different ratings, it was excluded. Where the
raters agreed, the inter-rater reliability for the two raters was calculated as
the percentage of agreement among the raters, following [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], a statistic also used by Chen
and Baker [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] to measure inter-rater reliability. After
the rating step, the ESL learners’ corpus comprised 15,488 words
in the B2 sub-corpus and 12,752 words in the C1 sub-corpus, as described in Table 1.
For the longitudinal study, 35 essays were re-rated for use in the investigation: 15
essays were incorporated into the B2 sub-corpus, totalling 5,007 words, whereas
the C1 sub-corpus consisted of 20 essays totalling 10,597 words.
        </p>
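        <p>The percentage-of-agreement statistic described above can be illustrated with a minimal Python sketch. The function name percent_agreement and the ratings shown are hypothetical and are not data from this study; the sketch only assumes that each rater assigns one CEFR level to each essay.</p>
        <preformat>
```python
def percent_agreement(ratings_a, ratings_b):
    """Share of essays (in %) to which two raters assigned the same level."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must rate the same essays")
    if not ratings_a:
        raise ValueError("no ratings given")
    # Count the essays on which the two independent ratings coincide.
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return 100.0 * matches / len(ratings_a)


# Hypothetical ratings for five essays (illustration only).
rater_1 = ["B2", "B2", "C1", "C1", "B2"]
rater_2 = ["B2", "C1", "C1", "C1", "B2"]
agreement = percent_agreement(rater_1, rater_2)
print(agreement)
```
        </preformat>
        <p>In this illustration the essay on which the two raters disagree is the one that would be passed to the third rater.</p>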
      </sec>
      <sec id="sec-2-3">
        <title>Reference corpus</title>
        <p>
          The reference corpus used in this study was taken from the British Academic Written
English (BAWE) corpus, which contains 2,761 proficient, assessed academic
texts written at universities in the UK (6,506,995 words), ranging in length from
around 500 words to approximately 5,000 words. However, since the target
sub-corpora consisted of argumentative essays (equivalent to IELTS Task 2) written by ESL
learners, it was decided to use only the linguistics and English disciplines of BAWE as the
reference corpus to avoid skewing the sample heavily toward one discipline. These two
disciplines are large enough to serve as a reference corpus and include language
relevant to what ESL learners use in their academic essays; using other disciplines such
as Philosophy or Biochemistry might affect the results. Therefore, the linguistics and
English disciplines are suited to the goal of this study, as they provide a wide range of
language representative of ESL students’ writing in an authentic academic context.
As stated by Leech [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], “A Reference Corpus is designed to provide comprehensive
information about the language which has to be a general Corpus of wide coverage of
the language”. To ensure comparability, only 65 short texts from the BAWE corpus
(linguistics and English disciplines) were selected for the investigation. This was a
sufficient number for a reference corpus, comprising
163,091 words, which is more than five times the combined size of the target sub-corpora (B2
and C1), with 15,488 and 12,752 words, respectively.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Analysis</title>
        <p>
          The analysis used to answer the above research questions was carried out using
WordSmith software [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Owing to the small size of the sub-corpora in this study,
a low frequency cut-off point of four times per 100,000 words (40 times per million words)
was selected to include highly used lexical bundles in the analysis and eliminate
low-frequency sequences. In addition to the frequency cut-off, dispersion criteria were
applied, whereby a bundle had to be found in at least three to five texts [
          <xref ref-type="bibr" rid="ref11 ref4 ref8">4, 8, 11</xref>
          ]
or in at least 10% of the texts [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] to avoid focusing on idiosyncratic uses by the
individual authors of the texts.
        </p>
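        <p>The frequency and dispersion criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure WordSmith applies internally: the function name extract_bundles, the whitespace tokenisation and the toy texts are all hypothetical.</p>
        <preformat>
```python
from collections import Counter, defaultdict


def extract_bundles(texts, n, min_per_million, min_texts):
    """Return {bundle: (raw_count, n_texts)} for n-word sequences that meet
    both the normalised frequency cut-off and the dispersion criterion."""
    counts = Counter()
    text_ids = defaultdict(set)
    total_words = 0
    for text_id, text in enumerate(texts):
        tokens = text.lower().split()  # assumption: naive whitespace tokens
        total_words += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            counts[gram] += 1
            text_ids[gram].add(text_id)
    bundles = {}
    for gram, raw in counts.items():
        # Normalise the raw count to a rate per million words.
        per_million = raw * 1_000_000 / total_words
        if per_million >= min_per_million and len(text_ids[gram]) >= min_texts:
            bundles[gram] = (raw, len(text_ids[gram]))
    return bundles


# Toy corpus of four "essays" (illustration only).
toy_texts = [
    "on the other hand it works",
    "on the other hand it fails",
    "on the other side it works",
    "it works on the other hand",
]
four_word = extract_bundles(toy_texts, n=4, min_per_million=40, min_texts=3)
print(four_word)
```
        </preformat>
        <p>On a sub-corpus of 15,488 words, a cut-off of 40 per million corresponds to roughly 0.62 raw occurrences, so in a sketch of this kind the dispersion criterion does most of the filtering.</p>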
        <p>
          After retrieving the corpus and applying the frequency and distribution criteria,
WordSmith provided lists of three- and four-word lexical bundles for both
sub-corpora. Hyponyms were checked and cleared from all the bundles found. In order to
narrow down the included lexical bundles, all content-based bundles, such as ‘the United
Kingdom’ or ‘the University of Liverpool’, were discarded,
as they do not reflect the use of general academic language. In addition, overlapping bundles were
combined into one bundle to avoid duplication in the counting of high-frequency bundles.
For example, the bundles ‘can be used to’ and ‘it can be used to’ were counted as one
bundle, with the additional word placed in brackets: ‘(it) can be used to’ [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Findings and discussion</title>
      <sec id="sec-3-1">
        <title>Frequency of lexical bundles</title>
        <p>
          The results revealed that the B2 sub-corpus contained 102 types of three- and
four-word lexical bundles, which occurred 458 times, making up 9.2% of the total number
of words in the sub-corpus. The C1 essays contained 45 types of three- and four-word
lexical bundles, which occurred 204 times in the sub-corpus and made up 5% of the
total words in the sub-corpus. What stands out is that the lower-level students used a
larger stock of lexical bundles than the higher-level students, as presented in Table 2.
In addition, three-word bundles were the most common bundles at
both levels. Therefore, it can be concluded that ESL learners tend to
employ more three-word than four-word bundles, and that this tendency is stronger among
lower-level students. A possible explanation might be the complexity of producing longer
sequences, which language learners avoid in their writing because longer sequences require
more effort and time to produce than shorter ones. The result was not
surprising; Biber et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] state that three-word lexical bundles are extremely
common because they are “a kind of extended collocational association”, while longer
bundles are “more phrasal in nature and correspondingly less common”. Another
finding to note is that the bundle ‘on the other hand’ was the most frequently occurring
bundle in the B2 and C1 sub-corpora. This bundle is common and important in
academic discourse; most ESL learners are familiar with it and know how to use it both
structurally and functionally.
        </p>
        <p>
          Surprisingly, few of the most frequent bundles in the BAWE corpus were found in the
ESL learners’ corpora: only eight of the 50 most frequent lexical bundles in the B2
and C1 sub-corpora were identified in the BAWE corpus. Accordingly, although
the B2-level students used more lexical bundles than the C1 students, certain bundles
were new and were used by only a few learners, who repeated the same bundle more than
once in their essays. For example, the bundle ‘on the other’ was identified 19 times in
the B2 sub-corpus (although one student used it three times in one text). A possible
explanation for this might be that ESL learners tend to use certain lexical bundles
more frequently to reflect a high level of formality and demonstrate their language
competence; alternatively, they may still be in the process of learning additional
lexical bundles. This result conflicts with those presented by Chen and Baker [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], who
found many shared lexical bundles across both native and non-native academic
writing.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Keyness analysis</title>
        <p>To determine the ‘key’ bundles in B2 and C1, WordSmith software was used to
generate a list of ‘key’ bundles that occur unusually frequently in the target sub-corpora
when compared with a reference corpus (i.e. BAWE) by means of statistical tests (e.g.
chi-square or log-likelihood). A ‘keyness’ value is given for each bundle that is
statistically significant; the higher the keyness score, the more strongly the bundle is
key. WordSmith provides a list of lexical bundles that
are positively and negatively key. However, as the main focus here is only on positive
keyness, the WordSmith tool was set to ignore all negative results, as shown
in Tables 3 and 4.</p>
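        <p>As an illustration of how a keyness value of this kind can be obtained, the sketch below assumes the two-term log-likelihood formula that is widely used in corpus linguistics; the computation inside WordSmith may differ in detail. The function name log_likelihood and the reference-corpus frequency of 30 are hypothetical, whereas the count of 19 and the corpus sizes are the figures reported in this paper.</p>
        <preformat>
```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Signed two-term log-likelihood keyness of one bundle.

    Positive values mean the bundle is relatively more frequent in the
    target corpus; negative values mean it is relatively more frequent
    in the reference corpus.
    """
    combined = freq_target + freq_ref
    total = size_target + size_ref
    # Expected frequencies if the bundle were equally common in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    score *= 2
    positively_key = freq_target / size_target >= freq_ref / size_ref
    return score if positively_key else -score


# 19 occurrences in the B2 sub-corpus (15,488 words) against a
# hypothetical 30 occurrences in the reference corpus (163,091 words).
ll = log_likelihood(19, 15_488, 30, 163_091)
print(round(ll, 2))
```
        </preformat>
        <p>With one degree of freedom, a value above 3.84 corresponds to p below 0.05, which is a common threshold for treating a bundle as key.</p>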
        <p>
          The results provided some evidence for the common assertion in the previous
studies that ESL learners favour particular bundles and overuse them in their writing [
          <xref ref-type="bibr" rid="ref12 ref15 ref19">12,
15, 19</xref>
          ].
The keyness analysis of the sub-corpora revealed that L2 learners overuse some
signalling words in their writing. In general, therefore, it seems that B2 students
are more likely to rely on lexical bundles than C1 students:
nine significant key bundles were identified at the B2 level,
whereas only two key bundles were found in the C1 sub-corpus. This result might be
affected by the corpus size in this study, as the C1 sub-corpus consisted of only
12,752 words.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Structures and functions in B2 and C1 sub-corpora</title>
        <p>
          Structurally, Biber et al.’s [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] structural taxonomy was adopted, as it has been
used in various research studies in this area [
          <xref ref-type="bibr" rid="ref10 ref12 ref3">3, 10, 12</xref>
          ]. However, it was modified
and developed for this study, using Biber et al.’s (2004) classification to place the
identified bundles that did not fall under Biber et al.’s [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] structural taxonomy, as shown
in Table 5.
B2 and C1 writers showed variation in their use of lexical bundles according
to the structural classification, and there were also differences in the use of lexical bundles
between the ESL sub-corpora and the reference corpus.
        </p>
        <p>
          The results showed that ESL learners used more phrasal bundles than clausal
bundles in their writing. More specifically, verb-based bundles were the most frequent
three- and four-word bundles found in the B2 and C1 sub-corpora. Of the two
CEFR levels, C1 had the higher proportion of verb-based bundles, at
53.4%, while B2 had a lower percentage, 40.5%. These results conflict with
the idea of the rarity of verb-based bundles in academic discourse [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The results of
the present study suggest that the language of ESL writing contains more
conversational bundles. By contrast, the reference corpus clearly represents the formal writing
genre, as it contains more noun-based bundles, which are a sign of academic writing.
        </p>
        <p>It can be concluded that the three groups employed different percentages of most
of the structural sub-categories, except the ‘preposition-based’ category. The
chi-square test revealed a significant difference among
the corpora. The standardised residuals in the chi-square contingency table for the
distribution of structural types revealed that the greatest differences occurred in the
‘verb-based’, ‘noun-based’ and ‘other’ categories. For instance, the test shows that C1
writers overused verb-based bundles compared to B2 writers, which supports the idea that
C1 students rely more on spoken language in their writing. With regard to noun-based bundles, it
appeared that B2 students underused these bundles in their writing. On the other hand,
B2 writers overused ‘other’ bundles not related to any sub-category (e.g., adverbial
or modal bundles).</p>
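        <p>The role of the standardised residuals can be made concrete with a short sketch. It assumes the usual definition, (observed minus expected) divided by the square root of the expected count, and that every row and column total is non-zero; the function name chi_square_residuals and all counts in the table are hypothetical and are not the study’s data.</p>
        <preformat>
```python
import math


def chi_square_residuals(table):
    """Return (chi2, residuals) for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of corpus and category.
            expected = row_totals[i] * col_totals[j] / grand_total
            r = (observed - expected) / math.sqrt(expected)
            residual_row.append(r)
            chi2 += r * r  # each cell contributes its squared residual
        residuals.append(residual_row)
    return chi2, residuals


# Hypothetical bundle counts. Rows: B2, C1, reference corpus.
# Columns: verb-based, noun-based, preposition-based, other.
counts = [
    [40, 20, 15, 25],
    [53, 25, 15, 7],
    [30, 45, 15, 10],
]
chi2, residuals = chi_square_residuals(counts)
for label, row in zip(["B2", "C1", "Reference"], residuals):
    print(label, [round(r, 2) for r in row])
```
        </preformat>
        <p>Cells whose residual is larger than about 2 in absolute value are commonly read as the main contributors to a significant chi-square result; a positive residual indicates overuse and a negative one underuse relative to the expected count.</p>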
        <p>As the standardised residuals in the chi-square test did not show any significant
difference in the use of ‘preposition-based’ bundles, the result reflects the similar
proportion of preposition-based bundles at both levels and in BAWE, at 15% of total
bundles. The ‘PP expressions’ subcategory is typically used to show the logical
relationship between propositional elements, which means that ESL learners could use
this type of lexical bundle to link the ideas in their argumentation. The
difference in the frequency of use of different structural categories across the levels
suggests that, as their level increases, students are able to recognise and use the
adverbial meaning of the bundles.</p>
        <p>
          Functionally, Hyland’s taxonomy was adopted, since the data used in this study
were mainly academic prose (see Table 6) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          In order to classify bundles into the correct sub-categories, it was
important to look at the concordance lines to see the bundles in context and to tackle
the issue of the multi-functionality of the target bundles. There was similarity in the use
of functional categories between the levels. The most frequent functions of the
identified bundles across the levels were research-oriented, followed by participant-oriented
and then text-oriented. The high use of research-oriented bundles in the B2 and
C1 sub-corpora might be due to the fact that, in argumentative essays, students need to
describe various aspects and provide different justifications of their ideas to the
reader. Bundles with this function accounted for more than 40% of all bundles identified in
the corpora. This result is similar to those of previous studies, which have found that academic
writing is dominated by research-oriented bundles over other categories [
          <xref ref-type="bibr" rid="ref14 ref6 ref7">6, 7, 14</xref>
          ].
A consequence of the high proportion of research-oriented bundles might be a focus
on describing the problems in the argumentative essay rather than its presentation.
        </p>
        <p>[Table 6. Functional categories of lexical bundles. Research-oriented: location, procedure, quantification, description, topic. Text-oriented: transition signals, regulative signals, structuring signals, framing signals. Participant-oriented: stance features, engagement features.]</p>
        <p>In the comparison between the levels, B2 writers used
research-oriented bundles more often than C1 writers. By contrast, C1 writers employed more
text-oriented and participant-oriented bundles than B2 writers: the percentage of
text-oriented and participant-oriented bundles rose as the level increased.
In addition, chi-square tests with unstandardized residuals were used in the analysis of
structural and functional types to further support
the arguments in this study. However, the study failed to demonstrate any
statistically significant difference in functional distributions between the levels.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Longitudinal study</title>
        <p>In the second stage, which examined the development of lexical bundles across the levels, the
results were similar to those of the first stage: three-word bundles were the
most frequent bundles in the ESL sub-corpora. The results also provide some
evidence that there may be development in the use of lexical bundles across
the levels, but not to a statistically significant degree. This might be due to the small number
of essays that made up the sub-corpus and the short period over which the
learners were tracked.</p>
        <p>There was much variability in the structures and functions of
lexical bundles across the levels. Higher-level ESL learners used a greater variety
of structures and functions in their writing than lower-level learners. The results showed
distinctive differences in the use of ‘noun-based’,
‘preposition-based’ and ‘verb-based’ bundles between the two levels and the reference
corpus. It should be noted that, of the four categories, the percentages of three
structural categories at the C1 level seemed closer to those in the reference corpus than did
those at the B2 level. The B2 students used six of the 12 subcategories, while
the C1 students and the reference corpus writers used 10 of the 12. The chi-square test
revealed significant differences between the levels and the reference corpus, and the
standardised residuals (R), which compare observed and expected counts in each
cell, showed that the C1 sub-corpus and the
reference corpus used significantly more ‘verb-based’ and ‘noun-based’ bundles and
fewer ‘preposition-based’ bundles than B2. This reflected the frequent
use of bundles such as ‘I want to’, ‘a lot of’, ‘the fact that (the)’ and ‘the development of’.
Only the ‘other’ category did not show any significant difference between the levels.</p>
        <p>By contrast, the overuse of preposition-based bundles in the B2 sub-corpus
reflected the frequent use of bundles such as ‘in order to’ and ‘as well as’. Functionally, while
the density of text-oriented bundles appeared almost identical in the B2 sub-corpus,
the use of research-oriented and participant-oriented bundles in the C1 sub-corpus
seemed to be more aligned with the reference corpus.</p>
        <p>Further analysis of the functional sub-categories revealed the same pattern as for
the structural sub-categories: the C1 level seemed closer to the reference corpus than
the B2 level. The chi-square test revealed a significant difference among
the three groups. The standardised residuals (R), which compare observed and
expected counts in each cell, showed that the greatest differences between the groups
occurred in the ‘text-oriented’ and ‘participant-oriented’ categories, as the C1 and
reference corpora used significantly more participant-oriented bundles but fewer
text-oriented bundles than the B2 level. This might be due to the wide range of topics that
the argumentative and expository essays covered.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Limitations</title>
      <sec id="sec-4-1">
        <title>Summary of findings</title>
        <p>A major finding from the analysis was that, generally, ESL learners favoured using
more signalling bundles in their writing, and three-word bundles were found to be the most
frequent bundles in the ESL sub-corpora. Moreover, significant progress was identified in
terms of the variability of the structures and functions of lexical bundles: C1 writers
were found to use a variety of structures and functions, as professional writers do, in
their academic writing. In terms of the development of lexical bundles in relation to
CEFR level, the findings indicated that there was no significant relationship
between the increased use of lexical bundles and academic performance. However,
multiple regression analysis revealed a direct proportionality between
variation in the use of lexical bundles and CEFR level, as higher-level students (C1)
wrote more like professional writers and used more varied structures and functions than
lower-level students (B2).</p>
        <p>The results of this study show that there are specific lexical bundles that may be
considered the building blocks of ESL learners’ academic essays.
These results might be of interest to English language teachers and instructors
because they provide insights into the ESL learner community’s preferences in academic
writing.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Limitations</title>
        <p>
          Like many other studies, the present investigation has its limitations, one of which is
the small size of the corpora. However, a small corpus can produce more lexical bundles
than a large corpus [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. To avoid biased results, the frequency cut-off point
was set at 40 occurrences per million words to include highly used
lexical bundles in the analysis and eliminate low-frequency sequences. In addition to the
frequency cut-off, a dispersion criterion was applied: a bundle had to occur in at least three texts.
        </p>
        <p>Acknowledgement. We are grateful to the three anonymous reviewers who provided
insightful comments on earlier versions of this article.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ädel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Römer</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Research on advanced student writing across disciplines and levels: Introducing the Michigan Corpus of Upper-level Student Papers</article-title>
          .
          <source>International Journal of Corpus Linguistics</source>
          ,
          <volume>17</volume>
          ,
          <fpage>3</fpage>
          -
          <lpage>34</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Adolphs</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Introducing electronic text analysis: A practical guide for language and literary studies</article-title>
          ,
          Routledge
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bal</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Analysis of Four-word Lexical Bundles in Published Research Articles Written by Turkish Scholars</article-title>
          . Georgia State University (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Biber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Lexical bundles in university spoken and written registers</article-title>
          .
          <source>English for Specific Purposes</source>
          ,
          <volume>26</volume>
          ,
          <fpage>263</fpage>
          -
          <lpage>286</lpage>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Biber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conrad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>If you look at ...: Lexical bundles in university teaching and textbooks</article-title>
          .
          <source>Applied Linguistics</source>
          ,
          <volume>25</volume>
          (
          <issue>3</issue>
          ),
          <fpage>371</fpage>
          -
          <lpage>405</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Biber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johansson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leech</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conrad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finegan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Quirk</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Longman grammar of spoken and written English</article-title>
          , Longman, Harlow (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.-H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Lexical bundles in L1 and L2 academic writing</article-title>
          .
          <source>Language Learning &amp; Technology</source>
          ,
          <volume>14</volume>
          ,
          <fpage>30</fpage>
          -
          <lpage>49</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.-H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Lexical bundles in L1 and L2 academic writing</article-title>
          .
          <source>Language Learning &amp; Technology</source>
          ,
          <volume>14</volume>
          ,
          <fpage>30</fpage>
          -
          <lpage>49</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.-H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Investigating criterial discourse features across second language development: Lexical bundles in rated learner essays, CEFR B1, B2 and C1</article-title>
          .
          <source>Applied Linguistics</source>
          ,
          <volume>37</volume>
          ,
          <fpage>849</fpage>
          -
          <lpage>880</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Lexical bundles in freshman composition</article-title>
          , Amsterdam, John Benjamins Publishing Company (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Lexical bundles in published and student disciplinary writing: Examples from history and biology</article-title>
          .
          <source>English for Specific Purposes</source>
          ,
          <volume>23</volume>
          ,
          <fpage>397</fpage>
          -
          <lpage>423</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hyland</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Academic clusters: text patterning in published and postgraduate writing</article-title>
          .
          <source>International Journal of Applied Linguistics</source>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ),
          <fpage>41</fpage>
          -
          <lpage>62</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hyland</surname>
            ,
            <given-names>K. J.</given-names>
          </string-name>
          :
          <article-title>Bundles in academic discourse</article-title>
          .
          <source>Annual Review of Applied Linguistics</source>
          ,
          <volume>32</volume>
          ,
          <fpage>150</fpage>
          -
          <lpage>169</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jalali</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Zarei</surname>
            ,
            <given-names>G. R.</given-names>
          </string-name>
          :
          <article-title>Academic writing revisited: A phraseological analysis of applied linguistics high-stake genres from the perspective of lexical bundles</article-title>
          .
          <source>Journal of Teaching Language Skills</source>
          ,
          <volume>34</volume>
          ,
          <fpage>87</fpage>
          -
          <lpage>114</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D. Y.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S. X.</given-names>
          </string-name>
          :
          <article-title>Making a bigger deal of the smaller words: Function words and other key items in research writing by Chinese learners</article-title>
          .
          <source>Journal of Second Language Writing</source>
          ,
          <volume>18</volume>
          ,
          <fpage>281</fpage>
          -
          <lpage>296</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Leech</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The importance of reference corpora</article-title>
          .
          <source>Hizkuntza-corpusak. Oraina eta geroa</source>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Schmitt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>The acquisition of lexical phrases in academic writing: A longitudinal case study</article-title>
          .
          <source>Journal of Second Language Writing</source>
          ,
          <volume>18</volume>
          ,
          <fpage>85</fpage>
          -
          <lpage>102</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>McHugh</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          :
          <article-title>Interrater reliability: the kappa statistic</article-title>
          .
          <source>Biochemia Medica</source>
          ,
          <volume>22</volume>
          ,
          <fpage>276</fpage>
          -
          <lpage>282</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Römer</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>English in academia: Does nativeness matter?</article-title>
          <source>Anglistik: International Journal of English Studies</source>
          ,
          <volume>20</volume>
          (
          <issue>2</issue>
          ),
          <fpage>89</fpage>
          -
          <lpage>100</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Scott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>WordSmith Tools (Computer Software. Version 6.0)</source>
          .
          Liverpool: Lexical Analysis Software
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Staples</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Egbert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>McClair</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Formulaic sequences and EAP writing development: Lexical bundles in the TOEFL iBT writing section</article-title>
          .
          <source>Journal of English for Academic Purposes</source>
          ,
          <volume>12</volume>
          (
          <issue>3</issue>
          ),
          <fpage>214</fpage>
          -
          <lpage>225</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Verhelst</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Avermaet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Figueras</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>North</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Common European Framework of Reference for Languages: learning, teaching, assessment</article-title>
          , Cambridge University Press (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>