<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.18653/v1</article-id>
      <title-group>
        <article-title>LangLearn: Language Development Assessment Model based on Sequential Information Attention Mechanism</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hongyan Wu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nankai Lin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shengyi Jiang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lixian Xiao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Language Development Assessment, Sequential Information Attention Mechanism, BERT,</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Asian Languages and Cultures, Guangdong University of Foreign Studies</institution>
          ,
          <addr-line>Guangzhou, Guangdong</addr-line>
          ,
          <country country="CN">PR China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science and Technology, Guangdong University of Technology</institution>
          ,
          <addr-line>Guangzhou, Guangdong</addr-line>
          ,
          <country country="CN">PR China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Information Science and Technology, Guangdong University of Foreign Studies</institution>
          ,
          <addr-line>Guangzhou, Guangdong</addr-line>
          ,
          <country country="CN">PR China</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>cessing and Speech Tools for Italian</institution>
          ,
          <addr-line>Sep 7 - 8, Parma, IT</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>49</fpage>
      <lpage>58</lpage>
      <abstract>
        <p>In recent years, investigations into language acquisition have greatly benefited from the utilization of natural language processing technologies, particularly in analyzing extensive corpora of authentic texts produced by learners across the realms of first and second language acquisition. A crucial task in this domain is the assessment of language learners' language ability development. The “Language Learning Development” task featured in EVALITA 2023 [1] marks a significant milestone as the inaugural shared task focused on automated language development assessment, which entails predicting the relative order of two essays written by the same student. We introduce a novel attention mechanism, namely the sequential information attention mechanism, with the primary objective of exploiting the information interaction between sequential texts. Experimental results on the COWS dataset show the effectiveness of our proposed sequential information attention mechanism, showcasing its substantial impact on model performance during the final evaluation phase.</p>
      </abstract>
      <kwd-group>
        <kwd>Language Development Assessment</kwd>
        <kwd>Sequential Information Attention Mechanism</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Mechanism</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Recently, there has been a surge of interest in harnessing the potential of natural language processing (NLP) tools and machine learning techniques to explore the realm of language development, both in first (L1) and second language (L2) acquisition scenarios. The primary focus lies in investigating the linguistic attributes of learners and the dynamic evolution of their language ability across different modalities and stages of acquisition. The utilization of learner corpora and the enhanced dependability of linguistic features extracted through computational tools and machine learning techniques have significantly advanced our comprehension of the linguistic properties exhibited by language learners. The empirical evidence has shed light on the temporal dynamics and the evolution of these language properties as learners progress in language ability [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <p>A significant focus of scholarly inquiry has been directed towards the exploration of various avenues for advancing the field of language development research, such as tracking the progression of language acquisition.</p>
      <p>The “Language Learning Development” task featured in EVALITA 2023 is concerned with predicting the chronological sequence of essays produced by the same student over different periods. We introduce a novel attention mechanism, namely the sequential information attention mechanism (SIAM), intending to exploit the information interaction between sequential texts. We submitted three results in total, namely the fine-tuned BERT model (Run 2), the fine-tuned BERT model with SIAM (Run 3), and the fusion of the results of the previous two models (Run 1). Experimental results demonstrate that our proposed sequential information attention mechanism has a remarkable impact on model performance during the final evaluation phase.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related</title>
    </sec>
    <sec id="sec-4">
      <title>Work</title>
      <p>The existing research on the language development assessment task is mainly divided into two types: one focuses on the construction of language ability development assessment models based on linguistic features, while the other is concerned with the construction of language ability development assessment models based on neural networks.</p>
      <p>Given the inherent challenge of establishing a unique indicator of linguistic complexity within the domain of second language (L2) development, a diverse range of features spanning various linguistic levels have been employed as inputs for supervised classification systems. These systems are trained on genuine learner data pertaining to different L2 languages. Notable examples include the works of Hancke and Meurers [<xref ref-type="bibr" rid="ref6">6</xref>] as well as Vajjala and Lõo [<xref ref-type="bibr" rid="ref7">7</xref>], which respectively investigated L2 German and L2 Estonian. Pilán and Volodina [8] provided a comprehensive analysis of predictive features extracted from both receptive and productive texts within the context of Swedish L2 acquisition. Miaschi et al. [9] used various linguistic features automatically extracted from students’ written expressions to track the evolution of written language abilities of second-language Spanish learners. Furthermore, Miaschi et al. [<xref ref-type="bibr" rid="ref5">5</xref>] proposed a natural language processing-based stylometric measure to track the evolution of Italian L1 learners’ written language competence, which relied on capturing a range of linguistically motivated features in terms of text style. In a study conducted by Bulté and Housen [10], the objective was to determine the nature and extent of English L2 writing proficiency development among 45 adult ESL learners throughout the duration of an intensive short-term academic English language program. The investigation employed quantitative measures that specifically targeted various aspects of lexical and syntactic complexity exhibited in the learners’ writing performance. Additionally, the study aimed to establish a comparison between the scores obtained from these measures and the subjective ratings provided for the overall writing quality of the learners.</p>
      <p>Recent work on the application of neural networks to language modeling has shown that models based on certain neural architectures can capture syntactic information from utterances and sentences even without explicit syntactic goals. Sagae [11] conducted a study to determine whether a fully data-driven model of language development, utilizing a recurrent neural network encoder to encode utterances, could track changes in children’s language over the course of their language development in a manner comparable to expertly established language assessment metrics that leverage language-specific information.</p>
      <p>The untapped potential of neural networks in language development assessment tasks necessitates further exploration, as the application of pre-trained models in this context has not been investigated.</p>
    </sec>
    <sec id="sec-5">
      <title>3. Method</title>
      <sec id="sec-5-1">
        <title>3.1. Overview</title>
        <p>This section provides an overview of our methodology. Initially, we concatenate the historical text and the current text, and utilize the pre-trained model BERT to encode them. Subsequently, we employ the sequential information attention mechanism to capture the interaction of information within the sequence of texts, thereby updating the representation of the historical text to obtain an improved global representation. Ultimately, we combine the enhanced global representation of the historical text with the original sentence representation for the final assessment of language development.</p>
      </sec>
      <sec id="sec-5-2">
        <title>3.2. Text Representation</title>
        <p>Aiming to effectively capture the intricate semantic information embedded within the text, we employ the non-autoregressive pre-trained model BERT [12], renowned for its remarkable performance in generating text-based semantic representations for sentence encoding. BERT possesses abundant linguistic, syntactic, and lexical knowledge, which is acquired through unsupervised training on a substantial corpus during the pre-training phase. The fundamental architecture of the model encompasses a multi-layer bidirectional Transformer encoder [13], facilitating global information processing and extraction. Given a historical text $A = \{a_1, a_2, a_3, \dots, a_n\}$ and a current text $B = \{b_1, b_2, b_3, \dots, b_m\}$, the two special tokens [CLS] and [SEP] of BERT are utilized to stitch them together, forming the text input $X = \{[\mathrm{CLS}], a_1, a_2, a_3, \dots, a_n, [\mathrm{SEP}], b_1, b_2, b_3, \dots, b_m, [\mathrm{SEP}]\}$, where $n$ and $m$ denote the lengths of the two texts respectively. Encoding $X$ with BERT yields the semantic representation $H = \{h_{[\mathrm{CLS}]}, h_1, h_2, h_3, \dots, h_n, h_{[\mathrm{SEP}]}, h'_1, h'_2, h'_3, \dots, h'_m, h'_{[\mathrm{SEP}]}\} \in \mathbb{R}^{(n+m+3) \cdot d}$, where $d$ denotes the dimension of the semantic representation. The semantic representations of text $A$ and text $B$ are then, respectively:</p>
        <p>$H_A = \{h_1, h_2, h_3, \dots, h_n\}$</p>
        <p>$H_B = \{h'_1, h'_2, h'_3, \dots, h'_m\}$</p>
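        <p>To make the encoding step concrete, the following is a minimal sketch of how the text pair can be stitched together and encoded with the Hugging Face transformers library; the checkpoint matches the one named in Section 4.1, while the example texts and variable names are our own illustration.</p>
        <preformat>
# A minimal sketch of the text-pair encoding of Section 3.2, assuming the
# dbmdz Italian BERT checkpoint named in Section 4.1.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-uncased")
encoder = AutoModel.from_pretrained("dbmdz/bert-base-italian-uncased")

historical_text = "Primo tema dello studente ..."    # text A (illustrative)
current_text = "Tema successivo dello studente ..."  # text B (illustrative)

# Passing the two texts as a pair builds [CLS] A [SEP] B [SEP] automatically.
inputs = tokenizer(historical_text, current_text,
                   truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # H, shape (1, seq_len, d)

# token_type_ids mark segment A with 0 and segment B with 1, so H_A and H_B
# can be recovered by masking (special tokens kept here for brevity).
segment = inputs["token_type_ids"][0]
h_a = hidden[0][segment.eq(0)]  # token representations over segment A
h_b = hidden[0][segment.eq(1)]  # token representations over segment B
        </preformat>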
      </sec>
      <sec id="sec-5-3">
        <title>3.4. Language Development Assessment</title>
        <p>ℎ
[]
We concatenate the enhanced global representation ℎ of
to obtain a text representation for classification:
ℎ = (ℎ
[]
, ℎ )
 =  ( )
..., ℎ , ℎ[]</p>
        <p>} ∈  (+)⋅
of semantic representation.
  is respectively:
  = {ℎ1, ℎ2, ℎ3, ..., ℎ }</p>
        <p>= {ℎ1, ℎ2, ℎ3, ..., ℎ }</p>
      </sec>
      <sec id="sec-5-4">
        <title>3.3. Sequential Information Attention</title>
      </sec>
      <sec id="sec-5-5">
        <title>Mechanism</title>
        <p>We present an innovative attention mechanism known as
the sequential information attention mechanism (SIAM),
specifically designed to exploit information interaction</p>
        <p>Then the semantic representation of text   and text
(4)
(5)
(6)
it as ( 2, 3,1). Then we construct sample 3 based on
associated with the token “[CLS]” within the given sen- the above two samples. In terms of sample 3, essay 1
tence. The representation ℎ is utilized as the sentence’s
overall feature representation, which is subsequently fed
into a linear classifier with a softmax function. The
preappears before essay 3, which is defined as (  1, 3,1). In
addition, we expand the negative samples based on the
above positive samples, namely ( 2, 1,0), ( 3, 2,0), and
dicted probabilities language development assessment of ( 3, 1,0), where ’0’ represents the the negative sample.
the text   .</p>
        <p>where  is the one-hot encoding of the text’s actual
expected value. When  = 1, 0 means that the writing
time of the text   is before the text   ; otherwise, when
 = 0, 1 means that the writing time of the text   is after</p>
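        <p>The classification head can be sketched as follows; the module name and the BERT-base hidden size of 768 are illustrative assumptions, and the softmax is folded into the cross-entropy loss, as is idiomatic in PyTorch.</p>
        <preformat>
# A sketch of the head of Section 3.4: concatenate h_[CLS] with the SIAM
# output h_A, apply a linear classifier, and train with cross-entropy
# (which fuses the softmax). Names and sizes are illustrative.
import torch
import torch.nn as nn

class OrderClassifier(nn.Module):
    def __init__(self, hidden_size: int = 768, num_labels: int = 2):
        super().__init__()
        self.linear = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, h_cls: torch.Tensor, h_a: torch.Tensor) -> torch.Tensor:
        h = torch.cat([h_cls, h_a], dim=-1)  # h = (h_[CLS], h_A)
        return self.linear(h)                # logits over the two order labels

loss_fn = nn.CrossEntropyLoss()  # cross-entropy over y = (1, 0) / (0, 1)
        </preformat>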
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Experiments</title>
      <sec id="sec-6-1">
        <title>4.1. Experimental Setup</title>
        <p>All experimental procedures are conducted on an NVIDIA A30 24-GB GPU. We utilize PyTorch [14] and Transformers [15] to build our models. Considering the similarity between the two languages, we only use the Italian BERT model (dbmdz/bert-base-italian-uncased), as we think it also contains a small amount of Spanish information. The feed-forward layer is initialized using weights drawn from a truncated normal distribution with a standard deviation of 2e-2, while the bias is initialized to zero. A fixed initial learning rate of 5e-5 is applied consistently across all experiments. The maximum sequence length is set to 512, representing the prescribed constraint on the number of tokens within a sentence. To optimize training, a warmup proportion of 1e-3 is implemented. Training spans 10 epochs with a batch size of 4.</p>
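        <p>For reference, the setup above translates into a configuration along the following lines; the AdamW optimizer and the linear warmup schedule are plausible defaults rather than details stated explicitly here, and only the numeric values are taken from this section.</p>
        <preformat>
# Hyperparameters reported in Section 4.1, wired into a plausible training
# configuration. Optimizer and scheduler choices are assumptions.
import torch
from transformers import get_linear_schedule_with_warmup

MAX_SEQ_LENGTH = 512   # prescribed token limit per input
BATCH_SIZE = 4
EPOCHS = 10
LEARNING_RATE = 5e-5   # fixed initial learning rate
WARMUP_PROPORTION = 1e-3

def init_feed_forward(layer: torch.nn.Linear) -> None:
    # Truncated normal weights with std 2e-2; zero bias, as described above.
    torch.nn.init.trunc_normal_(layer.weight, std=2e-2)
    torch.nn.init.zeros_(layer.bias)

def make_optimizer_and_scheduler(model, num_training_steps: int):
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(WARMUP_PROPORTION * num_training_steps),
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler
        </preformat>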
      </sec>
      <sec id="sec-6-2">
        <title>4.2. Datasets</title>
        <p>The datasets provided by the EVALITA 2023 “Language Learning Development” task come from two sources, CItA [16] and COWS-L2H [9], where the numbers of training samples are 2394 and 1009 respectively. We perform data augmentation based on the datasets. Specifically, if essay 1 in sample 1 appears before essay 2, we describe it as $(e_1, e_2, 1)$, where ‘1’ denotes the positive sample. Likewise, when essay 2 in sample 2 appears before essay 3, we describe it as $(e_2, e_3, 1)$. Then we construct sample 3 based on the above two samples: in terms of sample 3, essay 1 appears before essay 3, which is defined as $(e_1, e_3, 1)$. In addition, we expand the negative samples based on the above positive samples, namely $(e_2, e_1, 0)$, $(e_3, e_2, 0)$, and $(e_3, e_1, 0)$, where ‘0’ represents the negative sample. The scales of the augmented datasets for CItA and COWS-L2H are 5056 and 2042, respectively. In the training set, each positive sample can match a corresponding negative sample, so the numbers of positive and negative samples in the dataset are consistent. Ultimately, the two datasets are combined to get a new training set.</p>
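        <p>The pair-expansion rule can be sketched as follows; essays are assumed to be given in chronological order for each student, and the helper name is ours.</p>
        <preformat>
# Data augmentation of Section 4.2: every ordered pair of essays by the
# same student becomes a positive sample (label 1) and its reversal a
# negative sample (label 0). Assumes chronologically sorted input.
from itertools import combinations

def augment(essays):
    """essays: list of texts by one student, sorted by writing time."""
    samples = []
    for earlier, later in combinations(essays, 2):
        samples.append((earlier, later, 1))  # positive: correct order
        samples.append((later, earlier, 0))  # negative: reversed order
    return samples

# Three essays yield (e1, e2, 1), (e2, e3, 1), (e1, e3, 1) plus the three
# reversed negative samples, matching the description above.
        </preformat>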
        <p>To ensure a sound evaluation of our strategies, we employ a 5-fold cross-validation methodology, which involves dividing the datasets into five distinct subsets to construct an ensemble model that exhibits enhanced generalization capabilities. More precisely, four of these subsets are assigned for training purposes, while the remaining subset is utilized for validation. The evaluation results of our strategies are derived by averaging the outcomes obtained from the five models.</p>
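        <p>A minimal sketch of this protocol is shown below; the use of scikit-learn's KFold and the fixed random seed are our assumptions.</p>
        <preformat>
# The 5-fold protocol of Section 4.2: train on four subsets, validate on
# the fifth, and average the outcomes of the five models.
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(samples, train_and_eval):
    """train_and_eval(train_idx, val_idx) returns one validation score."""
    kfold = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = [train_and_eval(tr, va) for tr, va in kfold.split(samples)]
    return float(np.mean(scores))  # averaged outcome over the five folds
        </preformat>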
      </sec>
      <sec id="sec-6-3">
        <title>4.3. Submission</title>
        <p>We submit three results in total, namely the fine-tuned BERT model (Run 2), the fine-tuned BERT model with SIAM (Run 3), and the merge method (Run 1). The fine-tuned BERT model is obtained by fine-tuning BERT directly on the dataset; concretely, it is the model of Section 3 with the sequential information attention mechanism removed. The merge method is the fusion of the output probabilities of the fine-tuned BERT model (Run 2) and the fine-tuned BERT model with SIAM (Run 3).</p>
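        <p>The merge method can be read as probability-level fusion; since the exact combination rule is not spelled out here, the equal-weight average below is an assumption.</p>
        <preformat>
# A sketch of the Run 1 merge method: fuse the output probabilities of the
# fine-tuned BERT model and the BERT model with SIAM. An equal-weight
# average followed by argmax is assumed.
import numpy as np

def merge_predictions(probs_bert, probs_siam):
    """Each argument: array of shape (num_samples, 2) of softmax outputs."""
    fused = (np.asarray(probs_bert) + np.asarray(probs_siam)) / 2.0
    return fused.argmax(axis=-1)  # predicted order label per sample
        </preformat>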
      </sec>
      <sec id="sec-6-4">
        <title>4.4. Experimental results</title>
        <p>Experimental results in the evaluation phase are shown
in Table 1.</p>
        <p>It can be seen that on the CItA test set, the BERT model achieves the best performance, with scores of 0.9338, 0.9315 and 0.9316 on the three evaluation metrics respectively, while the BERT model with SIAM has slightly declined. We deem that this decline can be attributed to our method being trained on two corpora simultaneously: to some extent, the information of the two corpora affects each other, sacrificing performance on the CItA dataset in exchange for the improvement on the COWS dataset. Concerning the merge method, regardless of whether the CItA test set, the COWS test set or the combined test set is considered, the strategy of model fusion is powerless and has not brought effective improvement.</p>
        <p>Our proposed sequential information attention mechanism has demonstrated substantial improvements on both the COWS test set and the combined test set. Specifically, on the COWS test set, the BERT model with SIAM outperforms the BERT model by 0.0427, 0.0300, and 0.0313 on the three evaluation indicators, respectively. Likewise, on the combined test set, the BERT model with SIAM gains consistent improvements of 0.0143, 0.0126, and 0.0128 in the three metrics over the BERT model.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion</title>
      <p>The “Language Learning Development” task revolves around accurately predicting the sequential order of two essays authored by a single student. In this study, we make a first attempt to tackle the task by leveraging a high-performing pre-trained language model, demonstrating the strong potential of pre-trained language models to solve the language development assessment task. Moreover, we present a novel attention mechanism, known as sequential information attention, designed to effectively capture and leverage the interaction of information within sequential texts. In the final evaluation stage, experimental results reveal the effectiveness of our proposed method, substantiating that sequential information attention contributes to tracking the evolution of language competence.</p>
      <p>In the future, we will further explore neural networks that extract language features suited to language development assessment tasks, so as to further improve the performance of the model and drive advancements in the field of language development assessment.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported by the Guangdong Philosophy
and Social Science Foundation (No. GD20CWY10), the
National Social Science Fund of China (No. 22BTQ045),
and the Science and Technology Program of Guangzhou
(No.202002030227).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Venturi</surname>
          </string-name>
          ,
          <article-title>EVALITA 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          , in:
          <source>Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</source>
          , CEUR.org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Crossley</surname>
          </string-name>
          ,
          <article-title>Linguistic features in writing quality and development: An overview</article-title>
          ,
          <source>Journal of Writing Research</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>415</fpage>
          -
          <lpage>443</lpage>
          . URL: https://jowr.org/index.php/jowr/article/view/582. doi:10.17239/jowr-2020.11.03.01.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sagae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>MacWhinney</surname>
          </string-name>
          ,
          <article-title>Automatic measurement of syntactic development in child language, in: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), Association for Computational Linguistics</article-title>
          , Ann Arbor, Michigan,
          <year>2005</year>
          , pp.
          <fpage>197</fpage>
          -
          <lpage>204</lpage>
          . URL: https://aclanthology.org/P05-1025. doi:10.3115/1219840.1219865.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Automatic measurement of syntactic complexity in child language acquisition</article-title>
          ,
          <source>International Journal of Corpus Linguistics</source>
          <volume>14</volume>
          (
          <year>2009</year>
          )
          <fpage>3</fpage>
          -
          <lpage>28</lpage>
          . doi:10.1075/ijcl.14.1.02lu.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brunato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <article-title>An NLP-based stylometric approach for tracking the evolution of L1 written language competence</article-title>
          ,
          <source>Journal of Writing Research</source>
          <volume>13</volume>
          (
          <year>2021</year>
          )
          <fpage>71</fpage>
          -
          <lpage>105</lpage>
          . URL: https://www.jowr.org/index.php/jowr/article/view/778. doi:10.17239/jowr-2021.13.01.03.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hancke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Meurers</surname>
          </string-name>
          ,
          <article-title>Exploring CEFR classification for German based on rich linguistic modeling</article-title>
          ,
          <year>2013</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vajjala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lõo</surname>
          </string-name>
          ,
          <article-title>Automatic CEFR level prediction for Estonian learner text, in: Proceedings of the third workshop on NLP for computer-assisted language learning</article-title>
          , LiU Electronic Press, Uppsala, Sweden.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>