<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ISEE</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Toward the Automatic Assessment of Text Exercises</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Philip Bernius</string-name>
          <email>janphilip.bernius@tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernd Bruegge</string-name>
          <email>bruegge@in.tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Technical University of Munich</institution>
          ,
          <addr-line>Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>2</volume>
      <fpage>19</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>Exercises are an essential part of learning. Manual assessment of exercises requires effort from instructors and can also lead to quality problems and inconsistencies between assessments. Especially with growing student populations, this leads to delayed grading, and it becomes more and more difficult to provide individual feedback. Our goal is to provide timely responses to homework submissions in large classes. By reducing the effort required for assessments, instructors can invest more time in supporting students and providing individual feedback. This paper argues that automated assessment provides more individual feedback for students, combined with quicker feedback and grading cycles. We introduce a concept for the automatic assessment of text exercises using machine learning techniques. We also describe our plans to use this concept in a case study with 1900 students.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION AND PROBLEM</title>
    </sec>
    <sec id="sec-2">
      <p>
        Instructors face large student populations in their
courses. Students require feedback on their exercises to reflect
on their progress [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The concept of interactive learning
[
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] helps to increase the interaction between instructors
and students, but it also increases the workload for instructors.
Software engineering students need to learn constructive and
creative capabilities. It is important for the instructor to
facilitate the problem-solving learning process. Concrete
problem-solving strategies are taught in paradigms accepted by the
profession [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Each paradigm provides a set of
problem-solving exercises. These are usually textual exercises that involve
the application of problem-solving techniques.
      </p>
    </sec>
    <sec id="sec-3">
      <p>Exercises are a proven method to train higher cognitive
skills, including the acquisition of domain-specific knowledge,
analysis and design methods, and the evaluation of the results.</p>
    </sec>
    <sec id="sec-4">
      <p>
        Trivial exercises, such as multiple-choice quizzes, do not
stimulate higher cognitive skills and do not reflect engineers'
daily work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Exercises help students to learn, understand, and apply a
paradigm. A student needs feedback to reflect on and improve
their solution to the exercise. Text exercise assessment demands
time-intensive effort from instructors, preventing them from
spending time on improving their lectures, having discussions
with their students, or updating exercises to incorporate
technological evolution.</p>
      <p>
        Increasing student populations make it harder to keep
assessments fair and of consistent quality. Students do not benefit
from quantitative feedback alone [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Qualitative feedback
helps students to improve. Splitting assessment efforts among
multiple instructors can lead to inconsistencies. Providing
timely or instant feedback in a large class is hard [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Waiting
for feedback delays the students' learning progress and hinders
interactive learning. We strive toward a system that provides
automated text assessments based on instructor feedback,
decreasing the time students wait for feedback.
      </p>
    </sec>
    <sec id="sec-5">
      <p>This paper is structured as follows: Section I introduces the
domain and outlines the problems with the current correction
process for text exercises. Our vision is described in Section II
in the form of a visionary scenario. Section III describes
the assessment workflow of a possible implementation and
VIRTUAL ONE-TO-ONE, a machine learning based mechanism
for providing individualized feedback to students in
large classes. Section IV proposes our evaluation approach.
Section V discusses the applicability and limitations of the
system. We present related work in Section VI, and
Section VII concludes the paper.</p>
    </sec>
    <sec id="sec-7">
      <title>II. VISIONARY SCENARIO</title>
    </sec>
    <sec id="sec-8">
      <p>The following scenario describes how we envision improving the assessment of text exercises:</p>
    </sec>
    <sec id="sec-9">
      <p>Anna and Tom are students participating in a software
engineering course. During a lecture, the instructor starts
an in-class text exercise to be completed in the assessment
system. Anna and Tom both submit a solution to the system.</p>
    </sec>
    <sec id="sec-10">
      <p>The instructor starts manually assessing a set of submissions
selected by the system. The system asks the instructor to assess
Anna's solution. The instructor provides a score and a comment
explaining his assessment. After receiving the assessment,
the system decides to assess Tom's solution automatically
based on the assessments provided previously. Anna and Tom
get individual feedback for their solution to reflect on their
learning progress.</p>
    </sec>
    <sec id="sec-12">
      <p>Tom is not satisfied with his submission after receiving
his feedback. He decides to improve his work and resubmits
a refined version of his solution. The system automatically
assesses Tom's resubmission and provides a new assessment.
Tom is now satisfied with his assessment and finishes the exercise.</p>
    </sec>
    <sec id="sec-14">
      <title>III. ASSESSMENT WORKFLOW</title>
    </sec>
    <sec id="sec-15">
      <p>In a first prototypical implementation, we extend the
ArTEMiS system, which is already capable of assessing programming and
modelling exercises automatically [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ], by adding semi-automated text assessment. A student submits a solution for
a text exercise to the ArTEMiS system. The activity diagram in
Fig. 1 depicts the assessment workflow. The system supports
two means of assessment: manual assessment provided by the
instructor (Section III-A) and automatic assessment generated
by the system based on an assessment model (Section III-B).</p>
      <p>Fig. 1. Assessment workflow: the student submits a solution, reviews the resulting assessment and, if not satisfied, refines the solution and resubmits; otherwise, the total score is calculated.</p>
    </sec>
    <sec id="sec-17">
      <p>ArTEMiS decides which assessment method is required for
each submission based on the quality of the assessment model.</p>
    </sec>
    <sec id="sec-18">
      <p>Both means of assessment provide a set of Feedback Items.</p>
    </sec>
    <sec id="sec-19">
      <p>
        The assessment of a submission is a composition of all
feedback items. The final score is the sum of all feedback
scores (see Fig. 2). Students review the assessment of their
submission. If they are not satisfied, they can submit a
refined solution for assessment, enabling continuous interactive
learning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] with text exercises.
      </p>
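The composition of feedback items into a final score can be illustrated with a minimal sketch. The class below mirrors the phrase, score, and comment attributes of the Feedback entity in Fig. 2, but it is an illustration, not the ArTEMiS implementation:

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    """One feedback item referencing a text block (per Fig. 2)."""
    phrase: str
    score: float
    comment: str

def total_score(feedback_items):
    """The final score of a submission is the sum of all feedback scores."""
    return sum(item.score for item in feedback_items)

items = [Feedback("UML model", 2.0, "Good choice of entities."),
         Feedback("Rationale", 0.5, "Justification is incomplete.")]
print(total_score(items))  # 2.5
```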
      <sec id="sec-19-1">
        <title>A. Manual Assessment</title>
        <p>
          ArTEMiS selects text exercise submissions for manual
assessment by instructors if the assessment model does not allow
for a confident assessment. Instructors are used to grading
exercises using a set of rubrics. A rubric defines a set of traits
of the students’ submission, which are evaluated based on a
scale [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Rubrics can exist at different levels of detail, such
as only listing aspects of the assignment or defining distinct
scoring levels. If instructors do not explicitly define a rubric
beforehand, they build a rubric in their mind while assessing.
        </p>
      </sec>
    </sec>
    <sec id="sec-20">
      <p>Instructors break down a submission into blocks and match
each block with a rubric. As illustrated in Fig. 3, instructors
define text blocks themselves as a phrase, sentence or
paragraph by selecting a piece of text as they see fit. They assess
each block quantitatively and qualitatively using a score and
a feedback comment (see Feedback in Fig. 2).</p>
      <sec id="sec-20-1">
        <title>B. Automatic Assessment</title>
      </sec>
    </sec>
    <sec id="sec-21">
      <p>ArTEMiS assesses submissions automatically if the quality
of the assessment model allows for a confident assessment.</p>
    </sec>
    <sec id="sec-22">
      <p>The assessment model is trained on the manual assessments
of text blocks provided by instructors. Fig. 4 depicts the
automatic assessment process. For automatic assessment,
submissions first need to be broken down into text blocks
automatically. Second, a vector representation of the text blocks
is calculated as an input value for further computations. Third,
an assessment needs to be generated for each text block.</p>
      <p>If an automatic assessment is possible, ArTEMiS assesses the submission automatically; otherwise, the instructor assesses it manually. Manual feedback trains the assessment model, which in turn affects automatic feedback.</p>
    </sec>
    <sec id="sec-23">
      <p>A first, simple approach is using sentences as text blocks.
We split submissions into sentences using the delimiter characters
( . : ? ! ) or line breaks. In a later stage, we plan on applying
techniques such as topic modelling for text block calculation
if the simple approach does not provide sufficient results. All
text blocks need feedback to complete an assessment.</p>
    </sec>
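A minimal sketch of this splitting step; `split_into_blocks` is a hypothetical helper, not part of the ArTEMiS code base:

```python
import re

def split_into_blocks(submission: str) -> list:
    """Split a submission into candidate text blocks at the delimiter
    characters ( . : ? ! ) followed by whitespace, or at line breaks."""
    parts = re.split(r"[.:?!]\s+|\n+", submission)
    # Drop empty fragments and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]

blocks = split_into_blocks("First idea. Second idea!\nThird idea")
print(blocks)  # ['First idea', 'Second idea', 'Third idea']
```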
    <sec id="sec-25">
      <p>
        ArTEMiS calculates a vector representation for each text
block. For this, blocks are translated into a multidimensional
vector space, following the word2vec algorithm [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] and its doc2vec extension for sentences and
paragraphs [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The algorithm can employ different strategies to
calculate word vectors from one-hot encoded input.
      </p>
      <p>Fig. 2. The relevant entities in the system are depicted in a class diagram.
A student creates a submission for a text exercise. An assessment is a
composition of multiple feedback items referencing text blocks. A feedback
item can be a manual or an automatic feedback item. An instructor provides
manual feedback. Automatic feedback items are a proxy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for manual
feedback items. A similarity cluster aggregates the vector representations of
text blocks. The assessment model consists of many similarity clusters.</p>
    </sec>
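As a simplified stand-in for the learned doc2vec embeddings, the following sketch maps each text block to the average of one-hot word vectors over a shared vocabulary. The helper and its shapes are assumptions for illustration; the real system would use dense, trained representations:

```python
import numpy as np

def block_vectors(blocks):
    """Map each text block to the average of its one-hot word vectors.
    Illustrative stand-in for learned doc2vec embeddings."""
    vocab = sorted({w for b in blocks for w in b.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = np.zeros((len(blocks), len(vocab)))
    for row, block in enumerate(blocks):
        for word in block.lower().split():
            vectors[row, index[word]] += 1.0
        vectors[row] /= max(len(block.split()), 1)  # average over words
    return vectors

vecs = block_vectors(["a class diagram", "a sequence diagram"])
print(vecs.shape)  # (2, 4) -- vocabulary: a, class, diagram, sequence
```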
    <sec id="sec-26">
      <p>
        Using the resulting vector representations, we apply cluster
analysis to detect clusters of submission blocks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] from
all submissions of the same exercise. These clusters list the
different statements submitted by all students as a part of their
solutions.
      </p>
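The cluster-analysis step can be sketched with a simple greedy similarity clustering. This stands in for the correlation clustering cited above; the helper name, threshold, and data layout are illustrative assumptions:

```python
import numpy as np

def cluster_blocks(vectors, threshold=0.8):
    """Greedy clustering sketch: assign each block vector to the first
    cluster whose running centroid has cosine similarity at or above
    the threshold; otherwise open a new cluster."""
    clusters = []   # list of lists of block indices
    centroids = []  # running centroid per cluster
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = float(np.dot(v, centroid) /
                        (np.linalg.norm(v) * np.linalg.norm(centroid) + 1e-12))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
            centroids.append(v.astype(float).copy())
        else:
            clusters[best].append(i)
            n = len(clusters[best])
            centroids[best] = centroids[best] * (n - 1) / n + v / n
    return clusters

vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(cluster_blocks(vectors))  # [[0, 1], [2]]
```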
      <p>Our primary assumption is that a single feedback item can
be valid for text blocks from multiple submissions. Feedback
for text blocks within the same similarity cluster can be applied
to other nodes within the same cluster. This allows the system
to provide VIRTUAL ONE-TO-ONE feedback: Real instructor
feedback is applied to equivalent text blocks in a new
submission automatically. ArTEMiS chooses a previously assessed
text block located close by in the same similarity cluster, the
nearest neighbour. The instructor's feedback is selected for the
new submission and ArTEMiS creates an automatic feedback
item, a proxy for the manual feedback item (see Fig. 2).</p>
    </sec>
    <sec id="sec-27">
      <p>If a cluster does not contain a manual feedback item, the system decides that an automatic assessment is not possible and requests a manual assessment from the instructor.</p>
    </sec>
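The nearest-neighbour feedback transfer and the manual-assessment fallback can be sketched as follows. All names and data shapes here are assumptions for illustration, not the ArTEMiS API:

```python
import numpy as np

def automatic_feedback(block_vec, cluster, vectors, manual_feedback):
    """Sketch of VIRTUAL ONE-TO-ONE feedback transfer.

    cluster         -- indices of text blocks in the similarity cluster
    vectors         -- index -to- vector mapping for known text blocks
    manual_feedback -- index -to- (score, comment) for assessed blocks
    Returns a proxy feedback item, or None to request manual assessment."""
    assessed = [i for i in cluster if i in manual_feedback]
    if not assessed:
        return None  # no manual feedback in cluster: assess manually
    nearest = min(assessed,
                  key=lambda i: float(np.linalg.norm(block_vec - vectors[i])))
    score, comment = manual_feedback[nearest]
    return {"score": score, "comment": comment, "proxy_for": nearest}

vectors = {0: np.array([1.0, 0.0]), 1: np.array([0.8, 0.2])}
fb = automatic_feedback(np.array([0.9, 0.1]), [0, 1], vectors,
                        {0: (2, "Correct definition.")})
print(fb)  # {'score': 2, 'comment': 'Correct definition.', 'proxy_for': 0}
```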
    <sec id="sec-28">
      <title>IV. EVALUATION APPROACH</title>
    </sec>
    <sec id="sec-29">
      <p>We plan to conduct a case study to evaluate the automated
assessment quality in the Introduction to Software Engineering
(EIST) lecture taught at the Technical University of Munich
to 1900 students. Students in the course complete weekly
homework exercises. We will use the system for text exercise
submissions and assessments in two stages.</p>
      <p>In the first stage, we conduct a shadow test using our
prototypical implementation. The learners submit their solution to a
text question using our system. Instructors establish a truth set
by assessing all submissions manually. Automatic assessment
is not used during this stage. The truth set will be used for
quantitative evaluation of the automatic assessment accuracy
by comparing automatic assessments with the corresponding
manual assessment.</p>
    </sec>
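One simple reading of the planned accuracy comparison, assuming per-block score matching against the manual truth set (the exact metric is not specified here):

```python
def assessment_accuracy(automatic, manual):
    """Share of text blocks whose automatic score matches the manual
    truth-set score; an assumed, simplified accuracy metric."""
    matches = sum(1 for auto, man in zip(automatic, manual) if auto == man)
    return matches / len(manual)

# Automatic vs. manual scores for five text blocks:
print(assessment_accuracy([2, 1, 0, 2, 1], [2, 1, 1, 2, 1]))  # 0.8
```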
    <sec id="sec-30">
      <p>Hypothesis 1: Automatic assessments of text exercises
following the presented concept produce results identical to
manual assessments with an accuracy greater than 85%.</p>
    </sec>
    <sec id="sec-31">
      <p>In a qualitative study, we will interview the instructors to
analyze the block-based assessment concept (Sec. III-A) and
its applicability to grading and providing feedback.</p>
    </sec>
    <sec id="sec-32">
      <p>Hypothesis 2: The assessment concept allows capturing
all feedback necessary for the assessment of text exercises. No
information is lost compared to traditional assessment.</p>
    </sec>
    <sec id="sec-33">
      <p>In the second stage, we will conduct a second study in a later
EIST lecture to evaluate the complete automatic assessment
workflow. We will evaluate how many manual assessments are
needed to generate accurate assessments, and the effects on
assessment time.</p>
    </sec>
    <sec id="sec-35">
      <p>Hypothesis 3: Employing automatic assessment can save
more than 50% of the total required assessment time for all
submissions. The assessment time per submission will increase
compared to paper-based assessments.</p>
    </sec>
    <sec id="sec-36">
      <p>A qualitative study with student interviews will assess the usefulness of automated feedback for students. Further, we want to understand students' feelings toward automatic feedback.</p>
    </sec>
    <sec id="sec-37">
      <title>V. DISCUSSION</title>
    </sec>
    <sec id="sec-38">
      <p>We discuss the applicability, limitations, and implications of
automatic text assessment. Feedback generated following the
concepts introduced in this paper can only be as good as
the feedback provided by the instructor. The system supports
the assessment process by automating the repetitive process
involved in assessing text submissions.</p>
      <p>Grading based on automatic assessment raises ethical
concerns. It is unclear whether non-native language or special
figures of speech could lead to decreased scores. Applications
in grading should be preceded by an extensive evaluation of
assessment quality. While applications in grading are out of
scope for this paper, we would propose applying the system only
within a two-phase grading process. We intend to apply it as a
learning-support system. The generated feedback should help
students during their learning progress and should not be used
during a grading process.</p>
    </sec>
    <sec id="sec-39">
      <p>The applicability of the described system depends on the
variety of possible solutions. Exercises with a variable answer
space require more knowledge for assessment, increasing
the complexity. The system focuses on assessing exercises
from the lower spectrum of the revised Bloom's Taxonomy:
Remember, Understand, Apply, and Analyze [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Exercises
of the given categories provide a lower variability of possible
solutions and therefore limit the number of similarity clusters.
Exercises from the categories Evaluate and Create are out of
scope for this paper.</p>
    </sec>
    <sec id="sec-42">
      <p>The design of the system allows for a hybrid assessment
approach. A future system could combine manual and automatic
feedback to further reduce the effort for instructors. This
could be especially useful if a certain aspect of the solution
has a larger variability. A possible example is an exercise
asking for two definitions and a comparison of the terms.
The variability for the definitions is small, but the variability
for the comparison part is larger. A hybrid approach allows
instructors to focus the manual assessment on the comparison
part as soon as the definitions can be assessed confidently.</p>
      <p>Fig. 4. The automatic assessment process: split the submission into text blocks, calculate a vector representation for each block, find the similarity cluster of each block, and find existing feedback in the similarity cluster.</p>
    </sec>
    <sec id="sec-44">
      <title>VI. RELATED WORK</title>
      <p>
        Kiefer and Pado suggest a system to simplify the grading
process by presenting responses to instructors in a sorted manner
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Submissions are sorted by similarity with a defined
sample solution. Terms used in both the sample solution and
the submission are highlighted. The tool supports instructors
during the grading process but does not automatically
assess submissions. The only criterion is the sample solution.
Instructor assessments are not considered for subsequent
submissions.
      </p>
      <p>
        Wolska et al. and Basu et al. suggest a grading process
where instructors grade submissions sorted by clusters of
similar submissions for exercises in the domains of German as a
foreign language [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and the United States Citizenship Exam
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. They propose clusters of entire submissions, in contrast
to the text-block-based clustering approach presented in this
paper. Basu et al. introduce grading of an entire cluster of
submissions as a single action [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
    </sec>
    <sec id="sec-46">
      <p>
        Gradescope Inc. offers its tool Gradescope, a commercial
solution for grading assistance and "AI-assisted Grading".
Their core product offers a rubric-based grading system,
allowing instructors to define a set of scores with feedback
comments per exercise. Instructors manually select rubrics for
each submission. Changes to the scores and comments in a
rubric are applied to previously assessed submissions. The
"AI-assisted Grading" feature creates groups of submissions
(compare with similarity clusters), allowing the instructor to
select rubrics for the entire group of submissions, similar to the
approach of Basu et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The automatic creation of groups
is limited to multiple-choice and fill-in-the-blank exercises. It
does not offer an automatic grouping of text questions.
      </p>
    </sec>
    <sec id="sec-47">
      <p>These works focus on traditional exam assessment. Their
primary objective is an accelerated grading process rather
than providing feedback through comments. The focus of our
approach is primarily providing more qualitative feedback to
students on homework and in-class assignments.</p>
    </sec>
    <sec id="sec-48">
      <title>VII. CONCLUSION</title>
      <p>Assessments of text exercises require time-intensive efforts
from instructors today. We argue that an automated process
to generate VIRTUAL ONE-TO-ONE feedback can reduce
assessment efforts for instructors and increase the amount
of feedback for students. The system should use machine
learning techniques to detect text blocks of the same meaning
in submissions and automatically link real instructor feedback
to equivalent blocks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Krusche</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Seitz</surname>
          </string-name>
          , “
          <article-title>Increasing the Interactivity in Software Engineering MOOCs - A Case Study</article-title>
          ,” in
          <source>31st Conference on Software Engineering Education and Training</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kolb</surname>
          </string-name>
          ,
          <article-title>Experiential Learning: Experience As The Source Of Learning And Development</article-title>
          . Prentice Hall,
          <year>1984</year>
          , vol.
          <volume>1</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Krusche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seitz</surname>
          </string-name>
          , J. Börstler, and
          <string-name>
            <given-names>B.</given-names>
            <surname>Bruegge</surname>
          </string-name>
          , “
          <article-title>Interactive Learning: Increasing Student Participation through Shorter Exercise Cycles,” in 19th Australasian Computing Education Conf</article-title>
          . ACM,
          <year>2017</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          , The Structure of Scientific Revolutions. University of Chicago Press,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sadler</surname>
          </string-name>
          and E. Good, “
          <article-title>The Impact of Self- and Peer-Grading on Student Learning</article-title>
          ,”
          <source>Educational Assessment</source>
          , vol.
          <volume>11</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          , Feb.
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Jerse</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lokar</surname>
          </string-name>
          , “
          <article-title>Providing Better Feedback for Students Using Projekt Tomo,”</article-title>
          <source>in 1st ISEE Workshop</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Krusche</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Seitz</surname>
          </string-name>
          , “
          <article-title>ArTEMiS - An Automatic Assessment Management System for Interactive Learning</article-title>
          ,” in
          <source>49th Technical Symposium on Computer Science Education. ACM</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bruegge</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Dutoit</surname>
          </string-name>
          ,
          <source>Object-Oriented Software Engineering Using UML, Patterns, and Java</source>
          , 3rd ed. Prentice Hall
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B. E.</given-names>
            <surname>Walvoord</surname>
          </string-name>
          and
          <string-name>
            <given-names>V. J.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <source>Effective Grading: A Tool for Learning and Assessment in College</source>
          , 2nd ed. Jossey-Bass
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          , “
          <article-title>Vector-based Models of Semantic Composition,” in 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</article-title>
          ,
          <year>2008</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          , “
          <article-title>Efficient Estimation of Word Representations in Vector Space,” CoRR</article-title>
          , vol.
          <volume>1301</volume>
          .3781,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , “
          <article-title>Distributed Representations of Sentences and Documents,”</article-title>
          <source>in 31st International Conference on Machine Learning</source>
          , vol.
          <volume>32</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>II-1188</fpage>
          -
          <lpage>II-1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chawla</surname>
          </string-name>
          , “Correlation Clustering,”
          <source>Machine Learning</source>
          , vol.
          <volume>56</volume>
          , no.
          <issue>1-3</issue>
          , pp.
          <fpage>89</fpage>
          -
          <lpage>113</lpage>
          , Jul.
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Krathwohl</surname>
          </string-name>
          , “
          <article-title>A revision of bloom's taxonomy: An overview</article-title>
          ,”
          <source>Theory into Practice</source>
          , vol.
          <volume>41</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>212</fpage>
          -
          <lpage>218</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiefer</surname>
          </string-name>
          and U. Pado, “
          <article-title>Freitextaufgaben in Online-Tests - Bewertung und Bewertungsunterstützung,” HMD Praxis der Wirtschaftsinformatik</article-title>
          , vol.
          <volume>52</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>107</lpage>
          , Feb.
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Horbach</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Palmer</surname>
          </string-name>
          , “
          <article-title>Computer-Assisted Scoring of Short Responses: The Efficiency of a Clustering-Based Approach in a Real-Life Task,”</article-title>
          <source>in Advances in Natural Language Processing</source>
          . Springer,
          <year>2014</year>
          , pp.
          <fpage>298</fpage>
          -
          <lpage>310</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          , and L. Vanderwende, “
          <article-title>Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading,” Transactions of the Association for Computational Linguistics</article-title>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>391</fpage>
          -
          <lpage>402</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>