<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Diagnostic Assessment of Adults' Reading Deficiencies in an Intelligent Tutoring System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Genghu Shi</string-name>
          <email>gshi@memphis.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anne M. Lippert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew J. Hampton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Su Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ying Fang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arthur C. Graesser</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Intelligent Systems</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Memphis</institution>
          ,
          <addr-line>Memphis TN 38111</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>105</fpage>
      <lpage>112</lpage>
      <abstract>
        <p>In this paper, we investigate whether a version of AutoTutor that teaches comprehension strategies can be used to diagnose reading deficiencies in adults with low literacy. We hypothesized that the speed and accuracy with which participants answered questions during the AutoTutor conversation could be diagnostic of their mastery of reading comprehension components: words, the explicit textbase, the situation model, and rhetorical structure. We used linear mixed effect models to compare the accuracy and response times of 52 low literacy adults who worked on 29 AutoTutor lessons during a four-month intervention period. Our results show that adults' response accuracy for questions addressing more basic reading components (e.g., meaning of words) was higher than for those pertaining to deeper discourse levels. In contrast, question response time did not vary significantly among the theoretical levels. A correlation analysis between theoretical levels and performance (accuracy and time) supported this trend. These results affirm that adults with low literacy tend to be more proficient at basic reading levels than at deeper discourse levels. In addition, the results of an exact binomial test showed that hints or prompts were effective in scaffolding the learning of reading comprehension skills. Furthermore, we describe how response accuracy on the four comprehension components can provide a more nuanced diagnosis of reading problems than a single overall performance score. More fine-grained diagnoses can assist both educators wanting more detailed insight into learner difficulties and ITS developers looking to improve the personalization and adaptivity of learning environments.</p>
      </abstract>
      <kwd-group>
        <kwd>CSAL AutoTutor</kwd>
        <kwd>Reading strategies</kwd>
        <kwd>Comprehension framework</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>One in six adults in the United States has low levels of literacy skills [1]. Low literacy
has a negative impact on the social health and economic stability of entire countries as
well as the personal well-being of their citizens [1, 2]. Adult literacy educational programs
are often funded by government or non-profit organizations, but unfortunately these
programs generally lack the capacity to accommodate all adults in need.
Moreover, it is difficult to teach comprehension strategies at deeper levels because few
teachers and tutors in literacy centers are trained to cover these levels of reading
difficulty. Intelligent tutoring systems can help close this gap and provide the necessary,
deeper training. An intelligent tutoring system that can differentially diagnose reading
deficits constitutes an important first step in adaptively remediating individuals’
deficits. In this study, we explore the assessment capabilities of a version of a web-based
intelligent tutoring system, AutoTutor [4, 7], specifically created for adults with low
literacy. In particular, we use AutoTutor to classify the reading comprehension
deficiencies of adults within the Graesser and McNamara [3] multilevel theoretical
framework of reading comprehension.</p>
    </sec>
    <sec id="sec-2">
      <title>AutoTutor for CSAL</title>
      <p>The version of AutoTutor we developed was part of an intervention led by the Center
for the Study of Adult Literacy (CSAL) [4, 7], and helps improve reading
comprehension in low literacy adults. The system has two computer agents (one tutor and one peer
student) that hold conversations, called trialogues, with the human learners and with each
other [4, 5]. Trialogues illustrate comprehension strategies to adult learners, help
them apply these strategies, and give them feedback when assessing their performance,
all in natural language. CSAL AutoTutor has 35 lessons that focus on distinct
theoretical levels of reading comprehension [6, 7]. For each lesson, the system starts out
assigning words or texts at a medium level of difficulty and AutoTutor asks 8-12
questions about the words or text, all embedded in an overarching conversation. Struggling
readers tend to have even more pronounced difficulties in writing, so most of their
responses are entered by clicking response options on the interface. Learner response
accuracy on the medium level questions determines whether AutoTutor next assigns
words or texts at a hard level (accuracy above a performance threshold) or at an easy
level (accuracy below it) [8].
When answers do not include all component parts of a good answer, the learner receives
hints or prompts, providing another chance to pick an answer from the remaining two
choices with somewhat more guidance.</p>
      <p>CSAL AutoTutor was designed to “care” about the particular motivations,
metacognitions and emotions of struggling adult readers. The caring aspect of CSAL AutoTutor
is critical because most adults participating in literacy programs do so voluntarily, and
if the instruction is not adult-oriented, engaging, and pertinent to adult daily life, they
will stop attending. Thus, in addition to allowing easy access, individualized self-paced
instruction, and intuitive design for low literacy adult learners, AutoTutor was designed
to optimize engagement. First, lessons were carefully scripted to contain texts that have
practical value to the adult (such as rental agreements, job applications, recipes, health
information) or are expected to interest adults. Second, texts are adaptively selected by
AutoTutor to be at a reading level that the student can handle (not too hard or too easy),
so that the student does not become frustrated or bored. Third, trialogues were written
to boost the self-esteem of the adult learner who may feel embarrassment or shame over
his or her skill level. Both agents express positive encouraging messages when the adult
is not performing well, and sometimes stage game-like competitions between the adult
and a peer agent (with the adult always winning, thereby enhancing self-esteem). These
caring functionalities of AutoTutor help create situations that users find engaging and
welcoming and simultaneously allow the system to assess learner ability.</p>
    </sec>
    <sec id="sec-3">
      <title>The Multilevel Framework of Comprehension</title>
      <p>The Graesser and McNamara [3] framework identifies six theoretical levels: words,
syntax, the explicit textbase, the referential situation model, the discourse genre and
rhetorical structure, and the pragmatic communication level (between speaker and
listener, or writer and reader). Because AutoTutor for CSAL includes only one lesson for
syntax and none for pragmatic communication, we did not include these levels in our
study. Of the levels we included, word represents the lower-level basic reading
components that include morphology, word decoding, and vocabulary. The textbase consists
of the meaning of the explicit ideas in sentences and texts. The referential situation model
(sometimes called the mental model) represents the subject matter that the texts are
describing. Genre and rhetorical structure focuses on the type of discourse and its
composition, such as narrative, persuasive, and informational genres, and also the
subcategories of these genres. The last three theoretical levels (all except word) represent
deeper discourse levels.</p>
      <p>We hypothesize that the accuracy and time on questions in AutoTutor will be
diagnostic of adult learners’ mastery of comprehension components. By comparing the
accuracy and time on questions at the four theoretical levels [3], we can better pinpoint where
adult learners’ strengths and weaknesses in reading comprehension lie. Such results can
provide a more nuanced diagnosis of reading problems than a single overall
performance score and ultimately help improve the adaptivity of an ITS like AutoTutor. We
also hypothesize that adult learners who do not answer correctly on the first attempt,
and who receive guidance through hints or prompts for the second attempt, will perform
better than chance on these questions. These results will provide insight into AutoTutor’s
effectiveness in helping adult learners with reading comprehension.</p>
      <sec id="sec-3-1">
        <title>Method</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Participants</title>
      <p>The participants were 52 adults recruited from CSAL literacy classes in Metro-Atlanta
(n = 20) and Metro-Toronto (n = 32). They worked on 29 lessons during a four-month
intervention. Each lesson took 20 to 50 minutes to complete. Their ages ranged from
16 to 69 years (Mean = 40, SD = 14.97). Most of the participants were female (73.1%).
All participants read at grade levels 3.0–7.9, and 30% reported that they were either
diagnosed as learning disabled or attended special education classes in their childhood.</p>
    </sec>
    <sec id="sec-5">
      <title>Measures and Data Collection</title>
      <p>Only the adults’ initial responses (1 as correct, 0 as incorrect) to medium level questions
in each of the 29 lessons contributed to the diagnostic analysis. This ensured a balanced
design, as all participants were assigned the medium level texts, but not all participants
subsequently received the easy or difficult texts. In addition, the medium level
questions produce a higher level of discrimination. We used only the initial (as opposed to
second) attempts at questions because we felt these would best reveal adults’ actual
mastery of the theoretical levels of comprehension. For these medium-level observations,
we collected the accuracy (1 or 0) and the time to produce an answer (in seconds). Time
was measured from the onset of the question to the onset of the participant’s answer.</p>
      <p>To assess the effectiveness of the hints or prompts, we collected accuracy (1 or 0) of
the second attempt to all questions which were answered incorrectly on the first attempt
by learners. Second attempts involved all difficulty levels (medium, easy, and hard).</p>
      <p>We calculated accuracy and time measures for 29 lessons. Most of the lessons address
more than one theoretical level (at most three), and the levels vary in relevance
within a lesson. For example, the lesson “Compare and Contrast” addresses mainly the
rhetorical structure level, but also includes material involving the textbase and
situation model levels. Thus, we included a relevance score for each of the four theoretical
levels for each lesson. The most relevant theoretical level in a lesson received a score
of 1.00, with scores of 0.67 and 0.33 assigned to the second and third most relevant levels,
respectively. The fourth theoretical level received a score of 0.00 and was thus nullified for that lesson.</p>
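      <p>The relevance scoring described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name, the level identifiers, and the order of the second and third levels for “Compare and Contrast” are assumptions made for the example, as is the choice to assign 0.00 to every level a lesson does not list.</p>
      <preformat>
```python
# Hypothetical helper; names are illustrative and not taken from CSAL AutoTutor.
LEVELS = ("word", "textbase", "situation_model", "rhetorical_structure")
RANK_SCORES = (1.00, 0.67, 0.33)  # most, second, and third most relevant level


def relevance_scores(ranked_levels):
    """Map a lesson's levels, ordered from most to least relevant, to scores.

    Levels that are not listed receive 0.00 (an assumption for lessons that
    address fewer than three levels).
    """
    if len(ranked_levels) not in (1, 2, 3):
        raise ValueError("a lesson addresses between one and three levels")
    if len(set(ranked_levels)) != len(ranked_levels):
        raise ValueError("levels must not repeat")
    unknown = set(ranked_levels) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    scores = {level: 0.00 for level in LEVELS}
    for level, score in zip(ranked_levels, RANK_SCORES):
        scores[level] = score
    return scores


# "Compare and Contrast": rhetorical structure is the main level; the order
# of the other two levels is assumed here for illustration.
compare_contrast = relevance_scores(
    ["rhetorical_structure", "textbase", "situation_model"]
)
print(compare_contrast)
```
      </preformat>
      <p>Keeping one score per level and lesson lets the per-lesson accuracy and time be related to each theoretical level as a continuous measure, which is how the relevance scores enter the follow-up correlational analysis.</p>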
    </sec>
    <sec id="sec-6">
      <title>Data Analysis</title>
      <p>From each set of participant log files, we extracted time and accuracy data for the 29
lessons. We found that the distribution of response time per question was positively
skewed. To reduce the bias introduced by potential outliers, we truncated the data by
replacing observations more than three standard deviations above the mean with the
value at three standard deviations above the mean.</p>
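      <p>The truncation step can be illustrated with a short Python sketch. It is a hedged example only: the function name and the sample data are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.</p>
      <preformat>
```python
from statistics import mean, stdev


def cap_upper_outliers(times, n_sd=3.0):
    """Cap response times at mean + n_sd * SD of the original data.

    Uses the sample standard deviation (an assumption). Values at or
    below the cutoff are returned unchanged.
    """
    cutoff = mean(times) + n_sd * stdev(times)
    return [min(t, cutoff) for t in times]


# Hypothetical response times in seconds: 19 typical answers and one
# very slow answer.
response_times = [30.0] * 19 + [400.0]
capped = cap_upper_outliers(response_times)
print(capped[-1])  # the slow answer is pulled down to the cutoff
```
      </preformat>
      <p>The cutoff is computed once from the untruncated data, so capping one extreme value does not change the threshold applied to the others.</p>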
      <p>We first performed a descriptive analysis of the data by exploring the means and
standard deviations of accuracy and time on questions of the four theoretical levels.
Next we used mixed effect modeling [9], where item (question) was the unit of analysis,
to test for differences in time and accuracy among the four theoretical levels. To account
for the variability in participants, lessons, and questions, these components were
included in the linear mixed effect models as random intercepts. We also added
by-participant random slopes on different theoretical levels and random intercepts of the
interaction between lesson and item for the nesting relationships. Follow-up correlational
analyses were performed on the continuous measures of theoretical levels, as well as
on the accuracy and time for the 29 lessons. In addition, we conducted an exact binomial
test on the accuracy of second attempts to see if the proportion of correct responses was
greater than chance (50%).</p>
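      <p>One way to write down an accuracy model of this kind is sketched below. The notation is ours and the coding is an assumption: the text does not specify how theoretical level was entered, so the sketch uses indicator variables x for the textbase (T), situation model (S), and rhetorical structure (R) levels, with the word level as the reference. Here y is the first-attempt accuracy of participant p on item i of lesson l; u, v, and w are random intercepts for participant, lesson, and item; z is the random intercept for the lesson-by-item interaction; and b denotes the by-participant random slopes.</p>
      <preformat>
```latex
\operatorname{logit} \Pr(y_{pli} = 1)
  = \beta_0
  + \sum_{k \in \{T, S, R\}} (\beta_k + b_{pk})\, x_{ik}
  + u_p + v_l + w_i + z_{li}
```
      </preformat>
      <p>Under the same assumptions, the response time model would replace the logit link with an identity link and add a residual error term.</p>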
      <sec id="sec-6-1">
        <title>Results</title>
        <p>Results from our logistic mixed effect model of response accuracy showed a
significant difference (χ<sup>2</sup>(3) = 8.34, p = 0.040) in accuracy among the four theoretical levels.</p>
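          <sec id="sec-6-1-5">
          <table-wrap>
            <table>
              <thead>
                <tr>
                  <th />
                  <th />
                  <th>Word</th>
                  <th>Textbase</th>
                  <th>Situation Model</th>
                  <th>Rhetorical Structure</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td />
                  <td>No. of Items</td>
                  <td>1455</td>
                  <td>1981</td>
                  <td>5049</td>
                  <td>5071</td>
                </tr>
                <tr>
                  <td rowspan="3">Accuracy</td>
                  <td>Model Parameter</td>
                  <td>1.66</td>
                  <td>-0.588</td>
                  <td>-0.763</td>
                  <td>-0.584</td>
                </tr>
                <tr>
                  <td>p Value</td>
                  <td>--</td>
                  <td>0.058</td>
                  <td>0.004</td>
                  <td>0.028</td>
                </tr>
                <tr>
                  <td>Estimated Odds</td>
                  <td>1.66</td>
                  <td>1.07</td>
                  <td>0.894</td>
                  <td>1.07</td>
                </tr>
                <tr>
                  <td rowspan="3">Time</td>
                  <td>Model Parameter</td>
                  <td>34.3</td>
                  <td>2.23</td>
                  <td>2.84</td>
                  <td>3.15</td>
                </tr>
                <tr>
                  <td>p Value</td>
                  <td>--</td>
                  <td>0.804</td>
                  <td>0.716</td>
                  <td>0.694</td>
                </tr>
                <tr>
                  <td>Predicted Time</td>
                  <td>34.3</td>
                  <td>36.5</td>
                  <td>37.1</td>
                  <td>37.7</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>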
          <p>Our correlational analysis showed a significant positive correlation between mean
accuracies on 29 lessons and word level (r = .386, p &lt; .05), but this correlation did not
extend to any of the discourse levels. The times showed no significant correlations
among theoretical levels. The pattern of correlations reinforced the results of mixed
effect models of accuracy and time. In addition, the word level had a significant
negative correlation with each of the three discourse levels (textbase, situation model,
rhetorical structure, with r values of -0.365, -0.485, and -0.567, respectively).</p>
          <p>The exact binomial test, with 712 correct responses out of 1044 questions,
showed that the proportion of correct responses was significantly greater than chance
(one-tailed p &lt; 0.001).</p>
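          <p>The one-tailed exact binomial test on the second attempts can be reproduced with the Python standard library. The sketch below is illustrative: the function name is hypothetical, and it uses exact rational arithmetic only to avoid floating-point overflow with the large binomial coefficients involved.</p>
          <preformat>
```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(successes, trials, chance=Fraction(1, 2)):
    """Exact P(X >= successes) for X ~ Binomial(trials, chance).

    `chance` should be a Fraction so that every term stays exact.
    """
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts reported above: 712 correct second attempts out of 1044,
# tested against a chance level of 50%.
p_value = binomial_upper_tail(712, 1044)
print(float(p_value))
```
          </preformat>
          <p>The chance level of 50% reflects the two response options that remain on the second attempt.</p>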
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>Discussion and Conclusion</title>
        <p>We fitted mixed effect models and ran a correlation analysis to see if there were
differences in adult learners’ accuracy and response times on questions at each of the
four theoretical levels. As expected, the results indicated that adult learners’
performance on the word level was higher than on the three discourse levels, and the
correlational analysis reinforced this trend. One reason for adult learners’ higher performance for word
level items is that word items tend to focus on individual words or single sentences.
This type of stimulus is less taxing on working memory compared to items that address
deeper discourse levels, which are more time-consuming, strategic, and taxing on
cognitive resources.</p>
        <p>In a previous study [6], learning gains within the four theoretical levels were tracked
by considering performance on all items (medium, easy, and hard). Results revealed
learning occurred for lessons involving rhetorical structure, but not on other theoretical
levels. This implies that learning gains may be affected by the particular time frame
(i.e., within lessons versus across lessons) used for assessment, the difficulty of the
words and texts, and the specific theoretical levels being used. Future work is needed
to further clarify these issues.</p>
        <p>With respect to response time, we found no difference between theoretical levels,
despite a trend in the data that suggested learners were slower to respond as theoretical
level increased. Part of the explanation for this apparent discrepancy may be the
modest sample size (N = 52), which did not provide adequate power to detect all
differences. Another reason may be disengagement: the data may have been muddied by
adult learners who became bored or distracted. Identifying chunks of disengagement
and either removing or controlling for these periods in our analysis may reveal relevant
response time variability.</p>
        <p>The results of the exact binomial test indicated that hints and prompts significantly
increased a learner’s probability of correctly answering a question that he or she had
previously answered incorrectly. This led us to the conclusion that the trialogues in
AutoTutor did help learners.</p>
        <p>In summary, we showed how AutoTutor can be used to assess reading ability in low
literacy adults and how AutoTutor trialogues scaffold learning of reading
comprehension skills. By assessing comprehension within a multi-level theoretical framework, we
attempted to provide a more nuanced diagnosis of adults’ reading abilities than a single
overall performance score. Future research could focus on designing comprehension
tests for each of the theoretical levels of the multilevel comprehension framework. The
results of these tests could be used to establish target population norms for each of the
six components of comprehension. Knowing the range of abilities of the target adult
population could help designers develop more adaptive intelligent tutoring systems for
adult literacy and provide customized learning content to low literacy adults.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Acknowledgements</title>
        <p>This research was supported by the National Center for Education Research (NCER) in
the Institute of Education Sciences (IES) (R305C120001) and the National Science
Foundation Data Infrastructure Building Blocks program under Grant No.
ACI1443068.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>
          <string-name><surname>OECD</surname></string-name>
          (<year>2013</year>)
          <article-title>Time for the U.S. to Reskill? What the Survey of Adult Skills Says</article-title>.
          OECD Skills Studies. OECD Publishing
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>
          <string-name><surname>Vernon</surname> <given-names>JA</given-names></string-name>,
          <string-name><surname>Trujillo</surname> <given-names>A</given-names></string-name>,
          <string-name><surname>Rosenbaum</surname> <given-names>SJ</given-names></string-name>,
          <string-name><surname>DeBuono</surname> <given-names>B</given-names></string-name>
          (<year>2007</year>)
          <article-title>Low health literacy: Implications for national health policy</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>
          <string-name><surname>Graesser</surname> <given-names>AC</given-names></string-name>,
          <string-name><surname>McNamara</surname> <given-names>DS</given-names></string-name>
          (<year>2011</year>)
          <article-title>Computational analyses of multilevel discourse comprehension</article-title>.
          <source>Topics in Cognitive Science</source>
          <volume>3</volume>(<issue>2</issue>):<fpage>371</fpage>-<lpage>398</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>
          <string-name><surname>Graesser</surname> <given-names>AC</given-names></string-name>,
          <string-name><surname>Li</surname> <given-names>H</given-names></string-name>,
          <string-name><surname>Forsyth</surname> <given-names>C</given-names></string-name>
          (<year>2014</year>)
          <article-title>Learning by Communicating in Natural Language with Conversational Agents</article-title>.
          <source>Curr Dir Psychol Sci</source>
          <volume>23</volume>:<fpage>374</fpage>-<lpage>380</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>
          <string-name><surname>McNamara</surname> <given-names>DS</given-names></string-name>,
          <string-name><surname>O'Reilly</surname> <given-names>TP</given-names></string-name>,
          <string-name><surname>Best</surname> <given-names>RM</given-names></string-name>,
          <string-name><surname>Ozuru</surname> <given-names>Y</given-names></string-name>
          (<year>2006</year>)
          <article-title>Improving Adolescent Students' Reading Comprehension with iSTART</article-title>.
          <source>Journal of Educational Computing Research</source>
          <volume>34</volume>:<fpage>147</fpage>-<lpage>171</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>
          <string-name><surname>Shi</surname> <given-names>G</given-names></string-name>,
          <string-name><surname>Pavlik Jr</surname> <given-names>P</given-names></string-name>,
          <string-name><surname>Graesser</surname> <given-names>AC</given-names></string-name>
          (<year>2017</year>)
          <article-title>Using an additive factor model and performance factor analysis to assess learning gains in a tutoring system to help adults with reading difficulties</article-title>.
          In: Hu X, Barnes T, Hershkovitz A, Paquette L (eds)
          <source>Proceedings of the 10th International Conference on Educational Data Mining</source>,
          pp <fpage>376</fpage>-<lpage>377</lpage>. EDM Society, Wuhan, China
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>
          (<year>2017</year>)
          <article-title>Reading Comprehension Lessons in AutoTutor for the Center for the Study of Adult Literacy</article-title>.
          In: Crossley S, McNamara DS (eds)
          <source>Adaptive Educational Technologies for Literacy Instruction</source>,
          pp <fpage>288</fpage>-<lpage>294</lpage>. Routledge, New York
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>
          <string-name><surname>Graesser</surname> <given-names>AC</given-names></string-name>,
          <string-name><surname>Feng</surname> <given-names>S</given-names></string-name>,
          <string-name><surname>Cai</surname> <given-names>Z</given-names></string-name>
          (<year>2017</year>)
          <article-title>Two technologies to help adults with reading difficulties improve their comprehension</article-title>.
          In: Segers E, Van den Broek P (eds)
          <source>Developmental perspectives in written language and literacy: In honor of Ludo Verhoeven</source>,
          pp <fpage>295</fpage>-<lpage>313</lpage>. John Benjamins Publishing Company
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <label>9</label>
        <mixed-citation>
          <string-name><surname>Bates</surname> <given-names>D</given-names></string-name>,
          <string-name><surname>Mächler</surname> <given-names>M</given-names></string-name>,
          <string-name><surname>Bolker</surname> <given-names>B</given-names></string-name>,
          <string-name><surname>Walker</surname> <given-names>S</given-names></string-name>
          (<year>2015</year>)
          <article-title>Fitting Linear Mixed-Effects Models Using lme4</article-title>.
          <source>Journal of Statistical Software</source>
          <volume>67</volume>:<fpage>1</fpage>-<lpage>48</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>