=Paper=
{{Paper
|id=Vol-3051/LDI_1
|storemode=property
|title=The Learner Data Institute—Conceptualization:  A Progress Report
|pdfUrl=https://ceur-ws.org/Vol-3051/LDI_1.pdf
|volume=Vol-3051
|authors=Vasile Rus,Stephen E. Fancsali,Philip Pavlik Jr.,Deepak Venugopal,Arthur C. Graesser,Steve Ritter,Dale Bowman,The LDI Team
|dblpUrl=https://dblp.org/rec/conf/edm/RusFPVGRBT21
}}
==The Learner Data Institute—Conceptualization:  A Progress Report==
<pdf width="1500px">https://ceur-ws.org/Vol-3051/LDI_1.pdf</pdf>
<pre>
              The Learner Data Institute—Conceptualization:
                           A Progress Report
                                                   Vasile Rus1, Stephen E.
                                                  Fancsali2, Philip Pavlik, Jr.1,
                                                 Deepak Venugopal1, Arthur C.
                                                 Graesser1, Steve Ritter2, Dale
                                                 Bowman1, and The LDI Team
                                                       1The University of Memphis
                                                         2Carnegie Learning, Inc.

                                                     vrus@memphis.edu,
                                                sfancsali@carnegielearning.com


ABSTRACT                                                                The LDI is a “frameworks” project funded by the United States’
This paper provides a progress report on the first 18 months of         National Science Foundation (NSF) under the Data-intensive
Phase 1, the conceptualization phase, of the Learner Data Institute     Research in Science and Engineering (DIRSE) program to make
(LDI; www.learnerdatainstitute.org). Phase 1 is to be followed by       the learning ecosystem more effective, efficient, engaging,
Phase 2, the institute                                                  equitable, relevant, and affordable. It is part of the NSF’s
or convergence phase. The current 2-year conceptualization phase        Harnessing the Data Revolution1 (HDR) Institutes effort. “HDR
has two major goals: (1) develop, implement, evaluate, and refine       Institutes… enable breakthroughs in science and engineering
a framework for data-intensive science and engineering for the          through collaborative, co-designed programs to formulate
future institute, and (2) use the framework to provide prototype        innovative data-intensive approaches to address critical national
solutions, based on data, data science, and science convergence, to     challenges” (NSF-HDR, 2021). LDI focuses on data-intensive
a number of core challenges in learning science and engineering.        approaches to developing and improving learning environments
By targeting a critical mass of key challenges that are at a tipping    that include adaptive instructional systems as a means to address
point, LDI aims to start a chain reaction that will transform the       the challenge of offering access to high-quality education to
whole learning ecosystem. We will emphasize here the key                everyone—no matter what neighborhood they live in, and
elements of the LDI science convergence framework that our team         regardless of gender, race, national origin, native language,
developed, implemented, and now is in the process of evaluating         personal interests, or any other factor that might limit such access
and refining. We highlight important outcomes of the convergence        and educational opportunity.
framework and related processes, including a 5-year plan for the        There is a twofold focus during the current 2-year conceptualization
institute phase and data-intensive prototype solutions to transform     phase: (1) develop, implement, evaluate, and refine a framework
the learning ecosystem.                                                 for data-intensive science and engineering, and (2) use the
                                                                        framework to provide prototype solutions, based on data, data
Keywords                                                                science, and science convergence, to a number of core challenges
                                                                        in learning science and engineering. The institute or convergence
big data in education, science convergence, learning engineering,
                                                                        phase would build on results realized and insights gained from this
adaptive instructional systems, intelligent tutoring systems.
                                                                        conceptualization phase. By targeting a critical mass of key
1. INTRODUCTION                                                         challenges that are at a tipping point (i.e., targeting challenges for
This paper provides a progress report on the first 18 months of the     which timely investment in data-intensive approaches has the
two-year conceptualization phase of the Learner Data Institute          maximum potential for a transformative effect), LDI will start a
(LDI; www.learnerdatainstitute.org). The present work updates           chain reaction that will transform the whole learning ecosystem,
that of Rus et al. (2020), which provided an introduction to LDI and    lifting it to a qualitatively higher state that is more effective,
early activities and outcomes. We emphasize here the                    engaging, equitable, relevant, and affordable. Indeed, since the
developments of the past 12 months (since the 2020 paper),              learning ecosystem is a complex web of interrelated elements,
focusing on the key elements of the science convergence                 improvements in key aspects will percolate throughout the whole
framework, its development, implementation, evaluation, and             learning ecosystem.
refinement, and key outcomes such as the 5-year plan of the future      LDI has brought together a team which currently consists of 60+
institute and data-intensive prototype solutions to address key         researchers, developers, and practitioners from three continents
challenges in the learning ecosystem.                                   spanning many disciplines and backgrounds. Team members are


Copyright © 2021 for this paper by its authors. Use permitted under     1 https://www.nsf.gov/cise/harnessingdata/

Creative Commons License Attribution 4.0 International (CC BY
4.0).
drawn from institutions and organizations representing academia,          data, together with advanced data science methods, are likely to
government, and industry.                                                 offer insights about learning and instruction and lead to the
Together, we intend a rigorous test of the hypothesis that emerging       development of effective and affordable instructional tools that
learning ecologies that incorporate adaptive instructional systems        were not possible before. This is promising enough to suggest that
(AISs) are capable of providing affordable, effective, efficient,         the learning ecosystem is at a tipping point for transformation.
equitable, and engaging individualized assistance for both learners       Indeed, LDI is built on the belief that AISs constitute a necessary
and instructors, and that the characteristics, parameters, and            catalyst to enable the transformation of the learning ecosystem
impacts of these systems, for example, effectiveness (in terms of         through harnessing the data revolution because, as noted earlier,
learning gains), can be improved over time given sufficient               AISs can monitor and scaffold learners at a very fine granularity
attention to evidence, captured as data, and expertise, provided by       level, at scale, and across time. It should be noted that much of
teams of interdisciplinary researchers like ours.                         education data (e.g., that currently collected by schools) relies on a set
The idea that AISs and data science have the potential to radically       of predefined competencies or standards to monitor student
transform existing learning ecosystems is based on the following:         progress. Such data reveal only what students know or have mastered
(1) evidence suggesting that individualized instruction is generally      and what they don’t know (or haven’t mastered yet), but such data often
more effective than traditional classroom instruction where               do not reveal much about the learning and instructional process.
monitoring and tailored support to each individual learner is not         That is, much of the school data focus on “where the student is” but
possible (Bloom, 1984; Chi, Roy, & Hausmann, 2008; Cohen,                 not what they do during instructional activities. Fundamentally,
Kulik, & Kulik, 1982; VanLehn et al., 2007); (2) the capability of        teachers and schools in general lack the capacity to monitor and
modern technologies to collect, store, and access vast and rich           store data about all students at every single step of the learning and
learner data; (3) incentive-based mechanisms to share goods such          instruction process. LDI will thus offer schools a new powerful
as education data using online market places (Hartline, 2012;             framework to understand, monitor, and intervene at a fine-grain
Hartline et al., 2019) and secure and privacy preserving ways to          level with potentially transformative effects on the learning
access and process data based on differential privacy and multi-          ecosystem.
party computation (Dwork, 2008; Wang, Ranellucci, & Katz,
2017); (4) promising new advances in data science, including
                                                                          3. FRAMEWORK FOR SCIENCE
powerful machine learning and statistical methods such as deep            CONVERGENCE
neural networks, statistical relational learning, causal modelling,       A major goal of the LDI conceptualization phase is to develop,
and probabilistic temporal graphs, for extracting useful knowledge        implement, test, and refine a framework for data-intensive research
from massive educational data sets (Spirtes, Glymour, & Scheines,         in science and engineering enabling science convergence, aligning
2001; LeCun, Bengio, & Hinton, 2015; Schmidhuber,                         with the Growing Convergence Research (GCR) “big idea”
2015; Bach, Broecheler, Huang, & Getoor, 2017; Pearl &                    identified by the National Science Foundation.
Mackenzie, 2018); and (5) recently available access to affordable,
                                                                          According to NSF, “convergence research is a means of solving
powerful, and scalable cloud-based computing resources for
                                                                          vexing research problems, in particular, complex problems
processing big data (Hellerstein et al., 2019; Atwal, 2020).
                                                                          focusing on societal needs. It entails integrating knowledge,
2. DATA SCIENCE AND AISs — A                                              methods, and expertise from different disciplines and forming
                                                                          novel frameworks to catalyze scientific discovery and innovation.”
TRANSFORMATIVE MIX FOR THE                                                Also, “convergence is a deeper, more intentional approach to the
LEARNING ECOSYSTEM                                                        integration of knowledge, techniques, and expertise from multiple
The LDI is founded on the key observation that data science and           disciplines in order to address the most compelling scientific and
AISs are a powerful mix with potentially transformative impact on         societal challenges” (NSF-GCR, 2020).
the learning ecosystem.
                                                                          NSF identifies Convergence Research as having two primary
Big educational data (edu-data) create tremendous opportunities to        characteristics:
reveal facets along which learner experiences can be tailored or
adapted in ways heretofore impossible. A particular learning                  “Research driven by a specific and compelling problem.
environment may result in different learning outcomes for different            Convergence Research is generally inspired by the need to
(groups of) students because of students’ idiosyncratic prior                  address a specific challenge or opportunity, whether it arises
knowledge, experience(s), interest(s) and motivation(s). A small               from deep scientific questions or pressing societal needs.”
minority of students, for example, that approach a problem in a               “Deep integration across disciplines. As experts from
unique way could be overlooked in a small dataset, but larger                  different disciplines pursue common research challenges, their
datasets give us the possibility to detect and account for individual          knowledge, theories, methods, data, research communities and
differences in learning. To this end, our mission is to harness the            languages become increasingly intermingled or integrated.
data revolution to further our understanding of how people learn.              New frameworks, paradigms or even disciplines can form
AISs can monitor and scaffold learners at a fine level of granularity          sustained interactions across multiple communities”
(e.g., capturing every single step during instructional activities) and        (NSF-GCR, 2020).
with respect to many aspects of learning (e.g., cognitive,
                                                                          LDI’s compelling problem is making the learning ecosystem more
behavioral, affective, social, motivational facets of learning) at
                                                                          effective, engaging, equitable, efficient, relevant, and affordable.
scale (i.e., for millions of learners and teachers and across many
topics and domains) and across time periods (e.g., across grade-          To foster deep integration across scientific disciplines, we have put
levels). Such rich data, when collected, can be characterized as deep     in place a convergence framework, comprising a diverse team,
(many data instances from millions of learners), wide (capturing          organizational structures, processes, mechanisms, activities, and
many aspects of the learning process at a fine granularity level), and    tools, meant to encourage broad participation, coordination,
long (longitudinal, i.e., across time and grade levels). Such big edu-
collaboration, and diffusion and integration of knowledge across             Incentives for team members to proactively and deeply engage
disciplines.                                                                  in convergent activities and working towards accomplishing
                                                                              the goal/mission of the team which is to solve the compelling
LDI has intentionally sought, from its inception, to follow NSF’s
                                                                              problem:
characterization of convergence research by “intentionally
bring[ing] together [from the inception] intellectually diverse                    o    Resources
researchers and stakeholders to frame … research questions,
develop effective ways of communicating across disciplines and                     o    Freedom to propose research tasks that fit their own
sectors, adopt common frameworks for their solution, and, when                          interests and align with the LDI mission
appropriate, develop a new scientific vocabulary” (NSF-GCR,                        o    Bottom-up and top-down strategies for agenda
2020). The LDI team seeks, where possible, to develop “sustainable                      setting
relationships that may not only create solutions to the problem that
engendered the collaboration, but also develop novel ways of                       o    Semi-autonomous teams/groups
framing related research questions and open new research vistas”                   o    Flexible, open structure
(NSF-GCR, 2020).
                                                                             Progress monitoring and refinement of the convergence
To make these intentions a reality, LDI’s leadership team and
                                                                              framework
participants have designed, prototyped, and tested a process and a
corresponding set of tools designed to transform what is currently       Our framework will enable team members to develop a shared
a loosely coupled group of research centers, AIS commercial              vision and language, which over time should lead to effective and
providers, and government research labs engaged in similar but           meaningful cross-discipline collaborations, i.e., science
disparate research and development efforts into a set of interacting     convergence. Such mutual sense-making, science convergence,
teams (Berry, 2011; Lilian, 2014), in aggregate constituting a           and R&D efforts are likely to incubate solutions to complex
physical and virtual community of practice (Lave & Wenger,               problems to enable effective, efficient, engaging, equitable, and
1991). We have not and will not attempt to “tighten” the coupling        affordable learning experiences for everyone. We detail next the
between participating research centers. As Weik (1991) has argued        main components of our science convergence framework.
in respect to educational systems, loosely coupled systems have
several advantages over tightly coupled ones—not least flexibility,      3.1 LDI’s Mission and Vision
survivability (with dysfunction in individual nodes tolerable), and      LDI’s mission is to harness the data revolution (HDR) to further
increased likelihood of beneficial “mutations.” Rather, LDI’s            our understanding of how people learn, how to improve adaptive
leadership has intended to design and test a set of processes and        instructional systems (AISs), and how to make emerging learning
tools that will support the independent work of the participating        ecologies that include online and blended learning with AISs more
research centers, facilitate the flow of information and ideas within    effective, efficient, engaging, equitable, relevant, and affordable.
and across these centers, and help to keep participants focused on
                                                                         Our vision is for LDI to: (i) serve as a hub to identify investment
common problems without the need for direct intervention (e.g., in
                                                                         opportunities for data-intensive approaches to core learning science
the form of a top-down, tightly controlled research agenda).
                                                                         and engineering challenges to accelerate progress toward equitable
LDI’s team structure and processes enable the harnessing and             learning and achievement in education; (ii) foster, support, and
diffusion of expertise from various areas in an efficient and            build a portfolio of inter-related, inter-disciplinary prototyping or
effective way while fostering individual initiative and interests. For   “Scale-up Projects” to research, develop, and disseminate data-
example, LDI team members were encouraged in the                         intensive solutions across multiple academic and non-academic
conceptualization phase to propose prototyping tasks that they are       communities that currently cannot easily communicate with each
interested in and which fit the LDI mission statement (see more          other, embodying a process of science convergence; (iii) bridge the
details later). Organizational structures and processes are              HDR ecosystem with the educational data science and learning
intentionally open, flexible, and scalable to enable the LDI to grow     engineering community and the broader education world, and, in
and transform based on emerging findings and partnerships with           particular, serve as the education & training hub for the HDR
other NSF-supported HDR teams.                                           ecosystem, assisting other teams with developing data science
                                                                         training platforms for their communities.
The key elements of the LDI convergence framework are listed
below.                                                                   LDI will forge new HDR frontiers by:
    Mission/Common Goal                                                     furthering our understanding of learning and instructional
                                                                              processes and environments;
    An intellectually diverse team with stakeholder representation
     (researchers, developers, practitioners including school and            developing data science infrastructure for the education and
     teachers’ representatives)                                               the HDR ecosystem;
    An effective and efficient team structure                               improving AISs and scaling them up both horizontally and
                                                                              vertically;
    Activities and     processes    that   foster   cross-discipline
     interactions                                                            advancing research at the human-technology frontier in future
                                                                              learning ecologies that involve AISs;
    Processes, mechanisms, and tools to nurture collaboration,
     broad participation, diffusion and integration of knowledge             transforming communities of practice (e.g., triggering a
     across disciplines, and coordination                                     culture shift in teacher training programs);
    Resources, in terms of funding, student support, travel, and            exploring how data science can address equity, ethics,
     access to big edu-data and other cyber-infrastructure resources          diversity, and inclusion aspects of education.
3.2 LDI’s Team and Team Structure                                      efforts such as concrete prototyping tasks that are being carried out
LDI’s team evolved and grew from 45+ members (see Rus et al.,          in the Phase 1 conceptualization and (2) to help shape the 5-year
2020) to over 60 as of this writing. In preparation for the longer-    plan for Phase 2 by identifying opportunities for investment (i.e.,
term “convergence” or institute phase (LDI Phase 2), we have           promising developments in one area that could benefit the other
extended our interdisciplinary team to include additional              areas or specific activities of the institute).
researchers and personnel from academia, K-12 schools, industry,
and government, giving us access to the necessary stakeholders,
infrastructure, expertise, and learning data to pursue targeted
investment opportunities.
LDI is led by the Institute of Intelligent Systems at The University
of Memphis and main corporate partner Carnegie Learning,
developer of commercial-grade AISs serving over 500,000 students
in 2,000+ school districts. The assembled team now spans 14 main
organizations on 3 continents, including NSF-funded partners such
as the Institute for Data, Econometrics, Algorithms, and Learning
(IDEAL; NSF HDR TRIPODS project led by researchers at
Northwestern University) and LearnSphere: Building a Scalable
Infrastructure for Data-Driven Discovery and Innovation in
Education (NSF DIBBs project; Carnegie Mellon University lead).
                                                                                          Figure 1. LDI team structure.
In addition, partners include researchers, practitioners, and other
stakeholders from the US Army’s Generalized Intelligent                The following Expert Panels were initially formed: Data Science,
Framework for Tutoring project (Sottilare et al, 2016) and 6           K-12 Education, Learning Sciences, Learning Systems
additional corporate partners, 3 laboratory schools (The Early         Engineering, Ethics & Equity, and Human-Technology Frontier.
Learning & Research Center, Campus Elementary School, and              Expert Panel membership is flexible; LDI participants may belong
University Middle School in Memphis, TN), 3 K-12 school districts      to more than one Expert Panel but must be actively engaged in at
- Shelby County Schools (Memphis, TN area; 200 schools, 100,000        least one. Expert Panels have co-leaders who are responsible for
students), Brockton Public Schools (Boston, MA area; 24 schools,       ensuring that the panels successfully reach milestones (e.g.,
15,000 students), Val Verde Unified School District (Los Angeles,      reviewing concrete tasks).
California area; 21 schools, 20,000 students), and one teacher
training program at Christian Brothers University.                     Concrete tasks or “Scale-Up Projects” are prototyping endeavors
                                                                       led by individual researchers (see the section on Building
3.3 Team Structure                                                     Prototypes for Concrete Tasks later). Examples of concrete tasks
The team structure consists of a leadership team, domain-oriented      include projects directed at scaling data-driven domain model
Expert Panels, and task-oriented groups that in the                    refinement, using auto-encoders for student assessment, and data-
conceptualization phase have driven prototyping projects for very      driven instructional strategy discovery.
concrete, well-defined tasks, hence called concrete tasks.
                                                                       3.4 Stakeholder Representation
The LDI Core Leadership Team is responsible for overseeing and         Our team includes representatives of various communities with an
coordinating LDI activities, making sure those activities align with   vested interest in the learning ecosystem such as researchers,
the mission of the institute and offering necessary support for        developers, practitioners, government, policymakers, and funders.
cohesiveness of activities. The Leadership Team consists of Lead       Nevertheless, there are gaps in LDI’s expertise. For instance, we do
Principal Investigator (PI) Dr. Vasile Rus, Carnegie Learning          not currently have representatives from domains including
Principal Investigator Dr. Stephen Fancsali (co-PI), and co-PIs        neuroscience, the law, and social and moral philosophy, primarily
from University of Memphis: Dr. Dale Bowman, Dr. Philip Pavlik,        due to Phase 2 budget constraints. We hope to account for such
and Dr. Deepak Venugopal. Project coordinator Jody Cockroft,           expertise through ad-hoc engagement with appropriate experts
Senior Research Scientist Dr. Donald Morrison, and Dr. Arthur          (e.g., review and feedback from targeted experts in those areas).
Graesser, a Professor Emeritus at The University of Memphis,
                                                                       While diverse opinions and perspectives are represented within the
round out the Leadership Team.
                                                                       team and make possible greater organizational learning and
LDI Expert Panels are homogeneous in terms of expertise in order       synergy, interdisciplinary teams also deal with the pull of
to maximize intellectual coverage of particular research areas, as     competing loyalties and demands (Berry, 2011). Sense-making of
individual researchers are specialized in different subareas of a      the beliefs or actions of others (here, disparate experts) is a constant
relatively broad area such as Data Science or Learning Science.        struggle in team environments (Guribye, Andressen, & Wasson,
Expert Panels were composed in this homogeneous way to                 2003), and this difficulty can be exacerbated by the greater
encourage meaningful discussions from the start leading to more        intellectual diversity of the team. Shared goals and shared
efficient and engaging conversations early on, benefitting team        understandings are required, and negotiation of these common
building and engagement. Cross-domain interactions are more            goals is an intrinsic part of the team-building process. Effective
challenging. One major purpose of LDI is to engage our team            social relationships are a required constant for effective
members (including Expert Panels) in cross-domain interactions         collaborative work, virtual or face to face, and they may develop more
that develop shared sense making, a common language, and               slowly at first (Vroman & Kovachich, 2002; Walther, 1995).
mission-driven culture over time.
                                                                       3.5 Convergence Processes
The role of the Expert Panels is twofold: (1) to provide solid         A key element of the LDI convergence framework is a set of
(breadth and depth) input from an area of expertise to all LDI         processes, mechanisms, and tools to foster collaboration, broad
participation, diffusion and integration of knowledge across           3.6 New Shared Vocabulary
disciplines, and coordination.                                         LDI participants have started to develop an emerging shared
LDI has implemented an iterative process of idea and solution          vocabulary and language, which enables more effective and
generation and refinement that includes internal (from other LDI       efficient communication and collaboration across disciplines and
members) and external (paid, external ad-hoc reviewers) feedback       which constitutes a key ingredient of convergence research. For
loops. Furthermore, we have set in place synchronous and               instance, new vocabulary includes introducing many team
asynchronous, face-to-face and virtual coordination, collaboration,    members to the notion of convergence research, concrete tasks or
and communication channels supported by adequate processes that        “Scale-Up Projects,” “learner model,” “cloud continuum,” scaling-
will facilitate exchange of ideas across disciplines. Processes that   up AISs “horizontally” and “vertically,” and AISs-teacher
enable broad participation and input from everyone were designed       partnership models. The vocabulary is dynamic and evolving. For
and implemented, including the use of the Nominal Group                instance, we have been using the term “concrete task” to indicate
Technique (NGT; Delbecq & Van de Ven, 1971) for meetings to            prototyping tasks led by researchers in LDI Phase 1 which would
ensure everyone’s voice is heard and accounted for. Other              result in some kind of data science prototype or deliverable (e.g., a
processes such as SWOT analysis (to identify strengths,                significant dataset and/or peer-reviewed publications). In this work,
weaknesses, opportunities, threats) and “pre-mortem” analysis          we use the terms “concrete task” and “Scale-Up Project” essentially
(Klein, 2007) (i.e., identifying possible points of failure            interchangeably as the latter reflects our intent for each concrete
prospectively rather than retrospectively, by imagining a future       task to scale up in some dimension in Phase 2.
situation in which a project has failed and considering how that       Synchronous and asynchronous interactions and activities have
imaginary failure might have occurred) were used as well.              enabled better communication and understanding of various
Processes implemented were intended to grow science convergence        domain-specific terms by team members with limited initial
among our large team of interdisciplinary experts. Within- and         expertise or understanding of those terms (e.g., “model parameters”
cross-domain interaction and collaboration processes were              in machine learning/data science, “domain model” in learning
designed among subgroups of our team as well as all-team               engineering, or the meaning and importance of the socio-cultural
interactions and communications (e.g., whole-team meetings,            aspects of human learning). We expect the development and
mailing lists, website) in order to develop a common vision and        emergence of a shared vocabulary and language to continue and
language and to ensure cohesiveness and clarity with respect to the    stabilize over time.
mission of the LDI, responsibility for various tasks, and engaging
the community for assistance when needed.                              3.7 New Research Vistas—Investment
                                                                       Opportunities in the 5-year Institute Plan
An abbreviated list of activities, tools, and structures LDI           Our strategy to accomplish the LDI mission of transforming the
implemented to realize the above iterative idea and solution           learning ecosystem, in a proposed 5-year institute, is to focus on a
generation and broad and deep collaborations includes:                 number of carefully selected research priorities, targeting key
    • An iterative process of idea and solution generation and         aspects of the learning ecosystem which we believe are at a “tipping
      refinement that includes internal (from other team members)      point” (i.e., a point at which timely investment in data-intensive
      and external (paid, external ad-hoc reviewers) feedback loops    approaches focusing on those critical aspects has the maximum
    • Asynchronous and synchronous, face-to-face and virtual           potential for a transformative effect).
      coordination, collaboration, and communication channels
      supported by adequate processes that will facilitate exchange    The identified research priorities were the result of an intense
      of ideas across disciplines                                      science convergence process involving a number of activities (e.g.,
    • A federation of semi-autonomous groups (e.g., Expert Panels,     brainstorming sessions or “ideas labs” followed by iterative
      concrete task teams) coordinated by a Leadership Team            discussions for ranking and selection at “all-hands” virtual
                                                                       meetings, engagement with Expert Panels, etc.). Processes and
    • Regular virtual meetings of the Core Leadership Team (as the
                                                                       activities engaged all LDI team members across many disciplines
      conceptualization phase has largely taken place during the
                                                                       (e.g., educators, education researchers, computer scientists,
      global pandemic)
                                                                       statisticians, cognitive scientists), developers (Carnegie Learning,
    • Two full-team or “all-hands” virtual meetings each year
                                                                       Age of Learning, Gooru), school districts (Shelby County Schools,
    • Two workshops (in 2020 and 2021) at the International            Brockton Public Schools), as well as researchers from other
      Conference on Educational Data Mining (to which this piece       projects funded by NSF (e.g., Northwestern’s TRIPODS Cohort II
      contributes) to engage with a broader international community
                                                                       project: IDEAL - The Institute for Data, Econometrics, Algorithms,
      of scholars                                                      and Learning; CMU’s DIBBS LearnSphere: Building a Scalable
    • Meetings at major conferences that our team members attend       Infrastructure for Data-Driven Discovery and Innovation in
    • Quarterly updates and Requests-for-Comments from Expert          Education; and the University of Memphis NSF project: Advancing
      Panels                                                           the Science of Learning Data Science with Adaptive Learning for
    • Mini-workshops in the form of full-day brainstorming             Future Workforce Development). That is, the identified research
      sessions on a particular task                                    priorities reflect our collective interdisciplinary wisdom that timely
    • Transformative app ideation at “all-hands” meetings              investment in data-intensive approaches will have the maximum
    • Email, cloud-shared documents, wikis, Slack, and other           potential for a transformative effect. The identified investment
      collaboration tools for collaboratively drafting and refining    opportunities (or research priorities) constitute the central focus of
      ideas, solutions, and processes                                  the 5-year plan for the LDI. It should be noted that we also
    • Software repository managed with version control                 generated a 10-year plan such that the impacts of the LDI Institute
      software (e.g., GitHub or SVN)                                   will propagate and evolve beyond the lifetime of the award and
    • Project management software to keep track of task progress       beyond our own team thus acting as an agent of change for how
      and major milestone deadlines and deliverables
research questions are conceived and addressed through                   learner data at scale using distributed computing (e.g., leveraging
interdisciplinary collaboration.                                         the cloud-continuum), scalable algorithms, and richer/more
Identified key investment opportunity areas or thrusts include:          powerful algorithms (e.g., emerging neuro-symbolic approaches).
                                                                         Indeed, access to data at scale is a more critical, upstream challenge
    • Investment Opportunity Area 1: Scaling Up Access To
                                                                         that needs to be addressed first, because before being able to process
      Learning Data – From Impoverished Datasets To Learning
                                                                         learning data, one must have access to the data and have permission
      Data Convergence To Comprehensive Learner Models
                                                                         to share it. LDI adopts the principle that data owners (e.g., learner/
    • Investment Opportunity Area 2: Novel, Richer, More                 parent/ guardian/ teacher/ school/ developer/ etc.) should be given
      Powerful, Scalable, and Accurate Data-intensive Solutions to       a spectrum of options with respect to data sharing or, if deciding
      Core Education Tasks                                               not to share, with respect to providing access to data. The spectrum
    • Investment Opportunity Area 3: Human Technology                    of options should accommodate all attitudes that learners/learning
      Frontier – Pushing For Wider Adoption and Integration Of           data owners may have towards data ownership, security, and
      AISs                                                               privacy. Indeed, access to learner data is a complex issue due to
                                                                         privacy, security, ownership, and regulatory concerns.
Investment Opportunity Area 1: Scaling Up Access To Learning
Data. To enable data science, there must be data and in particular       We are aware that full data convergence would be hard to achieve
“big” education data (big edu-data). To this end, a key long term        for various reasons. However, our goal is to push the limits of what
goal of LDI is learning data convergence, i.e., collecting and           is possible, understand those limits, and act accordingly.
aligning (more) comprehensive data about the same learner(s)             Understanding the limits of data convergence will allow us to
across skills, disciplines, and modalities (cognitive, meta-cognitive,   understand the limits of technology, what teachers can do to
emotional, motivational, behavioral, social) and across time (e.g.,      compensate for those limitations, and how to best orchestrate the
K-12 grade-levels), as well as data about the learning process and       learner-teacher-AISs partnership.
environment.                                                             Our data convergence activity focuses on concrete examples from
Prior efforts such as LearnSphere/DataShop have made progress            math and computer science (STEAM+C) as well as literacy and
towards building data infrastructure and capacity in education           leverages prior efforts in the area of building data infrastructure and
contexts, but slow data convergence is a critical issue that hinders     capacity, contributing to and expanding on those previous efforts to
realizing the full potential of data and data science to transform the   move us closer to the goal of full data convergence. Specifically,
learning ecosystem. For instance, the DataShop metric reports            one major goal is to build a fine-grain, large, and diverse (deep,
show that most of the data is composed of datasets in the standard       wide, long) dataset that will enable LDI to explore the potential of
DataShop format, of which there are about 3500                           data science methods to better model learners and the learner
(https://pslcdatashop.web.cmu.edu/MetricsReport).               While    process. We announced and started the process of building
accumulating this many datasets is no small feat, the average            LearnerNet in Fall 2019 as part of LDI Phase 1 (see Rus, 2019 –
number of observations per student is less than 400. A large number      ADL Directors’ meeting talk). Indeed, we have called for the
of students, greater than 800,000, is spread across more than 3000       development of LearnerNet (Rus et al., 2020), an “ImageNet” (Su,
datasets, resulting in fewer than 260 students per dataset. Similarly,   Deng, & Fei-Fei, 2012) for learner modeling which could enable a
the recently released EdNet (Choi et al., 2020) contains data from       transformation of our modelling and understanding of how learners
784,309 students preparing for the Test of English for International     learn, of how AISs can be made more capable of adapting to diverse
Communication at an average of 400.2 interactions per student.           learners, and fueling a better understanding of the learning
Despite progress in building edu-data repositories, there is an          ecosystem as a whole.
“impoverished datasets” challenge in education.                          Investment Opportunity Area 2: Novel, Richer, More Powerful,
Ideally, big edu-data would include data about millions of learners      Scalable, and Accurate Data-intensive Solutions to Core Education
that are fine-grain (e.g., step/substep level information or detailed    Tasks.
process data), rich (capturing cognitive, affective, motivational,       This investment opportunity area focuses on improving existing
behavioral, social, and epistemic facets of learning), and               methods and models with respect to their scaling and extension
longitudinal (across many grades). That is, big edu-data should be       using big edu-data and developing novel, richer, more powerful,
deep (e.g., about many learners), wide (e.g., capture as many            scalable, and accurate computational models for a number of core
learning relevant aspects as possible), and long (being longitudinal,    educational tasks such as prediction and assessment of learner
across many grades or even a learner’s lifetime). Convergence            mastery of knowledge components (KCs; micro-competencies or
efforts will seek to “deepen” samples and “lengthen” timeframes of       skills), domain model refinement (i.e., improving models of what
datasets that are (sometimes, but not always, already) “wide” in         learners need to learn to acquire mastery of a domain), and inferring
terms of features captured.                                              optimal strategies to coordinate the behavior of AISs for how and
Using these concepts, our goal can be re-stated as enabling the          when to optimally implement guidance to promote student
collection of deep, wide, and long education data which could then       learning. The goal is to improve our understanding of how learners
be analyzed using emerging, state-of-the-art data science methods        learn, improve the effectiveness and efficiency of AISs, make AISs
capable of learning patterns from such massive collections of data       more affordable and scalable horizontally (across topics and
and also accounting for input from diverse domain experts with the       domains), and scale AISs vertically (offering training on higher-
ultimate goal of transforming the learning ecosystem.                    level skills such as deep conceptual understanding and
                                                                         collaborative problem solving).
In order to fully harness the data revolution to transform the
learning ecosystem we need: (1) improved, at-scale data collection       One major opportunity from a learning engineering perspective is
and (near) real-time access to big edu-data (i.e., addressing the        the automation of the development and refinement of AISs and
“impoverished datasets” challenge) in ways that account for              adaptive instructional content. Making progress towards
security, privacy, and ownership and (2) infrastructure to process       automating the authoring of AISs should begin to enable better
                                                                         scalability across topics and domains (horizontal scalability), which
currently is a major stumbling block for a wider adoption of such          transforming communities of practice effort. To this end, we plan
systems. Expert-driven approaches to developing domain models,             to develop new curricula for data literacy to be used by teacher
learner models, and instructional strategies for new topics and            training programs.
domains are expensive, tedious, and time-consuming. Automated              Models of Learner-Teacher-AISs Partnership. Finding the best
or semi-automated approaches to discovering domain models,                 learner-teacher-AISs partnerships could have transformative
inferring learner models, and discovering instructional strategies         impact on the learning ecosystem such as freeing teachers from
are much needed. For instance, we intend to use neuro-symbolic             certain duties that AISs can do in an autonomous manner thus
approaches to automatically extract domain models from both                allowing them to focus on higher level tasks such as designing new
structured data (e.g., student performance data) and                       instructional materials or novel tailored interventions for students,
semi-structured data (i.e., text in textbooks).                            motivational support, and other tasks for which AISs are not ideal.
A second major opportunity within this thrust involves AISs for            This better distribution of duties and coordination between teachers
collaborative learning with intelligent discourse components.              and AISs should lead to a more effective, efficient, engaging, and
Widely deployed, commercial AISs largely do not target advanced            equitable learning ecosystem. We will study four levels of AISs
topics such as collaborative problem solving. Collaborative work           autonomy with respect to how teachers may use AISs (see later).
and collaborative problem-solving skills are much needed in the            Detect and Mitigate Issues Related to Ethics, Equity, Inclusion, and
21st century (Autor, Levy, & Murnane, 2003; Carnevale & Smith,             Diversity in Education. As a general principle, all LDI activities
2013), and learning activities fostering the acquisition of such skills    will be informed and guided by our goal of using data science and
must be adopted by learning ecologies of the future in order to make       AISs to promote ethics and equity in education (Riddle et al., 2015;
such ecologies more effective and equitable for all learners and           Corbett-Davies & Goel, 2018; Gardner, Brooks, & Baker, 2019).
more relevant to emerging needs and new realities. Our goal is to          At the same time, the Ethics and Equity Expert Panel will review
scale up AISs vertically, to offer training opportunities for such         all LDI efforts to ensure ethics and equity aspects are properly
advanced skills. The strategy is to extend AISs such as those              addressed. Furthermore, our institute 5-year plan includes a set of
offered by Carnegie Learning and Age of Learning with language             activities focusing on ethics and equity which fall into three
through discourse components.                                              categories: (1) using data and data science to further our
Language and discourse play a central role in learning (Vygotsky,          understanding of biases and achievement gaps in the learning
1978), particularly for the acquisition of difficult topics that require   ecosystem; (2) understanding and mitigating ethics and equity
deep comprehension, reasoning, problem solving, and                        issues throughout the data lifecycle with a focus on algorithmic bias and
collaboration that are required for higher paying jobs in the 21st         developing tools to address these issues throughout the work of the
century (Autor, Levy, & Murnane, 2003; Carnevale & Smith,                  LDI; and (3) increasing diversity and inclusion during collaborative
2013). Language and discourse are essential for developing                 learning activities.
argumentation skills (Ferretti & de la Paz, 2011), disciplinary
literacy (Goldman et al., 2016; Shanahan & Shanahan, 2008;                 3.8 Evaluation and Refinement
Shaffer, 2017), reasoning associated with mental models (Graesser,         Evaluation and analysis are key elements of the LDI convergence
2020), and formulating explanations of complex systems in science          framework to both demonstrate its effectiveness and provide a way
(Chi et al., 1989; Graesser, 2015), math (Fancsali et al., 2016), and      to identify opportunities for improvement and refinement. We
computer code (Lasang et al., 2021).                                       focus on quantitative and qualitative metrics for LDI community
                                                                           building and engagement efforts, identifying investment
Language and discourse are essential not only for learning within
                                                                           opportunity priorities, and development and refinement of
individuals but also for learning in group contexts. Problems have
                                                                           prototyping concrete task or Scale-Up Project activities. For
dramatically increased in complexity, requiring collaborative
                                                                           quantitative metrics, to account for different perspectives, we will
problem solving by people with disparate expertise and
                                                                           report how many experts and from how many different disciplines
perspectives (Carnevale & Smith, 2013; Graesser et al., 2018;
                                                                           contribute to specific tasks (e.g., identification of data requirements
OECD, 2017).
                                                                           for Investment Opportunity Area 1, above). For each expert, we can
Investment Opportunity Area 3: Human Technology Frontier –                 monitor their individual contributions in terms of content (e.g.,
Pushing For Wider Adoption and Integration Of AISs                         word counts), comments, and revisions to others’ contributions (by
This investment opportunity fosters a portfolio of efforts to push         using shared documents that track such metrics). More
for wider adoption and integration of AISs with school-based and           qualitatively, each member’s contributions will be assessed in
teacher-led learning activities at the Human-Technology Frontier,          terms of the depth of their contributions. A researcher might
another of NSF’s ten Big Ideas for Future Investment.                      identify that a particular expert’s contribution initiated the
                                                                           development of a novel solution that could improve the detection
Many teachers are overwhelmed by the many duties and tasks they            of learners’ emotions in a classroom context.
have to handle, resulting in burnout and reduced teacher job
satisfaction and retention rates (Grayson & Alvarez, 2007; Rhodes,         Furthermore, we report the scientific and societal impact of the
Nevill, and Allan, 2004). To assist teachers, major goals and              proposed convergence framework. Scientific impact can be
corresponding Scale-up Projects include: (1) to help teachers better       reported in terms of the number of publications, presentations,
understand the potential of using AISs and data science to                 tutorials, meetings, email exchanges and other forms of direct
transform education including their job performance and                    communication (among LDI members and the broader research
satisfaction; (2) to propose and investigate learner-teacher-AISs          community) as well as improvements of prototype solutions over
collaboration models and interfaces including the validation of a          existing solutions. Other scientific success measures can monitor
framework for learning experience design; and (3) to design and            longer term impact such as how many citations the products of this
develop dashboards for teachers to learn from, interpret, and make         project generate and how many research groups integrate the
decisions based upon fine-grained, comprehensive learning data.            proposed solutions (e.g., user adoption of analysis toolkits
Helping teachers, parents, and other stakeholders understand the           developed).
potential of data science and AISs is important for LDI’s
Societal impact can be assessed through impact on learners and         phase (Phase 2). Expert panels had the freedom to adopt different
teachers as well as impact on the learning ecosystem (e.g., in terms   internal processes to identify investment opportunities.
of how LDI efforts have made aspects of the learning ecosystem
more effective, engaging, equitable, efficient, relevant, and              Expert Panel      9 (1 of 10 Expert Panel members left LDI
affordable, as well as other outcomes such as transforming                 Reviewer Pool     after assignment to Expert Panel.)
educators’ community of practice).                                         Participation     7 / 9 (Two members were assigned reviews
                                                                           rate              but did not submit any reviews.)
An important requirement for the evaluation process is
documentation of the various elements of the convergence                   Concrete Tasks    17
framework. For this purpose, for instance, all meetings of the             Reviewed
leadership team were recorded (key metric: hours of meetings and           Total Concrete    34 (17 tasks x 2 reviews/task)
interactions; volume of those interactions). Other processes and           Task Reviews
activities have been documented in various ways such as Google
Docs, meeting recordings, and asynchronous Slack discussions. For          Number of         3.3 (average over the 7 reviewers submitting
instance, the convergence process implemented to generate the 5-           Reviews Per       at least one review; min: 2; max: 7)
year institute plan has been well documented through other records         Member
such as spreadsheets used in NGT processes employed by the                 Total Expert      (34 x 2) + (7 x 2) = 82 hours of expert time
various Expert Panels to generate and rank ideas for investment            Time              (assuming 2 hours spent per concrete task
opportunities to be included in the 5-year plan.                                             review and 2 hours of Expert Panel meeting
We will illustrate how we have been evaluating the effectiveness of                          to summarize the reviews for each concrete
the convergence framework holistically as well as from the perspective                       task)
of Expert Panels. For brevity, we illustrate the evaluation of the         Expert Panelist   4.82 hours (82 total hours / 17 concrete
convergence process from the perspective of the Learning                   Time per          tasks)
Engineering Expert Panel.                                                  Concrete Task
The LDI’s Learning Engineering Expert Panel comprised a diverse            Panel             279 words per task (average); 4,749 total
group of researchers and developers with vast experience in                Summary
research and development of learning systems. The 10-member                Word Count
expert panel was drawn from academia, government, and                  Table 2. A summary of the quantitative evaluation of the concrete
industry.                                                              task review and feedback process by the Learning Engineering
The Learning Engineering Expert Panel, like other LDI expert           Expert Panel.
panels, engaged in two major activities that contribute to the LDI     This policy was adopted for two main reasons: (i) offer autonomy
Phase 1 project:                                                       to each expert panel to self-organize and (ii) explore different
-    Provide input to each of the concrete tasks (forward-looking      collaboration processes in order to discover the best one (e.g., in
     “Scale-Up Projects”) addressing various challenges in the         terms of member engagement, effectiveness, and efficiency) or
     learning ecosystem with the goal of converging to solutions to    identify from each expert panel a set of best practices for later
     those challenges that account for input from many domains.        adoption. In the case of the Learning Engineering Expert Panel,
                                                                       investment opportunity ideas were solicited via e-mail from the
-    Identify, rank, and propose investment opportunities for the 5-   Expert Panel by the Co-Leads. A brief summary of candidate
     year plan of the convergence or institute phase (LDI Phase 2)     opportunities is provided below:
The concrete task reviewing and feedback process involved                    • Improving and scaling up AISs horizontally across topics and
significant expert time (see Table 2, which presents a summary of              domains
the quantitative evaluation of the initial cycle of the review and
feedback process by the Learning Engineering Expert Panel).                  • Scaling up AISs vertically targeting advanced skills such as
                                                                               collaborative problem solving and deep conceptual
In addition to this quantitative summary of the convergence process            understanding of complex STEAM+C topics
related to concrete tasks, we also developed a 5-stage model to
characterize the maturity of concrete tasks: (1) ideation or initial         • (More) Comprehensive learner models
idea, (2) conceptualization and convergence of a data science
solution with input from experts from many domains, (3)                      • Pushing for wider adoption and integration of AISs in school-
implementation & refinement, (4) product release (e.g., an                     based and teacher-led instruction (Human-Tech Frontier)
emerging data science prototype or dataset release), (5) impact, in          • Models of Teacher - AISs inter-operation
which the product from stage 4 is adopted by or integrated into
external research projects or a learning environment, having some            • Causal modeling for learning engineering
external impact on the research landscape or on the learning
                                                                             • Inclusive learning engineering R&D (ethics, equity, inclusion,
ecosystem. Work of LDI participants during the conceptualization
                                                                               and diversity)
phase has centered primarily on concrete tasks in the first four
stages (ideation, conceptualization and convergence,                   This list was further discussed and the initial investment
implementation and refinement, and product release). Ideally, the      opportunities were ranked by all expert panel members. A
transition from concrete task to “Scale-Up Projects” in LDI Phase 2    recommendation of the most important investment opportunities
will reflect progression to later stages of this model.                was put forward to the whole LDI team for further debate and
                                                                       refinement by other Expert Panels and paid, ad-hoc external
The other major task of each Expert Panel was to identify
                                                                       reviewers and the public at large. Many of the proposed investment
investment opportunities for the 5-year plan of the LDI institute
opportunities that originated in the Learning Engineering Expert
Panel are part of the 5-year institute plan adopted by the broader
LDI community.

Holistically, the LDI convergence framework can be evaluated in
terms of the level of engagement of a diverse team of researchers,
developers, practitioners, and other stakeholders, as well as its key
outcome: the 5-year plan for the institute or convergence phase,
which was described and submitted as a proposal to NSF. The level
of engagement can be summarized briefly by noting that our 60+
strong team has so far participated in three all-hands meetings, each
lasting about 20 hours (2.5 days), resulting in roughly 60 x 20 =
1,200 expert hours of effort per meeting. Experts spent hundreds of
additional hours in other meetings and activities. Most meetings
were recorded and transcribed. A more detailed quantitative and
qualitative analysis is under way, and the results will be widely
disseminated.

4. EMERGING IDEAS
We conclude this progress report by briefly presenting two
emerging ideas from the collective work of the LDI during its
conceptualization phase to date.

4.1 Policy Recommendations
Our work so far has also resulted in a number of policy
recommendations:
-    Publicly funded education technologies, similar to the publicly
     funded education adopted in the 19th and 20th centuries.
-    Learning data owners keep ownership of their data and have
     decision power with respect to where their data is stored, how
     the data is accessed, by whom and for what purposes, how
     their data is used, and if their data can be shared, with whom,
     and under what conditions and circumstances.
-    Learning data infrastructure is needed to enable responsible
     learning data collection, storage, access, sharing, and
     processing.
-    The need for a culture shift in teacher training programs and
     a data literacy curriculum for future teachers.

4.2 AISs Autonomy Levels or Teacher-AISs Partnership Models
Finding the best teacher/learner-AISs partnerships could have a
transformative impact on the learning ecosystem, potentially
freeing teachers from certain duties that AISs can do in an
autonomous manner and allowing teachers to focus on higher-level
tasks such as tailored, individualized interventions for students,
motivational support, and other tasks for which AISs are not ideal.
This better distribution of duties and coordination between teachers
and AISs should lead to a more effective, efficient, engaging, and
equitable learning ecosystem.

We defined and intend to study four levels of AISs’ “autonomy”
with respect to how teachers can use such AISs: (1) fully
autonomous – teachers need little (if any) training and have little (if
any) involvement in “tuning” AISs; (2) minimal teacher
involvement – teachers tune the parameters of the AISs with the
help of the AISs developer at the beginning of the school year or
semester (minimal teacher training with respect to the workings of
the AISs); (3) average teacher involvement – teachers require
training, and they work with the system on a weekly basis, selecting
instructional tasks and receiving information from the AISs; (4)
teacher-driven – the teacher exerts full control of the AISs,
including overriding decisions the AISs may take or suggest, and
interacts almost daily with the AISs. There is in fact one
other level (level 0): self-improving, fully autonomous AISs that
improve with experience with minimal or no developer
intervention. While we will explore, as resources permit, the role of
data science in enabling such level 0, self-improving, fully
autonomous AISs, from a teacher and learner perspective they are
similar to the fully autonomous level of AISs (level 1).

We plan to study and understand the trade-offs between teacher
involvement in tuning AISs and levels of AIS autonomy. For
instance, a teacher may choose a fully autonomous mode of
operation for an AIS meant for students working independently
with the system after school as supplemental instruction, whereas
for student interactions with the AIS during a class period (i.e., in a
blended-learning environment), the same teacher may choose to
exert more control over the behavior of the AIS. Similarly, teachers
may decide to use/download a pre-trained learner model and update
it with data from their own students, assuring data security and
privacy and maintaining full ownership of the data. They may also
decide to share a sample of their own student data to benefit the
pooled/pre-trained models that everyone can download as default.

4.3 Transforming Communities of Practice
LDI intends to serve as an agent of change for how research
questions are conceived and addressed through interdisciplinary
collaboration such that LDI’s impacts will propagate and evolve
beyond the lifetime of the award.

More specifically, we have the explicit intent to start a culture shift
in teacher training programs through two specific actions: (1)
involve a few dozen teachers and pre-service teachers in our work
in order to co-design solutions, account for their input, and
expose them to the potential of data science and AISs, while also
introducing them to science convergence approaches to address key
challenges in education, and (2) develop new curriculum
recommendations for teacher training programs as well as
accompanying training materials to build capacity for teachers and
other stakeholders to adopt AISs and data science approaches,
tools, and principles to improve learning and teaching.

Wider adoption of advanced data-driven science and engineering
approaches and tools such as AISs is still lacking for at least three
reasons: (1) Data science and education technology training is often
limited in teacher training programs. (2) The sophistication and
complexity of AISs often entail a significant effort to train teachers
to effectively use such advanced education technologies. (3) New
approaches are often developed with a lack of substantive
involvement of educators and schools.

Involving educators will help ensure that new data science
approaches to education challenges, next-generation AISs, and
learning environments that include AISs are designed to help
eliminate biases and promote equity, inclusion, and diversity,
offering high-quality education opportunities for all learners. We
will therefore push for schools, teacher training programs, and
instructors to collaborate more with data science and educational
technology researchers and developers to improve learning and
instruction. To this end, in addition to substantive involvement of
teachers and other stakeholders in LDI activities, we will explore
avenues for delivering professional learning, including workshops
for teachers and summer schools (e.g., by adding a track to CMU’s
LearnSphere summer school) for pre-service teachers and Research
Methods instructors in schools of education.

We are an expanding community of practice and promote Scale-Up
Projects that will ideally become bona fide research programs
beyond the award period, securing their own funding as they make
scientific progress. Furthermore, Scale-Up projects and research
thrusts will ideally result in career-long efforts for some younger
faculty members.

To sum up, our strong team of interdisciplinary experts, developers,
and practitioners will work together during the 5-year LDI institute
project to move current practices beyond small-scale studies and
bring the learning sciences into the era of big data and
interdisciplinary science convergence. The impact of LDI will be
felt far and wide, propagating and evolving beyond the lifetime of
the award and beyond our own team, acting as an agent of change
for how research questions are conceived and addressed through
interdisciplinary, collaborative, and co-designed research and
development. The proposed processes, methods, and studies pave
the way for taking these outcomes to other domains.

ACKNOWLEDGMENTS
The Learner Data Institute is sponsored by the National Science
Foundation (NSF; award #1934745). The opinions, findings, and
results are solely the authors’ and do not reflect those of NSF.

</pre>