=Paper=
{{Paper
|id=Vol-3051/LDI_1
|storemode=property
|title=The Learner Data Institute—Conceptualization: A Progress Report
|pdfUrl=https://ceur-ws.org/Vol-3051/LDI_1.pdf
|volume=Vol-3051
|authors=Vasile Rus,Stephen E. Fancsali,Philip Pavlik Jr.,Deepak Venugopal,Arthur C. Graesser,Steve Ritter,Dale Bowman,The LDI Team
|dblpUrl=https://dblp.org/rec/conf/edm/RusFPVGRBT21
}}
==The Learner Data Institute—Conceptualization: A Progress Report==
Vasile Rus (1), Stephen E. Fancsali (2), Philip Pavlik, Jr. (1), Deepak Venugopal (1), Arthur C. Graesser (1), Steve Ritter (2), Dale Bowman (1), and The LDI Team
(1) The University of Memphis
(2) Carnegie Learning, Inc.
vrus@memphis.edu, sfancsali@carnegielearning.com
ABSTRACT
This paper provides a progress report on the first 18 months of the Learner Data Institute (LDI; www.learnerdatainstitute.org). LDI is currently in Phase 1, the conceptualization phase, to be followed by Phase 2, the institute or convergence phase. The current 2-year conceptualization phase has two major goals: (1) develop, implement, evaluate, and refine a framework for data-intensive science and engineering for the future institute, and (2) use the framework to provide prototype solutions, based on data, data science, and science convergence, to a number of core challenges in learning science and engineering. By targeting a critical mass of key challenges that are at a tipping point, LDI aims to start a chain reaction that will transform the whole learning ecosystem. We emphasize here the key elements of the LDI science convergence framework that our team developed, implemented, and is now in the process of evaluating and refining. We highlight important outcomes of the convergence framework and related processes, including a 5-year plan for the institute phase and data-intensive prototype solutions to transform the learning ecosystem.

Keywords
big data in education, science convergence, learning engineering, adaptive instructional systems, intelligent tutoring systems.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. INTRODUCTION
This paper provides a progress report on the first 18 months of the two-year conceptualization phase of the Learner Data Institute (LDI; www.learnerdatainstitute.org). The present work updates that of Rus et al. (2020), which provided an introduction to LDI and early activities and outcomes. We emphasize here the developments of the past 12 months (since the 2020 paper), focusing on the key elements of the science convergence framework, its development, implementation, evaluation, and refinement, and key outcomes such as the 5-year plan of the future institute and data-intensive prototype solutions to address key challenges in the learning ecosystem.

The LDI is a “frameworks” project funded by the United States’ National Science Foundation (NSF) under the Data-intensive Research in Science and Engineering (DIRSE) program to make the learning ecosystem more effective, efficient, engaging, equitable, relevant, and affordable. It is part of the NSF’s Harnessing the Data Revolution (HDR) Institutes effort (https://www.nsf.gov/cise/harnessingdata/). “HDR Institutes… enable breakthroughs in science and engineering through collaborative, co-designed programs to formulate innovative data-intensive approaches to address critical national challenges” (NSF-HDR, 2021). LDI focuses on data-intensive approaches to developing and improving learning environments that include adaptive instructional systems as a means to address the challenge of offering access to high-quality education to everyone, no matter what neighborhood they live in, and regardless of gender, race, national origin, native language, personal interests, or any other factor that might limit such access and educational opportunity.

There is a twofold focus during the current 2-year conceptualization phase: (1) develop, implement, evaluate, and refine a framework for data-intensive science and engineering, and (2) use the framework to provide prototype solutions, based on data, data science, and science convergence, to a number of core challenges in learning science and engineering. The institute or convergence phase would build on results realized and insights gained from this conceptualization phase. By targeting a critical mass of key challenges that are at a tipping point (i.e., targeting challenges for which timely investment in data-intensive approaches has the maximum potential for a transformative effect), LDI aims to start a chain reaction that will transform the whole learning ecosystem, lifting it to a qualitatively higher state that is more effective, engaging, equitable, relevant, and affordable. Indeed, since the learning ecosystem is a complex web of interrelated elements, improvements in key aspects will percolate throughout the whole learning ecosystem.

LDI has brought together a team which currently consists of 60+ researchers, developers, and practitioners from three continents spanning many disciplines and backgrounds. Team members are
drawn from institutions and organizations representing academia, government, and industry.

Together, we intend a rigorous test of the hypothesis that emerging learning ecologies that incorporate adaptive instructional systems (AISs) are capable of providing affordable, effective, efficient, equitable, and engaging individualized assistance for both learners and instructors, and that the characteristics, parameters, and impacts of these systems, for example, effectiveness (in terms of learning gains), can be improved over time given sufficient attention to evidence, captured as data, and expertise, provided by teams of interdisciplinary researchers like ours.

The idea that AISs and data science have the potential to radically transform existing learning ecosystems is based on the following: (1) evidence suggesting that individualized instruction is generally more effective than traditional classroom instruction where monitoring and tailored support to each individual learner is not possible (Bloom, 1984; Chi, Roy, & Hausmann, 2008; Cohen, Kulik, & Kulik, 1982; VanLehn et al., 2007); (2) the capability of modern technologies to collect, store, and access vast and rich learner data; (3) incentive-based mechanisms to share goods such as education data using online marketplaces (Hartline, 2012; Hartline et al., 2019) and secure and privacy-preserving ways to access and process data based on differential privacy and multi-party computation (Dwork, 2008; Wang, Ranellucci, & Katz, 2017); (4) promising new advances in data science, including powerful machine learning and statistical methods such as deep neural networks, statistical relational learning, causal modelling, and probabilistic temporal graphs, for extracting useful knowledge from massive educational data sets (Spirtes, Glymour, & Scheines, 2001; LeCun, Bengio, & Hinton, 2015; Schmidhuber, 2015; Bach, Broecheler, Huang, & Getoor, 2017; Pearl & Mackenzie, 2018); and (5) recently available access to affordable, powerful, and scalable cloud-based computing resources for processing big data (Hellerstein et al., 2019; Atwal, 2020).

2. DATA SCIENCE AND AISs: A TRANSFORMATIVE MIX FOR THE LEARNING ECOSYSTEM
The LDI is founded on the key observation that data science and AISs are a powerful mix with potentially transformative impact on the learning ecosystem.

Big educational data (edu-data) create tremendous opportunities to reveal facets along which learner experiences can be tailored or adapted in ways heretofore impossible. A particular learning environment may result in different learning outcomes for different (groups of) students because of students’ idiosyncratic prior knowledge, experience(s), interest(s) and motivation(s). A small minority of students who approach a problem in a unique way, for example, could be overlooked in a small dataset, but larger datasets give us the possibility to detect and account for individual differences in learning. To this end, our mission is to harness the data revolution to further our understanding of how people learn.

AISs can monitor and scaffold learners at a fine level of granularity (e.g., capturing every single step during instructional activities) and with respect to many aspects of learning (e.g., cognitive, behavioral, affective, social, motivational facets of learning) at scale (i.e., for millions of learners and teachers and across many topics and domains) and across time periods (e.g., across grade levels). Such rich data, when collected, can be characterized as deep (many data instances from millions of learners), wide (capturing many aspects of the learning process at a fine granularity level), and long (longitudinal, i.e., across time and grade levels). Such big edu-data, together with advanced data science methods, are likely to offer insights about learning and instruction and lead to the development of effective and affordable instructional tools that were not possible before. This promise is strong enough to suggest that the learning ecosystem is at a tipping point to be transformed.

Indeed, LDI is built on the belief that AISs constitute a necessary catalyst to enable the transformation of the learning ecosystem through harnessing the data revolution because, as noted earlier, AISs can monitor and scaffold learners at a very fine granularity level, at scale, and across time. It should be noted that much education data (e.g., that currently collected by schools) relies on a set of predefined competencies or standards to monitor student progress. Such data only reveal what students know or have mastered and what they do not yet know or have not yet mastered, but often do not reveal much about the learning and instructional process. That is, much of the school data focuses on “where the student is” but not on what students do during instructional activities. Fundamentally, teachers and schools in general lack the capacity to monitor and store data about all students at every single step of the learning and instruction process. LDI will thus offer schools a new powerful framework to understand, monitor, and intervene at a fine-grained level with potentially transformative effects on the learning ecosystem.

3. FRAMEWORK FOR SCIENCE CONVERGENCE
A major goal of the LDI conceptualization phase is to develop, implement, test, and refine a framework for data-intensive research in science and engineering enabling science convergence, aligning with the Growing Convergence Research (GCR) “big idea” identified by the National Science Foundation.

According to NSF, “convergence research is a means of solving vexing research problems, in particular, complex problems focusing on societal needs. It entails integrating knowledge, methods, and expertise from different disciplines and forming novel frameworks to catalyze scientific discovery and innovation.” Also, “convergence is a deeper, more intentional approach to the integration of knowledge, techniques, and expertise from multiple disciplines in order to address the most compelling scientific and societal challenges” (NSF-GCR, 2020).

NSF identifies Convergence Research as having two primary characteristics:
* “Research driven by a specific and compelling problem. Convergence Research is generally inspired by the need to address a specific challenge or opportunity, whether it arises from deep scientific questions or pressing societal needs.”
* “Deep integration across disciplines. As experts from different disciplines pursue common research challenges, their knowledge, theories, methods, data, research communities and languages become increasingly intermingled or integrated. New frameworks, paradigms or even disciplines can form sustained interactions across multiple communities” (NSF-GCR, 2020).

LDI’s compelling problem is making the learning ecosystem more effective, engaging, equitable, efficient, relevant, and affordable.

To foster deep integration across scientific disciplines, we have put in place a convergence framework, comprising a diverse team, organizational structures, processes, mechanisms, activities, and tools, meant to encourage broad participation, coordination,
collaboration, and diffusion and integration of knowledge across disciplines.

LDI has intentionally sought, from its inception, to follow NSF’s characterization of convergence research by “intentionally bring[ing] together [from the inception] intellectually diverse researchers and stakeholders to frame … research questions, develop effective ways of communicating across disciplines and sectors, adopt common frameworks for their solution, and, when appropriate, develop a new scientific vocabulary” (NSF-GCR, 2020). The LDI team seeks, where possible, to develop “sustainable relationships that may not only create solutions to the problem that engendered the collaboration, but also develop novel ways of framing related research questions and open new research vistas” (NSF-GCR, 2020).

To make these intentions a reality, LDI’s leadership team and participants have designed, prototyped, and tested a process and a corresponding set of tools designed to transform what is currently a loosely coupled group of research centers, AIS commercial providers, and government research labs engaged in similar but disparate research and development efforts into a set of interacting teams (Berry, 2011; Lilian, 2014), in aggregate constituting a physical and virtual community of practice (Lave & Wenger, 1991). We have not and will not attempt to “tighten” the coupling between participating research centers. As Weik (1991) has argued with respect to educational systems, loosely coupled systems have several advantages over tightly coupled ones, not least flexibility, survivability (with dysfunction in individual nodes tolerable), and increased likelihood of beneficial “mutations.” Rather, LDI’s leadership has intended to design and test a set of processes and tools that will support the independent work of the participating research centers, facilitate the flow of information and ideas within and across these centers, and help to keep participants focused on common problems without the need for direct intervention (e.g., in the form of a top-down, tightly controlled research agenda).

LDI’s team structure and processes enable the harnessing and diffusion of expertise from various areas in an efficient and effective way while fostering individual initiative and interests. For example, LDI team members were encouraged in the conceptualization phase to propose prototyping tasks that they are interested in and which fit the LDI mission statement (see more details later). Organizational structures and processes are intentionally open, flexible, and scalable to enable the LDI to grow and transform based on emerging findings and partnerships with other NSF-supported HDR teams.

The key elements of the LDI convergence framework are listed below.
* Mission/Common Goal
* An intellectually diverse team with stakeholder representation (researchers, developers, practitioners including school and teachers’ representatives)
* An effective and efficient team structure
* Activities and processes that foster cross-discipline interactions
* Processes, mechanisms, and tools to nurture collaboration, broad participation, diffusion and integration of knowledge across disciplines, and coordination
* Resources, in terms of funding, student support, travel, and access to big edu-data and other cyber-infrastructure resources
* Incentives for team members to proactively and deeply engage in convergent activities and work towards accomplishing the goal/mission of the team, which is to solve the compelling problem:
** Resources
** Freedom to propose research tasks that fit their own interests and align with the LDI mission
** Bottom-up and top-down strategies for agenda setting
** Semi-autonomous teams/groups
** Flexible, open structure
* Progress monitoring and refinement of the convergence framework

Our framework will enable team members to develop a shared vision and language, which over time should lead to effective and meaningful cross-discipline collaborations, i.e., science convergence. Such mutual sense-making, science convergence, and R&D efforts are likely to incubate solutions to complex problems to enable effective, efficient, engaging, equitable, and affordable learning experiences for everyone. We detail next the main components of our science convergence framework.

3.1 LDI’s Mission and Vision
LDI’s mission is to harness the data revolution (HDR) to further our understanding of how people learn, how to improve adaptive instructional systems (AISs), and how to make emerging learning ecologies that include online and blended learning with AISs more effective, efficient, engaging, equitable, relevant, and affordable.

Our vision is for LDI to: (i) serve as a hub to identify investment opportunities for data-intensive approaches to core learning science and engineering challenges to accelerate progress toward equitable learning and achievement in education; (ii) foster, support, and build a portfolio of inter-related, inter-disciplinary prototyping or “Scale-Up Projects” to research, develop, and disseminate data-intensive solutions across multiple academic and non-academic communities that currently cannot easily communicate with each other, embodying a process of science convergence; (iii) bridge the HDR ecosystem with the educational data science and learning engineering community and the broader education world, and, in particular, serve as the education & training hub for the HDR ecosystem, assisting other teams with developing data science training platforms for their communities.

LDI will forge new HDR frontiers by:
* furthering our understanding of learning and instructional processes and environments;
* developing data science infrastructure for the education and the HDR ecosystem;
* improving AISs and scaling them up both horizontally and vertically;
* advancing research at the human-technology frontier in future learning ecologies that involve AISs;
* transforming communities of practice (e.g., triggering a culture shift in teacher training programs);
* exploring how data science can address equity, ethics, diversity, and inclusion aspects of education.
3.2 LDI’s Team and Team Structure
LDI’s team evolved and grew from 45+ members (see Rus et al., 2020) to over 60 as of this writing. In preparation for the longer-term “convergence” or institute phase (LDI Phase 2), we have extended our interdisciplinary team to include additional researchers and personnel from academia, K-12 schools, industry, and government, giving us access to the necessary stakeholders, infrastructure, expertise, and learning data to pursue targeted investment opportunities.

LDI is led by the Institute of Intelligent Systems at The University of Memphis and main corporate partner Carnegie Learning, developer of commercial-grade AISs serving over 500,000 students in 2,000+ school districts. The assembled team now spans 14 main organizations on 3 continents, including NSF-funded partners such as the Institute for Data, Econometrics, Algorithms, and Learning (IDEAL; NSF HDR TRIPODS project led by researchers at Northwestern University) and LearnSphere: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education (NSF DIBBs project; Carnegie Mellon University lead).

In addition, partners include researchers, practitioners, and other stakeholders from the US Army’s Generalized Intelligent Framework for Tutoring project (Sottilare et al., 2016); 6 additional corporate partners; 3 laboratory schools (The Early Learning & Research Center, Campus Elementary School, and University Middle School in Memphis, TN); 3 K-12 school districts: Shelby County Schools (Memphis, TN area; 200 schools, 100,000 students), Brockton Public Schools (Boston, MA area; 24 schools, 15,000 students), and Val Verde Unified School District (Los Angeles, California area; 21 schools, 20,000 students); and one teacher training program at Christian Brothers University.

3.3 Team Structure
The team structure consists of a leadership team, domain-oriented Expert Panels, and task-oriented groups that in the conceptualization phase have driven prototyping projects for very concrete, well-defined tasks, hence called concrete tasks.

The LDI Core Leadership Team is responsible for overseeing and coordinating LDI activities, making sure those activities align with the mission of the institute and offering necessary support for cohesiveness of activities. The Leadership Team consists of Lead Principal Investigator (PI) Dr. Vasile Rus, Carnegie Learning Principal Investigator Dr. Stephen Fancsali (co-PI), and co-PIs from University of Memphis: Dr. Dale Bowman, Dr. Philip Pavlik, and Dr. Deepak Venugopal. Project coordinator Jody Cockroft, Senior Research Scientist Dr. Donald Morrison, and Dr. Arthur Graesser, Professor Emeritus at The University of Memphis, round out the Leadership Team.

LDI Expert Panels are homogeneous in terms of expertise in order to maximize intellectual coverage of particular research areas, as individual researchers are specialized in different subareas of a relatively broad area such as Data Science or Learning Science. Expert Panels were composed in this homogeneous way to encourage meaningful discussions from the start, leading to more efficient and engaging conversations early on and benefitting team building and engagement. Cross-domain interactions are more challenging. One major purpose of LDI is to engage our team members (including Expert Panels) in cross-domain interactions that develop shared sense-making, a common language, and mission-driven culture over time.

The role of the Expert Panels is twofold: (1) to provide solid (breadth and depth) input from an area of expertise to all LDI efforts such as concrete prototyping tasks that are being carried out in the Phase 1 conceptualization and (2) to help shape the 5-year plan for Phase 2 by identifying opportunities for investment (i.e., promising developments in one area that could benefit the other areas or specific activities of the institute).

Figure 1. LDI team structure.

The following Expert Panels were initially formed: Data Science, K-12 Education, Learning Sciences, Learning Systems Engineering, Ethics & Equity, and Human-Technology Frontier. Expert Panel membership is flexible; LDI participants may belong to more than one Expert Panel but must be actively engaged in at least one. Expert Panels have co-leaders who are responsible for ensuring that the panels successfully reach milestones (e.g., reviewing concrete tasks).

Concrete tasks or “Scale-Up Projects” are prototyping endeavors led by individual researchers (see the section on Building Prototypes for Concrete Tasks later). Examples of concrete tasks include projects directed at scaling data-driven domain model refinement, using auto-encoders for student assessment, and data-driven instructional strategy discovery.

3.4 Stakeholder Representation
Our team includes representatives of various communities with a vested interest in the learning ecosystem such as researchers, developers, practitioners, government, policymakers, and funders. Nevertheless, there are gaps in LDI’s expertise. For instance, we do not currently have representatives from domains including neuroscience, the law, and social and moral philosophy, primarily due to Phase 2 budget constraints. We hope to account for such expertise through ad-hoc engagement with appropriate experts (e.g., reviews and feedback from targeted experts in those areas).

While diverse opinions and perspectives are represented within the team and make possible greater organizational learning and synergy, interdisciplinary teams also deal with the pull of competing loyalties and demands (Berry, 2011). Sense-making of the beliefs or actions of others (here, disparate experts) is a constant struggle in team environments (Guribye, Andressen, & Wasson, 2003), and this difficulty can be exacerbated by the greater intellectual diversity of the team. Shared goals and shared understandings are required, and negotiation of these common goals is an intrinsic part of the team-building process. Effective social relationships are a required constant for effective collaborative work, virtual or face to face, and they may develop more slowly at first (Vroman & Kovachich, 2002; Walther, 1995).

3.5 Convergence Processes
A key element of the LDI convergence framework is a set of processes, mechanisms, and tools to foster collaboration, broad
participation, diffusion and integration of knowledge across disciplines, and coordination.

LDI has implemented an iterative process of idea and solution generation and refinement that includes internal (from other LDI members) and external (paid, external ad-hoc reviewers) feedback loops. Furthermore, we have set in place synchronous and asynchronous, face-to-face and virtual coordination, collaboration, and communication channels supported by adequate processes that will facilitate exchange of ideas across disciplines. Processes that enable broad participation and input from everyone were designed and implemented, including the use of the Nominal Group Technique (NGT; Delbecq & Van de Ven, 1971) for meetings to ensure everyone’s voice is heard and accounted for. Other processes such as SWOT analysis (to identify strengths, weaknesses, opportunities, threats) and “pre-mortem” analysis (Klein, 2007) (i.e., identifying possible points of failure prospectively rather than retrospectively, by imagining a future situation in which a project has failed and considering how that imaginary failure might have occurred) were used as well.

Processes implemented were intended to grow science convergence among our large team of interdisciplinary experts. Within- and cross-domain interaction and collaboration processes were designed among subgroups of our team as well as all-team interactions and communications (e.g., whole-team meetings, mailing lists, website) in order to develop a common vision and language and to ensure cohesiveness and clarity with respect to the mission of the LDI, responsibility for various tasks, and engaging the community for assistance when needed.

An abbreviated list of activities, tools, and structures LDI implemented to realize the above iterative idea and solution generation and broad and deep collaborations includes:
* An iterative process of idea and solution generation and refinement that includes internal (from other team members) and external (paid, external ad-hoc reviewers) feedback loops
* Asynchronous and synchronous, face-to-face and virtual coordination, collaboration, and communication channels supported by adequate processes that will facilitate exchange of ideas across disciplines
* A federation of semi-autonomous groups (e.g., Expert Panels, concrete task teams) coordinated by a Leadership Team
* Regular virtual meetings of the Core Leadership Team (as the conceptualization phase has largely taken place during the global pandemic)
* Two full-team or “all-hands” virtual meetings each year
* Two workshops (in 2020 and 2021) at the International Conference on Educational Data Mining (to which this piece contributes) to engage with a broader international community of scholars
* Meetings at major conferences that our team members attend
* Quarterly updates and Requests-for-Comments from Expert Panels
* Mini-workshops in the form of full-day brainstorming sessions on a particular task
* Transformative app ideation at “all-hands” meetings
* Email, cloud-shared documents, wikis, Slack, and other collaboration tools for collaboratively drafting and refining ideas, solutions, and processes
* Software repository managed with version control software, e.g., GitHub or SVN
* Project management software to keep track of task progress and major milestone deadlines and deliverables

3.6 New Shared Vocabulary
LDI participants have started to develop an emerging shared vocabulary and language, which enables more effective and efficient communication and collaboration across disciplines and which constitutes a key ingredient of convergence research. For instance, many team members have been introduced to the notion of convergence research, concrete tasks or “Scale-Up Projects,” “learner model,” “cloud continuum,” scaling up AISs “horizontally” and “vertically,” and AISs-teacher partnership models. The vocabulary is dynamic and evolving. For instance, we have been using the term “concrete task” to indicate prototyping tasks led by researchers in LDI Phase 1 which would result in some kind of data science prototype or deliverable (e.g., a significant dataset and/or peer-reviewed publications). In this work, we use the terms “concrete task” and “Scale-Up Project” essentially interchangeably, as the latter reflects our intent for each concrete task to scale up in some dimension in Phase 2.

Synchronous and asynchronous interactions and activities have enabled better communication and understanding of various domain-specific terms by team members with limited initial expertise or understanding of those terms (e.g., “model parameters” in machine learning/data science, “domain model” in learning engineering, or the meaning and importance of the socio-cultural aspects of human learning). We expect the development and emergence of a shared vocabulary and language to continue and stabilize over time.

3.7 New Research Vistas: Investment Opportunities in the 5-year Institute Plan
Our strategy to accomplish the LDI mission of transforming the learning ecosystems, in a proposed 5-year institute, is to focus on a number of carefully selected research priorities, targeting key aspects of the learning ecosystem which we believe are at a “tipping point” (i.e., a point at which timely investment in data-intensive approaches focusing on those critical aspects has the maximum potential for a transformative effect).

The identified research priorities were the result of an intense science convergence process involving a number of activities (e.g., brainstorming sessions or “ideas labs” followed by iterative discussions for ranking and selection at “all-hands” virtual meetings, engagement with Expert Panels, etc.). Processes and activities engaged all LDI team members across many disciplines (e.g., educators, education researchers, computer scientists, statisticians, cognitive scientists), developers (Carnegie Learning, Age of Learning, Gooru), school districts (Shelby County Schools, Brockton Public Schools), as well as researchers from other projects funded by NSF (e.g., Northwestern’s TRIPODS Cohort II project: IDEAL - The Institute for Data, Econometrics, Algorithms, and Learning; CMU’s DIBBs LearnSphere: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education; and the University of Memphis NSF project: Advancing the Science of Learning Data Science with Adaptive Learning for Future Workforce Development). That is, the identified research priorities reflect our collective interdisciplinary wisdom that timely investment in data-intensive approaches will have the maximum potential for a transformative effect.

The identified investment opportunities (or research priorities) constitute the central focus of the 5-year plan for the LDI. It should be noted that we also generated a 10-year plan such that the impacts of the LDI Institute will propagate and evolve beyond the lifetime of the award and beyond our own team, thus acting as an agent of change for how
research questions are conceived and addressed through interdisciplinary collaboration.
Identified key investment opportunity areas or thrusts include:
Investment Opportunity Area 1: Scaling Up Access To Learning Data – From Impoverished Datasets To Learning Data Convergence To Comprehensive Learner Models
Investment Opportunity Area 2: Novel, Richer, More Powerful, Scalable, and Accurate Data-intensive Solutions to Core Education Tasks
Investment Opportunity Area 3: Human Technology Frontier – Pushing For Wider Adoption and Integration Of AISs
Investment Opportunity Area 1: Scaling Up Access To Learning Data. To enable data science, there must be data and in particular “big” education data (big edu-data). To this end, a key long term goal of LDI is learning data convergence, i.e., collecting and aligning (more) comprehensive data about the same learner(s) across skills, disciplines, and modalities (cognitive, meta-cognitive, emotional, motivational, behavioral, social) and across time (e.g., K-12 grade-levels), as well as data about the learning process and environment.
Prior efforts such as LearnSphere/DataShop have made progress towards building data infrastructure and capacity in education contexts, but slow data convergence is a critical issue that hinders realizing the full potential of data and data science to transform the learning ecosystem. For instance, the DataShop metric reports show that most of the data is composed of datasets in the standard DataShop format, of which there are about 3500 (https://pslcdatashop.web.cmu.edu/MetricsReport). While accumulating this many datasets is no small feat, the average number of observations per student is less than 400. A large number of students, greater than 800,000, is spread across more than 3000 datasets, resulting in less than 260 students per dataset. Similarly, the recently released EdNet (Choi et al., 2020) contains data from 784,309 students preparing for the Test of English for International Communication at an average of 400.2 interactions per student. Despite progress in building edu-data repositories, there is an “impoverished datasets” challenge in education.
Ideally, big edu-data would include data about millions of learners that are fine-grain (e.g., step/substep level information or detailed process data), rich (capturing cognitive, affective, motivational, behavioral, social, and epistemic facets of learning), and longitudinal (across many grades). That is, big edu-data should be deep (e.g., about many learners), wide (e.g., capture as many learning relevant aspects as possible), and long (being longitudinal, across many grades or even a learner’s lifetime). Convergence efforts will seek to “deepen” samples and “lengthen” timeframes of datasets that are (sometimes, but not always, already) “wide” in terms of features captured.
Using these concepts, our goal can be re-stated as enabling the collection of deep, wide, and long education data which could then be analyzed using emerging, state-of-the-art data science methods capable of learning patterns from such massive collections of data and also accounting for input from diverse domain experts with the ultimate goal of transforming the learning ecosystem.
In order to fully harness the data revolution to transform the learning ecosystem we need: (1) improved, at-scale data collection and (near) real-time access to big edu-data (i.e., addressing the “impoverished datasets” challenge) in ways that account for security, privacy, and ownership and (2) infrastructure to process learner data at scale using distributed computing (e.g., leveraging the cloud-continuum), scalable algorithms, and richer/more powerful algorithms (e.g., emerging neuro-symbolic approaches). Indeed, access to data at scale is a more critical, upstream challenge that needs to be addressed first: before being able to process learning data, one must have access to the data and have permission to share it. LDI adopts the principle that data owners (e.g., learner/ parent/ guardian/ teacher/ school/ developer/ etc.) should be given a spectrum of options with respect to data sharing or, if deciding not to share, with respect to providing access to data. The spectrum of options should accommodate all attitudes that learners/learning data owners may have towards data ownership, security, and privacy. Indeed, access to learner data is a complex issue due to privacy, security, ownership, and regulatory concerns.
We are aware that full data convergence would be hard to achieve for various reasons. However, our goal is to push the limits of what is possible, understand those limits, and act accordingly. Understanding the limits of data convergence will allow us to understand the limits of technology, what teachers can do to compensate for those limitations, and how to best orchestrate the learner-teacher-AISs partnership.
Our data convergence activity focuses on concrete examples from math and computer science (STEAM+C) as well as literacy and leverages prior efforts in the area of building data infrastructure and capacity, contributing and expanding on those previous efforts to move us closer to the goal of full data convergence. Specifically, one major goal is to build a fine-grain, large, and diverse (deep, wide, long) dataset that will enable LDI to explore the potential of data science methods to better model learners and the learner process. We announced and started the process of building LearnerNet in Fall 2019 as part of LDI Phase 1 (see Rus, 2019 – ADL Directors’ meeting talk). Indeed, we have called for the development of LearnerNet (Rus et al., 2020), an “ImageNet” (Su, Deng, & Fei-Fei, 2012) for learner modeling which could enable a transformation of our modeling and understanding of how learners learn, of how AISs can be made more capable of adapting to diverse learners, and fueling a better understanding of the learning ecosystem as a whole.
Investment Opportunity Area 2: Novel, Richer, More Powerful, Scalable, and Accurate Data-intensive Solutions to Core Education Tasks.
This investment opportunity area focuses on improving existing methods and models with respect to their scaling and extension using big edu-data and developing novel, richer, more powerful, scalable, and accurate computational models for a number of core educational tasks such as prediction and assessment of learner mastery of knowledge components (KCs; micro-competencies or skills), domain model refinement (i.e., improving models of what learners need to learn to acquire mastery of a domain), and inferring optimal strategies to coordinate the behavior of AISs for how and when to optimally implement guidance to promote student learning. The goal is to improve our understanding of how learners learn, improve the effectiveness and efficiency of AISs, make AISs more affordable and scalable horizontally (across topics and domains), and scale AISs vertically (offering training on higher-level skills such as deep conceptual understanding and collaborative problem solving).
One major opportunity from a learning engineering perspective is the automation of the development and refinement of AISs and adaptive instructional content. Making progress towards automating the authoring of AISs should begin to enable better
scalability across topics and domains (horizontal scalability), which currently is a major stumbling block for a wider adoption of such systems. Expert-driven approaches to developing domain models, learner models, and instructional strategies for new topics and domains are expensive, tedious, and time-consuming. Automated or semi-automated approaches to discovering domain models, inferring learner models, and discovering instructional strategies are much needed. For instance, we intend to use neuro-symbolic approaches to automatically extract domain models from both structured data, e.g., student performance data, and semi-structured data, i.e., text in textbooks.
A second major opportunity within this thrust involves AISs for collaborative learning with intelligent discourse components. Widely deployed, commercial AISs largely do not target advanced topics such as collaborative problem solving. Collaborative work and collaborative problem-solving skills are much needed in the 21st century (Autor, Levy, & Murnane, 2003; Carnevale & Smith, 2013), and learning activities fostering the acquisition of such skills must be adopted by learning ecologies of the future in order to make such ecologies more effective and equitable for all learners and more relevant to emerging needs and new realities. Our goal is to scale up AISs vertically, to offer training opportunities for such advanced skills. The strategy is to extend AISs such as those offered by Carnegie Learning and Age of Learning with language through discourse components.
Language and discourse play a central role in learning (Vygotsky, 1978), particularly for the acquisition of difficult topics that require deep comprehension, reasoning, problem solving, and collaboration that are required for higher paying jobs in the 21st century (Autor, Levy, & Murnane, 2003; Carnevale & Smith, 2013). Language and discourse are essential for developing argumentation skills (Ferretti & de la Paz, 2011), disciplinary literacy (Goldman et al., 2016; Shanahan & Shanahan, 2008; Shaffer, 2017), reasoning associated with mental models (Graesser, 2020), and formulating explanations of complex systems in science (Chi et al., 1989; Graesser, 2015), math (Fancsali et al., 2016), and computer code (Lasang et al., 2021).
Language and discourse are essential not only for learning within individuals but also for learning in group contexts. Problems have dramatically increased in complexity, requiring collaborative problem solving by people with disparate expertise and perspectives (Carnevale & Smith, 2013; Graesser et al., 2018; OECD, 2017).
Investment Opportunity Area 3: Human Technology Frontier – Pushing For Wider Adoption and Integration Of AISs
This investment opportunity fosters a portfolio of efforts to push for wider adoption and integration of AISs with school-based and teacher-led learning activities at the Human-Technology Frontier, another of NSF’s ten Big Ideas for Future Investment.
Many teachers are overwhelmed by the many duties and tasks they have to handle, resulting in burnout and reduced teacher job satisfaction and retention rates (Grayson & Alvarez, 2007; Rhodes, Nevill, & Allan, 2004). To assist teachers, major goals and corresponding Scale-up Projects include: (1) to help teachers better understand the potential of using AISs and data science to transform education including their job performance and satisfaction; (2) to propose and investigate learner-teacher-AISs collaboration models and interfaces including the validation of a framework for learning experience design; and (3) to design and develop dashboards for teachers to learn from, interpret, and make decisions based upon fine-grained, comprehensive learning data.
Helping teachers, parents, and other stakeholders understand the potential of data science and AISs is important for LDI’s transforming communities of practice effort. To this end, we plan to develop new curricula for data literacy to be used by teacher training programs.
Models of Learner-Teacher-AISs Partnership. Finding the best learner-teacher-AISs partnerships could have transformative impact on the learning ecosystem such as freeing teachers from certain duties that AISs can do in an autonomous manner, thus allowing them to focus on higher level tasks such as designing new instructional materials or novel tailored interventions for students, motivational support, and other tasks for which AISs are not ideal. This better distribution of duties and coordination between teachers and AISs should lead to a more effective, efficient, engaging, and equitable learning ecosystem. We will study four levels of AISs autonomy with respect to how teachers may use AISs (see later).
Detect and Mitigate Issues Related to Ethics, Equity, Inclusion, and Diversity in Education. As a general principle, all LDI activities will be informed and guided by our goal of using data science and AISs to promote ethics and equity in education (Riddle et al., 2015; Corbett-Davies & Goel, 2018; Gardner, Brooks, & Baker, 2019). At the same time, the Ethics and Equity Expert Panel will review all LDI efforts to ensure ethics and equity aspects are properly addressed. Furthermore, our institute 5-year plan includes a set of activities focusing on ethics and equity which fall into three categories: (1) using data and data science to further our understanding of biases and achievement gaps in the learning ecosystem; (2) understanding and mitigating ethics and equity issues throughout the data lifecycle with a focus on algorithmic bias and developing tools to address these issues throughout the work of the LDI; and (3) increasing diversity and inclusion during collaborative learning activities.
3.8 Evaluation and Refinement
Evaluation and analysis are key elements of the LDI convergence framework to both demonstrate its effectiveness and provide a way to identify opportunities for improvement and refinement. We focus on quantitative and qualitative metrics for LDI community building and engagement efforts, identifying investment opportunity priorities, and development and refinement of prototyping concrete task or Scale-Up Project activities. For quantitative metrics, to account for different perspectives, we will report how many experts and from how many different disciplines contribute to specific tasks (e.g., identification of data requirements for Investment Opportunity Area 1, above). For each expert, we can monitor their individual contributions in terms of content (e.g., word counts), comments, and revisions to others’ contributions (by using shared documents that track such metrics). More qualitatively, each member’s contributions will be assessed in terms of the depth of their contributions. A researcher might identify that a particular expert’s contribution initiated the development of a novel solution that could improve the detection of learners’ emotions in a classroom context.
Furthermore, we report the scientific and societal impact of the proposed convergence framework. Scientific impact can be reported in terms of the number of publications, presentations, tutorials, meetings, email exchanges and other forms of direct communication (among LDI members and the broader research community) as well as improvements of prototype solutions over existing solutions. Other scientific success measures can monitor longer term impact such as how many citations the products of this project generate and how many research groups integrate the proposed solutions (e.g., user adoption of analysis toolkits developed).
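The quantitative metrics described in Section 3.8 (how many experts and disciplines contribute to a task, and each expert's word counts, comments, and revisions) could be tallied along the following lines. This is only a minimal sketch: the `Contribution` record, its field names, and the sample data are hypothetical and are not taken from LDI's actual tooling or shared-document exports.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """One logged contribution to a shared document (hypothetical schema)."""
    task: str
    expert: str
    discipline: str
    words: int = 0
    comments: int = 0
    revisions: int = 0


def summarize(contributions):
    """Return per-task expert/discipline counts and per-expert activity totals."""
    experts = defaultdict(set)
    disciplines = defaultdict(set)
    per_expert = defaultdict(lambda: {"words": 0, "comments": 0, "revisions": 0})
    for c in contributions:
        experts[c.task].add(c.expert)
        disciplines[c.task].add(c.discipline)
        totals = per_expert[c.expert]
        totals["words"] += c.words
        totals["comments"] += c.comments
        totals["revisions"] += c.revisions
    per_task = {
        task: {"experts": len(experts[task]), "disciplines": len(disciplines[task])}
        for task in experts
    }
    return per_task, dict(per_expert)


# Made-up records, for illustration only.
log = [
    Contribution("data-requirements", "expert_a", "learning science", words=320, comments=2),
    Contribution("data-requirements", "expert_b", "computer science", words=150, revisions=3),
    Contribution("data-requirements", "expert_a", "learning science", comments=1),
]
per_task, per_expert = summarize(log)
print(per_task)
print(per_expert)
```

Counting distinct experts and disciplines per task with sets keeps repeated contributions by the same person from inflating the diversity figures, while the per-expert totals accumulate across all of that person's entries.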
Societal impact can be assessed through impact on learners and teachers as well as impact on the learning ecosystem (e.g., in terms of how LDI efforts have made aspects of the learning ecosystem more effective, engaging, equitable, efficient, relevant, and affordable, as well as other outcomes such as transforming educators’ community of practice).
An important requirement for the evaluation process is documentation of the various elements of the convergence framework. For this purpose, for instance, all meetings of the leadership team were recorded (key metric: hours of meetings and interactions; volume of those interactions). Other processes and activities have been documented in various ways such as Google Docs, meeting recordings, and Slack asynchronous discussions. For instance, the convergence process implemented to generate the 5-year institute plan has been well documented through other records such as spreadsheets used in NGT processes employed by the various Expert Panels to generate and rank ideas for investment opportunities to be included in the 5-year plan.
We will illustrate how we have been evaluating the effectiveness of the convergence framework holistically as well as from the perspective of Expert Panels. For brevity, we illustrate the evaluation of the convergence process from the perspective of the Learning Engineering Expert Panel.
The LDI’s Learning Engineering Expert Panel comprised a diverse group of researchers and developers with vast experience in research and development of learning systems. The 10-member expert panel was drawn from academia, government, and industry.
The Learning Engineering Expert Panel, like other LDI expert panels, engaged in two major activities that contribute to the LDI Phase 1 project:
- Provide input to each of the concrete tasks (forward-looking “Scale-Up Projects”) addressing various challenges in the learning ecosystem with the goal of converging to solutions to those challenges that account for input from many domains.
- Identify, rank, and propose investment opportunities for the 5-year plan of the convergence or institute phase (LDI Phase 2)
The concrete task reviewing and feedback process involved significant expert time (see Table 2, which presents a summary of the quantitative evaluation of the initial cycle of the review and feedback process by the Learning Engineering Expert Panel).
Metric | Value
Expert Panel Reviewer Pool | 9 (1 of 10 Expert Panel members left LDI after assignment to Expert Panel)
Participation rate | 7 / 9 (two members were assigned reviews but did not submit any reviews)
Concrete Tasks Reviewed | 17
Total Concrete Task Reviews | 34 (17 tasks x 2 reviews/task)
Number of Reviews Per Member | 3.3 (average over the 7 reviewers submitting at least one review; min: 2; max: 7)
Total Expert Time | (34 x 2) + (7 x 2) = 82 hours of expert time (assuming 2 hours spent per concrete task review and 2 hours of Expert Panel meeting to summarize the reviews for each concrete task)
Expert Panelist Time per Concrete Task | 4.82 hours (82 total hours / 17 concrete tasks)
Panel Summary Word Count | 279 words per task (average); 4,749 total
Table 2. A summary of the quantitative evaluation of the concrete task review and feedback process by the Learning Engineering Expert Panel.
In addition to this quantitative summary of the convergence process related to concrete tasks, we also developed a 5-stage model to characterize the maturity of concrete tasks: (1) ideation or initial idea, (2) conceptualization and convergence of a data science solution with input from experts from many domains, (3) implementation & refinement, (4) product release (e.g., an emerging data science prototype or dataset release), (5) impact, in which the product from stage 4 is adopted by or integrated into external research projects or a learning environment, having some external impact on the research landscape or on the learning ecosystem. Work of LDI participants during the conceptualization phase has centered primarily on concrete tasks in the first four stages (ideation, conceptualization and convergence, implementation & refinement, and product release). Ideally, the transition from concrete task to “Scale-Up Projects” in LDI Phase 2 will reflect progression to later stages of this model.
The other major task of each Expert Panel was to identify investment opportunities for the 5-year plan of the LDI institute phase (Phase 2). Expert panels had the freedom to adopt different internal processes to identify investment opportunities. This policy was adopted for two main reasons: (i) offer autonomy to each expert panel to self-organize and (ii) explore different collaboration processes in order to discover the best one (e.g., in terms of member engagement, effectiveness, and efficiency) or identify from each expert panel a set of best practices for later adoption. In the case of the Learning Engineering Expert Panel, investment opportunity ideas were solicited via e-mail from the Expert Panel by the Co-Leads. A brief summary of candidate opportunities is provided below:
- Improving and scaling up AISs horizontally across topics and domains
- Scaling up AISs vertically targeting advanced skills such as collaborative problem solving and deep conceptual understanding of complex STEAM+C topics
- (More) Comprehensive learner models
- Pushing for wider adoption and integration of AISs in school-based and teacher-led instruction (Human-Tech Frontier)
- Models of Teacher - AISs inter-operation
- Causal modeling for learning engineering
- Inclusive learning engineering R&D (ethics, equity, inclusion, and diversity)
This list was further discussed and the initial investment opportunities were ranked by all expert panel members. A recommendation of the most important investment opportunities was put forward to the whole LDI team for further debate and refinement by other Expert Panels and paid, ad-hoc external reviewers and the public at large. Many of the proposed investment
opportunities that originated in the Learning Engineering Expert Panel are part of the 5-year institute plan adopted by the broader LDI community.
Holistically, the LDI convergence framework can be evaluated in terms of the level of engagement of a diverse team of researchers, developers, practitioners, and other stakeholders as well as its key outcome, which is the 5-year plan for the institute or convergence phase which was described and submitted as a proposal to NSF. The level of engagement can be summarized briefly by noting that our 60+ strong team has participated so far in 3 all-hands meetings, each for about 20 hours (2.5 days), resulting in 60 x 20 = 1,200 expert hours of effort. Experts spent hundreds of additional hours in other meetings and other activities. Most meetings were recorded and transcribed. A more detailed, quantitative and qualitative analysis is being conducted right now, and the results will be widely disseminated.
4. EMERGING IDEAS
We conclude this progress report by briefly presenting two emerging ideas from the collective work of the LDI during its conceptualization phase to date.
4.1 Policy Recommendations
Our work so far also results in a number of policy recommendations:
- Publicly funded education technologies similar to publicly funded education adopted in the 19th and 20th centuries.
- Learning data owners keep ownership of their data and have decision power with respect to where their data is stored, how the data is accessed, by whom and for what purposes, how their data is used, and if their data can be shared, with whom, and under what conditions and circumstances.
- Learning data infrastructure is needed to enable responsible learning data collection, storage, access, sharing, and processing.
- The need for a culture shift in teacher training programs and data literacy curriculum for future teachers.
4.2 AISs Autonomy Levels or Teacher-AISs Partnership Models
Finding the best teacher/learner-AISs partnerships could have transformative impact on the learning ecosystem, potentially freeing teachers from certain duties that AISs can do in an autonomous manner and allowing teachers to focus on higher level tasks such as tailored, individualized interventions for students, motivational support, and other tasks for which AISs are not ideal. This better distribution of duties and coordination between teachers and AISs should lead to a more effective, efficient, engaging, and equitable learning ecosystem.
We defined and intend to study four levels of AISs’ “autonomy” with respect to how teachers can use such AISs: (1) fully autonomous – teachers need little (if any) training and have little (if any) involvement in “tuning” AISs, (2) minimal teacher involvement – teachers tune the parameters of the AISs with the help of the AISs developer at the beginning of the school year or semester (minimal teacher training with respect to the workings of the AISs), (3) average teacher involvement – teachers require training, and they work with the system on a weekly basis selecting instructional tasks and receiving information from the AISs, (4) teacher-driven – the teacher exerts full control of the AISs, including overriding decisions the AISs may take or suggest, and will interact almost daily with the AISs. There is in fact one other level (level 0) which are self-improving, fully autonomous AISs – they improve with experience with minimal or no developer intervention. While we will explore as resources permit the role of data science to enable such level 0, self-improving fully autonomous AISs, from a teacher and learner perspective they are similar to the fully autonomous level of AISs (level 1).
We plan to study and understand the trade-offs in terms of teacher involvement in tuning AISs vs. levels of AIS autonomy. For instance, teachers may choose a fully autonomous mode of operation for an AIS meant for students working independently with the system after school as supplemental instruction, whereas for student interactions with the AIS during a class period (i.e., in a blended-learning environment), the same teacher may choose to exert more control over the behavior of the AIS. Similarly, teachers may decide to use/download a pre-trained learner model and update it with data from their students, assuring data security and privacy and maintaining full ownership of the data. They may decide to share a sample of their own student data to benefit the pooled/pre-trained models that everyone can download as default.
4.3 Transforming Communities of Practice
LDI intends to serve as an agent of change for how research questions are conceived and addressed through interdisciplinary collaboration such that LDI’s impacts will propagate and evolve beyond the lifetime of the award.
More specifically, we have the explicit intent to start a culture shift in teacher training programs through two specific actions: (1) involve a few dozen teachers and pre-service teachers in our work in order to co-design solutions and account for their input and expose them to the potential of data science and AISs while also introducing them to science convergence approaches to address key challenges in education and (2) develop new curriculum recommendations for teacher training programs as well as accompanying training materials to build capacity for teachers and other stakeholders to adopt AISs and data science approaches, tools, and principles to improve learning and teaching.
Wider adoption of advanced data-driven science and engineering approaches and tools such as AISs is still lacking for at least three reasons: (1) Data science and education technology training is often limited in teacher training programs. (2) The sophistication and complexity of AISs often entail a significant effort to train teachers to effectively use such advanced education technologies. (3) New approaches are often developed with a lack of substantive involvement of educators and schools.
Involving educators will help to ensure that new approaches based on data science to tackle various education challenges, next-generation AISs, and learning environments that include AISs, are designed to help eliminate biases and promote equity, inclusion, and diversity, offering high quality education opportunities for all learners. We will therefore push for schools, teacher training programs, and instructors to collaborate more with data science and educational technology researchers and developers to improve learning and instruction. To this end, in addition to substantive involvement of teachers and other stakeholders in LDI activities, we will explore avenues for delivering professional learning, including workshops for teachers, summer schools (e.g., by adding a track to CMU’s LearnSphere summer school) for pre-service teachers and Research Methods instructors in schools of education.
We are an expanding community of practice and promote Scale-Up Projects that will ideally become bona fide research programs beyond the award period, securing their own funding as they make scientific progress. Furthermore, Scale-Up projects and research
thrusts will ideally result in career-long efforts for some younger faculty members.
To sum up, our strong team of interdisciplinary experts, developers, and practitioners will work together during the 5-year LDI institute project to move current practices beyond small-scale studies to bring the learning sciences into the era of big data and interdisciplinary science convergence. The impact of LDI will be felt far and wide, propagating and evolving beyond the lifetime of the award and beyond our own team, acting as an agent of change for how research questions are conceived and addressed through interdisciplinary, collaborative, and co-designed research and development. The proposed processes, methods, and studies pave the way for taking these outcomes to other domains.
ACKNOWLEDGMENTS
The Learner Data Institute is sponsored by the National Science Foundation (NSF; award #1934745). The opinions, findings, and results are solely the authors’ and do not reflect those of NSF.
5. REFERENCES
[1] Anders, R., Oravecz, Z., & Batchelder, W. (2014). Cultural consensus theory for continuous responses: A latent appraisal model for information pooling. Journal of Mathematical Psychology, 61, 1–13.
[2] Atwal, H. (2020). DataOps Technology. In Practical DataOps 2020 (pp. 215-247). Apress, Berkeley, CA.
[3] Autor, D., Levy, F., & Murnane, R. (2003). The Skill Content of Recent Technological Change: An Empirical Exploration, Quarterly Journal of Economics, 118(4), November 2003, 1279-1334.
[4] Bach, S.H., Broecheler, M., Huang, B., and Getoor, L. (2017). Hinge-loss Markov Random Fields and Probabilistic Soft Logic. Journal of Machine Learning Research, 18, pp. 1–67, 2017.
[5] Berry, G. R. (2011). Enhancing effectiveness on virtual teams: Understanding why traditional team skills are insufficient. The Journal of Business Communication, 48(2), 186-206.
[6] Bishop, C. M. (2013). Model-based machine learning. Philosophical Trans. of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1984).
[7] Bloom, B. S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring. Educational Researcher, 13, 4-16.
[8] Carnevale, A.P., & Smith, N. (2013). Workplace basics: The skills employees need and employers want. Human Resource Development International, 16, 491–501.
[9] Chesler, N. C., Bagley, E., Breckenfeld, E., West, D., & Shaffer, D. W. (2010). A virtual hemodialyzer design project for first-year engineers: An epistemic game approach. In ASME 2010 Summer Bioengineering Conference (pp. 585-586). American Society of Mechanical Engineers.
[10] Chi, M.T.H., Roy, M., & Hausmann, R.G.M. (2008). Learning from observing tutoring collaboratively: Insights about tutoring effectiveness from vicarious learning. Cognitive Science, 32, 301-341.
[11] Choi, Y., Lee, Y., Shin, D., Cho, J., Park, S., Lee, S., Baek, J., Bae, C., Kim, B., & Heo, J. (2020). EdNet: A Large-Scale Hierarchical Dataset in Education. In: Bittencourt I., Cukurova M., Muldner K., Luckin R., Millán E. (eds) Artificial Intelligence in Education. AIED 2020. Lecture Notes in Computer Science, vol 12164. Springer, Cham. https://doi.org/10.1007/978-3-030-52240-7_13
[12] Cohen, P. A., Kulik, J. A., & Kulik, C. C. (1982). Educational outcomes of tutoring: A meta-analysis of findings. American Educational Research Journal, 19, 237-248.
[13] Cohen, P.R. (2015). DARPA's Big Mechanism program. Physical Biology, Volume 12, Number 4, 1-9.
[14] Corbett-Davies, S. & Goel, S. (2018). The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning, arXiv:1808.00023, 2018.
[15] Delbecq, A. L., & Van de Ven, A. H. (1971). A group process model for problem identification and program planning. The Journal of App. Beh. Science, 7(4), 466-492.
[16] Dwork, C. (2008). Differential privacy: A survey of results. In International conference on theory and applications of models of computation, pp. 1–19. Springer.
[17] Fancsali, S.E., Ritter, S., Berman, S.R., Yudelson, M., Rus, V., and Morrison, D.M. (2016). Toward Integrating Cognitive Tutor Interaction Data with Human Tutoring Text Dialogue Data in LearnSphere. In: J.P. Rowe and E.L. Snow (Eds.), Proceedings of the Workshops at the 9th Intern. Conf. on Educ. Data Mining, Raleigh, NC, USA, June 29, 2016.
[18] Fancsali, S.E., Yudelson, M.V., Berman, S.R., Ritter, S. (2018). Intelligent instructional hand offs. In: K.E. Boyer, M.V. Yudelson, (Eds.) Proceedings of the 11th International Conference on Educational Data Mining (EDM 2018), pp. 198–207. International Educational Data Mining Society.
[19] Ferretti, R. P., & De La Paz, S. (2011). On the comprehension and production of written texts: Instructional activities that support content-area literacy. In R. O’Connor & P. Vadasy (Eds.), Handbook of reading interventions (pp. 326–355). New York, NY: Guilford.
[20] Gardner, J., Brooks, C., & Baker, R. S. J. d. (2019). Evaluating the Fairness of Predictive Student Models Through Slicing Analysis, in Proceedings of the 9th International Conference on Learning Analytics & Knowledge, 2019, pp. 225–234.
[21] Goldman, S. R., Britt, M. A., Brown, W., Cribb, G., George, M., Greenleaf, C., Lee, C. D., Shanahan, C., & Project READI. (2016). Disciplinary literacies and learning to read for understanding: A conceptual framework of core processes and constructs. Educational Psychologist, 51, 219-246.
[22] Graesser, A.C., Fiore, S.M., Greiff, S., Andrews-Todd, J., Foltz, P.W., & Hesse, F.W. (2018). Advancing the science of collaborative problem solving. Psychological Science in the Public Interest, 19, 59-92.
[23] Grayson, J. L., & Alvarez, H. K. (2007). School climate factors relating to teacher burnout: A mediator model. Teaching and Teacher Education, 24(5), 1349-1363.
[24] Growing Convergence Research. (NSF-GCR, 2020). National Science Foundation’s Growing Convergence Program, https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=505637 (accessed online on June 15, 2020)
[25] Guribye, F., Andressen, E.F., & Wasson, B. (2003). The organization of interaction in distributed collaborative learning. In B. Wasson, S. Ludvigsen, & U. Hoppe (Eds.), Designing for change in networked learning environments (pp. 385-394). Dordrecht, Netherlands: Kluwer Academic.
[26] Harnessing the Data Revolution. (NSF-HDR, 2021). National Science Foundation’s Harnessing the Data Revolution, https://www.nsf.gov/cise/harnessingdata/ (accessed online on June 11, 2021)
[27] Hartline, J. D. (2012). Bayesian mechanism design. Theoretical Computer Science 8(3), 143–263.
[28] Hartline, J. D., A. Johnsen, D. Nekipelov, and O. Zoeter (2019). Dashboard mechanisms for online marketplaces. In Proceedings of the 2019 ACM Conference on Economics and Computation, pp. 591–592.
[29] Hellerstein, J. M., Faleiro, J., Gonzalez, J. E., Schleier-Smith, J., Sreekanti, V., Tumanov, A., & Wu, C. (2019). Serverless Computing: One Step Forward, Two Steps Back. arXiv:1812.03651, 2019.
[30] Hoffmann, L. (2019). Reaching New Heights with Artificial Neural Networks: ACM A.M. Turing Award recipients Yoshua Bengio, Geoffrey Hinton, and Yann LeCun. Communications of the ACM. June - 2019, p. 96-95.
[31] Klein, G. (2007). Performing a Project Premortem. Harvard Business Review. 85 (9): 18–19.
[32] Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge University Press.
[33] Lilian, S. C. (2014). Virtual teams: Opportunities and challenges for e-leaders. Procedia-Social and Behavioral Sciences, 110, 1251-1261.
[34] Liu, R., Koedinger, K., Stamper, J., & Pavlik Jr., P. I. (2017). Workshop: Sharing and Reusing Data and Analytic Methods with LearnSphere. In X. Hu, T. Barnes, A. Hershkovitz, & L. Paquette (Eds.), Proc. of the 10th Int. Conf. on Educ. Data Mining (pp. 475-476). Wuhan, China.
[35] Mislevy, R. J., Almond, R. G., Yan, D., & Steinberg, L. S. (1999). Bayes nets in educational assessment: Where the numbers come from. In Proceedings of the fifteenth conference on uncertainty in artificial intelligence (pp. 437–446). UAI’99. Stockholm, Sweden: Morgan Kaufmann Pubs.
[36] OECD (2017). PISA 2015 Results (Volume V): Collaborative Problem Solving. Paris: OECD Publishing.
[37] Pearl, J. & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books, New York.
[38] Rhodes, C., Nevill, A. & Allan, J. (2004). Valuing and supporting teachers: A survey of teacher satisfaction, dissatisfaction, morale and retention in an English local education authority. Research in Education, 71 (1), 67-80.
[39] Riddle, T., Bhagavatula, S., Guo, W., Muresan, S., Cohen, G., Cook, J., and Purdie-Vaughns, V. (2015). Mining a Written Values Affirmation Intervention to Identify the Unique Linguistic Features of Stigmatized Groups. Proceedings of EDM 2015.
[40] Rus, V., Banjade, R., Maharjan, N., Morrison, D., Ritter, S., and Yudelson, M. (2016). Preliminary Results on Dialogue Act Classification in Chat-based Online Tutorial Dialogues, Proceedings of the 9th International Conference on Educational Data Mining, Raleigh, NC, 2016.
[41] Rus, V., Fancsali, S.E., Bowman, D., Pavlik Jr., P., Ritter, S., Venugopal, D., Morrison, D., and The LDI Team (2020). The Learner Data Institute: Mission, Framework, & Activities. In V. Rus & S.E. Fancsali (Eds.) Proceedings of The First Workshop of the Learner Data Institute, The 13th International Conference on Educational Data Mining (EDM 2020), July 10-13, Ifrane, Morocco (held online).
[42] Shaffer, D. W. (2017). Quantitative ethnography. Madison, WI: Cathcart Press.
[43] Shanahan, T., & Shanahan, C. (2008). Teaching disciplinary literacy to adolescents: Rethinking content-area literacy. Harvard Educational Review, 78, 40–59.
[44] Sottilare, R.A., Brawner, K.W., Goldberg, B.S., & Holden, H.K. (2012). The Generalized Intelligent Framework for Tutoring (GIFT). Downloaded from www.gifttutoring.org on November 30, 2012.
[45] Spirtes, P., Glymour, C., Scheines, R. (2001). Causation, Prediction, and Search. 2nd Edition. MIT.
[46] Su, H., Deng, J. and Fei-Fei, L. (2012). Crowdsourcing Annotations for Visual Object Detection. AAAI 2012 Human Computation Workshop, 2012.
[47] Tamang, L.J., Alshaikh, Z., Ait-Khayi, N., Oli, P., & Rus, V. (2021). A Comparative Study of Free Self-Explanations and Socratic Tutoring Explanations for Source Code Comprehension, Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, pp. 219-225, March, 2021.
[48] VanLehn, K., Graesser, A. C., Jackson, G. T., Jordan, P., Olney, A., & Rose, C. P. (2007). When are tutorial dialogues more effective than reading? Cognitive Science, 31, 3-62.
[49] Vroman, K., & Kovachich, J. (2002). Computer-mediated interdisciplinary teams: Theory and reality. Journal of Interprofessional Care, 16, 159-170.
[50] Vygotsky, L.S. (1978). Mind in society: the development of higher psychological processes. London: Harvard University Press.
[51] Walther, J.B. (1995). Related aspects of computer-mediated communication: Experiential observations. Organizational Science, 6, 180-203.
[52] Wang, X., S. Ranellucci, and J. Katz. (2017). Global-scale secure multiparty computation. In B. M. Thuraisingham, D. Evans, T. Malkin, and D. Xu (Eds.), ACM CCS 2017: 24th Conference on Computer and Communications Security, Dallas, TX, USA, pp. 39–56. ACM Press.
[53] Weick, K. E. (1976). Educational organizations as loosely coupled systems. Administrative science quarterly, 1-19.