The Learner Data Institute—Conceptualization: A Progress Report

Vasile Rus¹, Stephen E. Fancsali², Philip Pavlik, Jr.¹, Deepak Venugopal¹, Arthur C. Graesser¹, Steve Ritter², Dale Bowman¹, and The LDI Team
¹The University of Memphis
²Carnegie Learning, Inc.
vrus@memphis.edu, sfancsali@carnegielearning.com

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
This paper provides a progress report on the first 18 months of Phase 1, the conceptualization phase, of the Learner Data Institute (LDI; www.learnerdatainstitute.org). LDI is currently in Phase 1, the conceptualization phase, to be followed by Phase 2, the institute or convergence phase. The current 2-year conceptualization phase has two major goals: (1) develop, implement, evaluate, and refine a framework for data-intensive science and engineering for the future institute, and (2) use the framework to provide prototype solutions, based on data, data science, and science convergence, to a number of core challenges in learning science and engineering. By targeting a critical mass of key challenges that are at a tipping point, LDI aims to start a chain reaction that will transform the whole learning ecosystem. We emphasize here the key elements of the LDI science convergence framework that our team developed, implemented, and is now in the process of evaluating and refining. We highlight important outcomes of the convergence framework and related processes, including a 5-year plan for the institute phase and data-intensive prototype solutions to transform the learning ecosystem.

Keywords
big data in education, science convergence, learning engineering, adaptive instructional systems, intelligent tutoring systems.

1. INTRODUCTION
This paper provides a progress report on the first 18 months of the two-year conceptualization phase of the Learner Data Institute (LDI; www.learnerdatainstitute.org). The present work updates that of Rus et al. (2020), which provided an introduction to LDI and early activities and outcomes. We emphasize here the developments of the past 12 months (since the 2020 paper), focusing on the key elements of the science convergence framework, its development, implementation, evaluation, and refinement, and key outcomes such as the 5-year plan of the future institute and data-intensive prototype solutions to address key challenges in the learning ecosystem.

The LDI is a “frameworks” project funded by the United States’ National Science Foundation (NSF) under the Data-intensive Research in Science and Engineering (DIRSE) program to make the learning ecosystem more effective, efficient, engaging, equitable, relevant, and affordable. It is part of the NSF’s Harnessing the Data Revolution¹ (HDR) Institutes effort. “HDR Institutes… enable breakthroughs in science and engineering through collaborative, co-designed programs to formulate innovative data-intensive approaches to address critical national challenges” (NSF-HDR, 2021). LDI focuses on data-intensive approaches to developing and improving learning environments that include adaptive instructional systems as a means to address the challenge of offering access to high-quality education to everyone, no matter what neighborhood they live in, and regardless of gender, race, national origin, native language, personal interests, or any other factor that might limit such access and educational opportunity.

¹ https://www.nsf.gov/cise/harnessingdata/

There is a twofold focus during the current 2-year conceptualization phase: (1) develop, implement, evaluate, and refine a framework for data-intensive science and engineering, and (2) use the framework to provide prototype solutions, based on data, data science, and science convergence, to a number of core challenges in learning science and engineering. The institute or convergence phase would build on results realized and insights gained from this conceptualization phase. By targeting a critical mass of key challenges that are at a tipping point (i.e., targeting challenges for which timely investment in data-intensive approaches has the maximum potential for a transformative effect), LDI will start a chain reaction that will transform the whole learning ecosystem, lifting it to a qualitatively higher state that is more effective, engaging, equitable, relevant, and affordable. Indeed, since the learning ecosystem is a complex web of interrelated elements, improvements in key aspects will percolate throughout the whole learning ecosystem.

LDI has brought together a team which currently consists of 60+ researchers, developers, and practitioners from three continents spanning many disciplines and backgrounds. Team members are drawn from institutions and organizations representing academia, government, and industry.

Together, we intend a rigorous test of the hypothesis that emerging learning ecologies that incorporate adaptive instructional systems (AISs) are capable of providing affordable, effective, efficient, equitable, and engaging individualized assistance for both learners and instructors, and that the characteristics, parameters, and impacts of these systems, for example, effectiveness (in terms of learning gains), can be improved over time given sufficient attention to evidence, captured as data, and expertise, provided by teams of interdisciplinary researchers like ours.

The idea that AISs and data science have the potential to radically transform existing learning ecosystems is based on the following: (1) evidence suggesting that individualized instruction is generally more effective than traditional classroom instruction where monitoring and tailored support to each individual learner is not possible (Bloom, 1984; Chi, Roy, & Hausmann, 2008; Cohen, Kulik, & Kulik, 1982; VanLehn et al., 2007); (2) the capability of modern technologies to collect, store, and access vast and rich learner data; (3) incentive-based mechanisms to share goods such as education data using online market places (Hartline, 2012; Hartline et al., 2019) and secure and privacy-preserving ways to access and process data based on differential privacy and multi-party computation (Dwork, 2008; Wang, Ranellucci, & Katz, 2017); (4) promising new advances in data science, including powerful machine learning and statistical methods such as deep neural networks, statistical relational learning, causal modelling, and probabilistic temporal graphs, for extracting useful knowledge from massive educational data sets (Spirtes, Glymour, & Scheines, 2001; LeCun, Bengio, & Hinton, 2015; Schmidhuber, 2015; Bach, Broecheler, Huang, & Getoor, 2017; Pearl & Mackenzie, 2018); and (5) recently available access to affordable, powerful, and scalable cloud-based computing resources for processing big data (Hellerstein et al., 2019; Atwal, 2020).

2. DATA SCIENCE AND AISs — A TRANSFORMATIVE MIX FOR THE LEARNING ECOSYSTEM
The LDI is founded on the key observation that data science and AISs are a powerful mix with potentially transformative impact on the learning ecosystem.

Big educational data (edu-data) create tremendous opportunities to reveal facets along which learner experiences can be tailored or adapted in ways heretofore impossible. A particular learning environment may result in different learning outcomes for different (groups of) students because of students’ idiosyncratic prior knowledge, experience(s), interest(s) and motivation(s). A small minority of students, for example, who approach a problem in a unique way could be overlooked in a small dataset, but larger datasets give us the possibility to detect and account for individual differences in learning. To this end, our mission is to harness the data revolution to further our understanding of how people learn.

AISs can monitor and scaffold learners at a fine level of granularity (e.g., capturing every single step during instructional activities) and with respect to many aspects of learning (e.g., cognitive, behavioral, affective, social, motivational facets of learning) at scale (i.e., for millions of learners and teachers and across many topics and domains) and across time periods (e.g., across grade-levels). Such rich data, when collected, can be characterized as deep (many data instances from millions of learners), wide (capturing many aspects of the learning process at a fine granularity level), and long (longitudinal, i.e., across time and grade levels). Such big edu-data, together with advanced data science methods, are likely to offer insights about learning and instruction and lead to the development of effective and affordable instructional tools that were not possible before. This is promising enough to believe that the learning ecosystem is at a tipping point to be transformed.

Indeed, LDI is built on the belief that AISs constitute a necessary catalyst to enable the transformation of the learning ecosystem through harnessing the data revolution because, as noted earlier, AISs can monitor and scaffold learners at a very fine granularity level, at scale, and across time. It should be noted that much education data (e.g., data currently collected by schools) relies on a set of predefined competencies or standards to monitor student progress. Such data only reveal what students know or have mastered and what they do not yet know (have not yet mastered), but such data often do not reveal much about the learning and instructional process. That is, much of the school data focus on “where the student is” but not what they do during instructional activities. Fundamentally, teachers and schools in general lack the capacity to monitor and store data about all students at every single step of the learning and instruction process. LDI will thus offer schools a new powerful framework to understand, monitor, and intervene at a fine-grain level with potentially transformative effects on the learning ecosystem.

3. FRAMEWORK FOR SCIENCE CONVERGENCE
A major goal of the LDI conceptualization phase is to develop, implement, test, and refine a framework for data-intensive research in science and engineering enabling science convergence, aligning with the Growing Convergence Research (GCR) “big idea” identified by the National Science Foundation.

According to NSF, “convergence research is a means of solving vexing research problems, in particular, complex problems focusing on societal needs. It entails integrating knowledge, methods, and expertise from different disciplines and forming novel frameworks to catalyze scientific discovery and innovation.” Also, “convergence is a deeper, more intentional approach to the integration of knowledge, techniques, and expertise from multiple disciplines in order to address the most compelling scientific and societal challenges” (NSF-GCR, 2020).

NSF identifies Convergence Research as having two primary characteristics:

• “Research driven by a specific and compelling problem. Convergence Research is generally inspired by the need to address a specific challenge or opportunity, whether it arises from deep scientific questions or pressing societal needs.”
• “Deep integration across disciplines. As experts from different disciplines pursue common research challenges, their knowledge, theories, methods, data, research communities and languages become increasingly intermingled or integrated. New frameworks, paradigms or even disciplines can form sustained interactions across multiple communities” (NSF-GCR, 2020).

LDI’s compelling problem is making the learning ecosystem more effective, engaging, equitable, efficient, relevant, and affordable.

To foster deep integration across scientific disciplines, we have put in place a convergence framework, comprising a diverse team, organizational structures, processes, mechanisms, activities, and tools, meant to encourage broad participation, coordination, collaboration, and diffusion and integration of knowledge across disciplines.

LDI has intentionally sought, from its inception, to follow NSF’s characterization of convergence research by “intentionally bring[ing] together [from the inception] intellectually diverse researchers and stakeholders to frame … research questions, develop effective ways of communicating across disciplines and sectors, adopt common frameworks for their solution, and, when appropriate, develop a new scientific vocabulary” (NSF-GCR, 2020). The LDI team seeks, where possible, to develop “sustainable relationships that may not only create solutions to the problem that engendered the collaboration, but also develop novel ways of framing related research questions and open new research vistas” (NSF-GCR, 2020).

To make these intentions a reality, LDI’s leadership team and participants have designed, prototyped, and tested a process and a corresponding set of tools designed to transform what is currently a loosely coupled group of research centers, AIS commercial providers, and government research labs engaged in similar but disparate research and development efforts into a set of interacting teams (Berry, 2011; Lilian, 2014), in aggregate constituting a physical and virtual community of practice (Lave & Wenger, 1991). We have not and will not attempt to “tighten” the coupling between participating research centers. As Weik (1991) has argued in respect to educational systems, loosely coupled systems have several advantages over tightly coupled ones, not least flexibility, survivability (with dysfunction in individual nodes tolerable), and increased likelihood of beneficial “mutations.” Rather, LDI’s leadership has intended to design and test a set of processes and tools that will support the independent work of the participating research centers, facilitate the flow of information and ideas within and across these centers, and help to keep participants focused on common problems without the need for direct intervention (e.g., in the form of a top-down, tightly controlled research agenda).

LDI’s team structure and processes enable the harnessing and diffusion of expertise from various areas in an efficient and effective way while fostering individual initiative and interests. For example, LDI team members were encouraged in the conceptualization phase to propose prototyping tasks that they are interested in and which fit the LDI mission statement (see more details later). Organizational structures and processes are intentionally open, flexible, and scalable to enable the LDI to grow and transform based on emerging findings and partnerships with other NSF-supported HDR teams.

The key elements of the LDI convergence framework are listed below.

• Mission/Common Goal
• An intellectually diverse team with stakeholder representation (researchers, developers, practitioners including school and teachers’ representatives)
• An effective and efficient team structure
• Activities and processes that foster cross-discipline interactions
• Processes, mechanisms, and tools to nurture collaboration, broad participation, diffusion and integration of knowledge across disciplines, and coordination
• Resources, in terms of funding, student support, travel, and access to big edu-data and other cyber-infrastructure resources
• Incentives for team members to proactively and deeply engage in convergent activities and work towards accomplishing the goal/mission of the team, which is to solve the compelling problem:
  o Resources
  o Freedom to propose research tasks that fit their own interests and align with the LDI mission
  o Bottom-up and top-down strategies for agenda setting
  o Semi-autonomous teams/groups
  o Flexible, open structure
• Progress monitoring and refinement of the convergence framework

Our framework will enable team members to develop a shared vision and language, which over time should lead to effective and meaningful cross-discipline collaborations, i.e., science convergence. Such mutual sense-making, science convergence, and R&D efforts are likely to incubate solutions to complex problems to enable effective, efficient, engaging, equitable, and affordable learning experiences for everyone. We detail next the main components of our science convergence framework.

3.1 LDI’s Mission and Vision
LDI’s mission is to harness the data revolution (HDR) to further our understanding of how people learn, how to improve adaptive instructional systems (AISs), and how to make emerging learning ecologies that include online and blended learning with AISs more effective, efficient, engaging, equitable, relevant, and affordable.

Our vision is for LDI to: (i) serve as a hub to identify investment opportunities for data-intensive approaches to core learning science and engineering challenges to accelerate progress toward equitable learning and achievement in education; (ii) foster, support, and build a portfolio of inter-related, inter-disciplinary prototyping or “Scale-up Projects” to research, develop, and disseminate data-intensive solutions across multiple academic and non-academic communities that currently cannot easily communicate with each other, embodying a process of science convergence; (iii) bridge the HDR ecosystem with the educational data science and learning engineering community and the broader education world, and, in particular, serve as the education & training hub for the HDR ecosystem, assisting other teams with developing data science training platforms for their communities.

LDI will forge new HDR frontiers by:

• furthering our understanding of learning and instructional processes and environments;
• developing data science infrastructure for the education and the HDR ecosystem;
• improving AISs and scaling them up both horizontally and vertically;
• advancing research at the human-technology frontier in future learning ecologies that involve AISs;
• transforming communities of practice (e.g., triggering a culture shift in teacher training programs);
• exploring how data science can address equity, ethics, diversity, and inclusion aspects of education.

3.2 LDI’s Team and Team Structure
LDI’s team evolved and grew from 45+ members (see Rus et al., 2020) to over 60 as of this writing. In preparation for the longer-term “convergence” or institute phase (LDI Phase 2), we have extended our interdisciplinary team to include additional researchers and personnel from academia, K-12 schools, industry, and government, giving us access to the necessary stakeholders, infrastructure, expertise, and learning data to pursue targeted investment opportunities.

LDI is led by the Institute of Intelligent Systems at The University of Memphis and main corporate partner Carnegie Learning, developer of commercial-grade AISs serving over 500,000 students in 2,000+ school districts. The assembled team now spans 14 main organizations on 3 continents, including NSF-funded partners such as the Institute for Data, Econometrics, Algorithms, and Learning (IDEAL; NSF HDR TRIPODS project led by researchers at Northwestern University) and LearnSphere: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education (NSF DIBBs project; Carnegie Mellon University lead).

In addition, partners include researchers, practitioners, and other stakeholders from the US Army’s Generalized Intelligent Framework for Tutoring project (Sottilare et al., 2016) and 6 additional corporate partners, 3 laboratory schools (The Early Learning & Research Center, Campus Elementary School, and University Middle School in Memphis, TN), 3 K-12 school districts: Shelby County Schools (Memphis, TN area; 200 schools, 100,000 students), Brockton Public Schools (Boston, MA area; 24 schools, 15,000 students), and Val Verde Unified School District (Los Angeles, California area; 21 schools, 20,000 students), and one teacher training program at Christian Brothers University.

3.3 Team Structure
The team structure consists of a leadership team, domain-oriented Expert Panels, and task-oriented groups that in the conceptualization phase have driven prototyping projects for very concrete, well-defined tasks, hence called concrete tasks.

The LDI Core Leadership Team is responsible for overseeing and coordinating LDI activities, making sure those activities align with the mission of the institute and offering necessary support for cohesiveness of activities. The Leadership Team consists of Lead Principal Investigator (PI) Dr. Vasile Rus, Carnegie Learning Principal Investigator Dr. Stephen Fancsali (co-PI), and co-PIs from University of Memphis: Dr. Dale Bowman, Dr. Philip Pavlik, and Dr. Deepak Venugopal. Project coordinator Jody Cockroft, Senior Research Scientist Dr. Donald Morrison, and Dr. Arthur Graesser, Professor Emeritus at The University of Memphis, round out the Leadership Team.

LDI Expert Panels are homogeneous in terms of expertise in order to maximize intellectual coverage of particular research areas, as individual researchers are specialized in different subareas of a relatively broad area such as Data Science or Learning Science. Expert Panels were composed in this homogeneous way to encourage meaningful discussions from the start, leading to more efficient and engaging conversations early on and benefitting team building and engagement. Cross-domain interactions are more challenging. One major purpose of LDI is to engage our team members (including Expert Panels) in cross-domain interactions that develop shared sense-making, a common language, and a mission-driven culture over time.

The role of the Expert Panels is twofold: (1) to provide solid (breadth and depth) input from an area of expertise to all LDI efforts such as concrete prototyping tasks that are being carried out in the Phase 1 conceptualization and (2) to help shape the 5-year plan for Phase 2 by identifying opportunities for investment (i.e., promising developments in one area that could benefit the other areas or specific activities of the institute).

Figure 1. LDI team structure.

The following Expert Panels were initially formed: Data Science, K-12 Education, Learning Sciences, Learning Systems Engineering, Ethics & Equity, and Human-Technology Frontier. Expert Panel membership is flexible; LDI participants may belong to more than one Expert Panel but must be actively engaged in at least one. Expert Panels have co-leaders who are responsible for ensuring that the panels successfully reach milestones (e.g., reviewing concrete tasks).

Concrete tasks or “Scale-Up Projects” are prototyping endeavors led by individual researchers (see the section on Building Prototypes for Concrete Tasks later). Examples of concrete tasks include projects directed at scaling data-driven domain model refinement, using auto-encoders for student assessment, and data-driven instructional strategy discovery.

3.4 Stakeholder Representation
Our team includes representatives of various communities with a vested interest in the learning ecosystem such as researchers, developers, practitioners, government, policymakers, and funders. Nevertheless, there are gaps in LDI’s expertise. For instance, we do not currently have representatives from domains including neuroscience, the law, and social and moral philosophy, primarily due to Phase 2 budget constraints. We hope to account for such expertise through ad-hoc engagement with appropriate experts (e.g., reviewing and feedback from targeted experts in those areas).

While diverse opinions and perspectives are represented within the team and make possible greater organizational learning and synergy, interdisciplinary teams also deal with the pull of competing loyalties and demands (Berry, 2011). Sense-making of the beliefs or actions of others (here, disparate experts) is a constant struggle in team environments (Guribye, Andressen, & Wasson, 2003), and this difficulty can be exacerbated by the greater intellectual diversity of the team. Shared goals and shared understandings are required, and negotiation of these common goals is an intrinsic part of the team-building process. Effective social relationships are a required constant for effective collaborative work, virtual or face to face, and their development may occur more slowly at first (Vroman & Kovachich, 2002; Walther, 1995).

3.5 Convergence Processes
A key element of the LDI convergence framework is a set of processes, mechanisms, and tools to foster collaboration, broad participation, diffusion and integration of knowledge across disciplines, and coordination.

LDI has implemented an iterative process of idea and solution generation and refinement that includes internal (from other LDI members) and external (paid, external ad-hoc reviewers) feedback loops. Furthermore, we have set in place synchronous and asynchronous, face-to-face and virtual coordination, collaboration, and communication channels supported by adequate processes that will facilitate exchange of ideas across disciplines. Processes that enable broad participation and input from everyone were designed and implemented, including the use of the Nominal Group Technique (NGT; Delbecq & Van de Ven, 1971) in meetings to ensure everyone’s voice is heard and accounted for. Other processes such as SWOT analysis (to identify strengths, weaknesses, opportunities, threats) and “pre-mortem” analysis (Klein, 2007) (i.e., identifying possible points of failure prospectively rather than retrospectively, by imagining a future situation in which a project has failed and considering how that imaginary failure might have occurred) were used as well.

Processes implemented were intended to grow science convergence among our large team of interdisciplinary experts. Within- and cross-domain interaction and collaboration processes were designed among subgroups of our team as well as all-team interactions and communications (e.g., whole-team meetings, mailing lists, website) in order to develop a common vision and language and to ensure cohesiveness and clarity with respect to the mission of the LDI, responsibility for various tasks, and engaging the community for assistance when needed.

An abbreviated list of activities, tools, and structures LDI implemented to realize the above iterative idea and solution generation and broad and deep collaborations includes:

• An iterative process of ideas and solution generation and refinement that includes internal (from other team members) and external (paid, external ad-hoc reviewers) feedback loops
• Asynchronous and synchronous, face-to-face and virtual coordination, collaboration, and communication channels supported by adequate processes that will facilitate exchange of ideas across disciplines
• A federation of semi-autonomous groups (e.g., Expert Panels, concrete task teams) coordinated by a Leadership Team
• Regular virtual meetings of the Core Leadership Team (as the conceptualization phase has largely taken place during the global pandemic)
• Two full-team or “all-hands” virtual meetings each year
• Two workshops (in 2020 and 2021) at the International Conference on Educational Data Mining (to which this piece contributes) to engage with a broader international community of scholars
• Meetings at major conferences that our team members attend
• Quarterly updates and Requests-for-Comments from Expert Panels
• Mini-workshops in the form of full-day brainstorming sessions on a particular task
• Transformative app ideation at “all-hands” meetings
• Email, cloud-shared documents, wikis, Slack, and other collaboration tools for collaboratively drafting and refining ideas, solutions, and processes
• Software repository managed with version control software, e.g., GitHub or SVN
• Project management software to keep track of task progress and major milestone deadlines and deliverables

3.6 New Shared Vocabulary
LDI participants have started to develop an emerging shared vocabulary and language, which enables more effective and efficient communication and collaboration across disciplines and which constitutes a key ingredient of convergence research. For instance, new vocabulary includes introducing many team members to the notion of convergence research, concrete tasks or “Scale-Up Projects,” “learner model,” “cloud continuum,” scaling-up AISs “horizontally” and “vertically,” and AISs-teacher partnership models. The vocabulary is dynamic and evolving. For instance, we have been using the term “concrete task” to indicate prototyping tasks led by researchers in LDI Phase 1 which would result in some kind of data science prototype or deliverable (e.g., a significant dataset and/or peer-reviewed publications). In this work, we use the terms “concrete task” and “Scale-Up Project” essentially interchangeably as the latter reflects our intent for each concrete task to scale up in some dimension in Phase 2.

Synchronous and asynchronous interactions and activities have enabled better communication and understanding of various domain-specific terms by team members with limited initial expertise or understanding of those terms (e.g., “model parameters” in machine learning/data science, “domain model” in learning engineering, or the meaning and importance of the socio-cultural aspects of human learning). We expect the development and emergence of a shared vocabulary and language to continue and stabilize over time.

3.7 New Research Vistas—Investment Opportunities in the 5-year Institute Plan
Our strategy to accomplish the LDI mission of transforming the learning ecosystems, in a proposed 5-year institute, is to focus on a number of carefully selected research priorities, targeting key aspects of the learning ecosystem which we believe are at a “tipping point” (i.e., a point at which timely investment in data-intensive approaches focusing on those critical aspects has the maximum potential for a transformative effect).

The identified research priorities were the result of an intense science convergence process involving a number of activities (e.g., brainstorming sessions or “ideas labs” followed by iterative discussions for ranking and selection at “all-hands” virtual meetings, engagement with Expert Panels, etc.). Processes and activities engaged all LDI team members across many disciplines (e.g., educators, education researchers, computer scientists, statisticians, cognitive scientists), developers (Carnegie Learning, Age of Learning, Gooru), school districts (Shelby County Schools, Brockton Public Schools), as well as researchers from other projects funded by NSF (e.g., Northwestern’s TRIPODS Cohort II project: IDEAL - The Institute for Data, Econometrics, Algorithms, and Learning; CMU’s DIBBS LearnSphere: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education; and the University of Memphis NSF project: Advancing the Science of Learning Data Science with Adaptive Learning for Future Workforce Development). That is, the identified research priorities reflect our collective interdisciplinary wisdom that timely investment in data-intensive approaches will have the maximum potential for a transformative effect. The identified investment opportunities (or research priorities) constitute the central focus of the 5-year plan for the LDI. It should be noted that we also generated a 10-year plan such that the impacts of the LDI Institute will propagate and evolve beyond the lifetime of the award and beyond our own team, thus acting as an agent of change for how research questions are conceived and addressed through interdisciplinary collaboration.

Identified key investment opportunity areas or thrusts include:

• Investment Opportunity Area 1: Scaling Up Access To Learning Data – From Impoverished Datasets To Learning Data Convergence To Comprehensive Learner Models
• Investment Opportunity Area 2: Novel, Richer, More Powerful, Scalable, and Accurate Data-intensive Solutions to Core Education Tasks
• Investment Opportunity Area 3: Human Technology Frontier – Pushing For Wider Adoption and Integration Of AISs

Investment Opportunity Area 1: Scaling Up Access To Learning Data.
To enable data science, there must be data and in particular “big” education data (big edu-data). To this end, a key long-term goal of LDI is learning data convergence, i.e., collecting and aligning (more) comprehensive data about the same learner(s) across skills, disciplines, and modalities (cognitive, meta-cognitive, emotional, motivational, behavioral, social) and across time (e.g., K-12 grade-levels), as well as data about the learning process and environment.

Prior efforts such as LearnSphere/DataShop have made progress towards building data infrastructure and capacity in education contexts, but slow data convergence is a critical issue that hinders realizing the full potential of data and data science to transform the learning ecosystem. For instance, the DataShop metric reports show that most of the data is composed of datasets in the standard DataShop format, of which there are about 3500 (https://pslcdatashop.web.cmu.edu/MetricsReport). While accumulating this many datasets is no small feat, the average number of observations per student is less than 400. A large number of students, greater than 800,000, is spread across more than 3000 datasets, resulting in less than 260 students per dataset. Similarly, the recently released EdNet (Choi et al., 2020) contains data from 784,309 students preparing for the Test of English for International Communication at an average of 400.2 interactions per student. Despite progress in building edu-data repositories, there is an “impoverished datasets” challenge in education.

Ideally, big edu-data would include data about millions of learners that are fine-grain (e.g., step/substep level information or detailed process data), rich (capturing cognitive, affective, motivational, behavioral, social, and epistemic facets of learning), and longitudinal (across many grades).

[…] learner data at scale using distributed computing (e.g., leveraging the cloud-continuum), scalable algorithms, and richer/more powerful algorithms (e.g., emerging neuro-symbolic approaches). Indeed, access to data at scale is a more critical, upstream challenge that needs to be addressed first because, before being able to process learning data, one must have access to the data and have permission to share it. LDI adopts the principle that data owners (e.g., learner, parent, guardian, teacher, school, developer, etc.) should be given a spectrum of options with respect to data sharing or, if deciding not to share, with respect to providing access to data. The spectrum of options should accommodate all attitudes that learners/learning data owners may have towards data ownership, security, and privacy. Indeed, access to learner data is a complex issue due to privacy, security, ownership, and regulatory concerns.

We are aware that full data convergence would be hard to achieve for various reasons. However, our goal is to push the limits of what is possible, understand those limits, and act accordingly. Understanding the limits of data convergence will allow us to understand the limits of technology, what teachers can do to compensate for those limitations, and how to best orchestrate the learner-teacher-AISs partnership.

Our data convergence activity focuses on concrete examples from math and computer science (STEAM+C) as well as literacy and leverages prior efforts in the area of building data infrastructure and capacity, contributing and expanding on those previous efforts to move us closer to the goal of full data convergence. Specifically, one major goal is to build a fine-grain, large, and diverse (deep, wide, long) dataset that will enable LDI to explore the potential of data science methods to better model learners and the learner process. We announced and started the process of building LearnerNet in Fall 2019 as part of LDI Phase 1 (see Rus, 2019 – ADL Directors’ meeting talk). Indeed, we have called for the development of LearnerNet (Rus et al., 2020), an “ImageNet” (Su, Deng, & Fei-Fei, 2012) for learner modeling which could enable a transformation of our modelling and understanding of how learners learn, of how AISs can be made more capable of adapting to diverse learners, and fueling a better understanding of the learning ecosystem as a whole.

Investment Opportunity Area 2: Novel, Richer, More Powerful, Scalable, and Accurate Data-intensive Solutions to Core Education Tasks.
This investment opportunity area focuses on improving existing methods and models with respect to their scaling and extension
That is, big edu-data should be using big edu-data and developing novel, richer, more powerful, deep (e.g., about many learners), wide (e.g., capture as many scalable, and accurate computational models for a number of core learning relevant aspects as possible), and long (being longitudinal, educational tasks such as prediction and assessment of learner across many grades or even a learner’s lifetime). Convergence mastery of knowledge components (KCs; micro-competencies or efforts will seek to “deepen” samples and “lengthen” timeframes of skills), domain model refinement (i.e., improving models of what datasets that are (sometimes, but not always, already) “wide” in learners need to learn to acquire mastery of a domain), and inferring terms of features captured. optimal strategies to coordinate the behavior of AISs for how and Using these concepts, our goal can be re-stated as enabling the when to optimally implement guidance to promote student collection of deep, wide, and long education data which could then learning. The goal is to improve our understanding of how learners be analyzed using emerging, state-of-the-art data science methods learn, improve the effectiveness and efficiency of AISs, make AISs capable of learning patterns from such massive collections of data more affordable and scalable horizontally (across topics and and also accounting for input from diverse domain experts with the domains), and scale AISs vertically (offering training on higher- ultimate goal of transforming the learning ecosystem. level skills such as deep conceptual understanding and collaborative problem solving). 
In order to fully harness the data revolution to transform the learning ecosystem we need: (1) improved, at-scale data collection One major opportunity from a learning engineering perspective is and (near) real-time access to big edu-data (i.e., addressing the the automation of the development and refinement of AISs and “impoverished datasets” challenge) in ways that account for adaptive instructional content. Making progress towards security, privacy, and ownership and (2) infrastructure to process automating the authoring of AISs should begin to enable better scalability across topics and domains (horizontal scalability), which currently is a major stumbling block for a wider adoption of such transforming communities of practice effort. To this end, we plan systems. Expert-driven approaches to developing domain models, to develop new curricula for data literacy to be used by teacher learner models, and instructional strategies for new topics and training programs. domains are expensive, tedious, and time-consuming. Automated Models of Learner-Teacher-AISs Partnership. Finding the best or semi-automated approaches to discovering domains models, learner-teacher-AISs partnerships could have transformative inferring learner models, and discovering instructional strategies impact on the learning ecosystem such as freeing teachers from are much needed. For instance, we intend to use neuro-symbolic certain duties that AISs can do in an autonomous manner thus approaches to automatically extract from both structured, e.g., allowing them to focus on higher level tasks such as designing new student performance data, and semi-structured data, i.e., text in instructional materials or novel tailored interventions for students, textbooks, domain models. 
, motivational support, and other tasks for which AISs are not ideal A second major opportunity within this thrust involves AISs for This better distribution of duties and coordination between teachers collaborative learning with intelligent discourse components. and AISs should lead to a more effective, efficient, engaging, and Widely deployed, commercial AISs largely do not target advanced equitable learning ecosystem. We will study four levels of AISs topics such as collaborative problem solving. Collaborative work autonomy with respect to how teachers may use AISs (see later). and collaborative problem-solving skills are much needed in the Detect and Mitigate Issues Related to Ethics, Equity, Inclusion, and 21st century (Autor, Levy, & Murnane, 2003; Carnevale & Smith, Diversity in Education. As a general principle, all LDI activities 2013), and learning activities fostering the acquisition of such skills will be informed and guided by our goal of using data science and must be adopted by learning ecologies of the future in order to make AISs to promote ethics and equity in education (Riddle et al., 2015; such ecologies more effective and equitable for all learners and Corbett-Davies & Goel, 2018; Gardner, Brooks, & Baker, 2019). more relevant to emerging needs and new realities. Our goal is to At the same time, the Ethics and Equity Expert Panel will review scale up AISs vertically, to offer training opportunities for such all LDI efforts to ensure ethics and equity aspects are properly advanced skills. The strategy is to extend AISs such as those addressed. Furthermore, our institute 5-year plan includes a set of offered by Carnegie Learning and Age of Learning with language activities focusing on ethics and equity which fall into three through discourse components. 
categories: (1) using data and data science to further our Language and discourse play a central role in learning (Vygotsky, understanding of biases and achievement gaps in the learning 1978), particularly for the acquisition of difficult topics that require ecosystem; (2) understanding and mitigating ethics and equity deep comprehension, reasoning, problem solving, and throughout the data lifecycle with a focus on algorithmic bias and collaboration that are required for higher paying jobs in the 21st developing tools to address these issues throughout the work of the century (Autor, Levy, & Murnane, 2003; Carnevale & Smith, LDI; and (3) increasing diversity and inclusion during collaborative 2013). Language and discourse are essential for developing learning activities. argumentation skills (Ferretti & de la Paz, 2011), disciplinary literacy (Goldman et al., 2016; Shanahan & Shanahan, 2008; 3.8 Evaluation and Refinement Shaffer, 2017), reasoning associated with mental models (Graesser, Evaluation and analysis are key elements of the LDI convergence 2020), and formulating explanations of complex systems in science framework to both demonstrate its effectiveness and provide a way (Chi et al., 1989; Graesser, 2015), math (Fancsali et al., 2016), and to identify opportunities for improvement and refinement. We computer code (Lasang et al., 2021). focus on quantitative and qualitative metrics for LDI community building and engagement efforts, identifying investment Language and discourse is not only essential for learning within opportunities priorities, and development and refinement of individuals but also learning in group contexts. Problems have prototyping concrete task or Scale-Up Project activities. 
For dramatically increased in complexity, requiring collaborative quantitative metrics, to account for different perspectives, we will problem solving by people with disparate expertise and report how many experts and from how many different disciplines perspectives (Carnevale & Smith, 2013; Graesser et al., 2018; contribute to specific tasks (e.g., identification of data requirements OECD, 2017). for Investment Opportunity Area 1, above). For each expert, we can Investment Opportunity Area 3: Human Technology Frontier – monitor their individual contributions in terms of content (e.g., Pushing For Wider Adoption and Integration Of AISs word counts), comments, and revisions to others’ contributions (by This investment opportunity fosters a portfolio of efforts to push using shared documents that track such metrics). More for wider adoption and integration of AISs with school-based and qualitatively, each member’s contributions will be assessed in teacher-led learning activities at the Human-Technology Frontier, terms of the depth of their contributions. A researcher might one other of NSF’s ten Big Ideas for Future Investment. identify that a particular expert’s contribution initiated the development of a novel solution that could improve the detections Many teachers are overwhelmed by the many duties and tasks they of learners’ emotions in a classroom context. have to handle, resulting in burnout and reduced teacher job satisfaction and retention rates (Grayson & Alvarez, 2007; Rhodes, Furthermore, we report the scientific and societal impact of the Nevill, and Allan, 2004). To assist teachers, major goals and proposed convergence framework. 
Scientific impact can be corresponding Scale-up Projects include: (1) to help teachers better reported in terms of the number of publications, presentations, understand the potential of using AISs and data science to tutorials, meetings, email exchanges and other forms of direct transform education including their job performance and communication (among LDI members and the broader research satisfaction; (2) to propose and investigate learner-teacher-AISs community) as well as improvements of prototype solutions over collaboration models and interfaces including the validation of a existing solutions. Other scientific success measures can monitor framework for learning experience design; and (3) to design and longer term impact such as how many citations the products of this develop dashboards for teachers to learn from, interpret, and make project generate and how many research groups integrate the decisions based upon fine-grained, comprehensive learning data. proposed solutions (e.g., user adoption of analysis toolkits Helping teachers, parents, and other stakeholders understand the developed). potential of data science and AISs is important for LDI’s Societal impact can be assessed through impact on learners and phase (Phase 2). Expert panels had the freedom to adopt different teachers as well as impact on the learning ecosystem (e.g., in terms internal processes to identify investment opportunities. of how LDI efforts have made aspects of the learning ecosystem more effective, engaging, equitable, efficient, relevant, and Expert Panel 9 (1 of 10 Expert Panel members left LDI affordable, as well as other outcomes such as transforming Reviewer Pool after assignment to Expert Panel.) educators’ community of practice). Participation 7 / 9 (Two members were assigned reviews rate but did not submit any reviews.) An important requirement for the evaluation process is documentation of the various elements of the convergence Concrete Tasks 17 framework. 
For this purpose, for instance, all meetings of the Reviewed leadership team were recorded (key metric: hours of meetings and Total Concrete 34 (17 task x 2 reviews/task) interactions; volume of those interactions). Other processes and Task Reviews activities have been documented in various ways such as Google docs, meeting recording, and Slack asynchronous discussions. For Number of 3.3 (average over the 7 reviewers submitting instance, the convergence process implemented to generate the 5- Reviews Per at least one review; min: 2; max: 7) year institute plan has been well documented through other records Member such as spreadsheets used in NGT processes employed by the Total Expert (34 x 2) + (7 x 2) = 82 hours of expert time various Expert Panels to generate and rank ideas for investment Time (assuming 2 hours spent per concrete task opportunities to be included in the 5-year plan. review and 2 hours of Expert Panel meeting We will illustrate how we have been evaluating the effectiveness of to summarize the reviews for each concrete convergence framework holistically as well as from the perspective task) of Expert Panels. For brevity, we illustrate the evaluation of the Expert Panelist 4.82 hours (82 total hours / 17 concrete convergence process from the perspective of the Learning Time per tasks) Engineering Expert Panel. Concrete Task The LDI’s Learning Engineering Expert Panel comprised a diverse Panel 279 words per task (average); 4,749 total group of researchers and developers with vast experience in Summary research and development of learning systems. The 10-member Word Count expert panel was drawn from the academe, government, and Table 2. A summary of the quantitative evaluation of the concrete industry. task review and feedback process by the Learning Engineering The Learning Engineering Expert Panel, like other LDI expert Expert Panel. 
panels, engaged in two major activities that contribute to the LDI This policy was adopted for two main reasons: (i) offer autonomy Phase 1 project: to each expert panel to self-organize and (ii) explore different - Provide input to each of the concrete tasks (forward-looking collaboration processes in order to discover the best one (e.g., in “Scale-Up Projects”) addressing various challenges in the terms of member engagement, effectiveness, and efficiency) or learning ecosystem with the goal of converging to solutions to identify from each expert panel a set of best practices for later those challenges that account for input from many domains. adoption. In the case of the Learning Engineering Expert Panel, investment opportunity ideas were solicited via e-mail from the - Identify, rank, and propose investment opportunities for the 5- Expert Panel by the Co-Leads. A brief summary of candidate year plan of the convergence or institute phase (LDI Phase 2) opportunities is provided below: The concrete task reviewing and feedback process involved  Improving and scaling up AISs horizontally across topics and significant expert time (see Table 2, which presents a summary of domains the quantitative evaluation of the initial cycle of the review and feedback process by the Learning Engineering Expert Panel). 
 Scaling up AISs vertically targeting advanced skills such as collaborative problem solving and deep conceptual In addition to this quantitative summary of the convergence process understanding of complex STEAM+C topics related to concrete tasks, we also developed a 5-stage model to characterize the maturity of concrete tasks: (1) ideation or initial  (More) Comprehensive learner models idea, (2) conceptualization and convergence of a data science solution with input from experts from many domains, (3)  Pushing for wider adoption and integration of AISs in school- implementation & refinement, (4) product release (e.g., an based and teacher-led instruction (Human-Tech Frontier) emerging data science prototype or dataset release), (5) impact, in  Models of Teacher - AISs inter-operation which the product from stage 4 is adopted by or integrated into external research projects or a learning environment, having some  Causal modeling for learning engineering external impact on the research landscape or on the learning  Inclusive learning engineering R&D (ethics, equity, inclusion, ecosystem. Work of LDI participants during the conceptualization and diversity) phase has centered primarily on concrete tasks in the first four phases (ideation, conceptualization and convergence, and product This list was further discussed and the initial investment release). Ideally, the transition from concrete task to “Scale-Up opportunities were ranked by all expert panel members. A Projects” in LDI Phase 2 will reflect progression to later stages of recommendation of the most important investment opportunities this model. was put forward to the whole LDI team for further debate and refinement by other Expert Panels and paid, ad-hoc external The other major task of each Expert Panel was to identify reviewers and the public at large. 
Many of the proposed investment investment opportunities for the 5-year plan of the LDI institute opportunities that originated in the Learning Engineering Expert other level (level 0) which are self-improving, fully autonomous Panel are part of the 5-year institute plan adopted by the broader AISs – they improve with experience with minimal or no developer LDI community. intervention. While we will explore as resources permit the role of data science to enable such level 0, self-improving fully Holistically, the LDI convergence framework can be evaluated in autonomous AISs, from a teacher and learner perspective they are terms of the level of engagement of a diverse team of researchers, similar to the fully autonomous level of AISs (level 1). developers, practitioners, and other stakeholders as well as its key outcome, which is the 5-year plan for the institute or convergence We plan to study and understand the trade-offs in terms of teacher phase which was described and submitted as a proposal to NSF. involvement in tuning AISs vs. levels of AIS autonomy. For The level of engagement can be summarized briefly by noting that instance, teachers may choose a fully autonomous mode of our 60+ strong team participated so far in 3 all-hands meeting each operation for an AIS meant for students working independently for about 20 hours (2.5 days) resulting in 60 x 20 = 1,200 expert with the system afterschool as supplemental instruction, whereas hours of effort. Experts spent hundreds of additional hours spent in for student interactions with the AIS during a class period (i.e., in a other meetings and other activities. Most meetings were recorded blended-learning environment), the same teacher may choose to and transcribed. A more detailed, quantitative and qualitative control more the behavior of the AISs. 
Similarly, teachers may analysis is being conducted right now, and the results will be widely decide to use/download a pre-trained learner model and update it disseminated. with data from her students, assuring data security and privacy and maintaining full ownership of the data. They may decide to share a 4. EMERGING IDEAS sample of her own student data to benefit the pooled/pre-trained We conclude this progress report by briefly presenting two models that everyone can download as default. emerging ideas from the collective work of the LDI during its conceptualization phase to date. 4.3 Transforming Communities of Practice LDI intends to serve as an agent of change for how research 4.1 Policy Recommendations questions are conceived and addressed through interdisciplinary Our work so far also results in a number of policy collaboration such that LDI’s impacts will propagate and evolve recommendations: beyond the lifetime of the award. - Publicly funded education technologies similar to publicly More specifically, we have the explicit intent to start a culture shift funded education adopted in the 19th and 20th century. in teacher training programs through two specific actions: (1) involve a few dozen teachers and pre-service teachers in our work - Learning data owners keep ownership of their data and have in order to co-design solutions and account for their input and decision power with respect to where their data is stored, how expose them to the potential of data science and AISs while also the data is accessed, by whom and for what purposes, how introducing them to science convergence approaches to address key their data is used, and if their data can be shared, with whom, challenges in education and (2) develop new curriculum and under what conditions and circumstances. 
recommendations for teacher training programs as well as - Learning data infrastructure is needed to enable responsible accompanying training materials to build capacity for teachers and learning data collection, storage, access, sharing, and other stakeholders to adopt AISs and data science approaches, processing. tools, and principles to improve learning and teaching. - The need for a culture shift in teacher training programs and Wider adoption of advanced data-driven science and engineering data literacy curriculum for future teachers. approaches and tools such as AISs is still lacking for at least three reasons: (1) Data science and education technology training is often 4.2 AISs Autonomy Levels or Teacher-AISs limited in teacher training programs. (2) The sophistication and Partnership Models complexity of AISs often entail a significant effort to train teachers Finding the best teacher/learner-AISs partnerships could have to effectively use such advanced education technologies. (3) New transformative impact on the learning ecosystem, potentially approaches are often developed with a lack of substantive freeing teachers from certain duties that AISs can do in an involvement of educators and schools. autonomous manner and allowing teachers to focus on higher level Involving educators will help to ensure that new approaches based tasks such as tailored, individualized interventions for students, on data science to tackle various education challenges, next- motivational support, and other tasks for which AISs are not ideal. generation AISs, and learning environments that include AISs, are This better distribution of duties and coordination between teachers designed to help eliminate biases and promote equity, inclusion, and AISs should lead to a more effective, efficient, engaging, and and diversity, offering high quality education opportunities for all equitable learning ecosystem. learners. 
We will therefore push for schools, teacher training We defined and intend to study four levels of AISs’ “autonomy” programs, and instructors to collaborate more with data science and with respect to how teachers can use such AISs: (1) fully educational technology researchers and developers to improve autonomous – teachers need little (if any) training and have little (if learning and instruction. To this end, in addition to substantive any) involvement in “tuning” AISs, (2) minimal teacher involvement of teachers and other stakeholders in LDI activities, involvement – teachers tune the parameters of the AISs with the we will explore avenues for delivering professional learning, help of the AISs developer at the beginning of the school year or including workshops for teachers, summer schools (e.g., by adding semester (minimal teacher training with respect to the workings of a track to CMU’s LearnSphere summer school) for pre-service the AISs), (3) average teacher involvement – teachers require teachers and Research Methods instructors in schools of education. training, and they work with the system on a weekly basis selecting We are an expanding community of practice and promote Scale-Up instructional tasks and receiving information from the AISs, (4) Projects that will ideally become bona fide research programs teacher-driven – the teachers exerts full control of the AISs beyond the award period, securing their own funding as they make including overriding decisions the AISs may take or suggest, the scientific progress. Furthermore, Scale-Up projects and research teacher will interact almost daily with the AISs. There is in fact one thrusts will ideally result in career-long efforts for some younger Artificial Intelligence in Education. AIED 2020. Lecture faculty members. Notes in Computer Science, vol 12164. Springer, Cham. 
To sum up, our strong team of interdisciplinary experts, developers, https://doi.org/10.1007/978-3-030-52240-7_13 and practitioners will work together during the 5-year LDI institute [12] Cohen, P. A., Kulik, J. A., & Kulik, C. C. (1982). project to move current practices beyond the small-scale studies to Educational outcomes of tutoring: A meta-analysis of bring the learning sciences into the era of big data and findings. American Educational Research Journal, 19, 237- interdisciplinary science convergence. The impact of LDI will be 248. felt far and wide, propagating and evolving beyond the lifetime of [13] Cohen, P.R. (2015). DARPA's Big Mechanism program. the award and beyond our own team, acting as an agent of change Physical Biology, Volume 12, Number 4, 1-9. for how research questions are conceived and addressed through interdisciplinary, collaboration, and co-designed research and [14] Corbett-Davies, S. & Goel, S. (2018). The Measure and development. The proposed processes, methods, and studies pave Mismeasure of Fairness: A Critical Review of Fair Machine the way for taking these outcomes to other domains. Learning, arXiv:1808.00023, 2018. [15] Delbecq, A. L., & Van de Ven, A. H. (1971). A group ACKNOWLEDGMENTS process model for problem identification and program The Learner Data Institute is sponsored by the National Science planning. The Journal of App. Beh. Science, 7(4), 466-492. Foundation (NSF; award #1934745). The opinions, findings, and results are solely the authors’ and do not reflect those of NSF. [16] Dwork, C. (2008). Differential privacy: A survey of results. In International conference on theory and applications of 5. REFERENCES models of computation, pp. 1–19. Springer. [1] Anders, R., Oravecz, Z., & Batchelder, W. (2014). Cultural [17] Fancsali, S.E., Ritter, S., Berman, S.R., Yudelson, M., Rus, consensus theory for continuous responses: A latent appraisal V., and Morrison, D.M. (2016). Toward Integrating model for information pooling. 
Journal of Mathematical Psychology, 61, 1–13.
Cognitive Tutor Interaction Data with Human Tutoring Text Dialogue Data in LearnSphere. In: J.P. Rowe and E.L. Snow (Eds.), Proceedings of the Workshops at the 9th Intern. Conf. on Educ. Data Mining, Raleigh, NC, USA, June 29, 2016.
[2] Atwal, H. (2020). DataOps Technology. In Practical DataOps 2020 (pp. 215–247). Apress, Berkeley, CA.
[3] Autor, D., Levy, F., & Murnane, R. (2003). The Skill Content of Recent Technological Change: An Empirical Exploration. Quarterly Journal of Economics, 118(4), November 2003, 1279–1334.
[4] Bach, S.H., Broecheler, M., Huang, B., and Getoor, L. (2017). Hinge-loss Markov Random Fields and Probabilistic Soft Logic. Journal of Machine Learning Research, 18, pp. 1–67.
[5] Berry, G. R. (2011). Enhancing effectiveness on virtual teams: Understanding why traditional team skills are insufficient. The Journal of Business Communication, 48(2), 186–206.
[6] Bishop, C. M. (2013). Model-based machine learning. Philosophical Trans. of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1984).
[7] Bloom, B. S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring. Educational Researcher, 13, 4–16.
[8] Carnevale, A.P., & Smith, N. (2013). Workplace basics: The skills employees need and employers want. Human Resource Development International, 16, 491–501.
[9] Chesler, N. C., Bagley, E., Breckenfeld, E., West, D., & Shaffer, D. W. (2010). A virtual hemodialyzer design project for first-year engineers: An epistemic game approach. In ASME 2010 Summer Bioengineering Conference (pp. 585–586). American Society of Mechanical Engineers.
[10] Chi, M.T.H., Roy, M., & Hausmann, R.G.M. (2008). Learning from observing tutoring collaboratively: Insights about tutoring effectiveness from vicarious learning. Cognitive Science, 32, 301–341.
[11] Choi, Y., Lee, Y., Shin, D., Cho, J., Park, S., Lee, S., Baek, J., Bae, C., Kim, B., & Heo, J. (2020). EdNet: A Large-Scale Hierarchical Dataset in Education. In: Bittencourt I., Cukurova M., Muldner K., Luckin R., Millán E. (eds)
[18] Fancsali, S.E., Yudelson, M.V., Berman, S.R., Ritter, S. (2018). Intelligent instructional hand offs. In: K.E. Boyer, M.V. Yudelson (Eds.), Proceedings of the 11th International Conference on Educational Data Mining (EDM 2018), pp. 198–207. International Educational Data Mining Society.
[19] Ferretti, R. P., & De La Paz, S. (2011). On the comprehension and production of written texts: Instructional activities that support content-area literacy. In R. O’Connor & P. Vadasy (Eds.), Handbook of reading interventions (pp. 326–355). New York, NY: Guilford.
[20] Gardner, J., Brooks, C., & Baker, R. S. J. d. (2019). Evaluating the Fairness of Predictive Student Models Through Slicing Analysis. In Proceedings of the 9th International Conference on Learning Analytics & Knowledge, pp. 225–234.
[21] Goldman, S. R., Britt, M. A., Brown, W., Cribb, G., George, M., Greenleaf, C., Lee, C. D., Shanahan, C., & Project READI. (2016). Disciplinary literacies and learning to read for understanding: A conceptual framework of core processes and constructs. Educational Psychologist, 51, 219–246.
[22] Graesser, A.C., Fiore, S.M., Greiff, S., Andrews-Todd, J., Foltz, P.W., & Hesse, F.W. (2018). Advancing the science of collaborative problem solving. Psychological Science in the Public Interest, 19, 59–92.
[23] Grayson, J. L., & Alvarez, H. K. (2007). School climate factors relating to teacher burnout: A mediator model. Teaching and Teacher Education, 24(5), 1349–1363.
[24] Growing Convergence Research. (NSF-GCR, 2020). National Science Foundation’s Growing Convergence Program, https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=505637 (accessed online on June 15, 2020)
[25] Guribye, F., Andressen, E.F., & Wasson, B. (2003). The organization of interaction in distributed collaborative learning. In B. Wasson, S. Ludvigsen, & U. Hoppe (Eds.), Designing for change in networked learning environments (pp. 385–394). Dordrecht, Netherlands: Kluwer Academic.
[26] Harnessing the Data Revolution. (NSF-HDR, 2021). National Science Foundation’s Harnessing the Data Revolution, https://www.nsf.gov/cise/harnessingdata/ (accessed online on June 11, 2021)
[27] Hartline, J. D. (2012). Bayesian mechanism design. Theoretical Computer Science 8(3), 143–263.
[28] Hartline, J. D., Johnsen, A., Nekipelov, D., and Zoeter, O. (2019). Dashboard mechanisms for online marketplaces. In Proceedings of the 2019 ACM Conference on Economics and Computation, pp. 591–592.
[29] Hellerstein, J. M., Faleiro, J., Gonzalez, J. E., Schleier-Smith, J., Sreekanti, V., Tumanov, A., & Wu, C. (2019). Serverless Computing: One Step Forward, Two Steps Back. arXiv:1812.03651.
[30] Hoffmann, L. (2019). Reaching New Heights with Artificial Neural Networks: ACM A.M. Turing Award recipients Yoshua Bengio, Geoffrey Hinton, and Yann LeCun. Communications of the ACM, June 2019, pp. 96–95.
[31] Klein, G. (2007). Performing a Project Premortem. Harvard Business Review, 85(9), 18–19.
[32] Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge University Press.
[33] Lilian, S. C. (2014). Virtual teams: Opportunities and challenges for e-leaders. Procedia-Social and Behavioral Sciences, 110, 1251–1261.
[34] Liu, R., Koedinger, K., Stamper, J., & Pavlik Jr., P. I. (2017). Workshop: Sharing and Reusing Data and Analytic Methods with LearnSphere. In X. Hu, T. Barnes, A. Hershkovitz, & L. Paquette (Eds.), Proc. of the 10th Int. Conf. on Educ. Data Mining (pp. 475–476). Wuhan, China.
[35] Mislevy, R. J., Almond, R. G., Yan, D., & Steinberg, L. S. (1999). Bayes nets in educational assessment: Where the numbers come from. In Proceedings of the fifteenth conference on uncertainty in artificial intelligence (pp. 437–446). UAI’99. Stockholm, Sweden: Morgan Kaufmann Pubs.
[36] OECD (2017). PISA 2015 Results (Volume V): Collaborative Problem Solving. Paris: OECD Publishing.
[37] Pearl, J. & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books, New York.
[38] Rhodes, C., Nevill, A. & Allan, J. (2004). Valuing and supporting teachers: A survey of teacher satisfaction, dissatisfaction, morale and retention in an English local education authority. Research in Education, 71(1), 67–80.
[39] Riddle, T., Bhagavatula, S., Guo, W., Muresan, S., Cohen, G., Cook, J., and Purdie-Vaughns, V. (2015). Mining a Written Values Affirmation Intervention to Identify the Unique Linguistic Features of Stigmatized Groups. Proceedings of EDM 2015.
[40] Rus, V., Banjade, R., Maharjan, N., Morrison, D., Ritter, S., and Yudelson, M. (2016). Preliminary Results on Dialogue Act Classification in Chat-based Online Tutorial Dialogues. Proceedings of the 9th International Conference on Educational Data Mining, Raleigh, NC, 2016.
[41] Rus, V., Fancsali, S.E., Bowman, D., Pavlik Jr., P., Ritter, S., Venugopal, D., Morrison, D., and The LDI Team (2020). The Learner Data Institute: Mission, Framework, & Activities. In V. Rus & S.E. Fancsali (Eds.), Proceedings of The First Workshop of the Learner Data Institute, The 13th International Conference on Educational Data Mining (EDM 2020), July 10-13, Ifrane, Morocco (held online).
[42] Shaffer, D. W. (2017). Quantitative ethnography. Madison, WI: Cathcart Press.
[43] Shanahan, T., & Shanahan, C. (2008). Teaching disciplinary literacy to adolescents: Rethinking content-area literacy. Harvard Educational Review, 78, 40–59.
[44] Sottilare, R.A., Brawner, K.W., Goldberg, B.S., & Holden, H.K. (2012). The Generalized Intelligent Framework for Tutoring (GIFT). Downloaded from www.gifttutoring.org on November 30, 2012.
[45] Spirtes, P., Glymour, C., Scheines, R. (2001). Causation, Prediction, and Search. 2nd Edition. MIT.
[46] Su, H., Deng, J. and Fei-Fei, L. (2012). Crowdsourcing Annotations for Visual Object Detection. AAAI 2012 Human Computation Workshop.
[47] Tamang, L.J., Alshaikh, Z., Ait-Khayi, N., Oli, P., & Rus, V. (2021). A Comparative Study of Free Self-Explanations and Socratic Tutoring Explanations for Source Code Comprehension. Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, pp. 219–225, March 2021.
[48] VanLehn, K., Graesser, A. C., Jackson, G. T., Jordan, P., Olney, A., & Rose, C. P. (2007). When are tutorial dialogues more effective than reading? Cognitive Science, 31, 3–62.
[49] Vroman, K., & Kovachich, J. (2002). Computer-mediated interdisciplinary teams: Theory and reality. Journal of Interprofessional Care, 16, 159–170.
[50] Vygotsky, L.S. (1978). Mind in society: the development of higher psychological processes. London: Harvard University Press.
[51] Walther, J.B. (1995). Related aspects of computer-mediated communication: Experiential observations. Organizational Science, 6, 180–203.
[52] Wang, X., Ranellucci, S., and Katz, J. (2017). Global-scale secure multiparty computation. In B. M. Thuraisingham, D. Evans, T. Malkin, and D. Xu (Eds.), ACM CCS 2017: 24th Conference on Computer and Communications Security, Dallas, TX, USA, pp. 39–56. ACM Press.
[53] Weick, K. E. (1976). Educational organizations as loosely coupled systems. Administrative Science Quarterly, 1–19.