<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Classification in Math Class: Using Convolutional Neural Networks to Categorize Student Cognitive Demand</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Victoria</forename><surname>Delaney</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Stanford University</orgName>
								<address>
									<addrLine>485 Lasuen Mall</addrLine>
									<postCode>94305</postCode>
									<settlement>Stanford</settlement>
									<region>CA</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jai</forename><surname>Bhatia</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Fremont High School</orgName>
								<address>
									<addrLine>575 W. Fremont Avenue</addrLine>
									<postCode>94087</postCode>
									<settlement>Sunnyvale</settlement>
									<region>CA</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Classification in Math Class: Using Convolutional Neural Networks to Categorize Student Cognitive Demand</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D6AF7244A45C154DA04D2E7BE9584288</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:49+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Cognitive Demand</term>
					<term>Mathematics Education</term>
					<term>Convolutional Neural Networks</term>
					<term>Transfer Learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Maintaining cognitively demanding instruction is a primary goal of classroom teachers. Yet students' cognitive demand is difficult to measure and track during the enactment of a rigorous task. This in-progress research addresses this problem space by predicting and modeling students' cognitive demand with computer vision and convolutional neural networks, providing an in-the-moment analysis of cognitive demand during an eighth grade mathematics task enactment. The findings suggest that models which leveraged behavior-based visual proxies for cognitive demand (e.g., gesturing, using a computer) achieved substantially higher accuracy than the baseline model. Taken together, the results of this work build toward a classroom analytic tool for teachers and have implications for the contributions of computer vision in real-world classroom studies.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>There has been much interest in applying artificial intelligence analytic tools to classroom settings in the past decade. Although many educational applications that leverage AI examine speech data with natural language processing [26, 29], there exists a growing enthusiasm for computer vision-based research in classrooms to analyze and improve teachers' instructional practices <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b9">10]</ref>. This study explores the extent to which students' cognitive demand, one aspect of classroom instruction, can be modeled with computer vision via the analysis of classroom video recordings in eighth grade mathematics.</p><p>The maintenance of students' cognitive demand, the amount of intellectual work required to create meaning for a mathematical task and solve it <ref type="bibr" target="#b14">[14]</ref>, is crucial for teachers to measure and track because of its direct relationship to learning outcomes <ref type="bibr" target="#b29">[27]</ref>. When students exhibit high cognitive demand, they develop deeper understandings and connect concepts across the discipline <ref type="bibr" target="#b26">[28]</ref>. However, cognitive demand is not a static construct and can be influenced by a number of instructional factors, including the initial presentation of the task to students <ref type="bibr" target="#b14">[14]</ref>, resources provided to the students while solving the task <ref type="bibr">[19]</ref>, and teacher-student and student-student interactions during enactment <ref type="bibr">[15]</ref>. Because measuring cognitive demand in-the-moment is difficult, yet potentially beneficial for teachers, we explore the extent to which computer vision may be used to provide cognitive demand measurements as students solve a mathematical task in small groups.
Such data may assist teachers by providing indicators for which students continue to exhibit high cognitive demand throughout the task's enactment, and conversely, which students struggle to uphold high demand after the task is launched.</p><p>Since cognitive demand is not a purely visual construct, our model draws upon five proxy student behaviors to identify potentially cognitively demanding activity while solving a mathematics task, then uses the presence of the five behaviors to predict the level of cognitive demand. Though this approach omits cues from students' speech, we hypothesized that modeling cognitively demanding visual behaviors may yield additional contributions toward predicting overall demand. We therefore ask: to what extent can computer vision model changes in students' cognitive demand during mathematical problem solving?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Literature Review</head><p>Modeling cognitive demand with computer vision is a novel task in classroom analytics research. Our exploration of relevant literature investigates the extent to which other computer vision-based methods have demonstrated success in tasks with adjacent features. By incorporating three of these features into the present study: transfer learning, multiclass binary classification, and the use of pre-trained networks, we aim to bring the affordances of computing to bear on a classroom setting. We discuss each feature in detail.</p><p>Transfer Learning. This research leverages transfer learning using ImageNet pre-trained weights, a common approach for developing novel applications in image classification. Since its introduction, ImageNet has been established as a reliable, general-purpose benchmark for transfer learning across a variety of learning tasks, numbers of classes, and amounts of trainable data <ref type="bibr" target="#b17">[17]</ref>. Numerous past studies have investigated the factors that impact transfer learning and fine-tuning of convolutional neural network (CNN) models, including the perils of model overfitting <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b28">30]</ref> and the layers of ImageNet-trained networks that should be optimized for transfer learning <ref type="bibr" target="#b16">[16]</ref>. We drew upon this research when considering the duration of hyperparameter tuning and the overall fit of the training data to each binary classification model, as it suggests that model overfitting may harm generalization to the validation and testing sets.</p><p>Pre-Trained Networks for Multiclass Classification. Additionally, we relied on MobileNet V2, a neural network specifically constructed for classification tasks, for binary and categorical classifications of cognitive demand.
MobileNet V2 was developed for "lightweight classification tasks" <ref type="bibr" target="#b10">[11]</ref> in transfer learning, image classification, and localization. It is commonly used in object recognition and classification tasks, such as detecting human tissue abnormalities in medical research <ref type="bibr" target="#b2">[3]</ref>. As our investigation involves the classification of certain objects in order to detect human behavior (e.g., detecting the presence of a computer in the "using a computer" proxy behavior class), MobileNet V2 served as a reasonable choice for a first-pass exploration of the data.</p><p>One drawback we experienced was the amount of labeled training data needed to optimize transfer learning using ImageNet and MobileNet V2. Past studies and experiments suggest that a large quantity of labeled training data is required for transfer learning, particularly in tasks that involve feature localization <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b29">31]</ref> and modification of architectures that improve transfer learning <ref type="bibr" target="#b3">[4]</ref>. Using unlabeled data has been an appealing area to explore in this research space <ref type="bibr" target="#b21">[20]</ref>, and some self-supervised methods have attempted to improve feature generalization in auxiliary tasks <ref type="bibr" target="#b4">[5]</ref>, although none has outperformed purely supervised pre-training on ImageNet. Weak supervision, which applies noisy labels from non-expert users <ref type="bibr" target="#b18">[18]</ref>, is now seen as a plausible middle ground for large-scale ImageNet transfer learning tasks. We utilized weak supervision when applying hand labels to binary classes in the training data, as one member of the research team was unfamiliar with coding student and teacher behaviors in mathematics education research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head><p>Our approach to modeling cognitive demand through convolutional neural networks consisted of three primary steps. First, we constructed the baseline cognitive demand model for comparison, which predicted demand from still images alone. Next, we devised the experimental model, which utilized binary classification for students' cognitive demand proxy activities (computer use, leaning in, pointing to the task, talking to the teacher, and writing on the task) to predict cognitive demand. Finally, we compared performance between the two models.</p><p>Both models applied transfer learning from ImageNet weights. Although the MobileNet V2 network, which relies upon ImageNet weights, contains approximately 2.3 million parameters, our method applies transfer learning only to the bottom Dense bottleneck layers (approximately 1,300 trainable parameters). These layers solely focus upon the localized features of the five binary classes. Figure <ref type="figure" target="#fig_0">1</ref> shows a depiction of our model as well as a schematic of the trainable network architecture that was applied. Categorical cross-entropy loss was used in the baseline model to classify levels of students' cognitive demand, where level 1 indicated the least demanding activity and level 4 indicated the most demanding activity. Binary cross-entropy loss was used in the experimental model to categorize each of the five feature classes, as we aimed to assess whether each student behavior was present. Finally, we implemented a support vector machine classifier to transform intermediate binary feature predictions into cognitive demand scores on testing data. SVMs can work well with smaller datasets, such as ours, and are well suited to data with linearly separable classes <ref type="bibr" target="#b25">[24]</ref>.</p></div>
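The two loss functions named above can be illustrated with a minimal, self-contained sketch. This is plain Python rather than the study's actual deep learning code, and the function names are ours, introduced for illustration:

```python
import math

def categorical_cross_entropy(true_class, probs):
    """Loss used by the baseline model: one of four cognitive demand
    levels (indexed 0-3 here), given predicted class probabilities."""
    return -math.log(probs[true_class])

def binary_cross_entropy(label, p):
    """Loss used by each behavior sub-model: label is 1 if the behavior
    (e.g., 'using a computer') is present in the image, 0 otherwise."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# A confident, correct four-class prediction incurs a small loss ...
low = categorical_cross_entropy(2, [0.05, 0.05, 0.85, 0.05])
# ... while a near-uniform prediction incurs a loss near -log(0.25).
high = categorical_cross_entropy(2, [0.25, 0.25, 0.26, 0.24])
assert low < high
```

In a framework such as Keras these correspond to the built-in categorical and binary cross-entropy losses; the sketch only makes explicit what each one penalizes.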
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Data and Preprocessing</head><p>The data were collected from two eighth grade mathematics classrooms that focused on building students' capacities for cognitively demanding work through engagement with mathematical tasks. Four 30-minute video recordings were taken in Spring 2017 that featured students solving "The Washing Machine Problem" with Desmos, a dynamic graphing calculator application. The video recordings were rated for cognitive demand on a 1-4 scale using the Instructional Quality Assessment Rubric <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>, a research-backed tool for rating the cognitive demand of students' mathematical activity. Demand was rated at the level of the entire student group, and importantly, cognitive demand ratings were not uniformly distributed across the 1-4 scale. This is to be expected, because students were more likely to achieve moderate cognitive demand throughout the task (level 2 or 3) than extreme ratings (level 1 or 4). Initial ratings were assigned in Winter 2021 by two mathematics education experts (Delaney &amp; Kinsey) who reached 87.9% inter-rater agreement. This value, classified as "very good agreement" <ref type="bibr" target="#b22">[21]</ref>, serves as the upper accuracy threshold for human performance on this task.</p><p>The video recordings were spliced into still images taken at 1-second intervals, and we assigned 1-4 cognitive demand labels to each image from Delaney and Kinsey's ratings. We then hand-labeled each image in the five binary classes according to the following schematic:</p><p>- Computer class: the image received a "1" if students were using the computer to solve the task, and a "0" otherwise. - Leaning class: the image received a "1" if more than one student was leaning into the center of the table to collaborate with the group, and a "0" otherwise.
- Pointing class: the image received a "1" if one or more students were visibly pointing or gesturing to the task or computer, and a "0" otherwise. - Teacher class: the image received a "1" if the teacher and students were conversing with one another at the same table, and a "0" otherwise. - Writing class: the image received a "1" if one or more students were writing on the task card, and a "0" otherwise.</p><p>The five binary classes were generated based on our hypothesized relationship of each indicator to students' cognitive demand. Prior research has demonstrated that students' use of computational tools to assist with problem solving can either raise or lower cognitive demand, contingent upon how students use them <ref type="bibr" target="#b12">[13]</ref>. Similarly, conferring with a teacher should increase cognitive demand, as teachers may draw students' attention to cognitively demanding features of the task during small-group interactions <ref type="bibr">[15]</ref>. Finally, the ways in which students work collaboratively and use one another as resources may increase cognitive demand, as visually indicated through pointing, collective writing, and leaning in toward the "middle space" <ref type="bibr">[22]</ref>.</p><p>In total, the data set contained 2000 images distributed uniformly across the four classroom video recordings. Each image was rescaled to 224 by 224 pixels to match the expected input size of MobileNet V2.</p></div>
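The labeling schematic above can be summarized as a small function. The annotation field names below are hypothetical, introduced only to make the thresholds in each rule explicit; they are not from the study's labeling tooling:

```python
def binary_labels(ann):
    """Map one image's hand annotations to the five binary classes
    described above. `ann` uses hypothetical field names."""
    return {
        "computer": int(ann["students_using_computer"]),
        "leaning":  int(ann["students_leaning_in"] > 1),   # more than one student
        "pointing": int(ann["students_pointing"] >= 1),    # one or more students
        "teacher":  int(ann["teacher_at_table"]),
        "writing":  int(ann["students_writing"] >= 1),     # one or more students
    }

labels = binary_labels({
    "students_using_computer": True,
    "students_leaning_in": 2,
    "students_pointing": 0,
    "teacher_at_table": False,
    "students_writing": 1,
})
# e.g., {'computer': 1, 'leaning': 1, 'pointing': 0, 'teacher': 0, 'writing': 1}
```

Note the asymmetry the sketch makes visible: the Leaning class requires more than one student, while Pointing and Writing require only one.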
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Experiment 1: Training the Baseline Model</head><p>The first model classifies cognitive demand from images alone. We expected the accuracy of this model to be relatively low because, in comparison with the five binary class indicators in the experimental model, the baseline model's feature space was high-dimensional. We experimented with various combinations of hyperparameters: learning rates, batch sizes, epochs, and optimizers to investigate the training accuracy of the baseline. We took 25% as our reference point: the expected accuracy of random guessing under a balanced cognitive demand distribution over the four levels. Table <ref type="table">1</ref> shows the training and validation accuracy as we tuned hyperparameters over 20 epochs.</p></div>
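The 25% reference point follows from guessing uniformly over four levels; because the demand ratings were not uniformly distributed (Section 4), always predicting the most frequent level is a slightly stronger naive baseline. A sketch, with invented counts for illustration:

```python
from collections import Counter

def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing."""
    return 1 / num_classes

def majority_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Hypothetical, non-uniform demand ratings (levels 2-3 most common):
ratings = [1] * 200 + [2] * 700 + [3] * 800 + [4] * 300  # 2000 images
assert chance_accuracy(4) == 0.25
assert majority_accuracy(ratings) == 0.4  # 800 / 2000
```

Either figure serves as a floor that a trained baseline model should comfortably exceed.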
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Hyperparameter Tuning with Baseline Cognitive Demand Model.</p><p>The selected configuration included a learning rate of 0.0001. Conceptually, we anticipated that Sparse Categorical Cross-Entropy Loss would have been a better fit because it is designed for integer labels; however, this was not the case during training. The final combination of hyperparameters caused the training accuracy to increase quickly, then level off after approximately 10 epochs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Experiment 2: Training the Experimental Model using Binary Behavioral Proxies</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1.">Phase 1: Binary classification using MobileNet V2</head><p>The experimental model sought to improve cognitive demand predictions by first identifying five binary student behaviors that might impact demand, then applying predicted binary class labels for the behaviors to testing data for prediction. Each of the five binary class sub-models was trained using MobileNet V2 with ImageNet weights. Data were split into 80% training, 10% validation, and 10% testing. We ensured that both the validation and testing sets contained all four cognitive demand levels.</p><p>Hyperparameters were tuned for each class separately, although many classes showed optimal training accuracy using similar inputs. As in Experiment 1, all binary classes were tuned for learning rate, number of epochs, loss optimization function, and batch size. The ADAM optimizer was used in all classes because it handled the noisy classroom data well, an important consideration for localization of class features.</p><p>We anticipated that the Teacher and Computer classes would achieve high training accuracy faster, because there was less ambiguity in labeling those classes compared to Writing, Leaning, and Pointing. We hypothesized that the latter classes would take longer to converge because they were based on pose estimation and were more likely to vary per student. For example, we associated students' elbows on the table with the "leaning" class, but since not every student in the group needed to have exhibited the "leaning" action in order for the image to be classified as "leaning," this nuance may have been difficult for the model to detect.
Table <ref type="table">2</ref> shows the training and validation accuracies as we tuned hyperparameters for all five student behavioral proxies trained over 50 epochs.</p><p>The models performed best given low learning rates, smaller batch sizes, and longer training durations to achieve high training and validation accuracy. This is not surprising, given the localization required for the network to learn and classify each of the five feature behaviors. Models were trained until each obtained a training accuracy over 85%, a value close to the human accuracy achieved on the original cognitive demand labels. In the event that multiple models fit this criterion, the model whose parameters yielded the highest validation accuracy was selected. The final selected hyperparameters are highlighted in yellow in Table <ref type="table">2</ref>. Figure <ref type="figure" target="#fig_1">2</ref> illustrates one example of our error analysis for each binary class. As we interrogated the nuances of these errors, it appeared that some class models learned to identify subtleties in the data better than others. For example, the highly accurate Computer class differentiated between closed computers and open computers after 50 epochs of training. In a majority of images, the Teacher class teased apart differences between the teacher's presence at the table versus the teacher performing other actions in the image background. Classification errors occurred when the teacher was only partially visible in the image, which made sense, as teachers were not actively monitoring their body position and placement during the original video recordings. Errors in the Pointing, Writing, and Leaning classes occurred when the students did not clearly demonstrate the intended action; for example, when the point was blurry or incomplete, when only one student was writing or leaning, or when the leaning action was subtle.</p></div>
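The 80/10/10 split that guarantees every cognitive demand level appears in the validation and testing sets can be sketched as a per-level (stratified) split. The helper below is our own illustration, not the study's actual preprocessing code:

```python
import random

def stratified_split(items, levels, seed=0):
    """Split 80/10/10 within each demand level, so that the validation
    and testing sets are guaranteed to contain every level."""
    rng = random.Random(seed)
    by_level = {}
    for item, level in zip(items, levels):
        by_level.setdefault(level, []).append(item)
    train, val, test = [], [], []
    for group in by_level.values():
        rng.shuffle(group)
        n_val = n_test = len(group) // 10
        val += group[:n_val]
        test += group[n_val:n_val + n_test]
        train += group[n_val + n_test:]
    return train, val, test

images = list(range(2000))                 # stand-ins for image ids
levels = [i % 4 + 1 for i in images]       # four demand levels
train, val, test = stratified_split(images, levels)
assert len(train) == 1600 and len(val) == 200 and len(test) == 200
```

A plain random split over so few positive Pointing examples could easily leave a level, or a behavior, absent from the held-out sets; stratifying avoids that failure mode.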
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2.">Phase 2: Labeling Cognitive Demand using Trained Multiclass Models and a Support Vector Machine</head><p>Once the binary multiclass models were established, we utilized a small test set of data (n = 40 images, 10 per cognitive demand class) to examine the Computer, Leaning, Pointing, Teacher, and Writing models' abilities to (1) correctly predict the five binary classes of students' behaviors in the test set and (2) calculate cognitive demand based on labels generated by the five models. We tested both a linear and a generalized support vector machine to predict final cognitive demand labels. Regularization parameters were tuned in both models (e.g., the kernel and gamma parameters in the generalized SVM, and the loss function in the linear SVM). Figure <ref type="figure" target="#fig_2">3</ref> summarizes the results for both classifiers and provides a confusion matrix to summarize classification errors. After tuning the regularization rate and aforementioned hyperparameters, it did not appear that the models' predictive accuracies for cognitive demand varied substantially. The general SVM classifier was the better overall choice because it improved cognitive demand classification from the baseline model (55.7%), although it does not surpass human performance (87.9%). This result is not surprising, because cognitive demand is an abstract concept that was previously rated by human experts using both speech and visual cues. However, the drastic improvements in cognitive demand classification from the baseline model validate our current approach despite the relatively small size of the data set.</p></div>
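The evaluation on the 40-image test set reduces to an accuracy score and a confusion matrix over the four demand levels. A stdlib sketch with hypothetical predictions (the numbers below are invented for illustration, not the study's results):

```python
def confusion_matrix(true, pred, levels=(1, 2, 3, 4)):
    """Rows index the true demand level, columns the predicted level."""
    index = {lvl: i for i, lvl in enumerate(levels)}
    m = [[0] * len(levels) for _ in levels]
    for t, p in zip(true, pred):
        m[index[t]][index[p]] += 1
    return m

def accuracy(true, pred):
    return sum(t == p for t, p in zip(true, pred)) / len(true)

# Hypothetical SVM predictions on a 40-image test set (10 per level):
true = [1] * 10 + [2] * 10 + [3] * 10 + [4] * 10
pred = ([1] * 8 + [2] * 2 +          # level 1: 2 images confused with level 2
        [2] * 7 + [3] * 3 +          # level 2: 3 confused with level 3
        [3] * 9 + [4] * 1 +          # level 3: 1 confused with level 4
        [4] * 7 + [3] * 3)           # level 4: 3 confused with level 3
m = confusion_matrix(true, pred)
assert accuracy(true, pred) == 31 / 40   # 77.5%
assert m[0] == [8, 2, 0, 0]              # level-1 row
```

Off-diagonal mass concentrated near the diagonal, as in this invented example, would indicate the classifier confuses adjacent demand levels more than extreme ones.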
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion</head><p>The experimental two-phase model did not approach human-level performance, but showed both improvements over the baseline model and promise for future work. The SVM classifier, built on weakly supervised behavior labels, outperformed the baseline model's direct image classification. This result presents a case for weak supervision to be used when training data are identified, sorted, and labeled, which could draw upon the expertise of more teachers in future iterations of this work. We hypothesize that teachers' involvement during data labeling would improve the model due to their well-developed practices in interpreting students' behaviors in their day-to-day experiences.</p><p>Future developments in this study will increase the sample sizes and apply data augmentation to re-examine outcomes. Increasing the sample size will improve predictive stability in the binary models, particularly the Pointing class, which contained a smaller proportion of positive cases with respect to the others. Future data augmentations to be tested include varying the brightness in classroom photos, rotating classroom images, and including more classroom images with noisy features (for example, the presence of additional individuals in the image frame who are not the teacher). Although MobileNet V2 appeared to be a suitable classifier for binary class inputs, it was likely not the best choice for the baseline categorical model. Other neural networks, such as VGGNet, may have produced better transfer learning <ref type="bibr" target="#b11">[12]</ref>, and will be tested in future iterations of this work.</p><p>A key long-term goal of this project is to build toward a cognitive demand classification tool that can be used to support and empower teachers' professional learning.
By analyzing their students' variations in cognitive demand throughout a mathematical task, teachers may better understand the range and variation in students' enacted demand, and adjust their future instructional practices accordingly. Such a tool may be useful in teachers' video clubs [25], a form of professional development activity designed to hone teachers' noticing and inquiry of student behavior. By supplying teachers with a cognitive demand classifier, teachers may attend more frequently to student behavioral features that impact cognitive demand, and adjust their practices in response. We aim to test this theory in future iterations of this work.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Architecture of Trainable Neural Network Layers</figDesc><graphic coords="3,72.00,296.54,456.19,203.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: One example of error analysis in the Pointing Class's training data</figDesc><graphic coords="6,74.50,516.39,445.70,199.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Testing accuracy for cognitive demand classification with a support vector machine</figDesc><graphic coords="7,74.45,258.17,445.95,100.90" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Acknowledgements</head><p>We thank the Amir Lopatin Fellowship committee, which supplied funding to this project in support of its potential contributions to the learning sciences. This study emerged under the mentorship of Dr. Nick Haber and Dr. Ranjay Krishna during Stanford's Spring 2021 academic quarter. It was originally submitted as the project component of their CS 432 and CS 231n courses, respectively. We thank Gina Kinsey for her work in hand-labeling the original cognitive demand data and Jagriti Agrawal for her contributions during the initial conception and modeling in this study.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Analyzing the performance of multilayer neural networks for object recognition</title>
		<author>
			<persName><forename type="first">P</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Girshick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Malik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European conference on computer vision</title>
				<meeting><address><addrLine>, Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014-09">2014. September</date>
			<biblScope unit="page" from="329" to="344" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A computer-vision based application for student behavior monitoring in classroom</title>
		<author>
			<persName><forename type="first">Ngoc</forename><surname>Anh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Tung Son</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Truong Lam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Le Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Huu Tuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Cong Dat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename></persName>
		</author>
		<author>
			<persName><forename type="first">.</forename><forename type="middle">.</forename><surname>Van Dinh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">22</biblScope>
			<biblScope unit="page">4729</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Breast cancer detection and localization using MobileNet based transfer learning for mammograms</title>
		<author>
			<persName><forename type="first">W</forename><surname>Ansar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Shahid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Raza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Dar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International symposium on intelligent computing systems</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020-03">2020. March</date>
			<biblScope unit="page" from="11" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">From generic to specific deep representations for visual recognition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Azizpour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sharif Razavian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sullivan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Carlsson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition workshops</title>
				<meeting>the IEEE conference on computer vision and pattern recognition workshops</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="36" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Unsupervised feature learning and deep learning: A review and new perspectives</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vincent</surname></persName>
		</author>
		<idno>CoRR, abs/1206.5538</idno>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Are we done with ImageNet?</title>
		<author>
			<persName><forename type="first">L</forename><surname>Beyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">J</forename><surname>Hénaff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kolesnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Van Den Oord</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2006.07159</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Quantifying classroom instructor dynamics with computer vision</title>
		<author>
			<persName><forename type="first">N</forename><surname>Bosch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mills</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Wammes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Smilek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Artificial Intelligence in Education</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018-06">June 2018</date>
			<biblScope unit="page" from="30" to="42" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Assessing instructional quality in mathematics</title>
		<author>
			<persName><forename type="first">M</forename><surname>Boston</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Elementary School Journal</title>
		<imprint>
			<biblScope unit="volume">113</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="76" to="104" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Assessing Academic Rigor in Mathematics Instruction: The Development of the Instructional Quality Assessment Toolkit</title>
		<author>
			<persName><forename type="first">M</forename><surname>Boston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Wolf</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
		<respStmt>
			<orgName>National Center for Research on Evaluation, Standards, and Student Testing (CRESST)</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">CSE Technical Report 672</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Monitoring students&apos; attention in a classroom through computer vision</title>
		<author>
			<persName><forename type="first">D</forename><surname>Canedo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Trifan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Neves</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Practical Applications of Agents and Multi-Agent Systems</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018-06">June 2018</date>
			<biblScope unit="page" from="371" to="378" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Identification of plant disease images via a squeeze-and-excitation MobileNet model and twice transfer learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Suzauddola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">A</forename><surname>Nanehkaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IET Image Processing</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1115" to="1127" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Transfer learning with convolutional neural networks for classification of abdominal ultrasound images</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Malhi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Digital Imaging</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="234" to="243" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Academic work</title>
		<author>
			<persName><forename type="first">W</forename><surname>Doyle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Review of Educational Research</title>
		<imprint>
			<date type="published" when="1983">1983</date>
			<biblScope unit="volume">53</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="159" to="199" />
		</imprint>
	</monogr>
	<note>Common Core State Standards Initiative (2010). Common Core State Standards for Mathematics. <ptr target="http://www.corestandards.org/assets/CCSSI_Math%20Standards.pdf" /></note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Student engagement with others&apos; mathematical ideas: The role of teacher invitation and support moves</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Franke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Turrou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">M</forename><surname>Webb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fernandez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Elementary School Journal</title>
		<imprint>
			<biblScope unit="volume">116</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="126" to="148" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Spottune: transfer learning through adaptive fine-tuning</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Grauman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rosing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Feris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4805" to="4814" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">What makes ImageNet good for transfer learning?</title>
		<author>
			<persName><forename type="first">M</forename><surname>Huh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Efros</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1608.08614</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Learning visual features from large weakly supervised data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jabri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Vasilache</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Conference on Computer Vision</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016-10">October 2016</date>
			<biblScope unit="page" from="67" to="84" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Technology and mathematics education</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Kaput</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Handbook of research on mathematics teaching and learning</title>
		<imprint>
			<biblScope unit="page" from="515" to="556" />
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Auto-encoding variational bayes</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.6114</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Landis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">G</forename><surname>Koch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Biometrics</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="363" to="374" />
			<date type="published" when="1977">1977</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Talking and Working Together: Conditions for Learning in Complex Instruction</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Lotan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Support vector machines for classification in remote sensing</title>
		<author>
			<persName><forename type="first">M</forename><surname> ; Pal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Mather</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Remote Sensing</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1007" to="1011" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Do ImageNet classifiers generalize to ImageNet?</title>
		<author>
			<persName><forename type="first">B</forename><surname>Recht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Roelofs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Shankar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2019-05">May 2019</date>
			<biblScope unit="page" from="5389" to="5400" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Effects of video club participation on teachers&apos; professional vision</title>
		<author>
			<persName><forename type="first">M</forename><surname>Gamoran Sherin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Van Es</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Teacher Education</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="20" to="37" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Using deep learning to automatically detect talk moves in teachers&apos; mathematics lessons</title>
		<author>
			<persName><forename type="first">A</forename><surname>Suresh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sumner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jacobs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Foland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ward</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2018 IEEE International Conference on Big Data (Big Data)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018-12">December 2018</date>
			<biblScope unit="page" from="5445" to="5447" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Instructional tasks and the development of student capacity to think and reason: An analysis of the relationship between teaching and learning in a reform mathematics project</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lane</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Educational Research and Evaluation</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="50" to="80" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Learning to see teaching in new ways: A foundation for maintaining cognitive demand</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tekkumru Kisa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">American Educational Research Journal</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="105" to="136" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Incorporating learning analytics in the classroom</title>
		<author>
			<persName><forename type="first">C</forename><surname>Thille</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zimmaro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">New Directions for Higher Education</title>
		<imprint>
			<biblScope unit="issue">179</biblScope>
			<biblScope unit="page" from="19" to="31" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Fruit image classification based on Mobilenetv2 with transfer learning technique</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Xiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Hu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Conference on Computer Science and Application Engineering</title>
				<meeting>the 3rd International Conference on Computer Science and Application Engineering</meeting>
		<imprint>
			<date type="published" when="2019-10">October 2019</date>
			<biblScope unit="page" from="1" to="7" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">How transferable are features in deep neural networks?</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yosinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clune</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lipson</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1411.1792</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
