<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Improving Accuracy of Anomaly Detection in Spatial-Temporal Population Data through SHAP Values of Reconstruction Errors</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Ryo</forename><surname>Koyama</surname></persName>
							<email>ryou.koyama.aw@nttdocomo.com</email>
							<affiliation key="aff0">
								<orgName type="institution">NTT DOCOMO, INC</orgName>
								<address>
									<addrLine>Sanno Park Tower, 2-11-1 Nagatacho, Chiyoda-ku</addrLine>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tomohiro</forename><surname>Mimura</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">NTT DOCOMO, INC</orgName>
								<address>
									<addrLine>Sanno Park Tower, 2-11-1 Nagatacho, Chiyoda-ku</addrLine>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Shin</forename><surname>Ishiguro</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">NTT DOCOMO, INC</orgName>
								<address>
									<addrLine>Sanno Park Tower, 2-11-1 Nagatacho, Chiyoda-ku</addrLine>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Keisuke</forename><surname>Kiritoshi</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">NTT Communications Corporation</orgName>
								<address>
									<addrLine>Otemachi Place West Tower, 2-3-1 Otemachi, Chiyoda-ku</addrLine>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Takashi</forename><surname>Suzuki</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">NTT DOCOMO, INC</orgName>
								<address>
									<addrLine>Sanno Park Tower, 2-11-1 Nagatacho, Chiyoda-ku</addrLine>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Akira</forename><surname>Yamada</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">NTT DOCOMO, INC</orgName>
								<address>
									<addrLine>Sanno Park Tower, 2-11-1 Nagatacho, Chiyoda-ku</addrLine>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Improving Accuracy of Anomaly Detection in Spatial-Temporal Population Data through SHAP Values of Reconstruction Errors</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">23D2179DA70D36D5A9DB15467B20B9B8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:07+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Anomaly Detection</term>
					<term>Spatial-Temporal</term>
					<term>SHAP</term>
					<term>Dimensionality Reduction</term>
					<term>Reconstruction Error</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>When accidents, disasters, or other large-scale events occur, they significantly disrupt traffic, leading to congestion and reduced mobility. To effectively address these issues, it is crucial to precisely detect the underlying causes of these disruptions through the analysis of human mobility data. A common approach in anomaly detection is to employ dimensionality reduction techniques to compute reconstruction errors. However, the reconstruction errors generated by traditional methods are influenced by the correlations among features, which may obscure the true causes of anomalies. To overcome this limitation, we introduce an approach that calculates the SHAP (SHapley Additive exPlanations) values associated with the reconstruction errors resulting from dimensionality reduction. We conducted experiments using a dataset of human mobility patterns to evaluate the effectiveness of this method. The results demonstrate that our approach provides a more accurate explanation of anomalies compared to conventional methods.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Anomaly detection in spatial-temporal population data has gained significant attention in recent years due to its potential applications in urban planning, disaster management, and public safety. Accurate identification of unusual patterns in human mobility can help authorities respond more effectively to disruptive events such as accidents and disasters. However, traditional anomaly detection methods often face challenges in capturing the complex spatial and temporal dependencies in high-dimensional population data.</p><p>Previous studies have explored various approaches to anomaly detection in spatial-temporal data. Ochiai et al. <ref type="bibr" target="#b0">[1]</ref> proposed a method that utilizes mesh-based population data derived from mobile communication records to detect non-designated evacuation centers during disasters. Their approach trains a model only on data representing normal conditions and relies on the large reconstruction errors that arise in anomalous scenarios. While this method demonstrates potential, it may not effectively capture the underlying causes of anomalies.</p><p>On the other hand, Takeishi <ref type="bibr" target="#b1">[2]</ref> demonstrated the effectiveness of using Shapley values to explain the causes of anomalies in dimensionality reduction scenarios. By applying Shapley values to one-dimensional health data, such as myocardial infarction and breast cancer records, Takeishi's method provides a more interpretable understanding of anomaly detection results. However, the applicability of this approach to high-dimensional spatial-temporal data has not been explored.</p><p>Building upon the insights from Ochiai et al. and Takeishi, we propose a novel anomaly detection framework that combines the strengths of both approaches. Our method integrates SHAP (SHapley Additive exPlanations) values with dimensionality reduction to identify and explain anomalies in spatial-temporal population data. By leveraging the explanatory power of SHAP values, we aim to improve the accuracy and interpretability of anomaly detection results, while also extending the applicability of Takeishi's approach to high-dimensional data.</p><p>The main contributions of this paper are as follows:</p><p>• We introduce a novel anomaly detection method that integrates dimensionality reduction with SHAP values to identify anomalies in spatial-temporal population data.</p><p>• We evaluate the effectiveness of our approach using a real-world dataset of human mobility patterns in a major urban area, demonstrating its superiority over traditional reconstruction error-based methods.</p><p>• We extend the applicability of Takeishi's Shapley value-based approach to high-dimensional spatial-temporal data, enhancing its utility for real-world scenarios.</p><p>The remainder of this paper is organized as follows. Section 2 provides an overview of related work in anomaly detection and spatial-temporal data analysis. Section 3 describes our proposed methodology in detail. Section 4 defines the data and the evaluation problem. Section 5 presents the experiments and discusses their results. Finally, Section 6 concludes the paper and discusses future research directions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>The detection of anomalies in urban population flows has been extensively explored using diverse data sources, including road sensors <ref type="bibr" target="#b2">[3]</ref>, surveillance cameras <ref type="bibr" target="#b3">[4]</ref>, and social media data <ref type="bibr" target="#b4">[5]</ref>. While road sensors and surveillance cameras prove effective for identifying local abnormalities, their broader application for city-wide anomaly detection is hampered by high installation and maintenance costs. On the other hand, social media data facilitates multimodal anomaly detection methods, such as the integration of bike-sharing and taxi usage history <ref type="bibr" target="#b5">[6]</ref>, and the semantic interpretation of location-based anomalies identified through social media analytics <ref type="bibr" target="#b6">[7]</ref>.</p><p>This research leverages population data extracted from communication logs between mobile devices and base stations to enhance anomaly detection capabilities. In contrast to road sensors and surveillance cameras, mobile device data captures a wide array of individual behaviors throughout the entire city, thus offering a more comprehensive solution for detecting anomalies. For instance, Yabe et al. <ref type="bibr" target="#b7">[8]</ref> utilized statistical methods to detect the emergence of non-designated shelters during disasters, although their study lacked a quantitative assessment of accuracy. Similarly, Ochiai et al. used mobile phone-based population data to detect non-designated evacuation sites during disasters by focusing on reconstruction errors. However, as Takeishi has pointed out, these reconstruction errors are significantly influenced by feature interactions and may therefore not accurately pinpoint anomaly locations. To overcome this limitation, Takeishi introduced a method that employs Shapley values to elucidate the origins of anomalies within dimensionality reduction models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodologies</head><p>This section details our proposed anomaly detection methodology, SHAP Values of Reconstruction Error, which computes SHAP values of reconstruction errors. We begin by describing Principal Component Analysis (PCA) as the foundation for dimensionality reduction. Subsequently, we outline the traditional method based on reconstruction errors. Then, we introduce our enhanced approach that integrates SHAP values to improve the accuracy and effectiveness of anomaly detection. Finally, we explain the calculation of SHAP values for multi-dimensional objective variables, extending the methodology to more complex scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Principal Component Analysis</head><p>PCA is a widely used technique for reducing the dimensionality of high-dimensional data while preserving the most significant features. By projecting the data onto a lower-dimensional space, PCA identifies the principal components that capture the maximum variance in the data.</p><p>Given a dataset $\mathbf{X} \in \mathbb{R}^{n \times d}$ with $n$ samples and $d$ features, the PCA process involves four steps: standardizing (mean-centering) the data, computing the covariance matrix, performing an eigenvalue decomposition of that matrix, and selecting the top $p$ eigenvectors as the principal components.</p><p>In this study, we set the threshold for the variance to be retained at 90%. This means that we select the number of principal components $p$ such that the cumulative variance explained by these components is at least 90%. By retaining the principal components that explain the majority of the variance, PCA ensures that the most important features of the data are preserved, allowing for effective dimensionality reduction and subsequent analysis.</p></div>
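<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, the PCA procedure with a 90% variance threshold can be sketched in Python with NumPy. The function name fit_pca, the synthetic data, and the choice of numpy.linalg.eigh are assumptions made for this sketch; the source does not state which implementation was used.</p><p><code lang="python"><![CDATA[
```python
import numpy as np


def fit_pca(X, variance_to_retain=0.90):
    """Fit PCA via eigendecomposition and keep enough components to reach
    the requested cumulative explained variance."""
    n, d = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean                              # step 1: center each feature
    C = (Xc.T @ Xc) / n                        # step 2: C = (1/n) X^T X
    eigvals, eigvecs = np.linalg.eigh(C)       # step 3: eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort eigenpairs by descending eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    # step 4: smallest p whose cumulative explained variance reaches the threshold
    p = min(int(np.searchsorted(cumulative, variance_to_retain)) + 1, d)
    return mean, eigvecs[:, :p], cumulative


# Synthetic stand-in for a population matrix (288 instances, 10 features).
# These numbers are hypothetical and are not the data used in the paper.
rng = np.random.default_rng(0)
latent = rng.normal(size=(288, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(288, 10))

mean, W, cumulative = fit_pca(X)
p = W.shape[1]
print(p, round(float(cumulative[p - 1]), 3))
```
]]></code></p><p>The columns of W are the retained principal components, and cumulative[p - 1] is the fraction of variance they explain, which is at least 0.90 by construction.</p></div>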
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Reconstruction Error</head><p>Using the principal components obtained from PCA, we can perform dimensionality reduction and reconstruction. Consider a test data vector $\mathbf{y} \in \mathbb{R}^d$. The reduced representation $\mathbf{y}_{red} \in \mathbb{R}^p$ is obtained using the mapping function $\mathrm{reduce} : \mathbb{R}^d \to \mathbb{R}^p$, and the reconstructed vector $\hat{\mathbf{y}} \in \mathbb{R}^d$ is computed using the reconstruction function $\mathrm{reconstruct} : \mathbb{R}^p \to \mathbb{R}^d$ as follows:</p><formula xml:id="formula_0">\hat{\mathbf{y}} = \mathrm{reconstruct}(\mathrm{reduce}(\mathbf{y}))<label>(1)</label></formula><p>The reconstruction error $e_{\mathbf{y}}$, a measure of fidelity of reconstruction, is defined as the squared Euclidean distance between the original and reconstructed vectors:</p><formula xml:id="formula_1">e_{\mathbf{y}} = \|\mathbf{y} - \hat{\mathbf{y}}\|_2^2 = \sum_{i=1}^{d} (y_i - \hat{y}_i)^2<label>(2)</label></formula><p>This error metric helps identify significant deviations from normal patterns, which are potential indicators of anomalies.</p><p>Additionally, the reconstruction error for each feature $i$ is calculated as:</p><formula xml:id="formula_2">e_{y_i} = |y_i - \hat{y}_i|<label>(3)</label></formula><p>This feature-specific error is employed to identify which specific features are exhibiting anomalies.</p></div>
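<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equations (1) to (3) can be sketched as follows. This is a minimal illustration under stated assumptions: the PCA model is linear with an orthonormal component matrix W, and the three-feature toy model and the input vectors are hypothetical values chosen for this sketch, not data from the paper.</p><p><code lang="python"><![CDATA[
```python
import numpy as np


def reduce(y, mean, W):
    """Map y in R^d to its p-dimensional representation."""
    return (y - mean) @ W


def reconstruct(z, mean, W):
    """Map a p-dimensional representation back to R^d."""
    return z @ W.T + mean


def reconstruction_errors(y, mean, W):
    y_hat = reconstruct(reduce(y, mean, W), mean, W)   # Eq. (1)
    total = float(np.sum((y - y_hat) ** 2))            # Eq. (2)
    per_feature = np.abs(y - y_hat)                    # Eq. (3)
    return total, per_feature


# Toy model: features 0 and 1 are perfectly correlated, feature 2 is constant.
mean = np.zeros(3)
W = np.array([[1.0], [1.0], [0.0]]) / np.sqrt(2.0)    # one principal component

normal = np.array([1.0, 1.0, 0.0])
anomalous = np.array([3.0, 1.0, 0.0])                  # only feature 0 was altered

total_normal, _ = reconstruction_errors(normal, mean, W)
total, per_feature = reconstruction_errors(anomalous, mean, W)
print(total_normal, total, per_feature)
```
]]></code></p><p>In this toy case the altered feature 0 and the untouched feature 1 receive the same per-feature error (1.0 each), a small instance of how correlations among features can blur the per-feature reconstruction error.</p></div>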
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">SHAP Values of Reconstruction Error</head><p>Typically, the SHAP value $\varphi_i(x)$ for each feature $i$ is calculated using all features of the instance $x$ as explanatory variables and the prediction $\hat{y}$ as the objective variable as follows:</p><formula xml:id="formula_3">\varphi_i(x) = \mathrm{shap\_value}(i; \hat{y}, x)<label>(4)</label></formula><p>This formulation allows for measuring the impact of feature $i$ on the prediction $\hat{y}$, providing a concrete metric for understanding the significance of each feature in the model.</p><p>Similarly, the SHAP value of reconstruction error $\psi_i(x)$ is calculated using all features of the instance $x$ as explanatory variables and the reconstruction error $e_{\mathbf{y}}$ as the objective variable using the following function:</p><formula xml:id="formula_4">\psi_i(x) = \mathrm{shap\_value}(i; e_{\mathbf{y}}, x)<label>(5)</label></formula><p>This formulation measures the impact of feature $i$ on the reconstruction error $e_{\mathbf{y}}$, providing a concrete metric for evaluating the significance of each feature in anomaly detection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">SHAP Values of Reconstruction Error for Multidimensional Objective Variables</head><p>The SHAP value of reconstruction error $\psi_i(x)$ for feature $i$, when the objective variable is represented in one dimension, is calculated using the following equation <ref type="bibr" target="#b8">[9]</ref>:</p><formula xml:id="formula_5">\psi_i(x) = \sum_{S \subseteq \{1,\dots,d\} \setminus \{i\}} \frac{(d - |S| - 1)!\,|S|!}{d!} \left[ f(x_{S \cup \{i\}}) - f(x_S) \right]<label>(6)</label></formula><p>In this equation, $f(x_{S \cup \{i\}})$ denotes the model's predicted value when feature $i$ is added to the feature subset $S$, and $f(x_S)$ represents the predicted value when only the subset $S$ is used, without feature $i$. $|S|$ denotes the number of elements in the feature subset $S$, and $d$ is the total number of features.</p><p>Subsequently, the SHAP value of reconstruction error $\psi_i^{(k)}(x)$ for feature $i$ impacting the $k$-th dimension of the objective variable is calculated as the average difference between the model predictions with and without feature $i$, across all combinations of features, thus extending Equation 6 into the multidimensional context of SHAP Values of Reconstruction Error as follows:</p><formula xml:id="formula_6">\psi_i^{(k)}(x) = \frac{1}{K} \sum_{k=1}^{K} \left[ \sum_{S \subseteq \{1,\dots,d\} \setminus \{i\}} \frac{(d - |S| - 1)!\,|S|!}{d!} \left[ f^{(k)}(x_{S \cup \{i\}}) - f^{(k)}(x_S) \right] \right]<label>(7)</label></formula><p>This formulation allows for measuring the impact of feature $i$ across different combinations of features on each dimension of the reconstruction error. By doing so, it provides a concrete metric for elucidating the significance of feature $i$ in anomaly detection, offering detailed insights into the causes of anomalies in specific dimensions.</p></div>
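<div xmlns="http://www.tei-c.org/ns/1.0"><p>For a small number of features, Equations (6) and (7) can be evaluated exactly by enumerating all feature subsets, as in the hedged sketch below. Two points are assumptions of this sketch and are not specified in the source: features outside the subset $S$ are replaced by a baseline value (here the standardized training mean, zero), and $f^{(k)}$ is taken to be the per-feature reconstruction error of a toy one-component PCA model. Exact enumeration needs on the order of $2^d$ evaluations, so it is shown for $d = 3$ only; the source does not state how the values were computed for $d = 100$.</p><p><code lang="python"><![CDATA[
```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(f, x, baseline):
    """Exact Shapley values of a vector-valued function f at x.

    Returns an array of shape (d, K): entry [i, k] is Eq. (6) applied to
    the k-th output of f. Features outside S take their baseline value
    (an assumption of this sketch).
    """
    d = len(x)

    def value(S):
        z = baseline.copy()
        idx = np.array(S, dtype=int)
        z[idx] = x[idx]
        return f(z)

    phi = np.zeros((d, len(value(()))))
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            weight = factorial(d - size - 1) * factorial(size) / factorial(d)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


# Toy PCA model with one component; features 0 and 1 are perfectly correlated.
W = np.array([[1.0], [1.0], [0.0]]) / np.sqrt(2.0)


def per_feature_error(z):
    """f^(k)(z) = |z_k - z_hat_k| for k = 1..d, with a zero training mean."""
    return np.abs(z - (z @ W) @ W.T)


x = np.array([3.0, 1.0, 0.0])       # feature 0 carries the anomaly
baseline = np.zeros(3)              # standardized training mean (assumed)

phi = shapley_values(per_feature_error, x, baseline)
psi = phi.mean(axis=1)              # Eq. (7): average over the K output dimensions
print(np.round(psi, 3))
```
]]></code></p><p>In this toy setting psi is approximately (0.667, 0, 0): the attribution falls entirely on the altered feature 0, and the values sum to the difference between the mean error at x and at the baseline.</p></div>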
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Preliminaries</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Definition: Spatial-Temporal Population Data</head><p>This study utilizes Mobile Spatial Statistics (MSS) <ref type="bibr" target="#b9">[10]</ref>, representing population counts recorded across a two-dimensional geographic grid. Each record, denoted as $(g, t, pop_{g,t})$, indicates the population count $pop_{g,t}$ at grid $g$ and timestamp $t$.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Problem Statement</head><p>The goal of this study is to assess the accuracy of anomaly detection in spatial-temporal population data. We compare two methodologies: a traditional approach using reconstruction errors, and a novel approach using SHAP Values of Reconstruction Errors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Anomaly Insertion Methodology</head><p>During the test phase, artificial anomalies are introduced by altering population figures within selected grids. For each timestamp $t$, a grid $g$ is chosen randomly, and $pop_{g,t}$ is modified to the maximum or minimum value seen during the training period, defined as:</p><formula xml:id="formula_7">pop^{new}_{g,t} = \begin{cases} \max(pop_{g,t'} : t' \in T_{\mathrm{train}}), &amp; \text{if max anomaly} \\ \min(pop_{g,t'} : t' \in T_{\mathrm{train}}), &amp; \text{if min anomaly} \end{cases}</formula></div>
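<div xmlns="http://www.tei-c.org/ns/1.0"><p>The insertion rule above can be sketched as a small helper. The function name insert_anomaly, the random seed, and the synthetic arrays are hypothetical; only the rule itself (replace one randomly chosen grid's value with that grid's training maximum or minimum) comes from the text.</p><p><code lang="python"><![CDATA[
```python
import numpy as np


def insert_anomaly(test_row, train, kind, rng):
    """Replace one randomly chosen grid's value with its training max or min."""
    g = int(rng.integers(train.shape[1]))      # random grid index
    new_row = test_row.copy()
    if kind == "max":
        new_row[g] = train[:, g].max()
    elif kind == "min":
        new_row[g] = train[:, g].min()
    else:
        raise ValueError("kind must be 'max' or 'min'")
    return new_row, g


# Synthetic stand-ins: 288 training instances and one test instance, 100 grids.
rng = np.random.default_rng(0)
train = rng.normal(size=(288, 100))
test_row = rng.normal(size=100)

row_max, g_max = insert_anomaly(test_row, train, "max", rng)
row_min, g_min = insert_anomaly(test_row, train, "min", rng)
```
]]></code></p><p>The helper returns the index of the altered grid alongside the modified row, so the ground truth is available when ranks are evaluated.</p></div>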
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Evaluation Methodology</head><p>The efficacy of each detection method is quantified using the Hits@$k$ metric, which determines whether the true anomalous grid is among the top $k$ ranks based on anomaly scores. The scores for grid $g$ are calculated as defined in Equation <ref type="formula" target="#formula_2">3</ref> and Equation <ref type="formula" target="#formula_6">7</ref>:</p><formula>e_{y_g} = |y_g - \hat{y}_g|<label>(8)</label></formula><formula xml:id="formula_8">\psi_g^{(k)}(x) = \frac{1}{K} \sum_{k=1}^{K} \left[ \sum_{S \subseteq \{1,\dots,d\} \setminus \{g\}} \frac{(d - |S| - 1)!\,|S|!}{d!} \left[ f^{(k)}(x_{S \cup \{g\}}) - f^{(k)}(x_S) \right] \right]<label>(9)</label></formula><p>Rankings for each grid are obtained by comparing these scores against all others in the dataset.</p><p>Hits@k Calculations. Hits@$k$ is calculated separately for each method to assess how often the true anomaly falls within the top $k$ ranks of predicted anomalies. The total number of test instances, denoted as $N$, is used to normalize the calculations, ensuring that the results are proportional to the size of the test dataset. The calculations are as follows:</p><formula xml:id="formula_9">\mathrm{Hits@}k_{\mathrm{reconst}} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(\mathrm{rank}(e_{y_{g_i}}) \leq k\right)<label>(10)</label></formula><formula xml:id="formula_10">\mathrm{Hits@}k_{\mathrm{SHAP}} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(\mathrm{rank}(\psi_{g_i}^{(k)}(x)) \leq k\right)<label>(11)</label></formula><p>where $\mathbb{1}(\cdot)$ is the indicator function and $g_i$ is the true anomalous grid of the $i$-th test instance. These metrics facilitate a direct comparison of the methods' effectiveness in accurately detecting anomalies.</p></div>
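<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equations (10) and (11) share the same form and can be sketched with one function that takes a matrix of anomaly scores (one row per test instance, one column per grid). The score values below are made up for illustration, and the tie rule (tied grids share the better rank) is an assumption of this sketch, since the source does not say how ties are handled.</p><p><code lang="python"><![CDATA[
```python
import numpy as np


def rank_of_true_grid(scores, true_grid):
    """1-based rank of the true grid when grids are sorted by descending score."""
    return int(np.sum(scores > scores[true_grid])) + 1


def hits_at_k(score_matrix, true_grids, k):
    """Fraction of test instances whose true grid is ranked within the top k."""
    ranks = [rank_of_true_grid(s, g) for s, g in zip(score_matrix, true_grids)]
    return float(np.mean([r <= k for r in ranks]))


# Hypothetical scores for 3 test instances over 3 grids.
scores = np.array([
    [0.1, 0.9, 0.3],   # true grid 1 has the highest score: rank 1
    [0.5, 0.2, 0.4],   # true grid 2 has the second highest score: rank 2
    [0.3, 0.1, 0.8],   # true grid 1 has the lowest score: rank 3
])
true_grids = [1, 2, 1]

for k in (1, 2, 3):
    print(k, hits_at_k(scores, true_grids, k))
```
]]></code></p><p>The same function applies to either score type: pass per-grid reconstruction errors for the reconstruction error method, or per-grid SHAP values for the SHAP method.</p></div>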
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experiments</head><p>Figure <ref type="figure">1:</ref> The study area, covering 5 km × 5 km around Shibuya Station and divided into a grid with each cell measuring 500 m × 500 m.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Dataset</head><p>This study utilizes Mobile Spatial Statistics data generated from communication records between NTT DOCOMO's base stations and mobile devices. This data is divided into mesh units across Japan in accordance with the Regional Mesh standards provided by the Ministry of Internal Affairs and Communications Statistics Bureau <ref type="bibr" target="#b10">[11]</ref>. Population counts for each grid are estimated at 10-minute intervals, considering factors such as the number of devices accessing each base station, market share rates, residential areas, age, and gender. To ensure privacy, the data is prepared in accordance with guidelines published by NTT DOCOMO <ref type="bibr" target="#b11">[12]</ref>. The experimental area, as shown in Figure <ref type="figure">1</ref>, comprises 100 grid cells of 500 m × 500 m each, centered around Shibuya Station. The population data is treated as 100-dimensional feature vectors and standardized for each dimension.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Training Phase</head><p>The training data consists of population records with a 10-minute resolution from October 17 and October 24, 2022, totaling 288 instances (6×24×2). A PCA model was trained on these data, with the number of components set to retain 90% of the variance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Testing Phase</head><p>The test data consists of 144 population records (6×24) from October 31, 2022, which falls in the same month and on the same day of the week as the training data. Anomalies were inserted using the method described in Section 4.2.1. For each instance, one grid was randomly selected, and its population count was replaced with either the maximum or minimum population observed for that grid. A total of 288 tests (144 instances for each of the maximum and minimum anomaly types) were conducted to determine if the grid with the altered population could be accurately identified.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Experimental Results</head><p>This section presents the results of anomaly detection experiments conducted using both the traditional reconstruction error method and the proposed SHAP value method. The performance of each method is illustrated through selected examples at various timestamps, as detailed in Table <ref type="table" target="#tab_0">1</ref>.</p><p>The analysis shows varying levels of detection accuracy for each method, with lower ranking values indicating higher precision in anomaly detection. Specifically, at 19:10 on October 31, 2022, both methods accurately detected the anomaly in grid 5339-3588-4, achieving the best possible rank of 1. This instance demonstrates the effectiveness of both approaches in scenarios where there is a substantial change in the standardized population, from 1.378 to -1.257.</p><p>Conversely, at 10:50 on the same day, the anomaly in grid 5339-3574-1 was detected with lower accuracy, resulting in ranks of 9 and 8 for the reconstruction error and SHAP methods, respectively. This indicates a reduced effectiveness of both methods in detecting anomalies associated with smaller changes in population, from 1.429 to 2.187. Moreover, at 9:10, the SHAP method outperformed the reconstruction error method by more accurately identifying the anomaly in grid 5339-3584-1, with a rank of 2 compared to 6. This example illustrates the SHAP method's enhanced ability to detect subtle yet significant changes in population, from 1.536 to 1.682.</p><p>A comprehensive evaluation using the Hits@k metric, which assesses performance under scenarios of maximum and minimum population changes, is summarized in Table <ref type="table" target="#tab_1">2</ref>. Hits@k values range from 0 to 1, with higher values indicating more effective anomaly detection. For maximum population changes, the SHAP method demonstrates superior performance with Hits@k scores of 0.417 at $k = 1$ and 0.472 at $k = 3$, exceeding the scores of the reconstruction error method, which are 0.382 at $k = 1$ and 0.458 at $k = 3$. Similarly, in scenarios of minimum population changes, the SHAP method achieves better scores of 0.375 at $k = 1$ and 0.438 at $k = 3$, outperforming the reconstruction error method's scores of 0.340 at $k = 1$ and 0.410 at $k = 3$. These findings indicate that the SHAP method identifies anomalies more accurately than the traditional method in both scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>This paper evaluated the performance of established reconstruction error techniques and the SHAP value method for anomaly detection in spatial-temporal population datasets. The study highlighted the SHAP method's enhanced capability for precise anomaly identification, which is crucial for high-accuracy applications such as urban planning and emergency management. The experimental datasets were synthetically modified to include anomalies, offering a controlled setting to examine and contrast the performance of these methods. Future research aims to extend the application of these techniques to real-world data, particularly in scenarios impacted by events like accidents, disasters, or significant public gatherings.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Steps of the PCA process</head><figDesc>1. Standardize the Data: Subtract the mean of each feature from the dataset to center the data around the origin. 2. Compute the Covariance Matrix: Calculate the covariance matrix $\mathbf{C} = \frac{1}{n} \mathbf{X}^T \mathbf{X}$. 3. Perform Eigenvalue Decomposition: Decompose the covariance matrix into eigenvalues and eigenvectors: $\mathbf{C} = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^T$, where $\mathbf{\Lambda}$ is a diagonal matrix containing the eigenvalues, and $\mathbf{V}$ is a matrix whose columns are the corresponding eigenvectors. 4. Select Principal Components: Choose the top $p$ eigenvectors corresponding to the largest eigenvalues to form the principal components. These components maximize the variance retained in the lower-dimensional space.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Examples of Anomaly Detection Results for Randomly Sampled Grids: This table compares the detection rankings from reconstruction error and SHAP value methods across three timestamps. Ranks close to 1 indicate higher detection accuracy.</figDesc><table><row><cell>Timestamp $t$</cell><cell>Randomly Sampled Grid $g$</cell><cell>$pop_{g,t}$</cell><cell>$pop^{new}_{g,t}$</cell><cell>rank($e_{y_g}$)</cell><cell>rank($\psi_g^{(k)}(x)$)</cell></row><row><cell>2022/10/31 19:10</cell><cell>5339-3588-4</cell><cell>1.378</cell><cell>-1.257</cell><cell>1</cell><cell>1</cell></row><row><cell>2022/10/31 10:50</cell><cell>5339-3574-1</cell><cell>1.429</cell><cell>2.187</cell><cell>9</cell><cell>8</cell></row><row><cell>2022/10/31 9:10</cell><cell>5339-3584-1</cell><cell>1.536</cell><cell>1.682</cell><cell>6</cell><cell>2</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Comparison of Hits@k Metrics for Reconstruction Error and SHAP Methods Across All Test Instances: This table presents the performance of both anomaly detection methods under conditions of maximum and minimum population changes. Hits@k values range from 0 to 1, where values closer to 1 indicate higher accuracy in detecting anomalies.</figDesc><table><row><cell></cell><cell cols="2">MAX</cell><cell cols="2">MIN</cell></row><row><cell></cell><cell>$k = 1$</cell><cell>$k = 3$</cell><cell>$k = 1$</cell><cell>$k = 3$</cell></row><row><cell>$\mathrm{Hits@}k_{\mathrm{reconst}}$</cell><cell>0.382</cell><cell>0.458</cell><cell>0.340</cell><cell>0.410</cell></row><row><cell>$\mathrm{Hits@}k_{\mathrm{SHAP}}$</cell><cell>0.417</cell><cell>0.472</cell><cell>0.375</cell><cell>0.438</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Detection of non-designated shelters by extracting population concentrated areas after a disaster (industrial paper)</title>
		<author>
			<persName><forename type="first">K</forename><surname>Ochiai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Terada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hanashima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Usuda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 30th International Conference on Advances in Geographic Information Systems</title>
				<meeting>the 30th International Conference on Advances in Geographic Information Systems</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Shapley values of reconstruction errors of pca for explaining anomaly detection</title>
		<author>
			<persName><forename type="first">N</forename><surname>Takeishi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Data Mining Workshops</title>
				<meeting>the International Conference on Data Mining Workshops<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="793" to="798" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A genetic programming model for real-time crash prediction on freeways</title>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Transportation Systems</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="574" to="586" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Robust real-time unusual event detection using multiple fixed-location monitors</title>
		<author>
			<persName><forename type="first">A</forename><surname>Adam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rivlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Shimshoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Reinitz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="555" to="560" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Exploring extreme events on social media: A comparison of user reposting/retweeting behaviors on twitter and weibo</title>
		<author>
			<persName><forename type="first">X</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Kenneth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">R</forename><surname>Spence</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers in Human Behavior</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="page" from="576" to="581" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Detecting collective anomalies from multiple spatio-temporal datasets across different domains</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems</title>
				<meeting>the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Crowd sensing of traffic anomalies based on human mobility and social media</title>
		<author>
			<persName><forename type="first">B</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wilkie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Shahabi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems</title>
				<meeting>the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="344" to="353" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A framework for evacuation hotspot detection after large scale disasters using location data from smartphones: case study of Kumamoto earthquake</title>
		<author>
			<persName><forename type="first">T</forename><surname>Yabe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Tsubouchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sudo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sekimoto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th SIGSPATIAL International Conference on Advances in Geographic Information Systems</title>
				<meeting>the 24th SIGSPATIAL International Conference on Advances in Geographic Information Systems</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A unified approach to interpreting model predictions</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st International Conference on Neural Information Processing Systems</title>
				<meeting>the 31st International Conference on Neural Information Processing Systems</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="4768" to="4777" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Population estimation technology for mobile spatial statistics</title>
		<author>
			<persName><forename type="first">M</forename><surname>Terada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nagata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kobayashi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">NTT DOCOMO Technical Journal</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<ptr target="https://www.stat.go.jp/data/mesh/m_tuite.html" />
		<title level="m">Area mesh statistics</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
		<respStmt>
			<orgName>Statistics Bureau of Japan</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<ptr target="https://www.docomo.ne.jp/english/binary/pdf/service/world/inroaming/inroaming_service/Mobile_Kukan_Toukei_Guidelines.pdf" />
		<title level="m">Mobile spatial statistics guideline</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
		<respStmt>
			<orgName>NTT DOCOMO</orgName>
		</respStmt>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
