STL-DP: Differentially Private Time Series Exploring Decomposition and Compression Methods Kyunghee Kim1 , Minha Kim2 and Simon Woo3,∗ 1 Department of Statistics, Sungkyunkwan University, Seoul, Korea 2 Department of Artifical Intelligence, Sungkyunkwan University, Suwon, Korea 3 Department of Applied Data Science, Sungkyunkwan University, Suwon, Korea Abstract As time series data is collected and used in a variety of fields, the importance of preserving privacy on time series is also on the increase. This paper is a preliminary study of the Differential Privacy (DP) algorithm specially designed to provide privacy to time series data by integrating the time series decomposition technique. In particular, this study extends the Fourier Perturbation Algorithm (FPA) with Seasonal and Trend decomposition using LOESS (STL). In this work, we propose STL-DP, which first performs STL decomposition to the original data. Then we apply the FPA only to the core part of the time series, particularly trend or seasonal components, to provide privacy. In this preliminary study, we show that our approach consistently outperforms other baselines in terms of utility according to the experimental results. Our code is available at https://github.com/Privacy-DASH/STL-DP. Keywords Differential Privacy, Time Series, Fourier Perturbation Algorithm, STL Decomposition 1. Introduction the temporal correlations across time. In this paper, we propose a simple yet intuitive dif- Recently the need for providing data privacy has signifi- ferentially private (DP) mechanism, STL-DP, which can cantly increased, as the quantity of data is growing at an effectively mitigate the aforementioned problem. We as- unprecedented speed, and a trend to make such large data sume that time series data can be decomposed into trend accessible to the public is also growing. To share data and seasonal components, and such information can be and use them for multiple tasks, ensuring data privacy is considered sensitive by the data providers because they crucial. Therefore, many privacy protection techniques present the overall ups and downs and periodic patterns. have been proposed and researched, such as Differential Therefore, it is of great importance to maintain such Privacy (DP) [1], Homomorphic Encryption [2], and information private throughout the entire data mining Generative Adversarial Network (GAN) [3]. process [5]. We explore Seasonal and Trend decomposi- However, despite the vulnerability of time series data tion using LOESS (STL) [9] for decomposing time series due to their widespread application in various fields, which leverages Local regression (LOESS) [10]. Our pro- privacy-preserving mechanisms on time series data have posed mechanism STL-DP consists of two stages that not been extensively investigated yet [4]. In this paper, effectively hide the trend or seasonality of the time series we consider and propose a DP mechanism specially de- data. This mechanism enables us to improve the utility signed to protect the privacy on time series data. One while maintaining privacy by concealing the primary of the unique characteristics of time series is that it components of the time series. exhibits a strong correlation among successive values. The main contributions of our work are summarized Accordingly, if the adversary knows the approximate as follows: time information, information leakage can occur through contextual understanding, as shown by other research • We propose STL-DP that considers the unique works [5, 6]. However, existing perturbation methods characteristics of time series to provide the most such as Gaussian Perturbation Algorithm (GPA) [7], and suitable privacy protection method for time se- Laplace Perturbation Algorithm (LPA) [8] do not consider ries. • We show that STL-DP effectively protects the CIKM22-PAS: The 1st International Workshop on Privacy Algorithms core parts of the time series data under the same in Systems @ CIKM’22, Oct. 17-22, 2022, Atlanta, Georgia, USA ∗ Corresponding author. privacy budget, thereby significantly improving Envelope-Open kkh97122647@gmail.com (K. Kim); sunshine01@g.skku.edu utility over the existing methods. (M. Kim); swoo@g.skku.edu (S. Woo) Orcid 0000-0002-5129-1325 (K. Kim); 0000-0002-3224-0610 (M. Kim); 0000-0002-8983-1542 (S. Woo) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) Figure 1: The overview of our proposed STL-DP. 2. Preliminaries each individual 𝐷1 , 𝐷2 , … , 𝐷𝑈 to numbers. The DFT and IDFT for the 𝑗 𝑡ℎ element of the series is defined as (2): 2.1. 𝜖 - Differential Privacy 𝑛 2𝜋√−1 𝑗𝑖 Differential Privacy [1] ensures no significant change in 𝐷𝐹 𝑇 (𝑓 (𝐷))𝑗 = ∑ 𝑒 𝑛 𝑓 (𝐷)𝑖 , (2) the query response, whether a particular individual is in 𝑖=1 𝑛 a database or not [11]. 1 − 2𝜋√−1 𝑗𝑖 𝐼 𝐷𝐹 𝑇 (𝑓 (𝐷))𝑗 = ∑ 𝑒 𝑛 𝑓 (𝐷)𝑖 𝑛 𝑖=1 ′ Definition. There are two databases 𝐷, 𝐷 which sat- ′ isfy ||𝐷 − 𝐷 ||1 ≤ 1. D denotes composed data of 𝑈 indi- As compression methods convert the series from time vidual users, i.e., 𝐷 = ∪𝑈𝑖=1 𝐷𝑖 , and the data of any single to frequency domain, noises injected in the frequency user can be put as 𝐷𝑖 . Let us denote 𝑀 and 𝜖 as some domain are no longer independent but are correlated. randomized function and a privacy budget, respectively. For this reason, FPA is better suited for perturbing time 𝑀 guarantees 𝜖-privacy if and only if it satisfies the fol- series [13], and we extend the FPA-based method in our lowing Eq. (1): work. ′ 𝑃[𝑀(𝐷) ∈ 𝑆] ≤ 𝑒 𝜖 × 𝑃[𝑀(𝐷 ) ∈ 𝑆], ∀𝑆 ∈ 𝑅𝑎𝑛𝑔𝑒(𝑀) (1) 2.3. Seasonal and Trend decomposition using LOESS (STL) DP mechanism aims to keep the query response for ′ each 𝐷, 𝐷 the same, despite having one or fewer non- There are various time series decomposition methods overlapping individuals. Specifically, the smaller the such as classical decomposition [14], X11 [15], and 𝜖 is, the higher the privacy protection of the data be- STL [9]. The classical method is simple to implement comes [12]. but is inapplicable since some data from both ends of the sequence are lost. X11 successfully tackled the problem 2.2. DP Algorithms for Time Series of data loss but is still limited in use as it can only handle monthly or quarterly data. On the other hand, STL ef- Laplace Perturbation Algorithm (LPA). LPA [8] fectively handles the problems mentioned above. STL is adds independent noise generated from the Laplace dis- a flexible and robust time series decomposition method tribution [1]. LPA is renowned for its simplicity but it that leverages local regression (LOESS). is unsuitable for protecting time series because of its independent noise injection. 3. Our Approach Fourier Perturbation Algorithm (FPA). FPA is a We propose STL-DP to protect core information of the compression-based method that first applies the Dis- time series while improving utility within a predefined crete Fourier Transform (DFT) to the true query an- privacy budget. Refer to Figure 1 for a glance at our swers, then performs LPA to Fourier coefficients [8]. proposed STL-DP. The perturbed coefficients undergo the inverse DFT The main difference of STL-DP with the existing meth- (IDFT) to obtain the resulting perturbed sequence. The ods is the integration of STL decomposition. First, by entire process can be expressed as perturbed f(D) = incorporating the decomposition phase, we can identify 𝐼 𝐷𝐹 𝑇 (𝐿𝑃𝐴(𝐷𝐹 𝑇 (𝑓 (𝐷))), where 𝑓 is a function that maps the core components, such as the trend and seasonality of Table 1 Euclidean distance between the original and the perturbed time series. Algorithm Zone epsilon1 epsilon2 epsilon3 epsilon4 LPA Zone1 1.3418E+4 2.6102E+3 1.3052E+3 2.6694E+2 FPA 3.0351E+0 4.7537E+0 2.2658E+0 7.6124E-8 sFPA 9.6445E+0 1.2354E-7 1.4956E+0 2.3439E-8 tFPA 6.1856E-7 5.2730E-7 1.8135E+0 7.6125E-8 LPA Zone2 1.3004E+4 2.7051E+3 1.3226E+3 2.6037E+2 FPA 4.9405E+0 1.3781E+0 1.7282E+0 0.3781E+0 sFPA 1.6598E-7 1.9342E+0 6.4095E-7 4.4947E-8 tFPA 4.7581E-7 1.0069E-7 1.2109E+0 0.3782E+0 LPA Zone3 1.2928E+4 2.6534E+3 1.3374E+3 2.7253E+2 FPA 2.7932E-7 3.8466E+0 1.1964E-6 2.5183E-7 sFPA 1.3435E+1 2.4369E-7 1.6377E+0 0.2554E+0 tFPA 2.1689E+1 2.2007E+0 9.7750E-8 2.5183E-7 Table 2 Comparison of △MAPE for each mechanism; △MAPE = |MAPE (DP algorithm) - MAPE (Original Series)| epsilon1 epsilon2 epsilon3 epsilon4 epsilon1 epsilon2 epsilon3 epsilon4 LPA Linear 0.7476 0.0081 0.0125 0.0068 SimpleDNN 2.2842 1.6252 0.5990 0.2236 FPA 0.0013 0.0002 0.0002 0.0001 1.5972 0.3159 0.7612 0.1213 tFPA 0.0017 0.0005 0.0010 0.0000 0.0602 1.0390 1.0390 0.2152 sFPA 0.0094 0.0014 0.0001 0.0000 1.5158 1.7738 0.1083 0.2468 LPA RNN 1.4877 0.0194 0.0437 0.0084 Transformer 1.1464 0.5750 0.4180 0.2477 FPA 0.0169 0.0089 0.0004 0.0094 0.0738 0.1268 0.0860 0.0941 tFPA 0.0072 0.0081 0.0042 0.0004 0.2325 0.0776 0.2083 0.0271 sFPA 0.0117 0.0070 0.0007 0.0009 0.0864 0.4175 0.1468 0.0832 time series data, which may contain critical information mation, including temperature, humidity, wind and are prone to attacks. One of the STL-DP mecha- speed, general diffuse flows, and diffuse flows. nisms is referred to as sFPA, which is a method that injects • Data information : Aggregated from 550,374 in- noises only to the seasonal part of the decomposed series. habitants according to Morocco Census [16]. Similarly, the approach of performing perturbations only on the trend is named tFPA. Lastly, the perturbed compo- Throughout the experiment, we set the privacy budget nents from the seasonal or trend parts are combined with 𝜖1 − 𝜖4 as 0.48, 2.4, 4.8, and 24, respectively, and the the rest of the unperturbed components to reconstruct sensitivity as 48, which are experimental settings taken the form of the sequence. from Günther, et al. [17]. Euclidean distance results. The degree of closeness 4. Experimental Results between the original and the perturbed series under the same privacy budget can be interpreted as the level of Methods. We demonstrate the effectiveness of the pro- utility. As shown in Table 1, LPA yields a greater distance posed STL-DP by comparing the utility of our sFPA and than other methods, confirming that FPA-based methods tFPA with two baselines, LPA and FPA. Herein, we intro- are better than LPA. Furthermore, our tFPA and sFPA duce two different metrics to quantify utility. The first are consistently ranked as the best algorithm in terms of metric is the Euclidean distance between the original Euclidean distance. These results indicate the superiority and the perturbed series. Nextly, the original and the of STL-DP over the baselines. perturbed data are each fed into the forecasting model to evaluate the respective Mean Absolute Percentage Error Comparison on forecasting performances. Recall (MAPE), and the difference between the two MAPEs is that our objective is to generate noise-injected series that used as the second metric. minimize the performance drop of the forecasting model. As shown in Table 2, we used four models, from a simple Dataset. We used the power consumption data from feed-forward neural network to advanced models such 2017-01-01 to 2017-12-31 of three zones of Tetouan city as LSTM and Transformer as our forecasting model. The located in northern Morocco [16]. The properties of the models were trained to predict the upcoming 20 timesteps dataset are summarized as follows: given the past 60 timesteps. In most cases, as 𝜖 grew, the • Prediction variables : Power consumption of zone forecasting error difference (△MAPE) decreased. Not 1, 2, and 3 of Tetouan city with additional infor- surprisingly, both sFPA and tFPA outperformed other DP Generative adversarial nets, Advances in neural mechanisms for a majority of models in terms of △MAPE. information processing systems 27 (2014). [4] S. Papadimitriou, F. Li, G. Kollios, P. S. Yu, Time series compressibility and privacy, in: Proceedings 5. Conclusion and Future Work of the 33rd international conference on Very large data bases, Citeseer, 2007, pp. 459–470. As preliminary research, we introduce an effective DP [5] Y. Zhu, Y. Fu, H. Fu, On privacy in time series mechanism, STL-DP, specially designed for generating data mining, in: Pacific-Asia Conference on Knowl- privacy-protected time series data. As the experiment edge Discovery and Data Mining, Springer, 2008, results suggested, the distance between the original se- pp. 479–493. ries and the perturbed series from sFPA and tFPA was [6] Y. Zhu, Y. Fu, H. Fu, A new class of attacks on time much closer than the other baselines. Also, the difference series data mining, Intelligent Data Analysis 14 between the MAPE of the original and the perturbed se- (2010) 405–418. ries was significantly lower for our proposed sFPA and [7] N. U. Sheikh, H. J. Asghar, F. Farokhi, M. A. Kaafar, tFPA than for other perturbation algorithms. Therefore, Do auto-regressive models protect privacy inferring we showed that considering the unique property of time fine-grained energy consumption from aggregated series data improves the utility under the same privacy model parameters, IEEE Transactions on Services budget. For future works, we plan to extend our research Computing (2021). by designing a more advanced mechanism 𝐹 𝑃𝐴𝑘 , that [8] V. Rastogi, S. Nath, Differentially private aggrega- uses only the 𝑘 (<𝑛) Fourier coefficients as targets of the tion of distributed time-series with transformation perturbation. and encryption, in: Proceedings of the 2010 ACM SIGMOD International Conference on Management Acknowledgments of data, 2010, pp. 735–746. [9] R. B. Cleveland, W. S. Cleveland, J. E. McRae, I. Ter- The work was supported by the affiliated institute of penning, Stl: A seasonal-trend decomposition, J. ETRI [2022-075]. Also, this work was partially supported Off. Stat 6 (1990) 3–73. by the Basic Science Research Program through National [10] W. G. Jacoby, Loess:: a nonparametric, graphical Research Foundation of Korea (NRF) grant funded by tool for depicting relationships between variables, the Korean Ministry of Science and ICT (MSIT) under Electoral studies 19 (2000) 577–613. No. 2020R1C1C1006004 and Institute for Information [11] W. Huang, S. Zhou, T. Zhu, Y. Liao, Improving & communication Technology Planning & evaluation utility of differentially private mechanisms through (IITP) grants funded by the Korean MSIT: (No. 2022- cryptography-based technologies: a survey, arXiv 0-01199, Graduate School of Convergence Security at preprint arXiv:2011.00976 (2020). Sungkyunkwan University), (No. 2022-0-01045, Self- [12] J. Wang, S. Liu, Y. Li, A review of differential privacy directed Multi-Modal Intelligence for solving unknown, in individual data release, International Journal of open domain problems), (No. 2022-0-00688, AI Platform Distributed Sensor Networks 11 (2015) 259682. to Fully Adapt and Reflect Privacy-Policy Changes), (No. [13] H. Wang, Z. Xu, Cts-dp: publishing correlated 2021-0-02068, Artificial Intelligence Innovation Hub), (No. time-series data via differential privacy, Knowledge- 2019-0-00421, AI Graduate School Support Program at Based Systems 122 (2017) 167–179. Sungkyunkwan University), and (No. 2021-0-02309, Ob- [14] J. Cohen, W. Gorr, C. Durso, Estimation of crime sea- ject Detection Research under Low Quality Video Condi- sonality: a cross-sectional extension to time series tion). classical decomposition, H. John Heinz III Working Paper (2003). [15] A. Sutcliffe, X11 time series decomposition and sam- References pling errors, Australian Bureau of Statistics, 1993. [16] A. Salam, A. El Hibaoui, Comparison of machine [1] C. Dwork, F. McSherry, K. Nissim, A. Smith, Cali- learning algorithms for the power consumption brating noise to sensitivity in private data analysis, prediction:-case study of tetouan city–, in: 2018 6th in: Theory of cryptography conference, Springer, International Renewable and Sustainable Energy 2006, pp. 265–284. Conference (IRSEC), IEEE, 2018, pp. 1–5. [2] R. L. Rivest, L. Adleman, M. L. Dertouzos, et al., On [17] G. Eibl, K. Bao, P.-W. Grassal, D. Bernau, data banks and privacy homomorphisms, Founda- H. Schmeck, The influence of differential privacy tions of secure computation 4 (1978) 169–180. on short term electric load forecasting, Energy [3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, Informatics 1 (2018) 93–113. D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio,