=Paper= {{Paper |id=Vol-3318/short5 |storemode=property |title=STL-DP: Differentially Private Time Series Exploring Decomposition and Compression Methods |pdfUrl=https://ceur-ws.org/Vol-3318/short5.pdf |volume=Vol-3318 |authors=Kyunghee Kim,Minha Kim,Simon Woo3 |dblpUrl=https://dblp.org/rec/conf/cikm/KimKW22 }} ==STL-DP: Differentially Private Time Series Exploring Decomposition and Compression Methods== https://ceur-ws.org/Vol-3318/short5.pdf

STL-DP: Differentially Private Time Series Exploring
Decomposition and Compression Methods
Kyunghee Kim1 , Minha Kim2 and Simon Woo3,∗
1
Department of Statistics, Sungkyunkwan University, Seoul, Korea
2
Department of Artifical Intelligence, Sungkyunkwan University, Suwon, Korea
3
Department of Applied Data Science, Sungkyunkwan University, Suwon, Korea

Abstract
As time series data is collected and used in a variety of fields, the importance of preserving privacy on time series is also on the
increase. This paper is a preliminary study of the Differential Privacy (DP) algorithm specially designed to provide privacy
to time series data by integrating the time series decomposition technique. In particular, this study extends the Fourier
Perturbation Algorithm (FPA) with Seasonal and Trend decomposition using LOESS (STL). In this work, we propose STL-DP,
which first performs STL decomposition to the original data. Then we apply the FPA only to the core part of the time
series, particularly trend or seasonal components, to provide privacy. In this preliminary study, we show that our approach
consistently outperforms other baselines in terms of utility according to the experimental results. Our code is available at
https://github.com/Privacy-DASH/STL-DP.

Keywords
Differential Privacy, Time Series, Fourier Perturbation Algorithm, STL Decomposition

1. Introduction the temporal correlations across time.
In this paper, we propose a simple yet intuitive dif-
Recently the need for providing data privacy has signifi- ferentially private (DP) mechanism, STL-DP, which can
cantly increased, as the quantity of data is growing at an effectively mitigate the aforementioned problem. We as-
unprecedented speed, and a trend to make such large data sume that time series data can be decomposed into trend
accessible to the public is also growing. To share data and seasonal components, and such information can be
and use them for multiple tasks, ensuring data privacy is considered sensitive by the data providers because they
crucial. Therefore, many privacy protection techniques present the overall ups and downs and periodic patterns.
have been proposed and researched, such as Differential Therefore, it is of great importance to maintain such
Privacy (DP) [1], Homomorphic Encryption [2], and information private throughout the entire data mining
Generative Adversarial Network (GAN) [3]. process [5]. We explore Seasonal and Trend decomposi-
However, despite the vulnerability of time series data tion using LOESS (STL) [9] for decomposing time series
due to their widespread application in various fields, which leverages Local regression (LOESS) [10]. Our pro-
privacy-preserving mechanisms on time series data have posed mechanism STL-DP consists of two stages that
not been extensively investigated yet [4]. In this paper, effectively hide the trend or seasonality of the time series
we consider and propose a DP mechanism specially de- data. This mechanism enables us to improve the utility
signed to protect the privacy on time series data. One while maintaining privacy by concealing the primary
of the unique characteristics of time series is that it components of the time series.
exhibits a strong correlation among successive values. The main contributions of our work are summarized
Accordingly, if the adversary knows the approximate as follows:
time information, information leakage can occur through
contextual understanding, as shown by other research • We propose STL-DP that considers the unique
works [5, 6]. However, existing perturbation methods characteristics of time series to provide the most
such as Gaussian Perturbation Algorithm (GPA) [7], and suitable privacy protection method for time se-
Laplace Perturbation Algorithm (LPA) [8] do not consider ries.
• We show that STL-DP effectively protects the
CIKM22-PAS: The 1st International Workshop on Privacy Algorithms
core parts of the time series data under the same
in Systems @ CIKM’22, Oct. 17-22, 2022, Atlanta, Georgia, USA
∗
Corresponding author. privacy budget, thereby significantly improving
Envelope-Open kkh97122647@gmail.com (K. Kim); sunshine01@g.skku.edu utility over the existing methods.
(M. Kim); swoo@g.skku.edu (S. Woo)
Orcid 0000-0002-5129-1325 (K. Kim); 0000-0002-3224-0610 (M. Kim);
0000-0002-8983-1542 (S. Woo)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR
Workshop
Proceedings
http://ceur-ws.org
ISSN 1613-0073
CEUR Workshop Proceedings (CEUR-WS.org)
Figure 1: The overview of our proposed STL-DP.

2. Preliminaries each individual 𝐷1 , 𝐷2 , … , 𝐷𝑈 to numbers. The DFT and
IDFT for the 𝑗 𝑡ℎ element of the series is defined as (2):
2.1. 𝜖 - Differential Privacy 𝑛 2𝜋√−1
𝑗𝑖
Differential Privacy [1] ensures no significant change in 𝐷𝐹 𝑇 (𝑓 (𝐷))𝑗 = ∑ 𝑒 𝑛 𝑓 (𝐷)𝑖 , (2)
the query response, whether a particular individual is in 𝑖=1
𝑛
a database or not [11]. 1 − 2𝜋√−1 𝑗𝑖
𝐼 𝐷𝐹 𝑇 (𝑓 (𝐷))𝑗 = ∑ 𝑒 𝑛 𝑓 (𝐷)𝑖
𝑛 𝑖=1
′
Definition. There are two databases 𝐷, 𝐷 which sat-
′
isfy ||𝐷 − 𝐷 ||1 ≤ 1. D denotes composed data of 𝑈 indi- As compression methods convert the series from time
vidual users, i.e., 𝐷 = ∪𝑈𝑖=1 𝐷𝑖 , and the data of any single to frequency domain, noises injected in the frequency
user can be put as 𝐷𝑖 . Let us denote 𝑀 and 𝜖 as some domain are no longer independent but are correlated.
randomized function and a privacy budget, respectively. For this reason, FPA is better suited for perturbing time
𝑀 guarantees 𝜖-privacy if and only if it satisfies the fol- series [13], and we extend the FPA-based method in our
lowing Eq. (1): work.

′
𝑃[𝑀(𝐷) ∈ 𝑆] ≤ 𝑒 𝜖 × 𝑃[𝑀(𝐷 ) ∈ 𝑆], ∀𝑆 ∈ 𝑅𝑎𝑛𝑔𝑒(𝑀) (1) 2.3. Seasonal and Trend decomposition
using LOESS (STL)
DP mechanism aims to keep the query response for
′
each 𝐷, 𝐷 the same, despite having one or fewer non- There are various time series decomposition methods
overlapping individuals. Specifically, the smaller the such as classical decomposition [14], X11 [15], and
𝜖 is, the higher the privacy protection of the data be- STL [9]. The classical method is simple to implement
comes [12]. but is inapplicable since some data from both ends of the
sequence are lost. X11 successfully tackled the problem
2.2. DP Algorithms for Time Series of data loss but is still limited in use as it can only handle
monthly or quarterly data. On the other hand, STL ef-
Laplace Perturbation Algorithm (LPA). LPA [8] fectively handles the problems mentioned above. STL is
adds independent noise generated from the Laplace dis- a flexible and robust time series decomposition method
tribution [1]. LPA is renowned for its simplicity but it that leverages local regression (LOESS).
is unsuitable for protecting time series because of its
independent noise injection.
3. Our Approach
Fourier Perturbation Algorithm (FPA). FPA is a
We propose STL-DP to protect core information of the
compression-based method that first applies the Dis-
time series while improving utility within a predefined
crete Fourier Transform (DFT) to the true query an-
privacy budget. Refer to Figure 1 for a glance at our
swers, then performs LPA to Fourier coefficients [8].
proposed STL-DP.
The perturbed coefficients undergo the inverse DFT
The main difference of STL-DP with the existing meth-
(IDFT) to obtain the resulting perturbed sequence. The
ods is the integration of STL decomposition. First, by
entire process can be expressed as perturbed f(D) =
incorporating the decomposition phase, we can identify
𝐼 𝐷𝐹 𝑇 (𝐿𝑃𝐴(𝐷𝐹 𝑇 (𝑓 (𝐷))), where 𝑓 is a function that maps
the core components, such as the trend and seasonality of
Table 1
Euclidean distance between the original and the perturbed time series.
Algorithm Zone epsilon1 epsilon2 epsilon3 epsilon4
LPA Zone1 1.3418E+4 2.6102E+3 1.3052E+3 2.6694E+2
FPA 3.0351E+0 4.7537E+0 2.2658E+0 7.6124E-8
sFPA 9.6445E+0 1.2354E-7 1.4956E+0 2.3439E-8
tFPA 6.1856E-7 5.2730E-7 1.8135E+0 7.6125E-8
LPA Zone2 1.3004E+4 2.7051E+3 1.3226E+3 2.6037E+2
FPA 4.9405E+0 1.3781E+0 1.7282E+0 0.3781E+0
sFPA 1.6598E-7 1.9342E+0 6.4095E-7 4.4947E-8
tFPA 4.7581E-7 1.0069E-7 1.2109E+0 0.3782E+0
LPA Zone3 1.2928E+4 2.6534E+3 1.3374E+3 2.7253E+2
FPA 2.7932E-7 3.8466E+0 1.1964E-6 2.5183E-7
sFPA 1.3435E+1 2.4369E-7 1.6377E+0 0.2554E+0
tFPA 2.1689E+1 2.2007E+0 9.7750E-8 2.5183E-7

Table 2
Comparison of △MAPE for each mechanism; △MAPE = |MAPE (DP algorithm) - MAPE (Original Series)|
epsilon1 epsilon2 epsilon3 epsilon4 epsilon1 epsilon2 epsilon3 epsilon4
LPA Linear 0.7476 0.0081 0.0125 0.0068 SimpleDNN 2.2842 1.6252 0.5990 0.2236
FPA 0.0013 0.0002 0.0002 0.0001 1.5972 0.3159 0.7612 0.1213
tFPA 0.0017 0.0005 0.0010 0.0000 0.0602 1.0390 1.0390 0.2152
sFPA 0.0094 0.0014 0.0001 0.0000 1.5158 1.7738 0.1083 0.2468
LPA RNN 1.4877 0.0194 0.0437 0.0084 Transformer 1.1464 0.5750 0.4180 0.2477
FPA 0.0169 0.0089 0.0004 0.0094 0.0738 0.1268 0.0860 0.0941
tFPA 0.0072 0.0081 0.0042 0.0004 0.2325 0.0776 0.2083 0.0271
sFPA 0.0117 0.0070 0.0007 0.0009 0.0864 0.4175 0.1468 0.0832

time series data, which may contain critical information mation, including temperature, humidity, wind
and are prone to attacks. One of the STL-DP mecha- speed, general diffuse flows, and diffuse flows.
nisms is referred to as sFPA, which is a method that injects • Data information : Aggregated from 550,374 in-
noises only to the seasonal part of the decomposed series. habitants according to Morocco Census [16].
Similarly, the approach of performing perturbations only
on the trend is named tFPA. Lastly, the perturbed compo- Throughout the experiment, we set the privacy budget
nents from the seasonal or trend parts are combined with 𝜖1 − 𝜖4 as 0.48, 2.4, 4.8, and 24, respectively, and the
the rest of the unperturbed components to reconstruct sensitivity as 48, which are experimental settings taken
the form of the sequence. from Günther, et al. [17].

Euclidean distance results. The degree of closeness
4. Experimental Results between the original and the perturbed series under the
same privacy budget can be interpreted as the level of
Methods. We demonstrate the effectiveness of the pro-
utility. As shown in Table 1, LPA yields a greater distance
posed STL-DP by comparing the utility of our sFPA and
than other methods, confirming that FPA-based methods
tFPA with two baselines, LPA and FPA. Herein, we intro-
are better than LPA. Furthermore, our tFPA and sFPA
duce two different metrics to quantify utility. The first
are consistently ranked as the best algorithm in terms of
metric is the Euclidean distance between the original
Euclidean distance. These results indicate the superiority
and the perturbed series. Nextly, the original and the
of STL-DP over the baselines.
perturbed data are each fed into the forecasting model to
evaluate the respective Mean Absolute Percentage Error
Comparison on forecasting performances. Recall
(MAPE), and the difference between the two MAPEs is
that our objective is to generate noise-injected series that
used as the second metric.
minimize the performance drop of the forecasting model.
As shown in Table 2, we used four models, from a simple
Dataset. We used the power consumption data from
feed-forward neural network to advanced models such
2017-01-01 to 2017-12-31 of three zones of Tetouan city
as LSTM and Transformer as our forecasting model. The
located in northern Morocco [16]. The properties of the
models were trained to predict the upcoming 20 timesteps
dataset are summarized as follows:
given the past 60 timesteps. In most cases, as 𝜖 grew, the
• Prediction variables : Power consumption of zone forecasting error difference (△MAPE) decreased. Not
1, 2, and 3 of Tetouan city with additional infor-
surprisingly, both sFPA and tFPA outperformed other DP Generative adversarial nets, Advances in neural
mechanisms for a majority of models in terms of △MAPE. information processing systems 27 (2014).
[4] S. Papadimitriou, F. Li, G. Kollios, P. S. Yu, Time
series compressibility and privacy, in: Proceedings
5. Conclusion and Future Work of the 33rd international conference on Very large
data bases, Citeseer, 2007, pp. 459–470.
As preliminary research, we introduce an effective DP
[5] Y. Zhu, Y. Fu, H. Fu, On privacy in time series
mechanism, STL-DP, specially designed for generating
data mining, in: Pacific-Asia Conference on Knowl-
privacy-protected time series data. As the experiment
edge Discovery and Data Mining, Springer, 2008,
results suggested, the distance between the original se-
pp. 479–493.
ries and the perturbed series from sFPA and tFPA was
[6] Y. Zhu, Y. Fu, H. Fu, A new class of attacks on time
much closer than the other baselines. Also, the difference
series data mining, Intelligent Data Analysis 14
between the MAPE of the original and the perturbed se-
(2010) 405–418.
ries was significantly lower for our proposed sFPA and
[7] N. U. Sheikh, H. J. Asghar, F. Farokhi, M. A. Kaafar,
tFPA than for other perturbation algorithms. Therefore,
Do auto-regressive models protect privacy inferring
we showed that considering the unique property of time
fine-grained energy consumption from aggregated
series data improves the utility under the same privacy
model parameters, IEEE Transactions on Services
budget. For future works, we plan to extend our research
Computing (2021).
by designing a more advanced mechanism 𝐹 𝑃𝐴𝑘 , that
[8] V. Rastogi, S. Nath, Differentially private aggrega-
uses only the 𝑘 (<𝑛) Fourier coefficients as targets of the
tion of distributed time-series with transformation
perturbation.
and encryption, in: Proceedings of the 2010 ACM
SIGMOD International Conference on Management
Acknowledgments of data, 2010, pp. 735–746.
[9] R. B. Cleveland, W. S. Cleveland, J. E. McRae, I. Ter-
The work was supported by the affiliated institute of penning, Stl: A seasonal-trend decomposition, J.
ETRI [2022-075]. Also, this work was partially supported Off. Stat 6 (1990) 3–73.
by the Basic Science Research Program through National [10] W. G. Jacoby, Loess:: a nonparametric, graphical
Research Foundation of Korea (NRF) grant funded by tool for depicting relationships between variables,
the Korean Ministry of Science and ICT (MSIT) under Electoral studies 19 (2000) 577–613.
No. 2020R1C1C1006004 and Institute for Information [11] W. Huang, S. Zhou, T. Zhu, Y. Liao, Improving
& communication Technology Planning & evaluation utility of differentially private mechanisms through
(IITP) grants funded by the Korean MSIT: (No. 2022- cryptography-based technologies: a survey, arXiv
0-01199, Graduate School of Convergence Security at preprint arXiv:2011.00976 (2020).
Sungkyunkwan University), (No. 2022-0-01045, Self- [12] J. Wang, S. Liu, Y. Li, A review of differential privacy
directed Multi-Modal Intelligence for solving unknown, in individual data release, International Journal of
open domain problems), (No. 2022-0-00688, AI Platform Distributed Sensor Networks 11 (2015) 259682.
to Fully Adapt and Reflect Privacy-Policy Changes), (No. [13] H. Wang, Z. Xu, Cts-dp: publishing correlated
2021-0-02068, Artificial Intelligence Innovation Hub), (No. time-series data via differential privacy, Knowledge-
2019-0-00421, AI Graduate School Support Program at Based Systems 122 (2017) 167–179.
Sungkyunkwan University), and (No. 2021-0-02309, Ob- [14] J. Cohen, W. Gorr, C. Durso, Estimation of crime sea-
ject Detection Research under Low Quality Video Condi- sonality: a cross-sectional extension to time series
tion). classical decomposition, H. John Heinz III Working
Paper (2003).
[15] A. Sutcliffe, X11 time series decomposition and sam-
References pling errors, Australian Bureau of Statistics, 1993.
[16] A. Salam, A. El Hibaoui, Comparison of machine
[1] C. Dwork, F. McSherry, K. Nissim, A. Smith, Cali-
learning algorithms for the power consumption
brating noise to sensitivity in private data analysis,
prediction:-case study of tetouan city–, in: 2018 6th
in: Theory of cryptography conference, Springer,
International Renewable and Sustainable Energy
2006, pp. 265–284.
Conference (IRSEC), IEEE, 2018, pp. 1–5.
[2] R. L. Rivest, L. Adleman, M. L. Dertouzos, et al., On
[17] G. Eibl, K. Bao, P.-W. Grassal, D. Bernau,
data banks and privacy homomorphisms, Founda-
H. Schmeck, The influence of differential privacy
tions of secure computation 4 (1978) 169–180.
on short term electric load forecasting, Energy
[3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu,
Informatics 1 (2018) 93–113.
D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio,