=Paper= {{Paper |id=Vol-3318/short5 |storemode=property |title=STL-DP: Differentially Private Time Series Exploring Decomposition and Compression Methods |pdfUrl=https://ceur-ws.org/Vol-3318/short5.pdf |volume=Vol-3318 |authors=Kyunghee Kim,Minha Kim,Simon Woo3 |dblpUrl=https://dblp.org/rec/conf/cikm/KimKW22 }} ==STL-DP: Differentially Private Time Series Exploring Decomposition and Compression Methods== https://ceur-ws.org/Vol-3318/short5.pdf
STL-DP: Differentially Private Time Series Exploring
Decomposition and Compression Methods
Kyunghee Kim1 , Minha Kim2 and Simon Woo3,∗
1
  Department of Statistics, Sungkyunkwan University, Seoul, Korea
2
  Department of Artifical Intelligence, Sungkyunkwan University, Suwon, Korea
3
  Department of Applied Data Science, Sungkyunkwan University, Suwon, Korea


                                             Abstract
                                             As time series data is collected and used in a variety of fields, the importance of preserving privacy on time series is also on the
                                             increase. This paper is a preliminary study of the Differential Privacy (DP) algorithm specially designed to provide privacy
                                             to time series data by integrating the time series decomposition technique. In particular, this study extends the Fourier
                                             Perturbation Algorithm (FPA) with Seasonal and Trend decomposition using LOESS (STL). In this work, we propose STL-DP,
                                             which first performs STL decomposition to the original data. Then we apply the FPA only to the core part of the time
                                             series, particularly trend or seasonal components, to provide privacy. In this preliminary study, we show that our approach
                                             consistently outperforms other baselines in terms of utility according to the experimental results. Our code is available at
                                             https://github.com/Privacy-DASH/STL-DP.

                                             Keywords
                                             Differential Privacy, Time Series, Fourier Perturbation Algorithm, STL Decomposition



1. Introduction                                                                                                                       the temporal correlations across time.
                                                                                                                                         In this paper, we propose a simple yet intuitive dif-
Recently the need for providing data privacy has signifi-                                                                             ferentially private (DP) mechanism, STL-DP, which can
cantly increased, as the quantity of data is growing at an                                                                            effectively mitigate the aforementioned problem. We as-
unprecedented speed, and a trend to make such large data                                                                              sume that time series data can be decomposed into trend
accessible to the public is also growing. To share data                                                                               and seasonal components, and such information can be
and use them for multiple tasks, ensuring data privacy is                                                                             considered sensitive by the data providers because they
crucial. Therefore, many privacy protection techniques                                                                                present the overall ups and downs and periodic patterns.
have been proposed and researched, such as Differential                                                                               Therefore, it is of great importance to maintain such
Privacy (DP) [1], Homomorphic Encryption [2], and                                                                                     information private throughout the entire data mining
Generative Adversarial Network (GAN) [3].                                                                                             process [5]. We explore Seasonal and Trend decomposi-
   However, despite the vulnerability of time series data                                                                             tion using LOESS (STL) [9] for decomposing time series
due to their widespread application in various fields,                                                                                which leverages Local regression (LOESS) [10]. Our pro-
privacy-preserving mechanisms on time series data have                                                                                posed mechanism STL-DP consists of two stages that
not been extensively investigated yet [4]. In this paper,                                                                             effectively hide the trend or seasonality of the time series
we consider and propose a DP mechanism specially de-                                                                                  data. This mechanism enables us to improve the utility
signed to protect the privacy on time series data. One                                                                                while maintaining privacy by concealing the primary
of the unique characteristics of time series is that it                                                                               components of the time series.
exhibits a strong correlation among successive values.                                                                                   The main contributions of our work are summarized
Accordingly, if the adversary knows the approximate                                                                                   as follows:
time information, information leakage can occur through
contextual understanding, as shown by other research                                                                                       • We propose STL-DP that considers the unique
works [5, 6]. However, existing perturbation methods                                                                                         characteristics of time series to provide the most
such as Gaussian Perturbation Algorithm (GPA) [7], and                                                                                       suitable privacy protection method for time se-
Laplace Perturbation Algorithm (LPA) [8] do not consider                                                                                     ries.
                                                                                                                                           • We show that STL-DP effectively protects the
CIKM22-PAS: The 1st International Workshop on Privacy Algorithms
                                                                                                                                             core parts of the time series data under the same
in Systems @ CIKM’22, Oct. 17-22, 2022, Atlanta, Georgia, USA
∗
     Corresponding author.                                                                                                                   privacy budget, thereby significantly improving
Envelope-Open kkh97122647@gmail.com (K. Kim); sunshine01@g.skku.edu                                                                          utility over the existing methods.
(M. Kim); swoo@g.skku.edu (S. Woo)
Orcid 0000-0002-5129-1325 (K. Kim); 0000-0002-3224-0610 (M. Kim);
0000-0002-8983-1542 (S. Woo)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License
                                       Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
Figure 1: The overview of our proposed STL-DP.



2. Preliminaries                                                each individual 𝐷1 , 𝐷2 , … , 𝐷𝑈 to numbers. The DFT and
                                                                IDFT for the 𝑗 𝑡ℎ element of the series is defined as (2):
2.1. 𝜖 - Differential Privacy                                                                    𝑛    2𝜋√−1
                                                                                                            𝑗𝑖
Differential Privacy [1] ensures no significant change in                     𝐷𝐹 𝑇 (𝑓 (𝐷))𝑗 = ∑ 𝑒       𝑛        𝑓 (𝐷)𝑖 ,   (2)
the query response, whether a particular individual is in                                       𝑖=1
                                                                                                𝑛
a database or not [11].                                                                       1     − 2𝜋√−1 𝑗𝑖
                                                                          𝐼 𝐷𝐹 𝑇 (𝑓 (𝐷))𝑗 =     ∑ 𝑒 𝑛 𝑓 (𝐷)𝑖
                                                                                              𝑛 𝑖=1
                                                ′
Definition. There are two databases 𝐷, 𝐷 which sat-
            ′
isfy ||𝐷 − 𝐷 ||1 ≤ 1. D denotes composed data of 𝑈 indi-        As compression methods convert the series from time
vidual users, i.e., 𝐷 = ∪𝑈𝑖=1 𝐷𝑖 , and the data of any single   to frequency domain, noises injected in the frequency
user can be put as 𝐷𝑖 . Let us denote 𝑀 and 𝜖 as some           domain are no longer independent but are correlated.
randomized function and a privacy budget, respectively.         For this reason, FPA is better suited for perturbing time
𝑀 guarantees 𝜖-privacy if and only if it satisfies the fol-     series [13], and we extend the FPA-based method in our
lowing Eq. (1):                                                 work.

                               ′
  𝑃[𝑀(𝐷) ∈ 𝑆] ≤ 𝑒 𝜖 × 𝑃[𝑀(𝐷 ) ∈ 𝑆], ∀𝑆 ∈ 𝑅𝑎𝑛𝑔𝑒(𝑀) (1)           2.3. Seasonal and Trend decomposition
                                                                     using LOESS (STL)
DP mechanism aims to keep the query response for
           ′
each 𝐷, 𝐷 the same, despite having one or fewer non-     There are various time series decomposition methods
overlapping individuals. Specifically, the smaller the   such as classical decomposition [14], X11 [15], and
𝜖 is, the higher the privacy protection of the data be-  STL [9]. The classical method is simple to implement
comes [12].                                              but is inapplicable since some data from both ends of the
                                                         sequence are lost. X11 successfully tackled the problem
2.2. DP Algorithms for Time Series                       of data loss but is still limited in use as it can only handle
                                                         monthly or quarterly data. On the other hand, STL ef-
Laplace Perturbation Algorithm (LPA). LPA [8] fectively handles the problems mentioned above. STL is
adds independent noise generated from the Laplace dis- a flexible and robust time series decomposition method
tribution [1]. LPA is renowned for its simplicity but it that leverages local regression (LOESS).
is unsuitable for protecting time series because of its
independent noise injection.
                                                                3. Our Approach
Fourier Perturbation Algorithm (FPA). FPA is a
                                                                We propose STL-DP to protect core information of the
compression-based method that first applies the Dis-
                                                                time series while improving utility within a predefined
crete Fourier Transform (DFT) to the true query an-
                                                                privacy budget. Refer to Figure 1 for a glance at our
swers, then performs LPA to Fourier coefficients [8].
                                                                proposed STL-DP.
The perturbed coefficients undergo the inverse DFT
                                                                   The main difference of STL-DP with the existing meth-
(IDFT) to obtain the resulting perturbed sequence. The
                                                                ods is the integration of STL decomposition. First, by
entire process can be expressed as perturbed f(D) =
                                                                incorporating the decomposition phase, we can identify
𝐼 𝐷𝐹 𝑇 (𝐿𝑃𝐴(𝐷𝐹 𝑇 (𝑓 (𝐷))), where 𝑓 is a function that maps
                                                                the core components, such as the trend and seasonality of
Table 1
Euclidean distance between the original and the perturbed time series.
                                Algorithm       Zone      epsilon1        epsilon2    epsilon3        epsilon4
                                LPA               Zone1   1.3418E+4       2.6102E+3   1.3052E+3       2.6694E+2
                                FPA                       3.0351E+0       4.7537E+0   2.2658E+0       7.6124E-8
                                sFPA                      9.6445E+0       1.2354E-7   1.4956E+0       2.3439E-8
                                tFPA                      6.1856E-7       5.2730E-7   1.8135E+0       7.6125E-8
                                LPA             Zone2     1.3004E+4       2.7051E+3   1.3226E+3       2.6037E+2
                                FPA                       4.9405E+0       1.3781E+0   1.7282E+0       0.3781E+0
                                sFPA                      1.6598E-7       1.9342E+0   6.4095E-7       4.4947E-8
                                tFPA                      4.7581E-7       1.0069E-7   1.2109E+0       0.3782E+0
                                LPA             Zone3     1.2928E+4       2.6534E+3   1.3374E+3       2.7253E+2
                                FPA                       2.7932E-7       3.8466E+0   1.1964E-6       2.5183E-7
                                sFPA                      1.3435E+1       2.4369E-7   1.6377E+0       0.2554E+0
                                tFPA                      2.1689E+1       2.2007E+0   9.7750E-8       2.5183E-7


Table 2
Comparison of △MAPE for each mechanism; △MAPE = |MAPE (DP algorithm) - MAPE (Original Series)|
                           epsilon1    epsilon2     epsilon3   epsilon4                    epsilon1     epsilon2   epsilon3   epsilon4
           LPA    Linear   0.7476      0.0081       0.0125     0.0068        SimpleDNN     2.2842       1.6252     0.5990     0.2236
           FPA             0.0013      0.0002       0.0002     0.0001                      1.5972       0.3159     0.7612     0.1213
           tFPA            0.0017      0.0005       0.0010     0.0000                      0.0602       1.0390     1.0390     0.2152
           sFPA            0.0094      0.0014       0.0001     0.0000                      1.5158       1.7738     0.1083     0.2468
           LPA    RNN      1.4877      0.0194       0.0437     0.0084        Transformer   1.1464       0.5750     0.4180     0.2477
           FPA             0.0169      0.0089       0.0004     0.0094                      0.0738       0.1268     0.0860     0.0941
           tFPA            0.0072      0.0081       0.0042     0.0004                      0.2325       0.0776     0.2083     0.0271
           sFPA            0.0117      0.0070       0.0007     0.0009                      0.0864       0.4175     0.1468     0.0832



time series data, which may contain critical information            mation, including temperature, humidity, wind
and are prone to attacks. One of the STL-DP mecha-                  speed, general diffuse flows, and diffuse flows.
nisms is referred to as sFPA, which is a method that injects      • Data information : Aggregated from 550,374 in-
noises only to the seasonal part of the decomposed series.          habitants according to Morocco Census [16].
Similarly, the approach of performing perturbations only
on the trend is named tFPA. Lastly, the perturbed compo-        Throughout the experiment, we set the privacy budget
nents from the seasonal or trend parts are combined with 𝜖1 − 𝜖4 as 0.48, 2.4, 4.8, and 24, respectively, and the
the rest of the unperturbed components to reconstruct sensitivity as 48, which are experimental settings taken
the form of the sequence.                                    from Günther, et al. [17].

                                                           Euclidean distance results. The degree of closeness
4. Experimental Results                                    between the original and the perturbed series under the
                                                           same privacy budget can be interpreted as the level of
Methods. We demonstrate the effectiveness of the pro-
                                                           utility. As shown in Table 1, LPA yields a greater distance
posed STL-DP by comparing the utility of our sFPA and
                                                           than other methods, confirming that FPA-based methods
tFPA with two baselines, LPA and FPA. Herein, we intro-
                                                           are better than LPA. Furthermore, our tFPA and sFPA
duce two different metrics to quantify utility. The first
                                                           are consistently ranked as the best algorithm in terms of
metric is the Euclidean distance between the original
                                                           Euclidean distance. These results indicate the superiority
and the perturbed series. Nextly, the original and the
                                                           of STL-DP over the baselines.
perturbed data are each fed into the forecasting model to
evaluate the respective Mean Absolute Percentage Error
                                                           Comparison on forecasting performances. Recall
(MAPE), and the difference between the two MAPEs is
                                                           that our objective is to generate noise-injected series that
used as the second metric.
                                                           minimize the performance drop of the forecasting model.
                                                           As shown in Table 2, we used four models, from a simple
Dataset. We used the power consumption data from
                                                           feed-forward neural network to advanced models such
2017-01-01 to 2017-12-31 of three zones of Tetouan city
                                                           as LSTM and Transformer as our forecasting model. The
located in northern Morocco [16]. The properties of the
                                                           models were trained to predict the upcoming 20 timesteps
dataset are summarized as follows:
                                                           given the past 60 timesteps. In most cases, as 𝜖 grew, the
     • Prediction variables : Power consumption of zone forecasting error difference (△MAPE) decreased. Not
        1, 2, and 3 of Tetouan city with additional infor-
surprisingly, both sFPA and tFPA outperformed other DP             Generative adversarial nets, Advances in neural
mechanisms for a majority of models in terms of △MAPE.             information processing systems 27 (2014).
                                                               [4] S. Papadimitriou, F. Li, G. Kollios, P. S. Yu, Time
                                                                   series compressibility and privacy, in: Proceedings
5. Conclusion and Future Work                                      of the 33rd international conference on Very large
                                                                   data bases, Citeseer, 2007, pp. 459–470.
As preliminary research, we introduce an effective DP
                                                               [5] Y. Zhu, Y. Fu, H. Fu, On privacy in time series
mechanism, STL-DP, specially designed for generating
                                                                   data mining, in: Pacific-Asia Conference on Knowl-
privacy-protected time series data. As the experiment
                                                                   edge Discovery and Data Mining, Springer, 2008,
results suggested, the distance between the original se-
                                                                   pp. 479–493.
ries and the perturbed series from sFPA and tFPA was
                                                               [6] Y. Zhu, Y. Fu, H. Fu, A new class of attacks on time
much closer than the other baselines. Also, the difference
                                                                   series data mining, Intelligent Data Analysis 14
between the MAPE of the original and the perturbed se-
                                                                   (2010) 405–418.
ries was significantly lower for our proposed sFPA and
                                                               [7] N. U. Sheikh, H. J. Asghar, F. Farokhi, M. A. Kaafar,
tFPA than for other perturbation algorithms. Therefore,
                                                                   Do auto-regressive models protect privacy inferring
we showed that considering the unique property of time
                                                                   fine-grained energy consumption from aggregated
series data improves the utility under the same privacy
                                                                   model parameters, IEEE Transactions on Services
budget. For future works, we plan to extend our research
                                                                   Computing (2021).
by designing a more advanced mechanism 𝐹 𝑃𝐴𝑘 , that
                                                               [8] V. Rastogi, S. Nath, Differentially private aggrega-
uses only the 𝑘 (<𝑛) Fourier coefficients as targets of the
                                                                   tion of distributed time-series with transformation
perturbation.
                                                                   and encryption, in: Proceedings of the 2010 ACM
                                                                   SIGMOD International Conference on Management
Acknowledgments                                                    of data, 2010, pp. 735–746.
                                                               [9] R. B. Cleveland, W. S. Cleveland, J. E. McRae, I. Ter-
The work was supported by the affiliated institute of              penning, Stl: A seasonal-trend decomposition, J.
ETRI [2022-075]. Also, this work was partially supported           Off. Stat 6 (1990) 3–73.
by the Basic Science Research Program through National        [10] W. G. Jacoby, Loess:: a nonparametric, graphical
Research Foundation of Korea (NRF) grant funded by                 tool for depicting relationships between variables,
the Korean Ministry of Science and ICT (MSIT) under                Electoral studies 19 (2000) 577–613.
No. 2020R1C1C1006004 and Institute for Information            [11] W. Huang, S. Zhou, T. Zhu, Y. Liao, Improving
& communication Technology Planning & evaluation                   utility of differentially private mechanisms through
(IITP) grants funded by the Korean MSIT: (No. 2022-                cryptography-based technologies: a survey, arXiv
0-01199, Graduate School of Convergence Security at                preprint arXiv:2011.00976 (2020).
Sungkyunkwan University), (No. 2022-0-01045, Self-            [12] J. Wang, S. Liu, Y. Li, A review of differential privacy
directed Multi-Modal Intelligence for solving unknown,             in individual data release, International Journal of
open domain problems), (No. 2022-0-00688, AI Platform              Distributed Sensor Networks 11 (2015) 259682.
to Fully Adapt and Reflect Privacy-Policy Changes), (No.      [13] H. Wang, Z. Xu, Cts-dp: publishing correlated
2021-0-02068, Artificial Intelligence Innovation Hub), (No.        time-series data via differential privacy, Knowledge-
2019-0-00421, AI Graduate School Support Program at                Based Systems 122 (2017) 167–179.
Sungkyunkwan University), and (No. 2021-0-02309, Ob-          [14] J. Cohen, W. Gorr, C. Durso, Estimation of crime sea-
ject Detection Research under Low Quality Video Condi-             sonality: a cross-sectional extension to time series
tion).                                                             classical decomposition, H. John Heinz III Working
                                                                   Paper (2003).
                                                              [15] A. Sutcliffe, X11 time series decomposition and sam-
References                                                         pling errors, Australian Bureau of Statistics, 1993.
                                                              [16] A. Salam, A. El Hibaoui, Comparison of machine
 [1] C. Dwork, F. McSherry, K. Nissim, A. Smith, Cali-
                                                                   learning algorithms for the power consumption
     brating noise to sensitivity in private data analysis,
                                                                   prediction:-case study of tetouan city–, in: 2018 6th
     in: Theory of cryptography conference, Springer,
                                                                   International Renewable and Sustainable Energy
     2006, pp. 265–284.
                                                                   Conference (IRSEC), IEEE, 2018, pp. 1–5.
 [2] R. L. Rivest, L. Adleman, M. L. Dertouzos, et al., On
                                                              [17] G. Eibl, K. Bao, P.-W. Grassal, D. Bernau,
     data banks and privacy homomorphisms, Founda-
                                                                   H. Schmeck, The influence of differential privacy
     tions of secure computation 4 (1978) 169–180.
                                                                   on short term electric load forecasting, Energy
 [3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu,
                                                                   Informatics 1 (2018) 93–113.
     D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio,