Data Series Similarity Search via Deep Learning

Qitong Wang¹,†
¹ Université Paris Cité, LIPADE

Abstract
Similarity search is a key operation for the analysis of (increasingly large) data series collections. According to recent studies, SAX-based indexes offer state-of-the-art performance for similarity search tasks. However, their performance lags on datasets that exhibit high frequency, weak correlation, excessive noise, or other specific properties. In this work, we propose to facilitate data series similarity search with deep learning techniques, involving both data series approximation and data series indexing. Our preliminary study focuses on developing Deep Embedding Approximation (DEA), a novel family of data series summarization techniques based on deep neural networks. Moreover, we describe SEAnet, a novel architecture specially designed for learning DEA, which introduces the Sum of Squares preservation property into the deep network design. Finally, we propose a new sampling strategy, SEAsam, that allows SEAnet to effectively train on massive datasets. Comprehensive experiments verify the advantages of DEA learned using SEAnet. These preliminary results can lead to further progress in this area, by developing more customized architectures and training strategies, better integrating DEA with index structures, learning novel data series indexes, and facilitating faster model training.

Keywords
data series, similarity search, indexing, neural networks, sampling

Proceedings of the VLDB 2022 PhD Workshop, September 5, 2022, Sydney, Australia.
† Supervised by Themis Palpanas.
qitong.wang@etu.u-paris.fr (Q. Wang)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

1. Introduction

With the rapid developments and deployments of modern sensors, massive data series¹ datasets are now being generated, collected, and analyzed in almost every scientific domain [1]. Data series similarity search aims to find the closest series in a dataset to a given query series according to a distance measure, such as Euclidean distance, which is one of the most widely used [2]. Similarity search can be divided into exact search and approximate search [3]. Approximate similarity search may not always produce the exact answers, but in most cases, it produces answers that are very close to the exact ones [4]. Thus, it is very popular in practice, and widely used on massive series collections to enable interactive data exploration and other latency-bounded applications [5]. In this work, we focus on approximate similarity search under Euclidean distance.

¹ A data series, or data sequence, is an ordered sequence of points. The most common type of data series is time series, where the dimension that imposes the sequence ordering is time; though, this dimension could also be the mass, angle, or position [1].

Indexes are widely employed to speed up data series similarity search [3, 4]. Most indexes are based on summarized representations of the data series [2] of lower dimensionality. Symbolic Aggregate approXimation (SAX) [6] is a popular and effective discretized summarization. SAX-based indexes [7] are the state-of-the-art (SOTA) data series similarity search methods [3, 4].

Nevertheless, SAX-based indexes suffer from the problem that SAX fails on hard datasets with specific properties [8]. Since SAX is the symbolization of Piecewise Aggregate Approximation (PAA) [6], the failure of PAA to correctly represent some data series directly translates to the failure of the PAA-based SAX. For example, the high frequency of the Deep1B series means that each PAA segment has to average many highly-varying points, leading to similar PAA values across different segments, and to indistinguishable SAX words across different series. Introducing more SAX words could alleviate the problem, but would lead to an undesirably long summarization that could not be effectively indexed.
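To make this failure mode concrete, the following minimal Python sketch (an illustration only, not the implementation used in this work) computes PAA as the segment means of a z-normalized series: on a smooth series the segment means remain distinct, while on a high-frequency series every segment averages out to a value near zero, so different series end up with nearly identical PAA vectors and, after quantization, indistinguishable SAX words.

```python
import numpy as np

def znorm(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std()

def paa(series: np.ndarray, l: int) -> np.ndarray:
    """Piecewise Aggregate Approximation: the mean of l equal-length segments."""
    m = len(series)
    assert m % l == 0, "for simplicity, assume l divides the series length"
    return series.reshape(l, m // l).mean(axis=1)

rng = np.random.default_rng(0)
m, l = 256, 16
smooth = znorm(np.cumsum(rng.normal(size=m)))  # slowly-varying random walk
noisy = znorm(rng.normal(size=m))              # high-frequency white noise

print(paa(smooth, l).round(2))  # distinct segment means -> distinguishable SAX words
print(paa(noisy, l).round(2))   # all segment means near 0 -> similar SAX words
```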
To address the aforementioned problems, we propose to build a data series index based on Deep Embedding Approximations (DEA), i.e., data series summarizations derived from embeddings learned using deep neural networks. Embedding techniques, or representation learning [9], aim to learn vectors that capture the latent information necessary for classification, clustering, and other downstream applications. Embedding techniques have been proven to be capable of capturing frequency [10] and other latent properties. However, data series embedding has not been adapted to and evaluated for similarity search (and could also be applied to other tasks, e.g., anomaly detection [11, 12, 13]).

Specifically, we propose to replace traditional summarizations (e.g., PAA) with DEA, which is then symbolized and indexed by an iSAX index. DEA aims to preserve the original pairwise distances in the lower-dimensional DEA space. Thus, it is naturally capable of being symbolized into SAX, on which an iSAX index can be built.

Our preliminary results show that, compared to PAA and SAX (which is based on PAA), DEA better preserves pairwise distances, leading to a more effective index for data series similarity search. This can be used as a blueprint to facilitate further progress in this area. Promising further directions include developing more customized architectures and training strategies based on observations from the preliminary results, better integrating DEA learning with index structure designs, and facilitating faster DEA learning or transferring [14] on massive datasets.

Figure 1: Workflow of DEA-based similarity search. (The figure depicts drawing a SEAsam sample from the data series collection, training SEAnet, embedding the series to DEA, symbolizing the DEA into DEA-based SAX, constructing the DEA-based iSAX index, and answering queries approximately.)

Note that existing studies on learned indexes [15] cannot be straightforwardly employed to facilitate data series similarity search with deep learning techniques. This is true because most existing methods assume that the data are sortable in a natural order, which can then be captured by learned distribution functions [15]. However, such global orders do not exist for data series similarity search (since the order depends on the queries) [16]. Furthermore, the existing methods suitable for similarity search are built upon grid indexes [17, 18], which do not scale to the high dimensionalities (i.e., in the order of 100s-1000s) of data series. Hence, how to extend existing studies to resolve the aforementioned open problems remains a challenging research direction.

In this work, we propose the following research directions:

1. [Architecture] Design novel architectures that are specifically built to support high-quality DEA and similarity search. Our preliminary solution, SEAnet (cf. Section 3.1), introduces and formalizes the principle of Sum of Squares (SoS) preservation.

2. [Training Dataset] Propose novel sampling strategies for massive data series collections, enabling effective training of deep models. One such example is SEAsam (cf. Section 3.2), which demonstrates that intelligent sampling strategies can help improve the performance of the deep network models.

3. [Learned Indexes] Integrate index structure building into DEA learning to fully exploit the advantages of DEA. Pushing further in this direction, it would be interesting to learn a specifically designed index structure together with DEA learning.

4. [Model Training] Address the problem of the long training times needed by the deep neural models (which can be significantly slower than traditional approaches), by introducing transfer learning and domain adaptation techniques in this context.

2. Background

A data series, 𝑆 = {𝑝1, ..., 𝑝𝑚}, is a sequence of points, where each point 𝑝𝑖 = (𝑣𝑖, 𝑡𝑖), 1 ≤ 𝑖 ≤ 𝑚, is associated with a real value 𝑣𝑖 and a position 𝑡𝑖. The position corresponds to the order of this value in the sequence. We call 𝑚 the length, or dimensionality, of the data series. 𝒮 denotes a collection of data series, i.e., 𝒮 = {𝑆1, ..., 𝑆𝑛}. We call 𝑛 the size of the data series collection. A summarization 𝐸 = {𝑒1, ..., 𝑒𝑙} of a series 𝑆 is a lower, 𝑙-dimensional representation, which preserves some desired properties of 𝒮. For similarity search, the target property is the pairwise distance structure of 𝒮, i.e., ∀𝑆𝑖, 𝑆𝑗 ∈ 𝒮, 𝑑′(𝐸𝑖, 𝐸𝑗) ≈ 𝑑(𝑆𝑖, 𝑆𝑗), where 𝐸𝑖, 𝐸𝑗 are the summarizations of 𝑆𝑖, 𝑆𝑗, and 𝑑(·, ·), 𝑑′(·, ·) are distance measures in the series and summarization spaces, respectively. The distance measure 𝑑 we use is Euclidean distance [2]. 𝑑′ in the summarization space need not be the same as 𝑑; e.g., for PAA, 𝑑′(·, ·) = √𝑚/√𝑙 × 𝑑(·, ·). 𝑑′ for DEA is the same as that of PAA if DEA is scaled for SoS preservation; otherwise, 𝑑′(·, ·) = 𝑑(·, ·). Given a query series 𝑆𝑞 of length 𝑚, a series collection 𝒮 of size 𝑛 and length 𝑚, and a distance measure 𝑑, similarity search aims to identify the series 𝑆𝑐 ∈ 𝒮 whose distance to 𝑆𝑞 is the smallest, i.e., ∀𝑆𝑜 ∈ 𝒮, 𝑆𝑜 ≠ 𝑆𝑐, 𝑑(𝑆𝑐, 𝑆𝑞) ≤ 𝑑(𝑆𝑜, 𝑆𝑞). Instead of finding the exact closest series 𝑆𝑐, approximate similarity search aims to find a series 𝑆𝑐′ ∈ 𝒮 such that 𝑑(𝑆𝑐′, 𝑆𝑞) ≈ 𝑑(𝑆𝑐, 𝑆𝑞). 𝑑(𝑆𝑐, 𝑆𝑞)/𝑑(𝑆𝑐′, 𝑆𝑞) ∈ (0, 1] is called 𝑆𝑐′'s tightness.
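To ground these definitions, the following minimal Python sketch (a simplified illustration, not our actual implementation) computes the PAA summarization-space distance 𝑑′ with the √𝑚/√𝑙 scaling, and the tightness of an approximate answer.

```python
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

def paa(series: np.ndarray, l: int) -> np.ndarray:
    m = len(series)
    return series.reshape(l, m // l).mean(axis=1)  # assumes l divides m

def paa_distance(s_i: np.ndarray, s_j: np.ndarray, l: int) -> float:
    """Summarization-space distance d' for PAA: sqrt(m)/sqrt(l) times the
    Euclidean distance between the two PAA vectors."""
    m = len(s_i)
    return np.sqrt(m) / np.sqrt(l) * euclidean(paa(s_i, l), paa(s_j, l))

def tightness(s_q: np.ndarray, s_c: np.ndarray, s_c_approx: np.ndarray) -> float:
    """d(S_c, S_q) / d(S_c', S_q), in (0, 1]; equals 1 when the approximate
    answer S_c' is the exact nearest neighbor S_c."""
    return euclidean(s_c, s_q) / euclidean(s_c_approx, s_q)
```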
The most prominent data series indexing techniques can be categorized into optimized scans [19] and tree-based indexes [20]. Recent studies [3, 4] have demonstrated that the SAX-based indexes [7] achieve SOTA performance under several conditions. In this work, we use MESSI as our iSAX index [21], because its main-memory operation and parallel design lead to SOTA performance.

3. DEA-based Similarity Search

Figure 1 illustrates the proposed DEA-based data series similarity search framework, including the SEAnet architecture. Given a series collection, SEAsam first draws representative samples to train SEAnet. After SEAnet converges, it embeds all series into DEAs, which are further discretized into SAXs. Thus, the DEA-based SAXs are structured into an iSAX index, where approximate similarity search can be efficiently conducted.
Figure 2: The SEAnet architecture and the details of a dilated full-preactivation ResBlock. (a) The SEAnet architecture: an encoder of 𝑘 stacked dilated ResBlocks followed by MaxPool, two linear layers, Tanh, and LayerNorm, and a symmetric decoder of 𝑘′ stacked dilated ResBlocks; (b) the 𝑖-th dilated ResBlock: LayerNorm, Leaky ReLU, and convolution layers with dilation 2ⁱ, plus an identity or up/down-sample link.

SEAnet is a novel autoencoder proposed to learn high-quality DEA (cf. Section 3.1). Moreover, it introduces the principle of SoS preservation for lower-dimensionality representation learning (cf. Section 3.1.1). SEAsam makes use of the inverse iSAX sortable summarization [22] (cf. Section 3.2).

3.1. SEAnet Architecture

The SEAnet architecture is illustrated in Figure 2a. The first part of the SEAnet encoder, from ConvLayer1 to MaxPool, comprises 𝑘 stacked dilated full-preactivation ResBlocks (Figure 2b) for nonlinear transformations. The second part of the SEAnet encoder, from Linear1 to LayerNorm2, comprises two linear layers for dimensionality reduction. Unlike most existing encoders with linear final layers [23], the SEAnet encoder is finalized by LayerNorm2, which is specifically designed using the SoS preservation principle.

SEAnet is trained in a pairwise manner by mini-batched Stochastic Gradient Descent (SGD). Its loss function is a linear combination of two components: (1) the Compression Error 𝐿𝐶 (i.e., the average difference between the original distance of a data series pair (𝑆𝑖, 𝑆𝑗) and their DEA distance), which evaluates whether the original distances are well preserved in the DEA space; and (2) the Reconstruction Error 𝐿𝑅 (i.e., the average distance between an original series 𝑆𝑖 and its reconstructed series), which evaluates how well the original series can be reconstructed using SEAnet.

3.1.1. Sum of Squares Preservation

We propose a SoS preservation framework for effective DEA learning. SoS preservation has been observed before [24], but to the best of our knowledge, it has never been formally introduced to representation learning. Given an 𝑛 × 𝑚 matrix 𝑀, where each row 𝑀𝑖,* corresponds to a series and each column 𝑀*,𝑗 corresponds to a position, SoS = Σ𝑖,𝑗 𝑀𝑖,𝑗². Note that defining new axes based on the largest SoS is equivalent to selecting the largest eigenvalues in linear dimensionality reductions on z-normalized datasets, with the purpose of preserving information about the dataset through linear transformations [24]. Thus, SoS may be regarded as an indicator of transformation quality. By keeping SoS invariant, the quality of DEAs is upheld from this perspective, and the networks can then focus on learning the nonlinear transformations.

We now elaborate on the architecture design and model training under SoS preservation. Given the (z-normalized) input dataset, SoS preservation requires two steps: (1) z-normalizing the outputs of the encoder (the DEAs) and of the decoder (the reconstructed series); and (2) scaling the series by 1/√𝑚 and the DEA by 1/√𝑙 in 𝐿𝐶 and 𝐿𝑅. Based on theoretical analysis [25], we observe that scaling the series and the DEA will not only keep the two distances at the same level, but will also largely stabilize the distance distributions. Thus, by z-normalizing the DEA, and scaling the series and the DEA in 𝐿𝐶 and 𝐿𝑅, SEAnet succeeds in providing high-quality DEAs by preserving SoS.
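To illustrate how the two loss components and the SoS-preserving scaling fit together, here is a minimal PyTorch sketch; the equal weighting of 𝐿𝐶 and 𝐿𝑅 (via `alpha`) and the exact distance forms are simplifying assumptions, not the paper's precise formulation.

```python
import torch

def znorm(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Step (1): z-normalize encoder outputs (DEAs) and decoder outputs (reconstructions).
    return (x - x.mean(dim=-1, keepdim=True)) / (x.std(dim=-1, keepdim=True) + eps)

def seanet_loss(s_i, s_j, dea_i, dea_j, recon_i, alpha: float = 1.0):
    """Pairwise loss L = L_C + alpha * L_R under SoS preservation.
    s_i, s_j: z-normalized series of length m; dea_i, dea_j: their DEAs of length l;
    recon_i: the decoder's reconstruction of s_i."""
    m, l = s_i.shape[-1], dea_i.shape[-1]
    dea_i, dea_j, recon_i = znorm(dea_i), znorm(dea_j), znorm(recon_i)
    # Step (2): scale series by 1/sqrt(m) and DEAs by 1/sqrt(l) inside L_C and L_R.
    d_series = torch.norm((s_i - s_j) / m ** 0.5, dim=-1)
    d_dea = torch.norm((dea_i - dea_j) / l ** 0.5, dim=-1)
    loss_c = (d_series - d_dea).abs().mean()                        # Compression Error
    loss_r = torch.norm((s_i - recon_i) / m ** 0.5, dim=-1).mean()  # Reconstruction Error
    return loss_c + alpha * loss_r
```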
3.2. Sampling with SEAsam

The representativeness of the training set upper-bounds the quality of the deep models. Not only do we need our sample to effectively cover the entire space of a given dataset, but we also need to select this sample efficiently, without performing expensive computations on the full dataset.

To this end, we propose SEAsam (SEA Sampling), a novel data series sampling strategy based on the sortable data series representation, InvSAX [22]. Recall that SAX first transforms the data series into 𝑙 real values, and then quantizes these real values, representing them using discrete symbols [20]. The core observation is that every subsequent bit in a SAX word contains a decreasing amount of information about the location of its corresponding data point, and simply increases the degree of precision. Interleaving SAX's bits such that all significant bits across each SAX word precede all less significant bits produces a value array with descending significance, i.e., InvSAX. SEAsam orders the series collection by their InvSAX representations, and draws samples at equal intervals (e.g., every 1,000 series) from this sorted order. Thus, the SEAsam samples are expected to preserve the distribution of the series collection by evenly covering its InvSAX space. Moreover, the time complexity of SEAsam is 𝒪(𝑛𝑚), and its space complexity is 𝒪(𝑛𝑙), rendering SEAsam an efficient strategy.
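The following minimal Python sketch illustrates the idea; it is a simplification (a fixed 3-bit alphabet, standard-normal breakpoints via SciPy, and a segment count that divides the series length), not the actual InvSAX/SEAsam implementation.

```python
import numpy as np
from scipy.stats import norm

def sax_symbols(series: np.ndarray, l: int = 8, bits: int = 3) -> np.ndarray:
    """Quantize the PAA of a z-normalized series into 2**bits symbols,
    using standard-normal breakpoints (a simplified stand-in for the iSAX table)."""
    paa = series.reshape(l, -1).mean(axis=1)
    breakpoints = norm.ppf(np.linspace(0, 1, 2 ** bits + 1)[1:-1])
    return np.searchsorted(breakpoints, paa)

def invsax_key(symbols: np.ndarray, bits: int = 3) -> int:
    """Interleave the symbols' bits: all most-significant bits first, then the next ones."""
    key = 0
    for b in range(bits - 1, -1, -1):
        for s in symbols:
            key = (key << 1) | ((int(s) >> b) & 1)
    return key

def seasam(collection: np.ndarray, interval: int = 1000, l: int = 8, bits: int = 3) -> np.ndarray:
    """Sort the collection by InvSAX key and keep every `interval`-th series."""
    keys = [invsax_key(sax_symbols(s, l, bits), bits) for s in collection]
    order = np.argsort(keys, kind="stable")
    return collection[order[::interval]]
```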
Figure 3: Approximate query answer quality: 1st BSF tightness (y-axis) vs. number of series examined (x-axis: 100 to 10k); higher is better; 100M series. Methods: PAA, SEAnet, SEAnet-nD, FDJNet, TimeNet, Incept; datasets: (a) RandWalk, (b) F5, (c) F10, (d) SALD, (e) Deep1B, (f) Seismic, (g) Astro.

4. Preliminary Results

We present our experimental evaluation of SEAnet, DEA-based data series similarity search, and SEAsam using 7 diverse synthetic and real datasets. In total, 5,040 deep models were trained to provide a thorough profile of DEA architectures. In summary, the results demonstrate that the SEAnet DEA is robust across various dataset properties and outperforms its competitors by better preserving the original pairwise distances and nearest neighborhood structure, leading to better approximate similarity search results than traditional (PAA-based) and alternative deep learning (DEA-based using FDJNet [23], TimeNet [26], and InceptionTime [27]) approaches.

We evaluate the benefit of using DEA for similarity search by reporting the 1st Best-So-Far (BSF) tightness, i.e., the 1st Nearest-Neighbor (NN) distance divided by the 1st BSF distance for a given query, as a function of the number of series that the similarity search algorithm examines. The results on the 100M-series datasets and 1K queries are shown in Figure 3. SEAnet-nD is an encoder-only version of SEAnet. SEAnet improved the 1st BSF tightness, and thus the similarity search results, in 61 out of the 63 experiments. Its advantage was particularly obvious on the hard datasets, namely, Deep1B, Seismic, and Astro (detailed experimental results in [25]).
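As an illustration of this metric, the sketch below computes the 1st BSF tightness after examining the first 𝑣 candidates, assuming the candidate visit order of the index is given; it is a simplified stand-in, not the MESSI traversal itself.

```python
import numpy as np

def bsf_tightness_curve(query, collection, visit_order,
                        checkpoints=(100, 500, 1000, 5000, 10000)):
    """1st-BSF tightness after examining the first v candidates of `visit_order`
    (the order in which an index would visit series). Tightness = exact 1st-NN
    distance / best-so-far distance, so higher is better."""
    dists = np.linalg.norm(collection - query, axis=1)   # distances to every series
    exact_nn = dists.min()                               # exact 1st-NN distance
    visited = dists[np.asarray(visit_order)]
    bsf = np.minimum.accumulate(visited)                 # best-so-far after each visit
    return {v: float(exact_nn / bsf[v - 1]) for v in checkpoints if v <= len(bsf)}
```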
5. Discussion and Conclusions

In this paper, we introduce the use of deep learning embeddings, DEA, for data series similarity search. We propose a novel autoencoder, SEAnet, designed under the newly introduced SoS preservation principle, for effectively learning DEA. A new sampling strategy, SEAsam, is introduced in order to facilitate SEAnet's training on massive collections. We demonstrate that the DEA learned by SEAnet more closely approximates the original data series distances, better preserves the true nearest neighbors in the summarized space, better reconstructs the original series, and leads to better similarity search results than the SOTA PAA-based iSAX (when examining either a small or a large number of candidates). These preliminary results are very promising: they set the ground for further advancements in this area, and have the potential to also improve the performance of kNN classification, anomaly detection, and other similarity search-based applications.

Promising directions in our future studies include the following:

1. Develop more customized architectures and training strategies. An interesting candidate would be to quantify, in a differentiable way, the nearest neighborhood preservation in the DEA space [28], which shows positive correlations with the quality of query answers in our preliminary results.

2. Investigate the lower bounding properties for DEA that will enable exact similarity search [29].

3. Integrate DEA learning with index structure building. Such an end-to-end framework will have more potential to reduce information loss during the DEA and indexing steps. Candidate index structures could be extended from trees [30] to clusters [31] and hash tables [16]. The DEA and the index structure could be learned together to fully exploit the advantages of both sides.

4. Design more powerful sampling strategies [32] to cover the large (pairwise distances) space, whose size is 𝒪(𝑛²) (where 𝑛 is the number of series in the collection). Ideally, a small sample should train models able to efficiently serve any ad-hoc query.

5. Facilitate faster DEA learning on massive datasets. Promising techniques include incremental learning [33] and transfer learning [14]. How to identify useful common information and how to best transfer this knowledge between massive datasets makes this a very challenging problem.

6. Benchmark data series summarizations for similarity search [34, 13]. We will design a unified workflow and proper metrics to evaluate different summarization techniques, based on a set of representative data series collections for similarity search. The compatibility between different summarization techniques and indexing techniques [35, 30] will also need to be studied.

Acknowledgments

Work supported by ANR-18-IDEX-000, the Chinese Scholarship Council, HIPEAC 4, GENCI–IDRIS (Grant 2020-101471), and NVIDIA Corporation for the Titan Xp GPU donation used in this research.

References

[1] T. Palpanas, V. Beckmann, Report on the first and second interdisciplinary time series analysis workshop (ITISA), SIGMOD Record (2019).
[2] X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, E. J. Keogh, Experimental comparison of representation methods and distance measures for time series data, DMKD (2013).
[3] K. Echihabi, K. Zoumpatianos, T. Palpanas, H. Benbrahim, The lernaean hydra of data series similarity search: An experimental evaluation of the state of the art, PVLDB (2018).
[4] K. Echihabi, K. Zoumpatianos, T. Palpanas, H. Benbrahim, Return of the lernaean hydra: experimental evaluation of data series approximate similarity search, PVLDB (2019).
[5] A. Gogolou, T. Tsandilas, K. Echihabi, A. Bezerianos, T. Palpanas, Data series progressive similarity search with probabilistic quality guarantees, in: SIGMOD, 2020.
[6] J. Lin, E. Keogh, S. Lonardi, B. Chiu, A symbolic representation of time series, with implications for streaming algorithms, in: SIGMOD, 2003.
[7] T. Palpanas, Evolution of a data series index, in: ISIP, 2019.
[8] O. Levchenko, B. Kolev, D. E. Yagoubi, R. Akbarinia, F. Masseglia, T. Palpanas, D. Shasha, P. Valduriez, Bestneighbor: Efficient evaluation of knn queries on large time series databases, KAIS (2020).
[9] Y. Bengio, A. C. Courville, P. Vincent, Representation learning: A review and new perspectives, PAMI (2013).
[10] J. Wang, Z. Wang, J. Li, J. Wu, Multilevel wavelet decomposition network for interpretable time series analysis, in: KDD, 2018.
[11] P. Boniol, T. Palpanas, Series2Graph: Graph-based Subsequence Anomaly Detection for Time Series, PVLDB 13 (2020).
[12] P. Boniol, M. Meftah, E. Remy, T. Palpanas, dCAM: Dimension-wise Activation Map for Explaining Multivariate Data Series Classification, in: SIGMOD, 2022.
[13] J. Paparrizos, Y. Kang, P. Boniol, R. S. Tsay, T. Palpanas, M. J. Franklin, TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection, PVLDB (2022).
[14] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, Q. He, A comprehensive survey on transfer learning, PIEEE (2020).
[15] T. Kraska, A. Beutel, E. H. Chi, J. Dean, N. Polyzotis, The case for learned index structures, in: SIGMOD, 2018.
[16] M. Li, Y. Zhang, Y. Sun, W. Wang, I. W. Tsang, X. Lin, I/O efficient approximate nearest neighbour search based on learned functions, in: ICDE, 2020.
[17] V. Nathan, J. Ding, M. Alizadeh, T. Kraska, Learning multi-dimensional indexes, in: SIGMOD, 2020.
[18] J. Ding, V. Nathan, M. Alizadeh, T. Kraska, Tsunami: A learned multi-dimensional index for correlated data and skewed workloads, PVLDB (2020).
[19] H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, A. E. Abbadi, Vector approximation based indexing for non-uniform high dimensional data sets, in: CIKM, 2000.
[20] J. Shieh, E. Keogh, iSAX: indexing and mining terabyte sized time series, in: KDD, 2008.
[21] B. Peng, P. Fatourou, T. Palpanas, MESSI: In-Memory Data Series Indexing, in: ICDE, 2020.
[22] H. Kondylakis, N. Dayan, K. Zoumpatianos, T. Palpanas, Coconut: A scalable bottom-up approach for building data series indexes, PVLDB (2018).
[23] J.-Y. Franceschi, A. Dieuleveut, M. Jaggi, Unsupervised scalable representation learning for multivariate time series, in: NeurIPS, 2019.
[24] S. Wold, K. Esbensen, P. Geladi, Principal component analysis, Chemometrics and Intelligent Laboratory Systems (1987).
[25] Q. Wang, T. Palpanas, Deep learning embeddings for data series similarity search, in: KDD, 2021.
[26] P. Malhotra, V. TV, L. Vig, P. Agarwal, G. M. Shroff, TimeNet: Pre-trained deep recurrent neural network for time series classification, in: ESANN, 2017.
[27] H. I. Fawaz, B. Lucas, G. Forestier, C. Pelletier, D. F. Schmidt, J. Weber, G. I. Webb, L. Idoumghar, P. Muller, F. Petitjean, InceptionTime: Finding AlexNet for time series classification, DMKD (2020).
[28] L. Van der Maaten, G. Hinton, Visualizing data using t-SNE, JMLR 9 (2008).
[29] P. Indyk, R. Motwani, P. Raghavan, S. S. Vempala, Locality-preserving hashing in multidimensional spaces, in: STOC, 1997.
[30] K. Echihabi, P. Fatourou, K. Zoumpatianos, T. Palpanas, H. Benbrahim, Hercules Against Data Series Similarity Search, PVLDB (2022).
[31] H. Jégou, M. Douze, C. Schmid, Product quantization for nearest neighbor search, PAMI (2011).
[32] C. Wu, R. Manmatha, A. J. Smola, P. Krähenbühl, Sampling matters in deep embedding learning, in: ICCV, 2017.
[33] Y. Wu, Y. Chen, L. Wang, Y. Ye, Z. Liu, Y. Guo, Y. Fu, Large scale incremental learning, in: CVPR, 2019.
[34] R. Marcus, A. Kipf, A. van Renen, M. Stoian, S. Misra, A. Kemper, T. Neumann, T. Kraska, Benchmarking learned indexes, PVLDB (2020).
[35] G. Chatzigeorgakidis, D. Skoutas, K. Patroumpas, T. Palpanas, S. Athanasiou, S. Skiadopoulos, Efficient Range and kNN Twin Subsequence Search in Time Series, TKDE (2022).
A. Online Resources

The source code, pretrained models, and datasets have been made available at http://www.mi.parisdescartes.fr/~themisp/seanet.