Scalable Inference in Dynamic Admixture Models

Patrick Jähnichen, Florian Wenzel, and Marius Kloft
{jaehnicp, wenzelfl, kloft}@hu-berlin.de
Machine Learning Group, Humboldt University of Berlin, Germany

Dynamic probabilistic models are standard in various time-series applications, including weather forecasting, stock market analysis, and robotics. Typically, such models consist of a diffusion model that governs the state of the system and a model of how this state is measured. As an example, consider the simple non-mixture time-series model

\begin{align*}
\mu_t &= \mu_{t-1} + v_t, & v_t &\sim \mathcal{N}(0, \nu^2), \\
x_t   &= \mu_t + w_t,     & w_t &\sim \mathcal{N}(0, \sigma^2),
\end{align*}

where \mu_t is the state of the system at time t and x_t is a noisy measurement of that state. Note that this kind of model is akin to the well-known Kalman filter.

A drawback of such a simple model is that it does not capture data that is a mixture of several components in varying proportions. Such data arises, for example, in corpus modeling, where each document is composed of words generated by different themes that are present in the corpus and underlie a time dynamic. An example of a more complex model is the continuous time dynamic topic model (cDTM) [6], in which the time structure of the mixture components is modeled in terms of a Markov chain. We generalize this approach to general Gaussian processes (GPs). This allows for more flexible modeling of the diffusion process (the time structure) by changing the GP covariance function, capturing a wider variety and combination of mixture-component dynamics.

Inference in these models is a major challenge. The posterior we seek is generally intractable, and we must resort to approximations. Until recently, state-of-the-art approaches used variational inference, as in [3, 6] and our own preliminary research [4]. Since these approaches are limited to rather small datasets, [2] recently applied a stochastic gradient Langevin dynamics sampler [7], which allows for inference in these models on much larger numbers of data points. However, [1, 5] have shown that this approach admits considerable improvements.

We develop a novel, robust inference method based on more advanced stochastic gradient sampling techniques (as, e.g., in [2]) that is applicable to millions of data points. Our preliminary empirical findings suggest that we can improve on state-of-the-art methods in terms of both accuracy and speed.

Bibliography

[1] S. Ahn, A. Korattikara, and M. Welling. Bayesian posterior sampling via stochastic gradient Fisher scoring. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, 2012.
[2] A. Bhadury, J. Chen, J. Zhu, and S. Liu. Scaling up dynamic topic models. In Proceedings of the 25th International ..., 2016.
[3] D. M. Blei and J. D. Lafferty. Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning, 2006.
[4] P. Jähnichen. Time dynamic topic models. PhD thesis, Leipzig University, Leipzig, Mar. 2016.
[5] S. Mandt, M. D. Hoffman, and D. M. Blei. A variational analysis of stochastic gradient algorithms. arXiv preprint, 2016.
[6] C. Wang, D. M. Blei, and D. Heckerman. Continuous time dynamic topic models. In Proceedings of UAI, 2008.
[7] M. Welling and Y.-W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning, pages 681-688, 2011.
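As a concrete illustration of the non-mixture model above, the following is a minimal simulation sketch in Python; it is not part of the proposed method, and the series length T, the variances nu2 and sigma2, and the function name simulate_state_space are illustrative choices.

import numpy as np

def simulate_state_space(T=100, nu2=0.1, sigma2=0.5, seed=0):
    """Simulate mu_t = mu_{t-1} + v_t, v_t ~ N(0, nu2)  (latent diffusion)
    and x_t = mu_t + w_t, w_t ~ N(0, sigma2)            (noisy measurement)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(T)
    for t in range(1, T):
        mu[t] = mu[t - 1] + rng.normal(0.0, np.sqrt(nu2))  # Gaussian random-walk diffusion
    x = mu + rng.normal(0.0, np.sqrt(sigma2), size=T)      # add measurement noise
    return mu, x

mu, x = simulate_state_space()

Replacing the Gaussian random walk in the diffusion step with draws governed by a GP covariance function is the kind of flexibility the generalization to Gaussian processes described above aims to provide.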
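For reference, the stochastic gradient Langevin dynamics sampler [7] mentioned above updates parameters with a noisy gradient step on minibatches. The sketch below shows the generic update rule only, assuming user-supplied gradient callbacks; the names log_prior_grad and log_lik_grad are hypothetical and not taken from [2] or [7], and this is not the proposed inference method itself.

import numpy as np

def sgld_step(theta, minibatch, N, step_size, log_prior_grad, log_lik_grad, rng):
    """One stochastic gradient Langevin dynamics update (Welling and Teh [7]).

    theta:      current parameter vector (numpy array)
    minibatch:  small random subset of the N data points
    N:          total number of data points (rescales the minibatch gradient)
    step_size:  epsilon_t in the SGLD update
    """
    n = len(minibatch)
    # Unbiased estimate of the gradient of the log posterior from the minibatch.
    grad = log_prior_grad(theta) + (N / n) * sum(log_lik_grad(theta, x) for x in minibatch)
    # Gradient half-step plus Gaussian noise whose variance equals the step size.
    noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad + noise

Refinements such as stochastic gradient Fisher scoring [1] and the variational analysis of stochastic gradient algorithms [5] build on this basic update, which is the direction the improvements discussed above pursue.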