Distributed LDA based Topic Modeling and Topic Agglomeration in a Latent Space

Gopi Chand Nutakki (g0nuta01@louisville.edu), Olfa Nasraoui (olfa.nasraoui@louisville.edu), Behnoush Abdollahi (b0abdo03@louisville.edu), Mahsa Badami (m0bada01@louisville.edu), Wenlong Sun (w0sun005@louisville.edu)
Knowledge Discovery & Web Mining Lab, University of Louisville

Abstract

We describe the methodology that we followed to automatically extract topics corresponding to known events provided by the SNOW 2014 challenge in the context of the SocialSensor project. A data crawling tool and selected filtering terms were provided to all the teams. The crawled data was to be divided into 96 (15-minute) timeslots spanning a 24-hour period, and participants were asked to produce a fixed number of topics for the selected timeslots. Our preliminary results are obtained using a methodology that pulls strengths from several machine learning techniques, including Latent Dirichlet Allocation (LDA) for topic modeling and Non-negative Matrix Factorization (NMF) for automated hashtag annotation and for mapping the topics into a latent space where they become less fragmented and can be better related with one another. In addition, we obtain improved topic quality when sentiment detection is performed to partition the tweets based on polarity, prior to topic modeling.

Copyright © by the paper's authors. Copying permitted only for private and academic purposes.
In: S. Papadopoulos, D. Corney, L. Aiello (eds.): Proceedings of the SNOW 2014 Data Challenge, Seoul, Korea, 08-04-2014, published at http://ceur-ws.org

1 Introduction

The SNOW 2014 challenge was organized within the context of the SocialSensor project¹, which works on developing a new framework for enabling real-time multimedia indexing and search in the Social Web. The aim of the challenge was to automatically extract topics corresponding to known events that were prescribed by the challenge organizers. Also provided was a data crawling tool, along with several Twitter filter terms (syria, ukraine, bitcoin, terror). The crawled data was to be divided into a total of 96 (15-minute) timeslots spanning a 24-hour period, with the goal of extracting a fixed number of topics in each timeslot. Only tweets up to the end of a timeslot could be used to extract its topics. In this paper, we focus on the topic extraction task, rather than on input data filtering or on the presentation of the associated headline, tweets, and image URL, because this was the activity closest to the ongoing research [AN12, HN12, CBGN12] on multi-domain data stream clustering in the Knowledge Discovery & Web Mining Lab at the University of Louisville.

To extract topics from the tweets crawled in each time slot, we use a Latent Dirichlet Allocation (LDA) based technique. We then discover latent concepts using Non-negative Matrix Factorization (NMF) on the resulting topics, and apply hierarchical clustering within the resulting Latent Space (LS) in order to agglomerate these topics into less fragmented themes that can facilitate the visual inspection of how the different topics are inter-related. We have also experimented with adding a sentiment detection step prior to topic modeling, in order to obtain a polarity-sensitive topic discovery, and with automated hashtag annotation to improve the topic extraction.

¹ SocialSensor: http://www.socialsensor.eu/

Table 1: Description of used variables.

Symbol | Description
M      | Number of documents in the collection
W      | Number of distinct words in the vocabulary
N      | Total number of words in the collection
K      | Number of topics
x_di   | i-th observed word in document d
z_di   | Topic assigned to x_di
N_wk   | Count of word w assigned to topic k
N_dk   | Count of topic k assigned in document d
φ_k    | Probability of a word given topic k
θ_d    | Probability of a topic given document d
α, β   | Dirichlet priors

Figure 1: Topic Modeling Framework (sentiment detection and hashtag annotation are not shown).

2 Background

2.1 Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is a Bayesian probabilistic model for text documents. It assumes a collection of K topics, where each topic defines a multinomial distribution over the vocabulary and is assumed to have been drawn from a Dirichlet distribution [BNJ03][HBB10]. Given the topics, LDA assumes the generative process for each document d shown in Algorithm 1, with the notation listed in Table 1.

Algorithm 1 Latent Dirichlet Allocation.
Input: A document collection, hyper-parameters α and β.
Output: A list of topics.
1. Draw a distribution over topics, θ_d ∼ Dir(α)
2. For each word i in the document:
3.   Draw a topic index z_di ∈ {1, ..., K} from the topic weights, z_di ∼ θ_d
4.   Draw the observed word w_di from the selected topic, w_di ∼ β_{z_di}

Equation 1 gives the joint distribution of a topic mixture θ, a set of N topic assignments z, and a set of N words w, for parameters α and β:

  p(θ, z, w | α, β) = p(θ|α) ∏_{n=1}^{N} p(z_n|θ) p(w_n|z_n, β)   (1)

Integrating over θ and summing over z, we obtain the marginal distribution of a document [BNJ03]:

  p(w | α, β) = ∫ p(θ|α) ( ∏_{n=1}^{N} Σ_{z_n} p(z_n|θ) p(w_n|z_n, β) ) dθ

Taking the product of the marginal probabilities of the individual documents, we obtain the probability of a corpus D:

  p(D | α, β) = ∏_{d=1}^{M} ∫ p(θ_d|α) ( ∏_{n=1}^{N_d} Σ_{z_dn} p(z_dn|θ_d) p(w_dn|z_dn, β) ) dθ_d

The posterior is usually approximated using Markov Chain Monte Carlo (MCMC) methods or variational inference. Both methods are effective, but face significant computational challenges on massive data sets. For this reason, we concentrated on a distributed version of LDA, which is summarized in the next section.
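As a concrete illustration, the generative process of Algorithm 1 can be sketched in a few lines of Python with NumPy (a minimal sketch; the toy dimensions, variable names, and helper function are our own illustrative choices, not part of the challenge system):

```python
import numpy as np

def generate_document(n_words, topic_word, alpha, rng):
    """Sample one document following Algorithm 1.

    topic_word is a K x W matrix whose rows are the per-topic word
    multinomials (the beta_k's); alpha is a symmetric Dirichlet prior.
    """
    K, W = topic_word.shape
    theta = rng.dirichlet(np.full(K, alpha))      # step 1: theta_d ~ Dir(alpha)
    z = rng.choice(K, size=n_words, p=theta)      # step 3: z_di ~ theta_d
    w = np.array([rng.choice(W, p=topic_word[k])  # step 4: w_di ~ beta_{z_di}
                  for k in z])
    return w, z, theta

rng = np.random.default_rng(0)
topics = rng.dirichlet(np.full(8, 0.1), size=3)   # K=3 toy topics over W=8 words
w, z, theta = generate_document(20, topics, alpha=0.5, rng=rng)
```

Running the process forward like this is only useful for intuition; inference, discussed next, works in the opposite direction, recovering θ and the topics from observed words.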
2.2 Distributed Algorithms for LDA

It is possible to distribute non-collapsed Gibbs sampling, because the sampling of z_di can happen independently given θ_d and φ_k, and thus can be done concurrently. In a non-collapsed Gibbs sampler, one samples z_di given θ_d and φ_k, and then θ_d and φ_k given z_di. If individual documents are not spread across different processors, one can marginalize over just θ_d, since θ_d is processor-specific. In this partially collapsed scheme, the latent variables z_di on each processor can be sampled concurrently, where the concurrency is over processors. The slow convergence of partially collapsed and non-collapsed Gibbs samplers (due to the strong dependencies between the parameters and latent variables) has led to devising distributed algorithms for fully collapsed Gibbs samplers [NASW09][YMM09].

Given M documents and P processors, with approximately M_P = M/P documents distributed on each processor p, the M documents are partitioned into x = {x_1, ..., x_p, ..., x_P}, with z = {z_1, ..., z_p, ..., z_P} being the corresponding topic assignments, where processor p stores x_p, the words from documents j = (p−1)M_P + 1, ..., pM_P, and z_p, the corresponding topic assignments. The topic-document counts N_dk are likewise distributed as N_dkp. The word-topic counts N_wk are also distributed, with each processor p keeping a separate local copy N_wkp.

Algorithm 2 Standard Collapsed Gibbs Sampling.
LDAGibbsItr(x_p, z_p, N_dkp, N_wkp, α, β):
1. For each document d ∈ {1, ..., M_P}
2.   For each distinct word i in document d:
3.     v ← x_dpi, T_dpi ← (number of occurrences of v in d)
4.     For each occurrence j ∈ {1, ..., T_dpi}:
5.       k̂ ← z_dpij
6.       N_dk̂p ← N_dk̂p − 1, N_vk̂p ← N_vk̂p − 1
7.       For k = 1 to K:
8.         ρ_k ← ρ_{k−1} + (N_dkp + α) (N_vkp + β) / (Σ_{w′} N_{w′kp} + Wβ)
9.       x ∼ UniformDistribution(0, ρ_K)
10.      k̂ ← BinarySearch(k̂ : ρ_{k̂−1} < x ≤ ρ_k̂)
11.      N_dk̂p ← N_dk̂p + 1, N_vk̂p ← N_vk̂p + 1
12.      z_dpij ← k̂

Although Gibbs sampling is a sequential process, given the typically large number of word tokens compared to the number of processors, the dependence of z_ij on the update of any other topic assignment z_i′j′ is likely to be weak, thus relaxing the sequential sampling constraint. If two processors are concurrently sampling, but with different words in different documents, then concurrent sampling will closely approximate sequential sampling, because the only term affecting the order of the update operations is the total word-topic count Σ_w N_wk. Algorithm 3 shows the pseudocode of the AD-LDA algorithm, which can terminate after a fixed number of iterations or based on a suitable MCMC convergence metric. The AD-LDA algorithm samples from an approximation to the posterior distribution by allowing different processors to concurrently sample topic assignments on their local subsets of the data. AD-LDA works well empirically and accelerates the topic modeling process.

Algorithm 3 Approximate Distributed LDA [NASW09].
Input: A list of M documents, x = {x_1, ..., x_p, ..., x_P}
Output: z = {z_1, ..., z_p, ..., z_P}
1. Repeat
2.   For each processor p in parallel do:
3.     Copy global counts: N_wkp ← N_wk
4.     Sample z_p locally: LDAGibbsItr(x_p, z_p, N_dkp, N_wkp, α, β) // Alg. 2
5.   Synchronize
6.   Update global counts: N_wk ← N_wk + Σ_p (N_wkp − N_wk)
7. Until the termination criterion is satisfied

3 Topic Extraction Methodology

3.1 Data Preprocessing

The dataset consists of tweets that were acquired from the Twitter servers by continuous querying through a wrapper for the Twitter API over a period of 24 hours. The batches of tweets are acquired in raw JSON² format. Various properties of each tweet, such as the hashtags, URLs, creation time, retweet and favorite counts, and other user information including the encoding and language, are extracted. The hashtags provide a good source of discriminating features, and they were folded as terms into the bag of words model of each tweet where they were present (without the '#' prefix). The URLs can also later provide a means to achieve topic summarization.

² JSON: JavaScript Object Notation, a text-based open standard designed for human-readable data interchange.
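As a small illustration of the hashtag folding just described, the following sketch turns one raw tweet into a bag of words (pure Python; the JSON layout mirrors Twitter's format, but the helper function and the example tweet are our own):

```python
import json
import re

def tweet_to_bag_of_words(tweet_json):
    """Build a bag of words from one raw tweet JSON string, folding
    hashtags in as plain terms (without the '#' prefix)."""
    tweet = json.loads(tweet_json)
    text = tweet.get("text", "").lower()
    # keep only runs of ASCII letters; stop words are deliberately retained
    tokens = re.findall(r"[a-z]+", text)
    hashtags = [h["text"].lower()
                for h in tweet.get("entities", {}).get("hashtags", [])]
    bag = {}
    for term in tokens + hashtags:
        bag[term] = bag.get(term, 0) + 1
    return bag

raw = ('{"text": "Bitcoin falls again #bitcoin #mtgox", '
       '"entities": {"hashtags": [{"text": "bitcoin"}, {"text": "mtgox"}]}}')
bag = tweet_to_bag_of_words(raw)
```

Note that a hashtag occurring in the text contributes both through tokenization and through the entities list, which simply gives tagged terms extra weight in the bag.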
3.2 Topic Extraction Stages

The technique assumes a real-time streaming data input, which is replicated using process calls to the storage records containing the tweets. For AD-LDA, each tweet is considered as a single document. Figure 1 shows the steps performed to extract the topics in each window or time slot. The procedure starts with the extraction of key information from the Twitter JSON; then the tweet text and other properties are used to extract topics. The topic extraction is performed using the following steps:

1. The documents are stripped of non-English characters and are converted to lowercase. The stop words are retained for their context information (especially for sentiment detection).

2. Groups of documents are assembled into windows based on their timestamps. A sliding window's width is equal to three consecutive time slots, ending in the current time slot.

3. The AD-LDA technique is run on a 20-node cluster. From each sliding window iteration, a total of 1000 topics are extracted; this higher value helps in extracting finer topics.

4. The topics are ranked based on the proportion of tweets assigned to each topic in the given window, and can then be clustered/merged together to organize them into more general topic groups.

Figure 2: Dendrogram depicting a few clusters' hierarchy of topics from the initial window. Agglomeration is based on the cosine similarity. Average-Linkage Agglomerative Hierarchical Clustering was used. Distance is computed as (1 − similarity). Refer to the electronic version of the paper for clarity.

Figure 3: Portion of the dendrogram depicting the clusters' hierarchy of topics from the initial window (Number 0). Agglomeration is based on the dot product between the topics' projections on a lower dimensional latent space extracted using NMF with kf = 30 factors. Average-Linkage Agglomerative Hierarchical Clustering was used. Distance is computed as (1 − similarity). Refer to the electronic version of the paper for clarity.

The jsoup open source HTML parser³ was used to extract multimedia content, such as images and metadata, from the URLs found in the tweets. The headlines are part of the metadata, while the keywords are obtained from the topic modeling itself, as the terms with the highest probability in the topic.

³ http://jsoup.org/
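The jsoup-based metadata extraction can be approximated with Python's standard-library `html.parser` (jsoup itself is a Java library; the example page, class name, and extracted fields below are our own illustrative choices):

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta> name/content pairs,
    roughly mirroring what jsoup is used for in the pipeline."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

page = ('<html><head><title>Ukraine crisis deepens</title>'
        '<meta name="keywords" content="ukraine,protest">'
        '</head><body></body></html>')
parser = MetaExtractor()
parser.feed(page)
```

The extracted title can then serve as the candidate headline, alongside the top topic terms as keywords.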
4 Topic Extraction with AD-LDA and Sentiment Labels

The AD-LDA technique with Gibbs sampling, along with automatically extracted sentiment labels, can also be used to extract polarity-sensitive topics. Using sentiment labels may improve the quality of the topics, as it results in finer topics. Figure 1 depicts the general flow of the methodology used. A weighted Naive Bayes classifier [LGD11], trained with labeled tweet samples⁴ and a set of labeled tokens⁵ with known sentiment polarity, can be used to extract the sentiment levels. The tweets are then regrouped based on the sentiment level, and topic modeling is applied on each group, resulting in topics that are confined to one sentiment, as illustrated in Table 2.

⁴ https://github.com/ravikiranj/twitter-sentiment-analyzer
⁵ https://github.com/linron84/JST

Table 2: Illustrating a sample of the finer topics extracted after a preliminary sentiment detection phase.

Positive Sentiment                                | Negative Sentiment
optimistic ukraine antiwar nonintervention        | horrible building badge hiding ukraine yanukovych
syria refugees about education children million   | syria yarmouk camp crisis food waiting unrest shocking
future technology bitcoins value law accelerating | cnn protocols loss gox bitcoin fault

5 Topic Agglomeration in a Latent Space

5.1 Discovering Latent Factors Among the Discovered Topics Using Non-negative Matrix Factorization (NMF)

Because the initial topic modeling generated a high number of topics (1000 topics per window) that were furthermore very sparse in terms of their descriptive terms, these topics were hard to interpret and could benefit from a coarser, less fragmented organization. One way to fix this problem was to merge the topics based on conceptual similarity, by applying Non-negative Matrix Factorization (NMF) [LS99]. Because the topics-by-words matrix is very sparse, we used NMF to project the topics onto a common lower-dimensional latent factors' space.

NMF takes as input the matrix X of n topics by m words (as binary features) and decomposes it into two factor matrices (A and B), which represent the topics and words, respectively, in a kf-dimensional latent space, as follows:

  X_{n×m} ≈ A_{n×kf} Bᵀ_{m×kf}   (2)

Algorithm 4 Basic Alternating Least Squares (ALS) Algorithm for NMF.
Input: Data matrix X, number of factors kf
Output: Optimal matrices A and B
1. Initialize matrix A (for example, randomly)
2. Repeat
   (a) Solve for B in the equation: AᵀAB = AᵀX
   (b) Project the solution onto the non-negative matrix subspace: set all negative values in B to zero
   (c) Solve for A in the equation: BBᵀAᵀ = BXᵀ
   (d) Project the solution onto the non-negative matrix subspace: set all negative values in A to zero
3. Until the decrease of the cost function falls below a threshold

Figure 4: Portion of the dendrogram depicting the clusters' hierarchy of topics from the first 6 windows. Agglomeration is based on the dot product between the topics' projections on a lower dimensional latent space extracted using NMF with kf = 30 factors. Average-Linkage Agglomerative Hierarchical Clustering was used. Distance is computed as (1 − similarity). Refer to the electronic version of the paper for clarity.
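Algorithm 4 can be sketched directly with NumPy (a minimal sketch under toy dimensions, not the kf = 30 used in the experiments; `np.linalg.lstsq` stands in for the two least-squares solves, and clipping performs the non-negativity projection):

```python
import numpy as np

def nmf_als(X, kf, n_iter=200, seed=0):
    """Basic ALS for NMF: X (n x m) ~= A @ B.T, with A (n x kf)
    and B (m x kf) kept non-negative by projection after each solve."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], kf))             # step 1: random init
    for _ in range(n_iter):
        # step 2a: solve A^T A B = A^T X, then project (step 2b)
        Bt = np.linalg.lstsq(A, X, rcond=None)[0]
        B = np.clip(Bt.T, 0, None)
        # step 2c: solve B B^T A^T = B X^T, then project (step 2d)
        At = np.linalg.lstsq(B, X.T, rcond=None)[0]
        A = np.clip(At.T, 0, None)
    return A, B

rng = np.random.default_rng(1)
X = rng.random((10, 6)) @ rng.random((6, 12))    # non-negative rank-6 toy matrix
A, B = nmf_als(X, kf=6)
err = np.linalg.norm(X - A @ B.T) / np.linalg.norm(X)
```

A fixed iteration count replaces the cost-decrease threshold of step 3 here, purely to keep the sketch short.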
Here, kf is the approximated rank of the matrices A and B, selected such that kf < min(m, n), so that the number of elements in the decomposition matrices is far smaller than the number of elements of the original matrix: n·kf + kf·m ≪ n·m.

The topics factor (A) can then be used to find the similarity between the topics in the new latent space, instead of in the original space of terms. The similarity matrix obtained from the NMF factors can finally be used to cluster the topics.

To find A and B, the Frobenius norm of the error between the data and the approximation is minimized, as follows:

  J_NMF = ||E||²_F = ||X − ABᵀ||²_F   (3)

Several algorithms have been proposed in the literature to minimize this cost. We used an Alternating Least Squares (ALS) method [PT94] that iteratively solves for the factors, by assuming that the problem is convex in either one of the factor matrices alone.

5.2 Topic Organization Stages: Topic Feature Extraction, Latent Space Computation using NMF, Latent Space-based Topic Similarity Computation, and Hierarchical Clustering

In the following, we summarize the steps that are applied after the discovery of the topics, in order to generate a hierarchical organization from the sparse topics.

1. Preprocessing of the topic vectors: For each window, the topic-word matrix (X_{n×m}) is extracted from the final topic modeling results. The features are the top words in a topic, and they are binary (1 if a topic contains the word in question and 0 otherwise).

2. Latent Factor Discovery using NMF: The topic-word data was normalized before running NMF. The latter produces two factors (A and B), where n, kf and m are the number of topics, latent factors, and words, respectively. Our main goal was to compute the matrix A, also called the topic basis factor, which transfers the topics to the latent space. Choosing the number of factors, kf, has an impact on the results. After trial and error, we chose kf = 30.

3. Generating the topic-similarity matrix in the latent space: The computed topic basis matrix (A) was used to obtain the similarity, in lieu of the original topic vectors. The normalized inner product of the matrix A and its transpose was calculated for this purpose. Normalizing the product is equivalent to computing the cosine similarity between topic pairs. The resulting matrix contains the pairwise similarity between each pair of topics within the latent space.

4. Hierarchical clustering of the latent space-projected topics, based on the new pairwise similarity scores computed in Step 3: We experimented with several linkage strategies, such as single and average linkage. The latter was chosen as optimal.
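Steps 3 and 4 above can be sketched as follows (a minimal NumPy sketch; the random topic basis and the naive average-linkage routine are our own illustrative stand-ins for the actual implementation):

```python
import numpy as np

def latent_cosine_similarity(A):
    """Normalized inner product of the topic basis A (n_topics x kf):
    row-normalizing A makes A @ A.T the pairwise cosine similarity."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    An = A / np.where(norms == 0, 1, norms)
    return An @ An.T

def average_linkage(similarity, n_clusters):
    """Plain average-linkage agglomerative clustering on a similarity
    matrix, using distance = 1 - similarity as in Figures 2-4."""
    dist = 1.0 - similarity
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise distance between the two clusters
                d = np.mean([dist[i, j]
                             for i in clusters[a] for j in clusters[b]])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

rng = np.random.default_rng(0)
A = rng.random((6, 3))            # 6 topics projected on kf = 3 latent factors
S = latent_cosine_similarity(A)
groups = average_linkage(S, n_clusters=2)
```

In practice the clustering is run to a full dendrogram rather than a fixed cluster count, which is what the figures visualize.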
5.3 Automated Hashtag Annotation

We have also experimented with a simple tag completion (prediction) step prior to topic modeling. The annotation for a given tweet is determined by finding the top frequent tags associated with the K_LS⁶ nearest neighboring tweets, in the NMF-computed Latent Space, to the given tweet. Once the tags are completed, they are used to enrich the tweets before topic modeling. Of course, only the tweets' bag of words descriptions in a given window are used to compute the NMF for that window's topic modeling. The annotation generally resulted in a lower perplexity of the extracted topic models, as shown in Figure 6.

⁶ We report results for K_LS = 5.

6 Results

6.1 Distributed LDA-based Topic Modeling

Figure 2 shows⁷ a sample of the topic clusters' hierarchy extracted from the initial window, without NMF-based latent space projection of the topics. The clusters are of debatable quality. Perplexity is a common metric to evaluate language models [BL06][BNJ03]. It is monotonically decreasing in the likelihood of the test data, with a lower perplexity score indicating better generalization performance. For a test set of T documents D′ = {w⁽¹⁾, ..., w⁽ᵀ⁾}, with N_d being the total number of keywords in the d-th document, the perplexity given in Equation 4 will be lower for a better topic model:

  perplexity(D′) = exp( − [ Σ_{d=1}^{T} ln p(w⁽ᵈ⁾ | α, β) ] / [ Σ_{d=1}^{T} N_d ] )   (4)

Figure 5 shows the perplexity trends, suggesting that more topics result in lower (thus better) perplexity. Also, irrespective of the number of topics, AD-LDA-based topic modeling can extract topics of good quality.

⁷ Refer to the electronic version of the paper for clarity.

Figure 5: Perplexity trends for each sliding window of width three, for various numbers of extracted topics.

6.2 Sentiment Based Topic Modeling

Table 2 shows a subset of topics extracted from the positive and negative sentiment groups of tweets; these tend to be more refined than the standard sentiment-agnostic topics. From the initial window, 1000 topics were extracted in the same way as with Distributed LDA; however, topic modeling was preceded by a sentiment classifier that classifies the tweets based on their sentiment (positive or negative). Although positive and negative topics still share a few keywords, they are clearly divided by sentiment.

6.3 Topic Clustering in the Latent Space

Figure 3 shows the topic clusters created using the latent space-projected features extracted using NMF. The clusters in Figure 3 seem to have better quality compared to the clusters in Figure 2, because of the more accurate capture of pairwise similarities between topics in the conceptual space. Figure 4 shows the clustering of the top 10 topics for a series of 6 windows, showing how the agglomeration can consolidate the topics discovered at different time slots, helping avoid excessive fragmentation throughout the stream's life.

6.4 Automated Hashtag Annotation

Tweet data is very sparse, and not every tweet has valuable tags. To overcome this weakness, we applied an NMF-based automated tweet annotation before topic modeling. Adding the predicted hashtags to the tweets enhanced the topic modeling. The automated tag annotation, described in Section 5.3, generally resulted in a lower perplexity of the extracted topic models, as shown in Figure 6, suggesting that the auto-completed tags did help fill in some missing and valuable information in the sparse tweet data, thus helping the topic modeling.

Figure 6: Perplexity for different numbers of topics and varying window lengths, showing improved results when NMF-based automated tweet annotation is performed before topic modeling.

7 Conclusion

Using Distributed LDA topic modeling, followed by NMF and hierarchical clustering within the resulting Latent Space (LS), helped organize the topics into less fragmented themes. Sentiment detection prior to topic modeling and automated hashtag annotation helped improve the learned topic models, while the agglomeration of topics across several time windows can link the topics discovered at different time windows. Our focus was on topic modeling and organization using the simplest (bag of words) features. Specialized Twitter feature extraction and selection methods, such as the ones surveyed and proposed by Aiello et al. [APM+13], have the potential to improve our results, a direction we will explore in the future. Another direction to explore is the news domain-specific, user-centered approach discussed in [SNT+14], and a more expanded use of automated annotation to support topic extraction and description.

8 Acknowledgements

We would like to thank the organizers of the SNOW 2014 workshop, in particular the members of the SocialSensor team, for their leadership in all the phases of the competition.

References

[AN12] Artur Abdullin and Olfa Nasraoui. Clustering heterogeneous data sets. In Web Congress (LA-WEB), 2012 Eighth Latin American, pages 1–8. IEEE, 2012.

[APM+13] Luca Maria Aiello, Georgios Petkos, Carlos Martin, David Corney, Symeon Papadopoulos, Ryan Skraba, Ayse Goker, Ioannis Kompatsiaris, and Alejandro Jaimes. Sensing trending topics in Twitter. IEEE Transactions on Multimedia, 2013.

[BL06] David Blei and John Lafferty. Correlated topic models. Advances in Neural Information Processing Systems, 18:147, 2006.

[BM10] David M Blei and Jon D McAuliffe. Supervised topic models. arXiv preprint arXiv:1003.0783, 2010.

[BNJ03] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022, 2003.

[CBGN12] Juan C Caicedo, Jaafar BenAbdallah, Fabio A González, and Olfa Nasraoui. Multimodal representation, indexing, automated annotation and retrieval of image collections via non-negative matrix factorization. Neurocomputing, 76(1):50–60, 2012.

[HBB10] Matthew Hoffman, David M Blei, and Francis Bach. Online learning for latent Dirichlet allocation. Advances in Neural Information Processing Systems, 23:856–864, 2010.

[HN12] Basheer Hawwash and Olfa Nasraoui. Stream-dashboard: a framework for mining, tracking and validating clusters in a data stream. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pages 109–117. ACM, 2012.

[LGD11] Chang-Hwan Lee, Fernando Gutierrez, and Dejing Dou. Calculating feature weights in naive Bayes with Kullback-Leibler measure. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 1146–1151. IEEE, 2011.

[LH09] Chenghua Lin and Yulan He. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 375–384. ACM, 2009.

[LHAY07] Yang Liu, Xiangji Huang, Aijun An, and Xiaohui Yu. ARSA: a sentiment-aware model for predicting sales performance using blogs. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 607–614. ACM, 2007.

[LS99] Daniel D Lee and H Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.

[LZ08] Yue Lu and Chengxiang Zhai. Opinion integration through semi-supervised topic modeling. In Proceedings of the 17th International Conference on World Wide Web, pages 121–130. ACM, 2008.

[McC] Mallet: A machine learning for language toolkit. http://www.cs.umass.edu/~mccallum/mallet.

[NASW09] David Newman, Arthur Asuncion, Padhraic Smyth, and Max Welling. Distributed algorithms for topic models. The Journal of Machine Learning Research, 10:1801–1828, 2009.

[PCA14] Symeon Papadopoulos, David Corney, and Luca Maria Aiello. SNOW 2014 data challenge: Assessing the performance of news topic detection methods in social media. In Proceedings of the SNOW 2014 Data Challenge, 2014.

[PT94] Pentti Paatero and Unto Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111–126, 1994.

[SNT+14] S. Schifferes, N. Newman, N. Thurman, D. Corney, A.S. Goker, and C. Martin. Identifying and verifying news through social media: Developing a user-centered tool for professional journalists. Digital Journalism, 2014.

[TM08] Ivan Titov and Ryan McDonald. Modeling online reviews with multi-grain topic models. In Proceedings of the 17th International Conference on World Wide Web, pages 111–120. ACM, 2008.

[WWC05] Janyce Wiebe, Theresa Wilson, and Claire Cardie. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210, 2005.

[YMM09] Limin Yao, David Mimno, and Andrew McCallum. Efficient methods for topic model inference on streaming document collections. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 937–946. ACM, 2009.

[ZBG13] Ke Zhai and Jordan Boyd-Graber. Online topic models with infinite vocabulary. In International Conference on Machine Learning, 2013.