Liver CT Annotation via Generalized Coupled Tensor Factorization

Beyza Ermis and A. Taylan Cemgil
Boğaziçi University, 34342, Bebek, İstanbul, Turkey
beyza.ermis@boun.edu.tr, taylan.cemgil@boun.edu.tr

Abstract. This study deals with the missing answers prediction problem. We address this problem through coupled analysis of the ImageCLEF2014 dataset, representing it as heterogeneous data, i.e., a collection of matrices. We propose an approach based on a probabilistic interpretation of tensor factorization models, namely Generalized Coupled Tensor Factorization, which can simultaneously fit a large class of matrix/tensor models to higher-order matrices/tensors with common latent factors, using different loss functions. Numerical experiments demonstrate that joint analysis of data from multiple sources via coupled factorization gives high prediction performance.

Keywords: matrix factorization, coupled analysis, missing value prediction

1 Introduction

In this study, the task is to predict user-expressed (UsE) features by using computer-generated (CoG) features. UsE features are answers to predefined questions related to the liver, vessels, and lesions, whereas CoG features are derived from the image itself. Our aim is to fill in a pre-prepared form and thereby provide computer-aided automatic annotation of liver CT volumes using image features [1, 2].

For answer prediction, we consider both KL-divergence and Euclidean distance-based cost functions, as well as coupled matrix factorization models [3–5], within the GCTF framework [6].

2 Methodology

In this study, we use the Generalized Coupled Tensor Factorization (GCTF) framework [6] for coupled factorization of several tensors and matrices, in order to fill in the missing entries of the observed data. A generalized tensor factorization problem is specified by an observed tensor $X$ with possibly missing entries and a collection of latent tensors to be estimated, $Z_{1:|\alpha|} = \{Z_\alpha\}$ for $\alpha = 1, \dots, |\alpha|$.

The GCTF framework generalizes Probabilistic Latent Tensor Factorization (PLTF) [7] to coupled factorizations. PLTF appeared as an extension of the matrix factorization model [3]; it enables one to incorporate domain-specific information into any arbitrary factorization model and provides the update rules for multiplicative expectation-maximization (EM) algorithms. In this framework, the goal is to compute an approximate factorization of a given higher-order tensor, i.e., a multiway array, $X$ in terms of a product of individual factors $Z_\alpha$:

\[
X(v_0) \approx \hat{X}(v_0) = \sum_{\bar{v}_0} \prod_{\alpha} Z_\alpha(v_\alpha), \tag{1}
\]

where some of the factors are possibly fixed. Here, we define $v$ as the set of all indices in a model, $v_0$ as the set of visible indices, $v_\alpha$ as the set of indices in $Z_\alpha$, $\bar{v}_\alpha = v - v_\alpha$ as the set of all indices not in $Z_\alpha$, and $\alpha = 1, \dots, K$ as the factor index. Since the product $\prod_\alpha Z_\alpha(v_\alpha)$ is collapsed over a set of indices, the factorization is latent. The approximation problem is cast as an optimization problem in which we minimize the divergence $d(X, \hat{X})$ between the observed data $X$ and the model prediction $\hat{X}$. In applications, $d$ is typically taken as the Euclidean (EUC), Kullback-Leibler (KL), or Itakura-Saito (IS) divergence [7].
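As a concrete illustration (ours, not from the paper; the function and array names are hypothetical), the following minimal NumPy sketch evaluates the matrix case of the model prediction (1) and computes the masked EUC and KL divergences over the observed entries only:

```python
import numpy as np

def model_prediction(Z1, Z2):
    # Matrix case of Eq. (1): X_hat(i, j) = sum_r Z1(i, r) * Z2(j, r)
    return np.einsum('ir,jr->ij', Z1, Z2)

def masked_divergence(X, X_hat, M, kind='kl', eps=1e-12):
    # Divergence d(X, X_hat), accumulated only where the mask M is 1
    if kind == 'euc':
        return 0.5 * np.sum(M * (X - X_hat) ** 2)
    if kind == 'kl':
        # generalized KL divergence, defined for non-negative data
        return np.sum(M * (X * np.log((X + eps) / (X_hat + eps)) - X + X_hat))
    raise ValueError(f'unknown divergence: {kind}')
```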
Here, we have two-dimensional datasets, so we use the non-negative variant of the matrix factorization model, which can be defined in PLTF notation as follows. Given a matrix $X$, its factorization model is

\[
X(i, j) \approx \hat{X}(i, j) = \sum_{r} Z_1(i, r) Z_2(j, r), \tag{2}
\]

with index sets $V = \{i, j, r\}$, $V_0 = \{i, j\}$, $V_1 = \{i, r\}$ and $V_2 = \{j, r\}$. The update equation for non-negative generalized matrix/tensor factorization is

\[
Z_\alpha \leftarrow Z_\alpha \circ \frac{\Delta_\alpha(M \circ \hat{X}^{-p} \circ X)}{\Delta_\alpha(M \circ \hat{X}^{1-p})} \quad \text{s.t. } Z_\alpha(v_\alpha) > 0, \tag{3}
\]

where $\circ$ is the Hadamard (element-wise) product and $M$ is a 0–1 mask array with $M(v_0) = 1$ if $X(v_0)$ is observed and $M(v_0) = 0$ if it is missing. Here $p$ determines the cost function: $p = \{0, 1, 2\}$ corresponds to the $\beta$-divergence [5] that unifies the EUC, KL, and IS cost functions, respectively. In this iteration, we define the tensor-valued function $\Delta_\alpha(A)$ as

\[
\Delta_\alpha(A) = \sum_{\bar{v}_\alpha} A(v_0) \prod_{\alpha' \neq \alpha} Z_{\alpha'}(v_{\alpha'}). \tag{4}
\]

$\Delta_\alpha(A)$ is an object of the same size as $Z_\alpha$, obtained simply by multiplying all factors other than the one being updated with an object of the order of the data. Hence the key observation is that the $\Delta_\alpha$ function just computes a tensor product and collapses it over the indices not appearing in $Z_\alpha$, which is algebraically equivalent to computing a marginal sum. The update rule (3) can be applied iteratively to all non-negative $Z_\alpha$ and converges to a local minimum, provided we start from non-negative initial values. For updating $Z_\alpha$, we need to compute the $\Delta$ function twice, with arguments $A = M \circ \hat{X}^{-p} \circ X$ and $A = M \circ \hat{X}^{1-p}$. It is easy to verify that the update equations for the KL-NMF (non-negative matrix factorization) problem are obtained as a special case of (3) with $p = 1$.

The Generalized Coupled Tensor Factorization (GCTF) [6] model takes the PLTF model one step further: in this case, we have multiple observed tensors $X_\nu$ that are factorized simultaneously:

\[
X_\nu(v_{0,\nu}) \approx \hat{X}_\nu(v_{0,\nu}) = \sum_{\bar{v}_{0,\nu}} \prod_{\alpha} Z_\alpha(v_\alpha)^{R^{\nu,\alpha}}, \tag{5}
\]

\[
R^{\nu,\alpha} = \begin{cases} 1 & \text{if } X_\nu \text{ and } Z_\alpha \text{ are connected,} \\ 0 & \text{otherwise,} \end{cases} \tag{6}
\]

where $\nu = 1, \dots, |\nu|$ and $R$ is the coupling matrix defined in (6). Note that, in contrast to the PLTF model, there are multiple visible index sets $v_{0,\nu}$ in the GCTF model, each specifying the attributes of the observed tensor $X_\nu$. Inference, i.e., estimation of the shared latent factors $Z_\alpha$, can be achieved via iterative optimization (see [6]). For non-negative data and factors, one obtains the following compact fixed-point equation, where each $Z_\alpha$ is updated in an alternating fashion while fixing the other factors $Z_{\alpha'}$, $\alpha' \neq \alpha$:

\[
Z_\alpha \leftarrow Z_\alpha \circ \frac{\sum_\nu R^{\nu,\alpha} \, \Delta_{\alpha,\nu}(M_\nu \circ \hat{X}_\nu^{-p} \circ X_\nu)}{\sum_\nu R^{\nu,\alpha} \, \Delta_{\alpha,\nu}(M_\nu \circ \hat{X}_\nu^{1-p})}. \tag{7}
\]

Here $p$, as in (3), determines the cost function (see Table 1). In this iteration, the key quantity is the $\Delta_{\alpha,\nu}$ function, defined as

\[
\Delta_{\alpha,\nu}(A) = \sum_{v_{0,\nu} \cap \bar{v}_\alpha} A(v_{0,\nu}) \sum_{\bar{v}_{0,\nu} \cap \bar{v}_\alpha} \prod_{\alpha' \neq \alpha} Z_{\alpha'}(v_{\alpha'})^{R^{\nu,\alpha'}}. \tag{8}
\]

Table 1: Update rules for different values of $p$.

p   Cost Function           Multiplicative Update Rule
0   Euclidean (EUC)         $Z_\alpha \leftarrow Z_\alpha \circ \frac{\sum_\nu R^{\nu,\alpha}\,\Delta_{\alpha,\nu}(M_\nu \circ X_\nu)}{\sum_\nu R^{\nu,\alpha}\,\Delta_{\alpha,\nu}(M_\nu \circ \hat{X}_\nu)}$
1   Kullback-Leibler (KL)   $Z_\alpha \leftarrow Z_\alpha \circ \frac{\sum_\nu R^{\nu,\alpha}\,\Delta_{\alpha,\nu}(M_\nu \circ \hat{X}_\nu^{-1} \circ X_\nu)}{\sum_\nu R^{\nu,\alpha}\,\Delta_{\alpha,\nu}(M_\nu)}$
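To make the coupled update (7) concrete, here is a minimal NumPy sketch (our own illustration, not the authors' implementation) of the $p = 1$ (KL) row of Table 1 for two matrices $X_1 \approx Z_1 Z_2$ and $X_2 \approx Z_1 Z_3$ that share the factor $Z_1$; in this setting the $\Delta_{\alpha,\nu}$ computations reduce to masked matrix products. A factor can also be held fixed, as (1) allows, simply by skipping its update; on our reading, that is what happens in Section 3, where the shared factor is built from the CoG features.

```python
import numpy as np

def coupled_kl_nmf(X1, M1, X2, M2, Z1, Z2, Z3, n_iter=200, eps=1e-12):
    """Multiplicative KL updates (p = 1 in Eq. (7)) for the coupled model
    X1 ~ Z1 @ Z2, X2 ~ Z1 @ Z3 with shared factor Z1.
    M1, M2 are 0-1 masks over the observed entries."""
    for _ in range(n_iter):
        X1h = Z1 @ Z2 + eps
        X2h = Z1 @ Z3 + eps
        # Z1 is connected to both observations, so its Delta terms are summed
        num = (M1 * X1 / X1h) @ Z2.T + (M2 * X2 / X2h) @ Z3.T
        den = M1 @ Z2.T + M2 @ Z3.T + eps
        Z1 = Z1 * (num / den)
        # Z2 and Z3 each touch a single observation
        X1h = Z1 @ Z2 + eps
        Z2 = Z2 * (Z1.T @ (M1 * X1 / X1h)) / (Z1.T @ M1 + eps)
        X2h = Z1 @ Z3 + eps
        Z3 = Z3 * (Z1.T @ (M2 * X2 / X2h)) / (Z1.T @ M2 + eps)
    return Z1, Z2, Z3
```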
3 Experiments and Results

3.1 Experimental Setting

In this section, we address the task of predicting the missing answers, i.e., the user-expressed (UsE) features. The UsE features are provided by a radiologist and represented by a 73 × 6 cell array; in our experiments, we used the column of that array holding the annotation values. For the experiments, we chose a set of questions and separated them into two groups. In the first group, we used the question set S1 = {3, 5, 7, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28}, and in the second group we used the question set S2 = {29, 30, 31, 38, 39, 40, 41, 44, 48, 60, 66, 68, 71}. The questions in the first group have answers in {0, …, 3}, and the questions in the second group have binary answers in {0, 1}.

The ImageCLEF2014 liver CT annotation task [2] provides 50 Matlab data files for training and 10 Matlab data files for testing, each containing the CoG and UsE features of one sample. From these samples and questions we generate the observation (sample × question) matrices $X_1$ and $X_2$, of size 60 × 21 and 60 × 13, respectively. In $X_1$, an entry $X_1(i, k)$ is the value of answer $k$ for sample $i$; likewise, in $X_2$, an entry $X_2(i, p)$ is the value of answer $p$ for sample $i$.

Fig. 1: General sketch of the proposed model. The blocks visualize the matrices defined in the model ($X_1 \approx Z_1 \times Z_2$ on sample × ques1, $X_2 \approx Z_1 \times Z_3$ on sample × ques2, with $Z_1$ on sample × feature) and the relations between them.

To predict the UsE features, we use the computer-generated (CoG) features. These are image-based features generated by an interactive segmentation software and stored in a 60 × 4 cell array. As with the UsE features, we used the column holding the feature values. From these, we generate the sample × feature matrix $Z_1$ of size 60 × 447, where an entry $Z_1(i, j)$ is the value of feature $j$ for sample $i$.

Finally, we introduce the feature × question matrices $Z_2$, of size 447 × 21, and $Z_3$, of size 447 × 13, in order to complete the missing answers in $X_1$ and $X_2$, respectively. The first 50 rows of $X_1$ and $X_2$ are observed and used for training, while rows 51 to 60 are unobserved and used for testing. In order to fill in the missing entries of $X_1$ and $X_2$, we applied the coupled approach with a CP-style factorization model, analysing $X_1$ jointly with $X_2$. The two matrices share the sample mode, which gives the following model (see Figure 1 for a visualization of its general structure):

\[
\hat{X}_1(i, k) = \sum_j Z_1(i, j) Z_2(j, k), \qquad \hat{X}_2(i, p) = \sum_j Z_1(i, j) Z_3(j, p).
\]

After solving this model and obtaining the values of the latent factors, we construct the estimated matrices for $X_1$ and $X_2$. Our goal is to predict ordinal answer values on a scale $\{1, \dots, R\}$. We can reduce this prediction task to a set of binary threshold problems, namely predicting $r \geq 1, r \geq 2, \dots, r \geq R$. We try 1000 threshold values between 0 and 1, compute the prediction accuracy for each value, and choose the threshold that gives the best accuracy.

Threshold selection is not easy when the questions are of different types. For example, the datasets contain several questions whose answers take values unlike those of all other questions, so a separate threshold would have to be found for each such question. In this study, we ignore these questions with atypical answers; this is precisely why we chose the two question sets above, since grouping questions with the same type of answers makes this stage easier.
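The paper does not spell out the thresholding procedure in code; the sketch below is one reading of it, assuming each binary subproblem $r \geq 1, \dots, r \geq R$ is thresholded against the known answers of the 50 training rows (function and variable names are ours):

```python
import numpy as np

def best_threshold(scores, labels, n_grid=1000):
    """Grid-search the binarization threshold on the training rows.

    scores: real-valued predictions from the reconstructed matrix X_hat
    labels: corresponding known 0/1 answers (one binary subproblem r >= t)
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    accs = [np.mean((scores >= t).astype(int) == labels) for t in grid]
    best = int(np.argmax(accs))
    return grid[best], accs[best]
```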
3.2 Results

Evaluation: Completeness and accuracy of the prediction are the two main evaluation criteria of this study. The predicted annotations are compared to the manual annotations of the test dataset. To measure completeness, the number of predicted features is divided by the total number of features; to measure accuracy, the number of correctly predicted features is divided by the number of predicted features. Finally, the total score is given in terms of the ratio of correct answers in the separate groups.
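As a sketch of the scoring (ours, not the official evaluation script), the three quantities can be computed as follows; note that the totals reported in Tables 2 and 3 below are consistent with the geometric mean of completeness and accuracy, which is what this sketch uses:

```python
import numpy as np

def evaluation_scores(pred, truth, answered):
    """Completeness, accuracy, and total score for the annotation task.

    pred:     predicted answers (same shape as truth)
    truth:    manual annotations of the test set
    answered: boolean mask marking the features the system actually predicted
    """
    completeness = answered.sum() / truth.size              # predicted / total
    accuracy = (pred[answered] == truth[answered]).mean()   # correct / predicted
    total = np.sqrt(completeness * accuracy)                # matches Tables 2-3
    return completeness, accuracy, total
```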
Table 2: Completeness and accuracy results of the previous method (KL divergence, manual threshold).

Cost Function           Completeness   Accuracy   Total Score
Kullback-Leibler (KL)   0.51           0.39       0.45

In our previous run, we used the KL-divergence to estimate the factors and non-optimal threshold values to map the real-valued predictions to discrete predictions: we chose a single threshold value manually and predicted all answers using that value. This yielded the results given in Table 2.

We then applied the thresholding method described in the Experimental Setting section, which determines the optimal threshold value, and reran the experiments using both the Euclidean distance and the KL-divergence. This gave the considerably more accurate results reported in Table 3. We can clearly see that the new method outperforms the previous one and that the selection of the threshold significantly affects the prediction accuracy.

Table 3: Completeness and accuracy results of our method with EUC distance and KL divergence.

Cost Function           Completeness   Accuracy   Total Score
Euclidean (EUC)         0.5151         0.8911     0.6775
Kullback-Leibler (KL)   0.5151         0.8882     0.6764

4 Conclusion

In this study, we have addressed the answer prediction problem using coupled analysis of the ImageCLEF2014 data [2], represented as datasets in the form of matrices. The problem is formulated as the simultaneous factorization of higher-order tensors/matrices, extracting common latent factors from the shared modes. We have used the Generalized Coupled Tensor Factorization framework, which enables us to develop coupled models for the joint analysis of multiple datasets in a compact way, using various models and cost functions. In our coupled analysis, we have studied both KL-divergence and Euclidean distance-based cost functions.

Here, we selected a subset of the questions and predicted the values of their answers using the Euclidean distance and the KL-divergence. We plan to extend our study to use different losses, such as the logistic loss or the hinge loss, and to predict the answers of all questions in the dataset, including those whose answers take a wider variety of values.

References

1. Marvasti, N., Kökciyan, N., Türkay, R., Yazıcı, A., Yolum, P., Üsküdarlı, S., Acar, B.: ImageCLEF liver CT image annotation task 2014. In: CLEF 2014 Evaluation Labs and Workshop, Online Working Notes (2014)
2. Caputo, B., Müller, H., Martinez-Gomez, J., Villegas, M., Acar, B., Patricia, N., Marvasti, N., Üsküdarlı, S., Paredes, R., Cazorla, M., Garcia-Varea, I., Morell, V.: ImageCLEF 2014: Overview and analysis of the results. In: CLEF Proceedings. Lecture Notes in Computer Science. Springer Berlin Heidelberg (2014)
3. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401 (1999) 788–791
4. Cemgil, A.T.: Bayesian inference in non-negative matrix factorisation models. Computational Intelligence and Neuroscience, Article ID 785152 (2009)
5. Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.: Nonnegative Matrix and Tensor Factorizations. Wiley (2009)
6. Yilmaz, Y.K., Cemgil, A.T., Simsekli, U.: Generalised coupled tensor factorisation. In: NIPS (2011)
7. Yilmaz, Y.K., Cemgil, A.T.: Probabilistic latent tensor factorization. In: LVA/ICA (2010) 346–353