=Paper=
{{Paper
|id=Vol-2125/paper_82
|storemode=property
|title=ImageCLEF 2018: Semantic Descriptors for Tuberculosis CT Image Classification
|pdfUrl=https://ceur-ws.org/Vol-2125/paper_82.pdf
|volume=Vol-2125
|authors=Abdelkader Hamadi,Djamel Eddine Yagoub
|dblpUrl=https://dblp.org/rec/conf/clef/HamadiY18
}}
==ImageCLEF 2018: Semantic Descriptors for Tuberculosis CT Image Classification==
Abdelkader HAMADI[0000−0001−9990−332X] and Djamel Eddine YAGOUB
University of Abdelhamid Ibn Badis Mostaganem
Faculty of Exact Sciences and Computer Science
Mathematics and Computer Science Department
Mostaganem, Algeria
abdelkader.hamadi@univ-mosta.dz
djamel.ed.y@gmail.com
Abstract. In this article, we present the methodologies used in our
participation in the two sub-tasks of the ImageCLEF 2018 Tuberculosis
Task (the TBT and SVR tasks). We proposed to extract a single semantic
descriptor from the 3D CT image to describe each patient rather than using
all of his slices as separate samples. In the TBT task, the resulting descriptors
are then exploited in a second learning stage to identify the type of
tuberculosis among five given classes. In the SVR task, the same experimental
design is used to predict the degree of severity of the disease. We reached
a Kappa coefficient of about 0.0629 in the TBT sub-task, and our best
run on SVR was ranked 12th out of 36 submissions and 5th out of 7
participating teams. We believe that our approach could give better results if
applied properly.
Keywords: ImageCLEF · Tuberculosis Task · Deep Learning · CT Image · Tuberculosis CT Image Classification · Tuberculosis Severity Scoring.
1 Introduction
Tuberculosis is an infectious disease caused by the bacterium Mycobacterium
tuberculosis. With a high mortality rate worldwide, this disease
remained one of the top ten causes of death in the world in 2015. Diagnosing
this sickness quickly and accurately is a vital goal that would limit its spread
and damage. One of the major problems of this disease is that traditional tests
produce inaccurate results or take too long. For these reasons, researchers have been
interested in its diagnosis, particularly in the context of the international
challenges ImageCLEF 2017 [3] and ImageCLEF 2018 [9], where two tasks
(three tasks in ImageCLEF 2018) have been reserved for it. The first aims to
detect the multi-drug resistant (MDR) status of patients. The goal of the second
task is to identify the type of tuberculosis. A third task was introduced
in ImageCLEF 2018 [5], which consists in predicting the degree of severity of the
patient's case. In all three tasks, the predictions are based on 3D CT scan
images. Algorithms involving deep learning have been tested to diagnose the
presence or absence of tuberculosis. The results obtained were interesting.
However, they must be improved for better control and effective diagnosis, helping
doctors to make decisions and to choose the necessary treatments at the
right time.
We can summarize the objectives of the Tuberculosis task through the fol-
lowing points:
– Helping medical doctors in the diagnosis of drug-resistant TB and TB type
identification through image processing techniques;
– Introducing work towards inexpensive and quick methods for early detection
of the MDR status and TB types in patients;
– Predicting quickly the type of TB and its severity degree to help doctors
make quick decisions and give effective treatments.
In the following, we present the work carried out in the context of our
participation in the two sub-tasks of the ImageCLEF 2018 Tuberculosis Task:
TuBerculosis Types classification (TBT) and Tuberculosis Severity Scoring (SVR).
The remainder of this article is organized as follows. Section 2 describes the
two tasks in which we participated. In section 3, we present our contribution
by detailing the system deployed to produce our submissions. Section 4 details
the experimental protocols followed to generate our predictions; we detail and
analyze the results obtained in the same section. We conclude in the last section
by presenting our perspectives and future works.
2 Participation in ImageCLEF 2018
2.1 Tasks description
In this paper, we focus on our participation in the TBT and the SVR sub-tasks
that we describe in the following sections.
In both tasks the data is provided as 3D CT scans. For some patients several
3D CT scans are given, while for others only one is provided. All the
CT images are stored in the NIfTI file format with the .nii.gz file extension (g-zipped
.nii files). For each of the three dimensions of the CT image, the number of
slices varies from about 50 to 400. Each slice has a size of about 512×512 pixels.
A training collection is provided at the beginning of the task with its ground-
truth (labels of samples). Participants prepare and train their systems on this
dataset. A test collection is provided at a later date. Participants interrogate
their system and return their predictions to the organizers’ committee. An eval-
uation is performed by the latter to compare the performance of the systems.
The TBT task consists of the automatic categorization of TB cases into 5 target
classes based on CT scans of patients. The five types considered are:
1. Infiltrative
2. Focal
3. Tuberculoma
4. Miliary
5. Fibro-cavernous
The results are evaluated using unweighted Cohen's Kappa and accuracy.
The SVR task aims to predict the degree of severity of TB cases. Given a TB
patient, the main goal is to predict a severity score based on the patient's 3D CT scan.
The degree of severity is modeled as 5 discrete values: from 1 ("critical/very bad")
to 5 ("very good"). The score is simplified so that values
1, 2 and 3 correspond to a "high severity" class, and values 4 and 5 correspond to
"low severity".
The classification problem is evaluated using ROC curves (AUC) produced
from the probabilities provided by the participants. For the regression problem,
the root mean square error (RMSE) is used.
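For illustration, the evaluation measures of the two tasks can be computed with scikit-learn; the toy labels below are made up for the sketch and are not task data.

```python
# Illustrative computation of the official measures (toy labels, not task data).
import numpy as np
from sklearn.metrics import (cohen_kappa_score, accuracy_score,
                             roc_auc_score, mean_squared_error)

# TBT: unweighted Cohen's Kappa and accuracy over the 5 TB types
y_true_tbt = [1, 2, 3, 4, 5, 1, 2]
y_pred_tbt = [1, 2, 3, 5, 5, 2, 2]
kappa = cohen_kappa_score(y_true_tbt, y_pred_tbt)
acc = accuracy_score(y_true_tbt, y_pred_tbt)

# SVR, classification view: AUC from the probability of "high severity"
y_true_bin = [1, 1, 0, 0, 1]           # 1 = high severity
p_high = [0.9, 0.6, 0.3, 0.4, 0.7]
auc = roc_auc_score(y_true_bin, p_high)

# SVR, regression view: RMSE over severity scores 1..5
y_true_sev = [1, 3, 4, 5, 2]
y_pred_sev = [2, 3, 4, 4, 2]
rmse = np.sqrt(mean_squared_error(y_true_sev, y_pred_sev))
```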
3 Our contribution
We proposed to extract semantic descriptors from 3D CT scans. We noticed that
participants of the ImageCLEF TBT 2017 task used each extracted slice as a
separate sample. Thus, hundreds of slices are considered as separate learning
samples although these slices represent the same patient. This introduces a lot of
noise. In addition, each slice is assigned the label of the patient (its type),
even those whose content does not present any information to identify the type
of TB case. This introduces more noise. The majority of the participants [11] of
ImageCLEF 2017 highlighted this problem and its impact on the results.
To overcome this problem, we believe that the simplest solution is to pro-
duce a single descriptor for each patient. This constitutes the key idea of our
contribution.
Our proposed system goes through three main stages:
1. Input data pre-processing
2. Features extraction
3. Learning a classification model
We will detail each step in the following.
3.1 Input data pre-processing
Recall that in both tasks, 3D CT scans are provided in compressed NIfTI
format. Firstly, we decompress the files and extract the slices. At the end, we
have three sets of slices corresponding to the three dimensions of the 3D image.
For each dimension and for each NIfTI image we obtain between 50 and 400
slices, saved as JPEG images.
The visual content of the images extracted from the different dimensions is
not similar. Indeed, the images of each dimension are taken from a different
angle of view. We noticed from our experiments that the slices of the Y
dimension give better results compared to the two others (X and Z). However,
the following steps can be applied to slices of any of the three dimensions.
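The slice-extraction step above can be sketched as follows. This is an illustrative sketch, not the authors' pipeline (they used the med2image tool): it assumes nibabel is available for loading, and the min-max intensity rescaling to 8 bits is our assumption.

```python
# Sketch: extract the slices of one dimension of a 3D CT volume as 8-bit images.
# The volume would be loaded with nibabel (assumption; filename is a placeholder):
#   volume = nib.load("patient.nii.gz").get_fdata()
import numpy as np

def extract_dim_slices(volume, dim=1):
    """Return the slices along one dimension (0=X, 1=Y, 2=Z) as uint8 arrays."""
    slices = []
    for i in range(volume.shape[dim]):
        sl = np.take(volume, i, axis=dim).astype(np.float64)
        rng = sl.max() - sl.min()
        # Min-max rescale to 0-255 so the slice can be saved as an 8-bit JPEG
        sl = (255 * (sl - sl.min()) / rng) if rng else np.zeros_like(sl)
        slices.append(sl.astype(np.uint8))
    return slices

# Each slice could then be written out, e.g. with Pillow:
#   Image.fromarray(sl).save(f"slice_{i:03d}.jpg")
```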
Fig. 1. Pre-processing of input data: a 3D CT scan in NIfTI format (.nii.gz) is converted to JPEG slices along the three dimensions (X, Y, Z); one dimension (Y) is selected, and filtering keeps 60 slices per CT image / patient.
On the other hand, not all slices necessarily contain relevant information
that can be useful to identify types of TB. This is why it is essential to filter
the slices, keeping only those that may be informative and contain relevant
information. Moreover, since we want to extract a single descriptor per patient,
it is essential to keep the same number of slices for each patient. We found that
there are usually at most 60 visually informative slices. Since the slices are
ordered, the 60 most informative are usually at the center of the list. We therefore
propose to keep the 60 middle slices. This is not optimal, but we opted for this choice
to obtain a fully automatic approach. This choice could be improved by performing
manual filtering with the intervention of a human expert, preferably with medical
expertise in TB. Figure 1 summarizes the process.
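The middle-slice filtering described above can be sketched as follows (keep=60 is the count used in the paper; the helper name is ours):

```python
# Sketch of the fully automatic filtering: keep the 60 central slices
# of the ordered slice list for each patient.
def middle_slices(slices, keep=60):
    """Return the `keep` central elements of an ordered slice list."""
    if len(slices) <= keep:
        return list(slices)
    start = (len(slices) - keep) // 2
    return list(slices[start:start + keep])
```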
3.2 Features extraction
After slice extraction and filtering, we propose to extract a single descriptor per
patient. Transfer learning offers an interesting approach in this context.
The results of SGEast [11], and of other teams in the same
task of ImageCLEF 2017, proved the efficiency of this approach [4, 11]. Indeed,
SGEast opted for transfer learning, exploiting the output of a layer of a
ResNet-50 [8] deep learner. However, this idea raises the problem of the size of the
resulting descriptor. Moreover, SGEast considered a descriptor per
slice and not per patient. Since we want a single descriptor per patient,
the information extracted from each slice must not be very
large. Therefore, we propose to describe each slice by semantic information. This
idea is inspired by the work presented in [7].
We thus choose to exploit the probabilities predicted by a deep learner trained
on the set of slices. If K is the number of classes considered, this information
typically corresponds to the K predicted probability values for the K classes
(the five probabilities of the five types for the TBT task, or the five severity degrees
for the SVR task). We then obtain for each slice K values corresponding to the
number of considered classes.
Fig. 2. Our semantic features extraction process: the pre-processed slices of a patient's CT image are fed to the deep learned model, which predicts for each of the 60 slices a probability score Pr-i,j for each of the K classes; the K sub-descriptors D-1 ... D-K are concatenated into the final semantic descriptor for the patient.
Furthermore, K sub-descriptors are generated: D1, D2, D3, D4, ..., DK. Each
sub-descriptor Di contains the predicted probabilities for class i for all the
slices of the patient. A final semantic descriptor is constructed by concatenating
the K sub-descriptors. Figure 2 details the process of the semantic feature
extraction for one patient.
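The descriptor construction just described can be sketched in a few lines: given the per-slice probability matrix produced by the deep model, each sub-descriptor is a column, and the final descriptor is their concatenation (length 60·K for 60 slices).

```python
# Sketch of the semantic descriptor construction for one patient.
import numpy as np

def semantic_descriptor(slice_probs):
    """slice_probs: array of shape (n_slices, K) of per-slice class
    probabilities. Returns the 1D patient descriptor of length n_slices*K."""
    probs = np.asarray(slice_probs)
    # Column i is sub-descriptor D_i (all slices' probabilities for class i);
    # flattening column-major ('F') concatenates D_1, D_2, ..., D_K.
    return probs.flatten(order="F")
```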
3.3 Learning a classification model
In this step, we propose to exploit the semantic descriptors of patients obtained
in the previous step. Any approach of supervised classification can be applied as
shown in figure 3.
Fig. 3. Learning a classification model based on the semantic descriptors: the semantic descriptors of the train corpus and their labels feed a supervised learner; for a test patient's CT image, the learned model predicts one of the K classes from the patient's semantic descriptor.
We recommend some ideas for this step:
– Use a deep learner taking as input the semantic descriptors of patients
and their labels. As an alternative, we propose to use a bagging method
that combines several learners and sub-samples the train collection. This
leads to better results, as our experiments showed.
– Apply sample selection, especially in the TBT task where several CT
images were provided for some patients. We noticed in our experiments that
using all the images for each patient introduces a lot of noise and gives
worse results than using only one image per patient. An alternative
consists of creating multiple sub-collections, where each one contains a different
single CT image per patient, then training a learner on each
sub-collection and finally aggregating their results. This would probably lead to
a much more robust model.
4 Experiments and results
We describe in the following sections our runs submitted to the TBT and SVR
tasks.
We implemented the semantic descriptor approach described in section 3, using
the following tools:
– the Caffe framework [10] for deep learning;
– Weka [6] for testing several learning and classification algorithms;
– med2image [1] for the conversion of NIfTI medical images to the classic JPEG
format.
We chose to use slices of the Y dimension because our experiments showed
that they are more suitable than those of the two others and gave better results.
For descriptor extraction, our approach consists in training a deep model to
generate semantic information. Unfortunately, we had problems with the machines
deployed for training our deep learner. Due to lack of time, we could not
complete the learning process. As an alternative, we deployed the
same model as the one proposed by the SGEast team [11] at the CLEF 2017
TBT Task. The model is accessible from the following link [2]. It is based on a
ResNet-50 [8] and obtained the best results at the TBT task of the 2017 edition. We
therefore exploited the outputs of the last layer (named prob) of the ResNet-50,
corresponding to the probabilities of the 5 considered classes.
4.1 TBT task
Dataset: The dataset used in the TBT task includes chest CT scans of TB patients
along with the TB type. Some patients have more than one scan. All scans
belonging to the same patient present the same TB type. Table 1 summarizes
the distribution of CT scans according to the five types of TB considered.
Table 1. Dataset given for the Tuberculosis TBT task [9].

TB type | Train #Patients | Train #CTs | Test #Patients | Test #CTs
Type 1  | 228 | 376  | 89  | 176
Type 2  | 210 | 273  | 80  | 115
Type 3  | 100 | 154  | 60  | 86
Type 4  | 79  | 106  | 50  | 71
Type 5  | 60  | 99   | 38  | 57
Total   | 677 | 1008 | 317 | 505
Experimental protocol: We used the train collection provided by the organizers
and split it into two sub-collections: 80% for training and 20% as a
validation set. We exploited in all our runs the semantic descriptors generated
as previously described. We tested several learners in the classification step.
We finally submitted three main runs. The other submissions are variants
or were generated through the fusion of some of these three runs:
– Run 1 (TBT_mostaganemFSEI_run1): random forest as supervised classifier.
We tuned the two parameters referring to the number of iterations performed
and the number of features selected randomly;
– Run 2 (TBT_mostaganemFSEI_run2): bagging of a set of random forest
learners. We tuned the number of learners for the bagging and the same
two parameters as Run 1 for the random forests;
– Run 4 (TBT_mostaganemFSEI_run4): a hierarchical classification. We organized
the five classes into a hierarchical structure as described in figure 4.
We created two new virtual classes V-1 and V-2: V-1 groups the two classes
Type 1 and Type 2, and V-2 contains the classes Type 3, Type 4
and Type 5. We reorganized our collections in order to achieve a classification
on two different levels. In the first stage, we classify the samples into
the two virtual classes V-1 and V-2. In the second level of classification, we
classify the samples with respect to the set of classes under the class
predicted in the previous stage. In both classification stages we used a
random forest learner, tuning its two parameters as described for Run 1.
First level: V-1, V-2. Second level: Type 1, Type 2 (under V-1); Type 3, Type 4, Type 5 (under V-2).
Fig. 4. Hierarchical re-organization of TBT types.
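The two-level scheme of Run 4 can be sketched as follows, assuming scikit-learn random forests in place of the Weka learners actually used (parameter tuning omitted); the virtual classes are V-1 = {Type 1, Type 2} and V-2 = {Type 3, Type 4, Type 5}.

```python
# Sketch of the two-level hierarchical classification of Run 4.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

V1, V2 = {1, 2}, {3, 4, 5}     # virtual class memberships

def fit_hierarchy(X, y, seed=0):
    y = np.asarray(y)
    # Level 1: V-1 vs V-2
    top = RandomForestClassifier(random_state=seed).fit(
        X, [1 if t in V1 else 2 for t in y])
    # Level 2: one classifier per virtual class, trained on its members only
    m1 = RandomForestClassifier(random_state=seed).fit(
        X[np.isin(y, list(V1))], y[np.isin(y, list(V1))])
    m2 = RandomForestClassifier(random_state=seed).fit(
        X[np.isin(y, list(V2))], y[np.isin(y, list(V2))])
    return top, m1, m2

def predict_hierarchy(models, X):
    top, m1, m2 = models
    virt = top.predict(X)                 # first-level decision
    out = np.empty(len(X), dtype=int)
    for v, m in ((1, m1), (2, m2)):       # refine within the predicted branch
        mask = virt == v
        if mask.any():
            out[mask] = m.predict(X[mask])
    return out
```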
Results: Table 2 shows the results obtained by our runs on the validation collection.

Table 2. Results on the validation set for the TBT task.

Runs                            | Kappa | Accuracy
Run 1 (TBT_mostaganemFSEI_run1) | 0.21  | 0.38
Run 2 (TBT_mostaganemFSEI_run2) | 0.25  | 0.41
Run 4 (TBT_mostaganemFSEI_run4) | 0.26  | 0.52
Table 3 shows the results obtained by our runs in the evaluation performed
by the ImageCLEF committee.

Table 3. Results on the test set for the TBT task.

Runs                            | Kappa  | Rank | Accuracy | Rank
Run 1 (TBT_mostaganemFSEI_run1) | 0.0412 | 28   | 0.2650   | 29
Run 2 (TBT_mostaganemFSEI_run2) | 0.0275 | 29   | 0.2555   | 32
Run 4 (TBT_mostaganemFSEI_run4) | 0.0629 | 25   | 0.2744   | 27
As shown by the validation results, Run 4 was our best submission; it also
obtained the best results on the test collection compared to Run 1 and Run 2.
Figures 5 and 6 describe the results and ranking of all submissions on the TBT
task in terms of Kappa coefficient and accuracy, respectively.
[Figure: bar chart of the Kappa coefficient of all submitted runs on the TBT task; mean = 0.08.]
Fig. 5. Results and ranking in terms of Kappa coefficient on test data for TBT Task.
Although our submissions are not well ranked compared to those at the top of
the list, we can notice that several of the top runs belong to the same teams,
and they probably do not differ much from each other. On the other hand, we
recall that our semantic descriptors were extracted using a model that was not
very well trained. In fact, we encountered problems with our machines during
the training of our deep learner. Moreover, although SGEast's deployed model
obtained the best results at the ImageCLEF 2017 Tuberculosis TBT task, we
were not able to perform exactly the same pre-processing performed by this
team as described in [11]. We believe that our semantic descriptors could give
better results if they were extracted from a better-adapted and well-trained
deep model.
[Figure: bar chart of the accuracy of all submitted runs on the TBT task; mean = 0.30.]
Fig. 6. Results and ranking in terms of accuracy on test data for TBT Task.
4.2 SVR task
Dataset: The dataset for the SVR task includes chest CT scans of TB patients along
with the corresponding severity score (1 to 5). Scores from 1 to 3 correspond to
the "high" severity class, whereas scores 4 and 5 refer to the "low" severity
class. Table 4 summarizes the distribution of CT scans according to the two
severity classes.
Table 4. Dataset given for the Tuberculosis SVR task [9].

Severity class | Train | Test
Low severity   | 90    | 62
High severity  | 80    | 47
Total          | 170   | 109
Experimental protocol: We first generated the semantic descriptors following
the approach described in section 3. For the prediction of TB severity scores,
we treated the problem as a classification problem, using two approaches:
1. Multi-class classification: we considered the five scores as separate
classes. We then tested several classifiers and selected the two that were
most effective among those tested: random forest, and bagging of a set of
random forest learners.
2. Hierarchical classification: we organized our data in order to carry out a
hierarchical classification, considering the hierarchy described in figure 7.
A two-level hierarchical classification is then carried out. In the first level,
the samples are classified into the "high" or "low" classes. In the second level,
the samples are reclassified into the descendant classes of the one predicted
in the first level.
First level: High, Low. Second level: scores 1, 2, 3 (under High); 4, 5 (under Low).
Fig. 7. The hierarchy of classes considered for SVR Task.
We submitted five runs:
1. Run 1 (SVR_mostaganemFSEI_run1): multi-class model using random forest
as classifier. We tuned two parameters: the number of iterations
performed and the number of features randomly chosen;
2. Run 2 (SVR_mostaganemFSEI_run2): multi-class model using a bagging of
a set of random forest learners with sub-sampling of the main train collection.
We created two sub-collections by balancing the number of samples over the
5 classes. We then merged the results obtained on the two sub-collections;
3. Run 3 (SVR_mostaganemFSEI_run3): hierarchical classification using a bagging
of a set of random forest learners at each level of the hierarchical
classification process;
4. Run 4 (SVR_mostaganemFSEI_run4): fusion of Run 1 and Run 2;
5. Run 6 (SVR_mostaganemFSEI_run6): fusion of Run 3 and Run 1.
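The exact fusion rule used in Run 4 and Run 6 is not detailed in the paper; a simple late fusion, shown here as an assumption, averages the per-class probabilities of the two runs and re-picks the argmax class.

```python
# Illustrative late fusion of two runs (the actual fusion rule is not
# specified in the paper): average per-class probabilities, then argmax.
import numpy as np

def fuse_runs(probs_a, probs_b, classes=(1, 2, 3, 4, 5)):
    """probs_*: arrays of shape (n_patients, K) from two runs.
    Returns (fused probabilities, predicted class labels)."""
    fused = (np.asarray(probs_a) + np.asarray(probs_b)) / 2.0
    return fused, np.asarray(classes)[fused.argmax(axis=1)]
```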
Results: Table 5 shows the results obtained by our runs on the validation collection.

Table 5. Results on the validation set for the SVR task in terms of accuracy and Root Mean Square Error (RMSE).

Runs                            | Accuracy | RMSE
Run 1 (SVR_mostaganemFSEI_run1) | 0.41     | 0.37
Run 2 (SVR_mostaganemFSEI_run2) | 0.36     | 0.45
Run 3 (SVR_mostaganemFSEI_run3) | 0.56     | 0.30
Run 4 (SVR_mostaganemFSEI_run4) | 0.42     | 0.36
Run 6 (SVR_mostaganemFSEI_run6) | 0.48     | 0.34
Table 6 shows the results obtained by our runs in the evaluation performed
by the ImageCLEF committee on the test collection.

Table 6. Results on the test set for the SVR task.

Runs                            | RMSE   | Rank | AUC    | Rank
Run 1 (SVR_mostaganemFSEI_run1) | 1.0227 | 19   | 0.5971 | 26
Run 2 (SVR_mostaganemFSEI_run2) | 1.0837 | 22   | 0.6127 | 22
Run 3 (SVR_mostaganemFSEI_run3) | 0.9721 | 12   | 0.5987 | 25
Run 4 (SVR_mostaganemFSEI_run4) | 1.0137 | 18   | 0.6107 | 24
Run 6 (SVR_mostaganemFSEI_run6) | 1.0046 | 16   | 0.6119 | 23
We can see that Run 3 obtained the best results in terms of RMSE compared to
our other runs, on both the validation collection and the test data. However, in
terms of AUC, Run 2 seems to be more efficient.
[Figure: bar chart of the RMSE of all submitted runs on the SVR task; mean = 1.064.]
Fig. 8. Results and ranking in terms of Root Mean Square Error on test collection.
Figures 8 and 9 describe the results and ranking of all submissions on the SVR
task in terms of RMSE and AUC values, respectively.
We can see that our best run is ranked 12th out of 36 submissions. However,
the difference between the performances of the 12 best runs is not very large.
We recall that our best result was achieved by a hierarchical classification approach
using a bagging of random forest learners at each level of the hierarchy. We
believe that our approach could give better results using a well-trained deep
model in the semantic feature extraction step.
[Figure: bar chart of the AUC of all submitted runs on the SVR task; mean = 0.64.]
Fig. 9. Results and ranking in terms of Area Under ROC curve on test collection.
5 Conclusion and future works
We have described in this article our contributions to the TBT and SVR tasks
of the ImageCLEF 2018 Tuberculosis Task. We proposed an approach that consists in
extracting a single semantic descriptor for each CT image / patient instead of
considering all the slices as separate samples. Unfortunately, we could not complete
the training of our deep learner. However, the results obtained show that this
approach could be much more efficient and give more interesting results if
applied properly.
As perspectives, we plan to adopt enrichment strategies and learning sample
selection. Indeed, one of the characteristics of the problem addressed in the
SVR and TBT tasks is the nature of the provided data collections, which are
small and noisy because of the presence of many slices that do not
contain useful information. The bagging and sub-sampling strategies adopted in
our experiments confirmed this. In addition, we noticed during the sub-sampling
of our data that the deletion or addition of some samples had an impact on the
results. On the other hand, filtering slices effectively to keep only those that are
truly informative is a key idea that could further improve system performance,
as reported by several participating teams [11]. Furthermore, we noticed in our
experiments that there is a difference in the precision achieved for each
studied class. Indeed, some classes are more difficult to identify than others.
This is also an interesting direction to study.
References
1. med2image: https://github.com/fnndsc/med2image. Last check: 30/05/2018.
2. SGEast model for ImageCLEF 2017 tuberculosis task: https://github.com/maizesix92/imageclef2017_tb_sgeast. Last check: 30/05/2018.
3. Cid, Y.D., Kalinovsky, A., Liauchuk, V., Kovalev, V., Müller, H.: Overview of the ImageCLEF 2017 tuberculosis task - predicting tuberculosis type and drug resistances. In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11-14, 2017 (2017), http://ceur-ws.org/Vol-1866/invited_paper_1.pdf
4. Dicente Cid, Y., Kalinovsky, A., Liauchuk, V., Kovalev, V., Müller, H.: Overview of ImageCLEFtuberculosis 2017 - predicting tuberculosis type and drug resistances. In: CLEF 2017 Working Notes. CEUR Workshop Proceedings, CEUR-WS.org, Dublin, Ireland (September 11-14, 2017)
5. Dicente Cid, Y., Liauchuk, V., Kovalev, V., Müller, H.: Overview of ImageCLEFtuberculosis 2018 - detecting multi-drug resistance, classifying tuberculosis type, and assessing severity score. In: CLEF 2018 Working Notes. CEUR Workshop Proceedings, CEUR-WS.org, Avignon, France (September 10-14, 2018)
6. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explorations 11(1), 10–18 (2009)
7. Hamadi, A., Mulhem, P., Quénot, G.: Extended conceptual feedback for semantic multimedia indexing. Multimedia Tools and Applications 74(4), 1225–1248 (2015). https://doi.org/10.1007/s11042-014-1937-y
8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
9. Ionescu, B., Müller, H., Villegas, M., de Herrera, A.G.S., Eickhoff, C., Andrearczyk, V., Cid, Y.D., Liauchuk, V., Kovalev, V., Hasan, S.A., Ling, Y., Farri, O., Liu, J., Lungren, M., Dang-Nguyen, D.T., Piras, L., Riegler, M., Zhou, L., Lux, M., Gurrin, C.: Overview of ImageCLEF 2018: Challenges, datasets and evaluation. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018), LNCS Lecture Notes in Computer Science, Springer, Avignon, France (September 10-14, 2018)
10. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)
11. Sun, J., Chong, P., Tan, Y.X.M., Binder, A.: ImageCLEF 2017: ImageCLEF tuberculosis task - the SGEast submission. In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11-14, 2017 (2017), http://ceur-ws.org/Vol-1866/paper_130.pdf