Overview of ImageCLEFtuberculosis 2021 — CT-based Tuberculosis Type Classification

Serge Kozlovski¹, Vitali Liauchuk¹, Yashin Dicente Cid², Vassili Kovalev¹ and Henning Müller³,⁴

¹ United Institute of Informatics Problems, Minsk, Belarus
² University of Warwick, Coventry, UK
³ University of Applied Sciences Western Switzerland (HES–SO), Sierre, Switzerland
⁴ University of Geneva, Switzerland

Abstract
ImageCLEF is part of the Conference and Labs of the Evaluation Forum (CLEF) initiative and includes a variety of tasks dedicated to multimodal image information retrieval, including image classification and annotation. The tuberculosis (TB) task is one of the ImageCLEF tasks. It started in 2017 and has changed from year to year. The 2021 edition was dedicated to the automatic classification of five TB types: Infiltrative, Focal, Tuberculoma, Miliary and Fibro-cavernous. The task itself repeated one of the original subtasks from 2017, but the dataset was significantly changed. In 2021, 11 groups from 9 countries participated in the task and submitted at least one successful run. The task results can be compared to the TB type classification results of the 2017 and 2018 editions. Although the top scores did not improve on those of the previous editions, the participants' results allow us to analyze the effectiveness of applying recent deep learning approaches to the task.

Keywords
Tuberculosis, Computed Tomography, Image Classification, Tuberculosis Type, 3D Data Analysis

1. Introduction
ImageCLEF¹ is part of the CLEF² initiative and presents a set of image information retrieval tasks. Medical tasks were included in the 2nd edition of ImageCLEF in 2004 and have been held every year since then [1, 2, 3, 4, 5]. The tuberculosis task is one of this year's medical tasks. More information on the other tasks organized in 2021 can be found in [6], and the past editions of ImageCLEF are described in [7, 8, 9, 10, 11, 12, 13].
Tuberculosis (TB) is a bacterial infection caused by a germ called Mycobacterium tuberculosis. About 130 years after its discovery, the disease remains a persistent threat and one of the top 10 causes of death worldwide according to the WHO [14]. The bacteria usually attack the lungs, and TB can generally be cured with antibiotics. However, different types of TB require different treatments, and therefore detecting the specific characteristics of a case is an important real-world task.

CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
kozlovski.serge@gmail.com (S. Kozlovski)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
¹ http://www.imageclef.org/
² http://www.clef-initiative.eu/

In the previous editions of this task, the setup evolved from year to year. In the first two editions [15, 16], participants had to detect multi-drug resistant patients (MDR subtask) and to classify the TB type (TBT subtask), both based only on the computed tomography (CT) image. After two editions, it was decided to drop the MDR subtask because it seemed impossible to solve based only on the image, and the TBT subtask was also suspended because of very little improvement in the results between the 1st and the 2nd editions. At the same time, most of the participants obtained good results in the severity scoring (SVR) subtask introduced in 2018. In the 3rd edition, the Tuberculosis task [17] was restructured to allow the use of a uniform data set and included two subtasks: a continued severity score (SVR) prediction subtask and a new subtask based on providing an automatic CT report on the TB case (CTR subtask). In the 4th edition [18], the SVR subtask was dropped and the automated CT report generation task was modified to be lung-based rather than CT-based.
Because participants achieved fairly high results in the CTR task in 2020, we decided to discontinue the CTR task for the moment and switch to a task that had not yet been solved with sufficiently high quality. In this year's edition, we therefore revived the Tuberculosis Type classification task from the 1st and 2nd ImageCLEFmed Tuberculosis editions. The data set was updated and extended in size, and some additional information was added for a part of the CT scans. We hoped that the newest deep learning approaches, together with the available pre-trained models and additional data sets, would allow the participants to achieve better results for TB type classification than in the early editions of the task.

This article first describes the task proposed for TB in 2021. Then, details on the data sets, evaluation methodology and participation are given. The results section describes the submitted runs and the results obtained. A discussion and conclusion section ends the paper.

2. Task, Data Set, Evaluation, Participation

2.1. The Task in 2021
In this task, participants had to automatically categorize each TB case into one of the following five types: (1) Infiltrative, (2) Focal, (3) Tuberculoma, (4) Miliary, (5) Fibro-cavernous. The task is therefore a multi-class classification problem.

2.2. Data Set
In this edition, a data set containing chest CT scans of 1,338 TB patients was used: 917 images for the training (development) data set and 421 for the test set. Each CT image corresponded to only one TB type and to one unique patient. Additional meta-information containing a CT report in the 2020 edition format was provided for 243 cases. Since this meta-information may potentially be used as a target label in future task editions, it was provided only for a subset of the training images.
We expected that participants would use the best approaches from the previous edition of the task to generate CT reports for all training and test cases, and then use this meta-information as an additional feature for TB type prediction.

For every patient, a 3D CT image series was provided with a slice size of 512 × 512 pixels and a median of 128 slices. All CT images were stored in NIfTI file format with the .nii.gz file extension (gzipped .nii files). This format stores raw voxel intensities in Hounsfield units (HU) as well as the corresponding image meta-data such as image dimensions, voxel size in physical units, slice thickness, etc. As in the previous year, for all patients we provided two versions of automatically extracted lung masks obtained using the methods described in [19, 20, 18].

Typical examples of CTs with different TB types are shown in Fig. 1. Table 1 details the distribution of patients within each TB type. The labels are markedly imbalanced, reflecting the natural prevalence of the TB types. During the data split, we tried to keep the class distributions of the training and test data similar.

Table 1
Distribution of CT images within each class.

Set     Infiltrative   Focal       Tuberculoma   Miliary     Fibro-cavernous
Train   420 (46%)      226 (24%)   101 (11%)     100 (11%)   70 (8%)
Test    169 (40%)      86 (21%)    64 (15%)      60 (14%)    42 (10%)

2.3. Evaluation Measures and Scenario
As in the previous editions, each participating group could submit up to 10 runs in total. The task was evaluated as a multi-class classification problem and scored using the unweighted Cohen's Kappa coefficient and accuracy. Runs were ranked first by Kappa and then by accuracy.

2.4. Participation
In 2021, there were 78 registered teams, and 29 signed the end-user agreement. 11 groups from 9 countries participated and submitted results. The number of submissions is slightly higher than in 2020. Table 2 shows the list of participants and their institutions.
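Both evaluation measures can be computed directly from a confusion matrix. The following minimal sketch, using only the Python standard library, computes accuracy, unweighted Cohen's Kappa and the per-class true positive rate (recall) of the kind shown in Figures 2 and 3. The helper names are our own, and this is not the organizers' evaluation script. The two example matrices are transcribed from the SDVA-UCSD and Emad_Aghajanzadeh panels of Figure 2 (rows: true TB type, columns: predicted TB type). Treating 1/5 as the "random" recall level is an assumption corresponding to uniform guessing over the five classes.

```python
# Sketch: accuracy, unweighted Cohen's kappa and per-class recall (TPR)
# from a K x K confusion matrix. Helper names are hypothetical, not the
# organizers' evaluation code.

TB_TYPES = ["Infiltrative", "Focal", "Tuberculoma", "Miliary", "Fibro-cavernous"]


def accuracy(cm):
    """Share of cases on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(row) for row in cm]                             # row sums
    pred_counts = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column sums
    p_o = accuracy(cm)  # observed agreement
    # Chance agreement expected from the row and column marginals.
    p_e = sum(t * p for t, p in zip(true_counts, pred_counts)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


def per_class_recall(cm):
    """True positive rate of each class: diagonal entry / row sum."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def below_chance(cm, labels=TB_TYPES):
    """Classes whose recall is under 1/K (assumed chance level for K classes)."""
    chance = 1 / len(cm)
    return [lab for lab, r in zip(labels, per_class_recall(cm)) if r < chance]


# Rows = true TB type, columns = predicted TB type (order of TB_TYPES),
# transcribed from Figure 2.
sdva_ucsd = [
    [44, 41, 50, 19, 15],
    [25, 45, 5, 9, 2],
    [16, 21, 24, 1, 2],
    [10, 14, 8, 23, 5],
    [10, 2, 2, 8, 20],
]
emad_aghajanzadeh = [
    [94, 33, 24, 12, 6],
    [36, 27, 15, 7, 1],
    [33, 19, 10, 2, 0],
    [17, 8, 7, 25, 3],
    [11, 6, 1, 10, 14],
]

for name, cm in [("SDVA-UCSD", sdva_ucsd), ("Emad_Aghajanzadeh", emad_aghajanzadeh)]:
    print(name, round(cohen_kappa(cm), 3), round(accuracy(cm), 3), below_chance(cm))
```

With these matrices, the sketch gives a Kappa of 0.190 and an accuracy of 0.371 for SDVA-UCSD, and 0.181 and 0.404 for Emad_Aghajanzadeh, consistent with Table 3. For Emad_Aghajanzadeh, only Tuberculoma falls below the assumed chance-level recall. Because Kappa subtracts the agreement expected from the marginal label counts, it penalizes runs that mostly predict the majority Infiltrative class more strongly than accuracy does (compare YNUZHOU in Table 3: accuracy 0.385 but Kappa -0.008).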
3. Results
To rank the submissions, we used Cohen's Kappa coefficient as the primary score and accuracy as the secondary score. Table 3 shows these two measures for the best run submitted by each participating group. For the best run of each group, Figures 2 and 3 show the corresponding confusion matrices and the true positive rate for each TB type.

SenticLab.UAIC [21] won the task with a Kappa score of 0.221 and an accuracy of 0.466. In their experiments, the SenticLab.UAIC team compared several approaches based on 2D and 3D CNNs. The winning method was based on the slice-wise application of an EfficientNet-B4 2D CNN.

Figure 1: Slices of typical CT images with different TB types.

The hasibzunair [22] team ranked 2nd in terms of both Kappa and accuracy. The team's approach was based on a hybrid 2D CNN-RNN model, and their experiments made extensive use of transfer learning techniques and a custom loss function.

The SDVA-UCSD [23] team chose volumetric analysis and used 3D CNN models for the CT analysis. The team's best solution was obtained with a 3D ResNet34 with convolutional block attention.

The Emad_Aghajanzadeh team [24] experimented with different approaches, including slice-wise analysis using a 2D CNN and a hybrid 2D CNN + RNN model, volume-based analysis using a 3D CNN and a hybrid 3D CNN + RNN model, and a hybrid combination of 2D + 3D CNN + RNN models. The team's best result was achieved using a custom 3D CNN.

The MIDL-NCAI-CUI team used slice-wise analysis with a pre-trained 2D CNN and ended up with the EfficientNet-B0 model.

The uaic2021 [25] team tried both 2D projection-based and volumetric approaches. Their best run used a 3D MedicalNet10 model.

The IALab_PUC [26] team tested several pipelines based on different 2D CNNs, including a custom one and a pre-trained DenseNet121.

The KDE-lab [27] team used slice-wise analysis with a 2D CNN model (EfficientNet-B5).
The JBTTM [28] team tried to use 3D CNNs but did not succeed, and used a simple shallow neural network for the final submission.

The Zhao_Shi_ [29] team used slice-wise analysis with a pre-trained EfficientNet-B0 2D CNN and ghost modules.

Table 2
List of participating teams that submitted at least one run.

Group name          Main institution                         Country
Emad_Aghajanzadeh   Ferdowsi University of Mashhad           Iran
hasibzunair         Concordia University                     Canada
IALab_PUC           IALab PUC                                Chile
JBTTM               SSN College of Engineering               India
KDE-lab             Toyohashi University                     Japan
MIDL-NCAI-CUI       COMSATS University Islamabad             Pakistan
SenticLab.UAIC      Alexandru Ioan Cuza University of Iasi   Romania
SDVA-UCSD           San Diego VA/UCSD                        USA
uaic2020            Alexandru Ioan Cuza University of Iasi   Romania
YNUZHOU             Yunnan University                        China
Zhao_Shi_           Yunnan University                        China

Table 3
Summary of the participant submissions and their results.

Group rank   Group name          # of runs   Kappa    Accuracy   Rank of the best run
1            SenticLab.UAIC      10          0.221    0.466      1
2            hasibzunair         8           0.200    0.423      4
3            SDVA-UCSD           7           0.190    0.371      8
4            Emad_Aghajanzadeh   10          0.181    0.404      11
5            MIDL-NCAI-CUI       5           0.140    0.333      23
6            uaic2020            3           0.129    0.333      28
7            IALab_PUC           3           0.120    0.401      30
8            KDE-lab             10          0.117    0.382      31
9            JBTTM               1           0.038    0.221      42
10           Zhao_Shi_           5           0.015    0.380      47
11           YNUZHOU             1           -0.008   0.385      55

4. Discussion and Conclusions
The results obtained in the task can be compared to those of the original TBT subtasks presented in the 2017 [15] and 2018 [16] editions. Before the comparison, we should note that although the task setup is the same as in those two editions, the data set was significantly changed. Participants therefore had to deal with different images and label distributions, so the achieved scores cannot be compared directly. Top scores in the 2017, 2018 and 2021 editions are fairly close (Table 4).

Table 4
Kappa and accuracy scores achieved in the 2017, 2018 and 2021 task editions. Colored entries correspond to groups that participated in more than one edition. Each cell gives Kappa (accuracy).

Group rank   2017            2018             2021
1            0.244 (0.403)   0.231 (0.423)    0.221 (0.466)
2            0.233 (0.387)   0.173 (0.353)    0.200 (0.423)
3            0.219 (0.407)   0.171 (0.385)    0.190 (0.371)
4            0.196 (0.390)   0.166 (0.379)    0.181 (0.404)
5            0.153 (0.343)   0.147 (0.338)    0.140 (0.333)
6            0.022 (0.240)   0.063 (0.274)    0.129 (0.333)
7            -               0.020 (0.259)    0.120 (0.401)
8            -               -0.002 (0.237)   0.117 (0.382)
9            -               -                0.038 (0.221)
10           -               -                0.015 (0.380)
11           -               -                -0.008 (0.385)

The best score of 2021 was achieved by the SenticLab.UAIC group and is slightly worse in terms of Kappa than the best results of 2018 and 2017: 0.221 vs 0.231 (a drop of 0.010) and 0.244 (a drop of 0.023). On the other hand, four groups exceeded the 2nd best result from 2018. The top-ranked groups in the 2017 edition achieved better scores than those in 2018 and this year, but the difference can be explained by a decrease in TB type balance rather than by a drop in the effectiveness of the approaches [15, 16]. Additionally, the SDVA-UCSD group, which participated in both the 2018 and 2021 editions, improved its Kappa score from 0.147 to 0.190.

A detailed analysis of the participant predictions in Figures 2 and 3 shows that many participants struggled with the natural TB type imbalance, which resulted in overfitting to the most frequent classes. Only SDVA-UCSD and MIDL-NCAI-CUI achieved better than random recall for all TB types. SenticLab.UAIC, hasibzunair and Emad_Aghajanzadeh achieved better than random recall for all TB types except Tuberculoma.

Analyzing the participants' working notes, we observed a variety of approaches and the use of modern machine learning techniques and methods. Indeed, the top-3 groups used completely different approaches. In contrast to the 2017 and 2018 task editions, all participants used deep learning methods for CT analysis. The majority of the participants (eight groups) used 2D CNNs to analyze either selected projections of CT images or all slices.
Two of these groups further used the slice-wise features extracted by the 2D CNN to train an RNN in order to capture inter-slice information. Four groups successfully used 3D CNNs for whole-CT analysis. The participants used a range of neural network architectures and model training tweaks, and most of them also used transfer learning. All participants used some form of data augmentation and a few pre-processing steps, such as resizing, normalization, slice filtering, etc.

Our analysis of the participant approaches suggests that in some cases results could be improved by paying more attention to common machine learning routines, such as careful handling of the train-validation label distribution and accurate CT data pre-processing (for example, a few groups ignored the provided lung masks, which may affect model effectiveness). Unfortunately, none of the groups tried to use the provided partial CT report metadata, so its importance for this task remains an open question.

Future editions of the TBT task could consider: (i) extending the additional meta-information for CT scans; (ii) including some kind of lesion location information in the data set.

Acknowledgements
All the data for the Tuberculosis task were provided by the Republican Research and Practical Center for Pulmonology and Tuberculosis, located in Minsk, Belarus. The data were collected and labeled in the framework of several projects that aim at the creation of information resources on lung TB and drug resistance challenges. The projects were conducted by a multi-disciplinary team and funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH), U.S. Department of Health and Human Services, USA, through the Civilian Research and Development Foundation (CRDF).
The dedicated web portal³ developed in the framework of the projects stores information on almost 5,000 TB patients from 16 countries. The information includes CT scans, X-ray images, genome data, and clinical and social data. Data collection was supported by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, US Department of Health and Human Services, CRDF project RDAA9-20-67103-1 "Year 9: Belarus TB Database and TB Portals".
³ http://tbportals.niaid.nih.gov/

References
[1] J. Kalpathy-Cramer, A. García Seco de Herrera, D. Demner-Fushman, S. Antani, S. Bedrick, H. Müller, Evaluating performance of biomedical image retrieval systems: Overview of the medical image retrieval task at ImageCLEF 2004–2014, Computerized Medical Imaging and Graphics 39 (2015) 55–61.
[2] H. Müller, P. Clough, T. Deselaers, B. Caputo (Eds.), ImageCLEF – Experimental Evaluation in Visual Information Retrieval, volume 32 of The Springer International Series On Information Retrieval, Springer, Berlin Heidelberg, 2010.
[3] A. García Seco de Herrera, R. Schaer, S. Bromuri, H. Müller, Overview of the ImageCLEF 2016 medical task, in: Working Notes of CLEF 2016 (Cross Language Evaluation Forum), 2016.
[4] H. Müller, P. Clough, W. Hersh, A. Geissbuhler, ImageCLEF 2004–2005: Results, experiences and new ideas for image retrieval evaluation, in: International Conference on Content-Based Multimedia Indexing (CBMI 2005), IEEE, Riga, Latvia, 2005.
[5] T. Deselaers, T. M. Deserno, H. Müller, Automatic medical image annotation in ImageCLEF 2007: Overview, results, and discussion, Pattern Recognition Letters 29 (2008) 1988–1995.
[6] B. Ionescu, H. Müller, R. Péteri, A. Ben Abacha, M. Sarrouti, D. Demner-Fushman, S. A. Hasan, S. Kozlovski, V. Liauchuk, Y. D. Cid, V. Kovalev, O. Pelka, A. G. S. de Herrera, J. Jacutprakart, C. M. Friedrich, R. Berari, A. Tauteanu, D. Fichou, P. Brie, M. Dogariu, L. D. Ştefan, M. G. Constantin, J. Chamberlain, A. Campello, A.
Clark, T. A. Oliver, H. Moustahfid, A. Popescu, J. Deshayes-Chossart, Overview of the ImageCLEF 2021: Multimedia retrieval in medical, nature, internet and social media applications, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 12th International Conference of the CLEF Association (CLEF 2021), LNCS Lecture Notes in Computer Science, Springer, Bucharest, Romania, 2021.
[7] B. Ionescu, H. Müller, R. Péteri, A. B. Abacha, V. Datla, S. A. Hasan, D. Demner-Fushman, S. Kozlovski, V. Liauchuk, Y. D. Cid, V. Kovalev, O. Pelka, C. M. Friedrich, A. G. S. de Herrera, V.-T. Ninh, T.-K. Le, L. Zhou, L. Piras, M. Riegler, P. Halvorsen, M.-T. Tran, M. Lux, C. Gurrin, D.-T. Dang-Nguyen, J. Chamberlain, A. Clark, A. Campello, D. Fichou, R. Berari, P. Brie, M. Dogariu, L. D. Ştefan, M. G. Constantin, Overview of the ImageCLEF 2020: Multimedia retrieval in medical, lifelogging, nature, and internet applications, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, volume 12260 of Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020), LNCS Lecture Notes in Computer Science, Springer, Thessaloniki, Greece, 2020.
[8] B. Ionescu, H. Müller, R. Péteri, Y. Dicente Cid, V. Liauchuk, V. Kovalev, D. Klimuk, A. Tarasau, A. B. Abacha, S. A. Hasan, V. Datla, J. Liu, D. Demner-Fushman, D.-T. Dang-Nguyen, L. Piras, M. Riegler, M.-T. Tran, M. Lux, C. Gurrin, O. Pelka, C. M. Friedrich, A. G. S. de Herrera, N. Garcia, E. Kavallieratou, C. R. del Blanco, C. C. Rodríguez, N. Vasillopoulos, K. Karampidis, J. Chamberlain, A. Clark, A. Campello, ImageCLEF 2019: Multimedia retrieval in medicine, lifelogging, security and nature, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, volume 2380 of Proceedings of the 10th International Conference of the CLEF Association (CLEF 2019), LNCS Lecture Notes in Computer Science, Springer, Lugano, Switzerland, 2019.
[9] B. Ionescu, H.
Müller, M. Villegas, A. G. S. de Herrera, C. Eickhoff, V. Andrearczyk, Y. Dicente Cid, V. Liauchuk, V. Kovalev, S. A. Hasan, Y. Ling, O. Farri, J. Liu, M. Lungren, D.-T. Dang-Nguyen, L. Piras, M. Riegler, L. Zhou, M. Lux, C. Gurrin, Overview of ImageCLEF 2018: Challenges, datasets and evaluation, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018), LNCS Lecture Notes in Computer Science, Springer, Avignon, France, 2018.
[10] B. Ionescu, H. Müller, M. Villegas, H. Arenas, G. Boato, D.-T. Dang-Nguyen, Y. Dicente Cid, C. Eickhoff, A. Garcia Seco de Herrera, C. Gurrin, B. Islam, V. Kovalev, V. Liauchuk, J. Mothe, L. Piras, M. Riegler, I. Schwall, Overview of ImageCLEF 2017: Information extraction from images, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, 8th International Conference of the CLEF Association, CLEF 2017, volume 10456 of Lecture Notes in Computer Science, Springer, Dublin, Ireland, 2017.
[11] M. Villegas, H. Müller, A. Garcia Seco de Herrera, R. Schaer, S. Bromuri, A. Gilbert, L. Piras, J. Wang, F. Yan, A. Ramisa, A. Dellandrea, R. Gaizauskas, K. Mikolajczyk, J. Puigcerver, A. H. Toselli, J.-A. Sanchez, E. Vidal, General overview of ImageCLEF at the CLEF 2016 labs, in: CLEF 2016 Proceedings, Lecture Notes in Computer Science, Springer, Evora, Portugal, 2016.
[12] M. Villegas, H. Müller, A. Gilbert, L. Piras, J. Wang, K. Mikolajczyk, A. García Seco de Herrera, S. Bromuri, M. A. Amin, M. Kazi Mohammed, B. Acar, S. Uskudarli, N. B. Marvasti, J. F. Aldana, M. d. M. Roldán García, General overview of ImageCLEF at the CLEF 2015 labs, in: Working Notes of CLEF 2015, Lecture Notes in Computer Science, Springer International Publishing, 2015.
[13] B. Caputo, H. Müller, B. Thomee, M. Villegas, R. Paredes, D. Zellhofer, H. Goeau, A. Joly, P. Bonnet, J. Martinez Gomez, I. Garcia Varea, C.
Cazorla, ImageCLEF 2013: the vision, the data and the open challenges, in: Working Notes of CLEF 2013 (Cross Language Evaluation Forum), 2013.
[14] World Health Organization, et al., Global tuberculosis report 2019 (2019).
[15] Y. Dicente Cid, A. Kalinovsky, V. Liauchuk, V. Kovalev, H. Müller, Overview of ImageCLEFtuberculosis 2017 - predicting tuberculosis type and drug resistances, in: CLEF2017 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Dublin, Ireland, 2017.
[16] Y. Dicente Cid, V. Liauchuk, V. Kovalev, H. Müller, Overview of ImageCLEFtuberculosis 2018 - detecting multi-drug resistance, classifying tuberculosis type, and assessing severity score, in: CLEF2018 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Avignon, France, 2018.
[17] Y. Dicente Cid, V. Liauchuk, D. Klimuk, A. Tarasau, V. Kovalev, H. Müller, Overview of ImageCLEFtuberculosis 2019 - Automatic CT-based Report Generation and Tuberculosis Severity Assessment, in: CLEF2019 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Lugano, Switzerland, 2019.
[18] S. Kozlovski, V. Liauchuk, Y. Dicente Cid, A. Tarasau, V. Kovalev, H. Müller, Overview of ImageCLEFtuberculosis 2020 - automatic CT-based report generation, in: CLEF2020 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Thessaloniki, Greece, 2020.
[19] Y. Dicente Cid, O. Jimenez-del-Toro, A. Depeursinge, H. Müller, Efficient and fully automatic segmentation of the lungs in CT volumes, in: O. Goksel, O. Jimenez-del-Toro, A. Foncubierta-Rodriguez, H. Müller (Eds.), Proceedings of the VISCERAL Challenge at ISBI, number 1390 in CEUR Workshop Proceedings, 2015, pp. 31–35.
[20] V. Liauchuk, V. Kovalev, ImageCLEF 2017: Supervoxels and co-occurrence for tuberculosis CT image classification, in: CLEF2017 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Dublin, Ireland, 2017.
[21] C. Moisii, R. Miron, M. E.
Breaban, Identifying tuberculosis type in CTs, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania, 2021.
[22] H. Zunair, A. Rahman, N. Mohammed, ViPTT-Net: Video pretraining of spatio-temporal model for tuberculosis type classification from chest CT scans, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania, 2021.
[23] X. Lu, E. Y. Chang, C.-n. Hsu, J. Du, A. Gentili, Multi-Classification Study of the Tuberculosis with 3D CBAM-ResNet and EfficientNet, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania, 2021.
[24] E. Aghajanzadeh, B. Shomali, D. Aminshahidi, N. Ghassemi, Classification of Tuberculosis Type on CT Scans of Lungs using a fusion of 2D and 3D Deep Convolutional Neural Networks, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania, 2021.
[25] A. Hanganu, C. Simionescu, L.-G. Coca, A. Iftene, UAIC2021: Lung Analysis for Tuberculosis Classification, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania, 2021.
[26] J. M. Quintana, D. Florea, R. Deane, D. Parra, P. Pino, P. Messina, H. Lobel, PUC Chile team at TBT Task: Diagnosis of Tuberculosis Type using segmented CT scans, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania, 2021.
[27] T. Asakawa, R. Tsuneda, K. Shimizu, T. Komoda, M. Aono, ImageCLEF 2021: Deep categorizing tuberculosis cases using normalization and pseudo-color CT image, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania, 2021.
[28] U. Balwal, S. A. Yeragudipati, J. Bhuvana, T. T. Mirnalinee, Simple Neural Network based TB Classification, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania, 2021.
[29] J. Li, L. Yang, B.
Yang, Lijie at ImageCLEFmed Tuberculosis 2021: EfficientNet Simplified Tuberculosis Case Classification, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania, 2021.

[Figure 2: one confusion-matrix panel per group: SenticLab.UAIC (accuracy=0.466), hasibzunair (accuracy=0.423), SDVA-UCSD (accuracy=0.371), Emad_Aghajanzadeh (accuracy=0.404), MIDL-NCAI-CUI (accuracy=0.333), uaic2021 (accuracy=0.333), IALab_PUC (accuracy=0.401), KDE-lab (accuracy=0.382), JBTTM (accuracy=0.221), Zhao_Shi_ (accuracy=0.380), YNUZHOU (accuracy=0.385); both axes list TB types 1–5.]
Figure 2: Confusion matrices obtained by the best run of each group. Vertical axes – true TB type, horizontal axes – predicted TB type.

[Figure 3: true positive rate (y-axis, 0.0–0.8) per TB type (x-axis: (1) Infiltrative, (2) Focal, (3) Tuberculoma, (4) Miliary, (5) Fibro-cavernous) for the best run of each group, with Best 2017 and Best 2018 shown for reference.]
Figure 3: True positive rates obtained by the best run of each group.