
ImageCLEF 2018: Semantic descriptors for Tuberculosis CT Image Classification

Abdelkader HAMADI (abdelkader.hamadi@univ-mosta.dz)

Djamel Eddine YAGOUB (djamel.ed.y@gmail.com)

University of Abdelhamid Ibn Badis Mostaganem, Faculty of Exact Sciences and Computer Science, Mathematics and Computer Science Department, Mostaganem, Algeria

In this article, we present the methods used in our participation in two sub-tasks of the ImageCLEF 2018 Tuberculosis Task (TBT and SVR). We propose to extract a single semantic descriptor from the 3D CT image of each patient rather than using all of the patient's slices as separate samples. In the TBT task, the resulting descriptors are exploited in a second learning stage to identify the type of tuberculosis among five given classes. In the SVR task, the same experimental design is used to predict the degree of severity of the disease. We reached a Kappa coefficient of about 0.0629 in the TBT sub-task, and our best run on SVR was ranked 12th out of 36 submissions and 5th out of 7 participating teams. We believe that our approach could give better results if applied with a properly trained deep model.

Keywords: ImageCLEF, Tuberculosis Task, Deep Learning, CT Image, Tuberculosis CT Image Classification, Tuberculosis Severity Scoring

Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis. It has a high mortality rate and remained one of the top ten causes of death worldwide in 2015. Diagnosing it quickly and accurately is vital to limit its spread and damage. One of the major problems is that traditional tests are inaccurate or take too long to produce results. For these reasons, researchers have taken an interest in the diagnosis of this disease, particularly in the context of the international challenges ImageCLEF 2017 [3] and ImageCLEF 2018 [9], where two tasks (three in ImageCLEF 2018) were dedicated to it. The first aims to detect the multi-drug resistant (MDR) status of patients. The goal of the second is to identify the type of tuberculosis. A third task, introduced in ImageCLEF 2018 [5], consists in predicting the degree of severity of the patient's case. In all three tasks, the predictions are based on 3D CT scans. Algorithms involving deep learning have been tested to diagnose the presence or absence of tuberculosis, with interesting results. However, these results must be improved to allow an effective diagnosis that helps doctors make decisions and choose the necessary treatments at the right time.

We can summarize the objectives of the Tuberculosis task through the following points:

- Helping medical doctors in the diagnosis of drug-resistant TB and in TB type identification through image processing techniques;
- Introducing work towards inexpensive and quick methods for early detection of the MDR status and TB types in patients;
- Quickly predicting the type of TB and its degree of severity to help doctors make quick decisions and give effective treatments.

In the following, we present the work carried out in the context of our participation in two sub-tasks of the ImageCLEF 2018 Tuberculosis Task: TuBerculosis Types classification (TBT) and Tuberculosis Severity Scoring (SVR).

The remainder of this article is organized as follows. Section 2 describes the two tasks in which we participated. In Section 3, we present our contribution by detailing the system deployed to produce our submissions. Section 4 details the experimental protocols followed to generate our predictions and analyzes the results obtained. We conclude in the last section by presenting our perspectives and future work.

2 Participation in ImageCLEF 2018

2.1 Tasks description

In this paper, we focus on our participation in the TBT and the SVR sub-tasks that we describe in the following sections.

In both tasks, the data is provided as 3D CT scans. For some patients several 3D CT scans are given, while for others only one is provided. All the CT images are stored in NIfTI file format with the .nii.gz extension (g-zipped .nii files). For each of the three dimensions of the CT image, the number of slices varies from about 50 to 400. Each slice has a size of about 512 × 512 pixels.

A training collection is provided at the beginning of the task with its ground truth (labels of samples). Participants prepare and train their systems on this dataset. A test collection is provided at a later date. Participants run their systems on it and return their predictions to the organizing committee, which evaluates them to compare the performance of the systems.

The TBT task consists of the automatic categorization of TB cases into 5 target classes based on CT scans of patients. The five types considered are:

1. Infiltrative
2. Focal
3. Tuberculoma
4. Miliary
5. Fibro-cavernous

The results are evaluated using unweighted Cohen's Kappa and accuracy.

The SVR task aims to predict the degree of severity of TB cases. Given a TB patient, the main goal is to predict the severity score based on the patient's 3D CT scan. The degree of severity is modeled according to 5 discrete values, from 1 ("critical/very bad") to 5 ("very good"). The score is simplified so that values 1, 2 and 3 correspond to the "high severity" class, and values 4 and 5 correspond to "low severity".
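As an illustration of the two TBT evaluation measures named above, the following minimal sketch computes accuracy and unweighted Cohen's Kappa from two label lists. It is not the organizers' evaluation script; the function names and the toy labels are assumptions made for this example.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's Kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected by chance, from the two marginal distributions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: a single label everywhere, so agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (hypothetical, not taken from the task data).
toy_true = [1, 1, 2, 2]
toy_pred = [1, 2, 2, 2]
toy_kappa = cohen_kappa(toy_true, toy_pred)
```

A Kappa close to 0, such as the value of about 0.0629 reported in the abstract, means that the agreement with the ground truth is only slightly above what chance alone would give.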

The classification problem is evaluated using ROC curves (AUC) produced from the probabilities provided by the participants. For the regression problem, the root mean square error (RMSE) is used.
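The SVR score simplification and the two SVR measures can be sketched as follows. This is an illustrative sketch rather than the official evaluation code: the function names are assumptions, and the AUC is computed here through its rank interpretation (the probability that a positive sample receives a higher score than a negative one, ties counting one half), which is equivalent to the area under the ROC curve.

```python
import math


def severity_class(score):
    """Map a severity score (1..5) to the simplified binary class."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("severity score must be an integer from 1 to 5")
    return "high" if score <= 3 else "low"


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def auc(labels, scores):
    """AUC for binary labels (1 = positive, 0 = negative) and real scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For instance, with the hypothetical labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four positive/negative pairs are ordered correctly.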

3 Our contribution

We propose to extract semantic descriptors from 3D CT scans. We noticed that participants in the ImageCLEF 2017 TBT task used each extracted slice as a separate sample. Thus, hundreds of slices are treated as separate learning samples although they represent the same patient, which introduces a lot of noise. In addition, each slice is assigned the label of the patient (the TB type), even when its content carries no information that helps identify the type of TB. This introduces more noise. Most participants of ImageCLEF 2017 [11] highlighted this problem and its impact on the results.

To overcome this problem, we believe that the simplest solution is to produce a single descriptor for each patient. This constitutes the key idea of our contribution.

Our proposed system goes through three main stages:

1. Input data pre-processing
2. Features extraction
3. Learning a classification model

We detail each step in the following.

3.1 Input data pre-processing

Recall that in both tasks, 3D CT scans are provided in compressed NIfTI format. First, we decompress the files and extract the slices. We thus obtain three sets of slices corresponding to the three dimensions of the 3D image. For each dimension and each NIfTI image, we obtain between 50 and 400 slices as JPEG images.
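To make the idea of three sets of slices concrete, the sketch below cuts a 3D volume along each of its three axes. It is only an illustration on a plain nested list indexed as `volume[x][y][z]`; it does not read NIfTI files or reproduce the conversion tool used for the experiments, and the function name is an assumption.

```python
def slices_along(volume, axis):
    """Return the list of 2D slices of a 3D nested-list volume.

    volume is indexed as volume[x][y][z]; axis 0, 1 and 2 stand for the
    X, Y and Z dimensions respectively.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [[[volume[x][y][z] for z in range(nz)] for y in range(ny)]
                for x in range(nx)]
    if axis == 1:
        return [[[volume[x][y][z] for z in range(nz)] for x in range(nx)]
                for y in range(ny)]
    if axis == 2:
        return [[[volume[x][y][z] for y in range(ny)] for x in range(nx)]
                for z in range(nz)]
    raise ValueError("axis must be 0, 1 or 2")


# Toy 2 x 3 x 4 volume whose voxel value encodes its own coordinates.
toy_volume = [[[100 * x + 10 * y + z for z in range(4)] for y in range(3)]
              for x in range(2)]
y_slices = slices_along(toy_volume, 1)  # 3 slices of size 2 x 4
```

The number of slices obtained for a dimension equals the size of the volume along that dimension, which is why it differs from one dimension to another and from one scan to another.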

The visual content of the images extracted from the different dimensions is not similar, since the images of each dimension are taken from a different viewing angle. We noticed in our experiments that the slices of the Y dimension give better results than those of the other two (X and Z). However, the following steps can be applied to slices of any of the three dimensions.

Figure 1. Pre-processing pipeline: CT scans in NIfTI format (.nii.gz) → extracting slices (converting NIfTI to JPG) → image slices along the 3 dimensions (X, Y, Z) → selecting a dimension → filtering → 60 selected slices per CT image / patient.

On the other hand, not all slices contain information that is useful to identify the type of TB. It is therefore essential to filter the slices and keep only those that may be informative. Moreover, since we want to extract a single descriptor per patient, we must keep the same number of slices for each patient. We found that there are usually at most 60 visually informative slices. Since the slices are ordered, the 60 most informative ones are usually at the center of the list. We therefore propose to keep the 60 middle slices. This is not optimal, but we opted for this choice to keep the approach fully automatic. It could be improved by manual filtering performed by a human expert, preferably with medical knowledge of TB. Figure 1 summarizes the process.
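The selection of the 60 middle slices can be sketched as below. The function name and the behaviour for scans with fewer slices than requested are assumptions: the text does not say how such scans are handled, so this sketch simply raises an error.

```python
def middle_slices(slices, keep=60):
    """Keep the `keep` slices located at the center of an ordered list."""
    n = len(slices)
    if n < keep:
        # Not specified in the text; padding or resampling would be other options.
        raise ValueError("scan has fewer than %d slices" % keep)
    start = (n - keep) // 2
    return slices[start:start + keep]


# With 100 ordered slices, indices 20 to 79 are kept.
selected = middle_slices(list(range(100)))
```

Keeping a fixed number of slices guarantees that every patient ends up with a descriptor of the same length.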

3.2 Features extraction

After slice extraction and filtering, we propose to extract a single descriptor per patient. Transfer learning is an interesting option in this context. The results of SGEast [11] and other teams in the same task of ImageCLEF 2017 proved the efficiency of this approach [4, 11]. Indeed, SGEast opted for transfer learning and exploited the output of a layer of a ResNet-50 [8] deep learner. However, this idea raises a problem of descriptor size: SGEast, for example, considered one descriptor per slice and not per patient. Since we want a single descriptor per patient, the information extracted from each slice must not be too large. Therefore, we propose to describe each slice by semantic information. This idea is inspired by the work presented in [7].

We therefore choose to exploit the probabilities predicted by a deep learner trained on the set of slices. If K is the number of classes considered, this information corresponds to the K predicted probability values for the K classes (five probabilities for the five types in the TBT task, or for the five severity degrees in the SVR task). We thus obtain K values for each slice.
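As a small illustration of what the K values per slice look like, the sketch below turns K raw class scores into K probabilities with a softmax. That the network's final layer is a softmax is an assumption made for this example (it is the usual choice for classification networks), and the input scores are hypothetical.

```python
import math


def softmax(logits):
    """Convert K raw class scores into K probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores of one slice for K = 5 classes.
slice_probs = softmax([2.0, 1.0, 0.1, 0.0, -1.0])
```

Each slice is thus summarized by only K numbers, which keeps the per-patient descriptor small.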

Figure 2. Semantic feature extraction for one patient. A deep learner is trained on slices and their labels (K classes: C-1, ..., C-K). After pre-processing, each of the 60 slices of the patient's CT image receives a probability score Pr-i,j (score of the i-th slice for the j-th class). Each semantic sub-descriptor D-j gathers Pr-1,j, ..., Pr-60,j, and the final semantic descriptor of the patient is the concatenation of all sub-descriptors D-1, ..., D-K.

K sub-descriptors are then generated: D1, D2, ..., DK. Each sub-descriptor Di contains the predicted probabilities of class i for all the slices of the patient. A final semantic descriptor is constructed by concatenating the K sub-descriptors. Figure 2 details the process of semantic feature extraction for one patient.

3.3 Learning a classification model

In this step, we exploit the semantic descriptors of patients obtained in the previous step. Any supervised classification approach can be applied, as shown in Figure 3.
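The construction of the per-patient descriptor described above can be sketched as follows: the per-slice probability vectors are regrouped by class into K sub-descriptors, which are then concatenated. The function name and the toy probabilities are assumptions for this example; with 60 slices and K = 5, the resulting descriptor has 300 values.

```python
def semantic_descriptor(slice_probs):
    """Build one descriptor per patient from per-slice class probabilities.

    slice_probs[i][j] is the probability of class j for slice i, with the
    slices given in their original order. The result is D1 + D2 + ... + DK,
    where Dj lists the probabilities of class j over all slices.
    """
    if not slice_probs:
        raise ValueError("at least one slice is required")
    k = len(slice_probs[0])
    if any(len(p) != k for p in slice_probs):
        raise ValueError("all slices must have the same number of classes")
    descriptor = []
    for j in range(k):
        descriptor.extend(p[j] for p in slice_probs)  # sub-descriptor Dj
    return descriptor


# Toy patient with 3 slices and K = 2 classes (hypothetical values).
toy_descriptor = semantic_descriptor([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
```

The resulting fixed-length vector is what a supervised learner receives as the single sample representing the patient.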

Figure 3. Learning a classification model. Semantic descriptors (1 to n) are extracted from the labeled CT images of the training corpus and used to train a supervised learner. For a test patient, the semantic descriptor extracted from the CT image is given to the learned model, which outputs the probabilities Pr-1, ..., Pr-K of the classes C-1, ..., C-K and predicts one of the K classes.

We describe in the following sections our runs submitted to the TBT and SVR tasks.

We implemented the semantic descriptor approach described in Section 3, using the following tools:

- The Caffe framework [10] for deep learning;
- Weka [6] for testing several learning and classification algorithms;
- med2image [1] for converting NIfTI medical images to the classic JPEG format.

We chose to use slices of the Y dimension because our experiments showed that they are more suitable than those of the other two dimensions and give better results.

For descriptor extraction, our approach consists in training a deep model to generate the semantic information. Unfortunately, we had problems with the machines used to train our deep learner and, due to lack of time, we could not complete the training. As an alternative, we used the model proposed by the SGEast team [11] for the ImageCLEF 2017 TBT task, which is publicly available [2]. It is based on a ResNet-50 [8] and obtained the best results in the TBT task of the 2017 edition. We therefore exploited the outputs of the last layer (named prob) of the ResNet-50, which correspond to the probabilities of the 5 considered classes.

4.1 TBT task

Dataset: The dataset used in the TBT task includes chest CT scans of TB patients along with the TB type. Some patients have more than one scan. All scans belonging to the same patient present the same TB type. Table 1 summarizes the distribution of CT scans according to the five types of TB considered.

Experimental protocol: We used the training collection provided by the organizers and split it into two sub-collections: 80% for training and 20% as a validation set. In all our runs, we exploited the semantic descriptors generated as previously described. We tested several learners in the classification step and finally submitted three main runs. The other submissions are variants or fusions of these three runs:

- Run 1 (TBT mostaganemFSEI run1): random forest as supervised classifier. We tuned two parameters: the number of iterations performed and the number of randomly selected features;
- Run 2 (TBT mostaganemFSEI run2): bagging of a set of random forest learners. We tuned the number of learners for the bagging and the same two random forest parameters as in Run 1;
- Run 4 (TBT mostaganemFSEI run4): hierarchical classification. We organized the five classes into a hierarchical structure as described in Figure 4. We created two new virtual classes, V1 and V2: V1 groups the classes Type 1 and Type 2, and V2 contains the classes Type 3, Type 4 and Type 5. We reorganized our collections in order to perform classification at two levels. At the first level, we classify the samples into the two virtual classes V1 and V2. At the second level, we classify each sample among the classes belonging to the virtual class predicted at the first level. In both classification processes, we used a random forest learner, tuning its two parameters as described for Run 1.
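The two-level decision used in the hierarchical run can be sketched as below. The grouping of the types is assumed to be V1 = {Type 1, Type 2} and V2 = {Type 3, Type 4, Type 5}. The classifiers are passed in as plain callables, and the toy classifiers at the end are hypothetical stand-ins for trained random forest models, included only to make the sketch runnable.

```python
# Assumed grouping of the five TB types into the two virtual classes.
GROUPS = {
    "V1": ("Type 1", "Type 2"),
    "V2": ("Type 3", "Type 4", "Type 5"),
}


def hierarchical_predict(descriptor, level1, level2):
    """Predict a virtual class first, then a type inside that class.

    level1: callable mapping a descriptor to "V1" or "V2".
    level2: dict mapping each virtual class to a callable that maps a
            descriptor to one of the types of that class.
    """
    group = level1(descriptor)
    if group not in GROUPS:
        raise ValueError("unknown virtual class: %r" % (group,))
    label = level2[group](descriptor)
    if label not in GROUPS[group]:
        raise ValueError("%r does not belong to %s" % (label, group))
    return group, label


# Hypothetical stand-ins for the trained first- and second-level models.
def toy_level1(d):
    return "V1" if d[0] >= 0.5 else "V2"


TOY_LEVEL2 = {
    "V1": lambda d: "Type 1" if d[1] >= 0.5 else "Type 2",
    "V2": lambda d: "Type 3" if d[1] >= 0.5 else "Type 5",
}
```

One consequence of this design is that an error made at the first level cannot be corrected at the second level, since the second classifier only chooses among the types of the predicted virtual class.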

Figure 4. Hierarchy used for the TBT classification. First level: V-1, V-2. Second level: Type 1, Type 2, Type 3, Type 4, Type 5.

Results: Table 2 shows the results obtained by our runs on validation collection.

As the validation results show, Run 4 was our best submission. It also obtained the best results on the test collection compared to Run 1 and Run 2.

[...] task in terms of kappa coefficient and accuracy, respectively.

Figure: TBT_Task, Kappa coefficient (y-axis) of each submitted run (x-axis, RUN); mean = 0.08.

Although our submissions are not well ranked compared to those at the top of the list, we note that several of the top-ranked runs belong to the same teams and probably do not differ much from one another. On the other hand, recall that our semantic descriptors were extracted using a model that was not very well trained. In fact, we encountered problems with our machines during the training of our own deep learner. Moreover, although the model deployed by SGEast obtained the best results in the ImageCLEF 2017 TBT task, we were not able to perform exactly the same pre-processing as this team, as described in [11]. We believe that our semantic descriptors could give better results if they were extracted with a better adapted and well-trained deep model.

Figure: TBT_Task, accuracy (y-axis) of each submitted run (x-axis); mean = 0.30.

4.2 SVR task

Dataset: [...] with the corresponding severity score (1 to 5). Scores from 1 to 3 correspond to the "High" severity, whereas scores 4 and 5 refer to the "Low" degree of severity. Table 4 summarizes the distribution of CT scans according to the two severity classes.

Experimental protocol: We first generated the semantic descriptors following the approach described in Section 3. For the prediction of TB severity scores, we treated the problem as a classification problem, using two approaches:

1. Multi-class classification: we considered the five scores as separate classes. We then tested several classifiers and selected the two that were the most effective: random forest and bagging of a set of random forest learners.
2. Hierarchical classification: we organized our data according to the hierarchy described in Figure 7 and carried out a two-level hierarchical classification. At the first level, the samples are classified into the "High" or "Low" class. At the second level, the samples are reclassified into the child classes of the class predicted at the first level.

Figure 7. Hierarchy used for the SVR classification. First level: High, Low. Second level: 1, 2, 3, 4, 5.

Our runs are the following:

1. Run 1 (SVR mostaganemFSEI run1): multi-class model using random forest as classifier. We tuned two parameters: the number of iterations performed and the number of randomly chosen features;
2. Run 2 (SVR mostaganemFSEI run2): multi-class model using bagging of a set of random forest learners with sub-sampling of the main training collection. We created two sub-collections by balancing the number of samples for the 5 classes. We then merged the results obtained on the two sub-collections;
3. Run 3 (SVR mostaganemFSEI run3): hierarchical classification using bagging of a set of random forest learners at each level of the hierarchical classification process;
4. Run 4 (SVR mostaganemFSEI run4): fusion of Run 1 and Run 2;
5. Run 6 (SVR mostaganemFSEI run6): fusion of Run 3 and Run 1.

Results: Table 5 shows the results obtained by our runs on the validation collection.

Table 5. Runs: Run 1 (SVR mostaganemFSEI run1), Run 2 (SVR mostaganemFSEI run2), Run 3 (SVR mostaganemFSEI run3), Run 4 (SVR mostaganemFSEI run4), Run 6 (SVR mostaganemFSEI run6).

We can see that Run 3 obtained our best results in terms of RMSE, both on the validation collection and on the test data. In terms of AUC, however, Run 2 seems to be more efficient.

Figure: SVR_Task, RMSE (y-axis) of each submitted run (x-axis); mean = 1.064.

[...] task in terms of RMSE and AUC values, respectively.

We can see that our best run is ranked 12th out of 36 submissions. However, the difference between the performances of the 12 best runs is not very significant. Recall that our best result was achieved by a hierarchical classification approach using bagging of random forest learners at each level of the hierarchy. We believe that our approach could give better results with a well-trained deep model in the semantic feature extraction step.

Figure: SVR_Task, AUC (y-axis) of each submitted run (x-axis, RUN); mean = 0.64.

Conclusion and future works

We have described in this article our contributions to the TBT and SVR tasks of ImageCLEF Tuberculosis 2018. We proposed an approach that consists in extracting a single semantic descriptor for each CT image / patient instead of considering all the slices as separate samples. Unfortunately, we could not complete the training of our own deep learner. Nevertheless, the results obtained suggest that this approach could be much more efficient and give more interesting results if it is applied with a well-trained deep model.

As future work, we plan to adopt strategies for enriching and selecting learning samples. Indeed, one characteristic of the problem addressed in the SVR and TBT tasks is the nature of the provided data collections, which are small and noisy because of the presence of many slices that do not contain useful information. The bagging and sub-sampling strategies adopted in our experiments confirmed this. In addition, we noticed during the sub-sampling of our data that deleting or adding some samples had an impact on the results. Furthermore, effectively filtering slices to keep only those that are truly informative is a key idea that could further improve system performance, as reported by several participating teams [11]. Finally, we noticed in our experiments that the precision achieved differs from one class to another: some classes are more difficult to identify than others. This is also an interesting track to study.

1. med2image: https://github.com/fnndsc/med2image. Last checked: 30/05/2018.

2. SGEast model for ImageCLEF 2017 tuberculosis task: https://github.com/maizesix92/imageclef2017 tb sgeast. Last checked: 30/05/2018.

3. Cid, Y.D., Kalinovsky, A., Liauchuk, V., Kovalev, V., Müller, H.: Overview of the ImageCLEF 2017 tuberculosis task - predicting tuberculosis type and drug resistances. In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11-14, 2017 (2017), http://ceur-ws.org/Vol1866/invited paper 1.pdf

4. Dicente Cid, Y., Kalinovsky, A., Liauchuk, V., Kovalev, V., Müller, H.: Overview of ImageCLEFtuberculosis 2017 - predicting tuberculosis type and drug resistances. In: CLEF2017 Working Notes. CEUR Workshop Proceedings, CEUR-WS.org <http://ceur-ws.org>, Dublin, Ireland (September 11-14 2017)

5. Dicente Cid, Y., Liauchuk, V., Kovalev, V., Müller, H.: Overview of ImageCLEFtuberculosis 2018 - detecting multi-drug resistance, classifying tuberculosis type, and assessing severity score. In: CLEF2018 Working Notes. CEUR Workshop Proceedings, CEUR-WS.org <http://ceur-ws.org>, Avignon, France (September 10-14 2018)

6. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explorations 11(1), 10–18 (2009)

7. Hamadi, A., Mulhem, P., Quénot, G.: Extended conceptual feedback for semantic multimedia indexing. Multimedia Tools Appl. 74(4), 1225–1248 (2015). https://doi.org/10.1007/s11042-014-1937-y

8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)

9. Ionescu, B., Müller, H., Villegas, M., de Herrera, A.G.S., Eickhoff, C., Andrearczyk, V., Cid, Y.D., Liauchuk, V., Kovalev, V., Hasan, S.A., Ling, Y., Farri, O., Liu, J., Lungren, M., Dang-Nguyen, D.T., Piras, L., Riegler, M., Zhou, L., Lux, M., Gurrin, C.: Overview of ImageCLEF 2018: Challenges, datasets and evaluation. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018), LNCS Lecture Notes in Computer Science, Springer, Avignon, France (September 10-14 2018)

10. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)

11. Sun, J., Chong, P., Tan, Y.X.M., Binder, A.: ImageCLEF 2017: ImageCLEF tuberculosis task - the SGEast submission. In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11-14, 2017 (2017), http://ceur-ws.org/Vol-1866/paper 130.pdf