=Paper= {{Paper |id=Vol-2619/paper4 |storemode=property |title=Report of the First International Workshop on Semantic Indexing and Information Retrieval for Health from heterogeneous content types and languages |pdfUrl=https://ceur-ws.org/Vol-2619/paper4.pdf |volume=Vol-2619 |authors=Francisco Couto,Martin Krallinger |dblpUrl=https://dblp.org/rec/conf/ecir/CoutoK20a }} ==Report of the First International Workshop on Semantic Indexing and Information Retrieval for Health from heterogeneous content types and languages== https://ceur-ws.org/Vol-2619/paper4.pdf
            Report of the First International Workshop on
           Semantic Indexing and Information Retrieval for
            Health from heterogeneous content types and
                         languages (SIIRH) ?

                           Francisco M. Couto1[0000−0003−0627−1496] and Martin
                                     Krallinger2[0000−0002−2646−8782]
                     1
                      LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
                                           fcouto@di.fc.ul.pt
            2
              Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi
                     Girona 29-31, 08034, Barcelona, Spain martin.krallinger@bsc.es



                    Abstract. This article briefly summarizes the talks and discussions that
                    occurred during the first edition of the International Workshop on Se-
                    mantic Indexing and Information Retrieval for Health from heteroge-
                    neous content types and languages (SIIRH). The workshop was a virtual
                    event held on April 14, 2020 in conjunction with the 42nd European Con-
                    ference on Information Retrieval (ECIR2020). The article also presents
                    the main conclusions and future perspectives of the field taking into
                    account the discussions that occurred during the event. All the docu-
                    ments and videos related to the workshop are available at the workshop
                    site: https://sites.google.com/view/siirh2020/.

                    Keywords: Semantic Indexing · Ontologies · Controlled Vocabularies
                    · Information Retrieval · Text Mining · Natural Language Processing ·
                    Biomedical Informatics


           1      Introduction

           The first edition of the International Workshop on Semantic Indexing and In-
           formation Retrieval for Health from heterogeneous content types and languages
           (SIIRH)[2] was a virtual event held on April 14, 2020 in conjunction with the
           42nd European Conference on Information Retrieval (ECIR2020). Semantic In-
           dexing and health-related IR are topics of particular interest for the community
           that participates in the European Conference on Information Retrieval. This
           is demonstrated by the topics of interest in the call of papers of ECIR 2020
           that cover this workshop scope, namely natural language processing and domain
           specific search.
            ?
                Supported by FCT through funding of the DeST: Deep Semantic Tagger project, ref.
                PTDC/CCI-BIO/28685/2017, and LaSIGE Research Unit, ref. UIDB/00408/2020




Copyright ©  2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
           2        F. Couto and M. Krallinger


                                           Fig. 1. SIIRH2020 Program




               On March 11, 2020 the organizers decided to transform ECIR2020 as a open
           live event due to the worldwide COVID-19 situation. Due to this change, SIIRH
           and all other ECIR workshops were also transformed in virtual events [8]. SIIRH
           organization decided to have a three hour programme (2pm-5pm) with short
           presentations in order to maintain the event as most interactive as possible and
           deal with presenters from different timezones (from GMT-5 to GMT+11) (see
           Figure 1). Some additional pre-recorded talks with more details about the works
           were also provided. All videos were made available as a YouTube playlist3 . The
           event was held using Zoom and reached more than 50 online participants. The
           preliminary proceedings were also published online4 .
               Given the COVID-19 situation the keynotes talks, and in many other discus-
           sions, focused on solutions on how semantic indexing and information retrieval

            3
                https://www.youtube.com/playlist?list=PL6RYRv3A1tLwpD4aTbSVraUZNITbkRxZd
            4
                https://drive.google.com/open?id=1-sF_0R3uGinq5ybcAM5H54k5yujJE8o6




Copyright ©  2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                                     Proposal of the First SIIRH Workshop   3

           systems can help in this health crisis. The solutions allow the scientific commu-
           nity to better deal with the huge amount of information that has to be processed
           and analyzed, to find ways to contain the spread of a virus as soon as possible.


           2     Program

           The workshop program was divided as following:

              – Opening Session
              – MESINESP/Plan TL
              – Full Papers
              – Short Papers
              – Closing Remarks

           The full program is available online5 .


           2.1    Opening Session

           This session included initial remarks from Francisco Couto about the organiza-
           tion of the workshop, and the importance of the field, specially in times of world
           health crisis as we face nowadays.
               The session included the keynote talk entitled CoronaTracker: A framework
           for managing and tracking data during crisis by Cher Han Lau. The talk pre-
           sented the relevance and collaborative work6 done to track all the information
           related COVID-19, including a multi-lingual perspective. The video is available
           on YouTube7 .


           2.2    MESINESP/Plan TL Session

           This session was chaired by Andre Lamurias and started with a keynote talk enti-
           tled BioASQ: The challenge and the community of biomedical semantic indexing
           and question answering by Georgios Paliouras. The talk showed the importance
           of international challenges, such as BioASQ8 , to improve the performance of
           current solutions of semantic indexing and question answering. The video is
           available on YouTube9 .
               The session ended with a presentation entitled MESINESP: Medical Seman-
           tic Indexing in Spanish: current and future directions by Martin Krallinger,
           where he demonstrated the relevance of the multi-lingual perspective to commu-
           nity challenges on the field. The video is available on YouTube10 .
            5
              https://drive.google.com/open?id=1ds5F7WUR5GZAcjKVcEqS-NVuxy14tDLU
            6
              https://www.coronatracker.com/
            7
              https://youtu.be/DJtbQfke7A0
            8
              http://bioasq.org/
            9
              https://youtu.be/GtaEtt3OwCY
           10
              https://youtu.be/4EsNf_UYheM




Copyright ©  2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
           4        F. Couto and M. Krallinger

           2.3    Full Papers Session
           This session was chaired by Francisco Couto and included the presentations of
           the three works that were accepted as full papers.
              The first work entitled First Steps Towards Patient-Friendly Presentation
           of Dutch Radiology Reports was presented by Koen Dercksen [3]. The video is
           available on YouTube11 .
              The second work entitled Enriching Consumer Health Vocabulary Using En-
           hanced GloVe Word Embedding was presented by Mohammed Ibrahim [7]. The
           video is available on YouTube12 .
              The third work entitled SmokPro: Towards Tobacco Product Identification in
           Social Media Text was presented by Kartikey Pant [4]. The video is available on
           YouTube13 .
              The session ended with a keynote talk entitled The COVID-19 Open Research
           Dataset by Kyle Lo and Lucy Lu Wang where they showed their large-scale and
           recent effort14 in creating a corpus containing information related COVID-19.
           There was also a discussion about the importance and advantages of including
           multi-lingual documents on such efforts. The video is available on YouTube15 .

           2.4    Short Papers Session
           This session was chaired by Martin Krallinger and included the presentations of
           the three works that were accepted as short papers.
              The first work entitled Twitter goes to the Doctor: Detecting Medical Tweets
           using Machine Learning and BERT was presented by Kevin Roitero [5]. The
           video is available on YouTube16 .
              The second work entitled Biomedical Question Answering using Extreme
           Multi-Label Classification and Ontologies in the Multilingual Panorama was pre-
           sented by André Neves [1]. The video is available on YouTube17 .
              The third work entitled Towards a multilingual corpus for Named Entity
           Linking evaluation in the clinical domain was presented by Pedro Ruas [9]. The
           video is available on YouTube18 .

           2.5    Closing Remarks Session
           Martin Krallinger ended the workshop summarizing the main ideas discussed
           during the event, and pointing out future venues for the advancement of the
           field, and how it can positively impact the health sector.
           11
              https://youtu.be/N461QEG9r3M
           12
              https://youtu.be/9GfhtivnONQ
           13
              https://youtu.be/96lFloRSUYI
           14
              https://pages.semanticscholar.org/coronavirus-research
           15
              https://youtu.be/geX4hSRW2vA
           16
              https://youtu.be/I-SzgxU3KdM
           17
              https://youtu.be/G8YttYTn89Q
           18
              https://youtu.be/1SE7PY3sFtA




Copyright ©  2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                                     Proposal of the First SIIRH Workshop          5

           3     Conclusions

           The workshop was a venue for the different types of contributors, mainly task
           providers and solution providers, to meet together and exchange their experi-
           ences. Thus, the main outcome was the gathering of a group of researchers with
           hands-on expertise on developing Information Retrieval solutions based on se-
           mantic indexing for Life and Health Sciences, which together discussed how to
           define a road map of what challenges the community should address to produce
           more efficient and robust solutions.
               The main challenge in the field is to motivate and encourage more IR re-
           searchers to work with heterogeneous health-related content types in multiple
           languages. Given that, it is critical to provide training corpora and search so-
           lutions for non-English content as well as cross-language or multilingual IR so-
           lutions [10], as well as exploitation of evaluation settings and data collections
           generated through these kind of efforts (both during the evaluation period and
           afterwards) [6]. We expect that further investigation on the topics will continue
           after the workshop, based on new insights obtained through discussions during
           the event.


           References

              1. André Neves, A.L., Couto, F.: Biomedical question answering using extreme multi-
                 label classification and ontologies in the multilingual panorama. In: Proceedings of
                 the International Workshop on Semantic Indexing and Information Retrieval for
                 Health from heterogeneous content types and languages (SIIRH 2020) (2020)
              2. Couto, F., Krallinger, M.: Proposal of the first international workshop on seman-
                 tic indexing and information retrieval for health from heterogeneous content types
                 and languages (SIIRH). In: Proceedings of the 42nd European Conference on In-
                 formation Retrieval (ECIR 2020) (2020)
              3. Dercksen, K., de Vries, A.P.: First steps towards patient-friendly presentation of
                 dutch radiology reports. In: Proceedings of the International Workshop on Se-
                 mantic Indexing and Information Retrieval for Health from heterogeneous content
                 types and languages (SIIRH 2020) (2020)
              4. Himakar Yv, K.P., Mamidi, R.: SmokPro: Towards tobacco product identification
                 in social media text. In: Proceedings of the International Workshop on Semantic
                 Indexing and Information Retrieval for Health from heterogeneous content types
                 and languages (SIIRH 2020) (2020)
              5. Kevin Roitero, Cristian Bozzato, V.D.M.S.M., Serra, G.: Twitter goes to the doc-
                 tor: Detecting medical tweets using machine learning and BERT. In: Proceedings
                 of the International Workshop on Semantic Indexing and Information Retrieval for
                 Health from heterogeneous content types and languages (SIIRH 2020) (2020)
              6. Marimon, M., Gonzalez-Agirre, A., Intxaurrondo, A., Rodrguez, H., Lopez Martin,
                 J., Villegas, M., Krallinger, M.: Automatic de-identification of medical texts in
                 spanish: the meddocan track, corpus, guidelines, methods and evaluation of results.
                 In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019). vol.
                 TBA, p. TBA. CEUR Workshop Proceedings (CEUR-WS. org), Bilbao, Spain (Sep
                 2019), TBA (2019)




Copyright ©  2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
           6        F. Couto and M. Krallinger

            7. Mohammed Ibrahim, Susan Gauch, O.S., Alqahatani, M.: Enriching consumer
               health vocabulary using enhanced GloVe word embedding. In: Proceedings of the
               International Workshop on Semantic Indexing and Information Retrieval for Health
               from heterogeneous content types and languages (SIIRH 2020) (2020)
            8. Nunes, S., Little, S., Bhatia, S., Boratto, L., Cabanac, G., Campos, R., Couto,
               F.M., Faralli, S., Frommholz, I., Jatowt, A., Jorge, A., Marras, M., Mayr, P.,
               Stilo, G.: ECIR 2020 workshops: Assessing the impact of going online. Tech. Rep.
               arXiv:2005.06748 [cs.IR], arXiv.org (2020)
            9. Pedro Ruas, A.L., Couto, F.: Towards a multilingual corpus for named entity link-
               ing evaluation in the clinical domain. In: Proceedings of the International Workshop
               on Semantic Indexing and Information Retrieval for Health from heterogeneous
               content types and languages (SIIRH 2020) (2020)
           10. Villegas, M., Intxaurrondo, A., Gonzalez-Agirre, A., Marimon, M., Krallinger, M.:
               The MeSpEN resource for english-spanish medical machine translation and termi-
               nologies: census of parallel corpora, glossaries and term translations. In: Proceed-
               ings of the LREC 2018 Workshop “MultilingualBIO: Multilingual Biomedical Text
               Processing”, Paris, France. European Language Resources Association (ELRA)
               (2018)




Copyright ©  2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).