=Paper= {{Paper |id=Vol-2551/xpreface |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-2551/xpreface.pdf |volume=Vol-2551 }} ==None== https://ceur-ws.org/Vol-2551/xpreface.pdf
Preface

This volume contains the papers presented at HSDM20: ACM WSDM Workshop
on Health Search and Data Mining, held on February 3, 2020, in Houston.
    There were 7 submissions. Each submission was reviewed by at least 2, and
on average 2.9, program committee members. The committee decided to
accept 6 papers. The program also includes 2 invited talks.
    There are many interesting challenges in delivering intelligent decision
support in the health domain. Collections of documents such as health records,
scholarly publications, clinical trials, or drug orders grow at high rates and are
distributed around the globe in a fragmented manner. Health data is highly
multi-modal (clinical notes, time series, medical images, genomics, etc.) and its
interpretation is domain specific. Users of health information systems have
different levels of expertise and information needs, e.g., a patient vs. a
primary care physician vs. a cancer researcher. At the same time, the data is
highly sensitive and subject to legal requirements regarding privacy, security,
and confidentiality. This breadth of challenges requires interdisciplinary
approaches. The Information Retrieval (IR) and Data Mining (DM) communities are
particularly well positioned to tackle these problems.
    Search, recommendation, and information extraction systems help lay and
expert users explore ever-growing collections. Decision support systems assist in
complex decision-making processes. Intelligent user interfaces present the right
information at the right time and allow for unobtrusive interaction all the way
from the lab to the bedside. Mobile device applications and other sensors help
provide a more holistic view of the patient’s case than what can be gleaned in
a 10-minute physician interview.
    Health-related topics of interest include, among others:

 – Search over images/genomics/structured data
 – Federated multi-modal search combining different data sources
 – User interfaces for biomedical/clinical search supporting complex
   information needs
 – Analysis of search logs and social media
 – User search behavior studies
 – Building and use of medical knowledge bases or ontologies
 – Privacy-preserving techniques for clinical data
 – Adverse event detection and prediction
 – Mobile (mHealth) applications
 – Wearables
 – Spoken interaction with health data
 – Whole exposome modeling and estimation
 – Applications of data mining and machine learning
 – Ethics, bias, and fairness

    This workshop was sponsored by UPMC Enterprises.



February 3, 2020                Yubin Kim
Houston, Texas, USA        Carsten Eickhoff
                               Ryen White




Table of Contents

Machine Learning for Healthcare: Beyond i.i.d. Prediction
   Zachary Lipton
Applying Information Retrieval to the Electronic Health Record for
Cohort Discovery and Rare Disease Detection
   William Hersh
Comparing Rule-based, Feature-based and Deep Neural Methods for
De-identification of Dutch Medical Records
   Jan Trienes, Dolf Trieschnigg, Christin Seifert and Djoerd Hiemstra
Healthcare NER Models Using Language Model Pretraining
   Amogh Kamat Tarcar, Aashis Tiwari, Dattaraj Rao, Vineet Naique
   Dhaimodker, Penjo Rebelo and Rahul Desai
Lung nodule classification using Convolutional Autoencoder and
Clustering Augmented Learning Method (CALM)
   Soumya Suvra Ghosal, Indranil Sarkar and Issmail El Hallaoui
A Query Taxonomy Describes Performance of Patient-Level Retrieval
from Electronic Health Record Data
   Steve Chamberlin, Steven Bedrick, Aaron Cohen, Yanshan Wang,
   Andrew Wen, Sijia Liu, Hongfang Liu and William Hersh
Streaming Gait Assessment for Parkinson’s Disease
   Cristopher Flagg, Ophir Frieder, Sean MacAvaney and Gholam Motamedi
Clustering Large-scale Diverse Electronic Medical Records to Aid
Annotation for Generic Named Entity Recognition
   Nithin Haridas and Yubin Kim




Program Committee

Steven Bedrick                OHSU
Leonid Boytsov                3M M*Modal
Dina Demner                   U.S. National Library of Medicine, NIH
Carsten Eickhoff              Brown University
Alba García Seco De Herrera   University of Essex
Yubin Kim                     UPMC Enterprises
Bevan Koopman                 CSIRO
Henning Müller                HES-SO
Joao Palotti                  Qatar Computing Research Institute
Zhen Qin                      Google
Kirk Roberts                  The University of Texas Health Science Center at Houston
Karin Verspoor                The University of Melbourne
Wei Wei                       University of Pittsburgh Medical Center
Ryen White                    Microsoft
Elad Yom-Tov                  Microsoft




Additional Reviewers


                       Sergeeva, Elena



