=Paper=
{{Paper
|id=Vol-3124/paper16
|storemode=property
|title=Challenges for Automation of Public Health Data Analysis
|pdfUrl=https://ceur-ws.org/Vol-3124/paper16.pdf
|volume=Vol-3124
|authors=Ravi Shankar
|dblpUrl=https://dblp.org/rec/conf/iui/Shankar22
}}
==Challenges for Automation of Public Health Data Analysis==
Ravi Shankar
Grenoble, France
rsps1001@gmail.com
Abstract
Advancements in Machine Learning and Data Science are not adequately reflected in how
public health data is handled today. There is a visible gap between the advances in computing
and medical sciences. In this position paper, we present an example of data science applied to
the automation of a repetitive process within a cervical cancer screening program. We discuss
the challenges for automating public health data and share our insights to elevate artificial
intelligence (AI) in public healthcare.
Keywords
Public health, Data analysis, Cancer research, Automation
1. Introduction

More than 80% of the cervical cancer cases and deaths in a year occur in low- and
middle-income countries (LMICs), where prevention and cervical screening resources are
limited [1][2]. Recent research studies have used machine learning models to support the
initial phase of screening for detection of cancerous lesions using colposcopic images or
cervicography [3][4]. These techniques require tech-savvy healthcare workers, who are very
scarce per capita in these countries.

We aim to build a user-friendly automation that would allow medical experts to diagnose
cancerous tissues of the cervix in a short period of time while reducing the costs and
technical experience required. This idea will work by combining health and AI researchers'
expertise and experiences.

The main problem we aim to address is diagnosing biopsied women within a cervical cancer
program. Our motivation is driven by the importance and time consumption of the pathology
process (i.e., pathologists reading histological slides). In the pathology process, women
testing positive on screening tests are referred to specialised examination (colposcopy) to
collect biopsy samples from the cervix, and then haematoxylin and eosin (H&E) histological
slides are prepared to be reviewed by pathologists using a microscope. This is an
eye-dependent process; therefore inter- and intra-observer variability is present, and
external revision is often needed as part of quality assurance (QA). The full process
including QA may not be affordable, particularly in LMICs. Hence, AI can help to reduce such
variability while saving time and resources.

Figure 1: Project pipeline illustrating the automation of public health data analysis
involving human reviewers who validate the Machine Learning model's prediction results.

Figure 1 illustrates a proposed pipeline in which we aim to automate the steps that begin
with fetching biopsy-based cervical data within a cervical cancer screening program. We then
pre-process the fetched data, train our machine learning (ML) model to make two or three
prediction sets (ensuring QA) for human reviewers to validate, and finally generate the
reports of the analysis. The current
Joint Proceedings of the ACM IUI Workshops 2022, March 2022,
Helsinki, Finland EMAIL: rsps1001@gmail.com (A. 1)
Copyright © 2022 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
process (highlighted in yellow in Figure 1) excludes these steps for automation (highlighted
in blue in Figure 1).

While the system works with the current process, the automation steps are currently done
manually and repeatedly by a group of pathologists and statisticians. As the ML model does
not exist in the current process, the analysis reports are produced after 2-3 stages of
reviews involving multiple meetings to concur on the results. Including our proposed steps
for automation in the current process will lower the burden on the experts and shorten the
timeframe to as little as 1/20 of that of the current process.

While our proposed project pipeline (Figure 1) forecasts optimal benefits for cervical cancer
screening, in laying the groundwork we were faced with critical technical, ethical, legal,
and (most importantly) end-user-facing challenges. In this Workshop on "Healthy Interfaces
(HEALTHI) 2022," we look forward to discussing our research on automation of public health
data analysis. We hope to share our current challenges, methods, and future plans for
AI-powered healthcare.

2. Challenges for Automation of Public Health Data Analysis

In this section we generalise the problems we faced when implementing our project (Figure 1)
to discuss the challenges for automation of public health data analysis:

1. Trained Data Entry: The first challenge is the considerable effort needed to change the
conventional data entry practices. To automate, it is essential to construct database
constraints, design helpful interfaces, and train non-tech-savvy workers to log complete,
error-free, and correctly formatted data.
2. Patient Privacy: Anonymising the data is important to preserving the privacy of personal
health records of patients who sign up for the study. If possible, this should be thoughtfully
made visible at the level of the interface to both the patients and their clinicians.
3. Data Pre-processing: Data pre-processing is the cleaning and preparation of data for the
model and analysis tasks. This is a time-consuming and underestimated challenge; if done
improperly, it can hinder the performance and accuracy of the model and delay the overall
study.
4. Handling Large Datasets: As public health studies are ongoing processes which include
participants on a rolling basis, they can result in large datasets over the overall period of
the study (which might span several years). It is crucial to prepare for handling the data in
batches for faster training of the model.
5. Cross Validation by Experts: Validation of results is a necessity with respect to training
ML models. In healthcare-related data, cross validation by experts is even more important to
prevent fatal diagnosis errors and to check for any potential biases in the model.
6. Human Control: It is important to have adequate human control so that the confidence in the
predicted results is higher. Enabling human control via the automation process's interface
makes it possible to spot any discrepancies and malfunctions.
7. Transparency: The interface should be made simple and transparent for both non-medical and
other non-tech-savvy stakeholders involved. The entire automation process should be
comprehensible to all stakeholders involved for the project to succeed.
8. Legal Efforts and Approval: The last but most important challenge is to succeed in the
legal efforts and approvals required for automation projects. Developing proofs of concept
with publicly available datasets is one way to prepare for the challenge of gaining legal
approvals and other grants.

3. Conclusion

AI-powered public healthcare will foster a health structure in the future where the AI process
drives the speed and accuracy of the diagnosis, treatment, and recovery. People will get the
right diagnosis at the right time such that their treatment and recovery chances improve, thus
improving their chances of a good life. Furthermore, the cost efficiency brought by AI
techniques will enable smart healthcare to be adapted to different healthcare structures in
different countries, specifically in low-income countries, so that healthcare becomes
accessible and affordable there. This is possible only when AI researchers combine their
expertise and experiences with health researchers. With this position paper we aim to
contribute by informing both medical professionals and computer scientists of the challenges
for automation of public health data analysis.
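
As a rough illustration of challenges 1 and 4 from Section 2, the sketch below shows how database constraints might reject incomplete or wrongly formatted entries, and how the stored records might then be read in batches. It uses Python's standard sqlite3 module. The table name, column names, date format, and allowed diagnosis values are hypothetical and are not taken from the screening program's actual schema.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple

# Hypothetical schema: every name and allowed value here is illustrative only.
SCHEMA = """
CREATE TABLE biopsy (
    record_id   INTEGER PRIMARY KEY,
    patient_key TEXT NOT NULL,
    taken_on    TEXT NOT NULL
        CHECK (taken_on GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'),
    diagnosis   TEXT NOT NULL
        CHECK (diagnosis IN ('negative', 'positive', 'inconclusive'))
)
"""


def insert_record(
    conn: sqlite3.Connection,
    patient_key: Optional[str],
    taken_on: str,
    diagnosis: str,
) -> bool:
    """Try to store one entry; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO biopsy (patient_key, taken_on, diagnosis) "
                "VALUES (?, ?, ?)",
                (patient_key, taken_on, diagnosis),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for NOT NULL and CHECK violations, i.e. incomplete
        # or wrongly formatted data.
        return False


def iter_batches(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Tuple]]:
    """Yield stored records in lists of at most batch_size rows."""
    cursor = conn.execute(
        "SELECT record_id, patient_key, taken_on, diagnosis "
        "FROM biopsy ORDER BY record_id"
    )
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows
```

Placing the checks in the database rather than only in the interface means that every entry path is held to the same rules, while reading in batches keeps memory use bounded as participants are added on a rolling basis. An interface could use the boolean returned by insert_record to prompt the worker to correct the entry.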
4. References

[1] Almonte, Maribel, Raúl Murillo, Gloria Inés Sánchez, Paula González, Annabelle Ferrera,
M A Picconi, Carolina Wiesner, Aurelio Cruz-Valdéz, Eduardo Lazcano-Ponce, Jose Jeronimo,
Catterina Ferreccio, Elena Kasamatsu, Laura Patricia Mendoza, Guillermo Rodríguez,
Alejandro Calderón, Gino Venegas, Verónica Villagra, Silvio Alejandro Tatti, Laura Fleider,
Carolina Terán, Armando Baena, María de la Luz Hernández, Mary-Luz Rol, Eric Lucas,
Sylvaine Barbier, Arianis Tatiana Ramírez, Silvina Arrossi, Maria I. Rodriguez, E Díaz
González, Marcela Celis, Sandra Martínez, Yuly Salgado, Marina Ortega, Andrea Verónica
Beracochea, Natalia Pérez, Margarita M Rodríguez de la Peña, Maria de Sales Ramon, Pilar
Hernández-Nevarez, Margarita Arboleda-Naranjo, Yessy Cabrera, Brenda Utrera Salgado, Laura
García, Marco Antonio Retana, María Celeste Colucci, Javier A. Arias-Stella, Yenny
Bellido-Fuentes, María Liz Bobadilla, Gladys Olmedo, Ivone Brito-García, Armando
Méndez-Herrera, Lucía Cardinal, Betsy Flores, J F Márquez Peñaranda, Josefina
Martínez-Better, Ana María Soilán, Jacqueline Figueroa, Benedicta Caserta, Carlos P. Sosa,
Adrian A. Moreno, Juan Mural, Franco Doimi, Diana Giménez, Hernando Gutiérrez Rodríguez,
Oscar Lora, Silvana Luciani, Nathalie Jeanne Nicole Broutet, Teresa M. Darragh and Rolando
Herrero. “Multicentric study of cervical cancer screening with human papillomavirus testing
and assessment of triage methods in Latin America: the ESTAMPA screening study protocol.”
BMJ Open 2020 May 24;10(5): e035796. doi: 10.1136/bmjopen-2019-035796. PMID: 32448795;
PMCID: PMC7252979.
[2] Bray F, Jemal A, Grey N, Ferlay J, Forman D. Global cancer transitions according to the
Human Development Index (2008-2030): a population-based study. Lancet Oncol.
2012;13(8):790–801.
[3] Cho, BJ., Choi, Y.J., Lee, MJ. et al. Classification of cervical neoplasms on colposcopic
photography using deep learning. Sci Rep 10, 13652 (2020).
https://doi.org/10.1038/s41598-020-70490-4
[4] Liming Hu, David Bell, Sameer Antani, Zhiyun Xue, Kai Yu, Matthew P Horning, Noni
Gachuhi, Benjamin Wilson, Mayoore S Jaiswal, Brian Befano, L Rodney Long, Rolando Herrero,
Mark H Einstein, Robert D Burk, Maria Demarco, Julia C Gage, Ana Cecilia Rodriguez, Nicolas
Wentzensen, Mark Schiffman, An Observational Study of Deep Learning and Automated Evaluation
of Cervical Images for Cancer Screening, JNCI: Journal of the National Cancer Institute,
Volume 111, Issue 9, September 2019, Pages 923–932,
https://doi.org/10.1093/jnci/djy225