=Paper=
{{Paper
|id=Vol-2216/healthRecSys18_paper_1
|storemode=property
|title=Personalized Symptom Checker Using Medical Claims
|pdfUrl=https://ceur-ws.org/Vol-2216/healthRecSys18_paper_1.pdf
|volume=Vol-2216
|authors=Sabin Kafle,Penny Pan,Ali Torkamani,Stevi Halley,John Powers,Hakan Kardes
|dblpUrl=https://dblp.org/rec/conf/recsys/KaflePTHPK18
}}
==Personalized Symptom Checker Using Medical Claims==
Sabin Kafle, Cambia Health Solutions, Inc., Portland, Oregon, sabin.kafle@cambiahealth.com
Penny Pan, Cambia Health Solutions, Inc., Portland, Oregon, penny.pan@regence.com
Ali Torkamani, Cambia Health Solutions, Inc., Portland, Oregon, ali.torkamani@cambiahealth.com
Stevi Halley, Cambia Health Solutions, Inc., Portland, Oregon, stevi.halley@regence.com
John Powers, Cambia Health Solutions, Inc., Portland, Oregon, john.powers@cambiahealth.com
Hakan Kardes, Cambia Health Solutions, Inc., Portland, Oregon, hakan.kardes@cambiahealth.com
ABSTRACT
It is increasingly common for patients to query their symptoms online before approaching medical professionals, with around 1% of Google (https://www.google.com/) search queries being related to symptoms [15]. Consequently, building a symptom-diagnosis Knowledge Base (KB), and subsequently symptom checkers, is a significant research problem [19]. However, global symptom checkers and online search engines are unable to accommodate personal information, which is useful for providing better health recommendations. In this work, we describe our symptom checker, which leverages medical claims, demographics, and symptoms to deliver personalized health recommendations. We also explain our pipeline for building an integrative KB capable of leveraging both personal and textual information.

CCS CONCEPTS
• Applied computing → Health informatics;

KEYWORDS
Symptom Checker; Knowledge Base; Personalization

ACM Reference Format:
Sabin Kafle, Penny Pan, Ali Torkamani, Stevi Halley, John Powers, and Hakan Kardes. 2018. Personalized symptom checker using medical claims. In Proceedings of the Third International Workshop on Health Recommender Systems co-located with Twelfth ACM Conference on Recommender Systems (HealthRecSys’18), Vancouver, BC, Canada, October 6, 2018, 5 pages.

HealthRecSys’18, October 6, 2018, Vancouver, BC, Canada
© 2018 Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION
It is estimated that around 35% of patients search for their symptoms online before consulting medical personnel, according to a survey in 2012 [19]. Symptom checkers and search engines are used primarily to rule out serious conditions and to find guidance before seeing a physician. A symptom checker provides diagnostic information based on the symptoms entered by the user. Most symptom checkers also ask the user for personal information, including age, gender, and current location, to provide more informed medical insights, such as nearby medical facilities for treatment of possible ailments. Symptom checkers function by querying users’ symptoms against an internal medical KB and then ranking the possible diagnoses using Information Retrieval (IR) methods [11]. The symptoms entered by the user are usually interpreted by a Natural Language Processing (NLP) component to align them with the internal medical KB. User interactions follow either a question-answering approach, with questions asked by the symptom checker [8, 12], or a more open textual input, such as a list of symptoms and recent events [16].

The vast majority of online symptom checkers focus on providing a diagnosis based on the symptoms entered by users. Some interact further with the user to obtain additional medical information, including any medical history. While the former tend to diagnose without contextual information, the latter suffer from verbosity. Also, users may not be comfortable providing their medical history to online services. Another issue lies in the lack of a robust Natural Language Understanding (NLU) component: when we tested different symptom checkers, most were unable to understand rudimentary paraphrases and negations.

Making a relevant health decision through a symptom checker depends on a reliable internal medical KB [16]. Building accurate relations in a KB requires human annotation. Manual annotation is a costly process, especially for symptom checkers, since it requires effort from multiple medical professionals to eliminate bias. There have been few efforts to learn KBs automatically, using either medical texts [10, 16] or Electronic Medical Records (EMRs) [18], and the constructed KBs are heavily refined and validated by medical professionals before use. Moreover, no existing work leverages multiple sources while building a KB, which is essential for more reliable health diagnosis.

In this work, we describe a symptom checker which aims to alleviate some of the shortcomings of currently deployed online symptom checkers. We first describe a medical KB construction pipeline capable of leveraging open-source medical resources, medical texts (PubMed, https://www.ncbi.nlm.nih.gov/pubmed; Wikipedia, https://en.wikipedia.org/wiki), and medical claims data. Text data provide medical details that serve as information for an interested user, medical resources give structure to medical KBs, and claims data provide diagnosis frequencies along with historical information. Secondly, we describe the architecture of the symptom checker, with the NLP pipeline and personalization as its core components. Our symptom checker has the advantage of
being able to leverage medical claims in the diagnostic decision, resulting in personalized diagnosis (based on historical medical records), and to recommend provider specialty and place of service to the user based on the probable diagnoses.

2 RELATED WORK
The earliest versions of symptom checkers made predictions for a single diagnosis or a few closely related diagnoses (e.g., breast cancer). Fuzzy rules extracted from neural networks [7] or Bayesian decision rules [9] provide inference from symptoms to diagnosis. More recent symptom checkers mostly describe the KB extraction process, with NLU and IR [11] treated as separate fields.

The KB construction process is semi-automated, with information extraction tools such as MetaMap [1] used to extract medical terminology. The relations in the medical KB are weighted using co-occurrence statistics. This method has found application in Isabel [16] and IBM Watson [10]. Rotmensch et al. [18] describe a method for KB construction based on noisy-OR Bayesian networks [13] using Electronic Medical Records (EMRs). Middleton et al. [12] describe a symptom checker which achieves high performance on the dataset described by Semigran et al. [19] but requires considerable human effort to build. Reinforcement learning-based question-answer interactions also provide a natural formulation for symptom checkers; training is performed by converting symptom-diagnosis probability mappings into sequences using likelihood sampling [4, 20].

3 DATA GENERATION PIPELINE
A significant proportion of the work in building a symptom checker lies in the construction of medical KBs. Manual construction of a medical KB requires significant human effort, which makes the process expensive. A common alternative is to construct a KB with slight inaccuracies from medical texts and have it refined by medical professionals. A generic automated KB construction pipeline requires the following resources [18]: structured clinical resources (e.g., UMLS [2], ICD-10 [14]), medical texts (e.g., Wikipedia and PubMed abstracts, https://www.ncbi.nlm.nih.gov/pubmed), and an Information Extraction (IE) engine (e.g., MetaMap [1]). The Unified Medical Language System (UMLS) is a medical ontology integrating multiple sources of medical knowledge, including SnomedCT [5] and ICD-10, using entities defined as concepts to build a hierarchical relation between medical terminologies. SnomedCT is a medical ontology constructed with the objective of defining medical concepts hierarchically. ICD-10 codes describe patients’ diagnoses and are used by physicians to bill patients. All of these KBs describe concepts hierarchically, with UMLS enabling linkage between multiple KBs.

In addition to the data sources mentioned above, we also use medical claims data. Medical claims record a patient’s diagnoses as ICD-10 codes, which can then be cross-referenced with the patient’s personal information to obtain a complete historical picture of the user. The availability of claims data enables the construction of a more robust KB which includes the temporal dimension as a component. Medical claims also provide a convenient solution for recommending provider specialty and place of service, which can be mapped to symptoms through the mapping between symptoms and diagnoses.

We describe our data generation pipeline in the following steps:
(1) Use Wikipedia (https://en.wikipedia.org/wiki) to obtain textual information regarding ICD-10 codes. Textual information can be obtained either through the ICD-10 homepage in Wikipedia (https://en.wikipedia.org/wiki/ICD-10) or by using the names of ICD-10 codes to search Wikipedia. We use a combination of both to obtain a total of 2,319 diagnosis descriptions linked to ICD-10 diagnosis codes, extracted either from the web page or from the hierarchical relationship between diagnosis codes.
(2) Use MetaMap to extract all symptoms and diagnoses from PubMed and Wikipedia text. The extracted symptoms and diagnoses are then mapped to each other using co-occurrence statistics. To reduce the number of unique diagnosis codes, we use only those diagnoses which have a unique Wikipedia article; all other ICD-10 codes are mapped to the nearest codes using the hierarchy relation. Symptom names are also consolidated using name overlap between symptoms to obtain a significantly reduced list. The original list of symptoms can be obtained from the UMLS ontology.
(3) Learn the weights in the KB between symptoms and diagnoses. We use Naive-Bayes weight learning [18] to learn associations between symptoms and diagnoses.
(4) Use medical claims data to provide age- and gender-based statistics for diagnosis codes, which propagate to symptoms in proportion to the learned weights between symptoms and diagnoses. The medical claims are also used to learn the weights between different diagnoses in the temporal dimension. Finally, the medical claims are used to learn provider specialty and place of service for different symptoms based on the learned weights and frequencies. We use two years of claims data, consisting of more than 400k medical claims from around 200k members, to build the statistics.

4 ARCHITECTURE DESIGN
We summarize our process flow along with the architecture in Figure 1. The basic design of the symptom checker currently consists of the following components:
• Front-end
• Web server
• Natural Language Processing (NLP) component
• Personalization component
We describe each of the components in detail in the following subsections.

4.1 Front-end
The front-end is the interactive component of the symptom checker, where the user interacts with the symptom checker to obtain diagnostic information. It consists of the following two components:
• A query page to obtain the user’s symptoms and their personal information (currently age and gender). Users are free to enter additional medical events and any events considered
relevant by the user (e.g., travel to a tropical region before getting symptoms).
• A response page which displays the user’s possible conditions, related symptoms, possible place of service, and provider specialty, along with a field to include additional symptoms. Figure 2 depicts the response page of the symptom checker.

A design consideration is to make diagnosis predictions regardless of the amount of information entered by the user. The probability score reflects the uncertainty of the model when making predictions.

Figure 1: Process flow for symptom checker with core components
Figure 2: An example response of the symptom checker

4.2 Web server
The web server is the core component of the system, through which the different components of the symptom checker interact. The symptoms, description, and demographic information entered by the user are processed by the web server. The information is then passed through the NLP and personalization components to obtain a better understanding of the symptoms and of the constraints that personal information places on the possible diagnoses. An ElasticSearch (https://www.elastic.co/products/elasticsearch) database is then queried to generate candidate diagnoses. The symptom checker then interacts with the database sequentially to further filter and rank the candidates for the entered symptoms and personal information. Next, the personalization component is used to re-rank the diagnoses. The diagnoses are then used to extract useful information, including likely place of service, provider specialty, and related symptoms specific to the possible conditions. Figure 3 shows the additional information provided by the symptom checker.

4.3 NLP component
The NLP processor provides three functionalities: paraphrase generation, negation detection, and phrase extraction.

The negation detection and phrase extraction features use the dependency parser of the spaCy Python library (https://spacy.io/). Phrase extraction enables the user to enter symptoms either as long free-text input or as a list of symptoms. Negation detection helps interpret complex inputs, which is useful for ranking the list of diagnoses based on both positive and negative symptoms.

The paraphrase generation component uses a stacked LSTM in an encoder-decoder framework with attention, trained similarly to [6]. UMLS concepts are used to generate a dataset of medical paraphrases, which provides synonyms for medical phrases, including symptoms and diagnoses.

4.4 Personalization component
The personalization component enables the symptom checker to provide different sets of results for the same symptoms based on the personal information of the user. The first step is identifying the relative importance of symptoms and diagnoses based on the age and gender of the user; this relative importance is useful for narrowing down the results of the symptom checker. The second application of the personalization component lies in re-ranking the results of the symptom checker based on the relevance of each diagnosis to the user, given age, gender, and medical history.

5 EVALUATION
We use the dataset of 45 clinical vignettes with different degrees of diagnosis severity described in [19]. A clinical vignette is a full description of a patient condition enabling a physician to make a diagnostic
decision. The dataset is divided into three degrees of severity: requiring emergent care (15 cases), requiring non-emergent care (15 cases), and requiring self-care only (15 cases). Table 1 lists some example vignettes.

{| class="wikitable"
|+ Table 1: Example vignettes extracted from Semigran et al. [19]. There are 15 diagnostic vignettes for each type of care, i.e., emergent, non-emergent, and self-care.
! Care type !! Diagnosis !! Age, Gender !! Symptoms
|-
| Requiring emergent care || Acute Liver Failure || 48 y/o, Female || Confusion; Disorientation; Increasingly drowsy; Mild right upper quadrant pain; Chronic Tylenol (acetaminophen) user, recently took more
|-
| Requiring non-emergent care || Pneumonia || 6 y/o, Male || History of asthma; Five days fever; Cough; Appetite good; Yellow sputum; Temperature = 101.6
|-
| Requiring self-care || Acute Conjunctivitis || 14 y/o, Male || 3 days red, irritated eye; Watery discharge from eye; URI symptoms; No pain or light sensitivity
|}

Figure 3: Additional results generated by the symptom checker for search term "shoulder pain".

We achieve performance competitive with other online symptom checkers despite using only an unsupervised data generation process. We report our accuracy in Table 2. The average performance of the online symptom checkers evaluated in [19] is 58% for Top-20 evaluation. The symptoms are manually entered in the format acceptable to the symptom checker to achieve optimal performance. Unlike many online symptom checkers, our system is capable of incorporating noisy text as input and obtaining relevant information through the NLP component.

{| class="wikitable"
|+ Table 2: Accuracy evaluation (%) of the symptom checker across diagnoses requiring emergent care, non-emergent care, and self-care.
! Metric !! Emergent care !! Non-emergent care !! Self-care !! Overall
|-
| Top@1 || 20.00 || 20.00 || 50.00 || 29.55
|-
| Top@3 || 20.00 || 33.33 || 57.14 || 36.36
|-
| Top@5 || 33.33 || 53.33 || 64.29 || 50.00
|-
| Top@10 || 46.67 || 60.00 || 71.43 || 59.09
|-
| Top@20 || 53.33 || 66.67 || 78.57 || 65.91
|}

The accuracy report in Table 2 shows that the symptom checker performs significantly better for diagnoses requiring self-care than for those requiring emergent and non-emergent care. We attribute this discrepancy to the symptoms listed for self-care conditions having more accurate data sources in the form of textual data. A deeper dive is needed to study the discrepancy between the medical KB and the evaluation dataset, to account for the noise in the medical KB and its impact on different care types [3].

Other online symptom checkers, as evaluated by Semigran et al. [19], on average obtain 34% Top-1 accuracy and 58% Top-20 accuracy, with the highest accuracy for emergent care (80%) and the lowest for self-care (33%), while non-emergent care accuracy is 55%. Some higher-quality symptom checkers perform better, with the Babylon symptom checker [12] obtaining performance similar to medical professionals [17]. This performance gap is primarily due to the quality of the KB: the KB pipeline described in our system relies heavily on unsupervised methods rather than being a fully validated medical KB [18]. We expect to obtain better performance in future iterations of our medical KB as we incorporate additional resources and validation methods.

6 CONCLUSION AND FUTURE WORK
We have described a symptom checker based upon a medical KB generated in an unsupervised fashion. The novelty of our approach lies in the unsupervised data generation process using multiple data sources, which is then linked with NLP and personalization components to provide a robust, personalized symptom checker. Future work includes refining the data generation pipeline to integrate additional data sources, including EMRs, and integrating specific user information into the symptom checker interface to provide a better understanding of individual symptoms.
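Step (3) of the data generation pipeline (Naive-Bayes weights between symptoms and diagnoses), the claims-based age and gender statistics of step (4), and the re-ranking of Section 4.4 can be illustrated with a small sketch. The Python below is a minimal, hypothetical example and not the authors' implementation: it assumes the KB stores symptom-diagnosis co-occurrence counts and claims-derived diagnosis counts per age band and gender, and every name and number in it (`COOCCURRENCE`, `CLAIMS_COUNTS`, `rank_diagnoses`, the age bands, the toy counts) is invented for illustration. Each diagnosis is scored with a naive Bayes log-likelihood plus a demographic prior; negated symptoms (as negation detection in Section 4.3 might produce) count as evidence of absence, while symptoms the user never mentions are treated as unobserved and drop out of the score.

```python
import math
from collections import Counter

# HYPOTHETICAL toy KB (not the authors' data): number of text passages mentioning
# each diagnosis, and number in which a symptom co-occurs with that diagnosis.
DIAGNOSIS_MENTIONS = {"pneumonia": 40, "conjunctivitis": 30, "acute liver failure": 20}
COOCCURRENCE = {
    ("pneumonia", "fever"): 30,
    ("pneumonia", "cough"): 35,
    ("conjunctivitis", "red eye"): 25,
    ("conjunctivitis", "watery discharge"): 18,
    ("acute liver failure", "confusion"): 12,
    ("acute liver failure", "fever"): 4,
}
# HYPOTHETICAL claims-derived diagnosis counts per (age band, gender).
CLAIMS_COUNTS = {
    ("0-17", "M"): Counter({"pneumonia": 50, "conjunctivitis": 80, "acute liver failure": 1}),
    ("18-64", "F"): Counter({"pneumonia": 60, "conjunctivitis": 40, "acute liver failure": 5}),
}


def p_symptom_given_diagnosis(symptom, diagnosis, alpha=1.0):
    """Laplace-smoothed Bernoulli estimate of P(symptom present | diagnosis)."""
    joint = COOCCURRENCE.get((diagnosis, symptom), 0)
    return (joint + alpha) / (DIAGNOSIS_MENTIONS[diagnosis] + 2 * alpha)


def demographic_prior(diagnosis, age_band, gender, alpha=1.0):
    """Laplace-smoothed P(diagnosis | age band, gender) from claims counts."""
    counts = CLAIMS_COUNTS[(age_band, gender)]
    total = sum(counts.values())
    return (counts[diagnosis] + alpha) / (total + alpha * len(DIAGNOSIS_MENTIONS))


def rank_diagnoses(present, absent, age_band, gender):
    """Rank diagnoses by log prior plus log-likelihood of present and negated symptoms."""
    scores = {}
    for diagnosis in DIAGNOSIS_MENTIONS:
        score = math.log(demographic_prior(diagnosis, age_band, gender))
        for symptom in present:
            score += math.log(p_symptom_given_diagnosis(symptom, diagnosis))
        for symptom in absent:  # e.g. "no fever" after negation detection
            score += math.log(1.0 - p_symptom_given_diagnosis(symptom, diagnosis))
        scores[diagnosis] = score
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank_diagnoses(["fever"], [], "0-17", "M"))
    print(rank_diagnoses(["fever"], [], "18-64", "F"))
```

Scoring in log space avoids numerical underflow when many symptoms are entered. With these toy counts, the symptom "fever" alone ranks conjunctivitis second for the "0-17", "M" group but acute liver failure second for the "18-64", "F" group, which is the kind of demographic re-ranking Section 4.4 describes.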
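Top@k accuracy in Table 2 counts a vignette as correct when its reference diagnosis appears among the first k ranked diagnoses. The sketch below implements that metric and checks the Overall column of Table 2. It assumes the per-type percentages correspond to whole-case hit counts: the emergent and non-emergent percentages are multiples of 1/15, while the self-care percentages are all multiples of 1/14, which suggests 14 of the 15 self-care vignettes were scored. Under that assumption, the Overall column is reproduced by pooling hits over 44 vignettes rather than by averaging the three columns. The hit counts in the hypothetical `HITS` table are inferred from Table 2, not reported directly.

```python
# Top@k accuracy as reported in Table 2, plus a check of its Overall column.
# ASSUMPTION: per-type percentages are whole-case hit counts out of 15 emergent,
# 15 non-emergent and 14 self-care vignettes; HITS is inferred, not reported.


def top_k_accuracy(ranked_predictions, gold_diagnoses, k):
    """Percentage of vignettes whose gold diagnosis appears in the top-k ranked list."""
    hits = sum(
        1 for ranked, gold in zip(ranked_predictions, gold_diagnoses) if gold in ranked[:k]
    )
    return 100.0 * hits / len(gold_diagnoses)


# k -> hits for (emergent, non-emergent, self-care), inferred from Table 2.
HITS = {1: (3, 3, 7), 3: (3, 5, 8), 5: (5, 8, 9), 10: (7, 9, 10), 20: (8, 10, 11)}
CASES = (15, 15, 14)


def per_type_accuracy(hits, cases):
    """Accuracy (%) for each care type, rounded to two decimals as in Table 2."""
    return [round(100.0 * h / n, 2) for h, n in zip(hits, cases)]


def pooled_accuracy(hits, cases):
    """Overall accuracy (%) pooled over all vignettes, not a mean of the columns."""
    return round(100.0 * sum(hits) / sum(cases), 2)


if __name__ == "__main__":
    for k, hits in HITS.items():
        print(f"Top@{k}: {per_type_accuracy(hits, CASES)} overall={pooled_accuracy(hits, CASES)}")
```

For example, Top@1 pools (3 + 3 + 7) / 44 = 29.55%, whereas the plain mean of the three Top@1 columns would be 30.00%.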
REFERENCES
[1] Alan R Aronson. 2006. MetaMap: Mapping text to the UMLS Metathesaurus. Bethesda, MD: NLM, NIH, DHHS (2006), 1–26.
[2] Olivier Bodenreider. 2004. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic acids research 32, suppl_1 (2004), D267–D270.
[3] M Alan Brookhart, Til Stürmer, Robert J Glynn, Jeremy Rassen, and Sebastian Schneeweiss. 2010. Confounding control in healthcare database research: challenges and potential approaches. Medical care 48, 6 (2010), S114.
[4] Edward Y Chang, Meng-Hsi Wu, Kai-Fu Tang, Hao-Cheng Kao, and Chun-Nan Chou. 2017. Artificial Intelligence in XPRIZE DeepQ Tricorder. In Proceedings of the 2nd International Workshop on Multimedia for Personal Health and Health Care. ACM, 11–18.
[5] Kevin Donnelly. 2006. SNOMED-CT: The advanced terminology and coding system for eHealth. Studies in health technology and informatics 121 (2006), 279.
[6] Sadid A Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, Oladimeji Farri, et al. 2016. Neural Paraphrase Generation with Stacked Residual LSTM Networks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 2923–2934.
[7] Yoichi Hayashi. 1991. A neural expert system with automated extraction of fuzzy if-then rules and its application to medical diagnosis. In Advances in neural information processing systems. 578–584.
[8] Hao-Cheng Kao, Kai-Fu Tang, and Edward Y Chang. 2018. Context-Aware Symptom Checking for Disease Diagnosis Using Hierarchical Reinforcement Learning. (2018).
[9] Igor Kononenko. 2001. Machine learning for medical diagnosis: history, state of the art and perspective. Artificial Intelligence in medicine 23, 1 (2001), 89–109.
[10] Adam Lally, Sugato Bagchi, Michael A Barborak, David W Buchanan, Jennifer Chu-Carroll, David A Ferrucci, Michael R Glass, Aditya Kalyanpur, Erik T Mueller, J William Murdock, et al. 2017. WatsonPaths: scenario-based question answering and inference over unstructured information. AI Magazine 38, 2 (2017), 59.
[11] Ray R Larson. 2010. Introduction to information retrieval. Journal of the American Society for Information Science and Technology 61, 4 (2010), 852–853.
[12] Katherine Middleton, Mobasher Butt, Nils Hammerla, Steven Hamblin, Karan Mehta, and Ali Parsa. 2016. Sorting out symptoms: design and evaluation of the 'babylon check' automated triage system. arXiv preprint arXiv:1606.02041 (2016).
[13] Agnieszka Oniśko, Marek J Druzdzel, and Hanna Wasyluk. 2001. Learning Bayesian network parameters from small data sets: Application of Noisy-OR gates. International Journal of Approximate Reasoning 27, 2 (2001), 165–182.
[14] World Health Organization et al. 1992. The ICD-10 classification of mental and behavioural disorders: clinical descriptions and diagnostic guidelines. Geneva: World Health Organization.
[15] Veronica Pinchin. 2016. I’m Feeling Yucky :( Searching for symptoms on Google. https://blog.google/products/search/im-feeling-yucky-searching-for-symptoms/
[16] P Ramnarayan, G Kulkarni, A Tomlinson, and J Britto. 2004. ISABEL: a novel Internet-delivered clinical decision support system. Current perspectives in healthcare computing (2004), 245–256.
[17] Salman Razzaki, Adam Baker, Yura Perov, Katherine Middleton, Janie Baxter, Daniel Mullarkey, Davinder Sangar, Michael Taliercio, Mobasher Butt, Azeem Majeed, et al. 2018. A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis. arXiv preprint arXiv:1806.10698 (2018).
[18] Maya Rotmensch, Yoni Halpern, Abdulhakim Tlimat, Steven Horng, and David Sontag. 2017. Learning a health knowledge graph from electronic medical records. Scientific reports 7, 1 (2017), 5994.
[19] Hannah L Semigran, Jeffrey A Linder, Courtney Gidengil, and Ateev Mehrotra. 2015. Evaluation of symptom checkers for self diagnosis and triage: audit study. BMJ 351 (2015), h3480.
[20] Kai-Fu Tang, Hao-Cheng Kao, Chun-Nan Chou, and Edward Y Chang. 2016. Inquire and Diagnose: Neural Symptom Checking Ensemble using Deep Reinforcement Learning. In Proceedings of NIPS Workshop on Deep Reinforcement Learning.