=Paper=
{{Paper
|id=Vol-2216/healthRecSys18_paper_1
|storemode=property
|title=Personalized Symptom Checker Using Medical Claims
|pdfUrl=https://ceur-ws.org/Vol-2216/healthRecSys18_paper_1.pdf
|volume=Vol-2216
|authors=Sabin Kafle,Penny Pan,Ali Torkamani,Stevi Halley,John Powers,Hakan Kardes
|dblpUrl=https://dblp.org/rec/conf/recsys/KaflePTHPK18
}}
==Personalized Symptom Checker Using Medical Claims==
<pdf width="1500px">https://ceur-ws.org/Vol-2216/healthRecSys18_paper_1.pdf</pdf>
<pre>
                Personalized symptom checker using medical claims
                     Sabin Kafle                                              Penny Pan                                     Ali Torkamani
          Cambia Health Solutions, Inc.                            Cambia Health Solutions, Inc.                      Cambia Health Solutions, Inc.
                Portland, Oregon                                       Portland, Oregon                                     Portland, Oregon
         sabin.kafle@cambiahealth.com                               penny.pan@regence.com                          ali.torkamani@cambiahealth.com

                    Stevi Halley                                             John Powers                                    Hakan Kardes
          Cambia Health Solutions, Inc.                           Cambia Health Solutions, Inc.                      Cambia Health Solutions, Inc.
                Portland, Oregon                                       Portland, Oregon                                   Portland, Oregon
           stevi.halley@regence.com                             john.powers@cambiahealth.com                       hakan.kardes@cambiahealth.com

ABSTRACT
It is increasingly common for patients to query their symptoms online before approaching medical
professionals, with around 1% of Google (https://www.google.com/) search queries being related to
symptoms [15]. Consequently, building a symptom-diagnosis Knowledge Base (KB) and, subsequently,
symptom checkers is a significant research problem [19]. However, global symptom checkers and
online search engines are unable to accommodate personal information, which is useful for providing
better health recommendations. In this work, we describe our symptom checker, which leverages
medical claims, demographics, and symptoms to deliver personalized health recommendations.
Moreover, we explain our pipeline for building an integrative KB capable of leveraging both
personal and textual information.

CCS CONCEPTS
• Applied computing → Health informatics;

KEYWORDS
Symptom Checker; Knowledge Base; Personalization

ACM Reference Format:
Sabin Kafle, Penny Pan, Ali Torkamani, Stevi Halley, John Powers, and Hakan Kardes. 2018.
Personalized symptom checker using medical claims. In Proceedings of the Third International
Workshop on Health Recommender Systems co-located with Twelfth ACM Conference on Recommender
Systems (HealthRecSys’18), Vancouver, BC, Canada, October 6, 2018, 5 pages.

HealthRecSys’18, October 6, 2018, Vancouver, BC, Canada
© 2018 Copyright for the individual papers remains with the authors. Copying permitted for private
and academic purposes. This volume is published and copyrighted by its editors.

1    INTRODUCTION
It is estimated that around 35% of patients search for their symptoms online before consulting
medical personnel, according to a survey in 2012 [19]. Symptom checkers and search engines are used
primarily to rule out serious conditions and find guidance before seeking physicians. A symptom
checker provides diagnostic information based on the symptoms entered by the user. Most symptom
checkers also ask the user for personal information including age, gender, and current location to
provide more informed medical insights, including nearby medical facilities for treatment of
possible ailments. Symptom checkers function by querying an internal medical KB with the user’s
symptoms and then ranking the possible diagnoses using Information Retrieval (IR) methods [11]. The
symptoms entered by the user are usually interpreted by a Natural Language Processing (NLP)
component to align them with the internal medical KB. User interactions involve either a
question-answering approach, with questions asked by the symptom checker [8, 12], or a more open
textual input including a list of symptoms and recent events [16].
   The vast majority of online symptom checkers are focused on providing a diagnosis based on the
symptoms entered by the users. There are some which interact further with a user to obtain
additional medical information including any medical history. While the former tend to diagnose
without contextual information, the latter suffer from verbosity. Also, users may not be
comfortable providing their medical history to online services. Another issue lies in the lack of
a robust Natural Language Understanding (NLU) component. When we tested different symptom checkers,
most were unable to understand rudimentary paraphrases and negations.
   Making a relevant health decision through a symptom checker depends on a reliable internal
medical KB [16]. A KB requires human annotation to build accurate relations. Manual annotation is a
costly process, especially for symptom checkers, since it requires effort from multiple medical
professionals to eliminate bias. There have been a few efforts to learn KBs automatically, using
either medical texts [10, 16] or Electronic Medical Records (EMRs) [18]. The constructed KBs are
heavily refined and validated by medical professionals before usage. Also, no work exists that
leverages multiple sources while building a KB, which is essential for more reliable health
diagnosis.
   In this work, we describe a symptom checker which aims to alleviate some of the shortcomings of
currently deployed online symptom checkers. We first describe a medical KB construction pipeline
which is capable of leveraging open source medical resources, medical texts
(https://www.ncbi.nlm.nih.gov/pubmed, https://en.wikipedia.org/wiki), and medical claims data. Text
data provide medical details which serve as information to an interested user; medical resources
give structure to medical KBs, while claims data provide the frequency of diagnoses along with
historical information. Second, we describe the architecture of the symptom checker, with the NLP
pipeline and personalization as its core components. Our symptom checker has the advantage of
being able to leverage medical claims in the diagnostic decision, resulting in personalized
diagnosis (based on historical medical records), and of recommending provider specialty and place
of service to the user based on the probable diagnoses.

2    RELATED WORK
The earliest symptom checkers made predictions for a single diagnosis or a set of closely related
diagnoses (e.g., breast cancer). Fuzzy rules extracted from neural networks [7] or Bayesian
decision rules [9] provide inference from symptoms to diagnosis. More recent work on symptom
checkers mostly describes the KB extraction process, with NLU and IR [11] treated as separate
fields.
   The KB construction process is a semi-automated method, with information extraction tools such
as MetaMap [1] used for extraction of medical terminologies. The relations in the medical KB are
weighted using co-occurrence statistics. This method has found application in Isabel [16] and IBM
Watson [10]. Rotmensch et al. [18] describe a method for KB construction based on noisy-OR
Bayesian Networks [13] using Electronic Medical Records (EMRs). Middleton et al. [12] describe a
symptom checker which achieves high performance on the dataset described by Semigran et al. [19]
but requires considerable human effort to build. Reinforcement learning-based question-answer
interactions also provide a natural formulation for symptom checkers. Training is performed by
converting the symptom-diagnosis probability mapping to sequences using likelihood sampling
[4, 20].

3    DATA GENERATION PIPELINE
A significant proportion of the work in building a symptom checker lies in the construction of
medical KBs. Manual construction of a medical KB requires significant human effort, which makes the
process expensive. A common alternative is to construct a KB with slight inaccuracies from medical
texts and have medical professionals refine it. A generic automated KB construction pipeline
requires the following resources [18]: structured clinical resources (e.g., UMLS [2], ICD-10
[14]), medical texts (e.g., Wikipedia, PubMed (https://www.ncbi.nlm.nih.gov/pubmed) abstracts), and
an Information Extraction (IE) engine (e.g., MetaMap [1]). The Unified Medical Language System
(UMLS) is a medical ontology integrating multiple sources of medical knowledge, including SnomedCT
[5] and ICD-10, using entities defined as concepts to build hierarchical relations between medical
terminologies. SnomedCT is a medical ontology constructed with the objective of defining medical
concepts hierarchically. ICD-10 codes describe the diagnosis of a patient, which physicians then
use to bill the patient. All of these KBs describe concepts hierarchically, with UMLS enabling
linkage between multiple KBs.
   In addition to the data sources mentioned above, we also use medical claims data. Medical claims
give the diagnosis of a patient using ICD-10 codes, which can then be cross-referenced with the
patient’s personal information to obtain a complete historical picture of a user. The availability
of claims data enables construction of a more robust KB which considers the temporal dimension as a
component of the KB. Medical claims also provide a convenient solution for recommending provider
specialty and place of service, which can be mapped to symptoms using the mapping between symptoms
and diagnoses.
   We describe our data generation pipeline in the following steps:
    (1) Use Wikipedia (https://en.wikipedia.org/wiki) to obtain textual information regarding
        ICD-10. Textual information can be obtained either through the ICD-10 page on Wikipedia
        (https://en.wikipedia.org/wiki/ICD-10) or by using the names of ICD-10 codes to search
        Wikipedia. We use a combination of both to obtain a total of 2,319 diagnosis descriptions
        linked to ICD-10 diagnosis codes, extracted either from the web page or from the
        hierarchical relationship between diagnosis codes.
    (2) Use MetaMap to extract all symptoms and diagnoses from PubMed and Wikipedia text. The
        extracted symptoms and diagnoses are then mapped using co-occurrence statistics between
        symptoms and diagnoses. To reduce the number of unique diagnosis codes, we use only those
        diagnoses which have a unique Wikipedia article. All other ICD-10 codes are mapped to the
        nearest codes using the hierarchy relation. Symptom names are also reduced using name
        overlap between symptoms to obtain a significantly reduced list. The original list of
        symptoms can be obtained from the UMLS ontology.
    (3) Learn the weights in the KB between symptoms and diagnoses. We use Naive Bayes weight
        learning [18] to learn associations between symptoms and diagnoses.
    (4) Use medical claims data to provide age- and gender-based statistics for diagnosis codes,
        which propagate to symptoms in proportion to the learned weights between symptoms and
        diagnoses. The medical claims are also used to learn the weights between different
        diagnoses in the temporal dimension. Finally, the medical claims are used to learn provider
        specialty and place of service for different symptoms based on the learned weights and
        frequency. We use two years of claims data consisting of more than 400k medical claims from
        around 200k members to build the statistics.

4     ARCHITECTURE DESIGN
We summarize our process flow along with the architecture in Figure 1. The basic design of the
symptom checker currently consists of the following components:
      • Front-end
      • Web server
      • Natural Language Processing (NLP) component
      • Personalization component
We describe each of the components in detail in the following subsections.

4.1     Front-end
The front-end is the interactive component of the symptom checker, where the user interacts with
the symptom checker to obtain diagnostic information. It consists of the following two components:
      • A query page to obtain the user’s symptoms and their personal information (age and gender
        currently). Users are free to enter additional medical events and any events considered
        relevant by the user (e.g., travel to a tropical region before getting symptoms).
      • A response page which displays the user’s possible conditions, related symptoms, possible
        place of service and provider specialty, along with a field to include additional
        symptoms. Figure 2 depicts the response page of the symptom checker.
A design consideration is to make predictions regarding diagnosis regardless of the amount of
information entered by the user. The probability score depicts the uncertainty of the model when
making predictions.

Figure 1: Process flow for symptom checker with core components

Figure 2: An example response of the symptom checker

4.2     Web server
The web server is the core component of the system, through which the different components of the
symptom checker interact. The symptoms, description, and demographic information entered by the
user are processed by the web server. The information is then passed through the NLP and
personalization components to obtain a better understanding of the symptoms and of the constraints
placed on the possible diagnoses by the personal information. An ElasticSearch
(https://www.elastic.co/products/elasticsearch) database is then queried to generate candidate
diagnoses. The symptom checker then interacts with the database sequentially to further filter and
rank the candidates for the entered symptoms and personal information. Then, the personalization
component is used to re-rank the diagnoses. The diagnoses are then used to extract useful
information including likely place of service, provider specialty, and related symptoms specific
to the possible conditions. Figure 3 shows the additional information provided by the symptom
checker.

4.3      NLP component
The NLP processor provides three functionalities: paraphrase generation, negation detection, and
phrase extraction.
   The negation detection and phrase extraction features use the dependency parser of the spaCy
Python library (https://spacy.io/). Phrase extraction enables the user to enter symptoms either as
free text or as a list of symptoms. Negation detection helps to interpret more complex input, which
is useful for ranking the list of diagnoses based on both positive symptoms and negative symptoms.
   The paraphrase generation component uses a stacked LSTM in an encoder-decoder framework with
attention, trained similarly to [6]. UMLS concepts are used to generate a dataset defining medical
paraphrases. The dataset provides synonyms for medical phrases including symptoms and diagnoses.

4.4      Personalization component
The personalization component enables the symptom checker to provide different sets of results for
the same symptoms based on the personal information of the user. The first step is identifying the
relative importance of symptoms and diagnoses based on the age and gender of the user. The relative
importance is useful for narrowing down the results of the symptom checker. The second application
of the personalization component lies in re-ranking the results of the symptom checker according
to the relevance of each diagnosis to the user, based on age, gender, and medical history.

5     EVALUATION
We use a dataset of 45 clinical vignettes with different degrees of severity of diagnosis,
described in [19]. A clinical vignette is a full description of a patient condition enabling a
physician to make a diagnostic
decision. The dataset is divided into three degrees of severity: requiring emergent care (15
cases), requiring non-emergent care (15 cases), and requiring self-care only (15 cases). Table 1
lists some example vignettes.
   We achieve performance competitive with other online symptom checkers despite using only an
unsupervised data generation process. We report our accuracy in Table 2. The average performance of
online symptom checkers is 58% for Top-20 evaluation. The symptoms are manually entered in the
format acceptable to the symptom checker to achieve optimal performance. Unlike many online symptom
checkers, our system is capable of incorporating noisy text as input and obtaining relevant
information through the NLP component.

Diagnosis               Age, Gender       Symptoms
Requiring emergent care
Acute Liver Failure     48 y/o, Female    • Confusion
                                          • Disorientation
                                          • Increasingly drowsy
                                          • Mild right upper quadrant pain
                                          • Chronic tylenol acetaminophen user - recently took more
Requiring non-emergent care
Pneumonia               6 y/o, Male       • History of asthma
                                          • Five days fever
                                          • Cough
                                          • Appetite good
                                          • Yellow sputum
                                          • Temperature = 101.6
Requiring self-care
Acute Conjunctivitis    14 y/o, Male      • 3 days red, irritated eye
                                          • Watery discharge from eye
                                          • URI symptoms
                                          • No pain or light sensitivity
Table 1: Example vignettes extracted from Semigran et al. [19]. There are 15 diagnostic vignettes
for each type of care, i.e., emergent, non-emergent, and self-care.

Figure 3: Additional results generated by the symptom checker for search term "shoulder pain".

Metric    Emergent care    Non-emergent care    Self-care    Overall
Top@1     20.00            20.00                50.00        29.55
Top@3     20.00            33.33                57.14        36.36
Top@5     33.33            53.33                64.29        50.00
Top@10    46.67            60.00                71.43        59.09
Top@20    53.33            66.67                78.57        65.91
Table 2: Accuracy evaluation (%) of symptom checker across diagnoses requiring emergency,
non-emergency, and self-care.
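
The Top@k figures of the kind reported in Table 2 can be computed as the percentage of vignettes
whose correct diagnosis appears among the first k ranked candidates. The following is a minimal
sketch of that metric only; the function name `top_k_accuracy` and the three toy rankings are
hypothetical and are not the paper's evaluation code or data.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Percentage of cases whose true diagnosis is among the first k candidates."""
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        return 0.0
    hits = sum(1 for candidates, gold in zip(ranked, truth) if gold in candidates[:k])
    return 100.0 * hits / len(truth)


# Hypothetical ranked outputs for three vignettes (illustrative only).
ranked = [
    ["pneumonia", "bronchitis", "asthma"],
    ["migraine", "acute conjunctivitis", "allergy"],
    ["gastritis", "appendicitis", "hepatitis"],
]
truth = ["pneumonia", "acute conjunctivitis", "acute liver failure"]

top1 = top_k_accuracy(ranked, truth, k=1)  # one of three cases is a hit at rank 1
top3 = top_k_accuracy(ranked, truth, k=3)  # two of three cases are hits within rank 3
```

Because a hit at rank k is also a hit at any larger k, each column of such a table is
non-decreasing from Top@1 to Top@20, as is the case in Table 2.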
   The accuracy reported in Table 2 shows that the symptom checker performs significantly better
for diagnoses requiring self-care than for those requiring emergent or non-emergent care. The
discrepancy in performance is due to the symptoms listed for self-care conditions having a more
accurate data source in the form of textual data. A deeper dive is needed to study the discrepancy
between the medical KB and the evaluation dataset, to account for the noise in the medical KB and
its impact on different care types [3].
   Other online symptom checkers, as evaluated by Semigran et al. [19], obtain on average 34% Top-1
accuracy and 58% Top-20 accuracy, with the highest accuracy for emergent care (80%) and the lowest
for self-care (33%), while non-emergent care accuracy is 55%. Some higher quality symptom checkers
perform better, with the Babylon symptom checker [12] obtaining performance similar to medical
professionals [17]. The discrepancy in performance is primarily due to the quality of the KB, with
the KB pipeline described in our system being highly reliant on unsupervised methods rather than
being a fully validated medical KB [18]. We expect to obtain better performance in future
iterations of our medical KB as we incorporate additional resources and validation methods.

6     CONCLUSION AND FUTURE WORK
We have described a symptom checker based upon a medical KB generated in an unsupervised fashion.
The novelty of our approach lies in the unsupervised data generation process using multiple data
sources, which is then linked with NLP and personalization components to provide a robust,
personalized symptom checker. Future work includes refinement of the data generation pipeline to
integrate additional data sources including EMRs, and integration of specific user information
into the symptom checker interface to provide a better understanding of individual symptoms.
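
To make the summarized approach more concrete, the sketch below shows one way symptom-diagnosis
weights learned from co-occurrence counts could be combined with demographic statistics from
claims to rank diagnoses. It is only an illustration under stated assumptions: the paper does not
give its exact weighting or re-ranking formulas, so the smoothed Naive Bayes-style estimates, the
names `documents`, `claim_counts`, `learn_weights`, and `rank`, and all toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (diagnosis, symptoms) co-occurrences, standing in for extracted text mentions.
documents = [
    ("pneumonia", ["cough", "fever", "yellow sputum"]),
    ("pneumonia", ["cough", "fever"]),
    ("acute conjunctivitis", ["red eye", "watery discharge"]),
    ("common cold", ["cough", "runny nose"]),
]

# Hypothetical claim counts per (diagnosis, age band, gender).
claim_counts = {
    ("pneumonia", "child", "M"): 30,
    ("common cold", "child", "M"): 60,
    ("acute conjunctivitis", "child", "M"): 10,
}


def learn_weights(docs, alpha=1.0):
    """Estimate smoothed P(symptom | diagnosis) from co-occurrence counts."""
    cooc = defaultdict(Counter)
    vocab = set()
    for dx, symptoms in docs:
        cooc[dx].update(symptoms)
        vocab.update(symptoms)
    weights = {}
    for dx, counts in cooc.items():
        total = sum(counts.values()) + alpha * len(vocab)
        weights[dx] = {s: (counts[s] + alpha) / total for s in vocab}
    return weights


def rank(symptoms, weights, age_band, gender):
    """Rank diagnoses by log demographic prior plus log-likelihood of the symptoms."""
    demo = {dx: n for (dx, a, g), n in claim_counts.items() if a == age_band and g == gender}
    total = sum(demo.values())
    scores = {}
    for dx, w in weights.items():
        prior = (demo.get(dx, 0) + 1) / (total + len(weights))  # add-one smoothing
        scores[dx] = math.log(prior) + sum(math.log(w[s]) for s in symptoms if s in w)
    return sorted(scores, key=scores.get, reverse=True)


weights = learn_weights(documents)
ranking = rank(["cough", "fever"], weights, age_band="child", gender="M")
```

In this toy setting, the claims-based prior favors the more frequent diagnosis when the symptoms
are ambiguous (a lone "cough"), while additional symptoms such as "fever" shift the ranking
toward the diagnosis they co-occur with more often.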


REFERENCES
 [1] Alan R Aronson. 2006. Metamap: Mapping text to the umls metathesaurus.
     Bethesda, MD: NLM, NIH, DHHS (2006), 1–26.
 [2] Olivier Bodenreider. 2004. The unified medical language system (UMLS): in-
     tegrating biomedical terminology. Nucleic acids research 32, suppl_1 (2004),
     D267–D270.
 [3] M Alan Brookhart, Til Stürmer, Robert J Glynn, Jeremy Rassen, and Sebastian
     Schneeweiss. 2010. Confounding control in healthcare database research: chal-
     lenges and potential approaches. Medical care 48, 6 0 (2010), S114.
 [4] Edward Y Chang, Meng-Hsi Wu, Kai-Fu Tang, Hao-Cheng Kao, and Chun-
     Nan Chou. 2017. Artificial Intelligence in XPRIZE DeepQ Tricorder. In Proceedings
     of the 2nd International Workshop on Multimedia for Personal Health and Health
     Care. ACM, 11–18.
 [5] Kevin Donnelly. 2006. SNOMED-CT: The advanced terminology and coding
     system for eHealth. Studies in health technology and informatics 121 (2006), 279.
 [6] Sadid A Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, Oladimeji Farri,
     et al. 2016. Neural Paraphrase Generation with Stacked Residual LSTM Networks.
     In Proceedings of COLING 2016, the 26th International Conference on Computational
     Linguistics: Technical Papers. 2923–2934.
 [7] Yoichi Hayashi. 1991. A neural expert system with automated extraction of
     fuzzy if-then rules and its application to medical diagnosis. In Advances in neural
     information processing systems. 578–584.
 [8] Hao-Cheng Kao, Kai-Fu Tang, and Edward Y Chang. 2018. Context-Aware
     Symptom Checking for Disease Diagnosis Using Hierarchical Reinforcement
     Learning. (2018).
 [9] Igor Kononenko. 2001. Machine learning for medical diagnosis: history, state of
     the art and perspective. Artificial Intelligence in medicine 23, 1 (2001), 89–109.
[10] Adam Lally, Sugato Bagchi, Michael A Barborak, David W Buchanan, Jennifer
     Chu-Carroll, David A Ferrucci, Michael R Glass, Aditya Kalyanpur, Erik T Mueller,
     J William Murdock, et al. 2017. WatsonPaths: scenario-based question answering
     and inference over unstructured information. AI Magazine 38, 2 (2017), 59.
[11] Ray R Larson. 2010. Introduction to information retrieval. Journal of the American
     Society for Information Science and Technology 61, 4 (2010), 852–853.
[12] Katherine Middleton, Mobasher Butt, Nils Hammerla, Steven Hamblin, Karan
     Mehta, and Ali Parsa. 2016. Sorting out symptoms: design and evaluation of
     the ‘babylon check’ automated triage system. arXiv preprint arXiv:1606.02041
     (2016).
[13] Agnieszka Oniśko, Marek J Druzdzel, and Hanna Wasyluk. 2001. Learning
     Bayesian network parameters from small data sets: Application of Noisy-OR
     gates. International Journal of Approximate Reasoning 27, 2 (2001), 165–182.
[14] World Health Organization et al. 1992. The ICD-10 classification of mental and
     behavioural disorders: clinical descriptions and diagnostic guidelines. Geneva:
     World Health Organization.
[15] Veronica Pinchin. 2016. I’m Feeling Yucky :( Searching for symptoms on Google.
     https://blog.google/products/search/im-feeling-yucky-searching-for-symptoms/
[16] P Ramnarayan, G Kulkarni, A Tomlinson, and J Britto. 2004. ISABEL: a novel
     Internet-delivered clinical decision support system. Current perspectives in health-
     care computing (2004), 245–256.
[17] Salman Razzaki, Adam Baker, Yura Perov, Katherine Middleton, Janie Baxter,
     Daniel Mullarkey, Davinder Sangar, Michael Taliercio, Mobasher Butt, Azeem
     Majeed, et al. 2018. A comparative study of artificial intelligence and human
     doctors for the purpose of triage and diagnosis. arXiv preprint arXiv:1806.10698
     (2018).
[18] Maya Rotmensch, Yoni Halpern, Abdulhakim Tlimat, Steven Horng, and David
     Sontag. 2017. Learning a health knowledge graph from electronic medical records.
     Scientific reports 7, 1 (2017), 5994.
[19] Hannah L Semigran, Jeffrey A Linder, Courtney Gidengil, and Ateev Mehrotra.
     2015. Evaluation of symptom checkers for self diagnosis and triage: audit study.
     bmj 351 (2015), h3480.
[20] Kai-Fu Tang, Hao-Cheng Kao, Chun-Nan Chou, and Edward Y Chang. 2016.
     Inquire and Diagnose: Neural Symptom Checking Ensemble using Deep Rein-
     forcement Learning. In Proceedings of NIPS Workshop on Deep Reinforcement
     Learning.

</pre>