=Paper=
{{Paper
|id=None
|storemode=property
|title=Linking Specialized Online Medical Discussions to Online Medical Literature
|pdfUrl=https://ceur-ws.org/Vol-572/paper4.pdf
|volume=Vol-572
}}
==Linking Specialized Online Medical Discussions to Online Medical Literature==
Sam Stewart¹⋆, Syed Sibte Raza Abidi¹, Allen Finley²
¹ NICHE Research Group, Faculty of Computer Science, Dalhousie University, Halifax, Canada
² IWK Health Centre/Dalhousie University, Halifax, Canada
Abstract. The medical web comprises both medical communities en-
gaged in discussions about specialized topics and a vast array of medical
articles available through web-based databases. In this paper we present
a knowledge linkage strategy that links online specialized medical dis-
cussions with corresponding online medical articles. The idea is to link
the experiential knowledge generated in online medical discussions by a
virtual community of specialized medical practitioners with the explicit
knowledge available in online medical literature archives. We have de-
veloped a specialized medical literature search algorithm, based on the
principles of the Extended Boolean Information Retrieval algorithm [6],
to retrieve a ranked list of medical articles associated with the specialized
medical discussion. The medical literature search algorithm is part of our
knowledge linkage strategy that involves the generation of topic-specific
discussion threads from online discussions, formulation of highly special-
ized search queries based on a specialized discussion thread and retrieval
of published medical articles from PubMed that are closely related to
the online discussion. We have applied our knowledge linkage strategy to
the specialized medical topic of Pediatric Pain Management, and have
achieved an improvement in the positive return rate (recall) from 55% to
70% in terms of linking online medical discussions to the correct medical
articles.
1 Introduction
Web 2.0 technologies are embraced by medical practitioners for collaborative
case solving, professional communications, knowledge sharing, medical educa-
tion, patient interactions and so on. From a knowledge and experience sharing
perspective, online discussion forums and mailing lists provide a viable medium
for medical professionals to virtually engage in discussions around specialized
medical topics. The ensuing discussions, which constitute a thread of emails or
postings by medical professionals from different parts of the world with varying
degrees of expertise and experience, entail practical know-how in terms of what
worked and what did not work, recommendations and solutions to unusual cases
and references to domain experts or published evidence. Such online discussions
are a vital resource for experiential medical knowledge emanating from a community
of medical practitioners. Notwithstanding the utility of online discussions
on specialized medical topics, medical practitioners like to correlate the recommendations
with published medical literature for use in clinical decision-making.
⋆ This work is carried out with the aid of a grant from the International Development
Research Centre, Ottawa, Canada
MEDEX 2010 Proceedings 35
The process of searching for information within specialized domains, however,
is a key challenge within the medical community. Studies have shown that the
lack of clinical knowledge about specialized subjects, such as pediatric pain, has
led to incorrect interventions [2]. These problems tend to be exacerbated by the
fact that specialized practitioners do not often have the time to meet and share
information face-to-face, forcing them to rely on their own search strategies to
retrieve information from published resources.
Linking specialized online medical discussions to online medical literature
poses an information retrieval challenge because the specialized discussions are
context-sensitive, span multiple emails/postings and encapsulate concepts
from multiple sources. A medical practitioner seeking medical articles corre-
sponding to the online discussion is therefore required to formulate a focused
search query that captures the discussion’s context, uses the prevalent terms
and is posted to the right online medical literature archive. It may be noted that
the process of finding research articles related to a specialized topic amongst the
nineteen million different articles available on the online database of PubMed is
a challenging task, and can be even more challenging for specialized fields for
which there is less published literature. This paper presents a medical literature
retrieval strategy that automatically comprehends a specialized online discussion
to formulate a search query that retrieves relevant medical articles from a web
of medical literature archives (in particular the online databases of PubMed).
In this way, we establish knowledge linkages between the experiential knowledge
encapsulated within online medical discussions and the explicit knowledge stored
in online medical literature archives.
In this paper we present our knowledge linkage strategy that involves a se-
quence of steps, starting from forming topic-specific discussion threads to formu-
lating highly specialized search queries based on a specialized discussion thread
to retrieving a set of published articles from PubMed that are closely related
to the online discussion (see figure 1). We have developed a specialized medical
literature search algorithm, based on the principles of the Extended Boolean
Information Retrieval algorithm [6], that incorporates both weighted and un-
weighted query terms (keywords derived from the selected medical discussion)
to retrieve a ranked list of medical articles associated with the specialized discussion.
We use Metamap [1], a program designed by the National Library of
Medicine, to map the free-form medical text of the discussions to a set of
medical keywords based on the MeSH lexicon. The choice of MeSH terminology
is quite natural since the PubMed data is indexed by MeSH keywords. We have
applied our knowledge linkage strategy to the specialized medical topic of Pe-
diatric Pain Management that features a Pediatric Pain Mailing List (PPML)
with over 700 subscribers. Our results show that the application of our specialized
medical literature search algorithm has improved the positive return rate
(recall) from 55% to 70%, which is a significant improvement in terms of linking
online medical discussions to the right medical articles.
Fig. 1. The knowledge linkage strategy. (1) Messages are extracted and (2) combined
into threads, which are then (3) linked to formal medical terms.
These terms are then (4) used in a novel search strategy to obtain a ranked list
of papers, which are then (5) returned to the practitioners.
1.1 Pediatric Pain
Pediatric pain management is an example of a specialized medical domain that
can benefit from knowledge linkage. Pediatric pain is a complex subject that is
dispersed across multiple departments within a hospital. It is difficult to manage,
as children lack the ability to properly express their pain [2], which can lead
to incorrect interventions. To compound the problem, healthcare practitioners
do not receive proper training in the management of pediatric pain [3], and
the multidisciplinary nature of the subject makes it difficult for pediatric pain
practitioners to meet and discuss their issues face-to-face.
The PPML is an example of a Web 2.0 tool that has provided an electronic
link between clinicians working in different departments and hospitals around
the world. The PPML has over 700 subscribers and over 13,000 messages, all
archived, making it an excellent candidate for knowledge linkage. The conversa-
tions on the mailing list will be processed using the program Metamap, which will
provide a list of pertinent medical keywords extracted from the MeSH lexicon.
1.2 Metamap
Metamap is a Natural Language Processing (NLP) tool designed to parse free-
form medical text and connect it to formal medical terms from selected medical
lexicons. For this project the lexicon being used is the MeSH vocabulary, but
Metamap has the ability to map to any lexicon in the Unified Medical Language
System (UMLS), such as SNOMED CT or ICD-9. Metamap has been used
in several other projects to link free-form medical texts to formal medical terms
[4, 5]. For more details on its use see Aronson’s introductory work [1].
2 Methods
The objective of the search strategy is to passively link the conversations on
the mailing list to pertinent published literature. This means that the MeSH
terms produced by the mapping process and their scores must be leveraged by
the search strategy to produce a ranked list of papers associated with that conversation.
Other projects that have sought to make information retrieval in the
medical domain easier have looked at ways to improve clinicians’ active search
strategies, through better search algorithms and interfaces [8]. This project takes
a different approach, performing the search automatically and returning the
resulting set of papers without requiring any input from the clinician, which
greatly increases the speed of the knowledge linkage process. If the resulting
set of papers is not optimal then the set of ranked MeSH terms returned can be
used to inform a manual search.
2.1 Search Strategy
The search strategy is based on the Extended Boolean Information Retrieval
(eBIR) algorithm developed by Salton et al. [6]. The algorithm builds on the
traditional boolean information retrieval approach by including both query and
document weights for each of the keywords, and then using a p-norm calculation
to assign a search score. This project will modify the eBIR algorithm to better
fit automatic searching within specialized domains.
2.2 eBIR and p-norms
Boolean information retrieval is the simplest form of information retrieval, in
which query terms are joined by AND and OR operators, and any papers matching
the query are returned. There are several problems with the boolean information
retrieval model. First, it is often difficult to manage the size of the returned
set of papers; complex searches can easily return no papers, yet removing a
search term can result in a set of several thousand papers. Second, there is no
ranking of the papers returned. Third, there is no way to assign importance to
specific keywords. Finally, there is a problem with the structure of the searches;
if ten query terms are joined by AND operators, then papers that match nine of
the terms but not the tenth are not returned. In the context of this project the
boolean search strategy is particularly ineffective, as it does not make use of the
Metamap scores at hand, and there are far too many query terms associated
with a conversation to retrieve a manageable set of papers.
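The last structural problem can be illustrated with a small sketch. The term names and the paper's keyword list below are hypothetical placeholders, not data from the study; the point is only that a strict AND query discards a paper that matches nine of ten terms, while a simple match count would still rank it highly.

```python
def strict_and_match(query_terms, paper_keywords):
    """Return True only if every query term is among the paper's keywords."""
    return set(query_terms) <= set(paper_keywords)


def matched_count(query_terms, paper_keywords):
    """Count how many query terms appear among the paper's keywords."""
    return len(set(query_terms) & set(paper_keywords))


# Hypothetical query of ten terms and a paper indexed with nine of them.
query = [f"term{i}" for i in range(1, 11)]
paper = query[:9]

print(strict_and_match(query, paper))  # False: one missing term excludes the paper
print(matched_count(query, paper))     # 9
```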
To remedy this problem Salton et al. developed a system that incorporates
term weights to aid in the search process. The eBIR algorithm allows weight-
ing of both the paper keywords and the query terms. For this project there
are no weights for the document keywords (which are assigned by the authors
manually via PubMed), but the Metamap scores can be used as query weights,
with higher weights indicating more confidence in the search term. Though the
eBIR algorithm suggests that weights should be in the range [0,1], there is no
mathematical reason that they cannot be in the range [0,∞), and thus no
transformation of the Metamap scores is required.
The eBIR algorithm uses the idea of p-norms to measure the score of a
set of OR or AND terms. Let the set of query terms be represented by
$A = \{(A_1, a_1), \ldots, (A_n, a_n)\}$, where $A_i$ is the $i$th query term and $a_i$ is the
associated score. Let a document $D$ be represented by the set $D = \{d_{A_1}, d_{A_2}, \ldots, d_{A_n}\}$,
where $d_{A_i}$ is the weight associated with keyword term $i$ in that specific document.
Since this project does not allow for weighted document keywords, $d_{A_i} = 0$ or $1$.
Let the query $Q_{OR(p)} = \{(A_1, a_1)\ \mathrm{OR}^p \ldots \mathrm{OR}^p\ (A_n, a_n)\}$ be the set of query
terms linked by OR, and let the query $Q_{AND(p)} = \{(A_1, a_1)\ \mathrm{AND}^p \ldots \mathrm{AND}^p\ (A_n, a_n)\}$
be the set of query terms linked by AND. The p-norm scores for each of the
searches are given in equations (1) and (2).

$$sim(D, Q_{OR(p)}) = \left[ \frac{a_1^p d_{A_1}^p + a_2^p d_{A_2}^p + \ldots + a_n^p d_{A_n}^p}{a_1^p + a_2^p + \ldots + a_n^p} \right]^{1/p} \tag{1}$$

$$sim(D, Q_{AND(p)}) = 1 - \left[ \frac{a_1^p (1 - d_{A_1})^p + \ldots + a_n^p (1 - d_{A_n})^p}{a_1^p + \ldots + a_n^p} \right]^{1/p} \tag{2}$$
The selection of p affects the relative strengths of the returned scores. Selecting
p = ∞ results in a standard boolean information retrieval model, while
selecting p = 1 results in a vector-space model [7], in which the ANDs and ORs
are ignored and the papers are ranked by the weighted sum of the query terms that appear
in each paper.
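As a rough illustration of equations (1) and (2), the following sketch computes both p-norm scores for binary document weights and a finite p. The function names, the example weights and the keyword set are assumptions made for illustration; they are not taken from the authors' implementation.

```python
def sim_or(weights, doc, p):
    """Equation (1): p-norm score of an OR query.

    weights: dict mapping query term -> non-negative query weight a_i
    doc:     set of keywords on the paper, so d_Ai is 1 if present, else 0
    p:       finite p-norm parameter, p >= 1
    """
    total = sum(a ** p for a in weights.values())
    if total == 0:
        raise ValueError("at least one query weight must be positive")
    matched = sum(a ** p for term, a in weights.items() if term in doc)
    return (matched / total) ** (1.0 / p)


def sim_and(weights, doc, p):
    """Equation (2): p-norm score of an AND query."""
    total = sum(a ** p for a in weights.values())
    if total == 0:
        raise ValueError("at least one query weight must be positive")
    missed = sum(a ** p for term, a in weights.items() if term not in doc)
    return 1.0 - (missed / total) ** (1.0 / p)


# Hypothetical query weights and a paper that matches two of the three terms.
weights = {"Pain": 2.0, "Eye": 1.0, "Anesthesia": 1.0}
doc = {"Pain", "Eye"}

print(sim_or(weights, doc, 1))   # 0.75
print(sim_and(weights, doc, 1))  # 0.75
```

With p = 1 the two scores coincide, which matches the observation that the ANDs and ORs are ignored in that case; for larger p the OR score rises and the AND score falls for a partially matching paper.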
For this project the simplest form of an eBIR algorithm would be to link
all the terms returned by Metamap using an OR operator. Let the set
$M = \{(M_1, m_1), (M_2, m_2), \ldots, (M_n, m_n)\}$ be the MeSH terms and their scores for a
particular conversation. Then the query would be given by equation (3), and the
score calculation for paper D would be given by equation (4).

$$Q_{OR} = [M_1 \text{ OR } M_2 \text{ OR } M_3 \ldots \text{ OR } M_n] \tag{3}$$

$$sim(D, Q_{OR(p)}) = \left[ \frac{m_1^p d_{M_1}^p + m_2^p d_{M_2}^p + \ldots + m_n^p d_{M_n}^p}{m_1^p + m_2^p + \ldots + m_n^p} \right]^{1/p} \tag{4}$$
Note that the selection of p is key to the function of the p-norm calculation
and subsequently the eBIR algorithm. Setting p = 1 makes sense theoretically,
as the principle behind the OR(p) operator is to return the papers that match
the greatest number of terms in the query set. With p = 1 the denominator of
equation (4) is the same for every paper and does not change the ranking, so
equation (4) could be reduced to $sim(D, Q_{OR}) = \sum_i m_i d_i$, where $d_i$ is an
indicator of whether term i is a keyword for the paper.
The problem with the eBIR algorithm is that it is not well suited for spe-
cialized domains. The Metamap program extracts keywords that represent the
conversation within the mailing list, but keywords such as Pediatrics and Pain
are implicitly representative of all conversations on the list, whether or not they
are particularly suited to the conversation. This problem needs to be addressed,
to make sure that the search strategy is focusing on the correct body of literature.
2.3 Modified Information Retrieval Algorithm
To solve the problem of specialized domains it was decided that a specialized
filter would be added, adding an AND operator to the query. The objective of
the specialized filter is to focus the search on papers relevant to the specialized
subject. One has to be careful, however, to not over-restrict the search by filtering
out useful papers. To this end an age-group filter is added, to ensure that all
papers returned are relevant to the pediatric population. The new query would
modify equation 4 by adding Infant, Child and Adolescent to the set of MeSH
terms, as demonstrated in equation (5).
$$Q = [\text{Infant OR Child OR Adolescent}] \text{ AND } [M_1 \text{ OR } M_2 \text{ OR } M_3 \ldots \text{ OR } M_n] \tag{5}$$
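One way the query in equation (5) might be written as a PubMed search string is sketched below. The paper does not say how its queries were submitted, so the helper names and the use of PubMed's `[MeSH Terms]` field tag here are assumptions for illustration only.

```python
def mesh_or_group(terms):
    """Join MeSH terms with OR, tagging each with the [MeSH Terms] field."""
    return "(" + " OR ".join(f'"{t}"[MeSH Terms]' for t in terms) + ")"


def build_query(filter_terms, mesh_terms):
    """Combine the age-group filter and the conversation terms with AND."""
    return mesh_or_group(filter_terms) + " AND " + mesh_or_group(mesh_terms)


# Age-group filter from equation (5); the two conversation terms are examples.
age_filter = ["Infant", "Child", "Adolescent"]
query_string = build_query(age_filter, ["Pain", "Eye"])
print(query_string)
```

A string of this form expresses only the boolean structure; the query weights still have to be applied when the returned papers are scored.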
If the eBIR algorithm were used then the next step would be to apply query
weights to the terms in the specialized filter and then find a suitable value for p.
This project decided instead to modify the eBIR algorithm slightly, by combining
the idea of strict boolean searching with a weighted query.
The final search algorithm leverages the eBIR idea of weighting queries, but
adds a strict filter that reduces the search field to only those papers that match
the age filter. This filter has the effect of focusing the search strictly on papers
that focus on the pediatric population. The score for paper D is therefore calculated
using equation (6). The equation uses the same calculation as the eBIR
algorithm, but requires the presence of one of the age group keywords. Let $d_I$, $d_C$
and $d_A$ be the indicators of whether the paper contains the MeSH terms Infant,
Child or Adolescent respectively.

$$sim(D, Q) = [1 - (1 - d_I)(1 - d_C)(1 - d_A)](m_1 d_{M_1} + m_2 d_{M_2} + \ldots + m_n d_{M_n}) \tag{6}$$
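A minimal sketch of equation (6) follows, assuming each paper is available as a set of MeSH keywords. The function names, paper identifiers and keyword sets are hypothetical; the scores are the sample values listed in Table 1.

```python
AGE_FILTER = ("Infant", "Child", "Adolescent")


def modified_score(mesh_scores, paper_keywords):
    """Equation (6): weighted sum of matched terms, gated by the age filter."""
    keywords = set(paper_keywords)
    # The bracketed factor in equation (6) is 1 if any age term is present.
    if not any(term in keywords for term in AGE_FILTER):
        return 0
    return sum(score for term, score in mesh_scores.items() if term in keywords)


def rank_papers(mesh_scores, papers, top_k=15):
    """Return the ids of the top_k papers with a non-zero score, best first."""
    scored = [(modified_score(mesh_scores, kws), pid) for pid, kws in papers.items()]
    scored = [(s, pid) for s, pid in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:top_k]]


# Sample scores from Table 1; the papers below are made up for illustration.
mesh_scores = {"Lenses": 2583, "Pain": 2434, "Anesthesia": 1722,
               "Anesthesiology": 1722, "Eye": 1000}
papers = {
    "paper_a": {"Child", "Pain", "Eye"},       # passes the filter, score 3434
    "paper_b": {"Pain", "Anesthesia", "Eye"},  # no age term, score 0
    "paper_c": {"Infant", "Anesthesia"},       # passes the filter, score 1722
}

print(rank_papers(mesh_scores, papers))  # ['paper_a', 'paper_c']
```

Note that paper_b matches more query terms than paper_c but is excluded outright, which is the strict-filter behaviour that distinguishes this calculation from a weighted AND.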
3 Results
This is an example of a single conversation from the PPML.
Question:We are looking at ways to decrease the pain of ocular flushing nec-
essary when a child gets sand or spray in their eyes. I am really having a hard
time finding any literature on what is the most comfortable solution to use
(NS?RL)and what freezing drops to use(Prilocaine?) If anyone has a proce-
dure, or protocol, or literature to share or can let know what you are using, I
would really appreciate it. Right now we are using nothing.
Response: From personal experience, Proparacaine (Alcaine(r) in the U.S.)
anesthetic eye drops are almost painless on instillation; they do not provide as
deep anesthesia as tetracaine but burn much less and usually provide sufficient
conjunctival and corneal anesthesia. ...
A sample of the MeSH terms associated with this conversation is shown in
Table 1, and seems to be a reasonable representation of the conversation. The full
set of MeSH terms was used in the search strategy to retrieve the set of papers,
the top two of which were as follows:
– Boscia F, et al. Combined topical anesthesia and sedation for open-globe injuries in selected patients. Ophthalmology:2003,110(8).
– Snir M, et al. Efficacy of diclofenac versus dexamethasone for treatment after strabismus surgery. Ophthalmology:2000,107(10).
These papers seem to be pertinent to the subject being discussed. This example
demonstrates the effectiveness of the system, and its ability to provide published
literature to supplement the information being shared online.
MeSH term        Score
Lenses           2583
Pain             2434
Anesthesia       1722
Anesthesiology   1722
Eye              1000
Table 1. A sample of the mappings corresponding to the sample conversation
3.1 Evaluation
There is a challenge in evaluating a search strategy of this type. The strategy
is specific to unstructured, medical conversations, and it is therefore not possi-
ble to apply the search strategy to traditional annotated information retrieval
databases. Without an annotated database it is difficult to calculate precision
and recall, the traditional measures of information retrieval systems. An alter-
native strategy for evaluation is therefore required.
The search strategy was tested on a sample of conversations from the PPML
between 2007 and 2008. For each conversation Metamap was used to map the
conversation to a set of MeSH terms. The MeSH terms were then fed to both
search strategies (the eBIR algorithm and the modified algorithm), and the top
15 papers returned by each search were linked to the appropriate thread. The
threads were evaluated to see if the set of papers returned was appropriate. A
set of papers was deemed appropriate if at least one of the returned papers was
relevant to the subject being discussed.
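The appropriateness criterion described above can be sketched as follows. The function name and the relevance judgements are invented for illustration and are not the study's data; the sketch only shows how a per-thread rate could be computed from per-paper judgements.

```python
def appropriate_rate(judgements):
    """Fraction of threads whose returned set has at least one relevant paper.

    judgements: dict mapping thread id -> list of booleans, one per returned
                paper, True if a reviewer judged that paper relevant.
    """
    if not judgements:
        raise ValueError("no threads to evaluate")
    hits = sum(1 for flags in judgements.values() if any(flags))
    return hits / len(judgements)


# Made-up judgements for four threads (not results from the paper).
toy_judgements = {
    "thread_1": [False, True, False],
    "thread_2": [False, False],
    "thread_3": [True],
    "thread_4": [False],
}

print(appropriate_rate(toy_judgements))  # 0.5
```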
The results of the study were promising. For the eBIR algorithm 55% of the
papers returned were deemed relevant to the thread. This percentage jumped up
to 70% for the improved algorithm, a significant increase over the first attempt
(p = 0.0025). The improvement is due to the filter, which restricted the search
area to those papers relevant to the pediatric population.
4 Conclusion
The purpose of knowledge linkage is to provide clinicians with quick access to
evidence-based knowledge to supplement the tacit knowledge they share via Web
2.0 communications. Because the information retrieval process is done passively,
without clinician input, a robust algorithm is required that can consistently re-
turn pertinent medical knowledge. This paper has presented an algorithm that
incorporates query weights to automatically produce a search query that is appropriate
for specialized knowledge domains such as pediatric pain. The algorithm
was built on the eBIR algorithm, and was shown to significantly
improve the relevance of the papers returned.
Future research should be directed at a larger study of the two algorithms,
along with a comparison to a more sophisticated eBIR implementation. A better
evaluation of the search strategy should be completed, including evaluating the
precision and the recall of the strategy, and an implemented system should be
tested to evaluate the overall usability of the system.
References
1. Aronson, A.R.: Effective mapping of biomedical text to the UMLS Metathesaurus: The MetaMap program. Proceedings of the AMIA Symposium (2001)
2. Atherton, T.: Children’s experiences of pain in an accident and emergency department. Accident and Emergency Nursing 10, 79–82 (1991)
3. Caty, S., Tourigny, J., Koren, I.: Assessment and management of children’s pain in community hospitals. Journal of Advanced Nursing 22(4), 638–645 (1995)
4. Chapman, W.W., Fiszman, M., Dowling, J.N., Chapman, B.E., Rindflesch, T.C.: Identifying respiratory findings in emergency department reports for biosurveillance using MetaMap. MEDINFO (2004)
5. Chase, H.S., Kaufman, D.R., Johnson, S.B., Mendonca, E.A.: Voice capture of medical residents’ clinical information needs during an inpatient rotation. Journal of the American Medical Informatics Association 16, 387–394 (2009)
6. Salton, G., Fox, E., Wu, H.: Extended boolean information retrieval. Commun ACM 26(11), 1022–1036 (1983)
7. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun ACM 18, 613–620 (1975)
8. Trieschnigg, D., Pezik, P., Lee, V., de Jong, F., Kraaij, W., Rebholz-Schuhmann, D.: MeSH Up: effective MeSH text classification for improved document retrieval. Bioinformatics 25, 1412–1418 (2009)