A Task-based Comparison of
     Linguistic and Semantic Document Retrieval
           Methods in the Medical Domain

Mohammad Shafahi1, Qing Hu2,3, Hamideh Afsarmanesh1, Zhisheng Huang2,
            Annette ten Teije2, and Frank van Harmelen2

1 Informatics Institute, Faculty of Science, University of Amsterdam, Science Park
  904, Amsterdam, The Netherlands, {m.shafahi,h.afsarmanesh}@uva.nl
2 Department of Computer Science, VU University Amsterdam, De Boelelaan 1081,
  Amsterdam, The Netherlands, {qhu400,huang,annette,Frank.van.Harmelen}@cs.vu.nl
3 College of Computer Science and Technology, Wuhan University of Science and
  Technology, Wuhan, China


      Abstract. Text-based and semantics-based methods have both been studied
      intensively for document retrieval. To gain insight into the respective
      merits of these two approaches, we performed a controlled experiment in
      which we executed a real-life task using both text-based and
      semantics-based techniques.
      To maximise the lessons that we could draw about the two approaches, we
      kept everything except the retrieval method fixed: the same task
      (searching the scientific literature for papers needed to update a
      medical guideline), the same test case (updating the 2004 Dutch national
      breast-cancer guideline), the same gold standard (the updated 2012 Dutch
      national breast-cancer guideline) and the same corpus (PubMed). We then
      performed this task using two different methods: retrieving papers based
      on keywords (text-based approach) and retrieving papers based on semantic
      annotations (semantics-based approach). Based on this dual set of
      experiments, we discuss the insights that we gained.

      Keywords: document retrieval, keyword search, semantic annotation,
      concept-based search, relation-based search


1    Introduction
Many domains involve retrieving items from large text corpora, for example
searching for web pages, searching the scientific literature, or question
answering over a text corpus. Classical information retrieval techniques use
text-based methods for selecting and ranking the most relevant documents for a
query; typical examples are N-gram similarity, vector-space models over words,
and probabilistic language models [10].
    There is increasing interest in the use of semantic methods for retrieving
items from large corpora (e.g. [5]). In such techniques, words from both the query
2       M. Shafahi et al.

and the items in the corpus are mapped to concepts and relations in a knowledge
source (typically an ontology), and retrieval is then based on semantic proximity
in the background ontology.
    Attempts to understand the circumstances that determine the effectiveness
of each approach have a long history (e.g. [9]). This paper contributes to this
understanding of the respective merits of text-based and semantics-based infor-
mation retrieval.
    To this end, we performed an experiment in which we used the same task
(searching the scientific literature for papers needed to update a medical
guideline), the same test case (updating the 2004 Dutch national breast-cancer
guideline), the same gold standard (the updated 2012 Dutch national
breast-cancer guideline) and the same corpus (PubMed). We then performed this
task using two different methods: retrieving papers based on keywords
(text-based approach) and retrieving papers based on semantic annotations
(semantics-based approach), which in this case uses concepts and relations
extracted from the corpus. Based on this dual set of experiments, we discuss
the insights that we gained.
    The rest of the paper is organized as follows. Section 2 describes the
evaluation task. Section 3 describes our experimental setup (guideline, corpus,
gold standard, metrics). Sections 4 and 5 describe the text-based and
semantics-based methods, respectively. Section 6 interprets the results of our
experiments and concludes.


2   Description of the task

A medical guideline, also called a clinical guideline, is a document designed to
guide decisions and criteria for diagnosis, management, and treatment in specific
areas of health care. Medical guidelines have proven valuable for clinicians,
nurses, and other health-care professionals [16]. Evidence-based medical
guidelines are developed based on the best available evidence in biomedical
science and clinical practice. The recommendations in evidence-based medical
guidelines are annotated with their underlying evidence and the class of that
evidence.
    Evidence-based medical guidelines are expected to be updated regularly and
frequently, so that they can accommodate the latest research findings. However,
timely and regular updating of a medical guideline has proven difficult for two
reasons. First, the volume of medical publications is very large (for example,
PubMed4 alone contains more than 24 million citations for biomedical literature
from MEDLINE5). Second, large volumes of new medical findings appear every day
(PubMed is growing at a rate of 750,000 papers per year6, i.e. more than one new
paper every minute).
4 http://www.ncbi.nlm.nih.gov/pubmed
5 http://www.nlm.nih.gov/bsd/pmresources.html
6 http://www.nlm.nih.gov/pubs/factsheets/medline.html
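The per-minute figure follows from simple arithmetic on the quoted growth rate. A quick check, assuming a non-leap year of 365 days:

```python
# Growth rate of PubMed quoted in the text (papers per year).
PAPERS_PER_YEAR = 750_000

# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

papers_per_minute = PAPERS_PER_YEAR / MINUTES_PER_YEAR
print(f"{papers_per_minute:.2f} new papers per minute")  # 1.43 new papers per minute
```

So on average PubMed gains slightly more than one paper every minute.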
         Comparison of Linguistic and Semantic Document Retrieval Methods         3

    Consequently, it usually takes about five years to release a new update of a
medical guideline, an update frequency that lags significantly behind the
appearance of new medical findings. Automatically finding new and relevant
evidence for timely and regular updates of medical guidelines has therefore
become one of the important challenges in medical information retrieval.
    We have taken this task of finding medical publications that are relevant
for updating a given guideline as our benchmark. A medical guideline is usually
a document of more than a hundred pages of text and tables. The essence of the
guideline is captured in numerous recommendations (called "conclusions"), each
in the form of a short paragraph, for example:

     "A descriptive study found that women who undergo breast reconstruction
     immediately following the mastectomy are more satisfied with the aesthetic
     result and experience greater psychosocial well being than women who
     undergo secondary reconstruction."
     (First conclusion in Section 1.2.6, page 25 of 117, of the Dutch National
     Breast Cancer Guideline, 2004; considered as conclusion no. 12 in our
     experiments.)

Such guideline conclusions are typically annotated with between 1 and 10
citations to the medical literature that provide the evidence for the conclusion.
For example, the above recommendation was supported by three citations to the
literature, from the years 1984, 1995 and 2000.
    Our benchmark task is then to find, for each conclusion in a guideline, all
the recent medical publications that are relevant for making an updated version
of that conclusion.


3     Description of guideline, corpus, gold standard, and
      metrics
Guideline: To investigate the behaviour of both the text-based and the
semantics-based methods of document retrieval for finding new evidence to update
the conclusions of a medical guideline, we selected the Dutch National Guideline
for Breast Cancer from 2004 (version 1.0, [12]7). The guideline is a document of
117 pages, listing around 50 recommendations ("conclusions") in total, each one
or two sentences long. This guideline is in daily use nationwide.
    Corpus: As the corpus for our text-based experiment, we used the PubMed
query service, which allows querying the titles and abstracts of 24 million
publications from the biomedical scientific literature.
As the corpus for our semantics-based experiment, we used the query service of
BioMed Xplorer [15]. BioMed Xplorer is built on top of SemMedDB8 [8], which
contains semantic annotations in the form of triples that have been extracted
7 For our experiments we used a certified English translation of the document.
8 http://skr3.nlm.nih.gov/SemMedDB/

from PubMed and annotated with the PubMed-ID of the paper(s) from which the
relation was extracted. In our experiments we have used a version of SemMedDB
that hosts more than 70 million statements extracted from PubMed papers.
Furthermore, the concepts and relations that form the statements in SemMedDB
have been linked to corresponding concepts and relations from Linked Life Data
[11] and Bio2RDF [4].
    Gold Standard: When our search methods (either text-based or semantics-based)
search PubMed for publications that are relevant for updating a particular
guideline recommendation, how should we measure their success? In other words,
how can we decide whether the returned publications are indeed those relevant for
updating the guideline? For this purpose, we use the updated 2012 revision of the
2004 Dutch National Breast Cancer guideline [13] and create our gold standard
from it. In the revised 2012 guideline, we identified 16 conclusions that
correspond to conclusions from 2004: each pair addresses the same or a similar
subject, but the 2012 statement differs from the 2004 one, indicating a clear
revision of the 2004 conclusion. All other conclusions in the 2012 version were
either copied directly from the 2004 version without being updated, or they were
entirely new and could not be interpreted as a revision of a conclusion from
2004. For these 16 conclusions, we then identified the evidence publications
listed for them in 2012 (sometimes a revision also retained some of the evidence
from the previous version); we call these publications hits for our search
results. These hits (all evidence actually used in the 2012 revision) are the
gold standard for our search methods: ideally, the search methods would suggest
all the hits and only the hits.
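The construction of the gold standard described above can be sketched as a small data-processing step. This is a minimal illustration, not the authors' tooling: it assumes each guideline version is available as a mapping from a conclusion identifier to its text and cited PubMed IDs, and that the 2004-to-2012 correspondence has already been established by hand. The `Conclusion` type, `build_gold_standard`, and all identifiers in the example are hypothetical; treating "the text differs" as "revised" is a simplification of the manual judgement (Python 3.9+).

```python
from dataclasses import dataclass, field


@dataclass
class Conclusion:
    """One guideline conclusion (hypothetical record type)."""
    text: str
    evidence: set[str] = field(default_factory=set)  # cited PubMed IDs


def build_gold_standard(
    v2004: dict[str, Conclusion],
    v2012: dict[str, Conclusion],
    correspondence: dict[str, str],
) -> dict[str, set[str]]:
    """Map each revised 2004 conclusion to the evidence cited in its 2012 revision.

    `correspondence` maps a 2004 conclusion id to the matching 2012 id;
    entirely new 2012 conclusions simply do not appear in it.
    """
    gold: dict[str, set[str]] = {}
    for old_id, new_id in correspondence.items():
        old, new = v2004[old_id], v2012[new_id]
        # Simplification: a pair counts as revised when the statement text differs.
        if old.text.strip() != new.text.strip():
            # Every item cited in 2012 counts, including items retained from 2004.
            gold[old_id] = set(new.evidence)
    return gold


v2004 = {
    "C1": Conclusion("Old wording of a recommendation.", {"pmid-a"}),
    "C2": Conclusion("Unchanged recommendation.", {"pmid-b"}),
}
v2012 = {
    "R1": Conclusion("Revised wording of the recommendation.", {"pmid-a", "pmid-c"}),
    "R2": Conclusion("Unchanged recommendation.", {"pmid-b"}),
}
gold = build_gold_standard(v2004, v2012, {"C1": "R1", "C2": "R2"})
print(sorted(gold))  # ['C1']
```

Only `C1` enters the gold standard, with both the retained item `pmid-a` and the new item `pmid-c`; the unchanged `C2` is excluded, mirroring the selection of the 16 revised conclusions.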
    Metric: In practice, of course, our search methods will neither return all
the hits nor only the hits. Let count be the number of papers returned by a
search method for a conclusion, goal the number of evidence items in the gold
standard for that conclusion, and hit the number of returned papers that are
referenced in that conclusion in the gold standard. We would like count to be as
small as possible while containing the maximal number of hits. In a realistic
scenario, a guideline revision committee may consider many dozens of papers for
a single conclusion, but certainly not more than a few hundred, which puts a
stringent upper bound on realistic values of count. Now let relevant hits (RH)
be the number of papers returned by a search method that are referenced in at
least one of the conclusions of the guideline. For each concept or relationship
used for querying, we would also like RH to be as large as possible relative to
the number of papers returned; in other words, we would like RH/count to
approach 1.
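The quantities defined above can be computed directly from sets of PubMed IDs. The sketch below is one possible reading of the definitions: it assumes `all_references` is the union of the gold-standard evidence of all conclusions, and it also reports hit/goal, which is how the percentage column of Table 1 is obtained (e.g. 2 of 5 goal items for C1 gives 40%). The function name `evaluate` and the example IDs are hypothetical.

```python
def evaluate(returned: set[str], goal: set[str], all_references: set[str]) -> dict[str, float]:
    """Score one query result against the gold standard.

    returned       -- PubMed IDs returned by the search method
    goal           -- gold-standard evidence for this conclusion
    all_references -- evidence referenced by any conclusion (used for RH)
    """
    count = len(returned)
    hit = len(returned & goal)            # returned papers cited by this conclusion
    rh = len(returned & all_references)   # returned papers cited by any conclusion
    return {
        "count": count,
        "hit": hit,
        "RH": rh,
        "hit/goal": hit / len(goal) if goal else 0.0,
        "RH/count": rh / count if count else 0.0,
    }


scores = evaluate(
    returned={"p1", "p2", "p3", "p4"},
    goal={"p1", "p5"},
    all_references={"p1", "p2", "p5", "p6"},
)
print(scores)
# {'count': 4, 'hit': 1, 'RH': 2, 'hit/goal': 0.5, 'RH/count': 0.5}
```

Here `p2` is not evidence for this conclusion but is cited elsewhere in the guideline, so it raises RH without raising hit.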


4   Description and results of the text-based method
In the text-based method, PubMed queries are built from the medical terms that
appear in a guideline conclusion. The query is constructed in the following
steps [7]:
1. We use Xerox’s NLP tool [1, 2] to identify the medical terms which appear
   in the guideline conclusion (formulation from 2004).

 2. In the same way, we collect medical terms from the heading of the guideline
    section in which the conclusion appears.

 3. We use a co-occurrence-based ranking measure to rank the extracted terms
    (the ranking measure is computed by counting co-occurrences of the term in
    the PubMed corpus).

 4. We construct a query as the conjunction of the top k ranked terms, where
    k is determined by heuristically balancing the size of counts and hits.
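Steps 3 and 4 can be sketched as follows. The co-occurrence measure and the heuristic for choosing k are defined in [6] and [7] rather than here, so this sketch takes a precomputed score per term (a hypothetical `score` mapping) and a fixed k. It writes the conjunction in PubMed's Boolean query syntax and wraps it in a request URL for NCBI's E-utilities `esearch` service; no request is actually sent, and the function names are illustrative.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; this sketch only builds the URL.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(terms: list[str], score: dict[str, float], k: int) -> str:
    """Conjunction of the top-k terms ranked by a co-occurrence score.

    `score` stands in for the paper's co-occurrence measure; `k` would come
    from the heuristic in [6].
    """
    unique_terms = list(dict.fromkeys(terms))  # drop duplicates, keep first-seen order
    # sorted() stays stable with reverse=True, so equal scores keep their order.
    ranked = sorted(unique_terms, key=lambda t: score.get(t, 0.0), reverse=True)
    # Quote each term so that multi-word terms are searched as phrases.
    return " AND ".join(f'"{t}"' for t in ranked[:k])


def esearch_url(query: str, retmax: int = 500) -> str:
    """URL for an esearch request against the PubMed database."""
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})


# Hypothetical terms from a conclusion and its section heading, with made-up scores.
terms = ["breast reconstruction", "mastectomy", "psychosocial well being", "mastectomy"]
scores = {"mastectomy": 0.9, "breast reconstruction": 0.7, "psychosocial well being": 0.2}
q = build_pubmed_query(terms, scores, k=2)
print(q)  # "mastectomy" AND "breast reconstruction"
print(esearch_url(q))
```

Quoting each term keeps multi-word terms such as "breast reconstruction" together as phrases; whether the original implementation quoted terms is not stated, so this is a design choice of the sketch.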


In [6], we developed a heuristic function that balances the number of hits on
the original evidence against the count, in order to find the best value for k.
The papers returned by the resulting query are then compared against the
evidence items for the corresponding conclusion in the 2012 version of the
guideline, which gives the score of the query in terms of the number of hits. We
repeat this procedure for each of the 16 recommendations from the 2004 guideline
that have a revised version in the 2012 guideline. The results of this
experiment are reported in Table 1.

    Conclusion        Goal  Hits  Count     %
    C1                   5     2     60   40%
    C2                   2     1    166   50%
    C3                   4     1     36   25%
    C4                  14     0     49    0%
    C5                   2     1     28   50%
    C6                   2     0     33    0%
    C7                   2     1    333   50%
    C8                   8     3    140   38%
    C9                   2     1     89   50%
    C10                  5     3   1628   60%
    C11                  5     3    281   60%
    C12                  3     0     82    0%
    C13                  5     5   9911  100%
    C14                  3     1     72   33%
    C15                  2     0    372    0%
    C16                  2     1    324   50%
    Total               66    23  13604
    Nr. of hits > 0           12
    Average              4     1    850   35%
                    Table 1. Results of the text-based method.

Conclusions are numbered C1 to C16. The table shows that the text-based method
found at least one hit for 12 of the 16 guideline conclusions, retrieving 35% of
all evidence items (23 of 66). The number of returned PubMed entries (the count)
ranges from below 100 (reasonable) to a few hundred (problematic), with two
outliers over 1000.
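The summary rows of Table 1 can be recomputed from the per-conclusion rows. Doing so shows that the reported 35% is the pooled ratio of total hits to total goal items (23/66), not the unweighted mean of the per-conclusion percentages, which would be about 38%. The numbers below are copied from Table 1; the variable names are illustrative.

```python
# (goal, hits, count) per conclusion, copied from Table 1.
TABLE_1 = {
    "C1": (5, 2, 60), "C2": (2, 1, 166), "C3": (4, 1, 36), "C4": (14, 0, 49),
    "C5": (2, 1, 28), "C6": (2, 0, 33), "C7": (2, 1, 333), "C8": (8, 3, 140),
    "C9": (2, 1, 89), "C10": (5, 3, 1628), "C11": (5, 3, 281), "C12": (3, 0, 82),
    "C13": (5, 5, 9911), "C14": (3, 1, 72), "C15": (2, 0, 372), "C16": (2, 1, 324),
}

n = len(TABLE_1)
total_goal = sum(goal for goal, _, _ in TABLE_1.values())
total_hits = sum(hits for _, hits, _ in TABLE_1.values())
total_count = sum(count for _, _, count in TABLE_1.values())
with_hits = sum(1 for _, hits, _ in TABLE_1.values() if hits > 0)

# Pooled ratio: total hits over total goal items (the "Average" % in Table 1).
pooled = total_hits / total_goal
# Unweighted mean of the per-conclusion percentages, for comparison.
macro = sum(hits / goal for goal, hits, _ in TABLE_1.values()) / n

print(total_goal, total_hits, total_count)  # 66 23 13604
print(with_hits)                            # 12
print(f"{total_count / n:.0f}")             # 850
print(f"{pooled:.0%} {macro:.0%}")          # 35% 38%
```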

5     Description and results of the semantic-based methods
In the semantic-based method, BioMed Xplorer [15] is searched using the semantic
concepts and relations extracted from the text of the guideline. The same 16
conclusions from the guideline form the basis for this method, but here we aim
to find suitable matching criteria for discovering relevant evidence in BioMed
Xplorer, using the extracted semantics instead of terms and keywords. We
introduce two semantic-based search methods, namely a concept-based method and a
relation-based method, and measure the results of both against the gold
standard.
    In our first semantic-based approach, aiming to optimize the identification
of relevant evidence items for updating the guideline, we construct queries for
BioMed Xplorer based on concepts generated from the same keywords as those used
in the text-based method. We use the MetaMap tool [3] to map the keywords to
their corresponding concepts, and formulate the queries in SPARQL. This
concept-based search approach and the corresponding experiment are described
further in Section 5.1. In our second semantic-based approach, the queries
constructed for BioMed Xplorer are based on the relations that we extract from
the guideline text related to the 16 conclusions, e.g. the abstracts. We use the
SemRep tool [14] to automate the extraction of relations, and construct SPARQL
queries to BioMed Xplorer from their conjunction. However, a single abstract
typically covers several conclusions at the same time, so the results must be
evaluated against the hits of the corresponding group of conclusions rather than
per conclusion. This method and its two experiments are described further in
Section 5.2.

5.1     Detecting new evidences using concepts extracted from
        keywords out of the conclusions and headers
In this semantic-based method, BioMed Xplorer [15] queries are built around
concepts that may appear as either the subject or the object of a triple. The
approach consists of the following steps:
 1. Starting with the 25 keywords related to the 16 conclusions, as identified in
    Section 4, we use the MetaMap tool [3] to map each keyword to its set of
    related UMLS concepts.
 2. For each generated concept, we formulate a SPARQL query as the conjunction
    of the identified relationships, addressing the RDF triples in which the
    concept is either the subject or the object.
 3. BioMed Xplorer is then queried to search for annotated publications as
    evidence related to each conclusion.
An example of MetaMap output for the keyword "mastectomy" is:
      C0024881:Mastectomy
      C0191849:Mastectomy (Excision of breast tissue)
      C0024886:Mastectomy (Simple mastectomy)
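Steps 2 and 3 can be illustrated with a SPARQL query builder. BioMed Xplorer's RDF vocabulary is not described here, so the namespace and the properties `ex:subject`, `ex:object` and `ex:pmid` below are hypothetical placeholders for however the triples and their PubMed IDs are modelled. The concepts of one keyword, such as the three returned for mastectomy, are combined with a `VALUES` block, and each concept may match either the subject or the object position via `UNION`.

```python
import re

# Hypothetical namespace; the real BioMed Xplorer vocabulary may differ.
EX = "http://example.org/biomedxplorer#"
CUI_PATTERN = re.compile(r"C\d{7}")  # UMLS concept unique identifier, e.g. C0024881


def concept_query(cuis: list[str]) -> str:
    """SPARQL query for papers whose statements mention any of the given concepts.

    The concepts of one keyword are combined as a union (VALUES), and each
    concept may occur as either the subject or the object of a statement.
    """
    for cui in cuis:
        if not CUI_PATTERN.fullmatch(cui):
            raise ValueError(f"not a UMLS CUI: {cui!r}")
    values = " ".join(f"ex:{cui}" for cui in cuis)
    return f"""PREFIX ex: <{EX}>
SELECT DISTINCT ?pmid WHERE {{
  VALUES ?concept {{ {values} }}
  {{ ?stmt ex:subject ?concept . }} UNION {{ ?stmt ex:object ?concept . }}
  ?stmt ex:pmid ?pmid .
}}"""


# The three concepts MetaMap returned for "mastectomy".
print(concept_query(["C0024881", "C0191849", "C0024886"]))
```

`SELECT DISTINCT` collapses a paper that matches several concepts or both positions into a single result, which keeps count comparable to the text-based method.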

    When a keyword is mapped to multiple concepts, as shown above for the keyword
"mastectomy", the union of all these concepts is used to formulate the query. We
compare our results against the corresponding evidence items for each conclusion
in the 2012 guideline, calculating the score of the query for each concept in
terms of the number of hits. This procedure is repeated for every concept
generated from each of the 25 keywords extracted from the 16 conclusions of the
2004 guideline. Table 2 reports the results of this experiment. Note that the
highlighted cells of the table indicate from which conclusion a keyword has been
extracted. For example, the keyword "excision" has been extracted from
conclusion 5 (C5), and the search for "excision" through BioMed Xplorer
discovered 1 of the 2 evidence items in the gold standard for this conclusion.
Furthermore, the search for "excision" also discovered 1 relevant evidence item
for each of the conclusions C1, C3, C8, and C16. Overall, this keyword retrieved
5 papers referenced within the 16 conclusions (i.e. RH = 5). These additional
hits indicate that this semantic-based approach can benefit from concepts in
other conclusions of the guideline when identifying the needed evidence. We use
this observation later, in Section 5.2.
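The per-conclusion hits and the RH value for one keyword can be derived from the same result set. A minimal sketch, assuming the gold standard is available as a mapping from conclusion to PubMed IDs; the function names and identifiers are hypothetical. Because RH counts distinct papers, it can be smaller than the sum of the per-conclusion hits when one paper is cited by two conclusions (for "excision" in Table 2 the two happen to coincide at 5).

```python
def hits_per_conclusion(returned: set[str], gold: dict[str, set[str]]) -> dict[str, int]:
    """Hits per conclusion: returned papers that are in that conclusion's gold standard."""
    counts = {c: len(returned & refs) for c, refs in gold.items()}
    return {c: n for c, n in counts.items() if n > 0}  # keep non-empty cells only


def relevant_hits(returned: set[str], gold: dict[str, set[str]]) -> int:
    """RH: distinct returned papers referenced by at least one conclusion."""
    all_refs = set().union(*gold.values())
    return len(returned & all_refs)


# Hypothetical gold standard and result set for one keyword.
gold = {
    "C1": {"p1", "p2"},
    "C3": {"p3"},
    "C5": {"p4", "p5"},
    "C8": {"p6"},
    "C16": {"p1", "p7"},  # p1 is cited by both C1 and C16
}
returned = {"p1", "p3", "p4", "p6", "p9"}

print(hits_per_conclusion(returned, gold))
# {'C1': 1, 'C3': 1, 'C5': 1, 'C8': 1, 'C16': 1}
print(relevant_hits(returned, gold))  # 4
```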
    Although the results in Table 2 suggest that some of the keywords selected
for the text-based method are very suitable for a semantic-based method (e.g.
"dcis" for C1, which has 100% recall), they also show that 12 of these 25
keywords were not suitable for retrieving any of the goal publications. This is
even more visible when noticing that, of the 12 unsuitable keywords, three have
not only zero hits but also a count of zero. Another interesting finding is that
the keywords "resection" and "excision", although treated as two different
entities in the text-based method, are mapped to a common concept in this
semantic-based method and are therefore treated as one entity (hence providing
the same results).

5.2     Detecting new evidences using extracted relations from the
        conclusions and abstracts in the guideline
In this semantic-based method, BioMed Xplorer queries are generated in the form
of triples, each representing a medical statement, extracted first from the
guideline conclusions and second from the abstracts corresponding to the
conclusions. Constructing this type of query involves the following steps:
 1. We use the SemRep tool [14] to extract medical statements from the text of
    the guideline conclusion (formulation from 2004).
 2. We transform the extracted medical statements into RDF triples.
 3. BioMed Xplorer is queried to find the annotated publications for each of the
    queries.
Example relations extracted by SemRep from conclusion number 12 (C12):
      Mammaplasty (C0085076) TREATS Woman (C0043210)
      Reconstructive Surgical Procedures (C0524865) TREATS Woman (C0043210)
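Steps 2 and 3 can be sketched by parsing relations written in the display format above and turning each one into a SPARQL triple pattern. The regular expression matches only this display format (not SemRep's native output), and the namespace and property names are hypothetical placeholders for BioMed Xplorer's vocabulary; predicates such as PART OF are written with an underscore because SPARQL local names cannot contain spaces. `parse_relation` and `relation_query` are illustrative names.

```python
import re
from typing import NamedTuple

EX = "http://example.org/biomedxplorer#"  # hypothetical namespace

# Matches lines such as "Mammaplasty (C0085076) TREATS Woman (C0043210)".
RELATION_LINE = re.compile(
    r"(?P<s_name>.+?) \((?P<s_cui>C\d{7})\) "
    r"(?P<pred>[A-Z_]+(?: OF)?) "
    r"(?P<o_name>.+?) \((?P<o_cui>C\d{7})\)"
)


class Relation(NamedTuple):
    subject: str    # UMLS CUI
    predicate: str  # SemRep predicate, e.g. TREATS
    obj: str        # UMLS CUI


def parse_relation(line: str) -> Relation:
    m = RELATION_LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised relation: {line!r}")
    # "PART OF" -> "PART_OF", so the predicate is a valid SPARQL local name.
    return Relation(m["s_cui"], m["pred"].replace(" ", "_"), m["o_cui"])


def relation_query(rel: Relation) -> str:
    """SPARQL query for papers annotated with exactly this triple."""
    return f"""PREFIX ex: <{EX}>
SELECT DISTINCT ?pmid WHERE {{
  ?stmt ex:subject ex:{rel.subject} ;
        ex:predicate ex:{rel.predicate} ;
        ex:object ex:{rel.obj} ;
        ex:pmid ?pmid .
}}"""


for line in [
    "Mammaplasty (C0085076) TREATS Woman (C0043210)",
    "Reconstructive Surgical Procedures (C0524865) TREATS Woman (C0043210)",
]:
    print(relation_query(parse_relation(line)))
```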

Keyword       RH C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 Count
dcis           10 5 2 2                                  1               3615
local excision 6 2 1 1              1                        1           3063
radiotherapy 0                                                            105
recurrence     0                                                         3938
margin of excision 0                                                     4795
breast         16 4 1 2 2           1    1 2     2       1   1   1      62069
bct            1                          0                               524
irradiation    0                                                          103
survival rate 0                                                           360
boost          0                                                           0
resection      5 1         1 1         1                             1 149902
excision       5 1         1 1         1                             1 149902
age            2                    1                                   43288
primary        0                                                          587
mastectomy 23 1         1 4            1 1 2     3   1   3   3       1 5357
survival       0                                                           8
systemic therapy 2                    1        1                         6097
reconstruction 11                                     1   4   1 2     2 37261
breast cancer 36          1 9 2 1        5 2 3     2   2   2   1          107960
skin-sparing mastectomy 0                                                     0
autologous     0                                                           0
silicone       2                                                 1       4562
breast reconstruction 9                                      2   2   2       1 2573
complications 3                                                      2 154062
local          0                                                          391
Goal             5 2 4 14 2 2 2 8 2 5                  5   3   5   3   2 2
Hits             5 2 2 0 1 0 0 0 0 2                   3   2   4   2   1 2
%               100 100 50 0 50 0 0 0 0 40 60 67 80 67 50 100
          Table 2. Results of extracted concepts from keywords method.


These results are then compared against the evidence items for the conclusions
in the guideline, giving us the score of each query in terms of the number of
hits. We repeat this procedure for each of the 16 conclusions from the 2004
guideline.
    The results of this experiment are reported in Table 3. Note that only the
conclusions for which SemRep managed to extract relationships are shown in the
table. Table 3 suggests that although this method extracts relationships for
only 6 of the 16 conclusions, where it does so it achieves comparable, and in
some cases better, precision than the text-based method. It is also interesting
to point out that, based on the RH results, one can conclude that the "TREATS"
and "USES" relations are the most suitable statements for retrieving the proper
literature, as the other relationship types have an RH of zero in all cases. We
then used the lessons learned in this experiment to improve our relation-based
approach, as follows.

  Conclusion  Goal  Extracted relationship                                   Hit  RH     %  Count
  C2             2  Excision TREATS Noninfiltrating Intraductal Carcinoma      0   1   50%     60
  C3             4  Adjuvant therapy USES Tamoxifen                            0   1   25%    339
                    Breast LOCATION OF Tamoxifen                               0   0    0%     37
                    Therapeutic procedure TREATS Neoplasm                      0   1   25%   3799
  C4            14  Modified radical mastectomy PRECEDES Radiation therapy     0   0    0%      8
  C9             2  Primary Carcinoma PART OF Breast                           0   0    0%   1425
  C12            3  Mammaplasty TREATS Woman                                   1   2   67%    179
                    Reconstructive Surgical Procedures TREATS Woman            0   0    0%    251
  C14            3  Reconstructive Surgical Procedures METHOD OF Mastectomy    0   0    0%     32
  Table 3. Results of extracted relations from conclusion text.

    So far, the queries are in the form of triples and represent medical
statements/relationships extracted only from the guideline conclusions. To
enhance this approach, instead of only using the guideline conclusions, we also
use the abstracts provided for each set of conclusions as input for extracting
the triples. For the 16 conclusions, the guideline contains 7 abstracts that we
can use to extract more suitable relations, together covering all 16
conclusions. Also, based on the results in Table 3, we decided to focus only on
"TREATS" and "USES" statements for our retrieval task. The query is therefore
constructed as follows:
 1. We use the SemRep tool [14] to extract medical statements from the abstracts
    that accompany the guideline conclusions and from the conclusions themselves
    (formulation from 2004).
 2. From the resulting medical statements, we select only those that have
    "TREATS" or "USES" as their predicate.
 3. We transform the filtered set of medical statements into RDF triples.
 4. BioMed Xplorer is queried to find the annotated publications for each of the
    queries.
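Step 2, and the group-level evaluation mentioned at the start of Section 5, can be sketched as follows. How hits were aggregated per abstract is not spelled out in the text; this sketch assumes an abstract's goal is the union of the gold-standard evidence of the conclusions it covers, and that its hit value counts distinct gold papers found by any of its relations. Concept names stand in for CUIs to keep the example readable, and the function names and PubMed IDs are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str
    predicate: str
    obj: str


KEPT_PREDICATES = {"TREATS", "USES"}  # chosen on the basis of Table 3


def filter_relations(relations: list[Relation]) -> list[Relation]:
    """Step 2: keep TREATS/USES statements, dropping duplicates but keeping order."""
    kept = [r for r in relations if r.predicate in KEPT_PREDICATES]
    return list(dict.fromkeys(kept))


def group_goal(gold: dict[str, set[str]], conclusions: list[str]) -> set[str]:
    """Assumed goal of an abstract: union of the evidence of the conclusions it covers."""
    return set().union(*(gold[c] for c in conclusions))


def group_hit(results: dict[Relation, set[str]], goal: set[str]) -> int:
    """Assumed hit value of an abstract: distinct goal papers found by any relation."""
    found = set().union(*results.values())
    return len(found & goal)


extracted = [
    Relation("Mammaplasty", "TREATS", "Woman"),
    Relation("Breast", "LOCATION_OF", "Tamoxifen"),
    Relation("Mammaplasty", "USES", "Prosthesis"),
    Relation("Mammaplasty", "TREATS", "Woman"),  # duplicate from another sentence
]
queries = filter_relations(extracted)

gold = {"C1": {"p1", "p2"}, "C2": {"p3"}, "C3": {"p2", "p4"}}
goal = group_goal(gold, ["C1", "C2", "C3"])
results = {queries[0]: {"p1", "p9"}, queries[1]: {"p1", "p3"}}
print(len(queries), len(goal), group_hit(results, goal))  # 2 4 2
```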
    When evaluating the results of this experiment in Table 4, the overall hit
rate is lower than that of the text-based approach (26% compared to 35%).
However, comparing abstracts A1 (for conclusions 1-3) and A2 (for conclusions
4-9) in Table 4 indicates that the longer the text considered for each
conclusion, the better the results of our approach. This suggests that, given a
larger input, this approach could further improve the quality of its results and
improve over the text-based approach, even after removing the problematic
relations that have a count of more than 1000.
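The percentages in Table 4 follow from its goal and hit columns. The 26% overall figure is reproduced if the 17 hits in Table 4 are divided by all 66 gold-standard items from Table 1, i.e. if conclusions C13 and C14, which have no row in Table 4, are counted as zero hits; dividing by only the 58 goal items of the listed abstracts would give 29%. This reading of the numbers is an inference, not something the text states.

```python
# (goal, hit) per abstract, copied from Table 4.
TABLE_4 = {
    "A1": (11, 6), "A2": (30, 4), "A3": (10, 6),
    "A4": (3, 1), "A6": (2, 0), "A7": (2, 0),
}
# Total goal over all 16 conclusions, from Table 1.
TOTAL_GOAL_ALL = 66

for abstract, (goal, hit) in TABLE_4.items():
    print(f"{abstract}: {hit / goal:.0%}")  # one line each: A1: 55%, A2: 13%, A3: 60%, ...

total_hit = sum(hit for _, hit in TABLE_4.values())
listed_goal = sum(goal for goal, _ in TABLE_4.values())
print(total_hit, listed_goal)                 # 17 58
print(f"{total_hit / TOTAL_GOAL_ALL:.0%}")    # 26%
print(f"{total_hit / listed_goal:.0%}")       # 29%
```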

  Abstract  Conclusions  Goal  Hit     %
      Extracted relationship                                              Hit  RH  Count
  A1        C1-C3          11    6   55%
      Excision TREATS Noninfiltrating Intraductal Carcinoma                 1   1     60
      Adjuvant therapy USES Tamoxifen                                       0   1    339
      Therapeutic procedure TREATS Neoplasm                                 0   1   3799
      Radiation therapy TREATS Noninfiltrating Intraductal Carcinoma        3   4    145
      Therapeutic procedure TREATS Invasive Carcinoma                       0   0     59
      Therapeutic procedure TREATS Noninfiltrating Intraductal Carcinoma    2   3    156
  A2        C4-C9          30    4   13%
      Pharmacotherapy TREATS Woman                                          0   1    802
      Pharmacotherapy TREATS Malignant neoplasm of breast                   1   2   2420
      Reexcision TREATS Neoplasm                                            0   0     17
      Radiation therapy TREATS Woman                                        3  10    664
  A3        C10-C11        10    6   60%
      Clinical Research USES Clinical Trials, Phase II                      0   0     50
      Modified radical mastectomy TREATS Indicated                          0   0      0
      Operative Surgical Procedures TREATS Male population group            0   0    525
      Operative Surgical Procedures TREATS Malignant neoplasm of breast     0   4   2198
      Modified radical mastectomy TREATS Neoplasm                           0   0     10
      Chemotherapy, Adjuvant TREATS Patients                                1   1   2178
      Modified radical mastectomy TREATS Patients                           1   1    297
      Radiation therapy TREATS Patients                                     5  11  19081
  A4        C12             3    1   33%
      Mammaplasty TREATS Woman                                              0   2    179
      Reconstructive Surgical Procedures TREATS Woman                       0   0    251
      Mammaplasty TREATS Patients                                           1   3    710
  A6        C15             2    0    0%
      Mammaplasty USES Prosthesis                                           0   1     64
      Implantation procedure USES Silicones                                 0   0     43
      Prosthesis USES Silicones                                             0   1    355
  A7        C16             2    0    0%
      Radiation therapy TREATS Patients                                     0  11  19081
      Reconstructive Surgical Procedures TREATS Woman                       0   0    251
      Radiation therapy TREATS Complication                                 0   1    127
      Mammaplasty USES Prosthesis                                           0   1     64
  Table 4. Results of extracted relations from abstract text.


6    Interpretation of the results and Conclusion
In this paper, we have reported on two kinds of semantic experiments for document
retrieval in the medical domain, namely a concept-based method and a

relation-based method. The results of those experiments are compared against
a keyword-based experiment with the same task, the same test-case, the same
gold standard, and the same corpus.
     From the first experiment (concept-based, Section 5.1), in which we consider
the concepts generated from a keyword that may appear either as the subject or
the object of a triple, we can see that the method finds goal evidence for 11 of
the 16 conclusions, compared with 12 of 16 for the text-based method. The
precision of the concept-based approach is also still quite low compared with
the text-based method, as its counts are still quite large. This could be
because the concept-based method uses UMLS as its concept ontology, whereas
PubMed uses the MeSH ontology for indexing. As MeSH is covered by UMLS, PubMed
may effectively perform a concept-based search, based on MeSH, using the
keywords provided by our text-based method; the text-based approach would then
return results similar to those of our concept-based method. Investigating the
validity of this educated guess is planned as future work.
     The second (relation-based) experiment, described in Section 5.2, gives
good results. In this method, where we extract both the concepts and the
relations directly from the text of the guideline, the counts are much smaller
than those of the concept-based method. Consequently, the relation-based method
achieves much higher precision than the concept-based method, although its
precision is still lower than that of the text-based method. However, it performs
much better at goal-evidence discovery: the relation-based method finds the goal
evidence for 14 of the 16 conclusions, compared with 12 of 16 for the text-based
method. Furthermore, the total number of hits in the concept-based method (26
hits) is larger than the total number of hits in the text-based method (23
hits). In future work, we would like to investigate which characteristics of the
setting (e.g. the guideline) make one of these methods more suitable than the
other.
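The goal-evidence results reported above can be restated as the fraction of the 16 conclusions covered by each method. The following Python sketch only does this arithmetic. The names `coverage`, `covered` and `TOTAL_CONCLUSIONS` are illustrative choices of ours, not part of the tools used in the experiments. The three counts are the ones stated in this section.

```python
from fractions import Fraction

# Number of guideline conclusions in the evaluation, as stated in the text.
TOTAL_CONCLUSIONS = 16

# Conclusions whose goal evidence each method retrieved (figures from Section 6).
covered = {
    "text-based": 12,
    "concept-based": 11,
    "relation-based": 14,
}


def coverage(found: int, total: int = TOTAL_CONCLUSIONS) -> Fraction:
    """Return the exact fraction of conclusions whose goal evidence was found."""
    if not 0 <= found <= total:
        raise ValueError("found must lie between 0 and total")
    return Fraction(found, total)


if __name__ == "__main__":
    # Print the methods from highest to lowest coverage.
    for method, found in sorted(covered.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{method:15s} {found}/{TOTAL_CONCLUSIONS} = {float(coverage(found)):.4f}")
```

Running it lists relation-based (14/16 = 0.8750), then text-based (12/16 = 0.7500), then concept-based (11/16 = 0.6875). Exact fractions are used so that the comparison is not affected by floating-point rounding.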
     To evaluate the results of the text-based method, we previously invited three
medical professionals from the MAASTRO clinic in the Netherlands to score the
guideline update tool with respect to properties such as functionality,
efficiency, usability, reliability and quality of use [7]. We plan to perform a
similar evaluation of the results of the semantics-based methods in future work.


References
 1. Salah Ait-Mokhtar, Berry De Bruijn, Caroline Hagege, and Pajolma Rupi. Initial
    prototype for relation identification between concepts, D3.2. Technical report,
    EURECA Project, 2013.
 2. Salah Aït-Mokhtar, Jean-Pierre Chanod, and Claude Roux. Robustness beyond
    shallowness: incremental deep parsing. Natural Language Engineering,
    8(2):121–144, 2002.
 3. AR Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: the
    MetaMap program. In Proceedings of AMIA Symposium, pages 17–21, 2001.
 4. F Belleau, MA Nolin, N Tourigny, P Rigault, and J Morissette. Bio2RDF: towards
    a mashup to build bioinformatics knowledge systems. Journal of Biomedical
    Informatics, 41(5):706–716, 2008.
 5. Julio Gonzalo, Hang Li, Alessandro Moschitti, and Jun Xu. SIGIR 2014 workshop
    on semantic matching in information retrieval. In Proceedings of the 37th
    International ACM SIGIR Conference on Research & Development in Information
    Retrieval, SIGIR ’14, page 1296, New York, NY, USA, 2014. ACM.
 6. Qing Hu, Zhisheng Huang, Annette ten Teije, and Frank van Harmelen. Detecting
    new evidence for evidence-based guidelines using a semantic distance method. In
    Proceedings of the 15th Conference on Artificial Intelligence in Medicine (AIME
    2015), 2015.
 7. Qing Hu, Zhisheng Huang, Annette ten Teije, Frank van Harmelen, M Scott Mar-
    shall, and Andre Dekker. A topic-centric approach to detecting new evidences for
    evidence-based medical guidelines. In Proceedings of the 9th International Joint
    Conference on Biomedical Engineering Systems and Technologies (HEALTHINF 2016),
    2016.
 8. H Kilicoglu, D Shin, M Fiszman, G Rosemblat, and TC Rindflesch. SemMedDB:
    a PubMed-scale repository of biomedical semantic predications. Bioinformatics,
    28(23):3158–3160, 2012.
 9. Karen E. Lochbaum and Lynn A. Streeter. Comparing and combining the ef-
    fectiveness of latent semantic indexing and the ordinary vector space model for
    information retrieval. Information Processing & Management, 25(6):665–676,
    1989.
10. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction
    to Information Retrieval. Cambridge University Press, 2008.
11. V. Momtchev. Expanding the pathway and interaction knowledge in linked life
    data. In International Semantic Web Challenge, 2009.
12. NABON. Guideline for the treatment of breast carcinoma 2004. Technical report,
    Nationaal Borstkanker Overleg Nederland (NABON), 2004.
13. NABON. Breast cancer, Dutch guideline, version 2.0. Technical report, Integraal
    Kankercentrum Nederland, Nationaal Borstkanker Overleg Nederland, 2012.
14. T.C. Rindflesch and M Fiszman. The interaction of domain knowledge and linguis-
    tic structure in natural language processing: interpreting hypernymic propositions
    in biomedical text. Journal of Biomedical Informatics, 36(6):462–477, 2003.
15. Mohammad Shafahi, Hayo Bart, and Hamideh Afsarmanesh. BioMed Xplorer:
    exploring (bio)medical knowledge using linked data. In Proceedings of the 9th In-
    ternational Joint Conference on Biomedical Engineering Systems and Technologies
    (BIOINFORMATICS 2016), 2016.
16. Steven Woolf, Richard Grol, Allen Hutchinson, Martin Eccles, and Jeremy
    Grimshaw. Clinical guidelines: potential benefits, limitations, and harms of
    clinical guidelines. BMJ, 318(7182):527–530, 1999.