<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UIC/OHSU CLEF 2018 Task 2 Diagnostic Test Accuracy Ranking using Publication Type Cluster Similarity Measures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aaron M. Cohen</string-name>
          <email>cohenaa@ohsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Neil R. Smalheiser</string-name>
          <email>neils@uic.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Oregon Health &amp; Science University</institution>
          ,
          <addr-line>Portland, Oregon</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Illinois College of Medicine</institution>
          ,
          <addr-line>Chicago, Illinois</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The CLEF 2018 Task 2 goal was to identify and rank retrieved articles relevant to conducting a systematic diagnostic test accuracy review on a given topic. The UIC/OHSU team did not attempt to rank retrieved articles by relevance directly, but rather explored the baseline value of ranking retrieved articles according to the probability that they are concerned with diagnostic test accuracy. First, a set of six publication type clusters, including a cluster of diagnostic test accuracy papers (DTAs), was built by searching PubMed from 1987-2015. We created several types of cluster similarity measures for each publication type. Similarity types included: implicit term similarity, most important word similarity, journal similarity, and author count similarity. These similarity features were then used with weighted and un-weighted linear SVM machine learning algorithms, which were trained with a data set retrieved from PubMed searches consisting of 3481 PMIDs likely to be DTAs, and 71684 PMIDs, most of which are not likely to be DTAs. The trained models produce scores predicting the probability that an individual article is a DTA. The CLEF 2018 Task 2 Test PMIDs for each topic were scored and ranked, and the cutoff probability for each of the two models was determined by visual inspection of the score distribution on the test data. Cutoff probabilities chosen were 0.20 for the unweighted SVM model and 0.40 for the weighted SVM model.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>Publication Types</kwd>
        <kwd>Diagnostic Test Accuracy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        We participated in Task 2 of the CLEF 2018 e-Health challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The goal of this task
was to identify and rank articles relevant to conducting a systematic diagnostic test accuracy
review on a given topic, among those articles returned by topic-specific PubMed queries.
      </p>
      <table-wrap id="fig1">
        <label>Figure 1</label>
        <caption>
          <p>PubMed query used to retrieve likely diagnostic test accuracy (DTA) articles published 1987-2015.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Search</th>
              <th>Query</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>#8</td>
              <td>Search #7 AND #5 NOT #6</td>
            </tr>
            <tr>
              <td>#7</td>
              <td>Search "diagnostic test accuracy"[ti] OR "diagnostic accuracy"[ti]</td>
            </tr>
            <tr>
              <td>#6</td>
              <td>Search editorial[pt] OR letter[pt] OR "practice guideline"[pt] OR review[pt]</td>
            </tr>
            <tr>
              <td>#5</td>
              <td>Search #1 AND #2 AND #3 AND #4</td>
            </tr>
            <tr>
              <td>#4</td>
              <td>Search humans[MeSH Terms]</td>
            </tr>
            <tr>
              <td>#3</td>
              <td>Search hasabstract</td>
            </tr>
            <tr>
              <td>#2</td>
              <td>Search "english"[Language] OR "english abstract"[Publication Type]</td>
            </tr>
            <tr>
              <td>#1</td>
              <td>Search "1987/1/1"[Date - Publication] : "2015/12/31"[Date - Publication]</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        We have been extending our prior work on probability-based tagging for specific
publication types [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] by developing a general system to predict probabilities for multiple
publication types simultaneously [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We applied a preliminary version of that system to six
clinical publication types, reporting here only on DTA publications.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>The UIC/OHSU CLEF 2018 Task 2 submission applies a machine learning approach to
ranking the PMIDs retrieved by CLEF for 20 topics. The approach assigns probabilities to
individual PMIDs based on the likelihood that they are DTAs. To generate positive training
data, likely DTAs were retrieved using the PubMed query shown in Figure 1. No specific
information about the topic queries that generated the PMID list for each topic was used.
The system builds a predictive model in stages. First, publication type clusters, including
diagnostic test accuracy papers (DTAs), were built by searching PubMed from 1987-2015.
Six publication type (PT) clusters were used in this model: DTAs, Randomized Controlled
Trials, Cross-sectional Studies, Cross-over Studies, Cohort Studies, and Case-Control
Studies. These clusters were used as training data to create several types of cluster similarity
measures for each publication type. The PT clusters are treated as consensus profiles that
represent the PT as a whole, so any given article is judged to belong to a cluster if it is sufficiently
similar to that cluster according to its weighted sum of similarity features. While the members of each cluster are
very likely to be examples of the cluster specific publication type, nothing in the method
requires all the articles in a cluster to be of that publication type. Somewhat noisy training
data is expected.</p>
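      <p>As a rough illustration of this cluster-building stage, the sketch below collects PMIDs for one publication type cluster through Biopython's Entrez interface; the helper name, e-mail placeholder, and the exact query string are illustrative assumptions rather than the tooling actually used.</p>
      <preformat>
# Minimal sketch (assumption): build a publication type cluster by running a
# PubMed search over 1987-2015 and collecting the matching PMIDs.
from Bio import Entrez

Entrez.email = "you@example.org"   # NCBI requires a contact address

def build_cluster(query, retmax=10000):
    """Return the list of PMIDs matching a cluster-defining PubMed query."""
    # esearch returns PMIDs only; larger result sets would need the NCBI
    # history server or date-sliced queries.
    handle = Entrez.esearch(db="pubmed", term=query, retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]

# Illustrative cluster-defining query in the spirit of Figure 1 (search #8).
dta_query = (
    '("diagnostic test accuracy"[ti] OR "diagnostic accuracy"[ti]) '
    'AND ("english"[Language] OR "english abstract"[Publication Type]) '
    'AND hasabstract AND humans[MeSH Terms] '
    'AND ("1987/1/1"[Date - Publication] : "2015/12/31"[Date - Publication]) '
    'NOT (editorial[pt] OR letter[pt] OR "practice guideline"[pt] OR review[pt])'
)
dta_cluster = build_cluster(dta_query)
      </preformat>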
      <p>
        Similarity types used as features included: implicit term similarity, most important word
similarity, journal similarity, and author count similarity. Implicit term similarity measures
how similar a paper is to a cluster based on terms (words, bigrams, etc.) that commonly
occur with words contained in the papers within each cluster relative to the baseline
frequency across MEDLINE. A cluster “centroid-like” vector is computed as the mean vector
of the individual cluster article vectors, where each article vector consists of the 300
weighted terms most strongly associated with the words in the article. The cluster centroid is limited
to the 300 highest total scoring terms across the cluster. See [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for a complete and detailed
description.
      </p>
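      <p>The following is a minimal sketch of how such a centroid-like cluster profile and a similarity score might be computed, assuming each article is already represented as a dictionary of its 300 most associated implicit terms and their weights; the cosine form of the similarity is an illustrative assumption, and the actual formulation is described in [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
      <preformat>
# Minimal sketch (assumption): implicit-term cluster profile and similarity.
# Each article vector is a dict mapping an implicit term to its weight,
# restricted to the 300 terms most associated with the article's words.
import math
from collections import defaultdict

def cluster_centroid(article_vectors, top_k=300):
    """Mean of the article vectors, truncated to the top_k highest-scoring terms."""
    totals = defaultdict(float)
    for vec in article_vectors:
        for term, weight in vec.items():
            totals[term] += weight
    n = float(len(article_vectors))
    centroid = {term: total / n for term, total in totals.items()}
    # Keep only the top_k terms with the largest centroid weights.
    top_terms = sorted(centroid, key=centroid.get, reverse=True)[:top_k]
    return {term: centroid[term] for term in top_terms}

def implicit_term_similarity(article_vec, centroid):
    """Cosine-style similarity between an article vector and a cluster centroid
    (the cosine choice is an assumption for this sketch)."""
    dot = sum(w * centroid.get(term, 0.0) for term, w in article_vec.items())
    norm_a = math.sqrt(sum(w * w for w in article_vec.values()))
    norm_c = math.sqrt(sum(w * w for w in centroid.values()))
    if norm_a == 0.0 or norm_c == 0.0:
        return 0.0
    return dot / (norm_a * norm_c)
      </preformat>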
      <p>
        Most important word similarity measures the fraction of words in the paper that are in the
list of most important words computed for each cluster, as measured by the frequency of
the word occurring in that cluster versus MEDLINE as a whole.[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] Journal similarity
measures how representative an article’s journal is for a cluster, again as measured by the
frequency of the journal occurring in that cluster versus the rest of MEDLINE. A MeSH
based journal distance measure was used for papers published in journals that did not occur
in the cluster to estimate cluster similarity based on the most similar journal in the cluster
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The author count similarity measures how selective the author count of a paper is for a
particular cluster. Note that the criteria used for defining DTAs by PubMed search were
NOT used directly as features in the classification model. Individual publication
MeSH terms were not used directly as features in any of the similarity measures.
The four similarity measures produce one feature for each of the six publication type
clusters, resulting in 24 similarity-based features. These similarity features were then used with
weighted and un-weighted linear SVM machine learning algorithms, which were trained
with a data set retrieved from the 1987-2015 PubMed searches. The DTA cluster was used
as the positive training data set, and the other clusters were combined into the negative training
data set. This resulted in training data consisting of 3481 PMIDs likely to be diagnostic test
accuracy papers (DTAs), and 71684 PMIDs, most of which are not likely to be DTAs.
The trained weighted and un-weighted SVM models were then applied to the CLEF 2018
Task 2 challenge data. The PMIDs supplied in the topic files were used to retrieve the full
PubMed XML record for each article, and the XML records were used to compute the 24
similarity features for input to the trained models.
      </p>
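      <p>As a rough sketch of this training stage, the fragment below fits both an unweighted and an inverse-class-frequency-weighted linear SVM on a 24-column feature matrix (four similarity types for each of the six publication type clusters) and converts the SVM margins into probabilities via calibration; scikit-learn's liblinear-backed LinearSVC, the calibration step, and the synthetic stand-in data are assumptions for illustration, not the exact implementation used.</p>
      <preformat>
# Minimal sketch (assumption): weighted and unweighted liblinear-style SVMs
# trained on 24 similarity features (4 similarity types x 6 PT clusters).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

def train_dta_models(X, y):
    """X: (n_articles, 24) similarity-feature matrix; y: 1 for DTA-cluster
    articles, 0 for articles from the other five clusters. Each SVM is wrapped
    in a calibrator so that its margins become probability scores."""
    unweighted = CalibratedClassifierCV(LinearSVC(max_iter=10000), cv=5)
    weighted = CalibratedClassifierCV(
        LinearSVC(class_weight="balanced", max_iter=10000), cv=5)
    return unweighted.fit(X, y), weighted.fit(X, y)

# Synthetic stand-in for the real 3481-positive / 71684-negative training set.
rng = np.random.default_rng(0)
X = rng.random((2000, 24))              # 24 = 4 similarity types x 6 clusters
y = rng.binomial(1, 0.05, size=2000)    # roughly 1-in-20 positives
unweighted_model, weighted_model = train_dta_models(X, y)

# Probability that each article is a DTA, used later for per-topic ranking.
p_dta = weighted_model.predict_proba(X)[:, 1]
      </preformat>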
      <p>The trained models produce probability scores predicting whether or not an individual
PMID is a DTA. The PMID predictions were then organized according to the CLEF 2018
Task 2 topics, and were ranked within a topic by probability. The cut-off probability for
each of the two models was determined by visual inspection of the score distribution on the
test data. Cutoff probabilities chosen were 0.20 for the unweighted SVM model and 0.40
for the weighted SVM model. This information was combined into the submission qrel files,
rank ordering each topic's publication PMIDs from highest to lowest predicted probability, one file
for each model. In this manner we produced two sets of predictions, submitted as two
separate runs: OHSU_UIC_LIBLINW for the weighted model, and OHSU_UIC_LIBLINB for
the unweighted model.
</p>
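      <p>The following is a minimal sketch of this ranking and output stage, assuming per-PMID probabilities are already available; the helper name and the run-file field layout are illustrative assumptions rather than the exact CLEF submission format.</p>
      <preformat>
# Minimal sketch (assumption): rank each topic's PMIDs by predicted DTA
# probability and write one run file per model, one line per PMID.
def write_run(topic_pmids, probabilities, run_tag, cutoff, path):
    """topic_pmids: dict mapping a topic id to its list of PMIDs;
    probabilities: dict mapping a PMID to its predicted DTA probability."""
    with open(path, "w") as out:
        for topic, pmids in sorted(topic_pmids.items()):
            ranked = sorted(pmids, key=lambda pmid: probabilities[pmid], reverse=True)
            for rank, pmid in enumerate(ranked, start=1):
                score = probabilities[pmid]
                relevant = int(score >= cutoff)   # 0.40 weighted run, 0.20 unweighted run
                out.write(f"{topic} {relevant} {pmid} {rank} {score:.4f} {run_tag}\n")

# Illustrative call for the weighted run; field layout is an assumption.
# write_run(topic_pmids, p_dta_by_pmid, "OHSU_UIC_LIBLINW", 0.40, "liblinw.run")
      </preformat>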
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>
        The official overall evaluation results for our systems are shown in Table 1. Across the
board, the liblinear system with inverse class frequency weighting performed slightly better
than the liblinear with bias version. These results are averages across all the topics. Based
on the similar CLEF 2017 task, these results appear to be roughly median compared to other entries.
The average precision achieved by our liblinear weighted system was 0.180, which
would have ranked 14th out of 33 CLEF 2017 entries [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>Considering that we only ranked articles according to their probability of being a DTA, and
did not evaluate query topic information at all, our approach did have some significant value
in identifying articles that are relevant for inclusion in topic-specific systematic reviews.
We plan to continue working on our system, expanding the number of clusters and
publication types, as well as adding additional cluster similarity measures. While the current
approach uses an SVM in a one-versus-rest approach for multi-class classification, we are also
experimenting with other classifiers that are more flexible for multi-category
classification, such as random forests and deep learning neural networks.
</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Official overall evaluation results averaged across all topics. Weighted run (OHSU_UIC_LIBLINW): 0.860, 0.935, 1.000, average precision 0.180. Unweighted run (OHSU_UIC_LIBLINB): 0.846, 0.926, 1.000, average precision 0.174.</p>
        </caption>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Suominen</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            <given-names>R</given-names>
          </string-name>
          , et al.
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2018</article-title>
          . In:
          <article-title>CLEF 2018 - 8th Conference and Labs of the Evaluation Forum</article-title>
          . CEUR-WS: Springer;
          <year>2018</year>
          .
          <article-title>(Lecture Notes in Computer Science (LNCS))</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kanoulas</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            <given-names>L</given-names>
          </string-name>
          .
          <article-title>CLEF 2018 Technology Assisted Reviews in Empirical Medicine Overview</article-title>
          . In:
          <article-title>CLEF 2018 Evaluation Labs</article-title>
          and Workshop. CEUR-WS;
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cohen</surname>
            <given-names>AM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smalheiser</surname>
            <given-names>NR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonagh</surname>
            <given-names>MS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            <given-names>CE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            <given-names>JM</given-names>
          </string-name>
          , et al.
          <article-title>Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine</article-title>
          .
          <source>J Am Med Inform Assoc JAMIA</source>
          . 2015 May;
          <volume>22</volume>
          (
          <issue>3</issue>
          ):
          <fpage>707</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Smalheiser</surname>
            <given-names>NR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            <given-names>AM</given-names>
          </string-name>
          .
          <article-title>Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database</article-title>
          .
          <source>Data Inf Manag</source>
          .
          <year>2018</year>
          ;
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Smalheiser</surname>
            <given-names>NR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonifield</surname>
            <given-names>G</given-names>
          </string-name>
          .
          <article-title>Unsupervised Low-Dimensional Vector Representations for Words, Phrases and Text that are Transparent, Scalable, and produce Similarity Metrics that are Complementary to Neural Embeddings</article-title>
          .
          <source>arXiv preprint arXiv:1801.01884</source>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Smalheiser</surname>
            <given-names>NR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torvik</surname>
            <given-names>VI</given-names>
          </string-name>
          .
          <article-title>Distribution of “Characteristic” Terms in MEDLINE Literatures</article-title>
          .
          <source>Information</source>
          .
          <year>2011</year>
          ;
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>266</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jennifer</surname>
            <given-names>LD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smalheiser</surname>
            <given-names>NR</given-names>
          </string-name>
          .
          <article-title>Three journal similarity metrics and their application to biomedical journals</article-title>
          .
          <source>PloS One</source>
          .
          <year>2014</year>
          ;
          <volume>9</volume>
          (
          <issue>12</issue>
          ):
          <fpage>e115681</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kanoulas</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            <given-names>R</given-names>
          </string-name>
          .
          <article-title>CLEF 2017 technologically assisted reviews in empirical medicine overview</article-title>
          . In:
          <source>CEUR Workshop Proceedings</source>
          .
          <year>2017</year>
          . p.
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>