<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Toward using ontologies to improve results in searches for mental health information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jonathan Bona</string-name>
          <email>jonathanbona@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Grohol</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meredith Zozus</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Zozus</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathias Brochhausen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Clinical Psychologist in Private Practice; PsyUSA Network Curator</institution>
          ,
          <addr-line>Little Rock</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Biomedical Informatics, University of Arkansas for Medical Sciences</institution>
          ,
          <addr-line>Little Rock</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Psych Central.com</institution>
          ,
          <addr-line>Newburyport, MA</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We have implemented a proof of concept that uses select terms from the Consumer Health Vocabulary and the Human Disease Ontology to annotate and index articles about mental health from PsychCentral.com. This paper presents the approach used and preliminary results, which indicate that processing health-related documents with the use of biomedical ontologies and terminologies can make them easier to find using natural language search queries.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>As part of an ongoing project that aims to improve the
ability of healthcare consumers to find, organize, and access
information about their mental health concerns, we are
developing ontology-driven tools for search, annotation, and
exploration of consumer-oriented health content on the Internet.
The major goal for this project is to improve healthcare
consumers' access to information relevant to their health. This
abstract reports on preliminary work that uses ontological and
terminological resources to index consumer-oriented health
content and make it more easily retrievable with natural
language search queries that would not otherwise yield the same
results. This can also serve to facilitate translation from
consumer language-based queries to material using expert
language, such as medical terminologies.</p>
      <p>To demonstrate the usefulness of existing ontologies and
terminologies for enhancing retrieval of consumer-oriented
health texts in the mental health domain, we have developed
an ontology-based natural language processing and indexing
strategy and a proof of concept for a small domain, and tested
it on a set of curated, consumer-oriented articles retrieved
from Psych Central. Psych Central is the Internet’s largest
and oldest independent mental health social network, with
over 450,000 content pages about mental health.</p>
      <p>We selected the test case of searching for articles with
information about seasonal affective disorder (SAD) among a
curated set of several hundred consumer-oriented documents
about depression. Seasonal affective disorder is cyclical
depression that occurs only during certain times of year, most
commonly in the winter. Some resources, such as the Human
Disease Ontology (DO), treat “winter depression” and
“seasonal affective disorder” as exact synonyms.</p>
      <p>Because seasonal depression can also occur in the
spring or summer, winter depression might be more accurately modeled as a
subclass of seasonal affective disorder. To simplify, we
follow DO in treating these terms as synonyms here.</p>
      <p>
        We also use terms from the OCHV
        <xref ref-type="bibr" rid="ref1">(Amith &amp; Tao, 2016)</xref>
        ,
an OWL version of the Consumer Health Vocabulary (CHV)
(Zeng &amp; Tse, 2006), which is an open access effort to bridge
the gap between healthcare consumers and professionals by
linking everyday health terms to matching technical terms.
Here we use the OCHV term labeled “seasonal affective
disorder” (http://uth.tmc.edu/ontology/ochv#52085), which has the alternate labels
“affective disorder seasonal”,
“depression seasonal”, “seasonal affective disorder (SAD)”,
“seasonal affective disorders”, and “seasonal depression”.
      </p>
    </sec>
    <sec id="sec-2">
      <title>METHODS</title>
      <sec id="sec-2-1">
        <title>Data collection and preparation</title>
        <p>In collaboration with Psych Central, we downloaded 479
articles that appear in the site’s library under Articles on
Depression or under one of a few related categories: bipolar
disorder, antidepressants, seasonal affective disorder, and
postpartum depression. Of these 479 articles, 22 contain one or
more of the exact phrases “seasonal affective disorder”,
“seasonal depression”, or “winter depression”. Of those 22, 14
(63.6%) contain only “seasonal affective disorder” and so would
not be found by exact searches for the other two
terms. A Google search across all Psych Central pages shows
that about 60% of pages containing any of these three terms
contain only the term “seasonal affective disorder”.</p>
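        <p>The counts above can be reproduced with a simple phrase scan. The following is a minimal sketch, not the script used in this work: the function names and the toy corpus are hypothetical, and case-insensitive substring matching is an assumption about how “exact phrase” was applied.</p>
        <preformat preformat-type="code"><![CDATA[
```python
PHRASES = (
    "seasonal affective disorder",
    "seasonal depression",
    "winter depression",
)


def phrases_found(text, phrases=PHRASES):
    """Return the set of phrases that occur (case-insensitively) in text."""
    lowered = text.lower()
    return {p for p in phrases if p in lowered}


def summarize(articles, phrases=PHRASES):
    """Count articles with any phrase, and articles with only the first phrase."""
    any_hit = 0
    only_first = 0
    for text in articles:
        found = phrases_found(text, phrases)
        if found:
            any_hit += 1
            if found == {phrases[0]}:
                only_first += 1
    return any_hit, only_first


# Toy corpus standing in for the downloaded articles (invented sentences).
toy_articles = [
    "Seasonal affective disorder often begins in late autumn.",
    "Winter depression, also called seasonal affective disorder, responds to light.",
    "Bipolar disorder is treated with mood stabilizers.",
]
any_hit, only_first = summarize(toy_articles)

# The share reported in the text: 14 of 22 articles.
share_reported = round(100 * 14 / 22, 1)
```
]]></preformat>
        <p>On the toy corpus, two articles contain at least one phrase and one of them contains only “seasonal affective disorder”; the last line recomputes the reported share of 14 out of 22.</p>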
        <p>We used Python and its BeautifulSoup (Richardson, 2017)
library to download and process these 479 depression-related
articles, writing each page’s content to a plain text file with
most of the structure and formatting removed.</p>
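        <p>As an illustration of the text-extraction step, the sketch below strips markup from an HTML page and keeps only the visible text. It deliberately uses Python’s standard-library html.parser as a stand-in for BeautifulSoup so that it runs without extra dependencies; the class name, function name, and sample page are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the result is plain, unformatted text.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string with formatting removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Winter blues</h1><p>Light <b>therapy</b> can help.</p></body></html>"
)
plain = html_to_text(sample)
```
]]></preformat>
        <p>Joining text chunks with a space is a simplification: it would split a word whose letters are divided across inline tags, which a production extractor would need to handle.</p>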
      </sec>
      <sec id="sec-2-2">
        <title>Named entity recognition and indexing</title>
        <p>Using the selected OCHV and DO terms about seasonal
affective disorder, we built a dictionary-based named entity
recognizer (NER) using Apache’s Java-based OpenNLP
Toolkit (Apache Software Foundation, 2017b). This NER
takes a text file and a dictionary as input and outputs a list of
positions in the text that match any of the terms, along with
the URI(s) of the matching term(s), such as the OCHV term URI
http://uth.tmc.edu/ontology/ochv#52085.</p>
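        <p>The behavior of a dictionary-based recognizer of this kind can be sketched in a few lines of Python. This is not the OpenNLP-based implementation described above, only an analogue under stated assumptions: the OCHV URI is the one given in this paper, while the DO URI is a hypothetical placeholder, and matching is assumed to be case-insensitive on whole words.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

OCHV_SAD = "http://uth.tmc.edu/ontology/ochv#52085"  # URI given in the paper
DO_SAD = "http://example.org/do#seasonal-affective-disorder"  # hypothetical placeholder

# Dictionary: surface label -> URIs of the terms that carry that label.
DICTIONARY = {
    "seasonal affective disorder": {OCHV_SAD, DO_SAD},
    "seasonal depression": {OCHV_SAD},
    "winter depression": {DO_SAD},
}


def find_terms(text, dictionary=DICTIONARY):
    """Return sorted (start, end, label, uris) tuples, one per dictionary match."""
    matches = []
    for label, uris in dictionary.items():
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), label, sorted(uris)))
    return sorted(matches)


hits = find_terms("Winter depression is a form of seasonal depression.")
```
]]></preformat>
        <p>Each hit carries the character offsets of the match and the URIs of the matching term, which mirrors the output described for the recognizer.</p>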
        <p>The NER was run on each of the 479 articles, producing
as output an XML file for each containing both the article’s
text and, as separate metadata fields, the URIs and labels of
any of our SAD terms that the NER identified in the text.</p>
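        <p>One plausible shape for such a per-article file is Solr’s XML update format, in which a document is a list of named fields. The sketch below assumes that format; the field names term_uri and term_label, the function name, and the sample document are hypothetical and are not taken from our configuration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET


def build_solr_doc(doc_id, text, annotations):
    """Build an <add><doc> element holding the text plus term URIs and labels.

    annotations is an iterable of (uri, label) pairs found by the recognizer.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    fields = [("id", doc_id), ("text", text)]
    for uri, label in annotations:
        fields.append(("term_uri", uri))      # hypothetical field name
        fields.append(("term_label", label))  # hypothetical field name
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return add


element = build_solr_doc(
    "article-001",
    "Seasonal affective disorder often begins in late autumn.",
    [("http://uth.tmc.edu/ontology/ochv#52085", "seasonal depression")],
)
xml_string = ET.tostring(element, encoding="unicode")
```
]]></preformat>
        <p>Keeping the term URIs and labels in fields separate from the article text lets the index distinguish words the author wrote from labels contributed by the ontology.</p>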
        <p>These text and metadata files were then indexed using the
open source search platform Apache Solr (Apache Software
Foundation, 2017a). Solr prepares documents for fast
retrieval by parsing their contents and indexing terms that
appear therein. Solr can automatically perform some basic text
processing tasks, but it does not have built-in ontology-based
named entity recognition. Pre-processing documents with
named entity recognition using our SAD terms, and using the
result as document metadata, allows Solr to index those terms
along with those that appear in the source document itself.</p>
        <p>[Figure: processing pipeline. SAD term collection (OCHV, DO) and dictionary construction; Python: download articles, extract text; OpenNLP named entity recognition (NER); indexing content &amp; metadata; querying &amp; retrieval.]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>RESULTS</title>
      <p>The result of this process of document text extraction,
ontology-driven named entity recognition, and
combined indexing of text and ontology term metadata is a set of
articles stored in Solr for very fast retrieval. Each article can be found
using any label of a term that the NER matched in its text. For instance, for a
document that uses only the
phrase “seasonal affective disorder”, our use of the NER tool
prior to Solr indexing ensures that the metadata used to index
that document includes its synonyms “winter depression,”
“seasonal depression,” etc., and thereby allows the
document to be retrieved by queries that use any of these terms.</p>
      <p>We tested the effectiveness of this approach by running queries for
the exact phrases “seasonal affective disorder”, “seasonal
depression”, and “winter depression,” and comparing the
results to similar queries run on a Solr instance that had been
created from the original article files without the named
entity recognition and annotation step. As expected, with the
ontology terms added as metadata, all of the articles that
contain any of these three terms are retrieved by any query that
uses any of these terms. That is, articles containing only
“seasonal depression” can be retrieved by a query that uses
“winter depression” or “seasonal affective disorder” instead.</p>
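      <p>The comparison between the two indexes can be illustrated with a toy example. The sketch below replaces Solr with a linear scan over an in-memory dictionary, so it shows only the retrieval logic and not Solr’s behavior; the documents are invented, and the three phrases are treated as synonyms as in the rest of this paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
SYNONYMS = [
    "seasonal affective disorder",
    "seasonal depression",
    "winter depression",
]


def annotate(text, synonyms=SYNONYMS):
    """Return all synonym labels if any one of them occurs in the text."""
    lowered = text.lower()
    return list(synonyms) if any(s in lowered for s in synonyms) else []


def search(docs, phrase, use_metadata):
    """Return ids of docs whose text (and optionally metadata) contains phrase."""
    phrase = phrase.lower()
    results = []
    for doc_id, text in docs.items():
        searchable = [text.lower()]
        if use_metadata:
            # Metadata is the full synonym list attached by the annotation step.
            searchable.extend(annotate(text))
        if any(phrase in field for field in searchable):
            results.append(doc_id)
    return sorted(results)


# Invented documents standing in for the indexed articles.
docs = {
    "a": "Seasonal depression tends to lift in spring.",
    "b": "Winter depression may respond to light therapy.",
    "c": "Antidepressants can take weeks to work.",
}
baseline = search(docs, "winter depression", use_metadata=False)
enriched = search(docs, "winter depression", use_metadata=True)
```
]]></preformat>
      <p>Without metadata, the query for “winter depression” finds only document b; with the synonym metadata it also finds document a, which uses only “seasonal depression”.</p>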
    </sec>
    <sec id="sec-4">
      <title>DISCUSSION AND FUTURE WORK</title>
      <p>Pre-processing text with dictionary-based named entity
recognition using ontology terms from resources such as DO
and OCHV, and indexing the resulting annotations as
metadata along with the original text using standard
information retrieval tools, makes that content more easily
retrievable using a wider variety of search terms. We expect this
result to generalize beyond our specific test case of seasonal
affective disorder, and beyond mental health. This approach
can help realize the potential of consumer health vocabularies
as a tool to bridge the gap between healthcare consumer
language and technical language used by experts.</p>
      <p>Using more terms, even for this small test case
(e.g., “weather related depression”, “periodic depression”),
would expand the set of non-expert queries that return relevant results.
We have focused here on exact search results, but Solr can
also return partial matches. Storing matched ontology terms
as metadata with documents will also improve those results.</p>
      <p>We will continue this work by using a larger, more general
mental health test case requiring the use of many more
ontology terms. We will also expand the set of documents used to
include more content from Psych Central and other sources.
We will investigate the use of relations between terms other
than exact synonymy to allow, e.g., a search for “mood
disorders” to yield content that mentions “seasonal affective
disorder” even if it doesn’t explicitly mention “mood disorder”.</p>
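      <p>One way such relations could be used is to expand a query term with its descendants in a class hierarchy before searching. The following sketch assumes a small is-a table that is purely illustrative and is not copied from DO or OCHV; the function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Illustrative is-a links (child -> parent); not taken from DO or OCHV.
PARENT = {
    "seasonal affective disorder": "mood disorder",
    "bipolar disorder": "mood disorder",
    "mood disorder": "mental disorder",
}


def descendants(term, parent=PARENT):
    """Return every term that has `term` as a direct or indirect ancestor."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, par in parent.items():
            if par == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def expand_query(term):
    """Return the query term plus all of its descendants, sorted."""
    return sorted({term} | descendants(term))


expanded = expand_query("mood disorder")
```
]]></preformat>
      <p>A search for any label in the expanded list would then reach documents that mention only a more specific disorder.</p>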
      <p>This approach might not easily scale to a very large set of
terms. NER with this set of terms took less time per document
than Solr’s indexing, which is quite fast, but we don’t know
how this will change when working with many more terms.</p>
      <p>The dictionary-based named entity recognition is a very
simple NER approach that works well in this case in part
because of the manual curation that has gone into creating the
ontologies from which our terms were sourced. We will
explore the use of more sophisticated NER.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><surname>Amith</surname>, <given-names>T.</given-names></string-name>, &amp;
          <string-name><surname>Tao</surname>, <given-names>C.</given-names></string-name>
          (<year>2016</year>, January 20).
          <article-title>Ontology of Consumer Health Vocabulary (OCHV)</article-title>.
          Retrieved from https://bioportal.bioontology.org/ontologies/OCHV
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          Apache Software Foundation. (<year>2017a</year>).
          <source>Apache Solr Reference Guide Covering Apache Solr 6.5</source>.
          Retrieved from https://www.apache.org/dyn/closer.lua/lucene/solr/ref-guide/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          Apache Software Foundation. (<year>2017b</year>).
          <source>OpenNLP</source>.
          Retrieved from http://opennlp.apache.org/index
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><surname>Richardson</surname>, <given-names>L.</given-names></string-name>
          (<year>2017</year>).
          <source>Beautiful Soup</source>.
          Retrieved from https://www.crummy.com/software/BeautifulSoup/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><surname>Zeng</surname>, <given-names>Q. T.</given-names></string-name>, &amp;
          <string-name><surname>Tse</surname>, <given-names>T.</given-names></string-name>
          (<year>2006</year>).
          <article-title>Exploring and Developing Consumer Health Vocabularies</article-title>.
          <source>Journal of the American Medical Informatics Association : JAMIA</source>,
          <volume>13</volume>(<issue>1</issue>),
          <fpage>24</fpage>-<lpage>29</lpage>.
          https://doi.org/10.1197/jamia.M1761
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>