Toward using ontologies to improve results in searches for mental health information

Jonathan Bona¹,*, John Grohol², Meredith Zozus¹, Robert Zozus³ and Mathias Brochhausen¹
¹ Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, USA
² Psych Central.com, Newburyport, MA USA
³ Clinical Psychologist in Private Practice; PsyUSA Network Curator, Little Rock, USA
* To whom correspondence should be addressed: jonathanbona@gmail.com

ABSTRACT
We have implemented a proof of concept that uses select terms from the Consumer Health Vocabulary and the Human Disease Ontology to annotate and index articles about mental health from PsychCentral.com. This paper presents the approach used and preliminary results, which indicate that processing health-related documents with the use of biomedical ontologies and terminologies can make them easier to find using natural language search queries.

1 INTRODUCTION
As part of an ongoing project that aims to improve the ability of healthcare consumers to find, organize, and access information about their mental health concerns, we are developing ontology-driven tools for search, annotation, and exploration of consumer-oriented health content on the Internet. The major goal for this project is to improve healthcare consumers' access to information relevant to their health. This abstract reports on preliminary work that uses ontological and terminological resources to index consumer-oriented health content and make it more easily retrievable with natural language search queries that would not otherwise yield the same results. This can also serve to facilitate translation from consumer language-based queries to material using expert language, such as medical terminologies.

To demonstrate the usefulness of existing ontologies and terminologies for enhancing retrieval of consumer-oriented health texts in the mental health domain, we have developed an ontology-based natural language processing and indexing strategy and a proof of concept for a small domain, and tested it on a set of curated, consumer-oriented articles retrieved from Psych Central. Psych Central is the Internet’s largest and oldest independent mental health social network, with over 450,000 content pages about mental health.

We selected the test case of searching for articles with information about seasonal affective disorder (SAD) among a curated set of several hundred consumer-oriented documents about depression. Seasonal affective disorder is cyclical depression that occurs only during certain times of year, most commonly in the winter. Some resources, such as the Human Disease Ontology (DO), treat “winter depression” and “seasonal affective disorder” as exact synonyms¹.

Because seasonal depression can occur in the spring/summer, winter depression might be more accurately modeled as a subclass of seasonal affective disorder. To simplify, we follow DO in treating these terms as synonyms here.

We also use terms from the OCHV (Amith & Tao, 2016), an OWL version of the Consumer Health Vocabulary (CHV) (Zeng & Tse, 2006), which is an open access effort to bridge the gap between healthcare consumers and professionals by linking everyday health terms to matching technical terms. Here we use the OCHV term labeled “seasonal affective disorder”² with alternate labels: “affective disorder seasonal”, “depression seasonal”, “seasonal affective disorder (SAD)”, “seasonal affective disorders”, “seasonal depression”.

¹ http://purl.obolibrary.org/obo/DOID_0060167
² http://uth.tmc.edu/ontology/ochv#52085

2 METHODS

2.1 Data collection and preparation
In collaboration with Psych Central, we downloaded 479 articles that appear in the site’s library under Articles on Depression, or under one of a few related categories: bipolar disorder, antidepressants, seasonal affective disorder, postpartum depression. Of these 479 articles, 22 contain one or more of the exact phrases “seasonal affective disorder”, “seasonal depression”, or “winter depression”. Of those 22, 14 (63.6%) contain only “seasonal affective disorder,” so would not be immediately uncovered by exact searches for the other terms. A Google search across all Psych Central pages shows that about 60% of pages containing any of these three terms actually contain only the term “seasonal affective disorder”.

We used Python and its BeautifulSoup (Richardson, 2017) library to download and process these 479 depression-related articles, writing each page’s content to a plain text file with most of the structure and formatting removed.

2.2 Named entity recognition and indexing
Using the selected OCHV and DO terms about seasonal affective disorder, we built a dictionary-based named entity recognizer (NER) using Apache’s Java-based OpenNLP Toolkit (Apache Software Foundation, 2017b). This NER takes a text file and a dictionary as input and outputs a list of positions in the text that match any of the terms, along with the URI(s) of the matching term(s).

The NER was run on each of the 479 articles, producing as output an XML file for each containing both the article’s text and, as separate metadata fields, the URIs and labels of any of our SAD terms that the NER identified in the text.

These text and metadata files were then indexed using the open source search platform Apache Solr (Apache Software Foundation, 2017a). Solr prepares documents for fast retrieval by parsing their contents and indexing terms that appear therein. Solr can automatically perform some basic text processing tasks, but it does not have built-in ontology-based named entity recognition. Pre-processing documents with named entity recognition using our SAD terms, and using the result as document metadata, allows Solr to index those terms along with those that appear in the source document itself.

[Figure 1: Indexing and searching articles using ontologies. Diagram labels: S.A.D. term collection (OCHV, DO); dictionary construction; Python: download articles, extract text; OpenNLP: named entity recognition (NER); Solr: indexing content & metadata; querying & retrieval.]

3 RESULTS
The result of this process of document text extraction, ontology-driven named entity recognition processing, and combined text and ontology term metadata indexing is a set of articles stored in Solr for very fast retrieval that can be found using any of the terms that correspond to an entity that matches. For instance, for a document that uses only the phrase “seasonal affective disorder”, our use of the NER tool prior to Solr indexing ensures that the metadata used to index that document includes its synonyms “winter depression,” “seasonal depression,” etc., and will thereby allow the document to be retrieved by queries that use any of these terms.

We tested the effectiveness of this by running queries for the exact phrases “seasonal affective disorder”, “seasonal depression”, and “winter depression,” and comparing the results to similar queries run on a Solr instance that had been created from the original article files without the named entity recognition and annotation step. As expected, with the ontology terms added as metadata, all of the articles that contain any of these three terms are retrieved by any query that uses any of these terms. That is, articles containing only “seasonal depression” can be retrieved by a query that uses “winter depression” or “seasonal affective disorder” instead.

4 DISCUSSION AND FUTURE WORK
Pre-processing text with dictionary-based named entity recognition using ontology terms from resources such as DO and OCHV, and indexing the resulting annotations as metadata along with the original text using standard information retrieval tools, makes that content more easily retrievable using a wider variety of search terms. We expect this result to generalize beyond our specific test case of seasonal affective disorder, and beyond mental health. This approach can help realize the potential of consumer health vocabularies as a tool to bridge the gap between healthcare consumer language and technical language used by experts.

Using more terms even for this small test case (e.g. “weather related depression”, “periodic depression”) would expand the possible queries that return relevant results using non-expert language to search for health information. We have focused here on exact search results, but Solr can also return partial matches. Having matched ontology terms as metadata with documents will also improve those results.

We will continue this work by using a larger, more general mental health test case requiring the use of many more ontology terms. We will also expand the set of documents used to include more content from Psych Central and other sources. We will investigate the use of relations between terms other than exact synonymy to allow, e.g., a search for “mood disorders” to yield content that mentions “seasonal affective disorder” even if it doesn’t explicitly mention “mood disorder”.

This approach might not easily scale to a very large set of terms. NER with this set of terms took less time per document than Solr’s indexing, which is quite fast, but we don’t know how this will change when working with many more terms.

The dictionary-based named entity recognition is a very simple NER approach that works well in this case in part because of the manual curation that has gone into creating the ontologies from which our terms were sourced. We will explore the use of more sophisticated NER.

REFERENCES
Amith, T., & Tao, C. (2016, January 20). Ontology of Consumer Health Vocabulary (OCHV). Retrieved from https://bioportal.bioontology.org/ontologies/OCHV
Apache Software Foundation. (2017a). Apache Solr Reference Guide Covering Apache Solr 6.5. Retrieved from https://www.apache.org/dyn/closer.lua/lucene/solr/ref-guide/
Apache Software Foundation. (2017b). OpenNLP. Retrieved from http://opennlp.apache.org/index
Richardson, L. (2017). Beautiful Soup. Retrieved from https://www.crummy.com/software/BeautifulSoup/
Zeng, Q. T., & Tse, T. (2006). Exploring and Developing Consumer Health Vocabularies. Journal of the American Medical Informatics Association: JAMIA, 13(1), 24–29. https://doi.org/10.1197/jamia.M1761
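
The annotate-then-index idea described in Section 2.2 and evaluated in Section 3 can be illustrated with a minimal Python sketch. It is not the implementation used in this work, which relied on OpenNLP for recognition and Solr for indexing; here a regular-expression matcher and an in-memory list stand in for both. The function names (`find_mentions`, `annotate`, `exact_phrase_search`), the record field names, and the decision to merge the DO and OCHV labels into a single synonym set are assumptions made for illustration. The URIs and labels themselves are the ones quoted in the text.

```python
import re

# URIs and labels quoted in the text for the DO and OCHV terms. Merging them
# into one synonym set is an assumption of this sketch.
SAD_URIS = [
    "http://purl.obolibrary.org/obo/DOID_0060167",
    "http://uth.tmc.edu/ontology/ochv#52085",
]
SAD_LABELS = [
    "seasonal affective disorder",
    "winter depression",
    "affective disorder seasonal",
    "depression seasonal",
    "seasonal affective disorder (SAD)",
    "seasonal affective disorders",
    "seasonal depression",
]


def find_mentions(text, labels=SAD_LABELS):
    """Return sorted (start, end, label) tuples for whole-phrase matches."""
    mentions = []
    for label in labels:
        # Lookarounds rather than \b, so labels ending in ")" still match.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append((m.start(), m.end(), label))
    return sorted(mentions)


def annotate(doc_id, text):
    """Build an index record: the original text plus term metadata fields."""
    found = bool(find_mentions(text))
    return {
        "id": doc_id,
        "text": text,
        # Hypothetical field names; a real Solr schema may differ.
        "term_uris": list(SAD_URIS) if found else [],
        "term_labels": list(SAD_LABELS) if found else [],
    }


def exact_phrase_search(records, phrase, use_metadata=True):
    """Return ids of records whose text (or metadata labels) contain phrase."""
    needle = phrase.lower()
    hits = []
    for rec in records:
        fields = [rec["text"]]
        if use_metadata:
            fields.extend(rec["term_labels"])
        if any(needle in field.lower() for field in fields):
            hits.append(rec["id"])
    return hits


if __name__ == "__main__":
    records = [
        annotate("a", "Seasonal affective disorder often begins in late autumn."),
        annotate("b", "Many people report winter depression."),
        annotate("c", "Bipolar disorder is a different condition."),
    ]
    print(exact_phrase_search(records, "seasonal depression", use_metadata=False))
    print(exact_phrase_search(records, "seasonal depression", use_metadata=True))
```

Searching with `use_metadata=False` corresponds to the baseline index built from the article text alone, while `use_metadata=True` corresponds to the index that also contains the synonym labels, so the two calls mirror the comparison reported in the results.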