<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Enabling the Semantic Access of Phenotypic Information in Clinical Letters</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shahad Kudama</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Berlanga</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Henry Houlden</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ernesto Jiménez-Ruiz</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hallgeir Jonvik</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adam Milward</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huw Morris</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mina Ryten</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jake Saklatvala</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael A. Simpson</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicholas Wood</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Genomics England</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>King's College London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universitat Jaume I</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University College London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Over the past 3 years there has been massive growth in genetic testing, both in the scope of testing and in the number of individuals offered it. Targeted sequencing of small genomic regions has been replaced by panel testing, whole exome sequencing and, most recently, whole genome sequencing [<xref ref-type="bibr" rid="ref4">4</xref>]. Furthermore, genetic testing in both research and clinical settings has extended beyond small numbers of selected individuals with very rare, highly defined disorders to cover larger populations. While this information presents new clinical opportunities and opens the way for the development of novel therapies, it also presents major challenges. For clinicians, a key concern is the reliable identification of disease-associated genetic variants against the broader background of variants, present in all human genomes, that are rare but not actually pathogenic. It is likely that, for many rare genetic disorders, obtaining clarity will require a worldwide effort, so the ability to capture and share key clinical information, as well as genetic information, is becoming increasingly important. However, at present there are major challenges with regard to the collection and storage of clinical information, particularly in the context of rare genetic disorders. The process of studying a patient with a possible rare genetic disorder typically involves many different clinical specialists, with no “standard” patient route. During this process, it is very common for patients to be referred from one specialist to another in order to obtain a range of opinions and access different tests. The output of this process is usually a set of clinical letters, which are used both to document the patient's progress and to communicate findings between specialists (e.g. patient history, examination findings, investigation results and clinical impression). 
Since clinical letters are a key source of knowledge, their proper annotation and storage would enable systematic access to the information they contain. Currently, the creation and processing of clinical letters requires the following steps:
          <list list-type="order">
            <list-item><p>Letters are dictated by the clinician using a speech recognition system.</p></list-item>
            <list-item><p>The recording is uploaded to a server.</p></list-item>
            <list-item><p>The voice data is transcribed using another application.</p></list-item>
            <list-item><p>The text is downloaded and checked by qualified personnel.</p></list-item>
            <list-item><p>The letter is tagged manually by a specialist responsible for reading and annotating the interesting terms.</p></list-item>
          </list>
        </p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>This final part of the process requires input from a qualified medical professional
and is often time-consuming. This has led to relatively limited use of clinical letters
and, consequently, the loss of potentially important information. Thus, suitable software
support is required to assist the clinician in annotating and processing
clinical letters, so that this becomes a part of "normal" working practice.</p>
    </sec>
    <sec id="sec-2">
      <title>Proposed solution</title>
      <p>
        In this paper we introduce PHENOTAG, a software prototype for annotating and
storing phenotypic information from clinical letters, with the aim of
improving the accuracy of genetic testing. PHENOTAG has the added innovation of
allowing specialists to visualise the information content of their letters and potentially check
the quality of annotation within their normal workflow. Figure 1 shows an overview
of PHENOTAG’s interface. The annotator underlying PHENOTAG is currently based on
the approaches presented in [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ] and can be used in conjunction with a series of
coordinated vocabularies (e.g., [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), with special emphasis on Human Phenotype Ontology (HPO) concepts [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
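      <p>As a rough illustration of what annotating HPO concepts in a letter involves, the following Python sketch performs a simple longest-match lexicon lookup. It is not the concept-retrieval approach of [<xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>] on which PHENOTAG is based. The lexicon, the <monospace>Annotation</monospace> class and the <monospace>annotate</monospace> function are hypothetical names introduced here for illustration, and the HPO identifiers shown should be checked against the current HPO release.</p>
      <preformat preformat-type="code">
```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon: lower-cased surface form -> HPO identifier.
# Illustrative only; a real system would load labels and synonyms from HPO.
LEXICON = {
    "tremor": "HP:0001337",
    "resting tremor": "HP:0002322",
    "bradykinesia": "HP:0002067",
    "seizure": "HP:0001250",
    "seizures": "HP:0001250",
}


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    text: str    # the mention as written in the letter
    hpo_id: str  # the HPO identifier assigned to the mention


def build_pattern(lexicon):
    # Longer terms come first so that "resting tremor" wins over "tremor".
    terms = sorted(lexicon, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def annotate(letter, lexicon=LEXICON):
    # Return one Annotation per non-overlapping lexicon match, in text order.
    pattern = build_pattern(lexicon)
    return [
        Annotation(m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(letter)
    ]


if __name__ == "__main__":
    letter = "He has a resting tremor and bradykinesia. No seizures reported."
    for annotation in annotate(letter):
        print(annotation)
```
      </preformat>
      <p>Note that this sketch tags "seizures" even though the letter negates it. Handling negation, synonyms and spelling variation is exactly the kind of work that motivates a dedicated annotator and a review step by the clinician.</p>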
      <p>We envisage the following uses for this software:</p>
      <list list-type="order">
        <list-item>
          <p>The annotation and storage of HPO terms contained within clinical letters for the
purposes of populating predefined disease-specific data models for a wide range of
diseases (of the kind being developed by the 100,000 Genomes Project).</p>
        </list-item>
        <list-item>
          <p>The annotation and storage of HPO/other predefined terms contained within clinical
letters for patients consented for research into the genetics of a single disorder (e.g.
Parkinson’s Disease).</p>
        </list-item>
        <list-item>
          <p>The annotation and storage of HPO/other predefined terms contained within clinical
letters in order to assess the information content and quality of clinical
documentation within a clinical genetics service.</p>
        </list-item>
      </list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>42</volume>
          (
          <issue>D1</issue>
          ),
          <fpage>D966</fpage>
          -
          <lpage>D974</lpage>
          (
          <year>Jan 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berlanga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nebot</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiménez</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Semantic annotation of biomedical texts through concept retrieval</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>45</volume>
          ,
          <fpage>247</fpage>
          -
          <lpage>250</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Llavori</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rebholz-Schuhmann</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>First steps in the logic-based assessment of post-composed phenotypic descriptions</article-title>
          . In:
          <source>SWAT4LS</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Katsanis</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katsanis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Molecular genetic testing and the future of clinical genomics</article-title>
          .
          <source>Nat Rev Genet</source>
          <volume>14</volume>
          (
          <issue>6</issue>
          ),
          <fpage>415</fpage>
          -
          <lpage>426</lpage>
          (
          <year>Jun 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Nebot</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berlanga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Exploiting semantic annotations for open information extraction: an experience in the biomedical domain</article-title>
          .
          <source>Knowl. Inf. Syst</source>
          .
          <volume>38</volume>
          (
          <issue>2</issue>
          ),
          <fpage>365</fpage>
          -
          <lpage>389</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>