<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic-Web Access to Patent Annotations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Gaulton</string-name>
          <email>agaulton@ebi.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lee Harland</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Davies</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Papadatos</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonis Loizou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nathan Dedman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniela Digles</string-name>
          <xref ref-type="aff" rid="aff8">8</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stian Soiland-Reyes</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valery Tkachenko</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Senger</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John P. Overington</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nick Lynch</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, VU University of Amsterdam</institution>
          ,
          <addr-line>De Boelelaan 1081a, 1081 HV Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI)</institution>
          ,
          <addr-line>Wellcome Genome Campus, Hinxton, CB10 1SD</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>GlaxoSmithKline</institution>
          ,
          <addr-line>Stevenage, Hertfordshire, SG1 2NY</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Open PHACTS Foundation, c/o Thomas Graham House</institution>
          ,
          <addr-line>Science Park, Milton Road, Cambridge, CB4 0WF</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Royal Society of Chemistry</institution>
          ,
          <addr-line>Thomas Graham House (290), Science Park, Milton Road, Cambridge, CB4 0WF</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>School of Computer Science, The University of Manchester</institution>
          ,
          <addr-line>Oxford Road, Manchester, M13 9PL</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>SciBite Limited, CB1 Business Centre</institution>
          ,
          <addr-line>20 Station Road, Cambridge, CB1 2JD</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff7">
          <label>7</label>
          <institution>Stratified Medical</institution>
          ,
          <addr-line>40 Churchway, London, NW1 1LW</addr-line>
          ,
          <country>United Kingdom, current</country>
        </aff>
        <aff id="aff8">
          <label>8</label>
          <institution>University of Vienna, Department of Pharmaceutical Chemistry</institution>
          ,
          <addr-line>Althanstraße 14, 1090 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>SureChEMBL (https://www.surechembl.org) is a patent chemistry resource, originally a commercial product developed by SureChem/Digital Science, and recently made freely available at EMBL-EBI [1]. SureChEMBL uses a live and fully automated cloud-based pipeline that combines text-mining and chemistry tools to extract compounds named or depicted in patent documents and make them readily structure searchable by users. Over 50,000 new patent documents and 80,000 new compounds are entered into the system per month and new chemical annotations are usually available in the SureChEMBL interface within 1-7 days of the patent being released by the patent office. While the current SureChEMBL system addresses several chemistry use-cases, such as the identification of novel scaffolds and chemistry, there is an enormous amount of additional knowledge captured within the patent corpus. Much of this information will never be published elsewhere and may be of great value to the drug-discovery and broader life-science community. The Open PHACTS Discovery Platform is a semantic-web data integration platform, developed for the purpose of providing both the pharmaceutical industry and academic researchers with open access to interoperable drug discovery information [2, 3]. The platform currently includes data from a wide variety of public databases and provides API access to the integrated information. However, the further addition of biological and chemical patent information to the platform was considered to be of great potential utility.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>We have therefore developed a pipeline to identify and annotate
additional entities (namely genes and diseases) within the SureChEMBL
patent corpus using the Termite text-mining tool
(https://scibite.com/content/termite.html). Because patent documents are
often designed to obfuscate the key subject matter, it was also essential
to develop an algorithm to assess the relevance of each gene or disease
within a particular patent document, allowing users to restrict results to
only highly relevant entities if they wish.</p>
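      <p>The relevance algorithm itself is not described here, so the following is only a minimal sketch of one way such a score could work. It assumes, hypothetically, that an entity matters more when it appears in the title, abstract or claims of a patent than when it appears only in the description. The section names, the weights, the threshold and the functions <monospace>relevance_score</monospace> and <monospace>highly_relevant</monospace> are all illustrative assumptions, not the method used in the pipeline.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
# Hypothetical section weights: prominent sections count for more.
# These values are illustrative assumptions, not taken from the pipeline.
SECTION_WEIGHTS = {"title": 5.0, "abstract": 3.0, "claims": 2.0, "description": 1.0}


def relevance_score(mentions):
    """Return a score in [0, 1] for one entity in one patent.

    `mentions` maps a section name to the number of times the entity was
    found there. A section contributes its weight once if the entity occurs
    in it at all, so many repeats in the description cannot outweigh a
    single mention in the title.
    """
    total = sum(SECTION_WEIGHTS.values())
    hit = sum(
        weight
        for name, weight in SECTION_WEIGHTS.items()
        if mentions.get(name, 0) > 0
    )
    return hit / total


def highly_relevant(annotations, threshold=0.5):
    """Keep only entities whose score meets the (assumed) threshold."""
    return sorted(
        entity
        for entity, mentions in annotations.items()
        if relevance_score(mentions) >= threshold
    )


# Made-up annotations for a single patent document.
example = {
    "GENE_A": {"title": 1, "claims": 4, "description": 30},
    "GENE_B": {"description": 2},
}
print(highly_relevant(example))
```
]]></preformat>
      <p>In this toy example only the entity that appears in the title and claims passes the filter; an entity mentioned solely in the description is dropped.</p>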
      <p>An RDF model has been developed to capture the relationships between
patent documents and annotated compounds, genes and diseases, and
annotations for more than 6 million life-science patents have been made
available in this format via the Open PHACTS platform
(https://dev.openphacts.org/). A series of API calls has been developed to
allow users of the platform to query the data and to integrate it with the
extensive range of other data resources included in the platform (e.g., protein,
pathway, bioactivity and disease information). In addition, KNIME and
Pipeline Pilot nodes have been created to facilitate the construction
of workflows using patent data, for example, identifying all of the
compounds from patents that mention a particular target or disease with high
relevance. This represents the first large-scale, semantically annotated
life-science patent knowledgebase, freely available to both industrial and
academic researchers.</p>
      <p>Acknowledgments. The research leading to these results has received
support from the Innovative Medicines Initiative Joint Undertaking under grant
agreement no. [115191], resources of which comprise financial contribution from
the European Union's Seventh Framework Programme (FP7/2007-2013) and
in-kind contribution of European Federation of Pharmaceutical Industries and
Associations (EFPIA) companies; a Strategic Award from the Wellcome Trust
[104104/Z/14/Z]; and the member states of EMBL. We would like to thank
members of the Open PHACTS consortium and the former SureChem/Digital
Science team.</p>
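      <p>The example workflow mentioned in this section, collecting the compounds from patents that mention a given target with high relevance, can be illustrated on a toy in-memory graph. This is a sketch only: the records, the <monospace>ex:</monospace> predicate name, the identifiers and the numeric relevance scale are hypothetical placeholders and do not reproduce the Open PHACTS RDF model or API.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
# Toy (subject, predicate, object) triples; every identifier and predicate
# name here is a made-up placeholder, not the real vocabulary.
TRIPLES = [
    ("patent:P1", "ex:mentionsCompound", "compound:C1"),
    ("patent:P1", "ex:mentionsCompound", "compound:C2"),
    ("patent:P2", "ex:mentionsCompound", "compound:C3"),
    ("patent:P3", "ex:mentionsCompound", "compound:C4"),
]

# Target annotations with an assumed relevance score: (patent, target, score).
TARGET_MENTIONS = [
    ("patent:P1", "target:T1", 3),
    ("patent:P2", "target:T1", 1),
    ("patent:P3", "target:T2", 3),
]


def compounds_for_target(target, min_relevance):
    """Return compounds from patents that mention `target` with a relevance
    score of at least `min_relevance`."""
    patents = {
        patent
        for patent, mentioned, score in TARGET_MENTIONS
        if mentioned == target and score >= min_relevance
    }
    return sorted(
        obj
        for subj, pred, obj in TRIPLES
        if pred == "ex:mentionsCompound" and subj in patents
    )


print(compounds_for_target("target:T1", min_relevance=2))
```
]]></preformat>
      <p>Lowering <monospace>min_relevance</monospace> widens the result to patents that mention the target only in passing, which is the trade-off the relevance filter is meant to expose.</p>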
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Papadatos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davies</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dedman</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chambers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaulton</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siddle</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koks</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irvine</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pettersson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goncharoff</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hersey</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Overington</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>SureChEMBL: A large-scale, chemically annotated patent document database</article-title>
          . Submitted
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harland</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pettifer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chichester</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willighagen</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evelo</surname>
            ,
            <given-names>C.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomberg</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ecker</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mons</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Open PHACTS: Semantic interoperability for drug discovery</article-title>
          .
          <source>Drug Discovery Today</source>
          ,
          <volume>17</volume>
          (
          <issue>21-22</issue>
          ),
          <fpage>1188</fpage>
          –
          <lpage>1198</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Azzaoui</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacoby</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>E.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zdrazil</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinto</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de la Torre</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mestres</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pastor</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taboureau</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rarey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chichester</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pettifer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomberg</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harland</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams-Jones</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ecker</surname>
            ,
            <given-names>G.F.</given-names>
          </string-name>
          :
          <article-title>Scientific competency questions as the basis for semantically enriched open pharmacological space development</article-title>
          .
          <source>Drug Discovery Today</source>
          ,
          <volume>18</volume>
          ,
          <fpage>843</fpage>
          –
          <lpage>852</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>