<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Editorial: International Workshop on Biomedical Data Integration and Discovery (BMDID 2016), Co-located with the 2016 International Semantic Web Conference</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dezhao Song</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cui Tao</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guoqian Jiang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Lehigh University</institution>
          ,
          <addr-line>Bethlehem, PA 18015</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mayo Clinic College of Medicine, Mayo Clinic</institution>
          ,
          <addr-line>Rochester, MN 55905</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Research and Development</institution>
          ,
          <addr-line>Thomson Reuters, Eagan, MN 55123</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>School of Biomedical Informatics, The University of Texas Health Science Center at Houston</institution>
          ,
          <addr-line>Houston, TX 77030</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The goal of our BMDID workshop is to address research problems in biomedical
data integration, knowledge discovery and understanding of biomedical free text,
and data linking between structured and unstructured data, and in particular how
the research in these fields could be utilized by the medicine manufacturing
industry for better drug development and for monitoring drug use.</p>
      <p>The amount of biomedical data published in Semantic Web formats, such as
DrugBank, DailyMed, Diseasome, SIDER, and LinkedCT, has been increasing
dramatically. If a medical researcher or investigator wants to use this
information, however, he or she faces the challenge of linking the same entity
across multiple data sets. This is because each real-world entity (e.g., drug, gene,
or company) may be described and published by many data publishers with
syntactically distinct identifiers. Such identifiers from different data sources are
often not linked to each other, which prevents end users (e.g., drug
manufacturers, government agencies, patients, and clinicians) from easily obtaining
relatively comprehensive information about the entities.</p>
      <p>The first general theme of the workshop is to solicit research proposals
dealing with this semantic data integration problem in the biomedical domain:
1) What novel algorithms and techniques could be developed for integrating
biomedical data from heterogeneous and potentially large-scale data sources? 2)
Are the techniques that have proved effective for integrating other data
(e.g., person, publication, and location) also applicable to the biomedical
domain? 3) How do we appropriately differentiate "equivalence" and "relatedness"?
Because equivalence is transitive, inappropriately declaring two
biomedical entities equivalent (using owl:sameAs) may magnify the potential negative
impact: the error propagates to every entity already linked to either of them.
4) What issues and use cases could we address by utilizing the integrated
data sources? Example use cases include better monitoring of drug
use, drug repurposing, safety signal detection, and personalized medicine.</p>
      <p>Although an increasing amount of structured data is available in the
biomedical domain, a large amount of information still remains in free text,
such as clinical notes, medical literature, and even social media. Textual data
cover a variety of important aspects of the biomedical domain, such as drug
patenting, clinical trials, and drug side effects and adverse reactions. Mining
information from free text is non-trivial and can be extremely challenging because
most NLP approaches have been developed for standard English text and not for
specialized sub-languages such as clinical notes or for micro-text such as Twitter
posts.</p>
      <p>Hence, the second general theme of the workshop is to focus on how
we could extract valuable information from free text and possibly integrate such
information with other existing data sources to facilitate knowledge discovery
for use cases in the biomedical domain: 1) What novel Data Mining, Machine
Learning, Information Extraction and Natural Language Processing algorithms
and techniques can be proposed to facilitate research on extracting
information from free text, including not only biomedical text but also social media text?
2) How could we integrate the information mined from free text with existing
structured data sources? For instance, when a new side effect is formally reported
by government agencies (e.g., the FDA) or informally discussed in social media
(e.g., Twitter), could we mine such information and then augment existing drug
side effect datasets (e.g., SIDER)?</p>
      <p>As such, our workshop has attracted proposals dealing with this semantic
data mining and integration problem specifically in the biomedical domain, on a
variety of topics:</p>
      <list list-type="bullet">
        <list-item>
          <p>Biomedical Data Integration and Presentation</p>
          <list list-type="bullet">
            <list-item><p>Integration of heterogeneous data sources</p></list-item>
            <list-item><p>Data Integration using crowd sourcing techniques</p></list-item>
            <list-item><p>Large-scale Data Integration</p></list-item>
            <list-item><p>Schema and Ontology matching</p></list-item>
            <list-item><p>Biomedical Knowledge Representation and Reasoning</p></list-item>
          </list>
        </list-item>
        <list-item>
          <p>Biomedical Data Mining and Machine Learning</p>
          <list list-type="bullet">
            <list-item><p>Machine Learning and statistical approaches for biomedical data mining</p></list-item>
            <list-item><p>Rule-based systems for analyzing and mining biomedical text</p></list-item>
            <list-item><p>Semantic annotation of biomedical text</p></list-item>
            <list-item><p>Named Entity Recognition and Relation Extraction for biomedical text</p></list-item>
            <list-item><p>Entity Linking for/between free text and structured data</p></list-item>
            <list-item><p>Data Mining and Machine Learning for social media and their application to the biomedical and clinical domain</p></list-item>
          </list>
        </list-item>
        <list-item>
          <p>Applications</p>
          <list list-type="bullet">
            <list-item><p>Semantic Data Modeling, Mining and Integration for drug design and manufacturing</p></list-item>
            <list-item><p>Drug repurposing using semantic web technologies</p></list-item>
            <list-item><p>Pharmacovigilance and drug/vaccine safety signal identification</p></list-item>
            <list-item><p>Novel tools, ontologies and strategies for data interpretation, visualization and presentation</p></list-item>
            <list-item><p>Novel tools for visualizing ontologies and reasoning paths to domain experts</p></list-item>
          </list>
        </list-item>
      </list>
    </sec>
    <sec id="sec-2">
      <title>Workshop Format</title>
      <p>Our workshop was organized in the following format:</p>
      <list list-type="bullet">
        <list-item><p>Paper presentations: Our workshop program included both regular and short
papers.</p></list-item>
        <list-item><p>During the conference, our workshop attracted many audience members in
addition to our presenters.</p></list-item>
      </list>
    </sec>
    <sec id="sec-3">
      <title>Overview of Accepted Papers</title>
      <p>
        Data Integration Platform. Deraspe et al. presented their efforts to develop a
novel resource, the Model Organism Linked Database (MOLD), which uses
Semantic Web technologies to make the knowledge of six model organisms (budding
yeast, fruit fly, zebrafish, rat, mouse, human) available from their respective
InterMine endpoints in a FAIR (Findable, Accessible, Interoperable, and Reusable)
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] manner. To facilitate deployment and further development, the authors have
also open-sourced their system.
      </p>
      <p>Park et al. developed the Biological Data Integration Platform (BiDIP)
to facilitate transcriptome analysis. BiDIP consists of four main components:
1) a comprehensive database model, BIM (Biological Interaction data Model),
that encompasses four types of biological databases; 2) four integrated databases,
including PPI (Protein-Protein Interaction) databases, DGI (Drug-Gene
Interaction) databases, microRNA databases, and pathway databases; 3) the BiDIP
browser; and 4) OpenAPI. BiDIP provides a unified view of various biological
databases, which facilitates and streamlines transcriptome analysis and takes some
of the burden off biology researchers.</p>
      <p>Metadata Mining and Integration. In order to better utilize the rich
information from social media for the biomedical domain, Metke-Jimenez and
Karimi developed an approach for mining adverse drug reactions from medical
forums. The proposed system consists of two major steps: 1) Concept Extraction:
Identifying spans of text that represent a concept of interest, and 2) Concept
Normalization: Mapping the spans to the corresponding concepts in a chosen
ontology. A CRF-based implementation is presented and has been demonstrated
to outperform a few other comparison systems.</p>
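      <p>The two steps described above, concept extraction followed by concept
normalization, can be illustrated with a minimal Python sketch. It is not the authors'
CRF-based system: a simple lexicon lookup stands in for the learned extractor, and the
surface forms, the concept identifiers (EX:0001, EX:0002), and the function names
extract_concepts and normalize are hypothetical placeholders chosen only for
illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from typing import Dict, List, NamedTuple


class Span(NamedTuple):
    """A stretch of text that mentions a concept of interest."""
    start: int
    end: int
    text: str


# Hypothetical mapping from surface forms a forum user might write to
# placeholder concept IDs. These are NOT real ontology codes.
CONCEPT_IDS: Dict[str, str] = {
    "splitting headache": "EX:0001",
    "headache": "EX:0001",
    "felt sick": "EX:0002",
    "nausea": "EX:0002",
}


def extract_concepts(text: str) -> List[Span]:
    """Step 1 (concept extraction): find spans that mention a known concept.

    A lexicon lookup is used here as a stand-in for a trained sequence
    labeller such as a CRF.
    """
    # Longest forms first so "splitting headache" wins over "headache".
    forms = sorted(CONCEPT_IDS, key=len, reverse=True)
    alternatives = "|".join(re.escape(form) for form in forms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return [Span(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


def normalize(span: Span) -> str:
    """Step 2 (concept normalization): map a span to a concept ID."""
    return CONCEPT_IDS[span.text.lower()]
```
]]></preformat>
      <p>Under these assumptions, calling extract_concepts on a forum post and then
normalize on each returned span yields one placeholder concept identifier per
mention. Keeping the two steps separate means that either one could be replaced
independently, for example by swapping the lexicon lookup for a trained model.</p>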
      <p>
        Bio2RDF is an open-source project that offers a large and connected
knowledge graph of Life Science Linked Data. Each dataset is expressed using its
own vocabulary, which hinders integrating, searching, querying, and browsing data
across similar or identical types of data. Zaveri and Dumontier presented a
(semi-)automated procedure to generate high-quality mappings between Bio2RDF
and the Semanticscience Integrated Ontology (SIO). Specifically, they infer
Bio2RDF-SIO mappings by mapping Bio2RDF and SIO classes to biomedical ontologies
contained in the NCBO BioPortal [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
and consequently use their hierarchies to find indirect Bio2RDF type to SIO class
mappings. The proposed approach was evaluated with 319 Bio2RDF classes to
be mapped to 1,500 SIO classes and 475 BioPortal ontologies.
      </p>
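      <p>One way to picture an indirect mapping through an ontology hierarchy is the
following minimal Python sketch. It is an illustration under stated assumptions, not
the procedure of Zaveri and Dumontier: every identifier in the toy tables is a
hypothetical placeholder, the hierarchy is assumed to have a single parent per class,
and the function name indirect_sio_mapping is invented for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Optional

# Hypothetical toy data; none of these identifiers come from the paper.
# Direct mappings from a Bio2RDF type to a class in an intermediate ontology.
BIO2RDF_TO_ONTOLOGY: Dict[str, str] = {
    "bio2rdf:ExampleEnzyme": "onto:Enzyme",
    "bio2rdf:ExampleDrug": "onto:Drug",
}

# Direct mappings from classes of the same intermediate ontology to SIO classes.
ONTOLOGY_TO_SIO: Dict[str, str] = {
    "onto:Protein": "sio:ExampleProtein",
    "onto:ChemicalEntity": "sio:ExampleChemicalEntity",
}

# Subclass hierarchy of the intermediate ontology as child -> parent links.
PARENT: Dict[str, str] = {
    "onto:Enzyme": "onto:Protein",
    "onto:Protein": "onto:Molecule",
    "onto:Drug": "onto:ChemicalEntity",
}


def indirect_sio_mapping(bio2rdf_type: str) -> Optional[str]:
    """Return the SIO class of the nearest ancestor that has an SIO mapping.

    Starts at the intermediate class mapped to the Bio2RDF type and walks up
    the hierarchy until a class with a direct SIO mapping is found.
    """
    current = BIO2RDF_TO_ONTOLOGY.get(bio2rdf_type)
    visited = set()  # guards against accidental cycles in the toy hierarchy
    while current is not None and current not in visited:
        if current in ONTOLOGY_TO_SIO:
            return ONTOLOGY_TO_SIO[current]
        visited.add(current)
        current = PARENT.get(current)
    return None
```
]]></preformat>
      <p>In this toy setting the hypothetical type bio2rdf:ExampleEnzyme has no direct
SIO counterpart, but its intermediate class is a subclass of one that does, so the
walk up the hierarchy produces an indirect mapping. A type with no mapped ancestor
yields no mapping at all.</p>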
      <p>In another work, Solbrig and Jiang investigate how a combination of
Semantic Web technologies and the ISO/IEC 11179 data element model could
be used to align a biomedical study database with the bioCADDIE
indexing schema. The authors first transform the dbGaP and bioCADDIE
models from their native XML Schema and JSON Schema representations into their
corresponding OWL equivalents. They then align the results with an OWL
representation of the ISO/IEC 11179-3 model. The authors demonstrate that the
result of this process, when used in combination with a description logic (DL)
reasoner, can be used to discover, validate, and uncover issues with possible
alignments between dbGaP and bioCADDIE model components.</p>
      <p>
        Applications. The paper by Bonte et al. presents an interesting
application of how semantic data can help to provide better transport assignments in
hospitals. In the AORTA project [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], an intelligent system is being built that
assigns the most suitable staff member to a transport based on the available
information about the context, staff, patient and requested transport tasks. As
part of this assignment process, a lot of context information is collected. In this
paper, a self-learning module is presented that mines this contextual data to
give insights into the causes of transports that arrived too late. For example, the
module could learn that certain transports during visiting hours on Friday
are often late and that more time should be reserved for them. Incorporating
the knowledge modeled in the ontology makes it possible to learn more accurate and
contextualized rules for transport assignment.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgment</title>
      <p>We would like to thank all authors for contributing to our workshop and for
their great presentations at the workshop. Furthermore, we thank all reviewers
for their time and efforts in helping us build an interesting program.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ongenae</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonte</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaballie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vankeirsbilck</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Turck</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Semantic context consolidation and rule learning for optimized transport assignments in hospitals</article-title>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Whetzel</surname>
            ,
            <given-names>P.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexander</surname>
            ,
            <given-names>P.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nyulas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tudorache</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>39</volume>
          (
          <issue>Web-Server-Issue</issue>
          ),
          <fpage>541</fpage>
          –
          <lpage>545</lpage>
          (
          <year>2011</year>
          ), http://dx.doi.org/10.1093/nar/gkr469
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumontier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalbersberg</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Appleton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Axton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomberg</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boiten</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name><surname>da Silva Santos</surname>, <given-names>L.B.</given-names></string-name>,
          <string-name><surname>Bourne</surname>, <given-names>P.E.</given-names></string-name>,
          <string-name><surname>Bouwman</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Brookes</surname>, <given-names>A.J.</given-names></string-name>,
          <string-name><surname>Clark</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Crosas</surname>, <given-names>M.A.</given-names></string-name>,
          <string-name><surname>Dillo</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Dumon</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Edmunds</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Evelo</surname>, <given-names>C.T.</given-names></string-name>,
          <string-name><surname>Finkers</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Gonzalez-Beltran</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Gray</surname>, <given-names>A.J.G.</given-names></string-name>,
          <string-name><surname>Groth</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Goble</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Grethe</surname>, <given-names>J.S.</given-names></string-name>,
          <string-name><surname>Heringa</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>'t Hoen</surname>, <given-names>P.A.C.</given-names></string-name>,
          <string-name><surname>Hooft</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Kuhn</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Kok</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Kok</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Lusher</surname>, <given-names>S.J.</given-names></string-name>,
          <string-name><surname>Martone</surname>, <given-names>M.E.</given-names></string-name>,
          <string-name><surname>Mons</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Packer</surname>, <given-names>A.L.</given-names></string-name>,
          <string-name><surname>Persson</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Rocca-Serra</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Roos</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>van Schaik</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Sansone</surname>, <given-names>S.A.</given-names></string-name>,
          <string-name><surname>Schultes</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Sengstag</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Slater</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Strawn</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Swertz</surname>, <given-names>M.A.</given-names></string-name>,
          <string-name><surname>Thompson</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>van der Lei</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>van Mulligen</surname>, <given-names>E.</given-names></string-name>
          ,
          <string-name>
            <surname>Velterop</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waagmeester</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wittenburg</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolstencroft</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mons</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          .
          <source>Scientific Data</source>
          <volume>3</volume>
          ,
          <fpage>160018</fpage>
          (
          <month>Mar</month>
          <year>2016</year>
          ), http://dx.doi.org/10.1038/sdata.2016.18
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>